Audio-Visual Video Recognition
Audio-visual video recognition (AVVR) combines audio and visual information to make video categorization and speech recognition more accurate and more robust. Current research emphasizes incremental learning, in which a model must adapt to new classes without forgetting the classes it has already learned. Work on this problem often uses transformer-based architectures together with techniques such as knowledge distillation. The field matters both for the fundamental understanding of multimodal perception and for practical applications such as robust speech recognition in noisy environments and more efficient video indexing and retrieval.
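To make the role of knowledge distillation in incremental learning concrete, the following is a minimal sketch in plain Python, not the method of any particular paper. It assumes a simple late-fusion setup in which audio and visual class scores are averaged, and a distillation term that penalizes the updated model for drifting from the previous model's predictions on the old classes. The function names (`fuse_logits`, `distillation_loss`), the fusion weight, and the temperature value are all illustrative assumptions.

```python
import math


def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax over a list of logits."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_logits(audio_logits, visual_logits, audio_weight=0.5):
    """Hypothetical late fusion: weighted average of per-class scores."""
    return [
        audio_weight * a + (1.0 - audio_weight) * v
        for a, v in zip(audio_logits, visual_logits)
    ]


def distillation_loss(old_logits, new_logits, temperature=2.0):
    """KL(old || new) restricted to the classes the old model knew.

    The new model may output extra logits for newly added classes;
    only the first len(old_logits) entries are compared.
    """
    n_old = len(old_logits)
    p_old = softmax(old_logits, temperature)
    p_new = softmax(new_logits[:n_old], temperature)
    return sum(p * math.log(p / q) for p, q in zip(p_old, p_new) if p > 0.0)


# Old model: 3 classes. New model: the same 3 classes plus 2 new ones.
old_scores = fuse_logits([2.0, 0.5, -1.0], [1.5, 0.0, -0.5])
new_scores = fuse_logits([1.0, 1.0, -1.0, 0.3, 0.2], [0.5, 1.0, 0.0, 0.1, 0.4])

drift_penalty = distillation_loss(old_scores, new_scores)
no_drift = distillation_loss(old_scores, old_scores + [0.0, 0.0])
```

In a training loop, a term like `drift_penalty` would typically be added to the ordinary classification loss on the new classes, so that the model learns the new categories while staying close to its earlier behavior on the old ones. Here `no_drift` is zero because the updated scores for the old classes are unchanged.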