Audio-Visual Models
Audio-visual models integrate audio and visual data to improve performance on tasks ranging from speech recognition and synthesis to deepfake detection and video understanding. Current research focuses on building robust models, often with transformer-based architectures and techniques such as contrastive learning and iterative fine-tuning, to address challenges including noisy environments, sparse training data, and the need for efficient, lightweight systems. These advances have significant implications for applications such as improved human-computer interaction, richer multimedia content analysis, and more reliable detection of manipulated media.
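To make the contrastive-learning idea concrete, below is a minimal sketch of CLIP-style symmetric contrastive alignment between audio and video embeddings. The encoders are placeholder projection heads over pre-extracted clip features; real systems typically use transformer encoders over spectrogram and frame sequences. All class names, dimensions, and the dummy data are illustrative assumptions, not taken from any specific paper.

```python
# Minimal sketch of audio-visual contrastive alignment (symmetric InfoNCE).
# Assumption: pre-pooled per-clip audio and video feature vectors are available.
import torch
import torch.nn as nn
import torch.nn.functional as F


class AudioVisualContrastiveModel(nn.Module):
    def __init__(self, audio_dim=128, video_dim=512, embed_dim=256):
        super().__init__()
        # Placeholder projection heads; swap in transformer encoders in practice.
        self.audio_proj = nn.Sequential(nn.Linear(audio_dim, embed_dim), nn.ReLU(),
                                        nn.Linear(embed_dim, embed_dim))
        self.video_proj = nn.Sequential(nn.Linear(video_dim, embed_dim), nn.ReLU(),
                                        nn.Linear(embed_dim, embed_dim))
        # Learnable temperature (stored as a log for positivity), CLIP-style.
        self.log_tau = nn.Parameter(torch.tensor(0.0))

    def forward(self, audio_feats, video_feats):
        a = F.normalize(self.audio_proj(audio_feats), dim=-1)
        v = F.normalize(self.video_proj(video_feats), dim=-1)
        # Cosine-similarity logits between every audio and video clip in the batch.
        logits = (a @ v.t()) / self.log_tau.exp()
        targets = torch.arange(a.size(0), device=a.device)
        # Symmetric InfoNCE: matched audio/video pairs are positives,
        # all other pairings in the batch serve as negatives.
        loss_a2v = F.cross_entropy(logits, targets)
        loss_v2a = F.cross_entropy(logits.t(), targets)
        return (loss_a2v + loss_v2a) / 2


if __name__ == "__main__":
    model = AudioVisualContrastiveModel()
    audio = torch.randn(8, 128)   # e.g. pooled log-mel features per clip
    video = torch.randn(8, 512)   # e.g. pooled frame features per clip
    loss = model(audio, video)
    loss.backward()
    print(f"contrastive loss: {loss.item():.4f}")
```

The symmetric loss encourages corresponding audio and video clips to map to nearby points in a shared embedding space, which is one common way such models are pretrained before fine-tuning on downstream tasks.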