Audio-Visual Speech Recognition
Audio-visual speech recognition (AVSR) aims to improve the accuracy and robustness of automatic speech recognition by incorporating visual information, such as lip movements, to complement audio signals. Current research emphasizes robust models that generalize across diverse video conditions, often employing techniques like mixture-of-experts, large language models, and efficient architectures such as conformers and transformers; self-supervised learning is frequently used to address the scarcity of labeled audio-visual data. These advances are significant for speech recognition in noisy environments and for applications requiring multimodal understanding, such as virtual assistants and accessibility technologies.
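The core multimodal idea — combining per-frame audio and visual features before decoding — can be illustrated with a minimal late-fusion sketch. All shapes, names, and the simple concatenate-and-project strategy below are illustrative assumptions for exposition, not the method of any particular paper; real systems would replace the linear projection with a conformer or transformer encoder.

```python
import numpy as np

def fuse_features(audio_feats, visual_feats, w, b):
    """Late fusion: concatenate time-aligned audio and visual features,
    then apply a linear projection (an illustrative stand-in for a
    conformer/transformer encoder layer)."""
    fused = np.concatenate([audio_feats, visual_feats], axis=-1)
    return fused @ w + b

# Toy example: 10 frames, 80-dim audio (e.g. log-mel), 64-dim lip features.
T, d_a, d_v, d_out = 10, 80, 64, 32
rng = np.random.default_rng(0)
audio = rng.standard_normal((T, d_a))   # hypothetical audio features
video = rng.standard_normal((T, d_v))   # hypothetical lip-region features
w = rng.standard_normal((d_a + d_v, d_out)) * 0.01
b = np.zeros(d_out)

out = fuse_features(audio, video, w, b)
print(out.shape)  # (10, 32)
```

Because the fused representation still sees both streams at every frame, the downstream recognizer can lean on the visual channel when the audio is degraded by noise — the robustness benefit described above.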