Multilingual Audio-Visual
Multilingual audio-visual research develops computational models that understand and process audio and visual information across multiple languages. Current efforts concentrate on improving the accuracy and scalability of tasks such as sign language translation, 3D talking head generation, and cross-lingual video-text alignment, often using encoder-decoder architectures and knowledge distillation to transfer knowledge from high-resource languages to low-resource ones. These advances help bridge communication barriers, enable more inclusive technologies, and push multimodal learning forward through larger, more diverse datasets and more robust models.
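To illustrate the kind of cross-lingual knowledge distillation mentioned above, the sketch below shows a high-resource "teacher" audio-visual encoder supervising a low-resource "student" through a soft-label distillation loss. All module names, dimensions, and the loss formulation are illustrative assumptions, not taken from any particular paper.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Hypothetical sketch: distilling a high-resource teacher's audio-visual
# predictions into a low-resource student. Shapes and hyperparameters are
# placeholders for illustration only.

class AudioVisualEncoder(nn.Module):
    """Fuses per-frame audio and visual features into a shared sequence."""
    def __init__(self, audio_dim=80, visual_dim=512, hidden_dim=256):
        super().__init__()
        self.audio_proj = nn.Linear(audio_dim, hidden_dim)
        self.visual_proj = nn.Linear(visual_dim, hidden_dim)
        self.encoder = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(d_model=hidden_dim, nhead=4, batch_first=True),
            num_layers=2,
        )

    def forward(self, audio, visual):
        # Simple additive fusion of projected modalities, then self-attention.
        fused = self.audio_proj(audio) + self.visual_proj(visual)
        return self.encoder(fused)  # (batch, time, hidden_dim)


def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """KL divergence between temperature-softened teacher and student distributions."""
    t = temperature
    return F.kl_div(
        F.log_softmax(student_logits / t, dim=-1),
        F.softmax(teacher_logits / t, dim=-1),
        reduction="batchmean",
    ) * (t * t)


if __name__ == "__main__":
    batch, frames, vocab = 2, 50, 1000
    audio = torch.randn(batch, frames, 80)     # e.g. log-mel features
    visual = torch.randn(batch, frames, 512)   # e.g. frame embeddings

    teacher = AudioVisualEncoder()   # assumed pretrained on a high-resource language
    student = AudioVisualEncoder()   # being adapted to a low-resource language
    teacher_head = nn.Linear(256, vocab)
    student_head = nn.Linear(256, vocab)

    with torch.no_grad():
        teacher_logits = teacher_head(teacher(audio, visual))
    student_logits = student_head(student(audio, visual))

    loss = distillation_loss(student_logits, teacher_logits)
    print(f"distillation loss: {loss.item():.4f}")
```

In practice this distillation term would be combined with a task loss (e.g. translation cross-entropy) on whatever labeled low-resource data is available; the sketch only shows the transfer signal itself.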