Audio-Visual Speech Representation
Audio-visual speech representation learning builds computational models that integrate the audio and visual streams of speech, with the goal of improving tasks such as speech recognition and lip reading. Current research emphasizes self-supervised learning, often with transformer-based architectures such as HuBERT variants (e.g., AV-HuBERT), to learn robust representations from large unlabeled datasets; complementary techniques, including viseme analysis and contextual modeling with large language models (LLMs), are used to further improve accuracy. These advances hold significant promise for human-computer interaction, accessibility technologies for people who are deaf or hard of hearing, and robust speech processing in noisy environments.
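As a rough illustration of the masked-prediction idea behind HuBERT-style audio-visual pretraining, the PyTorch sketch below fuses per-frame audio and visual features and trains a small transformer to predict cluster pseudo-labels at masked frames. This is a minimal sketch under stated assumptions, not the published AV-HuBERT implementation: all module names, dimensions, and the toy pseudo-labels are illustrative.

```python
# Illustrative sketch (not the actual AV-HuBERT code): a tiny audio-visual
# encoder trained with a HuBERT-style masked-prediction objective.
# Dimensions, names, and the toy data below are assumptions for readability.
import torch
import torch.nn as nn
import torch.nn.functional as F


class AudioVisualEncoder(nn.Module):
    """Fuses per-frame audio and visual features, then contextualizes
    the fused sequence with a transformer encoder."""

    def __init__(self, audio_dim=80, visual_dim=512, d_model=256,
                 n_heads=4, n_layers=4, n_clusters=100):
        super().__init__()
        self.audio_proj = nn.Linear(audio_dim, d_model)
        self.visual_proj = nn.Linear(visual_dim, d_model)
        # Learned embedding substituted at masked frames (as in HuBERT).
        self.mask_embed = nn.Parameter(torch.randn(d_model))
        layer = nn.TransformerEncoderLayer(
            d_model=d_model, nhead=n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=n_layers)
        # Predicts pseudo-label cluster ids for each frame.
        self.head = nn.Linear(d_model, n_clusters)

    def forward(self, audio, visual, mask):
        # audio:  (batch, frames, audio_dim)   e.g. log-mel features
        # visual: (batch, frames, visual_dim)  e.g. lip-ROI embeddings
        # mask:   (batch, frames) bool, True where the frame is masked
        fused = self.audio_proj(audio) + self.visual_proj(visual)
        fused = torch.where(mask.unsqueeze(-1), self.mask_embed, fused)
        context = self.encoder(fused)
        return self.head(context)  # (batch, frames, n_clusters)


# Toy self-supervised step: predict (assumed) k-means pseudo-labels
# at the masked positions only, as in masked-prediction pretraining.
model = AudioVisualEncoder()
audio = torch.randn(2, 50, 80)
visual = torch.randn(2, 50, 512)
labels = torch.randint(0, 100, (2, 50))  # pseudo-labels from clustering
mask = torch.rand(2, 50) < 0.3           # mask roughly 30% of frames

logits = model(audio, visual, mask)
loss = F.cross_entropy(logits[mask], labels[mask])
loss.backward()
print(f"masked-prediction loss: {loss.item():.3f}")
```

In actual HuBERT-style training, the pseudo-labels come from k-means clustering of acoustic features (or of an earlier iteration's representations), masking is applied to contiguous spans rather than independent frames, and modality dropout encourages the model to remain robust when one stream is missing or noisy.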