Visual Speech Recognition
Visual speech recognition (VSR) aims to decipher spoken language solely from lip movements, a challenging task due to the inherent ambiguity of visual speech cues. Current research focuses on improving model accuracy and efficiency through techniques such as knowledge distillation from audio-based speech recognition models, end-to-end architectures combining CTC and attention objectives, and large language models for context modeling; a sketch of the hybrid CTC/attention objective follows below. Advances in VSR have significant implications for applications requiring silent communication and for enhancing speech recognition in noisy environments, and they are driving innovation in both computer vision and speech processing.
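The hybrid CTC/attention setup mentioned above is a common end-to-end training recipe. The snippet below is a minimal sketch, not drawn from any of the listed papers, of how the two objectives are typically combined; the class name, argument names, and the `ctc_weight` value are illustrative assumptions, and a video front-end that produces encoder log-probabilities and decoder logits is assumed to exist.

```python
# Minimal sketch of a hybrid CTC/attention training objective for VSR (PyTorch).
# Assumes a lip-region video encoder with a CTC head and an attention decoder
# already produce the tensors passed to forward(); all names are illustrative.
import torch
import torch.nn as nn


class HybridCTCAttentionLoss(nn.Module):
    """Interpolates a CTC loss and a cross-entropy (attention) loss."""

    def __init__(self, blank_id: int, pad_id: int, ctc_weight: float = 0.3):
        super().__init__()
        self.ctc_loss = nn.CTCLoss(blank=blank_id, zero_infinity=True)
        self.ce_loss = nn.CrossEntropyLoss(ignore_index=pad_id)
        self.ctc_weight = ctc_weight

    def forward(self, ctc_log_probs, attn_logits, targets,
                input_lengths, target_lengths):
        # ctc_log_probs: (T, B, V) log-probabilities from the encoder's CTC head
        # attn_logits:   (B, L, V) logits from the attention decoder
        # targets:       (B, L) padded token ids
        loss_ctc = self.ctc_loss(ctc_log_probs, targets,
                                 input_lengths, target_lengths)
        # CrossEntropyLoss expects (B, V, L), so move the vocabulary axis.
        loss_att = self.ce_loss(attn_logits.transpose(1, 2), targets)
        return self.ctc_weight * loss_ctc + (1.0 - self.ctc_weight) * loss_att
```

Weighting the CTC term at roughly 0.2 to 0.3 and the attention term at the remainder is a common choice in hybrid end-to-end systems: the CTC branch encourages monotonic alignment between video frames and tokens, while the attention decoder supplies contextual language modeling.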
Papers
Speaker-Adapted End-to-End Visual Speech Recognition for Continuous Spanish
David Gimeno-Gómez, Carlos-D. Martínez-Hinarejos
Analysis of Visual Features for Continuous Lipreading in Spanish
David Gimeno-Gómez, Carlos-D. Martínez-Hinarejos
LIP-RTVE: An Audiovisual Database for Continuous Spanish in the Wild
David Gimeno-Gómez, Carlos-D. Martínez-Hinarejos