Visual Speech Recognition
Visual speech recognition (VSR) aims to decipher spoken language solely from lip movements, a challenging task due to the inherent ambiguity of visual speech cues. Current research focuses on improving model accuracy and efficiency through techniques such as knowledge distillation from audio-based speech recognition models, end-to-end architectures combining CTC and attention objectives, and the use of large language models for context modeling. Advances in VSR have significant implications for applications that require silent communication or robust speech recognition in noisy environments, and they are driving innovation in both computer vision and speech processing.
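As a rough illustration of the hybrid CTC/attention objective mentioned above, the sketch below shows how the two losses are typically interpolated during end-to-end training. This is a minimal, hedged example rather than any specific paper's implementation; the class name, tensor shapes, and the `ctc_weight` value are illustrative assumptions.

```python
# Minimal sketch of a hybrid CTC/attention training objective for VSR:
# a shared visual encoder feeds both a CTC head and an attention-based
# decoder, and their losses are interpolated with a fixed weight.
import torch
import torch.nn as nn

class HybridCTCAttentionLoss(nn.Module):
    def __init__(self, blank_id: int = 0, pad_id: int = -100, ctc_weight: float = 0.3):
        super().__init__()
        self.ctc_weight = ctc_weight
        self.ctc_loss = nn.CTCLoss(blank=blank_id, zero_infinity=True)
        self.ce_loss = nn.CrossEntropyLoss(ignore_index=pad_id)

    def forward(self, ctc_log_probs, ctc_input_lengths,
                attn_logits, targets, target_lengths):
        # ctc_log_probs:     (T, B, V) log-probabilities from the CTC head
        # ctc_input_lengths: (B,) number of valid encoder frames per sample
        # attn_logits:       (B, L, V) per-step logits from the attention decoder
        # targets:           (B, L) label token ids, padded with pad_id
        # target_lengths:    (B,) number of valid label tokens per sample
        # CTC reads only the first target_lengths[b] tokens of each row,
        # so padding entries just need to be clamped to valid indices.
        loss_ctc = self.ctc_loss(ctc_log_probs, targets.clamp(min=0),
                                 ctc_input_lengths, target_lengths)
        loss_att = self.ce_loss(attn_logits.reshape(-1, attn_logits.size(-1)),
                                targets.reshape(-1))
        # Interpolate the sequence-level (CTC) and token-level (attention) objectives.
        return self.ctc_weight * loss_ctc + (1.0 - self.ctc_weight) * loss_att
```

In practice the CTC branch mainly regularizes the encoder toward monotonic alignments, so the interpolation weight is usually kept small (values around 0.1 to 0.3 are common in hybrid CTC/attention setups).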