Visual Speech

Visual speech research focuses on understanding and exploiting the visual aspects of spoken language, chiefly lip movements, aiming to improve speech recognition and translation, particularly in noisy environments or for individuals with hearing impairments. Current research employs deep learning models, including transformers and autoencoders, often incorporating self-supervised learning and multimodal fusion techniques to integrate audio and visual information effectively. The field is significant for its potential to enhance human-computer interaction, improve accessibility, and advance applications such as speech-to-speech translation and video dubbing.
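
The multimodal fusion step mentioned above is commonly realized by projecting per-frame audio and visual features into a shared embedding space and letting a transformer encoder attend over the fused sequence. The PyTorch sketch below illustrates this pattern; the feature dimensions, layer sizes, additive early-fusion choice, and the `AudioVisualFusion` name are illustrative assumptions, not taken from any specific paper.

```python
import torch
import torch.nn as nn

class AudioVisualFusion(nn.Module):
    """Illustrative audio-visual fusion: project both modalities into a
    shared space, fuse frame by frame, then encode with a transformer.
    All dimensions and layer counts are assumptions for the sketch."""

    def __init__(self, audio_dim=80, visual_dim=512, d_model=256,
                 nhead=4, num_layers=2, num_classes=500):
        super().__init__()
        # Project each modality into the shared d_model embedding space.
        self.audio_proj = nn.Linear(audio_dim, d_model)
        self.visual_proj = nn.Linear(visual_dim, d_model)
        encoder_layer = nn.TransformerEncoderLayer(
            d_model=d_model, nhead=nhead, batch_first=True)
        self.encoder = nn.TransformerEncoder(encoder_layer, num_layers)
        self.classifier = nn.Linear(d_model, num_classes)

    def forward(self, audio, visual):
        # audio:  (batch, frames, audio_dim), e.g. log-mel filterbanks
        # visual: (batch, frames, visual_dim), e.g. lip-region CNN features
        # Early fusion: sum the projected streams at each frame.
        fused = self.audio_proj(audio) + self.visual_proj(visual)
        encoded = self.encoder(fused)    # (batch, frames, d_model)
        return self.classifier(encoded)  # per-frame class logits

model = AudioVisualFusion()
audio = torch.randn(2, 75, 80)    # 2 clips, 75 frames of audio features
visual = torch.randn(2, 75, 512)  # time-aligned visual features
logits = model(audio, visual)
print(logits.shape)  # torch.Size([2, 75, 500])
```

Additive early fusion keeps the sketch short; many systems instead use cross-modal attention or gated fusion so that one modality can dominate when the other is unreliable, for example the audio stream in heavy noise.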

Papers