Audio-Visual Speech

Audio-visual speech research combines the acoustic speech signal with visual cues, such as lip and facial movements, to improve speech processing tasks. Current work emphasizes direct audio-visual-to-audio-visual translation, in which models learn unified audio-visual representations through self-supervised learning and transformer-based architectures. The goals are real-time, high-fidelity translation and speech recognition that stays robust in noisy conditions. The field matters for speech technology because it supports better speech recognition, translation, and enhancement, with applications ranging from virtual meetings to assistive technologies for people with hearing impairments.

Papers