Multi-Temporal Lip-Audio Memory
Multi-temporal lip-audio memory research aims to improve visual speech recognition (VSR) by leveraging audio information to compensate for the inherent ambiguity of lip movements, where visually similar mouth shapes can correspond to different sounds. Current efforts focus on models that integrate multi-temporal audio features, capturing both short- and long-term context, with visual lip features, often employing Siamese networks, transformers, and attention mechanisms to learn robust visual-to-audio mappings; because the audio features are typically stored in a memory during training, these models can retrieve audio-like representations from visual input alone at inference time. This research is significant because it addresses limitations of current VSR systems, potentially leading to more accurate and robust speech recognition in noisy environments or under poor visual conditions, with applications in assistive technology and deepfake detection.
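The sketch below illustrates the general idea under stated assumptions; it is not the MTLAM authors' implementation, and the names (`MultiTemporalAudioMemory`, `num_slots`, `scales`) are illustrative. One learnable key/value memory bank per temporal scale is trained so that visual queries retrieve features matching audio features pooled at that scale; at inference, only the visual read path is used, so no audio input is required.

```python
# A minimal sketch of a multi-temporal audio memory for VSR in PyTorch.
# Assumption: visual and audio encoders produce frame-aligned features of
# shape (B, T, dim); those encoders are out of scope here.

import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiTemporalAudioMemory(nn.Module):
    def __init__(self, dim=512, num_slots=128, scales=(1, 3, 5)):
        super().__init__()
        self.scales = scales  # temporal pooling widths: short- to long-term
        # One learnable key/value memory bank per temporal scale.
        self.keys = nn.ParameterList(
            nn.Parameter(torch.randn(num_slots, dim) * 0.02) for _ in scales)
        self.values = nn.ParameterList(
            nn.Parameter(torch.randn(num_slots, dim) * 0.02) for _ in scales)
        self.fuse = nn.Linear(dim * len(scales), dim)

    def read(self, query):
        """Retrieve audio-like features from visual queries.

        query: (B, T, dim) visual features; returns (B, T, dim).
        """
        retrieved = []
        for k, v in zip(self.keys, self.values):
            # Scaled dot-product attention over the memory slots.
            attn = F.softmax(query @ k.t() / k.size(-1) ** 0.5, dim=-1)
            retrieved.append(attn @ v)  # (B, T, dim) per scale
        return self.fuse(torch.cat(retrieved, dim=-1))

    def write_loss(self, visual, audio):
        """Training-time objective: features retrieved by visual queries
        should match audio features average-pooled at each temporal scale,
        so the memory stores multi-temporal audio context."""
        loss = 0.0
        for scale, k, v in zip(self.scales, self.keys, self.values):
            # Pool audio over `scale` frames to capture longer context
            # (odd scales with padding scale//2 keep the length T).
            pooled = F.avg_pool1d(audio.transpose(1, 2), scale, stride=1,
                                  padding=scale // 2).transpose(1, 2)
            attn = F.softmax(visual @ k.t() / k.size(-1) ** 0.5, dim=-1)
            loss = loss + F.mse_loss(attn @ v, pooled)
        return loss
```

In this design, `write_loss` is added to the VSR training objective while audio is available, and `read` supplies the fused audio-like features to the downstream recognizer both during training and at audio-free inference.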
Papers
17 papers, dated April 4, 2022 through October 28, 2024 (titles and links not recoverable from this extract).