State of the Art: Whisper
Whisper, a large-scale multilingual speech recognition model, is the subject of active research aimed at improving its accuracy, efficiency, and robustness across diverse speech characteristics and applications. Current work focuses on adapting Whisper to low-resource languages, improving its streaming (real-time) transcription, defending against adversarial attacks, and combining it with other modalities, such as vision for audio-visual speech recognition. These advances are relevant to several fields, including healthcare (e.g., aphasia diagnosis), accessibility (e.g., more accurate speech-to-text for people with speech impairments), and security (e.g., defenses against malicious audio manipulation).
Papers
December 31, 2024
December 30, 2024
December 16, 2024
December 15, 2024
December 7, 2024
December 1, 2024
November 19, 2024
September 24, 2024
September 20, 2024
September 19, 2024
September 14, 2024
September 12, 2024
August 27, 2024
August 25, 2024
August 20, 2024
July 14, 2024
July 13, 2024
July 11, 2024
July 5, 2024
July 2, 2024