State of the Art Whisper

Whisper, a large-scale multilingual speech recognition model, is the focus of intense research aimed at improving its accuracy, efficiency, and robustness across diverse speech characteristics and applications. Current work emphasizes adapting Whisper to low-resource languages, improving its streaming capabilities, defending it against adversarial attacks, and combining it with other modalities such as vision for audio-visual speech recognition. These advances matter for healthcare (e.g., aphasia diagnosis), accessibility (e.g., better speech-to-text for individuals with speech impairments), and security (e.g., defenses against malicious audio manipulation).

Papers