State of the Art Whisper

Whisper, a large-scale multilingual speech recognition model, is the subject of active research aimed at improving its accuracy, efficiency, and robustness across diverse speech characteristics and applications. Current work focuses on adapting Whisper to low-resource languages, improving its streaming capabilities, defending it against adversarial attacks, and combining it with other modalities such as vision for audio-visual speech recognition. These advances matter for healthcare (e.g., aphasia diagnosis), accessibility (e.g., better speech-to-text for people with speech impairments), and security (e.g., defenses against malicious audio manipulation).

Papers

July 27, 2023

Turning Whisper into Real-Time Transcription System
Dominik Macháček, Raj Dabre, Ondřej Bojar
Speech Recognition Real Time State of the Art Whisper

July 18, 2023

OxfordVGG Submission to the EGO4D AV Transcription Challenge
Jaesung Huh, Max Bain, Andrew Zisserman
Speech Recognition Real Time State of the Art Whisper Long Form Audio

July 6, 2023

Whisper-AT: Noise-Robust Automatic Speech Recognizers are Also Strong General Audio Event Taggers
Yuan Gong, Sameer Khurana, Leonid Karlinsky, James Glass
Automatic Speech Recognition Speech Recognition Speech Corpus State of the Art Whisper Audio Tagging

July 4, 2023

Transcribing Educational Videos Using Whisper: A preliminary study on using AI for transcribing educational videos
Ashwin Rao
Artificial Intelligence Automatic Speech Recognition Preliminary Study Video Text State of the Art Whisper Educational Video Speech Text Transcript

June 5, 2023

N-Shot Benchmarking of Whisper on Diverse Arabic Speech Recognition
Bashar Talafha, Abdul Waheed, Muhammad Abdul-Mageed
Self Supervised Speech Recognition State of the Art Whisper Arabic Speaker Recognition Benchmark Arabic Speech

June 2, 2023

May 18, 2023

Whisper-KDQ: A Lightweight Whisper via Guided Knowledge Distillation and Quantization for Efficient ASR
Hang Shao, Wei Wang, Bei Liu, Xun Gong, Haoyu Wang, Yanmin Qian
Knowledge Distillation Speech Recognition Quantization Operator State of the Art Whisper Whisper Model Quantization Loss

March 3, 2023

WESPER: Zero-shot and Realtime Whisper to Normal Voice Conversion for Whisper-based Speech Interactions
Jun Rekimoto
Zero Shot Voice Conversion State of the Art Whisper Open Whisper Style Speech Model Whisper Encoder Speech Reconstruction Discrete Speech Unit

March 1, 2023

WhisperX: Time-Accurate Speech Transcription of Long-Form Audio
Max Bain, Jaesung Huh, Tengda Han, Andrew Zisserman
Speech Recognition Real Time State of the Art Whisper Long Form Audio

February 18, 2023

Speaker and Language Change Detection using Wav2vec2 and Whisper
Tijn Berns, Nik Vaessen, David A. van Leeuwen
Automatic Speech Recognition Transformer Network Speaker Recognition Speaker Identity State of the Art Whisper Wav2vec U Speech Driven Pre Trained Network Speaker Change Detection

January 24, 2023

WhisperWand: Simultaneous Voice and Gesture Tracking Interface
Yang Bai, Irtaza Shahid, Harshvardhan Takawale, Nirupam Roy
State of the Art Whisper Multi Speaker Voice Assistant Voice Authentication Optimal Beacon Training Gesture Classification System

October 26, 2022

There is more than one kind of robustness: Fooling Whisper with adversarial examples
Raphael Olivier, Bhiksha Raj
Native Robustness Automatic Speech Recognition Adversarial Example Speech Recognition Adversarial Noise State of the Art Whisper Open Whisper Style Speech Model Whisper Model

February 22, 2022

Hidden bawls, whispers, and yelps: can text be made to sound more than just its words?
Caluã de Lacerda Pataca, Paula Dornhofer Paro Costa
Speech Analysis Visual Representation Word List Prosodic Feature State of the Art Whisper Paralinguistic Feature Hidden Pattern Kinetic Typography