Automatic Speech Recognition Error Correction
Automatic speech recognition (ASR) error correction aims to improve the accuracy and readability of ASR transcripts by leveraging large language models (LLMs). Current research focuses on adapting LLMs to this task through techniques such as prompt engineering, constrained decoding over N-best lists or lattices, and multimodal approaches that incorporate visual cues (e.g., lip movements) or phonetic information. These advances matter because accurate transcripts are crucial for applications such as emotion recognition and clinical documentation, and for the performance of downstream tasks that rely on speech-to-text conversion.
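As a concrete illustration of the N-best-list prompting idea, the sketch below formats several recognizer hypotheses into a correction prompt for an LLM. The example hypotheses, the prompt wording, and the call_llm placeholder are illustrative assumptions, not taken from the papers listed here; the actual prompting and decoding strategies vary by paper.

```python
# Minimal sketch of prompting an LLM with an ASR N-best list for error
# correction. The hypotheses and call_llm() are hypothetical placeholders.

def build_correction_prompt(nbest):
    """Format ASR N-best hypotheses into a prompt asking an LLM to
    produce a single corrected transcript."""
    lines = [f"{i + 1}. {hyp}" for i, hyp in enumerate(nbest)]
    return (
        "The following are N-best hypotheses from a speech recognizer.\n"
        "They may contain recognition errors. Output the single most\n"
        "likely correct transcript, and nothing else.\n\n"
        + "\n".join(lines)
    )

def call_llm(prompt: str) -> str:
    """Hypothetical stand-in for an LLM API call; replace with your
    provider's or model's client code."""
    raise NotImplementedError

if __name__ == "__main__":
    # Toy N-best list with homophone confusions, for illustration only.
    nbest = [
        "i scream for ice cream",
        "eye scream for ice cream",
        "i scream four ice cream",
    ]
    prompt = build_correction_prompt(nbest)
    print(prompt)
    # corrected = call_llm(prompt)  # would return the LLM's corrected transcript
```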
Papers
HTEC: Human Transcription Error Correction
Hanbo Sun, Jian Gao, Xiaomin Wu, Anjie Fang, Cheng Cao, Zheng Du
A Multitask Training Approach to Enhance Whisper with Contextual Biasing and Open-Vocabulary Keyword Spotting
Yuang Li, Min Zhang, Chang Su, Yinglu Li, Xiaosong Qiao, Mengxin Ren, Miaomiao Ma, Daimeng Wei, Shimin Tao, Hao Yang