Automatic Speech Recognition Error

Automatic Speech Recognition (ASR) errors are mistakes made when transcribing spoken language to text, such as substituted, deleted, or inserted words, and they degrade the performance of downstream natural language processing tasks that consume the transcript. Current research mitigates these errors in three main ways: incorporating ASR confidence scores and phoneme sequences into models as additional inputs; detecting and correcting errors with sequence-to-sequence models and large language models (LLMs); and using multimodal fusion to draw on audio information alongside the text. Addressing ASR errors is crucial for the reliability and accuracy of applications such as voice assistants, medical transcription, and spoken language understanding systems.
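To make the idea of using ASR confidence scores for error detection concrete, here is a minimal sketch in Python. It is an illustration under stated assumptions, not a method taken from any particular paper: the `AsrToken` type, the per-word confidence values in [0, 1], the 0.6 threshold, and the `[MASK]` placeholder are all hypothetical choices. The sketch flags low-confidence words as likely errors and masks them so that a downstream correction model (for example a sequence-to-sequence model or an LLM) could fill them in.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class AsrToken:
    """One recognized word plus its confidence (hypothetical structure)."""
    text: str
    confidence: float  # assumed to lie in [0, 1]


def flag_low_confidence(tokens: List[AsrToken],
                        threshold: float = 0.6) -> List[Tuple[str, bool]]:
    """Pair each word with True if its confidence is below the threshold."""
    return [(t.text, t.confidence < threshold) for t in tokens]


def mask_suspect_tokens(tokens: List[AsrToken],
                        threshold: float = 0.6,
                        mask: str = "[MASK]") -> str:
    """Replace low-confidence words with a mask token for later correction."""
    return " ".join(mask if t.confidence < threshold else t.text
                    for t in tokens)


# Made-up hypothesis: "flight" was misrecognized as "fight" with low confidence.
hypothesis = [
    AsrToken("book", 0.95),
    AsrToken("a", 0.91),
    AsrToken("fight", 0.42),
    AsrToken("to", 0.88),
    AsrToken("boston", 0.93),
]

print(mask_suspect_tokens(hypothesis))  # prints "book a [MASK] to boston"
```

A fixed threshold is the simplest possible detector. In practice the threshold would need tuning on held-out data, since confidence scores from different ASR systems are not necessarily calibrated in the same way.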

Papers