Automatic Speech Recognition Hypothesis

Automatic speech recognition (ASR) hypothesis research aims to make speech-to-text transcription more accurate and robust, chiefly by reducing errors on infrequent words and noisy audio. Here a hypothesis is a candidate transcription produced by the recognizer, and an N-best list is the set of its N highest-scoring candidates. Current efforts use large language models (LLMs) to rescore N-best ASR hypotheses, to correct errors with retrieval-augmented generation or conservative data filtering, and to improve confidence estimation. These advances matter because they make ASR more reliable in downstream applications such as voice assistants, speech emotion recognition, and spoken language understanding, which in turn supports more natural and effective human-computer interaction.
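
As a rough illustration of N-best rescoring, the sketch below re-ranks candidate transcriptions by adding a weighted language-model score to the recognizer's own score. It is a minimal example under stated assumptions, not the method of any particular paper: the names `Hypothesis`, `rescore_nbest`, `toy_lm_logprob`, and the `lm_weight` parameter are hypothetical, and the tiny unigram table merely stands in for an LLM that would return a log-probability for each hypothesis.

```python
import math
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Hypothesis:
    text: str
    asr_logprob: float  # log-probability from the ASR decoder (assumed available)


def rescore_nbest(
    nbest: List[Hypothesis],
    lm_logprob: Callable[[str], float],
    lm_weight: float = 0.5,
) -> List[Hypothesis]:
    """Sort hypotheses best-first by asr_logprob + lm_weight * lm_logprob(text)."""
    def combined(h: Hypothesis) -> float:
        return h.asr_logprob + lm_weight * lm_logprob(h.text)

    return sorted(nbest, key=combined, reverse=True)


# Toy stand-in for an LLM scorer: a hand-made unigram table (illustrative values only).
TOY_UNIGRAM_LOGPROB = {
    "recognize": math.log(0.05),
    "speech": math.log(0.05),
    "wreck": math.log(0.001),
    "a": math.log(0.1),
    "nice": math.log(0.01),
    "beach": math.log(0.005),
}
UNKNOWN_LOGPROB = math.log(1e-6)  # fallback for words missing from the table


def toy_lm_logprob(text: str) -> float:
    """Sum per-word log-probabilities; a real system would query an LLM instead."""
    return sum(TOY_UNIGRAM_LOGPROB.get(w, UNKNOWN_LOGPROB) for w in text.lower().split())


# Hypothetical 2-best list in which the decoder slightly prefers the wrong transcript.
nbest = [
    Hypothesis("wreck a nice beach", asr_logprob=-4.0),
    Hypothesis("recognize speech", asr_logprob=-4.5),
]

reranked = rescore_nbest(nbest, toy_lm_logprob, lm_weight=0.5)
print(reranked[0].text)
```

With `lm_weight=0.0` the ranking falls back to the decoder's original order, so the weight controls how much the language model can override the recognizer. A summed log-probability tends to favor shorter hypotheses, so practical systems typically also apply some form of length normalization or a word-insertion bonus, which this sketch omits.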

Papers