Generative Error Correction

Generative error correction (GEC) uses large language models (LLMs) to improve the accuracy of automatic speech recognition (ASR) systems by refining their initial transcriptions. Current research focuses on enhancing GEC performance through techniques such as multi-pass processing of ASR hypotheses, incorporating multimodal information (e.g., lip movements, acoustic features), and optimizing prompt engineering for LLMs. The aim is to make ASR more robust and accurate across diverse languages and challenging acoustic conditions, which in turn supports more effective speech-based interfaces and applications.
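
As a rough illustration of the basic idea, the sketch below formats a list of ASR hypotheses into a correction prompt and passes it to a language model. It is a minimal sketch under stated assumptions, not any particular paper's method: the function names `build_gec_prompt` and `correct_transcript`, the prompt wording, and the `llm` callable (any function that maps a prompt string to a completion string) are all hypothetical.

```python
from typing import Callable, List


def build_gec_prompt(hypotheses: List[str]) -> str:
    """Format a list of ASR hypotheses as a correction prompt (illustrative wording)."""
    if not hypotheses:
        raise ValueError("at least one ASR hypothesis is required")
    # Number the hypotheses so the model can compare them line by line.
    numbered = "\n".join(
        f"{i}. {h.strip()}" for i, h in enumerate(hypotheses, start=1)
    )
    return (
        "The following are candidate transcriptions of the same utterance "
        "from a speech recognizer, best first.\n"
        f"{numbered}\n"
        "Write the single most likely correct transcription."
    )


def correct_transcript(hypotheses: List[str], llm: Callable[[str], str]) -> str:
    """Ask a caller-supplied model (hypothetical `llm`) to refine the hypotheses."""
    prompt = build_gec_prompt(hypotheses)
    answer = llm(prompt).strip()
    # Fall back to the top ASR hypothesis if the model returns nothing.
    return answer or hypotheses[0].strip()


if __name__ == "__main__":
    # Stand-in for a real model call, used only to show the data flow.
    def fake_llm(prompt: str) -> str:
        return "I scream for ice cream"

    n_best = ["eye scream for ice cream", "i scream for ice cream"]
    print(correct_transcript(n_best, fake_llm))
```

Keeping the model behind a plain callable leaves the prompt construction testable on its own and lets any LLM client be swapped in. Multi-pass or multimodal variants would change what goes into the prompt rather than this overall flow.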

Papers