Speech Recognition Accuracy

Automatic speech recognition (ASR) converts spoken language into text, a task that underpins a wide range of applications. Current research focuses on improving accuracy in challenging scenarios such as low-resource languages, accented speech, and noisy environments. Common approaches include retrieval-augmented generation and context-aware modeling, applied within transformer-based models (e.g., Conformers) and large language models (LLMs). These advances make speech technologies more accessible and reliable across diverse populations and applications, including healthcare, assistive technologies, and human-computer interaction.

Papers