Word Error Rate

Word Error Rate (WER) quantifies the accuracy of automatic speech recognition (ASR) systems by counting the word-level substitutions, deletions, and insertions that separate a machine-generated transcript from a human-created reference, divided by the number of words in the reference. Current research focuses on reducing WER across diverse languages and speech styles, using techniques such as contextualization with large language models, alignment algorithms that go beyond Levenshtein distance to support more granular error analysis, and model architectures such as Conformers and Mixture-of-Experts that target both accuracy and efficiency. Lower WER matters for ASR applications ranging from voice assistants and transcription services to accessibility technologies and multilingual communication.
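
As a rough illustration of the metric itself, the sketch below computes WER with a standard word-level Levenshtein alignment. It is a minimal example under stated assumptions, not the scoring procedure of any particular toolkit or paper: the function name `wer` is hypothetical, tokenization is plain whitespace splitting, and no text normalization (casing, punctuation, number formatting) is applied, although real evaluations usually normalize both transcripts first.

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: (substitutions + deletions + insertions) / reference length.

    Assumption: words are separated by whitespace and compared exactly.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words processed
    # so far (excluding the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion: reference word missing from hypothesis
                curr[j - 1] + 1,     # insertion: extra word in hypothesis
                prev[j - 1] + cost,  # match (cost 0) or substitution (cost 1)
            )
        prev = curr

    return prev[-1] / len(ref)


if __name__ == "__main__":
    # One substitution (sat -> sit) and one deletion (the) over 6 reference words.
    print(wer("the cat sat on the mat", "the cat sit on mat"))
```

Because the denominator is the reference length, a hypothesis with many inserted words can score above 1.0, so WER is not strictly a percentage of words that are wrong.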

Papers