Speech Recognition Corpus

Speech recognition corpora are collections of transcribed speech used to train and evaluate automatic speech recognition (ASR) systems. Current research focuses on building larger, more diverse corpora with richer annotations (e.g., punctuation, speaker demographics) and on addressing challenges such as long-form speech and cross-lingual transfer. Popular model architectures include transformer-based decoders, attention-based encoder-decoders, and transducer models, with growing emphasis on efficient training and robust performance across diverse speech styles and languages. These advances are key to improving the accuracy and robustness of ASR systems in applications ranging from virtual assistants to accessibility technologies.

Papers