Non-Streaming Automatic Speech Recognition

Non-streaming automatic speech recognition (ASR) transcribes a recording only after the entire utterance is available. Because the model can use both past and future acoustic context, it gives up real-time output in exchange for higher accuracy. Current research focuses on architectures such as Transformers and transducers. These are often combined with self-supervised pre-training, knowledge distillation between streaming and non-streaming models, and new attention mechanisms that improve accuracy and efficiency. A recurring goal is to narrow the accuracy gap between streaming and non-streaming ASR, which yields more robust transcription for applications such as voice assistants and transcription services.
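For attention-based models, "the entire utterance is available" usually shows up in the self-attention mask. The sketch below contrasts two masks. The first is a full-context mask, of the kind a non-streaming encoder could use. The second is a chunk-based mask, similar to those used by some streaming encoders, which blocks attention to later chunks. The function names and the chunking scheme are illustrative assumptions and are not taken from any particular system. Masks are plain boolean lists in which `True` means "frame i may attend to frame j". Note that some libraries use the opposite convention.

```python
from typing import List

# Hypothetical type alias: mask[i][j] is True if frame i may attend to frame j.
Mask = List[List[bool]]


def full_context_mask(num_frames: int) -> Mask:
    """Non-streaming: every frame may attend to every frame, including
    future ones, because the whole utterance is already available."""
    return [[True] * num_frames for _ in range(num_frames)]


def chunked_streaming_mask(num_frames: int, chunk_size: int) -> Mask:
    """Streaming (chunk-based, illustrative): frame i may attend to frames
    in its own chunk and in earlier chunks, never to later chunks."""
    if chunk_size < 1:
        raise ValueError("chunk_size must be >= 1")
    mask: Mask = []
    for i in range(num_frames):
        # Exclusive end index of the chunk that contains frame i.
        chunk_end = (i // chunk_size + 1) * chunk_size
        mask.append([j < chunk_end for j in range(num_frames)])
    return mask


def future_frames_visible(mask: Mask, i: int) -> int:
    """Count how many frames after position i are visible from frame i."""
    return sum(mask[i][i + 1:])


if __name__ == "__main__":
    T = 6
    full = full_context_mask(T)
    stream = chunked_streaming_mask(T, chunk_size=2)
    for i in range(T):
        # Frame index, future context under full attention, under chunked attention.
        print(i, future_frames_visible(full, i), future_frames_visible(stream, i))
```

Under the full-context mask, early frames can see the rest of the utterance. Under the chunked mask, a frame sees at most the remainder of its own chunk. With `chunk_size=1` the chunked mask reduces to a strictly causal mask. This limited lookahead is one source of the accuracy gap between streaming and non-streaming ASR.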

Papers