Non-Streaming

Non-streaming automatic speech recognition (ASR) models process the entire audio input before generating a transcription. Because they can draw on both past and future acoustic context, they typically achieve higher accuracy than streaming models, which must emit output incrementally, in real time, while audio is still arriving. Current research focuses on narrowing this accuracy gap. Knowledge distillation transfers knowledge from a non-streaming (teacher) model to a streaming (student) model, while contextual biasing and contrastive learning are used to improve recognition accuracy further. The goal is real-time speech recognition that approaches non-streaming accuracy while keeping latency low, which matters for applications such as voice search, virtual assistants, and on-device speech processing.
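
To make the distillation idea concrete, here is a minimal, framework-free Python sketch of one common formulation: a frame-level KL-divergence loss between the temperature-softened output distributions of a non-streaming teacher and a streaming student. It assumes both models produce per-frame logits over the same vocabulary and that their frames are already aligned one-to-one. The names `log_softmax` and `frame_kd_loss` are hypothetical, and published methods may instead distill at the sequence level or on hidden representations.

```python
import math
from typing import List, Sequence


def log_softmax(logits: Sequence[float], temperature: float = 1.0) -> List[float]:
    """Temperature-scaled log-softmax for one frame (log-sum-exp for stability)."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    log_sum = m + math.log(sum(math.exp(x - m) for x in scaled))
    return [x - log_sum for x in scaled]


def frame_kd_loss(
    teacher_logits: Sequence[Sequence[float]],
    student_logits: Sequence[Sequence[float]],
    temperature: float = 2.0,
) -> float:
    """Mean per-frame KL(teacher || student), scaled by temperature**2.

    Hypothetical helper. Assumes teacher (non-streaming) and student
    (streaming) frames are aligned one-to-one over the same vocabulary.
    """
    if len(teacher_logits) != len(student_logits):
        raise ValueError("teacher and student must have the same number of frames")
    if not teacher_logits:
        raise ValueError("need at least one frame")
    total = 0.0
    for t_frame, s_frame in zip(teacher_logits, student_logits):
        if len(t_frame) != len(s_frame):
            raise ValueError("teacher and student must share a vocabulary size")
        log_p = log_softmax(t_frame, temperature)  # teacher soft targets
        log_q = log_softmax(s_frame, temperature)  # student predictions
        # KL(p || q) = sum_k p_k * (log p_k - log q_k)
        total += sum(math.exp(lp) * (lp - lq) for lp, lq in zip(log_p, log_q))
    # The T^2 factor is a common convention that keeps the gradient scale
    # roughly independent of the chosen temperature.
    return (temperature ** 2) * total / len(teacher_logits)


# Usage: two frames, three output classes (toy numbers, not real model outputs).
teacher = [[2.0, 0.5, -1.0], [0.1, 1.5, 0.3]]
student = [[1.0, 1.0, 0.0], [0.0, 0.5, 1.0]]
print(frame_kd_loss(teacher, teacher))  # identical outputs give zero loss
print(frame_kd_loss(teacher, student))  # mismatched outputs give a positive loss
```

In training, a term like this would typically be added to the student's usual ASR objective with a weighting factor. Choosing that weight, and handling the tendency of streaming models to emit tokens later than a full-context teacher, are design decisions outside this sketch.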

Papers