Streaming End to End
Streaming end-to-end (E2E) automatic speech recognition (ASR) aims to build real-time recognition systems that minimize latency without sacrificing accuracy. Current research emphasizes efficient model architectures such as decoder-only models and sequence transducers, often combined with chunk-based processing, connectionist temporal classification (CTC) loss, and multi-pass rescoring to improve both speed and accuracy. These advances matter for applications that require low-latency interaction, such as real-time transcription, voice assistants, and interactive dialogue systems.
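To make the idea of chunk-based processing with CTC-style output concrete, here is a minimal sketch in Python. It is not taken from any particular system: `chunk_frames`, `StreamingCTCGreedyDecoder`, and the blank index `0` are hypothetical names and assumptions chosen for illustration, and the per-frame score vectors stand in for whatever an acoustic model would produce. The sketch splits the frame sequence into fixed-size chunks and applies greedy CTC collapsing (drop blanks, merge repeated labels) while carrying the last frame's label across chunk boundaries, so the streamed result matches what decoding the full sequence at once would give.

```python
from typing import Iterator, List, Sequence

BLANK = 0  # assumed index of the CTC blank symbol (an illustrative choice)


def chunk_frames(
    frames: Sequence[Sequence[float]], chunk_size: int
) -> Iterator[Sequence[Sequence[float]]]:
    """Yield consecutive fixed-size chunks of per-frame score vectors."""
    for start in range(0, len(frames), chunk_size):
        yield frames[start:start + chunk_size]


class StreamingCTCGreedyDecoder:
    """Greedy CTC decoding that keeps state across chunk boundaries."""

    def __init__(self, blank: int = BLANK) -> None:
        self.blank = blank
        self.prev = blank            # best label of the most recent frame
        self.tokens: List[int] = []  # all tokens emitted so far

    def accept_chunk(self, chunk: Sequence[Sequence[float]]) -> List[int]:
        """Consume one chunk and return only the tokens it newly emits."""
        new_tokens: List[int] = []
        for scores in chunk:
            # Greedy choice: the highest-scoring label for this frame.
            best = max(range(len(scores)), key=scores.__getitem__)
            # CTC collapse: emit a non-blank label only when it differs
            # from the previous frame's label.
            if best != self.blank and best != self.prev:
                new_tokens.append(best)
            self.prev = best
        self.tokens.extend(new_tokens)
        return new_tokens
```

In use, a caller would feed each chunk to `accept_chunk` as soon as it arrives and display the returned tokens immediately. Latency is then bounded by the chunk size rather than by the utterance length, which is the trade-off that chunk-based streaming models tune.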