Non-Autoregressive End-to-End Speech Recognition

Non-autoregressive (NAR) end-to-end speech recognition aims to build faster, more efficient speech recognition systems by abandoning the sequential, token-by-token generation of autoregressive models. Current research focuses on improving the accuracy of NAR models, particularly through architectures such as Paraformer, which use a continuous integrate-and-fire (CIF) predictor to estimate the number of output tokens and a glancing language model to capture dependencies among them, mitigating the conditional-independence assumption inherent in parallel decoding. This work is significant because it promises to drastically reduce the inference latency of speech recognition systems, making them more suitable for real-time applications and resource-constrained environments while maintaining competitive accuracy.
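As a rough illustration of the integrate-and-fire idea behind Paraformer-style token prediction, the sketch below accumulates per-frame weights and emits one token-level acoustic embedding each time the running sum crosses a threshold, yielding all token positions in one pass rather than token by token. The function name `cif`, the threshold of 1.0, and the random inputs are illustrative assumptions, not code from any published implementation.

```python
# Minimal sketch of a continuous integrate-and-fire (CIF) step, assuming
# per-frame weights (e.g. sigmoid outputs) and encoder features are given.
import numpy as np

def cif(encoder_out: np.ndarray, alphas: np.ndarray, threshold: float = 1.0):
    """Integrate per-frame weights and emit one embedding per firing.

    encoder_out: (T, D) acoustic encoder outputs
    alphas:      (T,)   non-negative per-frame weights
    Returns a (U, D) array with one vector per predicted token.
    """
    fired = []
    accum = 0.0                       # weight accumulated since the last firing
    integrated = np.zeros(encoder_out.shape[1])
    for h, a in zip(encoder_out, alphas):
        if accum + a < threshold:     # keep integrating this frame
            accum += a
            integrated += a * h
        else:                         # firing: split this frame's weight
            used = threshold - accum
            fired.append(integrated + used * h)
            accum = a - used          # remainder starts the next token
            integrated = accum * h
    return np.stack(fired) if fired else np.zeros((0, encoder_out.shape[1]))

# Toy usage: 10 frames of 4-dim features with weights that fire ~3 times,
# so roughly 3 token embeddings are produced and can be decoded in parallel.
rng = np.random.default_rng(0)
enc = rng.normal(size=(10, 4))
alphas = np.full(10, 0.35)
print(cif(enc, alphas).shape)         # (3, 4)
```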

Papers