Non-Autoregressive End-to-End
Non-autoregressive (NAR) end-to-end speech recognition aims to build faster and more efficient speech recognition systems by abandoning the sequential, token-by-token generation of autoregressive models. Current research focuses on closing the accuracy gap of NAR models, particularly through architectures such as Paraformer, which use a continuous integrate-and-fire (CIF) predictor and a glancing language model to address the inherent limitations of parallel decoding. This line of work matters because it can drastically reduce the inference time of speech recognition systems, making them more suitable for real-time applications and resource-constrained environments while maintaining competitive accuracy.
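To make the parallel-decoding idea concrete, below is a minimal NumPy sketch of a continuous integrate-and-fire (CIF) style mechanism: per-frame weights are accumulated until they cross a threshold, at which point a token-level vector "fires". The number of fired vectors approximates the output length, which is what lets the decoder predict all tokens in parallel. The function name `cif`, the variable names, and the threshold value are illustrative assumptions, not the FunASR or Paraformer implementation.

```python
# Minimal sketch of a CIF-style firing mechanism (illustrative, not FunASR code).
import numpy as np

def cif(encoder_out: np.ndarray, alphas: np.ndarray, threshold: float = 1.0) -> np.ndarray:
    """Integrate per-frame weights; fire a token-level vector each time they cross `threshold`.

    encoder_out: (T, D) frame-level acoustic representations
    alphas:      (T,)   non-negative per-frame weights (their sum ~ number of output tokens)
    Returns an (N, D) array of fired vectors that a parallel decoder can consume at once.
    """
    fired = []
    accum = 0.0                                # weight accumulated since the last firing
    state = np.zeros(encoder_out.shape[1])     # weighted sum of frames since the last firing
    for h_t, a_t in zip(encoder_out, alphas):
        if accum + a_t < threshold:
            # keep integrating this frame into the current token
            accum += a_t
            state += a_t * h_t
        else:
            # split the frame: one part completes the current token, the rest starts the next
            used = threshold - accum
            fired.append(state + used * h_t)
            accum = a_t - used
            state = accum * h_t
    # Note: any residual weight below the threshold is dropped in this simplified sketch;
    # practical implementations handle the tail (e.g., fire it if large enough).
    return np.stack(fired) if fired else np.empty((0, encoder_out.shape[1]))

# Example: 20 frames of 4-dim features with weights summing to ~3.2 fire ~3 token vectors,
# all of which can then be decoded in parallel rather than token by token.
rng = np.random.default_rng(0)
enc = rng.normal(size=(20, 4))
alp = np.full(20, 3.2 / 20)
print(cif(enc, alp).shape)  # (3, 4)
```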
Papers
FunASR: A Fundamental End-to-End Speech Recognition Toolkit
Zhifu Gao, Zerui Li, Jiaming Wang, Haoneng Luo, Xian Shi, Mengzhe Chen, Yabin Li, Lingyun Zuo, Zhihao Du, Zhangyu Xiao, Shiliang Zhang
Accurate and Reliable Confidence Estimation Based on Non-Autoregressive End-to-End Speech Recognition System
Xian Shi, Haoneng Luo, Zhifu Gao, Shiliang Zhang, Zhijie Yan