Non-Autoregressive ASR

Non-autoregressive (NAR) automatic speech recognition (ASR) aims to make speech-to-text conversion faster and more efficient by predicting all output tokens in parallel, whereas traditional autoregressive models generate tokens one at a time, each conditioned on the tokens before it. Current research focuses on improving the accuracy of NAR ASR through techniques such as incorporating lexical information, leveraging pre-trained models, and developing novel architectures, including folded encoders and contextual Paraformers, to address limitations in handling rare words and customizing hotwords. These advances could enable faster and more efficient speech processing in applications such as real-time transcription and personalized voice assistants.
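To make the parallel-prediction idea concrete, here is a minimal sketch of greedy decoding in the style of CTC, one common NAR formulation. CTC itself is not named in the summary above, so treat it as an illustrative assumption. The function name `nar_greedy_decode`, the toy vocabulary, and the per-frame scores are all hypothetical. A real system would get the scores from an acoustic encoder.

```python
from itertools import groupby

BLANK = 0  # assumed index of the CTC blank symbol (illustrative choice)


def nar_greedy_decode(frame_scores, vocab):
    """Pick the best label for every frame independently, then collapse."""
    # Each frame's argmax uses only that frame's scores, so no prediction
    # waits on an earlier output token; all frames could run in parallel.
    best = [max(range(len(s)), key=s.__getitem__) for s in frame_scores]
    # CTC-style collapse: merge consecutive repeats, then drop blanks.
    collapsed = [label for label, _ in groupby(best)]
    return "".join(vocab[i] for i in collapsed if i != BLANK)


# Toy example: 4 symbols (blank, c, a, t) and 6 audio frames.
vocab = ["_", "c", "a", "t"]
frame_scores = [
    [0.10, 0.70, 0.10, 0.10],  # c
    [0.10, 0.60, 0.20, 0.10],  # c (repeat, merged)
    [0.80, 0.10, 0.05, 0.05],  # blank
    [0.10, 0.10, 0.70, 0.10],  # a
    [0.10, 0.10, 0.10, 0.70],  # t
    [0.10, 0.10, 0.10, 0.70],  # t (repeat, merged)
]
print(nar_greedy_decode(frame_scores, vocab))  # prints "cat"
```

An autoregressive decoder would instead loop over output positions and feed each predicted token back in before predicting the next, which is why its latency grows with transcript length.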

Papers