RNN Transducers (RNN-T)
Recurrent Neural Network Transducers (RNN-Ts) are a prominent architecture for end-to-end speech recognition, prized for their accuracy and efficiency in streaming applications. Current research focuses on optimizing RNN-T models through efficient decoding algorithms (e.g., greedy decoding and beam search), model compression (e.g., weight binarization), and on challenges such as imperfect training data and robustness to adversarial attacks. These advances improve the speed, memory footprint, and accuracy of automatic speech recognition systems, benefiting both ASR research and real-world applications such as voice assistants and voice search.
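To make the decoding discussion concrete, below is a minimal sketch of RNN-T greedy decoding. The `joiner`, `predictor`, and the toy components at the bottom are illustrative stand-ins, not a real trained model; the control flow (emit non-blank labels until the joiner outputs blank, then advance to the next encoder frame) is the part that reflects the actual algorithm.

```python
import numpy as np

BLANK = 0  # conventional index of the RNN-T blank symbol

def greedy_decode(enc_frames, joiner, predictor, max_symbols=10):
    """RNN-T greedy decoding sketch.

    For each encoder frame, repeatedly query the joiner; emit the argmax
    label and update the predictor state while it is non-blank, and move
    to the next frame as soon as blank wins (capped at `max_symbols`
    emissions per frame to guarantee termination).
    """
    hyp = []                       # decoded label sequence
    state = predictor(None, None)  # initial predictor state
    for frame in enc_frames:
        for _ in range(max_symbols):
            logits = joiner(frame, state)
            k = int(np.argmax(logits))
            if k == BLANK:              # blank: advance to next frame
                break
            hyp.append(k)               # non-blank: emit label
            state = predictor(k, state) # and condition the predictor on it
    return hyp

# Toy stand-ins (hypothetical, for demonstration only): the predictor
# state is just the last emitted label, and the joiner votes for the
# frame's label unless it was just emitted, in which case it votes blank.
def toy_predictor(label, state):
    return BLANK if label is None else label

def toy_joiner(frame, state):
    logits = np.zeros(3)            # vocabulary: blank + labels {1, 2}
    if state == frame:
        logits[BLANK] = 1.0
    else:
        logits[frame] = 1.0
    return logits

print(greedy_decode([1, 2, 2], toy_joiner, toy_predictor))  # → [1, 2]
```

Note how the inner loop lets the model emit several labels per frame, which is what distinguishes RNN-T decoding from frame-synchronous CTC greedy decoding (one label per frame).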
Papers
CTA-RNN: Channel and Temporal-wise Attention RNN Leveraging Pre-trained ASR Embeddings for Speech Emotion Recognition
Chengxin Chen, Pengyuan Zhang
An Empirical Study of Language Model Integration for Transducer based Speech Recognition
Huahuan Zheng, Keyu An, Zhijian Ou, Chen Huang, Ke Ding, Guanglu Wan