RNN Transducer
Recurrent Neural Network Transducers (RNN-Ts) are end-to-end sequence-to-sequence models, used primarily in automatic speech recognition (ASR) for their accuracy and streaming capability. Current research focuses on making RNN-Ts faster and more accurate through improved decoding algorithms (such as greedy decoding on GPUs), architectural modifications (such as duration prediction or multi-blank symbols), and new training methods (including alignment-free training and lattice-free discriminative training). These advances matter because they reduce computational cost and latency, making RNN-Ts more practical for real-time and resource-constrained applications, while also improving accuracy and robustness in challenging scenarios such as multi-talker speech.
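An RNN-T combines an acoustic encoder, an autoregressive prediction network over previously emitted labels, and a joiner that scores the next output symbol, with a special blank symbol used to advance through the audio frames. The sketch below illustrates the frame-synchronous greedy decoding mentioned above; it is a minimal, illustrative PyTorch example, and the class name, layer sizes, and the greedy_decode loop are assumptions for exposition rather than any particular toolkit's API.

```python
# Minimal RNN-T greedy decoding sketch (illustrative; not a production model).
import torch
import torch.nn as nn

BLANK = 0  # conventional blank symbol index


class RNNT(nn.Module):
    def __init__(self, feat_dim=80, vocab_size=32, hidden=256):
        super().__init__()
        # Encoder: consumes acoustic frames, one output vector per time step.
        self.encoder = nn.LSTM(feat_dim, hidden, batch_first=True)
        # Prediction network: consumes previously emitted labels (autoregressive).
        self.embed = nn.Embedding(vocab_size, hidden)
        self.predictor = nn.LSTM(hidden, hidden, batch_first=True)
        # Joiner: combines encoder and predictor states into vocabulary logits
        # (the vocabulary includes the blank symbol).
        self.joiner = nn.Sequential(nn.Tanh(), nn.Linear(hidden, vocab_size))

    def joint(self, enc_t, pred_u):
        # enc_t, pred_u: (hidden,) vectors for one (time, label) lattice position.
        return self.joiner(enc_t + pred_u)

    @torch.no_grad()
    def greedy_decode(self, feats, max_symbols_per_step=3):
        """Frame-synchronous greedy search for a single utterance.

        feats: (T, feat_dim) acoustic features.
        Returns a list of emitted label ids (blank never appears in the output).
        """
        enc_out, _ = self.encoder(feats.unsqueeze(0))  # (1, T, hidden)
        hyp = []
        pred_state = None
        # Start the predictor from the blank token acting as a start symbol.
        pred_in = torch.tensor([[BLANK]])
        pred_out, pred_state = self.predictor(self.embed(pred_in), pred_state)

        for t in range(enc_out.size(1)):
            emitted = 0
            while emitted < max_symbols_per_step:
                logits = self.joint(enc_out[0, t], pred_out[0, -1])
                token = int(logits.argmax())
                if token == BLANK:
                    break  # blank: move on to the next acoustic frame
                hyp.append(token)  # non-blank: emit it and update the predictor
                pred_in = torch.tensor([[token]])
                pred_out, pred_state = self.predictor(self.embed(pred_in), pred_state)
                emitted += 1
        return hyp


# Usage: decode 100 random frames with an untrained model (output is arbitrary).
model = RNNT()
print(model.greedy_decode(torch.randn(100, 80)))
```

Because only the prediction network depends on previously emitted labels, the encoder can run incrementally as audio arrives, which is what makes this decoding loop compatible with streaming; the max_symbols_per_step cap is a common safeguard against emitting unboundedly many labels on a single frame.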