Recurrent Neural Network Transducer

Recurrent Neural Network Transducers (RNN-Ts) are a prominent architecture for end-to-end automatic speech recognition (ASR), particularly in streaming applications where transcripts must be produced as audio arrives. Current research focuses on optimizing RNN-T models under practical constraints, including low latency, a reduced memory footprint (through techniques such as binarization and knowledge distillation), and robustness to noisy data and diverse acoustic conditions. These advances matter for deploying accurate, efficient ASR on resource-limited devices and for improving speech applications such as voice assistants and conversational AI.

Papers

June 30, 2022

Sub-8-Bit Quantization Aware Training for 8-Bit Neural Network Accelerator with On-Device Speech Recognition
Kai Zhen, Hieu Duy Nguyen, Raviteja Chinta, Nathan Susanj, Athanasios Mouchtaris, Tariq Afzal, Ariya Rastrow

May 10, 2022

Separator-Transducer-Segmenter: Streaming Recognition and Segmentation of Multi-party Speech
Ilya Sklyar, Anna Piunova, Christian Osendorfer

March 29, 2022

Improving Generalization of Deep Neural Network Acoustic Models with Length Perturbation and N-best Based Label Smoothing
Xiaodong Cui, George Saon, Tohru Nagano, Masayuki Suzuki, Takashi Fukuda, Brian Kingsbury, Gakuto Kurata

January 25, 2022

Improving the fusion of acoustic and text representations in RNN-T
Chao Zhang, Bo Li, Zhiyun Lu, Tara N. Sainath, Shuo-yiin Chang

January 10, 2022

A Likelihood Ratio based Domain Adaptation Method for E2E Models
Chhavi Choudhury, Ankur Gandhe, Xiaohan Ding, Ivan Bulyko

November 19, 2021

A comparison of streaming models and data augmentation methods for robust speech recognition
Jiyeon Kim, Mehul Kumar, Dhananjaya Gowda, Abhinav Garg, Chanwoo Kim
