Transformer Transducer

Transformer Transducers are neural network architectures for sequence-to-sequence tasks, used primarily in automatic speech recognition (ASR). Current research focuses on efficiency: lightweight models, optimized decoding algorithms such as label-looping, and training strategies such as token-level loss functions and global normalization. These advances aim to raise accuracy and cut latency in streaming ASR, which matters both for research benchmarks and for practical applications such as real-time speech translation and keyword spotting.
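To make the decoding side concrete, here is a minimal sketch of the standard frame-by-frame greedy search used with transducer models. It is not the label-looping algorithm mentioned above, and it does not come from any particular paper or toolkit. The function names, the blank index of 0, the `max_symbols_per_frame` cap, and the toy joint function are all assumptions made for illustration; a real model would compute the joint scores from encoder and prediction-network outputs.

```python
from typing import Callable, List, Sequence

BLANK = 0  # assumed index of the blank symbol


def greedy_transducer_decode(
    encoder_frames: Sequence[Sequence[float]],
    joint: Callable[[Sequence[float], List[int]], List[float]],
    max_symbols_per_frame: int = 5,
) -> List[int]:
    """Greedy transducer search: for each frame, emit the best label
    repeatedly until blank wins or the per-frame cap is reached."""
    hypothesis: List[int] = []
    for frame in encoder_frames:
        emitted = 0
        while emitted < max_symbols_per_frame:
            # Score every label given the current frame and label history.
            scores = joint(frame, hypothesis)
            best = max(range(len(scores)), key=scores.__getitem__)
            if best == BLANK:
                break  # blank means "advance to the next frame"
            hypothesis.append(best)
            emitted += 1
    return hypothesis


def toy_joint(frame: Sequence[float], history: List[int]) -> List[float]:
    """Hypothetical stand-in for a joint network: uses the frame as the
    score vector and suppresses the most recently emitted label."""
    scores = list(frame)
    if history:
        scores[history[-1]] = float("-inf")
    return scores


# Four toy frames over a three-symbol vocabulary (index 0 is blank).
frames = [
    [0.1, 0.8, 0.1],
    [0.2, 0.7, 0.1],
    [0.3, 0.1, 0.6],
    [0.9, 0.05, 0.05],
]

print(greedy_transducer_decode(frames, toy_joint))  # prints [1, 2]
```

The cap on symbols per frame is a common safeguard against a joint function that never predicts blank; without it the inner loop could run indefinitely.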

Papers

January 14, 2022

A Study of Transducer based End-to-End ASR with ESPnet: Architecture, Auxiliary Loss and Decoding Strategies
Florian Boyer, Yusuke Shinohara, Takaaki Ishii, Hirofumi Inaguma, Shinji Watanabe
End-to-End, Multi-Task Learning, LibriSpeech, Speech Recognition, Decoding Method, Auxiliary Loss, Transformer Transducer, RNN-T Loss, ESPnet-ST
