Transformer Transducer
Transformer Transducers are neural sequence-to-sequence architectures used primarily to improve the speed and accuracy of automatic speech recognition (ASR). Current research emphasizes efficiency through lightweight models, optimized decoding algorithms (such as label-looping), and novel training strategies such as token-level loss functions and global normalization. These advances aim to raise accuracy and reduce latency in streaming ASR, benefiting both research benchmarks and practical applications such as real-time speech translation and keyword spotting.
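As background for the papers below: a transducer pairs an encoder over audio frames with a label predictor, and a joint network scores every (frame, label-history) combination over the vocabulary plus a blank symbol. The following is a minimal NumPy sketch of that joint computation, with all dimensions, weights, and random inputs purely illustrative (not from any of the listed papers):

```python
import numpy as np

rng = np.random.default_rng(0)

T, U, D, V = 6, 4, 8, 10  # audio frames, label steps, hidden dim, vocab size incl. blank

enc = rng.normal(size=(T, D))   # encoder (acoustic) outputs, one vector per frame
pred = rng.normal(size=(U, D))  # predictor outputs, one vector per label-history step
W = rng.normal(size=(D, V))     # hypothetical joint projection to vocab + blank logits

# Joint network: combine every frame with every label step, then project to logits.
hidden = np.tanh(enc[:, None, :] + pred[None, :, :])  # broadcasts to shape (T, U, D)
logits = hidden @ W                                   # shape (T, U, V)

# Softmax over the last axis gives, for each (t, u), a distribution
# over output labels and the blank symbol.
probs = np.exp(logits - logits.max(-1, keepdims=True))
probs /= probs.sum(-1, keepdims=True)

print(probs.shape)  # (6, 4, 10)
```

Training marginalizes over all blank/label alignment paths through this (T, U) lattice (the transducer loss), and decoding walks the lattice frame by frame, which is what makes the model naturally streamable.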
Papers
Augmenting Transformer-Transducer Based Speaker Change Detection With Token-Level Training Loss
Guanlong Zhao, Quan Wang, Han Lu, Yiling Huang, Ignacio Lopez Moreno
Exploring Sequence-to-Sequence Transformer-Transducer Models for Keyword Spotting
Beltrán Labrador, Guanlong Zhao, Ignacio López Moreno, Angelo Scorza Scarpati, Liam Fowl, Quan Wang