Token and Duration Transducer

Token-and-Duration Transducers (TDTs) are a sequence-to-sequence architecture that extends the conventional transducer (RNN-T) for tasks such as speech recognition and speech translation. For each emitted token, a TDT jointly predicts the token and its duration, that is, the number of input frames the token covers. At inference time the decoder uses the predicted duration to skip input frames instead of evaluating every frame, which is reported to make decoding faster than with conventional transducers while also improving accuracy. Follow-up research optimizes transducer inference with decoding algorithms such as label-looping, which iterates over labels rather than processing the input frame by frame and reports significant speedups, and applies TDTs to other tasks such as keyword spotting.
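To make the frame-skipping idea concrete, here is a minimal sketch of greedy TDT-style decoding. It is not the authors' implementation and it is not the label-looping algorithm. It assumes a hypothetical `joint(t, history)` callable that returns one score list over tokens and one over candidate durations. The duration set `(0, 1, 2, 3, 4)`, the rule that a blank must advance by at least one frame, and the `max_symbols_per_frame` guard are illustrative assumptions, and the scripted toy joint stands in for a trained model.

```python
from typing import Callable, Dict, List, Sequence, Tuple

BLANK = 0  # assumed index of the blank token

JointFn = Callable[[int, List[int]], Tuple[Sequence[float], Sequence[float]]]


def argmax(scores: Sequence[float]) -> int:
    """Index of the largest score (first one wins on ties)."""
    return max(range(len(scores)), key=scores.__getitem__)


def tdt_greedy_decode(
    num_frames: int,
    joint: JointFn,
    durations: Sequence[int] = (0, 1, 2, 3, 4),  # assumed duration set
    max_symbols_per_frame: int = 10,             # assumed safety guard
) -> Tuple[List[int], int]:
    """Greedy decoding that jumps ahead by the predicted duration.

    Returns the decoded token ids and the number of joint evaluations.
    """
    t = 0
    hyp: List[int] = []
    joint_calls = 0
    symbols_at_t = 0
    while t < num_frames:
        token_scores, duration_scores = joint(t, hyp)
        joint_calls += 1
        token = argmax(token_scores)
        skip = durations[argmax(duration_scores)]
        if token != BLANK:
            hyp.append(token)
        elif skip == 0:
            # Assumption: a blank that does not advance would loop forever.
            skip = 1
        if skip == 0:
            symbols_at_t += 1
            if symbols_at_t >= max_symbols_per_frame:
                skip = 1  # force progress after too many zero-duration tokens
        if skip > 0:
            symbols_at_t = 0
        t += skip  # frames strictly between t and t + skip are never evaluated
    return hyp, joint_calls


def make_scripted_joint(
    script: Dict[Tuple[int, int], Tuple[int, int]],
    vocab_size: int,
    durations: Sequence[int],
) -> JointFn:
    """Hypothetical stand-in for a trained joint network (one-hot scores)."""
    def joint(t: int, history: List[int]):
        token, dur = script.get((t, len(history)), (BLANK, 1))
        token_scores = [1.0 if i == token else 0.0 for i in range(vocab_size)]
        duration_scores = [1.0 if d == dur else 0.0 for d in durations]
        return token_scores, duration_scores
    return joint


# (frame, tokens emitted so far) -> (token, duration)
script = {
    (0, 0): (5, 3),      # emit 5, jump to frame 3
    (3, 1): (BLANK, 2),  # blank, jump to frame 5
    (5, 1): (7, 0),      # emit 7, stay on frame 5
    (5, 2): (BLANK, 4),  # blank, jump to frame 9
    (9, 2): (9, 1),      # emit 9, move past the last frame
}
toy_joint = make_scripted_joint(script, vocab_size=10, durations=(0, 1, 2, 3, 4))
hyp, joint_calls = tdt_greedy_decode(num_frames=10, joint=toy_joint)
print(hyp, joint_calls)
```

In this toy run the decoder evaluates the joint five times for ten input frames, whereas a frame-by-frame transducer loop would evaluate it at least once per frame.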

Papers

June 10, 2024

Label-Looping: Highly Efficient Decoding for Transducers
Vladimir Bataev, Hainan Xu, Daniel Galvez, Vitaly Lavrukhin, Boris Ginsburg
Finite State Transducer, Transformer Transducer, Sequence Transducer, CUR Decomposition, Efficient Decoding, Token and Duration Transducer

March 20, 2024

TDT-KWS: Fast And Accurate Keyword Spotting Using Token-and-duration Transducer
Yu Xi, Hao Li, Baochen Yang, Haoyu Li, Hainan Xu, Kai Yu
Keyword Spotting, Word Detection, Keyword Search, Token and Duration Transducer

April 13, 2023

Efficient Sequence Transduction by Jointly Predicting Tokens and Durations
Hainan Xu, Fei Jia, Somshubra Majumdar, He Huang, Shinji Watanabe, Boris Ginsburg
Speech Recognition, Speech Translation, Token Prediction, Sequence to Sequence Task, RNN Transducer, Long Duration, Sequence Transduction, Token and Duration Transducer
