Connectionist Temporal Classification
Connectionist Temporal Classification (CTC) is a widely used technique in sequence modeling, particularly for automatic speech recognition (ASR), that efficiently aligns input sequences (e.g., audio spectrograms) with output sequences (e.g., text transcriptions) without requiring explicit frame-level alignment. Current research focuses on improving CTC's performance and efficiency through methods such as consistency regularization, speaker-aware training, and encoder prompting, often integrated with transformer architectures or hybrid CTC/attention models. These advances are improving the accuracy, speed, and robustness of ASR systems, particularly in challenging scenarios such as multi-talker speech, low-resource languages, and code-switching.
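The alignment-free property mentioned above comes from CTC's collapse rule: a per-frame label path is mapped to an output sequence by merging repeated labels and then removing a special blank symbol, so many different frame-level paths yield the same transcription. A minimal sketch of that rule (the vocabulary indices here are hypothetical, and blank is assumed to be index 0):

```python
from itertools import groupby

def ctc_collapse(frame_labels, blank=0):
    """Map a per-frame CTC label path to an output label sequence:
    first merge consecutive repeats, then drop blank symbols."""
    merged = [label for label, _ in groupby(frame_labels)]
    return [label for label in merged if label != blank]

# Two different frame-level paths (8 frames each) collapse to the
# same 3-label output, illustrating why CTC training can sum over
# all valid paths instead of requiring an explicit alignment.
path_a = [1, 1, 0, 2, 2, 0, 0, 2]
path_b = [0, 1, 2, 0, 2, 2, 0, 0]
print(ctc_collapse(path_a))  # → [1, 2, 2]
print(ctc_collapse(path_b))  # → [1, 2, 2]
```

In full CTC training, the loss marginalizes over every path that collapses to the reference transcription; this function only shows the path-to-output mapping that definition rests on.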
Papers
LAE-ST-MoE: Boosted Language-Aware Encoder Using Speech Translation Auxiliary Task for E2E Code-switching ASR
Guodong Ma, Wenxuan Wang, Yuke Li, Yuting Yang, Binbin Du, Haoran Fu
Hierarchical Cross-Modality Knowledge Transfer with Sinkhorn Attention for CTC-based ASR
Xugang Lu, Peng Shen, Yu Tsao, Hisashi Kawai
Bridging the Gaps of Both Modality and Language: Synchronous Bilingual CTC for Speech Translation and Speech Recognition
Chen Xu, Xiaoqian Liu, Erfeng He, Yuhao Zhang, Qianqian Dong, Tong Xiao, Jingbo Zhu, Dapeng Man, Wu Yang
Variational Connectionist Temporal Classification for Order-Preserving Sequence Modeling
Zheng Nan, Ting Dang, Vidhyasaharan Sethu, Beena Ahmed