Connectionist Temporal Classification
Connectionist Temporal Classification (CTC) is a widely used technique in sequence modeling, particularly for automatic speech recognition (ASR). It efficiently aligns input sequences (e.g., audio spectrograms) with output sequences (e.g., text transcriptions) without requiring explicit frame-level alignment. Current research focuses on improving CTC's performance and efficiency through methods such as consistency regularization, speaker-aware training, and encoder prompting, often integrated with transformer architectures or hybrid CTC/attention models. These advances are improving the accuracy, speed, and robustness of ASR systems, particularly in challenging scenarios such as multi-talker speech, low-resource languages, and code-switching.
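To make the alignment-free idea concrete, the following is a minimal sketch of the standard CTC forward (dynamic-programming) pass in pure Python. It interleaves blanks into the target, then sums the probability of every frame-level path that collapses to the target labels. The function name `ctc_forward`, the toy probabilities, and the choice of blank index 0 are illustrative assumptions, not from any specific paper above.

```python
import math

BLANK = 0  # assumed blank index for this sketch

def ctc_forward(log_probs, target):
    """Sum log P(target | input) over all CTC alignments.

    log_probs: T x V nested list of per-frame log-probabilities.
    target: label sequence without blanks, e.g. [1, 2].
    """
    # Interleave blanks: [a, b] -> [blank, a, blank, b, blank]
    ext = [BLANK]
    for c in target:
        ext += [c, BLANK]
    S, T = len(ext), len(log_probs)
    NEG_INF = float("-inf")

    def logadd(a, b):  # numerically stable log(e^a + e^b)
        if a == NEG_INF:
            return b
        if b == NEG_INF:
            return a
        m = max(a, b)
        return m + math.log(math.exp(a - m) + math.exp(b - m))

    # alpha[s]: log-prob of all prefixes ending at extended symbol s
    alpha = [NEG_INF] * S
    alpha[0] = log_probs[0][ext[0]]
    if S > 1:
        alpha[1] = log_probs[0][ext[1]]
    for t in range(1, T):
        new = [NEG_INF] * S
        for s in range(S):
            a = alpha[s]                      # stay on the same symbol
            if s > 0:
                a = logadd(a, alpha[s - 1])   # advance by one
            # Skip a blank only between two distinct labels
            if s > 1 and ext[s] != BLANK and ext[s] != ext[s - 2]:
                a = logadd(a, alpha[s - 2])
            new[s] = a + log_probs[t][ext[s]]
        alpha = new
    # A valid path ends on the last label or the trailing blank
    return logadd(alpha[S - 1], alpha[S - 2] if S > 1 else NEG_INF)
```

For a two-frame toy input with blank/label probabilities [0.6, 0.4] and [0.3, 0.7], the paths collapsing to the single label 1 are (blank, 1), (1, blank), and (1, 1), so the total probability is 0.6·0.7 + 0.4·0.3 + 0.4·0.7 = 0.82, which the forward pass reproduces.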
Papers
HMM vs. CTC for Automatic Speech Recognition: Comparison Based on Full-Sum Training from Scratch
Tina Raissi, Wei Zhou, Simon Berger, Ralf Schlüter, Hermann Ney
Towards Personalization of CTC Speech Recognition Models with Contextual Adapters and Adaptive Boosting
Saket Dingliwal, Monica Sunkara, Sravan Bodapati, Srikanth Ronanki, Jeff Farris, Katrin Kirchhoff