CTC Based
Connectionist Temporal Classification (CTC) is a widely used training criterion for sequence modeling when the alignment between input frames and output labels is unknown. It is applied primarily in speech recognition and in related areas such as machine translation and text recognition. Current research focuses on improving CTC's accuracy and efficiency through consistency regularization, hybrid CTC/attention architectures, and the integration of pretrained language models or acoustic models. These advances target limitations such as latency, sensitivity to noise, and poor handling of unseen words, with the goal of more accurate and efficient systems for applications that include medical image analysis and cross-technology communication.
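For readers new to CTC, the output rule it relies on can be illustrated with a minimal greedy (best-path) decoder: pick the top label per frame, merge consecutive repeats, then drop the blank symbol. This is only a sketch and not the decoding method of any paper listed here. The function name `ctc_greedy_decode` and the choice of index 0 for the blank are assumptions made for illustration.

```python
from itertools import groupby

BLANK = 0  # assumption: the blank symbol uses index 0


def ctc_greedy_decode(frame_scores, blank=BLANK):
    """Collapse per-frame scores into a label sequence (best-path decoding).

    frame_scores: list of per-frame score lists, one score per label.
    """
    # Take the highest-scoring label at every frame.
    best_path = [
        max(range(len(frame)), key=frame.__getitem__) for frame in frame_scores
    ]
    # Merge consecutive repeats first, then remove blanks.
    collapsed = [label for label, _ in groupby(best_path)]
    return [label for label in collapsed if label != blank]


if __name__ == "__main__":
    scores = [
        [0.1, 0.8, 0.1],    # label 1
        [0.1, 0.8, 0.1],    # label 1 (repeat, merged)
        [0.9, 0.05, 0.05],  # blank separates the two 1s
        [0.1, 0.8, 0.1],    # label 1
        [0.2, 0.1, 0.7],    # label 2
    ]
    print(ctc_greedy_decode(scores))
```

The blank between the second and fourth frames is what lets the same label appear twice in a row in the output; without it, the repeated 1s would merge into one.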
Papers
CTC-GMM: CTC guided modality matching for fast and accurate streaming speech translation
Rui Zhao, Jinyu Li, Ruchao Fan, Matt Post
CR-CTC: Consistency regularization on CTC for improved speech recognition
Zengwei Yao, Wei Kang, Xiaoyu Yang, Fangjun Kuang, Liyong Guo, Han Zhu, Zengrui Jin, Zhaoqing Li, Long Lin, Daniel Povey