Multilingual Automatic Speech Recognition
Multilingual Automatic Speech Recognition (MASR) aims to build systems that accurately transcribe speech across many languages, overcoming the limitations of monolingual models. Current research focuses on improving accuracy, particularly for low-resource languages, through techniques such as weighted cross-entropy losses, N-best re-ranking, and efficient adapter modules built on architectures and models like the Conformer and Whisper. These advances are crucial for bridging language barriers in applications ranging from healthcare to global communication, and are driving progress in both the theoretical understanding and the practical deployment of speech technology.
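One of the techniques mentioned above, a weighted cross-entropy loss, can be sketched in a few lines: each token's loss is scaled by a per-language weight so that low-resource languages contribute more to training. This is a minimal illustrative sketch, not any particular paper's formulation; the function name, the example weights, and the toy distributions are all assumptions.

```python
import math

def weighted_cross_entropy(probs, targets, lang_ids, lang_weights):
    """Weighted-average per-token cross-entropy.

    probs        -- list of probability distributions over the vocabulary
    targets      -- list of gold token indices
    lang_ids     -- language id for each token
    lang_weights -- mapping from language id to loss weight (illustrative)
    """
    total, norm = 0.0, 0.0
    for p, t, lang in zip(probs, targets, lang_ids):
        w = lang_weights[lang]
        total += -w * math.log(p[t])  # weighted negative log-likelihood
        norm += w
    return total / norm  # normalize by total weight, not token count

# Toy usage: two high-resource ("en") tokens at weight 1.0 and one
# low-resource ("lr") token upweighted to 4.0 (e.g. inversely
# proportional to its share of the training data -- an assumption).
probs = [[0.7, 0.2, 0.1], [0.1, 0.8, 0.1], [0.3, 0.3, 0.4]]
targets = [0, 1, 2]
lang_ids = ["en", "en", "lr"]
weights = {"en": 1.0, "lr": 4.0}
loss = weighted_cross_entropy(probs, targets, lang_ids, weights)
```

With all weights equal this reduces to the standard mean cross-entropy; raising a language's weight shifts gradient mass toward its tokens.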
Papers
ML-SUPERB 2.0: Benchmarking Multilingual Speech Models Across Modeling Constraints, Languages, and Datasets
Jiatong Shi, Shih-Heng Wang, William Chen, Martijn Bartelds, Vanya Bannihatti Kumar, Jinchuan Tian, Xuankai Chang, Dan Jurafsky, Karen Livescu, Hung-yi Lee, Shinji Watanabe
Dual-Pipeline with Low-Rank Adaptation for New Language Integration in Multilingual ASR
Yerbolat Khassanov, Zhipeng Chen, Tianfeng Chen, Tze Yuang Chong, Wei Li, Jun Zhang, Lu Lu, Yuxuan Wang
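The second paper's title names low-rank adaptation (LoRA). As a generic illustration of that technique only (not the paper's dual-pipeline method), the sketch below augments a frozen weight matrix W with a trainable low-rank update B @ A, so adding a new language trains only r*(d_in + d_out) parameters; all shapes and names here are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
d_in, d_out, r = 8, 8, 2  # r << d_in: the low-rank bottleneck

W = rng.standard_normal((d_out, d_in))      # frozen pretrained weight
A = rng.standard_normal((r, d_in)) * 0.01   # trainable down-projection
B = np.zeros((d_out, r))                    # trainable up-projection, zero-init

def lora_forward(x):
    # Frozen layer output plus the low-rank correction B @ (A @ x).
    return W @ x + B @ (A @ x)

x = rng.standard_normal(d_in)
y = lora_forward(x)
```

Because B starts at zero, the adapted layer initially matches the frozen one exactly, which is the standard LoRA initialization; here only 2 * (8 + 8) = 32 parameters are trained against 64 frozen ones.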