M2MeT Challenge
The M2MeT Challenge benchmarks the state of the art in speaker-attributed automatic speech recognition (SA-ASR): accurately transcribing multi-speaker, multi-channel meetings and determining who spoke what, and when. Current research emphasizes robust voice activity detection (VAD), often combining cross-channel attention mechanisms with advanced model architectures such as Conformers to handle overlapping speech and noisy environments. Progress on this benchmark feeds directly into more accurate and efficient transcription systems for real-world applications such as meeting summarization and assistive technologies.
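To make the cross-channel attention idea concrete, the sketch below lets each microphone channel's frame-level features attend over the same frame across all other channels, a fusion step that could precede a Conformer encoder and VAD head. This is a minimal, hypothetical PyTorch example, not any challenge team's actual system; the class name, tensor layout, and dimensions are assumptions for illustration.

```python
import torch
import torch.nn as nn

class CrossChannelAttention(nn.Module):
    """Illustrative cross-channel attention: for every time frame,
    the channel dimension is treated as the attention sequence, so
    each channel's features are refined using all other channels."""

    def __init__(self, dim: int, num_heads: int = 4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, channels, time, dim) -- per-channel frame features
        b, c, t, d = x.shape
        # Fold time into the batch so attention runs across channels per frame.
        y = x.permute(0, 2, 1, 3).reshape(b * t, c, d)  # (b*t, channels, dim)
        attn_out, _ = self.attn(y, y, y)                # channels attend to each other
        y = self.norm(y + attn_out)                     # residual + layer norm
        return y.reshape(b, t, c, d).permute(0, 2, 1, 3)

# Toy usage: 2 utterances, 8 microphone channels, 100 frames, 256-dim features.
feats = torch.randn(2, 8, 100, 256)
fused = CrossChannelAttention(dim=256)(feats)
print(fused.shape)  # torch.Size([2, 8, 100, 256])
```

Exchanging information across channels at each frame is one way a VAD front end can exploit spatial cues from the microphone array; systems described in this line of work typically combine such fusion with temporal modeling (e.g., Conformer blocks) along the time axis.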
Papers
Audio-Based Deep Learning Frameworks for Detecting COVID-19
Dat Ngo, Lam Pham, Truong Hoang, Sefki Kolozali, Delaram Jarchi
The USTC-Ximalaya System for the ICASSP 2022 Multi-Channel Multi-Party Meeting Transcription (M2MeT) Challenge
Maokui He, Xiang Lv, Weilin Zhou, JingJing Yin, Xiaoqi Zhang, Yuxuan Wang, Shutong Niu, Yuhang Cao, Heng Lu, Jun Du, Chin-Hui Lee