Speaker Diarization
Speaker diarization is the task of determining "who spoke when" in an audio recording, a crucial preprocessing step for many speech applications. Current research focuses on improving accuracy and efficiency, particularly in challenging scenarios such as multi-speaker conversations and noisy environments, using techniques such as end-to-end neural networks, spectral clustering, and the integration of audio-visual or semantic information. These advances are driving progress in meeting transcription and multilingual speech processing, and they improve the performance of downstream tasks such as automatic speech recognition.
Papers
Privacy-preserving Automatic Speaker Diarization
Francisco Teixeira, Alberto Abad, Bhiksha Raj, Isabel Trancoso
TSUP Speaker Diarization System for Conversational Short-phrase Speaker Diarization Challenge
Bowen Pang, Huan Zhao, Gaosheng Zhang, Xiaoyue Yang, Yang Sun, Li Zhang, Qing Wang, Lei Xie
Speaker Diarization Based on Multi-channel Microphone Array in Small-scale Meeting
Yuxuan Du, Ruohua Zhou