Language Diarization

Language diarization (LD) aims to automatically identify which language(s) are spoken in a recording and where each language segment begins and ends, typically in multi-speaker, multilingual conversations. Current research focuses on improving LD accuracy with techniques such as spectral clustering and self-supervised learning with models such as WavLM, and on integrating LD with speaker diarization and speech recognition systems, in some cases using implicit language modeling to handle low-resource languages. These advances matter for speech technologies deployed in diverse, real-world scenarios such as multilingual transcription and cross-lingual information retrieval, where robust LD helps bridge language gaps in increasingly globalized communication.
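
To make the task output concrete, the following is a minimal sketch of how per-frame language predictions might be collapsed into labeled time segments. It is not taken from any particular paper: the function name `frames_to_segments`, the `frame_shift` parameter, and the example labels are all hypothetical, and the frame-level labels are assumed to come from some upstream classifier.

```python
from itertools import groupby


def frames_to_segments(frame_langs, frame_shift=0.02):
    """Merge runs of identical per-frame language labels into segments.

    frame_langs: sequence of language labels, one per frame (assumed input).
    frame_shift: seconds between consecutive frames (assumed value).
    Returns a list of (start_sec, end_sec, language) tuples.
    """
    segments = []
    idx = 0  # index of the first frame in the current run
    for lang, run in groupby(frame_langs):
        n = len(list(run))
        start = round(idx * frame_shift, 3)
        end = round((idx + n) * frame_shift, 3)
        segments.append((start, end, lang))
        idx += n
    return segments


if __name__ == "__main__":
    # Hypothetical code-switched utterance: English, then Hindi, then English.
    labels = ["en"] * 5 + ["hi"] * 3 + ["en"] * 2
    for start, end, lang in frames_to_segments(labels, frame_shift=0.5):
        print(f"{start:6.2f} {end:6.2f} {lang}")
```

Real systems typically smooth the frame-level predictions before this step, since isolated misclassified frames would otherwise produce spurious short segments.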

Papers