Speaker Diarization
Speaker diarization is the task of identifying "who spoke when" in an audio recording, a crucial preprocessing step for many speech applications. Current research focuses on improving accuracy and efficiency in challenging scenarios such as multi-speaker conversations and noisy environments, using techniques such as end-to-end neural networks, spectral clustering, and the integration of audio-visual or semantic information. These advances are driving progress in areas such as meeting transcription and multilingual speech processing, and they improve downstream tasks such as automatic speech recognition.
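The summary above names spectral clustering over speaker embeddings as one common diarization recipe. The sketch below outlines that recipe under simplifying assumptions: the number of speakers is known in advance, and a toy spectral-statistics function (`embed_window`, introduced here for illustration) stands in for the pretrained speaker-embedding model (e.g. an x-vector or ECAPA-style network) that a real system would use.

```python
# Minimal clustering-based diarization sketch: slice audio into overlapping
# windows, embed each window, build a cosine affinity matrix, and spectrally
# cluster windows into speakers.
import numpy as np
from sklearn.cluster import SpectralClustering


def embed_window(samples: np.ndarray) -> np.ndarray:
    # Toy stand-in for a real speaker-embedding extractor: log-magnitude
    # spectrum of the window averaged into 32 bands.
    spec = np.abs(np.fft.rfft(samples * np.hanning(len(samples))))
    bands = np.array_split(np.log1p(spec), 32)
    return np.array([b.mean() for b in bands])


def diarize(audio: np.ndarray, sr: int, num_speakers: int,
            win_s: float = 1.5, hop_s: float = 0.75):
    win, hop = int(win_s * sr), int(hop_s * sr)
    starts = list(range(0, max(len(audio) - win, 1), hop))
    embs = np.stack([embed_window(audio[s:s + win]) for s in starts])

    # Cosine affinity between window embeddings, shifted into [0, 1].
    embs = embs / (np.linalg.norm(embs, axis=1, keepdims=True) + 1e-8)
    affinity = (embs @ embs.T + 1.0) / 2.0

    labels = SpectralClustering(
        n_clusters=num_speakers, affinity="precomputed", random_state=0
    ).fit_predict(affinity)

    # Emit (start_time, end_time, speaker_id) segments, one per window.
    return [(s / sr, (s + win) / sr, int(l)) for s, l in zip(starts, labels)]
```

End-to-end neural approaches, by contrast, predict per-speaker activity directly from the audio and handle overlapping speech natively; the clustering pipeline above is only one of the strategies the listed papers build on.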
Papers
Joint vs Sequential Speaker-Role Detection and Automatic Speech Recognition for Air-traffic Control
Alexander Blatt, Aravind Krishnan, Dietrich Klakow
SD-Eval: A Benchmark Dataset for Spoken Dialogue Understanding Beyond Words
Junyi Ao, Yuancheng Wang, Xiaohai Tian, Dekun Chen, Jun Zhang, Lu Lu, Yuxuan Wang, Haizhou Li, Zhizheng Wu
The Second DISPLACE Challenge: DIarization of SPeaker and LAnguage in Conversational Environments
Shareef Babu Kalluri, Prachi Singh, Pratik Roy Chowdhuri, Apoorva Kulkarni, Shikha Baghel, Pradyoth Hegde, Swapnil Sontakke, Deepak K T, S. R. Mahadeva Prasanna, Deepu Vijayasenan, Sriram Ganapathy
Exploring Spoken Language Identification Strategies for Automatic Transcription of Multilingual Broadcast and Institutional Speech
Martina Valente, Fabio Brugnara, Giovanni Morrone, Enrico Zovato, Leonardo Badino
Orthogonality and isotropy of speaker and phonetic information in self-supervised speech representations
Mukhtar Mohamed, Oli Danyi Liu, Hao Tang, Sharon Goldwater
Neural Blind Source Separation and Diarization for Distant Speech Recognition
Yoshiaki Bando, Tomohiko Nakamura, Shinji Watanabe
Exploring Speech Foundation Models for Speaker Diarization in Child-Adult Dyadic Interactions
Anfeng Xu, Kevin Huang, Tiantian Feng, Lue Shen, Helen Tager-Flusberg, Shrikanth Narayanan