Diarization System
Speaker diarization identifies "who spoke when" in an audio recording and is a crucial preprocessing step for many speech applications. Current research aims for more efficient and accurate systems along two lines: modular pipelines, which combine embedding extraction, clustering, and other modules, and end-to-end neural models, such as transformers and models based on the Mask2Former architecture, which predict speaker labels directly. These advances improve diarization accuracy, particularly for overlapping speech and recordings with multiple speakers, and in turn improve downstream tasks such as speech recognition and meeting transcription.
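To make the modular approach concrete, the following is a minimal sketch of its clustering stage only, not a reference implementation of any particular system. It assumes that an upstream embedding extractor has already produced one vector per fixed-length audio segment; here those vectors are synthetic toy data. The function names (`cluster_embeddings`, `to_turns`), the average-linkage rule, the cosine threshold of 0.5, and the 1-second segment length are all illustrative assumptions.

```python
import numpy as np


def cluster_embeddings(embeddings, threshold=0.5):
    """Average-linkage agglomerative clustering on cosine similarity.

    Hypothetical helper: merges the two most similar clusters until no
    pair has average cosine similarity >= threshold (an assumed value).
    """
    x = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    clusters = [[i] for i in range(len(x))]
    while len(clusters) > 1:
        best, pair = -1.0, None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                sim = (x[clusters[a]] @ x[clusters[b]].T).mean()
                if sim > best:
                    best, pair = sim, (a, b)
        if best < threshold:
            break
        a, b = pair
        merged = clusters.pop(b)  # b > a, so index a stays valid
        clusters[a].extend(merged)
    labels = np.empty(len(x), dtype=int)
    # Number speakers by the order in which they first appear.
    for spk, members in enumerate(sorted(clusters, key=min)):
        labels[members] = spk
    return labels


def to_turns(labels, seg_dur=1.0):
    """Merge consecutive same-speaker segments into (start, end, speaker)."""
    turns = []
    for i, spk in enumerate(labels):
        start, end = i * seg_dur, (i + 1) * seg_dur
        if turns and turns[-1][2] == int(spk):
            turns[-1] = (turns[-1][0], end, int(spk))
        else:
            turns.append((start, end, int(spk)))
    return turns


# Toy input: two synthetic "speakers" as noisy copies of orthogonal vectors,
# standing in for the output of a real embedding extractor.
rng = np.random.default_rng(0)
centroids = np.eye(8)[:2]
order = [0, 0, 1, 1, 0]  # ground-truth speaker of each 1-second segment
embeddings = centroids[order] + 0.05 * rng.standard_normal((len(order), 8))

labels = cluster_embeddings(embeddings)
turns = to_turns(labels)
for start, end, spk in turns:
    print(f"{start:.1f}s - {end:.1f}s: speaker {spk}")
```

On this toy input the five segments collapse into three speaker turns. Because each segment receives exactly one label, a sketch like this cannot represent overlapping speech, which is one reason end-to-end models that predict per-speaker activity directly are of interest.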