Diarization System
Speaker diarization aims to identify "who spoke when" in an audio recording and is a crucial preprocessing step for many speech applications. Current research focuses on making systems more efficient and accurate along two main lines: modular pipelines, which combine separate components such as speaker-embedding extraction and clustering, and end-to-end neural models (for example, transformer-based models and models built on the Mask2Former architecture) that predict speaker labels directly. These advances are improving diarization accuracy, particularly for overlapping speech and recordings with multiple speakers, which in turn improves downstream tasks such as speech recognition and meeting transcription.
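To make the modular pipeline concrete, here is a minimal sketch of its clustering stage. It assumes speech has already been cut into segments and that each segment has a speaker embedding from some upstream extractor; the class and function names (`Segment`, `cluster_segments`, `diarize`), the cosine-similarity average-linkage clustering, and the 0.7 stopping threshold are all illustrative choices, not the method of any particular system listed here. Because each segment receives exactly one label, this sketch cannot represent overlapping speech, which is one limitation that end-to-end models try to address.

```python
import math
from dataclasses import dataclass


@dataclass
class Segment:
    start: float            # segment start time in seconds
    end: float              # segment end time in seconds
    embedding: list[float]  # speaker embedding (assumed output of an upstream extractor)


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm > 0 else 0.0


def cluster_segments(segments: list[Segment], threshold: float = 0.7) -> list[int]:
    """Average-linkage agglomerative clustering: repeatedly merge the most
    similar pair of clusters until no pair reaches `threshold` (hypothetical default)."""
    clusters = [[i] for i in range(len(segments))]  # start with one cluster per segment

    def linkage(c1: list[int], c2: list[int]) -> float:
        # Mean pairwise cosine similarity between members of the two clusters.
        total = sum(cosine(segments[i].embedding, segments[j].embedding)
                    for i in c1 for j in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > 1:
        best_pair, best_sim = None, -math.inf
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                sim = linkage(clusters[a], clusters[b])
                if sim > best_sim:
                    best_pair, best_sim = (a, b), sim
        if best_sim < threshold:
            break  # no remaining pair is similar enough to be the same speaker
        a, b = best_pair
        clusters[a].extend(clusters.pop(b))  # b > a, so index a is unaffected

    # Map cluster index -> speaker label for every segment.
    labels = [0] * len(segments)
    for speaker, members in enumerate(clusters):
        for i in members:
            labels[i] = speaker
    return labels


def diarize(segments: list[Segment], threshold: float = 0.7) -> list[tuple[float, float, str]]:
    """Answer "who spoke when": (start, end, speaker) turns, with adjacent
    same-speaker segments merged into a single turn."""
    labels = cluster_segments(segments, threshold)
    turns: list[tuple[float, float, str]] = []
    for i in sorted(range(len(segments)), key=lambda k: segments[k].start):
        seg, spk = segments[i], f"SPK{labels[i]}"
        if turns and turns[-1][2] == spk and turns[-1][1] >= seg.start:
            # Extend the previous turn instead of starting a new one.
            turns[-1] = (turns[-1][0], max(turns[-1][1], seg.end), spk)
        else:
            turns.append((seg.start, seg.end, spk))
    return turns
```

For example, with four toy segments whose two-dimensional embeddings are `[1.0, 0.0]`, `[0.9, 0.1]`, `[0.0, 1.0]` and `[0.1, 0.9]` spanning 0.0 to 5.5 seconds, the first two and the last two segments form separate clusters, and `diarize` returns two turns, `SPK0` then `SPK1`. In practice the threshold would need tuning for the embedding model in use, or could be replaced by an estimate of the number of speakers.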
Papers
November 12, 2022
November 1, 2022
October 31, 2022
October 25, 2022
August 5, 2022
July 28, 2022
July 13, 2022
April 26, 2022
April 2, 2022
March 30, 2022
February 14, 2022