End-to-End Speaker Diarization
End-to-end speaker diarization aims to determine "who spoke when" in an audio recording with a single neural network, avoiding the limitations of traditional modular pipelines, most notably their difficulty with overlapping speech. Current research focuses on developing and refining end-to-end models, often built on encoder-decoder architectures, self-attention mechanisms, and novel clustering techniques, to improve accuracy and efficiency in challenging multi-speaker scenarios. These advances feed directly into speech processing applications, for example by improving automatic speech recognition in complex acoustic environments and enabling more robust human-computer interaction systems.
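To make the idea concrete, the following is a minimal sketch of an EEND-style model, assuming PyTorch: a self-attention encoder maps acoustic frames to per-speaker speech-activity probabilities (so several speakers can be active in the same frame), trained with a permutation-invariant binary cross-entropy loss. The names `EENDSketch` and `pit_bce_loss`, the feature dimension, and all hyperparameters are illustrative assumptions, not taken from any published system or library.

```python
# Minimal EEND-style diarization sketch (assumes PyTorch is installed).
import itertools

import torch
import torch.nn as nn


class EENDSketch(nn.Module):
    """Self-attention encoder that maps acoustic frames to per-speaker
    speech-activity probabilities; overlap is handled naturally because
    multiple speakers may be active in the same frame."""

    def __init__(self, feat_dim=23, n_speakers=2, d_model=256, n_layers=4, n_heads=4):
        super().__init__()
        self.input_proj = nn.Linear(feat_dim, d_model)
        layer = nn.TransformerEncoderLayer(
            d_model=d_model, nhead=n_heads, batch_first=True
        )
        self.encoder = nn.TransformerEncoder(layer, num_layers=n_layers)
        self.head = nn.Linear(d_model, n_speakers)

    def forward(self, feats):  # feats: (batch, frames, feat_dim)
        h = self.encoder(self.input_proj(feats))
        return torch.sigmoid(self.head(h))  # (batch, frames, n_speakers)


def pit_bce_loss(pred, target):
    """Permutation-invariant BCE: the speaker order in the reference labels
    is arbitrary, so score every assignment of output channels to reference
    speakers and keep the best one per utterance."""
    n_speakers = pred.shape[-1]
    per_perm = []
    for perm in itertools.permutations(range(n_speakers)):
        bce = nn.functional.binary_cross_entropy(
            pred[..., list(perm)], target, reduction="none"
        )
        per_perm.append(bce.mean(dim=(1, 2)))  # one loss per utterance
    return torch.stack(per_perm, dim=0).min(dim=0).values.mean()


if __name__ == "__main__":
    model = EENDSketch()
    feats = torch.randn(2, 500, 23)                     # two 500-frame utterances
    labels = torch.randint(0, 2, (2, 500, 2)).float()   # frame-level activity targets
    loss = pit_bce_loss(model(feats), labels)
    loss.backward()
    print(float(loss))
```

The key design choice this sketch illustrates is that diarization is cast as multi-label, frame-level classification rather than clustering of speaker embeddings, which is why a single network can emit overlapping speaker activities directly.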