Speaker Change

Speaker change detection (SCD) is the task of locating the points in an audio recording where one speaker stops and another begins, a key step for applications such as automatic speech recognition and transcription. Current research aims to improve SCD accuracy with deep learning models, including transformer-transducer architectures and models that combine speaker-specific and content-based information, often with self-supervised learning. These advances improve real-time speech processing, particularly in multi-speaker settings such as meetings and broadcast media, and also inform related tasks such as spoken language change detection.

Papers