Speaker Label

Speaker labeling assigns each segment of an audio recording to the speaker who uttered it, and it underpins tasks such as speaker diarization and speech recognition. Current research aims to improve accuracy and efficiency through online diarization methods, self-supervised models (both contrastive and generative architectures), and training strategies such as multi-label training and contrastive loss for knowledge distillation. These advances make speech processing more robust and efficient in applications such as personalized speech services, human-computer interaction, and the analysis of vocal interactions in developmental studies.
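To make the idea of labeling segments online concrete, here is a minimal sketch, not a method taken from any particular paper. It assumes each audio segment has already been turned into a fixed-length speaker embedding by some upstream model (not shown), and it assigns labels one segment at a time by comparing each embedding with running speaker centroids. The function name `label_segments_online`, the cosine threshold of 0.8, and the toy two-dimensional embeddings are all illustrative assumptions.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def label_segments_online(embeddings, threshold=0.8):
    """Assign a speaker index to each embedding in arrival order.

    Hypothetical helper: a segment joins the most similar existing
    speaker if the similarity reaches `threshold` (an assumed value),
    otherwise it opens a new speaker.
    """
    centroids = []  # running mean embedding per speaker
    counts = []     # number of segments assigned to each speaker
    labels = []
    for emb in embeddings:
        sims = [cosine(emb, c) for c in centroids]
        best = max(range(len(sims)), key=sims.__getitem__) if sims else -1
        if best >= 0 and sims[best] >= threshold:
            # Update the matched speaker's centroid as a running mean.
            n = counts[best]
            centroids[best] = [
                (c * n + x) / (n + 1) for c, x in zip(centroids[best], emb)
            ]
            counts[best] = n + 1
        else:
            # No speaker is similar enough: start a new one.
            centroids.append(list(emb))
            counts.append(1)
            best = len(centroids) - 1
        labels.append(best)
    return labels


# Toy embeddings (made up for illustration): two well-separated speakers.
toy = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.95], [1.0, 0.05]]
labels = label_segments_online(toy)
print(labels)
```

Because each decision uses only the segments seen so far, the labels are available as the audio arrives. The trade-off is that an early mistake cannot be revised later, which is one reason offline clustering can be more accurate when latency does not matter.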

Papers