Cross-Speaker

Cross-speaker modeling in speech processing aims to make systems that handle multiple speakers more accurate and robust, particularly in difficult conditions such as overlapping speech and multi-party conversations. Current research follows three main directions: integrating single-speaker and multi-speaker models; exploiting long-range contextual information with transformer networks and graph-based methods; and building efficient representations of cross-utterance and cross-speaker context. These approaches are reported to improve automatic speech recognition (ASR), speaker diarization, and related tasks, with potential applications in human-computer interaction, meeting transcription, and accessibility technologies.
