Overlapped Speech

Overlapped speech, in which multiple speakers talk at the same time, presents a significant challenge for automatic speech recognition (ASR) systems. Current research focuses on end-to-end models that jointly separate and transcribe overlapping speakers, often while also performing speaker diarization (identifying "who spoke when"). Common designs include models trained with the Connectionist Temporal Classification (CTC) objective and attention-based encoder-decoder networks trained with serialized output training (SOT), which emit the transcripts of all speakers as a single output sequence. These advances aim to improve ASR accuracy in real-world settings such as meetings and conversations. Better transcription and analysis of multi-speaker audio benefits fields ranging from human-computer interaction to social science research.
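To make the idea of serialized output training more concrete, the sketch below shows how a single training target might be built from several speakers' reference transcripts. It assumes first-in, first-out ordering by utterance start time and a speaker-change token `<sc>` plus an end token `<eos>`. The token strings, the `Utterance` record, and the `sot_target` helper are illustrative assumptions, not a specific toolkit's API.

```python
from dataclasses import dataclass
from typing import List

# Assumed special tokens; real systems define their own vocabulary entries.
SPEAKER_CHANGE = "<sc>"
END_OF_SEQUENCE = "<eos>"


@dataclass
class Utterance:
    """One speaker's reference transcript in a mixed recording (hypothetical schema)."""
    speaker: str
    start: float  # utterance start time in seconds
    text: str


def sot_target(utterances: List[Utterance]) -> str:
    """Serialize overlapping utterances into one SOT-style target string.

    Utterances are sorted by start time (first-in, first-out) and joined with
    a speaker-change token, so a single-output decoder can learn to emit every
    speaker's words in one sequence. Ties keep their input order because
    Python's sort is stable.
    """
    ordered = sorted(utterances, key=lambda u: u.start)
    if not ordered:
        return END_OF_SEQUENCE
    body = f" {SPEAKER_CHANGE} ".join(u.text for u in ordered)
    return f"{body} {END_OF_SEQUENCE}"


if __name__ == "__main__":
    # Speaker B starts talking while speaker A is still speaking.
    mixture = [
        Utterance(speaker="B", start=1.2, text="sounds good to me"),
        Utterance(speaker="A", start=0.0, text="shall we start the meeting"),
    ]
    print(sot_target(mixture))
    # shall we start the meeting <sc> sounds good to me <eos>
```

Ordering by start time gives the model a deterministic target for each mixture, so it does not have to guess which speaker to transcribe first.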

Papers