Multi-Speaker

Multi-speaker research focuses on building robust systems that can process and understand audio and video containing multiple simultaneous speakers. Current work concentrates on improving speech separation and recognition. It often uses deep neural network architectures such as Conformers and Transformers, combined with training methods such as Serialized Output Training (SOT) and speaker-aware connectionist temporal classification (CTC). These advances matter for applications ranging from meeting transcription and voice assistants to accessibility tools for people with hearing impairments, and they contribute to progress in both speech processing and human-computer interaction.
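To make Serialized Output Training (SOT) more concrete, here is a minimal sketch of how a SOT training target can be built. It assumes a simplified, utterance-level setup. The transcripts of all speakers in a mixture are ordered by start time (first-in, first-out) and joined with a special speaker-change token, written here as `<sc>`. A single decoder can then learn to emit every speaker's words as one sequence. The `Utterance` class and the `sot_target` function are hypothetical names chosen for illustration. Real systems also add an end-of-sequence token and work on subword tokens rather than raw strings.

```python
from __future__ import annotations

from dataclasses import dataclass

# Hypothetical speaker-change token used to separate speakers in the target.
SC_TOKEN = "<sc>"


@dataclass
class Utterance:
    speaker: str   # speaker label (not used in the target, kept for reference)
    start: float   # utterance start time in seconds
    text: str      # reference transcript of this utterance


def sot_target(utterances: list[Utterance], sc_token: str = SC_TOKEN) -> str:
    """Build a simplified SOT target: order transcripts by start time
    (first-in, first-out) and join them with a speaker-change token."""
    ordered = sorted(utterances, key=lambda u: u.start)
    return f" {sc_token} ".join(u.text for u in ordered)


if __name__ == "__main__":
    # Two overlapping speakers, listed out of temporal order on purpose.
    mixture = [
        Utterance("B", 1.2, "sounds good to me"),
        Utterance("A", 0.0, "shall we start the meeting"),
    ]
    print(sot_target(mixture))
    # shall we start the meeting <sc> sounds good to me
```

Because the number of `<sc>` tokens is not fixed, this serialization lets one output sequence cover a variable number of speakers. It does not require a separate output branch per speaker.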

Papers