Multi Speaker Automatic Speech Recognition
Multi-speaker automatic speech recognition (ASR) aims to accurately transcribe recordings that contain several speakers whose speech often overlaps. It is a challenging problem with significant real-world applications. Current research focuses on making ASR models robust to overlapping speech and background noise. Common techniques include speech separation, attention mechanisms suited to the task (e.g., cross-channel attention over multi-microphone input), and non-autoregressive architectures such as Paraformer, which aim to improve both speed and accuracy. This work is driven by the need for efficient, accurate transcription of meetings and other multi-party conversations, with applications ranging from voice assistants to meeting summarization.
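The cross-channel attention idea mentioned above can be illustrated with a small sketch that fuses multi-microphone features. At each time frame, the reference channel's feature vector acts as the query. The same frame from every other channel serves as both key and value, and the two are combined with scaled dot-product attention. This is a simplified, hypothetical formulation written in plain Python: it has no learned projections, no batching and no attention across time. The names `cross_channel_attention` and `softmax` are illustrative and do not correspond to any particular paper's or library's implementation.

```python
import math
from typing import List

Vector = List[float]


def softmax(scores: List[float]) -> List[float]:
    # Subtract the max score for numerical stability before exponentiating.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_channel_attention(channels: List[List[Vector]], ref: int = 0) -> List[Vector]:
    """Hypothetical, simplified frame-wise cross-channel attention.

    channels[c][t] is the feature vector of channel c at frame t.
    Assumption: keys and values are the raw features of the non-reference
    channels (no learned projections), and attention is computed
    independently for each frame.
    """
    if len(channels) < 2:
        raise ValueError("need at least two channels")
    d = len(channels[ref][0])
    others = [c for i, c in enumerate(channels) if i != ref]
    fused_frames = []
    for t, query in enumerate(channels[ref]):
        keys = [c[t] for c in others]  # same frame t from every other channel
        scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d) for key in keys]
        weights = softmax(scores)
        # Weighted sum of the other channels' features (values == keys here).
        fused = [sum(w * key[j] for w, key in zip(weights, keys)) for j in range(d)]
        fused_frames.append(fused)
    return fused_frames


if __name__ == "__main__":
    # Three channels, one frame each, 2-dimensional features.
    mics = [[[1.0, 0.0]], [[1.0, 0.0]], [[0.0, 1.0]]]
    print(cross_channel_attention(mics))
```

In the toy example, the second channel resembles the reference channel more closely than the third does, so it receives the larger attention weight. In a trained model, the query, key and value would normally pass through learned linear projections, and the fused features would feed an ASR encoder. Those parts are omitted here.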