Target Speaker
Target speaker extraction (TSE) aims to isolate one designated speaker's voice from a noisy audio mixture, mimicking the human ability to attend to a single voice, known as the "cocktail party effect." Current research focuses on improving robustness under challenging conditions such as overlapping speech and low signal-to-noise ratios. Techniques include curriculum learning, beamforming, and neural networks (e.g., convolutional recurrent networks, LSTMs), often combined with visual cues or textual descriptions to improve accuracy. These advances have significant implications for speech recognition in noisy environments, hearing aids, and more natural and effective human-computer interaction.
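To make the "low signal-to-noise ratio" condition and the idea of curriculum learning concrete, here is a minimal sketch of how training mixtures might be built at a chosen SNR and ordered from easy to hard. It is illustrative only and is not taken from any particular paper listed here: the function names (`power`, `mix_at_snr`, `measured_snr_db`), the synthetic tone-plus-noise signals standing in for speech, and the SNR values are all assumptions.

```python
import math
import random


def power(signal):
    """Mean squared amplitude of a sequence of samples."""
    return sum(x * x for x in signal) / len(signal)


def mix_at_snr(target, interferer, snr_db):
    """Scale the interferer so the target-to-interferer power ratio
    equals snr_db (in dB), then add it to the target."""
    gain = math.sqrt(power(target) / (power(interferer) * 10 ** (snr_db / 10)))
    scaled = [gain * x for x in interferer]
    mixture = [t + s for t, s in zip(target, scaled)]
    return mixture, scaled


def measured_snr_db(target, noise):
    """SNR in dB between a target signal and the noise added to it."""
    return 10 * math.log10(power(target) / power(noise))


rng = random.Random(0)
sample_rate = 8000  # hypothetical sample rate, one second of audio

# Synthetic stand-ins for speech: a 220 Hz tone (target) and white noise (interferer).
target = [math.sin(2 * math.pi * 220 * n / sample_rate) for n in range(sample_rate)]
interferer = [rng.gauss(0.0, 1.0) for _ in range(sample_rate)]

# Curriculum-style ordering (assumed): present easy, high-SNR mixtures first
# and hard, low-SNR mixtures last.
curriculum_snrs_db = sorted([-5, 10, 0, 5], reverse=True)

for snr_db in curriculum_snrs_db:
    mixture, scaled = mix_at_snr(target, interferer, snr_db)
    print(snr_db, round(measured_snr_db(target, scaled), 2))
```

In a real TSE pipeline the interferer would be another speaker's utterance or recorded noise, and the extraction model would additionally receive a cue identifying the target speaker; the sketch only covers how mixture difficulty can be controlled.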
Papers
October 27, 2024
October 15, 2024
August 30, 2024
June 12, 2024
May 10, 2024
April 29, 2024
January 29, 2024
December 18, 2023
October 16, 2023
October 11, 2023
June 11, 2023
March 15, 2023
February 25, 2023
January 31, 2023
August 18, 2022
August 9, 2022
June 17, 2022
June 16, 2022
April 11, 2022