Speech Separation
Speech separation aims to isolate individual voices from a mixture of sounds, a crucial task for applications like hearing aids and voice assistants. Current research emphasizes developing efficient and robust models, focusing on architectures like Transformers and state-space models (e.g., Mamba) to handle complex acoustic environments (noise, reverberation, moving sources) and varying numbers of speakers. This involves creating large, realistic datasets, incorporating visual cues (audio-visual models), and exploring techniques like unsupervised learning and efficient model compression to improve performance and reduce computational demands for real-time applications. Advances in this field directly impact the development of more effective and user-friendly speech technologies.
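To make the task concrete, below is a minimal, illustrative sketch (not the method of any paper listed here) of the classic time-frequency masking idea that underlies many separation systems: the mixture is transformed to the time-frequency domain, per-speaker gain masks are applied, and each masked spectrogram is inverted back to a waveform. Neural separators learn to predict such masks (or act directly on learned encodings); here, for illustration only, "oracle" ratio masks are computed from synthetic ground-truth signals using scipy.signal.stft/istft.

```python
# Minimal time-frequency masking sketch for two-source separation.
# Oracle ideal-ratio masks are used purely for illustration; real systems
# estimate the masks from the mixture with a trained model.
import numpy as np
from scipy.signal import stft, istft

fs = 8000
t = np.arange(fs) / fs
s1 = np.sin(2 * np.pi * 220 * t)   # stand-in for speaker 1
s2 = np.sin(2 * np.pi * 440 * t)   # stand-in for speaker 2
mix = s1 + s2

_, _, S1 = stft(s1, fs, nperseg=256)
_, _, S2 = stft(s2, fs, nperseg=256)
_, _, M = stft(mix, fs, nperseg=256)

# Ideal ratio mask: each time-frequency bin's gain is that source's
# share of the total magnitude in the bin.
eps = 1e-8
mask1 = np.abs(S1) / (np.abs(S1) + np.abs(S2) + eps)
mask2 = 1.0 - mask1

# Apply the masks to the mixture spectrogram and invert to waveforms.
_, est1 = istft(mask1 * M, fs, nperseg=256)
_, est2 = istft(mask2 * M, fs, nperseg=256)
print(est1.shape, est2.shape)
```

In practice the masks (or equivalent per-source gains) are predicted by the separation network from the mixture alone, which is where architectural choices such as Transformers or state-space models come into play.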
Papers
SonicSim: A customizable simulation platform for speech processing in moving sound source scenarios
Kai Li, Wendi Sang, Chang Zeng, Runxuan Yang, Guo Chen, Xiaolin Hu
TIGER: Time-frequency Interleaved Gain Extraction and Reconstruction for Efficient Speech Separation
Mohan Xu, Kai Li, Guo Chen, Xiaolin Hu
TF-Locoformer: Transformer with Local Modeling by Convolution for Speech Separation and Enhancement
Kohei Saijo, Gordon Wichern, François G. Germain, Zexu Pan, Jonathan Le Roux
Enhanced Reverberation as Supervision for Unsupervised Speech Separation
Kohei Saijo, Gordon Wichern, François G. Germain, Zexu Pan, Jonathan Le Roux