Monaural Speech Separation

Monaural speech separation aims to isolate individual voices from a single-channel recording. The task is challenging because a single microphone provides no spatial cues for telling speakers apart, and it matters for applications such as hearing aids and automatic speech recognition. Recent research focuses on model architectures such as transformers and conformers. These are often combined with recurrent modules or convolutional layers so that the model captures both long-range and fine-scale temporal dependencies in speech. Unsupervised learning techniques are also gaining traction. Strategies such as remixing, or exploiting over-determined mixtures, let models be trained without isolated ground-truth source signals, so large labeled datasets are not required. These advances have produced substantial gains in separation quality, including in noisy and reverberant conditions.
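One well-known remixing strategy is mixture invariant training (MixIT). Two existing mixtures are summed, and the model separates that "mixture of mixtures" into M estimated sources. The loss then searches over every way of assigning each estimate to one of the two original mixtures and keeps the assignment whose remixed estimates best reconstruct them. Because the targets are the mixtures themselves, no isolated clean sources are needed. The sketch below is a simplified illustration of that loss, not any specific paper's implementation. It assumes the model's outputs are already available as plain lists of samples, uses mean squared error in place of the SNR-based losses used in practice, and brute-forces all 2^M assignments. The names `mse` and `mixit_loss` are hypothetical.

```python
from itertools import product
from typing import List, Sequence, Tuple


def mse(ref: Sequence[float], est: Sequence[float]) -> float:
    """Mean squared error between a reference and an estimate of equal length.

    Simplification: practical systems typically use an SNR-based loss instead.
    """
    return sum((r - e) ** 2 for r, e in zip(ref, est)) / len(ref)


def mixit_loss(
    mix1: Sequence[float],
    mix2: Sequence[float],
    est_sources: List[Sequence[float]],
) -> Tuple[float, Tuple[int, ...]]:
    """MixIT-style loss (sketch): best remix of M estimates into two mixtures.

    est_sources are the model's M outputs for the input mix1 + mix2.
    Each estimate is assigned to mixture 0 or 1; all 2**M assignments are
    tried, and the lowest reconstruction loss is returned along with the
    assignment that achieved it.
    """
    n = len(mix1)
    best_loss = float("inf")
    best_assign: Tuple[int, ...] = ()
    for assign in product((0, 1), repeat=len(est_sources)):
        # Sum the estimates assigned to each of the two mixtures.
        remix = [[0.0] * n, [0.0] * n]
        for src, target in zip(est_sources, assign):
            for t in range(n):
                remix[target][t] += src[t]
        loss = mse(mix1, remix[0]) + mse(mix2, remix[1])
        if loss < best_loss:
            best_loss, best_assign = loss, assign
    return best_loss, best_assign


# Toy usage: mixture 1 holds sources a and b, mixture 2 holds source c.
s_a = [1.0, 0.0, 1.0, 0.0]
s_b = [0.0, 2.0, 0.0, 2.0]
s_c = [0.5, 0.5, 0.5, 0.5]
mix1 = [a + b for a, b in zip(s_a, s_b)]
mix2 = s_c
# A perfect separator whose outputs come back in an arbitrary order.
estimates = [s_c, s_a, s_b]
print(mixit_loss(mix1, mix2, estimates))  # (0.0, (1, 0, 0))
```

The exhaustive search keeps the sketch easy to check, but its cost grows as 2^M. That is tolerable only for the small number of output sources typically used.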

Papers