Single Channel Speech Separation

Single-channel speech separation aims to isolate individual voices from a single-microphone recording in which several people speak at once. It is an important step for speech recognition and human-computer interaction in noisy environments. Current research focuses on computationally efficient models, such as lightweight Transformers and modified Conv-TasNet variants. These models avoid the cost of resource-intensive architectures while keeping separation accuracy high, particularly in difficult conditions such as reverberation and speakers with similar pitch. Other work aims to improve two things: the perceptual quality of the separated speech, and robustness when training and test conditions differ. This work uses techniques such as diffusion models and refined permutation invariant training. These advances matter for applications ranging from hearing aids and voice assistants to meeting transcription and robotics.
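Permutation invariant training (PIT) addresses a basic ambiguity in separation: the model's output channels have no fixed speaker order. The loss is therefore computed under the best assignment of outputs to reference speakers. The following is a minimal NumPy sketch of plain utterance-level PIT. It assumes a scale-invariant SNR (SI-SNR) objective, which is common for time-domain separators such as Conv-TasNet but is an assumption here. It is not the refined or soft-minimum variants studied in the papers listed on this page. The names `si_snr` and `pit_loss` are hypothetical.

```python
import itertools

import numpy as np


def si_snr(est, ref, eps=1e-8):
    """Scale-invariant SNR (dB) of a 1-D estimate against a 1-D reference."""
    # Remove DC offsets so the measure ignores constant shifts.
    est = est - est.mean()
    ref = ref - ref.mean()
    # Project the estimate onto the reference: the projection is the "target" part.
    alpha = np.dot(est, ref) / (np.dot(ref, ref) + eps)
    target = alpha * ref
    residual = est - target
    return 10 * np.log10((np.dot(target, target) + eps) / (np.dot(residual, residual) + eps))


def pit_loss(estimates, references):
    """Utterance-level PIT: negative mean SI-SNR under the best speaker assignment.

    estimates, references: arrays of shape (num_speakers, num_samples).
    Returns (loss, perm), where perm[i] is the estimate assigned to reference i.
    """
    n = len(references)
    best_loss, best_perm = np.inf, None
    # Exhaustive search over all n! assignments of estimates to references.
    for perm in itertools.permutations(range(n)):
        loss = -np.mean([si_snr(estimates[perm[i]], references[i]) for i in range(n)])
        if loss < best_loss:
            best_loss, best_perm = loss, perm
    return best_loss, best_perm


# Toy check: the "model" outputs the two sources in swapped order, plus a little noise.
rng = np.random.default_rng(0)
s1 = rng.standard_normal(8000)
s2 = rng.standard_normal(8000)
estimates = np.stack([s2 + 0.01 * rng.standard_normal(8000),
                      s1 + 0.01 * rng.standard_normal(8000)])
loss, perm = pit_loss(estimates, np.stack([s1, s2]))
print(perm)  # (1, 0): estimate 1 matches reference 0, and vice versa
```

Exhaustive search costs n! SI-SNR evaluations per utterance, which is cheap for two or three speakers. With more speakers, the assignment is usually solved with the Hungarian algorithm (for example `scipy.optimize.linear_sum_assignment`) on a pairwise loss matrix. In actual training, the same loss would be written in a differentiable framework rather than NumPy.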

Papers

October 28, 2024

SepMamba: State-space models for speaker separation using Mamba
Thor Højhus Avenstrup, Boldizsár Elek, István László Mádi, András Bence Schin, Morten Mørup, Bjørn Sand Jensen, Kenny Falkær Olsen

July 22, 2024

Robustness of Speech Separation Models for Similar-pitch Speakers
Bunlong Lay, Sebastian Zaczek, Kristina Tesch, Timo Gerkmann

July 1, 2024

Papez: Resource-Efficient Speech Separation with Auditory Working Memory
Hyunseok Oh, Juheon Yi, Youngki Lee

January 7, 2024

Single-Microphone Speaker Separation and Voice Activity Detection in Noisy and Reverberant Environments
Renana Opochinsky, Mordehay Moradi, Sharon Gannot

May 10, 2023

Diffusion-based Signal Refiner for Speech Separation
Masato Hirano, Kazuki Shimada, Yuichiro Koyama, Shusuke Takahashi, Yuki Mitsufuji

March 14, 2023

Towards Real-Time Single-Channel Speech Separation in Noisy and Reverberant Environments
Julian Neri, Sebastian Braun

March 6, 2023

Scaling strategies for on-device low-complexity source separation with Conv-Tasnet
Mohamed Nabih Ali, Francesco Paissan, Daniele Falavigna, Alessio Brutti

December 14, 2022

Multi-Scale Feature Fusion Transformer Network for End-to-End Single Channel Speech Separation
Yinhao Xu, Jian Zhou, Liang Tao, Hon Keung Kwan

November 1, 2022

A Comparative Study on Multichannel Speaker-Attributed Automatic Speech Recognition in Multi-party Meetings
Mohan Shi, Jie Zhang, Zhihao Du, Fan Yu, Qian Chen, Shiliang Zhang, Li-Rong Dai

October 23, 2022

Quantitative Evidence on Overlooked Aspects of Enrollment Speaker Embeddings for Target Speaker Separation
Xiaoyu Liu, Xu Li, Joan Serrà

May 24, 2022

SepIt: Approaching a Single Channel Speech Separation Bound
Shahar Lutati, Eliya Nachmani, Lior Wolf

April 27, 2022

Ultra Fast Speech Separation Model with Teacher Student Learning
Sanyuan Chen, Yu Wu, Zhuo Chen, Jian Wu, Takuya Yoshioka, Shujie Liu, Jinyu Li, Xiangzhan Yu

November 16, 2021

Single-channel speech separation using Soft-minimum Permutation Invariant Training
Midia Yousefi, John H. L. Hansen