Speech Enhancement
Speech enhancement aims to improve the clarity and intelligibility of speech signals degraded by noise and reverberation, which is crucial for applications such as hearing aids and voice assistants. Current research focuses on computationally efficient models, including lightweight convolutional neural networks, recurrent neural networks (such as LSTMs), and diffusion models. These are often combined with multi-channel processing, attention mechanisms, and self-supervised learning to achieve high performance at low latency. Together, these advances are making speech enhancement systems more robust and resource-efficient, particularly on low-power devices and in challenging acoustic environments. The field also explores integrating visual information and advanced signal processing techniques to improve performance further.
Papers
Real-Time Joint Personalized Speech Enhancement and Acoustic Echo Cancellation
Sefik Emre Eskimez, Takuya Yoshioka, Alex Ju, Min Tang, Tanel Parnamaa, Huaming Wang
Speech enhancement using ego-noise references with a microphone array embedded in an unmanned aerial vehicle
Elisa Tengan, Thomas Dietzen, Santiago Ruiz, Mansour Alkmim, João Cardenuto, Toon van Waterschoot
Self-Supervised Learning for Speech Enhancement through Synthesis
Bryce Irvin, Marko Stamenovic, Mikolaj Kegler, Li-Chia Yang
Cold Diffusion for Speech Enhancement
Hao Yen, François G. Germain, Gordon Wichern, Jonathan Le Roux
Analysing Diffusion-based Generative Approaches versus Discriminative Approaches for Speech Restoration
Jean-Marie Lemercier, Julius Richter, Simon Welker, Timo Gerkmann
Analysis of Noisy-target Training for DNN-based speech enhancement
Takuya Fujimura, Tomoki Toda
Inference and Denoise: Causal Inference-based Neural Speech Enhancement
Tsun-An Hsieh, Chao-Han Huck Yang, Pin-Yu Chen, Sabato Marco Siniscalchi, Yu Tsao
Fast and efficient speech enhancement with variational autoencoders
Mostafa Sadeghi, Romain Serizel
A weighted-variance variational autoencoder model for speech enhancement
Ali Golmakani, Mostafa Sadeghi, Xavier Alameda-Pineda, Romain Serizel
A Training and Inference Strategy Using Noisy and Enhanced Speech as Target for Speech Enhancement without Clean Speech
Li-Wei Chen, Yao-Fei Cheng, Hung-Shin Lee, Yu Tsao, Hsin-Min Wang
Diffiner: A Versatile Diffusion-based Generative Refiner for Speech Enhancement
Ryosuke Sawata, Naoki Murata, Yuhta Takida, Toshimitsu Uesaka, Takashi Shibuya, Shusuke Takahashi, Yuki Mitsufuji
Speaker Diarization Based on Multi-channel Microphone Array in Small-scale Meeting
Yuxuan Du, Ruohua Zhou
Parallel Gated Neural Network With Attention Mechanism For Speech Enhancement
Jianqiao Cui, Stefan Bleeck
SCP-GAN: Self-Correcting Discriminator Optimization for Training Consistency Preserving Metric GAN on Speech Enhancement Tasks
Vasily Zadorozhnyy, Qiang Ye, Kazuhito Koishida
Time-Domain Speech Enhancement for Robust Automatic Speech Recognition
Yufeng Yang, Ashutosh Pandey, DeLiang Wang
A Novel Frame Structure for Cloud-Based Audio-Visual Speech Enhancement in Multimodal Hearing-aids
Abhijeet Bishnu, Ankit Gupta, Mandar Gogate, Kia Dashtipour, Ahsan Adeel, Amir Hussain, Mathini Sellathurai, Tharmalingam Ratnarajah
TridentSE: Guiding Speech Enhancement with 32 Global Tokens
Dacheng Yin, Zhiyuan Zhao, Chuanxin Tang, Zhiwei Xiong, Chong Luo