Speech Enhancement
Speech enhancement aims to improve the clarity and intelligibility of speech degraded by noise and reverberation, which is crucial for applications such as hearing aids and voice assistants. Current research focuses on computationally efficient models, including lightweight convolutional neural networks, recurrent neural networks (such as LSTMs), and diffusion models. These models often incorporate multi-channel processing, attention mechanisms, and self-supervised learning to achieve high performance with minimal latency. The aim is more robust and resource-efficient speech enhancement systems for real-world use, particularly on low-power devices and in challenging acoustic environments. The field also explores integrating visual information and advanced signal processing techniques to improve performance further.
Papers
A Training and Inference Strategy Using Noisy and Enhanced Speech as Target for Speech Enhancement without Clean Speech
Li-Wei Chen, Yao-Fei Cheng, Hung-Shin Lee, Yu Tsao, Hsin-Min Wang
Diffiner: A Versatile Diffusion-based Generative Refiner for Speech Enhancement
Ryosuke Sawata, Naoki Murata, Yuhta Takida, Toshimitsu Uesaka, Takashi Shibuya, Shusuke Takahashi, Yuki Mitsufuji
Speaker Diarization Based on Multi-channel Microphone Array in Small-scale Meeting
Yuxuan Du, Ruohua Zhou
Parallel Gated Neural Network With Attention Mechanism For Speech Enhancement
Jianqiao Cui, Stefan Bleeck
SCP-GAN: Self-Correcting Discriminator Optimization for Training Consistency Preserving Metric GAN on Speech Enhancement Tasks
Vasily Zadorozhnyy, Qiang Ye, Kazuhito Koishida
Time-Domain Speech Enhancement for Robust Automatic Speech Recognition
Yufeng Yang, Ashutosh Pandey, DeLiang Wang
A Novel Frame Structure for Cloud-Based Audio-Visual Speech Enhancement in Multimodal Hearing-aids
Abhijeet Bishnu, Ankit Gupta, Mandar Gogate, Kia Dashtipour, Ahsan Adeel, Amir Hussain, Mathini Sellathurai, Tharmalingam Ratnarajah
TridentSE: Guiding Speech Enhancement with 32 Global Tokens
Dacheng Yin, Zhiyuan Zhao, Chuanxin Tang, Zhiwei Xiong, Chong Luo