Speech Signal
Speech signals are the acoustic representations of spoken language, and research in this area aims to improve how they are processed for a range of applications. Current work concentrates on robust models for speech enhancement (e.g., diffusion models and state-space models such as Mamba), source separation (e.g., attention mechanisms and the use of spatial information), and accurate speech recognition, including in noisy or otherwise challenging acoustic environments. These advances bear directly on human-computer interaction, on assistive technologies for people with hearing impairments, and on applications in healthcare (e.g., disease detection from speech biomarkers) and security (e.g., synthetic speech detection).
Papers
ExHuBERT: Enhancing HuBERT Through Block Extension and Fine-Tuning on 37 Emotion Datasets
Shahin Amiriparian, Filip Packań, Maurice Gerczuk, Björn W. Schuller
RaD-Net 2: A causal two-stage repairing and denoising speech enhancement network with knowledge distillation and complex axial self-attention
Mingshuai Liu, Zhuangqi Chen, Xiaopeng Yan, Yuanjun Lv, Xianjun Xia, Chuanzeng Huang, Yijian Xiao, Lei Xie