Speaker Adaptation
Speaker adaptation in speech and visual processing aims to personalize models for individual speakers, overcoming limitations of generic models that struggle with variations in voice, lip movements, and speaking styles. Current research focuses on efficient adaptation techniques, often employing lightweight modules like Low-Rank Adaptation (LoRA) within larger architectures such as transformers and diffusion models, or leveraging techniques like k-Nearest Neighbors and prototype-based methods. These advancements are significant for improving the robustness and personalization of speech recognition, text-to-speech, lip reading, and other related applications, particularly in low-resource scenarios or for individuals with speech impairments.
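The LoRA-style adaptation mentioned above can be sketched minimally: a frozen base weight matrix is augmented with a small per-speaker low-rank update. The class and parameter names below (`rank`, `alpha`) are illustrative, not from any specific paper in this list.

```python
import numpy as np

# Minimal sketch of Low-Rank Adaptation (LoRA) for speaker adaptation,
# assuming a frozen base weight matrix W and a trainable per-speaker
# low-rank update scale * (B @ A). Hypothetical names for illustration.

class LoRALinear:
    def __init__(self, weight, rank=4, alpha=8, seed=0):
        rng = np.random.default_rng(seed)
        self.weight = weight          # frozen base weights, shape (d_out, d_in)
        d_out, d_in = weight.shape
        # Per-speaker trainable factors: A is small random, B starts at zero,
        # so before adaptation the layer behaves exactly like the base model.
        self.A = rng.normal(0.0, 0.01, size=(rank, d_in))
        self.B = np.zeros((d_out, rank))
        self.scale = alpha / rank

    def __call__(self, x):
        # Base projection plus the scaled low-rank speaker-specific update.
        return x @ self.weight.T + self.scale * (x @ self.A.T) @ self.B.T


base = np.random.default_rng(1).normal(size=(16, 32))
layer = LoRALinear(base)
x = np.ones((2, 32))
# With B initialized to zero, the adapted output matches the base layer.
print(np.allclose(layer(x), x @ base.T))  # True
```

During adaptation only `A` and `B` (a few hundred parameters here) would be updated per speaker, which is why such modules are attractive in low-resource personalization settings.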
Papers
Speaker Adaptation Using Spectro-Temporal Deep Features for Dysarthric and Elderly Speech Recognition
Mengzhe Geng, Xurong Xie, Zi Ye, Tianzi Wang, Guinan Li, Shujie Hu, Xunying Liu, Helen Meng
CampNet: Context-Aware Mask Prediction for End-to-End Text-Based Speech Editing
Tao Wang, Jiangyan Yi, Ruibo Fu, Jianhua Tao, Zhengqi Wen
Investigation of Data Augmentation Techniques for Disordered Speech Recognition
Mengzhe Geng, Xurong Xie, Shansong Liu, Jianwei Yu, Shoukang Hu, Xunying Liu, Helen Meng
Spectro-Temporal Deep Features for Disordered Speech Assessment and Recognition
Mengzhe Geng, Shansong Liu, Jianwei Yu, Xurong Xie, Shoukang Hu, Zi Ye, Zengrui Jin, Xunying Liu, Helen Meng