Speaker Modeling
Speaker modeling aims to build robust representations of individual voices from speech data, enabling applications such as speaker recognition, diarization, and voice conversion. Current research emphasizes more efficient and accurate speaker embeddings, often built on deep learning architectures such as convolutional neural networks and augmented with attention mechanisms that capture salient temporal and spectral information. This work underpins improvements across a range of speech technologies and addresses challenges such as limited data for under-represented languages and accents, as well as the need for more nuanced models that account for voice flexibility and individual vocal characteristics.
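As a rough illustration of this design pattern, the sketch below (assuming PyTorch; all module and parameter names are hypothetical and not taken from the papers listed here) shows a toy convolutional encoder with a channel-wise attention block that pools a log-mel spectrogram into a fixed-size speaker embedding.

```python
import torch
import torch.nn as nn


class ChannelAttention(nn.Module):
    """Squeeze-and-excitation style attention over the channel axis of a
    (batch, channels, freq, time) feature map. Illustrative only; not the
    mechanism proposed in any specific paper below."""

    def __init__(self, channels: int, reduction: int = 4):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Global average pool over frequency and time, then re-weight channels.
        weights = self.fc(x.mean(dim=(2, 3)))            # (batch, channels)
        return x * weights.unsqueeze(-1).unsqueeze(-1)   # broadcast over F, T


class SpeakerEmbeddingNet(nn.Module):
    """Toy CNN that maps a log-mel spectrogram to a fixed-size speaker embedding."""

    def __init__(self, n_mels: int = 80, embed_dim: int = 192):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(1, 32, kernel_size=3, padding=1), nn.ReLU(inplace=True),
            ChannelAttention(32),
            nn.Conv2d(32, 64, kernel_size=3, stride=2, padding=1), nn.ReLU(inplace=True),
            ChannelAttention(64),
        )
        self.pool = nn.AdaptiveAvgPool2d(1)  # mean pooling stands in for statistics pooling
        self.proj = nn.Linear(64, embed_dim)

    def forward(self, spec: torch.Tensor) -> torch.Tensor:
        # spec: (batch, n_mels, time) log-mel features
        h = self.encoder(spec.unsqueeze(1))  # add a channel axis for Conv2d
        return self.proj(self.pool(h).flatten(1))


if __name__ == "__main__":
    model = SpeakerEmbeddingNet()
    dummy = torch.randn(2, 80, 300)   # two toy utterances of 300 frames each
    print(model(dummy).shape)         # torch.Size([2, 192])
```

In practice, embeddings like these are trained with a speaker-classification or metric-learning objective and compared at test time with cosine similarity; the toolkits and attention designs in the papers below cover such choices in detail.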
Papers
Convolution-Based Channel-Frequency Attention for Text-Independent Speaker Verification
Jingyu Li, Yusheng Tian, Tan Lee
Wespeaker: A Research and Production oriented Speaker Embedding Learning Toolkit
Hongji Wang, Chengdong Liang, Shuai Wang, Zhengyang Chen, Binbin Zhang, Xu Xiang, Yanlei Deng, Yanmin Qian