Speaker Representation
Speaker representation focuses on extracting discriminative features from speech that characterize individual speakers. Current research emphasizes unsupervised and self-supervised learning, often built on transformer or conformer architectures and trained with contrastive objectives, to mitigate data scarcity and improve robustness to noise and variations in speaking style. These advances underpin a range of speech applications, including speaker recognition, diarization, voice conversion, and speech synthesis, and robust, versatile speaker representations remain a key driver of progress in the broader field of speech processing.
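To make the contrastive learning idea concrete, the following is a minimal sketch (not taken from any of the papers listed below) of self-supervised speaker representation learning in PyTorch: two differently augmented segments of the same utterance form a positive pair, while segments from other utterances in the batch serve as negatives. The encoder layout, embedding dimension, and temperature are illustrative assumptions, not a specific published recipe.

```python
# Minimal sketch of contrastive speaker representation learning.
# Assumptions: two augmented views of each utterance are available,
# and all layer sizes / the temperature are illustrative choices.
import torch
import torch.nn as nn
import torch.nn.functional as F


class SpeakerEncoder(nn.Module):
    """Maps a (batch, frames, mel_bins) segment to a fixed-size embedding."""

    def __init__(self, mel_bins: int = 80, emb_dim: int = 192):
        super().__init__()
        self.frame_net = nn.Sequential(
            nn.Conv1d(mel_bins, 256, kernel_size=5, padding=2),
            nn.ReLU(),
            nn.Conv1d(256, 256, kernel_size=3, padding=1),
            nn.ReLU(),
        )
        self.proj = nn.Linear(256, emb_dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        h = self.frame_net(x.transpose(1, 2))   # (batch, 256, frames)
        pooled = h.mean(dim=2)                  # temporal average pooling
        return F.normalize(self.proj(pooled), dim=-1)


def contrastive_loss(z1: torch.Tensor, z2: torch.Tensor, temperature: float = 0.07):
    """NT-Xent-style loss: z1[i] and z2[i] come from the same utterance."""
    logits = z1 @ z2.t() / temperature          # cosine similarities (unit-norm embeddings)
    targets = torch.arange(z1.size(0), device=z1.device)
    # Symmetrise over the two augmented views.
    return 0.5 * (F.cross_entropy(logits, targets) + F.cross_entropy(logits.t(), targets))


if __name__ == "__main__":
    encoder = SpeakerEncoder()
    # Dummy stand-ins for two augmented 200-frame mel-spectrogram segments per utterance.
    seg_a = torch.randn(8, 200, 80)
    seg_b = torch.randn(8, 200, 80)
    loss = contrastive_loss(encoder(seg_a), encoder(seg_b))
    loss.backward()
    print(f"contrastive loss: {loss.item():.3f}")
```

In practice the dummy tensors would be replaced by mel-spectrogram segments with augmentations such as additive noise or reverberation, which is what gives the learned embeddings their robustness to acoustic conditions.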
Papers
A comprehensive study on self-supervised distillation for speaker representation learning
Zhengyang Chen, Yao Qian, Bing Han, Yanmin Qian, Michael Zeng
Hierarchical speaker representation for target speaker extraction
Shulin He, Huaiwen Zhang, Wei Rao, Kanghao Zhang, Yukai Ju, Yang Yang, Xueliang Zhang
Generation of Speaker Representations Using Heterogeneous Training Batch Assembly
Yu-Huai Peng, Hung-Shin Lee, Pin-Tuan Huang, Hsin-Min Wang
Multi-target Extractor and Detector for Unknown-number Speaker Diarization
Chin-Yi Cheng, Hung-Shin Lee, Yu Tsao, Hsin-Min Wang
Multi-scale Speaker Diarization with Dynamic Scale Weighting
Tae Jin Park, Nithin Rao Koluguri, Jagadeesh Balam, Boris Ginsburg