Speaker Embeddings
Speaker embeddings are numerical representations of speakers' voices that aim to capture each speaker's distinctive vocal characteristics for tasks such as speaker recognition, diarization, and speech synthesis. Current research focuses on three directions: making embeddings more robust to noise and recording variation (e.g., through disentanglement techniques and adversarial training), making them more useful in multi-speaker scenarios (e.g., using recursive attention pooling and demultiplexing), and integrating them with other models (e.g., large language models and speech enhancement systems). These advances could improve the accuracy and efficiency of speech processing applications, and they also support privacy-preserving techniques and more natural-sounding speech synthesis.
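As a rough illustration of how such embeddings are commonly used in speaker recognition, the sketch below compares two embedding vectors with cosine similarity and accepts them as the same speaker when the score clears a threshold. Everything in it is an assumption made for illustration: the function names `cosine_similarity` and `same_speaker`, the 0.7 threshold, and the toy 3-dimensional vectors are hypothetical, and real embeddings come from a trained model and usually have hundreds of dimensions. It is not taken from any of the papers listed on this page.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("embeddings must be non-zero")
    return dot / (norm_a * norm_b)


def same_speaker(a, b, threshold=0.7):
    """Decide whether two embeddings belong to one speaker.

    The threshold is a hypothetical value; in practice it is tuned
    on held-out trials for the specific embedding model.
    """
    return cosine_similarity(a, b) >= threshold


# Toy vectors standing in for model outputs (illustrative only).
enrolled = [0.9, 0.1, 0.4]
probe_same = [0.8, 0.2, 0.5]
probe_other = [-0.3, 0.9, 0.1]

print(same_speaker(enrolled, probe_same))
print(same_speaker(enrolled, probe_other))
```

Cosine similarity ignores vector length and compares direction only, which is one reason it is a common first choice for scoring embeddings. Other back-ends, such as PLDA, are also widely used.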
Papers
Stuttering Detection Using Speaker Representations and Self-supervised Contextual Embeddings
Shakeel A. Sheikh, Md Sahidullah, Fabrice Hirsch, Slim Ouni
Encoder-decoder multimodal speaker change detection
Jee-weon Jung, Soonshin Seo, Hee-Soo Heo, Geonmin Kim, You Jin Kim, Young-ki Kwon, Minjae Lee, Bong-Jin Lee
A Teacher-Student approach for extracting informative speaker embeddings from speech mixtures
Tobias Cord-Landwehr, Christoph Boeddeker, Cătălin Zorilă, Rama Doddipatla, Reinhold Haeb-Umbach