Speaker Embeddings

Speaker embeddings are numerical representations of speakers' voices, aiming to capture unique vocal characteristics for tasks like speaker recognition, diarization, and speech synthesis. Current research focuses on improving embedding robustness to noise and variations (e.g., through disentanglement techniques and adversarial training), enhancing their utility in multi-speaker scenarios (e.g., using recursive attention pooling and demultiplexing), and integrating them with other models (e.g., large language models and speech enhancement systems). These advancements have significant implications for improving the accuracy and efficiency of various speech processing applications, including improved privacy-preserving techniques and more natural-sounding speech synthesis.

Papers

April 24, 2022

April 22, 2022

Unifying Cosine and PLDA Back-ends for Speaker Verification
Zhiyuan Peng, Xuanji He, Ke Ding, Tan Lee, Guanglu Wan
Speaker Verification Speaker Embeddings Cosine Similarity Probabilistic Linear Discriminant Analysis

April 21, 2022

The NIST CTS Speaker Recognition Challenge
Seyed Omid Sadjadi, Craig Greenberg, Elliot Singer, Lisa Mason, Douglas Reynolds
Speech Recognition Speaker Embeddings

April 11, 2022

April 8, 2022

Self-supervised Speaker Diarization
Yehoshua Dissen, Felix Kreuk, Joseph Keshet
Speaker Verification Speaker Diarization Speaker Embeddings Speaker Representation Self Supervised Speaker Verification

April 4, 2022

April 3, 2022

Content-Dependent Fine-Grained Speaker Embedding for Zero-Shot Speaker Adaptation in Text-to-Speech Synthesis
Yixuan Zhou, Changhe Song, Xiang Li, Luwen Zhang, Zhiyong Wu, Yanyao Bian, Dan Su, Helen Meng
Fine Grained Speaker Embeddings Speech Encoder Speaker Similarity Text to Speech Synthesis Zero Shot Speaker Adaptation

March 31, 2022

Improved Relation Networks for End-to-End Speaker Verification and Identification
Ashutosh Chaubey, Sparsh Sinha, Susmita Ghose
End to End Speaker Verification Person Identification Speaker Embeddings Speaker Identification RelAtion Network Unseen Speaker

March 29, 2022

VoiceMe: Personalized voice generation in TTS
Pol van Rijn, Silvan Mertes, Dominik Schiller, Piotr Dura, Hubert Siuzdak, Peter M. C. Harrison, Elisabeth André, Nori Jacoby
Speaker Verification Speaker Embeddings Speaker Information Personalized Speech

March 23, 2022

A Scalable Model Specialization Framework for Training and Inference using Submodels and its Application to Speech Model Personalization
Fadi Biadsy, Youzheng Chen, Xia Zhang, Oleg Rybakov, Andrew Rosenberg, Pedro J. Moreno
Training Data Fine Tuning Speech Recognition Scientific Inference Speaker Embeddings

March 18, 2022

March 17, 2022

TMS: A Temporal Multi-scale Backbone Design for Speaker Embedding
Ruiteng Zhang, Jianguo Wei, Xugang Lu, Wenhuan Lu, Di Jin, Junhai Xu, Lin Zhang, Yantao Ji, Jianwu Dang
Speaker Verification Speaker Embeddings Time Scale Dual Branch Speech Driven Discriminative Reply

February 28, 2022

Magnitude-aware Probabilistic Speaker Embeddings
Nikita Kuzmin, Igor Fedorov, Alexey Sholokhov
Speech Recognition Speaker Verification Speaker Embeddings Hyperspherical Learning Euclidean Embeddings

February 14, 2022

Tight integration of neural- and clustering-based diarization through deep unfolding of infinite Gaussian mixture model
Keisuke Kinoshita, Marc Delcroix, Tomoharu Iwata
Speaker Diarization Speaker Embeddings Mixture Model Deep Unfolding Diarization System Coupling Mechanism Diarization Error Rate

February 11, 2022

The xmuspeech system for multi-channel multi-party meeting transcription challenge
Jie Wang, Yuji Liu, Binling Wang, Yiming Zhi, Song Li1, Shipeng Xia, Jiayang Zhang, Lin Li1, Qingyang Hong, Feng Tong
Sequence to Sequence Speaker Diarization Speaker Embeddings Multi Channel Multi Party Meeting

December 16, 2021

Bootstrap Equilibrium and Probabilistic Speaker Representation Learning for Self-supervised Speaker Verification
Sung Hwan Mun, Min Hyun Han, Dongjune Lee, Jihwan Kim, Nam Soo Kim
Speaker Embeddings Speaker Representation Equilibrium State Self Supervised Speaker Verification

Speaker Embeddings

Papers

An Item Response Theory Framework for Persuasion

Dictionary Attacks on Speaker Verification

Unifying Cosine and PLDA Back-ends for Speaker Verification

The NIST CTS Speaker Recognition Challenge

Tokenwise Contrastive Pretraining for Finer Speech-to-BERT Alignment in End-to-End Speech-to-Intent Systems

Speech Sequence Embeddings using Nearest Neighbors Contrastive Learning

Self-supervised Speaker Diarization

Introducing ECAPA-TDNN and Wav2Vec2.0 Embeddings to Stuttering Detection

Target Confusion in End-to-end Speaker Extraction: Analysis and Approaches

Content-Dependent Fine-Grained Speaker Embedding for Zero-Shot Speaker Adaptation in Text-to-Speech Synthesis

Improved Relation Networks for End-to-End Speaker Verification and Identification

VoiceMe: Personalized voice generation in TTS

A Scalable Model Specialization Framework for Training and Inference using Submodels and its Application to Speech Model Personalization

Speaker Embedding-aware Neural Diarization: an Efficient Framework for Overlapping Speech Diarization in Meeting Scenarios

DGC-vector: A new speaker embedding for zero-shot voice conversion

TMS: A Temporal Multi-scale Backbone Design for Speaker Embedding

Magnitude-aware Probabilistic Speaker Embeddings

Tight integration of neural- and clustering-based diarization through deep unfolding of infinite Gaussian mixture model

The xmuspeech system for multi-channel multi-party meeting transcription challenge

Bootstrap Equilibrium and Probabilistic Speaker Representation Learning for Self-supervised Speaker Verification