Speaker Information

Extracting and using speaker information is central to speech processing: the goal is to identify and isolate individual speakers in audio recordings despite background noise or overlapping speech. Current research focuses on robust models, often built on transformer-based architectures and techniques such as prompt learning, particularly for challenging scenarios with multiple speakers or low-resource languages. These advances matter for applications such as meeting transcription, voice assistants, and personalized speech technologies, where they improve accessibility and user experience.

Papers

May 22, 2023

Target Active Speaker Detection with Audio-visual Cues
Yidi Jiang, Ruijie Tao, Zexu Pan, Haizhou Li
Audio Visual Speaker Information Active Speaker Detection Target Speaker Voice Activity Detection Audio Visual Cue Ava ActiveSpeaker

March 13, 2023

Analysing the Masked predictive coding training criterion for pre-training a Speech Representation Model
Hemant Yadav, Sunayana Sitaram, Rajiv Ratn Shah
Pre Training Speech Representation Speaker Information Learned Model

March 2, 2023

Speaker-Aware Anti-Spoofing
Xuechen Liu, Md Sahidullah, Kong Aik Lee, Tomi Kinnunen
Speaker Information Speaker Independent Spoofing Aware Speaker Verification Speaker Conditioning

February 20, 2023

Personalized speech enhancement combining band-split RNN and speaker attentive module
Xiaohuai Le, Li Chen, Chao He, Yiqing Guo, Cheng Chen, Xianjun Xia, Jing Lu
Speech Enhancement Attention Module Speaker Information Speech Driven Speech Enhancement Model Personalized Speech Enhancement Signal Processing Grand Challenge

January 18, 2023

KILDST: Effective Knowledge-Integrated Learning for Dialogue State Tracking using Gazetteer and Speaker Information
Hyungtak Choi, Hyeonmok Ko, Gurpreet Kaur, Lohith Ravuru, Kiranmayi Gandikota, Manisha Jhawar, Simma Dharani, Pranamya Patil
Dialogue System Conversational AI Dialogue State Tracking Speaker Information Dialogue State Knowledge Infused Learning Geonames Gazetteer

October 20, 2022

Large-scale learning of generalised representations for speaker recognition
Jee-weon Jung, Hee-Soo Heo, Bong-Jin Lee, Jaesong Lee, Hye-jin Shim, Youngki Kwon, Joon Son Chung, Shinji Watanabe
Speaker Recognition Speaker Information Generalizable Representation Large Scale Learning Ecapa TDNN Speaker Recognition Model High Quality Training Datasets

June 28, 2022

Speaker Verification in Multi-Speaker Environments Using Temporal Feature Fusion
Ahmad Aloradi, Wolfgang Mack, Mohamed Elminshawi, Emanuël A. P. Habets
Speaker Verification Temporal Feature Multi Speaker Speaker Information

June 20, 2022

COVYT: Introducing the Coronavirus YouTube and TikTok speech dataset featuring the same speakers with and without infection
Andreas Triantafyllopoulos, Anastasia Semertzidou, Meishu Song, Florian B. Pokorny, Björn W. Schuller
Covid 19 Speaker Information TikTok Video 19 Dataset Infection Risk

June 14, 2022

Exploring speaker enrolment for few-shot personalisation in emotional vocalisation prediction
Andreas Triantafyllopoulos, Meishu Song, Zijiang Yang, Xin Jing, Björn W. Schuller
Encoder Side Emotion Prediction Speaker Information Encoder Model Emotion Encoder Shot Personalization

June 6, 2022

Online Neural Diarization of Unlimited Numbers of Speakers Using Global and Local Attractors
Shota Horiguchi, Shinji Watanabe, Paola Garcia, Yuki Takashima, Yohei Kawaguchi
World Event Speaker Diarization Speaker Information Neural Diarization Unknown Number Different Attractor Speaker Attractor

May 17, 2022

Dynamic Recognition of Speakers for Consent Management by Contrastive Embedding Replay
Arash Shahmansoori, Utz Roedig
Speaker Recognition Contrastive Example Speaker Information Speaker Similarity RePLAy Loss Consent Management

April 26, 2022

You Don't Know My Favorite Color: Preventing Dialogue Representations from Revealing Speakers' Private Personas
Haoran Li, Yangqiu Song, Lixin Fan
Chatbot Response Private Data Model Inversion Attack Color Object Speaker Information Dialogue Representation Social Chatbots

March 31, 2022

EEND-SS: Joint End-to-End Neural Speaker Diarization and Speech Separation for Flexible Number of Speakers
Soumi Maiti, Yushi Ueda, Shinji Watanabe, Chunlei Zhang, Meng Yu, Shi-Xiong Zhang, Yong Xu
End to End Speaker Diarization Speech Separation Speaker Information Neural Diarization End to End Neural Diarization Unknown Number

March 30, 2022

Probing phoneme, language and speaker information in unsupervised speech representations

Coarse-to-Fine Recursive Speech Separation for Unknown Number of Speakers

March 29, 2022

VoiceMe: Personalized voice generation in TTS
Pol van Rijn, Silvan Mertes, Dominik Schiller, Piotr Dura, Hubert Siuzdak, Peter M. C. Harrison, Elisabeth André, Nori Jacoby
Speaker Verification Speaker Embeddings Speaker Information Personalized Speech

March 16, 2022

Speaker Information Can Guide Models to Better Inductive Biases: A Case Study On Predicting Code-Switching
Alissa Ostapenko, Shuly Wintner, Melinda Fricke, Yulia Tsvetkov
Natural Language Processing Full Model Case Study Inductive Bias Linguistic Information Speaker Information Code Switching

January 15, 2022

KazakhTTS2: Extending the Open-Source Kazakh TTS Corpus With More Data, Speakers, and Topics
Saida Mussakhojayeva, Yerbolat Khassanov, Huseyin Atakan Varol
Text to Speech Speaker Information Significant Topic Tt System

November 28, 2021

Speaker Embedding-aware Neural Diarization for Flexible Number of Speakers with Textual Information
Zhihao Du, Shiliang Zhang, Siqi Zheng, Weilong Huang, Ming Lei
Text Modality Speaker Embeddings Speaker Information Neural Diarization Diarization Error Rate Speaker Label Unknown Number

November 7, 2021

Retrieving Speaker Information from Personalized Acoustic Models for Speech Recognition
Salima Mdhaffar, Jean-François Bonastre, Marc Tommasi, Natalia Tomashenko, Yannick Estève
Speech Recognition Speaker Verification Speaker Identity Acoustic Model Speaker Information