Speech Representation

Speech representation research develops numerical encodings (embeddings) of spoken language that capture both linguistic content and speaker-specific characteristics for downstream tasks such as speech recognition and voice conversion. Current work relies heavily on transformer-based architectures trained with self-supervised objectives, chiefly masked prediction (as in HuBERT) and contrastive learning (as in wav2vec 2.0), which let models learn robust representations from large unlabeled datasets. These methods have improved efficiency and accuracy in automatic speech recognition, speaker identification, and speech synthesis, and layer-wise analyses of the resulting models are revealing what kind of information each layer encodes. A further line of work aims to disentangle content from speaker information within these representations, with the goal of making models more robust and versatile.
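The two self-supervised objectives named above can be sketched in a few lines. The following is a minimal, framework-free illustration under stated assumptions, not the implementation of any paper listed here: the span-masking defaults (mask_prob=0.065, span=10) loosely follow values reported for wav2vec 2.0, the discrete targets for masked prediction are assumed to be given (HuBERT, for example, uses k-means cluster IDs), and all function names are hypothetical.

```python
import math
import random


def sample_span_mask(num_frames, mask_prob=0.065, span=10, rng=random):
    """Each frame starts a masked span with probability mask_prob.
    Spans may overlap and are clipped at the end of the sequence.
    Defaults are assumptions loosely based on wav2vec 2.0."""
    mask = [False] * num_frames
    for start in range(num_frames):
        if rng.random() < mask_prob:
            for t in range(start, min(start + span, num_frames)):
                mask[t] = True
    return mask


def log_softmax(scores):
    """Numerically stable log-softmax over a list of scores."""
    m = max(scores)
    log_z = m + math.log(sum(math.exp(s - m) for s in scores))
    return [s - log_z for s in scores]


def masked_prediction_loss(logits, targets, mask):
    """Masked prediction: cross-entropy against discrete pseudo-labels,
    averaged over masked frames only. logits[t] has one score per cluster."""
    losses = [-log_softmax(row)[y]
              for row, y, masked in zip(logits, targets, mask) if masked]
    return sum(losses) / len(losses) if losses else 0.0


def cosine(u, v):
    """Cosine similarity with a small epsilon to avoid division by zero."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / (norm + 1e-8)


def info_nce_loss(context, positive, negatives, temperature=0.1):
    """Contrastive (InfoNCE) loss for one masked step: the context vector
    must pick the true target (index 0) out of the distractors."""
    candidates = [positive] + list(negatives)
    scores = [cosine(context, c) / temperature for c in candidates]
    return -log_softmax(scores)[0]


if __name__ == "__main__":
    rng = random.Random(0)
    T, K = 50, 8  # toy sizes: 50 frames, 8 pseudo-label clusters
    mask = sample_span_mask(T, rng=rng)
    logits = [[rng.gauss(0.0, 1.0) for _ in range(K)] for _ in range(T)]
    targets = [rng.randrange(K) for _ in range(T)]
    print("masked frames:", sum(mask))
    print("masked-prediction loss:", masked_prediction_loss(logits, targets, mask))
```

The key design point both objectives share is that the loss is computed only where the input was hidden, so the encoder must infer the missing content from surrounding context rather than copy it.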

Papers

June 9, 2024

MS-HuBERT: Mitigating Pre-training and Inference Mismatch in Masked Language Modelling methods for learning Speech Representations
Hemant Yadav, Sunayana Sitaram, Rajiv Ratn Shah

June 8, 2024

DAISY: Data Adaptive Self-Supervised Early Exit for Speech Representation Models
Tzu-Quan Lin, Hung-yi Lee, Hao Tang

June 4, 2024

Towards Supervised Performance on Speaker Verification with Self-Supervised Learning by Leveraging Large-Scale ASR Models
Victor Miara, Theo Lepage, Reda Dehak

May 30, 2024

Fill in the Gap! Combining Self-supervised Representation Learning with Neural Audio Synthesis for Speech Inpainting
Ihab Asaad, Maxime Jacquelin, Olivier Perrotin, Laurent Girin, Thomas Hueber

April 2, 2024

BRAVEn: Improving Self-Supervised Pre-training for Visual and Auditory Speech Recognition
Alexandros Haliassos, Andreas Zinonos, Rodrigo Mira, Stavros Petridis, Maja Pantic

April 1, 2024

Removing Speaker Information from Speech Representation using Variable-Length Soft Pooling
Injune Hwang, Kyogu Lee

March 13, 2024

An Efficient End-to-End Approach to Noise Invariant Speech Features via Multi-Task Learning
Heitor R. Guimarães, Arthur Pimentel, Anderson R. Avila, Mehdi Rezagholizadeh, Boxing Chen, Tiago H. Falk

February 16, 2024

Pushing the Limits of Zero-shot End-to-End Speech Translation
Ioannis Tsiamas, Gerard I. Gállego, José A. R. Fonollosa, Marta R. Costa-jussà

February 10, 2024

CochCeps-Augment: A Novel Self-Supervised Contrastive Learning Using Cochlear Cepstrum-based Masking for Speech Emotion Recognition
Ioannis Ziogas, Hessa Alfalahi, Ahsan H. Khandoker, Leontios J. Hadjileontiadis

January 31, 2024

January 16, 2024

January 10, 2024

Self-supervised speech representation and contextual text embedding for match-mismatch classification with EEG recording
Bo Wang, Xiran Xu, Zechen Zhang, Haolin Zhu, YuJie Yan, Xihong Wu, Jing Chen

January 7, 2024

Transfer the linguistic representations from TTS to accent conversion with non-parallel data
Xi Chen, Jiakun Pei, Liumeng Xue, Mingyang Zhang

December 20, 2023

FusDom: Combining In-Domain and Out-of-Domain Knowledge for Continuous Self-Supervised Learning
Ashish Seth, Sreyan Ghosh, S. Umesh, Dinesh Manocha

December 15, 2023

LiteVSR: Efficient Visual Speech Recognition by Learning from Speech Representations of Unlabeled Data
Hendrik Laux, Emil Mededovic, Ahmed Hallawa, Lukas Martin, Arne Peine, Anke Schmeink

December 6, 2023

Evaluating Self-supervised Speech Models on a Taiwanese Hokkien Corpus
Yi-Hui Chou, Kalvin Chang, Meng-Ju Wu, Winston Ou, Alice Wen-Hsin Bi, Carol Yang, Bryan Y. Chen, Rong-Wei Pai, Po-Yen Yeh, Jo-Peng Chiang, Iu-Tshian Phoann, Winnie Chang, Chenxuan Cui, Noel Chen, Jiatong Shi

November 15, 2023

R-Spin: Efficient Speaker and Noise-invariant Representation Learning with Acoustic Pieces
Heng-Jui Chang, James Glass

November 14, 2023

Reimagining Speech: A Scoping Review of Deep Learning-Powered Voice Conversion
Anders R. Bargum, Stefania Serafin, Cumhur Erkut

Speech Representation

Papers

EnCLAP: Combining Neural Audio Codec and Audio-Text Joint Embedding for Automated Audio Captioning

What Do Self-Supervised Speech and Speaker Models Learn? New Findings From a Cross Model Layer-Wise Analysis

Revisiting Self-supervised Learning of Speech Representation from a Mutual Information Perspective

Learning Disentangled Speech Representations with Contrastive Learning and Time-Invariant Retrieval
