Self-Supervised Speech Representation
Self-supervised speech representation learning aims to learn general-purpose speech embeddings from large amounts of unlabeled audio, improving downstream tasks such as speech recognition and enhancement without relying heavily on transcribed data. Current research focuses on refining model architectures such as Wav2Vec 2.0, HuBERT, and XLSR, investigating the properties of the learned representations (e.g., whether speaker and phonetic information are encoded orthogonally), and addressing performance biases across different language varieties. The field matters because it enables speech technology for low-resource languages and diverse speaker populations, while also offering insight into the fundamental nature of speech representation itself.
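As a concrete illustration of how such representations are typically consumed, the sketch below uses a pretrained Wav2Vec 2.0 encoder to map raw audio to frame-level embeddings. This is a minimal sketch assuming the Hugging Face transformers and torch packages and the public facebook/wav2vec2-base checkpoint; it is not tied to any of the papers listed below, and HuBERT or XLSR checkpoints can be used analogously via their corresponding model classes.

```python
# Minimal sketch: extracting self-supervised speech representations with a
# pretrained Wav2Vec 2.0 encoder (Hugging Face transformers, assumed here).
import torch
from transformers import Wav2Vec2FeatureExtractor, Wav2Vec2Model

extractor = Wav2Vec2FeatureExtractor.from_pretrained("facebook/wav2vec2-base")
model = Wav2Vec2Model.from_pretrained("facebook/wav2vec2-base")
model.eval()

# One second of dummy 16 kHz audio; in practice, load a real waveform
# (e.g., with torchaudio or soundfile) resampled to 16 kHz.
waveform = torch.randn(16000)

inputs = extractor(waveform.numpy(), sampling_rate=16000, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# Frame-level embeddings of shape (batch, frames, hidden_size),
# roughly (1, 49, 768) for one second of audio at a 20 ms frame stride.
embeddings = outputs.last_hidden_state
print(embeddings.shape)
```

These frame-level vectors are what downstream heads (for recognition, speaker analysis, or enhancement) build on, often with little or no fine-tuning of the pretrained encoder.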
Papers
Self-Supervised Speech Representations Preserve Speech Characteristics while Anonymizing Voices
Abner Hernandez, Paula Andrea Pérez-Toro, Juan Camilo Vásquez-Correa, Juan Rafael Orozco-Arroyave, Andreas Maier, Seung Hee Yang
Cross-lingual Self-Supervised Speech Representations for Improved Dysarthric Speech Recognition
Abner Hernandez, Paula Andrea Pérez-Toro, Elmar Nöth, Juan Rafael Orozco-Arroyave, Andreas Maier, Seung Hee Yang