Wav2vec U

Wav2vec U and related models such as Wav2vec 2.0, on whose representations it builds, are self-supervised learning frameworks that learn speech representations from raw audio. Current research applies these pre-trained models to diverse downstream tasks, including speech recognition, emotion recognition, and voice disorder diagnosis, often using transfer learning and multi-task learning architectures to improve performance, particularly in low-resource scenarios. Because pre-training does not require transcribed audio, the approach is well suited to noisy or limited labeled datasets, with applications in healthcare, language technology, and beyond.

Papers

June 27, 2022

Wav2Vec-Aug: Improved self-supervised training with limited data
Anuroop Sriram, Michael Auli, Alexei Baevski
Data Augmentation Self Supervised Speech Representation Limited Data Wav2vec U Self Supervised Training Word Error Rate

April 11, 2022

Fusion of Self-supervised Learned Models for MOS Prediction
Zhengdong Yang, Wangjin Zhou, Chenhui Chu, Sheng Li, Raj Dabre, Raphael Rubino, Yi Zhao
Automatic Speech Recognition Hybrid Fusion Synthesized Speech Self Supervised Model Wav2vec U Mean Opinion Score Prediction Challenge

April 6, 2022

A Wav2vec2-Based Experimental Study on Self-Supervised Learning Methods to Improve Child Speech Recognition
Rishabh Jain, Andrei Barcovschi, Mariam Yiwere, Dan Bigioi, Peter Corcoran, Horia Cucu
Wav2vec U Self Supervised Learning Method Child Speech Child Speech Recognition

April 5, 2022

Towards End-to-end Unsupervised Speech Recognition
Alexander H. Liu, Wei-Ning Hsu, Michael Auli, Alexei Baevski
Automatic Speech Recognition Audio Processing Wav2vec U Unsupervised Automatic Speech Recognition

April 4, 2022

Cross-lingual Self-Supervised Speech Representations for Improved Dysarthric Speech Recognition
Abner Hernandez, Paula Andrea Pérez-Toro, Elmar Nöth, Juan Rafael Orozco-Arroyave, Andreas Maier, Seung Hee Yang
Speech Representation Self Supervised Speech Representation Wav2vec U Dysarthric Speech Dysarthric Speech Recognition Hypokinetic Dysarthria

April 2, 2022

Speaker adaptation for Wav2vec2 based dysarthric ASR
Murali Karthick Baskar, Tim Herzig, Diana Nguyen, Mireia Diez, Tim Polzehl, Lukáš Burget, Jan "Honza" Černocký
Speaker Adaptation Wav2vec U Dysarthric Speech Recognition

March 31, 2022

WavThruVec: Latent speech representation as intermediate features for neural speech synthesis
Hubert Siuzdak, Piotr Dura, Pol van Rijn, Nori Jacoby
Speech Corpus Wav2vec U Neural Speech Synthesis Latent Speech Intermediate Speech Representation

March 28, 2022

Training speaker recognition systems with limited data
Nik Vaessen, David A. van Leeuwen
Automatic Speech Recognition Limited Data Speaker Recognition Wav2vec U Speaker Recognition System Voxceleb2 Dataset

March 24, 2022

Automatic Speech Recognition for Speech Assessment of Persian Preschool Children
Amirhossein Abaskohi, Fatemeh Mortazavi, Hadi Moradi
Automatic Speech Recognition Wav2vec U Spoken Language Assessment Common Voice

March 3, 2022

The Vicomtech Audio Deepfake Detection System based on Wav2Vec2 for the 2022 ADD Challenge
Juan M. Martín-Doñas, Aitor Álvarez
Speech Representation Deep Fake Open Challenge Wav2vec U Audio Codec
