Self-Supervised Speech Representation
Self-supervised speech representation learning aims to learn general-purpose speech embeddings from large amounts of unlabeled audio, improving downstream tasks such as speech recognition and enhancement without relying heavily on transcribed data. Current research focuses on refining model architectures such as Wav2Vec 2.0, HuBERT, and XLSR, investigating the properties of the learned representations (e.g., whether speaker and phonetic information are encoded orthogonally), and addressing performance biases across different language varieties. The field matters because it enables speech technology for low-resource languages and diverse speaker populations, while also offering insight into the fundamental nature of speech representation itself.
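As a concrete illustration of how such representations are typically consumed, the sketch below uses a pretrained Wav2Vec 2.0 encoder to map raw audio to frame-level embeddings. This is a minimal sketch assuming the Hugging Face transformers and torch packages and the public facebook/wav2vec2-base checkpoint; it is not tied to any of the papers listed below, and HuBERT or XLSR checkpoints can be used analogously via their corresponding model classes.

```python
# Minimal sketch: extracting self-supervised speech representations with a
# pretrained Wav2Vec 2.0 encoder (Hugging Face transformers, assumed here).
import torch
from transformers import Wav2Vec2FeatureExtractor, Wav2Vec2Model

extractor = Wav2Vec2FeatureExtractor.from_pretrained("facebook/wav2vec2-base")
model = Wav2Vec2Model.from_pretrained("facebook/wav2vec2-base")
model.eval()

# One second of dummy 16 kHz audio; in practice, load a real waveform
# (e.g., with torchaudio or soundfile) resampled to 16 kHz.
waveform = torch.randn(16000)

inputs = extractor(waveform.numpy(), sampling_rate=16000, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# Frame-level embeddings of shape (batch, frames, hidden_size),
# roughly (1, 49, 768) for one second of audio at a 20 ms frame stride.
embeddings = outputs.last_hidden_state
print(embeddings.shape)
```

These frame-level vectors are what downstream heads (for recognition, speaker analysis, or enhancement) build on, often with little or no fine-tuning of the pretrained encoder.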
Papers
Self-Supervised Speech Representations Preserve Speech Characteristics while Anonymizing Voices
Abner Hernandez, Paula Andrea Pérez-Toro, Juan Camilo Vásquez-Correa, Juan Rafael Orozco-Arroyave, Andreas Maier, Seung Hee Yang
Cross-lingual Self-Supervised Speech Representations for Improved Dysarthric Speech Recognition
Abner Hernandez, Paula Andrea Pérez-Toro, Elmar Nöth, Juan Rafael Orozco-Arroyave, Andreas Maier, Seung Hee Yang