Universal Speech Representation
Universal speech representation aims to create a common, language-independent encoding of speech audio, enabling models to generalize across diverse languages and tasks. Current research focuses on building robust and efficient models, often combining self-supervised learning with transformer architectures such as Whisper-style encoders, with particular emphasis on generalization across different datasets and noise conditions and on reducing model size for practical deployment. This work has significant implications for applications such as speech recognition in low-resource languages, deepfake detection, and health diagnostics from speech analysis.
Papers
Language-universal phonetic encoder for low-resource speech recognition
Siyuan Feng, Ming Tu, Rui Xia, Chuanzeng Huang, Yuxuan Wang
Language-Universal Phonetic Representation in Multilingual Speech Pretraining for Low-Resource Speech Recognition
Siyuan Feng, Ming Tu, Rui Xia, Chuanzeng Huang, Yuxuan Wang