Pre-Trained Speech Models
Pre-trained speech models leverage large datasets to learn robust representations of speech, enabling efficient adaptation to downstream tasks such as speech recognition, emotion recognition, and speaker verification. Current research emphasizes improving the efficiency and robustness of these models, focusing on techniques like adapter tuning and prompt engineering, as well as training strategies that incorporate auxiliary signals such as textual data or brain activations to refine the learned representations. This work matters because it reduces reliance on extensive labeled data, improves performance on low-resource languages and in challenging acoustic conditions, and supports the development of more versatile and accurate speech processing systems across diverse applications.
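To make the adapter-tuning idea concrete, here is a minimal sketch of a bottleneck adapter: a small trainable module inserted into a frozen pre-trained layer, consisting of a down-projection, a nonlinearity, an up-projection, and a residual connection. The dimensions, weights, and function names below are illustrative, not taken from any specific paper above; real implementations would use a deep-learning framework and insert the adapter inside each transformer block.

```python
def matvec(W, x):
    """Multiply a matrix W (given as a list of rows) by a vector x."""
    return [sum(w_ij * x_j for w_ij, x_j in zip(row, x)) for row in W]

def relu(v):
    """Elementwise ReLU nonlinearity."""
    return [max(0.0, a) for a in v]

def adapter(x, W_down, W_up):
    """Bottleneck adapter: down-project to a small dimension, apply a
    nonlinearity, up-project back, then add a residual connection so the
    module behaves like the identity before training."""
    h = relu(matvec(W_down, x))   # hidden size d -> bottleneck size r
    out = matvec(W_up, h)         # bottleneck size r -> hidden size d
    return [x_i + o_i for x_i, o_i in zip(x, out)]

# Toy example: frozen hidden size d = 4, adapter bottleneck r = 2.
# Only W_down and W_up would be trained; the backbone stays frozen.
W_down = [[0.1, 0.0, 0.0, 0.0],
          [0.0, 0.1, 0.0, 0.0]]
W_up   = [[0.0, 0.0],
          [0.0, 0.0],
          [0.0, 0.0],
          [0.0, 0.0]]  # zero-initialized up-projection => identity at start

x = [1.0, 2.0, 3.0, 4.0]
print(adapter(x, W_down, W_up))  # → [1.0, 2.0, 3.0, 4.0]
```

Because the up-projection starts at zero, the adapter initially passes activations through unchanged, which is the standard trick for stable fine-tuning: the pre-trained model's behavior is preserved at the start, and only the few adapter parameters move during training.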
Papers
Refining Self-Supervised Learnt Speech Representation using Brain Activations
Hengyu Li, Kangdi Mei, Zhaoci Liu, Yang Ai, Liping Chen, Jie Zhang, Zhenhua Ling
Attentive Merging of Hidden Embeddings from Pre-trained Speech Model for Anti-spoofing Detection
Zihan Pan, Tianchi Liu, Hardik B. Sailor, Qiongqiong Wang