Speech Encoder
Speech encoders are core components of many speech processing systems: they convert raw audio into representations that downstream tasks such as speech recognition, translation, and synthesis can build on. Current research focuses on making encoders robust to noise and to variation in speaking style, typically using transformer-based architectures and self-supervised learning to improve both generalization and efficiency. These advances support more accurate and more natural-sounding speech technologies, as well as better spoken language understanding in diverse and low-resource settings.
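To make "convert raw audio into representations" concrete, here is a minimal sketch of the shape transformation a transformer-style encoder performs: a waveform is cut into overlapping frames, each frame becomes a spectral feature vector, and a self-attention layer mixes information across time. It is not the method of any paper listed here. The 16 kHz sample rate, 25 ms / 10 ms framing, log-magnitude spectrum front-end, single attention head, and random (untrained) weights are all illustrative assumptions; the function names are hypothetical. Real encoders typically use mel filterbanks or a learned convolutional front-end, many stacked layers, and weights learned with supervised or self-supervised objectives.

```python
import numpy as np

def frame_signal(wave, sample_rate=16000, win_ms=25.0, hop_ms=10.0):
    """Split a 1-D waveform into overlapping, Hann-windowed frames."""
    win = int(sample_rate * win_ms / 1000)   # 400 samples at 16 kHz
    hop = int(sample_rate * hop_ms / 1000)   # 160 samples at 16 kHz
    if len(wave) < win:
        raise ValueError("waveform shorter than one analysis window")
    n_frames = 1 + (len(wave) - win) // hop
    # Row t holds sample indices [t*hop, t*hop + win).
    idx = np.arange(win)[None, :] + hop * np.arange(n_frames)[:, None]
    return wave[idx] * np.hanning(win)

def log_spectrum(frames, n_fft=512):
    """Log-magnitude spectrum per frame: shape (n_frames, n_fft // 2 + 1)."""
    mag = np.abs(np.fft.rfft(frames, n=n_fft, axis=-1))
    return np.log(mag + 1e-6)

def self_attention(x, wq, wk, wv):
    """Single-head scaled dot-product self-attention over the time axis."""
    q, k, v = x @ wq, x @ wk, x @ wv
    scores = q @ k.T / np.sqrt(q.shape[-1])
    scores -= scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v

def encode(wave, d_model=64, seed=0):
    """Toy encoder: waveform -> (n_frames, d_model) representations.

    Weights are random here (assumption for illustration); a real
    encoder would learn them from data.
    """
    rng = np.random.default_rng(seed)
    feats = log_spectrum(frame_signal(wave))              # (T, 257)
    proj = rng.standard_normal((feats.shape[1], d_model)) / np.sqrt(feats.shape[1])
    h = feats @ proj                                      # (T, d_model)
    wq, wk, wv = (rng.standard_normal((d_model, d_model)) / np.sqrt(d_model)
                  for _ in range(3))
    return h + self_attention(h, wq, wk, wv)              # residual connection
```

For one second of 16 kHz audio (16,000 samples), the framing yields 1 + (16000 - 400) // 160 = 98 frames, so `encode` returns an array of shape (98, 64): one vector per 10 ms step, which is the kind of sequence a downstream recognizer, translator, or synthesizer would consume.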
Papers
R-Spin: Efficient Speaker and Noise-invariant Representation Learning with Acoustic Pieces
Heng-Jui Chang, James Glass
CLN-VC: Text-Free Voice Conversion Based on Fine-Grained Style Control and Contrastive Learning with Negative Samples Augmentation
Yimin Deng, Xulong Zhang, Jianzong Wang, Ning Cheng, Jing Xiao
Direct Text to Speech Translation System using Acoustic Units
Victoria Mingote, Pablo Gimeno, Luis Vicente, Sameer Khurana, Antoine Laurent, Jarod Duret
PromptASR for contextualized ASR with controllable style
Xiaoyu Yang, Wei Kang, Zengwei Yao, Yifan Yang, Liyong Guo, Fangjun Kuang, Long Lin, Daniel Povey