Speech Processing Task
Speech processing research focuses on developing efficient and accurate methods for automatically understanding and manipulating spoken language. Current efforts concentrate on improving model architectures like Transformers and Conformers, leveraging self-supervised learning techniques (e.g., HuBERT, WavLM), and exploring innovative approaches such as prompting and parameter-efficient fine-tuning (e.g., adapters) to enhance performance across diverse tasks (speech recognition, speaker verification, emotion recognition, etc.). These advancements are driving progress in various applications, including virtual assistants, healthcare diagnostics, and multilingual communication technologies, by enabling more robust and resource-efficient systems.
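Of the techniques mentioned above, parameter-efficient fine-tuning with adapters is easy to illustrate: a small bottleneck module with a residual connection is inserted into a frozen backbone, so only the adapter weights are trained. The sketch below is a minimal NumPy illustration of that idea, not any specific paper's implementation; all sizes and names (`d_model`, `d_bottleneck`, `adapter`) are hypothetical.

```python
import numpy as np

def adapter(x, W_down, W_up):
    """Bottleneck adapter: down-project, apply ReLU, up-project,
    then add a residual so the frozen backbone's features pass through."""
    h = np.maximum(0.0, x @ W_down)  # down-projection + ReLU
    return x + h @ W_up              # up-projection + residual connection

rng = np.random.default_rng(0)
d_model, d_bottleneck = 768, 64      # hypothetical hidden and bottleneck sizes
x = rng.standard_normal((1, d_model))

W_down = 0.02 * rng.standard_normal((d_model, d_bottleneck))
W_up = np.zeros((d_bottleneck, d_model))  # zero-init: adapter starts as identity

y = adapter(x, W_down, W_up)
print(np.allclose(y, x))  # True: zero-initialized up-projection leaves x unchanged
```

The zero-initialized up-projection is a common trick so that training starts from the pretrained model's behavior; only `W_down` and `W_up` (here 768×64 + 64×768 parameters, far fewer than a full 768×768 layer) would be updated during fine-tuning.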
Papers
NEST: Self-supervised Fast Conformer as All-purpose Seasoning to Speech Processing Tasks
He Huang, Taejin Park, Kunal Dhawan, Ivan Medennikov, Krishna C. Puvvada, Nithin Rao Koluguri, Weiqing Wang, Jagadeesh Balam, Boris Ginsburg
SpeechPrompt: Prompting Speech Language Models for Speech Processing Tasks
Kai-Wei Chang, Haibin Wu, Yu-Kai Wang, Yuan-Kuei Wu, Hua Shen, Wei-Cheng Tseng, Iu-thing Kang, Shang-Wen Li, Hung-yi Lee