Speech Processing
Speech processing research aims to enable computers to understand, interpret, and generate human speech, focusing on tasks such as speech recognition, synthesis, and enhancement. Current efforts concentrate on improving model efficiency (e.g., via linear-complexity attention mechanisms) and robustness across diverse languages and acoustic conditions, often leveraging large language models and self-supervised learning. These advances are crucial for broadening access to speech technology, with impact spanning healthcare (e.g., depression screening), assistive technologies, and human-computer interaction.
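To make the efficiency point concrete, below is a minimal NumPy sketch of the linear-complexity attention idea mentioned above, in the style of kernel-based linear attention (Katharopoulos et al., "Transformers are RNNs"). The feature map, shapes, and function names are illustrative assumptions, not taken from any of the listed papers: by factoring attention as phi(Q)(phi(K)^T V), the d-by-d summary phi(K)^T V is built once and reused for every query, giving O(n d^2) cost instead of the O(n^2 d) of standard softmax attention.

```python
import numpy as np

def feature_map(x):
    # elu(x) + 1 keeps features positive; a common (assumed) choice
    # for kernel-based linear attention.
    return np.where(x > 0, x + 1.0, np.exp(x))

def linear_attention(Q, K, V, eps=1e-6):
    """Linear-complexity attention sketch: O(n * d^2) instead of O(n^2 * d).

    Instead of softmax(Q K^T) V, compute phi(Q) (phi(K)^T V), reusing the
    d x d summary phi(K)^T V across all n query positions.
    """
    Qf, Kf = feature_map(Q), feature_map(K)       # (n, d) feature-mapped Q, K
    kv = Kf.T @ V                                 # (d, d) summary, built once
    z = Kf.sum(axis=0)                            # (d,) normalizer terms
    return (Qf @ kv) / (Qf @ z + eps)[:, None]    # (n, d) attention output

# Toy usage: sequence length n = 1000, head dimension d = 64.
rng = np.random.default_rng(0)
n, d = 1000, 64
Q, K, V = (rng.standard_normal((n, d)) for _ in range(3))
out = linear_attention(Q, K, V)
print(out.shape)  # (1000, 64)
```

Note the design trade-off this illustrates: the quadratic n x n attention matrix is never materialized, which is what makes long audio sequences tractable, at the cost of replacing the softmax with an approximate kernel feature map.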
Papers
Do learned speech symbols follow Zipf's law?
Shinnosuke Takamichi, Hiroki Maeda, Joonyong Park, Daisuke Saito, Hiroshi Saruwatari
Dynamic-SUPERB: Towards A Dynamic, Collaborative, and Comprehensive Instruction-Tuning Benchmark for Speech
Chien-yu Huang, Ke-Han Lu, Shih-Heng Wang, Chi-Yuan Hsiao, Chun-Yi Kuan, Haibin Wu, Siddhant Arora, Kai-Wei Chang, Jiatong Shi, Yifan Peng, Roshan Sharma, Shinji Watanabe, Bhiksha Ramakrishnan, Shady Shehata, Hung-yi Lee
Spiking-LEAF: A Learnable Auditory front-end for Spiking Neural Networks
Zeyang Song, Jibin Wu, Malu Zhang, Mike Zheng Shou, Haizhou Li