Speech Translation Task
Speech translation converts speech in one language into speech in another, helping to bridge communication barriers. Current research focuses on improving end-to-end models, which translate speech without an intermediate text transcription, alongside cascaded systems that chain automatic speech recognition, machine translation, and text-to-speech. Key challenges include handling noisy audio, coping with limited training data for low-resource languages, and preserving naturalness and speaker characteristics in the output speech. Advances in this field have significant implications for global communication, particularly in healthcare and other multilingual settings.
Papers
Translatotron 3: Speech to Speech Translation with Monolingual Data
Eliya Nachmani, Alon Levkovitch, Yifan Ding, Chulayuth Asawaroengchai, Heiga Zen, Michelle Tadmor Ramanovich
Bridging the Granularity Gap for Acoustic Modeling
Chen Xu, Yuhao Zhang, Chengbo Jiao, Xiaoqian Liu, Chi Hu, Xin Zeng, Tong Xiao, Anxiang Ma, Huizhen Wang, JingBo Zhu