Speech Translation
Speech translation (ST) aims to automatically convert speech in one language into written or spoken form in another, bridging communication barriers. Current research heavily leverages large language models (LLMs) integrated with speech foundation models (SFMs), often employing techniques such as chain-of-thought prompting and multimodal fusion to improve accuracy and reduce latency, particularly in simultaneous ST. These advances are significant for cross-lingual communication in applications ranging from real-time interpretation to accessibility tools, and they are driving innovation in both model architectures and evaluation methodologies.
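To make the SFM-plus-LLM pattern mentioned above concrete, the following is a minimal Python sketch of a cascaded pipeline in which a speech foundation model transcribes the audio and an instruction-tuned LLM translates it using a chain-of-thought style prompt. The helper functions sfm_transcribe and llm_generate, the prompt wording, and the "Translation:" marker are all hypothetical placeholders, not an interface from any of the papers listed below.

# Hedged sketch: cascaded speech translation with an SFM transcriber and an
# LLM translator prompted to reason before producing the final translation.
# sfm_transcribe and llm_generate are hypothetical stand-ins for a real
# speech foundation model and LLM client.

def sfm_transcribe(audio_path: str) -> str:
    """Hypothetical: run a speech foundation model and return a transcript."""
    raise NotImplementedError("plug in an ASR / speech foundation model here")

def llm_generate(prompt: str) -> str:
    """Hypothetical: query an instruction-tuned LLM and return its completion."""
    raise NotImplementedError("plug in an LLM client here")

def translate_speech(audio_path: str, src_lang: str, tgt_lang: str) -> str:
    # Step 1: the SFM produces a source-language transcript.
    transcript = sfm_transcribe(audio_path)

    # Step 2: a chain-of-thought style prompt asks the LLM to note
    # disfluencies and ambiguities before committing to a translation.
    prompt = (
        f"You are translating {src_lang} speech into {tgt_lang}.\n"
        f"Transcript: {transcript}\n"
        "First, briefly note any disfluencies or ambiguous phrases.\n"
        f"Then give the final {tgt_lang} translation on a line starting "
        "with 'Translation:'."
    )
    completion = llm_generate(prompt)

    # Keep only the final translation line, discarding the reasoning steps.
    for line in completion.splitlines():
        if line.startswith("Translation:"):
            return line[len("Translation:"):].strip()
    return completion.strip()

In a simultaneous ST setting, the same structure would be applied to incremental audio chunks rather than a complete utterance, trading translation quality against latency.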
Papers
Translatotron 3: Speech to Speech Translation with Monolingual Data
Eliya Nachmani, Alon Levkovitch, Yifan Ding, Chulayuth Asawaroengchai, Heiga Zen, Michelle Tadmor Ramanovich
CTC-based Non-autoregressive Speech Translation
Chen Xu, Xiaoqian Liu, Xiaowen Liu, Qingxuan Sun, Yuhao Zhang, Murun Yang, Qianqian Dong, Tom Ko, Mingxuan Wang, Tong Xiao, Anxiang Ma, Jingbo Zhu
Improving Speech Translation by Cross-Modal Multi-Grained Contrastive Learning
Hao Zhang, Nianwen Si, Yaqi Chen, Wenlin Zhang, Xukui Yang, Dan Qu, Wei-Qiang Zhang
Decouple Non-parametric Knowledge Distillation For End-to-end Speech Translation
Hao Zhang, Nianwen Si, Yaqi Chen, Wenlin Zhang, Xukui Yang, Dan Qu, Zhen Li