Speech Translation
Speech translation (ST) aims to automatically convert speech in one language into text or speech in another, bridging communication barriers. Current research heavily integrates large language models (LLMs) with speech foundation models (SFMs), employing techniques such as chain-of-thought prompting and multimodal fusion to improve accuracy and reduce latency, particularly in simultaneous ST. These advances improve cross-lingual communication in applications ranging from real-time interpretation to accessibility tools, and are driving innovation in both model architectures and evaluation methodologies.
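To make the simultaneous-ST idea concrete, here is a minimal toy sketch of windowed incremental translation over an unsegmented token stream: translate an overlapping window, commit only its stable left portion, then slide forward. This is an illustrative simplification, not the method of any paper listed below; `translate_stub` is a hypothetical placeholder standing in for a real MT model, and the 1:1 token alignment it implies is an assumption for the sake of the demo.

```python
def translate_stub(tokens):
    # Hypothetical stand-in for a real translation model;
    # it just uppercases tokens (a trivial 1:1 "translation").
    return [t.upper() for t in tokens]

def sliding_window_translate(stream, window=4, stride=2, translate=translate_stub):
    """Incrementally translate an unsegmented token stream.

    Each step translates a fixed-size window, commits only the first
    `stride` output tokens (the part least affected by the missing
    right context), and slides the window forward.
    """
    out = []
    start = 0
    while start < len(stream):
        hyp = translate(stream[start:start + window])
        if start + window >= len(stream):
            out.extend(hyp)       # final window: flush all remaining output
            break
        out.extend(hyp[:stride])  # commit only the stable left part
        start += stride
    return out
```

With the toy 1:1 model the committed output simply reconstructs the full translation, e.g. `sliding_window_translate(["a", "b", "c", "d", "e"])` yields `["A", "B", "C", "D", "E"]`; with a real model, the window/stride trade-off controls latency versus the context available to each translation call.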
Papers
Simple and Effective Unsupervised Speech Translation
Changhan Wang, Hirofumi Inaguma, Peng-Jen Chen, Ilia Kulikov, Yun Tang, Wei-Ning Hsu, Michael Auli, Juan Pino
Simultaneous Translation for Unsegmented Input: A Sliding Window Approach
Sukanta Sen, Ondřej Bojar, Barry Haddow
Discrete Cross-Modal Alignment Enables Zero-Shot Speech Translation
Chen Wang, Yuchen Liu, Boxing Chen, Jiajun Zhang, Wei Luo, Zhongqiang Huang, Chengqing Zong