Speech Translation
Speech translation (ST) aims to automatically convert speech in one language into text or speech in another, bridging communication barriers. Current research increasingly integrates large language models (LLMs) with speech foundation models (SFMs), often employing techniques such as chain-of-thought prompting and multimodal modeling to improve accuracy and reduce latency, particularly in simultaneous ST. These advances matter for cross-lingual communication in applications ranging from real-time interpretation to accessibility tools, and they are driving innovation in both model architectures and evaluation methodologies.
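As a point of reference for the task itself (not the methods of the papers below), the following is a minimal sketch of end-to-end speech-to-text translation using an off-the-shelf Whisper checkpoint from Hugging Face transformers; the zero-filled waveform is a placeholder, and the listed papers go well beyond this baseline (diffusion-based speech synthesis, simultaneous decoding, LLM/SFM integration).

```python
# Minimal speech-to-text translation sketch (illustrative only; not the
# approach of any paper listed below). Assumes `transformers` and `torch`
# are installed and a 16 kHz mono waveform is available.
import numpy as np
from transformers import WhisperProcessor, WhisperForConditionalGeneration

processor = WhisperProcessor.from_pretrained("openai/whisper-small")
model = WhisperForConditionalGeneration.from_pretrained("openai/whisper-small")

# Placeholder audio: one second of silence at 16 kHz. Replace with real speech.
audio = np.zeros(16000, dtype=np.float32)

inputs = processor(audio, sampling_rate=16000, return_tensors="pt")

# Ask the decoder to translate the (assumed French) source speech into English.
forced_ids = processor.get_decoder_prompt_ids(language="french", task="translate")
generated = model.generate(inputs.input_features, forced_decoder_ids=forced_ids)

print(processor.batch_decode(generated, skip_special_tokens=True)[0])
```

Simultaneous ST, the focus of several papers below, instead decodes incrementally from partial audio and must trade translation quality against latency.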
Papers
Diffusion Synthesizer for Efficient Multilingual Speech to Speech Translation
Nameer Hirschkind, Xiao Yu, Mahesh Kumar Nandwana, Joseph Liu, Eloi DuBois, Dao Le, Nicolas Thiebaut, Colin Sinclair, Kyle Spence, Charles Shang, Zoe Abrams, Morgan McGuire
Exploring the Correlation between Human and Machine Evaluation of Simultaneous Speech Translation
Xiaoman Wang, Claudio Fantinuoli
Label-Synchronous Neural Transducer for E2E Simultaneous Speech Translation
Keqi Deng, Philip C. Woodland
Evaluating the IWSLT2023 Speech Translation Tasks: Human Annotations, Automatic Metrics, and Segmentation
Matthias Sperber, Ondřej Bojar, Barry Haddow, Dávid Javorský, Xutai Ma, Matteo Negri, Jan Niehues, Peter Polák, Elizabeth Salesky, Katsuhito Sudoh, Marco Turchi