Speech Translation Benchmark

Speech translation benchmarks evaluate systems that translate speech in one language into another language. Current research focuses on improving end-to-end models, often by leveraging large language models and by using techniques such as data augmentation and knowledge distillation to address data scarcity and improve cross-lingual transfer. This work spans several architectures, including transformer-based encoder-decoder models and cascaded systems (speech recognition followed by machine translation), and aims to improve translation accuracy, particularly in challenging scenarios such as accented speech or specialized terminology. Better speech translation technology has significant implications for cross-cultural communication, accessibility, and multilingual information processing.

Papers

October 3, 2023

Tuning Large language model for End-to-end Speech Translation
Hao Zhang, Nianwen Si, Yaqi Chen, Wenlin Zhang, Xukui Yang, Dan Qu, Xiaolin Jiao
Large Language Model, Large Multimodal Model, Speech Translation, End to End Speech Translation, Modal Translation, Speech Translation Benchmark

June 8, 2023

KIT's Multilingual Speech Translation System for IWSLT 2023
Danni Liu, Thai Binh Nguyen, Sai Koneru, Enes Yavuz Ugan, Ngoc-Quan Pham, Tuan-Nam Nguyen, Tu Anh Dinh, Carlos Mullov, Alexander Waibel, Jan Niehues
Speech Translation, Field Kit, Speech Translation Benchmark

June 1, 2023

Improved Cross-Lingual Transfer Learning For Automatic Speech Translation
Sameer Khurana, Nauman Dawalatabad, Antoine Laurent, Luis Vicente, Pablo Gimeno, Victoria Mingote, James Glass
Cross Lingual Transfer, Speech Translation, Cross Lingual Transfer Learning, Multilingual Encoders, Cross Lingual Knowledge Transfer, Speech Translation Benchmark

December 7, 2022

M3ST: Mix at Three Levels for Speech Translation
Xuxin Cheng, Qianqian Dong, Fengpeng Yue, Tom Ko, Mingxuan Wang, Yuexian Zou
Speech Translation, Training Corpus, Machine Translation System, Multiple Level, End Speech to Text Translation, Speech Translation Benchmark

October 5, 2022

JoeyS2T: Minimalistic Speech-to-Text Modeling with JoeyNMT
Mayumi Ohta, Julia Kreutzer, Stefan Riezler
Speech Translation, Speech to Text, NMT System, Speech Translation Benchmark

March 20, 2022

STEMM: Self-learning with Speech-text Manifold Mixup for Speech Translation
Qingkai Fang, Rong Ye, Lei Li, Yang Feng, Mingxuan Wang
Speech Representation, Speech Translation, Multimodal Sequence, End Speech to Text Translation, Speech Translation Benchmark, Speech Text Manifold Mixup

November 17, 2021

XLS-R: Self-supervised Cross-lingual Speech Representation Learning at Scale
Arun Babu, Changhan Wang, Andros Tjandra, Kushal Lakhotia, Qiantong Xu, Naman Goyal, Kritika Singh, Patrick von Platen, Yatharth Saraf, Juan Pino, Alexei Baevski, Alexis Conneau, Michael Auli
Speech Recognition, Visual Analogue Scale, Cross Lingual, Speech Translation Benchmark, Cross Lingual Speech Representation