Speech Translation
Speech translation (ST) aims to automatically convert speech in one language into text or speech in another, bridging communication barriers. Current research increasingly integrates large language models (LLMs) with speech foundation models (SFMs), often employing techniques such as chain-of-thought prompting and multimodal modeling to improve accuracy and reduce latency, particularly in simultaneous ST. These advances matter for cross-lingual communication in applications ranging from real-time interpretation to accessibility tools, and they are driving innovation in both model architectures and evaluation methodologies.
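As a point of reference for the task itself (not the methods of the papers below), the following is a minimal sketch of end-to-end speech-to-text translation using an off-the-shelf Whisper checkpoint from Hugging Face transformers; the zero-filled waveform is a placeholder, and the listed papers go well beyond this baseline (diffusion-based speech synthesis, simultaneous decoding, LLM/SFM integration).

```python
# Minimal speech-to-text translation sketch (illustrative only; not the
# approach of any paper listed below). Assumes `transformers` and `torch`
# are installed and a 16 kHz mono waveform is available.
import numpy as np
from transformers import WhisperProcessor, WhisperForConditionalGeneration

processor = WhisperProcessor.from_pretrained("openai/whisper-small")
model = WhisperForConditionalGeneration.from_pretrained("openai/whisper-small")

# Placeholder audio: one second of silence at 16 kHz. Replace with real speech.
audio = np.zeros(16000, dtype=np.float32)

inputs = processor(audio, sampling_rate=16000, return_tensors="pt")

# Ask the decoder to translate the (assumed French) source speech into English.
forced_ids = processor.get_decoder_prompt_ids(language="french", task="translate")
generated = model.generate(inputs.input_features, forced_decoder_ids=forced_ids)

print(processor.batch_decode(generated, skip_special_tokens=True)[0])
```

Simultaneous ST, the focus of several papers below, instead decodes incrementally from partial audio and must trade translation quality against latency.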
Papers
Diffusion Synthesizer for Efficient Multilingual Speech to Speech Translation
Nameer Hirschkind, Xiao Yu, Mahesh Kumar Nandwana, Joseph Liu, Eloi DuBois, Dao Le, Nicolas Thiebaut, Colin Sinclair, Kyle Spence, Charles Shang, Zoe Abrams, Morgan McGuire
Exploring the Correlation between Human and Machine Evaluation of Simultaneous Speech Translation
Xiaoman Wang, Claudio Fantinuoli
Label-Synchronous Neural Transducer for E2E Simultaneous Speech Translation
Keqi Deng, Philip C. Woodland
Evaluating the IWSLT2023 Speech Translation Tasks: Human Annotations, Automatic Metrics, and Segmentation
Matthias Sperber, Ondřej Bojar, Barry Haddow, Dávid Javorský, Xutai Ma, Matteo Negri, Jan Niehues, Peter Polák, Elizabeth Salesky, Katsuhito Sudoh, Marco Turchi