Direct Speech to Speech Translation

Direct speech-to-speech translation (S2ST) converts speech in one language into speech in another without producing intermediate text, which can yield faster and more natural-sounding output than cascaded approaches that chain speech recognition, machine translation, and text-to-speech. Current research focuses on improving model efficiency and accuracy through non-autoregressive architectures, pre-training on diverse data (including monolingual and audio-visual data), and discrete speech units as translation targets. These advances matter for bridging language barriers, particularly in low-resource settings, and they apply to real-time interpretation, subtitling, and voice-assisted technologies.
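
As a rough illustration of what "discrete speech units" means, the sketch below maps each feature frame to the index of its nearest centroid and then collapses consecutive repeats into a shorter unit sequence. It is a toy under stated assumptions, not the method of any paper listed here: the 2-D frames, the three centroids, and the function names are all hypothetical. Real systems typically obtain centroids by running k-means over features from a self-supervised speech model.

```python
from itertools import groupby


def nearest_centroid(frame, centroids):
    """Return the index of the centroid closest to `frame` (squared Euclidean distance)."""
    def sq_dist(centroid):
        return sum((x - y) ** 2 for x, y in zip(frame, centroid))
    return min(range(len(centroids)), key=lambda i: sq_dist(centroids[i]))


def frames_to_units(frames, centroids):
    """Quantize every frame to a discrete unit ID (its nearest centroid index)."""
    return [nearest_centroid(frame, centroids) for frame in frames]


def collapse_repeats(units):
    """Merge runs of identical consecutive units into a single unit."""
    return [unit for unit, _ in groupby(units)]


# Hypothetical toy data: three centroids and six 2-D "feature frames".
centroids = [(0.0, 0.0), (1.0, 1.0), (0.0, 2.0)]
frames = [(0.1, -0.1), (0.2, 0.1), (0.9, 1.2), (1.1, 0.8), (0.1, 1.9), (0.0, 0.1)]

units = frames_to_units(frames, centroids)
reduced = collapse_repeats(units)
print(units)    # [0, 0, 1, 1, 2, 0]
print(reduced)  # [0, 1, 2, 0]
```

A translation model can then be trained to predict such a unit sequence for the target speech, with a separate vocoder turning the units back into a waveform.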

Papers

October 26, 2022

Improving Speech-to-Speech Translation Through Unlabeled Text
Xuan-Phi Nguyen, Sravya Popuri, Changhan Wang, Yun Tang, Ilia Kulikov, Hongyu Gong
Machine Translation, Speech Recognition, Text to Speech, Unlabeled Speech, Direct Speech to Speech Translation

October 21, 2022

September 27, 2022

Direct Speech Translation for Automatic Subtitling
Sara Papi, Marco Gaido, Alina Karakanta, Mauro Cettolo, Matteo Negri, Marco Turchi
Audio Visual, Language Pair, Direct Speech to Speech Translation, Well Formed Subtitle

May 25, 2022

TranSpeech: Speech-to-Speech Translation With Bilateral Perturbation
Rongjie Huang, Jinglin Liu, Huadai Liu, Yi Ren, Lichao Zhang, Jinzheng He, Zhou Zhao
Speech Representation, Speech Translation, Direct Speech to Speech Translation, Speech Reconstruction, Audio Modality

May 18, 2022

Leveraging Pseudo-labeled Data to Improve Direct Speech-to-Speech Translation
Qianqian Dong, Fengpeng Yue, Tom Ko, Mingxuan Wang, Qibing Bai, Yu Zhang
Language Pair, Pseudo Labeled Data, Direct Speech to Speech Translation, Transformer Baseline, Speech Mapping

May 14, 2022

Multiformer: A Head-Configurable Transformer-Based Model for Direct Speech Translation
Gerard Sant, Gerard I. Gállego, Belen Alastruey, Marta R. Costa-jussà
Self Attention, Attention Mechanism, Transformer Based Model, Attention Pattern, Direct Speech to Speech Translation, Configurable Software System

April 19, 2022

On the Locality of Attention in Direct Speech Translation
Belen Alastruey, Javier Ferrando, Gerard I. Gállego, Marta R. Costa-jussà
Self Attention, Human Attention, Attention Weight, Direct Speech to Speech Translation, Speech Domain, Self Attention Mechanism

April 6, 2022

Enhanced Direct Speech-to-Speech Translation Using Self-supervised Pre-training and Data Augmentation
Sravya Popuri, Peng-Jen Chen, Changhan Wang, Juan Pino, Yossi Adi, Jiatao Gu, Wei-Ning Hsu, Ann Lee
Data Augmentation, Self Supervised, Speech Data, Speech to Speech Translation, Direct Speech to Speech Translation, Speech to Unit

March 24, 2022

Leveraging unsupervised and weakly-supervised data to improve direct speech-to-speech translation
Ye Jia, Yifan Ding, Ankur Bapna, Colin Cherry, Yu Zhang, Alexis Conneau, Nobuyuki Morioka
Training Data, Speech Representation, Weakly Supervised, Speech to Speech Translation, Direct Speech to Speech Translation, Direct S2ST