Speech-to-Speech Translation
Speech-to-speech translation (S2ST) aims to convert spoken utterances directly from one language into another, bypassing intermediate text transcription. Current research emphasizes improving model efficiency and output quality through techniques such as non-autoregressive architectures, diffusion models, and the integration of large language models, often with a focus on low-resource languages and on preserving speaker characteristics. These advances help bridge communication barriers across languages, particularly for languages that lack a written form, and have implications for applications such as real-time interpretation and multilingual voice assistants.
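To make the "direct, textless" idea concrete, the sketch below shows a minimal S2ST pipeline in which source speech is encoded, mapped in parallel to discrete target-speech units (in the spirit of CTC-style non-autoregressive decoding), and handed to a vocoder. This is an illustrative assumption, not the architecture of any paper listed here: the module names (SpeechEncoder, UnitDecoder, units_to_waveform), layer sizes, and the unit vocabulary are all hypothetical, and the vocoder is a stub where a real system would use a unit-based neural vocoder.

```python
# Illustrative sketch of a direct (textless) S2ST pipeline: encode source speech,
# predict discrete target-speech units non-autoregressively, render a waveform.
# All names, dimensions, and the unit vocabulary are assumptions for illustration.
import torch
import torch.nn as nn


class SpeechEncoder(nn.Module):
    """Downsamples the input mel spectrogram and contextualizes it with self-attention."""

    def __init__(self, n_mels: int = 80, d_model: int = 256):
        super().__init__()
        self.subsample = nn.Conv1d(n_mels, d_model, kernel_size=4, stride=4)
        layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=4)

    def forward(self, mels: torch.Tensor) -> torch.Tensor:
        # mels: (batch, frames, n_mels) -> (batch, frames // 4, d_model)
        x = self.subsample(mels.transpose(1, 2)).transpose(1, 2)
        return self.encoder(x)


class UnitDecoder(nn.Module):
    """Non-autoregressive head: predicts one discrete target unit per encoder frame;
    a CTC-style blank symbol lets repeats/blanks be collapsed after decoding."""

    def __init__(self, d_model: int = 256, n_units: int = 1000):
        super().__init__()
        self.proj = nn.Linear(d_model, n_units + 1)  # +1 for the blank symbol

    def forward(self, enc: torch.Tensor) -> torch.Tensor:
        return self.proj(enc).log_softmax(dim=-1)


def units_to_waveform(units: torch.Tensor, hop: int = 320) -> torch.Tensor:
    """Vocoder stub: a real system would synthesize speech from the unit sequence;
    here we only return silence of the corresponding length."""
    return torch.zeros(units.shape[0], units.shape[1] * hop)


if __name__ == "__main__":
    mels = torch.randn(2, 400, 80)   # two utterances, 400 mel frames each
    enc = SpeechEncoder()(mels)      # (2, 100, 256) encoder states
    logp = UnitDecoder()(enc)        # (2, 100, 1001) unit log-probabilities
    units = logp.argmax(dim=-1)      # greedy parallel unit prediction (blanks included)
    wav = units_to_waveform(units)   # (2, 32000) placeholder waveform
    print(enc.shape, logp.shape, wav.shape)
```

Because every target unit is predicted in one parallel pass rather than token by token, decoding latency stays roughly constant in output length, which is the efficiency argument behind the non-autoregressive approaches surveyed above.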
Papers
Task Arithmetic for Language Expansion in Speech Translation
Yao-Fei Cheng, Hayato Futami, Yosuke Kashiwagi, Emiru Tsunoo, Wen Shen Teo, Siddhant Arora, Shinji Watanabe
Improving Speech Emotion Recognition in Under-Resourced Languages via Speech-to-Speech Translation with Bootstrapping Data Selection
Hsi-Che Lin, Yi-Cheng Lin, Huang-Cheng Chou, Hung-yi Lee
CTC-based Non-autoregressive Textless Speech-to-Speech Translation
Qingkai Fang, Zhengrui Ma, Yan Zhou, Min Zhang, Yang Feng
Can We Achieve High-quality Direct Speech-to-Speech Translation without Parallel Speech Data?
Qingkai Fang, Shaolei Zhang, Zhengrui Ma, Min Zhang, Yang Feng
A Non-autoregressive Generation Framework for End-to-End Simultaneous Speech-to-Speech Translation
Zhengrui Ma, Qingkai Fang, Shaolei Zhang, Shoutao Guo, Yang Feng, Min Zhang