Speech-to-Speech Translation
Speech-to-speech translation (S2ST) aims to convert speech in one language directly into speech in another, bypassing intermediate text transcription. Current research focuses on improving model efficiency and output quality through non-autoregressive architectures, diffusion models, and the integration of large language models, often with an emphasis on low-resource languages and on preserving speaker characteristics. These advances help bridge communication barriers across languages, particularly for languages that lack a written form, and have applications in real-time interpretation and multilingual voice assistants.
Papers
Enhanced Direct Speech-to-Speech Translation Using Self-supervised Pre-training and Data Augmentation
Sravya Popuri, Peng-Jen Chen, Changhan Wang, Juan Pino, Yossi Adi, Jiatao Gu, Wei-Ning Hsu, Ann Lee
Prosodic Alignment for off-screen automatic dubbing
Yogesh Virkar, Marcello Federico, Robert Enyedi, Roberto Barra-Chicote