Speech Translation

Speech translation (ST) aims to automatically convert spoken language in one language into written or spoken text in another, bridging communication barriers. Current research heavily utilizes large language models (LLMs) integrated with speech foundation models (SFMs), often employing techniques like chain-of-thought prompting and multimodal approaches to improve accuracy and reduce latency, particularly in simultaneous ST. These advancements are significant for improving cross-lingual communication in various applications, from real-time interpretation to accessibility tools, and are driving innovation in both model architectures and evaluation methodologies.

Papers