Zero Shot Speech
Zero-shot speech translation aims to translate speech in one language into text in another without using any paired speech-text training data for that specific language pair. Current research focuses on bridging the "modality gap" between speech and text. Common techniques include multilingual training, shared embedding spaces (often fixed-size representations), and discrete cross-modal alignment, all of which map speech and text into a common semantic space. These approaches build on existing large language models and automatic speech recognition data, and in some cases they rival supervised methods, which opens up possibilities for more efficient and broadly applicable speech translation systems.
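To make the idea of a shared, fixed-size embedding space concrete, here is a minimal sketch in Python. It is not taken from any particular paper: the encoder outputs are random stand-ins, and the names `mean_pool` and `cosine_alignment_loss` are hypothetical. It shows only how variable-length speech and text sequences can be pooled into vectors of the same size and compared with a simple alignment objective.

```python
import numpy as np

def mean_pool(sequence: np.ndarray) -> np.ndarray:
    """Collapse a (length, dim) sequence into one fixed-size (dim,) vector."""
    return sequence.mean(axis=0)

def l2_normalize(v: np.ndarray, eps: float = 1e-8) -> np.ndarray:
    """Scale a vector to (approximately) unit length."""
    return v / (np.linalg.norm(v) + eps)

def cosine_alignment_loss(speech_vec: np.ndarray, text_vec: np.ndarray) -> float:
    """1 - cosine similarity: near 0 when both embeddings point the same way."""
    return 1.0 - float(np.dot(l2_normalize(speech_vec), l2_normalize(text_vec)))

rng = np.random.default_rng(0)
dim = 16  # assumed shared embedding size

# Stand-ins for encoder outputs (assumption: a real system would use
# a trained speech encoder and a trained text encoder here).
speech_frames = rng.normal(size=(120, dim))  # 120 acoustic frames
text_tokens = rng.normal(size=(9, dim))      # 9 text tokens

# Both modalities end up as vectors of identical shape, regardless of length.
speech_vec = mean_pool(speech_frames)
text_vec = mean_pool(text_tokens)

# Training would minimize this loss over paired ASR data so that speech and
# its transcript land close together in the shared space.
loss = cosine_alignment_loss(speech_vec, text_vec)
print(speech_vec.shape, text_vec.shape, loss)
```

Once speech and text occupy the same space, a text-only translation decoder can in principle consume speech embeddings directly, which is what allows translation without paired data for the target language pair. Mean pooling is just one simple way to obtain a fixed-size representation; attention-based pooling would be an equally plausible choice.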