Unseen Language

Unseen language research focuses on enabling machine learning models to process and generate text and speech in languages absent from their training data. Current efforts concentrate on adapting existing multilingual models, typically built on transformer or diffusion architectures, through techniques such as zero-shot learning, few-shot learning, and in-context learning with linguistic resources like dictionaries and grammars. This work is crucial for bridging the digital divide, preserving linguistic diversity, and advancing applications such as machine translation, speech recognition, and text-to-speech for low-resource and endangered languages.
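
As a concrete illustration, in-context learning with linguistic resources typically amounts to packing relevant dictionary entries and grammar notes into the prompt handed to a multilingual model, which then attempts a translation without any fine-tuning. The sketch below shows only the prompt-assembly step; the toy lexicon, grammar note, and sentence are invented placeholders, not data from any real language.

```python
# Minimal sketch: assembling an in-context-learning prompt for an
# unseen language from a bilingual lexicon and a grammar note.
# All linguistic data here is hypothetical, for illustration only.

LEXICON = {
    "kira": "water",
    "tomo": "drink",
    "na": "I (first-person subject marker)",
}
GRAMMAR_NOTE = "Word order is subject-object-verb (SOV)."

def build_prompt(sentence: str, lexicon: dict[str, str], grammar: str) -> str:
    """Pack dictionary entries and a grammar note into a translation prompt."""
    entries = "\n".join(f"- {src}: {gloss}" for src, gloss in lexicon.items())
    return (
        "Translate the following sentence into English.\n\n"
        f"Dictionary:\n{entries}\n\n"
        f"Grammar: {grammar}\n\n"
        f"Sentence: {sentence}\n"
        "Translation:"
    )

prompt = build_prompt("na kira tomo", LEXICON, GRAMMAR_NOTE)
print(prompt)
```

In practice the lexicon entries would be retrieved from a digitized dictionary for the target language, and the resulting prompt would be passed to a multilingual LLM; published work varies in how many entries and how much grammatical description is included.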

Papers