Cross-Lingual Cross-Modal

Cross-lingual cross-modal research develops AI models that understand and process information across different languages and modalities (e.g., text and images, or text and speech). Current efforts center on large-scale pre-training of encoder-decoder and contrastive models on multilingual, multimodal datasets, often combining objectives such as masked language modeling with contrastive losses that align representations across languages and modalities (a minimal sketch of such a loss follows below). This work is significant because it enables zero-shot and few-shot cross-lingual and cross-modal transfer, improving performance on tasks such as sign language translation, visual question answering, and multilingual speech translation, and potentially bridging communication barriers across languages and sensory inputs.
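To make the alignment objective concrete, here is a minimal PyTorch sketch of a CLIP-style symmetric contrastive (InfoNCE) loss, the kind of objective commonly used to pull paired image and multilingual text embeddings together in a shared space. The function name, tensor shapes, and temperature value are illustrative assumptions, not taken from any specific paper in this collection.

```python
import torch
import torch.nn.functional as F

def contrastive_alignment_loss(image_emb: torch.Tensor,
                               text_emb: torch.Tensor,
                               temperature: float = 0.07) -> torch.Tensor:
    """Symmetric InfoNCE loss aligning paired embeddings.

    Matched image/text pairs sit at the same batch index; every other
    pairing in the batch serves as an in-batch negative. The text side
    could come from captions in any language, which is what encourages
    a language-agnostic shared space.
    """
    # L2-normalize so the dot product is cosine similarity.
    image_emb = F.normalize(image_emb, dim=-1)
    text_emb = F.normalize(text_emb, dim=-1)

    # Pairwise similarity matrix, scaled by temperature.
    logits = image_emb @ text_emb.t() / temperature

    # The i-th image matches the i-th text.
    targets = torch.arange(logits.size(0), device=logits.device)

    # Symmetric cross-entropy: image-to-text and text-to-image.
    loss_i2t = F.cross_entropy(logits, targets)
    loss_t2i = F.cross_entropy(logits.t(), targets)
    return (loss_i2t + loss_t2i) / 2

if __name__ == "__main__":
    # Toy usage: a batch of 8 images paired with captions (possibly in
    # different languages), all encoded into a shared 512-d space.
    image_emb = torch.randn(8, 512)
    text_emb = torch.randn(8, 512)
    print(contrastive_alignment_loss(image_emb, text_emb))
```

In practice this contrastive term is often trained jointly with masked language modeling and related objectives, so the text encoder retains multilingual understanding while its output space stays aligned with the other modality.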

Papers