Cross-Lingual Cross-Modal Retrieval

Cross-lingual cross-modal retrieval (CCR) aims to retrieve images or videos that match text queries written in any of several languages, a key step toward truly multilingual information access. Current research focuses on improving the alignment between visual and textual representations, often using large language models (LLMs) and contrastive learning to cope with noisy translations and the semantic gap between modalities. This work is motivated by the need for robust, efficient multilingual search and retrieval systems, with applications in web search, multimedia indexing, and cross-cultural communication.
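
To make the contrastive alignment idea concrete, the sketch below computes a symmetric InfoNCE-style loss between a batch of visual embeddings and the embeddings of their paired text queries. It is a generic illustration, not the method of any particular CCR paper: the function names, the temperature value of 0.07, and the assumption that row i of each matrix forms a matched pair are all choices made for this example, and the encoders that would produce the embeddings are left out.

```python
import numpy as np


def l2_normalize(x, eps=1e-12):
    """Scale each row to unit length so dot products become cosine similarities."""
    norms = np.linalg.norm(x, axis=-1, keepdims=True)
    return x / np.maximum(norms, eps)


def log_softmax(logits, axis):
    """Numerically stable log-softmax along the given axis."""
    shifted = logits - logits.max(axis=axis, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=axis, keepdims=True))


def symmetric_contrastive_loss(visual_emb, text_emb, temperature=0.07):
    """Symmetric InfoNCE-style loss (illustrative, hypothetical helper).

    Assumes row i of visual_emb and row i of text_emb are a matched pair;
    every other row in the batch is treated as a negative.
    """
    v = l2_normalize(np.asarray(visual_emb, dtype=np.float64))
    t = l2_normalize(np.asarray(text_emb, dtype=np.float64))

    # Pairwise cosine similarities, sharpened by the temperature.
    logits = v @ t.T / temperature
    idx = np.arange(logits.shape[0])

    # Visual-to-text: each row should put its mass on the diagonal entry.
    loss_v2t = -log_softmax(logits, axis=1)[idx, idx].mean()
    # Text-to-visual: the same requirement, taken column-wise.
    loss_t2v = -log_softmax(logits, axis=0)[idx, idx].mean()
    return 0.5 * (loss_v2t + loss_t2v)


if __name__ == "__main__":
    # Toy batch: four orthogonal "visual" vectors and two candidate text batches.
    visual = np.eye(4)
    aligned_text = np.eye(4)           # each query matches its own image
    shuffled_text = np.eye(4)[::-1]    # every query is paired with the wrong image
    print(symmetric_contrastive_loss(visual, aligned_text))
    print(symmetric_contrastive_loss(visual, shuffled_text))
```

In a cross-lingual setting the text embeddings would come from queries in the target language, possibly machine-translated and therefore noisy; the loss itself is unchanged, which is why noise in the pairing shows up directly as a weaker training signal.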

Papers