Cross-Modal Retrieval
Cross-modal retrieval aims to find relevant items across different data types (e.g., images and text, audio and video) by learning shared representations that capture semantic similarities. Current research focuses on improving retrieval accuracy in the face of noisy data, mismatched pairs, and the "modality gap" using techniques like contrastive learning, masked autoencoders, and optimal transport. These advancements are crucial for applications ranging from medical image analysis and robotics to multimedia search and music recommendation, enabling more effective information access and integration across diverse data sources.
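As a rough illustration of the contrastive-learning approach mentioned above, the sketch below shows a CLIP-style two-tower setup: separate image and text encoders project into a shared embedding space, and a symmetric InfoNCE loss pulls matching pairs together so retrieval reduces to nearest-neighbor search by cosine similarity. It is a minimal sketch, not code from any of the papers listed here; the `TwoTowerModel`, the feature dimensions, and the random "pre-extracted" features are hypothetical placeholders.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TwoTowerModel(nn.Module):
    """Hypothetical two-tower model: projection heads stand in for full encoders."""
    def __init__(self, img_dim=2048, txt_dim=768, embed_dim=256):
        super().__init__()
        self.img_proj = nn.Linear(img_dim, embed_dim)
        self.txt_proj = nn.Linear(txt_dim, embed_dim)
        # Learnable temperature, initialized near log(1/0.07) as in CLIP-style training.
        self.logit_scale = nn.Parameter(torch.tensor(2.6593))

    def forward(self, img_feats, txt_feats):
        # L2-normalize so the dot product equals cosine similarity.
        img_emb = F.normalize(self.img_proj(img_feats), dim=-1)
        txt_emb = F.normalize(self.txt_proj(txt_feats), dim=-1)
        return img_emb, txt_emb

def contrastive_loss(img_emb, txt_emb, logit_scale):
    # Pairwise similarity matrix; the i-th image matches the i-th caption.
    logits = logit_scale.exp() * img_emb @ txt_emb.t()
    targets = torch.arange(len(img_emb), device=img_emb.device)
    # Symmetric InfoNCE: image-to-text and text-to-image directions.
    return (F.cross_entropy(logits, targets) +
            F.cross_entropy(logits.t(), targets)) / 2

# Toy usage with random features for a batch of 8 image-text pairs.
model = TwoTowerModel()
img_feats, txt_feats = torch.randn(8, 2048), torch.randn(8, 768)
img_emb, txt_emb = model(img_feats, txt_feats)
loss = contrastive_loss(img_emb, txt_emb, model.logit_scale)

# Retrieval: for each image, rank all texts by cosine similarity.
ranking = (img_emb @ txt_emb.t()).argsort(dim=-1, descending=True)
```

The symmetric loss matters because retrieval is typically evaluated in both directions (image-to-text and text-to-image); techniques such as optimal transport or robust matching objectives are often layered on top of this basic setup to handle noisy or mismatched pairs.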
Papers
Multi-Modal Generative Embedding Model
Feipeng Ma, Hongwei Xue, Guangting Wang, Yizhou Zhou, Fengyun Rao, Shilin Yan, Yueyi Zhang, Siying Wu, Mike Zheng Shou, Xiaoyan Sun
CaLa: Complementary Association Learning for Augmenting Composed Image Retrieval
Xintong Jiang, Yaxiong Wang, Mengjian Li, Yujiao Wu, Bingwen Hu, Xueming Qian