Cross-Modal Information Retrieval
Cross-modal information retrieval aims to find semantically corresponding content across different data types, such as images and text, enabling powerful search and analysis capabilities. Current research focuses on improving the alignment of embeddings from pre-trained models (such as Vision Transformers and BERT) using techniques like contrastive learning and refined attention mechanisms (e.g., dual attention networks incorporating self-attention) to better capture relationships both within and between modalities. These advances are significantly impacting fields such as medical image analysis and multimedia search by enabling more effective retrieval of relevant information from diverse data sources.