Image-Text Retrieval
Image-text retrieval (ITR) aims to find the images most relevant to a given text query and, conversely, the texts most relevant to a given image, bridging the semantic gap between visual and textual data. Current research focuses on improving both the accuracy and the efficiency of ITR. Much of this work builds on vision-language models (VLMs) such as CLIP and its variants, and explores techniques such as contrastive learning, fine-grained alignment, and efficient model architectures (e.g., dual-stream and lightweight models). The field matters because of its applications in domains such as multimedia search, medical image analysis, and remote sensing, where it drives progress in information retrieval and cross-modal understanding.
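To make the dual-stream retrieval setup and the contrastive objective concrete, here is a minimal sketch in plain Python. It assumes the image and text embeddings have already been produced by two separate encoders (for example, the two towers of a CLIP-like model) and that pair `i` means text `i` matches image `i`. The function names, the toy vectors, and the temperature of 0.07 are illustrative assumptions, not taken from any paper listed here. The loss is the generic symmetric InfoNCE formulation used by CLIP-style training, not a specific author's method.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def l2_normalize(v: Vector) -> List[float]:
    """Scale a vector to unit length so dot products equal cosine similarity."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def similarity_matrix(texts: Sequence[Vector], images: Sequence[Vector]) -> List[List[float]]:
    """sim[i][j] = cosine similarity between text i and image j (dual-stream scoring)."""
    t = [l2_normalize(v) for v in texts]
    im = [l2_normalize(v) for v in images]
    return [[sum(a * b for a, b in zip(ti, ij)) for ij in im] for ti in t]


def transpose(m: List[List[float]]) -> List[List[float]]:
    """Swap rows and columns: turns text->image scores into image->text scores."""
    return [list(col) for col in zip(*m)]


def rank(scores: Sequence[float]) -> List[int]:
    """Candidate indices sorted from most to least similar."""
    return sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)


def recall_at_k(sim: List[List[float]], k: int) -> float:
    """Fraction of queries whose ground-truth match (same index) is in the top-k."""
    hits = sum(1 for i, row in enumerate(sim) if i in rank(row)[:k])
    return hits / len(sim)


def symmetric_infonce(sim: List[List[float]], temperature: float = 0.07) -> float:
    """CLIP-style contrastive loss: average of text->image and image->text
    cross-entropy, treating the diagonal (matched pairs) as the targets.
    The temperature value is an assumed default for illustration."""
    n = len(sim)
    logits = [[s / temperature for s in row] for row in sim]

    def cross_entropy(rows: List[List[float]]) -> float:
        total = 0.0
        for i, row in enumerate(rows):
            m = max(row)  # subtract the max for a numerically stable log-sum-exp
            log_z = m + math.log(sum(math.exp(x - m) for x in row))
            total += log_z - row[i]
        return total / n

    return 0.5 * (cross_entropy(logits) + cross_entropy(transpose(logits)))


if __name__ == "__main__":
    # Hypothetical toy embeddings: text i is meant to describe image i.
    text_emb = [[1.0, 0.1, 0.0], [0.0, 1.0, 0.2], [0.1, 0.0, 1.0]]
    image_emb = [[0.9, 0.2, 0.1], [0.1, 0.8, 0.3], [0.0, 0.1, 0.9]]
    sim = similarity_matrix(text_emb, image_emb)
    print("text 0 -> images:", rank(sim[0]))
    print("text->image R@1:", recall_at_k(sim, 1))
    print("image->text R@1:", recall_at_k(transpose(sim), 1))
    print("contrastive loss:", symmetric_infonce(sim))
```

Because both encoders run independently, image embeddings can be computed once and indexed offline, and a query then costs only one encoder pass plus a similarity search. This is the efficiency argument for dual-stream models. Models that jointly attend over image and text typically trade that speed for finer-grained alignment.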
Papers
Active Learning for Finely-Categorized Image-Text Retrieval by Selecting Hard Negative Unpaired Samples
Dae Ung Jo, Kyuewang Lee, JaeHo Chung, Jin Young Choi
Accelerating Transformers with Spectrum-Preserving Token Merging
Hoai-Chau Tran, Duy M. H. Nguyen, Duy M. Nguyen, Trung-Tin Nguyen, Ngan Le, Pengtao Xie, Daniel Sonntag, James Y. Zou, Binh T. Nguyen, Mathias Niepert