Image-Caption Pairs
Image-caption pairs, each comprising an image and its corresponding textual description, are fundamental to vision-language research, where they primarily serve to improve multimodal understanding and generation. Current research leverages these pairs to strengthen model capabilities in tasks such as image captioning, object detection, and cross-modal retrieval, often employing contrastive learning and diffusion models, as well as large language models for caption enrichment. The area matters because improved vision-language alignment enables advances in applications ranging from zero-shot learning to medical image analysis, and yields more robust and efficient multimodal systems overall.
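The contrastive-learning approach mentioned above can be illustrated with a minimal sketch of a symmetric InfoNCE loss in the style popularized by CLIP: matched image and text embeddings in a batch are pulled together, while all other pairings in the batch act as negatives. This is an illustrative NumPy implementation, not any specific paper's code; the function name and the temperature value are assumptions for the example.

```python
import numpy as np

def contrastive_loss(img_emb, txt_emb, temperature=0.07):
    """Symmetric InfoNCE loss over a batch of image/text embeddings.

    Row i of img_emb and txt_emb is assumed to be a matched
    image-caption pair; every other row in the batch is a negative.
    (Illustrative sketch, not an implementation from any cited paper.)
    """
    # L2-normalize so the dot product is cosine similarity
    img = img_emb / np.linalg.norm(img_emb, axis=1, keepdims=True)
    txt = txt_emb / np.linalg.norm(txt_emb, axis=1, keepdims=True)

    logits = img @ txt.T / temperature  # (N, N) similarity matrix
    n = logits.shape[0]

    def cross_entropy(l):
        # log-softmax per row, picking out the diagonal (matched pairs)
        l = l - l.max(axis=1, keepdims=True)  # numerical stability
        log_probs = l - np.log(np.exp(l).sum(axis=1, keepdims=True))
        return -log_probs[np.arange(n), np.arange(n)].mean()

    # average the image-to-text and text-to-image directions
    return 0.5 * (cross_entropy(logits) + cross_entropy(logits.T))

rng = np.random.default_rng(0)
emb = rng.normal(size=(4, 8))
aligned = contrastive_loss(emb, emb)        # matched pairs: low loss
shuffled = contrastive_loss(emb, emb[::-1]) # mismatched captions: higher loss
print(aligned < shuffled)  # prints True
```

Training on large corpora of image-caption pairs with a loss of this shape is what gives models like CLIP their zero-shot transfer ability: a new class can be recognized by comparing an image embedding against embeddings of candidate captions.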