Image Text Pair
Image-text pairs are the basic training unit for multimodal models that understand and generate both visual and textual information. Current research focuses on improving the alignment between image and text representations, often through contrastive learning, multi-graph alignment, and attention mechanisms in transformer-based architectures. This work targets data scarcity, compositional understanding, and robustness to noise and adversarial attacks, with the goal of more accurate and efficient vision-language models. Applications include image retrieval, text-to-image generation, and medical image analysis.
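As a rough illustration of the contrastive alignment mentioned above, the sketch below computes a symmetric InfoNCE-style loss over a batch of paired embeddings using NumPy. It is a minimal example under stated assumptions, not the method of any particular paper: the function names, the temperature value of 0.07, and the random toy embeddings are all illustrative choices, and a real system would produce the embeddings with trained image and text encoders.

```python
import numpy as np

def l2_normalize(x, eps=1e-12):
    # Scale each row to unit length so dot products become cosine similarities.
    return x / np.maximum(np.linalg.norm(x, axis=1, keepdims=True), eps)

def log_softmax(logits):
    # Numerically stable row-wise log-softmax.
    shifted = logits - logits.max(axis=1, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))

def contrastive_loss(image_emb, text_emb, temperature=0.07):
    # Row i of image_emb is assumed to be paired with row i of text_emb.
    # temperature is an illustrative default, not a value from the source.
    img = l2_normalize(image_emb)
    txt = l2_normalize(text_emb)
    logits = img @ txt.T / temperature          # (batch, batch) similarity matrix
    idx = np.arange(logits.shape[0])
    loss_i2t = -log_softmax(logits)[idx, idx].mean()    # image -> text direction
    loss_t2i = -log_softmax(logits.T)[idx, idx].mean()  # text -> image direction
    return 0.5 * (loss_i2t + loss_t2i)

# Toy data: random vectors standing in for encoder outputs (hypothetical).
rng = np.random.default_rng(0)
text_emb = rng.normal(size=(4, 8))
noisy_images = text_emb + 0.01 * rng.normal(size=(4, 8))

aligned = contrastive_loss(noisy_images, text_emb)
shuffled = contrastive_loss(np.roll(text_emb, 1, axis=0), text_emb)
print(f"aligned pairs:  {aligned:.4f}")
print(f"shuffled pairs: {shuffled:.4f}")
```

The loss is low when each image embedding is most similar to its own caption and high when the pairing is shuffled, which is the signal that pulls matched pairs together and pushes mismatched ones apart during training.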