Image-Text Pair
Image-text pairs are fundamental to training multimodal models that understand and generate both visual and textual information. Current research focuses on improving the alignment between image and text representations. Common approaches include contrastive learning, multi-graph alignment, and attention mechanisms within transformer-based architectures. This work addresses challenges such as data scarcity, compositional understanding, and robustness to noise and adversarial attacks, with the goal of more accurate and efficient vision-language models. Applications that benefit include image retrieval, text-to-image generation, and medical image analysis.
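To make "contrastive learning" for image-text alignment concrete, here is a minimal sketch of a CLIP-style symmetric contrastive (InfoNCE) objective. It assumes that image and text embeddings have already been produced by some encoders. Matching pairs share a batch index, and every other pairing in the batch serves as a negative. The function names, the plain-Python implementation, and the default temperature of 0.07 are illustrative assumptions, not the method of any paper listed on this page. A small cosine-similarity retrieval helper shows how the aligned embeddings would be used for image retrieval.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def l2_normalize(v: Vector) -> List[float]:
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in v]


def similarity_logits(images: Sequence[Vector], texts: Sequence[Vector],
                      temperature: float) -> List[List[float]]:
    """logits[i][j] = cos(image_i, text_j) / temperature."""
    imgs = [l2_normalize(v) for v in images]
    txts = [l2_normalize(v) for v in texts]
    return [[sum(a * b for a, b in zip(i, t)) / temperature for t in txts]
            for i in imgs]


def diagonal_cross_entropy(logits: List[List[float]]) -> float:
    """Mean cross-entropy where the correct column for row k is column k."""
    total = 0.0
    for k, row in enumerate(logits):
        m = max(row)  # subtract the max for a numerically stable log-sum-exp
        log_sum_exp = m + math.log(sum(math.exp(x - m) for x in row))
        total += log_sum_exp - row[k]
    return total / len(logits)


def contrastive_loss(images: Sequence[Vector], texts: Sequence[Vector],
                     temperature: float = 0.07) -> float:
    """Symmetric InfoNCE: average of image-to-text and text-to-image losses.

    Assumption (illustrative): images[k] and texts[k] form the k-th true pair.
    """
    if len(images) != len(texts) or not images:
        raise ValueError("need equal, non-zero numbers of images and texts")
    logits = similarity_logits(images, texts, temperature)
    transposed = [list(col) for col in zip(*logits)]  # rows become texts
    return 0.5 * (diagonal_cross_entropy(logits)
                  + diagonal_cross_entropy(transposed))


def retrieve(query_text: Vector, images: Sequence[Vector],
             top_k: int = 1) -> List[int]:
    """Return indices of the top_k images ranked by cosine similarity."""
    q = l2_normalize(query_text)
    scores = [sum(a * b for a, b in zip(q, l2_normalize(img))) for img in images]
    return sorted(range(len(images)), key=lambda i: scores[i], reverse=True)[:top_k]


if __name__ == "__main__":
    imgs = [[1.0, 0.0], [0.0, 1.0]]
    print(contrastive_loss(imgs, [[1.0, 0.0], [0.0, 1.0]]))  # aligned: near zero
    print(contrastive_loss(imgs, [[0.0, 1.0], [1.0, 0.0]]))  # mismatched: large
    print(retrieve([0.9, 0.1], [[0.0, 1.0], [1.0, 0.0]]))
```

Because each batch row treats all non-matching captions as negatives, larger batches supply more negatives per step. The temperature controls how sharply the softmax separates the true pair from the rest.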
Papers
April 15, 2022
March 31, 2022
March 14, 2022
March 8, 2022
February 26, 2022
December 21, 2021
December 17, 2021
December 16, 2021
December 7, 2021
December 2, 2021
November 30, 2021
November 27, 2021
November 22, 2021
November 9, 2021
November 6, 2021
November 3, 2021
September 4, 2021