Image Text Pair
Image-text pairs are the basic training unit for multimodal models that understand and generate both visual and textual information. Current research focuses on improving the alignment between image and text representations, often through contrastive learning, multi-graph alignment, and attention mechanisms in transformer-based architectures. This work targets data scarcity, compositional understanding, and robustness to noise and adversarial attacks, with the goal of more accurate and efficient vision-language models. Applications include image retrieval, text-to-image generation, and medical image analysis.
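To make the contrastive-learning idea concrete, here is a minimal sketch of a symmetric contrastive (InfoNCE-style) loss over a small batch of image-text pairs. It is an illustration under stated assumptions, not the method of any particular paper: the embeddings are hand-written toy vectors rather than encoder outputs, the function names (`l2_normalize`, `cross_entropy_diagonal`, `contrastive_loss`) are hypothetical, and the temperature of 0.07 is just a commonly used starting value. Row i of the image batch is assumed to be paired with row i of the text batch, and all vectors are assumed to be nonzero.

```python
import math


def l2_normalize(v):
    """Scale a nonzero vector to unit length so dot products are cosine similarities."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def cross_entropy_diagonal(logits):
    """Mean cross-entropy where the correct class for row i is column i."""
    total = 0.0
    for i, row in enumerate(logits):
        m = max(row)  # subtract the max for numerical stability
        log_sum_exp = m + math.log(sum(math.exp(x - m) for x in row))
        total += log_sum_exp - row[i]
    return total / len(logits)


def contrastive_loss(image_embs, text_embs, temperature=0.07):
    """Symmetric contrastive loss: image-to-text plus text-to-image, averaged."""
    imgs = [l2_normalize(v) for v in image_embs]
    txts = [l2_normalize(v) for v in text_embs]
    # logits[i][j] = cosine similarity of image i and text j, scaled by temperature
    logits = [
        [sum(a * b for a, b in zip(img, txt)) / temperature for txt in txts]
        for img in imgs
    ]
    logits_t = [list(col) for col in zip(*logits)]  # text-to-image direction
    return 0.5 * (cross_entropy_diagonal(logits) + cross_entropy_diagonal(logits_t))


# Toy batch (assumed values): two images and their matching captions.
images = [[1.0, 0.0], [0.0, 1.0]]
aligned_texts = [[0.9, 0.1], [0.1, 0.9]]
swapped_texts = [aligned_texts[1], aligned_texts[0]]  # deliberately mismatched pairs

loss_aligned = contrastive_loss(images, aligned_texts)
loss_swapped = contrastive_loss(images, swapped_texts)
print(loss_aligned < loss_swapped)
```

The loss is low when each image is most similar to its own caption and high when the pairs are mismatched, which is the signal that pulls matching image and text representations together during training.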