Image Text Pair
Image-text pairs are the basic training unit for multimodal models that understand and generate both visual and textual information. Current research focuses on improving the alignment between image and text representations, often through contrastive learning, multi-graph alignment, and attention mechanisms within transformer-based architectures. This work targets data scarcity, compositional understanding, and robustness to noise and adversarial attacks, with the goal of more accurate and efficient vision-language models. Applications that benefit include image retrieval, text-to-image generation, and medical image analysis.
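To make the contrastive-learning idea concrete, here is a minimal sketch of a symmetric contrastive (InfoNCE-style) loss over a batch of image-text pairs. It is an illustration, not the method of any specific paper on this page: the function names, the `temperature` default of 0.07, and the use of plain NumPy arrays in place of real encoder outputs are all assumptions made for the example.

```python
import numpy as np


def l2_normalize(x, eps=1e-12):
    # Scale each row to unit length so dot products become cosine similarities.
    norms = np.linalg.norm(x, axis=1, keepdims=True)
    return x / np.maximum(norms, eps)


def log_softmax(logits):
    # Row-wise log-softmax, shifted by the row max for numerical stability.
    shifted = logits - logits.max(axis=1, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))


def contrastive_loss(image_emb, text_emb, temperature=0.07):
    # image_emb[i] and text_emb[i] are assumed to come from the same pair.
    # Both arrays have shape (batch, dim). The temperature is an assumed value.
    img = l2_normalize(image_emb)
    txt = l2_normalize(text_emb)

    # logits[i, j] = scaled cosine similarity between image i and text j.
    logits = img @ txt.T / temperature
    idx = np.arange(logits.shape[0])

    # Image-to-text: each image should rank its own text highest.
    loss_i2t = -log_softmax(logits)[idx, idx].mean()
    # Text-to-image: each text should rank its own image highest.
    loss_t2i = -log_softmax(logits.T)[idx, idx].mean()

    return 0.5 * (loss_i2t + loss_t2i)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    images = rng.normal(size=(4, 8))  # stand-in for image encoder outputs
    texts = images + 0.1 * rng.normal(size=(4, 8))  # roughly aligned texts
    print("loss on aligned pairs:", contrastive_loss(images, texts))
    print("loss on mismatched pairs:", contrastive_loss(images, texts[::-1]))
```

The loss is low when each image embedding is most similar to its own text embedding and high when the pairing is scrambled, which is the signal that pulls matched pairs together and pushes mismatched pairs apart during training.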