Image Captioning Model
Image captioning models automatically generate textual descriptions of images, aiming to create captions that are both accurate and engaging. Current research focuses on improving caption quality through techniques like direct optimization using CLIP scores, developing more efficient architectures (e.g., those based on Fourier transforms), and enhancing robustness against adversarial attacks. These advancements are significant for various applications, including accessibility tools, content creation, and improving the performance of larger vision-language models, while also raising important considerations around AI safety and ethical deployment.
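To illustrate the CLIP-score optimization mentioned above: a CLIP-style score is the cosine similarity between an image embedding and a caption embedding, and caption candidates can be ranked by it. The sketch below is illustrative only, using random placeholder vectors in place of real CLIP encoder outputs; the function name and 512-dimensional size are assumptions, not any specific paper's API.

```python
import numpy as np

def clip_style_score(image_emb: np.ndarray, text_emb: np.ndarray) -> float:
    """Cosine similarity between L2-normalized embeddings, the quantity
    maximized when a caption is optimized directly against a CLIP score."""
    image_emb = image_emb / np.linalg.norm(image_emb)
    text_emb = text_emb / np.linalg.norm(text_emb)
    return float(np.dot(image_emb, text_emb))

# Placeholder 512-d embeddings standing in for real CLIP encoder outputs.
rng = np.random.default_rng(0)
image_emb = rng.standard_normal(512)
caption_embs = rng.standard_normal((3, 512))  # three candidate captions

# Score every candidate and keep the one most similar to the image.
scores = [clip_style_score(image_emb, e) for e in caption_embs]
best = int(np.argmax(scores))
```

In practice the embeddings would come from a pretrained CLIP image and text encoder, and "direct optimization" means using this score (or a differentiable variant) as a training or reranking objective rather than only as an evaluation metric.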
32 papers