Image Caption Generation

Image caption generation aims to automatically produce textual descriptions of images, bridging computer vision and natural language processing. Current research focuses on improving caption quality with architectures such as transformers and graph neural networks. These models often use attention mechanisms that let the decoder focus on the image regions most relevant to each generated word, and they rely on pre-trained models for efficient visual feature extraction. Other work aims to increase caption diversity, reduce computational cost, and incorporate human feedback so that captions better match user preferences, particularly in specialized domains such as scientific figure captioning. These advances have significant implications for accessibility, automated image indexing, and content creation.
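To make the idea of attending to image regions concrete, here is a minimal sketch of generic scaled dot-product attention for a single decoding step. It is not the method of any particular paper. It assumes the image has already been encoded into a list of equal-length region feature vectors (hypothetically produced by a pre-trained CNN or object detector) and that the caption decoder supplies a query vector of the same length. The names `attend`, `query`, and `regions` are illustrative only.

```python
import math
from typing import List, Tuple

Vector = List[float]


def softmax(scores: List[float]) -> List[float]:
    # Subtract the max score before exponentiating for numerical stability.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query: Vector, regions: List[Vector]) -> Tuple[List[float], Vector]:
    """Scaled dot-product attention of one decoder query over region features.

    Assumes every region vector has the same length as `query`.
    Returns (weights, context): one weight per region, and the weighted
    sum of region vectors the decoder would condition on for the next word.
    """
    d = len(query)
    # Similarity between the decoder state and each region, scaled by sqrt(d).
    scores = [sum(q * r for q, r in zip(query, region)) / math.sqrt(d)
              for region in regions]
    weights = softmax(scores)
    # Context vector: attention-weighted average of the region features.
    context = [sum(w * region[i] for w, region in zip(weights, regions))
               for i in range(d)]
    return weights, context


if __name__ == "__main__":
    # Three toy 2-D "region features"; the query resembles the first region.
    regions = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
    weights, context = attend([4.0, 0.0], regions)
    print(weights, context)
```

With this toy input, the first region receives the largest weight, so the context vector leans toward its features. In a real captioning model the query changes at every decoding step, which is what lets the model look at different regions while generating different words.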

Papers