Image Caption

Image captioning is the task of automatically generating descriptive text for images, connecting computer vision and natural language processing. Current research aims to improve caption quality, accuracy, and diversity, mainly through transformer-based models and contrastive learning. It also addresses biases and limitations in training data with techniques such as data augmentation and deduplication. The field matters for making visual information more accessible, improving cross-modal retrieval systems, and advancing work on human-computer interaction and multimodal learning.
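
As a concrete illustration of the deduplication step mentioned above, the following is a minimal sketch, assuming the training data is a list of (image_id, caption) pairs and that two captions count as duplicates when they match after simple text normalization. The function names and the normalization rule are hypothetical choices made for this example; published pipelines often use stronger criteria such as near-duplicate image detection.

```python
import re
from typing import Iterable, List, Tuple


def normalize_caption(caption: str) -> str:
    """Lowercase, drop punctuation, and collapse whitespace (an assumed rule)."""
    text = caption.lower()
    text = re.sub(r"[^\w\s]", "", text)  # remove punctuation characters
    return " ".join(text.split())        # collapse runs of whitespace


def deduplicate_pairs(pairs: Iterable[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Keep the first occurrence of each (image_id, normalized caption) pair."""
    seen = set()
    unique = []
    for image_id, caption in pairs:
        key = (image_id, normalize_caption(caption))
        if key not in seen:
            seen.add(key)
            unique.append((image_id, caption))  # keep the original wording
    return unique


# Hypothetical example data: the second pair repeats the first up to
# casing, spacing, and punctuation, so it is dropped.
pairs = [
    ("img1", "A dog runs."),
    ("img1", "a dog  runs"),
    ("img1", "A cat sleeps."),
    ("img2", "A dog runs."),
]
cleaned = deduplicate_pairs(pairs)
```

The same caption attached to a different image is kept, since the key includes the image identifier; whether that is desirable depends on the dataset.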

Papers