Image Captioning
Image captioning aims to automatically generate descriptive text for images, bridging the gap between computer vision and natural language processing. Current research focuses on improving efficiency (e.g., through early exits and knowledge distillation), enhancing performance on fine-grained datasets (e.g., by incorporating object-part details), and developing more robust evaluation metrics (e.g., addressing hallucinations). These advancements are significant for applications ranging from assisting visually impaired individuals to improving image search and retrieval, and are driving innovation in both vision-language models and evaluation methodologies.
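One family of reference-free evaluation metrics mentioned above scores a caption purely by its embedding similarity to the image, in the style of CLIPScore (a rescaled, clipped cosine similarity between CLIP image and text embeddings). The sketch below assumes precomputed embeddings; the toy vectors and the function name are illustrative, not from any specific implementation.

```python
import numpy as np

def reference_free_score(image_emb: np.ndarray, caption_emb: np.ndarray,
                         w: float = 2.5) -> float:
    """CLIPScore-style metric: w * max(cosine(image, caption), 0).

    No reference captions are needed; the caption is judged only
    against the image embedding.
    """
    cos = float(np.dot(image_emb, caption_emb) /
                (np.linalg.norm(image_emb) * np.linalg.norm(caption_emb)))
    return w * max(cos, 0.0)

# Toy vectors standing in for CLIP encoder outputs (hypothetical values).
img = np.array([0.6, 0.8, 0.0])
good_cap = np.array([0.6, 0.8, 0.1])   # caption well aligned with the image
bad_cap = np.array([-0.8, 0.1, 0.6])   # unrelated caption

print(reference_free_score(img, good_cap) > reference_free_score(img, bad_cap))  # True
```

Because such metrics never consult human-written references, their robustness to perturbed or hallucinated captions is exactly what work like the first paper below examines.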
Papers
An Examination of the Robustness of Reference-Free Image Captioning Evaluation Metrics
Saba Ahmadi, Aishwarya Agrawal
Exploring Diverse In-Context Configurations for Image Captioning
Xu Yang, Yongliang Wu, Mingzhuo Yang, Haokun Chen, Xin Geng
Alt-Text with Context: Improving Accessibility for Images on Twitter
Nikita Srivatsan, Sofia Samaniego, Omar Florez, Taylor Berg-Kirkpatrick
Gender Biases in Automatic Evaluation Metrics for Image Captioning
Haoyi Qiu, Zi-Yi Dou, Tianlu Wang, Asli Celikyilmaz, Nanyun Peng