Image Captioning
Image captioning aims to automatically generate descriptive text for images, bridging the gap between computer vision and natural language processing. Current research focuses on improving efficiency (e.g., through early exits and knowledge distillation), enhancing performance on fine-grained datasets (e.g., by incorporating object-part details), and developing more robust evaluation metrics (e.g., addressing hallucinations). These advancements are significant for applications ranging from assisting visually impaired individuals to improving image search and retrieval, and are driving innovation in both vision-language models and evaluation methodologies.
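One family of reference-free evaluation metrics mentioned above scores a caption purely by its embedding similarity to the image, in the style of CLIPScore (a rescaled, clipped cosine similarity between CLIP image and text embeddings). The sketch below assumes precomputed embeddings; the toy vectors and the function name are illustrative, not from any specific implementation.

```python
import numpy as np

def reference_free_score(image_emb: np.ndarray, caption_emb: np.ndarray,
                         w: float = 2.5) -> float:
    """CLIPScore-style metric: w * max(cosine(image, caption), 0).

    No reference captions are needed; the caption is judged only
    against the image embedding.
    """
    cos = float(np.dot(image_emb, caption_emb) /
                (np.linalg.norm(image_emb) * np.linalg.norm(caption_emb)))
    return w * max(cos, 0.0)

# Toy vectors standing in for CLIP encoder outputs (hypothetical values).
img = np.array([0.6, 0.8, 0.0])
good_cap = np.array([0.6, 0.8, 0.1])   # caption well aligned with the image
bad_cap = np.array([-0.8, 0.1, 0.6])   # unrelated caption

print(reference_free_score(img, good_cap) > reference_free_score(img, bad_cap))  # True
```

Because such metrics never consult human-written references, their robustness to perturbed or hallucinated captions is exactly what work like the first paper below examines.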
Papers
An Examination of the Robustness of Reference-Free Image Captioning Evaluation Metrics
Saba Ahmadi, Aishwarya Agrawal
Exploring Diverse In-Context Configurations for Image Captioning
Xu Yang, Yongliang Wu, Mingzhuo Yang, Haokun Chen, Xin Geng
Alt-Text with Context: Improving Accessibility for Images on Twitter
Nikita Srivatsan, Sofia Samaniego, Omar Florez, Taylor Berg-Kirkpatrick
Gender Biases in Automatic Evaluation Metrics for Image Captioning
Haoyi Qiu, Zi-Yi Dou, Tianlu Wang, Asli Celikyilmaz, Nanyun Peng