Image Captioning

Image captioning aims to automatically generate descriptive text for images, bridging the gap between computer vision and natural language processing. Current research focuses on improving efficiency (e.g., through early exits and knowledge distillation), enhancing performance on fine-grained datasets (e.g., by incorporating object-part details), and developing more robust evaluation metrics (e.g., addressing hallucinations). These advancements are significant for applications ranging from assisting visually impaired individuals to improving image search and retrieval, and are driving innovation in both vision-language models and evaluation methodologies.

Papers

October 15, 2023

Bounding and Filling: A Fast and Flexible Framework for Image Captioning
Zheng Ma, Changxin Wang, Bo Huang, Zixuan Zhu, Jianbing Zhang
Image Captioning Non Autoregressive General Bound Flexible Framework COCO Benchmark

October 11, 2023

A Comparative Study of Pre-trained CNNs and GRU-Based Attention for Image Caption Generation
Rashid Khan, Bingding Huang, Haseeb Hassan, Asim Zaman, Zhongfu Ye
Pre Trained Comparative Study Human Attention Image Captioning Deep Learning Framework Attention Model Image Caption Generation

October 10, 2023

The Solution for the CVPR2023 NICE Image Captioning Challenge
Xiangyu Wu, Yi Gao, Hailiang Zhang, Yang Yang, Weili Guo, Jianfeng Lu
Zero Shot Image Captioning Solution Path Image Text Pair Reference Caption

September 26, 2023

BLIP-Adapter: Parameter-Efficient Transfer Learning for Mobile Screenshot Captioning
Ching-Yu Chiang, I-Hua Chang, Shih-Wei Liao
Image Captioning Parameter Efficient Transfer Learning Image Captioning Model Bit Level Information Preserving

September 24, 2023

FaceAtt: Enhancing Image Captioning with Facial Attributes for Portrait Images
Naimul Haque, Iffat Labiba, Sadia Akter
Image Captioning Facial Attribute Portrait Image

September 10, 2023

Prefix-diffusion: A Lightweight Diffusion Model for Diverse Image Captioning
Guisheng Liu, Yi Li, Zhengcong Fei, Haiyan Fu, Xiangyang Luo, Yanqing Guo
Diffusion Model Image Captioning Image Embeddings Distinctive Caption

August 26, 2023

Towards Real Time Egocentric Segment Captioning for The Blind and Visually Impaired in RGB-D Theatre Images
Khadidja Delloul, Slimane Larabi
Computer Vision Image Captioning Scene Structure RGB D Datasets RGB D Action

August 25, 2023

MultiCapCLIP: Auto-Encoding Prompts for Zero-Shot Multilingual Visual Captioning
Bang Yang, Fenglin Liu, Xian Wu, Yaowei Wang, Xu Sun, Yuexian Zou
Image Captioning Complex Prompt Video Captioning Caption Pair Caption Editing Visual Captioning Model

August 23, 2023

August 5, 2023

August 2, 2023

Beyond Generic: Enhancing Image Captioning with Real-World Knowledge using Vision-Language Pre-Training Model
Kanzhi Cheng, Wenpo Song, Zheng Ma, Wenhao Zhu, Zixuan Zhu, Jianbing Zhang
Vision Language Image Captioning Captioning Method Captioning Benchmark Social Generic Knowledge Hallucination Knowledge Prediction

July 24, 2023

Enhancing image captioning with depth information using a Transformer-based framework
Aya Mahmoud Ahmed, Mohamed Yousef, Khaled F. Hussain, Yousef Bassyouni Mahdy
Image Captioning Transformer Based Framework Captioning Datasets Depth Information

July 19, 2023

July 14, 2023

AIC-AB NET: A Neural Network for Image Captioning with Spatial Attention and Text Attributes
Guoyun Tu, Ying Liu, Vladimir Vlassov
Neural Network Image Captioning Textual Feature Caption Generation Spatial Attention Attention Framework Attention Based Network

July 10, 2023

Linear Alignment of Vision-language Models for Image Captioning
Fabian Paischer, Markus Hofmarcher, Sepp Hochreiter, Thomas Adler
Vision Language Model Image Captioning Synthetic Caption Captioning Evaluation Linear Arrangement

June 28, 2023

Benchmarking Zero-Shot Recognition with Vision-Language Models: Challenges on Granularity and Specificity
Zhenlin Xu, Yi Zhu, Tiffany Deng, Abhay Mittal, Yanbei Chen, Manchen Wang, Paolo Favaro, Joseph Tighe, Davide Modolo
Zero Shot Vision Language Model Technical Challenge Image Captioning

Papers

With a Little Help from your own Past: Prototypical Memory Networks for Image Captioning

CgT-GAN: CLIP-guided Text GAN for Image Captioning

Audio Difference Captioning Utilizing Similarity-Discrepancy Disentanglement

Improving Generalization of Image Captioning with Unsupervised Prompt Learning

A Comprehensive Analysis of Real-World Image Captioning and Scene Identification

Improving Multimodal Datasets with Image Captioning

Embedded Heterogeneous Attention Transformer for Cross-lingual Image Captioning
