Image Captioning
Image captioning aims to automatically generate descriptive text for images, bridging the gap between computer vision and natural language processing. Current research emphasizes improving caption quality, accuracy, and diversity, often focusing on advances in transformer-based models and contrastive learning, as well as on addressing biases and limitations in training data through techniques such as data augmentation and deduplication. This field is crucial for making visual information more accessible, improving cross-modal retrieval systems, and advancing the understanding of human-computer interaction and multimodal learning. The overview mentions contrastive learning for aligning images and captions; a minimal, hedged sketch of this idea appears below.
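As a point of reference only, the following sketch shows a CLIP-style symmetric contrastive loss over a batch of paired image and caption embeddings. It is a generic illustration of the contrastive-learning approach named in the overview, not the implementation of any listed paper; the function name, embedding dimension, and temperature value are assumptions chosen for the example.

```python
# Illustrative sketch (assumed names and values): a minimal symmetric
# contrastive (InfoNCE) loss where matched image-caption pairs share an index.
import torch
import torch.nn.functional as F

def contrastive_loss(image_emb: torch.Tensor,
                     text_emb: torch.Tensor,
                     temperature: float = 0.07) -> torch.Tensor:
    # Normalize so the dot product is cosine similarity.
    image_emb = F.normalize(image_emb, dim=-1)
    text_emb = F.normalize(text_emb, dim=-1)
    # (B, B) similarity matrix; the diagonal holds the matched pairs.
    logits = image_emb @ text_emb.t() / temperature
    targets = torch.arange(logits.size(0), device=logits.device)
    loss_i2t = F.cross_entropy(logits, targets)      # image -> caption direction
    loss_t2i = F.cross_entropy(logits.t(), targets)  # caption -> image direction
    return (loss_i2t + loss_t2i) / 2

if __name__ == "__main__":
    # Random tensors stand in for encoder outputs in this toy usage example.
    imgs = torch.randn(8, 512)   # batch of 8 image embeddings
    caps = torch.randn(8, 512)   # the 8 matching caption embeddings
    print(contrastive_loss(imgs, caps).item())
```

In practice the embeddings would come from trained image and text encoders; the loss pulls matched image-caption pairs together and pushes mismatched pairs apart within each batch.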
Papers
Understanding How Paper Writers Use AI-Generated Captions in Figure Caption Writing
Ho Yin (Sam) Ng, Ting-Yao Hsu, Jiyoo Min, Sungchul Kim, Ryan A. Rossi, Tong Yu, Hyunggu Jung, Ting-Hao 'Kenneth' Huang
Scalable Vision Language Model Training via High Quality Data Curation
Hongyuan Dong, Zijian Kang, Weijie Yin, Xiao Liang, Chao Feng, Jiao Ran
Precision or Recall? An Analysis of Image Captions for Training Text-to-Image Generation Model
Sheng Cheng, Maitreya Patel, Yezhou Yang
LLM2CLIP: Powerful Language Model Unlocks Richer Visual Representation
Weiquan Huang, Aoqi Wu, Yifan Yang, Xufang Luo, Yuqing Yang, Liang Hu, Qi Dai, Xiyang Dai, Dongdong Chen, Chong Luo, Lili Qiu