Generated Caption
Image and video captioning research aims to automatically generate descriptive text summarizing visual content, improving accessibility and enabling new applications in diverse fields. Current efforts focus on enhancing model accuracy and addressing limitations like bias and hallucination through techniques such as improved data alignment, graph-based captioning, and the integration of large language models (LLMs) with various encoder-decoder architectures, including transformers and LSTMs. These advancements are driving progress in areas such as remote sensing, medical image analysis, and retail analytics, where automated captioning can facilitate efficient data processing and analysis. Furthermore, research is actively exploring methods for improving caption quality, including length control, sentiment analysis, and the incorporation of contextual information.
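The decoding loop shared by the encoder-decoder captioners mentioned above (whether the decoder is an LSTM or a transformer) can be sketched in miniature. The snippet below is a toy illustration, not any specific paper's method: `toy_step` is a hypothetical stand-in for a real decoder step conditioned on image features, and the hard-coded transition table is invented for demonstration. It shows greedy token-by-token generation with the kind of length control the paragraph refers to, i.e. stopping at an end token or a maximum length.

```python
# Toy sketch of greedy caption decoding with length control.
# toy_step, the vocabulary, and the transition table are illustrative
# assumptions, not drawn from any paper listed on this page.

def toy_step(prev_token):
    """Return a next-token distribution given the previous token.

    Stands in for one step of a real decoder (LSTM or transformer)
    conditioned on encoded image features.
    """
    table = {
        "<bos>": {"a": 0.9, "the": 0.1},
        "a": {"dog": 0.6, "cat": 0.4},
        "dog": {"runs": 0.7, "<eos>": 0.3},
        "runs": {"<eos>": 1.0},
    }
    return table.get(prev_token, {"<eos>": 1.0})

def greedy_caption(max_len=10):
    """Greedily decode until <eos> or max_len tokens (length control)."""
    tokens = []
    prev = "<bos>"
    for _ in range(max_len):
        dist = toy_step(prev)
        prev = max(dist, key=dist.get)  # greedy: take the argmax token
        if prev == "<eos>":
            break
        tokens.append(prev)
    return " ".join(tokens)

print(greedy_caption())           # full caption: "a dog runs"
print(greedy_caption(max_len=2))  # length-controlled: "a dog"
```

In practice the distribution would come from a trained model and greedy argmax is often replaced by beam search or sampling, but the stop-at-`<eos>`-or-`max_len` structure is the same.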
Papers
SciCapenter: Supporting Caption Composition for Scientific Figures with Machine-Generated Captions and Ratings
Ting-Yao Hsu, Chieh-Yang Huang, Shih-Hong Huang, Ryan Rossi, Sungchul Kim, Tong Yu, C. Lee Giles, Ting-Hao K. Huang
The Solution for the ICCV 2023 1st Scientific Figure Captioning Challenge
Dian Chao, Xin Song, Shupeng Zhong, Boyuan Wang, Xiangyu Wu, Chen Zhu, Yang Yang