Generated Caption

Image and video captioning research aims to automatically generate text that describes visual content, improving accessibility and enabling applications across many fields. Current work focuses on raising caption accuracy and reducing bias and hallucination, using techniques such as improved data alignment, graph-based captioning, and the integration of large language models (LLMs) with encoder-decoder architectures built on transformers or LSTMs. These advances support applications in remote sensing, medical image analysis, and retail analytics, where automated captioning can speed up data processing and analysis. Researchers are also exploring ways to improve caption quality, including length control, sentiment analysis, and the incorporation of contextual information.

Papers