Captioning Evaluation

Evaluating the quality of automatically generated image captions is crucial but challenging: the goal is to design metrics that accurately reflect human judgments of fluency, level of detail, and factual accuracy. Current research pursues two directions: reference-free metrics, often built on CLIP-style vision-language models and contrastive learning, and reference-based metrics that address the limitations of established overlap-based measures such as CIDEr and METEOR, particularly their difficulty with long, detailed descriptions and with detecting visual hallucinations. These advances matter because better evaluation is essential for driving progress in image captioning models and their applications in areas such as accessibility and visual information retrieval.

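To make the reference-free direction concrete, below is a minimal sketch of a CLIPScore-style metric: the caption is scored by the cosine similarity between CLIP image and text embeddings, rescaled as in Hessel et al. (2021). The checkpoint name, helper function, and example file path are illustrative assumptions, not taken from any of the surveyed papers, and this is a simplified sketch rather than the official CLIPScore implementation.

```python
# Sketch of a CLIPScore-style reference-free caption metric.
# Assumes the Hugging Face "openai/clip-vit-base-patch32" checkpoint;
# the function name and example path are hypothetical.
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")
model.eval()

def clipscore(image: Image.Image, caption: str) -> float:
    """Reference-free caption score, roughly in [0, 2.5]."""
    inputs = processor(text=[caption], images=image,
                       return_tensors="pt", padding=True, truncation=True)
    with torch.no_grad():
        img_emb = model.get_image_features(pixel_values=inputs["pixel_values"])
        txt_emb = model.get_text_features(input_ids=inputs["input_ids"],
                                          attention_mask=inputs["attention_mask"])
    # L2-normalize both embeddings, then take the clipped cosine similarity.
    img_emb = img_emb / img_emb.norm(dim=-1, keepdim=True)
    txt_emb = txt_emb / txt_emb.norm(dim=-1, keepdim=True)
    cos = (img_emb * txt_emb).sum(dim=-1).item()
    return 2.5 * max(cos, 0.0)  # w = 2.5 rescaling, as in the CLIPScore paper

# Example usage (hypothetical image file):
# print(clipscore(Image.open("dog_on_beach.jpg"), "a dog running on the beach"))
```

Reference-based variants of the same idea (e.g., RefCLIPScore) additionally compare the candidate caption's text embedding with those of the human references; the reference-free form above is what makes CLIP-based metrics attractive when no ground-truth captions are available.
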
Papers