Visual Hallucination

Visual hallucinations in artificial intelligence refer to instances where AI models, particularly large language models (LLMs) and vision-language models (VLMs), generate outputs that are inconsistent with the visual input or otherwise factually incorrect, for example describing objects that do not appear in an image. Current research focuses on identifying and quantifying these hallucinations across modalities (text, images, video), developing automated evaluation metrics, and mitigating them through improved training data, refined model architectures (e.g., incorporating pose information or scene graphs), and contrastive decoding strategies. Understanding and addressing visual hallucinations is crucial for building trustworthy and reliable AI systems, with implications for applications ranging from healthcare and finance to creative content generation.

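As a rough illustration of the last mitigation strategy mentioned above, the sketch below shows a single step of a generic visual contrastive decoding scheme: next-token logits computed with the original image are contrasted against logits computed with a distorted (or missing) image, so that the model's image-independent language prior, a common source of hallucinated content, is down-weighted. The function names, the `alpha`/`beta` hyperparameters, and the toy logits are illustrative assumptions rather than any specific paper's implementation.

```python
import numpy as np

def softmax(logits):
    """Numerically stable softmax over the last axis."""
    z = logits - logits.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def contrastive_decode_step(logits_with_image, logits_distorted, alpha=1.0, beta=0.1):
    """One decoding step of a generic visual contrastive decoding scheme (sketch).

    logits_with_image: next-token logits conditioned on the original image.
    logits_distorted:  next-token logits conditioned on a distorted or blank image.
    alpha:             strength of the contrastive correction (assumed hyperparameter).
    beta:              plausibility cutoff relative to the most likely token.
    """
    # Amplify what the visual evidence contributes and subtract what the model
    # would generate anyway from its language prior.
    contrastive = (1.0 + alpha) * logits_with_image - alpha * logits_distorted

    # Adaptive plausibility constraint: only keep tokens that the
    # image-conditioned distribution itself considers reasonably likely.
    probs = softmax(logits_with_image)
    keep = probs >= beta * probs.max()
    contrastive = np.where(keep, contrastive, -np.inf)

    return int(np.argmax(contrastive))  # greedy pick; sampling also works

# Toy usage with a 5-token vocabulary (values are made up).
logits_img = np.array([2.0, 1.5, 0.3, -1.0, 0.1])
logits_blur = np.array([2.1, 0.2, 0.4, -0.9, 0.0])
print(contrastive_decode_step(logits_img, logits_blur))
```

In this toy example the distorted-image logits still favor token 0, so the contrastive correction shifts the choice to token 1, which gains most of its support from the actual image.
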
Papers