Visual Hallucination
Visual hallucinations in artificial intelligence refer to instances where AI models, particularly large language models (LLMs) and vision-language models (VLMs), generate outputs that are factually incorrect or inconsistent with the visual input, such as describing objects, attributes, or relationships that are not actually present. Current research focuses on identifying and quantifying these hallucinations across modalities (text, images, video), developing automated evaluation metrics, and exploring mitigation techniques such as improved training data, refined model architectures (e.g., incorporating pose information or scene graphs), and contrastive decoding strategies. Understanding and addressing visual hallucinations is crucial for building trustworthy and reliable AI systems, with implications for applications ranging from healthcare and finance to creative content generation.
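As one concrete illustration of a mitigation strategy, the sketch below shows the core step of a generic image-grounded contrastive decoding scheme: next-token logits computed with the real image are contrasted against logits computed with the image removed or corrupted, which penalizes tokens favored purely by the language prior. The function name, the `alpha` weight, and the toy tensors are illustrative assumptions, not the method of any specific paper listed here.

```python
# Minimal sketch of contrastive decoding for hallucination mitigation (assumptions:
# the caller can run the VLM twice, once with the true image and once with a
# blank/corrupted image, and obtain next-token logits from each pass).
import torch


def contrastive_decode_step(logits_with_image: torch.Tensor,
                            logits_without_image: torch.Tensor,
                            alpha: float = 1.0) -> torch.Tensor:
    """Return a next-token distribution after contrasting grounded vs. prior-only logits."""
    # Amplify evidence grounded in the image and subtract the image-free prior;
    # alpha controls how strongly the language-only prediction is penalized.
    adjusted = (1.0 + alpha) * logits_with_image - alpha * logits_without_image
    return torch.softmax(adjusted, dim=-1)


if __name__ == "__main__":
    # Toy example over an 8-token vocabulary (random logits stand in for model outputs).
    torch.manual_seed(0)
    grounded = torch.randn(8)     # logits when the model sees the real image
    prior_only = torch.randn(8)   # logits when the image is blank or distorted
    print(contrastive_decode_step(grounded, prior_only, alpha=1.0))
```

In practice the adjusted distribution would replace the standard softmax at each decoding step, optionally combined with the usual sampling constraints (temperature, top-p) to keep generation fluent.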
Papers
Detecting Hallucinations in Virtual Histology with Neural Precursors
Ji-Hun Oh, Kianoush Falahkheirkhah, Rohit Bhargava
ICT: Image-Object Cross-Level Trusted Intervention for Mitigating Object Hallucination in Large Vision-Language Models
Junzhe Chen, Tianshu Zhang, Shiyu Huang, Yuwei Niu, Linfeng Zhang, Lijie Wen, Xuming Hu