Object Hallucination
Object hallucination, the tendency of large vision-language models (LVLMs) to generate descriptions containing objects not present in the input image, is a significant challenge hindering their reliability. Current research focuses on understanding the root causes of this phenomenon by examining attention mechanisms within transformer-based architectures and the respective roles of visual encoders and language decoders. The goal is to develop mitigation methods, both training-based and training-free, that improve the accuracy and trustworthiness of LVLMs in applications such as image captioning and visual question answering. Improved evaluation metrics are also being developed to assess and compare approaches to reducing object hallucination more accurately.
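For context on how object hallucination is commonly quantified, below is a minimal sketch of a CHAIR-style metric (Caption Hallucination Assessment with Image Relevance), which measures how often mentioned objects are absent from the image. This is an illustrative example only; the function name `chair_scores` and the input format are assumptions, not an API from the papers listed here.

```python
from typing import Dict, List, Set


def chair_scores(
    caption_objects: List[Set[str]],  # objects mentioned in each generated caption
    image_objects: List[Set[str]],    # ground-truth objects present in each image
) -> Dict[str, float]:
    """Compute CHAIR-style hallucination rates.

    CHAIR_i: fraction of all mentioned object instances that are hallucinated.
    CHAIR_s: fraction of captions containing at least one hallucinated object.
    """
    total_mentions = 0
    hallucinated_mentions = 0
    hallucinated_captions = 0

    for mentioned, present in zip(caption_objects, image_objects):
        hallucinated = mentioned - present  # mentioned but not actually in the image
        total_mentions += len(mentioned)
        hallucinated_mentions += len(hallucinated)
        if hallucinated:
            hallucinated_captions += 1

    num_captions = len(caption_objects)
    return {
        "CHAIR_i": hallucinated_mentions / max(total_mentions, 1),
        "CHAIR_s": hallucinated_captions / max(num_captions, 1),
    }


# Example: the first caption hallucinates a "dog" that is not in the image.
print(chair_scores(
    caption_objects=[{"person", "dog"}, {"car"}],
    image_objects=[{"person"}, {"car", "tree"}],
))
# -> {'CHAIR_i': 0.333..., 'CHAIR_s': 0.5}
```

In practice, mentioned objects are extracted from captions with a synonym-aware matcher against a fixed object vocabulary (e.g., COCO categories); the set-difference step above is the core idea the improved metrics build on.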
Papers
AGLA: Mitigating Object Hallucinations in Large Vision-Language Models with Assembly of Global and Local Attention
Wenbin An, Feng Tian, Sicong Leng, Jiahao Nie, Haonan Lin, QianYing Wang, Guang Dai, Ping Chen, Shijian Lu
Do More Details Always Introduce More Hallucinations in LVLM-based Image Captioning?
Mingqian Feng, Yunlong Tang, Zeliang Zhang, Chenliang Xu