Object Hallucination

Object hallucination, the tendency of large vision-language models (LVLMs) to describe objects that are not present in the input image, is a major obstacle to their reliability. Current research seeks the root causes of the phenomenon, examining attention patterns in transformer-based architectures and the respective roles of the visual encoder and language decoder. Building on these findings, both training-based and training-free mitigation methods are being developed to improve the accuracy and trustworthiness of LVLMs in applications such as image captioning and visual question answering. Improved evaluation metrics are also being proposed to assess and compare hallucination-mitigation approaches more accurately.

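The most widely used evaluation metric in this line of work, CHAIR (Caption Hallucination Assessment with Image Relevance), measures hallucination as the fraction of mentioned objects that are absent from the image. The sketch below is a minimal, simplified illustration that assumes captions have already been reduced to sets of object words (e.g., by matching against a fixed object vocabulary); names such as `chair_scores` are illustrative and not taken from any benchmark's codebase.

```python
def chair_scores(mentioned_objects, ground_truth_objects):
    """Compute CHAIR-i (instance level) and CHAIR-s (sentence level).

    mentioned_objects: list of sets, objects mentioned in each generated caption.
    ground_truth_objects: list of sets, objects actually present in each image.
    """
    hallucinated_instances = 0
    total_instances = 0
    hallucinated_captions = 0

    for mentioned, present in zip(mentioned_objects, ground_truth_objects):
        hallucinated = mentioned - present  # objects mentioned but not in the image
        hallucinated_instances += len(hallucinated)
        total_instances += len(mentioned)
        if hallucinated:
            hallucinated_captions += 1

    # CHAIR-i: share of hallucinated object mentions; CHAIR-s: share of captions
    # containing at least one hallucinated object.
    chair_i = hallucinated_instances / max(total_instances, 1)
    chair_s = hallucinated_captions / max(len(mentioned_objects), 1)
    return chair_i, chair_s


# Toy example: the first caption hallucinates a "dog", the second is faithful.
mentioned = [{"person", "dog", "bench"}, {"car", "tree"}]
present = [{"person", "bench"}, {"car", "tree", "road"}]
print(chair_scores(mentioned, present))  # (0.2, 0.5)
```

Lower values indicate less hallucination; newer benchmarks extend this idea with question-answering probes and finer-grained object matching.
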
Papers