Multimodal Hallucination
Multimodal hallucination refers to the generation of inaccurate or fabricated information by large vision-language models (LVLMs), which combine visual and textual data. Current research focuses on understanding the underlying causes of these hallucinations, developing methods to detect them (often with novel metrics and datasets), and mitigating them through techniques such as hierarchical feedback learning, data filtering, and self-supervised revision mechanisms. This work is crucial for improving the reliability and trustworthiness of LVLMs in applications where factual accuracy is paramount, from medical diagnosis to question answering systems.
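To make the idea of a detection metric concrete, here is a minimal sketch of an object-hallucination score in the style of the CHAIR metric: it counts the objects a caption mentions that are absent from the image's annotated objects. It is an illustration only, not a method taken from any specific paper listed here. The function name `chair_scores`, the word-level matching, and the toy vocabulary are all assumptions; a practical implementation would also handle synonyms, plurals, and multi-word object names.

```python
import re


def chair_scores(captions, ground_truth, vocabulary):
    """Return (per-object rate, per-caption rate) of hallucinated objects.

    captions:     list of generated caption strings
    ground_truth: list of sets of objects actually present in each image
    vocabulary:   set of object words the metric knows about (assumed given)

    Simplification: each object is counted once per caption.
    """
    mentioned_total = 0
    hallucinated_total = 0
    captions_with_hallucination = 0

    for caption, present in zip(captions, ground_truth):
        # Crude tokenization: lowercase alphabetic words only.
        words = set(re.findall(r"[a-z]+", caption.lower()))
        mentioned = words & vocabulary
        hallucinated = mentioned - present

        mentioned_total += len(mentioned)
        hallucinated_total += len(hallucinated)
        if hallucinated:
            captions_with_hallucination += 1

    # Guard against empty inputs to avoid division by zero.
    per_object = hallucinated_total / mentioned_total if mentioned_total else 0.0
    per_caption = captions_with_hallucination / len(captions) if captions else 0.0
    return per_object, per_caption


# Toy data (hypothetical): the first caption mentions a cat that is not in the image.
captions = ["A dog sits next to a cat on a sofa.", "A man rides a horse."]
ground_truth = [{"dog", "sofa"}, {"man", "horse"}]
vocabulary = {"dog", "cat", "sofa", "man", "horse"}

per_object, per_caption = chair_scores(captions, ground_truth, vocabulary)
print(per_object, per_caption)
```

On this toy input, one of the five mentioned objects is hallucinated, and one of the two captions contains a hallucination. Lower values indicate fewer hallucinated objects; the score says nothing about attribute or relation errors, which need other measures.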