Image Understanding
Image understanding research aims to enable computers to interpret and reason about the content of images, approximating human visual perception and comprehension. Current efforts focus on improving the accuracy and robustness of large multimodal models (such as VLMs and LLMs extended with visual input), particularly on challenges such as occlusion, cross-domain generalization, and hallucination, often using techniques such as contrastive learning, retrieval augmentation, and self-training. These advances matter for applications ranging from medical image analysis and remote sensing to e-commerce and web accessibility, and they support progress in both fundamental computer vision and practical AI systems.