Image Understanding

Image understanding research aims to enable computers to interpret and reason about the content of images, mirroring human visual perception and comprehension. Current efforts focus on improving the accuracy and robustness of large multimodal models (like LLMs and VLMs), particularly addressing challenges such as occlusion, cross-domain generalization, and hallucinations, often through techniques like contrastive learning, retrieval augmentation, and self-training. These advancements are crucial for applications ranging from medical image analysis and remote sensing to e-commerce and web accessibility, driving progress in both fundamental computer vision and practical AI systems.

Papers