Image Understanding
Image understanding research aims to enable computers to interpret and reason about the content of images, mirroring human visual perception and comprehension. Current efforts focus on improving the accuracy and robustness of large models, such as large language models (LLMs) and vision-language models (VLMs), particularly in the face of occlusion, cross-domain generalization, and hallucination, often through techniques like contrastive learning, retrieval augmentation, and self-training. These advances matter for applications ranging from medical image analysis and remote sensing to e-commerce and web accessibility, and they drive progress in both fundamental computer vision and practical AI systems.
Papers
Hierarchical Open-vocabulary Universal Image Segmentation
Xudong Wang, Shufan Li, Konstantinos Kallidromitis, Yusuke Kato, Kazuki Kozuka, Trevor Darrell
JourneyDB: A Benchmark for Generative Image Understanding
Keqiang Sun, Junting Pan, Yuying Ge, Hao Li, Haodong Duan, Xiaoshi Wu, Renrui Zhang, Aojun Zhou, Zipeng Qin, Yi Wang, Jifeng Dai, Yu Qiao, Limin Wang, Hongsheng Li