Geometric Vision
Geometric vision aims to enable computers to recover and interpret the geometry of scenes from visual data, with robust and efficient solutions to problems such as 3D reconstruction and object pose estimation. Current research emphasizes developing and evaluating large language models and multimodal models for geometric reasoning, particularly their limitations in depth and height perception, and explores algorithms such as the Burer-Monteiro method for speeding up computationally expensive optimization problems. These advances matter for the accuracy and efficiency of applications ranging from robotics and autonomous navigation to augmented reality and medical imaging.
Papers
GePBench: Evaluating Fundamental Geometric Perception for Multimodal Large Language Models
Shangyu Xing, Changhao Xiang, Yuteng Han, Yifan Yue, Zhen Wu, Xinyu Liu, Zhangtai Wu, Fei Zhao, Xinyu Dai
Slow Perception: Let's Perceive Geometric Figures Step-by-step
Haoran Wei, Youyang Yin, Yumeng Li, Jia Wang, Liang Zhao, Jianjian Sun, Zheng Ge, Xiangyu Zhang