Spatial Understanding
Spatial understanding in artificial intelligence aims to enable machines to comprehend and reason about spatial relationships in 2D and 3D environments, mirroring human cognitive abilities. Current research relies heavily on large language models (LLMs) and vision-language models (VLMs), often augmented with novel components such as spatial alignment modules and embedding pose graphs to improve spatial reasoning and navigation. The field is central to embodied AI, robotics, and applications that demand precise spatial awareness, including autonomous navigation, real estate appraisal, and medical image analysis. Progress is also being driven by comprehensive benchmarks and datasets for evaluating and improving model performance.
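To make the "embedding pose graph" idea mentioned above concrete, the sketch below shows one minimal way such a structure could look: graph nodes pair a pose with a feature embedding, edges store relative transforms, and chaining transforms along a path yields a pose estimate usable for navigation. This is an illustrative assumption only; the class and method names (EmbeddingPoseGraph, pose_along_path, etc.) are hypothetical and not taken from any of the papers listed here.

```python
from dataclasses import dataclass, field
from math import cos, sin
from typing import Dict, List, Tuple

Pose = Tuple[float, float, float]  # (x, y, heading) in SE(2), a simplifying assumption


@dataclass
class PoseNode:
    """A node holding a 2D pose and a visual feature embedding for that viewpoint."""
    pose: Pose
    embedding: List[float] = field(default_factory=list)


def compose(a: Pose, b: Pose) -> Pose:
    """Compose pose b (expressed in a's frame) onto pose a, giving b in the world frame."""
    ax, ay, ath = a
    bx, by, bth = b
    return (ax + cos(ath) * bx - sin(ath) * by,
            ay + sin(ath) * bx + cos(ath) * by,
            ath + bth)


class EmbeddingPoseGraph:
    """Minimal pose graph: nodes carry embeddings, directed edges carry relative transforms."""

    def __init__(self) -> None:
        self.nodes: Dict[int, PoseNode] = {}
        self.edges: Dict[int, Dict[int, Pose]] = {}

    def add_node(self, node_id: int, pose: Pose, embedding: List[float]) -> None:
        self.nodes[node_id] = PoseNode(pose, embedding)
        self.edges.setdefault(node_id, {})

    def add_edge(self, src: int, dst: int, relative_pose: Pose) -> None:
        """Record the transform from src's frame to dst's frame."""
        self.edges[src][dst] = relative_pose

    def pose_along_path(self, path: List[int]) -> Pose:
        """Chain relative transforms along a node path, starting from the first node's pose."""
        pose = self.nodes[path[0]].pose
        for src, dst in zip(path, path[1:]):
            pose = compose(pose, self.edges[src][dst])
        return pose


if __name__ == "__main__":
    g = EmbeddingPoseGraph()
    g.add_node(0, (0.0, 0.0, 0.0), embedding=[0.1, 0.9])  # e.g. feature vector for a "kitchen" view
    g.add_node(1, (1.0, 0.0, 0.0), embedding=[0.8, 0.2])  # e.g. feature vector for a "hallway" view
    g.add_edge(0, 1, (1.0, 0.0, 0.0))                     # observed motion: 1 m forward
    print(g.pose_along_path([0, 1]))                      # -> (1.0, 0.0, 0.0)
```

In practice the embeddings would come from a VLM or other visual encoder and the relative poses from odometry or learned estimators, but the graph-plus-embedding layout itself is the part relevant to the navigation work surveyed above.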
Papers
CoMPaSS: Enhancing Spatial Understanding in Text-to-Image Diffusion Models
Gaoyang Zhang, Bingtao Fu, Qingnan Fan, Qi Zhang, Runxing Liu, Hong Gu, Huaqi Zhang, Xinguo Liu
SPHERE: A Hierarchical Evaluation on Spatial Perception and Reasoning for Vision-Language Models
Wenyu Zhang, Wei En Ng, Lixin Ma, Yuwen Wang, Jungqi Zhao, Boyang Li, Lu Wang