Spatial Understanding
Spatial understanding in artificial intelligence aims to give machines the ability to comprehend and reason about spatial relationships in 2D and 3D environments, mirroring human cognitive abilities. Current research relies heavily on large language models (LLMs) and vision-language models (VLMs), often augmented with components such as spatial alignment modules and pose-graph embeddings to improve spatial reasoning and navigation. The field is crucial for embodied AI, robotics, and applications that demand precise spatial awareness, including autonomous navigation, real estate appraisal, and medical image analysis, and progress in evaluating and improving models is increasingly driven by comprehensive benchmarks and datasets.
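To make the idea of a spatial alignment module concrete, here is a minimal sketch of one common design: cross-attention in which language-token embeddings attend to spatially indexed visual patch features, with learnable 2D positional embeddings preserving the patch layout so the model can ground relations like "left of" or "above". All class names, dimensions, and the 7x7 grid below are illustrative assumptions, not the architecture of any paper listed here.

```python
# Hypothetical sketch of a spatial alignment module: text tokens query
# a grid of visual patch features via cross-attention. Assumed design,
# not taken from the papers below.
import torch
import torch.nn as nn


class SpatialAlignmentModule(nn.Module):
    def __init__(self, dim: int = 256, num_heads: int = 4, grid: int = 7):
        super().__init__()
        # Cross-attention: language tokens are queries, patches are keys/values.
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        # Learnable 2D positional embeddings keep the patch layout explicit,
        # which is what lets attention reason about spatial relations.
        self.pos = nn.Parameter(torch.randn(1, grid * grid, dim) * 0.02)
        self.norm = nn.LayerNorm(dim)

    def forward(self, text_tokens: torch.Tensor, patches: torch.Tensor) -> torch.Tensor:
        # text_tokens: (B, T, dim); patches: (B, grid*grid, dim), row-major.
        patches = patches + self.pos
        aligned, _ = self.attn(query=text_tokens, key=patches, value=patches)
        # Residual fusion: spatially grounded features are added to the tokens.
        return self.norm(text_tokens + aligned)


# Toy usage with random tensors standing in for real encoder outputs.
module = SpatialAlignmentModule()
text = torch.randn(2, 12, 256)    # 12 language tokens
vision = torch.randn(2, 49, 256)  # flattened 7x7 patch grid
out = module(text, vision)
print(out.shape)  # torch.Size([2, 12, 256])
```

The design choice worth noting is the explicit positional embedding on the visual side: without it, attention treats the patch set as unordered and cannot recover geometric relationships between regions.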
Papers
SAM-CLIP: Merging Vision Foundation Models towards Semantic and Spatial Understanding
Haoxiang Wang, Pavan Kumar Anasosalu Vasu, Fartash Faghri, Raviteja Vemulapalli, Mehrdad Farajtabar, Sachin Mehta, Mohammad Rastegari, Oncel Tuzel, Hadi Pouransari
Evaluating Spatial Understanding of Large Language Models
Yutaro Yamada, Yihan Bao, Andrew K. Lampinen, Jungo Kasai, Ilker Yildirim
GeoLM: Empowering Language Models for Geospatially Grounded Language Understanding
Zekun Li, Wenxuan Zhou, Yao-Yi Chiang, Muhao Chen