Spatial Reasoning
Spatial reasoning, the ability to understand and manipulate spatial relationships, is a crucial area of artificial intelligence research aimed at enabling machines to perform tasks that require an understanding of the physical layout of the world. Current research emphasizes improving the spatial reasoning capabilities of large language models (LLMs) and vision-language models (VLMs) through techniques such as prompt engineering, 3D scene graph integration, and the development of new training datasets and benchmarks that specifically target spatial reasoning. These advances matter because robust spatial reasoning is essential for robotics, autonomous navigation, and other applications that require interaction with the physical world.
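As a concrete illustration of the scene-graph-based prompting mentioned above, the minimal sketch below derives qualitative spatial relations (left/right, front/behind, above/below) from object coordinates in a toy 3D scene graph and serializes them into a text prompt that could be passed to an LLM or VLM. The object names, coordinate convention, thresholds, and prompt wording are illustrative assumptions, not a method taken from any of the papers listed below.

```python
# Hypothetical sketch of "3D scene graph integration" for LLM prompting:
# derive coarse qualitative relations from object coordinates and
# serialize them into a natural-language prompt. The coordinate
# convention (x: right, y: forward, z: up) and the 0.1 m threshold are
# illustrative assumptions.
from dataclasses import dataclass


@dataclass
class SceneObject:
    name: str
    position: tuple[float, float, float]  # (x, y, z) in meters


def pairwise_relations(a: SceneObject, b: SceneObject) -> list[str]:
    """Qualitative relations of `a` with respect to `b`."""
    dx = a.position[0] - b.position[0]
    dy = a.position[1] - b.position[1]
    dz = a.position[2] - b.position[2]
    relations = []
    if abs(dx) > 0.1:
        relations.append("to the right of" if dx > 0 else "to the left of")
    if abs(dy) > 0.1:
        relations.append("in front of" if dy < 0 else "behind")
    if abs(dz) > 0.1:
        relations.append("above" if dz > 0 else "below")
    return relations


def scene_graph_prompt(objects: list[SceneObject], question: str) -> str:
    """Serialize the scene graph into a prompt for a language model."""
    lines = ["You are given a 3D scene described by object relations:"]
    for i, a in enumerate(objects):
        for b in objects[i + 1:]:
            for rel in pairwise_relations(a, b):
                lines.append(f"- The {a.name} is {rel} the {b.name}.")
    lines.append(f"Question: {question}")
    return "\n".join(lines)


if __name__ == "__main__":
    scene = [
        SceneObject("mug", (0.4, 1.0, 0.8)),
        SceneObject("laptop", (0.0, 1.0, 0.8)),
        SceneObject("lamp", (0.0, 1.2, 1.3)),
    ]
    print(scene_graph_prompt(scene, "Which object is above the laptop?"))
```

The resulting prompt lists relations such as "The mug is to the right of the laptop" before the question, so the model can answer from explicit spatial structure rather than raw coordinates; richer pipelines would populate the scene graph from perception rather than hand-written positions.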
Papers
CityWalker: Learning Embodied Urban Navigation from Web-Scale Videos
Xinhao Liu, Jintong Li, Yichen Jiang, Niranjan Sujay, Zhicheng Yang, Juexiao Zhang, John Abanes, Jing Zhang, Chen Feng
APT: Architectural Planning and Text-to-Blueprint Construction Using Large Language Models for Open-World Agents
Jun Yu Chen, Tao Gao
RoboSpatial: Teaching Spatial Understanding to 2D and 3D Vision-Language Models for Robotics
Chan Hee Song, Valts Blukis, Jonathan Tremblay, Stephen Tyree, Yu Su, Stan Birchfield
TopV-Nav: Unlocking the Top-View Spatial Reasoning Potential of MLLM for Zero-shot Object Navigation
Linqing Zhong, Chen Gao, Zihan Ding, Yue Liao, Si Liu
AI's Spatial Intelligence: Evaluating AI's Understanding of Spatial Transformations in PSVT:R and Augmented Reality
Uttamasha Monjoree, Wei Yan
An Empirical Analysis on Spatial Reasoning Capabilities of Large Multimodal Models
Fatemeh Shiri, Xiao-Yu Guo, Mona Golestan Far, Xin Yu, Gholamreza Haffari, Yuan-Fang Li