Spatial Reasoning
Spatial reasoning, the ability to understand and manipulate spatial relationships, is an active area of artificial intelligence research aimed at enabling machines to perform tasks that depend on that ability. Current work focuses on improving the spatial reasoning of large language models (LLMs) and vision-language models (VLMs) through prompt engineering, 3D scene graph integration, and new training datasets and benchmarks that specifically target spatial reasoning. These advances matter because stronger spatial reasoning is essential for progress in robotics, autonomous navigation, and other applications that interact with the physical world.
Papers
AI's Spatial Intelligence: Evaluating AI's Understanding of Spatial Transformations in PSVT:R and Augmented Reality
Uttamasha Monjoree, Wei Yan
An Empirical Analysis on Spatial Reasoning Capabilities of Large Multimodal Models
Fatemeh Shiri, Xiao-Yu Guo, Mona Golestan Far, Xin Yu, Gholamreza Haffari, Yuan-Fang Li
Sparkle: Mastering Basic Spatial Capabilities in Vision Language Models Elicits Generalization to Composite Spatial Reasoning
Yihong Tang, Ao Qu, Zhaokai Wang, Dingyi Zhuang, Zhaofeng Wu, Wei Ma, Shenhao Wang, Yunhan Zheng, Zhan Zhao, Jinhua Zhao
Task-oriented Robotic Manipulation with Vision Language Models
Nurhan Bulus Guran, Hanchi Ren, Jingjing Deng, Xianghua Xie