Visual Spatial Description
Visual spatial description (VSD) is the task of automatically generating text that describes the spatial relationships between objects in an image or scene. Current research aims to make these descriptions more accurate and more diverse, covers both 2D and 3D scene understanding, and draws on large language models (LLMs) and convolutional neural networks (CNNs). The field matters for human-computer interaction, particularly in robotics and navigation, where it enables more natural and robust communication about spatial environments. VSD research also contributes to a deeper understanding of how humans perceive and describe spatial relationships.
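To make the task concrete, here is a minimal sketch of a toy, rule-based baseline that turns two 2D bounding boxes into one spatial sentence. It is not a method from any of the papers listed here; current systems learn this mapping with neural models. The `Box` class, the `describe` function, the four-way relation vocabulary, and the sample coordinates are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Box:
    """Hypothetical axis-aligned bounding box in image coordinates (y grows downward)."""
    label: str
    x_min: float
    y_min: float
    x_max: float
    y_max: float

    @property
    def center(self) -> tuple[float, float]:
        return ((self.x_min + self.x_max) / 2, (self.y_min + self.y_max) / 2)


def describe(subject: Box, reference: Box) -> str:
    """Describe where `subject` lies relative to `reference` using box centers only."""
    sx, sy = subject.center
    rx, ry = reference.center
    dx, dy = sx - rx, sy - ry

    if dx == 0 and dy == 0:
        relation = "at the same position as"
    elif abs(dx) >= abs(dy):
        # Horizontal offset dominates.
        relation = "to the right of" if dx > 0 else "to the left of"
    else:
        # Vertical offset dominates; a larger y means lower in the image.
        relation = "below" if dy > 0 else "above"

    return f"The {subject.label} is {relation} the {reference.label}."


cup = Box("cup", 10, 40, 30, 60)
laptop = Box("laptop", 50, 30, 120, 80)
print(describe(cup, laptop))  # The cup is to the left of the laptop.
```

A baseline like this yields only one fixed sentence per object pair and ignores depth, occlusion, and context, which is roughly the gap that research on accuracy, diversity, and 3D scene understanding tries to close.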
Papers
November 18, 2024
October 28, 2024
August 9, 2024
February 26, 2024
May 19, 2023
October 20, 2022