Paper ID: 2502.07183 • Published Feb 11, 2025
Space-Aware Instruction Tuning: Dataset and Benchmark for Guide Dog Robots Assisting the Visually Impaired
ByungOk Han, Woo-han Yun, Beom-Su Seo, Jaehong Kim
Guide dog robots offer promising solutions to enhance mobility and safety for
visually impaired individuals, addressing the limitations of traditional guide
dogs, particularly in perceptual intelligence and communication. With the
emergence of Vision-Language Models (VLMs), robots are now capable of
generating natural language descriptions of their surroundings, aiding in safer
decision-making. However, existing VLMs often struggle to accurately interpret
and convey spatial relationships, which are crucial for navigation in complex
environments such as street crossings. We introduce the Space-Aware Instruction
Tuning (SAIT) dataset and the Space-Aware Benchmark (SA-Bench) to address the
limitations of current VLMs in understanding physical environments. Our
automated data generation pipeline focuses on the virtual path to the
destination in 3D space and the surroundings, enhancing environmental
comprehension and enabling VLMs to provide more accurate guidance to visually
impaired individuals. We also propose an evaluation protocol to assess VLM
effectiveness in delivering walking guidance. Comparative experiments
demonstrate that our space-aware instruction-tuned model outperforms
state-of-the-art algorithms. We have fully open-sourced the SAIT dataset and
SA-Bench, along with the related code, at
this https URL
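
For illustration, a minimal sketch of what a space-aware instruction-tuning sample might look like in practice. The field names, spatial annotations, and values below are hypothetical assumptions made for this sketch, not the released SAIT schema; the actual format is documented in the open-sourced repository linked above.

# Hypothetical space-aware instruction-tuning sample (Python dict).
# All field names and values are illustrative assumptions, not the SAIT schema.
sait_sample = {
    "image": "street_crossing_0001.jpg",  # egocentric view captured by the robot
    "instruction": "Describe the path to the destination and any nearby hazards.",
    "response": (
        "The crosswalk starts about two meters ahead and continues straight "
        "for roughly ten meters. A cyclist is approaching from your left; "
        "wait until they pass before crossing."
    ),
    "spatial_context": {
        # virtual path to the destination, as ground-plane waypoints in meters
        "path_waypoints_m": [[0.0, 2.0], [0.0, 12.0]],
        # surrounding objects with bearing (degrees, left negative) and distance
        "obstacles": [
            {"label": "cyclist", "bearing_deg": -60, "distance_m": 5.0},
        ],
    },
}

Under these assumptions, a VLM would be prompted with the image and instruction and fine-tuned to produce the spatially grounded response; an SA-Bench-style evaluation would then compare the model's generated walking guidance against such references.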