Vision-and-Language Navigation
Vision-and-Language Navigation (VLN) focuses on enabling agents to navigate 3D environments by following natural language instructions, bridging the gap between visual perception and linguistic understanding. Current research emphasizes improving model efficiency (e.g., through knowledge distillation); exploring zero-shot navigation with large language models (LLMs), including the incorporation of safety mechanisms; and addressing challenges such as instruction errors and robustness to environmental change. The field is significant for advancing embodied AI, with potential applications in robotics, autonomous systems, and human-computer interaction.
Papers
A New Path: Scaling Vision-and-Language Navigation with Synthetic Instructions and Imitation Learning
Aishwarya Kamath, Peter Anderson, Su Wang, Jing Yu Koh, Alexander Ku, Austin Waters, Yinfei Yang, Jason Baldridge, Zarana Parekh
Iterative Vision-and-Language Navigation
Jacob Krantz, Shurjo Banerjee, Wang Zhu, Jason Corso, Peter Anderson, Stefan Lee, Jesse Thomason