Vision and Language Navigation

Vision-and-Language Navigation (VLN) focuses on enabling agents to navigate 3D environments by following natural language instructions, aiming to bridge the gap between visual perception and linguistic understanding. Current research emphasizes improving model efficiency (e.g., through knowledge distillation), exploring zero-shot navigation with large language models (LLMs) and incorporating safety mechanisms, and addressing challenges like instruction errors and robustness to environmental changes. This field is significant for advancing embodied AI and has potential applications in robotics, autonomous systems, and human-computer interaction.

Papers