Vision-and-Language Navigation
Vision-and-Language Navigation (VLN) focuses on enabling agents to navigate 3D environments by following natural language instructions, bridging the gap between visual perception and linguistic understanding. Current research emphasizes improving model efficiency (e.g., through knowledge distillation), exploring zero-shot navigation with large language models (LLMs), incorporating safety mechanisms, and addressing challenges such as instruction errors and robustness to environmental changes. The field is significant for advancing embodied AI and has potential applications in robotics, autonomous systems, and human-computer interaction.
Papers
MLANet: Multi-Level Attention Network with Sub-instruction for Continuous Vision-and-Language Navigation
Zongtao He, Liuyi Wang, Shu Li, Qingqing Yan, Chengju Liu, Qijun Chen
ESceme: Vision-and-Language Navigation with Episodic Scene Memory
Qi Zheng, Daqing Liu, Chaoyue Wang, Jing Zhang, Dadong Wang, Dacheng Tao