Vision-and-Language Navigation
Vision-and-Language Navigation (VLN) enables agents to navigate 3D environments by following natural language instructions, bridging visual perception and linguistic understanding. Current research emphasizes improving model efficiency (e.g., through knowledge distillation), exploring zero-shot navigation with large language models (LLMs), incorporating safety mechanisms, and addressing challenges such as instruction errors and robustness to environmental changes. The field is significant for advancing embodied AI, with potential applications in robotics, autonomous systems, and human-computer interaction.
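To make the zero-shot LLM direction concrete, here is a minimal sketch of one common pattern: prompting a language model with the instruction, the current observation, and the action history, then parsing a single discrete action. The `query_llm` callable, the `ACTIONS` set, and the prompt format are all illustrative assumptions, not the method of any paper listed below.

```python
# Minimal zero-shot VLN sketch (hypothetical names, illustration only).
from typing import Callable, List

# Assumed discrete action space; real benchmarks vary.
ACTIONS = ["move_forward", "turn_left", "turn_right", "stop"]

def choose_action(query_llm: Callable[[str], str],
                  instruction: str,
                  observations: List[str],
                  history: List[str]) -> str:
    """Ask the LLM for the next action given the instruction,
    the currently visible landmarks, and the actions taken so far."""
    prompt = (
        f"Instruction: {instruction}\n"
        f"Visible landmarks: {', '.join(observations)}\n"
        f"Actions so far: {', '.join(history) or 'none'}\n"
        f"Reply with exactly one of: {', '.join(ACTIONS)}."
    )
    reply = query_llm(prompt).strip().lower()
    # Fall back to "stop" if the reply is not a valid action.
    return reply if reply in ACTIONS else "stop"
```

The fallback to "stop" is one simple safety mechanism of the kind the overview alludes to: an unparseable model reply halts the agent rather than triggering an arbitrary move.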
Papers
DELAN: Dual-Level Alignment for Vision-and-Language Navigation by Cross-Modal Contrastive Learning
Mengfei Du, Binhao Wu, Jiwen Zhang, Zhihao Fan, Zejun Li, Ruipu Luo, Xuanjing Huang, Zhongyu Wei
Lookahead Exploration with Neural Radiance Representation for Continuous Vision-Language Navigation
Zihan Wang, Xiangyang Li, Jiahao Yang, Yeqi Liu, Junjie Hu, Ming Jiang, Shuqiang Jiang
Mind the Error! Detection and Localization of Instruction Errors in Vision-and-Language Navigation
Francesco Taioli, Stefano Rosa, Alberto Castellini, Lorenzo Natale, Alessio Del Bue, Alessandro Farinelli, Marco Cristani, Yiming Wang
Language to Map: Topological map generation from natural language path instructions
Hideki Deguchi, Kazuki Shibata, Shun Taguchi