Vision-Language Planning
Vision-language planning (VLP) aims to build AI systems that understand and act on instructions combining visual and textual information, bridging the gap between perception and action. Current research centers on coupling large language and multi-modal models with computer vision techniques, often employing diffusion models and egocentric perspectives to improve task completion in complex, real-world settings such as autonomous driving and robotic manipulation. The field is significant for advancing robotics, autonomous systems, and human-computer interaction, ultimately yielding more robust and adaptable intelligent agents.
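Concretely, many VLP systems share a simple closed loop: perceive the scene, ask a multi-modal model for a plan conditioned on the current view and the instruction, execute one step, then re-plan. The Python sketch below illustrates that loop under stated assumptions; the VisionLanguageModel interface, the one-action-per-line plan format, and the stub model and environment are hypothetical, not the API of any particular system or paper.

```python
# A minimal closed-loop VLP sketch. The VisionLanguageModel interface,
# the line-per-action plan format, and the stubs are illustrative
# assumptions, not any specific paper's or library's API.
from dataclasses import dataclass
from typing import Protocol


@dataclass
class Observation:
    image: bytes       # egocentric camera frame (encoding is an assumption)
    description: str   # optional text summary of the scene


class VisionLanguageModel(Protocol):
    """Any multi-modal model that maps (image, prompt) -> text fits here."""
    def complete(self, image: bytes, prompt: str) -> str: ...


def plan_steps(model: VisionLanguageModel, obs: Observation,
               instruction: str) -> list[str]:
    """Ask the model to decompose the instruction into low-level actions,
    grounded in the current visual observation."""
    prompt = (f"Scene: {obs.description}\n"
              f"Instruction: {instruction}\n"
              "List the next actions, one per line (say 'done' when finished):")
    reply = model.complete(obs.image, prompt)
    return [line.strip() for line in reply.splitlines() if line.strip()]


def run_episode(model: VisionLanguageModel, env, instruction: str,
                max_steps: int = 20) -> None:
    """Closed-loop planning: execute one step, then re-plan, so the plan
    stays consistent with what the camera currently sees."""
    for _ in range(max_steps):
        obs = env.observe()
        steps = plan_steps(model, obs, instruction)
        if not steps or steps[0].lower() == "done":
            break
        env.execute(steps[0])


if __name__ == "__main__":
    class StubModel:  # stands in for a real multi-modal model
        def __init__(self) -> None:
            self.calls = 0
        def complete(self, image: bytes, prompt: str) -> str:
            self.calls += 1
            return "move forward" if self.calls == 1 else "done"

    class StubEnv:    # stands in for a robot or simulator
        def observe(self) -> Observation:
            return Observation(image=b"", description="empty corridor ahead")
        def execute(self, action: str) -> None:
            print("executing:", action)

    run_episode(StubModel(), StubEnv(), "walk to the end of the corridor")
```

Re-planning after every executed action, rather than committing to a full plan up front, is one common design choice in this space: it costs extra model calls but adds robustness to perception errors and to changes in the environment.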