Visual Planning
Visual planning enables machines to devise sequences of actions that achieve goals from visual input, mirroring human decision-making in complex environments. Current research integrates large language models (LLMs) and vision-language models (VLMs) with planning algorithms such as tree search and roadmap methods, often operating in learned latent spaces to improve efficiency and generalization. The field is important for robotics, AI assistants, and other applications in which intelligent agents must interact with and manipulate the physical world, particularly when information is incomplete or uncertain. Developing robust benchmarks and datasets is another key focus, since they allow objective evaluation and comparison of different approaches.
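
As a rough illustration of what a roadmap method over a latent space can look like, the sketch below builds a k-nearest-neighbour graph over a handful of latent vectors and searches it with Dijkstra's algorithm. Everything here is an assumption made for illustration: the function names build_roadmap and shortest_path are hypothetical, the latent vectors are hand-written 2D points standing in for the output of an image encoder, and Euclidean distance is assumed to be a meaningful edge cost. It is not the method of any particular paper.

```python
import heapq
import math


def build_roadmap(latents, k=2):
    """Link each latent vector to its k nearest neighbours (undirected edges)."""
    graph = {i: {} for i in range(len(latents))}
    for i, z in enumerate(latents):
        # Distances from node i to every other node, closest first.
        dists = sorted(
            (math.dist(z, other), j)
            for j, other in enumerate(latents)
            if j != i
        )
        for d, j in dists[:k]:
            graph[i][j] = d
            graph[j][i] = d
    return graph


def shortest_path(graph, start, goal):
    """Dijkstra search; returns a list of node indices, or None if unreachable."""
    queue = [(0.0, start)]
    best = {start: 0.0}
    parent = {start: None}
    while queue:
        cost, node = heapq.heappop(queue)
        if node == goal:
            # Walk the parent links back to the start.
            path = []
            while node is not None:
                path.append(node)
                node = parent[node]
            return path[::-1]
        if cost > best[node]:
            continue  # stale queue entry
        for nxt, weight in graph[node].items():
            new_cost = cost + weight
            if new_cost < best.get(nxt, math.inf):
                best[nxt] = new_cost
                parent[nxt] = node
                heapq.heappush(queue, (new_cost, nxt))
    return None


# Hypothetical latent codes; in practice these would come from a learned encoder.
latents = [(0.0, 0.0), (1.0, 0.5), (2.0, 0.0), (3.0, 0.5), (0.0, 5.0)]
graph = build_roadmap(latents, k=2)
plan = shortest_path(graph, start=0, goal=3)
print(plan)
```

The returned plan is a sequence of roadmap nodes from the start state to the goal state. In a full system each node would typically correspond to an observed or decoded image, and a separate controller would be needed to move between consecutive nodes; that part is outside this sketch.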