Embodied Vision

Embodied vision research focuses on enabling artificial agents to understand and interact with the world through the integration of vision and physical action. Current efforts concentrate on developing robust and generalizable models, often leveraging large language models (LLMs) and vision-language models (VLMs) within frameworks that incorporate chain-of-thought reasoning and hierarchical skill decomposition to improve planning and execution of complex tasks. This interdisciplinary field is significantly advancing robotics, autonomous driving, and human-computer interaction by creating agents capable of more nuanced perception, reasoning, and interaction in dynamic environments.

Papers