Visual Agent
Visual agents are AI systems that perceive and act on the world through visual input, with the aim of replicating aspects of human visual intelligence. Current research focuses on strengthening their reasoning. One approach combines "fast" and "slow" thinking mechanisms. Another uses large language models (LLMs) to carry out complex tasks such as video generation, video understanding, and video editing. These advances have improved benchmark performance and show potential for applications such as video analysis, robotics, and interactive AI systems. The long-term goal is to build visual agents that are more robust, generalize better, and can handle the complexity of real-world settings.
Papers
November 15, 2024
October 14, 2024
August 16, 2024
March 20, 2024
March 18, 2024
March 15, 2024
March 10, 2024
January 16, 2024
March 2, 2023
November 24, 2022
May 31, 2022
March 25, 2022
January 8, 2022