Visual Understanding
Visual understanding research aims to enable computers to interpret and reason about images and videos as humans do, focusing on tasks such as object recognition, scene description, and complex visual reasoning. Current research relies heavily on large language and vision models (LLVMs), which often incorporate vision transformers, and uses techniques such as chain-of-thought prompting and visual instruction tuning to improve performance. The field is central to advancing artificial intelligence, with applications ranging from robotics and autonomous driving to medical image analysis and accessibility tools for visually impaired people.
Papers
November 15, 2024
November 13, 2024
October 31, 2024
October 28, 2024
October 27, 2024
October 21, 2024
October 17, 2024
October 16, 2024
October 13, 2024
October 7, 2024
September 19, 2024
September 11, 2024
September 3, 2024
August 21, 2024
August 20, 2024
August 15, 2024
August 13, 2024
August 6, 2024
July 19, 2024