Visual Understanding
Visual understanding research aims to enable computers to interpret and reason about images and videos as humans do, focusing on tasks such as object recognition, scene description, and complex visual reasoning. Current research relies heavily on large language and vision models (LLVMs), which often incorporate vision transformers and use techniques such as chain-of-thought prompting and visual instruction tuning to improve performance. The field is central to advancing artificial intelligence, with applications ranging from robotics and autonomous driving to medical image analysis and accessibility tools for visually impaired people.
Papers
October 7, 2024
September 19, 2024
September 11, 2024
September 3, 2024
August 21, 2024
August 20, 2024
August 15, 2024
August 13, 2024
August 6, 2024
July 19, 2024
July 15, 2024
July 6, 2024
June 25, 2024
June 24, 2024
June 15, 2024
June 10, 2024
May 31, 2024
May 24, 2024
May 21, 2024