Visual Understanding
Visual understanding research aims to enable computers to interpret and reason about images and videos as humans do, with a focus on tasks such as object recognition, scene description, and complex visual reasoning. Current work relies heavily on large language and vision models (LLVMs), which often incorporate vision transformers and use techniques such as chain-of-thought prompting and visual instruction tuning to improve performance. The field is important for advancing artificial intelligence, with applications ranging from robotics and autonomous driving to medical image analysis and accessibility tools for visually impaired people.