Large Vision-Language Models
Large vision-language models (VLMs) integrate visual and textual information, enabling machines to understand and reason about images and text jointly. Current research focuses on improving VLM performance in challenging scenarios, such as handling occluded objects in images, and on extending capabilities to longer videos and more complex tasks like chart comprehension. This work involves developing novel architectures, efficient fine-tuning techniques, and large-scale datasets to address the limitations of existing models. Advances in VLMs have significant implications for applications including robotics, image retrieval, and visual question answering.
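To make the joint image-text reasoning concrete, here is a minimal visual question answering sketch. It assumes the Hugging Face transformers, Pillow, and requests libraries and the publicly released Salesforce/blip-vqa-base checkpoint; the COCO image URL is only an illustrative placeholder, and any RGB image would work.

```python
from PIL import Image
import requests
from transformers import BlipProcessor, BlipForQuestionAnswering

# Placeholder image: a standard COCO validation photo (two cats on a couch).
url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw).convert("RGB")

# Load the BLIP VQA processor and model from the Hugging Face Hub.
processor = BlipProcessor.from_pretrained("Salesforce/blip-vqa-base")
model = BlipForQuestionAnswering.from_pretrained("Salesforce/blip-vqa-base")

question = "How many cats are in the picture?"

# The processor encodes the image and the question into one input batch,
# so the model attends over both modalities at once.
inputs = processor(image, question, return_tensors="pt")
out = model.generate(**inputs)
print(processor.decode(out[0], skip_special_tokens=True))  # e.g. "2"
```

The key step is the single processor call that fuses pixels and tokens into one batch: this joint image-text encoding is what distinguishes a VLM from a pair of unimodal models stitched together.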