Visual Input
Visual input processing is a rapidly evolving field aiming to enable machines to understand and reason with visual information as effectively as humans. Current research focuses on improving the visual comprehension of large language and vision-language models (VLMs) through techniques like active perception, attention mechanisms inspired by human gaze, and multimodal prompt engineering, often employing transformer-based architectures. These advancements are crucial for improving the performance of autonomous systems, assistive technologies for the visually impaired, and applications requiring robust visual reasoning, while also revealing and mitigating biases inherent in these models.
Papers
November 20, 2023
October 14, 2023
September 26, 2023
September 14, 2023
September 9, 2023
August 29, 2023
August 26, 2023
July 27, 2023
June 19, 2023
June 12, 2023
June 4, 2023
May 24, 2023
May 19, 2023
April 28, 2023
March 20, 2023
March 14, 2023
February 28, 2023
February 9, 2023
December 21, 2022