Visual Input
Visual input processing is a rapidly evolving field that aims to enable machines to understand and reason about visual information as effectively as humans do. Current research focuses on improving the visual comprehension of large language models and vision-language models (VLMs) through techniques such as active perception, attention mechanisms inspired by human gaze, and multimodal prompt engineering, often using transformer-based architectures. These advances matter for autonomous systems, assistive technologies for the visually impaired, and other applications that require robust visual reasoning; the same research also helps reveal and mitigate biases inherent in these models.
Papers
April 12, 2024
April 10, 2024
April 7, 2024
March 29, 2024
March 25, 2024
March 8, 2024
March 6, 2024
February 28, 2024
February 15, 2024
December 19, 2023
December 8, 2023
November 28, 2023
November 20, 2023
October 14, 2023
September 26, 2023
September 14, 2023
September 9, 2023
August 29, 2023
August 26, 2023
July 27, 2023