Visual Input
Visual input processing is a rapidly evolving field that aims to enable machines to understand and reason with visual information as effectively as humans. Current research focuses on improving the visual comprehension of large language models and vision-language models (VLMs) through techniques such as active perception, attention mechanisms inspired by human gaze, and multimodal prompt engineering, most often built on transformer-based architectures. These advances are crucial to the performance of autonomous systems, assistive technologies for the visually impaired, and other applications requiring robust visual reasoning, while also revealing and helping mitigate biases inherent in these models.
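As a minimal illustration of the transformer-style attention that underlies most of these VLM architectures, the sketch below computes scaled dot-product self-attention over a toy grid of image-patch embeddings. All names, dimensions, and the random embeddings are illustrative assumptions, not taken from any particular model in this collection.

```python
import numpy as np

def scaled_dot_product_attention(q, k, v):
    # Attention weights: softmax(Q K^T / sqrt(d)) over patch positions.
    d = q.shape[-1]
    scores = q @ k.T / np.sqrt(d)
    scores -= scores.max(axis=-1, keepdims=True)   # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v, weights

rng = np.random.default_rng(0)
num_patches, dim = 16, 8                            # e.g. a 4x4 grid of image patches
patches = rng.normal(size=(num_patches, dim))       # toy patch embeddings
out, weights = scaled_dot_product_attention(patches, patches, patches)

# Each row of `weights` is a probability distribution over patches:
# which parts of the image each patch "looks at" — loosely analogous
# to the gaze-inspired attention patterns studied in this literature.
```

In a real VLM, the patch embeddings would come from a learned projection of image regions (as in a vision transformer), and the attention output would be fused with text tokens; this sketch only shows the core attention computation shared by those designs.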