Visual Input
Visual input processing is a rapidly evolving field that aims to enable machines to understand and reason about visual information as effectively as humans do. Current research focuses on improving the visual comprehension of large language models and vision-language models (VLMs) through techniques such as active perception, attention mechanisms inspired by human gaze, and multimodal prompt engineering, typically built on transformer-based architectures. These advances are crucial for autonomous systems, assistive technologies for the visually impaired, and other applications that require robust visual reasoning, and they also help reveal and mitigate biases inherent in these models.
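To make the multimodal-prompting idea concrete, the sketch below pairs an image with a natural-language question and queries an off-the-shelf VLM. It is a minimal illustration, not the method of any particular paper, and it assumes the Hugging Face transformers and Pillow packages plus the publicly released Salesforce/blip-vqa-base checkpoint; the image path and question are placeholders.

```python
# Minimal multimodal prompting sketch: ask a vision-language model a
# question about an image. Assumes `transformers` and `Pillow` are
# installed and the Salesforce/blip-vqa-base checkpoint can be downloaded.
from PIL import Image
from transformers import BlipProcessor, BlipForQuestionAnswering

processor = BlipProcessor.from_pretrained("Salesforce/blip-vqa-base")
model = BlipForQuestionAnswering.from_pretrained("Salesforce/blip-vqa-base")

image = Image.open("street_scene.jpg").convert("RGB")  # placeholder image path
question = "How many pedestrians are crossing the street?"  # placeholder prompt

# The processor fuses the visual input and the text prompt into one batch
# that the transformer-based VLM consumes jointly.
inputs = processor(image, question, return_tensors="pt")
answer_ids = model.generate(**inputs)
print(processor.decode(answer_ids[0], skip_special_tokens=True))
```

Swapping in a different checkpoint or question is enough to probe how strongly the model's answer depends on the visual evidence versus language priors, which is one way such biases are surfaced.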