Visual Commonsense Reasoning
Visual commonsense reasoning (VCR) aims to give AI systems the ability to understand and reason about everyday visual scenes, going beyond object recognition to contextual understanding and inference. Current research focuses on combining large language models (LLMs) with vision-language models (VLMs), typically using transformer architectures together with techniques such as multi-modal fusion and attention mechanisms to improve performance on VCR benchmarks. The work matters because it targets a gap in AI's ability to interact meaningfully with the real world, with potential applications in visual question answering, robotics, and assistive technologies.
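To make "multi-modal fusion" and "attention mechanisms" more concrete, here is a minimal sketch of cross-modal attention in plain Python. It is illustrative only and not taken from any particular VCR model: the function names (`cross_attention`, `softmax`, `dot`), the toy feature vectors, and the choice to fuse by concatenation are all assumptions, and the learned query/key/value projections of a real transformer are omitted for brevity.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def cross_attention(text_tokens, image_regions):
    """Let each text token attend over image region features.

    Hypothetical helper: returns (fused, weights), where fused[i] is the
    text token concatenated with its attention-weighted visual context,
    and weights[i] is that token's attention distribution over regions.
    """
    dim = len(image_regions[0])
    scale = math.sqrt(dim)  # scaled dot-product attention
    fused, all_weights = [], []
    for query in text_tokens:
        scores = [dot(query, region) / scale for region in image_regions]
        weights = softmax(scores)
        # Weighted sum of region features: the visual context for this token.
        context = [
            sum(w * region[i] for w, region in zip(weights, image_regions))
            for i in range(dim)
        ]
        # Fusion by concatenation (an assumption; real models vary).
        fused.append(list(query) + context)
        all_weights.append(weights)
    return fused, all_weights


# Toy inputs: two text token embeddings and three image region embeddings.
text_tokens = [[1.0, 0.0], [0.0, 1.0]]
image_regions = [[2.0, 0.0], [0.0, 2.0], [1.0, 1.0]]
fused, weights = cross_attention(text_tokens, image_regions)
```

In this toy setup the first text token places most of its attention on the first image region, since their embeddings are the most aligned. Production systems typically stack many such layers with learned projections and multiple attention heads.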