Visual Reasoning
Visual reasoning aims to enable artificial intelligence (AI) systems to understand and reason over visual information, mirroring human cognitive abilities. Current research centers on developing and evaluating large vision-language models (VLMs) and multimodal large language models (MLLMs), typically built on transformer architectures and augmented with techniques such as chain-of-thought prompting and active perception, to improve performance on visual reasoning tasks such as visual question answering and object manipulation. These advances matter because they address reasoning limitations in existing AI systems and hold promise for robotics, medical image analysis, and other domains that demand complex visual interpretation and decision-making.
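To make the chain-of-thought idea concrete, here is a minimal sketch of applying it to visual question answering. The `query_vlm` function is a hypothetical placeholder, not any particular library's API: it returns a canned response here so the sketch runs end-to-end, and in practice you would replace it with a call to whatever VLM you use. The prompt structure, asking the model to describe the scene and reason step by step before a final "Answer:" line, is the essence of the technique.

```python
def query_vlm(image_path: str, prompt: str) -> str:
    """Hypothetical stand-in for a vision-language model call.

    Returns a canned response so this sketch runs as-is; swap in a real
    client (a locally served open-source VLM or a hosted multimodal API).
    """
    return (
        "The image shows a table with two mugs and one plate.\n"
        "Step 1: Identify all mug-shaped objects on the table.\n"
        "Step 2: Count them: there are two.\n"
        "Answer: 2"
    )


def answer_with_cot(image_path: str, question: str) -> str:
    # Ask the model to reason step by step before committing to an answer,
    # and to end with a parseable "Answer: <short answer>" line.
    cot_prompt = (
        "Look at the image and answer the question.\n"
        f"Question: {question}\n"
        "First, describe the relevant objects and their relationships. "
        "Then reason step by step, and finish with a line of the form "
        "'Answer: <short answer>'."
    )
    response = query_vlm(image_path, cot_prompt)
    # Extract the final answer; the full response keeps the reasoning
    # trace, which is useful for inspection and error analysis.
    for line in reversed(response.splitlines()):
        if line.strip().lower().startswith("answer:"):
            return line.split(":", 1)[1].strip()
    return response.strip()


if __name__ == "__main__":
    print(answer_with_cot("kitchen.jpg", "How many mugs are on the table?"))
```

Eliciting intermediate reasoning this way tends to help on multi-step visual questions (counting, spatial relations), and keeping the trace separate from the extracted answer makes failures easier to diagnose.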
Papers
Abductive Symbolic Solver on Abstraction and Reasoning Corpus
Mintaek Lim, Seokki Lee, Liyew Woletemaryam Abitew, Sundong Kim
Enhancing Visual Reasoning with Autonomous Imagination in Multimodal Large Language Models
Jingming Liu, Yumeng Li, Boyuan Xiao, Yichang Jian, Ziang Qin, Tianjia Shao, Yao-Xiang Ding, Kun Zhou