Visual Reasoning
Visual reasoning aims to give artificial intelligence systems the ability to understand and reason about visual information, mirroring human cognitive abilities. Current research centers on developing and evaluating large vision-language models (VLMs) and multimodal large language models (MLLMs), typically built on transformer architectures and augmented with techniques such as chain-of-thought prompting and active perception, to improve performance on tasks such as visual question answering and object manipulation. These advances matter because they address the reasoning limitations of existing AI systems and hold promise for robotics, medical image analysis, and other fields that require complex visual interpretation and decision-making.
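To make the chain-of-thought prompting technique mentioned above concrete, the minimal Python sketch below shows how a visual question can be wrapped in a step-by-step reasoning prompt before being sent to a vision-language model. Note that query_vlm is a hypothetical stand-in, not a real library call; it assumes some backend that accepts an image and a text prompt and returns generated text.

    # Minimal sketch of chain-of-thought prompting for visual question
    # answering. `query_vlm` is a hypothetical placeholder for any VLM
    # backend that takes an image and a text prompt and returns text.

    def query_vlm(image_path: str, prompt: str) -> str:
        # Placeholder: replace with a real VLM call (e.g., a hosted
        # multimodal API or a locally served model).
        raise NotImplementedError("plug in a VLM backend here")

    def answer_with_cot(image_path: str, question: str) -> str:
        # Chain-of-thought prompting: instruct the model to verbalize
        # intermediate reasoning before committing to a final answer.
        cot_prompt = (
            f"Question: {question}\n"
            "Describe the relevant objects and their relations in the "
            "image, reason step by step, then give a final answer on a "
            "line starting with 'Answer:'."
        )
        response = query_vlm(image_path, cot_prompt)
        # Extract the final answer; fall back to the full response if
        # the model did not follow the requested format.
        for line in response.splitlines():
            if line.startswith("Answer:"):
                return line.removeprefix("Answer:").strip()
        return response.strip()

    # Example usage (with a real backend plugged in):
    # print(answer_with_cot("kitchen.jpg", "How many mugs are on the table?"))

The key design choice is that the prompt elicits intermediate reasoning as text, which has been observed to improve multi-step visual reasoning, while the parsing step keeps only the final answer for downstream use.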
Papers
An in-depth experimental study of sensor usage and visual reasoning of robots navigating in real environments
Assem Sadek, Guillaume Bono, Boris Chidlovskii, Christian Wolf
Recurrent Vision Transformer for Solving Visual Reasoning Problems
Nicola Messina, Giuseppe Amato, Fabio Carrara, Claudio Gennaro, Fabrizio Falchi