Visual Learning

Visual learning research aims to enable computers to understand and interpret images and videos in ways that approach human visual capabilities. Current efforts focus on making self-supervised learning methods more robust and efficient, often using transformer architectures and contrastive learning, and on incorporating contextual information (spatial, temporal, and linguistic) to improve learning. These advances matter for applications such as robotics, medical image analysis, and large-scale data analysis, where they enable more accurate and efficient processing of visual data.

Papers