Visual Representation
Visual representation research focuses on learning representations that let computers understand and use visual information, aiming to bridge the gap between raw image data and higher-level semantic understanding. Current work emphasizes robust and efficient visual representations built with techniques such as contrastive learning, masked image modeling, and the integration of vision models with large language models (LLMs), typically on transformer-based architectures. These advances matter for applications such as robotic control, medical image analysis, and more capable multimodal AI systems.
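Contrastive learning, mentioned above, trains an encoder so that two augmented views of the same image land close together in embedding space while views of different images are pushed apart. A common objective for this is the InfoNCE loss; the sketch below is a minimal NumPy illustration (the function name `info_nce_loss`, the temperature value, and the toy random embeddings are all assumptions for demonstration, not taken from any of the listed papers):

```python
import numpy as np

def info_nce_loss(z_a, z_b, temperature=0.1):
    """InfoNCE contrastive loss between two batches of embeddings,
    where z_a[i] and z_b[i] are two views of the same image (the
    positive pair) and every other row is a negative."""
    # L2-normalize so dot products become cosine similarities
    z_a = z_a / np.linalg.norm(z_a, axis=1, keepdims=True)
    z_b = z_b / np.linalg.norm(z_b, axis=1, keepdims=True)
    logits = z_a @ z_b.T / temperature
    # Softmax cross-entropy with the diagonal (matching pair) as target
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -float(np.mean(np.diag(log_prob)))

rng = np.random.default_rng(0)
anchors = rng.normal(size=(8, 32))
# Positives: slightly perturbed copies, standing in for augmented views
positives = anchors + 0.01 * rng.normal(size=(8, 32))
matched_loss = info_nce_loss(anchors, positives)
shuffled_loss = info_nce_loss(anchors, rng.permutation(positives))
print(matched_loss, shuffled_loss)
```

Because the matched batch pairs each anchor with its own perturbed view, its loss is far lower than for the shuffled batch, which is exactly the signal the encoder is trained to maximize.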
Papers
CLAWS: Contrastive Learning with hard Attention and Weak Supervision
Jansel Herrera-Gerena, Ramakrishnan Sundareswaran, John Just, Matthew Darr, Ali Jannesari
PreViTS: Contrastive Pretraining with Video Tracking Supervision
Brian Chen, Ramprasaath R. Selvaraju, Shih-Fu Chang, Juan Carlos Niebles, Nikhil Naik
Rethink, Revisit, Revise: A Spiral Reinforced Self-Revised Network for Zero-Shot Learning
Zhe Liu, Yun Li, Lina Yao, Julian McAuley, Sam Dixon