Scalable Vision

Scalable vision research aims to develop computer vision systems that can efficiently process and learn from massive datasets and high-resolution images, overcoming the limitations of traditional approaches. Current efforts focus on architectures such as Vision Transformers (ViTs) and their variants, and on pretraining methods such as self-supervised masked autoencoders (MAEs) and contrastive language-image pretraining, to improve model scalability and performance. These advances are crucial for applications that must process large volumes of visual data in real time, such as autonomous driving, high-throughput material characterization, and advanced robotics.

Papers