Simple Vision Transformer
Simple Vision Transformers (ViTs) apply the transformer architecture to visual tasks while keeping model complexity and computational cost low. Current research refines the basic ViT design and explores variations such as sliding windows and masked autoencoding to improve feature extraction and training efficiency, and these deliberately simple designs often achieve state-of-the-art results. Their simplicity and efficiency make the models attractive for applications including image deraining, object tracking, interactive segmentation, and general image classification, and could make high-performing vision models more widely accessible.
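As a rough illustration of two ideas mentioned above, the sketch below splits an image into non-overlapping patches (the token sequence a ViT consumes) and then hides a random subset of them, which is the input-side step of masked autoencoding. It is a minimal NumPy sketch, not the method of any specific paper: the function names `patchify` and `random_mask`, the 32x32 image, the patch size of 8, and the 75% mask ratio are all assumptions chosen for illustration.

```python
import numpy as np


def patchify(image, patch_size):
    """Split an (H, W, C) image into flattened, non-overlapping patches.

    Returns an array of shape (num_patches, patch_size * patch_size * C),
    with patches ordered row by row. Hypothetical helper for illustration.
    """
    h, w, c = image.shape
    if h % patch_size or w % patch_size:
        raise ValueError("image height and width must be divisible by patch_size")
    gh, gw = h // patch_size, w // patch_size
    # (gh, p, gw, p, C) -> (gh, gw, p, p, C) -> (gh * gw, p * p * C)
    patches = image.reshape(gh, patch_size, gw, patch_size, c)
    patches = patches.transpose(0, 2, 1, 3, 4)
    return patches.reshape(gh * gw, patch_size * patch_size * c)


def random_mask(tokens, mask_ratio, rng):
    """Keep a random subset of tokens; return them plus kept and masked indices."""
    n = tokens.shape[0]
    n_keep = int(round(n * (1.0 - mask_ratio)))
    perm = rng.permutation(n)
    keep_idx = np.sort(perm[:n_keep])
    masked_idx = np.sort(perm[n_keep:])
    return tokens[keep_idx], keep_idx, masked_idx


rng = np.random.default_rng(0)
image = rng.random((32, 32, 3))            # assumed toy input
tokens = patchify(image, patch_size=8)     # 16 patches of 8 * 8 * 3 = 192 values
visible, keep_idx, masked_idx = random_mask(tokens, mask_ratio=0.75, rng=rng)
```

In a masked-autoencoding setup, only the `visible` tokens would be passed to the encoder, and a decoder would be trained to reconstruct the patches at `masked_idx`. Encoding only a fraction of the tokens is one source of the training efficiency described above.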