Novel Vision Transformer
Novel Vision Transformers (ViTs) aim to overcome two limitations of the standard ViT: its reliance on fixed-size patch partitioning, and the way that partitioning cuts across objects and disrupts image context. Current research focuses on architectures that adapt to image content, for example by using superpixels or learned patterns as input tokens, and on incorporating convolutional layers so that local information is handled alongside global dependencies. These changes are reported to improve performance on computer vision tasks such as image classification, object detection, and semantic segmentation, and to offer better interpretability and efficiency than earlier models.
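The contrast between fixed-size patches and content-adaptive tokens can be illustrated with a minimal sketch. It is not taken from any specific paper: the function names `patchify` and `region_tokens` are hypothetical, the image is assumed to be a small grayscale grid of numbers, and the region labels are supplied by hand as a stand-in for the output of a superpixel algorithm.

```python
def patchify(image, patch_size):
    """Split an H x W grid of pixel values into flattened, fixed-size patch tokens."""
    height, width = len(image), len(image[0])
    if height % patch_size or width % patch_size:
        raise ValueError("image sides must be divisible by patch_size")
    tokens = []
    # Walk the grid in row-major order, one patch at a time.
    for top in range(0, height, patch_size):
        for left in range(0, width, patch_size):
            token = [image[r][c]
                     for r in range(top, top + patch_size)
                     for c in range(left, left + patch_size)]
            tokens.append(token)
    return tokens


def region_tokens(image, labels):
    """Mean pixel value per region label (assumed stand-in for superpixel tokens)."""
    sums, counts = {}, {}
    for image_row, label_row in zip(image, labels):
        for value, label in zip(image_row, label_row):
            sums[label] = sums.get(label, 0.0) + value
            counts[label] = counts.get(label, 0) + 1
    return {label: sums[label] / counts[label] for label in sorted(sums)}


# A 4x4 image with a bright vertical bar (value 9) in the two middle columns.
image = [[0, 9, 9, 0] for _ in range(4)]
# Hand-written labels: 0 = background, 1 = bar.
labels = [[0, 1, 1, 0] for _ in range(4)]

fixed = patchify(image, patch_size=2)
adaptive = region_tokens(image, labels)
```

In this toy case each of the four fixed patches contains half bar and half background, so no single token represents the bar, whereas the region-based tokens separate the bar from the background. Real content-adaptive tokenizers are considerably more involved; the sketch only shows why the patch grid can split an object across tokens.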