ViT Architecture
Vision Transformers (ViTs) are a class of neural networks increasingly used for image analysis, with performance comparable to or exceeding that of convolutional neural networks (CNNs). Current research focuses on improving ViT efficiency, for example through structured pruning to reduce computational cost and power consumption, and on hybrid CNN-ViT architectures that combine the strengths of both approaches. These advances aim to make ViTs practical to deploy in resource-constrained environments and to broaden their use across computer vision tasks, including object detection, segmentation, and 3D vision.
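To make the idea of structured pruning concrete, here is a minimal sketch that removes whole attention heads rather than individual weights. It assumes a simple magnitude criterion (the L2 norm of each head's weights); the summary above does not say which criterion any given paper uses, and the names `head_importance` and `prune_heads` as well as the toy weight values are hypothetical.

```python
import math


def head_importance(head_weights):
    """L2 norm of all weights in one head (assumed magnitude criterion)."""
    return math.sqrt(sum(w * w for row in head_weights for w in row))


def prune_heads(heads, keep):
    """Keep the `keep` heads with the largest norm, preserving their order."""
    if not 0 < keep <= len(heads):
        raise ValueError("keep must be between 1 and the number of heads")
    ranked = sorted(
        range(len(heads)),
        key=lambda i: head_importance(heads[i]),
        reverse=True,
    )
    kept = sorted(ranked[:keep])
    return kept, [heads[i] for i in kept]


# Toy layer: 4 heads, each a 2x2 projection matrix (hypothetical values).
heads = [
    [[0.9, -0.8], [0.7, 0.6]],
    [[0.01, 0.02], [0.0, -0.01]],
    [[0.5, 0.4], [-0.3, 0.2]],
    [[0.05, 0.0], [0.02, 0.01]],
]

kept_idx, pruned = prune_heads(heads, keep=2)
print(kept_idx)  # indices of the heads that survive pruning
```

Because entire heads are dropped, the remaining weights stay dense, which is what allows structured pruning to lower compute and power on ordinary hardware. In practice a pruned model is usually fine-tuned afterwards to recover accuracy.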