Lightweight Vision Transformer
Lightweight Vision Transformers (ViTs) reduce the computational cost and memory footprint of standard ViTs so that they can run on resource-constrained devices while remaining competitive in accuracy. Current research improves efficiency through new architectures, such as latency-aware blocks that combine convolutions with sparse self-attention, and through pre-training techniques, such as masked image modeling and knowledge distillation, that improve performance when training data is limited. These advances make it practical to deploy transformer-based models in mobile and edge computing applications, bringing advanced computer vision capabilities to a wider range of devices.
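To make the efficiency argument concrete, the sketch below counts multiply-accumulate operations for the two matrix products in self-attention (the query-key scores and the attention-weighted values), first for dense attention and then for one simple form of sparse attention in which tokens attend only within non-overlapping windows. The function names, the choice of windowed attention as the sparse pattern, and the example sizes (196 tokens, width 192, windows of 49 tokens) are illustrative assumptions, not taken from any particular paper listed under this topic. Projection layers, softmax, and multi-head bookkeeping are deliberately ignored.

```python
def dense_attention_macs(num_tokens: int, dim: int) -> int:
    """Multiply-accumulates for Q @ K^T and A @ V with full attention.

    Each product costs num_tokens * num_tokens * dim, so the total
    grows quadratically with the number of tokens.
    """
    return 2 * num_tokens * num_tokens * dim


def windowed_attention_macs(num_tokens: int, dim: int, window: int) -> int:
    """Same two products when attention is restricted to
    non-overlapping windows of `window` tokens (an assumed sparse pattern).

    The cost becomes 2 * num_tokens * window * dim, i.e. linear in
    the number of tokens for a fixed window size.
    """
    if window <= 0 or num_tokens % window != 0:
        raise ValueError("num_tokens must be a positive multiple of window")
    num_windows = num_tokens // window
    return num_windows * dense_attention_macs(window, dim)


if __name__ == "__main__":
    # Illustrative sizes: a 14x14 patch grid, embedding width 192,
    # and 7x7 windows. These are assumptions chosen for the example.
    n, d, w = 196, 192, 49
    dense = dense_attention_macs(n, d)
    sparse = windowed_attention_macs(n, d, w)
    print(f"dense: {dense:,} MACs, windowed: {sparse:,} MACs")
    print(f"reduction factor: {dense / sparse:.1f}x")
```

Under these assumptions the saving equals the ratio of tokens to window size, which is why sparse patterns matter more as input resolution (and therefore token count) grows.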