Pre-Trained ViTs
Pre-trained Vision Transformers (ViTs) are large models trained on massive datasets to learn general-purpose visual representations, which are then fine-tuned for downstream tasks such as image classification, segmentation, and object tracking. Current research focuses on improving efficiency (e.g., through lightweight adapters and dynamic tuning), enhancing robustness to adversarial attacks, and exploring training methods such as masked image modeling and self-supervised learning. These advances are reshaping computer vision, enabling more accurate and resource-efficient applications across diverse fields, from medical image analysis to deployment on mobile devices.
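To make the fine-tuning workflow concrete, here is a minimal sketch of adapting a pre-trained ViT to a new classification task. It assumes torchvision's ViT-B/16 with ImageNet-1k weights; the 10-class task, the frozen backbone, and the hyperparameters are illustrative assumptions, not taken from any particular paper.

import torch
import torch.nn as nn
from torchvision.models import vit_b_16, ViT_B_16_Weights

# Illustrative assumption: a 10-class downstream task.
num_classes = 10

# Load the pre-trained backbone with its ImageNet-1k weights.
model = vit_b_16(weights=ViT_B_16_Weights.IMAGENET1K_V1)

# Freeze the backbone so only a small number of parameters are updated,
# a common resource-efficient alternative to full fine-tuning.
for param in model.parameters():
    param.requires_grad = False

# Replace the classification head with a new, trainable layer sized
# for the downstream task.
model.heads.head = nn.Linear(model.heads.head.in_features, num_classes)

# Optimize only the parameters that still require gradients (the new head).
optimizer = torch.optim.AdamW(
    (p for p in model.parameters() if p.requires_grad), lr=1e-3
)

# One illustrative training step on dummy 224x224 RGB inputs.
images = torch.randn(8, 3, 224, 224)
labels = torch.randint(0, num_classes, (8,))
optimizer.zero_grad()
logits = model(images)
loss = nn.functional.cross_entropy(logits, labels)
loss.backward()
optimizer.step()

Adapter-based methods mentioned above follow the same freeze-then-add pattern: the backbone stays frozen and only small inserted modules are trained, keeping the trainable parameter count low.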