Visual Transformer
Visual Transformers (ViTs) adapt the transformer architecture, best known for its success in natural language processing, to image and video analysis. Current research focuses on improving ViT efficiency (e.g., through dynamic compression and lightweight architectures), enhancing feature extraction (e.g., by incorporating frequency-domain information and structure-aware modules), and applying ViTs to diverse tasks, including medical image analysis, 3D reconstruction, and object detection. ViTs have the potential to improve accuracy and efficiency in computer vision applications, particularly where global context is crucial, and ongoing work also addresses the challenges of computational cost and data privacy.
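To make the adaptation concrete, the following is a minimal sketch of the two steps most ViT variants share: cutting an image into fixed-size patches that act as tokens, and letting every token attend to every other one. It is an illustration under stated assumptions, not any particular paper's implementation: the helper names (`patchify`, `self_attention`), the 4x4 toy image, and the 2x2 patch size are all hypothetical, and the learned query/key/value projections, positional embeddings, and multi-head structure of a real model are replaced here by identity projections and a single head.

```python
import math

def patchify(image, patch):
    """Split an H x W grid (list of rows) into flattened patch x patch tokens, row-major."""
    h, w = len(image), len(image[0])
    if h % patch or w % patch:
        raise ValueError("image sides must be divisible by the patch size")
    tokens = []
    for top in range(0, h, patch):
        for left in range(0, w, patch):
            tokens.append([image[r][c]
                           for r in range(top, top + patch)
                           for c in range(left, left + patch)])
    return tokens

def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(tokens):
    """Single-head scaled dot-product attention.

    Assumption: Q, K and V are the tokens themselves (identity projections);
    a trained ViT would apply learned linear maps first.
    """
    d = len(tokens[0])
    scale = 1.0 / math.sqrt(d)
    outputs, weights = [], []
    for q in tokens:
        # Similarity of this token to every token in the image (global context).
        scores = [scale * sum(a * b for a, b in zip(q, k)) for k in tokens]
        w = softmax(scores)
        weights.append(w)
        # Each output token is a weighted average of all tokens.
        outputs.append([sum(wi * v[j] for wi, v in zip(w, tokens))
                        for j in range(d)])
    return outputs, weights

# Toy 4x4 single-channel "image" with values in [0, 1).
image = [[(r * 4 + c) / 16.0 for c in range(4)] for r in range(4)]
tokens = patchify(image, 2)           # 4 tokens, each of dimension 4
mixed, attn = self_attention(tokens)  # every token sees every other token
```

Because the attention weights form a full token-by-token matrix, the cost grows quadratically with the number of patches, which is one reason efficiency work such as token compression and lightweight architectures is an active research direction.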
Papers
January 6, 2024
December 28, 2023
October 17, 2023
September 28, 2023
June 3, 2023
April 1, 2023
November 4, 2022
October 25, 2022
October 16, 2022
September 27, 2022
September 7, 2022
July 27, 2022
July 21, 2022
July 1, 2022
June 2, 2022
June 1, 2022
May 25, 2022
March 24, 2022