Vision Transformer
Vision Transformers (ViTs) adapt the transformer architecture, originally designed for natural language processing, to image analysis by treating images as sequences of patches. Current research focuses on improving ViT efficiency and robustness through techniques such as token pruning, attention engineering, and hybrid models that combine ViTs with convolutional neural networks or other architectures (e.g., Mamba). These advances are driving progress in applications including medical image analysis, object detection, and spatiotemporal prediction, where ViTs can match or exceed the accuracy and efficiency of traditional convolutional neural networks on specific tasks.
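To make the patch-as-token idea concrete, here is a minimal sketch in PyTorch of the patch-embedding step that precedes the transformer encoder; it is not taken from any of the papers listed below, and the class name `PatchEmbedding` and the default sizes (224x224 input, 16x16 patches, 768-dimensional embeddings, as in the original ViT-Base configuration) are illustrative.

```python
import torch
import torch.nn as nn

class PatchEmbedding(nn.Module):
    """Split an image into fixed-size patches and project each to a token.

    A Conv2d with kernel_size == stride == patch_size covers exactly one
    non-overlapping patch per kernel application, so its output feature map
    is the grid of patch embeddings. (Names and defaults are illustrative.)
    """
    def __init__(self, img_size=224, patch_size=16, in_chans=3, embed_dim=768):
        super().__init__()
        assert img_size % patch_size == 0, "image must divide evenly into patches"
        self.num_patches = (img_size // patch_size) ** 2
        self.proj = nn.Conv2d(in_chans, embed_dim,
                              kernel_size=patch_size, stride=patch_size)

    def forward(self, x):                 # x: (B, C, H, W)
        x = self.proj(x)                  # (B, D, H/P, W/P)
        x = x.flatten(2).transpose(1, 2)  # (B, num_patches, D)
        return x

# A 224x224 RGB image becomes a sequence of 196 tokens of width 768,
# which a standard transformer encoder can then attend over.
tokens = PatchEmbedding()(torch.randn(1, 3, 224, 224))
print(tokens.shape)  # torch.Size([1, 196, 768])
```

The resulting token sequence is what techniques like token pruning operate on: dropping uninformative patch tokens shortens the sequence and reduces the quadratic cost of self-attention.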
Papers
Vision Transformers, a new approach for high-resolution and large-scale mapping of canopy heights
Ibrahim Fayad, Philippe Ciais, Martin Schwartz, Jean-Pierre Wigneron, Nicolas Baghdadi, Aurélien de Truchis, Alexandre d'Aspremont, Frederic Frappart, Sassan Saatchi, Agnes Pellissier-Tanon, Hassan Bazzi
Self-supervised Learning by View Synthesis
Shaoteng Liu, Xiangyu Zhang, Tao Hu, Jiaya Jia
Fibroglandular Tissue Segmentation in Breast MRI using Vision Transformers -- A multi-institutional evaluation
Gustav Müller-Franzes, Fritz Müller-Franzes, Luisa Huck, Vanessa Raaff, Eva Kemmer, Firas Khader, Soroosh Tayebi Arasteh, Teresa Nolte, Jakob Nikolas Kather, Sven Nebelung, Christiane Kuhl, Daniel Truhn
AutoTaskFormer: Searching Vision Transformers for Multi-task Learning
Yang Liu, Shen Yan, Yuge Zhang, Kan Ren, Quanlu Zhang, Zebin Ren, Deng Cai, Mi Zhang
Transformer with Selective Shuffled Position Embedding and Key-Patch Exchange Strategy for Early Detection of Knee Osteoarthritis
Zhe Wang, Aladine Chetouani, Mohamed Jarraya, Didier Hans, Rachid Jennane
ViPLO: Vision Transformer based Pose-Conditioned Self-Loop Graph for Human-Object Interaction Detection
Jeeseung Park, Jin-Woo Park, Jong-Seok Lee