Non-Hierarchical Vision Transformer
Non-hierarchical Vision Transformers (ViTs), also called plain ViTs, are a simplified approach to computer vision that aims for high performance with a less complex architecture than traditional hierarchical models. Current research focuses on adapting plain ViTs to tasks such as semantic segmentation, object detection, and pose estimation, often with only minimal modifications such as a simple feature pyramid or a lightweight decoder. This streamlined approach offers advantages in efficiency and transferability, and plain ViTs have shown competitive performance on several benchmark datasets, which may lead to faster and more adaptable vision systems for diverse applications.
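To make the "simple feature pyramid" idea concrete, here is a minimal sketch of how multi-scale maps might be derived from the single-scale output of a plain ViT. It is an illustration under stated assumptions, not any particular paper's implementation: the function names (`upsample_nearest`, `max_pool`, `simple_feature_pyramid`), the 4x4 single-channel toy input, and the base stride of 16 are all hypothetical. Parameter-free nearest-neighbor upsampling and max pooling stand in for the learned resampling layers a real system would typically use.

```python
# Minimal sketch: build a multi-scale pyramid from ONE single-scale feature map,
# the kind of output a plain (non-hierarchical) ViT backbone produces.
# Assumption: parameter-free resampling replaces learned layers for clarity.

def upsample_nearest(fmap, factor):
    """Repeat every cell `factor` times along both axes."""
    out = []
    for row in fmap:
        wide = [v for v in row for _ in range(factor)]
        out.extend([list(wide) for _ in range(factor)])
    return out

def max_pool(fmap, factor):
    """Take the max over non-overlapping factor x factor windows."""
    h, w = len(fmap), len(fmap[0])
    return [
        [max(fmap[i + di][j + dj] for di in range(factor) for dj in range(factor))
         for j in range(0, w - factor + 1, factor)]
        for i in range(0, h - factor + 1, factor)
    ]

def simple_feature_pyramid(fmap, base_stride=16):
    """Return {stride: map} for strides base/4, base/2, base, and base*2."""
    return {
        base_stride // 4: upsample_nearest(fmap, 4),
        base_stride // 2: upsample_nearest(fmap, 2),
        base_stride: [list(r) for r in fmap],
        base_stride * 2: max_pool(fmap, 2),
    }

# Hypothetical 4x4 single-channel map (e.g. a 64x64 image cut into 16x16 patches).
tokens = [[float(4 * i + j) for j in range(4)] for i in range(4)]
pyramid = simple_feature_pyramid(tokens)
shapes = {stride: (len(m), len(m[0])) for stride, m in pyramid.items()}
print(shapes)
```

The point of the sketch is that every pyramid level is computed from the backbone's final feature map alone, so the backbone itself needs no multi-stage design. A detection or segmentation head can then consume the resulting maps much as it would the outputs of a hierarchical backbone.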