Token Mixer
Token mixers are the components in vision transformers and related architectures that aggregate information across tokens (image features). Current research focuses on efficient and effective token mixers, exploring alternatives to computationally expensive self-attention such as convolutions, MLPs, and frequency-based methods. Architectures like MetaFormer provide a general framework for evaluating different mixer designs. These advances aim to make vision models faster and more efficient while maintaining or improving accuracy, which matters for both resource-constrained applications (e.g., mobile devices) and large-scale image processing tasks.
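To make the idea of a swappable token mixer concrete, here is a minimal sketch in plain Python. It assumes the commonly described MetaFormer-style residual form, output = x + mixer(norm(x)), and omits the channel MLP that normally follows. The function names (`identity_mixer`, `mean_pool_mixer`, `attention_mixer`, `metaformer_mixing_step`) are hypothetical and chosen for illustration. The mixers are simplified stand-ins: the attention variant uses identity projections instead of learned weights, and the pooling variant works over a 1-D token sequence rather than a 2-D feature map.

```python
import math
from typing import Callable, List

Tokens = List[List[float]]  # shape: (num_tokens, channels)


def layer_norm(x: Tokens, eps: float = 1e-5) -> Tokens:
    """Normalize each token over its channels (no learned scale or shift)."""
    out = []
    for tok in x:
        mean = sum(tok) / len(tok)
        var = sum((v - mean) ** 2 for v in tok) / len(tok)
        out.append([(v - mean) / math.sqrt(var + eps) for v in tok])
    return out


def identity_mixer(x: Tokens) -> Tokens:
    """No mixing: each token only sees itself."""
    return [tok[:] for tok in x]


def mean_pool_mixer(x: Tokens, window: int = 3) -> Tokens:
    """Parameter-free mixing: average each token with its 1-D neighbours."""
    n, half = len(x), window // 2
    out = []
    for i in range(n):
        neigh = x[max(0, i - half):min(n, i + half + 1)]
        out.append([sum(col) / len(neigh) for col in zip(*neigh)])
    return out


def attention_mixer(x: Tokens) -> Tokens:
    """Single-head self-attention with identity projections (Q = K = V = x).

    Cost grows quadratically with the number of tokens, which is the
    expense that cheaper mixers try to avoid.
    """
    d = len(x[0])
    out = []
    for q in x:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in x]
        top = max(scores)  # subtract the max for a numerically stable softmax
        weights = [math.exp(s - top) for s in scores]
        total = sum(weights)
        out.append([sum(w / total * v[c] for w, v in zip(weights, x))
                    for c in range(d)])
    return out


def metaformer_mixing_step(x: Tokens, mixer: Callable[[Tokens], Tokens]) -> Tokens:
    """Residual token-mixing sub-block: x + mixer(norm(x)). Channel MLP omitted."""
    mixed = mixer(layer_norm(x))
    return [[a + b for a, b in zip(tok, m)] for tok, m in zip(x, mixed)]


if __name__ == "__main__":
    tokens = [[0.0, 1.0, 2.0, 3.0], [1.0, 0.0, 1.0, 0.0], [2.0, 2.0, 0.0, 0.0]]
    for mixer in (identity_mixer, mean_pool_mixer, attention_mixer):
        y = metaformer_mixing_step(tokens, mixer)
        print(mixer.__name__, len(y), len(y[0]))
```

Because the surrounding block only requires a function from tokens to tokens of the same shape, each mixer can be swapped in without touching the rest of the code. This is the property that makes such a framework convenient for comparing mixer designs.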