Standard Transformer

The standard Transformer is a fundamental deep learning architecture that achieves state-of-the-art results across diverse tasks, from natural language processing to computer vision and even 3D point cloud analysis. Current research focuses on improving efficiency through architectural simplifications, exploring alternative attention mechanisms (e.g., ReLU- and addition-based variants), and leveraging pre-training strategies such as masked autoencoders, particularly for applications with limited data. These advances aim to improve both the speed and accuracy of Transformers, broadening their applicability to resource-constrained environments and specialized domains, while also underscoring the importance of data preprocessing.
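
To make the attention variants concrete, the sketch below contrasts standard scaled dot-product attention with a ReLU-based replacement for the softmax, one of the alternative mechanisms mentioned above. This is a minimal illustration, not any specific paper's method: the function names are invented here, and the 1/sequence-length normalization in the ReLU variant is one common choice in the literature that individual papers handle differently.

```python
import torch
import torch.nn.functional as F

def softmax_attention(q, k, v):
    """Standard scaled dot-product attention (Vaswani et al., 2017)."""
    scores = q @ k.transpose(-2, -1) / q.shape[-1] ** 0.5
    return F.softmax(scores, dim=-1) @ v

def relu_attention(q, k, v):
    """Illustrative ReLU-based variant: the row-wise softmax is replaced
    by ReLU. The 1/seq_len normalization is an assumption for this sketch;
    papers differ on how (or whether) they normalize the ReLU scores."""
    scores = q @ k.transpose(-2, -1) / q.shape[-1] ** 0.5
    return (F.relu(scores) / k.shape[-2]) @ v

# Toy usage: batch of 2, 8 tokens, 16-dimensional heads.
q = torch.randn(2, 8, 16)
k = torch.randn(2, 8, 16)
v = torch.randn(2, 8, 16)
print(softmax_attention(q, k, v).shape)  # torch.Size([2, 8, 16])
print(relu_attention(q, k, v).shape)     # torch.Size([2, 8, 16])
```

Because ReLU is applied elementwise rather than normalizing over the whole row like softmax, variants of this kind are often paired with reorderings of the attention computation that reduce its cost on long sequences.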

Papers