Transformer Performance

Transformer model performance is a central research area aiming to understand and improve the efficiency and accuracy of these powerful neural networks across diverse applications. Current research focuses on optimizing hyperparameters, exploring alternative attention mechanisms (e.g., modifying the key-query interaction), and developing more efficient architectures (e.g., reducing parameter counts through pruning or using simpler components like MLPs). These efforts are crucial for expanding the applicability of Transformers to resource-constrained environments and for gaining deeper insights into their internal workings, ultimately leading to more robust and interpretable AI systems.

Papers

April 30, 2022

Leveraging Emotion-specific Features to Improve Transformer Performance for Emotion Classification
Shaily Desai, Atharva Kshirsagar, Aditi Sidnerlikar, Nikhil Khodake, Manisha Marathe
Emotion Classification Transformer Performance Sequence Classification Emotion Representation Emotional Feature Multi Label Emotion

March 28, 2022

Enhancing Transformer Efficiency for Multivariate Time Series Classification
Yuqing Wang, Yun Zhao, Linda Petzold
Multivariate Time Series Classification Algorithm Prediction Accuracy Efficient Model Multivariate Time Series Classification Transformer Performance Progressive Pruning

February 21, 2022

Transformer Quality in Linear Time
Weizhe Hua, Zihang Dai, Hanxiao Liu, Quoc V. Le
Transformer Megatron Decepticons Long Sequence Attention Head Transformer Performance Linear Time

February 13, 2022

Scaling Laws Under the Microscope: Predicting Transformer Performance from Small Scale Experiments
Maor Ivgi, Yair Carmon, Jonathan Berant
NLP Task Large Model Hyperparameter Tuning Scaling Law Language Understanding Task Transformer Performance Smartphone Microscope

Transformer Performance

Papers

Leveraging Emotion-specific Features to Improve Transformer Performance for Emotion Classification

Enhancing Transformer Efficiency for Multivariate Time Series Classification

Transformer Quality in Linear Time

Scaling Laws Under the Microscope: Predicting Transformer Performance from Small Scale Experiments