Layer Normalization
Layer normalization (LN) is a technique used in deep neural networks to stabilize training and improve performance by normalizing each sample's activations across the features of a layer, rather than across the batch as in Batch Normalization. Current research focuses on understanding LN's geometric properties, its interaction with other normalization methods (such as RMSNorm and Batch Normalization), and its effect on model stability and efficiency, particularly within transformer architectures and in applications such as natural language processing and image generation. These investigations aim to optimize how LN is placed and implemented, potentially leading to more efficient and robust deep learning models across diverse domains.
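As a rough illustration of what LN computes (a minimal sketch, not tied to any specific paper referenced on this page), the snippet below implements standard layer normalization and the RMSNorm variant mentioned above in NumPy. The function names, shapes, and epsilon value are illustrative assumptions, not a particular library's API.

```python
import numpy as np

def layer_norm(x, gamma, beta, eps=1e-5):
    """Layer normalization: per-sample statistics computed over the feature
    dimension, followed by a learned affine transform (gamma, beta)."""
    # x has shape (batch, features); mean/variance are taken over features.
    mean = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    x_hat = (x - mean) / np.sqrt(var + eps)
    return gamma * x_hat + beta

def rms_norm(x, gamma, eps=1e-5):
    """RMSNorm variant: rescales by the root-mean-square of the features,
    skipping mean-centering and the bias term."""
    rms = np.sqrt((x ** 2).mean(axis=-1, keepdims=True) + eps)
    return gamma * (x / rms)

# Usage example: a toy batch of 2 samples with 4 features each.
x = np.array([[1.0, 2.0, 3.0, 4.0],
              [10.0, 0.0, -10.0, 5.0]])
gamma = np.ones(4)
beta = np.zeros(4)
print(layer_norm(x, gamma, beta))
print(rms_norm(x, gamma))
```

Because the statistics are computed per sample rather than per batch, the output is independent of batch size, which is one reason LN (and RMSNorm) are preferred over Batch Normalization in transformer architectures.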