Transformer Layer

Transformer layers are the fundamental building blocks of large language models and many other deep learning architectures, and current research focuses on improving their efficiency, interpretability, and performance. Ongoing efforts include architectural modifications such as incorporating convolutional layers, compression through low-rank approximations and structured pruning, and novel training objectives and regularization techniques that enhance accuracy while reducing computational cost. Understanding the internal workings of these layers, including information flow and the role of individual components (e.g., attention heads, feed-forward networks), is crucial for advancing both theoretical understanding and practical applications of transformer-based models across diverse domains.
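
As a concrete reference point for the components mentioned above, the sketch below shows a minimal pre-norm transformer layer in PyTorch: a self-attention sub-block that routes information between token positions, followed by a position-wise feed-forward network, each wrapped in a residual connection. The dimensions (d_model=512, n_heads=8, d_ff=2048) are illustrative defaults, not values prescribed by any particular paper.

```python
import torch
import torch.nn as nn

class TransformerLayer(nn.Module):
    """Minimal pre-norm transformer layer: self-attention + feed-forward,
    each preceded by layer normalization and wrapped in a residual connection."""

    def __init__(self, d_model=512, n_heads=8, d_ff=2048, dropout=0.1):
        super().__init__()
        self.norm1 = nn.LayerNorm(d_model)
        self.attn = nn.MultiheadAttention(d_model, n_heads, dropout=dropout, batch_first=True)
        self.norm2 = nn.LayerNorm(d_model)
        self.ffn = nn.Sequential(
            nn.Linear(d_model, d_ff),
            nn.GELU(),
            nn.Linear(d_ff, d_model),
        )
        self.dropout = nn.Dropout(dropout)

    def forward(self, x, attn_mask=None):
        # Attention sub-block: tokens exchange information along the sequence dimension.
        h = self.norm1(x)
        attn_out, _ = self.attn(h, h, h, attn_mask=attn_mask, need_weights=False)
        x = x + self.dropout(attn_out)
        # Feed-forward sub-block: each position is transformed independently.
        x = x + self.dropout(self.ffn(self.norm2(x)))
        return x

# Usage: a batch of 2 sequences, 16 tokens each, model width 512.
layer = TransformerLayer()
out = layer(torch.randn(2, 16, 512))
print(out.shape)  # torch.Size([2, 16, 512])
```

Most of the research directions listed above can be read as interventions on one of these two sub-blocks, e.g. replacing or augmenting the attention module, or applying low-rank factorization and pruning to the feed-forward weights.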

Papers