Attention Layer
Attention layers are fundamental components of neural networks, particularly transformers, designed to selectively focus on relevant information within input data. Current research emphasizes improving attention's efficiency and theoretical understanding, exploring variations like sparse, hyperbolic, and grouped query attention within models such as transformers, and investigating the interplay between attention and other layers (e.g., convolutional, MLP). This work is crucial for advancing the capabilities of large language models and other deep learning architectures, impacting diverse applications from image generation and compression to natural language processing and even seismic analysis.
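As a concrete sketch of the core mechanism described above, here is a minimal pure-Python implementation of standard scaled dot-product attention, softmax(Q Kᵀ / √d_k) V. The function names and toy inputs are illustrative, not from any particular library:

```python
import math

def softmax(row):
    # Numerically stable softmax over one row of attention scores.
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]

def attention(Q, K, V):
    """Scaled dot-product attention: softmax(Q K^T / sqrt(d_k)) V.

    Q: queries (n_q x d_k), K: keys (n_kv x d_k), V: values (n_kv x d_v).
    Returns the attended outputs and the attention weight matrix.
    """
    d_k = len(Q[0])
    # scores[i][j] = <Q[i], K[j]> / sqrt(d_k)
    scores = [[sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k) for k in K]
              for q in Q]
    # Each query's weights over the keys sum to 1.
    weights = [softmax(row) for row in scores]
    # Output is the weight-averaged values.
    out = [[sum(w * v[j] for w, v in zip(w_row, V)) for j in range(len(V[0]))]
           for w_row in weights]
    return out, weights

# Toy example: 2 query positions attending over 3 key/value positions.
Q = [[1.0, 0.0], [0.0, 1.0]]
K = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
V = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
out, weights = attention(Q, K, V)
```

Variants such as sparse or grouped-query attention modify this core computation — restricting which key positions each query may attend to, or sharing one key/value head across several query heads — rather than replacing it.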