Attention Layer

June 13, 2024

Memory-Efficient Sparse Pyramid Attention Networks for Whole Slide Image Analysis
Weiyi Wu, Chongyang Gao, Xinwen Xu, Siting Li, Jiang Gui
Attention Mechanism Attention Layer Whole Slide Image Pyramid Attention

June 5, 2024

Pre-trained Large Language Models Use Fourier Features to Compute Addition
Tianyi Zhou, Deqing Fu, Vatsal Sharan, Robin Jia
Large Language Model Attention Layer Token Level Pre Trained Representation Fourier Feature Different Calculation Method
Too Big to Fail: Larger Language Models are Disproportionately Resilient to Induction of Dementia-Related Linguistic Anomalies
Changye Li, Zhecheng Sheng, Trevor Cohen, Serguei Pakhomov
Attention Mechanism Attention Layer Larger Language Model Dementia Related Linguistic Anomaly Automatic Induction Semantic Anomaly

May 26, 2024

Explaining Modern Gated-Linear RNNs via a Unified Implicit Attention Formulation
Itamar Zimerman, Ameen Ali, Lior Wolf
Attention Layer Traditional RNNs Attention Matrix Causal Attention Gated Linear

May 24, 2024

Infinite Limits of Multi-head Transformer Dynamics
Blake Bordelon, Hamza Tahir Chaudhry, Cengiz Pehlevan
Transformer Model Attention Layer Training Dynamic Head Transformer Scaling Limit Dynamical Mean Field Theory

May 7, 2024

April 17, 2024

MoA: Mixture-of-Attention for Subject-Context Disentanglement in Personalized Image Generation
Kuan-Chieh Wang, Daniil Ostashev, Yuwei Fang, Sergey Tulyakov, Kfir Aberman
Text to Image Diffusion Model Attention Layer Personalized Image Generation Mixture of Attention

April 15, 2024

High-Resolution Detection of Earth Structural Heterogeneities from Seismic Amplitudes using Convolutional Neural Networks with Attention layers
Luiz Schirmer, Guilherme Schardong, Vinícius da Silva, Rogério Santos, Hélio Lopes
Convolutional Neural Network Attention Layer Amplitude Estimation Unknown Heterogeneity Structural Heterogeneity

April 8, 2024

MLP Can Be A Good Transformer Learner
Sihao Lin, Pumeng Lyu, Dongrui Liu, Tao Tang, Xiaodan Liang, Andy Song, Xiaojun Chang
Vision Transformer Attention Layer Single Scene Specific MLP Teacher Student Transformer

April 2, 2024

What Can Transformer Learn with Varying Depth? Case Studies on Sequence Learning Tasks
Xingwu Chen, Difan Zou
Transformer Based Case Study Transformer Architecture Attention Layer Large Depth

March 26, 2024

AID: Attention Interpolation of Text-to-Image Diffusion
Qiyuan He, Jinghao Wang, Ziwei Liu, Angela Yao
Conditional Diffusion Model Attention Layer Conditional Generative Text to Image Diffusion AID User Image Interpolation Attention Based Interpolation Model

March 25, 2024

Be Yourself: Bounded Attention for Multi-Subject Text-to-Image Generation
Omer Dahary, Or Patashnik, Kfir Aberman, Daniel Cohen-Or
Text to Image Diffusion Model Attention Layer Semantic Leakage

March 21, 2024

Application of Tensorized Neural Networks for Cloud Classification
Alifu Xiafukaiti, Devanshu Garg, Aruto Hosaka, Koichi Yanagisawa, Yuichiro Minato, Tsuyoshi Yoshida
Convolutional Neural Network Application Proficiency Attention Layer Contrastive Self Supervised Learning Cloud Classification Dense Layer Tensorized Neural Network

March 5, 2024

March 4, 2024

NiNformer: A Network in Network Transformer with Token Mixing Generated Gating Function
Abdullah Nazhat Abdullah, Tarkan Aydin
Vision Transformer Attention Mechanism Transformer Architecture Network Programming Attention Layer Gating Mechanism Based Method Token Mixer

March 3, 2024

You Need to Pay Better Attention: Rethinking the Mathematics of Attention Mechanism
Mehran Hosseini, Peyman Hosseini
Attention Mechanism Attention Layer Level Mathematics Attention Operation Efficient Attention Dot Product Attention

February 28, 2024

Orchid: Flexible and Data-Dependent Convolution for Sequence Modeling
Mahdi Karami, Ali Ghodsi
Attention Layer Extension Study Sequence Modeling Scalable Neural Global Convolution

February 9, 2024

Inducing Systematicity in Transformers by Attending to Structurally Quantized Embeddings
Yichen Jiang, Xiang Zhou, Mohit Bansal
Transformer Megatron Decepticons Jina Embeddings Compositional Generalization Attention Layer Q Transformer

Papers

Memory-Efficient Sparse Pyramid Attention Networks for Whole Slide Image Analysis

Pre-trained Large Language Models Use Fourier Features to Compute Addition

Too Big to Fail: Larger Language Models are Disproportionately Resilient to Induction of Dementia-Related Linguistic Anomalies

Explaining Modern Gated-Linear RNNs via a Unified Implicit Attention Formulation

Infinite Limits of Multi-head Transformer Dynamics

NOVA: NoC-based Vector Unit for Mapping Attention Layers on a CNN Accelerator

Simple Drop-in LoRA Conditioning on Attention Layers Will Improve Your Diffusion Model

MoA: Mixture-of-Attention for Subject-Context Disentanglement in Personalized Image Generation

High-Resolution Detection of Earth Structural Heterogeneities from Seismic Amplitudes using Convolutional Neural Networks with Attention layers

MLP Can Be A Good Transformer Learner

What Can Transformer Learn with Varying Depth? Case Studies on Sequence Learning Tasks

AID: Attention Interpolation of Text-to-Image Diffusion

Be Yourself: Bounded Attention for Multi-Subject Text-to-Image Generation

Application of Tensorized Neural Networks for Cloud Classification

ARNN: Attentive Recurrent Neural Network for Multi-channel EEG Signals to Identify Epileptic Seizures

Quantum Mixed-State Self-Attention Network

NiNformer: A Network in Network Transformer with Token Mixing Generated Gating Function

You Need to Pay Better Attention: Rethinking the Mathematics of Attention Mechanism

Orchid: Flexible and Data-Dependent Convolution for Sequence Modeling

Inducing Systematicity in Transformers by Attending to Structurally Quantized Embeddings