Attention Layer

Attention layers are core components of neural networks, especially transformers. They let a model weight the parts of its input that are most relevant to each output. Current research aims to make attention more efficient and to understand it better theoretically. It explores variants such as sparse, hyperbolic, and grouped-query attention, and it studies how attention interacts with other layers (e.g., convolutional and MLP layers). This work underpins progress in large language models and other deep learning architectures. Applications range from image generation and compression to natural language processing and even seismic analysis.
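To make the mechanism concrete, here is a minimal pure-Python sketch of standard scaled dot-product attention, softmax(Q K^T / sqrt(d)) V. It also includes a grouped-query variant in which several query heads share one key/value head. The function names (`attention`, `grouped_query_attention`) are illustrative assumptions. They are not taken from any of the papers listed below. Real implementations use batched tensors, learned Q/K/V projections, masking, and fused kernels, all of which this sketch omits.

```python
import math
from typing import List

Matrix = List[List[float]]  # row-major: one row per token


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain-Python matrix product a @ b."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(m: Matrix) -> Matrix:
    return [list(row) for row in zip(*m)]


def softmax(row: List[float]) -> List[float]:
    # Subtract the row maximum before exponentiating for numerical stability.
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def attention(q: Matrix, k: Matrix, v: Matrix) -> Matrix:
    """Scaled dot-product attention for one head: softmax(Q K^T / sqrt(d)) V.

    q: n_queries x d, k: n_keys x d, v: n_keys x d_v.
    """
    d = len(q[0])
    scores = matmul(q, transpose(k))  # n_queries x n_keys similarity scores
    weights = [softmax([s / math.sqrt(d) for s in row]) for row in scores]
    return matmul(weights, v)  # each output row is a convex mix of v's rows


def grouped_query_attention(qs: List[Matrix], ks: List[Matrix],
                            vs: List[Matrix]) -> List[Matrix]:
    """Grouped-query attention: consecutive query heads share one K/V head.

    With len(ks) == len(qs) this is ordinary multi-head attention;
    with len(ks) == 1 it is multi-query attention.
    """
    n_q, n_kv = len(qs), len(ks)
    if n_kv == 0 or n_q % n_kv != 0:
        raise ValueError("number of query heads must be a multiple of K/V heads")
    group = n_q // n_kv  # query heads per shared K/V head
    return [attention(qs[h], ks[h // group], vs[h // group]) for h in range(n_q)]


if __name__ == "__main__":
    # Toy example: 4 query heads share 2 key/value heads (2 query heads per group).
    q_heads = [[[1.0, 0.0]], [[0.0, 1.0]], [[1.0, 1.0]], [[2.0, 0.0]]]
    k_heads = [[[1.0, 0.0], [0.0, 1.0]], [[0.5, 0.5], [1.0, -1.0]]]
    v_heads = [[[1.0, 0.0], [0.0, 1.0]], [[2.0, 2.0], [-1.0, 3.0]]]
    for h, out in enumerate(grouped_query_attention(q_heads, k_heads, v_heads)):
        print(f"query head {h}: {out}")
```

The design choice behind grouped-query attention is visible in the indexing `h // group`. Fewer distinct key/value heads means fewer keys and values to cache during inference, and every query head still computes its own attention pattern.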

Papers

September 2, 2024

Duplex: A Device for Large Language Models with Mixture of Experts, Grouped Query Attention, and Continuous Batching
Sungmin Yun, Kwanhee Kyung, Juhwan Cho, Jaewan Choi, Jongmin Kim, Byeongho Kim, Sukhan Lee, Kyomin Sohn, Jung Ho Ahn
Large Language Model Mixture Component Mixture of Expert Attention Layer Query Attention Processing in Memory Flexible Duplex Automatic Batching

August 27, 2024

MeshUp: Multi-Target Mesh Deformation via Blended Score Distillation
Hyunwoo Kim, Itai Lang, Noam Aigerman, Thibault Groueix, Vladimir G. Kim, Rana Hanocka
Attention Layer 3D Mesh Score Distillation Deformation Field Mesh Ratio Mesh Deformation Target Concept

August 14, 2024

BiLSTM and Attention-Based Modulation Classification of Realistic Wireless Signals
Rohit Udaiwal, Nayan Baishya, Yash Gupta, B. R. Manoj
Attention Layer Wireless Signal Novel RF Editing Pipeline Modulation Classification BiLSTM CNN CRF Quad Attention

August 7, 2024

Bi-Level Spatial and Channel-aware Transformer for Learned Image Compression
Hamidreza Soltani, Erfan Ghasemi
Convolutional Neural Network Image Compression Attention Layer Learned Image Compression Spatial Context

August 4, 2024

Cross-layer Attention Sharing for Large Language Models
Yongyu Mu, Yuzhang Wu, Yuchun Fan, Chenglong Wang, Hengyu Li, Qiaozhi He, Murun Yang, Tong Xiao, Jingbo Zhu
Large Language Model Attention Layer Information Redundancy Attention Weight Attention Pattern Cross Layer Attention

July 31, 2024

ControlMLLM: Training-Free Visual Prompt Learning for Multimodal Large Language Models
Mingrui Wu, Xinyue Cai, Jiayi Ji, Jiale Li, Oucheng Huang, Gen Luo, Hao Fei, Guannan Jiang, Xiaoshuai Sun, Rongrong Ji
Multimodal Large Language Model Attention Layer Visual Prompt Visual Token

July 23, 2024

On the Benefits of Rank in Attention Layers
Noah Amsel, Gilad Yehudai, Joan Bruna
Attention Mechanism Attention Layer Complementary Benefit Stable Rank Attention Head Attention Matrix Low Rank Attention

July 22, 2024

Attention Is All You Need But You Don't Need All Of It For Inference of Large Language Models
Georgy Tyukin, Gbetondji J-S Dovonon, Jean Kaddour, Pasquale Minervini
Scientific Inference Human Attention Attention Layer Low Latency LLM Benchmark Tuned Llama Model

July 15, 2024

InVi: Object Insertion In Videos Using Off-the-Shelf Diffusion Models
Nirat Saini, Navaneeth Bodla, Ashish Shrivastava, Avinash Ravichandran, Xiao Zhang, Abhinav Shrivastava, Bharat Singh
Latent Diffusion Model Gameplay Video Attention Layer Object Insertion Blending Method One Shot Video Tuning

July 8, 2024

On the Power of Convolution Augmented Transformer
Mingchen Li, Xuechen Zhang, Yixiao Huang, Samet Oymak
Transformer Architecture Real Power Attention Layer Convolution Augmented Transformer

July 7, 2024

How Effective are State Space Models for Machine Translation?
Hugo Pitorro, Pavlo Vasylenko, Marcos Treviso, André F. T. Martins
Machine Translation State Space Model Attention Layer Linear Recurrent Paragraph Level

July 5, 2024

Looking into Black Box Code Language Models
Muhammad Umair Haider, Umar Farooq, A. B. Siddique, Mark Marron
Language Model Attention Layer Code Language Model Code LLM Feed Forward Layer Black Box Language Model

June 24, 2024

The Progression of Transformers from Language to Vision to MOT: A Literature Review on Multi-Object Tracking with Transformers
Abhi Kamboj
Deep Learning Computer Vision Transformer Vision Paper Multi Object Tracking Attention Layer Much Progress Transformer Neural Network Architecture

June 22, 2024

What Matters in Transformers? Not All Attention is Needed
Shwai He, Guoheng Sun, Zheyu Shen, Ang Li
Large Language Model Transformer Human Attention High Similarity Attention Layer Feature Redundancy MLP Layer

June 21, 2024

Generating Music with Structure Using Self-Similarity as Attention
Sophia Hager, Kathleen Hablutzel, Katherine M. Kinnaird
Human Attention Inner Structure Attention Layer Music Generation Self Similar Neural Generation

June 17, 2024

Skip-Layer Attention: Bridging Abstract and Detailed Dependencies in Transformers
Qian Chen, Wen Wang, Qinglin Zhang, Siqi Zheng, Shiliang Zhang, Chong Deng, Hai Yu, Jiaqing Liu, Yukun Ma, Chong Zhang
Transformer Transformer Architecture Attention Layer Bridging Text Dependency Analysis