Multi Head Attention

Multi-head attention is a mechanism within transformer networks that runs several attention heads in parallel, each with its own learned projections, so the model can attend to different aspects of the input simultaneously. Current research focuses on making multi-head attention more efficient, including alternative architectures such as grouped-query attention and methods that reduce computational cost without sacrificing accuracy, such as pruning or low-precision approximations. These advances matter because they allow transformer models to be applied to larger datasets and more complex problems across diverse fields, including image processing, audio classification, and natural language processing.
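
To make the mechanism concrete, here is a minimal NumPy sketch of scaled dot-product self-attention split across several heads. It is an illustration under stated assumptions, not the implementation used by any of the papers listed below: the function name `multi_head_attention`, the weight names `w_q`, `w_k`, `w_v`, `w_o`, and all shapes are hypothetical, and masking, biases, batching, and dropout are omitted. Setting the assumed `n_kv_heads` parameter below `n_heads` gives a simple form of grouped-query attention, where several query heads share one key/value head.

```python
import numpy as np


def softmax(x, axis=-1):
    # Subtract the row maximum first for numerical stability.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def multi_head_attention(x, w_q, w_k, w_v, w_o, n_heads, n_kv_heads=None):
    """Self-attention over x of shape (seq_len, d_model).

    Assumed shapes (illustrative only):
      w_q, w_o: (d_model, d_model)
      w_k, w_v: (d_model, n_kv_heads * d_head)
    """
    seq_len, d_model = x.shape
    n_kv_heads = n_heads if n_kv_heads is None else n_kv_heads
    assert d_model % n_heads == 0 and n_heads % n_kv_heads == 0
    d_head = d_model // n_heads

    # Project the input, then split the feature axis into heads: (heads, seq, d_head).
    q = (x @ w_q).reshape(seq_len, n_heads, d_head).transpose(1, 0, 2)
    k = (x @ w_k).reshape(seq_len, n_kv_heads, d_head).transpose(1, 0, 2)
    v = (x @ w_v).reshape(seq_len, n_kv_heads, d_head).transpose(1, 0, 2)

    # Grouped-query attention: each key/value head serves a group of query heads.
    # With n_kv_heads == n_heads the group size is 1 (standard multi-head attention).
    group = n_heads // n_kv_heads
    k = np.repeat(k, group, axis=0)
    v = np.repeat(v, group, axis=0)

    # Each head computes its own attention pattern over the sequence.
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(d_head)  # (heads, seq, seq)
    weights = softmax(scores, axis=-1)
    heads = weights @ v                                   # (heads, seq, d_head)

    # Concatenate the heads and mix them with the output projection.
    concat = heads.transpose(1, 0, 2).reshape(seq_len, d_model)
    return concat @ w_o, weights


# Usage with random weights (hypothetical sizes).
rng = np.random.default_rng(0)
seq_len, d_model, n_heads, n_kv_heads = 5, 16, 4, 2
d_head = d_model // n_heads
x = rng.normal(size=(seq_len, d_model))
w_q = rng.normal(size=(d_model, d_model))
w_o = rng.normal(size=(d_model, d_model))
w_k = rng.normal(size=(d_model, n_kv_heads * d_head))
w_v = rng.normal(size=(d_model, n_kv_heads * d_head))
out, attn = multi_head_attention(x, w_q, w_k, w_v, w_o, n_heads, n_kv_heads)
print(out.shape, attn.shape)
```

In this sketch the grouped-query variant shrinks only the key and value projections; the attention score matrices keep the same size, so the saving is in parameters and in the keys and values that need to be stored.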

Papers

September 15, 2022

Hydra Attention: Efficient Attention with Many Heads
Daniel Bolya, Cheng-Yang Fu, Xiaoliang Dai, Peizhao Zhang, Judy Hoffman
Vision Transformer Multi Head Attention Attention Matrix Multi Head Efficient Attention Hydra MDP

September 13, 2022

Analysis of Self-Attention Head Diversity for Conformer-based Automatic Speech Recognition
Kartik Audhkhasi, Yinghui Huang, Bhuvana Ramabhadran, Pedro J. Moreno
General Analysis Attention Mechanism Attention Layer Multi Head Attention Multi Head

August 16, 2022

Parallel Hierarchical Transformer with Attention Alignment for Abstractive Multi-Document Summarization
Ye Ma, Lu Zong
Hierarchical Transformer Multi Head Attention Multi Document Summarization Single Document Summarization Attention Alignment

July 31, 2022

CloudAttention: Efficient Multi-Scale Attention Scheme For 3D Point Cloud Learning
Mahdi Saleh, Yige Wang, Nassir Navab, Benjamin Busam, Federico Tombari
Global Attention Multi Head Attention Attention Head Cloud Detection Point Cloud Learning

July 27, 2022

Rethinking Efficacy of Softmax for Lightweight Non-Local Neural Networks
Yooshin Cho, Youngsoo Kim, Hanbyel Cho, Jaesung Ahn, Hyeong Gwon Hong, Junmo Kim
Neural Network Attention Map Softmax Function Multi Head Attention Tiny ImageNet Feature Map

July 4, 2022

Solving the Traveling Salesperson Problem with Precedence Constraints by Deep Reinforcement Learning
Christian Löwens, Inaam Ashraf, Alexander Gembus, Genesis Cuizon, Jonas K. Falkner, Lars Schmidt-Thieme
Deep Reinforcement Learning Multi Head Attention Traveling Salesperson Problem Heterogeneous Attention Precedence Constraint

June 7, 2022

How to Dissect a Muppet: The Structure of Transformer Embedding Spaces
Timothee Mickus, Denis Paperno, Mathieu Constant
Transformer Based Transformer Architecture Inner Structure NLP Community Multi Head Attention Attention Weight Mixed Space Pre Trained Embeddings

May 23, 2022

Interpretable Feature Engineering for Time Series Predictors using Attention Networks
Tianjie Wang, Jie Chen, Joel Vaughan, Vijayan N. Nair
Attention Layer Attention Network Time Series Prediction Multi Head Attention Interpretable Feature

May 17, 2022

Multi-Head Attention Neural Network for Smartphone Invariant Indoor Localization
Saideep Tiku, Danish Gufran, Sudeep Pasricha
Indoor Localization Multi Head Attention

May 3, 2022

Improving Dual-Microphone Speech Enhancement by Learning Cross-Channel Features with Multi-Head Attention
Xinmeng Xu, Rongzhi Gu, Yuexian Zou
Speech Enhancement Multi Head Attention Channel Correlation Multi Head Cross Attention Cross Channel

March 23, 2022

Multi-label Transformer for Action Unit Detection
Gauthier Tallec, Edouard Yvinec, Arnaud Dapogny, Kevin Bailly
Multi Label Multi Head Attention Action Unit Detection Facial Muscle

March 17, 2022

Gaussian Multi-head Attention for Simultaneous Machine Translation
Shaolei Zhang, Yang Feng
Multi Head Attention Translation Model Simultaneous Machine Translation Alignment Regularization

March 8, 2022

Measuring the Mixing of Contextual Information in the Transformer
Javier Ferrando, Gerard I. Gállego, Marta R. Costa-jussà
Transformer Based Transformer Architecture Contextual Information Multi Head Attention Attention Weight Mixing Process Interpretable Layer

January 14, 2022

Multi-head Temporal Attention-Augmented Bilinear Network for Financial time series prediction
Mostafa Shabani, Dat Thanh Tran, Martin Magris, Juho Kanniainen, Alexandros Iosifidis
Temporal Attention Multi Head Attention Financial Time Series Multi Head Bilinear Convolutional Neural Network Temporal Attention Module

December 27, 2021

MHATC: Autism Spectrum Disorder identification utilizing multi-head attention encoder along with temporal consolidation modules
Ranjeet Ranjan Jha, Abhishek Bhardwaj, Devin Garg, Arnav Bhavsar, Aditya Nigam
Autism Spectrum Disorder Multi Head Attention Memory Consolidation Resting State fMRI

December 20, 2021

Attention Based Communication and Control for Multi-UAV Path Planning
Hamid Shiri, Hyowoon Seo, Jihong Park, Mehdi Bennis
External Control Collision Avoidance Various Fast Moving Drone Multi Head Attention Multi UAV

December 2, 2021

Leveraging Human Selective Attention for Medical Image Analysis with Limited Training Data
Yifei Huang, Xiaoxiao Li, Lijin Yang, Lin Gu, Yingying Zhu, Hirofumi Seo, Qiuming Meng, Tatsuya Harada, Yoichi Sato
Training Data Medical Image Analysis Multi Head Attention Attention Block Human Gaze

December 1, 2021

A Daily Tourism Demand Prediction Framework Based on Multi-head Attention CNN: The Case of The Foreign Entrant in South Korea
Dong-Keon Kim, Sung Kuk Shyn, Donghee Kim, Seungwoo Jang, Kwangsu Kim
Deep Learning Case Relevance Multi Head Attention Travel Demand Demand Forecasting Migration Related Hotel Demand
