Cross Attention

Cross-attention is a mechanism that allows neural networks to relate information from different parts of an input, such as relating words in a sentence to pixels in an image, or aligning audio and video streams. Current research focuses on improving the efficiency and effectiveness of cross-attention in various applications, including image generation, video processing, and multimodal learning, often employing transformer architectures or state-space models like Mamba. This attention mechanism is proving crucial for enhancing performance in tasks requiring the integration of diverse data sources, leading to improvements in areas such as scene change detection, style transfer, and multimodal emotion recognition. The resulting advancements have significant implications for various fields, including computer vision, natural language processing, and healthcare.

Papers

October 19, 2023

Cross-attention Spatio-temporal Context Transformer for Semantic Segmentation of Historical Maps
Sidi Wu, Yizi Chen, Konrad Schindler, Lorenz Hurni
Semantic Segmentation Cross Attention Temporal Information Temporal Feature Aleatoric Uncertainty Historical Map

October 17, 2023

Image Compression using only Attention based Neural Networks
Natacha Luka, Romain Negrel, David Picard
Neural Network Human Attention Image Compression Cross Attention Attention Layer Learned Image Compression Convolution Free

October 15, 2023

October 10, 2023

AutoAD II: The Sequel -- Who, When, and What in Movie Audio Description
Tengda Han, Max Bain, Arsha Nagrani, Gül Varol, Weidi Xie, Andrew Zisserman
Cross Attention Visual Feature Audio Description

October 9, 2023

Robust Image Watermarking based on Cross-Attention and Invariant Domain Learning
Agnibh Dasgupta, Xin Zhong
Cross Attention Watermarking Method Domain Invariant Multi Head Cross Attention

October 5, 2023

DecoderLens: Layerwise Interpretation of Encoder-Decoder Transformers
Anna Langedijk, Hosein Mohebbi, Gabriele Sarti, Willem Zuidema, Jaap Jumelet
Cross Attention Encoder Decoder Model Decoder Only Transformer Encoder Decoder Transformer Interpretable Layer

October 2, 2023

Pixel-Aligned Recurrent Queries for Multi-View 3D Object Detection
Yiming Xie, Huaizu Jiang, Georgia Gkioxari, Julian Straub
Cross Attention Multi View 3D Pixel Level Alignment

September 28, 2023

September 15, 2023

M$^3$Net: Multilevel, Mixed and Multistage Attention Network for Salient Object Detection
Yao Yuan, Pan Gao, XiaoYang Tan
Cross Attention Attention Network SALient Object Detection Salient Object Detection Mixed Speech

August 11, 2023

Exploring Predicate Visual Context in Detecting Human-Object Interactions
Frederic Z. Zhang, Yuhui Yuan, Dylan Campbell, Zhuoyao Zhong, Stephen Gould
Cross Attention Human Object Interaction Visual Context Object Feature Lex Leader Symmetry Breaking Predicate DETR Model

August 3, 2023

DETR Doesn't Need Multi-Scale or Locality Design
Yutong Lin, Yuhui Yuan, Zheng Zhang, Chen Li, Nanning Zheng, Han Hu
Fine Grained Multi Scale Cross Attention Cross Attention Mechanism Locality Sensitive Group DETR V2 DETR Based Detector

July 24, 2023

Automatic lobe segmentation using attentive cross entropy and end-to-end fissure generation
Qi Su, Na Wang, Jiawen Xie, Yinan Chen, Xiaofan Zhang
Cross Attention Pulmonary Fissure

July 20, 2023

Divide & Bind Your Attention for Improved Generative Semantic Nursing
Yumeng Li, Margret Keuper, Dan Zhang, Anna Khoreva
Generative Model Human Attention Complex Prompt Cross Attention Data Divide

July 18, 2023

ECSIC: Epipolar Cross Attention for Stereo Image Compression
Matthias Wödlinger, Jan Kotera, Manuel Keglevic, Jan Xu, Robert Sablatnig
Cross Attention Stereo Image Stereo Image Compression Stereo Attention

July 17, 2023

Box-DETR: Understanding and Boxing Conditional Spatial Queries
Wenze Liu, Hao Lu, Yuliang Liu, Zhiguo Cao
Human Understanding Cross Attention Detection Transformer Multidimensional Projection Conditional Attention

June 27, 2023

Classification of Infant Sleep/Wake States: Cross-Attention among Large Scale Pretrained Transformer Networks using Audio, ECG, and IMU Data
Kai Chieh Chang, Mark Hasegawa-Johnson, Nancy L. McElwain, Bashima Islam
Classification Code Cross Attention Inertial Measurement Unit Large Pre Trained Transformer Wake Model

June 25, 2023

Cross Attention

Papers

Cross-attention Spatio-temporal Context Transformer for Semantic Segmentation of Historical Maps

Image Compression using only Attention based Neural Networks

OAAFormer: Robust and Efficient Point Cloud Registration Through Overlapping-Aware Attention in Transformer

MoEmo Vision Transformer: Integrating Cross-Attention and Movement Vectors in 3D Pose Estimation for HRI Emotion Detection

AutoAD II: The Sequel -- Who, When, and What in Movie Audio Description

Robust Image Watermarking based on Cross-Attention and Invariant Domain Learning

DecoderLens: Layerwise Interpretation of Encoder-Decoder Transformers

Pixel-Aligned Recurrent Queries for Multi-View 3D Object Detection

Gated Cross-Attention Network for Depth Completion

Self-supervised Cross-view Representation Reconstruction for Change Captioning

M$^3$Net: Multilevel, Mixed and Multistage Attention Network for Salient Object Detection

Exploring Predicate Visual Context in Detecting Human-Object Interactions

DETR Doesn't Need Multi-Scale or Locality Design

Automatic lobe segmentation using attentive cross entropy and end-to-end fissure generation

Divide & Bind Your Attention for Improved Generative Semantic Nursing

ECSIC: Epipolar Cross Attention for Stereo Image Compression

Box-DETR: Understanding and Boxing Conditional Spatial Queries

Classification of Infant Sleep/Wake States: Cross-Attention among Large Scale Pretrained Transformer Networks using Audio, ECG, and IMU Data

Visual Question Answering in Remote Sensing with Cross-Attention and Multimodal Information Bottleneck

AV-SepFormer: Cross-Attention SepFormer for Audio-Visual Target Speaker Extraction