Cross Attention
Cross-attention is a mechanism that lets a neural network relate information drawn from two different sources, such as relating words in a sentence to pixels in an image, or aligning audio and video streams; this distinguishes it from self-attention, which relates positions within a single input. Current research focuses on improving the efficiency and effectiveness of cross-attention across applications including image generation, video processing, and multimodal learning, often within transformer architectures or state-space models such as Mamba. The mechanism is proving crucial for tasks that require integrating diverse data sources, yielding improvements in areas such as scene change detection, style transfer, and multimodal emotion recognition. These advances have significant implications for fields including computer vision, natural language processing, and healthcare.
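To make the mechanism concrete, here is a minimal NumPy sketch of single-head scaled dot-product cross-attention, in which queries come from one source (e.g. text tokens) while keys and values come from another (e.g. image patches). All names, shapes, and the random toy data are illustrative assumptions, not code from any of the papers below; multi-head projections, masking, and output projections are omitted.

```python
import numpy as np

def cross_attention(queries, context, wq, wk, wv):
    """Single-head scaled dot-product cross-attention (illustrative sketch).

    queries: (n_q, d_model) embeddings from one source, e.g. text tokens
    context: (n_c, d_model) embeddings from another source, e.g. image patches
    wq, wk, wv: (d_model, d_k) learned projection matrices (random here)
    """
    q = queries @ wq  # (n_q, d_k): queries projected from the first source
    k = context @ wk  # (n_c, d_k): keys projected from the second source
    v = context @ wv  # (n_c, d_k): values projected from the second source
    scores = q @ k.T / np.sqrt(k.shape[-1])  # (n_q, n_c) scaled similarities
    # Softmax over the context axis: each query gets a distribution over patches.
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v  # (n_q, d_k): each query is a mixture of context values

rng = np.random.default_rng(0)
d_model, d_k = 8, 4
text = rng.normal(size=(3, d_model))    # 3 toy "word" embeddings
image = rng.normal(size=(5, d_model))   # 5 toy "patch" embeddings
wq, wk, wv = (rng.normal(size=(d_model, d_k)) for _ in range(3))
out = cross_attention(text, image, wq, wk, wv)
print(out.shape)  # one attended (d_k)-dimensional vector per query token
```

The key point is the asymmetry: swapping which source supplies the queries versus the keys/values changes what attends to what, which is how the papers below align text with images, geography with traffic, or one modality with another.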
Papers
Object-level Visual Prompts for Compositional Image Generation
Gaurav Parmar, Or Patashnik, Kuan-Chieh Wang, Daniil Ostashev, Srinivasa Narasimhan, Jun-Yan Zhu, Daniel Cohen-Or, Kfir Aberman
nnY-Net: Swin-NeXt with Cross-Attention for 3D Medical Images Segmentation
Haixu Liu, Zerui Tao, Wenzhen Dong, Qiuzhuang Sun
Leverage Cross-Attention for End-to-End Open-Vocabulary Panoptic Reconstruction
Xuan Yu, Yuxuan Xie, Yili Liu, Haojian Lu, Rong Xiong, Yiyi Liao, Yue Wang
Geographical Information Alignment Boosts Traffic Analysis via Transpose Cross-attention
Xiangyu Jiang, Xiwen Chen, Hao Wang, Abolfazl Razi
CADMR: Cross-Attention and Disentangled Learning for Multimodal Recommender Systems
Yasser Khalafaoui (Alteca), Martino Lovisetto (Alteca), Basarab Matei, Nistor Grozavu (CY)
Cross-Attention Head Position Patterns Can Align with Human Visual Concepts in Text-to-Image Generative Models
Jungwon Park, Jungmin Ko, Dongnam Byun, Jangwon Suh, Wonjong Rhee