Bilateral Local Attention Vision Transformer

Bilateral Local Attention Vision Transformers (BLATs) aim to improve the efficiency and effectiveness of Vision Transformers (ViTs) by strategically limiting the scope of the attention mechanism. Current research focuses on architectures that apply local attention in both image space (e.g., over sliding windows) and feature space (e.g., over clusters of similar features), capturing short-range and long-range dependencies more efficiently than global attention, whose cost grows quadratically with the number of tokens. This approach improves performance on computer vision tasks such as video frame interpolation, object segmentation, and moment retrieval, while reducing the computational cost of processing large image inputs. The resulting models offer a compelling alternative to traditional convolutional neural networks and are shaping the development of more efficient and powerful vision systems.
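To make the two-branch idea concrete, below is a minimal PyTorch sketch of a bilateral local attention layer. It is illustrative only, not a specific published model: the class name `BilateralLocalAttention`, the `window` and `num_clusters` parameters, and the learned centroid matrix are all assumptions. The spatial branch runs self-attention within non-overlapping windows (a stand-in for the sliding-window schemes mentioned above), and the feature branch softly assigns tokens to learned cluster prototypes and lets every token attend to the resulting cluster summaries, approximating attention among similar features regardless of spatial distance.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class BilateralLocalAttention(nn.Module):
    """Illustrative sketch: a spatial branch attends within local windows
    (short-range), and a feature branch attends over cluster summaries of
    similar features (long-range). Names and details are assumptions."""

    def __init__(self, dim, num_heads=4, window=7, num_clusters=8):
        super().__init__()
        self.window = window
        self.spatial_attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.cluster_attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        # Learned cluster prototypes for the feature-space branch (an assumption;
        # actual BLAT papers may cluster features differently).
        self.centroids = nn.Parameter(torch.randn(num_clusters, dim))
        self.proj = nn.Linear(2 * dim, dim)

    def forward(self, x):
        # x: (B, H, W, C) feature map; H and W assumed divisible by the window size.
        B, H, W, C = x.shape
        w = self.window

        # --- Spatial branch: self-attention inside non-overlapping w x w windows ---
        wins = x.view(B, H // w, w, W // w, w, C).permute(0, 1, 3, 2, 4, 5)
        wins = wins.reshape(-1, w * w, C)                       # (B * num_windows, w*w, C)
        local, _ = self.spatial_attn(wins, wins, wins)
        local = local.view(B, H // w, W // w, w, w, C).permute(0, 1, 3, 2, 4, 5)
        local = local.reshape(B, H, W, C)

        # --- Feature branch: soft-cluster tokens, then attend to cluster summaries ---
        tokens = x.reshape(B, H * W, C)
        assign = F.softmax(tokens @ self.centroids.t(), dim=-1)  # (B, N, K) soft assignment
        clusters = assign.transpose(1, 2) @ tokens               # (B, K, C) cluster summaries
        clusters = clusters / (assign.sum(dim=1).unsqueeze(-1) + 1e-6)
        far, _ = self.cluster_attn(tokens, clusters, clusters)   # tokens query the K summaries
        far = far.view(B, H, W, C)

        # Fuse the short-range (window) and long-range (cluster) paths.
        return self.proj(torch.cat([local, far], dim=-1))


# Usage: a 14x14 feature map with 64 channels and 7x7 windows.
layer = BilateralLocalAttention(dim=64, window=7)
out = layer(torch.randn(2, 14, 14, 64))  # -> (2, 14, 14, 64)
```

The efficiency gain in this sketch comes from the branch sizes: window attention costs O(N * w^2) rather than O(N^2), and the cluster branch adds only O(N * K) for K cluster summaries, so neither branch pays the quadratic cost of full global attention.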

Papers