Paper ID: 2207.11329

Video Swin Transformers for Egocentric Video Understanding @ Ego4D Challenges 2022

Maria Escobar, Laura Daza, Cristina González, Jordi Pont-Tuset, Pablo Arbeláez

We implemented the Video Swin Transformer as the base architecture for two Ego4D tasks: Point-of-No-Return temporal localization and Object State Change Classification. Our method achieved competitive performance on both challenges.

Submitted: Jul 22, 2022

Topics

Swin Transformer
New Task
Value Laden Choice
Temporal Localization
Object State
Ego4D AudioVisual
Egocentric Video Understanding
