Video Domain
Research in the video domain develops robust and efficient methods for analyzing and manipulating video data, addressing challenges such as object tracking, action recognition, and cross-domain adaptation. Current work concentrates on self-supervised learning, chiefly masked autoencoders and contrastive learning. These methods are often built on transformer architectures, whose attention mechanisms improve temporal modeling and cross-modal understanding (e.g., aligning video with text). These advances support applications ranging from video retrieval and editing to autonomous systems and healthcare.
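As a rough illustration of the contrastive video-text learning mentioned above, the sketch below computes a symmetric InfoNCE loss over a batch of paired video and text embeddings. It is a generic example, not the method of either paper listed here. The function names, the temperature of 0.07, and the assumption that the embeddings come from separate pre-trained video and text encoders are all illustrative choices.

```python
import numpy as np

def l2_normalize(x, eps=1e-8):
    # Scale each row to (approximately) unit length so dot products become cosine similarities.
    return x / (np.linalg.norm(x, axis=1, keepdims=True) + eps)

def log_softmax(logits, axis):
    # Numerically stable log-softmax along the given axis.
    shifted = logits - logits.max(axis=axis, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=axis, keepdims=True))

def video_text_infonce(video_emb, text_emb, temperature=0.07):
    """Symmetric InfoNCE loss (illustrative sketch; temperature is an assumed value).

    Row i of video_emb and row i of text_emb are assumed to be a matching pair;
    every other row in the batch serves as a negative.
    """
    v = l2_normalize(video_emb)
    t = l2_normalize(text_emb)
    logits = v @ t.T / temperature           # (B, B) similarity matrix
    idx = np.arange(logits.shape[0])          # positives lie on the diagonal
    loss_v2t = -log_softmax(logits, axis=1)[idx, idx].mean()  # video -> text retrieval
    loss_t2v = -log_softmax(logits, axis=0)[idx, idx].mean()  # text -> video retrieval
    return 0.5 * (loss_v2t + loss_t2v)

rng = np.random.default_rng(0)
videos = rng.normal(size=(8, 32))
aligned = video_text_infonce(videos, videos)
unaligned = video_text_infonce(videos, rng.normal(size=(8, 32)))
print(f"aligned={aligned:.4f} unaligned={unaligned:.4f}")
```

Because matching pairs sit on the diagonal of the similarity matrix, the loss falls toward zero as each video scores higher with its own caption than with the other captions in the batch, and it equals log(B) when all pairs are indistinguishable.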
Papers
Unsupervised Video Domain Adaptation with Masked Pre-Training and Collaborative Self-Training
Arun Reddy, William Paul, Corban Rivera, Ketul Shah, Celso M. de Melo, Rama Chellappa
DemaFormer: Damped Exponential Moving Average Transformer with Energy-Based Modeling for Temporal Language Grounding
Thong Nguyen, Xiaobao Wu, Xinshuai Dong, Cong-Duy Nguyen, See-Kiong Ng, Luu Anh Tuan