Video Representation
Video representation research seeks efficient and effective ways to encode and process video data for a wide range of applications. Current efforts focus on novel architectures, including implicit neural representations (INRs), transformers, and hybrid models that combine convolutional neural networks (CNNs) with transformers, often incorporating self-supervised learning and multimodal signals such as audio and text. These advances improve video compression, strengthen downstream tasks such as action recognition and text-video retrieval, and enable new capabilities such as video editing and generation, with applications ranging from surveillance and monitoring to entertainment and healthcare.
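To make the INR idea concrete, here is a minimal sketch, assuming PyTorch: a coordinate MLP that maps a normalized (x, y, t) location to an RGB value and is overfit to a single clip, so the network weights themselves become the video representation. All class names, hyperparameters, and the toy training loop are illustrative assumptions, not the method of any paper listed below.

```python
# Minimal video INR sketch: an MLP f(x, y, t) -> RGB fit to one clip.
# Hypothetical names and hyperparameters; for illustration only.
import torch
import torch.nn as nn

class FourierFeatures(nn.Module):
    """Map (x, y, t) to sin/cos features so the MLP can fit high frequencies."""
    def __init__(self, in_dim=3, num_bands=8):
        super().__init__()
        # Log-spaced, fixed (non-learned) frequencies.
        self.register_buffer("freqs", 2.0 ** torch.arange(num_bands) * torch.pi)
        self.out_dim = in_dim * num_bands * 2

    def forward(self, coords):                      # coords: (N, 3) in [-1, 1]
        proj = coords[..., None] * self.freqs       # (N, 3, num_bands)
        proj = proj.flatten(-2)                     # (N, 3 * num_bands)
        return torch.cat([proj.sin(), proj.cos()], dim=-1)

class VideoINR(nn.Module):
    """Coordinate MLP: continuous (x, y, t) -> RGB."""
    def __init__(self, hidden=256):
        super().__init__()
        self.ff = FourierFeatures()
        self.mlp = nn.Sequential(
            nn.Linear(self.ff.out_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 3), nn.Sigmoid(),     # RGB in [0, 1]
        )

    def forward(self, coords):
        return self.mlp(self.ff(coords))

# Fit the INR to a toy clip shaped (T, H, W, 3) with values in [0, 1].
T, H, W = 8, 32, 32
video = torch.rand(T, H, W, 3)                      # stand-in for real frames
ts, ys, xs = torch.meshgrid(
    torch.linspace(-1, 1, T), torch.linspace(-1, 1, H),
    torch.linspace(-1, 1, W), indexing="ij")
coords = torch.stack([xs, ys, ts], dim=-1).reshape(-1, 3)
targets = video.reshape(-1, 3)

model = VideoINR()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
for step in range(200):                             # short demo loop
    idx = torch.randint(0, coords.shape[0], (4096,))
    loss = nn.functional.mse_loss(model(coords[idx]), targets[idx])
    opt.zero_grad(); loss.backward(); opt.step()
```

Once fit, the network can be queried at arbitrary continuous coordinates, which is what makes INRs attractive for compression and temporally consistent editing; the Fourier features are a common trick to let a small MLP represent fine spatial and temporal detail.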
Papers
CoDeF: Content Deformation Fields for Temporally Consistent Video Processing
Hao Ouyang, Qiuyu Wang, Yuxi Xiao, Qingyan Bai, Juntao Zhang, Kecheng Zheng, Xiaowei Zhou, Qifeng Chen, Yujun Shen
Prompt Switch: Efficient CLIP Adaptation for Text-Video Retrieval
Chaorui Deng, Qi Chen, Pengda Qin, Da Chen, Qi Wu
TimeBalance: Temporally-Invariant and Temporally-Distinctive Video Representations for Semi-Supervised Action Recognition
Ishan Rajendrakumar Dave, Mamshad Nayeem Rizve, Chen Chen, Mubarak Shah
SELF-VS: Self-supervised Encoding Learning For Video Summarization
Hojjat Mokhtarabadi, Kave Bahraman, Mehrdad HosseinZadeh, Mahdi Eftekhari