Video Representation
Video representation research seeks efficient and effective ways to encode and process video data for a wide range of applications. Current efforts focus on novel architectures, including implicit neural representations (INRs), transformers, and hybrid models that combine convolutional neural networks (CNNs) with transformers, often incorporating self-supervised learning and multimodal signals such as audio and text. These advances improve video compression, strengthen downstream tasks such as action recognition and text-video retrieval, and enable new capabilities such as video editing and generation, with applications ranging from surveillance and monitoring to entertainment and healthcare.
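To make the INR idea concrete, here is a minimal sketch, assuming PyTorch: a coordinate MLP that maps a normalized (x, y, t) location to an RGB value and is overfit to a single clip, so the network weights themselves become the video representation. All class names, hyperparameters, and the toy training loop are illustrative assumptions, not the method of any paper listed below.

```python
# Minimal video INR sketch: an MLP f(x, y, t) -> RGB fit to one clip.
# Hypothetical names and hyperparameters; for illustration only.
import torch
import torch.nn as nn

class FourierFeatures(nn.Module):
    """Map (x, y, t) to sin/cos features so the MLP can fit high frequencies."""
    def __init__(self, in_dim=3, num_bands=8):
        super().__init__()
        # Log-spaced, fixed (non-learned) frequencies.
        self.register_buffer("freqs", 2.0 ** torch.arange(num_bands) * torch.pi)
        self.out_dim = in_dim * num_bands * 2

    def forward(self, coords):                      # coords: (N, 3) in [-1, 1]
        proj = coords[..., None] * self.freqs       # (N, 3, num_bands)
        proj = proj.flatten(-2)                     # (N, 3 * num_bands)
        return torch.cat([proj.sin(), proj.cos()], dim=-1)

class VideoINR(nn.Module):
    """Coordinate MLP: continuous (x, y, t) -> RGB."""
    def __init__(self, hidden=256):
        super().__init__()
        self.ff = FourierFeatures()
        self.mlp = nn.Sequential(
            nn.Linear(self.ff.out_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 3), nn.Sigmoid(),     # RGB in [0, 1]
        )

    def forward(self, coords):
        return self.mlp(self.ff(coords))

# Fit the INR to a toy clip shaped (T, H, W, 3) with values in [0, 1].
T, H, W = 8, 32, 32
video = torch.rand(T, H, W, 3)                      # stand-in for real frames
ts, ys, xs = torch.meshgrid(
    torch.linspace(-1, 1, T), torch.linspace(-1, 1, H),
    torch.linspace(-1, 1, W), indexing="ij")
coords = torch.stack([xs, ys, ts], dim=-1).reshape(-1, 3)
targets = video.reshape(-1, 3)

model = VideoINR()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
for step in range(200):                             # short demo loop
    idx = torch.randint(0, coords.shape[0], (4096,))
    loss = nn.functional.mse_loss(model(coords[idx]), targets[idx])
    opt.zero_grad(); loss.backward(); opt.step()
```

Once fit, the network can be queried at arbitrary continuous coordinates, which is what makes INRs attractive for compression and temporally consistent editing; the Fourier features are a common trick to let a small MLP represent fine spatial and temporal detail.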
Papers
CoDeF: Content Deformation Fields for Temporally Consistent Video Processing
Hao Ouyang, Qiuyu Wang, Yuxi Xiao, Qingyan Bai, Juntao Zhang, Kecheng Zheng, Xiaowei Zhou, Qifeng Chen, Yujun Shen
Prompt Switch: Efficient CLIP Adaptation for Text-Video Retrieval
Chaorui Deng, Qi Chen, Pengda Qin, Da Chen, Qi Wu
TimeBalance: Temporally-Invariant and Temporally-Distinctive Video Representations for Semi-Supervised Action Recognition
Ishan Rajendrakumar Dave, Mamshad Nayeem Rizve, Chen Chen, Mubarak Shah
SELF-VS: Self-supervised Encoding Learning For Video Summarization
Hojjat Mokhtarabadi, Kave Bahraman, Mehrdad HosseinZadeh, Mahdi Eftekhari