Video Representation

Video representation research seeks compact, informative encodings of video data that can be processed efficiently across applications. Current work centers on novel architectures, including implicit neural representations (INRs), transformers, and hybrid models that combine convolutional neural networks (CNNs) with transformers. Many of these approaches are trained with self-supervised learning and draw on multimodal information such as audio and text. These advances improve video compression, strengthen downstream tasks such as action recognition and video retrieval, and enable new capabilities such as video editing and generation. The resulting gains in video understanding and manipulation matter for fields ranging from surveillance and monitoring to entertainment and healthcare.

Papers