Video Level Representation
Video-level representation learning aims to encode an entire video as a single compact representation that captures both spatial and temporal information, for tasks such as action recognition, scene classification, and video retrieval. Current research relies heavily on transformer-based architectures and 3D/4D convolutional neural networks, often trained with contrastive or other self-supervised objectives so that the representations learned from diverse video data are robust. These advances target higher accuracy and efficiency in video understanding systems, with applications ranging from automated video analysis to multimedia search and retrieval.
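As a rough illustration of the ideas above, the sketch below collapses per-frame feature vectors into one video-level vector by mean pooling and scores it with an InfoNCE-style contrastive loss. It is a minimal toy under stated assumptions, not any particular paper's method: the function names (`mean_pool`, `cosine`, `info_nce`), the temperature value, and the use of plain Python lists in place of a learned encoder are all hypothetical choices made for illustration.

```python
import math
from typing import List, Sequence


def mean_pool(frames: Sequence[Sequence[float]]) -> List[float]:
    """Average per-frame feature vectors into one video-level vector."""
    if not frames:
        raise ValueError("need at least one frame")
    dim = len(frames[0])
    return [sum(f[i] for f in frames) / len(frames) for i in range(dim)]


def l2_normalize(v: Sequence[float]) -> List[float]:
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two vectors."""
    return sum(x * y for x, y in zip(l2_normalize(a), l2_normalize(b)))


def info_nce(anchor: Sequence[float],
             positive: Sequence[float],
             negatives: Sequence[Sequence[float]],
             temperature: float = 0.1) -> float:
    """InfoNCE loss for one anchor: negative log-softmax of the positive pair.

    The temperature of 0.1 is an assumed illustrative value.
    """
    logits = [cosine(anchor, positive) / temperature]
    logits += [cosine(anchor, n) / temperature for n in negatives]
    # Log-sum-exp with max subtraction for numerical stability.
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(l - m) for l in logits))
    return log_sum - logits[0]


# Two toy "videos", each a short sequence of 2-D frame features.
video_a = mean_pool([[1.0, 0.0], [0.9, 0.1]])
video_a_aug = mean_pool([[0.8, 0.0], [1.0, 0.2]])  # stand-in for an augmented view
video_b = mean_pool([[0.0, 1.0], [0.1, 0.9]])

loss = info_nce(video_a, video_a_aug, [video_b])
```

The loss is low when the two views of the same video are closer to each other than to the other video. In a real system the frame features would come from a trained transformer or 3D convolutional encoder, and the pooling step is often itself learned rather than a plain average.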
Papers
August 5, 2024
January 9, 2024
November 28, 2023
November 27, 2023
October 20, 2022
May 26, 2022
April 7, 2022
March 30, 2022