Video Foundation Model
Video foundation models (VFMs) aim to learn general-purpose representations that transfer across diverse video understanding tasks, rather than training a separate model for each task. Current research focuses on improving the robustness and efficiency of VFMs, on transformer-based architectures, and on pre-training strategies such as masked autoencoding, contrastive learning, and generative objectives. These models support more accurate and efficient video analysis in applications ranging from action recognition and video-text retrieval to robotic learning. Developing more generalizable and efficient VFMs remains an active area of computer vision research.