Video Dataset
Video datasets are used to train and evaluate computer vision models that understand video content, supporting tasks such as action recognition, object tracking, and quality assessment. Current research emphasizes building benchmarks from varied video sources (e.g., natural scenes, AI-generated content), incorporating multimodal information (text, audio), and targeting challenging scenarios such as unusual activity localization and camouflaged object segmentation. These benchmarks are driving progress in video understanding, with applications ranging from improved surveillance systems and e-commerce experiences to more sophisticated content moderation and conservation efforts.
Papers
MiraData: A Large-Scale Video Dataset with Long Durations and Structured Captions
Xuan Ju, Yiming Gao, Zhaoyang Zhang, Ziyang Yuan, Xintao Wang, Ailing Zeng, Yu Xiong, Qiang Xu, Ying Shan
Video-STaR: Self-Training Enables Video Instruction Tuning with Any Supervision
Orr Zohar, Xiaohan Wang, Yonatan Bitton, Idan Szpektor, Serena Yeung-Levy