Source Video
Source video analysis covers research that extracts meaningful information from video data and performs tasks directly on it. Current efforts focus on robust, efficient methods for 3D motion estimation, object detection and tracking, and multimodal analysis that integrates audio and other sensor data, often using deep learning architectures such as transformers and diffusion models. These advances matter for fields including autonomous driving, medical diagnosis, and multimedia content creation, where they enable more sophisticated and automated processing of visual information.
Papers
A Video is Worth 10,000 Words: Training and Benchmarking with Diverse Captions for Better Long Video Retrieval
Matthew Gwilliam, Michael Cogswell, Meng Ye, Karan Sikka, Abhinav Shrivastava, Ajay Divakaran
HOLD: Category-agnostic 3D Reconstruction of Interacting Hands and Objects from Video
Zicong Fan, Maria Parelli, Maria Eleni Kadoglou, Muhammed Kocabas, Xu Chen, Michael J. Black, Otmar Hilliges
Pseudo-Generalized Dynamic View Synthesis from a Video
Xiaoming Zhao, Alex Colburn, Fangchang Ma, Miguel Angel Bautista, Joshua M. Susskind, Alexander G. Schwing
Is ImageNet worth 1 video? Learning strong image encoders from 1 long unlabelled video
Shashanka Venkataramanan, Mamshad Nayeem Rizve, João Carreira, Yuki M. Asano, Yannis Avrithis