Self-Supervised Video
Self-supervised video learning aims to train powerful video representation models without extensive manual labeling by exploiting the inherent structure and dynamics of video data itself. Current research emphasizes novel pretext tasks, such as video ordering, temporal reconstruction, and contrastive learning across frames or video-text pairs, often built on transformer-based architectures. These advances improve performance on downstream tasks such as action recognition and video retrieval, and extend to applications like traffic prediction and surgical video enhancement, demonstrating the potential of self-supervised learning to tap the vast amount of information in unlabeled video.
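To make the contrastive pretext tasks mentioned above concrete, below is a minimal PyTorch sketch of an InfoNCE-style loss contrasting two views (e.g., two temporal crops or augmentations) of the same clips. It is an illustrative sketch only: the `info_nce` function, the linear `encoder` stand-in, the feature dimensions, and the temperature are assumptions for the example, not taken from any of the listed papers.

```python
# Minimal sketch of a temporal contrastive pretext task (InfoNCE over clip pairs).
# All names and dimensions are illustrative; the encoder is a stand-in for any
# video backbone (e.g., a transformer) with a projection head.
import torch
import torch.nn.functional as F


def info_nce(z_a: torch.Tensor, z_b: torch.Tensor, temperature: float = 0.07) -> torch.Tensor:
    """Contrast two views of the same clips.

    z_a, z_b: (batch, dim) embeddings. Positives share a row index;
    every other row in the batch serves as a negative.
    """
    z_a = F.normalize(z_a, dim=1)
    z_b = F.normalize(z_b, dim=1)
    logits = z_a @ z_b.t() / temperature   # (batch, batch) cosine-similarity matrix
    targets = torch.arange(z_a.size(0))    # positive pairs lie on the diagonal
    return F.cross_entropy(logits, targets)


if __name__ == "__main__":
    encoder = torch.nn.Linear(1024, 128)   # stand-in for a backbone's projection head
    clips_view1 = torch.randn(8, 1024)     # pooled features for view 1 of 8 clips
    clips_view2 = torch.randn(8, 1024)     # pooled features for view 2 of the same clips
    loss = info_nce(encoder(clips_view1), encoder(clips_view2))
    loss.backward()                        # gradients flow into the encoder
    print(f"contrastive loss: {loss.item():.4f}")
```

Other pretext tasks from the summary, such as frame ordering or temporal reconstruction, follow the same pattern: a label-free objective is derived from the video's own temporal structure and used in place of `info_nce`.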
Papers
Scaling 4D Representations
João Carreira, Dilara Gokay, Michael King, Chuhan Zhang, Ignacio Rocco, Aravindh Mahendran, Thomas Albert Keck, Joseph Heyward, Skanda Koppula, Etienne Pot, Goker Erdogan, Yana Hasson, Yi Yang, Klaus Greff, Guillaume Le Moing, Sjoerd van Steenkiste, Daniel Zoran, Drew A. Hudson, Pedro Vélez, Luisa Polanía, Luke Friedman, Chris Duvarney, Ross Goroshin, Kelsey Allen, Jacob Walker, Rishabh Kabra, Eric Aboussouan, Jennifer Sun, Thomas Kipf, Carl Doersch, Viorica Pătrăucean, Dima Damen, Pauline Luc, Mehdi S. M. Sajjadi, Andrew Zisserman
Efficient Self-Supervised Video Hashing with Selective State Spaces
Jinpeng Wang, Niu Lian, Jun Li, Yuting Wang, Yan Feng, Bin Chen, Yongbing Zhang, Shu-Tao Xia