Paper ID: 2405.06574

Deep video representation learning: a survey

Elham Ravanbakhsh, Yongqing Liang, J. Ramanujam, Xin Li

This paper provides a review on representation learning for videos. We classify recent spatiotemporal feature learning methods for sequential visual data and compare their pros and cons for general video analysis. Building effective features for videos is a fundamental problem in computer vision tasks involving video analysis and understanding. Existing features can be generally categorized into spatial and temporal features. Their effectiveness under variations of illumination, occlusion, view and background are discussed. Finally, we discuss the remaining challenges in existing deep video representation learning studies.

Submitted: May 10, 2024

Topics

Timely Survey
Representation Learning
Video Representation
Temporal Feature
Video Analysis
Video Representation Learning

Links

arXiv PDF