Video Classification
Video classification aims to automatically assign category labels to videos based on their visual features, audio features, or both, which supports efficient indexing, retrieval, and analysis. Current research emphasizes efficiency and robustness. It centers on architectures such as transformers and convolutional neural networks, often combined with audio-visual (multi-modal) fusion, and on techniques such as knowledge distillation and active learning that reduce the amount of labeled data needed for training. These advances matter for applications ranging from content moderation and surveillance to medical image analysis and industrial automation.
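As a rough illustration of the audio-visual fusion mentioned above, the following minimal sketch shows late fusion, one simple way to combine two modalities: each modality produces per-class scores, the scores are converted to probabilities, and the probabilities are averaged with a weight. The function names (`softmax`, `late_fusion`, `classify`), the `visual_weight` parameter, the labels, and the example scores are all hypothetical and chosen for illustration; they are not taken from any of the papers listed here, which may use quite different fusion mechanisms such as cross-modal attention.

```python
import math


def softmax(logits):
    """Convert raw class scores into probabilities that sum to 1."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def late_fusion(visual_logits, audio_logits, visual_weight=0.5):
    """Weighted average of per-modality class probabilities.

    visual_weight is a hypothetical tuning knob: 1.0 trusts only the
    visual model, 0.0 trusts only the audio model.
    """
    if len(visual_logits) != len(audio_logits):
        raise ValueError("both modalities must score the same classes")
    if not 0.0 <= visual_weight <= 1.0:
        raise ValueError("visual_weight must be in [0, 1]")
    p_visual = softmax(visual_logits)
    p_audio = softmax(audio_logits)
    return [
        visual_weight * v + (1.0 - visual_weight) * a
        for v, a in zip(p_visual, p_audio)
    ]


def classify(visual_logits, audio_logits, labels, visual_weight=0.5):
    """Return the label with the highest fused probability, and that probability."""
    probs = late_fusion(visual_logits, audio_logits, visual_weight)
    best = max(range(len(probs)), key=probs.__getitem__)
    return labels[best], probs[best]


# Made-up scores for a single clip, as if produced by two separate models.
labels = ["cooking", "sports", "music"]
visual_scores = [2.0, 1.0, 0.1]
audio_scores = [0.5, 0.2, 3.0]

label, confidence = classify(visual_scores, audio_scores, labels, visual_weight=0.5)
print(label, round(confidence, 3))
```

Averaging probabilities rather than raw scores keeps the two modalities on a comparable scale, which is one reason late fusion is a common baseline before trying learned fusion layers.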
Papers
GOCA: Guided Online Cluster Assignment for Self-Supervised Video Representation Learning
Huseyin Coskun, Alireza Zareian, Joshua L. Moore, Federico Tombari, Chen Wang
Temporal and cross-modal attention for audio-visual zero-shot learning
Otniel-Bogdan Mercea, Thomas Hummel, A. Sophia Koepke, Zeynep Akata