Few-Shot Video Classification

Few-shot video classification focuses on accurately categorizing short video clips when labeled data is scarce (few-shot or zero-shot learning). Current research emphasizes adapting pre-trained vision-language models such as CLIP through techniques like spatial-temporal attention mechanisms and prototype modulation, improving feature extraction and classification accuracy. This capability is crucial for applications such as activity recognition in assistive robotics and healthcare, where collecting large labeled datasets is often impractical, and advances here are driving progress toward efficient and robust video understanding.
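The prototype-based approach mentioned above can be sketched as follows. This is a minimal illustration, not any specific paper's method: it assumes per-frame features have already been extracted (e.g. by a frozen CLIP image encoder), pools them into a video embedding by temporal mean pooling (a simplification; published work often uses spatial-temporal attention instead), averages support embeddings into class prototypes, and classifies a query by cosine similarity. All function names are illustrative.

```python
import numpy as np

def video_embedding(frame_features):
    """Mean-pool per-frame features into one L2-normalized video embedding.
    (Temporal mean pooling is a simplifying assumption; real systems may
    use learned spatial-temporal attention here.)"""
    v = np.asarray(frame_features, dtype=np.float64).mean(axis=0)
    return v / np.linalg.norm(v)

def build_prototypes(support_videos, support_labels):
    """Average the support-set embeddings of each class into a single
    L2-normalized class prototype."""
    protos = {}
    for label in set(support_labels):
        embs = [video_embedding(v)
                for v, y in zip(support_videos, support_labels) if y == label]
        p = np.mean(embs, axis=0)
        protos[label] = p / np.linalg.norm(p)
    return protos

def classify(query_video, prototypes):
    """Assign the query to the class whose prototype has the highest
    cosine similarity (dot product of unit vectors)."""
    q = video_embedding(query_video)
    return max(prototypes, key=lambda label: float(q @ prototypes[label]))

if __name__ == "__main__":
    # Synthetic stand-in for CLIP frame features: 8 frames x 512 dims,
    # clustered around one direction per class.
    rng = np.random.default_rng(0)
    base_run, base_wave = rng.normal(size=512), rng.normal(size=512)
    make_clip = lambda base: base + 0.1 * rng.normal(size=(8, 512))

    protos = build_prototypes(
        [make_clip(base_run), make_clip(base_run),
         make_clip(base_wave), make_clip(base_wave)],
        ["run", "run", "wave", "wave"])
    print(classify(make_clip(base_run), protos))
```

With only two labeled clips per class, the query is matched to the nearest class prototype; zero-shot variants replace the visual prototypes with CLIP text embeddings of the class names.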

Papers