Self-Supervised Action

Self-supervised action recognition aims to learn robust representations of human actions from unlabeled video data, so that actions can be classified efficiently and accurately without manual annotation. Current research relies heavily on contrastive learning, often combined with multi-stream approaches (e.g., combining skeleton joint, motion, and bone information) and attention mechanisms that highlight salient features such as specific body parts or temporal dynamics. These advances improve the generalizability and robustness of action recognition models, particularly in challenging scenarios with partial or noisy data, and they find applications in healthcare monitoring and other areas that require automated action understanding.
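
To make the multi-stream and contrastive ideas concrete, here is a minimal sketch in plain Python. It is not taken from any specific paper. It assumes a clip is a nested list of frames, joints, and coordinates, and that the skeleton is described by a hypothetical `parents` list. The function names `motion_stream`, `bone_stream`, and `info_nce` are illustrative, and InfoNCE is used as one common choice of contrastive objective. The toy embeddings stand in for the output of a learned encoder, which is omitted.

```python
import math

def motion_stream(joints):
    """Frame-to-frame displacement of every joint; the last frame is zero-padded."""
    motion = []
    for t in range(len(joints) - 1):
        motion.append([[b - a for a, b in zip(p, q)]
                       for p, q in zip(joints[t], joints[t + 1])])
    motion.append([[0.0] * len(p) for p in joints[-1]])
    return motion

def bone_stream(joints, parents):
    """Vector from each joint's parent to the joint; a root (its own parent) gets zeros."""
    return [[[c - pc for c, pc in zip(frame[j], frame[parents[j]])]
             for j in range(len(frame))] for frame in joints]

def _normalize(v):
    # L2-normalize so that dot products below are cosine similarities.
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]

def info_nce(z_a, z_b, temperature=0.1):
    """InfoNCE: z_a[i] and z_b[i] are a positive pair; every other z_b[k] is a negative."""
    z_a = [_normalize(v) for v in z_a]
    z_b = [_normalize(v) for v in z_b]
    total = 0.0
    for i, a in enumerate(z_a):
        logits = [sum(x * y for x, y in zip(a, b)) / temperature for b in z_b]
        m = max(logits)  # subtract the max for a numerically stable log-sum-exp
        log_denominator = m + math.log(sum(math.exp(l - m) for l in logits))
        total += log_denominator - logits[i]
    return total / len(z_a)

# Toy clip (assumed data): 3 frames, 3 joints (root, spine, head), 2-D coordinates.
joints = [
    [[0.0, 0.0], [0.0, 1.0], [0.0, 2.0]],
    [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0]],
    [[2.0, 0.0], [2.0, 1.0], [2.5, 2.0]],
]
parents = [0, 0, 1]  # hypothetical skeleton: the root is its own parent
motion = motion_stream(joints)
bones = bone_stream(joints, parents)

# Toy embeddings of two clips as seen through two streams or augmentations.
z = [[1.0, 0.0], [0.0, 1.0]]
swapped = [[0.0, 1.0], [1.0, 0.0]]
aligned_loss = info_nce(z, z)           # positives agree, so the loss is near zero
mismatched_loss = info_nce(z, swapped)  # positives disagree, so the loss is large
```

In a real pipeline each stream would pass through a trained encoder before the loss is computed, and the loss would be minimized with gradient descent. The sketch only shows how the streams are derived and how the objective rewards agreement between matching views.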

Papers