Compositional Action Recognition

Compositional action recognition aims to enable machines to recognize an action as a combination of simpler components, such as sub-actions or object interactions, much as humans do. Current research focuses on models that generalize to action combinations not seen during training, often using techniques such as component-to-composition learning, multimodal knowledge distillation, and spatio-temporal interaction modeling with attention mechanisms and relational networks. Better generalization to unseen combinations makes action recognition systems more robust, which supports broader use in areas such as video understanding and human-computer interaction.
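To make "unseen action combinations" concrete, the sketch below builds a compositional train/test split. It assumes each action label is a (verb, object) pair, which is one common way to frame the problem but not the only one. The function name `compositional_split` and the toy vocabulary are hypothetical, chosen only for illustration.

```python
from itertools import product


def compositional_split(verbs, objects, held_out):
    """Split all (verb, object) pairs into seen and unseen compositions.

    Hypothetical helper: held-out pairs go to the test set, all other
    pairs go to the training set.
    """
    held_out = set(held_out)
    all_pairs = list(product(verbs, objects))
    train = [pair for pair in all_pairs if pair not in held_out]
    test = [pair for pair in all_pairs if pair in held_out]

    # Each component of a test pair must still occur in training;
    # only the combination itself is new.
    train_verbs = {verb for verb, _ in train}
    train_objects = {obj for _, obj in train}
    for verb, obj in test:
        if verb not in train_verbs or obj not in train_objects:
            raise ValueError(f"component of {(verb, obj)} never seen in training")
    return train, test


# Toy vocabulary (an assumption for this example, not from any dataset).
verbs = ["open", "close", "push"]
objects = ["door", "box", "drawer"]
held_out = [("open", "box"), ("push", "door")]

train_pairs, test_pairs = compositional_split(verbs, objects, held_out)
print("train:", train_pairs)
print("test: ", test_pairs)
```

A model evaluated on `test_pairs` has seen "open" and "box" separately during training, but never together, so its accuracy there reflects how well it composes known components rather than how well it memorizes whole labels.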

Papers