Human Action Recognition

Human action recognition (HAR) aims to automatically identify and classify human actions from visual data such as videos or images. Current research relies heavily on deep learning models, including convolutional neural networks (CNNs), recurrent neural networks (RNNs), and, increasingly, transformers. These models often combine multiple data modalities (e.g., RGB, depth, skeleton, and audio) and use techniques such as knowledge distillation and self-supervised learning to improve accuracy and efficiency. HAR has significant implications for robotics, healthcare, security, and assistive technologies, enabling applications such as autonomous robot control, activity monitoring, and violence detection.
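One common and simple way to combine modalities is late fusion. Each modality (for example, an RGB model and a skeleton model) produces its own class scores, and these scores are then merged. The sketch below is illustrative only and does not describe any specific paper's method. It assumes each per-modality model has already produced raw logits over the same set of action classes. The label set, the logit values, and the function names `softmax` and `late_fusion` are all hypothetical. The sketch converts each modality's logits to probabilities and takes a weighted average.

```python
import math
from typing import Dict, List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a single logit vector."""
    m = max(logits)  # subtract the max to avoid overflow in exp
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def late_fusion(modality_logits: Dict[str, Sequence[float]],
                weights: Dict[str, float]) -> List[float]:
    """Weighted average of per-modality class probabilities.

    Hypothetical helper: every modality must score the same classes
    in the same order. Modalities missing from `weights` get weight 1.0.
    """
    num_classes = len(next(iter(modality_logits.values())))
    fused = [0.0] * num_classes
    weight_sum = 0.0
    for name, logits in modality_logits.items():
        if len(logits) != num_classes:
            raise ValueError(f"modality {name!r} has the wrong number of classes")
        w = weights.get(name, 1.0)
        for i, p in enumerate(softmax(logits)):
            fused[i] += w * p
        weight_sum += w
    # Normalize so the fused scores again form a probability distribution
    return [f / weight_sum for f in fused]


# Hypothetical action labels and made-up logits, for illustration only
ACTIONS = ["walking", "running", "waving"]

if __name__ == "__main__":
    scores = {
        "rgb": [2.0, 1.0, 0.1],       # RGB model alone would pick "walking"
        "skeleton": [0.5, 2.5, 0.2],  # skeleton model strongly favors "running"
    }
    fused = late_fusion(scores, {"rgb": 1.0, "skeleton": 1.0})
    print(ACTIONS[fused.index(max(fused))])
```

In this example, the two modalities disagree. With equal weights, the skeleton model's more confident prediction dominates the fused result. Fusion methods described in the HAR literature are usually more elaborate, for example learned weights or feature-level fusion, but score averaging is a common baseline.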

Papers