Egocentric Action Recognition

Egocentric action recognition (EAR) aims to automatically recognize the actions a person performs as seen from their own first-person viewpoint, primarily using video from wearable cameras. Current research focuses on making EAR more robust and efficient by integrating additional data modalities (e.g., inertial measurement units, audio, object detection outputs) and by using transformer architectures and self-supervised learning to address challenges such as domain adaptation, data scarcity, and missing modalities. The field matters for its potential applications in assistive technologies, human-robot interaction, and the understanding of daily activities, and it drives the development of more accurate and adaptable computer vision systems.

Papers