Egocentric 3D Hand Pose Estimation

Egocentric 3D hand pose estimation aims to accurately recover the 3D positions of hand joints from a first-person perspective, primarily using RGB video data. Current research focuses on improving accuracy through techniques such as multi-view fusion, pseudo-depth generation from single RGB images, and advanced architectures like Vision Transformers (ViTs) and state-space models, often incorporating uncertainty estimation. This field is crucial for advancing human-computer interaction in virtual and augmented reality, robotics, and activity recognition, with ongoing efforts to create methods that remain robust and efficient across diverse camera setups and lighting conditions.
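To make the multi-view fusion with uncertainty estimation idea concrete, here is a minimal NumPy sketch (not any particular paper's method): each camera view predicts 3D joint positions plus a per-joint uncertainty, and the views are fused by inverse-variance weighting. The function name, array shapes, and the assumption that predictions are already in a common world frame are all illustrative choices, not from the source.

```python
import numpy as np

def fuse_views(preds, sigmas):
    """Fuse per-view 3D keypoint predictions by inverse-variance weighting.

    preds:  (V, J, 3) array -- per-view predicted 3D joint positions,
            assumed already transformed into a shared world frame.
    sigmas: (V, J) array -- per-view, per-joint uncertainty (std. dev.).
    Returns a (J, 3) array of fused joint positions.
    """
    w = 1.0 / (sigmas ** 2 + 1e-8)        # inverse-variance weights, (V, J)
    w = w / w.sum(axis=0, keepdims=True)  # normalize weights across views
    return (w[..., None] * preds).sum(axis=0)

# Toy example: 21 hand joints seen from two views with different noise levels.
rng = np.random.default_rng(0)
gt = rng.normal(size=(21, 3))                         # "true" joint positions
view_a = gt + rng.normal(scale=0.01, size=(21, 3))    # confident view
view_b = gt + rng.normal(scale=0.10, size=(21, 3))    # noisy view
fused = fuse_views(
    np.stack([view_a, view_b]),
    np.stack([np.full(21, 0.01), np.full(21, 0.10)]),
)
```

Because the confident view receives roughly 100x the weight of the noisy one, the fused estimate stays close to the accurate prediction while still incorporating all views; learned methods typically predict the uncertainties themselves rather than assuming them known.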

Papers