Pose Induced Video Transformer

Pose-induced video transformers use human pose information (2D and/or 3D keypoints) to improve video analysis tasks, chiefly action recognition and 6D pose estimation. Current research builds on transformer architectures, typically adding modules that fuse pose data with the RGB video stream. The aim is better accuracy and efficiency, particularly in difficult settings such as recognizing Activities of Daily Living (ADL) and estimating object pose from limited data. The approach is promising for applications that need a precise understanding of human movement and object location in video, such as robotics, augmented reality, and sign language translation.
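The summary says only that pose data is fused with RGB video inside a transformer. One common way to do this is cross-attention, where RGB patch tokens act as queries over pose keypoint tokens. The sketch below illustrates that idea in NumPy under stated assumptions: a single attention head, randomly initialised weights, and no layer norm, MLP, or positional encoding. The function names, tensor shapes, and the 17-joint layout are hypothetical and not taken from any particular paper.

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the max along the axis for numerical stability before exponentiating.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def embed_keypoints(keypoints, w_embed):
    """Hypothetical helper: project (T, J, C) keypoints (C=2 for 2D, 3 for 3D) to (T*J, d) tokens."""
    t, j, c = keypoints.shape
    return keypoints.reshape(t * j, c) @ w_embed

def pose_rgb_cross_attention(rgb_tokens, pose_tokens, w_q, w_k, w_v):
    """Hypothetical single-head cross-attention in which RGB tokens query pose tokens.

    rgb_tokens:  (num_rgb, d)  flattened spatio-temporal RGB patch embeddings
    pose_tokens: (num_pose, d) flattened per-frame keypoint embeddings
    w_q, w_k, w_v: (d, d) projection matrices (assumed, random in this sketch)
    """
    q = rgb_tokens @ w_q
    k = pose_tokens @ w_k
    v = pose_tokens @ w_v
    scores = q @ k.T / np.sqrt(q.shape[-1])  # (num_rgb, num_pose) scaled dot products
    attn = softmax(scores, axis=-1)          # each RGB token's weights over all pose tokens
    return rgb_tokens + attn @ v             # residual add keeps the original RGB features

# Toy usage with assumed sizes: frames, patches per frame, joints, embedding dim.
rng = np.random.default_rng(0)
T, N, J, d = 8, 16, 17, 32
rgb = rng.standard_normal((T * N, d))
kps = rng.standard_normal((T, J, 3))  # stand-in for 3D keypoints
w_embed = rng.standard_normal((3, d)) * 0.1
w_q, w_k, w_v = (rng.standard_normal((d, d)) / np.sqrt(d) for _ in range(3))
pose = embed_keypoints(kps, w_embed)
fused = pose_rgb_cross_attention(rgb, pose, w_q, w_k, w_v)
print(fused.shape)  # (128, 32)
```

The residual connection means the pose stream only adds information on top of the RGB features. This is one reasonable design among several; published models may instead concatenate pose and RGB tokens or use pose to guide token selection.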

Papers