Paper ID: 2202.02656

A survey of top-down approaches for human pose estimation

Thong Duy Nguyen, Milan Kresovic

Human pose estimation in two-dimensional images videos has been a hot topic in the computer vision problem recently due to its vast benefits and potential applications for improving human life, such as behaviors recognition, motion capture and augmented reality, training robots, and movement tracking. Many state-of-the-art methods implemented with Deep Learning have addressed several challenges and brought tremendous remarkable results in the field of human pose estimation. Approaches are classified into two kinds: the two-step framework (top-down approach) and the part-based framework (bottom-up approach). While the two-step framework first incorporates a person detector and then estimates the pose within each box independently, detecting all body parts in the image and associating parts belonging to distinct persons is conducted in the part-based framework. This paper aims to provide newcomers with an extensive review of deep learning methods-based 2D images for recognizing the pose of people, which only focuses on top-down approaches since 2016. The discussion through this paper presents significant detectors and estimators depending on mathematical background, the challenges and limitations, benchmark datasets, evaluation metrics, and comparison between methods.

Submitted: Feb 5, 2022