Paper ID: 2402.10077

Towards a large-scale fused and labeled dataset of human pose while interacting with robots in shared urban areas

E. Sherafat, B. Farooq

Over the last decade, Autonomous Delivery Robots (ADRs) have transformed conventional delivery methods in response to growing e-commerce demand. However, whether ADRs are ready to navigate safely among pedestrians in shared urban areas remains an open question. We contend that there are crucial research gaps in understanding their interactions with pedestrians in such environments. Human Pose Estimation is a vital stepping stone for various downstream applications, including pose prediction and socially aware robot path planning. Yet the absence of an enriched, pose-labeled dataset capturing human-robot interactions in shared urban areas hinders this objective. In this paper, we bridge this gap by repurposing, fusing, and labeling two datasets, MOT17 and NCLT, focused on pedestrian tracking and Simultaneous Localization and Mapping (SLAM), respectively. The resulting unique dataset represents thousands of real-world indoor and outdoor human-robot interaction scenarios. Leveraging YOLOv7, we obtained visual and numeric human-pose outputs and provided ground-truth poses through manual annotation. To overcome the distance bias of the traditional MPJPE metric, this study introduces a novel human pose estimation error metric, Mean Scaled Joint Error (MSJE), which incorporates bounding-box dimensions into the error computation. Findings demonstrate that YOLOv7 effectively estimates human pose in both datasets. However, it performs worse in specific scenarios, such as crowded indoor scenes with a focused light source, where MPJPE and MSJE are recorded as 10.89 and 25.3, respectively. In contrast, YOLOv7 performs better in single-person estimation (NCLT seq 2) and outdoor scenarios (MOT17 seq 1), achieving MSJE values of 5.29 and 3.38, respectively.
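The abstract describes MSJE only at a high level. As a rough illustration, the sketch below contrasts standard MPJPE with one plausible MSJE formulation, in which per-joint pixel errors are rescaled by the ratio of a fixed reference size to the subject's bounding-box diagonal, so distant (small) subjects are not unfairly favored. The function names, the diagonal as the scale factor, and the ref_size constant are assumptions for illustration, not the paper's exact definition.

```python
import numpy as np

def mpjpe(pred, gt):
    """Mean Per-Joint Position Error: mean Euclidean distance between
    predicted and ground-truth joints (arrays of shape [J, 2])."""
    return np.linalg.norm(pred - gt, axis=-1).mean()

def msje(pred, gt, bbox_w, bbox_h, ref_size=200.0):
    """Hypothetical Mean Scaled Joint Error: per-joint errors rescaled
    by ref_size / bounding-box diagonal to reduce distance bias.
    The paper's exact scaling may differ; ref_size is an assumed constant."""
    scale = ref_size / np.hypot(bbox_w, bbox_h)  # assumed scale factor
    return (np.linalg.norm(pred - gt, axis=-1) * scale).mean()

# Toy usage: a 2-joint pose inside a 60x160-pixel person bounding box.
gt = np.array([[100.0, 50.0], [110.0, 80.0]])
pred = gt + np.array([[3.0, -2.0], [1.0, 4.0]])
print(mpjpe(pred, gt))               # raw pixel error
print(msje(pred, gt, 60.0, 160.0))   # error normalized by box size
```

Under this assumed formulation, the same pixel error on a smaller bounding box yields a larger MSJE, which is consistent with the metric's stated goal of removing the distance bias of raw MPJPE.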

Submitted: Jan 28, 2024