Paper ID: 2211.07977

Deep Instance Segmentation and Visual Servoing to Play Jenga with a Cost-Effective Robotic System

Luca Marchionna, Giulio Pugliese, Mauro Martini, Simone Angarano, Francesco Salvetti, Marcello Chiaberge

The game of Jenga represents an inspiring benchmark for developing innovative manipulation solutions for complex tasks. Indeed, it has encouraged the study of novel robotics methods for successfully extracting blocks from the tower. A Jenga game round embeds many traits of complex industrial or surgical manipulation tasks: a single block extraction requires a multi-step strategy, the combination of visual and tactile data, and highly precise motion of the robotic arm. In this work, we propose a novel, cost-effective architecture for playing Jenga with e.DO, a 6-DOF anthropomorphic manipulator manufactured by Comau, a standard depth camera, and an inexpensive monodirectional force sensor. Our solution focuses on a vision-based control strategy that accurately aligns the end-effector with the desired block, enabling extraction by pushing. To this end, we train a deep learning instance segmentation model on a custom synthetic dataset to segment each piece of the Jenga tower, allowing visual tracking of the desired block's pose while the manipulator moves. We integrate the vision-based strategy with a 1D force sensor that detects whether the block can be safely removed by identifying a force threshold value. Our experiments show that this low-cost solution allows e.DO to precisely reach removable blocks and perform up to 14 consecutive extractions.
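The two mechanisms named in the abstract, aligning the end-effector with a segmented block and deciding via a 1D force threshold whether that block can be pushed out, can be illustrated with a minimal sketch. It assumes a binary instance mask for the target block, a simple proportional image-plane correction toward a target pixel, and a stream of force readings taken once per push increment. The names `mask_centroid`, `alignment_command`, `try_push` and the constants `FORCE_THRESHOLD_N`, `PUSH_STEP_MM`, `PUSH_DEPTH_MM` are hypothetical; the abstract does not report the actual controller, gains, or threshold value, and the mapping from pixel error to robot velocity (camera calibration) is omitted.

```python
from typing import Iterable, List, Tuple

# Hypothetical parameters: the paper's actual values are not given in the abstract.
FORCE_THRESHOLD_N = 1.5  # assumed force limit (newtons) above which a block is "stuck"
PUSH_STEP_MM = 1.0       # assumed push increment per force reading
PUSH_DEPTH_MM = 30.0     # assumed push depth needed to dislodge a block


def mask_centroid(mask: List[List[int]]) -> Tuple[float, float]:
    """Return the (u, v) pixel centroid of a binary instance mask."""
    count = 0
    sum_u = sum_v = 0.0
    for v, row in enumerate(mask):
        for u, value in enumerate(row):
            if value:
                count += 1
                sum_u += u
                sum_v += v
    if count == 0:
        raise ValueError("empty mask: target block not segmented")
    return sum_u / count, sum_v / count


def alignment_command(mask: List[List[int]], target_uv: Tuple[float, float],
                      gain: float = 0.5) -> Tuple[float, float]:
    """Proportional image-plane correction: gain * (target - block centroid).

    Converting this pixel correction into an end-effector velocity needs the
    camera-robot calibration, which this sketch does not model.
    """
    u, v = mask_centroid(mask)
    return gain * (target_uv[0] - u), gain * (target_uv[1] - v)


def try_push(force_readings: Iterable[float]) -> Tuple[bool, float]:
    """Push in small increments, aborting as soon as the force exceeds the threshold.

    Returns (extracted, depth_reached_mm).
    """
    depth = 0.0
    for force in force_readings:
        if force > FORCE_THRESHOLD_N:
            return False, depth  # block resists: treat as not safely removable
        depth += PUSH_STEP_MM
        if depth >= PUSH_DEPTH_MM:
            return True, depth   # pushed far enough without exceeding the threshold
    return False, depth          # readings ended before the push was completed
```

For example, `try_push([0.2, 0.3, 2.0])` stops after two increments because the third reading exceeds the assumed 1.5 N threshold, whereas thirty low readings complete the assumed 30 mm push. Checking the force before every increment keeps the tower safe: the arm never advances past the point where resistance was first detected.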

Submitted: Nov 15, 2022