Paper ID: 2407.13175
OVGNet: A Unified Visual-Linguistic Framework for Open-Vocabulary Robotic Grasping
Li Meng, Zhao Qi, Lyu Shuchang, Wang Chunlei, Ma Yujing, Cheng Guangliang, Yang Chenguang
Recognizing and grasping novel-category objects remains a crucial yet challenging problem in real-world robotic applications. Despite its significance, limited research has been conducted in this specific domain. To address this, we seamlessly propose a novel framework that integrates open-vocabulary learning into the domain of robotic grasping, empowering robots with the capability to adeptly handle novel objects. Our contributions are threefold. Firstly, we present a large-scale benchmark dataset specifically tailored for evaluating the performance of open-vocabulary grasping tasks. Secondly, we propose a unified visual-linguistic framework that serves as a guide for robots in successfully grasping both base and novel objects. Thirdly, we introduce two alignment modules designed to enhance visual-linguistic perception in the robotic grasping process. Extensive experiments validate the efficacy and utility of our approach. Notably, our framework achieves an average accuracy of 71.2\% and 64.4\% on base and novel categories in our new dataset, respectively.
Submitted: Jul 18, 2024