Priyanka Mandikal¹,² Kristen Grauman¹,²
¹UT Austin, ²Facebook AI Research
Accepted at CoRL 2021
Dexterous multi-fingered robotic hands have a formidable action space, yet their morphological similarity to the human hand holds immense potential to accelerate robot learning. We propose DexVIP, an approach to learn dexterous robotic grasping from human-object interactions present in in-the-wild YouTube videos. We do this by curating grasp images from human-object interaction videos and imposing a prior over the agent's hand pose when learning to grasp with deep reinforcement learning. A key advantage of our method is that the learned policy is able to leverage free-form in-the-wild visual data. As a result, it can easily scale to new objects, and it sidesteps the standard practice of collecting human demonstrations in a lab, which is a much more expensive and indirect way to capture human expertise. Through experiments on 27 objects with a 30-DoF simulated robot hand, we demonstrate that DexVIP compares favorably to existing approaches that lack a hand pose prior or rely on specialized tele-operation equipment to obtain human demonstrations, while also being faster to train.
In this work, we learn dexterous grasping by watching human-object interactions in YouTube how-to videos. Using hand poses extracted from a repository of curated human grasp images, we train a dexterous robotic agent to grasp objects in simulation. The key benefits include improved grasping performance and the ability to quickly scale the method to new objects.
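To make the idea of a hand pose prior concrete, the sketch below shows one way such a prior could be folded into a reinforcement-learning reward: a task reward for grasping and lifting, plus a shaping term that penalizes deviation of the robot hand's joint angles from a human grasp pose extracted from video. This is a minimal illustrative sketch, not the authors' released code; the function names, reward weights, and reward structure are assumptions for exposition only.

```python
# Hypothetical sketch of pose-prior reward shaping (illustrative only;
# not the DexVIP implementation). Weights and names are assumptions.
import numpy as np

def pose_prior_reward(agent_joint_angles, prior_joint_angles, weight=0.1):
    """Penalize deviation of the robot hand's joint configuration from a
    human grasp pose curated from video (the 'hand pose prior')."""
    deviation = np.linalg.norm(agent_joint_angles - prior_joint_angles)
    return -weight * deviation

def total_reward(grasp_success, lift_height,
                 agent_joint_angles, prior_joint_angles):
    """Combine a task reward (grasp and lift) with the pose-prior term."""
    task_reward = 10.0 * float(grasp_success) + 1.0 * lift_height
    return task_reward + pose_prior_reward(agent_joint_angles,
                                           prior_joint_angles)

# Example: a 30-DoF hand compared against one curated human grasp pose.
agent_pose = np.zeros(30)
prior_pose = np.random.uniform(-0.2, 0.2, size=30)
print(total_reward(grasp_success=True, lift_height=0.15,
                   agent_joint_angles=agent_pose,
                   prior_joint_angles=prior_pose))
```

In this kind of setup, the shaping term nudges the policy toward human-like hand configurations early in training, while the task reward still determines whether a grasp ultimately succeeds.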
If you find this work useful in your own research, please consider citing:
@inproceedings{mandikal2021dexvip,
  title     = {DexVIP: Learning Dexterous Grasping with Human Hand Pose Priors from Video},
  author    = {Mandikal, Priyanka and Grauman, Kristen},
  booktitle = {Conference on Robot Learning (CoRL)},
  year      = {2021}
}