Visual Imitation Learning
Visual imitation learning (VIL) aims to enable robots to learn complex tasks by observing human demonstrations in video form. Current research focuses on improving sample efficiency and robustness through techniques such as waypoint extraction, active sensing to address viewpoint discrepancies, and keypoint-based representations for object-centric task understanding. These advances, often built on neural networks such as transformers and variational autoencoders, are yielding more data-efficient and generalizable robot learning, with applications ranging from manipulation to autonomous driving, and mark a significant step toward more adaptable and versatile robots.
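At its core, much of this work builds on behavioral cloning from visual observations: a policy network encodes camera frames and regresses the demonstrated actions. The sketch below illustrates that basic pattern in PyTorch; it is a minimal, generic example, not the method of any paper listed here, and the network sizes, `VisualPolicy` class, and placeholder data are illustrative assumptions.

```python
import torch
import torch.nn as nn

# Illustrative dimensions (assumptions): RGB frames and a low-dimensional action space.
IMG_CHANNELS, IMG_SIZE, ACTION_DIM = 3, 64, 7


class VisualPolicy(nn.Module):
    """Maps a single camera frame to an action (plain behavioral cloning)."""

    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(IMG_CHANNELS, 32, kernel_size=4, stride=2), nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=4, stride=2), nn.ReLU(),
            nn.Conv2d(64, 64, kernel_size=3, stride=2), nn.ReLU(),
            nn.Flatten(),
        )
        # Infer the flattened feature size from a dummy forward pass.
        with torch.no_grad():
            feat_dim = self.encoder(
                torch.zeros(1, IMG_CHANNELS, IMG_SIZE, IMG_SIZE)
            ).shape[1]
        self.head = nn.Sequential(
            nn.Linear(feat_dim, 256), nn.ReLU(), nn.Linear(256, ACTION_DIM)
        )

    def forward(self, frames):
        return self.head(self.encoder(frames))


def train_step(policy, optimizer, frames, expert_actions):
    """One behavioral-cloning update: regress the demonstrated action."""
    pred = policy(frames)
    loss = nn.functional.mse_loss(pred, expert_actions)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()


if __name__ == "__main__":
    policy = VisualPolicy()
    optimizer = torch.optim.Adam(policy.parameters(), lr=1e-4)
    # Random placeholder batch standing in for (frame, action) pairs
    # extracted from video demonstrations.
    frames = torch.randn(8, IMG_CHANNELS, IMG_SIZE, IMG_SIZE)
    actions = torch.randn(8, ACTION_DIM)
    print(train_step(policy, optimizer, frames, actions))
```

The techniques surveyed above (waypoints, keypoints, goal conditioning) can be seen as replacing the raw-frame encoder or the action target in this loop with more structured representations.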
Papers
Bi-KVIL: Keypoints-based Visual Imitation Learning of Bimanual Manipulation Tasks
Jianfeng Gao, Xiaoshu Jin, Franziska Krebs, Noémie Jaquier, Tamim Asfour
RT-Sketch: Goal-Conditioned Imitation Learning from Hand-Drawn Sketches
Priya Sundaresan, Quan Vuong, Jiayuan Gu, Peng Xu, Ted Xiao, Sean Kirmani, Tianhe Yu, Michael Stark, Ajinkya Jain, Karol Hausman, Dorsa Sadigh, Jeannette Bohg, Stefan Schaal