Paper ID: 2410.08751 • Published Oct 11, 2024
Zero-Shot Offline Imitation Learning via Optimal Transport
Thomas Rupf, Marco Bagatella, Nico Gürtler, Jonas Frey, Georg Martius
Zero-shot imitation learning algorithms hold the promise of reproducing
unseen behavior from as little as a single demonstration at test time. Existing
practical approaches view the expert demonstration as a sequence of goals,
enabling imitation with a high-level goal selector and a low-level
goal-conditioned policy. However, this framework can suffer from myopic
behavior: the agent's immediate actions towards achieving individual goals may
undermine long-term objectives. We introduce a novel method that mitigates this
issue by directly optimizing the occupancy matching objective that is intrinsic
to imitation learning. We propose to lift a goal-conditioned value function to
a distance between occupancies, which are in turn approximated via a learned
world model. The resulting method can learn from offline, suboptimal data, and
is capable of non-myopic, zero-shot imitation, as we demonstrate in complex,
continuous benchmarks.
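
To make the idea of lifting a goal-conditioned distance to a distance between occupancies more concrete, the following is a minimal sketch and not the authors' implementation. It assumes that occupancies are approximated by weighted sets of states (one from a rollout, such as a world model might produce, and one from the demonstration) and that they are compared with entropy-regularized optimal transport computed by Sinkhorn iterations. The names `goal_distance`, `sinkhorn_cost`, and `occupancy_distance` are hypothetical, and the Euclidean distance is only a stand-in for a cost derived from a learned goal-conditioned value function.

```python
import numpy as np


def goal_distance(states, goals):
    # Hypothetical stand-in: a real system would query a learned
    # goal-conditioned value function here. Euclidean distance is an assumption.
    return np.linalg.norm(states - goals, axis=-1)


def sinkhorn_cost(cost, a, b, reg=0.1, n_iters=200):
    # Entropy-regularized optimal transport between histograms a and b.
    # Returns the transport cost under the (approximately) optimal plan.
    kernel = np.exp(-cost / reg)
    u = np.ones_like(a)
    for _ in range(n_iters):
        v = b / (kernel.T @ u)
        u = a / (kernel @ v)
    plan = u[:, None] * kernel * v[None, :]
    return float(np.sum(plan * cost))


def occupancy_distance(rollout, demo, gamma=0.99, reg=0.1):
    # Discounted weights over rollout states approximate the policy occupancy
    # (assumption); demonstration states are weighted uniformly (assumption).
    weights = gamma ** np.arange(len(rollout))
    a = weights / weights.sum()
    b = np.full(len(demo), 1.0 / len(demo))
    # Pairwise cost matrix: "distance" from each rollout state to each demo state.
    cost = goal_distance(rollout[:, None, :], demo[None, :, :])
    return sinkhorn_cost(cost, a, b, reg=reg)


if __name__ == "__main__":
    demo = np.stack([np.linspace(0.0, 1.0, 5), np.linspace(0.0, 1.0, 5)], axis=1)
    on_track = demo.copy()
    off_track = demo + np.array([1.0, 0.0])
    print(occupancy_distance(on_track, demo, gamma=1.0))
    print(occupancy_distance(off_track, demo, gamma=1.0))
```

In this sketch, a planner could score candidate action sequences by the occupancy distance of their predicted rollouts and prefer the lowest one. Because whole state distributions are compared rather than one goal at a time, the score reflects the full trajectory, which is the non-myopic property the abstract describes.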