Policy Distillation

Policy distillation in reinforcement learning transfers knowledge from a complex, often computationally expensive "teacher" policy to a simpler, more efficient "student" policy. Current research focuses on improving sample efficiency, enhancing robustness to imperfect teacher policies, and achieving interpretability by distilling into models such as decision trees, gradient boosting machines, or neuro-fuzzy systems. The technique is proving valuable across diverse applications, including robotics (manipulation, locomotion, grasping), finance (portfolio management), and healthcare (drug dosing), by enabling faster training, reduced computational cost, and improved explainability of learned behaviors.
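The core idea can be sketched in a few lines. In a minimal sketch (all values hypothetical, not from any particular paper), a teacher's softmax policy over a few actions is matched by a student trained with gradient descent on the KL divergence KL(teacher || student); for softmax policies, the gradient of this loss with respect to the student's logits is simply the difference between the two action distributions.

```python
import math

def softmax(logits):
    # Numerically stable softmax over a list of logits.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def kl(p, q):
    # KL divergence KL(p || q) between two discrete distributions.
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

# Hypothetical teacher: fixed action preferences over 3 actions in one state.
teacher_logits = [2.0, 0.5, -1.0]
teacher_probs = softmax(teacher_logits)

# Student starts uniform and is trained to imitate the teacher by descending
# KL(teacher || student). For a softmax student the gradient with respect to
# the student's logits is (student_probs - teacher_probs).
student_logits = [0.0, 0.0, 0.0]
lr = 0.5
for _ in range(2000):
    student_probs = softmax(student_logits)
    grad = [s - t for s, t in zip(student_probs, teacher_probs)]
    student_logits = [l - lr * g for l, g in zip(student_logits, grad)]

student_probs = softmax(student_logits)
print("final KL(teacher || student):", kl(teacher_probs, student_probs))
```

In practice the same loss is averaged over many states sampled from the teacher's (or student's) trajectories, and a temperature is often applied to the teacher's logits to soften the targets, but the per-state update is exactly this distribution-matching step.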

Papers