Paper ID: 2503.13414 • Published Mar 17, 2025
Reward Adaptation Via Q-Manipulation
Kevin Vora, Yu Zhang
In this paper, we propose a new solution to reward adaptation (RA), the
problem where the learning agent adapts to a target reward function based on
one or multiple existing behaviors learned a priori under the same domain
dynamics but different reward functions. Learning the target behavior from
scratch is possible but often inefficient given the available source behaviors.
Our work represents a new approach to RA via the manipulation of Q-functions.
Assuming that the target reward function is a known function of the source
reward functions, our approach to RA computes bounds on the target Q-function. We
introduce an iterative process to tighten the bounds, similar to value
iteration. This enables action pruning in the target domain before learning
even starts. We refer to such a method as Q-Manipulation (Q-M). We formally
prove that our pruning strategy does not affect the optimality of the returned
policy and empirically show that it improves sample complexity. Q-M is
evaluated in a variety of synthetic and simulation domains to demonstrate its
effectiveness, generalizability, and practicality.
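The sketch below illustrates the mechanism described in the abstract under simplifying assumptions: a small tabular MDP with known dynamics, two source Q-functions learned under rewards r1 and r2, and a target reward r_t = r1 + r2 (a known function of the source rewards). The bound initialization, tightening rule, and pruning test here are illustrative; the paper's exact derivations may differ.

```python
import numpy as np

# Illustrative sketch of Q-Manipulation (Q-M): bound the target Q-function
# using source behaviors, tighten the bounds iteratively (value-iteration
# style), then prune actions that cannot be optimal before learning starts.
# All names and the specific bound construction below are assumptions made
# for this example, not the paper's exact algorithm.

rng = np.random.default_rng(0)
S, A, GAMMA = 5, 4, 0.9

# Random dynamics P[s, a, s'] and source rewards r1[s, a], r2[s, a].
P = rng.dirichlet(np.ones(S), size=(S, A))
r1 = rng.uniform(0, 1, size=(S, A))
r2 = rng.uniform(0, 1, size=(S, A))
r_t = r1 + r2  # target reward: a known combination of the source rewards


def value_iteration_q(P, r, gamma=GAMMA, n_iters=500):
    """Standard value iteration on Q(s, a); stands in for the source
    behaviors that reward adaptation assumes were learned a priori."""
    q = np.zeros_like(r)
    for _ in range(n_iters):
        q = r + gamma * np.einsum('sat,t->sa', P, q.max(axis=1))
    return q


Q1 = value_iteration_q(P, r1)
Q2 = value_iteration_q(P, r2)

# Initial bounds on the target Q-function.
# Upper bound: when r_t = r1 + r2, Q*_t(s,a) <= Q*_1(s,a) + Q*_2(s,a),
# because any single policy's return under r1 + r2 splits into its returns
# under r1 and r2, each dominated by the corresponding optimal Q.
q_upper = Q1 + Q2
# Lower bound: a trivial pessimistic bound from the worst one-step reward.
q_lower = np.full((S, A), r_t.min() / (1.0 - GAMMA))

# Iteratively tighten both bounds with value-iteration-style backups
# ("an iterative process to tighten the bounds, similar to value iteration").
for _ in range(200):
    v_up = q_upper.max(axis=1)
    v_lo = q_lower.max(axis=1)
    q_upper = np.minimum(q_upper, r_t + GAMMA * np.einsum('sat,t->sa', P, v_up))
    q_lower = np.maximum(q_lower, r_t + GAMMA * np.einsum('sat,t->sa', P, v_lo))

# Prune before learning: action a is removed in state s if its upper bound
# falls below the best lower bound in s, so it cannot be optimal there.
keep = q_upper >= q_lower.max(axis=1, keepdims=True)
print("surviving actions per state:", keep.sum(axis=1))
```

The pruning test preserves optimality in this construction: if an action's upper bound is below the best lower bound in a state, its true optimal Q-value is below the optimal state value, so removing it cannot change the optimal policy. The surviving action sets can then restrict exploration for whatever learner is run in the target domain.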