Policy Regularization
Policy regularization in reinforcement learning aims to improve the stability, robustness, and efficiency of learned policies by constraining their behavior during training. Current research focuses on developing novel regularization techniques, often leveraging diffusion models, variational autoencoders, or Q-function estimates to guide policy learning and mitigate issues like out-of-distribution actions and catastrophic forgetting in continual learning settings. These advancements are significant because they enhance the reliability and applicability of reinforcement learning agents across diverse domains, from robotics and autonomous driving to personalized recommendations and human-AI collaboration.
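To make the core idea concrete, here is a minimal sketch of one common form of policy regularization: adding a behavior-cloning penalty to an off-policy actor objective so the learned policy stays close to actions seen in the data, in the spirit of TD3+BC-style offline RL. The critic interface, the dataset batch, and the alpha weight are assumptions made for this illustration, not details drawn from the papers listed below.

import torch
import torch.nn as nn

class PolicyNet(nn.Module):
    # Small MLP actor producing actions in [-1, 1].
    def __init__(self, obs_dim, act_dim, hidden=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, act_dim), nn.Tanh(),
        )

    def forward(self, obs):
        return self.net(obs)

def regularized_policy_loss(policy, critic, obs, dataset_actions, alpha=2.5):
    # Behavior-regularized actor loss (TD3+BC-style sketch):
    # maximize the critic's value of the policy's action while penalizing
    # deviation from dataset actions, discouraging out-of-distribution actions.
    # `critic(obs, actions)` is assumed to return Q-value estimates.
    pi_actions = policy(obs)
    q_values = critic(obs, pi_actions)
    # Normalize the Q term so the penalty's relative weight does not depend
    # on the scale of returns in the dataset.
    lam = alpha / q_values.abs().mean().detach()
    bc_penalty = ((pi_actions - dataset_actions) ** 2).mean()
    return -(lam * q_values.mean()) + bc_penalty

The alpha hyperparameter trades off exploitation of the learned Q-function against staying near the behavior data; larger values weight the critic more heavily and weaken the regularization. Other approaches mentioned above replace the squared-error penalty with a KL term against a learned behavior model (e.g., a variational autoencoder or diffusion model over dataset actions).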
Papers
State Regularized Policy Optimization on Data with Dynamics Shift
Zhenghai Xue, Qingpeng Cai, Shuchang Liu, Dong Zheng, Peng Jiang, Kun Gai, Bo An
COPR: Consistency-Oriented Pre-Ranking for Online Advertising
Zhishan Zhao, Jingyue Gao, Yu Zhang, Shuguang Han, Siyuan Lou, Xiang-Rong Sheng, Zhe Wang, Han Zhu, Yuning Jiang, Jian Xu, Bo Zheng