Reward Feedback

Reward feedback, crucial for training intelligent agents, is a central focus in reinforcement learning research, aiming to optimize how agents learn from feedback signals to improve performance. Current research emphasizes efficient learning from various feedback types, including noisy preferences, delayed or composite rewards, and even indirect feedback derived from mutual information maximization, employing algorithms like EXP3 variants and posterior sampling methods within bandit and Markov Decision Process frameworks. These advancements are improving the robustness and efficiency of reinforcement learning in diverse applications, from robotics and personalized advertising to generative AI model fine-tuning.

Papers

December 27, 2024

ReNeg: Learning Negative Embedding with Reward Guidance
Xiaomin Li, Yixuan Liu, Takashi Isobe, Xu Jia, Qinpeng Cui, Dong Zhou, Dong Li, You He, Huchuan Lu, Zhongdao Wang, Emad Barsoum
Text to Image Text to Video Non Negative Reward Feedback Negative Preference

June 13, 2024

Online Bandit Learning with Offline Preference Data
Akhil Agnihotri, Rahul Jain, Deepak Ramachandran, Zheng Wen
Reinforcement Learning Preference Feedback Bandit Algorithm Offline Preference Bandit Learning Reward Feedback Noisy Preference

December 14, 2023

LiFT: Unsupervised Reinforcement Learning with Foundation Models as Teachers
Taewook Nam, Juyong Lee, Jesse Zhang, Sung Ju Hwang, Joseph J. Lim, Karl Pertsch
Vision Language Model Foundation Model Good Teacher Language Conditioned Skill Discovery Unsupervised Reinforcement Learning Floor Lift Reward Feedback Non Cognitive Skill

November 28, 2023

Feedback RoI Features Improve Aerial Object Detection
Botao Ren, Botian Xu, Tengyu Liu, Jingyi Wang, Zhidong Deng
Detection Model Aerial Object Detection Reward Feedback

November 4, 2023

The equivalence of dynamic and strategic stability under regularized learning in games
Victor Boone, Panayotis Mertikopoulos
LeArning Abstract Nash Equilibrium Video Game Non Cooperative Game Reward Feedback

October 17, 2023

Adversarial Bandits with Multi-User Delayed Feedback: Theory and Application
Yandi Li, Jianxiong Guo, Yupeng Li, Tian Wang, Weijia Jia
Application Proficiency Multi Armed Bandit Theoretical Understanding Sublinear Regret Delayed Feedback Adversarial Bandit Reward Feedback

May 4, 2023

Reinforcement Learning with Delayed, Composite, and Partially Anonymous Reward
Washim Uddin Mondal, Vaneet Aggarwal
Reinforcement Learning Efficient Hybrid Near Optimal Policy Significant Delay Change Reward Feedback Reward Delay

March 23, 2023

Stochastic Submodular Bandits with Delayed Composite Anonymous Bandit Feedback
Mohammad Pedramfar, Vaneet Aggarwal
Regret Bound Submodular Maximization Reward Feedback Delayed Bandit Submodular Bandit Full Bandit

June 2, 2022

Incrementality Bidding via Reinforcement Learning under Mixed and Delayed Rewards
Ashwinkumar Badanidiyuru, Zhe Feng, Tianxi Li, Haifeng Xu
Reinforcement Learning Automated Conversion Episodic Markov Decision Process Novel Reinforcement Learning Mixed Speech Incremental Model Reward Feedback Reward Delay

May 24, 2022

First Contact: Unsupervised Human-Machine Co-Adaptation via Mutual Information Maximization
Siddharth Reddy, Sergey Levine, Anca D. Dragan
Reinforcement Learning Mutual Information Mutual Information Maximization Human Machine Interface Reward Feedback

February 9, 2022

Reinforcement Learning with Sparse Rewards using Guidance from Offline Demonstration
Desik Rengarajan, Gargi Vaidya, Akshay Sarvesh, Dileep Kalathil, Srinivas Shakkottai
Reinforcement Learning Sparse Reward Efficient Reinforcement Learning Human Guidance Optimal Behavior Real World Reinforcement Learning Reward Feedback

December 6, 2021

Nonstochastic Bandits with Composite Anonymous Feedback
Nicolò Cesa-Bianchi, Tommaso Cesari, Roberto Colomboni, Claudio Gentile, Yishay Mansour
Planning Loss Reward Feedback