Reward Report

Research collected under the Reward Report topic centers on efficiently learning reward functions to guide reinforcement learning (RL) agents, particularly in complex domains such as large language models (LLMs) and robotics. Current work aims to make reward models more accurate and efficient through techniques such as active learning, inserting parameters into existing model architectures, and using vision-language models (VLMs) to generate dense reward functions. This research matters for extending RL to safety-critical applications and for aligning AI systems more closely with human preferences, with the goal of making those systems more robust and beneficial.
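Aligning a reward function with human preferences is often framed as fitting a reward model to pairwise comparisons. The snippet below is a minimal sketch of that idea and is not taken from any paper listed here. It assumes a linear reward over hand-made feature vectors and the Bradley-Terry preference model; the names `reward`, `fit_reward`, `true_w`, and the synthetic comparison data are all hypothetical.

```python
import math
import random

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def reward(w, x):
    """Hypothetical linear reward r(x) = w . x over a feature vector x."""
    return sum(wi * xi for wi, xi in zip(w, x))

def fit_reward(prefs, dim, lr=0.1, epochs=200, seed=0):
    """Fit w by SGD on the Bradley-Terry negative log-likelihood.

    Each item in prefs is a (preferred, rejected) pair of feature vectors, and
    P(preferred beats rejected) = sigmoid(r(preferred) - r(rejected)).
    """
    rng = random.Random(seed)
    w = [0.0] * dim
    data = list(prefs)
    for _ in range(epochs):
        rng.shuffle(data)
        for a, b in data:
            p = sigmoid(reward(w, a) - reward(w, b))
            # Gradient of -log p with respect to w is -(1 - p) * (a - b).
            for i in range(dim):
                w[i] += lr * (1.0 - p) * (a[i] - b[i])
    return w

# Synthetic comparisons labeled by a hidden "human" reward (an assumption).
rng = random.Random(1)
true_w = [1.0, -0.5]
pairs = []
for _ in range(100):
    a = [rng.uniform(-1, 1), rng.uniform(-1, 1)]
    b = [rng.uniform(-1, 1), rng.uniform(-1, 1)]
    pairs.append((a, b) if reward(true_w, a) >= reward(true_w, b) else (b, a))

w = fit_reward(pairs, dim=2)
acc = sum(reward(w, a) > reward(w, b) for a, b in pairs) / len(pairs)
print(w, acc)
```

Because only reward differences enter the likelihood, the learned weights recover the direction of the hidden reward rather than its scale; real systems replace the linear model with a neural network over trajectories or model outputs.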

Papers

August 24, 2023

Not Only Rewards But Also Constraints: Applications on Legged Robot Locomotion
Yunho Kim, Hyunsik Oh, Jeonghyun Lee, Jinhyeok Choi, Gwanghyeon Ji, Moonkyu Jung, Donghoon Youm, Jemin Hwangbo
Financial Application Reward Report Neural Network Controller Legged Robot Locomotion Complex Robotic System Reward Engineering

July 13, 2023

Reward-Directed Conditional Diffusion: Provable Distribution Estimation and Reward Improvement
Hui Yuan, Kaixuan Huang, Chengzhuo Ni, Minshuo Chen, Mengdi Wang
Reward Function Conditional Diffusion Model Reward Report Conditional Diffusion Distribution Estimation

June 29, 2023

Would I have gotten that reward? Long-term credit assignment by counterfactual contribution analysis
Alexander Meulemans, Simon Schug, Seijin Kobayashi, Nathaniel Daw, Gregory Wayne
Reward Report Sample Efficient Reinforcement Learning Credit Assignment Counterfactual Analysis

June 27, 2023

CLIPA-v2: Scaling CLIP Training with 81.1% Zero-shot ImageNet Accuracy within a $10,000 Budget; An Extra $4,000 Unlocks 81.8% Accuracy
Xianhang Li, Zeyu Wang, Cihang Xie
Zero Shot Image Text Pair Reward Report CLIP Training

June 14, 2023

Language to Rewards for Robotic Skill Synthesis
Wenhao Yu, Nimrod Gileadi, Chuyuan Fu, Sean Kirmani, Kuang-Huei Lee, Montse Gonzalez Arenas, Hao-Tien Lewis Chiang, Tom Erez, Leonard Hasenclever, Jan Humplik, Brian Ichter, Ted Xiao, Peng Xu, Andy Zeng, Tingnan Zhang, Nicolas Heess, Dorsa Sadigh, Jie Tan, Yuval Tassa, Fei Xia
Large Language Model Human Language Robotic Task Reward Report Skill Learning Robotics Research Low Level

June 2, 2023

Fine-Grained Human Feedback Gives Better Rewards for Language Model Training
Zeqiu Wu, Yushi Hu, Weijia Shi, Nouha Dziri, Alane Suhr, Prithviraj Ammanabrolu, Noah A. Smith, Mari Ostendorf, Hannaneh Hajishirzi
Language Model Fine Grained Human Feedback Reward Model Reward Report

June 1, 2023

LIV: Language-Image Representations and Rewards for Robotic Control
Yecheng Jason Ma, William Liang, Vaidehi Som, Vikash Kumar, Amy Zhang, Osbert Bastani, Dinesh Jayaraman
Imitation Learning Vision Language Meaningful Representation Robot Control Reward Report Vision Language Representation Lidar Inertial Image Language

May 25, 2023

Beyond Reward: Offline Preference-guided Policy Optimization
Yachen Kang, Diyuan Shi, Jinxin Liu, Li He, Donglin Wang
Reinforcement Learning Policy Optimization Reward Report Action Free Offline Offline Preference Based Reinforcement Learning

May 23, 2023

Video Prediction Models as Rewards for Reinforcement Learning
Alejandro Escontrela, Ademi Adeniji, Wilson Yan, Ajay Jain, Xue Bin Peng, Ken Goldberg, Youngwoon Lee, Danijar Hafner, Pieter Abbeel
Reinforcement Learning Reward Report Video Prediction Reward Signal Reward Prediction

May 16, 2023

Balancing Risk and Reward: An Automated Phased Release Strategy
Yufan Li, Jialiang Mao, Iavor Bojinov
Posterior Inference High Quality Risk Description Reward Report Rare Event Adaptive Bayesian

April 11, 2023

BanditQ: Fair Bandits with Guaranteed Rewards
Abhishek Sinha
Multi Armed Bandit Optimal Regret Bandit Feedback Reward Report Adversarial Bandit Bandit Policy

April 6, 2023

Do the Rewards Justify the Means? Measuring Trade-Offs Between Rewards and Ethical Behavior in the MACHIAVELLI Benchmark
Alexander Pan, Jun Shern Chan, Andy Zou, Nathaniel Li, Steven Basart, Thomas Woodside, Jonathan Ng, Hanlin Zhang, Scott Emmons, Dan Hendrycks
New Benchmark Artificial Agent Reward Report Multiple Meaning Ethical Behavior Power Seeking Social Decision Making

March 22, 2023

Reinforcement Learning with Exogenous States and Rewards
George Trimponias, Thomas G. Dietterich
Reinforcement Learning Markov Decision Process State Space Reward Report MDP Model Exogenous Global Markov Process

February 20, 2023

Fantastic Rewards and How to Tame Them: A Case Study on Reward Learning for Task-oriented Dialogue Systems
Yihao Feng, Shentao Yang, Shujian Zhang, Jianguo Zhang, Caiming Xiong, Mingyuan Zhou, Huan Wang
Reinforcement Learning Case Study Reward Function Task Oriented Task Oriented Dialogue System Reward Report Reward Learning End to End Response Generation

February 9, 2023

Read and Reap the Rewards: Learning to Play Atari with the Help of Instruction Manuals
Yue Wu, Yewen Fan, Paul Pu Liang, Amos Azaria, Yuanzhi Li, Tom M. Mitchell
Reward Model Reward Report Read V Help Request Agent Performance Reward Structure Object Interaction Atari Game User Manual

January 26, 2023

Rewarded meta-pruning: Meta Learning with Rewards for Channel Pruning
Athul Shibu, Abhishek Kumar, Heechul Jung, Dong-Gyu Lee
Convolutional Neural Network Deep Learning Model Reward Report Channel Pruning Pruned Model Meta Pruning

January 18, 2023

Learning to Participate through Trading of Reward Shares
Michael Kölle, Tim Matheis, Philipp Altmann, Kyrill Schmid
Learning Abstract Artificial Intelligence Autonomous Agent Reward Report Financial Trading Social Dilemma General Sum Markov Game

January 2, 2023

Deep Reinforcement Learning for Asset Allocation: Reward Clipping
Jiwon Kim, Moon-Ju Kang, KangHun Lee, HyungJun Moon, Bo-Kwan Jeon
Reinforcement Learning Deep Reinforcement Learning Reinforcement Learning Algorithm Reward Report Portfolio Optimization Asset Allocation

December 21, 2022

Reward Bonuses with Gain Scheduling Inspired by Iterative Deepening Search
Taisuke Kobayashi
Sparse Reward Reward Report Task Specific Reward Iterative Neural Network Learning Search Automatic Gain Control

December 20, 2022

Settling the Reward Hypothesis
Michael Bowling, John D. Martin, David Abel, Will Dabney
Reward Report Scientific Hypothesis Balancing Efficiency