Paper ID: 2312.14292

Incorporating Human Flexibility through Reward Preferences in Human-AI Teaming

Siddhant Bhambri, Mudit Verma, Upasana Biswas, Anil Murthy, Subbarao Kambhampati

Preference-based Reinforcement Learning (PbRL) has made significant strides in single-agent settings, but remains unstudied in multi-agent frameworks. Meanwhile, modeling cooperation between multiple agents, specifically in Human-AI Teaming settings, while ensuring successful task completion is a challenging problem. To this end, we perform the first investigation of multi-agent PbRL by extending single-agent PbRL to two-agent teaming settings, formulating it as a Human-AI PbRL Cooperation Game in which the RL agent queries the human-in-the-loop to elicit the task objective and the human's preferences over the joint team behavior. Under this game formulation, we first introduce the notion of Human Flexibility to evaluate team performance based on whether the human prefers to follow a fixed policy or to adapt to the RL agent on the fly. Second, we study the RL agent's varying degrees of access to the human policy. We highlight a special case along these two dimensions, which we call Specified Orchestration, where the human is least flexible and the agent has complete access to the human policy. We motivate the need to account for Human Flexibility, and the usefulness of Specified Orchestration, through a gamified user study. We evaluate state-of-the-art PbRL algorithms for Human-AI cooperative setups on robot locomotion domains that explicitly require forced cooperation. Our findings highlight the challenges PbRL faces as Human Flexibility and the agent's access to the human policy vary. Finally, we draw insights from our user study and empirical results, and conclude that Specified Orchestration can be seen as an upper bound on PbRL performance for future research in Human-AI teaming scenarios.

Submitted: Dec 21, 2023