Learning from Human Reward Benefits from Socio-competitive Feedback
Guangliang Li, Hayley Hung, Shimon Whiteson and W. Bradley Knox
Abstract
Learning from rewards generated by a human trainer observing an agent in action has proven to be a powerful method by which non-experts in autonomous agents can teach such agents to perform challenging tasks. Since the efficacy of this approach depends critically on the reward the trainer provides, we consider how the interaction between the trainer and the agent should be designed so as to increase the efficiency of the training process. This paper investigates the influence of the agent's socio-competitive feedback on the human trainer's training behavior and the agent's learning. The results of our user study with 85 subjects suggest that the agent's socio-competitive feedback substantially increases the amount of time trainers spend training, the amount of feedback they provide, and the agents' resulting performance. Moreover, making this feedback active further increases the amount of time trainers spend training but does not further improve agent performance. Our analysis suggests that this may be because some trainers train a more complex behavior in the agent, one that is appropriate for a different performance metric that is sometimes associated with the target task.