
Multi-Agent Hierarchical Reinforcement Learning with Dynamic Termination

Dongge Han, Wendelin Boehmer, Michael Wooldridge and Alex Rogers

Abstract

In a multi-agent system, an agent's optimal policy will typically depend on the policies of other agents. Predicting the behaviours of others, and responding promptly to changes in such behaviours, is therefore a key issue in multi-agent systems research. One obvious possibility is for each agent to broadcast its current intention, for example, the currently executed option in a hierarchical RL framework. However, this approach results in inflexible agents when options have an extended duration. While adjusting the executed option at each step improves flexibility from a single-agent perspective, frequent changes in options can induce inconsistency between an agent's actual behaviour and its broadcast intention. In order to balance flexibility and predictability, we propose a dynamic termination Bellman equation that allows agents to flexibly terminate their options.
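The abstract stops short of stating the equation itself. As a minimal sketch only (the option-value function Q(s, ω), option set Ω, and fixed switching cost δ are notational assumptions here, not taken from the paper), a dynamic termination operator of this kind can be written as

    \tilde{Q}(s_t, \omega_{t-1}) \;=\; \max\Bigl\{\, Q(s_t, \omega_{t-1}),\; \max_{\omega \in \Omega} Q(s_t, \omega) - \delta \,\Bigr\}

Read: at every step the agent either continues its current option ω_{t-1}, or pays the penalty δ to terminate it and adopt the greedy option. Switching thus remains possible at any step, but is discouraged unless it is clearly beneficial, keeping actual behaviour close to the broadcast intention.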

Address
Richland, SC
Book Title
Proceedings of the 18th International Conference on Autonomous Agents and MultiAgent Systems
Year
2019