Bayesian Reinforcement Learning: Robustness and Safe Training
Supervisor
Suitable for
Abstract
In this project we shall build on recent work on ``Safe Learning'' [2], which adapts classical RL algorithms to synthesise
policies that accomplish complex tasks or objectives whilst training safely (that is, without violating given safety requirements).
Tasks/objectives for RL-based synthesis can be goals expressed as logical formulae, and can thus be richer than standard
reward-based goals.
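
To illustrate how a logical goal can drive learning, consider the following minimal sketch (our own toy example, not the construction of [2], which handles full LTL via limit-deterministic Buechi automata): the objective ``eventually reach the goal and always avoid the hazard'' is monitored by a three-state automaton, composed with a small gridworld, and tabular Q-learning is run on the product. All names, cell positions and constants below are invented for illustration.

import numpy as np

SIZE = 4
GOAL, HAZARD = (3, 3), (1, 1)            # hypothetical cell labels
MOVES = [(-1, 0), (1, 0), (0, -1), (0, 1)]

def dfa_step(q, cell):
    """Monitor for 'F goal & G not hazard'.
    q=0: in progress, q=1: accepted, q=2: rejected (1 and 2 absorbing)."""
    if q != 0:
        return q
    if cell == HAZARD:
        return 2
    return 1 if cell == GOAL else 0

def idx(cell):
    return cell[0] * SIZE + cell[1]

Q = np.zeros((SIZE * SIZE, 3, len(MOVES)))   # Q-table over the product space
alpha, gamma, eps = 0.5, 0.95, 0.2
rng = np.random.default_rng(0)

for _ in range(5000):
    cell, q = (0, 0), 0
    for _ in range(50):
        s = idx(cell)
        a = rng.integers(len(MOVES)) if rng.random() < eps else int(Q[s, q].argmax())
        nxt = (min(max(cell[0] + MOVES[a][0], 0), SIZE - 1),
               min(max(cell[1] + MOVES[a][1], 0), SIZE - 1))
        q2 = dfa_step(q, nxt)
        r = 1.0 if (q == 0 and q2 == 1) else 0.0  # reward only on acceptance
        done = q2 != 0
        target = r + (0.0 if done else gamma * Q[idx(nxt), q2].max())
        Q[s, q, a] += alpha * (target - Q[s, q, a])
        cell, q = nxt, q2
        if done:
            break

print(Q[idx((0, 0)), 0])  # greedy action values from the start state

Since reward is issued only when the monitor accepts, the learned policy satisfies the formula itself rather than a hand-tuned surrogate reward.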
We plan to frame recent work by OXCAV [2] in the context of Bayesian RL, and to leverage modern robustness results,
as in [3]. We shall pursue both model-based and model-free approaches.
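
For the model-based direction, a natural Bayesian starting point is posterior sampling for RL (PSRL). The sketch below is our own illustration of that standard template, not the project's eventual algorithm: the five-state MDP, uniform Dirichlet priors and all constants are invented. A Dirichlet posterior is maintained over the transition kernel; each episode a model is sampled, solved by value iteration, and its greedy policy is executed.

import numpy as np

nS, nA, gamma, H = 5, 2, 0.9, 20
rng = np.random.default_rng(1)

# Ground-truth environment (unknown to the agent): random sparse dynamics,
# reward 1 for any action taken in the last state.
P_true = rng.dirichlet(np.ones(nS) * 0.3, size=(nS, nA))
R = np.zeros((nS, nA)); R[nS - 1, :] = 1.0

counts = np.ones((nS, nA, nS))  # Dirichlet(1,...,1) prior over transitions

def value_iteration(P, R, iters=100):
    V = np.zeros(nS)
    for _ in range(iters):
        V = (R + gamma * P @ V).max(axis=1)    # Bellman optimality backup
    return (R + gamma * P @ V).argmax(axis=1)  # greedy policy

for episode in range(200):
    # Thompson step: sample one plausible model from the posterior
    P_sample = np.array([[rng.dirichlet(counts[s, a]) for a in range(nA)]
                         for s in range(nS)])
    pi = value_iteration(P_sample, R)
    s = 0
    for _ in range(H):
        a = pi[s]
        s2 = rng.choice(nS, p=P_true[s, a])
        counts[s, a, s2] += 1.0                # Bayesian posterior update
        s = s2

print(value_iteration(counts / counts.sum(-1, keepdims=True), R))

Sampling a whole model per episode gives directed exploration for free; combining this posterior with the logical-constraint machinery above, and with the robustness perspective of [3], is where the project's novelty would lie.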
[2] M. Hasanbeig, A. Abate and D. Kroening, ``Cautious Reinforcement Learning with Logical Constraints,'' AAMAS 2020, pp. 483-491, 2020.
[3] B. Recht, ``A Tour of Reinforcement Learning: The View from Continuous Control,'' Annual Review of Control, Robotics, and Autonomous Systems, Vol. 2, pp. 253-279, 2019.