Reinforcement Learning under Non-Markovian Safety Specifications and Rewards Expressed in Linear Temporal Logics on Finite Traces
Supervisor
Suitable for
Abstract
In some cases, the agent has a simulator of the environment instead of a formal specification, so it needs to learn its strategies to achieve its task in the environment. Sometimes even the task is only implicitly specified through rewards. The key issue is that the properties we are often interested in are non-Markovian (e.g., specified in LTL or LTLf), and hence we need to introduce non-Markovian characteristics in decision processes and reinforcement learning. A particularly promising direction is when such non-Markovian characteristics can be expressed in Pure Past Linear Temporal Logics.
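To illustrate why such rewards are non-Markovian, and why a pure-past formulation is convenient, here is a minimal Python sketch. It assumes a hypothetical reward formula "goal holds now and key held at some point in the past" (goal AND Once(key)); the label names `goal` and `key` and the helpers `OnceMonitor` and `trace_rewards` are illustrative, not taken from the cited papers. The truth value of a pure-past formula can be updated step by step from the previous value, so one bit of memory is enough here.

```python
from dataclasses import dataclass


@dataclass
class OnceMonitor:
    """Incrementally evaluates the pure-past formula  goal & Once(key)."""

    seen_key: bool = False  # truth value of Once(key) on the trace so far

    def step(self, labels: frozenset) -> bool:
        # Once(key) holds now iff key holds now or Once(key) held one step ago.
        self.seen_key = self.seen_key or ("key" in labels)
        return ("goal" in labels) and self.seen_key


def trace_rewards(trace, reward=1.0):
    """Return the reward emitted at each step of a trace of label sets."""
    monitor = OnceMonitor()
    return [reward if monitor.step(frozenset(labels)) else 0.0 for labels in trace]


# The same observation {"goal"} is rewarded only after "key" has been seen.
rewards = trace_rewards([{"goal"}, {"key"}, set(), {"goal"}])
```

The reward depends on the history, not just on the current labels. Pairing the environment state with the monitor bit `seen_key` gives an extended state over which the reward is Markovian again, which is the idea that automata-based constructions for LTLf rewards generalise.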
Having taken the Foundation of Self-Programming and Computer-Aided Formal Verification is not required but will be of help. Under suitable assumptions, devise and implement reinforcement learning techniques that remain safe with respect to a safety specification and achieve rewards specified in linear temporal logics on finite traces.
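As a rough illustration of what "remaining safe while learning" can look like, the following Python sketch combines tabular Q-learning with a simple action filter in the spirit of shielding. Everything in it is an assumption made for the example: a hypothetical five-cell corridor, a safety specification "always (hazard implies once key)", a reward for reaching cell 0 after collecting the key, and a known deterministic label model (`move`, `labels`) that the filter can query. It is a sketch under these assumptions, not the construction from the papers listed below.

```python
import random

N = 5               # hypothetical corridor with cells 0..4
ACTIONS = (-1, +1)  # step left or right


def labels(pos):
    """Cell 0 is labelled hazard, cell N-1 holds the key (assumed layout)."""
    out = set()
    if pos == 0:
        out.add("hazard")
    if pos == N - 1:
        out.add("key")
    return out


def move(pos, action):
    """Assumed deterministic dynamics, clamped to the corridor."""
    return min(max(pos + action, 0), N - 1)


def monitor_step(seen_key, labs):
    """Update the monitor for: always (hazard -> once key). Returns (bit, safe)."""
    seen_key = seen_key or ("key" in labs)
    safe = ("hazard" not in labs) or seen_key
    return seen_key, safe


def safe_actions(pos, seen_key):
    """Keep only actions whose successor does not violate the specification."""
    return [a for a in ACTIONS if monitor_step(seen_key, labels(move(pos, a)))[1]]


def shielded_q_learning(episodes=200, steps=20, alpha=0.5, gamma=0.9, eps=0.2, seed=0):
    rng = random.Random(seed)
    q = {}          # Q-values over the extended state (pos, seen_key) and action
    violations = 0  # counts unsafe steps actually taken
    for _ in range(episodes):
        pos, seen = 2, False
        for _ in range(steps):
            allowed = safe_actions(pos, seen)
            if rng.random() < eps:
                a = rng.choice(allowed)
            else:
                a = max(allowed, key=lambda b: q.get((pos, seen, b), 0.0))
            nxt = move(pos, a)
            nseen, safe = monitor_step(seen, labels(nxt))
            violations += not safe
            r = 1.0 if (nxt == 0 and nseen) else 0.0
            best_next = max(q.get((nxt, nseen, b), 0.0) for b in safe_actions(nxt, nseen))
            old = q.get((pos, seen, a), 0.0)
            q[(pos, seen, a)] = old + alpha * (r + gamma * best_next - old)
            pos, seen = nxt, nseen
            if r > 0:
                break
    return q, violations


q_table, n_violations = shielded_q_learning()
```

Because exploration and greedy selection both draw only from `safe_actions`, no unsafe step is taken during training in this toy setting. The filter relies on the assumed label model; when only a simulator is available, obtaining or approximating such a model is part of what the "suitable assumptions" would have to cover.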
Reinforcement Learning, MDP, Non-Markovian Rewards, Non-Markovian Decision Processes, Linear Temporal Logics
Ronen I. Brafman, Giuseppe De Giacomo, Fabio Patrizi: LTLf/LDLf Non-Markovian Rewards. AAAI 2018: 1771-1778
Synthesis for LTL and LDL on Finite Traces. IJCAI 2015: 1558-1564
Giuseppe De Giacomo, Luca Iocchi, Marco Favorito, Fabio Patrizi: Foundations for Restraining Bolts: Reinforcement Learning with LTLf/LDLf Restraining Specifications. ICAPS 2019: 128-136
Mohammed Alshiekh, Roderick Bloem, Rüdiger Ehlers, Bettina Könighofer, Scott Niekum, Ufuk Topcu: Safe Reinforcement Learning via Shielding. AAAI 2018: 2669-2678
Giuseppe De Giacomo, Marco Favorito, Luca Iocchi, Fabio Patrizi: Imitation Learning over Heterogeneous Agents with Restraining Bolts. ICAPS 2020: 517-521
Giuseppe De Giacomo, Marco Favorito, Luca Iocchi, Fabio Patrizi, Alessandro Ronca: Temporal Logic Monitoring Rewards via Transducers. KR 2020: 860-870
Giuseppe De Giacomo, Antonio Di Stasio, Francesco Fuggitti, Sasha Rubin: Pure-Past Linear Temporal and Dynamic Logic on Finite Traces. IJCAI 2020: 4959-4965
Giuseppe De Giacomo, Moshe Y. Vardi: Linear Temporal Logic and Linear Dynamic Logic on Finite Traces. IJCAI 2013: 854-860