Reinforcement Learning under Non-Markovian Safety Specifications and Rewards Expressed in Linear Temporal Logics on Finite Traces

Supervisor

Suitable for

Mathematics and Computer Science, Part C

Abstract

In some settings, the agent has only a simulator of the environment rather than a formal specification of it, so it must learn a strategy for achieving its task in that environment. Sometimes even the task is specified only implicitly, through rewards. The key issue is that the properties we are typically interested in are non-Markovian (e.g., specified in LTL or LTLf), so non-Markovian characteristics must be introduced into decision processes and reinforcement learning.

A particularly promising direction is the case where such non-Markovian characteristics can be expressed in Pure-Past Linear Temporal Logic.
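One reason pure-past formulas are attractive is that they can be evaluated one step at a time, remembering only the truth values of the subformulas at the previous instant. The sketch below illustrates this for a small fragment (atoms, `not`, `and`, Yesterday `Y`, Once `O`, Since `S`). The tuple encoding of formulas and the names `PastMonitor` and `run` are assumptions made for this illustration and are not taken from the referenced papers.

```python
from typing import Dict, List, Optional, Set, Tuple

Formula = Tuple  # nested tuples, e.g. ("O", ("atom", "key")); hypothetical encoding


class PastMonitor:
    """Evaluates one pure-past formula incrementally along a finite trace."""

    def __init__(self, formula: Formula) -> None:
        self.formula = formula
        # Truth values of all subformulas at the previous instant (None at the start).
        self.prev: Optional[Dict[Formula, bool]] = None

    def step(self, props: Set[str]) -> bool:
        """Consume the set of propositions true now; return the formula's value now."""
        now: Dict[Formula, bool] = {}
        value = self._eval(self.formula, props, now)
        self.prev = now
        return value

    def _eval(self, f: Formula, props: Set[str], now: Dict[Formula, bool]) -> bool:
        if f in now:
            return now[f]
        op, prev = f[0], self.prev
        if op == "atom":
            val = f[1] in props
        elif op == "not":
            val = not self._eval(f[1], props, now)
        elif op == "and":
            # Evaluate both sides (no short-circuit) so every subformula is recorded.
            left = self._eval(f[1], props, now)
            right = self._eval(f[2], props, now)
            val = left and right
        elif op == "Y":  # Yesterday: the operand held at the previous instant
            self._eval(f[1], props, now)
            val = prev is not None and prev[f[1]]
        elif op == "O":  # Once: the operand holds now or held at some earlier instant
            sub = self._eval(f[1], props, now)
            val = sub or (prev is not None and prev[f])
        elif op == "S":  # left Since right
            left = self._eval(f[1], props, now)
            right = self._eval(f[2], props, now)
            val = right or (left and prev is not None and prev[f])
        else:
            raise ValueError(f"unknown operator: {op!r}")
        now[f] = val
        return val


def run(formula: Formula, trace: List[Set[str]]) -> List[bool]:
    """Value of the formula after each step of the trace."""
    monitor = PastMonitor(formula)
    return [monitor.step(props) for props in trace]


# "The goal holds now and the key was picked up at some point so far."
reward_condition = ("and", ("atom", "goal"), ("O", ("atom", "key")))
print(run(reward_condition, [set(), {"key"}, set(), {"goal"}]))
```

A reward attached to `reward_condition` depends on the history, not only on the current state, yet the monitor needs only one Boolean per subformula to track it.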

Having taken the Foundation of Self-Programming and Computer-Aided Formal Verification courses is not required, but will be of help.

Under suitable assumptions, devise and implement reinforcement learning techniques that remain safe with respect to a safety specification and achieve rewards specified in linear temporal logics on finite traces.
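To give a flavour of the kind of technique the goal above refers to, here is a minimal sketch, not a proposed solution. It runs tabular Q-learning over the product of the environment state and a hand-written task automaton, so that the history-dependent reward becomes Markovian, and it restricts exploration to actions permitted by a shield. Everything here is an assumption made for illustration: the corridor environment, the names `move`, `task_dfa`, `shield`, `train` and `greedy_rollout`, and the one-step lookahead shield, which is a strong simplification of synthesising a shield from a safety automaton.

```python
import random

PIT, KEY, DOOR, START = 0, 1, 5, 2  # hypothetical corridor with cells 0..5
ACTIONS = (-1, +1)


def move(pos, action):
    """Deterministic simulator step, clamped to the corridor."""
    return min(max(pos + action, 0), DOOR)


def task_dfa(q, pos):
    """Hand-written DFA for 'reach the door after visiting the key': 0 -> 1 -> 2."""
    if q == 0 and pos == KEY:
        return 1
    if q == 1 and pos == DOOR:
        return 2  # accepting state
    return q


def shield(pos):
    """Actions whose successor respects the safety specification 'never enter the pit'."""
    return [a for a in ACTIONS if move(pos, a) != PIT]


def train(episodes=500, horizon=100, alpha=0.5, gamma=0.9, seed=0):
    rng = random.Random(seed)
    Q = {}  # keyed by ((pos, q), action); (pos, q) is the product state
    visited = {START}
    for _ in range(episodes):
        pos, q = START, 0
        for _ in range(horizon):
            a = rng.choice(shield(pos))  # explore uniformly among shielded actions
            nxt = move(pos, a)
            nq = task_dfa(q, nxt)
            visited.add(nxt)
            reward = 1.0 if nq == 2 else 0.0
            if nq == 2:
                best_next = 0.0  # episode ends on acceptance
            else:
                best_next = max(Q.get(((nxt, nq), b), 0.0) for b in shield(nxt))
            old = Q.get(((pos, q), a), 0.0)
            Q[((pos, q), a)] = old + alpha * (reward + gamma * best_next - old)
            pos, q = nxt, nq
            if q == 2:
                break
    return Q, visited


def greedy_rollout(Q, horizon=20):
    """Follow the learned greedy policy, still restricted to shielded actions."""
    pos, q, path = START, 0, [START]
    for _ in range(horizon):
        a = max(shield(pos), key=lambda b: Q.get(((pos, q), b), 0.0))
        pos = move(pos, a)
        q = task_dfa(q, pos)
        path.append(pos)
        if q == 2:
            break
    return path, q


Q, visited = train()
path, final_q = greedy_rollout(Q)
print(path, final_q)
```

Because the shield filters actions both during learning and at execution time, the pit is never entered, regardless of how far the Q-values have converged. In the project setting the automaton would instead be obtained from an LTLf or pure-past formula, and the shield from the safety specification.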

Reinforcement Learning, MDP, Non-Markovian Rewards, Non-Markovian Decision Processes, Linear Temporal Logics

Ronen I. Brafman, Giuseppe De Giacomo, Fabio Patrizi: LTLf/LDLf Non-Markovian Rewards. AAAI 2018: 1771-1778

Giuseppe De Giacomo, Moshe Y. Vardi: Synthesis for LTL and LDL on Finite Traces. IJCAI 2015: 1558-1564

Giuseppe De Giacomo, Luca Iocchi, Marco Favorito, Fabio Patrizi: Foundations for Restraining Bolts: Reinforcement Learning with LTLf/LDLf Restraining Specifications. ICAPS 2019: 128-136

Mohammed Alshiekh, Roderick Bloem, Rüdiger Ehlers, Bettina Könighofer, Scott Niekum, Ufuk Topcu: Safe Reinforcement Learning via Shielding. AAAI 2018: 2669-2678

Giuseppe De Giacomo, Marco Favorito, Luca Iocchi, Fabio Patrizi: Imitation Learning over Heterogeneous Agents with Restraining Bolts. ICAPS 2020: 517-521

Giuseppe De Giacomo, Marco Favorito, Luca Iocchi, Fabio Patrizi, Alessandro Ronca: Temporal Logic Monitoring Rewards via Transducers. KR 2020: 860-870

Giuseppe De Giacomo, Antonio Di Stasio, Francesco Fuggitti, Sasha Rubin: Pure-Past Linear Temporal and Dynamic Logic on Finite Traces. IJCAI 2020: 4959-4965

Giuseppe De Giacomo, Moshe Y. Vardi: Linear Temporal Logic and Linear Dynamic Logic on Finite Traces. IJCAI 2013: 854-860
