Deep Reinforcement Learning for High-Dimensional POMDPs

Supervisors

Suitable for

MSc in Advanced Computer Science

Abstract

Background

Recent work has demonstrated that Deep Reinforcement Learning (DRL) algorithms can achieve human-level control policies across various applications. This project will focus on developing and testing DRL methods specifically for Partially Observable Markov Decision Processes (POMDPs), where the agent must make decisions in environments with limited and noisy observations. A key challenge is ensuring that the algorithm remains robust in environments with high-dimensional observations.
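To make the POMDP setting concrete, the following is a minimal sketch of the exact Bayesian belief update on the classic two-state tiger problem (the problem and its probabilities are standard textbook values, not part of this project description): the agent never observes the state directly, so it maintains a belief distribution and updates it from noisy observations.

```python
import numpy as np

def belief_update(b, T, O, obs):
    """Exact Bayesian filter: b'(s') ∝ O[s', obs] * sum_s T[s, s'] * b(s)."""
    predicted = b @ T               # predict step through the transition model
    unnorm = O[:, obs] * predicted  # correct step with the observation likelihood
    return unnorm / unnorm.sum()

# Two states: [tiger-left, tiger-right]. The "listen" action does not move
# the tiger, so its transition matrix is the identity.
T = np.eye(2)

# Two observations: [hear-left, hear-right]; listening is 85% accurate.
O = np.array([[0.85, 0.15],
              [0.15, 0.85]])

b = np.array([0.5, 0.5])           # start from a uniform belief
b = belief_update(b, T, O, obs=0)  # the agent hears the tiger on the left
print(b)                           # → [0.85 0.15]
```

In high-dimensional problems this exact update is intractable, which is why the techniques below learn compact latent representations of the belief instead.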

Focus

The student undertaking this project will first gain familiarity with POMDP definitions and relevant benchmark environments. The objective is to implement DRL-based POMDP algorithms that derive robust policies from high-dimensional observations. The student will explore the following techniques during the project:

• Dimensionality reduction using neural networks to compress high-dimensional observations into lower-dimensional latent representations;

• Attention mechanisms to focus on the most relevant parts of high-dimensional observations for decision-making, linking these to specific beliefs;

• Processing observations in a hierarchical structure at different resolutions to improve computational efficiency;

• Designing specific loss functions that incorporate reconstruction, contrastive, and belief consistency terms to learn compact, task-relevant representations.
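The first technique above can be sketched with a minimal linear autoencoder in plain NumPy (manual gradients; a real project would use a deep network in a framework such as PyTorch). All sizes and the learning rate are illustrative assumptions; the point is only that observations of dimension d are compressed to latents of dimension k by minimising reconstruction error.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic "high-dimensional observations": 256-dim vectors that actually
# lie near an 8-dim linear subspace plus a little noise.
n, d, k = 512, 256, 8
latent_true = rng.normal(size=(n, k))
mix = rng.normal(size=(k, d))
X = latent_true @ mix + 0.01 * rng.normal(size=(n, d))

# Linear autoencoder: z = X @ W_enc, X_hat = z @ W_dec.
W_enc = rng.normal(scale=d ** -0.5, size=(d, k))
W_dec = rng.normal(scale=k ** -0.5, size=(k, d))

lr = 0.05  # hypothetical learning rate
losses = []
for _ in range(300):
    Z = X @ W_enc                      # compress to k-dim latents
    X_hat = Z @ W_dec                  # reconstruct the observation
    G = 2.0 * (X_hat - X) / X.size     # dLoss/dX_hat for mean-squared error
    grad_dec = Z.T @ G                 # dLoss/dW_dec
    grad_enc = X.T @ (G @ W_dec.T)     # dLoss/dW_enc
    W_dec -= lr * grad_dec
    W_enc -= lr * grad_enc
    losses.append(np.mean((X @ W_enc @ W_dec - X) ** 2))

print(losses[0], losses[-1])  # reconstruction error should drop
```

The learned k-dimensional latents `Z`, rather than the raw observations, would then be fed to the policy or belief module.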
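The second technique, attention over observation parts conditioned on the belief, can be sketched as single-query scaled dot-product attention: a query derived from a belief summary vector scores a set of patch features, and the resulting weights say which parts of the observation matter for the current decision. All sizes and projection matrices here are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

# A high-dimensional observation split into P patch features of dim f each,
# and a belief summary vector of the same feature dimension.
P, f, dk = 16, 32, 8
patches = rng.normal(size=(P, f))
belief = rng.normal(size=f)

W_q = rng.normal(scale=f ** -0.5, size=(f, dk))  # query projection
W_k = rng.normal(scale=f ** -0.5, size=(f, dk))  # key projection
W_v = rng.normal(scale=f ** -0.5, size=(f, dk))  # value projection

q = belief @ W_q               # one query derived from the current belief
K = patches @ W_k              # one key per patch
V = patches @ W_v              # one value per patch

scores = K @ q / np.sqrt(dk)   # scaled dot-product scores
weights = softmax(scores)      # attention over patches, sums to 1
context = weights @ V          # belief-conditioned observation summary

print(weights.round(3), context.shape)
```

Because the weights depend on the belief-derived query, different beliefs attend to different parts of the same observation, which is the link to "specific beliefs" in the bullet above.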
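The hierarchical, multi-resolution idea can be sketched as an average-pooling pyramid: the agent can process a cheap coarse level first and consult finer levels only when needed. The 64×64 observation size and the number of levels are illustrative assumptions.

```python
import numpy as np

def avg_pool2(img):
    """Halve each spatial dimension by 2x2 average pooling."""
    h, w = img.shape
    return img.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

def pyramid(img, levels):
    """Coarse-to-fine stack: level 0 is the full-resolution observation."""
    out = [img]
    for _ in range(levels - 1):
        out.append(avg_pool2(out[-1]))
    return out

rng = np.random.default_rng(2)
obs = rng.normal(size=(64, 64))    # a high-dimensional image observation
levels = pyramid(obs, levels=4)    # 64x64, 32x32, 16x16, 8x8
print([l.shape for l in levels])
```

Processing the 8×8 level costs 64× fewer pixels than the full observation, which is where the computational saving comes from.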
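The last bullet can be illustrated by computing the three loss terms on stand-in tensors: a mean-squared reconstruction term, an InfoNCE-style contrastive term (one standard choice of contrastive loss; others exist), and a belief-consistency term that asks latents predicted by the belief model to match the encoded next observation. All tensors, dimensions, and loss weights below are hypothetical placeholders for encoder/decoder outputs.

```python
import numpy as np

rng = np.random.default_rng(3)

def reconstruction_loss(x, x_hat):
    return np.mean((x_hat - x) ** 2)

def info_nce(z, z_pos, tau=0.1):
    """Contrastive term: each latent should match its own positive view."""
    z = z / np.linalg.norm(z, axis=1, keepdims=True)
    z_pos = z_pos / np.linalg.norm(z_pos, axis=1, keepdims=True)
    logits = z @ z_pos.T / tau                 # similarity of every pair
    logits -= logits.max(axis=1, keepdims=True)
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_probs))        # positives on the diagonal

def belief_consistency(z_next_pred, z_next):
    """Belief-model predictions should match the encoded next observation."""
    return np.mean((z_next_pred - z_next) ** 2)

B, d, k = 8, 64, 16
x = rng.normal(size=(B, d))
x_hat = x + 0.1 * rng.normal(size=(B, d))    # stand-in decoder output
z = rng.normal(size=(B, k))
z_pos = z + 0.05 * rng.normal(size=(B, k))   # stand-in augmented views
z_next = rng.normal(size=(B, k))
z_next_pred = z_next + 0.1 * rng.normal(size=(B, k))

# Weighted sum; the weights 1.0 / 0.5 / 0.5 are hypothetical tuning knobs.
total = (1.0 * reconstruction_loss(x, x_hat)
         + 0.5 * info_nce(z, z_pos)
         + 0.5 * belief_consistency(z_next_pred, z_next))
print(total)
```

Balancing the three weights is itself a design question the project would need to investigate empirically.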

Method

References:

• Igl M., Zintgraf L., Le T. A., et al. Deep variational reinforcement learning for POMDPs. In: International Conference on Machine Learning (ICML), PMLR, 2018.

• Meng L., Gorbet R., Kulić D. Memory-based deep reinforcement learning for POMDPs. In: IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2021.

• Lauri M., Hsu D., Pajarinen J. Partially observable Markov decision processes in robotics: A survey. IEEE Transactions on Robotics, 2022.