Student Projects

Undergraduate students in the third year of the Final Honour School of Computer Science may undertake a project. Fourth-year students of the Final Honour School of Computer Science, and students for the MSc in Computer Science, are required to undertake a project. Mathematics & Computer Science undergraduates are required to undertake a Computer Science project or a Mathematics dissertation in their fourth year. Computer Science & Philosophy undergraduates may choose to undertake a Computer Science project or a Philosophy thesis in their fourth year.

This is your chance to work on your own project, something that interests and inspires you. We have put together a brief guidance document on how to do it; please click on the link below. On this site, you will also find a sortable and searchable list of projects and potential supervisors. Undergraduate students are welcome to choose one of the projects on the list, or to approach a potential supervisor and negotiate their own topic. MSc students should discuss their project with potential supervisors and obtain their agreement.

You will be expected to make arrangements with potential supervisors between weeks 5 and 7 in Hilary Term, for project work for the following year. Please don't contact potential supervisors outside this window.

Limited equipment is available for use in projects, namely LEAP equipment. If you have any requests, please contact the Head of Academic Administration.

Important deadlines:

All: Monday of Week 7, Hilary Term: your online project registration survey must be completed. You will be sent a link.
MSc students: Monday of Week 1, Trinity Term: submission deadline for your project proposal.
3rd/4th year students: 12pm on Monday of Week 4, Trinity Term: submission deadline.
MSc students: last Tuesday in August (25th August 2026): submission deadline.

Project writing handbook

Sample projects

3rd year

4th year

List of projects

Project	Supervisors	Themes	B	C	MSc	Description
Aggregation of Photovoltaic Panels	Alessandro Abate	Artificial Intelligence and Machine Learning, Automated Verification		C	MSc	The increased relevance of renewable energy sources has modified the behaviour of the electrical grid. Some renewable energy sources affect the network in a distributed manner: whilst each unit has little influence, a large population can have a significant impact on the global network, particularly in the case of synchronised behaviour. This work investigates the behaviour of a large, heterogeneous population of photovoltaic panels connected to the grid. We employ Markov models to represent the aggregated behaviour of the population, while the rest of the network (and its associated consumption) is modelled as a single equivalent generator, accounting for both inertia and frequency regulation. Analysis and simulations of the model show that it is a realistic abstraction, and quantitatively indicate that heterogeneity is necessary to enable the overall network to function in safe conditions and to avoid load shedding. This project will provide extensions of this recent research. In collaboration with an industrial partner. Prerequisites: Computer-Aided Formal Verification, Probabilistic Model Checking
Analysis and verification of stochastic hybrid systems	Alessandro Abate	Automated Verification		C	MSc	Stochastic Hybrid Systems (SHS) are dynamical models that are employed to characterize the probabilistic evolution of systems with interleaved and interacting continuous and discrete components. Formal analysis, verification, and optimal control of SHS models represent relevant goals because of their theoretical generality and for their applicability to a wealth of studies in the Sciences and in Engineering. In a number of practical instances the presence of a discrete number of continuously operating modes (e.g., in fault-tolerant industrial systems), the effect of uncertainty (e.g., in safety-critical air-traffic systems), or both occurrences (e.g., in models of biological entities) advocate the use of a mathematical framework, such as that of SHS, which is structurally predisposed to model such heterogeneous systems. In this project, we plan to investigate and develop new analysis and verification techniques (e.g., based on abstractions) that are directly applicable to general SHS models, while being computationally scalable. Courses: Computer-Aided Formal Verification, Probabilistic Model Checking, Probability and Computing, Automata Logic and Games Prerequisites: Familiarity with stochastic processes and formal verification
Automated verification of complex systems in the energy sector	Alessandro Abate	Artificial Intelligence and Machine Learning, Automated Verification		C	MSc	Smart microgrids are small-scale versions of centralized electricity systems, which locally generate, distribute, and regulate the flow of electricity to consumers. Among other advantages, microgrids have shown positive effects on the reliability of distribution networks. These systems present heterogeneity and complexity coming from 1. local and volatile renewables generation, and 2. the presence of nonlinear dynamics both over continuous and discrete variables. These factors call for the development of proper quantitative models. This framework provides the opportunity of employing formal methods to verify properties of the microgrid. The goal of the project is in particular to focus on the energy production via renewables, such as photovoltaic panels. The project can benefit from a paid visit/internship to industrial partners. Courses: Computer-Aided Formal Verification, Probabilistic Model Checking, Probability and Computing, Automata Logic and Games Prerequisites: Familiarity with stochastic processes and formal verification, whereas no specific knowledge of smart grids is needed.
Bayesian Reinforcement Learning: Robustness and Safe Training	Alessandro Abate	Artificial Intelligence and Machine Learning, Automated Verification		C		In this project we shall build on recent work on “Safe Learning” [2], which frames classical RL algorithms to synthesise policies that abide by complex tasks or objectives, whilst training safely (that is, without violating given safety requirements). Tasks/objectives for RL-based synthesis can be goals expressed as logical formulae, and thus be richer than standard reward-based goals. We plan to frame recent work by OXCAV [2] in the context of Bayesian RL, as well as to leverage modern robustness results, as in [3]. We shall pursue both model-based and model-free approaches. [2] M. Hasanbeig, A. Abate and D. Kroening, “Cautious Reinforcement Learning with Logical Constraints,” AAMAS20, pp. 483-491, 2020. [3] B. Recht, “A Tour of Reinforcement Learning: The View from Continuous Control,” Annual Reviews in Control, Vol. 2, 2019.
Formal verification of a software tool for physical and digital components	Alessandro Abate	Automated Verification			MSc	We are interested in working with existing commercial simulation software that is targeted around the modelling and analysis of physical, multi-domain systems. It further encompasses the integration with related software tools, as well as the interfacing with devices and the generation of code. We are interested in enriching the software with formal verification features, envisioning the extension of the tool towards capabilities that might enable the user to raise formal assertions or guarantees on model properties, or to synthesise correct-by-design architectures. Within this long-term plan, this project shall target the formal generation of fault warnings, namely of messages to the user that are related to “bad (dynamical) behaviours” or to unwanted “modelling errors”. The student will be engaged in developing algorithmic solutions towards this goal, while reframing them within a formal and general approach. The project is inter-disciplinary in dealing with hybrid models involving digital and physical quantities, and in connecting the use of formal verification techniques from the computer sciences with more classical analytical tools from control engineering. Courses: Computer-Aided Formal Verification, Software Verification. Prerequisites: Knowledge of basic formal verification.
Model learning and verification	Alessandro Abate	Artificial Intelligence and Machine Learning, Automated Verification		C	MSc	This project will explore connections of techniques from machine learning with successful approaches from formal verification. The project has two sides, a theoretical one and a more practical one: it will be up to the student to emphasise either of the two sides depending on their background and/or interests. The theoretical part will develop existing research, for instance in one of the following two inter-disciplinary domain pairs: learning & repair, or reachability analysis & Bayesian inference. On the other hand, a more practical project will apply the above theoretical connections to a simple model setup in the area of robotics and autonomy. Courses: Computer-Aided Formal Verification, Probabilistic Model Checking, Machine Learning
Precise simulations and analysis of aggregated probabilistic models	Alessandro Abate	Automated Verification		C	MSc	This project shall investigate a rich research line, recently pursued by a few within the Department of CS, looking at the development of quantitative abstractions of Markovian models. Abstractions come in the form of lumped, aggregated models, which are beneficial in being easier to simulate or to analyse. Key to the novelty of this work, the proposed abstractions are quantitative in that precise error bounds with the original model can be established. As such, whatever can be shown over the abstract model, can be as well formally discussed over the original one. This project, grounded on existing literature, will pursue (depending on the student's interests) extensions of this recent work, or its implementation as a software tool. Courses: Computer-Aided Formal Verification, Probabilistic Model Checking, Machine Learning
Reinforcement Learning for Space Operations	Alessandro Abate, Licio Romao	Artificial Intelligence and Machine Learning, Automated Verification		C		OXCAV has an ongoing collaboration with the European Space Agency (ESA) that involves applying Reinforcement Learning (RL) algorithms to a satellite called OPS-SAT, which was launched in 2019 and is a flying laboratory that allows ESA’s partners to test and validate new techniques (more information can be found at https://www.esa.int/Enabling_Support/Operations/OPS-SAT). This project aims at designing controllers that will be used to perform nadir pointing and sun-tracking of OPS-SAT, while meeting some specifications (e.g., admissible nadir pointing errors). The focus will be on data-driven methods that leverage data from the available sensors (gyroscopes, GPS, fine sun sensor, magnetometer) and actuators, using an RL architecture to come up with a safe policy that can yield adequate performance. The main tasks of the project will consist of (1) exploring an ESA platform called MUST to collect all the necessary data and (2) implementing an RL scheme that will later be deployed on the satellite. Throughout the project you will have the opportunity to work with state-of-the-art data-driven techniques that have been developed at OXCAV, under the supervision of Prof. Alessandro Abate and Dr. Licio Romao.
Relaxing Decisions under Uncertainty	Alessandro Abate, Giuseppe De Giacomo, Thom Badings, Francesco Fabiano	Automated Verification, Data, Knowledge and Action		C	MSc	Prerequisites: CAFV, PMC Background Motivation: Markov Decision Processes (MDPs) are a foundational framework for sequential decision-making in stochastic environments and play a central role in artificial intelligence and, particularly, reinforcement learning. However, a key limitation of MDPs is their reliance on precisely specified transition probabilities. In realistic settings, accurately estimating these probabilities is often difficult or infeasible. To overcome this issue, robust MDPs (RMDPs) extend standard MDPs by allowing uncertainty sets over transition probabilities rather than fixed values. The standard objective in an RMDP is to find an optimal robust policy, i.e., one that maximizes the expected return under the worst-case (adversarial) transition probabilities within the uncertainty set. While this adversarial assumption ensures robustness, it can also be overly conservative in practical applications where the environment does not actively oppose the agent. For example, consider an autonomous drone navigating through uncertain wind conditions: clearly, the wind might behave stochastically but not adversarially (the wind does not depend on the drone’s actions). Hence, optimizing only for the worst-case scenario can lead to unnecessarily cautious behaviour. This motivates our central research question: Can we compute the “best” policy, while taking into account that the environment might not act fully adversarially? Related Work: Several research directions have considered this question by exploring relaxations of decision-making under uncertainty. Notably, the notion of best-effort introduced by (Faella, 2009) offers a game-theoretic relaxation of strictly winning strategies.
These ideas have since been extended to reactive synthesis, where, in the absence of a winning strategy, best-effort strategies can be computed efficiently. Building on this, recent work (Aminof, Giacomo, Rubin, & Zuleger, 2023) has adapted these concepts to stochastic games, and most recently, to robust MDPs as a tie-breaking criterion for optimal robust policies (Abate, Badings, De Giacomo, & Fabiano, 2026). This project aims to extend this line of research by developing a more flexible and theoretically grounded notion of best-effort robustness for RMDPs. Additionally, related perspectives from fairness in decision processes (Wen, 2021), and policy permissiveness (Zhu & De Giacomo, 2022) might offer valuable insights. Part of the project will involve a thorough literature review to identify conceptual and methodological connections among these lines of research and our proposed approach. Focus This project explores how to design policies for decision-making with robust MDPs that are robust but not overly conservative. The main research question is: can we compute the “best” policy while accounting for the fact that the environment may not act fully adversarially? Expected contributions include one or more of the following: Define an ε-relaxed “best-effort optimal robustness” concept for RMDPs. Leverage best-effort to develop more efficient yet “robust enough” alternatives to standard methods, e.g., robust value iteration. Explore related concepts and propose new ways to balance robustness with expected environmental behavior. Method The project will begin with a literature review of the relevant works discussed above. Building on these foundations, we will develop and formalize a new notion of optimality, e.g., “ε-relaxed best-effort optimal robustness”, for RMDPs, examining its theoretical properties and relationship to existing approaches.
The final phase will focus on implementing and evaluating the proposed notions through numerical experiments on common robust MDP benchmarks. The expected outcomes include a proof-of-concept implementation, a clear theoretical framework, and, if results are promising, a publication-quality paper summarizing the findings. Bibliography Abate, A., Badings, T., De Giacomo, G., & Fabiano, F. (2026). Best-Effort Policies for Robust Markov Decision Processes. 40th AAAI Conference on Artificial Intelligence, (p. TBD). Retrieved from https://arxiv.org/abs/2508.07790 Aminof, B., De Giacomo, G., Rubin, S., & Zuleger, F. (2023). Stochastic Best-Effort Strategies for Borel Goals. LICS (pp. 1--13). IEEE. Faella, M. (2009). Admissible Strategies in Infinite Games over Graphs. MFCS -- Lecture Notes in Computer Science, 5734, 307--318. Retrieved from https://link.springer.com/chapter/10.1007/978-3-642-03816-7_27 Wen, M., et al. (2021). Algorithms for Fairness in Sequential Decision Making. 24th International Conference on Artificial Intelligence and Statistics (pp. 1144--1152). PMLR. Zhu, S., & De Giacomo, G. (2022). Synthesis of Maximally Permissive Strategies for LTLf Specifications. IJCAI, (pp. 2783--2789). Retrieved from https://doi.org/10.24963/ijcai.2022/386
Safe Reinforcement Learning	Alessandro Abate	Automated Verification		C	MSc	Reinforcement Learning (RL) is a known architecture for synthesising policies for Markov Decision Processes (MDP). We work on extending this paradigm to the synthesis of ‘safe policies’, or, more generally, of policies such that a linear time property is satisfied. We convert the property into an automaton, then construct a product MDP between the automaton and the original MDP. A reward function is then assigned to the states of the product automaton, according to accepting conditions of the automaton. With this reward function, RL synthesises a policy that satisfies the property: as such, the policy synthesis procedure is ‘constrained’ by the given specification. Additionally, we show that the RL procedure sets up an online value iteration method to calculate the maximum probability of satisfying the given property, at any given state of the MDP. We evaluate the performance of the algorithm on numerous numerical examples. This project will provide extensions of these novel and recent results. Prerequisites: Computer-Aided Formal Verification, Probabilistic Model Checking, Machine Learning
Safety verification for space dynamics via neural-based control barrier functions	Alessandro Abate	Automated Verification		C		Barrier functions are Lyapunov-like functions that serve as certificates for the safety verification of dynamical and control models. The OXCAV group has recently worked on the automated and sound synthesis of barrier functions structured as neural nets, with an approach that uses satisfiability modulo theories (SMT). In this project, we shall pursue two objectives: 1. Apply recent results [1] on sound and automated synthesis of barrier certificates on models that are pertinent to the Space domain, such as models for attitude dynamics. Airbus will support this part. 2. Develop new results that extend theory and algorithms to models encompassing uncertainty, such as probabilistic models or models that are adaptive to sensed data. Airbus will support this effort providing environments and models for experiments and testing. [1] A. Abate, D. Ahmed and A. Peruffo, “Automated Formal Synthesis of Neural Barrier Certificates for Dynamical Models,” TACAS, To appear, 2021.
Software development for abstractions of stochastic hybrid systems	Alessandro Abate	Automated Verification		C	MSc	Stochastic hybrid systems (SHS) are dynamical models for the interaction of continuous and discrete states. The probabilistic evolution of continuous and discrete parts of the system are coupled, which makes analysis and verification of such systems compelling. Among specifications of SHS, probabilistic invariance and reach-avoid have received quite some attention recently. Numerical methods have been developed to compute these two specifications. These methods are mainly based on the state space partitioning and abstraction of SHS by Markov chains, which are optimal in the sense of reduction in abstraction error with a minimum number of Markov states. The goal of the project is to combine the codes that have been developed for these methods. The student should also design a nice user interface (for the choice of dynamical equations, parameters, and methods, etc.). Courses: Probabilistic Model Checking, Probability and Computing, Numerical Solution of Differential Equations Prerequisites: Some familiarity with stochastic processes, working knowledge of MATLAB and C
Thinking Fast and Slow in AI	Alessandro Abate, Francesco Fabiano	Automated Verification		C	MSc	Prerequisites: basic AI/ML courses Background Motivation: When working to build machines that have a form of “intelligence”, it is natural to be inspired by humans. Of course, humans are very different from machines, in their embodiment and myriad other ways. Humans exploit their bodies to experience the world, create an internal model of it, and use this model to reason, learn, and make contextual and informed decisions. Machines lack the same embodiment but often have access to both more memory and more computing power. Despite these crucial disanalogies, it is still useful to leverage our knowledge of how the human mind reasons and makes decisions to design and build machines that demonstrate behaviours similar to those of a human. In this project, we aim to investigate a novel AI architecture, Slow and Fast AI (SOFAI), which is inspired by the Thinking, Fast and Slow cognitive theory of human decision-making. SOFAI is a multi-agent architecture that employs both “fast” and “slow” solvers underneath a metacognitive agent that is able to choose among solvers as well as reflect on, and learn from, past experience. Related Work: Cognitively inspired architectures are a widely studied area of research and have produced many applications and different architectural paradigms (Kotseruba and Tsotsos 2020). In this project, we examine a specific novel architecture introduced in (Booch, et al. 2021, Fabiano, et al. 2025), along with its underlying paradigms (Kahneman 2011, Graziano 2013). The architecture has been shown to be valuable in several use cases, such as automated planning and constraint-driven navigation (Fabiano, et al. 2025). Focus This project centres on the Thinking Fast and Slow paradigm and related dual-process theories, rather than on the SOFAI architecture itself.
We aim to investigate how System 1/System 2 interactions can support learning, modelling, and decision-making in new settings. Some research directions include: Use the Thinking Fast and Slow paradigm to explore novel problem settings, e.g., controller synthesis. Investigate methods to derive deductive rules from experience and use those rules to guide and constrain reasoning. Enrich the architecture with online learning so both systems can readily adapt from experience. Develop solver-agnostic ways to transfer information from System 1 to System 2. Explore non-crisp notions of correctness: “how can we evaluate solutions when a formal notion of correctness is absent?” Shift the role of System 1 and System 2 from the purely “solving” phase to the modelling phase. Method The project will begin with a literature review. Building on this, we will formalize ways to enrich the interaction between System 1 and System 2, for example by developing principles that guide S2 reasoning using S1 outputs, or by evaluating the architecture’s behaviour through relaxed notions of correctness. The later stages will focus on implementing and testing these ideas in controlled settings, such as synthesis or planning tasks, to assess how the proposed S1/S2 mechanisms improve modelling or decision quality. Expected outcomes include a proof-of-concept implementation, a clear conceptual and theoretical framework, and, if results are promising, a publication-ready paper detailing the findings. Bibliography Booch, Grady, Francesco Fabiano, Lior Horesh, Kiran Kate, Jonathan Lenchner, Nick Linck, Andrea Loreggia, et al. 2021. “Thinking Fast and Slow in AI.” AAAI Conference on Artificial Intelligence. 15042-15046. https://ojs.aaai.org/index.php/AAAI/article/view/17765. Fabiano, Francesco, Marianna B. Ganapini, Andrea Loreggia, Nicholas Mattei, Keerthiram Murugesan, Vishal Pallagani, Francesca Rossi, Biplav Srivastava, and K. Brent Venable. 2025.
“Thinking Fast and Slow in Human and Machine Intelligence.” Commun. ACM (68) 72–79. doi:10.1145/3715709. Graziano, Michael SA. 2013. Consciousness and the social brain. Oxford University Press. Kahneman, Daniel. 2011. Thinking, Fast and Slow. Macmillan. Kotseruba, Iuliia, and John K. Tsotsos. 2020. “40 years of cognitive architectures: core cognitive abilities and practical applications.” Artificial Intelligence Review. https://doi.org/10.1007/s10462-018-9646-y.
Tool for Data-driven Abstraction of Stochastic Dynamical Systems	Alessandro Abate, Thom Badings, Mahdi Nazeri	Automated Verification		C	MSc	Prerequisites: Python Programming Background Our group has published a series of papers providing a general framework for data-driven abstraction of stochastic dynamical systems [1], [2]. These works introduce methods for constructing Interval Markov Decision Process (IMDP) abstractions directly from data that allow formal verification and synthesis. Focus The project focuses on implementing and extending this framework to a scalable tool that uses Python and JAX to enable high-performance parallel computation. Method We will: · Implement the core abstraction framework described in [1], [2] using Python and JAX. · Ensure that the implementation supports efficient batching, vectorization, and parallel computation. · Evaluate the tool on at least a dozen stochastic dynamical systems. Reading: [1] Nazeri, Mahdi, et al. "Data-Driven Abstraction and Synthesis for Stochastic Systems with Unknown Dynamics." arXiv preprint arXiv:2508.15543 (2025). [2] Nazeri, Mahdi, et al. "Data-driven yet formal policy synthesis for stochastic nonlinear dynamical systems." arXiv preprint arXiv:2501.01191 (2025).
AI Vulnerability Modelling	Ioannis Agrafiotis, Louise Axon-Jones, Michael Goldsmith, Sadie Creese	Security	B	C	MSc	This project would seek to study AI systems to identify weak points that might make implementations open to compromise and attack, and to develop corresponding threat models. The approach might involve designing test suites using attack graphs and validation in a laboratory setting.
Augmented-Reality Personal Security Solutions	Ioannis Agrafiotis, Louise Axon-Jones, Michael Goldsmith, Sadie Creese	Security	B	C	MSc	This project seeks to explore the application of augmented-reality technologies to developing more usable personal-security solutions. There is a growing body of work exploring how interactive augmented-reality technologies can help users to maintain awareness of potential threats and thus protect themselves against cyber-attacks. The project will build on existing work to develop and validate the effectiveness of new augmented-reality tools.
Chatbot Attack and Vulnerability Models	Ioannis Agrafiotis, Louise Axon-Jones, Michael Goldsmith, Sadie Creese	Security	B	C	MSc	As chatbots are increasingly widely used in a growing range of applications, understanding the potential for cyber-attacks to compromise the integrity and confidentiality of the data they output is critical. This project would seek to study chatbot systems to identify weak points that might make implementations open to compromise and attack, and to develop corresponding threat models. The approach might involve designing test suites using attack graphs and validation in a laboratory setting.
Deep Learning Models to Support Computer Network Defence	Ioannis Agrafiotis, Louise Axon-Jones, Michael Goldsmith, Sadie Creese	Security	B	C	MSc	This project would explore the application of deep-learning models such as Large Language Models (LLM) to computer network defence. Deep learning has the potential to improve processes at various defensive stages including identification of assets and risk analysis; detection and mitigation of attacks; and automated response. The project scope is broad enough to allow the student to identify an area of interest.
Designing Cybersecurity Test Suites for Generative AI Systems	Ioannis Agrafiotis, Louise Axon-Jones, Michael Goldsmith, Sadie Creese	Security	B	C	MSc	As generative AI systems are increasingly widely used in a growing range of applications, understanding the potential for cyber-attacks to compromise the integrity and confidentiality of the data they output is critical. This project would aim to develop attack graphs for generative AI systems, and based on these to design cybersecurity test suites that facilitate testing the security of generative AI systems against a range of potential attacks. The project might involve using these test suites to test the security of a range of generative AI implementations in a laboratory setting.
Generating Realistic Cybersecurity Datasets and Testbeds	Ioannis Agrafiotis, Louise Axon-Jones, Michael Goldsmith, Sadie Creese	Security	B	C	MSc	A common challenge in the cybersecurity research community is the limited availability of cybersecurity data: while there are some real data captures made available publicly or for research purposes, challenges of confidentiality and practicality mean the resources are limited. A particular example is the availability of network data from various environments: the networks of organisations; Industrial Internet-of-Things (IIoT) environments; and Blockchain networks. This project seeks to develop high-quality synthetic cybersecurity datasets that would benefit the cybersecurity research community. The project would begin with background research to identify the characteristics of a realistic dataset. The approach might then involve creating a cybersecurity testbed composed of a synthetic network and set of attack test suites, and capturing traffic.
Insider threat detection	Ioannis Agrafiotis, Louise Axon-Jones, Michael Goldsmith, Sadie Creese	Security	B	C	MSc	Organisations are increasingly concerned with how to identify and defend against insider threats. Those who have authorised access to sensitive organisational data are placed in a position of power that could well be abused and could cause significant damage to an organisation. This could range from financial theft and intellectual property theft, through to the destruction of property and business reputation. The focus of this project is on the detection and identification of insider threats to organisations. The project builds on the group’s previous research in this area.
Low-Orbit Space Cybersecurity	Ioannis Agrafiotis, Louise Axon-Jones, Michael Goldsmith, Sadie Creese	Security	B	C	MSc	This project would seek to study low-orbit space systems to identify weak points that might make implementations open to compromise and attack, and to develop corresponding threat models. The approach might involve designing test suites using attack graphs and validation in a laboratory setting.
Operational Security Tools for Users with Limited Cybersecurity Knowledge	Ioannis Agrafiotis, Louise Axon-Jones, Michael Goldsmith, Sadie Creese	Security	B	C	MSc	As digitisation increases globally, there is a growing range of different types of user required to have an understanding of the operational cybersecurity of digital systems – i.e., the ability to detect and respond to potential cyber-attacks. This includes not only individual users of smartphones and computers, but also personnel in organisations from the growing range of sectors becoming digitised (e.g., transport, marine, energy, healthcare). New challenges are emerging in terms of how to enable parties to monitor the necessary information and make informed security decisions. This project seeks to identify the requirements of users in this area, and to develop operational cybersecurity tools tailored to these needs. The project scope is broad, and the student may select particular sectors of interest, or particular approaches of interest such as cybersecurity visualizations or automated incident-response tools.
Ransomware detection	Ioannis Agrafiotis, Louise Axon-Jones, Michael Goldsmith, Sadie Creese	Security	B	C	MSc	Ransomware attacks are increasing rapidly every year. While signature-based malware detection methods work well for detecting and stopping known malware, attackers can bypass the detection using obfuscation techniques or zero-day attacks. There is therefore a need for better detection mechanisms that are able to predict new forms of malware and react against them. This project aims at exploring malware detection to develop a better understanding of the differences that make malware different from normal processes or files. It will further seek to implement a machine learning (ML) tool that would help in detecting malicious behaviour efficiently, so that malware infection and propagation can be stopped. The ML classifiers used will depend on the malware family explored as well as data available.
Systemic-Risk Modelling	Ioannis Agrafiotis, Louise Axon-Jones, Michael Goldsmith, Sadie Creese	Security	B	C	MSc	A number of features of the digital environment create the potential for systemic risk, i.e., the possibility that a single event or development might trigger widespread failures and negative effects spanning multiple organisations, sectors or nations. These features include the interconnectivity of digital systems, homogeneity of devices meaning vulnerabilities may affect a large proportion of systems, and high dependency of organisations on service providers. This project would aim to develop models that allow analysis and quantification of the systemic risk that a population may face. The approaches used may include agent-based simulations and/or the development of harm trees.
Visualising Large Cybersecurity Datasets	Ioannis Agrafiotis, Louise Axon-Jones, Michael Goldsmith, Sadie Creese	Security	B	C	MSc	A key challenge for cybersecurity operations in organisations is the sheer scale of the data generated within organisations as they increasingly digitise their operations. The scale of data makes it challenging for personnel responsible for monitoring security to maintain a clear situational awareness of network activity, and to detect, identify and respond to potential attacks. This project focuses on identifying and implementing approaches to visualising large cybersecurity datasets in a way that is comprehensible and digestible for users. The project might include requirements gathering (e.g., through interviews) to identify users’ needs; implementation and validation of visualization approaches in a laboratory setting; or testing of the developed visualizations with end users to assess utility.
Sonification for detecting cyber-attacks	Louise Axon-Jones, Michael Goldsmith, Sadie Creese	Security	B	C	MSc	In the face of increasingly frequent, sophisticated and varied cyber-attacks, organisations must continuously adapt and improve their network defences. There is a need for effective tools to help security practitioners to engage with and understand the data communicated over the network, and the outputs of automated attack-detection methods. Visual (text-based and graphical) presentations of data are usually used for this. Over the last few years, the utility of sonification (the representation of data as sound) for network-security monitoring has been theorised and explored experimentally. This project seeks to build on prior research in our research group and externally, in which the effectiveness of sonification at representing a limited range of attack types has been experimented with. The aim of the project is to expand on this experimentation by assessing and comparing the effectiveness of various sonification designs at representing a wider range of attack types. This will involve identifying the key indicators present in network data for each attack type, exploring how the sonification design can best represent these indicators, and experimenting with the effectiveness of the resulting sonification approaches. Attack types experimented with could include, for example, various malwares and ransomwares, advanced persistent threats, and various types of flooding attack (this could use both real and synthetic attack datasets).
Threat Models for Blockchains	Louise Axon-Jones, Michael Goldsmith, Sadie Creese	Security	B	C	MSc	This project would seek to study Blockchain-based systems to identify weak points that might make implementations open to compromise and attack, and to develop corresponding threat models. The project might focus on the general form of Blockchains, or on particular implementations or consensus protocols. A number of different approaches might be considered, for example: formal modelling of protocols used to establish consensus or resolve conflicts; design of test suites using attack graphs and validation in a laboratory setting.
AIS Signals Cleaning and Normalisation	Michael Benedikt, Giorgio Orsi	Data, Knowledge and Action			MSc	This project falls within the field of geospatial data engineering, a subfield of Computer Science concerned with geospatial data management and analysis. The Automatic Identification System (AIS) is the primary mechanism used by the shipping industry to provide secure navigation over the oceans. The International Maritime Organisation (IMO) has been leading and enforcing the use of AIS for vessel monitoring and maritime traffic management. Satellite AIS signals are usually transmitted by transponders carried by the vessels and received by satellites carrying specialised equipment and then retransmitted to receiving ground stations. Often ground stations can receive AIS signals directly from the vessels if they are near shore. Regardless of the technology used, AIS signals suffer from a large amount of noise which can be roughly classified into: ● Data corruption: usually caused by interference due to equipment or atmospheric events ● Data errors: the information carried by an AIS signal is mostly human-inputted and therefore subject to human error ● Data obfuscation: the identity of a vessel (e.g., its MMSI) can be intentionally changed to hide the true identity of the vessel and its purpose (spoofing) ● Data gaps: AIS signals can be missing either because of operational signal loss or by intentionally disabling the AIS transponder for, e.g., safety reasons. AIS noise poses substantial challenges for the correct identification of the vessel, its nature, and the identification of its current position and route. Current approaches to the problem of AIS cleaning largely rely on batch clustering and analysis of AIS signals, i.e., the AIS signals are processed and corrected in bulk. Batch processing of AIS signals is no longer suitable for the dynamic, fast-moving requirements of trading applications which are emerging in the energy sector and in which Vortexa operates. 
Trajectory prediction algorithms rely on the implicit assumption that the input data is received in order (i.e., in terms of timestamps) and clean. AIS data is noisy and often received out of order, undermining the accuracy of existing trajectory prediction algorithms. On the other hand, data cleaning algorithms, especially those relying on clustering, assume the entire dataset to be cleaned is available beforehand. AIS data is naturally streaming and its geospatial nature limits the effectiveness of traditional off-the-shelf cleaning algorithms. At this time, all approaches to the AIS noise problem aim at providing general algorithms that are based only on geospatial information provided by the AIS signal, e.g., positions, vessel type, identity, largely ignoring other potentially useful information that is specific to the industries the vessels operate in, e.g., tankers vs dry-bulk cargoes. The aims of this project are as follows: 1) to attain an understanding of the patterns and phenomenology of AIS noise at a fundamental level. 2) to design algorithms with theoretical quality guarantees to tackle the AIS noise problem, building on top of established research such as TREAD (Pallotta et al., 2013), OPTICS (Ankerst et al., 1999), Spectral Clustering (Shi and Malik, 2000), and Adaptive HDBSCAN (Bai et al., 2023). 3) to demonstrate the effectiveness of these new noise cleaning algorithms in a real-world setting of oil, gas, and chemical tankers as well as supporting vessels (e.g., lightering and bunkering). 
The specific innovations we are bringing include: 1) Exploring the use of product information (i.e., the type of cargo carried by the vessel) to correctly classify the behaviour of the vessel 2) The use of commercial information about the vessel to determine how aggressive the noise cleaning algorithm needs to be (e.g., dark-fleet vessels typically require more aggressive cleaning) 3) The use of knowledge about the source and location of the AIS receivers and satellites to adapt the noise cleaning algorithm (e.g., genuine AIS signal gaps are more likely and frequent in South East Asia than in the Mediterranean Sea) 4) The study, identification, and classification of known patterns of AIS spoofing as a means to complement general-purpose algorithms with human-provided knowledge about spoofing behaviour. We believe this is the right time to investigate this problem in a more principled way as the number of satellite constellations capable of receiving AIS signals has dramatically increased in recent years. Moreover, the number of actors who actively attempt to undermine the effectiveness of AIS receivers is also increasing, especially in areas involved in conflicts where electronic warfare is predominant. Dark fleets from sanctioned countries such as Iran and Russia actively adopt AIS spoofing and interference technology to mask or disrupt the AIS network, making this research area especially important for the entire maritime industry. Skills and Experience Required: ● Driven by working in an intellectually engaging environment with the top minds in the industry, where constructive and friendly challenges and debates are encouraged, not avoided ● Strong foundation in software engineering and machine learning, with coursework in advanced machine learning, databases, or data science preferred. ● Proficiency in Python, especially in machine learning libraries and geospatial data processing. ● Interest in online machine-learning algorithms and data streams. 
● Interest in applying machine learning to real-world maritime challenges and developing cutting-edge algorithms.
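The streaming and out-of-order issues described in this project can be illustrated with a small sketch. It is not Vortexa's method and is far simpler than the algorithms the project targets: it assumes hypothetical record fields (`ts`, `mmsi`, `lat`, `lon`), an arbitrary reorder delay, and an arbitrary speed threshold. A min-heap holds messages until they are older than a watermark, releases them in timestamp order, and drops positions that would imply an implausible speed since the last accepted position.

```python
import heapq
import math
from dataclasses import dataclass, field

EARTH_RADIUS_KM = 6371.0


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points in kilometres."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


@dataclass(order=True)
class Ping:
    # Hypothetical, simplified AIS position report; only ts is used for ordering.
    ts: float
    mmsi: str = field(compare=False)
    lat: float = field(compare=False)
    lon: float = field(compare=False)


class StreamingCleaner:
    """Reorder buffer plus speed-based outlier rejection (illustrative only)."""

    def __init__(self, max_delay_s=600.0, max_speed_knots=40.0):
        self.max_delay_s = max_delay_s            # assumed bound on lateness
        self.max_speed_kmh = max_speed_knots * 1.852
        self._heap = []                           # pending pings, earliest first
        self._last_good = {}                      # mmsi -> last accepted ping
        self._watermark = float("-inf")

    def push(self, ping):
        """Add one ping; return accepted pings that are now safe to emit in order."""
        heapq.heappush(self._heap, ping)
        self._watermark = max(self._watermark, ping.ts - self.max_delay_s)
        return self._drain(self._watermark)

    def flush(self):
        """Emit everything still buffered (end of stream)."""
        return self._drain(float("inf"))

    def _drain(self, upto):
        out = []
        while self._heap and self._heap[0].ts <= upto:
            ping = heapq.heappop(self._heap)
            if self._plausible(ping):
                self._last_good[ping.mmsi] = ping
                out.append(ping)
        return out

    def _plausible(self, ping):
        prev = self._last_good.get(ping.mmsi)
        if prev is None:
            return True
        dt_h = (ping.ts - prev.ts) / 3600.0
        if dt_h <= 0:
            return False  # duplicate timestamp, or later than the buffer allows
        dist = haversine_km(prev.lat, prev.lon, ping.lat, ping.lon)
        return dist / dt_h <= self.max_speed_kmh
```

A fixed speed threshold is the crudest possible filter; the project description suggests that vessel type, cargo and receiver location should instead inform how aggressive the cleaning is.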
Advanced Detection of Ship-To-Ship Transfers	Michael Benedikt, Giorgio Orsi	Data, Knowledge and Action			MSc	The project focuses on the application of data science in maritime operations, specifically in enhancing the detection of ship-to-ship (STS) transfer activities. STS operations entail the exchange of cargo between two vessels at sea, serving both legitimate purposes – such as draught optimisation to prevent grounding in shallow waters or cargo ownership transfer – as well as illegal purposes, such as circumventing international sanctions. Detection hinges on analysing GPS-based patterns in Automatic Identification System (AIS) data emitted by ships. Vortexa is recognised as an industry leader in real-time global STS operation detection. However, our existing detection model has become outdated. Developing a new model using the latest data and techniques is crucial for enhancing detection accuracy and range, providing substantial value to our clients. Research challenges: ● Modern neural architectures: explore the integration of state-of-the-art neural networks. This new model must be scalable to detect STS operations worldwide in real-time. ● Enhanced geographic coverage: the new model’s aim is to expand its surveillance capabilities beyond predefined regions of interest. By employing advanced data analytics and machine learning techniques, we seek to automatically identify and monitor potential STS activity across the globe, eliminating dependency on manual inputs and uncovering new areas of interest. ● Increased accuracy and reduced false positives: refine detection algorithms to minimise false positives, particularly in busy maritime areas with frequent anchorages. Achieving superior accuracy will enhance the reliability of alerts, boosting client confidence and data usability. 
● Detection of "dark" STS events: one of the most ambitious advances is to develop methodologies to detect STS transfers involving "dark" ships, which disable their AIS transponders to evade sanctions. This requires integrating alternative data sources and advanced pattern recognition technologies, achieving breakthroughs in illicit activity tracking and compliance monitoring. Achieving these objectives promises substantial advancement in our detection capabilities, reinforcing our industry leadership while providing clients with unmatched insights and reliability. Expected outcomes: ● Development of a new STS detection model utilising modern neural architecture. ● Enhanced detection accuracy with reduced false positives and broader geographic monitoring. ● Breakthrough methodologies for identifying “dark” STS activities, setting new standards in maritime monitoring. Skills and Experience Required: ● Driven by working in an intellectually engaging environment with the top minds in the industry, where constructive and friendly challenges and debates are encouraged, not avoided ● Strong foundation in software engineering and machine learning, with coursework in advanced machine learning or data science preferred. ● Proficiency in Python, especially in machine learning libraries and geospatial data processing. ● Interest in applying machine learning to real-world maritime challenges and developing cutting-edge detection methods.
Decision procedures for arithmetic with powers	Michael Benedikt	Data, Knowledge and Action		C	MSc	The project will look at implication and satisfaction problems involving formulas that involve additive arithmetic, inequalities, and some form of exponentiation with a constant base. For example, systems of inequalities with constant multiples of variables and constant multiples of expressions like 2^x. We will also look at first-order logic built up from these inequalities. In the absence of exponential terms like 2^x, the theory is known as Presburger Arithmetic, and has been heavily investigated in both theory and practice. In the presence of exponential terms to a fixed base, it is known to be decidable, from work of Semenov in the late 1970s and 1980s. Recent work has shown fragments where the complexity is not too terrible, and in the process some new algorithms have been proposed. The goal of the project is to implement decision procedures for expressions of this form. A theoretically-oriented student can also work on complexity analysis. A prerequisite for the project is a good background in logic - Logic and Proof and preferably one of CAFV or KRR.
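To make the constraint class concrete, here is a small sketch that searches for a satisfying assignment of a system of inequalities mixing linear terms and terms of the form 2^x over the natural numbers. It is not a decision procedure: it only enumerates values up to an assumed bound `max_value`, whereas the procedures the project refers to avoid such enumeration. The constraint encoding and function names are hypothetical.

```python
from itertools import product


def holds(constraint, assignment, base=2):
    """Check sum(a_v * v) + sum(b_v * base**v) <= bound under the assignment.

    A constraint is a triple (lin, exp, bound), where lin and exp map
    variable names to integer coefficients.
    """
    lin, exp, bound = constraint
    total = sum(a * assignment[v] for v, a in lin.items())
    total += sum(b * base ** assignment[v] for v, b in exp.items())
    return total <= bound


def bounded_sat(constraints, variables, max_value, base=2):
    """Return the first assignment in [0, max_value]^n satisfying all constraints."""
    for values in product(range(max_value + 1), repeat=len(variables)):
        assignment = dict(zip(variables, values))
        if all(holds(c, assignment, base) for c in constraints):
            return assignment
    return None  # nothing found within the bound; says nothing beyond it


# Example system: y = 2^x and x + y >= 10, written as "<=" constraints.
system = [
    ({"y": 1}, {"x": -1}, 0),       # y - 2^x <= 0
    ({"y": -1}, {"x": 1}, 0),       # 2^x - y <= 0
    ({"x": -1, "y": -1}, {}, -10),  # -(x + y) <= -10
]
solution = bounded_sat(system, ["x", "y"], max_value=16)
```

Equalities are expressed as pairs of opposite inequalities, which keeps the encoding to a single constraint shape.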
Destination Sequence Model	Michael Benedikt, Giorgio Orsi	Data, Knowledge and Action			MSc	The "Destination Sequence Model" project seeks to advance our capacity to forecast global oil and gas tanker destinations. At Vortexa, we've already developed an industry-leading model that provides accurate, real-time predictions of vessel destinations. This information is valuable to our clients, helping them optimise logistics, understand supply and demand trends, take advantage of market opportunities, and make informed decisions about their investments. This project aims to push the boundaries of our predictive capabilities by leveraging advanced machine learning architectures, specifically Transformers, for vessel movement analysis. By treating historical vessel movements as sequences, we aim to refine prediction precision and obtain deeper insights into behavioural patterns in tanker routes. Despite the success of our current models, incorporation of elements like geohashing remains unexplored, which holds potential for addressing significant prediction challenges. Research challenges: ● Dynamic data landscape: the energy market is highly volatile, influenced by factors such as supply and demand, geopolitical events, regulatory changes, and weather. Accurately predicting vessel destinations requires robust modelling to adapt to these fluctuations. ● Data incompleteness and noise: shipping schedules and routes often remain confidential, leaving gaps in our data. Furthermore, the Automatic Identification System (AIS), a critical data source, is susceptible to noise, data manipulation, and coverage gaps, thus complicating prediction efforts. ● Erratic vessel behaviour: tankers can alter their destinations mid-journey in response to market conditions, price fluctuations, or new orders, introducing additional complexity into forecasting models. 
● Complex logistics: the process of transporting energy often involves multiple stops, transfers, and storage points, necessitating sophisticated modelling techniques surpassing traditional methods. Expected outcomes: ● Development and evaluation of an enhanced prediction model integrating geohashing. ● Identification of key factors influencing tanker routes through advanced sequence modelling. ● Generation of insights into the market factors driving vessel diversions. Skills and Experience Required: ● Driven by working in an intellectually engaging environment with the top minds in the industry, where constructive and friendly challenges and debates are encouraged, not avoided ● Strong foundation in software engineering and machine learning, with coursework in advanced machine learning or data science preferred. ● Proficiency in Python, especially in machine learning libraries and geospatial data processing. ● Interest in online machine-learning algorithms and data streams. ● Interest in applying machine learning to real-world maritime challenges and developing cutting-edge algorithms.
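The project mentions geohashing as an unexplored ingredient. As a rough illustration of why it suits sequence models, the sketch below implements the standard geohash encoding (interleaved longitude/latitude bisection, five bits per base-32 character) and turns a track of positions into a sequence of discrete cell tokens. The `track_to_tokens` helper and the choice of precision are assumptions for illustration, not part of any existing model.

```python
_BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"


def geohash_encode(lat, lon, precision=6):
    """Encode a latitude/longitude pair as a geohash string."""
    lat_lo, lat_hi = -90.0, 90.0
    lon_lo, lon_hi = -180.0, 180.0
    chars = []
    bits, bit_count = 0, 0
    use_lon = True  # bits alternate, starting with longitude
    while len(chars) < precision:
        if use_lon:
            mid = (lon_lo + lon_hi) / 2
            if lon >= mid:
                bits = (bits << 1) | 1
                lon_lo = mid
            else:
                bits <<= 1
                lon_hi = mid
        else:
            mid = (lat_lo + lat_hi) / 2
            if lat >= mid:
                bits = (bits << 1) | 1
                lat_lo = mid
            else:
                bits <<= 1
                lat_hi = mid
        use_lon = not use_lon
        bit_count += 1
        if bit_count == 5:  # every 5 bits become one base-32 character
            chars.append(_BASE32[bits])
            bits, bit_count = 0, 0
    return "".join(chars)


def track_to_tokens(points, precision=4):
    """Map (lat, lon) points to geohash cells, dropping consecutive repeats."""
    tokens = []
    for lat, lon in points:
        cell = geohash_encode(lat, lon, precision)
        if not tokens or tokens[-1] != cell:
            tokens.append(cell)
    return tokens
```

Because a shorter geohash is a prefix of a longer one for the same point, the precision parameter gives a simple way to trade vocabulary size against spatial resolution.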
Explainable early dementia modelling from longitudinal, multimodal data, and knowledge	Michael Benedikt	Data, Knowledge and Action			MSc	Supervisors: Hang Dong (University of Exeter) and Michael Benedikt (Oxford) Early dementia prediction is an urgent problem. The goal is to improve the clinical pathway for individuals who are at risk of dementia due to well-established risks such as cognitive decline, lifestyle, medical or genetic factors. Artificial Intelligence has the potential to offer a route to personalisation of risk profiling to detect early, mild cognitive impairment cases. There are two key challenges in early dementia modelling, which are at the core of this project: (1) The linking and generative modelling of longitudinal, tabular, cognitive test data with external knowledge bases, and (2) the meaningful explanation of the prediction, using linkage to external knowledge bases within explanation. The project will build on recent advances in prediction methods [1] for tabular data understanding e.g., using tables in csv or excel format that record information from participants across several timepoints, including clinical data and answers to questions. We will extend these methods with linkages to domain-specific knowledge bases [2] (e.g., constructed from the literature on cognitive tests and dementia risks with the tabular data). This approach is expected to enhance both prediction performance and interpretability, providing more robust insights into dementia progression and associated risk factors. The project will mainly work on the data from the PROTECT platform [3], containing a cohort of 30,000 older adults in community and primary care settings with ten-year longitudinal data. We will explore enhancing explainability through connections within the tabular data and between the tabular data and external sources. 
For example, we can link datatypes in the table to knowledge bases derived from the literature, and these can be useful in providing more interpretable predictions [5]. The project team will work with medical experts at University of Exeter, both in providing background and in assessment. The project requires good development skills and a good general background in machine learning. No specialised background on dementia or tabular data learning is required. [1] Shmatko A, Jung AW, Gaurav K, Brunak S, Mortensen LH, Birney E, et al. Learning the natural history of human disease with generative transformers. Nature. 2025 Sept 17;1-9. doi: 10.1038/s41586-025-09529-3 [2] Chen J, Dong H, Hastings J, Jiménez-Ruiz E, López V, Monnin P, Pesquita C, Škoda P, Tamma V. Knowledge Graphs for the Life Sciences: Recent Developments, Challenges and Opportunities. Transactions on Graph Data and Knowledge. 2023 Dec 19;1(1):5-1. doi: 10.4230/TGDK.1.1.5 [3] University of Exeter. PROTECT Study: Research to support healthy ageing and reduce dementia risk. https://www.exeter.ac.uk/research/dementia-research/research/protect/#a0. Accessed 28 Sep 2025. (Led by Prof Anne Corbett and the team) [4] Hotchkiss L, Squires E, Gallacher JE, Newbury M, Morris C, Lyons RA, et al. Enabling Advanced Multi-Modal Neuroimaging Analysis within a Trusted Research Environment. medRxiv; 2024 Feb 13. doi: 10.1101/2024.02.13.24302751. (See datasets at https://portal.dementiasplatform.uk/data/imaging-matrix/) [5] Longo L, Brcic M, Cabitza F, Choi J, Confalonieri R, Ser JD, et al. Explainable Artificial Intelligence (XAI) 2.0: A manifesto of open challenges and interdisciplinary research directions. Information Fusion. 2024 June 1;106:102301. doi: 10.1016/j.inffus.2024.102301
Generative AI Approaches to Maritime Data Parsing	Michael Benedikt, Giorgio Orsi	Data, Knowledge and Action			MSc	This project lies within the field of natural language processing (NLP) and Information Extraction (IE) using Generative AI (GenAI) technologies. The specific problem at hand is Information Extraction (IE) from unstructured and semi-structured text documents used by the maritime industry. These documents include, among others: ● Port agent records (import/export bills of lading), containing information such as the identities of the vessels loading or discharging cargo into a port, the product carried, the quantities, and ports and dates of arrival, departure, clearance. Commercial information is often present, such as the charterer, shipper, consignee, notify party. Depending on the country and the port, specific identifying codes can be present. ● Inspection reports, containing information about safety and commercial inspections of cargoes at various ports and carried out by specialised companies. These reports contain very detailed technical information about the cargoes, including technical specifications, e.g., API classifications for fuel. They also often include technical information about the vessel itself, e.g., checking the presence of a scrubber or how it fares against safety standards. ● Fixtures, containing information about a maritime fixture of a vessel for the transport of a specific cargo, including the type of product, the identity of the vessels, the rate at which it was fixed, the terms of the contracts, and the laycan dates. ● Port Lineups, containing predicted arrival schedules with predicted dates at ports across the world. These documents are often the result of manual data inputting and OCR conversion (from e.g., paper filings) resulting in a substantial amount of noise and errors. 
In academic terms, these documents fall within the categorisation of Noisy Unstructured Text (or NUT), which poses challenges when it comes to parsing and information extraction. Thanks to the extraordinary advances in Large Language Models (LLMs) and Generative AI, the state of the art in NLP and IE is extraordinarily advanced. Modern LLMs like GPT-o and Claude show incredible parsing and question answering capabilities both in terms of general language understanding as well as specialised domains, e.g., medical and legal, where an extraordinary amount of data is available. LLMs' language understanding performance in specialised domains is directly dependent on the availability of data for that specific domain. While a large amount of data about the energy and maritime industries is available when it comes to general knowledge, data is extremely scarce when it comes to the technologies, operations, entities, and processes. Most of this data is still collected and managed manually by traders, brokers, port agents and government agencies without specific requirements for disclosure. It’s currently extremely challenging to train or even fine-tune LLMs on the type of documents used within the energy and maritime industries. Moreover, the fields are extremely technical, often matching the level and depth of technicalities found in the medical and legal domains. The main aim of this project is to demonstrate that we can fully replace the need for custom rule-based parsers written in an imperative or declarative programming language entirely with LLMs. Additionally, we aim at demonstrating that this can be achieved at the necessary level of accuracy required by the maritime industry (>95% accuracy). The solution must be multi-modal (i.e., able to extract information from images, text, and eventually audio) and should not be dependent on the type of information being parsed. 
For example, it needs to be able to process with a single model fixtures, port agent records, port lineups and inspection reports without requiring multiple models or different conditioning. The solution must be scalable to process documents in real-time as contracts (i.e., fixtures) are signed, and vessels enter or leave ports. Moreover, the solution must be transparent and explainable to comply with the strict regulatory requirements that the maritime and energy industries enforce, including confidentiality. Skills and Experience Required: ● Driven by working in an intellectually engaging environment with the top minds in the industry, where constructive and friendly challenges and debates are encouraged, not avoided ● Strong foundation in software engineering and machine learning, with coursework in advanced machine learning or data science preferred. ● Proficiency in Python, especially in machine learning libraries and natural language processing. ● An understanding of the fundamentals of LLMs, Generative AI, and Retrieval Augmented Generation (RAG) is a plus.
Graph Machine Learning with Neo4j	Michael Benedikt	Data, Knowledge and Action			MSc	Supervisor: Michael Benedikt Co-supervisor: Neo4j (industrial partner) We have discussed a set of projects concerning graph ML in industry with Brian Shi (an Oxford alumnus) of the Neo4j Graph Data Science team, including (but not limited to): LLM agents with graph tools. LLMs equipped with tools can act as agents capable of handling complex tasks. Adding retrieval (Cypher) and graph-algorithm tools enhances the retrieval and reasoning capabilities of LLM-based agents. What types of tasks can be solved by agents that call sequences of such tools? How can we design these tools so that LLMs can use them more effectively out of the box? Can we use techniques such as SFT, RLHF, or RLVR to improve the agent’s ability to use existing algorithmic tools? Bridging graph neural networks and graph databases. Basic message-passing equations can be implemented in Cypher, but which GNN architectures can be (partly) expressed as queries, and which queries can be implemented and learned by GNNs? Are there GNNs that cannot be written in Cypher? How can we leverage a graph database for efficient GNN inference? A student would work with Michael Benedikt and the Neo4j team. The project will be research-focused, and any software produced would be in the public domain. Additional hardware and resources will be provided if needed. The balance between experiment and theory can be tuned to the student’s interests. The main prerequisite is strong knowledge of graph ML (e.g., the GDL course), algorithms, databases, or general ML, depending on the specific topic.
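For readers unfamiliar with the "basic message-passing equations" mentioned above, the sketch below runs one round of mean-aggregation message passing over an edge list in plain Python. It is a deliberately simplified stand-in (scalar weights instead of learned matrices, hypothetical function name) meant only to show the neighbour-aggregation pattern that one would try to express as a graph query; it does not use or imply any Neo4j API.

```python
def message_passing_round(features, edges, w_self=0.5, w_neigh=0.5):
    """One round of h'_v = relu(w_self * h_v + w_neigh * mean of h_u over edges u -> v).

    features: dict mapping node id to a list of floats (all the same length).
    edges: iterable of (source, target) pairs.
    The scalar weights are illustrative placeholders for learned parameters.
    """
    incoming = {v: [] for v in features}
    for src, dst in edges:
        incoming[dst].append(features[src])

    updated = {}
    for v, h in features.items():
        msgs = incoming[v]
        if msgs:
            # Column-wise mean of the incoming neighbour feature vectors.
            mean = [sum(col) / len(msgs) for col in zip(*msgs)]
        else:
            mean = [0.0] * len(h)
        updated[v] = [max(0.0, w_self * a + w_neigh * m) for a, m in zip(h, mean)]
    return updated


features = {"a": [1.0], "b": [3.0], "c": [5.0]}
edges = [("a", "c"), ("b", "c")]
new_features = message_passing_round(features, edges)
```

The aggregation step is a group-by over incoming edges followed by an average, which is the part that maps most directly onto a query language; the nonlinearity and learned weights are where the correspondence becomes less obvious.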
Optimized reasoning with guarded logics	Michael Benedikt	Data, Knowledge and Action		C	MSc	Inference in first-order logic is undecidable, but a number of logics have appeared in the last decade that achieve decidability. Many of them are guarded logics, which include the guarded fragment of first-order logic. Although the decidability has been known for some decades, no serious implementation has emerged. Recently we have developed new algorithms for deciding some guarded logics, based on resolution, which are more promising from the perspective of implementation. The project will pursue this both in theory and experimentally. Prerequisites: A knowledge of first-order logic, e.g. from the Foundations of CS or Knowledge Representation and Reasoning courses, would be important.
Game-theoretic Approaches to Multi-Agent Pathfinding	Sara Bernardini	Data, Knowledge and Action	B	C	MSc	Prerequisites: Artificial Intelligence, programming Background Multi-Agent Path Finding (MAPF) is a core optimisation problem in multi-agent systems: given many agents sharing a space, compute time-coordinated, collision-free paths from their start locations to their assigned goals. Contemporary applications are many and include automated warehouses, UAV/drone flight scheduling and urban air-traffic management, autonomous-vehicle and robotaxi fleets, port and airport ground-vehicle coordination, factory and semiconductor-fab AGVs/AMHS, sidewalk delivery robots, hospital logistics robots, and crowd/character routing in games and simulation. All of these applications involve large agent populations and complex environments, making the MAPF problem challenging. Computing optimal MAPF plans is NP-hard, so practical deployments typically use bounded-suboptimal or anytime methods that trade exact optimality for speed while preserving high solution quality. Most MAPF planners are centralised, i.e. one coordinator computes and dispatches all agents’ paths, which simplifies global reasoning but creates serious practical limits. First, scalability is poor: planning and re-planning latency and memory blow up with agent count and map size. Second, the architecture is brittle: a single point of failure and communication bottlenecks make performance degrade under packet loss or overload. Third, centralised schedules are hard to execute robustly in the real world: small timing errors, sensor noise, and model mismatch (e.g., grid abstractions that ignore robot kinematics and dynamics) quickly invalidate tightly coordinated plans. Finally, responsiveness is limited: disturbances such as delays, blockages, or new tasks often require expensive global re-planning, and heterogeneity in agent sizes and speeds further exacerbates these issues. 
Focus This project aims to develop a principled, decentralised framework for MAPF based on game theory. In particular, the idea is to formulate MAPF as a noncooperative game and draw on congestion game theory, mechanism and incentive design, and evolutionary dynamics to derive distributed algorithms with provable performance guarantees, explicitly characterising efficiency/complexity tradeoffs. Noncooperative game theory naturally models systems of strategic, self-interested agents with potentially conflicting objectives, as in the applications above. This viewpoint allows us to use game-theoretic tools to optimise, or at least improve, system outcomes and welfare. In particular, mechanism design studies how to synthesise policies that enable a planner to efficiently allocate scarce resources to agents while ensuring that they are individually rational and strategy-proof. These desirable properties guarantee that each agent always benefits from participating in the mechanism and, at the same time, that agents are incentivised to report their private information truthfully to the mechanism. Additional classes of policies that can be designed to steer agents’ selfish behaviour towards socially desirable outcomes include dynamic congestion pricing, whereby the planner decides how much to charge the agents for using the available resources, as well as information design, which entails the planner choosing what information to disclose to the agents to best nudge them towards the desirable goal. Method Goals: ● Game-theoretic modelling of MAPF problems ● Design of game-based mechanisms and policies with formal performance guarantees ● Development of efficient algorithms to solve MAPF leveraging the game-theoretic formulation ● Implementation for representative real-world problems Related papers: Friedrich, P., Zhang, Y., Curry, M., Dierks, L., McAleer, S., Li, J., Sandholm, T. and Seuken, S., Scalable Mechanism Design for Multi-Agent Path Finding. https://arxiv.org/pdf/2401.17044
Leveraging Machine Learning for Multi-Agent Path Finding	Sara Bernardini	Data, Knowledge and Action	B	C	MSc	Prerequisites: Artificial Intelligence, programming Background Multi-Agent Path Finding (MAPF) is a core optimisation problem in multi-agent systems: given many agents sharing a space, compute time-coordinated, collision-free paths from their start locations to their assigned goals. Contemporary applications are many and include automated warehouses, UAV/drone flight scheduling and urban air-traffic management, autonomous-vehicle and robotaxi fleets, port and airport ground-vehicle coordination, factory and semiconductor-fab AGVs/AMHS, sidewalk delivery robots, hospital logistics robots, and crowd/character routing in games and simulation. All of these applications involve large agent populations and complex environments, making the MAPF problem very challenging to solve. MAPF has been driven primarily by search-based approaches, with decades of work in heuristics and combinatorial optimisation. Although remarkable progress has been made in MAPF technologies, their deployment in real-world scenarios still encounters significant challenges. Computing optimal MAPF plans is NP-hard, so practical deployments typically use bounded-suboptimal or anytime methods that trade exact optimality for speed while trying to preserve high solution quality. In addition, most MAPF planners are centralised, i.e. one coordinator computes and dispatches all agents’ paths, which simplifies global reasoning but creates serious practical limits. First, scalability is poor: planning and re-planning latency and memory blow up with agent count and map size. Second, the architecture is brittle: a single point of failure and communication bottlenecks make performance degrade under packet loss or overload. 
Third, centralised schedules are hard to execute robustly in the real world: small timing errors, sensor noise, and model mismatch (e.g., grid abstractions that ignore robot kinematics and dynamics) quickly invalidate tightly coordinated plans. Finally, responsiveness is limited: disturbances such as delays, blockages, or new tasks often require expensive global re-planning, and heterogeneity in agent sizes and speeds further exacerbates these issues. Focus This project aims to explore how ML can be leveraged to overcome these challenges. Two approaches are viable. The first is to augment existing solvers through ML. This involves using ML techniques to enhance the optimisation process in traditional non-ML algorithms. The goal is to improve these solvers’ efficiency, scalability, and adaptability. For example, consider Prioritised Planning, a widely used MAPF algorithm that plans based on each agent's assigned priorities. Finding effective priority orders is typically hard and is currently done manually. ML techniques can enhance this prioritisation process. The second approach to using ML in the context of MAPF is to develop effective decentralised MAPF methods that employ ML techniques such as imitation or reinforcement learning to achieve decentralised agent behaviours. These behaviours are influenced by the agents’ local observations and, if applicable, by their interactions with one another, to develop more responsive and self-governing decentralised MAPF approaches. For example, the application of graph neural networks (GNNs) to the development of adaptive multi-robot systems is an emerging field of research. Method Goals: Explore different ways in which ML can be used to solve MAPF efficiently Design an approach to leverage ML for MAPF (e.g. using GNNs) Development of efficient hybrid algorithms to solve MAPF, leveraging both search and ML Implementation for representative real-world problems Related papers: M. Alkazzi and K. 
Okumura, “A Comprehensive Review on Leveraging Machine Learning for Multi-Agent Path Finding,” in IEEE Access, vol. 12, pp. 57390–57409, 2024. Q. Li, F. Gama, A. Ribeiro, and A. Prorok, “Graph neural networks for decentralized multi-robot path planning,” in Proc. IEEE/RSJ Int. Conf. Intell. Robots Syst. (IROS), Oct. 2020, pp. 11785–11792.
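As an illustration of the Prioritised Planning idea mentioned in this project, the sketch below plans agents one at a time in a given priority order, treating the paths of higher-priority agents as moving obstacles. It is a simplified toy under stated assumptions: a 4-connected grid, unit-time moves, breadth-first search in space-time instead of a tuned solver, distinct start cells, and agents that stay on their goal after arriving. The grid, agent names and function names are hypothetical.

```python
from collections import deque


def plan_one(grid, start, goal, reserved_cells, reserved_moves, parked, horizon):
    """Breadth-first search in space-time for a single agent.

    reserved_cells holds (cell, t) pairs and reserved_moves holds
    (from_cell, to_cell, t) triples claimed by higher-priority agents;
    parked maps a goal cell to the time its owner arrives and stays there.
    """
    rows, cols = len(grid), len(grid[0])
    queue = deque([(start, 0, [start])])
    seen = {(start, 0)}
    while queue:
        cell, t, path = queue.popleft()
        # The agent may stop only if nobody needs its goal cell later on.
        if cell == goal and all(rt <= t for (rc, rt) in reserved_cells if rc == goal):
            return path
        if t >= horizon:
            continue
        r, c = cell
        for nr, nc in ((r, c), (r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            nxt = (nr, nc)
            if not (0 <= nr < rows and 0 <= nc < cols) or grid[nr][nc] == "#":
                continue
            if (nxt, t + 1) in seen or (nxt, t + 1) in reserved_cells:
                continue
            if nxt in parked and parked[nxt] <= t + 1:
                continue  # a higher-priority agent is resting on its goal
            if (nxt, cell, t) in reserved_moves:
                continue  # would swap places with a higher-priority agent
            seen.add((nxt, t + 1))
            queue.append((nxt, t + 1, path + [nxt]))
    return None


def prioritised_planning(grid, agents, order, horizon=50):
    """Plan agents in the given order; return None if any agent is stuck."""
    reserved_cells, reserved_moves, parked, paths = set(), set(), {}, {}
    for name in order:
        start, goal = agents[name]
        path = plan_one(grid, start, goal, reserved_cells, reserved_moves, parked, horizon)
        if path is None:
            return None
        paths[name] = path
        for t, cell in enumerate(path):
            reserved_cells.add((cell, t))
            if t + 1 < len(path):
                reserved_moves.add((cell, path[t + 1], t))
        parked[goal] = len(path) - 1
    return paths


# Hypothetical corridor with one side pocket at row 1, column 2.
GRID = ["....", "##.#"]
AGENTS = {"A": ((0, 0), (0, 3)), "B": ((0, 3), (0, 0))}

paths_ab = prioritised_planning(GRID, AGENTS, ["A", "B"])
paths_ba = prioritised_planning(GRID, AGENTS, ["B", "A"])
```

With the order A, B the lower-priority agent B steps into the pocket to let A pass. With the order B, A no plan is found, because A cannot reach the pocket in time. This small asymmetry is the kind of sensitivity to priority order that a learned prioritisation would aim to handle.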
Safety and Robustness in Agentic Systems	Adel Bibi, Philip Torr			C	MSc	Agentic systems increasingly perform tasks in graph-based environments, such as managing a calendar. In these graphs, nodes represent states (e.g., a scheduled meeting or an empty time slot), and edges represent actions transitioning between states (e.g., adding a meeting, rescheduling an event, or canceling an appointment). Each node is associated, potentially by another LLM-based safety function, with a label indicating whether it is safe to execute an action leading to that state. To rigorously study these systems, we aim to reformulate the problem into a reinforcement learning (RL) framework, where the agent learns policies to navigate the graph safely while optimizing task performance. This project will benchmark safety risks, explore strategies to prevent unsafe transitions, and guide the development of robust, risk-aware decision-making. It is designed to lead to a high-quality publication, and we are seeking a highly motivated student to contribute. We will be working with Guohao Li (co-founder of Eigent AI and Camel AI, a London-based startup) and another industry partner.
AI Safety and Synthetic Data	Reuben Binns, Naman Goel	Human Centred Computing			MSc	Synthetic data refers to artificially or algorithmically generated data, as opposed to real data that is generated by real-world events. This project will explore opportunities and risks of synthetic data, focusing on the safety of AI systems that use synthetic data in development or evaluation processes. We are interested in a range of features (reasoning, alignment, factuality, etc.) and a range of application domains (WWW, social sciences/economics, health, finance, education, and other critical sectors). The student will perform empirical research by collecting novel data, conducting experiments to understand the effect of synthetic data, and proposing (and evaluating) solutions to address the unintended effects and weaknesses of synthetic data. Interested students are welcome to contact Naman Goel (naman.goel@cs.ox.ac.uk) to discuss or propose their own ideas related to the above. Prerequisites: Good understanding of and hands-on experience with machine learning, mathematical maturity, proficiency in Python. Experience with large language models (Hugging Face, running local models, using APIs of closed models, etc.) or ability to learn quickly. Understanding of potential real-world harms will be a big plus. References: 1. Liu, Ruibo, et al. "Best practices and lessons learned on synthetic data." First Conference on Language Modeling. 2024. https://openreview.net/pdf?id=OJaWBhh61C
Computational Modelling of Disease Progression and Therapy Response in Hypertrophic Cardiomyopathy	Alfonso Bueno-Orovio, Abdallah Hasaballa	Computational Biology and Health Informatics	B	C	MSc	Prerequisites: Computational Medicine (recommended) Abstract Hypertrophic cardiomyopathy (HCM) presents a complex pattern of electrical, mechanical, and structural changes that evolve throughout the course of the disease. Early hypercontractility, altered calcium handling, myofilament dysfunction, and progressive hypertrophy interact across scales to shape clinically observed phenotypes including repolarisation abnormalities, diastolic impairment, and increased arrhythmic risk. Despite extensive clinical and experimental evidence, the mechanistic links between subcellular, tissue-level, and whole-organ dynamics in HCM progression remain poorly understood. This project will use advanced multiscale electromechanical models of the human ventricle to investigate how specific forms of ionic remodelling, sarcomeric dysfunction, and structural adaptation contribute to stage-dependent alterations in cardiac function. Simulations will explore how progressive remodelling modifies action potential morphology, ventricular deformation, wall stress, pressure–volume loops, and surface ECG features. The student will also examine how different classes of therapeutic mechanisms influence electromechanical behaviour under early and advanced disease conditions. Possible research directions include: (i) modelling progressive ion-channel, calcium-handling, or myofilament changes characteristic of HCM; (ii) quantifying electromechanical biomarkers of disease progression; (iii) exploring how mechanism-specific therapeutic interventions alter cardiac performance across disease stages. This project will contribute to a deeper mechanistic understanding of how HCM evolves and how therapy interacts with underlying biophysical abnormalities. 
References [1] In-silico human electro-mechanical ventricular modelling and simulation for drug-induced pro-arrhythmia and inotropic risk assessment. https://doi.org/10.1016/j.pbiomolbio.2020.06.007 [2] Mechanisms of pro-arrhythmic abnormalities in ventricular repolarisation and anti-arrhythmic therapies in human hypertrophic cardiomyopathy. https://doi.org/10.1016/j.yjmcc.2015.09.003
Computational methods for identifying abnormalities from the electrocardiogram in heart disease	Alfonso Bueno-Orovio, James Coleman	Computational Biology and Health Informatics	B	C	MSc	Prerequisites: Computational Medicine (recommended) Background Hypertrophic cardiomyopathy (HCM) is a genetic heart disease and a leading cause of sudden cardiac death in the young. Identifying patients at high risk of lethal arrhythmias is crucial for improving patient outcomes. Electrocardiography (ECG) is commonly used in HCM to assess electrical activity of the heart, which may reveal markers of arrhythmic risk. However, the fundamental limits of what electrical activity can be revealed by the ECG remain incompletely understood. Focus In the project, the student will investigate how different cardiac electrical patterns manifest as different ECG signatures using sensitivity analysis and computer modelling, to determine the main factors underlying 12-lead ECG signals. The student can then apply machine learning techniques to their ECG dataset and computer models, to build predictive models of cardiac electrical activity from the ECG. If successful, the student will then apply their methods to a clinical dataset of 12-lead ECGs, to better characterise the functional disease substrate of HCM patients. Method Electrocardiogram phenotypes in hypertrophic cardiomyopathy caused by distinct mechanisms: apico-basal repolarization gradients vs. Purkinje-myocardial coupling abnormalities. https://doi.org/10.1093/europace/euy226 Simulation-based digital twinning of activation and repolarisation sequences from the ECG across healthy and diseased hearts. https://doi.org/10.1016/j.compbiomed.2025.111222
Efficient Calibration of Cardiac Digital-Twin Cohorts Using Gaussian-Process Surrogate Modelling and Transfer Learning	Alfonso Bueno-Orovio, Abdallah Hasaballa	Computational Biology and Health Informatics	B	C	MSc	Prerequisites: Computational Medicine (recommended) Abstract High-fidelity cardiac digital twins enable detailed investigation of individual anatomy, electrophysiology, mechanics, and therapy response. However, calibrating these personalised models is computationally demanding, especially when building large cohorts or updating models longitudinally as new clinical data become available. Recent advances suggest that calibration cost can be substantially reduced by exploiting shared structure across individuals and by constructing surrogate models that emulate the behaviour of full electromechanical simulations. This project will explore Gaussian-process surrogates, emulator-based optimisation, and transfer-learning strategies to accelerate the calibration of patient-specific digital twins. The idea is to treat each calibrated model as part of a larger cohort, allowing information to be transferred across patients with similar anatomical, electrophysiological, or mechanical properties. Students will integrate surrogate models with an existing ventricular electromechanical modelling pipeline and evaluate how much computational cost can be reduced while maintaining physiological accuracy. Possible research themes include: (i) building Gaussian-process emulators for key electromechanical biomarkers; (ii) designing cohort-aware calibration strategies that reuse model information across individuals; (iii) developing methods for efficient recalibration of digital twins over time; (iv) benchmarking surrogate-enabled methods against full high-fidelity calibration workflows. The project will contribute to scalable digital-twin frameworks capable of supporting population studies, therapy testing, and longitudinal follow-up. 
References [1] In-silico human electro-mechanical ventricular modelling and simulation for drug-induced pro-arrhythmia and inotropic risk assessment. https://doi.org/10.1016/j.pbiomolbio.2020.06.007 [2] Harnessing 12-lead ECG and MRI data to personalise repolarisation profiles in cardiac digital twin models for enhanced virtual drug testing. https://doi.org/10.1016/j.media.2024.103361
High Throughput, High Resolution, and High Frame-Rate Analysis of Cellular Heart Function	Alfonso Bueno-Orovio	Computational Biology and Health Informatics	B	C	MSc	Co-supervised by Christopher Toepfer. Cardiomyocytes are the cells responsible for generating contractile force in the heart. They are highly adaptable and alter their function in response to many stimuli (electrical stimuli, calcium, contraction, organisation of cellular structures, metabolism, and transcriptional signatures). These processes are all interrelated and dynamic in live cardiomyocytes, but we lack the ability to visualise them simultaneously and correlate them in real time. We have previously developed specialised software [1, 2] to automate high-throughput analysis of cardiomyocyte function. However, these tools are not optimised for large files, and because they do not work in tandem they cause data fragmentation. The aim of this project is to develop novel software solutions for three-channel, high-throughput, high-resolution, and high-frame-rate imaging and analysis of beating cardiomyocytes. Imaging live cardiomyocytes during contraction and relaxation requires high frame rates (hundreds of frames per second), which must be paired with high resolutions (<70 nm per pixel) while also allowing the simultaneous analysis of multiple well plates for high throughput. In collaboration with the Oxford Department of Cardiovascular Medicine, and building on extensive datasets of high-resolution, high-rate fluorescent imaging of live cardiomyocytes, students taking this project will develop novel open-source software solutions integral to discovery and translational cardiovascular research. 
Different research options would be available, such as: optimisation and/or development of novel algorithms for the analysis of sarcomere contractility and relaxation; development of an analysis pipeline for masking, segmenting, and fitting action potentials from a variety of cellular systems; development of an analysis pipeline for the automated analysis of mitochondrial shape, abundance, and cellular metabolites. [1] CalTrack: High-Throughput Automated Calcium Transient Analysis in Cardiomyocytes. https://doi.org/10.1161/CIRCRESAHA.121.318868 [2] SarcTrack: an adaptable software tool for efficient large-scale analysis of sarcomere function in hiPSC-cardiomyocytes. https://doi.org/10.1161/CIRCRESAHA.118.314505
Integrating ECG and Myocardial Strain for Mechanistic Risk Stratification in Heart Disease	Alfonso Bueno-Orovio, Abdallah Hasaballa	Computational Biology and Health Informatics	B	C	MSc	Prerequisites: Computational Medicine (recommended) Abstract Hypertrophic cardiomyopathy (HCM) is characterised by complex interactions between electrical abnormalities, ventricular remodelling, and mechanical dysfunction. While the 12-lead ECG is widely available and cost-effective, its interpretation is often confounded by overlapping phenotypes. Myocardial strain, derived from echocardiography or cardiac MRI, captures regional mechanical impairment such as reduced deformation, mechanical dispersion, or dyssynchrony. Individually, both ECG and strain offer valuable insight, but their combined interpretation has not been systematically explored through mechanistic modelling. This project will integrate ECG features and myocardial strain metrics to identify electro-mechanical markers associated with HCM severity and arrhythmic risk. Students will extract ECG descriptors (intervals, morphology, repolarisation features) and strain-based indices (global and regional strain, temporal dispersion). These will be analysed jointly and supported by mechanistic simulations of ventricular activation and repolarisation to explore plausible structural or ionic substrates underlying the observed patterns. Potential avenues include clustering electro-mechanical phenotypes, linking ECG–strain concordance to proposed mechanisms, or developing simple risk-stratification models. The overarching aim is to establish mechanistically grounded electro-mechanical signatures that improve current non-invasive risk assessment in HCM. References [1] Distinct ECG phenotypes identified in hypertrophic cardiomyopathy using machine learning associate with arrhythmic risk markers. 
https://doi.org/10.3389/fphys.2018.00213 [2] Electrocardiogram phenotypes in hypertrophic cardiomyopathy caused by distinct mechanisms: apico-basal repolarisation gradients vs. Purkinje-myocardial coupling abnormalities. https://doi.org/10.1093/europace/euy226
Interpretable Cardiac Anatomy and Electrophysiology Modelling in Cardiomyopathy Patients using Generative Models	Alfonso Bueno-Orovio, Vicente Grau, Abhirup Banerjee	Computational Biology and Health Informatics		C	MSc	Prerequisites: Computational Medicine (recommended), Deep Learning in Healthcare (recommended) Background: Cardiac anatomy and function vary considerably across the human population, with important implications for clinical diagnosis, treatment planning, and prognosis of disease. Consequently, many computer-based approaches have been developed to capture this variability for a wide range of applications, including explainable cardiac disease detection and prediction, dimensionality reduction, cardiac shape analysis, and the generation of virtual heart populations. Here, we will leverage these technologies to further investigate connections between cardiac anatomy and electrocardiographic (ECG) signals in patients with Hypertrophic Cardiomyopathy (HCM), the most common hereditary heart disease and leading cause of sudden cardiac death in the young and in competitive athletes. Focus: To investigate connections between patterns of cardiac hypertrophy and their manifestation in the ECG in HCM patients, and to generate virtual populations of HCM hearts that capture the variability in cardiac shape and ECG signals in this group of high-risk patients. Method: We have recently proposed variational mesh autoencoder (mesh VAE) methods as a novel geometric deep learning approach to model population-wide variations in cardiac anatomy [1]. These methods process surface mesh representations of the cardiac anatomy directly and efficiently, and can also be extended to simultaneously learn the ECG signal of a given patient [2]. 
Exploiting cardiac Magnetic Resonance Imaging (MRI) and ECG resources from established clinical collaborators, this project will explore the quality and interpretability of the mesh VAE's latent space for the reconstruction and synthesis of multi-domain cardiac signals in HCM patients. It will also investigate the method's ability to generate realistic virtual populations of cardiac anatomies and ECG signals in terms of multiple clinical metrics in this group of patients. Other types of generative models (adversarial networks, stable diffusion, transformers) can also be considered. References [1] Interpretable Cardiac Anatomy Modeling Using Variational Mesh Autoencoders. https://doi.org/10.3389/fcvm.2022.983868 [2] Multi-Domain Variational Autoencoders for Combined Modelling of MRI-Based Biventricular Anatomy and ECG-Based Cardiac Electrophysiology. https://doi.org/10.3389/fphys.2022.886723
Modelling and Simulation of Genetic Heart Disease	Alfonso Bueno-Orovio	Computational Biology and Health Informatics		C	MSc	The goal of Precision Medicine is to provide therapies tailored to each patient. This is especially an urgent need in inherited heart disease, where current drug therapy mostly relies on symptom relief. In this regard, computational modelling and simulation constitutes a flexible platform for investigating the mechanisms underlying arrhythmic risk in these patients, and for the virtual screening of pharmacological therapy of both established and novel agents. To investigate mechanisms of arrhythmic risk and/or response to therapy using multiscale computational models (cell to whole-organ) of inherited heart disease. Students will investigate, by means of computational modelling and simulation, how changes in structure and cellular function modulate the risk of life-threatening events and/or response to pharmacological therapy in patients with hypertrophic cardiomyopathy (HCM). Different research options will be available, including: (i) arrhythmia mechanisms in HCM due to calcium dysregulation and spontaneous calcium release; (ii) modelling of impaired energetics and impaired metabolism in HCM; (iii) role of abnormalities in tissue microstructure as precursors of arrhythmic triggers; (iv) HCM mutation-specific safety and efficacy of the latest pharmacological therapies (myosin inhibitors) when accounting for disease progression; (v) HCM mutation-specific safety and efficacy of genetic therapy; (vi) modelling and simulation of paediatric patients. [1] Mechanisms of pro-arrhythmic abnormalities in ventricular repolarisation and anti-arrhythmic therapies in human hypertrophic cardiomyopathy. https://doi.org/10.1016/j.yjmcc.2015.09.003 [2] Improving the clinical understanding of hypertrophic cardiomyopathy by combining patient data, machine learning and computer simulations: A case study. 
https://doi.org/10.1016/j.morpho.2019.09.001 [3] Electrophysiological and contractile effects of disopyramide in patients with obstructive hypertrophic cardiomyopathy: a translational study. https://doi.org/10.1016/j.jacbts.2019.06.004 [4] Mechanism based therapies enable personalised treatment of hypertrophic cardiomyopathy. https://doi.org/10.1038/s41598-022-26889-2 Pre-requisites: Computational Medicine (recommended)
Detection of cycles in the cryptographic protocol verifier ProVerif’s saturation procedure	Vincent Cheval	Security	B	C	MSc	Security protocols aim at securing communications. They are used in various applications (e.g. establishment of secure channels over the Internet, secure messaging, electronic voting, mobile communications, etc.). Their design is known to be error-prone, and flaws are difficult to fix once a protocol is widely deployed. Hence a common practice is to analyze the security of a protocol using formal techniques and in particular automatic tools. This approach has led to the development of several verification tools such as ProVerif (https://proverif.inria.fr), Tamarin (https://tamarin-prover.github.io) and DeepSec (https://deepsec-prover.github.io) that rely on symbolic models to express protocols and their security properties. Symbolic, automated verification has been a very successful approach for revealing vulnerabilities or proving the absence of entire families of attacks. For instance, these tools were used to verify the security of several major protocols (e.g. TLS 1.3, 5G protocols, Belenios electronic voting protocol). The problem of verifying security properties (even simple secrecy) is undecidable in general, especially when considering an unlimited number of sessions. For this reason, tools such as ProVerif and Tamarin may not terminate, as they allow users to model an unbounded number of sessions. In the vein of [1], many recent features have been introduced in ProVerif to help the tool terminate. However, most of these features require some manual intervention on the part of users, and it can be quite challenging to understand which feature to use depending on the protocol under study. Internally, ProVerif’s core algorithm consists of saturating a set of Horn clauses with free resolution. In practice, most, if not all, cases of non-termination occur because the saturation procedure enters an infinite loop. 
In this project, we aim to develop a procedure that can automatically detect a large class of loops and appropriately apply some of ProVerif’s features to exit the detected loops. Objectives: get familiar with ProVerif’s saturation procedure (see [1]; the paper [2] also gives a broader view of ProVerif’s theory); get familiar with the tool itself and the current benchmark of non-terminating cases; design the new algorithm for detecting loops and deciding on appropriate countermeasures; implement the algorithm and test it on the provided benchmarks to evaluate its efficiency. The library will be incorporated into the official release of ProVerif. Bibliography: Reading the papers is not mandatory to complete the project, but it will provide good context for it. [1] Bruno Blanchet, Vincent Cheval, and Véronique Cortier. ProVerif with lemmas, induction, fast subsumption, and much more. In Proceedings of the 43rd IEEE Symposium on Security and Privacy (S&P’22). IEEE Computer Society Press, May 2022. [2] Bruno Blanchet. Modeling and Verifying Security Protocols with the Applied Pi Calculus and ProVerif. Foundations and Trends in Privacy and Security, 1(1-2):1-135, October 2016. Prerequisites: Logic and Proof, Functional Programming
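The kind of non-termination described in this project can be illustrated with a toy forward-chaining saturation. This is a deliberately simplified sketch and not ProVerif's resolution-based procedure: facts are nested Python tuples, each rule is a hypothetical function with a single premise, and a round limit stands in for the loop detection the project would develop.

```python
def saturate(facts, rules, max_rounds):
    """Apply every rule to every known fact until nothing new appears.

    Returns (facts, finished); finished is False if the round limit was hit.
    """
    known = set(facts)
    for _ in range(max_rounds):
        new = {rule(fact) for fact in known for rule in rules} - known - {None}
        if not new:
            return known, True
        known |= new
    return known, False


def project_first(fact):
    """Toy clause att(pair(x, y)) => att(x): its conclusion is smaller."""
    _, term = fact
    if isinstance(term, tuple) and term[0] == "pair":
        return ("att", term[1])
    return None


def apply_hash(fact):
    """Toy clause att(x) => att(h(x)): its conclusion keeps growing."""
    _, term = fact
    return ("att", ("h", term))


# Saturation stops once the projection rule has nothing left to add.
closed, finished_ok = saturate({("att", ("pair", "a", "b"))}, [project_first], 10)

# The hashing rule produces a(n ever larger) new fact in every round.
growing, finished_loop = saturate({("att", "a")}, [apply_hash], 10)
```

The second call never reaches a fixed point: every round derives a strictly larger term, so only the round limit stops it. Recognising such a repeating pattern of derivations, rather than cutting off after a fixed number of rounds, is the hard part.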
Efficient algorithm unification for union of equational theories	Vincent Cheval	Security	B	C	MSc	Prerequisites: Logic and Proof, Functional programming Background Security protocols aim at securing communications. They are used in various applications (e.g. establishment of secure channels over the Internet, secure messaging, electronic voting, mobile communications, etc.). Their design is known to be error-prone, and flaws are difficult to fix once a protocol is widely deployed. Hence a common practice is to analyze the security of a protocol using formal techniques and in particular automatic tools. This approach has led to the development of several verification tools such as ProVerif (https://proverif.inria.fr), Tamarin (https://tamarin-prover.github.io) and DeepSec (https://deepsec-prover.github.io) that rely on symbolic models to express protocols and their security properties. Symbolic, automated verification has been a very successful approach for revealing vulnerabilities or proving the absence of entire families of attacks. For instance, these tools were used to verify the security of several major protocols (e.g. TLS 1.3, 5G protocols, Belenios electronic voting protocol). Focus Internally, these tools express cryptographic messages using terms. The algebraic properties of the cryptographic primitives are then expressed by means of an equational theory on terms. For example, the equation dec(enc(x,y),y) = x expresses that decrypting (“dec”) with a key “y” the encryption of a message “x” under the same key “y”, i.e. “enc(x,y)”, yields the message “x”. Equality of terms modulo the equational theory is one of the core problems that these tools must solve. Although unification algorithms are known for many primitives with complex algebraic properties, such as finite variant, XOR with homomorphism, abelian groups, etc., these algorithms only work when the equations do not mix primitives, in other words when the equational theories are distinct. 
We aim to develop new, efficient unification algorithms for the union of equational theories, first for distinct theories and possibly then for non-distinct ones. Proofs of correctness would first be done by hand, but an implementation in F* [1] (or Coq [2] or Lean [5]) would be of interest. Method Objectives: get familiar with the currently known algorithm for unification in the union of distinct equational theories [3]; create a semi-decision algorithm for unification in a general union of equational theories and establish its limitations, based on a general framework that extends the work of [4]; implement the resulting algorithm and evaluate its efficiency; implement the correctness proof in F*, Coq or Lean. The implemented library will be incorporated into the tool ProVerif. Bibliography: [1] https://www.fstar-lang.org [2] https://coq.inria.fr/ [3] Franz Baader and Klaus U. Schulz. Unification in the Union of Disjoint Equational Theories: Combining Decision Procedures. Journal of Symbolic Computation, Volume 21, Issue 2, 1996. [4] Bruno Blanchet, Martín Abadi, and Cédric Fournet. Automated Verification of Selected Equivalences for Security Protocols. Journal of Logic and Algebraic Programming, 75(1):3-51, February-March 2008. [5] https://en.wikipedia.org/wiki/Lean_(proof_assistant)
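As a small illustration of equality modulo an equational theory, the sketch below decides equality for the single example equation dec(enc(x,y),y) = x by orienting it as a rewrite rule and comparing normal forms. It covers only equality checking for this one rule, not unification and not unions of theories. The term encoding (strings for names, tuples for function applications) and the function names are assumptions made for the example.

```python
def normalise(term):
    """Rewrite a term bottom-up with the rule dec(enc(x, y), y) -> x.

    A term is either a string (a name or variable, treated as a constant)
    or a tuple (symbol, arg1, ..., argn).
    """
    if isinstance(term, str):
        return term
    symbol, *args = term
    args = [normalise(arg) for arg in args]
    if symbol == "dec" and len(args) == 2:
        cipher, key = args
        is_enc = isinstance(cipher, tuple) and len(cipher) == 3 and cipher[0] == "enc"
        if is_enc and cipher[2] == key:
            return cipher[1]  # already in normal form, as arguments were normalised
    return (symbol, *args)


def equal_modulo(left, right):
    """Two terms are equal modulo the rule iff their normal forms coincide."""
    return normalise(left) == normalise(right)


plain = normalise(("dec", ("enc", "m", "k"), "k"))
nested = normalise(("dec", ("enc", ("dec", ("enc", "m", "k1"), "k1"), "k2"), "k2"))
wrong_key = normalise(("dec", ("enc", "m", "k"), "k2"))
```

Decryption with the wrong key leaves the term unchanged, which is how the symbolic model expresses that the message is not recovered. Theories such as XOR or abelian groups cannot be handled by a terminating rewrite system this simple, which is part of what makes combining them difficult.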
Generation and verification of Proof Certificates for cryptographic protocols	Vincent Cheval	Security		C	MSc	Security protocols aim at securing communications. They are used in various applications (e.g. establishment of secure channels over the Internet, secure messaging, electronic voting, mobile communications, etc.). Their design is known to be error-prone, and flaws are difficult to fix once a protocol is widely deployed. Hence a common practice is to analyze the security of a protocol using formal techniques and in particular automatic tools. This approach has led to the development of several verification tools such as ProVerif (https://proverif.inria.fr), Tamarin (https://tamarin-prover.github.io) and DeepSec (https://deepsec-prover.github.io) that rely on symbolic models to express protocols and their security properties. Symbolic, automated verification has been a very successful approach for revealing vulnerabilities or proving the absence of entire families of attacks. For instance, these tools were used to verify the security of several major protocols (e.g. TLS 1.3, 5G protocols, Belenios electronic voting protocol). Given how critical these security protocols are, it is crucial to guarantee that possible bugs in these automatic tools do not affect the correctness of the security proofs. In addition, even though the efficiency of these tools has significantly improved over the years, the verification of real-life protocols may require a large amount of time and memory to terminate. For instance, when the verification requires more than 200GB of memory, as was the case for verifying privacy properties of an extension of TLS 1.3, the reproducibility of the verification results is severely affected. To tackle this problem, the approach taken in this project is to develop proof certificates that can be verified by a machine-checked verifier. 
One of the main difficulties is to ensure that certificates are sufficiently small in practice (so that they can be easily exchanged) while guaranteeing a relatively small verification time. The certificate generation and verification will be based on ProVerif’s internal procedure. Objectives: define the formal content of the certificate to be generated by ProVerif; propose an algorithm to verify the certificate and prove its correctness (by hand); implement the generation of certificates in ProVerif; implement the prover and its proof of correctness in F* [1]. We do not expect the last objective to be fully completed, as the implementation in F* may be extremely time-consuming. However, a partial implementation will be developed. [1] https://www.fstar-lang.org [2] Bruno Blanchet, Vincent Cheval, and Véronique Cortier. ProVerif with lemmas, induction, fast subsumption, and much more. In Proceedings of the 43rd IEEE Symposium on Security and Privacy (S&P’22). IEEE Computer Society Press, May 2022. Prerequisites: Logic and Proof, Functional Programming (CAFV is a plus)
Handling cryptographic primitives with complex algebraic properties	Vincent Cheval	Security	B	C	MSc	Security protocols aim at securing communications. They are used in various applications (e.g. establishment of secure channels over the Internet, secure messaging, electronic voting, mobile communications, etc.). Their design is known to be error-prone, and flaws are difficult to fix once a protocol is widely deployed. Hence a common practice is to analyze the security of a protocol using formal techniques and in particular automatic tools. This approach has led to the development of several verification tools such as ProVerif (https://proverif.inria.fr), Tamarin (https://tamarin-prover.github.io) and DeepSec (https://deepsec-prover.github.io) that rely on symbolic models to express protocols and their security properties. Symbolic, automated verification has been a very successful approach for revealing vulnerabilities or proving the absence of entire families of attacks. For instance, these tools were used to verify the security of several major protocols (e.g. TLS 1.3, 5G protocols, Belenios electronic voting protocol). Internally, these tools express cryptographic messages using terms. The algebraic properties of the cryptographic primitives are then expressed by means of an equational theory on terms. For example, the equation dec(enc(x,y),y) = x expresses that decrypting (“dec”) with a key “y” the encryption of a message “x” under the same key “y”, i.e. “enc(x,y)”, yields the message “x”. Equality of terms modulo the equational theory is one of the core problems that these tools must solve. Although unification algorithms are known for many primitives with complex algebraic properties, such as finite variant, XOR with homomorphism, abelian groups, etc., these algorithms only work when the equations do not mix primitives, in other words when the equational theories are distinct. 
We aim to develop a new efficient unification algorithm for the union of non-distinct (i.e. non-disjoint) equational theories. Proofs of correctness would first be done by hand, but an implementation in F* [1] (or Coq [2]) would be of interest. Objectives: 1. get familiar with the currently known algorithms for unification in the union of distinct equational theories [3]; 2. create a semi-decision algorithm for unification in a general union of equational theories and establish its limitations, based on a general framework that extends the work of [4]; 3. implement the resulting algorithm and evaluate its efficiency; 4. implement the correctness proof in F* or Coq. The implemented library will be incorporated in the tool ProVerif. [1] https://www.fstar-lang.org [2] https://coq.inria.fr/ [3] Franz Baader and Klaus U. Schulz. Unification in the Union of Disjoint Equational Theories: Combining Decision Procedures. Journal of Symbolic Computation, 21(2), 1996. [4] Bruno Blanchet, Martín Abadi, and Cédric Fournet. Automated Verification of Selected Equivalences for Security Protocols. Journal of Logic and Algebraic Programming, 75(1):3-51, February-March 2008. Pre-requisites: Logic and Proof, Functional Programming
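To illustrate what equality modulo an equational theory means for the example equation dec(enc(x,y),y) = x, here is a minimal Python sketch. It is not how ProVerif represents terms: the tuple encoding and the function names `normalize` and `equal_modulo` are assumptions made for this illustration, and only this single equation, read as a left-to-right rewrite rule, is handled.

```python
# Terms are nested tuples such as ("enc", m, k) or ("dec", c, k); atoms are strings.
# This encoding is hypothetical and chosen only for the illustration.

def normalize(term):
    """Rewrite a term bottom-up with the single rule dec(enc(x, y), y) -> x."""
    if isinstance(term, str):
        return term
    head, *args = term
    args = [normalize(a) for a in args]
    if head == "dec" and len(args) == 2:
        cipher, key = args
        # The rule applies only when the decryption key matches the encryption key.
        if isinstance(cipher, tuple) and len(cipher) == 3 \
                and cipher[0] == "enc" and cipher[2] == key:
            return cipher[1]
    return (head, *args)


def equal_modulo(s, t):
    """Two terms are equal modulo the equation if their normal forms coincide."""
    return normalize(s) == normalize(t)


message = ("dec", ("enc", "m", "k"), "k")
print(normalize(message))                                       # the plain message
print(equal_modulo(("dec", ("enc", "m", "k"), "other"), "m"))   # wrong key
```

Comparing normal forms works here because the single rule is terminating and confluent. Theories such as XOR or abelian groups do not reduce to a rewrite system this simple, which is where dedicated unification algorithms come in.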
Multicore overhaul of the cryptographic protocol verifier ProVerif	Vincent Cheval	Security	B			Security protocols aim at securing communications. They are used in various applications (e.g. establishment of secure channels over the Internet, secure messaging, electronic voting, mobile communications, etc.). Their design is known to be error prone and flaws are difficult to fix once a protocol is widely deployed. Hence a common practice is to analyze the security of a protocol using formal techniques and in particular automatic tools. This approach has led to the development of several verification tools such as ProVerif (https://proverif.inria.fr), Tamarin (https://tamarin-prover.github.io) and DeepSec (https://deepsec-prover.github.io) that rely on symbolic models to express protocols and their security properties. Symbolic, automated verification has been a very successful approach for revealing vulnerabilities or proving the absence of entire families of attacks. For instance, these tools were used to verify the security of several major protocols (e.g. TLS 1.3, 5G protocols, Belenios electronic voting protocol). Of the three aforementioned tools, ProVerif is generally considered to be the most efficient. It is also the only one that does not rely on distributed or parallel computing. ProVerif’s internal procedure is based on the saturation of sets of Horn clauses, which is inherently more suited to sequential computing. However, several time-consuming algorithms used in the saturation procedure can greatly benefit from parallel computing, such as Horn clause subsumption checking, and unification and matching of terms modulo an equational theory. This project aims to transform many algorithms used in ProVerif using the multicore programming framework provided in OCaml 5. We can expect a significant speedup as, for instance, Horn clause subsumption checking usually accounts for 80 to 90% of ProVerif’s total execution time.
One of the difficulties, however, will be to appropriately combine parallel algorithms with other optimization features already implemented that are specifically tailored for sequential computing (such as hash consing techniques). Objectives: 1. get familiar with ProVerif’s saturation procedure (see [1]; the paper [2] is also useful for a broader view of ProVerif’s theory); 2. identify the components in ProVerif’s saturation procedure that can benefit from concurrent programming; 3. design the new algorithms for these components; 4. evaluate the efficiency impact on the overall execution time (a benchmark is provided). The library will be incorporated into the official release of ProVerif. Bibliography: Reading the papers is not mandatory to complete the project but it will provide good context for it. [1] Bruno Blanchet, Vincent Cheval, and Véronique Cortier. ProVerif with lemmas, induction, fast subsumption, and much more. In Proceedings of the 43rd IEEE Symposium on Security and Privacy (S&P’22). IEEE Computer Society Press, May 2022. [2] Bruno Blanchet. Modelling and Verifying Security Protocols with the Applied Pi Calculus and ProVerif. Foundations and Trends in Privacy and Security, 1(1-2):1-135, October 2016. Pre-requisites: Logic and Proof, Functional Programming, Concurrent Programming
Reducing memory consumption of the cryptographic protocol verifier ProVerif with hash consing techniques	Vincent Cheval	Security	B	C	MSc	Security protocols aim at securing communications. They are used in various applications (e.g. establishment of secure channels over the Internet, secure messaging, electronic voting, mobile communications, etc.). Their design is known to be error prone and flaws are difficult to fix once a protocol is widely deployed. Hence a common practice is to analyze the security of a protocol using formal techniques and in particular automatic tools. This approach has led to the development of several verification tools such as ProVerif (https://proverif.inria.fr), Tamarin (https://tamarin-prover.github.io) and DeepSec (https://deepsec-prover.github.io) that rely on symbolic models to express protocols and their security properties. Symbolic, automated verification has been a very successful approach for revealing vulnerabilities or proving the absence of entire families of attacks. For instance, these tools were used to verify the security of several major protocols (e.g. TLS 1.3, 5G protocols, Belenios electronic voting protocol). While the recent overhaul of ProVerif [1] significantly improved its efficiency, it did not really affect its memory consumption. Indeed, complex real-life protocols such as TLS require between 10 and 100 GB of memory, depending on the type of security properties being proved. This memory problem is crucial as we reach the limits of the capabilities of our computers. We aim to change the internal representation of messages within ProVerif by using hash consing techniques [2], which maximize the sharing of memory. In particular, two semantically equal messages will be guaranteed to be physically equal as well (they will point to the same address in memory). This will require reworking most of the implemented algorithms operating on messages (resolution, unification, matching, subsumption, etc.).
The internal algorithm of ProVerif being based mostly on the saturation of Horn clauses, a maximal sharing of memory within each Horn clause, but also within the set of Horn clauses generated by ProVerif, will allow us to considerably reduce the memory consumption of ProVerif. Maximizing the memory sharing within one clause is fairly straightforward. However, for sets of Horn clauses, the problem becomes much harder due to the presence of variables that sometimes need to be considered distinct in two Horn clauses. Objectives: 1. get familiar with hash consing techniques and implement a library for managing the new representation of terms; 2. study the classical algorithm for unification on DAG terms and propose an adaptation for our new structure of terms; 3. create similar algorithms for resolution, matching, subsumption, and unification modulo an equational theory; 4. propose an algorithm for sharing memory on a set of Horn clauses and find the complexity upper bound of the problem; 5. implement these algorithms and test them on provided benchmarks to evaluate their efficiency. Bibliography: Reading the papers and slides in the bibliography is not mandatory to complete the project but it will provide good context for it. [1] Bruno Blanchet, Vincent Cheval, and Véronique Cortier. ProVerif with lemmas, induction, fast subsumption, and much more. In Proceedings of the 43rd IEEE Symposium on Security and Privacy (S&P’22). IEEE Computer Society Press, May 2022. [2] Jean-Christophe Filliâtre and Sylvain Conchon. Type-safe modular hash-consing. ML 2006: 12-19. [3] https://www3.risc.jku.at/education/courses/ss2018/unification/slides/02_Syntactic_Unification_Improved_Algorithms_handout.pdf Pre-requisites: Logic and Proof, Functional Programming
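The core idea of hash consing, that structurally equal terms are built only once and then shared, can be sketched in a few lines of Python. This is an illustration only: ProVerif is written in OCaml, and the class name `TermTable` and its methods are hypothetical. Variables and the sharing across sets of Horn clauses, which the description identifies as the hard part, are not addressed.

```python
class TermTable:
    """Hash-consing table: structurally equal terms share one object."""

    def __init__(self):
        self._table = {}

    def mk(self, symbol, *args):
        """Return the unique node for symbol(args), creating it on first use."""
        # Arguments are assumed to be hash-consed already, so their identities
        # determine the structure of the new term. The table keeps every node
        # alive, so an id() is never reused for a different term.
        key = (symbol, tuple(id(a) for a in args))
        node = self._table.get(key)
        if node is None:
            node = (symbol, args)
            self._table[key] = node
        return node

    def size(self):
        """Number of distinct terms stored."""
        return len(self._table)


table = TermTable()
first = table.mk("enc", table.mk("m"), table.mk("k"))
second = table.mk("enc", table.mk("m"), table.mk("k"))
print(first is second)   # the same object is returned both times
print(table.size())      # only m, k and enc(m, k) are stored
```

With this representation, checking syntactic equality of two terms becomes a pointer comparison (`is`), at the cost of one table lookup per term construction.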
Verifying privacy-type properties in a probabilistic setting	Vincent Cheval	Security		C	MSc	Security protocols aim at securing communications. They are used in various applications (e.g. establishment of secure channels over the Internet, secure messaging, electronic voting, mobile communications, etc.). Their design is known to be error prone and flaws are difficult to fix once a protocol is widely deployed. Hence a common practice is to analyze the security of a protocol using formal techniques and in particular automatic tools. This approach has led to the development of several verification tools such as ProVerif (https://proverif.inria.fr), Tamarin (https://tamarin-prover.github.io) and DeepSec (https://deepsec-prover.github.io) that rely on symbolic models to express protocols and their security properties. Symbolic, automated verification has been a very successful approach for revealing vulnerabilities or proving the absence of entire families of attacks. For instance, these tools were used to verify the security of several major protocols (e.g. TLS 1.3, 5G protocols, Belenios electronic voting protocol). The DeepSec prover specializes in the verification of security properties that can be expressed as behavioural equivalences, such as anonymity, vote privacy, unlinkability, strong secrecy, etc. It currently implements a procedure for checking “trace equivalence” between processes. The paper that describes the theory behind DeepSec also describes a theoretical procedure for proving two stronger behavioural equivalences, namely similarity and bisimilarity. These procedures are still considered “theoretical” as they are heavily non-deterministic, which prevents efficient implementations. One part of this project thus aims to design an effective and efficient procedure for checking bisimilarity and similarity. Currently, symbolic models are purely non-deterministic.
For instance, the random generation of numbers is abstracted away, and random numbers are assumed to be impossible for the attacker to guess. While this is generally sensible, it is not always possible to eliminate probabilities altogether: some protocols specifically rely on probabilistic coin flips in their specification, and a recent paper [2] has shown that giving the attacker the ability to use probabilistic choices can enable them to break behavioural equivalence. This paper provides the general probabilistic model, but only describes a procedure for checking probabilistic equivalence in a restrictive case, based on DeepSec's internal procedure.
The second part of this project will aim to design a (potentially theoretical) procedure for verifying probabilistic equivalence in the general setting. Objectives: 1. get familiar with the internal procedure of DeepSec [1]; 2. design and implement an efficient algorithm for checking similarity/bisimilarity; 3. design an algorithm for checking probabilistic equivalences in the general setting; 4. prove the correctness of all designed algorithms. The implemented library will be incorporated in the tool DeepSec. [1] V. Cheval, S. Kremer, and I. Rakotonirina. DEEPSEC: Deciding Equivalence Properties in Security Protocols - Theory and Practice. In Proceedings of the 39th IEEE Symposium on Security and Privacy (S&P'18). IEEE Computer Society Press, 2018. [2] Vincent Cheval, Raphaëlle Crubillé, and Steve Kremer. Symbolic protocol verification with dice. J. Comput. Secur., 31(5):501–538, 2023. Pre-requisites: Logic and Proof, Functional Programming, Probabilistic Model Checking
Verifying security protocols with exclusive-or using PROVERIF	Vincent Cheval	Security	B	C		Prerequisites: Logic and Proof, Functional Programming, Model Checking. Background: Security protocols are distributed programs designed to ensure security properties (such as confidentiality, authentication, and anonymity) using cryptographic primitives. These protocols are widely employed in a variety of applications, including online commerce, banking systems, mobile devices, and electronic voting. Formal methods have proven to be highly effective in the design and verification of security protocols. They offer rigorous frameworks and techniques that have helped uncover subtle flaws and vulnerabilities in protocol designs, e.g. [1,2]. There exist several models for analyzing security protocols. One widely used approach is the symbolic model, where messages are abstractly represented as terms. When protocols involve operators such as exclusive OR (XOR), it becomes essential to reason modulo the equational theory of XOR, which is associative, commutative, and has additional properties such as nilpotency (x ⊕ x = 0) and the existence of a neutral element (x ⊕ 0 = x). Taking such properties into account significantly complicates protocol analysis, and many existing tools struggle to handle them effectively. One of the most popular tools in the field of symbolic verification of security protocols, ProVerif [5], is based on a translation of the verification problem into Horn clauses, coupled with a resolution procedure that is proven correct. However, while correctness is guaranteed, termination is not: it is observed in practice but not ensured in theory. ProVerif has been the subject of initial research aimed at supporting AC (associative-commutative) operators, such as XOR or Diffie-Hellman exponentiation, through preprocessing techniques [4]. However, this approach suffers from several limitations and has not been integrated into the main tool.
An alternative direction, currently being explored by our team, involves reducing XOR to an AC operator using the so-called finite variant property. While promising, this approach raises concerns: the unification algorithm, a core component of ProVerif, is significantly more expensive modulo AC than it is modulo the theory of XOR, which enjoys better computational properties. Objectives: The first objective of this internship will be to become familiar with the ProVerif tool and its underlying resolution-based procedure. Then, the main goal will be to adapt ProVerif's resolution procedure to natively handle the XOR operator, i.e., without relying on a reduction from XOR to a generic AC operator. This work will involve the following steps: 1. Propose a new resolution procedure, by adapting the existing one to support XOR directly; 2. Prove the correctness of the proposed procedure; 3. Implement the procedure by modifying the current codebase, and evaluate its practical behavior, especially regarding termination. To this end, a set of case studies, such as those presented in [3,4], will be considered. Naturally, these three steps are interconnected. If the procedure fails to terminate in practice, it will be necessary to return to step 1 and refine the method accordingly. The project can be adapted depending on the candidate’s academic level (Part B, C or MSc). Expected skills: We are looking for candidates with a solid background in the foundations of computer science, particularly in areas such as logic and automated deduction. Some prior knowledge of security is an asset but is not mandatory. However, a strong interest in programming is essential. The ProVerif tool is implemented in OCaml. This internship may lead to a PhD thesis on similar topics with possible funding through ongoing research projects (e.g., the SVP project, ERC Synergy) within the team. Collaboration: This project will be co-supervised with Prof.
Stéphanie Delaune from the IRISA laboratory in Rennes (France). A visit to Rennes may be considered for an in-person meeting with Prof. Stéphanie Delaune. References: [1] Bruno Blanchet. An efficient cryptographic protocol verifier based on Prolog rules. In 14th IEEE Computer Security Foundations Workshop (CSFW-14 2001), 11-13 June 2001, Cape Breton, Nova Scotia, Canada, pages 82–96. IEEE Computer Society, 2001. [2] Bruno Blanchet, Vincent Cheval, and Véronique Cortier. ProVerif with lemmas, induction, fast subsumption, and much more. In Proceedings of the 43rd IEEE Symposium on Security and Privacy (S&P’22). IEEE Computer Society Press, 2022. [3] Jannik Dreier, Lucca Hirschi, Sasa Radomirovic, and Ralf Sasse. Automated unbounded verification of stateful cryptographic protocols with exclusive OR. In 31st IEEE Computer Security Foundations Symposium, CSF 2018, Oxford, United Kingdom, July 9-12, 2018, pages 359–373. IEEE Computer Society, 2018. [4] Ralf Küsters and Tomasz Truderung. Reducing protocol analysis with XOR to the XOR-free case in the Horn theory based approach. J. Autom. Reason., 46(3-4):325–352, 2011. [5] ProVerif. https://bblanche.gitlabpages.inria.fr/proverif/, 2001.
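As a small illustration of reasoning modulo the XOR theory (associativity, commutativity, nilpotency x ⊕ x = 0 and neutral element x ⊕ 0 = x), the Python sketch below computes a normal form for XORs of atomic names. It assumes the operands are atoms only, with no nested function symbols or variables, so it shows the equational theory rather than the unification or resolution procedure the project targets. The names `atom`, `xor` and `ZERO` are hypothetical.

```python
# An XOR of atoms is represented by the set of atoms occurring an odd number
# of times. The empty set plays the role of the neutral element 0.
ZERO = frozenset()


def atom(name):
    """An atomic message, seen as a one-element XOR."""
    return frozenset([name])


def xor(*terms):
    """XOR of normal forms: symmetric difference of the underlying sets."""
    result = ZERO
    for term in terms:
        result = result ^ term
    return result


a, b, c = atom("a"), atom("b"), atom("c")
print(xor(a, a) == ZERO)                        # nilpotency
print(xor(a, ZERO) == a)                        # neutral element
print(xor(a, xor(b, c)) == xor(xor(c, a), b))   # associativity and commutativity
```

Because sets are unordered and symmetric difference cancels repeated elements, two XOR expressions over atoms are equal modulo the theory exactly when their normal forms are the same set.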
Detecting and Modelling Mirrors for 3D Scene Reconstruction	Ronnie Clark	Artificial Intelligence and Machine Learning, Systems		C	MSc	Elements such as mirrors and glass still present a major difficulty for simultaneous localisation and mapping (SLAM) and 3D reconstruction algorithms [1]. This project will address this challenge by developing a method for accurately identifying mirrors in a scene and accounting for their unique properties in the reconstruction process. The goal will be to improve the fidelity and accuracy of the 3D models. The work will involve aspects of computer vision, geometric modelling, and possibly ray tracing techniques. [1] https://www.thomaswhelan.ie/Whelan18siggraph.pdf Pre-requisites: Suitable for those who have taken a course in machine learning or computer graphics
Enhanced Single Image Depth Prediction using a Percentile-based Loss	Ronnie Clark	Artificial Intelligence and Machine Learning, Systems		C	MSc	Single image depth prediction is a common task in computer vision which involves predicting the distance from the camera to the scene for each pixel in an image. This problem has broad applications in autonomous systems, virtual and augmented reality, as well as graphics. However, a significant challenge in single image depth prediction is the estimation of the absolute scene scale. Therefore, most depth estimation methods focus on predicting relative depth rather than absolute measurements [1]. This project aims to explore a new relative depth parameterisation based on the percentile ranking of depth values. The student will be expected to train a model using this parameterisation and evaluate its effectiveness by comparing its performance with existing methods on widely-used datasets such as NYUv2 [2]. [1] https://arxiv.org/pdf/1907.01341v3.pdf [2] https://cs.nyu.edu/~silberman/datasets/nyu_depth_v2.html Pre-requisites: Suitable for those who have taken a course in machine learning
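The description does not specify the percentile parameterisation, so the following Python sketch shows only one possible reading, labelled as an assumption: each depth value is replaced by the fraction of values in the same image that are less than or equal to it. The function name `percentile_ranks` is hypothetical. The sketch illustrates why such a target is attractive for relative depth: it does not change when the whole scene is rescaled.

```python
from bisect import bisect_right


def percentile_ranks(depths):
    """Map each depth to the fraction of depths that are <= it, in (0, 1]."""
    ordered = sorted(depths)
    count = len(ordered)
    return [bisect_right(ordered, d) / count for d in depths]


depths = [1.0, 4.0, 2.0, 8.0]                 # hypothetical per-pixel depths
scaled = [3.0 * d for d in depths]            # same scene at an unknown scale
print(percentile_ranks(depths))
print(percentile_ranks(depths) == percentile_ranks(scaled))
```

A training loss would then compare predicted and ground-truth ranks; how to make the ranking differentiable is part of what the project would need to investigate.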
Low Rank training of Neural Fields	Ronnie Clark	Artificial Intelligence and Machine Learning, Systems		C	MSc	Neural fields [1] have emerged as a promising method for representing 3D data, utilising Multi-Layer Perceptrons (MLPs) to predict the properties of a field at every point in space. Despite their potential, a significant drawback is the lengthy training process, often due to the large size of the MLPs. This project aims to explore the application of low-rank adaptation (LoRA) [2] in enhancing the training efficiency of neural fields. An essential part of this investigation will involve benchmarking the performance of low-rank training against full training across various types of fields, such as Signed Distance Functions (SDFs) or radiance fields, to evaluate the effectiveness of LoRA in this context. [1] https://neuralfields.cs.brown.edu/siggraph23.html [2] https://arxiv.org/pdf/2106.09685.pdf Pre-requisites: Suitable for those who have taken a course in machine learning or computer graphics
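For readers new to LoRA [2], the sketch below shows the low-rank update on a single weight matrix in plain Python: the frozen weight W (d x k) is adapted as W + BA, with B of size d x r and A of size r x k, so only r(d + k) parameters are trained instead of dk. The helper names and the tiny matrices are hypothetical, and how best to apply this to the MLP of a neural field is the open question of the project.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def lora_forward(W, A, B, x):
    """Compute (W + B A) x without materialising the d x k update B A."""
    base = matmul(W, x)
    update = matmul(B, matmul(A, x))
    return [[p + q for p, q in zip(r1, r2)] for r1, r2 in zip(base, update)]


def trainable_params(d, k, r):
    """Parameters trained for one d x k layer: full training versus rank r."""
    return {"full": d * k, "lora": r * (d + k)}


W = [[1, 0], [0, 1]]      # frozen 2 x 2 weight
B = [[1], [2]]            # 2 x 1 trainable factor
A = [[3, 4]]              # 1 x 2 trainable factor
x = [[1], [1]]            # input as a column vector
print(lora_forward(W, A, B, x))
print(trainable_params(256, 256, 4))
```

For a 256 x 256 layer and rank 4, the count drops from 65536 to 2048 trainable parameters, which is the kind of saving the benchmarking would weigh against reconstruction quality.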
Unsupervised Visual Learning using Segment-Masks	Ronnie Clark	Artificial Intelligence and Machine Learning, Systems		C	MSc	Masked language modelling is the prevalent method for pretraining large language models (LLMs). This involves 'masking' words in the text and prompting the model to predict the hidden words. This approach has also been adapted in computer vision, notably in Vision Transformers (ViTs) [1], where patches of images are dropped, and the model is tasked with predicting these missing regions. However, in the vision context this process isn't ideal, as patches in images don't correspond directly to words in text. Unlike words, which represent complete concepts, image patches can contain parts of various objects. This project will investigate a novel approach to vision pretraining: using segmentation masks instead of patches. A key aspect of the project will be comparing the effectiveness of segmentation-based training against the traditional patch-based method in downstream tasks, such as semantic segmentation or depth estimation. [1] https://arxiv.org/pdf/2111.06377.pdf Pre-requisites: Suitable for those who have taken a course in machine learning
Topics in Online Algorithms and Learning-Augmented Algorithms	Christian Coester	Algorithms and Complexity Theory	B	C	MSc	An online algorithm is an algorithm that receives its input over time, e.g. as a sequence of requests that need to be served. The algorithm must act on each request upon its arrival, without knowledge of future requests, often with the goal of minimising some cost. (For example, assigning taxis to ride requests with the goal of minimising the distance travelled by taxis.) Due to the lack of information about future requests, the algorithm cannot always make optimal decisions, but for many problems there exist algorithms that provably find solutions whose cost is within a bounded factor of the optimum, regardless of the input. An emerging related field of research is learning-augmented algorithms, where the algorithm receives some (perhaps erroneous) predictions about the future as additional input. In this project, the student will work on a selected problem in the field of online algorithms or learning-augmented algorithms, with the goal of designing algorithms and theoretically analysing their performance. The project is suitable for mathematically oriented students with an interest in proving theorems about the performance of algorithms.
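To make "cost within a bounded factor of the optimum" concrete, here is a Python sketch of the classic ski rental problem, a standard textbook example that is not taken from the project description. Renting costs 1 per day, buying costs `buy_price`, and the number of ski days is unknown in advance. The break-even strategy rents for `buy_price - 1` days and buys on the next day; its cost is at most (2 - 1/buy_price) times the offline optimum. The function names are hypothetical.

```python
def online_cost(buy_price, ski_days):
    """Cost of the break-even strategy: rent for buy_price - 1 days, then buy."""
    if ski_days < buy_price:
        return ski_days                      # season ended before buying
    return (buy_price - 1) + buy_price       # rentals paid, then the purchase


def offline_optimum(buy_price, ski_days):
    """Best cost when the number of ski days is known in advance."""
    return min(ski_days, buy_price)


buy_price = 10
worst_ratio = max(
    online_cost(buy_price, days) / offline_optimum(buy_price, days)
    for days in range(1, 101)
)
print(worst_ratio)   # never exceeds 2 - 1/buy_price
```

A learning-augmented variant would additionally receive a prediction of the number of ski days and aim to do better when the prediction is accurate while keeping a bounded ratio when it is not.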
Taming PETS: Privacy-Enhancing Technologies	Graham Cormode		B	C	MSc	Prerequisites: Computer Security or Probability. Background: Almost all data of interest to modern analysis and machine learning tasks derives from human behaviour and characteristics, and so is considered sensitive. The area of Privacy Enhancing Technologies (PETS) seeks to understand how technical solutions can contribute to the responsible and private handling of data from individuals, while still ensuring that patterns and models can be extracted accurately. Over recent years, many important methods have been developed under different paradigms, such as Differential Privacy, Synthetic Data Generation, Federated Learning, Secure Multi-party Computation, Homomorphic Encryption, and Zero-knowledge Proofs. For any given task, applying one of these modalities comes with a set of tradeoffs: how much trust is placed in different parties? What accuracy can be achieved compared to non-private computation? What are the computational overheads and dependence on other parameters? Focus: The goal of this project is to identify and study a data analysis task of interest, e.g., k-means clustering or logistic regression, through the lens of a suitable PET, e.g., differential privacy or federated learning. The student will work under my guidance to study the relevant literature on the topic, and propose privacy-preserving techniques to perform the analysis of interest, by applying methods that have been published or coming up with novel variants. A typical project will involve an implementation of two or more algorithms, to allow their comparison and study of their properties on realistic data sets. Some examples that would be of interest: Secure Aggregation (SecAgg) + Invertible Bloom Look-up Tables (IBLT).
Secure Aggregation is a general-purpose primitive that allows multiple parties to pool their information, so that only the sum of all the inputs can be extracted from the protocol, and nothing else. Invertible Bloom Look-up Tables are a practical data structure that allows small sets to be retrieved, and the tables can be combined via arithmetic addition. SecAgg+IBLT represents a potentially powerful combination for lightweight private computation, allowing sampling, distributional analysis and frequent item mining. Tracking the statistical distribution of data that is also spread across multiple parties is important to detect changes in the behaviour of an aggregate population. The goal of this project is to develop techniques for taking statistical estimators for measures like the KL divergence of a dataset, and evaluating them under the models of local differential privacy and distributed differential privacy. Synthetic data is a popular way to take a sensitive dataset, and replace it with an entirely fabricated dataset which nevertheless shares many of the same properties. There are two leading approaches to this task: one based on graphical models of correlations between features in the data, and the other based on methods from deep learning (GANs, VAEs, diffusion models etc.). The aim of this project is to compare the strengths and weaknesses of these two paradigms, and to seek ways to combine them, or unify them into a common framework. Method: I will work closely with any student wishing to do a project on PETS to help develop a bespoke set of objectives and accompanying reading list. The common goal of any project is to gain a deeper understanding of these methods, and to contribute new insights to the community.
A successful project will almost always involve performing an empirical comparison of different algorithms, by making your own implementations (leveraging existing libraries) and evaluating on public benchmark datasets. Some useful initial reading for each possible topic includes: Federated Learning and Privacy (Bonawitz, Kairouz, McMahan, Ramage, CACM, https://dl.acm.org/doi/10.1145/3500240 ); Synthetic Tabular Data: Methods, Attacks and Defenses (C., Maddock, Ullah, Gade, KDD, https://dl.acm.org/doi/10.1145/3711896.3736562 ); Differential Privacy for Databases (Near, He, Foundations and Trends, https://dpfordb.github.io/dpfordb.pdf ); Proofs, Arguments and Zero-knowledge Proofs (Thaler, Foundations and Trends, https://people.cs.georgetown.edu/jthaler/ProofsArgsAndZK.html ).
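The property that only the sum of the inputs is revealed can be illustrated with pairwise additive masking, the basic idea behind secure aggregation protocols. The Python sketch below is a toy under stated assumptions: a seeded `random.Random` stands in for the pairwise key agreement that real protocols use, it is not cryptographically secure, and dropouts are not handled. The names `masked_inputs`, `aggregate` and `MODULUS` are hypothetical.

```python
import random

MODULUS = 2**32


def masked_inputs(values, seed=0):
    """Mask each input so that the masks cancel only in the overall sum."""
    # Assumption: a shared seeded generator replaces pairwise key agreement.
    rng = random.Random(seed)
    masked = [v % MODULUS for v in values]
    for i in range(len(values)):
        for j in range(i + 1, len(values)):
            mask = rng.randrange(MODULUS)
            masked[i] = (masked[i] + mask) % MODULUS   # party i adds the mask
            masked[j] = (masked[j] - mask) % MODULUS   # party j subtracts it
    return masked


def aggregate(masked):
    """The server sums the masked values; every pairwise mask cancels."""
    return sum(masked) % MODULUS


values = [3, 10, 25]                 # private inputs of three parties
uploads = masked_inputs(values)
print(aggregate(uploads))            # equals 3 + 10 + 25
```

Since IBLTs can be combined by arithmetic addition, the same masking can in principle be applied cell by cell to an IBLT, which is the combination the SecAgg+IBLT example points to.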
Efficient Similarity Search in RDF Knowledge Graphs	Bernardo Cuenca Grau	Data, Knowledge and Action		C	MSc	Large-scale retrieval systems increasingly combine vector-based similarity search (for embedding-driven semantic retrieval) with graph-based indexes (for structured relationships and reasoning). However, efficiently joining results across these heterogeneous indexes remains an open research challenge. This project aims to design and evaluate algorithms that perform joins between vector and graph indexes, implemented in C++ for high performance. The work is inspired by RDFox’s in-memory reasoning and query optimization techniques but extends them to operate in hybrid neural–symbolic settings. This project presents the opportunity to work with Oxford Semantic Technologies, one of the Computer Science department’s spinout companies and success stories. As well as helping candidates build their CV, the project offers strong candidates the opportunity of a summer internship with Oxford Semantic Technologies. Background reading: Aidan Hogan et al. Knowledge Graphs. Synthesis Lectures on Data, Semantics, and Knowledge, Morgan & Claypool Publishers 2021, ISBN 978-3-031-00790-3, pp. 1-257. Heiko Paulheim. Knowledge graph refinement: A survey of approaches and evaluation methods. Semantic Web 8(3): 489-508 (2017). Jason Mohoney et al. High-Throughput Vector Similarity Search in Knowledge Graphs. Proc. ACM Manag. Data 1(2): 197:1-197:25 (2023).
Graph Analytics in RDF Knowledge Graphs	Bernardo Cuenca Grau	Data, Knowledge and Action		C	MSc	RDF Knowledge Graphs (KGs) are widely used for representing complex, interconnected data in domains such as the semantic web, bioinformatics, and enterprise data integration, as they excel at storing, querying and reasoning about relationships between entities. However, while such systems provide rich semantic capabilities, they commonly lack native support for advanced graph analytics. Techniques such as shortest path computation are fundamental in graph theory and can reveal insights like entity proximity, semantic similarity, and relationship strength, which are important in applications such as recommendation systems. The goal of this project is to bridge this gap by combining the semantic query power of RDF Knowledge Graphs with efficient graph analytics algorithms to deliver scalable, graph-theoretic capabilities to semantic systems. To meet the high-performance demands of these applications, the integration will be written in C++. This project presents the opportunity to work with Oxford Semantic Technologies, one of the Computer Science department’s spinout companies and success stories. As well as helping candidates build their CV, the project offers strong candidates the opportunity of a summer internship with Oxford Semantic Technologies. Background reading: Aidan Hogan et al. Knowledge Graphs. Synthesis Lectures on Data, Semantics, and Knowledge, Morgan & Claypool Publishers 2021, ISBN 978-3-031-00790-3, pp. 1-257. Heiko Paulheim. Knowledge graph refinement: A survey of approaches and evaluation methods. Semantic Web 8(3): 489-508 (2017). Angela Bonifati, M. Tamer Özsu, Yuanyuan Tian, Hannes Voigt, Wenyuan Yu, Wenjie Zhang: A Roadmap to Graph Analytics. SIGMOD Rec. 53(4): 43-51 (2024).
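As a rough illustration of shortest path computation over RDF data, the Python sketch below runs a breadth-first search over a list of (subject, predicate, object) triples, treating each triple as an undirected edge and ignoring the predicate. The project itself targets C++ and an RDF store; this in-memory version, the function name `shortest_path` and the example IRIs are assumptions made purely for illustration.

```python
from collections import deque


def shortest_path(triples, source, target):
    """Breadth-first search over triples viewed as undirected edges."""
    adjacency = {}
    for subject, _predicate, obj in triples:
        adjacency.setdefault(subject, set()).add(obj)
        adjacency.setdefault(obj, set()).add(subject)
    parents = {source: None}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == target:
            path = []
            while node is not None:          # walk back to the source
                path.append(node)
                node = parents[node]
            return path[::-1]
        for neighbour in sorted(adjacency.get(node, ())):
            if neighbour not in parents:
                parents[neighbour] = node
                queue.append(neighbour)
    return None                              # target is unreachable


triples = [
    (":alice", ":knows", ":bob"),
    (":bob", ":worksAt", ":acme"),
    (":carol", ":worksAt", ":acme"),
    (":dave", ":knows", ":erin"),
]
print(shortest_path(triples, ":alice", ":carol"))
```

The length of the returned path is one simple proxy for entity proximity. Weighting edges by predicate, or restricting which predicates may be traversed, are natural refinements in a semantic setting.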
Knowledge Extraction from Natural Language	Bernardo Cuenca Grau	Data, Knowledge and Action		C	MSc	Knowledge Graphs (KGs) provide a structured representation of entities and their relationships, enabling powerful semantic querying and reasoning across diverse domains such as enterprise data integration, bioinformatics, and the semantic web. However, constructing high-quality KGs from unstructured sources remains a significant challenge. Enterprise textual data stored in web pages, manuals and documentation often contains rich information that must be accurately extracted, normalized, and linked to ontologies to ensure consistency and usability. The goal of this project is to design and implement a system for automated extraction of RDF Knowledge Graphs from textual data. The work will explore state-of-the-art techniques in natural language processing and entity recognition, leveraging Large Language Models and semantic indexing for semantic understanding and disambiguation. This project presents the opportunity to work with Oxford Semantic Technologies, one of the Computer Science department’s spinout companies and success stories. As well as helping candidates build their CV, the project offers strong candidates the opportunity of a summer internship with Oxford Semantic Technologies. Background reading: Aidan Hogan et al. Knowledge Graphs. Synthesis Lectures on Data, Semantics, and Knowledge, Morgan & Claypool Publishers 2021, ISBN 978-3-031-00790-3, pp. 1-257. Heiko Paulheim. Knowledge graph refinement: A survey of approaches and evaluation methods. Semantic Web 8(3): 489-508 (2017). Igor Melnyk, Pierre L. Dognin, Payel Das: Knowledge Graph Generation From Text. EMNLP (Findings) 2022: 1610-1622. Belinda Mo, Kyssen Yu, Joshua Kazdan, Proud Mpala, Lisa Yu, Chris Cundy, Charilaos I. Kanatsoulis, Sanmi Koyejo: KGGen: Extracting Knowledge Graphs from Plain Text with Language Models. CoRR abs/2502.09956 (2025).
Talking with data: Natural Language Querying of RDF Knowledge Graphs	Bernardo Cuenca Grau	Data, Knowledge and Action		C	MSc	Natural language interfaces to databases have become increasingly important as they allow users to query complex data without requiring deep technical knowledge of query languages. RDF Knowledge Graphs (KGs), which store structured semantic data, are typically queried using SPARQL, a powerful but syntactically demanding language. This creates a barrier for non-expert users who wish to retrieve information using simple, natural language queries. Bridging this gap requires robust techniques for translating human language into precise SPARQL queries while preserving semantic intent and handling ambiguity via semantic search. The goal of this project is to develop a system that converts natural language questions into SPARQL queries for RDF Knowledge Graphs, enabling intuitive access to semantic data. The work will explore state-of-the-art approaches in natural language processing and semantic parsing, leveraging LLMs and ontology-aware generation to ensure accurate query generation. To meet performance and scalability requirements, the implementation will target the RDF engine RDFox. Enhanced topics: Graph-based memory – storing conversation metadata in a conversation graph to provide additional memory-based context. Enhancement through reasoning – exploring how reasoning can be used to simplify the most frequently asked questions and increase the accuracy of the answers given. Working with small language models that target smaller devices. This project presents the opportunity to work with Oxford Semantic Technologies, one of the Computer Science department’s spinout companies and success stories. As well as helping candidates build their CV, the project offers strong candidates the opportunity of summer internships with Oxford Semantics. Background reading: Aidan Hogan et al. Knowledge Graphs. 
Synthesis Lectures on Data, Semantics, and Knowledge, Morgan & Claypool Publishers 2021, ISBN 978-3-031-00790-3, pp. 1-257 Heiko Paulheim. *Knowledge graph refinement: A survey of approaches and evaluation methods.* Semantic Web 8(3): 489-508 (2017) Katrin Affolter, Kurt Stockinger, Abraham Bernstein: A comparative survey of recent natural language interfaces for databases. VLDB J. 28(5): 793-819 (2019) Vincent Emonet, Jerven T. Bolleman, Severine Duvaud, Tarcisio Mendes de Farias, Ana Claudia Sima: LLM-based SPARQL Query Generation from Natural Language over Federated Knowledge Graphs. HGAIS@ISWC 2024 Jacopo D’Abramo, Andrea Zugarini, and Paolo Torroni. 2025. Investigating Large Language Models for Text-to-SPARQL Generation. In Proceedings of the 4th International Workshop on Knowledge-Augmented Methods for Natural Language Processing, pages 66–80. Association for Computational Linguistics.
Automated Synthesis of Norms in Multi-Agent Systems	Giuseppe De Giacomo	Data, Knowledge and Action		C	MSc	Norms have been widely proposed to coordinate and regulate the behaviour of multi-agent systems (MAS). In this project we want to consider the problem of synthesising and revising the set of norms in a normative MAS to satisfy a design objective expressed in logic. We focus on dynamic norms, that is, norms that allow and disallow agent actions depending on the history so far. In other words, the norm places different constraints on the agents' behaviour depending on its own state and the state of the underlying MAS. Having taken Foundations of Self-Programming Agents and Computer-Aided Formal Verification is not required but will help. Research topic: Under suitable assumptions, devise and implement norm synthesis techniques against given logical specifications. Method: Reactive Synthesis, Game-Theoretic Techniques, Temporal Logics, ATL, ATL*, Strategy Logics, Reasoning about Actions, Planning, Model Checking of Multi-Agent Systems. References: Natasha Alechina, Giuseppe De Giacomo, Brian Logan, Giuseppe Perelli: Automatic Synthesis of Dynamic Norms for Multi-Agent Systems. KR 2022 (and references therein)
Exploring Large Language Models for Reactive Synthesis Problems	Giuseppe De Giacomo	Data, Knowledge and Action		C	MSc	Reactive synthesis represents the culmination of declarative programming. By focusing on what a system should accomplish rather than how to achieve it we are able, on the one hand, to simplify the system design process while avoiding human mistakes and, on the other hand, to allow an autonomous agent to self-program itself just from high-level specifications. Linear Temporal Logic (LTL) or LTL on finite traces (LTLf) synthesis is one of the most popular variants of reactive synthesis, being the problem of automatically designing correct-by-construction reactive systems with the guarantee that all its behaviours comply with desired dynamic properties expressed in LTL/LTLf. Recently, there has been a growing interest in applying Large Language Models (LLMs) to various problems in computer science such as Automated Reasoning, Knowledge Representations and Planning, and Formal Methods in general. Emerging studies have explored the capabilities of LLMs in these fields. The objective of this project is to investigate the feasibility and effectiveness of using LLMs for reactive synthesis problems with specifications expressed in LTL and LTLf. Focus: • Insights into the potential and limitations of LLMs for reactive synthesis with LTL and LTLf specifications. • An LLM-guided reactive synthesis framework that integrates natural language processing into the synthesis process. • Empirical evaluation of the performance and effectiveness of LLM-based synthesis algorithms. References: • Amir Pnueli The Temporal Logic of Programs. FOCS 1977: 46-57 • Amir Pnueli, Roni Rosner: On the Synthesis of a Reactive Module. POPL 1989: 179-190 • Giuseppe De Giacomo, Moshe Y. Vardi: Linear Temporal Logic and Linear Dynamic Logic on Finite Traces. IJCAI 2013: 854-860 • Giuseppe De Giacomo , Moshe Y. Vardi: Synthesis for LTL and LDL on Finite Traces. IJCAI • Philipp J. 
Meyer, Salomon Sickert, Michael Luttenberger: Strix: Explicit Reactive Synthesis Strikes Back! CAV (1) 2018: 578-586 • Marco Favorito, Shufang Zhu. LydiaSyft: A Compositional Symbolic Synthesizer for LTLf Specifications • Karthik Valmeekam, Matthew Marquez, Sarath Sreedharan, Subbarao Kambhampati: On the Planning Abilities of Large Language Models - A Critical Investigation. CoRR abs/2305.15771 (2023) Pre-requisites: Foundations of Self-Programming Agents and Computer-Aided Formal Verification not required but will help
LTLf+ and PPLTL+: Extending LTLf and PPLTL to Infinite Traces	Giuseppe De Giacomo	Data, Knowledge and Action	B	C	MSc	Reactive synthesis, a branch of Formal Methods (FM), seeks to automatically construct systems that meet specified dynamic properties. The basic techniques for reactive synthesis share several similarities with Model Checking, another core FM problem, leveraging connections between Logics, Automata, and Games [6]. The most widely used specification language in Computer Science and Artificial Intelligence is Linear Temporal Logic (LTL) [4]. Model Checking stands as a major success in FM, employed by companies like Intel and NASA on a daily basis. In contrast, reactive synthesis has not achieved comparable adoption, largely due to scalability challenges. In particular, reactive synthesis for LTL involves: (1) having a specification φ of the desired system behaviour in LTL, in which one distinguishes controllable and uncontrollable variables; (2) extracting from the specification an equivalent automaton on infinite words, corresponding to the infinite traces satisfying φ; (3) (differently from Model Checking) determinizing the automaton to obtain an arena for a game between the system and the environment; (4) solving the game, by fixpoint computation, for an objective determined by the automaton’s accepting condition, yielding a strategy for the system that fulfils the original specification φ. Step (3) remains a major performance obstacle. For LTL, this involves determinizing nondeterministic Büchi automata, which is notoriously difficult. This has held back the use of reactive synthesis in applications. Recently, the AI community has shifted attention toward LTL on finite traces (LTLf) [2], which is better suited for typical Planning and AI scenarios where goals are specified over finite traces. In this setting, the agent receives a goal, “thinks” about how to achieve it, synthesizes a plan, executes the plan, and repeats. 
The advantage of focusing on finite traces is that in Step (3) one can rely on (classic) automata operating on finite traces, including deterministic finite automata (DFA), and use known determinization algorithms with good practical performance. A recent breakthrough demonstrates that DFA-based techniques foundational to LTLf synthesis can be extended to LTL [1]. By leveraging the Pnueli-Manna hierarchy [3], which classifies properties into six categories (safety, guarantee, obligation, recurrence, persistence, and general reactivity), an extension of LTLf has been proposed that enables the expression of arbitrary LTL properties over infinite traces [1]. This project aims to solve reactive synthesis for LTL by building upon these recent advancements, exploiting Emerson-Lei automata for adversarial environments, and Limit Deterministic Büchi Automata and Good-for-MDP Automata for stochastic settings (MDPs). References: [1] Benjamin Aminof, Giuseppe De Giacomo, Sasha Rubin, Moshe Y. Vardi. LTLf+ and PPLTL+: Extending LTLf and PPLTL to Infinite Traces. arXiv:2411.09366, 2024. [2] Giuseppe De Giacomo and Moshe Y. Vardi. Linear temporal logic and linear dynamic logic on finite traces. In IJCAI 2013. [3] Zohar Manna and Amir Pnueli. A hierarchy of temporal properties. In PODC 1990. [4] Amir Pnueli. The temporal logic of programs. In FOCS, 1977. [5] Daniel Hausmann, Mathieu Lehaut, Nir Piterman: Symbolic Solution of Emerson-Lei Games for Reactive Synthesis. In FoSSaCS 2024. [6] Fijalkow et al. Games on Graphs. Book, https://arxiv.org/abs/2305.10546
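Step (4) of the pipeline described in this project, solving the game by fixpoint computation, can be illustrated for the simplest objective, reachability. The following is a minimal sketch under stated assumptions, not the project's method: the explicit arena, the node names and the function `attractor` are hypothetical, and practical synthesis tools typically work on symbolic rather than explicit arenas.

```python
from typing import Dict, Hashable, Set

Node = Hashable


def attractor(
    sys_nodes: Set[Node],
    env_nodes: Set[Node],
    edges: Dict[Node, Set[Node]],
    target: Set[Node],
) -> Set[Node]:
    """Least fixpoint of the nodes from which the system can force a visit to `target`."""
    win = set(target)
    changed = True
    while changed:
        changed = False
        # Iterate over a fresh set so that `win` can grow inside the loop.
        for v in (sys_nodes | env_nodes) - win:
            succ = edges.get(v, set())
            if v in sys_nodes:
                # The system picks the move: one winning successor is enough.
                forced = any(s in win for s in succ)
            else:
                # The environment picks the move: every successor must be winning.
                forced = bool(succ) and all(s in win for s in succ)
            if forced:
                win.add(v)
                changed = True
    return win


# Hypothetical arena: the system moves at a, c and goal; the environment at b and d.
sys_nodes = {"a", "c", "goal"}
env_nodes = {"b", "d"}
edges = {
    "a": {"b", "d"},
    "b": {"goal", "c"},
    "c": {"goal"},
    "d": {"d"},  # the environment can loop here forever
    "goal": {"goal"},
}

winning = attractor(sys_nodes, env_nodes, edges, {"goal"})
print(sorted(winning))  # ['a', 'b', 'c', 'goal']
```

Node d is excluded because the environment can stay there forever; a system strategy can be read off by always moving to a successor inside the winning set.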
Nondeterministic Situation Calculus	Giuseppe De Giacomo	Data, Knowledge and Action		C	MSc	Background and focus The standard situation calculus assumes that atomic actions are deterministic. But many domains involve nondeterministic actions. The key point when dealing with nondeterministic actions is that we need to clearly distinguish between choices that can be made by the agent and choices that are made by the environment, i.e., angelic vs. devilish nondeterminism. “Nondeterministic Situation Calculus” is an extension to the standard situation calculus that accommodates nondeterministic actions and preserves Reiter’s solution to the frame problem. It allows for answering projection queries through regression. But it also provides the means to formalize, e.g., (first-order) planning in nondeterministic domains and to define execution of ConGolog high-level program in nondeterministic domains. Having taken the Foundation of Self-Programming and Computer-Aided Formal Verification is not required but will be of help. Research topic/question and expected contribution Under suitable assumptions (e.g., propositional or finite-object cases), devise and implement reasoning about action and strategic reasoning techniques for nondeterministic situation calculus based on formal methods methodologies Method Reasoning about Actions, Planning, Model Checking, reactive synthesis Existing work and references Giuseppe De Giacomo, Yves Lespérance: The Nondeterministic Situation Calculus. KR 2021: 216-226 Ray Reiter: Knowledge in Action: Logical Foundations for Specifying and Implementing Dynamical Systems, MIT Press 2001 Giuseppe De Giacomo, Yves Lespérance, Hector J. Levesque: ConGolog, a concurrent programming language based on the situation calculus. Artif. Intell. 121(1-2): 109-169 (2000) Giuseppe De Giacomo, Yves Lespérance, Fabio Patrizi: Bounded situation calculus action theories. Artif. Intell. 237: 172-203 (2016)
Obligation Games for Reactive Synthesis	Giuseppe De Giacomo	Data, Knowledge and Action	B	C	MSc	This project focuses on developing an algorithmic solution for solving obligation games as part of reactive synthesis. Obligation properties, a unique class within the safety-progress hierarchy of property specifications [2], play a foundational role in specifying complex system behaviors, particularly when a system interacts with an unpredictable or adversarial environment. Expressed through Linear Temporal Logic (LTL) [1] and its finite variant (LTLf) [3]—the most widely used logical languages in Computer Science and Artificial Intelligence for specifying temporal properties—obligation properties combine safety and guarantee elements to outline conditions for conditional behaviors based on both desired and undesired events. Safety properties ensure that “nothing bad happens,” requiring specific conditions to hold at every instant of the system's operation. For example, a safety property might dictate that a robot must avoid obstacles continuously. Violations of safety properties are identifiable in finite time, meaning the system or observer can detect and respond to any bad state immediately. Guarantee properties, in contrast, specify that “something good eventually happens.” These properties represent goals that the system should fulfill at least once within a finite timeframe, such as ensuring that a network packet eventually reaches its destination. Together, safety and guarantee properties define comprehensive behavioral expectations for systems operating in dynamic or adversarial settings. Obligation properties, introduced as part of the safety-progress hierarchy by Lichtenstein, Pnueli, and Zuck (1985) and further developed by Manna and Pnueli [2], represent Boolean combinations of safety and guarantee properties. 
Reactive synthesis is the problem of automatically synthesizing a system that meets desired dynamical properties in a partially controllable environment. The environment is typically adversarial; however, in case it is stochastic, reactive synthesis becomes MDP solving for temporal specification. This project aims to devise novel synthesis techniques that address obligation properties in either adversarial or stochastic environments. References: Giuseppe De Giacomo and Moshe Y. Vardi. Linear temporal logic and linear dynamic logic on finite traces. In IJCAI 2013. Zohar Manna and Amir Pnueli. A hierarchy of temporal properties. In PODC 1990. Amir Pnueli. The temporal logic of programs. In FOCS, 1977.
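The distinction between safety, guarantee and obligation properties drawn in this project description can be made concrete on finite traces. The sketch below is illustrative only and assumes a simple finite-trace reading in the style of LTLf: a trace is a list of sets of atomic propositions, and the proposition names (`collision`, `delivered`, `alarm`) and helper functions are hypothetical.

```python
from typing import Callable, List, Optional, Set

State = Set[str]
Trace = List[State]
Pred = Callable[[State], bool]


def always(pred: Pred, trace: Trace) -> bool:
    """Safety-style check: pred holds at every instant of the finite trace."""
    return all(pred(s) for s in trace)


def eventually(pred: Pred, trace: Trace) -> bool:
    """Guarantee-style check: pred holds at some instant of the finite trace."""
    return any(pred(s) for s in trace)


def first_violation(pred: Pred, trace: Trace) -> Optional[int]:
    """Index of the first state violating pred: a finite witness of a safety violation."""
    for i, s in enumerate(trace):
        if not pred(s):
            return i
    return None


def no_collision(s: State) -> bool:
    return "collision" not in s


def delivered(s: State) -> bool:
    return "delivered" in s


def alarm(s: State) -> bool:
    return "alarm" in s


def obligation(trace: Trace) -> bool:
    """A Boolean combination of a safety and a guarantee property:
    either no collision ever happens, or an alarm is eventually raised."""
    return always(no_collision, trace) or eventually(alarm, trace)


good_trace: Trace = [{"moving"}, {"moving"}, {"moving", "delivered"}]
bad_trace: Trace = [{"moving"}, {"collision"}, {"delivered"}]

print(always(no_collision, good_trace), eventually(delivered, good_trace))  # True True
print(first_violation(no_collision, bad_trace))  # 1
print(obligation(good_trace), obligation(bad_trace))  # True False
```

The safety violation in `bad_trace` is witnessed by the finite prefix ending at index 1, which matches the remark that violations of safety properties are identifiable in finite time.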
Planning for Temporally Extended Goals in Linear Time Logics of Finite Traces	Giuseppe De Giacomo	Data, Knowledge and Action		C	MSc	Planning for temporally extended goals expressed in Linear Time Logics on finite traces requires synthesizing a strategy to fulfil the temporal (process) specification expressed by the goal. To do so, one can take advantage of techniques developed in formal methods, based on reductions to two-player games and forms of model checking. It is crucial to keep in mind that planning is always performed in a domain, which can be thought of as a specification of the possible environment reactions. Typically, such a domain specification is Markovian, i.e., the possible reactions of the environment depend only on the current state and the agent action. However, forms of non-Markovian domains are also possible and, in fact, of great interest. Note that the domain can be fully observable, partially observable, or not observable at all. Each of these settings leads to different forms of planning and different techniques for solving it. Having taken Foundations of Self-Programming Agents and Computer-Aided Formal Verification is not required but will help. Research topic: Under suitable assumptions, devise and implement techniques for synthesizing strategies that fulfil temporally extended goals in a planning domain, based on both AI and formal methods methodologies. Method: Reasoning about Actions, Planning, Model Checking, reactive synthesis. References: Giuseppe De Giacomo, Sasha Rubin: Automata-Theoretic Foundations of FOND Planning for LTLf and LDLf Goals. IJCAI 2018: 4729-4735 Giuseppe De Giacomo, Moshe Y. Vardi: LTLf and LDLf Synthesis under Partial Observability. IJCAI 2016: 1044-1050 Giuseppe De Giacomo, Moshe Y. Vardi: Synthesis for LTL and LDL on Finite Traces. IJCAI 2015: 1558-1564 Giuseppe De Giacomo, Moshe Y. Vardi: Linear Temporal Logic and Linear Dynamic Logic on Finite Traces. IJCAI 2013: 854-860 Giuseppe De Giacomo, Moshe Y. 
Vardi: Automata-Theoretic Approach to Planning for Temporally Extended Goals. ECP 1999: 226-238
Reactive Program Synthesis Under Environment Specifications in Linear Time Logics on Finite and Infinite Traces.	Giuseppe De Giacomo	Data, Knowledge and Action		C	MSc	“Reactive synthesis” is the automated synthesis of programs for interactive/reactive ongoing computations (protocols, operating systems, controllers, robots, etc.). This kind of synthesis is deeply related to planning in nondeterministic environments. In this project we are interested in Reactive Synthesis under Environment Specifications, where the agent can take advantage of knowledge about the environment's behaviour in achieving its tasks. We consider how to obtain the game arena from the LTL and LTLf specifications of both the agent task and the environment behaviours, and how to solve safety games, reachability games, games for LTLf objectives, and games for objectives expressed in fragments of LTL. We also want to understand how the winning regions of such arenas are related to the notion of “being able” to achieve desired properties (without necessarily committing to a specific strategy for doing so). We focus on agent tasks that eventually terminate and hence are specified in LTLf, while for the environment we focus on safety specifications and limited forms of guarantee, reactivity, fairness, and stability. Having taken Foundations of Self-Programming Agents and Computer-Aided Formal Verification is not required but will help. Research topic: Under suitable assumptions, devise and implement reactive synthesis techniques for fulfilling agent tasks in environments with specified behaviour. Method: Reactive Synthesis, Temporal Logics, 2-Player games, Reasoning about Actions, Planning, Model Checking. References: Benjamin Aminof, Giuseppe De Giacomo, Aniello Murano, Sasha Rubin: Planning under LTL Environment Specifications. ICAPS 2019: 31-39 Giuseppe De Giacomo, Antonio Di Stasio, Moshe Y. Vardi, Shufang Zhu: Two-Stage Technique for LTLf Synthesis Under LTL Assumptions. KR 2020: 304-314 Giuseppe De Giacomo, Moshe Y. 
Vardi: Synthesis for LTL and LDL on Finite Traces. IJCAI 2015: 1558-1564 Giuseppe De Giacomo, Moshe Y. Vardi: Linear Temporal Logic and Linear Dynamic Logic on Finite Traces. IJCAI 2013: 854-860 Benjamin Aminof, Giuseppe De Giacomo, Sasha Rubin: Best-Effort Synthesis: Doing Your Best Is Not Harder Than Giving Up. IJCAI 2021: 1766-1772 Shufang Zhu, Giuseppe De Giacomo: Synthesis of Maximally Permissive Strategies for LTLf Specifications. IJCAI 2022: 2783-2789 Shufang Zhu, Giuseppe De Giacomo: Act for Your Duties but Maintain Your Rights. KR 2022 Giuseppe De Giacomo, Antonio Di Stasio, Francesco Fuggitti, Sasha Rubin: Pure-Past Linear Temporal and Dynamic Logic on Finite Traces. IJCAI 2020: 4959-4965
Reactive Program Synthesis and Planning under Multiple Environments	Giuseppe De Giacomo	Data, Knowledge and Action		C	MSc	In this project we consider an agent that operates with multiple models of the environment, e.g., two of them: one that captures expected behaviours and one that captures additional exceptional behaviours. We want to study the problem of synthesizing agent strategies that enforce a goal against environments operating as expected while also making a best effort against exceptional environment behaviours. We want to formalize these concepts in the context of linear temporal logic (especially on finite traces) and give algorithms for solving this problem. More generally, we want to formalize and solve synthesis in the case of multiple, even contradicting, assumptions about the environment. One solution concept is based on “best-effort strategies”, which are agent plans that, for each of the environment specifications individually, achieve the agent goal against a maximal set of environments satisfying that specification. Having taken Foundations of Self-Programming Agents and Computer-Aided Formal Verification is not required but will help. Research topic: Under suitable assumptions, devise and implement reactive synthesis techniques that are effective under multiple environments. Method: Reactive Synthesis, Temporal Logics, 2-Player games, Reasoning about Actions, Planning, Model Checking. References: Benjamin Aminof, Giuseppe De Giacomo, Alessio Lomuscio, Aniello Murano, Sasha Rubin: Synthesizing Best-effort Strategies under Multiple Environment Specifications. KR 2021: 42-51 Benjamin Aminof, Giuseppe De Giacomo, Alessio Lomuscio, Aniello Murano, Sasha Rubin: Synthesizing strategies under expected and exceptional environment behaviors. IJCAI 2020: 1674-1680 Daniel Alfredo Ciolek, Nicolás D'Ippolito, Alberto Pozanco, Sebastian Sardiña: Multi-Tier Automated Planning for Adaptive Behavior. 
ICAPS 2020: 66-74 Benjamin Aminof, Giuseppe De Giacomo, Sasha Rubin: Best-Effort Synthesis: Doing Your Best Is Not Harder Than Giving Up. IJCAI 2021: 1766-1772 Benjamin Aminof, Giuseppe De Giacomo, Aniello Murano, Sasha Rubin: Planning under LTL Environment Specifications. ICAPS 2019: 31-39 Paolo Felli, Giuseppe De Giacomo, Alessio Lomuscio: Synthesizing Agent Protocols from LTL Specifications Against Multiple Partially-Observable Environments. KR 2012 Giuseppe De Giacomo, Moshe Y. Vardi: Synthesis for LTL and LDL on Finite Traces. IJCAI 2015: 1558-1564 Giuseppe De Giacomo, Moshe Y. Vardi: Linear Temporal Logic and Linear Dynamic Logic on Finite Traces. IJCAI 2013: 854-860
Reinforcement learning under safety non-Markovian Safety Specifications and Rewards expressed in Linear Temporal Logics on Finite Traces	Giuseppe De Giacomo	Data, Knowledge and Action		C	MSc	In some cases, the agent has a simulator of the environment instead of a formal specification, so it needs to learn its strategies to achieve its task in the environment. Sometimes even the task is only implicitly specified through rewards. The key issue is that the properties we are often interested in are non-Markovian (e.g., specified in LTL or LTLf), and hence we need to introduce non-Markovian characteristics in decision processes and reinforcement learning. A particularly promising direction is when such non-Markovian characteristics can be expressed in Pure Past Linear Temporal Logics. Having taken Foundations of Self-Programming Agents and Computer-Aided Formal Verification is not required but will help. Research topic: Under suitable assumptions, devise and implement reinforcement learning techniques that remain safe with respect to safety specifications and achieve rewards specified in linear temporal logics on finite traces. Method: Reinforcement Learning, MDP, Non-Markovian Rewards, Non-Markovian Decision Processes, Linear Temporal Logics. References: Ronen I. Brafman, Giuseppe De Giacomo, Fabio Patrizi: LTLf/LDLf Non-Markovian Rewards. AAAI 2018: 1771-1778 Giuseppe De Giacomo, Moshe Y. Vardi: Synthesis for LTL and LDL on Finite Traces. IJCAI 2015: 1558-1564 Giuseppe De Giacomo, Luca Iocchi, Marco Favorito, Fabio Patrizi: Foundations for Restraining Bolts: Reinforcement Learning with LTLf/LDLf Restraining Specifications. ICAPS 2019: 128-136 Mohammed Alshiekh, Roderick Bloem, Rüdiger Ehlers, Bettina Könighofer, Scott Niekum, and Ufuk Topcu. Safe reinforcement learning via shielding. AAAI-18: 2669–2678. Giuseppe De Giacomo, Marco Favorito, Luca Iocchi, Fabio Patrizi: Imitation Learning over Heterogeneous Agents with Restraining Bolts. 
ICAPS 2020: 517-521 Giuseppe De Giacomo, Marco Favorito, Luca Iocchi, Fabio Patrizi, Alessandro Ronca: Temporal Logic Monitoring Rewards via Transducers. KR 2020: 860-870 Giuseppe De Giacomo, Antonio Di Stasio, Francesco Fuggitti, Sasha Rubin: Pure-Past Linear Temporal and Dynamic Logic on Finite Traces. IJCAI 2020: 4959-4965 Giuseppe De Giacomo, Moshe Y. Vardi: Linear Temporal Logic and Linear Dynamic Logic on Finite Traces. IJCAI 2013: 854-860
Topics in type theory	Maximilian Doré	Programming Languages	B	C		I am happy to supervise students at all levels who want to work on or with type theory. Types were originally introduced to stop programmers from writing bad code, but turned out to be an extremely fruitful positive idea. Modern systems like dependent type theory are powerful enough to both express intricate properties of programs and to formalise advanced mathematics, as demonstrated with theorem provers such as Agda, Coq or Lean. The following give an idea what a project with me could look like: Solving monoidal categories: Monoidal categories offer a notion of tensor product which is both very general and concretely applicable. This project is about implementing a tool in Agda that can automatically construct morphisms in a monoidal category (possibly adding more morphisms than the structural morphisms). A student working on this project will get experience in dependent type theory, meta-programming in Agda, and monoidal category theory. Formalising programming language meta-theory: Dependent type theory allows for implementing and reasoning about other programming languages. The student can take a language of choice (for instance the lambda- or pi-calculus), represent it in a theorem prover, and then formally prove properties of that language or devise a verified compiler. This project is ideal for students interested in PL theory and Compilers. For students with a background in mathematics I am also open to supervise mathematical formalisation projects. For this it might be expedient to have another supervisor in mathematics who is an expert in the specific domain the student wants to formalise. These are only suggestions, I am very happy for students to devise their own projects based on their background and interests. Prerequisites: Principles of Programming Languages, Lambda Calculus and Types
Economic aspects of cybersecurity	Patricia Esteve-Gonzalez, Ioannis Agrafiotis, Louise Axon-Jones, Michael Goldsmith, Sadie Creese	Security	B	C	MSc	The links between cybersecurity and economics are multiple and complex, offering topics of discussion at the user, organisational, national, and international levels. The literature covering the overlap of these two fields is limited, and there is a need to cast light on how decisions are taken, how to incentivise the prioritisation of cybersecurity, and how large the economic incentives behind the protection of digitalised societies are. The project might take an empirical or a theoretical approach, using for example game theory, surveys, interviews, or statistical analysis. See, for example: Kianpour, M., Kowalski, S. J., and Øverby, H. 2021. Systematically Understanding Cybersecurity Economics: A Survey. Sustainability, 13(24). https://doi.org/10.3390/su132413677 Hojda, M.H. 2022. Information Security Economics: Cyber Security Threats. Proceedings of the International Conference on Business Excellence, 16(1): 584-592. https://doi.org/10.2478/picbe-2022-0056
Thematic Analysis of National Cybersecurity Maturity Assessments	Patricia Esteve-Gonzalez, Ioannis Agrafiotis, Louise Axon-Jones, Michael Goldsmith, Sadie Creese	Security	B	C	MSc	There is a large investment being made by the international community aimed at helping nations and regions to develop their capacity in cybersecurity. The work of the Global Cyber Security Capacity Centre (based at the Oxford Martin School) studies and documents this: https://www.sbs.ox.ac.uk/cybersecurity-capacity/content/front. There is scope to study in more detail the global trends in capacity building in cybersecurity, the nature of the work and the partnerships that exist to support it. An interesting analysis might be to identify what is missing (through comparison with the Cybersecurity Capacity Maturity Model, a key output of the Centre), and also to consider how strategic, or not, such activities appear to be. An extension of this project, or indeed a second parallel project, might seek to perform a comparison of the existing efforts with the economic and technology metrics that exist for countries around the world, exploring if the data shows any relationships exist between those metrics and the capacity building activities underway. This analysis would involve regression techniques.
Beyond MCMC -- scalable and approximate Bayesian inference for computational statistics in global health	Seth Flaxman	Artificial Intelligence and Machine Learning	B	C	MSc	Co-supervisor: Dr Swapnil Mishra (https://s-mishra.github.io/) In applied work, especially disease modeling, we have reached the limits of what standard MCMC samplers can solve in a reasonable amount of computational time. We will investigate a variety of recently proposed inference schemes, from variational Bayes to deep learning ensemble models to parallel implementations of Sequential Monte Carlo, applying them to popular models in biostatistics, especially multilevel / hierarchical models. The goal will not be just to figure out "what works" but to understand the shortcomings of existing tools, and come up with guidance for practitioners.
Disease Mapping with Neural Networks	Seth Flaxman	Artificial Intelligence and Machine Learning		C	MSc	Traditionally, disease mapping has heavily leaned on statistical models such as multivariate normal distributions and Gaussian Processes. However, the ever-evolving landscape of machine learning, especially in the realm of neural networks, presents an untapped potential for disease mapping. This project aims to harness the latest advancements in spatial data modelling and bring them to the forefront of disease mapping. The goal of the project is to explore cutting-edge techniques, including Bayesian inference with Markov Chain Monte Carlo (MCMC), to unlock new insights and capabilities in disease mapping, bridging the gap between traditional statistical methods and the power of deep learning.
Bayesian Experimental Design with LLMs: Enabling Probabilistic Conditioning via In-Context Updates	Yarin Gal, Luckeciano Melo	Artificial Intelligence and Machine Learning			MSc	Prerequisites: • Familiarity with large language models (LLMs), prompting, and in-context learning • Knowledge of probabilistic machine learning (Bayesian inference, conditioning, information gain) • Some familiarity with experimental design / active learning (helpful but not required) • Programming experience in Python (PyTorch + Hugging Face helpful), plus basic experiment tracking (Weights & Biases or similar) Background Bayesian Experimental Design (BED) formalizes how to choose experiments (queries, interventions, observations) to maximally reduce uncertainty about latent hypotheses or parameters. Classical BED relies on probabilistic conditioning: after observing data, the posterior is updated via Bayes’ rule, and the next experiment is chosen to optimize a utility such as expected information gain or reduction in posterior entropy. Large language models (LLMs) exhibit in-context learning: they change behavior after receiving examples or evidence in the prompt. However, these “in-context updates” are not guaranteed to correspond to probabilistic conditioning. They may be miscalibrated, sensitive to prompt formatting, or fail to preserve coherent uncertainty across alternative hypotheses, limitations that prevent LLMs from behaving like Bayesian agents [1]. If LLM in-context updates were close to true Bayesian conditioning, then LLMs could serve as approximate Bayesian agents and approach Bayesian Optimal Experimental Design. However, it is currently unclear: • When do LLM in-context updates approximate Bayes’ rule? • When do they systematically deviate? • Can we measure and reduce this deviation? [1] Choudhury et al. BED-LLM: Intelligent Information Gathering with LLMs and Bayesian Experimental Design. ICLR, 2026. 
Focus The central goal of this project is to investigate whether and when LLM in-context updates are Bayesian, to quantify the deviation from a Bayesian Oracle, and to design interventions that make these updates more Bayesian-like. The project will address three core questions: • Diagnosis: Under what task structures do LLM in-context updates deviate from Bayes’ rule? • Quantification: How large is the “conditioning gap” between LLM belief updates and Bayesian posterior updates? • Intervention: Can we modify prompting, inference procedures, or training objectives so that LLM updates more closely match Bayesian conditioning? Method To achieve these goals, the project will: • Construct a Bayesian Oracle: Design controlled generative tasks with known priors and likelihoods, where exact or high-precision posterior updates can be computed. This defines the gold-standard Bayesian update. • Elicit LLM Belief Updates: Present priors and sequential evidence in-context and extract the LLM’s implied beliefs (either explicitly as distributions or implicitly via predictive probabilities). • Quantify the Conditioning Gap: Compare LLM updates to the Oracle posterior using metrics such as KL divergence, calibration error, likelihood sensitivity, and invariance to evidence ordering. • Characterize Systematic Deviations: Identify structured failure modes (e.g., under/over-updating, recency bias, prompt sensitivity, incoherent probability mass allocation). • Design Interventions: Develop inference-time (structured prompting, belief tracking, self-consistency) and training-based (distillation from Oracle, auxiliary consistency losses, RL-style objectives) methods to encourage Bayesian-faithful updates. • Evaluate Impact on Experimental Design: Measure whether reducing the conditioning gap improves downstream performance in Bayesian Experimental Design tasks (e.g., regret relative to Bayes-optimal experiment selection)
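As a rough illustration of the "Construct a Bayesian Oracle" and "Quantify the Conditioning Gap" steps above, the sketch below computes an exact posterior over a small discrete hypothesis set and measures its KL divergence from a belief vector. The coin-bias task, the uniform prior, and the `llm_belief` numbers are hypothetical placeholders; in the project the belief vector would be elicited from a model rather than typed in.

```python
import math

def bayes_posterior(prior, likelihoods):
    """Exact Bayes update: posterior[i] is proportional to prior[i] * likelihoods[i]."""
    unnorm = [p * l for p, l in zip(prior, likelihoods)]
    z = sum(unnorm)
    return [u / z for u in unnorm]

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) in nats; eps guards against log(0) when q has a zero entry."""
    return sum(pi * math.log(pi / max(qi, eps)) for pi, qi in zip(p, q) if pi > 0)

# Hypothetical task: three candidate coin biases with a uniform prior.
biases = [0.2, 0.5, 0.8]
prior = [1 / 3, 1 / 3, 1 / 3]

# Evidence: a sequence of flips (1 = heads); likelihood of the sequence per hypothesis.
flips = [1, 1, 0, 1]
likelihoods = [math.prod(b if f else 1 - b for f in flips) for b in biases]

# Gold-standard update from the oracle.
oracle = bayes_posterior(prior, likelihoods)

# Placeholder for beliefs elicited from an LLM shown the same evidence in-context.
llm_belief = [0.10, 0.45, 0.45]

# One possible "conditioning gap" metric.
gap = kl_divergence(oracle, llm_belief)
```

Other metrics named above (calibration error, invariance to evidence ordering) would need additional machinery, for example re-running the elicitation on permuted `flips`.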
Emergence of Reasoning Abilities During Training of Large Language Models	Yarin Gal, Yihong Chen	Artificial Intelligence and Machine Learning			MSc	Prerequisites: Strong ML background; interest in learning/optimization dynamics and representation learning. Some familiarity with training neural networks is recommended but not necessary. Background ● Reasoning-like behavior in LLMs often appears to emerge gradually rather than being explicitly programmed. Understanding when and how such capabilities arise during training is a key open question in modern AI. This topic connects empirical deep learning with theoretical questions about emergence, generalization, and learning dynamics. Focus ● This project studies the temporal development of reasoning-related behaviors in LLMs or smaller proxy models. The main question is: how do reasoning capabilities evolve over the course of training, and what patterns characterize their emergence? ● The expected contribution is insight into the dynamics of capability formation in large models. Method The student will draw on literature related to emergence in neural networks, scaling behavior, and training dynamics. Experiments may involve analyzing checkpoints, training curves, or simplified models that exhibit reasoning-like behavior. [1] Kaplan, Jared, et al. "Scaling laws for neural language models." arXiv preprint arXiv:2001.08361 (2020). [2] Wei, Jason, et al. "Emergent Abilities of Large Language Models." Transactions on Machine Learning Research. Goals: ● Essential: Review work on emergent abilities and training dynamics in LLMs. ● Essential: Analyze how performance on reasoning tasks evolves during training or over model scale. ● Stretch: Connect empirical findings to theoretical intuitions about generalization and emergence.
Enhancing safety alignment in LLMs via realistic latent adversarial training	Yarin Gal, Lin Li	Artificial Intelligence and Machine Learning			MSc	Prerequisites: 1. E: familiar with the basics of LLMs, and adversarial training 2. D: familiar with jailbreaking, and safety alignment Background Despite significant progress in aligning LLMs to human-specified safety rules, these models remain vulnerable to Out-Of-Distribution (OOD) inputs such as adversarial prompts that bypass safety guardrails and elicit harmful responses. This failure mode presents serious risks. One of the most practical and concerning is its potential to enable bad actors to carry out harmful activities that would otherwise be beyond their capabilities—such as crafting high-grade explosives—by leveraging the capabilities of frontier LLMs. At the same time, we recognize that many existing safety guardrails, particularly those based on adversarial training, often degrade general-purpose capabilities and/or lead to overly conservative behaviors such as excessive refusals. Therefore, the high-level goal this project pursues is to make LLMs consistently comply with human-specified safety rules while minimizing sacrifice to the general capabilities. We aim to establish worst-case behavioral guarantees for advanced LLMs, i.e. ensuring that no input, including adversarially constructed prompts, can elicit responses that violate clearly defined safety constraints. To this end, our approach emphasizes the inherent safety alignment of a model where no pre/post-processing or monitoring around the model is applied to moderate the input and output. Furthermore, a central objective of our research is to develop methods that maintain or recover general capabilities while ensuring robust safety. Our dual-goal framework seeks to push the frontier of what is possible in safe and performant LLM deployment, rather than accepting a tradeoff between safety and usefulness. 
Focus Our project builds on existing latent space adversarial training methods, such as latent adversarial training [1, 2] and continuous adversarial training [3]. In each training step, given a harmful prompt, these methods first optimize a perturbation to the latent states that increases the likelihood of an unsafe completion (e.g., “Sure, here it is…”) and/or decreases the likelihood of a safe completion (e.g., “Sorry, I cannot…”). The model parameters are then updated to minimize the likelihood of unsafe completions and/or maximize the likelihood of safe completions for the perturbed latent states. The key distinction between the two approaches is the location of the perturbation: latent adversarial training perturbs the hidden layer activations, while continuous adversarial training perturbs the input embeddings before they enter any hidden layers. Current LAT methods regularize the search for adversarial latent states by constraining them within an Lp epsilon-ball around the latent representation of a known harmful prompt. However, they do not account for whether these perturbed latent states are actually reachable through valid input prompts. As a result, training likely includes unreachable adversarial states (those that cannot be triggered in practice), making the training less effective. In real-world applications, this leads to a waste of model capacity: improving safety against unrealistic attacks at the cost of degrading general capabilities. To alleviate this, current methods restrict adversarial search to the vicinity of human-identified harmful prompts, which in turn excludes latent states that may correspond to novel, real-world attacks outside the known distribution. We aim to address the research question: How can latent adversarial training (LAT) be improved by regularizing latent adversarial states? Specifically, we propose to constrain latent adversarial states to be reachable, that is, producible through some valid input prompt.
Method Our proposed approaches address these limitations by introducing novel regularization techniques into adversarial perturbation generation that encourage—or even enforce—the reachability of perturbed latent states. This not only improves the realism of training signals but also allows us to broaden the search space for perturbations, allowing coverage of more—and potentially novel—adversarial patterns during training. Overall, our methods aim to produce models with stronger worst-case safety guarantees than those trained with existing latent space adversarial training approaches, while preserving general capabilities or incurring minimal trade-offs. [1] Casper, S., Schulze, L., Patel, O. and Hadfield-Menell, D., 2024. Defending against unforeseen failure modes with latent adversarial training. arXiv preprint arXiv:2403.05030. [2] Sheshadri, A., Ewart, A., Guo, P., Lynch, A., Wu, C., Hebbar, V., Sleight, H., Stickland, A.C., Perez, E., Hadfield-Menell, D. and Casper, S., 2024. Latent adversarial training improves robustness to persistent harmful behaviors in llms. arXiv preprint arXiv:2407.15549. [3] Xhonneux, S., Sordoni, A., Günnemann, S., Gidel, G. and Schwinn, L., 2024. Efficient adversarial training in llms with continuous attacks. Advances in Neural Information Processing Systems, 37, pp.1502-1530
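The inner maximisation described in this entry (searching for a latent perturbation inside an Lp epsilon-ball) can be illustrated with a toy projected-gradient loop. This is only a sketch under strong assumptions: the "model" is a hypothetical linear unsafe-score head `w` over a three-dimensional latent vector, the ball is L-infinity, and neither the outer parameter update nor the reachability regulariser proposed by the project is implemented.

```python
import math

def unsafe_prob(w, h):
    """Toy 'unsafe completion' probability: sigmoid of a linear score on latent h."""
    score = sum(wi * hi for wi, hi in zip(w, h))
    return 1.0 / (1.0 + math.exp(-score))

def pgd_latent_attack(w, h, epsilon, step_size, n_steps):
    """Projected gradient ascent on delta subject to ||delta||_inf <= epsilon."""
    delta = [0.0] * len(h)
    for _ in range(n_steps):
        p = unsafe_prob(w, [hi + di for hi, di in zip(h, delta)])
        # Gradient of sigmoid(w . (h + delta)) with respect to delta is p * (1 - p) * w.
        grad = [p * (1.0 - p) * wi for wi in w]
        # Signed ascent step on each coordinate with a nonzero gradient.
        delta = [di + step_size * math.copysign(1.0, gi) if gi != 0 else di
                 for di, gi in zip(delta, grad)]
        # Projection back onto the L-infinity ball of radius epsilon.
        delta = [max(-epsilon, min(epsilon, di)) for di in delta]
    return delta

w = [1.5, -2.0, 0.5]   # hypothetical unsafe-score direction
h = [0.1, 0.4, -0.3]   # hypothetical latent state for a known prompt
delta = pgd_latent_attack(w, h, epsilon=0.2, step_size=0.05, n_steps=10)

clean = unsafe_prob(w, h)
attacked = unsafe_prob(w, [hi + di for hi, di in zip(h, delta)])
```

Nothing in this loop checks whether `h + delta` corresponds to any real input prompt, which is exactly the gap the entry identifies in existing methods.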
Memory in AI Agents: Internal and External Memory Mechanisms in LLM-Based Systems	Yarin Gal, Yihong Chen	Artificial Intelligence and Machine Learning			MSc	Prerequisites: Machine learning fundamentals; familiarity with Python. Interest in LLMs, agents, or reinforcement learning is helpful but not required. Background ● AI agents based on Large Language Models (LLMs) are increasingly deployed in settings that require remembering past interactions, long-term goals, and external information. Unlike traditional models, these agents often combine internal memory (implicit storage within model parameters and activations) with external memory (such as retrieval systems, tool-based memory, or persistent state). Understanding how these different forms of memory interact is critical for building reliable, interpretable, and scalable AI agents. This project explores memory mechanisms from a systems and behavioral perspective, focusing on how LLM-based agents store, retrieve, and use information over time. Focus ● The project investigates memory in LLM-based AI agents, with emphasis on the interaction between internal and external memory. The central research questions are: What roles do different memory mechanisms play in agent behavior? How do internal representations and external memory systems jointly support long-horizon reasoning and decision-making? ● The expected contribution is a clearer conceptual and empirical understanding of memory usage in modern AI agents. Method The project will build on existing literature on LLM-based agents, memory-augmented neural networks, and retrieval-augmented systems (RAG). Students will analyze agent behavior in controlled environments, comparing performance across tasks that stress different memory demands. Potential starting points include agent frameworks that integrate LLMs with tools or retrieval modules, and research on long-term memory in sequential decision-making. [1] Lewis, Patrick, et al. 
"Retrieval-augmented generation for knowledge-intensive nlp tasks." Advances in neural information processing systems 33 (2020): 9459-9474. Goals: ● Essential: Survey literature on memory in AI agents, including internal and external memory mechanisms. ● Essential: Empirically study how LLM-based agents use memory across multi-step or long-horizon tasks. ● Stretch: Analyze how variations in memory access or structure affect agent reliability, reasoning, or adaptation.
Non-factuality long-form hallucination benchmark	Yarin Gal, Lin Li	Artificial Intelligence and Machine Learning			MSc	Prerequisites: 1. E: familiar with the basics of LLMs 2. D: familiar with AI hallucination evaluation and detection Background As large language models (LLMs) are increasingly deployed in real-world applications, their tendency to produce factually incorrect or fabricated information—known as hallucination—has become a major concern. To systematically measure and compare this behavior, researchers have developed hallucination benchmarks that evaluate models’ factual accuracy, grounding, and faithfulness across tasks such as question answering, summarization, and dialogue. These benchmarks, including datasets like TruthfulQA, FActScore, and HaluEval, provide standardized settings to quantify how and when models deviate from reliable information. Establishing robust hallucination benchmarks is essential for tracking progress, guiding model improvement, and ensuring the development of trustworthy AI systems. Focus While most existing studies evaluate language models primarily through factuality—how closely their outputs align with external facts—hallucination is a broader phenomenon that extends beyond factual correctness. A model may produce information that is plausible yet unsupported or ungrounded in its input, even when not strictly false. This project focuses on measuring such non-factuality hallucinations—content generated without sufficient grounding or evidence in the provided context. By distinguishing hallucination from simple factual errors, the project aims to develop more comprehensive evaluation methods that capture the full range of unfaithful or unsupported model behaviors. Method This project proposes the design of a new benchmark to evaluate hallucinations in long-form generation while clearly distinguishing them from factuality errors. 
Unlike factual inaccuracies, these hallucinations cannot be validated or refuted using external knowledge bases or web search, making them more subtle and challenging to detect. Some potential tasks/scenarios for benchmarks include data analysis, Deep Research, etc.
Query rewriting for caching and security	Yarin Gal, Ilia Shumailov	Artificial Intelligence and Machine Learning			MSc	Prerequisites: Foundational AI/ML background Natural language interfaces for data-driven systems face a fundamental conflict between personalization and efficiency. User queries are frequently "data-dependent," meaning the user's private data and the logical structure of their request are intertwined within the query string itself. This fusion of logic and private data renders traditional caching ineffective, as identical user intentions result in textually unique queries, forcing redundant computation. Furthermore, this design exposes sensitive user data to the core query planning and optimization layers, creating significant privacy vulnerabilities and data leakage risks throughout the system. This research proposes a new AI model to address this challenge by performing intelligent query-rewriting. The model will function as an abstraction layer, intercepting a data-dependent natural language query and transforming it into two distinct components: a canonical, data-independent template that represents the abstract operational intent, and a separate, structured parameter object that isolates all the user-specific data. This decoupling is the central hypothesis, designed to systematically separate the what (the logic) from the who (the data). The benefits of this separation are twofold. First, the data-independent templates become highly cacheable, allowing the system to reuse computationally expensive execution plans for all users expressing the same intent, which promises a significant increase in performance and scalability. Second, it enables a more secure processing model where the generalized template is handled by a public-facing planner, while the isolated, sensitive data is managed by a secure, trusted module. 
This ensures private information is shielded from the main planning environment and only introduced at the final point of execution, greatly enhancing data privacy.
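To make the template/parameter split concrete, here is a minimal sketch in which a regular expression stands in for the learned rewriting model. Everything in it is an illustrative assumption: the `<p0>` placeholder syntax, the rule that only single-quoted strings and standalone integers count as user data, and the `plan_for` cache with its placeholder plan strings.

```python
import re

# Matches single-quoted strings or standalone integers: a crude stand-in for the
# learned rewriter the project proposes.
_LITERAL = re.compile(r"'[^']*'|\b\d+\b")

def rewrite(query):
    """Split a query into a data-independent template and a parameter dict."""
    params = {}

    def _sub(match):
        name = f"p{len(params)}"
        params[name] = match.group(0).strip("'")
        return f"<{name}>"

    template = _LITERAL.sub(_sub, query)
    return template, params

plan_cache = {}

def plan_for(template):
    """Hypothetical planner that caches one 'execution plan' per template."""
    if template not in plan_cache:
        plan_cache[template] = f"PLAN[{template}]"  # placeholder for expensive planning
    return plan_cache[template]

t1, p1 = rewrite("show orders for 'alice@example.com' over 100 pounds")
t2, p2 = rewrite("show orders for 'bob@example.com' over 250 pounds")
plan_for(t1)
plan_for(t2)  # served from the cache: both queries share one template
```

The planner only ever sees `t1` and `t2`; the dictionaries `p1` and `p2` would be handed to the trusted module at execution time.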
Stealing reasoning from a closed source language model	Yarin Gal, Anushka Nair, Yonatan Gideoni	Artificial Intelligence and Machine Learning			MSc	Prerequisites: Completion of the Uncertainty in Deep Learning course, strong mathematical foundations Background Knowledge distillation is traditionally a technique whereby a large trained “teacher” model is used to better train a much smaller “student”. It is now used for many other applications, and recent studies have found that various forms of distillation can transfer behaviours between models. Examples include a non-reasoning language model copying a reasoning model, and quirks such as a distilled model inheriting the teacher’s liking of owls even though the distillation data never mentions them and the distillation is done at the token level rather than the logit level. This last form of distillation has been referred to as “subliminal learning”. Relevant literature: https://arxiv.org/pdf/2509.23886 , https://owls.baulab.info/ Research Questions ● What’s an attack that can steal reasoning capabilities from a closed source model? ● How can we do this so the reasoning traces are still legible? Method ● To be developed, with a focus on simple baselines. LLM inversion may be useful to recover full reasoning traces from partial summaries
Understanding and Evaluating Reasoning in Large Language Models and AI Agents	Yarin Gal, Yihong Chen	Artificial Intelligence and Machine Learning			MSc	Prerequisites: Background in machine learning; familiarity with PyTorch and exposure to Large Language Models. Experience with LLM APIs, agent frameworks, or training pipelines is a plus. Background ● LLMs and agent-based systems achieve strong performance on tasks labeled as “reasoning,” but it remains unclear what these benchmarks truly measure and what factors drive observed improvements. Performance gains may reflect reasoning ability, scale, data, system design, or evaluation artifacts. Clarifying these issues is essential for reliable AI evaluation and interpretation. Focus ● This project examines both the validity of reasoning benchmarks and the drivers of reasoning performance in LLMs and AI agents. Key questions include what current evaluations actually measure and which factors most contribute to reasoning success. Method The project will build on existing reasoning benchmarks and evaluation frameworks used in recent AI research (e.g. FrontierScience, ARC-AGI, ARC AI2, GSM-Symbolic, LogicBench, GenBench). Students will survey relevant literature on LLM and agentic reasoning, and perform empirical analyses on selected benchmarks using modern LLMs and agentic systems. [1] Mirzadeh, Seyed Iman, et al. "GSM-Symbolic: Understanding the Limitations of Mathematical Reasoning in Large Language Models." The Thirteenth International Conference on Learning Representations. [2] Parmar, Mihir, et al. "LogicBench: Towards Systematic Evaluation of Logical Reasoning Ability of Large Language Models." Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). 2024. Goals: ● Essential: Review and analyze existing reasoning benchmarks for LLMs and agents; identify patterns, assumptions, and limitations. 
● Essential: Empirically evaluate model behavior across models, tasks, and system configurations. ● Stretch: Propose or prototype alternative evaluation perspectives or diagnostic analyses for reasoning.
Understanding and mitigating hallucinations in LLM distillation	Yarin Gal, Lin Li	Artificial Intelligence and Machine Learning			MSc	Prerequisites: 1. E: familiar with the basics of LLMs 2. D: familiar with model distillation and LLM hallucination detection Background Large language models (LLMs) have demonstrated remarkable capabilities across reasoning, generation, and knowledge-intensive tasks, yet their immense computational and memory demands limit accessibility and real-world deployment. Model distillation offers a promising solution—compressing the knowledge of powerful teacher models into smaller, faster students while maintaining much of their performance. This process enables efficient, cost-effective, and on-device language intelligence. However, distilled models often inherit or even amplify the hallucination tendencies of their teachers, posing risks to factual reliability and trustworthiness. Understanding and mitigating hallucination during knowledge distillation is therefore both scientifically important and practically valuable. Our project explores methods to reduce hallucination in distilled models, contributing to the broader effort to make compact LLMs not only efficient but also faithful and dependable. Focus This project focuses on understanding how student models inherit hallucinations from teacher models, and the methods to reduce hallucination in distilled models by intervening in the distillation process. Method Students could begin by reviewing recent advances in large language model distillation [1, 2] and hallucination detection like [3, 4]. The project will first conduct an empirical study to examine how student models inherit hallucinations from their teacher models. Building on these insights, the project will develop a method that uses hallucination detectors to identify and filter hallucinated content in the teacher’s outputs before distillation, aiming to reduce hallucinations in the resulting student models. 
[1] Zhu, Xunyu, et al. "A survey on model compression for large language models." Transactions of the Association for Computational Linguistics 12 (2024): 1556-1577. [2] Xu, Xiaohan, et al. "A survey on knowledge distillation of large language models." arXiv preprint arXiv:2402.13116 (2024). [3] Farquhar, Sebastian, et al. "Detecting hallucinations in large language models using semantic entropy." Nature 630.8017 (2024): 625-630. [4] Obeso, Oscar, et al. "Real-time detection of hallucinated entities in long-form generation." arXiv preprint arXiv:2509.03531 (2025).
Topics in Randomised Algorithms and Computational Complexity	Andreas Galanis	Algorithms and Complexity Theory		C	MSc	Description: Andreas Galanis is willing to supervise projects in the areas of randomised algorithms and computational complexity. Problems of interest include (i) the analysis of average case instances of hard combinatorial problems (example: can we satisfy a random Boolean formula?), and (ii) the computational complexity of counting problems (example: can we count approximately the number of proper colourings of a graph G?). The projects would suit mathematically oriented students, especially those with an interest in applying probabilistic methods to computer science.
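The first example question above (can we satisfy a random Boolean formula?) can be explored numerically on tiny instances. The sketch below samples random 3-CNF formulas and checks satisfiability by brute force. The sampling model (k distinct variables per clause, independent fair signs) and all function names are illustrative assumptions, and the exhaustive search is only feasible for a handful of variables.

```python
import itertools
import random

def random_k_sat(n_vars, n_clauses, k=3, rng=None):
    """Sample a random k-CNF formula: each clause picks k distinct variables and random signs.

    A literal is a nonzero int: +v means variable v, -v means its negation.
    """
    rng = rng or random.Random()
    formula = []
    for _ in range(n_clauses):
        variables = rng.sample(range(1, n_vars + 1), k)
        formula.append([v if rng.random() < 0.5 else -v for v in variables])
    return formula

def is_satisfiable(formula, n_vars):
    """Brute-force check over all 2^n assignments (only feasible for small n)."""
    for bits in itertools.product([False, True], repeat=n_vars):
        if all(any(bits[abs(l) - 1] == (l > 0) for l in clause) for clause in formula):
            return True
    return False

def estimate_sat_probability(n_vars, ratio, trials=50, seed=0):
    """Fraction of sampled formulas with int(ratio * n_vars) clauses that are satisfiable."""
    rng = random.Random(seed)
    n_clauses = int(ratio * n_vars)
    hits = sum(is_satisfiable(random_k_sat(n_vars, n_clauses, rng=rng), n_vars)
               for _ in range(trials))
    return hits / trials
```

Calling `estimate_sat_probability(10, r)` for a range of clause-to-variable ratios `r` gives a small-scale view of how the satisfiable fraction falls as the ratio grows; for random 3-SAT the sharp drop is empirically reported near a ratio of roughly 4.27 for large instances.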
Topics in rapid mixing Markov chains	Andreas Galanis	Algorithms and Complexity Theory		C	MSc	Prerequisites: Probability theory, combinatorics. Background *Xusheng* is willing to supervise projects in the areas of sampling algorithms and the design and analysis of Markov chains, with a focus on probabilistic and combinatorial methods. An example project could be the analysis of convergence rates for non-reversible Markov chains. Irreversible chains offer both theoretical challenges and practical benefits, providing insights into stochastic dynamics and enabling the design of faster algorithms for sampling and optimization tasks in fields like statistical physics and machine learning. Their analysis, however, presents unique challenges as classical tools like spectral analysis and conductance bounds may not directly apply. Focus These projects would suit students with a strong mathematical background, particularly those interested in probability theory, combinatorics, and theoretical computer science. Implementation skills can also be useful, as some projects may involve designing algorithms, running simulations, or exploring computational aspects to complement theoretical insights. Method https://journals.aps.org/prl/abstract/10.1103/PhysRevLett.119.240603
AI for Photonic Design	Jiarui Gan, Mengyun Wang	Artificial Intelligence and Machine Learning			MSc	Prerequisites: AI/ML, reinforcement learning, Python Background Photonic materials and devices, such as metasurfaces and photonic crystals, are crucial in applications ranging from telecommunications to biomedical sensing. Traditional approaches to designing these structures often rely on extensive trial-and-error simulations, which can be time-consuming and computationally expensive. By using ML and AI, it becomes possible to explore large design spaces more efficiently, leading to novel photonic structures with optimized performance. AI-based photonic design typically involves: Surrogate modelling: Training a model (e.g., a neural network) to predict optical responses (reflection, transmission, bandgap properties) from geometrical parameters. Inverse design: Learning to propose geometry configurations that meet specific optical targets directly. Recent work in this field has shown significant reductions in computational cost and enhancements in device performance, making AI-driven photonic design an increasingly promising research direction. This project will develop and evaluate a machine learning approach for the inverse design of a particular class of photonic structures (e.g., photonic crystals, metasurfaces). The main question is how effectively AI-based methods can discover device geometries that meet predefined optical goals compared to established optimization methods. Focus Expected Contributions and goals include: ML Model Training: - Implement a forward model (predicting optical response from geometry) and/or an inverse model (predicting geometry from target responses). - Assess prediction accuracy using relevant metrics. Baseline Comparison: - Compare AI-driven design strategies with a conventional optimization approach (e.g., genetic algorithm or gradient-based method). - Evaluate time-to-solution, final device performance, and robustness. 
Insights and Design Principles: - Derive insights into how AI-driven solutions differ from conventional methods. - Highlight potential new strategies or patterns in photonic device engineering. Method We expect to explore reinforcement learning or Bayesian optimization to iteratively refine designs with fewer required simulations. Familiarity with these areas and AI/ML foundations is therefore required. Proficiency in implementing ML algorithms is also preferred. References [1] Chen, Mu Ku, et al. "Artificial intelligence in meta-optics." Chemical Reviews 122.19 (2022): 15356-15413. [2] Bonfanti, Silvia, et al. "Computational design of mechanical metamaterials." Nature Computational Science 4.8 (2024): 574-583. [3] So, Sunae, Trevon Badloe, Jaebum Noh, Jorge Bravo-Abad, and Junsuk Rho. "Deep learning enabled inverse design in nanophotonics." Nanophotonics 9, no. 5 (2020): 1041-1057.
Dynamic information design	Jiarui Gan	Artificial Intelligence and Machine Learning			MSc	In information design, a more-informed player (sender) influences a less-informed decision-maker (receiver) by signalling information about the state of the world. The problem for the sender is to compute an optimal signalling strategy, which leads to the receiver taking actions that benefit the sender. Dynamic information design, as a new frontier of information design, generalises the one-shot framework to dynamic settings that are modelled based on Markov decision processes. The goal of the project is to study several variants of the dynamic information design problem and it can be approached from both theoretical and empirical perspectives. Theoretically, the focus is on determining the computational complexity of the optimal information design in different dynamic settings and developing algorithms. A background in computational complexity and algorithm design is beneficial. Practically, the objective is to apply existing algorithms to novel applications, such as traffic management or board games, and to develop algorithms that work effectively in real-world scenarios using state-of-the-art methods. Knowledge in Markov Decision Processes, stochastic/sequential games, and Reinforcement Learning is preferred. Related work: J. Gan, R. Majumdar, G. Radanovic, A. Singla. Bayesian persuasion in sequential decision-making. AAAI '22 E. Kamenica, M. Gentzkow. Bayesian persuasion. American Economic Review, 2011 S. Dughmi. Algorithmic information structure design: a survey. ACM SIGecom Exchanges 15.2 (2017): 2-24. Wu, J., Zhang, Z., Feng, Z., Wang, Z., Yang, Z., Jordan, M. I., & Xu, H. (2022). Sequential Information Design: Markov Persuasion Process and Its Efficient Reinforcement Learning. arXiv preprint arXiv:2202.10678.
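For orientation, the one-shot setting that dynamic information design generalises can be worked through in a few lines. The sketch below covers only the simplest case, with a binary state and a binary receiver action, in the style of the prosecutor and judge example from the Kamenica and Gentzkow paper cited above. The function name and the threshold parametrisation are assumptions made for illustration, and nothing here addresses the Markov decision process setting of the project.

```python
def optimal_binary_persuasion(prior, threshold):
    """One-shot Bayesian persuasion with a binary state and a binary action.

    The receiver takes the sender-preferred action iff the posterior probability of the
    'high' state is at least `threshold`. Returns (q, success_probability), where q is
    the probability of recommending the action when the state is 'low'.
    """
    if prior >= threshold:
        return 1.0, 1.0  # the receiver already acts without any information
    # Always recommend in the high state; in the low state recommend with probability q,
    # chosen so that the posterior after a recommendation equals the threshold exactly.
    q = prior * (1 - threshold) / (threshold * (1 - prior))
    return q, prior + (1 - prior) * q

# Prior belief of 0.3 in the high state; the receiver acts when the posterior reaches 0.5.
q, success = optimal_binary_persuasion(prior=0.3, threshold=0.5)
```

With these numbers the sender recommends the action in the low state with probability 3/7, and the receiver acts with probability 0.6 overall, compared with 0.3 under full disclosure.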
Scalable Equilibrium Computation in n-Player General-Sum Games	Jiarui Gan	Artificial Intelligence and Machine Learning			MSc	Prerequisites: Computational Game Theory, Algorithm Design and Analysis, Linear Programming, Python Background Computing equilibria—such as Nash equilibria (NE) or their refinements—is a fundamental challenge in game theory and artificial intelligence. Traditional methods can become intractable as the number of agents and strategy spaces increase. To address this, Policy Space Response Oracles (PSRO) and its extension Joint Policy Space Response Oracles (JPSRO) have emerged as powerful frameworks [1,2]. These algorithms iteratively expand the strategy space by computing best responses and refining equilibrium approximations [3]. PSRO considers best responses for each player independently, while JPSRO computes joint best responses, often leading to more coordinated and efficient outcomes. However, there is room to further improve JPSRO by introducing more structured methods for joint strategy generation and leveraging scalable optimisation techniques. The goal of this project is to investigate these methods, establish their theoretical underpinnings, implement the algorithms, and demonstrate their efficiency through empirical experiments. In doing so, we aim to develop practical tools for equilibrium computation in multi-agent games. Focus Theoretical Algorithm Design - Develop a novel scalable algorithm for computing equilibria of large, general-sum n-player games. Prove convergence theorems, including bounds on convergence speed and approximation quality. - Investigate additional algorithmic properties regarding the scalability of the approach. Empirical Implementation and Evaluation - Implement the proposed algorithm, alongside relevant benchmarks. Compare against established baselines to demonstrate computational efficiency and enhanced coordination. - Provide documentation to guide future research on scalable equilibrium computation. 
Method The theoretical part of the work will be proof-based. The empirical part uses Python to implement the algorithm. The following skills are essential for this project: - Familiarity with game theory, algorithm design, and optimisation methods like linear programming. - Comfort with formal proofs and mathematical writing. - Proficiency in Python. References [1] Lanctot, Marc, Vinicius Zambaldi, Audrunas Gruslys, Angeliki Lazaridou, Karl Tuyls, Julien Pérolat, David Silver, and Thore Graepel. "A unified game-theoretic approach to multiagent reinforcement learning." Advances in neural information processing systems 30 (2017). [2] Marris, Luke, Paul Muller, Marc Lanctot, Karl Tuyls, and Thore Graepel. "Multi-agent training beyond zero-sum with correlated equilibrium meta-solvers." In International Conference on Machine Learning, pp. 7480-7491. PMLR, 2021. [3] Bighashdel, Ariyan, Yongzhao Wang, Stephen McAleer, Rahul Savani, and Frans A. Oliehoek. "Policy space response oracles: A survey." arXiv preprint arXiv:2403.02227 (2024).
Few-step Distillation for Flow Matching Generative Models	Yicheng Gao, Niki Trigoni	Artificial Intelligence and Machine Learning		C	MSc	Abstract Recent advances in generative AI have led to rapid progress in image, video, and multimodal generation. Many of these systems rely on diffusion or flow-based generative frameworks, and Flow Matching has become a promising approach due to its strong generative quality and simpler training objectives. A key practical limitation, however, is that Flow Matching models still require multi-step ODE integration at inference time, making fast sampling an active challenge. Distillation provides a way to accelerate generation by transferring the behaviour of a pretrained model into a more efficient few-step or one-step generator, and several alternative formulations have been proposed for this purpose. This project will investigate a distillation approach for Flow Matching models and evaluate its effectiveness relative to standard few-step baselines in terms of sample quality, stability, and efficiency. Pre-requisites: Suitable for those who have taken a course in machine learning. Some familiarity with PyTorch would be beneficial. References: [1] Lipman, Yaron, et al. "Flow matching for generative modeling." International Conference on Learning Representations (ICLR), 2023. arXiv:2210.02747. [2] Yin, Tianwei, et al. "One-step diffusion with distribution matching distillation." Proceedings of the IEEE/CVF conference on computer vision and pattern recognition (CVPR). 2024. arXiv:2311.18828 [3] Frans, Kevin, et al. "One Step Diffusion via Shortcut Models." International Conference on Learning Representations (ICLR), 2025. arXiv:2410.12557 [4] Geng, Zhengyang, et al. "Mean Flows for One-step Generative Modeling." Advances in Neural Information Processing Systems (NeurIPS), 2025. arXiv:2505.13447
Probability Path Design for Discrete Flow Matching Generative Models	Yicheng Gao, Niki Trigoni	Artificial Intelligence and Machine Learning		C	MSc	Abstract Discrete Flow Matching (DFM) is a recently introduced framework for non-autoregressive generation over discrete sequences such as text or code. In contrast to continuous flow models, which evolve samples within a continuous space, DFM defines a flow over discrete state spaces through probability paths that connect a simple source distribution to the data distribution. A key modelling choice is how this path is constructed, including the scheduling function, the coupling between source and target sequences, and the form of intermediate conditional distributions. Prior work has shown that these design decisions can affect perplexity, convergence behaviour and sampling stability, even when the underlying model remains unchanged. This project will examine how different definitions of probability paths influence the performance of discrete flow models on small-scale datasets. The work will involve implementing DFM with several path choices and scheduling strategies, and comparing their effects on sampling quality, convergence and robustness. The goal is to understand how path design shapes the behaviour of discrete flows and to identify settings that lead to more reliable or efficient generation. Pre-requisites: Suitable for those who have taken a course in machine learning. Some familiarity with PyTorch would be beneficial. References: [1] Lipman, Yaron, et al. "Flow matching for generative modeling." International Conference on Learning Representations (ICLR), 2023. arXiv:2210.02747. [2] Gat, Itai, et al. "Discrete flow matching." Advances in Neural Information Processing Systems 37 (NeurIPS), 2024. [3] Austin, Jacob, et al. "Structured denoising diffusion models in discrete state-spaces." Advances in neural information processing systems 34 (NeurIPS), 2021.
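As a rough illustration of the path-design choices described above, the sketch below samples an intermediate sequence from a mixture-style probability path: each token is taken independently from the target sequence with probability kappa(t) and from the source sequence otherwise. The scheduler names and the `sample_xt` helper are hypothetical and chosen only for illustration. The coupling is a fixed (source, target) pair; the project may use different paths and couplings.

```python
import math
import random

# Hypothetical schedulers kappa(t): each maps t in [0, 1] to [0, 1],
# with kappa(0) = 0 (pure source) and kappa(1) = 1 (pure data).
SCHEDULERS = {
    "linear": lambda t: t,
    "quadratic": lambda t: t * t,
    "cosine": lambda t: 0.5 * (1.0 - math.cos(math.pi * t)),
}

def sample_xt(x0, x1, t, scheduler="linear", rng=random):
    """Sample x_t token by token: keep the target token x1[i] with
    probability kappa(t), otherwise keep the source token x0[i]."""
    kappa = SCHEDULERS[scheduler](t)
    return [b if rng.random() < kappa else a for a, b in zip(x0, x1)]

if __name__ == "__main__":
    source = ["<mask>"] * 5          # assumed all-mask source sequence
    target = list("hello")
    for name in SCHEDULERS:
        print(name, sample_xt(source, target, 0.5, scheduler=name))
```

Swapping the scheduler changes how quickly tokens are revealed along the path, which is one of the design decisions the project sets out to compare.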
Shortcut Models for Efficient Sampling in Flow-Based Generative Models	Yicheng Gao, Niki Trigoni	Artificial Intelligence and Machine Learning		C	MSc	Abstract Flow-based generative models normally require many ODE integration steps to produce high-quality samples. Shortcut models aim to accelerate sampling by replacing this step-by-step integration with a single update or a small number of updates. Standard samplers follow the instantaneous velocity of the flow at each step, while shortcut methods instead predict a more direct transformation from noise to data, such as an average velocity or a displacement estimate. Because these approximations capture only part of the underlying dynamics, different shortcut designs can vary substantially in accuracy and stability. This project will investigate how different shortcut formulations affect the quality of fast sampling in flow-based generative models. The work will analyse continuous-time flows and small pretrained generative models, comparing several choices of velocity or displacement representations for one-step or few-step generation. The project will evaluate these methods in terms of approximation accuracy, stability, and their ability to match the behaviour of standard multi-step samplers. Pre-requisites: Suitable for those who have taken a course in machine learning. Some familiarity with PyTorch would be beneficial. References: [1] Lipman, Yaron, et al. "Flow matching for generative modeling." International Conference on Learning Representations (ICLR), 2023. arXiv:2210.02747. [2] Frans, Kevin, et al. "One Step Diffusion via Shortcut Models." International Conference on Learning Representations (ICLR), 2025. arXiv:2410.12557 [3] Geng, Zhengyang, et al. "Mean Flows for One-step Generative Modeling." Advances in Neural Information Processing Systems (NeurIPS), 2025. arXiv:2505.13447 [4] Shafir et al., "Terminal Velocity Matching," 2025. arXiv:2511.19797. https://lumalabs.ai/blog/engineering/tvm
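The difference between following the instantaneous velocity and using an average velocity can be seen on a toy one-dimensional flow. The sketch below assumes the velocity field dx/dt = x, chosen only because its solution is known in closed form; in a real shortcut model the average velocity would be predicted by a trained network rather than computed exactly.

```python
import math

def velocity(x, t):
    """Toy instantaneous velocity field (an assumption): dx/dt = x."""
    return x

def euler_sample(x0, steps):
    """Standard sampler: integrate from t=0 to t=1 with `steps` Euler steps."""
    x, dt = x0, 1.0 / steps
    for k in range(steps):
        x += dt * velocity(x, k * dt)
    return x

def average_velocity(x0):
    """Average velocity over [0, 1], i.e. (x(1) - x(0)) / 1.
    Known exactly here because x(1) = x0 * e for this toy flow."""
    return x0 * (math.e - 1.0)

def one_step_sample(x0):
    """Shortcut-style sampler: a single update using the average velocity."""
    return x0 + average_velocity(x0)

if __name__ == "__main__":
    x0 = 1.0
    print("exact endpoint      :", x0 * math.e)
    print("Euler, 1 step       :", euler_sample(x0, 1))
    print("Euler, 100 steps    :", euler_sample(x0, 100))
    print("average vel., 1 step:", one_step_sample(x0))
```

A single Euler step with the instantaneous velocity misses the endpoint, whereas a single step with the exact average velocity reaches it. The error of a learned approximation to that average is what different shortcut designs trade off.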
Training-Free Video Editing with Pretrained Flow-Based Generative Models	Yicheng Gao, Niki Trigoni	Artificial Intelligence and Machine Learning		C	MSc	Abstract Training-free generative methods have recently become popular for image editing tasks, where a pretrained model is guided by text prompts to modify an input image without any additional model training. Such methods are widely used for style changes, attribute edits, and super-resolution, and the goal is to alter selected visual attributes while preserving the overall scene and identity of the input. Extending this paradigm to video is more challenging, as video edits must stay consistent across time, maintain object identity, and avoid frame-wise drift or flicker. This project will explore how training-free image-editing techniques can be adapted to video using pretrained flow-based generative models. The work will evaluate whether this approach can produce coherent, prompt-aligned edits and will compare its performance with inversion-based and optimisation-based baselines. Pre-requisites: Suitable for those who have taken a course in machine learning. Some familiarity with PyTorch would be beneficial. References: [1] Lipman, Yaron, et al. "Flow matching for generative modeling." International Conference on Learning Representations (ICLR), 2023. arXiv:2210.02747. [2] Meng, Chenlin, et al. "SDEdit: Guided Image Synthesis and Editing with Stochastic Differential Equations" International Conference on Learning Representations (ICLR), 2022. arXiv:2108.01073. [3] Kulikov, Vladimir, et al. "FlowEdit: Inversion-free text-based editing using pre-trained flow models." Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), 2025. [4] Mokady, Ron, et al. "Null-text inversion for editing real images using guided diffusion models." Proceedings of the IEEE/CVF conference on computer vision and pattern recognition (CVPR), 2023.
Cake-cutting with low envy	Paul Goldberg	Algorithms and Complexity Theory		C		Cake-cutting refers to the design of protocols to share a divisible good amongst a collection of agents. A standard starting-point for cake-cutting protocols is the classical "I cut, you choose" rule. This rule is said to be "envy-free" since each player can ensure that they value their own share at least as much as they value the other player's share. Well-known further work has extended this idea to more than two players. In the paper at the URL below, we identify various classes of protocols, and show that one can convert protocols from one kind to another so as to maintain the worst-case level of envy that results. https://arxiv.org/abs/2108.03641 The project is to look at the computational complexity of evaluating the worst-case level of envy that can result from using a given protocol, and related questions about protocols belonging to classes of interest. The project is mainly based on mathematical analysis as opposed to computational experiment, but there is scope for some computational experiment, for example in searching for value functions that result in high envy.
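For the computational-experiment side, a minimal sketch of the classical "I cut, you choose" rule is given below. It assumes the cake is the interval [0, 1] and that each agent's valuation is a piecewise-constant density over equal-width segments; the function names are hypothetical and are not taken from the paper.

```python
def value(density, a, b):
    """Value of the piece [a, b] under a piecewise-constant density on [0, 1]."""
    n = len(density)
    total = 0.0
    for i, d in enumerate(density):
        lo, hi = i / n, (i + 1) / n
        overlap = max(0.0, min(b, hi) - max(a, lo))
        total += d * overlap
    return total

def half_point(density):
    """Point x such that [0, x] is worth half the cake to this agent."""
    n = len(density)
    target = value(density, 0.0, 1.0) / 2
    acc = 0.0
    for i, d in enumerate(density):
        seg = d / n
        if d > 0 and acc + seg >= target:
            return i / n + (target - acc) / d
        acc += seg
    return 1.0

def cut_and_choose(cutter, chooser):
    """Cutter halves the cake by their own valuation; chooser takes the
    piece they prefer. Returns the cut point and each agent's envy, where
    envy = (value of the other's piece) - (value of own piece)."""
    x = half_point(cutter)
    left, right = (0.0, x), (x, 1.0)
    if value(chooser, *left) >= value(chooser, *right):
        chooser_piece, cutter_piece = left, right
    else:
        chooser_piece, cutter_piece = right, left
    cutter_envy = value(cutter, *chooser_piece) - value(cutter, *cutter_piece)
    chooser_envy = value(chooser, *cutter_piece) - value(chooser, *chooser_piece)
    return x, cutter_envy, chooser_envy

if __name__ == "__main__":
    print(cut_and_choose([3, 1, 0, 0], [1, 1, 1, 1]))
```

Neither agent's envy is positive under this rule. A search over densities for protocols that are not envy-free could reuse the same `value` helper to measure how much envy arises.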
Learning probabilistic automata	Paul Goldberg	Algorithms and Complexity Theory		C		Probabilistic automata are a means of producing randomly-generated strings that are accepted/recognised by a finite automaton. (They are conceptually similar to hidden Markov models.) The general topic of the project is to find algorithms to reconstruct an automaton based on observations of strings generated by that automaton. It's a topic that has led to a steady stream of papers in the research literature, and the project would focus on one of the many variants of the problem. For example, "labelled automata" in which partial information is provided about the current state of the automaton in the form of random labels associated with states. Another general issue is how to efficiently extract extra information present in long strings generated by the unknown automaton. Within this project topic, there is scope for focusing either on experiments, or on mathematical analysis of algorithms. In the latter case, it would be helpful to know the basic concepts of computational learning theory.
Cooperative Capabilities in Multi-Agent AI Systems: Formalisation, Training, and Evaluation	Imran Hashmi, Michael Wooldridge	Artificial Intelligence and Machine Learning			MSc	This thesis investigates the specification, emergence, and evaluation of cooperative capabilities in multi-agent AI systems. Cooperative capabilities are defined as functional properties that enable agents to collaborate effectively toward collective objectives. Two primary methodologies: (a) Game-theoretic modelling and (b) Multi-Agent Reinforcement Learning (MARL), will be explored to formalise, train, and evaluate cooperative behaviour. Through a case study involving resource allocation in multi-agent systems, this research demonstrates how agents can overcome challenges of cooperative AI. The main contribution of this thesis will be a comprehensive framework and actionable strategies to design and evaluate cooperative systems, enabling effective collaboration in dynamic and complex environments. Goals and Objectives Formalisation of Cooperative Capabilities Identify key aspects of cooperative capabilities: communication, coordination, negotiation, and collective planning. Create a taxonomy categorising these capabilities. Develop formal definitions of cooperative capabilities using game-theoretic models and MARL frameworks. Metrics for Rigorous Evaluation Design robust, interpretable metrics for measuring cooperative capabilities. Validate metrics using theoretical models and empirical results. Training and Emergence Studies Investigate conditions under which cooperative capabilities emerge, considering the differences in training techniques between game theory-based modelling and MARL. Perform empirical experiments in multi-agent settings to evaluate cooperative behaviours. Impact Analysis of Asymmetries Asymmetric agent capabilities refer to differences in the abilities, resources, or knowledge that agents possess in a multi-agent system. 
Model scenarios with asymmetric agent capabilities and evaluate their effects on collective outcomes. Propose methods to mitigate risks from imbalances, such as reward redistribution or coalition-building mechanisms. Strategies for Fostering Cooperation Suggest interventions to enhance cooperative outcomes using methods like fine-tuning policies or designing cooperative reward structures. Adapt interventions for game-theoretic and MARL-based systems to test their effectiveness. Tasks Literature Review and Taxonomy Development Review existing research on cooperative AI in both game-theoretic and MARL paradigms. Develop a comprehensive taxonomy of cooperative capabilities. Formal Specification (Game Theory) Define cooperative behaviours using game-theoretic models such as Nash equilibrium, Pareto efficiency, and cooperative game solutions. Specify desirable and undesirable behaviours while integrating ethical considerations. Learning-Based Modelling (MARL) Train agents using MARL frameworks with cooperative policies, such as value decomposition networks or shared reward mechanisms. Design reward structures and communication protocols to foster cooperation. Design of Evaluation Metrics Create and validate task-specific metrics for both approaches (e.g., efficiency, fairness, and stability). Simulation and Empirical Studies For Game Theory: Design controlled settings to explore theoretical cooperation strategies under bounded rationality and asymmetry in Agent-based simulation frameworks For MARL: Simulate dynamic multi-agent environments with emergent behaviours using platforms like OpenAI Gym or Unity ML-Agents. Impact Analysis Investigate the influence of asymmetries (e.g., mobility, resource access) on cooperation in game-theoretic and MARL systems. Develop strategies to mitigate negative impacts in each framework. Intervention Design Propose and implement interventions, such as dynamic coalitions or hybrid training strategies. 
Test interventions across both approaches to assess their scalability and robustness. Research Methodology Game-Theoretic Modelling Theoretical Framework Define agent interactions in an ABM using cooperative and non-cooperative game theory. Verification Tools Use formal methods like probabilistic model checking (e.g., PRISM) to validate theoretical models. Simulation and Analysis Create small-scale simulations to test theoretical predictions in controlled environments. Multi-Agent Reinforcement Learning Model Training Train agents in MARL environments Implement cooperative reward structures and communication protocols. Simulation Environments Use platforms like OpenAI Gym, Unity ML-Agents, or custom-built environments to train and test agents. Experimental Analysis Test training strategies to evaluate emergent cooperative behaviours. Case Study: Multi-Agent Resource Allocation Game Objective Evaluate cooperative capabilities in a resource-constrained environment using game-theoretic modelling or MARL. Scenario Environment: A grid world where agents compete for limited resources such as food, water, and energy. Goal: Balance individual needs with collective objectives through cooperative mechanisms. Tasks Baseline Experiment Game Theory: Define the resource-sharing game and compute theoretical cooperative solutions (e.g., Pareto-optimal allocations). MARL: Implement baseline training for agents to collect resources individually without cooperation. Cooperative Experiment Game Theory: Introduce incentive schemes or binding agreements to encourage cooperation and measure changes in efficiency. MARL: Enable agent communication and train cooperative policies to optimise resource allocation collectively. Asymmetry Experiment Game Theory: Introduce capability asymmetries (e.g., restricted resource access) and analyse their impact on cooperation. MARL: Simulate agents with heterogeneous abilities and explore strategies to mitigate disparities. 
Metric Validation Apply proposed metrics to quantify cooperation levels and validate their effectiveness for both methods. Intervention Design Develop strategies for improving cooperation, such as adaptive reward redistribution or prioritising weaker agents. Test interventions using both game-theoretic and MARL setups. Reflection Compare outcomes of the game-theoretic and MARL approaches. Propose improvements and future directions for research. Technical Requirements Suitability: MSc Prerequisites Proficiency in Python and familiarity with game theory, agent-based modelling, or reinforcement learning. Knowledge of MARL libraries (e.g., PyTorch, TensorFlow) or formal verification tools (e.g., PRISM). Software and Tools Game Theory: ABM framework (e.g., Mesa, Agents.jl, BEAST); formal verification tools (PRISM, SPIN). MARL: Simulation platforms (OpenAI Gym, Unity ML-Agents). Hardware Requirements Access to GPU-enabled machines for ABM simulations and MARL training. Deliverables Formal Definitions and Taxonomy Detailed formal definitions of cooperative capabilities, including a comprehensive taxonomy. Metrics and Evaluation Tools Validated metrics and tools applicable to game-theoretic and MARL approaches. Empirical Results Comparative analysis of cooperative behaviours under both paradigms. Strategies for Enhancing Cooperation Interventions tailored for game-theoretic and MARL systems, with tested results and guidelines. References Dafoe, A., Hughes, E., Bachrach, Y., Collins, T., McKee, K. R., Leibo, J. Z., Larson, K., & Graepel, T. (2020), Open Problems in Cooperative AI. arXiv preprint arXiv:2012.08630. Conitzer, V., & Oesterheld, C. (2023) Foundations of Cooperative AI. Proceedings of the AAAI Conference on Artificial Intelligence, 37(13), 15359-15367. Barton, S. L., Waytowich, N. R., Zaroukian, E., & Asher, D. E. (2019), Measuring Collaborative Emergent Behavior in Multi-Agent Reinforcement Learning. 
In Human Systems Engineering and Design: Proceedings of the 1st International Conference on Human Systems Engineering and Design (IHSED2018) (pp. 422-427). Springer International Publishing. Shoham, Y., & Leyton-Brown, K. (2008), Multiagent Systems: Algorithmic, Game-Theoretic, and Logical Foundations. Cambridge University Press. An Introduction to MultiAgent Systems - Second Edition by Michael Wooldridge Published May 2009 by John Wiley & Sons Multi-Agent Reinforcement Learning Foundations and Modern Approaches By Stefano V. Albrecht, Filippos Christianos, Lukas Schäfer · 2024 Omicini, A., Petta, P., & Pitt, J. (Eds.). (2004) Engineering Societies in the Agents World IV. Springer. Vinyals, O., Babuschkin, I., Chung, J., Mathieu, M., Jaderberg, M., Czarnecki, W. M., ... & Silver, D. (2019).
Incorporating Reinforcement Learning into the Big and Efficient Agent-based Simulation Toolkit (BEAST)	Imran Hashmi, Michael Wooldridge	Artificial Intelligence and Machine Learning			MSc	Introduction Agent-Based Models (ABMs) are powerful tools for simulating complex systems, where individual agents operate based on predefined rules, leading to the emergence of collective behaviours. These emergent phenomena can be either desirable, such as achieving equilibrium or convergence, or undesirable, such as runaway snowball effects that amplify systemic instability. ABMs provide a flexible framework to analyse and predict such outcomes, enabling insights into the underlying dynamics of complex systems. BEAST (Big and Efficient Agent-based Simulation Toolkit) is a high-performance, modular simulation platform designed for modelling large-scale ABM simulations. It leverages advanced computational techniques such as GPU acceleration, PyTorch vectorized operations, and scalable batch processing to handle complex interactions between agents and their environments. BEAST has been used to study the spatial ecology and complex population dynamics of a large genetically modified mosquito population. By integrating biological behaviours, spatial dynamics, and environmental feedback, BEAST enables researchers to explore scenarios such as species dispersal, population dynamics, and genetic inheritance with unprecedented detail and efficiency. Its modular design ensures flexibility, allowing users to implement custom models, define agent behaviours, and integrate geospatial data for realistic simulations. With support for millions of agents and multi-GPU capabilities, BEAST is particularly well-suited for studying ecological processes at large temporal and spatial scales. However, current implementations rely on predefined rules, limiting the model's adaptability and responsiveness to dynamic environments. 
Reinforcement Learning (RL), a subfield of machine learning, enables agents to learn optimal behaviours through interaction with their environment. Integrating RL into BEAST can enable agents to adaptively achieve desirable emergent behaviours, advancing the state-of-the-art in ABM design and providing new capabilities for simulating real-world scenarios. This thesis proposes a novel integration of RL into the BEAST framework to train agents to optimise survival and reproduction under dynamic environmental conditions. The methodology focuses on leveraging RL to guide agent decision-making, achieving emergent system-level objectives without requiring exhaustive rule definitions. State of the Art: Reinforcement Learning in ABMs Current research demonstrates promising applications of RL in ABMs, including adaptive traffic management, ecological simulations, and urban development modelling. RL has been particularly effective in enabling: Dynamic Strategy Adaptation: Agents learn strategies for resource acquisition and risk avoidance. Scalable Coordination: Multi-agent RL techniques enhance coordination among agents to achieve collective goals. Optimised Emergent Phenomena: RL facilitates desirable emergent patterns, such as survivability, stability or equilibrium, in systems with complex interactions. Core Contribution The primary contribution of this thesis is to: Develop a methodology for integrating RL into the BEAST framework, enabling agents to learn behaviours that optimise and achieve desirable emergence (e.g., survivability of mosquitoes carrying a particular gene). Demonstrate the scalability and effectiveness of RL in large-scale ABMs, addressing computational and behavioural complexities. Introduce a case study using mosquitoes as agents that utilise RL to improve survival and reproduction by adapting to environmental changes, including resource availability, predators, and spatial boundaries. 
Goals and Objectives Enhance BEAST with RL Capabilities: Design RL modules for agent decision-making. Implement scalable RL training algorithms. Achieve Desirable Emergence in ABMs: Define metrics to evaluate emergent phenomena. Develop training objectives for aligning RL policies with system-level goals. Validate RL-Enhanced BEAST Framework: Simulate mosquito survival scenarios. Compare RL-driven and rule-based agent behaviours in achieving emergent objectives. Methodology Framework Enhancement: Integrate RL libraries (e.g., PyTorch) with BEAST’s core modules. Design interfaces for RL policy updates and reward assignment in the BEAST simulation loop. Training Process: Define states (e.g., resource availability, neighbouring agents), actions (e.g., movement, reproduction), and rewards (e.g., survival, offspring count). Employ Actor-Critic or Proximal Policy Optimisation (PPO) algorithms for efficient training. Emergence Optimization: Formulate system-level objectives as emergent properties (e.g., population stability). Train agents to maximise individual rewards aligned with global objectives. Case Study Implementation: Simulate mosquito agents learning to optimise survival by avoiding predators, seeking mates, and locating resources using RL policies. Evaluate outcomes with and without RL. Technical Requirements Hardware: GPU-accelerated compute resources for scalable RL training. High-memory nodes for large-scale ABM simulations. Software: BEAST framework with integrated RL modules. PyTorch for RL model training. CuSpatial for geospatial analysis and boundary checks. Data: Geospatial data for defining simulation environments (e.g., vegetation density, resource distribution). Agent-level attributes for initialisation. Use-Case: Mosquitoes Adapting for Survival In the proposed use-case, mosquito agents will use RL to learn behaviours that enhance their survival and reproductive success. 
Key Features: State Representation: Environmental factors (e.g., temperature, humidity). Proximity to resources and other agents. Action Space: Movement decisions to seek resources or mates. Avoidance of predation risks. Reward Function: Positive rewards for successful reproduction and resource acquisition. Negative rewards for predation or energy depletion. Emergent Outcomes: Stable population dynamics. Spatially distributed resource usage. Adaptation to environmental changes. Expected Outcomes Framework Advancement: Integration of RL into BEAST to support dynamic, goal-oriented agent behaviours. Scalable Simulations: Demonstrate RL’s scalability for large-scale, heterogeneous ABMs. Impactful Use-Case: Validate RL’s effectiveness in improving emergent behaviours, such as mosquito population management.
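The reward function outlined under Key Features could be sketched as follows. This is only an illustration of the stated sign conventions (positive for reproduction and resource acquisition, negative for predation or energy depletion). The `StepOutcome` fields and all weights are hypothetical, and nothing here uses or reflects the BEAST API.

```python
from dataclasses import dataclass

@dataclass
class StepOutcome:
    """What happened to one mosquito agent in one simulation step (hypothetical)."""
    offspring: int = 0             # successful reproduction events
    resources_gained: float = 0.0  # units of resource acquired
    eaten: bool = False            # agent was caught by a predator
    energy: float = 1.0            # remaining energy after the step

def reward(outcome, w_offspring=1.0, w_resource=0.1,
           predation_penalty=5.0, depletion_penalty=2.0):
    """Scalar reward for one step; all weights are illustrative assumptions."""
    r = w_offspring * outcome.offspring + w_resource * outcome.resources_gained
    if outcome.eaten:
        r -= predation_penalty
    if outcome.energy <= 0.0:
        r -= depletion_penalty
    return r

if __name__ == "__main__":
    print(reward(StepOutcome(offspring=2, resources_gained=3.0)))
    print(reward(StepOutcome(eaten=True)))
```

Whether per-agent rewards of this kind actually lead to the emergent outcomes listed, such as stable population dynamics, is the empirical question the project would test.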
Quantum Information Projects	Matty Hoban	Quantum		C	MSc	Prerequisites: Quantum Information Overview I am interested in supervising projects in quantum computing and quantum information. Quantum computing and quantum information (partly) looks at what happens to our information processing power when information can be fully described by quantum theory, instead of deterministic or probabilistic (classical) theories, as implicitly used in conventional machines. What new power do you get, and what limitations do we have? For instance, we do not believe that quantum computers can solve NP-hard problems efficiently, but they seem to be able to solve some problems that are hard to solve for conventional machines. Computational complexity is a natural toolkit with which to look at the power of quantum machines. I would (ideally) like to work with a student looking at computational problems within quantum information, such as the hardness of determining the entropy of quantum systems, or the uncertainty in quantum measurements. What interesting consequences are there if one of these problems cannot be solved even on a quantum computer? I am also happy to supervise projects in quantum information such as the certification and verification of quantum devices, and resources for quantum cryptography. In general, I like to meet with students to discuss projects together and am happy to form a project together. We also organise various events and group seminars for students doing (or interested in doing) dissertations with the Quantum Group, usually starting in HT. If you are interested, contact aleks.kissinger@cs.ox.ac.uk and we'll make sure you get informed about those.
Distributional prior in Inductive Logic Programming	Celine Hocquette				MSc	Inductive Logic Programming (ILP) is a form of Machine Learning based on Computational Logic. Given examples and some background knowledge, the goal of an ILP learner is to search for a hypothesis that generalises the examples. However, the search space in ILP is a function of the size of the background knowledge. All predicates are treated as equally likely, and current ILP systems cannot make use of distributional assumptions to improve the search. This project focuses on making use of an assumed given prior probability over the background predicates to learn more efficiently. The idea is to order subsets of the background predicates in increasing order of probability of appearance. This project is a mix of theory, implementation, and experimentation. Prerequisites: logic programming and probability theory, or an interest in learning about them
Identifying relevant background knowledge in Inductive Logic Programming	Celine Hocquette				MSc	Inductive Logic Programming (ILP) is a form of Machine Learning based on Computational Logic. Given examples and some background knowledge, the goal of an ILP learner is to search for a hypothesis that generalises the examples. The search space in ILP is a function of the size of the background knowledge. To improve search efficiency, we want to identify relevant areas of the search space in the form of relevant background knowledge predicates. We propose to evaluate and compare several relevance identification methods such as compression of the examples or statistical approaches. This project is a mix of theory, implementation, and experimentation. Prerequisites: logic programming and statistical learning, or an interest in learning about them
A fast numerical solver for multiphase flow	David Kay	Computational Biology and Health Informatics	B	C	MSc	The Allen-Cahn equation is a differential equation used to model the phase separation of two or more alloys. This model may also be used to model cell motility, including chemotaxis and cell division. The numerical approximation, via a finite difference scheme, ultimately leads to a large system of linear equations. In this project, using numerical linear algebra techniques, we will develop a computational solver for the linear systems. We will then investigate the robustness of the proposed solver.
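To make the step from finite differences to a linear system concrete, here is a minimal one-dimensional sketch. It assumes the form u_t = eps^2 u_xx + u - u^3, a semi-implicit time step (diffusion implicit, reaction explicit) and zero-flux boundaries. None of these choices is specified in the project description. In one dimension the resulting matrix is tridiagonal, so it can be solved directly with the Thomas algorithm; the project itself concerns larger systems where such a direct approach may not suffice.

```python
def thomas(lower, diag, upper, rhs):
    """Solve a tridiagonal system. In row i, lower[i] multiplies x[i-1] and
    upper[i] multiplies x[i+1]; lower[0] and upper[-1] are ignored."""
    n = len(diag)
    c, d = [0.0] * n, [0.0] * n
    c[0], d[0] = upper[0] / diag[0], rhs[0] / diag[0]
    for i in range(1, n):
        denom = diag[i] - lower[i] * c[i - 1]
        c[i] = upper[i] / denom
        d[i] = (rhs[i] - lower[i] * d[i - 1]) / denom
    x = [0.0] * n
    x[-1] = d[-1]
    for i in range(n - 2, -1, -1):
        x[i] = d[i] - c[i] * x[i + 1]
    return x

def allen_cahn_step(u, dt, eps, h):
    """One semi-implicit step: (I - dt*eps^2*L) u_new = u + dt*(u - u^3),
    where L is the second-difference operator with zero-flux boundaries."""
    n = len(u)
    r = dt * eps ** 2 / h ** 2
    lower = [-r] * n
    upper = [-r] * n
    diag = [1.0 + 2.0 * r] * n
    upper[0] = -2.0 * r    # mirrored ghost point at the left boundary
    lower[-1] = -2.0 * r   # mirrored ghost point at the right boundary
    rhs = [ui + dt * (ui - ui ** 3) for ui in u]
    return thomas(lower, diag, upper, rhs)

if __name__ == "__main__":
    u = [-1.0] * 4 + [1.0] * 4   # sharp interface between the two phases
    for _ in range(10):
        u = allen_cahn_step(u, dt=0.01, eps=0.1, h=0.125)
    print([round(v, 3) for v in u])
```

The matrix is strictly diagonally dominant, so the elimination needs no pivoting.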
Efficient solution of one-dimensional airflow models	David Kay	Computational Biology and Health Informatics	B	C	MSc	In this project we will investigate the use of numerical and computational methods to efficiently solve the linear system of equations arising from a one-dimensional airflow model within a network of tree-like branches. This work will build upon the methods presented within the first year Linear Algebra and Continuous Maths courses. All software to be developed can be in the student’s preferred programming language. General area: Design and computational implementation of fast and reliable numerical models arising from biological phenomena.
A theoretical investigation of the Bag Gain phenomenon in steganography	Andrew Ker	Security	B	C	MSc	Prerequisite: Part A Probability, or a similar course Bag gain is something that happens when a sender wishes to use steganography to spread a secret message across a number of covers: the set of objects sent, some of which contain the hidden payload, is called a bag. Theory predicts that the size of the secret that can be undetectably transmitted should scale with the square root of the size of the bag, but in practice researchers have observed that it grows faster. This is attributed to being able to select only the "best" covers in the bag, where "best" means those in which the presence of hidden data is hardest to detect (for example, noisy images). This is a theoretical project that aims to prove theorems about highly abstract versions of the above problem. For example, the "covers" can be simply binary pixels with different Bernoulli probabilities, and the "steganography" can simply flip a bit. The first part of the project would re-prove the classic square-root law when the flipped pixels are selected at random, and the second part would try to prove asymptotic bounds on detectability when the "covers" are selected to carry "steganography" depending on their Bernoulli parameter.
An empirical investigation of the Bag Gain phenomenon in steganography	Andrew Ker	Security	B	C	MSc	Bag gain is something that happens when a sender wishes to use steganography to spread a secret message across a number of covers: the set of objects sent, some of which contain the hidden payload, is called a bag. Theory predicts that the size of the secret that can be undetectably transmitted should scale with the square root of the size of the bag, but in practice researchers have observed that it grows faster. This is attributed to being able to select only the "best" covers in the bag, where "best" means those in which the presence of hidden data is hardest to detect (for example, noisy images). This project, which is experimental in nature, aims to replicate and extend these observations. The student will use off-the-shelf implementations of simple steganography in images, together with an image library (supplied by the supervisor), and will implement a steganography detector by combining off-the-shelf implementations; this requires the ability to train CNNs (probably using PyTorch, but other packages may also be suitable). Experiments will determine how the detectability of hidden data depends on its size, the size of the bag, and the method used to spread the message into the bag. The results will then be analyzed.
Extensions of the square root law of steganography	Andrew Ker	Security	B	C	MSc	This is a theoretical project that aims to prove probabilistic results relating to the capacity of different types of cover to conceal a hidden message. It will use tools of probability, information theory, and mathematics. An advanced square root law can be found at http://www.cs.ox.ac.uk/andrew.ker/docs/ADK71B.pdf, and this project will aim either for elementary proofs of special cases of that result, for elementary extensions, or for counterexamples illustrating that the hypotheses are necessary. Suitable for a student with a background in mathematics who has taken the Probability & Computing option.
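The square-root scaling in the abstract Bernoulli model can be checked numerically before it is proved. The sketch below makes three simplifying assumptions that are not taken from the project description: all n pixels share one Bernoulli parameter p, each pixel is flipped independently with probability flips/n (a stand-in for choosing the flipped pixels at random), and detectability is measured by the KL divergence between the stego and cover distributions.

```python
import math

def kl_bernoulli(q, p):
    """KL divergence D(Bernoulli(q) || Bernoulli(p)), for 0 < p, q < 1."""
    return q * math.log(q / p) + (1 - q) * math.log((1 - q) / (1 - p))

def total_kl(n, flips, p):
    """KL divergence between n stego pixels and n cover pixels when each
    pixel is flipped independently with probability flips / n."""
    beta = flips / n
    q = p * (1 - beta) + (1 - p) * beta   # P(pixel = 1) after embedding
    return n * kl_bernoulli(q, p)

if __name__ == "__main__":
    p = 0.3
    for n in (10**2, 10**4, 10**6):
        sqrt_payload = total_kl(n, math.sqrt(n), p)   # flips ~ sqrt(n)
        linear_payload = total_kl(n, 0.01 * n, p)     # flips ~ n
        print(n, round(sqrt_payload, 4), round(linear_payload, 4))
```

With a payload proportional to the square root of n, the total divergence stays roughly constant as n grows, whereas a payload proportional to n makes it grow without bound. In this toy model the divergence per flip shrinks as p approaches 0.5, which gives one way to phrase why some covers are "better" than others.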
Projects in Theoretical Computer Science: Graph Theory and Algorithms, Logic, Automata Theory	Sandra Kiefer	Algorithms and Complexity Theory	B	C	MSc	Graphs are a common model for relations between entities. A fundamental computational problem when dealing with graphs is that of isomorphism, the structural equivalence of two graphs. To handle the complexity algorithmically, canonical representations of the input graphs are often beneficial. In the search for such representations, one tries to make use of the properties of the considered graph class. Approaches to comparing graphs involve tools from combinatorics, finite-model theory, algorithms, and machine learning. Graphs and relational structures are also used as models of computation, for example, in the shape of transducers that transform strings into a binary or a more sophisticated output. Petri nets and some population protocols are abstractions for distributed computing with graph-like elements. In the context of graph-based computation models, mathematical logic has turned out to be fruitful to capture expressivity and to tackle verification questions. Sandra Kiefer supervises projects with connections to theoretical computer science, more precisely, topics in structural graph theory and graph algorithms, mathematical logic, automata theory, and graph neural networks. The projects can be purely theoretical or have practical components. Interested students should have knowledge in some of the aforementioned fields and a high interest in formalising concepts, building theories, and proving theorems.
Bots for a Board Game	Stefan Kiefer	Automated Verification	B	C	MSc	The goal of this project is to develop bots for the board game Hex. In a previous project, an interface was created to allow future students to pit their game-playing engines against each other. In this project the goal is to program a strong Hex engine. The student may choose the algorithms that underlie the engine, such as alpha-beta search, Monte-Carlo tree search, or neural networks. The available interface allows a comparison between different engines. It is hoped that these comparisons will show that the students' engines become stronger over the years. The project involves reading about game-playing algorithms, selecting promising algorithms and data structures, and designing and developing software (in Java or Scala).
Using Virtual Reality to predict how we use memory in natural behaviour: collaborative interdisciplinary projects.	Stefan Kiefer	Automated Verification	B	C		Name and Email Address of Research Project Supervisor: Dr Dejan Draschkow, dejan.draschkow@psy.ox.ac.uk; Project Description: Although human knowledge and memories represent the past, we form and use them to support future behavior (Nobre & Stokes, 2019). Understanding which factors contribute to learning about the world and successfully finding the learned information in mind is of critical importance for developing methods for supporting this behavior in healthy individuals, but also in individuals with a range of neurocognitive and psychiatric conditions, such as stroke, Alzheimer’s, and Parkinson’s disease. Our novel virtual reality task (Draschkow, Kallmayer, & Nobre, 2021) combines the ecological validity, experimental control, and sensitive measures required to investigate the naturalistic interplay between memory and perception and opens the doors to investigating and supporting complex cognitive functions (https://www.youtube.com/watch?v=GT2kLkCJQbY). In the proposed interdisciplinary projects, computer science and experimental psychology students will be paired to develop and validate sophisticated virtual reality protocols for measuring and supporting complex cognitive mechanism. Specific projects will focus on selected sub-topics and vary in scope, depending on students' interests and what kind of project it is (3rd, 4th, or MSc). 
These include: • Programming and refining game-like cognitive VR tasks in C# (Unity) • Developing protocols for online-based assessments of cognitive functions in C#/JavaScript • Developing algorithms for detecting markers of neurocognitive symptoms (such as tremor for Parkinson’s disease) in multivariate VR data (R/Python) • Developing proof-of-concept multimodal (voice, visual, and touch) protocols for supporting learning and memory in VR (with implications for supporting dementia patients) (C#/JavaScript/R/Python) The projects are suitable for students who feel comfortable with highly interdisciplinary work/teams and have experience with (or are open to learning) scientific programming in C#/JavaScript/R/Python. Students will be fully integrated in a successful and collaborative research group and get hands-on experience with an interactive product-development cycle, including multiple stakeholders. Further related readings are: (Ballard et al., 1997; Hayhoe, 2017; Hayhoe & Ballard, 2014). Relevant readings from cognitive science: Ballard, D. H., Hayhoe, M. M., Pook, P. K., & Rao, R. P. N. (1997). Deictic codes for the embodiment of cognition. In Behavioral and Brain Sciences (Vol. 20, Issue 4, pp. 723–767). https://doi.org/10.1017/S0140525X97001611 Draschkow, D., Kallmayer, M., & Nobre, A. C. (2021). When Natural Behavior Engages Working Memory. Current Biology, 31(4), 869-874.e5. https://doi.org/10.1016/j.cub.2020.11.013 Hayhoe, M. M. (2017). Vision and Action. Annual Review of Vision Science, 3(1), 389–413. https://doi.org/10.1146/annurev-vision-102016-061437 Hayhoe, M. M., & Ballard, D. (2014). Modeling task control of eye movements. Current Biology : CB, 24(13), R622-8. https://doi.org/10.1016/j.cub.2014.05.020 Nobre, A. C. (Kia), & Stokes, M. G. (2019). Premembering Experience: A Hierarchy of Time-Scales for Proactive Attention. Neuron, 104(1), 132–146. https://doi.org/10.1016/j.neuron.2019.08.030
Fault Tolerant Syndrome Extraction in Context	Aleks Kissinger, Andrey Khesin	Quantum		C	MSc	Implementations of various quantum algorithms need to be done fault tolerantly in order to prevent errors on a few qubits propagating to many. Various error correcting codes such as the surface code can sometimes admit circuits for measuring their parities that are not fault-tolerant in general, but propagate errors in a way that is not detrimental in the context of this code. This can allow for much more computationally efficient syndrome extraction circuits. There is evidence to suggest that various other codes also have this property. This project will approach this problem from two directions: first, it will attempt to compute or prove the resistance to certain types of propagating errors of highly structured codes such as bivariate bicycle [1], 2BGA [2], or mirror codes [3]. Second, the project will attempt to design efficient but not fully fault tolerant syndrome extraction circuits and find which codes (or possibly novel ones) as context would make these circuits fault tolerant. Students interested in doing a project are highly encouraged to take the MSc/PartC course Quantum Processes and Computation. [1] Bravyi, S., Cross, A. W., Gambetta, J. M., Maslov, D., Rall, P., & Yoder, T. J. (2024). High-threshold and low-overhead fault-tolerant quantum memory. Nature, 627(8005), 778-782. [2] Lin, H. K., & Pryadko, L. P. (2024). Quantum two-block group algebra codes. Physical Review A, 109(2), 022407. [3] Khesin, A. B., & Lu, J. Z. (2026). Mirror codes: High-threshold quantum LDPC codes beyond the CSS regime. Manuscript in preparation.
Morphing new Quantum Error Correcting Codes	Aleks Kissinger, Andrey Khesin	Quantum		C	MSc	Recent advances in quantum error correction have led to discoveries of novel quantum code families with large numbers of logical qubits and low parity-check weights. These include bivariate bicycle codes [1] and 2BGA codes [2], among many others. Additionally, a recent technique called morphing [3] has shown how to construct new codes with desirable properties such as reduced connectivity or better code parameters by treating an existing code as being halfway through the syndrome extraction process of an unknown code, and then computing what the unknown code must be. By combining morphing with recently-discovered families of low density parity check codes known as mirror codes [4], this project would study how those codes are related under morphing and what new code families can be created by using this technique. Students interested in doing a project are highly encouraged to take the MSc/PartC course Quantum Processes and Computation. [1] Bravyi, S., Cross, A. W., Gambetta, J. M., Maslov, D., Rall, P., & Yoder, T. J. (2024). High-threshold and low-overhead fault-tolerant quantum memory. Nature, 627(8005), 778-782. [2] Lin, H. K., & Pryadko, L. P. (2024). Quantum two-block group algebra codes. Physical Review A, 109(2), 022407. [3] Shaw, M. H., & Terhal, B. M. (2025). Lowering connectivity requirements for bivariate bicycle codes using morphing circuits. Physical review letters, 134(9), 090602. [4] Khesin, A. B., & Lu, J. Z. (2026). Mirror codes: High-threshold quantum LDPC codes beyond the CSS regime. Manuscript in preparation.
Quantum Software Projects	Aleks Kissinger	Quantum		C	MSc	I am interested in supervising projects in these areas: (i) developing better quantum compilers for translating high-level algorithms to real hardware, (ii) simulating complex quantum computations with classical (super) computers, and (iii) designing efficient fault-tolerant implementations of quantum computations for noisy hardware. Many of these projects rely on mathematical methods based on the ZX calculus or related graphical calculi, mostly developed here in Oxford. Rather than specific, pre-made projects, I prefer to meet with students and design a project together, often with input/cosupervision from other members of the Quantum Group. Students interested in doing a project are highly encouraged to take the MSc/PartC course Quantum Processes and Computation. Research topic - Quantum Software: compiling, classical simulation, and verification
Computation Theory with Atoms	Bartek Klin	Programming Languages	B	C	MSc	Sets with atoms, also known as nominal sets, are an abstract foundational approach to computing over data structures that are infinite but highly symmetrical, so much so that they are finitely presentable and amenable to algorithmic manipulation. Many basic results of classical mathematics and computation theory become more subtle, or even fail, in sets with atoms. For example, in sets with atoms not every vector space has a basis, and a Turing machine may not determinize. A variety of specific topics are available, aimed at mathematically oriented students. Prerequisites: Strong mathematical background is essential. Students who enjoy courses such as "Models of Computation", "Categories, Proofs and Processes" or "Computer-Aided Formal Verification" will find this subject suitable. Some relevant literature: M. Bojanczyk: Slightly Infinite Sets. draft available from https://www.mimuw.edu.pl/~bojan/upload/main-6.pdf A. Pitts: Nominal Sets: Names and Symmetry in Computer Science. Cambridge University Press, 2013.
Asymptotically automatic sequences	Jakub Konieczny, James Worrell	Automated Verification	B	C		Automatic sequences - that is, sequences whose n-th term can be computed by a finite automaton given as input the expansion of n in a given base k - have long been studied in theoretical computer science, number theory, combinatorics and algebra, to name just a few fields. Recently I introduced a notion tentatively dubbed "asymptotically automatic sequence". This class of sequences remains largely unexplored. It's likely that some new results can be obtained by minor modifications of existing arguments, while other cases will present more interesting challenges. The purpose of this project is to see which results on automatic sequences admit a straightforward generalisation to the asymptotic regime, and for which it's possible to construct a counterexample. The goal is to find at least one, preferably more, existing results on automatic sequences and either generalise them to asymptotically automatic sequences, or disprove such a generalisation. Basic results and definitions can be found in: https://arxiv.org/abs/2305.09885 And a variant of Cobham's theorem can be found in: http://arxiv.org/abs/2209.09588 The standard reference for automatic sequences is “Automatic Sequences: Theory, Applications, Generalizations” by J.-P. Allouche and J. Shallit. Pre-requisites: Familiarity with automatic sequences / regular languages; basic analysis
Extensions of Presburger arithmetic by polynomial-like functions	Jakub Konieczny, James Worrell	Automated Verification	B	C		Presburger arithmetic – that is, the first order theory of the natural numbers with addition – is decidable, meaning that there exists a procedure to determine if a given sentence in the language of this theory is true or false. In contrast, Peano arithmetic – which also includes multiplication – is undecidable. This naturally leads to the much-studied question: Which extensions of Presburger arithmetic are decidable? For instance, adding the square function allows us to define multiplication and hence leads to an undecidable theory. The same applies to other polynomials. But what about more general functions? We recently noticed that for a number of simple generalised polynomials – that is, expressions built up from the usual polynomials using addition, multiplication and the integer part function – the corresponding extensions are also undecidable. The idea behind this project is to investigate more broadly the question of decidability of extensions of the Presburger arithmetic by generalised polynomials. The purpose of this project is to apply existing techniques to show in concrete cases that the extension of the Presburger arithmetic by a given generalised polynomial is undecidable. More generally, one can consider other classes of sequences which have polynomial-like behaviour in a suitable sense. The goal is to show undecidability of the extension of the Presburger arithmetic by at least one generalised polynomial which was not previously covered. More ambitious variants correspond to dealing with wider classes of generalised polynomials. Background on generalised polynomials can be found in (the introductory sections of) the papers: A canonical form and the distribution of values of generalized polynomials by A. Leibman Distribution of values of bounded generalized polynomials by V. Bergelson and A. 
Leibman. Pre-requisites: A course in logic; some analysis preferred but not required.
Truthful scheduling for graphs	Elias Koutsoupias	Algorithms and Complexity Theory	B	C	MSc	The aim of the project is to advance our understanding of the limitations of mechanism design for the scheduling problem, the "holy grail" of algorithmic mechanism design. Specifically, we will consider the graphical scheduling problem, in which every task can be only allocated to two machines, and study the approximation ratio of mechanisms for this setting. The aim is to prove both lower and upper bounds. Both directions are hard problems, and we plan to try to gain some understanding by experimentally searching for lower bounds or trying to verify potentially useful statements about the upper bound. Of particular importance is the case of trees and their special case of stars, i.e., when every task can be given either to the root or to a particular leaf. We plan to study not only the standard objective of the makespan, but the more general class of objectives in which the mechanism minimizes the L^p norm of the times of the machines. The case L^infinity is to minimize the makespan, L^1 is to minimize the welfare, and the case L^0 corresponds to the Nash Social Welfare problem, all of which are interesting problems. Further possible directions include considering fractional scheduling and mechanisms without money. Bibliography: George Christodoulou, Elias Koutsoupias, Annamária Kovács: Truthful Allocation in Graphs and Hypergraphs. ICALP 2021: 56:1-56:20 (https://arxiv.org/abs/2106.03724)
Divide-and-Conquer Context: Do many short-context agents beat one long-prompt agent?	Andrey Kravchenko		B	C	MSc	Supervised by Andrey Kravchenko andrey.kravchenko@cs.ox.ac.uk Expected background of students and CS techniques that will be applied The student will know how to work or be able to quickly learn how to work with Pytorch, Hugging Face, IPython notebooks. They should also have a solid mathematical base. Divide-and-Conquer Context: Do many short-context agents beat one long-prompt agent? Many studies show LLMs don’t robustly use very long prompts: performance drops as context grows and when key facts sit mid-prompt (“lost in the middle”); synthetic and realistic long-context suites (BABILong, RULER, LongBench/v2) likewise report sharp degradation beyond a few-thousand tokens on tasks that require multi-hop aggregation rather than simple retrieval. This motivates testing whether teams of short-context, task-specialized agents can beat a single long-prompt agent under the same token budget. The project involves comparing (i) a single long-prompt baseline that ingests the entire context vs. (ii) a multi-agent setup that shards the corpus into ≤1–5k-token slices, routes them to specialists (retriever, summarizer, evidence-checker), and aggregates with a judge/vote or Mixture-of-Agents layer; use AutoGen (or other alternatives) for orchestration variants. Evaluate on BABILong (reasoning-in-a-haystack) and LongBench/v2 (long-doc QA/summary/code). Track accuracy/F1, position robustness, wall-clock, total tokens, and an overhead ratio (coordination tokens ÷ reading tokens) while sweeping team size, shard overlap, and aggregator type. Outcomes & next step. Deliver (i) breakpoints where short-context teams reliably outperform long prompts, (ii) a “law of diminishing returns” curve for coordination overhead, and (iii) design guidance for team size/routing/aggregation. 
A potential next step is to build a Prompt-to-Team compiler that automatically decomposes long prompts into agent roles and shards under a token budget, validated on the same suites.
Do Punctuation Tokens Act as Sinks, Summaries, and Anchors in Transformers?	Andrey Kravchenko		B	C	MSc	Supervised by Andrey Kravchenko andrey.kravchenko@cs.ox.ac.uk *Expected background of students and CS techniques that will be applied* The student will know how to work or be able to quickly learn how to work with Pytorch, Hugging Face, IPython notebooks. They should also have a solid mathematical base. Do Punctuation Tokens Act as Sinks, Summaries, and Anchors in Transformers? Transformers often allocate unusually high attention to punctuation—especially periods and commas—hinting that these tokens play a structural role in how context is organized. This project will test three hypotheses: (0) punctuation acts as an attention “sink” that reliably attracts attention mass; (1) punctuation positions encode compressed summaries of the preceding span; and (2) these tokens serve as anchors that help the model predict what comes next. Methodologically, you’ll quantify attention-to-punctuation across families of pre-trained generative models (GPT, Llama), controlling for frequency and position. You’ll run causal interventions by removing, shuffling, or inserting punctuation and measuring changes in loss, attention redistribution, and representation drift. Finally, you’ll probe hidden states at punctuation to test whether they reconstruct preceding content and evaluate anchor effects on next-token prediction. Outcomes include a comparative map of punctuation-related attention by layer/head, causal evidence for or against sink/summary/anchor roles, and practical guidance for punctuation-aware decoding or caching in long-context settings.
Graph-Memory Agents for Self-Evolving Web and SWE Tasks	Andrey Kravchenko		B	C	MSc	Supervised by Andrey Kravchenko andrey.kravchenko@cs.ox.ac.uk *Expected background of students and CS techniques that will be applied* The student will know how to work or be able to quickly learn how to work with Pytorch, Hugging Face, IPython notebooks. They should also have a solid mathematical base. Graph-Memory Agents for Self-Evolving Web and SWE Tasks Do structured world models improve self-evolving agents compared to reasoning-memory banks? This project will adapt the AriGraph architecture—an evolving semantic-episodic knowledge graph for agent memory—to the ReasoningBank task suite (WebArena, Mind2Web, SWE-Bench-Verified). The goal is to test whether graph-based memory can match or surpass ReasoningBank’s distilled “reasoning memory” for long-horizon, cross-task learning, and to explore hybrid designs that combine both. Methodologically, you’ll implement an AriGraph-based agent for the ReasoningBank environments. AriGraph will encode web or code entities (e.g., pages, DOM elements, repositories, tests, patches) as nodes and link them via semantic and episodic relations (“contains,” “links_to,” “fails_on,” “fixed_by”). As the agent acts, it will extract triplets and events using an LLM parser, incrementally updating the graph to reflect new discoveries or corrections. For each new task, you’ll retrieve relevant subgraphs—filtered by recency and degree—to guide reasoning, replacing or augmenting ReasoningBank’s textual memory items. You’ll benchmark this AriGraph agent against the ReasoningBank + MaTTS baseline and simple retrieval or trajectory-memory controls. Metrics include success rate, steps per task, and computational efficiency across WebArena, Mind2Web, and SWE-Bench-Verified. 
Causal analyses will involve interventions on the graph (e.g., pruning nodes, corrupting edges, disabling contradiction handling) and patching tests (e.g., removing specific strategy memories) to measure effects on loss, reasoning stability, and generalization. Representation analyses will inspect how retrieved subgraphs influence hidden states and whether AriGraph supports smoother reasoning transitions than textual memory. Outcomes include: (i) a comparative map of graph vs. reasoning-memory performance by domain and model family; (ii) causal evidence for when structured world knowledge or distilled strategies drive improvement; and (iii) design insights for hybrid graph-reasoning memory, suggesting where to store world state versus reusable reasoning. The project may also extend AriGraph with temporal decay, graph compression, or multi-task schemas, offering practical guidance for scalable, structured, self-evolving agents in web and software reasoning settings.
Hidden Encoder in Decoder only generative models	Andrey Kravchenko		B	C	MSc	Supervised by Andrey Kravchenko andrey.kravchenko@cs.ox.ac.uk *Expected background of students and CS techniques that will be applied* The student will know how to work or be able to quickly learn how to work with PyTorch, Hugging Face, IPython notebooks. They should also have a solid mathematical base. Hidden Encoder in Decoder only generative models In decoder-only LMs, do some layers primarily encode context while later layers convert it into next-token predictions? This project will quantify and test that separation. You’ll use probes to map a model’s “prediction depth” (how the latent guess for the next token sharpens layer-by-layer) and ask whether early/mid layers look encoder-like (contextual abstraction) while late layers look decoder-like (strong next-token alignment). Prior work shows the raw logit lens often gives brittle, biased readouts, while the tuned lens gives calibrated per-layer distributions; you’ll build on those and other methods. The study will involve popular pre-trained open-source models (e.g., GPT, Llama). Possible criteria for “encoding vs decoding”: Predictivity: per-layer cross-entropy of various probes to the ground-truth next token; the “generation frontier” is where loss drops sharply. Causality: “activation/attribution patching” and targeted ablations to see which layers are necessary for (a) reconstructing/encoding context features vs (b) predicting the next token. Mechanism clues: check for contribution of induction heads (next-token copy/continuation circuits) and MLP key-value memories (stored lexical/semantic knowledge) that typically appear in mid/late blocks, as evidence for generation-oriented roles. Supplement with representation analyses to show early→mid layers aggregate and abstract context, aligning with broader findings of layer specialization. 
Outcomes include a reproducible map of layer roles across models, with: (i) a quantitative prediction-depth metric; (ii) causal evidence for which layers are primarily “encoders” vs “decoders” under your criteria; and (iii) design hints (e.g., where pruning, caching, or routing would least harm “understanding”). Additionally, it is possible to compare families/scales, and test whether moving or skipping “middle” layers preserves understanding but hurts generation.
Problem statement for the planning task	Andrey Kravchenko		B	C	MSc	Supervised by Andrey Kravchenko andrey.kravchenko@cs.ox.ac.uk Objective. Given (i) a user query in natural language, (ii) a world/ontology (optionally), and (iii) a catalog of heterogeneous agents, synthesise (a) a feasible set of agents and (b) a coordination plan that achieves the user’s intent under hard constraints while optimising stated preferences. Inputs (formal artifacts) • Logical intent G: Translate the natural-language query into a compact logical form -- e.g., AMR (Abstract Meaning Representation) → typed predicates -- capturing goals, required outcomes, and any explicit constraints or preferences. • World/ontology K: Typed predicates for skills, tasks, resources, compatibilities, regulations, and temporal/causal relations. • Agent catalog A: For each agent: skill set, cost/risk, capacity, and (optionally) action schemas with preconditions/effects. Outputs Team (a subset of agents) and plan (partial order or schedule of agents and tasks) that: (1) achieves G, (2) satisfies hard constraints (skills, capacity, mutual exclusions, resource, temporal), and (3) is optimal under a declared objective (e.g. cost, makespan, risk) and preference model. In ASP this is naturally expressed via weak constraints/preferences. Expected background of students and CS techniques that will be applied 1. Expected background of a student: 1. strong Python programming; 2. good CS/AI fundamentals (algorithms, search, optimization, planning); 3. basic logic / symbolic reasoning (predicates, constraints); 4. some NLP / LLM familiarity (useful for parsing natural-language tasks); 5. ability to run experiments and write clear technical documentation; 6. nice to have: ontologies/knowledge graphs, ASP/CP/SAT/SMT, multi-agent systems, scheduling. 2. CS techniques that might be applied: 1. NLP / semantic parsing: convert user request into structured goals and constraints; 2. 
knowledge representation: predicates, ontologies, task/agent/resource graphs; 3. planning & scheduling: task decomposition, dependency handling, temporal planning; 4. constraint solving / optimisation: CP, ASP, SAT/SMT, MILP; 5. preference handling: soft constraints, weighted objectives, multi-objective optimisation; 6. agent/team selection: matching, coverage, assignment, heuristics; 7. hybrid neuro-symbolic approach: LLM for parsing + symbolic solver for planning/validation
Task-Tailored Schema Agents for Reliable LLM Automation	Andrey Kravchenko		B	C	MSc	Supervised by Andrey Kravchenko andrey.kravchenko@cs.ox.ac.uk Expected background of students and CS techniques that will be applied. The student will know how to work or be able to quickly learn how to work with Pytorch, Hugging Face, IPython notebooks. They should also have a solid mathematical base. Task-Tailored Schema Agents for Reliable LLM Automation Design an agent that, given a task description and a small sample of target data, automatically proposes a fit-for-purpose schema and compiles a robust extraction/automation pipeline, then validates, executes, and self-improves it. The agent will (i) induce a schema (fields, types, constraints, relations) from task instructions and examples; (ii) compile this schema into a structured-output program with constrained decoding; and (iii) verify—and, when available, link—outputs against a reference ontology or database. To harden reliability, add an agentic refinement loop (Self-Refine; Reflexion): when coverage or validity drops, the agent critiques its outputs and edits the schema before re-running. Implement the workflow in DSPy, allowing the compiler to tune prompts, demonstrations, and tool ordering to the target metric (e.g., F1 × validity × latency). Evaluate across 2–3 domains (e.g., scientific IE, event extraction) using schema coverage/minimality, per-field F1, JSON/ontology validity, and robustness to small task shifts. Outcomes: (1) an open-source agent that induces, enforces, and verifies task-specific schemas from a handful of examples; (2) measurable gains in structural validity and downstream F1 from combining constrained decoding with ontology-based verification; and (3) ablations, an error taxonomy, and a small reusable schema library plus tuned DSPy pipelines.
Probabilistic Model Checking	Marta Kwiatkowska	Automated Verification	B	C		Professor Marta Kwiatkowska is happy to supervise projects involving probabilistic modelling, verification and strategy synthesis. This is of interest to students taking the Probabilistic Model Checking course and/or those familiar with probabilistic programming. Below are some concrete project proposals, but students’ own suggestions will also be considered: Synthesis of driver assistance strategies in semi-autonomous driving. Safety of advanced driver assistance systems can be improved by utilising probabilistic model checking. Recently (http://qav.comlab.ox.ac.uk/bibitem.php?key=ELK+19) a method was proposed for correct-by-construction synthesis of driver assistance systems. The method involves cognitive modelling of driver behaviour in ACT-R and employs PRISM. This project builds on these techniques to analyse complex scenarios of semi-autonomous driving such as multi-vehicle interactions at road intersections. Equilibria-based model checking for stochastic games. Probabilistic model checking for stochastic games enables formal verification of systems where competing or collaborating entities operate in a stochastic environment. Examples include robot coordination systems and the Aloha protocol. Recently (http://qav.comlab.ox.ac.uk/papers/knps19.pdf) probabilistic model checking for stochastic games was extended to enable synthesis of strategies that are subgame perfect social welfare optimal Nash equilibria, soon to be included in the next release of PRISM-games (www.prismmodelchecker.org). This project aims to model and analyse various coordination protocols using PRISM-games. Probabilistic programming for affective computing. Probabilistic programming facilitates the modelling of cognitive processes (http://probmods.org/). 
In a recent paper (http://arxiv.org/abs/1903.06445), a probabilistic programming approach to affective computing was proposed, which enables cognitive modelling of emotions and executing the models as stochastic, executable computer programs. This project builds on these approaches to develop affective models based on, e.g., this paper (http://qav.comlab.ox.ac.uk/bibitem.php?key=PK18).
Safety Assurance for Deep Neural Networks	Marta Kwiatkowska	Automated Verification	B	C		Safety Assurance for Deep Neural Networks Professor Marta Kwiatkowska is happy to supervise projects in the area of safety assurance and automated verification for deep learning, including Bayesian neural networks. For recent papers on this topic see http://qav.comlab.ox.ac.uk/bibitem.php?key=WWRHK+19, http://qav.comlab.ox.ac.uk/bibitem.php?key=RHK18 and http://qav.comlab.ox.ac.uk/bibitem.php?key=CKLPPW+19, and also https://www.youtube.com/watch?v=XHdVnGxQBfQ. Below are some concrete project proposals, but students’ own suggestions will also be considered: Robustness of attention-based sentiment analysis models to substitutions. Neural network models for NLP tasks such as sentiment analysis are susceptible to adversarial examples. In a recent paper (https://www.aclweb.org/anthology/D19-1419/) a method was proposed for verifying robustness of NLP tasks to symbol and word substitutions. The method was evaluated on CNN models. This project aims to develop similar techniques for attention-based NLP models (www-nlp.stanford.edu/pubs/emnlp15_attn.pdf). Attribution-based safety testing of deep neural networks. Despite the improved accuracy of deep neural networks, the discovery of adversarial examples has raised serious safety concerns. In a recent paper (http://qav.comlab.ox.ac.uk/bibitem.php?key=WWRHK+19) a game-based method was proposed for robustness evaluation, which can be used to provide saliency analysis. This project aims to extend these techniques with the attribution method (http://arxiv.org/abs/1902.02302) to produce a methodology for computing the causal effect of each feature and evaluate it on image data. Uncertainty quantification for end-to-end neural network controllers. NVIDIA has created a deep learning system for end-to-end driving called PilotNet (http://devblogs.nvidia.com/parallelforall/explaining-deep-learning-self-driving-car/). 
It inputs camera images and produces a steering angle. The network is trained on data from cars being driven by real drivers, but it is also possible to use the Carla simulator. In a recent paper (http://arxiv.org/abs/1909.09884) a robustness analysis with statistical guarantees for different driving conditions was carried out for a Bayesian variant of the network. This project aims to develop a methodology based on these techniques and semantic transformation of weather conditions (see http://proceedings.mlr.press/v87/wenzel18a/wenzel18a.pdf) to evaluate the robustness of PilotNet or similar end-to-end controllers in a variety of scenarios.
Harj Projects 2026-27	Harjinder Lallie		B	C		1 Enhancing Forensic Analysis of Program Execution Artefacts In digital forensics, accurate reconstruction of program execution is essential for inferring user intent and establishing timelines of activity. Investigators rely on Windows artefacts including ShellBags (folder view metadata indicating Explorer navigation), Jump Lists (recent/frequent file and task associations per application), Prefetch files (.pf records of application launches, load counts, and paths), and registry entries (e.g., UserAssist for execution frequency and timestamps). Existing tools often adopt a narrow, artefact-specific focus (e.g., dedicated ShellBag parsers or Prefetch extractors), resulting in fragmented insights that hinder efficient correlation and holistic interpretation. This project proposes a prototype tool to address these limitations by unifying analysis of these execution-related artefacts. Core functionality will parse and extract relevant metadata from each source. Optional enhancements include: (1) integration as a custom ingest module within Autopsy for streamlined workflow; and (2) automated correlation across artefacts and supplementary sources (e.g., event logs, browser history) using temporal alignment and semantic matching techniques. Outputs will prioritise investigator usability, featuring structured reports, interactive timelines, and visualisations to support rapid comprehension. The approach draws on established forensic principles (e.g., ACPO guidelines) and is informed by data fusion and multi-source correlation methods in digital forensics research, including timeline-based event reconstruction frameworks (e.g., TER-Model) and tools like Plaso that aggregate heterogeneous artefacts for coherent event sequencing. 
Implementation in Python will enable evaluation on controlled datasets, with benchmarking against standalone parsers for improved completeness and efficiency. How to apply I normally receive good interest in my projects. I do not offer a first come first serve, but select candidates based on suitability. Please provide any information you can, preferably a CV to help me make this decision. 2 Multi-Source Correlation for Event Reconstruction in Digital Forensics In digital forensics, events—discrete occurrences such as system logins, file accesses, application launches, or security alerts—offer critical evidence of user and system activity. Primarily captured in Windows Event Logs (e.g., Security.evtx for audits, Application.evtx for errors, System.evtx for operational changes), events are also embedded in ancillary sources like browser history (navigation timestamps), Prefetch files (execution events), Jump Lists (task initiations), and ShellBags (folder interactions). Isolated analysis of these yields partial timelines, as inter-source linkages (e.g., an event log entry for a process start correlating with prefetch metadata) are underexplored, impeding anomaly detection and evidentiary validation. This project develops a prototype tool and method to correlate events from at least two sources, prioritising Event Logs while extracting and normalising timestamps, identifiers, and contextual attributes from complementary artefacts. Core deliverables include: a correlation technique using temporal synchronisation and event ontology mapping; a parsing tool for selected sources; an automated timeline reconstruction algorithm; and evaluation on simulated datasets for accuracy, recall, and scalability. Optional enhancements may include Autopsy integration via ingest modules and expansion to additional event-rich sources (e.g., registry or network logs). 
Grounded in forensic standards (ACPO principles) and informed by multi-source data fusion research—such as Plaso's super timelines for artefact aggregation and the TER-Model for standardised event sequencing—the prototype will be implemented in Python, with empirical benchmarking against standalone log analysers. How to apply I normally receive good interest in my projects. I do not offer a first come first serve, but select candidates based on suitability. Please provide any information you can, preferably a CV to help me make this decision. 3 Enhancing Visualisation of Technical Controls and Uncertainty in Cyber Attack Graphs Cyber attack graphs model sequential attacker actions leading to system compromise, visualising vulnerabilities, pathways, and potential defences. Foundational research by Lallie, Debattista, and Bal (e.g., empirical reviews of over 180 attack graphs/trees and practitioner-preferred visual syntax configurations) highlights the lack of standardised representations, particularly for integrating technical controls (e.g., firewalls, patches, access restrictions) and uncertainty (e.g., probabilistic exploit success, incomplete knowledge). This project builds on an existing cyber attack graph framework developed by the supervisor. The student will explore alternative visual and structural designs for representing controls (e.g., overlay annotations, conditional edges, mitigation nodes) and uncertainty (e.g., probabilistic weights, confidence intervals, fuzzy notations). These designs will be applied to real-world case studies of cyber attacks, generating revised attack graphs for each. The methodology employs mixed methods: graph construction and visualisation prototyping, followed by qualitative evaluation through participant studies assessing clarity, usability, cognitive effectiveness, and communicative power of the representations. 
Outcomes will identify superior visual conventions for conveying complex attack scenarios and defences, advancing cyber threat modelling. Successful contributions may support ongoing publication efforts with the supervisor. Grounded in cognitive visualisation principles, perceptual psychology in graph comprehension, and probabilistic threat modelling, the work aligns with CyBOK knowledge areas across risk management, secure systems architecture, and adversarial behaviours. Implementation will leverage graph visualisation libraries (e.g., Graphviz, Cytoscape.js) for prototyping and empirical user studies. How to apply I normally receive good interest in my projects. I do not offer a first come first serve, but select candidates based on suitability. Please provide any information you can, preferably a CV to help me make this decision. 4 Automated CyBoK Alignment of CVs and Module Descriptions for NCSC Certification using Machine Learning The Cyber Security Body of Knowledge (CyBoK) defines 19 knowledge areas that underpin NCSC certification of academic programmes and professional qualifications. Mapping CVs or module descriptions to CyBoK remains a manual, subjective, and non-scalable process. This project develops a machine-learning prototype for automated semantic matching and certification-recommendation generation. The student will create a synthetic dataset of CVs and module descriptions (generated via LLM-based templating and prompting to ensure controlled, balanced CyBoK coverage while preserving realism and privacy). An expert panel (drawn from the supervisory team and/or cybersecurity educators/practitioners) will then produce gold-standard manual alignments, scoring coverage, gaps, and recommendation strength for each artefact. The same synthetic artefacts will be processed by the ML pipeline (transformer embeddings, semantic similarity, and supervised classification) to generate automated outputs. 
Rigorous evaluation will compare model predictions directly against panel assessments using standard metrics (precision, recall, F1-score, Cohen’s κ for agreement) to quantify accuracy, consistency, and bias. Collaboration with the supervisory team provides domain expertise for annotation guidelines, panel moderation, and iterative refinement. The work is grounded in established synthetic-data generation methods for NLP evaluation and human-expert gold-standard validation frameworks used in ontology matching and competency mapping. Implementation in Python (Hugging Face Transformers, scikit-learn) will enable reproducible benchmarking and extension to real (anonymised) data. How to apply I normally receive good interest in my projects. I do not offer a first come first serve, but select candidates based on suitability. Please provide any information you can, preferably a CV to help me make this decision. 5 Comparative Analysis of Cybersecurity Degree Programmes Against CyBoK: Trends in the UK, US, and Beyond The Cyber Security Body of Knowledge (CyBoK) delineates 19 foundational knowledge areas (KAs) for cybersecurity education, serving as a benchmark for NCSC certification in the UK and informing global curricula. Despite its adoption, systematic cross-national analyses of university degree programmes remain limited, with existing studies (e.g., Nautiyal et al., 2020 on UK certification; Catal et al., 2022 on skills gaps) highlighting regional disparities in coverage, such as deeper emphasis on adversarial behaviours in US programmes versus risk management in the UK. This project conducts a comparative analysis of cybersecurity undergraduate/postgraduate degrees in the UK, US, and one additional country (e.g., Australia), mapping module descriptions to CyBoK KAs to identify trends, gaps, and evolutions. 
The student will curate a dataset of publicly available module descriptors from university websites (via ethical web scraping or APIs), employing NLP techniques for semantic mapping (e.g., BERT embeddings for similarity scoring against CyBoK definitions). Core deliverables: automated mapping tool; quantitative trend analysis (e.g., KA coverage frequencies, temporal shifts using archived data); qualitative insights on regional differences; and visualisation of findings (e.g., heatmaps). Grounded in ontology matching and curriculum analysis research (e.g., comparative frameworks in CSEC2017 vs. CyBoK), the prototype will be implemented in Python (Hugging Face Transformers, scikit-learn), evaluated on precision/recall against expert-annotated samples, and benchmarked against manual assessments. How to apply I normally receive good interest in my projects. I do not offer a first come first serve, but select candidates based on suitability. Please provide any information you can, preferably a CV to help me make this decision.
Keyword searching audio/video files	Harjinder Lallie		B	C		Digital forensic investigators often search for keywords on hard disks or other storage media. Keywords are easily searchable in PDF, Word, text and other popular formats; however, current digital forensic tools do not support keyword searching in video or audio files. This capability is essential in cases that involve dashcam footage, recorded conversations, etc. The aim of this project is to produce a tool that automatically transcribes audio data and then performs a keyword search on the transcript, pinpointing the point(s) in the file where the keyword(s) appear. You will be expected to develop the solution using Python and, if possible, integrate it with Autopsy, an open-source digital forensic tool. Prerequisites: Additional support can be provided in the form of access to specific elements of my digital forensics course at the University of Warwick (recorded lectures comprising around 10 hours of learning). You are likely to use the Python SpeechRecognition and possibly the PyAudio libraries. If you integrate the solution with Autopsy, for convenience and to demonstrate practicality, you will need to develop a good understanding of this tool, particularly its keyword search facility.
Dataset distillation for CRISPR-Cas9 guide-target library design	Jeffrey Mak	Computational Biology and Health Informatics		C	MSc	Prerequisites: Essential: Computational Biology, familiarity with PyTorch. Desirable: Interest in dataset distillation. Abstract: The success of deep learning-based CRISPR-Cas9 cleavage activity prediction models relies on the availability of large cleavage activity datasets [1]. Containing tens of thousands to hundreds of thousands of guide-target pairs, these datasets are experimentally generated through high-throughput guide-target lentiviral library screens, where guide-target libraries are manually designed based on target genes of interest or properties like the spacer sequence’s GC content. Owing to the high cost of these library screens, it is infeasible to conduct library screens at scale across hundreds of Cas9 variants. To address this issue, this project explores the use of data distillation [2,3] to obtain smaller synthetic sets of guide-target pairs from the original large dataset, and the potential of using such synthetic sets as guide-target libraries for other Cas9 variants. If successful, this project would reduce the cost of library screens per Cas9 variant, and thus the data curation cost required for building a unified cleavage activity tool for Cas9 variants. [1] Xiang, X., Corsi, G. I., Anthon, C., Qu, K., Pan, X., Liang, X., ... & Luo, Y. (2021). Enhancing CRISPR-Cas9 gRNA efficiency prediction by data integration and deep learning. Nature communications, 12(1), 3238. [2] Wang, T., Zhu, J. Y., Torralba, A., & Efros, A. A. (2018). Dataset distillation. arXiv preprint arXiv:1811.10959. [3] Lei, S., & Tao, D. (2023). A comprehensive survey of dataset distillation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 46(1), 17-32.
Development of deep learning-based cleavage activity prediction models for genome editors	Jeffrey Mak, Peter Minary	Computational Biology and Health Informatics	B			Prerequisites: Essential: Machine Learning and/or Deep Learning in Healthcare Desirable: Prior knowledge or interest in molecular biology Abstract: Genome editors, i.e., DNA-cutting enzymes, have revolutionized the field of gene therapy, as seen by the Nobel Prize in Chemistry 2020 and the Food and Drug Administration’s approval of Casgevy --- the world’s first CRISPR-based gene therapy --- in 2023. Component-wise, genome editors are composed of two parts: a single guide RNA (sgRNA) which directs the editor to the target DNA site of interest, and the enzyme, which is responsible for binding and cleavage of the target site [1]. Broadly speaking, genome editors mechanistically operate in three steps: binding of the enzyme to the DNA, sgRNA-DNA heteroduplex formation, and DNA cleavage. Since the sgRNA’s spacer sequence and the target sequence are the primary factors affecting a genome editor’s cleavage activity, this project aims to address the cleavage activity prediction problem by learning the function mapping between the spacer-target pair and cleavage activity of a recently discovered genome editor. More concretely, the student will routinely apply deep learning [2] on high-throughput cleavage activity data available in the literature [3,4], thereby obtaining a prediction model with good test performance metrics. [1] Jiang, F., & Doudna, J. A. (2017). CRISPR–Cas9 structures and mechanisms. Annual review of biophysics, 46, 505-529. [2] Kim, N., Kim, H. K., Lee, S., Seo, J. H., Choi, J. W., Park, J., ... & Kim, H. H. (2020). Prediction of the sequence-specific cleavage activity of Cas9 variants. Nature Biotechnology, 38(11), 1328-1336. [3] Sung, K., Jung, Y., Kim, N., Kim, Y. W., Kim, H. H., Kim, S. K., & Bae, S. (2025). 
A rational engineering strategy for structural dynamics modulation enables target specificity enhancement of the Cas9 nuclease. Nucleic Acids Research, 53(12), gkaf535. [4] Crawford, K. D., Khan, A. G., Lopez, S. C., Goodarzi, H., & Shipman, S. L. (2025). High throughput variant libraries and machine learning yield design rules for retron gene editors. Nucleic Acids Research, 53(2), gkae1199.
Predicting protein contacts of CRISPR-Cas9 domains with factored attention	Jeffrey Mak, Peter Minary	Computational Biology and Health Informatics	B	C	MSc	Prerequisites: Essential: Deep Learning in Healthcare Desirable: Knowledge of how attention works Abstract: Transformer-based models like ESM-2 [1] and AlphaFold 2 [2] have revolutionized protein sequence modelling, structure prediction, and design by treating protein sequences as strings of amino acid tokens and learning the “grammar rules” of such sequences. But what are the underlying principles driving the success of such models? This project aims to be a primer for this problem by exploring the connection between factored attention [3,4,5,6], a simplified version of the multi-head attention mechanism [7] used in transformers, and generalized Potts models, which were traditionally used for unsupervised protein contact prediction. Specifically, the student will implement a single-layer factored attention model to extract protein contacts for a given CRISPR-Cas9 domain/interface of interest, and compare the quality of the model’s extracted protein contacts with protein contacts obtained from other approaches. Depending on project progress, various extensions can also be explored. [1] Lin, Z., Akin, H., Rao, R., Hie, B., Zhu, Z., Lu, W., ... & Rives, A. (2023). Evolutionary-scale prediction of atomic-level protein structure with a language model. Science, 379(6637), 1123-1130. [2] Jumper, J., Evans, R., Pritzel, A., Green, T., Figurnov, M., Ronneberger, O., ... & Hassabis, D. (2021). Highly accurate protein structure prediction with AlphaFold. Nature, 596(7873), 583-589. [3] Bhattacharya, N., Thomas, N., Rao, R., Dauparas, J., Koo, P. K., Baker, D., ... & Ovchinnikov, S. (2021). Interpreting Potts and transformer protein models through the lens of simplified attention. In Pacific Symposium on Biocomputing 2022 (pp. 34-45). [4] Bhattacharya, N., Thomas, N., Rao, R., Dauparas, J., Koo, P. K., Baker, D., ... 
& Ovchinnikov, S. (2020). Single layers of attention suffice to predict protein contacts. bioRxiv, 2020-12. [5] Caredda, F., & Pagnani, A. (2025). Direct coupling analysis and the attention mechanism. BMC Bioinformatics, 26(1), 41. [6] Rende, R., Gerace, F., Laio, A., & Goldt, S. (2024). Mapping of attention mechanisms to a generalized Potts model. Physical Review Research, 6(2), 023057. [7] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., ... & Polosukhin, I. (2017). Attention is all you need. Advances in Neural Information Processing Systems, 30.
Covert Satellite Communications	Ivan Martinovic, Edd Salkield	Security	B	C	MSc	Relay satellites are vulnerable to covert communication attacks, in which the available spectrum is used to covertly communicate data over the link. This is possible because relay satellites retransmit received signals at the Physical Layer without any processing, allowing secret information to be embedded within the carrier signal. In this project, you will design and evaluate a protocol to perform these attacks on real relay satellites. The results will enable further research on detecting these covert channels. Interested students should have experience in computer networks and communication protocols, and be confident in developing software using Python. Some knowledge of cryptography/steganography is helpful. No prior experience in radio communications is required. Relevant reading material: • https://en.wikipedia.org/wiki/Direct-sequence_spread_spectrum • https://en.wikipedia.org/wiki/Steganography • https://en.wikipedia.org/wiki/Digital_watermarking
Detecting Ship Misbehaviour through SAR Satellite Imagery and RF Signal Analysis	Ivan Martinovic, Simon Birnbach	Security	B	C	MSc	A key challenge for maritime security is automatically detecting and classifying ships. This is essential so that government and law enforcement agencies know where ships are and what they are doing, allowing them to combat key maritime threats such as piracy, unsustainable fishing, pollution, or the smuggling of illicit goods. Due to limitations of the two main technologies used for maritime security, the automatic identification system (AIS) and synthetic aperture radar (SAR) satellite imagery, current solutions are inaccurate and often require manual intervention. AIS is a cooperative tracking system that provides precise and frequent updates of a vessel’s location and identity. However, because AIS is cooperative, ships can choose to stop participating in the system or even falsify messages to mask their location or identity. As a non-cooperative system, SAR can detect ships even if they want to remain hidden. However, SAR has its own disadvantages, including low image resolution and infrequent updates of each location. Students may choose to tackle this problem in one of three possible ways: • By improving ship detection and classification systems • By developing a transmitter fingerprinting system for AIS messages • By leveraging RF signals to localise AIS messages Prerequisites: This project will require knowledge of machine learning and data analysis, and a good grasp of Python. References: For the ship classification sub project: [1] Fernando Paolo, et al. xView3-SAR: Detecting dark fishing activity using synthetic aperture radar imagery. Advances in Neural Information Processing Systems, 35, 2022. [2] Xiyue Hou, et al. FUSAR-Ship: Building a high-resolution SAR-AIS matchup dataset of Gaofen-3 for ship detection and recognition. Science China Information Sciences, 63, 2020. 
For the transmitter fingerprinting sub project: [1] Joshua Smailes et al. Watch this space: securing satellite communication through resilient transmitter fingerprinting. Conference on Computer and Communications Security, ACM, 2023. https://ora.ox.ac.uk/objects/uuid:6d23ae00-8a25-434a-952d-0908cc9a3b89 [2] Qi Jiang et al. RF fingerprinting identification in low SNR scenarios for automatic identification system. IEEE Transactions on Wireless Communications, 23:3, 2024. For the RF localisation sub project: [1] Eric Jedermann, et al. Orbit-based authentication using TDOA signatures in satellite networks. In Proceedings of the 14th ACM Conference on Security and Privacy in Wireless and Mobile Networks, 2021.
LEO Satellite Reconnaissance and Monitoring	Ivan Martinovic, Joshua Smailes, Edd Salkield	Security	B	C	MSc	Through a prior collaboration, we have built a motorised satellite dish capable of tracking objects in Low Earth Orbit (LEO) and monitoring their communication. In this project, you will use this dish to build a satellite discovery and monitoring system, tracking known and suspected satellites to discover characteristics of their communication. This project can be taken in a variety of directions depending on interest, including adapting existing protocol decoders to survey capabilities, building new decoders, or analysing signals at the physical layer. Interested students should have some experience with communication protocols, but no prior experience in radio communication is required. For a related project working with FPGAs, see “SparSDR++: Using FPGAs for Wideband Satellite Reconnaissance”
Physical Layer Satellite Protocol Verification	Ivan Martinovic, Joshua Smailes, Edd Salkield	Security	B	C	MSc	Formal verification techniques are widely applied to assess protocol security against unconstrained attacker models with full capabilities. However, these models do not take into account the Physical Layer environment where the communication session occurs in real-world settings which constrain the attacker capabilities. In this project you will apply rigorous formal methods to define and analyse the security of widely-used satellite protocols, specifically anti-jamming availability and secrecy. This will be conducted with respect to these Physical Layer attributes including the signal structure and noise power, and be derived through both simulations and measurements of real satellite channels. Thus this work is designed to directly impact and improve the development of satellite protocols. Interested students should have some experience in formal verification and/or optimisation problems, and be confident in developing software using Python. No prior experience in radio communications is required.
Satellite Signal Hijacking and Interference	Ivan Martinovic, Edd Salkield	Security	B	C	MSc	Through a prior collaboration, we have access to a satellite receiver ground station suitable for security testing. In this project, you will develop tools to hijack or jam satellite signals, and evaluate their effect through real-world experiments on the ground station. A closed-loop experiment involving a software-defined radio (SDR) receiver will be set up to measure the effect of the interference over the channel. The results will enable analysis to defend satellite signals against these threats, with applications in satellite-based internet services and telemetry. Interested students should have experience in computer networks and communication protocols, and be confident in developing software using Python. No prior experience in radio communications is required. Relevant reading material: • https://dl.acm.org/doi/pdf/10.1145/3558482.3590190 • https://www.cs.ox.ac.uk/files/14313/Salkield_et_al_2023_satellite_spoofing_from.pdf
Securing Satellite Communication using Radio Transmitter Fingerprints	Ivan Martinovic, Simon Birnbach, Joshua Smailes	Security	B	C	MSc	Existing work has shown that satellite transmitters can be authenticated by looking only at the physical layer signal, due to small differences in transmitter hardware. This is particularly useful in the absence of other authentication, and can also be used to identify transmitter hardware, classify the nature of attacks, or prevent timing-based attacks. In this project, you will extend existing fingerprint-based systems in one of several ways: • Through our collaboration with the European Space Agency (ESA), you will adapt this system to run on their new “CyberCUBE” satellite, looking at the uplink instead of the downlink to authenticate ground systems, identify the transmitter hardware, or understand when the system is under attack. This will use finetuning, quantisation, and/or distillation to produce smaller or more performant models that can run under more constrained conditions. • Continuing to focus on the satellite downlink, you will develop techniques to improve performance. This may involve making use of more incoming data from the satellite, extracting additional features, or combining with other systems. • Looking beyond authentication, you will investigate novel applications for transmitter fingerprints, extracting location and other information from physical layer characteristics. Interested students should have experience working with Python; machine learning / TensorFlow experience is preferable. No prior experience in radio communication is required. Relevant reading material: • “SatIQ” satellite fingerprinting paper: https://www.cs.ox.ac.uk/files/14805/main.pdf • SatIQ source code: https://github.com/ssloxford/SatIQ • Extensions of SatIQ: – https://arxiv.org/pdf/2402.05042 – https://arxiv.org/pdf/2503.02118
Signal Injection Attacks Against Modern Sensors	Ivan Martinovic, Sebastian Köhler	Security	B	C	MSc	In recent years, the boundaries between the physical and the digital world have become increasingly blurry. Nowadays, many digital systems interact in some way with the physical world. Large and complex cyber-physical systems, such as autonomous and electric vehicles, combine the physical and the digital world and enable interaction between the two domains. Usually, such systems are equipped with numerous sensors to measure physical quantities, such as temperature, pressure, light, and sound. These physical quantities are vital inputs for the computations and can influence the decision-making process of the system. However, the nature of an analog sensor makes it difficult to authenticate the physical quantity that triggered a stimulus [1]. For instance, a temperature sensor cannot detect whether the stimulus was caused by a legitimate temperature increase or by an adversary using a hairdryer. This is a major concern because the integrity of sensor measurements is critical to ensuring that a system behaves as intended, and a violation of this principle can have serious security, safety and reliability consequences. In our research, we have shown that different sensors are vulnerable to signal injection attacks on the physical layer [2, 3, 4]. In this project, a student would analyse the vulnerability of sensors as they are used in modern systems, such as cars, the smart grid and IoT devices. The project will enable the student to research signal injection attacks using different modalities, such as light, acoustic and electromagnetic waves. Moreover, the student will be able to assess the impact of a successful attack against a system as a whole and work on novel countermeasures that can help to improve the security of the next generation of systems. Prerequisites: Some familiarity with digital signal processing and with Python. 
Useful URLs https://github.com/ssloxford/ccd-signal-injection-attacks https://github.com/ssloxford/they-see-me-rollin https://arxiv.org/pdf/2305.06901 References [1] Kune, Denis Foo, et al. "Ghost talk: Mitigating EMI signal injection attacks against analog sensors." 2013 IEEE Symposium on Security and Privacy. IEEE, 2013. [2] Köhler, Sebastian, Richard Baker, and Ivan Martinovic. "Signal injection attacks against ccd image sensors." Proceedings of the 2022 ACM on Asia Conference on Computer and Communications Security. 2022. [3] Köhler, Sebastian, et al. "They See Me Rollin’: Inherent Vulnerability of the Rolling Shutter in CMOS Image Sensors." Annual Computer Security Applications Conference. 2021. [4] Szakály, Marcell, et al. "Assault and Battery: Evaluating the Security of Power Conversion Systems Against Electromagnetic Injection Attacks." arXiv preprint arXiv:2305.06901 (2023).
Simulating Large-Scale Satellite Networks for Security Research	Ivan Martinovic, Simon Birnbach, Joshua Smailes	Security	B	C	MSc	The Deep Space Network Simulator (DSNS) is a network simulator capable of simulating large satellite networks and interplanetary networks, enabling protocol development and testing. In this project, you will build new features for the simulator to enable an even wider range of research, and demonstrate its capabilities by building reference scenarios or designing new protocols. This project can be taken in a number of directions: • Optimising the simulator to improve performance and scalability, or facilitate easier development; • Integrate DSNS with existing protocol definition schemes (e.g. SPACECOP, see reading material) to instantly enable a huge range of cryptographic protocols to be simulated on large-scale networks; • Build a module to enable simulation of network links at the physical layer, enabling realistic jamming and overshadowing attacks; • Integrate with formal verification tools to provide guarantees about security, taking network topology into account. Interested students should be experienced at writing Python, and have some experience with network/communication protocols. Relevant reading material: • DSNS paper: https://arxiv.org/pdf/2508.04317 • DSNS source code: https://github.com/ssloxford/DSNS/ • Extensions of DSNS: – “KeySpace” (PKI): https://arxiv.org/pdf/2408.10963 – Routing: https://arxiv.org/pdf/2509.10173 • “SPACECOP”: https://indico.esa.int/event/571/attachments/7210/13625/SPACECOP%20System%20for9
SparSDR++: Using FPGAs for Wideband Satellite Reconnaissance	Ivan Martinovic, Joshua Smailes	Security	B	C	MSc	Through a prior collaboration, we have built a motorised satellite dish capable of tracking objects in Low Earth Orbit (LEO), which will be used as a component of our upcoming satellite discovery and monitoring system. In this project, you will extend existing work writing FPGA images for Software Defined Radios to allow them to scan a much wider range of frequencies than is usually possible, significantly expanding the system’s monitoring capabilities and enabling passive discovery of satellite communication. Interested students should be experienced in working with low-level languages, and should have either prior experience working with FPGAs or a willingness to pick up the topic quickly. No prior experience with radio communication is required. For a related project working with communication protocols and the physical layer, see “LEO Satellite Reconnaissance and Monitoring”. Relevant reading material: • “SparSDR” paper: https://dl.acm.org/doi/pdf/10.1145/3307334.3326088 • “SparSDR” source code: https://github.com/ucsdsysnet/sparsdr
Transport Layer Security for Satellite Networks	Ivan Martinovic, Simon Birnbach	Security	B	C	MSc	Long-range satellite communications networks suffer from high network latencies due to the long distances to satellites. These high latencies have a detrimental effect on the performance of common protocols used for internet traffic, such as the Transmission Control Protocol (TCP). Currently widely-used tools to optimise TCP performance are incompatible with the encrypted traffic of current VPNs. This has led to many operators resorting to providing their communication services unencrypted, leaving customers exposed to eavesdropping attacks. In our research group, we have developed QPEP [1], a novel combination of a VPN and a satellite performance-enhancing proxy, which enables the use of encrypted traffic over satellite links without the usual performance drawbacks. This project would improve on this work in one or more of the following ways: • Low Earth Orbit Evaluation • Packet Loss Resilience • Scalability & Multi-user Environments This project is in collaboration with the European Space Agency. References: [1] Pavur, J. C., et al. “QPEP: An actionable approach to secure and performant broadband from geostationary orbit.” (2021). https://github.com/ssloxford/qpep https://www.ndss-symposium.org/wp-content/uploads/2021-074-paper.pdf
Applied Formal Verification	Tom Melham	Automated Verification	B	C	MSc	Not available for 2024-25 as on sabbatical. I am happy to supervise projects at all levels in applied formal verification. These typically involve hands-on investigation and innovation to develop new, practical methods of correctness analysis and verification for various kinds of computer systems – either hardware, software, or a combination. They are ideally suited to students who have an interest in computer systems, logic, and hands-on practical work that has a solid theoretical basis. You don’t have to have taken the course in Computer-Aided Formal Verification, if you are willing to learn the theories and technologies required. All my projects are designed to have a strong element of research and sufficient challenge to allow a motivated student to make an excellent contribution. Projects are usually therefore quite ambitious, but they are also designed realistically to fit into the time available. And they always have some fall-back options that are less challenging but can still result in an excellent achievement. Rather than offer readymade project ideas, I encourage students with an interest in this area to meet me and together discuss what might align best with their background and interests. I always have several project ideas that link to my current research or research being done by my group. Often projects will have a connection to real-world verification problems in industry, and many of my students will have contact through their projects with leading verification researchers and engineers in industry. If interested to discuss, please contact me.
Developing computational tools to aid the design of CRISPR/Cas9 gene editing experiments	Peter Minary	Computational Biology and Health Informatics	B	C	MSc	At present, the most versatile and widely used gene-editing tool is the CRISPR/Cas9 system, which is composed of a Cas9 nuclease and a short oligonucleotide guide RNA (or guide) that guides the Cas9 nuclease to the targeted DNA sequence (on-target) through complementary binding. There are a large number of computational tools for designing highly specific and efficient CRISPR/Cas9 guides, but there is great variation in performance and a lack of consensus among the tools. We aim to use ensemble learning to combine the benefits of a selected set of guide design tools, in order to outperform any single method at correctly predicting the efficiency of guides for which experimental efficiency data is available. Recommended for students who have done the Machine Learning and the Probability and Computing courses.
Developing machine learning models for off-target prediction in CRISPR/Cas9 gene editing	Peter Minary	Computational Biology and Health Informatics	B	C	MSc	The CRISPR/Cas9 gene editing system is composed of a Cas9 nuclease and a short oligonucleotide guide RNA (or guide, gRNA) that guides the Cas9 nuclease to the targeted DNA sequence (on-target) through complementary binding. However, the Cas9 nuclease may also cleave off-target genomic DNA sequences that contain mismatches compared to the gRNA, so undesired cleavage can occur. The obvious factors influencing the off-target cleavage activity of the CRISPR/Cas9 gene editing system are the sequence identities of the guide RNA and the off-target DNA. Various 'basic features' derived from these sequences have been fuelling the development of procedural and machine learning models for off-target cleavage activity prediction, but there are numerous 'non-basic features' (such as the sequence context around the off-target DNA) that may also influence off-target cleavage activity. The project will aim to develop novel off-target cleavage activity prediction models, using approaches that include but are not limited to combining 'basic features' and 'non-basic features', to increase the accuracy of model predictions of experimental off-target cleavage activities. Prerequisites: Recommended for students who have done a Machine Learning course and have an interest in molecular biology.
Implementing a Datalog Reasoner	Boris Motik	Data, Knowledge and Action	B			The objective of this project would be for a student to implement a simple Datalog reasoner. Depending on the student’s ambitions, the reasoner could be running in main memory (easier version) or on disk (more challenging). The student would be expected to design the data structures needed to store the data (in RAM or on disk, as agreed with me). On top of this, the student would implement the seminaive algorithm and evaluate the system on a couple of medium-sized datasets. We have in our group ready-made datasets that could be used for evaluation. In a variation of this project, the student would reuse an existing relational database to store the data. The seminaive algorithm would then be implemented either as an extension of the database (assuming the database is open source), or on top of the database (by running SQL queries). This last variant would arguably be much easier, as it would involve less design and more reuse of existing technologies. More advanced students could extend their system to implement various incremental reasoning algorithms. For example, I would give them one of the many papers I’ve written on this topic, and they would have to (a) understand the formal material and (b) implement and evaluate the algorithms. Hence, this project would also give students the opportunity to go as far as they can. Having attended the Databases or Database System Implementation courses, and perhaps to a lesser extent the KRR course, would be a prerequisite for doing this project.
Implementing a Tableaux Reasoner for Description Logics	Boris Motik	Data, Knowledge and Action	B			The objective of this project would be for a student to implement a simple tableau-based reasoner for description logics. The project would involve the student designing the core data structures, implementing the basic algorithm, realising backtracking, and implementing various optimisations typically used in practical reasoners. The student could use a preexisting API (e.g., the OWL API) to load a description logic ontology, so they could just focus on solving the core problem. The student would be expected to evaluate their implementation on the ontologies from the repository that we’re maintaining in the KRR group (http://www.cs.ox.ac.uk/isg/ontologies/). The project seems to me to provide a good opportunity for the student to demonstrate how well they have absorbed the CS material (e.g., algorithms, data structures, computational complexity analysis, etc.) taught in our degrees. Also, the project is sufficiently open-ended that the student can go quite a long way before running out of things to do. Having attended the KRR course would be a prerequisite for doing this project.
Topics in Automata Theory, Program Verification and Programming Languages (including lambda calculus and categorical semantics)	Andrzej Murawski	Programming Languages		C	MSc	Prof Murawski is willing to supervise in the area of automata theory, program verification and programming languages (broadly construed, including lambda calculus and categorical semantics). For a taste of potential projects, follow this link.
Concurrent Algorithms and Data Structures	Hanno Nickau	Programming Languages		C	MSc	Projects in the area of Concurrent Algorithms and Data Structures
Concurrent Programming	Hanno Nickau	Programming Languages	B	C	MSc	Projects in the area of Concurrent Programming
Continual learning by surrogate objectives	Yangchen Pan, Jiarui Gan	Artificial Intelligence and Machine Learning			MSc	A critical challenge in continual learning is catastrophic forgetting (CF), where the acquisition of new information leads to the erosion of previously learned knowledge. This phenomenon poses a substantial barrier, particularly in the context of updating large models, rendering the process computationally unscalable as data increase. CF is primarily attributed to biased gradients resulting from shifts in data sampling distribution over time. This project explores a computationally efficient approach to alleviate forgetting. The core idea involves identifying surrogate objective functions that circumvent the need for extensive memory and computational resources during optimization. The student undertaking this project will work on: Literature Review: Delving into key papers on continual learning to gain insights into the intricacies of the problem and experiment settings. Implementation: Implementing the proposed method and conducting a comparative analysis against existing approaches, with a focus on both sample and computation efficiency.
Sample and Computation Efficient Online Adaptation through Offline Reinforcement Learning	Yangchen Pan, Jiarui Gan	Artificial Intelligence and Machine Learning			MSc	In addressing real-world challenges, AI agents, often driven by expansive neural networks like Large Language Models (LLMs) such as GPT, face significant computational and sample-related costs during training and deployment. Notably, Reinforcement Learning (RL) agents frequently undergo training on vast offline datasets with relaxed computation budgets, followed by deployment or fine-tuning during an online stage that demands rapid computation. This project aims to investigate methods that capitalize on the disparity between offline and online computation budgets to streamline the training and deployment of RL agents. The student undertaking this project will first delve into relevant literature by studying recommended papers. Subsequently, the student will implement offline RL methods to facilitate downstream online learning. In the third phase, the student will experiment with various online RL algorithm variants. The final algorithm will undergo rigorous empirical testing and comparison to validate its efficiency in handling the challenges posed by large models and varying computation budgets.
Deep Reinforcement Learning for High-Dimensional POMDPs	David Parker, Nick Hawes	Automated Verification			MSc	Background Recent work has demonstrated that Deep Reinforcement Learning (DRL) algorithms can achieve human-level control policies across various applications. This project will focus on developing and testing DRL methods specifically for Partially Observable Markov Decision Processes (POMDPs), where the agent must make decisions in environments with limited and noisy observations. A key challenge is ensuring that the algorithm remains robust in environments with high-dimensional observations. Focus The student undertaking this project will gain familiarity with POMDP definitions and relevant environments. The objective is to implement DRL-based POMDP algorithms capable of deriving robust solutions for high-dimensional observations. The student will explore the following techniques during the project: • Dimensionality reduction using neural networks to compress high-dimensional observations into lower-dimensional latent representations; • Attention mechanisms to focus on the most relevant parts of high-dimensional observations for decision-making, linking these to specific beliefs; • Processing observations in a hierarchical structure at different resolutions to improve computational efficiency; • Designing specific loss functions that incorporate reconstruction, contrastive, and belief consistency terms to learn compact, task-relevant representations. Method References: • Igl M, Zintgraf L, Le T A, et al. Deep variational reinforcement learning for POMDPs[C]. International conference on machine learning. PMLR, 2018. • Meng L, Gorbet R, Kulić D. Memory-based deep reinforcement learning for pomdps[C]. IEEE/RSJ international conference on intelligent robots and systems. IROS, 2021. • Lauri M, Hsu D, Pajarinen J. Partially observable Markov decision processes in robotics: A survey[J]. IEEE Transactions on Robotics, 2022.
Model checking of POMDPs	David Parker	Automated Verification	B	C	MSc	Formal verification techniques have recently been developed for probabilistic models with partial observability, notably partially observable Markov decision processes (POMDPs), and implemented in software tools such as PRISM. This project will investigate extensions of these techniques, including for example the applicability of sampling based solution methods, adaptation to more expressive temporal logics or extension to partially observable stochastic games.
Model checking of stochastic games	David Parker	Automated Verification	B	C	MSc	This project will develop formal verification techniques for stochastic games, in particular by considering techniques based on game-theoretic notions of equilibria. A range of such techniques are already implemented in the PRISM-games model checker. This project will consider extensions of these approaches to tackling new types of equilibria, such as Stackelberg equilibria, with potential directions including designing extended temporal logics and solution methods, or modelling new applications, for example from multi-robot coordination.
Probabilistic Model Checking under Uncertainty	David Parker	Automated Verification	B	C	MSc	Formal methods for analysing models such as Markov chains and Markov decision processes can be extended to explicitly reason about model uncertainty, for example by building and analysing interval Markov decision processes. This project will investigate alternative approaches to tackling this problem, which could include alternative models of transition probability uncertainty, factoring in dependencies between different sources of uncertainty, or using Bayesian inference to learn model parameters.
Automatic translation to GPGPU	Joe Pitt-Francis	Computational Biology and Health Informatics	B	C		This project involves running cardiac cell models on a high-end GPU card. Each model simulates the electrophysiology of a single heart cell and can be subjected to a series of computational experiments (such as being paced at particular heart rates). For more information about the science and to see it in action on CPU see "Cardiac Electrophysiology Web Lab" at https://travis.cs.ox.ac.uk/FunctionalCuration/ An existing compiler (implemented in Python) is able to translate from a domain specific XML language (http://models.cellml.org) into a C++ implementation. The goal of the project is to add functionality to the compiler in order to get OpenCL or CUDA implementations of the same cell models and to thus increase the efficiency of the "Web Lab".
General GPGPU and high performance computing projects	Joe Pitt-Francis	Computational Biology and Health Informatics	B	C		I am willing to supervise projects which fit into the general areas of General Purpose Graphics Processing Unit (GPGPU) programming and High-Performance Computing (HPC). Specific technologies used are likely to be based around NVIDIA CUDA for GPU programming and MPI for distributed-memory cluster computing. All application areas are considered, although geometric algorithms are favourites of mine.
General graphics projects	Joe Pitt-Francis	Computational Biology and Health Informatics	B	C		I am interested in supervising general projects in the area of computer graphics. If you have a particular area of graphics-related research that you are keen to explore then we can tailor a bespoke project for you. Specific projects I have supervised in the past include "natural tree generation" which involved using Lindenmayer systems to grow realistic looking bushes and trees to be rendered in a scene; "procedural landscape generation" in which an island world could be generated on-the-fly using a set of simple rules as a user explored it; "gesture recognition" where a human could control a simple interface using hand-gestures; "parallel ray-tracing" on distributed-memory clusters and using multiple threads on a GPU card; "radiosity modelling" used for analysing the distribution of RFID radio signal inside a building; and "non-photorealistic rendering" where various models were rendered with toon/cel shaders and a set of pencil-sketch shaders. (Such a project is generally not suitable for MSc students. They should note that in order for this option to work as a potential MSc project then it should be combined with a taught-course topic such as machine learning, concurrent programming, linguistics etc.)
Graphics pipeline animator	Joe Pitt-Francis	Computational Biology and Health Informatics	B			Pre-requisites: Computer graphics, Object-oriented programming The idea behind this project is to build an educational tool which enables the stages of the graphics pipeline to be visualised. One might imagine the pipeline being represented by a sequence of windows; the user is able to manipulate a model in the first window and watch the progress of her modifications in the subsequent windows. Alternatively, the pipeline might be represented by an annotated slider widget; the user inputs a model and then moves the slider down the pipeline, watching an animation of the process.
Intuitive exploration through novel visualisation	Joe Pitt-Francis	Computational Biology and Health Informatics	B	C		I am interested in novel visualisation as a way to represent things in a more appealing and intuitive way. For example the Gnome disk usage analyzer (Baobab) uses either a "ring chart" or "treemap chart" representation to show us which sub-folders are using the most disk space. In the early 1990s the IRIX file system navigator used a 3D skyscraper representation to show us similar information. There are plenty more ways of representing disk usage: from DAGs to centralised Voronoi diagrams. What kind of representation is most intuitive for finding a file which is hogging disk space, and which is most intuitive for helping us to remember where something is located in the file-system tree? The aim is to explore other places where visualisation aids intuition: for example, to visualise the output of a profiler to find bottlenecks in software, to visualise the output of a code coverage tool in order to check that test suites are testing the appropriate functionality, or even to visualise the prevalence of diabetes and heart disease in various regions of the country.
Separation Logic Verification and Testing for Systems Software	Christopher Pulte	Programming Languages		C	MSc	Prerequisites: Some knowledge of programming language semantics or verification, functional programming I am happy to supervise MSc and Part C projects around verification and testing of systems software. Students should ideally have some knowledge of programming language semantics or verification; functional programming experience is useful. Background Systems software – operating systems, hypervisors, firmware, etc. – is notoriously difficult to reason about formally, due to the complex invariants and programming idioms it relies on, and due to its interaction with the underlying hardware. Recent advances in software verification include verified OS kernels and hypervisors, but practical verification of real-world systems above realistic semantics remains an open problem. The goal of the CN type system [2] is to enable practical verification of systems software written in C. CN builds on the Cerberus C semantics [3] and combines refinement types and separation logic [1] to reason about C code based on a notion of memory ownership. Focus There are several directions for possible related projects. In the following I give two examples as a starting point for discussions, but interested students are welcome to contact and meet me to discuss options. Separation Logic Testing. Recent work in this area [4] includes executable testing against separation logic specifications: even when proof is the goal, runtime testing against CN specifications lets one discover specification or code errors early, before embarking on proof, and quickly gain more confidence in both. The existing Fulminate tooling makes specifications executable by translating them into C assertions and instrumenting the target code. 
This project explores an alternative: extending the Cerberus model to support testing C code against CN specifications directly within the Cerberus interpreter, to make testing sound with respect to C's complex semantics, and to simultaneously catch undefined behaviour and specification violations. Possible extensions include (a) using the interpreter-based implementation to explore extensions to CN's specification language (making use of the greater flexibility compared to compilation to C), or (b) exploiting the extended Cerberus for debugging proof failures, by replaying and explaining verification counterexamples in the interpreter. CN in Lean. CN automates large parts of the proof by sending proof obligations to an SMT solver. To ensure reliable automation, the solver is only given problems in a decidable fragment of first-order logic, allowing users to fall back to manual proof where this is insufficient: user-specified lemmas can be exported and manually verified in an interactive theorem prover. This project explores whether verification can be made more practical by re-creating a CN-like system – possibly significantly cut-down – inside the interactive theorem prover Lean, to provide a smoother integration of SMT automation with manual proof. References [1] Separation Logic: A Logic for Shared Mutable Data Structures. John C Reynolds, 2002. https://www.cs.cmu.edu/~jcr/seplogic.pdf [2] CN: Verifying Systems C Code with Separation-Logic Refinement Types. Christopher Pulte, Dhruv C Makwana, Thomas Sewell, Kayvan Memarian, Peter Sewell, Neel Krishnaswami, 2023. https://www.cs.ox.ac.uk/people/christopher.pulte/popl23.pdf [3] Into the Depths of C: Elaborating the De Facto Standards. Kayvan Memarian, Justus Matthiesen, James Lingard, Kyndylan Nienhuis, David Chisnall, Robert NM Watson, Peter Sewell, 2016. https://www.cl.cam.ac.uk/~km569/into_the_depths_of_C.pdf [4] Fulminate: Testing CN Separation-Logic Specifications in C. 
Rini Banerjee, Kayvan Memarian, Dhruv Makwana, Christopher Pulte, Neel Krishnaswami, and Peter Sewell, 2025. https://www.cs.ox.ac.uk/people/christopher.pulte/2024-cn-testing-paper.pdf
The fundamental problem of counterfactual estimation	Francesco Quinzan, Mark van der Wilk	Artificial Intelligence and Machine Learning			MSc	This project deals with the fundamental problem of counterfactual estimation. Counterfactual estimation (CE) refers to the process of estimating or predicting what would have happened in a given situation if different actions or events had taken place. Counterfactual estimation is particularly vital in healthcare due to the complex and interconnected nature of biological systems and the need to understand the true causes behind diseases, treatment effects, and patient outcomes. Although CE holds tremendous promise and potential, estimating counterfactuals remains a significant challenge. Recently, transformer-type algorithms have emerged as a powerful tool for the related problem of zero-shot treatment effect estimation. In this project, we will look into transformer-type algorithms for CE. The work will consist of: (i) reading and understanding [1]; (ii) extending the framework of [1] to counterfactual estimation from interventional data; (iii) implementing and performing experiments with a base model for this extended framework. [1] Towards Causal Foundation Model: on Duality between Causal Inference and Attention. Jiaqi Zhang et al. CoRR abs/2310.00809 (2023) Pre-requisites: A student suitable for this project will work with causal inference, attention mechanisms, and possibly the basics of reproducing kernel Hilbert spaces and feature maps. A background in any of these topics is desirable. Coding skills are required.
Statistical shape atlas of cardiac anatomy	Blanca Rodriguez, Pablo Lamata	Computational Biology and Health Informatics			MSc	Description: Cardiac remodelling is the change in shape of the anatomy due to disease processes. 3D computational meshes encode shape variation in cardiac anatomy, and offer greater diagnostic value than conventional geometrical metrics. Better shape metrics will enable earlier detection, more accurate stratification of disease, and more reliable evaluation of remodelling response. This project will contribute to the development of a toolkit for the construction of anatomical atlases of cardiac anatomy, and its translation to clinical adoption. The student will learn about the challenges and opportunities of the cross-disciplinary field between image analysis, computational modelling and cardiology, and be part of a project with a big potential impact on the management of cardiovascular diseases. Prerequisites: Motivation. Good programming skills. Experience with computational graphics and image analysis is an advantage.
A High-Level Language for Digital Fabrication	Alex Rogers	Systems	B	C	MSc	Prerequisites: None Digital fabrication describes the design and manufacture workflow where digital data directly drives manufacturing equipment. It typically involves generating 2D or 3D designs in CAD software which are then processed to generate the G-code instructions that are understood by 3D printers, drawing machines and desktop CNC machines. While being ubiquitous, this workflow is quite cumbersome since it enforces a clean separation between design and manufacture. By contrast, many rapid prototyping workflows would be improved with design tools better integrated with the intended manufacturing process. To this end, in this project, you will develop an improved workflow for rapid prototyping that allows both the design and manufacturing process to be described together in a domain specific high-level language which will compile to G-code for a specific machine type. You will design this language for a particular application and demonstrate it in simulation, and as a stretch goal, on a real physical machine.
Resurrecting Extinct Computers	Alex Rogers	Systems	B	C	MSc	While the architecture of current reduced instruction set processors is well established, and relatively static, the early days of computing saw extensive experimentation and exploration of alternative designs. Commercial processors developed during the 1960s, 1970s and 1980s included stack machines, LISP machines and massively parallel machines, such as the Connection Machine (CM-1) consisting of 65,536 individual one-bit processors connected as a 12-dimensional hypercube. This period also saw the development of the first single chip microprocessors, such as the Intel 4004, and the first personal computers, such as the Altair 8800 using the Intel 8080 microprocessor. This project will attempt to resurrect one of these extinct designs (or a scaled down version if necessary) using software simulation or a low-cost field-programmable gate array (FPGA). You will be required to research the chosen processor, using both original and modern sources, and then develop a register level description of the device that can be implemented in software or on an FPGA. The final device should be able to run the software of the original. Prerequisites: Digital Systems and Computer Architecture are useful but not essential.
Refinement for feed-forward SfM models	Christian Rupprecht	Artificial Intelligence and Machine Learning	B	C	MSc	Advances in machine learning have massively impacted 3D computer vision. This has also changed the relationship between visual geometry and deep neural networks. For a long time, the belief was that accurate 3D reconstruction should be obtained from visual geometry principles by solving systems of equations or via optimization of energy functions, like in bundle adjustment (BA). In this view, machine learning was relegated to work as a pre-processor, addressing tasks like feature matching and tasks that geometry cannot handle, such as monocular depth prediction. Later, as machine learning methods matured, they became integrated more deeply in visual geometry pipelines, culminating in methods like VGGSfM that, using differentiable BA, achieve state-of-the-art results in Structure from Motion (SfM). Even so, visual geometry still plays a major role, which increases complexity and computational cost. As more and more powerful and capable foundation models emerge in computer vision, 3D tasks can now be solved directly by a neural predictor, eschewing visual geometry almost entirely. Recent contributions like Dust3r and its evolution Mast3r have shown promising results in this direction. In this project we will explore visual geometry post-optimization of the predictions of feed-forward 3D models and analyse how much neural predictions can be improved with classical multi-view geometry approaches. Goals: Set up an evaluation framework for both the deep and traditional 3D reconstruction methods to measure the progress. Use gradient descent optimization to update the predictions from feed-forward 3D models, making them more geometry-aligned. Test several hypotheses for bias in multiple different models. Stretch Goal: Define your own objective function that improves the results even further. Analyze the capabilities and limitations of feed-forward 3D models. 
References: Leroy, Vincent, Yohann Cabon, and Jérôme Revaud. "Grounding image matching in 3d with mast3r." European Conference on Computer Vision. Springer, Cham, 2025. Wang, Shuzhe, et al. "Dust3r: Geometric 3d vision made easy." Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2024. Wang, Jianyuan, et al. "VGGSfM: Visual Geometry Grounded Deep Structure From Motion." Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2024. Richard Hartley and Andrew Zisserman. Multiple View Geometry in Computer Vision. Cambridge University Press, 2000 Pre-requisites: Machine Learning
Understanding Bias in Object Detection Models	Christian Rupprecht	Artificial Intelligence and Machine Learning	B	C	MSc	Object detection is a fundamental task in computer vision. The goal is to locate and classify individual instances of objects in images (e.g., people, cars, cups, sheep, etc.). Most current models have been trained on benchmark datasets that consist of hand-annotated images collected from the internet. This introduces bias in the training data which in turn has been shown to lead to biased models. In this project, we will make use of modern image editing techniques, such as in-painting diffusion models, to modify images in specific ways (e.g., changing the apparent age of people, skin-color, day-night, …), which allows us to measure the impact of the edit on the object detection performance. Goals: • Set up an evaluation framework of object detection models that allows modification of the test data. • Use different image editing techniques to edit the test data. • Test several hypotheses for bias in multiple different models. Stretch Goal: • Bias can potentially be mitigated by including the modified data during training of the model. References: Rombach, Robin, et al. "High-resolution image synthesis with latent diffusion models." Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2022. Brooks, Tim, Aleksander Holynski, and Alexei A. Efros. "Instructpix2pix: Learning to follow image editing instructions." Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2023. Lin, Tsung-Yi, et al. "Microsoft coco: Common objects in context." Computer Vision–ECCV 2014: 13th European Conference, Zurich, Switzerland, September 6-12, 2014, Proceedings, Part V 13. Springer International Publishing, 2014. Singh, Krishna Kumar, et al. "Don't judge an object by its context: Learning to overcome contextual bias." Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2020. 
Pre-requisites: Machine Learning
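One way the impact of an edit on detection performance could be quantified is sketched below. It assumes boxes are given as (x1, y1, x2, y2) tuples and that recall at an IoU threshold of 0.5 is the metric of interest; the function names, the greedy matching rule, and the toy boxes are all hypothetical, and the project itself does not prescribe a metric.

```python
def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def recall(ground_truth, detections, threshold=0.5):
    """Fraction of ground-truth boxes matched by a distinct detection."""
    unused = list(detections)
    hits = 0
    for gt in ground_truth:
        # Greedy matching: take the best-overlapping unused detection.
        best = max(unused, key=lambda d: iou(gt, d), default=None)
        if best is not None and iou(gt, best) >= threshold:
            hits += 1
            unused.remove(best)
    return hits / len(ground_truth) if ground_truth else 1.0


def edit_impact(ground_truth, dets_original, dets_edited, threshold=0.5):
    """Change in recall caused by the edit (negative means the edit hurt)."""
    return (recall(ground_truth, dets_edited, threshold)
            - recall(ground_truth, dets_original, threshold))


# Toy example: the detector finds both objects on the original image
# but misses the second one after the edit.
gt = [(0, 0, 10, 10), (20, 20, 30, 30)]
dets_original = [(0, 0, 10, 10), (21, 21, 30, 30)]
dets_edited = [(0, 0, 10, 10)]
impact = edit_impact(gt, dets_original, dets_edited)
```

Comparing the same ground-truth boxes before and after the edit keeps the object locations fixed, so any change in recall can be attributed to the edit rather than to a change in scene layout.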
Category-theoretic syntactic models of programming languages	Philip Saville, Sam Staton	Programming Languages		C	MSc	Syntactic models -- categories constructed from the syntax of a programming language -- play a key role in category-theoretic denotational semantics. Showing such models exist and satisfy a suitable universal property amounts to giving a sound and complete semantic interpretation for the language in question. Often this involves carefully studying the interplay between program features and categorical structures. The three main aims of the project are as follows. Firstly, to construct syntactic models for two idealised effectful functional programming languages, namely Moggi's monadic metalanguage [1] and computational lambda calculus [2]. Next, to prove their universal properties, and finally to use these to give syntactic translations between the languages. The starting point would be to understand the categorical semantics of the simply-typed lambda calculus, the monadic metalanguage and the computational lambda calculus. Extensions would include exploring extensionality / well-pointedness and constructing fully abstract syntactic models of these languages. [1] E. Moggi, "Notions of computation and monads," 1991 [2] E. Moggi, Pre-requisite courses: - Categories, proofs and processes Useful background courses: - Principles of programming languages - Lambda calculus and types
Oxford Witt Lab Projects	Christian Schroeder de Witt		B	C	MSc	Student Projects / Supervision We offer regular, close supervision on ambitious projects at the frontier of AI capability, safety, and security for senior undergraduates, masters’ students, and DPhil/PhD students. Our work combines rigorous foundations with high-impact applications, and students are encouraged to engage directly with open research questions that shape the future of autonomous systems. Student research with OWL has led to multiple publications and prizes - including a Best MSc Thesis Award (Tony Hoare Prize), multiple top-tier AI conference publications, and a workshop Best Paper Award - and we regularly submit to and publish at leading venues in machine learning, AI safety, and multi-agent systems. Projects are designed to be innovative, technically deep, and publication-oriented - ideal for students aiming to contribute original research at the highest level. Please find example project directions below. We also welcome original project ideas from motivated students - if you have a proposal that aligns with the lab’s themes or pushes in an exciting new direction, we are very happy to consider it. We value intellectual initiative and enjoy shaping ambitious ideas into rigorous research projects. If you are interested in working with us, please get in touch at contact@wittlab.ai with a brief description of your background and research interests, as well as a CV. Please note that due to the volume of requests we may not be able to get back to everyone. Strategic Cyber Agents We are exploring the next generation of autonomous cyber agents: AI systems capable of operating in complex, adversarial digital environments. Our work spans both offensive and defensive settings, combining optimisation- and verification-based methods to build agents that can reason, plan, and adapt over long horizons. 
A central focus is developing rich world models for cyber, enabling agents to act strategically while remaining stealthy, robust, and aligned. We are particularly interested in “low-and-slow” behaviours, long-term reasoning, and principled evaluation in realistic environments. Possible projects sit at the intersection of one or several of: machine learning capabilities, safety, security, evaluation science, and formal methods - ideal for students excited by high-impact, technically deep research. Resources: [1] Open Challenges in Multi-Agent Security: Towards Secure Systems of Interacting AI Agents, C Schroeder de Witt, arXiv:2505.02077 [cs.CR] [2] Secret Collusion among AI Agents: Multi-Agent Deception via Steganography, SR Motwani, M Baranchuk, M Strohmeier, V Bolina, PHS Torr, L Hammond, C Schroeder de Witt, NeurIPS 2024 [3] MALT: Improving Reasoning with Multi-Agent LLM Training, SR Motwani, C Smith, RJ Das, R Rafailov, I Laptev, PHS Torr, F Pizzati, R Clark, C Schroeder de Witt, COLM 2025 [4] Fundamental Limitations in Pointwise Defences of LLM Finetuning APIs, X Davies, E Winsor, A Souly, T Korbak, R Kirk, C Schroeder de Witt, Y Gal, NeurIPS 2025 [5] Defeating Prompt Injections by Design, E Debenedetti, I Shumailov, T Fan, J Hayes, N Carlini, D Fabian, C Kern, C Shi, A Terzis, F Tramèr, arXiv:2503.18813 [cs.CR] [6] REAL: Benchmarking Autonomous Agents on Deterministic Simulations of Real Websites, D Garg, S VanWeelden, D Caples, A Draguns, N Ravi, P Putta, N Garg, T Abraham, M Lara, F Lopez, J Liu, A Gundawar, P Hebbar, Y Joo, J Gu, C London, C Schroeder de Witt, S Motwani, arXiv:2504.11543 [cs.AI] Undetectable Threats We investigate the emerging science of undetectable threats in multi-agent AI systems. As autonomous agents proliferate, security challenges are shifting from overt attacks to subtle, covert, and strategically grounded manipulations. 
Our work draws on game theory, information theory, and modern machine learning to study steganography, covert channels, illusory attacks, and undetectable neural network backdoors. At the same time, we develop principled monitoring tools for detecting out-of-distribution dynamics and hidden coordination. The goal is to anticipate and mitigate the most pressing security risks facing autonomous systems - building theoretical foundations and practical defenses for a world of increasingly capable, interacting AI agents. Resources: [1] Open Challenges in Multi-Agent Security: Towards Secure Systems of Interacting AI Agents, C Schroeder de Witt, arXiv:2505.02077 [cs.CR] [2] Perfectly Secure Steganography Using Minimum Entropy Coupling, C Schroeder de Witt, S Sokota, JZ Kolter, JN Foerster, M Strohmeier, ICLR 2023 [3] Unelicitable Backdoors via Cryptographic Transformer Circuits, A Draguns, A Gritsevskiy, S Motwani, C Schroeder de Witt, NeurIPS 2024 [4] Rethinking Out-of-Distribution Detection for Reinforcement Learning: Advancing Methods for Evaluation and Detection, L Nasvytis, K Sandbrink, J Foerster, T Franzmeyer, C Schroeder de Witt, AAMAS 2024 [5] Fundamental Limitations in Pointwise Defences of LLM Finetuning APIs, X Davies, E Winsor, A Souly, T Korbak, R Kirk, C Schroeder de Witt, Y Gal, NeurIPS 2025 [6] Multi-Agent Common Knowledge Reinforcement Learning, C Schroeder de Witt, J Foerster, G Farquhar, P Torr, W Boehmer, S Whiteson, NeurIPS 2019 Foundations of Interpretability We study the foundations and limits of interpretability for modern AI systems. As models scale, mechanistic interpretability and automated interpretation (autointerp) face fundamental constraints: human input bottlenecks, causal identifiability limits, information-theoretic barriers, and even forms of program obfuscation within neural networks. Our goal is to rigorously characterise these limits - both theoretical and practical - and develop principled methods to work around them. 
By combining insights from representation learning, sparse modelling, causality, and information theory, we aim to clarify what interpretability can realistically achieve, and where it must evolve to remain effective for large-scale, safety-critical AI systems. Resources: [1] The Dead Salmons of AI Interpretability, M Méloux, G Dirupo, F Portet, M Peyrard, arXiv:2512.18792 [cs.AI] [2] Efficient Dictionary Learning with Switch Sparse Autoencoders, A Mudide, J Engels, E J Michaud, M Tegmark, C Schroeder de Witt, ICLR 2024 [3] Unelicitable Backdoors via Cryptographic Transformer Circuits, A Draguns, A Gritsevskiy, S Motwani, C Schroeder de Witt, NeurIPS 2024 Multi-Agent Evaluation Science As AI systems increasingly operate through interacting agents, evaluation itself becomes a core scientific problem. We study how to rigorously measure capability, robustness, and security in multi-agent LLM systems deployed in dynamic, adversarial environments. This includes developing scalable metrics for coordination, strategic reasoning, and multi-agent security, as well as analysing failure under distribution shift and emergent dynamics. A central focus is sim-to-real transfer: understanding how behaviours validated in simulated settings generalise to open-world deployment. We aim to build scalable, principled evaluation frameworks that meaningfully shape training and enable reliable autonomous agents in complex real-world systems. Resources: [1] Open Challenges in Multi-Agent Security: Towards Secure Systems of Interacting AI Agents, C Schroeder de Witt, arXiv:2505.02077 [cs.CR] [2] Multi-Agent Risks from Advanced AI, Lewis Hammond et al., arXiv:2502.14143 [cs.MA]
Preventing Malicious Collusion between Advanced AI Systems	Christian Schroeder de Witt, Philip Torr, Alessandro Abate	Automated Verification	B	C		Consider settings in which one or more principals assign a task to a team of generative AI agents, for example a scheduling or negotiation task. The principals monitor the (not necessarily human-intelligible) communications between the agents and intervene if deemed necessary, hoping to prevent agents from pursuing undesirable joint strategies. In this project, we investigate whether, when, and how optimisation pressure may lead generative AI agents to hide communications from their principals. We survey the landscape of steganography (information hiding), and, for a given level of security, identify the required knowledge and capabilities of generative AI agents. From these, we design a roadmap for model evaluation building on our recent work in this space [1][2]. We then empirically test the ability of state-of-the-art LLMs to engage in different types of covert communication given a variety of optimisation pressures. This project is designed to lead to publication. We are looking for a highly motivated student. [1] Secret Collusion Among Generative AI Agents: A Model Evaluation Framework, Sumeet Ramesh Motwani, Mikhail Baranchuk, Philip H.S. Torr, Lewis Hammond, and Christian Schroeder de Witt, to appear - also see this talk here: https://www.alignment-workshop.com/nola-talks/christian-schroeder-de-witt-perfectly-secure-steganography-and-llm-collus [2] https://www.quantamagazine.org/secret-messages-can-hide-in-ai-generated-media-20230518/
Quantum Max-Cut	Sergii Strelchuk	Quantum	B	C	MSc	Prerequisites: Desirable: Courses in Quantum Information / Quantum Computation / Classical Complexity. These are non-essential and can be picked up quickly. Essential: Familiarity with the basics of Quantum Information and Computation as described in this short set of lecture notes (https://www.qi.damtp.cam.ac.uk/files/PartIIIQC/Part%20IIC%20QIC/PartIIC%20QIClectures%20Full.pdf) Background Constraint satisfaction problems serve as a foundational framework for understanding a wide variety of computational problems. Since it is impossible to solve many of those exactly in polynomial time (assuming P != NP), we turn to studying approximation algorithms. These algorithms often solve a relaxed version of the problem as a semi-definite program (SDP), which can be efficiently optimized. The SDP solution is then ‘rounded’ to a feasible configuration that maximizes the number of satisfied constraints. A landmark example is the Goemans-Williamson algorithm for Max-Cut [GW95], which uses hyperplane rounding to achieve an approximation ratio of 0.878. The PCP theorem establishes that finding a solution within a certain constant factor of the optimum is NP-hard. Furthermore, stronger assumptions like the Unique Games Conjecture suggest that SDP rounding approaches achieve provably optimal approximation ratios [KKMO07]. The quantum counterpart to constraint satisfaction is the Local Hamiltonian Problem, which aims to find the minimum eigenvalue of a given local Hamiltonian. As the canonical QMA-complete problem, it also motivates the study of approximation algorithms. The challenge is even more intriguing because in the quantum setting one has to take into account the monogamy of entanglement – a fundamental quantum constraint on correlations between systems. In the last five years there has been fascinating progress towards higher approximation ratios in the quantum case using diverse approaches. 
In 2020, the authors of [AGM20] achieved an approximation ratio of 0.53 in the worst case. This was followed by hardness results in 2022 by [HNPTW23]: it is Unique Games-hard to compute a (0.956 + epsilon)-approximation to the value of the best state. Shortly after, the achievable approximation ratio was improved to 0.582 by using techniques which involve rounding a semi-definite program relaxation to an entangled state. Further insights were obtained in [WCEHK24] by an extension of non-commutative Sum of Squares optimization techniques to give a new hierarchy of relaxations to Quantum Max Cut and its subsequent refinement in [R23] with improved scaling in [HTPG24]. Another improvement to the achievable approximation of 0.595 in the worst case was obtained by [LP24] in 2024. Lastly, towards the end of 2024 the authors of [KKZ24] introduced a Hamiltonian Quantum Approximate Optimization Algorithm. It builds on the well-known Quantum Approximate Optimization Algorithm, which is a variational quantum approximation algorithm designed for classical combinatorial optimization problems on near-term hardware. The quest for a higher achievable approximation ratio is still ongoing. Focus This project will review and analyse (a subset of) existing techniques and discuss their relative merits and limitations for partial and worst-case instances. Method A natural starting point is [K23]. Depending on personal preference and/or familiarity with a particular technique, the project will review the ideas from the subset of references that correspond to the chosen technique(s) and study their limitations in special cases (e.g. on triangle-free graphs). References [AGM20] Anshu, A., Gosset, D., & Morenz, K. (2020). Beyond product state approximations for a quantum analogue of max cut. arXiv preprint arXiv:2003.14394. 
(In proceedings of 15th Conference on the Theory of Quantum Computation, Communication and Cryptography (TQC 2020)) [HNPTW23] Hwang, Y., Neeman, J., Parekh, O., Thompson, K., & Wright, J. (2023). Unique Games hardness of Quantum Max-Cut, and a conjectured vector-valued Borell's inequality. In Proceedings of the 2023 Annual ACM-SIAM Symposium on Discrete Algorithms (SODA) (pp. 1319-1384). Society for Industrial and Applied Mathematics [K23] King, R. (2023). An improved approximation algorithm for quantum max-cut on triangle-free graphs. Quantum, 7, 1180. [WCEHK24] Watts, A. B., Chowdhury, A., Epperly, A., Helton, J. W., & Klep, I. (2024). Relaxations and exact solutions to quantum Max Cut via the algebraic structure of swap operators. Quantum, 8, 1352. [R23] Rao, S. (2023). Analysis of sum-of-squares relaxations for the quantum rotor model. arXiv preprint arXiv:2311.09010. [LP24] Lee, E., & Parekh, O. (2024). An improved Quantum Max Cut approximation via matching. arXiv preprint arXiv:2401.03616. [HTPG24] Huber, F., Thompson, K., Parekh, O., & Gharibian, S. (2024). Second order cone relaxations for quantum Max Cut. arXiv:2411.04120. [KKZ24] Kannan, I., King, R., & Zhou, L. (2024). A Quantum Approximate Optimization Algorithm for Local Hamiltonian Problems. arXiv preprint arXiv:2412.09221.
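The classical hyperplane-rounding step mentioned in the background can be illustrated on a triangle. This is a minimal sketch rather than the Goemans-Williamson algorithm in full: the SDP is not solved, and the three unit vectors at 120 degrees to one another are supplied by hand as the (known) optimal SDP solution for a triangle. The function names are hypothetical.

```python
import math
import random


def hyperplane_rounding(vectors, rng):
    """Assign each vertex to side +1 or -1 by the sign of its inner product
    with a random Gaussian direction (a random hyperplane through the origin)."""
    dim = len(next(iter(vectors.values())))
    r = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    return {v: (1 if sum(x * y for x, y in zip(vec, r)) >= 0 else -1)
            for v, vec in vectors.items()}


def cut_value(edges, sides):
    """Number of edges whose endpoints land on opposite sides."""
    return sum(1 for u, v in edges if sides[u] != sides[v])


# Triangle graph with hand-supplied SDP vectors at angles 0, 120, 240 degrees.
edges = [(0, 1), (1, 2), (0, 2)]
vectors = {k: (math.cos(2 * math.pi * k / 3), math.sin(2 * math.pi * k / 3))
           for k in range(3)}

sides = hyperplane_rounding(vectors, random.Random(0))
print(cut_value(edges, sides))  # prints 2
```

Because the three vectors sum to zero, any hyperplane through the origin separates one of them from the other two, so the rounded cut always has value 2, which is the true maximum cut of a triangle. The SDP relaxation value for these vectors is 3 × (1 + 1/2) / 2 = 2.25, so the rounded solution achieves a ratio of about 0.889 against the relaxation on this instance.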
Adapting Red to the Language Server Protocol	Bernard Sufrin		B	C		The Language Server Protocol is an open, JSON-RPC-based protocol for use between source code editors or integrated development environments (IDEs) and servers that provide “language intelligence tools”: programming language-specific features like code completion, syntax highlighting and marking of warnings and errors, as well as refactoring routines. The goal of the protocol is to allow programming language support to be implemented and distributed independently of any given editor or IDE. In the early 2020s LSP quickly became a norm for providers of language intelligence tools. Red (also known as AppleRed) is a no-frills, unicode-capable, modeless text editor with a simple implementation that can be customized using the redscript language (a Lisp-like notation). Its underlying capabilities can be straightforwardly extended using Scala. It has no pretensions to being an all-encompassing workplace, and unlike many IDEs and modern editors does not confuse its user by spontaneously trying to be helpful: everything it does it does in response to user input from mouse, pad, or keyboard. Sometimes such “helpful” systems can be tricky to use because it’s not clear who or what is in control. The aim of this project is to adapt Red to the LSP, and to use one or two specific Language Servers as test cases. Examples of language servers are: Metals (for Scala), rust-analyzer (for Rust), Haskell Language Server (for Haskell), and texlab (for LaTeX).
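To make the wire format concrete, the sketch below frames and parses a single LSP message. It relies only on the base protocol (a `Content-Length` header, a blank line, then a UTF-8 JSON-RPC 2.0 body); the helper names `encode_message` and `decode_message` are hypothetical, and an adaptation of Red would presumably do the equivalent in Scala rather than Python.

```python
import json


def encode_message(payload):
    """Frame a JSON-RPC payload with the LSP base-protocol header."""
    body = json.dumps(payload, separators=(",", ":")).encode("utf-8")
    header = f"Content-Length: {len(body)}\r\n\r\n".encode("ascii")
    return header + body


def decode_message(data):
    """Split one framed message off the front of `data`.

    Returns (payload, remaining_bytes), or (None, data) if the
    message is not yet complete."""
    head, sep, rest = data.partition(b"\r\n\r\n")
    if not sep:
        return None, data
    length = None
    for line in head.split(b"\r\n"):
        name, _, value = line.partition(b":")
        if name.strip().lower() == b"content-length":
            length = int(value.strip())
    if length is None:
        raise ValueError("missing Content-Length header")
    if len(rest) < length:
        return None, data
    return json.loads(rest[:length].decode("utf-8")), rest[length:]


# The first request an editor sends to a language server.
initialize = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "initialize",
    "params": {"processId": None, "rootUri": None, "capabilities": {}},
}
framed = encode_message(initialize)
```

Returning the unconsumed bytes from `decode_message` lets a caller keep appending data read from the server's output stream and peel off complete messages as they arrive.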
Embedded Handel for Hardware Design	Bernard Sufrin		B	C	MSc	HANDEL (designed in this Department by the hardware compilation group led by Ian Page) was one of the very earliest serious attempts to demonstrate that hardware should be described by a programming language, not as wires and registers. The key ideas of Handel were strong static typing, concurrency as a first-class concept, and a semantics rooted in CSP / occam. For comparison with a somewhat lower-level language in which concurrency is implicit, see [4]. In its prototype implementation it was embedded in a niche programming language: Standard ML. Its successful application to designs that were to be implemented using (then nascent) Field Programmable Gate Array technology [1,2] led to a successor language, Handel-C [3], being built commercially, and adopted as a hardware design tool of choice within the academic community, especially in the United Kingdom. The demise of Handel-C has been authoritatively attributed to the very high cost and proprietary nature of parts of its toolchain. These days FPGAs are available at a variety of scales, for example (high end) AMD/Xilinx "Virtex" Ultrascale with ~9 million logic cells, costing tens of thousands of pounds per chip and (low end) Lattice iCE40UP5K with up to ~5,000 cells with typical board costs of a few tens of pounds. Lattice is well supported by open toolchains. The time is ripe for reviving Handel. The goal of this project is to embed an implementation of Handel in Scala, generating output for synthesis by an open-source toolchain. References: [1] https://www.cs.ox.ac.uk/people/bersufrin/personal/HANDEL/pproc.pdf - PARAMETRISED PROCESSOR GENERATION [2] https://www.cs.ox.ac.uk/people/bernard.sufrin/personal/HANDEL/wotug94v - Automatic Design and Implementation of Microprocessors [3] https://en.wikipedia.org/wiki/Handel-C - HANDEL-C [4] https://www.cs.ox.ac.uk/people/bernard.sufrin/personal/HANDEL/Operation-centric_hardware_description_and_synthesis.pdf
Modeless Structure Editing	Bernard Sufrin		B	C		Oege de Moor and I wrote a paper called Modeless Structure Editing for Tony Hoare’s retirement symposium. What we had in mind was a family of modeless editors whose underlying model of the document being edited is akin to its abstract syntax tree. It’s clear what this means if the document represents, for example, a program in a programming language or a proof in a proof calculus; but we conjecture that structured prose documents will also be amenable to this approach to a useful extent. Editing an abstract syntax tree model does not mean that one must necessarily work with a “tree-shaped” view of the document. The benefits of editing an abstract syntax tree model include: • The ability to perform semantic checks incrementally and rapidly. Of course semantic checks are realized differently for different formalisms. • The potential to interact with varieties of views of the document, and the potential to switch views rapidly. • The potential to mix formalisms within a single document, and to derive a variety of artefacts from it. Imagine, for example, writing lectures about programs in a programming language: one wants (parts of) the program source code to appear in the lecture; and one wants the entire source code of the program to be derived from their “reference text” in the document. Such an approach has the potential to yield dividends in performance and simplicity of implementation of incremental semantic checks, code generation, etc. The first phase of this project will construct (in Scala) a prototype structure editing toolkit implementing the model explained in this paper. Subsequent phases will provide one or two case studies in its use. The architecture of an editor that could use such a document model already exists – in the shape of our AppleRed editor, whose document model is (in principle) “pluggable”, and that currently uses a linear model (document as a sequence of lines of text). 
What will be needed is to construct a new “plugin” conforming to the AppleRed document model interface, and to adapt and generalise that interface where necessary.
Programming Language Implementation	Bernard Sufrin		B	C	MSc	PicoML PicoML is a polymorphically-typed lazy functional language with I/O actions (cf the I/O Monad in Haskell). It has a very flexible concrete syntax, and a very straightforward polymorphic type system. Its only implementation - a prototype - is written in OCaml as a direct, closure-based, abstract-syntax tree interpreter. PicoML GitHub Repository We would like to see a more efficient implementation, as well as one that would make it straightforward to incorporate new features for engaging with external entities (graphics, etc). The present project would reimplement the language in one or both of the following ways, each of which would first require a reimplementation of parser and type checker – possibly, but not necessarily, in Scala. • As a closure-based, abstract-syntax tree interpreter using continuations. • As a (Scala) compiler translating to code for an abstract machine - possibly the G-Machine of Lennart Augustsson and Thomas Johnsson - thence to executable code for a conventional architecture. The outcome of the project should be both useable and scrutable: students should be able to read the code of a realistic language implementation. Mini Haskell What? We think that the PicoML language implementation described above would be a very good basis for a new implementation of a simplified variant of Haskell, suitable for introducing that language in such a way that the need for type classes emerges organically from programming practice. The project would aim to deliver an implementation suitable for use by Haskell novices that would place emphasis on good error diagnostics: both type errors and runtime errors. Availability of such an implementation could make the teaching of functional programming in the secondary school CS curriculum and for home study a practical proposition: thereby liberating at least some pupils from the conceptual straitjackets of Python, C++, etc. Why? 
An important difficulty students have with Haskell and its type system is the (important) notion of type classes: whilst very useful to more advanced Haskell users, they can pose a barrier to learning because what might (in their absence) be a type-error that is straightforward to report and to diagnose, gets reported in terms of instances and classes. For example: ghci> max xs <interactive>:2:1: error: • No instance for (Show ([Integer] -> [Integer])) arising from a use of ‘print’ (maybe you haven't applied a function to enough arguments?) • In a stmt of an interactive GHCi command: print it ghci> max <interactive>:1:1: error: • No instance for (Show (() -> () -> ())) arising from a use of ‘print’ (maybe you haven't applied a function to enough arguments?) • In a stmt of an interactive GHCi command: print it Of course wise neophytes take their tutors’ advice, and explore the type of the function they thought they were using correctly: ghci> :t max max :: Ord a => a -> a -> a and may then realize that what they were after is ghci> :t maximum maximum :: (Foldable t, Ord a) => t a -> a But what’s a Foldable? The neophyte has to understand that it’s a constructor class! And the most succinct explanation of that that can be found is “A constructor class is declared with a type parameter of kind * -> *, * -> * -> *. In other words: a constructor class describes behaviour of type constructors, not of ground types.” Thus in order to understand an inscrutable diagnosis of a simple mistake made in writing in a simple language, our neophyte must take on board explanations of advanced ideas (type class, kind, etc) that can only be understood once the simpler language has been mastered.
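The first reimplementation route for PicoML (a closure-based abstract-syntax tree interpreter using continuations) can be illustrated on a toy language. This is only a sketch of the technique: it is strict rather than lazy, it has no parser, type checker, or I/O actions, and it is written in Python rather than Scala. The AST classes and `eval_cps` are hypothetical names, not part of PicoML.

```python
from dataclasses import dataclass


@dataclass
class Num:
    value: int


@dataclass
class Var:
    name: str


@dataclass
class Lam:
    param: str
    body: object


@dataclass
class App:
    fn: object
    arg: object


@dataclass
class Add:
    left: object
    right: object


@dataclass
class Closure:
    """A function value: its code plus the environment it was defined in."""
    param: str
    body: object
    env: dict


def eval_cps(expr, env, k):
    """Evaluate `expr` in `env`, passing the result to continuation `k`."""
    if isinstance(expr, Num):
        return k(expr.value)
    if isinstance(expr, Var):
        return k(env[expr.name])
    if isinstance(expr, Lam):
        return k(Closure(expr.param, expr.body, env))
    if isinstance(expr, Add):
        return eval_cps(
            expr.left, env,
            lambda l: eval_cps(expr.right, env, lambda r: k(l + r)))
    if isinstance(expr, App):
        def apply(f):
            # Evaluate the argument, then run the body in the closure's
            # environment extended with the parameter binding.
            return eval_cps(
                expr.arg, env,
                lambda a: eval_cps(f.body, {**f.env, f.param: a}, k))
        return eval_cps(expr.fn, env, apply)
    raise TypeError(f"unknown expression: {expr!r}")


# (\x. \y. x + y) 2 3
program = App(App(Lam("x", Lam("y", Add(Var("x"), Var("y")))), Num(2)), Num(3))
result = eval_cps(program, {}, lambda v: v)
```

Because every step hands its result to an explicit continuation, features such as exceptions or I/O sequencing can later be added by choosing which continuation to invoke, without restructuring the evaluator.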
Proof Support for a Haskell-like language	Bernard Sufrin		B	C	MSc	What? The purpose of this project would be to implement an interactive system to support the discovery (and pretty presentation) of equational proofs by a combination of “automatic choice” and human guidance. Richard Bird, acknowledging Mike Spivey’s unpublished work, sketched an implementation of the simplest form of such a “proof assistant” in Haskell in his most excellent book: Thinking Functionally with Haskell. The project would start by implementing a similar assistant with a convenient interactive workflow for discovering, recording, and checking proofs. The tool should also support the use of appropriate induction rules derived automatically from data declarations. An interesting challenge will be the provision of convenient methods for providing human guidance for the application of algebraic laws (such as associativity, commutativity, distributivity) at appropriate points in a proof. Although now somewhat long in the tooth, the Jape generic proof editor still provides (in its functional programming example theory) a working concrete example of the sort of guidance that could be used here, and a user interface by which such guidance can be communicated. Our slogan while building Jape (and many of its example theories) was “Proof by Pointing”. The Jape GitHub repository is here. Equational Proofs Many Haskell proofs take the form of straightforward equational rewriting using definitions or derived laws left-to-right as rules. Typically the discovery of a proof of lhs = rhs will take the form lhs = { rule name } lhs1 = { rule name } lhs2 ... lhsn = { rule name } crux followed by rhs = { rule name } rhs1 = { rule name } rhs2 ... rhsm = { rule name } crux where crux is the common term to which both sides can be reduced. This style of presentation relies implicitly on the transitivity of equality. 
Aside: it is annoying that some authors obfuscate these sensible discovery steps by a linearised presentation “as if” the post-crux proof had been discovered by using rules “right to left”: ... crux = { rule name } rhsm = { rule name } ... rhs2 = { rule name } rhs1 = { rule name } rhs In our experience such presentations can be pedagogically harmful. Sometimes the choice of rule at each stage is almost automatic: “pretend you are the Haskell interpreter”. At other times firm human guidance (and/or the invention of lemmas) is needed. For example, a proof of (rev * rev) . swap . (rev * rev) . swap = ... = id in the presence of laws/definitions rev . rev = id, swap (y, z) = (z, y), (f * g) (y, z) = (f y, g z) can use the lemmas swap . (f * g) . swap = g * f and (f * g) . (h * j) = (f . h) * (g . j), as well as timely invocation of the associativity of composition.
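The example proof can be written out step by step in the calculational style this entry describes. This is a worked sketch, not taken from the source: it writes × for * and ∘ for composition, and it reads the two lemmas as swap ∘ (f × g) ∘ swap = g × f and (f × g) ∘ (h × j) = (f ∘ h) × (g ∘ j), which is an assumption about the intended notation.

```latex
\begin{align*}
    & (\mathit{rev} \times \mathit{rev}) \circ \mathit{swap} \circ (\mathit{rev} \times \mathit{rev}) \circ \mathit{swap} \\
={} & \quad \{\ \text{associativity of } \circ;\ \text{lemma } \mathit{swap} \circ (f \times g) \circ \mathit{swap} = g \times f \text{ with } f = g = \mathit{rev}\ \} \\
    & (\mathit{rev} \times \mathit{rev}) \circ (\mathit{rev} \times \mathit{rev}) \\
={} & \quad \{\ \text{lemma } (f \times g) \circ (h \times j) = (f \circ h) \times (g \circ j)\ \} \\
    & (\mathit{rev} \circ \mathit{rev}) \times (\mathit{rev} \circ \mathit{rev}) \\
={} & \quad \{\ \mathit{rev} \circ \mathit{rev} = \mathit{id}\ \} \\
    & \mathit{id} \times \mathit{id} \\
={} & \quad \{\ \text{definition of } \times:\ (\mathit{id} \times \mathit{id})\,(y, z) = (\mathit{id}\ y,\ \mathit{id}\ z) = (y, z)\ \} \\
    & \mathit{id}
\end{align*}
```

The first step is the one that needs human guidance: composition has to be reassociated so that swap ∘ (rev × rev) ∘ swap appears as a single subterm before the lemma applies. The remaining steps are the "almost automatic" kind.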
Proof by Pointing: Lean meets Jape	Bernard Sufrin		B	C	MSc	What? For many years the generic proof-editor Jape has been providing a "proof by pointing" graphical user interface to support the human-directed discovery of proofs and derivations in a logic defined as a system of inference rules. The goal of this project is not to reinvent Jape, but to give it a modern semantic substrate, using Lean4: the most recent implementation of a very widely accepted programming language cum proof assistant (see https://lean-lang.org/). The first substantive piece of work will be to come up with a proof-of-concept implementation of proof-by-pointing in the Jape style for one, perhaps two, straightforward logics. I favour (but don't wish to be dogmatic about) a single-conclusion-sequent presentation of a first-order logic, for example, the logic used in Introduction to Formal Proof. I also favour (but don't wish to be dogmatic about) using Scala as the language in which to program the front-end.^[1] Why? Jape was ahead of its time in several respects: • Proof by pointing rather than scripting. • Explicit proof-derivation objects evolving (interactively) step-by-step. • Structural views of proof derivations: Fitch-boxed or as inference trees, not just as linear text. • A focus on pedagogy through human interaction, not just the correctness of derivations. At the time Richard Bornat and I designed it, Jape was revolutionary in the way it provided straightforward user-directed construction of proofs, as well as supporting a straightforward "logician oriented" metalanguage for defining inference systems. 
There are few constraints on the kinds of "logic" with which Jape can cope: our goal was always to make it possible for someone who isn’t expert in Jape to construct a Jape inference system capable of capturing their logic, proof system, operational semantics, or what-have-you more or less directly from its presentation in a textbook or paper; then for others to construct derivations in the system interactively. Our motivation, and the principles behind Jape's end-user interfaces, were eventually articulated in a couple of papers delivered at the "User Interfaces for Theorem Provers" conference series that it had catalysed: User Interfaces for Generic Proof Assistants Part I: Interpreting Gestures, and User Interfaces for Generic Proof Assistants Part II: Displaying Proofs and in the (shorter) Formal Aspects of Computing Paper: A Minimal Graphical User Interface for the Jape Proof Calculator. These and others that may be of interest are still readable in the directory https://www.cs.ox.ac.uk/people/bernard.sufrin/jape.org.uk/DOCUMENTS/CURRENT/. When Jape was born, and for very many years afterwards, nearly all of the work in user interfaces for proof tools addressed the problem(s) of attaching a user interface to a powerful pre-existing proof assistant/theorem prover. Our experience of teaching people to conduct proofs (and other derivations) led us to reject this approach. Jape’s semantic substrate was defined by a carefully constructed inference engine; but at the time of its design we were uninterested in foundational aspects of that engine. The advent of Lean4 has given fresh impetus to the idea of re-embodying the UI principles of Jape in a new logical setting. Note Subsequent projects ought to be able to take this proof of concept in the direction of other logics (Modal logics, Program logics, Separation logic, etc) and systems formalisable as inference systems. 
The research challenges there include finding straightforward ways for the "interaction designer" to describe correspondences between user-gestures and the inference rules / tactics to which they should be bound.^[2] ^[1] I developed the Glyph User Interface Toolkit (github.com/sufrin/Glyph) for purposes such as this. ^[2] The Jape tactic language had binding constructs to facilitate this; but its rebarbative syntax and ad-hoc semantics meant that Bornat and Sufrin were almost the only people who could use it.
A Dataflow Compiler and Simulator for Heterogeneous Optical-Digital Architectures	Philip Torr			C	MSc	Abstract Lumai is developing a 3D optical AI accelerator capable of executing matrix–vector operations with significantly higher energy efficiency than conventional digital hardware such as Nvidia GPUs. Instead of relying on digital parallelism, computation is performed through optical dataflow, enabling extremely high throughput at low energy cost. However, this architecture imposes two critical constraints that current AI models (like Vision Transformers) are not natively designed for: Streaming Computation Data flows continuously through the processor, with limited random access to past activations or memory. Architectures such as State-Space Models (e.g., Mamba) are therefore better aligned than architectures relying on full attention and key–value caches, such as Transformers. Extreme Low Precision (Int4) + Analog Noise To maximize optical efficiency, weights are represented in 4-bit integer format, and computations incur analog noise and non-ideal signal propagation. Many current deep learning models assume FP32/BF16 precision and can degrade significantly under such constraints. We are seeking Master students to explore the algorithmic, software, and hardware co-design challenges associated with this architecture. Given the novelty of running state-space models on optical hardware, these projects have strong potential to lead to publishable research. Project 2: The Systems Co-Design Path Title: A Dataflow Compiler and Simulator for Heterogeneous Optical-Digital Architectures Our accelerator is heterogeneous: it pairs a high-speed Optical Core (for Matrix-Vector Multiplication) with a standard Host CPU or DSP (for non-linearities and control). We need a compiler that automatically partitions a trained PyTorch model to map the right operations to the right processor. 
Core Objectives: Graph Capture: Develop a tool to trace the computational graph of a VideoMamba model. Automated Partitioning: Create a compiler pass that tags operations based on the hardware strengths: Optical Core: Static Matrix Multiplications (Int4, High Throughput). Host CPU/DSP: Element-wise operations, Activations, and complex State Updates (High Precision, Low Throughput). State Memory Management: Design a "State Buffer" strategy to handle the passing of the hidden state between the Optical Unit and the Host CPU/DSP across timesteps, minimizing data transfer penalties. Deliverable: A Compiler Prototype and a Performance Visualizer that estimates throughput (FPS/Watt) for Video Mamba on Lumai hardware compared to Nvidia A100/H100 baselines. Why Apply? Publication Goal: The intersection of Optical Computing, Mamba/SSMs, and Low-bit Quantization is a "hot topic" in current research. We strongly encourage and will support the submission of this work to top-tier venues such as MLSys, NeurIPS, or ICLR. Impact: Your work will directly influence the architecture of a new class of AI hardware. Mentorship: Direct collaboration with the Lumai engineering team.
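The automated partitioning objective above can be illustrated with a small sketch. This is not Lumai's compiler: the `Op` record, the op names, and the rule that only matrix multiplications with static weights go to the optical core are assumptions read off the description. A real prototype would presumably obtain the graph by tracing the PyTorch model rather than listing ops by hand.

```python
from dataclasses import dataclass

# Hypothetical op record; a real pass would read nodes from a traced graph.
@dataclass
class Op:
    name: str
    kind: str                    # e.g. "matmul", "silu", "add", "state_update"
    static_weight: bool = False  # True when one operand is a fixed, trained weight

def assign_device(op: Op) -> str:
    """Tag an op with the unit it should run on (assumed rule set)."""
    if op.kind == "matmul" and op.static_weight:
        return "optical"  # static Int4 matrix multiplication
    return "host"         # element-wise ops, activations, state updates

def partition(ops):
    """Return (name, device) pairs and the number of optical/host crossings."""
    tagged = [(op.name, assign_device(op)) for op in ops]
    crossings = sum(1 for a, b in zip(tagged, tagged[1:]) if a[1] != b[1])
    return tagged, crossings

# Toy linear chain loosely shaped like one SSM block (illustrative only).
block = [
    Op("in_proj", "matmul", static_weight=True),
    Op("act", "silu"),
    Op("state_update", "state_update"),
    Op("out_proj", "matmul", static_weight=True),
    Op("residual", "add"),
]
tagged, crossings = partition(block)
for name, device in tagged:
    print(f"{name:14s} -> {device}")
print("device crossings:", crossings)
```

Counting device crossings gives a crude proxy for the data-transfer penalty that the "State Buffer" strategy is meant to minimise.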
An AI Co-Scholar for Economic History	Philip Torr		B	C	MSc	Title: An AI Co-Scholar for Economic History Vision Progress in economic history depends on researchers formulating novel hypotheses, constructing datasets from primary sources, and embedding statistical findings within the historical context. The field’s central bottleneck is that datasets remain extremely scarce: information extraction from primary sources is still performed manually due to their heterogeneity, which severely limits the scale and scope of empirical analysis. We propose to develop a multi-agent reasoning system that accelerates each stage of the empirical research pipeline in economic history. The system will include: (1) a Hypothesis and Literature Agent that surveys existing research, identifies theoretical mechanisms and existing datasets, as well as generates new hypotheses; (2) a Primary Source Discovery Agent that locates and retrieves image scans from public digital archives, including associated copyright information; (3) a Dataset Construction Agent that performs information extraction on heterogeneous primary sources (Gothic, Antiqua, handwriting, complex layouts), whether provided by the economic historian from their own scanned materials or retrieved from public digital archives; (4) a Cleaning and Linking Agent that standardizes extracted data and links individuals, firms, and locations across the constructed datasets; and (5) an Analysis Agent that evaluates hypotheses using econometric methods (difference-in-differences, instrumental variables, and other causal identification strategies). All agents must operate with transparent, traceable decision logs and the system should remain steerable for economic historians through a human-in-the-loop interface. This system also provides an environment for AI safety and mechanistic interpretability work on historical reasoning. 
In the long term, the goal is to build agent-based simulations to study historical economic systems through the lens of complexity economics, using the collected archival image scans as their empirical foundation. Proposals 1. Hypothesis and Literature Agent for Economic History The Hypothesis and Literature Agent will read the relevant literature, locate publicly available datasets, and identify the mechanisms proposed by economic historians. It will synthesize this information to generate new hypotheses and rank them by their feasibility. The agent will also clearly distinguish between hypotheses that can be examined with existing datasets after minimal linking or restructuring and those that require constructing new datasets from primary sources because key variables are missing. Related Work from different domains: Towards an AI Co-Scientist, https://arxiv.org/abs/2502.18864 2. Primary Source Discovery Agent for Archival Image Scans Governments and private companies worldwide have invested substantial resources in scanning and digitizing historical sources, and many of these materials are now publicly accessible online. Yet they remain scattered across thousands of archive, museum, and company websites. Agentic methods make it possible to automate the retrieval of these archival image scans and assemble them into a unified database. Such a database must contain not only the images but also all relevant copyright information and metadata. Developing an agentic web scraper for historical sources would create a large, centralized corpus of primary source images. This would provide the foundation for large-scale empirical research, since multimodal large language models can read, transcribe, and extract structured information from these scans to generate datasets suitable for statistical analysis. One example of a publicly accessible digital archive: University of Mannheim, https://digi.bib.uni-mannheim.de 3. 
Dataset Construction Agent for Archival Image Scans This agent builds on our experience using multimodal LLMs to process archival image scans, including double-column patent records from Imperial Germany (forthcoming) and eighteenth- and nineteenth-century German city directories. These sources vary widely in layout, structure, and font, and our work demonstrates that current models perform reliably when pages are processed independently but remain unsuitable for full-volume processing. Although frontier models allow extremely large input context windows, often in the millions of tokens, their maximum output window remains far smaller: for example, Gemini-2.5-Pro is limited to 65,536 output tokens. This restricts the length of transcriptions or extracted fields that can be returned in a single model call, alongside the already well-documented degradation in model performance as context length increases. Developing this agent is therefore a prerequisite for converting large archival PDFs into structured datasets suitable for statistical analysis. Relevant literature: Multimodal LLMs for OCR, OCR Post-Correction, and Named Entity Recognition in Historical Documents, https://arxiv.org/abs/2504.00414 4. LLM-based Dataset Linking Agent for Historical Microdata Any historical source provides only a single snapshot. To study human lives in the past, we must link many such snapshots across sources and across time. Economic historians have made progress by linking individuals in historical censuses, but existing methods rely on rules-based algorithms with limited accuracy and scale. Our aim is to develop an LLM-based system for record linkage that can trace individuals both across different years of the same source and across entirely different categories of sources. The same person may appear in birth registers, marriage records, city directories, newspapers, probate files and other documents throughout their lifetime. 
We will build on our city-directory work, which already covers thousands of German directories from the eighteenth and nineteenth centuries, and will use location, occupation, kinship terms and other contextual cues to anchor identities. Our objective is to establish a state-of-the-art approach to LLM-based dataset linking. A successful solution would allow researchers to reconstruct the lives of ordinary people as coherent life histories rather than isolated fragments, thereby opening new possibilities for historical microdata and quantitative economic history. Current state-of-the-art approach in economic history: Automated Linking of Historical Data, https://www.aeaweb.org/articles?id=10.1257/jel.20201599 5. Historical Georeferencing with multimodal LLMs Historical city directories contain rich address information for individuals, firms and institutions, yet the physical structure of cities has changed dramatically over the past centuries. Researchers who wish to study urban development, spatial inequality or neighborhood dynamics must therefore spend years manually georeferencing these addresses. We propose to develop a system that uses multimodal LLMs, historical maps and modern coordinate data to automate this process. Combined with the large corpus of German and European city directories we have already collected, this approach would make it possible to reconstruct historical urban environments at scale, and ultimately, to analyze long-run urban change with far greater precision and speed. 6. Mechanistic Interpretability of Multimodal LLMs on Historical Data: Images vs Text A central open question in AI for historical research is whether large language models reason in the same way when they receive historical information as raw text or as images containing the same text. Public archives hold vast numbers of image scans. Many of these can also be processed through OCR before being fed into an LLM. 
This raises a straightforward but important question: do models extract and interpret information differently depending on whether the input is an image or OCR text? We aim to study how multimodal LLMs internally represent historical scripts such as Gothic and Antiqua. We will then compare these representations to those obtained when the identical content is provided as OCR text. The objective is to map differences in the model’s activation space, to see whether specialized pathways emerge for particular scripts and data formats. We will also investigate whether the model develops something resembling “OCR attention heads” for historical writing systems. Relevant literature: Interpreting Attention Heads for Image-to-Text Information Flow in Large Vision-Language Models, https://arxiv.org/abs/2509.17588 How Do Large Vision-Language Models See Text in Image? Unveiling the Distinctive Role of OCR Heads, https://arxiv.org/abs/2505.15865
Causal and Interpretable AI for Contemporary Art Market Analysis	Philip Torr		B	C	MSc	Background Reading: https://www.engineegroup.com/tcsit/article/view/TCSIT-7-148 https://proceedings.neurips.cc/paper_files/paper/2024/hash/38cc5cba8e513547b96bc326e25610dc-Abstract-Datasets_and_Benchmarks_Track.html https://openreview.net/pdf?id=EO8mTLqDuT https://openreview.net/pdf?id=EDWTHMVOCj https://www.amazon.co.uk/Causality-Judea-Pearl/dp/052189560X This project explores how machine learning and causal inference can enhance transparency and understanding in the contemporary art market. Building on an existing prototype that analyses pricing, provenance, and institutional representation, the project will expand multimodal datasets and refine predictive algorithms to distinguish genuine causal drivers from correlations in art valuation and reputation. By integrating causal discovery methods, explainable AI, and uncertainty estimation, the research aims to produce models that not only predict but also explain how factors such as museum representation, collector networks, and visual characteristics influence market behaviour. The project offers a unique opportunity to contribute to interdisciplinary work at the intersection of AI, econometrics, and the humanities, advancing the interpretability and reliability of algorithmic systems for cultural and financial decision-making.
Evaluating Vulnerabilities of Coding Agents	Philip Torr, Adel Bibi, James Oldfield				MSc	Evaluating Vulnerabilities of Coding Agents Giving LLM agents the ability to read, write, and execute scripts is a major step towards automating coding. However, granting LLMs the ability to interact with a user filesystem exposes a range of new security vulnerabilities. This project aims to explore the attack surface of LLM coding agents, identifying where and how vulnerabilities arise (e.g., through malicious documentation or unit tests), and to propose defense mechanisms that better mitigate the risks. This project will likely be joint with collaborators from Softserve as an industry partner.
Evaluating and Benchmarking Activation Monitors for Large Language Models	Philip Torr, Adel Bibi, James Oldfield				MSc	Evaluating and Benchmarking Activation Monitors for Large Language Models One effective way of preventing harmful outputs in large language models (LLMs) is to monitor the intermediate activations produced by the network during its forward pass. Methodologies for catching problematic behavior often rely on the simple “linear probe” (Alain et al. 2016), with more sophisticated multi-stage monitors using probes as the first line of defense (Cunningham et al. 2025, McKenzie et al. 2025). Given widespread reliance on probes, understanding their potential vulnerabilities and limitations is vital for security and safety from potential attacks or failure modes. This project seeks to thoroughly evaluate the potential failure modes and attack surfaces of probes for activation monitoring (Bailey et al. 2024), and possibly build better datasets and standardized evaluation suites.
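As a point of reference for the "linear probe" mentioned above, here is a minimal sketch of one: a logistic-regression classifier fitted to activation vectors. The activations are synthetic stand-ins (4-dimensional Gaussian samples whose first coordinate separates the two classes), and the function names are hypothetical; a real monitor would read activations from a chosen layer of an LLM.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_linear_probe(xs, ys, lr=0.5, epochs=200):
    """Fit logistic regression (weights w, bias b) by batch gradient descent."""
    dim, n = len(xs[0]), len(xs)
    w, b = [0.0] * dim, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * dim, 0.0
        for x, y in zip(xs, ys):
            err = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) - y
            for i in range(dim):
                grad_w[i] += err * x[i]
            grad_b += err
        w = [wi - lr * g / n for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b

def probe_score(w, b, x):
    """Probability the probe assigns to the 'harmful' class."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)

# Synthetic "activations": an assumption for illustration, not real model data.
rng = random.Random(0)
def fake_activation(label):
    centre = 2.0 if label else -2.0
    return [rng.gauss(centre, 0.3)] + [rng.gauss(0.0, 0.3) for _ in range(3)]

ys = [i % 2 for i in range(40)]
xs = [fake_activation(y) for y in ys]
w, b = train_linear_probe(xs, ys)
accuracy = sum((probe_score(w, b, x) > 0.5) == bool(y) for x, y in zip(xs, ys)) / len(xs)
print("training accuracy:", accuracy)
```

Because the decision depends on a single learned direction, an adversary who can shift activations along or away from that direction may evade the probe, which is one kind of failure mode the project proposes to evaluate.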
Exploration vs exploitation in AI “scientist” systems; driving novel hypothesis generation	Philip Torr			C	MSc	Exploration vs exploitation in AI “scientist” systems; driving novel hypothesis generation AI “scientist” systems can reason over evidence and propose new research ideas, but it remains unclear how to systematically balance exploration (breadth: searching widely across new hypotheses) versus exploitation (depth: iteratively refining and validating promising directions) [1]. This matters because the hypothesis space is enormous: breadth-first ideation becomes shallow and repetitive, while depth-first reasoning risks getting stuck in local optima. This project will formalise hypothesis generation as a sequential decision problem, where the agent must allocate limited steps/queries/attention across a vast hypothesis space. The student will implement and compare strategies for navigating hypothesis space, for example, multi-armed bandits [2], tree-search planners [3], structured reasoning approaches for LLM agents [4], and world-model style approaches [5]. Using historical scientific papers as a proxy environment, the student will evaluate methods on metrics such as novelty, diversity, and plausibility. An additional focus will be on whether LLM “hallucinations” can be leveraged for exploration when they are induced in a controlled way and then filtered/grounded via plausibility checks [6]. The student will test novelty-driven generation via changes to the objective, reward [7] or the “environment” (e.g., masking parts of the reference corpus, altering constraints, or steering sampling), then study how these interventions shift the distribution of proposed ideas. The goal is to characterise when “novelty-driven hallucinations” help an AI scientist systematically sample under-explored regions of a design space [8]. 
Objectives Formalise hypothesis generation as a sequential decision process and define measurable objectives for novelty, plausibility, and diversity. Implement & compare breadth/depth control strategies using historical papers or trial protocols as a proxy environment [1, 2, 3]. Design and evaluate controlled hallucination/contradiction mechanisms (novelty steering + grounding filters) and quantify when they improve discovery metrics and downstream usefulness [5, 6]. Interested students will contribute to multiple aspects of this project, from designing methods to developing the evaluation frameworks. This work will provide hands-on experience at the intersection of AI and life sciences and meaningfully contribute to AI scientists’ efforts within the Torr lab. References [1] Lu et al. (2024). The AI Scientist: Towards Fully Automated Open-Ended Scientific Discovery. [2] Lattimore & Szepesvári (2020). Bandit Algorithms. [3] Zhou et al. (2023/2024). Language Agent Tree Search (LATS) Unifies Reasoning, Acting, and Planning in Language Models. [4] Yao et al. (2023). Tree of Thoughts: Deliberate Problem Solving with Large Language Models. [5] Hafner et al. (2025). Mastering diverse control tasks through world models. [6] Huang et al. (2025). A Survey on Hallucination in Large Language Models. [7] Burda et al. (2018). Exploration by Random Network Distillation. [8] Lehman & Stanley (2011). Evolution through the Search for Novelty Alone.
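One of the strategies named above, the multi-armed bandit, can be sketched with the standard UCB1 rule. Treating each "arm" as a family of hypotheses and each reward as a 0/1 plausibility judgement is an assumption made for illustration, and the success probabilities below are invented rather than measured.

```python
import math
import random

def ucb1(success_probs, steps, rng):
    """Run UCB1 on Bernoulli arms; return how often each arm was pulled."""
    k = len(success_probs)
    counts = [0] * k
    totals = [0.0] * k
    for t in range(1, steps + 1):
        if t <= k:
            arm = t - 1  # pull every arm once to initialise its estimate
        else:
            # Mean reward (exploitation) plus a bonus that shrinks as an arm
            # is pulled more often (exploration).
            arm = max(
                range(k),
                key=lambda i: totals[i] / counts[i]
                + math.sqrt(2.0 * math.log(t) / counts[i]),
            )
        reward = 1.0 if rng.random() < success_probs[arm] else 0.0
        counts[arm] += 1
        totals[arm] += reward
    return counts

# Hypothetical hypothesis families with made-up plausibility rates.
families = [0.2, 0.5, 0.8]
counts = ucb1(families, steps=5000, rng=random.Random(0))
print("pulls per family:", counts)
```

The exploration bonus keeps under-sampled families in play, while the running mean concentrates effort on the family that has paid off so far; tree-search and world-model approaches generalise this trade-off to structured hypothesis spaces.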
Exploring Uncertainty in Focus Instruction Tuning: The Role of Feature Specification in Model Confidence and Uncertainty	Philip Torr			C	MSc	Existing uncertainty quantification (UQ) methods, such as MC Dropout [1], Deep Ensembles[2], predictive entropy, and semantic entropy [3] estimate model confidence and uncertainty but do not explicitly investigate how focusing on or ignoring specific features influences uncertainty. In many cases, models may express high certainty when relying on spurious correlations, but when instructed to ignore these features, their uncertainty may increase significantly. Understanding how uncertainty shifts when the model is forced to focus on different aspects of the input is crucial for evaluating robustness and generalization under distribution shifts. Focus Instruction Tuning (FIT) [4] provides a natural framework for this investigation by explicitly controlling which features the model attends to, making it possible to analyze how uncertainty behaves under different feature specifications. This project has two key aims. First, we seek to benchmark standard UQ measures within the FIT framework, examining how a model’s elicited uncertainty changes when instructed to focus on or disregard specific features. This is particularly relevant in cases where models learn spurious correlations, as they may appear confident when attending to irrelevant features but exhibit increased uncertainty when forced to rely on causal signals. The second aim is to develop a Bayesian-style UQ methodology within FIT, incorporating feature priors and posterior inference to quantify uncertainty over specific feature attributions. This approach would allow for a more interpretable and structured form of UQ, where uncertainty is explicitly linked to individual features, providing deeper insights into model confidence and failure modes. 
By disentangling uncertainty across features, this method could offer a more robust alternative for evaluating and improving uncertainty quantification under distribution shifts and biased correlations. [1] Gal, Yarin, and Zoubin Ghahramani. "A theoretically grounded application of dropout in recurrent neural networks." Advances in neural information processing systems 29 (2016). [2] Lakshminarayanan, Balaji, Alexander Pritzel, and Charles Blundell. "Simple and scalable predictive uncertainty estimation using deep ensembles." Advances in neural information processing systems 30 (2017). [3] Kuhn, Lorenz, Yarin Gal, and Sebastian Farquhar. "Semantic Uncertainty: Linguistic Invariances for Uncertainty Estimation in Natural Language Generation." The Eleventh International Conference on Learning Representations. [4] Lamb, Tom A., et al. "Focus On This, Not That! Steering LLMs With Adaptive Feature Specification." arXiv preprint arXiv:2410.22944 (2024).
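To make the benchmarking aim concrete, the sketch below computes two standard quantities from a set of ensemble (or MC Dropout) predictions: the predictive entropy of the averaged distribution, and the mutual information between prediction and ensemble member, which captures disagreement. The probability vectors and the two prompt conditions are invented for illustration; they are not results from FIT.

```python
import math

def entropy(p):
    """Shannon entropy in nats, skipping zero-probability classes."""
    return -sum(pi * math.log(pi) for pi in p if pi > 0)

def ensemble_uncertainty(member_probs):
    """Return (predictive entropy, mutual information) for one input."""
    n = len(member_probs)
    k = len(member_probs[0])
    mean = [sum(m[j] for m in member_probs) / n for j in range(k)]
    total = entropy(mean)                                 # total uncertainty
    expected = sum(entropy(m) for m in member_probs) / n  # average member entropy
    return total, total - expected                        # second term: disagreement

# Hypothetical class probabilities from two ensemble members under two prompts.
focus_on_spurious = [[0.9, 0.1], [0.9, 0.1]]  # members agree and are confident
ignore_spurious = [[1.0, 0.0], [0.0, 1.0]]    # members disagree completely

h_focus, mi_focus = ensemble_uncertainty(focus_on_spurious)
h_ignore, mi_ignore = ensemble_uncertainty(ignore_spurious)
print(h_focus, mi_focus)
print(h_ignore, mi_ignore)
```

Comparing these quantities across focus instructions for the same input is one simple way to measure how uncertainty shifts with the feature specification.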
Monitoring LLM Agents’ Tool Use	Philip Torr, Adel Bibi, James Oldfield				MSc	Monitoring LLM Agents’ Tool Use Modern LLM agents invoke tools and act on external information (search, code, files), creating new safety risks: malicious instructions may be hidden in tool outputs, and retrieved files may contain harmful instructions or hijacks (Aichberger et al, 2025). Given LLM agents’ ability to act autonomously in the real world, it is crucial that we develop more sophisticated techniques to address the unique challenges and emerging risks. This project aims to design lightweight monitors of LLM agents’ tool usage that can be run externally, catching and preventing the injection of harmful instructions and hijacks. This project will likely be joint with collaborators from Microsoft as an industry partner.
Multi-Modal Partially Labelled Stream	Philip Torr	Artificial Intelligence and Machine Learning			MSc	Data on large systems is often streamed and multi-modal, e.g., text, images, video, and/or sound. All this data accumulates while jointly changing in distribution. Moreover, much of the data presented from the stream is only partially labelled. We seek to study the problem of training models on partially labelled streams in a multi-modal setting. In particular, we seek to find new effective algorithms for performing joint self-supervised continual learning on the unlabelled data while learning in a supervised fashion from the labelled portion of the stream.
Navigating the Genetic Perturbation Landscape: Multi-modal causal representation learning for target discovery	Philip Torr, Jonathan Hedley	Artificial Intelligence and Machine Learning			MSc	Cardiometabolic disorders remain the leading cause of mortality globally [1,2]. Addressing this major public health issue necessitates identifying effective pharmacological interventions, which requires a detailed understanding of the complex aetiology of these disorders. Cardiometabolic diseases are driven by an intricate interplay of genetic and environmental factors that impact the functionality of diverse cell types across the human body [3]. To tackle this complexity, new drug discovery approaches are essential to navigate the vast combinatorial landscape of potential pharmacological interventions and cellular phenotypes. This project aims to develop an innovative predictive model for cellular response to genetic perturbations, a key step towards discovering drug targets for cardiometabolic disorders. By focusing on how cells react to genetic modifications (e.g., gene knockouts or gene silencing), this model will provide insights into the druggable genome—a critical factor for target discovery. The Torr Vision Group has recently begun a collaboration with Novo Nordisk; together, we have the following objectives: Develop a Predictive Model: Create a model capable of accurately predicting unseen cellular responses to specific genetic perturbations across various cell types. This will be grounded on the comprehensive data generated in-house at the Novo Nordisk Research Centre in Oxford (NNRCO), where a framework is being developed to deeply characterise cellular phenotypes at scale. Develop Enhanced Cellular Representations: Develop multi-modal cellular representations that capture detailed patterns in imaging, gene expression and proteomics data, improving the accuracy of the predictive model. 
Learning these detailed patterns may also provide insights on genetic interactions and the gene regulatory network. Active Learning for Efficient Genome Screening: Given the scale of the human genome, exploring the combinatorial perturbation landscape defined by 20,000 protein-encoding genes poses a significant experimental challenge. Our approach will utilize an active learning framework to guide sequential, optimal experimental perturbation screens. This will enable efficient and targeted exploration of the genetic perturbation landscape, accelerating the discovery of therapeutic targets. Interested students will have the opportunity to contribute to these multiple aspects of this project, from designing cellular representations to developing the active learning framework. This work will provide hands-on experience at the intersection of ML and genetics, contributing meaningfully to ML-driven drug discovery efforts. [1] GBD 2021 Diabetes Collaborators (2023). Global, regional, and national burden of diabetes from 1990 to 2021, with projections of prevalence to 2050: a systematic analysis for the Global Burden of Disease Study 2021. Lancet, 402(10397), 203–234. [2] G.R. Dagenais, D. P. Leong, S. Rangarajan, et al. (2020). Variations in common diseases, hospital admissions, and deaths in middle-aged adults in 21 countries from five continents (PURE): a prospective cohort study. Lancet; 395(10226):785–794. [3] C. Priest, P. Tontonoz, (2019). Inter-organ cross-talk in metabolic syndrome. Nature Metabolism ;1(12):1177–1188.
Quantizing Video Mamba: Robust Streaming Vision on Low-Precision Optical Hardware	Philip Torr			C	MSc	Abstract Lumai is developing a 3D optical AI accelerator capable of executing matrix–vector operations with significantly higher energy efficiency than conventional digital hardware such as Nvidia GPUs. Instead of relying on digital parallelism, computation is performed through optical dataflow, enabling extremely high throughput at low energy cost. However, this architecture imposes two critical constraints that current AI models (like Vision Transformers) are not natively designed for: Streaming Computation Data flows continuously through the processor, with limited random access to past activations or memory. Architectures such as State-Space Models (e.g., Mamba) are therefore better aligned than architectures relying on full attention and key–value caches, such as Transformers. Extreme Low Precision (Int4) + Analog Noise To maximize optical efficiency, weights are represented in 4-bit integer format, and computations incur analog noise and non-ideal signal propagation. Many current deep learning models assume FP32/BF16 precision and can degrade significantly under such constraints. We are seeking Master's students to explore the algorithmic, software, and hardware co-design challenges associated with this architecture. Given the novelty of running state-space models on optical hardware, these projects have strong potential to lead to publishable research. Project 1: The Algorithmic Research Path Title: Quantizing Video Mamba: Robust Streaming Vision on Low-Precision Optical Hardware State-of-the-art Video Mamba models are trained in high precision (FP32/BF16). If we compress their weights to Int4 and inject analog noise (simulating optical physics), the model's recurrent state may "drift," causing hallucinations that worsen over time. 
Core Objectives: Establish the Baseline: Configure a VideoMamba (or similar) model for "Streaming Inference," processing video one frame at a time using purely recurrent states. Build the "Ghost Hardware" Simulator: Implement a custom PyTorch OpticalLinear layer that: Quantizes weights to Int4. Injects Activation Noise. Outlier Suppression: Analyze "Activation Outliers". Implement and evaluate rotation or calibration techniques to smooth these outliers before they hit the optical bottleneck. Deliverable: An "Optical-friendly VideoMamba" model that maintains performance comparable to the FP16 baseline on standard benchmarks while running on simulated 4-bit optical constraints.
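The two behaviours asked of the simulated optical layer, Int4 weight quantization and noise injection, can be sketched in plain Python before moving to PyTorch. The choices below are assumptions, not Lumai's specification: symmetric per-tensor quantization to the range [-8, 7], and additive Gaussian noise on each output. The function names are hypothetical.

```python
import random

def quantize_int4(weights):
    """Symmetric per-tensor quantization of a weight matrix to ints in [-8, 7]."""
    max_abs = max(abs(w) for row in weights for w in row)
    scale = max_abs / 7 if max_abs > 0 else 1.0
    q = [[max(-8, min(7, round(w / scale))) for w in row] for row in weights]
    return q, scale

def optical_linear(q, scale, x, noise_std=0.0, rng=None):
    """Matrix-vector product with dequantised Int4 weights plus output noise."""
    rng = rng or random.Random(0)
    out = []
    for row in q:
        y = scale * sum(qi * xi for qi, xi in zip(row, x))
        out.append(y + rng.gauss(0.0, noise_std))  # assumed additive Gaussian noise
    return out

weights = [[0.7, -0.3], [0.1, 0.0]]
q, scale = quantize_int4(weights)
clean = optical_linear(q, scale, [1.0, 2.0])
noisy = optical_linear(q, scale, [1.0, 2.0], noise_std=0.05, rng=random.Random(1))
print(q, clean, noisy)
```

In a streaming model the same layer is applied at every timestep, so quantization error and noise feed back into the recurrent state; that accumulation is the "drift" the project sets out to measure and suppress.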
Robust Semantic Uncertainty Estimation in Open-Ended Text Generation	Philip Torr			C	MSc	Semantic Entropy (SE) [1] provides a principled approach to quantifying uncertainty in open-ended language generation by assessing the variability in generated responses at a semantic level, rather than relying solely on token-level confidence scores. By clustering model outputs based on their semantic similarity, SE captures uncertainty in a way that reflects meaningful differences between responses. However, despite its promise, SE relies on ad hoc methodological choices, particularly in how it defines semantic similarity and clusters responses. A major limitation is the use of Natural Language Inference (NLI) models to assess textual semantic similarity, which can fail in cases involving long-form responses, nuanced context dependencies, or domain-specific knowledge. These issues introduce noise into SE-based uncertainty estimation, potentially leading to unreliable or inconsistent confidence assessments. A follow-up work, Kernel Language Entropy (KLE) [2], addresses some of these issues by replacing hard clustering with a kernel-based similarity function, providing more fine-grained uncertainty estimates using the von Neumann entropy. While KLE improves over SE by considering pairwise semantic dependencies, it still inherits certain methodological limitations—including sensitivity to kernel choice and reliance on similarity metrics that may not generalize well across diverse natural language generation tasks. This project aims to further refine the methodology behind Semantic Entropy and its successors by addressing these open challenges. Specifically, we seek to improve how semantic similarity is measured, reduce reliance on short-text-based entailment models, and develop uncertainty quantification techniques that generalize across a wider range of tasks, including long-context and domain-specific text generation. 
By investigating more robust and scalable approaches, this work will contribute to more reliable semantic uncertainty estimation in language models, ultimately improving their trustworthiness in safety-critical applications. [1] Kuhn, Lorenz, Yarin Gal, and Sebastian Farquhar. "Semantic Uncertainty: Linguistic Invariances for Uncertainty Estimation in Natural Language Generation." The Eleventh International Conference on Learning Representations. [2] Nikitin, Alexander, et al. "Kernel Language Entropy: Fine-grained Uncertainty Quantification for LLMs from Semantic Similarities." arXiv preprint arXiv:2405.20003 (2024).
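The clustering step that Semantic Entropy depends on can be seen in a small sketch. The equivalence test here is a deliberately crude stand-in (case- and punctuation-insensitive string equality) for the NLI-based check described above, and cluster probabilities are estimated by simple frequency, which is a simplification. The sampled answers are invented.

```python
import math
import string

def cluster(responses, equivalent):
    """Greedily group responses; each joins the first cluster it matches."""
    clusters = []
    for r in responses:
        for c in clusters:
            if equivalent(c[0], r):
                c.append(r)
                break
        else:
            clusters.append([r])
    return clusters

def semantic_entropy(responses, equivalent):
    """Entropy (nats) over meaning clusters, using cluster frequencies."""
    n = len(responses)
    probs = [len(c) / n for c in cluster(responses, equivalent)]
    return -sum(p * math.log(p) for p in probs)

def toy_equivalent(a, b):
    """Stand-in for an entailment check: ignore case and punctuation."""
    strip = str.maketrans("", "", string.punctuation)
    return a.lower().translate(strip).strip() == b.lower().translate(strip).strip()

samples = ["Paris", "paris.", "Lyon", "PARIS"]  # hypothetical sampled answers
se = semantic_entropy(samples, toy_equivalent)
print(round(se, 4))
```

Every weakness of the equivalence function flows directly into the entropy estimate, which is why the project targets how semantic similarity is measured.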
Safeguarding agents against decomposition attacks	Philip Torr, Adel Bibi, James Oldfield				MSc	Safeguarding agents against decomposition attacks A number of recent techniques have been introduced for efficiently monitoring large language models (LLMs) for harmful behavior, whether by monitoring model generations, their internal states, or a combination of the two (McKenzie et al, 2025). However, much less attention has been paid to monitoring across multi-turn interactions (Jaipersaud et al, 2025) or for very long contexts. Harmful instructions may be strategically spread across multiple requests, each of which looks benign in isolation (Yueh-Han et al, 2025). This project aims to build more reliable methods for defending against these so-called decomposition attacks, building stronger guardrails against the changing attack landscape. This project will likely be joint with collaborators from Microsoft as an industry partner.
The AI Historian – Teaching Machines to Understand the Past	Philip Torr		B	C	MSc	Background Reading: https://www.engineegroup.com/tcsit/article/view/TCSIT-7-148 https://proceedings.neurips.cc/paper_files/paper/2024/hash/38cc5cba8e513547b96bc326e25610dc-Abstract-Datasets_and_Benchmarks_Track.html https://openreview.net/pdf?id=EO8mTLqDuT https://openreview.net/pdf?id=EDWTHMVOCj https://www.amazon.co.uk/Causality-Judea-Pearl/dp/052189560X How can we build an artificial intelligence that thinks like a historian? This project will develop AI systems that can interpret historical evidence — texts, maps, artefacts, and data — to test causal explanations for how societies evolve, collapse, and endure. Students will work at the intersection of machine learning, causal inference, simulation, and digital history, developing methods that let AI reason about events, uncertainty, and human behaviour across time. Collaborations with anthropologists and historians at Oxford (including the Seshat and CSSC teams) offer unique access to structured historical datasets. Ideal applicants have strong technical skills in AI/ML or simulation and curiosity about applying them to the grand questions of human history.
The Early Modern Text Lab	Philip Torr		B	C	MSc	The Early Modern Text Lab is a proposed new digital humanities platform that uses AI to turn large quantities of historical texts into richly structured, queryable data. At its core is a tagging engine that does two things simultaneously: (1) it recognizes and applies an existing controlled vocabulary of people, places, commodities, events, legal roles, and conceptual categories, and (2) it identifies new words, phrases, and concepts that ought to be tagged but are not yet part of the vocabulary. Every upload triggers this dual process—AI applies known tags with contextual sensitivity (handling variant spellings, sense distinctions, and case-specific roles), while also proposing new entries that the scholar can approve or reject. Over time, the system becomes increasingly intelligent: its vocabulary expands, its tagging accuracy improves, and its sense-disambiguation and entity-matching become more precise. Built on top of this growing layer of structured annotations is an environment for serious historical analysis. Once documents are tagged, the Lab allows scholars to query their corpus in sophisticated ways: finding all events of a particular type, mapping relationships among people and places, tracing changes over time, or identifying patterns that would be invisible in raw text and impossible for a human to identify without AI assistance. A student taking on this project would design and implement the core platform—AI tagging and suggestion pipelines, human-in-the-loop review workflows, metadata handling, and a knowledge-graph backend—creating a tool that lets historians do at scale what is extremely difficult to do now: ask complex, data-driven research questions directly of thousands of early modern documents.
ARETHA: A Transparent and Respectful Virtual Assistant for the Home	Max Van Kleek	Human Centred Computing	B	C	MSc	Virtual assistants like Alexa and Siri have become hugely popular conveniences in homes worldwide. However, they also pose unprecedented privacy risks by being black-box vehicles of corporate surveillance by the likes of Amazon, Apple and others. Is it possible to design a virtual assistant that can be guaranteed to protect user privacy, and be fully transparent and controllable by end-users? This project will involve designing and developing a fully modular, open source virtual assistant called ARETHA from the ground up, using open source components, which can provide strong guarantees about being privacy-preserving and pro-user.
Privopticon: A privacy-preserving mesh OS for self-observation	Max Van Kleek	Human Centred Computing	B	C	MSc	Self-monitoring has many potential applications within the home, such as the ability to understand important health and activity rhythms within a household automatically. But such monitoring activities also carry extreme privacy risks. Can we design new kinds of sensing architectures that preserve inhabitants' privacy? We have made initial inroads on a new mesh operating system prototype for Raspberry Pi hardware, aimed at new classes of privacy-preserving self-monitoring applications for the home, which will provide user-configurable degrees of information fidelity, have built-in forgetting and be accountable by design. We need your help to try and evaluate different kinds of methods for achieving these goals.
3D demos of geometric concepts	Irina Voiculescu	Computational Biology and Health Informatics	B			The Graphics and Geometric Modelling courses (Part B) present several complex concepts which would be best visualised in a suite of applications. A coherent suite of 3D demos could easily become a useful tool for these courses, as well as for users worldwide. This project would be most suitable for a candidate who already has some experience using a 3D graphics library of their choice and wants to improve this skill. The mathematical concepts are well-documented.
3D demos of geometric concepts	Irina Voiculescu	Computational Biology and Health Informatics	B			The Geometric Modelling course (Part B) deals with several interesting concepts which would be best visualised in a suite of applications. A coherent suite of 3D demos could easily become a useful tool for this course, as well as for users worldwide. This project would be most suitable for a candidate who already has some experience using a 3D graphics library of their choice and wants to improve this skill. The mathematical concepts are well-documented.
3D environment for Hand Physiotherapy	Irina Voiculescu	Computational Biology and Health Informatics	B	C		After hand surgery, it is almost always necessary for patients to have physiotherapy to help with their recovery. As part of this, the patient will need to perform hand exercises at home. However, the patient may not always do the exercises correctly, or they might forget to do their exercises. The goal of this project is to use the Leap Motion to create a user-friendly GUI which a patient could use to aid them with their home exercises. The interface would show the user where their hand should be and they would then need to follow the movements. It could run as web-based or downloaded software. It would need to be tailored to the patient so that it contains their specific required exercises, which could be input by the physiotherapist. It would need to store data on how the patient is doing and feed this data back to the patient, and possibly also to the physiotherapist via the internet. If internet-based, patient confidentiality and security would need to be considered. This project would be performed in close collaboration with a physiotherapist, an orthopaedic hand surgeon, and a post-doctoral researcher based at the Nuffield Orthopaedic Centre.
3D printing medical scan data	Irina Voiculescu	Computational Biology and Health Informatics	B	C	MSc	Computed tomography (CT) scanning is a ubiquitous scanning modality. It produces volumes of data representing internal parts of a human body. Scans are usually output in a standard imaging format (DICOM) and come as a series of axial slices (i.e. slices across the length of the person's body, in planes perpendicular to the imaginary straight line along the person's spine.) The slices most frequently come at a resolution of 512 x 512 voxels, achieving an accuracy of about 0.5 to 1mm of tissue per voxel, and can be viewed and analysed using a variety of tools. The distance between slices is a parameter of the scanning process and is typically much larger, about 5mm. During the analysis of CT data volumes it is often useful to correct for the large spacing between slices. For example when preparing a model for 3D printing, the axial voxels would appear elongated. These could be corrected through an interpolation process along the spinal axis. This project is about the interpolation process, either in the raw data output by the scanner, or in the post-processed data which is being prepared for further analysis or 3D printing. The output models would ideally be files in a format compatible with 3D printing, such as STL. The main aesthetic feature of the output would be measurable as a smoothness factor, parameterisable by the user. Existing DICOM image analysis software designed within the Spatial Reasoning Group at Oxford is available to use as part of the project.
3D stereo display of medical scan data	Irina Voiculescu, Stuart Golodetz	Computational Biology and Health Informatics, Systems	B	C		The Medical Imaging research group has been working with a variety of data sourced from CT and MRI scans. This data comes in collections of (generally greyscale) slices which together make up 3D images. Our group has developed software to generate 3D models of the major organs in these images. This project aims to develop a simple augmented reality simulation for the Oculus Rift which will render these organs within a transparent model of a human and allow the user to walk around the model so as to view the organs from any angle. This has a number of possible applications, including to train medical students and to help surgeons to explain medical procedures to their patients.
Different pretraining/finetuning strategies and how they impact calibration and uncertainty	Irina Voiculescu	Computational Biology and Health Informatics	B	C	MSc	Medical data acquired in various modalities (CT, MRI, photograph) and of various anatomical parts is used in clinical decision making. Increasingly, machine learning methods are used in classification or segmentation tasks. Yet neural networks are known to be miscalibrated and often provide overconfident uncertainty estimates. The goal of this project is to evaluate the impact of different pretraining strategies (e.g., contrastive learning, self-supervised learning) and different fine-tuning strategies (e.g., data augmentation, test-time augmentation, label smoothing) on model calibration.
Exact Algorithms for Complex Root Isolation	Irina Voiculescu	Computational Biology and Health Informatics	B	C	MSc	Not available in 2013/14. Isolating the complex roots of a polynomial can be achieved using subdivision algorithms. Traditional Newton methods can be applied in conjunction with interval arithmetic. Previous work (jointly with Prof Chee Yap and MSc student Narayan Kamath) has compared the performance of three operators: Moore's, Krawczyk's and Hansen-Sengupta's. This work makes extensive use of the CORE library, which is a collection of C++ classes for exact computation with algebraic real numbers and arbitrary precision arithmetic. CORE defines multiple levels of operation over which a program can be compiled and executed. Each of these levels provides stronger guarantees on exactness, traded against efficiency. Further extensions of this work can include (and are not limited to): (1) Extending the range of applicability of the algorithm at CORE's Level 1; (2) Making an automatic transition from CORE's Level 1 to the more detailed Level 2 when extra precision becomes necessary; (3) Designing efficiency optimisations to the current approach (such as confirming a single root or analysing areas potentially not containing a root with a view to discarding them earlier in the process); (4) Tackling the isolation problem using a continued fraction approach. The code has been included and is available within the CORE repository. Future work can continue to be carried out in consultation with Prof Yap at NYU.
Gesture recognition using Leap Motion	Irina Voiculescu	Computational Biology and Health Informatics	B	C	MSc	Scientists in the Experimental Psychology Department study patients with a variety of motor difficulties, including apraxia - a condition usually following stroke which involves lack of control of a patient over their hands or fingers. Diagnosis and rehabilitation are traditionally carried out by Occupational Therapists. In recent years, computer-based tests have been developed in order to remove the human subjectivity from the diagnosis, and in order to enable the patient to carry out a rehabilitation programme at home. One such test involves users being asked to carry out static gestures above a Leap Motion sensor, and these gestures being scored according to a variety of criteria. A prototype has been constructed to gather data, and some data has been gathered from a few controls and patients. In order to deploy this as a clinical tool into the NHS, there is need for a systematic data collection and analysis tool, based on machine learning algorithms to help classify the data into different categories. Algorithms are also needed in order to classify data from stroke patients, and to assess the degree of severity of their apraxia. Also, the graphical user interface needs to be extended to give particular kinds of feedback to the patient in the form of home exercises, as part of a rehabilitation programme. This project was originally set up in collaboration with Prof Glyn Humphreys, Watts Professor of Experimental Psychology. Due to Glyn's untimely death a new co-supervisor needs to be found in the Experimental Psychology Department. It is unrealistic to assume this project can run in the summer of 2016.
Identifying features in MRI scan data	Irina Voiculescu	Computational Biology and Health Informatics	B	C	MSc	In recent years, medical diagnosis using a variety of scanning modalities has become quasi-universal and has brought about the need for computer analysis of digital scans. Members of the Spatial Reasoning research group have developed image processing software for CT (tomography) scan data. The program partitions (segments) images into regions with similar properties. These images are then analysed further so that particular features (such as bones, organs or blood vessels) can be segmented out. The team's research continues to deal with each of these two separate meanings of medical image segmentation. The existing software is written in C++ and features carefully-crafted and well-documented data structures and algorithms for image manipulation. In certain areas of surgery (e.g. orthopaedic surgery involving hip and knee joint) the magnetic resonance scanning modality (MRI) is preferred, both because of its safety (no radiation involved) and because of its increased visualisation potential. This project is about converting MRI scan data into a format that can become compatible with existing segmentation algorithms. The data input would need to be integrated into the group's analysis software in order then to carry out 3D reconstructions and other measurements. This project is co-supervised by Professor David Murray MA, MD, FRCS (Orth), Consultant Orthopaedic Surgeon at the Nuffield Orthopaedic Centre and the Nuffield Department of Orthopaedics, Rheumatology and Musculoskeletal Sciences (NDORMS), and by Mr Hemant Pandit MBBS, MS (Orth), DNB (Orth), FRCS (Orth), DPhil (Oxon), Orthopaedic Surgeon / Honorary Senior Clinical Lecturer, Oxford Orthopaedic Engineering Centre (OOEC), NDORMS.
Reinforcement learning techniques for games	Irina Voiculescu	Computational Biology and Health Informatics	B	C		This project is already taken for 2017-2018. Psychology has inspired and informed a number of machine learning methods. Decisions within an algorithm can be made so as to improve an overall aim of maximising a (cumulative) reward. Learning methods in this class are known as Reinforcement Learning. A basic reinforcement learning model consists of establishing a number of environment states, a set of valid actions, and rules for transitioning between states. Applying this model to the rules of a board game means that the machine can be made to learn how to play a simple board game by playing a large number of games against itself. The goal of this project is to set up a reinforcement learning environment for a simple board game with a discrete set of states (such as Backgammon). If time permits, this will be extended to a simple geometric game (such as Pong) where the states may have to be parameterised in terms of geometric actions to be taken at each stage in the game.
Simple drawing analysis	Irina Voiculescu	Computational Biology and Health Informatics	B	C	MSc	Scientists in the Experimental Psychology Department study patients with a variety of motor difficulties, including apraxia - a condition usually following stroke which involves lack of control of a patient over their hands or fingers. Diagnosis and rehabilitation are traditionally carried out by Occupational Therapists. In recent years, computer-based tests have been developed in order to remove the human subjectivity from the diagnosis, and in order to enable the patient to carry out a rehabilitation programme at home. One such test involves users drawing simple figures on a tablet, and these figures being scored according to a variety of criteria. Data has already been gathered from 200 or so controls, and is being analysed for a range of parameters in order to assess what a neurotypical person could achieve when drawing such simple figures. Further machine learning analysis could help classify such data into different categories. Algorithms are also needed in order to classify data from stroke patients, and to assess the degree of severity of their apraxia. This project was originally co-supervised by Prof Glyn Humphreys, Watts Professor of Experimental Psychology. Due to Glyn's untimely death a new co-supervisor needs to be found in the Experimental Psychology Department. It is unrealistic to assume this project can run in the summer of 2016.
Efficient linear algebra for block structured matrices	Jonathan Whiteley	Computational Biology and Health Informatics	B	C		Many large systems of linear equations are sparse, i.e. if the matrix that describes the linear system is of size N times N, where N is large, there are very few non-zero entries in each row. Under these conditions there may not be enough memory to store the whole matrix, and so only the non-zero entries are stored. This prevents techniques such as LU decomposition being used to solve the linear system; instead an iterative technique such as the conjugate gradient technique for symmetric positive definite matrices, or GMRES for more general matrices is used. The number of iterations needed with these techniques can be large, rendering these techniques inefficient. To prevent this, preconditioning techniques are used - if the linear system is defined by Ax=b, then a preconditioner P is used and the system solved is instead PAx = Pb, where P is cheap to calculate and both PAx and Pb are cheap to evaluate. In this project we will investigate matrices with a block structure that arises in many fields, such as constrained optimisation and continuum mechanics. We will utilise the block structure of these matrices to heuristically derive candidate preconditioners, and compare their performances. Prerequisites: linear algebra, continuous mathematics
Parameter recovery for models described by differential equations	Jonathan Whiteley	Computational Biology and Health Informatics	B	C		Phenomena in many fields are described by differential equations where a quantity of interest (for example, depending on the phenomenon modelled, a reaction rate, a capacitance, or a subject's cardiac output) appears as a parameter in the differential equation. We then determine the quantity of interest by choosing parameters in the differential equation so that the solution computed with these parameters matches experimental data as closely as possible. This may be posed as an optimisation problem that may be tackled by either a classical optimisation approach (as seen in the continuous mathematics course) or a Bayesian optimisation approach. The aim of this project is to compare these approaches. Prerequisites: linear algebra, continuous mathematics, probability
Topics in Linear Dynamical Systems	James Worrell	Automated Verification	B	C	MSc	A linear dynamical system is a discrete- or continuous-time system whose dynamics is given by a linear function of the current state. Examples include Markov chains, linear recurrence sequences (such as the Fibonacci sequence), and linear differential equations. This project involves investigating the decidability and complexity of various reachability problems for linear dynamical systems. It would suit a mathematically oriented student. Linear algebra is essential, and number theory or complexity theory is desirable. A relevant paper is Ventsislav Chonev, Joël Ouaknine, James Worrell: The orbit problem in higher dimensions. STOC 2013: 941-950.
An algebraic perspective on the π-calculus	Nobuko Yoshida, Dylan McDermott	Programming Languages		C	MSc	Prerequisites: Some knowledge of programming language semantics Background The semantics of the λ-calculus has been studied for a long time, and this research slowly uncovered some connections with the theory of algebraic structures. This perspective was finally made clear by Hyland [1], who gives an algebraic perspective on the λ-calculus, inspired by some concepts from category theory, and shows that this perspective can be used to show some of the most important theorems of the λ-calculus. There has been less work on the π-calculus, which is a model of computation designed around concurrent processes. Focus This project aims to take the first steps towards an algebraic perspective on the π-calculus, similar to the work that has been done for the λ-calculus. The main goal would be to develop a notion of π-theory analogous to Hyland's notion of λ-theory, and to study their basic properties. Extensions could involve using this notion of π-theory to prove important results about the π-calculus. Method The starting point would be Stark's work on models of π-calculus [2]. The notion of π-theory should be an adaptation of the notion of model described by Stark. Once you have a proposed definition, you will be able to start proving some basic properties about it, which would justify that it really is a good notion of π-theory. [1] Martin Hyland, Classical lambda calculus in modern dress. Mathematical Structures in Computer Science, 2015. [2] Ian Stark. Free-algebra models for the π-calculus. Theoretical Computer Science, 2008.
Automated Verification of Multiparty Session Types in Why3	Nobuko Yoshida	Programming Languages		C	MSc	Prerequisites: Concurrency, concurrent programming and functional programming, Lambda Calculus and Types Background Session types are an effective method to control the behaviour of software components that run in distributed systems communicating through message passing [1,2]. Multiparty session types [3,4] provide support for sessions involving multiple participants, thus allowing more expressive scenarios to be represented. Focus In this project, we are interested in calculating properties of multiparty session types by using automated deductive verification performed in tools of the OCaml ecosystem relying on Why3 [5,6]. In particular, we are interested in studying functions that compute the behaviour of session type environments, and in verifying these functions automatically when possible, and interactively when the proof goals require transformations. Method Towards this direction, the student will study related work on multiparty session types and automated deductive verification, and apply the existing methodologies and techniques to the project's setting. The student will attack the problem of implementing the computable functions in tools of the OCaml ecosystem, and of empirically evaluating their behaviour by identifying a test suite of realistic examples of multiparty scenarios. The final goal is to perform the verification of the functions in the Why3 platform, which will eventually rely on both automated constraint solving and proof interaction. References [1] Honda, K.: Types for dyadic interaction. CONCUR 1993. LNCS 715, pp. 509–523. Springer (1993), *https://doi.org/10.1007/3-540-57208-2_35* [2] Honda, K., Vasconcelos, V.T., Kubo, M.: Language primitives and type discipline for structured communication-based programming. ESOP 1998. LNCS 1381, pp. 122–138. Springer (1998). 
*https://doi.org/10.1007/BFb0053567* [3] Honda, K., Yoshida, N., Carbone, M.: Multiparty asynchronous session types. POPL 2008. pp. 273–284. ACM (2008). *https://doi.org/10.1145/1328438.1328472* [4] Honda, K., Yoshida, N., Carbone, M.: Multiparty asynchronous session types. J.ACM 63(1), 9:1–9:67 (2016). *https://doi.org/10.1145/2827695* [5] Filliâtre, J., Paskevich, A.: Why3 - where programs meet provers. ESOP 2013, LNCS 7792, pp. 125–128. Springer (2013). *https://doi.org/10.1007/978-3-642-37036-6_8* [6] Bobot, F., Filliâtre, J., Marché, C., Melquiond, G., Paskevich, A.: The Why3 Platform. Version: 1.7, April 2024. *https://www.why3.org/doc/index.html*
Compiling Multiparty Session Processes in Go	Nobuko Yoshida	Programming Languages		C	MSc	Prerequisites: Concurrency, concurrent programming and functional programming. Ideally the student will have taken Lambda Calculus and Types. Background The GoPi project [1,2] aims at using a high-level process language to automatically generate Go code that at runtime (1) does not deadlock on channels declared as linear, and (2) does not enlarge the scope of channels declared as static. The GoPi compiler is a tool implemented in OCaml that relies on constraint solving to perform type inference of the source-level processes, which do not contain type annotations. In particular, channels declared as linear are assigned linear types [3], and deadlock-freedom is obtained by inferring the order of usage of linear channels. Focus The aim of this project is to embed Multiparty Session Types [4,5] in GoPi, thus allowing more complex scenarios to be described, and code with stronger properties to be generated. Method Towards this direction, the student will study related work on Multiparty Session Types and type inference, and apply the existing methodologies and techniques to the GoPi setting. This will involve the development of automatically generated constraints [6] to implement the type inference procedure. The student will attack the problem of deploying techniques to assign Multiparty Session Types to processes, and of implementing them in GoPi. The final objective is to automatically generate deadlock-free Go code that implements well-typed multiparty processes, and to assess the validity of the implementation by identifying a test suite of realistic multiparty scenarios. References [1] Giunti, M.: The GoPi compiler (June 2019), https://sites.fct.unl.pt/gopi/, Git: *https://github.com/marcogiunti/gopi* [2] Giunti, M.: GoPi: Compiling linear and static channels in Go. COORDINATION 2020. LNCS 12134, pp. 137–152. 
Springer (2020), *https://doi.org/10.1007/978-3-030-50029-0_9* [3] Kobayashi, N., Pierce, B.C., Turner, D.N.: Linearity and the pi-calculus. ACM Trans. Program. Lang. Syst. 21(5), 914–947 (1999), *https://doi.org/10.1145/330249.330251* [4] Honda, K., Yoshida, N., Carbone, M.: Multiparty asynchronous session types. POPL 2008. pp. 273–284. ACM (2008). *https://doi.org/10.1145/1328438.1328472* [5] Honda, K., Yoshida, N., Carbone, M.: Multiparty asynchronous session types. J.ACM 63(1), 9:1–9:67 (2016). *https://doi.org/10.1145/2827695* [6] Barrett, C., Fontaine, P., Tinelli, C.: The SMT-LIB Standard: Version 2.6. Technical report, Department of Computer Science, The University of Iowa (2017)
Complexity of Reachability Problems for Restrictions of Multiparty Session Types	Nobuko Yoshida, Adrian Puerto Aubel	Programming Languages		C	MSc	Prerequisites: Computational Complexity Background Multiparty Session Types (MST) [1] are formal models of asynchronous distributed computation in which local processes exchange messages through communication channels. The fact that communication channels implement unbounded memory entails that these models are as expressive as Turing machines, making all non-trivial properties undecidable. As a consequence, classical problems in the area of distributed computing, such as configuration reachability or coverability, have not yet been thoroughly studied. On the other hand, the many variants of Petri nets (PN) [2] have different expressive powers, and there are well-known results regarding the complexity of relevant problems. Focus In this project, you will investigate which restrictions on the MST models make these problems decidable, by studying their complexities either by means of many-to-one reductions between MST and PN, or in the more general framework of well-structured transition systems [3]. The objective of this project is to draw a landscape of complexity results relating MST with two classes of PN, namely P/T and elementary systems. You should address both upper and lower complexity bounds, which will determine the direction of the reductions, and formally define the subclasses of MST models to which these reductions apply. An outstanding outcome of this project would integrate these results in the theory of well-structured transition systems and also extend to analyses of global types [4]. References: [1] Coppo, Mario, Mariangiola Dezani-Ciancaglini, Luca Padovani, and Nobuko Yoshida. "A Gentle Introduction to Multiparty Asynchronous Session Types". In Formal Methods for Multicore Programming, edited by Marco Bernardo and Einar Broch Johnsen, 9104:146–78. Lecture Notes in Computer Science. 
Cham: Springer International Publishing, 2015. https://doi.org/10.1007/978-3-319-18941-3_4. [2] T. Murata, "Petri nets: Properties, analysis and applications," in Proceedings of the IEEE, vol. 77, no. 4, pp. 541-580, April 1989, doi: 10.1109/5.24143. [3] Alain Finkel and Philippe Schnoebelen, "Well-Structured Transition Systems Everywhere!", Theoretical Computer Science 256(1–2), pages 63–92, 2001. [4] Thien Udomsrirungruang, Nobuko Yoshida: Top-Down or Bottom-Up? Complexity Analyses of Synchronous Multiparty Session Types. POPL 2025 Essential goals: To describe reductions from the coverability, and/or the reachability problems between two classes of PN models (P/T and elementary systems) and restrictions of Multiparty Session Types. To formally define the restrictions of MST for which these reductions make sense. Stretch-goals: To integrate and generalise these results according to the theory of well-structured transition systems.
Consolidation in Quantum Concurrent Processes	Nobuko Yoshida, Atalay Ileri	Programming Languages		C	MSc	Prerequisites: Recommended courses include Distributed Processes, Types and Programming (for MSc), Lambda calculus and types, and (Quantum Information or Quantum Processes and Computation). Familiarity with functional programming, Coq, and dependent types would be advantageous. Background CQP and other process calculi [1, 2, 3] are presented to define quantum protocols among two or more participants that can run quantum operations and send and receive qubits. These calculi require each participant to have local access to a quantum computer to carry out the protocol. However, since quantum computers are prohibitively expensive and not commercially available, this requirement limits their applicability. Focus In this project, you will aid the design of an algorithm that consolidates all quantum operations in a single process while preserving the semantics of the protocol. In addition, you will implement a quantum process calculus and the transformation procedure, and prove the semantic preservation of the transformation in the Coq Proof Assistant [4], a dependently typed interactive proof assistant. Method Essential goal: A protocol transformation algorithm with mechanized semantic preservation proofs. Stretch-goals: - Mechanized implementation of a process calculus and its semantics. - Implementation of the transformation algorithm. - Mechanized proofs of semantic preservation. References [1] Simon J. Gay and Rajagopal Nagarajan. 2006. Types and typechecking for Communicating Quantum Processes. Mathematical Structures in Comp. Sci. 16, 3 (June 2006), 375–406. https://doi.org/10.1017/S0960129506005263 [2] Mingsheng Ying, Yuan Feng, Runyao Duan, and Zhengfeng Ji. 2009. An algebra of quantum processes. ACM Trans. Comput. Logic 10, 3, Article 19 (April 2009), 36 pages. 
https://doi.org/10.1145/1507244.1507249 [3] Lorenzo Ceragioli, Fabio Gadducci, Giuseppe Lomurno, and Gabriele Tedeschi. 2024. Quantum Bisimilarity via Barbs and Contexts: Curbing the Power of Non-deterministic Observers. Proc. ACM Program. Lang. 8, POPL, Article 43 (January 2024), 29 pages. https://doi.org/10.1145/3632885 [4] https://coq.inria.fr/
Distributed Programming with Distributed Protocols in Scala	Nobuko Yoshida	Programming Languages	B	C	MSc	Session type systems facilitate the development of concurrent and distributed software by enabling programmers to describe communication protocols in types. Not only does this aid understanding and correctness of communicating code, it can guarantee desirable behavioural properties (e.g. deadlock-freedom). Concurrency libraries, e.g. Effpi (Scala), enable the use of session types in real-world programming languages and systems. The goal of this project is to understand the basics of session types, to implement distributed protocols in Scala and to investigate their usability. Reference: [1] Session Types https://dl.acm.org/doi/pdf/10.1145/2873052 [2] http://dx.doi.org/10.1145/3314221.3322484
Enhancing Verification of Go's Concurrency Features	Nobuko Yoshida	Programming Languages		C	MSc	Prerequisites: Familiarity with model-checking and behavioural types is helpful, but not required Background The Go programming language has seen widespread adoption in industry due to its efficient blend of systems programming and concurrency. Its concurrency primitives, influenced by process calculi like CCS and CSP, utilise channel-based communication and lightweight threads, offering a unique approach to structuring concurrent software. Concurrency bugs, such as deadlocks and safety violations, are common in Go programs and can lead to crashes, unpredictable behaviour, or resource leaks. There have been efforts to verify concurrency in Go programs using behavioural types and model checking [1, 2, 3]. These approaches model dynamic communication patterns to analyse concurrent behaviour. While effective, challenges remain in handling complex language features, reducing false positives, and improving scalability for real-world applications. Focus This project aims to advance techniques for verifying the correctness of concurrent Go programs, building on existing research in behavioural types and model checking. The project will focus on addressing challenges in analysing Go’s dynamic concurrency structures, such as runtime-determined goroutines and channel usage. Method Key objectives include improving the precision and scalability of verification techniques, extending support for advanced Go concurrency features like recursive goroutine spawning, and developing automated methods for detecting and classifying bugs. Essential goals: Extend verification techniques for advanced Go concurrency features such as barriers and recursive goroutine spawning. Stretch goals: Develop methods for detecting and classifying bugs, and improve the over-approximation methods of program translation.
Extension to Probabilistic Resource-Aware Session Types	Nobuko Yoshida, Joe Paulus	Programming Languages	B			Probabilistic session types [1] explore how uncertainty and likelihood influence communication protocols in distributed systems. We propose an extension of the session type system in which an abstract notion of success and failure can be attached either to the syntax of session types alone or to both the type and process levels, for example through explicit failure terms. The goal is to extend the scope of [1], showing that the probabilistic analysis of binary sessions [2] can be applied. Similar insights apply to the behavioural equivalence of probabilistically nondeterministic behaviour, as well as to implementations of extensions within NomosPro/PRast. [1] Probabilistic Resource-Aware Session Types (acm.org) [2] Probabilistic Analysis of Binary Sessions (dagstuhl.de) Prerequisites: The student should wish to learn a type theory of concurrency and communication; LCT will be helpful.
Formalism of The Go Language	Nobuko Yoshida	Programming Languages	B			Go is a popular statically typed programming language designed and implemented by Google, and used, among others, by Uber, Netflix, and Docker. The release of Go 1.18 finally realised the long-awaited addition of generics (parametric polymorphism) to the Go type system. Much academic work has been conducted to ensure that generics are correctly implemented, but there is still more to do [1,2,3,4]. This project aims to give a survey of those four papers, and to take on one of two mini-projects: (1) build some examples using the existing prototypes of [1,2,3,4] or (2) survey important features included in Go 1.18, such as type inference and type sets [5], that are not covered in [1,2,3,4]. [1] https://dl.acm.org/doi/pdf/10.1145/3428217 [2] https://dl.acm.org/doi/10.1007/978-3-030-89051-3_7 [3] https://dl.acm.org/doi/10.1007/978-3-031-16912-0_7 [4] https://arxiv.org/abs/2208.06810v2 [5] https://go.googlesource.com/proposal/+/master/design/43651-type-parameters.md
Implementation of Communication Logic for a Microservice Composition Engine.	Nobuko Yoshida, Adrian Puerto Aubel	Programming Languages	B			Prerequisites: Java programming (recommended: Kubernetes) Background The latest trends in distributed computation are being shaped by cloud computing and the availability of remote third-party clusters. Microservice architectures have become the leading paradigm in application design. Microservices are containerised application components that implement the application functionality by exchanging messages according to a RESTful paradigm. Multiparty Session Types (MST) [1], on the other hand, constitute a specification language for communication protocols that allows for the verification of safety properties of the application (deadlock freeness, liveness, etc.). Focus In this project you will join the development team of an open source microservice composition engine [2] based on Guarded Attribute Grammars (GAGs) [3]. This engine interprets a programming language where microservices take the role of functions, and it relies on an orchestrator (Kubernetes) to automatically deploy the resulting application. Your task will be to implement the communication logic of the engine in Java, so that microservices follow the protocols specified as MST. References: [1] Yoshida, Nobuko, and Lorenzo Gheri. "A Very Gentle Introduction to Multiparty Session Types". In Distributed Computing and Internet Technology, edited by Dang Van Hung and Meenakshi D'Souza, 11969:73–93. Lecture Notes in Computer Science. Cham: Springer International Publishing, 2020. https://doi.org/10.1007/978-3-030-36987-3_5. [2] Joskel Ngoufo Tagueu, Eric Badouel, Adrián Puerto Aubel, Maurice Tchoupé Tchendji. "Lazy Services: A Service Oriented Architecture based on Incremental Computations and Commitments." 2021. 
https://inria.hal.science/hal-03353118 [3] Éric Badouel, Loïc Hélouët, Georges Edouard Kouamou, and Christophe Morvan. "A grammatical approach to data-centric case management in a distributed collaborative environment." In Proceedings of the 30th Annual ACM Symposium on Applied Computing, Salamanca, Spain, April 13-17, 2015, pages 1834–1839. ACM, 2015. Project: https://github.com/Service-BP-Dev-Team/Kubernetes-Reactive-Service Essential goals: correct implementation of the communication logic of the engine in Java. Stretch goals: Understanding the features and potential of the formal model of computation (GAGs) underlying the composition engine.
Mechanisation of Distributed Protocol Specifications	Nobuko Yoshida	Programming Languages		C	MSc	Multiparty session types (MPST) offer a specification and verification framework for concurrency: communicating systems can be safely implemented in a distributed fashion when well-typed against local session types, provided that such local types are obtained by projection of a single choreographic protocol (global type). Multiple projects are available; please get in touch with nobuko.yoshida@cs.ox.ac.uk for more detail. Possible aims are: (i) exploring and implementing algorithms for MPST protocol compositionality or (ii) formalising correctness of advanced MPST systems (e.g., featuring merge, subtyping, or delegation). Students with a strong interest in the mechanisation of MPST in proof assistants (Coq, Isabelle, Idris) are very welcome to reach out. [1] Mechanisation Paper https://dl.acm.org/doi/10.1145/3453483.3454041 [2] Multiparty Session Types Paper https://link.springer.com/chapter/10.1007/978-3-030-36987-3_5
Mechanisation of Quantum Concurrent Processes	Nobuko Yoshida, Atalay Ileri	Programming Languages	B			Prerequisites: Recommended courses include Lambda calculus and types, and (Quantum Information or Quantum Processes and Computation). Familiarity with functional programming, Coq, and dependent types would be advantageous. Background CQP and other process calculi [1, 2, 3] have been proposed to define quantum protocols among two or more participants that can run quantum operations and send and receive qubits. Currently, no mechanization of such calculi exists. A mechanized formalization will enable researchers to prove properties of different protocols rigorously. Focus In this project, you will implement a quantum process calculus and its semantics in Coq [4], a dependently typed interactive proof assistant. Method Essential goal: A mechanized implementation of the process calculus and its semantics. References [1] Simon J. Gay and Rajagopal Nagarajan. 2006. Types and typechecking for Communicating Quantum Processes. Mathematical Structures in Comp. Sci. 16, 3 (June 2006), 375–406. https://doi.org/10.1017/S0960129506005263 [2] Mingsheng Ying, Yuan Feng, Runyao Duan, and Zhengfeng Ji. 2009. An algebra of quantum processes. ACM Trans. Comput. Logic 10, 3, Article 19 (April 2009), 36 pages. https://doi.org/10.1145/1507244.1507249 [3] Lorenzo Ceragioli, Fabio Gadducci, Giuseppe Lomurno, and Gabriele Tedeschi. 2024. Quantum Bisimilarity via Barbs and Contexts: Curbing the Power of Non-deterministic Observers. Proc. ACM Program. Lang. 8, POPL, Article 43 (January 2024), 29 pages. https://doi.org/10.1145/3632885 [4] https://coq.inria.fr/
Message passing with effect handlers	Nobuko Yoshida, Dylan McDermott	Programming Languages	B			Prerequisites: Some functional programming experience Background Effect handlers are a programming construct designed to enable programmers to implement computational effects (e.g. raising an exception, mutating state) in a modular way. The main application so far has been to concurrent programming. Effect handlers have now been added to the OCaml language [1], there has been recent work on effect handlers for WebAssembly [2] and for C++ [3], and there are various research languages based around them (e.g. Eff [4], Effekt [5], and Koka [6]). Focus This project aims to explore the use of effect handlers to implement message-passing concurrency, where threads can send messages to each other. The primary goal would be to implement handlers for message-passing, and to evaluate them using some examples of message-passing programs. Extensions could include exploring session types with handlers, or looking at the formal semantics. Method You would take some existing language with effect handlers (e.g. OCaml, or one of the research languages mentioned above), and implement some effect handlers with operations for sending and receiving messages, along with some basic concurrency primitives. These languages all have examples of concurrency primitives using effect handlers, which you can use as a starting point. You would then implement some examples of concurrent message-passing programs and explore some improvements to your effect handler implementation. The examples would serve as a way to evaluate your implementations. [1] https://ocaml.org/manual/5.2/effects.html [2] https://wasmfx.dev/ [3] https://github.com/maciejpirog/cpp-effects [4] https://www.eff-lang.org/ [5] https://effekt-lang.org/ [6] https://koka-lang.github.io/
Model Checking Probabilistic Bisimulation in PRISM	Nobuko Yoshida, Joe Paulus	Programming Languages		C	MSc	Prerequisites: Concurrency and verification Background PRISM [1] is a probabilistic model checking tool that enables the formal modelling and verification of systems that exhibit probabilistic behaviours. It is widely regarded as an influential tool (2016 HVC Award at the Haifa Verification Conference) in the domain of verification, particularly for stochastic systems, due to its powerful modelling capabilities and efficient algorithms for analysing probabilistic properties. The PRISM language allows users to specify states, transitions, and probabilistic choices, as well as properties that are to be verified, such as safety, liveness, and performance. Properties are expressed in probabilistic temporal logics such as PCTL (Probabilistic Computation Tree Logic) and LTL (Linear Temporal Logic). There is much work on probabilistic bisimilarity that has been applied to both Markov decision processes and probabilistic automata. Probabilistic bisimulation is an extension of traditional bisimulation. While the concept itself is theoretically elegant, computing and reasoning about probabilistic bisimulation is significantly more challenging than its non-probabilistic counterpart due to the need to compare probability distributions over transitions, rather than simply matching states. Hence, the algorithms involved are computationally intensive, making probabilistic bisimulation harder to compute and verify. Focus PRISM-games [2] allows for the verification of probabilistic systems that can incorporate competitive or collaborative behaviour, modelled as stochastic multi-player games. There is a well-studied relation between bisimulation and game semantics. Our goal is to use PRISM-games to model check probabilistic bisimulation. 
Further work includes benchmarking efficiency, expressing multiple probabilistic definitions of bisimulation, and evaluating them on both expressivity and implementation. The approach will then be extended to multiparty session types. Method There is a wealth of research on this topic. For example, [3] gives a definition of weak probabilistic bisimulation for labelled concurrent Markov chains, along with a method for deciding whether two systems are bisimilar. Likewise, [4] gives similar results in the field of probabilistic automata. We will focus on the former, first modelling Labelled Concurrent Markov Chains (LCMCs) and then model checking them in PRISM-games. There is much freedom in the paths the project can take; a clear next step would be to extend this framework from the viewpoint of multiparty process algebras. [1] PRISM - Probabilistic Symbolic Model Checker [2] PRISM-games [3] Weak Bisimulation for Probabilistic Systems | SpringerLink [4] Language Equivalence for Probabilistic Automata | SpringerLink
Model Checking Probabilistic Bisimulation in PRISM	Nobuko Yoshida, Joe Paulus	Programming Languages	B			Prerequisites: Concurrency, Verification Background PRISM [1] is a probabilistic model checking tool that enables the formal modelling and verification of systems that exhibit probabilistic behaviours. It is widely regarded as an influential tool (2016 HVC Award at the Haifa Verification Conference) in the domain of verification, particularly for stochastic systems, due to its powerful modelling capabilities and efficient algorithms for analysing probabilistic properties. The PRISM language allows users to specify states, transitions, and probabilistic choices, as well as properties that are to be verified, such as safety, liveness, and performance. Properties are expressed in probabilistic temporal logics such as PCTL (Probabilistic Computation Tree Logic) and LTL (Linear Temporal Logic). There is much work on probabilistic bisimilarity that has been applied to both Markov decision processes and probabilistic automata. Probabilistic bisimulation is an extension of traditional bisimulation. While the concept itself is theoretically elegant, computing and reasoning about probabilistic bisimulation is significantly more challenging than its non-probabilistic counterpart due to the need to compare probability distributions over transitions, rather than simply matching states. Hence, the algorithms involved are computationally intensive, making probabilistic bisimulation harder to compute and verify. Focus PRISM-games [2] allows for the verification of probabilistic systems that can incorporate competitive or collaborative behaviour, modelled as stochastic multi-player games. There is a well-studied relation between bisimulation and game semantics. Our goal is to use PRISM-games to model check probabilistic bisimulation. 
Further work includes benchmarking efficiency, expressing multiple probabilistic definitions of bisimulation, and evaluating them on both expressivity and implementation. Method There is a wealth of research on this topic. For example, [3] gives a definition of weak probabilistic bisimulation for labelled concurrent Markov chains, along with a method for deciding whether two systems are bisimilar. Likewise, [4] gives similar results in the field of probabilistic automata. We will focus on the former, first modelling Labelled Concurrent Markov Chains (LCMCs) and then model checking them in PRISM-games. [1] PRISM - Probabilistic Symbolic Model Checker [2] PRISM-games [3] Weak Bisimulation for Probabilistic Systems | SpringerLink [4] Language Equivalence for Probabilistic Automata | SpringerLink
Model-checking Timed Session Types	Nobuko Yoshida	Programming Languages	B	C	MSc	This project lies at the intersection of model-checking, verification, timed systems, and session types. It aims to tackle the critical challenge of ensuring deadlock-freedom in distributed systems employing timed session types [1, 2] through the application of timed model-checking tools. Timed session types provide a formalism for specifying communication protocols with temporal constraints, while deadlocks pose a persistent threat to system stability. Drawing on established timed model-checking tools [3], our objective is to systematically analyse the specified models to detect and scrutinise potential deadlocks. The project also endeavours to build Timed Session Types using a bottom-up approach, and then verify the type-level properties by using a model checker, as in [4]. This research aspires to augment the reliability and correctness of concurrent and distributed systems, contributing to the progression of formal verification techniques. [1] http://mrg.doc.ic.ac.uk/publications/timed-multiparty-session-types/CONCUR14.pdf [2] https://mrg.cs.ox.ac.uk/publications/meeting-deadlines-together/ [3] https://uppaal.org/features/ (UPPAAL model checker) [4] https://dl.acm.org/doi/10.1145/3290343 Using a model-checker to verify the type-level properties for timed binary sessions, along with a literature survey on timed systems with extensions into deadlock-freedom checking, could be a mini-project of shorter duration. Prerequisites: familiarity with basic automata theory. Students who wish to learn model checking are welcome; familiarity with model-checking tools is helpful but not required.
Probabilistic Bisimulation in Concurrent Protocols	Nobuko Yoshida, Joe Paulus	Programming Languages		C	MSc	Prerequisites: Concurrency Background In probabilistic concurrent process algebras, probabilistic choice is typically expressed in one of two ways. The first attaches a probability value to an explicit nondeterministic choice operator; this can be seen as the process flipping a biased coin and behaving according to the result, as in [1] (defined on binary session types), where a flip operator is given. Here the focus is on reasoning about the expected resource consumption of a system, inferred at type checking. Another implementation can be seen in [2], where an explicit nondeterministic choice is annotated with a probability. Similarly, session types are annotated with probabilities (internal and external choice), allowing for extended reasoning on termination derived from typing. The second formulation attaches probabilities to selection behaviours, as in [3], which reasons about probabilistic session types from a top-down perspective. The system allows for both probabilistic choices made internally by a process and nondeterministic choices made externally. There is much work on probabilistic bisimilarity that has been applied to both Markov decision processes and probabilistic automata. Probabilistic bisimulation is an extension of traditional bisimulation. While the concept itself is theoretically elegant, computing and reasoning about probabilistic bisimulation is significantly more challenging than its non-probabilistic counterpart due to the need to compare probability distributions over transitions, rather than simply matching states. Hence, the algorithms involved are computationally intensive, making probabilistic bisimulation harder to compute and verify. Focus We want to reason about behavioural equivalences between probabilistic multiparty protocols. 
The goal of this project is to define what it means for global protocols to be probabilistically bisimilar to each other and to relate this to existing work. Method There is a wealth of research on this topic. For example, [4] gives a definition of weak probabilistic bisimulation for labelled concurrent Markov chains, along with a method for deciding whether two systems are bisimilar. Likewise, [5] gives similar results in the field of probabilistic automata, and [3] gives a method of expressing global protocols. Combining these two techniques, we aim to give a satisfactory definition and evaluate its relative expressiveness. [1] Probabilistic Resource-Aware Session Types [2] Probabilistic Analysis of Binary Sessions [3] 1909.01748v1 [4] Weak Bisimulation for Probabilistic Systems | SpringerLink [5] Language Equivalence for Probabilistic Automata | SpringerLink
Probabilistic Bisimulation in Concurrent Protocols	Nobuko Yoshida, Joe Paulus	Programming Languages	B			Prerequisites: Concurrency Background In probabilistic concurrent process algebras, probabilistic choice is typically expressed in one of two ways. The first attaches a probability value to an explicit nondeterministic choice operator; this can be seen as the process flipping a biased coin and behaving according to the result, as in [1] (defined on binary session types), where a flip operator is given. Here the focus is on reasoning about the expected resource consumption of a system, inferred at type checking. Another implementation can be seen in [2], where an explicit nondeterministic choice is annotated with a probability. Similarly, session types are annotated with probabilities (internal and external choice), allowing for extended reasoning on termination derived from typing. The second formulation attaches probabilities to selection behaviours, as in [3], which reasons about probabilistic session types from a top-down perspective. The system allows for both probabilistic choices made internally by a process and nondeterministic choices made externally. There is much work on probabilistic bisimilarity that has been applied to both Markov decision processes and probabilistic automata. Probabilistic bisimulation is an extension of traditional bisimulation. While the concept itself is theoretically elegant, computing and reasoning about probabilistic bisimulation is significantly more challenging than its non-probabilistic counterpart due to the need to compare probability distributions over transitions, rather than simply matching states. Hence, the algorithms involved are computationally intensive, making probabilistic bisimulation harder to compute and verify. Focus We want to reason about behavioural equivalences between probabilistic binary session typed processes. 
Method There is a wealth of research on this topic. For example, [4] gives a definition of weak probabilistic bisimulation for labelled concurrent Markov chains, along with a method for deciding whether two systems are bisimilar. Likewise, [5] gives similar results in the field of probabilistic automata, and [3] gives a method of expressing global protocols. Combining these two techniques, we aim to give a satisfactory definition and evaluate its relative expressiveness. [1] Probabilistic Resource-Aware Session Types [2] Probabilistic Analysis of Binary Sessions [3] 1909.01748v1 [4] Weak Bisimulation for Probabilistic Systems | SpringerLink [5] Language Equivalence for Probabilistic Automata | SpringerLink
Probabilistic Session Types: semantics and tool development	Nobuko Yoshida, Joe Paulus	Programming Languages		C	MSc	Probabilistic session types [1] explore how uncertainty and likelihood influence communication protocols in distributed systems. Extending Probabilistic Resource-Aware Session Types allows for many possibilities: the typing is based on the system of DILL [2] (a session type system with a Curry-Howard isomorphism with linear logic) and can be extended with parallel composition and restriction; similarly, alternative rules can be derived from the classical viewpoint [3], expanding typeability. This project will be a mix of theory and practice, including an extension of the implementation NomosPro/PRast. Overall, this extension aims to provide a more comprehensive framework for designing and analysing distributed systems with probabilistic and resource-aware communication protocols. The project also develops bisimulation semantics for concurrent processes. [1] Probabilistic Resource-Aware Session Types (acm.org) [2] concur10.pdf (cmu.edu) [3] propositions-as-sessions.pdf (ed.ac.uk) Prerequisites: DPTP
Program Transformation of Distributed Protocol Specification	Nobuko Yoshida	Programming Languages		C	MSc	Session types [1] describe patterns of communication in concurrent and distributed systems. Moreover, they guarantee certain desirable behavioural properties, such as deadlock freedom. Although session type libraries exist for a range of programming languages, there is little programmer support for their integration into existing and legacy code. This necessarily raises the level of required expertise to introduce and maintain session typed systems, and can consequently be a source of error. The goal of this project is to develop tools and techniques that can assist programmers in the introduction and manipulation of session types in legacy code. Inspired by similar work for parallelism [2], the project will focus on program transformation techniques, including refactoring [2,3,4], to develop safe and semi-automatic means to not only introduce session types, but modify existing session types in situ. [1] N. Yoshida and L. Gheri: A Very Gentle Introduction to Multiparty Session Types. ICDCIT 2020: 73-93 [2] C. Brown, et al.: Refactoring GrPPI: Generic Refactoring for Generic Parallelism in C++. Int. J. Parallel Program. 48(4): 603-625 (2020) [3] T. Mens and T. Tourwé: A Survey of Software Refactoring. IEEE Trans. Software Eng. 30(2): 126-139 (2004) [4] S. J. Thompson and H. Li: Refactoring tools for functional languages. J. Funct. Program. 23(3): 293-350 (2013) [5] R. N. S. Rowe, et al.: Characterising renaming within OCaml's module system: theory and implementation. PLDI 2019: 950-965
Projecting branches via decision broadcasting in multiparty session types in Rust	Nobuko Yoshida	Programming Languages		C	MSc	Prerequisites: Concurrency and concurrent programming languages. Familiarity with Rust would be beneficial. Background Session types [1, 2] provide a formalism to statically verify that parallel programs are free of deadlock and other inconsistencies. Multiparty session types (MPST) are session types involving any number of participants, and describe the interactions of all participants together using global session types. Global types are projected for each participant to obtain a local protocol that the participant must follow. However, if a protocol contains branching, this projection can be undefined. Focus In this project you will investigate whether it is possible to modify a concurrent protocol that would not otherwise be projectable into one that is, and establish any performance cost of doing so. Method You will modify a protocol such that it is projectable, and implement this modified version. You will benchmark the performance cost of your approach. Further work includes improving on your initial algorithm, and offering proofs that some set of otherwise-unprojectable protocols can be modified to be projectable, along with the algorithmic complexity of this modification. You will implement code in Rust. [1] Yoshida, Nobuko, and Lorenzo Gheri. ‘A Very Gentle Introduction to Multiparty Session Types’. In Distributed Computing and Internet Technology, edited by Dang Van Hung and Meenakshi D'Souza, 11969:73–93. Lecture Notes in Computer Science. Cham: Springer International Publishing, 2020. 
https://doi.org/10.1007/978-3-030-36987-3_5. [2] Coppo, Mario, Mariangiola Dezani-Ciancaglini, Luca Padovani, and Nobuko Yoshida. ‘A Gentle Introduction to Multiparty Asynchronous Session Types’. In Formal Methods for Multicore Programming, edited by Marco Bernardo and Einar Broch Johnsen, 9104:146–78. Lecture Notes in Computer Science. Cham: Springer International Publishing, 2015. https://doi.org/10.1007/978-3-319-18941-3_4.
Rust programming language for communication and distribution.	Nobuko Yoshida	Programming Languages		C		Rust has been the most loved programming language since 2016, according to Stack Overflow's annual survey. Thanks to its efficiency and memory safety, it is now one of the most popular languages for large-scale concurrent applications such as Servo, a browser engine originally developed by Mozilla, Stratis, a file system manager for Fedora, and Microsoft Azure. Memory safety is one of the core principles of Rust but does not extend to communication safety, making the implementation of deadlock-free protocols challenging. Our group has been working on implementing a library for asynchronous Rust programming, and we wish to tackle several challenges such as refinement types, dependent types, and reversible computing with Rust. Such projects can be both theoretical and practical. EXAMPLE (1): Dependent types and asynchronous message reordering In this project, we are interested in developing a dependent type theory for asynchronous message reordering [2] and using Rumpsteak to implement the theory [1]. EXAMPLE (2): Reversible computing In this project, we are interested in using Rust to implement a reversible process calculus. Reversible process calculi are widely studied, in particular for debugging [3]. We want to explore the usefulness of reversibility as a programming primitive, similar to [4]. Direct applications include fault tolerance (e.g. when merged with affine MPST [5]) or speculative execution. References: [1] Zak Cutner, Nobuko Yoshida, Martin Vassor: Deadlock-Free Asynchronous Message Reordering in Rust with Multiparty Session Types. PPoPP '22 : 246 - 261. https://dl.acm.org/doi/10.1145/3503221.3508404 [2] Silvia Ghilezan, Jovanka Pantovic, Ivan Prokic, Alceste Scalas, Nobuko Yoshida: Precise Subtyping for Asynchronous Multiparty Sessions. POPL 2021 : 16:1 - 16:28. 
https://dl.acm.org/doi/10.1145/3434297 [3] See for instance the CauDEr: https://github.com/mistupv/cauder [4] Controlling Reversibility in Higher-Order Pi, Ivan Lanese, Claudio Antares Mezzina, Alan Schmitt & Jean-Bernard Stefani, https://doi.org/10.1007/978-3-642-23217-6_20 [5] http://mrg.doc.ic.ac.uk/publications/affine-rust-programming-with-multiparty-session-types/
Rust programming language for communication and distribution.	Nobuko Yoshida	Programming Languages	B			Rust has been the most loved programming language since 2016, according to Stack Overflow's annual survey. Thanks to its efficiency and memory safety, it is now one of the most popular languages for large-scale concurrent applications such as Servo, a browser engine originally developed by Mozilla, Stratis, a file system manager for Fedora, and Microsoft Azure. Memory safety is one of the core principles of Rust but does not extend to communication safety, making the implementation of deadlock-free protocols challenging. Our group has been working on implementing a library for asynchronous Rust programming. The mini-projects focus on either: (1) surveying recent work on Rust with session types and implementing an example using [1]; or (2) surveying recent work on Rust with session types and studying a theory of asynchronous message reordering in Rust [2]. References: [1] Zak Cutner, Nobuko Yoshida, Martin Vassor: Deadlock-Free Asynchronous Message Reordering in Rust with Multiparty Session Types. PPoPP '22 : 246 - 261. https://dl.acm.org/doi/10.1145/3503221.3508404 [2] Silvia Ghilezan, Jovanka Pantovic, Ivan Prokic, Alceste Scalas, Nobuko Yoshida: Precise Subtyping for Asynchronous Multiparty Sessions. POPL 2021 : 16:1 - 16:28. https://dl.acm.org/doi/10.1145/3434297
Session types in scientific computing and machine learning	Nobuko Yoshida	Programming Languages	B			Prerequisites: Scientific Computing or Machine Learning, and Concurrent Programming. Familiarity with Rust would be beneficial. Background Computationally expensive scientific computing and machine learning workflows often require distributed systems [1]. Such workflows can suffer from concurrency bugs such as deadlock, which can be statically detected with session types [2, 3]. Focus You will apply session types to scientific computing and machine learning workflows, and discuss the role session types do and could have in this space. Method You will find scientific computing and/or ML workflows that make non-trivial use of concurrency, and reproduce these in Rust. These can be binary (two-party) protocols at first. Crucially, you will use existing tooling [4] to ensure soundness in the concurrent communication. Extensions include multiparty workflows and distributed setups. [1] Bekkerman, Ron, Mikhail Bilenko, and John Langford, eds. Scaling up Machine Learning: Parallel and Distributed Approaches. Cambridge ; New York: Cambridge University Press, 2012. [2] Yoshida, Nobuko, and Lorenzo Gheri. ‘A Very Gentle Introduction to Multiparty Session Types’. In Distributed Computing and Internet Technology, edited by Dang Van Hung and Meenakshi D'Souza, 11969:73–93. Lecture Notes in Computer Science. Cham: Springer International Publishing, 2020. https://doi.org/10.1007/978-3-030-36987-3_5. [3] Coppo, Mario, Mariangiola Dezani-Ciancaglini, Luca Padovani, and Nobuko Yoshida. ‘A Gentle Introduction to Multiparty Asynchronous Session Types’. 
In Formal Methods for Multicore Programming, edited by Marco Bernardo and Einar Broch Johnsen, 9104:146–78. Lecture Notes in Computer Science. Cham: Springer International Publishing, 2015. https://doi.org/10.1007/978-3-319-18941-3_4. [4] Cutner, Zak. ‘Rumpsteak’, n.d. https://github.com/zakcutner/rumpsteak.
Survey of Mechanisation of Distributed Protocol Specifications, Session Types	Nobuko Yoshida	Programming Languages	B			Session type systems for distributed and concurrent computations encompass concepts such as interfaces, communication protocols, and contracts. The session type of a software component specifies its expected patterns of interaction using expressive type languages, so types can be used to determine automatically whether the component interacts correctly with other components. There are several recent works on the mechanisation of session types in proof assistants (Coq, Isabelle, Idris). This project will produce an updated survey of the mechanisation of session types. [1] Example of a mechanisation paper in Coq: https://dl.acm.org/doi/10.1145/3453483.3454041 [2] Multiparty session types paper: https://link.springer.com/chapter/10.1007/978-3-030-36987-3_5
Survey of Session Types Literature	Nobuko Yoshida	Programming Languages	B			This project produces an updated literature survey of session types. Session type systems for distributed and concurrent computations encompass concepts such as interfaces, communication protocols, and contracts. The session type of a software component specifies its expected patterns of interaction using expressive type languages, so types can be used to determine automatically whether the component interacts correctly with other components. Session type systems are studied in the context of theory (including automata theory, semantics, linear logic and type theories), programming languages (Java, Scala, Go, Rust, TypeScript, PureScript, MPI-C, C, Python, Erlang, F*, F#, Haskell, OCaml, and more) and system applications. Four surveys have been published in the past, but they are already outdated. The project produces an updated survey focusing on theory, programming languages and/or applications. [1] Theory: https://dl.acm.org/doi/pdf/10.1145/2873052 [2] Tools: https://www.awesomebooks.com/book/9788793519824/behavioural-types-from-theory-to-tools-river-publishers-series-in-automation-control-and-robotics [3] Security: https://www.sciencedirect.com/science/article/pii/S2352220815000851?via%3Dihub [4] Programming Languages: https://doi.org/10.1561/2500000031
The Go Language with Generic Types	Nobuko Yoshida	Programming Languages		C		Go is a popular statically typed programming language designed and implemented by Google, and used, among others, by Uber, Netflix, and Docker. The release of Go 1.18 finally realised the long-awaited addition of generics (parametric polymorphism) to the Go type system. Much academic work has been conducted to ensure that generics are correctly implemented, but there is still more to do [1,2,3,4]. This project aims to further reduce the gap between theory and practice, allowing the Go Team to improve their implementation of generics in Go. Project 1: Formalise (and prove correct) a dictionary-passing translation from Featherweight Generic Go (FGG) to a lambda-calculus with pattern matching or another suitable low-level language. Project 2: The current model of generics in Go (FGG) does not cover important features of Go 1.18, such as type inference and type sets [5]. The proposed project would be to formalise a true FGG (including the aforementioned features) along with a correct translation. [1] https://dl.acm.org/doi/pdf/10.1145/3428217 [2] https://dl.acm.org/doi/10.1007/978-3-030-89051-3_7 [3] https://dl.acm.org/doi/10.1007/978-3-031-16912-0_7 [4] https://arxiv.org/abs/2208.06810v2 [5] https://go.googlesource.com/proposal/+/master/design/43651-type-parameters.md
Verified MPI with dependent types	Nobuko Yoshida	Programming Languages		C	MSc	Prerequisites: Concurrent Programming. Lambda calculus and types, and familiarity with dependent and linear types, would be advantageous but are not required. Background: MPI [1, 2] is a widely used message-passing protocol for parallel computing, employed, for example, in exascale computing. Whilst powerful, MPI can still be used to write programs with deadlock and other concurrency bugs. To prevent this, we can use session types [3, 4] to statically verify parallel programs. Typically, type soundness is established, at least in part, outside the program used to implement concurrency. In this project, we aim to use recent developments in type theory to unify the programming language and concurrency verification. Focus: In this project, you'll expose a concurrency API in Idris that conforms to the MPI protocol, whilst ensuring soundness via Idris’ type system. Idris [5] is a functional programming language with dependent and linear types. Method: We expect all students to create well-typed bindings for MPI, providing a synchronous API for binary sessions, with a basic set of MPI operations (send and recv, for example). We do not expect students to implement concurrency primitives themselves, but rather to create bindings to an existing implementation. Further work could extend this with multiparty sessions, asynchronous communication, and more interesting MPI operations. [1] ‘MPI: A Message-Passing Interface Standard’, 2 November 2023. https://www.mpi-forum.org/docs/mpi-4.1/mpi41-report.pdf. [2] Eijkhout, Victor. Parallel Programming in MPI and OpenMP. Vol. 2. The Art of HPC, 2022. https://theartofhpc.com/pcse.html. [3] Yoshida, Nobuko, and Lorenzo Gheri. ‘A Very Gentle Introduction to Multiparty Session Types’. In Distributed Computing and Internet Technology, edited by Dang Van Hung and Meenakshi D'Souza, 11969:73–93. Lecture Notes in Computer Science. Cham: Springer International Publishing, 2020. https://doi.org/10.1007/978-3-030-36987-3_5. [4] Coppo, Mario, Mariangiola Dezani-Ciancaglini, Luca Padovani, and Nobuko Yoshida. ‘A Gentle Introduction to Multiparty Asynchronous Session Types’. In Formal Methods for Multicore Programming, edited by Marco Bernardo and Einar Broch Johnsen, 9104:146–78. Lecture Notes in Computer Science. Cham: Springer International Publishing, 2015. https://doi.org/10.1007/978-3-319-18941-3_4. [5] Brady, Edwin. ‘Idris 2: Quantitative Type Theory in Practice’. arXiv, 1 April 2021. http://arxiv.org/abs/2104.00480.
Verifying Basics of Subtyping for Asynchronous MPST in Coq	Nobuko Yoshida, Burak Ekici	Programming Languages	B			Prerequisites: None. Background: Asynchronous Multiparty Session Types (MPST) provide a formal framework for specifying and verifying communication protocols in distributed systems that use asynchronous message passing. Unlike synchronous MPST, where communication is blocked until a message is received, the asynchronous model allows messages to be sent and buffered without requiring immediate receipt. Subtyping in asynchronous MPST facilitates optimised programs while maintaining type safety and deadlock freedom. This optimisation is achieved through message reordering, which allows messages to be sent earlier or received later. As a result, a process implementing type T can safely replace one implementing type T', as long as T is a subtype of T'. Focus: The goal of this project is to prove several examples of asynchronous subtyping in Coq and establish basic metaproperties such as transitivity and antisymmetry, building on the existing formalisation [1]. Method: Since subtyping is defined coinductively, proving these properties requires coinductive reasoning in Coq. To achieve this, we employ the parameterised coinduction technique [2], implemented by the Paco library [3]. [1] Burak Ekici and Nobuko Yoshida. "Completeness of Asynchronous Session Tree Subtyping in Coq," ITP 2024. [2] Chung-Kil Hur, Georg Neis, Derek Dreyer, and Viktor Vafeiadis. "The Power of Parameterization in Coinductive Proof," POPL 2013. [3] https://github.com/snu-sf/paco
Verifying and implementing security protocols in Rust	Nobuko Yoshida	Programming Languages		C	MSc	In this project, our goal is to develop a small working voting system like Helios [1] using session types. At the start, the server sends credentials to all eligible voters before the election. On election day, all eligible voters authenticate themselves and cast their ballots. At this point, the server encrypts the ballot and gives the voter two options: either decrypt the ballot (to check that the server is not cheating) or cast it. In case of decryption, the voter is again prompted to enter a new ballot and given the same two options, decrypt or cast. A voter can decrypt the ballot as many times as they wish, to check that the server is not cheating, before casting it. Once the ballot is cast, the server sends a confirmation to the voter and posts the encrypted ballot to a bulletin board. After the end of the election, the server combines all the ballots homomorphically, decrypts the final tally, and declares the winner. Our first goal is to capture a simple version of the Helios protocol using session types in NuScr (nuScribble) and generate an API, and later to fill in the API with a Rust implementation. This project requires knowledge of cryptography (a basic understanding of ElGamal encryption), NuScribble, and Rust. [1] https://github.com/benadida/helios-server [2] https://www.usenix.org/legacy/events/sec08/tech/full_papers/adida/adida.pdf
Verifying security protocols (Rust)	Nobuko Yoshida	Programming Languages	B			Sigma protocols are a particularly simple and efficient kind of zero-knowledge proof and have seen wide deployment; they remain a leading kind of proof in terms of both simplicity and deployment, although recent advances in succinct zero-knowledge proofs offer greater efficiency. The first efficient sigma protocol was introduced by Schnorr [2], several years before the class was defined. You can read more about sigma protocols in chapter 5 of [1]. To give you some insight into sigma protocols, we briefly discuss the most famous one, the Schnorr protocol [2]. Given some public input (G, g, q, h), where G is a cyclic group of prime order q and g and h are two generators of G, the prover claims that she knows a witness w for the statement h = g^w; the existence of such a w is immediate because g generates the group. However, does the prover know the witness w? To convince the verifier, the prover and the verifier do the following: (1) the prover picks a random number u, computes c = g^u, and sends c to the verifier; (2) the verifier picks a random challenge e and sends it to the prover; (3) the prover computes t = u + e ∗ w and sends t to the verifier. The verifier accepts if g^t = c ∗ h^e, and otherwise rejects. In this project, the goal is to use session types to capture the communication between the Prover and the Verifier. It is a very simple binary session type with three messages. Our first step will be to encode the Schnorr protocol, described above, in NuScribble, generate a Rust API from the NuScribble encoding, and fill in the rest of the cryptographic code. Later, we will develop the parallel composition, AND composition, OR composition, and NEQ composition of sigma protocols (see chapter 5 of [1]). [1] https://www.win.tue.nl/~berry/2WC13/LectureNotes.pdf [2] https://link.springer.com/article/10.1007/BF00196725
Assessing the impact of recommendation algorithms	Jun Zhao, Isobel Voysey	Human Centred Computing			MSc	Background: Algorithmic platforms are currently designed to promote user engagement and addiction. Users’ agency in interacting with these platforms is increasingly valued, including being able to opt out of personalised advertising or to provide explicit feedback to control recommendations. However, there is limited evidence showing how effectively such mechanisms work in practice. The risks of neglecting users' agency in algorithmic design are exacerbated when much more sophisticated AI models are deployed on these platforms, underpinned by complicated feedback loops or increasingly human-like responses. In this project, we will explore the transparency of recommendation algorithms on leading social media platforms and their ability to align with users' feedback, i.e. to respect user agency. Objectives: (1) Design a set of avatars with different demographic features, such as age, gender, or personal interests, in order to create the simulation data for assessing algorithmic impact. (2) Define impact benchmarks, drawing on existing child-centred recommendation algorithm impact benchmarking. (3) Produce a reproducible algorithm benchmark dataset and pipeline that others can extend to new models and domains. Methodology: Avatar Development: Generate avatars by drawing on existing methodologies and clear research goals. Benchmark Design: Operationalize dimensions into computational checks (e.g., factual consistency for accuracy and user satisfaction, harmful manipulation detection for safety, presence of response to user feedback for user autonomy). Algorithm Testing: Collect responses from simulated algorithmic interactions. Evaluation: Score outputs using a mix of automated methods and annotations. Expected Contributions: (1) A tested avatar-based algorithm auditing methodology. (2) A multi-dimensional evaluation framework to assess support for user autonomy and agency. (3) Resources for researchers and policymakers working on responsible AI in family and education contexts. References: Wood S (2024). Children and Social Media Recommender Systems: How Can Risks and Harms be Effectively Assessed in a Regulatory Context? https://papers.ssrn.com/sol3/papers.cfm?abstract_id=4978809
Human-centred benchmarking of Large Language Models	Jun Zhao	Human Centred Computing			MSc	Large language models (LLMs) are increasingly positioned as personal advisors or companions for many of us. However, their reliability is unclear, for example their suitability for supporting children’s critical thinking and curiosity, or for supporting parents seeking advice on children’s emotional regulation, sleeping patterns, or screen time management. Evaluating LLMs for such use cases is critical, yet existing benchmarks rarely address multi-dimensional qualities beyond factual correctness. This project aims to create a systematic evaluation framework to assess LLM responses for human-centred scenarios across five key dimensions: accuracy, safety, actionability, empathy/tone, and clarity. Objectives: (1) Design a synthetic dataset of realistic LLM-in-use scenarios (e.g., helping children navigate online safety and friendship, or helping parents discuss data privacy with children). (2) Define evaluation rubrics for accuracy, safety, actionability, empathy/tone, and clarity, drawing on child-centred research and digital literacy guidelines. (3) Benchmark multiple LLMs (e.g., GPT, Claude, LLaMA) on these scenarios. (4) Produce a reproducible evaluation pipeline that others can extend to new models and domains. Methodology: Scenario Development: Generate a corpus of LLM prompt dialogues using expert-informed templates. Rubric Design: Operationalize dimensions into computational checks (e.g., factual consistency for accuracy, harmful content detection for safety, presence of concrete steps for actionability, sentiment analysis for empathy, readability metrics for clarity). Model Testing: Collect responses from multiple LLMs to each scenario. Evaluation: Score outputs using a mix of automated methods and rubric-based annotation. Expected Contributions: (1) A synthetic benchmark for evaluating LLMs in real applications. (2) A multi-dimensional evaluation framework extending beyond accuracy to include social and communicative qualities. (3) Comparative insights into the strengths and weaknesses of LLMs in providing human-centred support. (4) Resources for researchers and policymakers working on responsible AI in family and education contexts. References: https://arxiv.org/abs/2505.08775 https://arxiv.org/abs/2511.04703
LLMs to support beginner cryptic crossword solvers	Jun Zhao, Isobel Voysey	Human Centred Computing			MSc	Background: Cryptic crosswords are a type of crossword puzzle where each clue includes 1) a definition and 2) wordplay by which the solver can check the answer. Surface readings are deliberately misleading and the wordplay involves combining multiple elements, often manipulating the words themselves. Cryptic crosswords remain challenging for LLMs, with fine-tuned LLMs achieving only around 10% success on a dataset taken from common UK newspapers. Recent progress has been made using a formaliser-verifier approach, but success rates still remain less than 50%. Cryptic crosswords are also challenging for beginner solvers: more complex clues rely on esoteric knowledge of abbreviations, synonyms, and terminology. Given these two realities, how could we use LLMs to support beginner solvers? Given the high likelihood of generating incorrect solutions (including correct answers with incorrect reasoning), an LLM is unlikely to be able to provide direct guidance towards the solution, but how could it be integrated into a solving process? How can we verify that a clue is accessible to a beginner solver and support them in the solving experience? Focus: We currently have two research questions that could be explored, though students are welcome to propose others: (1) How can LLMs support the solving experience for a beginner, given LLMs do not have the solution a priori? (2) How can LLMs or other approaches be used to verify clues are accessible to beginners? Children are a population of interest. Method: The project will apply and extend the reasoning-based approach by Andrews and Witteveen (2025). For the first research question, the student would apply this behind the scenes to a multi-turn conversation with a user, combining the user insights into the LLM-generated formalisation and verification at each stage. 
The main goal would be to show that user insights can correct faulty reasoning and increase solution rates. Stretch goals would include exploring different scaffolding strategies to extract and prompt user insights, given the uncertainty of the LLM-generated solution. For the second research question, the student may extend the verifier to use things like the British National Corpus and other NLP techniques to determine if steps in the wordplay would be familiar to children or beginner solvers (e.g., words in expected vocabulary, two words known to be synonyms (used synonymously in corpus), abbreviations familiar). The main goal of the project would be to apply this verification to the Wordplay dataset (a set of clues and corresponding wordplay and answers) to collate a bank of clues suitable to beginners. Stretch goals would apply the extended verifier to the Cryptonite dataset (only clues and answers, wordplay must be generated) and define some measure of clue complexity, to further categorise the bank of clues. References: Andrews, M., & Witteveen, S. (2025). A Reasoning-Based Approach to Cryptic Crossword Clue Solving. arXiv preprint arXiv:2506.04824. Code: https://github.com/mdda/cryptic-crossword-reasoning-verifier/tree/main
Topics in Algorithms, Complexity, and Combinatorial Optimisation	Standa Živný	Algorithms and Complexity Theory	B	C	MSc	Prof Živný is willing to supervise projects in the area of algorithms, complexity, and combinatorial optimisation, in particular on problems related to convex relaxations (linear and semidefinite programming relaxations), submodular functions, and algorithms for and complexity of homomorphism problems and Constraint Satisfaction Problems. Examples of supervised projects include extensions of min-cuts in graphs, analysis of randomised algorithms for graph and hypergraph colourings, and sparsification of graphs and hypergraphs. The projects would suit mathematically oriented students with an interest in rigorous analysis of algorithms and applications of combinatorics and probabilistic methods to computer science.
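
Illustrative sketch for the "Verifying security protocols (Rust)" entry above: the three Schnorr messages (commitment c, challenge e, response t) and the verifier's check g^t = c ∗ h^e can be tried out in a few lines of Python. This is only a toy illustration, not part of the project specification and not the NuScribble/Rust workflow the project asks for. The group parameters (the order-11 subgroup of the integers modulo 23, generated by 2) and all function names are assumptions chosen for readability, and are far too small to be secure.

```python
import secrets

# Assumed toy parameters: g = 2 generates the subgroup of prime order
# q = 11 inside the multiplicative group modulo p = 23. Illustration only.
P, Q, G = 23, 11, 2


def commit():
    """Prover, message 1: pick random u and return (u, c) with c = g^u."""
    u = secrets.randbelow(Q)
    return u, pow(G, u, P)


def challenge():
    """Verifier, message 2: pick a random challenge e."""
    return secrets.randbelow(Q)


def respond(u, e, w):
    """Prover, message 3: t = u + e * w, reduced modulo the group order q."""
    return (u + e * w) % Q


def verify(h, c, e, t):
    """Verifier's final check: accept iff g^t = c * h^e in the group."""
    return pow(G, t, P) == (c * pow(h, e, P)) % P


def run(w):
    """Run one honest execution for witness w and return the verdict."""
    h = pow(G, w, P)      # public statement h = g^w
    u, c = commit()       # prover -> verifier
    e = challenge()       # verifier -> prover
    t = respond(u, e, w)  # prover -> verifier
    return verify(h, c, e, t)
```

Calling run(w) with any witness w between 0 and 10 returns True, because g^(u + e ∗ w) = g^u ∗ (g^w)^e. The exchange has exactly three messages in alternating directions, which is the shape a binary session type for this protocol would describe.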
