Student Projects

Undergraduate students in the third year of the Final Honour School of Computer Science may undertake a project. Fourth-year students in the Final Honour School of Computer Science, and students on the MSc in Computer Science, are required to undertake a project. Mathematics & Computer Science undergraduates are required to undertake a Computer Science project or a Mathematics dissertation in their fourth year. Computer Science & Philosophy undergraduates may choose to undertake a Computer Science project or a Philosophy thesis in their fourth year.

This is your chance to work on your own project, something that interests and inspires you. We have put together a brief guidance document on how to do it; please click on the link below. On this site, you will also find a sortable and searchable list of projects and potential supervisors. Undergraduate students are welcome to choose one of the projects on the list, or to approach a potential supervisor and negotiate their own topic. MSc students should discuss possible topics with potential supervisors and reach an agreement.

You will be expected to make arrangements with potential supervisors between weeks 5 and 7 in Hilary Term, for project work for the following year. Please don't contact potential supervisors outside this window.

Limited equipment is available for use in projects, namely LEAP equipment. If you have any requests, please contact the Head of Academic Administration.

Important deadlines are:

All:
Monday of Week 7, Hilary Term: your online project registration survey must be completed. You will be sent a link.
MSc students:
Monday of Week 1, Trinity Term: submission deadline for your project proposal.
3rd/4th year students:
12pm on Monday of Week 4, Trinity Term: submission deadline.
MSc students:
Tuesday of Week -5 (minus 5), Michaelmas Term: submission deadline.

Project writing handbook

Sample projects

3rd year

4th year

List of projects


Project | Supervisors | Themes | Suitable for | Description
Modelling Delayed Labels in Online Continual Learning MSc

Co-supervised by Phillip Torr and Adel Bibi from the Department of Engineering Science. Online continual learning is the problem of predicting every sample in the stream while simultaneously learning from it. That is to say, the stream first presents data to be predicted, and only then reveals the labels for the model to train on. However, in most real-world scenarios, labelling is an expensive, laborious and time-consuming procedure. We therefore seek to study the sensitivity of existing continual learning algorithms when the labels of images presented at step t are only provided at step t + k. This setting poses two challenges: (1) learning from unlabelled data, and (2) modelling the delayed labels. To that end, we are interested in proposing new algorithms that, at each time step t, perform self-supervision continually while jointly training on the labelled data revealed from step t - k.
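As a minimal sketch of this setting (the learner interface below is entirely hypothetical), the stream loop might look as follows: prediction happens before training, unlabelled data is exploited immediately, and the supervised update only fires once the k-step-delayed label arrives.

```python
from collections import deque

class DummyLearner:
    """Stand-in for a continual learner; all methods are hypothetical."""
    def predict(self, x): return 0
    def self_supervised_update(self, x): pass   # learn without labels
    def supervised_update(self, x, y): pass     # learn with a revealed label

def run_stream(model, stream, k):
    """Labels for the sample seen at step t arrive only at step t + k."""
    pending = deque()                     # samples awaiting their labels
    for x_t, y_t in stream:
        model.predict(x_t)                # must predict before training on x_t
        pending.append((x_t, y_t))
        model.self_supervised_update(x_t)
        if len(pending) > k:              # label from step t - k is revealed
            x_old, y_old = pending.popleft()
            model.supervised_update(x_old, y_old)

run_stream(DummyLearner(), [(i, i % 2) for i in range(10)], k=3)
```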

Aggregation of Photovoltaic Panels Alessandro Abate Artificial Intelligence and Machine Learning, Automated Verification C MSc

The increased relevance of renewable energy sources has modified the behaviour of the electrical grid. Some renewable energy sources affect the network in a distributed manner: whilst each unit has little influence, a large population can have a significant impact on the global network, particularly in the case of synchronised behaviour. This work investigates the behaviour of a large, heterogeneous population of photovoltaic panels connected to the grid. We employ Markov models to represent the aggregated behaviour of the population, while the rest of the network (and its associated consumption) is modelled as a single equivalent generator, accounting for both inertia and frequency regulation. Analysis and simulations of the model show that it is a realistic abstraction, and quantitatively indicate that heterogeneity is necessary to enable the overall network to function in safe conditions and to avoid load shedding. This project will provide extensions of this recent research. In collaboration with an industrial partner.

Prerequisites: Computer-Aided Formal Verification, Probabilistic Model Checking

Analysis and verification of stochastic hybrid systems Alessandro Abate Automated Verification C MSc

Stochastic Hybrid Systems (SHS) are dynamical models that are employed to characterize the probabilistic evolution of systems with interleaved and interacting continuous and discrete components.

Formal analysis, verification, and optimal control of SHS models represent relevant goals because of their theoretical generality and their applicability to a wealth of studies in the Sciences and in Engineering.

In a number of practical instances, the presence of a discrete number of continuously operating modes (e.g., in fault-tolerant industrial systems), the effect of uncertainty (e.g., in safety-critical air-traffic systems), or both (e.g., in models of biological entities) advocates the use of a mathematical framework, such as that of SHS, which is structurally predisposed to model such heterogeneous systems.

In this project, we plan to investigate and develop new analysis and verification techniques (e.g., based on abstractions) that are directly applicable to general SHS models, while being computationally scalable.

Courses: Computer-Aided Formal Verification, Probabilistic Model Checking, Probability and Computing, Automata Logic and Games

Prerequisites: Familiarity with stochastic processes and formal verification

Automated verification of complex systems in the energy sector Alessandro Abate Artificial Intelligence and Machine Learning, Automated Verification C MSc

Smart microgrids are small-scale versions of centralized electricity systems, which locally generate, distribute, and regulate the flow of electricity to consumers. Among other advantages, microgrids have shown positive effects on the reliability of distribution networks.

These systems present heterogeneity and complexity arising from 1. local and volatile generation from renewables, and 2. the presence of nonlinear dynamics over both continuous and discrete variables. These factors call for the development of proper quantitative models. This framework provides the opportunity to employ formal methods to verify properties of the microgrid. In particular, the goal of the project is to focus on energy production via renewables, such as photovoltaic panels.

The project can benefit from a paid visit/internship to industrial partners.

Courses: Computer-Aided Formal Verification, Probabilistic Model Checking, Probability and Computing, Automata Logic and Games

Prerequisites: Familiarity with stochastic processes and formal verification; no specific knowledge of smart grids is needed.

Bayesian Reinforcement Learning: Robustness and Safe Training Alessandro Abate Artificial Intelligence and Machine Learning, Automated Verification C In this project we shall build on recent work on "Safe Learning" [2], which frames classical RL algorithms to synthesise policies that abide by complex tasks or objectives, whilst training safely (that is, without violating given safety requirements). Tasks/objectives for RL-based synthesis can be goals expressed as logical formulae, and thus be richer than standard reward-based goals. We plan to frame recent work by OXCAV [2] in the context of Bayesian RL, as well as to leverage modern robustness results, as in [3]. We shall pursue both model-based and model-free approaches. [2] M. Hasanbeig, A. Abate and D. Kroening, "Cautious Reinforcement Learning with Logical Constraints," AAMAS20, pp. 483-491, 2020. [3] B. Recht, "A Tour of Reinforcement Learning: The View from Continuous Control," Annual Reviews in Control, Vol. 2, 2019.
Formal verification of a software tool for physical and digital components Alessandro Abate Automated Verification MSc

We are interested in working with existing commercial simulation software that is targeted at the modelling and analysis of physical, multi-domain systems. It further encompasses the integration with related software tools, as well as the interfacing with devices and the generation of code. We are interested in enriching the software with formal verification features, envisioning the extension of the tool towards capabilities that might enable the user to raise formal assertions or guarantees on model properties, or to synthesise correct-by-design architectures. Within this long-term plan, this project shall target the formal generation of fault warnings, namely messages to the user that are related to "bad (dynamical) behaviours" or to unwanted "modelling errors". The student will be engaged in developing algorithmic solutions towards this goal, while reframing them within a formal and general approach. The project is inter-disciplinary in dealing with hybrid models involving digital and physical quantities, and in connecting the use of formal verification techniques from the computer sciences with more classical analytical tools from control engineering.

Courses: Computer-Aided Formal Verification, Software Verification. Prerequisites: Knowledge of basic formal verification.

Model learning and verification Alessandro Abate Artificial Intelligence and Machine Learning, Automated Verification C MSc

This project will explore connections between techniques from machine learning and successful approaches from formal verification. The project has two sides: a theoretical one, and a more practical one; it will be up to the student to emphasise either side depending on their background and/or interests. The theoretical part will develop existing research, for instance in one of the following two inter-disciplinary domain pairs: learning & repair, or reachability analysis & Bayesian inference. On the other hand, a more practical project will apply the above theoretical connections to a simple modelling setup in the area of robotics and autonomy.

Courses: Computer-Aided Formal Verification, Probabilistic Model Checking, Machine Learning

Precise simulations and analysis of aggregated probabilistic models Alessandro Abate Automated Verification C MSc

This project shall investigate a rich research line, recently pursued by several researchers within the Department, looking at the development of quantitative abstractions of Markovian models. Abstractions come in the form of lumped, aggregated models, which are beneficial in being easier to simulate or to analyse. Key to the novelty of this work, the proposed abstractions are quantitative, in that precise error bounds with respect to the original model can be established. As such, whatever can be shown over the abstract model also holds, formally, over the original one.
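To give a flavour of such aggregations (a toy sketch under our own simplifying assumptions, not the project's actual construction), one can lump a Markov chain over a state partition and record how much states within a block disagree, which yields a crude error bound:

```python
import numpy as np

def lump(P, partition):
    """Aggregate a Markov transition matrix P over a state partition.

    Returns the lumped matrix and a crude error bound: the largest
    L1-distance between the block-summed outgoing distributions of any
    two states sharing a block. Illustrative only; the bounds in the
    literature referenced above are tighter and model-specific.
    """
    # probability of jumping from each state into each block
    R = np.stack([P[:, blk].sum(axis=1) for blk in partition], axis=1)
    Q = np.stack([R[blk].mean(axis=0) for blk in partition], axis=0)
    eps = max(np.abs(R[i] - R[j]).sum()
              for blk in partition for i in blk for j in blk)
    return Q, eps

P = np.array([[0.5, 0.5, 0.0],
              [0.4, 0.6, 0.0],
              [0.0, 0.1, 0.9]])
Q, eps = lump(P, [[0, 1], [2]])   # here the chain is exactly lumpable: eps == 0
```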

This project, grounded on existing literature, will pursue (depending on the student's interests) extensions of this recent work, or its implementation as a software tool.

Courses: Computer-Aided Formal Verification, Probabilistic Model Checking, Machine Learning

Reinforcement Learning for Space Operations Alessandro Abate, Licio Romao Artificial Intelligence and Machine Learning, Automated Verification C Oxcav has an ongoing collaboration with the European Space Agency (ESA) that involves applying Reinforcement Learning (RL) algorithms to a satellite called OPS-SAT, which was launched in 2019 and is a flying laboratory that allows ESA's partners to test and validate new techniques (more information can be found at https://www.esa.int/Enabling_Support/Operations/OPS-SAT). This project aims at designing controllers that will be used to perform nadir pointing and sun-tracking of OPS-SAT, while meeting certain specifications (e.g., admissible nadir pointing errors). The focus will be on data-driven methods that leverage available sensor (gyroscopes, GPS, fine sun sensor, magnetometer) and actuator data within an RL architecture, to come up with a safe policy that yields adequate performance. The main tasks of the project will consist of (1) exploring an ESA platform called MUST to collect all the necessary data and (2) implementing an RL scheme that will later be deployed on the satellite. Throughout the project you will have the opportunity to work with state-of-the-art data-driven techniques that have been developed at Oxcav, under the supervision of Prof. Alessandro Abate and Dr. Licio Romao.
Safe Reinforcement Learning Alessandro Abate Automated Verification C MSc

Reinforcement Learning (RL) is a well-known architecture for synthesising policies for Markov Decision Processes (MDPs). We work on extending this paradigm to the synthesis of 'safe policies', or more generally of policies such that a linear-time property is satisfied. We convert the property into an automaton, then construct a product MDP between the automaton and the original MDP. A reward function is then assigned to the states of the product MDP, according to the accepting conditions of the automaton. With this reward function, RL synthesises a policy that satisfies the property: as such, the policy synthesis procedure is 'constrained' by the given specification. Additionally, we show that the RL procedure sets up an online value iteration method to calculate the maximum probability of satisfying the given property, at any given state of the MDP. We evaluate the performance of the algorithm on numerous numerical examples. This project will provide extensions of these novel and recent results.
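A rough sketch of the product construction just described (all data-structure names are invented for illustration, and the simple reachability-style acceptance below stands in for the richer acceptance conditions handled in the actual work):

```python
from types import SimpleNamespace

def product_mdp(mdp, dfa):
    """Build the product MDP and a reward function from automaton acceptance.

    The automaton advances on the label of the MDP state just reached;
    accepting product states receive reward 1, so maximising expected
    reward maximises the probability of satisfying the property.
    """
    trans = {((s, q), a): {(s2, dfa.delta[(q, mdp.label[s2])]): p
                           for s2, p in succs.items()}
             for (s, a), succs in mdp.trans.items()
             for q in dfa.states}
    reward = {(s, q): 1.0 if q in dfa.accepting else 0.0
              for s in mdp.states for q in dfa.states}
    return trans, reward

# Toy example: two MDP states; the automaton accepts once label 'g' is seen.
mdp = SimpleNamespace(states={0, 1},
                      label={0: 'n', 1: 'g'},
                      trans={(0, 'a'): {0: 0.5, 1: 0.5}, (1, 'a'): {1: 1.0}})
dfa = SimpleNamespace(states={'q0', 'qF'}, accepting={'qF'},
                      delta={('q0', 'n'): 'q0', ('q0', 'g'): 'qF',
                             ('qF', 'n'): 'qF', ('qF', 'g'): 'qF'})
trans, reward = product_mdp(mdp, dfa)
```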

Prerequisites: Computer-Aided Formal Verification, Probabilistic Model Checking, Machine Learning

Safety verification for space dynamics via neural-based control barrier functions Alessandro Abate Automated Verification C Barrier functions are Lyapunov-like functions that serve as certificates for the safety verification of dynamical and control models. The OXCAV group has recently worked on the automated and sound synthesis of barrier functions structured as neural nets, with an approach that uses SAT modulo theory. In this project, we shall pursue two objectives: 1. Apply recent results [1] on sound and automated synthesis of barrier certificates to models that are pertinent to the Space domain, such as models for attitude dynamics. Airbus will support this part. 2. Develop new results that extend theory and algorithms to models encompassing uncertainty, such as probabilistic models or models that are adaptive to sensed data. Airbus will support this effort by providing environments and models for experiments and testing. [1] A. Abate, D. Ahmed and A. Peruffo, "Automated Formal Synthesis of Neural Barrier Certificates for Dynamical Models," TACAS, To appear, 2021.
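For intuition, the defining conditions of a (discrete-time) barrier certificate can be falsified by sampling, as in the hypothetical sketch below; the cited work instead discharges these conditions soundly with SAT-modulo-theory solvers.

```python
import numpy as np

def check_barrier(B, f, init, unsafe, domain, n=10_000, rng=None):
    """Sample-based falsification of the standard barrier conditions:
        B(x) <= 0 on the initial set,
        B(x) >  0 on the unsafe set,
        B(f(x)) - B(x) <= 0 on the domain (discrete-time decrease).
    This sketch only hunts for counterexamples; it proves nothing.
    """
    rng = rng or np.random.default_rng(0)
    for sampler, cond in [(init,   lambda x: B(x) <= 0),
                          (unsafe, lambda x: B(x) > 0),
                          (domain, lambda x: B(f(x)) - B(x) <= 0)]:
        for x in sampler(rng, n):
            if not cond(x):
                return x           # counterexample found
    return None                    # no counterexample among the samples

# Toy 1-D system x' = 0.5x, initial |x|<=0.1, unsafe x in [1,2], B(x)=x^2-0.5.
box = lambda lo, hi: (lambda rng, n: rng.uniform(lo, hi, size=n))
print(check_barrier(lambda x: x**2 - 0.5, lambda x: 0.5 * x,
                    box(-0.1, 0.1), box(1.0, 2.0), box(-2.0, 2.0)))  # None
```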
Software development for abstractions of stochastic hybrid systems Alessandro Abate Automated Verification C MSc

Stochastic hybrid systems (SHS) are dynamical models for the interaction of continuous and discrete states. The probabilistic evolution of the continuous and discrete parts of the system is coupled, which makes analysis and verification of such systems compelling. Among specifications of SHS, probabilistic invariance and reach-avoid have received quite some attention recently. Numerical methods have been developed to compute these two specifications. These methods are mainly based on state-space partitioning and the abstraction of SHS by Markov chains, which are optimal in the sense of minimising the abstraction error for a given number of Markov states.

The goal of the project is to combine the codes that have been developed for these methods. The student should also design a user-friendly interface (for the choice of dynamical equations, parameters, methods, etc.).
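As a toy illustration of the partition-based abstraction underlying these methods (a sketch only, for a 1-D system with Gaussian noise; the project would instead build on the existing codes):

```python
import numpy as np
from scipy.stats import norm

def abstract_to_mc(drift, sigma, lo, hi, n_cells):
    """Grid abstraction of the 1-D stochastic system x+ = drift(x) + sigma*w.

    Each cell is represented by its centre; transition probabilities are
    the Gaussian kernel mass falling into each destination cell. A sketch
    of the partition-based approach described above, not the project's tool.
    """
    edges = np.linspace(lo, hi, n_cells + 1)
    centres = 0.5 * (edges[:-1] + edges[1:])
    P = np.zeros((n_cells, n_cells))
    for i, c in enumerate(centres):
        mu = drift(c)
        P[i] = norm.cdf(edges[1:], mu, sigma) - norm.cdf(edges[:-1], mu, sigma)
    P /= P.sum(axis=1, keepdims=True)   # renormalise mass that left [lo, hi]
    return centres, P

centres, P = abstract_to_mc(lambda x: 0.9 * x, 0.1, -1.0, 1.0, 20)
```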

Courses: Probabilistic Model Checking, Probability and Computing, Numerical Solution of Differential Equations

Prerequisites: Some familiarity with stochastic processes, working knowledge of MATLAB and C

AI Vulnerability Modelling Ioannis Agrafiotis, Louise Axon, Michael Goldsmith, Sadie Creese Security B C MSc This project would seek to study AI systems to identify weak points that might make implementations open to compromise and attack, and to develop corresponding threat models. The approach might involve designing test suites using attack graphs and validation in a laboratory setting.
Augmented-Reality Personal Security Solutions Ioannis Agrafiotis, Louise Axon, Michael Goldsmith, Sadie Creese Security B C MSc

This project seeks to explore the application of augmented-reality technologies to developing more usable personal-security solutions. There is a growing body of work exploring how interactive augmented-reality technologies can help users to maintain awareness of potential threats and thus protect themselves against cyber-attacks. The project will build on existing work to develop and validate the effectiveness of new augmented-reality tools. 

Chatbot Attack and Vulnerability Models Ioannis Agrafiotis, Louise Axon, Michael Goldsmith, Sadie Creese Security B C MSc As chatbots are increasingly widely used in a growing range of applications, understanding the potential for cyber-attacks to compromise the integrity and confidentiality of the data they output is critical. This project would seek to study chatbot systems to identify weak points that might make implementations open to compromise and attack, and to develop corresponding threat models. The approach might involve designing test suites using attack graphs and validation in a laboratory setting.
Deep Learning Models to Support Computer Network Defence Ioannis Agrafiotis, Louise Axon, Michael Goldsmith, Sadie Creese Security B C MSc This project would explore the application of deep-learning models such as Large Language Models (LLMs) to computer network defence. Deep learning has the potential to improve processes at various defensive stages, including the identification of assets and risk analysis; detection and mitigation of attacks; and automated response. The project scope is broad enough to allow the student to identify an area of interest.
Designing Cybersecurity Test Suites for Generative AI Systems Ioannis Agrafiotis, Louise Axon, Michael Goldsmith, Sadie Creese Security B C MSc As generative AI systems are increasingly widely used in a growing range of applications, understanding the potential for cyber-attacks to compromise the integrity and confidentiality of the data they output is critical. This project would aim to develop attack graphs for generative AI systems, and based on this design cybersecurity test suites that facilitate testing the security of generative AI systems against a range of potential attacks. The project might involve using these test suites to test the security of a range of generative AI implementations in a laboratory setting.
Generating Realistic Cybersecurity Datasets and Testbeds Ioannis Agrafiotis, Louise Axon, Michael Goldsmith, Sadie Creese Security B C MSc A common challenge in the cybersecurity research community is the limited availability of cybersecurity data: while there are some real data captures made available publicly or for research purposes, challenges of confidentiality and practicality mean the resources are limited. A particular example is the availability of network data from various environments: the networks of organisations; Industrial Internet-of-Things (IIoT) environments; and Blockchain networks. This project seeks to develop high-quality synthetic cybersecurity datasets that would benefit the cybersecurity research community. The project would begin with background research to identify the characteristics of a realistic dataset. The approach might then involve creating a cybersecurity testbed composed of a synthetic network and set of attack test suites, and capturing traffic.
Insider threat detection Ioannis Agrafiotis, Louise Axon, Michael Goldsmith, Sadie Creese Security B C MSc Organisations face an ever-growing concern over how to identify and defend against insider threats. Those who have authorised access to sensitive organisational data are placed in a position of power that could well be abused and could cause significant damage to an organisation. This could range from financial theft and intellectual property theft, through to the destruction of property and business reputation. The focus of this project is on the detection and identification of insider threats to organisations. The project builds on the group's previous research in this area.
Low-Orbit Space Cybersecurity Ioannis Agrafiotis, Louise Axon, Michael Goldsmith, Sadie Creese Security B C MSc This project would seek to study low-orbit space systems to identify weak points that might make implementations open to compromise and attack, and to develop corresponding threat models. The approach might involve designing test suites using attack graphs and validation in a laboratory setting.
Operational Security Tools for Users with Limited Cybersecurity Knowledge Ioannis Agrafiotis, Louise Axon, Michael Goldsmith, Sadie Creese Security B C MSc As digitisation increases globally, there is a growing range of different types of user required to have an understanding of the operational cybersecurity of digital systems – i.e., the ability to detect and respond to potential cyber attacks. This includes not only individual users of smartphones and computers, but also personnel in organisations from the growing range of sectors becoming digitised (e.g., transport, marine, energy, healthcare). New challenges are emerging in terms of how to enable parties to monitor the necessary information and make informed security decisions. This project seeks to identify the requirements of users in this area, and to develop operational cybersecurity tools tailored to this need. The project scope is broad, and the student may select particular sectors of interest, or particular approaches of interest such as cybersecurity visualizations or automated incident-response tools.
Ransomware detection Ioannis Agrafiotis, Louise Axon, Michael Goldsmith, Sadie Creese Security B C MSc

Ransomware attacks are increasing rapidly every year. While signature-based malware detection methods work well for detecting and stopping known malware, attackers can bypass the detection using obfuscation techniques or zero-day attacks. There is therefore a need for better detection mechanisms that are able to predict new forms of malware and react against them.

This project aims at exploring malware detection to develop a better understanding of the features that distinguish malware from normal processes or files. It will further seek to implement a machine learning (ML) tool to help detect malicious behaviour efficiently, so that malware infection and propagation can be stopped. The ML classifiers used will depend on the malware family explored as well as the data available.

Systemic-Risk Modelling Ioannis Agrafiotis, Louise Axon, Michael Goldsmith, Sadie Creese Security B C MSc A number of features of the digital environment create the potential for systemic risk, i.e., the possibility that a single event or development might trigger widespread failures and negative effects spanning multiple organisations, sectors or nations. These features include the interconnectivity of digital systems, homogeneity of devices meaning vulnerabilities may affect a large proportion of systems, and high dependency of organisations on service providers. This project would aim to develop models that allow analysis and quantification of the systemic risk that a population may face. The approaches used may include agent-based simulations and/or the development of harm trees.
Visualising Large Cybersecurity Datasets Ioannis Agrafiotis, Louise Axon, Michael Goldsmith, Sadie Creese Security B C MSc A key challenge for cybersecurity operations in organisations is the sheer scale of the data generated within organisations as they increasingly digitise their operations. The scale of data makes it challenging for personnel responsible for monitoring security to maintain a clear situational awareness of network activity, and to detect, identify and respond to potential attacks. This project focuses on identifying and implementing approaches to visualising large cybersecurity datasets in a way that is comprehensible and digestible for users. The project might include requirements gathering (e.g., through interviews) to identify users' needs; implementation and validation of visualization approaches in a laboratory setting; or testing of the developed visualizations with end users to assess utility.
Sonification for detecting cyber-attacks Louise Axon, Michael Goldsmith, Sadie Creese Security B C MSc

In the face of increasingly frequent, sophisticated and varied cyber-attacks, organisations must continuously adapt and improve their network defences. There is a need for effective tools to help security practitioners to engage with and understand the data communicated over the network, and the outputs of automated attack-detection methods. Visual (text-based and graphical) presentations of data are usually used for this. Over the last few years, the utility of sonification (the representation of data as sound) for network-security monitoring has been theorised and explored experimentally.

This project seeks to build on prior research in our research group and externally, which has experimented with the effectiveness of sonification at representing a limited range of attack types. The aim of the project is to expand on this experimentation by assessing and comparing the effectiveness of various sonification designs at representing a wider range of attack types. This will involve identifying the key indicators present in network data for each attack type, exploring how the sonification design can best represent these indicators, and experimenting with the effectiveness of the resulting sonification approaches. Attack types experimented with could include, for example, various malware and ransomware families, advanced persistent threats, and various types of flooding attack (this could use both real and synthetic attack datasets).

Threat Models for Blockchains Louise Axon, Michael Goldsmith, Sadie Creese Security B C MSc This project would seek to study Blockchain-based systems to identify weak points that might make implementations open to compromise and attack, and to develop corresponding threat models. The project might focus on the general form of Blockchains, or on particular implementations or consensus protocols. A number of different approaches might be considered, for example: formal modelling of protocols used to establish consensus or resolve conflicts; design of test suites using attack graphs and validation in a laboratory setting.
Complex Object Querying and Data Science Michael Benedikt Data, Knowledge and Action MSc We will look at query languages for transforming nested collections (collections that might contain collections). Such languages can be useful for preparing large-scale feature data for machine learning algorithms. We have a basic implementation of such a language on top of the big-data framework Spark. The goal of the project is to extend the language with iteration. One goal will be to look at how to adapt processing techniques for nested data to support iteration. Another, closer to application, is to utilize iteration to support additional steps of a data science pipeline, such as sampling.
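For a taste of the kind of nested-collection transformation involved (the schema below is invented for illustration), here is a shredding-and-aggregation step written directly against PySpark, roughly the target that such a language compiles to:

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.getOrCreate()
df = spark.createDataFrame(
    [(1, [[1.0, 2.0], [3.0]]), (2, [[4.0]])],
    ["id", "bags"])                     # a collection of collections

# shred the nested collection, then aggregate per top-level record
flat = df.select("id", F.explode("bags").alias("bag"))
feats = (flat.select("id", F.explode("bag").alias("v"))
             .groupBy("id").agg(F.avg("v").alias("mean_v")))
feats.show()
```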
Decision procedures for arithmetic with powers Michael Benedikt Data, Knowledge and Action C MSc The project will look at implication and satisfaction problems involving formulas that combine additive arithmetic, inequalities, and some form of exponentiation with a constant base: for example, systems of inequalities with constant multiples of variables and constant multiples of expressions like 2^x. We will also look at first-order logic built up from these inequalities. In the absence of exponential terms like 2^x, the theory is known as Presburger Arithmetic, and has been heavily investigated in both theory and practice. In the presence of exponential terms to a fixed base, it is known to be decidable, from work of Semenov in the late 1970s and 1980s. Recent work has identified fragments where the complexity is not too terrible, and in the process some new algorithms have been proposed. The goal of the project is to implement decision procedures for expressions of this form. A theoretically-oriented student can also work on complexity analysis. A prerequisite for the project is a good background in logic - Logic and Proof and preferably one of CAFV or KRR.
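As a toy illustration of the class of formulas involved (bounded brute-force enumeration only; the actual decision procedures, following Semenov and recent work, avoid enumeration entirely):

```python
from itertools import product

def satisfiable(constraints, n_vars, bound=16):
    """Naive satisfiability check, over the naturals, for conjunctions of
    inequalities with terms a*x_i and b*2**x_i, by bounded enumeration.
    Each constraint is a function from a variable assignment to bool.
    """
    for xs in product(range(bound), repeat=n_vars):
        if all(c(xs) for c in constraints):
            return xs
    return None

# Example: find x, y with 3*x + 2**y <= 20 and 2**x > y + 5.
print(satisfiable([lambda v: 3 * v[0] + 2 ** v[1] <= 20,
                   lambda v: 2 ** v[0] > v[1] + 5], n_vars=2))  # (3, 0)
```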
Graph Machine Learning with Neo4j Michael Benedikt Data, Knowledge and Action MSc

Co-supervisor Neo4J (industrial partner)

We have discussed a set of projects concerning data science on graphs with Brian Shi (an Oxford alumnus) of the Neo4j data science team, including (but not limited to): understanding and evaluating different initialization regimes in node embeddings; explainability and interpretability for graph ML; and the design of online graph ML algorithms/models where nodes and edges are added/modified in near real-time. A student would be working with Michael Benedikt and the Neo4j team; the project will be research-focused and any software produced would be in the public domain. The balance of experiment and theory could be tuned to the student's interests. The main prerequisite would be a very good knowledge of graph ML, at the level of Oxford's GRL course.

Interpolation Michael Benedikt Data, Knowledge and Action C MSc

Let F1 and F2 be sentences (in first-order logic, say) such that F1 entails F2: that is, any model of F1 is also a model of F2. An interpolant is a sentence G such that F1 entails G, and G entails F2, but G only uses relations and functions that appear in *both* F1 and F2.
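For a propositional toy instance (the project targets decidable fragments of first-order logic, but the definition is the same): with F1 = p ∧ q and F2 = p ∨ r, the sentence G = p mentions only the shared variable and sits between the two, as the truth-table check below confirms.

```python
from itertools import product

# F1 = p & q entails F2 = p | r; the candidate interpolant G = p uses
# only the shared vocabulary, and entailment is checked by enumeration.
F1 = lambda p, q, r: p and q
F2 = lambda p, q, r: p or r
G  = lambda p, q, r: p

entails = lambda A, B: all(not A(*v) or B(*v)
                           for v in product([False, True], repeat=3))
assert entails(F1, G) and entails(G, F2)
```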

The goal in this project is to explore and implement procedures for constructing interpolants, particularly for certain decidable fragments of first-order logic. It turns out that finding interpolants like this has applications in some database query rewriting problems.

Prerequisites: Logic and Proof (or equivalent)

Optimization of Web Query Plans Michael Benedikt Data, Knowledge and Action MSc

This project will look at how to find the best plan for a query, given a collection of data sources with access restrictions.

We will look at logic-based methods for analyzing query plans, taking into account integrity constraints that may exist on the data.

Optimized reasoning with guarded logics Michael Benedikt Data, Knowledge and Action C MSc Inference in first-order logic is undecidable, but a number of logics have appeared in the last decade that achieve decidability. Many of them are guarded logics, which include the guarded fragment of first-order logic. Although the decidability has been known for some decades, no serious implementation has emerged. Recently we have developed new algorithms for deciding some guarded logics, based on resolution, which are more promising from the perspective of implementation. The project will pursue this both in theory and experimentally. Prerequisites: A knowledge of first-order logic, e.g. from the Foundations of CS or Knowledge Representation and Reasoning courses, would be important.
Bypassing Post-Filtering Guardrails in Large Language Models. Adel Bibi, Philip Torr, Atılım Güneş Baydin Artificial Intelligence and Machine Learning MSc

This project aims to assess the reliability of post-filtering methods used to prevent language models from generating specific content. These methods involve monitoring generated content for certain keywords and pausing the model when they are detected. The project investigates whether adversaries can bypass these filters, leading to the generation of undesirable content while avoiding the filtered words. This research will include the formalization of the problem and the development of efficient solutions to address this challenge. The project helps evaluate existing safety measures and explore the potential for enhancing safeguards against automated harmful content generation. It also has applications in undermining watermarks, as these rely on less common data subsets, making detection easier.
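A minimal sketch of the kind of keyword-based post-filter under study (the generate argument is a hypothetical stand-in for a language-model call); the research question is whether adversaries can elicit equivalent undesirable content that such surface-level matching misses:

```python
import re

def post_filter(generate, prompt, banned, max_retries=3):
    """Scan generated text for banned keywords and regenerate on a hit.

    This is deliberately simplistic: paraphrases or obfuscated spellings
    of the banned terms pass straight through, which is exactly the
    weakness the project would formalise and probe.
    """
    pattern = re.compile("|".join(map(re.escape, banned)), re.IGNORECASE)
    for _ in range(max_retries):
        text = generate(prompt)
        if not pattern.search(text):
            return text
    return "[withheld by safety filter]"

# Toy usage with a canned "model":
print(post_filter(lambda p: "a harmless reply", "hi", ["secret", "exploit"]))
```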

Data Leakage in Large Language Models through Prompting Adel Bibi, Philip Torr, Atılım Güneş Baydin Artificial Intelligence and Machine Learning MSc

Many online services rely on large language models fine-tuned with concealed system prompts detailing operational behavior. Adversaries may attempt to synthesize input to uncover these hidden prompts and manipulate the model, posing security risks. Instances of online games and competitions already exist, attempting to reveal hidden passwords in system prompts. The project is about formalizing this problem and developing systematic approaches to generating adversarial prompts that reveal vulnerabilities. This includes optimizing prompts to elicit affirmative responses indicative of data leakage. The goal is to identify minimal adversarial prompts that can bypass restrictions on prompt queries, enhancing security.

AI Safety and Synthetic Data Reuben Binns, Naman Goel Human Centred Computing MSc Synthetic data refers to artificially or algorithmically generated data, as opposed to real data that is generated by real-world events. This project will explore the opportunities and risks of synthetic data, focusing on the safety of AI systems that use synthetic data in development or evaluation processes. We are interested in a range of features (reasoning, alignment, factuality, etc.) and a range of application domains (WWW, social sciences/economics, health, finance, education, and other critical sectors). The student will perform empirical research by collecting novel data, conducting experiments to understand the effects of synthetic data, and proposing (and evaluating) solutions to address the unintended effects and weaknesses of synthetic data. Interested students are welcome to contact Naman Goel to discuss or propose their own ideas related to the above (naman.goel@cs.ox.ac.uk). Prerequisites: Good understanding of and hands-on experience with machine learning, mathematical maturity, proficiency in Python. Experience with large language models (huggingface, running local models, using APIs of closed models, etc.) or the ability to learn quickly. An understanding of potential real-world harms will be a big plus. References: 1. Liu, Ruibo, et al. "Best practices and lessons learned on synthetic data." First Conference on Language Modeling. 2024. https://openreview.net/pdf?id=OJaWBhh61C
Applications of Complex-Order Fractional Diffusion to Physical and Biological Processes Alfonso Bueno-Orovio Computational Biology and Health Informatics B C Fractional differential equations (where standard derivatives are generalised to derivatives of non-integer, real order) have become a fundamental modelling approach for understanding and simulating the many aspects of spatial heterogeneity of complex materials and systems [1,2]. Very recently [3], we have been able to further extend such ideas to derivatives of complex order. This project aims to further explore the capabilities of these novel complex-order fractional operators in modulating the spatiotemporal dynamics of different systems of interest, and to investigate how they can advance the description of multiscale transport phenomena in physical and biological processes highly influenced by the heterogeneity of complex media. Students will investigate, by means of scientific computing and modelling and simulation, how complex-order fractional operators modulate the response of different systems of physical and biological interest. Different research options will be available, including the analysis of: (i) novel Turing patterns in physical and biological processes of morphogenesis; (ii) periodic precipitation patterns in the manufacturing of micro- and nano-structures; or (iii) models of population dynamics and epidemiology. 1. Fractional diffusion models of cardiac electrical propagation: role of structural heterogeneity in dispersion of repolarization. https://doi.org/10.1098/rsif.2014.0352 2. Fourier spectral methods for fractional-in-space reaction-diffusion equations. https://doi.org/10.1007/s10543-014-0484-2 3. The Complex-Order Fractional Laplacian: Stable Operators, Spectral Methods, and Applications to Heterogeneous Media. https://doi.org/10.20944/preprints202107.0569.v1
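As a small illustration of the spectral approach of reference 2, extended to a complex order in the spirit of reference 3 (a sketch assuming periodic boundary conditions; not the project's codebase), the fractional Laplacian can be applied as a Fourier multiplier:

```python
import numpy as np

def fractional_laplacian_1d(u, L, alpha):
    """Apply (-Laplacian)^(alpha/2) on a periodic 1-D grid of length L.

    The operator acts in Fourier space as multiplication by |k|^alpha;
    alpha may be complex, in which case the real part of the output is
    typically the physically relevant field.
    """
    n = u.size
    k = 2 * np.pi * np.fft.fftfreq(n, d=L / n)   # angular wavenumbers
    symbol = np.zeros(n, dtype=complex)
    nz = k != 0
    symbol[nz] = (k[nz] ** 2) ** (alpha / 2)     # zero mode stays zero
    return np.fft.ifft(symbol * np.fft.fft(u))

x = np.linspace(0, 2 * np.pi, 128, endpoint=False)
v = fractional_laplacian_1d(np.sin(x), L=2 * np.pi, alpha=1.5 + 0.2j)
```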
High Throughput, High Resolution, and High Frame-Rate Analysis of Cellular Heart Function Alfonso Bueno-Orovio Computational Biology and Health Informatics B C MSc

Co-supervised by Christopher Toepfer

Cardiomyocytes are the cells responsible for generating contractile force in the heart. They are highly adaptable and alter their function in response to many stimuli (electrical stimuli, calcium, contraction, organisation of cellular structures, metabolism, and transcriptional signatures). These processes are all interrelated and dynamic in live cardiomyocytes, but we lack the ability to visualise them simultaneously and correlate them in real-time. We have previously developed specialised software [1, 2] to automate high-throughput analysis of cardiomyocyte function. However, these tools are not optimised to work with large files, and cause data fragmentation as they do not work in tandem.

To develop novel software solutions for three-channel, high-throughput, high-resolution, and high-frame-rate imaging and analysis of beating cardiomyocytes.

Imaging live cardiomyocytes during contraction and relaxation necessitates high frame rates (100s of FPS), which must be twinned with high resolutions (<70 nm per pixel), while also allowing the simultaneous analysis of multiple well plates for high throughput. In collaboration with the Oxford Department of Cardiovascular Medicine, and building on extensive datasets of high-resolution, high-rate fluorescent imaging of live cardiomyocytes, students taking this project will develop novel open-source software solutions integral to discovery and translational cardiovascular research. Different research options would be available, such as:

  • optimisation and/or development of novel algorithms for the analysis of sarcomere contractility and relaxation;
  • development of an analysis pipeline for masking, segmenting, and fitting action potentials from a variety of cellular systems;
  • development of an analysis pipeline for the automated analysis of mitochondrial shape, abundance, and cellular metabolites.

[1] CalTrack: High-Throughput Automated Calcium Transient Analysis in Cardiomyocytes. https://doi.org/10.1161/CIRCRESAHA.121.318868

[2] SarcTrack: an adaptable software tool for efficient large-scale analysis of sarcomere function in hiPSC-cardiomyocytes. https://doi.org/10.1161/CIRCRESAHA.118.314505

Interpretable Cardiac Anatomy and Electrophysiology Modelling in Paediatric Patients Using Variational Mesh Autoencoders Alfonso Bueno-Orovio, Vicente Grau Computational Biology and Health Informatics C MSc

Cardiac anatomy and function vary considerably across the human population, with important implications for clinical diagnosis, treatment planning, and prognosis of disease. Consequently, many computer-based approaches have been developed to capture this variability for a wide range of applications, including explainable cardiac disease detection and prediction, dimensionality reduction, cardiac shape analysis, and the generation of virtual heart populations. Here, we will leverage these technologies to further investigate connections between cardiac anatomy and electrocardiographic (ECG) signals in paediatric patients with Hypertrophic Cardiomyopathy (HCM), the most common hereditary heart disease and leading cause of sudden cardiac death in the young and competitive athletes.

To investigate connections between patterns of cardiac hypertrophy and their manifestation in the ECG in HCM paediatric patients. Generation of virtual populations of HCM paediatric hearts, capturing the variability in cardiac shape and ECG signals in this group of high-risk patients.

We have recently proposed novel variational mesh autoencoder (mesh VAE) methods as a geometric deep learning approach to model population-wide variations in cardiac anatomy [1], which enable direct processing of surface mesh representations of the cardiac anatomy in an efficient manner. These methods can also be extended to simultaneously learn the ECG signal of a given patient [2]. Exploiting cardiac Magnetic Resonance Imaging (MRI) and ECG resources from established clinical collaborators, this project will explore the quality and interpretability of the mesh VAE's latent space for the reconstruction and synthesis of multi-domain cardiac signals in paediatric patients with HCM. It will also investigate the method's ability to generate realistic virtual populations of paediatric cardiac anatomies and ECG signals in terms of multiple clinical metrics in this group of patients.

[1] Interpretable Cardiac Anatomy Modeling Using Variational Mesh Autoencoders. https://doi.org/10.3389/fcvm.2022.983868

[2] Multi-Domain Variational Autoencoders for Combined Modelling of MRI-Based Biventricular Anatomy and ECG-Based Cardiac Electrophysiology. https://doi.org/10.3389/fphys.2022.886723

Pre-requisites: Computational Medicine (recommended)

Modelling and Simulation of Genetic Heart Disease Alfonso Bueno-Orovio Computational Biology and Health Informatics C MSc

The goal of Precision Medicine is to provide therapies tailored to each patient. This need is especially urgent in inherited heart disease, where current drug therapy mostly relies on symptom relief. In this regard, computational modelling and simulation constitute a flexible platform for investigating the mechanisms underlying arrhythmic risk in these patients, and for the virtual screening of pharmacological therapy of both established and novel agents.

To investigate mechanisms of arrhythmic risk and/or response to therapy using multiscale computational models (cell to whole-organ) of inherited heart disease.

Students will investigate, by means of computational modelling and simulation, how changes in structure and cellular function modulate the risk of life-threatening events and/or response to pharmacological therapy in patients with hypertrophic cardiomyopathy (HCM). Different research options will be available, including:
(i) arrhythmia mechanisms in HCM due to calcium dysregulation and spontaneous calcium release;
(ii) modelling of impaired energetics and impaired metabolism in HCM;
(iii) role of abnormalities in tissue microstructure as precursors of arrhythmic triggers;
(iv) HCM mutation-specific safety and efficacy of the latest pharmacological therapies (myosin inhibitors) when accounting for disease progression;
(v) HCM mutation-specific safety and efficacy of genetic therapy;
(vi) modelling and simulation of paediatric patients.

[1] Mechanisms of pro-arrhythmic abnormalities in ventricular repolarisation and anti-arrhythmic therapies in human hypertrophic cardiomyopathy. https://doi.org/10.1016/j.yjmcc.2015.09.003

[2] Improving the clinical understanding of hypertrophic cardiomyopathy by combining patient data, machine learning and computer simulations: A case study. https://doi.org/10.1016/j.morpho.2019.09.001

[3] Electrophysiological and contractile effects of disopyramide in patients with obstructive hypertrophic cardiomyopathy: a translational study. https://doi.org/10.1016/j.jacbts.2019.06.004

[4] Mechanism based therapies enable personalised treatment of hypertrophic cardiomyopathy. https://doi.org/10.1038/s41598-022-26889-2

Pre-requisites: Computational Medicine (recommended)

Frontiers in Graph Representation Learning Ismail Ilkan Ceylan Artificial Intelligence and Machine Learning C MSc

There are various projects available in the theme of 'Graph Representation Learning' (e.g., graph neural networks); please get in touch for the most up-to-date information. There are multiple projects aiming at (i) designing new graph neural network models to overcome some well-known existing limitations of graph neural networks (e.g., oversmoothing, oversquashing, inexpressiveness); (ii) improving our overall theoretical understanding of graph representation learning models (foundational aspects related to probability theory, graph theory, logic, as well as algorithmic aspects); (iii) improving the inductive bias of graph neural networks for machine learning over different types of graphs (e.g., directed, relational graphs); (iv) exploring tasks beyond widely studied ones, such as graph generation, from a foundational aspect; and (v) novel applications of graph representation learning. It is important to identify a project that matches the student's background, so the details are too nuanced to include in a project description. There are various industrial and academic collaborators, depending on the specific direction. Most of the necessary background information is covered in the graph representation learning course offered in the Department in MT.

Detection of cycles in the cryptographic protocol verifier ProVerif’s saturation procedure Vincent Cheval Security B C MSc

Security protocols aim at securing communications. They are used in various applications (e.g. establishment of secure channels over the Internet, secure messaging, electronic voting, mobile communications, etc.). Their design is known to be error-prone, and flaws are difficult to fix once a protocol is largely deployed. Hence a common practice is to analyze the security of a protocol using formal techniques and in particular automatic tools. This approach has led to the development of several verification tools such as ProVerif (https://proverif.inria.fr), Tamarin (https://tamarin-prover.github.io) and DeepSec (https://deepsec-prover.github.io) that rely on symbolic models to express protocols and their security properties. Symbolic, automated verification has been a very successful approach for revealing vulnerabilities or proving the absence of entire families of attacks. For instance, these tools were used to verify the security of several major protocols (e.g. TLS 1.3, 5G protocols, the Belenios electronic voting protocol).

The problem of verifying security properties (even simple secrecy) is undecidable in general, especially when considering an unlimited number of sessions. For this reason, tools such as ProVerif and Tamarin may not terminate, as they allow users to model an unbounded number of sessions. In the vein of [1], many recent features have been introduced in ProVerif to help the tool terminate. However, most of these features require some manual intervention on the part of users, and it can be quite challenging to understand which feature to use depending on the protocol under study. Internally, ProVerif's core algorithm consists of saturating a set of Horn clauses by resolution with free selection. In practice, most, if not all, cases of non-termination occur due to the saturation procedure entering an infinite loop. In this project, we aim to develop a procedure that can automatically detect a large class of loops and appropriately apply some of ProVerif's features to exit the detected loops.

Objectives:

  • get familiar with ProVerif's saturation procedure (see [1]; the paper [2] is also useful for a broader view of ProVerif's theory)
  • get familiar with the tool itself and the current benchmark of non-terminating cases
  • design the new algorithm for loop detection and for deciding on appropriate countermeasures
  • implement the algorithm and test it on the provided benchmarks to evaluate its efficiency.

The library will be incorporated into the official release of ProVerif.

Bibliography: Reading the papers is not mandatory to complete the project, but they will provide good context.

[1] Bruno Blanchet, Vincent Cheval, and Véronique Cortier. ProVerif with lemmas, induction, fast subsumption, and much more. In Proceedings of the 43rd IEEE Symposium on Security and Privacy (S&P’22). IEEE Computer Society Press, May 2022.

[2] Bruno Blanchet. Modeling and Verifying Security Protocols with the Applied Pi Calculus and ProVerif. Foundations and Trends in Privacy and Security, 1(1-2):1-135, October 2016.

Pre-requisites: Logic and Proof, Functional Programming

Generation and verification of Proof Certificates for cryptographic protocols Vincent Cheval Security C MSc

Security protocols aim at securing communications. They are used in various applications (e.g. establishment of secure channels over the Internet, secure messaging, electronic voting, mobile communications, etc.). Their design is known to be error-prone, and flaws are difficult to fix once a protocol is largely deployed. Hence a common practice is to analyze the security of a protocol using formal techniques and in particular automatic tools. This approach has led to the development of several verification tools such as ProVerif (https://proverif.inria.fr), Tamarin (https://tamarin-prover.github.io) and DeepSec (https://deepsec-prover.github.io) that rely on symbolic models to express protocols and their security properties. Symbolic, automated verification has been a very successful approach for revealing vulnerabilities or proving the absence of entire families of attacks. For instance, these tools were used to verify the security of several major protocols (e.g. TLS 1.3, 5G protocols, the Belenios electronic voting protocol).

Due to the critical nature of these security protocols, it is crucial to guarantee that possible bugs in these automatic tools do not affect the correctness of the security proofs. In addition, even though the efficiency of these tools has significantly improved over the years, the verification of real-life protocols may require a large amount of time and memory to terminate. For instance, when the verification requires more than 200GB of memory, as was the case when verifying privacy properties of an extension of TLS 1.3, the reproducibility of the verification results is severely affected. To tackle this problem, the approach taken in this project is to develop proof certificates that can be verified by a machine-checked verifier. One of the main difficulties is to ensure that certificates are sufficiently small in practice (so that they can be easily exchanged) while guaranteeing a relatively small verification time. The certificate generation and verification will be based on ProVerif's internal procedure.

Objectives:

  • define the formal content of the certificate to be generated by ProVerif
  • propose an algorithm to verify the certificate and prove its correctness (by hand) 
  • implement the generation of certificate in ProVerif 
  • implement the prover and its proof of correctness in F* [1]

We do not expect the last objective to be fully completed, as the implementation in F* may be extremely time-consuming; however, a partial implementation will be developed.

[1] https://www.fstar-lang.org

[2] Bruno Blanchet, Vincent Cheval, and Véronique Cortier. ProVerif with lemmas, induction, fast subsumption, and much more. In Proceedings of the 43rd IEEE Symposium on Security and Privacy (S&P’22). IEEE Computer Society Press, May 2022.

Pre-requisites: Logic and Proof, Functional Programming, (CAFV is a plus)

Handling cryptographic primitives with complex algebraic properties Vincent Cheval Security B C MSc

Security protocols aim at securing communications. They are used in various applications (e.g. establishment of secure channels over the Internet, secure messaging, electronic voting, mobile communications, etc.). Their design is known to be error-prone, and flaws are difficult to fix once a protocol is largely deployed. Hence a common practice is to analyze the security of a protocol using formal techniques and in particular automatic tools. This approach has led to the development of several verification tools such as ProVerif (https://proverif.inria.fr), Tamarin (https://tamarin-prover.github.io) and DeepSec (https://deepsec-prover.github.io) that rely on symbolic models to express protocols and their security properties. Symbolic, automated verification has been a very successful approach for revealing vulnerabilities or proving the absence of entire families of attacks. For instance, these tools were used to verify the security of several major protocols (e.g. TLS 1.3, 5G protocols, the Belenios electronic voting protocol).

Internally, these tools express cryptographic messages using terms. The algebraic properties of the cryptographic primitives are then expressed by means of an equational theory on terms. For example, the equation dec(enc(x,y),y) = x expresses that deciphering ("dec") with a key "y" a cipher of a message "x" under the same key "y", i.e. "enc(x,y)", allows one to retrieve the message "x". Equality of terms modulo the equational theory is one of the core problems that these tools must solve. Although unification algorithms are known for many primitives with complex algebraic properties, such as finite variant theories, XOR with homomorphism, abelian groups, etc., these algorithms only work when the equations do not mix primitives, in other words when the equational theories are disjoint. We aim to develop new, efficient unification algorithms for the union of non-disjoint equational theories. Proofs of correctness would first be done by hand, but an implementation in F* [1] would be of interest (or Coq [2]).
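For context, plain syntactic unification (the empty-theory base case that the project's algorithms generalise to unification modulo equations such as dec(enc(x,y),y) = x) can be sketched as follows; the term encoding here, strings for variables and tuples for function applications, is ours, not ProVerif's:

```python
def unify(s, t, subst=None):
    """Syntactic (empty-theory) unification; no occurs-check in this sketch."""
    subst = dict(subst or {})
    def walk(u):                        # resolve a variable under subst
        while isinstance(u, str) and u in subst:
            u = subst[u]
        return u
    stack = [(s, t)]
    while stack:
        a, b = map(walk, stack.pop())
        if a == b:
            continue
        if isinstance(a, str):
            subst[a] = b                # bind a variable
        elif isinstance(b, str):
            subst[b] = a
        elif a[0] == b[0] and len(a) == len(b):
            stack.extend(zip(a[1:], b[1:]))   # decompose f(...) = f(...)
        else:
            return None                 # clash: different function symbols
    return subst

# unify dec(enc(x, k), k) with dec(z, k): binds z to enc(x, k)
print(unify(("dec", ("enc", "x", "k"), "k"), ("dec", "z", "k")))
```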

Objectives:

  • get familiar with the currently known algorithms for unification in the union of disjoint equational theories [3]
  • create a semi-decision algorithm for unification in a general union of equational theories and establish its limitations; it will be based on a general framework that extends the work of [4]
  • implement the resulting algorithm and evaluate its efficiency
  • implement the correctness proof in F* or Coq

The implemented library will be incorporated in the tool ProVerif.

[1] https://www.fstar-lang.org

[2] https://coq.inria.fr/

[3] Franz Baader, Klaus U. Schulz. Unification in the Union of Disjoint Equational Theories: Combining Decision Procedures. Journal of Symbolic Computation, Volume 21, Issue 2, 1996.

[4] Bruno Blanchet, Martín Abadi, and Cédric Fournet. Automated Verification of Selected Equivalences for Security Protocols. Journal of Logic and Algebraic Programming, 75(1):3-51, February-March 2008.

Pre-requisites: Logic and Proof, Functional Programming

Multicore overhaul of the cryptographic protocol verifier ProVerif Vincent Cheval Security B

Security protocols aim at securing communications. They are used in various applications (e.g. establishment of secure channels over the Internet, secure messaging, electronic voting, mobile communications, etc.). Their design is known to be error-prone, and flaws are difficult to fix once a protocol is largely deployed. Hence a common practice is to analyze the security of a protocol using formal techniques and in particular automatic tools. This approach has led to the development of several verification tools such as ProVerif (https://proverif.inria.fr), Tamarin (https://tamarin-prover.github.io) and DeepSec (https://deepsec-prover.github.io) that rely on symbolic models to express protocols and their security properties. Symbolic, automated verification has been a very successful approach for revealing vulnerabilities or proving the absence of entire families of attacks. For instance, these tools were used to verify the security of several major protocols (e.g. TLS 1.3, 5G protocols, the Belenios electronic voting protocol).

Of the three aforementioned tools, ProVerif is generally considered to be the most efficient. It is also the only tool that does not rely on distributed or parallel computing. ProVerif's internal procedure is based on the saturation of sets of Horn clauses, which is inherently more suited to sequential computing. However, several time-consuming algorithms used in the saturation procedure can greatly benefit from parallel computing, such as Horn clause subsumption checking, and unification and matching of terms modulo an equational theory. This project aims to rework many of the algorithms used in ProVerif using the multicore programming framework provided in OCaml 5. We can expect significant speedups since, for instance, Horn clause subsumption checking usually represents 80 to 90% of the total execution time of ProVerif. One of the difficulties, however, will be to appropriately combine parallel algorithms with other already-implemented optimization features that are specifically tailored to sequential computing (such as hash-consing techniques).
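To convey the shape of the intended parallelisation (sketched here in Python for illustration, whereas the project targets OCaml 5 domains, and simplified to ground clauses, for which subsumption reduces to set inclusion):

```python
from multiprocessing import Pool

def subsumed(args):
    """Ground-clause subsumption: C subsumes D iff C is a subset of D.
    First-order subsumption requires a substitution search; the subset
    test is the ground special case that keeps this sketch self-contained.
    """
    clause, existing = args
    return any(c <= clause for c in existing)

def filter_new_clauses(new, existing, workers=4):
    """Drop newly derived clauses subsumed by existing ones, checking the
    candidates in parallel: the shape of the speedup sought, here with
    Python's multiprocessing rather than OCaml 5 domains.
    """
    with Pool(workers) as pool:
        flags = pool.map(subsumed, [(d, existing) for d in new])
    return [d for d, f in zip(new, flags) if not f]

if __name__ == "__main__":
    existing = [frozenset({"p"}), frozenset({"q", "r"})]
    new = [frozenset({"p", "s"}), frozenset({"t"})]
    print(filter_new_clauses(new, existing))   # [frozenset({'t'})]
```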

Objectives:        

  • get familiar with ProVerif's saturation procedure (see [1]; the paper [2] is also useful for a broader view of ProVerif's theory)
  • identify the components of ProVerif's saturation procedure that can benefit from concurrent programming
  • design the new algorithms for these components
  • evaluate the impact on the overall execution time (a benchmark is provided).

The library will be incorporated into the official release of ProVerif.

Bibliography: Reading the papers is not mandatory to complete the project, but they will provide good context.

[1] Bruno Blanchet, Vincent Cheval, and Véronique Cortier. ProVerif with lemmas, induction, fast subsumption, and much more. In Proceedings of the 43rd IEEE Symposium on Security and Privacy (S&P’22). IEEE Computer Society Press, May 2022.

[2] Bruno Blanchet. Modelling and Verifying Security Protocols with the Applied Pi Calculus and ProVerif. Foundations and Trends in Privacy and Security, 1(1-2):1-135, October 2016.

Pre-requisites: Logic and Proof, Functional Programming, Concurrent Programming

Reducing memory consumption of the cryptographic protocol verifier ProVerif with hash consing techniques Vincent Cheval Security B C MSc

Security protocols aim at securing communications. They are used in various applications (e.g. establishment of secure channels over the Internet, secure messaging, electronic voting, mobile communications, etc.). Their design is known to be error-prone, and flaws are difficult to fix once a protocol is widely deployed. Hence a common practice is to analyze the security of a protocol using formal techniques and in particular automatic tools. This approach has led to the development of several verification tools such as ProVerif (https://proverif.inria.fr), Tamarin (https://tamarin-prover.github.io) and DeepSec (https://deepsec-prover.github.io) that rely on symbolic models to express protocols and their security properties. Symbolic, automated verification has been a very successful approach for revealing vulnerabilities or proving the absence of entire families of attacks. For instance, these tools were used to verify the security of several major protocols (e.g. TLS 1.3, 5G protocols, Belenios electronic voting protocol).

While the recent overhaul of ProVerif [1] significantly improved its efficiency, it did not really affect its memory consumption. Indeed, complex real-life protocols such as TLS require between 10 and 100GB of memory, depending on the type of security properties being proved. This memory consumption is a crucial problem, as it reaches the limits of our computers' capabilities. We aim to change the internal representation of messages within ProVerif by using hash consing techniques [2], which maximise memory sharing. In particular, two semantically equal messages will then be guaranteed to be physically equal as well (they will point to the same address in memory). This will require reworking most of the implemented algorithms operating on messages (resolution, unification, matching, subsumption, ...). Since ProVerif's internal algorithm is mostly based on the saturation of Horn clauses, maximal memory sharing within each Horn clause, but also across the set of Horn clauses generated by ProVerif, will considerably reduce ProVerif's memory consumption. Maximising memory sharing within one clause is fairly straightforward; for sets of Horn clauses, however, the problem becomes much harder due to the presence of variables that sometimes need to be considered distinct within two Horn clauses.
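
A minimal sketch of the core idea of hash consing, transliterated to Python (the actual work would be in OCaml, following the type-safe interface of [2]): a global table ensures each distinct term is allocated exactly once, so structural equality coincides with physical equality.

```python
# Minimal hash-consing sketch: every distinct term is built once, so
# semantic equality becomes pointer equality and equality checks are O(1).
_table = {}

def mk(symbol, *args):
    """Return the unique shared node for symbol(args). Arguments must
    already be hash-consed, so the key can use their identities (id)
    rather than their structure, keeping construction O(1) per node."""
    key = (symbol, tuple(id(a) for a in args))
    node = _table.get(key)
    if node is None:
        node = (symbol, *args)
        _table[key] = node
    return node

x = mk("x")
t1 = mk("enc", x, mk("k"))
t2 = mk("enc", x, mk("k"))
print(t1 is t2)        # True: structurally equal terms share one memory cell
print(len(_table))     # 3 nodes in total: x, k, enc(x, k)
```

The hard part that the project targets is the clause-set level: variables that must be kept distinct across clauses prevent this kind of naive global sharing.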

Objectives:         

  • get familiar with hash consing techniques and implement a library for managing the new representation of terms.
  • study the classical algorithm for unification on DAG terms and propose an adaptation for our new structure of terms.
  • create similar algorithms for resolution, matching, subsumption, and unification modulo an equational theory
  • propose an algorithm for sharing memory across a set of Horn clauses and find a complexity upper bound for the problem
  • implement these algorithms and test them on provided benchmarks to evaluate their efficiency.

Bibliography: Reading the papers and slides in the bibliography is not mandatory to complete the project, but they will provide good context.

[1] Bruno Blanchet, Vincent Cheval, and Véronique Cortier. ProVerif with lemmas, induction, fast subsumption, and much more. In Proceedings of the 43rd IEEE Symposium on Security and Privacy (S&P’22). IEEE Computer Society Press, May 2022.

[2] Jean-Christophe Filliâtre, Sylvain Conchon. Type-safe modular hash-consing. ML 2006: 12-19

[3] https://www3.risc.jku.at/education/courses/ss2018/unification/slides/02_Syntactic_Unification_Improved_Algorithms_handout.pdf

Pre-requisites: Logic and Proof, Functional Programming

Verifying privacy-type properties in a probabilistic setting Vincent Cheval Security C MSc

Security protocols aim at securing communications. They are used in various applications (e.g. establishment of secure channels over the Internet, secure messaging, electronic voting, mobile communications, etc.). Their design is known to be error-prone, and flaws are difficult to fix once a protocol is widely deployed. Hence a common practice is to analyze the security of a protocol using formal techniques and in particular automatic tools. This approach has led to the development of several verification tools such as ProVerif (https://proverif.inria.fr), Tamarin (https://tamarin-prover.github.io) and DeepSec (https://deepsec-prover.github.io) that rely on symbolic models to express protocols and their security properties. Symbolic, automated verification has been a very successful approach for revealing vulnerabilities or proving the absence of entire families of attacks. For instance, these tools were used to verify the security of several major protocols (e.g. TLS 1.3, 5G protocols, Belenios electronic voting protocol).

The DeepSec prover specializes in the verification of security properties that can be expressed as behavioural equivalences, such as anonymity, vote privacy, unlinkability, strong secrecy, etc. It currently implements a procedure for checking "trace equivalence" between processes. The paper describing the theory behind DeepSec also describes a theoretical procedure for proving two stronger behavioural equivalences, namely similarity and bisimilarity. The description of these procedures is still considered "theoretical" as they are heavily non-deterministic, which prevents efficient implementations. One part of this project thus aims to design an effective and efficient procedure for checking bisimilarity and similarity. Currently, symbolic models are purely non-deterministic. For instance, the random generation of numbers is abstracted away, and random numbers are assumed to be impossible for the attacker to guess. While this is generally sensible, it is not always possible to eliminate probabilities altogether: some protocols specifically rely on probabilistic coin flips in their specification, and a recent paper [2] has shown that giving the attacker the ability to use probabilistic choices can enable them to break behavioural equivalence. This paper provides the general probabilistic model, but only describes a procedure for checking probabilistic equivalence in a restrictive case, based on DeepSec's internal procedure. The second part of this project will aim to design a (potentially theoretical) procedure for verifying probabilistic equivalence in the general setting.
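
For intuition only, here is a naive partition-refinement check of bisimilarity on a tiny finite transition system (toy states and actions; DeepSec's symbolic, cryptographic setting is vastly harder, but the fixpoint flavour is similar):

```python
from collections import defaultdict

trans = defaultdict(set)       # (state, action) -> set of successor states
for s, a, t in [(0, "a", 1), (0, "a", 2), (1, "b", 3), (2, "b", 3)]:
    trans[(s, a)].add(t)
states = {0, 1, 2, 3}
actions = {"a", "b"}

def signature(s, blocks):
    """Which blocks can s reach, per action? States with equal signatures
    stay in the same block during refinement."""
    return frozenset(
        (a, frozenset(i for i, b in enumerate(blocks) if trans[(s, a)] & b))
        for a in actions
    )

blocks = [set(states)]          # start with one block, refine to a fixpoint
while True:
    new = defaultdict(set)
    for s in states:
        new[signature(s, blocks)].add(s)
    new_blocks = list(new.values())
    if len(new_blocks) == len(blocks):
        break
    blocks = new_blocks

print(blocks)   # states 1 and 2 end up in the same block: they are bisimilar
```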

Objectives:         

  • get familiar with the internal procedure of DeepSec [1]
  • design and implement an efficient algorithm for checking similarity/bisimilarity
  • design an algorithm for checking probabilistic equivalences in the general setting
  • prove the correctness of all designed algorithms

The implemented library will be incorporated in the tool DeepSec.

[1] V. Cheval, S. Kremer and I. Rakotonirina. DEEPSEC: Deciding Equivalence Properties in Security Protocols - Theory and Practice. In Proceedings of the 39th IEEE Symposium on Security and Privacy (S&P'18), IEEE Computer Society Press, 2018.

[2] Vincent Cheval, Raphaëlle Crubillé and Steve Kremer. Symbolic protocol verification with dice. Journal of Computer Security, 31(5):501-538, 2023.

Pre-requisites: Logic and Proof, Functional Programming, Probabilistic Model Checking

Detecting and Modelling Mirrors for 3D Scene Reconstruction Ronnie Clark Artificial Intelligence and Machine Learning, Systems C MSc

Elements such as mirrors and glass still present a major difficulty for simultaneous localisation and mapping (SLAM) and 3D reconstruction algorithms [1]. This project will address this challenge by developing a method for accurately identifying mirrors in a scene and accounting for their unique properties in the reconstruction process. The goal will be to improve the fidelity and accuracy of the 3D models. The work will involve aspects of computer vision, geometric modelling, and possibly ray tracing techniques.

[1] https://www.thomaswhelan.ie/Whelan18siggraph.pdf

Pre-requisites: Suitable for those who have taken a course in machine learning or computer graphics

Enhanced Single Image Depth Prediction using a Percentile-based Loss Ronnie Clark Artificial Intelligence and Machine Learning, Systems C MSc

Single image depth perception is a common task in computer vision which involves predicting the distance from the camera to the scene for each pixel in an image. This problem has broad applications in autonomous systems, virtual and augmented reality, as well as graphics. However, a significant challenge in single image depth prediction is the estimation of the absolute scene scale. Therefore, most depth estimation methods focus on predicting relative depth rather than absolute measurements [1]. This project aims to explore a new relative depth parameterisation based on the percentile ranking of depth values. The student will be expected to train a model using this parameterisation and evaluate its effectiveness by comparing its performance with existing methods on widely-used datasets such as NYUv2.
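
One plausible reading of such a parameterisation (the exact formulation is for the student to design) is to replace metric depth targets with per-image percentile ranks, which are invariant to the unknown global scale; a minimal numpy sketch:

```python
import numpy as np

def depth_to_percentile(depth):
    """Map each pixel's depth to its percentile rank within the image.
    Ranks are invariant to rescaling the scene by any positive constant,
    which side-steps the unknown absolute scale."""
    flat = depth.ravel()
    ranks = flat.argsort().argsort().astype(np.float64)   # 0..N-1 per pixel
    return (ranks / (flat.size - 1)).reshape(depth.shape)

def percentile_loss(pred, target_depth):
    """L1 loss against percentile-rank targets (pred expected in [0, 1])."""
    return np.abs(pred - depth_to_percentile(target_depth)).mean()

rng = np.random.default_rng(0)
gt = rng.uniform(0.5, 10.0, size=(4, 4))      # toy ground-truth depth map
pred = rng.uniform(0.0, 1.0, size=(4, 4))     # untrained toy prediction
print(percentile_loss(pred, gt))
print(percentile_loss(pred, 3.7 * gt))        # identical: scale-invariant
```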

[1] https://arxiv.org/pdf/1907.01341v3.pdf

[2] https://cs.nyu.edu/~silberman/datasets/nyu_depth_v2.html

Pre-requisites: Suitable for those who have taken a course in machine learning

Low Rank training of Neural Fields Ronnie Clark Artificial Intelligence and Machine Learning, Systems C MSc

Neural fields [1] have emerged as a promising method for representing 3D data, utilising Multi-Layer Perceptrons (MLPs) to predict the properties of a field at every point in space. Despite their potential, a significant drawback is the lengthy training process, often due to the large size of the MLPs. This project aims to explore the application of low-rank adaptation (LoRA) [2] in enhancing the training efficiency of neural fields. An essential part of this investigation will involve benchmarking the performance of low-rank training against full training across various types of fields, such as Signed Distance Functions (SDFs) or radiance fields, to evaluate the effectiveness of LoRA in this context.
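
The LoRA idea [2] applied to a single MLP layer, as a minimal numpy sketch (whether this actually speeds up neural-field training is precisely what the project would benchmark): freeze the dense weight W and train only a rank-r correction BA.

```python
import numpy as np

rng = np.random.default_rng(0)
d_in, d_out, rank = 256, 256, 8      # rank << d gives the parameter saving

W = rng.normal(size=(d_out, d_in))          # frozen dense layer of the MLP
A = 0.01 * rng.normal(size=(rank, d_in))    # trainable low-rank factor
B = np.zeros((d_out, rank))                 # zero-init: training starts at W

def forward(x):
    # The full-rank path is frozen; only A and B (2 * rank * d parameters,
    # here 4096 instead of 65536) would receive gradients during training.
    return W @ x + B @ (A @ x)

x = rng.normal(size=d_in)
print(forward(x).shape)              # (256,)
```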

[1] https://neuralfields.cs.brown.edu/siggraph23.html

[2] https://arxiv.org/pdf/2106.09685.pdf

Pre-requisites: Suitable for those who have taken a course in machine learning or computer graphics

Unsupervised Visual Learning using Segment-Masks Ronnie Clark Artificial Intelligence and Machine Learning, Systems C MSc

Masked language modelling is the prevalent method for pretraining large language models (LLMs). This involves 'masking' words in the text and prompting the model to predict the hidden words. This approach has also been adapted in computer vision, notably in Vision Transformers (ViTs) [1], where patches of images are dropped, and the model is tasked with predicting these missing regions. However, in the vision context this process isn't ideal, as patches in images don't correspond directly to words in text. Unlike words, which represent complete concepts, image patches can contain parts of various objects. This project will investigate a novel approach to vision pretraining: using segmentation masks instead of patches. A key aspect of the project will be comparing the effectiveness of segmentation-based training against the traditional patch-based method in downstream tasks, such as semantic segmentation or depth estimation.
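
A toy sketch of the masking difference (hypothetical 8x8 token grid and segmentation map; a real pipeline would operate on ViT patch tokens): MAE-style masking drops tokens uniformly at random, whereas the proposed variant hides whole segments.

```python
import numpy as np

rng = np.random.default_rng(0)
H = W = 8                                  # an 8x8 grid of tokens
seg = rng.integers(0, 5, size=(H, W))      # hypothetical segmentation map

def patch_mask(ratio=0.6):
    """MAE-style masking: hide tokens uniformly at random."""
    return rng.random((H, W)) < ratio

def segment_mask(ratio=0.6):
    """Proposed variant: hide whole segments until ~ratio of tokens is hidden."""
    mask = np.zeros((H, W), dtype=bool)
    for s in rng.permutation(np.unique(seg)):
        if mask.mean() >= ratio:
            break
        mask |= (seg == s)                 # hide every token of segment s
    return mask

print(patch_mask().mean(), segment_mask().mean())
```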

[1] https://arxiv.org/pdf/2111.06377.pdf

Pre-requisites: Suitable for those who have taken a course in machine learning

Topics in Online Algorithms and Learning-Augmented Algorithms Christian Coester Algorithms and Complexity Theory B C MSc

Description: An online algorithm is an algorithm that receives its input over time, e.g. as a sequence of requests that need to be served. The algorithm must act on each request upon its arrival, without knowledge of future requests, often with the goal of minimising some cost. (For example, assigning taxis to ride requests with the goal of minimising distance traveled by taxis.) Due to the lack of information about future requests, the algorithm cannot always make optimal decisions, but for many problems there exist algorithms that provably find solutions whose cost is within a bounded factor of the optimum, regardless of the input. An emerging related field of research is learning-augmented algorithms, where the algorithm receives some (perhaps erroneous) predictions about the future as additional input. In this project, the student will work on a selected problem in the field of online algorithms or learning-augmented algorithms, with the goal of designing algorithms and theoretically analysing their performance. The project is suitable for mathematically oriented students with an interest in proving theorems about the performance of algorithms.
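
A classic toy example of both flavours (ski rental; the specific problem studied in the project would be agreed with the supervisor): renting costs 1 per day, buying costs B, and the number of ski days is unknown in advance. The break-even rule is 2-competitive; with a possibly erroneous prediction of the season length one can beat 2 when the prediction is good, at the cost of a worse worst case (in the style of Purohit et al., NeurIPS 2018, with a trust parameter lambda):

```python
import math

B = 10  # purchase cost; renting costs 1 per day

def cost_break_even(days):
    # Rent for B-1 days, then buy: never pays more than twice the optimum.
    return days if days < B else (B - 1) + B

def cost_with_prediction(days, predicted, lam=0.5):
    # Trust the prediction: buy early if it says "long season",
    # otherwise postpone buying beyond the break-even point.
    thresh = math.ceil(lam * B) if predicted >= B else math.ceil(B / lam)
    return days if days < thresh else (thresh - 1) + B

opt = lambda days: min(days, B)  # offline optimum knows `days` in advance
for days, predicted in [(3, 3), (100, 100), (100, 3), (3, 100)]:
    print(days, predicted,
          cost_break_even(days) / opt(days),
          cost_with_prediction(days, predicted) / opt(days))
```

Good predictions push the ratio below 2; bad ones push it above, which is exactly the robustness/consistency trade-off studied in this area.
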
Reasoning about Temporal Knowledge Bernardo Cuenca Grau, Przemysław Wałęga Data, Knowledge and Action MSc

"Our world produces nowadays huge amounts of time stamped data, say measurements from meteorological stations, recording of online payments, GPS locations of your mobile phone, etc. To reason on top of such massive temporal datasets effectively we need to provide a well-structured formalisation of temporal knowledge and to devise algorithms with good computational properties. This, however, is highly non-trivial; in particular logical formalisms for temporal reasoning often have high computational complexity.

This project provides an opportunity to join the Knowledge Representation and Reasoning group and participate in exciting EPSRC-funded research on temporal reasoning, temporal knowledge graphs, and reasoning over streaming data. There are opportunities to engage both in theoretically-oriented and practically-oriented research in this area. For example, in recent months, we have been investigating the properties and applications of DatalogMTL---a temporal extension of the well-known Datalog rule language which is well-suited for reasoning over large and frequently-changing temporal datasets; the project could focus on analysing the theoretical properties of DatalogMTL and its fragments, such as its complexity and expressiveness, or alternatively on more practical aspects such as optimisation techniques for existing reasoning algorithms. There are many avenues for research in this area and we would be more than happy to discuss possible concrete alternatives with the student(s).

The theoretical part of the project requires a good understanding of logic (completing the Knowledge Representation and Reasoning course would be beneficial), whereas the practical part is suited to those who have programming skills.

Realistic Benchmarks for Explainable GNN-Based Models Bernardo Cuenca Grau, David Tena Cucala Data, Knowledge and Action B C

Graph Neural Networks (GNNs) are a family of machine learning models that have been successfully applied to many real-world tasks involving graph-structured data. The predictions made by GNNs, however, are often difficult to explain because of the opaque nature of the model. This becomes especially problematic in safety-critical applications or in contexts that require compliance with principles of explainability. The Data & Knowledge Group has recently implemented a GNN-based system where every prediction made by the model can be explained by a rule expressed in Datalog, a logical language with well-defined semantics. For example, suppose we use our system to recommend novels to users based on bibliographic information about the novels and previous user-novel interactions. Imagine that the system recommends the novel "Pride and Prejudice" to user Alex. This prediction can be explained by the Datalog rule “Recommend(x, y) :- RomanticGenre(x) & Liked(y,z) & RomanticGenre(z)” (recommend a romantic novel if the user already liked a romantic novel) if the input contains facts such as RomanticGenre(PrideAndPrejudice), RomanticGenre(TheNotebook), and Liked(Alex,TheNotebook). The first aim of this project is to prepare a new suite of benchmarks for our system, by finding and pre-processing relevant data sources. The student will then train and test our GNN-based system on these benchmarks, and evaluate the quality of the computed explanations. If time permits, the student will also investigate how to adjust our system's rule extraction algorithm to produce more useful explanations.

Pre-requisites: Experience with Python and familiarity with first-order logic is recommended. Experience with Knowledge Graphs is optional, but desirable.

Automated Synthesis of Norms in Multi-Agent Systems Giuseppe De Giacomo Data, Knowledge and Action C MSc

Norms have been widely proposed to coordinate and regulate multi-agent system (MAS) behaviour. In this project we want to consider the problem of synthesising and revising the set of norms in a normative MAS to satisfy a design objective expressed in logic. We focus on dynamic norms, that is, norms that allow and disallow agent actions depending on the history so far. In other words, the norm places different constraints on the agents' behaviour depending on its own state and the state of the underlying MAS. Having taken Foundations of Self-Programming Agents and Computer-Aided Formal Verification is not required but will be of help.

Expected contribution: under suitable assumptions, devise and implement norm synthesis techniques against given logical specifications.

Methods: reactive synthesis, game-theoretic techniques, temporal logics (ATL, ATL*, strategy logics), reasoning about actions, planning, model checking of multi-agent systems.

References: Natasha Alechina, Giuseppe De Giacomo, Brian Logan, Giuseppe Perelli: Automatic Synthesis of Dynamic Norms for Multi-Agent Systems. KR 2022 (and references therein)
Exploring Large Language Models for Reactive Synthesis Problems Giuseppe De Giacomo Data, Knowledge and Action C MSc

Reactive synthesis represents the culmination of declarative programming. By focusing on what a system should accomplish rather than how to achieve it we are able, on the one hand, to simplify the system design process while avoiding human mistakes and, on the other hand, to allow an autonomous agent to self-program itself just from high-level specifications. Linear Temporal Logic (LTL) or LTL on finite traces (LTLf) synthesis is one of the most popular variants of reactive synthesis, being the problem of automatically designing correct-by-construction reactive systems with the guarantee that all its behaviours comply with desired dynamic properties expressed in LTL/LTLf.

Recently, there has been a growing interest in applying Large Language Models (LLMs) to various problems in computer science such as Automated Reasoning, Knowledge Representations and Planning, and Formal Methods in general. Emerging studies have explored the capabilities of LLMs in these fields.

The objective of this project is to investigate the feasibility and effectiveness of using LLMs for reactive synthesis problems with specifications expressed in LTL and LTLf.

Focus:

• Insights into the potential and limitations of LLMs for reactive synthesis with LTL and LTLf specifications.

• An LLM-guided reactive synthesis framework that integrates natural language processing into the synthesis process.

• Empirical evaluation of the performance and effectiveness of LLM-based synthesis algorithms.

References:

• Amir Pnueli: The Temporal Logic of Programs. FOCS 1977: 46-57

• Amir Pnueli, Roni Rosner: On the Synthesis of a Reactive Module. POPL 1989: 179-190

• Giuseppe De Giacomo, Moshe Y. Vardi: Linear Temporal Logic and Linear Dynamic Logic on Finite Traces. IJCAI 2013: 854-860

• Giuseppe De Giacomo, Moshe Y. Vardi: Synthesis for LTL and LDL on Finite Traces. IJCAI 2015: 1558-1564

• Philipp J. Meyer, Salomon Sickert, Michael Luttenberger: Strix: Explicit Reactive Synthesis Strikes Back! CAV (1) 2018: 578-586

• Marco Favorito, Shufang Zhu: LydiaSyft: A Compositional Symbolic Synthesizer for LTLf Specifications

• Karthik Valmeekam, Matthew Marquez, Sarath Sreedharan, Subbarao Kambhampati: On the Planning Abilities of Large Language Models - A Critical Investigation. CoRR abs/2305.15771 (2023)

Pre-requisites: Foundations of Self-Programming Agents and Computer-Aided Formal Verification are not required but will help

Nondeterministic Situation Calculus Giuseppe De Giacomo Data, Knowledge and Action C MSc

Background and focus: The standard situation calculus assumes that atomic actions are deterministic, but many domains involve nondeterministic actions. The key point when dealing with nondeterministic actions is that we need to clearly distinguish between choices that can be made by the agent and choices that are made by the environment, i.e., angelic vs. devilish nondeterminism. "Nondeterministic Situation Calculus" is an extension of the standard situation calculus that accommodates nondeterministic actions and preserves Reiter's solution to the frame problem. It allows for answering projection queries through regression, but it also provides the means to formalize, e.g., (first-order) planning in nondeterministic domains and to define the execution of ConGolog high-level programs in nondeterministic domains. Having taken Foundations of Self-Programming Agents and Computer-Aided Formal Verification is not required but will be of help.

Research topic and expected contribution: under suitable assumptions (e.g., propositional or finite-object cases), devise and implement reasoning-about-action and strategic-reasoning techniques for the nondeterministic situation calculus based on formal methods methodologies.

Methods: reasoning about actions, planning, model checking, reactive synthesis.

Existing work and references:

Giuseppe De Giacomo, Yves Lespérance: The Nondeterministic Situation Calculus. KR 2021: 216-226

Ray Reiter: Knowledge in Action: Logical Foundations for Specifying and Implementing Dynamical Systems. MIT Press, 2001

Giuseppe De Giacomo, Yves Lespérance, Hector J. Levesque: ConGolog, a concurrent programming language based on the situation calculus. Artif. Intell. 121(1-2): 109-169 (2000)

Giuseppe De Giacomo, Yves Lespérance, Fabio Patrizi: Bounded situation calculus action theories. Artif. Intell. 237: 172-203 (2016)
Obligation Games for Reactive Synthesis Giuseppe De Giacomo Data, Knowledge and Action B C MSc

This project focuses on developing an algorithmic solution for solving obligation games as part of reactive synthesis. Obligation properties, a unique class within the safety-progress hierarchy of property specifications [2], play a foundational role in specifying complex system behaviors, particularly when a system interacts with an unpredictable or adversarial environment. Expressed through Linear Temporal Logic (LTL) [3] and its finite variant (LTLf) [1]—the most widely used logical languages in Computer Science and Artificial Intelligence for specifying temporal properties—obligation properties combine safety and guarantee elements to outline conditions for conditional behaviors based on both desired and undesired events.

Safety properties ensure that “nothing bad happens,” requiring specific conditions to hold at every instant of the system's operation. For example, a safety property might dictate that a robot must avoid obstacles continuously. Violations of safety properties are identifiable in finite time, meaning the system or observer can detect and respond to any bad state immediately. Guarantee properties, in contrast, specify that “something good eventually happens.” These properties represent goals that the system should fulfill at least once within a finite timeframe, such as ensuring that a network packet eventually reaches its destination. Together, safety and guarantee properties define comprehensive behavioral expectations for systems operating in dynamic or adversarial settings.

Obligation properties, introduced as part of the safety-progress hierarchy by Lichtenstein, Pnueli, and Zuck (1985) and further developed by Manna and Pnueli [2], represent Boolean combinations of safety and guarantee properties. Reactive synthesis is the problem of automatically synthesizing a system that meets desired dynamical properties in a partially controllable environment. The environment is typically adversarial; however, in case it is stochastic, reactive synthesis becomes MDP solving for temporal specification. This project aims to devise novel synthesis techniques that address obligation properties in either adversarial or stochastic environments.

References:

  1. Giuseppe De Giacomo and Moshe Y. Vardi. Linear temporal logic and linear dynamic logic on finite traces. In IJCAI 2013.
  2. Zohar Manna and Amir Pnueli. A hierarchy of temporal properties. In PODC 1990.
  3. Amir Pnueli. The temporal logic of programs. In FOCS, 1977.
Planning for Temporally Extended Goals in Linear Time Logics of Finite Traces Giuseppe De Giacomo Data, Knowledge and Action C MSc

Planning for temporally extended goals expressed in linear time logics on finite traces requires synthesizing a strategy to fulfil the temporal (process) specification expressed by the goal. To do so, one can take advantage of techniques developed in formal methods, based on reduction to two-player games and forms of model checking. It is crucial to keep in mind that planning is always performed in a domain, which can be thought of as a specification of the possible environment reactions. Typically, such a domain specification is Markovian, i.e., the possible reactions of the environment depend only on the current state and the agent's action. However, forms of non-Markovian domains are also possible and, in fact, of great interest. Note that the domain can be fully observable, partially observable, or not observable at all. Each of these settings leads to different forms of planning and different techniques for solving it. Having taken Foundations of Self-Programming Agents and Computer-Aided Formal Verification is not required but will be of help.

Expected contribution: under suitable assumptions, devise and implement techniques for synthesizing strategies that fulfil temporally extended goals in a planning domain, based on both AI and formal methods methodologies.

Methods: reasoning about actions, planning, model checking, reactive synthesis.

References:

Giuseppe De Giacomo, Sasha Rubin: Automata-Theoretic Foundations of FOND Planning for LTLf and LDLf Goals. IJCAI 2018: 4729-4735

Giuseppe De Giacomo, Moshe Y. Vardi: LTLf and LDLf Synthesis under Partial Observability. IJCAI 2016: 1044-1050

Giuseppe De Giacomo, Moshe Y. Vardi: Synthesis for LTL and LDL on Finite Traces. IJCAI 2015: 1558-1564

Giuseppe De Giacomo, Moshe Y. Vardi: Linear Temporal Logic and Linear Dynamic Logic on Finite Traces. IJCAI 2013: 854-860

Giuseppe De Giacomo, Moshe Y. Vardi: Automata-Theoretic Approach to Planning for Temporally Extended Goals. ECP 1999: 226-238
Reactive Program Synthesis Under Environment Specifications in Linear Time Logics on Finite and Infinite Traces Giuseppe De Giacomo Data, Knowledge and Action C MSc

"Reactive synthesis" is the automated synthesis of programs for interactive/reactive ongoing computations (protocols, operating systems, controllers, robots, etc.). This kind of synthesis is deeply related to planning in nondeterministic environments. In this project we are interested in reactive synthesis under environment specifications, where the agent can take advantage of knowledge about the environment's behaviour in achieving its tasks. We consider how to obtain the game arena out of the LTL and LTLf specifications of both the agent task and the environment behaviours, and how to solve safety games, reachability games, games for LTLf objectives, and games for objectives expressed in fragments of LTL. We also want to understand how winning regions of such arenas are related to the notion of "being able" to achieve desired properties (without necessarily committing to a specific strategy for doing so). We focus on agent tasks that eventually terminate and hence are specified in LTLf, while for the environment we focus on safety specifications and limited forms of guarantee, reactivity, fairness, and stability. Having taken Foundations of Self-Programming Agents and Computer-Aided Formal Verification is not required but will be of help.

Expected contribution: under suitable assumptions, devise and implement reactive synthesis techniques to fulfil the agent's task in an environment with specified behaviour.

Methods: reactive synthesis, temporal logics, 2-player games, reasoning about actions, planning, model checking.

References:

Benjamin Aminof, Giuseppe De Giacomo, Aniello Murano, Sasha Rubin: Planning under LTL Environment Specifications. ICAPS 2019: 31-39

Giuseppe De Giacomo, Antonio Di Stasio, Moshe Y. Vardi, Shufang Zhu: Two-Stage Technique for LTLf Synthesis Under LTL Assumptions. KR 2020: 304-314

Giuseppe De Giacomo, Moshe Y. Vardi: Synthesis for LTL and LDL on Finite Traces. IJCAI 2015: 1558-1564

Giuseppe De Giacomo, Moshe Y. Vardi: Linear Temporal Logic and Linear Dynamic Logic on Finite Traces. IJCAI 2013: 854-860

Benjamin Aminof, Giuseppe De Giacomo, Sasha Rubin: Best-Effort Synthesis: Doing Your Best Is Not Harder Than Giving Up. IJCAI 2021: 1766-1772

Shufang Zhu, Giuseppe De Giacomo: Synthesis of Maximally Permissive Strategies for LTLf Specifications. IJCAI 2022: 2783-2789

Shufang Zhu, Giuseppe De Giacomo: Act for Your Duties but Maintain Your Rights. KR 2022

Giuseppe De Giacomo, Antonio Di Stasio, Francesco Fuggitti, Sasha Rubin: Pure-Past Linear Temporal and Dynamic Logic on Finite Traces. IJCAI 2020: 4959-4965
Reactive Program Synthesis and Planning under Multiple Environments Giuseppe De Giacomo Data, Knowledge and Action C MSc

In this project we consider an agent that operates with multiple models of the environment, e.g., two of them: one that captures expected behaviours and one that captures additional exceptional behaviours. We want to study the problem of synthesizing agent strategies that enforce a goal against environments operating as expected, while also making a best effort against exceptional environment behaviours. We want to formalize these concepts in the context of linear-temporal logic (especially on finite traces) and give algorithms for solving this problem. More generally, we want to formalize and solve synthesis in the case of multiple, even contradicting, assumptions about the environment. One solution concept is based on "best-effort strategies", which are agent plans that, for each of the environment specifications individually, achieve the agent goal against a maximal set of environments satisfying that specification. Having taken Foundations of Self-Programming Agents and Computer-Aided Formal Verification is not required but will be of help.

Expected contribution: under suitable assumptions, devise and implement reactive synthesis techniques that are effective under multiple environments.

Methods: reactive synthesis, temporal logics, 2-player games, reasoning about actions, planning, model checking.

References:

Benjamin Aminof, Giuseppe De Giacomo, Alessio Lomuscio, Aniello Murano, Sasha Rubin: Synthesizing Best-effort Strategies under Multiple Environment Specifications. KR 2021: 42-51

Benjamin Aminof, Giuseppe De Giacomo, Alessio Lomuscio, Aniello Murano, Sasha Rubin: Synthesizing strategies under expected and exceptional environment behaviors. IJCAI 2020: 1674-1680

Daniel Alfredo Ciolek, Nicolás D'Ippolito, Alberto Pozanco, Sebastian Sardiña: Multi-Tier Automated Planning for Adaptive Behavior. ICAPS 2020: 66-74

Benjamin Aminof, Giuseppe De Giacomo, Sasha Rubin: Best-Effort Synthesis: Doing Your Best Is Not Harder Than Giving Up. IJCAI 2021: 1766-1772

Benjamin Aminof, Giuseppe De Giacomo, Aniello Murano, Sasha Rubin: Planning under LTL Environment Specifications. ICAPS 2019: 31-39

Paolo Felli, Giuseppe De Giacomo, Alessio Lomuscio: Synthesizing Agent Protocols from LTL Specifications Against Multiple Partially-Observable Environments. KR 2012

Giuseppe De Giacomo, Moshe Y. Vardi: Synthesis for LTL and LDL on Finite Traces. IJCAI 2015: 1558-1564

Giuseppe De Giacomo, Moshe Y. Vardi: Linear Temporal Logic and Linear Dynamic Logic on Finite Traces. IJCAI 2013: 854-860
Reinforcement Learning under Non-Markovian Safety Specifications and Rewards Expressed in Linear Temporal Logics on Finite Traces Giuseppe De Giacomo Data, Knowledge and Action C MSc

In some cases, the agent has a simulator of the environment instead of a formal specification, so it needs to learn the strategies that achieve its task in the environment. Sometimes the task itself is only implicitly specified through rewards. The key issue is that the type of properties we are often interested in are non-Markovian (e.g., specified in LTL or LTLf), and hence we need to introduce non-Markovian characteristics in decision processes and reinforcement learning. A particularly promising direction is when such non-Markovian characteristics can be expressed in pure-past linear temporal logics. Having taken Foundations of Self-Programming Agents and Computer-Aided Formal Verification is not required but will be of help.

Expected contribution: under suitable assumptions, devise and implement reinforcement learning techniques that remain safe with respect to a safety specification and achieve rewards specified in linear temporal logics on finite traces.

Methods: reinforcement learning, MDPs, non-Markovian rewards, non-Markovian decision processes, linear temporal logics.

References:

Ronen I. Brafman, Giuseppe De Giacomo, Fabio Patrizi: LTLf/LDLf Non-Markovian Rewards. AAAI 2018: 1771-1778

Giuseppe De Giacomo, Moshe Y. Vardi: Synthesis for LTL and LDL on Finite Traces. IJCAI 2015: 1558-1564

Giuseppe De Giacomo, Luca Iocchi, Marco Favorito, Fabio Patrizi: Foundations for Restraining Bolts: Reinforcement Learning with LTLf/LDLf Restraining Specifications. ICAPS 2019: 128-136

Mohammed Alshiekh, Roderick Bloem, Rüdiger Ehlers, Bettina Könighofer, Scott Niekum, Ufuk Topcu: Safe Reinforcement Learning via Shielding. AAAI 2018: 2669-2678

Giuseppe De Giacomo, Marco Favorito, Luca Iocchi, Fabio Patrizi: Imitation Learning over Heterogeneous Agents with Restraining Bolts. ICAPS 2020: 517-521

Giuseppe De Giacomo, Marco Favorito, Luca Iocchi, Fabio Patrizi, Alessandro Ronca: Temporal Logic Monitoring Rewards via Transducers. KR 2020: 860-870

Giuseppe De Giacomo, Antonio Di Stasio, Francesco Fuggitti, Sasha Rubin: Pure-Past Linear Temporal and Dynamic Logic on Finite Traces. IJCAI 2020: 4959-4965

Giuseppe De Giacomo, Moshe Y. Vardi: Linear Temporal Logic and Linear Dynamic Logic on Finite Traces. IJCAI 2013: 854-860
Complexity of popularity in hedonic games Edith Elkind Artificial Intelligence and Machine Learning B C MSc

Coalitions and coalition formation are central concerns in the study of multiagent systems as well as cooperative game theory. Typical real-world examples include individuals joining clubs or societies such as music groups or sports teams, task allocation, or students living in shared housing. The formal study of coalition formation is often realized using so-called hedonic games, which originate from economic theory and focus on coalition structures (or partitions) that satisfy various desiderata based on the agents' preferences over coalitions. One such desideratum is the notion of popularity: a coalition structure S is said to be popular if, for every other coalition structure S', the number of agents preferring S to S' is larger than the number of agents preferring S' to S. In other words, a popular state is a status quo that cannot be replaced by proposing a different solution that wins a majority vote against the status quo. Popular coalition structures seem to be desirable, but, unfortunately, there exist hedonic games for which popular outcomes do not exist.

The concept of popularity gives rise to various interesting computational problems, most importantly the existence problem: given a hedonic game, does it contain a popular coalition structure? Two important classes of hedonic games are additively separable hedonic games and fractional hedonic games. For these classes of games, it is known that the existence problem is NP-hard and coNP-hard. However, the best known upper bound is that this problem is contained in Sigma_2^p and it is conjectured that the problem is, in fact, Sigma_2^p-complete. The primary goal of the project is to work on this conjecture. Other possible directions of the project are to consider popularity in restricted classes of hedonic games or to consider related solution concepts, such as strong popularity, mixed popularity, or joint-majority stability.
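
For concreteness, a minimal sketch (toy agents and utilities, all hypothetical) of the majority comparison underlying popularity in an additively separable hedonic game:

```python
# Toy additively separable hedonic game: an agent's utility for a
# coalition is the sum of its values for the other members.
agents = ["a", "b", "c"]
val = {("a", "b"): 1, ("b", "a"): 1, ("a", "c"): -2, ("c", "a"): 0,
       ("b", "c"): 3, ("c", "b"): 3}

def utility(agent, coalition):
    return sum(val.get((agent, other), 0) for other in coalition if other != agent)

def coalition_of(agent, partition):
    return next(c for c in partition if agent in c)

def net_votes(S, T):
    """Agents preferring S minus agents preferring T: S is more popular
    than T iff this is positive."""
    score = 0
    for a in agents:
        u_S = utility(a, coalition_of(a, S))
        u_T = utility(a, coalition_of(a, T))
        score += (u_S > u_T) - (u_T > u_S)
    return score

S = [{"a"}, {"b", "c"}]
T = [{"a", "b", "c"}]
print(net_votes(S, T))   # 0 here: the two partitions tie
```

Certifying popularity means winning this comparison against every other partition, of which there are exponentially many; this universal quantification is one source of the Sigma_2^p upper bound mentioned above.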

Computational analysis of sports ranking systems through game theory Edith Elkind Artificial Intelligence and Machine Learning B C MSc This project proposes a computational game theory analysis focused on addressing the challenge of ranking athletes in sports settings where direct pairwise comparisons may be unavailable, with a particular emphasis on sports like Ironman, Triathlon, and Marathon. Given the inherent difficulty of arranging head-to-head competitions among top athletes in such endurance sports, the project seeks to analyze the shortcomings of existing methodologies as well as develop and assess novel ranking systems using insights from computational game theory. The first step in this project will involve establishing a concrete model and set of assumptions that mirror this real-world problem. Using this model, the project then aims to both theoretically and empirically analyze the current methodology behind the ranking systems being used today. Ultimately, the goal is to provide a provably fair ranking system for such competitions, either based on extensions of existing systems or through a novel ranking concept.
Strategic nomination in multiwinner voting Edith Elkind Artificial Intelligence and Machine Learning B C MSc In multiwinner voting, each voter specifies their preferences over available candidates (by ranking all candidates, or specifying which candidates they approve), and the goal is to select a fixed-size set of winners. An important phenomenon in real-life voting scenarios is that if voters are not familiar with the candidates, they prefer not to participate in the elections. However, once a voter chooses to participate, they tend to learn more about the candidates and therefore vote according to their true preferences. Thus, a candidate may benefit from nominating another candidate, if they believe that the new candidate's electorate will also support them. The goal of this project is to explore the complexity of the strategic nomination problem, for a variety of voting rules.
Economic aspects of cybersecurity Patricia Esteve-Gonzalez, Ioannis Agrafiotis, Louise Axon, Michael Goldsmith, Sadie Creese Security B C MSc

The links between cybersecurity and economics are multiple and complex, offering topics of discussion at the user, organisational, national, and international levels. The literature covering the overlap of these two fields is limited, and there is a need to cast light on how decisions are taken, how to incentivise the prioritisation of cybersecurity, and how large the economic incentives behind the protection of digitalised societies are. The project might take an empirical or a theoretical approach, drawing on, for example, game theory, surveys, interviews, or statistical analysis.

See, for example: Kianpour, M., Kowalski, S. J., and Øverby, H. 2021. Systematically Understanding Cybersecurity Economics: A Survey. Sustainability, 13(24). https://doi.org/10.3390/su132413677

Hojda, M.H. 2022. Information Security Economics: Cyber Security Threats. Proceedings of the International Conference on Business Excellence, 16(1): 584-592. https://doi.org/10.2478/picbe-2022-0056

Thematic Analysis of National Cybersecurity Maturity Assessments Patricia Esteve-Gonzalez, Ioannis Agrafiotis, Louise Axon, Michael Goldsmith, Sadie Creese Security B C MSc There is a large investment being made by the international community aimed at helping nations and regions to develop their capacity in cybersecurity. The work of the Global Cyber Security Capacity Centre (based at the Oxford Martin School) studies and documents this: https://www.sbs.ox.ac.uk/cybersecurity-capacity/content/front. There is scope to study in more detail the global trends in capacity building in cybersecurity, the nature of the work and the partnerships that exist to support it. An interesting analysis might be to identify what is missing (through comparison with the Cybersecurity Capacity Maturity Model, a key output of the Centre), and also to consider how strategic, or not, such activities appear to be. An extension of this project, or indeed a second parallel project, might seek to perform a comparison of the existing efforts with the economic and technology metrics that exist for countries around the world, exploring if the data shows any relationships exist between those metrics and the capacity building activities underway. This analysis would involve regression techniques.
Beyond MCMC -- scalable and approximate Bayesian inference for computational statistics in global health Seth Flaxman Artificial Intelligence and Machine Learning B C MSc

Co-supervisor: Dr Swapnil Mishra (https://s-mishra.github.io/)

In applied work, especially disease modeling, we have reached the limits of what standard MCMC samplers can solve in a reasonable amount of computational time. We will investigate a variety of recently proposed inference schemes, from variational Bayes to deep learning ensemble models to parallel implementations of Sequential Monte Carlo, applying them to popular models in biostatistics, especially multilevel / hierarchical models. The goal will not be just to figure out "what works" but to understand the shortcomings of existing tools, and come up with guidance for practitioners.

Disease Mapping with Neural Networks Seth Flaxman Artificial Intelligence and Machine Learning C MSc Traditionally, disease mapping has heavily leaned on statistical models such as multivariate normal distributions and Gaussian Processes. However, the ever-evolving landscape of machine learning, especially in the realm of neural networks, presents an untapped potential for disease mapping. This project aims to harness the latest advancements in spatial data modelling and bring them to the forefront of disease mapping. The goal of the project is to explore cutting-edge techniques, including Bayesian inference with Markov Chain Monte Carlo (MCMC), to unlock new insights and capabilities in disease mapping, bridging the gap between traditional statistical methods and the power of deep learning.
Topics in Randomised Algorithms and Computational Complexity Andreas Galanis Algorithms and Complexity Theory C MSc Description: Andreas Galanis is willing to supervise projects in the areas of randomised algorithms and computational complexity. Problems of interest include (i) the analysis of average case instances of hard combinatorial problems (example: can we satisfy a random Boolean formula?), and (ii) the computational complexity of counting problems (example: can we count approximately the number of proper colourings of a graph G?). The projects would suit mathematically oriented students, especially those with an interest in applying probabilistic methods to computer science.
Dynamic information design Jiarui Gan Artificial Intelligence and Machine Learning MSc

In information design, a more-informed player (sender) influences a less-informed decision-maker (receiver) by signalling information about the state of the world. The problem for the sender is to compute an optimal signalling strategy, which leads to the receiver taking actions that benefit the sender. Dynamic information design, as a new frontier of information design, generalises the one-shot framework to dynamic settings that are modelled based on Markov decision processes. The goal of the project is to study several variants of the dynamic information design problem, and it can be approached from both theoretical and empirical perspectives. Theoretically, the focus is on determining the computational complexity of optimal information design in different dynamic settings and developing algorithms. A background in computational complexity and algorithm design is beneficial. Practically, the objective is to apply existing algorithms to novel applications, such as traffic management or board games, and to develop algorithms that work effectively in real-world scenarios using state-of-the-art methods. Knowledge of Markov Decision Processes, stochastic/sequential games, and Reinforcement Learning is preferred.
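
As a warm-up for the dynamic setting, the one-shot problem has a clean closed form in the binary case (toy numbers; see Kamenica and Gentzkow below):

```python
# One-shot Bayesian persuasion with a binary state (toy numbers).
# The receiver takes the sender-preferred action iff their posterior
# belief that the state is 1 is at least t; the sender commits to a
# signalling scheme before learning the state.

p = 0.3   # prior probability of state 1
t = 0.5   # receiver's posterior threshold for acting

# Optimal scheme: always send "go" in state 1; in state 0, send "go"
# with the largest probability q that keeps the "go"-posterior at t.
q = min(1.0, p * (1 - t) / ((1 - p) * t))

prob_go = p + (1 - p) * q          # sender's payoff: P(receiver acts)
posterior = p / prob_go            # P(state = 1 | "go")
print(q, prob_go, posterior)       # 0.428..., 0.6, 0.5

assert abs(prob_go - min(p / t, 1.0)) < 1e-12   # known closed form p/t
```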

Related work:

J. Gan, R. Majumdar, G. Radanovic, A. Singla. Bayesian persuasion in sequential decision-making. AAAI '22

E. Kamenica, M. Gentzkow. Bayesian persuasion. American Economic Review, 2011

S. Dughmi. Algorithmic information structure design: a survey. ACM SIGecom Exchanges 15.2 (2017): 2-24.

Wu, J., Zhang, Z., Feng, Z., Wang, Z., Yang, Z., Jordan, M. I., & Xu, H. (2022). Sequential Information Design: Markov Persuasion Process and Its Efficient Reinforcement Learning. arXiv preprint arXiv:2202.10678.

AI for a Decentralised Web Naman Goel, Rui Zhao, Jun Zhao Human Centred Computing MSc

In response to the increasing centralisation of the World Wide Web and the concerns such centralisation raises (e.g. privacy violations, censorship, loss of individual autonomy, etc.), decentralised personal online data stores (pods) have emerged as a promising alternative. A leading example is Solid (https://solidproject.org). However, there remain a number of unsolved challenges in ensuring that this kind of alternative is as useful for people as the existing Web, encouraging adoption and further development. This project will explore how some recent advances in AI can be integrated in Solid. An example challenge that the project could focus on is how to provide Google-like search features in the Solid environment, for public, communal and private data, while respecting the decentralised nature of Solid. The student will not only propose novel solution(s) to address the technical challenges but also empirically analyse the proposed solution and compare with baselines. Interested students are welcome to contact Rui Zhao and Naman Goel to discuss or propose their own ideas related to the above (rui.zhao@cs.ox.ac.uk, naman.goel@cs.ox.ac.uk).

Prerequisites:

- Willingness to learn new technologies
- Good understanding of distributed systems
- Good understanding of the Web architecture
- Technical experience with at least one of the following will be a big plus:
  - Solid
  - RDF / Linked Data / Semantic Web / Knowledge Graph
  - NLP / LLM, and (text) similarity search
- Confidence with an appropriate language to work with AI and/or Solid, especially Python or JS/TS

References:

1. https://link.springer.com/article/10.1007/s41019-024-00263-w
2. https://dl.acm.org/doi/10.1145/3589335.3651251
3. https://www.w3.org/DesignIssues/PrivateData.html
Improving privacy-preserving collective computation Naman Goel, Rui Zhao, Jun Zhao Human Centred Computing MSc

In response to the increasing centralisation of the World Wide Web and the concerns such centralisation raises (e.g. privacy violations, censorship, loss of individual autonomy, etc.), decentralised personal online data stores (pods) have emerged as a promising alternative. A leading example is Solid (https://solidproject.org). It provides a new paradigm for privacy-friendly data usage at the individual level, but challenges still exist for privacy-preserving computations that involve collective data. We have proposed an architecture targeting exactly this issue, Libertas [1], and have made further explorations in improving its efficiency. That forms a foundation for the problem, but several new interesting open questions emerge. For example:

- How to further improve the efficiency of the computation?
- How to better facilitate users' trust selection, such as forming provenance or undeniable records of agents' behaviour history?
- How to allow users to exercise autonomy, such as by allowing more types of restrictions in agent selection and computation performance?
- What technical improvements can lower users' cognitive burden when setting up and using the architecture?

In this project, the student will explore creative or user-friendly approaches to improve the privacy-preserving collective computation architecture, to address one of the questions above or propose their own questions. The student will not only propose novel solution(s) to address the technical challenges but also empirically analyse the proposed solution and compare with baselines. Interested students are welcome to contact Rui Zhao and Naman Goel to discuss or propose their own ideas related to the above (rui.zhao@cs.ox.ac.uk, naman.goel@cs.ox.ac.uk).

Prerequisites:

- Willingness to learn new technologies
- Good understanding of distributed systems
- Good understanding of the Web architecture
- Technical experience with at least one of the following will be a big plus:
  - Solid
  - Cryptography in general, or one of the following topics:
    - Differential Privacy
    - Multi-Party Computation
    - Trusted Execution Environments
- Familiarity with Python or any programming language suitable for working with an MPC framework, such as MP-SPDZ
- Familiarity with JS/TS, to work with Solid

References:

[1] https://arxiv.org/abs/2309.16365
[2] https://www.w3.org/DesignIssues/PrivateData.html
Cake-cutting with low envy Paul Goldberg Algorithms and Complexity Theory C MSc Cake-cutting refers to the design of protocols to share a divisible good amongst a collection of agents. A standard starting-point for cake-cutting protocols is the classical "I cut, you choose" rule. This rule is said to be "envy-free" since each player can ensure that they value their own share at least as much as they value the other player's share. Well-known further work has extended this idea to more than 2 players. In the paper at the URL below, we identify various classes of protocols, and show that one can convert protocols from one kind to another so as to maintain the worst-case level of envy that results. https://arxiv.org/abs/2108.03641 The project is to look at the computational complexity of evaluating the worst-case level of envy that can result from using a given protocol, and related questions about protocols belonging to classes of interest. The project is mainly based on mathematical analysis as opposed to computational experiment. But there is scope for some computational experiment, for example in searching for value functions that result in high envy.
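
A minimal sketch of the base protocol with toy value densities (both densities hypothetical): player 1 cuts so that the two pieces are equal under their own measure, player 2 takes the piece they prefer, and neither envies the other.

```python
import numpy as np

def value(density, a, b, n=10001):
    """A player's value for the interval [a, b] of the cake [0, 1]."""
    xs = np.linspace(a, b, n)
    return float(np.mean(density(xs)) * (b - a))

d1 = lambda x: np.ones_like(x)   # player 1 values the cake uniformly
d2 = lambda x: 2 * x             # player 2 prefers the right-hand end

# Player 1 cuts at the midpoint of their own measure (0.5 for the uniform
# density; for a general density this cut point would be found by search).
cut = 0.5
left, right = (0.0, cut), (cut, 1.0)

# Player 2 picks the piece they value more; player 1 keeps the other.
p2_piece = right if value(d2, *right) >= value(d2, *left) else left
p1_piece = left if p2_piece == right else right

# Each player gets at least half of the cake by their own measure,
# so neither envies the other's piece.
print(value(d1, *p1_piece), value(d2, *p2_piece))   # ~0.5 and ~0.75
```
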
Learning probabilistic automata Paul Goldberg Algorithms and Complexity Theory C MSc Probabilistic automata are a means of producing randomly-generated strings that are accepted/recognised by a finite automaton. (They are conceptually similar to hidden Markov models.) The general topic of the project is to find algorithms to reconstruct an automaton based on observations of strings generated by that automaton. It's a topic that has led to a steady stream of papers in the research literature, and the project would focus on one of the many variants of the problem. For example, "labelled automata" in which partial information is provided about the current state of the automaton in the form of random labels associated with states. Another general issue is how to efficiently extract extra information present in long strings generated by the unknown automaton. Within this project topic, there is scope for focusing either on experiments, or on mathematical analysis of algorithms. In the latter case, it would be helpful to know the basic concepts of computational learning theory.
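
A minimal sketch of the setting (hypothetical two-state automaton): sample strings from it and, in the fully observed "labelled" variant, recover the transition probabilities from conditional frequencies; the interesting algorithmic questions start when the state sequence is hidden.

```python
import random
from collections import Counter, defaultdict

random.seed(0)

# Hypothetical 2-state automaton over {a, b}: from each state, emit a
# letter and move to the next state with the given probability.
TRANS = {
    0: [("a", 0, 0.7), ("b", 1, 0.3)],
    1: [("b", 1, 0.6), ("a", 0, 0.4)],
}

def step(state):
    choices = [(letter, nxt) for letter, nxt, _ in TRANS[state]]
    weights = [p for _, _, p in TRANS[state]]
    return random.choices(choices, weights)[0]

# With observed state labels, estimation is just conditional counting.
counts = defaultdict(Counter)
state = 0
for _ in range(100_000):
    letter, nxt = step(state)
    counts[state][(letter, nxt)] += 1
    state = nxt

for s, c in sorted(counts.items()):
    total = sum(c.values())
    print(s, {k: round(v / total, 3) for k, v in sorted(c.items())})
```
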
Bioinformatics Projects Jotun Hein Computational Biology and Health Informatics B C MSc

Jotun Hein has been in Oxford since 2001 and has supervised many students from Computer Science. His main interest is computational biology and combinatorial optimization problems in this area, especially from phylogenetics, biosequence analysis, population genetics, grammars and the origin of life. Some ideas for projects can be obtained by browsing these pages:

https://heingroupoxford.com/learning-resources/topics-in-computational-biology/

https://heingroupoxford.com/previous-projects/

A student is also welcome to propose his or her own project.

Distributional prior in Inductive Logic Programming Celine Hocquette MSc

Inductive Logic Programming (ILP) is a form of Machine Learning based on Computational Logic. Given examples and some background knowledge, the goal of an ILP learner is to search for a hypothesis that generalises the examples. However, the search space in ILP is a function of the size of the background knowledge. All predicates are treated as equally likely, and current ILP systems cannot make use of distributional assumptions to improve the search. This project focuses on making use of an assumed given prior probability over the background predicates to learn more efficiently. The idea is to order subsets of the background predicates in increasing order of probability of appearance. This project is a mix of theory, implementation, and experimentation.

Prerequisites: logic programming, probability theory, or an interest in learning about them
Identifying relevant background knowledge in Inductive Logic Programming Celine Hocquette MSc

Inductive Logic Programming (ILP) is a form of Machine Learning based on Computational Logic. Given examples and some background knowledge, the goal of an ILP learner is to search for a hypothesis that generalises the examples. The search space in ILP is a function of the size of the background knowledge. To improve search efficiency, we want to identify relevant areas of the search space as relevant background knowledge predicates. We propose to evaluate and compare several relevance identification methods such as compression of the examples or statistical approaches. This project is a mix of theory, implementation, and experimentation.

Prerequisites: logic programming, statistical learning, or interest to learn about it

A fast numerical solver for multiphase flow David Kay Computational Biology and Health Informatics B C MSc

The Allen-Cahn equation is a differential equation used to model the phase separation of two or more alloys. This model may also be used to model cell motility, including chemotaxis and cell division. The numerical approximation, via a finite difference scheme, ultimately leads to a large system of linear equations. In this project, using numerical linear algebra techniques, we will develop a computational solver for these linear systems. We will then investigate the robustness of the proposed solver.
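
To fix ideas, a semi-implicit finite-difference step for the one-dimensional Allen-Cahn equation u_t = eps^2 u_xx - (u^3 - u), with toy parameters; each time step requires solving one (here tridiagonal) linear system, which is the kind of system the project's solver would target in place of the dense solve below.

```python
import numpy as np

# Diffusion treated implicitly, the cubic nonlinearity explicitly.
N, eps, dt = 200, 0.05, 1e-3
x = np.linspace(0, 1, N)
u = np.where(x < 0.5, -1.0, 1.0) + 0.1 * np.random.default_rng(0).normal(size=N)

h = x[1] - x[0]
# A = I - dt*eps^2*L, with L the standard 3-point Laplacian (zero-flux BCs).
main = (1 + 2 * dt * eps**2 / h**2) * np.ones(N)
off = (-dt * eps**2 / h**2) * np.ones(N - 1)
A = np.diag(main) + np.diag(off, 1) + np.diag(off, -1)
A[0, 1] *= 2        # reflecting (Neumann) boundary at x = 0
A[-1, -2] *= 2      # reflecting (Neumann) boundary at x = 1

for _ in range(100):
    rhs = u - dt * (u**3 - u)        # explicit reaction term
    u = np.linalg.solve(A, rhs)      # a fast solver would replace this

print(u.min(), u.max())   # the solution stays near the pure phases -1 and +1
```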

Efficient solution of one dimensional airflow models David Kay Computational Biology and Health Informatics B C MSc In this project we will investigate the use of numerical and computational methods to efficiently solve the linear system of equations arising from a one dimensional airflow model within a network of tree-like branches. This work will build upon the methods presented within the first-year Linear Algebra and Continuous Maths courses. All software to be developed can be in the student's preferred programming language.  General area: Design and computational implementation of fast and reliable numerical models arising from biological phenomena.
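On a single branch, a 1-D discretisation produces a tridiagonal system, which can be solved in O(n) time rather than by general elimination. A small illustrative sketch (the values are made up; it is not part of the project specification):

```python
def thomas(a, b, c, d):
    """O(n) Thomas algorithm for a tridiagonal system with sub-diagonal
    a, diagonal b, super-diagonal c and right-hand side d."""
    n = len(b)
    cp, dp = [0.0] * n, [0.0] * n
    cp[0], dp[0] = c[0] / b[0], d[0] / b[0]
    for i in range(1, n):
        m = b[i] - a[i] * cp[i - 1]          # eliminated pivot
        cp[i] = c[i] / m if i < n - 1 else 0.0
        dp[i] = (d[i] - a[i] * dp[i - 1]) / m
    x = [0.0] * n
    x[-1] = dp[-1]
    for i in range(n - 2, -1, -1):           # back-substitution
        x[i] = dp[i] - cp[i] * x[i + 1]
    return x

# 4-unknown example; the exact solution is [1, 1, 1, 1].
print(thomas([0, -1, -1, -1], [2, 2, 2, 2], [-1, -1, -1, 0], [1, 0, 0, 1]))
```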
Extensions of the square root law of steganography Andrew Ker Security B C MSc

This is a theoretical project that aims to prove probabilistic results relating to the capacity of different types of cover to conceal a hidden message. It will use tools from probability, information theory, and mathematics. An advanced square root law can be found at http://www.cs.ox.ac.uk/andrew.ker/docs/ADK71B.pdf; this project could either aim for elementary proofs of special cases of that result, or for elementary extensions, or seek counterexamples to illustrate that the hypotheses are necessary. Suitable for a student with a background in mathematics who has taken the Probability & Computing option.

Bots for a Board Game Stefan Kiefer Automated Verification B C MSc The goal of this project is to develop bots for the board game Hex. In a previous project, an interface was created to allow future students to pit their game-playing engines against each other. In this project the goal is to program a strong Hex engine. The student may choose the algorithms that underlie the engine, such as alpha-beta search, Monte-Carlo tree search, or neural networks. The available interface allows a comparison between different engines. It is hoped that these comparisons will show that the students' engines become stronger over the years. The project involves reading about game-playing algorithms, selecting promising algorithms and data structures, and design and development of software (in Java or Scala).
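For orientation, here is a bare alpha-beta skeleton, sketched in Python for brevity (the project itself targets Java or Scala). The Hex-specific parts, namely move generation, playing a move, the evaluation heuristic and the win test, are left as abstract callbacks, since designing those well is the substance of the project:

```python
import math

def alphabeta(state, depth, alpha, beta, maximising,
              legal_moves, play, evaluate, is_terminal):
    """Plain alpha-beta search; callbacks supply the game-specific logic."""
    if depth == 0 or is_terminal(state):
        return evaluate(state)
    if maximising:
        value = -math.inf
        for move in legal_moves(state):
            value = max(value, alphabeta(play(state, move), depth - 1,
                                         alpha, beta, False, legal_moves,
                                         play, evaluate, is_terminal))
            alpha = max(alpha, value)
            if alpha >= beta:
                break  # beta cut-off: opponent will avoid this branch
        return value
    value = math.inf
    for move in legal_moves(state):
        value = min(value, alphabeta(play(state, move), depth - 1,
                                     alpha, beta, True, legal_moves,
                                     play, evaluate, is_terminal))
        beta = min(beta, value)
        if alpha >= beta:
            break  # alpha cut-off
    return value

# Usage: alphabeta(root, 4, -math.inf, math.inf, True, moves, play, eval_fn, done)
```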
Using Virtual Reality to predict how we use memory in natural behaviour: collaborative interdisciplinary projects. Stefan Kiefer Automated Verification B C MSc

Name and email address of research project supervisor: Dr Dejan Draschkow, dejan.draschkow@psy.ox.ac.uk

Project description: Although human knowledge and memories represent the past, we form and use them to support future behavior (Nobre & Stokes, 2019). Understanding which factors contribute to learning about the world and successfully finding the learned information in mind is of critical importance for developing methods for supporting this behavior in healthy individuals, but also in individuals with a range of neurocognitive and psychiatric conditions, such as stroke, Alzheimer's, and Parkinson's disease. Our novel virtual reality task (Draschkow, Kallmayer, & Nobre, 2021) combines the ecological validity, experimental control, and sensitive measures required to investigate the naturalistic interplay between memory and perception, and opens the door to investigating and supporting complex cognitive functions (https://www.youtube.com/watch?v=GT2kLkCJQbY).

In the proposed interdisciplinary projects, computer science and experimental psychology students will be paired to develop and validate sophisticated virtual reality protocols for measuring and supporting complex cognitive mechanisms. Specific projects will focus on selected sub-topics and vary in scope, depending on students' interests and the kind of project (3rd year, 4th year, or MSc). These include:

  • Programming and refining game-like cognitive VR tasks in C# (Unity)
  • Developing protocols for online-based assessments of cognitive functions in C#/JavaScript
  • Developing algorithms for detecting markers of neurocognitive symptoms (such as tremor for Parkinson's disease) in multivariate VR data (R/Python)
  • Developing proof-of-concept multimodal (voice, visual, and touch) protocols for supporting learning and memory in VR, with implications for supporting dementia patients (C#/JavaScript/R/Python)

The projects are suitable for students who feel comfortable with highly interdisciplinary work/teams and have experience with (or are open to learning) scientific programming in C#/JavaScript/R/Python. Students will be fully integrated in a successful and collaborative research group and get hands-on experience with an interactive product-development cycle, including multiple stakeholders. Further related readings are: Ballard et al., 1997; Hayhoe, 2017; Hayhoe & Ballard, 2014.

Relevant readings from cognitive science:

Ballard, D. H., Hayhoe, M. M., Pook, P. K., & Rao, R. P. N. (1997). Deictic codes for the embodiment of cognition. Behavioral and Brain Sciences, 20(4), 723–767. https://doi.org/10.1017/S0140525X97001611

Draschkow, D., Kallmayer, M., & Nobre, A. C. (2021). When natural behavior engages working memory. Current Biology, 31(4), 869–874.e5. https://doi.org/10.1016/j.cub.2020.11.013

Hayhoe, M. M. (2017). Vision and action. Annual Review of Vision Science, 3(1), 389–413. https://doi.org/10.1146/annurev-vision-102016-061437

Hayhoe, M. M., & Ballard, D. (2014). Modeling task control of eye movements. Current Biology, 24(13), R622–R628. https://doi.org/10.1016/j.cub.2014.05.020

Nobre, A. C., & Stokes, M. G. (2019). Premembering experience: A hierarchy of time-scales for proactive attention. Neuron, 104(1), 132–146. https://doi.org/10.1016/j.neuron.2019.08.030
Quantum Software Projects Aleks Kissinger Quantum C MSc

I am interested in supervising projects in these areas:

(i) developing better quantum compilers for translating high-level algorithms to real hardware,

(ii) simulating complex quantum computations with classical (super) computers, and

(iii) automatically verifying the exact or approximate correctness of quantum computations.

Many of these projects rely on mathematical methods based on the ZX-calculus or related graphical calculi, mostly developed here in Oxford. Rather than specific, pre-made projects, I prefer to meet with students and design a project together, often with input/co-supervision from other members of the Quantum Group. Students interested in doing a project are highly encouraged to take the MSc/Part C courses Quantum Processes and Computation and Quantum Software. We also organise various events and group seminars for students doing (or interested in doing) dissertations with the quantum group, usually starting in HT. If you are interested, contact me (aleks.kissinger@cs.ox.ac.uk) and we'll make sure you get informed about those.

Research topic - Quantum Software: compiling, classical simulation, and verification
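As a taste of the tooling involved, here is a hedged sketch using the open-source PyZX library, following the pattern in its README (API names current at the time of writing and worth double-checking against the docs). It generates a random circuit, simplifies its ZX-diagram, extracts a new circuit, and checks equality, which is the optimise-and-verify loop behind areas (i) and (iii):

```python
import pyzx as zx  # https://github.com/Quantomatic/pyzx

# Random CNOT+H+phase circuit on 4 qubits with ~80 gates.
circuit = zx.generate.CNOT_HAD_PHASE_circuit(qubits=4, depth=80)
g = circuit.to_graph()
zx.simplify.full_reduce(g)                 # rewrite with ZX-calculus rules
optimised = zx.extract_circuit(g.copy())   # turn the diagram back into a circuit
print(circuit.verify_equality(optimised))  # True if the two linear maps agree
```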

Computation Theory with Atoms Bartek Klin Programming Languages B C MSc

Sets with atoms, also known as nominal sets, are an abstract foundational approach to computing over data structures that are infinite but highly symmetrical, so much so that they are finitely presentable and amenable to algorithmic manipulation. Many basic results of classical mathematics and computation theory become more subtle, or even fail, in sets with atoms. For example, in sets with atoms not every vector space has a basis, and a Turing machine may not determinize.

A variety of specific topics are available, aimed at mathematically oriented students.

Prerequisites: A strong mathematical background is essential. Students who enjoy courses such as "Models of Computation", "Categories, Proofs and Processes" or "Computer-Aided Formal Verification" will find this subject suitable.

Some relevant literature:

  • M. Bojanczyk: Slightly Infinite Sets. Draft available from https://www.mimuw.edu.pl/~bojan/upload/main-6.pdf
  • A. Pitts: Nominal Sets: Names and Symmetry in Computer Science. Cambridge University Press, 2013.
Asymptotically automatic sequences Jakub Konieczny, James Worrell Automated Verification B C

Automatic sequences - that is, sequences whose n-th term can be computed by a finite automaton given as input the expansion of n in a given base k - have long been studied in theoretical computer science, number theory, combinatorics and algebra, to name just a few fields. Recently I introduced a notion tentatively dubbed "asymptotically automatic sequences". This class of sequences remains largely unexplored. It is likely that some new results can be obtained by minor modifications of existing arguments, while other cases will present more interesting challenges.

The purpose of this project is to see which results on automatic sequences admit a straightforward generalisation to the asymptotic regime, and for which it is possible to construct a counterexample.

The goal is to take at least one (preferably more) existing result on automatic sequences and either generalise it to asymptotically automatic sequences, or disprove such a generalisation.

Basic results and definitions can be found in: https://arxiv.org/abs/2305.09885

And a variant of Cobham's theorem can be found in: http://arxiv.org/abs/2209.09588

The standard reference for automatic sequences is “Automatic Sequences: Theory, Applications, Generalizations” by J.-P. Allouche and J. Shallit.

Pre-requisites: Familiarity with automatic sequences / regular languages; basic analysis
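For a concrete anchor, here is the textbook example of a (classically) 2-automatic sequence, computable by a two-state automaton reading the base-2 digits of n; any asymptotic generalisation must in particular handle such sequences:

```python
# Thue-Morse sequence: t(n) is the parity of the number of 1s in the
# binary expansion of n -- exactly the output of a two-state automaton
# processing the base-2 digits of n.
def thue_morse(n: int) -> int:
    return bin(n).count("1") % 2

print([thue_morse(n) for n in range(16)])
# [0, 1, 1, 0, 1, 0, 0, 1, 1, 0, 0, 1, 0, 1, 1, 0]
```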

Extensions of Presburger arithmetic by polynomial-like functions Jakub Konieczny, James Worrell Automated Verification B C

Presburger arithmetic – that is, the first-order theory of the natural numbers with addition – is decidable, meaning that there exists a procedure to determine whether a given sentence in the language of this theory is true or false. In contrast, Peano arithmetic – which also includes multiplication – is undecidable. This naturally leads to the much-studied question: which extensions of Presburger arithmetic are decidable? For instance, adding the square function allows us to define multiplication (via the identity 2xy = (x+y)^2 - x^2 - y^2) and hence leads to an undecidable theory. The same applies to other polynomials. But what about more general functions? We recently noticed that for a number of simple generalised polynomials – that is, expressions built up from the usual polynomials using addition, multiplication and the integer part function – the corresponding extensions are also undecidable. The idea behind this project is to investigate more broadly the question of decidability of extensions of Presburger arithmetic by generalised polynomials.

The purpose of this project is to apply existing techniques to show in concrete cases that the extension of the Presburger arithmetic by a given generalised polynomial is undecidable. More generally, one can consider other classes of sequences which have polynomial-like behaviour in a suitable sense.

The goal is to show undecidability of the extension of the Presburger arithmetic by at least one generalised polynomial which was not previously covered. More ambitious variants correspond to dealing with wider classes of generalised polynomials.

Background on generalised polynomials can be found in (the introductory sections of) the papers:

A canonical form and the distribution of values of generalized polynomials by A. Leibman

Distribution of values of bounded generalized polynomials by V. Bergelson and A. Leibman

Pre-requisites: A course in logic, some analysis preferred but not required

Truthful scheduling for graphs Elias Koutsoupias Algorithms and Complexity Theory B C MSc The aim of the project is to advance our understanding of the limitations of mechanism design for the scheduling problem, the "holy grail" of algorithmic mechanism design. Specifically, we will consider the graphical scheduling problem, in which every task can only be allocated to one of two machines, and study the approximation ratio of mechanisms for this setting. The aim is to prove both lower and upper bounds. Both directions are hard problems, and we plan to try to gain some understanding by experimentally searching for lower bounds or trying to verify potentially useful statements about the upper bound. Of particular importance is the case of trees and their special case of stars, i.e., when every task can be given either to the root or to a particular leaf. We plan to study not only the standard objective of the makespan, but the more general class of objectives in which the mechanism minimizes the L^p norm of the times of the machines. The case L^infinity is to minimize the makespan, L^1 is to minimize the welfare, and the case L^0 corresponds to the Nash Social Welfare problem, all of which are interesting problems. Further possible directions include considering fractional scheduling and mechanisms without money. Bibliography: George Christodoulou, Elias Koutsoupias, Annamária Kovács: Truthful Allocation in Graphs and Hypergraphs. ICALP 2021: 56:1-56:20 (https://arxiv.org/abs/2106.03724)
Probabilistic Model Checking Marta Kwiatkowska Automated Verification B C

Professor Marta Kwiatkowska is happy to supervise projects involving probabilistic modelling, verification and strategy synthesis. This is of interest to students taking the Probabilistic Model Checking course and/or those familiar with probabilistic programming.

Below are some concrete project proposals, but students’ own suggestions will also be considered:

  • Synthesis of driver assistance strategies in semi-autonomous driving. Safety of advanced driver assistance systems can be improved by utilising probabilistic model checking. Recently (http://qav.comlab.ox.ac.uk/bibitem.php?key=ELK+19) a method was proposed for correct-by-construction synthesis of driver assistance systems. The method involves cognitive modelling of driver behaviour in ACT-R and employs PRISM. This project builds on these techniques to analyse complex scenarios of semi-autonomous driving such as multi-vehicle interactions at road intersections.
  • Equilibria-based model checking for stochastic games. Probabilistic model checking for stochastic games enables formal verification of systems where competing or collaborating entities operate in a stochastic environment. Examples include robot coordination systems and the Aloha protocol. Recently (http://qav.comlab.ox.ac.uk/papers/knps19.pdf) probabilistic model checking for stochastic games was extended to enable synthesis of strategies that are subgame perfect social welfare optimal Nash equilibria, soon to be included in the next release of PRISM-games (www.prismmodelchecker.org). This project aims to model and analyse various coordination protocols using PRISM-games.

  • Probabilistic programming for affective computing. Probabilistic programming facilitates the modelling of cognitive processes (http://probmods.org/). In a recent paper (http://arxiv.org/abs/1903.06445), a probabilistic programming approach to affective computing was proposed, which enables cognitive modelling of emotions and executing the models as stochastic, executable computer programs. This project builds on these approaches to develop affective models based on, e.g., this paper (http://qav.comlab.ox.ac.uk/bibitem.php?key=PK18).
Safety Assurance for Deep Neural Networks Marta Kwiatkowska Automated Verification B C

Professor Marta Kwiatkowska is happy to supervise projects in the area of safety assurance and automated verification for deep learning, including Bayesian neural networks. For recent papers on this topic see  http://qav.comlab.ox.ac.uk/bibitem.php?key=WWRHK+19, http://qav.comlab.ox.ac.uk/bibitem.php?key=RHK18 and http://qav.comlab.ox.ac.uk/bibitem.php?key=CKLPPW+19, and also https://www.youtube.com/watch?v=XHdVnGxQBfQ.

Below are some concrete project proposals, but students’ own suggestions will also be considered:

  • Robustness of attention-based sentiment analysis models to substitutions. Neural network models for NLP tasks such as sentiment analysis are susceptible to adversarial examples. In a recent paper (https://www.aclweb.org/anthology/D19-1419/) a method was proposed for verifying robustness of NLP tasks to symbol and word substitutions. The method was evaluated on CNN models. This project aims to develop similar techniques for attention-based NLP models (www-nlp.stanford.edu/pubs/emnlp15_attn.pdf).
  • Attribution-based safety testing of deep neural networks. Despite the improved accuracy of deep neural networks, the discovery of adversarial examples has raised serious safety concerns. In a recent paper (http://qav.comlab.ox.ac.uk/bibitem.php?key=WWRHK+19) a game-based method was proposed for robustness evaluation, which can be used to provide saliency analysis. This project aims to extend these techniques with the attribution method (http://arxiv.org/abs/1902.02302) to produce a methodology for computing the causal effect of each feature and evaluate it on image data.
  • Uncertainty quantification for end-to-end neural network controllers. NVIDIA has created a deep learning system for end-to-end driving called PilotNet (http://devblogs.nvidia.com/parallelforall/explaining-deep-learning-self-driving-car/). It inputs camera images and produces a steering angle. The network is trained on data from cars being driven by real drivers, but it is also possible to use the Carla simulator. In a recent paper (http://arxiv.org/abs/1909.09884) a robustness analysis with statistical guarantees for different driving conditions was carried out for a Bayesian variant of the network. This project aims to develop a methodology based on these techniques and semantic transformation of weather conditions (see http://proceedings.mlr.press/v87/wenzel18a/wenzel18a.pdf) to evaluate the robustness of PilotNet or similar end-to-end controllers in a variety of scenarios.
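As a warm-up for the adversarial-examples theme running through these proposals, here is a minimal FGSM sketch in PyTorch. FGSM is a standard baseline attack from the literature, not the game-based or attribution methods cited above; the model, inputs, labels and budget are whatever the chosen project settles on:

```python
import torch

def fgsm(model, x, y, eps, loss_fn=torch.nn.functional.cross_entropy):
    """Fast Gradient Sign Method: perturb each input feature by +/- eps
    in the direction that increases the loss. Assumes inputs scaled to
    [0, 1], as for typical image data."""
    x_adv = x.clone().detach().requires_grad_(True)
    loss = loss_fn(model(x_adv), y)
    loss.backward()                         # gradient w.r.t. the input
    with torch.no_grad():
        x_adv = x_adv + eps * x_adv.grad.sign()
    return x_adv.clamp(0.0, 1.0).detach()   # stay in the valid input range
```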

 

An Autopsy Windows registry analysis plugin Harjinder Lallie, Michael Goldsmith Security B

You will develop an Autopsy Forensic Browser plugin. The report will highlight all installed programs with dates, user account data, and other important temporal information. On the face of it this is a reasonably straightforward task; however, we will seek to enrich the functionality, for example: 1) a comparative analysis of the Windows 10 and the brand-new Windows 11 registries; 2) identifying programs that were once installed on the machine; 3) providing intelligence on devices that were once attached.

Project preparation: If you are considering adopting this project, you should spend some time learning Netbeans IDE, learning how to compile Autopsy Forensic Browser, and strengthening your Java skills.

References: Singh, A., Venter, H.S. and Ikuesan, A.R., 2020. Windows registry harnesser for incident response and digital forensic analysis. Australian Journal of Forensic Sciences, 52(3), pp.337-353. Shaaban, A. and Abdelbaki, N., 2018. Comparison study of digital forensics analysis techniques; Findings versus resources. Procedia Computer Science, 141, pp.545-551.

An Autopsy drone analysis plugin Harjinder Lallie, Michael Goldsmith Security B

Law enforcement investigations increasingly rely on the evidence generated on drone devices. Although there is an abundance of published research on the topic, there are no accepted drone forensics tools. You will develop a drone forensics tool which integrates with Autopsy Forensic Browser. To do that, you will need to use the Netbeans IDE environment and be able to program in Java. You will be provided with some code developed by previous project students. The tool you develop {processes, extracts} evidence from {a drone make/model, a small range of 2-3 makes/models} which records {Android, IOS, BIN} evidence and {presents all the evidence to the investigator in a graphical format, presents the evidence to the investigator to then be investigated using third-party tools}. Elements in curly brackets are to be decided. You will be provided with access to a set of drone forensic images. I might (exceptionally) be interested in a non-Autopsy-based tool.

Project preparation: If you are considering adopting this project, you should spend some time learning Netbeans IDE, learning how to compile Autopsy Forensic Browser, and strengthening your Java skills.

References: Bouafif, H., Kamoun, F., Iqbal, F. and Marrington, A., 2018, February. Drone forensics: challenges and new insights. In 2018 9th IFIP International Conference on New Technologies, Mobility and Security (NTMS) (pp. 1-6). IEEE. Al-Dhaqm, A., Ikuesan, R.A., Kebande, V.R., Razak, S. and Ghabban, F.M., 2021. Research challenges and opportunities in drone forensics models. Electronics, 10(13), p.1519.

Applying social network analysis in hard disk based digital investigations: An Autopsy plugin Harjinder Lallie, Michael Goldsmith Security B

Connections between entities in a forensic case can be analysed using social networking theory such as degree centrality, betweenness centrality, closeness centrality, and EigenCentrality. This is a novel way of analysing forensic cases, and can reveal better and more intelligent insights into a case than traditional methods. Although social network analysis has been applied in numerous other fields, there is little research on applying it in digital investigations of artefacts seized from a crime scene. In this project, you will write a Netbeans/Java based tool which analyses email communications in a forensics case. Using the emails, you will develop social networking graphs aimed at helping forensic investigators identify key persons in a case.
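The deliverable is a Netbeans/Java plugin, but the analysis itself is easy to prototype first. A sketch using the networkx Python library, with a made-up list of (sender, recipient) pairs standing in for emails recovered from a mailbox:

```python
import networkx as nx

# Hypothetical email edges; in the real tool these would be parsed
# from the mailboxes in the forensic image.
emails = [("alice@x.com", "bob@x.com"), ("alice@x.com", "carol@x.com"),
          ("bob@x.com", "carol@x.com"), ("dave@x.com", "alice@x.com")]
G = nx.Graph()  # direction could be kept with nx.DiGraph()
G.add_edges_from(emails)

# The centrality measures named above, ranking accounts by importance.
print(nx.degree_centrality(G))
print(nx.betweenness_centrality(G))
print(nx.eigenvector_centrality(G, max_iter=1000))
```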

Project preparation: If you are considering adopting this project, you should spend some time learning Netbeans IDE, learning how to compile Autopsy Forensic Browser, and strengthening your Java skills.

References: Freeman, L., 2004. The development of social network analysis. A Study in the Sociology of Science, 1(687), pp.159-167. Scott, J., 2012. What is social network analysis? (p. 114). Bloomsbury Academic.

Dashcam analysis Harjinder Lallie B C

There exist no current guidelines and very few tools to aid the investigation of a dashcam device, particularly for the purpose of extracting and mapping geospatial data contained therein. This project aims to extract, map and chart geospatial data from a dashcam device and provide insights into the routes taken and speeds driven. You will be provided with a number of dashcam forensic images (.E01 files, which can easily be extracted back to MOV/MP4 files). You will be expected to develop the solution using Python and, if possible, integrate it with Autopsy, an open-source digital forensic tool. You might consider using an open-source tool such as ExifTool to process the EXIF data. Alternatively, you could take this project in a somewhat different direction by extracting watermark data from the dashcam footage and mapping that instead.

To aid in understanding the background to this project, please see the reference below and the video here: https://warwick.ac.uk/fac/sci/wmg/mediacentre/events/dfls/previousevents/dashcamforensics/

Lallie, H.S., 2020. Dashcam forensics: A preliminary analysis of 7 dashcam devices. Forensic Science International: Digital Investigation, 33, p.200910.

Prerequisite. You may want to use tools such as ExifTool, an open-source EXIF data extraction tool. Although ExifTool might serve your needs, you will most probably want to develop an excellent knowledge of the EXIF standard. Additional support can be provided through access to specific elements of my digital forensics course at the University of Warwick, in the form of recorded lectures; this comprises around 10 hours of learning. I will also provide you with sufficient background in the topic before you begin the study. You will need good programming skills, preferably in Python.
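For a feel of the extraction step, here is a hedged sketch that shells out to ExifTool from Python. The `-ee` flag asks ExifTool to walk embedded streams, where many dashcams store per-frame GPS fixes; tag names vary by manufacturer, so the tags requested below are indicative only, and `journey.MP4` is a placeholder:

```python
import json
import subprocess

def gps_track(video_path):
    """Pull embedded GPS records out of dashcam footage via ExifTool,
    returning the parsed JSON output."""
    out = subprocess.run(
        ["exiftool", "-json", "-ee", "-GPSLatitude", "-GPSLongitude",
         "-GPSSpeed", "-GPSDateTime", video_path],
        capture_output=True, text=True, check=True).stdout
    return json.loads(out)

# for fix in gps_track("journey.MP4"): print(fix)
```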

Keyword searching audio/video files Harjinder Lallie B C Digital forensic investigators often search for the existence of keywords on hard disks or other storage media. Keywords are easily searchable in PDF/Word/text/other popular formats; however, current digital forensic tools do not allow for keyword searching through movies/audio. This is essential in cases which involve dashcam footage, recorded conversations etc. The aim of this project is to produce a tool which auto-transcribes audio data and then performs a keyword search on the transcribed file, pinpointing the point(s) in the file where the keyword(s) appear. You will be expected to develop the solution using Python and, if possible, integrate the solution with Autopsy, an open-source digital forensic tool. Prerequisite. Additional support can be provided through access to specific elements of my digital forensics course at the University of Warwick in the form of recorded lectures; this comprises around 10 hours of learning. You are likely to use the Python SpeechRecognition and possibly the PyAudio libraries. For convenience and demonstration of practicality, you may want to integrate the solution with the open-source forensics tool Autopsy, and hence will need to develop a good understanding of this tool, particularly its keyword search facility.
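A minimal sketch of the transcribe-then-search idea, using the SpeechRecognition library mentioned in the project. The chunk length, the choice of recogniser, and the WAV input are illustrative assumptions; they give only a rough per-chunk timestamp, which the real tool would refine:

```python
import speech_recognition as sr

def keyword_hits(wav_path, keywords, chunk_s=30):
    """Transcribe an audio file in fixed-length chunks and report the
    chunk offset (in seconds) at which each keyword occurs."""
    r = sr.Recognizer()
    hits = []
    with sr.AudioFile(wav_path) as source:
        offset = 0
        while True:
            audio = r.record(source, duration=chunk_s)  # next chunk
            if not audio.frame_data:                    # end of file
                break
            try:
                text = r.recognize_google(audio).lower()
            except sr.UnknownValueError:                # unintelligible chunk
                text = ""
            hits += [(kw, offset) for kw in keywords if kw.lower() in text]
            offset += chunk_s
    return hits

# print(keyword_hits("interview.wav", ["crash", "junction"]))
```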
Reasoning with Causality in Digital Forensic Cases. Harjinder Lallie B C Unlike traditional 'wet' forensics, digital forensics is heavily reliant on circumstantial evidence. Quite often a digital investigator is tasked with outlining the sequence of events that took place on a computer system to establish causality and a chain of events. These event sequences are reasonably straightforward to map conceptually; however, there is no solution which 'auto-links' these events. This project applies probabilistic techniques, using an attack tree (or alternative) methodology, to determine probable event sequences from sequence fragments available on a computer system. You will be supplied with a digital forensic image of a used hard disk, or you may choose to develop your own. You will most likely organize event sequences into categories (e.g. software installation, execution, file download, browsing etc.) and then test your proposals on further hard disk images, possibly supplementing your findings with qualitative feedback from law enforcement experts (I can connect you). Prerequisite. Knowledge of Autopsy, an open-source forensic tool. Additional support will be given, if required, through access to my online wiki of self-guided tutorials and to specific elements of my digital forensics course at the University of Warwick in the form of recorded lectures; this comprises around 10 hours of learning. I will also provide you with sufficient background in the topic before you begin the study. You will need good programming skills, preferably in Python.

Detecting ship misbehaviour through SAR satellite imagery and RF signal analysis Ivan Martinovic, Simon Birnbach Security B C MSc

A key challenge for maritime security is automatically detecting and classifying ships. This is essential so that government and law enforcement agencies know where ships are and what they are up to. This way they can combat key maritime threats such as piracy, unsustainable fishing, pollution, or the smuggling of illicit goods. Due to limitations of the two main technologies used for maritime security—the automatic identification system (AIS) and synthetic aperture radar (SAR) satellite imagery—current solutions are inaccurate and often require manual intervention. AIS is a cooperative tracking system that provides precise and frequent updates of a vessel’s location and identity. But, due to the cooperative nature of AIS, ships can choose to stop participating in the system or even falsify messages that mask their location or identity. As a non-cooperative system, SAR can detect ships even if they want to remain hidden. However, SAR has its own disadvantages, including low image resolutions and infrequent updates of each location.

Students may choose to tackle this problem in one of three possible ways:

• By improving ship detection and classification systems

• By developing a transmitter fingerprinting system for AIS messages

• By leveraging RF signals to localise AIS messages

Prerequisites: This project will require knowledge in machine learning, data analysis, and a good grasp of python.

References:

For the ship classification sub project:

[1] Fernando Paolo, et al. xview3-sar: Detecting dark fishing activity using synthetic aperture radar imagery. Advances in Neural Information Processing Systems, 35, 2022.

[2] Xiyue Hou, et al. Fusar-ship: Building a high-resolution sar-ais matchup dataset of gaofen-3 for ship detection and recognition. Science China Information Sciences, 63, 2020.

For the transmitter fingerprinting sub project:

[1] Joshua Smailes et al. Watch this space: securing satellite communication through resilient transmitter fingerprinting. Conference on Computer and Communications Security, ACM, 2023. https://ora.ox.ac.uk/objects/uuid:6d23ae00-8a25-434a-952d-0908cc9a3b89

[2] Qi Jiang et al. Rf fingerprinting identification in low snr scenarios for automatic identification system. IEEE Transactions on Wireless Communications, 23:3, 2024.

For the RF localisation sub project:

[1] Eric Jedermann, et al. Orbit-based authentication using tdoa signatures in satellite networks. In Proceedings of the 14th ACM Conference on Security and Privacy in Wireless and Mobile Networks, 2021.

EV Charging Testbed Ivan Martinovic Security B C MSc

Co-supervised by the Systems Security Lab. Right now, there is a format war between methods of charging electric vehicles (EVs). One of the most developed systems is the Combined Charging System (CCS). It is underpinned by an international standard called ISO 15118, which describes a complex web of communication that allows vehicle charging to take place. The car talks to the charger it is connected to, in order to report status and place requests for current. The charger, in turn, talks to the backend of the company that runs the charging site (the so-called charging provider), in order to authorise charging sessions and report any errors. It may also talk to a site controller at the same physical location, which is responsible for coordinating multiple chargers and ensuring capacity is shared appropriately between them. The charging provider's backend may federate to other charging providers to permit interoperability between networks. In the near future, it is also expected that the charging provider will be able to talk to energy companies for automated billing, or even to allow electricity to flow the other way -- from vehicle-to-grid (V2G).

Exploring the operations and security of this interconnected system is crucially important while it is still being developed, to avoid building insecure infrastructure that we have to live with for many years to come. A number of problems have already been discovered (see links below) and many more are still to come. Having access to realistic testbeds will accelerate this work substantially, by allowing researchers worldwide to evaluate ideas on a small scale, without having to gain access to costly real-world infrastructure.

Our lab has already built a physical testbed for the car-to-charger communication, and successfully attacked it. Other researchers have built a system that can simulate multiple such links, but only virtually. Neither of these existing testbeds implements the full system end-to-end, particularly the backend or federation aspects, so it is hard to understand what all the effects of an attack would be.

In this project, a student would work towards extending the existing testbed implementations to integrate backend functionality and therefore simulate the whole charging ecosystem. When complete, a user could create a virtual charging provider and register some customers, spawn charging sites with chargers, spawn vehicles -- and then simulate full charging sessions. In the ideal case, the implementation would be containerised so that it can be deployed on real hardware, in virtual machines, or a combination of both. There is a lot of scope in this project, and it is not expected that all of it would be implemented in any one year. In fact, there are many parts to suit different interests, across system development and scientific investigation.

Meaningful contributions might include:

  • Build a working, low-power CCS charger for a toy car
  • Containerise and integrate several open projects into a connected system
  • Implement functionality for a charging provider as an open-source reference
  • Model the PKI operations described in ISO 15118
  • Investigate the capability of attackers at various stages of the charging process
  • Make something crazy, like a car-to-car charger with offline crypto payments

Some research links:

  • Our first paper on CCS (testbed in appendix): https://www.usenix.org/conference/usenixsecurity19/presentation/baker
  • The virtual testbed from Padua: https://spritz.math.unipd.it/projects/MiniV2G/
  • An overview of CCS: https://en.wikipedia.org/wiki/Combined_Charging_System

Some open implementations of parts of the ecosystem:

  • OpenV2G: http://openv2g.org/
  • RISE-V2G: https://github.com/SwitchEV/RISE-V2G
  • OCPP: https://www.openchargealliance.org/
Key Management: Evaluating Network Load and Storage Requirements Ivan Martinovic, Joshua Smailes Security B C MSc

In large satellite networks, particularly interplanetary networks, key management is currently an unsolved problem - pre-shared keys become infeasible due to the large number of nodes in the network, and PKI is made more difficult due to the long distances and intermittent connectivity between nodes.

Recent work within our group has made use of a network simulator to test the suitability of terrestrial PKI to large-scale satellite systems, finding that it can be used with a small number of modifications. This project will seek to extend this work by implementing additional assessment criteria to the network simulator. Current simulations focus on connection establishment time and the time taken for revocation messages to cover the entire network - this will be extended to add storage and network load measurement capabilities to the simulator, showing that it is possible to use protocols that are not only faster but also require less space and network load.

Prerequisites: This project will require experience with network protocols, key management, and a good grasp of Python.

Useful URLs https://arxiv.org/abs/2408.10963v3

Key Management: Formal Verification Ivan Martinovic, Joshua Smailes Security B C MSc

In large satellite networks, particularly interplanetary networks, key management is currently an unsolved problem - pre-shared keys become infeasible due to the large number of nodes in the network, and PKI is made more difficult due to the long distances and intermittent connectivity between nodes. Recent work within our group makes use of a network simulator to test the suitability of terrestrial PKI to large-scale satellite systems, finding that it can be used with a small number of modifications. This project would seek to extend this work with the aid of formal verification techniques, seeking to formally establish convergence properties, time bounds, and the ordering of actions for a given network topology and PKI system.

Prerequisites: This project will require an understanding of formal verification techniques, network protocols, and a basic understanding of key management/revocation techniques.

Useful URLs https://arxiv.org/abs/2408.10963v3

Securing Satellite Communication through Transmitter Fingerprinting Ivan Martinovic, Joshua Smailes Security B C MSc

Existing work has shown physical-layer fingerprinting to be an effective technique for authenticating satellite communication, particularly in the absence of cryptographic authentication. In fingerprinting, physical-layer characteristics of the signal are observed in order to identify the transmitter and distinguish legitimate transmitters from attacker-controlled radios.

In this project a student will extend an existing satellite fingerprinting system, which has been primarily evaluated using the Iridium satellite constellation. The student will extend the system to work with additional satellite systems, assessing the transferability of the techniques. This will require the capture of a new dataset of satellite communication from a different constellation, which may include building a decoder for the messages, depending on availability of open-source decoders. An extension of the work might involve the development of new techniques to work better across multiple satellite systems.

Prerequisites: This project will require experience with machine learning (specifically TensorFlow), and an understanding of signal modulation and coding (the project will likely require building a message decoder). Experience with software-defined radios is preferable but not required.

Useful URLs: See also https://arxiv.org/pdf/2305.06947.pdf (current work) https://github.com/ssloxford/SatIQ

Securing the Electric Vehicle Charging Communication Ivan Martinovic, Sebastian Köhler, Marcell  Szakaly Security B C MSc

Our recent research showed that the most widely adopted Electric Vehicle (EV) charging standard, known as the Combined Charging System (CCS), is insecure and vulnerable to attacks, affecting millions of EVs worldwide [1, 2]. An attacker can wirelessly eavesdrop on the communication or inject their own signals to disrupt it. Because EVs are an integral part of the power grid and nation-critical infrastructure, any attack on them could have serious consequences, such as blackouts.

In this project, the student will investigate potential countermeasures to detect and prevent attacks. This will involve extensive testing on real hardware to evaluate performance under realistic conditions. This may also involve collaborative experimentation with our industry partners.

Prerequisites: Some familiarity in the area of digital signal processing and with Python. Knowledge of FPGAs would be an advantage, but is not required.

Useful URLs:
https://github.com/ssloxford/brokenwire
https://gitlab.com/rbaker/hpgp-emis-rx

References:

[1] Baker and Martinovic. "Losing the car keys: Wireless PHY-layer insecurity in EV charging." USENIX, 2019.

[2] Köhler et al. "Brokenwire: Wireless disruption of CCS electric vehicle charging." Network and Distributed System Security (NDSS) Symposium, 2023.

Security Evaluation of Power-Line Communication Ivan Martinovic, Sebastian Köhler, Marcell  Szakaly Security B C MSc

Power-Line Communication (PLC) has seen wide adoption in power grid and critical infrastructure applications in recent years, for example in the interconnection of smart meters. Another example is the Combined Charging System (CCS), one of the most widely adopted DC fast-charging standards for Electric Vehicles (EVs), which uses PLC for the communication between the vehicle and the charging station. This communication channel is used to exchange safety-critical information, such as battery temperature, maximum charging voltage and current, and state of charge.

Unfortunately, our recent research has shown that the PLC communication used by CCS is vulnerable to wireless attacks on the physical layer [1, 2]. We demonstrated that an adversary can eavesdrop on the communication and showed that the charging communication can easily be disrupted. Given the nature of PLC and its tendency to crosstalk, both attacks can be conducted wirelessly and from a distance.


In this project, the student will explore PLC security in different contexts, such as smart homes and smart metering infrastructure. The focus will be on adapting and replicating wireless attacks from previous work.


Prerequisites: Some familiarity in the area of digital signal processing and with Python.

Useful URLs: https://github.com/ssloxford/brokenwire
https://gitlab.com/rbaker/hpgp-emis-rx


References:

[1] Baker and Martinovic. "Losing the car keys: Wireless PHY-layer insecurity in EV charging." USENIX, 2019.

[2] Köhler et al. "Brokenwire: Wireless disruption of CCS electric vehicle charging." Network and Distributed System Security (NDSS) Symposium, 2023.

Signal Injection Attacks Against Modern Sensors Ivan Martinovic, Sebastian Köhler Security B C MSc

In recent years, the boundaries between the physical and the digital world have become increasingly blurry. Nowadays, many digital systems interact in some way with the physical world. Large and complex cyber-physical systems, such as autonomous and electric vehicles, combine the physical and the digital world and enable the interaction between those two domains. Usually, such systems are equipped with numerous sensors to measure physical quantities, such as temperature, pressure, light, and sound. These physical quantities are vital inputs for the computations and can influence the decision-making process of the system.

However, the nature of an analog sensor makes it difficult to authenticate the physical quantity that triggered a stimulus [1]. For instance, a temperature sensor cannot detect whether the stimulus was caused by a legitimate temperature increase or by an adversary using a hairdryer. This is a major concern because the integrity of sensor measurements is critical to ensuring that a system behaves as intended, and a violation of this principle can have serious security, safety and reliability consequences. In our research, we have shown that different sensors are vulnerable to signal injection attacks on the physical layer [2, 3, 4].

In this project, a student would analyse the vulnerability of sensors as they are used in modern systems, such as cars, the smart grid and IoT devices. The project will enable the student to research signal injection attacks using different modalities, such as light, acoustic and electromagnetic waves. Moreover, the student will be able to assess the impact of a successful attack against a system as a whole and work on novel countermeasures that can help to improve the security of the next generation of systems.

Prerequisites: Some familiarity in the area of digital signal processing and with Python.

Useful URLs:
https://github.com/ssloxford/ccd-signal-injection-attacks
https://github.com/ssloxford/they-see-me-rollin
https://arxiv.org/pdf/2305.06901

References:

[1] Kune, Denis Foo, et al. "Ghost talk: Mitigating EMI signal injection attacks against analog sensors." 2013 IEEE Symposium on Security and Privacy. IEEE, 2013.

[2] Köhler, Sebastian, Richard Baker, and Ivan Martinovic. "Signal injection attacks against CCD image sensors." Proceedings of the 2022 ACM on Asia Conference on Computer and Communications Security. 2022.

[3] Köhler, Sebastian, et al. "They See Me Rollin': Inherent Vulnerability of the Rolling Shutter in CMOS Image Sensors." Annual Computer Security Applications Conference. 2021.

[4] Szakály, Marcell, et al. "Assault and Battery: Evaluating the Security of Power Conversion Systems Against Electromagnetic Injection Attacks." arXiv preprint arXiv:2305.06901 (2023).

SparSDR++: Wideband Monitoring and Discovery of Communication Ivan Martinovic, Sebastian Köhler, Joshua Smailes Security B C MSc

The goal of this project is to extend the "SparSDR" toolset to make it more useful for passive wideband monitoring and discovery of communication.

Software Defined Radios (SDRs) have been used extensively in recent years to monitor radio communication. These devices can tune a wide range of frequencies (typically up to 6GHz) and can reach even wider frequencies with appropriate up/down-converters. However, their bandwidth is often limited by both the sample rate of the chip and the USB/ethernet link back to the main PC.

The SparSDR project aims to fix the latter of these problems by using the onboard FPGA provided by some SDRs to perform power triggering on the device itself. This enables the SDR to monitor a wide bandwidth over a limited-rate backhaul, by only sending back the samples containing potentially meaningful data.

This project will involve work with the following technologies:

  • SDRs, GNURadio
  • FPGA development (Xilinx, Vivado)
  • High-performance C++

Prerequisites: Students taking on the project should be experienced with some of these areas, and able/willing to learn the others.

References:
  • SparSDR paper: https://dl.acm.org/doi/pdf/10.1145/3307334.3326088
  • SparSDR source code: https://github.com/ucsdsysnet/sparsdr
  • GNU Radio: https://www.gnuradio.org/

Transport Layer Security for Satellite Networks Ivan Martinovic, Simon Birnbach Security B C MSc

Co-supervised by the Systems Security Lab.

Long-range satellite communications networks suffer from high network latencies due to the long distances to satellites. These high latencies have a detrimental effect on the performance of common protocols used for internet traffic, such as the Transmission Control Protocol (TCP). The tools currently in wide use to optimise TCP performance are incompatible with the encrypted traffic of current VPNs. This has led many operators to resort to providing their communication services unencrypted, leaving customers exposed to eavesdropping attacks.

In our research group, we have developed QPEP [1], a novel combination of a VPN and a satellite performance-enhancing proxy, which enables the use of encrypted traffic over satellite links without the usual performance drawbacks.

This project would improve on this work in one or more of the following ways:

  • Low Earth Orbit Evaluation
  • Packet Loss Resilience
  • Scalability & Multi-user Environments

This project is in collaboration with the European Space Agency.

References

[1] Pavur, J. C., et al. "QPEP: An actionable approach to secure and performant broadband from geostationary orbit." (2021).
https://github.com/ssloxford/qpep
https://www.ndss-symposium.org/wp-content/uploads/2021-074-paper.pdf

Applied Formal Verification Tom Melham Automated Verification B C MSc

I am happy to supervise projects at all levels in applied formal verification. These typically involve hands-on investigation and innovation to develop new, practical methods of correctness analysis and verification for various kinds of computer systems – either hardware, software, or a combination. They are ideally suited to students who have an interest in computer systems, logic, and hands-on practical work that has a solid theoretical basis. You don't have to have taken the course in Computer-Aided Formal Verification if you are willing to learn the theories and technologies required.

All my projects are designed to have a strong element of research and sufficient challenge to allow a motivated student to make an excellent contribution. Projects are usually therefore quite ambitious, but they are also designed realistically to fit into the time available. And they always have some fall-back options that are less challenging but can still result in an excellent achievement.

Rather than offer ready-made project ideas, I encourage students with an interest in this area to meet me and discuss together what might align best with their background and interests. I always have several project ideas that link to my current research or research being done by my group. Often projects will have a connection to real-world verification problems in industry, and many of my students will have contact through their projects with leading verification researchers and engineers in industry.

If interested to discuss, please contact me.

Developing computational tools to aid the design of CRISPR/Cas9 gene editing experiments Peter Minary Computational Biology and Health Informatics B C MSc

At present, the most versatile and widely used gene-editing tool is the CRISPR/Cas9 system, which is composed of a Cas9 nuclease and a short oligonucleotide guide RNA (or guide) that guides the Cas9 nuclease to the targeted DNA sequence (on-target) through complementary binding.  There are a large number of computational tools to design highly specific and efficient CRISPR/Cas9 guides, but there is great variation in performance and a lack of consensus among the tools.  We aim to use ensemble learning to combine the strengths of a selected set of guide design tools, to reach performance superior to any single method in correctly predicting the efficiency of guides for which experimental efficiency data are available.

Recommended for students who have done the Machine Learning and the Probability and Computing courses.
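A hedged sketch of the ensemble idea with scikit-learn: each column holds the efficiency score that one existing guide-design tool assigns to a guide, and the stacked model learns how much to trust each tool. The data below are random placeholders standing in for real tool scores and measured efficiencies:

```python
import numpy as np
from sklearn.linear_model import RidgeCV
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
tool_scores = rng.random((500, 4))   # 500 guides x 4 hypothetical tools
# Placeholder "measured" efficiency: a noisy mix of the tool scores.
y = tool_scores @ [0.5, 0.3, 0.15, 0.05] + 0.05 * rng.standard_normal(500)

# Linear stacker over the tools' predictions, with cross-validated
# regularisation; prints the mean cross-validated R^2.
stacker = RidgeCV()
print(cross_val_score(stacker, tool_scores, y, cv=5).mean())
```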

 

Developing machine learning models for off-target prediction in CRISPR/Cas9 gene editing Peter Minary Computational Biology and Health Informatics B C MSc The CRISPR/Cas9 gene editing system is composed of a Cas9 nuclease and a short oligonucleotide guide RNA (or guide) that guides the Cas9 nuclease to the targeted DNA sequence (on-target) through complementary binding. However, the Cas9 nuclease may also cleave off-target genomic DNA sequences, which contain mismatches compared to the gRNA; undesired cleavage could therefore occur. The obvious factors influencing off-target cleavage activity of the CRISPR/Cas9 gene editing system are the sequence identities of the guide RNA and the off-target DNA. Various 'basic features' derived from these sequences have been fuelling the development of procedural and machine learning models for off-target cleavage activity prediction, but there are numerous 'non-basic features' (such as the sequence context around the off-target DNA) that may also influence off-target cleavage activity. The project will aim to develop novel off-target cleavage activity prediction models using approaches that include, but are not limited to, combining 'basic features' and 'non-basic features' to increase the accuracy of model predictions of experimental off-target cleavage activities. Prerequisites: Recommended for students who have done a Machine Learning course and have an interest in molecular biology.
Implementing a Datalog Reasoner Boris Motik Data, Knowledge and Action B The objective of this project would be for a student to implement a simple Datalog reasoner. Depending on the student's ambitions, the reasoner could run in main memory (easier version) or on disk (more challenging). The student would be expected to design the data structures needed to store the data (in RAM or on disk, as agreed with me). On top of this, the student would implement the seminaive algorithm and evaluate the system on a couple of medium-sized datasets. We have ready-made datasets in our group that could be used for evaluation. A variation of this project would have the student reuse an existing relational database to store the data; the seminaive algorithm would then be implemented either as an extension of the database (assuming the database is open source) or on top of the database (by running SQL queries). This last variant would arguably be much easier, as it would involve less design and more reuse of existing technologies. More advanced students could extend their system to implement various incremental reasoning algorithms. For example, I would give them one of the many papers I've written on this topic, and they would have to (a) understand the formal material and (b) implement and evaluate the algorithms. Hence, this project also gives students the opportunity to go as far as they can. Having attended the Databases or Database System Implementation courses, and perhaps to a lesser extent the KRR course, would be a prerequisite for doing this project.
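To fix ideas, here is a toy in-memory seminaive evaluation of the classic transitive-closure program; the project would generalise this to arbitrary rule sets and carefully designed data structures. The edge data is illustrative:

```python
def seminaive_tc(edges):
    """Seminaive evaluation of the reachability program
        path(X,Y) :- edge(X,Y).
        path(X,Z) :- path(X,Y), edge(Y,Z).
    Only tuples derived in the previous round (the 'delta') are joined
    against edge/2, which is the point of the seminaive algorithm."""
    path = set(edges)          # first rule seeds the relation
    delta = set(edges)
    while delta:
        # Join the delta with edge/2 and keep only genuinely new facts.
        new = {(x, z) for (x, y) in delta
               for (y2, z) in edges if y == y2} - path
        path |= new
        delta = new
    return path

print(sorted(seminaive_tc({(1, 2), (2, 3), (3, 4)})))
# [(1, 2), (1, 3), (1, 4), (2, 3), (2, 4), (3, 4)]
```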
Implementing a Tableaux Reasoner for Description Logics Boris Motik Data, Knowledge and Action B The objective of this project would be for a student to implement a simple tableau-based reasoner for description logics. The project would involve the student designing the core data structures, implementing the basic algorithm, realising backtracking, and implementing various optimisations typically used in practical reasoners. The student could use a preexisting API (e.g., the OWL API) to load a description logic ontology, so they could just focus on solving the core problem. The student would be expected to evaluate their implementation on the ontologies from the repository that we maintain in the KRR group (http://www.cs.ox.ac.uk/isg/ontologies/). The project seems to me to provide a good opportunity for the student to demonstrate how well they have absorbed the CS material (e.g., algorithms, data structures, computational complexity analysis, etc.) taught in our degrees. Also, the project is sufficiently open-ended that the student can go quite a long way before running out of things to do. Having attended the KRR course would be a prerequisite for doing this project.
Topics in Automata Theory, Program Verification and Programming Languages (including lambda calculus and categorical semantics) Andrzej Murawski Programming Languages C MSc

Prof Murawski is willing to supervise in the area of automata theory, program verification and programming languages (broadly construed, including lambda calculus and categorical semantics).

For a taste of potential projects, follow this link.

Concurrent Algorithms and Data Structures Hanno Nickau Quantum C MSc Projects in the area of Concurrent Algorithms and Data Structures
Concurrent Programming Hanno Nickau Quantum B C MSc Projects in the area of Concurrent Programming
Continual learning by surrogate objectives Yangchen Pan , Jiarui Gan Artificial Intelligence and Machine Learning MSc

A critical challenge in continual learning is catastrophic forgetting (CF), where the acquisition of new information leads to the erosion of previously learned knowledge. This phenomenon poses a substantial barrier, particularly in the context of updating large models, rendering the process computationally unscalable as data increase. CF is primarily attributed to biased gradients resulting from shifts in data sampling distribution over time. This project explores a computationally efficient approach to alleviate forgetting. The core idea involves identifying surrogate objective functions that circumvent the need for extensive memory and computational resources during optimization. The student undertaking this project will work on:

  1. Literature Review: delving into key papers on continual learning to gain insights into the intricacies of the problem and the experimental settings.
  2. Implementation: implementing the proposed method and conducting a comparative analysis against existing approaches, with a focus on both sample and computation efficiency.
Sample and Computation Efficient Online Adaptation through Offline Reinforcement Learning Yangchen Pan , Jiarui Gan Artificial Intelligence and Machine Learning MSc

In addressing real-world challenges, AI agents, often driven by expansive neural networks like Large Language Models (LLMs) such as GPT, face significant computational and sample-related costs during training and deployment. Notably, Reinforcement Learning (RL) agents frequently undergo training on vast offline datasets with relaxed computation budgets, followed by deployment or fine-tuning during an online stage that demands rapid computation. This project aims to investigate methods that capitalize on the disparity between offline and online computation budgets to streamline the training and deployment of RL agents.

The student undertaking this project will first delve into relevant literature by studying recommended papers. Subsequently, the student will implement offline RL methods to facilitate downstream online learning. In the third phase, the student will experiment with various online RL algorithm variants. The final algorithm will undergo rigorous empirical testing and comparison to validate its efficiency in handling the challenges posed by large models and varying computation budgets.

Deep Reinforcement Learning for High-Dimensional POMDPs David Parker, Nick Hawes Automated Verification MSc

Background

Recent work has demonstrated that Deep Reinforcement Learning (DRL) algorithms can achieve human-level control policies across various applications. This project will focus on developing and testing DRL methods specifically for Partially Observable Markov Decision Processes (POMDPs), where the agent must make decisions in environments with limited and noisy observations. A key challenge is ensuring that the algorithm remains robust in environments with high-dimensional observations.

Focus

The student undertaking this project will gain familiarity with POMDP definitions and relevant environments. The objective is to implement DRL-based POMDP algorithms capable of deriving robust solutions for high-dimensional observations. The student will explore the following techniques during the project:

• Dimensionality reduction using neural networks to compress high-dimensional observations into lower-dimensional latent representations (see the sketch after this list);

• Attention mechanisms to focus on the most relevant parts of high-dimensional observations for decision-making, linking these to specific beliefs;

• Processing observations in a hierarchical structure at different resolutions to improve computational efficiency;

• Designing specific loss functions that incorporate reconstruction, contrastive, and belief consistency terms to learn compact, task-relevant representations.
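
A loose sketch of the first technique, assuming PyTorch (dimensions are placeholders, not values from the project):

  import torch.nn as nn

  # Compress high-dimensional observations into a low-dimensional latent,
  # then aggregate the observation history into a belief with a GRU.
  class BeliefEncoder(nn.Module):
      def __init__(self, obs_dim=4096, latent_dim=32, hidden_dim=256):
          super().__init__()
          self.compress = nn.Sequential(
              nn.Linear(obs_dim, hidden_dim), nn.ReLU(),
              nn.Linear(hidden_dim, latent_dim))
          self.rnn = nn.GRU(latent_dim, latent_dim, batch_first=True)

      def forward(self, obs_seq):            # obs_seq: (batch, time, obs_dim)
          z = self.compress(obs_seq)         # per-step latent codes
          belief, _ = self.rnn(z)            # history-dependent belief states
          return belief                      # (batch, time, latent_dim)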

References:

• Igl M, Zintgraf L, Le T A, et al. Deep variational reinforcement learning for POMDPs[C]. International conference on machine learning. PMLR, 2018.

• Meng L, Gorbet R, Kulić D. Memory-based deep reinforcement learning for pomdps[C]. IEEE/RSJ international conference on intelligent robots and systems. IROS, 2021.

• Lauri M, Hsu D, Pajarinen J. Partially observable Markov decision processes in robotics: A survey[J]. IEEE Transactions on Robotics, 2022.

Model checking of POMDPs David Parker Automated Verification B C MSc Formal verification techniques have recently been developed for probabilistic models with partial observability, notably partially observable Markov decision processes (POMDPs), and implemented in software tools such as PRISM. This project will investigate extensions of these techniques, including for example the applicability of sampling based solution methods, adaptation to more expressive temporal logics or extension to partially observable stochastic games.
Model checking of stochastic games David Parker Automated Verification B C MSc This project will develop formal verification techniques for stochastic games, in particular by considering techniques based on game-theoretic notions of equilibria. A range of such techniques are already implemented in the PRISM-games model checker. This project will consider extensions of these approaches to tackling new types of equilibria, such as Stackelberg equilibria, with potential directions including designing extended temporal logics and solution methods, or modelling new applications, for example from multi-robot coordination.
Probabilistic Model Checking under Uncertainty David Parker Automated Verification B C MSc Formal methods for analysing models such as Markov chains and Markov decision processes can be extended to explicitly reason about model uncertainty, for example by building and analysing interval Markov decision processes. This project will investigate alternative approaches to tackling this problem, which could include alternative models of transition probability uncertainty, factoring in dependencies between different sources of uncertainty, or using Bayesian inference to learn model parameters.
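
For instance, value iteration for interval MDPs can be made robust by letting an adversary pick the worst transition probabilities consistent with the intervals at every step. A minimal sketch (array shapes are assumptions: P_lo and P_hi have shape states x actions x states, with P_lo summing to at most 1 per state-action pair):

  import numpy as np

  def worst_dist(p_lo, p_hi, values):
      # Adversary: put as much probability as the intervals allow on the
      # successors with the lowest current value estimates.
      p = p_lo.copy()
      budget = 1.0 - p.sum()
      for s in np.argsort(values):
          extra = min(p_hi[s] - p[s], budget)
          p[s] += extra
          budget -= extra
      return p

  def robust_value_iteration(P_lo, P_hi, R, gamma=0.95, sweeps=500):
      n_states, n_actions, _ = P_lo.shape
      V = np.zeros(n_states)
      for _ in range(sweeps):
          V = np.array([max(R[s, a] + gamma * worst_dist(P_lo[s, a],
                                                         P_hi[s, a], V) @ V
                            for a in range(n_actions))
                        for s in range(n_states)])
      return V
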
Building databases of mathematical objects in Sagemath (Python) Dmitrii Pasechnik B C MSc

There is an enormous amount of information on constructing various sorts of mathematical objects that are "interesting" in one way or another, e.g. block designs, linear and non-linear codes, Hadamard matrices, elliptic curves, etc. There is considerable interest in having this information available in computer-ready form. However, usually the only available form is a paper describing the construction, while no computer code (and often no detailed description of a possible implementation) is provided. This poses interesting algorithmic and software engineering challenges in creating verifiable implementations; properly structured and documented code, supplemented by unit tests, has to be provided, preferably in functional programming style (although performance is important too).


The Sagemath project aims in part to remedy this by implementing such constructions; see e.g. Hadamard matrices in Sagemath: http://doc.sagemath.org/html/en/reference/combinat/sage/combinat/matrices/hadamard_matrix.html and http://arxiv.org/abs/1601.00181.


The project will contribute to such implementations.
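
For a flavour of the simplest such construction, Sylvester's doubling construction of Hadamard matrices fits in a few lines (a plain NumPy sketch; the Sagemath versions are parameterised far more generally):

  import numpy as np

  def sylvester_hadamard(k):
      # Doubles the order k times: H -> [[H, H], [H, -H]].
      H = np.array([[1]])
      for _ in range(k):
          H = np.block([[H, H], [H, -H]])
      return H

  H = sylvester_hadamard(3)                  # order-8 Hadamard matrix
  assert (H @ H.T == 8 * np.eye(8)).all()    # defining property: H H^T = n I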

There might be a possibility for participation in Google Summer of Code (GSoC), with Sagemath as a GSoC organisation, and for being partially funded by the EU project "Open Digital Research Environment Toolkit for the Advancement of Mathematics" (http://opendreamkit.org/).

Prerequisites: Interest in open source software, some knowledge of Python, some maths background.

Computing with semi-algebraic sets in Sage (math) Dmitrii Pasechnik B C MSc

Semi-algebraic sets are subsets of R^n specified by polynomial inequalities. The project will extend the capabilities of Sage (http://www.sagemath.org) to deal with them, such as cylindrical algebraic decomposition (CAD) computations or sums-of-squares-based (i.e. semi-definite programming based) methods. There might be a possibility for participation in Google Summer of Code (GSoC) with Sage as a GSoC organisation.

Prerequisites

Interest in open source software, some knowledge of Python, appropriate maths background.

Automatic translation to GPGPU Joe Pitt-Francis Computational Biology and Health Informatics B C

This project involves running cardiac cell models on a high-end GPU card. Each model simulates the electrophysiology of a single heart cell and can be subjected to a series of computational experiments (such as being paced at particular heart rates). For more information about the science, and to see it in action on CPU, see the "Cardiac Electrophysiology Web Lab" at https://travis.cs.ox.ac.uk/FunctionalCuration/. An existing compiler (implemented in Python) is able to translate from a domain-specific XML language (http://models.cellml.org) into a C++ implementation. The goal of the project is to add functionality to the compiler in order to generate OpenCL or CUDA implementations of the same cell models, and thus to increase the efficiency of the "Web Lab".

General GPGPU and high performance computing projects Joe Pitt-Francis Computational Biology and Health Informatics B C

I am willing to supervise projects which fit into the general areas of General-Purpose Graphics Processing Unit (GPGPU) programming and High-Performance Computing (HPC). Specific technologies used are likely to be based around

  • NVIDIA CUDA for GPU programming;
  • MPI for distributed-memory cluster computing.

All application areas will be considered, although geometric algorithms are favourites of mine.

General graphics projects Joe Pitt-Francis Computational Biology and Health Informatics B C

I am interested in supervising general projects in the area of computer graphics.  If you have a particular area of graphics-related research that you are keen to explore then we can tailor a bespoke project for you.  Specific projects I have supervised in the past include

  • "natural tree generation" which involved using Lindenmayer systems to grow realistic looking bushes and trees to be rendered in a scene;
  • "procedural landscape generation" in which an island world could be generated on-the-fly using a set of simple rules as a user explored it;
  • "gesture recognition" where a human could control a simple interface using hand-gestures;
  • "parallel ray-tracing" on distributed-memory clusters and using multiple threads on a GPU card;
  • "radiosity modelling" used for analysing the distribution of RFID radio signal inside a building; and
  • "non-photorealistic rendering" where various models were rendered with toon/cel shaders and a set of pencil-sketch shaders.

(Such a project is generally not suitable for MSc students; for this option to work as a potential MSc project, it should be combined with a taught-course topic such as machine learning, concurrent programming, linguistics, etc.)

Graphics pipeline animator Joe Pitt-Francis Computational Biology and Health Informatics B

Pre-requisites: Computer graphics, Object-oriented programming

The idea behind this project is to build an educational tool which enables the stages of the graphics pipeline to be visualised. One might imagine the pipeline being represented by a sequence of windows; the user is able to manipulate a model in the first window and watch the progress of her modifications in the subsequent windows. Alternatively, the pipeline might be represented by an annotated slider widget; the user inputs a model and then moves the slider down the pipeline, watching an animation of the process.
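
A sketch of the stages such a tool would animate, for a single vertex (NumPy, with a standard OpenGL-style projection matrix; all values are illustrative):

  import numpy as np

  def perspective(fov_deg, aspect, near, far):
      f = 1.0 / np.tan(np.radians(fov_deg) / 2)
      return np.array([[f / aspect, 0, 0, 0],
                       [0, f, 0, 0],
                       [0, 0, (far + near) / (near - far),
                              2 * far * near / (near - far)],
                       [0, 0, -1, 0]])

  v_model = np.array([0.5, 0.5, -2.0, 1.0])   # object space, homogeneous
  model = view = np.eye(4)                    # identity, for simplicity
  v_clip = perspective(60, 16 / 9, 0.1, 100) @ view @ model @ v_model
  v_ndc = v_clip / v_clip[3]                  # perspective division
  x_px = (v_ndc[0] + 1) / 2 * 1920            # viewport transform
  y_px = (1 - (v_ndc[1] + 1) / 2) * 1080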

Intuitive exploration through novel visualisation Joe Pitt-Francis Computational Biology and Health Informatics B C I am interested in novel visualisation as a way to represent things in a more appealing and intuitive way. For example, the Gnome disk usage analyzer (Baobab) uses either a "ring chart" or a "treemap chart" representation to show us which sub-folders are using the most disk. In the early 1990s the IRIX file system navigator used a 3D skyscraper representation to show us similar information. There are plenty more ways of representing disk usage: from DAGs to centroidal Voronoi diagrams. What kind of representation is most intuitive for finding a file which is hogging disk space, and which is most intuitive for helping us to remember where something is located in the file-system tree? The aim is to explore other places where visualisation can provide intuition: for example, to visualise the output of a profiler to find bottlenecks in software, to visualise the output of a code coverage tool in order to check that test suites are testing the appropriate functionality, or even to visualise the prevalence of diabetes and heart disease in various regions of the country.
The fundamental problem of counterfactual estimation Francesco Quinzan, Mark van der Wilk Artificial Intelligence and Machine Learning MSc

This project deals with the fundamental problem of counterfactual estimation. Counterfactual estimation (CE) refers to the process of estimating or predicting what would have happened in a given situation if different actions or events had taken place. Counterfactual estimation is particularly vital in healthcare due to the complex and interconnected nature of biological systems and the need to understand the true causes behind diseases, treatment effects, and patient outcomes.

Although CE holds tremendous promise and potential, estimating counterfactuals remains a significant challenge. Recently, transformer-type algorithms have emerged as a powerful tool for the related problem of zero-shot treatment effect estimation. In this project, we will look into transformer-type algorithms for CE. The work will consist of: (i) reading and understanding [1]; (ii) extending the framework of [1] to counterfactual estimation from interventional data; (iii) implementing and performing experiments with a base model for this extended framework.

[1] Towards Causal Foundation Model: on Duality between Causal Inference and Attention. Jiaqi Zhang et al. CoRR abs/2310.00809 (2023)

Pre-requisites: A student suitable for this project will work with causal inference, attention mechanisms, and possibly the basics of reproducing kernel Hilbert spaces and feature maps. A background on any of these topics is desirable. Coding skills are required.


Statistical shape atlas of cardiac anatomy Blanca Rodriguez, Pablo Lamata Computational Biology and Health Informatics MSc

Description: Cardiac remodelling is the change in the shape of the anatomy due to disease processes. 3D computational meshes encode shape variation in cardiac anatomy, and provide higher diagnostic value than conventional geometrical metrics. Better shape metrics will enable earlier detection, more accurate stratification of disease, and more reliable evaluation of remodelling response. This project will contribute to the development of a toolkit for the construction of anatomical atlases of cardiac anatomy, and its translation to clinical adoption. The student will learn about the challenges and opportunities of the cross-disciplinary field between image analysis, computational modelling and cardiology, and be part of a project with a big potential impact on the management of cardiovascular diseases.

Prerequisites: Motivation. Good programming skills. Experience with computational graphics and image analysis is an advantage.

Resurrecting Extinct Computers Alex Rogers Systems B C MSc

While the architecture of current reduced instruction set processors is well established, and relatively static, the early days of computing saw extensive experimentation and exploration of alternative designs. Commercial processors developed during the 1960s, 1970s and 1980s included stack machines, LISP machines and massively parallel machines, such as the Connection Machine (CM-1) consisting of 65,536 individual one-bit processors connected as a 12-dimensional hypercube. This period also saw the development of the first single-chip microprocessors, such as the Intel 4004, and the first personal computers, such as the Altair 8800 using the Intel 8080 microprocessor. This project will attempt to resurrect one of these extinct designs (or a scaled-down version if necessary) using software simulation or a low-cost field-programmable gate array (FPGA). You will be required to research the chosen processor, using both original and modern sources, and then develop a register-level description of the device that can be implemented in software or on an FPGA. The final device should be able to run the software of the original.
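
For a flavour of what a register-level software simulation involves, here is a toy fetch-decode-execute loop for an invented 8-bit accumulator machine (purely illustrative; not a faithful model of any of the processors above):

  # Invented instruction set, for brevity only.
  LDA, ADD, STA, JMP, HLT = range(5)

  def run(memory):
      acc, pc = 0, 0                            # accumulator, program counter
      while True:
          op, arg = memory[pc], memory[pc + 1]  # fetch
          pc += 2
          if op == LDA:   acc = memory[arg]     # decode + execute
          elif op == ADD: acc = (acc + memory[arg]) & 0xFF
          elif op == STA: memory[arg] = acc
          elif op == JMP: pc = arg
          elif op == HLT: return memory

  # 10 + 32, with the result stored at address 15:
  prog = [LDA, 12, ADD, 13, STA, 15, HLT, 0, 0, 0, 0, 0, 10, 32, 0, 0]
  assert run(prog)[15] == 42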

Prerequisites Digital Systems and Computer Architecture useful but not essential.

Understanding Bias in Object Detection Models Christian Rupprecht Artificial Intelligence and Machine Learning B C MSc

Object detection is a fundamental task in computer vision. The goal is to locate and classify individual instances of objects in images (e.g., people, cars, cups, sheep, etc.). Most current models have been trained on benchmark datasets that consist of hand-annotated images collected from the internet. This introduces bias in the training data which in turn has been shown to lead to biased models.

In this project, we will make use of modern image editing techniques, such as in-painting diffusion models, to modify images in specific ways (e.g., changing the apparent age of people, skin-color, day-night, …), which allows us to measure the impact of the edit on the object detection performance.

Goals:

• Set up an evaluation framework of object detection models that allows modification of the test data (see the sketch after this list).

• Use different image editing techniques to edit the test data.

• Test several hypotheses for bias in multiple different models.
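
A rough sketch of the first goal; note that the helper names here (edit_image, compute_map) are hypothetical placeholders, not existing library calls:

  def bias_gap(detector, dataset, edit_image, compute_map):
      # Compare detection accuracy on original vs. edited test images; the
      # ground-truth boxes are reused since edits preserve object locations.
      original, edited = [], []
      for image, boxes in dataset:
          original.append((detector(image), boxes))
          edited.append((detector(edit_image(image)), boxes))
      return compute_map(original) - compute_map(edited)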

Stretch Goal:

• Bias can potentially be mitigated by including the modified data during training of the model.

References:

Rombach, Robin, et al. "High-resolution image synthesis with latent diffusion models." IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). 2022.

Brooks, Tim, Aleksander Holynski, and Alexei A. Efros. "Instructpix2pix: Learning to follow image editing instructions." Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2023.

Lin, Tsung-Yi, et al. "Microsoft coco: Common objects in context." Computer Vision–ECCV 2014: 13th European Conference, Zurich, Switzerland, September 6-12, 2014, Proceedings, Part V 13. Springer International Publishing, 2014.

Singh, Krishna Kumar, et al. "Don't judge an object by its context: Learning to overcome contextual bias." Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2020.

Pre-requisites: Machine Learning

Category-theoretic syntactic models of programming languages Philip Saville, Sam Staton Programming Languages C MSc

Syntactic models -- categories constructed from the syntax of a programming language -- play a key role in category-theoretic denotational semantics. Showing such models exist and satisfy a suitable universal property amounts to giving a sound and complete semantic interpretation for the language in question. Often this involves carefully studying the interplay between program features and categorical structures.

The three main aims of the project are as follows. Firstly, to construct syntactic models for two idealised effectful functional programming languages, namely Moggi's monadic metalanguage [1] and computational lambda calculus [2]. Next, to prove their universal properties, and finally to use these to give syntactic translations between the languages.

The starting point would be to understand the categorical semantics of the simply-typed lambda calculus, the monadic metalanguage and the computational lambda calculus. Extensions would include exploring extensionality / well-pointedness and constructing fully abstract syntactic models of these languages.

[1] E. Moggi, "Notions of computation and monads," Information and Computation, 1991.

[2] E. Moggi, "Computational lambda-calculus and monads," LICS, 1989.

Pre-requisite courses:

     - Categories, proofs and processes

Useful background courses:

     - Principles of programming languages

     - Lambda calculus and types


Achieving Superalignment through Weak-to-Strong Generalization Christian Schroeder de Witt, Philip Torr, Ani Calinescu Artificial Intelligence and Machine Learning B C MSc

The capabilities of AI systems have significantly grown within a short amount of time. Further advances in synthetic data generation, self-play, and other techniques are set to improve their performance further, potentially resulting in superhuman capabilities. This poses an important safety question: how can humans supervise future systems that are much smarter than themselves? Whereas current systems are aligned using human data, tasks that are too hard for humans to solve will require progress in the nascent field of superalignment [1]. In this project, we will conduct a critical examination of existing superalignment frameworks [2], and leverage the latest work in semi-supervised learning and other fields in order to advance our understanding of superalignment. This project is designed to lead to publication. We are looking for a highly-motivated student.

For this project, we will be able to receive advice from Collin Burns (OpenAI).

[1] https://openai.com/blog/introducing-superalignment

[2] https://cdn.openai.com/papers/weak-to-strong-generalization.pdf

Active Defenses against Illusory Attacks Christian Schroeder de Witt, Philip Torr, Alessandro Abate Automated Verification B C MSc

Autonomous agents deployed in the real world need to be robust against adversarial attacks on sensory inputs. Robustifying agent policies requires anticipating the strongest attacks possible. We recently demonstrated that existing observation-space attacks on reinforcement learning agents have a common weakness: while effective, their lack of information-theoretic detectability constraints makes them detectable using automated means or human inspection. Detectability is undesirable to adversaries as it may trigger security escalations. In response, we introduced illusory attacks, a novel form of adversarial attack on sequential decision-makers that is both effective and of epsilon-bounded statistical detectability. In this project, we, for the first time, investigate how victims of epsilon-bounded illusory attacks can actively defend themselves, i.e. adapt their test-time behaviour to maximise information-theoretic detectability. Drawing from the active learning literature, and inspired by Bayesian optimal experiment design and constraint preferences in economics, we will develop novel ways of optimising for the victim’s best response to a cost-bounded adversary. This project is designed to lead to publication. We are looking for a highly-motivated student with interest in theory.

[1] Tim Franzmeyer, Stephen Marcus McAleer, Joao F. Henriques, Jakob Nicolaus Foerster, Philip Torr, Adel Bibi, Christian Schroeder de Witt, Illusory Attacks: Detectability Matters in Adversarial Attacks on Sequential Decision-Makers, https://openreview.net/forum?id=F5dhGCdyYh (under review at ICLR 2024)


Enhancing Worst-Case Safety in Large Language Models through Influence Functions and Backdoor Detection Christian Schroeder de Witt, Philip Torr, Ani Calinescu Artificial Intelligence and Machine Learning B C MSc

Large language models have advanced in various fields, but concerns about their safety in worst-case scenarios persist [1]. Interpretability methods are used to understand these models' decisions but have limitations — they mainly focus on average cases and lack ground truth [2]. This proposal aims to overcome these limitations by injecting backdoors into language models to provide ground truth and developing an efficient influence function-based method for backdoor detection. In doing so, we will leverage exciting new progress allowing influence functions to be scaled to large language models [3]. This project is designed to lead to publication. We are looking for a highly-motivated student. We will be able to collaborate with Dr. Ameya Prabhu (Tuebingen University) on this project.

[1] Carlini et al., Are aligned neural networks adversarially aligned?. June 2023. http://arxiv.org/abs/2306.15447. arXiv:2306.15447 [cs]

[2] Conmy et al., Towards Automated Circuit Discovery for Mechanistic Interpretability, October 2023. http://arxiv.org/abs/2304.14997. arXiv:2304.14997 [cs]

[3] Grosse et al., Studying Large Language Model Generalization with Influence Functions, https://arxiv.org/abs/2308.03296, arXiv:2308.03296 [cs.LG]

Extending AI-Generated Steganography to Generative Diffusion Models Christian Schroeder de Witt, Philip Torr, Mark van der Wilk Artificial Intelligence and Machine Learning B C MSc

Steganography is the practice of encoding a plaintext message into another piece of content, called a stegotext, in such a way that an adversary would not realise that hidden communication is occurring. In a recent breakthrough [1,2,3], we showed that messages can be encoded into the output distribution of arbitrary autoregressive neural networks with perfect security [4], i.e. without changing the output distribution, hence rendering the hiding of secret messages information-theoretically undetectable. In this project, we extend perfectly secure steganography to generative AI models that are not explicitly representable as discrete autoregressive distributions, such as diffusion models. Such an extension would allow for applying information-theoretic steganography to high-resolution images, potentially resulting in novel tools for investigative journalism and watermarking.

This project is designed to lead to publication. We are looking for a highly-motivated student.

[1] https://arxiv.org/abs/2210.14889

[2] https://www.quantamagazine.org/secret-messages-can-hide-in-ai-generated-media-20230518/

[3] https://www.scientificamerican.com/article/ai-could-smuggle-secret-messages-in-memes/

[4] https://www.sciencedirect.com/science/article/pii/S0890540104000409

Learning Minimum-Entropy Couplings using AlphaZero and Geometric Deep Learning Christian Schroeder de Witt, Philip Torr, Michael Bronstein Artificial Intelligence and Machine Learning B C MSc

Steganography is the practice of encoding a plaintext message into another piece of content, called a stegotext, in such a way that an adversary would not realise that hidden communication is occurring. In a recent breakthrough [1,2,3], we showed that messages can be encoded into the output distribution of arbitrary autoregressive neural networks with perfect security [4], i.e. without changing the output distribution, hence rendering the hiding of secret messages information-theoretically undetectable. In this project, we seek to improve the heart of our perfectly-secure encoding algorithm, namely minimum entropy couplings. As constructing exact minimum entropy couplings is known to be NP-hard, we currently use an approximation algorithm that runs in loglinear time. For large block sizes, which are crucial to achieving good transmission rates, this algorithm is slow as it is not readily parallelisable. In this project, we will build upon recent advances in deep reinforcement learning self-play [5] and leverage graph networks to implement constraints on the sparse neural network outputs [6] in order to learn deep minimum entropy coupling networks. If successful, this project will greatly increase the feasibility of AI-generated steganography on mobile devices, as well as lead to novel methodological approaches for neural network architectures for sparse doubly-stochastic matrix outputs.

In this project, we will be working with state-of-the-art MARL implementations in JAX [2]. This project is designed to lead to publication. We are looking for a highly-motivated student.
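
For reference, the greedy approximation at the heart of such pipelines pairs off the largest remaining probability masses of the two marginals. The following is a simplified sketch under that assumption; the exact algorithm used in [1] may differ in its details:

  import heapq

  def greedy_coupling(p, q, tol=1e-12):
      # Max-heaps (via negation) over the remaining mass of each marginal.
      hp = [(-mass, i) for i, mass in enumerate(p) if mass > tol]
      hq = [(-mass, j) for j, mass in enumerate(q) if mass > tol]
      heapq.heapify(hp)
      heapq.heapify(hq)
      coupling = {}                          # (i, j) -> joint probability
      while hp and hq:
          mp, i = heapq.heappop(hp)
          mq, j = heapq.heappop(hq)
          m = min(-mp, -mq)                  # pair off the two largest masses
          coupling[(i, j)] = coupling.get((i, j), 0.0) + m
          if -mp - m > tol:                  # push back any residual mass
              heapq.heappush(hp, (mp + m, i))
          if -mq - m > tol:
              heapq.heappush(hq, (mq + m, j))
      return coupling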

[1] https://arxiv.org/abs/2210.14889

[2] https://www.quantamagazine.org/secret-messages-can-hide-in-ai-generated-media-20230518/

[3] https://www.scientificamerican.com/article/ai-could-smuggle-secret-messages-in-memes/

[4] https://www.sciencedirect.com/science/article/pii/S0890540104000409

[5] https://deepmind.google/discover/blog/alphadev-discovers-faster-sorting-algorithms/

[6] Modeling relational data with graph convolutional networks, M Schlichtkrull, TN Kipf, P Bloem, R Van Den Berg, I Titov, M Welling, The Semantic Web: 15th International Conference, ESWC 2018

[7] Geometric Deep Learning: Grids, Groups, Graphs, Geodesics, and Gauges, Michael M. Bronstein, Joan Bruna, Taco Cohen, Petar Veličković, arXiv:2104.13478 [cs.LG]


Preventing Malicious Collusion between Advanced AI Systems Christian Schroeder de Witt, Philip Torr, Alessandro Abate Automated Verification B C

Consider settings in which one or more principals assign a task to a team of generative AI agents, for example a scheduling or negotiation task. The principals monitor the (not necessarily human-intelligible) communications between the agents and intervene if deemed necessary, hoping to prevent agents from pursuing undesirable joint strategies. In this project, we investigate whether, when, and how optimisation pressure may lead generative AI agents to hide communications from their principals.

We survey the landscape of steganography (information hiding), and, for a given level of security, identify the required knowledge and capabilities of generative AI agents.

From these, we design a roadmap for model evaluation building on our recent work in this space [1][2]. We then empirically test the ability of state-of-the-art LLMs to engage in different types of covert communication given a variety of optimisation pressures. This project is designed to lead to publication. We are looking for a highly-motivated student.

[1] Secret Collusion Among Generative AI Agents: A Model Evaluation Framework, Sumeet Ramesh Motwani, Mikhail Baranchuk, Philip H.S. Torr, Lewis Hammond, and Christian Schroeder de Witt, to appear. Also see this talk: https://www.alignment-workshop.com/nola-talks/christian-schroeder-de-witt-perfectly-secure-steganography-and-llm-collus

[2] https://www.quantamagazine.org/secret-messages-can-hide-in-ai-generated-media-20230518/

Supercharging Out-of-Distribution Dynamics Detection Christian Schroeder de Witt, Philip Torr, Varun Kanade Artificial Intelligence and Machine Learning B C MSc

The challenge in Out-of-Distribution Dynamics Detection (O2D3) is to test whether a given sequential control environment at test-time is the same as the one the agent was trained in. This capability is central to both AI safety, and security: A robot controller may have systematic errors at test-time, or an adversary may attack the AI agent’s sensors. In both cases, if anomalies in the test-time environment are not detected as soon as possible, potentially catastrophic outcomes may follow. In this project, we are building upon the latest advances in statistics, including doubly-robust estimation techniques, information-theoretic hypothesis testing and latest advances in out-of-distribution detection, in order to surpass the current state-of-the-art in O2D3 [1]. This project is designed to lead to publication. We are looking for a highly-motivated student with interest in theory.

[1] Linas Nasvytis, Kai Sandbrink, Jakob Foerster, Tim Franzmeyer and Christian Schroeder de Witt, Rethinking Out-of-Distribution Detection for Reinforcement Learning: Advancing Methods for Evaluation and Detection, AAMAS 2024 (Oral)

Adapting Red to the Language Server Protocol Bernard Sufrin Programming Languages B C

The Language Server Protocol (LSP) is an open, JSON-RPC-based protocol for use between source code editors or integrated development environments (IDEs) and servers that provide “language intelligence tools”: programming-language-specific features like code completion, syntax highlighting and marking of warnings and errors, as well as refactoring routines. The goal of the protocol is to allow programming language support to be implemented and distributed independently of any given editor or IDE. In the early 2020s LSP quickly became the norm for providers of language intelligence tools.


Red (also known as AppleRed) is a no-frills, Unicode-capable, modeless text editor with a simple implementation that can be customized using the redscript language (a Lisp-like notation). Its underlying capabilities can be straightforwardly extended using Scala. It has no pretensions to being an all-encompassing workplace, and unlike many IDEs and modern editors it does not confuse its user by spontaneously trying to be helpful: everything it does, it does in response to user input from mouse, pad, or keyboard. Sometimes such “helpful” systems can be tricky to use because it’s not clear who or what is in control.


The aim of this project is to adapt Red to the LSP, and to use one or two specific language servers as test cases. Examples of language servers are: Metals (for Scala), rust-analyzer (for Rust), haskell-language-server (for Haskell), and texlab (for LaTeX).
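
The protocol side is simple to sketch: each JSON-RPC message is framed with an HTTP-style Content-Length header. A minimal Python illustration of sending an initialize request over stdio (the server command here is a placeholder; real servers such as Metals are typically launched via a build-tool integration):

  import json, subprocess

  def lsp_frame(payload):
      # LSP base protocol: Content-Length header, blank line, JSON-RPC body.
      body = json.dumps(payload).encode("utf-8")
      return b"Content-Length: %d\r\n\r\n" % len(body) + body

  server = subprocess.Popen(["some-language-server"],   # placeholder command
                            stdin=subprocess.PIPE, stdout=subprocess.PIPE)
  server.stdin.write(lsp_frame({
      "jsonrpc": "2.0", "id": 1, "method": "initialize",
      "params": {"processId": None, "rootUri": "file:///home/user/project",
                 "capabilities": {}}}))
  server.stdin.flush()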

Modeless Structure Editing Bernard Sufrin Programming Languages B C

Oege de Moor and I wrote a paper called Modeless Structure Editing for Tony Hoare’s retirement symposium. What we had in mind was a family of modeless editors whose underlying model of the document being edited is akin to its abstract syntax tree. It’s clear what this means if the document represents, for example, a program in a programming language or a proof in a proof calculus; but we conjecture that structured prose documents will also be amenable to this approach to a useful extent.

Editing an abstract syntax tree model does not mean that one must necessarily work with a “tree-shaped” view of the document.

The benefits of editing an abstract syntax tree model include:

  • The ability to perform semantic checks incrementally and rapidly. Of course semantic checks are realized differently for different formalisms.
  • The potential to interact with varieties of views of the document, and the potential to switch views rapidly.
  • The potential to mix formalisms within a single document, and to derive a variety of artefacts from it. Imagine, for example, writing lectures about programs in a programming language: one wants (parts of) the program source code to appear in the lecture; and one wants the entire source code of the program to be derived from their “reference text” in the document.

Such an approach has the potential to yield dividends in performance and simplicity of implementation of incremental semantic checks, code generation, etc.

The first phase of this project will construct (in Scala) a prototype structure editing toolkit implementing the model explained in this paper. Subsequent phases will provide one or two case studies in its use. The architecture of an editor that could use such a document model already exists – in the shape of our AppleRed editor, whose document model is (in principle) “pluggable”, and that currently uses a linear model (document as a sequence of lines of text). What will be needed is to construct a new “plugin” conforming to the AppleRed document model interface, and to adapt and generalise that interface where necessary.

Parallel reasoning in Sequoia David Tena Cucala, Bernardo Cuenca Grau Data, Knowledge and Action C MSc

Sequoia is a state-of-the-art system developed by researchers in our department for automated reasoning in OWL 2, a standard ontology language in the Semantic Web. Sequoia uses a consequence-based calculus, a novel and promising approach towards fast and scalable reasoning. Consequence-based algorithms are naturally amenable to parallel reasoning; however, the current version of Sequoia has limited support for parallel reasoning. This project aims at optimising Sequoia by identifying new and effective ways to exploit parallel reasoning. The student will perform an extensive empirical evaluation of the current version of Sequoia, identify key bottlenecks, and devise new ways to make efficient and powerful use of parallel reasoning. The student will evaluate the performance of any new versions of the system that they propose, with an eye to understanding their strengths and weaknesses. The student involved in this project will acquire in-depth knowledge of state-of-the-art techniques for reasoning in OWL 2.

Pre-requisites: Knowledge Representation and Reasoning course. Experience with concurrent programming and/or Scala is desirable

Developing a Novel Pipeline for Pose-Conditioned Image/Video Generation with Diffusion Models Philip Torr, Oishi Deb Artificial Intelligence and Machine Learning B C MSc

Supervisors: Oishi Deb; Philip Torr

This project focuses on developing a novel pipeline for pose-conditioned image generation using diffusion models. This approach ensures the preservation of identity and viewpoints across azimuthal angles. It has significant applications in downstream tasks, such as 3D object articulation and 3D mesh deformation. The generated targets will serve as pseudo-2D targets for 3D mesh deformation, effectively eliminating the need for 3D targets. These images will retain the identity of the reference frames while adopting the target pose specified through textual prompts. Achieving this requires a degree of disentanglement between identity and pose information. The project will leverage state-of-the-art diffusion models, such as Stable Diffusion 3, which offer high fidelity to user input text prompts. Additionally, this framework can be extended to animation, utilizing stable video diffusion models for target generation, paving the way for innovative applications in animated 3D content creation. This project is a collaboration between VGG and TVG. This is a well-scoped project which will lead to a publication soon, and a codebase is already developed, so it will be well-suited for students who would like to get published.

References:

  1. Stable Diffusion 3 - paper
  2. Stable Video Diffusion - paper
  3. MVDream - paper
  4. Qianyi Deng*, Oishi Deb*, Amir Patel, Christian Rupprecht, Philip Torr, Niki Trigoni, and Andrew Markham. “Towards multi-modal animal pose estimation: An in-depth analysis”, 2024 - https://arxiv.org/pdf/2410.09312.
  5. Oishi Deb, Diffusion Models for 3D Reconstruction and Articulation of Deformable Objects, Oxford Computer Science Conference, 2024, Link.
  6. Personalized Image Animator (PIA) - https://github.com/open-mmlab/PIA?tab=readme-ov-file
Multi-Modal Partially Labelled Stream Philip Torr Artificial Intelligence and Machine Learning MSc Data in large systems often arrives as a stream and is multi-modal, e.g., text, images, videos, and/or sound. All this data is being accumulated while jointly changing in distribution. Moreover, much of the data presented by the stream is only partially labelled. We seek to study the problem of training models on a partially labelled stream in a multi-modal setting. In particular, we seek to find new, effective algorithms for performing joint self-supervised continual learning on the unlabelled data while learning from the labelled portion of the stream in a supervised fashion.
Navigating the Genetic Perturbation Landscape: Multi-modal causal representation learning for target discovery Philip Torr, Jonathan Hedley Artificial Intelligence and Machine Learning MSc

Cardiometabolic disorders remain the leading cause of mortality globally [1,2]. Addressing this major public health issue necessitates identifying effective pharmacological interventions, which requires a detailed understanding of the complex aetiology of these disorders. Cardiometabolic diseases are driven by an intricate interplay of genetic and environmental factors that impact the functionality of diverse cell types across the human body [3]. To tackle this complexity, new drug discovery approaches are essential to navigate the vast combinatorial landscape of potential pharmacological interventions and cellular phenotypes.

This project aims to develop an innovative predictive model for cellular response to genetic perturbations, a key step towards discovering drug targets for cardiometabolic disorders. By focusing on how cells react to genetic modifications (e.g., gene knockouts or gene silencing), this model will provide insights into the druggable genome—a critical factor for target discovery.

The Torr Vision Group has recently begun a collaboration with Novo Nordisk; together, we have the following objectives:

  1. Develop a Predictive Model: Create a model capable of accurately predicting unseen cellular responses to specific genetic perturbations across various cell types. This will be grounded on the comprehensive data generated in-house at the Novo Nordisk Research Centre in Oxford (NNRCO), where a framework is being developed to deeply characterise cellular phenotypes at scale.
  2. Develop Enhanced Cellular Representations: Develop multi-modal cellular representations that capture detailed patterns in imaging, gene expression and proteomics data, improving the accuracy of the predictive model. Learning these detailed patterns may also provide insights on genetic interactions and the gene regulatory network.
  3. Active Learning for Efficient Genome Screening: Given the scale of the human genome, exploring the combinatorial perturbation landscape defined by 20,000 protein-encoding genes poses a significant experimental challenge. Our approach will utilize an active learning framework to guide sequential, optimal experimental perturbation screens. This will enable efficient and targeted exploration of the genetic perturbation landscape, accelerating the discovery of therapeutic targets.

Interested students will have the opportunity to contribute to these multiple aspects of this project, from designing cellular representations to developing the active learning framework. This work will provide hands-on experience at the intersection of ML and genetics, contributing meaningfully to ML-driven drug discovery efforts.

[1] GBD 2021 Diabetes Collaborators (2023). Global, regional, and national burden of diabetes from 1990 to 2021, with projections of prevalence to 2050: a systematic analysis for the Global Burden of Disease Study 2021. Lancet, 402(10397), 203–234.

[2] G.R. Dagenais, D. P. Leong, S. Rangarajan, et al. (2020). Variations in common diseases, hospital admissions, and deaths in middle-aged adults in 21 countries from five continents (PURE): a prospective cohort study. Lancet; 395(10226):785–794.

[3] C. Priest, P. Tontonoz, (2019). Inter-organ cross-talk in metabolic syndrome. Nature Metabolism ;1(12):1177–1188.

ARETHA: A Transparent and Respectful Virtual Assistant for the Home Max Van Kleek Human Centred Computing B C MSc Virtual assistants like Alexa and Siri have become hugely popular conveniences in homes worldwide. However, they also pose unprecedented privacy risks by being black-box vehicles of corporate surveillance by the likes of Amazon, Apple and others. Is it possible to design a virtual assistant that can be guaranteed to protect user privacy, and be fully transparent and controllable by end-users? This project will involve designing and developing a fully modular, open source virtual assistant called ARETHA from the ground up, using open source components, which can provide strong guarantees about being privacy-preserving and pro-user.
Privopticon: A privacy-preserving mesh OS for self-observation Max Van Kleek Human Centred Computing B C MSc

Self-monitoring has many potential applications within the home, such as the ability to understand important health and activity rhythms within a household automatically. But such monitoring activities also carry extreme privacy risks. Can we design new kinds of sensing architectures that are designed to preserve inhabitants' privacy? We have made initial inroads with a new mesh operating system prototype for Raspberry Pi hardware, aimed at new classes of privacy-preserving self-monitoring applications for the home, which will provide user-configurable degrees of information fidelity, have built-in forgetting, and be accountable by design. We need your help to try and evaluate different kinds of methods for achieving these goals.

3D demos of geometric concepts Irina Voiculescu Computational Biology and Health Informatics B The Geometric Modelling course (Part B) deals with several interesting concepts which would be best visualised in a suite of applications. A coherent suite of 3D demos could easily become a useful tool for this course, as well as for users worldwide. This project would be most suitable for a candidate who already has some experience using a 3D graphics library of their choice and wants to improve this skill. The mathematical concepts are well-documented.
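
One natural demo from the course's territory is de Casteljau's algorithm for Bezier curves, whose repeated linear interpolation is itself worth animating. A minimal evaluation sketch:

  def de_casteljau(control_points, t):
      # Repeatedly interpolate between consecutive points until one remains.
      pts = list(control_points)
      while len(pts) > 1:
          pts = [tuple((1 - t) * a + t * b for a, b in zip(p, q))
                 for p, q in zip(pts, pts[1:])]
      return pts[0]

  # Midpoint of a quadratic Bezier curve with three control points:
  de_casteljau([(0, 0), (1, 2), (2, 0)], 0.5)    # -> (1.0, 1.0)
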
3D environment for Hand Physiotherapy Irina Voiculescu Computational Biology and Health Informatics B C After hand surgery, it is almost always necessary for patients to have physiotherapy to help with their recovery. As part of this, the patient will need to perform hand exercises at home. However, the patient may not always do the exercises correctly, or they might forget to do their exercises. The goal of this project is to use the Leap Motion to create a user-friendly GUI which a patient could use to aid them with their home exercises. The interface would show the user where their hand should be and they would then need to follow the movements. It could be web-based or downloadable software. It would need to be tailored to the patient so it contained their specific required exercises, which could be input by the physiotherapist. It would need to store data on how the patient is doing, and feed this data back to the patient, and possibly also to the physiotherapist via the internet. If internet-based, patient confidentiality and security would need to be considered. This project would be performed in close collaboration with a physiotherapist, an orthopaedic hand surgeon, and a post-doctoral researcher based at the Nuffield Orthopaedic Centre.
3D printing medical scan data Irina Voiculescu Computational Biology and Health Informatics B C MSc

Computed tomography (CT) scanning is a ubiquitous scanning modality. It produces volumes of data representing internal parts of a human body. Scans are usually output in a standard imaging format (DICOM) and come as a series of axial slices (i.e. slices across the length of the person's body, in planes perpendicular to the imaginary straight line along the person's spine.)

The slices most frequently come at a resolution of 512 x 512 voxels, achieving an accuracy of about 0.5 to 1mm of tissue per voxel, and can be viewed and analysed using a variety of tools. The distance between slices is a parameter of the scanning process and is typically much larger, about 5mm.

During the analysis of CT data volumes it is often useful to correct for the large spacing between slices. For example when preparing a model for 3D printing, the axial voxels would appear elongated. These could be corrected through an interpolation process along the spinal axis.

This project is about the interpolation process, either in the raw data output by the scanner, or in the post-processed data which is being prepared for further analysis or 3D printing.

The output models would ideally be files in a format compatible with 3D printing, such as STL. The main aesthetic feature of the output would be measurable as a smoothness factor, parameterisable by the user.

Existing DICOM image analysis software designed within the Spatial Reasoning Group at Oxford is available to use as part of the project.
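
A minimal sketch of the resampling step, assuming the volume has already been extracted from DICOM into a NumPy array (the file name is hypothetical) and using simple linear interpolation; the project would investigate better schemes and user-parameterisable smoothness:

  import numpy as np
  from scipy.ndimage import zoom

  volume = np.load("ct_volume.npy")       # (slices, 512, 512), assumed input
  slice_spacing_mm, pixel_spacing_mm = 5.0, 1.0
  factor = slice_spacing_mm / pixel_spacing_mm
  # order=1: linear interpolation along the spinal (slice) axis only
  isotropic = zoom(volume, (factor, 1.0, 1.0), order=1)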

3D stereo display of medical scan data Irina Voiculescu, Stuart Golodetz Computational Biology and Health Informatics, Systems B C The Medical Imaging research group has been working with a variety of data sourced from CT and MRI scans. This data comes in collections of (generally greyscale) slices which together make up 3D images. Our group has developed software to generate 3D models of the major organs in these images. This project aims to develop a simple augmented reality simulation for the Oculus Rift which will render these organs within a transparent model of a human and allow the user to walk around the model so as to view the organs from any angle. This has a number of possible applications, including to train medical students and to help surgeons to explain medical procedures to their patients.
Exact Algorithms for Complex Root Isolation Irina Voiculescu Computational Biology and Health Informatics B C MSc

Not available in 2013/14

Isolating the complex roots of a polynomial can be achieved using subdivision algorithms. Traditional Newton methods can be applied in conjunction with interval arithmetic. Previous work (jointly with Prof Chee Yap and MSc student Narayan Kamath) has compared the performance of three operators: Moore's, Krawczyk's and Hansen-Sengupta's. This work makes extensive use of the CORE library, which is a collection of C++ classes for exact computation with algebraic real numbers and arbitrary precision arithmetic. CORE defines multiple levels of operation over which a program can be compiled and executed. Each of these levels provides stronger guarantees on exactness, traded against efficiency. Further extensions of this work can include (but are not limited to): (1) Extending the range of applicability of the algorithm at CORE's Level 1; (2) Making an automatic transition from CORE's Level 1 to the more detailed Level 2 when extra precision becomes necessary; (3) Designing efficiency optimisations to the current approach (such as confirming a single root or analysing areas potentially not containing a root with a view to discarding them earlier in the process); (4) Tackling the isolation problem using a continued fraction approach. The code has been included and is available within the CORE repository. Future work can continue to be carried out in consultation with Prof Yap at NYU.
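
As a crude illustration of the subdivision idea, restricted to the real line and using naive floating-point interval arithmetic in place of CORE's exact arithmetic (an intentional simplification of the project's setting):

  def eval_interval(coeffs, lo, hi):
      # Interval Horner scheme; coeffs in descending order of degree.
      flo = fhi = coeffs[0]
      for c in coeffs[1:]:
          prods = (flo * lo, flo * hi, fhi * lo, fhi * hi)
          flo, fhi = min(prods) + c, max(prods) + c
      return flo, fhi

  def isolate(coeffs, lo, hi, eps=1e-9):
      flo, fhi = eval_interval(coeffs, lo, hi)
      if flo > 0 or fhi < 0:     # 0 excluded from the enclosure: no root here
          return []
      if hi - lo < eps:          # small enough: report as a candidate box
          return [(lo, hi)]
      mid = (lo + hi) / 2
      return isolate(coeffs, lo, mid, eps) + isolate(coeffs, mid, hi, eps)

  isolate([1, 0, -2], 0, 2)      # candidates cluster around sqrt(2)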

Gesture recognition using Leap Motion Irina Voiculescu Computational Biology and Health Informatics B C MSc

Scientists in the Experimental Psychology Department study patients with a variety of motor difficulties, including apraxia - a condition usually following stroke which involves lack of control of a patient over their hands or fingers. Diagnosis and rehabilitation are traditionally carried out by Occupational Therapists. In recent years, computer-based tests have been developed in order to remove the human subjectivity from the diagnosis, and in order to enable the patient to carry out a rehabilitation programme at home. One such test involves users being asked to carry out static gestures above a Leap Motion sensor, and these gestures being scored according to a variety of criteria. A prototype has been constructed to gather data, and some data has been gathered from a few controls and patients. In order to deploy this as a clinical tool into the NHS, there is a need for a systematic data collection and analysis tool, based on machine learning algorithms, to help classify the data into different categories. Algorithms are also needed in order to classify data from stroke patients, and to assess the degree of severity of their apraxia. Also, the graphical user interface needs to be extended to give particular kinds of feedback to the patient in the form of home exercises, as part of a rehabilitation programme.

This project was originally set up in collaboration with Prof Glyn Humphreys, Watts Professor of Experimental Psychology. Due to Glyn's untimely death a new co-supervisor needs to be found in the Experimental Psychology Department. It is unrealistic to assume this project can run in the summer of 2016.

Identifying features in MRI scan data Irina Voiculescu Computational Biology and Health Informatics B C MSc In recent years, medical diagnosis using a variety of scanning modalities has become quasi-universal and has brought about the need for computer analysis of digital scans. Members of the Spatial Reasoning research group have developed image processing software for CT (tomography) scan data. The program partitions (segments) images into regions with similar properties. These images are then analysed further so that particular features (such as bones, organs or blood vessels) can be segmented out. The team's research continues to deal with each of these two separate meanings of medical image segmentation. The existing software is written in C++ and features carefully-crafted and well-documented data structures and algorithms for image manipulation. In certain areas of surgery (e.g. orthopaedic surgery involving hip and knee joint) the magnetic resonance scanning modality (MRI) is preferred, both because of its safety (no radiation involved) and because of its increased visualisation potential. This project is about converting MRI scan data into a format that can become compatible with existing segmentation algorithms. The data input would need to be integrated into the group's analysis software in order then to carry out 3D reconstructions and other measurements. This project is co-supervised by Professor David Murray MA, MD, FRCS (Orth), Consultant Orthopaedic Surgeon at the Nuffield Orthopaedic Centre and the Nuffield Department of Orthopaedics, Rheumatology and Musculoskeletal Sciences (NDORMS), and by Mr Hemant Pandit MBBS, MS (Orth), DNB (Orth), FRCS (Orth), DPhil (Oxon)Orthopaedic Surgeon / Honorary Senior Clinical Lecturer, Oxford Orthopaedic Engineering Centre (OOEC), NDORMS.
Reinforcement learning techniques for games Irina Voiculescu Computational Biology and Health Informatics B C

This project is already taken for 2017-2018


Psychology has inspired and informed a number of machine learning methods. Decisions within an algorithm can be made so as to improve an overall aim of maximising a (cumulative) reward. Machine learning methods in this class are known as reinforcement learning. A basic reinforcement learning model consists of establishing a number of environment states, a set of valid actions, and rules for transitioning between states. Applying this model to the rules of a board game means that the machine can be made to learn how to play a simple board game by playing a large number of games against itself. The goal of this project is to set up a reinforcement learning environment for a simple board game with a discrete set of states (such as Backgammon). If time permits, this will be extended to a simple geometric game (such as Pong) where the states may have to be parameterised in terms of geometric actions to be taken at each stage in the game.
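
A compact sketch of the tabular case; the env interface here (reset, step, valid_actions) is a hypothetical placeholder for a board-game wrapper:

  import random
  from collections import defaultdict

  def q_learning(env, episodes=10_000, alpha=0.1, gamma=0.99, eps=0.1):
      Q = defaultdict(float)                   # (state, action) -> value
      for _ in range(episodes):
          s, done = env.reset(), False
          while not done:
              actions = env.valid_actions(s)
              if random.random() < eps:        # explore
                  a = random.choice(actions)
              else:                            # exploit current estimates
                  a = max(actions, key=lambda a: Q[(s, a)])
              s2, r, done = env.step(a)
              best_next = 0.0 if done else max(Q[(s2, a2)]
                                               for a2 in env.valid_actions(s2))
              Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])
              s = s2
      return Q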

Simple drawing analysis Irina Voiculescu Computational Biology and Health Informatics B C MSc

Scientists in the Experimental Psychology Department study patients with a variety of motor difficulties, including apraxia - a condition usually following stroke which involves lack of control of a patient over their hands or fingers. Diagnosis and rehabilitation are traditionally carried out by Occupational Therapists. In recent years, computer-based tests have been developed in order to remove the human subjectivity from the diagnosis, and in order to enable the patient to carry out a rehabilitation programme at home. One such test involves users drawing simple figures on a tablet, and these figures being scored according to a variety of criteria. Data has already been gathered from 200 or so controls, and is being analysed for a range of parameters in order to assess what a neurotypical person could achieve when drawing such simple figures. Further machine learning analysis could help classify such data into different categories. Algorithms are also needed in order to classify data from stroke patients, and to assess the degree of severity of their apraxia.

This project was originally co-supervised by Prof Glyn Humphreys, Watts Professor of Experimental Psychology. Due to Glyn's untimely death a new co-supervisor needs to be found in the Experimental Psychology Department. It is unrealistic to assume this project can run in the summer of 2016.

Efficient linear algebra for block structured matrices Jonathan Whiteley Computational Biology and Health Informatics B C

Many large systems of linear equations are sparse, i.e. if the matrix that describes the linear system is of size N times N, where N is large, there are very few non-zero entries in each row.  Under these conditions there may not be enough memory to store the whole matrix, and so only the non-zero entries are stored.  This prevents techniques such as LU decomposition being used to solve the linear system; instead an iterative technique such as the conjugate gradient technique for symmetric positive definite matrices, or GMRES for more general matrices is used.  The number of iterations needed with these techniques can be large, rendering these techniques inefficient.  To prevent this, preconditioning techniques are used - if the linear system is defined by Ax=b, then a preconditioner P is used and the system solved is instead PAx = Pb, where P is cheap to calculate and both PAx and Pb are cheap to evaluate.  In this project we will investigate matrices with a block structure that arises in many fields, such as constrained optimisation and continuum mechanics.  We will utilise the block structure of these matrices to heuristically derive candidate preconditioners, and compare their performances.
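
A small SciPy illustration of the setting, with a plain Jacobi (diagonal) preconditioner standing in for the block preconditioners the project would derive; the matrix here is random and purely illustrative:

  import numpy as np
  from scipy.sparse import random as sprandom, identity
  from scipy.sparse.linalg import cg, LinearOperator

  n = 2000
  A = sprandom(n, n, density=1e-3, format="csr")
  A = A @ A.T + 10 * identity(n)       # make it symmetric positive definite
  b = np.ones(n)

  diag = A.diagonal()
  P = LinearOperator((n, n), matvec=lambda v: v / diag)   # Jacobi P ~ A^{-1}

  x_plain, _ = cg(A, b)                # unpreconditioned conjugate gradients
  x_prec, _ = cg(A, b, M=P)            # preconditioned: solves PAx = Pb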

Prerequisites: linear algebra, continuous mathematics

Parameter recovery for models described by differential equations Jonathan Whiteley Computational Biology and Health Informatics B C

Phenomena in many fields are described by differential equations where a quantity of interest (for example, depending on the phenomenon modelled, a reaction rate, a capacitance, or a subject's cardiac output) appears as a parameter in the differential equation.  We then determine the quantity of interest by choosing parameters in the differential equation so that the computed solution of the differential equation using these parameters matches experimental data as closely as possible.  This may be posed as an optimisation problem that may be tackled by either a classical optimisation approach (as seen in the continuous mathematics course) or a Bayesian optimisation approach.  The aim of this project is to compare these approaches.

Prerequisites: linear algebra, continuous mathematics, probability
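
As a concrete instance of the classical approach, the Python/SciPy sketch below recovers a decay rate k in dy/dt = -k y by minimising the squared misfit between the computed solution and noisy synthetic data; the model, the true parameter value, and the noise level are illustrative assumptions.

  import numpy as np
  from scipy.integrate import solve_ivp
  from scipy.optimize import minimize_scalar

  rng = np.random.default_rng(0)
  t_obs = np.linspace(0.0, 5.0, 25)
  k_true = 0.8
  # synthetic 'experimental' data: exact solution exp(-k t) plus noise
  data = np.exp(-k_true * t_obs) + 0.01 * rng.standard_normal(t_obs.size)

  def misfit(k):
      # solve the ODE for a candidate k and return the squared error
      sol = solve_ivp(lambda t, y: -k * y, (0.0, 5.0), [1.0], t_eval=t_obs)
      return np.sum((sol.y[0] - data) ** 2)

  res = minimize_scalar(misfit, bounds=(0.0, 5.0), method='bounded')
  print('recovered k =', res.x)   # close to 0.8

A Bayesian treatment would instead place a prior on k and characterise the whole posterior distribution, rather than returning a single point estimate.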

Topics in Linear Dynamical Systems James Worrell Automated Verification B C MSc

A linear dynamical system is a discrete- or continuous-time system whose dynamics is given by a linear function of the current state. Examples include Markov chains, linear recurrence sequences (such as the Fibonacci sequence), and linear differential equations. This project involves investigating the decidability and complexity of various reachability problems for linear dynamical systems. It would suit a mathematically oriented student. Linear algebra is essential, and number theory or complexity theory is desirable. A relevant paper is Ventsislav Chonev, Joël Ouaknine, James Worrell: The Orbit Problem in Higher Dimensions. STOC 2013: 941-950.
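
To make the flavour of such reachability questions concrete, the Python sketch below poses the orbit problem for the Fibonacci companion matrix: does A^n x = y for some n? The bounded brute-force search only illustrates the question; the cited work concerns genuine decision procedures, which this is not.

  import numpy as np

  A = np.array([[1, 1], [1, 0]], dtype=object)  # Fibonacci companion matrix
  x = np.array([1, 0], dtype=object)            # A^n x = (F_{n+1}, F_n)
  y = np.array([13, 8], dtype=object)           # target point

  def in_orbit(A, x, y, bound=1000):
      # object dtype keeps exact big integers; entries grow quickly
      v = x.copy()
      for n in range(bound + 1):
          if np.array_equal(v, y):
              return n
          v = A.dot(v)
      return None                               # not found up to the bound

  print(in_orbit(A, x, y))                      # 6, since (F_7, F_6) = (13, 8)
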
Distributed Programming with Distributed Protocols in Scala Nobuko Yoshida Programming Languages B C MSc

Session type systems facilitate the development of concurrent and distributed software by enabling programmers to describe communication protocols in types. Not only does this aid the understanding and correctness of communicating code, it can also guarantee desirable behavioural properties (e.g. deadlock-freedom). Concurrency libraries such as Effpi (Scala) enable the use of session types in real-world programming languages and systems. The goal of this project is to understand the basics of session types, to implement distributed protocols in Scala, and to investigate their usability.

Reference:

[1] Session Types https://dl.acm.org/doi/pdf/10.1145/2873052

[2] http://dx.doi.org/10.1145/3314221.3322484

Extension to Probabilistic Resource-Aware Session Types Nobuko Yoshida, Joe Paulus Programming Languages B

"Probabilistic session [1] types explores how uncertainty and likelihood influence communication protocols in distributed systems. We propose an extension on the session type system in which an abstract notion of success and failure can be attached either to the syntax of session types or to the both the type and process level such as explicit failure terms. The goal of which is to extend the scope of [1] showing that probabilistic analysis of binary session types [2] can be applied. Similarly insight are applied in the behavioural equivalence of probabilistically nondeterministic behaviour as well of implementations of extensions within NomosPro/PRast."

[1] Probabilistic Resource-Aware Session Types (acm.org) 

[2] Probabilistic Analysis of Binary Sessions (dagstuhl.de) 

Pre-requisites: the student should wish to learn a type theory of concurrency and communication; LCT will be helpful.

Formalism of The Go Language Nobuko Yoshida Programming Languages B

Go is a popular statically typed programming language designed and implemented by Google, and used, among others, by Uber, Netflix, and Docker. The recent release of Go 1.18 finally realised the long-awaited addition of generics (parametric polymorphism) to the Go type system. Much academic work has been conducted to ensure that generics are correctly implemented, but there is still more to do [1,2,3,4]. This project aims to give a survey of those four papers, and then to undertake one of the following mini-projects:

(1) build some examples using the existing prototypes of [1,2,3,4]; or (2) survey important features included in Go 1.18, such as type inference and type sets [5], that are not covered by [1,2,3,4].

[1] https://dl.acm.org/doi/pdf/10.1145/3428217

[2] https://dl.acm.org/doi/10.1007/978-3-030-89051-3_7

[3] https://dl.acm.org/doi/10.1007/978-3-031-16912-0_7

[4] https://arxiv.org/abs/2208.06810v2

[5] https://go.googlesource.com/proposal/+/master/design/43651-type-parameters.md

Mechanisation of Distributed Protocol Specifications Nobuko Yoshida Programming Languages C MSc

Multiparty session types (MPST) offer a specification and verification framework for concurrency: communicating systems can be safely implemented in a distributed fashion, when well-typed against local session types, provided that such local types are obtained by projection of a single choreographic protocol (global type).
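
As a toy illustration of projection (not the cited mechanisation), the Python sketch below projects a global type, given as a simple sequence of interactions, onto each participant's local type; real MPST projection additionally handles choice, recursion, and merging.

  from dataclasses import dataclass

  @dataclass
  class Msg:
      sender: str
      receiver: str
      label: str

  # global type: A sends 'request' to B, B forwards to C, C replies to A
  G = [Msg('A', 'B', 'request'), Msg('B', 'C', 'forward'), Msg('C', 'A', 'reply')]

  def project(global_type, p):
      # keep only the interactions involving p, as sends (!) or receives (?)
      local = []
      for m in global_type:
          if m.sender == p:
              local.append(f'{m.receiver}!{m.label}')
          elif m.receiver == p:
              local.append(f'{m.sender}?{m.label}')
      return local

  for p in ('A', 'B', 'C'):
      print(p, project(G, p))   # e.g. A: ['B!request', 'C?reply']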

Multiple projects are available, please get in touch with nobuko.yoshida@cs.ox.ac.uk for more detail.

Possible aims are: (i) exploring and implementing algorithms for MPST protocol compositionality, or (ii) formalising the correctness of advanced MPST systems (e.g., featuring merge, subtyping, or delegation). Students with a strong interest in the mechanisation of MPST in proof assistants (Coq, Isabelle, Idris) are very welcome to reach out.

[1] Mechanisation Paper https://dl.acm.org/doi/10.1145/3453483.3454041

[2] Multiparty Session Types Paper https://link.springer.com/chapter/10.1007/978-3-030-36987-3_5

Model-checking Timed Session Types Nobuko Yoshida Programming Languages B C MSc

This project lies at the intersection of model checking, verification, timed systems, and session types. It aims to tackle the critical challenge of ensuring deadlock-freedom in distributed systems employing timed session types [1,2], through the application of timed model-checking tools. Timed session types provide a formalism for specifying communication protocols with temporal constraints, while deadlocks pose a persistent threat to system stability. Drawing on established timed model-checking tools [3], our objective is to systematically analyse the specified models to detect and scrutinise potential deadlocks. The project also endeavours to build timed session types using a bottom-up approach, and then to verify type-level properties using a model checker, as in [4]. This research aspires to improve the reliability and correctness of concurrent and distributed systems, contributing to the progression of formal verification techniques.

[1] http://mrg.doc.ic.ac.uk/publications/timed-multiparty-session-types/CONCUR14.pdf

[2] https://mrg.cs.ox.ac.uk/publications/meeting-deadlines-together/

[3] https://uppaal.org/features/ (UPPAAL model checker)

[4] https://dl.acm.org/doi/10.1145/3290343

Using a model checker to verify type-level properties for timed binary sessions, together with a literature survey on timed systems and extensions to deadlock-freedom checking, could form a mini-project of shorter duration.

Pre-requisites: a student who wishes to learn model checking is welcome. Familiarity with basic automata theory is assumed; familiarity with model-checking tools will be helpful but is not required.

Probabilistic Session Types: semantics and tool development Nobuko Yoshida, Joe Paulus Programming Languages C MSc

Probabilistic session types [1] explore how uncertainty and likelihood influence communication protocols in distributed systems. Extending Probabilistic Resource-Aware Session Types opens many possibilities: the typing is based on the system of DILL [2] (a session type system with a Curry-Howard correspondence with linear logic) and can be extended with parallel composition and restriction; similarly, alternative rules can be derived from the classical viewpoint [3], expanding typeability. This project will be a mix of theory and practice, centred on an extension of the NomosPro/PRast implementation. Overall, this extension aims to provide a more comprehensive framework for designing and analysing distributed systems with probabilistic and resource-aware communication protocols. The project will also develop bisimulation semantics for concurrent processes.

[1] Probabilistic Resource-Aware Session Types (acm.org) 

[2] concur10.pdf (cmu.edu) 

[3] propositions-as-sessions.pdf (ed.ac.uk)

Pre-requisites: DPTP

Program Transformation of Distributed Protocol Specification Nobuko Yoshida Programming Languages C MSc

Session types [1] describe patterns of communication in concurrent and distributed systems. Moreover, they guarantee certain desirable behavioural properties, such as deadlock freedom. Although session type libraries exist for a range of programming languages, there is little programmer support for their integration into existing and legacy code. This necessarily raises the level of required expertise to introduce and maintain session typed systems, and can consequently be a source of error. The goal of this project is to develop tools and techniques that can assist programmers in the introduction and manipulation of session types in legacy code. Inspired by similar work for parallelism [2], the project will focus on program transformation techniques, including refactoring [2,3,4], to develop safe and semi-automatic means to not only introduce session types, but modify existing session types in situ.

[1] N. Yoshida and L. Gheri: A Very Gentle Introduction to Multiparty Session Types. ICDCIT 2020: 73-93

[2] C. Brown, et al.: Refactoring GrPPI: Generic Refactoring for Generic Parallelism in C++. Int. J. Parallel Program. 48(4): 603-625 (2020)

[3] T. Mens and T. Tourwé: A Survey of Software Refactoring. IEEE Trans. Software Eng. 30(2): 126-139 (2004)

[4] S. J. Thompson and H. Li: Refactoring tools for functional languages. J. Funct. Program. 23(3): 293-350 (2013)

[5] R. N. S. Rowe, et al.: Characterising renaming within OCaml's module system: theory and implementation. PLDI 2019: 950-965

Rust programming language for communication and distribution. Nobuko Yoshida Programming Languages C

Rust has been the most beloved programming language since 2016, according to the annual Stack Overflow survey. Thanks to its efficiency and memory safety, it is now one of the most popular languages for large-scale concurrent applications, such as Servo, a browser engine by Firefox; Stratis, a file system manager for Fedora; and Microsoft Azure. Memory safety is one of the core principles of Rust, but it does not extend to communication safety, making the implementation of deadlock-free protocols challenging. Our group has been working on implementing a library for asynchronous Rust programming, and we wish to tackle several challenges such as refinement types, dependent types, and reversible computing with Rust. Such projects can be both theoretical and practical.

EXAMPLE (1): Dependent types and asynchronous message reordering

In this project, we are interested in developing the dependent type theory for asynchronous message reordering [2] and use Rumpsteak to implement the theory [1].

EXAMPLE (2): Reversible computing

In this project, we are interested in using Rust to implement a reversible process calculus. Reversible process calculi are widely studied, in particular for debugging [3]. In this project, we want to explore the usefulness of reversibility as a programming primitive, similar to [4]. Direct applications include fault tolerance (e.g. when merged with affine MPST [5]) or speculative execution.

Reference:

[1] Zak Cutner, Nobuko Yoshida, Martin Vassor: Deadlock-Free Asynchronous Message Reordering in Rust with Multiparty Session Types. PPoPP '22: 246-261. https://dl.acm.org/doi/10.1145/3503221.3508404

[2] Silvia Ghilezan, Jovanka Pantovic, Ivan Prokic, Alceste Scalas, Nobuko Yoshida: Precise Subtyping for Asynchronous Multiparty Sessions. POPL 2021: 16:1-16:28. https://dl.acm.org/doi/10.1145/3434297

[3] See for instance the CauDEr: https://github.com/mistupv/cauder

[4] Controlling Reversibility in Higher-Order Pi, Ivan Lanese, Claudio Antares Mezzina, Alan Schmitt & Jean-Bernard Stefani, https://doi.org/10.1007/978-3-642-23217-6_20

[5] http://mrg.doc.ic.ac.uk/publications/affine-rust-programming-with-multiparty-session-types/

Rust programming language for communication and distribution. Nobuko Yoshida Programming Languages B

Rust has been the most beloved programming language since 2016, according to the annual Stack Overflow survey. Thanks to its efficiency and memory safety, it is now one of the most popular languages for large-scale concurrent applications, such as Servo, a browser engine by Firefox; Stratis, a file system manager for Fedora; and Microsoft Azure. Memory safety is one of the core principles of Rust, but it does not extend to communication safety, making the implementation of deadlock-free protocols challenging. Our group has been working on implementing a library for asynchronous Rust programming.

The mini-projects focus on either:

(1) survey recent work on Rust with session types and implement an example using [1]; or

(2) survey recent work on Rust with session types and study a theory of asynchronous message reordering in Rust [2]

Reference:

[1] Zak Cutner, Nobuko Yoshida, Martin Vassor: Deadlock-Free Asynchronous Message Reordering in Rust with Multiparty Session Types. PPoPP '22: 246-261. https://dl.acm.org/doi/10.1145/3503221.3508404

[2] Silvia Ghilezan, Jovanka Pantovic, Ivan Prokic, Alceste Scalas, Nobuko Yoshida: Precise Subtyping for Asynchronous Multiparty Sessions. POPL 2021: 16:1-16:28. https://dl.acm.org/doi/10.1145/3434297

Survey of Mechanisation of Distributed Protocol Specifications, Session Types Nobuko Yoshida Programming Languages B

Session type systems for distributed and concurrent computations encompass concepts such as interfaces, communication protocols, and contracts. The session type of a software component specifies its expected patterns of interaction using expressive type languages, so types can be used to determine automatically whether the component interacts correctly with other components.

There is a growing body of recent work on the mechanisation of session types in proof assistants (Coq, Isabelle, Idris). This project will produce an updated survey of this work.

[1] Example of mechanisation papers in Coq https://dl.acm.org/doi/10.1145/3453483.3454041

[2] Multiparty Session Types Paper https://link.springer.com/chapter/10.1007/978-3-030-36987-3_5

Survey of Session Types Literature Nobuko Yoshida Programming Languages B

This project will produce an updated literature survey of session types. Session type systems for distributed and concurrent computations encompass concepts such as interfaces, communication protocols, and contracts. The session type of a software component specifies its expected patterns of interaction using expressive type languages, so types can be used to determine automatically whether the component interacts correctly with other components.

Session type systems have been studied in the context of theory (including automata theory, semantics, linear logic, and type theories), programming languages (Java, Scala, Go, Rust, TypeScript, PureScript, MPI-C, C, Python, Erlang, F*, F#, Haskell, OCaml, and more), and system applications.

There are four past surveys [1,2,3,4], but they are already outdated.

The project produces an updated survey focusing on theory, programming languages and/or applications.

[1] Theory https://dl.acm.org/doi/pdf/10.1145/2873052

[2] Tools https://www.awesomebooks.com/book/9788793519824/behavioural-types-from-theory-to-tools-river-publishers-series-in-automation-control-and-robotics

[3] Security https://www.sciencedirect.com/science/article/pii/S2352220815000851?via%3Dihub

[4] Programming Languages https://doi.org/10.1561/2500000031

The Go Language with Generic Types Nobuko Yoshida Programming Languages C

Go is a popular statically typed programming language designed and implemented by Google, and used, among others, by Uber, Netflix, and Docker. The recent release of Go 1.18 finally realised the long-awaited addition of generics (parametric polymorphism) to the Go type system. Much academic work has been conducted to ensure that generics are correctly implemented, but there is still more to do [1,2,3,4]. This project aims to further reduce the gap between theory and practice, allowing the Go Team to improve their implementation of generics in Go.

Project 1:

Formalise (and prove correct) a dictionary-passing translation from Featherweight Generic Go (FGG) to a lambda-calculus with pattern matching or other suitable low-level language.

Project 2:

The current model of generics in Go (FGG) does not include important features included in Go 1.18 such as type inference and type sets [5]. The proposed project would be to formalise a true FGG (including aforementioned features) along with a correct translation.

[1] https://dl.acm.org/doi/pdf/10.1145/3428217

[2] https://dl.acm.org/doi/10.1007/978-3-030-89051-3_7

[3] https://dl.acm.org/doi/10.1007/978-3-031-16912-0_7

[4] https://arxiv.org/abs/2208.06810v2

[5] https://go.googlesource.com/proposal/+/master/design/43651-type-parameters.md

Verifying and implementing security protocols in Rust Nobuko Yoshida Programming Languages C MSc

In this project, our goal is to develop a small working voting system, like Helios [1], using session types. Before an election, the server will send credentials to all eligible voters. On election day, each eligible voter authenticates themselves and casts their ballot. At this point, the server encrypts the ballot and gives the voter two options: decrypt the ballot (to ensure that the server is not cheating) or cast it. In case of decryption, the voter is prompted to enter a new ballot and given the same two options again, decrypt or cast. A voter can decrypt the ballot as many times as they wish, to ensure that the server is not cheating, before casting it. Once the ballot is cast, the server sends a confirmation to the voter and posts the encrypted ballot to a bulletin board. After the end of the election, the server combines all the ballots homomorphically, decrypts the final tally, and declares the winner.
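
To make the tallying step concrete, here is a minimal Python sketch of homomorphic combination with exponential ElGamal, the kind of scheme Helios builds on: multiplying ciphertexts componentwise adds the underlying votes. The toy group parameters are illustrative and far too small to be secure.

  import secrets

  q, p = 101, 607               # toy subgroup order and modulus (q divides p - 1)
  g = pow(3, (p - 1) // q, p)   # generator of the order-q subgroup
  sk = secrets.randbelow(q)     # election secret key
  pk = pow(g, sk, p)            # election public key

  def encrypt(vote):            # vote is 0 or 1, encoded in the exponent
      r = secrets.randbelow(q)
      return (pow(g, r, p), (pow(g, vote, p) * pow(pk, r, p)) % p)

  ballots = [encrypt(v) for v in (1, 0, 1, 1, 0)]

  # homomorphic combination: componentwise product of all ciphertexts
  c1 = c2 = 1
  for u, v in ballots:
      c1, c2 = (c1 * u) % p, (c2 * v) % p

  # decryption yields g^tally; recover the tally by a small search
  gm = (c2 * pow(pow(c1, sk, p), -1, p)) % p
  tally = next(m for m in range(len(ballots) + 1) if pow(g, m, p) == gm)
  print(tally)                  # 3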

Our first goal is to capture a simplified version of the Helios protocol using session types in NuScr (nuScribble) and generate an API, and later to fill in the API with a Rust implementation. This project requires knowledge of cryptography (a basic understanding of ElGamal encryption), NuScr, and Rust.

[1]https://github.com/benadida/helios-server

[2]https://www.usenix.org/legacy/events/sec08/tech/full_papers/adida/adida.pdf

Verifying security protocols (Rust) Nobuko Yoshida Programming Languages B

Sigma protocols are a particularly simple and efficient kind of zero-knowledge proof and have seen wide deployment; they remain a leading kind of proof in terms of both simplicity and deployment, although recent advances in succinct zero-knowledge proofs offer greater efficiency. The first efficient sigma protocol was introduced by Schnorr [2], several years before the class was defined. You can read more about sigma protocols in chapter 5 of [1].

To give you some insight into sigma protocols, we briefly discuss the most famous sigma protocol, the Schnorr protocol [2]. Given some public input (G, g, q, h) where G is a cyclic group of prime order q, and g and h are two generators of the group G, the prover claims that she knows a witness w for the statement h = g^w; the existence of such a w is immediate because g generates the group. However, does the prover know the witness w? In order to convince the verifier, the prover and the verifier do the following:

  1. the prover picks a random number u, computes c = g^u, and sends c to the verifier
  2. the verifier picks a random challenge e and sends it to the prover
  3. the prover computes t = u + e ∗ w and sends t to the verifier

The verifier accepts if g^t = c ∗ h^e, otherwise rejects.
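
The protocol is short enough to execute directly; the Python sketch below runs one honest round over a toy group (the parameters are illustrative and far too small to be secure).

  import secrets

  q, p = 101, 607               # toy subgroup order and modulus (q divides p - 1)
  g = pow(3, (p - 1) // q, p)   # generator of the order-q subgroup G
  w = secrets.randbelow(q)      # prover's secret witness
  h = pow(g, w, p)              # public statement: h = g^w

  u = secrets.randbelow(q)      # 1. prover commits
  c = pow(g, u, p)
  e = secrets.randbelow(q)      # 2. verifier challenges
  t = (u + e * w) % q           # 3. prover responds

  assert pow(g, t, p) == (c * pow(h, e, p)) % p   # verifier accepts

Each arrow in the session type corresponds to one of the three messages c, e, and t.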

In this project, the goal is to use session types to capture the communication between the prover and the verifier. It is a very simple binary session type with three messages. Our first step will be to encode the Schnorr protocol, described above, in NuScr, generate a Rust API from the NuScr encoding, and fill in the rest of the cryptographic code. Later, we will develop the parallel composition, AND composition, OR composition, and NEQ composition of sigma protocols (see chapter 5 of [1]).

[1] https://www.win.tue.nl/~berry/2WC13/LectureNotes.pdf

[2] https://link.springer.com/article/10.1007/BF00196725

Facilitating User Autonomy through Data Terms of Use for Decentralised Web Rui Zhao, Jun Zhao Human Centred Computing MSc

Abstract

Ticking "I have read and agree to the Terms of Service" without reading it is known as "the biggest lie on the Internet". We are all guilty of it, yet we recognise that such terms are necessary.

Data Terms of Use (DToU) is a similar but more approachable concept. It specifies what data users need to follow before, during, and after using the data. In a decentralised web context, where everyone becomes a data provider, everyone will need to write their own DToU. In addition, a DToU must govern not only the direct use of data but also the use of derived data.

Handling such DToU is often labour-intensive and error-prone. Formalisation and automated reasoning have the potential to improve the handling and understanding of DToU. We have previously developed a formal DToU language and a reasoning engine for decentralised contexts, with an integration into Solid (https://solidproject.org) and the potential to be used in wider scenarios. It is based on symbolic AI techniques, featuring interoperability, explainability, and accountability. However, the usability of symbolic AI systems can be a major barrier for wider user communities, such as everyday citizens. In this project, the student is expected to explore mechanisms facilitating the expression and utilisation of DToU, such as investigating how modern machine-learning-based AI methods can facilitate DToU expression and utilisation for wider user communities. Topics of interest include but are not limited to:
- How can NLP/LLM technology facilitate the expression and comprehension of DToU policies?
- How can agents perform preference learning from users' activities?
- Mechanisms for better accountability or trustworthiness, such as provenance, instantiation to ODRL, etc.

Interested students are welcome to contact Rui Zhao and Jun Zhao to discuss or propose their own ideas related to above (rui.zhao@cs.ox.ac.uk, jun.zhao@cs.ox.ac.uk).
Prerequisites:
- An appropriate foundation in first-order logic: the student is not required to extend the DToU language, but FOL is needed to understand it, and may be useful for the explainability side of the machine learning work
- Practical and theoretical knowledge of machine learning, should the student be interested in the modern AI aspects
- Technical experience with at least one of the following will be a big plus:
  - Solid
  - RDF / Linked Data / Semantic Web / Knowledge Graph
  - NLP / LLM
- Confidence in an appropriate language for working with AI and/or Solid, especially Python or JS/TS

References:
1. https://arxiv.org/abs/2403.07587

Topics in Algorithms, Complexity, and Combinatorial Optimisation Standa Živný Algorithms and Complexity Theory B C MSc

Prof Živný is willing to supervise in the area of algorithms, complexity, and combinatorial optimisation; in particular, on problems related to convex relaxations (linear and semidefinite programming relaxations), submodular functions, and the algorithms for and complexity of homomorphism problems and constraint satisfaction problems. Examples of past supervised projects include extensions of min-cuts in graphs, analysis of randomised algorithms for graph and hypergraph colourings, and sparsification of graphs and hypergraphs. The projects would suit mathematically oriented students with an interest in rigorous analysis of algorithms and in applications of combinatorics and probabilistic methods to computer science.