Session Types for Reliable Distributed Systems (STARDUST)

1st October 2022 to 30th September 2025

Distributed software systems are an essential part of the infrastructure of modern society. Such systems typically comprise diverse software components deployed across networks of hosts. Ensuring their reliability is challenging as software components must correctly communicate and synchronise with each other, and any of the hardware or software components may fail. Software failure and service outage are estimated to cost the world economy more than a trillion dollars annually.

Failures can occur at all levels of the system stack: hardware, operating system, network, software, and user. This project focuses on using advanced programming language technologies to enable the software level to better handle failures using a combination of fault prevention and fault tolerance. Specifically, it will combine the communication-structuring mechanism of session types with the scalability and fault-tolerance of actor-based software architectures.

Actor languages are based on independent processes (actors) communicating by asynchronous messages. Reliability is facilitated by actors having isolated state, and hence an actor can fail independently. Two key techniques for achieving reliability in actor languages are timeouts and supervision, and these are the main focus of STARDUST. Timeouts allow failures to be identified during execution, and failures are handled by establishing triggers and alternative behaviours within the code. An actor may supervise other actors, detecting failures and taking remedial action like restarting a failed actor.

Session types provide a way to specify and constrain the communication behaviour (protocol) between nodes in a system. A session type system excludes any non-conforming behaviour, perhaps statically (for fault prevention), dynamically, or a mixture of both (for fault tolerance). Several languages now have session-type support via libraries and tools.

By combining the strengths of actor languages and session types, this project will develop a well-founded theory of reliable actor programming with clearly defined communication structures. Key aims are to deliver tools that provide lightweight support for developers, e.g. warn of potential issues, and to allow developers to continue to use established idioms. By doing so this project will deliver a step change in the engineering of reliable distributed software systems.

Principal Investigator

Nobuko Yoshida

Session Types for Reliable Distributed Systems (STARDUST)

Principal Investigator

Sponsors

See also