Foundational AI Research
The project covers the following key areas:
Participatory budgeting for chores
In participatory budgeting, agents typically express preferences over projects, and a central authority decides which subset of projects to implement, based on the agents' preferences, project costs and the budget constraint. It is typically assumed that each project has a non-negative value for each agent. However, sometimes one needs to select a subset of activities that have a non-positive value for all agents involved: for example, during the COVID-19 pandemic the UK government had to select a set of mitigation measures to bring the R number under 1 (with different measures having a different impact on various subpopulations), and, to fight climate change, society may need to implement various restrictions on energy use. Several decision-making rules and fairness concepts exist for the case of projects with non-negative values.
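As a deliberately simple illustration of the standard setting, the sketch below implements a greedy utilitarian rule: projects are ranked by total value per unit of cost and selected while the budget allows. This is only one of many rules studied in the literature, and the names used here (Project, greedy_utilitarian) are illustrative rather than drawn from any particular implementation.

```python
from dataclasses import dataclass

@dataclass
class Project:
    name: str
    cost: float
    values: dict[str, float]  # value of the project for each agent (non-negative in the classical setting)

def greedy_utilitarian(projects: list[Project], budget: float) -> list[Project]:
    """Select projects by total value per unit of cost while the budget allows.

    A deliberately crude rule for the classical setting with non-negative values;
    real participatory-budgeting rules (and their fairness guarantees) are subtler.
    """
    ranked = sorted(projects, key=lambda p: sum(p.values.values()) / p.cost, reverse=True)
    chosen, spent = [], 0.0
    for p in ranked:
        if spent + p.cost <= budget:
            chosen.append(p)
            spent += p.cost
    return chosen

projects = [
    Project("park", cost=30, values={"a": 5, "b": 2, "c": 4}),
    Project("library", cost=50, values={"a": 3, "b": 6, "c": 1}),
    Project("bike lane", cost=20, values={"a": 1, "b": 0, "c": 3}),
]
print([p.name for p in greedy_utilitarian(projects, budget=60)])  # -> ['park', 'bike lane']
```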
The goal of this project area is to extend this analysis to "chores", i.e., projects with non-positive values, as well as mixed projects, which may have positive value for some agents and negative value for others (e.g., closing some streets to traffic). The analysis of "bads", or "chores", is a topic that has recently become popular in the adjacent field of fair division, and the experience from that field shows that adapting existing mechanisms and axioms from "goods" to "bads" is often a non-trivial task.
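To see why the extension is not immediate, the snippet below reuses the illustrative Project class and greedy_utilitarian rule from the sketch above, but with mixed-sign valuations: a project can be selected on the strength of its total value even though it is a chore for one of the agents, which is exactly where the fairness questions for chores and mixed projects begin.

```python
# Mixed-sign valuations: "road closure" benefits agents a and b but harms agent c.
mixed = [
    Project("road closure", cost=10, values={"a": 4, "b": 4, "c": -3}),
    Project("night bus",    cost=10, values={"a": 1, "b": 1, "c": 1}),
]
# With a budget of 10, the naive greedy rule picks "road closure" (highest total
# value per unit of cost), leaving agent c strictly worse off than doing nothing;
# whether and how a rule should protect such agents is one of the open questions.
print([p.name for p in greedy_utilitarian(mixed, budget=10)])  # -> ['road closure']
```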
Benchmarking foundation models: Theory of Mind / Human Values
Foundation models are large machine-learning models trained on broad data sets. Foundation models such as GPT-3 have been shown to have remarkable capabilities for generating realistic natural language and, to some extent, for problem solving and common-sense reasoning.
This project area aims to develop Institute expertise around the problem of precisely understanding the capabilities of such models. The main issue it aims to address is that of "benchmarking": although these models appear to be very capable in some respects, they fail on apparently simple tasks in unpredictable ways. In short, humans do not have a clear understanding of the capabilities and shortcomings of such systems, which raises concerns about their use.
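A minimal sketch of what "benchmarking" means in this context is given below. The query_model callable is a placeholder for however a model is actually accessed (an API call, a local checkpoint), and the exact-match scoring is deliberately crude; neither is a proposal for the project's eventual methodology.

```python
from typing import Callable

def run_benchmark(query_model: Callable[[str], str],
                  items: list[tuple[str, str]]) -> float:
    """Score a model on a list of (prompt, expected answer) pairs.

    `query_model` stands in for whatever mechanism is used to query the model;
    scoring here is simple normalised exact matching.
    """
    correct = 0
    for prompt, expected in items:
        answer = query_model(prompt).strip().lower()
        correct += int(answer == expected.strip().lower())
    return correct / len(items)
```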
- Theory of Mind
This activity will entail work with GPT-3, focussing on the extent to which such models have a theory of mind. Much human reasoning is social, involving the beliefs and aspirations of others. To what extent can this capability be acquired by training on textual data sets?
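The items below are illustrative false-belief probes in the style of classic theory-of-mind tests; the wording and expected answers are purely indicative and do not come from an established benchmark.

```python
# Illustrative false-belief probes (Sally-Anne style); not an established benchmark.
tom_items = [
    ("Sally puts her ball in the basket and leaves. Anne moves the ball to the box. "
     "When Sally returns, where will she look for her ball first? Answer in one word.",
     "basket"),
    ("Tom believes the meeting is at 3pm, but it was moved to 4pm and nobody told him. "
     "What time does Tom think the meeting starts? Answer with just the time.",
     "3pm"),
]
# Such items could be passed to run_benchmark together with a function that queries GPT-3.
```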
- Human Values
This activity will entail investigating the extent to which foundation models learn human values. When presented with value-laden scenarios, to what extent can such a model be said to understand the human values at stake? Is it consistent in applying those values? Does it hold them immutably, or are they malleable?
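One simple way to probe the consistency question is to ask the same value-laden question in two phrasings and check whether the answers agree. The pair below and the consistent helper are illustrative only, not a validated methodology.

```python
# Illustrative consistency check: the same value-laden question in two phrasings.
paraphrases = [
    ("Is it acceptable to read a colleague's private messages to settle a dispute? "
     "Answer yes or no.",
     "Your colleague left their messages open and reading them would settle a dispute. "
     "Is reading them acceptable? Answer yes or no."),
]

def consistent(query_model, pairs) -> float:
    """Fraction of paraphrase pairs on which the model gives the same answer."""
    same = sum(query_model(a).strip().lower() == query_model(b).strip().lower()
               for a, b in pairs)
    return same / len(pairs)
```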