Navigating the Genetic Perturbation Landscape: Multi-modal causal representation learning for target discovery
Abstract
Cardiometabolic disorders remain the leading cause of mortality globally [1,2]. Addressing this major public health issue necessitates identifying effective pharmacological interventions, which requires a detailed understanding of the complex aetiology of these disorders. Cardiometabolic diseases are driven by an intricate interplay of genetic and environmental factors that impact the functionality of diverse cell types across the human body [3]. To tackle this complexity, new drug discovery approaches are essential to navigate the vast combinatorial landscape of potential pharmacological interventions and cellular phenotypes.
This project aims to develop a predictive model of cellular responses to genetic perturbations, a key step towards discovering drug targets for cardiometabolic disorders. By focusing on how cells react to genetic modifications (e.g., gene knockouts or gene silencing), this model will provide insights into the druggable genome, a critical factor for target discovery.
The Torr Vision Group has recently begun a collaboration with Novo Nordisk; together, we have the following objectives:
- Develop a Predictive Model: Create a model capable of accurately predicting unseen cellular responses to specific genetic perturbations across various cell types. This will be grounded in the comprehensive data generated in-house at the Novo Nordisk Research Centre in Oxford (NNRCO), where a framework is being developed to deeply characterise cellular phenotypes at scale.
- Develop Enhanced Cellular Representations: Build multi-modal cellular representations that capture detailed patterns in imaging, gene expression and proteomics data, improving the accuracy of the predictive model. Learning these detailed patterns may also provide insights into genetic interactions and the gene regulatory network.
- Active Learning for Efficient Genome Screening: Given the scale of the human genome, exploring the combinatorial perturbation landscape defined by roughly 20,000 protein-coding genes poses a significant experimental challenge: pairwise combinations alone number close to 200 million. Our approach will use an active learning framework to guide sequential, optimal experimental perturbation screens. This will enable efficient and targeted exploration of the genetic perturbation landscape, accelerating the discovery of therapeutic targets.
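As a rough illustration of the active learning objective above, the following Python sketch runs a toy sequential screen: a model is fitted on the perturbations measured so far, and the next batch is chosen where the model is least certain. This is a minimal sketch and not the project's method. The bootstrap ridge-regression ensemble, the uncertainty-based selection rule, the synthetic "experiment", and all names (`fit_ridge`, `ensemble_uncertainty`, `active_learning_screen`, `oracle`) are assumptions made for brevity.

```python
import numpy as np


def fit_ridge(X, y, lam=1.0):
    """Closed-form ridge regression weights (stand-in for a real response model)."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)


def ensemble_uncertainty(X_lab, y_lab, X_pool, n_models=10, rng=None):
    """Std. dev. of pool predictions across bootstrap-trained ridge models."""
    rng = np.random.default_rng() if rng is None else rng
    preds = []
    for _ in range(n_models):
        # Resample the measured perturbations with replacement.
        idx = rng.integers(0, len(X_lab), size=len(X_lab))
        w = fit_ridge(X_lab[idx], y_lab[idx])
        preds.append(X_pool @ w)
    return np.std(preds, axis=0)


def active_learning_screen(X, oracle, n_init=10, batch_size=5, n_rounds=4, seed=0):
    """Return indices of perturbations chosen for measurement, in order.

    X      : (n_perturbations, n_features) hypothetical perturbation embeddings
    oracle : callable i -> measured response; stands in for a wet-lab experiment
    """
    rng = np.random.default_rng(seed)
    n = len(X)
    # Start from a small random screen.
    labelled = [int(i) for i in rng.choice(n, size=n_init, replace=False)]
    y = {i: oracle(i) for i in labelled}
    for _ in range(n_rounds):
        pool = np.array([i for i in range(n) if i not in y])
        unc = ensemble_uncertainty(
            X[labelled], np.array([y[i] for i in labelled]), X[pool], rng=rng
        )
        # Measure the perturbations with the highest predictive disagreement.
        for i in pool[np.argsort(unc)[-batch_size:]]:
            y[int(i)] = oracle(int(i))
            labelled.append(int(i))
    return labelled


# Toy usage with synthetic data (no real screening data involved).
rng = np.random.default_rng(1)
X_demo = rng.normal(size=(200, 8))
w_true = rng.normal(size=8)
selected = active_learning_screen(X_demo, lambda i: float(X_demo[i] @ w_true))
print(len(selected), "perturbations measured out of", len(X_demo))
```

In a realistic setting the ridge ensemble would be replaced by the predictive model described in the first objective, and the selection rule could weigh expected therapeutic relevance as well as uncertainty.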
Interested students will have the opportunity to contribute to multiple aspects of this project, from designing cellular representations to developing the active learning framework. This work will provide hands-on experience at the intersection of ML and genetics, contributing meaningfully to ML-driven drug discovery efforts.
[1] GBD 2021 Diabetes Collaborators (2023). Global, regional, and national burden of diabetes from 1990 to 2021, with projections of prevalence to 2050: a systematic analysis for the Global Burden of Disease Study 2021. Lancet, 402(10397), 203–234.
[2] G. R. Dagenais, D. P. Leong, S. Rangarajan, et al. (2020). Variations in common diseases, hospital admissions, and deaths in middle-aged adults in 21 countries from five continents (PURE): a prospective cohort study. Lancet, 395(10226), 785–794.
[3] C. Priest, P. Tontonoz (2019). Inter-organ cross-talk in metabolic syndrome. Nature Metabolism, 1(12), 1177–1188.