Explainable Machine Learning for Identifying the Full Heterogeneity of Peptidoforms and Proteoforms

explAInProt aims to advance proteomics by developing explainable, end-to-end machine learning models that identify previously undetected protein variants, with results verified through complementary nanopore sequencing, to support clinical applications.

Grant
€1,992,500
2024

Project Details

Introduction

Mass spectrometry-driven proteomics provides deep insights into the workings of cells. Still, the vast majority of proteoforms, which represent the full heterogeneity of molecular forms of protein products in a sample, remain undetected in current proteomics experiments.

Limitations

This lack of information severely restricts our knowledge of disease progression, possible biomarkers, and therapeutic targets across a large number of diseases. Several machine learning approaches have been developed for proteomics data, but because they are not trained end-to-end, they cannot capture the full wealth of information in proteomic mass spectra, and they commonly remain unexplained black boxes.

Project Goals

Within explAInProt, my team and I will develop representations of spectra that allow explainable, end-to-end machine learning models to be deployed on the wealth of available proteomic data, covering both bottom-up and top-down spectra, in order to identify novel protein variants.
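
The summary does not say which spectrum representations the project will develop. For orientation only, a simple and widely used baseline is to bin a centroided spectrum by m/z into a fixed-length vector that any standard model can consume. The Python sketch below assumes NumPy; the helper `bin_spectrum`, its 0 to 2000 m/z range, and its 1-Da bin width are hypothetical choices, not the project's method.

```python
import numpy as np

def bin_spectrum(mz, intensity, mz_min=0.0, mz_max=2000.0, bin_width=1.0):
    """Sum peak intensities into equal-width m/z bins and return a
    fixed-length vector scaled so that its largest entry is 1.

    Hypothetical helper for illustration; the bin range and width are
    arbitrary defaults, not values taken from the project.
    """
    mz = np.asarray(mz, dtype=float)
    intensity = np.asarray(intensity, dtype=float)
    n_bins = int(np.ceil((mz_max - mz_min) / bin_width))
    # Drop peaks outside the covered m/z range.
    keep = (mz >= mz_min) & (mz < mz_max)
    # Bin index of each remaining peak; the clip guards against float round-up.
    idx = np.minimum(((mz[keep] - mz_min) / bin_width).astype(int), n_bins - 1)
    vec = np.zeros(n_bins)
    # np.add.at accumulates correctly when several peaks share a bin.
    np.add.at(vec, idx, intensity[keep])
    if vec.max() > 0:
        vec /= vec.max()  # base-peak normalization
    return vec

# Toy spectrum with made-up peaks; two of them fall into the same 1-Da bin.
v = bin_spectrum([147.11, 147.4, 262.14], [50.0, 50.0, 25.0], mz_max=500.0)
print(v.shape)         # (500,)
print(v[147], v[262])  # 1.0 0.25
```

Fixed-width binning gives every spectrum the same shape, at the cost of merging peaks that fall within one bin (as the two peaks near m/z 147 do here) and discarding much of the instrument's mass accuracy.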

Importance of Explanations

Explanations will make it possible to trace the origin of predictions and help reduce bias, building the trustworthiness that AI systems require for clinical applications.

Verification Strategies

To verify results, we will pioneer orthogonal, real-time strategies based on selective sequencing and amino acid calling, which we will introduce for nanopore sequencing devices as a complementary acquisition method.

Expected Outcomes

Combined, these efforts will drastically increase our knowledge of the current dark matter of mass spectrometry-driven proteomics: proteins and peptides that remain undetected in current analyses because they are non-canonically modified, are non-tryptic, carry potentially multiple amino acid substitutions, have no close match in databases, or result from structural variants such as fusion proteins.
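
The term non-tryptic refers to the enzyme used in most bottom-up experiments: trypsin cleaves C-terminal to lysine (K) and arginine (R), classically not when the next residue is proline (P), and database searches are commonly restricted to peptides that follow this rule. The Python sketch below implements that simplified rule to show which candidates such a search would consider; `tryptic_digest` and the toy sequence are hypothetical illustrations, not part of the project.

```python
import re

def tryptic_digest(protein: str, missed_cleavages: int = 0) -> list[str]:
    """Simplified in-silico trypsin digestion (hypothetical helper).

    Cleaves after K or R unless the next residue is P, then joins up to
    `missed_cleavages` adjacent fragments to model incomplete digestion.
    """
    # Zero-width split point: preceded by K/R and not followed by P.
    fragments = [f for f in re.split(r"(?<=[KR])(?!P)", protein) if f]
    peptides = []
    for start in range(len(fragments)):
        stop = min(start + missed_cleavages + 1, len(fragments))
        for end in range(start, stop):
            peptides.append("".join(fragments[start:end + 1]))
    return peptides

protein = "PEPTIDEKAVRPLKG"  # toy sequence, not a real protein
print(tryptic_digest(protein))
# ['PEPTIDEK', 'AVRPLK', 'G']
# "TIDEKAV" occurs in the sequence but is non-tryptic, so a search
# restricted to these candidates would never consider it.
print("TIDEKAV" in tryptic_digest(protein, missed_cleavages=2))  # False
```

Real search engines offer further options (for example semi-specific cleavage and variable modifications), which this sketch omits.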

Applicability

We will highlight the applicability of our methods in two areas of particular concern for current approaches:

  1. The detection of structural variants in proteomic mass spectra.
  2. The characterization of novel microbial organisms without sufficient database information.

Financial Details & Timeline

Financial Details

Grant amount: €1,992,500
Total project budget: €1,992,500

Timeline

Start date: 1 December 2024
End date: 30 November 2029
Grant year: 2024

Partners & Locations

Project Partners

  • HASSO-PLATTNER-INSTITUT FUR DIGITAL ENGINEERING GGMBH (coordinator)

Country

Germany
