Scalable Learning for Reproducibility in High-Dimensional Biomedical Signal Processing: A Robust Data Science Framework

ScReeningData aims to develop a scalable learning framework to enhance statistical robustness and reproducibility in high-dimensional data analysis, reducing false positives across scientific domains.

Subsidie
€ 1.500.000
2022

Projectdetails

Introduction

Data science has quickly expanded the boundaries of signal processing and statistical learning beyond their accustomed domains. Powerful and complex machine learning architectures have evolved to distinguish relevant information from randomness, artifacts, and irrelevant data.

Challenges in Existing Frameworks

However, existing learning frameworks lack computationally scalable, tractable, and robust methods for high-dimensional data. Consequently, discoveries, for example, in genomic data can be the result of coincidental findings that happen to reach statistical significance.

Need for Improved Learning Frameworks

As long as groundbreaking advances in biotechnology are not accompanied by appropriate learning frameworks, valuable efforts are spent on researching false positives.

Project Overview

ScReeningData develops a coherent fast and scalable learning framework that jointly addresses the fundamental challenges of:

  1. Drastically reducing computational complexity
  2. Providing statistical and robustness guarantees
  3. Quantifying reproducibility in large-scale and high-dimensional settings

Innovative Approach

An unprecedented approach is developed that builds upon very recent work of the PI. The underlying concept is to repeat randomized controlled experiments that use computer-generated fake variables as negative controls to trigger an early stopping of the learning algorithms, thereby mitigating the so-called curse of dimensionality.

Advantages of Proposed Methods

In contrast to existing methods, the proposed methods are completely tractable and scalable to ultra-high dimensions. The gains of developing advanced robust learning methods that are computed ultra-fast and with tight guarantees on the targeted rate of false positives are enormous.

Broader Impacts

They lead to new reproducible discoveries that can be made with high statistical power. Due to the fundamental nature and the broad applicability of the proposed learning methods, the impacts of this project extend far beyond the considered biomedical signal processing use-cases, benefiting all scientific domains that analyze high-dimensional data.

Financiële details & Tijdlijn

Financiële details

Subsidiebedrag€ 1.500.000
Totale projectbegroting€ 1.500.000

Tijdlijn

Startdatum1-9-2022
Einddatum31-8-2027
Subsidiejaar2022

Partners & Locaties

Projectpartners

  • TECHNISCHE UNIVERSITAT DARMSTADTpenvoerder

Land(en)

Germany

Vergelijkbare projecten binnen European Research Council

ERC STG

MANUNKIND: Determinants and Dynamics of Collaborative Exploitation

This project aims to develop a game theoretic framework to analyze the psychological and strategic dynamics of collaborative exploitation, informing policies to combat modern slavery.

€ 1.497.749
ERC STG

Elucidating the phenotypic convergence of proliferation reduction under growth-induced pressure

The UnderPressure project aims to investigate how mechanical constraints from 3D crowding affect cell proliferation and signaling in various organisms, with potential applications in reducing cancer chemoresistance.

€ 1.498.280
ERC STG

Uncovering the mechanisms of action of an antiviral bacterium

This project aims to uncover the mechanisms behind Wolbachia's antiviral protection in insects and develop tools for studying symbiont gene function.

€ 1.500.000
ERC STG

The Ethics of Loneliness and Sociability

This project aims to develop a normative theory of loneliness by analyzing ethical responsibilities of individuals and societies to prevent and alleviate loneliness, establishing a new philosophical sub-field.

€ 1.025.860

Vergelijkbare projecten uit andere regelingen

ERC COG

Reconciling Classical and Modern (Deep) Machine Learning for Real-World Applications

APHELEIA aims to create robust, interpretable, and efficient machine learning models that require less data by integrating classical methods with modern deep learning, fostering interdisciplinary collaboration.

€ 1.999.375
ERC COG

Foundation models for molecular diagnostics - machine learning with biological ‘common sense’

FoundationDX aims to enhance molecular diagnostics by using self-supervised learning on diverse biomolecular data to accurately predict cancer subtypes and treatment outcomes without extensive labeled datasets.

€ 2.000.000