Inference in High Dimensions: Light-speed Algorithms and Information Limits
The INF^2 project develops information-theoretically grounded methods for efficient high-dimensional inference in machine learning, aiming to reduce costs and enhance interpretability in applications like genome-wide studies.
Project details
Introduction
Extracting information from data is the key challenge of our time, and in many applications (e.g., genome-wide association studies, data compression, and virtual assistants such as ChatGPT) both the data and the machine learning model used to extract information are increasingly high-dimensional.
Challenges in Traditional Approaches
As traditional statistical theory is ill-equipped to face this explosion in the dimensionality of the problem, machine learning is now predominantly experimental. However, empirical approaches come with huge costs affordable only to large companies, and they lack interpretability, which is especially troublesome in medical applications.
Project Overview
To address these issues, the INF^2 project develops information-theoretically principled methods for high-dimensional inference in machine learning and data science. The key insight is that, via a “mean-field” approach, high-dimensional quantities are well approximated by low-dimensional ones, which can then be characterized exactly.
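As an illustration of this mean-field picture, consider a toy rank-one “spiked Wigner” model (our own illustrative choice, not necessarily the project's setting): the squared overlap between the top eigenvector of the observed matrix and the hidden signal, a high-dimensional quantity, concentrates on a single scalar given in closed form, 1 - 1/λ² once the signal strength λ exceeds 1. The sketch below compares the empirical overlap with this prediction.

```python
# A minimal sketch under an assumed toy model (rank-one spiked Wigner matrix);
# the project targets far richer settings. The squared overlap between the top
# eigenvector and the hidden signal concentrates on a scalar that mean-field
# theory gives in closed form: overlap^2 -> 1 - 1/lambda^2 whenever lambda > 1.
import numpy as np

def spiked_wigner_overlap(n=2000, lam=2.0, seed=0):
    rng = np.random.default_rng(seed)
    x = rng.choice([-1.0, 1.0], size=n)        # hidden signal, ||x||^2 = n
    W = rng.normal(size=(n, n))
    W = (W + W.T) / np.sqrt(2.0 * n)           # symmetric noise, entries ~ N(0, 1/n)
    Y = (lam / n) * np.outer(x, x) + W         # observed data matrix
    eigvals, eigvecs = np.linalg.eigh(Y)       # spectral method: take the top eigenvector
    v = eigvecs[:, -1]
    return float((v @ x) ** 2 / n)             # squared overlap, in [0, 1]

lam = 2.0
print(f"empirical overlap^2  : {spiked_wigner_overlap(lam=lam):.3f}")
print(f"mean-field prediction: {max(0.0, 1.0 - 1.0 / lam**2):.3f}")
```

In this toy model the spectral estimate carries no information below the threshold λ = 1, a simple instance of the fundamental limits discussed next.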
Goals of the Project
Leveraging this characterization, we will:
- Establish the fundamental limits of inference, i.e., the minimal amount of data necessary to solve the problem.
- Design efficient algorithms requiring only the minimal amount of data.
Practical Applications
The challenge we tackle is to apply this paradigm to practical settings, in which data are structured and heterogeneous (as in genome-wide association studies), and models consist of complex architectures tailored to applications (auto-encoders for data compression, and transformers for ChatGPT).
Theoretical Framework
Through a novel analysis of spectral methods, approximate message passing, and gradient descent, INF^2 builds a theoretical framework with conceptual impact and broad applicability in machine learning and information theory.
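To give the flavor of these tools, the sketch below runs an approximate message passing (AMP) iteration on the same assumed toy spiked model as above; the denoiser, initialization, and model are illustrative choices, not the project's actual algorithms. The point is that the n-dimensional iteration, thanks to its Onsager memory term, is tracked by a one-dimensional “state evolution” recursion whose fixed point predicts the achieved overlap.

```python
# A minimal AMP sketch on an assumed toy model (rank-one spiked Wigner with a
# +/-1 signal); all modeling choices here are illustrative assumptions.
import numpy as np

def amp_overlap(n=2000, lam=2.0, iters=15, seed=0):
    rng = np.random.default_rng(seed)
    x = rng.choice([-1.0, 1.0], size=n)                  # hidden +/-1 signal
    W = rng.normal(size=(n, n))
    W = (W + W.T) / np.sqrt(2.0 * n)                     # symmetric noise, entries ~ N(0, 1/n)
    Y = (lam / n) * np.outer(x, x) + W                   # observed data matrix

    f = lambda v: np.tanh(lam * v)                       # tanh denoiser suited to +/-1 entries (illustrative)
    v = x + rng.normal(size=n)                           # informative initialization (an assumption)
    f_prev = np.zeros(n)
    for _ in range(iters):
        fv = f(v)
        onsager = lam * np.mean(1.0 - fv ** 2)           # Onsager term: (1/n) * sum_i f'(v_i)
        v, f_prev = Y @ fv - onsager * f_prev, fv        # AMP step with memory correction
    return float(np.mean(f(v) * x))                      # empirical overlap with the signal

def state_evolution_overlap(lam=2.0, iters=15, mc=200_000, seed=1):
    # Scalar recursion for the effective SNR gamma_t of the AMP iterates:
    #   gamma_{t+1} = lam^2 * E[ tanh(gamma_t + sqrt(gamma_t) * Z) ],  Z ~ N(0, 1);
    # the predicted overlap is gamma_t / lam^2.
    Z = np.random.default_rng(seed).normal(size=mc)
    gamma = 1.0                                          # matches the informative init above
    for _ in range(iters):
        gamma = lam ** 2 * np.mean(np.tanh(gamma + np.sqrt(gamma) * Z))
    return float(gamma / lam ** 2)

print(f"AMP overlap               : {amp_overlap():.3f}")
print(f"state-evolution prediction: {state_evolution_overlap():.3f}")
```

Up to finite-size fluctuations, the empirical overlap and the scalar prediction should agree, which is what makes such mean-field characterizations usable as exact design tools.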
Real-World Impact
This framework is then brought to the real world via applications in genome-wide association studies. Broadly, our results enable the principled design of machine learning algorithms and models, drastically reducing costs and providing interpretable solutions.
Financial details & Timeline
Financial details
- Grant amount: € 1.662.400
- Total project budget: € 1.662.400
Timeline
- Start date: 1 October 2024
- End date: 30 September 2029
- Subsidy year: 2024
Partners & Locations
Project partners
- INSTITUTE OF SCIENCE AND TECHNOLOGY AUSTRIA (coordinator)
Country(ies)
- Austria
Comparable projects within the European Research Council
| Project | Scheme | Amount | Year |
| --- | --- | --- | --- |
| Scalable Learning for Reproducibility in High-Dimensional Biomedical Signal Processing: A Robust Data Science Framework | ERC Starting Grant | € 1.500.000 | 2022 |
| The missing mathematical story of Bayesian uncertainty quantification for big data | ERC Starting Grant | € 1.492.750 | 2022 |
| Reconciling Classical and Modern (Deep) Machine Learning for Real-World Applications | ERC Consolidator Grant | € 1.999.375 | 2023 |
| Provable Scalability for high-dimensional Bayesian Learning | ERC Starting Grant | € 1.488.673 | 2023 |
| Overcoming the curse of dimensionality through nonlinear stochastic algorithms | ERC Consolidator Grant | € 1.351.528 | 2023 |
Scalable Learning for Reproducibility in High-Dimensional Biomedical Signal Processing: A Robust Data Science Framework
ScReeningData aims to develop a scalable learning framework to enhance statistical robustness and reproducibility in high-dimensional data analysis, reducing false positives across scientific domains.
The missing mathematical story of Bayesian uncertainty quantification for big data
This project aims to enhance scalable Bayesian methods through theoretical insights, improving their accuracy and acceptance in real-world applications like medicine and cosmology.
Reconciling Classical and Modern (Deep) Machine Learning for Real-World Applications
APHELEIA aims to create robust, interpretable, and efficient machine learning models that require less data by integrating classical methods with modern deep learning, fostering interdisciplinary collaboration.
Provable Scalability for high-dimensional Bayesian Learning
This project develops a mathematical theory for scalable Bayesian learning methods, integrating computational and statistical insights to enhance algorithm efficiency and applicability in high-dimensional models.
Overcoming the curse of dimensionality through nonlinear stochastic algorithms
This project aims to develop algorithms that overcome the curse of dimensionality in high-dimensional function approximations for stochastic control, PDEs, and supervised learning, enhancing computational efficiency and understanding.