Inference in High Dimensions: Light-speed Algorithms and Information Limits

The INF^2 project develops information-theoretically grounded methods for efficient high-dimensional inference in machine learning, aiming to reduce costs and enhance interpretability in applications like genome-wide studies.

Grant
€ 1.662.400
2024

Project details

Introduction

Extracting information from data is the key challenge of our time, and in many applications (e.g., genome-wide association studies, data compression, and virtual assistants such as ChatGPT) both the data and the machine learning model used to extract information are increasingly high-dimensional.

Challenges in Traditional Approaches

Because traditional statistical theory is ill-equipped to handle this explosion in dimensionality, machine learning is now predominantly experimental. Empirical approaches, however, carry costs that only large companies can afford, and they lack interpretability, which is especially troublesome in medical applications.

Project Overview

To address these issues, the INF^2 project develops information-theoretically principled methods for high-dimensional inference in machine learning and data science. The key insight is that, via a “mean-field” approach, high-dimensional quantities are well approximated by low-dimensional ones, which can then be characterized exactly.
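
As an illustration of this kind of low-dimensional characterization, consider a standard textbook setting that is not taken from the project itself: estimating a rank-one signal from a noisy symmetric matrix (the spiked Wigner model) with a spectral method. In the high-dimensional limit, the squared overlap between the top eigenvector and the signal is known to concentrate on the scalar 1 - 1/snr^2 when snr > 1, and on 0 otherwise. The sketch below, in Python with NumPy, compares a simulation against that scalar prediction; the function names, the ±1 signal, and the parameter values are assumptions chosen for illustration.

```python
import numpy as np


def spiked_wigner_overlap(n, snr, seed=0):
    """Simulate Y = (snr / n) * x x^T + W / sqrt(n) and return the
    normalized squared overlap between the top eigenvector of Y and x.

    Hypothetical helper for illustration only.
    """
    rng = np.random.default_rng(seed)
    # Assumed signal: i.i.d. +/-1 entries, so ||x||^2 = n.
    x = rng.choice([-1.0, 1.0], size=n)
    # Symmetric Gaussian noise with unit off-diagonal variance.
    g = rng.standard_normal((n, n))
    w = (g + g.T) / np.sqrt(2.0)
    y = (snr / n) * np.outer(x, x) + w / np.sqrt(n)
    # eigh returns eigenvalues in ascending order, so the last
    # column is the eigenvector of the largest eigenvalue.
    _, eigvecs = np.linalg.eigh(y)
    v = eigvecs[:, -1]
    # v has unit norm, so this quantity lies in [0, 1].
    return float(np.dot(v, x) ** 2 / n)


def predicted_overlap(snr):
    """Scalar high-dimensional prediction for the squared overlap."""
    return max(0.0, 1.0 - 1.0 / snr**2)


if __name__ == "__main__":
    for snr in (1.5, 2.0, 3.0):
        simulated = spiked_wigner_overlap(800, snr)
        print(f"snr={snr}: simulated={simulated:.3f}, "
              f"predicted={predicted_overlap(snr):.3f}")
```

For instance, spiked_wigner_overlap(800, 2.0) should land near predicted_overlap(2.0), which equals 0.75, with fluctuations that shrink as n grows. A single scalar formula thus summarizes the behavior of an 800-dimensional estimator.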

Goals of the Project

Leveraging this characterization, we will:

  1. Establish the fundamental limits of inference, i.e., the minimal amount of data necessary to solve the problem.
  2. Design efficient algorithms requiring only the minimal amount of data.

Practical Applications

The challenge we tackle is to apply this paradigm to practical settings, in which data are structured and heterogeneous (as in genome-wide association studies), and models consist of complex architectures tailored to applications (auto-encoders for data compression, and transformers for ChatGPT).

Theoretical Framework

Through a novel analysis of spectral methods, approximate message passing, and gradient descent, INF^2 builds a theoretical framework with both conceptual impact and broad applicability in machine learning and information theory.

Real-World Impact

This framework is then brought to the real world via applications in genome-wide association studies. Broadly, our results enable the principled design of machine learning algorithms and models, drastically reducing costs and providing interpretable solutions.

Financial details & Timeline

Financial details

Grant amount: € 1.662.400
Total project budget: € 1.662.400

Timeline

Start date: 1 October 2024
End date: 30 September 2029
Grant year: 2024

Partners & Locations

Project partners

  • INSTITUTE OF SCIENCE AND TECHNOLOGY AUSTRIA (coordinator)

Country(ies)

Austria

Similar projects within the European Research Council

ERC Starting...

Scalable Learning for Reproducibility in High-Dimensional Biomedical Signal Processing: A Robust Data Science Framework

ScReeningData aims to develop a scalable learning framework to enhance statistical robustness and reproducibility in high-dimensional data analysis, reducing false positives across scientific domains.

€ 1.500.000
ERC Starting...

The missing mathematical story of Bayesian uncertainty quantification for big data

This project aims to enhance scalable Bayesian methods through theoretical insights, improving their accuracy and acceptance in real-world applications like medicine and cosmology.

€ 1.492.750
ERC Consolid...

Reconciling Classical and Modern (Deep) Machine Learning for Real-World Applications

APHELEIA aims to create robust, interpretable, and efficient machine learning models that require less data by integrating classical methods with modern deep learning, fostering interdisciplinary collaboration.

€ 1.999.375
ERC Starting...

Provable Scalability for high-dimensional Bayesian Learning

This project develops a mathematical theory for scalable Bayesian learning methods, integrating computational and statistical insights to enhance algorithm efficiency and applicability in high-dimensional models.

€ 1.488.673
ERC Consolid...

Overcoming the curse of dimensionality through nonlinear stochastic algorithms

This project aims to develop algorithms that overcome the curse of dimensionality in high-dimensional function approximations for stochastic control, PDEs, and supervised learning, enhancing computational efficiency and understanding.

€ 1.351.528