Inference in High Dimensions: Light-speed Algorithms and Information Limits
The INF^2 project develops information-theoretically grounded methods for efficient high-dimensional inference in machine learning, aiming to reduce costs and enhance interpretability in applications like genome-wide studies.
Project details
Introduction
Extracting information from data is the key challenge of our time, and in many applications (e.g., genome-wide association studies, data compression, and virtual assistants such as ChatGPT) both the data and the machine learning model used to extract information are increasingly high-dimensional.
Challenges in Traditional Approaches
As traditional statistical theory is ill-equipped to face this explosion in the dimensionality of the problem, machine learning is now predominantly experimental. However, empirical approaches come with huge costs affordable only to large companies, and they lack interpretability, which is especially troublesome in medical applications.
Project Overview
To address these issues, the INF^2 project develops information-theoretically principled methods for high-dimensional inference in machine learning and data science. The key insight is that, via a “mean-field” approach, high-dimensional quantities are well approximated by low-dimensional ones, which can then be characterized exactly.
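As an illustration of this mean-field picture, consider a toy rank-one “spiked Wigner” model (our own illustrative choice, not necessarily the project's setting): the squared overlap between the top eigenvector of the observed matrix and the hidden signal, a high-dimensional quantity, concentrates on a single scalar given in closed form, 1 - 1/λ² once the signal strength λ exceeds 1. The sketch below compares the empirical overlap with this prediction.

```python
# A minimal sketch under an assumed toy model (rank-one spiked Wigner matrix);
# the project targets far richer settings. The squared overlap between the top
# eigenvector and the hidden signal concentrates on a scalar that mean-field
# theory gives in closed form: overlap^2 -> 1 - 1/lambda^2 whenever lambda > 1.
import numpy as np

def spiked_wigner_overlap(n=2000, lam=2.0, seed=0):
    rng = np.random.default_rng(seed)
    x = rng.choice([-1.0, 1.0], size=n)        # hidden signal, ||x||^2 = n
    W = rng.normal(size=(n, n))
    W = (W + W.T) / np.sqrt(2.0 * n)           # symmetric noise, entries ~ N(0, 1/n)
    Y = (lam / n) * np.outer(x, x) + W         # observed data matrix
    eigvals, eigvecs = np.linalg.eigh(Y)       # spectral method: take the top eigenvector
    v = eigvecs[:, -1]
    return float((v @ x) ** 2 / n)             # squared overlap, in [0, 1]

lam = 2.0
print(f"empirical overlap^2  : {spiked_wigner_overlap(lam=lam):.3f}")
print(f"mean-field prediction: {max(0.0, 1.0 - 1.0 / lam**2):.3f}")
```

In this toy model the spectral estimate carries no information below the threshold λ = 1, a simple instance of the fundamental limits discussed next.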
Goals of the Project
Leveraging this characterization, we will:
- Establish the fundamental limits of inference, i.e., the minimal amount of data necessary to solve the problem.
- Design efficient algorithms requiring only the minimal amount of data.
Practical Applications
The challenge we tackle is to apply this paradigm to practical settings, in which data are structured and heterogeneous (as in genome-wide association studies), and models consist of complex architectures tailored to applications (auto-encoders for data compression, and transformers for ChatGPT).
Theoretical Framework
Through a novel analysis of spectral methods, approximate message passing, and gradient descent, INF^2 builds a theoretical framework with conceptual impact and broad applicability in machine learning and information theory.
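To give the flavor of these tools, the sketch below runs an approximate message passing (AMP) iteration on the same assumed toy spiked model as above; the denoiser, initialization, and model are illustrative choices, not the project's actual algorithms. The point is that the n-dimensional iteration, thanks to its Onsager memory term, is tracked by a one-dimensional “state evolution” recursion whose fixed point predicts the achieved overlap.

```python
# A minimal AMP sketch on an assumed toy model (rank-one spiked Wigner with a
# +/-1 signal); all modeling choices here are illustrative assumptions.
import numpy as np

def amp_overlap(n=2000, lam=2.0, iters=15, seed=0):
    rng = np.random.default_rng(seed)
    x = rng.choice([-1.0, 1.0], size=n)                  # hidden +/-1 signal
    W = rng.normal(size=(n, n))
    W = (W + W.T) / np.sqrt(2.0 * n)                     # symmetric noise, entries ~ N(0, 1/n)
    Y = (lam / n) * np.outer(x, x) + W                   # observed data matrix

    f = lambda v: np.tanh(lam * v)                       # tanh denoiser suited to +/-1 entries (illustrative)
    v = x + rng.normal(size=n)                           # informative initialization (an assumption)
    f_prev = np.zeros(n)
    for _ in range(iters):
        fv = f(v)
        onsager = lam * np.mean(1.0 - fv ** 2)           # Onsager term: (1/n) * sum_i f'(v_i)
        v, f_prev = Y @ fv - onsager * f_prev, fv        # AMP step with memory correction
    return float(np.mean(f(v) * x))                      # empirical overlap with the signal

def state_evolution_overlap(lam=2.0, iters=15, mc=200_000, seed=1):
    # Scalar recursion for the effective SNR gamma_t of the AMP iterates:
    #   gamma_{t+1} = lam^2 * E[ tanh(gamma_t + sqrt(gamma_t) * Z) ],  Z ~ N(0, 1);
    # the predicted overlap is gamma_t / lam^2.
    Z = np.random.default_rng(seed).normal(size=mc)
    gamma = 1.0                                          # matches the informative init above
    for _ in range(iters):
        gamma = lam ** 2 * np.mean(np.tanh(gamma + np.sqrt(gamma) * Z))
    return float(gamma / lam ** 2)

print(f"AMP overlap               : {amp_overlap():.3f}")
print(f"state-evolution prediction: {state_evolution_overlap():.3f}")
```

Up to finite-size fluctuations, the empirical overlap and the scalar prediction should agree, which is what makes such mean-field characterizations usable as exact design tools.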
Real-World Impact
This framework is then brought to the real world via applications in genome-wide association studies. Broadly, our results enable the principled design of machine learning algorithms and models, drastically reducing costs and providing interpretable solutions.
Financial details & Timeline
Financial details
- Grant amount: € 1.662.400
- Total project budget: € 1.662.400
Timeline
- Start date: 1 October 2024
- End date: 30 September 2029
- Subsidy year: 2024
Partners & Locations
Project partners
- INSTITUTE OF SCIENCE AND TECHNOLOGY AUSTRIA (coordinator)
Country(ies)
- Austria
Comparable projects within the European Research Council
| Project | Scheme | Amount | Year |
| --- | --- | --- | --- |
| Scalable Learning for Reproducibility in High-Dimensional Biomedical Signal Processing: A Robust Data Science Framework | ERC Starting Grant | € 1.500.000 | 2022 |
| The missing mathematical story of Bayesian uncertainty quantification for big data | ERC Starting Grant | € 1.492.750 | 2022 |
| Reconciling Classical and Modern (Deep) Machine Learning for Real-World Applications | ERC Consolidator Grant | € 1.999.375 | 2023 |
| Provable Scalability for high-dimensional Bayesian Learning | ERC Starting Grant | € 1.488.673 | 2023 |
| Overcoming the curse of dimensionality through nonlinear stochastic algorithms | ERC Consolidator Grant | € 1.351.528 | 2023 |
Scalable Learning for Reproducibility in High-Dimensional Biomedical Signal Processing: A Robust Data Science Framework
ScReeningData aims to develop a scalable learning framework to enhance statistical robustness and reproducibility in high-dimensional data analysis, reducing false positives across scientific domains.
The missing mathematical story of Bayesian uncertainty quantification for big data
This project aims to enhance scalable Bayesian methods through theoretical insights, improving their accuracy and acceptance in real-world applications like medicine and cosmology.
Reconciling Classical and Modern (Deep) Machine Learning for Real-World Applications
APHELEIA aims to create robust, interpretable, and efficient machine learning models that require less data by integrating classical methods with modern deep learning, fostering interdisciplinary collaboration.
Provable Scalability for high-dimensional Bayesian Learning
This project develops a mathematical theory for scalable Bayesian learning methods, integrating computational and statistical insights to enhance algorithm efficiency and applicability in high-dimensional models.
Overcoming the curse of dimensionality through nonlinear stochastic algorithms
This project aims to develop algorithms that overcome the curse of dimensionality in high-dimensional function approximations for stochastic control, PDEs, and supervised learning, enhancing computational efficiency and understanding.