Controlling Large Language Models

Develop a framework to understand and control large language models, addressing biases and flaws to ensure safe and responsible AI adoption.

Grant

€ 1.500.000

2024

Project details

Introduction

Large language models (LMs) are quickly becoming the backbone of many artificial intelligence (AI) systems, achieving state-of-the-art results in many tasks and application domains. Despite the rapid progress in the field, AI systems suffer from multiple flaws inherited from the underlying LMs: biased behavior, out-of-date information, confabulations, flawed reasoning, and more.

Understanding and Controlling LMs

If we wish to control these systems, we must first understand how they work and develop mechanisms to intervene, update, and repair them. However, the black-box nature of LMs makes them largely inaccessible to such interventions. In this proposal, our overarching goal is to:

Develop a framework for elucidating the internal mechanisms in LMs and for controlling their behavior in an efficient, interpretable, and safe manner.

Objectives

To achieve this goal, we will work through four objectives:

1. Dissecting Internal Mechanisms: We will dissect the internal mechanisms of information storage and recall in LMs and develop ways to update and repair such information.
2. Illuminating Higher-Level Capabilities: We will illuminate the mechanisms of higher-level capabilities of LMs to perform reasoning and simulations. We will also repair problems stemming from alignment steps.
3. Investigating Training Processes: We will investigate how training processes of LMs affect their emergent mechanisms and develop methods for fine-grained control over the training process.
4. Establishing a Benchmark: We will establish a standard benchmark for mechanistic interpretability of LMs to consolidate disparate efforts in the community.

Expected Outcomes

Taken as a whole, we expect the proposed research to empower different stakeholders and ensure a safe, beneficial, and responsible adoption of LMs in AI technologies by our society.

Financial details & Timeline

Financial details

Grant amount	€ 1.500.000
Total project budget	€ 1.500.000

Timeline

Start date	1 November 2024
End date	31 October 2029
Grant year	2024

Partners & Locations

Project partners

TECHNION - ISRAEL INSTITUTE OF TECHNOLOGY (coordinator)

Countries

Israel

Similar projects within the European Research Council

Project	Summary	Scheme	Amount	Year
MANUNKIND: Determinants and Dynamics of Collaborative Exploitation	This project aims to develop a game theoretic framework to analyze the psychological and strategic dynamics of collaborative exploitation, informing policies to combat modern slavery.	ERC Starting Grant	€ 1.497.749	2022
Elucidating the phenotypic convergence of proliferation reduction under growth-induced pressure	The UnderPressure project aims to investigate how mechanical constraints from 3D crowding affect cell proliferation and signaling in various organisms, with potential applications in reducing cancer chemoresistance.	ERC Starting Grant	€ 1.498.280	2022
Uncovering the mechanisms of action of an antiviral bacterium	This project aims to uncover the mechanisms behind Wolbachia's antiviral protection in insects and develop tools for studying symbiont gene function.	ERC Starting Grant	€ 1.500.000	2023
The Ethics of Loneliness and Sociability	This project aims to develop a normative theory of loneliness by analyzing ethical responsibilities of individuals and societies to prevent and alleviate loneliness, establishing a new philosophical sub-field.	ERC Starting Grant	€ 1.025.860	2023

Similar projects from other schemes

Project	Summary	Scheme	Amount	Year
Reconciling Classical and Modern (Deep) Machine Learning for Real-World Applications	APHELEIA aims to create robust, interpretable, and efficient machine learning models that require less data by integrating classical methods with modern deep learning, fostering interdisciplinary collaboration.	ERC Consolidator Grant	€ 1.999.375	2023
DEep COgnition Learning for LAnguage GEneration	This project aims to enhance NLP models by integrating machine learning, cognitive science, and structured memory to improve out-of-domain generalization and contextual understanding in language generation tasks.	ERC Consolidator Grant	€ 1.999.595	2023
Control for Deep and Federated Learning	CoDeFeL aims to enhance machine learning methods through control theory, developing efficient ResNet architectures and federated learning techniques for applications in digital medicine and recommendations.	ERC Advanced Grant	€ 2.499.224	2024
