Evaluating and Programming Intelligent Chatbots for Any Language

EPICAL aims to enhance intelligent chatbots by integrating low-resource languages through innovative techniques in text generation, translation, and speech processing, promoting multilingual inclusivity.

Grant
€ 2,498,200
2025

Project details

Introduction

Intelligent chatbots (ICs) such as ChatGPT have revolutionized content generation for a small number of languages, such as English. However, 7,099 languages are currently spoken in the world. EPICAL will, for the first time, determine how to add new low-resource languages (LRLs) to ICs.

Project Advances

We will make six advances to revolutionize the capabilities of ICs, unifying research areas that are currently, and wrongly, studied separately:

  1. Hallucination-Free Text Generation
    Determine how to generate hallucination-free text using ICs, and how to enter a virtuous cycle in which LRL text is created using the cross-lingual knowledge of ICs, quickly post-edited, and then used for further training, resulting in a better LRL representation in the IC.

  2. Powerful Encoding and Language Adaptation
    Develop more powerful encoding and language adaptation approaches that combine the benefits of fine-tuning and adapters, taking full advantage of linguistically related languages to model LRLs.

  3. Self-Assessment of LRL Capabilities
    Enable ICs to reason about their own LRL capabilities and determine what they know and do not know.

  4. Unification of Machine Translation and ICs
    Unify research on machine translation and ICs to obtain ICs that can translate to LRLs with state-of-the-art accuracy.

  5. High-Quality Speech Processing
    Enable high-quality text-to-speech and automatic speech recognition for LRLs with ICs. This will unify research on low-resource speech processing with research on LRL text processing.

  6. Novel Evaluation Methodology
    Develop a novel evaluation methodology, including a robust method for automatically measuring fact hallucination.

Research Significance

My research group is well known for LRL research, unlike large commercial labs, which focus only on the top 200 languages. Our work is critical for a multilingual Europe that values the role of minority languages, culture, and heritage.

Broader Impact

Our innovations will benefit natural language processing beyond text generation and machine translation, and will strongly impact other areas of machine learning research suffering from data bottlenecks.

Financial details & Timeline

Financial details

Grant amount: € 2,498,200
Total project budget: € 2,498,200

Timeline

Start date: 1 March 2025
End date: 28 February 2030
Grant year: 2025

Partners & Locations

Project partners

  • TECHNISCHE UNIVERSITAET MUENCHEN (coordinator)

Countries

Germany

Similar projects within the European Research Council

ERC STG

MANUNKIND: Determinants and Dynamics of Collaborative Exploitation

This project aims to develop a game-theoretic framework to analyze the psychological and strategic dynamics of collaborative exploitation, informing policies to combat modern slavery.

€ 1,497,749

ERC STG

Elucidating the phenotypic convergence of proliferation reduction under growth-induced pressure

The UnderPressure project aims to investigate how mechanical constraints from 3D crowding affect cell proliferation and signaling in various organisms, with potential applications in reducing cancer chemoresistance.

€ 1,498,280

ERC STG

Uncovering the mechanisms of action of an antiviral bacterium

This project aims to uncover the mechanisms behind Wolbachia's antiviral protection in insects and develop tools for studying symbiont gene function.

€ 1,500,000

ERC STG

The Ethics of Loneliness and Sociability

This project aims to develop a normative theory of loneliness by analyzing ethical responsibilities of individuals and societies to prevent and alleviate loneliness, establishing a new philosophical sub-field.

€ 1,025,860

Similar projects from other funding schemes

ERC COG

DEep COgnition Learning for LAnguage GEneration

This project aims to enhance NLP models by integrating machine learning, cognitive science, and structured memory to improve out-of-domain generalization and contextual understanding in language generation tasks.

€ 1,999,595

ERC POC

A prototype system for obtaining and managing training data for multilingual learning

The project aims to empower less-resourced language communities to create parallel corpora for machine translation, enhancing language preservation and cultural heritage through an open-source prototype.

€ 150,000

ERC STG

Controlling Large Language Models

This project aims to develop a framework to understand and control large language models, addressing their biases and flaws to ensure safe and responsible AI adoption.

€ 1,500,000

ERC COG

Educational Inequalities and Generative AI: A Focus on Language Diversity

ChatEQUITY explores the impact of generative AI on educational inequalities by developing a new assessment tool and analyzing access disparities among students and teachers in diverse linguistic contexts.

€ 2,195,500