Evaluating and Programming Intelligent Chatbots for Any Language
EPICAL aims to enhance intelligent chatbots by integrating low-resource languages through innovative techniques in text generation, translation, and speech processing, promoting multilingual inclusivity.
Project details
Introduction
Intelligent chatbots (ICs) such as ChatGPT have revolutionized content generation, but only for a few languages such as English. Yet there are 7,099 languages currently spoken in the world. EPICAL will, for the first time, determine how to add new low-resource languages (LRLs) to ICs.
Project Advances
We will make six advances to revolutionize the capabilities of ICs, unifying different areas of research that are incorrectly studied separately:
- Hallucination-Free Text Generation: Determine how to generate hallucination-free text using ICs, and how to enter a virtuous cycle in which LRL text is created using cross-lingual knowledge from ICs. The generated text is quickly post-edited and then used for training, which improves the IC's representation of the LRL.
- Powerful Encoding and Language Adaptation: Develop more powerful encoding and language adaptation approaches that combine the benefits of fine-tuning and adapters, taking full advantage of linguistically related languages to model LRLs.
- Self-Assessment of LRL Capabilities: Enable ICs to reason about their own LRL capabilities and determine what they do and do not know.
- Unification of Machine Translation and ICs: Unify research on machine translation and ICs to obtain ICs that can translate into LRLs with state-of-the-art accuracy.
- High-Quality Speech Processing: Enable high-quality text-to-speech and automatic speech recognition for LRLs with ICs, unifying research on low-resource speech processing with research on LRL text processing.
- Novel Evaluation Methodology: Develop a novel evaluation methodology, including a robust method for automatically measuring factual hallucination.
Research Significance
My research group is well known for LRL research, in contrast to large commercial labs, which focus only on the top 200 languages. Our work is critical for a multilingual Europe that values the role of minority languages, culture, and heritage.
Broader Impact
Our innovations will benefit natural language processing beyond text generation and machine translation, and will strongly impact other areas of machine learning research suffering from data bottlenecks.
Financial details & Timeline
Financial details
Grant amount | €2,498,200 |
Total project budget | €2,498,200 |
Timeline
Start date | 1 March 2025 |
End date | 28 February 2030 |
Grant year | 2025 |
Partners & Locations
Project partners
- TECHNISCHE UNIVERSITAET MUENCHEN (coordinator)
Country: Germany
Similar projects within the European Research Council
Project | Description | Scheme | Amount | Year |
---|---|---|---|---|
MANUNKIND: Determinants and Dynamics of Collaborative Exploitation | This project aims to develop a game theoretic framework to analyze the psychological and strategic dynamics of collaborative exploitation, informing policies to combat modern slavery. | ERC STG | €1,497,749 | 2022 |
Elucidating the phenotypic convergence of proliferation reduction under growth-induced pressure | The UnderPressure project aims to investigate how mechanical constraints from 3D crowding affect cell proliferation and signaling in various organisms, with potential applications in reducing cancer chemoresistance. | ERC STG | €1,498,280 | 2022 |
Uncovering the mechanisms of action of an antiviral bacterium | This project aims to uncover the mechanisms behind Wolbachia's antiviral protection in insects and develop tools for studying symbiont gene function. | ERC STG | €1,500,000 | 2023 |
The Ethics of Loneliness and Sociability | This project aims to develop a normative theory of loneliness by analyzing ethical responsibilities of individuals and societies to prevent and alleviate loneliness, establishing a new philosophical sub-field. | ERC STG | €1,025,860 | 2023 |
Similar projects from other schemes
Project | Description | Scheme | Amount | Year |
---|---|---|---|---|
DEep COgnition Learning for LAnguage GEneration | This project aims to enhance NLP models by integrating machine learning, cognitive science, and structured memory to improve out-of-domain generalization and contextual understanding in language generation tasks. | ERC COG | €1,999,595 | 2023 |
A prototype system for obtaining and managing training data for multilingual learning | The project aims to empower less-resourced language communities to create parallel corpora for machine translation, enhancing language preservation and cultural heritage through an open-source prototype. | ERC POC | €150,000 | 2023 |
Controlling Large Language Models | Develop a framework to understand and control large language models, addressing biases and flaws to ensure safe and responsible AI adoption. | ERC STG | €1,500,000 | 2024 |
Educational Inequalities and Generative AI: A Focus on Language Diversity | ChatEQUITY explores the impact of generative AI on educational inequalities by developing a new assessment tool and analyzing access disparities among students and teachers in diverse linguistic contexts. | ERC COG | €2,195,500 | 2025 |