Natural Language Understanding for non-standard languages and dialects
DIALECT aims to enhance Natural Language Understanding by developing algorithms that integrate dialectal variation and reduce bias in data and labels for fairer, more accurate language models.
Project details
Introduction
Dialects are ubiquitous and for many speakers are part of everyday life. They carry important social and communicative functions. Yet, dialects and non-standard languages in general are a blind spot in research on Natural Language Understanding (NLU). Despite recent breakthroughs, NLU still fails to take linguistic diversity into account. This lack of modeling language variation results in biased language models with high error rates on dialect data. This failure excludes millions of speakers today and prevents the development of future technology that can adapt to such users.
Need for a Paradigm Shift
To account for linguistic diversity, a paradigm shift is needed: away from data-hungry algorithms that learn passively from large datasets and single ground-truth labels, which are known to be biased.
To go beyond current learning practices, the key is to tackle variation at both ends: in the input data and in the labels.
Proposed Approach: DIALECT
With DIALECT, I propose an integrated approach to devise algorithms which aid transfer from rich variability in inputs, and interactive learning which integrates human uncertainty in labels. This will reduce the need for data and enable better adaptation and generalization.
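As a rough illustration of what "integrating human uncertainty in labels" can mean in practice, the sketch below contrasts a single majority-vote label with a soft label that keeps annotator disagreement as a probability distribution. This is not the project's method; the function names, the example votes, and the model output are all hypothetical and chosen only for illustration.

```python
import math
from collections import Counter


def majority_label(votes):
    """Collapse annotator votes into a single 'ground truth' label."""
    return Counter(votes).most_common(1)[0][0]


def soft_label(votes, classes):
    """Keep annotator disagreement as a probability distribution over classes."""
    counts = Counter(votes)
    return [counts[c] / len(votes) for c in classes]


def cross_entropy(target, predicted, eps=1e-12):
    """Cross-entropy between a target distribution and a predicted distribution."""
    return -sum(t * math.log(p + eps) for t, p in zip(target, predicted))


# Hypothetical annotations for one utterance: three of four annotators agree.
classes = ["positive", "negative"]
votes = ["positive", "positive", "negative", "positive"]

# Single-label target: the minority vote is discarded.
hard = [1.0 if c == majority_label(votes) else 0.0 for c in classes]
# Soft target: the disagreement is preserved as [0.75, 0.25].
soft = soft_label(votes, classes)

predicted = [0.75, 0.25]  # hypothetical model output
loss_hard = cross_entropy(hard, predicted)
loss_soft = cross_entropy(soft, predicted)
```

With the hard target, the model is pushed toward full confidence in "positive". With the soft target, a prediction that mirrors the annotators' split already minimizes the loss.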
Advances in Deep Learning
Advances in salient areas of deep learning research now make it possible to tackle this challenge. DIALECT's objectives are to devise:
a) New algorithms and insights to address extremely scarce data setups and biased labels;
b) Novel representations which integrate auxiliary sources of information, such as complementing text data with speech;
c) New datasets with conversational data in its most natural form.
Conclusion
By integrating dialectal variation into models able to learn from scarce data and biased labels, the foundations will be established for fairer and more accurate NLU to break down language and literary barriers. I am privileged to carry out this integration as I have contributed to research in top venues on both cross-lingual learning and learning from biased labels.
Financial details & Timeline
Financial details
Grant amount | € 1,997,815 |
Total project budget | € 1,997,815 |
Timeline
Start date | 1 October 2022 |
End date | 30 September 2027 |
Grant year | 2022 |
Partners & Locations
Project partners
- LUDWIG-MAXIMILIANS-UNIVERSITAET MUENCHEN (coordinator)
Similar projects within the European Research Council
Project | Summary | Scheme | Amount | Year |
---|---|---|---|---|
MANUNKIND: Determinants and Dynamics of Collaborative Exploitation | This project aims to develop a game theoretic framework to analyze the psychological and strategic dynamics of collaborative exploitation, informing policies to combat modern slavery. | ERC STG | € 1,497,749 | 2022 |
Elucidating the phenotypic convergence of proliferation reduction under growth-induced pressure | The UnderPressure project aims to investigate how mechanical constraints from 3D crowding affect cell proliferation and signaling in various organisms, with potential applications in reducing cancer chemoresistance. | ERC STG | € 1,498,280 | 2022 |
Uncovering the mechanisms of action of an antiviral bacterium | This project aims to uncover the mechanisms behind Wolbachia's antiviral protection in insects and develop tools for studying symbiont gene function. | ERC STG | € 1,500,000 | 2023 |
The Ethics of Loneliness and Sociability | This project aims to develop a normative theory of loneliness by analyzing ethical responsibilities of individuals and societies to prevent and alleviate loneliness, establishing a new philosophical sub-field. | ERC STG | € 1,025,860 | 2023 |
Similar projects from other schemes
Project | Summary | Scheme | Amount | Year |
---|---|---|---|---|
Next-Generation Natural Language Generation | This project aims to enhance natural language generation by integrating neural models with symbolic representations for better control, adaptability, and reliable evaluation across various applications. | ERC STG | € 1,420,375 | 2022 |
Algorithmic Bias Control in Deep learning | The project aims to develop a theory of algorithmic bias in deep learning to improve training efficiency and generalization performance for real-world applications. | ERC STG | € 1,500,000 | 2022 |
Dynamics-Aware Theory of Deep Learning | This project aims to create a robust theoretical framework for deep learning, enhancing understanding and practical tools to improve model performance and reduce complexity in various applications. | ERC STG | € 1,498,410 | 2022 |
Personalized and Subjective approaches to Natural Language Processing | PERSONAE aims to revolutionize NLP by developing personalizable language technologies that empower individuals to adapt subjective tasks like sentiment analysis and abusive language detection. | ERC STG | € 1,499,775 | 2024 |