Natural Language Understanding for non-standard languages and dialects

DIALECT aims to enhance Natural Language Understanding by developing algorithms that integrate dialectal variation and reduce bias in data and labels for fairer, more accurate language models.

Grant

€ 1,997,815

2022

Project details

Introduction

Dialects are ubiquitous, and for many speakers they are part of everyday life. They carry important social and communicative functions. Yet dialects, and non-standard languages in general, are a blind spot in research on Natural Language Understanding (NLU). Despite recent breakthroughs, NLU still fails to take linguistic diversity into account. Because language variation is not modeled, language models are biased and have high error rates on dialect data. This excludes millions of speakers today and prevents the development of future technology that can adapt to such users.

Need for a Paradigm Shift

To account for linguistic diversity, a paradigm shift is needed: away from data-hungry algorithms that learn passively from large data with single ground-truth labels, which are known to be biased.

To go past current learning practices, the key is to tackle variation at both ends: in the input data and in the labels.

Proposed Approach: DIALECT

With DIALECT, I propose an integrated approach to devise algorithms which aid transfer from rich variability in inputs, and interactive learning which integrates human uncertainty in labels. This will reduce the need for data and enable better adaptation and generalization.

Advances in Deep Learning

Advances in salient areas of deep learning research now make it possible to tackle this challenge. DIALECT's objectives are to devise:

a) New algorithms and insights to address extremely scarce data setups and biased labels;
b) Novel representations which integrate auxiliary sources of information, such as complementing text data with speech;
c) New datasets with conversational data in its most natural form.

Conclusion

By integrating dialectal variation into models able to learn from scarce data and biased labels, the foundations will be established for fairer and more accurate NLU to break down language and literary barriers. I am privileged to carry out this integration as I have contributed to research in top venues on both cross-lingual learning and learning from biased labels.

Financial details & Timeline

Financial details

Grant amount	€ 1,997,815
Total project budget	€ 1,997,815

Timeline

Start date	1 October 2022
End date	30 September 2027
Grant year	2022

Partners & Locations

Project partners

LUDWIG-MAXIMILIANS-UNIVERSITAET MUENCHEN (coordinator)

Country/countries

Germany

Similar projects within European Research Council

Project	Description	Scheme	Amount	Year
MANUNKIND: Determinants and Dynamics of Collaborative Exploitation	This project aims to develop a game theoretic framework to analyze the psychological and strategic dynamics of collaborative exploitation, informing policies to combat modern slavery.	ERC STG	€ 1,497,749	2022
Elucidating the phenotypic convergence of proliferation reduction under growth-induced pressure	The UnderPressure project aims to investigate how mechanical constraints from 3D crowding affect cell proliferation and signaling in various organisms, with potential applications in reducing cancer chemoresistance.	ERC STG	€ 1,498,280	2022
Uncovering the mechanisms of action of an antiviral bacterium	This project aims to uncover the mechanisms behind Wolbachia's antiviral protection in insects and develop tools for studying symbiont gene function.	ERC STG	€ 1,500,000	2023
The Ethics of Loneliness and Sociability	This project aims to develop a normative theory of loneliness by analyzing ethical responsibilities of individuals and societies to prevent and alleviate loneliness, establishing a new philosophical sub-field.	ERC STG	€ 1,025,860	2023

Similar projects from other schemes

Project	Description	Scheme	Amount	Year
Next-Generation Natural Language Generation	This project aims to enhance natural language generation by integrating neural models with symbolic representations for better control, adaptability, and reliable evaluation across various applications.	ERC STG	€ 1,420,375	2022
Algorithmic Bias Control in Deep learning	The project aims to develop a theory of algorithmic bias in deep learning to improve training efficiency and generalization performance for real-world applications.	ERC STG	€ 1,500,000	2022
Dynamics-Aware Theory of Deep Learning	This project aims to create a robust theoretical framework for deep learning, enhancing understanding and practical tools to improve model performance and reduce complexity in various applications.	ERC STG	€ 1,498,410	2022
Personalized and Subjective approaches to Natural Language Processing	PERSONAE aims to revolutionize NLP by developing personalizable language technologies that empower individuals to adapt subjective tasks like sentiment analysis and abusive language detection.	ERC STG	€ 1,499,775	2024
