Natural Language Understanding for non-standard languages and dialects

DIALECT aims to enhance Natural Language Understanding by developing algorithms that integrate dialectal variation and reduce bias in data and labels for fairer, more accurate language models.

Grant
€ 1.997.815
2022

Project details

Introduction

Dialects are ubiquitous and, for many speakers, part of everyday life. They carry important social and communicative functions. Yet dialects, and non-standard languages in general, are a blind spot in research on Natural Language Understanding (NLU). Despite recent breakthroughs, NLU still fails to take linguistic diversity into account. Because language variation is not modeled, language models are biased and show high error rates on dialect data. This failure excludes millions of speakers today and prevents the development of future technology that can adapt to such users.

Need for a Paradigm Shift

To account for linguistic diversity, a paradigm shift is needed: away from data-hungry algorithms that learn passively from large data and single ground-truth labels, which are known to be biased. To go beyond current learning practices, the key is to tackle variation at both ends: in the input data and in the labels.

Proposed Approach: DIALECT

With DIALECT, I propose an integrated approach that devises algorithms to aid transfer from the rich variability in inputs, combined with interactive learning that integrates human uncertainty in labels. This will reduce the need for data and enable better adaptation and generalization.

Advances in Deep Learning

Advances in salient areas of deep learning research now make it possible to tackle this challenge. DIALECT's objectives are to devise:

a) New algorithms and insights to address extremely scarce data setups and biased labels;
b) Novel representations which integrate auxiliary sources of information, such as complementing text data with speech;
c) New datasets with conversational data in its most natural form.

Conclusion

Integrating dialectal variation into models able to learn from scarce data and biased labels will establish the foundations for fairer and more accurate NLU to break down language and literary barriers. I am well placed to carry out this integration, as I have contributed to research in top venues on both cross-lingual learning and learning from biased labels.

Financial details & Timeline

Financial details

Grant amount: € 1.997.815
Total project budget: € 1.997.815

Timeline

Start date: 1 October 2022
End date: 30 September 2027
Grant year: 2022

Partners & Locations

Project partners

  • LUDWIG-MAXIMILIANS-UNIVERSITAET MUENCHEN (coordinator)

Countries

Germany

Similar projects within European Research Council

ERC STG

MANUNKIND: Determinants and Dynamics of Collaborative Exploitation

This project aims to develop a game theoretic framework to analyze the psychological and strategic dynamics of collaborative exploitation, informing policies to combat modern slavery.

€ 1.497.749
ERC STG

Elucidating the phenotypic convergence of proliferation reduction under growth-induced pressure

The UnderPressure project aims to investigate how mechanical constraints from 3D crowding affect cell proliferation and signaling in various organisms, with potential applications in reducing cancer chemoresistance.

€ 1.498.280
ERC STG

Uncovering the mechanisms of action of an antiviral bacterium

This project aims to uncover the mechanisms behind Wolbachia's antiviral protection in insects and develop tools for studying symbiont gene function.

€ 1.500.000
ERC STG

The Ethics of Loneliness and Sociability

This project aims to develop a normative theory of loneliness by analyzing ethical responsibilities of individuals and societies to prevent and alleviate loneliness, establishing a new philosophical sub-field.

€ 1.025.860

Similar projects from other schemes

ERC STG

Next-Generation Natural Language Generation

This project aims to enhance natural language generation by integrating neural models with symbolic representations for better control, adaptability, and reliable evaluation across various applications.

€ 1.420.375
ERC STG

Algorithmic Bias Control in Deep learning

The project aims to develop a theory of algorithmic bias in deep learning to improve training efficiency and generalization performance for real-world applications.

€ 1.500.000
ERC STG

Dynamics-Aware Theory of Deep Learning

This project aims to create a robust theoretical framework for deep learning, enhancing understanding and practical tools to improve model performance and reduce complexity in various applications.

€ 1.498.410
ERC STG

Personalized and Subjective approaches to Natural Language Processing

PERSONAE aims to revolutionize NLP by developing personalizable language technologies that empower individuals to adapt subjective tasks like sentiment analysis and abusive language detection.

€ 1.499.775