Modelling Text as a Living Object in Cross-Document Context
InterText establishes a comprehensive framework for intertextuality in NLP, enabling efficient cross-document understanding through novel data models and neural representations for diverse applications.
Project details
Introduction
Interpreting text in the context of other texts is hard: it requires understanding the fine-grained semantic relationships between documents, known as intertextual relationships. This ability is critical in many areas of human activity, including research, business, and journalism.
Challenges in Intertextuality
However, finding and interpreting intertextual relationships and tracing information across heterogeneous sources remain tedious manual tasks. Natural language processing (NLP) does not adequately support this work: mainstream NLP treats texts as static, isolated entities, and existing approaches to cross-document understanding focus on narrow use cases and lack a common theoretical foundation.
Data Limitations
Data is scarce and difficult to create, and the field lacks a principled framework for modelling intertextuality.
InterText Framework
InterText breaks new ground by proposing the first general framework for studying intertextuality in NLP. We instantiate our framework in three intertextuality types:
- Inline commentary
- Implicit linking
- Semantic versioning
We produce new datasets and generalizable models for each of them.
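To make the three types concrete, the toy sketch below pairs each with a hypothetical source/target example; the examples and names are illustrative and not taken from the project's datasets.

```python
# Toy illustration (hypothetical examples, not project data) of what each
# intertextuality type connects: a source span and the target it relates to.
intertextuality_examples = {
    "inline_commentary": (
        "Reviewer note: 'This claim needs a citation.'",
        "Paper sentence: 'Our method outperforms all baselines.'",
    ),
    "implicit_linking": (
        "News paragraph paraphrasing a study without citing it",
        "The original study it implicitly refers to",
    ),
    "semantic_versioning": (
        "A revised paragraph in version 2 of a document",
        "The corresponding paragraph in version 1",
    ),
}

for relation, (source, target) in intertextuality_examples.items():
    print(f"{relation}: {source!r} -> {target!r}")
```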
New Data Model
Rather than treating text as a sequence of words, we introduce a new data model that naturally reflects document structure and cross-document relationships. We use this data model to create novel, intertextuality-aware neural representations of text.
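A minimal sketch of what such a data model could look like, assuming a graph-style representation in which documents are trees of structural nodes and intertextual relationships are typed edges between nodes of different documents; the class and field names are illustrative, not the project's published schema.

```python
# Minimal sketch of a structure- and intertextuality-aware data model
# (illustrative names; not the project's published schema).
from dataclasses import dataclass, field


@dataclass
class Node:
    """A structural unit of a document: the document itself, a section,
    a paragraph, or a sentence."""
    node_id: str
    kind: str                 # e.g. "document", "section", "paragraph"
    text: str = ""
    children: list["Node"] = field(default_factory=list)


@dataclass
class CrossDocEdge:
    """A typed intertextual relationship between nodes of two documents."""
    source: str               # node_id in the referring document
    target: str               # node_id in the referred-to document
    relation: str             # e.g. "comments_on", "implicitly_links", "revises"


# Two tiny documents and one cross-document edge: a review paragraph
# commenting on a paragraph of the paper it reviews.
paper_paragraph = Node("paper/sec1/p1", "paragraph", "We fine-tune the encoder on ...")
review_paragraph = Node("review/p3", "paragraph", "The fine-tuning setup is unclear.")
edges = [CrossDocEdge(source="review/p3", target="paper/sec1/p1", relation="comments_on")]
```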
Synergies in Intertextuality
While prior work ignores the similarities between different types of intertextuality, we target their synergies, offering solutions that scale to a wide range of tasks and across domains.
Transfer Learning
To enable modular and efficient transfer learning, we propose new document-level adapter-based architectures.
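The project's document-level adapter designs are not spelled out here, so the sketch below only shows the standard bottleneck adapter that adapter-based transfer learning typically builds on: a small trainable residual module inserted into an otherwise frozen transformer layer (PyTorch; sizes and names are illustrative).

```python
# Minimal sketch of a bottleneck adapter: a small residual module whose few
# parameters are trained while the surrounding transformer stays frozen.
import torch
import torch.nn as nn


class BottleneckAdapter(nn.Module):
    def __init__(self, hidden_size: int = 768, bottleneck: int = 64):
        super().__init__()
        self.down = nn.Linear(hidden_size, bottleneck)   # project down
        self.up = nn.Linear(bottleneck, hidden_size)     # project back up
        self.act = nn.GELU()

    def forward(self, hidden_states: torch.Tensor) -> torch.Tensor:
        # The residual connection lets the frozen backbone's representation
        # pass through unchanged; the adapter learns a small task-specific shift.
        return hidden_states + self.up(self.act(self.down(hidden_states)))


# Example: adapting a batch of token representations (batch=2, seq=16, dim=768).
x = torch.randn(2, 16, 768)
adapter = BottleneckAdapter()
print(adapter(x).shape)  # torch.Size([2, 16, 768])
```

Because only the adapter parameters are updated per task, separate adapters (say, one for inline commentary and one for semantic versioning) can in principle be trained and swapped modularly on top of the same backbone.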
Case Studies
We investigate integrative properties of our framework in two case studies:
- Academic peer review
- Conspiracy theory debunking
Conclusion
InterText creates a solid research platform for intertextuality-aware NLP, which is crucial for managing today's dynamic, interconnected digital discourse.
Financial details & Timeline
Financial details
Grant amount | € 2,499,721
Total project budget | € 2,499,721
Timeline
Start date | 1 April 2023
End date | 31 March 2028
Grant year | 2023
Partners & Locations
Project partners
- TECHNISCHE UNIVERSITAT DARMSTADT (lead partner)
Country(ies)
Similar projects within the European Research Council
| Project | Scheme | Amount | Year |
|---|---|---|---|
| An Application for leveraging large-scale historical textbases | ERC Proof of... | € 150,000 | 2024 |
| Next-Generation Natural Language Generation | ERC Starting... | € 1,420,375 | 2022 |
| Geology of Texts, Genealogy of Concepts, Intellectual Ecosystems: Mapping the Indic and Tibetic Buddhist Text Corpora | ERC Synergy ... | € 9,902,166 | 2024 |
| Tensors and Neural Networks for Computational Creativity | ERC Consolid... | € 1,988,500 | 2024 |
| Natural Language Understanding for non-standard languages and dialects | ERC Consolid... | € 1,997,815 | 2022 |
An Application for leveraging large-scale historical textbases
HistText is a user-friendly application designed for large-scale data mining of historical texts, enabling scholars to extract insights from vast multilingual corpora using advanced machine learning techniques.
Next-Generation Natural Language Generation
This project aims to enhance natural language generation by integrating neural models with symbolic representations for better control, adaptability, and reliable evaluation across various applications.
Geology of Texts, Genealogy of Concepts, Intellectual Ecosystems: Mapping the Indic and Tibetic Buddhist Text Corpora
Intellexus aims to uncover the interdependent development of Indic and Tibetic Buddhist texts and ideas through innovative mapping and visualization methods, enhancing understanding of their cultural traditions.
Tensors and Neural Networks for Computational Creativity
This project aims to develop unsupervised language models using tensor constructs and advanced neural networks to enhance creativity in natural language generation.
Natural Language Understanding for non-standard languages and dialects
DIALECT aims to enhance Natural Language Understanding by developing algorithms that integrate dialectal variation and reduce bias in data and labels for fairer, more accurate language models.
Similar projects from other schemes
| Project | Scheme | Amount | Year |
|---|---|---|---|
| Intenties van tekst herkennen middels neurale netwerken | Mkb-innovati... | € 199,960 | 2019 |
| Project Hominis | Mkb-innovati... | € 20,000 | 2022 |
Intenties van tekst herkennen middels neurale netwerken
Maxwell Labs and Xomnia are developing an intelligent cognitive engine for natural language processing of European languages, aimed at improved intent recognition via neural networks.
Project Hominis
The project focuses on developing an ethical AI system for natural language processing that minimises bias and manages technical, economic, and regulatory risks.