Modelling Text as a Living Object in Cross-Document Context

InterText establishes a comprehensive framework for intertextuality in NLP, enabling efficient cross-document understanding through novel data models and neural representations for diverse applications.

Grant
€ 2.499.721
2023

Project details

Introduction

Interpreting text in the context of other texts is hard: it requires understanding the fine-grained semantic relationships between documents, known as intertextual relationships. This ability is critical in many areas of human activity, including research, business, and journalism.

Challenges in Intertextuality

However, finding and interpreting intertextual relationships and tracing information across heterogeneous sources remains a tedious manual task. Natural language processing (NLP) does not adequately support it: mainstream NLP treats texts as static, isolated entities, and existing approaches to cross-document understanding focus on narrow use cases and lack a common theoretical foundation.

Data Limitations

Data is scarce and difficult to create, and the field lacks a principled framework for modelling intertextuality.

InterText Framework

InterText breaks new ground by proposing the first general framework for studying intertextuality in NLP. We instantiate our framework in three intertextuality types:

  1. Inline commentary
  2. Implicit linking
  3. Semantic versioning

We produce new datasets and generalizable models for each of them.

New Data Model

Rather than treating text as a sequence of words, we introduce a new data model that naturally reflects document structure and cross-document relationships. We use this data model to create novel, intertextuality-aware neural representations of text.
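As a rough illustration of what such a data model could look like, the sketch below stores documents as trees of nodes and adds typed edges between nodes of different documents. The class names (`Node`, `TextGraph`, `RelationType`), the method names, and the example texts are assumptions made for this illustration; the project description does not specify the actual data model.

```python
from dataclasses import dataclass, field
from enum import Enum


class RelationType(Enum):
    # The three intertextuality types named in the project description.
    COMMENTARY = "inline commentary"
    LINK = "implicit linking"
    VERSION = "semantic versioning"


@dataclass(frozen=True)
class Node:
    doc_id: str
    node_id: str
    kind: str        # e.g. "document", "section", "paragraph"
    text: str = ""


@dataclass
class TextGraph:
    """Hypothetical container: document trees plus cross-document edges."""
    nodes: dict = field(default_factory=dict)      # (doc_id, node_id) -> Node
    children: dict = field(default_factory=dict)   # parent key -> child keys
    relations: list = field(default_factory=list)  # (source, target, type)

    def add_node(self, node, parent=None):
        key = (node.doc_id, node.node_id)
        self.nodes[key] = node
        self.children.setdefault(key, [])
        if parent is not None:
            self.children[parent].append(key)  # document-structure edge
        return key

    def relate(self, source, target, rel):
        if source not in self.nodes or target not in self.nodes:
            raise KeyError("both endpoints must be added before relating them")
        self.relations.append((source, target, rel))  # cross-document edge

    def related_to(self, key, rel=None):
        # Keys of all nodes that point at `key`, optionally filtered by type.
        return [s for (s, t, r) in self.relations
                if t == key and (rel is None or r is rel)]


# Made-up example: a review paragraph implicitly linked to a paper paragraph.
graph = TextGraph()
paper = graph.add_node(Node("paper", "root", "document"))
claim = graph.add_node(
    Node("paper", "p1", "paragraph", "Example claim."), parent=paper)
review = graph.add_node(Node("review", "root", "document"))
remark = graph.add_node(
    Node("review", "p1", "paragraph", "Example remark."), parent=review)
graph.relate(remark, claim, RelationType.LINK)
```

Keeping structure edges (`children`) separate from relation edges (`relations`) lets the same graph answer both "where does this paragraph sit in its document?" and "which passages elsewhere refer to it?".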

Synergies in Intertextuality

While prior work ignores similarities between different types of intertextuality, we target their synergies. Thus, we offer solutions that scale to a wide range of tasks and across domains.

Transfer Learning

To enable modular and efficient transfer learning, we propose new document-level adapter-based architectures.
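For readers unfamiliar with adapters, the following is a minimal pure-Python sketch of a generic residual bottleneck adapter, the building block usually meant by "adapter-based" transfer learning. It is not the project's architecture: how InterText makes adapters document-level is not described here, and the class name `BottleneckAdapter`, its parameters, and the initialisation scheme are assumptions chosen for illustration.

```python
import random


class BottleneckAdapter:
    """Generic residual bottleneck adapter: x + up(relu(down(x)))."""

    def __init__(self, hidden_size, bottleneck_size, seed=0):
        rng = random.Random(seed)
        # Down-projection: small random weights (assumed initialisation).
        self.down = [[rng.gauss(0.0, 0.02) for _ in range(hidden_size)]
                     for _ in range(bottleneck_size)]
        # Up-projection starts at zero, so the adapter is initially
        # an identity map and does not disturb the frozen base model.
        self.up = [[0.0] * bottleneck_size for _ in range(hidden_size)]

    def num_parameters(self):
        # Only these weights would be trained; the base model stays frozen.
        return sum(len(row) for row in self.down) + \
            sum(len(row) for row in self.up)

    def __call__(self, x):
        # Project down, apply ReLU, project up, then add the residual.
        z = [max(0.0, sum(w * xi for w, xi in zip(row, x)))
             for row in self.down]
        delta = [sum(w * zi for w, zi in zip(row, z)) for row in self.up]
        return [xi + di for xi, di in zip(x, delta)]


adapter = BottleneckAdapter(hidden_size=4, bottleneck_size=2)
hidden_state = [1.0, 2.0, 3.0, 4.0]
adapted = adapter(hidden_state)
```

The modularity comes from the small parameter count: one such adapter per task or intertextuality type can be trained and swapped while the underlying model is shared.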

Case Studies

We investigate integrative properties of our framework in two case studies:

  1. Academic peer review
  2. Conspiracy theory debunking

Conclusion

InterText creates a solid research platform for intertextuality-aware NLP, which is crucial for managing today's dynamic, interconnected digital discourse.

Financial details & Timeline

Financial details

Grant amount: € 2.499.721
Total project budget: € 2.499.721

Timeline

Start date: 1 April 2023
End date: 31 March 2028
Grant year: 2023

Partners & Locations

Project partners

  • TECHNISCHE UNIVERSITAT DARMSTADT (coordinator)

Country/Countries

Germany

Similar projects within European Research Council

ERC Proof of...

An Application for leveraging large-scale historical textbases

HistText is a user-friendly application designed for large-scale data mining of historical texts, enabling scholars to extract insights from vast multilingual corpora using advanced machine learning techniques.

€ 150.000
ERC Starting...

Next-Generation Natural Language Generation

This project aims to enhance natural language generation by integrating neural models with symbolic representations for better control, adaptability, and reliable evaluation across various applications.

€ 1.420.375
ERC Synergy ...

Geology of Texts, Genealogy of Concepts, Intellectual Ecosystems: Mapping the Indic and Tibetic Buddhist Text Corpora

Intellexus aims to uncover the interdependent development of Indic and Tibetic Buddhist texts and ideas through innovative mapping and visualization methods, enhancing understanding of their cultural traditions.

€ 9.902.166
ERC Consolid...

Tensors and Neural Networks for Computational Creativity

This project aims to develop unsupervised language models using tensor constructs and advanced neural networks to enhance creativity in natural language generation.

€ 1.988.500
ERC Consolid...

Natural Language Understanding for non-standard languages and dialects

DIALECT aims to enhance Natural Language Understanding by developing algorithms that integrate dialectal variation and reduce bias in data and labels for fairer, more accurate language models.

€ 1.997.815

Similar projects from other funding schemes

Mkb-innovati...

Recognising intents in text using neural networks

Maxwell Labs and Xomnia are developing an intelligent cognitive engine for natural language processing of European languages, aimed at improved intent recognition via neural networks.

€ 199.960
Mkb-innovati...

Project Hominis

The project focuses on developing an ethical AI system for natural language processing that minimises bias and manages technical, economic, and regulatory risks.

€ 20.000