Modelling Text as a Living Object in Cross-Document Context
InterText establishes a comprehensive framework for intertextuality in NLP, enabling efficient cross-document understanding through novel data models and neural representations for diverse applications.
Project details
Introduction
Interpreting text in the context of other texts is hard: it requires understanding the fine-grained semantic relationships between documents, known as intertextual relationships. This ability is critical in many areas of human activity, including research, business, and journalism.
Challenges in Intertextuality
However, finding and interpreting intertextual relationships and tracing information across heterogeneous sources remain tedious manual tasks. Natural language processing (NLP) does not adequately support this work: mainstream NLP treats texts as static, isolated entities, and existing approaches to cross-document understanding focus on narrow use cases and lack a common theoretical foundation.
Data Limitations
Data is scarce and difficult to create, and the field lacks a principled framework for modelling intertextuality.
InterText Framework
InterText breaks new ground by proposing the first general framework for studying intertextuality in NLP. We instantiate our framework in three intertextuality types:
- Inline commentary
- Implicit linking
- Semantic versioning
We produce new datasets and generalizable models for each of them.
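To make the three types concrete, the toy sketch below pairs each with a hypothetical source/target example; the examples and names are illustrative and not taken from the project's datasets.

```python
# Toy illustration (hypothetical examples, not project data) of what each
# intertextuality type connects: a source span and the target it relates to.
intertextuality_examples = {
    "inline_commentary": (
        "Reviewer note: 'This claim needs a citation.'",
        "Paper sentence: 'Our method outperforms all baselines.'",
    ),
    "implicit_linking": (
        "News paragraph paraphrasing a study without citing it",
        "The original study it implicitly refers to",
    ),
    "semantic_versioning": (
        "A revised paragraph in version 2 of a document",
        "The corresponding paragraph in version 1",
    ),
}

for relation, (source, target) in intertextuality_examples.items():
    print(f"{relation}: {source!r} -> {target!r}")
```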
New Data Model
Rather than treating text as a sequence of words, we introduce a new data model that naturally reflects document structure and cross-document relationships. We use this data model to create novel, intertextuality-aware neural representations of text.
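A minimal sketch of what such a data model could look like, assuming a graph-style representation in which documents are trees of structural nodes and intertextual relationships are typed edges between nodes of different documents; the class and field names are illustrative, not the project's published schema.

```python
# Minimal sketch of a structure- and intertextuality-aware data model
# (illustrative names; not the project's published schema).
from dataclasses import dataclass, field


@dataclass
class Node:
    """A structural unit of a document: the document itself, a section,
    a paragraph, or a sentence."""
    node_id: str
    kind: str                 # e.g. "document", "section", "paragraph"
    text: str = ""
    children: list["Node"] = field(default_factory=list)


@dataclass
class CrossDocEdge:
    """A typed intertextual relationship between nodes of two documents."""
    source: str               # node_id in the referring document
    target: str               # node_id in the referred-to document
    relation: str             # e.g. "comments_on", "implicitly_links", "revises"


# Two tiny documents and one cross-document edge: a review paragraph
# commenting on a paragraph of the paper it reviews.
paper_paragraph = Node("paper/sec1/p1", "paragraph", "We fine-tune the encoder on ...")
review_paragraph = Node("review/p3", "paragraph", "The fine-tuning setup is unclear.")
edges = [CrossDocEdge(source="review/p3", target="paper/sec1/p1", relation="comments_on")]
```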
Synergies in Intertextuality
While prior work ignores the similarities between different types of intertextuality, we target their synergies, offering solutions that scale to a wide range of tasks and across domains.
Transfer Learning
To enable modular and efficient transfer learning, we propose new document-level adapter-based architectures.
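The project's document-level adapter designs are not spelled out here, so the sketch below only shows the standard bottleneck adapter that adapter-based transfer learning typically builds on: a small trainable residual module inserted into an otherwise frozen transformer layer (PyTorch; sizes and names are illustrative).

```python
# Minimal sketch of a bottleneck adapter: a small residual module whose few
# parameters are trained while the surrounding transformer stays frozen.
import torch
import torch.nn as nn


class BottleneckAdapter(nn.Module):
    def __init__(self, hidden_size: int = 768, bottleneck: int = 64):
        super().__init__()
        self.down = nn.Linear(hidden_size, bottleneck)   # project down
        self.up = nn.Linear(bottleneck, hidden_size)     # project back up
        self.act = nn.GELU()

    def forward(self, hidden_states: torch.Tensor) -> torch.Tensor:
        # The residual connection lets the frozen backbone's representation
        # pass through unchanged; the adapter learns a small task-specific shift.
        return hidden_states + self.up(self.act(self.down(hidden_states)))


# Example: adapting a batch of token representations (batch=2, seq=16, dim=768).
x = torch.randn(2, 16, 768)
adapter = BottleneckAdapter()
print(adapter(x).shape)  # torch.Size([2, 16, 768])
```

Because only the adapter parameters are updated per task, separate adapters (say, one for inline commentary and one for semantic versioning) can in principle be trained and swapped modularly on top of the same backbone.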
Case Studies
We investigate integrative properties of our framework in two case studies:
- Academic peer review
- Conspiracy theory debunking
Conclusion
InterText creates a solid research platform for intertextuality-aware NLP, which is crucial for managing today's dynamic, interconnected digital discourse.
Financial details & Timeline
Financial details
Grant amount | € 2,499,721
Total project budget | € 2,499,721
Timeline
Start date | 1 April 2023
End date | 31 March 2028
Grant year | 2023
Partners & Locations
Project partners
- TECHNISCHE UNIVERSITAT DARMSTADT (lead partner)
Country(ies)
Similar projects within the European Research Council
| Project | Scheme | Amount | Year |
|---|---|---|---|
| An Application for leveraging large-scale historical textbases | ERC Proof of... | € 150,000 | 2024 |
| Next-Generation Natural Language Generation | ERC Starting... | € 1,420,375 | 2022 |
| Geology of Texts, Genealogy of Concepts, Intellectual Ecosystems: Mapping the Indic and Tibetic Buddhist Text Corpora | ERC Synergy ... | € 9,902,166 | 2024 |
| Tensors and Neural Networks for Computational Creativity | ERC Consolid... | € 1,988,500 | 2024 |
| Natural Language Understanding for non-standard languages and dialects | ERC Consolid... | € 1,997,815 | 2022 |
An Application for leveraging large-scale historical textbases
HistText is a user-friendly application designed for large-scale data mining of historical texts, enabling scholars to extract insights from vast multilingual corpora using advanced machine learning techniques.
Next-Generation Natural Language Generation
This project aims to enhance natural language generation by integrating neural models with symbolic representations for better control, adaptability, and reliable evaluation across various applications.
Geology of Texts, Genealogy of Concepts, Intellectual Ecosystems: Mapping the Indic and Tibetic Buddhist Text Corpora
Intellexus aims to uncover the interdependent development of Indic and Tibetic Buddhist texts and ideas through innovative mapping and visualization methods, enhancing understanding of their cultural traditions.
Tensors and Neural Networks for Computational Creativity
This project aims to develop unsupervised language models using tensor constructs and advanced neural networks to enhance creativity in natural language generation.
Natural Language Understanding for non-standard languages and dialects
DIALECT aims to enhance Natural Language Understanding by developing algorithms that integrate dialectal variation and reduce bias in data and labels for fairer, more accurate language models.
Similar projects from other schemes
| Project | Scheme | Amount | Year |
|---|---|---|---|
| Intenties van tekst herkennen middels neurale netwerken | Mkb-innovati... | € 199,960 | 2019 |
| Project Hominis | Mkb-innovati... | € 20,000 | 2022 |
Intenties van tekst herkennen middels neurale netwerken
Maxwell Labs and Xomnia are developing an intelligent cognitive engine for natural language processing of European languages, aimed at improved intent recognition via neural networks.
Project Hominis
The project focuses on developing an ethical AI system for natural language processing that minimises bias and manages technical, economic, and regulatory risks.