An Application for leveraging large-scale historical textbases

HistText is a user-friendly application designed for large-scale data mining of historical texts, enabling scholars to extract insights from vast multilingual corpora using advanced machine learning techniques.

Grant
€ 150.000
2024

Project details

Introduction

HistText is an application developed to address the challenges of large-scale data mining in textual corpora, with a particular focus on historical documents. It was created within the ERC-funded ENP-China project, which studies the evolution of Chinese elites from the 19th century to 1949, and is the result of a collaboration between historians and computer scientists exploring machine learning applications for extensive text archives.

Features

Designed to manage databases containing billions of words across millions of multilingual documents, HistText offers a robust and versatile platform that streamlines the process of extracting and visualizing valuable insights. The application features:

  • A user-friendly interface
  • Advanced text analysis techniques
  • Powerful data visualization capabilities

It gives novice users a simplified way to run complex data queries and analyses, while also offering a comprehensive R library for expert users.

Challenges

The main challenge for the proof of concept is to turn HistText into a fully packaged, transferable tool that meets the specialized needs of scholars and institutions holding vast digital repositories.

Impact

With its focus on advanced text analysis and user accessibility, HistText stands as an invaluable resource not only for academics in the digital humanities but also for students and the general public.

Broader Applications

HistText has the potential to be integrated into a wide range of institutions, including:

  1. Libraries
  2. Digital content providers
  3. Other educational and research institutions

The platform is exceptionally well-suited for analyzing a wide range of text genres, including newspapers, periodicals, directories, and diaries, among others.

Conclusion

By offering a scalable, user-friendly, and methodologically rigorous tool, HistText aims to revolutionize how we approach large-scale textual analysis, providing a new pathway for understanding historical documents.

Financial details & Timeline

Financial details

Grant amount: € 150.000
Total project budget: € 150.000

Timeline

Start date: 1 September 2024
End date: 28 February 2026
Grant year: 2024

Partners & Locations

Project partners

  • UNIVERSITE D'AIX MARSEILLE (coordinator)

Country

France

