Planetary-scale indexing of sequencing data

Develop a planetary genomic search engine to efficiently index and analyze vast DNA and RNA sequencing data, enabling groundbreaking biological discoveries and improved data accessibility.

Grant
€ 1.933.625
2023

Project details

Introduction

A vast amount of DNA and RNA sequencing data from across the planet is now available, and it continues to double every two years. However, analyzing this data efficiently is currently infeasible because of its size, which is measured in petabases. Many high-impact biological discoveries are within reach but are held back by the lack of fast search algorithms. I recently demonstrated this potential by discovering an order of magnitude more RNA virus species than previously known within all public RNA samples. A global index, i.e. a planetary genomic search engine, would unlock instant and inexpensive search within petabase-scale data.

Hypothesis

I hypothesize that I can create a searchable index for all public DNA and RNA sequencing data. Drawing on my expertise in algorithms and data structures for biological sequences, I plan to design efficient methods to assemble and compress all available sequencing data, and then to construct an external-memory index that supports versatile biological queries.

Potential Impact

A planetary sequencing data index will enable a myriad of bioinformatics analyses that are currently out of reach. I will demonstrate the utility of the index by:

  1. Constructing a database of human transcripts with novel disease associations.
  2. Discovering novel microbial species.
  3. Providing a search engine for environmental metagenomes.

The resulting unprecedented collection of assembled genomes and compressed reads will remove a major barrier to data accessibility, making access to sequencing data several orders of magnitude more efficient and transforming the scale of future bioinformatics analyses.

Financial details & Timeline

Financial details

Grant amount: € 1.933.625
Total project budget: € 1.933.625

Timeline

Start date: 1 September 2023
End date: 31 August 2028
Grant year: 2023

Partners & Locations

Project partners

  • INSTITUT PASTEUR (coordinator)

Country(ies)

France

Similar projects within the European Research Council

ERC STG

MANUNKIND: Determinants and Dynamics of Collaborative Exploitation

This project aims to develop a game theoretic framework to analyze the psychological and strategic dynamics of collaborative exploitation, informing policies to combat modern slavery.

€ 1.497.749
ERC STG

Elucidating the phenotypic convergence of proliferation reduction under growth-induced pressure

The UnderPressure project aims to investigate how mechanical constraints from 3D crowding affect cell proliferation and signaling in various organisms, with potential applications in reducing cancer chemoresistance.

€ 1.498.280
ERC STG

Uncovering the mechanisms of action of an antiviral bacterium

This project aims to uncover the mechanisms behind Wolbachia's antiviral protection in insects and develop tools for studying symbiont gene function.

€ 1.500.000
ERC STG

The Ethics of Loneliness and Sociability

This project aims to develop a normative theory of loneliness by analyzing ethical responsibilities of individuals and societies to prevent and alleviate loneliness, establishing a new philosophical sub-field.

€ 1.025.860

Similar projects from other funding schemes

ERC STG

Optical Sequencing inside Live Cells with Biointegrated Nanolasers

HYPERION aims to revolutionize intracellular biosensing by using plasmonic nanolasers for real-time detection of RNA, enhancing our understanding of molecular processes in living cells.

€ 1.577.695
EIC Pathfinder

Processing-in-memory architectures and programming libraries for bioinformatics algorithms

This project aims to enhance genomics research by developing energy-efficient, cost-effective edge computing solutions using processing-in-memory technologies for high-throughput sequencing data analysis.

€ 1.966.665
ERC POC

Linking genome variation with haplotype-resolved sequencing

The project aims to validate and scale the haplotagging technique for DNA sequencing, enhancing haplotype context while integrating with existing Illumina technology to improve disease detection.

€ 150.000
ERC ADG

The sequencing microscope - a path to look at the molecules of biology

This project aims to develop a novel technique that uses sequencing data to infer spatial information in tissues, enhancing our understanding of biological systems without advanced microscopy.

€ 2.500.000