Planetary-scale indexing of sequencing data
Develop a planetary genomic search engine to efficiently index and analyze vast DNA and RNA sequencing data, enabling groundbreaking biological discoveries and improved data accessibility.
Project details
Introduction
Publicly available DNA and RNA sequencing data from across the planet now amounts to petabases and continues to double roughly every two years. At this scale, analyzing the data efficiently is currently infeasible. Many high-impact biological discoveries are within reach but are blocked by the lack of fast search algorithms. I recently demonstrated this potential by searching all public RNA sequencing samples and discovering an order of magnitude more RNA virus species than were previously known. A global index, i.e. a planetary genomic search engine, would make searching petabase-scale data near-instant and inexpensive.
Hypothesis
I hypothesize that it is possible to build a searchable index of all public DNA and RNA sequencing data. Drawing on my unique expertise in algorithms and data structures for biological sequences, I plan to design efficient methods to assemble and compress all available sequencing data, and then to construct an external-memory index that supports versatile biological queries.
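To make the idea of a searchable sequence index more concrete, the sketch below shows one common building block of sequence search: an inverted index from k-mers (substrings of length k) to the samples that contain them. It is a toy, in-memory illustration under stated assumptions, not the project's method, which the description above does not specify. The names `kmers` and `KmerIndex`, the value k = 5, the 80% match threshold, and the sample IDs are all hypothetical. Reverse complements are ignored, and an index over petabase-scale data would instead need compressed, external-memory structures, ideally built over assembled rather than raw sequences.

```python
from collections import defaultdict


def kmers(seq: str, k: int):
    """Yield every overlapping substring of length k (k-mer) of seq, upper-cased."""
    seq = seq.upper()
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


class KmerIndex:
    """Hypothetical toy index: maps each k-mer to the set of sample IDs containing it.

    Assumption: everything fits in memory and reverse complements are not indexed.
    """

    def __init__(self, k: int = 5):
        self.k = k
        self.postings = defaultdict(set)  # k-mer -> set of sample IDs

    def add_sample(self, sample_id: str, reads):
        """Index all k-mers of all reads belonging to one sample."""
        for read in reads:
            for km in kmers(read, self.k):
                self.postings[km].add(sample_id)

    def query(self, seq: str, min_fraction: float = 0.8):
        """Return sorted IDs of samples containing at least min_fraction
        of the query's distinct k-mers."""
        query_kmers = set(kmers(seq, self.k))
        if not query_kmers:
            return []  # query shorter than k: nothing to look up
        hits = defaultdict(int)  # sample ID -> number of shared k-mers
        for km in query_kmers:
            # .get() does not insert missing keys into the defaultdict
            for sample_id in self.postings.get(km, ()):
                hits[sample_id] += 1
        return sorted(
            sid for sid, n in hits.items() if n / len(query_kmers) >= min_fraction
        )


if __name__ == "__main__":
    index = KmerIndex(k=5)
    index.add_sample("sample_A", ["ACGTACGTGACC", "TTGACCAGT"])
    index.add_sample("sample_B", ["GGGGCCCCAAAA"])
    print(index.query("ACGTACGTG"))  # prints ['sample_A']
```

The match threshold lets a query tolerate some k-mers that are absent from a sample, for instance because of sequencing errors or natural variation, at the cost of occasional spurious hits when the threshold is set low.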
Potential Impact
A planetary sequencing data index will enable a myriad of bioinformatics analyses that are currently out of reach. I will demonstrate the utility of the index by:
- Constructing a database of human transcripts with novel disease associations.
- Discovering novel microbial species.
- Providing a search engine for environmental metagenomes.
The resulting unprecedented collection of assembled genomes and compressed reads will remove a major barrier to data accessibility, making access several orders of magnitude more efficient and greatly expanding the scale of future bioinformatics analyses.
Financial details & timeline
Financial details
Grant amount | € 1,933,625 |
Total project budget | € 1,933,625 |
Timeline
Start date | 1 September 2023 |
End date | 31 August 2028 |
Grant year | 2023 |
Partners & locations
Project partners
- INSTITUT PASTEUR (coordinator)
Countries
Similar projects within the European Research Council
Project | Description | Scheme | Amount | Year |
---|---|---|---|---|
MANUNKIND: Determinants and Dynamics of Collaborative Exploitation | This project aims to develop a game-theoretic framework to analyze the psychological and strategic dynamics of collaborative exploitation, informing policies to combat modern slavery. | ERC STG | € 1,497,749 | 2022 |
Elucidating the phenotypic convergence of proliferation reduction under growth-induced pressure | The UnderPressure project aims to investigate how mechanical constraints from 3D crowding affect cell proliferation and signaling in various organisms, with potential applications in reducing cancer chemoresistance. | ERC STG | € 1,498,280 | 2022 |
Uncovering the mechanisms of action of an antiviral bacterium | This project aims to uncover the mechanisms behind Wolbachia's antiviral protection in insects and develop tools for studying symbiont gene function. | ERC STG | € 1,500,000 | 2023 |
The Ethics of Loneliness and Sociability | This project aims to develop a normative theory of loneliness by analyzing ethical responsibilities of individuals and societies to prevent and alleviate loneliness, establishing a new philosophical sub-field. | ERC STG | € 1,025,860 | 2023 |
Similar projects from other funding schemes
Project | Description | Scheme | Amount | Year |
---|---|---|---|---|
Optical Sequencing inside Live Cells with Biointegrated Nanolasers | HYPERION aims to revolutionize intracellular biosensing by using plasmonic nanolasers for real-time detection of RNA, enhancing our understanding of molecular processes in living cells. | ERC STG | € 1,577,695 | 2022 |
Processing-in-memory architectures and programming libraries for bioinformatics algorithms | This project aims to enhance genomics research by developing energy-efficient, cost-effective edge computing solutions using processing-in-memory technologies for high-throughput sequencing data analysis. | EIC Pathfinder | € 1,966,665 | 2022 |
Linking genome variation with haplotype-resolved sequencing | The project aims to validate and scale the haplotagging technique for DNA sequencing, enhancing haplotype context while integrating with existing Illumina technology to improve disease detection. | ERC POC | € 150,000 | 2022 |
The sequencing microscope - a path to look at the molecules of biology | This project aims to develop a novel technique that uses sequencing data to infer spatial information in tissues, enhancing our understanding of biological systems without advanced microscopy. | ERC ADG | € 2,500,000 | 2024 |