Scalable Graph Algorithms for Bioinformatics using Structure, Parameterization and Dynamic Updates

This project aims to develop scalable exact graph algorithms for processing sequencing data, enhancing accuracy in RNA transcript discovery and genomic database indexing.

Subsidie
€ 1.999.868
2025

Projectdetails

Introduction

Sequencing technologies have developed to be cheap and accurate, leading to major breakthroughs, such as the complete sequence of a human genome, the creation of nationwide population gene banks, or the discovery of novel viruses. As the amount of data produced grows exponentially and their applications become more broad and complex, the community needs accurate computational methods that scale.

Algorithmic Challenges

At the core of many algorithmic methods for processing sequencing data is the basic primitive of finding a set of paths or walks in graphs of various nature. Under different formulations and objective functions, the resulting problems can be:

  1. NP-hard (e.g. flow decompositions)
  2. Polynomial-time (e.g. path covers)

These problems are impractical on large graphs. Thus, many practical tools prefer fast heuristics to exact algorithms. While these may be optimized for specific inputs, they may not be reliable or accurate in general, which is a highly relevant issue in, for example, medical and life-science research.

Project Objectives

This project will develop general methods to massively scale such exact graph algorithms. The approach will include:

  1. Novel Graph Structures:

    • Safety structures, e.g. sets of paths that can be quickly found to appear in all optimal solutions and thus simplify the problem.
    • Variation structures that limit the hardness of a problem only to graph areas rich in genetic variation.
  2. Modern Algorithmic Techniques:

    • Parameterizing polynomial algorithms to run in time linear in the graph size and superlinear only in a small parameter.
    • Dynamic algorithms that, as the input grows, update solutions based only on the new data.

Applications

We will apply these methods in two high-impact applications:

  1. Long-read RNA transcript discovery.
  2. Indexing massive and rapidly growing genomic databases.

Conclusion

This project paves the way for exact graph algorithms usable independently of the problem complexity or of the input size, applicable to real-world problems.

Financiële details & Tijdlijn

Financiële details

Subsidiebedrag€ 1.999.868
Totale projectbegroting€ 1.999.868

Tijdlijn

Startdatum1-9-2025
Einddatum31-8-2030
Subsidiejaar2025

Partners & Locaties

Projectpartners

  • HELSINGIN YLIOPISTOpenvoerder

Land(en)

Finland

Vergelijkbare projecten binnen European Research Council

ERC STG

MANUNKIND: Determinants and Dynamics of Collaborative Exploitation

This project aims to develop a game theoretic framework to analyze the psychological and strategic dynamics of collaborative exploitation, informing policies to combat modern slavery.

€ 1.497.749
ERC STG

Elucidating the phenotypic convergence of proliferation reduction under growth-induced pressure

The UnderPressure project aims to investigate how mechanical constraints from 3D crowding affect cell proliferation and signaling in various organisms, with potential applications in reducing cancer chemoresistance.

€ 1.498.280
ERC STG

Uncovering the mechanisms of action of an antiviral bacterium

This project aims to uncover the mechanisms behind Wolbachia's antiviral protection in insects and develop tools for studying symbiont gene function.

€ 1.500.000
ERC STG

The Ethics of Loneliness and Sociability

This project aims to develop a normative theory of loneliness by analyzing ethical responsibilities of individuals and societies to prevent and alleviate loneliness, establishing a new philosophical sub-field.

€ 1.025.860

Vergelijkbare projecten uit andere regelingen

ERC STG

Towards a New Theory of Optimal Dynamic Graph Algorithms

This project aims to systematically study and establish intrinsically optimal dynamic graph algorithms with constant update times, potentially revolutionizing the field and enhancing real-world applications.

€ 1.400.000
EIC Pathfinder

Processing-in-memory architectures and programming libraries for bioinformatics algorithms

This project aims to enhance genomics research by developing energy-efficient, cost-effective edge computing solutions using processing-in-memory technologies for high-throughput sequencing data analysis.

€ 1.966.665
ERC STG

Provable Scalability for high-dimensional Bayesian Learning

This project develops a mathematical theory for scalable Bayesian learning methods, integrating computational and statistical insights to enhance algorithm efficiency and applicability in high-dimensional models.

€ 1.488.673
ERC STG

Integrated Structural and Probabilistic Approaches for Biological and Epidemiological Systems

INSPIRE aims to develop a framework integrating structural, robust, and probabilistic methods to analyze and control uncertain biological and epidemiological systems for improved prediction and intervention.

€ 1.500.000