Understanding the Language of Life: Identifying and Characterizing the Language Units in Protein Sequences

This project aims to decipher the "language of life" by developing methods to identify protein vocabulary and functions, paving the way for advancements in health and disease treatment.

Grant
€ 1.982.800
2023

Project details

Introduction

Proteins play a key role in biological processes that govern and maintain life. Although they are three-dimensional entities, they can be represented in textual form as sequences of amino acids that largely determine their structures and functions.

Analogy with Natural Languages

By analogy with natural (human) languages, we can think of proteins as being written in a language, which we refer to in this proposal as the "language of life". Humans can read and understand natural languages, but we cannot yet understand the language of life. We do not even know its vocabulary, i.e., what its basic language units are (the analogue of words in human languages).

Current State of Research

Textual representation of proteins has enabled the application of natural language processing (NLP) techniques to the study of proteins, and breakthrough results have been achieved in various downstream tasks such as protein structure prediction. However, these efforts remain only at the "processing level" of the language of life.
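To make the idea of treating a protein as text concrete, here is a minimal sketch that splits amino-acid sequences into overlapping k-mers and counts them as stand-in "words". The k-mer choice, the function names `kmer_tokens` and `kmer_vocabulary`, and the toy sequences are all assumptions made for illustration; the proposal does not say how language units will be determined, and finding a better unit than fixed-length k-mers is precisely the open question.

```python
from collections import Counter
from typing import Iterable


def kmer_tokens(sequence: str, k: int = 3) -> list[str]:
    """Split a protein sequence into overlapping k-mers (hypothetical 'words')."""
    sequence = sequence.upper()
    # A sequence shorter than k yields an empty list.
    return [sequence[i:i + k] for i in range(len(sequence) - k + 1)]


def kmer_vocabulary(sequences: Iterable[str], k: int = 3) -> Counter:
    """Count how often each k-mer occurs across a collection of sequences."""
    counts: Counter = Counter()
    for seq in sequences:
        counts.update(kmer_tokens(seq, k))
    return counts


# Toy sequences invented for this example (not real proteins).
toy_sequences = ["MKTAYIAKQR", "MKTAYLAKQR"]
vocab = kmer_vocabulary(toy_sequences, k=3)

# Show the most frequent candidate units.
for token, count in vocab.most_common(5):
    print(token, count)
```

Fixed-length k-mers ignore biological boundaries entirely, which is one reason such tokenization stays at the "processing level" described above rather than revealing meaningful units.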

Project Goals

The main goal of this project is to go beyond the level of language processing and open new research horizons for understanding the language of life. Using my expertise in NLP and bioinformatics, I will pursue the following objectives:

  1. Develop innovative methods to determine the language units (i.e., the vocabulary) of the language of life.
  2. Identify the characteristics of this language as well as its variability among species.
  3. Develop novel methods to identify and characterize the functions of the language units.

Future Implications

This research will lay the foundation for a new field, molecular language understanding, which aims to develop methods for understanding the messages encoded in molecular sequences. The ultimate goal of this project is to decipher the language of life, which would have far-reaching consequences for our understanding of life and health and would inform the development of novel prevention, diagnosis, and treatment strategies for diseases.

Financial details & Timeline

Financial details

Grant amount: € 1.982.800
Total project budget: € 1.982.800

Timeline

Start date: 1 November 2023
End date: 31 October 2028
Grant year: 2023

Partners & Locations

Project partners

  • BOGAZICI UNIVERSITESI (coordinator)

Country(ies)

Türkiye

Similar projects within the European Research Council

ERC Consolid...

Proteome diversification in evolution

PROMISE aims to decode protein sequences and structures using AI to understand their interactions and evolution, ultimately transforming big data into actionable biological insights.

€ 1.952.762

ERC Starting...

An intelligent agent for general-purpose protein engineering

Develop an AI-driven system for efficient, user-defined protein engineering, enhancing sustainability and healthcare through continuous learning and explainable design.

€ 1.498.680

ERC Synergy ...

Mechanisms of co-translational assembly of multi-protein complexes

This project aims to uncover the mechanisms of co-translational protein complex assembly using advanced techniques to enhance understanding of protein biogenesis and its implications for health and disease.

€ 9.458.525

ERC Synergy ...

Dynamics of Protein–Ligand Interactions

The project aims to advance protein dynamics research by integrating time-resolved X-ray crystallography, NMR spectroscopy, and molecular simulations to elucidate molecular recognition processes at atomic resolution.

€ 8.721.625

ERC Starting...

Deciphering co-translational protein folding, assembly and quality control pathways, in health and disease

This project aims to elucidate co-translational protein folding and degradation mechanisms to understand misfolding diseases and improve therapeutic strategies.

€ 1.412.500

Similar projects from other funding schemes

EIC Pathfinder

Computation driven development of novel vivo-like-DNA-nanotransducers for biomolecules structure identification

This project aims to develop DNA-nanotransducers for real-time detection and analysis of conformational changes in biomolecules, enhancing understanding of molecular dynamics and aiding drug discovery.

€ 3.000.418