Graphs without Labels: Multimodal Structure Learning without Human Supervision

The project aims to enhance multimodal learning by using graph-based representations to capture semantic structures and relationships in diverse data, improving data efficiency and fairness in label-free learning.

Grant
€ 1.499.438
2024

Project details

Introduction

Multimodal learning focuses on training models with data in more than one modality, such as videos capturing visual and audio information or documents containing images and text. Current approaches use such data to train large-scale deep learning models without human supervision by sampling pairwise data, e.g., an image-text pair from a website, and training the network to distinguish matching from non-matching pairs in order to learn better representations.
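
For illustration, a minimal sketch of this pairwise matching objective as an InfoNCE-style contrastive loss over a batch of image-text pairs (the function name, temperature, and embedding dimensions are placeholder assumptions, not the project's or any specific model's implementation):

```python
import torch
import torch.nn.functional as F

def contrastive_matching_loss(image_emb: torch.Tensor,
                              text_emb: torch.Tensor,
                              temperature: float = 0.07) -> torch.Tensor:
    """InfoNCE-style loss: the i-th image and i-th text form a matching pair;
    all other combinations in the batch are treated as non-matching."""
    image_emb = F.normalize(image_emb, dim=-1)
    text_emb = F.normalize(text_emb, dim=-1)
    logits = image_emb @ text_emb.t() / temperature   # (B, B) similarity matrix
    targets = torch.arange(logits.size(0), device=logits.device)
    # Symmetric cross-entropy: match images to texts and texts to images.
    return 0.5 * (F.cross_entropy(logits, targets) +
                  F.cross_entropy(logits.t(), targets))

# Usage with placeholder embeddings (batch of 8, 512-dimensional):
loss = contrastive_matching_loss(torch.randn(8, 512), torch.randn(8, 512))
```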

Argument for Multimodal Learning

We argue that multimodal learning can do more: by combining information from different sources, multimodal models capture cross-modal semantic entities. As most multimodal documents are collections of connected modalities and topics, multimodal models should allow us to capture the inherent high-level topology of such data.

Project Goals

The goal of this project is to learn semantic structures from multimodal data, capturing long-range concepts and relations via multimodal and self-supervised learning without human annotation.

  1. We will represent this information in the form of a graph, treating latent semantic concepts as nodes and their connectivity as edges (a minimal construction sketch follows this list).
  2. Based on this structure, we will extend current unimodal approaches so that data from different modalities can be captured and processed in a single structure.
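
As an illustration of the first point, here is a minimal sketch of such a graph construction, assuming precomputed latent concept embeddings per modality and a simple cosine-similarity threshold for edges (the function name, threshold, and feature dimensions are illustrative assumptions, not the project's actual method):

```python
import torch
import torch.nn.functional as F

def build_multimodal_graph(concept_embs: dict[str, torch.Tensor],
                           sim_threshold: float = 0.5):
    """Nodes: latent concept embeddings from all modalities, stacked together.
    Edges: pairs of nodes whose cosine similarity exceeds a threshold."""
    # Stack node features from every modality into one shared node set.
    nodes = torch.cat(list(concept_embs.values()), dim=0)        # (N, D)
    modality = [m for m, e in concept_embs.items() for _ in range(e.size(0))]

    sim = F.normalize(nodes, dim=-1) @ F.normalize(nodes, dim=-1).t()
    src, dst = torch.nonzero(sim > sim_threshold, as_tuple=True)
    mask = src != dst                                            # drop self-loops
    edge_index = torch.stack([src[mask], dst[mask]], dim=0)      # (2, E)
    return nodes, edge_index, modality

# Usage with placeholder concept embeddings for two modalities:
nodes, edge_index, modality = build_multimodal_graph({
    "vision": torch.randn(5, 256),
    "text": torch.randn(4, 256),
})
```

The resulting node features and edge index can then be fed to standard graph neural network layers, which is how the single shared structure in the second point could be processed across modalities.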

Challenges and Opportunities

Finally, we will explore the challenges and opportunities of the proposed idea with respect to its impact on two main problems in machine learning:

  • Data-efficient learning
  • Fairness in label-free learning

Bridging Trends

By bridging the gap between these two parallel trends, multimodal supervision and graph-based representations, we combine their strengths in generating and processing topological data. This will not only allow us to build new applications and tools, but also open new ways of processing and understanding large-scale data that are currently out of reach.

Financial details & Timeline

Financial details

Grant amount: € 1.499.438
Total project budget: € 1.499.438

Timeline

Start date: 1-4-2024
End date: 31-3-2029
Grant year: 2024

Partners & Locations

Project partners

  • EBERHARD KARLS UNIVERSITAET TUEBINGEN (coordinator)
  • RHEINISCHE FRIEDRICH-WILHELMS-UNIVERSITAT BONN

Country(ies)

Germany

Similar projects within the European Research Council

ERC Starting Grant

Discovering and Analyzing Visual Structures

This project aims to assist experts in pattern analysis within unannotated images by developing interpretable visual structures, enhancing discovery in historical documents and Earth imagery.

€ 1.493.498
ERC Consolidator Grant

Universal Geometric Transfer Learning

Develop a universal framework for transfer learning in geometric 3D data to enhance analysis across tasks with minimal supervision and improve generalization in diverse applications.

€ 1.999.490
ERC Starting Grant

Omni-Supervised Learning for Dynamic Scene Understanding

This project aims to enhance dynamic scene understanding in autonomous vehicles by developing innovative machine learning models and methods for open-world object recognition from unlabeled video data.

€ 1.500.000
ERC Consolidator Grant

Reinventing Multiterminal Coding for Intelligent Machines

IONIAN aims to revolutionize cooperative perception in intelligent machines by developing a multiterminal coding paradigm that enhances data compression and communication for safer autonomous navigation.

€ 1.999.403
ERC Starting Grant

Reinventing the Theory of Machine Learning on Graphs

Project MALAGA aims to establish a foundational theory for Graph Machine Learning to enhance the performance and reliability of Graph Neural Networks across diverse domains like biology and social sciences.

€ 1.479.643