Martin Luther University Halle-Wittenberg Falk Schreiber From Big Data to Smart Knowledge Integrating Multimodal Biological Data and Modelling Metabolism 14/07/2014 1 Leibniz Institute IPK Gatersleben

Falk Schreiber - From big data to smart knowledge ‐ integrating multimodal biological data and modelling metabolism

May 10, 2015


Modern data acquisition methods in the life sciences allow the procurement of different types of data in increasing quantity, facilitating a comprehensive view of biological systems. As data is usually gathered and interpreted by separate domain scientists, it is hard to grasp multi‐domain properties and structures. Consequently there is a need for the integration, analysis, modelling, simulation, and visualisation of life science data from different sources and of different types.
This talk focuses on two aspects. Firstly, methods for the integration and visualisation of multimodal biological data are presented. These are based on two graphs, one representing the meta‐relations between biological data and the other the measurement combinations. Both graphs are linked and serve as different views of the integrated data, with navigation and exploration possibilities. Data can be combined and visualised in many ways, resulting in different views of the integrated biological data. Secondly, methods to reconstruct, simulate, and analyse detailed metabolic models are presented. We will focus on stoichiometric models and see how different types of data are used to gain new insights into metabolic processes, illustrated by an example of metabolism in plants.

First presented at the 2014 Winter School in Mathematical and Computational Biology http://bioinformatics.org.au/ws14/program/
Transcript
Page 1: Falk Schreiber - From big data to smart knowledge ‐ integrating multimodal biological data and modelling metabolism

Martin Luther University Halle-Wittenberg

Falk Schreiber

From Big Data to Smart Knowledge

Integrating Multimodal Biological Data and Modelling Metabolism

14/07/2014

Leibniz Institute IPK Gatersleben

Page 2: Falk Schreiber - From big data to smart knowledge ‐ integrating multimodal biological data and modelling metabolism

Observations

1.  A tidal wave of scientific data

Page 3: Falk Schreiber - From big data to smart knowledge ‐ integrating multimodal biological data and modelling metabolism

Observations

1.  A tidal wave of scientific data

Year    Time            Costs (million US$)
2003    13 years        2700
2007    a few months    1
2009    a few weeks     0.05
2014    a few days      0.001
~2017   cheaper to reproduce data than to store it

Page 4: Falk Schreiber - From big data to smart knowledge ‐ integrating multimodal biological data and modelling metabolism

Observations

1.  A tidal wave of scientific data
2.  From building blocks to complex systems

genes transcripts proteins metabolites

reductionist approach

Page 5: Falk Schreiber - From big data to smart knowledge ‐ integrating multimodal biological data and modelling metabolism

Observations

1.  A tidal wave of scientific data
2.  From building blocks to complex systems

reductionist approach
integrative approach

genes transcripts proteins metabolites

Page 6: Falk Schreiber - From big data to smart knowledge ‐ integrating multimodal biological data and modelling metabolism

Observations

1.  A tidal wave of scientific data
2.  From building blocks to complex systems
3.  Multi-domain data

Page 8: Falk Schreiber - From big data to smart knowledge ‐ integrating multimodal biological data and modelling metabolism

From Data to Knowledge – Outline of the Talk

Understanding metabolism

via modelling

Page 9: Falk Schreiber - From big data to smart knowledge ‐ integrating multimodal biological data and modelling metabolism

From Data to Knowledge – Outline of the Talk

Understanding metabolism

via modelling

Integrating and exploring multimodal biological data

Page 10: Falk Schreiber - From big data to smart knowledge ‐ integrating multimodal biological data and modelling metabolism

Metabolism

- Network of thousands of biochemical reactions
  - Enzyme-catalysed
  - Transporter-mediated
- Supports all biological activity
- Metabolic model = list of reactions + associated information

Source: http://www.genome.jp/kegg/
Source: Michael 1993
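The statement "metabolic model = list of reactions + associated information" can be made concrete with a small sketch. The three reactions, their names, and the attached enzyme/transporter labels below are hypothetical and chosen only for illustration; they are not taken from the talk. The sketch also shows how a stoichiometric matrix could be derived from such a list.

```python
# Hypothetical toy model: each reaction carries its stoichiometry
# (negative = consumed, positive = produced) plus free-form information.
MODEL = [
    {"id": "R1", "stoich": {"A": -1, "B": 1}, "info": {"enzyme": "E1"}},
    {"id": "R2", "stoich": {"B": -1, "C": 2}, "info": {"enzyme": "E2"}},
    {"id": "T1", "stoich": {"A": 1}, "info": {"transporter": "T_A"}},
]


def stoichiometric_matrix(model):
    """Return (metabolites, matrix) with one row per metabolite
    and one column per reaction, in the order of `model`."""
    metabolites = sorted({m for r in model for m in r["stoich"]})
    matrix = [[r["stoich"].get(m, 0) for r in model] for m in metabolites]
    return metabolites, matrix


metabolites, matrix = stoichiometric_matrix(MODEL)
for name, row in zip(metabolites, matrix):
    print(name, row)
```

Keeping the annotations next to the stoichiometry is one possible design; real reconstructions typically use exchange formats such as SBML instead of ad hoc dictionaries.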

Page 11: Falk Schreiber - From big data to smart knowledge ‐ integrating multimodal biological data and modelling metabolism

Metabolic Models

[Figure: modelling approaches arranged along two axes, size of model and level of detail]

Approaches shown: Topological analysis; Petri net (P/T) analysis; Flux balance analysis (FBA); Kinetic modelling; Petri net (SPN) analysis

Labels in the figure: network structure; + stoichiometry; + thermodynamics; + mass balance; + capacity constraints; + kinetic rate laws; + kinetic parameters; + metabolite concentrations; + stochastic rate laws

Page 13: Falk Schreiber - From big data to smart knowledge ‐ integrating multimodal biological data and modelling metabolism

Flux Balance Analysis

- Constraint-based stoichiometric modelling approach to predict and analyse the metabolic steady-state conversion rates (fluxes)
- Advantages
  - No kinetic parameters required
  - Quantitative predictions
  - Applicable to large systems
- Applications
  - Prediction of optimal metabolic yields and flux distributions
  - Prediction of phenotype/viability of knockout mutants
  - Prediction of pathway redundancies
  - And more

Page 14: Falk Schreiber - From big data to smart knowledge ‐ integrating multimodal biological data and modelling metabolism

Principles of Flux Balance Analysis
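The slide graphic is not part of this transcript. As a reminder of the principle, the standard textbook formulation of FBA as a linear programme is given here. The symbols are conventional notation and are assumptions, not taken from the slides: S is the stoichiometric matrix (metabolites by reactions), v the vector of fluxes, c the weights of the objective, and v^min, v^max the capacity bounds.

```latex
\begin{aligned}
\max_{v}\quad & c^{\top} v \\
\text{subject to}\quad & S\,v = 0 \\
& v^{\min}_j \le v_j \le v^{\max}_j \quad \text{for every reaction } j
\end{aligned}
```

The equality constraint expresses mass balance at steady state, and the inequalities express the capacity constraints.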

Page 15: Falk Schreiber - From big data to smart knowledge ‐ integrating multimodal biological data and modelling metabolism

Simulation

Oxygen level

Page 16: Falk Schreiber - From big data to smart knowledge ‐ integrating multimodal biological data and modelling metabolism

Objective Function

How to identify plausible physiological states?

Question: What are the biochemical production capabilities?
Objective: Maximise metabolite product

Question: What is the maximal growth rate and biomass yield?
Objective: Maximise growth rate

Question: How efficiently can metabolism channel metabolites through the network?
Objective: Minimise the Euclidean norm

Question: What is the tradeoff between biomass production and metabolite overproduction?
Objective: Maximise biomass production for a given metabolite production

Question: How energetically efficient can metabolism operate?
Objective: Minimise ATP production or minimise nutrient uptake
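How an objective selects one state among all steady states can be sketched on a toy network. Everything below is an assumption made for illustration: the five reactions, their capacity bounds, and the restriction to integer fluxes are invented, and an exhaustive search stands in for the linear programming solver that FBA tools actually use. The objective is "maximise growth rate", represented by the flux through a biomass reaction, and a knockout is simulated by setting a reaction's upper bound to zero.

```python
from itertools import product

# Hypothetical toy network (not from the talk):
#   R_up:  -> A            nutrient uptake
#   R_ab:  A -> B
#   R_ac:  A -> C
#   R_bio: B + C ->        biomass formation (the objective)
#   R_out: C ->            by-product export
REACTIONS = ["R_up", "R_ab", "R_ac", "R_bio", "R_out"]

# Stoichiometric matrix: rows are metabolites A, B, C; columns follow REACTIONS.
S = [
    [1, -1, -1, 0, 0],   # A
    [0, 1, 0, -1, 0],    # B
    [0, 0, 1, -1, -1],   # C
]

# Capacity constraints: 0 <= flux <= upper bound (illustrative values).
UPPER = {"R_up": 10, "R_ab": 4, "R_ac": 10, "R_bio": 10, "R_out": 10}


def is_steady_state(fluxes):
    """Mass balance: production equals consumption for every metabolite."""
    return all(sum(s * v for s, v in zip(row, fluxes)) == 0 for row in S)


def max_flux(target, upper):
    """Brute-force search over integer fluxes within the bounds.

    Returns the largest feasible flux through `target` and one flux
    distribution that achieves it (the optimum need not be unique).
    """
    idx = REACTIONS.index(target)
    best = None
    for fluxes in product(*(range(upper[r] + 1) for r in REACTIONS)):
        if is_steady_state(fluxes) and (best is None or fluxes[idx] > best[idx]):
            best = fluxes
    return best[idx], dict(zip(REACTIONS, best))


wild_type, _ = max_flux("R_bio", UPPER)
knockout, _ = max_flux("R_bio", dict(UPPER, R_ab=0))  # simulate deleting R_ab
print("wild type biomass flux:", wild_type)
print("R_ab knockout biomass flux:", knockout)
```

In this toy case the biomass flux is limited by the capacity of R_ab, and removing R_ab makes biomass formation impossible, which is the kind of viability prediction listed among the applications of FBA. Exhaustive search only works for tiny examples; it is used here to keep the sketch dependency-free.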

Page 17: Falk Schreiber - From big data to smart knowledge ‐ integrating multimodal biological data and modelling metabolism

History of FBA

Page 18: Falk Schreiber - From big data to smart knowledge ‐ integrating multimodal biological data and modelling metabolism

Software Tools and Pipelines for FBA

- CellNetAnalyzer (CNA): http://www.mpi-magdeburg.mpg.de/projects/cna/cna.html
- COBRA Toolbox: http://gcrg.ucsd.edu/downloads/COBRAToolbox
- FBA-SimVis: http://fbasimvis.ipk-gatersleben.de
- Thiele et al. A protocol for generating a high-quality genome-scale metabolic reconstruction. Nature Protocols, 5(1): 93–121, 2010.
- Grafahrend-Belau et al. Plant metabolic pathways: databases and pipeline for stoichiometric analysis. In Agrawal and Rakwal (Eds.), Seed development: omics technologies toward improvement of seed quality and crop yield, Springer, 345-366, 2012.

Page 19: Falk Schreiber - From big data to smart knowledge ‐ integrating multimodal biological data and modelling metabolism

FBA Model of Seed Metabolism in Hordeum vulgare

Grafahrend-Belau et al. Plant Physiology, 2009

Page 20: Falk Schreiber - From big data to smart knowledge ‐ integrating multimodal biological data and modelling metabolism

FBA Model of Seed Metabolism in Hordeum vulgare

Grafahrend-Belau et al. Plant Physiology, 2009

Size: 257 reactions, 234 metabolites
Pathways: Glyc, TCA, PPP, oxP, Ferm, Rubisco, AA, Starch, CW, and others

Page 21: Falk Schreiber - From big data to smart knowledge ‐ integrating multimodal biological data and modelling metabolism

Example of Model Application

- Imaging uncovers metabolic compartmentation
- Alanine synthesis mainly in central endosperm; the alanine gradient reflects the local oxygen state
- Modelling purpose: elucidate the role of alanine metabolism

Source of images: L. Borisjuk and H. Rolletschek, IPK

Melkus et al. Plant Biotechnology Journal, 2011
Rolletschek et al. Plant Cell, 2011

Page 22: Falk Schreiber - From big data to smart knowledge ‐ integrating multimodal biological data and modelling metabolism

Simulation of Region-specific Metabolism

[Figure panels] A: Central endosperm (hypoxic); B: Peripheral endosperm (aerobic)

Melkus et al. Plant Biotechnology Journal, 2011
Rolletschek et al. Plant Cell, 2011

Page 24: Falk Schreiber - From big data to smart knowledge ‐ integrating multimodal biological data and modelling metabolism

Obtaining Parameters

- Influx
  - Quantification from video data
- Relation of substances in the same area
  - Multimodal alignment (Scharfe et al. BMC Bioinformatics, 2010; Fester et al. GCB, 2009)
- Biomass accumulation
  - Quantification from image series (Hartmann et al. BMC Bioinformatics, 2011)

Page 25: Falk Schreiber - From big data to smart knowledge ‐ integrating multimodal biological data and modelling metabolism

Scaling up - Multi* and High Throughput Modelling

Page 26: Falk Schreiber - From big data to smart knowledge ‐ integrating multimodal biological data and modelling metabolism

Coupling of Organ-specific FBA Models

Page 27: Falk Schreiber - From big data to smart knowledge ‐ integrating multimodal biological data and modelling metabolism

Coupling of FBA and FSA Models

Müller et al. IEEE PMA, 2012
Grafahrend-Belau et al. Plant Physiology, 2013

Page 28: Falk Schreiber - From big data to smart knowledge ‐ integrating multimodal biological data and modelling metabolism

High Throughput Modelling

- Path2Models: a pipeline to compute draft models
- >140,000 kinetic, logical and constraint-based models

Le Novère et al. BMC Systems Biology, 2013

Page 30: Falk Schreiber - From big data to smart knowledge ‐ integrating multimodal biological data and modelling metabolism

From Data to Knowledge – Outline of the Talk

Understanding metabolism

via modelling

Integrating and exploring multimodal biological data

Page 31: Falk Schreiber - From big data to smart knowledge ‐ integrating multimodal biological data and modelling metabolism

Multi-domain Biological Data

Page 32: Falk Schreiber - From big data to smart knowledge ‐ integrating multimodal biological data and modelling metabolism

Data Domains

Page 33: Falk Schreiber - From big data to smart knowledge ‐ integrating multimodal biological data and modelling metabolism

Available Tools

Page 34: Falk Schreiber - From big data to smart knowledge ‐ integrating multimodal biological data and modelling metabolism

Data Integration – A Major Problem (Example: Networks)

Page 35: Falk Schreiber - From big data to smart knowledge ‐ integrating multimodal biological data and modelling metabolism

Data Integration – A Major Problem (Example: Networks)

- Bridge the abyss!

Page 36: Falk Schreiber - From big data to smart knowledge ‐ integrating multimodal biological data and modelling metabolism

IDMappers

- Many information resources can be utilized as IDMappers:
  - Web services, web sites (e.g. PICR, CRONOS, …)
  - Relational databases (e.g. STRING, PDD, …)
  - Flat files (e.g. KEGG, UniProt, …)
- Unified using the BridgeDb framework

Overview: Mehlhorn et al. TransID – the flexible identifier mapping service, 112-121 (Internat. Symp. Integrative Bioinformatics), 2013.

Page 37: Falk Schreiber - From big data to smart knowledge ‐ integrating multimodal biological data and modelling metabolism

The Data Linkage Graph

- Comprises a set of identifiers (nodes) and a set of identifier mappings (edges)
- Used to explore identifier interconnections
- Basis of the integration of biological networks
- Example: (TAIR) (UniProt) (EC number)
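A data linkage graph of this kind can be sketched as a plain adjacency structure. The identifiers below are placeholders, not real database entries, and the breadth-first search is only one possible way to explore identifier interconnections; the talk does not specify the algorithm used.

```python
from collections import deque

# Hypothetical identifier mappings (edges); the names are placeholders.
MAPPINGS = [
    ("TAIR:gene1", "UniProt:protA"),
    ("UniProt:protA", "EC:enzyme1"),
    ("TAIR:gene2", "UniProt:protB"),
]


def build_linkage_graph(mappings):
    """Undirected graph: identifiers are nodes, mappings are edges."""
    graph = {}
    for a, b in mappings:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph


def mapping_path(graph, start, goal):
    """Breadth-first search for a shortest chain of mappings, or None."""
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path
        for nxt in sorted(graph.get(path[-1], ())):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None


graph = build_linkage_graph(MAPPINGS)
print(mapping_path(graph, "TAIR:gene1", "EC:enzyme1"))
print(mapping_path(graph, "TAIR:gene2", "EC:enzyme1"))
```

The first query finds the chain gene to protein to enzyme; the second returns None because no mapping connects the two identifiers in this toy data.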

Page 38: Falk Schreiber - From big data to smart knowledge ‐ integrating multimodal biological data and modelling metabolism

The Integrated Graph

- Composed of biological networks and the inferred identifier mappings as mapping edges
- Mapping edges represent identifier connections in the data linkage graph
- Example

[Figure panels: Data linkage graph; Integrated graph]
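The construction of an integrated graph can be illustrated with a minimal sketch, under the assumption that a mapping edge is added whenever a node of one network is connected to a node of the other through a chain of identifier mappings. The two networks, the identifiers, and the function names are hypothetical and not taken from the talk.

```python
# Hypothetical input networks; node names are placeholder identifiers.
REGULATORY_EDGES = [("TAIR:gene2", "TAIR:gene1")]            # gene2 regulates gene1
METABOLIC_EDGES = [("cpd:X", "EC:enzyme1"), ("EC:enzyme1", "cpd:Y")]

# Identifier mappings, as they might come from a data linkage graph.
ID_MAPPINGS = [("TAIR:gene1", "UniProt:protA"), ("UniProt:protA", "EC:enzyme1")]


def connected_ids(start, mappings):
    """All identifiers reachable from `start` via identifier mappings."""
    neighbours = {}
    for a, b in mappings:
        neighbours.setdefault(a, set()).add(b)
        neighbours.setdefault(b, set()).add(a)
    seen, stack = {start}, [start]
    while stack:
        for nxt in neighbours.get(stack.pop(), ()):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen


def integrate(net_a, net_b, mappings):
    """Union of both networks plus inferred mapping edges between them."""
    nodes_a = {n for edge in net_a for n in edge}
    nodes_b = {n for edge in net_b for n in edge}
    mapping_edges = sorted(
        (a, b)
        for a in nodes_a
        for b in connected_ids(a, mappings) & nodes_b
        if a != b
    )
    edges = [(u, v, "network") for u, v in net_a + net_b]
    edges += [(a, b, "mapping") for a, b in mapping_edges]
    return edges


integrated = integrate(REGULATORY_EDGES, METABOLIC_EDGES, ID_MAPPINGS)
for edge in integrated:
    print(edge)
```

Here the regulatory and metabolic networks share no node, yet the mapping chain gene1 to protA to enzyme1 yields one mapping edge that links them.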

Page 39: Falk Schreiber - From big data to smart knowledge ‐ integrating multimodal biological data and modelling metabolism

Example

- Metabolic pathways: Glycolysis, Pyruvate metabolism from KEGG
- Gene regulatory network: Arabidopsis thaliana from Regulogs

Page 40: Falk Schreiber - From big data to smart knowledge ‐ integrating multimodal biological data and modelling metabolism

Example

Page 41: Falk Schreiber - From big data to smart knowledge ‐ integrating multimodal biological data and modelling metabolism

Available Tools

Page 42: Falk Schreiber - From big data to smart knowledge ‐ integrating multimodal biological data and modelling metabolism

Data, Mappings and Mapping Function

- Set of measurements
- Mappings with the object path functions, which derive the relevant metadata and any set of graph element attributes
- Basis: IDMappers

Rohn et al. Bioinformatics, 2011

Page 43: Falk Schreiber - From big data to smart knowledge ‐ integrating multimodal biological data and modelling metabolism

Example of Integrated Data http://www.vanted.org

- The ABC(DE)-model of Arabidopsis thaliana floral organ specification
- Determination of floral organ identity depends on the combinatorial expression of floral homeotic genes from different classes
- Integration of color-coded images, representing floral homeotic gene expression patterns, into the context of a regulatory network

Junker et al. Frontiers in Plant Science, 2012.

Page 44: Falk Schreiber - From big data to smart knowledge ‐ integrating multimodal biological data and modelling metabolism

Standards for Modelling and Simulation in SysBio

Page 46: Falk Schreiber - From big data to smart knowledge ‐ integrating multimodal biological data and modelling metabolism

Can You Understand This?

Page 47: Falk Schreiber - From big data to smart knowledge ‐ integrating multimodal biological data and modelling metabolism

Can You Understand This?

Stimulates? but ... what exactly?

Associates into?

Translocates?

Reciprocal stimulation?

Is degraded?

Stimulates gene transcription?

Page 48: Falk Schreiber - From big data to smart knowledge ‐ integrating multimodal biological data and modelling metabolism

Ambiguity in Conventional Representation

Page 49: Falk Schreiber - From big data to smart knowledge ‐ integrating multimodal biological data and modelling metabolism

Standardised Symbols are Important

Figure labels: Most English-speaking countries, Quebec, Iran, China, Israel, Singapore, Norway, Poland, USA and Canada

Page 50: Falk Schreiber - From big data to smart knowledge ‐ integrating multimodal biological data and modelling metabolism

What is SBGN?

- A way to unambiguously describe biochemical and cellular events in graphs
- Limited number of symbols (~30) → smooth learning curve
- Can graphically represent quantitative models and biochemical pathways at different levels of granularity
- Developed since 2006 by an interdisciplinary community, part of COMBINE
- Three languages
  - Process Descriptions → one state = one glyph
  - Entity Relationships → one entity = one glyph
  - Activity Flow → conceptual level

Page 51: Falk Schreiber - From big data to smart knowledge ‐ integrating multimodal biological data and modelling metabolism

Graph Trinity: Three Languages in One http://sbgn.org

Process Description maps: unambiguous, mechanistic, sequential, combinatorial explosion
Entity Relationships maps: unambiguous, mechanistic, non-sequential
Activity Flow maps: ambiguous, conceptual, sequential

Le Novère et al. Nature Biotechnology, 2009

Page 52: Falk Schreiber - From big data to smart knowledge ‐ integrating multimodal biological data and modelling metabolism

Graph Trinity: Three Languages in One

Process Description

Entity Relationships

Activity Flow

Page 53: Falk Schreiber - From big data to smart knowledge ‐ integrating multimodal biological data and modelling metabolism

Systems Biology Graphical Notation (SBGN)

Page 54: Falk Schreiber - From big data to smart knowledge ‐ integrating multimodal biological data and modelling metabolism

Working with SBGN http://www.sbgn-ed.org

- Verification: Czauderna et al. Bioinformatics, 2010
- Synthesis / bricks: Junker et al. Trends in Biotechnology, 2012
- Translation: Czauderna et al. BMC Bioinformatics, 2013
- Layout: Schreiber et al. BMC Bioinformatics, 2009; Dwyer et al. IEEE Transactions Visualization & Computer Graphics, 2008
- Data integration: Junker et al. Nature Protocols, 2012

Page 55: Falk Schreiber - From big data to smart knowledge ‐ integrating multimodal biological data and modelling metabolism

Modelling, Visual Analytics, Standards, Network Analysis

Optimise

Predict

visualise, explore, integrate, analyse, model present, understand simulate, predict

Page 56: Falk Schreiber - From big data to smart knowledge ‐ integrating multimodal biological data and modelling metabolism

Thank You

“We now have unprecedented ability to collect data about nature but there is now a crisis developing in biology, in that completely unstructured information does not enhance understanding. We need a framework to put all of this knowledge and data into - that is going to be the problem in biology. […] Driving toward that framework is really the big challenge.” Sydney Brenner