Modeling and Associated Visualization Needs A Trilogy in Four Parts

Modeling and Associated Visualization Needs

Jan 15, 2016


刘辰哲SPIKE

Transcript
Page 1: Modeling and Associated Visualization Needs

Modeling and Associated Visualization Needs

A Trilogy in Four Parts

Page 2: Modeling and Associated Visualization Needs

The Acts: Not in Chronological Order

• Overview of the G2P cyberinfrastructure
• Systems biology models (bottom up)
– Viz needs: Multivariate Dynamics, Inner Space, & Sensitivity Analysis
• Ecophysiological models (top down)
– Viz needs: The same, plus Outer Space
• Statistical models (non-mechanistic)
– Viz needs: Help & fast!!

Page 3: Modeling and Associated Visualization Needs

Solving the G2P problem means developing a methodology…

…that lets one start with some species & trait that one knows very little about and end with the ability to quantitatively predict trait scores for target genotype/environment combinations.

[Figure: Tools lead from Ignorance to Prediction. Labeled steps: Acquire data; Elicit hypotheses; Testing; Build quantitative models.]

To work, such a methodology must be cyber-enabled

Page 4: Modeling and Associated Visualization Needs

[Figure: G2P cyberinfrastructure diagram. Labels: Seq data, Expression data, Metabolic data, Whole plant data, and Environment data (each with a "DI" block); Experiment; Modeling and Statistical Inference; Hypothesis; Visualization (twice); User inferred (twice); Super-user; Developer.]

Page 5: Modeling and Associated Visualization Needs

Systems Biology Models

Page 6: Modeling and Associated Visualization Needs

Modeling a single gene

M = amount of gene product at time t

Controlled by the amounts of upstream regulatory gene products

Some fraction of M degrades per unit time

Temperature

$$\text{Rate} = \frac{\text{Change in amount}}{\text{unit time}} = \frac{\text{Influx amount}}{\text{unit time}} - \frac{\text{Efflux amount}}{\text{unit time}}$$
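The influx-minus-efflux balance on this slide can be sketched numerically. The following is only a minimal illustration, assuming a constant influx and an efflux equal to a fixed fraction of M per unit time; the names `simulate_gene`, `influx`, and `decay_rate` and all numeric values are hypothetical and do not come from the slides.

```python
def simulate_gene(influx, decay_rate, m0=0.0, dt=0.01, t_end=10.0):
    """Forward-Euler integration of dM/dt = influx - decay_rate * M."""
    steps = int(round(t_end / dt))
    m = m0
    trajectory = [m]
    for _ in range(steps):
        dm_dt = influx - decay_rate * m   # rate = influx - efflux
        m += dt * dm_dt
        trajectory.append(m)
    return trajectory


# Hypothetical values: constant influx of 2 units per unit time,
# and half of M degrading per unit time.
traj = simulate_gene(influx=2.0, decay_rate=0.5)
steady_state = 2.0 / 0.5   # level at which influx equals efflux
print(traj[-1], steady_state)
```

In this simplified setting M climbs toward the level where influx and efflux balance, which is the "bathtub" behavior the later slides build on.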

Page 7: Modeling and Associated Visualization Needs

Linking multiple genes…

[Figure: two genes on DNA. Gene “A” (promoter region, RNAP, codons) is transcribed and translated (protein synthesis) into a transcription factor; gene “B” is shown with its own promoter region, RNAP, and codons.]

Page 8: Modeling and Associated Visualization Needs

A “Bathtub” Model

Transcription Factors modulate reading

Temperature modulates all rates

Other Gene Products affect degradation

Page 9: Modeling and Associated Visualization Needs

What is a “product”?
• RNAs: messenger (mRNA) & otherwise
• Some models do not distinguish mRNA & protein (e.g., when time scales are long)
• Some models individually represent mRNA, cytosolic protein, and nuclear protein
• Some models will separate products by tissue/organ (e.g., leaves, phloem, meristem)
• Many models include metabolites & protein complexes
• Basic equation is still the same (influx − efflux)

Page 10: Modeling and Associated Visualization Needs

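$$\text{Rate} = \frac{\text{Change in amount}}{\text{unit time}} = \frac{\text{Influx amount}}{\text{unit time}} - \frac{\text{Efflux amount}}{\text{unit time}}$$

[Figure: functional forms that can supply the influx and efflux terms of dM/dt. Hill function (activation versus input, ranging from 0 to 1, with repressor and enhancer curves); linear; constant fraction; Michaelis-Menten (M/(1+M), plotted for M from 0 to 3); mass action; etc.]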
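The rate laws named on this slide can be written as small functions. This is a sketch under stated assumptions: the parameter names `k`, `n`, `vmax`, and `km` and their default values are illustrative choices, not values taken from the slides.

```python
def hill_enhancer(x, k=1.0, n=2.0):
    """Hill activation: rises from 0 toward 1 as the input x grows."""
    return x**n / (k**n + x**n)


def hill_repressor(x, k=1.0, n=2.0):
    """Hill repression: falls from 1 toward 0 as the input x grows."""
    return k**n / (k**n + x**n)


def michaelis_menten(m, vmax=1.0, km=1.0):
    """Saturating rate; with vmax = km = 1 this is M / (1 + M)."""
    return vmax * m / (km + m)


def mass_action(rate_constant, a, b):
    """Rate proportional to the product of two reactant amounts."""
    return rate_constant * a * b


for m in (0.0, 1.0, 2.0, 3.0):
    print(m, hill_enhancer(m), hill_repressor(m), michaelis_menten(m))
```

Any of these can stand in for the influx or efflux term of dM/dt, which is why the basic equation keeps the same shape across models.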

Page 11: Modeling and Associated Visualization Needs

Temperature effects

Page 12: Modeling and Associated Visualization Needs

One form of temperature effect

(Abstracted from Ellgaard et al. 1999)

[Figure: two panels, at 32.0 °C and 39.5 °C, each showing the nucleus and the endoplasmic reticulum. Legend: folded protein packaged to go; bad protein (unreleased); chaperone (folds/QC).]

Page 13: Modeling and Associated Visualization Needs

Temperature effects

[Figure: functional forms feeding dM/dt. Hill function (activation versus input, ranging from 0 to 1, with repressor and enhancer curves); linear; mass action; Michaelis-Menten (M/(1+M), plotted for M from 0 to 3); constant fraction; etc. An additional element is labeled R.]

Page 14: Modeling and Associated Visualization Needs

A close up – the diurnal clock

[Figure: diurnal clock diagram (Barak et al., 2000; Locke et al., 2005) shown with 9 of the 13 equations of Locke et al., 2005. Legend for equation terms: Hill function; mass action; Michaelis-Menten; influx − efflux; mRNA translation; net transport into nucleus; environmental effect (light).]

Page 15: Modeling and Associated Visualization Needs

(S. Brady)

Page 16: Modeling and Associated Visualization Needs

[Figure labels: Plant growth & metabolism; Photosynthesis biochemistry; Soil conditions (water stress); Root development; Flowering time prediction.]

Page 17: Modeling and Associated Visualization Needs

Sensitivity Analysis & Sloppy Systems

[Figure: flowering-time gene network, adapted from various literature. Inputs: Light, Temperature. Pathways: photoperiod, autonomous, vernalization, gibberellin. Nodes: PHYB, CRY2, LHY, CCA1, TOC1, GI, CO, FPA, FVE, FCA, FLC, FT, SOC1, LFY, AP1, Flowering. Annotation: nearly non-functional in the Landsberg erecta strain (Z. Dong, 2003). In the final build, nodes carry sensitivity grades (letters such as B, C, C-, D, E, and "low").]

Each letter is a power of two in sensitivity

Page 18: Modeling and Associated Visualization Needs

Stiff & Sloppy Directions

[Figure: an elongated ellipse in the plane of Parameter 1 and Parameter 2, centered on the optimum goodness-of-fit, with a “stiff” direction and a “sloppy” direction marked.]

All parameter combinations inside this ellipse yield essentially identical goodness-of-fit values

Sloppy/Stiff ca. 1000

The “ellipses” may be “hyper-pancakes” with 15 to 30 sloppy directions. How can these be meaningfully visualized??
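One common way to locate stiff and sloppy directions is to examine the eigenvectors of the curvature (Hessian) of the goodness-of-fit surface at its optimum. The two-parameter sketch below assumes that approach; the Hessian entries are made-up numbers chosen only so that the axis ratio comes out near the "ca. 1000" quoted on the slide, and `symmetric_2x2_eigen` is a hypothetical helper.

```python
import math


def symmetric_2x2_eigen(a, b, d):
    """Eigenpairs of [[a, b], [b, d]], largest eigenvalue first."""
    mean = 0.5 * (a + d)
    radius = math.hypot(0.5 * (a - d), b)
    lam_big, lam_small = mean + radius, mean - radius
    # Angle of the eigenvector that belongs to the largest eigenvalue.
    theta = 0.5 * math.atan2(2.0 * b, a - d)
    v_big = (math.cos(theta), math.sin(theta))
    v_small = (-math.sin(theta), math.cos(theta))
    return (lam_big, v_big), (lam_small, v_small)


# Hypothetical Hessian: curvature 1e6 along a direction rotated 30 degrees
# from the Parameter 1 axis, and curvature 1 along the perpendicular.
angle = math.radians(30.0)
c, s = math.cos(angle), math.sin(angle)
h11 = 1.0e6 * c * c + 1.0 * s * s
h12 = (1.0e6 - 1.0) * c * s
h22 = 1.0e6 * s * s + 1.0 * c * c

(stiff_lam, stiff_dir), (sloppy_lam, sloppy_dir) = symmetric_2x2_eigen(h11, h12, h22)

# Ellipse axis lengths scale as 1 / sqrt(curvature), so the ratio of the
# sloppy axis to the stiff axis is sqrt(stiff_lam / sloppy_lam).
axis_ratio = math.sqrt(stiff_lam / sloppy_lam)
print(stiff_dir, sloppy_dir, axis_ratio)
```

With 15 to 30 sloppy directions the same decomposition still applies, but the result is a list of axes rather than a picture, which is the visualization question the slide raises.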

Page 19: Modeling and Associated Visualization Needs

Sloppy directions in a clock model

[Figure: cytosolic 'Y' protein (axis from 0.000 to 0.024) versus hours after sunset (0 to 120), comparing the Locke et al. model with the simplified model.]

GIGANTEA ?

71 parameters reduced to 46 parameters

Page 20: Modeling and Associated Visualization Needs

Ecophysiological Models…

• …come in three flavors
– Environmental physics models (1945 to present)
– Crop simulation models (1965 to present)
– Geochemical cycling models
    • Blend the characteristics of both of the above
    • Are more recent
• …are now poised to contribute to the G2P problem via a top-down approach

Page 21: Modeling and Associated Visualization Needs

What is the focus of models in Environmental Physics?

• Mimics conditions inside a uniform plant canopy;
• The typical setting is an agricultural field;
– Includes plant-related, edaphic (soil), and meteorological inputs;
• Based on physical principles;
– Conservation of matter and energy; convection, conduction;
– Some plant processes – gas exchange, photosynthesis, respiration
• Plant structure consists of leaves, stems, roots;
• Time horizon typically a few days with time steps on the order of minutes.
• Ergo plants often do not grow

Page 22: Modeling and Associated Visualization Needs

Environmental Physics Models: 1945-75

• 1D or Bulk approach;
• Big Leaf / Big Root submodels;
• Bucket soil submodels;
• Resistance analogs used for the atmospheric environment;
• Limited prediction of soil or canopy scalar variables;
• Many empirical relationships;
• Nebulous controlling variables (e.g., canopy resistance to vapor flux);
• Poor plant/environment feedback.

[Figure: Atmosphere above a Big Leaf and a Big Root in a Bucket of Soil. Variable labels: T_AIR, VPD, T_LEAF, T_SOIL, W, and a quantity subscripted LEAF.]

Page 23: Modeling and Associated Visualization Needs

Environmental Physics Models: 1975-90

[Figure: layered schematic with atmosphere layers, canopy layers (sunlit and shade), soil layers, and a rooting profile.]

• Multi-layer atmosphere, soil, and canopy;
• “Scaled leaf” approach within canopy layers;
• Relationships between photosynthesis, transpiration, and biophysics (e.g., stomatal action);
• Use finite difference methods to compute soil heat, water, and gas flows;
• Incorporate root density functions and soil physical properties.

T_AIR, VPD, CO2, wind speed profiles

T_CANOPY, VPD, CO2, canopy profiles

W, T_SOIL profiles
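The finite-difference treatment of soil heat flow mentioned on this slide can be illustrated with a one-dimensional explicit scheme. This is a generic textbook sketch, not the scheme of any particular model from the talk; the function name, the layer thickness, the thermal diffusivity, and the boundary conditions (fixed surface temperature, zero flux at the bottom) are all assumptions made for the example.

```python
def step_soil_temperature(temps, diffusivity, dz, dt, surface_temp):
    """One explicit finite-difference step of dT/dt = D * d2T/dz2."""
    r = diffusivity * dt / dz**2
    if r > 0.5:
        raise ValueError("explicit scheme unstable: need D*dt/dz^2 <= 0.5")
    new = temps[:]
    new[0] = surface_temp                 # prescribed surface boundary
    for i in range(1, len(temps) - 1):
        new[i] = temps[i] + r * (temps[i + 1] - 2.0 * temps[i] + temps[i - 1])
    new[-1] = new[-2]                     # zero-flux bottom boundary
    return new


# Hypothetical setup: ten 5 cm layers starting at 15 C, surface held at 25 C,
# diffusivity 5e-7 m^2/s, 10-minute time steps for one day (144 steps).
profile = [15.0] * 10
for _ in range(144):
    profile = step_soil_temperature(
        profile, diffusivity=5.0e-7, dz=0.05, dt=600.0, surface_temp=25.0
    )
print([round(t, 2) for t in profile])
```

The stability check reflects the usual constraint on explicit schemes, which is one reason these models run with time steps on the order of minutes.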

Page 24: Modeling and Associated Visualization Needs

What is a Crop Growth Model?

• Mimics one “average plant” at a field or smaller scale;
• The plant environment is an agricultural production setting;
– Includes cultural- and production-related I/O variables;
– Includes varietal, edaphic, and meteorological inputs;
• Based on physiological processes;
– Photosynthesis, respiration, transpiration, nutrient uptake, carbon partitioning, growth, and phenological development;
• Plant structure consists of leaves, stems, roots, & grain;
• Annual time horizon with daily or hourly time steps.

Page 25: Modeling and Associated Visualization Needs

What is the current status of Crop Growth Models?

• Skillful models can account for ca. 70% of yield variance;
• Ongoing work focuses on refinement and applications;
– Problems being researched include methods for estimating cultivar and soil characteristics on an operational scale;
• Model structures and approaches have matured;
• Recent physical theory may not be emphasized;
• Physical theory does not seem to improve predictions.

Interestingly, incorporating crop growth model components into physical models does not guarantee improved predictability either, even though physical scientists recognize knowledge of the plant as limiting.

Page 26: Modeling and Associated Visualization Needs

Special case: Geochemical cycling models

• Used to model “ecosystem services” and/or “land surface processes” inside general circulation models
• Blend of both kinds of models;
• Includes plant-related, edaphic, and meteorological inputs;
• Based on physical principles
– Conservation of matter and energy; convection, conduction;
– Some plant processes – gas exchange, photosynthesis, respiration
• Plant structure consists of leaves, stems, roots;
• Time horizon of years with time steps on the order of minutes (depends on spatial scale).

Page 27: Modeling and Associated Visualization Needs

Main points

• Neither current crop growth models nor environmental physics models adequately depict plant process control mechanisms;
• This accounts for the failure of models to mimic the plasticity of real plants across different environments;
• The information needed to remedy this situation is emerging from the genomic sciences;
• Incorporating this information requires a reorganization of crop models

Page 28: Modeling and Associated Visualization Needs

New Crop Growth Model Concept

[Figure labels: Physical Submodel (Energy, Water, N); Sensors; Control Submodel; [CPAI]; [KE60]; temperature (T) alongside other symbols lost in extraction.]

Page 29: Modeling and Associated Visualization Needs

Viz needs for ecophysiological models and G2P components

• Largely the same as for systems biology models – multivariate dynamics in spatially discrete plant parts

• Note that our “G2P solution” specifies predicting trait scores in non-constant environments.
– That most directly refers to the outdoors
– Therefore geographic variation must also be considered

Page 30: Modeling and Associated Visualization Needs

A hazy shade of winter…

• One frame of a movie comparing the standard deviation of flowering time for the Columbia strain of A. thaliana germinating on each day.

• Projected by the gene-based model of Wilczek et al., 2009.

• The standard deviation is over five years (left: 2004-2009, real data; right: 2094-2099, A1B climate scenario).

Page 31: Modeling and Associated Visualization Needs

Statistical genetic methods I

• Can be used to
– Predict phenotypes based on genotypes
– Locate regions of the genome likely to contain genes controlling particular phenotypes
• Can be used when
– Knowledge of gene mechanisms is lacking
• Big Caveat
– The mathematical form of the G2P relationship is just assumed to be linear
– … and the data & models elaborated until the job gets done to adequate accuracy

Page 32: Modeling and Associated Visualization Needs

Statistical genetic methods II

• Why does it work?
– Because there are sufficient regimes of near linearity buried in mechanistic network equations that general linear statistical models have levels of predictive skill useful for some purposes (e.g., crop breeding)
– Rest assured that there are limits to what should be expected of these models
• How does it work?

Page 33: Modeling and Associated Visualization Needs

What are genetic markers?

Position within gene

Aligned DNA sequences of 25 different genetic lines

Single nucleotide polymorphism (SNP)

(Data from the Purugganan Lab)

Page 34: Modeling and Associated Visualization Needs

Different sibling lines will have different marker combinations

The DNA sequence for line 1 has the same sequence as parent “B” at the location of marker “g17286”…

…but in line 8 the DNA matches parent “A” at that location

Page 35: Modeling and Associated Visualization Needs

Many different linear models

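Genome Wide Association

$$\text{Pheno}_m = \beta_1 X_{m,1} + \beta_2 X_{m,2} + \dots + \beta_{150000} X_{m,\text{last\_marker}}$$

Finding quantitative trait loci (QTL)

Find markers i, j, and k such that

$$\text{Pheno}_m = \beta_i X_{m,i} + \beta_{kj} X_{m,k} X_{m,j} + \text{other terms}$$

is a good fit, where

$$X_{m,n} = \begin{cases} 1 & \text{if marker } n \text{ is from parent A in line } m \\ 0 & \text{if marker } n \text{ is from parent B in line } m \end{cases}$$

etc….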
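A single-marker version of these linear models can be sketched as a least-squares fit repeated over every marker. The code below is a simplified illustration, assuming one marker term plus an intercept (the intercept is an assumption of this sketch); the genotype matrix and phenotype values are made-up toy numbers, not data from the talk, and the function names are hypothetical.

```python
def single_marker_fit(x, pheno):
    """Least-squares fit of pheno = intercept + effect * x; returns (effect, r2)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(pheno) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, pheno))
    syy = sum((yi - mean_y) ** 2 for yi in pheno)
    if sxx == 0.0 or syy == 0.0:
        return 0.0, 0.0   # marker or phenotype does not vary across lines
    return sxy / sxx, (sxy * sxy) / (sxx * syy)


def scan_markers(genotypes, pheno):
    """genotypes[m][n] is X_{m,n}: 1 if marker n in line m is from parent A, else 0."""
    n_markers = len(genotypes[0])
    return [
        single_marker_fit([row[n] for row in genotypes], pheno)
        for n in range(n_markers)
    ]


# Toy example: 6 lines, 3 markers; the phenotype depends only on marker 1.
genotypes = [[1, 0, 1], [0, 1, 1], [1, 1, 0], [0, 0, 0], [1, 1, 1], [0, 0, 1]]
pheno = [10.0, 12.0, 12.0, 10.0, 12.0, 10.0]

results = scan_markers(genotypes, pheno)
best = max(range(len(results)), key=lambda n: results[n][1])
print(results, best)
```

Plotting the fit statistic against marker position gives the kind of one-dimensional scan shown on the next slide; real analyses add further terms and significance testing.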

Page 36: Modeling and Associated Visualization Needs

What a QTL analysis output looks like. This is a “1d-scan”, i.e., $X_{m,j}$

(Buckler et al., Science, 2009)

Page 37: Modeling and Associated Visualization Needs

Two Stat Inf Viz Problems

• Higher order scans, e.g. $\beta_{k,j,l} X_{m,k} X_{m,j} X_{m,l}$
– Remember SNP numbers can be in the 150K to 3M range.
• eQTL viz problems
– Can be 30K phenotypes…
– …and higher order scans
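The scale of these problems follows from simple counting. The sketch below assumes that a scan of a given order examines every unordered combination of distinct markers, and uses the 150K marker and 30K phenotype figures quoted on the slide; `n_tests` is a hypothetical helper name.

```python
from math import comb


def n_tests(n_markers, order, n_phenotypes=1):
    """Distinct marker combinations of a given order, times the number of phenotypes."""
    return comb(n_markers, order) * n_phenotypes


one_d = n_tests(150_000, 1)                 # 1d scan, one phenotype
two_d = n_tests(150_000, 2)                 # all marker pairs
three_d = n_tests(150_000, 3)               # all marker triples
eqtl_one_d = n_tests(150_000, 1, 30_000)    # 1d scan for 30K expression phenotypes

print(one_d, two_d, three_d, eqtl_one_d)
```

Even a pairwise scan at the low end of the SNP range produces on the order of ten billion combinations, which is why both the computation and the display need help.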

Page 38: Modeling and Associated Visualization Needs

eQTL Analysis – Looking for Regulators

[Figure: two genes on DNA. Gene “A” (promoter region, RNAP, codons) yields, via protein synthesis, a transcription factor; gene “B” is shown with its own promoter region, RNAP, and codons.]

Let “Pheno” be the amount of mRNA (expression) produced by gene “B”. This could be different in lines that varied either in the promoter of “B” or in lines that had differences in the coding region of gene “A”. These are called “cis” and “trans” effects, respectively.
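In practice, a mapped eQTL is often labeled cis or trans by how close it lies to the gene whose expression it affects. The sketch below assumes that position-based shortcut rather than the mechanistic definition given on the slide; the function name, the coordinate arguments, and the 1 Mb window are hypothetical choices made for illustration.

```python
def classify_eqtl(gene_chrom, gene_pos, qtl_chrom, qtl_pos, window=1_000_000):
    """Label an eQTL 'cis' if it lies near the gene it affects, else 'trans'."""
    if gene_chrom == qtl_chrom and abs(gene_pos - qtl_pos) <= window:
        return "cis"
    return "trans"


# Hypothetical coordinates (chromosome label, base-pair position).
print(classify_eqtl("I", 5_200_000, "I", 5_450_000))    # nearby on the same chromosome
print(classify_eqtl("I", 5_200_000, "IV", 5_450_000))   # different chromosome
print(classify_eqtl("I", 5_200_000, "I", 25_000_000))   # same chromosome, far away
```

Under this convention, cis effects fall along the diagonal of a gene-position versus eQTL-position plot, and a locus affecting many distant genes shows up as a trans hotspot.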

Page 39: Modeling and Associated Visualization Needs

Massive eQTL Variation

75% of all genes have at least 1 eQTL

[Figure: position of eQTL for each of 15,771 genes arranged by physical order, plotted against chromosomes I to V. QTL effect legend: Bay +, Bay −. Annotated features: cis diagonal, trans hotspot. Credit: D. Kliebenstein.]

Page 40: Modeling and Associated Visualization Needs

eQTL Viz Problems…

How to plot interaction effects?

That is, $X_{m,j} X_{m,k}$, and a gazillion phenotypes

Page 41: Modeling and Associated Visualization Needs

Questions?

Virtual soybean simulations from Han et al. 2007