Modeling and Associated Visualization Needs
Post on 15-Jan-2016
A Trilogy in Four Parts
The Acts: Not in Chronological Order
• Overview of the G2P cyberinfrastructure
• Systems biology models (bottom up)
  – Viz needs: Multivariate Dynamics, Inner Space, & Sensitivity Analysis
• Ecophysiological models (top down)
  – Viz needs: The same, plus Outer Space
• Statistical models (non-mechanistic)
  – Viz needs: Help & fast!!
Solving the G2P problem means developing a methodology…
…that lets one start with some species & trait that one knows very little about and end with the ability to quantitatively predict trait scores for target genotype/environment combinations.
[Diagram: the path from Ignorance to Prediction, enabled by Tools. Steps shown: acquire data, elicit hypotheses, testing, build quantitative models.]
To work, such a methodology must be cyber-enabled
[Diagram: cyberinfrastructure workflow. Elements shown: sequence, expression, metabolic, whole-plant, and environment data, each with a DI component; Experiment; Modeling and Statistical Inference; Visualization; user-inferred Hypothesis; Super-user; Developer.]
Systems Biology Models
Modeling a single gene
M = amount of gene product at time t
• Influx: controlled by the amounts of upstream regulatory gene products
• Efflux: some fraction of M degrades per unit time
• Temperature

$$\text{Rate} = \frac{\text{Change in amount}}{\text{unit time}} = \frac{\text{Influx amount}}{\text{unit time}} - \frac{\text{Efflux amount}}{\text{unit time}}$$
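The influx-minus-efflux balance for a single gene can be sketched numerically. This is a minimal illustration only: the constant influx, the function name `simulate_gene_product`, and all parameter values are assumptions made for the example, not values from the talk.

```python
def simulate_gene_product(influx_rate, degradation_fraction, m0, dt, n_steps):
    """Forward-Euler integration of dM/dt = influx - degradation_fraction * M.

    influx_rate and degradation_fraction are hypothetical constants; in a real
    model the influx would depend on upstream regulatory gene products.
    """
    m = m0
    trajectory = [m]
    for _ in range(n_steps):
        dm_dt = influx_rate - degradation_fraction * m  # influx minus efflux
        m += dm_dt * dt
        trajectory.append(m)
    return trajectory


# With constant influx, M settles where influx equals efflux.
trajectory = simulate_gene_product(
    influx_rate=2.0, degradation_fraction=0.5, m0=0.0, dt=0.01, n_steps=5000
)
steady_state = 2.0 / 0.5
```

With a constant influx the amount M relaxes toward influx divided by the degradation fraction, which is the level at which the "bathtub" stops filling.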
Linking multiple genes…
[Diagram: gene "A" is transcribed by RNAP from DNA and translated (protein synthesis) into a transcription factor, which acts on the promoter region of gene "B", where RNAP reads the "B" gene codons.]
A “Bathtub” Model
Transcription Factors modulate reading
Temperature modulates all rates
Other Gene Products affect degradation
What is a “product”?
• RNAs: messenger (mRNA) & otherwise
• Some models do not distinguish mRNA & protein (e.g., when time scales are long)
• Some models individually represent mRNA, cytosolic protein, and nuclear protein
• Some models will separate products by tissue/organ (e.g., leaves, phloem, meristem)
• Many models include metabolites & protein complexes
• Basic equation is still the same (influx − efflux)
$$\text{Rate} = \frac{dM}{dt} = \frac{\text{Influx amount}}{\text{unit time}} - \frac{\text{Efflux amount}}{\text{unit time}}$$

[Figure: functional forms used for the terms of dM/dt. Panels: Hill function (activation, from 0 to 1, versus input, with repressor and enhancer curves); linear, i.e. constant fraction (M); Michaelis-Menten (M/(1+M)); mass action; etc.]
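The functional forms named on this slide can be written out as small functions. This is a sketch under assumptions: the parameter names (`k`, `n`, `vmax`, `km`, `rate`) and their default values are illustrative, and the exact parameterization used in any particular published model may differ.

```python
def hill_activation(x, k=1.0, n=2):
    """Hill function for an enhancer: rises from 0 toward 1 as input x grows."""
    return x**n / (k**n + x**n)


def hill_repression(x, k=1.0, n=2):
    """Hill function for a repressor: falls from 1 toward 0 as input x grows."""
    return k**n / (k**n + x**n)


def constant_fraction(m, rate=1.0):
    """Linear term: a constant fraction of M is lost per unit time."""
    return rate * m


def michaelis_menten(m, vmax=1.0, km=1.0):
    """Saturating term; with vmax = km = 1 this is M / (1 + M)."""
    return vmax * m / (km + m)


def mass_action(a, b, k=1.0):
    """Mass-action term: rate proportional to the product of two amounts."""
    return k * a * b


# Tabulate the linear and saturating forms over the 0 to 3 range of the figure.
table = [(m, constant_fraction(m), michaelis_menten(m)) for m in (0.0, 1.0, 2.0, 3.0)]
```

The activation and repression forms are complements of each other for the same `k` and `n`, which is why the two curves in the figure mirror one another between 0 and 1.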
Temperature effects
One form of temperature effect
(Abstracted from Ellgaard et al. 1999)
[Figure: nucleus and endoplasmic reticulum at 32.0 °C and at 39.5 °C. Legend: folded protein packaged to go; bad protein (unreleased); chaperone (folds/QC).]
Temperature effects
[Figure: the functional forms for the terms of dM/dt (Hill function with repressor and enhancer curves; linear, i.e. constant fraction; Michaelis-Menten, M/(1+M); mass action; etc.), now annotated with a factor R.]
A close up – the diurnal clock
Barak et al., 2000
Locke et al., 2005 (9 of 13 equations)
[Figure: clock model diagram and equations, with terms labeled as: Hill function; mass action; Michaelis-Menten; influx − efflux; mRNA translation; net transport into nucleus; environmental effect (light). Time markers t0 and t1.]
(S. Brady)
[Figure: plant growth & metabolism, with components: photosynthesis biochemistry; soil conditions (water stress); root development; flowering time prediction.]
Sensitivity Analysis & Sloppy Systems
[Figure: flowering-time gene network, adapted from various literature. Pathways: photoperiod, autonomous, vernalization, gibberellin. Inputs: light, temperature. Nodes: PHYB, CRY2, LHY, CCA1, TOC1, GI, CO, FPA, FVE, FCA, FLC, FT, SOC1, LFY, AP1, Flowering. Annotation: nearly non-functional in the Landsberg erecta strain. Credit: Z. Dong, 2003. In the final build of the slide, nodes carry sensitivity grades: letters from A to E (including C-) and "low".]
Each letter is a power of two in sensitivity
Stiff & Sloppy Directions
[Figure: an elongated ellipse in the plane of Parameter 1 versus Parameter 2, centered on the optimum goodness-of-fit, with a “stiff” direction and a “sloppy” direction marked.]
All parameter combinations inside this ellipse yield essentially identical goodness-of-fit values
Sloppy/Stiff ca. 1000
The “ellipses” may be “hyper-pancakes” with 15 to 30 sloppy directions. How can these be meaningfully visualized?
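One common way to find stiff and sloppy directions is to take the eigenvectors of the curvature (Hessian) of the misfit at the optimum. The sketch below is an illustration, not the method used in the talk: the two-rate decay model, the sampling times, and the parameter values are all assumptions, and the Hessian is the Gauss-Newton approximation J^T J.

```python
import math


def model(params, t):
    """Hypothetical two-rate decay: y(t) = exp(-k1*t) + exp(-k2*t)."""
    k1, k2 = params
    return math.exp(-k1 * t) + math.exp(-k2 * t)


def jacobian(params, times, h=1e-6):
    """Central-difference derivatives of the model output for each parameter."""
    rows = []
    for t in times:
        row = []
        for i in range(len(params)):
            up, down = list(params), list(params)
            up[i] += h
            down[i] -= h
            row.append((model(up, t) - model(down, t)) / (2 * h))
        rows.append(row)
    return rows


def gauss_newton_hessian(jac):
    """J^T J: curvature of a sum-of-squares misfit near its optimum."""
    n = len(jac[0])
    return [[sum(r[i] * r[j] for r in jac) for j in range(n)] for i in range(n)]


def eig_symmetric_2x2(h):
    """Closed-form eigenpairs of a symmetric 2x2 matrix, largest eigenvalue first."""
    a, b, d = h[0][0], h[0][1], h[1][1]
    mean = (a + d) / 2
    radius = math.hypot((a - d) / 2, b)
    theta = 0.5 * math.atan2(2 * b, a - d)  # angle of the principal axis
    stiff = (mean + radius, (math.cos(theta), math.sin(theta)))
    sloppy = (mean - radius, (-math.sin(theta), math.cos(theta)))
    return stiff, sloppy


times = [0.25 * i for i in range(21)]
hessian = gauss_newton_hessian(jacobian((1.0, 1.2), times))
(stiff_val, stiff_dir), (sloppy_val, sloppy_dir) = eig_symmetric_2x2(hessian)
ratio = stiff_val / sloppy_val  # curvature ratio; axis lengths scale as its square root
```

In this toy case the stiff direction changes both rates together and the sloppy direction trades one rate against the other, so the data constrain the sum of the rates far better than their difference. With 15 to 30 such directions the same decomposition still applies, but only its eigenvalue spectrum plots easily.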
Sloppy directions in a clock model
[Figure: cytosolic 'Y' protein (0.000 to 0.024) versus hours after sunset (0 to 120), comparing the Locke et al. model with a simplified model. Annotation: GIGANTEA?]
71 parameters reduced to 46 parameters
Ecophysiological Models…
• …come in three flavors
  – Environmental physics models (1945 to present)
  – Crop simulation models (1965 to present)
  – Geochemical cycling models
    • Blend the characteristics of both of the above
    • Are more recent
• …are now poised to contribute to the G2P problem via a top-down approach
What is the focus of models in Environmental Physics?
• Mimics conditions inside a uniform plant canopy;
• The typical setting is an agricultural field;
  – Includes plant-related, edaphic (soil), and meteorological inputs;
• Based on physical principles;
  – Conservation of matter and energy; convection, conduction, radiation;
  – Some plant processes: gas exchange, photosynthesis, respiration
• Plant structure consists of leaves, stems, roots;
• Time horizon typically a few days with time steps on the order of minutes.
• Ergo plants often do not grow
Environmental Physics Models: 1945-75
• 1D or Bulk approach;
• Big Leaf / Big Root submodels;
• Bucket soil submodels;
• Resistance analogs used for the atmospheric environment;
• Limited prediction of soil or canopy scalar variables;
• Many empirical relationships;
• Nebulous controlling variables (e.g., canopy resistance to vapor flux);
• Poor plant/environment feedback.
[Diagram: bulk model with four compartments: Atmosphere, Big Leaf, Big Root, and Bucket of Soil. Variables labeled: VPD, LEAF, W, T_AIR, T_LEAF, T_SOIL.]
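A "resistance analog" treats a flux like an electrical current: a concentration difference divided by resistances in series. The sketch below illustrates that idea for vapor leaving a big leaf; the function name, the choice of two resistances, and the units (g m^-3 for vapor concentration, s m^-1 for resistance) are assumptions made for the example.

```python
def vapor_flux(vapor_conc_leaf, vapor_conc_air, canopy_resistance, aerodynamic_resistance):
    """Ohm's-law analog: flux = concentration difference / total series resistance.

    Concentrations in g m^-3 and resistances in s m^-1 give a flux in g m^-2 s^-1.
    All argument names are hypothetical.
    """
    total_resistance = canopy_resistance + aerodynamic_resistance
    if total_resistance <= 0:
        raise ValueError("total resistance must be positive")
    return (vapor_conc_leaf - vapor_conc_air) / total_resistance


# Doubling the canopy resistance lowers the flux for the same gradient.
flux_open = vapor_flux(20.0, 10.0, canopy_resistance=80.0, aerodynamic_resistance=20.0)
flux_closed = vapor_flux(20.0, 10.0, canopy_resistance=160.0, aerodynamic_resistance=20.0)
```

The whole plant response is folded into the single number `canopy_resistance`, which is one way to read the slide's complaint about nebulous controlling variables.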
Environmental Physics Models: 1975-90
[Diagram: multi-layer model with atmosphere layers, canopy layers (sunlit and shade), and soil layers with a rooting profile.]
• Multi-layer atmosphere, soil, and canopy;
• “Scaled leaf” approach within canopy layers;
• Relationships between photosynthesis, transpiration, and biophysics (e.g., stomatal action);
• Use finite difference methods to compute soil heat, water, and gas flows;
• Incorporate root density functions and soil physical properties.
Atmosphere: T_AIR, VPD, CO2, wind speed profiles
Canopy: T_CANOPY, VPD, CO2 canopy profiles
Soil: W, T_SOIL profiles
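The finite difference approach mentioned above can be illustrated for soil heat flow alone. This is a minimal explicit scheme for one-dimensional heat conduction with fixed temperatures at the surface and at the bottom of the profile; the layer thickness, diffusivity, time step, and boundary temperatures are illustrative assumptions, not values from any of the models discussed.

```python
def step_soil_temperature(temps, diffusivity, dz, dt, surface_temp, bottom_temp):
    """Advance a soil temperature profile by one explicit finite-difference step.

    temps: temperatures at equally spaced depths (index 0 is the surface).
    The explicit scheme is only stable when diffusivity * dt / dz**2 <= 0.5.
    """
    r = diffusivity * dt / dz**2
    if r > 0.5:
        raise ValueError("time step too large for the explicit scheme")
    new_temps = list(temps)
    new_temps[0] = surface_temp   # fixed-temperature boundary at the surface
    new_temps[-1] = bottom_temp   # fixed-temperature boundary at depth
    for i in range(1, len(temps) - 1):
        new_temps[i] = temps[i] + r * (temps[i + 1] - 2 * temps[i] + temps[i - 1])
    return new_temps


# 11 nodes, 5 cm apart, 10-minute steps, hypothetical diffusivity of 5e-7 m^2/s.
profile = [15.0] * 11
for _ in range(5000):
    profile = step_soil_temperature(
        profile, diffusivity=5e-7, dz=0.05, dt=600.0, surface_temp=25.0, bottom_temp=15.0
    )
```

Held at constant boundary temperatures for long enough, the profile relaxes to a straight line between the surface and bottom values. Real multi-layer models drive the surface boundary with a time-varying energy balance rather than a fixed temperature.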
What is a Crop Growth Model?
• Mimics one “average plant” at a field or smaller scale;
• The plant environment is an agricultural production setting;
  – Includes cultural- and production-related I/O variables;
  – Includes varietal, edaphic, and meteorological inputs;
• Based on physiological processes;
  – Photosynthesis, respiration, transpiration, nutrient uptake, carbon partitioning, growth, and phenological development;
• Plant structure consists of leaves, stems, roots, & grain;
• Annual time horizon with daily or hourly time steps.
What is the current status of Crop Growth Models?
• Skillful models can account for ca. 70% of yield variance;
• Ongoing work focuses on refinement and applications;
  – Problems being researched include methods for estimating cultivar and soil characteristics on an operational scale;
• Model structures and approaches have matured;
• Recent physical theory may not be emphasized;
• Physical theory does not seem to improve predictions.
Interestingly, incorporating crop growth model components into physical models does not guarantee improved predictability either, even though physical scientists recognize knowledge of the plant as limiting.
Special case: Geochemical cycling models
• Used to model “ecosystem services” and/or “land surface processes” inside general circulation models
• Blend of both kinds of models;
• Includes plant-related, edaphic, and meteorological inputs;
• Based on physical principles
  – Conservation of matter and energy; convection, conduction, radiation;
  – Some plant processes: gas exchange, photosynthesis, respiration
• Plant structure consists of leaves, stems, roots;
• Time horizon of years with time steps on the order of minutes (depends on spatial scale).
Main points
• Neither current crop growth models nor environmental physics models adequately depict plant process control mechanisms;
• This accounts for the failure of models to mimic the plasticity of real plants across different environments;
• The information needed to remedy this situation is emerging from the genomic sciences;
• Incorporating this information requires a reorganization of crop models.
New Crop Growth Model Concept
[Diagram: elements shown: Energy, Water, N; Physical Submodel; Sensors; Control Submodel. Labels: [CPAI], [KE60], T.]
Viz needs for ecophysiological models and G2P components
• Largely the same as for systems biology models: multivariate dynamics in spatially discrete plant parts
• Note that our “G2P solution” specifies predicting trait scores in non-constant environments.
  – That most directly refers to the outdoors
  – Therefore geographic variation must also be considered
A hazy shade of winter…
• One frame of a movie comparing the standard deviation of flowering time for the Columbia strain of A. thaliana germinating on each day.
• Projected by the gene-based model of Wilczek et al, 2009.
• The standard deviation is over five years (left, 2004-2009, real data; right, 2094-2099, A1B climate scenario.)
Statistical genetic methods I
• Can be used to
  – Predict phenotypes based on genotypes
  – Locate regions of the genome likely to contain genes controlling particular phenotypes
• Can be used when
  – Knowledge of gene mechanisms is lacking
• Big Caveat
  – The mathematical form of the G2P relationship is just assumed to be linear
  – … and the data & models elaborated until the job gets done to adequate accuracy
Statistical genetic methods II
• Why does it work?
  – Because there are sufficient regimes of near linearity buried in mechanistic network equations that general linear statistical models have levels of predictive skill useful for some purposes (e.g., crop breeding)
  – Rest assured that there are limits to what should be expected of these models
• How does it work?
What are genetic markers?
[Figure: aligned DNA sequences of 25 different genetic lines, plotted by position within gene, with a single nucleotide polymorphism (SNP) highlighted. (Data from the Purugganan Lab)]
Different sibling lines will have different marker combinations
The DNA sequence for line 1 has the same sequence as parent “B” at the location of marker “g17286”…
…but in line 8 the DNA matches parent “A” at that location
Many different linear models
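Genome Wide Association

$$\text{Pheno}_m = \beta_1 X_{m,1} + \beta_2 X_{m,2} + \cdots + \beta_{150000} X_{m,\text{last\_marker}}$$

Finding quantitative trait loci (QTL)

Find markers i, j, and k such that

$$\text{Pheno}_m = \beta_i X_{m,i} + \beta_{kj} X_{m,k} X_{m,j} + \text{other terms}$$

is a good fit, where

$$X_{m,n} = \begin{cases} 1 & \text{if marker } n \text{ is from parent A in line } m \\ 0 & \text{if marker } n \text{ is from parent B in line } m \end{cases}$$

etc.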
What a QTL analysis output looks like. This is a “1d-scan”, i.e. $X_{m,j}$
(Buckler et al, Science, 2009)
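A 1d-scan can be sketched as a loop that fits the phenotype against one marker at a time and records a score per marker. The example below is an illustration under assumptions: it uses simple marker regression with an intercept and a LOD score, (n/2)·log10(RSS0/RSS1), and the eight lines, three markers, and phenotype values are made up for the example.

```python
import math


def lod_score(phenotypes, marker):
    """LOD score comparing a one-marker model against a mean-only model.

    marker holds 0/1 genotype codes, one per line. Fitting an intercept plus a
    0/1 marker is equivalent to fitting a separate mean for each genotype group.
    """
    n = len(phenotypes)
    grand_mean = sum(phenotypes) / n
    rss_null = sum((y - grand_mean) ** 2 for y in phenotypes)
    if rss_null == 0:
        return 0.0  # no phenotypic variation to explain
    rss_marker = 0.0
    for allele in (0, 1):
        group = [y for y, x in zip(phenotypes, marker) if x == allele]
        if group:
            group_mean = sum(group) / len(group)
            rss_marker += sum((y - group_mean) ** 2 for y in group)
    if rss_marker == 0:
        return math.inf  # marker explains the phenotype perfectly
    return (n / 2) * math.log10(rss_null / rss_marker)


def scan_1d(phenotypes, markers):
    """Return one LOD score per marker (the curve plotted in a 1d-scan)."""
    return [lod_score(phenotypes, marker) for marker in markers]


# Hypothetical data: 8 lines, 3 markers; only the first marker tracks the trait.
phenotypes = [10.1, 9.8, 10.3, 10.0, 12.1, 11.9, 12.2, 12.0]
markers = [
    [0, 0, 0, 0, 1, 1, 1, 1],
    [0, 1, 0, 1, 0, 1, 0, 1],
    [0, 0, 1, 1, 0, 0, 1, 1],
]
lods = scan_1d(phenotypes, markers)
best_marker = max(range(len(lods)), key=lods.__getitem__)
```

Plotting `lods` against marker position gives the familiar peaked trace. Production QTL software typically uses interval mapping and permutation thresholds rather than this bare marker regression.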
Two Stat Inf Viz Problems
• Higher order scans, e.g. $\beta_{k,j,l} X_{m,k} X_{m,j} X_{m,l}$
  – Remember SNP numbers can be in the 150K to 3M range.
• eQTL viz problems
  – Can be 30K phenotypes…
  – …and higher order scans
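The scale of the problem follows from counting marker combinations. The short sketch below computes how many distinct terms a scan of a given order must consider for the SNP counts quoted on the slide; the function name is an assumption, and the count ignores any pruning a real analysis might apply.

```python
import math


def n_marker_combinations(n_markers, order):
    """Number of unordered marker sets of the given size (order 2 = pairs)."""
    return math.comb(n_markers, order)


# Terms per phenotype for 1d, 2d, and 3d scans at the quoted SNP counts.
counts = {
    (n_markers, order): n_marker_combinations(n_markers, order)
    for n_markers in (150_000, 3_000_000)
    for order in (1, 2, 3)
}
```

Even at 150K SNPs a pairwise scan has about 1.1e10 terms, and for eQTL work that total is multiplied by up to 30K phenotypes, so the visualization cannot show every term and has to summarize.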
eQTL Analysis – Looking for Regulators
[Diagram: gene "A" is transcribed by RNAP and translated (protein synthesis) into a transcription factor, which acts on the promoter region of gene "B", where RNAP reads the "B" gene codons.]
Let “Pheno” be the amount of mRNA (expression) produced by gene “B”. This could be different in lines that varied either in the promoter of “B” or in lines that had differences in the coding region of gene “A”. These are called “cis” and “trans” effects, respectively.
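In practice, cis and trans effects are often separated by comparing where an eQTL maps with where the affected gene sits. The sketch below shows that bookkeeping only; the function name, the coordinate convention, and especially the 1 Mb `cis_window` are assumptions chosen for illustration, not thresholds from the talk.

```python
def classify_eqtl(gene_chrom, gene_pos, marker_chrom, marker_pos, cis_window=1_000_000):
    """Label an eQTL "cis" if the marker lies near the gene it affects, else "trans".

    Positions are base-pair coordinates; cis_window is a hypothetical cutoff.
    """
    same_chromosome = gene_chrom == marker_chrom
    if same_chromosome and abs(gene_pos - marker_pos) <= cis_window:
        return "cis"
    return "trans"


# A marker close to gene "B" counts as cis; one on another chromosome as trans.
near = classify_eqtl("I", 1_200_000, "I", 1_500_000)
far = classify_eqtl("I", 1_200_000, "IV", 1_500_000)
```

When eQTL position is plotted against gene position, the cis cases fall along the diagonal, while a regulator like gene "A" that affects many genes shows up as a band of trans cases at one marker position.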
Massive eQTL Variation
75% of all genes have at least 1 eQTL
[Figure: position of eQTL for each of 15,771 genes, arranged by physical order, plotted against chromosome (I to V). QTL effect: Bay + / Bay −. Features marked: cis diagonal, trans hotspot. (D. Kliebenstein)]
eQTL Viz Problems…
How to plot interaction effects?
That is, $X_{m,j} X_{m,k}$
and a gazillion phenotypes
Questions?
Virtual soybean simulations from Han et al. 2007