Over-parameterisation and model reduction
Post on 13-Jan-2016
Neil Crout
Environmental Science
School of Biosciences
University of Nottingham
Co-workers
Davide Tarsitano, Glen Cox, James Gibbons, Andy Wood, Jim Craigon, Steve Ramsden, Yan Jiao, Tim Reid
with thanks to BBSRC and Leverhulme
Motivation
Increase the complexity of the model to improve agreement with observations
We can continue this until ALL the variation in the data is accounted for, including the noise!
Models can become over-parameterised
Over-parameterisation
Judgements have to be made about the level of complexity in a model (parsimony)
In a statistical context, methods for model selection exist
Key feature: comparing models
Over-parameterisation
So what!
Our models are not statistical:
– Mechanistic
– No fitting involved
– Predicting lots of different things (multivariate)
– Have to account for lots of different cases
– Did we mention they’re mechanistic?
– Newton, thermodynamics, etc. all on our side
Over-parameterisation and mechanistic models?
Mechanistically inspired, but empirically implemented, so the normal statistical rules apply
Model parameters are often not formally fitted:
– worse, they may be ‘tuned’
– they might be measured (but the model structure may not be consistent with the real value)
Often (maybe always) models are implicitly fitted:
– comparisons are made with the real world, the model changes, and development iterates
But the normal rules apply: greater complexity may not equal greater realism
Over-parameterisation and mechanistic models?
Mechanistic models have the potential for over-parameterisation
This is a source of uncertainty
Perhaps of equal importance: mechanistic models are cited as tools to test understanding
For this, we do need to know whether a model’s formulation really can be supported by observations
Judgements have to be made about the level of complexity in a model
Mostly this is subjective:
– supported by tools such as sensitivity analysis
– sometimes by consideration of alternative model formulations
– but this is generally difficult…
We want some statistical support for model selection, but that requires a set of alternative models to compare
Model Reduction – Why?
We want to assess whether a model has the most appropriate level of detail
We can get some idea of this by comparing models of the same system which have different levels of detail
But we don’t have lots of different models, so we reduce the model we have to create alternative model formulations which can be compared
The comparison is restricted:
– reduced models are drawn from the same source
– but this is hopefully better than no comparison at all
Some illustrations
Example: radiocaesium uptake
‘Absalom’ radiocaesium uptake model
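[Figure: Absalom model structure diagram; inputs NH4, % clay, % C, K+ and pH; intermediate variables θclay, θhumus, Kx soil, M_camg, CEC humus, CEC clay, Kx humus, mK, Kd clay, Kd humus, RIP clay, CF, Kdl, Kdr and D factor; outputs Cssol, Cssoil and Csplant]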
Model Reduction – How?
Start with a ‘full’ model of a system
Reduce it (systematically, automatically), producing a ‘set’ of alternative model formulations for the same system
Reduction?
– The inter-connected nature of typical models means we can’t simply leave things out
– Instead, we replace variables with a constant, e.g. the mean or median value that the variable attains in the full model
Reduction by variable replacement
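Suppose the full model has a variable $V_i$, which is a function of the model variables $\mathbf{V}$, inputs $\mathbf{I}$ and parameters $\boldsymbol{\alpha}$:
$$V_i = f_i(\mathbf{V}, \mathbf{I}, \boldsymbol{\alpha})$$
Arrange it so that the model has
$$V_i = \rho_i R_i + (1 - \rho_i)\, f_i(\mathbf{V}, \mathbf{I}, \boldsymbol{\alpha})$$
where $R_i$ is the replacement constant and $\rho_i$ is either 0 (no replacement) or 1 (replaced). Systematically reducing the model then involves setting the values of the vector $\boldsymbol{\rho} = (\rho_1, \ldots, \rho_n)$.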
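A minimal Python sketch of the replacement formulation, assuming a hypothetical three-variable model: `toy_model`, its Q10-style temperature response, saturating moisture response, rate equation and parameter values are all illustrative and not taken from any model in this talk. Each switch `rho[i]` is 0 (keep the process equation) or 1 (replace the variable by the constant `R[i]`), and `R` is taken as the mean value each variable attains in a full-model run.

```python
import numpy as np


def toy_model(temp, moist, rho, R, q10=2.0, k=0.5):
    """Hypothetical three-variable model with replacement switches.

    Each variable follows V_i = rho_i * R_i + (1 - rho_i) * f_i(V, I, alpha):
    rho_i = 0 keeps the process equation f_i, rho_i = 1 replaces V_i by the constant R_i.
    """
    # V1: temperature response (hypothetical Q10 form)
    v1 = rho[0] * R[0] + (1 - rho[0]) * q10 ** ((temp - 20.0) / 10.0)
    # V2: moisture response (hypothetical saturating form)
    v2 = rho[1] * R[1] + (1 - rho[1]) * moist / (moist + 0.2)
    # V3: rate, built from the (possibly replaced) V1 and V2
    v3 = rho[2] * R[2] + (1 - rho[2]) * k * v1 * v2
    return np.stack([v1, v2, v3])


temp = np.array([5.0, 10.0, 15.0, 20.0, 25.0])  # hypothetical inputs
moist = np.array([0.1, 0.2, 0.3, 0.4, 0.5])

# Full model run: nothing is replaced, so the values in R are not used yet.
full = toy_model(temp, moist, rho=(0, 0, 0), R=np.zeros(3))
# Replacement constants: the mean each variable attains in the full model
# (np.median(full, axis=1) would give the median instead).
R = full.mean(axis=1)

# Reduced model: the temperature response V1 is replaced by its constant.
reduced = toy_model(temp, moist, rho=(1, 0, 0), R=R)
```

Because V3 is computed from the possibly-replaced V1, replacing one variable propagates through the rest of the model rather than simply deleting a term, which is why this is used instead of leaving things out.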
Search the model replacement space
Search the combinations of possible models (i.e. varying the replacement vector):
– exhaustive, if computationally feasible
– stochastic search (e.g. MCMC) can be more efficient
– it is a discrete space
Calculate a posterior probability for each model configuration
This is based on comparison with observations:
– so this is a data-led approach
– the data have to be representative for the analysis to be representative
– we certainly need to be aware of the data’s limitations
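The exhaustive search and posterior calculation can be sketched as follows. This assumes independent Gaussian observation errors with a known standard deviation and an equal prior on every configuration (so the posterior is proportional to the likelihood); the model `toy_output` and the synthetic observations are hypothetical, chosen only so the example is self-contained.

```python
import itertools

import numpy as np


def toy_output(temp, moist, rho, R, q10=2.0, k=0.5):
    """Hypothetical model output V3, with replacement switches rho (0 = keep, 1 = replace)."""
    v1 = rho[0] * R[0] + (1 - rho[0]) * q10 ** ((temp - 20.0) / 10.0)
    v2 = rho[1] * R[1] + (1 - rho[1]) * moist / (moist + 0.2)
    return rho[2] * R[2] + (1 - rho[2]) * k * v1 * v2


def configuration_posterior(temp, moist, obs, R, sigma):
    """Exhaustively score every replacement vector and return its posterior probability.

    Assumes independent Gaussian errors with known sigma and an equal prior on
    every configuration, so the posterior is proportional to the likelihood.
    """
    configs = list(itertools.product((0, 1), repeat=len(R)))  # 2**n configurations
    log_lik = np.array([
        -0.5 * np.sum(((obs - toy_output(temp, moist, rho, R)) / sigma) ** 2)
        for rho in configs
    ])
    # Normalise in log space (subtract the maximum) to avoid underflow.
    weights = np.exp(log_lik - log_lik.max())
    return dict(zip(configs, weights / weights.sum()))


temp = np.linspace(5.0, 25.0, 9)
moist = np.linspace(0.1, 0.5, 9)
f1 = 2.0 ** ((temp - 20.0) / 10.0)
f2 = moist / (moist + 0.2)
R = np.array([f1.mean(), f2.mean(), (0.5 * f1 * f2).mean()])  # full-model means

# Synthetic 'observations' generated by the reduced model with V2 replaced.
obs = toy_output(temp, moist, (0, 1, 0), R)
post = configuration_posterior(temp, moist, obs, R, sigma=0.05)
best = max(post, key=post.get)
```

With n candidate variables the loop visits 2^n configurations, so this brute-force version is only practical for modest n; beyond that a stochastic search is needed.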
10 candidate variables: 2^10 = 1,024 combinations
[Figure: Absalom model structure diagram]
[Figure: posterior probability for the top 10 models; y-axis: marginal posterior probability (log scale, 0.00001 to 1); x-axis: model rank (1 to 10); the full model is marked]
[Figure: Absalom model structure diagram]
Full model (as published)
[Figure: Absalom model structure diagram]
this one works better…
Better?
– Reduced uncertainty, better-predicting models (tested with independent data)
– Requiring fewer inputs, specifically soil pH; the model is used spatially, so this is operationally a good thing
– Greater confidence that the model is not over-parameterised…?
– Diagnostic to guide/support model development
Methane Emissions from Wetlands
Top ‘performing’ models
[Figure: y-axis: marginal proposal model probability (0 to 0.5); x-axis: model rank (1 to 10); the full model is marked]
Mechanistic Interpretation
Model diagnostics: present results as the probability of replacing specific variables, summing over the model combinations explored:
– seasonal transfer of plant matter to soil organic matter (100%)
– reducing methane oxidation from Michaelis-Menten to first order (39%)
– temperature dependency of methane oxidation (15%)
– seasonal dependency of plant growth (99%)
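The summation described above can be sketched in a few lines of Python. The posterior values in the usage example are hypothetical and for illustration only; in practice they would come from the search over model configurations.

```python
def replacement_probabilities(posterior):
    """Marginal probability that each candidate variable is replaced.

    posterior maps a replacement vector (tuple of 0/1, 1 = replaced) to its
    posterior probability. The marginal for variable i is the sum of the
    posterior over every configuration in which rho_i = 1.
    """
    n = len(next(iter(posterior)))
    totals = [0.0] * n
    for rho, p in posterior.items():
        for i, replaced in enumerate(rho):
            if replaced:
                totals[i] += p
    return totals


# Hypothetical posterior over the four configurations of two candidate variables.
posterior = {(0, 0): 0.1, (1, 0): 0.5, (0, 1): 0.1, (1, 1): 0.3}
probs = replacement_probabilities(posterior)  # approximately [0.8, 0.4]
```

A value near 100% says the data give little support for the process equation of that variable; a value near 0% says replacing it is consistently penalised.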
C & N dynamics in the Arctic
15N applied to field plots
3 years of subsequent measurements
used to parameterise a model based on MBL-GEM
[Figure: structure of the C & N model, with CARBON, NITROGEN and ACCLIMATION components; labels include Veg C, Detritus C, Litter C, CO2, Veg N, Detritus N, Litter N, Inorganic Nitrogen, UpEffort C, UpEffort N, Uptake C/Veg Resp, Micro Resp, Uptake N/Release N, Min N/Immob N, Input N, Leaching N, QPT/QVRT, QMRT and QNRT/QVUT]
14 candidate variables (16k combinations)
[Figure: y-axis: marginal posterior probability (log scale, 0.000001 to 1); x-axis: model ID (ticks from 1 to 4049); the full model is marked]
Ensemble prediction over set of models
[Figure: ensemble prediction of system net C balance (g m^-2, axis range -200 to 100) against time (0 to 50 years)]
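One way to form an ensemble prediction over a set of model configurations is to weight each configuration’s trajectory by its posterior probability, in the spirit of Bayesian Model Averaging (see Gibbons et al. 2008 in the references). The sketch below assumes that weighting; the function name, the three trajectories and the weights are hypothetical, and it reports only the between-model spread (a full BMA variance would also add each model’s own predictive variance).

```python
import numpy as np


def ensemble_prediction(predictions, weights):
    """Posterior-weighted ensemble mean and between-model standard deviation.

    predictions: array of shape (n_models, n_times), one trajectory per configuration.
    weights: posterior probability of each configuration (renormalised here).
    """
    predictions = np.asarray(predictions, dtype=float)
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()
    mean = w @ predictions                   # weighted mean at each time
    var = w @ (predictions - mean) ** 2      # weighted spread between models
    return mean, np.sqrt(var)


# Hypothetical net C balance trajectories (g m^-2) from three configurations.
preds = [[0.0, -10.0, -20.0],
         [0.0, -20.0, -40.0],
         [0.0, 0.0, 0.0]]
mean, sd = ensemble_prediction(preds, [0.5, 0.25, 0.25])
```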
Sirius – Wheat Crop Growth Model
[Figure: Sirius model structure diagram; labelled sub-models include soil moisture, water balance, soil evaporation, soil N and crop N, grain, water uptake, dry matter, phenology, vernalisation, roots, canopy, leaf area and evapotranspiration (Penman); inputs include maximum and minimum temperature, rain, radiation, variety and soil]
Results
[Figure: top 75% of the distribution/ensemble; y-axis: P_M ×10^4 (0 to 7); x-axis: model rank (1 to about 5,000)]
40 other variables: << 50%
Interpretation…
– Crop N physiology: 50%
– Vernalisation: 99%
– N-mineralisation temperature/moisture adjustment: ~99%
– Canopy temperature: 96%
– Diurnal adjustment of thermal time: 99%
– Nitrogen leaching: 50%
Other applications…
Applying the same or similar approach to other models, e.g.:
– Marine ecology models (carbon cycling)
– JULES, the land surface scheme for a GCM
– FARM-ADAPT, a very large farm management model: linear-programming based, with a very large number of variables (10,000s), so more challenging
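With thousands of candidate variables the 2^n configurations cannot be enumerated, so a stochastic search such as MCMC is needed. A minimal sketch, assuming a single-flip Metropolis sampler over the binary replacement vector (one common choice, not necessarily the scheme used in this work) and a user-supplied function returning the unnormalised log posterior of a configuration. The toy log posterior in the usage example is hypothetical and chosen because its exact marginals are known.

```python
import math
import random


def metropolis_model_search(log_post, n_vars, n_steps, seed=0):
    """Stochastic search of the replacement space with a single-flip Metropolis sampler.

    log_post(rho) returns the unnormalised log posterior of a replacement vector
    rho (tuple of 0/1). Flipping one randomly chosen element is a symmetric
    proposal, so a move is accepted with probability min(1, exp(new - old)).
    Returns visit counts per configuration and estimated marginal replacement probabilities.
    """
    rng = random.Random(seed)
    rho = tuple([0] * n_vars)  # start from the full model (nothing replaced)
    current = log_post(rho)
    visits = {}
    replaced = [0] * n_vars
    for _ in range(n_steps):
        i = rng.randrange(n_vars)
        proposal = rho[:i] + (1 - rho[i],) + rho[i + 1:]
        candidate = log_post(proposal)
        if candidate >= current or rng.random() < math.exp(candidate - current):
            rho, current = proposal, candidate
        visits[rho] = visits.get(rho, 0) + 1
        for j, r in enumerate(rho):
            replaced[j] += r
    return visits, [c / n_steps for c in replaced]


# Hypothetical target: replacing variable i adds weights[i] to the log posterior,
# so the variables are independent and P(rho_i = 1) = 1 / (1 + exp(-weights[i])).
weights = [2.0, 0.0, -2.0]


def toy_log_post(rho):
    return sum(w * r for w, r in zip(weights, rho))


visits, marginals = metropolis_model_search(toy_log_post, n_vars=3, n_steps=50000)
```

No burn-in is discarded here for brevity; with a real model each `log_post` call is a full model run, so the number of steps is usually the limiting cost.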
Some conclusions
Whether a model’s formulation is appropriate is uncertain
Model reduction provides a way to test (challenge) model formulation (perhaps a brutal test)
It may give us some insurance against over-parameterisation (reduce uncertainty)
It does test the understanding built into our models
All models investigated so far can be reduced by variable replacement
Most reduced models have some advantage
Useful references?
Crout, N.M.J., Tarsitano, D., Wood, A.T., in press. Is my model too complex? Evaluating model formulation using model reduction. Environmental Modelling & Software.
Gibbons, J.M., Cox, G.M., Wood, A.T.A., Craigon, J., Ramsden, S.J., Tarsitano, D., Crout, N.M.J., 2008. Applying Bayesian Model Averaging to mechanistic models: an example and comparison of methods. Environmental Modelling & Software, 23:973-985.
Cox, G.M., Gibbons, J.M., Wood, A.T.A., Craigon, J., Ramsden, S.J., Crout, N.M.J., 2006. Towards the systematic simplification of mechanistic models. Ecological Modelling, 198:240-246.
Bernhardt, K., 2008. Finding alternatives and reduced formulations for process-based models. Evolutionary Computation, 16:1-16. Alternative approach.
Asgharbeygi, N., Langley, P., Bay, S., Arrigo, 2006. Inductive revision of quantitative process models. Ecological Modelling, 194:70-79. Alternative approach to model reduction, albeit without a statistical framework.
Brooks, R.J., Tobias, A.M., 1996. Choosing the best model: level of detail, complexity, & model... Mathematical and Computer Modelling, 24:1-14. Interesting article on model utility.
Anderson, T.R., 2005. Plankton functional type modelling: running before we can walk? Journal of Plankton Research, 27:1073-1081. Nice discussion of model complexity in the context of marine systems. Quite controversial: there are replies to the replies to the replies.
Jakeman, A.J., Letcher, R.A., Norton, J.P., 2006. Ten iterative steps in development and evaluation of environmental models. Environmental Modelling & Software, 21:602-614. Very nice discussion of good practice in model development.
Myung, J., Pitt, M.A., 2002. When a good fit can be bad. Trends in Cognitive Sciences, 6:421-425. Good introduction to model selection criteria.
Burnham, K.P., Anderson, D.R., 2002. Model Selection and Multimodel Inference: A Practical Information-Theoretic Approach, 2nd ed. Springer-Verlag, New York. Ultimate reference, but not without critics.