Over-parameterisation and model reduction

Neil Crout

Environmental Science

School of Biosciences

University of Nottingham

Co-workers

Davide Tarsitano, Glen Cox, James Gibbons, Andy Wood, Jim Craigon, Steve Ramsden, Yan Jiao, Tim Reid

with thanks to BBSRC and Leverhulme

Motivation

Increase the complexity of the model to improve agreement with observations

Can continue this until ALL the variation in the data is accounted for – including the noise!

Models can become overparameterised

Over-parameterisation

Judgements have to be made about the level of complexity in a model (parsimony)

In a statistical context methods for model selection exist

Key feature - comparing models
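As a small illustration of what "comparing models" can mean in a statistical setting, the sketch below scores two hypothetical fitted models with the Akaike Information Criterion, assuming independent Gaussian errors. The function name, the residual sums of squares and the parameter counts are invented for illustration, and AIC is only one of several possible selection criteria.

```python
import math

def aic_gaussian(rss, n_obs, n_params):
    """AIC for a least-squares fit with Gaussian errors (additive constants dropped)."""
    return n_obs * math.log(rss / n_obs) + 2 * n_params

# Hypothetical numbers: the complex model fits slightly better
# but needs three times as many parameters.
n_obs = 50
aic_simple = aic_gaussian(rss=12.0, n_obs=n_obs, n_params=3)
aic_complex = aic_gaussian(rss=11.5, n_obs=n_obs, n_params=9)

# The model with the lower AIC is preferred.
preferred = "simple" if aic_simple < aic_complex else "complex"
print(preferred)
```

In this made-up case the small gain in fit does not pay for the extra parameters, so the simpler model is preferred.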

Over-parameterisation

So what!

Our models are not statistical

Mechanistic
No fitting involved
Predicting lots of different things (multi-variate)
Have to account for lots of different cases
Did we mention they’re mechanistic?
– Newton
– Thermodynamics
– etc. all on our side

Overparameterisation and mechanistic models?

Mechanistically inspired, but empirically implemented
So normal statistical rules of business apply
Model parameters are often not formally fitted
– worse, they may be ‘tuned’
– they might be measured (but the model structure may not be consistent with the real value)
Often (maybe always) models are implicitly fitted
– comparisons are made with the real world
– the model changes
– development iterates
But normal rules of business apply
– greater complexity may not equal greater realism

Overparameterisation and mechanistic models?

Mechanistic models have the potential for over-parameterisation

This is a source of uncertainty

Perhaps of equal importance
Mechanistic models are cited as tools to test understanding
For this we do need to know whether a model’s formulation really can be supported by observations

Judgements have to be made about the level of complexity in a model

Mostly this is subjective
– supported by tools such as sensitivity analysis etc.
– sometimes by consideration of alternative model formulations
– but this is generally difficult…

Want some statistical support for model selection
But that requires a set of alternative models to compare

Model Reduction – Why?

Want to assess whether a model has the most appropriate level of detail
We can get some idea on this by comparing models of the same system which have different levels of detail
But we don’t have lots of different models
So we consider reducing the model we have to create alternative model formulations which can be compared
Comparison is restricted
– reduced models are drawn from the same source
– but hopefully better than no comparison at all

Some illustrations

Example: radiocaesium uptake

‘Absalom’ radiocaesium uptake model

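[Figure: structure of the ‘Absalom’ radiocaesium uptake model. Diagram labels: % clay, % C, K+, pH, θclay, θhumus, Kx soil, M_camg, CEChumus, CECclay, Kx humus (2), mK (1), Kdclay (1), Kdhumus, RIPclay (2), CF (2), Kdl, Kdr, D factor, Cssol, Cssoil, Csplant.]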

Model Reduction – How?

Start with a ‘full’ model of a system
Reduce it (systematically, automatically)
Producing a ‘set’ of alternative model formulations for the same system
Reduction?
– The inter-connected nature of typical models means we can’t simply leave things out
– We replace variables with a constant
– e.g. the mean or median that the variable attains in the full model

Reduction by variable replacement

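Suppose the full model has a variable $V_i$ which is a function of the model variables, inputs, and parameters:

$$V_i = f_i(\mathbf{V}, \mathbf{I}, \boldsymbol{\alpha})$$

Arrange so that the model has

$$V_i = R_i \bar{V}_i + (1 - R_i)\, f_i(\mathbf{V}, \mathbf{I}, \boldsymbol{\alpha})$$

where $R_i$ is either 0 (no replacement) or 1 (replaced), and $\bar{V}_i$ is the replacement constant (e.g. the mean or median that the variable attains in the full model).

Systematically reducing the model then involves setting the values of the vector $\mathbf{R}$.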
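A minimal sketch of variable replacement on a toy model may help. Each variable is either computed from its function (switch 0) or held at a constant (switch 1). The two-variable model, its equations and the names `run_model`, `R` and `V_bar` are hypothetical and are not taken from any of the models discussed in this talk.

```python
def run_model(x, R, V_bar):
    """Toy two-variable model with replacement switches.

    x     : input value (hypothetical driver)
    R     : tuple of 0/1 flags, one per candidate variable
    V_bar : constants used when a variable is replaced
    """
    # Each variable is computed by its function (R_i = 0)
    # or held at its replacement constant (R_i = 1).
    v1 = R[0] * V_bar[0] + (1 - R[0]) * (2.0 * x)
    v2 = R[1] * V_bar[1] + (1 - R[1]) * (v1 + 1.0)
    return v1 * v2

xs = [0.0, 1.0, 2.0]

# Replacement constants: the mean each variable attains in the full model.
v1_full = [2.0 * x for x in xs]
v2_full = [v + 1.0 for v in v1_full]
V_bar = (sum(v1_full) / len(xs), sum(v2_full) / len(xs))

full = [run_model(x, (0, 0), V_bar) for x in xs]     # nothing replaced
reduced = [run_model(x, (0, 1), V_bar) for x in xs]  # second variable held constant
```

Because the switches sit inside the model equations, the rest of the model still runs unchanged when a variable is replaced, which is why nothing has to be "left out" by hand.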

Search the model replacement space

Search the combination of possible models (i.e. varying the replacement vector R)
– exhaustive if computationally feasible
– stochastic search (e.g. MCMC) can be more efficient
– discrete space

Calculate a posterior probability for each model configuration

This is based on comparison with observations
– So this is a data-led approach
– Data has to be representative for the analysis to be representative
– Certainly need to be aware of the data’s limitations
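The exhaustive case can be sketched as follows. The slides do not say how the posterior probability is computed, so this example assumes equal prior probability for every configuration and approximates the posterior from the BIC under Gaussian errors. The toy model, the observations and the rule that each replaced variable saves one parameter are all made up for illustration.

```python
import itertools
import math

def model(x, R):
    """Hypothetical toy model with two replaceable variables."""
    v1 = 2.0 if R[0] else 2.0 * x          # held constant, or computed
    v2 = 1.0 if R[1] else 1.0 + 0.01 * x   # weak dependence on x
    return v1 * v2

xs = [0.0, 1.0, 2.0, 3.0, 4.0]
obs = [0.1, 1.9, 4.1, 5.9, 8.0]            # made-up observations

def bic(R):
    n = len(xs)
    rss = sum((model(x, R) - y) ** 2 for x, y in zip(xs, obs))
    k = len(R) - sum(R)                    # assumption: one parameter saved per replacement
    return n * math.log(rss / n) + k * math.log(n)

# Enumerate every replacement vector: 2**2 = 4 configurations here.
configs = list(itertools.product([0, 1], repeat=2))
scores = {R: bic(R) for R in configs}

# Approximate posterior, proportional to exp(-BIC / 2), normalised over the set.
best = min(scores.values())
weights = {R: math.exp(-0.5 * (s - best)) for R, s in scores.items()}
total = sum(weights.values())
posterior = {R: w / total for R, w in weights.items()}

top = max(posterior, key=posterior.get)
```

With these invented observations the configuration that replaces only the second variable ranks first. The result depends entirely on the data, which is the sense in which the approach is data-led.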

10 candidate variables – 2^10 (= 1024) combinations

[Figure: Absalom model diagram, repeated.]

Posterior probability for top 10 models

[Figure: posterior probability (axis ticks at 0.00001 and 0.0001) against model rank (1 to 10), with a ‘Full Model’ label.]


Full model (as published)

[Figure: Absalom model diagram, repeated.]

this one works better…

Better?

Reduced uncertainty, better predicting models
– Tested with independent data
Requiring fewer inputs
– Specifically soil pH
– Model is used spatially, so this is a good thing operationally

Greater confidence that the model is not over-parameterised….?

Diagnostic to guide/support model development

Methane Emissions from Wetlands

Methane emission from wetlands

Top ‘performing’ models

[Figure: top ‘performing’ models against model rank (1 to 10), with a ‘Full Model’ label.]

Mechanistic Interpretation

Model diagnostics – present results as probability of replacing specific variables
Summing over the model combinations explored
– seasonal transfer of plant matter to soil organic matter (100%)
– reducing methane oxidation from Michaelis-Menten to first order (39%)
– temperature dependency of methane oxidation (15%)
– seasonal dependency of plant growth (99%)
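The per-variable diagnostic can be sketched as a sum of posterior probability over every explored configuration in which that variable is replaced. The function name and the posterior values below are hypothetical and are not the wetland model results.

```python
def replacement_probability(posterior, n_vars):
    """P(variable i is replaced): total posterior of the models with R_i = 1."""
    return [
        sum(p for R, p in posterior.items() if R[i] == 1)
        for i in range(n_vars)
    ]

# Hypothetical posterior over the four configurations of two variables.
posterior = {(0, 0): 0.10, (0, 1): 0.60, (1, 0): 0.05, (1, 1): 0.25}

probs = replacement_probability(posterior, n_vars=2)
# The first variable is replaced in about 30% of the posterior mass,
# the second in about 85%.
```

A value near 100% suggests the observations give little support for keeping that process in the model, while a low value suggests the process is earning its place.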

C & N dynamics in the arctic

N15 applied to field plots

3 years subsequent measurement

used to parameterise a model based on MBL-GEM

C & N dynamics in the arctic

[Figure: model structure with CARBON, NITROGEN and ACCLIMATION components. Diagram labels: Veg C, Detritus C, Veg N, Detritus N, Inorganic Nitrogen, UpEffort N, UpEffort C, Litter C, Litter N, Uptake N/Release N, Min N/Immob N, Input N, Leaching N, Uptake C/Veg Resp, Micro Resp, QPT/QVRT, QMRT, QNRT/QVUT.]

14 candidate variables (16k combinations)

[Figure: values (axis ticks at 0.000001, 0.00001 and 0.0001) against model ID, with a ‘Full Model’ label.]

Ensemble prediction over set of models

[Figure: ensemble prediction against time (0 to 50 years).]
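One way to form an ensemble prediction over a set of models is to weight each model's prediction by its posterior probability, in the spirit of Bayesian model averaging. Whether the figure was produced exactly this way is an assumption. The model names, predictions and weights below are invented.

```python
def ensemble_prediction(predictions, posterior):
    """Posterior-weighted mean of the model predictions at each time step."""
    n_steps = len(next(iter(predictions.values())))
    return [
        sum(posterior[m] * predictions[m][t] for m in predictions)
        for t in range(n_steps)
    ]

# Hypothetical predictions from three model configurations at three times.
predictions = {
    "full": [1.0, 2.0, 3.0],
    "reduced_a": [1.0, 1.5, 2.0],
    "reduced_b": [1.0, 2.5, 4.0],
}
# Hypothetical posterior probabilities (they sum to 1).
posterior = {"full": 0.2, "reduced_a": 0.5, "reduced_b": 0.3}

mean = ensemble_prediction(predictions, posterior)
```

The spread of the individual predictions around this weighted mean gives a rough picture of the uncertainty due to model structure.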

Sirius – Wheat Crop Growth Model

[Figure: structure of the Sirius wheat crop growth model. Recoverable diagram labels include Soil moisture, Water balance, Percolation, Equilibrium, Soil evap, Wateruptake, UptakeN, GrainDemand, Dry matter, Vernalisation, CANOPY, EVAPO, PENMAN, Inputs, Radiation, VARIETY and Soil, along with many state variables such as aw[1..30], exw[1..30], Naw[1..30], Biomass, Leafarea and Rootlen.]

Results

Top 75% of the distribution/ensemble

[Figure: results against model rank (axis from 1 to 5001), with annotated values 1139 and 1052.]

40 other variables << 50%

Interpretation....

Crop N physiology
Vernalisation 99%
N-mineralisation Temp/moisture adjustment ~99%
Canopy Temperature 96%
Diurnal adjustment of thermal time 99%
Nitrogen Leaching

Other applications…

Applying the same/similar approach to other models, e.g.
Marine Ecology Models (carbon cycling)
JULES – land surface scheme for GCM
FARM-ADAPT – very large farm management model
– Linear programming based model
– V. large number of variables (10 000s)
– More challenging
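With very many candidate variables the replacement space cannot be enumerated, so a stochastic search is needed. The sketch below is a generic random-walk Metropolis sampler over binary replacement vectors, flipping one switch per step. It is not the authors' implementation, and `toy_log_post` is a stand-in for a real log posterior that would come from running the model against observations.

```python
import math
import random
from collections import Counter

def metropolis_search(log_post, n_vars, n_steps, seed=0):
    """Random-walk Metropolis over binary replacement vectors."""
    rng = random.Random(seed)
    current = tuple([0] * n_vars)            # start from the full model
    current_lp = log_post(current)
    visits = Counter()
    for _ in range(n_steps):
        i = rng.randrange(n_vars)            # propose flipping one switch
        proposal = current[:i] + (1 - current[i],) + current[i + 1:]
        proposal_lp = log_post(proposal)
        # Accept with probability min(1, posterior ratio); the proposal is symmetric.
        if rng.random() < math.exp(min(0.0, proposal_lp - current_lp)):
            current, current_lp = proposal, proposal_lp
        visits[current] += 1
    return visits

# Hypothetical target: the log posterior falls with distance from one configuration.
target = (1, 0, 1, 1, 0, 0)

def toy_log_post(R):
    return -3.0 * sum(a != b for a, b in zip(R, target))

visits = metropolis_search(toy_log_post, n_vars=6, n_steps=5000)
most_visited = visits.most_common(1)[0][0]
```

The visit frequencies estimate the posterior probabilities of the configurations, so the search spends its effort on the better-supported models instead of covering all 2^n of them.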

Some conclusions

Whether a model’s formulation is appropriate is uncertain

Model reduction provides a way to test (challenge) model formulation (perhaps a brutal test)

May give us some insurance against over-parameterisation (reduce uncertainty)

Does test the understanding built into our models
All models investigated so far can be reduced by variable replacement
Most reduced models have some advantage

Useful references?

NMJ Crout, D Tarsitano, AT Wood. Is my model too complex? Evaluating model formulation using model reduction. Environmental Modelling & Software, in press.

JM Gibbons, GM Cox, ATA Wood, J Craigon, SJ Ramsden, D Tarsitano, NMJ Crout (2008). Applying Bayesian Model Averaging to mechanistic models: an example and comparison of methods. Environmental Modelling & Software, 23:973-985.

Cox GM, Gibbons JM, Wood ATA, Craigon J, Ramsden SJ, Crout NMJ (2006). Towards the systematic simplification of mechanistic models. Ecological Modelling 198:240-246.

Bernhardt, K. 2008. Finding alternatives and reduced formulations for process based models. Evolutionary Computation 16:1-16. Alternative approach.

Asgharbeygi, N., Langley, P., Bay, S., Arrigo, 2006. Inductive revision of quantitative process models. Ecol. Mod., 194:70-79. Alternative approach to model reduction, albeit without a statistical framework

Brooks RJ, Tobias AM (1996) Choosing the best model: level of detail, complexity, & model... Mathl. Comput. Mod. 24:1-14. Interesting article on model utility

Anderson, T.R., 2005. Plankton functional type modelling: running before we can walk? J. Plankton Research 27:1073-1081. Nice discussion of model complexity in context of marine systems. Quite controversial, replies to the replies to the replies.

Jakeman, A.J, Letcher, R.A., Norton, J.P., 2006 Ten iterative steps in development and evaluation of environmental models. Environmental Modelling & Software 21, 602-614. Very nice discussion on what is good practice in model development

Myung, J., Pitt, M. A., 2002. When a good fit can be bad. Trends. Cogn. Sci., 6:421-425. Good introduction to model selection criteria

Burnham, K.P., Anderson, D.R., 2002. Model Selection and Multimodel Inference: A Practical Information-Theoretic Approach. 2nd ed. New York: Springer-Verlag. Ultimate reference, but not without critics.
