Page 1: JEFS Calibration: Bayesian Model Averaging

Adrian E. Raftery

J. McLean Sloughter

Tilmann Gneiting

University of Washington

Statistics

JEFS Calibration:Bayesian Model Averaging

Eric P. Grimit

Clifford F. Mass

Jeff Baars

University of Washington

Atmospheric Sciences

Research supported by: Office of Naval Research

Multi-Disciplinary University Research Initiative (MURI)

Page 2: JEFS Calibration: Bayesian Model Averaging

23 August 2005, 11:30 AM - JEFS Technical Meeting; Monterey, CA

The General Goal

“The general goal in EF [ensemble forecasting] is to produce a probability density function (PDF) for the future state of the atmosphere that is reliable…and sharp…”

-- Plan for the Joint Ensemble Forecast System (2nd Draft),

Maj. F. Anthony Eckel

Page 3: JEFS Calibration: Bayesian Model Averaging


Calibration and Sharpness

Calibration ~ reliability (also: statistical consistency). A probability forecast p ought to verify with relative frequency p.

The verification ought to be indistinguishable from the forecast ensemble (the verification rank histogram* is uniform).

However, a forecast from climatology is reliable (by definition), so calibration alone is not enough.

Sharpness ~ resolution (also: discrimination, skill). The variance, or confidence interval, should be as small as possible, subject to calibration.

*Verification Rank Histogram

Record of where verification fell (i.e., its rank) among the ordered ensemble members:

Flat → Well-calibrated (truth is indistinguishable from ensemble members)

U-shaped → Under-dispersive (truth falls outside the ensemble range too often)

Humped → Over-dispersive

Page 4: JEFS Calibration: Bayesian Model Averaging

Typical Verification Rank Histograms

[Figure: verification rank histograms (verification rank 1-9 vs. probability, 0.0-0.4) for 36-h *UWME and *UWME+ forecasts. Panels: (a) Z500 and (b) MSLP, synoptic variables (errors depend on analysis uncertainty); (c) WS10 and (d) T2, surface/mesoscale variables (errors depend on model uncertainty). Excessive outlier percentages (EOP*), *UWME / *UWME+: 5.0% / 4.2%, 9.0% / 6.7%, 25.6% / 13.3%, and 43.7% / 21.0%.]

*Excessive Outlier Percentage

[c.f. Eckel and Mass 2005, Wea. Forecasting]

Page 5: JEFS Calibration: Bayesian Model Averaging


Objective and Constraints

Objective: Calibrate JEFS (JGE and JME) output.

Utilize available analyses/observations as surrogates for truth.

Employ a method that:

- accounts for ensemble member construction and relative skill:
  - bred-mode / ETKF initial conditions (JGE; equally skillful members)
  - multiple models (JGE and JME; differing skill for sets of members)
  - multi-scheme diversity within a single model (JME)
- is adaptive: can be rapidly relocated to any theatre of interest; does not require a long history of forecasts and observations.
- accommodates regional/local variations within the domain: spatial (grid-point) dependence of forecast error statistics.
- works for any observed variable at any vertical level.

Page 6: JEFS Calibration: Bayesian Model Averaging

23 August 2005 11:30 AMJEFS Technical Meeting; Monterey, CA

First Step: Mean Bias Correction

Calibrate the first moment: the ensemble mean.

In a multi-model and/or multi-scheme physics ensemble, individual members have unique, often compensatory, systematic errors (biases).

Systematic errors do not represent forecast uncertainty.

Implemented a member-specific bias correction for UWME using a 14-day training period (running mean).

Advantages and disadvantages:

- Ensemble spread is reduced (in an under-dispersive system).
- The ensemble spread-skill relationship is degraded (Grimit 2004, Ph.D. dissertation).
- Forecast probability skill scores improve.
- Excessive outliers are reduced.
- Verification rank histograms become quasi-symmetric.
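As a concrete illustration of this first step, here is a minimal sketch of a member-specific running-mean bias correction over a 14-day window; the function and array names are assumptions for illustration, not the operational UWME code.

```python
import numpy as np

def bias_corrected_forecast(train_fcsts, train_obs, current_fcst):
    """Member-specific mean bias correction.

    train_fcsts : (window, n_members) forecasts from the training window
    train_obs   : (window,) verifying observations/analyses
    current_fcst: (n_members,) today's raw member forecasts
    Returns bias-corrected member forecasts.
    """
    # Mean bias of each member over the training period (forecast minus truth)
    bias = (train_fcsts - train_obs[:, None]).mean(axis=0)  # (n_members,)
    # Remove each member's own systematic error
    return current_fcst - bias

# Toy usage: 14 training days, 8 members with a warm bias
rng = np.random.default_rng(0)
obs = rng.normal(15.0, 3.0, size=14)
fcsts = obs[:, None] + rng.normal(1.0, 1.0, size=(14, 8))
print(bias_corrected_forecast(fcsts, obs, fcsts[-1]))
```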

Page 7: JEFS Calibration: Bayesian Model Averaging


Second Step: Calibration

Calibrate the higher moments: the ensemble variance.

Forecast error climatology: add the error variance from a long history of forecasts and observations to the current (deterministic) forecast.

For the ensemble mean, we shall call this forecast mean error climatology (MEC).

MEC is time-invariant (a static forecast of uncertainty; a climatology).

MEC is calibrated for large samples, but not very sharp.

Advantages and disadvantages:

- Simple. Difficult to beat!
- Gaussian.
- Not practical for JGE/JME implementation, since a long history is required.
- A good baseline for comparison of calibration methods.
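In symbols, a minimal sketch of MEC under the Gaussian assumption (notation assumed here, not from the slides): the predictive distribution attaches a fixed, climatological error variance to the current ensemble-mean forecast.

```latex
% Mean error climatology (MEC), schematic:
% \bar{f}_t = current (bias-corrected) ensemble-mean forecast;
% \sigma^2_{\mathrm{clim}} = ensemble-mean error variance over a long
% history of T forecast/observation pairs (time-invariant).
y_t \mid \bar{f}_t \;\sim\; \mathcal{N}\!\left(\bar{f}_t,\; \sigma^2_{\mathrm{clim}}\right),
\qquad
\sigma^2_{\mathrm{clim}} = \frac{1}{T}\sum_{s=1}^{T}\left(y_s - \bar{f}_s\right)^2 .
```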

Page 8: JEFS Calibration: Bayesian Model Averaging

Mean Error Climatology (MEC) Performance

[Figure: CRPS for FIT vs. MEC.]

CRPS = continuous ranked probability score [the probabilistic analog of the mean absolute error (MAE) for scoring deterministic forecasts].

Comparison of *UWME 48-h 2-m temperature forecasts, with the member-specific mean bias correction (14-day running mean) applied to both:

FIT = Gaussian fit to the raw forecast ensemble
MEC = Gaussian fit to the ensemble mean + the mean error climatology

[00 UTC Cycle; October 2002 - March 2004; 361 cases]
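For reference, the standard definition of the CRPS of a predictive CDF F for a verifying observation y (not spelled out on the slide); for a point forecast it reduces to the absolute error, which is why it is the probabilistic analog of the MAE:

```latex
\mathrm{CRPS}(F, y) \;=\; \int_{-\infty}^{\infty}
\left[\, F(x) - \mathbf{1}\{x \ge y\} \,\right]^{2} dx ,
\qquad \text{lower is better.}
```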

Page 9: JEFS Calibration: Bayesian Model Averaging


Bayesian Model Averaging (BMA)

BMA has several advantages over MEC:

- A time-varying uncertainty forecast.
- A way to keep multi-modality, if it is warranted.
- Maximizes information from short (2-4 week) training periods.
- Allows for different relative skill between members through the BMA weights (multi-model, multi-scheme physics).

Bayesian Model Averaging (BMA) Summary

- Member-specific mean-bias correction parameters
- Member-specific BMA weights
- BMA variance (not member-specific here, but it can be)

[c.f. Raftery et al. 2005, Mon. Wea. Rev.]
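A minimal sketch of evaluating that predictive density, with hypothetical parameter values standing in for ones estimated from a training period:

```python
import numpy as np
from scipy.stats import norm

def bma_pdf(y, fcsts, weights, a, b, sigma):
    """Evaluate the Gaussian BMA predictive density at points y.

    fcsts   : (K,) raw member forecasts
    weights : (K,) BMA weights, summing to 1
    a, b    : (K,) member-specific bias-correction parameters
    sigma   : common BMA standard deviation
    """
    y = np.atleast_1d(y)[:, None]            # (n, 1)
    means = a + b * fcsts                    # (K,) bias-corrected centers
    # Weighted mixture of Gaussian kernels, one per member
    return (weights * norm.pdf(y, loc=means, scale=sigma)).sum(axis=1)

# Toy usage: 3 members with different weights
f = np.array([14.0, 16.5, 15.2])
w = np.array([0.5, 0.2, 0.3])
a = np.zeros(3); b = np.ones(3)              # identity bias correction
grid = np.linspace(10, 20, 5)
print(bma_pdf(grid, f, w, a, b, sigma=1.2))
```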

Page 10: JEFS Calibration: Bayesian Model Averaging

BMA Performance Using Analyses

[Figure: CRPS for MEC vs. BMA.]

BMA was initially implemented using training data from the entire UWME 12-km domain (Raftery et al. 2005, MWR): no regional variation of the BMA weights and variance parameters, and observations were used as truth.

After several attempts to implement BMA with local or regional training data, using NCEP RUC 20-km analyses as truth, we found that selecting the training data from a neighborhood of grid points with similar land-use type and elevation produced EXCELLENT results! The example application to 48-h 2-m temperature forecasts uses only 14 training days.

Page 11: JEFS Calibration: Bayesian Model Averaging

BMA-Neighbor* Calibration and Sharpness

[Figure: calibration (PIT histograms) and sharpness for MEC, BMA, and FIT.]

*Neighbors have the same land-use type and an elevation difference < 200 m, within a search radius of 3 grid points (60 km).

Probability integral transform (PIT) histograms are the analog of verification rank histograms for continuous forecasts.
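A minimal sketch of how PIT values can be computed, assuming Gaussian predictive distributions for simplicity (for a BMA mixture one would use the weighted sum of kernel CDFs instead):

```python
import numpy as np
from scipy.stats import norm

def pit_values(obs, pred_means, pred_sds):
    """PIT value = predictive CDF evaluated at the observation."""
    return norm.cdf(obs, loc=pred_means, scale=pred_sds)

# Toy check: perfectly calibrated Gaussian forecasts -> flat histogram
rng = np.random.default_rng(1)
mu = rng.normal(size=10_000)
y = rng.normal(loc=mu, scale=1.0)          # truth drawn from the forecast
pit = pit_values(y, mu, 1.0)
counts, _ = np.histogram(pit, bins=10, range=(0, 1))
print(counts)                              # roughly equal bin counts
```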

Page 12: JEFS Calibration: Bayesian Model Averaging


BMA-Neighbor* CRPS Improvement

[Figure: BMA improvement over MEC in CRPS.]

*Neighbors have the same land-use type and an elevation difference < 200 m, within a search radius of 3 grid points (60 km).

Page 13: JEFS Calibration: Bayesian Model Averaging


BMA-Neighbor Using Observations

Use observations, remote if necessary, to train BMA.

Follow the Mass-Wedam procedure for bias correction to select the BMA training data (a code sketch follows the list):

1. Choose the N closest observing locations to the center of the grid box which have similar elevation and land-use characteristics.

2. Find the K occasions during a recent period (up to Kmax days previous) on which the interpolated forecast state was similar to the current interpolated forecast state at each station n = 1, ..., N:
   a) similar ensemble-mean forecast states;
   b) similar min/median/max ensemble forecast states.

3. If N*K matches are not found, relax the similarity constraints and repeat (1) and (2).
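Here is that sketch: a schematic of step 2's similarity search with the step-3 relaxation loop. The names, the similarity measure, and the relaxation rule are illustrative assumptions, not the operational Mass-Wedam code.

```python
import numpy as np

def select_training_days(hist_fcst, current_fcst, K, tol, tol_step=0.5, max_iter=10):
    """Pick up to K past days whose ensemble-mean forecast at a station
    was 'similar' to today's, relaxing the tolerance if too few match.

    hist_fcst   : (n_days,) interpolated ensemble-mean forecasts at the station
    current_fcst: today's interpolated ensemble-mean forecast
    tol         : initial similarity tolerance (same units as the forecast)
    """
    for _ in range(max_iter):
        matches = np.flatnonzero(np.abs(hist_fcst - current_fcst) <= tol)
        if matches.size >= K:
            # Keep the K most recent matching occasions
            return matches[-K:]
        tol += tol_step        # relax the similarity constraint and retry
    return matches             # fewer than K matches even after relaxing

# Toy usage: 30 days of history, want K = 5 analogs
rng = np.random.default_rng(2)
hist = rng.normal(15.0, 4.0, size=30)
print(select_training_days(hist, current_fcst=16.0, K=5, tol=1.0))
```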

Page 14: JEFS Calibration: Bayesian Model Averaging


Summary and the Way Forward

Mean error climatology:

- Good benchmark to evaluate competing calibration methods.
- Generally beats a raw ensemble, even though it is not state-dependent.
- The ensemble mean contains most of the information we can use.
- The ensemble variance (state-dependent) is generally a poor prediction of uncertainty, at least on the mesoscale.

Bayesian model averaging (BMA):

- A calibration method that is becoming popular (CMC-MSC).
- A calibration method that meets many of the constraints that FNMOC and AFWA will face with JEFS.
- It accounts for differing relative skill of ensemble members (multi-model, multi-scheme physics).
- It is adaptive (short training period).
- It can be rapidly relocated to any theatre.
- It can be extended to any observed variable at any vertical level (although research is ongoing on this point).

Page 15: JEFS Calibration: Bayesian Model Averaging


Extending BMA to Non-Gaussian Variables

For quantities such as wind speed and precipitation, distributions are not only non-Gaussian but also not purely continuous: there are point masses at zero. For probabilistic quantitative precipitation forecasts (PQPF):

- Model P(Y=0) with a logistic regression.
- Model P(Y>0) with a finite Gamma mixture distribution.
- Fit the Gamma means as a linear regression of the cube root of the observation on the forecast and an indicator function for no precipitation.
- Fit the Gamma variance parameters and BMA weights by the EM algorithm, with some modifications.

A schematic of this mixture follows.
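That schematic, with notation assumed here (the exact specification is in the Sloughter et al. manuscript cited below): each member contributes a point mass at zero plus a gamma component for positive amounts, and the zero probability comes from a logistic regression on the cube-root-transformed forecast.

```latex
p\!\left(y \mid f_1,\dots,f_K\right) \;=\; \sum_{k=1}^{K} w_k
\Big[\, P_k(y=0)\,\mathbf{1}\{y=0\}
   \;+\; \big(1 - P_k(y=0)\big)\, g_k(y)\,\mathbf{1}\{y>0\} \,\Big],
% with, e.g., \operatorname{logit} P_k(y=0) = a_0 + a_1 f_k^{1/3} + a_2 \mathbf{1}\{f_k = 0\},
% and g_k a gamma density whose mean is linear in the cube root of the forecast.
```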

[c.f. Sloughter et al. 200x, manuscript in preparation]

Page 16: JEFS Calibration: Bayesian Model Averaging


PoP Reliability Diagrams

[Figure: reliability diagrams, with ensemble consensus voting as crosses and the BMA PQPF model as red dots.]

Results for January 1, 2003 through December 31, 2004 24-hour-accumulation PoP forecasts, with 25-day training and no regional parameter variations.

[c.f. Sloughter et al. 200x, manuscript in preparation]

Page 17: JEFS Calibration: Bayesian Model Averaging


PQPF Rank Histograms

Verification Rank Histogram

PIT Histogram

[c.f. Sloughter et al. 200x, manuscript in preparation]

Page 18: JEFS Calibration: Bayesian Model Averaging

QUESTIONS and DISCUSSION

Page 19: JEFS Calibration: Bayesian Model Averaging

Forecast Probability Skill Example

[Figure: Brier Skill Score (BSS) vs. lead time (00-48 h, every 3 h) for forecast probability of the event 10-m wind speed > 18 kt. Curves: *UWME, UWME, *UWME+, UWME+, and uncertainty (* = bias-corrected). Higher is better: BSS = 1 is perfect; BSS < 0 is worthless.]

(0000 UTC Cycle; October 2002 - March 2003) Eckel and Mass 2005
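For reference, the standard Brier score and skill score behind the figure (definitions not given on the slide), with o_t in {0,1} the observed event outcome and p_t the forecast probability:

```latex
\mathrm{BS} = \frac{1}{T}\sum_{t=1}^{T}\left(p_t - o_t\right)^2,
\qquad
\mathrm{BSS} = 1 - \frac{\mathrm{BS}}{\mathrm{BS}_{\mathrm{ref}}} ,
% BS_ref: Brier score of a reference forecast (e.g., climatology);
% BSS = 1 is perfect, BSS <= 0 is no better than the reference.
```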

Page 20: JEFS Calibration: Bayesian Model Averaging

UWME: Multi-Analysis/Forecast Collection

Abbrev.  Model / Source                                       Type          Computational          Distributed            Objective
                                                                            Resolution (~45 N)     Resolution (~45 N)     Analysis
GFS      Global Forecast System (GFS);                        Spectral      T382 / L64 (~35 km)    1.0 / L14 (~80 km)     SSI (3D-Var)
         National Centers for Environmental Prediction
CMCG     Global Environmental Multi-scale (GEM);              Finite Diff.  0.9 / L28 (~70 km)     1.25 / L11 (~100 km)   4D-Var
         Canadian Meteorological Centre
ETA      North American Mesoscale limited-area model;         Finite Diff.  12 km / L45            90 km / L37            SSI (3D-Var)
         National Centers for Environmental Prediction
GASP     Global AnalysiS and Prediction model;                Spectral      T239 / L29 (~60 km)    1.0 / L11 (~80 km)     3D-Var
         Australian Bureau of Meteorology
JMA      Global Spectral Model (GSM);                         Spectral      T213 / L40 (~65 km)    1.25 / L13 (~100 km)   4D-Var
         Japan Meteorological Agency
NGPS     Navy Operational Global Atmos. Pred. Sys.;           Spectral      T239 / L30 (~60 km)    1.0 / L14 (~80 km)     3D-Var
         Fleet Numerical Meteorological & Oceanographic Cntr.
TCWB     Global Forecast System;                              Spectral      T79 / L18 (~180 km)    1.0 / L11 (~80 km)     OI
         Taiwan Central Weather Bureau
UKMO     Unified Model;                                       Finite Diff.  5/6 x 5/9 / L30 (~60 km)  same / L12          4D-Var
         United Kingdom Meteorological Office

Page 21: JEFS Calibration: Bayesian Model Averaging

UWME+: MM5 Physics Configuration (January 2005 - current)

Perturbed surface boundary parameters according to their suspected uncertainty:

1) Albedo
2) Roughness Length
3) Moisture Availability

Assumed differences between model physics options approximate model error coming from sub-grid scales.

        Member  PBL/LSM    Soil     Vert.  Cloud         Cumulus         Cumulus         Shlw.   Radiation  SST           Land Use
                                    Diff.  Microphysics  (36-km Domain)  (12-km Domain)  Cumls.             Perturbation  Table
UWME    (all)   MRF        5-Layer  Y      Reisner II    Kain-Fritsch    Kain-Fritsch    N       CCM2       none          default
UWME+   GFS+    MRF        LSM      Y      Simple Ice    Kain-Fritsch    Kain-Fritsch    Y       RRTM       SST_pert01    LANDUSE.plus1
        CMCG+   MRF        5-Layer  Y      Reisner II    Grell           Grell           N       cloud      SST_pert02    LANDUSE.plus2
        ETA+    Eta        5-Layer  N      Goddard       Betts-Miller    Grell           Y       RRTM       SST_pert03    LANDUSE.plus3
        GASP+   MRF        LSM      Y      Shultz        Betts-Miller    Kain-Fritsch    N       RRTM       SST_pert04    LANDUSE.plus4
        JMA+    Eta        LSM      N      Reisner II    Kain-Fritsch    Kain-Fritsch    Y       cloud      SST_pert05    LANDUSE.plus5
        NGPS+   Blackadar  5-Layer  Y      Shultz        Grell           Grell           N       RRTM       SST_pert06    LANDUSE.plus6
        TCWB+   Blackadar  5-Layer  Y      Goddard       Betts-Miller   Grell            Y       cloud      SST_pert07    LANDUSE.plus7
        UKMO+   Eta        LSM      N      Reisner I     Kain-Fritsch    Kain-Fritsch    N       cloud      SST_pert08    LANDUSE.plus8

Page 22: JEFS Calibration: Bayesian Model Averaging

Member-Wise Forecast Bias Correction

[Figure: UWME+ 2-m temperature average RMSE (°C) and (shaded) average bias at 12, 24, 36, and 48 h for members GFS+, CMCG+, ETA+, GASP+, JMA+, NGPS+, TCWB+, UKMO+, and MEAN+; y-axis from -2.5 to 4.0.]

(0000 UTC Cycle; October 2002 - March 2003) Eckel and Mass 2005

Page 23: JEFS Calibration: Bayesian Model Averaging

Member-Wise Forecast Bias Correction

UWME+ 2-m Temperature, 14-day running-mean bias correction

[Figure: as the previous slide, after the 14-day running-mean bias correction: average RMSE and (shaded) average bias (°C and mb) at 12, 24, 36, and 48 h for bias-corrected members *GFS+, *CMCG+, *ETA+, *GASP+, *JMA+, *NGPS+, *TCWB+, *UKMO+, and *MEAN+ (panel labels plus01-plus08, mean); y-axis from -2.5 to 4.0.]

(0000 UTC Cycle; October 2002 - March 2003) Eckel and Mass 2005

Page 24: JEFS Calibration: Bayesian Model Averaging

Post-Processing: Probability Densities

[Figure: sample ensemble forecasts.]

Q: How should we infer forecast probability density functions from a finite ensemble of forecasts?

A: Some options are...

- Democratic Voting (DV): P = x / M, where x = # of members above (or below) the threshold and M = total # of members.

- Uniform Ranks (UR)***: assume a flat rank histogram; linearly interpolate the DV probabilities between adjacent member forecasts; extrapolate the tails using a fitted Gumbel (extreme-value) distribution.

- Parametric Fitting (FIT): fit a statistical distribution (e.g., normal) to the member forecasts.

***currently the operational scheme
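A minimal sketch of two of these options, DV and FIT (with a Gaussian for the parametric fit); the function names and toy ensemble are illustrative:

```python
import numpy as np
from scipy.stats import norm

def dv_prob_exceed(members, threshold):
    """Democratic voting: fraction of members above the threshold."""
    return np.mean(np.asarray(members) > threshold)

def fit_prob_exceed(members, threshold):
    """Parametric fit: P(X > threshold) under a normal fitted to members."""
    mu, sigma = np.mean(members), np.std(members, ddof=1)
    return norm.sf(threshold, loc=mu, scale=sigma)

# Toy usage: 8-member wind-speed ensemble, event: WS10 > 18 kt
ens = np.array([12.0, 15.5, 16.2, 17.0, 18.4, 19.1, 20.3, 22.8])
print(dv_prob_exceed(ens, 18.0))   # 0.5 (4 of 8 members exceed)
print(fit_prob_exceed(ens, 18.0))  # smoother probability from the fit
```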

Page 25: JEFS Calibration: Bayesian Model Averaging


A Concrete Example

Page 26: JEFS Calibration: Bayesian Model Averaging


A Concrete Example

Minimize False Alarms / Minimize Misses

Page 27: JEFS Calibration: Bayesian Model Averaging


How to Model Zeroes

Logit of the proportion of rain versus the cube root of the bin center.
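A hedged sketch of this zero-modeling step on synthetic data; the cube-root transform follows the slide, while the library call and variable names are assumptions:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Synthetic stand-in data: member forecast accumulations and rain/no-rain obs
rng = np.random.default_rng(3)
fcst = rng.gamma(shape=1.0, scale=2.0, size=5_000)       # forecast precip (mm)
p_rain = 1 / (1 + np.exp(-(0.8 * np.cbrt(fcst) - 0.5)))  # true model, for demo
rained = rng.random(5_000) < p_rain                      # observed rain? (Y > 0)

# Logistic regression of rain occurrence on the cube root of the forecast
X = np.cbrt(fcst).reshape(-1, 1)
model = LogisticRegression().fit(X, rained)
print(model.intercept_, model.coef_)   # recovers roughly (-0.5, 0.8)
```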

Page 28: JEFS Calibration: Bayesian Model Averaging


How to Model Non-Zeroes

Mean (left) and variance (right) of the fitted gammas on each bin.

Page 29: JEFS Calibration: Bayesian Model Averaging


Power-Transformed Obs

[Figure: observed precipitation under different power transforms: untransformed, square root, cube root, fourth root.]

Page 30: JEFS Calibration: Bayesian Model Averaging


A Possible Fix

Try a more complicated model, fitting a point mass at zero, an exponential for "drizzle," and a gamma for true rain around each member forecast.

Red: no rain, Green: drizzle, Blue: rain
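A schematic of that three-component kernel for a single member forecast (notation assumed; the mixture weights and shape parameters would be fit to training data):

```latex
h_k(y) \;=\; \pi_0\, \delta_0(y)
   \;+\; \pi_1\, \lambda e^{-\lambda y}\,\mathbf{1}\{y>0\}
   \;+\; \pi_2\, \mathrm{Gamma}\!\left(y \mid \alpha_k, \beta_k\right)\mathbf{1}\{y>0\},
\qquad \pi_0 + \pi_1 + \pi_2 = 1 .
% \pi_0: point mass at zero ("no rain"); exponential: "drizzle";
% gamma: "true rain" centered around each member forecast.
```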