Quantile regression as a means of calibrating and verifying a mesoscale NWP ensemble Tom Hopson 1 Josh Hacker 1 , Yubao Liu 1 , Gregory Roux 1 , Wanli Wu 1 , Jason Knievel 1 , Tom Warner 1 , Scott Swerdlin 1 , John Pace 2 , Scott Halvorson 2 2 U.S. Army Test and Evaluation Comma 1

Jan 17, 2016

Page 1: Quantile regression as a means of calibrating and verifying a mesoscale NWP ensemble Tom Hopson 1 Josh Hacker 1, Yubao Liu 1, Gregory Roux 1, Wanli Wu.

Quantile regression as a means of calibrating and verifying a mesoscale NWP ensemble

Tom Hopson1

Josh Hacker1, Yubao Liu1, Gregory Roux1, Wanli Wu1, Jason Knievel1, Tom Warner1, Scott Swerdlin1,

John Pace2, Scott Halvorson2

2U.S. Army Test and Evaluation Command

1

Page 2:

Outline

I. Motivation: ensemble forecasting and post-processing
II. E-RTFDDA for Dugway Proving Grounds
III. Introduce Quantile Regression (QR; Koenker and Bassett, 1978)
IV. Post-processing procedure
V. Verification results
VI. Warning: dynamically finding ensemble dispersion at the risk of ensemble mean utility
VII. Conclusions

Page 3:

Goals of an EPS

• Predict the observed distribution of events and atmospheric states
• Predict uncertainty in the day’s prediction
• Predict the extreme events that are possible on a particular day
• Provide a range of possible scenarios for a particular forecast

Page 4:

1. Greater accuracy of ensemble mean forecast (half the error variance of a single forecast)
2. Likelihood of extremes
3. Non-Gaussian forecast PDFs
4. Ensemble spread as a representation of forecast uncertainty

=> All rely on forecasts being calibrated

Further, we argue calibration is essential for tailoring to local applications:

-- NWP provides spatially- and temporally-averaged gridded forecast output
-- Applying gridded forecasts to point locations requires location-specific calibration to account for local spatial and temporal scales of variability (=> increasing ensemble dispersion)

More technically …

Page 5:

Dugway Proving Grounds, Utah e.g. T Thresholds

• Includes random and systematic differences between members.

• Not an actual chance of exceedance unless calibrated.

Page 6:

Challenges in probabilistic mesoscale prediction

• Model formulation
  • Bias (marginal and conditional)
  • Lack of variability caused by truncation and approximation
  • Non-universality of closure and forcing
• Initial conditions
  • Small scales are damped in analysis systems, and the model must develop them
  • Perturbation methods designed for medium-range systems may not be appropriate
• Lateral boundary conditions
  • After short time periods the lateral boundary conditions can dominate
  • Representing uncertainty in lateral boundary conditions is critical
• Lower boundary conditions
  • Dominate boundary-layer response
  • Difficult to estimate uncertainty in lower boundary conditions

Page 7:

RTFDDA and Ensemble-RTFDDA

Liu et al. 2010 AMS Annual Meeting, 14th IOAS-AOLS, Atlanta, GA. January 18 – 23, [email protected]

Page 8:

The Ensemble Execution Module

[Diagram: perturbations and observations feed each RTFDDA ensemble member (Member 1, Member 2, Member 3, ..., Member N); each member produces 36-48h forecasts, which go to postprocessing, input to decision support tools, and archiving and verification.]

Liu et al. 2010 AMS Annual Meeting, 14th IOAS-AOLS, Atlanta, GA. January 18 – 23, [email protected]

Page 9:

Real-time Operational Products for DPG

Operated at US Army DPG since Sep. 2007

[Figure: model domains D1, D2, D3, with example product panels: Likelihood for SPD > 10 m/s; Mean T & Wind; T Mean and SD; Wind Speed; T-2m; Wind Rose]

Surface and X-sections – Mean, Spread, Exceedance Probability, Spaghetti, …

Pin-point Surface and Profiles – Mean, Spread, Exceedance Probability, Spaghetti, Wind Roses, Histograms …

Page 10:

Forecast “calibration” or “post-processing”

[Figure: two panels showing a forecast PDF and the observation (obs); x-axis: Flow rate [m3/s]; y-axis: Probability. Labels: “calibration”, “bias”, “spread” or “dispersion”]

Post-processing has corrected:
• the “on average” bias
• as well as under-representation of the 2nd moment of the empirical forecast PDF (i.e. corrected its “dispersion” or “spread”)

Our approach:
• under-utilized “quantile regression” approach
• probability distribution function “means what it says”
• daily variations in the ensemble dispersion directly relate to changes in forecast skill => informative ensemble skill-spread relationship

Page 11:

Example of Quantile Regression (QR)

Our application

Fitting T quantiles using QR conditioned on:

1) Ranked forecast ens

2) ensemble mean

3) ensemble median

4) ensemble stdev

5) Persistence
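
A minimal sketch of the cost function that QR minimizes, the check (or "pinball") loss of Koenker and Bassett (1978). The function names and the toy temperature series are illustrative assumptions, not taken from the slides. To keep the sketch short it fits only a constant; a real application would regress on the ensemble-derived predictors listed above.

```python
def pinball_loss(tau, obs, pred):
    """Mean check (pinball) loss for quantile level tau in (0, 1)."""
    total = 0.0
    for y, q in zip(obs, pred):
        u = y - q
        # Under-prediction (u >= 0) costs tau per unit,
        # over-prediction costs (1 - tau) per unit.
        total += tau * u if u >= 0 else (tau - 1.0) * u
    return total / len(obs)


def best_constant_quantile(tau, obs):
    """Intercept-only 'regression': the observed value that minimizes the loss."""
    return min(obs, key=lambda q: pinball_loss(tau, obs, [q] * len(obs)))


# Hypothetical temperatures in K: 271, 272, ..., 281.
temps = [float(t) for t in range(271, 282)]
q75 = best_constant_quantile(0.75, temps)  # 279.0 for this toy series
```

Minimizing this loss over a constant recovers the empirical quantile, which is why minimizing it over regression coefficients yields a conditional quantile.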

Page 12:

[Figure: time series of temperature, T [K], showing forecasts and observed values. Regressor set: 1. reforecast ens; 2. ens mean; 3. ens stdev; 4. persistence; 5. LR quantile (not shown)]

Step 1: Determine climatological quantiles

[Figure: climatological PDF; x-axis: Temperature [K]; y-axis: Probability/K]

Step 2: For each quantile, use “forward step-wise cross-validation” to iteratively select the best subset
Selection requirements: a) QR cost function minimum, b) satisfy binomial distribution at 95% confidence
If requirements are not met, retain the climatological “prior”

Step 3: Segregate forecasts into differing ranges of ensemble dispersion and refit models (Step 2) uniquely for each range

[Figure: forecast time series, T [K] versus time, divided into dispersion ranges I, II, III, II, I]

Final result: “sharper” posterior PDF represented by interpolated quantiles

[Figure: forecast PDF showing prior and posterior; x-axis: Temperature [K]; y-axis: Probability/K]
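
The slides do not spell out how the binomial requirement in Step 2 is tested. One plausible reading, sketched here as an assumption, is that the number of observations falling below a fitted tau-quantile should be consistent with a Binomial(n, tau) count at 95% confidence. The function names are hypothetical.

```python
from math import comb


def binomial_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p); adequate for modest n."""
    return sum(comb(n, j) * p**j * (1.0 - p) ** (n - j) for j in range(k + 1))


def coverage_consistent(obs, quantile_fcst, tau, confidence=0.95):
    """True if the count of observations below the tau-quantile forecasts
    lies inside the central binomial interval at the given confidence."""
    n = len(obs)
    below = sum(1 for y, q in zip(obs, quantile_fcst) if y < q)
    alpha = 1.0 - confidence
    # Smallest counts whose cumulative probability passes each tail cutoff.
    lower = next(k for k in range(n + 1) if binomial_cdf(k, n, tau) > alpha / 2)
    upper = next(k for k in range(n + 1) if binomial_cdf(k, n, tau) >= 1 - alpha / 2)
    return lower <= below <= upper
```

A quantile model that fails such a check would, per Step 2, fall back to the climatological prior.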

Page 13:

Utilizing verification measures in near-real-time …

Measures used:
1) Rank histogram (converted to scalar measure)
2) Root mean square error (RMSE)
3) Brier score
4) Ranked Probability Score (RPS)
5) Relative Operating Characteristic (ROC) curve
6) New measure of ensemble skill-spread utility

=> Using these for automated calibration model selection via a weighted sum of the skill scores of each
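
As an illustration of the first measure, here is a small sketch of a rank histogram and one possible scalar summary. The slides do not say which scalar conversion was used. The chi-square distance from a flat histogram below is an assumed stand-in, and the function names are hypothetical.

```python
def rank_histogram(ensembles, obs):
    """Count how often the observation falls in each of the N+1 rank bins."""
    n_members = len(ensembles[0])
    counts = [0] * (n_members + 1)
    for members, y in zip(ensembles, obs):
        rank = sum(1 for f in members if f < y)  # ties are not randomized here
        counts[rank] += 1
    return counts


def flatness_chi2(counts):
    """Chi-square distance from a uniform histogram (0 means perfectly flat)."""
    expected = sum(counts) / len(counts)
    return sum((c - expected) ** 2 / expected for c in counts)
```

An under-dispersive ensemble piles counts into the two outer bins, which raises this statistic; a calibrated ensemble keeps it near zero apart from sampling noise.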

Page 14:

Problems with Spread-Skill Correlation …

• ECMWF spread-skill correlation (black) << 1
• Even “perfect model” correlation (blue) << 1, and it varies with forecast lead-time

[Figure: four panels for lead-times of 1 day, 4 day, 7 day, and 10 day. Panel annotations, in extraction order: ECMWF r = 0.33, “Perfect” r = 0.68; ECMWF r = (value missing in source), “Perfect” r = 0.56; ECMWF r = 0.39, “Perfect” r = 0.53; ECMWF r = 0.36, “Perfect” r = 0.49]

Page 15:

National Security Applications Program Research Applications Laboratory

3-hr dewpoint time series

[Figure: two panels, Before Calibration and After Calibration]

Station DPG S01

Page 16:

42-hr dewpoint time series

[Figure: two panels, Before Calibration and After Calibration]

Station DPG S01

Page 17:

PDFs: raw vs. calibrated

[Figure: forecast PDFs with the observation (obs) marked]

Blue is the “raw” ensemble
Black is the calibrated ensemble
Red is the observed value

Notice: significant change in both “bias” and dispersion of the final PDF (also notice PDF asymmetries)

Page 18:

3-hr dewpoint rank histograms

Station DPG S01

Page 19:

42-hr dewpoint rank histograms

Station DPG S01

Page 20:

Skill Scores

• Single value to summarize performance.
• Reference forecast - best naive guess; persistence, climatology
• A perfect forecast implies that the object can be perfectly observed
• Positively oriented – positive is good

$$SS = \frac{A_{forc} - A_{ref}}{A_{perf} - A_{ref}}$$
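
A short worked example of the skill score formula, assuming RMSE as the accuracy measure A (so a perfect forecast has A_perf = 0). The helper names and the numbers are illustrative, not results from the slides.

```python
from math import sqrt


def rmse(fcst, obs):
    """Root mean square error between paired forecasts and observations."""
    return sqrt(sum((f - o) ** 2 for f, o in zip(fcst, obs)) / len(obs))


def skill_score(a_forc, a_ref, a_perf=0.0):
    """SS = (A_forc - A_ref) / (A_perf - A_ref).

    1 is a perfect forecast, 0 matches the reference, negative is worse."""
    return (a_forc - a_ref) / (a_perf - a_ref)


# Hypothetical values: forecast RMSE of 1.5 against a reference RMSE of 2.0.
ss = skill_score(1.5, 2.0)  # 0.25, i.e. 25% of the way from reference to perfect
```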

Page 21:

Skill Score Verification

[Figure: two panels, RMSE Skill Score and CRPS Skill Score]

Reference forecasts:
Black -- raw ensemble
Blue -- persistence

Page 22:

Computational Resource Questions:

How best to utilize multi-model simulations (forecasts), especially if under-dispersive?

a) Should more dynamical variability be searched for? Or
b) Is it better to balance post-processing with multi-model utilization to create a properly dispersive, informative ensemble?

Page 23:

3-hr dewpoint rank histograms

Station DPG S01

Page 24:

RMSE of ensemble members

[Figure: two panels, 3-hr lead-time and 42-hr lead-time]

Station DPG S01

Page 25:

Significant calibration regressors

[Figure: two panels, 3-hr lead-time and 42-hr lead-time]

Station DPG S01

Page 26:

Questions revisited:

How best to utilize multi-model simulations (forecasts), especially if under-dispersive?

a) Should more dynamical variability be searched for? Or
b) Is it better to balance post-processing with multi-model utilization to create a properly dispersive, informative ensemble?

Warning: adding more models can lead to decreasing utility of the ensemble mean (even if the ensemble is under-dispersive)

Page 27:

Summary

Quantile regression provides a powerful framework for improving the whole (potentially non-Gaussian) PDF of an ensemble forecast – different regressors for different quantiles and lead-times

This framework provides an umbrella to blend together multiple statistical correction approaches (logistic regression, etc., not shown) as well as multiple regressors

As well, “step-wise cross-validation” based calibration provides a method to ensure forecast skill is no worse than climatology and persistence for a variety of cost functions

As shown here, significant improvements were made to the forecast’s ability to represent its own potential forecast error (while improving sharpness):

– uniform rank histogram
– significant spread-skill relationship (new skill-spread measure)

Care should be used before “throwing more models” at an “under-dispersive” forecast problem

Further questions: [email protected] or [email protected]

Page 28:
Page 29:

Dugway Proving Ground

Page 30:
Page 31:

other options …

Assign dispersion bins, then:

2) Average the error values in each bin, then correlate

3) Calculate individual rank histograms for each bin, convert to a scalar measure
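
A hedged sketch of option 2: sort forecast cases by ensemble spread, average the errors within each dispersion bin, then correlate the bin averages. Equal-count bins, absolute error as the error measure, and all function names are assumptions made for illustration. The slides do not specify them.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return cov / sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))


def binned_spread_skill(spreads, abs_errors, n_bins):
    """Sort cases by ensemble spread, split them into equal-count bins, and
    return (mean spread, mean absolute error) for each bin."""
    pairs = sorted(zip(spreads, abs_errors))
    size = len(pairs) // n_bins
    result = []
    for b in range(n_bins):
        # The last bin absorbs any remainder cases.
        stop = (b + 1) * size if b < n_bins - 1 else len(pairs)
        chunk = pairs[b * size:stop]
        result.append((sum(s for s, _ in chunk) / len(chunk),
                       sum(e for _, e in chunk) / len(chunk)))
    return result
```

Correlating the binned averages, for example `pearson([s for s, _ in bins], [e for _, e in bins])`, removes much of the case-to-case noise that keeps the raw spread-skill correlation well below 1.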

Page 32:
Page 33:

Example: French Broad River

Before Calibration => underdispersive

Black curve shows observations; colors are ensemble

Page 34:

Rank Histogram Comparisons

After quantile regression, rank histogram more uniform (although now slightly over-dispersive)

[Figure: two rank-histogram panels, Raw full ensemble and After calibration]

Page 35:

What Nash-Sutcliffe (RMSE) implies about Utility

Frequency Used for Quantile Fitting of Method I:

Best Model = 76%
Ensemble StDev = 13%
Ensemble Mean = 0%
Ranked Ensemble = 6%

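Page 36:

Note:

[Figure: forecast PDF with the observation (obs) marked; x-axis: Discharge; y-axis: Probability]

$$\langle\,\cdot\,\rangle_i = \text{ensemble average}, \qquad \bar{f} = \langle f_i \rangle_i$$

$$\left\langle (f_i - o)^2 \right\rangle_i \quad \text{versus} \quad (\bar{f} - o)^2$$

Simplifying:

$$\text{eq1}: \ \langle f_i^2 \rangle_i - 2 o \bar{f} + o^2$$

$$\text{eq2}: \ \bar{f}^2 - 2 o \bar{f} + o^2$$

Treating the observation as another draw from the ensemble, $o \to f_j$, and averaging over $j$ ($\Rightarrow \langle\,\cdot\,\rangle_j$):

$$\text{eq1}: \ 2\left(\langle f^2 \rangle - \bar{f}^2\right)$$

$$\text{eq2}: \ \langle f^2 \rangle - \bar{f}^2$$

$$\Rightarrow \ \text{eq1} = 2\,\text{eq2}$$

Take home message:

For a “calibrated ensemble”, the error variance of the ensemble mean is 1/2 the error variance of any ensemble member (on average), independent of the distribution being sampled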
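
The factor of two can be checked numerically. The sketch below is an illustration under stated assumptions: members and observation are drawn independently from the same (deliberately skewed) exponential distribution, and the function name and default sizes are hypothetical. For a finite ensemble of N members the expected ratio is 2N/(N+1), which approaches 2 as N grows.

```python
import random


def error_variance_ratio(n_members=50, n_cases=4000, seed=0):
    """Ratio of the mean squared error of individual members to the squared
    error of the ensemble mean, for a synthetic calibrated ensemble."""
    rng = random.Random(seed)
    member_err = 0.0
    mean_err = 0.0
    for _ in range(n_cases):
        # Observation and members come from the same distribution ("calibrated").
        members = [rng.expovariate(1.0) for _ in range(n_members)]
        obs = rng.expovariate(1.0)
        ens_mean = sum(members) / n_members
        member_err += sum((f - obs) ** 2 for f in members) / n_members
        mean_err += (ens_mean - obs) ** 2
    return member_err / mean_err
```

Swapping the exponential draw for any other distribution with finite variance should leave the ratio near 2, which is the "independent of the distribution" part of the take-home message.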

Page 37:

What Nash-Sutcliffe (RMSE) implies about Utility (cont) -- degradation with increased ensemble size

[Figure: sequentially-averaged models (ranked based on NS Score) and their resultant NS Score]

=> Notice the degradation of NS with increasing number of models (with a peak at 2 models)

=> For an equitable multi-model, NS should rise monotonically

=> Maybe a smaller subset of models would have more utility? (A contradiction for an under-dispersive ensemble?)
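
A minimal sketch of the sequential-averaging experiment, using the standard Nash-Sutcliffe efficiency. The function names and the two-model toy data are assumptions for illustration, not the models or results behind the slide.

```python
def nash_sutcliffe(fcst, obs):
    """NS = 1 - sum((f - o)^2) / sum((o - mean(o))^2); 1 is a perfect fit."""
    obs_mean = sum(obs) / len(obs)
    sse = sum((f - o) ** 2 for f, o in zip(fcst, obs))
    return 1.0 - sse / sum((o - obs_mean) ** 2 for o in obs)


def sequential_average_scores(models, obs):
    """Rank models by their own NS (best first), then score the mean of the
    top k models for k = 1..len(models)."""
    ranked = sorted(models, key=lambda m: nash_sutcliffe(m, obs), reverse=True)
    scores = []
    for k in range(1, len(ranked) + 1):
        mean_fcst = [sum(vals) / k for vals in zip(*ranked[:k])]
        scores.append(nash_sutcliffe(mean_fcst, obs))
    return scores


# Toy case: one perfect model and one with a constant bias of +1.
obs = [1.0, 2.0, 3.0, 4.0]
models = [[2.0, 3.0, 4.0, 5.0], [1.0, 2.0, 3.0, 4.0]]
scores = sequential_average_scores(models, obs)  # NS drops once the biased model is added
```

In this toy case the score falls from 1.0 to about 0.8 when the weaker model joins the average, the same kind of degradation the slide warns about.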

Page 38:

What Nash-Sutcliffe implies about Utility (cont)

… earlier results …

Initial Frequency Used for Quantile Fitting:

Best Model = 76%
Ensemble StDev = 13%
Ensemble Mean = 0%
Ranked Ensemble = 6%

… using only the top 1/3 of models to rank and form the ensemble mean …

Reduced Set Frequency Used for Quantile Fitting:

Best Model = 73%
Ensemble StDev = 3%
Ensemble Mean = 32%
Ranked Ensemble = 29%

=> Appears to be significant gains in the utility of the ensemble after “filtering” (except for drop in StDev) … however “proof is in the pudding” …
=> Examine verification skill measures …

Page 39:

Skill Score Comparisons between full- and “filtered” ensemble sets

[Figure legend: GREEN -- full calibrated multi-model; BLUE -- “filtered” calibrated multi-model; Reference -- uncalibrated set]

Points:

-- quite similar results for a variety of skill scores
-- both approaches give appreciable benefit over the original raw multi-model output
-- however, only in the CRPSS is there improvement of the “filtered” ensemble set over the full set

=> post-processing method fairly robust
=> More work (more filtering?)!