
Training Course 2009 – NWP-PR: Ensemble Verification I 1/33

Ensemble Verification I

Renate Hagedorn, European Centre for Medium-Range Weather Forecasts

Training Course 2009 – NWP-PR: Ensemble Verification I 2/33

Objective of diagnostic/verification tools

Assessing the goodness of a forecast system involves determining the skill and value of its forecasts

A forecast has skill if it predicts the observed conditions well according to some objective or subjective criteria.

A forecast has value if it helps the user to make better decisions than without knowledge of the forecast.

• Forecasts with poor skill can be valuable (e.g. a severe-weather forecast displaced in location still alerts users to a real threat nearby)

• Forecasts with high skill can be of little value (e.g. forecasting blue sky over a desert: almost always correct, but it rarely changes anyone's decision)

Training Course 2009 – NWP-PR: Ensemble Verification I 3/33

Ensemble Prediction System

• 1 control run + 50 perturbed runs (TL399 L62)

• added dimension of ensemble members: f(x, y, z, t, e)

• How do we deal with the added dimension when interpreting, verifying and diagnosing EPS output?

→ Transition from deterministic (yes/no) to probabilistic assessment

Training Course 2009 – NWP-PR: Ensemble Verification I 4/33

Assessing the quality of a forecast

• The forecast indicated a 10% probability of rain

• It did rain on the day

• Was it a good forecast?

□ Yes

□ No

□ I don’t know (what a stupid question…)

• Single probabilistic forecasts are never completely wrong or right (unless they give 0% or 100% probabilities)

• To evaluate a forecast system we need to look at a (large) number of forecast–observation pairs

Training Course 2009 – NWP-PR: Ensemble Verification I 5/33

Assessing the quality of a forecast system

• Characteristics of a forecast system:

Consistency*: Do the observations statistically belong to the distributions of the forecast ensembles? (consistent degree of ensemble dispersion)

Reliability: Can I trust the probabilities to mean what they say?

Sharpness: How much do the forecasts differ from the climatological mean probabilities of the event?

Resolution: How much do the forecasts differ from the climatological mean probabilities of the event, and does the system get it right?

Skill: Are the forecasts better than my reference system (chance, climatology, persistence,…)?

* Note that terms like consistency, reliability etc. are not always well defined in verification theory and can be used with different meanings in other contexts

Training Course 2009 – NWP-PR: Ensemble Verification I 6/33

Rank Histogram

• Rank histograms assess whether the ensemble spread is consistent with the assumption that the observations are statistically just another member of the forecast distribution

Check whether the observations are equally distributed amongst the predicted ensemble members:

Sort the ensemble members in increasing order and determine where the observation lies with respect to the sorted members

[Figure: two example ensembles plotted along a temperature axis, illustrating a rank 1 case and a rank 4 case for the position of the observation]
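A minimal sketch of this procedure in Python/numpy (the array names obs and ens are assumptions: verifying observations of shape (n_cases,) and ensemble forecasts of shape (n_cases, n_members); ties between observation and members are ignored for simplicity):

```python
import numpy as np

def rank_histogram(obs, ens):
    """Count where each observation ranks within its sorted ensemble.

    A consistent ensemble yields roughly equal counts over the
    n_members + 1 possible ranks (a flat histogram).
    """
    n_members = ens.shape[1]
    # rank = number of sorted ensemble members lying below the observation
    ranks = np.sum(np.sort(ens, axis=1) < obs[:, None], axis=1)
    return np.bincount(ranks, minlength=n_members + 1)

# Toy check: observations drawn from the same distribution as the members
rng = np.random.default_rng(42)
ens = rng.normal(size=(10000, 50))
obs = rng.normal(size=10000)
print(rank_histogram(obs, ens))  # roughly 10000/51, i.e. ~196 per rank
```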

Training Course 2009 – NWP-PR: Ensemble Verification I 7/33

Rank Histograms

A uniform rank histogram is a necessary but not sufficient criterion for determining that the ensemble is reliable (see also: T. Hamill, 2001, MWR)

Flat histogram: OBS is indistinguishable from any other ensemble member

Sloped histogram: OBS is too often below the ensemble members (biased forecast)

U-shaped histogram: OBS is too often outside the ensemble spread (under-dispersive ensemble)

Training Course 2009 – NWP-PR: Ensemble Verification I 8/33

Reliability

• A forecast system is reliable if the predicted probabilities statistically agree with the observed frequencies, i.e. taking all cases in which the event is predicted to occur with a probability of x%, the event should occur in exactly x% of these cases; not more and not less.

• A reliability diagram displays whether a forecast system is reliable (unbiased) or produces over-confident / under-confident probability forecasts

• A reliability diagram also gives information on the resolution (and sharpness) of a forecast system

[Figure: schematic comparing the forecast PDF with the climatological PDF]

Training Course 2009 – NWP-PR: Ensemble Verification I 9/33

Reliability Diagram

Take a sample of probabilistic forecasts: e.g. 30 days x 2200 grid points = 66,000 forecasts

How often was the event (T > 25) forecast with probability X?

FC Prob. | # FC | OBS-Frequency (perfect model) | OBS-Frequency (imperfect model)
100% | 8000 | 8000 (100%) | 7200 (90%)
90% | 5000 | 4500 (90%) | 4000 (80%)
80% | 4500 | 3600 (80%) | 3000 (66%)
... | ... | ... | ...
10% | 5500 | 550 (10%) | 800 (15%)
0% | 7000 | 0 (0%) | 700 (10%)

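A sketch of how such a table can be computed from a verification sample (hypothetical arrays: p holds forecast probabilities in [0, 1], o holds the binary outcomes; the probabilities are grouped into fixed bins):

```python
import numpy as np

def reliability_table(p, o, n_bins=11):
    """Per probability bin: number of forecasts ('# FC') and observed
    relative frequency ('OBS-Frequency')."""
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    idx = np.clip(np.digitize(p, edges) - 1, 0, n_bins - 1)
    n_fc = np.bincount(idx, minlength=n_bins)              # forecasts per bin
    n_hit = np.bincount(idx, weights=o, minlength=n_bins)  # observed events
    obs_freq = np.divide(n_hit, n_fc, out=np.full(n_bins, np.nan),
                         where=n_fc > 0)
    centres = 0.5 * (edges[:-1] + edges[1:])
    return centres, n_fc, obs_freq
```

Plotting obs_freq against the bin centres gives the reliability diagram shown on the next page; n_fc carries the sharpness information.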

Training Course 2009 – NWP-PR: Ensemble Verification I 10/33

Reliability Diagram

(The same forecast sample and table as on the previous page, now plotted.)

[Figure: reliability diagram; each (FC-Probability, OBS-Frequency) pair from the table is plotted as a point, with FC-Probability on the x-axis (0 to 100) and OBS-Frequency on the y-axis (0 to 100)]

Training Course 2009 – NWP-PR: Ensemble Verification I 11/33

Reliability Diagram

[Figure: reliability diagram of an over-confident model (curve shallower than the diagonal) compared with the perfect model (diagonal)]

Training Course 2009 – NWP-PR: Ensemble Verification I 12/33

Reliability Diagram

[Figure: reliability diagram of an under-confident model (curve steeper than the diagonal) compared with the perfect model (diagonal)]

Training Course 2009 – NWP-PR: Ensemble Verification I 13/33

Reliability diagram

Reliability score (the smaller, the better)

[Figure: reliability curves of an imperfect model and a perfect model; the reliability score measures the weighted distance of the curve from the diagonal]

Training Course 2009 – NWP-PR: Ensemble Verification I 14/33

Components of the Brier Score

$$\mathrm{REL} = \frac{1}{N}\sum_{i=1}^{I} n_i \,(f_i - o_i)^2$$

N = total number of cases
I = number of probability bins
n_i = number of cases in probability bin i
f_i = forecast probability in probability bin i
o_i = frequency of the event being observed when forecast with f_i

Reliability: forecast probability vs. observed relative frequencies

Training Course 2009 – NWP-PR: Ensemble Verification I 15/33

Reliability diagram

[Figure: two reliability diagrams, one with poor resolution (curve staying close to the climatological frequency c) and one with good resolution (curve departing strongly from c); the size of the red bullets represents the number of forecasts in each probability category (sharpness)]

Reliability score (the smaller, the better)

Resolution score (the bigger, the better)

Training Course 2009 – NWP-PR: Ensemble Verification I 16/33

Components of the Brier Score

$$\mathrm{REL} = \frac{1}{N}\sum_{i=1}^{I} n_i \,(f_i - o_i)^2 \qquad
\mathrm{RES} = \frac{1}{N}\sum_{i=1}^{I} n_i \,(o_i - c)^2 \qquad
\mathrm{UNC} = c\,(1 - c)$$

N = total number of cases
I = number of probability bins
n_i = number of cases in probability bin i
f_i = forecast probability in probability bin i
o_i = frequency of the event being observed when forecast with f_i
c = frequency of the event being observed in the whole sample

Reliability: forecast probability vs. observed relative frequencies

Resolution: ability to issue reliable forecasts close to 0% or 100%

Uncertainty: variance of the observed frequency in the sample

Brier Score = Reliability - Resolution + Uncertainty:

$$\mathrm{BS} = \mathrm{REL} - \mathrm{RES} + \mathrm{UNC}$$
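A sketch of this decomposition under the same assumptions as before (arrays p and o). Note that BS = REL - RES + UNC holds exactly only when all forecasts in a bin share the same probability (e.g. the 51 discrete probabilities of a 50-member ensemble); with coarser binning a small within-bin residual remains:

```python
import numpy as np

def brier_components(p, o, n_bins=11):
    """REL, RES and UNC of the Brier score, following the formulas above."""
    N = len(p)
    c = o.mean()                        # climatological event frequency
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    idx = np.clip(np.digitize(p, edges) - 1, 0, n_bins - 1)
    rel = res = 0.0
    for i in range(n_bins):
        sel = idx == i
        n_i = sel.sum()
        if n_i == 0:
            continue                    # skip empty probability bins
        f_i = p[sel].mean()             # (mean) forecast probability in bin i
        o_i = o[sel].mean()             # observed frequency in bin i
        rel += n_i * (f_i - o_i) ** 2
        res += n_i * (o_i - c) ** 2
    unc = c * (1.0 - c)
    return rel / N, res / N, unc        # BS ~ REL - RES + UNC
```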

Training Course 2009 – NWP-PR: Ensemble Verification I 17/33

Brier Score

• The Brier score is a measure of the accuracy of probability forecasts

• Considering N forecast–observation pairs, the BS is defined as:

$$\mathrm{BS} = \frac{1}{N}\sum_{n=1}^{N} (p_n - o_n)^2$$

with p: forecast probability (fraction of members predicting the event), o: observed outcome (1 if the event occurs; 0 if it does not)

• BS varies from 0 (perfect deterministic forecasts) to 1 (perfectly wrong!)

• BS corresponds to the RMS error of deterministic forecasts
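In code the score is a one-liner; a sketch, picking up the "10% probability of rain" example from page 4/33:

```python
import numpy as np

def brier_score(p, o):
    """BS = mean squared difference between forecast probability and outcome."""
    return np.mean((np.asarray(p, float) - np.asarray(o, float)) ** 2)

# The single 10% rain forecast from earlier, on a day when it rained:
print(brier_score([0.1], [1.0]))  # 0.81: a poor score, but not 'perfectly wrong'
```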

Training Course 2009 – NWP-PR: Ensemble Verification I 18/33

Brier Skill Score

• Skill scores are used to compare the performance of forecasts with that

of a reference forecast such as climatology or persistence

$$\mathrm{BSS} = 1 - \frac{\mathrm{BS}}{\mathrm{BS}_c}$$

• positive (negative) BSS: better (worse) than the reference

• Constructed so that a perfect FC takes the value 1 and the reference FC the value 0:

$$\text{Skill score} = \frac{\text{score of current FC} - \text{score of reference FC}}{\text{score of perfect FC} - \text{score of reference FC}}$$
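As a worked example with assumed round numbers: a forecast system with BS = 0.08, verified against a climatological reference with BS_c = 0.10, has

$$\mathrm{BSS} = 1 - \frac{0.08}{0.10} = 0.2,$$

i.e. a 20% improvement over climatology.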

Training Course 2009 – NWP-PR: Ensemble Verification I 19/33

Brier Skill Score & Reliability Diagram

• How to construct the area of positive skill? For the climatological reference, BS_c = UNC, so:

$$\mathrm{BSS} = 1 - \frac{\mathrm{BS}}{\mathrm{BS}_c} = 1 - \frac{\mathrm{REL} - \mathrm{RES} + \mathrm{UNC}}{\mathrm{UNC}} = \frac{\mathrm{RES} - \mathrm{REL}}{\mathrm{UNC}}$$

[Figure: reliability diagram (Observed Frequency vs. Forecast Probability) annotated with the diagonal of perfect reliability, the climatological frequency (line of no resolution), the line of no skill halfway between the two, and the shaded area of skill where RES > REL]

Training Course 2009 – NWP-PR: Ensemble Verification I 20/33

Reliability: 2m-Temp. > 0

[Figure: reliability diagrams for the seven DEMETER single models and the DEMETER multi-model; BSS, reliability score and resolution score per panel:]

Model | BSS | Rel-Sc | Res-Sc
CERFACS | 0.039 | 0.899 | 0.141
CNRM | 0.039 | 0.899 | 0.140
ECMWF | 0.095 | 0.926 | 0.169
INGV | -0.001 | 0.877 | 0.123
LODYC | 0.065 | 0.918 | 0.147
MPI | -0.064 | 0.838 | 0.099
UKMO | 0.047 | 0.893 | 0.153
DEMETER (multi-model) | 0.204 | 0.990 | 0.213

DEMETER: 1 month lead, start date May, 1980 - 2001

Training Course 2009 – NWP-PR: Ensemble Verification I 21/33

Assessing the quality of a forecast system

• Characteristics of a forecast system and the tools that assess them:

Consistency: Do the observations statistically belong to the distributions of the forecast ensembles? (consistent degree of ensemble dispersion) → Rank Histogram

Reliability: Can I trust the probabilities to mean what they say? → Reliability Diagram

Sharpness: How much do the forecasts differ from the climatological mean probabilities of the event? → Reliability Diagram

Resolution: How much do the forecasts differ from the climatological mean probabilities of the event, and does the system get it right? → Reliability Diagram

Skill: Are the forecasts better than my reference system (chance, climatology, persistence, …)? → Brier Skill Score

Training Course 2009 – NWP-PR: Ensemble Verification I 22/33

Discrimination

• Until now, we looked at the question:

If the forecast system predicts x, what is the observation y?

• When we are interested in the ability of a forecast system to discriminate between events and non-events, we investigate the question:

If the event y occurred, what was the forecast x?

• Based on signal-detection theory, the Relative Operating Characteristic (ROC) measures this discrimination ability

• The ROC curve is defined as the curve of hit rate (H) plotted against false alarm rate (F)

• H and F can be calculated from the contingency table

Training Course 2009 – NWP-PR: Ensemble Verification I 23/33

Verification of two category (yes/no) situations

• Compute the 2 x 2 contingency table (for a set of cases):

Event forecasted \ Event observed | Yes | No | total
Yes | a | b | a+b
No | c | d | c+d
total | a+c | b+d | a+b+c+d = n

• Event Probability: s = (a+c) / n

• Probability of a Forecast of occurrence: r = (a+b) / n

• Frequency Bias: B = (a+b) / (a+c)

• Proportion Correct: PC = (a+d) / n
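A sketch of these measures as a single function (cell counts a, b, c, d as arguments; the hit rate, false alarm rate and false alarm ratio defined on page 27/33 are included as well):

```python
def contingency_measures(a, b, c, d):
    """Measures derived from a 2 x 2 contingency table of counts."""
    n = a + b + c + d
    return {
        "s (event probability)": (a + c) / n,
        "r (forecast probability)": (a + b) / n,
        "B (frequency bias)": (a + b) / (a + c),
        "PC (proportion correct)": (a + d) / n,
        "H (hit rate)": a / (a + c),
        "F (false alarm rate)": b / (b + d),
        "FAR (false alarm ratio)": b / (a + b) if a + b else float("nan"),
    }

# Finley tornado forecasts (next page): a=28, b=72, c=23, d=2680
for name, value in contingency_measures(28, 72, 23, 2680).items():
    print(f"{name}: {value:.3f}")
```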

Training Course 2009 – NWP-PR: Ensemble Verification I 24/33

Example of Finley Tornado Forecasts (1884)

• Compute the 2 x 2 contingency table (for a set of cases):

Event forecasted \ Event observed | Yes | No | total
Yes | 28 | 72 | 100
No | 23 | 2680 | 2703
total | 51 | 2752 | 2803

• Event Probability: s = (a+c) / n = 51/2803 = 0.018

• Probability of a Forecast of occurrence: r = (a+b) / n = 100/2803 = 0.036

• Frequency Bias: B = (a+b) / (a+c) = 100/51 = 1.961

• Proportion Correct: PC = (a+d) / n = 2708/2803 = 0.966

96.6% Accuracy

Training Course 2009 – NWP-PR: Ensemble Verification I 25/33

Example of Finley Tornado Forecasts (1884)

• The same verification sample, but with tornadoes never forecast:

Event forecasted \ Event observed | Yes | No | total
Yes | 0 | 0 | 0
No | 51 | 2752 | 2803
total | 51 | 2752 | 2803

• Event Probability: s = (a+c) / n = 51/2803 = 0.018

• Probability of a Forecast of occurrence: r = (a+b) / n = 0/2803 = 0.0

• Frequency Bias: B = (a+b) / (a+c) = 0/51 = 0.0

• Proportion Correct: PC = (a+d) / n = 2752/2803 = 0.982

98.2% Accuracy!

Training Course 2009 – NWP-PR: Ensemble Verification I 26/33

Some Scores and Skill Scores

Score | Formula | Finley (original) | Finley (never fc T.) | Finley (always fc T.)
Proportion Correct | PC = (a+d)/n | 0.966 | 0.982 | 0.018
Threat Score | TS = a/(a+b+c) | 0.228 | 0.000 | 0.018
Odds Ratio | Θ = (ad)/(bc) | 45.3 | - | -
Odds Ratio Skill Score | Q = (ad-bc)/(ad+bc) | 0.957 | - | -
Heidke Skill Score | HSS = 2(ad-bc) / [(a+c)(c+d)+(a+b)(b+d)] | 0.355 | 0.0 | 0.0
Peirce Skill Score | PSS = (ad-bc) / [(a+c)(b+d)] | 0.523 | 0.0 | 0.0
Clayton Skill Score | CSS = (ad-bc) / [(a+b)(c+d)] | 0.271 | - | -
Gilbert Skill Score (ETS) | GSS = (a-a_ref)/(a-a_ref+b+c), a_ref = (a+b)(a+c)/n | 0.216 | 0.0 | 0.0
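A sketch that reproduces the "Finley (original)" column (the degenerate "never" and "always" tables, marked "-" above, would need extra zero-division guards):

```python
def skill_scores(a, b, c, d):
    """Skill scores from the table above (cast to float for the ratios)."""
    a, b, c, d = map(float, (a, b, c, d))
    n = a + b + c + d
    a_ref = (a + b) * (a + c) / n       # hits expected from random forecasts
    return {
        "TS":  a / (a + b + c),
        "odds ratio": a * d / (b * c),
        "Q":   (a * d - b * c) / (a * d + b * c),
        "HSS": 2 * (a * d - b * c) / ((a + c) * (c + d) + (a + b) * (b + d)),
        "PSS": (a * d - b * c) / ((a + c) * (b + d)),
        "CSS": (a * d - b * c) / ((a + b) * (c + d)),
        "GSS": (a - a_ref) / (a - a_ref + b + c),
    }

print(skill_scores(28, 72, 23, 2680))
# TS 0.228, odds ratio 45.3, Q 0.957, HSS 0.355, PSS 0.523,
# CSS 0.271, GSS 0.216 (rounded)
```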

Training Course 2009 – NWP-PR: Ensemble Verification I 27/33

Verification of two category (yes/no) situations

• Compute the 2 x 2 contingency table (for a set of cases):

Event forecasted \ Event observed | Yes | No | total
Yes | a | b | a+b
No | c | d | c+d
total | a+c | b+d | a+b+c+d = n

• Event Probability: s = (a+c) / n

• Probability of a Forecast of occurrence: r = (a+b) / n

• Frequency Bias: B = (a+b) / (a+c)

• Hit Rate: H = a / (a+c)

• False Alarm Rate: F = b / (b+d)

• False Alarm Ratio: FAR = b / (a+b)

Training Course 2009 – NWP-PR: Ensemble Verification I 28/33

Example of Finley Tornado Forecasts (1884)

• Compute the 2 x 2 contingency table (for a set of cases):

Event forecasted \ Event observed | Yes | No | total
Yes | 28 | 72 | 100
No | 23 | 2680 | 2703
total | 51 | 2752 | 2803

• Event Probability: s = (a+c) / n = 0.018

• Probability of a Forecast of occurrence: r = (a+b) / n = 0.036

• Frequency Bias: B = (a+b) / (a+c) = 1.961

• Hit Rate: H = a / (a+c) = 0.549

• False Alarm Rate: F = b / (b+d) = 0.026

• False Alarm Ratio: FAR = b / (a+b) = 0.720

Training Course 2009 – NWP-PR: Ensemble Verification I 29/33

Extension of the 2 x 2 contingency table for probabilistic forecasts

For each probability threshold, all forecasts above that threshold are counted as "yes" forecasts, and H and F are computed from the accumulated rows:

Event forecasted | Obs: Yes | Obs: No | threshold | H | F
>80% - 100% | 30 | 5 | >80% | 0.29 | 0.05
>60% - 80% | 25 | 10 | >60% | 0.52 | 0.14
>40% - 60% | 20 | 15 | >40% | 0.71 | 0.29
>20% - 40% | 15 | 20 | >20% | 0.86 | 0.48
>0% - 20% | 10 | 25 | >0% | 0.95 | 0.71
0% | 5 | 30 |  | 1.00 | 1.00
total | 105 | 105 |  |  |

[Figure: the (F, H) pairs for the thresholds >80%, >60%, >40%, >20%, >0% plotted as points in a Hit Rate vs. False Alarm Rate diagram]
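A sketch that reproduces the H and F columns from the bin counts above and estimates the ROC area with the trapezoid rule (for these counts the trapezoidal area is about 0.78; a fitted ROC curve can give a somewhat different value):

```python
import numpy as np

# Observed yes/no counts per forecast bin, from '>80%-100%' down to '0%'
obs_yes = np.array([30, 25, 20, 15, 10, 5])
obs_no  = np.array([ 5, 10, 15, 20, 25, 30])

# Lowering the threshold accumulates the bins above it
H = np.cumsum(obs_yes) / obs_yes.sum()   # hit rate per threshold
F = np.cumsum(obs_no) / obs_no.sum()     # false alarm rate per threshold
print(np.round(H, 2))  # [0.29 0.52 0.71 0.86 0.95 1.  ]
print(np.round(F, 2))  # [0.05 0.14 0.29 0.48 0.71 1.  ]

# Area under the ROC curve (trapezoid rule, curve anchored at the origin)
Hc = np.concatenate(([0.0], H))
Fc = np.concatenate(([0.0], F))
A = np.sum(0.5 * (Hc[1:] + Hc[:-1]) * np.diff(Fc))
print(round(A, 2))  # ~0.78
```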

Training Course 2009 – NWP-PR: Ensemble Verification I 30/33

ROC curve

• The ROC curve is a plot of H against F for a range of probability thresholds: a low threshold gives points towards the upper right, a high threshold towards the lower left

[Figure: ROC curve through points for low, moderate and high thresholds, with the area under the curve shaded; A = 0.83 in this example]

• The ROC area A (area under the ROC curve) is a skill measure: A = 0.5 (no skill), A = 1 (perfect deterministic forecast)

• The ROC curve is independent of forecast bias, i.e. it represents potential skill

• ROC is conditioned on the observations (if y occurred, what did the FC predict?)

Training Course 2009 – NWP-PR: Ensemble Verification I 31/33

ROCSS vs. BSS

$$\mathrm{ROCSS} = 2A - 1 \qquad\qquad \mathrm{BSS} = 1 - \frac{\mathrm{BS}}{\mathrm{BS}_c}$$

• ROCSS or BSS > 0 indicates a skilful forecast system

[Figure: ROC skill score vs. Brier skill score, Northern Extra-Tropics 500 hPa anomalies > 2σ (spring 2002); Richardson, 2005]
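As a worked example with the ROC area quoted on page 30/33 (A = 0.83):

$$\mathrm{ROCSS} = 2A - 1 = 2 \times 0.83 - 1 = 0.66$$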

Training Course 2009 – NWP-PR: Ensemble Verification I 32/33

Summary I

• A forecast has skill if it predicts the observed conditions well according to some objective or subjective criteria

• To evaluate a forecast system we need to look at a (large) number of forecast – observation pairs

• Different scores measure different characteristics of the forecast system: Reliability / Resolution, Brier Score and Brier Skill Score (BSS), ROC, …

• Perception of usefulness of ensemble may vary with score used

• It is important to understand the behaviour of the different scores and to choose them appropriately

Training Course 2009 – NWP-PR: Ensemble Verification I 33/33

Goal of Practice Session

• How to construct a contingency table

• How to plot a Reliability Diagram (including Frequency Diagram) from the contingency table

• How to interpret Reliability and Frequency Diagram

• How to calculate the Brier Score and Brier Skill Score:
  - the "direct" way
  - from the contingency table (BS = REL - RES + UNC)

• How to plot a ROC Diagram
  - Compare the characteristics of the Reliability and ROC diagrams