Nonlin. Processes Geophys., 27, 473–487, 2020
https://doi.org/10.5194/npg-27-473-2020
© Author(s) 2020. This work is distributed under the Creative Commons
Attribution 4.0 License.
Statistical postprocessing of ensemble forecasts for severe weather at
Deutscher Wetterdienst
Reinhold Hess
Deutscher Wetterdienst, Offenbach, Germany
Correspondence: Reinhold Hess ([email protected])
Received: 7 January 2020 – Discussion started: 20 January 2020
Revised: 13 August 2020 – Accepted: 19 August 2020 – Published:
6 October 2020
Abstract. This paper gives an overview of Deutscher Wetterdienst’s
(DWD’s) postprocessing system called Ensemble-MOS together with its
motivation and the design consequences for probabilistic forecasts of
extreme events based on ensemble data. Forecasts of the ensemble
systems COSMO-D2-EPS and ECMWF-ENS are statistically optimised and
calibrated by Ensemble-MOS with a focus on severe weather in order to
support the warning decision management at DWD.
Ensemble mean and spread are used as predictors for linear and
logistic multiple regressions to correct for conditional biases. The
predictands are derived from synoptic observations and include
temperature, precipitation amounts, wind gusts and many more and are
statistically estimated in a comprehensive model output statistics
(MOS) approach. Long time series and collections of stations are used
as training data that capture a sufficient number of observed events,
as required for robust statistical modelling.
Logistic regressions are applied to probabilities that predefined
meteorological events occur. Details of the implementation including
the selection of predictors with testing for significance are
presented. For probabilities of severe wind gusts global logistic
parameterisations are developed that depend on local estimations of
wind speed. In this way, robust probability forecasts for extreme
events are obtained while local characteristics are preserved.
The problems of Ensemble-MOS, such as model changes and consistency
requirements, which occur with the operational MOS systems of the DWD,
are addressed.
1 Introduction
Ensemble forecasting emerged with the understanding of the limited
predictability of weather. This limitation is caused by sparse and
imperfect observations, approximate numerical data assimilation and
modelling, and by the chaotic physical nature of the atmosphere. The
basic idea of ensemble forecasting is to vary observations, initial
and boundary conditions, and physical parameterisations within their
assumed scale of uncertainty and rerun the forecast model with these
changes.
The obtained ensemble of forecasts expresses the distribution of
possible weather scenarios to be expected. Probabilistic forecasts can
be derived from the ensemble, like forecast errors, probabilities for
special weather events, quantiles of the distribution or even
estimations of the full distribution. The ensemble spread is often
used as an estimate of forecast errors. In a perfect ensemble system
the spread is statistically consistent with the forecast error of the
ensemble mean against observations (e.g. Wilks, 2011); in practice,
however, it is often found to be too small, especially for
near-surface weather elements and short lead times. Typically, an
optimal spread–skill relationship close to 1 and the forecast
reliability it implies are obtained much more easily for atmospheric
variables in higher vertical layers, e.g. 500 hPa geopotential height,
than for screen-level variables like 2 m temperature, 10 m wind speed
or precipitation (e.g. Buizza et al., 2005; Gebhardt et al., 2011;
Buizza, 2018); see also Sect. 4.
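The spread–skill relationship mentioned above can be illustrated with a
minimal sketch. It assumes that the ratio is formed from the square
root of the mean ensemble variance and the root mean square error of
the ensemble mean; the function name and the toy numbers are
hypothetical and not taken from the paper, and finite-ensemble
correction factors are ignored.

```python
import math

def spread_skill_ratio(ensembles, observations):
    """Ratio of mean ensemble spread to RMSE of the ensemble mean.

    ensembles: list of forecast cases, each a list of member values.
    observations: list of verifying observations, one per case.
    A value close to 1 indicates a statistically consistent spread;
    values below 1 indicate an underdispersive ensemble.
    """
    variance_sum = 0.0
    squared_error_sum = 0.0
    for members, obs in zip(ensembles, observations):
        m = len(members)
        mean = sum(members) / m
        # Unbiased sample variance of the ensemble members.
        variance_sum += sum((x - mean) ** 2 for x in members) / (m - 1)
        squared_error_sum += (mean - obs) ** 2
    n = len(observations)
    spread = math.sqrt(variance_sum / n)
    rmse = math.sqrt(squared_error_sum / n)
    return spread / rmse

# Toy example (illustrative values only): two cases with three members.
ratio = spread_skill_ratio([[1.0, 2.0, 3.0], [2.0, 4.0, 6.0]], [5.0, 9.0])
print(ratio)
```

In this toy example the observations lie far outside the ensembles, so
the ratio is well below 1, which is the underdispersive situation
described in the text.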
In order to make best use of the probabilistic information contained
in the ensembles, e.g. by relating probabilities for harmful weather
events to economic value in cost–loss evaluations (e.g. Wilks, 2001;
Ben Bouallègue et al., 2015), the ensemble forecasts should be
calibrated to observed relative frequencies as motivated by Buizza
(2018). Warning thresholds are the levels of probabilities at which
meteorological warnings are to be issued. These thresholds may be
tailored to the public depending on categorical scores such as
probability of detection (POD) and false alarm ratio (FAR).
Statistical reliability of forecast probabilities is considered
essential for qualified threshold definitions and for automated
warning guidance.
Published by Copernicus Publications on behalf of the European
Geosciences Union & the American Geophysical Union.
For deterministic forecasts statistical postprocessing is used for
optimisation and interpretation. This is likewise true for ensemble
forecasts, where statistical calibration is an additional application
of postprocessing. Gneiting et al. (2007) describe probabilistic
postprocessing as a method to maximise the sharpness of a predictive
distribution under the condition of calibration (the climatologic
average is calibrated too; however, it has no sharpness and is useless
as a forecast). Nevertheless, optimisation is still an issue for
ensemble forecasts. In general, the systematic errors of the
underlying numerical model turn up in each forecast member and thus
are retained in the ensemble mean. Averaging only reduces the random
errors of the ensemble members.
Because postprocessing can improve the skill and reliability of
probabilistic forecasts, many different postprocessing methods exist
for both single- and multi-model ensembles. There are comprehensive
multivariate systems and univariate systems that are specific to a
certain forecast element. Length of training data generally depends on
the statistical method and application; however, the availability of
data is also often a serious limitation. Some systems perform
individual training for different locations in order to account for
local characteristics, whilst others apply the same statistical model
to collections of stations or grid points. Global modelling improves
statistical sampling at the cost of orographic and climatologic
disparities.
Classical MOS systems tend to underestimate forecast errors if
corrections are applied to each ensemble member individually. In order
to maintain forecast variability, Vannitsem (2009) suggests
considering observation errors. Gneiting et al. (2005) propose
non-homogeneous Gaussian regression (NGR) that relies on Gaussian
distributions. The location and scale parameters of the Gaussian
distributions correspond to a linear function of the ensemble mean and
ensemble spread, respectively. The NGR coefficients are trained by
minimising the continuous ranked probability score (CRPS). In Bayesian
model averaging (BMA) (e.g. Raftery et al., 2005; Möller et al., 2013)
distributions of already bias-corrected forecasts are combined as
weighted averages using kernel functions.
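As an illustration of the NGR idea, the following sketch evaluates the
well-known closed-form CRPS of a Gaussian predictive distribution for
given coefficients. The parameterisation (location a + b·mean, scale
c + d·spread), the function names and the numbers are assumptions for
illustration only; the exact form used in the cited work may differ
(e.g. a linear model for the variance), and the minimisation over the
coefficients is not shown.

```python
import math

def ngr_predictive(ens_mean, ens_spread, a, b, c, d):
    """Location and scale of the Gaussian predictive distribution.

    Assumed illustrative parameterisation: mu = a + b * mean,
    sigma = c + d * spread.
    """
    return a + b * ens_mean, c + d * ens_spread

def crps_gaussian(mu, sigma, obs):
    """Closed-form CRPS of a normal distribution N(mu, sigma^2)."""
    z = (obs - mu) / sigma
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    return sigma * (z * (2.0 * cdf - 1.0) + 2.0 * pdf - 1.0 / math.sqrt(math.pi))

def mean_crps(cases, a, b, c, d):
    """Training cost: mean CRPS over (ens_mean, ens_spread, obs) cases."""
    total = 0.0
    for ens_mean, ens_spread, obs in cases:
        mu, sigma = ngr_predictive(ens_mean, ens_spread, a, b, c, d)
        total += crps_gaussian(mu, sigma, obs)
    return total / len(cases)

# Hypothetical training cases: (ensemble mean, ensemble spread, observation).
cases = [(10.0, 1.0, 11.0), (5.0, 2.0, 4.0), (0.0, 0.5, 0.2)]
print(mean_crps(cases, a=0.0, b=1.0, c=0.5, d=1.0))
```

A numerical optimiser would vary a, b, c and d to minimise `mean_crps`
over the training sample.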
Many different postprocessing methods tailored to different variables
exist; only some are mentioned here. For 24-hourly precipitation
Hamill (2012) presents a multimodel ensemble postprocessing based on
extended logistic regression and 8 years of training data. Hamill et
al. (2017) describe a method to blend high-resolution multimodel
ensembles by quantile mapping with short training periods of about 2
months for 6- and 12-hourly precipitation. Postprocessing methods
specialising in wind speed have been developed as well; e.g. Sloughter
et al. (2013) use BMA in combination with Gamma distributions. An
overview of conventional univariate postprocessing approaches is given
in Wilks (2018).
In addition to the univariate postprocessing methods mentioned above,
there also exist approaches to model spatio-temporal dependence
structures and hence to produce ensembles of forecast scenarios. This
makes it possible, for instance, to estimate area-related
probabilities. Schefzik et al. (2013) and Schefzik and Möller (2018)
use ensemble copula coupling (ECC) and Schaake shuffle-based
approaches in order to generate postprocessed forecast scenarios for
temperature, precipitation and wind. Ensembles of ECC forecast
scenarios provide high flexibility in product generation, under the
constraint that all ensemble data are accessible.
Fewer methods focus on extreme events of precipitation and wind gusts
that are essential for automated warning support. Friederichs et al.
(2018) use the tails of generalised extreme-value distributions in
order to estimate conditional probabilities of extreme events. As
extreme meteorological events are (fortunately) rare, long time series
are required to capture a sufficiently large number of observed events
in order to derive statistically significant estimations. For example,
strong precipitation events with rain amounts of more than 15 mm h−1
are captured only about once a year at each rain gauge within Germany.
Extreme events with more than 40 and 50 mm rarely appear;
nevertheless, warnings are essential when they do.
With long time series, a significant portion of the data consists of
calm weather without relevance for warnings. It is problematic,
however, to restrict or focus training data on severe events. In doing
so, predictors might be selected that are highly correlated with the
selected series of severe events but accidentally also with calm
scenarios that are not contained in the training data. In order to
exclude these spurious predictors and to derive skilful statistical
models, more general training data need to be used, since otherwise
overforecasting presumably results and frequency bias (FB) and FAR
increase. This basically corresponds to the idea of the forecaster’s
dilemma (see Lerch et al., 2017), which states that overforecasting is
a promising strategy when forecasts are evaluated mainly for extreme
events.
The usage of probabilistic forecasts for warnings of severe weather
also influences the way the forecasts need to be evaluated. For
verification, too, long time periods are required to capture enough
extreme and rare events to derive statistically significant results.
Verification scores like root mean square error (RMSE) or CRPS (e.g.
Hersbach, 2000; Gneiting et al., 2005) are highly dominated by the
overwhelming majority of cases when no event occurred. Excellent but
irrelevant forecasts of calm weather can suggest good verification
results, although the few relevant extreme cases might not be forecast
well. Categorical scores like POD and FAR are considered more relevant
for rare and extreme cases, along with other more complex scores like
Heidke skill score (HSS) or equitable threat score (ETS). Also,
scatter diagrams reveal outliers and are sensitive to extreme values.
Here we present a MOS approach that has been tailored to
postprocessing ensemble forecasts for extreme and rare events. It is
named Ensemble-MOS and has been set up at DWD in order to support
warning management with probabilistic forecasts of potentially harmful
weather events within AutoWARN; see Reichert et al. (2015) and
Reichert (2016, 2017). Altogether 37 different warning elements exist
at DWD, including heavy rain and strong wind gusts, both at several
levels of intensity, thunderstorms, snowfall, fog, limited visibility,
frost and others. Currently the ensemble systems COSMO-D2-EPS and
ECMWF-ENS are statistically optimised and calibrated using several
years of training data, but Ensemble-MOS is applicable to other
ensembles in general.
At DWD, statistically postprocessed forecasts of the ensemble systems
COSMO-D2-EPS and ECMWF-ENS and also of the deterministic models ICON
and ECMWF-IFS are combined in a second step in order to provide a
consistent data set and a seamless transition from very short-term to
medium-range forecasts. This combined product provides a single-voice
basis for the generation of warning proposals; see Reichert et al.
(2015). The combination is based on a second MOS approach similar to
the system described here; it uses the individual statistical
forecasts of the numerical models as predictors. As a linear
combination of calibrated forecasts does not necessarily preserve
calibration (e.g. Ranjan and Gneiting, 2010), additional constant
predictors are added to the MOS equations as a remedy. Primo (2016)
and Reichert (2017) state that automated warnings of wind gusts based
on the combined product achieve a performance that is comparable to
that of human forecasters.
The remainder of the paper is organised as follows: after the
introduction, the observations and ensemble systems used are
introduced in Sect. 2. Thereafter, Sect. 3 describes the conceptual
design of Ensemble-MOS with the definition of predictands and
predictors and provides technical details of the stepwise linear and
logistic regressions. Especially for extreme wind gusts, a global
logistic regression is presented that uses statistical forecasts of
the speed of wind gusts as predictors for probabilities of strong
events. General caveats of MOS like model changes and forecast
consistency are addressed at the end of that section. The results
shown here focus on wind gusts and are provided in Sect. 4. Finally,
Sect. 5 provides a summary and conclusions.
2 Observations and ensemble data
Synoptic observations and model data from the ensemble systems
COSMO-D2-EPS and ECMWF-ENS are used as training data and for current
statistical forecasts. Time series of 8 years of observations and
model data have been gathered for training at the time of writing. The
data used are introduced in the following.
2.1 Synoptic observations
Observations of more than 320 synoptic stations within Germany and its
surroundings are used as part of the training data. For short-term
forecasts the latest available observations at run time are also used
as predictors for the statistical modelling, which is described in
more detail in Sect. 3.1.
The synoptic observations include measurements of temperature, dew
point, precipitation amounts, wind speed and direction, speed of wind
gusts, surface pressure, global radiation, visibility, cloud coverage
at several height levels, past and present weather and many more. Past
weather and present weather also contain observations of
thunderstorm, kind of precipitation and fog amongst others.
Ensemble-MOS derives all predictands that are relevant for weather
warnings based on synoptic measurements in a comprehensive approach in
order to provide the corresponding statistical forecasts. In this
paper, we focus on the speeds of wind gusts and on probabilities for
severe storms.
2.2 COSMO-D2-EPS and upscaled precipitation probabilities
The ensemble system COSMO-D2-EPS of DWD consists of 20 members of the
numerical model COSMO-D2. It provides short-term weather forecasts for
Germany, with runs every 3 h (i.e. 00:00, ..., 21:00 UTC) with
forecast steps of 1 up to 27 h ahead (up to 45 h for 03:00 UTC).
COSMO-D2 was upgraded from its predecessor model COSMO-DE on 15 May
2018, together with its ensemble system COSMO-D2-EPS; the upgrade
included an increase in horizontal resolution from 2.8 to 2.2 km and
an adapted orography. Detailed descriptions of COSMO-DE and its
ensemble system COSMO-DE-EPS are provided in Baldauf et al. (2011),
Gebhardt et al. (2011) and Peralta et al. (2012), respectively. For
the ensemble system initial and boundary conditions as well as
physical parameterisations are varied according to their assumed
levels of uncertainty.
For the postprocessing of COSMO-D2-EPS, 8 years of data have been
gathered, including data from the predecessor system COSMO-DE-EPS,
which has been available since 8 December 2010. Thus, a number of
model changes and updates are included in the data; impacts on
statistical forecasting are addressed later in Sect. 3.4. Each run of
Ensemble-MOS starts 2 h after the corresponding run of COSMO-D2-EPS to
ensure that the ensemble system has finished and the data are
available.
Forecast probabilities of meteorological events can be estimated as
the relative frequency of the ensemble members that show the event of
interest. If the relative frequencies are evaluated grid point by grid
point, the probabilities imply that the event occurs within areas of
the sizes of the grid cells, which are 2.2 × 2.2 km² for COSMO-D2-EPS.
It is therefore not straightforwardly possible to compare event
probabilities of ensembles of numerical models with different grid
resolutions.
Figure 1. Rank/Talagrand histogram for 1-hourly precipitation amounts
of COSMO-DE-EPS and forecast lead time 3 h; data for 18 stations from
2011 to 2017.
For near-surface elements and short lead times, the COSMO-D2-EPS is
often underdispersive and underestimates forecast errors. Figure 1
shows a rank histogram for 1-hourly precipitation amounts of
COSMO-DE-EPS. Too many observations have either less or more
precipitation than all members of the ensemble. Using these relative
frequencies as estimations of event probabilities statistically
results in too many probabilities with values 0 and 1.
Because of the high spatial variability of precipitation, upscaled
precipitation products are also derived from COSMO-D2-EPS, which are
relative frequencies of precipitation events within areas of 10 × 10
grid points (i.e. 22 × 22 km²). A meteorological event (e.g. that the
precipitation rate exceeds a certain threshold) is considered to occur
within an area if the event occurs at at least one of its grid points.
Area probabilities are therefore estimated straightforwardly as the
relative number of ensemble members predicting the area event, not
requiring the event to take place at exactly the same grid point for
all ensemble members.
Certainly, these raw ensemble-based estimates are also affected by
systematic errors of the numerical model COSMO-D2. Hess et al. (2018)
observed a bias of −6.2 percentage points for the upscaled
precipitation product of COSMO-DE-EPS for the probability that the
hourly precipitation rate exceeds 0.1 mm. Verification has been done
against gauge-adjusted radar observations, which is a suitable
observation system for areas.
2.3 ECMWF-ENS and TIGGE data
The ECMWF-ENS is a global ensemble system based on the Integrated
Forecasting System (IFS) of the European Centre for Medium-Range
Weather Forecasts (ECMWF). It consists of 50 perturbed members plus
one control run and is computed twice a day for 00:00 and 12:00 UTC up
to 15 d ahead (and even further with reduced resolution).
Postprocessing of Ensemble-MOS at DWD is based on the 00:00 UTC run
with forecast lead times up to 10 d in steps of 3 h. Forecasts of
ECMWF-ENS are interpolated from their genuine spectral resolution to a
regular grid with 28 km (0.25°) mesh size. Data have been gathered
according to the availability of COSMO-DE/2-EPS since 8 December 2010.
TIGGE data from 2002 to 2013 (see Bougeault et al., 2010 and Swinbank
et al., 2016) of ECMWF-ENS were used in a study to demonstrate the
benefits of Ensemble-MOS prior to unarchiving and downloading the
gridded ensemble data mentioned above. This study was restricted to
the available set of model variables of TIGGE (2 m temperature, mean
wind, cloud coverage and 24 h precipitation); results are given in
Sect. 4.2.
3 Postprocessing by Ensemble-MOS
The Ensemble-MOS of DWD is a model output statistics (MOS) system
specialised to postprocess the probabilistic information of NWP
ensembles. Besides calibrating probabilistic forecasts, Ensemble-MOS
also simultaneously optimises continuous variables, e.g. precipitation
amounts and the speeds of wind gusts. Moreover, statistical
interpretations also exist for meteorological elements that are not
available from numerical forecasts (e.g. thunderstorm, fog or range of
visibility). In principle, all meteorological parameters and events
that are regularly observed can be forecast statistically. This
includes temperature, dew point, wind speed and direction, wind gusts,
surface pressure, global radiation, visibility, cloud coverage at
several height levels as well as the synoptic weather with events of
thunderstorm, special kinds of precipitation, fog and more.
The basic concept of Ensemble-MOS is to use ensemble mean, spread, and
other ensemble statistics as predictors in multiple linear and
logistic regressions. The use of ensemble products as predictors
instead of processing each ensemble member individually prevents
difficulties with underdispersive statistical results and
underestimated errors, especially for longer forecast horizons. Since
MOS systems usually tend to converge towards climatology due to the
fading accuracy of numerical models and the limited predictability of
meteorological events (e.g. Vannitsem, 2009), individually processed
members converge accordingly. Moreover, multivariate MOS systems
perform corrections that depend on the selected set of predictors in
order to reduce conditional biases. If the postprocessing of the
individual ensemble members uses the same set of predictors, the
resulting statistical forecasts become correlated and underdispersive
also for this reason.
Ensemble-MOS is based on a MOS system originally set up for
postprocessing deterministic forecasts of the former global numerical
model GME of DWD and of the deterministic, high-resolution IFS of
ECMWF; see Knüpffer (1996). Using the ensemble mean and spread as
model predictors allows application of the original MOS approach for
deterministic NWP models to ensembles in a straightforward way.
For continuous variables, such as temperature, precipitation amount or
wind speed, deterministic Ensemble-MOS forecasts and estimates of the
associated forecast errors are obtained by multiple linear regression.
The MOS equation of a statistical estimate ŷk with k predictors
x1, ..., xk and k+1 coefficients c0, ..., ck of a continuous
predictand y is
$$\hat{y}_k = c_0 + c_1 x_1 + \ldots + c_k x_k. \tag{1}$$
For events like thunderstorms, heavy precipitation or strong wind
speed, calibration of event occurrence or threshold exceedance
probability is performed using multiple logistic regression. For this
an estimate
$$\hat{y}_k = \frac{1}{1 + e^{-(c_0 + c_1 x_1 + \ldots + c_k x_k)}} \tag{2}$$
of the predictand y is determined using a maximum likelihood approach.
The predictand y now is a binary variable that is 1 in case the event
was observed and 0 if not, whereas the estimate ŷk is considered a
probability that takes values from 0 to 1. Logistic regression (e.g.
Hosmer et al., 2013) is a classical approach for probabilistic
postprocessing.
Details of the implementation of linear and logistic regression are
presented in Sect. 3.1 and 3.2, respectively. Especially for
probabilities of strong and extreme wind gusts a global regression is
applied that is presented in Sect. 3.3. For an introduction to MOS in
general we refer to Glahn and Lowry (1972), Wilks (2011) and Vannitsem
et al. (2018).
3.1 Optimisation and interpretation by linear regression
Ensemble-MOS derives altogether some 150 predictands from synoptic
observations (for precipitation, gauge-adjusted radar products can
also be used) for statistical modelling. For the speeds of wind gusts
and for precipitation amounts individual predictands for various
reference periods (e.g. 1-hourly, 3-hourly, 6-hourly and longer) are
defined. As usual, these predictands are modelled by individual linear
regressions. The resulting statistical estimates are added to the list
of available predictors for subsequent regressions during
postprocessing. They are selected as predictors especially for
probabilities that the speeds of wind gusts or precipitation amounts
exceed predefined thresholds within the corresponding time frames
(i.e. 1, 3 h, etc.).
In order to estimate the error of the current forecast, an error
predictand
$$y^e = |\hat{y}_k - y| \tag{3}$$
is defined as the absolute value of the residuum. The corresponding
estimate ŷ^e_k is defined according to Eq. (1). This error predictand
can be evaluated as soon as the estimate ŷk is available. The absolute
value is preferred over the root mean square (RMS) of the residuum,
since it shows higher correlations with many predictors and a better
linear fitting. For Gaussian distributions with density ϕ_{µ,σ²} the
absolute error e (or mean absolute deviation) of the distribution can
be estimated from the standard deviation σ as
$$e = \int_{-\infty}^{\infty} |x - \mu| \, \varphi_{\mu,\sigma^2}(x) \, dx = 2 \int_{0}^{\infty} x \, \frac{1}{\sqrt{2\pi}\,\sigma} \, e^{-\frac{x^2}{2\sigma^2}} \, dx = \sqrt{\frac{2}{\pi}} \, \sigma \approx 0.8\,\sigma. \tag{4}$$
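The relation between the mean absolute deviation and the standard
deviation of a Gaussian distribution, e = √(2/π) σ ≈ 0.8 σ, can be
checked numerically. The sketch below integrates |x − µ| times the
Gaussian density with the trapezoidal rule; the function name, the
integration limits and the step count are arbitrary choices of this
illustration.

```python
import math

def gaussian_mean_abs_deviation(sigma, mu=0.0, half_width=10.0, steps=20000):
    """Numerically integrate |x - mu| * N(mu, sigma^2) density.

    Integration runs over mu +/- half_width * sigma with the
    trapezoidal rule; an even number of steps puts a node at x = mu.
    """
    lower = mu - half_width * sigma
    h = 2.0 * half_width * sigma / steps
    norm = 1.0 / (math.sqrt(2.0 * math.pi) * sigma)
    total = 0.0
    for j in range(steps + 1):
        x = lower + j * h
        weight = 0.5 if j in (0, steps) else 1.0
        density = norm * math.exp(-((x - mu) ** 2) / (2.0 * sigma ** 2))
        total += weight * abs(x - mu) * density
    return total * h

sigma = 2.0
numeric = gaussian_mean_abs_deviation(sigma)
analytic = math.sqrt(2.0 / math.pi) * sigma
print(numeric, analytic, analytic / sigma)
```

The printed ratio of about 0.8 can be used to convert between an
estimated absolute error and a standard deviation under the Gaussian
assumption.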
For each predictand the most relevant predictors are selected from a
predefined set of independent variables by stepwise regression.
Statistical modelling is performed for each predictand, station,
season, forecast run and forecast time individually in general. For
rare events, however, nine zones with similar climatology are defined
(e.g. coastal strip, north German plain, various height zones in
southern Germany, high mountain areas) and the stations are clustered
together in order to increase the number of observed events and the
statistical significance of the training data. All stations of a
cluster are modelled together for those events.
Most potential predictors are based on forecasts of the ensemble
system, which are interpolated to the locations of the observation
sites. In addition to the model values at the nearest grid point, the
mean and standard deviation of the 6 × 6 and 11 × 11 surrounding grid
points are also evaluated and provided as medium- and large-scale
predictors, respectively. Moreover, extra variables are also derived
from the NWP-model fields to be used as predictors, e.g. potential
temperature, various atmospheric layer thicknesses, rotation and
divergence of wind velocity, dew-point spread and even special
parameters, such as convective available potential energy (CAPE) and
severe weather threat index (SWEAT). These variables are computed from
the ensemble means of the required fields.
Statistical forecasts of the same variable of the last forecast step
and also of other variables of the current forecast step can be used
as well. For example, forecasts of 2 m temperature may use statistical
forecasts of precipitation amounts of the same time step as
predictors. The order of the statistical modelling and of the
forecasting is relevant in such cases to make sure the required data
are available.
Further predictors are derived from the latest observations that are
available at the time when the statistical forecast is computed.
Generally, the latest observation is an excellent projection for
short-term forecasts up to about 4 to 6 h, and it is therefore added
to the set of available predictors. Special care has to be taken to
process these predictors for training, however. Only those
observations can be used that are available at run time of the
forecast. In case forecasts are computed for arbitrary locations apart
from observation and training sites, these observations or persistency
predictors have to be processed in exactly the same way in training
and forecasting. At locations other than observation sites, the
required values need to be interpolated from the surrounding stations.
As interpolation generally is a weighted average based on horizontal
and vertical distance, it introduces smoothing and, with it, a
systematic statistical change in the use of the observations. If the
training is performed using observations at the stations and the
forecasting is using interpolation, the statistical forecasts can be
affected. As a remedy, Ensemble-MOS uses observations as persistence
predictors for training that are interpolated from up to five
surrounding stations in exactly the same way as when computing the
forecast at arbitrary locations, even if an observation at the correct
location was available.
Special orographic predictors also exist, like height of station or
height difference between station and model at a specific location. In
order to address model changes, indicators or binary variables are
also provided (see Sect. 3.4 for details). Altogether more than 300
independent variables are defined, from which up to 10 predictors are
selected for each predictand during multiple regression.
During stepwise regression, the predictor with the highest correlation
with the predictand is selected first from the set of available
independent variables. Next, the linear regression with the previously
chosen set of predictors is computed and the next predictor with the
highest correlation with the residuum is selected, and so on.
Selection stops if no further predictor exists with a statistically
significant correlation according to a Student’s t-test. The level of
significance of the test is 0.18 divided by the number of available
independent variables. This division is used because of the high
number of potential predictors. With a type I error of e.g. 0.05 and a
number of 300 available predictors, 15 predictors on average would be
selected randomly without providing significant information. The value
0.18 is found to be a good compromise in order to select a meaningful
number of predictors and to prevent overfitting in this scenario.
Table 1 lists the most important predictors for statistical forecasts
of the maximal speed of wind gusts within 1 h. The relative weights
are aggregated over 5472 equations, one for each cluster, season,
forecast run, and forecast lead time. Note that predictors that are
highly correlated usually exclude each other from appearing within one
equation. Only the predictor with the highest correlation with the
predictand is selected and supplants other correlated predictors that
do not provide enough additional information according to the t-test.
Figure 2. Probabilities of wind gusts higher than 14 m s−1 on a
regular 1 km grid over Germany; 13 h forecast lead time from
Ensemble-MOS for COSMO-DE-EPS from 29 October 2018.
The MOS equations are determined by stepwise regression for individual
locations and, in case of rare events, for clusters. In order to
compute statistical forecasts on a regular grid, these equations need
to be evaluated for locations apart from the training and observation
sites. In case of rare events and cluster equations, the appropriate
cluster is determined for each grid point and the equation of that
cluster is used. The equations for individual locations are
interpolated to the required grid point by linear interpolation of
their coefficients. In all cases, the required values of the numerical
model for these equations are evaluated for the exact location.
Observations that are used as persistency predictors are interpolated
from surrounding sites. In this way, gridded forecast maps can be
obtained as displayed in Fig. 2 for wind gust probabilities (see
Sect. 3.2 for probabilistic forecasts). For computational efficiency,
the forecasts are initially computed on a regular grid of 20 km
resolution and are downscaled thereafter to 1 km while taking into
account the various height zones in southern Germany. The details of
the downscaling are beyond the scope of the paper.

Table 1. Predictors for statistical forecasts of the maximal speed of
wind gusts within 1 h that have relative weights higher than 1 %. The
relative weights are aggregated for all stations, seasons, forecast
runs, and forecast lead times. The ensemble mean of COSMO-D2-EPS is
denoted by DMO (direct model output). Parentheses within the predictor
names denote time shifts. For a time shift of −30 min, denoted as
(-0:30), the predictors are interpolated based on values for the
previous and current forecast hours. The required statistical forecasts
have to be evaluated in advance.

Predictor name    Rel. weight (%)   Description
FF(-0:30)StF      35.7              Statistical forecast of mean wind speed in 10 m height
FX1(-1)StF        12.1              Statistical forecast of speed of wind gusts of the previous hour
VMAX_10M_LS        8.9              DMO of speed of wind gusts for a large surrounding area (mean of 11 × 11 grid points)
VMAX_10M           8.6              DMO of speed of wind gusts for nearest model grid point
FF_850(-0:30)      4.8              DMO of wind speed in 850 hPa height
VMAX_10M_MS        4.1              DMO of speed of wind gusts for a medium surrounding area (mean of 6 × 6 grid points)
Oa_D_0.5           3.2              Latest observation of the speed of wind gusts
FF_10m(-0:30)      2.8              DMO of mean wind speed in 10 m height
StFT2m_T950        1.9              Statistical forecast of temperature difference between 2 m and 950 hPa height (stability index)
Location-height    1.6              Height of station
FF_1000(-0:30)     1.1              DMO of wind speed in 1000 hPa height

3.2 Calibration of probabilistic forecasts by logistic regression

Event probabilities are calibrated using logistic regression.
Equation (2) is solved using a maximum likelihood approach. The
likelihood function
$$P(y, c_0, \ldots, c_k) = \prod_{i=1}^{n} \hat{y}_{ik}^{\,y_i} \left(1 - \hat{y}_{ik}\right)^{1 - y_i} \qquad (5)$$
expresses the probability that the predictand $y$ is realised given
the estimate $\hat{y}_k$ via the coefficients $c_0, \ldots, c_k$ (and by
now with fixed predictors $x_1, \ldots, x_k$), with $n$ being the time
dimension (sample size) and $i$ the time index. The predictand $y$ of an
event probability is binomially distributed; its time-series values are
defined as 1 in case the event was observed and 0 if not, whereby
conditional independence is assumed in Eq. (5).
It is mathematically equivalent and computationally more
efficient to maximise the logarithm of the likelihood function
$$\ln P(y, c_0, \ldots, c_k) = \sum_{i=1}^{n} \left( y_i \ln \hat{y}_{ik} + (1 - y_i) \ln\left(1 - \hat{y}_{ik}\right) \right). \qquad (6)$$
This maximisation is implemented by calling the routine G02GBF of
the NAG library in FORTRAN 90; see Numerical Algorithms Group
(1990). The resulting fit of the estimate $\hat{y}_k$ can be evaluated
by the deviance
$$D_k = -2 \ln P(y, c_0, \ldots, c_k), \qquad (7)$$
which is a measure analogous to the sum of squared residuals in
linear regression.
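
As an illustration of Eqs. (6) and (7), the following sketch evaluates
the log-likelihood and the deviance for a given set of estimated
probabilities. It is not the NAG routine used operationally; the sample
data and the coefficients `C0` and `C1` are invented for the example,
and probabilities are clipped to avoid taking the logarithm of zero.

```python
import math


def sigmoid(z):
    """Logistic function mapping a linear predictor to a probability."""
    return 1.0 / (1.0 + math.exp(-z))


def log_likelihood(y, y_hat, eps=1e-12):
    """Eq. (6): sum of y_i ln(y_hat_i) + (1 - y_i) ln(1 - y_hat_i)."""
    total = 0.0
    for y_i, p_i in zip(y, y_hat):
        p_i = min(max(p_i, eps), 1.0 - eps)  # guard against log(0)
        total += y_i * math.log(p_i) + (1 - y_i) * math.log(1.0 - p_i)
    return total


def deviance(y, y_hat):
    """Eq. (7): D = -2 ln P."""
    return -2.0 * log_likelihood(y, y_hat)


# Hypothetical sample: event observed (1) or not (0), one predictor x.
y = [0, 0, 1, 0, 1, 1]
x = [8.0, 10.0, 12.5, 13.0, 15.0, 18.0]
C0, C1 = -13.0, 1.0  # assumed coefficients, not fitted values

y_hat = [sigmoid(C0 + C1 * x_i) for x_i in x]
dev_model = deviance(y, y_hat)
dev_coin_flip = deviance(y, [0.5] * len(y))
```

A smaller deviance indicates a better fit; here the one-predictor model
yields a lower deviance than a constant probability of 0.5.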
The selection of predictors is again performed stepwise.
Initially, the coefficient $c_0$ of the null model
$y = 1/(1 + e^{-c_0})$ that fits the mean of the predictand is
determined and the null deviance $D_0 = -2 \ln P(y, c_0)$ is computed.
The coefficient $c_0$ is often called the intercept. Starting from the
null model, the predictor that is selected first is the one that shows
the smallest deviance $D_1 = -2 \ln P(y, c_0, c_1)$. The reduction in
deviance $D_0 - D_1$ is approximately $\chi^2$-distributed with 1 degree
of freedom if the predictor has no effect, and it is used to check the
statistical significance of the predictor. This check replaces the
t-test in linear regression and uses the same significance level. If
the predictor shows a significant contribution, it is accepted and
further predictors are tested based on the new model in the same
way. Otherwise, the predictor is rejected and the previous fit
$\hat{y}_{k-1}$ is accepted as the final statistical model.
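
The significance check on the change in deviance can be sketched as
follows. The example assumes a significance level of 0.05, which is not
stated in the paper, and uses the closed form of the chi-squared
survival function for 1 degree of freedom, erfc(sqrt(x/2)). The function
names and deviance values are hypothetical.

```python
import math


def chi2_sf_1dof(x):
    """P(X > x) for a chi-squared variable with 1 degree of freedom."""
    if x <= 0.0:
        return 1.0
    return math.erfc(math.sqrt(x / 2.0))


def accept_predictor(dev_previous, dev_new, alpha=0.05):
    """Likelihood ratio check for one additional predictor.

    dev_previous: deviance of the model without the candidate predictor
    dev_new:      deviance of the model including it
    alpha:        significance level (assumed value)
    Returns (accepted, p_value).
    """
    reduction = dev_previous - dev_new
    p_value = chi2_sf_1dof(reduction)
    return p_value < alpha, p_value


# Made-up deviances: a reduction of 10 is significant, a reduction of 1 is not.
accepted_large, p_large = accept_predictor(120.0, 110.0)
accepted_small, p_small = accept_predictor(120.0, 119.0)
```

In a stepwise loop, the first rejected candidate ends the selection and
the previously fitted model is kept.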
As a rule of thumb, for each selected predictor in the statistical
model at least 10 events need to be captured within the
observation data (one-in-ten rule) to find stable coefficients.
For example, with only 30 events in the training set, the number of
predictors should be restricted to three. This rule is critical
especially for rare events such as extreme wind gusts or heavy
precipitation.

Table 2. Parameters of fitted logistic distributions as shown in Fig. 3
for various forecast lead times h, with coefficients of logistic
regressions c0 and c1 and resulting means µt and standard deviations σt
for threshold t = 13.9 m s⁻¹. Estimated uncertainties are given in
brackets.

h     c0                c1              µt               σt
1     −17.74 (±0.17)    1.30 (±0.01)    13.68 (±0.01)    1.40 (±0.01)
4     −13.76 (±0.11)    1.01 (±0.01)    13.66 (±0.01)    1.80 (±0.02)
7     −13.32 (±0.11)    0.98 (±0.01)    13.66 (±0.01)    1.86 (±0.02)
10    −13.00 (±0.10)    0.95 (±0.01)    13.66 (±0.01)    1.91 (±0.02)
16    −12.51 (±0.10)    0.92 (±0.01)    13.66 (±0.01)    1.98 (±0.02)

Since testing all candidate predictors from the set of about 300
variables by computing their deviances is very costly, the score
test (Lagrange multiplier test) is actually applied in Ensemble-MOS.
Given a fitted logistic regression with $k-1$ selected predictors, the
predictor $x_k$ chosen next is the one that shows the steepest gradient
of the log-likelihood function Eq. (6) in an absolute sense when
introduced, normalised by its standard deviation $\sigma_{x_k}$, i.e.
$$\frac{1}{\sigma_{x_k}} \left| \left. \frac{\partial \ln P(y, c_0, \ldots, c_k)}{\partial c_k} \right|_{c_k = 0} \right| = \left| \sum_{i=1}^{n} \left( y_i - \hat{y}_{i,k-1} \right) \frac{x_{ik}}{\sigma_{x_k}} \right|. \qquad (8)$$
This equation results from basic calculus including the identity
$\partial \hat{y}_k / \partial c_k = \hat{y}_k (1 - \hat{y}_k) x_k$. The
right-hand side of Eq. (8) is basically the correlation of the current
residual with the new predictor. The score test thus results in the
same selection criterion as applied to stepwise linear regression. Once
the predictor $x_k$ is selected, the coefficients $c_0, \ldots, c_k$ are
updated to maximise Eq. (6).
3.3 Global logistic regression of wind gust probabilities
For extreme events, the number of observed occurrences can still
be too small to derive stable MOS equations, although time series
of several years have been gathered and the stations are clustered
within climatologic zones in Germany. The eight warning thresholds
of DWD for wind gusts range from 12.9 m s⁻¹ (25.0 kn, proper wind
gusts) up to 38.6 m s⁻¹ (75.0 kn, extreme gales), whereas the
maximal observed speed of wind gusts in the training data for a
cluster in the northern German plains is only 25.4 m s⁻¹.
Especially for probabilities of extreme wind gusts, global logistic
regressions are developed that use events at the coastal strip
or at mountains in southern Germany and allow for meaningful
statistical forecasts of extreme events also in climatologically
calm areas. The statistical forecasts of the continuous speed of
wind gusts are used for these logistic regressions as the only
predictors. They are modelled by stepwise linear regression for
each station individually, as described in Sect. 3.1. In this way,
rare occurrences of extreme events are gathered globally while
concurrently a certain degree of locality is maintained.
The locally optimised and unbiased forecasts of wind gust speeds
are excellent predictors for wind gust probabilities. The logistic
regressions according to Eq. (2) with k = 1 fit the distributions of
observed wind gusts quite well, as shown in Fig. 3 for threshold
t = 13.9 m s⁻¹ (27.0 kn) and forecast lead times of 1 and 7 h,
respectively.
The statistical modelling of wind gust probabilities is performed
for each threshold $t$ individually and is described in the
following. The logistic regressions represent logistic distributions
with mean $\mu_t = -c_0 / c_1$ and variance
$\sigma_t^2 = \pi^2 / (3 c_1^2)$, which are computed for various lead
times $h$ and are listed in Table 2.
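
The relation between the regression coefficients and the parameters of
the logistic distribution can be checked numerically against Table 2.
The sketch below uses the rounded coefficients listed for a lead time of
1 h, so the results agree with the tabulated values only to within
rounding; the function name is hypothetical.

```python
import math


def logistic_mean_and_std(c0, c1):
    """Mean and standard deviation of the logistic distribution
    represented by the regression 1 / (1 + exp(-(c0 + c1 * x)))."""
    mu = -c0 / c1
    sigma = math.pi / (math.sqrt(3.0) * abs(c1))
    return mu, sigma


# Rounded coefficients for lead time h = 1 h from Table 2.
mu_1h, sigma_1h = logistic_mean_and_std(-17.74, 1.30)
```

With these inputs the mean comes out close to 13.65 m s⁻¹ and the
standard deviation close to 1.40 m s⁻¹, consistent with the tabulated
13.68 and 1.40 given the rounding of c0 and c1.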
The expectations $\mu_t$ are slightly smaller than the threshold
t = 13.9 m s⁻¹, almost independently of lead time. The reason is that
for given statistical forecasts of wind gusts the distribution of
observations is almost Gaussian (see Fig. 4), albeit a little left
skewed with a small number of very weak wind observations.

The standard deviations $\sigma_t$ increase with forecast lead time,
reflecting the loss of accuracy of the statistical forecasts.
Consequently, the graph of the cumulative distribution function in
Fig. 3 is less steep for a forecast lead time of 7 h than for 1 h.
Figure 5 shows fitted variances $\sigma_t^2$ of the eight individual
forecast runs of Ensemble-MOS for COSMO-DE-EPS and their mean
depending on lead time. In order to reduce the number of
coefficients and to increase consistency and robustness of the
forecasts, the variance $\sigma_t^2$ is parameterised depending on
forecast lead time $h$ by fitting the function
$$\sigma_t^2(h) = c_t \log(a_t h + b_t) \qquad (9)$$
with its parameters $a_t$, $b_t$, and $c_t$ for threshold $t$.

The fitted expectations and variances show only weak dependencies on
the time of day, which are neglected. The logistic regressions of
wind gust probabilities can thus be expressed for each threshold $t$
by the mean $\mu_t$ and the standard deviation $\sigma_t$ in the same
way for all start times of the forecasts.
Even for very rare gales of 38.6 m s⁻¹, more than 130 events are
captured using 6 years of training data when modelling all
stations and forecast runs together, which is sufficient for
logistic regression. Training for these extreme events is based
mainly on coastal and mountain stations, but the statistical
regressions are applied to less exposed locations in calmer
regions as well. Small threshold probabilities will be predicted
for those locations in general. However, meaningful estimations will
be generated once the statistical forecasts of local wind speed
rise, driven by the numerical model.
3.4 Specific issues and caveats of MOS
Ensemble-MOS optimises and calibrates ensemble forecasts using
synoptic observations. Being a statistical method, it is vulnerable
to systematic changes in input data, since it assumes that errors
and characteristics of the past persist in the future. An important
part of the input are observations, whose measurement instruments
sometimes change. It is recommended to use quality-checked
observations in order to avoid the use of defective values for
training. Especially observation sites that are automatised need
to be screened. Furthermore, numerical models change with new
versions and updates that can affect statistical postprocessing,
as further discussed in Sect. 3.4.1.

Although statistical forecasts generally improve the model output
when verified against observations, the results are not always
consistent in time, space and between the forecast variables (e.g.
between temperature and dew point), if they are optimised
individually. This issue is addressed in Sect. 3.4.2.

Figure 3. Observed cumulative distributions of wind gusts exceeding
threshold 13.9 m s⁻¹ (blue) and fit of logistic distribution (green)
depending on statistically optimised forecasts of wind gusts for
forecast lead times 1 h (a) and 7 h (b). The threshold is dashed.

Figure 4. Distribution of wind gust observations from 2011 to 2016
for 178 synoptic stations for cases where statistical forecast (fitted
for that period) is between 21 and 25 m s⁻¹. Lead time is 1 h.
Gaussian fit and mean of observations (green) and mean of forecasts
(red).

Figure 5. Variances σt² of logistic distributions fitted to cumulative
distributions of observed wind gusts for threshold t = 13.9 m s⁻¹
depending on forecast lead time. Individual runs of Ensemble-MOS
for COSMO-DE-EPS-MOS starting at 02:00, 05:00, …, 23:00 UTC
in colours, mean of all runs in black, fitted parameterisation of
variances dashed in black.

3.4.1 Model changes

Statistical methods like Ensemble-MOS detect systematic errors
and deficiencies of NWP models during a past training period in
order to improve current operational forecasts. Implicitly it is
assumed that the systematic characteristics of the NWP models
persist. Note that multiple regressions correct not only for model
bias but also for conditional biases that depend on other
meteorological variables. Multiple regressions are therefore more
vulnerable to model changes than simple regressions. Systematic
changes in NWP models can affect statistical forecasts, even if the
NWP forecasts are objectively improved as confirmed by verification.
Given that the statistical modelling provides unbiased estimations,
any systematic change in NWP-model predictors will be reflected
in biases in the statistical forecasts. The resulting biases depend
on the magnitudes of the changes of the predictors and on their
weights in the MOS equations.
One remedy for jumps in input data is the use of indicator
(binary) predictors. These predictors are related to the date
of the change of the NWP model and are defined as 1 before
and 0 after. When they are selected during stepwise regression,
they account for sudden jumps in the training data and can prevent
the introduction of unconditional biases in the statistical
forecasts. Conditional biases depending on other forecast variables,
however, are not corrected.
In order to process extreme and very rare events for weather
warnings, long time series of 7 years of data for COSMO-D2-EPS have
been gathered at the time of writing. Hence, the time series are
subject to a number of model changes. A significant model upgrade
from COSMO-DE to COSMO-D2, including an increase in horizontal
resolution from 2.8 to 2.2 km and an update of orography, took
place in May 2018. Since reforecasting of COSMO-D2-EPS for more than
1 year was technically not possible, the existing COSMO-DE-EPS
database was used further and extended with reforecasts of
COSMO-D2-EPS of the year before operational introduction. However,
statistical experiments using these reforecasts of COSMO-D2-EPS (and
the use of binary predictors; see above) revealed only insignificant
improvements compared to training with data of COSMO-DE-EPS only.
For rare events, longer time series are considered more important
than the use of unaltered model versions.
3.4.2 Forecast consistency
As weather warnings are issued for a certain period of time and a
specified region, continuity of probabilistic forecasts in time and
space is important. It should be accepted, however, that maps of
probabilistic forecasts do not comply with deterministic runs of
numerical models, as probabilistic forecasts are smoothed according
to forecast uncertainty. For example, there are hardly any convective
cells in probabilistic forecasts; instead, there are areas where
convection might occur with a certain probability within a given
time period.
The statistical modelling of Ensemble-MOS is carried out for each
forecast variable, forecast lead time and location independently
and individual MOS equations are derived. For rare meteorological
events, clusters of stations are grouped together that are similar in
climatology in order to derive individual cluster equations. This
local and individual fitting results in optimal statistical
forecasts for the specific time, location and variable as measured
with the RMSE compared to observations. However, it does not
guarantee that obtained forecast fields are consistent in space,
time or between variables.
In forecast time, spurious jumps of statistical forecasts can
appear, and variables with different reference periods usually
do not match. For example, the sum of 12 successive 1-hourly
precipitation amounts would not equal the corresponding 12-hourly
amount if the latter is modelled as an individual predictand.
Statistical forecasts of temperature cannot be guaranteed to
exceed those of dew point. Maps of statistical forecasts show high
variability from station to station and unwanted anomalies in case
of cluster equations. Cluster edges turn up and it may appear, for
example, that there are higher wind gusts in a valley than on a
mountain nearby in cases where the locations are assigned to
different clusters. For consistency in time and space the situation
can be improved by using the same equations for several lead times
and for larger clusters or by elaborate subsequent smoothing.
However, forecast quality for a given space and time will be
degraded as a consequence. For consistency between all forecast
variables, multivariate regressions are required that model the
relevant predictands simultaneously.
From the point of view of probabilistic forecasting, however,
statistical forecasts are random variables with statistical
distributions, although commonly only their expectations are
considered the statistical forecast. Forecasts that appear
inconsistent from a deterministic point of view are not necessarily
inconsistent once statistical errors are taken into account. The
statistical forecasts remain valid as long as the probability
distributions of the variables overlap. As this is a mathematical
point of view, the question remains how to communicate this
nature of probabilistic forecasts to the public or traditional
meteorologists in terms of useful and accepted products.
4 Results
Evaluation of Ensemble-MOS for COSMO-DE-EPS and ECMWF-ENS is
provided in the following. Although Ensemble-MOS of DWD provides
statistical forecasts of many forecast variables that are relevant
for warnings, evaluation is focused on wind gusts for COSMO-DE-EPS
and on temperature for ECMWF-ENS in order to limit the scope of
the paper.

4.1 Evaluation of Ensemble-MOS for COSMO-DE-EPS
Verifications of the continuous speed of wind gusts are presented
in Figs. 6–8 by various scatter diagrams including forecast means
(solid line) and their standard errors (dashed lines). Figures 6
(right) and 7 (right) show the statistical fit of the speed of wind
gusts against synoptic observations during a training period of
6 years of Ensemble-MOS for COSMO-DE-EPS for lead times of 1 and
6 h, respectively. The fit is almost unbiased for all forecast speed
levels. The raw ensemble means show overforecasting for high wind
gusts (same figures, left) and the standard errors are considerably
larger. If no overfitting occurs, out-of-sample forecasts are
expected to behave accordingly, which is verified in Fig. 8 (right)
for a test period of 3 months (at least for wind gusts up to about
20 m s⁻¹).
Ensemble-MOS can predict its own current forecast errors by using
error predictands according to Eq. (3). Forecasts of the absolute
errors of the speeds of wind gusts are related to observed errors in
Fig. 9 (right). The biases are small, although individual observed
errors are much larger than their predictions. The absolute errors
of the ensemble mean versus ensemble spread (normalised to absolute
error) strongly underestimate the observed errors of the ensemble
mean; see Fig. 9 (left). This is another example of underestimated
dispersion of COSMO-DE-EPS as shown in Fig. 1 for precipitation.

Figure 6. Scatter plots of ensemble means of 3 h forecasts of the
speeds of wind gusts of COSMO-DE-EPS versus observations (a) and
corresponding statistical fits of 1 h forecasts of Ensemble-MOS versus
the same observations (b). Means of observations (solid) and confidence
intervals (means ± standard deviations, dashed) are shown. Six years of
data (2011–2016) are used; number of cases are given by histograms.

Figure 7. As Fig. 6 but for 8 h forecasts of COSMO-DE-EPS (a) and 6 h
forecasts of Ensemble-MOS (b).

The statistical forecasts of the speeds of the wind gusts are
excellent predictors for the probabilities that certain warning
thresholds are exceeded. This is demonstrated by the fits of
the observed distributions by logistic regression as shown in
Fig. 3. The global logistic regression presented in Sect. 3.3
is prepared for extreme and rare events; nevertheless, it is
applicable to lower thresholds as well. The reliability diagram
in Fig. 10 shows well-calibrated probabilities for wind gusts
exceeding 7.7 m s⁻¹ for a zone in the northern German plains with
calmer winds in climatology. The COSMO-DE-EPS shows strong
overforecasting in these situations.

Figure 8. As Fig. 6 but for 3 months of data (May–July 2016) and
forecasts of COSMO-DE-EPS (a) and Ensemble-MOS (b). Data of this
period were not used for training.

Figure 9. Scatter plots of 3 h forecasts of the absolute errors of
COSMO-DE-EPS forecasts of wind gust speeds (estimated as ensemble
standard deviations × 0.8) versus observed absolute errors of the
ensemble means (a) and corresponding 1 h error forecasts of
Ensemble-MOS versus observed absolute errors of Ensemble-MOS
(statistical fit of training period, b). Means of observed absolute
errors (solid) and confidence intervals (means ± standard deviations,
dashed) are shown. Six years of data (2011–2016) are used.

4.2 Evaluation of Ensemble-MOS for ECMWF-ENS
In order to motivate the use of Ensemble-MOS for ECMWF-ENS, a
study has been carried out with a restricted set of model variables
of TIGGE; see Sect. 2.3. Training is based on ensemble data and
corresponding observations from 2002 to 2012, whereas statistical
forecasting and verification is performed for 2013; see Hess et al.
(2015) for details.
Results for 2 m temperature forecasts are shown in Fig. 11, which
illustrates essential improvements of postprocessed forecasts of
Ensemble-MOS compared to raw ensemble output. The statistical
forecast (blue) not only improves the raw ensemble mean (red), but
it also outperforms the high-resolution ECMWF-IFS (these data have
not been used for training). Also, the statistical estimation of
Ensemble-MOS of its own errors (pink) (see Sect. 3.1) is more
realistic over the first few days than the estimate of the ensemble
mean errors by the ensemble spread (yellow). Improvements of
ECMWF-ENS with Ensemble-MOS were also obtained for 24 h
precipitation and cloud coverage.

Figure 10. Reliability diagram for probabilities of wind gusts
exceeding 7.7 m s⁻¹ (15.0 kn). Probabilistic forecasts of Ensemble-MOS
with a lead time of 6 h (green) and corresponding relative frequencies
of COSMO-DE-EPS with a lead time of 8 h (blue). Verification is done
for 3 months of data (May–July 2016) and 18 stations in Germany at
about 52° N latitude, including Berlin for example. Vertical lines are
5 %–95 % consistency bars according to Bröcker and Smith (2006).

5 Conclusions
This paper describes the Ensemble-MOS system of DWD, which is set
up to postprocess the ensemble systems COSMO-D2-EPS and ECMWF-ENS
with respect to severe weather to support warning management. MOS in
general is a mature and sound method and, in combination with
logistic regression, it can provide optimised and calibrated
statistical forecasts. Stepwise multiple regression allows reduction
of conditional biases that depend on the meteorological situation,
which is defined by the selected predictors. The setup of
Ensemble-MOS to use ensemble mean and spread as predictors is
computationally efficient and simplifies forecasting of calibrated
event probabilities and error estimates on longer forecast lead
times. Ensemble-MOS is operationally applicable with regard to its
robustness and computational costs and runs in trial mode in order
to support warning management at DWD.
However, the ensemble spread is detected as an important predictor
less often than might be expected. One reason is that the spread
actually carries less information about forecast accuracy than
originally intended. It is often too small and too steady to account
for current forecast errors. Another reason is that some forecast
variables correlate with their own forecast errors (e.g.
precipitation and wind gusts). If the ensemble spread does not
provide enough independent information, it is not selected in
addition to the ensemble mean during stepwise regression. Currently,
only ensemble mean and spread are provided as predictors for
Ensemble-MOS. The implementation of various ensemble quantiles as
additional predictors is technically straightforward and could
improve the exploitation of the probabilistic information of the
ensemble.

Figure 11. Mean absolute error (MAE) of 2 m temperature forecast and
error estimations depending on forecast lead time. Spread (yellow):
spread of ECMWF-ENS (normalised to MAE); MAE Ensemble Mean (red): MAE
of the mean of ECMWF-ENS; MAE Ctrl (grey): MAE of the ECMWF-ENS
control run; MA EMOS (pink): Ensemble-MOS forecast of its own absolute
errors (see Eq. 3: estimations of MAE EMOS, blue); MAE MOS (blue): MAE
of Ensemble-MOS for ECMWF-ENS; MAE HR (green): MAE of high-resolution
ECMWF-IFS.

Statistical forecasts of the speed of the wind gusts are excellent
predictors for probabilities that given thresholds are exceeded and
are used as predictors within logistic regressions. The same approach
could be advantageous for probabilities of heavy precipitation as
well, where estimated precipitation amounts would be used as
predictors.
An important further step in probabilistic forecasting is the
estimation of complete (calibrated) distributions of forecast
variables rather than forecasting only discrete threshold
probabilities. For wind gusts with Gaussian conditional errors as
shown in Fig. 4 this seems possible but certainly requires
additional research.
With its inherent linearity (also in the case of logistic
regressions, there are linear combinations of predictors only), MOS
has its restrictions in modelling but supports traceability and
robustness, which are important features in operational weather
forecasting. Therefore, MOS is considered a possible baseline for
future statistical approaches based on neural networks and machine
learning that allow for more general statistical modelling. Many of
the statistical problems will remain, however, such as finding
suitable reactions to changes in the NWP models, (deterministic)
consistency and the definition of useful probabilistic products (see
Sect. 3.4.2) and the verification of rare events. In all cases,
training data are considered of utmost importance, including
the NWP-model output, as well as quality-checked historic
observations.
Data availability. COSMO-DE-EPS data and synoptic observations
are stored in DWD archives and can be made accessible under
certain conditions. Further information is available at
https://opendata.dwd.de (DWD, 2020). TIGGE data are available free
of charge, see https://confluence.ecmwf.int/display/TIGGE
(ECMWF, 2020).

Author contributions. Conceptual design of Ensemble-MOS,
enhancement of software for probabilistic forecasting of ensembles
(including logistic regression), processing of forecasts,
verification and writing was done by the author.

Competing interests. The author declares that there is no
conflict of interest.

Special issue statement. This article is part of the special
issue "Advances in post-processing and blending of deterministic
and ensemble forecasts". It is not associated with a conference.

Acknowledgements. The author thanks two anonymous reviewers and
the editor for their constructive comments, which helped to
improve the structure and clarity of the manuscript. Thanks to
James Paul for improving the use of English.

Review statement. This paper was edited by Stephan Hemri and
reviewed by two anonymous referees.

References
Baldauf, M., Seifert, A., Förstner, J., Majewski, D.,
Raschendorfer, M., and Reinhardt, T.: Operational convective-scale
numerical weather prediction with the COSMO model: description
and sensitivities, Mon. Weather Rev., 139, 3887–3905,
https://doi.org/10.1175/MWR-D-10-05013.1, 2011.
Ben Bouallègue, Z., Pinson, P., and Friederichs, P.: Quantile
forecast discrimination ability and value, Q. J. R. Meteorol. Soc.,
141, 3415–3424, https://doi.org/10.1002/qj.2624, 2015.
Bougeault, P., Toth, Z., Bishop, C., et al.: The THORPEX
interactive grand global ensemble, Bull. Amer. Meteor. Soc., 91,
1059–1072, https://doi.org/10.1175/2010BAMS2853.1, 2010.
Bröcker, J. and Smith, L. A.: Increasing the Reliability of
Reliability Diagrams, Weather Forecast., 22, 651–661,
https://doi.org/10.1175/WAF993.1, 2006.
Buizza, R.: Ensemble forecasting and the need for calibration, in:
Statistical Postprocessing of Ensemble Forecasts, edited by
Vannitsem, S., Wilks, D. S., and Messner, J. W., chap. 2, pp. 15–48,
Elsevier, Amsterdam, 2018.
Buizza, R., Houtekamer, P. L., Toth, Z., Pellerin, G., Wei, M., and
Zhu, Y.: A comparison of the ECMWF, MSC, and NCEP global ensemble
prediction systems, Mon. Weather Rev., 133, 1076–1097,
https://doi.org/10.1175/MWR2905.1, 2005.
DWD: COSMO-DE-EPS data, information available at:
https://opendata.dwd.de, last access: 30 September 2020.
ECMWF: TIGGE data, available at:
https://confluence.ecmwf.int/display/TIGGE, last access: 30
September 2020.
Friederichs, P., Wahl, S., and Buschow, S.: Postprocessing for
Extreme Events, in: Statistical Postprocessing of Ensemble
Forecasts, edited by Vannitsem, S., Wilks, D. S., and Messner, J. W.,
chap. 5, pp. 128–154, Elsevier, Amsterdam, 2018.
Gebhardt, C., Theis, S. E., Paulat, M., and Ben Bouallègue, Z.:
Uncertainties in COSMO-DE precipitation forecasts introduced by
model perturbations and variation of lateral boundaries, Atmos.
Res., 100, 168–177, https://doi.org/10.1016/j.atmosres.2010.12.008,
2011.
Glahn, H. R. and Lowry, D. A.: The use of model output
statistics (MOS) in objective weather forecasting, J. Appl.
Meteorol., 11, 1203–1211,
https://doi.org/10.1175/1520-0450(1972)0112.0.CO;2, 1972.
Gneiting, T., Raftery, A. E., Westveld, A. H., and Goldman, T.:
Calibrated probabilistic forecasting using ensemble model output
statistics and minimum CRPS estimation, Mon. Weather Rev.,
133, 1098–1118, https://doi.org/10.1175/MWR2904.1, 2005.
Gneiting, T., Balabdaoui, F., and Raftery, A. E.: Probabilistic
forecasts, calibration and sharpness, J. R. Statist. Soc: B, 69,
243–268, https://doi.org/10.1111/j.1467-9868.2007.00587.x,
2007.
Hamill, T.: Verification of TIGGE multimodel and ECMWF
reforecast-calibrated probabilistic precipitation forecasts over the
conterminous United States, Mon. Weather Rev., 140,
2232–2252, https://doi.org/10.1175/MWR-D-11-00220.1, 2012.
Hamill, T. M., Engle, E., Myrick, D., Peroutka, M., Finan, C., and
Scheuerer, M.: The U.S. national blend of models for statistical
postprocessing of probability of precipitation and deterministic
precipitation amount, Mon. Weather Rev., 145, 3441–3463,
https://doi.org/10.1175/MWR-D-16-0331.1, 2017.
Hersbach, H.: Decomposition of the continuous ranked probability
score for ensemble prediction systems, Wea. Forecasting, 15,
559–570, https://doi.org/10.1175/1520-0434(2000)0152.0.CO;2,
2000.
Hess, R., Glashoff, J., and Reichert, B. K.: The Ensemble-MOS of
Deutscher Wetterdienst, in: EMS Annual Meeting Abstracts, 12,
Sofia, 2015.
Hess, R., Kriesche, B., Schaumann, P., Reichert, B. K., and
Schmidt, V.: Area precipitation probabilities derived from point
forecasts for operational weather and warning service
applications, Q. J. R. Meteorol. Soc., 144, 2392–2403,
https://doi.org/10.1002/qj.3306, 2018.
Hosmer, D. W., Lemeshow, S., and Sturdivant, R. X.: Applied
Logistic Regression, Wiley Series in Probability and Statistics,
Wiley, New Jersey, 3rd edn., 2013.
Knüpffer, K.: Methodical and predictability aspects of MOS
sys-tems, in: Proceedings of the 13th Conference on Probability
andStatistics in the Atmospheric Sciences, pp. 190–197, San
Fran-cisco, 1996.
Lerch, S., Thorarinsdottir, T. L., Ravazzolo, F., and
Gneiting, T.: Forecaster’s dilemma: Extreme events and forecast
evaluation, Stat. Sci., 32, 106–127,
https://doi.org/10.1214/16-STS588, 2017.
Möller, A., Lenkoski, A., and Thorarinsdottir, T. L.:
Multivariate probabilistic forecasting using ensemble Bayesian
model averaging and copulas, Q. J. R. Meteorol. Soc., 139,
982–991, https://doi.org/10.1002/qj.2009, 2013.
Numerical Algorithms Group: The NAG Library, Oxford,
UK, https://www.nag.com/ (last access: 30 September 2020), 1990.
Peralta, C., Ben Bouallègue, Z., Theis, S. E., Gebhardt, C.,
and Buchhold, M.: Accounting for initial condition uncertainties in
COSMO-DE-EPS, J. Geophys. Res.-Atmos., 117,
D07108, https://doi.org/10.1029/2011JD016581, 2012.
Primo, C.: Wind gust warning verification, Adv. Sci. Res., 13,
113–120, https://doi.org/10.5194/asr-13-113-2016, 2016.
Raftery, A. E., Gneiting, T., Balabdaoui, F., and Polakowski, M.:
Using Bayesian model averaging to calibrate forecast ensembles,
Mon. Weather Rev., 133,
1155–1174, https://doi.org/10.1175/MWR2906.1, 2005.
Ranjan, R. and Gneiting, T.: Combining probability forecasts,
J. Roy. Stat. Soc. B, 72, 71–91,
https://doi.org/10.1111/j.1467-9868.2009.00726.x, 2010.
Reichert, B. K.: The operational warning decision support
system AutoWARN at DWD, in: 27th Meeting of the European Working
Group on Operational Meteorological Workstation Systems (EGOWS),
Helsinki, 2016.
Reichert, B. K.: Forecasting and Nowcasting Severe Weather
Using the Operational Warning Decision Support System AutoWARN at
DWD, in: 9th Europ. Conf. on Severe Storms ECSS, Pula, Croatia,
2017.
Reichert, B. K., Glashoff, J., Hess, R., Hirsch, T., James, P.,
Lenhart, C., Paller, J., Primo, C., Raatz, W., Schleinzer, T., and
Schröder, G.: The decision support system AutoWARN for the
weather warning service at DWD, in: EMS Annual Meeting Abstracts, 12,
Sofia, 2015.
Schefzik, R. and Möller, A.: Ensemble postprocessing methods
incorporating dependence structures, in: Statistical
Postprocessing of Ensemble Forecasts, edited by Vannitsem, S.,
Wilks, D. S., and Messner, J. W., chap. 4, pp. 91–125, Elsevier,
Amsterdam, 2018.
Schefzik, R., Thorarinsdottir, T. L., and Gneiting, T.:
Uncertainty quantification in complex simulation models using
ensemble copula coupling, Stat. Sci., 28,
616–640, https://doi.org/10.1214/13-STS443, 2013.
Sloughter, J. M., Gneiting, T., and Raftery, A. E.:
Probabilistic wind vector forecasting using ensembles and
Bayesian model averaging, Mon. Weather Rev., 141,
2107–2119, https://doi.org/10.1175/MWR-D-12-00002.1, 2013.
Swinbank, R., Kyouda, M., Buchanan, P., et al.: The TIGGE
project and its achievements, Bull. Amer. Meteor. Soc., 97,
49–67, https://doi.org/10.1175/BAMS-D-13-00191.1, 2016.
Vannitsem, S.: A unified linear Model Output Statistics scheme
for both deterministic and ensemble forecasts, Q. J. R.
Meteorol. Soc., 135, 1801–1815, https://doi.org/10.1002/qj.491,
2009.
Vannitsem, S., Wilks, D. S., and Messner, J. W., eds.:
Statistical Postprocessing of Ensemble Forecasts, Elsevier,
Amsterdam, Oxford, Cambridge, 2018.
Wilks, D. S.: A skill score based on economic value for
probability forecasts, Meteorol. Appl., 8,
209–219, https://doi.org/10.1017/S1350482701002092, 2001.
Wilks, D. S.: Statistical Methods in the Atmospheric Sciences,
Academic Press, San Diego, 3rd edn., 2011.
Wilks, D. S.: Univariate ensemble postprocessing, in:
Statistical Postprocessing of Ensemble Forecasts, edited by
Vannitsem, S., Wilks, D. S., and Messner, J. W., chap. 3, pp. 49–89,
Elsevier, Amsterdam, 2018.