Forecasting risk with Markov–switching GARCH models: A large–scale performance study✩

David Ardia a,b,*, Keven Bluteau a,c, Kris Boudt c,d, Leopoldo Catania e

a Institute of Financial Analysis, University of Neuchâtel, Neuchâtel, Switzerland
b Department of Finance, Insurance and Real Estate, Laval University, Québec City, Canada
c Solvay Business School, Vrije Universiteit Brussel, Belgium
d School of Business and Economics, Vrije Universiteit Amsterdam, The Netherlands
e Department of Economics and Business Economics, Aarhus University and CREATES, Denmark

Abstract

We perform a large–scale empirical study to compare the forecasting performance of single–regime and Markov–switching GARCH (MSGARCH) models from a risk management perspective. We find that, for daily, weekly, and ten–day equity log–returns, MSGARCH models yield more accurate Value–at–Risk, Expected Shortfall, and left–tail distribution forecasts than their single–regime counterpart. Also, our results indicate that accounting for parameter uncertainty improves left–tail predictions, independently of the inclusion of the Markov–switching mechanism.

Keywords: GARCH, MSGARCH, forecasting performance, large–scale study, Value–at–Risk, Expected Shortfall, risk management

✩ An earlier version of this paper was circulated under the title "Forecasting performance of Markov–switching GARCH models: A large–scale empirical study". We are grateful to the Editor (Esther Ruiz), the Associate Editor, and two anonymous referees for their useful comments, which improved the paper significantly. We thank Samuel Borms, Peter Carl, Dirk Eddelbuettel, Richard Gerlach, Lennart Hoogerheide, Eliane Maalouf, Brian Peterson, Enrico Schumann, Denis–Alexandre Trottier, and participants at Quant Insights 2017 (London), R/Finance 2017 (Chicago), the 37th International Symposium on Forecasting (Cairns), and UseR 2017 (Brussels). We acknowledge Industrielle–Alliance, the International Institute of Forecasters, Google Summer of Code 2016 and 2017, FQRSC (Grant # 2015-NP-179931), and the Fonds de Donations at the University of Neuchâtel for their financial support. We thank Félix–Antoine Fortin and Calcul Québec (clusters Briarée, Colosse, Mammouth and Parallèle II) as well as Laurent Fastnacht and the Institute of Hydrology at the University of Neuchâtel (cluster Galileo) for computational support. All computations have been performed with the R package MSGARCH (Ardia et al., 2017a,b), available from the CRAN repository at https://cran.r-project.org/package=MSGARCH.

* Corresponding author. University of Neuchâtel, Rue A.-L. Breguet 2, CH-2000 Neuchâtel, Switzerland. Phone: +41 32 718 1365.
Email addresses: [email protected] (David Ardia), [email protected] (Keven Bluteau), [email protected] (Kris Boudt), [email protected] (Leopoldo Catania)

Preprint submitted to SSRN, March 2, 2018
Initial studies on Markov–switching autoregressive heteroscedastic models applied to financial
time series focus on ARCH specifications and thus omit a lagged value of the conditional variance
in the variance equation (Cai, 1994; Hamilton and Susmel, 1994). The use of ARCH instead of
GARCH dynamics leads to computational tractability in the likelihood calculation. Indeed, Gray
(1996) shows that, given a Markov chain with K regimes and T observations, the evaluation of the
likelihood of a Markov–switching model with general GARCH dynamics requires the integration
over all K^T possible paths, rendering the estimation infeasible. While this difficulty is not present
in ARCH specifications, the use of lower order GARCH models tends to offer a more parsimonious
representation than higher order ARCH models. Gray (1996), Dueker (1997) and Klaassen (2002)
tackle the path dependence problem of MSGARCH through approximation, by collapsing the past
regime–specific conditional variances according to ad–hoc schemes.1 An alternative approach is
provided by Haas et al. (2004), who let the GARCH processes of each state evolve in parallel
and thus independently of the GARCH process in the other states. Besides avoiding the path
dependence problem, their model allows for a clear–cut interpretation of the variance dynamics in
each regime. In our study, we consider the model by Haas et al. (2004) for these reasons.
The first contribution of our paper is to test if, indeed, MSGARCH models provide risk managers
with useful tools that can improve their volatility forecasts.2 To answer this question, we perform
a large–scale empirical analysis in which we compare the risk forecasting performance of single–
1 More recent studies also address this problem; for instance, Augustyniak (2014) relies on a Monte Carlo EM algorithm with importance sampling.

2 Our study focuses exclusively on GARCH and MSGARCH models. GARCH is the workhorse model in financial econometrics and has been investigated for decades. It is widely used by practitioners and academics; see, for instance, Bams et al. (2017) and Herwartz (2017). MSGARCH is the most natural and straightforward extension of GARCH. Alternative conditional volatility models include stochastic volatility models (Taylor, 1994; Jacquier et al., 1994), realized measure–based conditional volatility models such as HEAVY (Shephard and Sheppard, 2010) or Realized GARCH (Hansen et al., 2011), or even combinations of these (Opschoor et al., 2017). Note finally that our study only considers the (1,1)–lag specification for the GARCH and MSGARCH models. While there is a clear computational cost of considering higher orders for (MS)GARCH model specifications, the payoff in terms of improvement in forecasting precision may be low. In fact, several studies have shown that increasing the orders does not lead to a substantial improvement of the forecasting performance when predicting the conditional variance of asset returns (see, e.g., Hansen and Lunde, 2005). We leave these investigations for further research.
regime and Markov–switching GARCH models. We take the perspective of a risk manager working
for a fund manager and conduct our study on the daily, weekly and ten–day log–returns of a large
universe of stocks, equity indices, and foreign exchange rates. Thus, in contrast to Hansen and
Lunde (2005), who compare a large number of GARCH–type models on a few series, we focus on a
few GARCH and MSGARCH models and a large number of series. For single–regime and Markov–
switching specifications, the scedastic specifications we consider account for different reactions of
the conditional volatility to past asset returns. More precisely, we consider the symmetric GARCH
model (Bollerslev, 1986) as well as the asymmetric GJR model (Glosten et al., 1993). These
scedastic specifications are integrated into the MSGARCH framework with the approach of Haas
et al. (2004). For the (regime–dependent) conditional distributions, we use the symmetric and the
Fernandez and Steel (1998) skewed versions of the Normal and Student–t distributions. Overall,
this leads to sixteen models.
Our second contribution is to test the impact of the estimation method on the performance of
the volatility forecasting model. GARCH and MSGARCH models are traditionally estimated with
a frequentist (typically via ML) approach; see Haas et al. (2004), Marcucci (2005) and Augusty-
niak (2014). However, several recent studies have argued that a Bayesian approach offers some
advantages. For instance, Markov chain Monte Carlo (MCMC) procedures can explore the joint
posterior distribution of the model parameters, and parameter uncertainty is naturally integrated
into the risk forecasts via the predictive distribution (Ardia, 2008; Bauwens et al., 2010, 2014;
Geweke and Amisano, 2010; Ardia et al., 2017c).
Combining the sixteen model specifications with the frequentist and Bayesian estimation meth-
ods, we obtain 32 possible candidates for the state–of–the–art methodology for monitoring financial
risk. We use an out–of–sample evaluation period of 2,000 days, which ranges from (approximately)
2005 to 2016 and consists of daily log–returns. We evaluate the accuracy of the risk prediction
models in terms of estimating the Value–at–Risk (VaR), the Expected Shortfall (ES), and the
left–tail (i.e., losses) of the conditional distribution of the assets’ returns.
Our empirical results suggest a number of practical insights which can be summarized as follows.
First, we find that MSGARCH models report better VaR, ES, and left–tail distribution forecasts
4
than their single–regime counterpart. This is especially true for stock return data. Moreover,
improvements are more pronounced when the Markov–switching mechanism is applied to simple
specifications such as the GARCH–Normal model. Second, accounting for parameter uncertainty
improves the accuracy of the left–tail predictions, independently of the inclusion of the Markov–
switching mechanism. Moreover, larger improvements are observed in the case of single–regime
models. Overall, we recommend risk managers to rely on more flexible models and to perform
inference accounting for parameter uncertainty.
In addition to showing the good performance of MSGARCH models and Bayesian estimation
methods, we refer risk managers to our R package MSGARCH (Ardia et al., 2017a,b), which im-
plements MSGARCH models in the R statistical language with efficient C++ code.3 We hope
that this paper and the accompanying package will encourage practitioners and academics in the
financial community to use MSGARCH models and Bayesian estimation methods.
The paper proceeds as follows. Model specification, estimation, and forecasting are presented
in Section 2. The datasets, the testing design, and the empirical results are discussed in Section 3.
Section 4 concludes.
2. Risk forecasting with Markov–switching GARCH models
A key aspect in quantitative risk management is the modeling of the risk drivers of the securities
held by the fund manager. We consider here the univariate parametric framework, which computes
the desired risk measure in four steps. First, a statistical model which describes the daily log–
returns (profit and loss, P&L) dynamics is determined. Second, the model parameters are estimated
for a given estimation window. Third, the one/multi–day ahead distribution of log–returns is
obtained (either analytically or by simulation). Fourth, relevant risk measures such as the Value–
at–Risk (VaR) and the Expected Shortfall (ES) are computed from the distribution. The VaR
represents a quantile of the distribution of log–returns at the desired horizon, and the ES is the
expected loss when the loss exceeds the VaR level (Jorion, 2006). Risk managers can then allocate
3 Our research project was funded by the 2014 SAS/IIF forecasting research grant to compare MSGARCH vs. GARCH models, and to develop and render publicly available the computer code for the estimation of MSGARCH models.
risk capital given their density or risk measure forecasts. Also, they can assess the quality of the
risk model, ex–post, via statistical procedures referred to as backtesting.
2.1. Model specification
We define yt ∈ R as the (percentage point) log–return of a financial asset at time t. To simplify
the exposition, we assume that the log–returns have zero mean and are not autocorrelated.4 The
general Markov–switching GARCH specification can be expressed as:

y_t | (s_t = k, I_{t−1}) ~ D(0, h_{k,t}, ξ_k) ,   (1)

where D(0, h_{k,t}, ξ_k) is a continuous distribution with zero mean, time–varying variance h_{k,t}, and additional shape parameters (e.g., asymmetry) gathered in the vector ξ_k.5 Furthermore, we assume that the latent variable s_t, defined on the discrete space {1, . . . , K}, evolves according to an unobserved first–order ergodic homogeneous Markov chain with transition probability matrix P ≡ {p_{i,j}}_{i,j=1}^{K}, where p_{i,j} ≡ P[s_t = j | s_{t−1} = i]. We denote by I_{t−1} the information set up to time t − 1, that is, I_{t−1} ≡ {y_{t−i}, i > 0}. Given the parametrization of D(·), we have E[y_t^2 | s_t = k, I_{t−1}] = h_{k,t}; that is, h_{k,t} is the variance of y_t conditional on the realization of s_t and the information set I_{t−1}.
As in Haas et al. (2004), the conditional variance of y_t is assumed to follow a GARCH–type model. More precisely, conditionally on regime s_t = k, h_{k,t} is specified as a function of past returns and the additional regime–dependent vector of parameters θ_k:

h_{k,t} ≡ h(y_{t−1}, h_{k,t−1}, θ_k) ,

where h(·) is an I_{t−1}–measurable function, which defines the filter for the conditional variance and also ensures its positiveness. We further assume that h_{k,1} ≡ h_k (k = 1, . . . , K), where h_k is a fixed initial variance level for regime k, which we set equal to the unconditional variance in regime k.
4 In practice, this means that we apply the (MS)GARCH models to de–meaned log–returns, as explained in Section 3.

5 For t = 1, we initialize the regime probabilities and the conditional variances at their unconditional levels. To simplify the exposition, we henceforth use for t = 1 the same notation as for general t, since there is no confusion possible.
Depending on the form of h(·), we obtain different scedastic specifications. For instance, if:

h_{k,t} ≡ ω_k + α_k y_{t−1}^2 + β_k h_{k,t−1} ,

with ω_k > 0, α_k > 0, β_k ≥ 0, and α_k + β_k < 1 (k = 1, . . . , K), we obtain the Markov–switching GARCH(1,1) model presented in Haas et al. (2004).6 In this case, θ_k ≡ (ω_k, α_k, β_k)′.
Alternative definitions of the function h(·) can be easily incorporated in the model. For instance, to account for the well–known asymmetric reaction of volatility to the sign of past returns (often referred to as the leverage effect; see Black 1976), we specify a Markov–switching GJR(1,1) model exploiting the volatility specification of Glosten et al. (1993):

h_{k,t} ≡ ω_k + (α_k + γ_k I{y_{t−1} < 0}) y_{t−1}^2 + β_k h_{k,t−1} ,

where I{·} is the indicator function, equal to one if the condition holds and zero otherwise.
where 0 and I denote a vector of zeros and an identity matrix of appropriate sizes, f_N(•; μ, Σ) is the multivariate Normal density with mean vector μ and covariance matrix Σ, ξ_{k,1} is the asymmetry parameter, and ξ_{k,2} the tail parameter of the skewed Student–t distribution in regime k. Moreover, h_k ≡ h_k(θ_k, ξ_k) is the unconditional variance in regime k and CSC_k denotes the covariance–stationarity condition in regime k; see Trottier and Ardia (2016).

9 We performed several sensitivity analyses to assess the impact of the estimation setup. First, we changed the hyper–parameter values. Second, we ran longer MCMC chains. Third, we used 10,000 posterior draws instead of 1,000. Finally, we tested an alternative MCMC sampler based on adaptive mixtures of Student–t distributions (Ardia et al., 2009). In all cases, the conclusions remained qualitatively similar.
with mixing weights π_{k,T+1} ≡ Σ_{i=1}^{K} p_{i,k} η_{i,T}, where η_{i,T} ≡ P[s_T = i | Ψ, I_T] (i = 1, . . . , K) are the filtered probabilities at time T. The cumulative distribution function (CDF) is obtained from (4) as follows:

F(y_{T+1} | Ψ, I_T) ≡ ∫_{−∞}^{y_{T+1}} f(z | Ψ, I_T) dz .   (5)
Within the frequentist framework, the predictive PDF and CDF are simply computed by replacing Ψ with its ML estimator Ψ̂ in (4) and (5). Within the Bayesian framework, we proceed differently and integrate out the parameter uncertainty. Given a posterior sample {Ψ^[m], m = 1, . . . , M}, the predictive PDF is obtained as:

f(y_{T+1} | I_T) ≡ ∫_Ψ f(y_{T+1} | Ψ, I_T) f(Ψ | I_T) dΨ ≈ (1/M) Σ_{m=1}^{M} f(y_{T+1} | Ψ^[m], I_T) .   (6)
The predictive CDF is given by:

F(y_{T+1} | I_T) ≡ ∫_{−∞}^{y_{T+1}} f(z | I_T) dz .   (7)
For both estimation approaches, the VaR is estimated as a quantile of the predictive density, obtained by numerically inverting the predictive CDF. For instance, in the Bayesian framework, the VaR at the α risk level equals:

VaR_{T+1}^α ≡ inf{y_{T+1} ∈ R | F(y_{T+1} | I_T) = α} ,   (8)

while the ES at the α risk level is given by:

ES_{T+1}^α ≡ (1/α) ∫_{−∞}^{VaR_{T+1}^α} z f(z | I_T) dz .   (9)
In our empirical application, we consider the VaR and the ES at the 1% and 5% risk levels.
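When the predictive distribution is represented by simulated draws (as in the multi–step case below), the quantile inversion in (8) and the tail integral in (9) reduce to an empirical quantile and a tail average. A minimal sketch, checked against the known standard–normal values:

```python
import numpy as np

def var_es(draws, alpha):
    """Empirical VaR (alpha-quantile) and ES (mean of the draws at or
    below the VaR) from draws of the predictive return distribution."""
    draws = np.asarray(draws, float)
    var = np.quantile(draws, alpha)
    es = draws[draws <= var].mean()
    return var, es

# sanity check against a standard normal predictive distribution, for
# which VaR(5%) = -1.645 and ES(5%) = -2.063 (theoretical values)
rng = np.random.default_rng(42)
var5, es5 = var_es(rng.standard_normal(100_000), 0.05)
```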
For evaluating the risk at an h–period horizon, we must rely on simulation techniques to obtain
the conditional density and downside risk measures, as described, for instance, in Blasques et al.
(2016). More specifically, given a MSGARCH model parameter Ψ, we generate 25,000 paths of
daily log–returns over a horizon of h days.10 The simulated distribution and the obtained α–
quantile then serve as estimates of the density and downside risk forecasts of the h–day cumulative
log–return.
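As an illustration of this simulation scheme, the sketch below computes an h–day VaR from simulated cumulative returns; for brevity it uses a single–regime GARCH(1,1)–Normal with hypothetical parameters instead of the full MSGARCH model:

```python
import numpy as np

def h_day_var(h, n_paths, omega, alpha_, beta, h0, alpha=0.05, seed=1):
    """Simulate n_paths of h daily GARCH(1,1)-Normal returns starting
    from today's variance h0, and return the empirical alpha-quantile
    of the cumulative h-day log-return as the h-day VaR estimate."""
    rng = np.random.default_rng(seed)
    hvar = np.full(n_paths, h0)
    cum = np.zeros(n_paths)
    for _ in range(h):
        y = np.sqrt(hvar) * rng.standard_normal(n_paths)
        cum += y
        hvar = omega + alpha_ * y ** 2 + beta * hvar   # variance update
    return np.quantile(cum, alpha)

# hypothetical parameters with unit unconditional daily variance
var10 = h_day_var(10, 25_000, omega=0.05, alpha_=0.05, beta=0.90, h0=1.0)
```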
3. Large–scale empirical study
We use 1,500 log–returns (in percent) for the estimation and run the backtest over 2,000 out–
of–sample log-returns for a period ranging from October 10, 2008, to November 17, 2016 (the full
dataset starts on December 26, 2002). Each model is estimated on a rolling window basis, and one–step ahead as well as multi–step cumulative log–return density forecasts are obtained.11 From
the estimated density, we compute the VaR and the ES at the 1% and 5% risk levels.
3.1. Datasets
We test the performance of the various models on several universes of securities typically traded
by fund managers:
• A set of 426 stocks, selected by taking the S&P 500 index universe as of November 2016, and omitting the stocks for which more than 5% of the daily returns are zero and the stocks for which there are fewer than 3,500 daily return observations.
• A set of eleven stock market indices: (1) S&P 500 (US; SPX), (2) FTSE 100 (UK; FTSE),
GSPTSE), and (11) Swiss Market Index (Switzerland; SSMI);
10 With the frequentist estimation, we generate 25,000 paths with parameter Ψ̂, while in the case of the Bayesian estimation, we generate 25 paths for each of the 1,000 values Ψ^[m] in the posterior sample. We use this number to get enough draws from the predictive distribution, as we focus on the left tail. Geweke (1989) shows that the consistent estimation of the predictive distribution does not depend on the number of paths generated from the posterior. So with 25 paths, we indeed converge to the correct predictive distribution. We verified that increasing the number of simulations has no material impact on the results.
11 Model parameters are updated every ten observations. We selected this frequency to speed up the computations. Similar results were obtained for a subset of stocks when updating the parameters every day. This is also in line with the observation of Ardia and Hoogerheide (2014), who show, in the context of GARCH models, that the performance of VaR forecasts is not significantly affected when moving from a daily updating frequency to a weekly or monthly updating frequency. Note that while parameters are updated every ten observations, the density and downside risk measures are computed every day.
• A set of eight foreign exchange rates: USD against CAD, DKK, NOK, AUD, CHF, GBP, JPY, and EUR.12
Data are retrieved from Datastream. Each price series is expressed in local currency. We compute the daily percentage log–return series defined by x_t ≡ 100 × log(P_t / P_{t−1}), where P_t is the adjusted closing price (value) on day t. We then de–mean the returns x_t using an AR(1)–filter, and use those filtered returns, y_t, to estimate and evaluate the precision of the financial risk monitoring systems.
In Table 1, we report the summary statistics on the out–of–sample daily, five–day, and ten–
day cumulative log–returns for the three asset classes. We report the standard deviation (Std),
the skewness (Skew) and kurtosis (Kurt) coefficients evaluated over the full sample as well as the
historical 1% and 5% VaR and ES levels. We note the higher volatility in all periods for the
universe of stocks, followed by indices and exchange rates. All securities exhibit negative skewness,
with larger values for indices and stocks, while exchange rates seem to behave more symmetrically.
Interestingly, the negative skewness tends to be more pronounced for indices as the horizon grows.
Finally, at the daily horizon, we observe a significant kurtosis for stocks. Fat tails are also present
for indices and exchange rates, but less pronounced than for stocks. However, as the horizon grows,
the kurtosis of all asset classes tends to diminish.
[Insert Table 1 about here.]
3.2. Forecasting performance tests
We compare the adequacy of the 32 models in terms of providing accurate forecasts of the left
tail of the conditional distribution and the VaR and ES levels.
3.2.1. Accuracy of VaR predictions
For testing the accuracy of the VaR predictions, we use the so–called hit variable, which is a
dummy variable indicating a loss that exceeds the VaR level:
I_t^α ≡ I{y_t ≤ VaR_t^α} ,
12 In the context of foreign exchange rates, left–tail forecasts aim at assessing the risk for a foreign investor investing in USD and therefore facing a devaluation of the USD.
where VaR_t^α denotes the VaR prediction at risk level α for time t, and I{·} is the indicator function
equal to one if the condition holds, and zero otherwise. If the VaR is correctly specified, then the
hit variable has a mean value of α and is independently distributed over time. We test this for the
α = 1% and α = 5% risk levels using the unconditional coverage (UC) test by Kupiec (1995), and
the dynamic quantile (DQ) test by Engle and Manganelli (2004).
The UC test by Kupiec (1995) uses the likelihood ratio to test that the violations have a Binomial distribution with E[I_t^α] = α. Denote by x ≡ Σ_{t=1}^{T} I_t^α the number of observed violations in a total of T observations. Then, under the null of correct coverage, the test statistic:

UC^α ≡ −2 ln[(1 − α)^{T−x} α^x] + 2 ln[(1 − x/T)^{T−x} (x/T)^x] ,

is asymptotically chi–square distributed with one degree of freedom.
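A direct implementation of the UC statistic, with the chi–square(1) p–value computed via the complementary error function (the sketch assumes at least one violation and one non–violation):

```python
import math
import numpy as np

def kupiec_uc(hits, alpha):
    """Kupiec (1995) unconditional coverage LR test of H0: E[hit] = alpha.
    Returns (statistic, p-value); assumes 0 < sum(hits) < len(hits)."""
    hits = np.asarray(hits)
    T, x = hits.size, int(hits.sum())
    pi = x / T
    ll0 = (T - x) * math.log(1.0 - alpha) + x * math.log(alpha)
    ll1 = (T - x) * math.log(1.0 - pi) + x * math.log(pi)
    stat = max(-2.0 * (ll0 - ll1), 0.0)
    pval = math.erfc(math.sqrt(stat / 2.0))   # chi-square(1) survival fn
    return stat, pval

# exactly 5% violations over 2,000 days: no evidence against H0
stat_ok, p_ok = kupiec_uc([1] * 100 + [0] * 1900, 0.05)
# 10% violations at alpha = 5%: coverage clearly rejected
stat_bad, p_bad = kupiec_uc([1] * 200 + [0] * 1800, 0.05)
```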
The DQ test by Engle and Manganelli (2004) is a test of the joint hypothesis that E[I_t^α] = α and that the hit variables are independently distributed. The implementation of the test involves the de–meaned process Hit_t^α ≡ I_t^α − α. Under correct model specification, unconditionally and conditionally, Hit_t^α has zero mean and is serially uncorrelated. The DQ test is then the traditional Wald test of the joint nullity of all coefficients in the following linear regression:

Hit_t^α = δ_0 + Σ_{l=1}^{L} δ_l Hit_{t−l}^α + δ_{L+1} VaR_{t−1}^α + ε_t .

If we denote the OLS parameter estimates by δ ≡ (δ_0, . . . , δ_{L+1})′ and by Z the corresponding data matrix with, in columns, the observations on the L + 2 explanatory variables, then the DQ test statistic of the null hypothesis of correct unconditional and conditional coverage is:

DQ^α ≡ δ′Z′Zδ / (α(1 − α)) .
As in Engle and Manganelli (2004), we choose L = 4 lags. Under the null hypothesis of correct unconditional and conditional coverage, DQ^α is asymptotically chi–square distributed with L + 2 degrees of freedom.13
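The DQ statistic can be computed with a simple least–squares regression; the sketch below follows the formulas above (the closed–form chi–square survival function used for the p–value is valid for the even number of degrees of freedom L + 2 = 6):

```python
import math
import numpy as np

def dq_test(hits, var_pred, alpha, L=4):
    """Engle-Manganelli (2004) DQ test: Wald statistic of the joint
    nullity of all coefficients in the regression of Hit_t on a
    constant, L lags of Hit, and the lagged VaR. The p-value uses the
    closed-form chi-square survival function for L + 2 = 6 df."""
    hit = np.asarray(hits, float) - alpha
    T = hit.size
    # regressor matrix: constant, Hit_{t-1..t-L}, VaR_{t-1}
    Z = np.array([np.r_[1.0, hit[t - L:t][::-1], var_pred[t - 1]]
                  for t in range(L, T)])
    delta, *_ = np.linalg.lstsq(Z, hit[L:], rcond=None)
    stat = float(delta @ Z.T @ Z @ delta) / (alpha * (1.0 - alpha))
    half = stat / 2.0
    pval = math.exp(-half) * (1.0 + half + half * half / 2.0)
    return stat, pval

# a model with no violations at all over 2,000 days is rejected too
stat, pval = dq_test(np.zeros(2000), np.full(2000, -1.65), 0.05)
```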
3.2.2. Accuracy of the left–tail distribution
Risk managers care not only about the accuracy of the VaR forecasts but also about the
accuracy of the complete left–tail region of the log–return distribution. This broader view of
all losses is central in modern risk management and is consistent with the regulatory shift to using Expected Shortfall as the risk measure for determining capital requirements starting in 2018 (Basel Committee on Banking Supervision, 2013). We evaluate the effectiveness of MSGARCH models in yielding accurate predictions of the left–tail distribution in three ways.
A first approach is to compute the weighted average difference of the observed returns with
respect to the VaR value, and give higher weight to losses that violate the VaR level. This corre-
sponds to the quantile loss assessment of Gonzalez-Rivera et al. (2004) and McAleer and Da Veiga
(2008). Formally, given a VaR prediction at risk level α for time t, the associated quantile loss (QL) is defined as:

QL_t^α ≡ (α − I_t^α)(y_t − VaR_t^α) .

The choice of this loss function for VaR assessment is appropriate since quantiles are elicited by it; that is, when the conditional distribution is static over the sample, VaR_t^α can be estimated by minimizing the average quantile loss function. Elicitability is useful for model selection, estimation, forecast comparison, and forecast ranking.
Unfortunately, there is no loss function available for which the ES risk measure is elicitable; see, for instance, Bellini and Bignozzi (2015) and Ziegel (2016). However, Fissler and Ziegel (2016) (FZ) have recently shown that, in the case of a constant conditional distribution, the pair (VaR, ES) is jointly elicitable, as the values of v_t and e_t that minimize the sample average of the following loss function:

FZ(y_t, v_t, e_t, α, G_1, G_2) ≡ (I_t^α − α)(G_1(v_t) − G_1(y_t) + (1/α) G_2(e_t) v_t) − G_2(e_t)((1/α) I_t^α y_t − e_t) − 𝒢_2(e_t) ,
13 As in Bams et al. (2017), it is possible to add more explanatory variables, such as lagged returns and lagged squared returns, and jointly test the new coefficients. In our case, the results obtained by adding lagged returns or lagged squared returns are qualitatively similar to those of the simpler specification.
where G_1 is weakly increasing, G_2 is strictly positive and strictly increasing, and 𝒢_2′ = G_2. In a similar setup to ours, Patton et al. (2017) assume the values of VaR and ES to be strictly negative and recommend setting G_1(x) = 0 and G_2(x) = −1/x. For a VaR and an ES prediction at risk level α for time t, the associated joint loss function (FZL) is then given by:

FZL_t^α ≡ (1/(α ES_t^α)) I_t^α (y_t − VaR_t^α) + VaR_t^α / ES_t^α + log(−ES_t^α) − 1 ,   (10)

for ES_t^α ≤ VaR_t^α < 0. Hence, in order to gauge the precision of both the VaR and ES downside risk estimates, we use the FZL function as our second evaluation criterion.
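Equation (10) translates directly into code; the helper below is a sketch that assumes ES_t ≤ VaR_t < 0, as required:

```python
import numpy as np

def fz_loss(y, var_pred, es_pred, alpha):
    """FZ joint loss of equation (10) for (VaR, ES) forecasts, with
    G1(x) = 0 and G2(x) = -1/x; requires es_pred <= var_pred < 0.
    Lower average loss indicates better joint forecasts."""
    y, v, e = (np.asarray(a, float) for a in (y, var_pred, es_pred))
    hit = (y <= v).astype(float)
    return hit * (y - v) / (alpha * e) + v / e + np.log(-e) - 1.0

# a violation day: y = -3 against VaR = -2 and ES = -2.5 at alpha = 5%
loss = fz_loss([-3.0], [-2.0], [-2.5], 0.05)[0]
```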
A third approach that we consider is to compare the empirical distribution with the predicted conditional distribution through the weighted Continuous Ranked Probability Score (wCRPS), introduced by Gneiting and Ranjan (2011) as a generalization of the CRPS scoring rule (Matheson and Winkler, 1976). Following the notation introduced in Section 2, the wCRPS for a forecast at time t is defined as:

wCRPS_t ≡ ∫_R ω(z) (F(z | I_{t−1}) − I{y_t ≤ z})^2 dz ,

where F is the predictive CDF and ω : R → R_+ is a continuous weight function, which emphasizes regions of interest of the predictive distribution, such as the tails or the center. Since our focus is on predicting losses, we follow Gneiting and Ranjan (2011) and use the decreasing weight function ω(z) ≡ 1 − Φ(z), where Φ is the CDF of a standard Gaussian distribution. This way, discrepancies in the left tail of the return distribution are weighted more than those in the right tail.14
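Using the discrete approximation proposed by Gneiting and Ranjan (2011) (see footnote 14), the left–tail wCRPS can be sketched as follows; the check exploits the fact that, for a standard normal predictive CDF and realization y = 0, symmetry makes the weighted score approximately half the unweighted CRPS:

```python
import math
import numpy as np

def wcrps(y, cdf, zl=-100.0, zu=100.0, M=1000):
    """Left-tail weighted CRPS (Gneiting and Ranjan, 2011) via the
    Riemann-sum approximation, with weight w(z) = 1 - Phi(z).
    cdf: callable evaluating the predictive CDF on a grid."""
    z = zl + np.arange(1, M + 1) * (zu - zl) / M
    w = 0.5 * np.array([math.erfc(x / math.sqrt(2.0)) for x in z])  # 1 - Phi
    return (zu - zl) / (M - 1) * np.sum(w * (cdf(z) - (y <= z)) ** 2)

# check: for a standard-normal CDF and y = 0, the weighted score is
# about half the unweighted CRPS, (2 * phi(0) - 1/sqrt(pi)) / 2 ~ 0.117
norm_cdf = np.vectorize(lambda x: 0.5 * math.erfc(-x / math.sqrt(2.0)))
val = wcrps(0.0, norm_cdf)
```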
For the QL, FZL and wCRPS approaches, we test the statistical significance of the differences
in the forecasting performance of two competing models, say models i and j. We do this by first
14 We follow the implementation of Gneiting and Ranjan (2011) and compute the wCRPS with the following approximation:

wCRPS_t ≈ ((z_u − z_l)/(M − 1)) Σ_{m=1}^{M} ω(z_m) (F(z_m | I_{t−1}) − I{y_t ≤ z_m})^2 ,

where z_m ≡ z_l + m × (z_u − z_l)/M, and z_u and z_l are the upper and lower values which define the range of integration. The accuracy of the approximation can be increased to any desired level by increasing M. Setting z_l = −100, z_u = 100, and M = 1,000 provides an accurate approximation when working with returns in percentage points. We also tested the triangular integration approach and the results were numerically equivalent. Alternative weight specifications, focusing on the right tail, center, or full distribution, lead to similar conclusions at the one–day forecasting horizon. The results are available from the authors upon request.
computing, for each out–of–sample date t, the average performance statistics across all securities in
the same asset class. Denote this difference as Δ_t^{i−j} ≡ L_t^i − L_t^j, where L_t^i is the average value of the performance measure (QL, FZL or wCRPS) of all assets within the same asset class for model i. We then test H0 : E[Δ_t^{i−j}] = 0 using the standard Diebold and Mariano (1995) (DM) test statistic, implemented
with the heteroscedasticity and autocorrelation robust (HAC) standard error estimators of Andrews
(1991) and Andrews and Monahan (1992). If the null hypothesis is rejected, the sign of the test statistic indicates which model is, on average, preferred for a particular loss measure.
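A sketch of the DM test on a loss–differential series; for simplicity it uses a Newey–West (Bartlett–kernel) long–run variance as a stand–in for the Andrews (1991) HAC estimator used in the paper:

```python
import numpy as np

def dm_stat(d, max_lag=None):
    """Diebold-Mariano statistic for H0: E[d_t] = 0, where d_t is the
    loss differential between two models; asymptotically N(0,1) under
    the null, using a Newey-West (Bartlett) long-run variance."""
    d = np.asarray(d, float)
    T = d.size
    if max_lag is None:
        max_lag = int(4 * (T / 100.0) ** (2.0 / 9.0))  # common rule of thumb
    u = d - d.mean()
    lrv = u @ u / T
    for lag in range(1, max_lag + 1):
        gamma = u[lag:] @ u[:-lag] / T
        lrv += 2.0 * (1.0 - lag / (max_lag + 1.0)) * gamma
    return d.mean() / np.sqrt(lrv / T)

rng = np.random.default_rng(0)
t_big = dm_stat(rng.normal(1.0, 0.5, 1000))   # clear average loss gap
t_null = dm_stat(rng.normal(0.0, 1.0, 1000))  # no gap in expectation
```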
3.3. Results
We now summarize the results regarding our main research question: Does the additional
complexity of Markov–switching and the use of Bayesian estimation methods lead to more accurate
out–of–sample downside risk predictions? We first present our results regarding the accuracy of
the VaR predictions and then use the QL, FZL and wCRPS approaches to evaluate the gains in
terms of left–tail predictions.
3.3.1. Effect of model and estimator choice on the accuracy of VaR predictions
We first use the UC test of Kupiec (1995) and the DQ test of Engle and Manganelli (2004) to
evaluate the accuracy of each of the 32 methods considered in terms of predicting the VaR at the
5% and 1% level for the daily returns on the 426 stocks, 11 stock indices and 8 exchange rates.
For each asset, we obtain the p–values corresponding to the UC and DQ tests computed using 2,000 out–of–sample observations. In Table 2, we aggregate the results per asset class by presenting the percentage of assets for which the null hypothesis of correct unconditional and conditional coverage is rejected at the 5% level by the UC and DQ tests, respectively.15
[Insert Table 2 about here.]
15 In the case of stocks, as the universe is large and therefore prone to false positives, the p–values are corrected for Type I error using the false discovery rate (FDR) approach of Benjamini and Hochberg (1995). The FDR correction for a confidence level q proceeds as follows. For a set of m ordered p–values p_1 ≤ p_2 ≤ . . . ≤ p_m and corresponding null hypotheses H_1, H_2, . . . , H_m, define v as the largest value of i for which p_i ≤ (i/m)q, and then reject all hypotheses H_i for i = 1, . . . , v.
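The step-up procedure in footnote 15 can be sketched as follows (the function name is ours for illustration):

```python
def fdr_reject(pvals, q):
    """Benjamini-Hochberg FDR correction at confidence level q.
    Returns the sorted indices of the rejected null hypotheses."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])  # indices by ascending p-value
    v = 0
    for rank, idx in enumerate(order, start=1):
        if pvals[idx] <= rank * q / m:  # p_(i) <= (i/m) q
            v = rank                    # keep the largest such rank
    return sorted(order[:v])
```

Note that the cutoff is the largest rank satisfying the inequality, so a hypothesis can be rejected even if some smaller-ranked p–value fails its own threshold.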
Consider in Panels A and B of Table 2 the results for the UC test. At both VaR risk levels,
we find that the validity of the VaR predictions based on the GARCH and GJR skewed Student–t
risk model is never rejected, whatever the use of SR or MS models, or frequentist or Bayesian
estimation methods. The result changes drastically when we consider the more powerful DQ test
of correct conditional coverage in Panels C and D. Here, we find clear evidence that the use of MS
GJR models leads to a lower percentage of rejections of the validity of the VaR prediction for all
asset classes. At the 1% risk level, these differences are most often significant.
Overall, the one–day ahead backtest results indicate outperformance of MS over SR models,
especially for VaR prediction on equities. Moreover, a GJR specification leads to a substantial
reduction in the rejection frequencies. Both for MS and SR specifications, a fat–tailed conditional
distribution is of primary importance and delivers excellent results at both risk levels.
Finally, for this analysis, the rejection frequencies are similar between the Bayesian and
frequentist estimation methods. More precisely, a t–test for equal average rejections indicates that
differences are insignificant. We thus conclude that, based on the analysis of VaR forecast accuracy,
it is hard to discriminate between the estimation methods.
3.3.2. Effect of model choice on accuracy of left–tail predictions
A further question is how model simplification affects the accuracy of the left–tail return predic-
tion. In Table 3, we report the standardized difference between the average QL, FZL and wCRPS
values of the assets belonging to the same asset class, when we switch from a MS specification
to a SR specification. The standardization corresponds to the Diebold and Mariano (1995) (DM)
test statistic. Negative values indicate out–of–sample evidence of a deterioration in the prediction
accuracy when using the SR specification instead of the MS specification. When the standardized
value exceeds 2.57 (i.e., the critical value computed using a 1% significance level for a bilateral
test based on the asymptotic Normal distribution) in absolute value, the statistical significance is
highlighted with a gray shading.16 We report results obtained with the Bayesian framework only,
16We take the standard critical value in Diebold and Mariano (1995) as our Markov–switching specifications do
not nest the alternative single–regime model due to parameter constraints imposing that the volatility dynamics are numerically different in each regime, and that each regime has a non–zero probability. The approach by Clark and McCracken (2001) should be used when comparing nested models.
as the performance obtained with the Bayesian estimation is better for both MS and SR models
(especially for SR specifications) compared with the frequentist estimation.17
[Insert Table 3 about here.]
One–step ahead results for wCRPS favor MS models with negative values observed for almost
all asset classes and model specifications. QL, FZL and wCRPS results are consistent with the
backtest results: They confirm the superior performance of the MS specification for the universe
of stocks, while outperformance is less clear for indices and exchange rates. Indeed, for indices,
MS is required only when a non–fat–tailed conditional distribution is assumed, while for exchange
rates, MS is generally not required. Note that, for all assets, the improvements tend to be more
pronounced when the Markov–switching mechanism is applied to simple specifications such as the
GARCH–Normal model.
For stocks, the MS specification significantly outperforms in terms of the FZL and wCRPS
measures at the five–day horizon. For the wCRPS measure at the ten–day horizon, and for the
QL measure at the five– and ten–day horizons, results are mostly insignificant, except for the FZL
5% measure, which favors MS models when a non–fat–tailed conditional distribution is assumed.
MS and SR models perform similarly for the five– and ten–day returns on stock indices. Finally,
for exchange rate returns, SR models outperform MS models at the five– and ten–day horizons
according to the QL 1% measure, while the differences in QL 5%, FZL, and wCRPS are insignificant.
It is informative to examine if these gains in forecasting precision are stable across the out–
of–sample window. To determine this, we display in Figure 1 the cumulative wCRPS average
loss differential over the whole out–of–sample period for the best performing specification, the
GJR skewed Student–t model. Interestingly, we find that MSGARCH systematically outperforms
GARCH according to the criteria that are most sensitive to the extreme left tail of the return
distribution, namely the FZL (for α = 1% and α = 5%) and QL (for α = 1%). We also notice
that in these cases the gains of MSGARCH over GARCH increase during the last phase of the
turbulent period 2008–2012. With regard to wCRPS and QL at α = 5%, we find that MSGARCH
17Hence, our discussion based on Bayesian results is more conservative in the sense that it gives an advantage to
the SR specifications.
starts outperforming GARCH after the end of the turbulent period 2008–2012. We conjecture that
this improvement in performance can be explained by the lack of flexibility of the single–regime
GARCH specification. As is also evident from the first panel of Figure 1, market volatility changed both its unconditional level and its dependence structure between the two periods 2008–2012 and 2012–2015. Since the estimation window comprises 1,500 observations (approximately seven years), observations in the period 2008–2012 affect GARCH predictions for the whole 2012–2015 forecasting period. In contrast, MSGARCH allows the volatility process to adapt more rapidly to changes in regimes, resulting in better risk predictions. This is the case both for the first half of the window, ranging from December 2008 to November 2012 and encompassing the Great Financial Crisis, and for the second half, ranging from December 2012 to November 2016, a calmer market period following the crisis.
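The cumulative loss differentials plotted in Figure 1 amount to a running sum of the per-date average loss differences. A minimal sketch, with the paper's sign convention (positive means MS outperforms), follows; the function name is ours for illustration:

```python
import numpy as np

def cumulative_loss_differential(loss_sr, loss_ms):
    """Cumulative loss differential between SR and MS forecasts.
    A positive level indicates cumulative outperformance of the MS
    specification; a positive slope indicates outperformance at that date."""
    return np.cumsum(np.asarray(loss_sr, dtype=float) - np.asarray(loss_ms, dtype=float))
```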
[Insert Figure 1 about here.]
We now consider in Table 4 a complete comparison of the wCRPS performance of all MS
models (in row) versus all SR models (in column). The elements in the diagonal correspond to
the wCRPS values reported in Table 3. They are informative about the change in wCRPS when
switching from a MS model to a SR model, keeping the same specification for the conditional
variance and distribution. The analysis of the extra–diagonal elements is informative about the
changes in wCRPS when switching from a MS model to a SR model, and changing the specification
of the volatility model or the density function. In this table, an outperforming MS risk model is
a model for which all standardized gains when changing the specification are negative. For almost
all comparisons, this is the case for the MS GJR model with skewed Student–t innovations. The
only exception is for modeling the returns of stock market indices, where it performs similarly to
its SR counterpart.
[Insert Table 4 about here.]
3.3.3. Effect of estimator choice on accuracy of left–tail predictions
In Table 5, we report the results for the Bayesian versus frequentist estimation methods in the
case of one–step ahead QL, FZL and wCRPS measures. Panel A (Panel B) shows the results for
MS (SR) models, where a negative (positive) value indicates outperformance (underperformance)
of Bayesian against frequentist estimation. In light gray, we emphasize cases of significant outper-
formance of the Bayesian estimation over the frequentist approach. For stocks, the QL 1% and 5%
comparisons indicate that the Bayesian approach is preferred over ML, significantly so in the majority of
the specifications. The same observation can be made when using the FZL and wCRPS evaluation
criteria. For stock indices and exchange rates, QL, FZL and wCRPS results are in favor of the
Bayesian estimation for both MS and SR models but results are less significant than for stocks.
Overall, we recommend accounting for parameter uncertainty, especially for stock data and when
the interest is in the left tail of the log–return distribution. The performance gain is especially
Matheson, J.E., Winkler, R.L., 1976. Scoring rules for continuous probability distributions. Management Science 22,
1087–1096. doi:10.1287/mnsc.22.10.1087.
McAleer, M., Da Veiga, B., 2008. Single-index and portfolio models for forecasting Value-at-Risk thresholds. Journal
of Forecasting 27, 217–235. doi:10.1002/for.1054.
McNeil, A.J., Frey, R., Embrechts, P., 2015. Quantitative Risk Management: Concepts, Techniques and Tools.
Second ed., Princeton University Press.
Nelson, D.B., 1991. Conditional heteroskedasticity in asset returns: A new approach. Econometrica 59, 347–370.
doi:10.2307/2938260.
Opschoor, A., van Dijk, D., van der Wel, M., 2017. Combining density forecasts using focused scoring rules. Journal
of Applied Econometrics, in press. doi:10.1002/jae.2575.
Patton, A.J., Ziegel, J.F., Chen, R., 2017. Dynamic semiparametric models for expected shortfall. Working paper.
URL: https://ssrn.com/abstract=3000465.
Shephard, N., Sheppard, K., 2010. Realising the future: Forecasting with high-frequency-based volatility (HEAVY)
models. Journal of Applied Econometrics 25, 197–231. doi:10.1002/jae.1158.
Taylor, S.J., 1994. Modeling stochastic volatility: A review and comparative study. Mathematical Finance 4, 183–204.
doi:10.1111/j.1467-9965.1994.tb00057.x.
Trottier, D.A., Ardia, D., 2016. Moments of standardized Fernandez-Steel skewed distributions: Applications to the
estimation of GARCH-type models. Finance Research Letters 18, 311–316. doi:10.1016/j.frl.2016.05.006.
Vihola, M., 2012. Robust adaptive Metropolis algorithm with coerced acceptance rate. Statistics and Computing 22,
997–1008. doi:10.1007/s11222-011-9269-5.
Ziegel, J.F., 2016. Coherence and elicitability. Mathematical Finance 26, 901–918.
Table 1: Summary statistics of the return data
The table presents the summary statistics of the (de–meaned) h–day cumulative log–returns for securities in the three asset classes used in our study. We report the standard deviation (Std), the skewness (Skew), the kurtosis (Kurt), and the 5% and 1% historical VaR and ES, on an unconditional basis for the 2,000 out–of–sample observations. For each statistic, we compute the 25th, 50th and 75th percentiles over the whole universe of assets.
h Percentile Std Skew Kurt 1% VaR 5% VaR 1% ES 5% ES
Table 2: Percentage of assets for which the validity of the VaR predictions is rejected
The table presents the percentage of assets for which the unconditional coverage test (UC, Panels A and B) by Kupiec (1995) and the Dynamic Quantile test (DQ, Panels C and D) by Engle and Manganelli (2004) reject the null hypothesis of correct unconditional coverage (UC, DQ) and independence of violations (DQ) for the one–step ahead 1%–VaR (Panels A and C) and 5%–VaR (Panels B and D) at the 5% significance level. The VaR forecasts are obtained for Markov–switching (MS) and single–regime (SR) models for the various universes (426 stocks, 11 indices, and 8 exchange rates) and estimated via Bayesian or frequentist techniques. We highlight in gray the best performing method for the cases in which, for a given asset class and model specification, the percentages of rejections between MS and SR models are significantly different at the 5% level. In the case of stocks, rejection frequencies are corrected for Type I error using the FDR approach of Benjamini and Hochberg (1995).
Table 4: Standardized gain in average performance when switching from MS to SR and changing the specification
This table presents the Diebold and Mariano (1995) test statistic of equal average wCRPS between a MS implementation (in rows) and a SR implementation (in columns), for all considered specifications, when forecasting the distribution of one–day ahead log–returns. We report test statistics computed with robust HAC standard errors. Negative values indicate outperformance of the Markov–switching specification compared with single–regime models. In light (dark) gray, we report statistics which are significantly negative (positive) at the 1% level (bilateral test). Models are estimated with the Bayesian approach.
Figure 1: Cumulative performance
This figure presents the evolution of the VIX (the Chicago Board Options Exchange's volatility index) in the top panel, together with the cumulative loss differentials (QL, FZL and wCRPS) for the 2,000 out–of–sample observations (ranging from December 2008 to November 2016). The comparison is done between the Markov–switching and the single–regime GJR skewed Student–t models. A positive value indicates outperformance of the Markov–switching specification. A positive slope indicates outperformance at the corresponding date.