Leverage, asymmetry and heavy tails in the high-dimensional
factor stochastic volatility model∗
Mengheng Li†
Department of Econometrics, VU University Amsterdam, The Netherlands
Marcel Scharth‡
Discipline of Business Analytics, University of Sydney Business School, Australia
Working paper; this version: November 12, 2017
Abstract
There is a rich empirical literature that studies the stochastic volatility (SV) of univariate financial time series whose distribution exhibits asymmetry and heavy tails. Yet the literature focusing on high-dimensional SV models appears to be much scarcer, lacking a general modelling framework and efficient estimation method due to the “curse of dimensionality”. Our contribution is twofold. Firstly, we propose a flexible high-dimensional factor SV model with leverage effect, asymmetry and heavy tails based on errors following the generalised hyperbolic skew Student’s t-distribution. With shrinkage, the model leads to different parsimonious forms, and thus is able to disentangle systematic leverage effect and skewness from asset-specific ones. Secondly, we develop a highly efficient Markov chain Monte Carlo estimation procedure that analyses the univariate version of the model using efficient importance sampling. Extension to higher dimensions is straightforward via marginalisation of factors. Computational complexity is shown to be linearly scalable in the number of both factors and assets. We assess the performance of our proposed method via extensive simulation studies using both univariate and multivariate simulated datasets. Finally, we show that the model outperforms other factor models in terms of value-at-risk estimation and minimum-variance portfolio performance for a U.S. and an Australian portfolio.
Keywords: Markov chain Monte Carlo; Generalised hyperbolic skew Student’s t-distribution; Stochastic volatility; Metropolis-Hastings algorithm; Importance sampling; Particle filter; Particle Gibbs; State space model; Time-varying covariance matrix; Factor model
JEL Classification: C11; C32; C53; C55; G32
∗We would like to thank George Tauchen, Richard Gerlach, Gary Koop, Siem Jan Koopman, Frank Kleibergen, Lennart Hoogerheide, Robert Kohn, Charles Bos, Anne Opschoor, and seminar and workshop participants at The University of Sydney Business School, VU University Amsterdam, University of Amsterdam, Tinbergen Institute, the 10th International Conference on Computational and Financial Econometrics (Seville, 2016), the 10th Society of Financial Econometrics Annual Conference (New York, 2017), the 1st International Conference on Econometrics and Statistics (Hong Kong, 2017), and the 8th European Seminar on Bayesian Econometrics (Maastricht, 2017) for useful comments and helpful suggestions on previous versions of this paper. Any remaining errors are ours alone.
†Email: [email protected]; Contact author
‡Email: [email protected]
1 Introduction
Time-varying volatility and leverage effects, two of the so-called “stylised facts”, are often the
focus of research on time series of financial returns that are also believed to be asymmetrically
distributed with heavy tails. The rich literature studying financial time series provides strong
econometric evidence supporting such empirical findings. There are two major classes of models.
One is parameter-driven stochastic volatility (SV) models and the other is observation-driven
(generalised) autoregressive conditional heteroskedasticity (GARCH) models. Kim et al. (1998)
provides a classical comparison between the two classes of models in terms of filtering estimation
and forecasting performance. They find that the Gaussian SV model fits empirical data similarly to the GARCH model with Student’s t-errors. Carrasco and Chen (2002) derive detailed sta-
tistical properties of these two classes of models including mixing property and (un)conditional
distributions characterized by finite moments. Research over the last decade has shifted from statistical analysis to more detailed modelling techniques that aim at capturing
“stylised facts” including not only time-varying volatility but also leverage effects, left skewness
and heavy-tailedness of financial series. To this end, both new classes of SV and GARCH models
have been developed. Among many others, Shephard and Pitt (1997) and Durbin and Koopman (1997) develop similar simulated likelihood estimation procedures to estimate the SV model with Student’s t-errors. The observation-driven counterpart is the GARCH-t model first developed by Bollerslev (1987) three decades ago. The leverage effect corresponds to the negative correlation between past returns and future volatility. The GARCH-M model of French et al. (1987) and the
EGARCH model of Nelson (1991) extend the conditional structure of time-varying variance to
model the negative correlation. Koopman and Hol Uspensky (2002) and Yu (2005) discuss ways
of modelling leverage effect in SV models where the former also provides an efficient simulated
likelihood estimation method and the latter additionally shows that leverage effect may be the
cause of skewness of the return distribution.
A recent SV model proposed by Nakajima and Omori (2012) provides a modelling framework which nests time-varying volatility with leverage effect and heavy-tailed error distribution
with skewness based on a Gaussian mixture representation of Aas and Haff (2006)’s generalised
hyperbolic skew Student’s t-distribution. Our paper builds on these two strands of research, which inspire us to propose a new estimation procedure that can deliver more efficient
inference. The estimation of time-varying volatility models is straightforward if the model
is observation-driven, like all variants of GARCH models. It becomes more difficult if the
model is parameter-driven, i.e. SV models, which usually boils down to some non-linear state
space models without an analytical likelihood function. It is recognised that simulated likelihood is applicable to simple SV models, but that it suffers from a flat likelihood function, multimodality and other numerical issues when the model becomes more complex. In such cases, the Bayesian approach provides a sound alternative and is widely used because it offers standard sampling procedures and makes inference straightforward. Several ways of sampling the latent SV process
from its posterior distribution have been proposed, among which the multi-move sampler of
Shephard and Pitt (1997) and Watanabe and Omori (2004) and the auxiliary particle filter of
Pitt and Shephard (1999a) are the most widely used methods. For general discussion on Bayesian
estimation of SV models we refer to Jacquier et al. (2004) and the references therein. These
methods fall within the broader category of sequential Monte Carlo methods
detailed in Doucet et al. (2001). Another sampling method which this paper partially builds
upon is the efficient importance sampling (EIS) originally developed by Richard and Zhang
(2007). EIS is based on a carefully-constructed globally optimal importance density instead
of a locally optimal proposal which is used by the multi-move sampler and auxiliary particle
filter. Scharth and Kohn (2016) develop a highly efficient and stable algorithm called particle
efficient importance sampling (PEIS). As the name suggests, PEIS evaluates an intractable but
unbiasedly estimable likelihood function via combination of EIS and the sequential particle filter.
This paper refines PEIS in the context of a modified Gibbs sampler (Lindsten et al., 2014)
and applies it to model high-dimensional SV models, a field of research where literature appears
to be much scarcer than that on univariate SV models. Multivariate models with time-varying volatility are often difficult to estimate due to the “curse of dimensionality”, namely that the number of parameters grows rapidly with the number of assets. Corner-
stones of multivariate observation-driven time-varying volatility models include, but are not limited to, the constant conditional correlation (CCC) GARCH model of Bollerslev (1990), which models the time-varying covariance matrix with constant correlation among assets. Engle (2002) extends the CCC-GARCH model with dynamic conditional correlation (DCC) and shows its applicability
in terms of estimation and forecasting. A GARCH model with dynamic conditional structure
for the vectorized covariance matrix (VGARCH) is studied by Bollerslev et al. (1994). All these
types of models are widely available in commercial packages, but the dimension considered barely exceeds 20, except for the VGARCH model. Low-dimensional models are of little help to quantitative mutual funds or quant hedge funds (Dempster et al., 2008), which have continued to gain popularity in recent years thanks to advances in compu-
tational power. A report by Vardi (2015) finds that quant funds usually hold tens or even hundreds of positions in their portfolios, highlighting the need for a high-dimensional multivariate
model for risk and investment management. An attempt to achieve this by observation-driven
models comes from a new class of generalised autoregressive score (GAS) models developed by
Creal et al. (2012) and Oh and Patton (2017). Promising results and successful applications in
high-dimensional models have been documented.
In the parameter-driven world, univariate SV models can be extended to multivariate ones in a straightforward manner; however, difficult estimation usually hampers their practical use (Chib et al., 2009). In such cases, Bayesian estimation is typically employed.
For example, in low-dimensional models Danielsson (1998) and Asai et al. (2006) thoroughly
survey developments in sampling the latent volatility process with comparisons among different
model specifications. Liesenfeld and Richard (2006) apply EIS to a portfolio with four assets,
leaving high-dimensional applications to future research. As far as we know, Pitt and Shephard
(1999b) and Chib et al. (2006) are among the earliest who manage to model high-dimensional
financial time series with distinctive SV series pertaining to every individual equity return, and
they propose to model correlation via latent dynamic factors which also serve as systematic
measure of market movements. Nakajima (2015) extends the univariate model of Nakajima and
Omori (2012) to a factor-free high-dimensional framework, which addresses the leverage effect,
skewness and heavy tails of individual asset’s error distribution.
To resolve the dimensionality issue, this paper proposes a flexible high-dimensional factor
SV model. We address leverage effect and model asymmetry and heavy tails based on gen-
eralised hyperbolic skew Student’s t-error, which complements existing study and discussion.
Importantly, we introduce shrinkage to the model, resulting in automated model selection. The
resulting parsimonious form is expected to disentangle leverage effect and asymmetry in idiosyn-
cratic noise from those in the factors. A highly efficient Markov chain Monte Carlo estimation
procedure which uses EIS to exploit the Gaussian mixture representation of the error distribu-
tion is proposed to analyse the univariate version of the model. The sampling scheme for the full model is simplified via marginalisation of factors and boils down to the estimation of many uni-
variate series which can be done in parallel. As a result, the high-dimensional model is able to
achieve efficiency comparable to a univariate model. We assess the performance of our proposed
method via simulation studies with both univariate and multivariate simulated data. Finally
the model is applied to two portfolios consisting of equity returns from S&P100 and ASX50.
Comparisons with other factor models are carried out in terms of value-at-risk estimation and minimum-variance portfolio performance.
Our discussion is organized as follows. Section 2 introduces the model setting and our
proposed Bayesian estimation method including the use of EIS in the context of particle Gibbs
with ancestor sampling. Section 3 details the methods of evaluating marginal likelihood based on
an efficient particle filtering algorithm combined with importance sampling of hyperparameters.
Section 4 starts with a simulation study on the univariate model in comparison with the method
in Nakajima and Omori (2012). A simulation study on the high-dimensional factor model is then carried out to assess the estimation efficiency and the performance of the marginal likelihood and
Bayes factor criterion in choosing the right number of factors. Section 5 illustrates our empirical
application in VaR and dynamic portfolio management. We conclude in Section 6.
2 Model and Bayesian estimation
2.1 Univariate stochastic volatility model
Nakajima and Omori (2012) introduce the following univariate stochastic volatility model with
leverage using generalised hyperbolic skew Student’s t-error
$$\begin{aligned}
y_t &= \nu_t \exp(h_t/2), & t &= 1, \dots, T, \\
\nu_t &= \alpha + \beta W_t + \sqrt{W_t}\,\varepsilon_t, & t &= 1, \dots, T, \\
h_{t+1} &= \mu(1-\phi) + \phi h_t + \eta_t, & t &= 1, \dots, T-1, \\
\begin{pmatrix} \varepsilon_t \\ \eta_t \end{pmatrix} &\sim N\!\left(\begin{pmatrix} 0 \\ 0 \end{pmatrix}, \begin{pmatrix} 1 & \rho\sigma \\ \rho\sigma & \sigma^2 \end{pmatrix}\right), & t &= 1, \dots, T, \\
W_t &\sim IG\!\left(\frac{\zeta}{2}, \frac{\zeta}{2}\right), & t &= 1, \dots, T,
\end{aligned} \tag{1}$$
where yt is the time series of equity returns, and ht is the unobserved log-volatility modelled as
a stationary AR(1) process with initialisation $h_1 \sim N\big(\mu, \frac{\sigma^2}{1-\phi^2}\big)$, and νt follows the generalised
hyperbolic skew Student’s t-distribution. ρ models the leverage effects often found to be negative
in financial returns (Yu, 2005)1, which indicates that a drop in equity return likely leads to an
increase in its volatility. IG denotes the inverse Gamma distribution, and the mixing random
variable Wt is introduced to jointly model asymmetry and heavy tails in yt. We choose α =
−βζ/(ζ−2) so that E(νt) = 0 and restrict ζ > 4 to ensure νt has a finite variance. The skewness
1We adopt the definition of leverage effect in Yu (2005), i.e. the correlation between the idiosyncratic error νt and the SV innovation ηt. ρ itself is thus not the leverage effect.
and heavy-tailedness of νt are jointly determined by the asymmetric parameter β and degrees of
freedom ζ. Figure 1 shows different shapes of νt’s density for various β and ζ values.

Figure 1: Different density shapes of the generalised hyperbolic skew Student’s t-distribution. Left: varying β with ζ = 10; right: varying ζ with β = −2.

Readers
can refer to Aas and Haff (2006) for a detailed account of generalised hyperbolic skew Student’s
t-distribution including its density function fν , the p-th moment E(|ν|p), and an EM algorithm
for parameter estimation. β = 0 corresponds to a symmetric Student’s t-distribution for νt and
a standard normal distribution if ζ further becomes large. As argued by Aas and Haff (2006), a
unique feature of the model for νt is that in the tails
$$f_\nu(\nu) \propto |\nu|^{-\zeta/2-1}\exp(-|\beta\nu| + \beta\nu) \quad \text{as } \nu \to \pm\infty.$$
This means that fν has one heavy and one semi-heavy tail, unlike many other forms of the skew Student’s t-distribution, both of whose tails decay polynomially, making it an appealing model for financial data.
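For concreteness, the following minimal sketch simulates a series from model (1); it is our own illustration rather than estimation code from this paper, and all parameter values are arbitrary assumptions.

```python
# Minimal simulation sketch of model (1); parameter values are illustrative.
import numpy as np

def simulate_sv_ght(T, mu=-9.0, phi=0.95, sigma=0.15, rho=-0.4,
                    beta=-0.5, zeta=15.0, seed=0):
    rng = np.random.default_rng(seed)
    alpha = -beta * zeta / (zeta - 2.0)            # ensures E(nu_t) = 0
    # W_t ~ IG(zeta/2, zeta/2): reciprocal of a Gamma(zeta/2, rate zeta/2) draw
    W = 1.0 / rng.gamma(shape=zeta / 2.0, scale=2.0 / zeta, size=T)
    # jointly normal (eps_t, eta_t) with Cov(eps_t, eta_t) = rho * sigma
    cov = np.array([[1.0, rho * sigma], [rho * sigma, sigma ** 2]])
    eps, eta = rng.multivariate_normal([0.0, 0.0], cov, size=T).T
    h = np.empty(T)
    h[0] = rng.normal(mu, sigma / np.sqrt(1.0 - phi ** 2))  # stationary start
    for t in range(T - 1):
        h[t + 1] = mu * (1.0 - phi) + phi * h[t] + eta[t]
    nu = alpha + beta * W + np.sqrt(W) * eps
    y = nu * np.exp(h / 2.0)
    return y, h, W

y, h, W = simulate_sv_ght(T=2000)
```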
Assuming |φ| < 1 and E(|νt|p) exists, the unconditional p-th moment of yt in model (1) is
$$E(|y_t|^p) = \exp\Big(\frac{\sigma^2 p^2}{8(1-\phi^2)} + \frac{\mu p}{2}\Big)\, E(|\nu_t|^p).$$
Notice that model (1) implies conditional time-varying leverage effects. Given Wt, one has
$$\mathrm{Cov}(\nu_t, \eta_t \mid W_t) = \sqrt{W_t}\,\rho\sigma.$$
This means that if one interprets Wt as a “shock variable”, such a shock has a multiplicative effect on the leverage. In the Appendix we show that unconditionally the leverage effect Corr(νt, ηt) = Le(β, ζ)ρ has the following multiplier

$$Le(\beta, \zeta) = \frac{\Gamma\big(\frac{\zeta-1}{2}\big)}{\Gamma\big(\frac{\zeta}{2}\big)} \sqrt{\frac{(\zeta-2)^2(\zeta-4)}{2\zeta^2 + (4\beta^2 - 12)\zeta + 16}}, \qquad \zeta > 4. \tag{2}$$
Basic algebra shows Le(β, ζ) ∈ (0, 1) for all β ∈ ℝ and ζ > 4, with ∂Le/∂ζ > 0, ∂²Le/∂ζ² < 0, ∂Le/∂|β| < 0, and ∂²Le/∂β² < 0. Given β, when ζ becomes large the density of νt is less skewed and has lighter tails (Aas and Haff, 2006), so Le(β, ζ) tends to one and the leverage effect tends to ρ, similar to the case of a standard SV model with normal errors. Given ζ > 4, the magnitude of the leverage effect decreases to zero as |β| grows, even though ρ ≠ 0. This feature tells us that if the return innovation νt puts a large weight on the “shock variable” Wt (i.e. large |β|), the leverage effect vanishes.
We develop an MCMC algorithm which partially builds on Nakajima and Omori (2012) who
argue that the Gaussian variance-mean mixture representation of νt as the second line in model
(1) allows for a conditional sampler, but ours is believed to be more efficient and computationally
faster. The density functions of the inverse-gamma and normal distributions are log-linear in their parameters, which makes it possible to build a globally optimal importance density for Wt and ht via the EIS method of Richard and Zhang (2007). In Nakajima and Omori (2012), a modified multi-
move sampler from Shephard and Pitt (1997) is used to sample ht block-by-block conditional on
Wt with a local Laplace approximation to the posterior density (see also Watanabe and Omori
(2004) and Takahashi et al. (2009)). Later we show that efficiency is further improved with a
novel particle Gibbs algorithm based on the EIS importance density which samples ht and Wt
as a whole block. The next section details our MCMC algorithm.
2.2 Estimation of the univariate model
Let θ = (σ, ρ, φ, µ, β, ζ) collect the hyperparameters, and let x_{t1:t2} denote the history of a process xs from s = t1 to t2. The MCMC algorithm developed below boils down to a Metropolis-within-Gibbs procedure (e.g., Gilks et al. 1995; Geweke and Tanizaki 2001; Koop et al. 2007) which samples from the posterior distribution of (θ, h1:T, W1:T) | y1:T for model (1). The algorithm iterates over

1. sampling (h1:T, W1:T) | y1:T, θ;

2. sampling θ | y1:T, h1:T, W1:T.
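Structurally, this is a plain two-block Gibbs loop. The sketch below is our own pseudostructure, not the implementation used in this paper: sample_states and sample_theta are hypothetical user-supplied callables standing in for the EIS-based particle Gibbs step of Section 2.2.1 and the hyperparameter Metropolis-Hastings updates.

```python
# Structural sketch of the Metropolis-within-Gibbs loop; sample_states and
# sample_theta are hypothetical callables for the two Gibbs blocks.
def metropolis_within_gibbs(y, theta0, h0, W0, sample_states, sample_theta,
                            n_iter=22_000, burn_in=2_000):
    theta, h, W = theta0, h0, W0
    draws = []
    for it in range(n_iter):
        # Step 1: draw the latent paths (h_{1:T}, W_{1:T}) jointly as one block
        h, W = sample_states(y, theta, h, W)
        # Step 2: draw the hyperparameters given the latent paths
        theta = sample_theta(y, h, W, theta)
        if it >= burn_in:
            draws.append(theta)
    return draws
```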
2.2.1 Sampling (h1:T, W1:T) | y1:T, θ
We aim to improve efficiency by sampling the latent processes ht and Wt as one block. For notational simplicity, the dependence on θ is suppressed, and p(·) generically denotes a density function, possibly with a subscript indicating a specific distribution.
Model (1) is a non-linear non-Gaussian state space model, and reformulating the model to tackle the leverage effect gives

$$\begin{aligned}
y_t &= \big(\alpha + \beta W_t + \sqrt{W_t}\,\varepsilon_t\big)e^{h_t/2}, & t &= 1, \dots, T, \\
h_{t+1} &= \mu(1-\phi) + \phi h_t + \rho\sigma\varepsilon_t + \sqrt{1-\rho^2}\,\sigma\eta_t^*, & t &= 1, \dots, T-1,
\end{aligned}$$

where $\varepsilon_t = (y_t e^{-h_t/2} - \alpha - \beta W_t)/\sqrt{W_t}$, and $\eta_t^*$ is standard normal and independent of εt. We notice that εt ∈ Ft, where Ft is the filtration generated by both the observables y1:t and the unobservables h1:t and W1:t, such that the model is Markovian and yt forms a martingale difference sequence, allowing factorisation of the likelihood via likelihood contributions.
Introducing xt = (ht, Wt)′, the likelihood is given by the integral

$$L(y_{1:T}) = \int p(y_{1:T}, x_{1:T})\,dx_{1:T} = \int p(y_1|x_1)p(x_1) \prod_{t=2}^{T} p(y_t|x_t)\,p(x_t|x_{t-1}, y_{t-1})\,dx_{1:T}, \tag{3}$$

where the transition density for t = 2, ..., T follows

$$\begin{aligned}
p(x_t|x_{t-1}, y_{t-1}) &= p_N(h_t|h_{t-1}, y_{t-1}, W_{t-1})\,p_{IG}(W_t) \\
&= N\big(h_t;\ \mu(1-\phi) + \phi h_{t-1} + \rho\sigma\varepsilon_{t-1},\ (1-\rho^2)\sigma^2\big) \cdot IG\Big(W_t;\ \frac{\zeta}{2}, \frac{\zeta}{2}\Big).
\end{aligned} \tag{4}$$
The efficient high-dimensional importance sampling (EIS) method of Richard and Zhang (2007), further studied by e.g. Jung and Liesenfeld (2001) and Scharth and Kohn (2016), proposes the following importance sampler

$$q(x_{1:T}|y_{1:T}) = q(x_1|y_{1:T}) \prod_{t=2}^{T} q(x_t|x_{t-1}, y_{1:T}),$$

with the conditional density $q(x_t|x_{t-1}, y_{1:T})$ for t = 2, ..., T written as

$$q(x_t|x_{t-1}, y_{1:T}) = \frac{k_q(x_t, x_{t-1}; \delta_t)}{\chi_q(x_{t-1}; \delta_t)} \quad \text{with} \quad \chi_q(x_{t-1}; \delta_t) = \int k_q(x_t, x_{t-1}; \delta_t)\,dx_t.$$

$k_q(x_t, x_{t-1}; \delta_t)$ is a kernel in xt with integration constant $\chi_q(x_{t-1}; \delta_t)$, and δt is a set of importance parameters with every element being a function of y1:T. At the initial period, the importance density is simply

$$q(x_1|y_{1:T}) = \frac{k_q(x_1; \delta_1)}{\chi_q(\delta_1)} \quad \text{with} \quad \chi_q(\delta_1) = \int k_q(x_1; \delta_1)\,dx_1.$$
Using the above importance density, the likelihood (3) can be expressed as
where ϖ = (1−ρ²)σ², ϑ = ρσ, and 1(·) is an indicator function which equals one if the condition in brackets holds and zero otherwise. The joint prior π0(ϑ, ϖ) = π0(ϑ|ϖ)π0(ϖ) is a conjugate normal-inverse-gamma prior which facilitates the use of the shrinkage prior employed in the factor SV model in Section 2.3.1. The above prior distributions reflect popular choices in the SV literature. To compare the performance of different sampling schemes, we consider the following methods:
• EIS-PGAS: Our baseline method – particle Gibbs with ancestor sampling and EIS impor-
tance density;
• EIS-PG: Basic particle Gibbs with EIS importance density;
• BF-PGAS: Particle Gibbs with ancestor sampling using bootstrap filter;
• MM-MH: The method of Nakajima and Omori (2012) – multi-move sampler for ht, con-
ditional on which Wt is drawn via an accept-reject M-H algorithm.
While BF-PGAS uses 20,000 particles in the particle propagation, both EIS-PGAS and EIS-PG use only 10 particles. In total 22,000 samples for each parameter are drawn, with the initial 2,000 samples discarded as burn-in. We base our comparison on the inefficiency factor to check the efficiency of the different sampling schemes. The inefficiency factor for a parameter θ is defined as $IE(\theta) = 1 + 2\sum_{j=1}^{\infty}\rho_j(\theta)$, where ρj(θ) is the j-th sample autocorrelation. Chib (2001) shows that IE(θ) measures the degree of mixing of the Markov chain for θ|·. If IE(θ) = m, the MCMC algorithm requires m times as many draws as an independent sampler to achieve the same precision. We choose a Parzen window with bandwidth 1,000 to compute the inefficiency factor.
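For reference, a minimal implementation sketch (our own) of this Parzen-windowed inefficiency factor reads:

```python
# Inefficiency factor IE(theta) = 1 + 2*sum_j w_j*rho_j(theta), with the
# sample autocorrelations damped by a Parzen window of bandwidth B.
import numpy as np

def parzen_weight(j, B):
    a = j / B
    if a <= 0.5:
        return 1.0 - 6.0 * a ** 2 + 6.0 * a ** 3
    return 2.0 * (1.0 - a) ** 3

def inefficiency_factor(draws, bandwidth=1000):
    x = np.asarray(draws, dtype=float)
    x = x - x.mean()
    n, var = len(x), x.var()
    ie = 1.0
    for j in range(1, min(bandwidth, n - 1)):
        rho_j = np.dot(x[:-j], x[j:]) / ((n - j) * var)
        ie += 2.0 * parzen_weight(j, bandwidth) * rho_j
    return ie
```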
4.2 Estimation results
Figure 3 reports the sample autocorrelation functions (ACF), the Markov chain sample paths
and the posterior density estimates for one simulated series estimated by EIS-PGAS. Figures
obtained from other simulated series do not suggest qualitative differences. From a similar figure in Nakajima and Omori (2012), one can already see that the ACFs of the parameters estimated by EIS-PGAS decay much more quickly than those under MM-MH, especially for φ, β and ζ, implying the higher efficiency of EIS-PGAS.
Figure 3: EIS-PGAS MCMC results for a randomly chosen series simulated from model (1). From top to bottom: sample autocorrelations, sample paths, and posterior density estimates; from left to right: φ, σ, ρ, µ, β and ζ.
Table 1: MCMC Results of Different Methods for the Univariate Model

θ: Mean, St.dev., 95% C.I. and IE(θ), reported under EIS-PGAS and under MM-MH.

Figure 6: Posterior mean of factor and stochastic volatility processes. Left: SV exp(hj,t/2) from the j = 1, 4-th factors; Middle: volatility σi,t of three chosen asset returns; Right: implied time-varying correlations Corrij,t among the three asset returns (The Dow Chemical vs. Walgreens Boots, The Dow Chemical vs. PayPal, Walgreens Boots vs. PayPal).
The extracted factors are model-based, so one may be interested in their relationship with
market indicators such as the classic Fama-French factors (Fama and French, 1993). To examine
this, we run simple linear regression of the filtered estimate of four model-based factors on each
one of the three Fama-French factors, i.e. Rm-Rf, SMB, and HML during the same sample
period, and their t-statistics are shown in Figure 7. From the figure, it can be seen that the
variation of Rm-Rf, SMB and HML is explained by the second, the first and the third factor, respectively. There is, however, no statistical evidence that any Fama-French factor is jointly explained by multiple model-based factors6. Based on this, we conjecture that each extracted factor contains
unique market information and measures different systematic movement from the factors con-
structed ad hoc by Fama and French. This exercise can be extended to other market factors.
For example, the momentum factor in the four-factor model of Carhart (1997), an extra index describing the tendency of a stock price to keep moving in one direction, may capture systematic content outside the three Fama-French factors.
Figure 7: Explanatory content of the four model-based factors for the three factors of Fama and French (1993). From left to right are the t-statistics of regressions of the posterior mean of f′t|1:t−1 for all t on each of the three Fama-French factors: Rm-Rf, SMB, and HML. Red indicates a significant effect.
Table 6 shows the Bayes factor calculated via IS2 marginal likelihood for model specifications
with different numbers of factors. The numbers of factors under consideration are between 2 and 6, in line with the literature. The model with 4 factors is preferred over all other specifications, in particular over the models with 5 and 6 factors, which are the choices made by the ICp3 criterion of Bai and Ng (2002). The ICp1 criterion also delivers almost equal values for the specifications with 5 and 6 factors.
Via the use of IS2 for calculating the marginal likelihood, we can safely choose a model with 4
factors. Other comparisons show that the model with 3 factors is slightly preferred over 6-factor
model, and evidently preferred over the model with 5 factors.
6Notice that the model-based factors are identified up to a rotation, but joint significance is not affected by such rotations.
Table 6: Number of Factors Based on Marginal Likelihood

Jeffrey's scale   4/2   4/3   4/5   4/6   3/5   3/6   6/5
1-3.2              –     –     –     –     –     √     –
3.2-10             –     –     –     –     √     –     √
10-100             √     √     –     –     –     –     –
> 100              –     –     √     √     –     –     –

A √ indicates that the Bayes factor using the IS2 marginal likelihood for one choice of the number of factors against another falls into the corresponding category of Jeffrey's scale.
5.3 Dynamic portfolio management
To see how our proposed model and estimation methods may work in practice, we compare our model with five other models in terms of VaR and portfolio performance under both one-week and two-week rebalancing dynamics. Two portfolios are considered: (1) a U.S. portfolio, which is the
dataset used in the previous application, i.e. 80 equity return series from components of S&P
100 index; (2) an Australian portfolio containing 41 return series from components of the S&P ASX
50 index. A rolling window exercise of size T = 600 is carried out with S = 495 out-of-sample
trading weeks. Throughout the following, our model is abbreviated by HFSV.
5.3.1 Design and alternative models
We choose competing models which also adopt a factor structure, which is usually considered a viable modelling framework when high-dimensional datasets are of interest7.
The first model is the multivariate stochastic volatility model (MSV) of Chib et al. (2006).
The second model is the same model but augmented with stochastic jumps (MSV-J). MSV-J is
formulated as
$$y_t = \Lambda f_t + K_t q_t + u_t,$$

where factor fj,t is Gaussian with standard stochastic volatility, i.e. no leverage effect or asymmetry. The idiosyncratic term ui,t has a Student’s t error with standard stochastic volatility. Kt is a diagonal matrix recording the jump sizes at time t, and qi,t is a Bernoulli random variable indicating the occurrence of a jump. MSV
does not have the jump term. We replace the estimation method for stochastic volatility given
in the original paper by our EIS-PGAS algorithm.
7Comparisons with popular models such as the BEKK, DCC, CCC, DECO, VGARCH model, and variants of them are left out, because these models do not have a factor structure. Though comparisons with them are still interesting, readers may refer to the papers of our chosen competing models and references therein.
The third model, denoted by CKL, is the factor model of Chan et al. (1999), which writes

$$y_t = \Lambda f_t + u_t, \qquad \Omega_t = \Lambda V_t \Lambda' + U_t,$$

where ft is a vector of constructed (thus observed) factors, which in this exercise contains the three Fama-French factors8, i.e. ft = ((Rm-Rf)t, SMBt, HMLt)′. The covariance matrix Vt is computed on a rolling window, i.e. $V_t = \frac{1}{L}\sum_{l=t-L}^{t-1} f_l f_l'$, and Ut is the sample covariance matrix of the residuals from asset-by-asset regressions which deliver all rows of Λ.
The fourth model is the dynamic factor multivariate GARCH (DFMG) model of Santos and
Moura (2014). The model also uses constructed factors but is more flexible, and is given by

$$y_t = \Lambda_t f_t + u_t, \qquad \Omega_t = \Lambda_t V_t \Lambda_t' + U_t, \qquad \lambda_{k,t+1} = \lambda_{k,t} + \eta_t,$$

where λk,t is the k-th element of vec(Λt), k = 1, ..., p × n, which follows a random walk. Vt and Ut are diagonal matrices with each element evolving according to standard GARCH dynamics. ft and ut are assumed to be Gaussian and Student’s t, in line with the MSV and MSV-J models. To estimate the model, one first estimates the GARCH dynamics for ft and obtains Vt. Secondly, treating Λt as constant, Λ can be obtained by OLS. The residuals are then passed into a GARCH-t filter, delivering Ut. Thirdly, given Vt and Ut for t = 1, ..., T, vec(Λt) is obtained via the Kalman filter, and the covariance matrix of ηt is estimated by quasi-maximum likelihood.
The last model we consider is the factor copula (FCO) model of Oh and Patton (2017). This
model provides a novel way of modelling high-dimensional dependence structure and allows for
enough flexibility. The computational complexity of this model is comparable to that of a factor GARCH model such as DFMG. For ease of exposition, we leave out the model specification and forecasting procedure for FCO; readers may refer to the original paper. Also, we choose GARCH marginals and a Gaussian factor model for simplicity, which implies a Gaussian conditional copula.
We consider a basic dynamic minimum-variance portfolio (MVP) problem. The MVP is
dynamic because rebalancing is allowed, and the rebalancing decision is based on the filtered estimate of the portfolio's conditional covariance matrix. The MVP determines the n-by-1 portfolio
8The three factors for the U.S. portfolio are readily found online. We construct those for the Australian portfolio based on definitions in Fama and French (1993).
weights ωt+h|t at time t to rebalance at time t + h such that

$$\omega_{t+h|t} = \arg\min_\omega\ \omega'\Omega_{t+h|t}\omega, \quad \text{subject to } \omega'\iota = 1,$$

where ι is a vector of ones. The solution of this MVP problem is given by

$$\omega_{t+h|t} = \frac{\Omega_{t+h|t}^{-1}\iota}{\iota'\Omega_{t+h|t}^{-1}\iota}.$$
For the HFSV, MSV and MSV-J models, Ωt+h|t is obtained via the methods in Section 3.29. For the CKL model, Ωt+h|t is simply set equal to Ωt. For the DFMG and FCO models, it is straightforward to use a GARCH-like recursive algorithm to compute Ωt+h|t.
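The MVP solution is a one-liner in practice. The sketch below (our own, with an arbitrary 3 × 3 covariance forecast standing in for Ωt+h|t) computes the weights via a linear solve rather than an explicit inverse.

```python
# Minimum-variance weights omega = Omega^{-1} iota / (iota' Omega^{-1} iota).
import numpy as np

def min_variance_weights(Omega):
    iota = np.ones(Omega.shape[0])
    w = np.linalg.solve(Omega, iota)   # Omega^{-1} iota without explicit inverse
    return w / (iota @ w)

Omega = np.array([[0.04, 0.01, 0.00],    # illustrative covariance forecast
                  [0.01, 0.09, 0.02],
                  [0.00, 0.02, 0.16]])
w = min_variance_weights(Omega)
print(w, w.sum())                        # weights sum to one
```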
5.3.2 VaR and performance
One important task in portfolio management is the determination of portfolio VaR at time t+ 1
given information up to time t, or VaRp,t+1|t. Given the portfolio weights ωt+1|t solved from the MVP problem, the one-step-ahead VaR at the α% level is given by

$$VaR_{p,t+1|t}(\alpha) = \sqrt{\omega'_{t+1|t}\Omega_{t+1|t}\omega_{t+1|t}}\; F^{-1}_{y_{p,t+1|t}}(\alpha),$$

where $F^{-1}_{y_{p,t+1|t}}(\alpha)$ is the α-th percentile of the distribution function of the one-step-ahead predicted portfolio return $y_{p,t+1|t} = \omega'_{t+1|t}\, y_{t+1|1:t}$. For the HFSV, MSV and MSV-J models, the distribution function of yt+1|1:t can be readily estimated based on the particle system at time t as in equation (28). For the other models, the conditional forecasting density can be derived straightforwardly, similar to GARCH-type models.
The unconditional and conditional coverage ratio tests used by Chib et al. (2006) are applied to investigate the quality of the VaR estimates. We define the following binary sequence It:

$$I_t = \begin{cases} 1 & \text{if } \omega'_{t+1|t}\, y_{t+1} < VaR_{p,t+1|t}, \\ 0 & \text{if } \omega'_{t+1|t}\, y_{t+1} \geq VaR_{p,t+1|t}. \end{cases}$$
It = 1 indicates a hit or exception. Well-behaved VaR estimates mean that the sequence It should have the correct unconditional coverage ratio, i.e. E(It) = α. A likelihood ratio (LR) test based on the hit rate (HR) $\frac{1}{T}\sum_{t=1}^{T} I_t$ can be constructed for the unconditional coverage. According
9The sampler for MSV-J has one extra step to draw Kt and qt. See Chib et al. (2006) for details.
Table 7: Quality of VaR Estimates for the U.S. Portfolio

Table 8: Quality of VaR Estimates for the Australian Portfolio. The table shows p-values of coverage ratio tests for the Australian portfolio. Also see the descriptions of Table 7.
to Christoffersen (1998), for dynamic models the conditional coverage ratio is more relevant, which additionally depends on the serial independence of It. The test statistics LRuc for unconditional coverage and LRind for serial independence can both be constructed from It, and both are asymptotically χ²(1)-distributed. The combined statistic LRcc = LRuc + LRind for testing conditional coverage is asymptotically χ²(2)-distributed.
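A sketch of these coverage ratio tests (our own implementation of Christoffersen's (1998) statistics, which assumes the hit sequence contains both hits and non-hits so that no log of zero occurs) is given below.

```python
# Christoffersen (1998) unconditional, independence and conditional coverage
# LR tests for a 0/1 hit sequence I_t at nominal level alpha.
import numpy as np
from scipy.stats import chi2

def coverage_tests(I, alpha):
    I = np.asarray(I, dtype=int)
    T, n1 = len(I), I.sum()
    n0 = T - n1
    pi_hat = n1 / T
    # LR_uc: H0 is E(I_t) = alpha
    lr_uc = -2.0 * (n0 * np.log(1 - alpha) + n1 * np.log(alpha)
                    - n0 * np.log(1 - pi_hat) - n1 * np.log(pi_hat))
    # LR_ind: H0 is a serially independent hit sequence (first-order Markov)
    n00 = np.sum((I[:-1] == 0) & (I[1:] == 0))
    n01 = np.sum((I[:-1] == 0) & (I[1:] == 1))
    n10 = np.sum((I[:-1] == 1) & (I[1:] == 0))
    n11 = np.sum((I[:-1] == 1) & (I[1:] == 1))
    p01, p11 = n01 / (n00 + n01), n11 / (n10 + n11)
    p2 = (n01 + n11) / (n00 + n01 + n10 + n11)
    lr_ind = -2.0 * ((n00 + n10) * np.log(1 - p2) + (n01 + n11) * np.log(p2)
                     - n00 * np.log(1 - p01) - n01 * np.log(p01)
                     - n10 * np.log(1 - p11) - n11 * np.log(p11))
    lr_cc = lr_uc + lr_ind
    return {"LRuc": (lr_uc, chi2.sf(lr_uc, 1)),
            "LRind": (lr_ind, chi2.sf(lr_ind, 1)),
            "LRcc": (lr_cc, chi2.sf(lr_cc, 2))}
```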
Tables 7 and 8 report the p-values of the LR tests for the U.S. and Australian portfolios, with shaded cells indicating rejection at the 10% level. Comparing the HRs given by different models, HFSV is the most accurate in estimating VaR, except for the case of the Australian portfolio targeting VaR at the 1% nominal level. MSV is always less accurate than MSV-J, highlighting the need for modelling “jumps” or “shocks”. Though MSV incorporates Student’s t-distributed idiosyncratic errors, its performance implies that modelling only asset-specific “shocks” is insufficient. FCO also estimates
VaR well, perhaps with the exception of the U.S. portfolio targeting 5% nominal level, though
test results do not reject its validity.
Interestingly, all shaded cells come from either CKL or DFMG, both using constructed
factors. We conjecture this has to do with rebalancing: because one updates the portfolio weights based on the covariance matrix forecast, on which the estimation of VaR critically depends, constructed factors are proxies that may not adequately reveal the unobserved factor structure.
As a result, the forecast gets contaminated when a certain proportion of assets deviates from
factors. Additionally, HFSV is the only model taking into account asymmetry and leverage
effect, which are believed to influence HR.
Besides risk management, portfolio performance is also evaluated based on Sharpe ratio (SR)
and information ratio (IR). SR measures the risk-adjusted return per unit of portfolio return
variability. A portfolio that is rebalanced on an h-week basis has

$$SR(h) = \frac{\mu(h)}{\sigma(h)}, \quad \text{where } \mu(h) = \frac{1}{S-h}\sum_{s=1}^{S-h} \omega'_{T+s+h|T+s}\, y_{T+s+h},$$
$$\sigma^2(h) = \frac{1}{S-h}\sum_{s=1}^{S-h} \big(\omega'_{T+s+h|T+s}\, y_{T+s+h} - \mu(h)\big)^2.$$
IR is often used to set portfolio constraints for managers such as tracking risk limits. It measures
how much excess return can be generated from the amount of excess risk relative to a chosen
benchmark. Here we choose S&P 100 and ASX 50 index return as benchmark for the U.S. and
Australian portfolio. The IR is given by
IR(h) =µ(h)
σ(h), where µ(h) =
1
S − h
S−h∑s=1
ω′T+s+h|T+s(yT+s+h − µB,T+s+h),
σ2(h) =1
S − h
S−h∑s=1
(ω′T+s+h|T+s(yT+s+h − µB,T+s+h)− µ(h)
)2,
where µB,t is the benchmark return at time t.
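Given the realised out-of-sample portfolio returns ω′_{T+s+h|T+s} y_{T+s+h} and the benchmark returns, both ratios reduce to a few lines; in the sketch below (ours), port_ret and bench_ret are assumed inputs holding these two series over the same out-of-sample weeks.

```python
# Sharpe ratio and information ratio from realised weekly portfolio returns.
import numpy as np

def sharpe_ratio(port_ret):
    return np.mean(port_ret) / np.std(port_ret)

def information_ratio(port_ret, bench_ret):
    active = np.asarray(port_ret) - np.asarray(bench_ret)  # excess vs. benchmark
    return np.mean(active) / np.std(active)
```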
Model comparisons are carried out in terms of portfolio average weekly return, variance,
SR and IR for the out-of-sample period considered. Table 9 shows that for the U.S. portfolio
the equally weighted portfolio gives the highest variance and the lowest mean return. This
would suggest that the equally weighted portfolio is inefficiently managed and lies inside the conditional efficient frontier implied by the different models, in the bottom area of the
conditional feasible set10. This is in contrast to the Australian portfolio summarised in Table
10. Among all models, HFSV delivers the lowest portfolio return variance, and it is the only
10The efficient frontier and feasible set are conditional because the mean and covariance matrix of asset returns at time t + h are determined conditional on information up to time t.
Table 9: The U.S. Minimum-Variance Portfolio Performance

Table 10: The Australian Minimum-Variance Portfolio Performance. The table shows the MV portfolio performance for the Australian portfolio. Also see the descriptions of Table 9.
model achieving a mean return higher than the equally weighted portfolio under both weekly and biweekly rebalancing. This means that for the other models the equally weighted portfolio lies in the upper half of their conditional feasible sets. For the U.S. portfolio rebalanced weekly, HFSV delivers the second lowest variance, slightly higher than MSV-J. This is because the stochastic jumps in MSV-J absorb larger variations, though under a biweekly rebalancing policy its variance becomes larger than that of HFSV. Another observation is that the return variances
clearly fall in two groups. The first includes HFSV, MSV, MSV-J and FCO, whose factors are
model- and data-based. The second, showing larger variances, includes CKL and DFMG, which use constructed factors. This indicates that the conditional efficient frontier implied by the first group of models lies to the left of that implied by the second group.
Importantly, HFSV delivers the highest SR for the U.S. portfolio under both rebalancing
policies, with MSV-J as its main competitor. While the U.S. portfolio managed using HFSV compensates investors the most for the risk taken, the Australian portfolio rebalanced biweekly suggests the superior performance of HFSV in relation to the risk investors choose to take in deviating from the benchmark, i.e. a high IR. Yet for the U.S. portfolio, the MSV and MSV-J models give
the highest IR, followed by HFSV. FCO produces moderately-performing SR, but its deviation
from the benchmark fluctuates more, making its IR lower than other models with unobserved
factors. One should notice that because the choice of benchmark is subjective and influences the
calculation of IR, a low IR should not be seen as decisive evidence of poor model performance.
A final remark is that the SR and the IR are low because we only consider the MVP; that is, investors are assumed to be infinitely risk-averse. Should a certain degree of risk be allowed and a certain return be required, both ratios can increase.
6 Conclusion
We propose a high-dimensional factor stochastic volatility model with leverage effect using the
generalised hyperbolic skew Student’s t-error to address asymmetry and heavy tails of equity re-
turns. The model is shown to be flexible enough to distinguish asset-specific mean and volatility
dynamics from common factors. With the shrinkage technique, the model helps answer the question of whether the leverage effect and return asymmetry are systematic or idiosyncratic. A highly efficient
Bayesian estimation procedure to sample hyperparameters and unobserved volatility processes
is developed and we show that based on marginalisation of factors, factor loadings can be sam-
pled efficiently leading to a set of individual stochastic volatility models where particle efficient
importance sampling and refined particle Gibbs with ancestor sampling can be used. Addition-
ally, importance sampling squared accurately calculates marginal likelihood to determine the
number of factors. Our detailed Monte Carlo study on both univariate and multivariate models
provides evidence on the successful implementation of the proposed model and method. We
apply our model to a U.S. dataset with 80 assets, and find that a large proportion of return asymmetry comes from the factors, indicating systematic co-skewness. Lastly, minimum-
variance portfolio exercises for the U.S. portfolio and another Australian portfolio show that
estimation of VaR is very accurate using our proposed model. Under both weekly and biweekly
rebalancing policies, the model outperforms other factor models.
References
Aas, K. and Haff, I. H. (2006). The generalized hyperbolic skew Student’s t-distribution. Journalof Financial Econometrics, 4(2):275–309.
Aguilar, O. and West, M. (2000). Bayesian dynamic factor models and portfolio allocation.Journal of Business & Economic Statistics, 18(3):338–357.
Andrieu, C., Doucet, A., and Holenstein, R. (2010). Particle Markov chain Monte Carlo methods.Journal of the Royal Statistical Society: Series B (Statistical Methodology), 72(3):269–342.
Asai, M., McAleer, M., and Yu, J. (2006). Multivariate stochastic volatility: a review. Econo-metric Reviews, 25(2-3):145–175.
Bai, J. and Ng, S. (2002). Determining the number of factors in approximate factor models.Econometrica, 70(1):191–221.
Bickel, P., Li, B., Bengtsson, T., et al. (2008). Sharp failure rates for the bootstrap particlefilter in high dimensions. In Pushing the limits of contemporary statistics: Contributions inhonor of Jayanta K. Ghosh, pages 318–329. Institute of Mathematical Statistics.
Bollerslev, T. (1987). A conditionally heteroskedastic time series model for speculative pricesand rates of return. The review of economics and statistics, pages 542–547.
Bollerslev, T. (1990). Modelling the coherence in short-run nominal exchange rates: a multi-variate generalized arch model. The review of economics and statistics, pages 498–505.
Bollerslev, T., Engle, R. F., and Nelson, D. B. (1994). Arch models. Handbook of econometrics,4:2959–3038.
Carhart, M. M. (1997). On persistence in mutual fund performance. The Journal of finance,52(1):57–82.
Carrasco, M. and Chen, X. (2002). Mixing and moment properties of various garch and stochasticvolatility models. Econometric Theory, 18(01):17–39.
Chan, J. C., Leon-Gonzales, R., and Strachan, R. W. (2013). Invariant inference and efficientcomputation in the static factor model.
Chan, L. K., Karceski, J., and Lakonishok, J. (1999). On portfolio optimization: Forecastingcovariances and choosing the risk model. Review of Financial Studies, 12(5):937–974.
Chib, S. (2001). Markov chain Monte Carlo methods: computation and inference. Handbook ofeconometrics, 5:3569–3649.
Chib, S. and Greenberg, E. (1994). Bayes inference in regression models with ARMA (p, q)errors. Journal of Econometrics, 64(1-2):183–206.
Chib, S., Nardari, F., and Shephard, N. (2006). Analysis of high dimensional multivariatestochastic volatility models. Journal of Econometrics, 134(2):341–371.
Chib, S., Omori, Y., and Asai, M. (2009). Multivariate stochastic volatility. In Handbook ofFinancial Time Series, pages 365–400. Springer.
Chopin, N., Singh, S. S., et al. (2013). On the particle Gibbs sampler. CREST.
Christoffersen, P. F. (1998). Evaluating interval forecasts. International economic review, pages841–862.
Clyde, M. and George, E. I. (2004). Model uncertainty. Statistical science, pages 81–94.
Creal, D., Koopman, S. J., and Lucas, A. (2012). A dynamic multivariate heavy-tailed modelfor time-varying volatilities and correlations. Journal of Business & Economic Statistics.
Danielsson, J. (1998). Multivariate stochastic volatility models: estimation and a comparisonwith VGARCH models. Journal of Empirical Finance, 5(2):155–173.
De Jong, P. and Shephard, N. (1995). The simulation smoother for time series models.Biometrika, 82(2):339–350.
Del Moral, P. (2004). Feynman-Kac Formulae: Genealogical and Interacting Particle Systems with Applications. Probability and Its Applications. Springer.
Dempster, M. A. H., Pflug, G., and Mitra, G. (2008). Quantitative Fund Management. Chapmanand Hall/CRC.
Doucet, A., De Freitas, N., and Gordon, N. (2001). An introduction to sequential Monte Carlomethods. In Sequential Monte Carlo methods in practice, pages 3–14. Springer.
Doz, C., Giannone, D., and Reichlin, L. (2011). A two-step estimator for large approximatedynamic factor models based on Kalman filtering. Journal of Econometrics, 164(1):188–205.
Durbin, J. and Koopman, S. J. (1997). Monte Carlo maximum likelihood estimation for non-Gaussian state space models. Biometrika, 84(3):669–684.
Durbin, J. and Koopman, S. J. (2000). Time series analysis of non-Gaussian observations basedon state space models from both classical and Bayesian perspectives. Journal of the RoyalStatistical Society: Series B (Statistical Methodology), 62(1):3–56.
Durbin, J. and Koopman, S. J. (2012). Time series analysis by state space methods. Number 38.Oxford University Press.
Engle, R. (2002). Dynamic conditional correlation: A simple class of multivariate generalizedautoregressive conditional heteroskedasticity models. Journal of Business & Economic Statis-tics, 20(3):339–350.
Fama, E. F. and French, K. R. (1993). Common risk factors in the returns on stocks and bonds.Journal of financial economics, 33(1):3–56.
Forni, M., Hallin, M., Lippi, M., and Reichlin, L. (2012). The generalized dynamic factor model.Journal of the American Statistical Association.
French, K. R., Schwert, G. W., and Stambaugh, R. F. (1987). Expected stock returns andvolatility. Journal of financial Economics, 19(1):3–29.
Geweke, J. and Tanizaki, H. (2001). Bayesian estimation of state-space models using theMetropolis–Hastings algorithm within Gibbs sampling. Computational Statistics & Data Anal-ysis, 37(2):151–170.
Gilks, W. R., Best, N., and Tan, K. (1995). Adaptive rejection Metropolis sampling withinGibbs sampling. Applied Statistics, pages 455–472.
Jacquier, E., Polson, N. G., and Rossi, P. E. (2004). Bayesian analysis of stochastic volatilitymodels with fat-tails and correlated errors. Journal of Econometrics, 122(1):185–212.
Jung, R. C. and Liesenfeld, R. (2001). Estimating time series models for count data usingefficient importance sampling. AStA Advances in Statistical Analysis, 4(85):387–407.
Kim, S., Shephard, N., and Chib, S. (1998). Stochastic volatility: likelihood inference andcomparison with ARCH models. The Review of Economic Studies, 65(3):361–393.
Koop, G., Poirier, D. J., and Tobias, J. L. (2007). Bayesian econometric methods. CambridgeUniversity Press.
Koopman, S. J. and Hol Uspensky, E. (2002). The stochastic volatility in mean model: empiricalevidence from international stock markets. Journal of applied Econometrics, 17(6):667–689.
Liesenfeld, R. and Richard, J.-F. (2006). Classical and bayesian analysis of univariate andmultivariate stochastic volatility models. Econometric Reviews, 25(2-3):335–360.
Lindsten, F., Jordan, M. I., and Schon, T. B. (2014). Particle gibbs with ancestor sampling.Journal of Machine Learning Research, 15(1):2145–2184.
Nakajima, J. (2015). Bayesian analysis of multivariate stochastic volatility with skew returndistribution. Econometric Reviews, pages 1–23.
Nakajima, J. and Omori, Y. (2012). Stochastic volatility model with leverage and asymmetricallyheavy-tailed error using GH skew Student’s t-distribution. Computational Statistics & DataAnalysis, 56(11):3690–3704.
Nelson, D. B. (1991). Conditional heteroskedasticity in asset returns: A new approach. Econo-metrica: Journal of the Econometric Society, pages 347–370.
Oh, D. H. and Patton, A. J. (2017). Modeling dependence in high dimensions with factorcopulas. Journal of Business & Economic Statistics, 35(1):139–154.
Olsson, J. and Ryden, T. (2011). Rao-Blackwellization of particle Markov chain Monte Carlo methods using forward filtering backward sampling. IEEE Transactions on Signal Processing, 59(10):4606–4619.
Pitt, M. and Shephard, N. (1999b). Time varying covariances: a factor stochastic volatility approach. Bayesian Statistics, 6:547–570.

Pitt, M. K. and Shephard, N. (1999a). Filtering via simulation: Auxiliary particle filters. Journal of the American Statistical Association, 94(446):590–599.
Richard, J.-F. and Zhang, W. (2007). Efficient high-dimensional importance sampling. Journalof Econometrics, 141(2):1385–1411.
Ruiz, E. (1994). Quasi-maximum likelihood estimation of stochastic volatility models. Journalof econometrics, 63(1):289–306.
Santos, A. A. and Moura, G. V. (2014). Dynamic factor multivariate garch model. ComputationalStatistics & Data Analysis, 76:606–617.
Scharth, M. and Kohn, R. (2016). Particle efficient importance sampling. Journal of Economet-rics, 190(1):133–147.
Shephard, N. and Pitt, M. K. (1997). Likelihood analysis of non-Gaussian measurement time series. Biometrika, 84(3):653–667.
Snyder, C., Bengtsson, T., Bickel, P., and Anderson, J. (2008). Obstacles to high-dimensionalparticle filtering. Monthly Weather Review, 136(12):4629–4640.
Takahashi, M., Omori, Y., and Watanabe, T. (2009). Estimating stochastic volatility modelsusing daily returns and realized volatility simultaneously. Computational Statistics & DataAnalysis, 53(6):2404–2426.
Tran, M.-N., Scharth, M., Pitt, M. K., and Kohn, R. (2014). Importance sampling squared forbayesian inference in latent variable models. Available at SSRN 2386371.
Vardi, N. (2015). Top quant hedge funds stand out with good 2015.
Watanabe, T. and Omori, Y. (2004). A multi-move sampler for estimating non-Gaussian time series models: Comments on Shephard & Pitt (1997). Biometrika, pages 246–248.
Wright, S. and Nocedal, J. (1999). Numerical optimization. Springer Science, 35:67–68.
Yu, J. (2005). On leverage in a stochastic volatility model. Journal of Econometrics, 127(2):165–178.
Appendices
A The leverage effect multiplier
The leverage effect for the univariate SV model (1) is $\mathrm{Corr}(\nu_t, \eta_t) = \mathrm{Cov}(\nu_t, \eta_t)/\sqrt{\mathrm{Var}(\nu_t)\mathrm{Var}(\eta_t)}$, where the numerator is

$$\mathrm{Cov}(\nu_t, \eta_t) = E\big(\sqrt{W_t}\big)\,\rho\sigma.$$
Since Wt ∼ IG(ζ/2, ζ/2), (1/ζ)Wt is IG(ζ/2, 1/2)-distributed, or Inv-χ²(ζ)-distributed. Let $\tilde W_t = \sqrt{W_t}$; then $\frac{1}{\zeta}\tilde W_t^2 \sim \text{Inv-}\chi^2(\zeta)$ with Jacobian $\frac{2}{\zeta}\tilde W_t$. It follows that

$$\begin{aligned}
E(\tilde W_t) &= \int_0^\infty \frac{2}{\zeta}\tilde W_t^2\,\frac{2^{-\zeta/2}}{\Gamma(\zeta/2)}\,\zeta^{\frac{\zeta+2}{2}}\,\tilde W_t^{-(\zeta+2)}\exp\Big(\frac{-\zeta}{2\tilde W_t^2}\Big)\,d\tilde W_t \\
&= \frac{\sqrt{\zeta}}{2^{\zeta/2-1}\Gamma(\zeta/2)}\int_0^\infty y^{-\zeta}\exp\Big(\frac{-1}{2y^2}\Big)\,dy \\
&= \frac{\sqrt{\zeta}}{2^{\zeta/2-1}\Gamma(\zeta/2)}\int_0^\infty 2^{\zeta/2-3/2}\,z^{\frac{\zeta-1}{2}-1}\exp(-z)\,dz \\
&= \frac{\sqrt{\zeta}\,\Gamma\big(\frac{\zeta-1}{2}\big)}{\sqrt{2}\,\Gamma(\zeta/2)},
\end{aligned}$$
where we use the substitutions $y \equiv \frac{1}{\sqrt{\zeta}}\tilde W_t$ and $z \equiv \frac{1}{2}y^{-2}$. In the denominator, the variance of the generalised hyperbolic skew Student’s t-distributed error νt is given by Aas and Haff (2006) (in their parametrisation δ² and v are both equivalent to our ζ), i.e.

$$\mathrm{Var}(\nu_t) = \frac{2\beta^2\zeta^2}{(\zeta-2)^2(\zeta-4)} + \frac{\zeta}{\zeta-2}.$$
With these quantities, the unconditional leverage effect multiplier can be shown to be the one given in Section 2.1.
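As a sanity check (our own, with ζ = 10 as an arbitrary example value), the closed form for E(√Wt) can be verified against a Monte Carlo average of inverse-gamma draws:

```python
# Numerical check of E(sqrt(W_t)) = sqrt(zeta) * Gamma((zeta-1)/2)
#                                   / (sqrt(2) * Gamma(zeta/2)).
import numpy as np
from scipy.special import gammaln

zeta = 10.0
rng = np.random.default_rng(1)
W = 1.0 / rng.gamma(shape=zeta / 2.0, scale=2.0 / zeta, size=1_000_000)
mc = np.sqrt(W).mean()
closed = np.sqrt(zeta / 2.0) * np.exp(gammaln((zeta - 1) / 2) - gammaln(zeta / 2))
print(mc, closed)   # the two values should agree to roughly three decimals
```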
B Discussion on two theorems
Theorem 1 can be proved following the arguments in Lindsten et al. (2014) who show the
invariance between PGAS with a bootstrap filter and the one without AS. It might be of interest
to establish the equivalence between EIS-PGAS and EIS-PG (which is also an alternative proof).
One can notice that they are not the same as with the bootstrap filter, where the importance density
is simply $p(x_t|x_{t-1})$ and thus the sampling weights are proportional to $p(y_t|x_t^i)$, i.e. independent of the ancestor trajectories. Using the notation of Section 2.2 and letting $L_\theta^M(x^*_{1:T}, B)$ for M ≥ 0 denote the Markov kernel implied by EIS-PG on $(X^*_{1:T}, \mathcal{F}_{1:T})$, we propose the following

Proposition 1. Suppose EIS is the importance density used in the particle filtering for both PGAS and PG, i.e. $q(x_t|x_{t-1}, y_{1:T})$ as in (8). Then for any $x^*_{1:T} \in X^*_{1:T}$,

$$K_\theta^M(x^*_{1:T}, B) = L_\theta^M(x^*_{1:T}, B), \qquad \forall B \in \mathcal{F}_{1:T}.$$
To proceed, first suppose the final product of EIS-PGAS is the k-th chosen sample $x^k_{1:T}$. The kernel is then given by

$$K_\theta^M(x^*_{1:T}, B) = E_{\theta, x^*_{1:T}}\big[\mathbb{1}_B(x^k_{1:T})\big] = E_{\delta_{1:T}}\big[\mathbb{1}_B(x^k_{1:T})\big].$$

The expectation is with respect to all random numbers generated in the algorithm, i.e. $(x_{1:T}, a_{2:T}, k) \in \mathbb{R}^{2T} \times \mathbb{N}_{>0}^{2(T-1)+1}$. The last equality comes from the fact that their distribution function is defined by the EIS importance parameter vector $\delta_t = (b_t, c_t, s_t, r_t)$11 as in (6), which is identical for both samplers.
Following Lindsten et al. (2014), the ancestor index can be written recursively as $\alpha_t = a_{t+1}^{\alpha_{t+1}}$ going backward from $\alpha_T = k$. Without loss of generality we take the measurable rectangle set $B = \prod_{t=1}^{T} B_t$ with $B_t \in \mathcal{F}_t$ for all t = 1, ..., T, where $\mathcal{F}_t$ is the natural filtration, i.e. B is a π-system generating $\mathcal{F}_{1:T}$. So we can write the two kernels as

$$K_\theta^M(x^*_{1:T}, B) = E\Big(\prod_{t=1}^{T} \mathbb{1}_{B_t}(x_t^{\alpha_t}) \,\Big|\, \delta_t\Big), \quad \text{and} \quad L_\theta^M(x^*_{1:T}, B) = E\Big(\prod_{t=1}^{T} \mathbb{1}_{B_t}(x_t^{\beta_t}) \,\Big|\, \delta_t\Big).$$
It suffices to show that for all bounded and multiplicative functionals $f(x_{1:T}) = \prod_{t=1}^{T} f_t(x_t)$ we have $E_{\delta_{1:T}}\big(f(x_{1:T}^{\alpha_{1:T}})\big) = E_{\delta_{1:T}}\big(f(x_{1:T}^{\beta_{1:T}})\big)$, because the EIS-PG sampler is essentially a backward simulator running forward. This can be established via backward induction according to Olsson and Ryden (2011) and Lindsten et al. (2014). Suppose the claim holds for t < T and s > t, i.e.

$$E\Big(\prod_{s=t+1}^{T} f_s(x_s^{\alpha_s}) \,\Big|\, \delta_s\Big) = E\Big(\prod_{s=t+1}^{T} f_s(x_s^{\beta_s}) \,\Big|\, \delta_s\Big).$$
The induction hypothesis can be shown to hold following the equivalence between a backward
11And they are determined by the previous draw in the MCMC run, i.e. the reference trajectory x*1:T, and the other hyperparameters θ.
simulator and a bootstrap filter in Olsson and Ryden (2011). To see this, remember that both EIS-PG and EIS-PGAS choose $\chi_q(x_T; \delta_{T+1}) = 1$. Since this choice is arbitrary, we can make $\delta_T$ contain all zeros. As a result, $k_q(x_T, x_{T-1}; \delta_T) = p(x_T|x_{T-1}, y_{T-1})$; namely, this choice also makes $\omega_T^i$ for i = 1, ..., M + 1 proportional to $p(y_T|x_T)$, so $\alpha_T$ and $\beta_T$ are equally distributed. Using the arguments in the Appendix of Lindsten et al. (2014) and their instrumental representation of PGAS, the induction can be completed.
Proposition 1 shows that the kernels defined by EIS-PGAS and EIS-PG are equivalent. It is also interesting to see how EIS-PGAS improves the mixing of the MCMC relative to PGAS with a bootstrap filter. We do not attempt a formal proof, but from equation (11) one can see that the smaller the variance of $\omega_{t-1}^i$, the larger the probability $p(a_t^{M+1} \neq M+1)$, i.e. the probability of the ancestor path of the reference trajectory being different from its original one. EIS is designed to minimise the variance of the logarithm of the importance weights (see equations (9) and (10)), so it is expected to be nearly optimal in maximising $p(a_t^{M+1} \neq M+1)$.
Theorem 2 bounds the total variation distance between $K_\theta^M(x_{1:T}^*, B)$ and $\int_B p(x_{1:T}|y_{1:T})\,dx_{1:T}$ under the assumption that all importance weights are bounded from above by a constant $\omega_\theta < \infty$. The proof follows from Doeblin's theorem, and uniform ergodicity can be established (Doucet et al., 2001). One may argue that this upper-bound condition on the importance weights is too strong in practice (otherwise the particle system would never degenerate). A more natural condition is to bound the variance of the importance weights by a constant. This applies particularly to our case because the EIS importance density minimises the quadratic distance to the target density. We conjecture that the quadratic Kantorovich distance between $(K_\theta^M)^n(x_{1:T}^*, \cdot)$ and any PG kernel without EIS remains positive even as $n \to \infty$.
C Monte Carlo study of the factor SV model
This section details a simulation study on the high-dimensional factor SV model. As shown in
Section 2, the factor SV model with n assets and p factors boils down to n + p individual SV
models which can be analysed in parallel once the factors and factor loadings are sampled. We
show that the multivariate model is able to achieve efficiency comparable to a univariate model
as expected, especially with the marginalisation of factors and sampling factor loadings Λ based
on a Laplace approximation. In practice, it is important to apply the right degree of shrinkage
on leverage effect and skewness and to determine the number of factors. We demonstrate the
effectiveness and efficiency of picking the right model using the IS2 with PEIS method of Tran
et al. (2014) and Scharth and Kohn (2016) applied to our model.
C.1 Model setup
Our baseline model has 50 assets with 8 factors, the same dimensionality as the model of Chib et al. (2006), but note that our model has more than a thousand parameters to estimate. One feature of our model is the shrinkage on the leverage effect and skewness, so we also consider a DGP without leverage effect or skewness, as well as a DGP with non-zero leverage effect and skewness for all factors and asset-specific processes, i.e. containing p + n non-zero leverage effect parameters ρ and skewness parameters β. We denote the DGPs as follows:
• sLE sSK: some have leverage effect, and some have skewness;
• sLE aSK: some have leverage effect, and all have skewness;
• aLE sSK: all have leverage effect, and some have skewness;
• aLE aSK: all have leverage effect and skewness;
• nLE nSK: none has leverage effect or skewness.
“Some”, “all” and “none” in the above definitions refer to the p + n univariate series of the factor and asset-specific processes, i.e. $\{f_{j,t}\}_{j=1}^{p}$ and $\{u_{i,t}\}_{i=1}^{n}$ for $t = 1, \dots, T$. For example, sLE sSK means that the simulated dataset has non-zero leverage effect and skewness in some of the p + n series, while all of the series in the dataset aLE sSK have a leverage effect but only some of them have skewness.
When a dataset has leverage effect or skewness in some of the p + n univariate series, a random vector is generated from a binomial distribution with success probability 0.5 and p + n trials, which serves as an index vector indicating which series have leverage effect or skewness. Accordingly, we choose beta priors for the shrinkage parameters introduced in Section 2.3.1,
\[
\Delta_\vartheta \sim \mathrm{Beta}(2, 2), \qquad \Delta_\beta \sim \mathrm{Beta}(2, 2).
\]
We assume a flat normal prior for the free elements of Λ, i.e. $\lambda_{ij} \sim N(0, 10)$, but we generate those elements for the simulation study from $N(1, 1)$ so that the prior is effectively non-informative. The other hyperparameters are generated from their prior distributions given in (29), except that only negative β's (if not zero) are retained. This design aims to reflect the dynamics and stylised facts of daily equity returns.
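A minimal sketch of this simulation design, with hypothetical parameter values standing in for draws from the priors in (29), might look as follows.

import numpy as np

rng = np.random.default_rng(42)
n, p, T = 50, 8, 2000                  # assets, factors, time points

# Index vectors: which of the p + n series receive leverage effect / skewness
has_leverage = rng.binomial(1, 0.5, size=n + p).astype(bool)
has_skewness = rng.binomial(1, 0.5, size=n + p).astype(bool)

# Leverage and (negative-only) skewness parameters; the ranges are our own
# illustrative choices, not the paper's prior hyperparameters
rho = np.where(has_leverage, rng.uniform(-0.7, -0.2, n + p), 0.0)
beta = np.where(has_skewness, -np.abs(rng.normal(1.0, 0.5, n + p)), 0.0)

# Free loadings drawn from N(1, 1) so that the flat N(0, 10) prior is
# effectively non-informative
Lam = rng.normal(1.0, 1.0, size=(n, p))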
The method of Chib and Greenberg (1994) samples the factor loadings Λ with the factors $f_t$ for $t = 1, \dots, T$ marginalised out. They compare their posterior output to the results obtained by conditioning on the factors as in Pitt and Shephard (1999b) and Aguilar and West (2000). They show that in the case of 4 factors, the sampling of Λ can be 20 to 40 times more efficient than methods which sample Λ either by column or by row conditional on the factors, as measured by the inefficiency factor; in the case of 8 factors, the efficiency gain can be 80-fold. We apply their idea of using an MH sampler based on a Laplace approximation of the conditional posterior distribution of Λ, so we can expect similar efficiency gains from the marginalisation of factors. In what follows, we therefore do not compare differences in sampling efficiency resulting from the marginalisation of factors; instead, we focus on the effect of the EIS proposal and the ancestor sampling used in the particle Gibbs algorithm, similar to our simulation study of the univariate SV model. In the next subsections, four estimation methods, i.e. EIS-PGAS, EIS-PG, BF-PGAS and MM-MH, implemented to analyse the p + n individual SV models, are considered and compared.
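To illustrate this sampling step, here is a minimal sketch of an MH update based on a Laplace approximation; the callable log_post stands in for the conditional log posterior of the vectorised free loadings (factors marginalised out) and is a hypothetical placeholder, not the density derived in the paper.

import numpy as np
from scipy.optimize import minimize

def laplace_mh_step(lam_current, log_post, rng):
    # Laplace approximation: mode and curvature of the conditional log posterior
    res = minimize(lambda v: -log_post(v), lam_current, method="BFGS")
    mode = res.x
    cov = res.hess_inv + 1e-8 * np.eye(len(mode))   # inverse Hessian as covariance
    chol = np.linalg.cholesky(cov)
    # Independence MH proposal centred at the mode
    proposal = mode + chol @ rng.standard_normal(len(mode))

    def log_q(v):                                   # proposal log density (up to a constant)
        z = np.linalg.solve(chol, v - mode)
        return -0.5 * z @ z

    log_accept = (log_post(proposal) - log_post(lam_current)
                  + log_q(lam_current) - log_q(proposal))
    return proposal if np.log(rng.uniform()) < log_accept else lam_current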
We simulate each dataset with length T = 2000. Figure 8 illustrates the 1st and 50th simulated return series, i.e. $y_{1,t}$ and $y_{50,t}$, as well as the 1st and 8th factors, i.e. $f_{1,t}$ and $f_{8,t}$, with their respective SV processes $l_{1,t}$, $l_{50,t}$, $h_{1,t}$ and $h_{8,t}$. Applying the initialisation and MCMC
Figure 8: Simulated return series and factors with their respective log-volatility from model (16). Upper panels: return series $y_{1,t}$, $y_{50,t}$ and factors $f_{1,t}$, $f_{8,t}$; lower panels: log-volatility series $l_{1,t}$, $l_{50,t}$, $h_{1,t}$ and $h_{8,t}$.
algorithm detailed in Section 2, we run the sampler for 22,000 iterations for posterior inference, with the first 2,000 burn-in samples discarded. In our experiments with EIS-PGAS, the number of MCMC iterations can be safely halved without much difference in posterior statistics or efficiency. But in order to have reliable posterior comparisons with the other three methods, we keep the number of iterations at 22,000, anticipating different degrees of sampling inefficiency for EIS-PG, BF-PGAS and MM-MH.
C.2 Estimation results
Firstly, we discuss some estimation results from applying our proposed method, EIS-PGAS, to the most interesting dataset, sLE sSK. Figure 9 reports the posterior means and sample standard deviations of the hyperparameters related to the 58 SV models for $\{f_{j,t}\}_{j=1}^{8}$ and $\{u_{i,t}\}_{i=1}^{50}$ (i.e. all parameters except for Λ), together with their true DGP values. The top three graphs, from left to right, show the results for φ, σ and ρ, while the bottom three graphs, from left to right, show the results for µ, β and ζ. All x-axes correspond to the 58 individual SV models, with the first 8 coordinates indicating the respective factors and the rest relating to the asset-specific processes. We represent the true DGP value and posterior mean of each parameter using a pair of line graphs with values shown on the left y-axis of each graph, while the sample standard deviations are given by the scatter plot with values indicated by the right y-axis.
Figure 9: EIS-PGAS estimated posterior means and standard deviations of stochastic volatility model parameters for dataset sLE sSK. (i): φ; (ii): σ; (iii): ρ; (iv): µ; (v): β; (vi): ζ. Coordinates 1 to 8 on all x-axes indicate factors $f_{j,t}$ for $j = 1, \dots, 8$ and the rest correspond to $u_{i,t}$ for $i = 1, \dots, 50$. Left y-axes: parameter values; right y-axes: sample standard deviations.
The results suggest that EIS-PGAS can estimate all SV hyperparameters accurately and efficiently. The posterior means of the autoregressive parameter φ, the volatility-of-volatility parameter σ and the unconditional mean of log-volatility µ are close to their true DGP values, especially for the factors. One or two µ's and σ's may be deemed slightly away from their DGP values, but with the standard deviations taken into account, these deviations, which result from a specific simulated sample path, are reasonably small.
From the bottom right graph of Figure 9 one can notice some discrepancies between the posterior means of the ζ's and their true DGP values, and for some of the SV models the sample standard deviations of the ζ's obtained from the Markov chain are also relatively high. The d.o.f. parameters ζ are probably the poorest estimated among all parameters, a result in line with Nakajima and Omori (2012), who apply MM-MH to model (1). In Section 4.1, however, we show that EIS-PGAS is significantly more efficient than the three alternative methods, as seen in Tables 1 and 2. Of particular interest is the effect of the shrinkage prior assumed for the leverage effect parameter ρ and the skewness parameter β. The shrinkage is expected to detect zero leverage effect and skewness in the dataset sLE sSK automatically, similar to the variable selection case discussed by Clyde and George (2004). The vertical lines in the top right and bottom middle graphs of Figure 9 indicate zero leverage effect or skewness for a particular individual SV process. It can be seen that whenever the DGP value is zero, EIS-PGAS effectively gives a zero posterior mean. This confirms that the shrinkage priors help determine zero leverage effect and skewness and consequently make the model more parsimonious. However, from the first column of Figure 11, which shows the posterior probability of a zero parameter estimated by EIS-PGAS, we can also see that all ρ's in the upper row and β's in the lower row are “forced” to collapse towards zero, causing some near-zero leverage effect and skewness parameters to be shrunk to zero. But we find that the cost of this slight over-shrinkage is minor when applying IS2 to calculate the marginal likelihood of a dataset and the associated Bayes factors.
We report the posterior results for the 1st, 4th, 6th and 8th factor loadings, i.e. the respective columns of Λ, in Figure 10. True DGP values and posterior means are illustrated by the bar charts with values corresponding to the left y-axis, while sample standard deviations are shown by the scatter plots with values on the right y-axis. It is easy to see that EIS-PGAS is able to estimate the factor loadings very accurately with a flat prior. Though our proposed factor SV model takes a much more complex form than that of Chib et al. (2006), we reach the same conclusion that the estimation efficiency for the factor loadings is mainly due to the marginalisation of factors when sampling the loading matrix Λ based on a Laplace approximation. Furthermore, it is not affected by the presence of the leverage effect, skewness and heavy-tailedness modelled in the factor dynamics.

Table 11 shows the correlation between the posterior means of a vector of parameters and their true DGP values, with the mean absolute deviation in brackets as a measure of estimation accuracy. The first row of the first panel in Table 11 shows these statistics for EIS-PGAS applied to sLE sSK. We can see that except for the ζ's, which have a correlation coefficient of 0.85,
Figure 10: EIS-PGAS estimated posterior means and standard deviations of factor loadings for dataset sLE sSK. From top to bottom: loadings on the 1st, 4th, 6th and 8th factor. Left y-axes: parameter values; right y-axes: standard deviations.
all parameters are highly correlated with their DGP counterparts, with correlations above 0.94. This suggests that EIS-PGAS is capable of sampling the parameters related to both the factor and asset-specific processes accurately from the joint posterior distribution.
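The accuracy metrics in Table 11 are straightforward to compute from the MCMC output; a minimal sketch, assuming arrays of posterior means and DGP values for one parameter vector, is given below.

import numpy as np

def accuracy_metrics(posterior_means, dgp_values):
    # Correlation and mean absolute deviation between posterior means and
    # true DGP values for one parameter vector (e.g. all 58 phi's)
    corr = np.corrcoef(posterior_means, dgp_values)[0, 1]
    mad = np.mean(np.abs(posterior_means - dgp_values))
    return corr, mad

# Hypothetical example with 58 phi estimates
rng = np.random.default_rng(1)
phi_true = rng.uniform(0.95, 0.99, 58)
phi_hat = phi_true + rng.normal(0.0, 0.01, 58)
print(accuracy_metrics(phi_hat, phi_true))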
C.3 Comparisons among methods
Among the four estimation methods, BF-PGAS is the easiest to implement, while the methods involving EIS are more complicated, as one needs to build p + n importance densities for all SV series and the inverse gamma mixture component in each run of the MCMC sampler. The MM-MH method of Nakajima and Omori (2012) is built on the classic multi-move sampler of Shephard and Pitt (1997) and works satisfactorily in the univariate case, as demonstrated in Section 4.1. The following shows the estimation efficiency and accuracy for the multivariate extension.
Table 11 summarises the correlation between the posterior means of all parameters estimated by the four methods and their DGP values under the different datasets. The mean absolute deviations between the parameter estimates and their DGP values are also reported in the table. Both statistics serve as metrics for accuracy. EIS-PGAS and EIS-PG achieve the highest correlations for the parameter estimates, and under all datasets EIS-PGAS works better than the others, with only two correlation coefficients below 0.9 and none smaller than 0.8. The mean absolute deviations given by these two methods are also the smallest among the four, especially for the d.o.f. parameter ζ, which is the poorest estimated parameter for all methods and datasets. For
example, under the dataset sLE sSK, the mean absolute deviation for ζ given by these two methods is half of that given by MM-MH, and one fifth of that given by BF-PGAS. It is thus evident that the EIS part of the algorithm contributes to the estimation accuracy.
Ancestor sampling also improves accuracy slightly, as EIS-PGAS gives slightly smaller mean absolute deviations than EIS-PG in most cases, except for ρ under aLE sSK. Further evidence that ancestor sampling may help improve accuracy is that the correlation coefficients given by EIS-PGAS seem to fluctuate less across datasets than those given by EIS-PG, and so do the mean absolute deviations. Though Tables 1 and 2 show that for a univariate SV model the ancestor sampling algorithm renders estimates more accurate, further study is needed to pin down its effect on accuracy for the high-dimensional factor SV model. In contrast to its performance in estimating a univariate model, MM-MH does not provide correlation coefficients as high as EIS-PG(AS). The autoregressive coefficient φ, which is relatively easy to estimate, shows a correlation lower than 0.9 under sLE sSK and sLE aSK. The correlation for the unconditional mean µ is also lower than 0.9 under aLE sSK and aLE aSK, while under all datasets EIS-PG(AS) is able to estimate µ with a correlation always higher than 0.9. In terms of ζ, MM-MH is less than satisfactory, with the highest correlation smaller than 0.85 and the lowest not exceeding 0.7. The mean absolute deviations also suggest that MM-MH is outperformed by EIS-PG(AS). For example, under aLE sSK, MM-MH gives a mean absolute deviation for µ larger than 1, whereas it is only 0.28 and 0.29 for EIS-PGAS and EIS-PG respectively.
BF-PGAS is the worst performing estimation method, with correlation coefficients much lower and mean absolute deviations much higher than the other three methods. The mean absolute deviations for φ, ρ and ζ are high compared with the parameter values, indicating that for those parameters this method fails. Though one may argue that the correlation coefficient of 0.91 for φ given by BF-PGAS under nLE nSK does not suggest much difference from EIS-PG(AS), which gives a correlation of 0.97, we note that the mean absolute deviation given by BF-PGAS is 0.1 while the corresponding value given by EIS-PGAS is only 0.01. Considering that the autoregressive coefficient is often around 0.98, as shown in the top left graph of Figure 9, the posterior means resulting from BF-PGAS contain a consistent and large bias. This holds true for other parameters as well. The inaccuracy of BF-PGAS is likely due to the dimensionality involved. The asymptotic results of Snyder et al. (2008) and Bickel et al. (2008) show that the inevitable impoverishment of particle quality and the tendency of the particle system to collapse as the step t moves away from the initialisation t = 0 arise because the number of particles cannot scale exponentially with the dimension of the observations n, and the bootstrap filter suffers
Table 11: Accuracy Comparisons of Different Methods Under Different Datasets

Reported are the correlations between the posterior means from the four estimation methods applied to the different datasets and the true DGP values, with mean absolute deviations given in brackets.
from a sharper collapse rate. In our model, n = 50, and the bootstrap filter would require millions of particles to avoid collapse, which limits its practical use for our model. As a result, resampling has to take place at every t. A direct consequence is that BF-PGAS becomes highly inefficient and inaccurate.
Among the whole set of system parameters, the factor loadings Λ are the best estimated, with the EIS-related methods showing correlations above 0.98. The smallest correlations for Λ given by MM-MH and BF-PGAS are 0.93 and 0.79 respectively. The mean absolute deviations for Λ are also low, except for BF-PGAS under all datasets. This shows the effectiveness of our proposed sampling method for the factor loadings.
Figure 11 shows the posterior probabilities of zero leverage effect and zero skewness, i.e. $p(\rho = 0|y_{1:T})$ and $p(\beta = 0|y_{1:T})$, both of which are $(n + p)$-dimensional vectors, where $\rho = (\rho_{f_j}, \rho_{u_i})$ and $\beta = (\beta_{f_j}, \beta_{u_i})$ for $i = 1, \dots, n$ and $j = 1, \dots, p$. The black dots at the top of each graph indicate a zero DGP value for the corresponding series, and we represent the posterior probability of being zero estimated by the different methods using different symbols. Note that the estimates for ρ and β are obtained with the help of the shrinkage prior introduced in Section 2.3.1, so new draws in the MCMC sampler for all elements of ρ and β have non-zero probability of being exactly zero. This means that the closer a point is to a black dot, the better the respective method is able to detect zero leverage effect or skewness in the DGP. When both zero leverage effect and zero skewness are present in some of the factors and asset-specific processes, EIS-PGAS has the fewest points located in the “ambiguity” area, namely between 0 and 1. One can see that whenever ρ = 0 and β = 0, EIS-PGAS gives a posterior probability of being zero larger than 0.9 for ρ and 0.8 for β. Under sLE sSK, there are three cases of over-shrinkage for ρ and just one for β using EIS-PGAS, while the other methods clearly overestimate the posterior zero probabilities.
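Given the shrinkage prior, these posterior zero probabilities can be estimated directly as the fraction of exact zeros among the stored draws; a minimal sketch, with the array rho_draws as a hypothetical stand-in for the post burn-in MCMC output, is:

import numpy as np

def posterior_zero_prob(draws):
    # Fraction of MCMC draws exactly at zero for each of the n + p series;
    # draws is an (n_iter, n_series) array of rho or beta samples
    return np.mean(draws == 0.0, axis=0)

# Hypothetical example: 20,000 post burn-in draws for 58 series, where the
# first 10 series are shrunk to zero in most iterations
rng = np.random.default_rng(2)
rho_draws = rng.normal(-0.3, 0.1, size=(20000, 58))
rho_draws[:, :10] *= rng.binomial(1, 0.1, size=(20000, 10))
print(posterior_zero_prob(rho_draws)[:12])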
EIS-PG and MM-MH perform similarly in determining zero parameters, and they are not much worse than EIS-PGAS when the DGP values are zero, but these two methods, especially MM-MH, deliver too many points in the ambiguity area. In other words, when the leverage effect and skewness have non-zero DGP values, EIS-PG and MM-MH hesitate more than EIS-PGAS to assign non-zero values to those parameters. This observation under sLE sSK carries over to all datasets except nLE nSK, which highlights the value of ancestor sampling if one aims not only to detect zero parameters but also to estimate non-zero parameters accurately. Under aLE aSK, all methods except BF-PGAS show no over-shrinkage, but EIS-PG and MM-MH have more points in the ambiguity area than EIS-PGAS, particularly for ρ. This shows the effect of the shrinkage prior on leverage and skewness when importance sampling is coupled with ancestor sampling. Moreover, under nLE nSK, or when all elements of ρ and β are equal to zero, EIS-PG(AS) and MM-MH perform equally well, with all posterior probabilities of zero parameters approaching one. BF-PGAS is the worst estimation method of all, suffering from its estimation inaccuracy.
Figure 11: Posterior probability of zero leverage effect and skewness estimated by the different methods under the different datasets (sLE sSK, sLE aSK, aLE sSK, aLE aSK, nLE nSK). Upper row: leverage effect parameter ρ; lower row: skewness parameter β. Black dots at the top of each graph indicate a zero parameter for the corresponding series in the DGP. Coordinates 1 to 8 on all x-axes indicate factors $f_{j,t}$ for $j = 1, \dots, 8$ and the rest correspond to $u_{i,t}$ for $i = 1, \dots, 50$.
To examine the estimation efficiency of the different MCMC algorithms, we calculate the inefficiency factor IE(θ) for a vector of system parameters θ. Because the results are similar across datasets, we only report those under aLE aSK.^{12} Table 12 reports the median of the inefficiency factors obtained from the four methods under aLE aSK, with the 10th and 90th percentiles in brackets. A quick observation is that although the factor SV model is high-dimensional, the fact that it can be decomposed within one run of the MCMC sampler into n + p individual univariate models of the form (1), once the factors $\{f_{j,t}\}_{j=1}^{p}$ are sampled, greatly improves efficiency, which becomes comparable to that of the univariate model. IE(φ) and IE(µ) are the smallest two across the four methods, but those given by EIS-PGAS are less than half of those given by the other three. MM-MH even produces an estimate of µ with a median inefficiency factor of 42.48, six times larger than the 7.48 given by EIS-PGAS. As in the univariate model, ancestor sampling contributes greatly to the efficiency of the MCMC sampler. EIS-PGAS is at

^{12} This particular dataset is chosen because it has non-zero leverage effect and skewness across all factors and asset-specific processes. The other datasets, with either ρ or β or both equal to zero, show larger inefficiency factors, but this is due to many consecutive zeros in the Markov chain.
Table 12: Inefficiency Factor for Parameter Estimates Under aLE aSK
factors, with EIS-PG(AS) and MM-MH giving correlations larger than 0.9 and 0.75 respectively under all datasets. The difference between the correlation for $f_{1,t}$ and for $f_{j,t}$ with $j \neq 1$ is likely due to the identification restriction imposed on the loading matrix Λ. The correlation for the factor estimates given by EIS-PG is on average slightly lower than for EIS-PGAS, with exceptions found in $f_{5,t}$ under sLE sSK and $f_{7,t}$ under nLE nSK. This suggests that ancestor sampling adds a certain degree of precision because of the efficiency gain on top of EIS. In the case of $h_t$ and $l_t$, EIS-PGAS is also the best estimation method. For example, under sLE sSK, both EIS-PG and MM-MH give correlations smaller than 0.7 for $h_{2,t}$. Under sLE aSK and aLE aSK, the gain in precision from ancestor sampling is seen in the correlations from EIS-PGAS being higher than those from EIS-PG by 5% to 10%, which suggests that when skewness is present in all factors and asset-specific processes, ancestor sampling tends to be more effective when used together with the shrinkage prior for ρ and β.
The “shock variables” $W_t$ and $Q_t$ may be of limited use in practice, but they serve as stochastic weights and influence the leverage effect, as we show in Section 2.1, so it is still interesting to see how the four methods perform when estimating the inverse gamma mixture components. BF-PGAS is still the most inaccurate method, and comparing with the estimates for the factors and SV series, one again observes that EIS-PG is eclipsed by EIS-PGAS. Lastly, under nLE nSK, EIS-PG is almost as efficient as MM-MH, but both give correlations lower than EIS-PGAS. This emphasises the effect of leverage and skewness on the chosen MCMC algorithms.
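For reference, the inefficiency factor reported above is typically computed as one plus twice the sum of the sample autocorrelations of a chain up to some lag cutoff; a minimal sketch, with the cutoff max_lag as our own illustrative choice rather than the paper's setting, is:

import numpy as np

def inefficiency_factor(chain, max_lag=100):
    # IE = 1 + 2 * sum of autocorrelations; a value of 1 indicates i.i.d.
    # sampling, while larger values mean fewer effective draws per iteration
    x = chain - chain.mean()
    var = x @ x / len(x)
    acf = np.array([(x[:-k] @ x[k:]) / (len(x) * var) for k in range(1, max_lag + 1)])
    return 1.0 + 2.0 * acf.sum()

# Hypothetical example: a persistent AR(1) chain has IE near (1+0.9)/(1-0.9) = 19
rng = np.random.default_rng(3)
chain = np.zeros(20000)
for t in range(1, len(chain)):
    chain[t] = 0.9 * chain[t - 1] + rng.normal()
print(inefficiency_factor(chain))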
Figure 13: Correlations between the posterior mean estimates of all factors and of selected stochastic volatility series with the corresponding inverse gamma mixing components, obtained from the four estimation methods, and their DGP series under the different datasets. Panels cover $f_{1,t}, \dots, f_{8,t}$; $h_{2,t}$, $h_{4,t}$, $h_{5,t}$, $h_{8,t}$, $l_{6,t}$, $l_{16,t}$, $l_{35,t}$, $l_{45,t}$; and the corresponding $W_{j,t}$ and $Q_{i,t}$.
C.4 Number of factors
Marginal likelihood evaluation is needed to calculate the Bayes factors used to pick the right model, such as determining the right number of factors and choosing the most plausible specifications for the factors and asset-specific processes. We first illustrate the stability and the ability of IS2 to determine the right number of factors, which is the most important model specification choice. Note that there is no need to worry about the error distributions, thanks to the shrinkage technique we apply.
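As a sketch of the IS2 idea, the following computes a log marginal likelihood estimate by combining parameter-level importance sampling with an unbiased particle-filter likelihood estimate; the four callables are hypothetical stand-ins for the components described in Tran et al. (2014) and Scharth and Kohn (2016), not their implementation.

import numpy as np

def is2_log_marginal_likelihood(sample_g, log_g, log_prior, pf_loglik, S=500):
    # sample_g  : draws a parameter vector theta from the importance density g
    # log_g     : log density of g
    # log_prior : log prior density of theta
    # pf_loglik : unbiased particle-filter estimate of log p(y | theta)
    log_w = np.empty(S)
    for s in range(S):
        theta = sample_g()
        log_w[s] = pf_loglik(theta) + log_prior(theta) - log_g(theta)
    m = log_w.max()                    # log-sum-exp for numerical stability
    return m + np.log(np.mean(np.exp(log_w - m)))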
Table 13 shows the EIS-PGAS conditional average log-likelihood, or posterior ordinate, with the system parameters evaluated at their posterior means. We report the evaluation for different numbers of particles used in the modified PEIS method introduced in Section 3.1. Note that with this modification the constructed importance density boils down to the n + p partially independent EIS importance densities used to analyse the univariate SV models, and it manages to approximate the conditional posterior distribution closely and deliver posterior means that are highly correlated with the DGP values. We expect that in our high-dimensional setting not many particles are needed to accurately evaluate the conditional log-likelihood or posterior ordinate; indeed, Scharth and Kohn (2016) found that in the case of a two-component SV model as few as two particles can already compute the likelihood stably and accurately. From Table 13 we see that the log-likelihood estimates for aLE aSK converge using at least 100 particles, and