Leverage, asymmetry and heavy tails in the high-dimensional
factor stochastic volatility model∗
Mengheng Li†
Department of Econometrics, VU University Amsterdam, The Netherlands
Marcel Scharth‡
Discipline of Business Analytics, University of Sydney Business School, Australia
Working paper; this version: November 12, 2017
Abstract
There is a rich empirical literature that studies the stochastic volatility (SV) of univariate financial time series whose distribution exhibits asymmetry and heavy tails. Yet the literature focusing on high-dimensional SV models appears to be much scarcer, lacking a general modelling framework and efficient estimation method due to the “curse of dimensionality”. Our contribution is twofold. Firstly, we propose a flexible high-dimensional factor SV model with leverage effect, asymmetry and heavy tails based on errors following the generalised hyperbolic skew Student’s t-distribution. With shrinkage, the model leads to different parsimonious forms, and thus is able to disentangle systematic leverage effect and skewness from asset-specific ones. Secondly, we develop a highly efficient Markov chain Monte Carlo estimation procedure that analyses the univariate version of the model using efficient importance sampling. Extension to higher dimensions is straightforward via marginalisation of factors. Computational complexity is shown to be linearly scalable in the number of both factors and assets. We assess the performance of our proposed method via extensive simulation studies using both univariate and multivariate simulated datasets. Finally, we show that the model outperforms other factor models in terms of value-at-risk estimation and minimum-variance portfolio performance for a U.S. and an Australian portfolio.
Keywords: Markov chain Monte Carlo; Generalised hyperbolic skew Student’s t-distribution; Stochastic volatility; Metropolis-Hastings algorithm; Importance sampling; Particle filter; Particle Gibbs; State space model; Time-varying covariance matrix; Factor model
JEL Classification: C11; C32; C53; C55; G32
∗We would like to thank George Tauchen, Richard Gerlach, Gary Koop, Siem Jan Koopman, Frank Kleibergen, Lennart Hoogerheide, Robert Kohn, Charles Bos, Anne Opschoor, and seminar and workshop participants at The University of Sydney Business School, VU University Amsterdam, University of Amsterdam, Tinbergen Institute, the 10th International Conference on Computational and Financial Econometrics (Seville, 2016), the 10th Society of Financial Econometrics Annual Conference (New York, 2017), the 1st International Conference on Econometrics and Statistics (Hong Kong, 2017), and the 8th European Seminar on Bayesian Econometrics (Maastricht, 2017) for useful comments and helpful suggestions on previous versions of this paper. Any remaining errors are ours alone.
†Email: [email protected]; Contact author
‡Email: [email protected]
1 Introduction
Time-varying volatility and leverage effects, two of the so-called “stylised facts”, are often the
focus of research on time series of financial returns that are also believed to be asymmetrically
distributed with heavy tails. The rich literature studying financial time series provides strong
econometric evidence supporting such empirical findings. There are two major classes of models.
One is parameter-driven stochastic volatility (SV) models and the other is observation-driven
(generalised) autoregressive conditional heteroskedasticity (GARCH) models. Kim et al. (1998)
provides a classical comparison between the two classes of models in terms of filtering estimation
and forecasting performance. They find that the Gaussian SV model fits empirical data similarly to the GARCH model with Student’s t-errors. Carrasco and Chen (2002) derive detailed sta-
tistical properties of these two classes of models including mixing property and (un)conditional
distributions characterized by finite moments. Research over the last decade has shifted from statistical analysis to more detailed modelling techniques that aim at capturing
“stylised facts” including not only time-varying volatility but also leverage effects, left skewness
and heavy-tailedness of financial series. To this end, both new classes of SV and GARCH models
have been developed. Among many others, Shephard and Pitt (1997) and Durbin and Koopman (1997) develop similar simulated likelihood estimation procedures to estimate the SV model with Student’s t-errors. The observation-driven counterpart is the GARCH-t model first developed by Bollerslev (1987) three decades ago. The leverage effect corresponds to the negative correlation between past returns and future volatility. The GARCH-M model of French et al. (1987) and the
EGARCH model of Nelson (1991) extend the conditional structure of time-varying variance to
model the negative correlation. Koopman and Hol Uspensky (2002) and Yu (2005) discuss ways
of modelling leverage effect in SV models where the former also provides an efficient simulated
likelihood estimation method and the latter additionally shows that leverage effect may be the
cause of skewness of the return distribution.
A recent SV model proposed by Nakajima and Omori (2012) provides a modelling framework which nests time-varying volatility with leverage effect and heavy-tailed error distribution
with skewness based on a Gaussian mixture representation of Aas and Haff (2006)’s generalised
hyperbolic skew Student’s t-distribution. Our paper builds on these two strands of research, which inspire us to propose a new estimation procedure that can deliver more efficient
inference. The estimation of time-varying volatility models is straightforward if the model
is observation-driven, like all variants of GARCH models. It becomes more difficult if the
model is parameter-driven, i.e. SV models, which usually boils down to some non-linear state
space models without an analytical likelihood function. It is recognised that simulated likelihood is applicable to simple SV models, but that it suffers from a flat likelihood function, multimodality and other numerical issues when the model becomes more complex. In such cases, the Bayesian approach provides a sound alternative and is widely used because it offers standard sampling procedures and makes inference straightforward. Several ways of sampling the latent SV process
from its posterior distribution have been proposed, among which the multi-move sampler of
Shephard and Pitt (1997) and Watanabe and Omori (2004) and the auxiliary particle filter of
Pitt and Shephard (1999a) are the most widely used methods. For general discussion on Bayesian
estimation of SV models we refer to Jacquier et al. (2004) and the references therein. These
methods fall within the broader category of sequential Monte Carlo methods
detailed in Doucet et al. (2001). Another sampling method which this paper partially builds
upon is the efficient importance sampling (EIS) originally developed by Richard and Zhang
(2007). EIS is based on a carefully-constructed globally optimal importance density instead
of a locally optimal proposal which is used by the multi-move sampler and auxiliary particle
filter. Scharth and Kohn (2016) develop a highly efficient and stable algorithm called particle
efficient importance sampling (PEIS). As the name suggests, PEIS evaluates an intractable but
unbiasedly estimable likelihood function via combination of EIS and the sequential particle filter.
This paper refines PEIS in the context of a modified Gibbs sampler (Lindsten et al., 2014)
and applies it to model high-dimensional SV models, a field of research where literature appears
to be much scarcer than that on univariate SV models. Multivariate models with time-varying volatility are often difficult to estimate due to the “curse of dimensionality”, namely that the number of parameters grows rapidly with the number of assets. Corner-
stones of multivariate observation-driven time-varying volatility models include, but are not limited to, the constant conditional correlation (CCC) GARCH model of Bollerslev (1990), which models the time-varying covariance matrix with constant correlation among assets. Engle (2002) extends the CCC-GARCH model with dynamic conditional correlation (DCC) and shows its applicability
in terms of estimation and forecasting. A GARCH model with dynamic conditional structure
for the vectorized covariance matrix (VGARCH) is studied by Bollerslev et al. (1994). All these
types of models are widely available in commercial packages, but the dimension considered barely exceeds 20, except for the VGARCH model. Low-dimensional models are of little help to quantitative mutual funds or quant hedge funds (Dempster et al., 2008), which have continued to gain popularity in recent years thanks to advances in compu-
tational power. A report by Vardi (2015) finds that quant funds usually hold tens or even hundreds of positions in their portfolios, highlighting the need for a high-dimensional multivariate
model for risk and investment management. An attempt to achieve this by observation-driven
models comes from a new class of generalised autoregressive score (GAS) models developed by
Creal et al. (2012) and Oh and Patton (2017). Promising results and successful applications in
high-dimensional models have been documented.
In the parameter-driven world, univariate SV models can be extended to multivariate ones in a straightforward manner; however, difficult estimation usually hampers their practical use (Chib et al., 2009). In such cases, Bayesian estimation is typically employed.
For example, in low-dimensional models Danielsson (1998) and Asai et al. (2006) thoroughly
survey developments in sampling the latent volatility process with comparisons among different
model specifications. Liesenfeld and Richard (2006) apply EIS to a portfolio with four assets,
leaving high-dimensional applications to future research. As far as we know, Pitt and Shephard
(1999b) and Chib et al. (2006) are among the earliest who manage to model high-dimensional
financial time series with distinctive SV series pertaining to every individual equity return, and
they propose to model correlation via latent dynamic factors which also serve as systematic
measure of market movements. Nakajima (2015) extends the univariate model of Nakajima and
Omori (2012) to a factor-free high-dimensional framework, which addresses the leverage effect,
skewness and heavy tails of individual asset’s error distribution.
To resolve the dimensionality issue, this paper proposes a flexible high-dimensional factor
SV model. We address leverage effect and model asymmetry and heavy tails based on gen-
eralised hyperbolic skew Student’s t-error, which complements existing study and discussion.
Importantly, we introduce shrinkage to the model, resulting in automated model selection. The
resulting parsimonious form is expected to disentangle leverage effect and asymmetry in idiosyn-
cratic noise from those in the factors. A highly efficient Markov chain Monte Carlo estimation
procedure which uses EIS to exploit the Gaussian mixture representation of the error distribu-
tion is proposed to analyse the univariate version of the model. The sampling scheme for the full model is simplified via marginalisation of factors and boils down to the estimation of many uni-
variate series which can be done in parallel. As a result, the high-dimensional model is able to
achieve efficiency comparable to a univariate model. We assess the performance of our proposed
method via simulation studies with both univariate and multivariate simulated data. Finally
the model is applied to two portfolios consisting of equity returns from S&P100 and ASX50.
Comparisons with other factor models are carried out in terms of value-at-risk estimation and minimum-variance portfolio performance.
Our discussion is organized as follows. Section 2 introduces the model setting and our
proposed Bayesian estimation method including the use of EIS in the context of particle Gibbs
with ancestor sampling. Section 3 details the methods of evaluating marginal likelihood based on
an efficient particle filtering algorithm combined with importance sampling of hyperparameters.
Section 4 starts with a simulation study on the univariate model in comparison with the method
in Nakajima and Omori (2012). A simulation study on the high-dimensional factor model is then carried out to assess the estimation efficiency and the performance of the marginal likelihood and
Bayes factor criterion in choosing the right number of factors. Section 5 illustrates our empirical
application in VaR and dynamic portfolio management. We conclude in Section 6.
2 Model and Bayesian estimation
2.1 Univariate stochastic volatility model
Nakajima and Omori (2012) introduce the following univariate stochastic volatility model with
leverage using generalised hyperbolic skew Student’s t-error
$$\begin{aligned}
y_t &= \nu_t \exp(h_t/2), & t &= 1, \dots, T, \\
\nu_t &= \alpha + \beta W_t + \sqrt{W_t}\,\varepsilon_t, & t &= 1, \dots, T, \\
h_{t+1} &= \mu(1-\phi) + \phi h_t + \eta_t, & t &= 1, \dots, T-1, \\
\begin{pmatrix} \varepsilon_t \\ \eta_t \end{pmatrix} &\sim N\!\left(\begin{pmatrix} 0 \\ 0 \end{pmatrix}, \begin{pmatrix} 1 & \rho\sigma \\ \rho\sigma & \sigma^2 \end{pmatrix}\right), & t &= 1, \dots, T, \\
W_t &\sim IG\!\left(\frac{\zeta}{2}, \frac{\zeta}{2}\right), & t &= 1, \dots, T,
\end{aligned} \tag{1}$$
where yt is the time series of equity returns, and ht is the unobserved log-volatility modelled as
a stationary AR(1) process with initialisation $h_1 \sim N\big(\mu, \frac{\sigma^2}{1-\phi^2}\big)$, and νt follows the generalised
hyperbolic skew Student’s t-distribution. ρ models the leverage effects often found to be negative
in financial returns (Yu, 2005)1, which indicates that a drop in equity return likely leads to an
increase in its volatility. IG denotes the inverse Gamma distribution, and the mixing random
variable Wt is introduced to jointly model asymmetry and heavy tails in yt. We choose α =
−βζ/(ζ−2) so that E(νt) = 0 and restrict ζ > 4 to ensure νt has a finite variance. The skewness
1We adopt the definition of leverage effect in Yu (2005), i.e. the correlation between the idiosyncratic error νt and the SV innovation ηt. ρ itself is thus not the leverage effect.
and heavy-tailedness of νt are jointly determined by the asymmetric parameter β and degrees of
freedom ζ. Figure 1 shows different shapes of νt’s density for various β and ζ values.

Figure 1: Different density shapes of the generalised hyperbolic skew Student’s t-distribution. Left: varying β with ζ = 10; right: varying ζ with β = −2.

Readers
can refer to Aas and Haff (2006) for a detailed account of generalised hyperbolic skew Student’s
t-distribution including its density function fν , the p-th moment E(|ν|p), and an EM algorithm
for parameter estimation. β = 0 corresponds to a symmetric Student’s t-distribution for νt and
a standard normal distribution if ζ further becomes large. As argued by Aas and Haff (2006), a
unique feature of the model for νt is that in the tails
$$f_\nu(\nu) \propto |\nu|^{-\zeta/2-1}\exp(-|\beta\nu| + \beta\nu) \quad \text{as } \nu \to \pm\infty.$$
This means that fν has one heavy and one semi-heavy tail, unlike many other forms of the skew Student’s t-distribution, both of whose tails decay polynomially, making it an appealing model for financial data.
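For concreteness, the following minimal sketch simulates a series from model (1); it is our own illustration rather than estimation code from this paper, and all parameter values are arbitrary assumptions.

```python
# Minimal simulation sketch of model (1); parameter values are illustrative.
import numpy as np

def simulate_sv_ght(T, mu=-9.0, phi=0.95, sigma=0.15, rho=-0.4,
                    beta=-0.5, zeta=15.0, seed=0):
    rng = np.random.default_rng(seed)
    alpha = -beta * zeta / (zeta - 2.0)            # ensures E(nu_t) = 0
    # W_t ~ IG(zeta/2, zeta/2): reciprocal of a Gamma(zeta/2, rate zeta/2) draw
    W = 1.0 / rng.gamma(shape=zeta / 2.0, scale=2.0 / zeta, size=T)
    # jointly normal (eps_t, eta_t) with Cov(eps_t, eta_t) = rho * sigma
    cov = np.array([[1.0, rho * sigma], [rho * sigma, sigma ** 2]])
    eps, eta = rng.multivariate_normal([0.0, 0.0], cov, size=T).T
    h = np.empty(T)
    h[0] = rng.normal(mu, sigma / np.sqrt(1.0 - phi ** 2))  # stationary start
    for t in range(T - 1):
        h[t + 1] = mu * (1.0 - phi) + phi * h[t] + eta[t]
    nu = alpha + beta * W + np.sqrt(W) * eps
    y = nu * np.exp(h / 2.0)
    return y, h, W

y, h, W = simulate_sv_ght(T=2000)
```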
Assuming |φ| < 1 and E(|νt|p) exists, the unconditional p-th moment of yt in model (1) is
$$E(|y_t|^p) = \exp\Big(\frac{\sigma^2 p^2}{8(1-\phi^2)} + \frac{\mu p}{2}\Big)\, E(|\nu_t|^p).$$
Notice that model (1) implies conditional time-varying leverage effects. Given Wt, one has
$$\mathrm{Cov}(\nu_t, \eta_t \mid W_t) = \sqrt{W_t}\,\rho\sigma.$$
This means that if one interprets Wt as a “shock variable”, such a shock has a multiplicative effect on the leverage. In the Appendix we show that unconditionally the leverage effect Corr(νt, ηt) = Le(β, ζ)ρ has the following multiplier

$$Le(\beta, \zeta) = \frac{\Gamma\big(\frac{\zeta-1}{2}\big)}{\Gamma\big(\frac{\zeta}{2}\big)} \sqrt{\frac{(\zeta-2)^2(\zeta-4)}{2\zeta^2 + (4\beta^2 - 12)\zeta + 16}}, \qquad \zeta > 4. \tag{2}$$
Basic algebra shows Le(β, ζ) ∈ (0, 1) for all β ∈ ℝ and ζ > 4, with ∂Le/∂ζ > 0, ∂²Le/∂ζ² < 0, ∂Le/∂|β| < 0, and ∂²Le/∂β² < 0. Given β, when ζ becomes large the density of νt is less skewed and has lighter tails (Aas and Haff, 2006), so Le(β, ζ) tends to one and the leverage effect tends to ρ, similar to the case of a standard SV model with normal errors. Given ζ > 4, the magnitude of the leverage effect decreases to zero as |β| grows, even though ρ ≠ 0. This feature tells us that if the return innovation νt puts a large weight on the “shock variable” Wt (i.e. large |β|), the leverage effect vanishes.
We develop an MCMC algorithm which partially builds on Nakajima and Omori (2012) who
argue that the Gaussian variance-mean mixture representation of νt as the second line in model
(1) allows for a conditional sampler, but ours is believed to be more efficient and computationally
faster. The density functions of the inverse-gamma and normal distributions are log-linear in their parameters, which makes it possible to build a globally optimal importance density for Wt and ht via the EIS method of Richard and Zhang (2007). In Nakajima and Omori (2012), a modified multi-
move sampler from Shephard and Pitt (1997) is used to sample ht block-by-block conditional on
Wt with a local Laplace approximation to the posterior density (see also Watanabe and Omori
(2004) and Takahashi et al. (2009)). Later we show that efficiency is further improved with a
novel particle Gibbs algorithm based on the EIS importance density which samples ht and Wt
as a whole block. The next section details our MCMC algorithm.
2.2 Estimation of the univariate model
Let θ = (σ, ρ, φ, µ, β, ζ) collect the hyperparameters, and let x_{t1:t2} denote the history of a process xs from s = t1 to t2. The MCMC algorithm developed below boils down to a Metropolis-within-Gibbs procedure (e.g., Gilks et al. 1995; Geweke and Tanizaki 2001; Koop et al. 2007) which samples from the posterior distribution of (θ, h1:T, W1:T) | y1:T for model (1). The algorithm iterates over

1. sampling (h1:T, W1:T) | y1:T, θ;

2. sampling θ | y1:T, h1:T, W1:T.
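Structurally, this is a plain two-block Gibbs loop. The sketch below is our own pseudostructure, not the implementation used in this paper: sample_states and sample_theta are hypothetical user-supplied callables standing in for the EIS-based particle Gibbs step of Section 2.2.1 and the hyperparameter Metropolis-Hastings updates.

```python
# Structural sketch of the Metropolis-within-Gibbs loop; sample_states and
# sample_theta are hypothetical callables for the two Gibbs blocks.
def metropolis_within_gibbs(y, theta0, h0, W0, sample_states, sample_theta,
                            n_iter=22_000, burn_in=2_000):
    theta, h, W = theta0, h0, W0
    draws = []
    for it in range(n_iter):
        # Step 1: draw the latent paths (h_{1:T}, W_{1:T}) jointly as one block
        h, W = sample_states(y, theta, h, W)
        # Step 2: draw the hyperparameters given the latent paths
        theta = sample_theta(y, h, W, theta)
        if it >= burn_in:
            draws.append(theta)
    return draws
```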
2.2.1 Sampling (h1:T, W1:T) | y1:T, θ
We aim to improve efficiency by sampling the latent processes ht and Wt as one block. For notational simplicity, the dependence on θ is suppressed, and p(·) generically denotes a density function, possibly with a subscript indicating a specific distribution.
Model (1) is a non-linear non-Gaussian state space model, and reformulating the model to tackle the leverage effect gives

$$\begin{aligned}
y_t &= \big(\alpha + \beta W_t + \sqrt{W_t}\,\varepsilon_t\big)e^{h_t/2}, & t &= 1, \dots, T, \\
h_{t+1} &= \mu(1-\phi) + \phi h_t + \rho\sigma\varepsilon_t + \sqrt{1-\rho^2}\,\sigma\eta_t^*, & t &= 1, \dots, T-1,
\end{aligned}$$

where $\varepsilon_t = (y_t e^{-h_t/2} - \alpha - \beta W_t)/\sqrt{W_t}$, and $\eta_t^*$ is standard normal and independent of εt. We notice that εt ∈ Ft, where Ft is the filtration generated by both the observables y1:t and the unobservables h1:t and W1:t, such that the model is Markovian and yt forms a martingale difference sequence, allowing factorisation of the likelihood via likelihood contributions.
Introducing xt = (ht, Wt)′, the likelihood is given by the integral

$$L(y_{1:T}) = \int p(y_{1:T}, x_{1:T})\,dx_{1:T} = \int p(y_1|x_1)p(x_1) \prod_{t=2}^{T} p(y_t|x_t)\,p(x_t|x_{t-1}, y_{t-1})\,dx_{1:T}, \tag{3}$$

where the transition density for t = 2, ..., T follows

$$\begin{aligned}
p(x_t|x_{t-1}, y_{t-1}) &= p_N(h_t|h_{t-1}, y_{t-1}, W_{t-1})\,p_{IG}(W_t) \\
&= N\big(h_t;\ \mu(1-\phi) + \phi h_{t-1} + \rho\sigma\varepsilon_{t-1},\ (1-\rho^2)\sigma^2\big) \cdot IG\Big(W_t;\ \frac{\zeta}{2}, \frac{\zeta}{2}\Big).
\end{aligned} \tag{4}$$
The efficient high-dimensional importance sampling (EIS) method of Richard and Zhang (2007), further studied by e.g. Jung and Liesenfeld (2001) and Scharth and Kohn (2016), proposes the following importance sampler

$$q(x_{1:T}|y_{1:T}) = q(x_1|y_{1:T}) \prod_{t=2}^{T} q(x_t|x_{t-1}, y_{1:T}),$$

with the conditional density $q(x_t|x_{t-1}, y_{1:T})$ for t = 2, ..., T written as

$$q(x_t|x_{t-1}, y_{1:T}) = \frac{k_q(x_t, x_{t-1}; \delta_t)}{\chi_q(x_{t-1}; \delta_t)} \quad \text{with} \quad \chi_q(x_{t-1}; \delta_t) = \int k_q(x_t, x_{t-1}; \delta_t)\,dx_t.$$

$k_q(x_t, x_{t-1}; \delta_t)$ is a kernel in xt with integration constant $\chi_q(x_{t-1}; \delta_t)$, and δt is a set of importance parameters with every element being a function of y1:T. At the initial period, the importance density is simply

$$q(x_1|y_{1:T}) = \frac{k_q(x_1; \delta_1)}{\chi_q(\delta_1)} \quad \text{with} \quad \chi_q(\delta_1) = \int k_q(x_1; \delta_1)\,dx_1.$$
Using the above importance density, the likelihood (3) can be expressed as
where ϖ = (1−ρ²)σ², ϑ = ρσ, and 1(·) is an indicator function which equals one if the condition in brackets holds and zero otherwise. The joint prior π0(ϑ, ϖ) = π0(ϑ|ϖ)π0(ϖ) is a conjugate normal-inverse-gamma prior which facilitates the use of the shrinkage prior employed in the factor SV model in Section 2.3.1. The above prior distributions reflect popular choices in the SV literature. To compare the performance of different sampling schemes, we consider the following methods:
• EIS-PGAS: Our baseline method – particle Gibbs with ancestor sampling and EIS impor-
tance density;
• EIS-PG: Basic particle Gibbs with EIS importance density;
• BF-PGAS: Particle Gibbs with ancestor sampling using bootstrap filter;
• MM-MH: The method of Nakajima and Omori (2012) – multi-move sampler for ht, con-
ditional on which Wt is drawn via an accept-reject M-H algorithm.
While BF-PGAS uses 20,000 particles in the particle propagation, both EIS-PGAS and EIS-PG use only 10 particles. In total 22,000 samples for each parameter are drawn, with the initial 2,000 samples discarded as burn-in. We base our comparison on the inefficiency factor to check the efficiency of the different sampling schemes. The inefficiency factor for a parameter θ is defined as $IE(\theta) = 1 + 2\sum_{j=1}^{\infty}\rho_j(\theta)$, where ρj(θ) is the j-th sample autocorrelation. Chib (2001) shows that IE(θ) measures the degree of mixing of the Markov chain for θ|·. If IE(θ) = m, the MCMC algorithm requires m times as many draws as an independent sampler to achieve the same precision. We choose a Parzen window with bandwidth 1,000 to compute the inefficiency factor.
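For reference, a minimal implementation sketch (our own) of this Parzen-windowed inefficiency factor reads:

```python
# Inefficiency factor IE(theta) = 1 + 2*sum_j w_j*rho_j(theta), with the
# sample autocorrelations damped by a Parzen window of bandwidth B.
import numpy as np

def parzen_weight(j, B):
    a = j / B
    if a <= 0.5:
        return 1.0 - 6.0 * a ** 2 + 6.0 * a ** 3
    return 2.0 * (1.0 - a) ** 3

def inefficiency_factor(draws, bandwidth=1000):
    x = np.asarray(draws, dtype=float)
    x = x - x.mean()
    n, var = len(x), x.var()
    ie = 1.0
    for j in range(1, min(bandwidth, n - 1)):
        rho_j = np.dot(x[:-j], x[j:]) / ((n - j) * var)
        ie += 2.0 * parzen_weight(j, bandwidth) * rho_j
    return ie
```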
4.2 Estimation results
Figure 3 reports the sample autocorrelation functions (ACF), the Markov chain sample paths
and the posterior density estimates for one simulated series estimated by EIS-PGAS. Figures
obtained from other simulated series do not suggest qualitative differences. From a similar figure in Nakajima and Omori (2012), one can already see that the ACFs of the parameters estimated by EIS-PGAS decay much more quickly than those under MM-MH, especially for φ, β and ζ, implying the higher efficiency of EIS-PGAS.
Figure 3: EIS-PGAS MCMC results for a randomly chosen series simulated from model (1). From top to bottom: sample autocorrelations, sample paths, and posterior density estimates; from left to right: φ, σ, ρ, µ, β and ζ.
Table 1: MCMC Results of Different Methods for the Univariate Model

θ: Mean, St.dev., 95% C.I. and IE(θ), reported under EIS-PGAS and under MM-MH.

Figure 6: Posterior mean of factor and stochastic volatility processes. Left: SV exp(hj,t/2) from the j = 1, 4-th factors; Middle: volatility σi,t of three chosen asset returns; Right: implied time-varying correlations Corrij,t among the three asset returns (The Dow Chemical vs. Walgreens Boots, The Dow Chemical vs. PayPal, Walgreens Boots vs. PayPal).
The extracted factors are model-based, so one may be interested in their relationship with
market indicators such as the classic Fama-French factors (Fama and French, 1993). To examine
this, we run simple linear regression of the filtered estimate of four model-based factors on each
one of the three Fama-French factors, i.e. Rm-Rf, SMB, and HML during the same sample
period, and their t-statistics are shown in Figure 7. From the figure, it can be seen that the
variation of Rm-Rf, SMB and HML is explained by the second, the first and the third factor, respectively. There is, however, no statistical evidence that any Fama-French factor is jointly explained by multiple model-based factors6. Based on this, we conjecture that each extracted factor contains
unique market information and measures different systematic movement from the factors con-
structed ad hoc by Fama and French. This exercise can be extended to other market factors.
For example, the momentum factor in the four-factor model of Carhart (1997), an extra index describing the tendency of a stock price to keep moving in one direction, may capture systematic content outside the three Fama-French factors.
Figure 7: Explanatory content of the four model-based factors for the three factors of Fama and French (1993). From left to right are the t-statistics of regressions of the posterior mean of f′t|1:t−1 for all t on each of the three Fama-French factors: Rm-Rf, SMB, and HML. Red indicates a significant effect.
Table 6 shows the Bayes factor calculated via IS2 marginal likelihood for model specifications
with different numbers of factors. The numbers of factors under consideration are between 2 and 6, in line with the literature. The model with 4 factors is preferred over all other specifications, in particular over the models with 5 and 6 factors, which are the choices made by the ICp3 criterion of Bai and Ng (2002). The ICp1 criterion also delivers almost equal values for the specifications with 5 and 6 factors.
Via the use of IS2 for calculating the marginal likelihood, we can safely choose a model with 4
factors. Other comparisons show that the model with 3 factors is slightly preferred over 6-factor
model, and evidently preferred over the model with 5 factors.
6Notice that the model-based factors are identified up to a rotation, but joint significance is not affected by such rotations.
Table 6: Number of Factors Based on Marginal Likelihood

Jeffrey's scale   4/2   4/3   4/5   4/6   3/5   3/6   6/5
1-3.2              –     –     –     –     –     √     –
3.2-10             –     –     –     –     √     –     √
10-100             √     √     –     –     –     –     –
> 100              –     –     √     √     –     –     –

A √ indicates that the Bayes factor using the IS2 marginal likelihood for one choice of the number of factors against another falls into the corresponding category of Jeffrey's scale.
5.3 Dynamic portfolio management
To see how our proposed model and estimation methods may work in practice, we compare our model with five other models in terms of VaR and portfolio performance under both one-week and two-week rebalancing dynamics. Two portfolios are considered: (1) a U.S. portfolio, which is the
dataset used in the previous application, i.e. 80 equity return series from components of S&P
100 index; (2) an Australian portfolio containing 41 return series from components of the S&P ASX
50 index. A rolling window exercise of size T = 600 is carried out with S = 495 out-of-sample
trading weeks. Throughout the following, our model is abbreviated by HFSV.
5.3.1 Design and alternative models
We choose competing models which also adopt a factor structure, which is usually considered a viable modelling framework when high-dimensional datasets are of interest7.
The first model is the multivariate stochastic volatility model (MSV) of Chib et al. (2006).
The second model is the same model but augmented with stochastic jumps (MSV-J). MSV-J is
formulated as
$$y_t = \Lambda f_t + K_t q_t + u_t,$$

where factor fj,t is Gaussian with standard stochastic volatility, i.e. no leverage effect or asymmetry. The idiosyncratic term ui,t has a Student’s t error with standard stochastic volatility. Kt is a diagonal matrix recording the jump sizes at time t, and qi,t is a Bernoulli random variable indicating the occurrence of a jump. MSV
does not have the jump term. We replace the estimation method for stochastic volatility given
in the original paper by our EIS-PGAS algorithm.
7Comparisons with popular models such as the BEKK, DCC, CCC, DECO, VGARCH model, and variants of them are left out, because these models do not have a factor structure. Though comparisons with them are still interesting, readers may refer to the papers of our chosen competing models and references therein.
The third model, denoted by CKL, is the factor model of Chan et al. (1999), which writes

$$y_t = \Lambda f_t + u_t, \qquad \Omega_t = \Lambda V_t \Lambda' + U_t,$$

where ft is a vector of constructed (thus observed) factors, which in this exercise contains the three Fama-French factors8, i.e. ft = ((Rm-Rf)t, SMBt, HMLt)′. The covariance matrix Vt is computed on a rolling window, i.e. $V_t = \frac{1}{L}\sum_{l=t-L}^{t-1} f_l f_l'$, and Ut is the sample covariance matrix of the residuals from asset-by-asset regressions which deliver all rows of Λ.
The fourth model is the dynamic factor multivariate GARCH (DFMG) model of Santos and
Moura (2014). The model also uses constructed factors but is more flexible, and is given by

$$y_t = \Lambda_t f_t + u_t, \qquad \Omega_t = \Lambda_t V_t \Lambda_t' + U_t, \qquad \lambda_{k,t+1} = \lambda_{k,t} + \eta_t,$$

where λk,t is the k-th element of vec(Λt), k = 1, ..., p × n, which follows a random walk. Vt and Ut are diagonal matrices with each element evolving according to standard GARCH dynamics. ft and ut are assumed to be Gaussian and Student’s t, in line with the MSV and MSV-J models. To estimate the model, one first estimates the GARCH dynamics for ft and obtains Vt. Secondly, treating Λt as constant, Λ can be obtained by OLS. The residuals are then passed into a GARCH-t filter, delivering Ut. Thirdly, given Vt and Ut for t = 1, ..., T, vec(Λt) is obtained via the Kalman filter, and the covariance matrix of ηt is estimated by quasi-maximum likelihood.
The last model we consider is the factor copula (FCO) model of Oh and Patton (2017). This
model provides a novel way of modelling high-dimensional dependence structure and allows for
enough flexibility. The computational complexity of this model is comparable to that of a factor GARCH model such as DFMG. For ease of exposition, we leave out the model specification and forecasting procedure for FCO; readers may refer to the original paper. Also, we choose GARCH marginals and a Gaussian factor model for simplicity, which implies a Gaussian conditional copula.
We consider a basic dynamic minimum-variance portfolio (MVP) problem. The MVP is
dynamic because rebalancing is allowed, and the rebalancing decision is based on the filtered estimate of the portfolio's conditional covariance matrix. The MVP determines the n-by-1 portfolio
8The three factors for the U.S. portfolio are readily found online. We construct those for the Australian portfolio based on definitions in Fama and French (1993).
weights ωt+h|t at time t to rebalance at time t + h such that

$$\omega_{t+h|t} = \arg\min_\omega\ \omega'\Omega_{t+h|t}\omega, \quad \text{subject to } \omega'\iota = 1,$$

where ι is a vector of ones. The solution of this MVP problem is given by

$$\omega_{t+h|t} = \frac{\Omega_{t+h|t}^{-1}\iota}{\iota'\Omega_{t+h|t}^{-1}\iota}.$$
For the HFSV, MSV and MSV-J models, Ωt+h|t is obtained via the methods in Section 3.29. For the CKL model, Ωt+h|t is simply set equal to Ωt. For the DFMG and FCO models, it is straightforward to use a GARCH-like recursive algorithm to compute Ωt+h|t.
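The MVP solution is a one-liner in practice. The sketch below (our own, with an arbitrary 3 × 3 covariance forecast standing in for Ωt+h|t) computes the weights via a linear solve rather than an explicit inverse.

```python
# Minimum-variance weights omega = Omega^{-1} iota / (iota' Omega^{-1} iota).
import numpy as np

def min_variance_weights(Omega):
    iota = np.ones(Omega.shape[0])
    w = np.linalg.solve(Omega, iota)   # Omega^{-1} iota without explicit inverse
    return w / (iota @ w)

Omega = np.array([[0.04, 0.01, 0.00],    # illustrative covariance forecast
                  [0.01, 0.09, 0.02],
                  [0.00, 0.02, 0.16]])
w = min_variance_weights(Omega)
print(w, w.sum())                        # weights sum to one
```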
5.3.2 VaR and performance
One important task in portfolio management is the determination of portfolio VaR at time t+ 1
given information up to time t, or VaRp,t+1|t. Given the portfolio weights ωt+1|t solved from the MVP problem, the one-step-ahead VaR at the α% level is given by

$$VaR_{p,t+1|t}(\alpha) = \sqrt{\omega'_{t+1|t}\Omega_{t+1|t}\omega_{t+1|t}}\; F^{-1}_{y_{p,t+1|t}}(\alpha),$$

where $F^{-1}_{y_{p,t+1|t}}(\alpha)$ is the α-th percentile of the distribution function of the one-step-ahead predicted portfolio return $y_{p,t+1|t} = \omega'_{t+1|t}\, y_{t+1|1:t}$. For the HFSV, MSV and MSV-J models, the distribution function of yt+1|1:t can be readily estimated based on the particle system at time t as in equation (28). For the other models, the conditional forecasting density can be derived straightforwardly, similar to GARCH-type models.
The unconditional and conditional coverage ratio tests used by Chib et al. (2006) are applied to investigate the quality of the VaR estimates. We define the following binary sequence It:

$$I_t = \begin{cases} 1 & \text{if } \omega'_{t+1|t}\, y_{t+1} < VaR_{p,t+1|t}, \\ 0 & \text{if } \omega'_{t+1|t}\, y_{t+1} \geq VaR_{p,t+1|t}. \end{cases}$$
It = 1 indicates a hit or exception. Well-behaved VaR estimates mean that the sequence It should have the correct unconditional coverage ratio, i.e. E(It) = α. A likelihood ratio (LR) test based on the hit rate (HR) $\frac{1}{T}\sum_{t=1}^{T} I_t$ can be constructed for the unconditional coverage. According
9The sampler for MSV-J has one extra step to draw Kt and qt. See Chib et al. (2006) for details.
Table 7: Quality of VaR Estimates for the U.S. Portfolio

Table 8: Quality of VaR Estimates for the Australian Portfolio. The table shows p-values of coverage ratio tests for the Australian portfolio. Also see the descriptions of Table 7.
to Christoffersen (1998), for dynamic models the conditional coverage ratio is more relevant, which additionally depends on the serial independence of It. The test statistics LRuc for unconditional coverage and LRind for serial independence can both be constructed from It, and both are asymptotically χ²(1)-distributed. The combined statistic LRcc = LRuc + LRind for testing conditional coverage is asymptotically χ²(2)-distributed.
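A sketch of these coverage ratio tests (our own implementation of Christoffersen's (1998) statistics, which assumes the hit sequence contains both hits and non-hits so that no log of zero occurs) is given below.

```python
# Christoffersen (1998) unconditional, independence and conditional coverage
# LR tests for a 0/1 hit sequence I_t at nominal level alpha.
import numpy as np
from scipy.stats import chi2

def coverage_tests(I, alpha):
    I = np.asarray(I, dtype=int)
    T, n1 = len(I), I.sum()
    n0 = T - n1
    pi_hat = n1 / T
    # LR_uc: H0 is E(I_t) = alpha
    lr_uc = -2.0 * (n0 * np.log(1 - alpha) + n1 * np.log(alpha)
                    - n0 * np.log(1 - pi_hat) - n1 * np.log(pi_hat))
    # LR_ind: H0 is a serially independent hit sequence (first-order Markov)
    n00 = np.sum((I[:-1] == 0) & (I[1:] == 0))
    n01 = np.sum((I[:-1] == 0) & (I[1:] == 1))
    n10 = np.sum((I[:-1] == 1) & (I[1:] == 0))
    n11 = np.sum((I[:-1] == 1) & (I[1:] == 1))
    p01, p11 = n01 / (n00 + n01), n11 / (n10 + n11)
    p2 = (n01 + n11) / (n00 + n01 + n10 + n11)
    lr_ind = -2.0 * ((n00 + n10) * np.log(1 - p2) + (n01 + n11) * np.log(p2)
                     - n00 * np.log(1 - p01) - n01 * np.log(p01)
                     - n10 * np.log(1 - p11) - n11 * np.log(p11))
    lr_cc = lr_uc + lr_ind
    return {"LRuc": (lr_uc, chi2.sf(lr_uc, 1)),
            "LRind": (lr_ind, chi2.sf(lr_ind, 1)),
            "LRcc": (lr_cc, chi2.sf(lr_cc, 2))}
```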
Tables 7 and 8 report the p-values of the LR tests for the U.S. and Australian portfolios, with shaded cells indicating rejection at the 10% level. Comparing the HRs given by different models, HFSV is the most accurate in estimating VaR, except for the case of the Australian portfolio targeting VaR at the 1% nominal level. MSV is always less accurate than MSV-J, highlighting the need for modelling “jumps” or “shocks”. Though MSV incorporates Student’s t-distributed idiosyncratic errors, its performance implies that modelling only asset-specific “shocks” is insufficient. FCO also estimates
VaR well, perhaps with the exception of the U.S. portfolio targeting 5% nominal level, though
test results do not reject its validity.
Interestingly, all shaded cells come from either CKL or DFMG, both using constructed
factors. We conjecture this has to do with rebalancing: because one updates the portfolio weights based on the covariance matrix forecast, on which the estimation of VaR critically depends, constructed factors are proxies that may not adequately reveal the unobserved factor structure.
As a result, the forecast gets contaminated when a certain proportion of assets deviates from
factors. Additionally, HFSV is the only model taking into account asymmetry and leverage
effect, which are believed to influence HR.
Besides risk management, portfolio performance is also evaluated based on Sharpe ratio (SR)
and information ratio (IR). SR measures the risk-adjusted return per unit of portfolio return
variability. A portfolio that is rebalanced on an h-week basis has

$$SR(h) = \frac{\mu(h)}{\sigma(h)}, \quad \text{where } \mu(h) = \frac{1}{S-h}\sum_{s=1}^{S-h} \omega'_{T+s+h|T+s}\, y_{T+s+h},$$
$$\sigma^2(h) = \frac{1}{S-h}\sum_{s=1}^{S-h} \big(\omega'_{T+s+h|T+s}\, y_{T+s+h} - \mu(h)\big)^2.$$
IR is often used to set portfolio constraints for managers such as tracking risk limits. It measures
how much excess return can be generated from the amount of excess risk relative to a chosen
benchmark. Here we choose S&P 100 and ASX 50 index return as benchmark for the U.S. and
Australian portfolio. The IR is given by
IR(h) =µ(h)
σ(h), where µ(h) =
1
S − h
S−h∑s=1
ω′T+s+h|T+s(yT+s+h − µB,T+s+h),
σ2(h) =1
S − h
S−h∑s=1
(ω′T+s+h|T+s(yT+s+h − µB,T+s+h)− µ(h)
)2,
where µB,t is the benchmark return at time t.
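Given the realised out-of-sample portfolio returns ω′_{T+s+h|T+s} y_{T+s+h} and the benchmark returns, both ratios reduce to a few lines; in the sketch below (ours), port_ret and bench_ret are assumed inputs holding these two series over the same out-of-sample weeks.

```python
# Sharpe ratio and information ratio from realised weekly portfolio returns.
import numpy as np

def sharpe_ratio(port_ret):
    return np.mean(port_ret) / np.std(port_ret)

def information_ratio(port_ret, bench_ret):
    active = np.asarray(port_ret) - np.asarray(bench_ret)  # excess vs. benchmark
    return np.mean(active) / np.std(active)
```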
Model comparisons are carried out in terms of portfolio average weekly return, variance,
SR and IR for the out-of-sample period considered. Table 9 shows that for the U.S. portfolio
the equally weighted portfolio gives the highest variance and the lowest mean return. This
would suggest that the equally weighted portfolio is inefficiently managed and lies inside the conditional efficient frontier implied by the different models, in the bottom area of the
conditional feasible set10. This is in contrast to the Australian portfolio summarised in Table
10. Among all models, HFSV delivers the lowest portfolio return variance, and it is the only
10The efficient frontier and feasible set are conditional because the mean and covariance matrix of asset returns at time t + h are determined conditional on information up to time t.
Table 9: The U.S. Minimum-Variance Portfolio Performance

Table 10: The Australian Minimum-Variance Portfolio Performance. The table shows the MV portfolio performance for the Australian portfolio. Also see the descriptions of Table 9.
model achieving a mean return higher than the equally weighted portfolio under both weekly and biweekly rebalancing. This means that for the other models the equally weighted portfolio lies in the upper half of their conditional feasible sets. For the U.S. portfolio rebalanced weekly, HFSV delivers the second lowest variance, slightly higher than MSV-J. This is because the stochastic jumps in MSV-J absorb larger variations, though under a biweekly rebalancing policy its variance becomes larger than that of HFSV. Another observation is that the return variances
clearly fall in two groups. The first includes HFSV, MSV, MSV-J and FCO, whose factors are
model- and data-based. The second, showing larger variances, includes CKL and DFMG, which use constructed factors. This indicates that the conditional efficient frontier implied by the first group of models lies to the left of that implied by the second group.
Importantly, HFSV delivers the highest SR for the U.S. portfolio under both rebalancing
policies, with MSV-J as its main competitor. While the U.S. portfolio managed using HFSV compensates investors the most for the risk taken, the Australian portfolio rebalanced biweekly suggests the superior performance of HFSV in relation to the risk investors choose to take in deviating from the benchmark, i.e. a high IR. Yet for the U.S. portfolio, the MSV and MSV-J models give
the highest IR, followed by HFSV. FCO produces moderately-performing SR, but its deviation
from the benchmark fluctuates more, making its IR lower than other models with unobserved
factors. One should notice that because the choice of benchmark is subjective and influences the
calculation of IR, a low IR should not be seen as decisive evidence of poor model performance.
A final remark is that the SR and the IR are low because we only consider the MVP; that is, investors are assumed to be infinitely risk-averse. Should a certain degree of risk be allowed and a certain return be required, both ratios can increase.
6 Conclusion
We propose a high-dimensional factor stochastic volatility model with leverage effect using the
generalised hyperbolic skew Student’s t-error to address asymmetry and heavy tails of equity re-
turns. The model is shown to be flexible enough to distinguish asset-specific mean and volatility
dynamics from common factors. With the shrinkage technique, the model helps answer the question of whether the leverage effect and return asymmetry are systematic or idiosyncratic. A highly efficient
Bayesian estimation procedure to sample hyperparameters and unobserved volatility processes
is developed and we show that based on marginalisation of factors, factor loadings can be sam-
pled efficiently leading to a set of individual stochastic volatility models where particle efficient
importance sampling and refined particle Gibbs with ancestor sampling can be used. Addition-
ally, importance sampling squared accurately calculates marginal likelihood to determine the
number of factors. Our detailed Monte Carlo study on both univariate and multivariate models
provides evidence on the successful implementation of the proposed model and method. We
apply our model to a U.S. dataset with 80 assets, and find that a large proportion of return asymmetry comes from the factors, indicating systematic co-skewness. Lastly, minimum-
variance portfolio exercises for the U.S. portfolio and another Australian portfolio show that
estimation of VaR is very accurate using our proposed model. Under both weekly and biweekly
rebalancing policies, the model outperforms other factor models.
References
Aas, K. and Haff, I. H. (2006). The generalized hyperbolic skew Student’s t-distribution. Journalof Financial Econometrics, 4(2):275–309.
Aguilar, O. and West, M. (2000). Bayesian dynamic factor models and portfolio allocation.Journal of Business & Economic Statistics, 18(3):338–357.
Andrieu, C., Doucet, A., and Holenstein, R. (2010). Particle Markov chain Monte Carlo methods.Journal of the Royal Statistical Society: Series B (Statistical Methodology), 72(3):269–342.
Asai, M., McAleer, M., and Yu, J. (2006). Multivariate stochastic volatility: a review. Econo-metric Reviews, 25(2-3):145–175.
Bai, J. and Ng, S. (2002). Determining the number of factors in approximate factor models.Econometrica, 70(1):191–221.
Bickel, P., Li, B., Bengtsson, T., et al. (2008). Sharp failure rates for the bootstrap particlefilter in high dimensions. In Pushing the limits of contemporary statistics: Contributions inhonor of Jayanta K. Ghosh, pages 318–329. Institute of Mathematical Statistics.
Bollerslev, T. (1987). A conditionally heteroskedastic time series model for speculative pricesand rates of return. The review of economics and statistics, pages 542–547.
Bollerslev, T. (1990). Modelling the coherence in short-run nominal exchange rates: a multi-variate generalized arch model. The review of economics and statistics, pages 498–505.
Bollerslev, T., Engle, R. F., and Nelson, D. B. (1994). Arch models. Handbook of econometrics,4:2959–3038.
Carhart, M. M. (1997). On persistence in mutual fund performance. The Journal of finance,52(1):57–82.
Carrasco, M. and Chen, X. (2002). Mixing and moment properties of various garch and stochasticvolatility models. Econometric Theory, 18(01):17–39.
Chan, J. C., Leon-Gonzales, R., and Strachan, R. W. (2013). Invariant inference and efficientcomputation in the static factor model.
Chan, L. K., Karceski, J., and Lakonishok, J. (1999). On portfolio optimization: Forecastingcovariances and choosing the risk model. Review of Financial Studies, 12(5):937–974.
Chib, S. (2001). Markov chain Monte Carlo methods: computation and inference. Handbook ofeconometrics, 5:3569–3649.
Chib, S. and Greenberg, E. (1994). Bayes inference in regression models with ARMA (p, q)errors. Journal of Econometrics, 64(1-2):183–206.
Chib, S., Nardari, F., and Shephard, N. (2006). Analysis of high dimensional multivariatestochastic volatility models. Journal of Econometrics, 134(2):341–371.
Chib, S., Omori, Y., and Asai, M. (2009). Multivariate stochastic volatility. In Handbook ofFinancial Time Series, pages 365–400. Springer.
Chopin, N., Singh, S. S., et al. (2013). On the particle Gibbs sampler. CREST.
Christoffersen, P. F. (1998). Evaluating interval forecasts. International economic review, pages841–862.
Clyde, M. and George, E. I. (2004). Model uncertainty. Statistical science, pages 81–94.
Creal, D., Koopman, S. J., and Lucas, A. (2012). A dynamic multivariate heavy-tailed modelfor time-varying volatilities and correlations. Journal of Business & Economic Statistics.
Danielsson, J. (1998). Multivariate stochastic volatility models: estimation and a comparisonwith VGARCH models. Journal of Empirical Finance, 5(2):155–173.
De Jong, P. and Shephard, N. (1995). The simulation smoother for time series models.Biometrika, 82(2):339–350.
Del Moral, P. (2004). Feynman-Kac Formulae: Genealogical and Interacting Particle Systems with Applications. Probability and Its Applications. Springer.
Dempster, M. A. H., Pflug, G., and Mitra, G. (2008). Quantitative Fund Management. Chapmanand Hall/CRC.
Doucet, A., De Freitas, N., and Gordon, N. (2001). An introduction to sequential Monte Carlomethods. In Sequential Monte Carlo methods in practice, pages 3–14. Springer.
Doz, C., Giannone, D., and Reichlin, L. (2011). A two-step estimator for large approximatedynamic factor models based on Kalman filtering. Journal of Econometrics, 164(1):188–205.
Durbin, J. and Koopman, S. J. (1997). Monte Carlo maximum likelihood estimation for non-Gaussian state space models. Biometrika, 84(3):669–684.
Durbin, J. and Koopman, S. J. (2000). Time series analysis of non-Gaussian observations basedon state space models from both classical and Bayesian perspectives. Journal of the RoyalStatistical Society: Series B (Statistical Methodology), 62(1):3–56.
Durbin, J. and Koopman, S. J. (2012). Time series analysis by state space methods. Number 38.Oxford University Press.
Engle, R. (2002). Dynamic conditional correlation: A simple class of multivariate generalizedautoregressive conditional heteroskedasticity models. Journal of Business & Economic Statis-tics, 20(3):339–350.
Fama, E. F. and French, K. R. (1993). Common risk factors in the returns on stocks and bonds.Journal of financial economics, 33(1):3–56.
Forni, M., Hallin, M., Lippi, M., and Reichlin, L. (2012). The generalized dynamic factor model.Journal of the American Statistical Association.
French, K. R., Schwert, G. W., and Stambaugh, R. F. (1987). Expected stock returns andvolatility. Journal of financial Economics, 19(1):3–29.
Geweke, J. and Tanizaki, H. (2001). Bayesian estimation of state-space models using theMetropolis–Hastings algorithm within Gibbs sampling. Computational Statistics & Data Anal-ysis, 37(2):151–170.
Gilks, W. R., Best, N., and Tan, K. (1995). Adaptive rejection Metropolis sampling withinGibbs sampling. Applied Statistics, pages 455–472.
Jacquier, E., Polson, N. G., and Rossi, P. E. (2004). Bayesian analysis of stochastic volatilitymodels with fat-tails and correlated errors. Journal of Econometrics, 122(1):185–212.
Jung, R. C. and Liesenfeld, R. (2001). Estimating time series models for count data usingefficient importance sampling. AStA Advances in Statistical Analysis, 4(85):387–407.
Kim, S., Shephard, N., and Chib, S. (1998). Stochastic volatility: likelihood inference andcomparison with ARCH models. The Review of Economic Studies, 65(3):361–393.
Koop, G., Poirier, D. J., and Tobias, J. L. (2007). Bayesian econometric methods. CambridgeUniversity Press.
Koopman, S. J. and Hol Uspensky, E. (2002). The stochastic volatility in mean model: empiricalevidence from international stock markets. Journal of applied Econometrics, 17(6):667–689.
Liesenfeld, R. and Richard, J.-F. (2006). Classical and bayesian analysis of univariate andmultivariate stochastic volatility models. Econometric Reviews, 25(2-3):335–360.
Lindsten, F., Jordan, M. I., and Schon, T. B. (2014). Particle gibbs with ancestor sampling.Journal of Machine Learning Research, 15(1):2145–2184.
Nakajima, J. (2015). Bayesian analysis of multivariate stochastic volatility with skew returndistribution. Econometric Reviews, pages 1–23.
Nakajima, J. and Omori, Y. (2012). Stochastic volatility model with leverage and asymmetricallyheavy-tailed error using GH skew Student’s t-distribution. Computational Statistics & DataAnalysis, 56(11):3690–3704.
Nelson, D. B. (1991). Conditional heteroskedasticity in asset returns: A new approach. Econo-metrica: Journal of the Econometric Society, pages 347–370.
Oh, D. H. and Patton, A. J. (2017). Modeling dependence in high dimensions with factorcopulas. Journal of Business & Economic Statistics, 35(1):139–154.
Olsson, J. and Ryden, T. (2011). Rao-Blackwellization of particle Markov chain Monte Carlo methods using forward filtering backward sampling. IEEE Transactions on Signal Processing, 59(10):4606–4619.
Pitt, M. and Shephard, N. (1999b). Time varying covariances: a factor stochastic volatility approach. Bayesian Statistics, 6:547–570.

Pitt, M. K. and Shephard, N. (1999a). Filtering via simulation: Auxiliary particle filters. Journal of the American Statistical Association, 94(446):590–599.
Richard, J.-F. and Zhang, W. (2007). Efficient high-dimensional importance sampling. Journalof Econometrics, 141(2):1385–1411.
Ruiz, E. (1994). Quasi-maximum likelihood estimation of stochastic volatility models. Journalof econometrics, 63(1):289–306.
Santos, A. A. and Moura, G. V. (2014). Dynamic factor multivariate garch model. ComputationalStatistics & Data Analysis, 76:606–617.
Scharth, M. and Kohn, R. (2016). Particle efficient importance sampling. Journal of Economet-rics, 190(1):133–147.
Shephard, N. and Pitt, M. K. (1997). Likelihood analysis of non-Gaussian measurement time series. Biometrika, 84(3):653–667.
Snyder, C., Bengtsson, T., Bickel, P., and Anderson, J. (2008). Obstacles to high-dimensionalparticle filtering. Monthly Weather Review, 136(12):4629–4640.
Takahashi, M., Omori, Y., and Watanabe, T. (2009). Estimating stochastic volatility modelsusing daily returns and realized volatility simultaneously. Computational Statistics & DataAnalysis, 53(6):2404–2426.
Tran, M.-N., Scharth, M., Pitt, M. K., and Kohn, R. (2014). Importance sampling squared forbayesian inference in latent variable models. Available at SSRN 2386371.
Vardi, N. (2015). Top quant hedge funds stand out with good 2015.
Watanabe, T. and Omori, Y. (2004). A multi-move sampler for estimating non-Gaussian time series models: Comments on Shephard & Pitt (1997). Biometrika, pages 246–248.
Wright, S. and Nocedal, J. (1999). Numerical optimization. Springer Science, 35:67–68.
Yu, J. (2005). On leverage in a stochastic volatility model. Journal of Econometrics, 127(2):165–178.
Appendices
A The leverage effect multiplier
The leverage effect for the univariate SV model (1) is $\mathrm{Corr}(\nu_t, \eta_t) = \mathrm{Cov}(\nu_t, \eta_t)/\sqrt{\mathrm{Var}(\nu_t)\mathrm{Var}(\eta_t)}$, where the numerator is

$$\mathrm{Cov}(\nu_t, \eta_t) = E\big(\sqrt{W_t}\big)\,\rho\sigma.$$
Since Wt ∼ IG(ζ/2, ζ/2), (1/ζ)Wt is IG(ζ/2, 1/2)-distributed, or Inv-χ²(ζ)-distributed. Let $\tilde W_t = \sqrt{W_t}$; then $\frac{1}{\zeta}\tilde W_t^2 \sim \text{Inv-}\chi^2(\zeta)$ with Jacobian $\frac{2}{\zeta}\tilde W_t$. It follows that

$$\begin{aligned}
E(\tilde W_t) &= \int_0^\infty \frac{2}{\zeta}\tilde W_t^2\,\frac{2^{-\zeta/2}}{\Gamma(\zeta/2)}\,\zeta^{\frac{\zeta+2}{2}}\,\tilde W_t^{-(\zeta+2)}\exp\Big(\frac{-\zeta}{2\tilde W_t^2}\Big)\,d\tilde W_t \\
&= \frac{\sqrt{\zeta}}{2^{\zeta/2-1}\Gamma(\zeta/2)}\int_0^\infty y^{-\zeta}\exp\Big(\frac{-1}{2y^2}\Big)\,dy \\
&= \frac{\sqrt{\zeta}}{2^{\zeta/2-1}\Gamma(\zeta/2)}\int_0^\infty 2^{\zeta/2-3/2}\,z^{\frac{\zeta-1}{2}-1}\exp(-z)\,dz \\
&= \frac{\sqrt{\zeta}\,\Gamma\big(\frac{\zeta-1}{2}\big)}{\sqrt{2}\,\Gamma(\zeta/2)},
\end{aligned}$$
where we use the substitutions $y \equiv \frac{1}{\sqrt{\zeta}}\tilde W_t$ and $z \equiv \frac{1}{2}y^{-2}$. In the denominator, the variance of the generalised hyperbolic skew Student’s t-distributed error νt is given by Aas and Haff (2006) (in their parametrisation δ² and v are both equivalent to our ζ), i.e.

$$\mathrm{Var}(\nu_t) = \frac{2\beta^2\zeta^2}{(\zeta-2)^2(\zeta-4)} + \frac{\zeta}{\zeta-2}.$$
With these quantities, the unconditional leverage effect multiplier can be shown to be the one given in Section 2.1.
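As a sanity check (our own, with ζ = 10 as an arbitrary example value), the closed form for E(√Wt) can be verified against a Monte Carlo average of inverse-gamma draws:

```python
# Numerical check of E(sqrt(W_t)) = sqrt(zeta) * Gamma((zeta-1)/2)
#                                   / (sqrt(2) * Gamma(zeta/2)).
import numpy as np
from scipy.special import gammaln

zeta = 10.0
rng = np.random.default_rng(1)
W = 1.0 / rng.gamma(shape=zeta / 2.0, scale=2.0 / zeta, size=1_000_000)
mc = np.sqrt(W).mean()
closed = np.sqrt(zeta / 2.0) * np.exp(gammaln((zeta - 1) / 2) - gammaln(zeta / 2))
print(mc, closed)   # the two values should agree to roughly three decimals
```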
B Discussion on two theorems
Theorem 1 can be proved following the arguments in Lindsten et al. (2014) who show the
invariance between PGAS with a bootstrap filter and the one without AS. It might be of interest
to establish the equivalence between EIS-PGAS and EIS-PG (which is also an alternative proof).
One can notice that they are not the same as with the bootstrap filter, where the importance density
is simply $p(x_t|x_{t-1})$ and thus the sampling weights are proportional to $p(y_t|x_t^i)$, i.e. independent of the ancestor trajectories. Using the notation of Section 2.2 and letting $L_\theta^M(x^*_{1:T}, B)$ for M ≥ 0 denote the Markov kernel implied by EIS-PG on $(X^*_{1:T}, \mathcal{F}_{1:T})$, we propose the following

Proposition 1. Suppose EIS is the importance density used in the particle filtering for both PGAS and PG, i.e. $q(x_t|x_{t-1}, y_{1:T})$ as in (8). Then for any $x^*_{1:T} \in X^*_{1:T}$,

$$K_\theta^M(x^*_{1:T}, B) = L_\theta^M(x^*_{1:T}, B), \qquad \forall B \in \mathcal{F}_{1:T}.$$
To proceed, first suppose the final product of EIS-PGAS is the k-th chosen sample $x^k_{1:T}$. The kernel is then given by

$$K_\theta^M(x^*_{1:T}, B) = E_{\theta, x^*_{1:T}}\big[\mathbb{1}_B(x^k_{1:T})\big] = E_{\delta_{1:T}}\big[\mathbb{1}_B(x^k_{1:T})\big].$$

The expectation is with respect to all random numbers generated in the algorithm, i.e. $(x_{1:T}, a_{2:T}, k) \in \mathbb{R}^{2T} \times \mathbb{N}_{>0}^{2(T-1)+1}$. The last equality comes from the fact that their distribution function is defined by the EIS importance parameter vector $\delta_t = (b_t, c_t, s_t, r_t)$11 as in (6), which is identical for both samplers.
Following Lindsten et al. (2014), the ancestor index can be written recursively as $\alpha_t = a_{t+1}^{\alpha_{t+1}}$ going backward from $\alpha_T = k$. Without loss of generality we take the measurable rectangle set $B = \prod_{t=1}^{T} B_t$ with $B_t \in \mathcal{F}_t$ for all t = 1, ..., T, where $\mathcal{F}_t$ is the natural filtration, i.e. B is a π-system generating $\mathcal{F}_{1:T}$. So we can write the two kernels as

$$K_\theta^M(x^*_{1:T}, B) = E\Big(\prod_{t=1}^{T} \mathbb{1}_{B_t}(x_t^{\alpha_t}) \,\Big|\, \delta_t\Big), \quad \text{and} \quad L_\theta^M(x^*_{1:T}, B) = E\Big(\prod_{t=1}^{T} \mathbb{1}_{B_t}(x_t^{\beta_t}) \,\Big|\, \delta_t\Big).$$
It suffices to show that for all bounded and multiplicative functionals $f(x_{1:T}) = \prod_{t=1}^{T} f_t(x_t)$ we have $E_{\delta_{1:T}}\big(f(x_{1:T}^{\alpha_{1:T}})\big) = E_{\delta_{1:T}}\big(f(x_{1:T}^{\beta_{1:T}})\big)$, because the EIS-PG sampler is essentially a backward simulator running forward. This can be established via backward induction according to Olsson and Ryden (2011) and Lindsten et al. (2014). Suppose the claim holds for t < T and s > t, i.e.

$$E\Big(\prod_{s=t+1}^{T} f_s(x_s^{\alpha_s}) \,\Big|\, \delta_s\Big) = E\Big(\prod_{s=t+1}^{T} f_s(x_s^{\beta_s}) \,\Big|\, \delta_s\Big).$$
The induction hypothesis can be shown to hold following the equivalence between a backward
11And they are determined by the previous draw in the MCMC run, i.e. the reference trajectory x*1:T, and the other hyperparameters θ.
simulator and a bootstrap filter in Olsson and Ryden (2011). To see this, remember that both EIS-PG and EIS-PGAS choose $\chi_q(x_T; \delta_{T+1}) = 1$. Since this choice is arbitrary, we can make $\delta_T$ contain all zeros. As a result, $k_q(x_T, x_{T-1}; \delta_T) = p(x_T|x_{T-1}, y_{T-1})$; namely, this choice also makes $\omega_T^i$ for i = 1, ..., M + 1 proportional to $p(y_T|x_T)$, so $\alpha_T$ and $\beta_T$ are equally distributed. Using the arguments in the Appendix of Lindsten et al. (2014) and their instrumental representation of PGAS, the induction can be completed.
Proposition 1 shows that the kernels defined by EIS-PGAS and EIS-PG are equivalent. It is also interesting to see how EIS-PGAS improves the mixing of the MCMC relative to PGAS with a bootstrap filter. We do not attempt a formal proof, but from equation (11) one can see that the smaller the variance of $\omega_{t-1}^i$, the larger the probability $p(a_t^{M+1} \neq M+1)$, i.e. the probability of the ancestor path of the reference trajectory being different from its original one. EIS is designed to minimise the variance of the logarithm of the importance weights (see equations (9) and (10)), so it is expected to be nearly optimal in maximising $p(a_t^{M+1} \neq M+1)$.
Theorem 2 bounds the total variation distance between $K_\theta^M(x_{1:T}^*, B)$ and $\int_B p(x_{1:T}|y_{1:T})\,dx_{1:T}$ under the assumption that all importance weights are bounded from above by a constant $\omega_\theta < \infty$. The proof follows from Doeblin's theorem, and uniform ergodicity can be established (Doucet et al., 2001). One may argue that this upper-bound condition on the importance weights is too strong in practice (otherwise the particle system would never degenerate). A more natural condition is to bound the variance of the importance weights by a constant. This applies particularly to our case because the EIS importance density minimises the quadratic distance to the target density. We conjecture that the quadratic Kantorovich distance between $(K_\theta^M)^n(x_{1:T}^*, \cdot)$ and any PG kernel without EIS remains positive even as $n \to \infty$.
C Monte Carlo study of the factor SV model
This section details a simulation study on the high-dimensional factor SV model. As shown in
Section 2, the factor SV model with n assets and p factors boils down to n + p individual SV
models which can be analysed in parallel once the factors and factor loadings are sampled. We
show that the multivariate model is able to achieve efficiency comparable to a univariate model
as expected, especially with the marginalisation of factors and sampling factor loadings Λ based
on a Laplace approximation. In practice, it is important to apply the right degree of shrinkage
on leverage effect and skewness and to determine the number of factors. We demonstrate the
effectiveness and efficiency of picking the right model using the IS2 with PEIS method of Tran
et al. (2014) and Scharth and Kohn (2016) applied to our model.
C.1 Model setup
Our baseline model has 50 assets with 8 factors, the same dimensionality as the model of Chib et al. (2006), but note that our model has more than a thousand parameters to estimate. One feature of our model is the shrinkage on the leverage effect and skewness, so we also consider a DGP without leverage effect or skewness, as well as a DGP with non-zero leverage effect and skewness for all factors and asset-specific processes, i.e. containing p + n non-zero leverage effect parameters ρ and skewness parameters β. We denote the DGPs as follows:
• sLE sSK: some have leverage effect, and some have skewness;
• sLE aSK: some have leverage effect, and all have skewness;
• aLE sSK: all have leverage effect, and some have skewness;
• aLE aSK: all have leverage effect and skewness;
• nLE nSK: none has leverage effect or skewness.
“Some”, “all” and “none” in the above definitions refer to the p + n univariate series of the factor and asset-specific processes, i.e. $\{f_{j,t}\}_{j=1}^{p}$ and $\{u_{i,t}\}_{i=1}^{n}$ for $t = 1, \dots, T$. For example, sLE sSK means that the simulated dataset has non-zero leverage effect and skewness in some of the p + n series, while all of the series in the dataset aLE sSK have a leverage effect but only some of them have skewness.
When a dataset has leverage effect or skewness in some of the p + n univariate series, a random vector is generated from a binomial distribution with success probability 0.5 and p + n trials, which serves as an index vector indicating which series have leverage effect or skewness. Accordingly, we choose beta priors for the shrinkage parameters introduced in Section 2.3.1,
\[
\Delta_\vartheta \sim \mathrm{Beta}(2, 2), \qquad \Delta_\beta \sim \mathrm{Beta}(2, 2).
\]
We assume a flat normal prior for the free elements of Λ, i.e. $\lambda_{ij} \sim N(0, 10)$, but we generate those elements for the simulation study from $N(1, 1)$ so that the prior is effectively non-informative. The other hyperparameters are generated from their prior distributions given in (29), except that only negative β's (if not zero) are retained. This design aims to reflect the dynamics and stylised facts of daily equity returns.
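A minimal sketch of this simulation design, with hypothetical parameter values standing in for draws from the priors in (29), might look as follows.

import numpy as np

rng = np.random.default_rng(42)
n, p, T = 50, 8, 2000                  # assets, factors, time points

# Index vectors: which of the p + n series receive leverage effect / skewness
has_leverage = rng.binomial(1, 0.5, size=n + p).astype(bool)
has_skewness = rng.binomial(1, 0.5, size=n + p).astype(bool)

# Leverage and (negative-only) skewness parameters; the ranges are our own
# illustrative choices, not the paper's prior hyperparameters
rho = np.where(has_leverage, rng.uniform(-0.7, -0.2, n + p), 0.0)
beta = np.where(has_skewness, -np.abs(rng.normal(1.0, 0.5, n + p)), 0.0)

# Free loadings drawn from N(1, 1) so that the flat N(0, 10) prior is
# effectively non-informative
Lam = rng.normal(1.0, 1.0, size=(n, p))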
The method of Chib and Greenberg (1994) samples the factor loadings Λ with the factors $f_t$ for $t = 1, \dots, T$ marginalised out. They compare their posterior output to the results obtained by conditioning on the factors as in Pitt and Shephard (1999b) and Aguilar and West (2000). They show that in the case of 4 factors, the sampling of Λ can be 20 to 40 times more efficient than methods which sample Λ either by column or by row conditional on the factors, as measured by the inefficiency factor; in the case of 8 factors, the efficiency gain can be 80-fold. We apply their idea of using an MH sampler based on a Laplace approximation of the conditional posterior distribution of Λ, so we can expect similar efficiency gains from the marginalisation of factors. In what follows, we therefore do not compare differences in sampling efficiency resulting from the marginalisation of factors; instead, we focus on the effect of the EIS proposal and the ancestor sampling used in the particle Gibbs algorithm, similar to our simulation study of the univariate SV model. In the next subsections, four estimation methods, i.e. EIS-PGAS, EIS-PG, BF-PGAS and MM-MH, implemented to analyse the p + n individual SV models, are considered and compared.
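To illustrate this sampling step, here is a minimal sketch of an MH update based on a Laplace approximation; the callable log_post stands in for the conditional log posterior of the vectorised free loadings (factors marginalised out) and is a hypothetical placeholder, not the density derived in the paper.

import numpy as np
from scipy.optimize import minimize

def laplace_mh_step(lam_current, log_post, rng):
    # Laplace approximation: mode and curvature of the conditional log posterior
    res = minimize(lambda v: -log_post(v), lam_current, method="BFGS")
    mode = res.x
    cov = res.hess_inv + 1e-8 * np.eye(len(mode))   # inverse Hessian as covariance
    chol = np.linalg.cholesky(cov)
    # Independence MH proposal centred at the mode
    proposal = mode + chol @ rng.standard_normal(len(mode))

    def log_q(v):                                   # proposal log density (up to a constant)
        z = np.linalg.solve(chol, v - mode)
        return -0.5 * z @ z

    log_accept = (log_post(proposal) - log_post(lam_current)
                  + log_q(lam_current) - log_q(proposal))
    return proposal if np.log(rng.uniform()) < log_accept else lam_current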
We simulate each dataset with length T = 2000. Figure 8 illustrates the 1st and 50th simulated return series, i.e. $y_{1,t}$ and $y_{50,t}$, as well as the 1st and 8th factors, i.e. $f_{1,t}$ and $f_{8,t}$, with their respective SV processes $l_{1,t}$, $l_{50,t}$, $h_{1,t}$ and $h_{8,t}$. Applying the initialisation and MCMC
Figure 8: Simulated return series and factors with their respective log-volatility from model (16). Upper panels: return series $y_{1,t}$, $y_{50,t}$ and factors $f_{1,t}$, $f_{8,t}$; lower panels: log-volatility series $l_{1,t}$, $l_{50,t}$, $h_{1,t}$ and $h_{8,t}$.
algorithm detailed in Section 2, we run the sampler for 22,000 iterations for posterior inference, with the first 2,000 burn-in samples discarded. In our experiments with EIS-PGAS, the number of MCMC iterations can be safely halved without much difference in posterior statistics or efficiency. But in order to have reliable posterior comparisons with the other three methods, we keep the number of iterations at 22,000, anticipating different degrees of sampling inefficiency for EIS-PG, BF-PGAS and MM-MH.
C.2 Estimation results
Firstly, we discuss some estimation results from applying our proposed method, EIS-PGAS, to the most interesting dataset, sLE sSK. Figure 9 reports the posterior means and sample standard deviations of the hyperparameters related to the 58 SV models for $\{f_{j,t}\}_{j=1}^{8}$ and $\{u_{i,t}\}_{i=1}^{50}$ (i.e. all parameters except for Λ), together with their true DGP values. The top three graphs, from left to right, show the results for φ, σ and ρ, while the bottom three graphs, from left to right, show the results for µ, β and ζ. All x-axes correspond to the 58 individual SV models, with the first 8 coordinates indicating the respective factors and the rest relating to the asset-specific processes. We represent the true DGP value and posterior mean of each parameter using a pair of line graphs with values shown on the left y-axis of each graph, while the sample standard deviations are given by the scatter plot with values indicated by the right y-axis.
Figure 9: EIS-PGAS estimated posterior means and standard deviations of stochastic volatility model parameters for dataset sLE sSK. (i): φ; (ii): σ; (iii): ρ; (iv): µ; (v): β; (vi): ζ. Coordinates 1 to 8 on all x-axes indicate factors $f_{j,t}$ for $j = 1, \dots, 8$ and the rest correspond to $u_{i,t}$ for $i = 1, \dots, 50$. Left y-axes: parameter values; right y-axes: sample standard deviations.
The results suggest that EIS-PGAS can estimate all SV hyperparameters accurately and efficiently. The posterior means of the autoregressive parameter φ, the volatility-of-volatility parameter σ and the unconditional mean of log-volatility µ are close to their true DGP values, especially for the factors. One or two µ's and σ's may be deemed slightly away from their DGP values, but with the standard deviations taken into account, these deviations, which result from a specific simulated sample path, are reasonably small.
From the bottom right graph of Figure 9 one can notice some discrepancies between the posterior means of the ζ's and their true DGP values, and for some of the SV models the sample standard deviations of the ζ's obtained from the Markov chain are also relatively high. The d.o.f. parameters ζ are probably the poorest estimated among all parameters, a result in line with Nakajima and Omori (2012), who apply MM-MH to model (1). In Section 4.1, however, we show that EIS-PGAS is significantly more efficient than the three alternative methods, as seen in Tables 1 and 2. Of particular interest is the effect of the shrinkage prior assumed for the leverage effect parameter ρ and the skewness parameter β. The shrinkage is expected to detect zero leverage effect and skewness in the dataset sLE sSK automatically, similar to the variable selection case discussed by Clyde and George (2004). The vertical lines in the top right and bottom middle graphs of Figure 9 indicate zero leverage effect or skewness for a particular individual SV process. It can be seen that whenever the DGP value is zero, EIS-PGAS effectively gives a zero posterior mean. This confirms that the shrinkage priors help determine zero leverage effect and skewness and consequently make the model more parsimonious. However, from the first column of Figure 11, which shows the posterior probability of a zero parameter estimated by EIS-PGAS, we can also see that all ρ's in the upper row and β's in the lower row are “forced” to collapse towards zero, causing some near-zero leverage effect and skewness parameters to be shrunk to zero. But we find that the cost of this slight over-shrinkage is minor when applying IS2 to calculate the marginal likelihood of a dataset and the associated Bayes factors.
We report the posterior results for the 1st, 4th, 6th and 8th factor loadings, i.e. the respective columns of Λ, in Figure 10. True DGP values and posterior means are illustrated by the bar charts with values corresponding to the left y-axis, while sample standard deviations are shown by the scatter plots with values on the right y-axis. It is easy to see that EIS-PGAS is able to estimate the factor loadings very accurately with a flat prior. Though our proposed factor SV model takes a much more complex form than that of Chib et al. (2006), we reach the same conclusion that the estimation efficiency for the factor loadings is mainly due to the marginalisation of factors when sampling the loading matrix Λ based on a Laplace approximation. Furthermore, it is not affected by the presence of the leverage effect, skewness and heavy-tailedness modelled in the factor dynamics.

Table 11 shows the correlation between the posterior means of a vector of parameters and their true DGP values, with the mean absolute deviation in brackets as a measure of estimation accuracy. The first row of the first panel in Table 11 shows these statistics for EIS-PGAS applied to sLE sSK. We can see that except for the ζ's, which have a correlation coefficient of 0.85,
Figure 10: EIS-PGAS estimated posterior means and standard deviations of factor loadings for dataset sLE sSK. From top to bottom: loadings on the 1st, 4th, 6th and 8th factor. Left y-axes: parameter values; right y-axes: standard deviations.
all parameters are highly correlated with their DGP counterparts, with correlations above 0.94. This suggests that EIS-PGAS is capable of sampling the parameters related to both the factor and asset-specific processes accurately from the joint posterior distribution.
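The accuracy metrics in Table 11 are straightforward to compute from the MCMC output; a minimal sketch, assuming arrays of posterior means and DGP values for one parameter vector, is given below.

import numpy as np

def accuracy_metrics(posterior_means, dgp_values):
    # Correlation and mean absolute deviation between posterior means and
    # true DGP values for one parameter vector (e.g. all 58 phi's)
    corr = np.corrcoef(posterior_means, dgp_values)[0, 1]
    mad = np.mean(np.abs(posterior_means - dgp_values))
    return corr, mad

# Hypothetical example with 58 phi estimates
rng = np.random.default_rng(1)
phi_true = rng.uniform(0.95, 0.99, 58)
phi_hat = phi_true + rng.normal(0.0, 0.01, 58)
print(accuracy_metrics(phi_hat, phi_true))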
C.3 Comparisons among methods
Among the four estimation methods, BF-PGAS is the easiest to implement, while the methods involving EIS are more complicated, as one needs to build p + n importance densities for all SV series and the inverse gamma mixture component in each run of the MCMC sampler. The MM-MH method of Nakajima and Omori (2012) is built on the classic multi-move sampler of Shephard and Pitt (1997) and works satisfactorily in the univariate case, as demonstrated in Section 4.1. The following shows the estimation efficiency and accuracy for the multivariate extension.
Table 11 summarises the correlation between the posterior means of all parameters estimated by the four methods and their DGP values under the different datasets. The mean absolute deviations between the parameter estimates and their DGP values are also reported in the table. Both statistics serve as metrics for accuracy. EIS-PGAS and EIS-PG achieve the highest correlations for the parameter estimates, and under all datasets EIS-PGAS works better than the others, with only two correlation coefficients below 0.9 and none smaller than 0.8. The mean absolute deviations given by these two methods are also the smallest among the four, especially for the d.o.f. parameter ζ, which is the poorest estimated parameter for all methods and datasets. For
example, under the dataset sLE sSK, the mean absolute deviation for ζ given by these two methods is half of that given by MM-MH, and one fifth of that given by BF-PGAS. It is thus evident that the EIS part of the algorithm contributes to the estimation accuracy.
Ancestor sampling also improves accuracy slightly, as EIS-PGAS gives slightly smaller mean absolute deviations than EIS-PG in most cases, except for ρ under aLE sSK. Further evidence that ancestor sampling may help improve accuracy is that the correlation coefficients given by EIS-PGAS seem to fluctuate less across datasets than those given by EIS-PG, and so do the mean absolute deviations. Though Tables 1 and 2 show that for a univariate SV model the ancestor sampling algorithm renders estimates more accurate, further study is needed to pin down its effect on accuracy for the high-dimensional factor SV model. In contrast to its performance in estimating a univariate model, MM-MH does not provide correlation coefficients as high as EIS-PG(AS). The autoregressive coefficient φ, which is relatively easy to estimate, shows a correlation lower than 0.9 under sLE sSK and sLE aSK. The correlation for the unconditional mean µ is also lower than 0.9 under aLE sSK and aLE aSK, while under all datasets EIS-PG(AS) is able to estimate µ with a correlation always higher than 0.9. In terms of ζ, MM-MH is less than satisfactory, with the highest correlation smaller than 0.85 and the lowest not exceeding 0.7. The mean absolute deviations also suggest that MM-MH is outperformed by EIS-PG(AS). For example, under aLE sSK, MM-MH gives a mean absolute deviation for µ larger than 1, whereas it is only 0.28 and 0.29 for EIS-PGAS and EIS-PG respectively.
BF-PGAS is the worst performing estimation method, with correlation coefficients much lower and mean absolute deviations much higher than the other three methods. The mean absolute deviations for φ, ρ and ζ are high compared with the parameter values, indicating that for those parameters this method fails. Though one may argue that the correlation coefficient of 0.91 for φ given by BF-PGAS under nLE nSK does not suggest much difference from EIS-PG(AS), which gives a correlation of 0.97, we note that the mean absolute deviation given by BF-PGAS is 0.1 while the corresponding value given by EIS-PGAS is only 0.01. Considering that the autoregressive coefficient is often around 0.98, as shown in the top left graph of Figure 9, the posterior means resulting from BF-PGAS contain a consistent and large bias. This holds true for other parameters as well. The inaccuracy of BF-PGAS is likely due to the dimensionality involved. The asymptotic results of Snyder et al. (2008) and Bickel et al. (2008) show that the inevitable impoverishment of particle quality and the tendency of the particle system to collapse as the step t moves away from the initialisation t = 0 arise because the number of particles cannot scale exponentially with the dimension of the observations n, and the bootstrap filter suffers
Table 11: Accuracy Comparisons of Different Methods Under Different Datasets

Reported are the correlations between the posterior means from the four estimation methods applied to the different datasets and the true DGP values, with mean absolute deviations given in brackets.
from a sharper collapse rate. In our model, n = 50, and the bootstrap filter would require millions of particles to avoid collapse, which limits its practical use for our model. As a result, resampling has to take place at every t. A direct consequence is that BF-PGAS becomes highly inefficient and inaccurate.
Among the whole set of system parameters, the factor loadings Λ are the best estimated, with the EIS-related methods showing correlations above 0.98. The smallest correlations for Λ given by MM-MH and BF-PGAS are 0.93 and 0.79 respectively. The mean absolute deviations for Λ are also low, except for BF-PGAS under all datasets. This shows the effectiveness of our proposed sampling method for the factor loadings.
Figure 11 shows the posterior probabilities of zero leverage effect and zero skewness, i.e. $p(\rho = 0|y_{1:T})$ and $p(\beta = 0|y_{1:T})$, both of which are $(n + p)$-dimensional vectors, where $\rho = (\rho_{f_j}, \rho_{u_i})$ and $\beta = (\beta_{f_j}, \beta_{u_i})$ for $i = 1, \dots, n$ and $j = 1, \dots, p$. The black dots at the top of each graph indicate a zero DGP value for the corresponding series, and we represent the posterior probability of being zero estimated by the different methods using different symbols. Note that the estimates for ρ and β are obtained with the help of the shrinkage prior introduced in Section 2.3.1, so new draws in the MCMC sampler for all elements of ρ and β have non-zero probability of being exactly zero. This means that the closer a point is to a black dot, the better the respective method is able to detect zero leverage effect or skewness in the DGP. When both zero leverage effect and zero skewness are present in some of the factors and asset-specific processes, EIS-PGAS has the fewest points located in the “ambiguity” area, namely between 0 and 1. One can see that whenever ρ = 0 and β = 0, EIS-PGAS gives a posterior probability of being zero larger than 0.9 for ρ and 0.8 for β. Under sLE sSK, there are three cases of over-shrinkage for ρ and just one for β using EIS-PGAS, while the other methods clearly overestimate the posterior zero probabilities.
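Given the shrinkage prior, these posterior zero probabilities can be estimated directly as the fraction of exact zeros among the stored draws; a minimal sketch, with the array rho_draws as a hypothetical stand-in for the post burn-in MCMC output, is:

import numpy as np

def posterior_zero_prob(draws):
    # Fraction of MCMC draws exactly at zero for each of the n + p series;
    # draws is an (n_iter, n_series) array of rho or beta samples
    return np.mean(draws == 0.0, axis=0)

# Hypothetical example: 20,000 post burn-in draws for 58 series, where the
# first 10 series are shrunk to zero in most iterations
rng = np.random.default_rng(2)
rho_draws = rng.normal(-0.3, 0.1, size=(20000, 58))
rho_draws[:, :10] *= rng.binomial(1, 0.1, size=(20000, 10))
print(posterior_zero_prob(rho_draws)[:12])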
EIS-PG and MM-MH perform similarly in determining zero parameters, and they are not much worse than EIS-PGAS when the DGP values are zero, but these two methods, especially MM-MH, deliver too many points in the ambiguity area. In other words, when the leverage effect and skewness have non-zero DGP values, EIS-PG and MM-MH hesitate more than EIS-PGAS to assign non-zero values to those parameters. This observation under sLE sSK carries over to all datasets except nLE nSK, which highlights the value of ancestor sampling if one aims not only to detect zero parameters but also to estimate non-zero parameters accurately. Under aLE aSK, all methods except BF-PGAS show no over-shrinkage, but EIS-PG and MM-MH have more points in the ambiguity area than EIS-PGAS, particularly for ρ. This shows the effect of the shrinkage prior on leverage and skewness when importance sampling is coupled with ancestor sampling. Moreover, under nLE nSK, or when all elements of ρ and β are equal to zero, EIS-PG(AS) and MM-MH perform equally well, with all posterior probabilities of zero parameters approaching one. BF-PGAS is the worst estimation method of all, suffering from its estimation inaccuracy.
Figure 11: Posterior probability of zero leverage effect and skewness estimated by the different methods under the different datasets (sLE sSK, sLE aSK, aLE sSK, aLE aSK, nLE nSK). Upper row: leverage effect parameter ρ; lower row: skewness parameter β. Black dots at the top of each graph indicate a zero parameter for the corresponding series in the DGP. Coordinates 1 to 8 on all x-axes indicate factors $f_{j,t}$ for $j = 1, \dots, 8$ and the rest correspond to $u_{i,t}$ for $i = 1, \dots, 50$.
To examine the estimation efficiency of the different MCMC algorithms, we calculate the inefficiency factor IE(θ) for a vector of system parameters θ. Because the results are similar across datasets, we only report those under aLE aSK.^{12} Table 12 reports the median of the inefficiency factors obtained from the four methods under aLE aSK, with the 10th and 90th percentiles in brackets. A quick observation is that although the factor SV model is high-dimensional, the fact that it can be decomposed within one run of the MCMC sampler into n + p individual univariate models of the form (1), once the factors $\{f_{j,t}\}_{j=1}^{p}$ are sampled, greatly improves efficiency, which becomes comparable to that of the univariate model. IE(φ) and IE(µ) are the smallest two across the four methods, but those given by EIS-PGAS are less than half of those given by the other three. MM-MH even produces an estimate of µ with a median inefficiency factor of 42.48, six times larger than the 7.48 given by EIS-PGAS. As in the univariate model, ancestor sampling contributes greatly to the efficiency of the MCMC sampler. EIS-PGAS is at

^{12} This particular dataset is chosen because it has non-zero leverage effect and skewness across all factors and asset-specific processes. The other datasets, with either ρ or β or both equal to zero, show larger inefficiency factors, but this is due to many consecutive zeros in the Markov chain.
Table 12: Inefficiency Factor for Parameter Estimates Under aLE aSK
factors, with EIS-PG(AS) and MM-MH giving correlations larger than 0.9 and 0.75 respectively under all datasets. The difference between the correlation for $f_{1,t}$ and for $f_{j,t}$ with $j \neq 1$ is likely due to the identification restriction imposed on the loading matrix Λ. The correlation for the factor estimates given by EIS-PG is on average slightly lower than for EIS-PGAS, with exceptions found in $f_{5,t}$ under sLE sSK and $f_{7,t}$ under nLE nSK. This suggests that ancestor sampling adds a certain degree of precision because of the efficiency gain on top of EIS. In the case of $h_t$ and $l_t$, EIS-PGAS is also the best estimation method. For example, under sLE sSK, both EIS-PG and MM-MH give correlations smaller than 0.7 for $h_{2,t}$. Under sLE aSK and aLE aSK, the gain in precision from ancestor sampling is seen in the correlations from EIS-PGAS being higher than those from EIS-PG by 5% to 10%, which suggests that when skewness is present in all factors and asset-specific processes, ancestor sampling tends to be more effective when used together with the shrinkage prior for ρ and β.
The “shock variables” $W_t$ and $Q_t$ may be of limited use in practice, but they serve as stochastic weights and influence the leverage effect, as we show in Section 2.1, so it is still interesting to see how the four methods perform when estimating the inverse gamma mixture components. BF-PGAS is still the most inaccurate method, and comparing with the estimates for the factors and SV series, one again observes that EIS-PG is eclipsed by EIS-PGAS. Lastly, under nLE nSK, EIS-PG is almost as efficient as MM-MH, but both give correlations lower than EIS-PGAS. This emphasises the effect of leverage and skewness on the chosen MCMC algorithms.
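For reference, the inefficiency factor reported above is typically computed as one plus twice the sum of the sample autocorrelations of a chain up to some lag cutoff; a minimal sketch, with the cutoff max_lag as our own illustrative choice rather than the paper's setting, is:

import numpy as np

def inefficiency_factor(chain, max_lag=100):
    # IE = 1 + 2 * sum of autocorrelations; a value of 1 indicates i.i.d.
    # sampling, while larger values mean fewer effective draws per iteration
    x = chain - chain.mean()
    var = x @ x / len(x)
    acf = np.array([(x[:-k] @ x[k:]) / (len(x) * var) for k in range(1, max_lag + 1)])
    return 1.0 + 2.0 * acf.sum()

# Hypothetical example: a persistent AR(1) chain has IE near (1+0.9)/(1-0.9) = 19
rng = np.random.default_rng(3)
chain = np.zeros(20000)
for t in range(1, len(chain)):
    chain[t] = 0.9 * chain[t - 1] + rng.normal()
print(inefficiency_factor(chain))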
Figure 13: Correlations between the posterior mean estimates of all factors and of selected stochastic volatility series with the corresponding inverse gamma mixing components, obtained from the four estimation methods, and their DGP series under the different datasets. Panels cover $f_{1,t}, \dots, f_{8,t}$; $h_{2,t}$, $h_{4,t}$, $h_{5,t}$, $h_{8,t}$, $l_{6,t}$, $l_{16,t}$, $l_{35,t}$, $l_{45,t}$; and the corresponding $W_{j,t}$ and $Q_{i,t}$.
C.4 Number of factors
Marginal likelihood evaluation is needed to calculate the Bayes factors used to pick the right model, such as determining the right number of factors and choosing the most plausible specifications for the factors and asset-specific processes. We first illustrate the stability and the ability of IS2 to determine the right number of factors, which is the most important model specification choice. Note that there is no need to worry about the error distributions, thanks to the shrinkage technique we apply.
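As a sketch of the IS2 idea, the following computes a log marginal likelihood estimate by combining parameter-level importance sampling with an unbiased particle-filter likelihood estimate; the four callables are hypothetical stand-ins for the components described in Tran et al. (2014) and Scharth and Kohn (2016), not their implementation.

import numpy as np

def is2_log_marginal_likelihood(sample_g, log_g, log_prior, pf_loglik, S=500):
    # sample_g  : draws a parameter vector theta from the importance density g
    # log_g     : log density of g
    # log_prior : log prior density of theta
    # pf_loglik : unbiased particle-filter estimate of log p(y | theta)
    log_w = np.empty(S)
    for s in range(S):
        theta = sample_g()
        log_w[s] = pf_loglik(theta) + log_prior(theta) - log_g(theta)
    m = log_w.max()                    # log-sum-exp for numerical stability
    return m + np.log(np.mean(np.exp(log_w - m)))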
Table 13 shows the EIS-PGAS conditional average log-likelihood, or posterior ordinate, with the system parameters evaluated at their posterior means. We report the evaluation for different numbers of particles used in the modified PEIS method introduced in Section 3.1. Note that with this modification the constructed importance density boils down to the n + p partially independent EIS importance densities used to analyse the univariate SV models, and it manages to approximate the conditional posterior distribution closely and deliver posterior means that are highly correlated with the DGP values. We expect that in our high-dimensional setting not many particles are needed to accurately evaluate the conditional log-likelihood or posterior ordinate; indeed, Scharth and Kohn (2016) found that in the case of a two-component SV model as few as two particles can already compute the likelihood stably and accurately. From Table 13 we see that the log-likelihood estimates for aLE aSK converge using at least 100 particles, and