Modelling Dependent Credit Risks with Extensions of CreditRisk+ and Application to Operational Risk (Lecture Notes)
Prof. Dr. Uwe Schmock
PRisMa Lab, Financial and Actuarial Mathematics (FAM)
Institute for Stochastics and Mathematical Methods in Economics
Vienna University of Technology
Wiedner Hauptstraße 8–10/E105-1, A-1040 Vienna, Austria
2004–2020, [email protected]
Version of Notes: March 25, 2020, 2pm (Incomplete revision)
Updates on fam.tuwien.ac.at/~schmock/notes/ExtensionsCreditRiskPlus.pdf
Show by direct calculation for the multivariate beta function⁴ that

B(α_1, …, α_d) := ∫_{∆_{d−1}} ( ∏_{i=1}^{d−1} x_i^{α_i−1} ) (1 − x_1 − ⋯ − x_{d−1})^{α_d−1} d(x_1, …, x_{d−1})
                = ( ∏_{i=1}^{d} Γ(α_i) ) / Γ(α_1 + ⋯ + α_d),   α_1, …, α_d > 0,   (2.29)

which in the case d = 2 simplifies to

B(α, β) := ∫_0^1 x^{α−1} (1 − x)^{β−1} dx = Γ(α) Γ(β) / Γ(α + β),   α, β > 0.   (2.30)

Using a particular choice of α_1, …, α_d, conclude that the (d − 1)-dimensional volume of ∆_{d−1} is 1/(d − 1)!.
Hint: Write down ∏_{i=1}^{d} Γ(α_i) and use a d-dimensional integral substitution with (x_1, …, x_{d−1}, 1 − x_1 − ⋯ − x_{d−1}) z, where (x_1, …, x_{d−1}) ∈ ∆_{d−1} and z ∈ [0, ∞).
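As a quick numerical sanity check of the one-dimensional identity (2.30), one can compare a simple quadrature of the integral with the gamma-function expression. A minimal Python sketch; the midpoint rule and the parameters α = 2.5, β = 3 are purely illustrative:

```python
import math

def beta_integral(alpha, beta, steps=200000):
    # Midpoint-rule approximation of the integral in (2.30).
    h = 1.0 / steps
    return h * sum(((k + 0.5) * h) ** (alpha - 1) * (1 - (k + 0.5) * h) ** (beta - 1)
                   for k in range(steps))

def beta_gamma(alpha, beta):
    # Right-hand side of (2.30): Gamma(alpha) Gamma(beta) / Gamma(alpha + beta).
    return math.gamma(alpha) * math.gamma(beta) / math.gamma(alpha + beta)

lhs = beta_integral(2.5, 3.0)
rhs = beta_gamma(2.5, 3.0)
```

For these smooth parameters the two values agree to many decimal places; `math.gamma` evaluates the gamma function for real arguments.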
Definition 2.5 (Beta distribution). A density of the beta distribution with real shape parameters α, β > 0 is given by

f_{α,β}(p) = { p^{α−1} (1 − p)^{β−1} / B(α, β)   for p ∈ (0, 1),
             { 0                                  for p ∈ R \ (0, 1),   (2.31)

where B(α, β) denotes the beta function, see (2.30). For a random variable P with a beta distribution, we use the notation P ∼ Beta(α, β).
³The gamma function is actually a meromorphic function on the complex plane C with poles at 0 and the negative integers, but this will not be used in the following.
⁴The proof of Lemma 4.22 below contains a probabilistic argument for the case d = 2.
For S ⊂ Z the Wasserstein distance dW(µ, ν) between the probability measures
µ and ν takes into account not only the amounts by which their individual
probabilities differ, as in the total variation distance dTV(µ, ν), but also how far
apart the differences occur, which explains the inequality in part (c) above.
Proof of Lemma 3.18. (a), (b) Let e_n := µ({n}) − ν({n}) denote the error for n ∈ S. Then, for every A ⊂ S,

(1/2) ∑_{n∈S} |e_n| ≥ (1/2) ∑_{n∈A} e_n − (1/2) ∑_{n∈S\A} e_n = ∑_{n∈A} e_n − (1/2) ∑_{n∈S} e_n = µ(A) − ν(A),

because ∑_{n∈S} e_n = 0, where the inequality is an equality if and only if |e_n| = e_n for every n ∈ A and |e_n| = −e_n for every n ∈ S \ A.
(c) Due to (3.18), the Wasserstein distance d_W(µ, ν) is well-defined. Given a set A ⊂ S, the indicator function 1_A : S → R is Lipschitz continuous on S ⊂ Z with constant at most 1, hence (c) follows from the Definitions 3.7 and 3.14.
Exercise 3.19 (Representation of the total variation distance with densities). Let (S, S) be a measurable space and consider µ, ν ∈ M_1(S, S). Let λ be a non-negative σ-finite measure on (S, S) such that µ ≪ λ and ν ≪ λ (such a measure always exists, take λ = µ + ν, for example). By the Radon–Nikodym theorem there exist corresponding probability densities f = dµ/dλ and g = dν/dλ.
(a) Generalize Lemma 3.18(a) by proving that d_TV(µ, ν) = µ(A) − ν(A) for a set A ∈ S if and only if there exists a set N ∈ S with λ(N) = 0 such that A \ N ⊂ {x ∈ S | f(x) ≥ g(x)} and A^c \ N ⊂ {x ∈ S | f(x) ≤ g(x)}.
(b) Generalize Lemma 3.18(b) by proving that d_TV(µ, ν) = (1/2) ‖f − g‖_{L¹(λ)}.
(c) Derive from part (b) that d_TV(µ, ν) = 1 − ‖min{f, g}‖_{L¹(λ)} and compare with Remark 3.8.
Exercise 3.20 (Total variation norm). Let (S, S) be a measurable space and consider the set M(S, S) of all R-valued (or C-valued) measures on (S, S). Let D be a measure-determining subset of S, meaning that µ(A) = 0 for all A ∈ D is only possible if µ ∈ M(S, S) is the zero measure, i.e. µ(A) = 0 for all A ∈ S. Prove:
(a) ‖µ‖_D := sup_{A∈D} |µ(A)| for µ ∈ M(S, S) defines a norm.
For D = S this is the total variation norm ‖·‖_TV. In particular, (M(S, S), ‖·‖_D)
Figure 3.1: The factor [0, ∞) ∋ λ ↦ (1 − e^{−λ})/λ in (3.19) and its upper bound λ ↦ min{1, 1/λ} from (3.21). The upper line is the factor from (3.20) with a kink at λ ≈ 1.144.
3.4 Poisson Approximation
In this section we show that the distribution of a sum of independent Bernoulli
random variables can be approximated by a Poisson distribution. The quality of
the approximation is measured by the total variation metric dTV of probability
distributions as well as the Wasserstein metric dW, see Definitions 3.7 and 3.14,
respectively.
Theorem 3.23. Let X_1, …, X_m be independent Bernoulli random variables. Then W := X_1 + ⋯ + X_m is the random variable counting the number of ones. Define p_i = P[X_i = 1] and λ = E[W] = p_1 + ⋯ + p_m. Then

d_TV( Poisson(λ), L(W) ) ≤ ( (1 − e^{−λ})/λ ) ∑_{i=1}^{m} p_i²,   (3.19)

cf. Barbour and Hall [4], with the understanding that the fraction on the right-hand side is one for λ = 0 (apply L'Hôpital's rule for λ ↓ 0). In addition,

d_W( Poisson(λ), L(W) ) ≤ min{ 1, (4/3) √(2/(eλ)) } ∑_{i=1}^{m} p_i².   (3.20)
Remark 3.24. Since e^{−λ} > 0 and 1 − e^{−λ} ≤ λ, we have the upper bound
variables with N_g ∼ Poisson(λ_g) for every g ∈ G. Then the distribution of the N_0^m-valued random vector

N = ∑_{g∈G} c_g N_g,   (3.47)

where the vector¹⁵ c_g = (c_{g,1}, …, c_{g,m})^⊤ ∈ {0, 1}^m is given by

c_{g,i} = 1_g(i) = { 1 if i ∈ g,
                   { 0 if i ∉ g,   (3.48)

is called the m-variate Poisson distribution MPoisson(G, λ, m) on N_0^m.
In the credit risk interpretation, the obligors in the group g ⊂ {1, …, m} default together with Poisson intensity λ_g, independently of the other groups in G. An empty group of obligors cannot cause any default; for this reason we excluded ∅ from G. For practical applications we should assume that {1, …, m} ⊂ ⋃_{g∈G} g, because otherwise there would exist obligors who can never default. If G = ∅, then (3.47) is an empty sum and MPoisson(G, λ, m) is interpreted as the degenerate distribution concentrated at the origin 0 ∈ N_0^m. If m = 1 and G = {g} with g = {1}, then MPoisson(G, λ, m) coincides with Poisson(λ_g). It might be tempting to choose G = P({1, …, m}) \ {∅} for greatest generality, but then there are 2^m − 1 Poisson parameters (λ_g)_{g∈G}, which already for m = 1000 obligors are far too many to yield a practically useful model.
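Definition 3.40 translates directly into a sampling scheme: draw the independent group counts N_g and add them into the obligor vector via (3.47). A small Python sketch with an illustrative group structure and intensities; the empirical mean of N_1 should be close to λ_{{1}} + λ_{{1,2}}, and the empirical covariance of N_1 and N_2 close to λ_{{1,2}}:

```python
import math
import random

def sample_mpoisson(groups, lams, m, rng):
    """One draw of N = sum_g c_g N_g with independent N_g ~ Poisson(lambda_g), cf. (3.47)."""
    def poisson(lam):
        # Product-of-uniforms (Knuth-style) sampler, adequate for small lam.
        limit, k, prod = math.exp(-lam), 0, rng.random()
        while prod > limit:
            k += 1
            prod *= rng.random()
        return k
    n = [0] * m
    for g, lam in zip(groups, lams):
        ng = poisson(lam)
        for i in g:          # c_{g,i} = 1 exactly for the obligors i in g, cf. (3.48)
            n[i] += ng
    return n

rng = random.Random(42)
groups = [(0,), (1,), (0, 1)]     # G = {{1}, {2}, {1, 2}} with 0-based obligor labels
lams = [0.5, 0.3, 0.2]            # hypothetical intensities
draws = [sample_mpoisson(groups, lams, 2, rng) for _ in range(50000)]
mean0 = sum(n[0] for n in draws) / len(draws)
mean1 = sum(n[1] for n in draws) / len(draws)
cov01 = sum(n[0] * n[1] for n in draws) / len(draws) - mean0 * mean1
```

With the parameters above, mean0 estimates 0.5 + 0.2 = 0.7 and cov01 estimates 0.2, in line with the moment formulas of Lemma 3.44 below.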
The next result is the multivariate generalization of Lemma 3.2.

Lemma 3.41 (Summation property of the multivariate Poisson distribution). If N_1, …, N_k are independent with N_i ∼ MPoisson(G_i, λ^{(i)}, m) for all i ∈ {1, …, k} with λ^{(i)} = (λ_g^{(i)})_{g∈G_i} according to Definition 3.40, then

N := ∑_{i=1}^{k} N_i ∼ MPoisson(G, λ, m),

where G = ⋃_{i=1}^{k} G_i and λ = (λ_g)_{g∈G} is given by

λ_g = ∑_{i∈{1,…,k}: G_i ∋ g} λ_g^{(i)},   g ∈ G.

Exercise 3.42. Use Lemma 3.2 and Definition 3.40 to prove Lemma 3.41.
Remark 3.43 (Infinite divisibility of the multivariate Poisson distribution). Lemma 3.41 implies that the multivariate Poisson distribution MPoisson(G, λ, m) with λ = (λ_g)_{g∈G} is infinitely divisible, because for every k ∈ N the distribution of N_1 + ⋯ + N_k is MPoisson(G, λ, m), when N_1, …, N_k are independent with N_i ∼ MPoisson(G, λ^{(k)}, m) for every i ∈ {1, …, k}, where λ^{(k)} = (λ_g/k)_{g∈G}.

¹⁵The vector c_g points to a corner of the m-dimensional hypercube.
Lemma 3.44 (Moments of the multivariate Poisson distribution). Assume that N = (N_1, …, N_m)^⊤ ∼ MPoisson(G, λ, m). Then, with the notation from Definition 3.40, for all i, j ∈ {1, …, m},

E[N_i] = ∑_{g∈G: i∈g} λ_g   (3.49)

and for the components of the covariance matrix of N,

Cov(N_i, N_j) = ∑_{g∈G: i,j∈g} λ_g.   (3.50)

Proof. Equation (3.49) follows from (3.47), (3.48) and (3.3). Similarly, using the bilinearity of the covariance and the independence of (N_g)_{g∈G},

Cov(N_i, N_j) = ∑_{g∈G: i∈g} ∑_{g′∈G: j∈g′} Cov(N_g, N_{g′}) = ∑_{g∈G: i,j∈g} Var(N_g),

because Cov(N_g, N_{g′}) = 0 if g ≠ g′. Using (3.4), the result (3.50) follows.
Remark 3.45. Note that by (3.50) the components of a multivariate Poisson
distribution can only have a non-negative covariance.
Lemma 3.46 (Multivariate Poisson distribution with independent components). Assume that N = (N_1, …, N_m) ∼ MPoisson(G, λ, m) and m ≥ 2. Then, with the notation from Definition 3.40, the following properties are equivalent:
(a) The components N_1, …, N_m are independent.
(b) Cov(N_i, N_j) = 0 for all i, j ∈ {1, …, m} with i ≠ j.
(c) λ_g = 0 for all g ∈ G with |g| ≥ 2.

Proof. Note that (a) implies (b), which in turn implies (c) via (3.50). If (c) holds, then N_g a.s.= 0 for all g ∈ G with |g| ≥ 2, hence

(N_1, …, N_m)^⊤ a.s.= ∑_{i∈{1,…,m}: {i}∈G} c_{{i}} N_{{i}}

by (3.47), hence N_i a.s.= N_{{i}} if {i} ∈ G and N_i a.s.= 0 otherwise. Since (N_g)_{g∈G, |g|=1} are independent by Definition 3.40, part (a) follows.
3.6 General Multivariate Poisson Mixture Model
Following the mixture approach outlined in Section 2.2 for Bernoulli default
indicators, this section generalizes the multivariate Poisson distribution discussed
in the previous section by introducing random Poisson intensities (Λg)g∈G for all
the groups of obligors defaulting together.
Formally, (Λ_g)_{g∈G} is a collection of [0, ∞)-valued random variables, which may even be dependent. Similar assumptions as in Section 2.2.1 are made for the intensities, namely

P[N_g = n_g | (Λ_h)_{h∈G}] a.s.= P[N_g = n_g | Λ_g] a.s.= e^{−Λ_g} Λ_g^{n_g} / n_g!   (3.51)

for every g ∈ G and n_g ∈ N_0, cf. (2.10), and the conditional independence of (N_g)_{g∈G} given (Λ_g)_{g∈G}, i.e., for all n_g ∈ N_0 for g ∈ G,

P[N_g = n_g for all g ∈ G | (Λ_h)_{h∈G}] a.s.= ∏_{g∈G} P[N_g = n_g | (Λ_h)_{h∈G}] a.s.= ∏_{g∈G} (Λ_g^{n_g} / n_g!) e^{−Λ_g}   by (3.51),   (3.52)

cf. (2.11). The unconditional joint distribution of (N_g)_{g∈G} can be obtained by integrating over the random intensities, i.e.

P[N_g = n_g for all g ∈ G] = E[ ∏_{g∈G} (Λ_g^{n_g} / n_g!) e^{−Λ_g} ].   (3.53)
Exercise 3.47 (Explicit construction of the general multivariate Poisson mixture model). Consider a [0, ∞)^G-valued random vector Λ′ = (Λ′_g)_{g∈G} on a probability space (Ω′, A′, P′). Define Ω = Ω′ × N_0^G and A = A′ ⊗ P(N_0^G).
(a) Show that K : [0, ∞)^G × P(N_0^G) → [0, 1] with

K(λ, B) := ∑_{(n_g)_{g∈G} ∈ B} ∏_{g∈G} (λ_g^{n_g} / n_g!) e^{−λ_g}   (3.54)

for all λ = (λ_g)_{g∈G} ∈ [0, ∞)^G and B ⊂ N_0^G is a well-defined stochastic transition kernel. Hint: (3.54) can be expressed as K(λ, ·) = ⊗_{g∈G} Poisson(λ_g).
(b) Show that a well-defined probability measure P on the product space (Ω, A) is uniquely determined by

P[A × B] = E_{P′}[1_A K(Λ′, B)],   A ∈ A′, B ⊂ N_0^G.   (3.55)

Hint: Consider P′ ⊗ ν on (Ω, A), where ν is the counting measure on N_0^G, and consider the product in (3.54) as probability density. Alternatively, apply [32, Corollary 14.23].
(c) For every g ∈ G define Λ_g(ω) = Λ′_g(ω′) and N_g(ω) = n_g for all ω = (ω′, (n_h)_{h∈G}) ∈ Ω. Prove that (3.51) and (3.52) are satisfied. Hint: Use (3.55) and the hint for (a).
3.6.1 Expected Values, Variances, and Individual Covariances
Again, the expected number of defaults can be deduced from the properties of the underlying random intensities (Λ_g)_{g∈G}. From (3.3), (3.4) and (3.51) we obtain that E[N_g | Λ_g] a.s.= Λ_g and Var(N_g | Λ_g) a.s.= Λ_g for every g ∈ G. For the numbers N_1, …, N_m of default events of the individual obligors 1, …, m, we have the representation

(N_1, …, N_m)^⊤ = ∑_{g∈G} c_g N_g   (3.56)

from (3.47), hence

(E[N_1], …, E[N_m])^⊤ = ∑_{g∈G} c_g E[ E[N_g | Λ_g] ] = ∑_{g∈G} c_g E[Λ_g],

or, written out componentwise,

E[N_i] = ∑_{g∈G: i∈g} E[Λ_g],   i ∈ {1, …, m}.   (3.57)

Note that the sum of all ones in the vector c_g gives the number |g| of obligors defaulting together when the group g defaults. Hence, using (3.56),

N := N_1 + ⋯ + N_m = ∑_{i=1}^{m} ∑_{g∈G} c_{g,i} N_g = ∑_{g∈G} |g| N_g   (3.58)

is the random variable representing the overall number of default events in the credit portfolio. Similarly, using (3.57),

E[N] = ∑_{i=1}^{m} E[N_i] = ∑_{g∈G} |g| E[Λ_g].
To calculate the variances and covariances of N1, . . . , Nm, we start with a
general formula, which is helpful in particular for mixture models. We will
apply (3.60) with X = Ng and the sub-σ-algebra B = σ(Λg) containing all the
information about Λg.
Lemma 3.48. Let X and Y be square-integrable R^c- and R^d-valued random vectors, respectively, on a probability space (Ω, A, P) and B ⊂ A a sub-σ-algebra. Then the covariance matrix of size c × d satisfies

Cov(X, Y) = E[ Cov(X, Y | B) ] + Cov( E[X | B], E[Y | B] ),   (3.59)

where expectations are taken componentwise. If c = d = 1 and X = Y, then (3.59) reduces to

Var(X) = E[ Var(X | B) ] + Var( E[X | B] ).   (3.60)

Proof. The formula for the variance follows from the one for the covariance matrix. It therefore suffices to prove (3.59). We view X and Y as column vectors. Using the definition of the covariance matrix, adding and subtracting conditional expectations, we get that

Cov(X, Y) = E[ (X − E[X]) (Y − E[Y])^⊤ ]
          = E[ ( (X − E[X | B]) + (E[X | B] − E[X]) ) ( (Y − E[Y | B]) + (E[Y | B] − E[Y]) )^⊤ ].

Expanding the product, inserting conditional expectations given B in the first three terms and using properties of conditional expectation,
Example 4.14 (Binomial distribution). Let the random variable N ∼ Bin(m, p) describe the number of successes in m ∈ N independent Bernoulli trials with success probability p ∈ [0, 1], meaning that N = B_1 + ⋯ + B_m with independent Bernoulli random variables B_1, …, B_m. By (4.2), for every i ∈ {1, …, m},

ϕ_{B_i}(s) = 1 + p(s − 1),   s ∈ C,

hence the multiplication theorem of probability-generating functions, cf. (4.27), implies that

ϕ_N(s) = ∏_{i=1}^{m} ϕ_{B_i}(s) = (1 + p(s − 1))^m,   s ∈ C.   (4.29)
Example 4.15 (Multivariate Poisson distribution). Assume that N has the multi-
variate Poisson distribution MPoisson(G, (λg)g∈G,m) as in Definition 3.40. By
the representation (3.47), using multi-index notation, the probability-generating
function is given by
ϕN (s) = E[sN]
= E[ ∏g∈G
(scg)Ng], s ∈ Cm,
where scg =∏i∈g si by (3.48). Using the independence of (Ng)g∈G and the
multiplication theorem (4.27) of probability-generating functions,
ϕN (s) =∏g∈G
E[(scg)Ng
], s ∈ Cm.
Finally, using the probability-generating function of Poisson(λg) for every g ∈ G,
see Example 4.3,
ϕN (s) =∏g∈G
exp(λg(scg − 1
))= exp
(∑g∈G
λg(scg − 1
)), s ∈ Cm. (4.30)
Example 4.16 (Multinomial distribution). Given a dimension d ∈ N, let B_1, …, B_m be m ∈ N independent d-dimensional random vectors, each one having a multivariate Bernoulli distribution with probability vector p = (p_1, …, p_d) ∈ [0, 1]^d satisfying p_1 + ⋯ + p_d = 1, see Example 4.5, i.e. B_j ∼ Multinomial(1, p) for each j ∈ {1, …, m}. We can interpret B_j as describing the result of the jth trial, which can have d different outcomes. Then the ith component N_i of N := B_1 + ⋯ + B_m describes the number of outcomes of type i in a sequence of m independent trials, for every i ∈ {1, …, d}. By definition, N has a multinomial distribution, which we denote by Multinomial(m, p_1, …, p_d) or Multinomial(m, p) for short. By (4.7), the probability-generating function of B_j is given by
Remark 4.20 (Summation property of the binomial distribution). Using Lemma
4.18 for d = 2 and looking at the one-dimensional marginal distribution (cf.
Exercise 4.17(b)), we obtain the summation property of the binomial distribution.
Of course, this also follows directly using (4.29).
Remark 4.21. The following observation uses generating functions to make the Poisson approximation of Theorem 3.23 plausible. Let ϕ_{B_i} denote the probability-generating function of the Bernoulli random variable B_i of obligor i ∈ {1, …, m}, indicating a default with probability p_i. As in (4.2),

ϕ_{B_i}(s) = 1 + p_i(s − 1),   s ∈ C.

We denote the number of defaults in the whole portfolio by W = B_1 + ⋯ + B_m and the corresponding generating function by ϕ_W. If we assume the defaults of the obligors to be independent, then ϕ_W(s) = ∏_{i=1}^{m} ϕ_{B_i}(s). Using the linear approximation 1 + x ≈ e^x for |x| small, we get

ϕ_W(s) = ∏_{i=1}^{m} (1 + p_i(s − 1)) ≈ ∏_{i=1}^{m} e^{p_i(s−1)} = e^{λ(s−1)},   s ∈ C,

with λ := p_1 + ⋯ + p_m, which according to (4.3) is the probability-generating function of N ∼ Poisson(λ).
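The closeness of the two generating functions for small default probabilities is easy to see numerically. A quick Python sketch; the probabilities p_i and the evaluation point s are illustrative:

```python
import math

ps = [0.01, 0.02, 0.005, 0.015]        # hypothetical small default probabilities
lam = sum(ps)
s = 0.3                                 # any evaluation point in [0, 1]
phi_w = 1.0
for p in ps:
    phi_w *= 1 + p * (s - 1)            # product of the Bernoulli pgfs
phi_poisson = math.exp(lam * (s - 1))   # Poisson(lambda) pgf, cf. (4.3)
gap = abs(phi_w - phi_poisson)          # of the order (1/2) sum_i p_i^2 (s-1)^2
```

Since 1 + x < e^x for x ≠ 0, the product is strictly below the Poisson generating function, and the gap is of second order in the p_i.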
4.2 Application to the General Poisson Mixture Model

After this excursion, the next step is to represent the distribution of the number of defaults N = N_1 + ⋯ + N_m in terms of a generating function. At least for all s ∈ C with |s| ≤ 1,

ϕ_N(s) = E[ s^{N_1+⋯+N_m} ] = E[ E[ s^{N_1+⋯+N_m} | Λ_1, …, Λ_m ] ]
        = E[ ∏_{i=1}^{m} E[ s^{N_i} | Λ_i ] ] = E[ e^{(Λ_1+⋯+Λ_m)(s−1)} ],   (4.34)

where we used the conditional independence from (3.51), E[s^{N_i} | Λ_i] a.s.= e^{Λ_i(s−1)} and the generating function from (4.3). If Λ_1, …, Λ_m are independent, then

ϕ_N(s) = ∏_{i=1}^{m} E[ e^{Λ_i(s−1)} ].   (4.35)
4.3 Properties of the Gamma Distribution

Until now, no assumption was made about the distribution of any Λ_i. In this section we will consider only one factor Λ. A standard choice, well accepted for its mathematical convenience, is the gamma distribution. Therefore, suppose Λ to be gamma-distributed (notation Λ ∼ Γ(α, β)) with shape parameter α > 0 and inverse scale (or rate) parameter β > 0, i.e., Λ has a density

f(λ) = { (β^α / Γ(α)) λ^{α−1} e^{−βλ}   for λ > 0,
       { 0                               for λ ≤ 0,   (4.36)

where Γ denotes the gamma function given in (2.27). The integral substitution x = βλ shows that f is indeed a probability density.
Note that Γ(1, β) is the exponential distribution with rate parameter β > 0, whereas Γ(n, β) with general n ∈ N is called Erlang distribution. Furthermore, Γ(n/2, 1/2) is called χ²-distribution with n ∈ N degrees of freedom.
The next lemma shows that for every inverse scale parameter β > 0 the gamma distributions (Γ(α, β))_{α>0} form a semigroup under convolution. It also shows that the gamma distribution is infinitely divisible.
Lemma 4.22 (Summation property of the gamma distribution). Let k ∈ N and α_1, …, α_k, β > 0. If Λ_1, …, Λ_k are independent random variables with Λ_i ∼ Γ(α_i, β) for every i ∈ {1, …, k}, then

∑_{i=1}^{k} Λ_i ∼ Γ(α_1 + ⋯ + α_k, β).

Proof. The lemma follows by induction as soon as it is proved for k = 2. Let f_1 and f_2 be densities according to (4.36) for Λ_1 ∼ Γ(α_1, β) and Λ_2 ∼ Γ(α_2, β), respectively. Due to independence of Λ_1 and Λ_2, a density f for Λ := Λ_1 + Λ_2 is given by the convolution, i.e., for all λ > 0,

f(λ) = ∫_0^λ f_1(µ) f_2(λ − µ) dµ
     = ∫_0^λ (β^{α_1} / Γ(α_1)) µ^{α_1−1} e^{−βµ} · (β^{α_2} / Γ(α_2)) (λ − µ)^{α_2−1} e^{−β(λ−µ)} dµ.

Rearranging, defining α = α_1 + α_2, and using the substitution µ = λx yields

f(λ) = (β^α / Γ(α)) λ^{α−1} e^{−βλ} · ( Γ(α) / (Γ(α_1) Γ(α_2)) ) ∫_0^1 x^{α_1−1} (1 − x)^{α_2−1} dx,   λ > 0,

where the first factor is the Γ(α, β)-density and the remaining constant needs to equal 1, because both sides are probability distributions. As a side effect, this calculation evaluates the beta function
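Lemma 4.22 is easy to probe by simulation: a sum of independent Γ(α_i, β) draws should show the mean α/β and variance α/β² of Γ(α, β) with α = α_1 + ⋯ + α_k. A Python sketch with illustrative parameters (recall that `random.gammavariate` takes the scale 1/β as its second argument):

```python
import random

rng = random.Random(1)
alphas, beta = [0.5, 1.5, 2.0], 2.0          # hypothetical shapes, common rate beta
alpha = sum(alphas)                           # total shape 4.0
n = 50000
sums = [sum(rng.gammavariate(a, 1.0 / beta) for a in alphas) for _ in range(n)]
mean = sum(sums) / n
var = sum((x - mean) ** 2 for x in sums) / n
# Gamma(4, 2) has mean 4/2 = 2 and variance 4/4 = 1
```

A matching mean and variance is of course only a consistency check, not a proof that the sum is gamma-distributed; the proof above identifies the full density.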
NegBin(α, 0) with α ≥ 0 as the degenerate distribution concentrated at 0. Note that the right-hand sides of (4.47), (4.48), (4.49) and (4.50) below, hence also (4.51) and (4.52), are correct for these cases.
If α ∈ N, then (4.46) gives the probability of exactly n ∈ N_0 successes before the αth failure in a sequence of independent Bernoulli experiments with success probability p. For α = 1, the negative binomial distribution (4.46) reduces to the geometric distribution with parameter p ∈ [0, 1).
Let us calculate the expectation, the variance and the probability-generating function of N. Since L(N | Λ) a.s.= Poisson(Λ), we have E[N | Λ] a.s.= Λ by (3.3), hence

E[N] = E[ E[N | Λ] ] = E[Λ] = α/β = αp/(1 − p)   (4.47)

by (4.39) and the substitution β = (1 − p)/p arising from (4.45). Using Lemma 3.48 as well as (4.39) for the mean and (4.41) for the variance of Λ, together with Var(N | Λ) a.s.= Λ by (3.4), we obtain

Var(N) = E[ Var(N | Λ) ] + Var( E[N | Λ] ) = E[Λ] + Var(Λ)
       = α/β + α/β² = α (β + 1)/β² = αp/(1 − p)²,   (4.48)

where we used (4.45) and β = (1 − p)/p for the last equation. It remains to calculate the corresponding probability-generating function. Using (4.46) and expanding the fraction by (1 − ps)^α, it follows that

ϕ_N(s) = E[ s^N ] = ∑_{n=0}^{∞} s^n P[N = n]
       = ( q^α/(1 − ps)^α ) ∑_{n=0}^{∞} \binom{α+n−1}{n} (1 − ps)^α (ps)^n = ( q/(1 − ps) )^α   (4.49)

for all real s ≥ 0 with ps < 1, hence for all s ∈ C with p|s| < 1, because the sum is the total mass of the NegBin(α, ps) distribution. Alternatively, using L(N | Λ) a.s.= Poisson(Λ) and the generating function (4.3) of the Poisson distribution,

ϕ_{N|Λ}(s) := E[ s^N | Λ ] a.s.= e^{Λ(s−1)},   s ∈ C,

as well as the exponential moments (4.42) of Λ ∼ Γ(α, β),

ϕ_N(s) = E[ E[s^N | Λ] ] = E[ e^{Λ(s−1)} ] = ( 1 − (s − 1)/β )^{−α} = ( β/(1 + β − s) )^α = ( q/(1 − ps) )^α   (4.50)
for all s ∈ C with p|s| < 1. Since

ϕ_N^{(n)}(s) = ( p^n q^α / (1 − ps)^{α+n} ) ∏_{l=0}^{n−1} (α + l),   n ∈ N,   (4.51)

it follows via (4.21) for the factorial moments of the negative binomial distribution that

E[ ∏_{l=0}^{n−1} (N − l) ] = (p/q)^n ∏_{l=0}^{n−1} (α + l),   n ∈ N.   (4.52)
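The factorial moments (4.52) can be cross-checked directly against the probability masses of the negative binomial distribution, computing the binomial coefficient \binom{α+n−1}{n} = ∏_{l=0}^{n−1}(α+l)/n! for non-integer α. A Python sketch with illustrative parameters, using a truncated series for the second factorial moment:

```python
def negbin_pmf(n, alpha, p):
    # P[N = n] = binom(alpha+n-1, n) p^n (1-p)^alpha, cf. (4.46)
    coeff = 1.0
    for l in range(n):
        coeff *= (alpha + l) / (l + 1)
    return coeff * p ** n * (1 - p) ** alpha

alpha, p = 2.5, 0.3
q = 1 - p
# E[N(N-1)] via a truncated series, against (p/q)^2 alpha (alpha+1) from (4.52)
lhs = sum(n * (n - 1) * negbin_pmf(n, alpha, p) for n in range(400))
rhs = (p / q) ** 2 * alpha * (alpha + 1)
```

The truncation at n = 400 is harmless here because the masses decay geometrically in p.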
Here is the analogue of the Poisson and gamma summation properties given in Lemma 3.2 and Lemma 4.22, respectively, transferred to independent random variables with a negative binomial distribution (see Lemma 4.40 below for a multi-dimensional generalization):

Lemma 4.24 (Summation property of the negative binomial distribution). Let k ∈ N and α_1, …, α_k ≥ 0 as well as p ∈ [0, 1). If N_1, …, N_k are independent with N_i ∼ NegBin(α_i, p) for every i ∈ {1, …, k}, then

N := ∑_{i=1}^{k} N_i ∼ NegBin(α_1 + ⋯ + α_k, p).   (4.53)

Proof. By independence, cf. (4.27), and the generating function from (4.50),

ϕ_N(s) = ∏_{i=1}^{k} ϕ_{N_i}(s) = ∏_{i=1}^{k} ( q/(1 − ps) )^{α_i} = ( q/(1 − ps) )^{α_1+⋯+α_k}   (4.54)

for all s ∈ C satisfying p|s| < 1. Therefore, N ∼ NegBin(α, p) with α = α_1 + ⋯ + α_k, because the probability-generating function uniquely determines the distribution, cf. (4.14).

Exercise 4.25. Give a more probabilistic proof of Lemma 4.24 by considering the negative binomial distribution as a gamma-mixed Poisson distribution and using Lemma 3.2 and Lemma 4.22.
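The viewpoint of Exercise 4.25 — the negative binomial as a gamma-mixed Poisson — can be sanity-checked by simulation: with Λ ∼ Γ(α, β) and N | Λ ∼ Poisson(Λ), the fraction of zero counts should be close to P[N = 0] = q^α with q = β/(1 + β), cf. (4.45) and (4.46). A Python sketch with illustrative parameters; only the event {N = 0} is needed, so a Bernoulli draw with conditional probability e^{−Λ} suffices:

```python
import math
import random

rng = random.Random(3)
alpha, beta = 2.0, 3.0
trials = 50000
zeros = 0
for _ in range(trials):
    lam = rng.gammavariate(alpha, 1.0 / beta)   # Lambda ~ Gamma(alpha, beta)
    # N | Lambda ~ Poisson(Lambda), hence P[N = 0 | Lambda] = exp(-Lambda)
    if rng.random() < math.exp(-lam):
        zeros += 1
frac = zeros / trials
target = (beta / (1.0 + beta)) ** alpha          # q^alpha = P[N = 0]
```

With α = 2 and β = 3 the target probability is (3/4)² = 0.5625.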
4.5 Generating Function of Compound Distributions

Assume that N is N_0-valued and that (X_n)_{n∈N} is a sequence of N_0^d-valued, independent, identically distributed random variables, which is independent of N. To characterize the distribution of the N_0^d-valued random sum

S := ∑_{n=1}^{N} X_n,   (4.55)

we compute its generating function ϕ_S. Using the multi-index notation as in Example 4.1, the dominated convergence theorem, the independence of the sum X_1 + ⋯ + X_n from the event {N = n} as well as the i.i.d. assumption for the sequence (X_n)_{n∈N},

ϕ_S(s) = E[ s^{X_1+⋯+X_N} ] = ∑_{n=0}^{∞} E[ s^{X_1+⋯+X_n} 1_{{N=n}} ]
       = ∑_{n=0}^{∞} E[ s^{X_1+⋯+X_n} ] P[N = n] = ∑_{n=0}^{∞} (ϕ_{X_1}(s))^n P[N = n] = ϕ_N(ϕ_{X_1}(s)),   (4.56)

where this calculation is valid for all s ∈ C^d such that the power series defining ϕ_{X_1}(s) is absolutely convergent and such that the power series defining ϕ_N converges at |ϕ_{X_1}(s)|. This is the case at least for all s ∈ C^d with ‖s‖_∞ ≤ 1; note that |ϕ_{X_1}(s)| ≤ 1 for these s.
Example 4.26 (Pairwise independence is not enough for (4.56)). We emphasize that the i.i.d. sequence (X_n)_{n∈N} should be independent of N; the independence of X_n and N for every n ∈ N, that means pairwise independence, is not enough for (4.56). For a counterexample, consider an i.i.d. sequence (X_n)_{n∈N} with X_1 ∼ Bin(1, 1/2), hence ϕ_{X_1}(s) = (1 + s)/2 for s ∈ C by (4.2). Define N = 2 − ((X_1 + X_2) mod 2). Then P[N = 1] = P[N = 2] = 1/2 and

P[N = 1, X_i = j] = P[X_i = j, X_{3−i} = 1 − j] = 1/4

as well as

P[N = 2, X_i = j] = P[X_1 = j, X_2 = j] = 1/4

for all i ∈ {1, 2} and j ∈ {0, 1}, hence N and X_i are independent for every i ∈ {1, 2}. Note that ϕ_N(s) = s/2 + s²/2 and

ϕ_N(ϕ_{X_1}(s)) = (1 + s)/4 + (1 + s)²/8 = 3/8 + s/2 + s²/8,   s ∈ C.   (4.57)

However, for the compound sum S given by (4.55), we have that {S = 0} = {X_1 = 0}, {S = 1} = {X_1 = 1, X_2 = 0} and {S = 2} = {X_1 = 1, X_2 = 1}, hence

ϕ_S(s) = 1/2 + s/4 + s²/4,   s ∈ C,

which differs from (4.57), hence (4.56) does not hold in this case.
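The numbers in this counterexample can be confirmed by brute-force enumeration of the four equally likely outcomes of (X_1, X_2), comparing the two generating polynomials as exact coefficient lists:

```python
from fractions import Fraction

# Enumerate (X1, X2) in {0,1}^2, each outcome with probability 1/4.
phi_s = [Fraction(0)] * 3          # coefficients of phi_S(s) = sum_k P[S = k] s^k
for x1 in (0, 1):
    for x2 in (0, 1):
        n = 2 - ((x1 + x2) % 2)    # N as defined in the example
        s = x1 + (x2 if n == 2 else 0)
        phi_s[s] += Fraction(1, 4)
# phi_N(phi_X1(s)) with phi_N(t) = t/2 + t^2/2 and phi_X1(s) = (1+s)/2:
# (1+s)/4 + (1+s)^2/8 = 3/8 + s/2 + s^2/8, cf. (4.57)
composed = [Fraction(3, 8), Fraction(1, 2), Fraction(1, 8)]
```

The enumeration reproduces ϕ_S(s) = 1/2 + s/4 + s²/4, which indeed differs from the composed polynomial (4.57).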
Let Q = (q_ν)_{ν∈N_0^d} with q_ν := P[X_1 = ν] denote the distribution of X_1. If N ∼ Poisson(λ) with λ ≥ 0, then the random sum S in (4.55) has a so-called compound Poisson distribution and we use the notation S ∼ CPoisson(λ, Q).
Since ϕ_N(s) = e^{λ(s−1)} for all s ∈ C by (4.3), the calculation in (4.56) implies that

ϕ_S(s) = exp( λ (ϕ_{X_1}(s) − 1) )   (4.58)

for all s ∈ C^d for which the power series defining ϕ_{X_1}(s) converges, which is the case at least when ‖s‖_∞ ≤ 1.
Similarly, if N ∼ NegBin(α, p) with α ≥ 0 and p ∈ [0, 1), then S from (4.55) has a so-called compound negative binomial distribution and we use the notation S ∼ CNegBin(α, p, Q). Since ϕ_N(s) = q^α/(1 − ps)^α with q := 1 − p for all s ∈ C with p|s| < 1 by (4.50), the calculation in (4.56) implies that

ϕ_S(s) = ( q / (1 − p ϕ_{X_1}(s)) )^α   (4.59)

for all s ∈ C^d for which the power series defining ϕ_{X_1}(s) is absolutely convergent and for which p|ϕ_{X_1}(s)| < 1, which is the case at least when ‖s‖_∞ ≤ 1.
Let us look at a prominent example and its credit risk interpretation.
Example 4.27 (Negative binomial distribution as compound Poisson distribution). Let (X_n)_{n∈N} denote i.i.d. random variables, where X_1 ∼ Log(p) has a logarithmic distribution with parameter p ∈ (0, 1), cf. Example 4.4. Recall (4.6) to see that

ϕ_{X_1}(s) = log(1 − ps) / log(1 − p),   |s| < 1/p.

According to (4.58), the compound Poisson sum S has the generating function

ϕ_S(s) = exp( λ ( log(1 − ps)/log(1 − p) − 1 ) ) = exp( (λ/log(1 − p)) log((1 − ps)/(1 − p)) ) = ( (1 − p)/(1 − ps) )^α,   |s| < 1/p,

with

α := −λ / log(1 − p) ≥ 0,   (4.60)

which according to (4.50) is the probability-generating function of a negative
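The identity in Example 4.27 is purely algebraic and can be checked numerically by evaluating both generating functions at a point |s| < 1/p. A short Python sketch; the parameters p, λ and the point s are illustrative:

```python
import math

p, lam = 0.4, 1.3
alpha = -lam / math.log(1 - p)                     # alpha from (4.60)
s = 0.7                                            # any point with |s| < 1/p
phi_x = math.log(1 - p * s) / math.log(1 - p)      # logarithmic pgf, cf. (4.6)
lhs = math.exp(lam * (phi_x - 1))                  # compound Poisson pgf (4.58)
rhs = ((1 - p) / (1 - p * s)) ** alpha             # NegBin(alpha, p) pgf, cf. (4.50)
```

Both evaluations agree up to floating-point rounding, reflecting the exact equality of the two generating functions.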
Proof. The probability-generating function of the Dirac measure δ_{c_g} is given in multi-index notation by ϕ_{δ_{c_g}}(s) = s^{c_g} for all s ∈ C^m, hence

λ ϕ_µ(s) = ∑_{g∈G} λ_g s^{c_g},   s ∈ C^m.

Therefore, using (4.58), the probability-generating function ϕ of CPoisson(λ, µ) is given by

ϕ(s) = exp( λ (ϕ_µ(s) − 1) ) = exp( ∑_{g∈G} λ_g (s^{c_g} − 1) ),   s ∈ C^m,

which agrees with the probability-generating function (4.30) of the multivariate Poisson distribution MPoisson(G, (λ_g)_{g∈G}, m).
4.6 Some Compound Distributions Arising from the Multivariate Bernoulli Distribution

Throughout this subsection, let (B_m)_{m∈N} denote i.i.d. multivariate Bernoulli random variables with B_1 ∼ Multinomial(1, p_1, …, p_d), where p_1, …, p_d ∈ [0, 1] with p_1 + ⋯ + p_d = 1, see Example 4.5. Then ϕ_{B_1}(s) = ∑_{i=1}^{d} p_i s_i for all s = (s_1, …, s_d) ∈ C^d by (4.7). Furthermore, let M be an N_0-valued random variable, independent of (B_m)_{m∈N}, and consider the random sum

N = (N_1, …, N_d) = ∑_{m=1}^{M} B_m.   (4.64)
Remark 4.34 (Covariation of components). Suppose that Var(M) < ∞. Using the representation from (4.64) and Lemma 3.48 applied with B = σ(M), where E[N_i | M] = p_i M and Cov(N_i, N_j | M) = −p_i p_j M for i ≠ j,

Cov(N_i, N_j) = Cov( E[N_i | M], E[N_j | M] ) + E[ Cov(N_i, N_j | M) ] = p_i p_j ( Var(M) − E[M] ).

Hence the sign of the covariation of two components can vary depending on the expectation and the variance of the distribution of M. It vanishes for M ∼ Poisson(λ) due to (3.3) and (3.4); Example 4.35 below shows that there is even independence in this case. For M ∼ Log(p) the sign depends on the value of p ∈ (0, 1), see Exercise 4.11(b) below.
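The sign formula Cov(N_i, N_j) = p_i p_j (Var(M) − E[M]) from Remark 4.34 can be verified exactly for an M with small support by enumerating the multinomial outcomes. A Python sketch in exact rational arithmetic with d = 2 and an illustrative two-point distribution for M (here Var(M) = 1 < E[M] = 2, so the covariance is negative):

```python
from fractions import Fraction
from math import comb

p1 = Fraction(3, 10)
p2 = 1 - p1
m_dist = {1: Fraction(1, 2), 3: Fraction(1, 2)}   # hypothetical distribution of M
e_m = sum(m * w for m, w in m_dist.items())
var_m = sum(m * m * w for m, w in m_dist.items()) - e_m ** 2

# Exact Cov(N1, N2): for d = 2, N1 | M ~ Bin(M, p1) and N2 = M - N1.
e1 = e2 = e12 = Fraction(0)
for m, w in m_dist.items():
    for k in range(m + 1):
        prob = w * comb(m, k) * p1 ** k * p2 ** (m - k)
        e1 += prob * k
        e2 += prob * (m - k)
        e12 += prob * k * (m - k)
cov = e12 - e1 * e2
formula = p1 * p2 * (var_m - e_m)    # right-hand side from Remark 4.34
```

Because everything is rational, the enumeration matches the formula exactly, not just up to rounding.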
Example 4.35 (Compound Poisson). Let M ∼ Poisson(λ) with λ ≥ 0. Then (4.7) and (4.58) imply for the random sum (4.64) that

ϕ_N(s) = exp( λ ( ∑_{i=1}^{d} p_i s_i − 1 ) ) = exp( λ ∑_{i=1}^{d} p_i (s_i − 1) ) = ∏_{i=1}^{d} exp( λ p_i (s_i − 1) )   (4.65)

for all s = (s_1, …, s_d) ∈ C^d, hence the components of N are independent and satisfy N_i ∼ Poisson(λp_i) for every i ∈ {1, …, d}. This independence may come as a surprise, because the components of the multivariate Bernoulli distributed summands are dependent. However, this independence is a special feature of the Poisson distribution; it is lost if, for example, the logarithmic distribution (see Subsection 4.6.1) or the negative binomial distribution (see Subsection 4.6.2) is considered for M.
If P[M = m] = 1 for an m ∈ N, then N ∼ Multinomial(m, p_1, …, p_d) for the random variable in (4.64), see Example 4.16. More generally, given (n_1, …, n_d) ∈ N_0^d, define m = n_1 + ⋯ + n_d ∈ N_0. Then N = (n_1, …, n_d) is only possible when

Using the probability-generating function ϕ_{B_1}(s) = ∑_{i=1}^{d} p_i s_i for s = (s_1, …, s_d) ∈ C^d given by (4.7), it follows from (4.56) for the probability-generating function of N that

ϕ_N(s) = ( q / (1 − ∑_{i=1}^{d} p_i s_i) )^α   (4.72)

for all s = (s_1, …, s_d) ∈ C^d with |∑_{i=1}^{d} p_i s_i| < 1, which is certainly the case if (p_1 + ⋯ + p_d) ‖s‖_∞ < 1. Note that the calculation leading to (4.72) is correct for p_1 = ⋯ = p_d = 0, and the result (4.72) is also correct for α = 0.
Here is the multi-dimensional generalization of Lemma 4.24, which also implies that the negative multinomial distribution is infinitely divisible:

Lemma 4.40. Let k ∈ N, α_1, …, α_k ≥ 0 and p_1, …, p_d ∈ [0, 1) with p_1 + ⋯ + p_d < 1. If N_1, …, N_k are independent with N_i ∼ NegMult(α_i, p_1, …, p_d) for
of the random sum S = X_1 + ⋯ + X_N defined in (4.55). If the distribution

q_n := P[N = n],   n ∈ N_0,

of N satisfies the recursion formula given in Definition 5.1 below, then Theorem 5.8 shows that there is an efficient way to do this.

Definition 5.1. A probability distribution (q_n)_{n∈N_0} is called Panjer(a, b, k) distribution with a, b ∈ R and k ∈ N_0 if q_0 = q_1 = ⋯ = q_{k−1} = 0 and

q_n = ( a + b/n ) q_{n−1}   for all n ∈ N with n ≥ k + 1.   (5.1)

Remark 5.2. Given a, b ∈ R and k ∈ N_0, the linearity of (5.1) implies that there exists at most one probability distribution (q_n)_{n∈N_0} satisfying Definition 5.1, because there can be at most one q_k ∈ [0, 1] such that ∑_{n=k}^{∞} q_n = 1.
Definition 5.3 (Truncation). Let (q_n)_{n∈N_0} be a probability distribution and l ∈ N_0 such that there is mass at l or above, meaning that ∑_{n=l}^{∞} q_n > 0. Then the l-truncated probability distribution (q̃_n)_{n∈N_0} of (q_n)_{n∈N_0} is defined by q̃_0 = ⋯ = q̃_{l−1} := 0 and

q̃_n := q_n / ( 1 − ∑_{j=0}^{l−1} q_j ),   n ≥ l.   (5.2)

Lemma 5.4. Suppose (q_n)_{n∈N_0} is the Panjer(a, b, k) distribution and l ≥ k is an integer such that there is mass at l or above. Then the l-truncation of (q_n)_{n∈N_0} is the Panjer(a, b, l) distribution.

Exercise 5.5. Prove Lemma 5.4 using the linearity of the recursion equation (5.1).
Remark 5.6. All probability distributions satisfying Definition 5.1 were identified
by Sundt and Jewell [48] for the case k = 0, Willmot [56] for the case k = 1,
and finally Hess, Liewald and Schmidt [28] for general k ∈ N0. The Panjer
distributions are the following:
(a) Poisson distribution (cf. Example 5.17),
(b) Negative binomial distribution (cf. Example 5.18),
(c) Binomial distribution (cf. Example 5.20),
(d) Logarithmic distribution (cf. Example 5.21),
(e) Extended negative binomial distribution (cf. Example 5.22),
(f) Extended logarithmic distribution (cf. Example 5.23),
(g) All truncations of these distributions (cf. Definition 5.3 and Lemma 5.4).
Exercise 5.7. Prove that the only non-degenerate probability distributions in the class {Panjer(a, b, 0) | a, b ∈ R} are the Poisson, the binomial, and the negative binomial distributions.
The following theorem combines results of Panjer [41] and Hess, Liewald and Schmidt [28] with the multivariate extension of Sundt [47]. For j, n ∈ N_0^d we write j ≤ n if this is true for all d components, and we write j < n if j ≤ n and j ≠ n, meaning that there is strict inequality for at least one component. Note that ≤ is then a partial order on N_0^d. We write ⟨·, ·⟩ for the standard inner product in R^d.
Theorem 5.8 (Multivariate extended Panjer recursion). Assume that the probability distribution (q_n)_{n∈N_0} of N is the Panjer(a, b, k) distribution and that a P[X_1 = 0] ≠ 1. Then the distribution (p_n)_{n∈N_0^d} of the random sum S defined in (4.55) can be calculated by

p_0 = ϕ_N(P[X_1 = 0]) = { q_0                      if P[X_1 = 0] = 0,
                        { E[ (P[X_1 = 0])^N ]      otherwise,   (5.3)

where ϕ_N is the probability-generating function of N, and the recursion formula

p_n = ( 1 / (1 − a P[X_1 = 0]) ) ( P[S_k = n] q_k + ∑_{j∈N_0^d: 0<j≤n} ( a + b ⟨c_n, j⟩/⟨c_n, n⟩ ) P[X_1 = j] p_{n−j} )   (5.4)

for all n ∈ N_0^d \ {0}, where S_k := X_1 + ⋯ + X_k and c_n ∈ R^d is chosen such that ⟨c_n, n⟩ ≠ 0; the vector c_n := (1, …, 1) works in every case.
Proof. Theorem 5.8 is a corollary of Theorem 5.26(a) below, hence its proof is
given just after the statement of Theorem 5.26.
Remark 5.9 (Technical assumption). Of the Panjer distributions given in Remark 5.6, only the uninteresting case P[X_1 = 0] = 1 with N ∼ ExtLog(k, 1), cf. Example 5.23, or one of its truncations, see Lemma 5.4, violates the technical assumption a P[X_1 = 0] ≠ 1. Obviously, p_n = 0 for all n ∈ N_0^d \ {0} in these cases.
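For d = 1 the recursion (5.4) is easy to implement. The sketch below (Python, illustrative parameters) treats the compound Poisson case: N ∼ Poisson(λ) is Panjer(0, λ, 0), so a = 0, b = λ, k = 0 and p_0 = ϕ_N(P[X_1 = 0]) = e^{λ(P[X_1=0]−1)}. Taking X_1 ≡ 1 forces S = N, so the recursion must reproduce the Poisson(λ) masses themselves, which gives a convenient test:

```python
import math

def panjer_compound(a, b, f, p0, nmax):
    """Univariate Panjer(a, b, 0) recursion (5.4): f[j] = P[X1 = j], p0 = phi_N(f[0])."""
    p = [p0]
    denom = 1.0 - a * f[0]
    for n in range(1, nmax + 1):
        total = 0.0
        for j in range(1, min(n, len(f) - 1) + 1):
            total += (a + b * j / n) * f[j] * p[n - j]
        p.append(total / denom)
    return p

lam = 0.8
# N ~ Poisson(lam) is Panjer(0, lam, 0); take X1 = 1 a.s., so S = N.
f = [0.0, 1.0]
p = panjer_compound(0.0, lam, f, math.exp(-lam), 10)
# p[n] should equal the Poisson(lam) mass exp(-lam) lam^n / n!
```

For a = 0 the recursion reduces to p_n = (λ/n) p_{n−1}, which is exactly the Poisson recursion; a general pmf `f` on {0, 1, 2, …} yields the compound Poisson masses.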
Remark 5.10 (Computational speed-up for small support of L(X_1)). For n = (n_1, …, n_d) ∈ N_0^d \ {0}, the number of terms in (5.4) is (n_1 + 1) ⋯ (n_d + 1) − 1, which may limit the practical applicability of the recursion to small dimension d. A remarkable speed-up is possible if the support of the distribution of X_1 is concentrated on just a few points of N_0^d; let us write S_X = {n ∈ N_0^d \ {0} | P[X_1 = n] > 0} for this support without the origin of N_0^d. Then the sum in (5.4) runs over all j ∈ S_X satisfying j ≤ n, i.e.

j ∈ S_n(X) := S_X ∩ ∏_{i=1}^{d} {0, …, n_i},

and its cardinality satisfies |S_n(X)| ≤ min{ |S_X|, (n_1 + 1) ⋯ (n_d + 1) − 1 }. If |S_X| < ∞, then |S_X| is an upper bound for the number of terms which doesn't grow with n. Remark 5.11 below simplifies the computation of the individual terms.
Remark 5.11 (Choice of c_n). While c_n = (1, . . . , 1) works in (5.4) in every case, there is a computational advantage in choosing c_n dependent on n. To illustrate this, let us take the notation of Remark 5.10 and define S_{i,n}(X) = {j_i | (j_1, . . . , j_d) ∈ S_n(X)} for every i ∈ {1, . . . , d}. Since every n = (n_1, . . . , n_d) ∈ N_0^d \ {0} has at least one non-zero component, let’s say the ith one n_i, we can then choose c_n = (0, . . . , 0, 1, 0, . . . , 0) with the 1 at the ith position, which simplifies ⟨c_n, j⟩ and ⟨c_n, n⟩ to j_i and n_i, respectively, and allows us to pull the factor a + b j_i/n_i out of the other summations in (5.4), i.e.,

    ∑_{j ∈ S_n(X)} (a + b ⟨c_n, j⟩/⟨c_n, n⟩) P[X_1 = j] p_{n−j}
        = ∑_{l ∈ S_{i,n}(X)} (a + b l/n_i) ∑_{(j_1,...,j_d) ∈ S_n(X), j_i = l} P[X_1 = j] p_{n−j}.
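The sparse-support evaluation of Remarks 5.10 and 5.11 can be sketched as follows. This Python sketch (my own naming, not from the notes) implements the d-dimensional Panjer(a, b, 0) case of (5.4), with c_n chosen as a standard unit vector:

```python
from itertools import product

def panjer_sparse(a, b, p0, severity, nmax):
    """Sketch of the d-dimensional Panjer(a, b, 0) recursion (5.4), exploiting a
    sparse severity support (Remark 5.10) and choosing c_n as the i-th standard
    unit vector for some i with n_i != 0 (Remark 5.11).

    severity : dict mapping d-tuples in N_0^d to P[X_1 = j]
    p0       : initial value phi_N(P[X_1 = 0]) from (5.3)
    nmax     : d-tuple; p_n is computed for all 0 <= n <= nmax componentwise
    """
    d = len(nmax)
    origin = (0,) * d
    f0 = severity.get(origin, 0.0)
    support = [j for j in severity if j != origin]   # S_X, support without origin
    denom = 1.0 - a * f0                             # requires a*P[X_1=0] != 1
    p = {origin: p0}
    # product() yields the index set in lexicographic order, so p[n-j] with
    # 0 < j <= n is always available before p[n] is computed
    for n in product(*(range(m + 1) for m in nmax)):
        if n == origin:
            continue
        i = next(k for k in range(d) if n[k] > 0)    # c_n = i-th unit vector
        s = 0.0
        for j in support:                            # only j in S_n(X) contribute
            if all(jk <= nk for jk, nk in zip(j, n)):
                s += (a + b * j[i] / n[i]) * severity[j] * p[tuple(nk - jk for nk, jk in zip(n, j))]
        p[n] = s / denom
    return p
```

For N ∼ NegBin(α, p) one would call this with a = p, b = (α − 1)p and p0 from (5.14) below; for N ∼ Poisson(λ), with a = 0, b = λ and p0 from (5.12).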
Remark 5.12 (Calculation of L(S_k) by convolutions). If k = 0, then S_k = 0, hence P[S_k = n] = 0 for all n ∈ N_0^d \ {0}. If k = 1, then S_k = X_1. If k ≥ 2, then S_k = S_{k−1} + X_k and the distribution of S_k can be calculated recursively by convolution in a numerically stable way, i.e.,

    P[S_k = n] = ∑_{j ∈ N_0^d, j ≤ n} P[S_{k−1} = n − j] P[X_k = j],   n ∈ N_0^d.   (5.5)

Starting with the integer k ≥ 4, there is a more efficient way to calculate the distribution of S_k, similar to exponentiation by squaring or Russian peasant multiplication. Given l, m ∈ N, observe that S_{l+m} = S_l + S′_m with S′_m := X_{l+1} + · · · + X_{l+m}, which has the same distribution as S_m. Therefore, by convolution,

    P[S_{l+m} = n] = ∑_{j ∈ N_0^d, j ≤ n} P[S_l = n − j] P[S_m = j],   n ∈ N_0^d.   (5.6)

Define now l = ⌊log_2 k⌋ and let k = ∑_{i=0}^{l} b_i 2^i with b_l = 1 and b_0, . . . , b_{l−1} ∈ {0, 1} be the binary representation of k. Calculate iteratively via (5.6) the distributions of S_{2^i} = S_{2^{i−1}} + S′_{2^{i−1}} for i ∈ {1, 2, . . . , l}, which requires l convolutions. If k = 2^l, then we are done; otherwise the distribution of S_k is obtained by using (5.6) to calculate the convolution of the distributions of all those S_{2^i} with i ∈ {0, 1, . . . , l} for which b_i = 1. This requires b_0 + · · · + b_{l−1} additional convolutions, so there are l + b_0 + · · · + b_{l−1} ≤ 2l altogether. This is numerically more precise than
⟨c, X_i − m⟩ for i ∈ N. These are i.i.d. and [0, ∞)-valued random variables. Fix the natural number k ≥ 2. Define S′_k = X′_1 + · · · + X′_k. Then 0 ≤ S′_k = ⟨c, S_k − km⟩ and {S′_k = 0} = {X′_1 = 0, . . . , X′_k = 0} = {X_1 = m, . . . , X_k = m}. Using the i.i.d. assumption, (5.10) follows.

To prove (5.11) for a given n ∈ N_0^d with ⟨c, n⟩ > k⟨c, m⟩, rewrite (5.7) with m + n in place of n. Then take the inner product with c and solve for P[S_k = n], which is possible because ⟨c, n − km⟩ ≠ 0 and P[X_1 = m] > 0 by the choice of c and m. Furthermore, all remaining terms with ⟨c, j⟩ ≤ ⟨c, m⟩ on the right-hand side of (5.11) are zero and can be omitted. Since ⟨c, X_1⟩ ≥ ⟨c, m⟩, it follows that ⟨c, S_k⟩ ≥ k⟨c, m⟩ by the above part of the proof, hence we may skip all terms on the right-hand side of (5.11) with ⟨c, m + n − j⟩ < k⟨c, m⟩. Since j ≤ m + n, these are the ones with ⟨c, j⟩ > ⟨c, m + n⟩ − k⟨c, m⟩ = ⟨c, n − (k − 1)m⟩. This justifies summing only over j ∈ N_{m,n}.
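The convolution-squaring scheme of Remark 5.12 can be sketched in one dimension as follows (illustrative Python, my own function names; truncating all pmfs at `nmax` is exact for indices ≤ nmax because the summands are N_0-valued):

```python
def convolve(u, v, nmax):
    """Convolution of two pmfs on {0, ..., nmax}, truncated as in (5.5)/(5.6)."""
    w = [0.0] * (nmax + 1)
    for i, ui in enumerate(u):
        for j in range(min(len(v), nmax + 1 - i)):
            w[i + j] += ui * v[j]
    return w

def conv_power(f, k, nmax):
    """Distribution of S_k = X_1 + ... + X_k on {0, ..., nmax}, k >= 1, by
    binary exponentiation as in Remark 5.12: at most 2*floor(log2(k))
    convolutions instead of k - 1 iterative ones."""
    result = None
    sq = f[:nmax + 1] + [0.0] * max(0, nmax + 1 - len(f))   # current S_{2^i}
    while k:
        if k & 1:   # binary digit b_i = 1: fold S_{2^i} into the result
            result = sq if result is None else convolve(result, sq, nmax)
        k >>= 1
        if k:
            sq = convolve(sq, sq, nmax)   # S_{2^{i+1}} = S_{2^i} + S'_{2^i}
    return result
```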
Before we derive Theorem 5.8 from Theorem 5.26(a) below, let us look at several examples, keeping in mind the numerical stability of the recursion formula (5.4).
Example 5.17 (Poisson distribution). If (q_n)_{n∈N_0} is Poisson(λ) with λ ≥ 0, then q_0 = e^{−λ} and

    q_n = (λ^n/n!) e^{−λ} = (λ/n) q_{n−1},   n ∈ N,

hence Poisson(λ) is the Panjer(0, λ, 0) distribution. Using (4.3), the initial value (5.3) turns into

    p_0 = e^{λ(P[X_1 = 0] − 1)}.   (5.12)

The recursion formula (5.4) can be simplified to

    p_n = (λ/n_i) ∑_{j ∈ N_0^d, 0 < j ≤ n} j_i P[X_1 = j] p_{n−j}   (5.13)

for every n = (n_1, . . . , n_d) ∈ N_0^d \ {0}, where i ∈ {1, . . . , d} is chosen such that n_i ≠ 0. See Remark 5.10 to omit terms in (5.13) with value zero. The recursion (5.13) is numerically stable because only non-negative numbers are multiplied and added.
Example 5.18 (Negative binomial distribution). If (q_n)_{n∈N_0} is NegBin(α, p) with parameters α > 0 and p ∈ [0, 1) as specified in (4.46), then q_0 = q^α and

    q_n = C(α + n − 1, n) p^n q^α = ((α + n − 1)/n) p q_{n−1},   n ∈ N,

with q := 1 − p, hence NegBin(α, p) is the Panjer(p, (α − 1)p, 0) distribution. Using (4.50), the initial value (5.3) turns into

    p_0 = ( q/(1 − p P[X_1 = 0]) )^α.   (5.14)

The recursion formula (5.4) can be simplified to

    p_n = (p/(n_i (1 − p P[X_1 = 0]))) ∑_{j ∈ N_0^d, 0 < j ≤ n} (α j_i + n_i − j_i) P[X_1 = j] p_{n−j}   (5.15)

for every n = (n_1, . . . , n_d) ∈ N_0^d \ {0}, where i ∈ {1, . . . , d} is chosen such that n_i ≠ 0. See Remark 5.10 for the possibility to omit terms in (5.15) with value zero. The recursion (5.15) is numerically stable because n_i − j_i ∈ N_0 (this requires proper programming: α j_i has to be added afterwards) and otherwise only non-negative numbers are multiplied and added to calculate the sum.
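A one-dimensional sketch of the stable recursion (5.15), with the initial value (5.14), might look as follows (illustrative Python, my own naming):

```python
def compound_negbin(alpha, p, f, nmax):
    """One-dimensional sketch of the stable recursion (5.15) for
    N ~ NegBin(alpha, p), with initial value (5.14); f[j] = P[X_1 = j]."""
    f = f[:nmax + 1] + [0.0] * max(0, nmax + 1 - len(f))
    q = 1.0 - p
    pn = [0.0] * (nmax + 1)
    pn[0] = (q / (1.0 - p * f[0])) ** alpha        # initial value (5.14)
    for n in range(1, nmax + 1):
        # the non-negative integer n - j is formed first; alpha*j is added
        # afterwards, as the stability remark after (5.15) suggests
        s = sum((alpha * j + (n - j)) * f[j] * pn[n - j] for j in range(1, n + 1))
        pn[n] = p * s / (n * (1.0 - p * f[0]))
    return pn
```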
Remark 5.19 (Calculation of the initial value). To apply the d-dimensional
extended Panjer recursion (5.4), the probability p0 of a loss of zero is needed as
starting value, see (5.3). If N ∼ Poisson(λ) with λ ≥ 0, then p0 is given by (5.12).
If N ∼ NegBin(α, p) with α > 0 and p ∈ [0, 1), then p0 is given by (5.14). When
modelling large portfolios with the collective risk model (4.55) using one of these
two claim number distributions, it can happen for large λ or α, respectively, that
p_0 is so small that it can only be represented as zero on a computer (numerical underflow). The recursion (5.4) then produces p_n = 0 for all n ∈ N_0^d \ {0}, which is clearly wrong. The standard solution, cf. [33, Section 6.6.2], is to perform Panjer’s recursion with the reduced parameter λ′ := λ/2^n (resp. α′ := α/2^n) instead, where n ∈ N is chosen such that the new starting value p_0 is properly representable on the computer. Afterwards, n iterative and numerically stable convolutions are needed to calculate the original probability distribution. This approach works because for independent N_1, . . . , N_{2^n} ∼ Poisson(λ/2^n), we have that N = N_1 + · · · + N_{2^n} ∼ Poisson(λ) by Lemma 3.2; similarly for the negative binomial distribution, see Lemma 4.24. In general, this works for claim number distributions closed under convolutions.
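The splitting device of Remark 5.19 can be sketched as follows (illustrative Python, my own function names; the recursion used is the d = 1 case of (5.13)):

```python
import math

def compound_poisson(lam, f, nmax):
    """Compound Poisson(lam) pmf on {0, ..., nmax} via the stable recursion
    p_n = (lam/n) * sum_j j*f[j]*p[n-j], the d = 1 case of Example 5.17."""
    f = f[:nmax + 1] + [0.0] * max(0, nmax + 1 - len(f))
    p = [math.exp(lam * (f[0] - 1.0))] + [0.0] * nmax   # initial value (5.12)
    for n in range(1, nmax + 1):
        p[n] = lam / n * sum(j * f[j] * p[n - j] for j in range(1, n + 1))
    return p

def compound_poisson_split(lam, f, nmax, halvings):
    """Remark 5.19: run the recursion with the reduced intensity lam/2**halvings,
    so that the initial value exp(lam'*(f[0]-1)) does not underflow, then square
    the result back up by `halvings` truncated convolutions."""
    p = compound_poisson(lam / 2 ** halvings, f, nmax)
    for _ in range(halvings):
        p = [sum(p[n - j] * p[j] for j in range(n + 1)) for n in range(nmax + 1)]
    return p
```

Since a sum of 2^n independent compound Poisson(λ/2^n) sums with the same severity is again compound Poisson(λ), both functions agree up to rounding errors whenever the direct computation does not underflow.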
Example 5.20 (Binomial distribution). Let (q_n)_{n∈N_0} denote the binomial distribution Bin(m, p) with success probability p ∈ [0, 1) and number of trials m ∈ N. Let q := 1 − p denote the failure probability. Then, for every n ∈ N,

    q_n = C(m, n) p^n q^{m−n} = ((m − n + 1)/n) (p/q) C(m, n − 1) p^{n−1} q^{m−(n−1)}
        = ( −p/q + ((m + 1)p/q) (1/n) ) q_{n−1},

hence Bin(m, p) is the Panjer(−p/q, (m + 1)p/q, 0) distribution, i.e. a = −p/q and b = (m + 1)p/q. The recursion factor a + b/n is zero for n = m + 1, giving q_n = 0 for n ≥ m + 1 as expected. Using (4.29), the initial value (5.3) turns into

    p_0 = ( 1 + p(P[X_1 = 0] − 1) )^m.   (5.16)
Consider Panjer’s recursion formula (5.4) for n = (n_1, . . . , n_d) ∈ N_0^d \ {0} with n_1 ≥ m + 2 and n_2 = · · · = n_d = 0. Without loss of generality we can take c_n = (1, 0, . . . , 0). Then the term

    a + b ⟨c_n, j⟩/⟨c_n, n⟩ = −(p/q) ( 1 − ((m + 1)/n_1) j_1 )

changes sign as j = (j_1, 0, . . . , 0) varies between (1, 0, . . . , 0) and (n_1, 0, . . . , 0). Therefore, the recursion might not be numerically stable because cancellations can occur. The problem with numerical underflow during the calculation of the initial value p_0 given in (5.16) can also occur for large m, cf. Remark 5.19. Since

    φ_S(s) = φ_N(φ_{X_1}(s)) = ( q + p φ_{X_1}(s) )^m = ∏_{k ∈ {0,...,l}, b_k = 1} ( q + p φ_{X_1}(s) )^{2^k},

at least for all s ∈ C^d with ‖s‖_∞ ≤ 1, where m = ∑_{k=0}^{l} b_k 2^k with b_0, . . . , b_{l−1} ∈ {0, 1}, b_l = 1 and l = ⌊log_2 m⌋ denotes the binary representation of m, we see that the distribution (p_n)_{n∈N_0^d} of S can be computed in a numerically stable way with b_0 + · · · + b_{l−1} + l ≤ 2l convolutions, see Remark 5.12.
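In one dimension, the stable route via the factorised probability-generating function amounts to convolving the mixture pmf q·δ_0 + p·L(X_1) with itself m times by binary exponentiation. A Python sketch (my own naming, not from the notes):

```python
def compound_binomial(m, prob, f, nmax):
    """Compound Bin(m, prob) pmf on {0, ..., nmax}, m >= 1, computed stably as
    the m-fold convolution of the mixture (1-prob)*delta_0 + prob*L(X_1),
    by binary exponentiation (Remark 5.12); this avoids the sign changes of
    the Panjer factor a + b/n noted in Example 5.20."""
    f = f[:nmax + 1] + [0.0] * max(0, nmax + 1 - len(f))
    # pmf of a single trial's contribution: 0 with prob 1-prob, X_1 otherwise
    mix = [1.0 - prob + prob * f[0]] + [prob * fj for fj in f[1:]]

    def convolve(u, v):
        w = [0.0] * (nmax + 1)
        for i, ui in enumerate(u):
            for j in range(min(len(v), nmax + 1 - i)):
                w[i + j] += ui * v[j]
        return w

    result, sq = None, mix
    while m:
        if m & 1:
            result = sq if result is None else convolve(result, sq)
        m >>= 1
        if m:
            sq = convolve(sq, sq)
    return result
```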
Example 5.21 (Logarithmic distribution). If (q_n)_{n∈N_0} is Log(p) with p ∈ [0, 1), cf. Example 4.4, then q_0 = 0, q_1 = 1/c(p) with c(p) defined by (4.5) and

    q_n = p^{n−1}/(c(p) n) = p ((n − 1)/n) q_{n−1}   for n ∈ N, n ≥ 2,

hence Log(p) is the Panjer(p, −p, 1) distribution. Using (4.6), the initial value (5.3) turns into

    p_0 = P[X_1 = 0] c(p P[X_1 = 0]) / c(p).   (5.17)

The recursion formula (5.4) simplifies to

    p_n = (1/(1 − p P[X_1 = 0])) ( P[X_1 = n]/c(p) + (p/n_i) ∑_{j ∈ N_0^d, 0 < j < n, j_i < n_i} (n_i − j_i) P[X_1 = j] p_{n−j} )   (5.18)

for every n = (n_1, . . . , n_d) ∈ N_0^d \ {0}, where i ∈ {1, . . . , d} is chosen such that n_i ≠ 0. See Remark 5.10 about the possibility to omit further terms in (5.18) with value zero. The recursion (5.18) is numerically stable because n_i − j_i ∈ N_0 and otherwise only non-negative numbers are multiplied and added inside the parenthesis to calculate the sum. For p = 0, the recursion (5.18) simplifies dramatically to p_n = P[X_1 = n] for all n ∈ N_0^d \ {0}.
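A one-dimensional sketch of the stable recursion (5.18) follows (illustrative Python, my own naming). Since (4.5) is not reproduced here, the code assumes the normalisation c(x) = −log(1 − x)/x, which is consistent with q_1 = 1/c(p) and with the q_n summing to one:

```python
import math

def compound_log(p, f, nmax):
    """One-dimensional sketch of the stable recursion (5.18) for N ~ Log(p).
    Assumes c(x) = -log(1-x)/x (with c(0) := 1), so that q_1 = 1/c(p);
    f[j] = P[X_1 = j]."""
    f = f[:nmax + 1] + [0.0] * max(0, nmax + 1 - len(f))
    c = lambda x: 1.0 if x == 0.0 else -math.log1p(-x) / x
    pn = [0.0] * (nmax + 1)
    pn[0] = f[0] * c(p * f[0]) / c(p)              # initial value (5.17)
    for n in range(1, nmax + 1):
        # only non-negative quantities n - j enter the sum, cf. Example 5.21
        s = f[n] / c(p) + p / n * sum((n - j) * f[j] * pn[n - j] for j in range(1, n))
        pn[n] = s / (1.0 - p * f[0])
    return pn
```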
Example 5.22 (Extended negative binomial distribution). For parameters k ∈ N,
converges absolutely for all x ∈ C with |x| ≤ 1; for x = −p we see that

    ∑_{n∈N_0} C(α + n − 1, n) p^n = ∑_{n∈N_0} C(−α, n) (−p)^n = (1 − p)^{−α} = q^{−α}.   (5.22)

We conclude that the numerators in (5.19) are all of the same sign and, by (5.22), the denominator is the sum of these. Hence q_n > 0 for all integers n ≥ k and ∑_{n=k}^{∞} q_n = 1.
Using the first equality in (5.20), we see that, for every n ≥ k + 1,

    C(α + n − 1, n) p^n = ((α + n − 1)/n) p · (p^{n−1}/(n − 1)!) ∏_{j=1}^{n−1} (α + n − 1 − j)
                        = (1 + (α − 1)/n) p · C(α + n − 2, n − 1) p^{n−1},

hence ExtNegBin(α, k, p) is the Panjer(p, (α − 1)p, k) distribution. Consider Panjer’s recursion formula (5.4) for n = (n_1, . . . , n_d) ∈ N_0^d \ {0} with n_1 > 1 − α and n_2 = · · · = n_d = 0. Without loss of generality we can take c_n = (1, 0, . . . , 0). Then the term

    a + b ⟨c_n, j⟩/⟨c_n, n⟩ = ( 1 + ((α − 1)/n_1) j_1 ) p

changes sign as j = (j_1, 0, . . . , 0) varies between (1, 0, . . . , 0) and (n_1, 0, . . . , 0). Therefore, the recursion can be numerically unstable due to cancellations, see Remark 5.28.

To calculate the probability-generating function of a random variable N ∼ ExtNegBin(α, k, p), note that by (5.22)
Exercise 5.25. Use (4.29), (4.50) and (5.23) to verify the last statement in
Remark 5.24.
5.2 A Generalisation of the Multivariate Panjer Recursion
The multivariate extended Panjer recursion in Theorem 5.8 is a special case of
part (a) of the following theorem, which combines [21, Theorem 4.5] with the
multivariate idea in [47, Theorem 1] and is of independent interest for questions
of numerical stability, see Subsections 5.3 and 5.4 below.
Theorem 5.26. Fix l ∈ N. Let (q_n)_{n∈N_0} and (q_{i,n})_{n∈N_0} denote the probability distributions of the N_0-valued random variables N and N_i for i ∈ {1, . . . , l}, where (N, N_1, . . . , N_l) is independent of the N_0^d-valued i.i.d. sequence (X_n)_{n∈N}. Let (p_n)_{n∈N_0^d} and (p_{i,n})_{n∈N_0^d} denote the probability distributions of the random sums S = X_1 + · · · + X_N and S^{(i)} = X_1 + · · · + X_{N_i} for i ∈ {1, . . . , l}, respectively.

(a) Assume^21 that there exist k ∈ N_0 and a_1, . . . , a_l, b_1, . . . , b_l ∈ R such that

    q_n = ∑_{i=1}^{l} (a_i + b_i/n) q_{i,n−i}   for all n ∈ N with n ≥ k + l   (5.27)

and all probabilities not used on the right-hand side of (5.27) are zero, i.e.

    q_{i,0} = · · · = q_{i,k+l−i−1} = 0   for all i ∈ {1, . . . , min(l, k + l − 1)}.   (5.28)

Then, for every n ∈ N_0^d \ {0} and c_n ∈ R^d with ⟨c_n, n⟩ ≠ 0,

    p_n = ∑_{j=1}^{k+l−1} P[S_j = n] q_j + ∑_{i=1}^{l} ∑_{j ∈ N_0^d, j ≤ n} ( a_i + b_i ⟨c_n, j⟩/(i ⟨c_n, n⟩) ) P[S_i = j] p_{i,n−j},   (5.29)

and p_0 is given by (5.3).

(b) Assume that there exist ν_1, . . . , ν_l ∈ [0, 1] with ν_1 + · · · + ν_l ≤ 1 such that q_n = ∑_{i=1}^{l} ν_i q_{i,n} for all n ∈ N. Then p_n = ∑_{i=1}^{l} ν_i p_{i,n} for all n ∈ N_0^d \ {0}.
Proof of Theorem 5.8. If (qn)n∈N0 is the Panjer(a, b, k) distribution, then Theo-
rem 5.26(a) is applicable by choosing l = 1 and q1,n = qn for all n ∈ N0, which
implies pn = p1,n for all n ∈ Nd0. Using q1 = · · · = qk−1 = 0, which implies (5.28),
and solving (5.29) for pn yields (5.4).
Proof of Theorem 5.26. (a) We extend a standard proof (cf. [39, Theorem 3.3.9]
for the case k = 0 and l = 1) with the idea from [47] for the d-dimensional setting.
21In these lecture notes, we only apply this case with l = 1.
To prove the representation for the initial value given in (5.3), note that {S = 0, N = 0} = {N = 0}. Hence

    p_0 = P[S = 0] = P[S = 0, N = 0] + P[S = 0, N ≥ 1],

where P[S = 0, N = 0] = P[N = 0] = q_0 and where the second summand vanishes if P[X_1 = 0] = 0.
If P[X1 = 0] > 0, then we use independence of N and (Xn)n∈N as well as the
We now prove (5.29) for fixed n ∈ N_0^d \ {0} and c ∈ R^d satisfying ⟨c, n⟩ ≠ 0. For this we need a preparation. Fix i ∈ {1, . . . , l}. For every m ∈ N with m ≥ i, we use the representations S_m = X_1 + · · · + X_m = S_{m−i} + S_{i,m} with S_{i,m} := X_{m−i+1} + · · · + X_m and independent and identically distributed X_1, . . . , X_m. If P[S_m = n] > 0, then we obtain that

    ⟨c, n⟩ = E[⟨c, S_m⟩ | S_m = n] = ∑_{j=1}^{m} E[⟨c, X_j⟩ | S_m = n] = m E[⟨c, X_m⟩ | S_m = n] = (m/i) E[⟨c, S_{i,m}⟩ | S_m = n],

hence

    a_i + b_i/m = E[ a_i + b_i ⟨c, S_{i,m}⟩/(i ⟨c, n⟩) | S_m = n ]
                = ∑_{j ∈ N_0^d, j ≤ n} ( a_i + b_i ⟨c, j⟩/(i ⟨c, n⟩) ) P[S_{i,m} = j | S_m = n].   (5.31)
For every m ≥ i we know that Sm−i and Si,m are independent, hence
We now rewrite p_n = P[S = n] using (5.27) as follows:

    p_n = ∑_{m ≥ 1, q_m > 0} P[S_m = n | N = m] P[N = m]
        = ∑_{m=1}^{k+l−1} P[S_m = n] q_m + ∑_{m=k+l}^{∞} ∑_{i=1}^{l} (a_i + b_i/m) P[S_m = n] q_{i,m−i},   (5.33)

where P[S_m = n | N = m] = P[S_m = n] by independence, P[N = m] = q_m, and the last double series is denoted by (∗).
Inserting (5.31) and (5.32) yields for the series

    (∗) = ∑_{m=k+l}^{∞} ∑_{i=1}^{l} ∑_{j ∈ N_0^d, j ≤ n} ( a_i + b_i ⟨c, j⟩/(i ⟨c, n⟩) ) P[S_i = j] P[S_{m−i} = n − j] q_{i,m−i}
        = ∑_{i=1}^{l} ∑_{j ∈ N_0^d, j ≤ n} ( a_i + b_i ⟨c, j⟩/(i ⟨c, n⟩) ) P[S_i = j] ∑_{m=k+l}^{∞} P[S_{m−i} = n − j] q_{i,m−i},

where the last series is abbreviated by (∗∗) and where the rearrangement from the first to the second line is admissible, because the series in the second line converge for every i ∈ {1, . . . , l} and j ∈ {0, . . . , n}. Using (5.28), the index shift m − i ⇝ m, and similar arguments as for (5.33), we get for these series

    (∗∗) = ∑_{m=i}^{∞} P[S_{m−i} = n − j] q_{i,m−i} = ∑_{m=0}^{∞} P[S_m = n − j, N_i = m] = P[S^{(i)} = n − j] = p_{i,n−j}.

Substituting (∗∗) into (∗) and this result into (5.33) gives (5.29).
(b) Modifying the calculation in (5.33) using independence of {S_m = n} and {N = m} and the formula P[N = m] = ∑_{i=1}^{l} ν_i P[N_i = m] for m ∈ N, we obtain

    p_n = ∑_{m=1}^{∞} P[S_m = n, N = m] = ∑_{m=1}^{∞} P[S_m = n] P[N = m]
        = ∑_{i=1}^{l} ν_i ∑_{m=1}^{∞} P[S_m = n] P[N_i = m] = ∑_{i=1}^{l} ν_i p_{i,n}

for every n ∈ N_0^d \ {0}.
The following corollary of Theorem 5.26(b) is useful when only a k-truncation of a probability distribution is a Panjer(a, b, k) distribution. It is the multivariate extension of [21, Corollary 4.7].

Corollary 5.27. Assume that (q_n)_{n∈N_0} has mass at or above k ∈ N and that (q̄_n)_{n∈N_0} denotes its k-truncated probability distribution according to Definition 5.3. Assume that N respectively N̄ have these distributions, and that S = X_1 + · · · + X_N and S̄ = X_1 + · · · + X_N̄ are the corresponding random sums with distributions (p_n)_{n∈N_0^d} and (p̄_n)_{n∈N_0^d}. Then p_0 is given by (5.3) and

    p_n = ∑_{i=1}^{k−1} P[S_i = n] q_i + ( 1 − ∑_{j=0}^{k−1} q_j ) p̄_n,   n ∈ N_0^d \ {0}.   (5.34)

Proof. Apply Theorem 5.26(b) with l = k, ν_i = q_i and q_{i,i} = 1 for i ∈ {1, . . . , k − 1}, ν_k = 1 − (q_0 + · · · + q_{k−1}), q_{k,n} = q̄_n for all n ≥ k, and all other q_{i,n} = 0.
5.3 Numerically Stable Algorithm for ExtNegBin
Remark 5.28. As noticed in Example 5.22, the Panjer algorithm for the extended
negative binomial distribution can be numerically unstable due to cancellations.
To show that this is a real danger, let us consider the following example. Take
k ∈ N and ε, p ∈ (0, 1), define α = −k + ε and let (q_n)_{n∈N_0} denote the distribution of N ∼ ExtNegBin(α, k, p) given by (5.19). Choose l ∈ N with l ≥ 3 and P[X_1 = 1] = P[X_1 = l] = 1/2 as one-dimensional loss distribution. Note that

    p_k = P[N = k, X_1 = · · · = X_k = 1] = q_k/2^k

and

    p_{k+l−1} = ∑_{j=1}^{k} P[N = k, X_j = l, X_i = 1 for all i ∈ {1, . . . , k} \ {j}]
                + P[N = k + l − 1, X_1 = · · · = X_{k+l−1} = 1]
              = k q_k/2^k + q_{k+l−1}/2^{k+l−1}.

Recall from Example 5.22 that the frequency distribution ExtNegBin(α, k, p) is the Panjer(p, p(α − 1), k) distribution. Note that S_k takes values in the set {k + j(l − 1) | j = 0, . . . , k}, which does not contain k + l, hence the Panjer recursion formula (5.4) for p_{k+l} reduces to

    p_{k+l} = ∑_{j=1}^{k+l} p ( 1 + ((α − 1)/(k + l)) j ) P[X_1 = j] p_{k+l−j}.

Since P[X_1 = j] ≠ 0 only for j ∈ {1, l}, this simplifies to two summands, i.e.,

    p_{k+l} = p ( 1 + (α − 1)/(k + l) ) p_{k+l−1}/2 + p ( 1 + ((α − 1)/(k + l)) l ) p_k/2
            = p ((k(l − 1) + εk)/(k + l)) ( q_k/2^{k+1} + q_{k+l−1}/(k 2^{k+l}) )
              − p ((k(l − 1) − εl)/(k + l)) q_k/2^{k+1},

hence severe cancellation occurs for p_{k+l} when ε is small and q_{k+l−1} ≪ 2^{l−1} k q_k.
For example, the values ε = 10−4, k = 1, l = 5 and p = 9/10 give
p6 ≈ 0.14999262− 0.14997009 = 0.00002253,
hence we lose four significant digits in this case.
Following [21, Section 5.1], we now develop a numerically stable algorithm to compute the distribution (p_n)_{n∈N_0^d} of S = X_1 + · · · + X_N when N has an extended negative binomial distribution. The main ingredient is the following corollary of Theorem 5.26(a) for the case l = 1 (we will omit the index 1 for simplicity).
Corollary 5.29. For the parameters k ∈ N_0, α ∈ (−k, −k + 1) and p ∈ (0, 1], with p ≠ 1 for k = 0, let (q̄_n)_{n∈N_0} denote the ExtNegBin(α − 1, k + 1, p) distribution and (q_n)_{n∈N_0} the ExtNegBin(α, k, p) distribution, where ExtNegBin(α, 0, p) stands for the negative binomial distribution NegBin(α, p). Then (5.27) holds for (q̄_n) with l = 1 and q_{1,n} = q_n for n ≥ k + 1. The constants are given by a = 0 and

    b = (α − 1) p ( q^{−α} − ∑_{j=0}^{k−1} C(α + j − 1, j) p^j ) / ( q^{1−α} − ∑_{j=0}^{k} C(α + j − 2, j) p^j ),   (5.35)

hence (5.29) simplifies to the numerically stable weighted convolution

    p̄_n = (b/n_i) ∑_{j ∈ N_0^d, j ≤ n, j_i > 0} j_i P[X_1 = j] p_{n−j}   (5.36)

for every n = (n_1, . . . , n_d) ∈ N_0^d \ {0}, where i ∈ {1, . . . , d} is chosen such that n_i ≠ 0 and where (p̄_n)_{n∈N_0^d} and (p_n)_{n∈N_0^d} denote the compound distributions corresponding to ExtNegBin(α − 1, k + 1, p) and ExtNegBin(α, k, p), respectively. The initial value p̄_0 is given by (5.3) with the probability-generating function from (5.23) with parameters α and k replaced by α − 1 and k + 1, respectively.
Proof. Using (5.19), we see that, for every n ≥ k + 1,

    C((α − 1) + n − 1, n) p^n = ((α − 1)p/n) C(α + (n − 1) − 1, n − 1) p^{n−1},

hence q̄_n = b q_{n−1}/n and Theorem 5.26(a) is applicable.
The case k = 0, p = 1 is excluded in the preceding corollary. We cannot reduce the calculation for a claim number N ∼ ExtNegBin(α − 1, k + 1, p) to the one for N ∼ ExtNegBin(α, k, p) in this case, because the negative binomial distribution is not defined for p = 1. However, a suitable limit p ↗ 1 gives the following numerically stable procedure.
Lemma 5.30 (Stable recursion for ExtNegBin(α − 1, 1, 1)). For α ∈ (0, 1) consider a claim number N ∼ ExtNegBin(α − 1, 1, 1). Then the distribution (p_n)_{n∈N_0^d} of the random sum S = X_1 + · · · + X_N can be calculated by p_0 = 1 − (P[X_1 ≥ 1])^{1−α} and, for every n = (n_1, . . . , n_d) ∈ N_0^d \ {0} with i ∈ {1, . . . , d} chosen such that n_i ≠ 0,

    p_n = ((1 − α)/n_i) ∑_{j ∈ N_0^d, 0 < j ≤ n} j_i P[X_1 = j] r_{n−j}   if P[X_1 ≥ 1] > 0,

and p_n = 0 if P[X_1 ≥ 1] = 0. In the case P[X_1 ≥ 1] > 0 the non-negative sequence (r_n)_{n∈N_0^d} is defined by r_0 = (P[X_1 ≥ 1])^{−α} and recursively in a numerically stable way by

    r_n = (1/(n_i P[X_1 ≥ 1])) ∑_{j ∈ N_0^d, 0 < j ≤ n} (α j_i + n_i − j_i) P[X_1 = j] r_{n−j}

for every n = (n_1, . . . , n_d) ∈ N_0^d \ {0}, where i ∈ {1, . . . , d} is chosen such that n_i ≠ 0.
Proof. It suffices to consider the non-trivial case P[X_1 ≥ 1] > 0. We start with p ∈ (0, 1) and let (p_n(p))_{n∈N_0^d} denote the distribution of S = X_1 + · · · + X_N, where N ∼ NegBin(α, p), and (p̄_n(p))_{n∈N_0^d} the distribution of S̄ = X_1 + · · · + X_N̄, where N̄ ∼ ExtNegBin(α − 1, 1, p). Since NegBin(α, p) is the Panjer(p, (α − 1)p, 0) distribution, a recursion for the auxiliary sequence

    r_n(p) := (1 − p)^{−α} p_n(p),   n ∈ N_0^d,   (5.37)

follows from Panjer’s recursion (5.15) for (p_n(p))_{n∈N_0^d}, namely

    r_n(p) = (p/(n_i (1 − p P[X_1 = 0]))) ∑_{j ∈ N_0^d, 0 < j ≤ n} (α j_i + n_i − j_i) P[X_1 = j] r_{n−j}(p)   (5.38)

for every n = (n_1, . . . , n_d) ∈ N_0^d \ {0} and i ∈ {1, . . . , d} satisfying n_i ≠ 0 and with starting value

    r_0(p) = (1 − p P[X_1 = 0])^{−α},   (5.39)

which follows from the initial value (5.14). The weighted convolution (5.36) becomes

    p̄_n(p) = ((1 − p)^α b(p)/n_i) ∑_{j ∈ N_0^d, j ≤ n, j_i > 0} j_i P[X_1 = j] r_{n−j}(p)   (5.40)

for every n = (n_1, . . . , n_d) ∈ N_0^d \ {0} and i ∈ {1, . . . , d} satisfying n_i ≠ 0, with b(p) := (1 − α) p (1 − p)^{−α}/(1 − (1 − p)^{1−α}) from (5.35) and starting value

    p̄_0(p) = ( 1 − (1 − p P[X_1 = 0])^{1−α} ) / ( 1 − (1 − p)^{1−α} )   (5.41)

given by (5.3) with the probability-generating function from (5.24). The normalization in (5.37) is chosen so that we can take the limit p ↗ 1 in (5.38)–(5.41); in particular, (1 − p)^α b(p) tends to 1 − α. With r_n := lim_{p↗1} r_n(p) and p_n := lim_{p↗1} p̄_n(p), the lemma follows.
Algorithm 5.31. Corollary 5.29 and Lemma 5.30 lead to the following numerically
stable algorithm for the calculation of the distribution of the aggregate loss in
the collective risk model S = X1 + · · ·+XN , where N ∼ ExtNegBin(α, k, p) with
k ∈ N, α ∈ (−k,−k + 1) and p ∈ (0, 1]:
• If p < 1, perform a stable Panjer recursion according to Theorem 5.8 for
N ∼ NegBin(α+ k, p), followed by a stable weighted convolution according
to Corollary 5.29 to pass to N ∼ ExtNegBin(α+ k − 1, 1, p).
• If p = 1, use Lemma 5.30 to calculate the distribution of the compound
sum S for N ∼ ExtNegBin(α+ k − 1, 1, p).
Calculate k − 1 weighted convolutions according to (5.36) to pass iteratively to
N ∼ ExtNegBin(α+ k − 2, 2, p), . . ., and finally to N ∼ ExtNegBin(α, k, p).
Remark 5.32. Of course, compared to the ordinary (but possibly unstable) Panjer
recursion of Theorem 5.8, Algorithm 5.31 increases the numerical effort by a
factor of k + 1. Note that the weighted convolution in (5.36) is not a recurrence,
hence unavoidable rounding errors do not propagate as in a recursive calculation.
5.4 Numerically Stable Algorithm for ExtLog
Similar results as in the previous subsection can be obtained for the extended
logarithmic distribution.22
Corollary 5.33 ([21, Corollary 5.4]). For the parameters k ∈ N and p ∈ (0, 1] with p < 1 in case k = 1, let (q̄_n)_{n∈N_0} denote the ExtLog(k + 1, p) distribution and (q_n)_{n∈N_0} the ExtLog(k, p) distribution, where ExtLog(1, p) stands for Log(p). Then (5.27) holds for (q̄_n) with l = 1 (we drop this index for convenience) and q_{1,n} = q_n for n ≥ k + 1. The constants are given by a = 0 and

    b = (k + 1) p ( ∑_{l=k}^{∞} C(l, k)^{−1} p^l ) / ( ∑_{l=k+1}^{∞} C(l, k + 1)^{−1} p^l ),   (5.42)

hence (5.29) simplifies to the numerically stable weighted convolution (5.36) and p̄_0 is given by (5.3).
Exercise 5.34. Use Theorem 5.26(a) to prove Corollary 5.33.
In the excluded case (k, p) = (1, 1), we cannot reduce the calculation for N ∼ ExtLog(2, p) to that for N ∼ ExtLog(1, p) = Log(p), because the logarithmic
distribution from Example 4.4 is not defined for p = 1. Fortunately, a similar
limit consideration as for the extended negative binomial distribution works.
Lemma 5.35 (Multi-dimensional version of [21, Lemma 5.5], stable recursion for ExtLog(2, 1)). Assume that N ∼ ExtLog(2, 1). Then the distribution (p_n)_{n∈N_0^d} of the random sum S = X_1 + · · · + X_N can be calculated by

    p_0 = P[X_1 = 0] + P[X_1 ≥ 1] log P[X_1 ≥ 1]

with the convention 0 log 0 = 0, and, for every n = (n_1, . . . , n_d) ∈ N_0^d \ {0} and i ∈ {1, . . . , d} satisfying n_i ≠ 0,

    p_n = (1/n_i) ∑_{j ∈ N_0^d, 0 < j ≤ n} j_i P[X_1 = j] r_{n−j}   if P[X_1 ≥ 1] > 0,

and p_n = 0 if P[X_1 ≥ 1] = 0, where for the case P[X_1 ≥ 1] > 0 the non-negative sequence (r_n)_{n∈N_0^d} is defined by r_0 = −log P[X_1 ≥ 1] and recursively in a numerically stable way by

    r_n = (1/P[X_1 ≥ 1]) ( P[X_1 = n] + (1/n_i) ∑_{j ∈ N_0^d \ {0}, j < n, j_i < n_i} (n_i − j_i) P[X_1 = j] r_{n−j} )

for every n = (n_1, . . . , n_d) ∈ N_0^d \ {0} and i ∈ {1, . . . , d} satisfying n_i ≠ 0.

22 The results of this subsection will not be used in the remaining part of the lecture notes.
Exercise 5.36. Prove Lemma 5.35. Hints: For p ∈ (0, 1) consider N ∼ Log(p), let (p_n(p))_{n∈N_0^d} denote the distribution of S = X_1 + · · · + X_N, and let (p̄_n(p))_{n∈N_0^d} denote the distribution of S̄ = X_1 + · · · + X_N̄, where N̄ ∼ ExtLog(2, p). Define the auxiliary sequence

    r_n(p) := −p_n(p) log(1 − p),   n ∈ N_0^d,

and proceed in a similar way as in the proof of Lemma 5.30. Consider the limit p ↗ 1 at the end.
6 Extensions of CreditRisk+
Note that the extended multi-period CreditRisk+ framework presented here can
also be seen as a multi-period multi-business-line extension of the collective risk
model from actuarial science.
6.1 Introduction
With the tools developed above we can now introduce the CreditRisk+ framework
and its extensions. First some general notes:
• The original CreditRisk+ framework was developed by Credit Suisse First
Boston (CSFB) [11].
• It is a one-period actuarial model for the aggregation of credit risks.
• It is based on the Poisson approximation of individual defaults, utilizing a trade-off effect occurring in sums, cf. Remark 3.30.
• One of the big advantages of the model is that the probability-generating
function of the loss distribution is available in closed form.
• Extending the Poisson mixture model, several independent and gamma-
distributed default causes as well as deterministic exposures are taken into
account.
• The model does not call for Monte Carlo methods, hence the output is
completely determined by the input data without any variations due to
different simulation runs.
The extensions presented here include:
• The individual exposures of obligors are allowed to be d-dimensional random
vectors making a multi-period model possible.
• Risk groups of obligors and corresponding, possibly stochastically dependent
exposures can be handled.
• Default causes don’t need to be independent, they are allowed to have a spe-
cial but flexible dependence structure, given by scenarios and independent
risk factors.
• The distributions of the risk factors are not restricted to gamma dis-
tributions, instead also more flexible distributions like tempered stable
distributions can be used.
• At least for gamma-distributed risk factors, the risk contributions of indi-
vidual obligors can be calculated.
• The probability distribution of the portfolio loss can be derived with a
numerically stable algorithm, even with all the mentioned extensions.
Note that, due to stochastic exposures, the risk of a downgraded credit rating
can easily be incorporated in the extended version of CreditRisk+. Using risk
groups, even joint downgrades can be modelled.
Remark 6.1 (Multi-period extension). The extension to several periods can be
used in various ways and is also applicable in actuarial mathematics.
(a) If there are d periods, it is of importance to know in which period an obligor defaults. For example, an early default might cause liquidity problems for the lender, because the write-off is required early. Furthermore, the size of the loss given default can depend on the time of the default, in particular when a loan or a mortgage is amortized during its life span and not at maturity.
(b) A two-period model is of interest for a portfolio of credit guarantees. Here
the default probability (or intensity) only refers to defaults happening
during the first period, and the first component for the losses refers to the
payout during this period. The second component of the losses models
the payment obligations after the first period, it would correspond to the
actuarial reserves to be built up at the end of the first period.
(c) In an insurance context, the d components can represent different types of
claim payments. For a portfolio of health insurance contracts, this can be
costs of medical treatments and allowances for missing income of the insured.
For a portfolio of personal liability or automobile collision insurances, these
can be claims for bodily injuries and property damages.
(d) In the context of stochastic claims reserving (see [57] for a textbook pre-
sentation), the d periods can represent the development years. Here the
default probability (or intensity) refers to the claims originating from the
initial insured period; the claims may be reported at a later period and
payments may be spread out during the remaining periods of the model.
6.2 Description of the Model
We now assemble the necessary input parameters and the notation of the extended
CreditRisk+ methodology.
6.2.1 Input Parameters
Our extended version of CreditRisk+ needs the following input parameters:
• The number m ∈ N of obligors,
• the number d ∈ N of periods,
• the basic loss units E1, . . . , Ed > 0 for the d periods,
• the number C ∈ N of non-idiosyncratic default causes,
• the number K ∈ N of independent risk factors,
• the parameters specifying the gamma distributions or the tempered stable
distributions of the independent risk factors R1, . . . , RK ,
• a non-empty finite set J of dependence scenarios,
• a probability distribution on the set J of dependence scenarios,
• for each dependence scenario j ∈ J a matrix A_j = (a^j_{c,k})_{c∈{0,...,C}, k∈{0,...,K}} of size (C + 1) × (K + 1) with non-negative entries, where

    a^j_{0,k} = 0   for all j ∈ J and k ∈ {1, . . . , K},   (6.1)
• the collection G of non-empty subsets of the set {1, . . . , m} of all obligors, called the risk groups, which are subject to joint defaults.
For every group g ∈ G we need
• the d-period default probability pg ∈ [0, 1],
and then, for every dependence scenario j ∈ J ,
• the susceptibility w_{0,g,j} ∈ [0, 1] to idiosyncratic default,

• the susceptibilities w_{c,g,j} ∈ [0, 1] to default causes c ∈ {1, . . . , C},

• the multivariate probability distributions Q_{c,g,j} = (q_{c,g,j,µ})_{µ∈(N_0^d)^g} on (N_0^d)^g describing the stochastic losses in d periods of all the obligors i ∈ g in multiples of the basic loss units E_1, . . . , E_d in case the risk group g defaults due to cause c ∈ {0, . . . , C}.
Assumption 6.2. Every obligor i ∈ {1, . . . , m} belongs to at least one group g ∈ G. Let G_i := {g ∈ G | i ∈ g} denote the set of all groups to which obligor i ∈ {1, . . . , m} belongs; by assumption G_i ≠ ∅.
Remark 6.3. While Assumption 6.2 is not necessary for the algorithm, it is useful
to check the proper set-up of the model. If an obligor is not contained in any
risk group, then a default is impossible and the obligor could be left out from the
credit risk model.
Assumption 6.4. For each group g ∈ G and each scenario j ∈ J, the susceptibilities (also called weights) exhaustively describe the default causes. That is, for every g ∈ G and j ∈ J,

    ∑_{c=0}^{C} w_{c,g,j} = 1.   (6.2)
Remark 6.5. Assumption 6.4 is useful for the interpretation of the default proba-
bility pg and the default intensity λg for every risk group g ∈ G in every scenario
j ∈ J , but the assumption is not necessary for the algorithm itself. See also the
normalization in Assumption 6.35 below.
The idea of risk groups modelling joint defaults is motivated by the common
Poisson shock models discussed by Lindskog and McNeil [36]. The idea to have
different scenarios comes from [45], it originates from the desire to make negatively
correlated default causes possible, see Example 6.38 below.
Remark 6.6 (Classical CreditRisk+ model). The classical CreditRisk+ model is contained in the above set-up by choosing G = {{1}, {2}, . . . , {m}}, that means the only risk groups are the individual obligors. In this case Q_{c,{i},j} denotes the univariate distribution of the stochastic loss given default of obligor i ∈ {1, . . . , m} due to cause c ∈ {0, . . . , C} in scenario j ∈ J. Note also that in the classical CreditRisk+ model there is just one scenario, i.e. |J| = 1, one period, i.e. d = 1, and risk causes and risk factors are identified, which corresponds to A_j being the identity matrix. Furthermore, all loss distributions Q_{c,{i},j} are one-dimensional and degenerate, which corresponds to deterministic one-period losses given default. Therefore, the classical CreditRisk+ model doesn’t even contain the collective model from actuarial mathematics.
Remark 6.7 (Directly dependent defaults). Suppose obligor i ∈ {1, . . . , m} is a large factory and obligors i_1, . . . , i_l ∈ {1, . . . , m} are suppliers of i, being economically heavily dependent on the factory. If the factory i defaults and is subsequently closed, the suppliers i_1, . . . , i_l have a high probability to default, too. Therefore, {i, i_1, . . . , i_l} is certainly a meaningful risk group. Of course, G should also contain {i}, because i could default and subsequently be taken over by a competitor running its production in the factory. Also {i_1}, . . . , {i_l} ∈ G makes sense, because every supplier can individually default due to poor management and subsequently be replaced by a competing supplier. Note that different distributions Q_{c,g,j} of the (N_0^d)^g-valued loss vectors given default due to cause c ∈ {0, . . . , C} in scenario j ∈ J can be specified for the big risk group g = {i, i_1, . . . , i_l} and for the individual obligors represented by g = {i} and g = {i_1}, . . . , {i_l}.
Remark 6.8 (Hindering defaults, competition groups). Suppose that the obligors i_1, . . . , i_l ∈ {1, . . . , m} are direct competitors in the market (e.g. airline companies), and a default of one of them may hinder a default of the others during the d periods, because they can take over the market share of the defaulting obligor and are then economically better off; they may even raise prices. To include this effect in the model, define a risk group g = {i_1, . . . , i_l} with a default probability p_g and choose the multivariate loss distribution Q_{c,g,j} = (q_{c,g,j,µ})_{µ∈(N_0^d)^g} in such a way that q_{c,g,j,µ} = 0 for every integer vector µ = (µ_{i_1}, . . . , µ_{i_l}) where two or more of the components µ_{i_1}, . . . , µ_{i_l} ∈ N_0^d representing the losses during the d periods are different from 0 ∈ N_0^d. This means, in case of a default of risk group g due to cause c ∈ {0, . . . , C} in scenario j ∈ J, that only one of the obligors in the group g causes a loss, and the distribution of this loss can of course depend on the obligor, on the cause c and on the scenario j.
Remark 6.9 (Examples of default causes). Default causes make it possible to build in joint variations of default intensities for risk groups (and individual obligors); these variations jointly improve or degrade the credit quality of these groups/obligors. Default causes can be industry sectors, individual countries, currency regions (e.g. Euro zone), geographic regions (e.g. North Africa, Latin America), religious regions (e.g. Islamic countries), economic regions (e.g. southern Europe, petroleum exporting countries (OPEC)), or represent exposure to macroeconomic indices like exchange rates, interest rates, the business cycle, unemployment rates, real estate prices, interest rate changes and divorce rates (for modelling the risk of mortgages, cf. [12, 13]), and so on. Note that these default causes need not be stochastically independent; this dependence is handled separately by the dependence scenarios and the matrices A_j with j ∈ J.
Remark 6.10 (Hierarchically ordered default causes). For a worldwide diversified credit risk portfolio, it is a good idea to start with default cause intensities ordered in a hierarchical way:

(a) Worldwide, continental or multi-national causes, like the state of the economy in developed countries, international political or military conflicts, energy prices, a crisis due to excessive national debt in the European Union, turmoil in Arabic countries, . . .

(b) Default causes for every country, modelling an economic crisis, the burst of a real-estate bubble, political turmoil, civil war, transfer risk, convertibility of the local currency, international sanctions, natural or man-made disasters, . . .

(c) Local, industry-sector-specific causes within every country, like agriculture, mining, manufacturing, transport, the financial and insurance industry, etc., where the granularity depends on the individual needs.
6.2.2 Stochastic Rounding
While losses are certainly multiples of one cent, the computation time required
for this precision normally forces us to use basic loss units E1, . . . , Ed of a larger
size like 100 000 Euro. Then, however, losses are in general not integer multiples
of this quantity and some rounding is required. Deterministic rounding with
the aforementioned basic loss unit would round, for example, every loss below
50 000 Euro to zero, which is certainly not acceptable since it ignores the risk.
The idea of stochastic rounding is to keep at least the expected loss constant.
Hence, for example, a loss of 150 000 Euro happening with probability p should
be turned into two losses of sizes 100 000 and 200 000 Euros, respectively, each
one happening with probability p/2. This idea, generalized to higher dimensions
and mixed moments, is the content of the next lemma.
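In dimension d = 1 and for a deterministic loss, this expectation-preserving rounding can be sketched in a few lines (a minimal illustration; the function name is ours and not part of the notes):

```python
# A minimal sketch of expectation-preserving ("stochastic") rounding for
# d = 1: a deterministic loss x, measured in basic loss units, is split
# between the two neighbouring integers so that the mean is unchanged.
import math

def stochastic_rounding_pmf(x):
    """Return {n: p_n} with p_n = (1 - |x - n|)^+ for a deterministic loss x >= 0."""
    lo = math.floor(x)
    pmf = {}
    p_hi = x - lo                 # weight pushed to the upper neighbour
    if p_hi > 0:
        pmf[lo + 1] = p_hi
    pmf[lo] = 1.0 - p_hi          # remaining weight on the lower neighbour
    return pmf

# A loss of 150 000 Euro in units of 100 000 Euro splits evenly:
pmf = stochastic_rounding_pmf(1.5)
assert sum(n * p for n, p in pmf.items()) == 1.5   # expectation preserved
```

For a random loss X, the lemma below averages these weights over the distribution of X.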
Lemma 6.11 (Stochastic rounding). Let X = (X_1, . . . , X_d) be an R^d-valued random vector. Define
\[
p_n = E\Bigl[\,\prod_{i=1}^{d} \bigl(1 - |X_i - n_i|\bigr)^+\Bigr], \qquad n = (n_1, \ldots, n_d) \in \mathbb{Z}^d, \tag{6.3}
\]
where x^+ := max{x, 0} for all x ∈ R. Then the following holds:

(a) (p_n)_{n∈Z^d} is a probability mass function.

(b) If all components of X are almost surely non-negative, then (p_n)_{n∈N_0^d} is a
losses are deterministic, then the losses are independent and the distribution of the (N_0^d)^g-valued group loss vector and, therefore, the distribution Q^s_{c,g,j} from (6.8) and (6.25) are uniquely determined. If at least two individual loss vectors are non-deterministic, then their joint distribution on (N_0^d)^g is not uniquely determined and can only be computed under additional assumptions. We treat the case of independent loss vectors in Example 6.20. For d = 1, we treat the case of comonotonic losses in Example 6.21, and the mixture of independent and comonotonic losses in Example 6.22. In applications, it remains to decide whether the marginal distributions of the group loss vector should equal the distributions of the loss vectors of the individual obligors and whether the additional assumption is a good approximation of economic reality.
Example 6.20 (Independent losses within a risk group). Given a risk group g ∈ G with at least two obligors, a scenario j ∈ J and a default cause c ∈ {0, . . . , C}, we can consider independent N_0^d-valued loss vectors (L_{c,g,i,j,n})_{i∈g} of the obligors in g given default of the group, with L_{c,g,i,j,n} ∼ Q_{c,g,i,j} = (q_{c,g,i,j,ν})_{ν∈N_0^d} for every i ∈ g and n ∈ N. In this case Q_{c,g,j} = (q_{c,g,j,µ})_{µ∈(N_0^d)^g} is given by
\[
q_{c,g,j,\mu} = P[L_{c,g,i,j,1} = \mu_i \text{ for all } i \in g] = \prod_{i \in g} \underbrace{P[L_{c,g,i,j,1} = \mu_i]}_{=\, q_{c,g,i,j,\mu_i}} \tag{6.28}
\]
for every µ = (µ_i)_{i∈g} ∈ (N_0^d)^g. The distribution Q^s_{c,g,j} = (q^s_{c,g,j,ν})_{ν∈N_0^d} from (6.25) for the group loss is then the convolution of the Q_{c,g,i,j} with i ∈ g, explicitly
\[
q^s_{c,g,j,\nu} = \sum_{\substack{\mu = (\mu_i)_{i \in g} \in (\mathbb{N}_0^d)^g \\ \sum_{i \in g} \mu_i = \nu}} \;\prod_{i \in g} q_{c,g,i,j,\mu_i}, \qquad \nu \in \mathbb{N}_0^d. \tag{6.29}
\]
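For a risk group of two obligors and d = 1, the convolution (6.29) can be sketched as follows (both loss distributions are made-up illustrative numbers):

```python
# Convolution (6.29) for a risk group of two obligors with independent
# one-period losses (d = 1); the loss pmfs below are assumed for illustration.
def convolve(p, q):
    """Convolution of two pmfs given as dicts {loss: probability}."""
    out = {}
    for a, pa in p.items():
        for b, qb in q.items():
            out[a + b] = out.get(a + b, 0.0) + pa * qb
    return out

q1 = {1: 0.7, 2: 0.3}        # hypothetical loss pmf of the first obligor
q2 = {0: 0.5, 3: 0.5}        # hypothetical loss pmf of the second obligor
group = convolve(q1, q2)     # pmf of the group loss
assert abs(sum(group.values()) - 1.0) < 1e-12
```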
Example 6.21 (Comonotonic one-period losses within a risk group). Given a risk group g ∈ G with at least two obligors, a scenario j ∈ J and a default cause c ∈ {0, . . . , C}, we can consider comonotonic N_0-valued losses (L_{c,g,i,j,n})_{i∈g} of the obligors in g given default of the group, with L_{c,g,i,j,n} ∼ Q_{c,g,i,j} = (q_{c,g,i,j,ν})_{ν∈N_0} for every i ∈ g and n ∈ N. Let
\[
F_{c,g,i,j}(\mu_i) = \sum_{\nu=0}^{\mu_i} q_{c,g,i,j,\nu}, \qquad \mu_i \in \mathbb{N}_0,
\]
denote the discrete distribution function of Q_{c,g,i,j} for i ∈ g. In this case the distribution Q^c_{c,g,j} = (q^c_{c,g,j,µ})_{µ∈N_0^g}, where the superscript reminds of comonotonicity,
Remark 6.23 (Obligors with a credit guarantee^{24}). Suppose a bank, a regional authority or a country, let's call it obligor a ∈ {1, . . . , m}, gives a credit guarantee to all obligors of a group g ⊂ {1, . . . , m} \ {a} and possibly also issues a bond on its own. A default of institution a can cause a substantial loss, because all its credit guarantees become worthless and defaults of obligors in g cause greater losses. To model this concentration of risk, there are several options:
(a) A rough solution is to take, for every obligor i ∈ g, every risk group h ∈ G_i to which i belongs, every default cause c ∈ {0, . . . , C} and every scenario j ∈ J, as loss distribution Q_{c,h,j} a mixture of two distributions, the first corresponding to the loss given that the guarantee for i is in place, and the second corresponding to the loss given that the guarantor a defaulted before or together with i. The weights for these mixtures have to be chosen appropriately. Note that this modelling approach can be set up such that the expected loss is the right one and the computational effort is minor. However, it can be only a rough approximation of the loss distribution, because it can ignore a substantial part of the concentration risk arising from a default of guarantor a, while taking the larger losses of the obligors in g into account even without guarantor a actually defaulting.
(b) We can consider a risk group g(a) = {a} ∪ g consisting of the guarantor a and all guarantees, because they may all default together. In the simplest case, the default intensity λ_{g(a)} and the susceptibilities of the risk group g(a) are those of obligor a, who does not appear as a risk group of its own. Of course, a multivariate distribution Q_{c,g(a),j} on (N_0^d)^{g(a)} describing the stochastic loss of all the obligors in g(a) for scenario j ∈ J and default cause c ∈ {0, . . . , C} is needed. The following practical problems come to mind:

• If g is large, think of |g| ≥ 100, then Q_{c,g(a),j} and the corresponding sum Q^s_{c,g(a),j} from (6.8) are computationally hard to calculate. A solution might be to make additional assumptions and apply the extended CreditRisk+ methodology to calculate an approximation of Q^s_{c,g(a),j}.

• It's not apparent how to choose the susceptibilities for the risk group g(a). The default causes for the guarantor a might be disjoint from the default causes of the obligors in g, for example.
^{24}This remark is work in progress.

Assumption 6.24 (Distribution of idiosyncratic default numbers). For each group g ∈ G, the number N_{0,g} of idiosyncratic defaults is, conditioned on J, Poisson distributed with intensity given by the product of the default intensity λ_g, the susceptibility w_{0,g,J} and the matrix entry a^J_{0,0}, i.e.,
\[
\mathcal{L}(N_{0,g} \,|\, J) = \operatorname{Poisson}\bigl(\lambda_g\, w_{0,g,J}\, a^J_{0,0}\bigr) \qquad\text{for every } g \in G. \tag{6.32}
\]
Assumption 6.25 (Conditional independence of idiosyncratic default numbers). Conditioned on J, the group default numbers (N_{0,g})_{g∈G} due to idiosyncratic defaults are independent from one another and everything else,^{25} in particular
\[
P[N_{0,g} = n_{0,g} \text{ for all } g \in G \,|\, J]
= \prod_{g \in G} P[N_{0,g} = n_{0,g} \,|\, J]
= \prod_{g \in G} e^{-\lambda_g w_{0,g,J} a^J_{0,0}}\, \frac{(\lambda_g\, w_{0,g,J}\, a^J_{0,0})^{n_{0,g}}}{n_{0,g}!} \tag{6.33}
\]
for all n_{0,g} ∈ N_0, where we used (6.32) for the second equality.
Assumption 6.26 (Structure of default cause intensities). The default cause intensities Λ_1, . . . , Λ_C are expressed in terms of the random matrix A^J = ∑_{j∈J} A_j 1_{{J=j}} of size (C + 1) × (K + 1) and the non-negative risk factors R_1, . . . , R_K by
\[
\Lambda_c = a^J_{c,0} + \sum_{k=1}^{K} a^J_{c,k}\, R_k, \qquad c \in \{1, \ldots, C\}. \tag{6.34}
\]
Remark 6.27 (Lower bound for default cause intensity). The scenario-dependent but otherwise constant term a^J_{c,0} ≥ 0 in (6.34) is added so that a strictly positive lower bound for the default cause intensity Λ_c can be put into the model, despite mathematically convenient distributions (like gamma distributions, which can attain values arbitrarily close to zero) being used for the risk factors R_1, . . . , R_K.
Remark 6.28. For notational convenience, we will sometimes use a constant ‘risk factor’ R_0 ≡ 1 and a scenario-dependent default cause intensity Λ_0 = a^J_{0,0} for idiosyncratic risk, see (6.1), to write (6.34) more compactly in matrix notation as
\[
\Lambda = A^J R \tag{6.35}
\]
with column random vectors Λ = (Λ_0, . . . , Λ_C)^⊤ and R = (R_0, . . . , R_K)^⊤.
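The compact form (6.35) is easy to sketch numerically. The example below uses a single scenario (so A^J is deterministic) with C = 2 default causes and K = 2 mean-one gamma risk factors; all matrix entries and variances are assumed for illustration, with rows of A summing to 1:

```python
# Numerical sketch of (6.35): Lambda = A^J R for one realized scenario.
# All sizes and entries are illustrative assumptions, not from the notes.
import numpy as np

rng = np.random.default_rng(0)
sigma2 = np.array([0.5, 2.0])                      # assumed Var(R_k)
shape = 1.0 / sigma2                               # alpha_k = e_k^2 / sigma_k^2 with e_k = 1
R = np.concatenate(([1.0], rng.gamma(shape, sigma2)))  # (R_0, R_1, R_2), R_0 = 1
A = np.array([[1.0, 0.0, 0.0],                     # Lambda_0 = a_{0,0} = 1 (idiosyncratic)
              [0.2, 0.8, 0.0],                     # lower bound 0.2 plus 0.8 R_1
              [0.1, 0.0, 0.9]])                    # lower bound 0.1 plus 0.9 R_2
Lam = A @ R                                        # default cause intensities
assert (Lam >= 0).all()
```

Because each row of A sums to 1 and E[R_k] = 1, every intensity has expectation 1, in line with Remark 6.37 below.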
Assumption 6.29 (Conditional distribution of non-idiosyncratic default numbers). For every default cause c ∈ {1, . . . , C} and every group g ∈ G, the non-idiosyncratic default number N_{c,g} is, conditioned on J, R_1, . . . , R_K, Poisson distributed with parameter given as the product of the group default intensity λ_g, the susceptibility w_{c,g,J} and the default cause intensity Λ_c, this means
\[
P[N_{c,g} = n \,|\, J, R_1, \ldots, R_K] \overset{\text{a.s.}}{=} P[N_{c,g} = n \,|\, J, \Lambda_c] \overset{\text{a.s.}}{=} e^{-\lambda_g w_{c,g,J} \Lambda_c}\, \frac{(\lambda_g\, w_{c,g,J}\, \Lambda_c)^n}{n!} \tag{6.36}
\]
for all n ∈ N_0, i.e.,
\[
\mathcal{L}(N_{c,g} \,|\, J, R_1, \ldots, R_K) \overset{\text{a.s.}}{=} \mathcal{L}(N_{c,g} \,|\, J, \Lambda_c) \overset{\text{a.s.}}{=} \operatorname{Poisson}(\lambda_g\, w_{c,g,J}\, \Lambda_c). \tag{6.37}
\]

^{25}This means the random loss vectors in Assumption 6.16, the non-idiosyncratic default numbers (N_{c,g})_{c∈{1,...,C}, g∈G} in Assumption 6.30 and the risk factors R_1, . . . , R_K in Assumption 6.31 below.
Assumption 6.30 (Conditional independence of non-idiosyncratic default numbers). Conditionally on J, R_1, . . . , R_K, the family {N_{c,g} | c ∈ {1, . . . , C}, g ∈ G} of default numbers is independent, hence
\[
P[N_{c,g} = n_{c,g} \text{ for } c \in \{1, \ldots, C\} \text{ and } g \in G \,|\, J, R_1, \ldots, R_K]
\overset{\text{a.s.}}{=} \prod_{c=1}^{C} \prod_{g \in G} P[N_{c,g} = n_{c,g} \,|\, J, R_1, \ldots, R_K]
\overset{\text{a.s.}}{=} \prod_{c=1}^{C} \prod_{g \in G} e^{-\lambda_g w_{c,g,J} \Lambda_c}\, \frac{(\lambda_g\, w_{c,g,J}\, \Lambda_c)^{n_{c,g}}}{n_{c,g}!} \tag{6.38}
\]
for all n_{c,g} ∈ N_0, where the last equality holds by (6.36).
Assumption 6.31 (Independence of risk factors and scenario). The non-negative
risk factors R1, . . . , RK and the scenario variable J are stochastically independent
random variables.
The independence of J and the risk factors R1, . . . , RK is used for the al-
gorithm in (6.89) below. It is also useful for calculating the moments and the
covariances of the default cause intensities, as the following remark shows.
Remark 6.32 (Expectation, variance and covariance of default cause intensities). If R_1, . . . , R_K ∈ L^1(P) and Assumptions 6.26 and 6.31 hold, then
\[
E[\Lambda_c \,|\, J] = a^J_{c,0} + \sum_{k=1}^{K} a^J_{c,k}\, E[R_k], \tag{6.39}
\]
hence
\[
E[\Lambda_c] = E\bigl[a^J_{c,0}\bigr] + \sum_{k=1}^{K} E\bigl[a^J_{c,k}\bigr]\, E[R_k] \tag{6.40}
\]
for every c ∈ {1, . . . , C}. If, in addition, R_1, . . . , R_K ∈ L^2(P), then, for all c, d ∈ {1, . . . , C},
\[
\operatorname{Cov}(\Lambda_c, \Lambda_d \,|\, J) = \sum_{k,l=1}^{K} a^J_{c,k}\, a^J_{d,l} \underbrace{\operatorname{Cov}(R_k, R_l)}_{=\, \delta_{k,l} \operatorname{Var}(R_k)} = \sum_{k=1}^{K} a^J_{c,k}\, a^J_{d,k}\, \operatorname{Var}(R_k), \tag{6.41}
\]
hence, by (3.59) from Lemma 3.48, it follows from (6.39) and (6.41) that
\[
\operatorname{Cov}(\Lambda_c, \Lambda_d) = E\bigl[\operatorname{Cov}(\Lambda_c, \Lambda_d \,|\, J)\bigr] + \operatorname{Cov}\bigl(E[\Lambda_c \,|\, J], E[\Lambda_d \,|\, J]\bigr)
= \sum_{k=1}^{K} E\bigl[a^J_{c,k}\, a^J_{d,k}\bigr] \operatorname{Var}(R_k) + \sum_{k,l=0}^{K} \operatorname{Cov}\bigl(a^J_{c,k}, a^J_{d,l}\bigr)\, e_k\, e_l \tag{6.42}
\]
with e_0 := 1 and e_k := E[R_k] for k ∈ {1, . . . , K}.

Remark 6.33 (Pseudo risk factors). Due to the independence of the risk factors
R_1, . . . , R_K, see Assumption 6.31, it is not always possible to give them an economic interpretation. On the other hand, the distribution of the group losses, see Assumption 6.16, may vary with the default causes and might be determined by the legal contract. Therefore, it can be difficult to set up a dependence structure between the default cause intensities Λ_1, . . . , Λ_C as in (6.34) by economic considerations. A solution is the introduction of a random vector P = (P_0, . . . , P_{K′})^⊤ of pseudo risk factors with an economic interpretation. Then a random matrix A′^J = ∑_{j∈J} A′_j 1_{{J=j}} of size (C + 1) × (K′ + 1) with non-negative entries can be set up by economic considerations such that Λ = A′^J P, where as before Λ = (Λ_0, . . . , Λ_C)^⊤. The dependence of P_0, . . . , P_{K′} can be specified by a random matrix Ã^J = ∑_{j∈J} Ã_j 1_{{J=j}} of size (K′ + 1) × (K + 1) with non-negative entries such that P = Ã^J R, where R = (R_0, . . . , R_K)^⊤ is the column vector of the independent risk factors. Then (6.35) is satisfied for the matrix product
\[
A^J = A'^J \tilde{A}^J = \sum_{j \in J} A'_j \tilde{A}_j\, 1_{\{J=j\}}. \tag{6.43}
\]
Of course one has to make sure that the entries of the matrices A_j := A′_j Ã_j for j ∈ J satisfy (6.1); this is certainly the case if the corresponding entries of A′_j and Ã_j satisfy (6.1).
R_K are gamma distributed random variables with expectation e_k := E[R_k] > 0 and variance σ_k^2 := Var(R_k) > 0, i.e., with shape parameter α_k = e_k^2/σ_k^2 and inverse scale parameter β_k = e_k/σ_k^2 for all k ∈ {1, . . . , K} by (4.39) and (4.41).
Assumption 6.35 (Normalization of default causes). We assume that
\[
E\Bigl[w_{0,g,J}\, a^J_{0,0} + \sum_{c=1}^{C} w_{c,g,J}\, \Lambda_c\Bigr] = 1 \tag{6.44}
\]
for every group g ∈ G.
Remark 6.36. Similar to Assumption 6.4, the preceding Assumption 6.35 is useful for the interpretation of the default probability p_g and the default intensity λ_g for every risk group g ∈ G, but the assumption is not necessary for the algorithm itself.
Remark 6.37 (Sufficient conditions for Assumption 6.35). If E[R_k] = 1 for every risk factor k ∈ {1, . . . , K} and E[A^J] is a stochastic matrix, then E[Λ_c] = 1 by (6.40) for every default cause c ∈ {1, . . . , C}. If the weights are deterministic, meaning that they do not depend on the scenario, then, due to (6.1), which implies E[a^J_{0,0}] = 1 for the stochastic matrix E[A^J], and due to Assumption 6.4, condition (6.44) is satisfied for every group g ∈ G.
6.4 Covariance Structure of Default Cause Intensities
The following example, which is based on [43, Ex. 3.14], shows that due to the
scenarios we can have negatively correlated default cause intensities and the
correlation can be any value in [−1, 0).
Example 6.38 (Negative correlation of default cause intensities). Let J attain the values in J = {0, 1} with strictly positive probability. Let R_1 and R_2 be two independent and gamma distributed random variables, independent of J, with E[R_1] = E[R_2] = 1. Then Assumptions 6.31 and 6.34 are satisfied. Define
\[
A^J = \begin{pmatrix} 1 & 0 & 0 \\ 0 & \frac{J}{E[J]} & 0 \\ 0 & 0 & \frac{1-J}{1-E[J]} \end{pmatrix}.
\]
Then Λ_1 = JR_1/E[J] and Λ_2 = (1 − J)R_2/E[1 − J] by (6.34). Since E[A^J] = I_3 is a stochastic matrix, E[Λ_1] = E[Λ_2] = 1. If the weights do not depend on the scenario j ∈ {0, 1} and satisfy Assumption 6.4, then Assumption 6.35 is satisfied, cf. Remark 6.37. Since the product Λ_1Λ_2 contains the factor J(1 − J) = 0, we get Λ_1Λ_2 = 0 and
\[
\operatorname{Cov}(\Lambda_1, \Lambda_2) = -E[\Lambda_1]\, E[\Lambda_2] = -1.
\]
By direct computation using E[R_k^2] = Var(R_k) + 1 for k ∈ {1, 2}, or by (6.42),
\[
\operatorname{Var}(\Lambda_1) = \frac{\operatorname{Var}(R_1) + 1}{E[J]} - 1
\qquad\text{and}\qquad
\operatorname{Var}(\Lambda_2) = \frac{\operatorname{Var}(R_2) + 1}{1 - E[J]} - 1.
\]
The correlation is therefore given by
\[
\operatorname{Corr}(\Lambda_1, \Lambda_2) = \frac{\operatorname{Cov}(\Lambda_1, \Lambda_2)}{\sqrt{\operatorname{Var}(\Lambda_1)\operatorname{Var}(\Lambda_2)}}
= -\frac{\sqrt{E[J]\, E[1-J]}}{\sqrt{\operatorname{Var}(R_1) + 1 - E[J]}\, \sqrt{\operatorname{Var}(R_2) + E[J]}},
\]
which attains every value in [−1, 0) if suitable values for Var(R_1) and Var(R_2) in [0,∞) are chosen. For the symmetric case E[J] = 1/2 and Var(R_1) = Var(R_2), this simplifies to
\[
\operatorname{Corr}(\Lambda_1, \Lambda_2) = -\frac{1}{1 + 2\operatorname{Var}(R_1)}.
\]
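The symmetric case of Example 6.38 can be checked by simulation; the variance v = 0.5 below is an assumed illustrative value, for which the theoretical correlation is −1/(1 + 2v) = −0.5:

```python
# Monte Carlo check of Example 6.38 in the symmetric case E[J] = 1/2,
# Var(R1) = Var(R2) = v, where Corr(Lambda1, Lambda2) = -1/(1 + 2v).
import numpy as np

rng = np.random.default_rng(42)
n, v = 1_000_000, 0.5
J = rng.integers(0, 2, n)                 # scenario, P[J=0] = P[J=1] = 1/2
R1 = rng.gamma(1 / v, v, n)               # gamma with mean 1, variance v
R2 = rng.gamma(1 / v, v, n)
Lam1 = J * R1 / 0.5                       # Lambda_1 = J R_1 / E[J]
Lam2 = (1 - J) * R2 / 0.5                 # Lambda_2 = (1-J) R_2 / E[1-J]
corr = np.corrcoef(Lam1, Lam2)[0, 1]      # close to -1/(1 + 2v) = -0.5
```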
Example 6.38 raises the question of whether every covariance structure of the default cause intensities is possible. We first characterize covariance matrices and collect some of their properties.
Definition 6.39. A square matrix Σ of size d with real entries is called positive semi-definite if Σ is symmetric and v^⊤Σv ≥ 0 for all v ∈ R^d.

Remark 6.40. If a symmetric matrix Σ with real entries is not positive semi-definite, the R function nearPD (from the Matrix package) can be used to calculate a corresponding positive semi-definite approximation.
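A simpler projection with the same purpose can be sketched as follows; note that this eigenvalue clipping is not the same algorithm as nearPD, which is based on Higham's alternating-projections method:

```python
# Eigenvalue-clipping projection of a symmetric matrix onto the positive
# semi-definite cone (a rough analogue of R's nearPD, not the same algorithm).
import numpy as np

def nearest_psd(sigma):
    """Symmetrize, then set negative eigenvalues to zero."""
    s = (sigma + sigma.T) / 2
    w, v = np.linalg.eigh(s)
    return v @ np.diag(np.clip(w, 0.0, None)) @ v.T

sigma = np.array([[1.0, -2.0], [-2.0, 1.0]])    # symmetric but indefinite
psd = nearest_psd(sigma)
assert np.linalg.eigvalsh(psd).min() >= -1e-12  # now positive semi-definite
```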
Lemma 6.41. (a) Let X be a square-integrable R^d-valued random vector. Then its covariance matrix Cov(X,X) is positive semi-definite.

(b) Let Σ be a positive semi-definite d × d matrix with real entries. Then there exists a square-integrable R^d-valued random vector X with Cov(X,X) = Σ.

(c) Let X = (X_1, . . . , X_d)^⊤ be a square-integrable [0,∞)^d-valued random vector. Then Cov(X_i, X_j) ≥ −E[X_i] E[X_j] for all i, j ∈ {1, . . . , d} with i ≠ j.

Let Σ = (Σ_{i,j})_{i,j∈{1,...,d}} be a positive semi-definite matrix with real entries.

(d) For all i, j ∈ {1, . . . , d},
\[
\Sigma_{i,i} \ge 0 \qquad\text{and}\qquad |\Sigma_{i,j}| \le \sqrt{\Sigma_{i,i}\,\Sigma_{j,j}}.
\]

(e) Let A be a matrix of size d × k with real entries. Then Σ′ := A^⊤ΣA is positive semi-definite.

(f) Assume that Σ satisfies Σ = AΣ′A^⊤ with a matrix A of size d × k and a square matrix Σ′ of size k, both with real entries. If A^⊤A is invertible, then Σ′ is positive semi-definite.

Remark 6.42. To see that the invertibility of A^⊤A in Lemma 6.41(f) is necessary, let all entries of A and Σ be zero. Then Σ = AΣ′A^⊤ gives no information about the entries of Σ′; in particular Σ′ = −I_k is possible.
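Parts (a) and (e) of Lemma 6.41 can be spot-checked numerically on a random instance (illustrative sizes d = 3, k = 2):

```python
# Numeric check of Lemma 6.41(e): A^T Sigma A is positive semi-definite
# whenever Sigma is; the matrices below are a random illustrative instance.
import numpy as np

rng = np.random.default_rng(5)
B = rng.normal(size=(3, 3))
Sigma = B @ B.T                        # positive semi-definite by construction
A = rng.normal(size=(3, 2))
Sigma_p = A.T @ Sigma @ A              # should again be positive semi-definite
assert np.allclose(Sigma_p, Sigma_p.T)
assert np.linalg.eigvalsh(Sigma_p).min() >= -1e-10
```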
Proof of Lemma 6.41. (a) Note that Cov(X,X) is symmetric and of size d with real entries. Consider X and v ∈ R^d as column vectors. Then
\[
v^\top \operatorname{Cov}(X,X)\, v = v^\top E\bigl[(X - E[X])(X - E[X])^\top\bigr] v = E\bigl[\bigl(v^\top (X - E[X])\bigr)^2\bigr] \ge 0.
\]

(b) Let Σ = LL^⊤ be the Cholesky decomposition of Σ, where L is a lower triangular matrix of size d with real entries. Let Y = (Y_1, . . . , Y_d)^⊤ be any square-integrable random vector with independent components satisfying Var(Y_i) = 1 for

(d) Let X = (X_1, . . . , X_d) be a random vector according to (b). Then Σ_{i,i} = Var(X_i) ≥ 0 and, by the Cauchy–Schwarz inequality,
\[
|\Sigma_{i,j}| = \bigl|\operatorname{Cov}(X_i, X_j)\bigr| = \bigl|E\bigl[(X_i - E[X_i])(X_j - E[X_j])\bigr]\bigr|
\le \sqrt{E[(X_i - E[X_i])^2]}\,\sqrt{E[(X_j - E[X_j])^2]} = \sqrt{\Sigma_{i,i}\,\Sigma_{j,j}}.
\]

(e) Since Σ^⊤ = Σ, the matrix Σ′ is symmetric, too. Furthermore, v^⊤Σ′v = (Av)^⊤Σ(Av) ≥ 0 for every v ∈ R^k. Hence Σ′ is positive semi-definite.

(f) Note that AΣ′A^⊤ = Σ implies A^⊤AΣ′A^⊤A = A^⊤ΣA. Since A^⊤A is invertible with symmetric inverse, this implies Σ′ = B^⊤ΣB with B := A(A^⊤A)^{−1}. Hence (f) follows from part (e).
Remark 6.43. While the Cholesky decomposition used in the proof of Lemma 6.41(b) always gives a lower triangular matrix L with non-negative diagonal entries, the example
\[
\begin{pmatrix} 1 & -1 \\ -1 & 2 \end{pmatrix}
= \begin{pmatrix} 1 & 0 \\ -1 & 1 \end{pmatrix}
\begin{pmatrix} 1 & -1 \\ 0 & 1 \end{pmatrix}
\]
shows that L can have negative off-diagonal entries. Hence, if Y has independent gamma distributed components, then X = LY as in the proof of Lemma 6.41(b) cannot always be used to model default cause intensities, because the components of X might attain negative values. Therefore, we need a more sophisticated approach.
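The matrix from Remark 6.43 illustrates the problem numerically (the vector y below is an assumed non-negative input):

```python
# The Cholesky factor of the matrix in Remark 6.43 has a negative
# off-diagonal entry, so X = L Y can become negative even for Y >= 0.
import numpy as np

sigma = np.array([[1.0, -1.0], [-1.0, 2.0]])
L = np.linalg.cholesky(sigma)          # lower triangular factor
y = np.array([2.0, 0.5])               # non-negative components
x = L @ y
assert x[1] < 0                        # second component turns negative
```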
Theorem 6.44.^{26} Let Σ = (Σ_{i,j})_{i,j∈{1,...,d}} be a positive semi-definite matrix. Then there exist an integer k ∈ {1, . . . , d} and independent random variables J_2, . . . , J_d, X_{1,1}, . . . , X_{1,k}, where J_2, . . . , J_d take values in {0, 1} and X_{1,1}, . . . , X_{1,k} are non-negative and square-integrable, and random matrices A_{J_2}, . . . , A_{J_d} with non-negative entries, where A_{J_i} is σ(J_i)-measurable for every i ∈ {2, . . . , d}, with sizes non-decreasing and compatible, such that the product X_d := A_{J_d} · · · A_{J_2} X_1 with X_1 := (X_{1,1}, . . . , X_{1,k})^⊤ is well defined and satisfies Cov(X_d, X_d) = Σ. In addition, E[A_{J_2}], . . . , E[A_{J_d}] are sub-stochastic matrices (meaning that the entries in every row sum to at most 1).

^{26}This theorem and its proof are work in progress, skip it or read with caution.
Remark 6.45 (Non-uniqueness of the representation). Without further conditions, the representation in Theorem 6.44 is not unique. Already for Σ = I_d, where I_d denotes the identity matrix of size d ≥ 2, there exist several solutions: Take k = d and deterministic A_{J_l} = P_{i_l,j_l} with i_l, j_l ∈ {1, . . . , d} for l ∈ {2, . . . , d}, where P_{i,j} denotes the matrix permuting rows i and j, with P_{i,j} = I_d if i = j.
Proof of Theorem 6.44. We give a constructive, inductive proof of Theorem 6.44, where in each induction step several cases have to be considered.

Case 1: If d = 1, then take k = 1 and any non-negative random variable X_{1,1} with Var(X_{1,1}) = Σ.

Case 2: If d ≥ 2 and Σ is a diagonal matrix with all diagonal elements different from zero, take k = d and independent and non-negative X_{1,1}, . . . , X_{d,d} with Var(X_{i,i}) = Σ_{i,i} for all i ∈ {1, . . . , d}. Furthermore, take degenerate random

. Define J_d = J and k = k′ + 1 as well as the random matrix A_{J_d} = P_A A^J, and take any independent, non-negative random variable X_{1,k} with E[X_{1,k}] = e_d and Var(X_{1,k}) = c. Furthermore, define A_{J_2}, . . . , A_{J_{d−1}} by (6.45).
6.5 Expectations, Variances and Covariances for Defaults
To illustrate the above assumptions, we calculate the expectations, variances and
covariances of various default numbers and losses. The first three subsections
apply Subsection 3.6.1 to the current model. Note that the results of Subsections
6.5.1, 6.5.2 and 6.5.3 are actually special cases of the results of Subsection 6.5.4,
cf. Remark 6.51.
6.5.1 Expectation of Default Numbers

Let us start with the number of defaults
\[
N_i = \sum_{g \in G_i} N_g = \sum_{g \in G_i} \sum_{c=0}^{C} N_{c,g} \tag{6.46}
\]
of obligor i ∈ {1, . . . , m}. First note that by Assumptions 6.24, 6.25, 6.29, 6.30 and the Poisson summation property (3.5) we have
\[
\mathcal{L}(N_i \,|\, J, R_1, \ldots, R_K) \overset{\text{a.s.}}{=} \mathcal{L}\biggl(\,\sum_{g \in G_i} \Bigl(N_{0,g} + \sum_{c=1}^{C} N_{c,g}\Bigr) \,\Big|\, J, R_1, \ldots, R_K\biggr) \overset{\text{a.s.}}{=} \operatorname{Poisson}(\Lambda_i), \tag{6.47}
\]
where
\[
\Lambda_i := \sum_{g \in G_i} \lambda_g \Bigl(w_{0,g,J}\, a^J_{0,0} + \sum_{c=1}^{C} w_{c,g,J}\, \Lambda_c\Bigr) \tag{6.48}
\]
is the conditional default intensity of obligor i, hence
\[
E[N_i \,|\, J, R_1, \ldots, R_K] = \Lambda_i \tag{6.49}
\]
by (3.3). By inserting a conditional expectation given J, R_1, . . . , R_K, using (6.49) and the normalization given in Assumption 6.35,
\[
E[N_i] = E[\Lambda_i] = \sum_{g \in G_i} \lambda_g\, E\Bigl[w_{0,g,J}\, a^J_{0,0} + \sum_{c=1}^{C} w_{c,g,J}\, \Lambda_c\Bigr] = \sum_{g \in G_i} \lambda_g. \tag{6.50}
\]
Therefore, the expected number of defaults of obligor i is the sum of the default intensities of the risk groups to which i belongs.
Remark 6.46. Note that (6.50) gives the expected number of defaults of obligor i ∈ {1, . . . , m}, but not every default has to lead to a credit loss, due to a sufficiently high collateral or deductible (in the case of credit insurance). A corresponding remark applies to the results of Subsections 6.5.2 and 6.5.3 below.
Example 6.47. Consider a credit risk model with m = 2 obligors and the three risk groups {1}, {2} and {1, 2}. Assume that the one-year default intensities λ̄_i = E[N_i] > 0 for obligors i ∈ {1, 2} are known. To calibrate the model, we can take any λ_g ∈ [0, min{λ̄_1, λ̄_2}] for g = {1, 2} and define λ_{{i}} = λ̄_i − λ_g for the remaining one-obligor risk groups {i} with i ∈ {1, 2}. Then (6.50) is satisfied, which shows that default intensities of risk groups with several obligors can in general not be derived from individual default intensities.
Remark 6.48. Suppose that in a credit risk model with m ≥ 2 obligors, the individual default intensities λ̄_i = E[N_i] of all obligors i ∈ {1, . . . , m} and the default intensities λ_g of all groups g ∈ G with at least two obligors were derived by statistical estimates and expert opinions. Assuming that all one-obligor risk groups {i} with i ∈ {1, . . . , m} belong to G, we can then define
\[
\lambda_{\{i\}} = \bar{\lambda}_i - \sum_{\substack{g \in G_i \\ g \ne \{i\}}} \lambda_g, \qquad i \in \{1, \ldots, m\},
\]
provided that this results in λ_{{i}} ≥ 0 for every i ∈ {1, . . . , m}. Otherwise the statistical estimates and expert opinions are inconsistent.
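The consistency check of Remark 6.48 can be sketched as follows (all intensities are made-up illustrative numbers):

```python
# Sketch of Remark 6.48: the intensity of each one-obligor group is the
# individual default intensity minus the intensities of all multi-obligor
# groups containing that obligor; a negative result signals inconsistency.
individual = {1: 0.05, 2: 0.04, 3: 0.03}        # assumed \bar{lambda}_i = E[N_i]
group_intensities = {frozenset({1, 2}): 0.01,   # assumed lambda_g for |g| >= 2
                     frozenset({2, 3}): 0.02}

singleton = {}
for i, lam in individual.items():
    lam_i = lam - sum(lg for g, lg in group_intensities.items() if i in g)
    if lam_i < 0:
        raise ValueError("estimates and expert opinions are inconsistent")
    singleton[i] = lam_i                         # lambda_{{i}}
```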
6.5.2 Variance of Default Numbers

To calculate the variance of the number N_i of defaults of obligor i ∈ {1, . . . , m}, first note that Var(N_i | J, R_1, . . . , R_K) =_{a.s.} Λ_i by (6.47), (3.3) and (3.4). Using (3.60) from Lemma 3.48 and (6.49), we obtain
\[
\operatorname{Var}(N_i) = E\bigl[\underbrace{\operatorname{Var}(N_i \,|\, J, R_1, \ldots, R_K)}_{\overset{\text{a.s.}}{=}\, \Lambda_i}\bigr] + \operatorname{Var}\bigl(\underbrace{E[N_i \,|\, J, R_1, \ldots, R_K]}_{\overset{\text{a.s.}}{=}\, \Lambda_i}\bigr), \tag{6.51}
\]
which corresponds to (3.61). Using (6.50) and again (3.60) from Lemma 3.48, equation (6.51) turns into
\[
\operatorname{Var}(N_i) = E[N_i] + E\bigl[\operatorname{Var}(\Lambda_i \,|\, J)\bigr] + \operatorname{Var}\bigl(E[\Lambda_i \,|\, J]\bigr). \tag{6.52}
\]
Note that Var(N_i) ≥ E[N_i], because variances are non-negative. Using Assumption 6.26 about the structure of the default cause intensities, it follows from (6.48) that
\[
\Lambda_i = \sum_{g \in G_i} \lambda_g \Bigl(\sum_{c=0}^{C} w_{c,g,J}\, a^J_{c,0} + \sum_{k=1}^{K} R_k \sum_{c=1}^{C} w_{c,g,J}\, a^J_{c,k}\Bigr). \tag{6.53}
\]
Using Assumption 6.31 about the independence of J, R_1, . . . , R_K,
\[
E[\Lambda_i \,|\, J] = \sum_{g \in G_i} \lambda_g \Bigl(\sum_{c=0}^{C} w_{c,g,J}\, a^J_{c,0} + \sum_{k=1}^{K} E[R_k] \sum_{c=1}^{C} w_{c,g,J}\, a^J_{c,k}\Bigr) \tag{6.54}
\]
and
\[
\operatorname{Var}(\Lambda_i \,|\, J) = \sum_{k=1}^{K} \operatorname{Var}(R_k) \Bigl(\sum_{g \in G_i} \lambda_g \sum_{c=1}^{C} w_{c,g,J}\, a^J_{c,k}\Bigr)^{\!2}, \tag{6.55}
\]
where E[R_k] and Var(R_k) are specified by Assumption 6.34.

If there is just one scenario, then J and therefore E[Λ_i | J] are constant, hence the last term Var(E[Λ_i | J]) in (6.52) is zero and Var(Λ_i | J) from (6.55) coincides with the term E[Var(Λ_i | J)] in (6.52).

For the general case, note that Var(E[Λ_i | J]) = E[(E[Λ_i | J])^2] − (E[Λ_i])^2 with E[Λ_i] given by (6.50) and
\[
E\bigl[\bigl(E[\Lambda_i \,|\, J]\bigr)^2\bigr] = \sum_{j \in J} \biggl(\sum_{g \in G_i} \lambda_g \Bigl(\sum_{c=0}^{C} w_{c,g,j}\, a^j_{c,0} + \sum_{k=1}^{K} E[R_k] \sum_{c=1}^{C} w_{c,g,j}\, a^j_{c,k}\Bigr)\biggr)^{\!2}\, P[J = j]. \tag{6.56}
\]
Taking the expectation of (6.55) shows that
\[
E\bigl[\operatorname{Var}(\Lambda_i \,|\, J)\bigr] = \sum_{k=1}^{K} \operatorname{Var}(R_k) \sum_{j \in J} \biggl(\sum_{g \in G_i} \lambda_g \sum_{c=1}^{C} w_{c,g,j}\, a^j_{c,k}\biggr)^{\!2}\, P[J = j]. \tag{6.57}
\]
6.5.3 Covariances of Default Numbers

For obligors i, i′ ∈ {1, . . . , m} with i ≠ i′ we can calculate the covariance of N_i and N_{i′}. By (3.59) from Lemma 3.48,
\[
\operatorname{Cov}(N_i, N_{i'}) = \operatorname{Cov}\bigl(E[N_i \,|\, J], E[N_{i'} \,|\, J]\bigr) + E\bigl[\operatorname{Cov}(N_i, N_{i'} \,|\, J)\bigr]. \tag{6.58}
\]
Using (6.46), the linearity of conditional covariance in both arguments, Assump-
where we used Assumption 6.29, (3.3) and (3.4) to calculate the conditional expectations and the conditional variance. The conditional covariance of N_{c,g} and N_{d,h} given J, R_1, . . . , R_K vanishes if (c, g) ≠ (d, h) due to the conditional independence formulated in Assumption 6.30. Therefore,
\[
\operatorname{Cov}(N_i, N_{i'} \,|\, J) = \sum_{g \in G_i \cap G_{i'}} \lambda_g \underbrace{\Bigl(w_{0,g,J}\, a^J_{0,0} + \sum_{c=1}^{C} w_{c,g,J}\, E[\Lambda_c \,|\, J]\Bigr)}_{E[\,\cdot\,] \,=\, 1 \text{ by } (6.44)}
+ \sum_{g \in G_i} \lambda_g \sum_{h \in G_{i'}} \lambda_h \sum_{c,d=1}^{C} w_{c,g,J}\, w_{d,h,J}\, \operatorname{Cov}(\Lambda_c, \Lambda_d \,|\, J), \tag{6.59}
\]
where the remaining covariance is given by (6.41). Substituting (6.41) into (6.59), and the result into (6.58), yields
\[
\operatorname{Cov}(N_i, N_{i'}) = \operatorname{Cov}\bigl(E[N_i \,|\, J], E[N_{i'} \,|\, J]\bigr) + \sum_{g \in G_i \cap G_{i'}} \lambda_g
+ \sum_{g \in G_i} \lambda_g \sum_{h \in G_{i'}} \lambda_h \sum_{k=1}^{K} \operatorname{Var}(R_k) \sum_{c,d=1}^{C} E\bigl[w_{c,g,J}\, w_{d,h,J}\, a^J_{c,k}\, a^J_{d,k}\bigr], \tag{6.60}
\]
and it follows from (6.49) and (6.54) that
\[
E[N_i \,|\, J] = E[\Lambda_i \,|\, J] = \sum_{g \in G_i} \lambda_g \Bigl(\sum_{c=0}^{C} w_{c,g,J}\, a^J_{c,0} + \sum_{k=1}^{K} E[R_k] \sum_{c=1}^{C} w_{c,g,J}\, a^J_{c,k}\Bigr) \tag{6.61}
\]
and similarly for E[N_{i′} | J].
If there is just one scenario, then E[N_i | J] and E[N_{i′} | J] are deterministic and the covariance in (6.60) vanishes. Furthermore, there is no need to take the expectation on the right-hand side of (6.60) and (omitting the J) we obtain
\[
\operatorname{Cov}(N_i, N_{i'}) = \sum_{g \in G_i \cap G_{i'}} \lambda_g + \sum_{k=1}^{K} \operatorname{Var}(R_k) \Bigl(\sum_{g \in G_i} \lambda_g \sum_{c=1}^{C} w_{c,g}\, a_{c,k}\Bigr) \Bigl(\sum_{h \in G_{i'}} \lambda_h \sum_{d=1}^{C} w_{d,h}\, a_{d,k}\Bigr). \tag{6.62}
\]
Remark 6.49. In the classical CreditRisk+ model (cf. Remark 6.6) with only one-element risk groups, the expectation in (6.50), the variance from Subsection 6.5.2, and the covariance given in (6.62) simplify to E[N_i] = λ_i,
\[
\operatorname{Var}(N_i) = \lambda_i + \lambda_i^2 \sum_{k=1}^{K} w_{k,i}^2\, \operatorname{Var}(R_k) \tag{6.63}
\]
and
\[
\operatorname{Cov}(N_i, N_{i'}) = \lambda_i\, \lambda_{i'} \sum_{k=1}^{K} w_{k,i}\, w_{k,i'}\, \operatorname{Var}(R_k) \tag{6.64}
\]
for all i, i′ ∈ {1, . . . , m} with i ≠ i′, where we used the abbreviations λ_i := λ_{{i}} and w_{k,i} := w_{k,{i}} and corresponding ones for the index i′. Note that in the extended version, as shown by (6.60), contributions to the covariance can also come from the risk groups in G_i ∩ G_{i′} and from the scenarios.
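Formula (6.63) with a single risk factor can be verified by simulation; the intensity and variance below are illustrative assumptions, with weight w_{1,i} = 1:

```python
# Monte Carlo check of the classical-model variance (6.63) with a single
# mean-one gamma risk factor: conditionally on R, the default number is
# Poisson(lambda_i * R), hence Var(N_i) = lambda_i + lambda_i^2 * Var(R).
import numpy as np

rng = np.random.default_rng(1)
lam, var_r, n = 0.1, 0.8, 2_000_000
R = rng.gamma(1.0 / var_r, var_r, n)    # gamma risk factor, mean 1, variance var_r
N = rng.poisson(lam * R)                # mixed Poisson default numbers
print(N.mean(), N.var())                # close to 0.1 and 0.1 + 0.01 * 0.8 = 0.108
```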
6.5.4 Default Losses^{27}

In this subsection, we assume that every N_0^d-valued stochastic loss vector L_{c,g,i,j,1} attributed to obligor i ∈ g of risk group g ∈ G in scenario j ∈ J due to default cause c ∈ {0, . . . , C}, as introduced in Subsection 6.2.5, satisfies E[‖L_{c,g,i,j,1}‖] < ∞ and, when calculating variances and covariances, E[‖L_{c,g,i,j,1}‖^2] < ∞.

Let us start with the calculation of the conditional expected loss vector
\[
L_i = \sum_{c=0}^{C} \sum_{g \in G_i} L_{c,g,i,J}
\]
attributed to obligor i ∈ {1, . . . , m} given the scenario J and the risk factors R_1, . . . , R_K. By (6.21) and (6.23),
\[
E[L_i \,|\, J, R_1, \ldots, R_K] \overset{\text{a.s.}}{=} \sum_{g \in G_i} \Bigl(E[L_{0,g,i,J} \,|\, J] + \sum_{c=1}^{C} E[L_{c,g,i,J} \,|\, J, R_1, \ldots, R_K]\Bigr), \tag{6.65}
\]
where we used Assumptions 6.16, 6.25, and (6.36) to simplify the conditional expectations. By Assumptions 6.16 and 6.24, the loss L_{0,g,i,J} defined in (6.20) has a compound Poisson distribution and (4.89) implies that
By Assumptions 6.16 and 6.29, the loss L_{g,i,k} due to risk factor k ∈ {1, . . . , K} has a conditional compound Poisson distribution given Λ_k, hence by (4.87)
\[
E[L_{g,i,k} \,|\, \Lambda_k] \overset{\text{a.s.}}{=} \lambda_g\, w_{g,k}\, \Lambda_k\, E[L_{g,i,k,1}]. \tag{6.67}
\]
Substitution of (6.66) and (6.67) into (6.65) yields
\[
E[L_i \,|\, \Lambda_1, \ldots, \Lambda_K] \overset{\text{a.s.}}{=} \sum_{g \in G_i} \lambda_g \Bigl(w_{g,0}\, E[L_{g,i,0,1}] + \sum_{k=1}^{K} w_{g,k}\, \Lambda_k\, E[L_{g,i,k,1}]\Bigr). \tag{6.68}
\]

^{27}This section has to be adapted to the new notation and the generalized setting.
Since E[Λ_k] = 1 by Assumption 6.34, we obtain
\[
E[L_i] = \sum_{g \in G_i} \lambda_g \sum_{k=0}^{K} w_{g,k}\, E[L_{g,i,k,1}]. \tag{6.69}
\]
Using (6.14) and (6.25), we get for the expected credit loss in the entire portfolio
\[
E[L] = \sum_{i=1}^{m} E[L_i] = \sum_{g \in G} \lambda_g \sum_{k=0}^{K} w_{g,k} \underbrace{E[L_{g,k,1}]}_{=\, \sum_{\nu \in \mathbb{N}} \nu\, q^{s}_{g,k,\nu}}. \tag{6.70}
\]
Due to (6.2), the sums over the risks k ∈ {0, . . . , K} in (6.69) and (6.70) are actually convex combinations.
The next step is to calculate the conditional covariance of the losses due to obligors i, j ∈ {1, . . . , m} given the risk factors Λ_1, . . . , Λ_K. Considering i = j, this calculation will give the conditional variance. We first rewrite L_i and L_j using (6.21) and (6.23). We then note that, conditioned on the risk factors Λ_1, . . . , Λ_K, the family of random vectors
\[
\bigl\{(L_{g,i',k})_{i' \in g} \bigm| g \in G,\ k \in \{0, \ldots, K\}\bigr\}
\]
is independent by Assumptions 6.16, 6.25, and 6.30, hence
\[
\operatorname{Cov}(L_i, L_j \,|\, \Lambda_1, \ldots, \Lambda_K)
\overset{\text{a.s.}}{=} \sum_{g \in G_i} \sum_{h \in G_j} \sum_{k,l=0}^{K} \operatorname{Cov}(L_{g,i,k}, L_{h,j,l} \,|\, \Lambda_1, \ldots, \Lambda_K)
\overset{\text{a.s.}}{=} \sum_{g \in G_i \cap G_j} \Bigl(\operatorname{Cov}(L_{g,i,0}, L_{g,j,0}) + \sum_{k=1}^{K} \operatorname{Cov}(L_{g,i,k}, L_{g,j,k} \,|\, \Lambda_k)\Bigr), \tag{6.71}
\]
where we used Assumptions 6.16, 6.25 and (6.36) to simplify the conditional covariances. By Assumptions 6.16 and 6.24, the loss vector (L_{g,i,0}, L_{g,j,0}) with components defined in (6.20) has a compound Poisson distribution and (4.90)

Substitution of (6.72) and (6.73) into (6.71) yields
\[
\operatorname{Cov}(L_i, L_j \,|\, \Lambda_1, \ldots, \Lambda_K)
\overset{\text{a.s.}}{=} \sum_{g \in G_i \cap G_j} \lambda_g \Bigl(w_{g,0}\, E[L_{g,i,0,1} L_{g,j,0,1}] + \sum_{k=1}^{K} w_{g,k}\, \Lambda_k\, E[L_{g,i,k,1} L_{g,j,k,1}]\Bigr). \tag{6.74}
\]
To calculate the covariance of the credit losses due to obligors i, j ∈ {1, . . . , m}, we use (3.59), substitute (6.74) and (6.68), and use Assumption 6.34 to obtain
Remark 6.50. In the classical CreditRisk+ model (cf. Remarks 6.6 and 6.49) with only one-element risk groups, the results (6.69), (6.76) and (6.75) simplify to
\[
E[L_i] = \lambda_i \sum_{k=0}^{K} w_{i,k}\, E[L_{i,k,1}], \tag{6.77}
\]
\[
\operatorname{Var}(L_i) = \lambda_i \sum_{k=0}^{K} w_{i,k}\, E\bigl[L_{i,k,1}^2\bigr] + \lambda_i^2 \sum_{k=1}^{K} \sigma_k^2\, w_{i,k}^2\, \bigl(E[L_{i,k,1}]\bigr)^2 \tag{6.78}
\]
and
\[
\operatorname{Cov}(L_i, L_j) = \lambda_i\, \lambda_j \sum_{k=1}^{K} \sigma_k^2\, w_{i,k}\, w_{j,k}\, E[L_{i,k,1}]\, E[L_{j,k,1}] \tag{6.79}
\]
for all i, j ∈ {1, . . . , m} with i ≠ j, where we used the abbreviations λ_i := λ_{{i}} and w_{i,k} := w_{{i},k} as well as L_{i,k,1} := L_{{i},i,k,1} and corresponding ones for the index j.
Remark 6.51. To see that the results of Subsections 6.5.1, 6.5.2 and 6.5.3 are
actually special cases of the results of Subsection 6.5.4, define Lg,i,k,n = 1 for
all risk groups g ∈ G, risks k ∈ 0, 1, . . . ,K, obligors i ∈ g, and defaults n ∈ N.
Then (6.46) and (6.20)–(6.23) imply Ni = Li for all obligors i ∈ 1, . . . ,m.Comparison shows that the expectation in (6.69) simplifies to (6.50), the variance
in (6.76) simplifies to (??), and the covariance in (6.75) simplifies to (6.60).
6.5.5 Default Numbers with Non-Zero Loss

The default numbers considered in Subsections 6.5.1, 6.5.2 and 6.5.3 include defaults which lead to a loss of zero. This can actually happen in practice, for example, when the collateral is sufficient to cover the outstanding amount. The results of the previous subsection can be used to calculate the expectations, variances and covariances of the default numbers with non-zero loss. This is accomplished by using the Bernoulli random variables $L'_{g,i,k,n} := 1_{\mathbb{N}}(L_{g,i,k,n})$ instead of $L_{g,i,k,n}$.
Define for every obligor $i \in \{1,\dots,m\}$ the number $L'_i$ of defaults with non-zero loss via (6.20), (6.21), and (6.23) using the just introduced $L'_{g,i,k,n}$. The results (6.69), (6.76) and (6.75) applied to $L'_i$ and $L'_j$ can easily be rewritten using
$$\mathbb{E}\bigl[(L'_{g,i,k,1})^2\bigr] = \mathbb{E}\bigl[L'_{g,i,k,1}\bigr] = \mathbb{P}[L_{g,i,k,1} > 0]$$
and
$$\mathbb{E}\bigl[L'_{g,i,k,1} L'_{g,j,k,1}\bigr] = \mathbb{P}[L_{g,i,k,1} > 0,\ L_{g,j,k,1} > 0]$$
for all obligors $i, j \in \{1,\dots,m\}$, risks $k \in \{0,\dots,K\}$ and groups $g \in \mathcal{G}_i$ and $g \in \mathcal{G}_i \cap \mathcal{G}_j$, respectively.
6.6 Probability-Generating Function of the Biased Loss Vector

Fix a $\gamma = (\gamma_1,\dots,\gamma_K) \in [0,\infty)^K$ such that $0 < \mathbb{E}\bigl[R_1^{\gamma_1} \cdots R_K^{\gamma_K}\bigr] < \infty$. In this section, using multi-index notation, we calculate the coefficients of the probability-generating function of the portfolio loss vector $L$ under the $R_1^{\gamma_1} \cdots R_K^{\gamma_K}$-biased probability measure, given according to Definition 2.10, which we denote by $\mathbb{P}_\gamma$ for short. The corresponding expectation operator is denoted by $\mathbb{E}_\gamma$. Hence we want to calculate
$$\varphi_{L,\gamma}(s) := \sum_{\nu\in\mathbb{N}_0^d} \mathbb{P}_\gamma[L = \nu]\, s^\nu = \mathbb{E}_\gamma[s^L] = \frac{\mathbb{E}\bigl[\mathbb{E}[R_1^{\gamma_1} \cdots R_K^{\gamma_K} s^L \mid J]\bigr]}{\mathbb{E}\bigl[R_1^{\gamma_1} \cdots R_K^{\gamma_K}\bigr]}, \qquad s \in \mathbb{C}^d,\ \|s\|_\infty \le 1, \tag{6.80}$$
of the $\mathbb{N}_0^d$-valued total loss vector $L$ given by (6.19). For $\gamma = (0,\dots,0)$, we will obtain the usual probability-generating function $\varphi_L$ of $L$. Let
$$L' = \sum_{c=1}^{C} \sum_{g\in\mathcal{G}} L_{c,g} \tag{6.81}$$
denote the non-idiosyncratic $\mathbb{N}_0^d$-valued portfolio loss vector. By Assumptions 6.16 and 6.25, the random vectors $(L_{0,g})_{g\in\mathcal{G}}$ and the random vector $(L', R_1,\dots,R_K)$
(Footnote to Subsection 6.5.5: this section has to be adapted to the new notation and the generalized setting.)
are conditionally independent given $J$. Since
$$L = L' + \sum_{g\in\mathcal{G}} L_{0,g},$$
it therefore follows that
$$\mathbb{E}\bigl[R_1^{\gamma_1} \cdots R_K^{\gamma_K} s^L \bigm| J\bigr] = \mathbb{E}\bigl[R_1^{\gamma_1} \cdots R_K^{\gamma_K} s^{L'} \bigm| J\bigr] \prod_{g\in\mathcal{G}} \mathbb{E}\bigl[s^{L_{0,g}} \bigm| J\bigr]. \tag{6.82}$$
By Assumptions 6.16, 6.24 and (4.58), it follows for the compound Poisson sum $L_{0,g,j}$, defined in (6.15), of idiosyncratic loss vectors of group $g \in \mathcal{G}$ in scenario $j \in \mathcal{J}$, that
$$\mathbb{E}\bigl[s^{L_{0,g}} \bigm| J = j\bigr] = \exp\bigl(\lambda_g w_{0,g,j}\, a^j_{0,0} (\varphi_{L_{0,g,j,1}}(s) - 1)\bigr). \tag{6.83}$$
Conditioning on $J, R_1,\dots,R_K$, the sector default numbers $(N_{c,g})_{c\in\{1,\dots,C\},\, g\in\mathcal{G}}$ are independent by Assumption 6.30, hence the random sums $(L_{c,g})_{c\in\{1,\dots,C\},\, g\in\mathcal{G}}$ in (6.81), given by (6.16), are also conditionally independent due to Assumption 6.16. Therefore, we obtain
$$\mathbb{E}\bigl[R_1^{\gamma_1} \cdots R_K^{\gamma_K} s^{L'} \bigm| J, R_1,\dots,R_K\bigr] \overset{\text{a.s.}}{=} R_1^{\gamma_1} \cdots R_K^{\gamma_K} \prod_{c=1}^{C} \prod_{g\in\mathcal{G}} \mathbb{E}\bigl[s^{L_{c,g}} \bigm| J, R_1,\dots,R_K\bigr]. \tag{6.84}$$
Due to Assumptions 6.16 and 6.29, the result (4.81) and Assumption 6.26, it follows that, for every default cause $c \in \{1,\dots,C\}$ and every group $g \in \mathcal{G}$,
$$\begin{aligned}
\mathbb{E}\bigl[s^{L_{c,g}} \bigm| J = j, R_1,\dots,R_K\bigr]
&\overset{\text{a.s.}}{=} \mathbb{E}\bigl[s^{L_{c,g}} \bigm| J = j, \Lambda_c\bigr]
\overset{\text{a.s.}}{=} \exp\bigl(\lambda_g w_{c,g,j} \Lambda_c (\varphi_{L_{c,g,j,1}}(s) - 1)\bigr)\\
&= \exp\Bigl(\lambda_g w_{c,g,j} \Bigl(a^j_{c,0} + \sum_{k=1}^{K} a^j_{c,k} R_k\Bigr)(\varphi_{L_{c,g,j,1}}(s) - 1)\Bigr).
\end{aligned} \tag{6.85}$$
Substituting (6.83), (6.84) and (6.85) into (6.82) and rearranging leads to
$$\begin{aligned}
\mathbb{E}\bigl[R_1^{\gamma_1} \cdots R_K^{\gamma_K} s^L \bigm| J = j, R_1,\dots,R_K\bigr]
&\overset{\text{a.s.}}{=} \exp\Bigl(\sum_{g\in\mathcal{G}} \lambda_g \sum_{c=0}^{C} w_{c,g,j}\, a^j_{c,0} (\varphi_{L_{c,g,j,1}}(s) - 1)\Bigr)\\
&\quad\times \prod_{k=1}^{K} R_k^{\gamma_k} \exp\Bigl(R_k \sum_{g\in\mathcal{G}} \lambda_g \sum_{c=1}^{C} w_{c,g,j}\, a^j_{c,k} (\varphi_{L_{c,g,j,1}}(s) - 1)\Bigr).
\end{aligned} \tag{6.86}$$
For every scenario $j \in \mathcal{J}$ and risk $k \in \{0,\dots,K\}$ let
$$\varphi_{j,k}(s) = \sum_{\nu\in S_{j,k}\cup\{0\}} q_{j,k,\nu}\, s^\nu = \begin{cases} \lambda_{j,k}^{-1} \displaystyle\sum_{\nu\in S_{j,k}} \lambda_{j,k,\nu}\, s^\nu & \text{if } \lambda_{j,k} > 0,\\ 1 & \text{if } \lambda_{j,k} = 0, \end{cases} \tag{6.87}$$
at least for all $s \in \mathbb{C}^d$ with $\|s\|_\infty \le 1$, denote the probability-generating function of the distribution $Q_{j,k} = (q_{j,k,\nu})_{\nu\in\mathbb{N}_0^d}$ defined in (6.12) and (6.13), respectively, with the set $S_{j,k}$ defined in (6.10). Recall that, for all default causes $c \in \{0,\dots,C\}$, groups $g \in \mathcal{G}$ and scenarios $j \in \mathcal{J}$,
$$\varphi_{L_{c,g,j,1}}(s) = \sum_{\nu\in\mathbb{N}_0^d} s^\nu \underbrace{\mathbb{P}[L_{c,g,j,1} = \nu]}_{=\, q^{\mathrm{s}}_{c,g,j,\nu} \text{ by } (6.25)},$$
hence
$$\varphi_{L_{c,g,j,1}}(s) - 1 = \sum_{\nu\in\mathbb{N}_0^d\setminus\{0\}} s^\nu q^{\mathrm{s}}_{c,g,j,\nu} - (1 - q^{\mathrm{s}}_{c,g,j,0})$$
and rearrangement of the exponents on the right-hand side of (6.86) leads to
$$\begin{aligned}
\sum_{g\in\mathcal{G}} \lambda_g \sum_{c=0}^{C} w_{c,g,j}\, a^j_{c,k} (\varphi_{L_{c,g,j,1}}(s) - 1)
&= \sum_{\nu\in S_{j,k}} s^\nu \underbrace{\sum_{g\in\mathcal{G}} \lambda_g \sum_{c=0}^{C} w_{c,g,j}\, a^j_{c,k}\, q^{\mathrm{s}}_{c,g,j,\nu}}_{=\,\lambda_{j,k,\nu} \text{ by } (6.9)} - \underbrace{\sum_{g\in\mathcal{G}} \lambda_g \sum_{c=0}^{C} w_{c,g,j}\, a^j_{c,k} (1 - q^{\mathrm{s}}_{c,g,j,0})}_{=\,\lambda_{j,k} \text{ by } (6.11)}\\
&= \lambda_{j,k}(\varphi_{j,k}(s) - 1)
\end{aligned} \tag{6.88}$$
for every risk $k \in \{0,\dots,K\}$ with the set $S_{j,k}$ defined in (6.10). Substituting (6.88) into (6.86), using (6.1) in the case $k \in \{1,\dots,K\}$, taking the conditional expectation given $J$, and using the independence of $J, R_1,\dots,R_K$, it follows that
$$\mathbb{E}\bigl[R_1^{\gamma_1} \cdots R_K^{\gamma_K} s^L \bigm| J = j\bigr] = \exp\bigl(\lambda_{j,0}(\varphi_{j,0}(s) - 1)\bigr) \times \prod_{k=1}^{K} \mathbb{E}\bigl[R_k^{\gamma_k} \exp\bigl(\lambda_{j,k}(\varphi_{j,k}(s) - 1) R_k\bigr) \bigm| J = j\bigr], \tag{6.89}$$
at least for all $s \in \mathbb{C}^d$ with $\|s\|_\infty \le 1$.
To proceed further, we need to make an assumption on the distribution of
the risk factors R1, . . . , RK .
6.6.1 Risk Factors with a Gamma Distribution

Since $R_k \sim \Gamma(\alpha_k, \beta_k)$ for every $k \in \{1,\dots,K\}$ by Assumption 6.34, and since $R_k$ is independent of $J$, it follows from (4.43) that
$$\mathbb{E}\bigl[R_k^{\gamma_k} \exp\bigl(\lambda_{J,k}(\varphi_{J,k}(s) - 1) R_k\bigr) \bigm| J = j\bigr] = \mathbb{E}\bigl[R_k^{\gamma_k}\bigr] \Bigl(1 - \lambda_{j,k} \frac{\varphi_{j,k}(s) - 1}{\beta_k}\Bigr)^{-(\alpha_k + \gamma_k)}. \tag{6.90}$$
Substituting (6.90) into (6.89), we obtain
$$\mathbb{E}\bigl[R_1^{\gamma_1} \cdots R_K^{\gamma_K} s^L \bigm| J = j\bigr] = \exp\bigl(\lambda_{j,0}(\varphi_{j,0}(s) - 1)\bigr) \times \prod_{k=1}^{K} \mathbb{E}\bigl[R_k^{\gamma_k}\bigr] \Bigl(1 - \lambda_{j,k} \frac{\varphi_{j,k}(s) - 1}{\beta_k}\Bigr)^{-(\alpha_k + \gamma_k)}. \tag{6.91}$$
Transferring everything into a common exponential, we finally get for the probability-generating function under the $R_1^{\gamma_1} \cdots R_K^{\gamma_K}$-biased probability measure, defined in (6.80),
$$\begin{aligned}
\varphi_{L,\gamma}(s) &= \frac{1}{\mathbb{E}\bigl[R_1^{\gamma_1} \cdots R_K^{\gamma_K}\bigr]} \sum_{j\in\mathcal{J}} \mathbb{E}\bigl[R_1^{\gamma_1} \cdots R_K^{\gamma_K} s^L \bigm| J = j\bigr]\, \mathbb{P}[J = j]\\
&= \sum_{j\in\mathcal{J}} \exp\Bigl(\lambda_{j,0}(\varphi_{j,0}(s) - 1) - \sum_{k=1}^{K} (\alpha_k + \gamma_k) \log\Bigl(1 - \lambda_{j,k} \frac{\varphi_{j,k}(s) - 1}{\beta_k}\Bigr)\Bigr)\, \mathbb{P}[J = j],
\end{aligned} \tag{6.92}$$
at least for all $s \in \mathbb{C}^d$ with $\|s\|_\infty \le 1$.
6.7 Algorithm for Risk Factors with a Gamma Distribution

Formula (6.92) is the probability-generating function of the accumulated $\mathbb{N}_0^d$-valued loss vector in the credit portfolio under the $R_1^{\gamma_1} \cdots R_K^{\gamma_K}$-biased probability measure. From the definition (4.1) we know that the coefficients of the power series of (6.92) provide the desired distribution on $\mathbb{N}_0^d$. We are aiming for an algorithm that works well for small (and even zero) variances of the risk factors, so we will rewrite our main formulas in terms of the expectations $e_k = \mathbb{E}[R_k]$ and variances $\sigma_k^2 = \operatorname{Var}(R_k)$ for all $k \in \{1,\dots,K\}$ using the formulas
$$\alpha_k = \frac{e_k^2}{\sigma_k^2} \quad\text{and}\quad \beta_k = \frac{e_k}{\sigma_k^2}, \tag{6.93}$$
derived from (4.39) and (4.41).
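The moment matching (6.93) is easy to sanity-check numerically; the following sketch (the function name `gamma_shape_rate` is ours, not part of the text) recovers the given mean and variance from the computed shape and inverse scale:

```python
def gamma_shape_rate(e, var):
    """Convert expectation e and variance var of a gamma-distributed
    risk factor into shape alpha and inverse scale beta, cf. (6.93)."""
    if var <= 0.0:
        raise ValueError("variance must be strictly positive")
    return e * e / var, e / var

alpha, beta = gamma_shape_rate(2.0, 0.5)
assert alpha / beta == 2.0        # mean of Gamma(alpha, beta)
assert alpha / beta**2 == 0.5     # variance of Gamma(alpha, beta)
```

The degenerate case $\sigma_k^2 = 0$ is deliberately excluded here; the algorithm below handles it by rewriting all formulas so that $\alpha_k$ and $\beta_k$ never appear alone.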
Remark 6.52 (Historical remark). The computation of these coefficients, however,
can lead to numerical instabilities even in the one-period case with (γ1, . . . ,
γK) = 0, cf. [24]. Therefore, this section describes an algorithm, basically due
to G. Giese [24], for which Haaf, Reiß, Schoenmakers [27] proved the numerical
stability. Apparently these authors didn’t notice the relation to Panjer’s recursion,
see Theorem 5.8, which was pointed out in [21, Section 5.5]. The algebraic step
of putting everything into a common exponential to pass from (6.91) to (6.92)
reflects the fact that the negative binomial distribution is a compound Poisson
distribution, where the severity distribution is a logarithmic one, see Example
4.27. Since Panjer’s recursion is numerically stable for the Poisson as well as
the logarithmic distribution, see Examples 5.17 and 5.21, respectively, numerical
stability is guaranteed. The idea for the multi-period extension relies on the
multivariate extension of Panjer’s algorithm given by Sundt [47].
6.7.1 Expansion of the Logarithm by Panjer's Recursion

To calculate the coefficients of the power series of (6.92), we first treat the logarithmic term. For this purpose, fix a scenario $j \in \mathcal{J}$ and a risk factor $k \in \{1,\dots,K\}$. Define
$$p_{j,k} = \frac{\lambda_{j,k}}{\beta_k + \lambda_{j,k}} = \frac{\lambda_{j,k}\sigma_k^2}{e_k + \lambda_{j,k}\sigma_k^2} \in [0, 1) \tag{6.94}$$
with inverse scale parameter $\beta_k > 0$, expectation $e_k > 0$ and variance $\sigma_k^2$ from Assumption 6.34 and $\lambda_{j,k} \ge 0$ from (6.11). Note that the right-hand side of (6.94) works fine for the degenerate case $\sigma_k^2 = 0$.
We consider a random variable $M_{j,k} \sim \operatorname{Log}(p_{j,k})$. Let $(Y_{j,k,n})_{n\in\mathbb{N}}$ be an i.i.d. sequence of $\mathbb{N}_0^d$-valued random vectors, independent of $M_{j,k}$, with probability-generating function $\varphi_{j,k}$ defined in (6.87). Then by Example 4.4 and (4.56), the probability-generating function
$$\bar\varphi_{j,k}(s) = \sum_{\nu\in\mathbb{N}_0^d} b_{j,k,\nu}\, s^\nu, \qquad s \in \mathbb{C}^d,\ \|s\|_\infty \le 1,$$
of the $\mathbb{N}_0^d$-valued random sum
$$S_{j,k} := \sum_{n=1}^{M_{j,k}} Y_{j,k,n}$$
is given by
$$\bar\varphi_{j,k}(s) = \varphi_{j,k}(s)\, \frac{c(p_{j,k}\varphi_{j,k}(s))}{c(p_{j,k})}, \qquad s \in \mathbb{C}^d,\ \|s\|_\infty \le 1, \tag{6.95}$$
and its coefficients $(b_{j,k,\nu})_{\nu\in\mathbb{N}_0^d}$ can be computed in a numerically stable way by Panjer's recursion for the logarithmic distribution, see Example 5.21. More explicitly, using (4.6) and (5.17), the initial value is
$$b_{j,k,0} = q_{j,k,0}\, \frac{c(p_{j,k}\, q_{j,k,0})}{c(p_{j,k})}, \tag{6.96}$$
and, using (5.18), the recursion formula is, for every $\nu \in \mathbb{N}_0^d \setminus \{0\}$,
$$b_{j,k,\nu} = \frac{1}{1 - p_{j,k}\, q_{j,k,0}} \biggl(\frac{q_{j,k,\nu}}{c(p_{j,k})} + \frac{p_{j,k}}{\nu_i} \sum_{\substack{n\in S_{j,k}\\ n\le\nu,\ n_i<\nu_i}} (\nu_i - n_i)\, q_{j,k,n}\, b_{j,k,\nu-n}\biggr), \tag{6.97}$$
where $i \in \{1,\dots,d\}$ is chosen such that $\nu_i \neq 0$, and with $p_{j,k}$ given by (6.94), $(q_{j,k,\nu})_{\nu\in\mathbb{N}_0^d}$ given by (6.12), and $S_{j,k}$ defined in (6.10). Note that $\gamma_k$ does not enter into this recursion. If $p_{j,k} = 0$, then (6.96) and (6.97) simplify dramatically to $b_{j,k,\nu} = q_{j,k,\nu}$ for all $\nu \in \mathbb{N}_0^d$. To calculate the function $c$ from (4.5) in a numerically stable way, see the corresponding comment in Example 4.4.
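For orientation, here is a minimal one-dimensional ($d = 1$) sketch of the recursion (6.96)–(6.97) in Python. It assumes that the function $c$ from (4.5) is $c(x) = -\log(1-x)/x$ with $c(0) = 1$, evaluated via `log1p` for stability; the function names are ours:

```python
import math

def c(x):
    """c(x) = -log(1-x)/x for x in [0,1), with c(0) = 1, cf. (4.5);
    assumption: this is the c used in (6.95)-(6.97)."""
    return 1.0 if x == 0.0 else -math.log1p(-x) / x

def compound_log_pmf(p, q, nmax):
    """One-dimensional Panjer recursion (6.96)-(6.97): pmf b[0..nmax]
    of a Log(p)-compound sum with integer severity pmf q[0..]."""
    getq = lambda n: q[n] if n < len(q) else 0.0
    b = [0.0] * (nmax + 1)
    b[0] = getq(0) * c(p * getq(0)) / c(p)                      # (6.96)
    for nu in range(1, nmax + 1):
        s = sum((nu - n) * getq(n) * b[nu - n] for n in range(1, nu))
        b[nu] = (getq(nu) / c(p) + p * s / nu) / (1.0 - p * getq(0))  # (6.97)
    return b
```

A convenient test case: for a severity distribution degenerate at 1 the recursion reproduces the logarithmic probabilities $p^n/(n(-\log(1-p)))$, and for $p = 0$ it returns the severity pmf itself, as noted above.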
Rearranging and using (6.94) shows that
$$1 - \lambda_{j,k}\frac{\varphi_{j,k}(s) - 1}{\beta_k} = \frac{\beta_k + \lambda_{j,k}}{\beta_k}\Bigl(1 - \frac{\lambda_{j,k}}{\beta_k + \lambda_{j,k}}\varphi_{j,k}(s)\Bigr) = \frac{1}{1 - p_{j,k}}\bigl(1 - p_{j,k}\varphi_{j,k}(s)\bigr),$$
hence using (4.5) and (6.95) the logarithmic term in (6.92) can be rewritten as
$$-(\alpha_k + \gamma_k)\log\Bigl(1 - \lambda_{j,k}\frac{\varphi_{j,k}(s) - 1}{\beta_k}\Bigr) = (\alpha_k + \gamma_k)\, p_{j,k}\, c(p_{j,k})\, \bigl(\bar\varphi_{j,k}(s) - 1\bigr), \tag{6.98}$$
so that (6.92) takes the form
$$\varphi_{L,\gamma}(s) = \sum_{j\in\mathcal{J}} \exp\Bigl(\lambda_{j,0}(\varphi_{j,0}(s) - 1) + \sum_{k=1}^{K} (\alpha_k + \gamma_k)\, p_{j,k}\, c(p_{j,k})\, \bigl(\bar\varphi_{j,k}(s) - 1\bigr)\Bigr)\, \mathbb{P}[J = j]. \tag{6.99}$$
6.7.2 Expansion of the Exponential by Panjer's Recursion

To calculate the coefficients of the power series of (6.99), we first rewrite the argument of the exponential function. Define
$$\lambda_j = \lambda_{j,0} + \sum_{k=1}^{K} \underbrace{\frac{\lambda_{j,k}(e_k^2 + \gamma_k\sigma_k^2)}{e_k + \lambda_{j,k}\sigma_k^2}}_{=\,(\alpha_k + \gamma_k)p_{j,k}}\, c(p_{j,k}), \qquad j \in \mathcal{J}, \tag{6.100}$$
with the shape parameter $\alpha_k > 0$, expectation $e_k > 0$ and variance $\sigma_k^2$ given in Assumption 6.34, Poisson intensity $\lambda_{j,0} \ge 0$ given in (6.11), parameter $p_{j,k} \in [0, 1)$ of the logarithmic distribution given in (6.94), and function $c$ defined in (4.5). Note that only non-negative terms are added in (6.100) and that its right-hand side even works in the degenerate case $\sigma_k^2 = 0$; both facts guarantee numerical stability. For every $j \in \mathcal{J}$ with $\lambda_j > 0$, we define
$$\varphi_j(s) = \frac{1}{\lambda_j}\Bigl(\lambda_{j,0}\varphi_{j,0}(s) + \sum_{k=1}^{K} \frac{\lambda_{j,k}(e_k^2 + \gamma_k\sigma_k^2)}{e_k + \lambda_{j,k}\sigma_k^2}\, c(p_{j,k})\, \bar\varphi_{j,k}(s)\Bigr),$$
at least for all $s \in \mathbb{C}^d$ with $\|s\|_\infty \le 1$. Note that the coefficients of the power series
$$\varphi_j(s) = \sum_{\nu\in\mathbb{N}_0^d} c_{j,\nu}\, s^\nu, \qquad s \in \mathbb{C}^d,\ \|s\|_\infty \le 1,$$
are given as convex combinations of the corresponding coefficients of $\varphi_{j,0}$ and $\bar\varphi_{j,1},\dots,\bar\varphi_{j,K}$, which is a numerically stable operation. More explicitly,
$$c_{j,\nu} = \frac{1}{\lambda_j}\Bigl(\lambda_{j,0}\, q_{j,0,\nu} + \sum_{k=1}^{K} b_{j,k,\nu}\, \frac{\lambda_{j,k}(e_k^2 + \gamma_k\sigma_k^2)}{e_k + \lambda_{j,k}\sigma_k^2}\, c(p_{j,k})\Bigr), \qquad \nu \in \mathbb{N}_0^d, \tag{6.101}$$
with $(q_{j,0,\nu})_{\nu\in\mathbb{N}_0^d}$ given by (6.12) or (6.13) and $(b_{j,k,\nu})_{\nu\in\mathbb{N}_0^d}$ given by (6.96) and (6.97). For every $j \in \mathcal{J}$ with $\lambda_j = 0$, we define $\varphi_j(s) = 1$ for all $s \in \mathbb{C}^d$ and
$$c_{j,\nu} = \begin{cases} 1 & \text{for } \nu = 0 \in \mathbb{N}_0^d,\\ 0 & \text{for } \nu \in \mathbb{N}_0^d \setminus \{0\}. \end{cases} \tag{6.102}$$
In every case, $\varphi_j$ is again a probability-generating function, and (6.99) can be written as
$$\varphi_{L,\gamma}(s) = \sum_{j\in\mathcal{J}} \exp\bigl(\lambda_j(\varphi_j(s) - 1)\bigr)\, \mathbb{P}[J = j]. \tag{6.103}$$
Fix a scenario $j \in \mathcal{J}$, let $M_j \sim \operatorname{Poisson}(\lambda_j)$ and consider an independent sequence $(Y_{j,n})_{n\in\mathbb{N}}$ of i.i.d. random variables, each one with probability-generating
function $\varphi_j$. Then by Example 4.3 and (4.56), the probability-generating function $\psi_j$ of the distribution of the random sum
$$S_j := \sum_{n=1}^{M_j} Y_{j,n}$$
is given by
$$\psi_j(s) = \exp\bigl(\lambda_j(\varphi_j(s) - 1)\bigr), \qquad s \in \mathbb{C}^d,\ \|s\|_\infty \le 1,$$
and its coefficients, let's call them $(d_{j,\nu})_{\nu\in\mathbb{N}_0^d}$, can be computed in a numerically stable way by Panjer's recursion for the Poisson distribution, see Example 5.17. Explicitly, (5.12) implies for the initial value
$$d_{j,0} = \exp\bigl(\lambda_j(c_{j,0} - 1)\bigr) \tag{6.104}$$
(in case of numerical underflow, see Remark 5.19 for a remedy) and the recursion formula (5.13) turns, for every $\nu = (\nu_1,\dots,\nu_d) \in \mathbb{N}_0^d \setminus \{0\}$, into
$$d_{j,\nu} = \frac{\lambda_j}{\nu_i} \sum_{\substack{n\in\mathbb{N}_0^d\\ 0<n\le\nu}} n_i\, c_{j,n}\, d_{j,\nu-n}, \tag{6.105}$$
where $i \in \{1,\dots,d\}$ is chosen such that $\nu_i \neq 0$, with $\lambda_j$ given by (6.100) and the coefficients $(c_{j,\nu})_{\nu\in\mathbb{N}_0^d}$ given by (6.101) and (6.102), respectively. See Remark 5.10 to omit terms in (6.105) with value zero.
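The Poisson step (6.104)–(6.105) can likewise be sketched in one dimension (the multivariate version replaces the integer index by a multi-index; the function name is ours):

```python
import math

def compound_poisson_pmf(lam, cpmf, nmax):
    """One-dimensional Panjer recursion (6.104)-(6.105): pmf d[0..nmax]
    of a Poisson(lam)-compound sum with integer severity pmf cpmf[0..]."""
    getc = lambda n: cpmf[n] if n < len(cpmf) else 0.0
    d = [0.0] * (nmax + 1)
    d[0] = math.exp(lam * (getc(0) - 1.0))          # (6.104)
    for nu in range(1, nmax + 1):
        # recursion (6.105); terms with getc(n) == 0 could be skipped
        d[nu] = lam / nu * sum(n * getc(n) * d[nu - n] for n in range(1, nu + 1))
    return d
```

With a severity distribution degenerate at 1, the recursion reproduces the Poisson probabilities themselves, which makes a simple correctness check.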
The weighted probability-generating function (6.103) simplifies to
$$\varphi_{L,\gamma}(s) = \sum_{j\in\mathcal{J}} \psi_j(s)\, \mathbb{P}[J = j], \qquad s \in \mathbb{C}^d,\ \|s\|_\infty \le 1,$$
and the coefficients of this power series are convex combinations of the corresponding coefficients of $(\psi_j)_{j\in\mathcal{J}}$. These operations are numerically stable. Explicitly, the coefficients in (6.80) are determined by
$$\mathbb{P}_\gamma[L = \nu] = \sum_{j\in\mathcal{J}} d_{j,\nu}\, \mathbb{P}[J = j], \qquad \nu \in \mathbb{N}_0^d,$$
with $(d_{j,\nu})_{\nu\in\mathbb{N}_0^d}$ given by (6.104) and (6.105).
Exercise 6.53 (Implementation of the algorithm). Assume that there are $m \in \mathbb{N}$ obligors, where obligor $i \in \{1,\dots,m\}$ has default probability $p_i = 1/(20 + i)$ within one period, and that there is the idiosyncratic cause and $C = 3$ additional default causes. Assume that the loss given default of obligor $i \in \{1,\dots,m\}$ due to cause $c \in \{0,\dots,C\}$ has the distribution $\operatorname{Bin}(i + c,\, i/(2i + 2c))$ and that all susceptibilities are equal to $1/(C + 1)$. Let $\Lambda_1,\dots,\Lambda_C$ be default cause intensities with $\mathbb{E}[\Lambda_c] = 1$ and $\Lambda_c \ge 1/3^c$ for all $c \in \{1,\dots,C\}$. Assume that there are only one-element risk groups and that there are two scenarios $\mathcal{J} = \{0, 1\}$. Extending Example 6.38, let $J$ be $\mathcal{J}$-valued and consider the $(C + 1) \times (K + 1)$-matrix
$$A_J = \begin{pmatrix} 1 & 0 & 0 & 0 & 0\\ * & J & 0 & 0 & 0\\ * & 0 & 1 - J & * & 0\\ * & 0 & 0 & * & * \end{pmatrix}, \tag{6.106}$$
where $*$ denotes non-zero, deterministic entries.

(a) With the given constraints, set up a flexible model satisfying Assumptions 6.34 and 6.35 such that $\operatorname{Cov}(\Lambda_1, \Lambda_2) < 0$ and $\operatorname{Cov}(\Lambda_2, \Lambda_3) > 0$.

(b) Calculate the expectations, variances and covariances of the default cause intensities $\Lambda_1, \Lambda_2, \Lambda_3$ (see Remark 6.32) in your model.

(c) Calculate the expected total credit portfolio loss. Does the result depend on your specific choice of the dependence structure?

(d) Calculate the distribution of the total credit portfolio loss numerically for an $m \ge 50$ of your choice for your specific dependence structure.
6.8 Algorithm for Risk Factors with a Tempered Stable Distribution

6.9 Special Cases

In order to test the algorithm, its implementation and its numerical stability, it is helpful to consider special cases of the parameters, where the corresponding distribution of the total loss $L$ given in (6.19) can be calculated directly. In this section we assume that all group losses are multiples of some $C \in \mathbb{N}$, meaning that we have
$$L_{g,k,n} = C\, L'_{g,k,n}$$
with an $\mathbb{N}_0$-valued $L'_{g,k,n}$ for every loss $n \in \mathbb{N}$ of risk group $g \in \mathcal{G}$ due to risk $k \in \{0,\dots,K\}$. We adopt the notation from (6.15), (6.17) and (6.19). In this section, we will not attribute the group loss to its individual members.
6.9.1 Pure Poisson Case

We only consider the degenerate case $\sigma_1^2 = \cdots = \sigma_K^2 = 0$, for which the algorithm described in Section 6.7 works and for which $\Lambda_k \equiv 1$ almost surely for all $k \in \{1,\dots,K\}$. In this case the family
$$\bigl\{N_{g,k} \bigm| g \in \mathcal{G},\ k \in \{0,\dots,K\}\bigr\}$$
consists of independent, Poisson distributed random variables.
Bernoulli Loss Distribution Assume that every $L'_{g,k,n}$ is a Bernoulli random variable, i.e.,
$$p := \mathbb{P}\bigl[L'_{g,k,n} = 1\bigr] = 1 - \mathbb{P}\bigl[L'_{g,k,n} = 0\bigr]$$
with $p \in [0, 1]$ for all $g \in \mathcal{G}$, $k \in \{0,\dots,K\}$ and $n \in \mathbb{N}$. Then, by (6.9), (??) and (6.11), since every loss $L_{g,k,n} = C L'_{g,k,n}$ takes values in $\{0, C\}$, we have $\lambda_{k,\nu} = 0$ for every $\nu \in \mathbb{N} \setminus \{C\}$ and $\lambda_k = \lambda_{k,C}$ for every risk $k \in \{0,\dots,K\}$. By (??),
$$L'_{g,k} := \sum_{n=1}^{N_{g,k}} L'_{g,k,n} \sim \operatorname{Poisson}(p\lambda_g w_{g,k}).$$
By the Poisson summation property (3.5), we obtain for
$$L' := \sum_{g\in\mathcal{G}} \sum_{k=0}^{K} L'_{g,k} \tag{6.107}$$
that $L' \sim \operatorname{Poisson}(p\lambda)$ with
$$\lambda := \sum_{g\in\mathcal{G}} \lambda_g \underbrace{\sum_{k=0}^{K} w_{g,k}}_{=\,1 \text{ by } (6.2)}. \tag{6.108}$$
Therefore, the distribution of $L = C L'$ satisfies
$$\mathbb{P}(L = l) = \begin{cases} \dfrac{(p\lambda)^n}{n!}\, e^{-p\lambda} & \text{if } n := l/C \in \mathbb{N}_0,\\ 0 & \text{otherwise.} \end{cases}$$
Logarithmic Loss Distribution Assume that every $L'_{g,k,n} \sim \operatorname{Log}(q)$ with $q \in (0, 1)$. According to Example 4.27, the compound Poisson sum $L'_{g,k}$ has the distribution $\operatorname{NegBin}(\alpha_{g,k}, p)$ with parameters $p := 1 - q$ and
$$\alpha_{g,k} := -\frac{\lambda_g w_{g,k}}{\log p} \ge 0.$$
By Lemma 4.24, the sum $L'$ defined in (6.107) has distribution $\operatorname{NegBin}(\alpha, p)$ with $\alpha := -\lambda/\log p$ and $\lambda$ given by (6.108). Therefore, $L = C L'$ satisfies
$$\mathbb{P}(L = l) = \begin{cases} \dbinom{\alpha + n - 1}{n}\, p^\alpha q^n & \text{if } n := l/C \in \mathbb{N}_0,\\ 0 & \text{otherwise.} \end{cases} \tag{6.109}$$
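As an illustration, (6.109) can be evaluated in a numerically robust way via log-gamma functions; the helper names below are ours, not part of the text:

```python
import math

def negbin_pmf(alpha, p, n):
    """NegBin(alpha, p) probability of n, cf. (6.109), computed via
    log-gamma to avoid overflow of the binomial coefficient."""
    logc = math.lgamma(alpha + n) - math.lgamma(alpha) - math.lgamma(n + 1)
    return math.exp(logc + alpha * math.log(p) + n * math.log1p(-p))

def loss_pmf(l, C, lam, q):
    """Distribution of L = C*L' with L' ~ NegBin(alpha, p), where
    p = 1 - q and alpha = -lam/log(p), cf. (6.109)."""
    if l % C:
        return 0.0
    p = 1.0 - q
    alpha = -lam / math.log(p)
    return negbin_pmf(alpha, p, l // C)
```

For $\alpha = 1$ the negative binomial reduces to the geometric distribution $p q^n$, which gives a quick check of the implementation.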
General Loss Distributions Let $Q^{\mathrm{s}}_{g,k} = (q^{\mathrm{s}}_{g,k,\nu})_{\nu\in\mathbb{N}_0}$ be a general distribution for the i.i.d. group losses $(L_{g,k,n})_{n\in\mathbb{N}}$, depending on the group $g \in \mathcal{G}$ and the risk $k \in \{0,\dots,K\}$. Then every $L_{g,k} \sim \operatorname{CPoisson}\bigl(\lambda_g w_{g,k}, Q^{\mathrm{s}}_{g,k}\bigr)$ has a compound Poisson distribution. By (4.58), its generating function is
$$\varphi_{L_{g,k}}(s) = \exp\Bigl(\lambda_g w_{g,k}\Bigl(\underbrace{\sum_{\nu\in\mathbb{N}_0} q^{\mathrm{s}}_{g,k,\nu} s^\nu}_{=\,\varphi_{L_{g,k,1}}(s)} - 1\Bigr)\Bigr). \tag{6.110}$$
Assume that the sum $\lambda$ of all weighted intensities, given by (6.108), is strictly positive. Define the probability distribution $Q = (q_\nu)_{\nu\in\mathbb{N}_0}$ by
$$q_\nu = \frac{1}{\lambda} \sum_{g\in\mathcal{G}} \sum_{k=0}^{K} \lambda_g w_{g,k}\, q^{\mathrm{s}}_{g,k,\nu}, \qquad \nu \in \mathbb{N}_0.$$
Due to independence, the generating function $\varphi_L$ of the total loss $L$ is the product of the individual functions from (6.110), hence
$$\varphi_L(s) = \prod_{g\in\mathcal{G}} \prod_{k=0}^{K} \varphi_{L_{g,k}}(s) = \exp\Bigl(\lambda\Bigl(\sum_{\nu\in\mathbb{N}_0} q_\nu s^\nu - 1\Bigr)\Bigr),$$
in particular $L \sim \operatorname{CPoisson}(\lambda, Q)$ has a compound Poisson distribution. Hence, the distribution of $L$ can be calculated by the Panjer recursion formula (5.13), i.e.,
$$\mathbb{P}[L = l] = \frac{\lambda}{l} \sum_{\nu=1}^{l} \nu\, q_\nu\, \mathbb{P}[L = l - \nu], \qquad l \in \mathbb{N},$$
starting from
$$\mathbb{P}[L = 0] = \varphi_L(0) = e^{\lambda(q_0 - 1)}.$$
6.9.2 Case of Negative Binomial Distribution

Here we assume absence of idiosyncratic risk, meaning that $\lambda_{0,\nu} = 0$ for all $\nu \in \mathbb{N}$ and $\lambda_0 = 0$, see (6.9) and (6.11).

Bernoulli Loss Distribution Assume that $L'_{g,k,n}$ is a Bernoulli random variable with risk-dependent distribution, i.e.,
$$p_k := \mathbb{P}\bigl[L'_{g,k,n} = 1\bigr] = 1 - \mathbb{P}\bigl[L'_{g,k,n} = 0\bigr]$$
with $p_k \in [0, 1]$ for all $g \in \mathcal{G}$, $k \in \{1,\dots,K\}$ and $n \in \mathbb{N}$. Then, by (6.9) and (6.11), $\lambda_{k,\nu} = 0$ for every $\nu \in \mathbb{N} \setminus \{C\}$ and $\lambda_k = \lambda_{k,C}$ for every risk $k \in \{1,\dots,K\}$. Furthermore, assume that there exist a non-empty $I \subset \{1,\dots,K\}$ and $p \in (0, 1)$ such that $\sigma_k^2 \lambda_k = (1 - p)/p$ for all $k \in I$ and $\lambda_k = 0$ for all $k \in \{1,\dots,K\} \setminus I$. By (??) this means $\nu_k = C$ for all $k \in I$ and $\nu_k = 0$ for all $k \in \{1,\dots,K\} \setminus I$. Define
$$\alpha = \sum_{k\in I} \frac{1}{\sigma_k^2}.$$
Then (??) simplifies to
$$\mathbb{E}\bigl[s^L\bigr] = \Bigl(1 + \frac{1-p}{p}\bigl(1 - s^C\bigr)\Bigr)^{-\alpha} = \Bigl(\frac{p}{1 - q s^C}\Bigr)^{\alpha}$$
with $q := 1 - p$, which by (4.50) means that $L' := L/C \sim \operatorname{NegBin}(\alpha, p)$, hence $L$ has the distribution given by (6.109).
General Loss Distributions We assume that the i.i.d. losses $(L_{g,k,n})_{n\in\mathbb{N}}$ have the same distribution $Q = (q_\nu)_{\nu\in\mathbb{N}_0}$ for every group $g \in \mathcal{G}$ and every risk $k \in \{1,\dots,K\}$. Since $\mathcal{L}(N_{g,k} \mid \Lambda_k) \overset{\text{a.s.}}{=} \operatorname{Poisson}(\lambda_g w_{g,k}\Lambda_k)$ by Assumption 6.29, and since $(N_{g,k})_{g\in\mathcal{G}}$ are conditionally independent given $\Lambda_k$ by Assumption 6.30, Lemma 3.2 for sums of independent Poisson random variables implies that
$$\mathcal{L}(N(k) \mid \Lambda_k) \overset{\text{a.s.}}{=} \operatorname{Poisson}\bigl(\lambda(k)\Lambda_k\bigr)$$
for every $k \in \{1,\dots,K\}$, where
$$N(k) := \sum_{g\in\mathcal{G}} N_{g,k} \quad\text{and}\quad \lambda(k) := \sum_{g\in\mathcal{G}} \lambda_g w_{g,k}.$$
Here $N(k)$ is the number of defaults in the portfolio caused by risk $k \in \{1,\dots,K\}$. Since $\Lambda_k \sim \Gamma(\alpha_k, \beta_k)$ with $\alpha_k = \beta_k = 1/\sigma_k^2$ by Assumption 6.34, hence
$$\lambda(k)\Lambda_k \sim \Gamma(\alpha_k, \beta_k/\lambda(k)),$$
we get for the unconditional distribution that
$$N(k) \sim \operatorname{NegBin}(\alpha_k, p_k) \quad\text{with}\quad p_k := \frac{\beta_k/\lambda(k)}{1 + \beta_k/\lambda(k)} = \frac{1}{1 + \lambda(k)\sigma_k^2},$$
where we use the notation from (4.46). Assuming that $\lambda(k)\sigma_k^2$ and, therefore, $p := p_k$ are the same for every risk $k \in \{1,\dots,K\}$, we get for the total number $N := \sum_{k=1}^{K} N(k)$ of defaults caused by all the independent risk factors that
$$N \sim \operatorname{NegBin}(\alpha, p) \quad\text{with}\quad \alpha := \alpha_1 + \cdots + \alpha_K,$$
see Lemma 4.24. Therefore we have a compound negative binomial distribution for the loss $L$ given in (6.19), meaning that
$$L = \sum_{g\in\mathcal{G}} \sum_{k=1}^{K} \sum_{n=1}^{N_{g,k}} L_{g,k,n} \overset{d}{=} \sum_{n=1}^{N} X_n \sim \operatorname{CNegBin}(\alpha, p, Q)$$
with an i.i.d. sequence $(X_n)_{n\in\mathbb{N}}$ with $X_n \sim Q$. Therefore, the distribution of $L$ can be calculated by the Panjer recursion formula (5.15),
$$\mathbb{P}[L = l] = \frac{1 - p}{\bigl(1 - (1 - p)q_0\bigr)\, l} \sum_{\nu=1}^{l} (\alpha\nu + l - \nu)\, q_\nu\, \mathbb{P}[L = l - \nu], \qquad l \in \mathbb{N},$$
starting from
$$\mathbb{P}[L = 0] = \varphi_N(q_0) = \Bigl(\frac{p}{1 - (1 - p)q_0}\Bigr)^{\alpha},$$
see (5.14).
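The compound negative binomial recursion (5.15) used above can be sketched as follows (function name ours; the severity pmf is truncated to a list):

```python
def compound_negbin_pmf(alpha, p, q, lmax):
    """Panjer recursion for L ~ CNegBin(alpha, p, Q), cf. (5.15):
    returns P[L = 0], ..., P[L = lmax] for severity pmf q[0..]."""
    getq = lambda n: q[n] if n < len(q) else 0.0
    denom = 1.0 - (1.0 - p) * getq(0)
    P = [0.0] * (lmax + 1)
    P[0] = (p / denom) ** alpha                      # starting value, (5.14)
    for l in range(1, lmax + 1):
        s = sum((alpha * nu + l - nu) * getq(nu) * P[l - nu]
                for nu in range(1, l + 1))
        P[l] = (1.0 - p) * s / (l * denom)
    return P
```

With a severity distribution degenerate at 1, the recursion reproduces the plain $\operatorname{NegBin}(\alpha, p)$ probabilities, which serves as a test case.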
Exercise 6.54. Consider a logarithmic distribution for the idiosyncratic losses and a Bernoulli distribution for the losses due to the risks $k \in \{1,\dots,K\}$, everything in multiples of $C \in \mathbb{N}$. By combining the above results and putting appropriate conditions on the parameters, show that the portfolio loss $L$ has a distribution given by (6.109).
7 Risk Measures and Risk Contributions
Knowing the distribution of the portfolio loss L given in (6.19), we can calculate
risk measures ρ(L). The quantity ρ(L) can be interpreted as the amount of
money that has to be added to the portfolio risk L to make it “acceptable.” For
expected shortfall as risk measure, we will also calculate risk contributions in the
context of extended CreditRisk+. These contributions indicate the conditional
expected loss caused by individual obligors, given that a large portfolio loss occurs.
When comparing some of the following definitions with the literature, note
that our losses have a positive sign.
7.1 Quantiles and Value-at-Risk

Definition 7.1. For a real-valued random variable $X$ and a level $\delta \in (0, 1)$, define the lower $\delta$-quantile of $X$ by
$$q_\delta(X) = \min\{x \in \mathbb{R} \mid \mathbb{P}[X \le x] \ge \delta\} \tag{7.1}$$
and the upper $\delta$-quantile of $X$ by
$$q^\delta(X) = \inf\{x \in \mathbb{R} \mid \mathbb{P}[X \le x] > \delta\}. \tag{7.2}$$
Since the distribution function $\mathbb{R} \ni x \mapsto F_X(x) = \mathbb{P}[X \le x]$ of $X$ is right-continuous, the minimum defining the lower quantile exists. Note that the quantiles depend on $X$ only via the distribution function $F_X$. If we don't specify lower/upper in the following, we always refer to the lower quantile. Obviously, we always have that $q_\delta(X) \le q^\delta(X)$.

Exercise 7.2. Give an example where $q_\delta(X) < q^\delta(X)$.

The lower quantile is the smallest threshold such that $q_\delta(X) - X$ is non-negative with probability at least $\delta$. In financial risk management, the lower quantile $q_\delta(X)$ of a loss variable $X$ is called Value-at-Risk (VaR) at level $1 - \delta$ and used as a tool to quantify risk. Rewriting (7.1) as
$$q_\delta(X) = \min\{x \in \mathbb{R} \mid \mathbb{P}[X > x] \le 1 - \delta\},$$
we see that $q_\delta(X)$ is the smallest threshold which is exceeded by the loss $X$ with probability at most $1 - \delta$.
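For a discrete distribution given by value–probability pairs, Definition 7.1 translates directly into code; this is a sketch with our own function names, and floating-point comparison of cumulative probabilities needs care in practice:

```python
def lower_quantile(pmf, delta):
    """Lower delta-quantile (7.1) of a distribution given as
    (value, probability) pairs: smallest x with P[X <= x] >= delta."""
    total = 0.0
    for x, prob in sorted(pmf):
        total += prob
        if total >= delta:
            return x
    return max(x for x, _ in pmf)

def upper_quantile(pmf, delta):
    """Upper delta-quantile (7.2): infimum of x with P[X <= x] > delta."""
    total = 0.0
    for x, prob in sorted(pmf):
        total += prob
        if total > delta:
            return x
    return max(x for x, _ in pmf)
```

Running both functions on a distribution whose distribution function hits the level $\delta$ exactly shows how the two quantiles can differ, which is exactly the situation of Exercise 7.2 below.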
Exercise 7.3. Give an example where $(0, 1) \ni \delta \mapsto q_\delta(X)$ is discontinuous.

The following example shows that small variations of $X$ can lead to substantial jumps of the quantile $q_\delta(X)$; the subsequent lemma gives a condition under which this does not happen.

Example 7.4. Consider the unit interval $\Omega = [0, 1]$ equipped with the Borel $\sigma$-algebra $\mathcal{B}([0, 1])$. Let $\mathbb{P}$ denote the Lebesgue measure restricted to $\mathcal{B}([0, 1])$. Given a level $\delta \in (0, 1)$ and $n \in \mathbb{N}$, define $\delta_n = \max\{0, \delta - 1/n\}$ and the Bernoulli random variables $X_n = 1_{[\delta_n, 1]}$ and $X = 1_{[\delta, 1]}$. Then $X_n \searrow X$ pointwise as $n \to \infty$, $q_\delta(X_n) = 1$ for all $n \in \mathbb{N}$ but $q_\delta(X) = 0$.

Exercise 7.5. Modify Example 7.4 such that $X_n \nearrow X$ pointwise as $n \to \infty$, $q_\delta(X_n) = 0$ for all $n \in \mathbb{N}$ but $q_\delta(X) = 1$.
Lemma 7.6. Fix a level $\delta \in (0, 1)$. Let $(X_n)_{n\in\mathbb{N}}$ be a sequence of real-valued random variables converging to $X$ in probability, i.e.,
$$\lim_{n\to\infty} \mathbb{P}[\,|X - X_n| \ge \varepsilon\,] = 0 \quad\text{for every } \varepsilon > 0.$$

(a) The lower $\delta$-quantiles satisfy
$$\liminf_{n\to\infty} q_\delta(X_n) \ge q_\delta(X).$$

(b) The upper $\delta$-quantiles satisfy
$$\limsup_{n\to\infty} q^\delta(X_n) \le q^\delta(X).$$

(c) If the distribution of $X$ satisfies $\mathbb{P}[X \le x] > \delta$ for all $x > q_\delta(X)$, which is equivalent to $q_\delta(X) = q^\delta(X)$, then
$$\lim_{n\to\infty} q_\delta(X_n) = q_\delta(X) \quad\text{and}\quad \lim_{n\to\infty} q^\delta(X_n) = q^\delta(X).$$

Proof. (a) If $x < y < q_\delta(X)$, then
$$\mathbb{P}[X_n \le x] \le \mathbb{P}[X \le y] + \underbrace{\mathbb{P}[\,|X - X_n| \ge y - x\,]}_{\to\, 0 \text{ as } n\to\infty},$$
hence
$$\limsup_{n\to\infty} \mathbb{P}[X_n \le x] \le \gamma := \mathbb{P}[X \le y] < \delta$$
by the definition of $q_\delta(X)$ in (7.1). Therefore $\mathbb{P}[X_n \le x] \le (\delta + \gamma)/2 < \delta$ for all sufficiently large $n \in \mathbb{N}$, hence $q_\delta(X_n) \ge x$ for these $n$ and $\liminf_{n\to\infty} q_\delta(X_n) \ge x$. Since $x < q_\delta(X)$ was arbitrary, the lower bound in (a) follows.

(b) The proof is very similar to part (a). If $x > y > q^\delta(X)$, then
(e) Monotonicity: If $X \le Y$, then $\operatorname{ES}_\delta[X] \le \operatorname{ES}_\delta[Y]$.

(f) Convexity: If $\alpha \in (0, 1)$, then
$$\operatorname{ES}_\delta[\alpha X + (1 - \alpha)Y] \le \alpha \operatorname{ES}_\delta[X] + (1 - \alpha) \operatorname{ES}_\delta[Y].$$

(g) Minimization property:
$$\operatorname{ES}_\delta[X] = \min_{q\in\mathbb{R}} \Bigl(q + \frac{\mathbb{E}[(X - q)^+]}{1 - \delta}\Bigr),$$
and the minimum is attained if and only if $q \in [q_\delta(X), q^\delta(X)]$.

(h) Bounds: For every $q \in \mathbb{R}$,
$$q_\delta(X) \le \operatorname{ES}_\delta[X] \le q + \frac{\mathbb{E}[(X - q)^+]}{1 - \delta},$$
where the lower bound is an equality if and only if $\mathbb{P}[X \le q_\delta(X)] = 1$, and the upper bound is an equality if and only if $q \in [q_\delta(X), q^\delta(X)]$.

(i) Quantile representation:
$$\operatorname{ES}_\delta[X] = \frac{1}{1 - \delta} \int_{[\delta, 1)} q_u(X)\, \mathrm{d}u.$$

(j) Let $(X_n)_{n\in\mathbb{N}}$ be bounded below, i.e., there exists a constant $a \in [0, \infty)$ such that $X_n \ge -a$ for all $n \in \mathbb{N}$. Then $X := \liminf_{n\to\infty} X_n$ satisfies
$$\operatorname{ES}_\delta[X] \le \liminf_{n\to\infty} \operatorname{ES}_\delta[X_n]. \tag{7.14}$$

(k) Let $(X_n)_{n\in\mathbb{N}}$ be bounded below and converging in probability to a random variable $X$. Then (7.14) holds, too.
Corollary 7.21. For every real-valued random variable $X$, the map
$$(0, 1) \ni \delta \mapsto \operatorname{ES}_\delta[X] \in \mathbb{R} \cup \{\infty\}$$
is continuous and non-decreasing.

Proof of Corollary 7.21. Continuity follows from the quantile representation in part (i). For $\delta \le \delta'$ we have $\mathcal{F}_{\delta,X} \subset \mathcal{F}_{\delta',X}$, which implies $\operatorname{ES}_\delta[X] \le \operatorname{ES}_{\delta'}[X]$ by the scenario representation (c).
Remark 7.22. A coherent risk measure is defined by monotonicity, positive homogeneity, translation invariance and sub-additivity, cf. Artzner, Delbaen, Eber and Heath [3]. A convex risk measure is defined by monotonicity, translation invariance and convexity, cf. Föllmer and Schied [20]. Note that risk measures are often defined for random variables representing the profit and loss, while in our notation losses have a positive sign. For more details on expected shortfall, see Acerbi and Tasche [1]. The minimization property (g) can be found in Rockafellar and Uryasev [42].

Remark 7.23. We excluded the cases $\alpha = 0$ in (a) and (f), and $\alpha = 1$ in (f), to avoid expressions of the form $0 \cdot \infty$.
Remark 7.24. Concerning the properties in Lemma 7.20, some comments might be useful:

(a) If all losses are scaled, then the risk and the needed capital scale in the same way.

(b) If a constant loss is added, the corresponding amount of capital is needed in addition to make the risk acceptable.

(c) If probabilities of events can be raised by at most the factor $1/(1 - \delta)$, then $\operatorname{ES}_\delta[X]$ is the worst expected loss possible.

(d), (f) Diversification does not increase the risk.

(e) Smaller losses need less capital.

(g) For an economic interpretation, assume that you can choose an amount $q$ and enter into a special stop-loss insurance contract such that, whenever your loss $X$ is above $q$, you must pay the fair insurance premium $\mathbb{E}[(X - q)^+]$ multiplied by the security loading factor $\frac{1}{1-\delta}$ and receive in return the (possibly smaller, maybe higher) amount $X - q$ to cover your losses above $q$. Which $q$ is optimal for you, and how much do you lose given that the loss $X$ exceeds $q$? If $q$ is too high, your deductible is high when $X > q$ happens; if $q$ is too small, your premium is high when $X > q$ happens; the optimal compromise is given by $q \in [q_\delta(X), q^\delta(X)]$.

(i) The quantile representation implies that the expected shortfall varies continuously with the level $\delta$, contrary to the quantile function $(0, 1) \ni \delta \mapsto q_\delta(X)$, which can jump, cf. Exercise 7.3. For discrete distributions like the loss distribution in the extended CreditRisk+ model, the quantile function has to jump unless the loss is degenerate. The quantile representation also justifies the name average value-at-risk for expected shortfall.

(k) implies the Fatou property discussed in Delbaen [14].
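For a discrete loss distribution, formula (7.5) gives a direct way to compute the expected shortfall; the following sketch (function name ours) also lets one check the minimization property (g) numerically:

```python
def expected_shortfall(pmf, delta):
    """ES_delta of a discrete distribution given as (value, probability)
    pairs, via (E[X 1_{X>q}] + q (P[X<=q] - delta)) / (1 - delta)
    with q the lower delta-quantile (7.1), cf. (7.5)."""
    pts = sorted(pmf)
    total, q = 0.0, pts[-1][0]
    for x, prob in pts:
        total += prob
        if total >= delta:
            q = x
            break
    tail = sum(prob * x for x, prob in pts if x > q)
    cdf_q = sum(prob for x, prob in pts if x <= q)
    return (tail + q * (cdf_q - delta)) / (1.0 - delta)
```

Minimizing $q + \mathbb{E}[(X - q)^+]/(1 - \delta)$ over a grid of candidate values of $q$ reproduces the same number, in line with the minimization property (g).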
Proof of Lemma 7.20. (a) follows from the homogeneity of the expectation and the observation that $q_\delta(\alpha X) = \alpha q_\delta(X)$.

(b) holds because of the translation invariance of the expectation and the observation that $q_\delta(X + a) = q_\delta(X) + a$.

(c) Remark 7.18, in particular (7.11), shows that equality holds for $f_X \in \mathcal{F}_{\delta,X}$. Therefore, the supremum is an upper estimate and (i) holds in the case $\operatorname{ES}_\delta[X] = \infty$. If $\operatorname{ES}_\delta[X] < \infty$, then necessarily $\mathbb{E}[X^+] < \infty$, hence $\mathcal{F}_{\delta,X} = \mathcal{F}_\delta$. Consider $f \in \mathcal{F}_\delta$ with $\mathbb{E}[Xf] > -\infty$. We have $\mathbb{E}[f - f_X] = 0$, hence
$$\begin{aligned}
\mathbb{E}[Xf] - \mathbb{E}[Xf_X] &= \mathbb{E}[(X - q_\delta(X))(f - f_X)]\\
&= \mathbb{E}\bigl[(\underbrace{X - q_\delta(X)}_{>\,0})(\underbrace{f - f_X}_{\le\,0})\, 1_{\{X > q_\delta(X)\}}\bigr] + \mathbb{E}\bigl[(\underbrace{X - q_\delta(X)}_{<\,0})(\underbrace{f - f_X}_{\ge\,0})\, 1_{\{X < q_\delta(X)\}}\bigr] \le 0,
\end{aligned}$$
which means that the supremum is identical with $\mathbb{E}[Xf_X]$.
(d) It suffices to consider the case where $\operatorname{ES}_\delta[X] < \infty$ and $\operatorname{ES}_\delta[Y] < \infty$. Then $\mathbb{E}[X^+]$, $\mathbb{E}[Y^+]$ and $\mathbb{E}[(X + Y)^+]$ are finite and with the representation from (c), part (ii), we get
$$\operatorname{ES}_\delta[X + Y] = \sup_{f\in\mathcal{F}_\delta} \mathbb{E}[(X + Y)f] \le \sup_{f\in\mathcal{F}_\delta} \mathbb{E}[Xf] + \sup_{f\in\mathcal{F}_\delta} \mathbb{E}[Yf] = \operatorname{ES}_\delta[X] + \operatorname{ES}_\delta[Y].$$

(e) Note that $\operatorname{ES}_\delta[X] \le \operatorname{ES}_\delta[X - Y] + \operatorname{ES}_\delta[Y]$ by subadditivity (d). For $Z := X - Y \le 0$, we have $\operatorname{ES}_\delta[Z] \le 0$ according to (7.5), because $\mathbb{E}\bigl[Z 1_{\{Z > q_\delta(Z)\}}\bigr] \le 0$ and $q_\delta(Z) \le 0$ for a non-positive random variable and $\mathbb{P}[Z \le q_\delta(Z)] \ge \delta$ by the definition of the lower quantile.

(f) follows from (d) and (a).
(g) By the alternative representation (7.7), we have equality for $q = q_\delta(X)$. Note that $X - q_\delta(X) = (q - q_\delta(X)) + (X - q)$ for every $q \in \mathbb{R}$. Consider the case $q > q_\delta(X)$. Then
$$(X - q_\delta(X))^+ \le (q - q_\delta(X))\, 1_{\{X > q_\delta(X)\}} + (X - q)^+$$
with strict inequality precisely on the event $\{q_\delta(X) < X < q\}$. Adding $q_\delta(X)(1 - \delta)$ to both sides and taking expectations, it follows that
$$\begin{aligned}
q_\delta(X)(1 - \delta) + \mathbb{E}\bigl[(X - q_\delta(X))^+\bigr]
&\le q_\delta(X)(1 - \delta) + (\underbrace{q - q_\delta(X)}_{>\,0})\, \underbrace{\mathbb{P}[X > q_\delta(X)]}_{\le\, 1-\delta \text{ by } (7.1)} + \mathbb{E}\bigl[(X - q)^+\bigr]\\
&\le q(1 - \delta) + \mathbb{E}\bigl[(X - q)^+\bigr]
\end{aligned}$$
with equality if and only if $\mathbb{P}[\,q_\delta(X) < X < q\,] = 0$ and $\mathbb{P}[X \le q_\delta(X)] = \delta$, which by (7.1) and (7.2) is equivalent to $q_\delta(X) < q \le q^\delta(X)$. Finally, consider the case $q < q_\delta(X)$. Then
$$(X - q_\delta(X))^+ \le (q - q_\delta(X))\, 1_{\{X \ge q_\delta(X)\}} + (X - q)^+$$
with strict inequality precisely on the event $\{q < X < q_\delta(X)\}$. It follows that
$$\begin{aligned}
q_\delta(X)(1 - \delta) + \mathbb{E}\bigl[(X - q_\delta(X))^+\bigr]
&\le q_\delta(X)(1 - \delta) + (\underbrace{q - q_\delta(X)}_{<\,0})\, \underbrace{\mathbb{P}[X \ge q_\delta(X)]}_{\ge\, 1-\delta \text{ by } (7.1)} + \mathbb{E}\bigl[(X - q)^+\bigr]\\
&\le q(1 - \delta) + \mathbb{E}\bigl[(X - q)^+\bigr]
\end{aligned}$$
with equality if and only if $\mathbb{P}[q < X < q_\delta(X)] = 0$ and $\mathbb{P}[X < q_\delta(X)] = \delta$. By the minimizing property of the lower quantile $q_\delta(X)$ defined in (7.1), these two conditions cannot be satisfied simultaneously for a $q < q_\delta(X)$.
(h) The lower bound together with the discussion of equality follows directly from the alternative representation (7.7), the upper bound follows from (g).

(i) By extending the probability space if necessary, we may assume the existence of a random variable $U$ on $(\Omega, \mathcal{A}, \mathbb{P})$ which is uniformly distributed on $(0, 1)$, meaning that $\mathbb{P}[U \le u] = u$ for all $u \in [0, 1]$. Let $q_U(X)$ denote the random quantile $\Omega \ni \omega \mapsto q_{U(\omega)}(X)$. For every $x \in \mathbb{R}$ and $u \in (0, 1)$ we have
$$q_u(X) \le x \implies \mathbb{P}[X \le x] \ge u \qquad\text{and}\qquad q_u(X) > x \implies \mathbb{P}[X \le x] < u$$
by the definition (7.1) of the lower quantile, hence
$$\mathbb{P}[q_U(X) \le x] = \mathbb{P}\bigl[U \le \mathbb{P}[X \le x]\bigr] = \mathbb{P}[X \le x]$$
for all $x \in \mathbb{R}$, meaning that $q_U(X)$ and $X$ have the same distribution.

Define $\delta' = \mathbb{P}[X \le q_\delta(X)]$. Note that $\delta' \ge \delta$ and $q_u(X) = q_\delta(X)$ for every $u \in [\delta, \delta']$. Using the above implications for $x = q_\delta(X)$ shows that $\{U > \delta'\} = \{q_U(X) > q_\delta(X)\}$. Therefore,
$$\begin{aligned}
\int_{[\delta,1)} q_u(X)\, \mathrm{d}u &= \int_{(\delta',1)} q_u(X)\, \mathrm{d}u + \int_{[\delta,\delta']} q_u(X)\, \mathrm{d}u\\
&= \mathbb{E}\bigl[q_U(X) 1_{\{U > \delta'\}}\bigr] + q_\delta(X)(\delta' - \delta)\\
&= \mathbb{E}\bigl[X 1_{\{X > q_\delta(X)\}}\bigr] + q_\delta(X)\bigl(\mathbb{P}[X \le q_\delta(X)] - \delta\bigr).
\end{aligned}$$
Division by $1 - \delta$ gives the right-hand side of (7.5), which is the result.

(j) By translation invariance from (b), we may assume without loss of generality that every $X_n$ is non-negative. Using the density $f_X$ from (7.8), the representation
of expected shortfall with the density $f_X$ given in (7.11), Fatou's lemma for $(X_n f_X)_{n\in\mathbb{N}}$ and the scenario representation from (c), we get
$$\operatorname{ES}_\delta[X] = \mathbb{E}[Xf_X] \le \liminf_{n\to\infty} \underbrace{\mathbb{E}[X_n f_X]}_{\le\,\operatorname{ES}_\delta[X_n]}.$$

(k) By passing to a subsequence if necessary, we may assume that the sequence $(\operatorname{ES}_\delta[X_n])_{n\in\mathbb{N}}$ converges to the limit inferior in (7.14). By passing to a further subsequence if necessary, we may assume that $(X_n)_{n\in\mathbb{N}}$ converges almost surely to $X$. Now, (7.14) follows from (j).
If we have an estimate for the Wasserstein distance of two distributions, see Definition 3.14, then we get bounds for the expected shortfall of these distributions.

Lemma 7.25 (Expected shortfall and Wasserstein distance). Let $X$ and $Y$ be real-valued, integrable random variables and denote the Wasserstein distance of their distributions by $d_{\mathrm{W}}(\mathcal{L}(X), \mathcal{L}(Y))$. Then the expected shortfall of $X$ and $Y$ satisfies, for every level $\delta \in (0, 1)$,
$$\bigl|\operatorname{ES}_\delta[X] - \operatorname{ES}_\delta[Y]\bigr| \le \frac{d_{\mathrm{W}}(\mathcal{L}(X), \mathcal{L}(Y))}{1 - \delta}. \tag{7.15}$$

Proof. Let $(a_i)_{i\in I}$ and $(b_i)_{i\in I}$ be non-empty collections of real numbers which are bounded below. Define
$$a = \inf_{i\in I} a_i, \qquad b = \inf_{i\in I} b_i \qquad\text{and}\qquad c = \sup_{i\in I} |a_i - b_i|.$$
Then $a_i \le b_i + c$ for every $i \in I$, hence $a \le b + c$. Similarly $b \le a + c$, hence $|a - b| \le c$. Using this observation and the minimization property from Lemma 7.20(g), it follows that
$$\bigl|\operatorname{ES}_\delta[X] - \operatorname{ES}_\delta[Y]\bigr| \le \frac{1}{1 - \delta} \sup_{q\in\mathbb{R}} \bigl|\mathbb{E}\bigl[(X - q)^+\bigr] - \mathbb{E}\bigl[(Y - q)^+\bigr]\bigr|.$$
For every $q \in \mathbb{R}$, the function $\mathbb{R} \ni x \mapsto (x - q)^+$ is Lipschitz continuous with constant 1, hence (7.15) follows directly from the lower bound (3.15).
7.3 Contributions to Expected Shortfall
If the risk and the necessary risk capital for a portfolio loss are calculated
with expected shortfall, the question about the risk contributions of individual
components of the portfolio arises. Let L0(P) = L0(Ω,A,P) denote the vector
space of all random variables X: Ω → R on the probability space (Ω,A,P).
Let L1−(P) denote the cone of those X ∈ L0(P) for which the negative part
X− = max{0, −X} is P-integrable. Let L1(P) denote the vector space of all
P-integrable X ∈ L0(P).
Then, if Z ∈ L0(P) denotes a portfolio loss and X1, . . . , Xn ∈ L1−(P) with
X1 + · · ·+Xn = Z denote the losses of the n subportfolios, we can ask how to
allocate the risk capital ESδ[Z] to the n subportfolios in a fair and risk-adequate
way.
Definition 7.26 (Allocation of risk capital by expected shortfall). For a portfolio
loss Z ∈ L0(P) and a level δ ∈ (0, 1), consider a subportfolio loss X ∈ L0(P)
with X1{Z≥qδ(Z)} ∈ L1−(P). Then the expected shortfall contribution of the
subportfolio loss X to Z at level δ is defined by
\[
\mathrm{ES}_\delta[X, Z]
= \frac{\mathbb{E}[X \mathbb{1}_{\{Z > q_\delta(Z)\}}]
  + \beta_Z\, \mathbb{E}[X \mathbb{1}_{\{Z = q_\delta(Z)\}}]}{1 - \delta}
\tag{7.16}
\]
with βZ as in (7.9), i. e.
\[
\beta_Z =
\begin{cases}
\dfrac{\mathbb{P}[Z \le q_\delta(Z)] - \delta}{\mathbb{P}[Z = q_\delta(Z)]}
  & \text{if } \mathbb{P}[Z = q_\delta(Z)] > 0,\\[1ex]
0 & \text{otherwise.}
\end{cases}
\tag{7.17}
\]
Remark 7.27. Note that ESδ[X,Z] = ∞ is possible and that the condition
X1{Z≥qδ(Z)} ∈ L1−(P) is certainly satisfied for all X ∈ L1−(P).
Remark 7.28. If P[Z ≤ qδ(Z)] = δ, then βZ = 0 and (7.16) simplifies to
ESδ[X,Z] = E[X | Z > qδ(Z)],
cf. Remark 7.14. Therefore, ESδ[X,Z] is the conditional expectation of the
subportfolio loss X given that a large portfolio loss occurs. This allocation
principle was already presented in [46].
Remark 7.29. With the density fZ defined as in (7.8), we get the representation
ESδ[X,Z] = E[XfZ ].
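To illustrate formula (7.16), here is a Python sketch (the helper es_contribution and the sampling setup are illustrative names, not notation from the notes) that computes the expected shortfall contributions of two subportfolios on an empirical measure with integer-valued losses, so that the quantile atom P[Z = qδ(Z)] is large and the correction βZ from (7.17) genuinely matters. By linearity and consistency of the contribution, the two contributions add up to ESδ[Z], which the sketch computes independently as the average of the upper (1 − δ)-tail of the sorted portfolio losses.

```python
import math
import random

def es_contribution(xs, zs, delta):
    # Expected shortfall contribution (7.16) for the empirical measure that
    # puts weight 1/n on each scenario i (an ad-hoc helper for illustration).
    n = len(zs)
    q = sorted(zs)[math.ceil(delta * n) - 1]         # lower delta-quantile of Z
    p_eq = sum(1 for z in zs if z == q) / n          # P[Z = q]
    p_le = sum(1 for z in zs if z <= q) / n          # P[Z <= q]
    beta = (p_le - delta) / p_eq if p_eq > 0 else 0.0  # beta_Z from (7.17)
    tail = sum(x for x, z in zip(xs, zs) if z > q) / n   # E[X 1_{Z > q}]
    atom = sum(x for x, z in zip(xs, zs) if z == q) / n  # E[X 1_{Z = q}]
    return (tail + beta * atom) / (1 - delta)

random.seed(1)
delta, n = 0.95, 1000
# integer-valued subportfolio losses, so ties at the quantile are frequent
x1 = [random.randint(0, 5) for _ in range(n)]
x2 = [random.randint(0, 5) for _ in range(n)]
z = [a + b for a, b in zip(x1, x2)]

# full allocation: the contributions add up to ES_delta[Z], computed
# independently as the mean of the largest (1 - delta) * n sorted losses
k = round((1 - delta) * n)
es_z = sum(sorted(z)[n - k:]) / k
total = es_contribution(x1, z, delta) + es_contribution(x2, z, delta)
assert abs(total - es_z) < 1e-9
```

Without the βZ term, scenarios with Z exactly at the quantile would be either fully counted or fully ignored, and the contributions would in general not sum to ESδ[Z] when the loss distribution has atoms.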
7.3.1 Theoretical Properties
Allocation of risk capital by the expected shortfall principle has a number of good
properties. For an axiomatic approach to risk capital allocation, see Kalkbrener
[31].
Lemma 7.30. Expected shortfall contribution at level δ ∈ (0, 1) has, for all
X,Y ∈ L1−(P) and Z ∈ L0(P), the following properties:
(a) Consistency with expected shortfall: ESδ[Z,Z] = ESδ[Z].
(b) Diversification: ESδ[X,Z] ≤ ESδ[X,X].
(c) Linearity: For all α, β > 0,
ESδ[αX + βY, Z] = αESδ[X,Z] + β ESδ[Y,Z].
If X,Y ∈ L1(P), the equality holds for all α, β ∈ R.
(d) Translation (or cash) invariance: If a ∈ R, then
ESδ[X + a, Z] = ESδ[X,Z] + a.
(e) Monotonicity: If X ≤ Y , then ESδ[X,Z] ≤ ESδ[Y, Z].
(f) Independence: If X and Z are independent, then ESδ[X,Z] = E[X].
(g) Invariance of portfolio scale: ESδ[X,αZ] = ESδ[X,Z] for all α > 0.
(h) Subportfolio continuity: If Y ∈ L1(P), then
\[
\bigl|\mathrm{ES}_\delta[X, Z] - \mathrm{ES}_\delta[Y, Z]\bigr|
\le \mathrm{ES}_\delta[|X - Y|, Z]
\le \frac{\mathbb{E}[|X - Y|]}{1 - \delta}.
\]
(i) Portfolio continuity: Suppose that X ∈ L1(P). If P[Z ≤ qδ(Z)] = δ or if
X is almost surely constant on {Z = qδ(Z)}, then capital allocation for X
by expected shortfall at level δ is continuous at Z, i.e., for every sequence
(Zn)n∈N ⊂ L0(P) converging to Z in probability,
\[
\lim_{n \to \infty} \mathrm{ES}_\delta[X, Z_n] = \mathrm{ES}_\delta[X, Z].
\tag{7.18}
\]
(j) Representation of expected shortfall contribution by directional derivative:
If capital allocation for X ∈ L1(P) by expected shortfall is continuous at
Z ∈ L1(P) as specified in part (i), then
\[
\mathrm{ES}_\delta[X, Z]
= \lim_{\varepsilon \to 0}
\frac{\mathrm{ES}_\delta[Z + \varepsilon X] - \mathrm{ES}_\delta[Z]}{\varepsilon}.
\tag{7.19}
\]
Remark 7.31. Property (b) shows that X considered as a subportfolio of any
other portfolio Z does not need more risk capital than on its own, meaning that
diversification never increases the risk capital. The proof of (i) is due to the
author.
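Two of these properties are easy to observe numerically. The following Python sketch (again ad-hoc helper names, not from the notes) checks the diversification property (b) and the independence property (f) on simulated data: on the empirical measure, (b) holds exactly by Lemma 7.30, while (f) holds only up to Monte Carlo error, since simulated samples are merely approximately independent.

```python
import math
import random

def empirical_es(sample, delta):
    # ES of the empirical distribution: mean of the upper (1 - delta) tail
    n = len(sample)
    k = round((1 - delta) * n)
    return sum(sorted(sample)[n - k:]) / k

def es_contribution(xs, zs, delta):
    # Expected shortfall contribution (7.16) on the empirical measure
    n = len(zs)
    q = sorted(zs)[math.ceil(delta * n) - 1]
    p_eq = sum(1 for z in zs if z == q) / n
    p_le = sum(1 for z in zs if z <= q) / n
    beta = (p_le - delta) / p_eq if p_eq > 0 else 0.0
    tail = sum(x for x, z in zip(xs, zs) if z > q) / n
    atom = sum(x for x, z in zip(xs, zs) if z == q) / n
    return (tail + beta * atom) / (1 - delta)

random.seed(2)
delta, n = 0.95, 20000
x = [random.gauss(0.0, 1.0) for _ in range(n)]
y = [random.gauss(0.0, 1.0) for _ in range(n)]  # simulated independently of x
z = [a + b for a, b in zip(x, y)]

# (b) diversification: X inside the portfolio Z needs at most ES_delta[X]
assert es_contribution(x, z, delta) <= empirical_es(x, delta)

# (f) independence: the contribution of X to an unrelated portfolio is close
# to E[X] = 0 (approximate, due to Monte Carlo error)
assert abs(es_contribution(x, y, delta)) < 0.2
```

The diversification check averages X over the worst 5% of the Z-scenarios, which can never exceed the average of the worst 5% of the X-values themselves, which is the content of property (b).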
Example 7.32. To see that the continuity in part (i) and the representation as
directional derivative from part (j) don't hold for all Z, consider on Ω = {0, 1}
with P[{0}] = δ the random variables given by X(ω) = ω and Z(ω) = 0 for
all ω ∈ Ω. Define Zε = εX. Then Zε → Z pointwise as ε → 0. Furthermore,