Page 1
Chapter 3. Some Special Distributions
§3.1 The Binomial and Related Distributions
(a) Bernoulli trial (p.133)
X ∼ Bernoulli(p) (0 ≤ p ≤ 1)
- pdf : f(x) = p^x (1 − p)^{1−x}, x = 0, 1
- mgf : M(t) = pe^t + q, −∞ < t < ∞ (q = 1 − p)
- mean & variance : E(X) = p, Var(X) = p(1 − p)
- Bernoulli process {X_n}_{n≥1} : X_n iid ∼ Bernoulli(p)
(b) Binomial distribution (p.134-p.135)
X ∼ Bin(n, p), 0 ≤ p ≤ 1 (textbook notation: b(n, p))
- pdf : f(x) = C(n, x) p^x (1 − p)^{n−x}, x = 0, 1, . . . , n
  (C(n, x) = n!/(x!(n − x)!) : binomial coefficient)
- mgf : M(t) = (pe^t + q)^n, −∞ < t < ∞ (q = 1 − p)
- mean & variance : E(X) = np, Var(X) = np(1 − p)
- X ∼ Bin(n, p) ⇔ X =_d X_1 + · · · + X_n, X_i iid ∼ Bernoulli(p)
  (# of successes)
(c) Geometric distribution (p.137)
X ∼ Geo(p) (0 < p < 1)
- pdf : f(x) = p(1 − p)^x, x = 0, 1, 2, . . .
- mgf : M(t) = p(1 − qe^t)^{−1}, t < − log q (q = 1 − p)
- mean & variance : E(X) = q/p, Var(X) = q/p^2
- X ∼ Geo(p) ⇔ X =_d W_1 − 1, W_1 = min{n : X_1 + · · · + X_n ≥ 1}
  (# of trials before the first success in a Bernoulli process)
(d) Negative binomial distribution (p.137)
X ∼ Negbin(r, p)
- pdf : f(x) = C(x + r − 1, r − 1) p^r (1 − p)^x, x = 0, 1, 2, . . .
- mgf : M(t) = p^r (1 − qe^t)^{−r}, t < − log q (q = 1 − p)
- binomial expansion : (1 + x)^r = Σ_{k=0}^∞ C(r, k) x^k, |x| < 1
- mean & variance : E(X) = rq/p, Var(X) = rq/p^2
- X ∼ Negbin(r, p) ⇔ X =_d W_r − r, W_r = min{n : X_1 + · · · + X_n ≥ r}
  (# of failures before the rth success in a Bernoulli process)
Fact : X ∼ Negbin(r, p) ⇔ X =_d X_1 + · · · + X_r, X_i iid ∼ Geo(p)
Note that the inter-“arrival (occurrence)” times
W_1 − 1, W_2 − W_1 − 1, W_3 − W_2 − 1, · · · , W_r − W_{r−1} − 1 : IID Geo(p)
∵ P(W_1 − 1 = y_1, W_2 − W_1 − 1 = y_2, · · · , W_r − W_{r−1} − 1 = y_r)
  = P(X_i = 0, i = 1, · · · , y_1; X_{y_1+1} = 1; X_{y_1+1+i} = 0, i = 1, · · · , y_2;
      X_{y_1+1+y_2+1} = 1; · · · )
  = (q^{y_1} p)(q^{y_2} p) · · · (q^{y_r} p)
Property (p.137)
① X_1 ∼ Bin(n_1, p), X_2 ∼ Bin(n_2, p), X_1 & X_2 independent
   ⇒ X_1 + X_2 ∼ Bin(n_1 + n_2, p)
② X_1 ∼ Negbin(r_1, p), X_2 ∼ Negbin(r_2, p), X_1 & X_2 independent
   ⇒ X_1 + X_2 ∼ Negbin(r_1 + r_2, p)
Proof of ① :
P(X_1 + X_2 = y) = Σ_{x_1+x_2=y} C(n_1, x_1) p^{x_1} q^{n_1−x_1} · C(n_2, x_2) p^{x_2} q^{n_2−x_2}
                 = Σ_{x_1+x_2=y} C(n_1, x_1) C(n_2, x_2) p^y q^{n_1+n_2−y}
                 = C(n_1 + n_2, y) p^y q^{n_1+n_2−y}, y = 0, 1, · · · , n_1 + n_2
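The convolution identity in the proof above can be checked numerically. A minimal sketch in Python (standard library only); the parameter choices n1 = 5, n2 = 7, p = 0.3 are arbitrary illustrations, not from the text:

```python
from math import comb

def binom_pmf(n, p, x):
    """Binomial pmf C(n, x) p^x (1-p)^(n-x)."""
    return comb(n, x) * p**x * (1 - p)**(n - x)

n1, n2, p = 5, 7, 0.3
# Convolution: P(X1 + X2 = y) summed over x1 + x2 = y
conv = [sum(binom_pmf(n1, p, x1) * binom_pmf(n2, p, y - x1)
            for x1 in range(max(0, y - n2), min(n1, y) + 1))
        for y in range(n1 + n2 + 1)]
# Direct Bin(n1 + n2, p) pmf
direct = [binom_pmf(n1 + n2, p, y) for y in range(n1 + n2 + 1)]
assert max(abs(a - b) for a, b in zip(conv, direct)) < 1e-12
```

The same two lines with `binom_pmf` swapped for a negative binomial pmf would check property ② in the same way.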
(e) Multinomial trial (p.137-p.138)
(X_1, · · · , X_{k−1})′ ∼ Multi(n, (p_1, · · · , p_{k−1}, p_k)′) (Σ_{i=1}^k p_i = 1, p_i > 0)
- pdf : f(x_1, · · · , x_{k−1}) = n!/(x_1! · · · x_{k−1}! (n − x·)!) · p_1^{x_1} · · · p_{k−1}^{x_{k−1}} p_k^{n−x·}
  x_i = 0, 1, · · · , n (i = 1, · · · , k−1), x· = Σ_{i=1}^{k−1} x_i ≤ n
- mgf : (p_1 e^{t_1} + · · · + p_{k−1} e^{t_{k−1}} + p_k)^n, −∞ < t_i < ∞
- mean and variance : E(X_i) = np_i,
  cov(X_i, X_j) = np_i(1 − p_i) (i = j), −np_i p_j (i ≠ j)
- From now on :
  (X_1, · · · , X_k)′ ∼ Multi(n, (p_1, · · · , p_k)′)
  with X_k ≡ n − (X_1 + · · · + X_{k−1}) when k ≥ 3
  mgf : (p_1 e^{t_1} + · · · + p_{k−1} e^{t_{k−1}} + p_k e^{t_k})^n
  Var : n(diag(p) − pp′) with p = (p_1, · · · , p_k)′.
- marginal distribution and conditional distribution (p.139)
  (X_1, · · · , X_k)′ ∼ Multi(n, (p_1, · · · , p_k)′)
  ⇒ (i) X_i ∼ Bin(n, p_i)
    (ii) (X_1, X_2, n − (X_1 + X_2))′ ∼ Trinomial(n, (p_1, p_2, 1 − p_1 − p_2)′)
    (iii) (X_2, X_3, · · · , X_k)′ | X_1 = x_1 ∼ Multi(n − x_1, (p_2/(1 − p_1), · · · , p_k/(1 − p_1))′)
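The marginal fact (i) can be checked by simulation. A sketch in Python (standard library only); the probabilities p = (0.2, 0.3, 0.5) and n = 50 are illustrative assumptions:

```python
import random

random.seed(11)
p = [0.2, 0.3, 0.5]          # assumed cell probabilities
n, reps = 50, 40_000

def multinomial(n, p):
    """One Multi(n, p) draw by classifying n uniforms into cells."""
    counts = [0] * len(p)
    cum = [sum(p[:i + 1]) for i in range(len(p))]
    for _ in range(n):
        u = random.random()
        counts[next(i for i, c in enumerate(cum) if u <= c)] += 1
    return counts

x1 = [multinomial(n, p)[0] for _ in range(reps)]
# Marginal X1 ~ Bin(n, p1): mean n p1 = 10, variance n p1 (1 - p1) = 8
m = sum(x1) / reps
v = sum((x - m) ** 2 for x in x1) / reps
assert abs(m - 10) < 0.1 and abs(v - 8) < 0.2
```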
Probability (conceptualization of the relative frequency) (WLLN) (p.136)
- X_1, X_2, · · · , X_n : iid Bernoulli(p)
  p̂_n = (X_1 + · · · + X_n)/n
  lim_{n→∞} P(|p̂_n − p| ≥ ε) = 0 for all ε > 0
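The WLLN statement above can be illustrated by Monte Carlo: the exceedance frequency P(|p̂_n − p| ≥ ε) shrinks as n grows. A sketch in Python (standard library only); p = 0.3, ε = 0.05 and the sample sizes are illustrative choices:

```python
import random

random.seed(0)
p, eps = 0.3, 0.05

def phat(n):
    """Proportion of successes in n Bernoulli(p) trials."""
    return sum(random.random() < p for _ in range(n)) / n

# Fraction of 400 replications where |phat_n - p| >= eps, for growing n
freq = {n: sum(abs(phat(n) - p) >= eps for _ in range(400)) / 400
        for n in (20, 200, 2000)}
assert freq[2000] < freq[20]
```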
R code (p.135, p.140 #3.1.8)
dbinom(pdf), qbinom(quantile), pbinom(cdf), rbinom(random number)
rmultinom, dmultinom
§3.2 The Poisson Distribution
Poisson distribution
X ∼ Poisson(m)
- pdf : f(x) = m^x e^{−m}/x!, x = 0, 1, 2, · · ·
- mgf : M(t) = exp(m(e^t − 1))
- mean and variance : E(X) = m, Var(X) = m
  (exponential series : e^m = Σ_{x=0}^∞ m^x/x!)
Poisson approximation to binomial distribution
lim_{n→∞, np_n→µ} C(n, x) p_n^x (1 − p_n)^{n−x} = µ^x e^{−µ}/x!
(Recall lim_{n→∞} (1 + a/n)^n = e^a and lim_{n→∞} (1 + (a + o(1))/n)^n = e^a, Handout #2)
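The quality of this approximation for large n and small p can be seen numerically. A sketch in Python (standard library only); n = 2000, p = 0.001 (so µ = np = 2) are illustrative:

```python
from math import comb, exp, factorial

n, p = 2000, 0.001     # large n, small p, mu = n p = 2
mu = n * p
for x in range(8):
    binom = comb(n, x) * p**x * (1 - p)**(n - x)
    pois = mu**x * exp(-mu) / factorial(x)
    # Le Cam's bound: total variation distance is at most n p^2 = 0.002
    assert abs(binom - pois) < 5e-3
```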
Poisson process with arrival rate λ : {N_t : t ≥ 0}
(i) (stationarity) N_{t+s} − N_s =_d N_t
(ii) (independent increments) (3 on p.143) For 0 < t_1 < · · · < t_k,
     N_{t_1}, N_{t_2} − N_{t_1}, · · · , N_{t_k} − N_{t_{k−1}} are independent.
(iii) (rareness) (2 on p.143) P(N_h ≥ 2) = o(h) as h → 0 (Handout 2)
(iv) (proportionality) (1 on p.143) P(N_h = 1) = λh + o(h) as h → 0 (Handout 2)
⇒ P(N_t = n) = (λt)^n e^{−λt}/n!, n = 0, 1, · · ·
(Derivation) (p.144)
g(n, t) ≡ P(N_t = n)
(∗) g(0, 0) = 1, ∂/∂t g(0, t) = −λ g(0, t) ⇒ g(0, t) = e^{−λt}
(∗∗) g(n, 0) = 0 for n = 1, 2, · · · ; ∂/∂t g(n, t) = −λ g(n, t) + λ g(n − 1, t)
(∗) g(0, t + h) = P(N_{t+h} = 0)
              = P(N_t = 0, N_{t+h} − N_t = 0)
              = P(N_t = 0) P(N_{t+h} − N_t = 0)   (ii)
              = P(N_t = 0) P(N_h = 0)   (i)
              = g(0, t)(1 − P(N_h = 1) − P(N_h ≥ 2))
              = g(0, t)(1 − λh + o(h))   (iii), (iv)
  ⇒ g(0, t + h) − g(0, t) = −λ g(0, t) · h + o(h)
(∗∗) g(n, t + h) = P(N_{t+h} = n)
               = P(N_{t+h} = n, N_t = n) + P(N_{t+h} = n, N_t = n − 1) + P(N_{t+h} = n, N_t ≤ n − 2)
               = g(n, t)(1 − λh + o(h)) + g(n − 1, t) λh + o(h)
  ⇒ ∂/∂t g(n, t) = −λ g(n, t) + λ g(n − 1, t)
  ⇒ ∂/∂t (e^{λt} g(n, t)/λ^n) = e^{λt} g(n − 1, t)/λ^{n−1}
  ⇒ e^{λt} g(n, t)/λ^n = t^n/n! by induction on n (base case : e^{λt} g(0, t)/λ^0 = 1)
Property (p.146)
X_1, · · · , X_n : independent Poisson(m_i) (i = 1, · · · , n), respectively
⇒ X_1 + · · · + X_n ∼ Poisson(m_1 + · · · + m_n)
Proof (n = 2)
P(X_1 + X_2 = y) = Σ_{x_1+x_2=y} (m_1^{x_1} e^{−m_1}/x_1!)(m_2^{x_2} e^{−m_2}/x_2!), y = 0, 1, · · ·
                 = Σ_{x=0}^y C(y, x) m_1^x m_2^{y−x} (1/y!) e^{−(m_1+m_2)}
                 = (m_1 + m_2)^y e^{−(m_1+m_2)}/y!
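As with the binomial case, the Poisson convolution can be verified term by term. A sketch in Python (standard library only); m1 = 1.5 and m2 = 2.5 are illustrative:

```python
from math import exp, factorial

def pois_pmf(m, x):
    """Poisson pmf m^x e^{-m} / x!."""
    return m**x * exp(-m) / factorial(x)

m1, m2 = 1.5, 2.5
for y in range(10):
    conv = sum(pois_pmf(m1, x) * pois_pmf(m2, y - x) for x in range(y + 1))
    assert abs(conv - pois_pmf(m1 + m2, y)) < 1e-12
```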
Read (eg 3.2.3, eg 3.2.4 on p.147)
§3.3 The Gamma and Related Distributions
Fact (p.149)
① Γ(α) ≡ ∫_0^∞ y^{α−1} e^{−y} dy < ∞ for α > 0
② Γ(n) = (n − 1)! (n = 1, 2, · · · ), Γ(α) = (α − 1)Γ(α − 1), α > 1
③ Γ(1/2) = √π
Gamma distribution (p.149)
X ∼ Gamma(α, β) (α > 0, β > 0) (textbook notation: Γ(α, β))
- pdf : f(x) = 1/(Γ(α)β^α) · x^{α−1} e^{−x/β} I_{(0,∞)}(x)
- mgf : M(t) = (1 − βt)^{−α}, t < 1/β
- mean and variance : E(X) = αβ, Var(X) = αβ^2
α : shape parameter, β : scale parameter (See p.152)
Exponential distribution (p.150)
Exp(β) = Gamma(1, β) (exponential with mean β)
(textbook: Exp(λ) = Gamma(1, 1/λ), parameterized by the rate λ)
Fact (p.150) (Waiting time in a Poisson process)
{N_t}_{t≥0} : Poisson process with occurrence rate λ > 0
W_r ≡ min{t : N_t ≥ r} : waiting time until the rth occurrence
(a) W_r ∼ Gamma(r, 1/λ)
(b) W_1, W_2 − W_1, · · · : IID Gamma(1, 1/λ) ≡ Exp(λ)
(a) P(W_r > t) = P(N_t ≤ r − 1)
             = Σ_{k=0}^{r−1} e^{−λt}(λt)^k/k!   (∵ N_t ∼ Poisson(λt))
d/dt P(W_r ≤ t) = −Σ_{k=0}^{r−1} ((−λ) e^{−λt}(λt)^k/k! + λk e^{−λt}(λt)^{k−1}/k!)
               = λ (Σ_{k=0}^{r−1} e^{−λt}(λt)^k/k! − Σ_{k=0}^{r−2} e^{−λt}(λt)^k/k!)
               = λ e^{−λt}(λt)^{r−1}/(r − 1)!, t > 0
: pdf of Gamma(r, 1/λ)
(b) (Proof for W_1 & W_2 − W_1)
P(W_1 > t_1, W_2 > t_2) = P(N_{t_1} = 0, N_{t_2} ≤ 1)
                       = P(N_{t_1} = 0, N_{t_2} − N_{t_1} ≤ 1)
                       = P(N_{t_1} = 0) P(N_{t_2} − N_{t_1} ≤ 1)   (N_{t_2} − N_{t_1} ∼ Poisson(λ(t_2 − t_1)))
                       = e^{−λt_1}(e^{−λ(t_2−t_1)} + e^{−λ(t_2−t_1)} λ(t_2 − t_1)), t_1 < t_2
pdf_{W_1,W_2}(t_1, t_2) = λ^2 e^{−λt_2} I(0 < t_1 < t_2)
pdf_{W_1,W_2−W_1}(y_1, y_2) = λ e^{−λy_1} I_{(0,∞)}(y_1) · λ e^{−λy_2} I_{(0,∞)}(y_2)
∴ W_1, W_2 − W_1 : IID Gamma(1, 1/λ)
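The identity in (a), P(W_r > t) = Σ_{k=0}^{r−1} e^{−λt}(λt)^k/k!, can be checked against a direct numerical integral of the Gamma(r, 1/λ) pdf. A sketch in Python (standard library only); λ = 2, r = 3, t = 1.7 are illustrative:

```python
from math import exp, factorial

lam, r, t = 2.0, 3, 1.7
# Left side: P(W_r > t) = P(N_t <= r - 1), a Poisson tail sum
pois_tail = sum(exp(-lam * t) * (lam * t)**k / factorial(k) for k in range(r))

def gamma_pdf(x):
    """Gamma(r, 1/lam) pdf: lam e^{-lam x} (lam x)^{r-1} / (r-1)!."""
    return lam * exp(-lam * x) * (lam * x)**(r - 1) / factorial(r - 1)

# Right side: midpoint-rule integral of the pdf on (t, T), T large
T, N = 60.0, 200_000
h = (T - t) / N
surv = h * sum(gamma_pdf(t + (i + 0.5) * h) for i in range(N))
assert abs(pois_tail - surv) < 1e-6
```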
Fact (p.154)
(a) X_1, · · · , X_n : independent Gamma(α_i, β) (i = 1, · · · , n), respectively
    ⇒ X_1 + · · · + X_n ∼ Gamma(α_1 + · · · + α_n, β)
(b) X ∼ Gamma(r, β) ⇔ X =_d Z_1 + · · · + Z_r, Z_i iid ∼ Exp(β^{−1}) = Gamma(1, β)
(mgf technique : (1 − βt)^{−(α_1+···+α_n)})
Chi-square distribution (p.152)
χ^2(r) ≡ Gamma(r/2, 2)
- pdf : f(x) = 1/(Γ(r/2) 2^{r/2}) · x^{r/2−1} e^{−x/2} I_{(0,∞)}(x)
- mgf : M(t) = (1 − 2t)^{−r/2}, t < 1/2
- mean and variance : E(X) = r, Var(X) = 2r
(r : degrees of freedom)
R code (p.153) : pchisq, dchisq
Remark : It is related to the distribution of the sample variance from a
normal population (p.186) (Thm 3.6.1)
Fact (p.154)
(a) X_1, · · · , X_n : independent χ^2(r_i) (i = 1, · · · , n), respectively
    ⇒ X_1 + · · · + X_n ∼ χ^2(r_1 + · · · + r_n)
(b) X ∼ χ^2(r) ⇔ X =_d X_1 + · · · + X_r, X_i iid ∼ χ^2(1)
Beta distribution (p.155)
X ∼ Beta(α, β)
- pdf : f(x) = Γ(α + β)/(Γ(α)Γ(β)) · x^{α−1}(1 − x)^{β−1} I_{(0,1)}(x)
- mgf : (no explicit form)
- mean and variance : E(X) = α/(α + β), Var(X) = αβ/((α + β)^2 (α + β + 1))
R code (p.156) : pbeta, dbeta
Fact (p.155)
X ∼ Beta(α, β) ⇔ X =_d X_1/(X_1 + X_2)
where X_1 ∼ Gamma(α, 1), X_2 ∼ Gamma(β, 1), X_1, X_2 : independent
∵ - Transformation : Y_1 = X_1 + X_2, Y_2 = X_1/(X_1 + X_2),
    with inverse x_1 = y_1 y_2, x_2 = y_1(1 − y_2)
  - 1-1 from (0,∞) × (0,∞) onto (0,∞) × (0, 1)
  - |det(∂x/∂y)| = |det [ y_2, y_1 ; 1 − y_2, −y_1 ]| = y_1
pdf_{Y_1,Y_2}(y_1, y_2) = pdf_{X_1,X_2}(x_1, x_2) |det(∂x/∂y)|
  = (1/Γ(α)) x_1^{α−1} e^{−x_1} · (1/Γ(β)) x_2^{β−1} e^{−x_2} · y_1 I_{(0,∞)}(y_1) I_{(0,1)}(y_2)
  = (1/(Γ(α)Γ(β))) (y_1 y_2)^{α−1} (y_1(1 − y_2))^{β−1} · y_1 e^{−y_1} I_{(0,∞)}(y_1) I_{(0,1)}(y_2)
  = (1/Γ(α + β)) y_1^{α+β−1} e^{−y_1} I_{(0,∞)}(y_1)
    · (Γ(α + β)/(Γ(α)Γ(β))) y_2^{α−1} (1 − y_2)^{β−1} I_{(0,1)}(y_2)
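The gamma-ratio representation of the beta distribution above can be checked by simulation, since the Python standard library ships a gamma sampler. A sketch; α = 2, β = 3 are illustrative choices:

```python
import random

random.seed(1)
alpha, beta = 2.0, 3.0
N = 200_000
samples = []
for _ in range(N):
    x1 = random.gammavariate(alpha, 1.0)   # Gamma(alpha, 1)
    x2 = random.gammavariate(beta, 1.0)    # Gamma(beta, 1), independent
    samples.append(x1 / (x1 + x2))
mean = sum(samples) / N
var = sum((s - mean) ** 2 for s in samples) / N
# Beta(2, 3): E = 2/5 = 0.4, Var = 6/(25 * 6) = 0.04
assert abs(mean - 0.4) < 0.005
assert abs(var - 0.04) < 0.002
```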
Dirichlet distribution (p.156)
(Y_1, · · · , Y_k)′ ∼ Dirichlet(α_1, · · · , α_k, α_{k+1})
- pdf : Γ(α_1 + · · · + α_{k+1})/(Γ(α_1) · · · Γ(α_{k+1})) · y_1^{α_1−1} · · · y_k^{α_k−1} (1 − y_1 − · · · − y_k)^{α_{k+1}−1}
  0 < y_i, y_1 + · · · + y_k < 1
- mean and variance : with α· = α_1 + · · · + α_{k+1},
  E(Y_1) = α_1/α·, Var(Y_1) = α_1(α_2 + · · · + α_{k+1})/(α·^2 (α· + 1)),
  cov(Y_1, Y_2) = −α_1 α_2/(α·^2 (α· + 1))
Fact (p.156)
(W_1, · · · , W_k)′ ∼ Dirichlet(α_1, · · · , α_k, α_{k+1})
⇔ (W_i)_{1≤i≤k} =_d (X_i/(X_1 + · · · + X_k + X_{k+1}))_{1≤i≤k}
where X_i indep ∼ Gamma(α_i, 1)
(Derivation)
- Transformation : Y_i ≡ X_i/(X_1 + · · · + X_k + X_{k+1}) (i = 1, · · · , k),
  Y_{k+1} ≡ X_1 + · · · + X_k + X_{k+1}
  : 1-1 from (0,∞)^{k+1} onto Y = {y : 0 < y_i < 1 (i = 1, · · · , k), Σ_{i=1}^k y_i < 1, y_{k+1} > 0}
- Inverse : x_i = y_i y_{k+1} (i = 1, · · · , k), x_{k+1} = y_{k+1}(1 − y_1 − · · · − y_k)
- Jacobian :
  det(∂x/∂y) = det [ y_{k+1},  0, · · · , 0,  y_1 ;
                      0, y_{k+1}, · · · , 0,  y_2 ;
                      · · · ;
                      0, 0, · · · , y_{k+1},  y_k ;
                      −y_{k+1}, −y_{k+1}, · · · , −y_{k+1},  1 − Σ_{i=1}^k y_i ]
            = (y_{k+1})^k
∴ pdf_Y(y_1, · · · , y_k, y_{k+1}) = Π_{i=1}^{k+1} (1/Γ(α_i)) x_i^{α_i−1} e^{−x_i} · |det(∂x/∂y)|
  with x_i = y_i y_{k+1} (i = 1, · · · , k), x_{k+1} = y_{k+1}(1 − y·), y· = Σ_{i=1}^k y_i
  = 1/(Γ(α_1) · · · Γ(α_{k+1})) · y_1^{α_1−1} · · · y_k^{α_k−1} (1 − y·)^{α_{k+1}−1}
    · y_{k+1}^{α·+α_{k+1}−1} e^{−y_{k+1}},  y ∈ Y, y· = Σ_{i=1}^k y_i, α· = Σ_{i=1}^k α_i
∴ (Y_1, · · · , Y_k)′ and Y_{k+1} are independent, Y_{k+1} ∼ Gamma(α· + α_{k+1}, 1), and
pdf_{Y_1,··· ,Y_k}(y_1, · · · , y_k) = Γ(α_1 + · · · + α_{k+1})/(Γ(α_1) · · · Γ(α_k)Γ(α_{k+1}))
  · y_1^{α_1−1} · · · y_k^{α_k−1} (1 − y·)^{α_{k+1}−1},  0 < y_i (i = 1, · · · , k), y· < 1
§3.4 The Normal Distribution
The “error” function and its integral
φ(x) ≡ (1/√(2π)) e^{−x^2/2},  ∫_{−∞}^∞ φ(x) dx = 1
∵ (∫_{−∞}^∞ φ(x) dx)^2 = (1/2π) ∫_{−∞}^∞ ∫_{−∞}^∞ e^{−(x^2+y^2)/2} dx dy
                       = (1/2π) ∫_0^{2π} ∫_0^∞ e^{−r^2/2} r dr dθ   (polar coordinates)
                       = 1
Normal approximation to the binomial probability
(De Moivre-Laplace : Handout #3)
lim_{n→∞} Σ_{x : a ≤ (x−np)/√(npq) ≤ b} C(n, x) p^x (1 − p)^{n−x} = ∫_a^b (1/√(2π)) e^{−z^2/2} dz
Key steps :
① (Stirling’s formula) m! = m^{m+1/2} e^{−m} √(2π) (1 + o(1)) as m → ∞
② (approximation of the density) For x with a ≤ (x − np)/√(npq) ≤ b,
   C(n, x) p^x (1 − p)^{n−x} = (1/√(npq)) φ((x − np)/√(npq)) (1 + o(1))
③ (approximation of the sum as an integral)
   binomial probability = ((b − a)/(N − 1)) Σ_{j=1}^N φ(z_j)(1 + o(1)) = ∫_a^b φ(z) dz + o(1)
   where z_1 = (x_min − np)/√(npq) ∼ a, · · · , z_N = (x_max − np)/√(npq) ∼ b,
   and x_min (x_max) is the smallest (largest) integer satisfying the inequality.
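The De Moivre-Laplace limit above can be checked numerically for a moderate n. A sketch in Python (standard library only); n = 400, p = 0.3, a = −1, b = 1.5 are illustrative:

```python
from math import comb, erf, sqrt

n, p = 400, 0.3
q = 1 - p
a, b = -1.0, 1.5
s = sqrt(n * p * q)
# Exact binomial probability that a <= (x - np)/sqrt(npq) <= b
exact = sum(comb(n, x) * p**x * q**(n - x)
            for x in range(n + 1) if a <= (x - n * p) / s <= b)
# Normal integral via the error function: Phi(b) - Phi(a)
Phi = lambda z: 0.5 * (1 + erf(z / sqrt(2)))
approx = Phi(b) - Phi(a)
assert abs(exact - approx) < 0.03   # close already; continuity correction tightens it
```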
Normal distribution (p.162)
X ∼ N(µ, σ^2)
- pdf : f(x) = (1/√(2πσ^2)) exp(−(x − µ)^2/(2σ^2))
- mgf : M(t) = exp(µt + σ^2 t^2/2)
- mean and variance : E(X) = µ, Var(X) = σ^2
Standard normal distribution : N(0, 1)
Z ∼ N(0, 1)
- E(Z) = 0, Var(Z) = E(Z^2) = 1
- E(Z^3) = 0 (skewness), E(Z^4) = 3 (kurtosis)
Properties
(a) (affine transformation) (p.162, p.166)
    (i) X ∼ N(µ, σ^2) ⇒ aX + b ∼ N(aµ + b, a^2σ^2) (a, b : constants)
    (ii) X ∼ N(µ, σ^2) ⇔ X =_d σZ + µ, Z ∼ N(0, 1)
(b) (sum of independent normal rv’s) (p.166)
    (i) X_i ∼ N(µ_i, σ_i^2) (i = 1, 2), independent
        ⇒ X_1 + X_2 ∼ N(µ_1 + µ_2, σ_1^2 + σ_2^2)
    (ii) X_1, · · · , X_n : IID N(µ, σ^2)
        ⇒ X̄ ∼ N(µ, σ^2/n), i.e. (X̄ − µ)/(σ/√n) ∼ N(0, 1)
(c) (square of normal rv) (p.166)
    (i) Z ∼ N(0, 1) ⇒ Z^2 ∼ χ^2(1)
    (ii) Y ∼ χ^2(r) ⇔ Y =_d Z_1^2 + · · · + Z_r^2, Z_i iid ∼ N(0, 1)
(Derivation)
(a) (i) (a ≠ 0) pdf_{aX+b}(y) = pdf_X(x) |dx/dy|, where ax + b = y
(b) Note that
pdf_{X_1+X_2}(y) = ∫_{−∞}^∞ pdf_{X_2}(y − x) · pdf_{X_1}(x) dx
  = ∫_{−∞}^∞ (1/√(2πσ_2^2)) exp(−(y − x − µ_2)^2/(2σ_2^2)) · (1/√(2πσ_1^2)) exp(−(x − µ_1)^2/(2σ_1^2)) dx
  = ∫_{−∞}^∞ exp(−(σ_1^{−2} + σ_2^{−2}) z^2/2) dz · (1/√(2πσ_2^2)) (1/√(2πσ_1^2))
    · exp(−(σ_1^2 + σ_2^2)^{−1}(y − µ_1 − µ_2)^2/2)   (completing the square in x)
  = (1/√(2π(σ_1^2 + σ_2^2))) exp(−(σ_1^2 + σ_2^2)^{−1}(y − µ_1 − µ_2)^2/2)
(convolution of pdf_{X_1} and pdf_{X_2} : p.93)
(c) (i) pdf_{Z^2}(y) = Σ_{z : z^2 = y} pdf_Z(z) |dy/dz|^{−1}
  = 2 · (2π)^{−1/2} exp(−y/2)(2√y)^{−1} I_{(0,∞)}(y)
  = (1/(Γ(1/2) 2^{1/2})) y^{1/2−1} exp(−y/2) I_{(0,∞)}(y)
  : pdf of Gamma(1/2, 2) = χ^2(1)
(ii) follows from the additivity property of χ^2 and (i).
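Fact (c)(i), Z^2 ∼ χ^2(1), can be checked by simulation against the χ^2(1) moments E = 1, Var = 2. A sketch in Python (standard library only):

```python
import random

random.seed(2)
N = 200_000
zsq = [random.gauss(0, 1) ** 2 for _ in range(N)]
mean = sum(zsq) / N                           # chi^2(1) has mean r = 1
var = sum((v - mean) ** 2 for v in zsq) / N   # and variance 2r = 2
assert abs(mean - 1) < 0.02
assert abs(var - 2) < 0.1
```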
Read Handout #4 (Facts from Linear Algebra) and Appendix II
§3.5 The Multivariate Normal Distribution
Multivariate normal distribution (p.172 (3.5.8)) (p.173 (3.5.11))
① (Equivalent definitions of MVN)
X ∼ N_n(µ, Σ), Σ : n × n real symmetric non-negative definite
⇔ X =_d AZ + µ with AA′ = Σ, A : n × k, rank(A) = k, and Z_1, · · · , Z_k ∼ N(0, 1) IID
⇔ X =_d Σ^{1/2}Z + µ with Σ^{1/2}Σ^{1/2} = Σ, Σ^{1/2} : n × n real symmetric, and Z_1, · · · , Z_n ∼ N(0, 1) IID
⇔ mgf_X(t) = exp(µ′t + (1/2) t′Σt), t ∈ R^n
⇔ a′X ∼ N(a′µ, a′Σa) for any a : n × 1
Note that Σ can be singular in the above definition.
IDEA : Derive E[exp(t′X)] = exp(µ′t + (1/2) t′Σt) from E[exp(s′Z)] = exp((1/2) s′s).
② (pdf of non-singular MVN)
X ∼ N_n(µ, Σ), Σ : n × n real symmetric positive definite
⇔ pdf_X(x) = det(2πΣ)^{−1/2} exp(−(1/2)(x − µ)′Σ^{−1}(x − µ))
IDEA : (p.173 (3.5.12)) Derive the pdf of X = Σ^{1/2}Z + µ from pdf_Z(z) = |2πI|^{−1/2} exp(−(1/2) z′z).
③ (Properties of MVN)
(i) (affine invariance of MVN family)
    X ∼ N(µ, Σ) ⇒ AX + b ∼ N(Aµ + b, AΣA′)
(ii) (independence of components)
    Suppose (X_1′, X_2′)′ ∼ N((µ_1′, µ_2′)′, [ Σ_11, Σ_12 ; Σ_21, Σ_22 ]).
    Then, X_1 and X_2 : independent ⇔ Σ_12 = cov(X_1, X_2) = 0
(iii) (independence of linear functions)
    Suppose X ∼ N(µ, Σ).
    Then, AX and BX : independent ⇔ cov(AX, BX) = AΣB′ = 0
IDEA :
(i) (p.173 Thm 3.5.1) X =_d Σ^{1/2}Z + µ ⇒ AX + b =_d AΣ^{1/2}Z + (Aµ + b),
    and (AΣ^{1/2})(AΣ^{1/2})′ = AΣA′
(ii) (p.175) M_{X_1,X_2}(t_1, t_2) = M_{X_1}(t_1) M_{X_2}(t_2) for all t_1, t_2,
     where M_{X_1}(t_1) = M_{X_1,X_2}(t_1, 0) and M_{X_2}(t_2) = M_{X_1,X_2}(0, t_2)
     ⇔ 2 t_1′ Σ_12 t_2 = 0 for all t_1, t_2
(iii) ((AX)′, (BX)′)′ = [ A ; B ] X ∼ N(·, ·) & apply (ii).
④ (Conditional and marginal distributions)
Suppose (X_1′, X_2′)′ ∼ N((µ_1′, µ_2′)′, [ Σ_11, Σ_12 ; Σ_21, Σ_22 ]), Σ_22 : positive definite.
Then (i) X_1 | X_2 = x_2 ∼ N(µ_1 + Σ_12 Σ_22^{−1}(x_2 − µ_2), Σ_{11·2})
        with Σ_{11·2} = Σ_11 − Σ_12 Σ_22^{−1} Σ_21,
     (ii) X_2 ∼ N(µ_2, Σ_22).
IDEA : Consider
  ( X_1 − Σ_12 Σ_22^{−1} X_2 ; X_2 ) = [ I, −Σ_12 Σ_22^{−1} ; 0, I ] ( X_1 ; X_2 ),
and apply ③ (i)(ii).
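The conditional-mean formula in (i) says that for a bivariate normal the regression of X_1 on X_2 is linear with slope Σ_12/Σ_22. A simulation sketch in Python (standard library only); µ and Σ below are illustrative choices:

```python
import random
from math import sqrt

random.seed(3)
# (X1, X2) ~ N(mu, Sigma) built as X = A Z + mu with A A' = Sigma (Cholesky)
mu1, mu2 = 1.0, -2.0
s11, s12, s22 = 4.0, 1.2, 1.0
a11 = sqrt(s11); a21 = s12 / a11; a22 = sqrt(s22 - a21 ** 2)
N = 200_000
xs = []
for _ in range(N):
    z1, z2 = random.gauss(0, 1), random.gauss(0, 1)
    xs.append((mu1 + a11 * z1, mu2 + a21 * z1 + a22 * z2))
# Least-squares slope of X1 on X2 should approximate Sigma12 / Sigma22
m1 = sum(x for x, _ in xs) / N
m2 = sum(y for _, y in xs) / N
slope = (sum((x - m1) * (y - m2) for x, y in xs)
         / sum((y - m2) ** 2 for _, y in xs))
assert abs(slope - s12 / s22) < 0.02
```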
⑤ (Distribution of a quadratic form)
Suppose Z ∼ N(0, I), A : real symmetric.
Then, A^2 = A ⇒ Z′AZ ∼ χ^2(r) with r = trace(A).
Remark : The converse (⇐) is also true.
IDEA
• From the diagonalization of a real symmetric matrix,
  A = P diag(λ_i) P′, PP′ = I = P′P, and λ_i = 1 or 0 since A^2 = A.
• May assume λ_1 = · · · = λ_r = 1, λ_{r+1} = · · · = λ_n = 0
  since r = trace(A) = trace(P diag(λ_i) P′) = trace(diag(λ_i)).
• Let W = P′Z ∼ N(0, I) by ③ (i).
• Z′AZ = W′ diag(λ_i) W = Σ_{i=1}^r W_i^2 ∼ χ^2(r)
Fundamental Theorem in Normal Sampling (p.186)
X_1, · · · , X_n : IID N(µ, σ^2) (random sample)
⇒ (a) X̄ = Σ_{i=1}^n X_i/n ∼ N(µ, σ^2/n)
  (b) X̄ and S^2 = Σ_{i=1}^n (X_i − X̄)^2/(n − 1) are independent
  (c) (n − 1)S^2/σ^2 ∼ χ^2(n − 1)
(Derivation)
(a) (done before)
(b) Let X = (X_1, · · · , X_n)′ ∼ N(µ1, σ^2 I)
    X̄ = (1′1)^{−1} 1′X
    (n − 1)S^2 = X′(I − 1(1′1)^{−1}1′)X = ||(I − 1(1′1)^{−1}1′)X||^2
    Since (1′1)^{−1}1′(σ^2 I)(I − 1(1′1)^{−1}1′) = 0,
    X̄ = (1′1)^{−1}1′X and (X_1 − X̄, · · · , X_n − X̄)′ = (I − 1(1′1)^{−1}1′)X are independent.
    ∴ X̄ and Σ_{i=1}^n (X_i − X̄)^2 = ||(I − 1(1′1)^{−1}1′)X||^2 are independent.
(c) X =_d σZ + µ1, Z ∼ N(0, I)
    (n − 1)S^2/σ^2 =_d Z′(I − 1(1′1)^{−1}1′)Z, and I − 1(1′1)^{−1}1′ is idempotent with trace n − 1,
    so ⑤ gives χ^2(n − 1).
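Claims (b) and (c) can be illustrated by Monte Carlo: the scaled sample variance has mean n − 1, and the sample correlation between X̄ and S^2 is near zero. A sketch in Python (standard library only); n = 10, µ = 5, σ = 2 are illustrative:

```python
import random
from statistics import mean, variance

random.seed(4)
n, reps = 10, 20_000
pairs = []
for _ in range(reps):
    x = [random.gauss(5, 2) for _ in range(n)]
    pairs.append((mean(x), variance(x)))   # variance() uses the n-1 divisor
# (c): (n-1) S^2 / sigma^2 ~ chi^2(n-1), so its mean is n - 1 = 9
scaled = [(n - 1) * s2 / 4 for _, s2 in pairs]
assert abs(mean(scaled) - (n - 1)) < 0.2
# (b): correlation between X-bar and S^2 should be near zero
mx = mean(m for m, _ in pairs); ms = mean(s for _, s in pairs)
cov = sum((m - mx) * (s - ms) for m, s in pairs) / reps
sx = (sum((m - mx) ** 2 for m, _ in pairs) / reps) ** 0.5
ss = (sum((s - ms) ** 2 for _, s in pairs) / reps) ** 0.5
assert abs(cov / (sx * ss)) < 0.05
```

Zero correlation alone does not prove independence, but for jointly determined normal-sample statistics it is the symptom the theorem predicts.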
§3.6 t and F -Distributions
(I) (Student’s) t-distribution
T ∼ t(r)
(a) representational definition
    T =_d W/√(V/r)
    where W ∼ N(0, 1), V ∼ χ^2(r), V, W : independent.
(b) pdf
    Γ((r+1)/2)/(Γ(1/2)Γ(r/2)) · (1 + t^2/r)^{−(r+1)/2} · (1/√r), −∞ < t < ∞
    mgf : does not exist.
(c) mean and variance
    - E(T) = 0 for r > 1 (E(T) does not exist for r = 1)
    - Var(T) = r/(r − 2) for r > 2
Property
pdf_{t(r)}(t) → (1/√(2π)) e^{−t^2/2} as r → ∞ (use Stirling’s formula)
R code (p.183)
(Derivation of pdf) (p.182-p.183)
- Transformation : T = W/√(V/r), U = V, with inverse w = t√(u/r), v = u
- |det(∂(w, v)/∂(t, u))| = |det [ √(u/r), t/(2√(ru)) ; 0, 1 ]| = √u/√r
pdf_T(t) = ∫_0^∞ pdf_{W,V}(w, v) · |det(∂(w, v)/∂(t, u))| du
  = ∫_0^∞ (1/(Γ(r/2) 2^{r/2})) u^{r/2−1} e^{−u/2} · (1/√(2π)) e^{−(t√(u/r))^2/2} · (√u/√r) du
  = (1/(√(2πr) Γ(r/2) 2^{r/2})) ∫_0^∞ u^{(r+1)/2−1} exp(−(1 + t^2/r) u/2) du
  = Γ((r+1)/2)/(Γ(1/2)Γ(r/2)√r) · (1 + t^2/r)^{−(r+1)/2}
  (evaluating the gamma integral with c = (1 + t^2/r)/2 : ∫_0^∞ u^{(r+1)/2−1} e^{−cu} du = Γ((r+1)/2)/c^{(r+1)/2})
Application (p.186)
X_1, · · · , X_n : IID from N(µ, σ^2)
⇒ (X̄ − µ)/(S/√n) ∼ t(n − 1)
where (n − 1)S^2 = Σ_{i=1}^n (X_i − X̄)^2.
Statement :
P(|(X̄ − µ)/(S/√n)| ≤ t_{α/2}(n − 1)) = 1 − α
(textbook notation: t_{α/2, n−1}) (p.257)
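The coverage statement above can be checked by simulation: the interval X̄ ± t_{α/2}(n − 1) S/√n should cover µ in about 1 − α of repeated samples. A sketch in Python (standard library only); n = 10, µ = 3, σ = 1.5 are illustrative, and 2.262 is the standard tabled value of t_{0.025}(9):

```python
import random
from math import sqrt
from statistics import mean, stdev

random.seed(5)
n, reps = 10, 20_000
t975 = 2.262           # tabled t_{0.025}(9); an assumed constant here
mu, sigma = 3.0, 1.5
cover = 0
for _ in range(reps):
    x = [random.gauss(mu, sigma) for _ in range(n)]
    xbar, s = mean(x), stdev(x)
    cover += abs(xbar - mu) <= t975 * s / sqrt(n)
assert abs(cover / reps - 0.95) < 0.02
```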
(II) (Fisher’s) F-distribution
F ∼ F(r_1, r_2)
(a) representational definition
    F =_d (U/r_1)/(V/r_2)
    where U ∼ χ^2(r_1), V ∼ χ^2(r_2), U, V : independent.
(b) pdf
    Γ((r_1 + r_2)/2)/(Γ(r_1/2)Γ(r_2/2)) · (r_1/r_2)^{r_1/2} · w^{r_1/2−1}/(1 + r_1 w/r_2)^{(r_1+r_2)/2}, w > 0
    mgf : does not exist.
(Derivation of pdf) (p.185)
- Transformation : W = (U/r_1)/(V/r_2), Z = V, with inverse u = (r_1/r_2) wz, v = z
- |det(∂(u, v)/∂(w, z))| = |det [ (r_1/r_2)z, (r_1/r_2)w ; 0, 1 ]| = (r_1/r_2)z
pdf_W(w) = ∫_0^∞ pdf_{U,V}(u, v) · ((r_1/r_2)z) dz
  = (1/(Γ(r_1/2)Γ(r_2/2) 2^{(r_1+r_2)/2})) ∫_0^∞ ((r_1/r_2) wz)^{r_1/2−1} z^{r_2/2−1} (r_1/r_2)z
      · exp(−(1/2)(r_1/r_2)wz − (1/2)z) dz
  = the pdf stated in (b)
Application (p.263, #5.4.25)
X_1, · · · , X_n : IID from N(µ_1, σ_1^2); Y_1, · · · , Y_m : IID from N(µ_2, σ_2^2); the two samples independent
(n − 1)S_1^2 = Σ_{i=1}^n (X_i − X̄)^2, (m − 1)S_2^2 = Σ_{i=1}^m (Y_i − Ȳ)^2
⇒ (m − 1)S_2^2/σ_2^2 ∼ χ^2(m − 1) and (n − 1)S_1^2/σ_1^2 ∼ χ^2(n − 1), independent
⇒ [((m − 1)S_2^2/σ_2^2)/(m − 1)] / [((n − 1)S_1^2/σ_1^2)/(n − 1)] ∼ F(m − 1, n − 1)
Statement :
P(F_{1−α/2}(m − 1, n − 1) ≤ (S_2^2/σ_2^2)/(S_1^2/σ_1^2) ≤ F_{α/2}(m − 1, n − 1)) = 1 − α
i.e. F_{1−α/2} · S_1^2/S_2^2 ≤ σ_1^2/σ_2^2 ≤ F_{α/2} · S_1^2/S_2^2
§3.7 The Uniform Distribution and Random Number
(§5.8 : p.288∼)
Uniform distribution
U ∼ U(a, b)
① pdf : f(u) = (1/(b − a)) I_{(a,b)}(u)
② representational definition :
   U ∼ U(a, b) ⇔ U =_d (b − a)Z + a, Z ∼ U(0, 1)
③ mean and variance : For U ∼ U(0, 1), E(U) = 1/2, Var(U) = 1/12
Note that U =_d 1 − U for U ∼ U(0, 1)
Probability Integral Transformation (Thm 5.8.1 p.288)
(i) X ∼ cdf F, continuous type, F : strictly ↑ (#5.3.1 p.253)
    ⇒ F(X) ∼ U(0, 1)
(ii) U ∼ U(0, 1), F : cdf of a random variable (Thm 5.8.1 p.288)
    ⇒ F^{−1}(U) ∼ cdf F
where F^{−1}(u) ≡ inf{t : F(t) ≥ u}.
∵ (i) P(F(X) ≤ u) = P(X ≤ F^{−1}(u)) (assuming F is strictly ↑)
       = F(F^{−1}(u)) = u (continuous type)
  (ii) P(F^{−1}(U) ≤ x) = P(U ≤ F(x)) = F(x)
Can show F^{−1}(u) ≤ x ⇔ u ≤ F(x) for F^{−1}(u) ≡ inf{t : F(t) ≥ u}
① F(F^{−1}(u)) ≥ u, F^{−1}(F(x)) ≤ x, and u ≤ F(x) ⇔ F^{−1}(u) ≤ x
② F : continuous ⇒ F(F^{−1}(u)) = u
③ F : continuous and strictly increasing ⇒ F(F^{−1}(u)) = u, F^{−1}(F(x)) = x
∵ ① (⇐) F^{−1}(u) ≤ x ⇒ u ≤ F(x + 1/n) ↓ F(x) (right continuity) ⇒ u ≤ F(x)
     (⇒) u ≤ F(x) ⇒ F^{−1}(u) ≤ x by definition
  ② F(F^{−1}(u) − 1/n) ≤ u and continuity give F(F^{−1}(u)) ≤ u;
     combined with ①, F(F^{−1}(u)) = u
(Figure: a cdf with a flat piece and a jump, marking F^{−1}(u_1), F^{−1}(u_2), F^{−1}(u_3) and illustrating F(F^{−1}(u)) ≥ u.)
(eg) X ∼ Exp(1) ⇔ X =_d − log(1 − U), U ∼ U(0, 1)
     Z ∼ N(0, 1) ⇔ Z =_d Φ^{−1}(U), Φ : cdf of N(0, 1)
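The exponential example above is exactly inverse-cdf sampling: F(x) = 1 − e^{−x} gives F^{−1}(u) = −log(1 − u). A sketch in Python (standard library only):

```python
import random
from math import log

random.seed(6)
N = 200_000
# X = -log(1 - U) has cdf F(x) = 1 - e^{-x}, i.e. Exp(1)
xs = [-log(1 - random.random()) for _ in range(N)]
m = sum(xs) / N                       # Exp(1) mean is 1
F1 = sum(x <= 1 for x in xs) / N      # empirical cdf at 1 vs 1 - e^{-1} ≈ 0.6321
assert abs(m - 1.0) < 0.01
assert abs(F1 - 0.6321) < 0.01
```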
eg 5.8.4 (p.290-p.291)
(a) X_1, X_2 : IID N(0, 1)
    X_1 = R cos Θ, X_2 = R sin Θ, 0 ≤ R < ∞, 0 ≤ Θ < 2π
    pdf of (R, Θ)?
(Solution)
- x_1 = r cos θ, x_2 = r sin θ : 1-1
- |det(∂(x_1, x_2)/∂(r, θ))| = |det [ cos θ, sin θ ; −r sin θ, r cos θ ]| = r
pdf_{R,Θ}(r, θ) = (1/2π) exp(−(x_1^2 + x_2^2)/2) · r, 0 ≤ r < ∞, 0 ≤ θ < 2π
∴ pdf_{R,Θ}(r, θ) = [r e^{−r^2/2} I_{(0,∞)}(r)] · [(1/2π) I_{(0,2π)}(θ)]
i.e. (1/2)R^2 ∼ Exp(1) and Θ ∼ U(0, 2π), independent
(b) U_1, U_2 : IID U(0, 1)
    X_1 = √(−2 log(1 − U_1)) cos(2πU_2)
    X_2 = √(−2 log(1 − U_1)) sin(2πU_2)
    ⇒ X_1, X_2 : IID N(0, 1)
(Solution)
R = √(−2 log(1 − U_1)), Θ = 2πU_2
⇒ (1/2)R^2 ∼ Exp(1), Θ ∼ U(0, 2π), independent.
pdf_{X_1,X_2}(x_1, x_2) = r e^{−r^2/2} · (1/2π) · |det(∂(x_1, x_2)/∂(r, θ))|^{−1}, −∞ < x_1, x_2 < ∞
  = (2π)^{−1/2} exp(−x_1^2/2) · (2π)^{−1/2} exp(−x_2^2/2)
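This is the Box-Muller method; it is easy to implement and sanity-check. A sketch in Python (standard library only):

```python
import random
from math import cos, sin, log, pi, sqrt

random.seed(7)

def box_muller():
    """One pair of independent N(0,1) variates from two U(0,1) variates."""
    u1, u2 = random.random(), random.random()
    r = sqrt(-2 * log(1 - u1))
    return r * cos(2 * pi * u2), r * sin(2 * pi * u2)

N = 100_000
pairs = [box_muller() for _ in range(N)]
x1 = [a for a, _ in pairs]
m1 = sum(x1) / N
v1 = sum((a - m1) ** 2 for a in x1) / N
corr = sum(a * b for a, b in pairs) / N   # E[X1 X2] = 0 under independence
assert abs(m1) < 0.02 and abs(v1 - 1) < 0.03 and abs(corr) < 0.02
```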
§3.8 Order Statistics (§5.1 and §5.2)
Sampling and Statistics (§5.1 : p.233 ∼ p.235)
a population → (sampling) → observations x_1, . . . , x_n (a part of the population) → (inference) → the population
Sampling : to get “good representatives”, we need “random sampling” (p.234)
Inference :
① modelling : postulate a set of possible distributions
   {f(·; θ) | θ ∈ Ω} (θ : parameter, Ω : parameter space)
   (eg Bernoulli(p), N(µ, σ^2), Exp(β), . . . )
② design a sampling : often
   X_1, · · · , X_n : random sample (r.s.), IID (p.234)
   - representative of an infinite population
③ choose a function of the r.s. for an inference :
   T(X_1, · · · , X_n) (or simply T) : statistic (p.235)
   (eg X̄, or √(Σ_{i=1}^n (X_i − X̄)^2/(n − 1)), or med(X_i), . . . )
④ establish theories for distributions of T(X_1, · · · , X_n) :
   sampling distribution of T(X_1, · · · , X_n)
   (eg X_1 + · · · + X_n ∼ Bin(n, p), X_1 + · · · + X_n ∼ Poisson(nµ), X̄ ∼ N(µ, σ^2/n), . . . )
Approximation of sampling distributions (§4.2 ∼ §4.5)
Except in some trivial cases, it is difficult to derive the exact sampling distribution
- instead, approximate it for large sample size (n → ∞)
Order Statistics (§5.2 : p.238 ∼ p.242)
(eg) X1, X2 : IID Exp(1) r.v.’s
Let Y1 < Y2 denote the ordered X1 and X2, i.e.,
Y1 = min(X1, X2), Y2 = max(X1, X2).
(a) Find P (Y1 ≤ y1, Y2 ≤ y2) for 0 < y1 < y2 < ∞.
(b) Find the joint pdf of Y1 and Y2.
(c) Find the marginal pdf’s of Y1, Y2, respectively.
(Solution) Note that
pdf_{X_1,X_2}(x, y) = e^{−x−y} I_{(0,∞)}(x) I_{(0,∞)}(y)
(a) P(Y_1 ≤ y_1, Y_2 ≤ y_2)
  = P(Y_1 ≤ y_1, Y_2 ≤ y_2, X_1 < X_2) + P(Y_1 ≤ y_1, Y_2 ≤ y_2, X_2 < X_1)
  = P(X_1 ≤ y_1, X_2 ≤ y_2, X_1 < X_2) + P(X_2 ≤ y_1, X_1 ≤ y_2, X_2 < X_1)
  = ∫∫_{0<x<y, x≤y_1, y≤y_2} e^{−x−y} dx dy + ∫∫_{0<y<x, y≤y_1, x≤y_2} e^{−x−y} dx dy
  = ∫∫_{0<x<y, x≤y_1, y≤y_2} 2e^{−x−y} dx dy
  = ∫∫_{x≤y_1, y≤y_2} 2e^{−x−y} I(0 < x < y) dx dy
  = ∫_0^{y_2} ∫_0^{y∧y_1} 2e^{−x−y} dx dy
  = 1 − 2e^{−y_2} − e^{−2y_1} + 2e^{−y_1−y_2} (0 < y_1 < y_2 < ∞)
(b) pdf_{Y_1,Y_2}(y_1, y_2) = ∂^2/∂y_1∂y_2 P(Y_1 ≤ y_1, Y_2 ≤ y_2) = 2e^{−y_1−y_2} I(0 < y_1 < y_2)
(c) pdf_{Y_1}(y_1) = ∫_{−∞}^∞ pdf_{Y_1,Y_2}(y_1, y_2) dy_2
                 = ∫_{y_1}^∞ 2e^{−y_1−y_2} dy_2 · I_{(0,∞)}(y_1)
                 = 2e^{−2y_1} I_{(0,∞)}(y_1)
    pdf_{Y_2}(y_2) = ∫_{−∞}^∞ pdf_{Y_1,Y_2}(y_1, y_2) dy_1
                 = ∫_0^{y_2} 2e^{−y_1−y_2} dy_1 · I_{(0,∞)}(y_2)
                 = 2(1 − e^{−y_2}) e^{−y_2} I_{(0,∞)}(y_2)
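The marginal pdf 2e^{−2y_1} derived in (c) says Y_1 = min(X_1, X_2) is exponential with rate 2, hence mean 1/2 and P(Y_1 ≤ 0.5) = 1 − e^{−1}. A simulation sketch in Python (standard library only):

```python
import random

random.seed(8)
N = 200_000
y1 = [min(random.expovariate(1.0), random.expovariate(1.0)) for _ in range(N)]
m = sum(y1) / N                     # Exp(rate 2) mean is 1/2
F = sum(y <= 0.5 for y in y1) / N   # cdf at 0.5 is 1 - e^{-1} ≈ 0.6321
assert abs(m - 0.5) < 0.01
assert abs(F - 0.6321) < 0.01
```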
Order statistics (p.238)
X1, · · · , Xn : IID rv’s of continuous type
The ordered X1, · · · , Xn are denoted by X(1) < X(2) < · · · < X(n) and
called the order statistics based on X1, · · · , Xn.
Pdf of order statistics (p.238, p.241)
X_1, · · · , X_n : IID with pdf f & cdf F of continuous type.
(a) Joint pdf of X_(1), · · · , X_(n) :
    pdf_{X_(1),··· ,X_(n)}(y_1, y_2, · · · , y_n) = n! Π_{i=1}^n f(y_i) I(y_1 < y_2 < · · · < y_n)
(b) Marginal pdf of (X_(r), X_(s)) (1 ≤ r < s ≤ n) :
    pdf_{X_(r),X_(s)}(x, y) = n!/((r − 1)! · 1! · (s − 1 − r)! · 1! · (n − s)!) · (F(x))^{r−1} f(x)
                             × (F(y) − F(x))^{s−1−r} f(y) (1 − F(y))^{n−s} I(x < y)
(c) Marginal pdf of X_(r) :
    pdf_{X_(r)}(x) = n!/((r − 1)! · 1! · (n − r)!) · (F(x))^{r−1} f(x) (1 − F(x))^{n−r}
Heuristic derivation (p.241)
(eg) (#5.2.22) (p.249) (Exponential Spacings)
- X_1, · · · , X_n : IID Exp(1)
- X_(1) < · · · < X_(n) : order statistics based on X_1, · · · , X_n
- Normalized spacings :
  Z_1 = nX_(1)
  Z_2 = (n − 1)(X_(2) − X_(1))
  ...
  Z_r = (n − r + 1)(X_(r) − X_(r−1))
  ...
  Z_n = 1 · (X_(n) − X_(n−1))
⇒ Z_1, · · · , Z_n : IID Exp(1)
In other words,
  (X_(r))_{1≤r≤n} =_d ((1/n)Z_1 + (1/(n−1))Z_2 + · · · + (1/(n − r + 1))Z_r)_{1≤r≤n}
where Z_1, · · · , Z_n : IID Exp(1).
(Proof)
pdf_{X_(1),··· ,X_(n)}(x_(1), · · · , x_(n)) = n! · e^{−Σ_{i=1}^n x_(i)} I(0 < x_(1) < · · · < x_(n))
The inverse map x_(r) = (1/n)z_1 + · · · + (1/(n − r + 1))z_r has lower-triangular Jacobian
det(∂(x_(1), · · · , x_(n))/∂(z_1, · · · , z_n))
  = det [ 1/n, 0, · · · , 0 ;
          1/n, 1/(n−1), · · · , 0 ;
          · · · ;
          1/n, 1/(n−1), · · · , 1/(n − r + 1), · · · , 0 ;
          1/n, 1/(n−1), · · · , 1/(n − r + 1), · · · , 1/1 ]
  = 1/n!
Also 0 < x_(1) < · · · < x_(n) ⇔ z_i > 0 (i = 1, · · · , n), and Σ_{i=1}^n x_(i) = Σ_{i=1}^n z_i
∴ pdf_{Z_1,··· ,Z_n}(z_1, · · · , z_n) = n! · e^{−Σ_{i=1}^n z_i} · (1/n!) Π_{i=1}^n I_{(0,∞)}(z_i)
                                    = Π_{i=1}^n e^{−z_i} I_{(0,∞)}(z_i)
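The conclusion that each normalized spacing is Exp(1) can be checked by simulation. A sketch in Python (standard library only); n = 5 is an illustrative sample size:

```python
import random

random.seed(9)
n, reps = 5, 40_000
z_cols = [[] for _ in range(n)]
for _ in range(reps):
    x = sorted(random.expovariate(1.0) for _ in range(n))
    prev = 0.0
    for r, xr in enumerate(x):
        # Z_{r+1} = (n - (r+1) + 1)(X_(r+1) - X_(r)) = (n - r)(xr - prev)
        z_cols[r].append((n - r) * (xr - prev))
        prev = xr
means = [sum(c) / reps for c in z_cols]
assert all(abs(m - 1.0) < 0.03 for m in means)   # each Z_r ~ Exp(1), mean 1
```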
(eg) (uniform order statistics)
U_1, · · · , U_n : IID U(0, 1)
U_(1) < · · · < U_(n) : order statistics based on U_1, · · · , U_n
(a) Spacings : Y_1 = U_(1), Y_2 = U_(2) − U_(1), · · · , Y_n = U_(n) − U_(n−1)
    ⇒ pdf_{Y_1,··· ,Y_n}(y_1, · · · , y_n) = n! · I(0 < y_i, y_1 + · · · + y_n < 1)
In other words,
    (U_(1), U_(2) − U_(1), · · · , U_(n) − U_(n−1))′ ∼ Dirichlet(1, · · · , 1, 1) (n + 1 ones)
i.e. with U_(0) ≡ 0,
    (U_(r) − U_(r−1))_{1≤r≤n} =_d (Z_r/(Z_1 + · · · + Z_n + Z_{n+1}))_{1≤r≤n}
where Z_1, · · · , Z_n, Z_{n+1} iid ∼ Exp(1) = Gamma(1, 1)
(b) U_(r) ∼ Beta(r, n − r + 1)
    (∵ U_(r) =_d (Z_1 + · · · + Z_r)/((Z_1 + · · · + Z_r) + (Z_{r+1} + · · · + Z_{n+1})))
(c) (U_(r), U_(s) − U_(r))′ ∼ Dirichlet(r, s − r, n − s + 1) (1 ≤ r < s ≤ n)
    (∵ (U_(r), U_(s) − U_(r))′ =_d ((Z_1 + · · · + Z_r)/(Z_1 + · · · + Z_{n+1}), (Z_{r+1} + · · · + Z_s)/(Z_1 + · · · + Z_{n+1}))′)
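Fact (b), U_(r) ∼ Beta(r, n − r + 1), can be checked by simulation against the beta moments. A sketch in Python (standard library only); n = 7, r = 3 are illustrative:

```python
import random

random.seed(10)
n, r, reps = 7, 3, 100_000
u_r = [sorted(random.random() for _ in range(n))[r - 1] for _ in range(reps)]
# Beta(3, 5): mean 3/8 = 0.375, var = 15/(64 * 9) ≈ 0.02604
m = sum(u_r) / reps
v = sum((u - m) ** 2 for u in u_r) / reps
assert abs(m - 0.375) < 0.005
assert abs(v - 0.02604) < 0.003
```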