Generating Random Variables

1

Computational Statistics
Ch 3: Methods for generating random variables
Prof. Donna Pauler Ankerst

Book: Statistical Computing with R, Maria L. Rizzo, Chapman & Hall/CRC, 2008.

Please review yourself:
Chapter 1: Introduction to the R environment
Chapter 2: Probability and statistics review

2

Random variable simulation

 The fundamental tool required in computational statistics is the ability to simulate random variables from specified probability distributions.

 All random variable generation starts with uniform random variable generation.

 A uniform distribution means all values in the domain space of consideration have equal probability of occurrence.

3

Discrete uniform

 Let the random variable Y be the outcome from the roll of a single fair die. What is the distribution of Y?

Answer: Y ~ Discrete Uniform on {1,2,3,4,5,6}, where p(Y = i) = 1/6 for i = 1,...,6.

 Write R code to simulate a random sample of 600 observations of Y and show a histogram to prove it follows the correct distribution.

Answer:
>ysamp=sample(1:6,600,replace=T)
>hist(ysamp)

4

Discrete uniform
>ysamp=sample(1:6,600,replace=T)
>hist(ysamp)

I expect the bars to be of the same height, 100 of each, but they are not, so I try again. This must be random variation; double check by increasing from 600 to 60,000.

5

Discrete uniform
>ysamp=sample(1:6,60000,replace=T)
>hist(ysamp)

That is better; now I am convinced the sample command is doing what I think it is doing.

6

More about the R sample function

The multinomial distribution is not uniform in general; it allows different values to have different probabilities as long as they sum to one.
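For instance (values chosen here for illustration, not from the slides), sample() accepts a prob argument that turns it into a general multinomial generator:

# sample() with unequal probabilities (multinomial draws); values are illustrative
s <- sample(1:3, 1000, replace = TRUE, prob = c(0.2, 0.3, 0.5))
table(s) / 1000   # proportions should be near 0.2, 0.3, 0.5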

7

Continuous uniform

 Let the random variable Y come from the continuous uniform distribution on the interval (0,1) [U(0,1) density]. Write down the density function of Y.

Answer: Y ~ f(Y) where f(Y) = 1 for 0 < Y < 1, and 0 otherwise.

 Write R code to simulate a random sample of 1000 observations of Y and show an empirical density plot to prove it follows the correct distribution.

Answer:
>ysamp=runif(1000)
>hist(ysamp,prob=T)

The height is near 1, correct for the U(0,1) density.

8

Uniform(0,1)

 The U(0,1) distribution provides the basis for generating most other distributions.

 Most programming languages, such as C, and statistical packages, such as R, include a U(0,1) generator.

 There are many computer algorithms for generating U(0,1) random variables based on congruential methods. These fall more in the realm of informatics and are beyond the scope of this course.

 For this course we will assume that we have a method for generating a U(0,1) random variable.

9

Uniform(a,b)

 Question: How would you generate Z ~ U(a,b), the uniform distribution on (a,b), if you only had a U(0,1) generator available?

 Answer: Generate Y ~ U(0,1) and let Z = a + (b-a)Y.

Write the R code to verify this for a = 1, b = 4:
>hist(1+3*runif(1000),prob=T)

The height of a U(a,b) density is 1/(b-a), and the height appears to be near 1/3 as expected for U(1,4).

10

Generators in R

In general, p returns the cdf and d the pdf evaluated at a given value, q returns a quantile, and r returns a random number from the distribution. Try these functions out yourself. Use the help command, e.g. help(runif).
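A quick illustration of the four prefixes for the uniform family; every distribution's functions follow the same pattern:

punif(0.3)   # cdf: P(U <= 0.3) = 0.3
dunif(0.3)   # pdf: height of the density at 0.3
qunif(0.3)   # quantile: value with 30% of the mass below it
runif(3)     # three random draws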

11

Discrete random variables

Although R now contains generators for most discrete distributions, and the list is constantly growing, we will learn the algorithms behind them. Specifically, we will now cover how to generate from the following distributions:
 Bernoulli
 Binomial
 Discrete

assuming that a U(0,1) generator is available.

12

Bernoulli(p)

X ~ Ber(p), X ∈ {0,1}: P(X = 1) = p, P(X = 0) = 1 - p, for p in (0,1).

Algorithm:
1.) Generate U ~ U(0,1).
2.) Set X = I(U ≤ p).

Proof the algorithm works:
P(X = 1) = P(U ≤ p) = p and P(X = 0) = P(U > p) = 1 - p, for U ~ U(0,1).
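As a minimal sketch of the algorithm in vectorized R (n and p are illustrative values, not from the slides):

# Generate n Bernoulli(p) variables from a U(0,1) generator
n <- 1000; p <- 0.3        # illustrative values
u <- runif(n)              # step 1: U ~ U(0,1)
x <- as.integer(u <= p)    # step 2: X = I(U <= p)
mean(x)                    # should be near p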

13

Binomial(n,p)

X ~ Bin(n,p) if P(X = k) = (n choose k) p^k (1-p)^(n-k), k = 0,1,...,n.

Therefore X = Σ_{i=1}^n X_i, where the X_i ~iid Ber(p).

Algorithm:
1.) Generate U_i ~ U(0,1) for i = 1,...,n.
2.) Set X = Σ_{i=1}^n I(U_i ≤ p).

Proof the algorithm works: By the previous slide, I(U_i ≤ p) ~ Ber(p) for i = 1,...,n, and the sum of n independent Ber(p) random variables is distributed Bin(n,p).

You may have forgotten your probability distributions. These are reviewed in Ch. 2 of the Rizzo book or can be found in Wikipedia. It is important to know the distributions well to be able to simulate from them. The Binomial distribution counts the number of successes in n independent Bernoulli trials.
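A hedged sketch of the algorithm in R, summing indicators row-wise; m, n, and p are values chosen here for illustration:

# Generate m Binomial(n,p) variables by summing indicator variables
m <- 1000; n <- 10; p <- 0.4
u <- matrix(runif(m * n), nrow = m)   # m rows of n uniforms
x <- rowSums(u <= p)                  # each row: sum of I(U_i <= p)
mean(x); n * p                        # sample mean vs. theoretical mean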

14

Discrete(x1,x2,...,xn)

X follows a discrete distribution on x_1, x_2, ..., x_n, X ~ Discrete(x_1, x_2, ..., x_n), if X = x_i with probability p_i and Σ_{i=1}^n p_i = 1.

Let F_0 = 0 and F_j = Σ_{i=1}^j p_i for j = 1,...,n, and partition the interval [0,1] into the subintervals [F_0, F_1), [F_1, F_2), ..., [F_{n-1}, F_n]. Note that F_j = P(X ≤ x_j) is the cdf of X.

Algorithm to generate X:
1.) Generate U ~ U(0,1).
2.) Set X = Σ_{i=1}^n x_i I(U ∈ [F_{i-1}, F_i)).

Indicator function:
I_A(x) = 1 if x ∈ A, 0 if x ∈ A^C.

15

Discrete(x1,x2,...,xn)

Let F_0 = 0 and F_j = Σ_{i=1}^j p_i for j = 1,...,n, and partition the interval [0,1] into the subintervals [F_0, F_1), [F_1, F_2), ..., [F_{n-1}, F_n].

Algorithm to generate X:
1.) Generate U ~ U(0,1).
2.) Set X = Σ_{i=1}^n x_i I(U ∈ [F_{i-1}, F_i)).

Proof: P(X = x_i) = P(U ∈ [F_{i-1}, F_i)) = F_i - F_{i-1} = p_i for i = 1,...,n.

16

Discrete(x1,x2,...,xn)

Algorithm to generate X:
1.) Generate U ~ U(0,1).
2.) Set X = Σ_{i=1}^n x_i I(U ∈ [F_{i-1}, F_i)).

REMARK: This algorithm can be extended for a very large and even infinitely countable n, but the search for the correct interval can become numerically infeasible. For these cases binary and indexed searches can be used (Ripley 1987).
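One way to implement the interval search without loops is R's findInterval(); the support and probability vector below are chosen here for illustration (sample() with a prob argument would give the same result):

# Sample 1000 draws from Discrete(x_1,...,x_n) via the interval partition
x_vals <- c(10, 20, 30)              # illustrative support
p      <- c(0.2, 0.5, 0.3)           # illustrative probabilities, sum to 1
u      <- runif(1000)
idx    <- findInterval(u, cumsum(p)) + 1   # locate the subinterval containing U
x      <- x_vals[idx]
table(x) / 1000                      # compare to p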

17

Inverse Transform Method

 To move on to generating from common continuous distributions or infinite discrete distributions we need a new general and useful technique.

 The inverse transform method works for any random variable that has an invertible cdf.

18

Theorem: Probability Integral Transformation

Def. of cdf of X: F_X(x) = P(X ≤ x).

First note, if X is a continuous random variable with cdf F_X(x), then U = F_X(X) ~ U(0,1).

Proof: For 0 ≤ u ≤ 1, a uniform(0,1) variable satisfies P(U ≤ u) = u. We show that U = F_X(X) follows the cdf of a uniform:

P(U ≤ u) = P(F_X(X) ≤ u) = P(F_X^{-1}(F_X(X)) ≤ F_X^{-1}(u)) = P(X ≤ F_X^{-1}(u)) = F_X(F_X^{-1}(u)) = u.

19

Theorem: Probability Integral Transformation

Define the inverse transformation F_X^{-1}(u) = inf{x : F_X(x) = u} for u ∈ (0,1).

Then to generate X with cdf F_X(x), and hence pdf f_X(x):
1.) Generate U ~ U(0,1).
2.) Set X = F_X^{-1}(U).

Proof: P(X ≤ x) = P(F_X^{-1}(U) ≤ x) = P(F_X(F_X^{-1}(U)) ≤ F_X(x)) = P(U ≤ F_X(x)) = F_X(x), by definition of the cdf of the U(0,1) distribution.
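To make the two steps concrete, a minimal sketch for a density chosen here for illustration: f(x) = 3x² on (0,1), so F(x) = x³ and F⁻¹(u) = u^(1/3):

# Inverse transform for f(x) = 3x^2 on (0,1)
u <- runif(1000)
x <- u^(1/3)                    # X = F^{-1}(U)
hist(x, prob = TRUE)            # empirical density
curve(3 * x^2, add = TRUE)      # overlay the target density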

20

Example from book

21

Good fit!

22

Example: Exponential distribution

The exponential distribution is often used in industrial research to model time until failure of machines, light bulbs, etc.

23

Example: Exponential distribution

Generate U ~ U(0,1). Set F_X(X) = U ⟺ solve F_X(X) = U for X:

1 - exp(-λX) = U ⟹ exp(-λX) = 1 - U
-λX = log(1 - U) ⟹ X = -log(1 - U)/λ

This is equivalent to setting X = -log(U)/λ, since U and 1 - U are both U(0,1).

Proof: P(1 - U ≤ u) = P(U ≥ 1 - u) = 1 - P(U < 1 - u) = 1 - (1 - u) = u.

To generate n Exp(lambda) random variables in R:
>-log(runif(n))/lambda
R has an exponential generator:
>rexp(n,lambda)

24

Inverse Transformation Method, Discrete Case

Less known, but the inverse transform can also be applied for discrete distributions.

If X is a discrete random variable and x_1 < x_2 < ... are the points of discontinuity of F_X(x), then the inverse transform is F_X^{-1}(u) = x_i, where F_X(x_{i-1}) < u ≤ F_X(x_i).

The algorithm is (U ~ U(0,1)):
 Find x_i where F_X(x_{i-1}) < U ≤ F_X(x_i).
 Set X = x_i.

25

Example: Poisson distribution

Although there are more efficient algorithms, the inverse transform could be used to generate from the Poisson distribution.

Calculate the cdf's sequentially:
F(0) = f(0) = e^{-λ}
F(1) = f(0) + f(1) = e^{-λ} + λe^{-λ}/1!
...
F(x+1) = F(x) + f(x+1).

The algorithm is (U ~ U(0,1)):
 Find x where F(x-1) < U ≤ F(x).
 Set X = x.

To generate n Poisson(lambda) random variables in R: >rpois(n,lambda)
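For illustration, a sketch of the sequential-search inverse transform (the function name rpois_inv is our own), using the recursion f(i) = f(i-1)·λ/i to update the cdf; rpois() remains the practical choice:

# Inverse transform for Poisson(lambda): sequential search through the cdf
rpois_inv <- function(n, lambda) {
  u <- runif(n)
  x <- integer(n)
  for (k in seq_len(n)) {
    i  <- 0
    fx <- exp(-lambda)       # f(0) = e^{-lambda}
    Fx <- fx                 # F(0) = f(0)
    while (u[k] > Fx) {      # find the smallest x with U <= F(x)
      i  <- i + 1
      fx <- fx * lambda / i  # f(i) = f(i-1) * lambda / i
      Fx <- Fx + fx          # F(i) = F(i-1) + f(i)
    }
    x[k] <- i
  }
  x
}
mean(rpois_inv(10000, 3))    # should be near lambda = 3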

26

Acceptance-Rejection Algorithm

You want to generate from some distribution f(y) but just do not know how.

27

Acceptance-Rejection Algorithm

What you need is a density g(y) that you can sample from and such that cg(y) covers f(y) in the sense that cg(y) ≥ f(y) for all y such that f(y) > 0.

[Figure: target density f(y) covered by the envelope cg(y).]

28

Acceptance-Rejection Algorithm

To generate a random variable X ~ f(x):

 Find a density g(t) satisfying f(t) ≤ c g(t) for all t with f(t) > 0, and that you can sample from.

For each random variable required:
(a) Generate Y ~ g(y)
(b) Generate U ~ U(0,1)
(c) If U ≤ f(Y)/(c g(Y)), accept and return X = Y; otherwise go back to (a).
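A sketch of the loop as a reusable R function; the names ar_sample, f, rg, and g are our own, not from the slides or the book:

# Acceptance-rejection: f = target density, rg = sampler for g,
# g = envelope density, c = covering constant
ar_sample <- function(n, f, rg, g, c) {
  x <- numeric(0)
  while (length(x) < n) {
    y <- rg(n)                            # (a) candidates from g
    u <- runif(n)                         # (b) uniforms
    x <- c(x, y[u <= f(y) / (c * g(y))])  # (c) keep accepted draws
  }
  x[1:n]
}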

29

Acceptance-Rejection Algorithm

For each random variable required:
(a) Generate Y ~ g(y)
(b) Generate U ~ U(0,1)
(c) If U ≤ f(Y)/(c g(Y)), accept and return X = Y; otherwise go back to (a).

In (c) we see that P(accept|Y) = P(U ≤ f(Y)/(c g(Y)) | Y) = f(Y)/(c g(Y)), since P(U ≤ a) = a for U ~ U(0,1).

Note f(Y)/(c g(Y)) ≤ 1 since f(Y) ≤ c g(Y).

So the closer cg(Y) is to f(Y) the higher the acceptance probability and the more efficient the algorithm.

30

Acceptance-Rejection Algorithm

[Figure: f(y) under the envelope cg(y); regions where cg(y) sits far above f(y) have high probabilities of rejection.]

The closer cg(Y) is to f(Y) the higher the acceptance probability and the more efficient the algorithm. It can be a challenge to find an efficient g(Y).

31

Acceptance-Rejection Algorithm

[Figure: target density f(y) with a piecewise envelope g(y).]

Adaptive rejection, implemented in the R WinBUGS package, uses g(Y) as a piecewise spline approximation to densities f(Y) that are log concave [not covered in detail here].

Sometimes g(Y) is called the envelope density.

32

Acceptance-Rejection Algorithm

Back to P(accept|Y) = f(Y)/(c g(Y)), where Y ~ g(y). Assume g(y) is continuous. This implies that the acceptance probability on average is

P(accept) = ∫ P(accept|y) g(y) dy = ∫ [f(y)/(c g(y))] g(y) dy = (1/c) ∫ f(y) dy = 1/c.

So c = 1 is ideal for maximizing the average acceptance probability.

On average the number of iterations required to generate a single X ~ f(x) is c. [The number of iterations until a success with probability p = 1/c on each try is a geometric random variable, which has mean 1/p = c.]

33

Acceptance-Rejection Algorithm

Proof that the r.v. generated by the acceptance-rejection algorithm follows the right distribution f(x), for the continuous case [discrete case in book]:

Since we only take generations that are accepted, we show f(X) = f(X|accept).

f(X|accept) = f(Y|accept), where Y ~ g(y)
            = P(accept|Y) g(Y) / P(accept)   ...by Bayes rule
            = [f(Y)/(c g(Y))] g(Y) / (1/c)   ...by what was just shown
            = f(Y) = f(X), since X = Y.

34

Acceptance-Rejection Algorithm

The first step to successfully using acceptance-rejection is knowing what your density f(y) looks like. For multivariate densities this can be very difficult.

35

Example: Beta distribution

Use the acceptance-rejection algorithm to generate n = 1,000 observations ~ Beta(2,2).

X ~ Beta(α,β) if f(x) = [Γ(α+β)/(Γ(α)Γ(β))] x^{α-1} (1-x)^{β-1} for 0 ≤ x ≤ 1, α > 0, β > 0.

For natural numbers n ≥ 1, Γ(n) = (n-1)!, 0! = 1.

E(X) = α/(α+β); Var(X) = αβ/[(α+β)²(α+β+1)].

Beta(1,1) = U(0,1).

36

Example: Beta distribution

Use the acceptance-rejection algorithm to generate n = 1,000 observations ~ Beta(2,2).

First step: get to know your distribution, write it out analytically.

f(x) = [Γ(2+2)/(Γ(2)Γ(2))] x^{2-1} (1-x)^{2-1} = [3!/(1!·1!)] x(1-x) = 6x(1-x).

This distribution has a max at its mean 2/(2+2) = 1/2, with a value of 6(1/2)(1/2) = 3/2.

Since the support of f(x) is [0,1], a natural choice is g(x) = U(0,1) and c = 3/2. This means the average acceptance probability would be 1/c = 2/3 ≈ 0.67, and on average 3/2 iterations would be needed for every one acceptance, i.e. 1,500 iterations to get the requested 1,000 observations.

Beta(2,2) is symmetric and unimodal in [0,1].
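Putting the pieces together, a hedged sketch of the Beta(2,2) generator with g = U(0,1) and c = 3/2 (the book's Example 3.7 code may differ):

# A-R for Beta(2,2) with envelope g = U(0,1) and c = 3/2
f <- function(x) 6 * x * (1 - x)   # Beta(2,2) density
n <- 1000
x <- numeric(0)
while (length(x) < n) {
  y <- runif(n)                    # candidates from g = U(0,1)
  u <- runif(n)
  x <- c(x, y[u <= f(y) / 1.5])    # accept if U <= f(Y)/(c*g(Y)); g(y) = 1
}
x <- x[1:n]
hist(x, prob = TRUE); curve(f(x), add = TRUE)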

37

The book hastily chooses the same U(0,1) generating density but c = 6 instead of 3/2. It therefore requires 4 times the iterations.


38

R Code for Example 3.7

5873 iterations were required to get the 1000 r.v.'s.

39

Errors in the book

 The variance formula for the qth sample quantile is missing a square, as shown next. The correct s.e. yields similar conclusions.

 Ch 2, not Ch 1. Shown next.

40

Variance of the qth sample quantile

The variance of the qth sample quantile x̂_q is

Var(x̂_q) = q(1-q) / [n f(x_q)²],

where n is the sample size and f is the density of the sampled distribution.

Note: Decreases as n increases, so you can increase n to get more accuracy.

The numerator has a maximum of 1/4 at q = 0.5 (median) and equals 0 at the tails of the distribution (q = 0, 1). However, f(x_q) also approaches 0 at the tails of the distribution.

For heavy-tailed distributions, the denominator typically approaches zero faster than the numerator, causing higher variance for extreme quantiles, as is to be expected.

41

Transformations

Transformations, if applicable, can provide a more efficient method for generating random variables.

Examples (proofs not required; can be found in various books, sources):

1.) Z ~ N(0,1) ⟹ Z² ~ χ²_1, chi-square with 1 degree of freedom (df)

2.) U ~ χ²_m, V ~ χ²_n independent ⟹ (U/m)/(V/n) ~ F_{m,n}

3.) Z ~ N(0,1), V ~ χ²_n independent ⟹ T = Z/√(V/n) ~ t_n, Student's t-distribution, n df

4.) U, V ~ U(0,1) independent ⟹
Z_1 = √(-2 log U) cos(2πV)
Z_2 = √(-2 log U) sin(2πV)
are independent N(0,1).

Error in these formulas in book
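As an illustration of item 4 (the Box-Muller transform as stated on this slide, which may differ from the book's erroneous version noted above):

# Box-Muller: two N(0,1) samples from two U(0,1) samples
u <- runif(5000); v <- runif(5000)
z1 <- sqrt(-2 * log(u)) * cos(2 * pi * v)
z2 <- sqrt(-2 * log(u)) * sin(2 * pi * v)
qqnorm(z1); qqline(z1)   # check normality of z1
cor(z1, z2)              # should be near 0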

42

Transformations

5.) U ~ Gamma(r,λ), V ~ Gamma(s,λ) independent ⟹ X = U/(U+V) ~ Beta(r,s).

We have just seen the Beta distribution, which has domain (0,1).

X > 0 follows a Gamma distribution with shape parameter r > 0 and rate parameter λ > 0 if the pdf of X is

f(x) = [λ^r/Γ(r)] x^{r-1} e^{-λx}, x ≥ 0.

The mean and variance are E(X) = r/λ, Var(X) = r/λ².

The Gamma(1,λ) distribution is the Exp(λ) distribution. We reviewed the exponential distribution earlier in this lecture.

43

> u = rgamma(1000,shape=3,rate=1)
> v = rgamma(1000,shape=2,rate=1)
> qqplot(qbeta(ppoints(1000),3,2),u/(u+v))
> abline(0,1)

ppoints(): Generates a symmetric series in [0,1]. You could have used seq(0,1,length=1000) instead to get evenly spaced numbers in [0,1], or rbeta(1000,3,2) to just get a random sample.

Example 3.8

44

Transformations

X > 0 follows a Gamma distribution with shape parameter r > 0 and rate parameter λ > 0 if the pdf of X is f(x) = [λ^r/Γ(r)] x^{r-1} e^{-λx}, x ≥ 0. The mean is r/λ and the variance r/λ². Note that Gamma(1,λ) = Exp(λ).

6.) The sum of r i.i.d. Exp(λ) random variables is Gamma(r,λ).

So, to simulate a single Gamma(r,λ) random variable, simulate r Exp(λ) variables and add them (see the sketch below).

Distributions of sums of random variables are called convolutions. Convolutions and mixtures are special transformations that deserve special attention. Next.
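A minimal loop-free sketch of item 6, with illustrative values for n, r, and lambda:

# Gamma(r, lambda) as the sum of r i.i.d. Exp(lambda) variables
n <- 5000; r <- 4; lambda <- 2
e <- matrix(rexp(n * r, rate = lambda), nrow = n)
x <- rowSums(e)              # each row sums r exponentials
mean(x); r / lambda          # sample vs. theoretical mean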

45

Convolutions

X_1, X_2, ..., X_n ~iid F

S = X_1 + X_2 + ... + X_n ~ F_S

F_S is called the (n-fold) convolution of F.

It is straightforward to simulate a random variable with a distribution that is a convolution by generating X_1, X_2, ..., X_n and taking their sum.

46

Distributions related by convolution

 For integer v > 0, the χ²_v is the convolution of v squared N(0,1) variables.

 The negative binomial distribution NegBin(r,p) counts the number of failures until the rth success is obtained among independent trials, with probability of success on each trial equal to p.

X ~ NegBin(r,p) ⟹ P(X = x) = (x+r-1 choose x) p^r (1-p)^x for x = 0,1,...

This distribution is the convolution of r independent Geometric distributions. A Geometric distribution counts the number of failures until the first success: X ~ Geom(p) ⟹ P(X = x) = p(1-p)^x for x = 0,1,...

 The convolution of r independent Exp(λ) distributions is Gamma(r,λ).

There are now R functions for these (rchisq,rnbinom, rgamma) but this is not just an R course.

47

Mixtures

A random variable X is a discrete mixture if the distribution of X is a weighted sum

F_X(x) = Σ_i π_i F_{X_i}(x)

for some sequence of independent random variables X_1, X_2, ..., and π_i > 0 such that Σ_i π_i = 1. The π_i's are called the mixing weights or mixing probabilities.

A random variable X follows a continuous mixture if the distribution of X is

F_X(x) = ∫ F_{X|Y=y}(x) f_Y(y) dy

for a family of distributions F_{X|Y=y}(x) indexed by real numbers y and a weighting function f_Y such that ∫ f_Y(y) dy = 1.

48

Example: Mixture of Normals

To generate a random variable that follows a 0.5:0.5 mixture of independent N(0,1) and N(3,1) distributions: 1.) Generate an integer k in {1,2}, where P(1) = P(2) = 0.5. 2.) If k = 1, draw X ~ N(0,1); otherwise X ~ N(3,1).

A mixture of Normals is different from a convolution of Normals. A mixture of Normals may not be Normal; it may be bimodal or multimodal. Is a convolution of Normals Normal?
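A vectorized sketch of the two steps (n is an illustrative value):

# 0.5:0.5 mixture of N(0,1) and N(3,1)
n <- 5000
k <- sample(1:2, n, replace = TRUE, prob = c(0.5, 0.5))  # step 1: pick component
x <- rnorm(n, mean = c(0, 3)[k], sd = 1)                 # step 2: draw from it
plot(density(x))   # typically bimodal, unlike a convolution of Normals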

49

Example: Convolution of Normals

What is the distribution of the convolution of independent N(0,1) and N(3,1) distributions? Let X = X1 + X2, where X1 ~ N(0,1) and X2 ~ N(3,1). Then from Statistics courses we know that
 E(X) = E(X1) + E(X2) = 0 + 3 = 3 ...linearity of expectation
 Var(X) = Var(X1) + Var(X2) = 1 + 1 = 2 ...independence
 X ~ Normal ...characteristic functions

Therefore, X ~ N(3,2).

50

Example: Mixture of Gammas

Consider the following mixture of Gamma distributions:

F_X = Σ_{j=1}^5 π_j F_{X_j},

where the X_j ~ Gamma(r = 3, λ_j = 1/j) are independent and the mixing probabilities are π_j = j/15 for j = 1,...,5. Write efficient R code without loops to simulate 5000 observations from X, and overlay an empirical density estimate of the mixture over the component densities.

WAKE-UP CALL: A question like this could appear on the exam!

51

Example: Mixture of Gammas

F_X = Σ_{j=1}^5 π_j F_{X_j}, where X_j ~ Gamma(3, λ_j = 1/j), with mixing probabilities π_j = j/15 for j = 1,...,5.

Get to know what you are simulating... Recall for the Gamma distribution, E(X_j) = r/λ_j = 3j for j = 1,...,5.

The means are 3, 6, 9, 12, 15 with respective mixing probabilities 1/15, 2/15, 3/15, 4/15, 5/15. So the distributions with higher means get higher weights.

52

Ex. 3.12 R code

density(): Generates a kernel density estimate, like a smoothed histogram, for a sample of data.

Your code would not have to appear exactly like this but would need to work and not contain loops.
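A possible loop-free solution sketch for Ex. 3.12 (the book's code may differ in details; the for loop below is only for overlaying the component curves, the simulation itself has no loops):

# 5-component Gamma mixture: pick a component, then draw from it
n <- 5000
j <- sample(1:5, n, replace = TRUE, prob = (1:5) / 15)  # mixing component
x <- rgamma(n, shape = 3, rate = 1 / j)                 # rate is vectorized over j
plot(density(x), main = "Gamma mixture")                # kernel density estimate
for (k in 1:5)
  curve(dgamma(x, 3, rate = 1/k) * k / 15, add = TRUE, lty = 2)  # weighted components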

53

Ex. 3.12 Mixture of several gamma distributions

54

Ex. 3.15 Poisson-gamma mixture

The Negative Binomial distribution (counts the number of failures until the rth success) is also a continuous mixture of Poisson(λ) distributions, where λ ~ Gamma(r, β). Specifically,

X|λ ~ Poisson(λ), λ ~ Gamma(r, β) ⟹ X ~ NegBin(r, β/(1+β)).

Question: Simulate 5000 observations of a variable that counts the number of failures until the 4th success, where the probability of success on each independent trial is 0.75 using a continuous mixture.

55

Ex. 3.15 Poisson-gamma mixture

Question: Simulate 5000 observations of a variable that counts the number of failures until the 4th success, where the probability of success on each independent trial is 0.75, using a continuous mixture.

Solution: From the description we recognize this as a NegBin random variable. By the continuous mixture requirement, we see that we need to use the Poisson-Gamma method.

X|λ ~ Poisson(λ), λ ~ Gamma(r, β) ⟹ X ~ NegBin(r, β/(1+β))

∴ need r = 4 and β/(1+β) = 0.75, i.e. β = 3.
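A sketch of the simulation; here Gamma(r, β) is read with β = 3 as the rate parameter, which makes the NegBin mean r(1-p)/p = 4/3 come out right:

# Poisson-Gamma continuous mixture for NegBin(4, 0.75)
n <- 5000
lambda <- rgamma(n, shape = 4, rate = 3)   # r = 4, beta = 3 (rate convention)
x <- rpois(n, lambda)                      # X ~ NegBin(4, 0.75)
mean(x); 4 * 0.25 / 0.75                   # sample vs. theoretical mean 4/3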

56

Ex. 3.15 Poisson-gamma mixture

[Figure: histogram of x, frequencies for x = 0 to 8, dominated by 0's.]

> hist(x) Check that the simulations make sense: if X counts the number of failures until the 4th success where each success probability is 0.75, expect lots of 0’s.

57

Ex. Poisson-gamma mixture

R code and output comparing the simulation probability mass function with a NegBin(4,.75) mass function:

58

Comparing distributions

The Empirical Cumulative Distribution Function (ecdf, Ch 2 of book) is a good method for comparing distributions.

The ecdf F_n(x) is an unbiased estimate of F_X(x) = P(X ≤ x) and is defined for an observed ordered sample x_(1) ≤ x_(2) ≤ ... ≤ x_(n) by:

F_n(x) = 0 for x < x_(1),
F_n(x) = i/n for x_(i) ≤ x < x_(i+1), i = 1,...,n-1,
F_n(x) = 1 for x ≥ x_(n).

The standard error of F_n(x) is [F(x)(1 - F(x))/n]^0.5.

59

Ex. 3.15 Poisson-gamma mixture

s.e.[F_n(x)] = [F(x)(1 - F(x))/n]^0.5

R code and output comparing the ecdf of the simulated sample with the probability mass function of a NegBin(4,.75):
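A hedged sketch of such a comparison (the book's code may differ):

# Compare the ecdf of the simulated sample with the NegBin(4, 0.75) cdf
Fn <- ecdf(x)     # x from the Poisson-Gamma simulation above
k  <- 0:8
round(rbind(ecdf = Fn(k), cdf = pnbinom(k, size = 4, prob = 0.75)), 3)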

60

Multivariate Normal distribution

A random vector X = (X_1,...,X_d)' has a d-dimensional multivariate Normal distribution with mean vector μ = (μ_1,...,μ_d)' and positive definite symmetric variance matrix Σ = (σ_ij), denoted X ~ N_d(μ, Σ), if it has density function

f(x) = (2π)^{-d/2} |Σ|^{-1/2} exp{ -(1/2)(x-μ)' Σ^{-1} (x-μ) }, x ∈ R^d,

where |Σ| is the determinant and Σ^{-1} the inverse of Σ.

Covered in Multivariate Statistics.

61

Multivariate Normal distribution

R already contains efficient functions to generate multivariate normal random variables:
 mvrnorm in the MASS package
 rmvnorm in the mvtnorm package
However, it is useful to see how these are traditionally generated from univariate N(0,1) random variables.

62

Multivariate Normal distribution

An observation following a multivariate Normal distribution N_d(μ, Σ) may be generated from i.i.d. N(0,1) observations. To show this we begin with the univariate N(μ, σ²) case and show how this is simulated from a N(0,1).

Let Z ~ N(0,1) and Y = μ + σZ. Then applying the rules of expectation and variance for linear transformations of variables yields

E(Y) = E(μ + σZ) = μ + σE(Z) = μ
Var(Y) = Var(μ + σZ) = Var(σZ) = σ²Var(Z) = σ².

A bit more difficult to show is that a linear combination of a Normal distributed random variable is still Normal. This requires characteristic functions and might have been done in an early statistics course. It is not shown here.

63

Multivariate Normal distribution

Z ~ N(0,1) and Y = μ + σZ ⟹ Y ~ N(μ, σ²), and this shows how to generate a N(μ, σ²) random variable from a N(0,1):

1.) Generate Z ~ N(0,1).
2.) Set Y = μ + σZ.

To generate a multivariate Normal random variable proceeds similarly. Suppose we want to generate Y ~ N_d(μ, Σ). We first generate d i.i.d. N(0,1) random variables Z_1,...,Z_d and collect these into a vector Z = (Z_1,...,Z_d)'. Then Z ~ N_d(0, I), where I is the d×d identity matrix.

Now we want to transform Z to have the desired distribution, N_d(μ, Σ).

64

Multivariate Normal distribution

The following results generalize the univariate case just shown. It requires a little bit of matrix algebra and is covered in the Multivariate Statistics course.

Suppose Z ~ N_d(μ, Σ) and Y = CZ + b, where C is a constant q×d matrix and b is a constant vector in R^q. Note that Y can have a different dimension (q) than Z (d). The linearity rules for expectation and variance are:

E(Y) = E(CZ + b) = E(CZ) + E(b) = CE(Z) + b
Var(Y) = Var(CZ + b) = Var(CZ) = C Var(Z) C' = CΣC',

and Y is also Normal on q dimensions.

65

Multivariate Normal distribution

Suppose that Σ can be factorized as Σ = CC'. Then if Z ~ N_d(0, I) and Y = CZ + μ,

E(Y) = E(CZ + μ) = E(CZ) + E(μ) = CE(Z) + μ = μ
Var(Y) = Var(CZ + μ) = Var(CZ) = C Var(Z) C' = CC' = Σ,

and Y is also Normal on d dimensions. This gives the algorithm for generating the random variable, but how is the decomposition of Σ performed?

Three methods in R: •  eigen: spectral or eigenvalue decomposition •  chol: Cholesky factorization •  svd: Singular value decomposition
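A sketch of the algorithm using the Choleski factorization; mu and Sigma are illustrative values. Note chol() returns the upper triangular factor R with R'R = Σ, so we transpose it:

# Generate n draws from N_d(mu, Sigma) via Y = CZ + mu with Sigma = CC'
n <- 1000
mu <- c(0, 1)
Sigma <- matrix(c(1, 0.8, 0.8, 1), 2, 2)
C <- t(chol(Sigma))                   # lower triangular factor, CC' = Sigma
Z <- matrix(rnorm(n * 2), nrow = 2)   # columns are i.i.d. N_2(0, I) vectors
Y <- t(C %*% Z + mu)                  # each row of Y is one N_2(mu, Sigma) draw
colMeans(Y); cov(Y)                   # compare to mu and Sigma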

66

Definitions and examples of how to use these from the book

Three methods in R: •  eigen: spectral or eigenvalue decomposition •  chol: Cholesky factorization •  svd: Singular value decomposition

You will not need to memorize the specific decompositions for the exam; they would be provided and you would have to write R code.

67

Ex. 3.16 Spectral decomposition method

68

Ex. 3.16 Spectral decomposition method

69

Ex. 3.16 Spectral decomposition method

Compare simulated means, covariances to the truth (here covariances = correlations since the variances = 1).

70

Ex. 3.17 Singular value decomposition method

71

Ex. 3.18 Choleski factorization method

72

Wishart distribution

In addition to random vectors there exist random matrices, and the Wishart is the most commonly used distribution for matrices. In the 1-dimensional case the Wishart collapses to the chi-square distribution, which is a commonly assumed distribution for sample variances.

You might recall learning that if X_1,...,X_n ~iid N(μ, σ²), then

(1/σ²) Σ_{i=1}^n (X_i - X̄)² ~ χ²_{n-1},

where X̄ = (1/n) Σ_{i=1}^n X_i is the sample mean.


74

Wishart distribution

Suppose that M = X'X = Σ_{i=1}^n X_i X_i', where X is an n×d data matrix with each row an independent N_d(0, Σ) observation. Then the distribution of the matrix M is Wishart with scale matrix Σ and degrees of freedom (df) n, written as W_d(Σ, n).

It is not necessary to memorize, but for your information the density looks like:

f(M) = |M|^{(n-d-1)/2} exp{-(1/2) trace(Σ^{-1}M)} / [ 2^{nd/2} π^{d(d-1)/4} |Σ|^{n/2} Π_{j=1}^d Γ([n+1-j]/2) ],

where Γ is the gamma function and M belongs to the space of symmetric positive-definite matrices.

 For d = 1, Σ = σ² and M = Σ_{i=1}^n X_i² ~ σ²χ²_n, and W_1(σ², n) = σ²χ²_n.

The Wishart introduced is actually technically a central Wishart; there exists a non-central Wishart just as there exists a non-central chi-square distribution, but it is hardly used.

75

Wishart distribution

Suppose that M = X'X, where X is an n×d data matrix with each row an independent N_d(0, Σ) observation. Then the distribution of the matrix M is Wishart with scale matrix Σ and degrees of freedom (df) n, written as W_d(Σ, n).

Given the definition above and that we just learned how to generate multivariate Normal distributions, generating a Wishart random variable is straightforward.

However, this algorithm would be inefficient since we have to generate n multivariate Normals for every one Wishart variable.

There is a more efficient algorithm called Bartlett's decomposition, which is provided next, with proof beyond the scope of this course.

76

Bartlett’s Decomposition (not required to memorize for the exam)

Let T = (T_ij) be a d×d lower triangular random matrix with independent entries satisfying

1.) T_ij ~iid N(0,1) for i > j
2.) T_ii ~ √(χ²_{n-i+1}) for i = 1,...,d.

Then A = TT' ~ W_d(I, n).

To generate a W_d(Σ, n):
 Obtain the Choleski factorization Σ = LL', where L is lower triangular.
 Generate A by 1.) and 2.). Then LAL' ~ W_d(Σ, n).
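A sketch of Bartlett's decomposition for a single draw; d, n, and Sigma are illustrative values:

# One draw from W_d(Sigma, n) via Bartlett's decomposition
d <- 3; n <- 10
Sigma <- diag(d)
T <- matrix(0, d, d)
T[lower.tri(T)] <- rnorm(d * (d - 1) / 2)     # below-diagonal entries: N(0,1)
diag(T) <- sqrt(rchisq(d, df = n - 1:d + 1))  # diagonal: sqrt(chi-square_{n-i+1})
A <- T %*% t(T)                               # A ~ W_d(I, n)
L <- t(chol(Sigma))                           # Choleski factor, Sigma = LL'
M <- L %*% A %*% t(L)                         # M ~ W_d(Sigma, n)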

77

End of Chapter 3

Chapter 4: Visualization of Multivariate Data. It has some interesting and straightforward concepts, but the R packages used in the chapter are now outdated. There are now many advanced R packages for plotting data, including ggplot. Chapter 4 is not covered in this course.
