Bayesian Hierarchical Model
Ying Nian Wu, UCLA Department of Statistics
IPAM Summer School, July 12, 2007
Plan
• Bayesian inference
• Learning the prior
• Examples
• Josh’s example
Inference of normal mean
$[Y_1, Y_2, \ldots, Y_n \mid \theta] \sim N(\theta, \sigma^2)$ independently
$\theta$: unknown parameter. Example: one’s height
$Y_1, Y_2, \ldots, Y_n$: repeated measurements
$\sigma^2$: given constant, known precision
Prior distribution
$\theta \sim N(\mu, \tau^2)$
$(\mu, \tau^2)$: known hyper-parameters
The larger $\tau^2$, the more uncertain we are about $\theta$; as $\tau^2 \to \infty$, the prior becomes non-informative.
Bayesian inference
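Data: $[Y_1, Y_2, \ldots, Y_n \mid \theta] \sim N(\theta, \sigma^2)$ independently
Prior: $\theta \sim N(\mu, \tau^2)$
Posterior: $[\theta \mid Y_1, \ldots, Y_n] \sim N\left( \dfrac{\frac{n}{\sigma^2}\bar{Y} + \frac{1}{\tau^2}\mu}{\frac{n}{\sigma^2} + \frac{1}{\tau^2}},\ \dfrac{1}{\frac{n}{\sigma^2} + \frac{1}{\tau^2}} \right)$
where $\bar{Y} = \frac{1}{n}\sum_{j=1}^{n} Y_j$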
Compromise between prior and data
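The posterior formula can be checked numerically. The sketch below is only an illustration: the function name `normal_posterior` and the height measurements are made up, and the symbols follow the slide ($\sigma^2$ known, prior $N(\mu, \tau^2)$).

```python
def normal_posterior(ys, sigma2, mu, tau2):
    """Posterior mean and variance of theta for ys ~ N(theta, sigma2), theta ~ N(mu, tau2)."""
    n = len(ys)
    y_bar = sum(ys) / n
    data_precision = n / sigma2    # precision carried by the sample mean
    prior_precision = 1.0 / tau2   # precision carried by the prior
    post_var = 1.0 / (data_precision + prior_precision)
    post_mean = post_var * (data_precision * y_bar + prior_precision * mu)
    return post_mean, post_var

# Hypothetical height measurements in cm (illustrative numbers only)
heights = [178.0, 181.0, 180.0, 179.0]
post_mean, post_var = normal_posterior(heights, sigma2=4.0, mu=170.0, tau2=100.0)
```

The posterior mean lands between the prior mean and the sample mean, pulled toward whichever has the larger precision.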
Bayesian inference
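Prior: $\theta \sim p(\theta)$
Data: $[Y \mid \theta] \sim p(y \mid \theta)$
Posterior: $[\theta \mid Y = y] \sim p(\theta \mid y) \propto p(\theta)\, p(y \mid \theta)$
posterior $\propto$ prior $\times$ likelihood
Joint: $[\theta, Y] \sim p(\theta, y) = p(\theta)\, p(y \mid \theta)$
Illustration: [Figure: joint distribution of $(\theta, Y)$ with $[Y \mid \theta]$ and $[\theta \mid Y = y]$ at the observed $y$]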
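The rule posterior $\propto$ prior $\times$ likelihood can be sketched on a grid of $\theta$ values. This is a minimal illustration, not part of the original slides: the helper names, the grid, and the numbers (prior $N(170, 100)$, one observation $y = 180$ with variance 25) are assumptions.

```python
import math

def normal_pdf(x, mean, var):
    """Density of N(mean, var) at x."""
    return math.exp(-(x - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

def grid_posterior(grid, prior_pdf, likelihood):
    """Normalize prior(theta) * likelihood(theta) over a grid of theta values."""
    weights = [prior_pdf(t) * likelihood(t) for t in grid]
    total = sum(weights)
    return [w / total for w in weights]

# Hypothetical setup: theta on a grid from 150 to 210 in steps of 0.1
grid = [150 + 0.1 * k for k in range(601)]
post = grid_posterior(
    grid,
    prior_pdf=lambda t: normal_pdf(t, 170.0, 100.0),
    likelihood=lambda t: normal_pdf(180.0, t, 25.0),
)
post_mean = sum(t * w for t, w in zip(grid, post))
```

With these numbers the grid posterior mean should agree closely with the closed-form normal result, $(180/25 + 170/100)/(1/25 + 1/100) = 178$.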
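Inference of normal mean
Data: $[Y_1, Y_2, \ldots, Y_n \mid \theta] \sim N(\theta, \sigma^2)$ independently
Prior: $\theta \sim N(\mu, \tau^2)$
Sufficient statistic: $\bar{Y} = \frac{1}{n}\sum_{j=1}^{n} Y_j$, with $[\bar{Y} \mid \theta] \sim N(\theta, \sigma^2/n)$
[Figure: joint distribution of $(\theta, \bar{Y})$ with $[\bar{Y} \mid \theta]$ and $[\theta \mid \bar{Y}]$]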
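Combining prior and data
[Figure: $[\bar{Y} \mid \theta]$ and $[\theta \mid \bar{Y}]$ at the observed $\bar{Y}$, shown for $\tau^2$ large and for $\sigma^2/n$ small]
$[\theta \mid Y_1, \ldots, Y_n] \sim N\left( \dfrac{\frac{n}{\sigma^2}\bar{Y} + \frac{1}{\tau^2}\mu}{\frac{n}{\sigma^2} + \frac{1}{\tau^2}},\ \dfrac{1}{\frac{n}{\sigma^2} + \frac{1}{\tau^2}} \right)$
Prior knowledge is useful for inferring $\theta$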
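One way to read the two cases is to write the posterior mean as a weighted average. The weight $w$ below is introduced only for this note; it is not a symbol from the slides.

```latex
E[\theta \mid Y_1, \ldots, Y_n] = w\,\bar{Y} + (1 - w)\,\mu,
\qquad
w = \frac{n/\sigma^2}{n/\sigma^2 + 1/\tau^2} = \frac{\tau^2}{\tau^2 + \sigma^2/n}.
% tau^2 large or sigma^2/n small:  w -> 1, the data dominate.
% tau^2 small or sigma^2/n large:  w -> 0, the prior dominates.
```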
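Learning the prior
Data: $[Y_1, Y_2, \ldots, Y_n \mid \theta] \sim N(\theta, \sigma^2)$ independently
Prior: $\theta \sim N(\mu, \tau^2)$
Prior distribution cannot be learned from a single realization of $\theta$

Prior: $[\theta_i \mid \mu, \tau^2] \sim N(\mu, \tau^2)$, $i = 1, \ldots, m$
Data: $[Y_{ij} \mid \theta_i, \sigma^2] \sim N(\theta_i, \sigma^2)$, $j = 1, \ldots, n_i$
Prior distribution can be learned from multiple experiences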
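Hierarchical model
Prior: $[\theta_i \mid \mu, \tau^2] \sim N(\mu, \tau^2)$, $i = 1, \ldots, m$
Data: $[Y_{ij} \mid \theta_i, \sigma^2] \sim N(\theta_i, \sigma^2)$, $j = 1, \ldots, n_i$
[Diagram: $(\mu, \tau^2)$ generates $\theta_1, \theta_2, \ldots, \theta_i, \ldots, \theta_m$; each $\theta_i$ generates $Y_{i,1}, Y_{i,2}, \ldots, Y_{i,n_i}$, written collectively as $\mathbf{Y}_i$]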
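The two-level structure can be made concrete by simulating it top down. This is a sketch under simplifying assumptions: the function name `simulate_hierarchical` is hypothetical, every group has the same size `n`, and the numbers are illustrative.

```python
import random

def simulate_hierarchical(m, n, mu, tau2, sigma2, seed=0):
    """Draw theta_i ~ N(mu, tau2), then Y_ij ~ N(theta_i, sigma2) for each i."""
    rng = random.Random(seed)
    thetas = [rng.gauss(mu, tau2 ** 0.5) for _ in range(m)]
    data = [[rng.gauss(theta, sigma2 ** 0.5) for _ in range(n)] for theta in thetas]
    return thetas, data

# Hypothetical example: 5 people, 3 height measurements each
thetas, data = simulate_hierarchical(m=5, n=3, mu=170.0, tau2=100.0, sigma2=4.0)
```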
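Collapsing: $p(\mathbf{Y}_i \mid \mu, \tau^2) = \int p(\theta_i \mid \mu, \tau^2)\, p(\mathbf{Y}_i \mid \theta_i)\, d\theta_i$
(projecting the joint distribution of $(\theta_i, \mathbf{Y}_i)$ onto $\mathbf{Y}_i$)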
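Sufficient statistics
Prior: $[\theta_i \mid \mu, \tau^2] \sim N(\mu, \tau^2)$, $i = 1, \ldots, m$
Data: $[Y_{ij} \mid \theta_i, \sigma^2] \sim N(\theta_i, \sigma^2)$, $j = 1, \ldots, n_i$
$\bar{Y}_i = \frac{1}{n_i}\sum_{j=1}^{n_i} Y_{ij}$, with $[\bar{Y}_i \mid \theta_i] \sim N(\theta_i, \sigma^2/n_i)$
Integrating out $\theta_i$: $[\bar{Y}_i \mid \mu, \tau^2] \sim N(\mu, \tau^2 + \sigma^2/n_i)$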
Collapsing
$[\bar{Y}_i \mid \mu, \tau^2] \sim N(\mu, \tau^2 + \sigma^2/n)$, taking $n_i = n$ for all $i$
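The collapsed distribution follows from adding two independent normal errors. The noise terms $\delta_i$ and $\epsilon_i$ are names introduced only for this note.

```latex
\theta_i = \mu + \delta_i, \quad \delta_i \sim N(0, \tau^2), \qquad
\bar{Y}_i = \theta_i + \epsilon_i, \quad \epsilon_i \sim N(0, \sigma^2/n),
\quad \delta_i \text{ independent of } \epsilon_i
\;\Longrightarrow\;
\bar{Y}_i = \mu + \delta_i + \epsilon_i \sim N\!\left(\mu,\ \tau^2 + \frac{\sigma^2}{n}\right).
```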
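Estimating hyper-parameter
$\hat{\mu} = \frac{1}{m}\sum_{i=1}^{m} \bar{Y}_i$
$\hat{\tau}^2 + \dfrac{\sigma^2}{n} = \frac{1}{m}\sum_{i=1}^{m} (\bar{Y}_i - \hat{\mu})^2$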
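A minimal sketch of these moment estimates, assuming equal group sizes $n$ and known $\sigma^2$. The function name and the group means are hypothetical. Truncating $\hat{\tau}^2$ at zero is an added assumption, since the raw difference can come out negative.

```python
def estimate_hyperparameters(group_means, sigma2, n):
    """Moment estimates of (mu, tau2) from the group means Y-bar_i."""
    m = len(group_means)
    mu_hat = sum(group_means) / m
    spread = sum((y - mu_hat) ** 2 for y in group_means) / m
    # spread estimates tau2 + sigma2 / n; truncation at 0 is an assumption added here
    tau2_hat = max(spread - sigma2 / n, 0.0)
    return mu_hat, tau2_hat

# Hypothetical group means (e.g. average measured height per person, in cm)
mu_hat, tau2_hat = estimate_hyperparameters([168.0, 172.0, 175.0, 181.0], sigma2=4.0, n=4)
```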
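Empirical Bayes
$\hat{\theta}_i = \dfrac{\frac{n}{\sigma^2}\bar{Y}_i + \frac{1}{\hat{\tau}^2}\hat{\mu}}{\frac{n}{\sigma^2} + \frac{1}{\hat{\tau}^2}}$
Borrowing strength from other observations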
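The plug-in estimate can be sketched as a shrinkage of each group mean toward $\hat{\mu}$. The function name and the inputs are hypothetical. Returning $\hat{\mu}$ for every group when $\hat{\tau}^2 \le 0$ is an assumption that matches the limit of the formula as $\hat{\tau}^2 \to 0$.

```python
def empirical_bayes_means(group_means, sigma2, n, mu_hat, tau2_hat):
    """Shrink each group mean toward mu_hat using the plug-in posterior mean."""
    if tau2_hat <= 0:
        # Assumed limiting case: no between-group variation, so pool completely
        return [mu_hat for _ in group_means]
    w = (n / sigma2) / (n / sigma2 + 1.0 / tau2_hat)  # weight on the group's own data
    return [w * y + (1 - w) * mu_hat for y in group_means]

# Hypothetical inputs: four group means with assumed plug-in values for mu and tau2
shrunk = empirical_bayes_means(
    [168.0, 172.0, 175.0, 181.0], sigma2=4.0, n=4, mu_hat=174.0, tau2_hat=21.5
)
```

Each estimate moves from its own $\bar{Y}_i$ toward the overall mean, which is the sense in which a group borrows strength from the others.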
Full Bayesian
Hyper prior: $(\mu, \tau^2) \sim p(\mu, \tau^2)$, e.g., constant
[Diagram: $(\mu, \tau^2)$ generates $\theta_1, \theta_2, \ldots, \theta_i, \ldots, \theta_m$; each $\theta_i$ generates $Y_{i,1}, Y_{i,2}, \ldots, Y_{i,n_i}$]
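Joint: $[\mu, \tau^2; \theta_1, \ldots, \theta_m; \mathbf{Y}_1, \ldots, \mathbf{Y}_m] \sim p(\mu, \tau^2) \prod_{i=1}^{m} \left[ p(\theta_i \mid \mu, \tau^2)\, p(\mathbf{Y}_i \mid \theta_i) \right]$
Posterior: $p(\mu, \tau^2; \theta_1, \ldots, \theta_m \mid \mathbf{Y}_1, \ldots, \mathbf{Y}_m) \propto p(\mu, \tau^2) \prod_{i=1}^{m} \left[ p(\theta_i \mid \mu, \tau^2)\, p(\mathbf{Y}_i \mid \theta_i) \right]$
Hyper-parameters: $p(\mu, \tau^2 \mid \mathbf{Y}_1, \ldots, \mathbf{Y}_m) \propto p(\mu, \tau^2) \prod_{i=1}^{m} p(\mathbf{Y}_i \mid \mu, \tau^2)$
Parameters: $p(\theta_1, \ldots, \theta_m \mid \mathbf{Y}_1, \ldots, \mathbf{Y}_m) = \int p(\mu, \tau^2 \mid \mathbf{Y}_1, \ldots, \mathbf{Y}_m) \prod_{i=1}^{m} p(\theta_i \mid \mathbf{Y}_i, \mu, \tau^2)\, d\mu\, d\tau^2$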
Bayesian hierarchical model
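Stein’s estimator
$[Y_i \mid \theta_i] \sim N(\theta_i, \sigma^2)$, $i = 1, \ldots, m$
Example: measure each person’s height
$\hat{\theta}_i = Y_i$, with $E\left[\sum_{i=1}^{m} (\hat{\theta}_i - \theta_i)^2\right] = m\sigma^2$
$\tilde{\theta}_i = \left(1 - \dfrac{(m-2)\sigma^2}{\sum_{i=1}^{m} Y_i^2}\right) Y_i$, with $E\left[\sum_{i=1}^{m} (\tilde{\theta}_i - \theta_i)^2\right] < m\sigma^2$ for $m \ge 3$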
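The risk comparison can be checked by simulation. The sketch below is illustrative only: the function names, the true means, and the number of replications are assumptions, and the Monte Carlo averages only approximate the expectations on the slide.

```python
import random

def james_stein(ys, sigma2):
    """Shrink every observation toward 0 by the common factor 1 - (m-2)*sigma2 / sum(Y_i^2)."""
    m = len(ys)
    shrink = 1.0 - (m - 2) * sigma2 / sum(y * y for y in ys)
    return [shrink * y for y in ys]

def compare_risk(thetas, sigma2, reps=2000, seed=0):
    """Monte Carlo estimate of total squared error for Y_i itself and for Stein's estimator."""
    rng = random.Random(seed)
    sd = sigma2 ** 0.5
    mle_loss = js_loss = 0.0
    for _ in range(reps):
        ys = [rng.gauss(t, sd) for t in thetas]
        js = james_stein(ys, sigma2)
        mle_loss += sum((y - t) ** 2 for y, t in zip(ys, thetas))
        js_loss += sum((e - t) ** 2 for e, t in zip(js, thetas))
    return mle_loss / reps, js_loss / reps

# Hypothetical true means for m = 10 coordinates
mle_risk, js_risk = compare_risk([0.5 * k for k in range(10)], sigma2=1.0)
```

In this setup `mle_risk` should be close to $m\sigma^2 = 10$ and `js_risk` somewhat smaller.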
Stein’s estimator
$[Y_i \mid \theta_i] \sim N(\theta_i, \sigma^2)$, $i = 1, \ldots, m$
$E[Y_i^2] = \theta_i^2 + \sigma^2$, so $E\left[\sum_{i=1}^{m} Y_i^2\right] = \sum_{i=1}^{m} \theta_i^2 + m\sigma^2$
Stein’s estimator
Empirical Bayes interpretation: $\theta_i \sim N(0, \tau^2)$
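A short derivation shows how the prior $\theta_i \sim N(0, \tau^2)$ leads to Stein’s shrinkage factor. It is a standard argument sketched here for reference; it uses the fact that $E[1/\chi^2_m] = 1/(m-2)$ for $m \ge 3$.

```latex
% Marginal distribution and posterior mean under theta_i ~ N(0, tau^2):
Y_i \sim N(0,\ \tau^2 + \sigma^2), \qquad
E[\theta_i \mid Y_i] = \left(1 - \frac{\sigma^2}{\tau^2 + \sigma^2}\right) Y_i .
% Estimating the unknown factor from the data:
\frac{\sum_{i=1}^{m} Y_i^2}{\tau^2 + \sigma^2} \sim \chi^2_m
\;\Longrightarrow\;
E\!\left[\frac{m-2}{\sum_{i=1}^{m} Y_i^2}\right] = \frac{1}{\tau^2 + \sigma^2} .
% Plugging in the estimate gives Stein's estimator:
\tilde{\theta}_i = \left(1 - \frac{(m-2)\,\sigma^2}{\sum_{i=1}^{m} Y_i^2}\right) Y_i .
```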
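Beta-Binomial example
Data: $[Y \mid \theta] \sim \text{Binomial}(n, \theta)$
e.g., flip a coin: $\theta$ is the probability of heads, $Y$ is the number of heads out of $n$ flips
$p(Y = y \mid \theta) = \binom{n}{y}\, \theta^{y} (1 - \theta)^{n - y}$
Example: pre-election poll
Prior: $\theta \sim \text{Beta}(a_1, a_0)$
$p(\theta) = \dfrac{\Gamma(a_1 + a_0)}{\Gamma(a_1)\,\Gamma(a_0)}\, \theta^{a_1 - 1} (1 - \theta)^{a_0 - 1}$
$E[\theta] = \dfrac{a_1}{a_1 + a_0}$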
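Conjugate prior
Data: $[Y \mid \theta] \sim \text{Binomial}(n, \theta)$, $p(Y = y \mid \theta) = \binom{n}{y}\, \theta^{y} (1 - \theta)^{n - y}$
Prior: $\theta \sim \text{Beta}(a_1, a_0)$, $p(\theta) = \dfrac{\Gamma(a_1 + a_0)}{\Gamma(a_1)\,\Gamma(a_0)}\, \theta^{a_1 - 1} (1 - \theta)^{a_0 - 1}$
Posterior: $[\theta \mid Y = y] \sim \text{Beta}(y + a_1,\ n - y + a_0)$
$E[\theta \mid Y = y] = \dfrac{y + a_1}{n + a_1 + a_0}$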
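The conjugate update amounts to adding the observed counts to the prior parameters. A minimal sketch follows; the function name and the example numbers are hypothetical.

```python
def beta_binomial_posterior(y, n, a1, a0):
    """Posterior Beta parameters and mean after y successes in n trials, prior Beta(a1, a0)."""
    post_a1 = a1 + y          # prior pseudo-count of heads plus observed heads
    post_a0 = a0 + (n - y)    # prior pseudo-count of tails plus observed tails
    post_mean = post_a1 / (post_a1 + post_a0)
    return post_a1, post_a0, post_mean

# Hypothetical data: 7 heads in 10 flips with a Beta(2, 2) prior
post_a1, post_a0, post_mean = beta_binomial_posterior(y=7, n=10, a1=2.0, a0=2.0)
```

Here the posterior is Beta(9, 5), with mean $9/14$, which sits between the prior mean $1/2$ and the observed frequency $7/10$.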
Hierarchical model
Examples:
• a number of coins: probabilities of heads
• a number of MLB players: probabilities of a hit
• pre-election polls in different states
$\theta_i \sim \text{Beta}(a_1, a_0)$
Dirichlet-Multinomial
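Roll a die: $\theta = (\theta_1, \ldots, \theta_6)$
$[Y_1, \ldots, Y_6 \mid \theta] \sim \text{multinomial}(n, \theta)$
$p(y_1, \ldots, y_6 \mid \theta) = \binom{n}{y_1, \ldots, y_6}\, \theta_1^{y_1} \cdots \theta_6^{y_6}$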
Conjugate prior
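Data: $[Y_1, \ldots, Y_6 \mid \theta] \sim \text{multinomial}(n, \theta)$, $p(y_1, \ldots, y_6 \mid \theta) = \binom{n}{y_1, \ldots, y_6}\, \theta_1^{y_1} \cdots \theta_6^{y_6}$
Prior: $\theta \sim \text{Dirichlet}(a_1, \ldots, a_6)$, $p(\theta) = \dfrac{\Gamma(a_1 + \cdots + a_6)}{\Gamma(a_1) \cdots \Gamma(a_6)}\, \theta_1^{a_1 - 1} \cdots \theta_6^{a_6 - 1}$
Posterior: $[\theta \mid y] \sim \text{Dirichlet}(y_1 + a_1, \ldots, y_6 + a_6)$
$E[\theta_k \mid y] = \dfrac{y_k + a_k}{n + a_1 + \cdots + a_6}$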
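The Dirichlet update mirrors the Beta case, one count per face. A minimal sketch follows; the function name, the die counts, and the uniform prior are hypothetical.

```python
def dirichlet_posterior(counts, prior):
    """Posterior Dirichlet parameters and posterior means given multinomial counts."""
    post = [y + a for y, a in zip(counts, prior)]
    total = sum(post)  # equals n + a_1 + ... + a_6
    return post, [p / total for p in post]

# Hypothetical data: 20 rolls of a die with a Dirichlet(1, ..., 1) prior
post, means = dirichlet_posterior([3, 1, 4, 1, 5, 6], [1.0] * 6)
```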
Hierarchical model
$\theta_i \sim \text{Dirichlet}(a_1, \ldots, a_6)$