Page 1
CAS Spring Meeting - Session C3
Bayesian Analysis with Monte-Carlo Markov Chain Methods
Glenn Meyers
FCAS, MAAA, Ph.D.
Vice President – Research
ISO Innovative Analytics
May 25, 2010
Page 2
Perspectives
• Actuarial Perspective
  – Credibility – an approximation to Bayesian analysis
• Statistical Perspective
  – Pure Bayesian analysis
  – Empirical Bayesian analysis
• Statisticians and actuaries came together in the '80s – empirical Bayesian credibility
  – Bühlmann and Straub – "Credibility for Loss Ratios"
  – Efron and Morris – "Stein's Paradox in Statistics"
Page 3
Bayesian Analysis

$$p(\mu \mid x) = \frac{f(x \mid \mu)\cdot g(\mu)}{\int_0^\infty f(x \mid \mu)\cdot g(\mu)\, d\mu}$$

• x – observation
• f – conditional distribution of x given µ
• g – prior distribution of µ
• p – posterior distribution of µ given x
• The integral is difficult to evaluate in multi-parameter situations (see the sketch below).
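A toy illustration of that point (my own example, not from the slides): with a single parameter the normalizing integral is routine quadrature in R; it is the many-parameter version of this integral that MCMC sidesteps.

```r
# One-dimensional toy posterior by brute-force quadrature.
# Assumed model: x ~ lognormal(mu, 1), with a normal(9, 1) prior on mu.
x   <- 10000                                             # a single observation
num <- function(mu) dlnorm(x, mu, 1) * dnorm(mu, 9, 1)   # f(x|mu) * g(mu)
Z   <- integrate(num, -Inf, Inf)$value                   # the denominator above
posterior <- function(mu) num(mu) / Z
posterior(9.2)                                           # posterior density at mu = 9.2
```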
Page 4
Markov Chain
• A discrete random process which can be in various states, and which changes randomly in discrete steps.
• Transition probabilities to the next state depend only on the current state.
• Under "conditions" (irreducible and aperiodic) the time spent in a given state converges to an equilibrium distribution.
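A quick illustration of the equilibrium idea with a made-up two-state chain (not from the slides):

```r
# Two-state Markov chain: the fraction of time spent in each state
# converges to the equilibrium distribution, here (0.75, 0.25).
P <- matrix(c(0.9, 0.1,
              0.3, 0.7), nrow = 2, byrow = TRUE)  # transition probabilities
state  <- 1
visits <- c(0, 0)
for (step in 1:100000) {
  state <- sample(1:2, 1, prob = P[state, ])      # next state depends only on current
  visits[state] <- visits[state] + 1
}
visits / sum(visits)                              # approx. c(0.75, 0.25)
```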
Page 5
Markov Chain Monte Carlo Methods in Bayesian Analysis
• Transition probabilities are determined by f and g
  – Gibbs sampler
  – Metropolis-Hastings algorithm
• The equilibrium distribution is the posterior!!!
• MCMC methods became popular in the '90s
  – Start with a guess at µ and simulate a Markov chain
  – Ignore the first (thousand or so) states – the "burn-in" period
• WinBUGS/COTOR Challenge
Page 6
Gibbs Sampler on a Lognormal
Example from February 2008 Actuarial Review
• Simulate µ from the prior distribution.
• Calculate the likelihood of the data with µ and the previous σ.
• Select a uniform(0,1) random U.
• Accept µ if

$$U < \frac{\text{Likelihood}}{\text{Maximum Likelihood}}$$

• Otherwise, start over.
• On the next iteration, switch the roles of µ and σ (a minimal sketch follows).
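A minimal R sketch of one such step, under assumptions the slide leaves open: hypothetical lognormal data, a normal prior on µ, and "Maximum Likelihood" read as the likelihood at the conditional MLE of µ given σ.

```r
# Acceptance-rejection draw of mu given sigma, per the recipe above.
# The data x and the normal prior on mu are hypothetical stand-ins.
set.seed(1)
x <- rlnorm(25, meanlog = 9, sdlog = 0.7)

loglik <- function(mu, sigma) sum(dlnorm(x, mu, sigma, log = TRUE))

draw_mu <- function(sigma, prior_mean = 9, prior_sd = 1) {
  max_ll <- loglik(mean(log(x)), sigma)   # conditional MLE of mu is mean(log(x))
  repeat {
    mu <- rnorm(1, prior_mean, prior_sd)  # simulate mu from the prior
    U  <- runif(1)                        # uniform(0,1) random U
    if (U < exp(loglik(mu, sigma) - max_ll)) return(mu)  # accept; else start over
  }
}
draw_mu(sigma = 0.7)   # the next iteration would switch roles and draw sigma
```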
Page 7
Posterior Distribution of µ and σ is Only of Temporary Interest!
• Most often we are interested in functions of µ and σ.
• For example:

Mean: $e^{\mu + \sigma^2/2}$

Limited Expected Value:

$$LEV(L) = e^{\mu+\sigma^2/2}\cdot\Phi\!\left(\frac{\log L - \mu - \sigma^2}{\sigma}\right) + L\cdot\left(1 - \Phi\!\left(\frac{\log L - \mu}{\sigma}\right)\right)$$
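A hypothetical R helper for the LEV formula above; differencing two LEVs gives the layer expected value used on the next slide, and with the first posterior draw there (µ = 9.194, σ = 0.723) it lands near the tabulated 392.

```r
# Limited expected value of a lognormal(mu, sigma) at limit L
lev_lognormal <- function(mu, sigma, L) {
  exp(mu + sigma^2 / 2) * pnorm((log(L) - mu - sigma^2) / sigma) +
    L * (1 - pnorm((log(L) - mu) / sigma))
}
# layer 25,000 to 30,000 for one posterior draw
lev_lognormal(9.194, 0.723, 30000) - lev_lognormal(9.194, 0.723, 25000)
```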
Page 8
Layer Expected Value: 25,000 to 30,000
• Some posterior parameters generated by the Gibbs sampler:

µ      σ      LEV
9.194  0.723  392
9.206  0.708  383
8.817  0.707  119
8.944  0.644  120
9.461  0.785  836
9.150  0.651  252
9.043  0.739  280
9.240  0.773  514
9.392  0.863  845
9.018  0.781  311
Page 9
The Metropolis-Hastings Algorithm
1. Select a random candidate value µ* from a proposal density function

$$p(\mu^* \mid \mu_{t-1}) = \Gamma\!\left(\mu^* \mid \alpha_p,\ \mu_{t-1}/\alpha_p\right)$$

2. Compute the ratio

$$R = \frac{f(x \mid \mu^*)\cdot g(\mu^*)\cdot p(\mu_{t-1} \mid \mu^*)}{f(x \mid \mu_{t-1})\cdot g(\mu_{t-1})\cdot p(\mu^* \mid \mu_{t-1})}$$

   – f comes from the modeled distribution
   – g is the prior distribution
   – f·g is proportional to the posterior distribution
3. Select a random U from a uniform distribution on (0,1).
4. If U < R, set µt = µ*. Otherwise set µt = µt-1.

Introducing the proposal density function keeps big jumps out of the random walk.
Page 10
Simple Example
• Y ~ Tweedie(φ = 1, p = 1.5, µ unknown)
• 25 observed losses:

y:    0  1  2  3  5  8  10  12  16
Freq: 8  6  2  2  2  1   1   1   2

• Prior distribution g(µ) = Γ(µ | α = 1, θ = 5) (prior mean = 5)
• Proposal density function (mean = µt-1), see the sketch below:

$$p(\mu^* \mid \mu_{t-1}) = \Gamma\!\left(\mu^* \mid \alpha_p,\ \mu_{t-1}/\alpha_p\right)$$
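A minimal R sketch of the Metropolis-Hastings algorithm on this example. It assumes the CRAN tweedie package for the Tweedie density; the proposal shape αp = 25 is my own tuning choice, not a value from the slides.

```r
library(tweedie)
set.seed(1)
y <- rep(c(0, 1, 2, 3, 5, 8, 10, 12, 16), c(8, 6, 2, 2, 2, 1, 1, 1, 2))

# log posterior (up to a constant): Tweedie likelihood times gamma prior
log_post <- function(m) {
  sum(log(dtweedie(y, power = 1.5, mu = m, phi = 1))) +
    dgamma(m, shape = 1, scale = 5, log = TRUE)
}

alpha_p  <- 25                          # proposal shape (tuning choice)
n_iter   <- 10000
chain    <- numeric(n_iter)
chain[1] <- 5                           # start at the prior mean
for (t in 2:n_iter) {
  cur  <- chain[t - 1]
  star <- rgamma(1, shape = alpha_p, scale = cur / alpha_p)   # candidate mu*
  logR <- log_post(star) - log_post(cur) +                    # log of the ratio R
    dgamma(cur, shape = alpha_p, scale = star / alpha_p, log = TRUE) -
    dgamma(star, shape = alpha_p, scale = cur / alpha_p, log = TRUE)
  chain[t] <- if (log(runif(1)) < logR) star else cur         # accept or stay
}
hist(chain[-(1:1000)], main = "Posterior of mu")              # drop the burn-in
```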
Page 11
Single Variable Example of Tuning the Metropolis-Hastings Algorithm
Page 12
Posterior Distribution of µ
Page 13
Often the Posterior Distribution is Not the Desired Output
• For each Tweedie(φ = 1, p = 1.5, µt), simulate an outcome Yt.
• The distribution of Y is called the predictive distribution (see the sketch below).
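Continuing the sketch above, the predictive distribution re-simulates one Tweedie outcome per retained posterior draw (rtweedie is from the CRAN tweedie package; `chain` is the hypothetical sampler output from the earlier sketch).

```r
# Predictive distribution of Y: one simulated outcome per posterior draw
post   <- chain[-(1:1000)]
y_pred <- sapply(post, function(m) rtweedie(1, power = 1.5, mu = m, phi = 1))
hist(y_pred, main = "Predictive distribution of Y")
```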
Page 14
Predictive Distribution of Y
Page 15
Recall Data
• Y ~ Tweedie(φ = 1, p = 1.5, µ unknown)
• 25 observed losses:

y:    0  1  2  3  5  8  10  12  16
Freq: 8  6  2  2  2  1   1   1   2

• Prior distribution g(µ) = Γ(µ | α = 1, θ = 5)
Page 16
Bayesian Regression Example
• Y ~ Tweedie(φ = 1, p = 1.5, µ unknown)
• "Observed" losses simulated from the Tweedie
• µ = x1 + 2·x2 – the "true" relationship (simulated)
• Model: µ = a1·x1 + a2·x2 + a3·x3
• Prior distribution g(ai) = Γ(ai | α = 1, θ = 1) for i = 1, 2, 3 (prior mean = 1)
• Proposal density function (mean = ai,t-1), see the sketch below:

$$p(a_i^* \mid a_{i,t-1}) = \Gamma\!\left(a_i^* \mid \alpha = 25,\ \theta = a_{i,t-1}/25\right)$$
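A component-wise Metropolis-Hastings sketch for this model, assuming the CRAN tweedie package and made-up covariate values (the slides do not give x1, x2, x3):

```r
library(tweedie)
set.seed(1)
n <- 100
x <- cbind(x1 = runif(n, 0, 5), x2 = runif(n, 0, 5), x3 = runif(n, 0, 5))
y <- sapply(x[, 1] + 2 * x[, 2],                 # "true" mu = x1 + 2*x2
            function(m) rtweedie(1, power = 1.5, mu = m, phi = 1))

log_post <- function(a) {
  mu <- as.vector(x %*% a)                       # model: mu = a1*x1 + a2*x2 + a3*x3
  sum(log(dtweedie(y, power = 1.5, mu = mu, phi = 1))) +
    sum(dgamma(a, shape = 1, scale = 1, log = TRUE))   # priors on a1, a2, a3
}

a <- rep(1, 3)                                   # start at the prior mean
draws <- matrix(NA, 5000, 3)
for (t in 1:5000) {
  for (i in 1:3) {                               # update one coefficient at a time
    a_star <- a
    a_star[i] <- rgamma(1, shape = 25, scale = a[i] / 25)
    logR <- log_post(a_star) - log_post(a) +
      dgamma(a[i], shape = 25, scale = a_star[i] / 25, log = TRUE) -
      dgamma(a_star[i], shape = 25, scale = a[i] / 25, log = TRUE)
    if (log(runif(1)) < logR) a <- a_star
  }
  draws[t, ] <- a
}
colMeans(draws[-(1:1000), ])                     # posterior means of a1, a2, a3
```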
Page 17
Posterior with 100 Observations
Page 18
Posterior with 1,000 Observations
Page 19
A Non-Linear Regression Example
• Claim count: Ni ~ Poisson(λ = a0 + a1·di)
• Claim severity: Zij ~ Γ(α, θ)
• Observed loss:

$$X_i = \sum_{j=1}^{N_i} Z_{ij}$$

• a0, a1, α, and θ are unknown parameters.
• Fit the model with observed loss Xi and covariate di.
Page 20
A Non-Linear Regression Example
• X ~ Tweedie(µ, p, φ), with

$$\mu = \lambda\cdot\alpha\cdot\theta \quad (\text{with } \lambda = a_0 + a_1\cdot d_i)$$

$$p = \frac{\alpha + 2}{\alpha + 1}$$

$$\phi = \frac{\mu^{2-p}}{\lambda\cdot(2 - p)}$$

• So given any a0, a1, α, and θ, we can calculate the likelihood of the data for the Tweedie distribution (a sketch of the translation follows).
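A small sketch of that translation; the example values echo the simulated-data slide that follows, and d = 0.5 is an arbitrary choice of mine.

```r
# Collective-risk parameters (lambda, alpha, theta) -> Tweedie (mu, p, phi)
tweedie_parms <- function(lambda, alpha, theta) {
  mu  <- lambda * alpha * theta
  p   <- (alpha + 2) / (alpha + 1)
  phi <- mu^(2 - p) / (lambda * (2 - p))
  list(mu = mu, p = p, phi = phi)
}
tweedie_parms(lambda = 1 + 3 * 0.5, alpha = 3, theta = 5)  # e.g., d = 0.5
```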
Page 21
Test Model with Simulated Data
• Simulated data
  – Ni ~ Poisson(λ = a0 + a1·di) = Poisson(λ = 1 + 3·di)
  – Zij ~ Γ(α = 3, θ = 5)
• Prior distributions
  – Prior(a0) = Prior(a1) = Γ(α = 2, θ = 1) (prior mean = 2)
  – Prior(α) = Prior(θ) = Γ(α = 4, θ = 1) (prior mean = 4)
• Proposal density function

$$p(parm_i^* \mid parm_{i,t-1}) = \Gamma\!\left(parm_i^* \mid \alpha = 500,\ \theta = parm_{i,t-1}/500\right)$$
Page 22
A Non-Linear Regression Model: 1,000 Observations
Page 23
A Non-Linear Regression Model: 1,000 Observations
Page 24
A Non-Linear Regression Model: 1,000 Observations
Page 25
A Non-Linear Regression Model: 1,000 Observations
Page 26
Statistic of Interest – Variability of µ|d
• Create a histogram of µ for a given value of d.
• Produce histograms for d = 0 and d = 1.
Page 27
A Non-Linear Regression Model: Range of Estimates Given 1,000 Observations
• Claim count: Ni ~ Poisson(λ = a0 + a1·di)
• Claim severity: Zij ~ Γ(α, θ)
Page 28
A Non-Linear Regression Model: Range of Estimates Given 100 Observations
Page 29
A Real Application – Loss Reserving
S&P Report, November 2003: "Insurance Actuaries – A Crisis in Credibility"
“Actuaries are signing off on reserves
that turn out to be wildly inaccurate.”
Page 30
Method Illustrated on Data
Incremental Paid Losses
54 observations – 45 unknown cells
Page 31
Loss Model
• Expected loss:

$$\mu_{AY,Lag} = Premium_{AY}\cdot ELR_{AY}\cdot Dev_{Lag}\cdot t^{AY+Lag-1}$$

• Variance of loss:

$$Var\!\left[X_{AY,Lag}\right] = \mu_{AY,Lag}\cdot\tau_{Lag}\cdot\left(1 + 1/\alpha\right) + c\cdot\mu_{AY,Lag}^{2}$$

where

$$\tau_{Lag} = Sev\cdot\left(1 - \left(\frac{Lag-1}{10}\right)^{3}\right) \quad \text{for } Lag = 1, 2, \ldots, 10$$

• {ELRAY}, {DevLag}, t, c, and Sev are unknown parameters.
Page 32
Tweedie Model of Losses in Each (AY, Lag) Cell

$$\mu_{AY,Lag} = Premium_{AY}\cdot ELR_{AY}\cdot Dev_{Lag}\cdot t^{AY+Lag-1}, \qquad p = \frac{\alpha+2}{\alpha+1}$$

$$\phi_{AY,Lag}\cdot\mu_{AY,Lag}^{\,p} = \mu_{AY,Lag}\cdot\tau_{Lag}\cdot\left(1 + 1/\alpha\right) + c\cdot\mu_{AY,Lag}^{2}$$

• Pick a parameter set {ELRAY}, {DevLag}, t, c, Sev.
• Translate the parameters into the Tweedie parameters µAY,Lag, p, and φAY,Lag (a sketch follows).
• Calculate the likelihood, prior, and proposal density for the Metropolis-Hastings algorithm.
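A sketch of that translation for a 10×10 triangle, treating α as fixed and taking premium, ELR, and Dev as length-10 vectors; this mirrors the formulas above rather than any published implementation.

```r
# One reserving parameter set -> cell-level Tweedie parameters
cell_tweedie <- function(premium, ELR, Dev, t, c, Sev, alpha) {
  p   <- (alpha + 2) / (alpha + 1)
  mu  <- outer(1:10, 1:10, function(AY, Lag)
               premium[AY] * ELR[AY] * Dev[Lag] * t^(AY + Lag - 1))
  tau <- Sev * (1 - ((col(mu) - 1) / 10)^3)   # tau depends on Lag only
  phi <- (mu * tau * (1 + 1 / alpha) + c * mu^2) / mu^p
  list(mu = mu, p = p, phi = phi)             # 10x10 mu and phi, scalar p
}
```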
Page 33
Perspective on Loss Models
• 55 data points with 23 parameters.
• Efforts to formulate models with fewer parameters have been problematic.
• Don't fight having many parameters; figure out how to deal with them.
• Actuaries generally use models with several parameters, and temper their results with "judgment."
• Experience is gained by looking at data from "similar" lines of insurance and/or from other insurers.
• This calls (screams?) for a Bayesian approach.
Page 34
Sample from the Metropolis-Hastings Algorithm Applied to the {DevLag} and {ELRAY} Parameters
ELR1 ELR2 ELR3 ELR4 ELR5 ELR6 ELR7 ELR8 ELR9 ELR10
0.91503 0.66796 0.62778 0.58480 0.56635 0.67332 0.56119 0.68528 0.69505 0.70776
0.91503 0.66796 0.62778 0.58480 0.56635 0.67332 0.56119 0.68528 0.69505 0.70776
0.91503 0.66796 0.62778 0.58480 0.56635 0.67332 0.56119 0.68528 0.69505 0.70776
0.91503 0.66796 0.62778 0.58480 0.56635 0.67332 0.56119 0.68528 0.69505 0.70776
0.91503 0.66796 0.62778 0.58480 0.56635 0.67332 0.56119 0.68528 0.69505 0.70776
0.91503 0.66796 0.62778 0.58480 0.56635 0.67332 0.56119 0.68528 0.69505 0.70776
0.91503 0.66796 0.62778 0.58480 0.56635 0.67332 0.56119 0.68528 0.69505 0.70776
0.91503 0.66796 0.62778 0.58480 0.56635 0.67332 0.56119 0.68528 0.69505 0.70776
0.91503 0.66796 0.62778 0.58480 0.56635 0.67332 0.56119 0.68528 0.69505 0.70776
0.86193 0.63186 0.67501 0.57013 0.60554 0.64775 0.61769 0.74869 0.68954 0.68855
0.85805 0.62464 0.68672 0.55612 0.58922 0.63364 0.65857 0.70962 0.67289 0.64800
Dev1 Dev 2 Dev3 Dev4 Dev5 Dev6 Dev7 Dev8 Dev9 Dev10
0.16546 0.25163 0.22465 0.16499 0.10414 0.05589 0.02427 0.00762 0.00131 0.00005
0.16546 0.25163 0.22465 0.16499 0.10414 0.05589 0.02427 0.00762 0.00131 0.00005
0.16321 0.24844 0.22338 0.16574 0.10598 0.05781 0.02564 0.00827 0.00148 0.00006
0.16613 0.24962 0.22293 0.16463 0.10487 0.05701 0.02520 0.00811 0.00144 0.00006
0.16613 0.24962 0.22293 0.16463 0.10487 0.05701 0.02520 0.00811 0.00144 0.00006
0.16613 0.24962 0.22293 0.16463 0.10487 0.05701 0.02520 0.00811 0.00144 0.00006
0.16613 0.24962 0.22293 0.16463 0.10487 0.05701 0.02520 0.00811 0.00144 0.00006
0.16613 0.24962 0.22293 0.16463 0.10487 0.05701 0.02520 0.00811 0.00144 0.00006
0.15732 0.24804 0.22578 0.16815 0.10736 0.05822 0.02555 0.00810 0.00141 0.00006
0.15732 0.24804 0.22578 0.16815 0.10736 0.05822 0.02555 0.00810 0.00141 0.00006
Page 35
Graphical Representation of the Metropolis-Hastings Sample

Note that the posteriors are tighter, showing how the data narrows the range of results.

"Information Reduces Uncertainty" – Claude Shannon
Page 36
Graphical Representation of the Metropolis-Hastings Sample

Note that the posteriors are tighter, showing how the data narrows the range of results.

"Information Reduces Uncertainty" – Claude Shannon
Page 37
Graphical Representation of the Metropolis-Hastings Sample

Note that the posteriors are tighter, showing how the data narrows the range of results.

"Information Reduces Uncertainty" – Claude Shannon
Page 38
Graphical Representation of the Metropolis-Hastings Sample

Note that the posteriors are tighter, showing how the data narrows the range of results.

"Information Reduces Uncertainty" – Claude Shannon
Page 39
Statistics of Interest

Incremental Paid Losses

AY  Premium  Lag1    Lag2    Lag3    Lag4   Lag5   Lag6   Lag7   Lag8   Lag9   Lag10
1   29,701   5,234   5,172   3,708   1,783  923    537    175    145    8      0
2   27,526   5,234   5,683   4,392   2,134  1,377  673    155    81     47     X2,10
3   30,750   5,702   5,865   7,966   2,472  NA     143    152    73     X3,9   X3,10
4   35,814   6,349   4,611   3,959   2,522  1,924  622    206    X4,8   X4,9   X4,10
5   42,277   8,377   6,890   4,055   3,795  1,292  1,422  X5,7   X5,8   X5,9   X5,10
6   50,088   9,291   13,836  12,441  4,086  2,293  X6,6   X6,7   X6,8   X6,9   X6,10
7   56,921   12,029  12,462  8,369   7,034  X7,5   X7,6   X7,7   X7,8   X7,9   X7,10
8   61,406   13,119  12,618  9,117   X8,4   X8,5   X8,6   X8,7   X8,8   X8,9   X8,10
9   67,983   15,860  14,893  X9,3    X9,4   X9,5   X9,6   X9,7   X9,8   X9,9   X9,10
10  73,359   16,498  X10,2   X10,3   X10,4  X10,5  X10,6  X10,7  X10,8  X10,9  X10,10

"Range of Reasonable Estimates" – distribution of

$$Estimate \sim \sum_{AY=2}^{10}\ \sum_{Lag=12-AY}^{10} \mu_{AY,Lag}$$

Predictive Distribution of Reserve Outcomes – distribution of

$$Outcome \sim \sum_{AY=2}^{10}\ \sum_{Lag=12-AY}^{10} X_{AY,Lag}$$
Page 40
Calculating Ranges
• For a given parameter set Pn sampled from the Markov chain:
  – Calculate the mean µn,AY,Lag and dispersion φn,AY,Lag for each cell.
  – Simulate Xn,AY,Lag from a Tweedie(µn,AY,Lag, p, φn,AY,Lag) distribution (see the sketch below).
• Range of "Reasonable" Estimates – distribution of

$$Estimate_n \sim \sum_{AY=2}^{10}\ \sum_{Lag=12-AY}^{10} \mu_{n,AY,Lag}$$

• Predictive Distribution of Reserve Outcomes – distribution of

$$Outcome_n \sim \sum_{AY=2}^{10}\ \sum_{Lag=12-AY}^{10} X_{n,AY,Lag}$$
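Continuing the cell_tweedie() sketch from the Tweedie-model slide (all names hypothetical), each sampled parameter set yields one estimate and one simulated outcome:

```r
# One (estimate, outcome) pair per sampled parameter set P_n
reserve_draw <- function(parms, premium) {
  tw <- cell_tweedie(premium, parms$ELR, parms$Dev, parms$t,
                     parms$c, parms$Sev, parms$alpha)
  unknown  <- row(tw$mu) + col(tw$mu) >= 12     # the 45 future cells
  estimate <- sum(tw$mu[unknown])               # contribution to the estimate range
  outcome  <- sum(mapply(function(m, f) rtweedie(1, power = tw$p, mu = m, phi = f),
                         tw$mu[unknown], tw$phi[unknown]))
  c(estimate = estimate, outcome = outcome)
}
```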
Page 41
Range of Estimates and Outcomes
Page 42
Bayesian Sound Bite #1
By George Box – Sung to a Familiar Show Tune
• There's no theorem like Bayes' theorem, it's like no theorem we know.
• Everything about it is appealing.
• Everything about it is a wow.
• Let out all that a priori feeling, you've been concealing till now.
• Almost everybody enters into a statistical analysis with prior expectations and/or incentives.
• Bayesian analysis forces one to specify the prior distribution. It is more transparent.
Page 43
Bayesian Sound Bite #2
Relayed indirectly to me through Stuart Klugman
• To frequentist statisticians, models are real and data are random.
• To Bayesian statisticians, data are real and models are random.
Page 44
References
• "Quantifying Tail Risk with the Gibbs Sampler," Brainstorms column, Actuarial Review, February 2008. http://www.casact.org/newsletter/index.cfm?fa=Index&newsletter_id=1
• "Bayesian Analysis with the Metropolis-Hastings Algorithm," Brainstorms column, Actuarial Review, November 2009. http://www.casact.org/newsletter/index.cfm?fa=Index&newsletter_id=1
• "Stochastic Loss Reserving with the Collective Risk Model," Variance, February 2010. http://www.variancejournal.org/issues/?fa=article&abstrID=6606
• Albert, J., Bayesian Computation with R, New York: Springer, 2007.
• Lynch, S. M., Introduction to Applied Bayesian Statistics and Estimation for Social Scientists, New York: Springer, 2007.