Page 1
CAS Spring Meeting - Session C3
Bayesian Analysis with Monte-Carlo Markov Chain Methods
Glenn Meyers
FCAS, MAAA, Ph.D.
Vice President – Research
ISO Innovative Analytics
May 25, 2010
Page 2
Perspectives
• Actuarial Perspective
  – Credibility – an approximation to Bayesian analysis
• Statistical Perspective
  – Pure Bayesian analysis
  – Empirical Bayesian analysis
• Statisticians and actuaries came together in the '80s – empirical Bayesian credibility
  – Bühlmann and Straub – "Credibility for Loss Ratios"
  – Efron and Morris – "Stein's Paradox in Statistics"
Page 3
Bayesian Analysis

$$p(\mu \mid x) = \frac{f(x \mid \mu)\cdot g(\mu)}{\int_0^\infty f(x \mid \mu)\cdot g(\mu)\, d\mu}$$

• x – observation
• f – conditional distribution of x given µ
• g – prior distribution of µ
• p – posterior distribution of µ given x
• The integral is difficult to evaluate in multi-parameter situations (see the sketch below).
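A toy illustration of that point (my own example, not from the slides): with a single parameter the normalizing integral is routine quadrature in R; it is the many-parameter version of this integral that MCMC sidesteps.

```r
# One-dimensional toy posterior by brute-force quadrature.
# Assumed model: x ~ lognormal(mu, 1), with a normal(9, 1) prior on mu.
x   <- 10000                                             # a single observation
num <- function(mu) dlnorm(x, mu, 1) * dnorm(mu, 9, 1)   # f(x|mu) * g(mu)
Z   <- integrate(num, -Inf, Inf)$value                   # the denominator above
posterior <- function(mu) num(mu) / Z
posterior(9.2)                                           # posterior density at mu = 9.2
```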
Page 4
Markov Chain
• A discrete random process which can be in various states, and which changes randomly in discrete steps.
• Transition probabilities to the next state depend only on the current state.
• Under "conditions" (irreducible and aperiodic) the time spent in a given state converges to an equilibrium distribution.
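A quick illustration of the equilibrium idea with a made-up two-state chain (not from the slides):

```r
# Two-state Markov chain: the fraction of time spent in each state
# converges to the equilibrium distribution, here (0.75, 0.25).
P <- matrix(c(0.9, 0.1,
              0.3, 0.7), nrow = 2, byrow = TRUE)  # transition probabilities
state  <- 1
visits <- c(0, 0)
for (step in 1:100000) {
  state <- sample(1:2, 1, prob = P[state, ])      # next state depends only on current
  visits[state] <- visits[state] + 1
}
visits / sum(visits)                              # approx. c(0.75, 0.25)
```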
Page 5
Markov Chain Monte Carlo Methods in Bayesian Analysis
• Transition probabilities are determined by f and g
  – Gibbs sampler
  – Metropolis-Hastings algorithm
• The equilibrium distribution is the posterior!!!
• MCMC methods became popular in the '90s
  – Start with a guess at µ and simulate a Markov chain
  – Ignore the first (thousand or so) states – the "burn-in" period
• WinBUGS/COTOR Challenge
Page 6
Gibbs Sampler on a Lognormal
Example from February 2008 Actuarial Review
• Simulate µ from the prior distribution.
• Calculate the likelihood of the data with µ and the previous σ.
• Select a uniform(0,1) random U.
• Accept µ if

$$U < \frac{\text{Likelihood}}{\text{Maximum Likelihood}}$$

• Otherwise, start over.
• On the next iteration, switch the roles of µ and σ (a minimal sketch follows).
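A minimal R sketch of one such step, under assumptions the slide leaves open: hypothetical lognormal data, a normal prior on µ, and "Maximum Likelihood" read as the likelihood at the conditional MLE of µ given σ.

```r
# Acceptance-rejection draw of mu given sigma, per the recipe above.
# The data x and the normal prior on mu are hypothetical stand-ins.
set.seed(1)
x <- rlnorm(25, meanlog = 9, sdlog = 0.7)

loglik <- function(mu, sigma) sum(dlnorm(x, mu, sigma, log = TRUE))

draw_mu <- function(sigma, prior_mean = 9, prior_sd = 1) {
  max_ll <- loglik(mean(log(x)), sigma)   # conditional MLE of mu is mean(log(x))
  repeat {
    mu <- rnorm(1, prior_mean, prior_sd)  # simulate mu from the prior
    U  <- runif(1)                        # uniform(0,1) random U
    if (U < exp(loglik(mu, sigma) - max_ll)) return(mu)  # accept; else start over
  }
}
draw_mu(sigma = 0.7)   # the next iteration would switch roles and draw sigma
```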
Page 7
Posterior Distribution of µ and σ is Only of Temporary Interest!
• Most often we are interested in functions of µ and σ.
• For example:

Mean: $e^{\mu + \sigma^2/2}$

Limited Expected Value:

$$LEV(L) = e^{\mu+\sigma^2/2}\cdot\Phi\!\left(\frac{\log L - \mu - \sigma^2}{\sigma}\right) + L\cdot\left(1 - \Phi\!\left(\frac{\log L - \mu}{\sigma}\right)\right)$$
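A hypothetical R helper for the LEV formula above; differencing two LEVs gives the layer expected value used on the next slide, and with the first posterior draw there (µ = 9.194, σ = 0.723) it lands near the tabulated 392.

```r
# Limited expected value of a lognormal(mu, sigma) at limit L
lev_lognormal <- function(mu, sigma, L) {
  exp(mu + sigma^2 / 2) * pnorm((log(L) - mu - sigma^2) / sigma) +
    L * (1 - pnorm((log(L) - mu) / sigma))
}
# layer 25,000 to 30,000 for one posterior draw
lev_lognormal(9.194, 0.723, 30000) - lev_lognormal(9.194, 0.723, 25000)
```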
Page 8
Layer Expected Value: 25,000 to 30,000
• Some posterior parameters generated by the Gibbs sampler:

µ      σ      LEV
9.194  0.723  392
9.206  0.708  383
8.817  0.707  119
8.944  0.644  120
9.461  0.785  836
9.150  0.651  252
9.043  0.739  280
9.240  0.773  514
9.392  0.863  845
9.018  0.781  311
Page 9
The Metropolis-Hastings Algorithm
1. Select a random candidate value µ* from a proposal density function

$$p(\mu^* \mid \mu_{t-1}) = \Gamma\!\left(\mu^* \mid \alpha_p,\ \mu_{t-1}/\alpha_p\right)$$

2. Compute the ratio

$$R = \frac{f(x \mid \mu^*)\cdot g(\mu^*)\cdot p(\mu_{t-1} \mid \mu^*)}{f(x \mid \mu_{t-1})\cdot g(\mu_{t-1})\cdot p(\mu^* \mid \mu_{t-1})}$$

   – f comes from the modeled distribution
   – g is the prior distribution
   – f·g is proportional to the posterior distribution
3. Select a random U from a uniform distribution on (0,1).
4. If U < R, set µt = µ*. Otherwise set µt = µt-1.

Introducing the proposal density function keeps big jumps out of the random walk.
Page 10
Simple Example
• Y ~ Tweedie(φ = 1, p = 1.5, µ unknown)
• 25 observed losses:

y:    0  1  2  3  5  8  10  12  16
Freq: 8  6  2  2  2  1   1   1   2

• Prior distribution g(µ) = Γ(µ | α = 1, θ = 5) (prior mean = 5)
• Proposal density function (mean = µt-1), see the sketch below:

$$p(\mu^* \mid \mu_{t-1}) = \Gamma\!\left(\mu^* \mid \alpha_p,\ \mu_{t-1}/\alpha_p\right)$$
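A minimal R sketch of the Metropolis-Hastings algorithm on this example. It assumes the CRAN tweedie package for the Tweedie density; the proposal shape αp = 25 is my own tuning choice, not a value from the slides.

```r
library(tweedie)
set.seed(1)
y <- rep(c(0, 1, 2, 3, 5, 8, 10, 12, 16), c(8, 6, 2, 2, 2, 1, 1, 1, 2))

# log posterior (up to a constant): Tweedie likelihood times gamma prior
log_post <- function(m) {
  sum(log(dtweedie(y, power = 1.5, mu = m, phi = 1))) +
    dgamma(m, shape = 1, scale = 5, log = TRUE)
}

alpha_p  <- 25                          # proposal shape (tuning choice)
n_iter   <- 10000
chain    <- numeric(n_iter)
chain[1] <- 5                           # start at the prior mean
for (t in 2:n_iter) {
  cur  <- chain[t - 1]
  star <- rgamma(1, shape = alpha_p, scale = cur / alpha_p)   # candidate mu*
  logR <- log_post(star) - log_post(cur) +                    # log of the ratio R
    dgamma(cur, shape = alpha_p, scale = star / alpha_p, log = TRUE) -
    dgamma(star, shape = alpha_p, scale = cur / alpha_p, log = TRUE)
  chain[t] <- if (log(runif(1)) < logR) star else cur         # accept or stay
}
hist(chain[-(1:1000)], main = "Posterior of mu")              # drop the burn-in
```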
Page 11
Single Variable Example of Tuning the Metropolis-Hastings Algorithm
Page 12
Posterior Distribution of µ
Page 13
Often the Posterior Distribution is Not the Desired Output
• For each Tweedie(φ = 1, p = 1.5, µt), simulate an outcome Yt.
• The distribution of Y is called the predictive distribution (see the sketch below).
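Continuing the sketch above, the predictive distribution re-simulates one Tweedie outcome per retained posterior draw (rtweedie is from the CRAN tweedie package; `chain` is the hypothetical sampler output from the earlier sketch).

```r
# Predictive distribution of Y: one simulated outcome per posterior draw
post   <- chain[-(1:1000)]
y_pred <- sapply(post, function(m) rtweedie(1, power = 1.5, mu = m, phi = 1))
hist(y_pred, main = "Predictive distribution of Y")
```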
Page 14
Predictive Distribution of Y
Page 15
Recall Data
• Y ~ Tweedie(φ = 1, p = 1.5, µ unknown)
• 25 observed losses:

y:    0  1  2  3  5  8  10  12  16
Freq: 8  6  2  2  2  1   1   1   2

• Prior distribution g(µ) = Γ(µ | α = 1, θ = 5)
Page 16
Bayesian Regression Example
• Y ~ Tweedie(φ = 1, p = 1.5, µ unknown)
• "Observed" losses simulated from the Tweedie
• µ = x1 + 2·x2 – the "true" relationship (simulated)
• Model: µ = a1·x1 + a2·x2 + a3·x3
• Prior distribution g(ai) = Γ(ai | α = 1, θ = 1) for i = 1, 2, 3 (prior mean = 1)
• Proposal density function (mean = ai,t-1), see the sketch below:

$$p(a_i^* \mid a_{i,t-1}) = \Gamma\!\left(a_i^* \mid \alpha = 25,\ \theta = a_{i,t-1}/25\right)$$
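A component-wise Metropolis-Hastings sketch for this model, assuming the CRAN tweedie package and made-up covariate values (the slides do not give x1, x2, x3):

```r
library(tweedie)
set.seed(1)
n <- 100
x <- cbind(x1 = runif(n, 0, 5), x2 = runif(n, 0, 5), x3 = runif(n, 0, 5))
y <- sapply(x[, 1] + 2 * x[, 2],                 # "true" mu = x1 + 2*x2
            function(m) rtweedie(1, power = 1.5, mu = m, phi = 1))

log_post <- function(a) {
  mu <- as.vector(x %*% a)                       # model: mu = a1*x1 + a2*x2 + a3*x3
  sum(log(dtweedie(y, power = 1.5, mu = mu, phi = 1))) +
    sum(dgamma(a, shape = 1, scale = 1, log = TRUE))   # priors on a1, a2, a3
}

a <- rep(1, 3)                                   # start at the prior mean
draws <- matrix(NA, 5000, 3)
for (t in 1:5000) {
  for (i in 1:3) {                               # update one coefficient at a time
    a_star <- a
    a_star[i] <- rgamma(1, shape = 25, scale = a[i] / 25)
    logR <- log_post(a_star) - log_post(a) +
      dgamma(a[i], shape = 25, scale = a_star[i] / 25, log = TRUE) -
      dgamma(a_star[i], shape = 25, scale = a[i] / 25, log = TRUE)
    if (log(runif(1)) < logR) a <- a_star
  }
  draws[t, ] <- a
}
colMeans(draws[-(1:1000), ])                     # posterior means of a1, a2, a3
```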
Page 17
Posterior with 100 Observations
Page 18
Posterior with 1,000 Observations
Page 19
A Non-Linear Regression Example
• Claim count: Ni ~ Poisson(λ = a0 + a1·di)
• Claim severity: Zij ~ Γ(α, θ)
• Observed loss:

$$X_i = \sum_{j=1}^{N_i} Z_{ij}$$

• a0, a1, α, and θ are unknown parameters.
• Fit the model with observed loss Xi and covariate di.
Page 20
A Non-Linear Regression Example
• X ~ Tweedie(µ, p, φ), with

$$\mu = \lambda\cdot\alpha\cdot\theta \quad (\text{with } \lambda = a_0 + a_1\cdot d_i)$$

$$p = \frac{\alpha + 2}{\alpha + 1}$$

$$\phi = \frac{\mu^{2-p}}{\lambda\cdot(2 - p)}$$

• So given any a0, a1, α, and θ, we can calculate the likelihood of the data for the Tweedie distribution (a sketch of the translation follows).
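A small sketch of that translation; the example values echo the simulated-data slide that follows, and d = 0.5 is an arbitrary choice of mine.

```r
# Collective-risk parameters (lambda, alpha, theta) -> Tweedie (mu, p, phi)
tweedie_parms <- function(lambda, alpha, theta) {
  mu  <- lambda * alpha * theta
  p   <- (alpha + 2) / (alpha + 1)
  phi <- mu^(2 - p) / (lambda * (2 - p))
  list(mu = mu, p = p, phi = phi)
}
tweedie_parms(lambda = 1 + 3 * 0.5, alpha = 3, theta = 5)  # e.g., d = 0.5
```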
Page 21
Test Model with Simulated Data
• Simulated data
  – Ni ~ Poisson(λ = a0 + a1·di) = Poisson(λ = 1 + 3·di)
  – Zij ~ Γ(α = 3, θ = 5)
• Prior distributions
  – Prior(a0) = Prior(a1) = Γ(α = 2, θ = 1) (prior mean = 2)
  – Prior(α) = Prior(θ) = Γ(α = 4, θ = 1) (prior mean = 4)
• Proposal density function

$$p(parm_i^* \mid parm_{i,t-1}) = \Gamma\!\left(parm_i^* \mid \alpha = 500,\ \theta = parm_{i,t-1}/500\right)$$
Page 22
A Non-Linear Regression Model: 1,000 Observations
Page 23
A Non-Linear Regression Model: 1,000 Observations
Page 24
A Non-Linear Regression Model: 1,000 Observations
Page 25
A Non-Linear Regression Model: 1,000 Observations
Page 26
Statistic of Interest – Variability of µ|d
• Create a histogram of µ for a given value of d.
• Produce histograms for d = 0 and d = 1.
Page 27
A Non-Linear Regression Model: Range of Estimates Given 1,000 Observations
• Claim count: Ni ~ Poisson(λ = a0 + a1·di)
• Claim severity: Zij ~ Γ(α, θ)
Page 28
A Non-Linear Regression Model: Range of Estimates Given 100 Observations
Page 29
A Real Application – Loss Reserving
S&P Report, November 2003: "Insurance Actuaries – A Crisis in Credibility"
“Actuaries are signing off on reserves
that turn out to be wildly inaccurate.”
Page 30
Method Illustrated on Data
Incremental Paid Losses
54 observations – 45 unknown cells
Page 31
Loss Model
• Expected loss:

$$\mu_{AY,Lag} = Premium_{AY}\cdot ELR_{AY}\cdot Dev_{Lag}\cdot t^{AY+Lag-1}$$

• Variance of loss:

$$Var\!\left[X_{AY,Lag}\right] = \mu_{AY,Lag}\cdot\tau_{Lag}\cdot\left(1 + 1/\alpha\right) + c\cdot\mu_{AY,Lag}^{2}$$

where

$$\tau_{Lag} = Sev\cdot\left(1 - \left(\frac{Lag-1}{10}\right)^{3}\right) \quad \text{for } Lag = 1, 2, \ldots, 10$$

• {ELRAY}, {DevLag}, t, c, and Sev are unknown parameters.
Page 32
Tweedie Model of Losses in Each (AY, Lag) Cell

$$\mu_{AY,Lag} = Premium_{AY}\cdot ELR_{AY}\cdot Dev_{Lag}\cdot t^{AY+Lag-1}, \qquad p = \frac{\alpha+2}{\alpha+1}$$

$$\phi_{AY,Lag}\cdot\mu_{AY,Lag}^{\,p} = \mu_{AY,Lag}\cdot\tau_{Lag}\cdot\left(1 + 1/\alpha\right) + c\cdot\mu_{AY,Lag}^{2}$$

• Pick a parameter set {ELRAY}, {DevLag}, t, c, Sev.
• Translate the parameters into the Tweedie parameters µAY,Lag, p, and φAY,Lag (a sketch follows).
• Calculate the likelihood, prior, and proposal density for the Metropolis-Hastings algorithm.
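A sketch of that translation for a 10×10 triangle, treating α as fixed and taking premium, ELR, and Dev as length-10 vectors; this mirrors the formulas above rather than any published implementation.

```r
# One reserving parameter set -> cell-level Tweedie parameters
cell_tweedie <- function(premium, ELR, Dev, t, c, Sev, alpha) {
  p   <- (alpha + 2) / (alpha + 1)
  mu  <- outer(1:10, 1:10, function(AY, Lag)
               premium[AY] * ELR[AY] * Dev[Lag] * t^(AY + Lag - 1))
  tau <- Sev * (1 - ((col(mu) - 1) / 10)^3)   # tau depends on Lag only
  phi <- (mu * tau * (1 + 1 / alpha) + c * mu^2) / mu^p
  list(mu = mu, p = p, phi = phi)             # 10x10 mu and phi, scalar p
}
```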
Page 33
Perspective on Loss Models
• 55 data points with 23 parameters.
• Efforts to formulate models with fewer parameters have been problematic.
• Don't fight having many parameters; figure out how to deal with them.
• Actuaries generally use models with several parameters, and temper their results with "judgment."
• Experience is gained by looking at data from "similar" lines of insurance and/or from other insurers.
• This calls (screams?) for a Bayesian approach.
Page 34
Sample from the Metropolis-Hastings Algorithm Applied to the {DevLag} and {ELRAY} Parameters
ELR1 ELR2 ELR3 ELR4 ELR5 ELR6 ELR7 ELR8 ELR9 ELR10
0.91503 0.66796 0.62778 0.58480 0.56635 0.67332 0.56119 0.68528 0.69505 0.70776
0.91503 0.66796 0.62778 0.58480 0.56635 0.67332 0.56119 0.68528 0.69505 0.70776
0.91503 0.66796 0.62778 0.58480 0.56635 0.67332 0.56119 0.68528 0.69505 0.70776
0.91503 0.66796 0.62778 0.58480 0.56635 0.67332 0.56119 0.68528 0.69505 0.70776
0.91503 0.66796 0.62778 0.58480 0.56635 0.67332 0.56119 0.68528 0.69505 0.70776
0.91503 0.66796 0.62778 0.58480 0.56635 0.67332 0.56119 0.68528 0.69505 0.70776
0.91503 0.66796 0.62778 0.58480 0.56635 0.67332 0.56119 0.68528 0.69505 0.70776
0.91503 0.66796 0.62778 0.58480 0.56635 0.67332 0.56119 0.68528 0.69505 0.70776
0.91503 0.66796 0.62778 0.58480 0.56635 0.67332 0.56119 0.68528 0.69505 0.70776
0.86193 0.63186 0.67501 0.57013 0.60554 0.64775 0.61769 0.74869 0.68954 0.68855
0.85805 0.62464 0.68672 0.55612 0.58922 0.63364 0.65857 0.70962 0.67289 0.64800
Dev1 Dev 2 Dev3 Dev4 Dev5 Dev6 Dev7 Dev8 Dev9 Dev10
0.16546 0.25163 0.22465 0.16499 0.10414 0.05589 0.02427 0.00762 0.00131 0.00005
0.16546 0.25163 0.22465 0.16499 0.10414 0.05589 0.02427 0.00762 0.00131 0.00005
0.16321 0.24844 0.22338 0.16574 0.10598 0.05781 0.02564 0.00827 0.00148 0.00006
0.16613 0.24962 0.22293 0.16463 0.10487 0.05701 0.02520 0.00811 0.00144 0.00006
0.16613 0.24962 0.22293 0.16463 0.10487 0.05701 0.02520 0.00811 0.00144 0.00006
0.16613 0.24962 0.22293 0.16463 0.10487 0.05701 0.02520 0.00811 0.00144 0.00006
0.16613 0.24962 0.22293 0.16463 0.10487 0.05701 0.02520 0.00811 0.00144 0.00006
0.16613 0.24962 0.22293 0.16463 0.10487 0.05701 0.02520 0.00811 0.00144 0.00006
0.15732 0.24804 0.22578 0.16815 0.10736 0.05822 0.02555 0.00810 0.00141 0.00006
0.15732 0.24804 0.22578 0.16815 0.10736 0.05822 0.02555 0.00810 0.00141 0.00006
Page 35
Graphical Representation of the Metropolis-Hastings Sample

Note that the posteriors are tighter, showing how the data narrows the range of results.

"Information Reduces Uncertainty" – Claude Shannon
Page 36
Graphical Representation of the Metropolis-Hastings Sample

Note that the posteriors are tighter, showing how the data narrows the range of results.

"Information Reduces Uncertainty" – Claude Shannon
Page 37
Graphical Representation of the Metropolis-Hastings Sample

Note that the posteriors are tighter, showing how the data narrows the range of results.

"Information Reduces Uncertainty" – Claude Shannon
Page 38
Graphical Representation of the Metropolis-Hastings Sample

Note that the posteriors are tighter, showing how the data narrows the range of results.

"Information Reduces Uncertainty" – Claude Shannon
Page 39
Statistics of Interest

Incremental Paid Losses

AY  Premium  Lag1    Lag2    Lag3    Lag4   Lag5   Lag6   Lag7   Lag8   Lag9   Lag10
1   29,701   5,234   5,172   3,708   1,783  923    537    175    145    8      0
2   27,526   5,234   5,683   4,392   2,134  1,377  673    155    81     47     X2,10
3   30,750   5,702   5,865   7,966   2,472  NA     143    152    73     X3,9   X3,10
4   35,814   6,349   4,611   3,959   2,522  1,924  622    206    X4,8   X4,9   X4,10
5   42,277   8,377   6,890   4,055   3,795  1,292  1,422  X5,7   X5,8   X5,9   X5,10
6   50,088   9,291   13,836  12,441  4,086  2,293  X6,6   X6,7   X6,8   X6,9   X6,10
7   56,921   12,029  12,462  8,369   7,034  X7,5   X7,6   X7,7   X7,8   X7,9   X7,10
8   61,406   13,119  12,618  9,117   X8,4   X8,5   X8,6   X8,7   X8,8   X8,9   X8,10
9   67,983   15,860  14,893  X9,3    X9,4   X9,5   X9,6   X9,7   X9,8   X9,9   X9,10
10  73,359   16,498  X10,2   X10,3   X10,4  X10,5  X10,6  X10,7  X10,8  X10,9  X10,10

"Range of Reasonable Estimates" – distribution of

$$Estimate \sim \sum_{AY=2}^{10}\ \sum_{Lag=12-AY}^{10} \mu_{AY,Lag}$$

Predictive Distribution of Reserve Outcomes – distribution of

$$Outcome \sim \sum_{AY=2}^{10}\ \sum_{Lag=12-AY}^{10} X_{AY,Lag}$$
Page 40
Calculating Ranges
• For a given parameter set Pn sampled from the Markov chain:
  – Calculate the mean µn,AY,Lag and dispersion φn,AY,Lag for each cell.
  – Simulate Xn,AY,Lag from a Tweedie(µn,AY,Lag, p, φn,AY,Lag) distribution (see the sketch below).
• Range of "Reasonable" Estimates – distribution of

$$Estimate_n \sim \sum_{AY=2}^{10}\ \sum_{Lag=12-AY}^{10} \mu_{n,AY,Lag}$$

• Predictive Distribution of Reserve Outcomes – distribution of

$$Outcome_n \sim \sum_{AY=2}^{10}\ \sum_{Lag=12-AY}^{10} X_{n,AY,Lag}$$
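Continuing the cell_tweedie() sketch from the Tweedie-model slide (all names hypothetical), each sampled parameter set yields one estimate and one simulated outcome:

```r
# One (estimate, outcome) pair per sampled parameter set P_n
reserve_draw <- function(parms, premium) {
  tw <- cell_tweedie(premium, parms$ELR, parms$Dev, parms$t,
                     parms$c, parms$Sev, parms$alpha)
  unknown  <- row(tw$mu) + col(tw$mu) >= 12     # the 45 future cells
  estimate <- sum(tw$mu[unknown])               # contribution to the estimate range
  outcome  <- sum(mapply(function(m, f) rtweedie(1, power = tw$p, mu = m, phi = f),
                         tw$mu[unknown], tw$phi[unknown]))
  c(estimate = estimate, outcome = outcome)
}
```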
Page 41
Range of Estimates and Outcomes
Page 42
Bayesian Sound Bite #1
By George Box – Sung to a Familiar Show Tune
• There's no theorem like Bayes' theorem, it's like no theorem we know.
• Everything about it is appealing.
• Everything about it is a wow.
• Let out all that a priori feeling, you've been concealing till now.
• Almost everybody enters into a statistical analysis with prior expectations and/or incentives.
• Bayesian analysis forces one to specify the prior distribution. It is more transparent.
Page 43
Bayesian Sound Bite #2
Relayed indirectly to me through Stuart Klugman
• To frequentist statisticians, models are real and data are random.
• To Bayesian statisticians, data are real and models are random.
Page 44
References
• "Quantifying Tail Risk with the Gibbs Sampler," Brainstorms column, Actuarial Review, February 2008. http://www.casact.org/newsletter/index.cfm?fa=Index&newsletter_id=1
• "Bayesian Analysis with the Metropolis-Hastings Algorithm," Brainstorms column, Actuarial Review, November 2009. http://www.casact.org/newsletter/index.cfm?fa=Index&newsletter_id=1
• "Stochastic Loss Reserving with the Collective Risk Model," Variance, February 2010. http://www.variancejournal.org/issues/?fa=article&abstrID=6606
• Albert, J., Bayesian Computation with R, New York: Springer, 2007.
• Lynch, S. M., Introduction to Applied Bayesian Statistics and Estimation for Social Scientists, New York: Springer, 2007.