Introduction to Bayesian Data Analysis and Markov Chain Monte Carlo

Jeffrey S. Morris
University of Texas M.D. Anderson Cancer Center
Department of Biostatistics
[email protected]

September 20, 2002

Source: odin.mdacc.tmc.edu/~jmorris/talks_files/mcmc2.pdf

Abstract: The purpose of this talk is to give a brief overview of Bayesian inference and Markov Chain Monte Carlo methods, including the Gibbs sampler and the Metropolis-Hastings algorithm.


MCMC OVERVIEW 1

Outline

• Bayesian vs. Frequentist paradigm

• Bayesian Inference and MCMC

  – Gibbs Sampler
  – Metropolis-Hastings Algorithm

• Assessing Convergence of MCMC

• Hierarchical Model Example

• MCMC: Benefits and Cautions

MCMC OVERVIEW 2

Frequentist vs. Bayesian paradigms

• Data: X   Parameters: Θ

• To a frequentist:

  – The data X are random, and the parameters Θ are fixed.
  – (ML) Inference is performed by finding the Θ that maximizes f(X|Θ).
  – We cannot make probability statements about parameters; we can only make statements about the performance of estimators over repeated sampling (e.g., confidence intervals).

• To a Bayesian:

  – The current data X are fixed, and the unknown parameters Θ are random.
  – Inference is performed via the posterior distribution f(Θ|X).
  – We can make probability statements about parameters, since they are random quantities (e.g., credible intervals).

MCMC OVERVIEW 3

Bayes’ Rule

• The posterior distribution is computed by applying Bayes’ Rule:

  f(Θ|X) = f(X|Θ) f(Θ) / f(X)

• f(X|Θ) = Likelihood

• f(Θ) = Prior Distribution

  – Reflects prior knowledge about Θ
  – Sometimes controversial
  – If little information is available, use diffuse priors (but avoid improper priors)

• f(X) = Marginal Distribution = ∫ f(X|Θ) f(Θ) dΘ

  – Difficult to compute (usually an intractable integral)
  – Often not necessary to compute
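The role of each term can be checked numerically. The sketch below (an assumed example, not from the talk) uses a Binomial likelihood with X = 7 successes in n = 10 trials and a Beta(2, 2) prior, approximates the marginal f(X) by a Riemann sum over a grid of θ values, and confirms that dividing by it yields a proper posterior:

```python
import math

# Assumed setup (not from the talk): X = 7 successes in n = 10 trials,
# with a Beta(2, 2) prior on the success probability theta.
n, X = 10, 7
alpha, beta = 2.0, 2.0

# Discretize theta on a fine grid to approximate the integrals.
G = 10_000
grid = [(i + 0.5) / G for i in range(G)]

def likelihood(theta):
    # f(X | theta): Binomial pmf at the observed X
    return math.comb(n, X) * theta**X * (1 - theta)**(n - X)

def prior(theta):
    # f(theta): Beta(alpha, beta) density
    const = math.gamma(alpha + beta) / (math.gamma(alpha) * math.gamma(beta))
    return const * theta**(alpha - 1) * (1 - theta)**(beta - 1)

# f(X) = ∫ f(X|theta) f(theta) dtheta, approximated by a midpoint Riemann sum
marginal = sum(likelihood(t) * prior(t) for t in grid) / G

# Posterior on the grid: f(theta|X) = f(X|theta) f(theta) / f(X)
posterior = [likelihood(t) * prior(t) / marginal for t in grid]

# Dividing by f(X) makes the posterior integrate to 1, and its mean matches
# the conjugate Beta(alpha + X, beta + n - X) answer, (alpha + X)/(alpha + beta + n).
total = sum(posterior) / G
post_mean = sum(t * p for t, p in zip(grid, posterior)) / G
print(round(total, 4), round(post_mean, 4))
```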

MCMC OVERVIEW 4

Conjugate priors

• Conjugate priors: f(Θ) and f(Θ|X) have the same distributional form.

• Examples: Normal-Normal, Beta-Binomial, Gamma-Poisson

• Ex: (X|θ) ∼ Binomial(n, θ); θ ∼ Beta(α, β)

  f(θ|X) ∝ f(X|θ) f(θ)
         ∝ θ^X (1−θ)^(n−X) · θ^(α−1) (1−θ)^(β−1)
         = θ^(α+X−1) (1−θ)^(β+n−X−1)
         = kernel of Beta(α+X, β+n−X)

• For a single-parameter problem, conjugate priors allow closed-form posterior distributions.

• What if we don’t want to use conjugate priors?
  What if we have multiple parameters?
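The conjugate update derived above amounts to simple parameter arithmetic. A minimal sketch with assumed numbers (7 successes in 10 trials, Beta(2, 2) prior):

```python
def beta_binomial_update(alpha, beta, X, n):
    """Conjugate update from the slide: a Beta(alpha, beta) prior combined with
    a Binomial(n, theta) likelihood gives a Beta(alpha + X, beta + n - X) posterior."""
    return alpha + X, beta + n - X

# Assumed numbers: 7 successes in 10 trials, Beta(2, 2) prior.
a_post, b_post = beta_binomial_update(2, 2, 7, 10)

# The mean of a Beta(a, b) is a / (a + b); the posterior mean is a compromise
# between the prior mean (2/4 = 0.5) and the MLE (7/10 = 0.7).
post_mean = a_post / (a_post + b_post)
print(a_post, b_post, round(post_mean, 3))   # 9 5 0.643
```

The posterior mean sitting between the prior mean and the sample proportion is the usual shrinkage interpretation of a conjugate update.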

MCMC OVERVIEW 5

Non-conjugate Case

• Suppose we are interested in the posterior mean:

  E(Θ|X) = ∫ Θ f(Θ|X) dΘ
         = ∫ Θ f(X|Θ) f(Θ) dΘ / ∫ f(X|Θ) f(Θ) dΘ

• How do we compute this integral if it is intractable?

  – Numerical Integration (Quadrature): may not work if there are many parameters.
  – Monte Carlo integration
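For a single parameter, the quadrature option is straightforward. The sketch below (an assumed, non-conjugate setup: a Binomial likelihood with a truncated-normal-shaped prior on θ) computes the posterior mean as a ratio of Riemann sums; any normalizing constants cancel in the ratio:

```python
import math

# Assumed non-conjugate model (not from the talk): Binomial(n, theta)
# likelihood with a truncated-normal-shaped prior on theta in (0, 1).
n, X = 10, 7

def likelihood(theta):
    return theta**X * (1 - theta)**(n - X)      # f(X|theta) up to a constant

def prior(theta):
    return math.exp(-(theta - 0.5)**2 / 0.02)   # f(theta) up to a constant

# E(theta|X) = ∫ theta f(X|theta) f(theta) dtheta / ∫ f(X|theta) f(theta) dtheta;
# the constants cancel in the ratio, so unnormalized densities suffice.
G = 20_000
grid = [(i + 0.5) / G for i in range(G)]
num = sum(t * likelihood(t) * prior(t) for t in grid)
den = sum(likelihood(t) * prior(t) for t in grid)
post_mean = num / den
print(round(post_mean, 3))
```

The answer falls between the prior mean (0.5) and the MLE (0.7); in higher dimensions the grid grows exponentially, which is exactly why quadrature breaks down with many parameters.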

MCMC OVERVIEW 6

Markov Chain Monte Carlo: Monte Carlo Integration

• Monte Carlo integration: estimate integrals by randomly drawing samples from the required distribution.

  E(Θ|X) = ∫ Θ f(Θ|X) dΘ ≈ (1/n) ∑_{t=1}^{n} Θ_t,  where Θ_t ∼ f(Θ|X)

• We still need a method for drawing samples from the posterior distribution:

  – Rejection Sampling
  – Importance Sampling
  – Markov Chain
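Monte Carlo integration itself is just a sample average. In the sketch below the posterior is assumed to be a Beta(9, 5) distribution that we can sample from directly, so the estimate can be checked against the exact mean 9/14:

```python
import random

random.seed(1)

# Monte Carlo integration as on the slide: if we can draw Theta_t ~ f(Theta|X),
# the posterior mean is approximated by a simple sample average. Assumed
# posterior: Beta(9, 5), which we can sample directly via the standard library.
n_draws = 200_000
draws = [random.betavariate(9, 5) for _ in range(n_draws)]
mc_mean = sum(draws) / n_draws
print(round(mc_mean, 3))   # close to the exact mean 9/14 ≈ 0.643
```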

MCMC OVERVIEW 7

Markov Chain Monte Carlo: Markov Chains

• Markov Chain: a method to draw samples from a desired stationary distribution.

• Steps:

  1. Obtain starting values Θ_0.
  2. Sample Θ_1 from a suitably chosen transition kernel P(Θ_1|Θ_0).
  3. Repeat the second step n times to obtain the chain {Θ_0, Θ_1, …, Θ_n}.

• Theorems show that, under certain regularity conditions, the chain will converge to a particular stationary distribution after a suitable burn-in period.

• End result: a (correlated) sample from the stationary distribution.
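The convergence claim can be seen in a toy chain (an assumed example, not from the talk): an AR(1) process is a Markov chain whose stationary distribution is Normal with mean 0 and variance 1/(1 − ρ²), regardless of the starting value:

```python
import random

random.seed(2)

# Toy Markov chain (assumed example): X_{t+1} = rho * X_t + e_t with
# e_t ~ N(0, 1). Its stationary distribution is N(0, 1 / (1 - rho^2)).
rho = 0.5
x = 50.0                      # start deliberately far from stationarity
chain = [x]
for _ in range(100_000):
    x = rho * x + random.gauss(0.0, 1.0)
    chain.append(x)

# Discard a burn-in period; the remaining draws behave like a (correlated)
# sample from the stationary distribution, whatever the starting value was.
burn = 1_000
kept = chain[burn:]
m = sum(kept) / len(kept)
v = sum((z - m)**2 for z in kept) / len(kept)
print(round(m, 2), round(v, 2))   # near 0 and 1/(1 - 0.25) ≈ 1.33
```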

MCMC OVERVIEW 8

Markov Chain Monte Carlo

• Given a Markov Chain {Θ_0, Θ_1, …, Θ_n} with stationary distribution f(Θ|X) and burn-in m, we can estimate the posterior mean using Monte Carlo integration:

  E(Θ|X) ≈ 1/(n−m) ∑_{t=m+1}^{n} Θ_t.

• Other quantities can also be computed from the Markov Chain:

  – Standard errors
  – Quantiles
  – Density estimates

• The samples can be used to perform any Bayesian inference of interest.

• How do we generate the Markov Chain?

MCMC OVERVIEW 9

Gibbs Sampler

• Gibbs Sampler (Geman and Geman, 1984): the Markov transition kernel consists of drawing from the full conditional distributions.

• Suppose Θ = {θ_1, θ_2, …, θ_p}^T.

  The full conditional distribution for parameter i is f(θ_i | X, Θ_{−i}), which conditions on:

  – The data X
  – The values of all other parameters, Θ_{−i}

MCMC OVERVIEW 10

Gibbs Sampler

• Steps of Gibbs sampler:

1. Choose a set of starting values Θ(0).2. Generate (Θ(1)|Θ(0)) by sampling:

θ(1)1 from f(θ(1)

1 |X,Θ(0)−1)

θ(1)2 from f(θ(1)

2 |X,Θ(0)−2)

...θ

(1)p from f(θ(1)

p |X,Θ(0)−p)

3. Repeat step two to get chain of length n: {Θ(0),Θ(1), . . .Θ(n)}.4. Assuming convergence by iteration m, compute posterior mean,

quantiles, etc. using samples m through n.

• Many variations possible:

? Parameters to update each iteration, order of updating? ’Blocking’ parameters together, working with marginalized distributions

• If conjugate priors used for all parameters, full conditionals in closed form.

• What if we don’t have closed form distributions for full conditionals?
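As a toy illustration of the steps above (not from the talk), here is a minimal Gibbs sampler in Python for a bivariate normal target with correlation rho, where both full conditionals are known univariate normals:

```python
import random

def gibbs_bivariate_normal(rho=0.8, n=5000, burn_in=500, seed=1):
    """Gibbs sampler for (x, y) ~ bivariate normal with means 0, variances 1,
    correlation rho. Full conditionals: x|y ~ N(rho*y, 1-rho^2) and
    y|x ~ N(rho*x, 1-rho^2)."""
    random.seed(seed)
    x, y = 0.0, 0.0                      # step 1: starting values Theta^(0)
    sd = (1.0 - rho ** 2) ** 0.5
    samples = []
    for _ in range(n):                   # steps 2-3: cycle through components
        x = random.gauss(rho * y, sd)    # draw x from f(x | y)
        y = random.gauss(rho * x, sd)    # draw y from f(y | x)
        samples.append((x, y))
    return samples[burn_in:]             # step 4: discard burn-in
```

Posterior summaries are then just sample summaries of the retained draws, e.g. `sum(x for x, _ in draws) / len(draws)` for the posterior mean of x.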

MCMC OVERVIEW 11

Metropolis-Hastings Algorithm

• Metropolis-Hastings algorithm (Metropolis et al. 1953, Hastings 1970): a method for constructing a Markov chain for θ even when no closed-form expression for its distribution is available.

π(θ): kernel of the distribution of interest for θ; here, the full conditional f(θ_i^(t) | X, Θ^(t−1)_{−i}).

• Steps:

1. Get θ^(0) = starting value for θ.
2. Get θ* = proposed value for θ^(1), by sampling from a proposal density q(θ | X, θ^(0)).
3. Compute α(θ^(0), θ*) = min( 1, [π(θ*) q(θ^(0) | θ*)] / [π(θ^(0)) q(θ* | θ^(0))] ).
4. Generate u ∼ Uniform(0,1). If u < α, let θ^(1) = θ*; else let θ^(1) = θ^(0).

• Types of proposals: Random Walk, Independence, Symmetric
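A minimal random-walk Metropolis sketch in Python (an illustrative example with a standard normal target, not from the talk). With a symmetric random-walk proposal the q terms in step 3 cancel, so only the kernel ratio π(θ*)/π(θ^(0)) is needed; it is computed here on the log scale for numerical stability:

```python
import math
import random

def metropolis(log_kernel, theta0=0.0, n_iter=10000, step=1.0, seed=2):
    """Random-walk Metropolis: propose theta* = theta + Normal(0, step),
    accept with probability min(1, pi(theta*) / pi(theta))."""
    random.seed(seed)
    theta = theta0                                   # step 1: starting value
    chain, n_accept = [], 0
    for _ in range(n_iter):
        proposal = theta + random.gauss(0.0, step)   # step 2: propose theta*
        log_alpha = log_kernel(proposal) - log_kernel(theta)  # step 3
        if random.random() < math.exp(min(0.0, log_alpha)):  # step 4
            theta = proposal                         # accept the proposal
            n_accept += 1
        chain.append(theta)                          # else keep old value
    return chain, n_accept / n_iter

# Target: standard normal, with kernel pi(theta) proportional to exp(-theta^2 / 2)
chain, accept_rate = metropolis(lambda th: -0.5 * th * th)
```

Tuning the proposal scale `step` trades off acceptance rate against how far each accepted move travels; both extremes slow the chain's exploration.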

MCMC OVERVIEW 12

Assessing Convergence of Markov Chains

• The Markov chain is known to converge to the stationary distribution of interest, but how do I know when convergence has been achieved? That is, how do I decide how long the burn-in should be?

1. Look at time series plots for the parameters.
2. Run multiple chains with divergent starting values.
3. Run formal diagnostics (Gelman and Rubin 1992, Geweke 1992).

• Other issues:

? Length of chain
? Thinning to decrease autocorrelation

MCMC OVERVIEW 13

Example: Hierarchical Models

• Example: Growth curves for rats.
• Data Yij consist of weights for 30 rats over 5 weeks.

[Figure: "Rat Growth Model Data": weight (g), roughly 100 to 350, plotted against time (days), 10 to 35, one growth curve per rat.]

• Can estimate the mean growth curve by linear regression, but growth curve models are necessary to get the standard errors right.

MCMC OVERVIEW 14

Example: Hierarchical Models

• Model: Yij ∼ Normal(µij, τc)

µij = αi + βi(xj − x̄)

αi ∼ Normal(αc, τα)

βi ∼ Normal(βc, τβ)

• Model could be fit using a linear mixed model or a Bayesian hierarchical model.

• Priors (conjugate and vague):

αc, βc ∼ Normal(0, 10^−6)

τc, τα, τβ ∼ Gamma(0.001, 0.001)

(The second parameter of each Normal here is a precision, as in WinBUGS; 10^−6 is a very small precision, i.e. a very diffuse prior.)

• Gibbs sampler: since conjugate priors were used, the full conditionals are all available in closed form and can be derived with some algebra.

• WinBUGS: statistical software to perform MCMC in general problems.
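To make the conjugate-update idea concrete, here is a Gibbs sampler for a stripped-down one-level version of this setup (a hypothetical example, not the rat model itself): y_i ∼ Normal(µ, τ) with µ ∼ Normal(0, τ0) and τ ∼ Gamma(0.001, 0.001), using the precision parameterization as in WinBUGS. Both full conditionals are standard conjugate results:

```python
import random

def gibbs_normal_model(y, tau0=1e-6, a=0.001, b=0.001, n_iter=5000, seed=4):
    """Gibbs sampler for y_i ~ N(mu, precision tau), with priors
    mu ~ N(0, precision tau0) and tau ~ Gamma(a, b).
    Returns a list of (mu, tau) posterior draws."""
    random.seed(seed)
    n = len(y)
    mu, tau = 0.0, 1.0                        # starting values
    draws = []
    for _ in range(n_iter):
        # mu | y, tau ~ Normal with precision p = tau0 + n*tau
        p = tau0 + n * tau
        mu = random.gauss(tau * sum(y) / p, p ** -0.5)
        # tau | y, mu ~ Gamma(a + n/2, rate = b + sum((y_i - mu)^2)/2)
        ss = sum((yi - mu) ** 2 for yi in y)
        tau = random.gammavariate(a + n / 2, 1.0 / (b + ss / 2))
        draws.append((mu, tau))
    return draws
```

The rat model works the same way, just with more blocks: each (αi, βi) pair, the population means αc and βc, and the three precisions all get conjugate Normal or Gamma full-conditional updates in turn.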

MCMC OVERVIEW 15

Conclusions

• Why use MCMC?

? Flexible computing tool with the ability to fit complex models.
? No need to make simplified modeling assumptions out of convenience.
? Given posterior samples, can get all the benefits of Bayesian inference.

• Words of caution:

? Monitor convergence!
∗ Unfortunately, the most complex models tend to converge very slowly.
∗ Can try blocking and marginalization to decrease correlation of model parameters in the MCMC.
? Check whether your answers make sense: compare with plots and simple methods.
? Perform sensitivity analysis on the priors.

• Other book: Gelman, Carlin, Stern, and Rubin (1995), Bayesian Data Analysis.

MCMC OVERVIEW 16

References

Gelman A and Rubin DB (1992). Inference from iterative simulation using multiple sequences. Statistical Science 7, 457–472.

Geman S and Geman D (1984). Stochastic relaxation, Gibbs distributions, and the Bayesian restoration of images. IEEE Transactions on Pattern Analysis and Machine Intelligence 6, 721–741.

Geweke J (1992). Evaluation of accuracy of sampling-based approaches to the calculation of posterior moments. In Bayesian Statistics 4 (ed. JM Bernardo, J Berger, AP Dawid and AFM Smith), pp. 169–193. Oxford University Press.

Gilks WR, Richardson S, and Spiegelhalter DJ (1996). Markov Chain Monte Carlo in Practice. Chapman and Hall.

Hastings WK (1970). Monte Carlo sampling methods using Markov chains and their applications. Biometrika 57, 97–109.

Metropolis N, Rosenbluth AW, Rosenbluth MN, Teller AH and Teller E (1953). Equation of state calculations by fast computing machines. Journal of Chemical Physics 21, 1087–1091.