Climbing Peaks and Crossing Valleys: Metropolis Coupling and Rugged Phylogenetic Distributions

Jeremy M. Brown, Robert C. Thomson

@jembrown www.phyleauxgenetics.org

Jan 23, 2017



Bayesian Inference Requires Integration

[Figure: probability density across tree/parameter space, with trees τ1 and τ2 marked]
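The integration can be written out explicitly. The notation here is assumed: τ is a tree, θ its continuous parameters, and D the data. The posterior probability of one tree requires integrating over θ and normalizing over all trees, which is rarely tractable. MCMC sidesteps this by estimating that probability as the fraction of recorded samples that contain the tree, where τ^(n) is the tree in the n-th of N samples:

```latex
p(\tau \mid D) = \int p(\tau, \theta \mid D)\, d\theta
  = \frac{\int p(D \mid \tau, \theta)\, p(\tau, \theta)\, d\theta}
         {\sum_{\tau'} \int p(D \mid \tau', \theta)\, p(\tau', \theta)\, d\theta}
\qquad
\hat{p}(\tau \mid D) = \frac{1}{N} \sum_{n=1}^{N} \mathbf{1}\!\left[\tau^{(n)} = \tau\right]
```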


Markov Chain Monte Carlo (MCMC)

[Figure: probability density across tree/parameter space]

1) Start somewhere.
2) Propose a new position.
3) Calculate the ratio (r) of the posterior density at the new state to that at the old state.
   - If r > 1, accept ("Yes!").
   - If r < 1, accept with probability r ("Maybe").
4) Record the state.
5) Repeat many times.
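The five steps above can be sketched as a random-walk Metropolis sampler on a one-dimensional stand-in for tree/parameter space. This is a minimal sketch and not code from the talk. It makes three assumptions:
- The proposal is a symmetric Gaussian step, so no Hastings correction is needed.
- Densities are handled on the log scale so that tiny posterior densities do not underflow.
- The names `metropolis` and `log_std_normal` are hypothetical.

```python
import math
import random


def metropolis(log_density, start, n_steps, step_size=0.5, seed=None):
    """Random-walk Metropolis sampler with symmetric Gaussian proposals.

    log_density: returns the log of an (unnormalized) target density.
    Returns the recorded state after every step, including rejected steps.
    """
    rng = random.Random(seed)
    x = start                                    # 1) start somewhere
    log_p = log_density(x)
    samples = []
    for _ in range(n_steps):
        proposal = x + rng.gauss(0.0, step_size)  # 2) propose a new position
        log_p_new = log_density(proposal)
        log_r = log_p_new - log_p                 # 3) log of r (new / old)
        # r >= 1: always accept; r < 1: accept with probability r
        if log_r >= 0 or rng.random() < math.exp(log_r):
            x, log_p = proposal, log_p_new
        samples.append(x)                         # 4) record the state
    return samples                                # 5) repeated n_steps times


def log_std_normal(x):
    """Unnormalized log density of a standard normal (constants cancel in r)."""
    return -0.5 * x * x
```

For example, `metropolis(log_std_normal, start=3.0, n_steps=20000, step_size=1.0, seed=1)` returns 20,000 states whose average, after discarding an initial burn-in, should sit close to 0.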



MCMC Has Trouble With Rugged Distributions

[Figure: probability density across tree/parameter space, with trees τ1 and τ2 marked]

[Figure: two panels, each showing probability density across tree/parameter space]



Bipartition Bayes Factors

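[Figure: five-taxon tree (A, B, C, D, E) containing the bipartition AB | CDE]

$$\text{Bayes factor} = \frac{\text{marginal likelihood with } AB \mid CDE \;(+)}{\text{marginal likelihood without } AB \mid CDE \;(-)}$$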
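As a worked version of this ratio, here is a minimal sketch. It assumes both marginal likelihoods have already been estimated, for example by stepping-stone sampling: one under a positive constraint that enforces AB | CDE and one under a negative constraint that forbids it. Marginal likelihoods are reported on the log scale, so the ratio becomes a difference. `bipartition_bayes_factor` is a hypothetical helper, and the numbers in the usage line are made up.

```python
import math


def bipartition_bayes_factor(log_ml_with, log_ml_without):
    """Bayes factor for a bipartition such as AB | CDE.

    log_ml_with:    log marginal likelihood with the bipartition enforced (+)
    log_ml_without: log marginal likelihood with the bipartition forbidden (-)
    Returns (ln BF, 2 ln BF, BF).
    """
    ln_bf = log_ml_with - log_ml_without   # log of a ratio = difference of logs
    # Guard against overflow when the log marginal likelihoods differ a lot.
    bf = math.exp(ln_bf) if ln_bf < 700 else math.inf
    return ln_bf, 2.0 * ln_bf, bf
```

With hypothetical values, `bipartition_bayes_factor(-1000.0, -1003.0)` gives ln BF = 3.0, 2 ln BF = 6.0, and BF ≈ 20.1 in favor of AB | CDE.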


Negative Constraints = Rugged Distributions


[Figure: three example trees for the same ten taxa: homo_sapiens, pantherophis_guttata, zebra_finch, anolis_carolinensis, gallus_gallus, alligator_mississippiensis, crocodylus_porosus, pelomedusa_subrufa, sphenodon_tuatara, chrysemys_picta]


Alternative Insertion Swaps are Difficult

[Figure: two alternative trees for the same ten taxa; both panels labeled "Data"]


The Po-Boy Problem

How do you change the seafood on your po-boy while someone’s holding the sandwich?

[Figure: a shrimp po-boy and an oyster po-boy]

Halves of French roll = naturally monophyletic taxa

Seafood = inserted taxon


Metropolis Coupling (MC3) Improves Mixing

[Figure: probability density across tree/parameter space, with a proposed swap between chains ("Swap?")]

Additional heated chains can act as “scouts”.
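The scouting idea can be sketched as Metropolis-coupled MCMC on a one-dimensional stand-in for tree/parameter space. This is a minimal sketch under assumptions the talk does not state:
- Chain k targets the posterior raised to the power 1/T_k.
- Every chain takes one random-walk Metropolis step per generation.
- One swap between a randomly chosen pair of chains is proposed per generation.
- Only the cold chain (T = 1) is recorded.

The function name `mc3` and its parameters are hypothetical.

```python
import math
import random


def mc3(log_density, temps, start, n_steps, step_size=0.1, seed=None):
    """Metropolis-coupled MCMC (MC3) on a one-dimensional target.

    log_density: log of the untempered (possibly unnormalized) target density.
    temps: chain temperatures; temps[0] must be 1.0 (the cold chain).
           Chain k targets p(x) ** (1 / temps[k]).
    Returns (cold-chain samples, fraction of proposed swaps accepted).
    """
    if temps[0] != 1.0:
        raise ValueError("temps[0] must be 1.0 (the cold chain)")
    rng = random.Random(seed)
    betas = [1.0 / t for t in temps]            # inverse temperatures
    xs = [start] * len(temps)                   # current state of each chain
    logps = [log_density(start)] * len(temps)   # untempered log density at each state
    cold_samples = []
    tried = accepted = 0
    for _ in range(n_steps):
        # One random-walk Metropolis step per chain on its heated density.
        for k, beta in enumerate(betas):
            prop = xs[k] + rng.gauss(0.0, step_size)
            lp = log_density(prop)
            log_r = beta * (lp - logps[k])
            if log_r >= 0 or rng.random() < math.exp(log_r):
                xs[k], logps[k] = prop, lp
        # Propose swapping the states of one random pair of chains:
        # log r = (1/T_i - 1/T_j) * (log p(x_j) - log p(x_i))
        if len(temps) > 1:
            i, j = rng.sample(range(len(temps)), 2)
            log_r = (betas[i] - betas[j]) * (logps[j] - logps[i])
            tried += 1
            if log_r >= 0 or rng.random() < math.exp(log_r):
                xs[i], xs[j] = xs[j], xs[i]
                logps[i], logps[j] = logps[j], logps[i]
                accepted += 1
        # Only the cold chain is recorded.
        cold_samples.append(xs[0])
    return cold_samples, accepted / tried if tried else 0.0
```

Called on a hypothetical two-peak target with `temps=[1.0, 2.0, 5.0, 10.0]`, the fraction of cold-chain samples on each peak should approach that peak's probability. With `temps=[1.0]`, the function reduces to a plain Metropolis sampler.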


Peaks All Found, But Different Probabilities?

[Figure: three alternative trees for the same ten taxa, each with estimated probabilities from Run 1 and Run 2 (values shown: 0.50, 0.25, 0.24, 0.38, 0.25, 0.24); trace plot of ln L against generation for Run 1 and Run 2]


A Closer Look at the Acceptance Ratio

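$$r = \frac{p_i(\tau_j, \theta_j \mid D)\, p_j(\tau_i, \theta_i \mid D)}{p_i(\tau_i, \theta_i \mid D)\, p_j(\tau_j, \theta_j \mid D)}$$

Here $p_i$ is the posterior density targeted by chain $i$, $(\tau_i, \theta_i)$ is the tree and parameter values chain $i$ currently holds, and $D$ is the data.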


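Does chain i like where chain j is? ($p_i(\tau_j, \theta_j \mid D) / p_i(\tau_i, \theta_i \mid D)$)

Does chain j like where chain i is? ($p_j(\tau_i, \theta_i \mid D) / p_j(\tau_j, \theta_j \mid D)$)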


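Because a chain at temperature $T$ samples a density proportional to $p(\tau, \theta \mid D)^{1/T}$, this simplifies to

$$r = \left( \frac{p(\tau_j, \theta_j \mid D)}{p(\tau_i, \theta_i \mid D)} \right)^{\frac{1}{T_i} - \frac{1}{T_j}}$$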
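The step between the two forms of r can be checked term by term. This assumes, as is standard for heated chains, that chain k's density is the posterior raised to 1/T_k times a normalizing constant. Each constant appears once in the numerator and once in the denominator, so it cancels:

```latex
\begin{aligned}
r &= \frac{p(\tau_j,\theta_j \mid D)^{1/T_i}\; p(\tau_i,\theta_i \mid D)^{1/T_j}}
          {p(\tau_i,\theta_i \mid D)^{1/T_i}\; p(\tau_j,\theta_j \mid D)^{1/T_j}} \\
  &= \left(\frac{p(\tau_j,\theta_j \mid D)}{p(\tau_i,\theta_i \mid D)}\right)^{1/T_i}
     \left(\frac{p(\tau_i,\theta_i \mid D)}{p(\tau_j,\theta_j \mid D)}\right)^{1/T_j} \\
  &= \left(\frac{p(\tau_j,\theta_j \mid D)}{p(\tau_i,\theta_i \mid D)}\right)^{\frac{1}{T_i}-\frac{1}{T_j}}
\end{aligned}
```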


When the temperatures are equal ($T_i = T_j$), the exponent is zero, so $r = 1$ and ALL swaps are accepted regardless of posterior density.
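The equal-temperature case is easy to check numerically. This minimal sketch assumes chain k's density is proportional to p(τ, θ | D)^(1/T_k). `swap_log_acceptance` and `swap_acceptance_probability` are hypothetical names. Working with log posterior densities avoids underflow, since phylogenetic posterior densities are tiny.

```python
import math


def swap_log_acceptance(log_post_i, log_post_j, temp_i, temp_j):
    """Log of the MC3 swap ratio r for exchanging the states of chains i and j.

    log_post_i, log_post_j: untempered log posterior densities log p(tau, theta | D)
    at the states currently held by chains i and j.
    Implements r = (p(state_j) / p(state_i)) ** (1/T_i - 1/T_j) in log space.
    """
    return (1.0 / temp_i - 1.0 / temp_j) * (log_post_j - log_post_i)


def swap_acceptance_probability(log_post_i, log_post_j, temp_i, temp_j):
    """Metropolis rule: accept the swap with probability min(1, r)."""
    log_r = swap_log_acceptance(log_post_i, log_post_j, temp_i, temp_j)
    return 1.0 if log_r >= 0 else math.exp(log_r)


# Equal temperatures: the exponent is zero, so the swap is always accepted,
# even when one state has a vastly lower posterior density than the other.
print(swap_acceptance_probability(-10.0, -5000.0, 2.0, 2.0))  # 1.0
```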


A Simple One-Parameter Example

[Figure: probability density (y axis, 0 to 5) against parameter value (x axis, 0 to 1) for a two-peaked distribution; peaks labeled 0.8 and 0.2]

https://github.com/jembrown/toyMC3/
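The talk's own toy code is in the repository linked above and is not reproduced here. As a hypothetical stand-in, the sketch below builds a two-peak density on [0, 1] whose peaks hold probability 0.8 and 0.2, assuming that is what the peak labels mean. The peak locations and widths are made up. The sketch checks the peak masses by trapezoid-rule integration and confirms that a deep valley separates the peaks.

```python
import math

# Hypothetical peaks: (weight, mean, standard deviation). Only the weights
# 0.8 and 0.2 come from the slide labels; locations and widths are assumed.
PEAKS = [(0.8, 0.25, 0.06), (0.2, 0.75, 0.06)]


def toy_density(x):
    """Two-component normal mixture, set to zero outside [0, 1]."""
    if not 0.0 <= x <= 1.0:
        return 0.0
    return sum(
        w * math.exp(-0.5 * ((x - mu) / sd) ** 2) / (sd * math.sqrt(2.0 * math.pi))
        for w, mu, sd in PEAKS
    )


def mass(lo, hi, n=10000):
    """Trapezoid-rule integral of toy_density over [lo, hi]."""
    h = (hi - lo) / n
    total = 0.5 * (toy_density(lo) + toy_density(hi))
    total += sum(toy_density(lo + k * h) for k in range(1, n))
    return total * h
```

Here `mass(0.0, 0.5)` comes out very close to 0.8 and `mass(0.5, 1.0)` very close to 0.2. The density at the valley (x = 0.5) is over a thousand times lower than at the taller peak, which is what makes the distribution rugged for a single chain.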


Max Temp > Number of Chains

[Figure: peak-one probability (y axis, 0 to 1) against maximum temperature (x axis, 2 to 10) for 5, 10, and 20 chains; inset: the one-parameter density with peaks labeled 0.8 and 0.2]
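This sketch shows why the maximum temperature and the number of chains are different knobs. It assumes a geometric temperature ladder; the talk does not say how its temperatures were spaced. `geometric_ladder` and `hottest_valley_ratio` are hypothetical helpers.

Adding chains only fills in temperatures between 1 and the maximum. How freely the hottest chain crosses a valley depends on the maximum temperature alone.

```python
def geometric_ladder(n_chains, max_temp):
    """Temperatures 1 = T_0 < ... < T_(n-1) = max_temp, evenly spaced on a log scale."""
    if n_chains < 1:
        raise ValueError("need at least one chain")
    if n_chains == 1:
        return [1.0]
    ratio = max_temp ** (1.0 / (n_chains - 1))
    return [ratio ** k for k in range(n_chains)]


def hottest_valley_ratio(valley_to_peak_density_ratio, ladder):
    """Relative density the hottest chain sees in a valley: (p_valley / p_peak) ** (1 / T_max).

    Values near 1 mean the valley is nearly flat for that chain; values near 0
    mean the valley is still a barrier.
    """
    return valley_to_peak_density_ratio ** (1.0 / ladder[-1])
```

For a valley 10,000 times less dense than the peak, a maximum temperature of 4 gives a ratio of 0.1 whether the ladder has 5 or 20 chains. Raising the maximum temperature to 10 raises the ratio to about 0.4.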


Peaks Have Different “Capture” Probabilities

[Figure: the one-parameter density (probability density against parameter value, 0 to 1), with peaks labeled P = 0.8 and P = 0.2]


Spurious Convergence by Chain Number

[Figure: the one-parameter density (probability density against parameter value, 0 to 1), with peaks labeled P = 0.8 and P = 0.2]

When two runs end up with the same distribution of poorly mixing chains across peaks, they will estimate nearly identical (but incorrect!) probabilities.
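The 2-chain case can be worked through under a simple binomial model, which is an assumption here: each chain is captured by peak one independently with probability 0.8 (the P = 0.8 label) and by peak two otherwise. This reproduces the 0.64 / 0.32 / 0.04 breakdown on the backup slide at the end of the deck. It also shows how often two independent runs land on exactly the same allocation of chains, and therefore on nearly the same (possibly wrong) estimate. The function names are hypothetical.

```python
from math import comb


def allocation_probabilities(n_chains, p_capture=0.8):
    """P(k of n chains are captured by peak one), for k = n, n-1, ..., 0.

    Assumes each chain is captured independently with probability p_capture.
    """
    return {
        k: comb(n_chains, k) * p_capture ** k * (1 - p_capture) ** (n_chains - k)
        for k in range(n_chains, -1, -1)
    }


def prob_runs_match(n_chains, p_capture=0.8):
    """Probability that two independent runs end up with the same allocation."""
    return sum(q * q for q in allocation_probabilities(n_chains, p_capture).values())
```

With 2 chains, two runs share an allocation with probability 0.64² + 0.32² + 0.04² = 0.5136. With 5 chains this drops to about 0.32.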


Lots of Chains Looks Like Convergence

[Figure: peak-one probability and its standard deviation (y axis, 0 to 1) against maximum temperature (x axis, 2 to 10) for 5, 10, and 20 chains; inset: the one-parameter density with peaks labeled 0.8 and 0.2]


[Figure: the one-parameter density with N (a large number of) chains: about 0.8 × N chains on peak one (P = 0.8) and 0.2 × N chains on peak two (P = 0.2)]

Law of Large Numbers
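Consider a hypothetical binomial capture model in which each of N chains lands on peak one independently with probability 0.8. Under it, the fraction of chains on peak one has mean 0.8 and standard deviation sqrt(0.8 × 0.2 / N). Close agreement between runs can therefore arise from the number of chains alone.

A minimal sketch follows; the function names are hypothetical, and the simulation only checks the formula.

```python
import math
import random


def captured_fraction_sd(n_chains, p_capture=0.8):
    """Binomial standard deviation of the fraction of chains on peak one."""
    return math.sqrt(p_capture * (1 - p_capture) / n_chains)


def simulate_fractions(n_chains, n_runs, p_capture=0.8, seed=None):
    """Fraction of chains captured by peak one in each of n_runs simulated runs."""
    rng = random.Random(seed)
    return [
        sum(rng.random() < p_capture for _ in range(n_chains)) / n_chains
        for _ in range(n_runs)
    ]
```

The formula gives about 0.18 for 5 chains, 0.13 for 10, and 0.09 for 20. Runs with 20 chains therefore disagree with each other only about half as much as runs with 5.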


Negative Constraint on Bird Monophyly

[Figure: tree for the ten taxa; estimated probability (y axis, 0 to 1) against maximum temperature (x axis, 0 to 3) for 2, 4, 8, 16, and 32 chains]


[Figure: tree for the ten taxa; estimated probability and its standard deviation (y axis, 0 to 1) against maximum temperature (x axis, 0 to 3) for 2, 4, 8, 16, and 32 chains]


Warnings

• Despite improving mixing, MC3 analyses still require careful thought.

• With small numbers of chains and small numbers of runs, estimated probabilities can be incorrect but identical across some runs.

• With large numbers of chains, estimated probabilities become increasingly similar across all runs.


Broad v Rugged Distributions

[Figure: probability density across tree/parameter space]


Recommendations

• For rugged distributions, increase the maximum chain temperature, not the number of chains.

• For broad distributions, increase the number of chains.

• Use more than two runs.
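The sketch below compares more than two runs, loosely in the spirit of the average standard deviation of split frequencies (ASDSF) reported by programs such as MrBayes. It is simplified: there is no minimum-frequency cutoff, and split labels such as "AB|CDE" are plain strings. It illustrates the recommendation. Two runs that happen to agree give a standard deviation of zero, while a third run that landed elsewhere exposes the disagreement. The function names and example frequencies are hypothetical.

```python
import math


def split_frequency_sd(run_frequencies):
    """Sample standard deviation across runs of each split's estimated frequency.

    run_frequencies: list of dicts, one per run, mapping a split label
    (e.g. "AB|CDE") to its estimated posterior frequency in that run.
    A split missing from a run is treated as having frequency 0 there.
    """
    if len(run_frequencies) < 2:
        raise ValueError("need at least two runs")
    splits = set().union(*run_frequencies)   # every split seen in any run
    result = {}
    for s in splits:
        vals = [run.get(s, 0.0) for run in run_frequencies]
        mean = sum(vals) / len(vals)
        var = sum((v - mean) ** 2 for v in vals) / (len(vals) - 1)
        result[s] = math.sqrt(var)
    return result


def average_sd(run_frequencies):
    """Mean of the per-split standard deviations (0.0 if no splits were seen)."""
    sds = split_frequency_sd(run_frequencies)
    return sum(sds.values()) / len(sds) if sds else 0.0
```

For example, two runs that both put AB|CDE at 0.8 give an average standard deviation of 0.0. Adding a third run at 0.2 raises it to about 0.35.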


Thank You

DEB-1355071, DEB-1354506

@jembrown

Michael Landis

Karen Cranston


Negative Constraints = Rugged Distributions

TreeScaper

Guifang Zhou (SSB symposium lightning talk) - Monday, 1:45-1:50 - Ballroom A "A network framework to explore phylogenetic structure in genome data"

Guifang Zhou (iEvoBio talk) - Tuesday, 2:05-2:12 - Meeting Room 9C "TreeScaper: Software to visualize and extract phylogenetic signals from sets of trees"

https://github.com/whuang08/TreeScaper


Spurious Convergence by Chain Number

[Figure: the one-parameter density (probability density against parameter value, 0 to 1), with peaks labeled P = 0.8 and P = 0.2]

2 chains:

Peak one | Peak two | Probability
2 chains | 0 chains | 0.64
1 chain | 1 chain | 0.32
0 chains | 2 chains | 0.04