Who is this man?
How sure are you?
SISMID University of Washington Bayesian Phylogenetics
The one ‘true’ tree?
Methods we’ve learned so far try to find a single tree that best describes the data. However, they do not search everywhere, and it is difficult to find the “best” tree. Many (gazillions of) trees may be almost as good.
Bayesian phylogenetics: general principle
Using Bayesian principles, we will search for and average over sets of plausible trees (weighted by their probability) instead of a single “best” tree. In this method, the “space” that you search is limited by prior information and the data.
The posterior distribution of trees naturally translates into probability statements (and uncertainty) on aspects of direct scientific interest
I When did an evolutionaryevent happen?
I Are a subset of sequencesmore closely related?
The cost: we must formalize our prior beliefs
Conditional probability: intuition
Philippe is a hipster.
Philippe is a hipster and rides a single-speed bike.
Which is more probable?
Conditional probability: intuition
Arbitrary events A (hipster) and B (bike) from sample space U
Bayes theorem
Definition of conditional probability in words:
probability(A and B) = probability(A given B) × probability(B)
In usual mathematical symbols:
p(A|B)p(B) = p(A,B) = p(B|A)p(A)
With a slight re-arrangement:
p(A|B) = p(B|A)p(A) / p(B)
“Just” a restatement of conditional probability
Bayes theorem
Integration (averaging) yields a marginal probability:
p(A) = ∫ p(A,B) dB = ∫ p(A|B)p(B) dB   (over all possible values of B)
probability(hipster) = probability(hipster and has bike) + probability(hipster and has no bike)
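The marginalization above and Bayes theorem can be checked with a few lines of arithmetic. The probabilities below are made-up numbers for the hipster/bike example; only the mechanics of the two formulas are illustrated:

```python
# Hypothetical numbers (not from the slides) for the hipster/bike example.
p_bike = 0.3                   # p(B): rides a single-speed bike
p_hipster_given_bike = 0.6     # p(A|B)
p_hipster_given_no_bike = 0.1  # p(A|not B)

# Marginalization: p(A) = p(A|B)p(B) + p(A|not B)p(not B)
p_hipster = (p_hipster_given_bike * p_bike
             + p_hipster_given_no_bike * (1 - p_bike))

# Bayes theorem: p(B|A) = p(A|B)p(B) / p(A)
p_bike_given_hipster = p_hipster_given_bike * p_bike / p_hipster

# Note p(A and B) = 0.18 <= p(A) = 0.25: "hipster" alone is always at
# least as probable as "hipster and rides a bike" (the earlier quiz).
print(p_hipster)             # 0.25
print(p_bike_given_hipster)  # 0.72
```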
Conditional probability: pop quiz
What do you know about Thomas Bayes? Bayes theorem?
Some discussion points: Favorite game? Best buddies?
Bayes theorem for statistical inference
Unknown quantity θ (model parameters, scientific hypotheses)
Prior p(θ): beliefs before observed data Y become available
Conditional probability p(Y|θ) of the data given fixed θ – also called the likelihood of Y
Posterior p(θ|Y) beliefs:
p(θ|Y) = p(Y|θ)p(θ) / p(Y)
p(θ) and p(Y|θ) – easy
p(Y) = ∫ p(Y|θ)p(θ) dθ – hard
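In a toy one-parameter model, the "hard" integral p(Y) can be approximated on a grid, which makes the easy/hard distinction concrete. A minimal sketch, assuming a binomial likelihood (7 successes in 10 trials) and a uniform prior; none of this setup comes from the slides:

```python
import math

# Toy model: theta = success probability, Y = 7 successes in 10 trials,
# uniform prior p(theta) = 1 on [0, 1].
k, n = 7, 10
grid = [i / 1000 for i in range(1, 1000)]  # grid over (0, 1)

def likelihood(theta):
    return math.comb(n, k) * theta**k * (1 - theta)**(n - k)

# p(Y) = integral of p(Y|theta) p(theta) dtheta, here a Riemann sum
p_Y = sum(likelihood(t) for t in grid) / len(grid)

# Posterior p(theta|Y) = p(Y|theta) p(theta) / p(Y), evaluated on the grid
posterior = [likelihood(t) / p_Y for t in grid]
post_mean = sum(t * p for t, p in zip(grid, posterior)) / len(grid)

# Analytic check: uniform prior + binomial likelihood gives a
# Beta(k+1, n-k+1) posterior with mean (k+1)/(n+2) = 8/12.
print(round(p_Y, 4))       # ~0.0909 (= 1/11 analytically)
print(round(post_mean, 3)) # ~0.667
```

With one parameter a grid works; with trees the "grid" would have to enumerate every topology, which is exactly why p(Y) becomes intractable.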
Bayesian phylogenetic inference
θ = (tree, substitution process)
p(Y|θ): a continuous-time Markov chain process that gives rise to the sequences at the tips of the tree
Posterior:
p(θ|Y) = p(Y|θ)p(θ) / p(Y)
Trouble: p(Y) is not computable – the sum runs over all possible trees
For N taxa there are G(N) = (2N − 3) × (2N − 5) × · · · × 1 trees
E.g., G(21) > 3 × 10^23
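The tree count G(N) = (2N − 3) × (2N − 5) × · · · × 1 is the double factorial (2N − 3)!!, and it is easy to compute exactly to see how fast it explodes:

```python
def num_trees(N):
    """Number of distinct labeled bifurcating trees for N taxa:
    G(N) = (2N-3) x (2N-5) x ... x 1, the double factorial (2N-3)!!."""
    count = 1
    for k in range(3, 2 * N - 2, 2):  # odd factors 3, 5, ..., 2N-3
        count *= k
    return count

print(num_trees(3))   # 3
print(num_trees(21))  # 319830986772877770815625, i.e. > 3 x 10^23
```

Even 21 taxa already gives more trees than could ever be enumerated, which is why the posterior must be explored stochastically rather than summed exactly.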
Priors
Strongest assumption: most parameters are separable, e.g. the tree is independent of the substitution process
Weaker assumption: tree ∼ Coalescent process
Weaker assumption: functional form on substitution parameters
Specialized priors exist as well
If worried: check sensitivity
Posterior inference
Numerical (Monte Carlo) integration as a solution:
Markov chain Monte Carlo
Metropolis et al. (1953) and Hastings (1970) proposed a stochastic integration algorithm that can explore vast parameter spaces.
The algorithm generates a Markov chain that visits parameter values (e.g., a specific tree) with frequency equal to their posterior density / probability.
Markov chain: a random walk where the next step depends only on the current parameter state.
Metropolis-Hastings Algorithm
Each step in the Markov chain starts at its current state θ and proposes a new state θ* from an arbitrary proposal distribution q(·|θ) (transition kernel). θ* becomes the new state of the chain with probability:
R = min{ 1, [p(θ*|Y) / p(θ|Y)] × [q(θ|θ*) / q(θ*|θ)] }
  = min{ 1, [p(Y|θ*)p(θ*) / p(Y)] / [p(Y|θ)p(θ) / p(Y)] × [q(θ|θ*) / q(θ*|θ)] }
  = min{ 1, [p(Y|θ*)p(θ*) / p(Y|θ)p(θ)] × [q(θ|θ*) / q(θ*|θ)] }
Otherwise, θ remains the state of the chain
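The algorithm fits in a few lines. A minimal sketch with a toy one-dimensional target (a standard normal standing in for a real posterior over trees) and a symmetric random-walk proposal, so the q ratio cancels; everything here is illustrative, not the phylogenetic implementation:

```python
import math, random

random.seed(1)

def log_posterior(theta):
    # Stand-in for log p(Y|theta) + log p(theta): a standard normal,
    # so the sampler's output can be checked against known answers.
    return -0.5 * theta**2

def metropolis(n_steps, step_size=1.0, theta0=5.0):
    theta = theta0
    samples = []
    for _ in range(n_steps):
        proposal = theta + random.uniform(-step_size, step_size)  # symmetric q
        # Symmetric proposal: q(theta|proposal) = q(proposal|theta), so R
        # reduces to the posterior ratio; note p(Y) cancels, as on the slide.
        log_R = log_posterior(proposal) - log_posterior(theta)
        if math.log(random.random()) < log_R:
            theta = proposal          # accept the move
        samples.append(theta)         # otherwise theta remains the state
    return samples

samples = metropolis(50_000)[5_000:]  # discard burn-in
mean = sum(samples) / len(samples)
var = sum((s - mean)**2 for s in samples) / len(samples)
print(round(mean, 2), round(var, 2))  # close to 0 and 1
```

Deliberately starting far from the mode (θ₀ = 5) shows why the early, burn-in portion of the chain is discarded before summarizing.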
Posterior sampling
We repeat the process of proposing a new state, calculating the acceptance probability, and either accepting or rejecting the proposed move millions of times
Although correlated, the Markov chain samples are valid draws from the posterior; however . . .
Initial sampling (burn-in) is often discarded due to correlation with the chain’s starting point (≠ posterior)
Transition Kernels
Often we propose changes to only a small # of dimensions in θ at a time (Metropolis-within-Gibbs)
In phylogenetics, mixing (correlation) in continuous dimensions is much better (smaller) than for the tree
So, the dominant approach has been keep-it-simple-stupid – alternatives exist and may become necessary:
I Gibbs sampler; slice sampler; Hamiltonian MC
Tree Transition Kernels
Posterior Summaries
For continuous θ, consider:
posterior mean or median ≈ MCMC sample average or median
quantitative measures of uncertainty, e.g. the highest posterior density interval
Credible Regions
[Figure: a posterior probability distribution over the parameter x, with the lower and upper 95% HPD limits marked.]
The Bayesian equivalent of a confidence interval is called the highest posterior density (HPD) credible region. This is the smallest region that contains 95% of the posterior probability.
For trees, consider:
scientifically interesting posterior probability statements, e.g. the probability of monophyly ≈ the MCMC sample proportion under which the hypothesis is true
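Both kinds of summary are simple functions of the MCMC draws. A sketch computing the mean, median, and a 95% HPD interval as the narrowest window containing 95% of the sorted samples; the draws below are simulated stand-ins for a real chain:

```python
import random, statistics

random.seed(2)
# Stand-in posterior draws; in practice these come from the MCMC chain.
draws = sorted(random.gauss(0.0, 1.0) for _ in range(20_000))

post_mean = statistics.fmean(draws)
post_median = statistics.median(draws)

def hpd_interval(sorted_draws, mass=0.95):
    """Narrowest interval containing `mass` of the samples: slide a
    window of fixed count over the sorted draws, keep the shortest."""
    n = len(sorted_draws)
    window = int(mass * n)
    widths = [(sorted_draws[i + window] - sorted_draws[i], i)
              for i in range(n - window)]
    width, i = min(widths)
    return sorted_draws[i], sorted_draws[i + window]

lo, hi = hpd_interval(draws)
print(round(post_mean, 2), round(post_median, 2))  # both near 0
print(round(lo, 2), round(hi, 2))  # roughly (-1.96, 1.96) for a standard normal
```

A posterior probability of monophyly works the same way: it is just the fraction of sampled trees in which the clade of interest appears.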
Posterior Probabilities
Summarizing Trees
MCMC Diagnostics: within a single chain
Visually inspect MCMC output traces
Measure autocorrelation within a chain: the effective sample size (ESS)
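The ESS deflates the chain length by its autocorrelation: ESS = N / (1 + 2 Σₖ ρₖ). A minimal sketch, truncating the autocorrelation sum once it dies out; the chains below are simulated so the answer can be checked:

```python
import random

random.seed(3)

def ess(x, max_lag=200):
    """Effective sample size: N / (1 + 2 * sum of autocorrelations),
    truncating the sum at the first near-zero autocorrelation."""
    n = len(x)
    mean = sum(x) / n
    var = sum((v - mean)**2 for v in x) / n
    acf_sum = 0.0
    for lag in range(1, max_lag):
        acf = sum((x[i] - mean) * (x[i + lag] - mean)
                  for i in range(n - lag)) / (n * var)
        if acf < 0.05:   # truncate once correlation has died out
            break
        acf_sum += acf
    return n / (1 + 2 * acf_sum)

# Independent draws: ESS close to the chain length
iid = [random.gauss(0, 1) for _ in range(5_000)]
# AR(1) chain with strong autocorrelation: ESS far smaller
ar1 = [0.0]
for _ in range(4_999):
    ar1.append(0.9 * ar1[-1] + random.gauss(0, 1))

print(round(ess(iid)))  # near 5000
print(round(ess(ar1)))  # only a few hundred
```

Production tools (e.g. Tracer, coda) use more careful truncation rules, but the principle is the same: a sticky chain of 5,000 steps may carry only a few hundred samples' worth of information.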
MCMC Diagnostics: across multiple chains
Visually inspect MCMC output traces
Comparing different chains → variance within and between chains
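The within/between-chain comparison is usually summarized by the Gelman–Rubin potential scale reduction factor. A simplified sketch of that statistic (not the exact formula any particular package uses), on simulated chains:

```python
import random, statistics

random.seed(4)

def r_hat(chains):
    """Simplified potential scale reduction factor: compares the
    between-chain variance B with the within-chain variance W."""
    m = len(chains)       # number of chains
    n = len(chains[0])    # draws per chain
    means = [statistics.fmean(c) for c in chains]
    W = statistics.fmean(statistics.variance(c) for c in chains)
    B = n * statistics.variance(means)
    var_hat = (n - 1) / n * W + B / n  # pooled variance estimate
    return (var_hat / W) ** 0.5

# Chains sampling the same distribution: R-hat close to 1
good = [[random.gauss(0, 1) for _ in range(2_000)] for _ in range(4)]
# One chain stuck in a different region: R-hat well above 1
bad = good[:3] + [[random.gauss(3, 1) for _ in range(2_000)]]

print(round(r_hat(good), 2))  # ~1.0
print(round(r_hat(bad), 2))   # clearly above 1
```

Values near 1 suggest the chains agree; values much above 1 mean at least one chain has not converged to the same distribution.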
Improving Mixing
(Only if convergence diagnostics suggest a problem)
Run the chain longer
Use a more parsimonious model (uninformative data)
Change tuning parameters of transition kernels to bring acceptance rates to 10% to 70%
Use different transition kernels (consult anexpert)
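The effect of the tuning parameter on acceptance rate is easy to see empirically. A sketch using a random-walk Metropolis step on a toy standard-normal target (an illustrative stand-in, not a phylogenetic kernel):

```python
import math, random

random.seed(5)

def acceptance_rate(step_size, n_steps=20_000):
    """Fraction of accepted Metropolis moves for a random-walk proposal
    on a toy standard-normal target."""
    theta, accepted = 0.0, 0
    for _ in range(n_steps):
        proposal = theta + random.uniform(-step_size, step_size)
        log_R = 0.5 * theta**2 - 0.5 * proposal**2   # log posterior ratio
        if math.log(random.random()) < log_R:
            theta = proposal
            accepted += 1
    return accepted / n_steps

rates = {step: acceptance_rate(step) for step in (0.1, 1.0, 50.0)}
for step, r in rates.items():
    print(step, round(r, 2))
# Tiny steps: nearly everything is accepted, but the chain crawls.
# Huge steps: almost everything is rejected. Tune toward the middle.
```

This is why acceptance rates near 0% or 100% both signal poor mixing: the chain is either rejecting every proposal or taking steps too small to explore.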
Why Bother being Bayesian?
In practice, we have almost no prior knowledge for the model parameters. So, why bother with Bayesian inference?
Analysis provides directly interpretable probability statements given the observed data
MCMC is a stochastic algorithm that (in theory) avoids getting trapped in local sub-optimal solutions
Search space under a Coalescent prior is astronomically “smaller”
By numerically integrating over all possible trees, we obtain marginal probability statements on hypotheses of scientific interest, e.g. specific branching events or population dynamics, avoiding bias