Applied Bayesian Methods
Phil Woodward
Phil Woodward 2014
Introduction to Bayesian Statistics
Inferences via Sampling Theory
• Inferences made via the sampling distribution of statistics
– A model with unknown parameters is assumed
– Statistics (functions of the data) are defined
– These statistics are in some way informative about the parameters
– For example, they may be unbiased, minimum variance estimators
• Probability is the frequency with which recurring events occur
– The recurring event is the statistic for fixed parameter values
– The probabilities arise by considering data other than actually seen
– Need to decide on the most appropriate “reference set”
– Confidence and p-values are p(data “or more extreme” | θ) calculations
• Difficulties when making inferences
– Nuisance parameters are an issue when no suitable sufficient statistics exist
– Constraints in the parameter space cause difficulties
– Confidence intervals and p-values are routinely misinterpreted
• They are not p(θ | data) calculations
How does Bayes add value?
• Informative Prior
– Natural approach for incorporating information already available
– Smaller, cheaper, quicker and more ethical studies
– More precise estimates and more reliable decisions
– Sometimes weakly informative priors can overcome model fitting failure
• Probability as a “degree of belief”
– Quantifies our uncertainty in any unknown quantity or event
– Answers questions of direct scientific interest
• P(state of world | data) rather than P(data* | state of world)
• Model building and making inferences
– Nuisance parameters no longer a “nuisance”
– Random effects, non-linear terms, complex models all handled better
– Functions of parameters estimated with ease
– Predictions and decision analysis follow naturally
– Transparency in assumptions
• Beauty in its simplicity!
– p(θ | x) = p(x | θ) p(θ) / p(x)
– Avoids the issue of identifying “best” estimators and their sampling properties
– More time spent addressing issues of direct scientific relevance
Probability
• Most Bayesians treat probability as a measure of belief
– Some believe probabilities can be objective (not discussed here)
– Probability is not restricted to recurring events
• E.g. the probability it will rain tomorrow is a Bayesian probability
– Probabilities lie between 0 (impossible event) and 1 (certain event)
– Probabilities between 0 and 1 can be calibrated via the “fair bet”
• What is a “fair bet”?
– Bookmaker sells a bet by stating the odds for or against an event
– Odds are set to encourage a punter to buy the bet
• E.g. odds of 2-to-1 against means that for each unit staked two are won, plus the stake
– A fair bet is when one is indifferent to being bookmaker or punter
• i.e. one doesn’t believe either side has an unfair advantage in the gamble
Probability
• Relationship between odds and probability
– One-to-one mapping between odds (O) and probability (P): P = O / (1 + O)
where O equals the ratio X/Y for odds of X-to-Y in favour and the ratio Y/X for odds of X-to-Y against an event
e.g. odds of 2-to-1 against, if fair, imply the probability equals ⅓
• Probabilities defined this way are inevitably subjective
– People with different knowledge may have different probabilities
– Controversy occurs when using this definition to interpret data
– Science should be “objective”, so “subjectivity” to some is heresy
– But where do the models that Frequentists use come from?
– Are the decisions made when designing studies purely objective?
– Is judgment needed when generalising from a sample to a population?
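The odds-to-probability mapping can be sketched in code. The helper below is a hypothetical illustration (not from the slides), covering both odds quoted in favour and against:

```python
def prob_from_odds(x, y, against=True):
    """Probability implied by fair odds of x-to-y against (or in favour of) an event.
    O is the odds in favour; P = O / (1 + O)."""
    o = y / x if against else x / y
    return o / (1 + o)

# Odds of 2-to-1 against, if fair, imply probability 1/3
p = prob_from_odds(2, 1, against=True)
```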
Probability
• Subjectivity does not mean biased, prejudiced or unscientific
– Large body of research into elicitation of personal probabilities
– Where a frequency interpretation applies, these should support beliefs
• E.g. the probability of the next roll of a die coming up a six should be ⅙ for everyone, unless you have good reason to doubt the die is fair
– An advantage of the Bayesian definition is that it allows all other information to be taken into account
• E.g. you may suspect the person offering a bet on the die roll is of dubious character
• Bayesians are better equipped to win at poker than Frequentists!
• All unknown quantities, including parameters, are considered random variables
– each parameter still has only one true value
– our uncertainty in this value is represented by a probability distribution
Epistemic uncertainty
Exchangeability
• Exchangeability is an important Bayesian concept
– exchangeable quantities cannot be partitioned into more similar sub-groups
– nor can they be ordered in a way that implies we can distinguish between them
– exchangeability is often used to justify prior distributions for parameters, analogous to classical random effects
The Bayesian Paradigm

From
Pr(A | B) = Pr(A, B) / Pr(B)
and
Pr(B | A) = Pr(A, B) / Pr(A)
comes Bayes Theorem
Pr(A | B) = Pr(A) Pr(B | A) / Pr(B)
Nothing controversial yet.
How is Bayes Theorem (mis)used?
Coin tossing study: Is the coin fair?
Model: ri ~ bern(π), i = 1, 2, ..., n
ri = 1 if the ith toss is a head, 0 if a tail
Let the terms in Bayes Theorem be
A = π (controversial)
B = r
then
The Bayesian Paradigm

p(π | r) = p(π) p(r | π) / p(r)

Why?
What are these terms?
p(r | π) is the likelihood = bin(n, Σr | π) (not controversial)
p(π) is the prior = ??? (controversial)
The prior formally represents our knowledge of π before observing r
The Bayesian Paradigm
What are these terms (continued)?
p(r) is the normalising constant = ∫ p(r | π) p(π) dπ (the difficult bit!)
p(π|r) is the posterior
The posterior formally represents our knowledge of π after observing r
The Bayesian Paradigm
MCMC to the rescue!
In general, not in this particular case
A worked example.
Coin tossed 5 times giving 4 heads and 1 tail
p(r | π) = bin(n=5, Σr=4 | π)
p(π) = beta(a, b); when a = b = 1 this is equivalent to U(0, 1)
Why choose a beta distribution?!
- conjugacy … posterior p(π | r) = beta(a + Σr, b + n − Σr)
- can represent vague belief?
- can be an objective reference?
- beta family is flexible (could be informative)
The Bayesian Paradigm
...but is a stronger prior justifiable?
What if data were 5 dogs in tox study:
4 OK, 1 with an AE?
A worked example (continued).
Applying Bayes theorem
p(π | r) = beta(5, 2)
95% credible interval
π : (0.36 to 0.96), i.e. Pr[π ∈ (0.36, 0.96) | Σr = 4] = 0.95
95% confidence interval
π : (0.28 to 0.995)
Pr[Σr ≥ 4 | π = 0.28] = 0.025, Pr[Σr ≤ 4 | π = 0.995] = 0.025
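The conjugate beta-binomial update and both intervals from this worked example can be reproduced numerically. A sketch using SciPy (assuming the slide's beta(1, 1) prior and 4 heads out of 5 tosses; the Clopper-Pearson beta quantiles are the standard exact confidence limits):

```python
from scipy import stats

# Conjugate update: beta(1, 1) prior, 4 heads out of 5 tosses
a, b, n, heads = 1, 1, 5, 4
post = stats.beta(a + heads, b + n - heads)   # beta(5, 2)

# 95% equal-tail credible interval for pi
cred_lo, cred_hi = post.ppf(0.025), post.ppf(0.975)   # roughly (0.36, 0.96)

# Exact (Clopper-Pearson) 95% confidence interval, for comparison
cp_lo = stats.beta.ppf(0.025, heads, n - heads + 1)   # roughly 0.28
cp_hi = stats.beta.ppf(0.975, heads + 1, n - heads)   # roughly 0.995
```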
The Bayesian Paradigm
Bayesian inference for simple Normal model
Clinical study: What’s the mean response to placebo?
Model: yi ~ N(µ, σ²), i = 1, 2, ..., n (placebo subjects only)
Assume σ known; for convenience will use the precision parameter τ = σ⁻² (reciprocal of the variance)
Terms in Bayes Theorem are
The Bayesian Paradigm
p(µ | y) = p(µ) p(y | µ) / p(y)
The Bayesian Paradigm
Improper prior density
The Bayesian Paradigm
Posterior precision equals the sum of the prior and data precisions
Posterior mean equals the precision-weighted mean of the prior and data
The Bayesian Paradigm
A worked example (continued).
Applying Bayes theorem
p(µ | y) = N(80, 0.5)
95% credible interval
µ : (78.6 to 81.4)
95% confidence interval
µ : (78.6 to 81.4)
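The precision-weighted update behind this result can be sketched in a few lines. The prior and data values below are hypothetical (the slide shows only the posterior); a near-vague prior with a data precision of 2 reproduces the N(80, 0.5) posterior:

```python
import math

def normal_posterior(prior_mean, prior_prec, data_mean, data_prec):
    """Known-sigma Normal model: precisions add, means are precision-weighted."""
    post_prec = prior_prec + data_prec
    post_mean = (prior_prec * prior_mean + data_prec * data_mean) / post_prec
    return post_mean, post_prec

# Hypothetical numbers: near-vague prior, sample mean 80 with data precision 2
m, p = normal_posterior(0.0, 1e-6, 80.0, 2.0)
sd = math.sqrt(1 / p)
lo, hi = m - 1.96 * sd, m + 1.96 * sd   # approx (78.6, 81.4), matching the slide
```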
The Bayesian Paradigm
Bayesian inference for simple Normal model
The case when both mean and variance are unknown
Model: yi ~ N(µ, σ²), i = 1, 2, ..., n
Terms in Bayes Theorem are
The Bayesian Paradigm
p(µ, τ | y) = p(µ, τ) p(y | µ, τ) / p(y)
The Bayesian Paradigm
The Bayesian Paradigm
Bayesian inference for Normal Linear Model
Model: y = Xθ + ε, εi ~ N(0, σ²), i = 1, 2, ..., n
y and ε are n × 1 vectors of observations and errors
X is an n × k matrix of known constants
θ is a k × 1 vector of unknown regression coefficients
Terms in Bayes Theorem are
The Bayesian Paradigm
p(θ, τ | y) = p(θ, τ) p(y | θ, τ) / p(y)
The Bayesian Paradigm
In summary, for the Normal Linear Model (“fixed effects”)
Classical confidence intervals can be interpreted as Bayesian credible intervals
But, need to be aware of the implicit prior distributions
Not generally the case for other error distributions
But for “large samples”, when the likelihood-based estimator has an approximate Normal distribution, a Bayesian interpretation can again be made
“Random effects” models are not so easily compared
Don’t assume classical results have a Bayesian interpretation
The Bayesian Paradigm
The Bayesian Paradigm
Conditional (on µ) distribution for future response
Posterior distribution for µ
The Bayesian Paradigm
Conditional: yf | µ ~ N(µ, σ²); posterior: µ | y ~ N(µ1, 1/τ1)
yf ~ N(µ1, 1/τ1 + 1/τ)
sum of the posterior variance of µ and the conditional variance of yf
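The "variances add" result for the predictive distribution can be verified by simulation: draw µ from its posterior, then yf given µ, and check the spread. The numbers below are hypothetical illustrations:

```python
import random

random.seed(1)

mu1, tau1 = 80.0, 2.0   # posterior for mu (hypothetical numbers)
tau = 1.0               # known sampling precision

# Simulate the predictive distribution: draw mu, then draw yf | mu
draws = []
for _ in range(200_000):
    mu = random.gauss(mu1, (1 / tau1) ** 0.5)
    draws.append(random.gauss(mu, (1 / tau) ** 0.5))

pred_var = sum((y - mu1) ** 2 for y in draws) / len(draws)
# pred_var should be close to 1/tau1 + 1/tau = 1.5
```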
Predictive Distributions
When are predictive distributions useful?
When designing studies
we predict the data using priors to assess the design
we may use informative priors to reduce study size, these being predictions from historical studies
When undertaking interim analyses
we can predict the remaining data using the current posterior
When checking the adequacy of our assumed model
model checking involves comparing observations with predictions
When making decisions after the study has completed
we can predict future trial data to assess the probability of success, helping to determine the best strategy or decide to stop
Some argue predictive inferences should be our main focus
be interested in observable rather than unobservable quantities
e.g. how many patients will do better on this drug?
The Bayesian Paradigm
“design priors” must be informative
The Bayesian Paradigm
δ is treatment effect
The Bayesian Paradigm
The Bayesian Paradigm
Making Decisions
A simple Bayesian approach defines criteria of the form
Pr(δ ≥ Δ) > π
where Δ is an effect size of interest, and π is the probability required to make a positive decision
For example, a Bayesian analogy to significance could be
Pr(δ > 0) > 0.95
But is believing δ > 0 enough for further investment?
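With MCMC output, a criterion like this is just a tail-area count over the posterior samples. A sketch with hypothetical samples (a Normal stand-in for draws that would come from MCMC):

```python
import random

random.seed(0)

# Hypothetical posterior samples for the treatment effect delta (e.g. from MCMC)
delta_samples = [random.gauss(1.2, 0.5) for _ in range(100_000)]

def pr_exceeds(samples, threshold):
    """Posterior probability Pr(delta >= Delta), estimated from samples."""
    return sum(d >= threshold for d in samples) / len(samples)

go = pr_exceeds(delta_samples, 0.0) > 0.95   # Bayesian analogy to significance
```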
The Bayesian Paradigm
END OF PART 1
intro to WinBUGS
illustrating fixed effect models
Bayesian Model Checking
Brief outline of some methods easy to use with MCMC
Consider three model checking objectives
1. Examination of individual observations
2. Global tests of goodness-of-fit
3. Comparison between competing models
In all cases we compare observed statistics with expectations, i.e. predictions conditional on a model
Bayesian Model Checking
Bayesian Model Checking
yi is the observation; Yi is the prediction
E(Yi) is the mean of the predictive distribution
Bayesian residuals can be examined as we do classical residuals
p-value concept
Ideally we would have a separate evaluation dataset
The predictive distribution for Yi is then independent of yi
Typically not available for clinical studies
Cross-validation is next best, but difficult within WinBUGS
The following methods use the data twice, so will be conservative, i.e. overstate how well the model fits the data
Will illustrate using WinBUGS code for simplest NLM
Bayesian Model Checking
Bayesian Model Checking(Examination of Individual Observations)
{
### Priors
mu ~ dnorm(0, 1.0E-6)
prec ~ dgamma(0.001, 0.001)
sigma <- pow(prec, -0.5)
### Likelihood
for (i in 1:N) { Y[i] ~ dnorm(mu, prec) }
### Model checking
for (i in 1:N) {
### Residuals and Standardised Residuals
resid[i] <- Y[i] - mu
st.resid[i] <- resid[i] / sigma
### Replicate data set & Prob observation is extreme
Y.rep[i] ~ dnorm(mu, prec)
Pr.big[i] <- step( Y[i] - Y.rep[i] )
Pr.small[i] <- step( Y.rep[i] - Y[i] )
}
}
each residual has a distributionuse the mean as the residual
Y.rep[i] is a prediction accounting for uncertainty in parameter values, but not in the type of model assumed
only need both when Y.rep[i] could exactly equal Y[i]
More typically, each Y[i] has different mean, mu[i].
mean of Pr.big[i] estimates the probability a future observation is this big
Identify a discrepancy measure
typically a function of the data
but could be a function of both data and parameters
Predict (replicate) values of this measure
conditional on the type of model assumed
but accounting for uncertainty in parameter values
Compute a “Bayesian p-value” for the observed discrepancy
similar approach used for individual observations
convention for global tests is to quote a “p-value”
Bayesian Model Checking(Global tests of goodness-of-fit)
e.g. a measure of skewness for testing this aspect of the Normal assumption
{
… code as before …
### Model checking
for (i in 1:N) {
### Residuals and Standardised Residuals
resid[i] <- Y[i] - mu
st.resid[i] <- resid[i] / sigma
m3[i] <- pow( st.resid[i], 3 )
### Replicate data set
Y.rep[i] ~ dnorm(mu, prec)
resid.rep[i] <- Y.rep[i] - mu
st.resid.rep[i] <- resid.rep[i] / sigma
m3.rep[i] <- pow( st.resid.rep[i], 3 )
}
skew <- mean( m3[] )
skew.rep <- mean( m3.rep[] )
p.skew.pos <- step( skew.rep - skew )
p.skew.neg <- step( skew - skew.rep )
}
Bayesian Model Checking(Global tests of goodness-of-fit)
p.skew is interpreted as for a classical p-value, i.e. small is evidence of a discrepancy
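The same posterior-predictive skewness check can be sketched outside WinBUGS. Everything below is a hypothetical stand-in (the data are simulated, and the posterior draws of mu and sigma are approximated rather than taken from MCMC), but the logic mirrors the WinBUGS code: compare the skewness of the data with the skewness of replicated data sets:

```python
import random
import statistics

random.seed(42)

# Hypothetical observed data; in practice (mu, sigma) draws come from MCMC
y = [random.gauss(10, 2) for _ in range(50)]
n_draws = 2000

def skewness(values, mu, sigma):
    # mean cubed standardised residual, as in the WinBUGS code
    return statistics.fmean(((v - mu) / sigma) ** 3 for v in values)

exceed = 0
for _ in range(n_draws):
    # stand-ins for posterior draws of (mu, sigma)
    mu = statistics.fmean(y) + random.gauss(0, 2 / len(y) ** 0.5)
    sigma = statistics.stdev(y)
    y_rep = [random.gauss(mu, sigma) for _ in y]   # replicate data set
    exceed += skewness(y_rep, mu, sigma) > skewness(y, mu, sigma)

p_skew = exceed / n_draws   # values near 0 or 1 flag a skewness misfit
```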
Bayes factors: ratio of marginal likelihoods under competing models
A Bayesian analogy to the classical likelihood ratio test
Bayesian Model Checking(Comparison between competing models)
not easy to implement using MCMC, so will not be discussed further
Bayesian Model Checking(Comparison between competing models)
Deviance Information Criterion (DIC)
a Bayesian “information criterion”, but not the BIC
will not discuss theory; focus on practical interpretation
WinBUGS & SAS can report this for most models
DIC is the sum of two separately interpretable quantities
DIC = Dbar + pD
Dbar : the posterior mean of the deviance
pD : the effective number of parameters in the model
pD = Dbar − Dhat
Dhat : the deviance point estimate using the posterior mean of θ
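Given MCMC output, both quantities are simple averages. A sketch with hypothetical deviance draws (the function is an illustration, not BUGS/SAS internals):

```python
def dic(deviance_samples, dev_at_posterior_mean):
    """DIC = Dbar + pD, with pD = Dbar - Dhat."""
    dbar = sum(deviance_samples) / len(deviance_samples)
    pd = dbar - dev_at_posterior_mean
    return dbar + pd, pd

# Hypothetical deviance draws from an MCMC run, and Dhat
dic_value, pd = dic([102.0, 98.0, 101.0, 99.0], dev_at_posterior_mean=98.0)
# Dbar = 100, pD = 2, DIC = 102
```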
Bayesian Model Checking(Comparison between competing models)
Deviance Information Criterion (DIC)
DIC = Dbar + pD
pD will differ from the total number of parameters
when posterior distributions are correlated
typically the case for “random effect parameters”
non-orthogonal designs, correlated covariates
common for non-linear models
pD will be smaller because some parameters’ effects “overlap”
Bayesian Model Checking(Comparison between competing models)
Deviance Information Criterion (DIC)
DIC = Dbar + pD
Measures a model’s ability to make short-term predictions
Smaller values of DIC indicate a better model
Rules of thumb for comparing models fitted to the same data
DIC difference > 10 is clear evidence of being better
DIC difference > 5 (< 10) is still strong evidence
There are still some unresolved issues with DIC
relatively early days in its use, so use other methods as well
Bayesian Model Checking(practical advice)
“All models are wrong, but some are useful”
if we keep looking, or have lots of data, we will find lack-of-fit
need to assess whether the model’s deficiencies matter
depends upon the inferences and decisions of interest
judge the model on whether it is fit for purpose
Sensitivity analyses are useful when uncertain
should assess sensitivity to both the likelihood and the prior
Model expansion may be necessary
Bayesian approach particularly good here
Informative priors and MCMC allow greater flexibility
e.g. replace Normal with t distribution
Introduction to BugsXLA
Parallel Group Clinical Study(Analysis of Covariance)
BugsXLA(case study 3.1)
Switch to Excel and demonstrate how BugsXLA facilitates rapid Bayesian model specification and analysis via WinBUGS.
BugsXLA(case study 3.1)
Settings used by WinBUGS
Suggested settings
Posterior distributions to be summarised
Posterior samples to be imported
Save WinBUGS files, create R scripts
BugsXLA(case study 3.1)
Fixed factor effects parameterised as contrasts from a zero constrained level
Default priors chosen to be “vague” (no guarantees!)
Priors for other parameter types
Bayesian model checking options
BugsXLA(case study 3.1)
BugsXLA uses generic names for parameters in WinBUGS code (deciphered on input!)
Recommend adding MC Error to input list
The Excel sheet used to display the results
BugsXLA(case study 3.1)
Generic names … deciphered
Posterior means, standard deviations & credible intervals
Could compute the ratio MC Error / St.Dev. using a cell formula
Reminder of model, prior and WinBUGS settings
BugsXLA(case study 3.1)
BugsXLA(case study 3.1)
BugsXLA interprets contents of cells to define predictions & contrasts to be estimated
In this case, predicted means for each level of factor TRT are defined
BugsXLA(case study 3.1)
BugsXLA(case study 3.1)
Recommend turning this off once you understand how it is parameterised
Can set own alerts to be used with model checking functions
Other default settings can be personalised, e.g. default priors
BugsXLA(Default Settings)
Fixed factor effects parameterisation can be changed to SAS (last level)
BugsXLA(Default Settings)
Obtaining Prior Distributions
Obtaining Prior Distributions
• Brief overview of main approaches*
• Further issues in the use of Priors*
* based on chapter 5 of Spiegelhalter et al (2004)
Obtaining Prior Distributions
• Misconceptions: They are not necessarily
– Prespecified
– Unique
– Known
– Influential
• Bayesian analysis
– Transforms prior into posterior beliefs
– Doesn’t produce the posterior distribution
– Context and audience important
– Sensitivity to alternative assumptions vital
• Prior could differ at design & analysis stage
– May want less controversial vague priors in analysis
– Design priors usually have to be informative
But prespecification is strongly recommended; the data must not influence the prior distribution
Obtaining Prior Distributions
• Five broad approaches– Elicitation of subjective opinion– Summarising past evidence– Default priors– Robust priors– Estimation using hierarchical models
Obtaining Prior Distributions
• Elicitation of subjective opinion
– Most useful when little ‘objective’ evidence
– Less controversial at the design stage
– Elicitation should be kept simple & interactive
– O’Hagan is a strong advocate
• Spiegelhalter et al do not recommend
– Prefer archetypal views; see Default Priors
Summarising Past Evidence.
Exchangeable: θ, θh ~ N(μ, τ²)
(a) τ = ∞, μ = K
(b) τ ~ dist.
(f) τ = 0, μ = θ
Typically, yh ~ N(θh, σh²)
(c) θh = θ + δh, δh ~ N(0, σδh²), i.e. θh ~ N(θ, σδh²)
Typically (b) adequate, maybe with more complexity. Meta-analytic-predictive
Obtaining Prior Distributions
• Default Priors
– Vague, a.k.a. non-informative or reference
• WinBUGS (general advice for ‘simple’ models):
– Location parms ~ Normal with huge variance
– Lowest level error variance ~ inv-gamma(small, small)
– Hierarchical error variances … controversial
sd ~ Uniform(0, big) or ~ Half-Normal(big); big < huge!
– Sceptical & Enthusiastic Priors
• Sceptical used to determine when success achieved
• Enthusiastic used to determine when to stop
• Sceptical prior centred on 0 with small prob. effect > Δ
• Enthusiastic prior centred on Δ with small prob. effect < 0
– ‘Lump-and-smear’ Priors
• Point mass at the null hypothesis
Parameter “big” can be derived via eliciting inferred quantities, e.g. credible differences between study means.
Might be appropriate for unprecedented mechanisms in ED stage.
Obtaining Prior Distributions
• Robust Priors
– We always assess model assumptions
– Bayesians assess prior assumptions also
– Use a ‘community of priors’
• Discrete set
• Parametric family
• Non-parametric family
– Interpretation section recommended in report
• Show how data affect a range of prior beliefs
Perhaps develop a range of priors appropriate in the typical case.
Example of a parametric family of priors
α is the discounting factor discussed previously (variant d: “equal but discounted”)
Not recommended, as there is no operational interpretation & no means of assessing suitable values for α
Obtaining Prior Distributions
• Hierarchical priors
– In the simplest case, the same as (b) Exchangeable
– ‘Borrow strength’ between studies
• counter view: ‘share weakness’
– Three essential ingredients
• Exchangeable parameters
• Form for the random-effects dist.
– Typically Normal, although t is perhaps more realistic
• Hyperprior for the parms of the random-effects dist.
– sd ~ Uniform(0, Max Credible) or Half-Normal(big) or Half-Cauchy(large)
Obtaining Prior Distributions
• Case Study
– Dental Pain Studies
– Informative prior for placebo mean
• Used in the formal analysis
– Meta-analytic-predictive approach
Title | Authors | Treatments | Placebo TOTPAR[6] Mean | SE
Characterization of rofecoxib as a cyclooxygenase-2 isoform inhibitor and demonstration of analgesia in the dental pain model | Elliot W. Ehrich et al. | Rofecoxib 50 and 500 mg; Ibuprofen 400 mg; Placebo | 3.01 | 0.51
Valdecoxib Is More Efficacious Than Rofecoxib in Relieving Pain Associated With Oral Surgery | Fricke J. et al. | Valdecoxib 40 mg; Rofecoxib 50 mg; Placebo | 3.01 | 0.76
Rofecoxib versus codeine/acetaminophen in postoperative dental pain: a double-blind, randomized, placebo- and active comparator-controlled clinical trial | Chang DJ et al. | Rofecoxib 50 mg; Codeine/Acetaminophen 60/600 mg; Placebo | 3.4 | 1.22
Analgesic Efficacy of Celecoxib in Postoperative Oral Surgery Pain: A Single-Dose, Two-Center, Randomized, Double-Blind, Active- and Placebo-Controlled Study | Raymond Cheung et al. | Celecoxib 400 mg; Ibuprofen 400 mg; Placebo | 3.7 | 0.75
Combination Oxycodone 5 mg/Ibuprofen 400 mg for the Treatment of Postoperative Pain: A Double-Blind, Placebo- and Active-Controlled Parallel-Group Study | Thomas Van Dyke et al. | Oxycodone/Ibuprofen 5 mg/400 mg; Ibuprofen 400 mg; Oxycodone 5 mg; Placebo | 4.2 | 0.83
Obtaining Prior Distributions
(part of table of prior studies considered relevant)
• Meta-analysis of historical data
– Published summary data
• Normal Linear Mixed Model
Yi = θi + ei
θi ~ N(µθ, ω²)
ei ~ N(0, SEi²)
Yi are the observed placebo means from each study
SEi are their associated standard errors
Obtaining Prior Distributions
{
for (i in 1:N) {
Y[i] ~ dnorm(theta[i], prec[i])
theta[i] ~ dnorm(mu.theta, tau.theta)
prec[i] <- pow(se[i], -2)
}
newtrial ~ dnorm(mu.theta, tau.theta)
mu.theta ~ dnorm(0, 1.0E-6)
tau.theta <- pow(omega, -2)
omega ~ dunif(0, 100)
}
Obtaining Prior Distributions
• WinBUGS used to determine the prior
• assumes study means are exchangeable
• but not responses from different studies
If the studies were smaller, the model should account for the fact that each se is estimated.
If few studies, might need a slightly more informative prior.
‘newtrial’ provides the prior for a future study.
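The 'newtrial' node is just a draw from N(mu.theta, omega²) at each MCMC iteration. The sketch below imitates this with hypothetical stand-ins for the posterior draws of mu.theta and omega (real values would come from the WinBUGS run), showing how the meta-analytic-predictive prior inflates the spread beyond the uncertainty in mu.theta alone:

```python
import random

random.seed(7)

# Stand-ins for MCMC draws of mu.theta and omega from the model above
mu_theta_draws = [random.gauss(3.5, 0.2) for _ in range(20_000)]
omega_draws = [abs(random.gauss(0.5, 0.1)) for _ in range(20_000)]

# Meta-analytic-predictive prior for a new study's placebo mean
newtrial = [random.gauss(m, w) for m, w in zip(mu_theta_draws, omega_draws)]

pred_mean = sum(newtrial) / len(newtrial)
pred_var = sum((x - pred_mean) ** 2 for x in newtrial) / len(newtrial)
# pred_var adds the between-study variance to the uncertainty in mu.theta
```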
Obtaining Prior Distributions
Gamma Distribution (or Inv-Gamma Dist.)
Particularly useful in Bayesian statistics
Conjugate for the Poisson mean
Marginal distribution for σ⁻² in NLMs
Chi-Sqr is a special case of the Gamma
ChiSqr(v) ≡ Gamma(v/2, 0.5)
If s² (v d.f.) is the ML estimate of a Normal variance
(and the conventional vague prior: p(σ²) ∝ σ⁻²)
Posterior p(σ² | s²) = v s² Inv-ChiSqr(v) = Inv-Gamma(v/2, v s²/2)
NOTE: more than one parameterisation of the Gamma Dist.
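The ChiSqr(v) ≡ Gamma(v/2, 0.5) identity is easy to check numerically, and also illustrates the parameterisation warning: the slide's Gamma(v/2, 0.5) uses a rate of 0.5, whereas SciPy's gamma takes a scale (= 1/rate):

```python
from scipy import stats

# ChiSqr(v) == Gamma(shape = v/2, rate = 1/2); SciPy uses scale = 1/rate = 2
v, x = 6, 3.7
chi2_pdf = stats.chi2.pdf(x, df=v)
gamma_pdf = stats.gamma.pdf(x, a=v / 2, scale=2.0)
# the two densities agree at every x
```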
Obtaining Prior Distributions
• Empirical criticism of priors
– George Box suggested a Bayesian p-value
• Prior predictive distribution for a future observation
• Compare the actual observation with the predictive dist.
• Calculate the prob. of observing more extreme
• Measure of conflict between prior and data
– But what should you do if conflict occurs?
• At least report this fact
• Greater emphasis on analysis with a vaguer prior
– Robust prior approach
• Formally model doubt using a mixture prior
e.g. observed placebo mean response
or heavy tailed e.g. t4 distribution
see model checking section
Obtaining Prior Distributions
• Key Points
– Subjectivity cannot be completely avoided
– A range of priors should be considered
– Elicited priors tend to be overly enthusiastic
– Historical data are the best basis for priors
– Archetypal priors provide a range of beliefs
– Default priors are not always ‘weak’
– Exchangeability is a strong assumption
• but with a hierarchical model plus covariates, the best option?
– Sensitivity analysis is very important
BugsXLA
Deriving and using informative prior distributions
Typically, the model is much simpler than this, e.g. placebo data only, no study level covariates, so only a random STUDY factor in the model
BugsXLA can model study level summary statistics (d.f. optional)
Predict placebo mean response in a future study
BugsXLA(case study 5.3, details not covered in this course)
BugsXLA(using informative prior distributions)
Back to Case Study 3.1
Will assume informative priors have been derived for:
Placebo mean response
Normal with mean 0 and standard deviation 0.04
Residual variance
Scaled Chi-Square with s² = 0.026 and df = 44
Switch back to Excel and show how to use this in BugsXLA
Import samples so prior and posterior can be compared.
Ignore this, unless you have R loaded and wish to explore in own time.
BugsXLA(case study 3.1, informative prior)
Informative priors for placebo mean and residual variance
Click ‘sigma’ then ‘Post Plots’ icon
Update Graph; can edit the histogram (‘user specified’)
Repeat for ‘Beta0’ (placebo) & ‘X.Eff[1,3]’ (TRT C)
BugsXLA(case study 3.1, informative prior)
BugsXLA(case study 3.1, informative prior)
Can obtain other posterior summaries
BugsXLA(case study 3.1, informative prior)
CAUTION
Although the prior for TRT:C is flat, the posterior is influenced by other priors
Bayesian Study Design
Consider a generic decision criterion of the form
GO decision if Pr(δ ≥ Δ) > π
δ is the treatment effect
Δ is an effect size of interest
π is the probability required to make a positive decision
As previously discussed, a Bayesian analogy to significance could be
Pr(δ > 0) > 0.95
Bayesian Study Design
Bayesian Study Design
Bayesian Study Design
Number of subjects
Operating Characteristics (OC)
Simple to calculate in any statistical software
e.g. for a 2 group PG or AB/BA XO design
R code
non-central t cdf:
1 - pt( qt(pi, df=df), df=df, ncp=(delta - DELTA)/(sigma*sqrt(2/N)) )
normal cdf:
1 - pnorm( qnorm(pi), mean=(delta - DELTA)/(sigma*sqrt(2/N)) )
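The same OC calculation translates directly to SciPy. The slide leaves the degrees of freedom unspecified; the sketch below assumes df = 2(N − 1) for a 2 group comparison, which is an assumption, not from the slides:

```python
import math
from scipy import stats

def pr_go_t(delta, DELTA, sigma, N, pi=0.95):
    """Pr(GO) via the non-central t, mirroring the R code; df assumed 2(N-1)."""
    df = 2 * (N - 1)
    ncp = (delta - DELTA) / (sigma * math.sqrt(2 / N))
    return 1 - stats.nct.cdf(stats.t.ppf(pi, df), df, ncp)

def pr_go_normal(delta, DELTA, sigma, N, pi=0.95):
    """Normal-approximation version, mirroring the pnorm line."""
    loc = (delta - DELTA) / (sigma * math.sqrt(2 / N))
    return 1 - stats.norm.cdf(stats.norm.ppf(pi), loc=loc)
```

When delta equals DELTA the criterion is met with probability 1 − pi, the design's "false GO" rate.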
Bayesian Study Design
Operating Characteristics (OC)
If we wanted to account for uncertainty in σ
Determine a Bayesian distribution for σ
e.g. σ ~ U(12, 18)
Use simulation to calculate the OC
1. Simulate a σ value from its distribution
2. For each value of δ and this σ value compute Pr(GO)
3. Repeat 1 & 2 10,000 times, say, and take the mean for each δ value
Warning
This unconditional Pr(GO) averages high and low probabilities
Is being under-powered more concerning than being over-powered?
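The three simulation steps above can be sketched as follows (the normal-approximation Pr(GO) is redefined here so the block stands alone; the σ ~ U(12, 18) distribution is the slide's example):

```python
import math
import random
from scipy import stats

random.seed(123)

def pr_go_normal(delta, DELTA, sigma, N, pi=0.95):
    loc = (delta - DELTA) / (sigma * math.sqrt(2 / N))
    return 1 - stats.norm.cdf(stats.norm.ppf(pi), loc=loc)

def unconditional_pr_go(delta, DELTA, N, n_sims=10_000, pi=0.95):
    """Average Pr(GO) over sigma ~ U(12, 18), as in steps 1-3 above."""
    total = sum(
        pr_go_normal(delta, DELTA, random.uniform(12, 18), N, pi)
        for _ in range(n_sims)
    )
    return total / n_sims

p = unconditional_pr_go(delta=10, DELTA=0, N=30)
```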
Bayesian Study Design
Bayesian Study Design
Prior: p(δ) = N(δ0, ω²), vague if ω ≈ ∞
Likelihood (sufficient statistic): p(d | δ) = N(δ, Vd)
e.g. 2 arm PG or AB/BA XO: Vd = 2σ²/N
Posterior: p(δ | d) = N(Mδ, Vδ)
Mδ = Vδ(δ0/ω² + d/Vd) — weighted average
1/Vδ = 1/ω² + 1/Vd — precisions are additive
Vague prior implies p(δ | d) = N(d, Vd) = “confidence dist.”
Bayesian Study Design(Bayesian NLM inference reminder)
Known σ
Posterior Distribution: p(δ | d) = N(Mδ, Vδ)
Conditional Distribution for d* from a future study: p(d* | δ) = N(δ, Vd*)
assuming studies are “exchangeable”; Vd* determined by the future study design
Predictive Distribution for d*: p(d* | d) = ∫ p(d* | δ) p(δ | d) dδ = N(M*, V*)
M* = Mδ
V* = Vδ + Vd*, the sum of the posterior and conditional variances
Bayesian Study Design(Bayesian NLM predictions reminder)
As yet unobserved
Can make predictions based on “prior beliefs”
Prior Distribution: p(δ) = N(δ0, ω²)
Conditional Distribution for d from the planned study: p(d | δ) = N(δ, Vd)
Predictive Distribution for d: p(d) = ∫ p(d | δ) p(δ) dδ = N(M0, V0)
M0 = δ0
V0 = ω² + Vd, the sum of the prior and conditional variances
Bayesian Study Design(Prior Predictive Distribution)
Unobserved at design stage
Design Prior
Classical power:
Let C denote the event “reject null hypothesis”
Power = Pr[C | δ], i.e. a conditional probability
More generally, C can be any decision criterion
Refer to the earlier OC calculations
Bayesian predictive probability:
Pr[C] = ∫ Pr[C | δ] p(δ) dδ
The “unconditional” probability of C occurring (although it is conditional on our prior beliefs)
“The probability, given our prior knowledge, that we will meet the decision criteria at the end of the study.”
Bayesian Study Design(Assurance)
expected/marginal poweror predictive probability
Consider the original GO decision
Pr[C | d] = Pr[d − tπ se(d) > Δ]
p(d) = N(δ0, ω² + Vd), the prior predictive distribution
Pr[C] = Pr[ N(δ0, ω² + Vd) > tπ se(d) + Δ ]
Pr[C] = Φ[ (δ0 − tπ se(d) − Δ) / (ω² + Vd)½ ]
where Φ[.] is the standard Normal cdf
with informative Design Prior
with vague Analysis Prior
What if vague Design Prior,i.e. ω very large?
Bayesian Study Design(Assurance)
Bayesian Study Design(Assurance)
Plot comparing classical (‘conditional power’) OC and assurance
For superiority, Δ = 0, and noting z = t for large d.f.
Pr[C] = Φ[ (δ0 − zπ se(d)) / (ω² + Vd)½ ]
same as Eq.3 in O’Hagan et al (2005)
For non-inferiority, Δ is negative
same as Eq.6 in O’Hagan et al (2005)
Bayesian Study Design(Assurance)
Can make predictions after an interim analysis
Let the estimate at the interim (n subjects) be d’
Let the estimate from part 2 (m subjects) be d*
p(d* | d’) = N(Mδ, Vδ + Vd*)
If a vague prior at study start (Design Prior):
Mδ = d’
Vδ = Vd’
For a 2 arm PG or AB/BA XO:
Vd* = 2σ²/m and Vd’ = 2σ²/n
Bayesian Study Design(Interim Analysis)
unobserved at interim stage
predictive distribution
Consider the original GO decision (with Vd = 2σ²/N)
Pr[C | d] = Pr[d − tπ se(d) > Δ]
This criterion, C, can be expressed in terms of d’ and d*:
(md* + nd’)/N − tπ σ(2/N)½ > Δ
At the interim stage d* is the only unknown
And so it is convenient to express C as
d* > (N½ tπ σ√2 + NΔ − nd’) / m
Classical “conditional power”, Pr[C | δ, d’]:
Pr[ N(δ, 2σ²/m) > (N½ tπ σ√2 + NΔ − nd’) / m ]
= 1 − Φ[ (N/m)½ tπ + (NΔ − nd’ − mδ) / (σ(2m)½) ]
which, for Δ = 0 & tπ = zπ, is the same as Eq.2 in Grieve (1991)
Bayesian Study Design(Interim Analysis with vague Design & Analysis Priors)
but Bayesians can do better than this!
Bayesian predictive probability, Pr[C | d’]:
Pr[ N(d’, 2σ²(n⁻¹ + m⁻¹)) > (N½ tπ σ√2 + NΔ − nd’) / m ]
= 1 − Φ[ (n/m)½ { tπ − N½(d’ − Δ) / (σ√2) } ]
which, for Δ = 0 & tπ = zπ, is the same as Eq.3 in Grieve (1991)
“The probability, given our knowledge at the interim, that we will meet the decision criteria at the end of the study.”
Interim futility / success criteria could be based on this probabilitye.g. futile if Pr[C | d’] < 0.2
success if Pr[C | d’] > 0.8
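This interim predictive probability can be coded directly from the closed form above. A large-sample sketch (z in place of tπ, vague design and analysis priors as assumed on this slide):

```python
import math
from scipy import stats

def predictive_prob(d_int, n, m, sigma, DELTA=0.0, pi=0.95):
    """Pr[C | d'] = 1 - Phi[(n/m)^0.5 * {t_pi - N^0.5 (d' - Delta) / (sigma*sqrt(2))}],
    with N = n + m and z used in place of t (large-sample sketch)."""
    N = n + m
    t_pi = stats.norm.ppf(pi)
    arg = math.sqrt(n / m) * (t_pi - math.sqrt(N) * (d_int - DELTA) / (sigma * math.sqrt(2)))
    return 1 - stats.norm.cdf(arg)

# Example interim rule: futile if below 0.2, success if above 0.8
p = predictive_prob(d_int=2.0, n=25, m=25, sigma=10.0)
futile, success = p < 0.2, p > 0.8
```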
Bayesian Study Design(Interim Analysis with vague Design & Analysis Priors)
Bayesian Study Design(Interim analysis predictive probability)
Plot comparing ‘conditional power’ and predictive probability following an interim analysis (25/grp), vague prior distribution
The only differences to the analysis done prior to study start are:
1) OC curve conditional on both delta and the interim data
2) ‘Belief distribution’ for delta updated using the interim data
Prior or Posterior depends on one’s perspective (‘Belief Distribution’)
could use an informative design prior, updated using interim data …
If we allow an informative design prior at study start:
  p(δ) = N(δ0, ω²)
  p(d* | d') = N(Mδ, Vδ + 2σ²/m)
with
  Mδ = Vδ ( δ0/ω² + n d'/(2σ²) )
  1/Vδ = 1/ω² + n/(2σ²)

Bayesian predictive probability, Pr[C | d']:
  Pr[ N(Mδ, Vδ + 2σ²/m) > ( N½ tπ √2 σ + NΔ − n d' ) / m ]
  = 1 − Φ[ ( N½ tπ √2 σ − n d' − m Mδ + NΔ ) / ( m (Vδ + 2σ²/m)½ ) ]
still with a vague analysis prior.
Bayesian Study Design (Interim Analysis with informative Design Prior)
refer back to Bayesian NLM reminders
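The informative-design-prior version can be sketched the same way (Python; function name and the zπ = 1.645 default are illustrative). With a very vague ω it should agree with the vague-prior formula of Grieve (1991):

```python
from math import erf, sqrt

def phi(x):
    # Standard normal CDF via the error function
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

def predictive_probability_informative(d_interim, n, m, sigma,
                                       delta0, omega, Delta=0.0, z_pi=1.645):
    """Pr[C | d'] when the design prior is p(delta) = N(delta0, omega^2),
    still with a vague analysis prior (z approximation to t)."""
    N = n + m
    # Precision-weighted update of the design prior using the interim data
    V_delta = 1.0 / (1.0 / omega**2 + n / (2.0 * sigma**2))
    M_delta = V_delta * (delta0 / omega**2 + n * d_interim / (2.0 * sigma**2))
    # Standardise the GO threshold against the predictive distribution of d*
    threshold = (sqrt(N) * z_pi * sqrt(2.0) * sigma + N * Delta - n * d_interim) / m
    return 1.0 - phi((threshold - M_delta) / sqrt(V_delta + 2.0 * sigma**2 / m))
```

An optimistic informative design prior (large δ0, small ω) pulls the predictive probability up relative to the vague-prior answer, as one would expect.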
Typically, we only have an informative prior for the placebo response.

Notation (in addition to that used previously):
  γ is the placebo true mean response
  p(γ) = N(γ0, ψ²), the informative prior for γ
  nA, nP are the numbers of subjects receiving active and placebo

The effective number of subjects this prior contributes is
  nγ = σ² / ψ²
which may be intuitive by re-expressing it as
  ψ² = σ² / nγ
Bayesian Study Design (using informative prior to reduce sample size)
… but is our intuition correct?
If the prior for the treatment effect, δ, is vague, then the posterior is
  p(δ | data) = N( ȳA − γ̃ , σ²/nA + σ²/(nP + nγ) )
with
  γ̃ = (nP ȳP + nγ γ0) / (nP + nγ)

It can be seen that the informative prior is equivalent to nγ additional placebo subjects with a sample mean of γ0.
Bayesian Study Design (using informative prior to reduce sample size)
… left as an exercise to prove
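The exercise can at least be checked numerically: update the placebo mean by conjugate Bayes, then again by appending nγ pseudo-subjects with mean γ0, and confirm the posteriors agree. A minimal Python sketch (all data values illustrative):

```python
def posterior_gamma_bayes(ybar_P, n_P, gamma0, psi, sigma):
    """Conjugate update for the placebo mean gamma with prior N(gamma0, psi^2)
    and known residual variance sigma^2 (precision-weighted average)."""
    prior_prec = 1.0 / psi**2
    data_prec = n_P / sigma**2
    post_var = 1.0 / (prior_prec + data_prec)
    post_mean = post_var * (gamma0 * prior_prec + ybar_P * data_prec)
    return post_mean, post_var

def posterior_gamma_augmented(ybar_P, n_P, gamma0, n_gamma, sigma):
    """The same posterior viewed as n_gamma pseudo placebo subjects with
    sample mean gamma0 appended to the observed data (vague prior)."""
    n_aug = n_P + n_gamma
    return (n_P * ybar_P + n_gamma * gamma0) / n_aug, sigma**2 / n_aug
```

Setting ψ² = σ²/nγ makes the two routes algebraically identical, which is the claimed equivalence.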
Worked example
Suppose the predictive distribution (placebo prior) is
  p(γ) ~ N(18, 12²)

Forecast residual standard deviation (obtained in the usual way, not shown here):
  σ = 70

Effective N of the placebo prior:
  Eff.N = (70 / 12)² ≈ 34

Design the study in the usual way, ignoring the informative prior; then reduce the placebo arm by 34 subjects and have the same power / precision.
Bayesian Study Design (using informative prior to reduce sample size)
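The worked example is a one-liner to verify (the function name is mine):

```python
def effective_n(sigma, psi):
    """Effective number of placebo subjects contributed by the prior:
    n_gamma = sigma^2 / psi^2."""
    return (sigma / psi) ** 2

print(round(effective_n(70.0, 12.0)))  # 34, matching the slide
```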
Unless there are no doubts at all, use a Robust Prior, i.e. a mixture of informative and vague prior distributions:
  p(placebo mean) ~ 0.9 × N(18, 12²) + 0.1 × N(18, 120²)

This represents a 10% chance that the meta-data are not exchangeable, in which case the analysis will effectively revert to the vague prior (it can also be thought of as a heavy-tailed distribution).

Also compute a Bayesian p-value of data-prior compatibility:
  Pr( "> observed mean" | prior ~ N(18, 12²) )
Note: the predictive distribution for the observed mean is ~ N(18, 12² + σ²/nP).
Bayesian Study Design (using informative prior to reduce sample size)
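The data-prior compatibility p-value uses the prior predictive distribution of the observed placebo mean. A sketch (the observed mean, nP and function name are illustrative):

```python
from math import erf, sqrt

def phi(x):
    # Standard normal CDF via the error function
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

def prior_data_pvalue(obs_mean, gamma0, psi, sigma, n_P):
    """Bayesian p-value of data-prior compatibility:
    Pr('> observed mean' | prior), using the prior predictive distribution
    of the observed placebo mean, N(gamma0, psi^2 + sigma^2/n_P)."""
    pred_sd = sqrt(psi**2 + sigma**2 / n_P)
    return 1.0 - phi((obs_mean - gamma0) / pred_sd)
```

A very small (or very large) p-value flags that the observed placebo mean sits in the tail of the prior predictive, i.e. the historical data may not be exchangeable with the new study.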
Bayesian Emax Model
dose/concentration response model
The Emax model is often used for dose-response data, and is even more common for concentration-response data; in a biological (non-clinical) context it is known as the logistic or sigmoidal curve.

More generally, it could be used to model a monotonic relationship between response and covariate:
  initially the response changes very slowly with the covariate,
  then the response changes much more rapidly,
  finally the response slows again as a plateau is reached.
Bayesian Emax Model
Bayesian Emax Model

λ is sometimes referred to as the 'Hill slope'.
The curve is approximately linear on the log-scale between ED20 and ED80.
When λ = 1, an ~80-fold range is needed to cover ED10 to ED90.
Convergence issues are common with MLE of Emax models
Bayesian Emax Model
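The "~80 fold" claim follows from inverting the Emax curve; a short check (the function name is mine):

```python
def ed(p, ed50, hill=1.0):
    """Dose giving p% of the maximal (Emax) effect. Solving
    Emax*D^h / (D^h + ED50^h) = (p/100)*Emax for D gives
    ED_p = ED50 * (p/(100-p))^(1/hill)."""
    return ed50 * (p / (100.0 - p)) ** (1.0 / hill)

fold = ed(90, ed50=1.0) / ed(10, ed50=1.0)
print(round(fold))  # 81: the ~80-fold range quoted for lambda = 1
```

A steeper curve (larger λ) compresses the range: for λ = 2 the ED10-to-ED90 span is only 9-fold.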
[Figure: two fitted Emax curves, response (1-7) vs concentration (1-1000 ug/mL, log scale); left panel: Hill coefficient = 1, right panel: Hill coefficient not restrained]
Most clinical data are more variable than this, with a smaller dose range and no data on the upper asymptote; classical fitting algorithms can fail to provide any solution.
Prior distributions are required for all parameters:
E0 : placebo (negative control) response
  utilise historical data as discussed earlier
Emax : maximum possible effect relative to E0
  typically vague, similar approach to the treatment effect prior
ED50 : dose that gives 50% of the Emax effect
  could be weakly informative, based on the same information used to choose the dose range
λ : determines the gradient of the dose response
  typically needs to be very informative; clinical data rarely provide much information regarding λ
Bayesian Emax Model
e.g. for ED50: log-normal centred on a mid/low dose (GM), 90% CI (0.1, 10) × GM
e.g. for λ: log-normal centred on 1, 90% CI (0.5, 2)
{
### Priors
  prec ~ dgamma( 0.001, 0.001 )
  sigma <- pow(prec, -0.5)
  E0 ~ dnorm( 0, 1.0E-6 )    #... but could be informative for placebo mean response
  Emax ~ dnorm( 0, 1.0E-6 )  #... typically vague
  log.ED50 ~ dnorm( ???, 0.51 )  # precision 0.51 = 1/1.4^2, i.e. sd 1.4 on the log scale
  ED50 <- exp( log.ED50 )    #... gives 90% CI of (0.1, 10) x exp(???)
  log.Hill ~ dnorm( 0, 5.67 )    # precision 5.67 = 1/0.42^2, i.e. sd 0.42 on the log scale
  Hill <- exp( log.Hill )    #... gives 90% CI of (0.5, 2)

### Likelihood
  for (i in 1:N) {
    Y[i] ~ dnorm(mu[i], prec)
    mu[i] <- E0 + (Emax * pow( X[i], Hill )) / ( pow( X[i], Hill ) + pow( ED50, Hill ) )
  }
}
Bayesian Emax Model (WinBUGS code)
Priors should be checked for appropriateness in each particular case.
{ … code as before …

### Quantities of potential interest
  for (i in 1:N.doses) {
    ### effect over placebo for pre-specified doses (values entered as data in node DOSE)
    effect[i] <- (Emax * pow( DOSE[i], Hill )) / ( pow( DOSE[i], Hill ) + pow( ED50, Hill ) )

    ### predicted mean response for pre-specified doses
    DOSE.mean[i] <- effect[i] + E0

    ### probability of exceeding pre-specified effect of size ??? (mean of node DOSE.PrEffBig)
    DOSE.PrEffBig[i] <- step( effect[i] - ??? )
  }

### estimate dose giving effect of size ???
  DOSE.BigEff.0 <- ED50 * pow( ???, 1/Hill ) / pow( Emax - ???, 1/Hill )
  # set to LARGE dose (LARGE pre-specified) if ??? > Emax
  DOSE.BigEff <- DOSE.BigEff.0 * step( Emax - ??? ) + LARGE * step( ??? - Emax )
}
Bayesian Emax Model (WinBUGS code)
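The derived quantity DOSE.BigEff inverts the Emax curve for a target effect. A Python check of that algebra (function names are mine; ??? and LARGE in the WinBUGS code are stand-ins for pre-specified values):

```python
def emax_effect(dose, emax, ed50, hill=1.0):
    """Effect over placebo at a given dose under the Emax model."""
    return emax * dose**hill / (dose**hill + ed50**hill)

def dose_for_effect(target, emax, ed50, hill=1.0, large=1e6):
    """Dose giving a pre-specified effect over placebo, inverting the Emax
    model: ED50 * (target/(emax - target))^(1/hill). Returns 'large' when
    the target is not achievable (target >= emax), mirroring the step()
    construction in the WinBUGS code."""
    if target >= emax:
        return large
    return ed50 * (target / (emax - target)) ** (1.0 / hill)
```

Plugging the returned dose back into the Emax curve should reproduce the target effect exactly; a target of half Emax returns the ED50.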
BugsXLA
Emax models: Pharmacology Biomarker Experiment
BugsXLA (case study 7.1)
BugsXLA (case study 7.1)
Fixed & random effects Emax models can be fitted using BugsXLA. Details are not covered in this course.
References

Bolstad, W.M. (2007). Introduction to Bayesian Statistics. 2nd Edition. John Wiley & Sons, New York.
Gelman, A., Carlin, J.B., Stern, H.S. and Rubin, D.B. (2004). Bayesian Data Analysis. 2nd Edition. Chapman & Hall/CRC. (3rd Edition now available.)
Grieve, A. (1991). Predictive probability in clinical trials. Biometrics, 47, 323-330.
Lee, P.M. (2004). Bayesian Statistics: An Introduction. 3rd Edition. Hodder Arnold, London, U.K.
Neuenschwander, B., Capkun-Niggli, G., Branson, M. and Spiegelhalter, D.J. (2010). Summarizing historical information on controls in clinical trials. Clinical Trials, 7, 5-18.
Ntzoufras, I. (2009). Bayesian Modeling Using WinBUGS. John Wiley & Sons, Hoboken, NJ.
O'Hagan, A., Stevens, J. and Campbell, M. (2005). Assurance in clinical trial design. Pharmaceutical Statistics, 4, 187-201.
Spiegelhalter, D., Abrams, K. and Myles, J. (2004). Bayesian Approaches to Clinical Trials and Health-Care Evaluation. John Wiley & Sons, New York.
Woodward, P. (2012). Bayesian Analysis Made Simple: An Excel GUI for WinBUGS. Chapman & Hall/CRC.