Applied Bayesian Methods
Phil Woodward
Phil Woodward 2014
Introduction to Bayesian Statistics
Inferences via Sampling Theory
• Inferences made via the sampling distribution of statistics
– A model with unknown parameters is assumed
– Statistics (functions of the data) are defined
– These statistics are in some way informative about the parameters
– For example, they may be unbiased, minimum variance estimators
• Probability is the frequency with which recurring events occur
– The recurring event is the statistic for fixed parameter values
– The probabilities arise by considering data other than actually seen
– Need to decide on the most appropriate “reference set”
– Confidence and p-values are p(data “or more extreme” | θ) calculations
• Difficulties when making inferences
– Nuisance parameters are an issue when no suitable sufficient statistics exist
– Constraints in the parameter space cause difficulties
– Confidence intervals and p-values are routinely misinterpreted
• They are not p(θ | data) calculations
How does Bayes add value?
• Informative Prior
– Natural approach for incorporating information already available
– Smaller, cheaper, quicker and more ethical studies
– More precise estimates and more reliable decisions
– Sometimes weakly informative priors can overcome model fitting failure
• Probability as a “degree of belief”
– Quantifies our uncertainty in any unknown quantity or event
– Answers questions of direct scientific interest
• P(state of world | data) rather than P(data* | state of world)
• Model building and making inferences
– Nuisance parameters no longer a “nuisance”
– Random effects, non-linear terms, complex models all handled better
– Functions of parameters estimated with ease
– Predictions and decision analysis follow naturally
– Transparency in assumptions
• Beauty in its simplicity!
– p(θ | x) = p(x | θ) p(θ) / p(x)
– Avoids the issue of identifying “best” estimators and their sampling properties
– More time spent addressing issues of direct scientific relevance
Probability
• Most Bayesians treat probability as a measure of belief
– Some believe probabilities can be objective (not discussed here)
– Probability is not restricted to recurring events
• E.g. the probability it will rain tomorrow is a Bayesian probability
– Probabilities lie between 0 (impossible event) and 1 (certain event)
– Probabilities between 0 and 1 can be calibrated via the “fair bet”
• What is a “fair bet”?
– Bookmaker sells a bet by stating the odds for or against an event
– Odds are set to encourage a punter to buy the bet
• E.g. odds of 2-to-1 against means that for each unit staked two are won, plus the stake
– A fair bet is when one is indifferent to being bookmaker or punter
• i.e. one doesn’t believe either side has an unfair advantage in the gamble
Probability
• Relationship between odds and probability
– One-to-one mapping between odds (O) and probability (P): P = O / (1 + O)
where O equals the ratio X/Y for odds of X-to-Y in favour and the ratio Y/X for odds of X-to-Y against an event
e.g. odds of 2-to-1 against, if fair, imply the probability equals ⅓
• Probabilities defined this way are inevitably subjective
– People with different knowledge may have different probabilities
– Controversy occurs when using this definition to interpret data
– Science should be “objective”, so “subjectivity” to some is heresy
– But where do the models that Frequentists use come from?
– Are the decisions made when designing studies purely objective?
– Is judgment needed when generalising from a sample to a population?
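The odds-to-probability mapping can be sketched in code. The helper below is a hypothetical illustration (not from the slides), covering both odds quoted in favour and against:

```python
def prob_from_odds(x, y, against=True):
    """Probability implied by fair odds of x-to-y against (or in favour of) an event.
    O is the odds in favour; P = O / (1 + O)."""
    o = y / x if against else x / y
    return o / (1 + o)

# Odds of 2-to-1 against, if fair, imply probability 1/3
p = prob_from_odds(2, 1, against=True)
```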
Probability
• Subjectivity does not mean biased, prejudiced or unscientific
– Large body of research into elicitation of personal probabilities
– Where a frequency interpretation applies, these should support beliefs
• E.g. the probability of the next roll of a die coming up a six should be ⅙ for everyone, unless you have good reason to doubt the die is fair
– An advantage of the Bayesian definition is that it allows all other information to be taken into account
• E.g. you may suspect the person offering a bet on the die roll is of dubious character
• Bayesians are better equipped to win at poker than Frequentists!
• All unknown quantities, including parameters, are considered random variables
– each parameter still has only one true value
– our uncertainty in this value is represented by a probability distribution
Epistemic uncertainty
Exchangeability
• Exchangeability is an important Bayesian concept
– exchangeable quantities cannot be partitioned into more similar sub-groups
– nor can they be ordered in a way that implies we can distinguish between them
– exchangeability is often used to justify prior distributions for parameters, analogous to classical random effects
The Bayesian Paradigm

From
Pr(A | B) = Pr(A, B) / Pr(B)
and
Pr(B | A) = Pr(A, B) / Pr(A)
comes Bayes Theorem
Pr(A | B) = Pr(A) Pr(B | A) / Pr(B)
Nothing controversial yet.
How is Bayes Theorem (mis)used?
Coin tossing study: Is the coin fair?
Model: ri ~ bern(π), i = 1, 2, ..., n
ri = 1 if the ith toss is a head, 0 if a tail
Let the terms in Bayes Theorem be
A = π (controversial)
B = r
then
The Bayesian Paradigm

p(π | r) = p(π) p(r | π) / p(r)

Why?
What are these terms?
p(r | π) is the likelihood = bin(n, Σr | π) (not controversial)
p(π) is the prior = ??? (controversial)
The prior formally represents our knowledge of π before observing r
The Bayesian Paradigm
What are these terms (continued)?
p(r) is the normalising constant = ∫ p(r | π) p(π) dπ (the difficult bit!)
p(π|r) is the posterior
The posterior formally represents our knowledge of π after observing r
The Bayesian Paradigm
MCMC to the rescue!
In general, not in this particular case
A worked example.
Coin tossed 5 times giving 4 heads and 1 tail
p(r | π) = bin(n=5, Σr=4 | π)
p(π) = beta(a, b); when a = b = 1 this is equivalent to U(0, 1)
Why choose a beta distribution?!
- conjugacy … posterior p(π | r) = beta(a + Σr, b + n − Σr)
- can represent vague belief?
- can be an objective reference?
- beta family is flexible (could be informative)
The Bayesian Paradigm
...but is a stronger prior justifiable?
What if data were 5 dogs in tox study:
4 OK, 1 with an AE?
A worked example (continued).
Applying Bayes theorem
p(π | r) = beta(5, 2)
95% credible interval
π : (0.36 to 0.96), i.e. Pr[π ∈ (0.36, 0.96) | Σr = 4] = 0.95
95% confidence interval
π : (0.28 to 0.995)
Pr[Σr ≥ 4 | π = 0.28] = 0.025, Pr[Σr ≤ 4 | π = 0.995] = 0.025
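The conjugate beta-binomial update and both intervals from this worked example can be reproduced numerically. A sketch using SciPy (assuming the slide's beta(1, 1) prior and 4 heads out of 5 tosses; the Clopper-Pearson beta quantiles are the standard exact confidence limits):

```python
from scipy import stats

# Conjugate update: beta(1, 1) prior, 4 heads out of 5 tosses
a, b, n, heads = 1, 1, 5, 4
post = stats.beta(a + heads, b + n - heads)   # beta(5, 2)

# 95% equal-tail credible interval for pi
cred_lo, cred_hi = post.ppf(0.025), post.ppf(0.975)   # roughly (0.36, 0.96)

# Exact (Clopper-Pearson) 95% confidence interval, for comparison
cp_lo = stats.beta.ppf(0.025, heads, n - heads + 1)   # roughly 0.28
cp_hi = stats.beta.ppf(0.975, heads + 1, n - heads)   # roughly 0.995
```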
The Bayesian Paradigm
Bayesian inference for simple Normal model
Clinical study: What’s the mean response to placebo?
Model: yi ~ N(µ, σ²), i = 1, 2, ..., n (placebo subjects only)
Assume σ known; for convenience will use the precision parameter τ = σ⁻² (reciprocal of the variance)
Terms in Bayes Theorem are
The Bayesian Paradigm
p(µ | y) = p(µ) p(y | µ) / p(y)
The Bayesian Paradigm
Improper prior density
The Bayesian Paradigm
Posterior precision equals the sum of the prior and data precisions
Posterior mean equals the precision-weighted mean of the prior and data
The Bayesian Paradigm
A worked example (continued).
Applying Bayes theorem
p(µ | y) = N(80, 0.5)
95% credible interval
µ : (78.6 to 81.4)
95% confidence interval
µ : (78.6 to 81.4)
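The precision-weighted update behind this result can be sketched in a few lines. The prior and data values below are hypothetical (the slide shows only the posterior); a near-vague prior with a data precision of 2 reproduces the N(80, 0.5) posterior:

```python
import math

def normal_posterior(prior_mean, prior_prec, data_mean, data_prec):
    """Known-sigma Normal model: precisions add, means are precision-weighted."""
    post_prec = prior_prec + data_prec
    post_mean = (prior_prec * prior_mean + data_prec * data_mean) / post_prec
    return post_mean, post_prec

# Hypothetical numbers: near-vague prior, sample mean 80 with data precision 2
m, p = normal_posterior(0.0, 1e-6, 80.0, 2.0)
sd = math.sqrt(1 / p)
lo, hi = m - 1.96 * sd, m + 1.96 * sd   # approx (78.6, 81.4), matching the slide
```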
The Bayesian Paradigm
Bayesian inference for simple Normal model
The case when both mean and variance are unknown
Model: yi ~ N(µ, σ²), i = 1, 2, ..., n
Terms in Bayes Theorem are
The Bayesian Paradigm
p(µ, τ | y) = p(µ, τ) p(y | µ, τ) / p(y)
The Bayesian Paradigm
The Bayesian Paradigm
Bayesian inference for Normal Linear Model
Model: y = Xθ + ε, εi ~ N(0, σ²), i = 1, 2, ..., n
y and ε are n × 1 vectors of observations and errors
X is an n × k matrix of known constants
θ is a k × 1 vector of unknown regression coefficients
Terms in Bayes Theorem are
The Bayesian Paradigm
p(θ, τ | y) = p(θ, τ) p(y | θ, τ) / p(y)
The Bayesian Paradigm
In summary, for the Normal Linear Model (“fixed effects”)
Classical confidence intervals can be interpreted as Bayesian credible intervals
But, need to be aware of the implicit prior distributions
Not generally the case for other error distributions
But for “large samples”, when the likelihood-based estimator has an approximate Normal distribution, a Bayesian interpretation can again be made
“Random effects” models are not so easily compared
Don’t assume classical results have a Bayesian interpretation
The Bayesian Paradigm
The Bayesian Paradigm
Conditional (on µ) distribution for future response
Posterior distribution for µ
The Bayesian Paradigm
Conditional: yf | µ ~ N(µ, σ²); posterior: µ | y ~ N(µ1, 1/τ1)
yf ~ N(µ1, 1/τ1 + 1/τ)
sum of the posterior variance of µ and the conditional variance of yf
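The "variances add" result for the predictive distribution can be verified by simulation: draw µ from its posterior, then yf given µ, and check the spread. The numbers below are hypothetical illustrations:

```python
import random

random.seed(1)

mu1, tau1 = 80.0, 2.0   # posterior for mu (hypothetical numbers)
tau = 1.0               # known sampling precision

# Simulate the predictive distribution: draw mu, then draw yf | mu
draws = []
for _ in range(200_000):
    mu = random.gauss(mu1, (1 / tau1) ** 0.5)
    draws.append(random.gauss(mu, (1 / tau) ** 0.5))

pred_var = sum((y - mu1) ** 2 for y in draws) / len(draws)
# pred_var should be close to 1/tau1 + 1/tau = 1.5
```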
Predictive Distributions
When are predictive distributions useful?
When designing studies
we predict the data using priors to assess the design
we may use informative priors to reduce study size, these being predictions from historical studies
When undertaking interim analyses
we can predict the remaining data using the current posterior
When checking the adequacy of our assumed model
model checking involves comparing observations with predictions
When making decisions after the study has completed
we can predict future trial data to assess the probability of success, helping to determine the best strategy or decide to stop
Some argue predictive inferences should be our main focus
be interested in observable rather than unobservable quantities
e.g. how many patients will do better on this drug?
The Bayesian Paradigm
“design priors” must be informative
The Bayesian Paradigm
δ is treatment effect
The Bayesian Paradigm
The Bayesian Paradigm
Making Decisions
A simple Bayesian approach defines criteria of the form
Pr(δ ≥ Δ) > π
where Δ is an effect size of interest, and π is the probability required to make a positive decision
For example, a Bayesian analogy to significance could be
Pr(δ > 0) > 0.95
But is believing δ > 0 enough for further investment?
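With MCMC output, a criterion like this is just a tail-area count over the posterior samples. A sketch with hypothetical samples (a Normal stand-in for draws that would come from MCMC):

```python
import random

random.seed(0)

# Hypothetical posterior samples for the treatment effect delta (e.g. from MCMC)
delta_samples = [random.gauss(1.2, 0.5) for _ in range(100_000)]

def pr_exceeds(samples, threshold):
    """Posterior probability Pr(delta >= Delta), estimated from samples."""
    return sum(d >= threshold for d in samples) / len(samples)

go = pr_exceeds(delta_samples, 0.0) > 0.95   # Bayesian analogy to significance
```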
The Bayesian Paradigm
END OF PART 1
intro to WinBUGS
illustrating fixed effect models
Bayesian Model Checking
Brief outline of some methods easy to use with MCMC
Consider three model checking objectives
1. Examination of individual observations
2. Global tests of goodness-of-fit
3. Comparison between competing models
In all cases we compare observed statistics with expectations, i.e. predictions conditional on a model
Bayesian Model Checking
Bayesian Model Checking
yi is the observation; Yi is the prediction
E(Yi) is the mean of the predictive distribution
Bayesian residuals can be examined as we do classical residuals
p-value concept
Ideally we would have a separate evaluation dataset
The predictive distribution for Yi is then independent of yi
Typically not available for clinical studies
Cross-validation is next best, but difficult within WinBUGS
The following methods use the data twice, so will be conservative, i.e. overstate how well the model fits the data
Will illustrate using WinBUGS code for simplest NLM
Bayesian Model Checking
Bayesian Model Checking(Examination of Individual Observations)
{
### Priors
mu ~ dnorm(0, 1.0E-6)
prec ~ dgamma(0.001, 0.001)
sigma <- pow(prec, -0.5)
### Likelihood
for (i in 1:N) { Y[i] ~ dnorm(mu, prec) }
### Model checking
for (i in 1:N) {
### Residuals and Standardised Residuals
resid[i] <- Y[i] - mu
st.resid[i] <- resid[i] / sigma
### Replicate data set & Prob observation is extreme
Y.rep[i] ~ dnorm(mu, prec)
Pr.big[i] <- step( Y[i] - Y.rep[i] )
Pr.small[i] <- step( Y.rep[i] - Y[i] )
}
}
each residual has a distributionuse the mean as the residual
Y.rep[i] is a prediction accounting for uncertainty in parameter values, but not in the type of model assumed
only need both when Y.rep[i] could exactly equal Y[i]
More typically, each Y[i] has different mean, mu[i].
mean of Pr.big[i] estimates the probability a future observation is this big
Identify a discrepancy measure
typically a function of the data
but could be a function of both data and parameters
Predict (replicate) values of this measure
conditional on the type of model assumed
but accounting for uncertainty in parameter values
Compute a “Bayesian p-value” for the observed discrepancy
similar approach used for individual observations
convention for global tests is to quote a “p-value”
Bayesian Model Checking(Global tests of goodness-of-fit)
e.g. a measure of skewness for testing this aspect of the Normal assumption
{
… code as before …
### Model checking
for (i in 1:N) {
### Residuals and Standardised Residuals
resid[i] <- Y[i] - mu
st.resid[i] <- resid[i] / sigma
m3[i] <- pow( st.resid[i], 3 )
### Replicate data set
Y.rep[i] ~ dnorm(mu, prec)
resid.rep[i] <- Y.rep[i] - mu
st.resid.rep[i] <- resid.rep[i] / sigma
m3.rep[i] <- pow( st.resid.rep[i], 3 )
}
skew <- mean( m3[] )
skew.rep <- mean( m3.rep[] )
p.skew.pos <- step( skew.rep - skew )
p.skew.neg <- step( skew - skew.rep )
}
Bayesian Model Checking(Global tests of goodness-of-fit)
p.skew is interpreted as for a classical p-value, i.e. small is evidence of a discrepancy
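The same posterior-predictive skewness check can be sketched outside WinBUGS. Everything below is a hypothetical stand-in (the data are simulated, and the posterior draws of mu and sigma are approximated rather than taken from MCMC), but the logic mirrors the WinBUGS code: compare the skewness of the data with the skewness of replicated data sets:

```python
import random
import statistics

random.seed(42)

# Hypothetical observed data; in practice (mu, sigma) draws come from MCMC
y = [random.gauss(10, 2) for _ in range(50)]
n_draws = 2000

def skewness(values, mu, sigma):
    # mean cubed standardised residual, as in the WinBUGS code
    return statistics.fmean(((v - mu) / sigma) ** 3 for v in values)

exceed = 0
for _ in range(n_draws):
    # stand-ins for posterior draws of (mu, sigma)
    mu = statistics.fmean(y) + random.gauss(0, 2 / len(y) ** 0.5)
    sigma = statistics.stdev(y)
    y_rep = [random.gauss(mu, sigma) for _ in y]   # replicate data set
    exceed += skewness(y_rep, mu, sigma) > skewness(y, mu, sigma)

p_skew = exceed / n_draws   # values near 0 or 1 flag a skewness misfit
```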
Bayes factors: ratio of marginal likelihoods under competing models
A Bayesian analogy to the classical likelihood ratio test
Bayesian Model Checking(Comparison between competing models)
not easy to implement using MCMC, so will not be discussed further
Bayesian Model Checking(Comparison between competing models)
Deviance Information Criterion (DIC)
a Bayesian “information criterion”, but not the BIC
will not discuss theory; focus on practical interpretation
WinBUGS & SAS can report this for most models
DIC is the sum of two separately interpretable quantities
DIC = Dbar + pD
Dbar : the posterior mean of the deviance
pD : the effective number of parameters in the model
pD = Dbar − Dhat
Dhat : the deviance point estimate using the posterior mean of θ
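Given MCMC output, both quantities are simple averages. A sketch with hypothetical deviance draws (the function is an illustration, not BUGS/SAS internals):

```python
def dic(deviance_samples, dev_at_posterior_mean):
    """DIC = Dbar + pD, with pD = Dbar - Dhat."""
    dbar = sum(deviance_samples) / len(deviance_samples)
    pd = dbar - dev_at_posterior_mean
    return dbar + pd, pd

# Hypothetical deviance draws from an MCMC run, and Dhat
dic_value, pd = dic([102.0, 98.0, 101.0, 99.0], dev_at_posterior_mean=98.0)
# Dbar = 100, pD = 2, DIC = 102
```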
Bayesian Model Checking(Comparison between competing models)
Deviance Information Criterion (DIC)
DIC = Dbar + pD
pD will differ from the total number of parameters
when posterior distributions are correlated
typically the case for “random effect parameters”
non-orthogonal designs, correlated covariates
common for non-linear models
pD will be smaller because some parameters’ effects “overlap”
Bayesian Model Checking(Comparison between competing models)
Deviance Information Criterion (DIC)
DIC = Dbar + pD
Measures a model’s ability to make short-term predictions
Smaller values of DIC indicate a better model
Rules of thumb for comparing models fitted to the same data
DIC difference > 10 is clear evidence of being better
DIC difference > 5 (< 10) is still strong evidence
There are still some unresolved issues with DIC
relatively early days in its use, so use other methods as well
Bayesian Model Checking(practical advice)
“All models are wrong, but some are useful”
if we keep looking, or have lots of data, we will find lack-of-fit
need to assess whether the model’s deficiencies matter
depends upon the inferences and decisions of interest
judge the model on whether it is fit for purpose
Sensitivity analyses are useful when uncertain
should assess sensitivity to both the likelihood and the prior
Model expansion may be necessary
Bayesian approach particularly good here
Informative priors and MCMC allow greater flexibility
e.g. replace Normal with t distribution
Introduction to BugsXLA
Parallel Group Clinical Study(Analysis of Covariance)
BugsXLA(case study 3.1)
Switch to Excel and demonstrate how BugsXLA facilitates rapid Bayesian model specification and analysis via WinBUGS.
BugsXLA(case study 3.1)
Settings used by WinBUGS
Suggested settings
Posterior distributions to be summarised
Posterior samples to be imported
Save WinBUGS files, create R scripts
BugsXLA(case study 3.1)
Fixed factor effects parameterised as contrasts from a zero constrained level
Default priors chosen to be “vague” (no guarantees!)
Priors for other parameter types
Bayesian model checking options
BugsXLA(case study 3.1)
BugsXLA uses generic names for parameters in WinBUGS code (deciphered on input!)
Recommend adding MC Error to input list
The Excel sheet used to display the results
BugsXLA(case study 3.1)
Generic names … deciphered
Posterior means, standard deviations & credible intervals
Could compute the ratio MC Error / St.Dev. using a cell formula
Reminder of model, prior and WinBUGS settings
BugsXLA(case study 3.1)
BugsXLA(case study 3.1)
BugsXLA interprets contents of cells to define predictions & contrasts to be estimated
In this case, predicted means for each level of factor TRT are defined
BugsXLA(case study 3.1)
BugsXLA(case study 3.1)
Recommend turning this off once you understand how it is parameterised
Can set own alerts to be used with model checking functions
Other default settings can be personalised, e.g. default priors
BugsXLA(Default Settings)
Fixed factor effects parameterisation can be changed to SAS (last level)
BugsXLA(Default Settings)
Obtaining Prior Distributions
Obtaining Prior Distributions
• Brief overview of main approaches*
• Further issues in the use of Priors*
* based on chapter 5 of Spiegelhalter et al (2004)
Obtaining Prior Distributions
• Misconceptions: They are not necessarily
– Prespecified
– Unique
– Known
– Influential
• Bayesian analysis
– Transforms prior into posterior beliefs
– Doesn’t produce the posterior distribution
– Context and audience important
– Sensitivity to alternative assumptions vital
• Prior could differ at design & analysis stage
– May want less controversial vague priors in analysis
– Design priors usually have to be informative
But prespecification is strongly recommended; the data must not influence the prior distribution
Obtaining Prior Distributions
• Five broad approaches– Elicitation of subjective opinion– Summarising past evidence– Default priors– Robust priors– Estimation using hierarchical models
Obtaining Prior Distributions
• Elicitation of subjective opinion
– Most useful when little ‘objective’ evidence
– Less controversial at the design stage
– Elicitation should be kept simple & interactive
– O’Hagan is a strong advocate
• Spiegelhalter et al do not recommend
– Prefer archetypal views; see Default Priors
Summarising Past Evidence.
Exchangeable: θ, θh ~ N(μ, τ²)
(a) τ = ∞, μ = K
(b) τ ~ dist.
(f) τ = 0, μ = θ
Typically, yh ~ N(θh, σh²)
(c) θh = θ + δh, δh ~ N(0, σδh²), i.e. θh ~ N(θ, σδh²)
Typically (b) adequate, maybe with more complexity. Meta-analytic-predictive
Obtaining Prior Distributions
• Default Priors
– Vague, a.k.a. non-informative or reference
• WinBUGS (general advice for ‘simple’ models):
– Location parms ~ Normal with huge variance
– Lowest level error variance ~ inv-gamma(small, small)
– Hierarchical error variances … controversial
sd ~ Uniform(0, big) or ~ Half-Normal(big); big < huge!
– Sceptical & Enthusiastic Priors
• Sceptical used to determine when success achieved
• Enthusiastic used to determine when to stop
• Sceptical prior centred on 0 with small prob. effect > Δ
• Enthusiastic prior centred on Δ with small prob. effect < 0
– ‘Lump-and-smear’ Priors
• Point mass at the null hypothesis
Parameter “big” can be derived via eliciting inferred quantities, e.g. credible differences between study means.
Might be appropriate for unprecedented mechanisms in ED stage.
Obtaining Prior Distributions
• Robust Priors
– We always assess model assumptions
– Bayesians assess prior assumptions also
– Use a ‘community of priors’
• Discrete set
• Parametric family
• Non-parametric family
– Interpretation section recommended in report
• Show how data affect a range of prior beliefs
Perhaps develop a range of priors appropriate in the typical case.
Example of a parametric family of priors
α is the discounting factor discussed previously (variant d: “equal but discounted”)
Not recommended, as there is no operational interpretation & no means of assessing suitable values for α
Obtaining Prior Distributions
• Hierarchical priors
– In the simplest case, the same as (b) Exchangeable
– ‘Borrow strength’ between studies
• counter view: ‘share weakness’
– Three essential ingredients
• Exchangeable parameters
• Form for the random-effects dist.
– Typically Normal, although t is perhaps more realistic
• Hyperprior for the parms of the random-effects dist.
– sd ~ Uniform(0, Max Credible) or Half-Normal(big) or Half-Cauchy(large)
Obtaining Prior Distributions
• Case Study
– Dental Pain Studies
– Informative prior for placebo mean
• Used in the formal analysis
– Meta-analytic-predictive approach
Title | Authors | Treatments | Placebo TOTPAR[6] Mean | SE
Characterization of rofecoxib as a cyclooxygenase-2 isoform inhibitor and demonstration of analgesia in the dental pain model | Elliot W. Ehrich et al. | Rofecoxib 50 and 500 mg; Ibuprofen 400 mg; Placebo | 3.01 | 0.51
Valdecoxib Is More Efficacious Than Rofecoxib in Relieving Pain Associated With Oral Surgery | Fricke J. et al. | Valdecoxib 40 mg; Rofecoxib 50 mg; Placebo | 3.01 | 0.76
Rofecoxib versus codeine/acetaminophen in postoperative dental pain: a double-blind, randomized, placebo- and active comparator-controlled clinical trial | Chang DJ et al. | Rofecoxib 50 mg; Codeine/Acetaminophen 60/600 mg; Placebo | 3.4 | 1.22
Analgesic Efficacy of Celecoxib in Postoperative Oral Surgery Pain: A Single-Dose, Two-Center, Randomized, Double-Blind, Active- and Placebo-Controlled Study | Raymond Cheung et al. | Celecoxib 400 mg; Ibuprofen 400 mg; Placebo | 3.7 | 0.75
Combination Oxycodone 5 mg/Ibuprofen 400 mg for the Treatment of Postoperative Pain: A Double-Blind, Placebo- and Active-Controlled Parallel-Group Study | Thomas Van Dyke et al. | Oxycodone/Ibuprofen 5 mg/400 mg; Ibuprofen 400 mg; Oxycodone 5 mg; Placebo | 4.2 | 0.83
Obtaining Prior Distributions
(part of table of prior studies considered relevant)
• Meta-analysis of historical data
– Published summary data
• Normal Linear Mixed Model
Yi = θi + ei
θi ~ N(µθ, ω²)
ei ~ N(0, SEi²)
Yi are the observed placebo means from each study
SEi are their associated standard errors
Obtaining Prior Distributions
{
for (i in 1:N) {
Y[i] ~ dnorm(theta[i], prec[i])
theta[i] ~ dnorm(mu.theta, tau.theta)
prec[i] <- pow(se[i], -2)
}
newtrial ~ dnorm(mu.theta, tau.theta)
mu.theta ~ dnorm(0, 1.0E-6)
tau.theta <- pow(omega, -2)
omega ~ dunif(0, 100)
}
Obtaining Prior Distributions
• WinBUGS used to determine the prior
• assumes study means are exchangeable
• but not responses from different studies
If the studies were smaller, the model should account for the fact that each se is estimated.
If few studies, might need a slightly more informative prior.
‘newtrial’ provides the prior for a future study.
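The 'newtrial' node is just a draw from N(mu.theta, omega²) at each MCMC iteration. The sketch below imitates this with hypothetical stand-ins for the posterior draws of mu.theta and omega (real values would come from the WinBUGS run), showing how the meta-analytic-predictive prior inflates the spread beyond the uncertainty in mu.theta alone:

```python
import random

random.seed(7)

# Stand-ins for MCMC draws of mu.theta and omega from the model above
mu_theta_draws = [random.gauss(3.5, 0.2) for _ in range(20_000)]
omega_draws = [abs(random.gauss(0.5, 0.1)) for _ in range(20_000)]

# Meta-analytic-predictive prior for a new study's placebo mean
newtrial = [random.gauss(m, w) for m, w in zip(mu_theta_draws, omega_draws)]

pred_mean = sum(newtrial) / len(newtrial)
pred_var = sum((x - pred_mean) ** 2 for x in newtrial) / len(newtrial)
# pred_var adds the between-study variance to the uncertainty in mu.theta
```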
Obtaining Prior Distributions
Gamma Distribution (or Inv-Gamma Dist.)
Particularly useful in Bayesian statistics
Conjugate for the Poisson mean
Marginal distribution for σ⁻² in NLMs
Chi-Sqr is a special case of the Gamma
ChiSqr(v) ≡ Gamma(v/2, 0.5)
If s² (v d.f.) is the ML estimate of a Normal variance
(and the conventional vague prior: p(σ²) ∝ σ⁻²)
Posterior p(σ² | s²) = v s² Inv-ChiSqr(v) = Inv-Gamma(v/2, v s²/2)
NOTE: more than one parameterisation of the Gamma Dist.
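The ChiSqr(v) ≡ Gamma(v/2, 0.5) identity is easy to check numerically, and also illustrates the parameterisation warning: the slide's Gamma(v/2, 0.5) uses a rate of 0.5, whereas SciPy's gamma takes a scale (= 1/rate):

```python
from scipy import stats

# ChiSqr(v) == Gamma(shape = v/2, rate = 1/2); SciPy uses scale = 1/rate = 2
v, x = 6, 3.7
chi2_pdf = stats.chi2.pdf(x, df=v)
gamma_pdf = stats.gamma.pdf(x, a=v / 2, scale=2.0)
# the two densities agree at every x
```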
Obtaining Prior Distributions
• Empirical criticism of priors
– George Box suggested a Bayesian p-value
• Prior predictive distribution for a future observation
• Compare the actual observation with the predictive dist.
• Calculate the prob. of observing more extreme
• Measure of conflict between prior and data
– But what should you do if conflict occurs?
• At least report this fact
• Greater emphasis on analysis with a vaguer prior
– Robust prior approach
• Formally model doubt using a mixture prior
e.g. observed placebo mean response
or heavy tailed e.g. t4 distribution
see model checking section
Obtaining Prior Distributions
• Key Points
– Subjectivity cannot be completely avoided
– A range of priors should be considered
– Elicited priors tend to be overly enthusiastic
– Historical data are the best basis for priors
– Archetypal priors provide a range of beliefs
– Default priors are not always ‘weak’
– Exchangeability is a strong assumption
• but with a hierarchical model plus covariates, the best option?
– Sensitivity analysis is very important
BugsXLA
Deriving and using informative prior distributions
Typically, the model is much simpler than this, e.g. placebo data only, no study level covariates, so only a random STUDY factor in the model
BugsXLA can model study level summary statistics (d.f. optional)
Predict placebo mean response in a future study
BugsXLA(case study 5.3, details not covered in this course)
BugsXLA(using informative prior distributions)
Back to Case Study 3.1
Will assume informative priors have been derived for:
Placebo mean response
Normal with mean 0 and standard deviation 0.04
Residual variance
Scaled Chi-Square with s² = 0.026 and df = 44
Switch back to Excel and show how to use this in BugsXLA
Import samples so prior and posterior can be compared.
Ignore this, unless you have R loaded and wish to explore in own time.
BugsXLA(case study 3.1, informative prior)
Informative priors for placebo mean and residual variance
Click ‘sigma’ then ‘Post Plots’ icon
Update Graph; can edit the histogram (‘user specified’)
Repeat for ‘Beta0’ (placebo) & ‘X.Eff[1,3]’ (TRT C)
BugsXLA(case study 3.1, informative prior)
BugsXLA(case study 3.1, informative prior)
Can obtain other posterior summaries
BugsXLA(case study 3.1, informative prior)
CAUTION
Although the prior for TRT:C is flat, the posterior is influenced by other priors
Bayesian Study Design
Consider a generic decision criterion of the form
GO decision if Pr(δ ≥ Δ) > π
δ is the treatment effect
Δ is an effect size of interest
π is the probability required to make a positive decision
As previously discussed, a Bayesian analogy to significance could be
Pr(δ > 0) > 0.95
Bayesian Study Design
Bayesian Study Design
Bayesian Study Design
Number of subjects
Operating Characteristics (OC)
Simple to calculate in any statistical software
e.g. for a 2 group PG or AB/BA XO design
R code
non-central t cdf:
1 - pt( qt(pi, df=df), df=df, ncp=(delta - DELTA)/(sigma*sqrt(2/N)) )
normal cdf:
1 - pnorm( qnorm(pi), mean=(delta - DELTA)/(sigma*sqrt(2/N)) )
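The same OC calculation translates directly to SciPy. The slide leaves the degrees of freedom unspecified; the sketch below assumes df = 2(N − 1) for a 2 group comparison, which is an assumption, not from the slides:

```python
import math
from scipy import stats

def pr_go_t(delta, DELTA, sigma, N, pi=0.95):
    """Pr(GO) via the non-central t, mirroring the R code; df assumed 2(N-1)."""
    df = 2 * (N - 1)
    ncp = (delta - DELTA) / (sigma * math.sqrt(2 / N))
    return 1 - stats.nct.cdf(stats.t.ppf(pi, df), df, ncp)

def pr_go_normal(delta, DELTA, sigma, N, pi=0.95):
    """Normal-approximation version, mirroring the pnorm line."""
    loc = (delta - DELTA) / (sigma * math.sqrt(2 / N))
    return 1 - stats.norm.cdf(stats.norm.ppf(pi), loc=loc)
```

When delta equals DELTA the criterion is met with probability 1 − pi, the design's "false GO" rate.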
Bayesian Study Design
Operating Characteristics (OC)
If we wanted to account for uncertainty in σ
Determine a Bayesian distribution for σ
e.g. σ ~ U(12, 18)
Use simulation to calculate the OC
1. Simulate a σ value from its distribution
2. For each value of δ and this σ value compute Pr(GO)
3. Repeat 1 & 2 10,000 times, say, and take the mean for each δ value
Warning
This unconditional Pr(GO) averages high and low probabilities
Is being under-powered more concerning than being over-powered?
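The three simulation steps above can be sketched as follows (the normal-approximation Pr(GO) is redefined here so the block stands alone; the σ ~ U(12, 18) distribution is the slide's example):

```python
import math
import random
from scipy import stats

random.seed(123)

def pr_go_normal(delta, DELTA, sigma, N, pi=0.95):
    loc = (delta - DELTA) / (sigma * math.sqrt(2 / N))
    return 1 - stats.norm.cdf(stats.norm.ppf(pi), loc=loc)

def unconditional_pr_go(delta, DELTA, N, n_sims=10_000, pi=0.95):
    """Average Pr(GO) over sigma ~ U(12, 18), as in steps 1-3 above."""
    total = sum(
        pr_go_normal(delta, DELTA, random.uniform(12, 18), N, pi)
        for _ in range(n_sims)
    )
    return total / n_sims

p = unconditional_pr_go(delta=10, DELTA=0, N=30)
```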
Bayesian Study Design
Bayesian Study Design
Prior: p(δ) = N(δ0, ω²), vague if ω ≈ ∞
Likelihood (sufficient statistic): p(d | δ) = N(δ, Vd)
e.g. 2 arm PG or AB/BA XO: Vd = 2σ²/N
Posterior: p(δ | d) = N(Mδ, Vδ)
Mδ = Vδ(δ0/ω² + d/Vd) — weighted average
1/Vδ = 1/ω² + 1/Vd — precisions are additive
Vague prior implies p(δ | d) = N(d, Vd) = “confidence dist.”
Bayesian Study Design(Bayesian NLM inference reminder)
Known σ
Posterior Distribution: p(δ | d) = N(Mδ, Vδ)
Conditional Distribution for d* from a future study: p(d* | δ) = N(δ, Vd*)
assuming studies are “exchangeable”; Vd* determined by the future study design
Predictive Distribution for d*: p(d* | d) = ∫ p(d* | δ) p(δ | d) dδ = N(M*, V*)
M* = Mδ
V* = Vδ + Vd*, the sum of the posterior and conditional variances
Bayesian Study Design(Bayesian NLM predictions reminder)
As yet unobserved
Can make predictions based on “prior beliefs”
Prior Distribution: p(δ) = N(δ0, ω²)
Conditional Distribution for d from the planned study: p(d | δ) = N(δ, Vd)
Predictive Distribution for d: p(d) = ∫ p(d | δ) p(δ) dδ = N(M0, V0)
M0 = δ0
V0 = ω² + Vd, the sum of the prior and conditional variances
Bayesian Study Design(Prior Predictive Distribution)
Unobserved at design stage
Design Prior
Classical power:
Let C denote the event “reject null hypothesis”
Power = Pr[C | δ], i.e. a conditional probability
More generally, C can be any decision criterion
Refer to the earlier OC calculations
Bayesian predictive probability:
Pr[C] = ∫ Pr[C | δ] p(δ) dδ
The “unconditional” probability of C occurring (although it is conditional on our prior beliefs)
“The probability, given our prior knowledge, that we will meet the decision criteria at the end of the study.”
Bayesian Study Design(Assurance)
expected/marginal poweror predictive probability
Consider the original GO decision
Pr[C | d] = Pr[d − tπ se(d) > Δ]
p(d) = N(δ0, ω² + Vd), the prior predictive distribution
Pr[C] = Pr[ N(δ0, ω² + Vd) > tπ se(d) + Δ ]
Pr[C] = Φ[ (δ0 − tπ se(d) − Δ) / (ω² + Vd)½ ]
where Φ[.] is the standard Normal cdf
with informative Design Prior
with vague Analysis Prior
What if vague Design Prior,i.e. ω very large?
Bayesian Study Design(Assurance)
Bayesian Study Design(Assurance)
Plot comparing classical (‘conditional power’) OC and assurance
For superiority, Δ = 0, and noting z = t for large d.f.
Pr[C] = Φ[ (δ0 − zπ se(d)) / (ω² + Vd)½ ]
same as Eq.3 in O’Hagan et al (2005)
For non-inferiority, Δ is negative
same as Eq.6 in O’Hagan et al (2005)
Bayesian Study Design(Assurance)
Can make predictions after an interim analysis
Let the estimate at the interim (n subjects) be d’
Let the estimate from part 2 (m subjects) be d*
p(d* | d’) = N(Mδ, Vδ + Vd*)
If a vague prior at study start (Design Prior):
Mδ = d’
Vδ = Vd’
For a 2 arm PG or AB/BA XO:
Vd* = 2σ²/m and Vd’ = 2σ²/n
Bayesian Study Design(Interim Analysis)
unobserved at interim stage
predictive distribution
Consider the original GO decision (with Vd = 2σ²/N)
Pr[C | d] = Pr[d − tπ se(d) > Δ]
This criterion, C, can be expressed in terms of d’ and d*:
(md* + nd’)/N − tπ σ(2/N)½ > Δ
At the interim stage d* is the only unknown
And so it is convenient to express C as
d* > (N½ tπ σ√2 + NΔ − nd’) / m
Classical “conditional power”, Pr[C | δ, d’]:
Pr[ N(δ, 2σ²/m) > (N½ tπ σ√2 + NΔ − nd’) / m ]
= 1 − Φ[ (N/m)½ tπ + (NΔ − nd’ − mδ) / (σ(2m)½) ]
which, for Δ = 0 & tπ = zπ, is the same as Eq.2 in Grieve (1991)
Bayesian Study Design(Interim Analysis with vague Design & Analysis Priors)
but Bayesians can do better than this!
Bayesian predictive probability, Pr[C | d’]:
Pr[ N(d’, 2σ²(n⁻¹ + m⁻¹)) > (N½ tπ σ√2 + NΔ − nd’) / m ]
= 1 − Φ[ (n/m)½ { tπ − N½(d’ − Δ) / (σ√2) } ]
which, for Δ = 0 & tπ = zπ, is the same as Eq.3 in Grieve (1991)
“The probability, given our knowledge at the interim, that we will meet the decision criteria at the end of the study.”
Interim futility / success criteria could be based on this probabilitye.g. futile if Pr[C | d’] < 0.2
success if Pr[C | d’] > 0.8
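This interim predictive probability can be coded directly from the closed form above. A large-sample sketch (z in place of tπ, vague design and analysis priors as assumed on this slide):

```python
import math
from scipy import stats

def predictive_prob(d_int, n, m, sigma, DELTA=0.0, pi=0.95):
    """Pr[C | d'] = 1 - Phi[(n/m)^0.5 * {t_pi - N^0.5 (d' - Delta) / (sigma*sqrt(2))}],
    with N = n + m and z used in place of t (large-sample sketch)."""
    N = n + m
    t_pi = stats.norm.ppf(pi)
    arg = math.sqrt(n / m) * (t_pi - math.sqrt(N) * (d_int - DELTA) / (sigma * math.sqrt(2)))
    return 1 - stats.norm.cdf(arg)

# Example interim rule: futile if below 0.2, success if above 0.8
p = predictive_prob(d_int=2.0, n=25, m=25, sigma=10.0)
futile, success = p < 0.2, p > 0.8
```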
Bayesian Study Design(Interim Analysis with vague Design & Analysis Priors)
Bayesian Study Design(Interim analysis predictive probability)
Plot comparing ‘conditional power’ and predictive probability following an interim analysis (25/grp), vague prior distribution
The only differences to the analysis done prior to study start are:
1) OC curve conditional on both delta and the interim data
2) ‘Belief distribution’ for delta updated using the interim data
Prior or Posterior depends on one’s perspective (‘Belief Distribution’)
could use an informative design prior, updated using interim data …
If we allow an informative design prior at study start:
  p(δ) = N(δ0, ω²)
  p(d* | d') = N(Mδ, Vδ + 2σ²/m)
with
  Mδ = Vδ ( δ0/ω² + n d'/(2σ²) )
  1/Vδ = 1/ω² + n/(2σ²)

Bayesian predictive probability, Pr[C | d']:
  Pr[ N(Mδ, Vδ + 2σ²/m) > ( N½ tπ √2 σ + NΔ − n d' ) / m ]
  = 1 − Φ[ ( N½ tπ √2 σ − n d' − m Mδ + NΔ ) / ( m (Vδ + 2σ²/m)½ ) ]
still with a vague analysis prior.
Bayesian Study Design (Interim Analysis with informative Design Prior)
refer back to Bayesian NLM reminders
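The informative-design-prior version can be sketched the same way (Python; function name and the zπ = 1.645 default are illustrative). With a very vague ω it should agree with the vague-prior formula of Grieve (1991):

```python
from math import erf, sqrt

def phi(x):
    # Standard normal CDF via the error function
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

def predictive_probability_informative(d_interim, n, m, sigma,
                                       delta0, omega, Delta=0.0, z_pi=1.645):
    """Pr[C | d'] when the design prior is p(delta) = N(delta0, omega^2),
    still with a vague analysis prior (z approximation to t)."""
    N = n + m
    # Precision-weighted update of the design prior using the interim data
    V_delta = 1.0 / (1.0 / omega**2 + n / (2.0 * sigma**2))
    M_delta = V_delta * (delta0 / omega**2 + n * d_interim / (2.0 * sigma**2))
    # Standardise the GO threshold against the predictive distribution of d*
    threshold = (sqrt(N) * z_pi * sqrt(2.0) * sigma + N * Delta - n * d_interim) / m
    return 1.0 - phi((threshold - M_delta) / sqrt(V_delta + 2.0 * sigma**2 / m))
```

An optimistic informative design prior (large δ0, small ω) pulls the predictive probability up relative to the vague-prior answer, as one would expect.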
Typically, we only have an informative prior for the placebo response.

Notation (in addition to that used previously):
  γ is the placebo true mean response
  p(γ) = N(γ0, ψ²), the informative prior for γ
  nA, nP are the numbers of subjects receiving active and placebo

The effective number of subjects this prior contributes is
  nγ = σ² / ψ²
which may be intuitive by re-expressing it as
  ψ² = σ² / nγ
Bayesian Study Design (using informative prior to reduce sample size)
… but is our intuition correct?
If the prior for the treatment effect, δ, is vague, then the posterior is
  p(δ | data) = N( ȳA − γ̃ , σ²/nA + σ²/(nP + nγ) )
with
  γ̃ = (nP ȳP + nγ γ0) / (nP + nγ)

It can be seen that the informative prior is equivalent to nγ additional placebo subjects with a sample mean of γ0.
Bayesian Study Design (using informative prior to reduce sample size)
… left as an exercise to prove
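The exercise can at least be checked numerically: update the placebo mean by conjugate Bayes, then again by appending nγ pseudo-subjects with mean γ0, and confirm the posteriors agree. A minimal Python sketch (all data values illustrative):

```python
def posterior_gamma_bayes(ybar_P, n_P, gamma0, psi, sigma):
    """Conjugate update for the placebo mean gamma with prior N(gamma0, psi^2)
    and known residual variance sigma^2 (precision-weighted average)."""
    prior_prec = 1.0 / psi**2
    data_prec = n_P / sigma**2
    post_var = 1.0 / (prior_prec + data_prec)
    post_mean = post_var * (gamma0 * prior_prec + ybar_P * data_prec)
    return post_mean, post_var

def posterior_gamma_augmented(ybar_P, n_P, gamma0, n_gamma, sigma):
    """The same posterior viewed as n_gamma pseudo placebo subjects with
    sample mean gamma0 appended to the observed data (vague prior)."""
    n_aug = n_P + n_gamma
    return (n_P * ybar_P + n_gamma * gamma0) / n_aug, sigma**2 / n_aug
```

Setting ψ² = σ²/nγ makes the two routes algebraically identical, which is the claimed equivalence.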
Worked example
Suppose the predictive distribution (placebo prior) is
  p(γ) ~ N(18, 12²)

Forecast residual standard deviation (obtained in the usual way, not shown here):
  σ = 70

Effective N of the placebo prior:
  Eff.N = (70 / 12)² ≈ 34

Design the study in the usual way, ignoring the informative prior; then reduce the placebo arm by 34 subjects and have the same power / precision.
Bayesian Study Design (using informative prior to reduce sample size)
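The worked example is a one-liner to verify (the function name is mine):

```python
def effective_n(sigma, psi):
    """Effective number of placebo subjects contributed by the prior:
    n_gamma = sigma^2 / psi^2."""
    return (sigma / psi) ** 2

print(round(effective_n(70.0, 12.0)))  # 34, matching the slide
```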
Unless there are no doubts at all, use a Robust Prior, i.e. a mixture of informative and vague prior distributions:
  p(placebo mean) ~ 0.9 × N(18, 12²) + 0.1 × N(18, 120²)

This represents a 10% chance that the meta-data are not exchangeable, in which case the analysis will effectively revert to the vague prior (it can also be thought of as a heavy-tailed distribution).

Also compute a Bayesian p-value of data-prior compatibility:
  Pr( "> observed mean" | prior ~ N(18, 12²) )
Note: the predictive distribution for the observed mean is ~ N(18, 12² + σ²/nP).
Bayesian Study Design (using informative prior to reduce sample size)
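The data-prior compatibility p-value uses the prior predictive distribution of the observed placebo mean. A sketch (the observed mean, nP and function name are illustrative):

```python
from math import erf, sqrt

def phi(x):
    # Standard normal CDF via the error function
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

def prior_data_pvalue(obs_mean, gamma0, psi, sigma, n_P):
    """Bayesian p-value of data-prior compatibility:
    Pr('> observed mean' | prior), using the prior predictive distribution
    of the observed placebo mean, N(gamma0, psi^2 + sigma^2/n_P)."""
    pred_sd = sqrt(psi**2 + sigma**2 / n_P)
    return 1.0 - phi((obs_mean - gamma0) / pred_sd)
```

A very small (or very large) p-value flags that the observed placebo mean sits in the tail of the prior predictive, i.e. the historical data may not be exchangeable with the new study.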
Bayesian Emax Model
dose/concentration response model
The Emax model is often used for dose-response data, and is even more common for concentration-response data; in a biological (non-clinical) context it is known as the logistic or sigmoidal curve.

More generally, it could be used to model a monotonic relationship between response and covariate:
  initially the response changes very slowly with the covariate,
  then the response changes much more rapidly,
  finally the response slows again as a plateau is reached.
Bayesian Emax Model
Bayesian Emax Model

λ is sometimes referred to as the 'Hill slope'.
The curve is approximately linear on the log-scale between ED20 and ED80.
When λ = 1, an ~80-fold range is needed to cover ED10 to ED90.
Convergence issues are common with MLE of Emax models
Bayesian Emax Model
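The "~80 fold" claim follows from inverting the Emax curve; a short check (the function name is mine):

```python
def ed(p, ed50, hill=1.0):
    """Dose giving p% of the maximal (Emax) effect. Solving
    Emax*D^h / (D^h + ED50^h) = (p/100)*Emax for D gives
    ED_p = ED50 * (p/(100-p))^(1/hill)."""
    return ed50 * (p / (100.0 - p)) ** (1.0 / hill)

fold = ed(90, ed50=1.0) / ed(10, ed50=1.0)
print(round(fold))  # 81: the ~80-fold range quoted for lambda = 1
```

A steeper curve (larger λ) compresses the range: for λ = 2 the ED10-to-ED90 span is only 9-fold.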
[Figure: two fitted Emax curves, response (1-7) vs concentration (1-1000 ug/mL, log scale); left panel: Hill coefficient = 1, right panel: Hill coefficient not restrained]
Most clinical data are more variable than this, with a smaller dose range and no data on the upper asymptote; classical fitting algorithms can fail to provide any solution.
Prior distributions are required for all parameters:
E0 : placebo (negative control) response
  utilise historical data as discussed earlier
Emax : maximum possible effect relative to E0
  typically vague, similar approach to the treatment effect prior
ED50 : dose that gives 50% of the Emax effect
  could be weakly informative, based on the same information used to choose the dose range
λ : determines the gradient of the dose response
  typically needs to be very informative; clinical data rarely provide much information regarding λ
Bayesian Emax Model
e.g. for ED50: log-normal centred on a mid/low dose (GM), 90% CI (0.1, 10) × GM
e.g. for λ: log-normal centred on 1, 90% CI (0.5, 2)
{
### Priors
  prec ~ dgamma( 0.001, 0.001 )
  sigma <- pow(prec, -0.5)
  E0 ~ dnorm( 0, 1.0E-6 )    #... but could be informative for placebo mean response
  Emax ~ dnorm( 0, 1.0E-6 )  #... typically vague
  log.ED50 ~ dnorm( ???, 0.51 )  # precision 0.51 = 1/1.4^2, i.e. sd 1.4 on the log scale
  ED50 <- exp( log.ED50 )    #... gives 90% CI of (0.1, 10) x exp(???)
  log.Hill ~ dnorm( 0, 5.67 )    # precision 5.67 = 1/0.42^2, i.e. sd 0.42 on the log scale
  Hill <- exp( log.Hill )    #... gives 90% CI of (0.5, 2)

### Likelihood
  for (i in 1:N) {
    Y[i] ~ dnorm(mu[i], prec)
    mu[i] <- E0 + (Emax * pow( X[i], Hill )) / ( pow( X[i], Hill ) + pow( ED50, Hill ) )
  }
}
Bayesian Emax Model (WinBUGS code)
Priors should be checked for appropriateness in each particular case.
{ … code as before …

### Quantities of potential interest
  for (i in 1:N.doses) {
    ### effect over placebo for pre-specified doses (values entered as data in node DOSE)
    effect[i] <- (Emax * pow( DOSE[i], Hill )) / ( pow( DOSE[i], Hill ) + pow( ED50, Hill ) )

    ### predicted mean response for pre-specified doses
    DOSE.mean[i] <- effect[i] + E0

    ### probability of exceeding pre-specified effect of size ??? (mean of node DOSE.PrEffBig)
    DOSE.PrEffBig[i] <- step( effect[i] - ??? )
  }

### estimate dose giving effect of size ???
  DOSE.BigEff.0 <- ED50 * pow( ???, 1/Hill ) / pow( Emax - ???, 1/Hill )
  # set to LARGE dose (LARGE pre-specified) if ??? > Emax
  DOSE.BigEff <- DOSE.BigEff.0 * step( Emax - ??? ) + LARGE * step( ??? - Emax )
}
Bayesian Emax Model (WinBUGS code)
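The derived quantity DOSE.BigEff inverts the Emax curve for a target effect. A Python check of that algebra (function names are mine; ??? and LARGE in the WinBUGS code are stand-ins for pre-specified values):

```python
def emax_effect(dose, emax, ed50, hill=1.0):
    """Effect over placebo at a given dose under the Emax model."""
    return emax * dose**hill / (dose**hill + ed50**hill)

def dose_for_effect(target, emax, ed50, hill=1.0, large=1e6):
    """Dose giving a pre-specified effect over placebo, inverting the Emax
    model: ED50 * (target/(emax - target))^(1/hill). Returns 'large' when
    the target is not achievable (target >= emax), mirroring the step()
    construction in the WinBUGS code."""
    if target >= emax:
        return large
    return ed50 * (target / (emax - target)) ** (1.0 / hill)
```

Plugging the returned dose back into the Emax curve should reproduce the target effect exactly; a target of half Emax returns the ED50.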
BugsXLA
Emax models: Pharmacology Biomarker Experiment
BugsXLA (case study 7.1)
BugsXLA (case study 7.1)
Fixed & random effects Emax models can be fitted using BugsXLA. Details are not covered in this course.
References

Bolstad, W.M. (2007). Introduction to Bayesian Statistics. 2nd Edition. John Wiley & Sons, New York.
Gelman, A., Carlin, J.B., Stern, H.S. and Rubin, D.B. (2004). Bayesian Data Analysis. 2nd Edition. Chapman & Hall/CRC. (3rd Edition now available.)
Grieve, A. (1991). Predictive probability in clinical trials. Biometrics, 47, 323-330.
Lee, P.M. (2004). Bayesian Statistics: An Introduction. 3rd Edition. Hodder Arnold, London, U.K.
Neuenschwander, B., Capkun-Niggli, G., Branson, M. and Spiegelhalter, D.J. (2010). Summarizing historical information on controls in clinical trials. Clinical Trials, 7, 5-18.
Ntzoufras, I. (2009). Bayesian Modeling Using WinBUGS. John Wiley & Sons, Hoboken, NJ.
O'Hagan, A., Stevens, J. and Campbell, M. (2005). Assurance in clinical trial design. Pharmaceutical Statistics, 4, 187-201.
Spiegelhalter, D., Abrams, K. and Myles, J. (2004). Bayesian Approaches to Clinical Trials and Health-Care Evaluation. John Wiley & Sons, New York.
Woodward, P. (2012). Bayesian Analysis Made Simple: An Excel GUI for WinBUGS. Chapman & Hall/CRC.