My Adventures with Bayes Peter Chapman Wokingham U3A Maths Group 6 April 2011
Contents
• My background
• Motivation
• Some data
• The normal distribution
• Classical inference
• Bayes theorem
• Who was Thomas Bayes?
• Bayesian inference
• Some examples of Bayesian inference
WHO AM I
CV
1962-1969: Ashford Grammar School (Middlesex/Surrey). A-levels in Pure Maths, Applied Maths, Chemistry, Physics.
1969-1972: Manchester University – Pure and Applied Maths.
1973: Department of Education, London – Assistant Statistician.
1973-1977: Exeter University – PhD in Applied Statistics.
1977-1982: Grassland Research Institute, Hurley – Statistician.
1982-2007: ICI/Zeneca/AstraZeneca/Syngenta, Bracknell – Statistician.
2007-2009: Unilever, Sharnbrook, Bedfordshire.
2009: Retired – joined Wokingham U3A – some consultancy.
MOTIVATION
In September 2010 I was offered a contract by my former employer, Syngenta, of Bracknell. The contract on offer required me to (a) carry out a Bayesian analysis, and (b) use the free software package R. Both of these were new to me and required a significant amount of learning. At about the same time I was asked to make a presentation to the Wokingham U3A Maths Group. Since I was putting in a significant amount of time to learn new techniques, it seemed only appropriate to share this learning with the group.
This is a presentation about Bayesian methods. Although I am using UK temperature records to illustrate methods, this is not a presentation about climate change. A much more thorough analysis is necessary before we can say anything substantial about climate change. This presentation is not about the normal distribution. Because the normal distribution is well known and easy to work with I have used it to demonstrate Bayesian methodology. The ideas presented here will translate to other, more complex, distributions.
SOME DATA
Average January Temperature - Central England : 1659 - 2010
[Figure: time series plot, Year vs °C]
Monthly mean, Central England temperature (degrees C): 1659-1973 Manley (Q.J.R. Meteorol. Soc., 1974); 1974 onwards Parker et al. (Int. J. Clim., 1992), Parker and Horton (Int. J. Clim., 2005).
Data: http://www.metoffice.gov.uk/hadobs/hadcet/cetml1659on.dat
Average June Temperature - Central England : 1659 - 2010
[Figure: time series plot, Year vs °C]

Average Annual Temperature - Central England : 1659 - 2010
[Figure: time series plot, Year vs °C]
Average Monthly Temperature - Central England : 1659 - 2010
[Figure: three panels — January, June, Annual]
THE NORMAL DISTRIBUTION
The probability density function:

    f(x | μ, σ²) = (1/√(2πσ²)) exp( −(x − μ)² / (2σ²) ),    −∞ < x < ∞

μ is called the mean
σ² is called the variance
σ is called the standard deviation

Probabilities are areas under the density curve:

    Prob(X ≤ b) = ∫_{−∞}^{b} (1/√(2πσ²)) exp( −(x − μ)² / (2σ²) ) dx

    Prob(b₁ ≤ X ≤ b₂) = ∫_{b₁}^{b₂} (1/√(2πσ²)) exp( −(x − μ)² / (2σ²) ) dx

    Prob(−∞ < X < ∞) = ∫_{−∞}^{∞} (1/√(2πσ²)) exp( −(x − μ)² / (2σ²) ) dx = 1

    Prob(μ − σ ≤ X ≤ μ + σ) ≈ 0.68
    Prob(μ − 2σ ≤ X ≤ μ + 2σ) ≈ 0.95
    Prob(μ − 3σ ≤ X ≤ μ + 3σ) ≈ 0.997

f(x | μ, σ²) is called a Probability Density Function (PDF).

    F(x | μ, σ²) = Prob(X ≤ x) = ∫_{−∞}^{x} (1/√(2πσ²)) exp( −(u − μ)² / (2σ²) ) du

is called a Cumulative Distribution Function (CDF).

The vertical line "|" indicates a distribution of x conditional on the values of μ and σ².
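The tail areas quoted above are easy to check numerically. This is a minimal sketch in Python (the talk itself used R); the normal CDF is written in terms of the error function:

```python
import math

def normal_cdf(x, mu, sigma):
    """F(x | mu, sigma^2) = Prob(X <= x) for X ~ N(mu, sigma^2)."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

def prob_between(b1, b2, mu, sigma):
    """Prob(b1 <= X <= b2) = F(b2 | mu, sigma^2) - F(b1 | mu, sigma^2)."""
    return normal_cdf(b2, mu, sigma) - normal_cdf(b1, mu, sigma)

mu, sigma = 0.0, 1.0
print(round(prob_between(mu - sigma, mu + sigma, mu, sigma), 3))          # 0.683
print(round(prob_between(mu - 2 * sigma, mu + 2 * sigma, mu, sigma), 3))  # 0.954
print(round(prob_between(mu - 3 * sigma, mu + 3 * sigma, mu, sigma), 3))  # 0.997
```

The same three calls with any other μ and σ give identical answers: the 1σ/2σ/3σ probabilities do not depend on the parameters.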
CLASSICAL INFERENCE
We have some data .......... and we believe that the data derive from a normal distribution.

Fundamental principle : the parameters, μ and σ² in our case, are fixed or constant.

Our objective is therefore to estimate μ and σ² .......... the estimates are written μ̂ and σ̂².

We also want to know how precise the parameter estimates are .......... so we need to compute confidence intervals.

At this stage we can compute f(x | μ, σ²) for a variety of values of μ and σ², but we do not know the correct values for μ and σ².
Average June Temperature - Central England : 1659 - 2010
[Figure: time series plot, Year vs °C]

I am going to guess that μ = 15 and σ = 1 (σ² = 1).
[Figure: fitted densities for (μ = 15, σ = 1), (μ = 13, σ = 1) and (μ = 14, σ = 2)]
We have 352 values of temperature, tᵢ, where i = 1659 to 2010.

We can compute

    f(tᵢ | μ, σ²) = (1/√(2πσ²)) exp( −(tᵢ − μ)² / (2σ²) )

for any values of μ and σ² we like.

In the classical approach we compute f(tᵢ | μ, σ²) for all tᵢ, i = 1659 … 2010, and then multiply them together:

    L(μ, σ²) = ∏_{i=1659}^{2010} f(tᵢ | μ, σ²)

This is called the likelihood.

We then find the values of μ and σ² that maximise the likelihood.
We call these maximum-likelihood estimates : μ̂ and σ̂².

[Figure: likelihood illustrated at (μ = 7, σ = 1.5), (μ = 23, σ = 1.5) and (μ = 14, σ = 1)]
    L(μ, σ²) = ∏_{i=1659}^{2010} (1/√(2πσ²)) exp( −(tᵢ − μ)² / (2σ²) )

    L*(μ, σ²) = logₑ L(μ, σ²) = −(352/2) logₑ(2π) − (352/2) logₑ(σ²) − (1/(2σ²)) Σ_{i=1659}^{2010} (tᵢ − μ)²

    ∂L*/∂μ = (1/σ²) Σ_{i=1659}^{2010} (tᵢ − μ) = 0    ⇒    μ̂ = (1/352) Σ_{i=1659}^{2010} tᵢ = t̄

    ∂L*/∂σ² = −352/(2σ²) + (1/(2σ⁴)) Σ_{i=1659}^{2010} (tᵢ − μ)² = 0    ⇒    σ̂² = (1/352) Σ_{i=1659}^{2010} (tᵢ − t̄)²
Maximum-likelihood estimates:

Month        μ̂        σ̂
June        14.33     1.09   (σ̂² = 1.188)
July        15.96     1.15
February     3.86     1.83
October      9.69     1.30
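The closed-form estimates derived above take one line each to compute. A small Python sketch (the temperatures here are invented for illustration, not the real CET series):

```python
def mle_normal(t):
    """Maximum-likelihood estimates for a normal sample:
    mu_hat = sample mean, sigma2_hat = sum of squared deviations / n
    (note the divisor n, not n - 1)."""
    n = len(t)
    mu_hat = sum(t) / n
    sigma2_hat = sum((x - mu_hat) ** 2 for x in t) / n
    return mu_hat, sigma2_hat

# illustrative temperatures, invented for the example
temps = [13.2, 14.8, 15.1, 13.9, 14.6, 14.4]
mu_hat, sigma2_hat = mle_normal(temps)
print(mu_hat, sigma2_hat)
```

Run on the 352 June values this reproduces the μ̂ = 14.33, σ̂² = 1.188 figures in the table.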
Confidence Intervals
Beyond the scope of this talk
BAYES THEOREM
Q = set of people tested for disease
D = subset of people who have the disease
D̄ = subset of people who do not have the disease
T = subset of people who test positive
T̄ = subset of people who do not test positive
D + D̄ = T + T̄ = Q

P(D) = probability that an individual has the disease
P(D | T) = probability that an individual has the disease given that they have tested positive
P(T) = probability that an individual tests positive
P(T | D) = probability that an individual tests positive given that they have the disease

Bayes Theorem :

    P(D | T) = P(T | D) P(D) / P(T)
           D          D̄           Sum
T        9,900      10,000      19,900
T̄          100   9,980,000   9,980,100
Sum     10,000   9,990,000  10,000,000

    P(D) = 10,000 / 10,000,000 = 0.001
    P(T) = 19,900 / 10,000,000 = 0.00199

P(T) and P(D) are marginal probabilities.

    P(T | D) = 9,900 / 10,000 = 0.99

P(D | T) and P(T | D) are conditional probabilities.

    P(D | T) = P(T | D) P(D) / P(T) = 0.99 × 0.001 / 0.00199 = 0.00099 / 0.00199 = 9,900 / 19,900 = 0.497487
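The arithmetic on this slide can be verified directly; a quick Python check of the screening-test numbers:

```python
# Counts taken from the 2x2 table above
total = 10_000_000
diseased = 10_000          # size of D
positive = 19_900          # size of T
true_positive = 9_900      # size of (D and T)

p_d = diseased / total                    # P(D)   = 0.001
p_t = positive / total                    # P(T)   = 0.00199
p_t_given_d = true_positive / diseased    # P(T|D) = 0.99

# Bayes theorem: P(D|T) = P(T|D) P(D) / P(T)
p_d_given_t = p_t_given_d * p_d / p_t
print(round(p_d_given_t, 6))  # 0.497487
```

Despite a 99% accurate test, a positive result still leaves the patient with slightly under a 50% chance of having the disease, because the disease is rare.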
BAYESIAN INFERENCE
A fundamental assumption of Bayesian inference is that the unknown parameters are variables.
For the normal distribution this means that μ and σ² are variables, not constants.

If we apply Bayes theorem to the normal density function we get (where T denotes the data, here the temperature series):

    f(μ, σ² | T) = L(T | μ, σ²) f(μ, σ²) / f(T)  ∝  L(T | μ, σ²) f(μ, σ²)

    Posterior Distribution  ∝  Likelihood (Data)  ×  Prior Distribution
For many years Bayesian analysis was a theoretical academic pastime.
This was because the mathematics was very difficult.
Analytic solutions for the posterior often involved complex multiple integrals.
One of the few models that can be solved analytically is the normal distribution with uniform priors.
In what follows :

    t̄ = (1/352) Σ_{i=1659}^{2010} tᵢ    and    s² = (1/(352−1)) Σ_{i=1659}^{2010} (tᵢ − t̄)²

If μ and logσ follow independent uniform prior distributions, then f(μ, σ²) ∝ 1/σ², so

    f(μ, σ² | T) ∝ L(T | μ, σ²) f(μ, σ²)
                 ∝ (1/σ²) ∏_{i=1659}^{2010} (1/√(2πσ²)) exp( −(tᵢ − μ)² / (2σ²) )
                 ∝ σ^(−352−2) exp( −(1/(2σ²)) [ (352−1)s² + 352(t̄ − μ)² ] )

We need to factorise the posterior as follows :

    f(μ, σ² | T) = f(μ | σ², T) f(σ² | T)

and it can be shown that :

    μ | σ², T ~ N( t̄, σ²/352 )
    σ² | T ~ Inv-χ²( 352−1, s² )

and, for the marginal posterior of μ,

    μ | T ~ t_{352−1}( t̄, s²/352 )

Marginal posterior for μ :    f(μ | T) = ∫ f(μ, σ² | T) dσ²
Marginal posterior for σ² :   f(σ² | T) = ∫ f(μ, σ² | T) dμ
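Because this posterior has a known form, it can be sampled directly, with no MCMC needed. A sketch in Python under the factorisation above: draw σ² from the scaled inverse chi-square, then μ given σ². The function names and the toy data are mine, standing in for the 352 temperatures:

```python
import random
import statistics

def sample_posterior(t, n_draws=5000, seed=1):
    """Direct draws from the posterior under the uniform prior on (mu, log sigma):
    sigma2 | T ~ Inv-chi^2(n - 1, s^2), then mu | sigma2, T ~ N(tbar, sigma2 / n)."""
    rng = random.Random(seed)
    n = len(t)
    tbar = sum(t) / n
    s2 = statistics.variance(t)                      # divisor n - 1, matching s^2
    draws = []
    for _ in range(n_draws):
        chi2 = rng.gammavariate((n - 1) / 2.0, 2.0)  # chi-square with n - 1 df
        sigma2 = (n - 1) * s2 / chi2                 # scaled inverse chi-square draw
        mu = rng.gauss(tbar, (sigma2 / n) ** 0.5)    # mu | sigma2
        draws.append((mu, sigma2))
    return draws

# toy data standing in for the temperature series
toy = [14.0, 15.1, 13.7, 14.9, 14.4, 13.8, 15.0, 14.2, 14.6, 14.5]
draws = sample_posterior(toy)
mu_mean = sum(m for m, _ in draws) / len(draws)
```

The sampled μ values centre on t̄, as the factorisation predicts.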
MARKOV CHAIN MONTE-CARLO AND THE METROPOLIS METHOD
Set up the Bayesian posterior :

    f(μ, σ² | T) = L(T | μ, σ²) f(μ, σ²) / f(T)  ∝  L(T | μ, σ²) f(μ, σ²)

In our case it takes the following form :

    f(μ, σ² | T) ∝ (1/σ²) ∏_{i=1659}^{2010} (1/√(2πσ²)) exp( −(tᵢ − μ)² / (2σ²) )

Select initial values, μ₀ and σ₀², for μ and σ².

Introduce jump functions : μ₀ → μ₁ and σ₀² → σ₁².

Compute R = f(μ₁, σ₁² | T) / f(μ₀, σ₀² | T).

Sample a single random value, Q, from a Uniform(0, 1) distribution.

If Q ≤ min(1, R) keep (μ₁, σ₁²), else set (μ₁, σ₁²) = (μ₀, σ₀²).

Continue doing this : (μ₀, σ₀²) → (μ₁, σ₁²) → (μ₂, σ₂²) → … → (μₙ, σₙ²) → (μₙ₊₁, σₙ₊₁²) → … → (μ_big, σ²_big)
This results in a random joint sample from the posterior distribution.
[Figure: posterior distribution, showing an uphill move from μₙ to μₙ₊₁]

    f(μₙ₊₁ | T) / f(μₙ | T) = R > 1,  so keep μₙ₊₁

[Figure: posterior distribution, showing a downhill move from μₙ to μₙ₊₁]

    f(μₙ₊₁ | T) / f(μₙ | T) = R < 1 .......... if Q ≤ R keep μₙ₊₁
    so keep μₙ₊₁ with probability = R
Jump function, (μₙ, σₙ²) → (μₙ₊₁, σₙ₊₁²) :

    μₙ₊₁ = μₙ + Z,     Z ~ Normal(0, σ_Z)   (rnorm in R)
    σₙ₊₁² = σₙ² + W,   W ~ Normal(0, σ_W)   (rnorm in R)
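The steps above can be collected into a small Metropolis sampler. This is a hedged illustration in Python rather than the author's R code; it works on the log scale (to avoid underflow when multiplying 352 densities) and runs on invented data:

```python
import math
import random

def log_posterior(mu, sigma2, t):
    """Log of the unnormalised posterior f(mu, sigma2 | T)
    with prior f(mu, sigma2) proportional to 1/sigma2."""
    if sigma2 <= 0.0:
        return -math.inf                         # impossible proposal: always rejected
    n = len(t)
    ss = sum((x - mu) ** 2 for x in t)
    return -(1.0 + n / 2.0) * math.log(sigma2) - ss / (2.0 * sigma2)

def metropolis(t, n_iter=20000, jump_mu=0.2, jump_s2=0.2, seed=42):
    rng = random.Random(seed)
    mu = sum(t) / len(t)                               # sensible starting values:
    sigma2 = sum((x - mu) ** 2 for x in t) / len(t)    # the ML estimates
    samples = []
    for _ in range(n_iter):
        mu_new = mu + rng.gauss(0.0, jump_mu)          # jump function for mu
        s2_new = sigma2 + rng.gauss(0.0, jump_s2)      # jump function for sigma2
        log_r = log_posterior(mu_new, s2_new, t) - log_posterior(mu, sigma2, t)
        if rng.random() < math.exp(min(0.0, log_r)):   # keep with prob min(1, R)
            mu, sigma2 = mu_new, s2_new
        samples.append((mu, sigma2))
    return samples

# invented "June temperatures" for illustration
rng = random.Random(0)
fake = [rng.gauss(14.3, 1.1) for _ in range(352)]
chain = metropolis(fake)
burned = chain[2000:]                                  # discard a burn-in phase
mu_mean = sum(m for m, _ in burned) / len(burned)
```

The jump standard deviations (0.2 here) are tuning choices; the talk returns to what happens when they are badly chosen.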
A COMPARISON OF THREE MODELS
Model 1, for all years :

    f(t | μ, σ²) = (1/√(2πσ²)) exp( −(t − μ)² / (2σ²) ),    i.e.  t ~ N(μ, σ²)

Model 2 :

    t ~ N(μ, σ²) for earlier years
    t ~ N(μ + Δμ, σ² + Δσ²) for later years

Model 2 is the same as

    t ~ N(μ_early, σ²_early) for earlier years
    t ~ N(μ_late, σ²_late) for later years

where μ_late = μ_early + Δμ and σ²_late = σ²_early + Δσ².

Model 3 :

    f(t | α, β, σ²) = (1/√(2πσ²)) exp( −(t − α − β·time)² / (2σ²) ),    i.e.  t ~ N(α + β·time, σ²)

i.e. the mean temperature changes linearly with time.
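The Δ parameterisation of Model 2 makes "no change" correspond exactly to Δμ = Δσ² = 0. A sketch of the Model 2 log-likelihood in Python (function names mine), checking that it collapses to Model 1 when both deltas are zero:

```python
import math

def loglik_normal(t, m, v):
    """Sum of log N(t_i | m, v) over a sample."""
    return sum(-0.5 * math.log(2.0 * math.pi * v) - (x - m) ** 2 / (2.0 * v)
               for x in t)

def loglik_model2(t_early, t_late, mu, d_mu, sigma2, d_sigma2):
    """Model 2: earlier years ~ N(mu, sigma2),
    later years ~ N(mu + d_mu, sigma2 + d_sigma2)."""
    return (loglik_normal(t_early, mu, sigma2)
            + loglik_normal(t_late, mu + d_mu, sigma2 + d_sigma2))

# with d_mu = d_sigma2 = 0, Model 2 collapses to Model 1 on the pooled data
t_early, t_late = [3.1, 2.5, 4.0], [4.5, 3.9, 4.8]
same = loglik_model2(t_early, t_late, 3.8, 0.0, 1.0, 0.0)
model1 = loglik_normal(t_early + t_late, 3.8, 1.0)
```

This is why the posterior intervals for Δμ and Δσ² (below) answer the question "has anything changed?" directly.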
MCMC Done Very Badly : June 1659 - 2010

    Poor starting values (μ₀, σ₀²)
    Jumps too small
    High correlation between consecutive pairs of sample values.

[Figure: sample path wandering over the posterior distribution]

Solution :

    Better starting values : the maximum-likelihood estimates (μ̂, σ̂²)
    A burn-in sampling phase that is discarded
    A main phase with infrequent sampling, e.g. keeping every 10,000th pair
Model 1 : Posterior Distribution for June : 1659 - 2010
Burn-in stage = 1,000,000 pairs
Main sampling = 100,000,000 pairs, sampling every 10,000th

    μ : mean 14.33, 2.5th-97.5th percentiles (14.213, 14.440)
    σ² : mean 1.20, 2.5th-97.5th percentiles (1.034, 1.389)

[Figures: histograms of the sampled posteriors for μ and σ²]

Model 1 : Posterior Distribution for January : 1659 - 2010
Burn-in stage = 1,000,000 iterations
Main sampling = 100,000,000 pairs, sampling every 10,000th

    μ : mean 3.23, 2.5th-97.5th percentiles (3.022, 3.442)
    σ² : mean 4.02, 2.5th-97.5th percentiles (3.467, 4.691)
Model 1 : Posterior Distribution for January : 1981 - 2010
Burn-in stage = 1,000,000 iterations
Main sampling = 100,000,000 pairs, sampling every 10,000th

    μ : mean 4.43, 2.5th-97.5th percentiles (3.796, 5.064)
    σ² : mean 2.98, 2.5th-97.5th percentiles (1.807, 5.374)
Model 1 : Posterior Distribution for January : 1881 - 1910
Burn-in stage = 1,000,000 iterations
Main sampling = 100,000,000 pairs, sampling every 10,000th

    μ : mean 3.50, 2.5th-97.5th percentiles (2.857, 4.144)
    σ² : mean 3.17, 2.5th-97.5th percentiles (1.996, 5.893)
[Figure: January Average Temperature, °C, Central England : 1881-1910 and 1981-2010]

[Figure: Average January Temperature, °C : 1781-1810, 1881-1910, 1981-2010, with μ̂ = 2.87, 3.49, 4.44 respectively]

[Figure: Average June Temperature, °C : 1781-1810, 1881-1910, 1981-2010, with μ̂ = 14.54, 14.11, 14.48 respectively]
Model 2

    t ~ N(μ, σ²) for earlier years
    t ~ N(μ + Δμ, σ² + Δσ²) for later years
Model 2 for January : (1881, 1910) and (1981, 2010)
Burn-in stage = 10,000,000 iterations
Main stage = 100,000,000 sets of four, sampling every 10,000th

             2.5th Percentile    Median    97.5th Percentile
    μ             2.868           3.49           4.119
    Δμ            0.079           0.94           1.851
    σ²            1.942           3.03           4.686
    Δσ²          -1.588           0.07           2.276

The interval for Δμ excludes zero : January mean temperatures appear to have increased. The interval for Δσ² includes zero.
Model 2 for June : (1881, 1910) and (1981, 2010)

             2.5th Percentile    Median    97.5th Percentile
    μ            13.745          14.10          14.464
    Δμ           -0.122           0.38           0.866
    σ²            0.641           0.98           1.515
    Δσ²          -0.587          -0.09           0.778

Both the Δμ and Δσ² intervals include zero : no clear evidence of a change for June.
Model 3

    f(t | α, β, σ²) = (1/√(2πσ²)) exp( −(t − α − β·time)² / (2σ²) ),    i.e.  t ~ N(α + β·time, σ²)
Model 3 for January : 1659 - 2010

             2.5th Percentile    Median    97.5th Percentile
    α             1.921           2.35           2.770
    β             0.0028          0.0048         0.0068
    σ²            1.271           3.790          4.411

[Figure: January temperatures, Year vs °C, with fitted trend line]

    temp = 2.35 + 0.0048 (year − 1650)
Model 3 for June : 1659 - 2010

             2.5th Percentile    Median    97.5th Percentile
    α            11.873          14.10          16.830
    β            -0.00136         0.00013        0.00134
    σ²            1.035           1.20           1.391

[Figure: June temperatures, Year vs °C, with fitted trend line]

    temp = 14.096 + 0.000127 (year − 1650)

Model 3 for June : 1801 - 2010

[Figure: June temperatures, Year vs °C, with fitted trend line]

    temp = 14.18 + 0.0011 (year − 1800)
Model 3 for Annual Average : 1659 - 2010

             2.5th Percentile    Median    97.5th Percentile
    α             8.619           8.75           8.880
    β             0.0019          0.0025         0.0032
    σ²            0.319           0.369          0.430

[Figure: annual average temperatures, Year vs °C, with fitted trend line]

    temp = 8.75 + 0.0025 (year − 1650)

The interval for β excludes zero : the annual average temperature has increased, at roughly 0.0025 °C per year, or about 0.9 °C over the 352-year record.
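The fitted annual line implies a total change over the record that is worth making explicit. A small check in Python, using the posterior medians copied from the table above:

```python
def model3_mean(year, alpha=8.75, beta=0.0025, base=1650):
    """Posterior-median trend line for the annual series:
    alpha + beta * (year - base)."""
    return alpha + beta * (year - base)

# fitted mean at the start and end of the record
start, end = model3_mean(1659), model3_mean(2010)
print(round(start, 3), round(end, 3), round(end - start, 3))
```

The implied change, 0.0025 × 351 ≈ 0.88 °C, is the "statistically significant increase in annual average temperature" referred to in the closing comments.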
THOMAS BAYES
Rev. Thomas Bayes (1702-1761) His friend Richard Price edited and presented his work in 1763, after Bayes' death, as An Essay towards solving a Problem in the Doctrine of Chances. The French mathematician Pierre-Simon Laplace reproduced and extended Bayes' results in 1774, apparently quite unaware of Bayes' work. It is speculated that Bayes was elected as a Fellow of the Royal Society in 1742 on the strength of the Introduction to the Doctrine of Fluxions, as he is not known to have published any other mathematical works during his lifetime. It has been suggested that Bayes' theorem, as we now call it, was discovered by Nicholas Saunderson some time before Bayes. This is disputed.
Comments in place of Conclusions : Bayesian Inference
• This presentation was not about the Normal distribution.
• The Normal distribution was used to illustrate the methods.
• For the problems discussed here Bayesian inference offers few advantages over classical inference.
• For more complex problems, Bayesian inference offers big advantages.
• Prior to 1990 or so, Bayesian inference was a largely academic subject.
• The advent of MCMC and fast computers has made Bayesian inference a significant player in the world of data analysis.
• The number of PhDs in statistics is small and getting smaller, but most of them are absorbed in Bayesian issues.
• Bayesian approaches are now commonplace.
Comments : Climate Change
• This presentation was not about climate change.
• A more thorough analysis is required before we can say anything substantial about climate change.
• The results of a limited analysis so far indicate, for Central England, that summers are not getting warmer. The range of average summer temperatures that we see today is similar to that seen in the past.
• The range of average winter temperatures seems to be narrower than in the past, with an absence of very cold months in recent years.
• This effect could, of course, be a result of thermometers being placed in urban areas.
• The statistically significant increase in annual average temperature may be caused by increasing average winter temperatures.