Bayesian Models in R
Vivian Zhang | SupStat Inc.
Copyright SupStat Inc., All rights reserved
http://docs.supstat.com/BayesianModelEN/#1 (10/3/14, 13:37)
Outline
1. Introduction to Bayes and Bayes' Theorem
2. Distribution estimation
3. Conditional probability
4. Bayesian models
Introduction to Bayes and Bayes' Theorem
The Story Behind the Bayesian Model
Thomas Bayes
Source: http://www.bioquest.org/products/auth_images/422_bayes.gif
· 18th-century English statistician
· Best known for Bayes' theorem
· Essential contributor to the early development of probability theory
The Model
1. Models using Bayes' theorem (based on conditional probability)
   · Naive Bayes, association rules
2. Bayes decision theory
   · The classical Bayesian model for decision theory
3. Models implementing Bayesian thinking
   · Treat all parameters as random variables, especially in hierarchical models
Distribution Estimation
Distribution Estimation: Probability Density Function
· In statistics, the probability density function (PDF) of a continuous random variable describes how probable it is for the variable to fall near a given point.
· Example: plot of the PDF of the normal distribution
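A normal-density plot like the one on this slide can be reproduced in base R with dnorm (a minimal sketch; the standard-normal parameters are illustrative):

```r
# Evaluate and plot the PDF of the standard normal distribution
x = seq(-4, 4, by = 0.01)
dens = dnorm(x, mean = 0, sd = 1)
plot(x, dens, type = "l",
     main = "PDF of the Normal distribution", ylab = "density")
# The density peaks at the mean, where it equals 1/sqrt(2*pi)
dnorm(0)
```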
Distribution Estimation: Probability Density Function
· The PDF has an important place in statistics:
  - It contains all the information about the random variable.
· Knowing the PDF, we can calculate the:
  - Mean
  - Variance
  - Median
  - etc.
Distribution Estimation: Probability Density Function
Once you obtain the PDF, you can get everything from a random variable. This allows you to perform:
· Bayesian hypothesis tests
· Bayesian interval estimation
· Bayesian regression models
· Bayesian logistic models
· etc.
Distribution Estimation: Probability Density Function
Example: Bayesian regression
· Estimation methods for the regression model
    Y = Xβ + ε,  ε ∼ N(0, σ²)
· OLS (ordinary least squares)
  - β̂ = (X′X)⁻¹X′Y is the estimator of β
  - β ∼ N((X′X)⁻¹X′Y, σ²(X′X)⁻¹)
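The OLS formula above can be checked directly against lm(); a short sketch on simulated data (the design matrix and coefficients below are made up for illustration):

```r
# beta_hat = (X'X)^(-1) X'Y, computed from the normal equations
set.seed(1)
n = 200
X = cbind(1, rnorm(n))                 # intercept plus one predictor
Y = X %*% c(2, 0.5) + rnorm(n)         # Y = X beta + eps, eps ~ N(0, 1)
beta_hat = solve(t(X) %*% X) %*% t(X) %*% Y
# lm() produces the same coefficients
cbind(normal_eq = beta_hat, lm = coef(lm(Y ~ X[, 2])))
```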
Distribution Estimation: The Bayesian Model
· Before obtaining data, one has beliefs about the value of the proportion and models those beliefs in terms of a prior distribution.
· After data have been observed, one updates one's beliefs about the proportion by computing the posterior distribution.
Distribution Estimation: The Bayesian Model
· Building a Bayesian model begins with Bayesian thinking (every value has its own distribution).
· Steps to build a Bayesian model:
  - Make inferences about the prior distribution
  - Calculate the parameters of the posterior distribution
  - Finish the statistical task (interval estimation, statistical decision, etc.)
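The three steps above can be sketched with the simplest conjugate case, a Beta prior on a binomial proportion (the prior parameters and data below are invented for illustration):

```r
# Step 1: prior beliefs about a proportion, theta ~ Beta(a, b)
a = 2; b = 2
# Step 2: observe s successes in n trials; conjugacy gives
#         the posterior Beta(a + s, b + n - s) in closed form
s = 7; n = 10
a_post = a + s
b_post = b + n - s
# Step 3: finish the statistical task, e.g. a point estimate
#         and a 95% credible interval
post_mean = a_post / (a_post + b_post)
ci = qbeta(c(0.025, 0.975), a_post, b_post)
c(mean = post_mean, lower = ci[1], upper = ci[2])
```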
Inferring from the Posterior Distribution
Essentials:
· Posterior inference is the core of Bayes' theorem: we do not actually know the population distribution that generated our data, so we use the conditional distribution to address this gap indirectly. A certain degree of mathematical sophistication is required here, without which we cannot easily implement the model computationally.
· Bayes' theorem
· Conditional distribution
  - For example: in regression, ε is from a normal distribution
· Certain prior distribution
  - No information given
Calculating the Posterior Distribution
The most difficult part is calculating the posterior distribution, which requires integration.
· Markov chain Monte Carlo (MCMC)
  - Gibbs sampling
  - Metropolis-Hastings (MH) method
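A minimal random-walk Metropolis-Hastings sketch (the N(3, 1) target is invented for illustration; in practice one would use a package such as rjags or rstan):

```r
# Random-walk Metropolis-Hastings sampling from a N(3, 1) target
set.seed(42)
log_target = function(theta) dnorm(theta, mean = 3, sd = 1, log = TRUE)
n_iter = 10000
draws = numeric(n_iter)
theta = 0                               # arbitrary starting value
for (i in 1:n_iter) {
  proposal = theta + rnorm(1)           # symmetric proposal
  # accept with probability min(1, target(proposal) / target(theta))
  if (log(runif(1)) < log_target(proposal) - log_target(theta)) {
    theta = proposal
  }
  draws[i] = theta
}
mean(draws[-(1:1000)])                  # close to the target mean of 3
```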
Conditional probability
Conditional Probability: What is conditional probability?
· The probability that event A will occur given that event B has occurred. This probability is written as P(A|B).
    P(A|B) = P(AB) / P(B)
· A and B are two events
· P(AB) is the probability that both A and B occur
· P(B) is the probability that B occurs
Conditional Probability: Why conditional probability? An example
· Suppose:
  - A: the event of getting a cold
  - B: the event of a rainy day (P(B) = 0.2)
  - AB: the event that it rains and you get a cold (P(AB) = 0.1)
    P(A|B) = P(AB)/P(B) = 0.1/0.2 = 0.5
· Interpretation:
  - When it rains, the probability of getting a cold is 50%
Conditional Probability: Exercise
· There are two kids in a family.
  - If one of the kids is a boy, the probability that the other one is also a boy is 1/3.
  - If the first one is a boy, the probability that the other one is a boy is 1/2.
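Both answers can be verified by simulation (a quick sketch; 1 encodes a boy, 0 a girl):

```r
# Simulate 100,000 two-child families; each child is a boy with prob 1/2
set.seed(1)
kids = matrix(rbinom(200000, 1, 0.5), ncol = 2)
both_boys = rowSums(kids) == 2
at_least_one_boy = rowSums(kids) >= 1
# P(both boys | at least one boy): close to 1/3
mean(both_boys[at_least_one_boy])
# P(second is a boy | first is a boy): close to 1/2
mean(kids[kids[, 1] == 1, 2])
```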
Conditional Probability: Models related to conditional probability
· Apriori
  - Mining association rules
  - The confidence of the rule from A to B is defined as:
      A => B : P(AB)/P(A) = P(B|A)
· In R, use the arules package
Conditional Probability: Apriori
· Goal: find the items with strong relationships
· First, load the data:
library(arules)
data = read.csv("data/BASKETS1n")
names(data)
[1] "cardid" "value" "pmethod" "sex" "homeown" "income"
[7] "age" "fruitveg" "freshmeat" "dairy" "cannedveg" "cannedmeat"
[13] "frozenmeal" "beer" "wine" "softdrink" "fish" "confectionery"
Conditional Probability: Apriori
basket = data[, 8:18]
names(basket)[which(basket[1, ] == T)]
[1] "freshmeat" "dairy" "confectionery"
tbs2 = apply(basket, 1, function(x) names(basket)[which(x==T)])
len = sapply(tbs2, length)
require(arules)
trans.code = rep(1:1000, len)
trans.items = unname(unlist(tbs2))
trans.code.ind = match(trans.code, unique(trans.code))
trans.items.ind = match(trans.items, unique(trans.items))
Conditional Probability: Apriori
mat = sparseMatrix(i = trans.items.ind,
j = trans.code.ind,
x = 1,
dims = c(length(unique(trans.items)),
length(unique(trans.code))))
mat = as(mat, 'ngCMatrix')
# after setting the arguments we get the model:
trans.res = apriori(mat,parameter = list(confidence=0.05,
support=0.05,
minlen=2,maxlen=3))
Conditional Probability: Apriori
parameter specification:
confidence minval smax arem aval originalSupport support minlen maxlen target ext
0.05 0.1 1 none FALSE TRUE 0.05 2 3 rules FALSE
algorithmic control:
filter tree heap memopt load sort verbose
0.1 TRUE TRUE FALSE TRUE 2 TRUE
apriori - find association rules with the apriori algorithm
version 4.21 (2004.05.09) (c) 1996-2004 Christian Borgelt
set item appearances ...[0 item(s)] done [0.00s].
set transactions ...[11 item(s), 940 transaction(s)] done [0.00s].
sorting and recoding items ... [11 item(s)] done [0.00s].
creating transaction tree ... done [0.00s].
checking subsets of size 1 2 3 done [0.00s].
writing ... [108 rule(s)] done [0.00s].
Conditional Probability: Apriori
· At last, we have the items with the strongest relationships in one basket
#let's see these rules:
lhs.generic = unique(trans.items)[trans.res@lhs@data@i+1]
rhs.generic = unique(trans.items)[trans.res@rhs@data@i+1]
cbind(lhs.generic, rhs.generic)[1:10, ]
lhs.generic rhs.generic
[1,] "dairy" "confectionery"
[2,] "confectionery" "dairy"
[3,] "dairy" "fish"
[4,] "fish" "dairy"
[5,] "dairy" "fruitveg"
[6,] "fruitveg" "dairy"
[7,] "dairy" "frozenmeal"
[8,] "frozenmeal" "dairy"
[9,] "freshmeat" "confectionery"
[10,] "confectionery" "freshmeat"
Conditional Probability: Models related to conditional probability
· Naive Bayes
  - Used in recommendation systems and classification problems
  - Compute the posterior probability P(C|A1, A2, ..., An) for all values of C using Bayes' theorem:
      P(C|A1 A2 ... An) = P(A1 A2 ... An|C) × P(C) / P(A1 A2 ... An)
  - Choose the value of C that maximizes P(C|A1, A2, ..., An)
  - Equivalent to choosing the value of C that maximizes P(A1, A2, ..., An|C)P(C)
Naive Bayes

library(e1071)   # provides naiveBayes()
data(iris)
m = naiveBayes(Species ~ ., data=iris)
## alternatively:
m = naiveBayes(iris[, -5], iris[, 5])
Naive Bayes
Model:
m
Naive Bayes Classifier for Discrete Predictors
Call:
naiveBayes.default(x = iris[, -5], y = iris[, 5])
A-priori probabilities:
iris[, 5]
setosa versicolor virginica
0.33333 0.33333 0.33333
Conditional probabilities:
Sepal.Length
iris[, 5] [,1] [,2]
setosa 5.006 0.35249
Naive Bayes
Predict:
table(predict(m, iris), iris[,5])
setosa versicolor virginica
setosa 50 0 0
versicolor 0 47 3
virginica 0 3 47
From Conditional Probability to Bayes' Theorem
· We have:
    P(B|A) = P(AB)/P(A)
· So:
    P(AB) = P(B|A)P(A)
· Substituting into the conditional probability:
    P(A|B) = P(AB)/P(B) = P(B|A)P(A)/P(B)
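The inversion can be checked with the rain/cold numbers from the earlier example; the overall cold probability P(A) = 0.25 is an invented value, added only so both directions can be computed:

```r
p_B  = 0.2    # P(B): rainy day (from the earlier example)
p_AB = 0.1    # P(AB): rain and a cold (from the earlier example)
p_A  = 0.25   # P(A): assumed overall cold probability (illustrative)
p_B_given_A = p_AB / p_A                 # conditional probability
p_A_given_B = p_B_given_A * p_A / p_B    # Bayes' theorem
p_A_given_B                              # equals P(AB)/P(B) = 0.5
```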
Bayes' Theorem
    P(A|B) = P(B|A)P(A)/P(B)
· Bayes' theorem relates a conditional probability to the marginal distribution of a random variable. It tells us how to update our thinking after obtaining new data.
· Harold Jeffreys claimed that Bayes' theorem is to statistics as the Pythagorean theorem is to geometry.
Bayes' Theorem: Continuous situation
· The Bayes' theorem above is stated in discrete form
· In the real world we are often analyzing continuous random variables
· Bayes' theorem can be written in continuous form as:
    π(θ|x) = f(x|θ)π(θ) / m(x)
Bayes' Theorem: Continuous form
    π(θ|x) = f(x|θ)π(θ) / m(x)
· Here:
  - θ is an unknown parameter
  - X is the data observed
  - Processing goes from π(θ) to π(θ|x)
  - Our original knowledge of θ is updated after we observe X
Bayes' Theorem: Continuous form
· Based on the properties of continuous random variables, it can be written as:
    π(θ|x) = f(x|θ)π(θ) / ∫ f(x|θ)π(θ) dθ
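The continuous form can be evaluated numerically with integrate(); the sketch below uses an invented example, a N(0, 1) prior and one observation x = 2 from N(θ, 1), where the conjugate posterior is known to be N(1, 1/2):

```r
# Posterior density from Bayes' theorem, with m(x) computed numerically
x_obs = 2
prior      = function(theta) dnorm(theta, 0, 1)        # pi(theta)
likelihood = function(theta) dnorm(x_obs, theta, 1)    # f(x|theta)
m_x = integrate(function(t) likelihood(t) * prior(t), -Inf, Inf)$value
posterior = function(theta) likelihood(theta) * prior(theta) / m_x
# Posterior mean by numerical integration; the conjugate answer is 1
integrate(function(t) t * posterior(t), -Inf, Inf)$value
```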
Bayes' Theorem: Continuous form
Important distributions:
    π(θ|x) = f(x|θ)π(θ)/m(x) = f(x|θ)π(θ)/∫ f(x|θ)π(θ) dθ
· π(θ): the prior distribution
· π(θ|x): the posterior distribution
Bayes' Theorem: Continuous form
Other distributions:
    π(θ|x) = f(x|θ)π(θ)/m(x) = f(x|θ)π(θ)/∫ f(x|θ)π(θ) dθ
· m(x) = ∫ f(x|θ)π(θ) dθ: the marginal distribution
· f(x|θ)π(θ) = f(x, θ): the joint distribution
Bayesian Models
Bayesian Models: Bayesian thinking
data(iris)
head(iris)
Sepal.Length Sepal.Width Petal.Length Petal.Width Species
1 5.1 3.5 1.4 0.2 setosa
2 4.9 3.0 1.4 0.2 setosa
3 4.7 3.2 1.3 0.2 setosa
4 4.6 3.1 1.5 0.2 setosa
5 5.0 3.6 1.4 0.2 setosa
6 5.4 3.9 1.7 0.4 setosa
· Data are random variables with a mean of μ
Bayesian Models: Bayesian thinking
· The frequentist perspective: the mean μ is a constant
colMeans(iris[, 1:3])
Sepal.Length Sepal.Width Petal.Length
5.8433 3.0573 3.7580
Bayesian Models: Bayesian thinking
· The Bayesian perspective: the mean μ is a random variable

PROB  SEPAL LENGTH  SEPAL WIDTH  PETAL LENGTH
90%   5.843333      3.057333     3.758000
10%   Others        Others       Others
Bayesian Models
· In fact, nearly all modern Bayesian modeling uses Bayesian thinking
· Nearly all statistical models can be implemented as Bayesian-form models
· Even some non-parametric models can be transformed into Bayesian versions
· Bayes clustering
· Bayes regression
  - Logit, Probit, Tobit, Quantile, LASSO...
· Bayes neural networks
· Non-parametric Bayes
· Hierarchical models
· etc.
Bayesian Modeling Example: Question
· For a sample X1, X2, ..., Xn ∼ N(θ, σ²), we want to know the mean of this sample.
· Frequentists think θ̂ = mean(x)
· Bayesians think θ is a random variable with a distribution
· Suppose that θ ∼ N(μ, τ²)
  - Infer the posterior distribution
  - Calculate the posterior distribution
  - Estimate the mean of the sample
Bayesian Modeling Example: Inference
Inferring the posterior distribution using Bayes' theorem in continuous form:
    π(θ|x) = f(x|θ)π(θ)/m(x) = f(x|θ)π(θ)/∫ f(x|θ)π(θ) dθ
· Put the distributions into the theorem to calculate the posterior distribution
  - Prior distribution: θ ∼ N(μ, τ²)
  - Conditional distribution: x|θ ∼ N(θ, σ²)
Bayesian Modeling Example: Inference
· Combining the N(μ, τ²) prior with the N(θ, σ²) likelihood for a single observation x gives a normal posterior:
    θ|x ∼ N((σ²μ + τ²x)/(σ² + τ²), σ²τ²/(σ² + τ²))
Bayesian Modeling Example: Calculating the posterior distribution
According to the theorem, we know the mean and the variance of θ for a normal distribution.
postDis = function(miu=2, tau=4, n=100) {
  x = rnorm(n, 3, 5)                                        # simulated data
  a = list(0)
  a[[1]] = (var(x)*miu + tau^2*mean(x)) / (var(x) + tau^2)  # posterior mean
  a[[2]] = var(x)*tau^2 / (var(x) + tau^2)                  # posterior variance
  a
}
postDis(3, 5, 1000)
[[1]]
[1] 2.9284
[[2]]
[1] 12.254
Bayesian Modeling Example: Estimating the mean
· In ordinary statistics, the MLE and the moment estimator of μ in a normal distribution are both the sample mean.
· For the Bayes posterior distribution:
  - MLE → posterior maximum likelihood estimator
  - It can be considered the MLE of the posterior distribution
  - The posterior distribution is normal, too, so its mean parameter is:
      (σ²μ + τ²x)/(σ² + τ²)
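This posterior mean is a precision-weighted average of the prior mean μ and the observation x; a quick check with invented numbers shows that a vaguer prior (larger τ²) pulls the estimate toward x:

```r
# Posterior mean (sigma^2 * mu + tau^2 * x) / (sigma^2 + tau^2)
post_mean = function(mu, tau2, x, sigma2) {
  (sigma2 * mu + tau2 * x) / (sigma2 + tau2)
}
mu = 2; x = 3.5; sigma2 = 25
post_mean(mu, tau2 = 4,   x, sigma2)   # pulled toward the prior mean mu
post_mean(mu, tau2 = 1e6, x, sigma2)   # nearly equal to the observation x
```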
Bayesian Modeling Example: Estimating the mean
· x ∼ N(3, 5²), i.e. sd = 5
  - The mean is 3
· Use different prior distributions
· Observe the error in each situation
Bayesian Modeling Example
· Prior distribution: N(3, 1)
library(ggplot2)
plot_dif = function(miu=3, tau=1) {
i = seq(100, 10000, by=10)
set.seed(123)
meanCompare = function(n=100, miu=3, tau=1) {
x = rnorm(n, 3, 5)
(var(x)*miu+tau^2*mean(x))/(var(x)+tau^2)-3
}
aa = sapply(i, meanCompare, miu=miu, tau=tau)
bb = sapply(i,function(i) mean(rnorm(i,3,5))-3)
g = ggplot(data.frame(i=i, a=aa, b=bb)) +
  geom_line(aes(x=i, y=b), col="blue") +
  geom_line(aes(x=i, y=a), col="red")
print(g)
}
Bayesian Modeling Example
· Prior distribution: N(3, 1) (Bayes estimator in red, MLE in blue)
plot_dif(3, 1)
Bayesian Modeling Example
· Prior distribution: N(2, 1) (Bayes estimator in red, MLE in blue)
plot_dif(2,1)
Bayesian Modeling Example
· Prior distribution: N(2, 4) (Bayes estimator in red, MLE in blue)
plot_dif(2,4)
Bayesian Modeling Example
· Prior distribution: N(2, 100) (Bayes estimator in red, MLE in blue)
plot_dif(2,100)
Bayesian Modeling Example
1. As we can see, if the prior distribution is very accurate, the Bayes estimator is better than the ordinary estimator.
2. If the prior distribution is not accurate enough:
   · a larger prior variance is better
   · for a suitable variance, more data is better
Bayesian Modeling Example: Choosing the prior distribution
· Choosing a prior distribution...
  - If you are sure about the model, it can improve the accuracy of the estimator
  - If you are not sure, select a larger prior variance to improve the estimator