Bayesian generalized linear models and an appropriate default prior

Andrew Gelman, Aleks Jakulin, Maria Grazia Pittau, and Yu-Sung Su
Columbia University
14 August 2008
Logistic regression
[Figure: the inverse-logit curve y = logit^-1(x) for x from −6 to 6, rising from 0 to 1, with "slope = 1/4" marked at the center.]
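The "slope = 1/4" annotation is the divide-by-4 rule: the inverse-logit curve is steepest at its center, where its derivative is exactly 1/4, so a logistic regression coefficient divided by 4 is an upper bound on the change in Pr(y = 1) per unit change in the predictor. A quick numerical check in R (a sketch added here, not part of the slides):

    # The derivative of the inverse logit plogis(x) is the logistic
    # density dlogis(x); its maximum, at x = 0, is exactly 1/4.
    dlogis(0)                              # 0.25
    dlogis(2)                              # ~0.10: flatter away from the center
    curve(plogis(x), from = -6, to = 6)    # the curve on the slide
    abline(a = 0.5, b = 0.25, lty = 2)     # tangent at x = 0, slope 1/4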
A clean example
[Figure: binary data y plotted against x with the fitted curve, estimated Pr(y=1) = logit^-1(−1.40 + 0.33x); slope at the midpoint = 0.33/4.]
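A quick worked check of the annotation: by the divide-by-4 rule, the largest effect of x is about 0.33/4 ≈ 0.08, so near the midpoint of the curve (x = 1.40/0.33 ≈ 4.2, where Pr(y=1) = 1/2) each additional unit of x raises Pr(y=1) by roughly 8 percentage points.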
The problem of separation
[Figure: binary data in which every y = 0 lies below some threshold in x and every y = 1 above it; the fitted curve degenerates to a step, and the maximum-likelihood slope is infinite ("slope = infinity?").]
Separation is no joke!
glm(vote ~ female + black + income, family=binomial(link="logit"))

                     1960              1964              1968              1972
                coef.est coef.se  coef.est coef.se  coef.est coef.se  coef.est coef.se
    (Intercept)   -0.14    0.23     -1.15    0.22      0.47    0.24      0.67    0.18
    female         0.24    0.14     -0.09    0.14     -0.01    0.15     -0.25    0.12
    black         -1.03    0.36    -16.83  420.40     -3.64    0.59     -2.63    0.27
    income         0.03    0.06      0.19    0.06     -0.03    0.07      0.09    0.05
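The 1964 fit is the problem slide in numbers: the data are separated on black (one category contains only successes or only failures), so the maximum-likelihood estimate runs off toward −∞ and glm stops at an arbitrary large value, −16.83, with a meaningless standard error of 420.40. A minimal synthetic reproduction (hypothetical data, not the survey sample used on the slide):

    # Synthetic illustration of separation: y is never 1 when z is 1,
    # so the ML estimate of the z coefficient diverges toward -Inf;
    # glm returns a large negative value with an enormous standard error.
    set.seed(1)
    x <- rnorm(100)
    z <- rbinom(100, 1, 0.2)
    y <- ifelse(z == 1, 0, rbinom(100, 1, plogis(x)))
    summary(glm(y ~ x + z, family = binomial(link = "logit")))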
bayesglm()
- Bayesian logistic regression
- In the arm (Applied Regression and Multilevel modeling) package
- Replaces glm(); estimates are more numerically and computationally stable (usage sketch below)
- Student-t prior distributions for regression coefficients
- Uses an EM-like algorithm
- We went inside glm.fit to augment the iteratively weighted least squares step
- Default choices for tuning parameters (we'll get back to this!)
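In use, bayesglm() is a drop-in replacement for glm(). A sketch of the 1964 fit (nes1964 is a hypothetical data-frame name; the prior arguments spell out what I understand to be arm's defaults for logistic regression, independent t with 1 degree of freedom, i.e. Cauchy, priors with scale 2.5 on rescaled inputs):

    library(arm)   # provides bayesglm() and display()

    # Same call as glm(), plus a weakly informative default prior.
    # prior.df = 1 and prior.scale = 2.5 are (assumed) the defaults
    # for logistic regression, written out here for clarity.
    fit <- bayesglm(vote ~ female + black + income,
                    family = binomial(link = "logit"),
                    data = nes1964,              # hypothetical data frame
                    prior.df = 1, prior.scale = 2.5)
    display(fit)   # compact coef.est / coef.se summary, as on the slides

The point of the next slide is that with this prior the 1964 black coefficient comes back finite and on a scale comparable to the neighboring years.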
Regularization in action!
What else is out there?
- glm (maximum likelihood): fails under separation, gives noisy answers for sparse data
- Augment with prior "successes" and "failures": doesn't work well for multiple predictors (see the sketch below)
- brlr (Jeffreys-like prior distribution): computationally unstable
- brglm (improvement on brlr): doesn't do enough smoothing
- BBR (Laplace prior distribution): OK, not quite as good as bayesglm
- Non-Bayesian machine learning algorithms: understate uncertainty in predictions
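For concreteness, the augmentation trick in the second bullet can be tried by hand; a sketch on synthetic data (illustrative only). With one predictor it tames separation, but choosing pseudo-counts coherently across many correlated predictors is what does not scale:

    # 'Prior successes and failures' by hand (synthetic data).
    # z separates y: y is never 1 when z is 1.
    set.seed(2)
    z <- rbinom(50, 1, 0.3)
    y <- ifelse(z == 1, 0, rbinom(50, 1, 0.5))
    # Add one pseudo-success and one pseudo-failure in each level
    # of z before fitting; the estimate becomes finite.
    y_aug <- c(y, 1, 0, 1, 0)
    z_aug <- c(z, 0, 0, 1, 1)
    glm(y_aug ~ z_aug, family = binomial)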
Weakly informative priors
Information in prior distributions
- Informative prior dist
  - A full generative model for the data
- Noninformative prior dist
  - Let the data speak
  - Goal: valid inference for any θ
- Weakly informative prior dist
  - Purposely include less information than we actually have
  - Goal: regularization, stabilization (see the sketch below)
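In bayesglm() these choices map onto the prior.scale and prior.df arguments (a sketch; my assumption, from recollection of arm's own examples, is that an infinite scale switches the prior off and recovers classical glm):

    library(arm)

    # Synthetic separated data, as in the earlier sketch (hypothetical)
    set.seed(3)
    x <- rnorm(100)
    z <- rbinom(100, 1, 0.2)
    y <- ifelse(z == 1, 0, rbinom(100, 1, plogis(x)))

    # 'Noninformative': an infinite prior scale turns the prior off,
    # so this should match maximum likelihood and inherit its
    # separation problem.  (Assumed interface.)
    fit_flat <- bayesglm(y ~ x + z, family = binomial,
                         prior.scale = Inf, prior.df = Inf)

    # Weakly informative (the default): Cauchy(0, 2.5) on rescaled
    # inputs, deliberately weaker than our actual prior knowledge but
    # enough to stabilize the separated coefficient.
    fit_weak <- bayesglm(y ~ x + z, family = binomial)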