Matt VanLandeghem and Grant Sorensen

Matt VanLandeghem and Grant Sorensen. Too many parameters: lots of variance in predicted values. Too few parameters: missing important parameters.

Dec 18, 2015

Transcript
Page 1

Matt VanLandeghem and Grant Sorensen

Page 2

Too many parameters:
• Lots of variance in predicted values

Too few parameters:
• Missing important parameters

Variance/bias tradeoff

Page 3

See SAS website
• PROC GLMSELECT
• Version 9.3 documentation (not 9.2)
• http://support.sas.com/documentation/cdl/en/statug/63962/HTML/default/viewer.htm#statug_glmselect_sect037.htm

Page 4

Variable importance represented as a selection frequency
• Instead of p-value from F test

Estimates based on several “good” models

Distributions of parameter estimates

All of these help us pick the most useful model
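A selection frequency can be illustrated with a small sketch. It is written in Python rather than SAS and uses only invented numbers: the `samples` list stands in for the models a selection method might pick on five resampled data sets, and treating an unselected effect as a zero estimate is an assumption made here for the averaging step.

```python
from collections import Counter

# Hypothetical results: the effects chosen (and their estimates) when a
# selection method is run on five resampled data sets.
samples = [
    {"x1": 2.0, "x2": 1.0},
    {"x1": 1.8},
    {"x1": 2.2, "x3": 0.5},
    {"x1": 2.0, "x2": 1.2},
    {"x1": 2.0, "x2": 0.8},
]

n = len(samples)
counts = Counter(effect for fit in samples for effect in fit)

# Selection frequency: percentage of samples in which each effect was chosen.
selection_pct = {effect: 100.0 * counts[effect] / n for effect in sorted(counts)}

# Averaged estimate (assumed convention): an effect absent from a sample's
# model contributes zero, which shrinks rarely selected effects toward zero.
averaged = {
    effect: sum(fit.get(effect, 0.0) for fit in samples) / n
    for effect in sorted(counts)
}

for effect in selection_pct:
    print(f"{effect}: selected in {selection_pct[effect]:.0f}% of samples, "
          f"averaged estimate {averaged[effect]:.2f}")
```

An effect chosen in every sample keeps its full estimate, while one chosen only occasionally is pulled toward zero, so the frequency itself serves as the importance measure.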

Page 5

Any field where variable selection techniques are used
• Biology (Burnham and Anderson 2002)
• Atmospheric sciences (Sloughter et al. 2007)
• Econometrics (LeSage and Parent 2007)
• Finance (Pesaran et al. 2009)
• Psychology (Wasserman 2000)
• …and others

Page 6

SAS implementation
• GLMSELECT
• Only GLMs
• Experimental

Sensitive to correlated predictors
• e.g. Homework #4

Extension of regression
• Typical assumptions still apply
• Not a “magic” solution

Page 7

Other SAS options
• AIC or BIC from SAS procedure of choice
• Model weights based on AIC or BIC
• Averaged “by hand”
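Averaging "by hand" with AIC-based weights can be sketched as follows. The sketch is in Python rather than SAS, the model names, AIC values, and estimates are invented for illustration, and the weights follow the Akaike-weight formula from Burnham and Anderson (2002): each model's weight is exp(-Δ/2) divided by the sum of that quantity over all candidate models, where Δ is the model's AIC minus the smallest AIC.

```python
import math

# Hypothetical AIC values and estimates of the same coefficient, as they
# might be copied from the output of three separately fitted models.
fits = {
    "model_a": {"aic": 100.0, "estimate": 2.0},
    "model_b": {"aic": 102.0, "estimate": 1.5},
    "model_c": {"aic": 110.0, "estimate": 3.0},
}

# Delta AIC is measured from the best (smallest) AIC among the candidates.
best_aic = min(fit["aic"] for fit in fits.values())
raw = {name: math.exp(-0.5 * (fit["aic"] - best_aic)) for name, fit in fits.items()}

# Normalize so the weights sum to one.
total = sum(raw.values())
weights = {name: value / total for name, value in raw.items()}

# Model-averaged estimate: weighted sum of the per-model estimates.
averaged_estimate = sum(weights[name] * fits[name]["estimate"] for name in fits)

for name in fits:
    print(f"{name}: weight {weights[name]:.3f}")
print(f"model-averaged estimate: {averaged_estimate:.3f}")
```

The same arithmetic works with BIC in place of AIC. If a coefficient is missing from some models, a choice has to be made about how those models contribute, and the sketch does not address that case.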

Page 8

Burnham, K.P. and D.R. Anderson. 2002. Model selection and multimodel inference: a practical information-theoretic approach. Springer, New York.

LeSage, J.P. and O. Parent. 2007. Bayesian model averaging for spatial economic models. Geographical Analysis 39:241-267.

Pesaran, M.H., C. Schleicher, and P. Zaffaroni. 2009. Model averaging in risk management with an application to futures markets. Journal of Empirical Finance 16:280-305.

Sloughter, J.M., A.E. Raftery, T. Gneiting, and C. Fraley. 2007. Probabilistic quantitative precipitation forecasting using Bayesian model averaging. Monthly Weather Review 135:3209-3220.

Wasserman, L. 2000. Bayesian model selection and model averaging. Journal of Mathematical Psychology 44:92-107.

Whitney, M. and L. Ngo. 2004. Bayesian model averaging using SAS software. SUGI 29 Proceedings, Paper 203-29.

Pitfall picture: http://www.retrogameoftheday.com/2009/10/retro-game-of-day-pitfall.html

SAS model averaging webpage: http://support.sas.com/documentation/cdl/en/statug/63962/HTML/default/viewer.htm#statug_glmselect_sect026.htm

Page 9

ods graphics on;

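/* seed=3 makes the random sampling reproducible */
proc glmselect data=colstd seed=3 plots=all;
   /* Stepwise selection; the final model is chosen by cross validation */
   model y = x1-x9 / selection=stepwise(choose=cv);
   /* Report selection percentages and averaged parameter estimates, then
      refit using effects selected in at least 50% of the samples */
   modelAverage tables=(EffectSelectPct(all) ParmEst(all))
                refit(minpct=50 nsamples=100);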

run;

ods graphics off;
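The REFIT option, as described in the GLMSELECT documentation, runs a second round of averaging on a fixed model made up of the effects selected in at least MINPCT percent of the initial samples. The thresholding step can be sketched in Python. The selection percentages below are invented, and `refit_effects` is a hypothetical helper, not part of SAS.

```python
# Hypothetical selection percentages from an initial round of model averaging.
selection_pct = {"x1": 100.0, "x2": 62.0, "x3": 48.0, "x4": 50.0, "x5": 7.0}


def refit_effects(pct_by_effect, minpct):
    """Return the effects whose selection percentage is at least minpct."""
    return sorted(effect for effect, pct in pct_by_effect.items() if pct >= minpct)


# Mirrors refit(minpct=50): only effects chosen in at least half of the
# samples are carried into the refit model.
kept = refit_effects(selection_pct, minpct=50)
print(kept)
```

Raising the threshold gives a more parsimonious refit model. Lowering it keeps effects that were selected only occasionally.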