Topic 13: Multiple Linear Regression Example

Jan 04, 2016

Outline

• Description of example

• Descriptive summaries

• Investigation of various models

• Conclusions

Study of CS students

• Too many computer science majors at Purdue were dropping out of program

• Wanted to find predictors of success to be used in admissions process

• Predictors must be available at time of entry into program.

Data available

• GPA after three semesters

• Overall high school math grade

• Overall high school science grade

• Overall high school English grade

• SAT Math

• SAT Verbal

• Gender (of interest for other reasons)

Data for CS Example

• Y is the student’s grade point average (GPA) after 3 semesters

• 3 HS grades and 2 SAT scores are the explanatory variables (p=6 parameters, counting the intercept)

• Have n=224 students

Descriptive Statistics

data a1;
  infile 'C:\...\csdata.dat';
  input id gpa hsm hss hse satm satv genderm1;

proc means data=a1 maxdec=2;
  var gpa hsm hss hse satm satv;
run;

Output from Proc Means

Variable     N      Mean   Std Dev   Minimum   Maximum
gpa        224      2.64      0.78      0.12      4.00
hsm        224      8.32      1.64      2.00     10.00
hss        224      8.09      1.70      3.00     10.00
hse        224      8.09      1.51      3.00     10.00
satm       224    595.29     86.40    300.00    800.00
satv       224    504.55     92.61    285.00    760.00

Descriptive Statistics

proc univariate data=a1;
  var gpa hsm hss hse satm satv;
  histogram gpa hsm hss hse satm satv / normal;
run;

Correlations

proc corr data=a1;
  var hsm hss hse satm satv;

proc corr data=a1;
  var hsm hss hse satm satv;
  with gpa;
run;

Output from Proc Corr

Pearson Correlation Coefficients, N = 224
Prob > |r| under H0: Rho=0

      gpa       hsm       hss       hse       satm      satv
gpa   1.00000   0.43650   0.32943   0.28900   0.25171   0.11449
                <.0001    <.0001    <.0001    0.0001    0.0873
hsm   0.43650   1.00000   0.57569   0.44689   0.45351   0.22112
      <.0001              <.0001    <.0001    <.0001    0.0009
hss   0.32943   0.57569   1.00000   0.57937   0.24048   0.26170
      <.0001    <.0001              <.0001    0.0003    <.0001
hse   0.28900   0.44689   0.57937   1.00000   0.10828   0.24371
      <.0001    <.0001    <.0001              0.1060    0.0002
satm  0.25171   0.45351   0.24048   0.10828   1.00000   0.46394
      0.0001    <.0001    0.0003    0.1060              <.0001
satv  0.11449   0.22112   0.26170   0.24371   0.46394   1.00000
      0.0873    0.0009    <.0001    0.0002    <.0001

Output from Proc Corr

Pearson Correlation Coefficients, N = 224
Prob > |r| under H0: Rho=0

      hsm       hss       hse       satm      satv
gpa   0.43650   0.32943   0.28900   0.25171   0.11449
      <.0001    <.0001    <.0001    0.0001    0.0873

All but SATV significantly correlated with GPA
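
The p-values in the Proc Corr output come from a t test of H0: Rho=0 with n - 2 degrees of freedom. As a rough check, the sketch below recomputes the t statistics from the reported correlations. Python is used here only as a calculator (an assumption of this note, not part of the SAS workflow), and the function name is hypothetical.

```python
import math

def corr_t_statistic(r: float, n: int) -> float:
    """t statistic for H0: rho = 0, referred to a t distribution with n - 2 df."""
    return r * math.sqrt(n - 2) / math.sqrt(1.0 - r * r)

n = 224  # sample size from the Proc Corr output
# Correlations of each predictor with gpa, copied from the output above
r_with_gpa = {"hsm": 0.43650, "hss": 0.32943, "hse": 0.28900,
              "satm": 0.25171, "satv": 0.11449}

t_values = {name: corr_t_statistic(r, n) for name, r in r_with_gpa.items()}
for name, t in t_values.items():
    print(f"{name:5s} r = {r_with_gpa[name]:.5f}  t = {t:.2f}  (df = {n - 2})")
```

With 222 degrees of freedom the two-sided 5% critical value is about 1.97, so only the satv statistic falls short of it, which agrees with the reported p-value of 0.0873.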

Scatter Plot Matrix

proc corr data=a1 plots=matrix;
  var gpa hsm hss hse satm satv;
run;

Allows visual check of pairwise relationships

• No “strong” linear relationships

• Can see discreteness of high school scores

Use high school grades to predict GPA (Model #1)

proc reg data=a1;
  model gpa=hsm hss hse;
run;

Results Model #1

Root MSE           0.69984    R-Square   0.2046
Dependent Mean     2.63522    Adj R-Sq   0.1937
Coeff Var         26.55711

Parameter Estimates

Variable    DF   Parameter Estimate   Standard Error   t Value   Pr > |t|
Intercept    1              0.58988          0.29424      2.00     0.0462
hsm          1              0.16857          0.03549      4.75     <.0001
hss          1              0.03432          0.03756      0.91     0.3619
hse          1              0.04510          0.03870      1.17     0.2451

Meaningful??
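
Each t Value in the Parameter Estimates table is the estimate divided by its standard error, and Coeff Var is 100 times Root MSE divided by Dependent Mean. The sketch below recomputes both from the reported Model #1 numbers. Python is used only as a calculator here (an assumption of this note), and small differences from the SAS output reflect rounding in the printed values.

```python
# (estimate, standard error) pairs copied from the Model #1 Parameter Estimates
estimates = {
    "Intercept": (0.58988, 0.29424),
    "hsm":       (0.16857, 0.03549),
    "hss":       (0.03432, 0.03756),
    "hse":       (0.04510, 0.03870),
}

# t Value = estimate / standard error
t_values = {name: est / se for name, (est, se) in estimates.items()}
for name, t in t_values.items():
    print(f"{name:10s} t = {t:.2f}")

# Coeff Var = 100 * Root MSE / Dependent Mean
root_mse = 0.69984
dependent_mean = 2.63522
coeff_var = 100.0 * root_mse / dependent_mean
print(f"Coeff Var = {coeff_var:.3f}")
```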

ANOVA Table #1

Analysis of Variance

Source             DF   Sum of Squares   Mean Square   F Value   Pr > F
Model               3         27.71233       9.23744     18.86   <.0001
Error             220        107.75046       0.48977
Corrected Total   223        135.46279

Significant F test but not all variable t tests significant
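
The summary statistics for Model #1 can all be recovered from the three rows of the ANOVA table. The following sketch recomputes them, using Python purely as a calculator (an assumption of this note, not part of the SAS workflow in these slides).

```python
import math

# Values copied from ANOVA Table #1 (Model #1: gpa = hsm hss hse)
ss_model, df_model = 27.71233, 3
ss_error, df_error = 107.75046, 220
ss_total, df_total = 135.46279, 223

ms_model = ss_model / df_model            # Mean Square for Model
ms_error = ss_error / df_error            # Mean Square for Error (MSE)
f_value = ms_model / ms_error             # overall F test
r_square = ss_model / ss_total            # proportion of variation explained
adj_r_square = 1.0 - ms_error / (ss_total / df_total)
root_mse = math.sqrt(ms_error)

print(f"F Value  = {f_value:.2f}")
print(f"R-Square = {r_square:.4f} ({100 * r_square:.1f}%)")
print(f"Adj R-Sq = {adj_r_square:.4f}")
print(f"Root MSE = {root_mse:.5f}")
```

Adjusted R-Square replaces the sums of squares in R-Square with mean squares, so it penalizes each extra parameter.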

Remove HSS (Model #2)

proc reg data=a1;
  model gpa=hsm hse;
run;

Results Model #2

Root MSE           0.69958    R-Square   0.2016
Dependent Mean     2.63522    Adj R-Sq   0.1943
Coeff Var         26.54718

Parameter Estimates

Variable    DF   Parameter Estimate   Standard Error   t Value   Pr > |t|
Intercept    1              0.62423          0.29172      2.14     0.0335
hsm          1              0.18265          0.03196      5.72     <.0001
hse          1              0.06067          0.03473      1.75     0.0820

Slightly better MSE and adjusted R-Sq than Model #1

ANOVA Table #2

Analysis of Variance

Source             DF   Sum of Squares   Mean Square   F Value   Pr > F
Model               2         27.30349      13.65175     27.89   <.0001
Error             221        108.15930       0.48941
Corrected Total   223        135.46279

Significant F test but not all variable t tests significant
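
Dropping a single variable can also be viewed as a general linear test that compares the error sums of squares of the two models. As a hedged illustration, the sketch below compares Model #1 (full) with Model #2 (HSS removed) and checks that the resulting F statistic is, up to rounding, the square of the HSS t statistic from Model #1. Python is used only as a calculator here, and the variable names are choices made for this note.

```python
# Error rows copied from ANOVA Table #1 (full) and ANOVA Table #2 (reduced)
sse_full, df_full = 107.75046, 220        # gpa = hsm hss hse
sse_reduced, df_reduced = 108.15930, 221  # gpa = hsm hse

# General linear test: extra error SS per dropped parameter, over full-model MSE
numerator = (sse_reduced - sse_full) / (df_reduced - df_full)
denominator = sse_full / df_full
f_drop_hss = numerator / denominator

# t statistic for hss in Model #1 (estimate / standard error)
t_hss = 0.03432 / 0.03756

print(f"F for dropping hss = {f_drop_hss:.3f}")
print(f"t^2 for hss        = {t_hss ** 2:.3f}")
```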

Rerun with HSM only (Model #3)

proc reg data=a1;
  model gpa=hsm;
run;

Results Model #3

Root MSE           0.70280    R-Square   0.1905
Dependent Mean     2.63522    Adj R-Sq   0.1869
Coeff Var         26.66958

Parameter Estimates

Variable    DF   Parameter Estimate   Standard Error   t Value   Pr > |t|
Intercept    1              0.90768          0.24355      3.73     0.0002
hsm          1              0.20760          0.02872      7.23     <.0001

Slightly worse MSE and adjusted R-Sq than Model #2

ANOVA Table #3

Analysis of Variance

Source             DF   Sum of Squares   Mean Square   F Value   Pr > F
Model               1         25.80989      25.80989     52.25   <.0001
Error             222        109.65290       0.49393
Corrected Total   223        135.46279

Significant F test and all variable t tests significant
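
With a single predictor, two identities tie this output to earlier slides: R-Square equals the squared correlation between gpa and hsm, and the overall F equals the square of the hsm t statistic. The sketch below checks both against the reported values, with Python used only as a calculator (an assumption of this note).

```python
# Reported values for Model #3 (gpa = hsm)
r_gpa_hsm = 0.43650                        # from the Proc Corr output
ss_model, ss_total = 25.80989, 135.46279   # from ANOVA Table #3
ss_error, df_error = 109.65290, 222
estimate_hsm, se_hsm = 0.20760, 0.02872    # from Parameter Estimates

r_square = ss_model / ss_total
f_value = ss_model / (ss_error / df_error)  # model df is 1, so MS = SS
t_hsm = estimate_hsm / se_hsm

print(f"r^2 = {r_gpa_hsm ** 2:.4f}   R-Square = {r_square:.4f}")
print(f"t^2 = {t_hsm ** 2:.2f}     F Value  = {f_value:.2f}")
```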

SATs (Model #4)

proc reg data=a1;
  model gpa=satm satv;
run;

Results Model #4

Root MSE           0.75770    R-Square   0.0634
Dependent Mean     2.63522    Adj R-Sq   0.0549
Coeff Var         28.75287

Parameter Estimates

Variable    DF   Parameter Estimate   Standard Error   t Value   Pr > |t|
Intercept    1              1.28868          0.37604      3.43     0.0007
satm         1              0.00228       0.00066291      3.44     0.0007
satv         1          -0.00002456       0.00061847     -0.04     0.9684

Much worse MSE and adjusted R-Sq than the high school grade models

ANOVA Table #4

Analysis of Variance

Source             DF   Sum of Squares   Mean Square   F Value   Pr > F
Model               2          8.58384       4.29192      7.48   0.0007
Error             221        126.87895       0.57411
Corrected Total   223        135.46279

Significant F test but not all variable t tests significant

HS and SATs (Model #5)

proc reg data=a1;
  model gpa=satm satv hsm hss hse;
  *Does general linear test;
  sat: test satm, satv;
  hs: test hsm, hss, hse;
run;

Results Model #5

Root MSE           0.70000    R-Square   0.2115
Dependent Mean     2.63522    Adj R-Sq   0.1934
Coeff Var         26.56311

Parameter Estimates

Variable    DF   Parameter Estimate   Standard Error   t Value   Pr > |t|
Intercept    1              0.32672          0.40000      0.82     0.4149
hsm          1              0.14596          0.03926      3.72     0.0003
hss          1              0.03591          0.03780      0.95     0.3432
hse          1              0.05529          0.03957      1.40     0.1637
satm         1           0.00094359       0.00068566      1.38     0.1702
satv         1          -0.00040785       0.00059189     -0.69     0.4915

Test sat

Test sat Results for Dependent Variable gpa

Source         DF   Mean Square   F Value   Pr > F
Numerator       2       0.46566      0.95   0.3882
Denominator   218       0.49000

Cannot reject the reduced model. No significant information is lost, so we don’t need the SAT variables once the HS grades are in the model.
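
The numerator of this test is the increase in error sum of squares when the two SAT variables are dropped, divided by the 2 degrees of freedom. As a rough check, the sketch below rebuilds the test from the Error row of ANOVA Table #1 (the model with HS grades only) and the denominator reported here. Python is used only as a calculator (an assumption of this note), and the small gap from the SAS numerator reflects rounding in the printed mean square.

```python
# Reduced model (gpa = hsm hss hse): Error row of ANOVA Table #1
sse_reduced, df_reduced = 107.75046, 220

# Full model (gpa = satm satv hsm hss hse): denominator row of "Test sat"
mse_full, df_full = 0.49000, 218
sse_full = mse_full * df_full            # error SS implied by the printed MSE

dropped = df_reduced - df_full           # 2 SAT coefficients set to zero
numerator_ms = (sse_reduced - sse_full) / dropped
f_sat = numerator_ms / mse_full

print(f"Numerator mean square = {numerator_ms:.4f}")
print(f"F Value               = {f_sat:.2f}")
```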

Test hs

Test hs Results for Dependent Variable gpa

Source         DF   Mean Square   F Value   Pr > F
Numerator       3       6.68660     13.65   <.0001
Denominator   218       0.49000

Reject the reduced model. Significant information is lost, so we can’t remove the HS variables from the model.
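
Because every model here shares the same corrected total sum of squares, the general linear test can equivalently be written in terms of R-Square values. The sketch below applies that form to both tests, using the R-Square values reported for Models #1, #4 and #5. Python is used only as a calculator, the function name is hypothetical, and the results match the SAS output only up to the rounding of the printed R-Square values.

```python
def glt_f_from_r_square(r2_full: float, r2_reduced: float,
                        dropped: int, df_error_full: int) -> float:
    """F statistic for testing that `dropped` coefficients are all zero."""
    gain_per_df = (r2_full - r2_reduced) / dropped
    unexplained_per_df = (1.0 - r2_full) / df_error_full
    return gain_per_df / unexplained_per_df

r2_full = 0.2115        # Model #5: satm satv hsm hss hse
df_error_full = 218

# "hs" test: reduced model is Model #4 (SAT scores only)
f_hs = glt_f_from_r_square(r2_full, 0.0634, dropped=3, df_error_full=df_error_full)
# "sat" test: reduced model is Model #1 (HS grades only)
f_sat = glt_f_from_r_square(r2_full, 0.2046, dropped=2, df_error_full=df_error_full)

print(f"Test hs:  F = {f_hs:.2f}")
print(f"Test sat: F = {f_sat:.2f}")
```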

Best Model?

• Likely the one with just HSM or the one with HSE and HSM.

• We’ll discuss comparison methods in Chapters 7 and 8
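
To see why those two models are the leading candidates, it can help to line up the fit statistics reported for all five models. The sketch below simply sorts them by adjusted R-Sq. Python is used only to tabulate numbers already shown in the slides, and adjusted R-Sq is just one possible criterion, not a definitive selection rule.

```python
# (Root MSE, R-Square, Adj R-Sq) copied from the Results slides for each model
fits = {
    "#1 hsm hss hse":           (0.69984, 0.2046, 0.1937),
    "#2 hsm hse":               (0.69958, 0.2016, 0.1943),
    "#3 hsm":                   (0.70280, 0.1905, 0.1869),
    "#4 satm satv":             (0.75770, 0.0634, 0.0549),
    "#5 satm satv hsm hss hse": (0.70000, 0.2115, 0.1934),
}

# Rank from highest to lowest adjusted R-Sq
ranked = sorted(fits.items(), key=lambda item: item[1][2], reverse=True)
for name, (root_mse, r_sq, adj_r_sq) in ranked:
    print(f"Model {name:26s} Root MSE {root_mse:.5f}  "
          f"R-Sq {r_sq:.4f}  Adj R-Sq {adj_r_sq:.4f}")

lowest_root_mse = min(fits, key=lambda name: fits[name][0])
```

R-Square alone always favors the largest model (#5), while adjusted R-Sq and Root MSE both favor Model #2, with the HSM-only model close behind and much simpler.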

Key ideas from case study

• First, look at graphical and numerical summaries one variable at a time

• Then, look at relationships between pairs of variables with graphical and numerical summaries.

• Use plots and correlations to understand relationships

• The relationship between a response variable and an explanatory variable depends on what other explanatory variables are in the model

• A variable can be a significant (P<.05) predictor alone and not significant (P>.05) when other X’s are in the model

• Regression coefficients, standard errors and the results of significance tests depend on what other explanatory variables are in the model

• Significance tests (P values) do not tell the whole story

• Squared multiple correlations (which give the proportion of variation in the response variable explained by the explanatory variables) can give a different view

• We often express R² as a percent

• You can fully understand the theory in terms of Y = Xβ + e

• However, to use this methodology effectively in practice, you need to understand how the data were collected, the nature of the variables, and how they relate to each other
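
As a reminder of what the matrix form means computationally, the least squares estimate solves the normal equations (X'X)b = X'Y. The following minimal sketch does this for one predictor plus an intercept. The four data points are hypothetical toy values chosen for easy arithmetic (they are not from the CS study), and Python is an assumption of this note rather than part of the SAS workflow.

```python
# Hypothetical toy data (NOT from the CS study): four observations, one predictor
x = [1.0, 2.0, 3.0, 4.0]
y = [1.0, 3.0, 2.0, 4.0]

# X has a column of ones (intercept) and a column of x values, so
# X'X = [[n, sum_x], [sum_x, sum_xx]] and X'Y = [sum_y, sum_xy]
n = len(x)
sum_x = sum(x)
sum_xx = sum(v * v for v in x)
sum_y = sum(y)
sum_xy = sum(a * b for a, b in zip(x, y))

# Solve the 2x2 normal equations (X'X) b = X'Y by Cramer's rule
det = n * sum_xx - sum_x * sum_x
b0 = (sum_xx * sum_y - sum_x * sum_xy) / det   # intercept
b1 = (n * sum_xy - sum_x * sum_y) / det        # slope

# Residuals e = Y - Xb; they sum to zero when an intercept is included
residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]

print(f"b0 = {b0:.2f}, b1 = {b1:.2f}")
print("residuals:", [round(e, 2) for e in residuals])
```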

Background Reading

• Cs2.sas contains the SAS commands used in this topic