Multiple Regression, HH Chapter 9 (October 31, 2005)

Feb 02, 2019


dangdieu


Topics

- Regression with Two or More Predictors
- Matrix Version of Regression
- Hat Matrix & Leverage
- Added Variable Plots
- Interpretation


Air Pollution Data

- hh/datasets/usair.dat
- Response: SO2 measurements in 41 metropolitan areas
- Predictors:
  - temp
  - mgfirms
  - popn
  - wind
  - precip
  - raindays

Model?


Scatterplot Matrix

Original Variables

[Figure: scatterplot matrix of SO2, temp, mgfirms, popn, wind, precip and raindays]


Correlations between Variables

           SO2   temp  firms   popn   wind precip   rain
SO2       1.00  -0.43   0.64   0.49   0.09   0.05   0.37
temp     -0.43   1.00  -0.19  -0.06  -0.35   0.39  -0.43
firms     0.64  -0.19   1.00   0.96   0.24  -0.03   0.13
popn      0.49  -0.06   0.96   1.00   0.21  -0.03   0.04
wind      0.09  -0.35   0.24   0.21   1.00  -0.01   0.16
precip    0.05   0.39  -0.03  -0.03  -0.01   1.00   0.50
rain      0.37  -0.43   0.13   0.04   0.16   0.50   1.00

Which explanatory variable leads to the “best” simple linear regression? What is its $R^2$? Can we do “better” by including other variables? (Transformations?)
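In a simple linear regression, $R^2$ equals the squared correlation between the response and its single predictor, so the SO2 row of the table already answers the first two questions. A minimal sketch using the rounded correlations above (so the resulting $R^2$ is only approximate); the vector name `r_so2` is illustrative:

```r
# Correlations of SO2 with each predictor, copied from the (rounded) table above
r_so2 <- c(temp = -0.43, firms = 0.64, popn = 0.49,
           wind = 0.09, precip = 0.05, rain = 0.37)

# With a single predictor, R^2 is the squared correlation, so the
# "best" simple regression uses the predictor with the largest |r|
best    <- names(which.max(abs(r_so2)))
r2_best <- unname(r_so2[best])^2

best     # "firms"
r2_best  # 0.4096
```

On the original scale, firms alone accounts for roughly 41% of the variation in SO2; whether other predictors or transformations do better is the question the multiple regression takes up.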


Multiple Regression with p Predictors

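Model:

- Observe data $\{Y_i, x_{i1}, \dots, x_{ip}\}$, $i = 1, \dots, n$
- $E[Y_i \mid x_{i1}, \dots, x_{ip}] = f(x_{i1}, \dots, x_{ip})$
- First approximation (first-order Taylor series):
  $E[Y_i \mid x_{i1}, \dots, x_{ip}] \equiv \mu_i = \beta_0 + x_{i1}\beta_1 + \dots + x_{ip}\beta_p$
- Normal model:
  $Y_i \overset{\text{ind}}{\sim} N(\mu_i, \sigma^2) \iff Y_i = \beta_0 + x_{i1}\beta_1 + \dots + x_{ip}\beta_p + \varepsilon_i, \quad \varepsilon_i \overset{\text{iid}}{\sim} N(0, \sigma^2)$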

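- OLS (equivalently, the MLE under the normal model): find $\beta_0, \dots, \beta_p$ that minimize
  $\sum_i \bigl(Y_i - (\beta_0 + x_{i1}\beta_1 + \dots + x_{ip}\beta_p)\bigr)^2 \equiv \sum_i e_i^2$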
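The OLS criterion is an ordinary function of $(\beta_0, \dots, \beta_p)$, so it can also be minimized numerically. A sketch on simulated data (the data, true coefficients and seed are made up for illustration) comparing the general-purpose optimizer `optim()` with `lm()`:

```r
set.seed(1)
# Simulated data from the normal model with p = 2 predictors (illustrative only)
n  <- 100
x1 <- rnorm(n)
x2 <- rnorm(n)
y  <- 1 + 2 * x1 - 0.5 * x2 + rnorm(n, sd = 0.5)

# OLS criterion: sum of squared residuals as a function of (b0, b1, b2)
rss <- function(beta) sum((y - (beta[1] + beta[2] * x1 + beta[3] * x2))^2)

# Minimize numerically, starting from zero
opt <- optim(c(0, 0, 0), rss, method = "BFGS")

# lm() solves the same least-squares problem directly
fit <- lm(y ~ x1 + x2)
cbind(optim = opt$par, lm = coef(fit))
```

The two columns should agree to several decimal places; `lm()` is preferred in practice because it solves the problem exactly rather than iteratively.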


Fitting Models in R

Choice of transformation of response and predictors? The Box-Cox procedure can be used to find the “best” power transformation of Y (for a given set of transformed predictors).

poll.lm = lm(SO2 ~ temp + firms + popn + wind +
               precip + rain,
             data = pollution)

# plot diagnostics (R 2.2)
par(mfrow = c(2, 2))
plot(poll.lm, ask = FALSE)

# Box-Cox profile log-likelihood for a power transformation of SO2
library(MASS)
boxcox(poll.lm)
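By default `boxcox()` plots the profile log-likelihood of the power $\lambda$; with `plotit = FALSE` it instead returns the $\lambda$ grid (`x`) and the log-likelihood values (`y`), so the maximizing $\lambda$ can be read off. A sketch on simulated data in which $\log Y$ is linear in the predictor, so a $\lambda$ near 0 (the log transformation) is expected; the data are illustrative only:

```r
library(MASS)  # provides boxcox()

set.seed(2)
# Simulated positive response whose logarithm is linear in x (illustrative only)
n <- 200
d <- data.frame(x = runif(n, 0, 2))
d$y <- exp(1 + 0.8 * d$x + rnorm(n, sd = 0.5))

# y = TRUE stores the response in the fitted object
fit <- lm(y ~ x, data = d, y = TRUE)

# plotit = FALSE returns the lambda grid (x) and profile log-likelihood (y)
bc <- boxcox(fit, lambda = seq(-2, 2, by = 0.05), plotit = FALSE)
lambda_hat <- bc$x[which.max(bc$y)]
lambda_hat  # should be close to 0, pointing to log(y)
```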


Scatterplot - log response

[Figure: scatterplot matrix of log(SO2) with temp, firms, popn, wind, precip and rain]


Residuals

$e_i = Y_i - \hat{Y}_i = Y_i - (\hat\beta_0 + x_{i1}\hat\beta_1 + \dots + x_{ip}\hat\beta_p)$

[Figure: diagnostic plots for poll.lm: Residuals vs Fitted, Normal Q-Q, Scale-Location, and Residuals vs Leverage with Cook's distance contours at 0.5 and 1. Cases 31, 30 and 26 are labeled in the first three panels; cases 31, 1 and 25 in the last.]
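The residual definition and the four diagnostic panels can be reproduced for any `lm` fit. A sketch on simulated data (illustrative only) checking that $Y_i - \hat{Y}_i$ matches `residuals()`:

```r
set.seed(3)
# Simulated data with two predictors (illustrative only)
n  <- 50
x1 <- rnorm(n)
x2 <- rnorm(n)
y  <- 3 + x1 + 2 * x2 + rnorm(n)
fit <- lm(y ~ x1 + x2)

# Residuals: observed minus fitted values
e <- y - fitted(fit)
all.equal(unname(e), unname(residuals(fit)))  # TRUE

# With an intercept in the model, the residuals sum to (numerically) zero
sum(e)

# Default plot.lm() panels: Residuals vs Fitted, Normal Q-Q,
# Scale-Location, Residuals vs Leverage
par(mfrow = c(2, 2))
plot(fit)
```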


Matrix Notation

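$Y_1 = 1\beta_0 + x_{11}\beta_1 + \dots + x_{1p}\beta_p + \varepsilon_1$
$Y_2 = 1\beta_0 + x_{21}\beta_1 + \dots + x_{2p}\beta_p + \varepsilon_2$
$\vdots$
$Y_n = 1\beta_0 + x_{n1}\beta_1 + \dots + x_{np}\beta_p + \varepsilon_n$

$Y = 1_n\beta_0 + X_1\beta_1 + \dots + X_p\beta_p + \varepsilon$

$Y = X\beta + \varepsilon$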

where $X = [1_n \; X_1 \; \cdots \; X_p]$ is an $n \times (p + 1)$ matrix, $Y$ and $X_j$ are vectors of length $n$, and $\beta = (\beta_0, \dots, \beta_p)^T$.
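In R, `model.matrix()` builds the design matrix $X = [1_n \; X_1 \; \cdots \; X_p]$ from a model formula. A small sketch with three simulated predictors (names and data are illustrative):

```r
set.seed(4)
# Three simulated predictors and a response (illustrative only)
d <- data.frame(x1 = rnorm(10), x2 = rnorm(10), x3 = rnorm(10))
d$y <- 1 + d$x1 - d$x2 + 0.5 * d$x3 + rnorm(10)

# model.matrix() builds X = [1_n, X_1, ..., X_p] from the formula
X <- model.matrix(y ~ x1 + x2 + x3, data = d)
dim(X)       # 10 4, i.e. n x (p + 1) with n = 10, p = 3
colnames(X)  # "(Intercept)" "x1" "x2" "x3"
head(X, 3)
```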


MLEs in Matrix Notation

The MLE of $\beta$ maximizes

$Q(\beta) = -(Y - X\beta)^T (Y - X\beta)$

(equivalently, the OLS solution minimizes $-Q(\beta) = (Y - X\beta)^T (Y - X\beta)$)

Solution: $\hat\beta = (X^TX)^{-1}X^TY$
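The closed-form solution can be checked against `lm()`. A sketch on simulated data (illustrative only); it solves the normal equations $X^TX\beta = X^TY$ with `solve()` rather than forming $(X^TX)^{-1}$ explicitly, while `lm()` itself uses a QR decomposition, which is numerically more stable:

```r
set.seed(5)
# Simulated data (illustrative only)
n <- 40
d <- data.frame(x1 = rnorm(n), x2 = runif(n))
d$y <- 2 - d$x1 + 3 * d$x2 + rnorm(n, sd = 0.4)

X <- model.matrix(y ~ x1 + x2, data = d)
Y <- d$y

# beta_hat = (X^T X)^{-1} X^T Y, computed by solving X^T X b = X^T Y
beta_hat <- drop(solve(crossprod(X), crossprod(X, Y)))

fit <- lm(y ~ x1 + x2, data = d)
cbind(normal_equations = beta_hat, lm = coef(fit))
```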


Hat Matrix

$H \equiv X(X^TX)^{-1}X^T$ is an $n \times n$ projection matrix:

- $H^T = H$ (symmetric)
- $HH = H^2 = H$ (idempotent)
- $HY = X(X^TX)^{-1}X^TY = X\hat\beta = \hat{Y}$ ($H$ puts the hat on $Y$, hence the name hat matrix)
- $(I_n - H)$ is also a projection matrix ($I_n$ is the identity matrix)
- $(I_n - H)Y = Y - \hat{Y} = e$

$h_i$, the $i$th diagonal element of $H$, is the leverage of case $i$: a measure of how far the $i$th set of predictor values is from the rest of the data (more in Chapter 11).
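Each property listed above can be verified numerically. A sketch on simulated data (illustrative only) that forms $H$ explicitly, which is fine for small $n$ but wasteful for large $n$ because $H$ is $n \times n$:

```r
set.seed(6)
# Simulated data (illustrative only)
n <- 30
d <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
d$y <- 1 + d$x1 + d$x2 + rnorm(n)
fit <- lm(y ~ x1 + x2, data = d)

X <- model.matrix(fit)
H <- unname(X %*% solve(crossprod(X)) %*% t(X))  # n x n hat matrix

isSymmetric(H)                                              # H^T = H
all.equal(H %*% H, H)                                       # HH = H
all.equal(drop(H %*% d$y), unname(fitted(fit)))             # HY = Y-hat
all.equal(drop((diag(n) - H) %*% d$y), unname(resid(fit)))  # (I - H)Y = e
all.equal(diag(H), unname(hatvalues(fit)))                  # leverages h_i
sum(diag(H))                                                # trace = p + 1 = 3
```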


Leverage

[Figure: leverage values, hatvalues(poll.lm), plotted against case index]

High-leverage point if $h_i > 2(p + 1)/n$

Case 11?
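Because $\operatorname{tr}(H) = p + 1$, the average leverage is $(p + 1)/n$, and the rule above flags cases with more than twice the average. For the pollution model ($p = 6$ predictors, $n = 41$ cities) the cut-off is $14/41 \approx 0.34$. A sketch with a hypothetical helper `high_leverage()` (not part of any package), applied to a small constructed data set with one far-out predictor value:

```r
# Hypothetical helper (not from any package): indices of cases with
# leverage above twice the average, i.e. h_i > 2(p + 1)/n
high_leverage <- function(fit) {
  h  <- hatvalues(fit)
  p1 <- length(coef(fit))   # p + 1, counting the intercept
  which(h > 2 * p1 / length(h))
}

# Cut-off for the pollution model: p = 6 predictors, n = 41 cities
cutoff_pollution <- 2 * (6 + 1) / 41
cutoff_pollution   # about 0.341

# Constructed example: 29 evenly spaced x values plus one far-out value
set.seed(7)
x <- c(seq(-1, 1, length.out = 29), 8)
y <- 1 + 0.5 * x + rnorm(30)
fit <- lm(y ~ x)
high_leverage(fit)  # flags case 30
```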


New Model

poll.lm3 = lm(log(SO2) ~ temp + log(firms) + log(popn) +
                wind + precip + rain, data = pollution)

[Figure: diagnostic plots for poll.lm3: Residuals vs Fitted, Normal Q-Q, Scale-Location, and Residuals vs Leverage with Cook's distance contours at 0.5 and 1. Cases 37, 25 and 31 are labeled in the first three panels; cases 25, 31 and 11 in the last.]


Added Variable Plots

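What is the effect of adding $X_j$ to the model after all the other $X$'s have been included?

- Regress $X_j$ on $X_1, \dots, X_{j-1}, X_{j+1}, \dots, X_p$
- Find the residuals $X_j - \hat{X}_j \equiv X_j \mid \cdot$
- Regress $Y$ on $X_1, \dots, X_{j-1}, X_{j+1}, \dots, X_p$
- Find the residuals $Y - \hat{Y}_{(1, \dots, j-1, j+1, \dots, p)} \equiv e_j$
- Plot $e_j$ versus $X_j \mid \cdot$
- The slope of the fitted line is $\hat\beta_j$ from the regression on all the $X$'s (the adjusted effect)
- Look for a need to transform, non-constant variance, outliers, etc.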
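The recipe above can be carried out directly with `lm()` and `resid()`. A sketch on simulated data with correlated predictors (all names and values are illustrative), checking that the slope in the added-variable plot for `x2` equals its coefficient in the full model:

```r
set.seed(8)
# Simulated data with correlated predictors (illustrative only)
n <- 60
d <- data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n))
d$x2 <- d$x2 + 0.7 * d$x1
d$y  <- 1 + d$x1 + 2 * d$x2 - d$x3 + rnorm(n)

full <- lm(y ~ x1 + x2 + x3, data = d)

# Residuals of x2 after regressing it on the other predictors (x2 | .)
x2_given_others <- resid(lm(x2 ~ x1 + x3, data = d))
# Residuals of y after regressing it on the same predictors (e_2)
e_2 <- resid(lm(y ~ x1 + x3, data = d))

# Plot e_2 against x2 | . and fit a line
av_fit <- lm(e_2 ~ x2_given_others)
plot(x2_given_others, e_2, xlab = "x2 | others", ylab = "y | others")
abline(av_fit)

# The slope matches the x2 coefficient from the full model
c(av_slope = unname(coef(av_fit)[2]), full_model = unname(coef(full)["x2"]))
```

The equality is exact up to rounding (the Frisch-Waugh-Lovell result), and the residuals of the added-variable fit coincide with those of the full model.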


Added Variable Plots in R

# use poll.lm3
library(car)   # "Companion to Applied Regression"
# av.plots() from older versions of car is avPlots() in current versions
help(avPlots)
avPlots(poll.lm3)


av.plots

[Figure: added-variable plots for poll.lm3, one panel per predictor (temp, log(firms), log(popn), wind, precip, rain), each showing log(SO2) | others against the predictor | others]


Model Fitting

EDA used throughout:

- scatterplots
- Box-Cox or ladder of powers
- leverage plots
- residual plots
- added variable plots

Iterate model building until the “assumptions” of linearity and constant variance seem plausible.


summary(poll.lm3) (abbreviated)

Coefficients:
              Estimate  Std. Error t value Pr(>|t|)
(Intercept)  6.7142760   1.6475086   4.075 0.000261 ***
temp        -0.0649495   0.0227711  -2.852 0.007333 **
log(firms)   0.3698588   0.1934076   1.912 0.064289 .
log(popn)   -0.1771293   0.2335520  -0.758 0.453428
wind        -0.1738606   0.0656713  -2.647 0.012204 *
precip       0.0156032   0.0132718   1.176 0.247893
rain         0.0009153   0.0057335   0.160 0.874104
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1

Residual standard error: 0.5108 on 34 degrees of freedom
Multiple R-Squared: 0.5503, Adjusted R-squared: 0.471
F-statistic: 6.936 on 6 and 34 DF, p-value: 7.12e-05
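Several printed quantities follow from the others by simple arithmetic: each t value is Estimate / Std. Error on $n - p - 1 = 34$ residual degrees of freedom, adjusted $R^2 = 1 - (1 - R^2)(n - 1)/(n - p - 1)$, and $F = (R^2/p) / \bigl((1 - R^2)/(n - p - 1)\bigr)$. A sketch that recomputes them from the printed (rounded) values, so agreement is only up to rounding:

```r
# Values copied from summary(poll.lm3) above (rounded as printed)
n <- 41                      # cities
p <- 6                       # predictors
df_resid <- n - p - 1        # 34 residual degrees of freedom

est_temp <- -0.0649495
se_temp  <-  0.0227711
r2       <-  0.5503

t_temp <- est_temp / se_temp                     # t = Estimate / Std. Error
p_temp <- 2 * pt(-abs(t_temp), df = df_resid)    # two-sided p-value

adj_r2 <- 1 - (1 - r2) * (n - 1) / df_resid
f_stat <- (r2 / p) / ((1 - r2) / df_resid)
p_f    <- pf(f_stat, p, df_resid, lower.tail = FALSE)

round(c(t = t_temp, p = p_temp, adj_r2 = adj_r2, F = f_stat), 4)
p_f
```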


Interpretation

- coefficients and their standard errors (in original units)
- t-statistics and p-values
- $R^2$ and adjusted $R^2$
- residual standard error
- F statistic and p-value
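Because `poll.lm3` models $\log(\text{SO2})$, its coefficients act multiplicatively on SO2. A coefficient $b$ on a predictor entered linearly (such as `temp`) multiplies the median SO2 by $e^{b}$ per unit increase, holding the other predictors fixed; a coefficient on a log-transformed predictor (such as `log(firms)`) is an elasticity, so multiplying firms by $c$ multiplies the median SO2 by $c^{b}$. A sketch of this back-transformation using the estimates printed above:

```r
# Estimates from summary(poll.lm3); the response is log(SO2)
b_temp   <- -0.0649495
se_temp  <-  0.0227711
b_lfirms <-  0.3698588

# temp enters linearly: each extra unit of temp multiplies median SO2
# by exp(b), holding the other predictors fixed
mult_temp <- exp(b_temp)            # about 0.937, roughly a 6% decrease

# log(firms) is log-transformed: doubling firms multiplies median SO2 by 2^b
mult_double_firms <- 2^b_lfirms     # about 1.29

# 95% confidence interval for the temp multiplier (34 residual df)
ci_temp <- exp(b_temp + c(-1, 1) * qt(0.975, df = 34) * se_temp)

round(c(mult_temp = mult_temp, mult_double_firms = mult_double_firms), 3)
round(ci_temp, 3)
```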