Copyright © 2012 Pearson Education. All rights reserved.

Chapter 18

Multiple Regression


18.1 The Multiple Regression Model

For simple regression, the predicted value depends on only one predictor variable:

$\hat{y} = b_0 + b_1 x$

For multiple regression, we write the regression model with more predictor variables:

$\hat{y} = b_0 + b_1 x_1 + b_2 x_2 + \cdots + b_k x_k$
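The multiple regression model above is fit by ordinary least squares. A minimal sketch in Python with NumPy; all data here are invented, and the variables merely echo the Bedrooms/Living Area flavor of the chapter's example:

```python
import numpy as np

# Minimal least-squares fit of y-hat = b0 + b1*x1 + b2*x2.
# Made-up data (NOT the textbook's Saratoga Springs sample).
rng = np.random.default_rng(0)
n = 50
x1 = rng.uniform(1, 5, n)                   # e.g. a Bedrooms-like predictor
x2 = rng.uniform(1000, 3000, n)             # e.g. a Living Area-like predictor
y = 20000 + 5000 * x1 + 90 * x2 + rng.normal(0, 5000, n)

X = np.column_stack([np.ones(n), x1, x2])   # design matrix with intercept column
b, *_ = np.linalg.lstsq(X, y, rcond=None)   # b = (b0, b1, b2)
y_hat = X @ b                               # fitted (predicted) values
print(b)
```

Statistical software reports the same coefficients along with standard errors and t-ratios; `lstsq` shows only the algebraic core of the fit.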


Simple Regression Example: Zillow.com, Home Price vs. Bedrooms, Saratoga Springs, NY

Random sample of 1057 homes. Can Bedrooms be used to predict Price?

Approximately linear relationship

Equal Spread Condition is violated.

Be cautious about using inferential methods on these data.


Simple Regression Example: Zillow.com, Home Price vs. Bedrooms, Saratoga Springs, NY
Computer regression output:

• The variation in Bedrooms accounts for only 21% of the variation in Price.

• Perhaps the inclusion of another factor can account for a portion of the remaining variation.


Multiple Regression: Include Living Area as a predictor in the regression model.
Computer regression output:

• Now the model accounts for 58% of the variation in Price.


Multiple Regression:

Residuals (as with simple regression): $e = y - \hat{y}$

Degrees of freedom: $df = n - k - 1$, where n = number of observations and k = number of predictor variables

Standard deviation of residuals: $s_e = \sqrt{\dfrac{\sum (y - \hat{y})^2}{n - k - 1}}$
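These residual quantities are easy to compute directly. A tiny illustration with made-up observed and fitted values for a k = 2 predictor model:

```python
import numpy as np

# Residuals e = y - y_hat, df = n - k - 1, and the residual standard
# deviation s_e, on a tiny invented example.
y     = np.array([10.0, 12.0, 15.0, 11.0, 14.0, 16.0])
y_hat = np.array([10.5, 11.5, 14.5, 11.5, 14.5, 15.5])  # fitted values (invented)

e = y - y_hat                     # residuals
n, k = len(y), 2                  # 6 observations, 2 predictors
df = n - k - 1                    # 6 - 2 - 1 = 3
s_e = np.sqrt(np.sum(e**2) / df)  # residual standard deviation
print(df, s_e)
```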


18.2 Interpreting Multiple Regression Coefficients

NOTE: The meaning of the coefficients in multiple regression can be subtly different than in simple regression.

Price drops with increasing bedrooms? How can this be correct?

$\widehat{Price} = 28986.10 - 7483.10\,Bedrooms + 93.84\,Living\ Area$


In a multiple regression, each coefficient takes into account all the other predictor(s) in the model. For houses with similar-sized Living Areas, more bedrooms means smaller bedrooms and/or smaller common living space. Cramped rooms may decrease the value of a house.

Homes in sample with 2500 ≤ Living Area ≤ 3000


So, what’s the correct answer to the question:

“Do more bedrooms tend to increase or decrease the price of a home?”

Correct answer:

“increase” if Bedrooms is the only predictor (“more bedrooms” may mean “bigger house”, after all!)

“decrease” if Bedrooms increases for fixed Living Area (“more bedrooms” may mean “smaller, more-cramped rooms”)
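This sign change is easy to reproduce with synthetic data: Bedrooms alone predicts Price positively, but once a strongly related Living Area variable joins the model, the Bedrooms coefficient turns negative. All numbers below are invented for illustration:

```python
import numpy as np

# Sketch of the sign-change phenomenon with made-up data.
rng = np.random.default_rng(1)
n = 200
bedrooms = rng.integers(1, 6, n).astype(float)
living_area = 600 * bedrooms + rng.normal(0, 150, n)  # bigger houses have more bedrooms
price = 100 * living_area - 5000 * bedrooms + rng.normal(0, 2000, n)

# Simple regression: Price on Bedrooms only
X1 = np.column_stack([np.ones(n), bedrooms])
b_simple, *_ = np.linalg.lstsq(X1, price, rcond=None)

# Multiple regression: Price on Bedrooms AND Living Area
X2 = np.column_stack([np.ones(n), bedrooms, living_area])
b_multi, *_ = np.linalg.lstsq(X2, price, rcond=None)

print(b_simple[1], b_multi[1])  # positive slope, then negative slope
```

The simple slope is positive because "more bedrooms" is a proxy for "bigger house"; the multiple slope is negative because, for a fixed Living Area, extra bedrooms mean more-cramped rooms, exactly as the slide argues.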


Summarizing:

Multiple regression coefficients must be interpreted in terms of the other predictors in the model.


Example: Ticket Prices
On a typical night in New York City, about 25,000 people attend a Broadway show, paying an average price of more than $75 per ticket. Data for most weeks of 2006–2008 include the variables Paid Attendance (thousands), # Shows, and Average Ticket Price ($) to predict Receipts ($ million). Consider the regression model for these variables.

Dependent variable is: Receipts($M)
R squared = 99.9%   R squared (adjusted) = 99.9%
s = 0.0931 with 74 degrees of freedom

Source      Sum of Squares   df   Mean Square   F-ratio
Regression      484.789       3     161.596      18634
Residual          0.641736   74       0.008672


Example: Ticket Prices
Write the regression model for these variables.

Interpret the coefficient of Paid Attendance.
Estimate receipts when paid attendance was 200,000 customers attending 30 shows at an average ticket price of $70.
Is this likely to be a good prediction? Why or why not?

Variable               Coeff     SE(Coeff)   t-ratio   P-value
Intercept             –18.320    0.3127      –58.6      0.0001
Paid Attend             0.076    0.0006      126.7      0.0001
# Shows                 0.0070   0.0044        1.6      0.116
Average Ticket Price    0.24     0.0039       61.5      0.0001


Example : Ticket Prices Write the regression model for these variables.

Interpret the coefficient of Paid Attendance. If the number of shows and ticket price are fixed, an increase of 1000 customers generates an average increase of $76,000 in receipts.
Estimate receipts when paid attendance was 200,000 customers attending 30 shows at an average ticket price of $70. $13.89 million.
Is this likely to be a good prediction? Yes; R2 (adjusted) is 99.9%, so this model explains most of the variability in Receipts.

$\widehat{Receipts} = -18.32 + 0.076\,Paid\ Attendance + 0.007\,\#Shows + 0.24\,Average\ Ticket\ Price$
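The estimate asked for can be checked by plugging the fitted coefficients into the model. Paid Attendance is measured in thousands, so 200,000 customers enters as 200, and Receipts come out in $ million:

```python
# Plug the fitted coefficients from the regression output into the model.
b0, b_attend, b_shows, b_price = -18.32, 0.076, 0.007, 0.24

receipts = b0 + b_attend * 200 + b_shows * 30 + b_price * 70
print(receipts)  # ≈ 13.89 ($ million), matching the answer above
```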


18.3 Assumptions and Conditions for the Multiple Regression Model

Linearity Assumption
Linearity Condition: Check each of the predictors.

Home Prices Example: Linearity Condition is well-satisfied for both Bedrooms and Living Area.



Linearity Assumption
Linearity Condition: Also check the residual plot.

Home Prices Example: Linearity Condition is well-satisfied.


Independence Assumption

As usual, there is no way to be sure the assumption is satisfied. But think about how the data were collected to decide whether the assumption is reasonable.

Randomization Condition: Does the data collection method introduce any bias?


Equal Variance Assumption

Equal Spread Condition: The variability of the errors should be about the same for each predictor.

Use scatterplots to assess the Equal Spread Condition.

Residuals vs. Predicted Values: Home Prices


Normality Assumption

Nearly Normal Condition: Check to see if the distribution of residuals is unimodal and symmetric.

Home Price Example: The “tails” of the distribution appear to be non-Normal.


Summary of Multiple Regression Model and Condition Checks:

1. Check Linearity Condition with a scatterplot for each predictor. If necessary, consider data re-expression.

2. If the Linearity Condition is satisfied, fit a multiple regression model to the data.

3. Find the residuals and predicted values.

4. Inspect a scatterplot of the residuals against the predicted values. Check for nonlinearity and non-uniform variation.


5. Think about how the data were collected.

Do you expect the data to be independent?

Was suitable randomization utilized?

Are the data representative of a clearly identifiable population?

Is autocorrelation an issue?


6. If the conditions check, feel free to interpret the regression model and use it for prediction.

7. Check the Nearly Normal Condition by inspecting a residual distribution histogram and a Normal plot. If the sample size is large, the Normality is less important for inference. Watch for skewness and outliers.


18.4 Testing the Multiple Regression Model
There are several hypothesis tests in multiple regression.

Each is concerned with whether the underlying parameters (slopes and intercept) are actually zero.

The hypothesis for slope coefficients:

$H_0: \beta_1 = \beta_2 = \cdots = \beta_k = 0$

$H_A: \text{at least one } \beta_j \neq 0$

Test the hypothesis with an F-test (a generalization of the t-test to more than one predictor).


The F-distribution has two degrees of freedom:

k, where k is the number of predictors

n – k – 1 , where n is the number of observations

The F-test is one-sided – bigger F-values mean smaller P-values.

If the null hypothesis is true, then F will be near 1.


If a multiple regression F-test leads to a rejection of the null hypothesis, then check the t-test statistic for each coefficient:

$t_{n-k-1} = \dfrac{b_j - 0}{SE(b_j)}$

Note that the degrees of freedom for the t-test is n – k – 1.

Confidence interval: $b_j \pm t^*_{n-k-1} \times SE(b_j)$
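As a quick sketch, the t-ratio and an approximate 95% confidence interval for the Paid Attendance coefficient can be computed from the ticket-price output. The critical value t* ≈ 1.99 for 74 degrees of freedom is an approximation taken from a t-table; software would give the exact value:

```python
# t-ratio and approximate 95% CI for the Paid Attendance coefficient
# from the Broadway ticket-price regression output.
b_j, se_bj = 0.076, 0.0006

t_ratio = b_j / se_bj             # (b_j - 0) / SE(b_j)
t_star = 1.99                     # approximate t* for 74 df (from a table)
ci = (b_j - t_star * se_bj, b_j + t_star * se_bj)
print(t_ratio, ci)                # t-ratio ≈ 126.7, as in the output
```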


“Tricky” Parts of the t-tests:

SE’s are harder to compute (let technology do it!)

The meaning of a coefficient depends on the other predictors in the model (as we saw in the Home Price example).

If we fail to reject $H_0: \beta_j = 0$ based on its t-test, it does not mean that $x_j$ has no linear relationship to $y$. Rather, it means that $x_j$ contributes nothing to modeling $y$ after allowing for the other predictors.


In multiple regression, it looks like each coefficient $b_j$ tells us the effect of its associated predictor, $x_j$.

BUT

The coefficient $b_j$ can be different from zero even when there is no correlation between $y$ and $x_j$.

It is even possible that the multiple regression slope changes sign when a new variable enters the regression.


Example: More Ticket Prices
On a typical night in New York City, about 25,000 people attend a Broadway show, paying an average price of more than $75 per ticket. The variables are Paid Attendance (thousands), # Shows, and Average Ticket Price ($) to predict Receipts ($ million).

State the hypotheses, the test statistic and P-value, and draw a conclusion for an F-test of the overall model.

Dependent variable is: Receipts($M)
R squared = 99.9%   R squared (adjusted) = 99.9%
s = 0.0931 with 74 degrees of freedom

Source      Sum of Squares   df   Mean Square   F-ratio   P-value
Regression      484.789       3     161.596      18634    < 0.0001
Residual          0.641736   74       0.008672


Example: More Ticket Prices
State the hypotheses for an F-test of the overall model.

State the test statistic and P-value. The F-statistic is the F-ratio = 18634. The P-value is < 0.0001.
Draw a conclusion. The P-value is small, so reject the null hypothesis. At least one of the predictors accounts for enough variation in y to be useful.

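The F-ratio in the ANOVA output can be reproduced directly from its sums of squares and degrees of freedom:

```python
# Rebuild the F-ratio from the ANOVA table:
# F = (SSR / k) / (SSE / (n - k - 1)).
ss_reg, df_reg = 484.789, 3      # regression sum of squares, df = k
ss_res, df_res = 0.641736, 74    # residual sum of squares, df = n - k - 1

ms_reg = ss_reg / df_reg         # mean square (regression) ≈ 161.596
ms_res = ss_res / df_res         # mean square (residual)  ≈ 0.008672
f_ratio = ms_reg / ms_res
print(f_ratio)                   # ≈ 18634, as reported
```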

$H_0: \beta_1 = \beta_2 = \beta_3 = 0$

$H_A: \beta_1 \neq 0,\ \beta_2 \neq 0,\ \text{or } \beta_3 \neq 0$


Example: More Ticket Prices
Since the F-ratio suggests that at least one variable is a useful predictor, determine which of the following variables contribute in the presence of the others. Recall the variables Paid Attendance (thousands), # Shows, and Average Ticket Price ($) to predict Receipts ($ million).

Variable               Coeff     SE(Coeff)   t-ratio   P-value
Intercept             –18.320    0.3127      –58.6      0.0001
Paid Attend             0.076    0.0006      126.7      0.0001
# Shows                 0.0070   0.0044        1.6      0.116
Average Ticket Price    0.24     0.0039       61.5      0.0001


Example: More Ticket Prices
Since the F-ratio suggests that at least one variable is a useful predictor, determine which of the following variables contribute in the presence of the others.

Paid Attendance (p = 0.0001) and Average Ticket Price (p = 0.0001) both contribute, even when all other variables are in the model. # Shows, however, is not significant (p = 0.116) and should be removed from the model.

Variable               Coeff     SE(Coeff)   t-ratio   P-value
Intercept             –18.320    0.3127      –58.6      0.0001
Paid Attend             0.076    0.0006      126.7      0.0001
# Shows                 0.0070   0.0044        1.6      0.116
Average Ticket Price    0.24     0.0039       61.5      0.0001


18.5 Adjusted R2, and the F-statistic
Summary of Multiple Regression Variation Measures:

Parameter                   Significance
Sum of Squared Residuals    Larger SSE = “noisier” data and less precise prediction
Regression Sum of Squares   Larger SSR = stronger model correlation
Total Sum of Squares        Larger SST = larger variability in y, due to “noisier” data (SSE) and/or stronger model correlation (SSR)

$SSE = \sum e^2$

$SSR = \sum (\hat{y} - \bar{y})^2$

$SST = \sum (y - \bar{y})^2$, or $SST = SSR + SSE$
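A quick numerical check of the decomposition: fit a least-squares model to made-up data and verify SST = SSR + SSE (the identity holds exactly whenever the model includes an intercept):

```python
import numpy as np

# Verify SST = SSR + SSE for a two-predictor least-squares fit
# on invented data.
rng = np.random.default_rng(2)
n = 40
x1, x2 = rng.normal(size=n), rng.normal(size=n)
y = 3 + 2 * x1 - x2 + rng.normal(0, 0.5, n)

X = np.column_stack([np.ones(n), x1, x2])
b, *_ = np.linalg.lstsq(X, y, rcond=None)
y_hat = X @ b

sse = np.sum((y - y_hat) ** 2)        # sum of squared residuals
ssr = np.sum((y_hat - y.mean()) ** 2) # regression sum of squares
sst = np.sum((y - y.mean()) ** 2)     # total sum of squares
print(sse, ssr, sst)
```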


R2 in Multiple Regression:

R2 = fraction of the total variation in y accounted for by the model (all the predictor variables included)

$R^2 = \dfrac{SSR}{SST} = 1 - \dfrac{SSE}{SST}$

F and R2:

By using the expressions for SSE, SSR, SST, and R2, it can be shown that:

$F = \dfrac{R^2 / k}{(1 - R^2)/(n - k - 1)}$

So, testing whether F = 0 is equivalent to testing whether R2 = 0.
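The identity can be checked numerically. Using made-up, exact sums of squares (so no rounding intervenes), F computed from the mean squares and F computed from R² agree:

```python
# Check F = (R^2 / k) / ((1 - R^2) / (n - k - 1)) against the direct
# mean-square ratio, with invented sums of squares.
ssr, sse = 80.0, 20.0
sst = ssr + sse
n, k = 30, 4

r2 = ssr / sst                                    # 0.8
f_from_ss = (ssr / k) / (sse / (n - k - 1))       # mean-square ratio
f_from_r2 = (r2 / k) / ((1 - r2) / (n - k - 1))   # same value via R^2
print(r2, f_from_ss, f_from_r2)
```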


R2 and Adjusted R2

Adding new predictor variables to a model never decreases R2 and may increase it.

But each added variable increases the model complexity, which may not be desirable.

Adjusted R2 imposes a “penalty” on the correlation strength of larger models, depreciating their R2 values to account for an undesired increase in complexity:

$R^2_{adj} = 1 - (1 - R^2)\,\dfrac{n - 1}{n - k - 1}$

Adjusted R2 permits a more equitable comparison between models of different sizes.
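A sketch of the penalty in action: with the home-price example's n = 1057 and R² = 0.58, adjusted R² is computed for the 2-predictor model and for a hypothetical 10-predictor model that happens to achieve the same R²:

```python
# Adjusted R^2 penalizes model size: same R^2 and n, different k.
def adj_r2(r2, n, k):
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)

a2 = adj_r2(0.58, 1057, 2)    # the 2-predictor home-price model
a10 = adj_r2(0.58, 1057, 10)  # hypothetical 10-predictor model, same R^2
print(a2, a10)                # the larger model is penalized more
```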


*18.6 The Logistic Regression Model

Dichotomous Response: a response y with only two choices: Yes/No, A/B, Up/Down, True/False, etc.

Dichotomous responses are categorical.

Dichotomous responses can be re-expressed in terms of the responses “0” and “1” to make them quantitative instead of categorical: 0 = “No” and 1 = “Yes”, for example

Using this re-expression, samples of dichotomous response data can be analyzed using regression concepts and tools.


Example: Will a customer respond to a special offer based on how much the customer spent with the vendor last year?

Predictor variable: Spending (previous year)

Response variable: Respond to Offer (0 = “No”, 1 = “Yes”)

A plot of Spending vs. Respond to Offer is a quantitative scatterplot that one might analyze with linear regression (blue line).


Some problems with the regression:

Problem 1: The model (blue line) takes on values such as 0.3 and 0.75. Are these values equal to “Yes” or “No”? What do they mean?

Resolution: The response y is really the proportion of customers that will respond to the offer, and thus y can assume fractional values.


Problem 2: The model should not give values greater than 1 or less than 0. It ought to smoothly climb from values near 0 to values near 1.

Resolution: Try re-expressing the response as

$\ln\left(\dfrac{\hat{p}}{1 - \hat{p}}\right) = b_0 + b_1 x_1 + \cdots + b_k x_k$

where p is the proportion of “Yes” respondents. This is logistic regression.


Logistic Regression Strategy:

Transform the responses using the log-odds, $\ln\left(\hat{p}/(1 - \hat{p})\right)$.

Apply multiple regression (using technology) to the transformed data to get the values of the coefficients:

$\ln\left(\dfrac{\hat{p}}{1 - \hat{p}}\right) = b_0 + b_1 x_1 + \cdots + b_k x_k$

Transform back to the probability p:

$p = \dfrac{1}{1 + e^{-(b_0 + b_1 x_1 + \cdots + b_k x_k)}}$


The resulting response curve

$p = \dfrac{1}{1 + e^{-(b_0 + b_1 x_1 + \cdots + b_k x_k)}}$

is a smooth “S”-shaped curve that ranges from near 0 to near 1.
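A minimal sketch of the transform and its inverse in Python. The intercept and the single Spending coefficient are made up for illustration; with one predictor, the transform pair reduces to a one-variable logit and its inverse:

```python
import math

# Log-odds transform and its inverse, with invented coefficients:
# b0 is a hypothetical intercept, b1 a hypothetical Spending slope.
b0, b1 = -4.0, 0.002

def p_of_x(x):
    """Inverse transform: p = 1 / (1 + e^{-(b0 + b1*x)})."""
    return 1 / (1 + math.exp(-(b0 + b1 * x)))

def log_odds(p):
    """Forward transform: ln(p / (1 - p))."""
    return math.log(p / (1 - p))

p = p_of_x(2000)        # at x = 2000, b0 + b1*x = 0, so p = 0.5
print(p, log_odds(p))   # 0.5 and 0.0: the round trip recovers the log-odds
```

Note that `p_of_x` can never leave the interval (0, 1), which is exactly the property the straight-line model lacked.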


Deciding to Apply the Logistic Regression:

Are the data dichotomous?

Do you have reason to doubt that the data are independent?

Do you have reason to question the Randomization Condition?

Are there any outliers that may unduly influence the model (examine the predictor values)?


Don’t claim to “hold everything else constant” for a single individual. (For the predictors Age and Years of Education, it is impossible for an individual to get a year of education at constant age.)

Don’t interpret regression causally. Statistics assesses correlation, not causality.

Be cautious about interpreting a regression as predictive. That is, be alert for combinations of predictor values that take you outside the ranges of these predictors.


Be careful when interpreting the signs of coefficients in a multiple regression. The sign of a variable can change depending on which other predictors are in or out of the model. The truth is more subtle and requires that we understand the multiple regression model.

If a coefficient’s t-statistic is not significant, don’t interpret it at all.

Don’t fit a linear regression to data that aren’t straight. Usually, we are satisfied when plots of y against the x’s are straight enough.


Watch out for changing variance in the residuals. The most common check is a plot of the residuals against the predicted values.

Make sure the errors are nearly normal.

Watch out for high-influence points and outliers.


What Have We Learned?

Know how to perform a multiple regression, using the technology of your choice.
• Technologies differ, but most produce similar-looking tables to hold the regression results. Know how to find the values you need in the output generated by the technology you are using.

Understand how to interpret a multiple regression model.
• The meaning of a multiple regression coefficient depends on the other variables in the model. In particular, it is the relationship of y to the associated x after removing the linear effects of the other x’s.


Be sure to check the Assumptions and Conditions before interpreting a multiple regression model.
• The Linearity Assumption asserts that the form of the multiple regression model is appropriate. We check it by examining scatterplots. If the plots appear to be linear, we can fit a multiple regression model.
• The Independence Assumption requires that the errors made by the model in fitting the data be mutually independent. Data that arise from random samples or randomized experiments usually satisfy this assumption.


Be sure to check the Assumptions and Conditions before interpreting a multiple regression model (continued).
• The Equal Variance Assumption states that the variability around the multiple regression model should be the same everywhere. We usually check the Equal Spread Condition by plotting the residuals against the predicted values. This assumption is needed so that we can pool the residuals to estimate their standard deviation, which we will need for inferences about the regression coefficients.


Be sure to check the Assumptions and Conditions before interpreting a multiple regression model (continued).
• The Normality Assumption says that the model’s errors should follow a Normal model. We check the Nearly Normal Condition with a histogram or normal probability plot of the residuals. We need this assumption to use Student’s t models for inference, but for larger sample sizes, it is less important.


Know how to state and test hypotheses about the multiple regression coefficients.
• The standard hypothesis test for each coefficient is $H_0: \beta_j = 0$ versus $H_A: \beta_j \neq 0$.
• We test these hypotheses by referring the test statistic

$t = \dfrac{b_j - 0}{SE(b_j)}$

to the Student’s t distribution on n – k – 1 degrees of freedom, where k is the number of predictor variables in the multiple regression.


Interpret other associated statistics generated by a multiple regression.
• R2 is the fraction of the variation in y accounted for by the multiple regression model.
• Adjusted R2 attempts to adjust for the number of coefficients estimated.


The F-statistic tests the overall hypothesis that the regression model is of no more value than simply modeling y with its mean.

The standard deviation of the residuals,

$s_e = \sqrt{\dfrac{\sum e^2}{n - k - 1}}$,

provides an idea of how precisely the regression model fits the data.