Multiple Regression 2 Sociology 5811 Lecture 23 Copyright © 2005 by Evan Schofer Do not copy or distribute without permission

Dec 22, 2015

Jody Harrell

Multiple Regression 2

Sociology 5811 Lecture 23

Copyright © 2005 by Evan Schofer

Do not copy or distribute without permission

Announcements

• Today: More multivariate regression

The Multiple Regression Model

• A two-independent variable regression model:

$$Y_i = a + b_1 X_{1i} + b_2 X_{2i} + e_i$$

• Note: There are now two X variables

• And a slope (b) is estimated for each one

• The full multiple regression model is:

$$Y_i = a + b_1 X_{1i} + b_2 X_{2i} + \cdots + b_k X_{ki} + e_i$$

• For k independent variables
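
As a minimal sketch of the full model, the predicted value for one case is the constant plus each slope times its X value (the error term is what is left over). The function name `predict` and all numbers here are hypothetical, chosen only for illustration.

```python
def predict(a, slopes, xs):
    """Predicted Y for one case: a + b1*X1 + ... + bk*Xk (error term omitted)."""
    if len(slopes) != len(xs):
        raise ValueError("need exactly one X value per slope")
    return a + sum(b * x for b, x in zip(slopes, xs))


# Hypothetical constant and slopes for k = 3 independent variables
y_hat = predict(a=1.0, slopes=[2.0, 0.5, -1.0], xs=[3.0, 4.0, 2.0])
print(y_hat)
```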

Multiple Regression: Slopes

• Regression slope for the two variable case:

$$b_1 = \left(\frac{s_Y}{s_{X_1}}\right) \frac{r_{YX_1} - r_{YX_2}\, r_{X_1X_2}}{1 - r_{X_1X_2}^2}$$

• b1 = slope for X1 – controlling for the other independent variable X2

• b2 is computed symmetrically: swap X1 and X2 in the formula

• Compare to the bivariate slope:

$$b_{YX} = \left(\frac{s_Y}{s_X}\right) r_{YX}$$
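
The two-predictor slope, b1 = (sY / sX1) × (rYX1 − rYX2 × rX1X2) / (1 − r²X1X2), can be checked numerically. The sketch below is only an illustration: the data are made up so that Y = 1 + 2·X1 + 3·X2 exactly, and the helper names (`pearson_r`, `two_predictor_slopes`) are hypothetical. Because X1 and X2 are correlated, the bivariate slope for X1 absorbs part of the effect of X2, while the multiple regression slope recovers 2.

```python
from statistics import mean, stdev


def pearson_r(x, y):
    """Pearson correlation between two equal-length lists."""
    mx, my = mean(x), mean(y)
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = (sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y)) ** 0.5
    return num / den


def two_predictor_slopes(y, x1, x2):
    """Constant and slopes for Y on X1 and X2, from correlations and SDs."""
    r_y1, r_y2, r_12 = pearson_r(y, x1), pearson_r(y, x2), pearson_r(x1, x2)
    s_y, s_1, s_2 = stdev(y), stdev(x1), stdev(x2)
    b1 = (s_y / s_1) * (r_y1 - r_y2 * r_12) / (1 - r_12 ** 2)
    b2 = (s_y / s_2) * (r_y2 - r_y1 * r_12) / (1 - r_12 ** 2)
    a = mean(y) - b1 * mean(x1) - b2 * mean(x2)
    return a, b1, b2


# Made-up data: X1 and X2 are correlated, and Y is an exact linear function
x1 = [1, 2, 3, 4, 5, 6]
x2 = [2, 1, 4, 3, 6, 5]
y = [1 + 2 * u + 3 * v for u, v in zip(x1, x2)]

a, b1, b2 = two_predictor_slopes(y, x1, x2)
bivariate_b1 = (stdev(y) / stdev(x1)) * pearson_r(y, x1)

print("multiple regression:", round(a, 3), round(b1, 3), round(b2, 3))
print("bivariate slope for X1:", round(bivariate_b1, 3))
```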

Regression Slopes

• So, if two variables (X1, X2) are correlated and both predict Y:

• The X variable that is more strongly correlated with Y will tend to have the higher slope in multivariate regression
– The slope of the less-correlated variable will shrink

• Thus, the slope for each variable is adjusted for how well the other variable predicts Y
– It is the slope “controlling” for the other variables

Standardized Regression Coefficients

• Standardized Coefficients
– Also called “Betas” or “Beta Weights”
– Symbol: Greek b with asterisk: β*
– Equivalent to Z-scoring (standardizing) all variables, dependent and independent, before doing the regression

• Formula of the coefficient for Xj:

$$\beta^*_j = b_j \left(\frac{s_{X_j}}{s_Y}\right)$$

• Result: The unit is standard deviations

• Betas indicate the effect of a 1 standard deviation change in Xj on Y, measured in standard deviations of Y
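
A small sketch of the beta formula, β*j = bj × (sXj / sY), using a single predictor so that no matrix algebra is needed. The data and function names are hypothetical. It compares the rescaled slope with the slope obtained after z-scoring both X and Y; the two should agree.

```python
from statistics import mean, stdev


def bivariate_slope(x, y):
    """Least-squares slope of Y on a single X."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


def standardized_beta(b, x, y):
    """Rescale an unstandardized slope b into standard-deviation units."""
    return b * stdev(x) / stdev(y)


def zscore(values):
    """Convert raw values to z-scores (mean 0, standard deviation 1)."""
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]


# Hypothetical data: years of education and job prestige scores
educ = [8, 10, 12, 12, 14, 16, 16, 18]
prestige = [30, 28, 41, 37, 45, 52, 48, 60]

b = bivariate_slope(educ, prestige)
beta = standardized_beta(b, educ, prestige)
beta_from_z = bivariate_slope(zscore(educ), zscore(prestige))

print(round(b, 3), round(beta, 3), round(beta_from_z, 3))
```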

R-Square in Multiple Regression

• Example:

Model Summary

Model   R       R Square   Adjusted R Square   Std. Error of the Estimate
1       .522a   .272       .271                12.41

a. Predictors: (Constant), INCOM16, EDUC

• An R-square of .272 indicates that education and parents’ wealth together explain 27% of the variance in job prestige

• “Adjusted R-square” is a more conservative, more accurate measure in multiple regression
– Generally, you should report Adjusted R-square.
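
The lecture does not give the adjustment formula. The sketch below uses the standard textbook version, 1 − (1 − R²)(n − 1)/(n − k − 1), where n is the number of cases and k the number of independent variables. The sample size is not reported on the slide, so n = 1500 is an assumption made purely for illustration.

```python
def adjusted_r_square(r_square, n, k):
    """Adjusted R-square for n cases and k independent variables."""
    if n - k - 1 <= 0:
        raise ValueError("need more cases than estimated coefficients")
    return 1 - (1 - r_square) * (n - 1) / (n - k - 1)


# R-square = .272 with k = 2 predictors (EDUC, INCOM16).
# n = 1500 is a made-up sample size, not a value from the lecture.
large_sample = adjusted_r_square(0.272, n=1500, k=2)

# With few cases the penalty for extra predictors is much larger
small_sample = adjusted_r_square(0.272, n=20, k=2)

print(round(large_sample, 3), round(small_sample, 3))
```

With a large sample the adjustment is tiny, which is consistent with the small gap between .272 and .271 in the output above.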

Dummy Variables

• Question: How can we incorporate nominal variables (e.g., race, gender) into regression?

• Option 1: Analyze each sub-group separately
– Generates a different slope and constant for each group

• Option 2: Dummy variables
– “Dummy” = a dichotomous variable coded to indicate the presence or absence of something
– Absence coded as zero, presence coded as 1.

Dummy Variables

• Strategy: Create a separate dummy variable for each nominal category

• Ex: Gender – make female & male variables
– DFEMALE: coded as 1 for all women, zero for men
– DMALE: coded as 1 for all men, zero for women

• Next: Include all but one of the dummy variables in a multiple regression model

• If two dummies, include 1; if 5 dummies, include 4.

Dummy Variables

• Question: Why can’t you include DFEMALE and DMALE in the same regression model?

• Answer: They are perfectly correlated (negatively): r = -1
– Result: Regression model “blows up” (the coefficients cannot be estimated)

• For any set of nominal categories, a full set of dummies contains redundant information
– DMALE and DFEMALE contain the same information
– Dropping one removes the redundant information.
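
The redundancy can be seen directly in a small, made-up sample. The sketch below builds DFEMALE and DMALE from a hypothetical list of gender codes and checks that they always sum to 1 and correlate at -1.

```python
from statistics import mean

# Hypothetical cases; "F" and "M" are illustrative codes, not GSS values
gender = ["F", "M", "F", "F", "M", "M", "F"]

dfemale = [1 if g == "F" else 0 for g in gender]
dmale = [1 if g == "M" else 0 for g in gender]


def pearson_r(x, y):
    """Pearson correlation between two equal-length lists."""
    mx, my = mean(x), mean(y)
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = (sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y)) ** 0.5
    return num / den


# For every case DFEMALE + DMALE = 1, so knowing one gives you the other
always_sums_to_one = all(f + m == 1 for f, m in zip(dfemale, dmale))
r = pearson_r(dfemale, dmale)

print(always_sums_to_one, round(r, 6))
```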

Dummy Variables: Interpretation

• Consider the following regression equation:

$$Y_i = a + b_1\,\text{INCOME}_i + b_2\,\text{DFEMALE}_i + e_i$$

• Question: What if the case is a male?

• Answer: DFEMALE is 0, so the entire b2 term becomes zero.
– Result: Males are modeled using the familiar regression model: a + b1X + e.

Dummy Variables: Interpretation

• Consider the following regression equation:

$$Y_i = a + b_1\,\text{INCOME}_i + b_2\,\text{DFEMALE}_i + e_i$$

• Question: What if the case is a female?

• Answer: DFEMALE is 1, so b2(1) stays in the equation (and is added to the constant)
– Result: Females are modeled using a different regression line: (a+b2) + b1X + e
– Thus, the coefficient b2 reflects the difference in the constant for women.

Dummy Variables: Interpretation

• Remember, a different constant generates a different line, either higher or lower
– Variable: DFEMALE (women = 1, men = 0)
– A positive coefficient (b) indicates that women are consistently higher compared to men (on dep. var.)
– A negative coefficient indicates women are lower

• Example: If DFEMALE coeff = 1.2:
– “Women are on average 1.2 points higher than men”.
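
To see why the dummy coefficient is a constant shift, the sketch below plugs values into a + b1·INCOME + b2·DFEMALE. Only the DFEMALE coefficient of 1.2 comes from the example; the constant and the income slope are hypothetical.

```python
A = 3.0          # constant (hypothetical)
B_INCOME = 0.04  # slope for income, in $1000s (hypothetical)
B_DFEMALE = 1.2  # dummy coefficient from the example


def predicted_happy(income, dfemale):
    """Predicted value from a + b1*INCOME + b2*DFEMALE."""
    return A + B_INCOME * income + B_DFEMALE * dfemale


# The female-male gap is the same at every income level: two parallel lines
gaps = []
for income in (20, 60, 100):
    gap = predicted_happy(income, 1) - predicted_happy(income, 0)
    gaps.append(gap)
    print(income, round(gap, 2))
```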

Dummy Variables: Interpretation

• Visually: Women = blue, Men = red

[Figure: scatterplot of HAPPY (0 to 10) against INCOME (0 to 100,000), showing the overall slope for all data points and separate lines for women and men]

Note: The lines for men and women have the same slope… but one is higher and the other is lower. The constant differs!

If women=1, men=0: The constant (a) reflects men only. The dummy coefficient (b) reflects the increase for women (relative to men)

Dummy Variables

• What if you want to compare more than 2 groups?

• Example: Race
– Coded 1=white, 2=black, 3=other (like GSS)

• Make 3 dummy variables:
– “DWHITE” is 1 for whites, 0 for everyone else
– “DBLACK” is 1 for Af. Am., 0 for everyone else
– “DOTHER” is 1 for “others”, 0 for everyone else

• Then, include two of the three variables in the multiple regression model.
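
A sketch of the recoding step, assuming the 1/2/3 race coding described above. The function names are hypothetical, and leaving out DWHITE is one choice among three; whichever dummy is omitted becomes the comparison group.

```python
RACE_LABELS = {1: "white", 2: "black", 3: "other"}


def race_dummies(race_code):
    """Full set of three dummies for one case (exactly one of them is 1)."""
    if race_code not in RACE_LABELS:
        raise ValueError(f"unexpected race code: {race_code!r}")
    return {
        "DWHITE": int(race_code == 1),
        "DBLACK": int(race_code == 2),
        "DOTHER": int(race_code == 3),
    }


def model_columns(race_code, omitted="DWHITE"):
    """Dummies that enter the regression: all but the omitted category."""
    dummies = race_dummies(race_code)
    return {name: value for name, value in dummies.items() if name != omitted}


for code in (1, 2, 3):
    print(RACE_LABELS[code], model_columns(code))
```

Whites end up with zeros on both included dummies, which is what makes them the reference category.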

Dummy Variables: Interpretation

• Ex: Job Prestige

Coefficients (a)

Model 1       B           Std. Error   Beta     t         Sig.
(Constant)    9.666       1.672                 5.780     .000
EDUC          2.476       .111         .517     22.271    .000
INCOM16       6.282E-02   .397         .004     .158      .874
DBLACK        -2.666      1.117        -.055    -2.388    .017
DOTHER        1.114       1.777        .014     .627      .531

B and Std. Error: unstandardized coefficients. Beta: standardized coefficients.
a. Dependent Variable: PRESTIGE

• Negative coefficient for DBLACK indicates a lower level of job prestige compared to whites
– t- and p-values indicate whether the difference is significant.
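
The job prestige coefficients can be plugged into the regression equation to see what the DBLACK coefficient means. In this sketch the B and Std. Error values are copied from the output above, while the inputs (12 years of education, INCOM16 = 3) are hypothetical cases chosen for illustration. The t-value is computed as B divided by its standard error, which matches the printed t only up to rounding.

```python
# Coefficients and standard errors copied from the SPSS output
COEF = {"const": 9.666, "EDUC": 2.476, "INCOM16": 6.282e-02,
        "DBLACK": -2.666, "DOTHER": 1.114}
STD_ERR = {"const": 1.672, "EDUC": 0.111, "INCOM16": 0.397,
           "DBLACK": 1.117, "DOTHER": 1.777}


def predicted_prestige(educ, incom16, dblack=0, dother=0):
    """Predicted PRESTIGE; whites are the omitted (reference) category."""
    return (COEF["const"] + COEF["EDUC"] * educ + COEF["INCOM16"] * incom16
            + COEF["DBLACK"] * dblack + COEF["DOTHER"] * dother)


def t_value(name):
    """Approximate t statistic: coefficient divided by its standard error."""
    return COEF[name] / STD_ERR[name]


# Two hypothetical cases identical on EDUC and INCOM16, differing only in race
white = predicted_prestige(educ=12, incom16=3)
black = predicted_prestige(educ=12, incom16=3, dblack=1)

print(round(white, 2), round(black, 2), round(black - white, 3))
print(round(t_value("DBLACK"), 2))
```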

Dummy Variables: Interpretation

• Comments:

• 1. Dummy coefficients shouldn’t be called slopes
– Referring to the “slope” of gender doesn’t make sense
– Rather, it is the difference in the constant (or “level”)

• 2. The contrast is always with the nominal category that was left out of the equation
– If DFEMALE is included, the contrast is with males
– If DBLACK, DOTHER are included, coefficients reflect the difference in the constant compared to whites.

Interaction Terms

• Question: What if you suspect that a variable has a totally different slope for two different sub-groups in your data?

• Example: Income and Happiness
– Perhaps men are more materialistic: an extra dollar increases their happiness a lot
– If women are less materialistic, each dollar has a smaller effect on happiness (compared to men)

• Issue isn’t men = “more” or “less” than women
– Rather, the slope of a variable (income) differs across groups

Interaction Terms

• Issue isn’t men = “more” or “less” than women
– Rather, the slope of a variable (the coefficient for income) differs across groups

• Again, we want to specify a different regression line for each group
– We want lines with different slopes, not parallel lines that are higher or lower.

Interaction Terms

• Visually: Women = blue, Men = red

[Figure: scatterplot of HAPPY (0 to 10) against INCOME (0 to 100,000), showing the overall slope for all data points and separate lines for women and men]

Note: Here, the slope for men and women differs.

The effect of income on happiness (X1 on Y) varies with gender (X2). This is called an “interaction effect”

Interaction Terms

• Interaction effects: Differences in the relationship (slope) between two variables for each category of a third variable

• Option #1: Analyze each group separately

• Option #2: Multiply the two variables of interest (DFEMALE, INCOME) to create a new variable
– Called: DFEMALE*INCOME
– Add that variable to the multiple regression model.

Interaction Terms

• Consider the following regression equation:

$$Y_i = a + b_1\,\text{INCOME}_i + b_2\,\text{DFEM*INC}_i + e_i$$

• Question: What if the case is male?

• Answer: DFEMALE is 0, so b2(DFEM*INC) drops out of the equation
– Result: Males are modeled using the ordinary regression equation: a + b1X + e.

Interaction Terms

• Consider the following regression equation:

$$Y_i = a + b_1\,\text{INCOME}_i + b_2\,\text{DFEM*INC}_i + e_i$$

• Question: What if the case is female?

• Answer: DFEMALE is 1, so b2(DFEM*INC) becomes b2*INCOME, and b2 is added to b1

– Result: Females are modeled using a different regression line: a + (b1+b2) X + e

– Thus, the coefficient b2 reflects the difference in the slope of INCOME for women.

Interaction Terms

• Interpreting interaction terms:

• A positive b for DFEMALE*INCOME indicates the slope for income is higher for women vs. men
– A negative effect indicates the slope is lower
– Size of coefficient indicates the actual difference in slope

• Example: DFEMALE*INCOME. Observed b’s:
– Income: b = .5
– DFEMALE * INCOME: b = -.2

• Interpretation: Slope is .5 for men, .3 (= .5 - .2) for women.
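
The arithmetic behind this interpretation can be sketched as follows. The two coefficients (.5 and -.2) are the observed b's from the example; the constant is hypothetical and does not affect the slopes.

```python
B_INCOME = 0.5        # coefficient for INCOME (slope for men, DFEMALE = 0)
B_INTERACTION = -0.2  # coefficient for DFEMALE*INCOME
A = 2.0               # constant (hypothetical)


def income_slope(dfemale):
    """Slope of income for a group: b1 for men, b1 + b2 for women."""
    return B_INCOME + B_INTERACTION * dfemale


def predicted_happy(income, dfemale):
    """Predicted value from a + b1*INCOME + b2*(DFEMALE*INCOME)."""
    return A + B_INCOME * income + B_INTERACTION * (dfemale * income)


# A one-unit rise in income changes the prediction by the group's own slope
rise_men = predicted_happy(11, 0) - predicted_happy(10, 0)
rise_women = predicted_happy(11, 1) - predicted_happy(10, 1)

print(round(income_slope(0), 2), round(income_slope(1), 2))
print(round(rise_men, 2), round(rise_women, 2))
```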

Interaction Terms

• Continuous variables can also interact

• Example: Effect of education and income on happiness
– Perhaps highly educated people are less materialistic
– As education increases, the slope between income and happiness would decrease

• Simply multiply Education and Income to create the interaction term “EDUCATION*INCOME”
– And add it to the model

Interaction Terms

• How do you interpret continuous variable interactions?

• Example: EDUCATION*INCOME: Coefficient = 2.0

• Answer: For each one-unit increase in education, the slope of income vs. happiness increases by 2
– Note: coefficient is symmetrical: For each one-unit increase in income, the education slope increases by 2
– Dummy interactions result in a slope for each group
– Continuous interactions result in many slopes

• Each value of education has a different income slope (and each value of income has a different education slope).
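
With a continuous interaction, the slope of one variable becomes a function of the other. The sketch below uses the interaction coefficient of 2.0 from the example; the two main-effect coefficients are hypothetical.

```python
B_INCOME = 1.0       # coefficient for INCOME (hypothetical)
B_EDUCATION = 0.5    # coefficient for EDUCATION (hypothetical)
B_INTERACTION = 2.0  # coefficient for EDUCATION*INCOME from the example


def income_slope(education):
    """Slope of income vs. happiness at a given level of education."""
    return B_INCOME + B_INTERACTION * education


def education_slope(income):
    """Slope of education vs. happiness at a given level of income."""
    return B_EDUCATION + B_INTERACTION * income


# Each additional unit of education raises the income slope by 2.0
for education in (10, 11, 12):
    print(education, income_slope(education))
```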

Interaction Terms

• Comments:

• 1. If you make an interaction you should also include the component variables in the model:
– A model with “DFEMALE * INCOME” should also include DFEMALE and INCOME
– There is some debate on this issue… but that is the safest course of action

• 2. Sometimes an interaction term is highly correlated with its components

• Watch out for that.
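
As a rough illustration of comment 1, the sketch below fits a model that contains INCOME, DFEMALE, and DFEMALE*INCOME together. Everything here is an assumption made for the example: the data are constructed without noise (men: 2 + .5·income, women: 3 + .3·income), and the small least-squares solver is a plain normal-equations implementation, not the procedure used by any particular statistics package.

```python
def solve_linear_system(A, b):
    """Solve A x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(n):
            if r != col:
                factor = M[r][col] / M[col][col]
                M[r] = [v - factor * p for v, p in zip(M[r], M[col])]
    return [M[i][n] / M[i][i] for i in range(n)]


def ols(rows, y):
    """Least-squares coefficients via the normal equations (X'X) b = X'y."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    return solve_linear_system(xtx, xty)


# Made-up, noise-free data (income in $10,000s)
income = [1, 2, 3, 4, 1, 3, 5, 7]
dfemale = [0, 0, 0, 0, 1, 1, 1, 1]
happy = [2 + 0.5 * x if f == 0 else 3 + 0.3 * x for x, f in zip(income, dfemale)]

# Design matrix: constant, INCOME, DFEMALE, DFEMALE*INCOME
rows = [[1, x, f, f * x] for x, f in zip(income, dfemale)]
a, b_income, b_dfemale, b_interaction = ols(rows, happy)

print(round(a, 3), round(b_income, 3), round(b_dfemale, 3), round(b_interaction, 3))
```

Including DFEMALE lets the two groups differ in their constant as well as their slope. The fitted interaction coefficient is the difference between the women's and men's income slopes.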

Interaction Terms

• Question: Can you think of examples of two variables that might interact?

• Either from your final project? Or anything else?