Multiple regression: Categorical independent variables and interaction effects
Johan A. Elkink
School of Politics & International Relations
University College Dublin
19 November 2018
Outline
1 Categorical independent variables
    Dummy variables
    Multiple categories
2 Interaction models
    With dummy variables
    With multiple category variables
    With continuous variables
Introduction
So far, we have discussed regressions where both the dependent and the independent variables were continuous, or of interval/ratio measurement level.
In the social sciences in particular, variables are often qualitative or categorical in nature.
When an independent variable is categorical in nature, the estimation remains the same, but the interpretation changes.
Dummy variables
A dummy variable is a binary variable that can only take the values 0 or 1.
In regression analysis, a dummy variable can be added as an independent variable without any problems. A categorical variable that is coded differently (for example as text labels) should not be added to the model as it stands; it first has to be recoded into a dummy variable.
respnr  gender  female
1       Male    0
2       Female  1
3       Male    0
4       Male    0
5       Female  1
6       Female  1
7       Female  1
In SPSS: RECODE gender ("Male" = 0) ("Female" = 1) INTO female.
In Stata: recode gender (1 = 0) (2 = 1), gen(female)
In R: female <- car::recode(gender, "'male'=0; 'female'=1; else=NA")
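The same recoding step can be sketched in Python. This is only an illustration: the list `gender` and the dictionary `coding` are hypothetical names, and the labels are assumed to be spelled exactly as in the table above.

```python
# Hypothetical data mirroring the table above (labels assumed to be "Male"/"Female").
gender = ["Male", "Female", "Male", "Male", "Female", "Female", "Female"]

# Map each label to 0/1. Any label not listed becomes None (missing),
# which plays the same role as else=NA in the R call.
coding = {"Male": 0, "Female": 1}
female = [coding.get(g) for g in gender]

print(female)  # [0, 1, 0, 0, 1, 1, 1]
```

Note that the lookup is case-sensitive, so "male" would be treated as missing under this coding.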
Regression with dummy variables
Model 1: $y_i = \beta_1$, i.e. a model without any independent variables.
Here you would simply obtain: $\hat{\beta}_1 = \bar{y}$.
(This also shows that regression is closely related to estimating means, and that the t-test on a coefficient is the same as the t-test for comparing means.)
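The reason the intercept-only model returns the sample mean can be seen from the least-squares criterion. The following short derivation is a sketch using standard OLS reasoning; the sample size $n$ is notation introduced here.

```latex
\begin{aligned}
\frac{\partial}{\partial \beta_1} \sum_{i=1}^{n} (y_i - \beta_1)^2
  &= -2 \sum_{i=1}^{n} (y_i - \beta_1) = 0 \\
\Rightarrow \quad \hat{\beta}_1 &= \frac{1}{n} \sum_{i=1}^{n} y_i = \bar{y}
\end{aligned}
```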
Model 2: $y_i = \beta_1 + \beta_2 d_i$, where $D$ is a dummy variable. Here there are two scenarios:
$d_i = 0$: $y_i = \beta_1 + \beta_2 \cdot 0 = \beta_1$
and we just estimate the mean of $Y$ for the group where $D = 0$.
$d_i = 1$: $y_i = \beta_1 + \beta_2 \cdot 1 = \beta_1 + \beta_2$
and that sum is the estimated mean of $Y$ for the group where $D = 1$.
The estimate $\beta_2$ is therefore the difference in means for the two groups.
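A small numerical check of this result, as a sketch with made-up data: the helper `simple_ols` and the values in `d` and `y` are illustrative assumptions, not taken from the slides.

```python
from statistics import mean

def simple_ols(x, y):
    """Return (intercept, slope) of the least-squares line y = b1 + b2 * x."""
    x_bar, y_bar = mean(x), mean(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope

# Made-up illustrative data: d is the dummy, y the outcome.
d = [0, 1, 0, 0, 1, 1, 1]
y = [3.0, 5.0, 4.0, 2.0, 6.0, 7.0, 6.0]

b1, b2 = simple_ols(d, y)
mean0 = mean(yi for di, yi in zip(d, y) if di == 0)  # mean of Y where D = 0
mean1 = mean(yi for di, yi in zip(d, y) if di == 1)  # mean of Y where D = 1

# Up to floating-point rounding, b1 equals mean0 and b2 equals mean1 - mean0.
print(b1, mean0)
print(b2, mean1 - mean0)
```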
Model 3: $y_i = \beta_1 + \beta_2 d_i + \beta_3 x_i$, where $D$ is a dummy variable and $X$ is continuous. Here there are two scenarios:
$d_i = 0$: $y_i = \beta_1 + \beta_2 \cdot 0 + \beta_3 x_i = \beta_1 + \beta_3 x_i$
and we have an intercept $\beta_1$ and a slope coefficient $\beta_3$ for the group where $D = 0$.
$d_i = 1$: $y_i = \beta_1 + \beta_2 \cdot 1 + \beta_3 x_i = (\beta_1 + \beta_2) + \beta_3 x_i$
and we have an intercept $\beta_1 + \beta_2$ and a slope coefficient $\beta_3$ for the group where $D = 1$.
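The two parallel lines implied by Model 3 can be illustrated with a small pure-Python sketch. The function `ols` is a hypothetical helper that solves the normal equations directly, which is adequate only for tiny examples, and the data are made up and noise-free so that the true coefficients are known.

```python
def ols(X, y):
    """Least-squares coefficients from the normal equations (X'X) b = X'y,
    solved by Gauss-Jordan elimination; adequate for tiny examples only."""
    k = len(X[0])
    # Build the augmented matrix [X'X | X'y].
    A = [[sum(row[i] * row[j] for row in X) for j in range(k)]
         + [sum(row[i] * yi for row, yi in zip(X, y))]
         for i in range(k)]
    for col in range(k):
        # Partial pivoting: move the largest remaining entry onto the diagonal.
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        scale = A[col][col]
        A[col] = [v / scale for v in A[col]]
        for r in range(k):
            if r != col:
                factor = A[r][col]
                A[r] = [vr - factor * vc for vr, vc in zip(A[r], A[col])]
    return [A[i][k] for i in range(k)]

# Made-up, noise-free data generated from y = 1 + 2*d + 0.5*x.
d = [0, 0, 0, 1, 1, 1]
x = [1.0, 2.0, 3.0, 2.0, 4.0, 6.0]
y = [1 + 2 * di + 0.5 * xi for di, xi in zip(d, x)]

X = [[1.0, di, xi] for di, xi in zip(d, x)]  # columns: constant, d, x
b1, b2, b3 = ols(X, y)

# Same slope for both groups, different intercepts.
print("D = 0: intercept", round(b1, 6), "slope", round(b3, 6))
print("D = 1: intercept", round(b1 + b2, 6), "slope", round(b3, 6))
```

In practice one would use a statistics package for the estimation; the hand-written solver only keeps the sketch self-contained.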
Dummy variables and interpretation
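So, dummy variables test whether the intercepts (means) differ. Do not interpret the respective coefficient as "if X increases by 1 unit, Y increases by ..."; it is the difference in the expected value of Y between the two groups, given the other variables in the model.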
Dummy variables and t-tests
$y_i = \beta_1 + \beta_2 d_i + \beta_3 x_i$
In a regression, the t-test for a coefficient tests whether, given the other variables in the model, the slope of a line is different from zero, with zero being no effect of $X$ on $Y$.
$H_0: \beta_3 = 0$, so under the null, the slope of the line is zero.
In a regression with a dummy variable, the t-test for that coefficient tests whether, given the other variables in the model, the means of the two groups differ.
$H_0: \beta_2 = 0$, so under the null, the two groups have the same intercept.
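In the simplest case, a model with only the dummy variable and no $X$, the t-statistic on the dummy coefficient coincides with the pooled-variance two-sample t-statistic for a difference in means. The sketch below checks this numerically on made-up data; all variable names are illustrative.

```python
from math import sqrt
from statistics import mean

# Made-up illustrative data: d is the dummy, y the outcome.
d = [0, 1, 0, 0, 1, 1, 1]
y = [3.0, 5.0, 4.0, 2.0, 6.0, 7.0, 6.0]
n = len(y)

# OLS for y = b1 + b2 * d and the t-statistic for b2.
d_bar, y_bar = mean(d), mean(y)
sxx = sum((di - d_bar) ** 2 for di in d)
b2 = sum((di - d_bar) * (yi - y_bar) for di, yi in zip(d, y)) / sxx
b1 = y_bar - b2 * d_bar
ssr = sum((yi - b1 - b2 * di) ** 2 for di, yi in zip(d, y))
t_regression = b2 / sqrt(ssr / (n - 2) / sxx)

# Two-sample t-statistic with pooled variance.
g0 = [yi for di, yi in zip(d, y) if di == 0]
g1 = [yi for di, yi in zip(d, y) if di == 1]
ss_within = (sum((v - mean(g0)) ** 2 for v in g0)
             + sum((v - mean(g1)) ** 2 for v in g1))
pooled_var = ss_within / (n - 2)
t_pooled = (mean(g1) - mean(g0)) / sqrt(pooled_var * (1 / len(g0) + 1 / len(g1)))

print(t_regression, t_pooled)  # equal up to floating-point rounding
```

Once other variables such as $X$ are in the model, the test is on the difference in intercepts conditional on those variables, so it no longer reduces to a plain comparison of group means.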
Multiple categories
Instead of just two categories, a categorical variable can have multiple categories, such as party preference or religious denomination. To add these to the regression, we split them up into multiple dummy variables.
respnr  party        ff  fg  lab  sf
1       Fianna Fail  1   0   0    0
2       Sinn Fein    0   0   0    1
3       Labour       0   0   1    0
4       Sinn Fein    0   0   0    1
5       Fianna Fail  1   0   0    0
6       Fianna Fail  1   0   0    0
7       Fine Gael    0   1   0    0
8       Fine Gael    0   1   0    0
9       Labour       0   0   1    0
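Note that in a regression one category always has to be left out: the full set of dummies sums to one for every respondent and would therefore be perfectly collinear with the intercept. All other results are relative to this reference category, e.g.:
$y_i = \beta_1 + \beta_2 \mathit{fg}_i + \beta_3 \mathit{lab}_i + \beta_4 \mathit{sf}_i$,
such that all coefficients show the difference relative to Fianna Fail voters.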
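Creating the dummies while leaving out the reference category can be sketched as follows. The function `make_dummies` is a hypothetical helper written for this illustration; the party labels are those from the table above.

```python
def make_dummies(values, reference):
    """Return {category: list of 0/1} for every category except the reference."""
    categories = sorted(set(values) - {reference})
    return {c: [1 if v == c else 0 for v in values] for c in categories}

party = ["Fianna Fail", "Sinn Fein", "Labour", "Sinn Fein", "Fianna Fail",
         "Fianna Fail", "Fine Gael", "Fine Gael", "Labour"]

# Fianna Fail is the reference category, so it gets no dummy of its own.
dummies = make_dummies(party, reference="Fianna Fail")

for name, column in dummies.items():
    print(name, column)
```

Respondents in the reference category are those with a zero on every dummy, which is why their mean is captured by the intercept.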
Interaction models
So far, we have only been adding variables in an additive model.
Imagine, however, that the relation between $X$ and $Y$ depends on the group, e.g. the effect of ability on income is greater for those with a degree than for those without a degree.
We call this an interaction effect. To model it, we interact the variable $X$ with $D$, for example:
$y_i = \beta_1 + \beta_2 x_i + \beta_3 d_i + \beta_4 x_i d_i$.
Interaction with dummy variables
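Model 4: $y_i = \beta_1 + \beta_2 x_i + \beta_3 d_i + \beta_4 x_i d_i$, where $D$ is a dummy variable and $X$ is continuous. Here there are two scenarios:
$d_i = 0$: $y_i = \beta_1 + \beta_2 x_i + \beta_3 \cdot 0 + \beta_4 x_i \cdot 0 = \beta_1 + \beta_2 x_i$
and we have an intercept $\beta_1$ and a slope coefficient $\beta_2$ for the group where $D = 0$.
$d_i = 1$: $y_i = \beta_1 + \beta_2 x_i + \beta_3 \cdot 1 + \beta_4 x_i \cdot 1 = (\beta_1 + \beta_3) + (\beta_2 + \beta_4) x_i$
and we have an intercept $\beta_1 + \beta_3$ and a slope coefficient $\beta_2 + \beta_4$ for the group where $D = 1$.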
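Because Model 4 gives each group its own intercept and its own slope, its least-squares estimates coincide with fitting a separate line to each group. The sketch below uses that property to recover the four coefficients; `simple_ols` is a hypothetical helper and the data are made up.

```python
from statistics import mean

def simple_ols(x, y):
    """Return (intercept, slope) of the least-squares line y = a + s * x."""
    x_bar, y_bar = mean(x), mean(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope

# Made-up illustrative data for the two groups.
x0, y0 = [1.0, 2.0, 3.0, 4.0], [2.1, 2.9, 4.2, 4.8]  # group with d = 0
x1, y1 = [1.0, 2.0, 3.0, 4.0], [3.0, 5.1, 6.9, 9.0]  # group with d = 1

a0, s0 = simple_ols(x0, y0)  # intercept and slope for d = 0
a1, s1 = simple_ols(x1, y1)  # intercept and slope for d = 1

# Coefficients of the interaction model expressed via the group-specific lines.
beta1, beta2 = a0, s0              # intercept and slope where D = 0
beta3, beta4 = a1 - a0, s1 - s0    # differences in intercept and in slope

print(beta1, beta2, beta3, beta4)
```

The point estimates match the joint model, but standard errors differ because the joint model assumes a common error variance across the two groups.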
Including component variables
Note that this also shows the importance of including the component variables that make up the interaction. E.g., with
$y_i = \beta_1 + \beta_2 d_i + \beta_3 x_i d_i$,
where we exclude the variable $X$ by itself, we would have:
$d_i = 0$: $y_i = \beta_1 + \beta_2 \cdot 0 + \beta_3 x_i \cdot 0 = \beta_1$
and we have an intercept $\beta_1$ and a slope coefficient 0 (!) for the group where $D = 0$.
$d_i = 1$: $y_i = \beta_1 + \beta_2 \cdot 1 + \beta_3 x_i \cdot 1 = (\beta_1 + \beta_2) + \beta_3 x_i$
and we have an intercept $\beta_1 + \beta_2$ and a slope coefficient $\beta_3$ for the group where $D = 1$.
So we arbitrarily fix one slope to zero.
Or similarly:
$y_i = \beta_1 + \beta_2 x_i + \beta_3 x_i d_i$,
where we exclude the dummy variable $D$ by itself:
$d_i = 0$: $y_i = \beta_1 + \beta_2 x_i + \beta_3 x_i \cdot 0 = \beta_1 + \beta_2 x_i$
and we have an intercept $\beta_1$ and a slope coefficient $\beta_2$ for the group where $D = 0$.
$d_i = 1$: $y_i = \beta_1 + \beta_2 x_i + \beta_3 x_i \cdot 1 = \beta_1 + (\beta_2 + \beta_3) x_i$
and we have an intercept $\beta_1$ and a slope coefficient $\beta_2 + \beta_3$ for the group where $D = 1$.
So we fix the value of $Y$ to be identical for the two groups at the arbitrary point of $X = 0$.
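Both reduced specifications can be read as the full interaction model with one coefficient restricted to zero. The summary below is a sketch that uses the coefficient numbering of the full model, $y_i = \beta_1 + \beta_2 x_i + \beta_3 d_i + \beta_4 x_i d_i$, rather than the renumbered coefficients of the reduced models.

```latex
\begin{aligned}
\text{Full model:} \quad & y_i = \beta_1 + \beta_2 x_i + \beta_3 d_i + \beta_4 x_i d_i \\
\text{Excluding } X \ (\beta_2 = 0): \quad & y_i = \beta_1 + \beta_3 d_i + \beta_4 x_i d_i
  && \text{slope forced to } 0 \text{ where } D = 0 \\
\text{Excluding } D \ (\beta_3 = 0): \quad & y_i = \beta_1 + \beta_2 x_i + \beta_4 x_i d_i
  && \text{same intercept forced on both groups}
\end{aligned}
```

Including all component variables lets the data decide whether these restrictions hold, instead of imposing them.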
Interaction models and t-tests
$y_i = \beta_1 + \beta_2 x_i + \beta_3 d_i + \beta_4 x_i d_i$
So we can think of the following t-tests:
$H_0: \beta_2 = 0$, so under the null, the slope of the line is zero for the group where $D = 0$.
$H_0: \beta_3 = 0$, so under the null, the two groups have the same intercept.
In a regression with an interaction with a dummy variable, the t-test for the interaction coefficient tests whether, given the other variables in the model, the slopes for the two groups differ.
$H_0: \beta_4 = 0$, so under the null, the two groups have the same slope between $X$ and $Y$.
Interaction with multiple category variables
So $\beta_2$ and $\beta_3$ are differences in intercepts, relative to whites; $\beta_5$ and $\beta_6$ are differences in slopes, relative to whites; and the t-tests test whether the intercepts or slopes, respectively, differ.
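One specification consistent with this coefficient numbering is sketched below. It is an assumption, not taken from the slides: it supposes a three-category variable with whites as the reference category, and $d_{1i}$ and $d_{2i}$ are hypothetical dummies for the two other categories.

```latex
\begin{aligned}
y_i &= \beta_1 + \beta_2 d_{1i} + \beta_3 d_{2i} + \beta_4 x_i
       + \beta_5 x_i d_{1i} + \beta_6 x_i d_{2i} \\[4pt]
\text{Reference group:} \quad y_i &= \beta_1 + \beta_4 x_i \\
\text{Group with } d_{1i} = 1: \quad y_i &= (\beta_1 + \beta_2) + (\beta_4 + \beta_5) x_i \\
\text{Group with } d_{2i} = 1: \quad y_i &= (\beta_1 + \beta_3) + (\beta_4 + \beta_6) x_i
\end{aligned}
```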
Interaction with continuous variables
It is possible to interact two continuous variables. Here you expect the effect of $X$ on $Y$ to change gradually as some third variable $Z$ changes.
$y_i = \beta_1 + \beta_2 x_i + \beta_3 z_i + \beta_4 x_i z_i$,
so when we take $X$ as the key independent variable, we have:
Intercept: $\beta_1 + \beta_3 z_i$
Slope: $\beta_2 + \beta_4 z_i$
Both intercept and slope change with $Z$. These types of models are typically somewhat difficult to interpret, and statistically there is no difference between the slope between $X$ and $Y$ varying with the value of $Z$ and the slope between $Z$ and $Y$ varying with the value of $X$. A strong theory of the causal relations is required to make sense of the results.
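How the slope of $X$ shifts with $Z$, and symmetrically the slope of $Z$ with $X$, can be sketched with a few lines of Python. The coefficient values are hypothetical and chosen only for illustration; the function names are likewise made up.

```python
def slope_of_x(beta2, beta4, z):
    """Slope between X and Y at a given value of Z: beta2 + beta4 * z."""
    return beta2 + beta4 * z

def slope_of_z(beta3, beta4, x):
    """Slope between Z and Y at a given value of X: beta3 + beta4 * x."""
    return beta3 + beta4 * x

# Hypothetical coefficients for y = beta1 + beta2*x + beta3*z + beta4*x*z.
beta1, beta2, beta3, beta4 = 1.0, 0.5, 2.0, -0.25

# The effect of X weakens, vanishes, and then reverses as Z increases.
slopes = [slope_of_x(beta2, beta4, z) for z in (0, 2, 4)]
print(slopes)  # [0.5, 0.0, -0.5]
```

The same coefficient `beta4` drives both functions, which is the symmetry noted above: the data alone cannot say which variable moderates the other.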