Chapter 7 Using Indicator and Interaction Variables. Terry Dielman, Applied Regression Analysis: A Second Course in Business and Economic Statistics, fourth edition.
7.1 Using and Interpreting Indicator Variables
Suppose some observations have a particular characteristic or attribute, while others do not.
We can include this information in the regression model by using dummy or indicator variables.
Add the information through a coding scheme.
Use a binary (dummy) variable to "indicate" when the characteristic is present.
D_i = 1 if observation i has the attribute
D_i = 0 if observation i does not have it
For example:
D_i = 1 if individual i is employed
D_i = 0 if individual i is not employed
We could do it the other way and use the "1" to indicate an unemployed individual.
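The coding scheme above can be sketched in a few lines of Python. The `statuses` list is made-up illustration data and `indicator` is a hypothetical helper name; neither comes from the slides.

```python
# Made-up employment statuses for five individuals (illustration only).
statuses = ["employed", "unemployed", "employed", "employed", "unemployed"]


def indicator(values, attribute):
    """Return 1 where the value equals the attribute, 0 otherwise."""
    return [1 if v == attribute else 0 for v in values]


# Coding 1: D_i = 1 if individual i is employed.
employed = indicator(statuses, "employed")

# Coding 2 (the "other way"): D_i = 1 if individual i is not employed.
unemployed = indicator(statuses, "unemployed")

print(employed)
print(unemployed)
```

Either coding carries the same information: for every individual the two indicators sum to 1, so one is just 1 minus the other.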
Multiple Categories
For multiple categories, use multiple indicators.
For example, to indicate where a firm's stock is listed, we could define 3 indicator variables, one each for the NYSE, AMEX and NASDAQ.
For computational reasons, we would include only two of these in the regression: with an intercept in the model, all three together would be perfectly collinear.
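A minimal sketch of why one indicator is dropped, using made-up listings. The helper name `dummy_columns` and the choice of NASDAQ as the omitted base category are assumptions for illustration, not something the slides specify.

```python
EXCHANGES = ["NYSE", "AMEX", "NASDAQ"]

# Made-up listing data for five firms (illustration only).
listings = ["NYSE", "NASDAQ", "AMEX", "NYSE", "NASDAQ"]


def dummy_columns(values, categories, base=None):
    """Build one 0/1 column per category, skipping the base category if given."""
    kept = [c for c in categories if c != base]
    return {c: [1 if v == c else 0 for v in values] for c in kept}


# With all three indicators, every row sums to 1, which duplicates the
# intercept column of ones: the regressors are perfectly collinear.
all_three = dummy_columns(listings, EXCHANGES)
row_sums = [sum(col[i] for col in all_three.values()) for i in range(len(listings))]

# Dropping one category (here NASDAQ, an arbitrary choice) removes the problem;
# NASDAQ firms are then the rows where both remaining indicators are 0.
two_only = dummy_columns(listings, EXCHANGES, base="NASDAQ")

print(row_sums)
print(two_only)
```

The coefficients on the two retained indicators are then read as differences from the omitted category.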
Example 7.1 Employment Discrimination
If two groups have apparently different salary structures, you first need to account for differences in education, training and experience before any claim of discrimination can be made.
Regression analysis with an indicator variable for the group is a way to investigate this.
The data set HARRIS7 contains information on the salaries of 93 employees of the Harris Trust and Savings Bank. The bank was sued by the US Department of Treasury in 1981.
Here we examine how salary depends on education, also accounting for gender.
An Intercept Adjuster
For an indicator variable, b_j is not really a slope: it shifts the intercept for the group coded 1.
To see this, evaluate the equation for the two groups.
What if the Coding Was Different?
If we had instead used an indicator for females, the equation would be:
SALARY = 4865 + 80.7 EDUCAT - 692 FEMALES
The difference between the groups is the same. For females, the intercept in the equation is 4865 - 692 = 4173.
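The arithmetic can be checked by evaluating the equation for each group. The female-coded equation below is the one quoted above. The male-coded version is not quoted in this text: it is derived here from the stated intercept of 4173 and the 692 gap, and the function names are hypothetical.

```python
def salary_female_coding(educat, females):
    """SALARY = 4865 + 80.7 EDUCAT - 692 FEMALES (equation quoted above)."""
    return 4865 + 80.7 * educat - 692 * females


def salary_male_coding(educat, males):
    """Equivalent equation with a MALES indicator (derived, not quoted)."""
    return 4173 + 80.7 * educat + 692 * males


for educat in (8, 12, 16):
    female = salary_female_coding(educat, females=1)
    male = salary_female_coding(educat, females=0)
    # The gap is the indicator coefficient at every education level,
    # so the coefficient moves the intercept and leaves the slope alone.
    print(educat, round(female, 1), round(male, 1), round(male - female, 1))
```

Both codings give the same predicted salary for each person; only the group treated as the baseline changes.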
This implies the difference between Region 3 (MW) and Region 2 (W) = b_3 = 119
And the difference between Region 2 (W) and Region 1 (S) is also 119
The sales differences may not be equal, but this coding forces them to be estimated that way.
Because location is measured by two variables in a group, we need to do a partial F test.
The full model has ADV, BONUS, SOUTH and WEST and has R² = 94.7%
The reduced model has only ADV and BONUS, with R² = 85.5%
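The partial F statistic can be computed directly from the two R² values using the standard R² form of the test. The sample size is not given in this text, so `n_obs=25` below is an assumption used only to make the sketch runnable, and `partial_f` is a hypothetical helper name.

```python
def partial_f(r2_full, r2_reduced, n_obs, k_full, q):
    """Partial F statistic in R-squared form.

    r2_full, r2_reduced: R-squared of the full and reduced models (as fractions)
    n_obs: number of observations
    k_full: number of explanatory variables in the full model
    q: number of variables dropped to get the reduced model
    """
    numerator = (r2_full - r2_reduced) / q
    denominator = (1.0 - r2_full) / (n_obs - k_full - 1)
    return numerator / denominator


# Full model: ADV, BONUS, SOUTH, WEST (k_full = 4); SOUTH and WEST dropped (q = 2).
# n_obs = 25 is an ASSUMED sample size, not a figure taken from the text above.
f_stat = partial_f(0.947, 0.855, n_obs=25, k_full=4, q=2)
print(round(f_stat, 2))
```

The statistic is compared with an F critical value on q and n - K - 1 degrees of freedom; a large value indicates that the location indicators matter as a group.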
Another type of variable used in regression models is an interaction variable.
This is usually formulated as the product of two variables; for example, x_3 = x_1 x_2
With this variable in the model, the level of x_2 changes how x_1 affects Y.
With two x variables the model is:
If we factor out x_1 we get:
so each value of x_2 yields a different slope in the relationship between y and x_1
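The two equations referred to above did not survive in this text. A form consistent with the definition x_3 = x_1 x_2 would be the following, where the coefficient symbols and the error term e are notational assumptions:

```latex
% Model with two x variables and their interaction
y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_1 x_2 + e

% The same model after factoring out x_1
y = \beta_0 + (\beta_1 + \beta_3 x_2)\, x_1 + \beta_2 x_2 + e
```

Written this way, the slope on x_1 is \beta_1 + \beta_3 x_2, which changes with the value of x_2.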
Interaction Involving an Indicator
If one of the two variables is binary, the interaction produces a model with two different slopes.
Example 7.4 Discrimination (again)
In the Harris Bank case, suppose we suspected that the salary difference by gender changed with different levels of education.
To investigate this, we created a new variable MSLOPE = EDUCAT*MALES and added it to the model.
Tests in This Model
Although the slope adjuster implies the salary gap increases with education, this effect is not really significant (t_MSLOPE = 1.16).
The overall effect of gender is now contained in two variables, so a partial F test would be needed to test for differences between male and female salaries.
7.3 Seasonal Effects in Time Series Regression
Data collected over time (say quarterly)
If we think the Y variable depends on the calendar, we can do a kind of "seasonal adjustment" by adding quarter dummies.
Q1 = 1 if this was a first quarter, Q2 = 1 if a second quarter, Q3 = 1 if a third quarter
Don't use Q4 since that is the "base"
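The quarter dummies can be built as follows. The `quarters` sequence is a made-up two-year example and `quarter_dummies` is a hypothetical helper name.

```python
# Quarter number for each observation in a made-up two-year quarterly series.
quarters = [1, 2, 3, 4, 1, 2, 3, 4]


def quarter_dummies(quarter_numbers, base=4):
    """Return 0/1 columns Q1..Q4, leaving out the base quarter."""
    return {
        f"Q{k}": [1 if q == k else 0 for q in quarter_numbers]
        for k in (1, 2, 3, 4)
        if k != base
    }


dummies = quarter_dummies(quarters)
for name, column in dummies.items():
    print(name, column)
```

Fourth-quarter observations are the rows where Q1, Q2 and Q3 are all 0, so each dummy coefficient measures the difference from the fourth quarter.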
Example 7.5 ABX Company Sales
We fit a trend to these sales in Example 3.11 by regressing sales on a time index variable.
Because this company sells winter sports merchandise, including seasonal effects should markedly improve the fit.
Are the Seasonal Effects Significant?
The strong t-ratios for Q2 and Q3 say "yes" and the model R² increased by 17.6% when we added the seasonal indicators.
With evidence this strong we probably don't need to test further.
In general, however, we would need another partial F test to see if the overall seasonal effect is significant.
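To illustrate the comparison of R² with and without seasonal indicators, here is a self-contained sketch on synthetic quarterly data. The series, its coefficients and the noise values are invented for illustration and are not the ABX sales figures; the small least-squares routine is a plain-Python stand-in for whatever statistical package would normally be used.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def r_squared(columns, y):
    """Regress y on an intercept plus the given columns; return R-squared."""
    rows = [[1.0] + [float(col[i]) for col in columns] for i in range(len(y))]
    k = len(rows[0])
    xtx = [[sum(r[a] * r[b] for r in rows) for b in range(k)] for a in range(k)]
    xty = [sum(r[a] * yi for r, yi in zip(rows, y)) for a in range(k)]
    coef = solve(xtx, xty)  # normal equations (X'X) b = X'y
    fitted = [sum(c * v for c, v in zip(coef, r)) for r in rows]
    y_bar = sum(y) / len(y)
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    sst = sum((yi - y_bar) ** 2 for yi in y)
    return 1.0 - sse / sst


# Synthetic data: 16 quarters with a trend, seasonal shifts and small noise.
t_index = list(range(1, 17))
quarter = [(t - 1) % 4 + 1 for t in t_index]
q1 = [1 if q == 1 else 0 for q in quarter]
q2 = [1 if q == 2 else 0 for q in quarter]
q3 = [1 if q == 3 else 0 for q in quarter]
noise = [3, -2, 1, -1, 2, -3, 1, 0, -1, 2, -2, 1, 0, -1, 3, -2]
sales = [200 + 2 * t + 5 * a - 20 * b - 25 * c + e
         for t, a, b, c, e in zip(t_index, q1, q2, q3, noise)]

r2_trend = r_squared([t_index], sales)
r2_seasonal = r_squared([t_index, q1, q2, q3], sales)
print(round(r2_trend, 3), round(r2_seasonal, 3))
```

Because the trend-only model is nested in the seasonal one, R² cannot fall when the indicators are added. That is why the size of the increase is judged with a partial F test and not taken at face value.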