2. Linear regression with multiple regressors
Aim of this section:
• Introduction of the multiple regression model
• OLS estimation in multiple regression
• Measures-of-fit in multiple regression
• Assumptions in the multiple regression model
• Violations of the assumptions (omitted-variable bias, multicollinearity, heteroskedasticity, autocorrelation)
5
2.1. The multiple regression model
Intuition:
• A regression model specifies a functional (parametric) relationship between a dependent (endogenous) variable Y and a set of k independent (exogenous) regressors X1, X2, . . . , Xk
• In a first step, we consider the linear multiple regression model
6
Definition 2.1: (Multiple linear regression model)
The multiple (linear) regression model is given by
Yi = β0 + β1X1i + β2X2i + . . . + βkXki + ui, i = 1, . . . , n, (2.1)
where ui denotes the error term of the ith observation.
• The intercept β0 is the expected value of Yi (for all i = 1, . . . , n) when all X-regressors equal 0
• β1, . . . , βk are the slope coefficients on the respective regressors X1, . . . , Xk
• β1, for example, is the expected change in Yi resulting from changing X1i by one unit, holding constant X2i, . . . , Xki (and analogously β2, . . . , βk)
The error term ui is called homoskedastic if the conditional variance of ui given X1i, . . . , Xki, Var(ui|X1i, . . . , Xki), is constant for i = 1, . . . , n and does not depend on the values of X1i, . . . , Xki. Otherwise, the error term is called heteroskedastic.
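To make the homoskedasticity/heteroskedasticity distinction concrete, here is a minimal Python sketch that simulates data from a two-regressor model, once with a constant error variance and once with an error variance that depends on a regressor. All numbers and variable names are illustrative assumptions, not part of the lecture material.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1000

# Two regressors (illustrative ranges) and assumed "true" coefficients
X1 = rng.uniform(10, 30, n)
X2 = rng.uniform(0, 100, n)
beta0, beta1, beta2 = 700.0, -1.0, -0.65

# Homoskedastic case: Var(u | X1, X2) is the same constant for every i
u_hom = rng.normal(0.0, 15.0, n)
Y_hom = beta0 + beta1 * X1 + beta2 * X2 + u_hom

# Heteroskedastic case: the error variance depends on X2
u_het = rng.normal(0.0, 0.3 * X2 + 1.0, n)
Y_het = beta0 + beta1 * X1 + beta2 * X2 + u_het
```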
8
Example 1: (Student performance)
• Regression of student performance (Y ) in n = 420 US districts on distinct school characteristics (factors)
• Yi: average test score in the ith district (TEST SCORE)
• X1i: average class size in the ith district (measured by the student-teacher ratio, STR)
• X2i: percentage of English learners in the ith district (PCTEL)
• Expected signs of the coefficients:
β1 < 0
β2 < 0
9
Example 2: (House prices)
• Regression of house prices (Y ) recorded for n = 546 houses sold in Windsor (Canada) on distinct housing characteristics
• Yi: sale price (in Canadian dollars) of the ith house (SALEPRICE)
• X1i: lot size (in square feet) of the ith property (LOTSIZE)
• X2i: number of bedrooms in the ith house (BEDROOMS)
• X3i: number of bathrooms in the ith house (BATHROOMS)
• X4i: number of storeys (excluding the basement) in the ith house (STOREYS)
• Expected signs of the coefficients: β1, β2, β3, β4 > 0
10
2.2. The OLS estimator in multiple regression
Now:
• Estimation of the coefficients β0, β1, . . . , βk in the multiple regression model on the basis of n observations by applying the Ordinary Least Squares (OLS) technique
Idea:
• Let b0, b1, . . . , bk be estimators of β0, β1, . . . , βk
• We can predict Yi by b0 + b1X1i + . . . + bkXki
• The prediction error is Yi − b0 − b1X1i − . . .− bkXki
11
Idea: [continued]
• The sum of the squared prediction errors over all n observations is
Σ_{i=1}^n (Yi − b0 − b1X1i − . . . − bkXki)² (2.2)
The OLS estimators β̂0, β̂1, . . . , β̂k are the values of b0, b1, . . . , bk that minimize the sum of squared prediction errors (2.2). The OLS predicted values Ŷi and residuals ûi (for i = 1, . . . , n) are
Ŷi = β̂0 + β̂1X1i + . . . + β̂kXki (2.3)
and
ûi = Yi − Ŷi. (2.4)
12
Remarks:
• The OLS estimators β̂0, β̂1, . . . , β̂k and the residuals ûi are computed from a sample of n observations of (X1i, . . . , Xki, Yi) for i = 1, . . . , n
• They are estimators of the unknown true population coefficients β0, β1, . . . , βk and of the errors ui
• There are closed-form formulas for calculating the OLS estimates from the data (see the lectures Econometrics I+II and the sketch below)
• In this lecture, we use the software package EViews
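As a rough illustration of what EViews computes internally, the following Python sketch evaluates the closed-form OLS solution β̂ = (X'X)⁻¹X'Y on simulated data; the data-generating values are assumptions chosen only for this example.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 420                                   # sample size as in Example 1

# Simulated stand-in data (the actual data set is not reproduced here)
X1 = rng.uniform(14.0, 26.0, n)           # e.g. a student-teacher ratio
X2 = rng.uniform(0.0, 90.0, n)            # e.g. a percentage of English learners
Y = 690.0 - 1.1 * X1 - 0.65 * X2 + rng.normal(0.0, 15.0, n)

# Design matrix with a leading column of ones for the intercept beta_0
X = np.column_stack([np.ones(n), X1, X2])

# Closed-form OLS solution: beta_hat = (X'X)^(-1) X'Y
beta_hat = np.linalg.solve(X.T @ X, X.T @ Y)

# Predicted values (2.3) and residuals (2.4)
Y_hat = X @ beta_hat
u_hat = Y - Y_hat

print(beta_hat)                           # estimates of beta_0, beta_1, beta_2
```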
13
Regression estimation results (EViews) for the student-performance dataset
14
Dependent Variable: TEST_SCORE
Method: Least Squares
Date: 07/02/12   Time: 16:29
Sample: 1 420
Included observations: 420

Variable      Coefficient   Std. Error   t-Statistic   Prob.
C             686.0322      7.411312     92.56555      0.0000
STR           -1.101296     0.380278     -2.896026     0.0040
PCTEL         -0.649777     0.039343     -16.51588     0.0000

R-squared            0.426431    Mean dependent var      654.1565
Adjusted R-squared   0.423680    S.D. dependent var      19.05335
S.E. of regression   14.46448    Akaike info criterion   8.188387
Sum squared resid    87245.29    Schwarz criterion       8.217246
Log likelihood       -1716.561   Hannan-Quinn criter.    8.199793
F-statistic          155.0136    Durbin-Watson stat      0.685575
Prob(F-statistic)    0.000000
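For readers who prefer open-source tools, a regression of this form could be reproduced with Python's statsmodels along the following lines. The file name student_performance.csv and the way the columns TEST_SCORE, STR and PCTEL are stored are assumptions for this sketch, not references to an actual file from the lecture.

```python
# Hypothetical replication of the regression above with Python/statsmodels.
import pandas as pd
import statsmodels.formula.api as smf

df = pd.read_csv("student_performance.csv")              # 420 districts (assumed file)
fit = smf.ols("TEST_SCORE ~ STR + PCTEL", data=df).fit()
print(fit.summary())   # coefficients, standard errors, t-statistics, R-squared, ...
```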
Predicted values Ŷi and residuals ûi for the student-performance dataset
15
[Figure: residuals (left axis, −60 to 60) and actual and fitted TEST_SCORE values (right axis, 600 to 720) plotted against the observation index 1–420; legend: Residual, Actual, Fitted]
Regression estimation results (EViews) for the house-prices dataset
16
Dependent Variable: SALEPRICE
Method: Least Squares
Date: 07/02/12   Time: 16:50
Sample: 1 546
Included observations: 546

Variable      Coefficient   Std. Error   t-Statistic   Prob.
C             -4009.550     3603.109     -1.112803     0.2663
LOTSIZE       5.429174      0.369250     14.70325      0.0000
Given the OLS assumptions the following properties of the OLS estimators β̂0, β̂1, . . . , β̂k hold:
1. β̂0, β̂1, . . . , β̂k are unbiased estimators of β0, . . . , βk.
2. β̂0, β̂1, . . . , β̂k are consistent estimators of β0, . . . , βk. (Convergence in probability)
3. In large samples β̂0, β̂1, . . . , β̂k are jointly normally distributed and each single OLS estimator β̂j, j = 0, . . . , k, is normally distributed with mean βj and variance σ²_β̂j, that is
β̂j ∼ N(βj, σ²_β̂j).
19
Remarks:
• In general, the OLS estimators are correlated
• This correlation among β̂0, β̂1, . . . , β̂k arises from the correlation among the regressors X1, . . . , Xk
• The sampling distribution of the OLS estimators will become relevant in Section 3 (hypothesis testing, confidence intervals)
20
2.3. Measures-of-fit in multiple regression
Now:
• Three well-known summary statistics that measure how well the OLS estimates fit the data
Standard error of regression (SER):
• The SER estimates the standard deviation of the error term ui (under the assumption of homoskedasticity):
SER = √( (1/(n − k − 1)) Σ_{i=1}^n ûi² )
21
Standard error of regression: [continued]
• We denote the sum of squared residuals by SSR ≡ Σ_{i=1}^n ûi², so that
SER = √( SSR/(n − k − 1) )
• Given the OLS assumptions and homoskedasticity the squared SER, (SER)², is an unbiased estimator of the unknown constant variance of the ui
• SER is a measure of the spread of the distribution of Yi around the population regression line
• Both measures, SER and SSR, are reported in the EViews regression output; a small computational sketch follows below
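A direct computation of SSR and SER from a vector of residuals might look as follows; the residuals here are simulated purely for illustration, not taken from the EViews output.

```python
import numpy as np

def ssr_and_ser(u_hat: np.ndarray, k: int) -> tuple[float, float]:
    """SSR and SER for a regression with k regressors plus an intercept."""
    n = u_hat.shape[0]
    ssr = float(np.sum(u_hat ** 2))
    ser = float(np.sqrt(ssr / (n - k - 1)))
    return ssr, ser

# Illustrative call with simulated residuals
rng = np.random.default_rng(2)
u_hat = rng.normal(0.0, 14.5, 420)
print(ssr_and_ser(u_hat, k=2))
```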
22
R²:
• The R² is the fraction of the sample variance of the Yi explained by the regressors
• Equivalently, the R² is 1 minus the fraction of the variance of the Yi not explained by the regressors (i.e. explained by the residuals)
• Denoting the explained sum of squares (ESS) and the total sum of squares (TSS) by
ESS = Σ_{i=1}^n (Ŷi − Ȳ)²  and  TSS = Σ_{i=1}^n (Yi − Ȳ)²,
respectively, we define the R² as
R² = ESS/TSS = 1 − SSR/TSS
23
R²: [continued]
• In multiple regression, the R² increases whenever an additional regressor Xk+1 is added to the regression model, unless the estimated coefficient β̂k+1 is exactly equal to zero
• Since in practice it is extremely unusual to have exactly β̂k+1 = 0, the R² generally increases (and never decreases) when a new regressor is added to the regression model
−→ An increase in the R² due to the inclusion of a new regressor does not necessarily indicate an actually improved fit of the model
24
Adjusted R²:
• The adjusted R² (in symbols: R̄²) deflates the conventional R² (both measures are computed in the sketch below):
R̄² = 1 − [(n − 1)/(n − k − 1)] · SSR/TSS
• It is always true that R̄² < R² (why?)
• When adding a new regressor Xk+1 to the model, the R̄² can increase or decrease (why?)
• The R̄² can be negative (why?)
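Both measures follow directly from SSR and TSS; a minimal sketch with artificial data and a deliberately simple "fit" (all names and numbers are illustrative assumptions):

```python
import numpy as np

def r2_and_adjusted_r2(y: np.ndarray, y_hat: np.ndarray, k: int) -> tuple[float, float]:
    """Conventional and adjusted R-squared for a fit with k regressors plus an intercept."""
    n = y.shape[0]
    ssr = np.sum((y - y_hat) ** 2)        # sum of squared residuals
    tss = np.sum((y - y.mean()) ** 2)     # total sum of squares
    r2 = 1.0 - ssr / tss
    r2_bar = 1.0 - (n - 1) / (n - k - 1) * ssr / tss
    return float(r2), float(r2_bar)

# Artificial example: the adjusted measure is always below the conventional one
rng = np.random.default_rng(3)
x = rng.normal(size=200)
y = 2.0 + 3.0 * x + rng.normal(size=200)
y_hat = 2.0 + 3.0 * x                     # a deliberately simple "fit"
print(r2_and_adjusted_r2(y, y_hat, k=1))
```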
25
2.4. Omitted-variable bias
Now:
• Discussion of a phenomenon that implies violation of the first OLS assumption on Slide 18
• This issue is known as omitted-variable bias and is extremely relevant in practice
• Although theoretically easy to grasp, avoiding this specification problem turns out to be a nontrivial task in many empirical applications
26
Definition 2.5: (Omitted-variable bias)
Consider the multiple regression model in Definition 2.1 on Slide 7. Omitted-variable bias is the bias in the OLS estimator β̂j of the coefficient βj (for j = 1, . . . , k) that arises when the associated regressor Xj is correlated with an omitted variable. More precisely, for omitted-variable bias to occur, the following two conditions must hold:
1. Xj is correlated with the omitted variable.
2. The omitted variable is a determinant of the dependent vari-able Y .
27
Example:
• Consider the house-prices dataset (Slides 16, 17)
• Using the entire set of regressors, we obtain the OLS estimate β̂2 = 2824.61 for the BEDROOMS-coefficient
• The correlation coefficients between the regressors are as follows:
Example: [continued]
• There is positive (significant) correlation between the variable BEDROOMS and all other regressors
• Excluding the other variables from the regression yields the following OLS estimates:
−→ The alternative OLS estimates of the BEDROOMS-coefficient differ substantially
29
Dependent Variable: SALEPRICE
Method: Least Squares
Date: 14/02/12   Time: 16:10
Sample: 1 546
Included observations: 546

Variable      Coefficient   Std. Error   t-Statistic   Prob.
C             28773.43      4413.753     6.519040      0.0000
BEDROOMS      13269.98      1444.598     9.185932      0.0000

R-squared            0.134284    Mean dependent var      68121.60
Adjusted R-squared   0.132692    S.D. dependent var      26702.67
S.E. of regression   24868.03    Akaike info criterion   23.08421
Sum squared resid    3.36E+11    Schwarz criterion       23.09997
Log likelihood       -6299.989   Hannan-Quinn criter.    23.09037
F-statistic          84.38135    Durbin-Watson stat      0.811875
Prob(F-statistic)    0.000000
Intuitive explanation of the omitted-variable bias:
• Consider the variable LOTSIZE as omitted
• LOTSIZE is an important variable for explaining SALEPRICE
• If we omit LOTSIZE in the regression, it will try to enter in the only way it can, namely through its positive correlation with the included variable BEDROOMS
−→ The coefficient on BEDROOMS will confound the effect of BEDROOMS and LOTSIZE on SALEPRICE
30
More formal explanation:
• Omitted-variable bias means that the first OLS assumption on Slide 18 is violated
• Reasoning:
In the multiple regression model the error term ui represents all factors other than the included regressors X1, . . . , Xk that are determinants of Yi
If an omitted variable is correlated with at least one of the included regressors X1, . . . , Xk, then ui (which contains this factor) is correlated with the set of regressors
−→ This implies that
E(ui|X1i, . . . , Xki) ≠ 0
31
Important result:
• In the case of omitted-variable bias
the OLS estimators on the corresponding included regressors are biased in finite samples
this bias does not vanish in large samples
−→ the OLS estimators are inconsistent (the simulation sketch below illustrates both points)
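A small Monte Carlo sketch can illustrate this: when the omitted regressor X2 is correlated with the included regressor X1, the slope estimate from the short regression stays away from the true value no matter how large n becomes. The data-generating process and all parameter values are assumptions made for this illustration only.

```python
import numpy as np

rng = np.random.default_rng(4)

def short_regression_slope(n: int) -> float:
    """OLS slope on X1 when the correlated regressor X2 is omitted."""
    x1 = rng.normal(size=n)
    x2 = 0.8 * x1 + rng.normal(size=n)                    # X2 is correlated with X1
    y = 1.0 + 1.0 * x1 + 2.0 * x2 + rng.normal(size=n)    # true slope on X1 is 1.0
    X = np.column_stack([np.ones(n), x1])                 # X2 is (wrongly) omitted
    b = np.linalg.solve(X.T @ X, X.T @ y)
    return float(b[1])

# The bias does not vanish as n grows: the estimate stays near 1 + 0.8 * 2 = 2.6
for n in (100, 10_000, 1_000_000):
    print(n, round(short_regression_slope(n), 3))
```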
Solutions to omitted-variable bias:
• To be discussed in Section 5
32
2.5. Multicollinearity
Definition 2.6: (Perfect multicollinearity)
Consider the multiple regression model in Definition 2.1 on Slide 7. The regressors X1, . . . , Xk are said to be perfectly multicollinear if one of the regressors is a perfect linear function of the other regressors.
Remarks:
• Under perfect multicollinearity the OLS estimates cannot be calculated due to division by zero in the OLS formulas
• Perfect multicollinearity often reflects a logical mistake in choosing the regressors or some unrecognized feature in the data set
33
Example: (Dummy variable trap)
• Consider the student-performance dataset
• Suppose we partition the school districts into the 3 categories: (1) rural, (2) suburban, (3) urban
• We represent the categories by the dummy regressors
RURALi = 1 if district i is rural and RURALi = 0 otherwise,
with SUBURBANi and URBANi defined analogously
• Since each district belongs to one and only one category, we have for each district i:
RURALi + SUBURBANi + URBANi = 1
34
Example: [continued]
• Now, let us define the constant regressor X0 associated with the intercept coefficient β0 in the multiple regression model on Slide 7 by
X0i ≡ 1 for i = 1, . . . , n
• Then, for i = 1, . . . , n, the following relationship holds among the regressors:
X0i = RURALi + SUBURBANi + URBANi
−→ Perfect multicollinearity
• To estimate the regression we must exclude either one of the dummy regressors or the constant regressor X0 (the intercept β0) from the regression
35
Theorem 2.7: (Dummy variable trap)
Let there be G different categories in the data set represented by G dummy regressors. If
1. each observation i falls into one and only one category,
2. there is an intercept (constant regressor) in the regression,
3. all G dummy regressors are included as regressors,
then regression estimation fails because of perfect multicollinearity.
Usual remedy:
• Exclude one of the dummy regressors (G − 1 dummy regressors are sufficient; see the sketch below)
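The mechanics of the trap and of the remedy can be checked numerically. In the following sketch (with made-up category assignments) the design matrix containing the constant and all G = 3 dummies has deficient column rank, while dropping one dummy restores full rank:

```python
import numpy as np

# Made-up category assignments: three categories, four districts each
category = np.repeat([0, 1, 2], 4)                 # 0 = rural, 1 = suburban, 2 = urban
n = category.size

rural    = (category == 0).astype(float)
suburban = (category == 1).astype(float)
urban    = (category == 2).astype(float)
const    = np.ones(n)                              # the constant regressor X0

# All G = 3 dummies plus the constant: RURAL + SUBURBAN + URBAN = X0,
# so the columns are linearly dependent and X'X is singular.
X_trap = np.column_stack([const, rural, suburban, urban])
print(np.linalg.matrix_rank(X_trap))               # 3, although there are 4 columns

# Remedy: drop one dummy; G - 1 = 2 dummies plus the constant have full column rank.
X_ok = np.column_stack([const, suburban, urban])
print(np.linalg.matrix_rank(X_ok))                 # 3 = number of columns
```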
36
Definition 2.8: (Imperfect multicollinearity)
Consider the multiple regression model in Definition 2.1 on Slide 7. The regressors X1, . . . , Xk are said to be imperfectly multicollinear if two or more of the regressors are highly correlated in the sense that there is a linear function of the regressors that is highly correlated with another regressor.
Remarks:
• Imperfect multicollinearity does not pose any (numeric) problems in calculating OLS estimates
• However, if regressors are imperfectly multicollinear, then the coefficients on at least one individual regressor will be imprecisely estimated (illustrated by the simulation sketch at the end of this section)
37
Remarks: [continued]
• Techniques for identifying and mitigating imperfect multicollinearity are presented in econometric textbooks (e.g. Hill et al., 2010, pp. 155-156)
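As a rough numerical illustration of this imprecision, the following simulation sketch (all parameter values are assumptions made for the example) records how the sampling spread of the estimated coefficient on X1 grows as the correlation between X1 and X2 approaches 1:

```python
import numpy as np

rng = np.random.default_rng(6)

def spread_of_beta1_hat(rho: float, n: int = 200, reps: int = 2000) -> float:
    """Monte Carlo standard deviation of the OLS estimate of beta_1
    when X1 and X2 have correlation (roughly) rho; the true beta_1 is 1."""
    estimates = []
    for _ in range(reps):
        x1 = rng.normal(size=n)
        x2 = rho * x1 + np.sqrt(1.0 - rho ** 2) * rng.normal(size=n)
        y = 1.0 * x1 + 1.0 * x2 + rng.normal(size=n)
        X = np.column_stack([np.ones(n), x1, x2])
        b = np.linalg.solve(X.T @ X, X.T @ y)
        estimates.append(b[1])
    return float(np.std(estimates))

# The estimate of beta_1 becomes much less precise as the regressors
# approach perfect collinearity
for rho in (0.0, 0.9, 0.99):
    print(rho, round(spread_of_beta1_hat(rho), 3))
```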