2. Linear regression with multiple regressors
Aim of this section:
• Introduction of the multiple regression model
• OLS estimation in multiple regression
• Measures-of-fit in multiple regression
• Assumptions in the multiple regression model
• Violations of the assumptions (omitted-variable bias, multicollinearity, heteroskedasticity, autocorrelation)
5
2.1. The multiple regression model
Intuition:
• A regression model specifies a functional (parametric) relationship between a dependent (endogenous) variable Y and a set of k independent (exogenous) regressors X1, X2, . . . , Xk
• In a first step, we consider the linear multiple regression model
6
Definition 2.1: (Multiple linear regression model)
The multiple (linear) regression model is given by
Yi = β0 + β1X1i + β2X2i + . . . + βkXki + ui, i = 1, . . . , n, (2.1)
where ui denotes the error term of the ith observation.
• The intercept β0 is the expected value of Yi (for all i = 1, . . . , n) when all X-regressors equal 0
• β1, . . . , βk are the slope coefficients on the respective regressors X1, . . . , Xk
• β1, for example, is the expected change in Yi resulting from changing X1i by one unit, holding constant X2i, . . . , Xki (and analogously β2, . . . , βk)
The error term ui is called homoskedastic if the conditional variance of ui given X1i, . . . , Xki, Var(ui|X1i, . . . , Xki), is constant for i = 1, . . . , n and does not depend on the values of X1i, . . . , Xki. Otherwise, the error term is called heteroskedastic.
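To make the homoskedasticity/heteroskedasticity distinction concrete, here is a minimal Python sketch that simulates data from a two-regressor model, once with a constant error variance and once with an error variance that depends on a regressor. All numbers and variable names are illustrative assumptions, not part of the lecture material.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1000

# Two regressors (illustrative ranges) and assumed "true" coefficients
X1 = rng.uniform(10, 30, n)
X2 = rng.uniform(0, 100, n)
beta0, beta1, beta2 = 700.0, -1.0, -0.65

# Homoskedastic case: Var(u | X1, X2) is the same constant for every i
u_hom = rng.normal(0.0, 15.0, n)
Y_hom = beta0 + beta1 * X1 + beta2 * X2 + u_hom

# Heteroskedastic case: the error variance depends on X2
u_het = rng.normal(0.0, 0.3 * X2 + 1.0, n)
Y_het = beta0 + beta1 * X1 + beta2 * X2 + u_het
```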
8
Example 1: (Student performance)
• Regression of student performance (Y ) in n = 420 US districts on distinct school characteristics (factors)
• Yi: average test score in the ith district (TEST SCORE)
• X1i: average class size in the ith district (measured by the student-teacher ratio, STR)
• X2i: percentage of English learners in the ith district (PCTEL)
• Expected signs of the coefficients:
β1 < 0
β2 < 0
9
Example 2: (House prices)
• Regression of house prices (Y ) recorded for n = 546 houses sold in Windsor (Canada) on distinct housing characteristics
• Yi: sale price (in Canadian dollars) of the ith house (SALEPRICE)
• X1i: lot size (in square feet) of the ith property (LOTSIZE)
• X2i: number of bedrooms in the ith house (BEDROOMS)
• X3i: number of bathrooms in the ith house (BATHROOMS)
• X4i: number of storeys (excluding the basement) in the ith house (STOREYS)
• Expected signs of the coefficients: β1, β2, β3, β4 > 0
10
2.2. The OLS estimator in multiple regression
Now:
• Estimation of the coefficients β0, β1, . . . , βk in the multiple regression model on the basis of n observations by applying the Ordinary Least Squares (OLS) technique
Idea:
• Let b0, b1, . . . , bk be estimators of β0, β1, . . . , βk
• We can predict Yi by b0 + b1X1i + . . . + bkXki
• The prediction error is Yi − b0 − b1X1i − . . .− bkXki
11
Idea: [continued]
• The sum of the squared prediction errors over all n observations is
Σ_{i=1}^n (Yi − b0 − b1X1i − . . . − bkXki)² (2.2)
The OLS estimators β̂0, β̂1, . . . , β̂k are the values of b0, b1, . . . , bk that minimize the sum of squared prediction errors (2.2). The OLS predicted values Ŷi and residuals ûi (for i = 1, . . . , n) are
Ŷi = β̂0 + β̂1X1i + . . . + β̂kXki (2.3)
and
ûi = Yi − Ŷi. (2.4)
12
Remarks:
• The OLS estimators β̂0, β̂1, . . . , β̂k and the residuals ûi are computed from a sample of n observations of (X1i, . . . , Xki, Yi) for i = 1, . . . , n
• They are estimators of the unknown true population coefficients β0, β1, . . . , βk and of the errors ui
• There are closed-form formulas for calculating the OLS estimates from the data (see the lectures Econometrics I+II and the sketch below)
• In this lecture, we use the software package EViews
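As a rough illustration of what EViews computes internally, the following Python sketch evaluates the closed-form OLS solution β̂ = (X'X)⁻¹X'Y on simulated data; the data-generating values are assumptions chosen only for this example.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 420                                   # sample size as in Example 1

# Simulated stand-in data (the actual data set is not reproduced here)
X1 = rng.uniform(14.0, 26.0, n)           # e.g. a student-teacher ratio
X2 = rng.uniform(0.0, 90.0, n)            # e.g. a percentage of English learners
Y = 690.0 - 1.1 * X1 - 0.65 * X2 + rng.normal(0.0, 15.0, n)

# Design matrix with a leading column of ones for the intercept beta_0
X = np.column_stack([np.ones(n), X1, X2])

# Closed-form OLS solution: beta_hat = (X'X)^(-1) X'Y
beta_hat = np.linalg.solve(X.T @ X, X.T @ Y)

# Predicted values (2.3) and residuals (2.4)
Y_hat = X @ beta_hat
u_hat = Y - Y_hat

print(beta_hat)                           # estimates of beta_0, beta_1, beta_2
```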
13
Regression estimation results (EViews) for the student-performance dataset
14
Dependent Variable: TEST_SCORE
Method: Least Squares
Date: 07/02/12   Time: 16:29
Sample: 1 420
Included observations: 420

Variable      Coefficient   Std. Error   t-Statistic   Prob.
C             686.0322      7.411312     92.56555      0.0000
STR           -1.101296     0.380278     -2.896026     0.0040
PCTEL         -0.649777     0.039343     -16.51588     0.0000

R-squared            0.426431    Mean dependent var      654.1565
Adjusted R-squared   0.423680    S.D. dependent var      19.05335
S.E. of regression   14.46448    Akaike info criterion   8.188387
Sum squared resid    87245.29    Schwarz criterion       8.217246
Log likelihood       -1716.561   Hannan-Quinn criter.    8.199793
F-statistic          155.0136    Durbin-Watson stat      0.685575
Prob(F-statistic)    0.000000
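For readers who prefer open-source tools, a regression of this form could be reproduced with Python's statsmodels along the following lines. The file name student_performance.csv and the way the columns TEST_SCORE, STR and PCTEL are stored are assumptions for this sketch, not references to an actual file from the lecture.

```python
# Hypothetical replication of the regression above with Python/statsmodels.
import pandas as pd
import statsmodels.formula.api as smf

df = pd.read_csv("student_performance.csv")              # 420 districts (assumed file)
fit = smf.ols("TEST_SCORE ~ STR + PCTEL", data=df).fit()
print(fit.summary())   # coefficients, standard errors, t-statistics, R-squared, ...
```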
Predicted values Ŷi and residuals ûi for the student-performance dataset
15
[Figure: residuals (left axis, −60 to 60) and actual and fitted TEST_SCORE values (right axis, 600 to 720) plotted against the observation index 1–420; legend: Residual, Actual, Fitted]
Regression estimation results (EViews) for the house-prices dataset
16
Dependent Variable: SALEPRICE
Method: Least Squares
Date: 07/02/12   Time: 16:50
Sample: 1 546
Included observations: 546

Variable      Coefficient   Std. Error   t-Statistic   Prob.
C             -4009.550     3603.109     -1.112803     0.2663
LOTSIZE       5.429174      0.369250     14.70325      0.0000
Given the OLS assumptions the following properties of the OLS estimators β̂0, β̂1, . . . , β̂k hold:
1. β̂0, β̂1, . . . , β̂k are unbiased estimators of β0, . . . , βk.
2. β̂0, β̂1, . . . , β̂k are consistent estimators of β0, . . . , βk. (Convergence in probability)
3. In large samples β̂0, β̂1, . . . , β̂k are jointly normally distributed and each single OLS estimator β̂j, j = 0, . . . , k, is normally distributed with mean βj and variance σ²_β̂j, that is
β̂j ∼ N(βj, σ²_β̂j).
19
Remarks:
• In general, the OLS estimators are correlated
• This correlation among β̂0, β̂1, . . . , β̂k arises from the correlation among the regressors X1, . . . , Xk
• The sampling distribution of the OLS estimators will become relevant in Section 3 (hypothesis testing, confidence intervals)
20
2.3. Measures-of-fit in multiple regression
Now:
• Three well-known summary statistics that measure how well the OLS estimates fit the data
Standard error of regression (SER):
• The SER estimates the standard deviation of the error term ui (under the assumption of homoskedasticity):
SER = √( (1/(n − k − 1)) Σ_{i=1}^n ûi² )
21
Standard error of regression: [continued]
• We denote the sum of squared residuals by SSR ≡ Σ_{i=1}^n ûi², so that
SER = √( SSR/(n − k − 1) )
• Given the OLS assumptions and homoskedasticity the squared SER, (SER)², is an unbiased estimator of the unknown constant variance of the ui
• SER is a measure of the spread of the distribution of Yi around the population regression line
• Both measures, SER and SSR, are reported in the EViews regression output; a small computational sketch follows below
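A direct computation of SSR and SER from a vector of residuals might look as follows; the residuals here are simulated purely for illustration, not taken from the EViews output.

```python
import numpy as np

def ssr_and_ser(u_hat: np.ndarray, k: int) -> tuple[float, float]:
    """SSR and SER for a regression with k regressors plus an intercept."""
    n = u_hat.shape[0]
    ssr = float(np.sum(u_hat ** 2))
    ser = float(np.sqrt(ssr / (n - k - 1)))
    return ssr, ser

# Illustrative call with simulated residuals
rng = np.random.default_rng(2)
u_hat = rng.normal(0.0, 14.5, 420)
print(ssr_and_ser(u_hat, k=2))
```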
22
R²:
• The R² is the fraction of the sample variance of the Yi explained by the regressors
• Equivalently, the R² is 1 minus the fraction of the variance of the Yi not explained by the regressors (i.e. explained by the residuals)
• Denoting the explained sum of squares (ESS) and the total sum of squares (TSS) by
ESS = Σ_{i=1}^n (Ŷi − Ȳ)²  and  TSS = Σ_{i=1}^n (Yi − Ȳ)²,
respectively, we define the R² as
R² = ESS/TSS = 1 − SSR/TSS
23
R²: [continued]
• In multiple regression, the R² increases whenever an additional regressor Xk+1 is added to the regression model, unless the estimated coefficient β̂k+1 is exactly equal to zero
• Since in practice it is extremely unusual to have exactly β̂k+1 = 0, the R² generally increases (and never decreases) when a new regressor is added to the regression model
−→ An increase in the R² due to the inclusion of a new regressor does not necessarily indicate an actually improved fit of the model
24
Adjusted R²:
• The adjusted R² (in symbols: R̄²) deflates the conventional R² (both measures are computed in the sketch below):
R̄² = 1 − [(n − 1)/(n − k − 1)] · SSR/TSS
• It is always true that R̄² < R² (why?)
• When adding a new regressor Xk+1 to the model, the R̄² can increase or decrease (why?)
• The R̄² can be negative (why?)
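Both measures follow directly from SSR and TSS; a minimal sketch with artificial data and a deliberately simple "fit" (all names and numbers are illustrative assumptions):

```python
import numpy as np

def r2_and_adjusted_r2(y: np.ndarray, y_hat: np.ndarray, k: int) -> tuple[float, float]:
    """Conventional and adjusted R-squared for a fit with k regressors plus an intercept."""
    n = y.shape[0]
    ssr = np.sum((y - y_hat) ** 2)        # sum of squared residuals
    tss = np.sum((y - y.mean()) ** 2)     # total sum of squares
    r2 = 1.0 - ssr / tss
    r2_bar = 1.0 - (n - 1) / (n - k - 1) * ssr / tss
    return float(r2), float(r2_bar)

# Artificial example: the adjusted measure is always below the conventional one
rng = np.random.default_rng(3)
x = rng.normal(size=200)
y = 2.0 + 3.0 * x + rng.normal(size=200)
y_hat = 2.0 + 3.0 * x                     # a deliberately simple "fit"
print(r2_and_adjusted_r2(y, y_hat, k=1))
```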
25
2.4. Omitted-variable bias
Now:
• Discussion of a phenomenon that implies violation of the first OLS assumption on Slide 18
• This issue is known as omitted-variable bias and is extremely relevant in practice
• Although theoretically easy to grasp, avoiding this specification problem turns out to be a nontrivial task in many empirical applications
26
Definition 2.5: (Omitted-variable bias)
Consider the multiple regression model in Definition 2.1 on Slide 7. Omitted-variable bias is the bias in the OLS estimator β̂j of the coefficient βj (for j = 1, . . . , k) that arises when the associated regressor Xj is correlated with an omitted variable. More precisely, for omitted-variable bias to occur, the following two conditions must hold:
1. Xj is correlated with the omitted variable.
2. The omitted variable is a determinant of the dependent vari-able Y .
27
Example:
• Consider the house-prices dataset (Slides 16, 17)
• Using the entire set of regressors, we obtain the OLS estimate β̂2 = 2824.61 for the BEDROOMS-coefficient
• The correlation coefficients between the regressors are as follows:
Example: [continued]
• There is positive (significant) correlation between the variable BEDROOMS and all other regressors
• Excluding the other variables from the regression yields the following OLS estimates:
−→ The alternative OLS estimates of the BEDROOMS-coefficient differ substantially
29
Dependent Variable: SALEPRICE
Method: Least Squares
Date: 14/02/12   Time: 16:10
Sample: 1 546
Included observations: 546

Variable      Coefficient   Std. Error   t-Statistic   Prob.
C             28773.43      4413.753     6.519040      0.0000
BEDROOMS      13269.98      1444.598     9.185932      0.0000

R-squared            0.134284    Mean dependent var      68121.60
Adjusted R-squared   0.132692    S.D. dependent var      26702.67
S.E. of regression   24868.03    Akaike info criterion   23.08421
Sum squared resid    3.36E+11    Schwarz criterion       23.09997
Log likelihood       -6299.989   Hannan-Quinn criter.    23.09037
F-statistic          84.38135    Durbin-Watson stat      0.811875
Prob(F-statistic)    0.000000
Intuitive explanation of the omitted-variable bias:
• Consider the variable LOTSIZE as omitted
• LOTSIZE is an important variable for explaining SALEPRICE
• If we omit LOTSIZE in the regression, it will try to enter in the only way it can, namely through its positive correlation with the included variable BEDROOMS
−→ The coefficient on BEDROOMS will confound the effect of BEDROOMS and LOTSIZE on SALEPRICE
30
More formal explanation:
• Omitted-variable bias means that the first OLS assumption on Slide 18 is violated
• Reasoning:
In the multiple regression model the error term ui represents all factors other than the included regressors X1, . . . , Xk that are determinants of Yi
If an omitted variable is correlated with at least one of the included regressors X1, . . . , Xk, then ui (which contains this factor) is correlated with the set of regressors
−→ This implies that
E(ui|X1i, . . . , Xki) ≠ 0
31
Important result:
• In the case of omitted-variable bias
the OLS estimators on the corresponding included regressors are biased in finite samples
this bias does not vanish in large samples
−→ the OLS estimators are inconsistent (the simulation sketch below illustrates both points)
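A small Monte Carlo sketch can illustrate this: when the omitted regressor X2 is correlated with the included regressor X1, the slope estimate from the short regression stays away from the true value no matter how large n becomes. The data-generating process and all parameter values are assumptions made for this illustration only.

```python
import numpy as np

rng = np.random.default_rng(4)

def short_regression_slope(n: int) -> float:
    """OLS slope on X1 when the correlated regressor X2 is omitted."""
    x1 = rng.normal(size=n)
    x2 = 0.8 * x1 + rng.normal(size=n)                    # X2 is correlated with X1
    y = 1.0 + 1.0 * x1 + 2.0 * x2 + rng.normal(size=n)    # true slope on X1 is 1.0
    X = np.column_stack([np.ones(n), x1])                 # X2 is (wrongly) omitted
    b = np.linalg.solve(X.T @ X, X.T @ y)
    return float(b[1])

# The bias does not vanish as n grows: the estimate stays near 1 + 0.8 * 2 = 2.6
for n in (100, 10_000, 1_000_000):
    print(n, round(short_regression_slope(n), 3))
```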
Solutions to omitted-variable bias:
• To be discussed in Section 5
32
2.5. Multicollinearity
Definition 2.6: (Perfect multicollinearity)
Consider the multiple regression model in Definition 2.1 on Slide 7. The regressors X1, . . . , Xk are said to be perfectly multicollinear if one of the regressors is a perfect linear function of the other regressors.
Remarks:
• Under perfect multicollinearity the OLS estimates cannot be calculated due to division by zero in the OLS formulas
• Perfect multicollinearity often reflects a logical mistake in choosing the regressors or some unrecognized feature in the data set
33
Example: (Dummy variable trap)
• Consider the student-performance dataset
• Suppose we partition the school districts into the 3 categories: (1) rural, (2) suburban, (3) urban
• We represent the categories by the dummy regressors
RURALi = 1 if district i is rural and RURALi = 0 otherwise,
with SUBURBANi and URBANi defined analogously
• Since each district belongs to one and only one category, we have for each district i:
RURALi + SUBURBANi + URBANi = 1
34
Example: [continued]
• Now, let us define the constant regressor X0 associated with the intercept coefficient β0 in the multiple regression model on Slide 7 by
X0i ≡ 1 for i = 1, . . . , n
• Then, for i = 1, . . . , n, the following relationship holds among the regressors:
X0i = RURALi + SUBURBANi + URBANi
−→ Perfect multicollinearity
• To estimate the regression we must exclude either one of the dummy regressors or the constant regressor X0 (the intercept β0) from the regression
35
Theorem 2.7: (Dummy variable trap)
Let there be G different categories in the data set represented by G dummy regressors. If
1. each observation i falls into one and only one category,
2. there is an intercept (constant regressor) in the regression,
3. all G dummy regressors are included as regressors,
then regression estimation fails because of perfect multicollinearity.
Usual remedy:
• Exclude one of the dummy regressors (G − 1 dummy regressors are sufficient; see the sketch below)
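The mechanics of the trap and of the remedy can be checked numerically. In the following sketch (with made-up category assignments) the design matrix containing the constant and all G = 3 dummies has deficient column rank, while dropping one dummy restores full rank:

```python
import numpy as np

# Made-up category assignments: three categories, four districts each
category = np.repeat([0, 1, 2], 4)                 # 0 = rural, 1 = suburban, 2 = urban
n = category.size

rural    = (category == 0).astype(float)
suburban = (category == 1).astype(float)
urban    = (category == 2).astype(float)
const    = np.ones(n)                              # the constant regressor X0

# All G = 3 dummies plus the constant: RURAL + SUBURBAN + URBAN = X0,
# so the columns are linearly dependent and X'X is singular.
X_trap = np.column_stack([const, rural, suburban, urban])
print(np.linalg.matrix_rank(X_trap))               # 3, although there are 4 columns

# Remedy: drop one dummy; G - 1 = 2 dummies plus the constant have full column rank.
X_ok = np.column_stack([const, suburban, urban])
print(np.linalg.matrix_rank(X_ok))                 # 3 = number of columns
```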
36
Definition 2.8: (Imperfect multicollinearity)
Consider the multiple regression model in Definition 2.1 on Slide 7. The regressors X1, . . . , Xk are said to be imperfectly multicollinear if two or more of the regressors are highly correlated in the sense that there is a linear function of the regressors that is highly correlated with another regressor.
Remarks:
• Imperfect multicollinearity does not pose any (numeric) problems in calculating OLS estimates
• However, if regressors are imperfectly multicollinear, then the coefficients on at least one individual regressor will be imprecisely estimated (illustrated by the simulation sketch at the end of this section)
37
Remarks: [continued]
• Techniques for identifying and mitigating imperfect multicollinearity are presented in econometric textbooks (e.g. Hill et al., 2010, pp. 155-156)
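As a rough numerical illustration of this imprecision, the following simulation sketch (all parameter values are assumptions made for the example) records how the sampling spread of the estimated coefficient on X1 grows as the correlation between X1 and X2 approaches 1:

```python
import numpy as np

rng = np.random.default_rng(6)

def spread_of_beta1_hat(rho: float, n: int = 200, reps: int = 2000) -> float:
    """Monte Carlo standard deviation of the OLS estimate of beta_1
    when X1 and X2 have correlation (roughly) rho; the true beta_1 is 1."""
    estimates = []
    for _ in range(reps):
        x1 = rng.normal(size=n)
        x2 = rho * x1 + np.sqrt(1.0 - rho ** 2) * rng.normal(size=n)
        y = 1.0 * x1 + 1.0 * x2 + rng.normal(size=n)
        X = np.column_stack([np.ones(n), x1, x2])
        b = np.linalg.solve(X.T @ X, X.T @ y)
        estimates.append(b[1])
    return float(np.std(estimates))

# The estimate of beta_1 becomes much less precise as the regressors
# approach perfect collinearity
for rho in (0.0, 0.9, 0.99):
    print(rho, round(spread_of_beta1_hat(rho), 3))
```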