Basic Econometrics Econometric Analysis (56268) Dr. Keshab Bhattarai Hull Univ. Business School February 7, 2011 Dr. Bhattarai (Hull Univ. Business School) Classical and multiple regression February 7, 2011 1 / 243
Algebra of Matrices
\[
A=\begin{bmatrix}a_{11}&a_{12}\\a_{21}&a_{22}\end{bmatrix};\quad
B=\begin{bmatrix}b_{11}&b_{12}\\b_{21}&b_{22}\end{bmatrix};\quad
C=\begin{bmatrix}c_{11}&c_{12}\\c_{21}&c_{22}\end{bmatrix}
\]
Addition:
\[
A+B=\begin{bmatrix}a_{11}+b_{11}&a_{12}+b_{12}\\a_{21}+b_{21}&a_{22}+b_{22}\end{bmatrix}\tag{1}
\]
Subtraction:
\[
A-B=\begin{bmatrix}a_{11}-b_{11}&a_{12}-b_{12}\\a_{21}-b_{21}&a_{22}-b_{22}\end{bmatrix}\tag{2}
\]
Multiplication:
\[
AB=\begin{bmatrix}a_{11}b_{11}+a_{12}b_{21}&a_{11}b_{12}+a_{12}b_{22}\\a_{21}b_{11}+a_{22}b_{21}&a_{21}b_{12}+a_{22}b_{22}\end{bmatrix}\tag{3}
\]
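The three operations above can be checked numerically; a minimal pure-Python sketch (the numeric matrices are illustrative, not from the slides):

```python
def mat_add(A, B):
    # elementwise sum, eq. (1)
    return [[A[i][j] + B[i][j] for j in range(2)] for i in range(2)]

def mat_sub(A, B):
    # elementwise difference, eq. (2)
    return [[A[i][j] - B[i][j] for j in range(2)] for i in range(2)]

def mat_mul(A, B):
    # row-by-column product, eq. (3)
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

A = [[1, 2], [3, 4]]
B = [[5, 6], [7, 8]]
print(mat_add(A, B))  # [[6, 8], [10, 12]]
print(mat_mul(A, B))  # [[19, 22], [43, 50]]
```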
Determinant and Transpose of Matrices
Determinant of A:
\[
|A|=\begin{vmatrix}a_{11}&a_{12}\\a_{21}&a_{22}\end{vmatrix}=a_{11}a_{22}-a_{21}a_{12}\tag{4}
\]
Similarly, $|B|=b_{11}b_{22}-b_{21}b_{12}$ and $|C|=c_{11}c_{22}-c_{21}c_{12}$.
Transposes of A, B and C:
\[
A'=\begin{bmatrix}a_{11}&a_{21}\\a_{12}&a_{22}\end{bmatrix};\quad
B'=\begin{bmatrix}b_{11}&b_{21}\\b_{12}&b_{22}\end{bmatrix};\quad
C'=\begin{bmatrix}c_{11}&c_{21}\\c_{12}&c_{22}\end{bmatrix}\tag{5}
\]
A matrix D is singular if $|D|=0$ and non-singular if $|D|\neq 0$.
Inverse of A
\[
A^{-1}=\begin{bmatrix}a_{11}&a_{12}\\a_{21}&a_{22}\end{bmatrix}^{-1}=\frac{1}{|A|}\,adj(A)\tag{6}
\]
\[
adj(A)=C'\tag{7}
\]
where C is the cofactor matrix. To obtain the cofactor of an element, cross out the row and column corresponding to that element, take the determinant of what remains, and multiply by $(-1)^{i+j}$:
\[
C=\begin{bmatrix}|a_{22}|&-|a_{21}|\\-|a_{12}|&|a_{11}|\end{bmatrix}=\begin{bmatrix}a_{22}&-a_{21}\\-a_{12}&a_{11}\end{bmatrix}\tag{8}
\]
\[
C'=\begin{bmatrix}a_{22}&-a_{21}\\-a_{12}&a_{11}\end{bmatrix}'=\begin{bmatrix}a_{22}&-a_{12}\\-a_{21}&a_{11}\end{bmatrix}\tag{9}
\]
Inverse of A
\[
A^{-1}=\frac{1}{a_{11}a_{22}-a_{21}a_{12}}\begin{bmatrix}a_{22}&-a_{12}\\-a_{21}&a_{11}\end{bmatrix}
=\begin{bmatrix}\frac{a_{22}}{a_{11}a_{22}-a_{21}a_{12}}&\frac{-a_{12}}{a_{11}a_{22}-a_{21}a_{12}}\\\frac{-a_{21}}{a_{11}a_{22}-a_{21}a_{12}}&\frac{a_{11}}{a_{11}a_{22}-a_{21}a_{12}}\end{bmatrix}\tag{10}
\]
Exercise: Find $B^{-1}$.
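Equation (10) can be sketched in pure Python; the numeric matrix below is the coefficient matrix used in the market example later in the slides:

```python
def inv2(A):
    # 2x2 inverse: (1/|A|) * adj(A), eqs. (6)-(10)
    det = A[0][0] * A[1][1] - A[1][0] * A[0][1]
    if det == 0:
        raise ValueError("singular matrix")
    return [[A[1][1] / det, -A[0][1] / det],
            [-A[1][0] / det, A[0][0] / det]]

A = [[5, -1], [-1, 3]]   # |A| = 14
Ainv = inv2(A)           # (that is, [[3/14, 1/14], [1/14, 5/14]])
print(Ainv)
```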
Matrix Algebra
Market 1:
\[
X_1^d=10-2p_1+p_2\tag{11}
\]
\[
X_1^s=-2+3p_1\tag{12}
\]
Market 2:
\[
X_2^d=15+p_1-p_2\tag{13}
\]
\[
X_2^s=-1+2p_2\tag{14}
\]
$X_1^d=X_1^s$ implies $10-2p_1+p_2=-2+3p_1$, i.e. $5p_1-p_2=12$.
$X_2^d=X_2^s$ implies $15+p_1-p_2=-1+2p_2$, i.e. $-p_1+3p_2=16$.
\[
\begin{bmatrix}5&-1\\-1&3\end{bmatrix}\begin{bmatrix}p_1\\p_2\end{bmatrix}=\begin{bmatrix}12\\16\end{bmatrix}\tag{15}
\]
Application of Matrix in Solving Equations
\[
\begin{bmatrix}p_1\\p_2\end{bmatrix}=\begin{bmatrix}5&-1\\-1&3\end{bmatrix}^{-1}\begin{bmatrix}12\\16\end{bmatrix}\tag{16}
\]
\[
|A|=\begin{vmatrix}5&-1\\-1&3\end{vmatrix}=5\cdot 3-(-1)(-1)=15-1=14
\]
\[
C'=\begin{bmatrix}3&1\\1&5\end{bmatrix}
\]
\[
\begin{bmatrix}p_1\\p_2\end{bmatrix}=\frac{1}{14}\begin{bmatrix}3&1\\1&5\end{bmatrix}\begin{bmatrix}12\\16\end{bmatrix}
=\frac{1}{14}\begin{bmatrix}3\cdot 12+1\cdot 16\\1\cdot 12+5\cdot 16\end{bmatrix}
=\begin{bmatrix}52/14\\92/14\end{bmatrix}=\begin{bmatrix}26/7\\46/7\end{bmatrix}\tag{17}
\]
Cramers Rule
\[
p_1=\frac{\begin{vmatrix}12&-1\\16&3\end{vmatrix}}{\begin{vmatrix}5&-1\\-1&3\end{vmatrix}}=\frac{36+16}{15-1}=\frac{26}{7};\qquad
p_2=\frac{\begin{vmatrix}5&12\\-1&16\end{vmatrix}}{\begin{vmatrix}5&-1\\-1&3\end{vmatrix}}=\frac{80+12}{15-1}=\frac{46}{7}\tag{18}
\]
Market 1:
\[
LHS=10-2p_1+p_2=10-2\cdot\frac{26}{7}+\frac{46}{7}=\frac{64}{7}=-2+3p_1=RHS\tag{19}
\]
Market 2:
\[
LHS=15+p_1-p_2=15+\frac{26}{7}-\frac{46}{7}=\frac{85}{7}=-1+2p_2=RHS\tag{20}
\]
QED. The extension to N markets is straightforward, which gives confidence for solving large models.
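System (15) can be solved by Cramer's rule exactly as in equation (18); a small pure-Python sketch that also verifies the equilibrium conditions (19)-(20):

```python
# Solve [5 -1; -1 3][p1; p2] = [12; 16], eq. (15)
a11, a12, a21, a22 = 5, -1, -1, 3
b1, b2 = 12, 16

det = a11 * a22 - a21 * a12            # |A| = 14

# Cramer's rule, eq. (18): replace one column of A by the RHS vector
p1 = (b1 * a22 - b2 * a12) / det       # p1 = 26/7 (approx 3.714)
p2 = (a11 * b2 - a21 * b1) / det       # p2 = 46/7 (approx 6.571)

# Equilibrium check in both markets, eqs. (19)-(20)
assert abs((10 - 2 * p1 + p2) - (-2 + 3 * p1)) < 1e-12
assert abs((15 + p1 - p2) - (-1 + 2 * p2)) < 1e-12
print(p1, p2)
```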
Spectral Decomposition of a Matrix
\[
|A-\lambda I|=\left|\begin{bmatrix}5&-1\\-1&3\end{bmatrix}-\begin{bmatrix}\lambda&0\\0&\lambda\end{bmatrix}\right|
=\begin{vmatrix}5-\lambda&-1\\-1&3-\lambda\end{vmatrix}=0\tag{21}
\]
where $\lambda$ is an eigenvalue.
\[
(5-\lambda)(3-\lambda)-1=0\tag{22}
\]
\[
15-5\lambda-3\lambda+\lambda^2-1=0\quad\text{or}\quad\lambda^2-8\lambda+14=0\tag{23}
\]
Eigenvalues:
\[
\lambda_1,\lambda_2=\frac{8\pm\sqrt{8^2-4\cdot 14}}{2}=\frac{8\pm\sqrt{8}}{2}=\frac{8\pm 2.83}{2}=5.4,\;2.6\tag{24}
\]
\[
\begin{bmatrix}5-\lambda&-1\\-1&3-\lambda\end{bmatrix}\begin{bmatrix}x_1\\x_2\end{bmatrix}=\begin{bmatrix}0\\0\end{bmatrix}\tag{25}
\]
Spectral Decomposition of a Matrix
For $\lambda_1=5.4$:
\[
\begin{bmatrix}5-5.4&-1\\-1&3-5.4\end{bmatrix}\begin{bmatrix}x_1\\x_2\end{bmatrix}=\begin{bmatrix}0\\0\end{bmatrix}\tag{26}
\]
\[
\begin{bmatrix}-0.4&-1\\-1&-2.4\end{bmatrix}\begin{bmatrix}x_1\\x_2\end{bmatrix}=\begin{bmatrix}0\\0\end{bmatrix}\tag{27}
\]
The first row gives $x_2=-0.4x_1$. Normalisation:
\[
x_1^2+x_2^2=1;\qquad x_1^2+(-0.4x_1)^2=1\tag{28}
\]
\[
1.16x_1^2=1;\quad x_1^2=\frac{1}{1.16};\quad x_1=\sqrt{0.862}=0.928\tag{29}
\]
\[
x_2=-0.4x_1=-0.4(0.928)=-0.371\tag{30}
\]
Eigenvector 1
\[
V_1=\begin{bmatrix}x_1\\x_2\end{bmatrix}=\begin{bmatrix}0.928\\-0.371\end{bmatrix}\tag{31}
\]
For $\lambda_2=2.6$:
\[
\begin{bmatrix}5-2.6&-1\\-1&3-2.6\end{bmatrix}\begin{bmatrix}x_1\\x_2\end{bmatrix}=\begin{bmatrix}0\\0\end{bmatrix}\tag{32}
\]
\[
\begin{bmatrix}2.4&-1\\-1&0.4\end{bmatrix}\begin{bmatrix}x_1\\x_2\end{bmatrix}=\begin{bmatrix}0\\0\end{bmatrix}\tag{33}
\]
The first row gives $x_2=2.4x_1$. Normalisation:
\[
x_1^2+x_2^2=1;\qquad x_1^2+(2.4x_1)^2=1\tag{34}
\]
\[
6.76x_1^2=1;\quad x_1^2=\frac{1}{6.76};\quad x_1=\sqrt{0.148}=0.385\tag{35}
\]
\[
x_2=2.4x_1=2.4(0.385)=0.923\tag{36}
\]
Eigenvector 2
\[
V_2=\begin{bmatrix}x_1\\x_2\end{bmatrix}=\begin{bmatrix}0.385\\0.923\end{bmatrix}\tag{37}
\]
Orthogonality (required for GLS):
\[
(V_1)'(V_2)=0\tag{38}
\]
\[
V_1=\begin{bmatrix}0.928\\-0.371\end{bmatrix}\tag{39}
\]
\[
V_2=\begin{bmatrix}0.385\\0.923\end{bmatrix}\tag{40}
\]
\[
\begin{bmatrix}0.928&-0.371\end{bmatrix}\begin{bmatrix}0.385\\0.923\end{bmatrix}=0.357-0.342\approx 0\tag{41}
\]
(the small discrepancy comes from rounding the eigenvalues and eigenvectors).
Orthogonality (Required for GLS)
\[
(V_1)'(V_2)=0\tag{42}
\]
\[
\begin{bmatrix}0.928&-0.371\end{bmatrix}\begin{bmatrix}0.385\\0.923\end{bmatrix}=0.357-0.342\approx 0\tag{43}
\]
Stacking the eigenvectors into one matrix,
\[
(V_1\;V_2)'(V_1\;V_2)=(V_1\;V_2)(V_1\;V_2)'=I\tag{44}
\]
\[
\begin{bmatrix}0.928&0.385\\-0.371&0.923\end{bmatrix}'\begin{bmatrix}0.928&0.385\\-0.371&0.923\end{bmatrix}\approx\begin{bmatrix}1&0\\0&1\end{bmatrix}\tag{45}
\]
Diagonalisation, Trace of Matrix
The inverse of an orthogonal matrix equals its transpose. Let $Q=(V_1\;V_2)$ be the matrix of eigenvectors; then
\[
Q^{-1}=Q'\tag{46}
\]
\[
Q'AQ=\Lambda\tag{47}
\]
\[
\Lambda=\begin{bmatrix}\lambda_1&0&..&0\\0&\lambda_2&..&0\\:&:&:&:\\0&0&..&\lambda_n\end{bmatrix}\tag{48}
\]
\[
\sum_{i=1}^{n}\lambda_i=\sum_{i=1}^{n}a_{ii}\tag{49}
\]
\[
|A|=\lambda_1\lambda_2....\lambda_n\tag{50}
\]
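The spectral results above can be verified for the 2x2 matrix of the example without any linear-algebra library; a minimal sketch that solves the characteristic equation (23) directly and checks the trace, determinant and orthogonality properties (49)-(50):

```python
import math

# Eigenvalues of A = [[5, -1], [-1, 3]] from lambda^2 - 8*lambda + 14 = 0, eq. (23)
tr, det = 5 + 3, 5 * 3 - (-1) * (-1)
disc = math.sqrt(tr * tr - 4 * det)
lam1, lam2 = (tr + disc) / 2, (tr - disc) / 2   # 4 + sqrt(2), 4 - sqrt(2)

# Trace = sum of eigenvalues, determinant = product, eqs. (49)-(50)
assert abs((lam1 + lam2) - tr) < 1e-12
assert abs(lam1 * lam2 - det) < 1e-12

def eigvec(lam):
    # first row of (A - lam I)x = 0 gives x2 = (5 - lam) x1; then normalise
    x1, x2 = 1.0, 5 - lam
    norm = math.hypot(x1, x2)
    return (x1 / norm, x2 / norm)

v1, v2 = eigvec(lam1), eigvec(lam2)
# Orthogonality of eigenvectors of a symmetric matrix, eq. (38)
assert abs(v1[0] * v2[0] + v1[1] * v2[1]) < 1e-12
print(round(lam1, 3), round(lam2, 3))  # 5.414 2.586
```

The exact eigenvalues 5.414 and 2.586 are what the slides round to 5.4 and 2.6.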
Quadratic forms, Positive and Negative Definite Matrices
A quadratic form:
\[
q(x)=(x_1\;x_2)\begin{bmatrix}a_{11}&a_{12}\\a_{21}&a_{22}\end{bmatrix}\begin{bmatrix}x_1\\x_2\end{bmatrix}\tag{51}
\]
Positive definite matrix (matrix with all positive eigenvalues): for all $x\neq 0$,
\[
q(x)=x'Ax>0\tag{52}
\]
Positive semi-definite matrix:
\[
q(x)=x'Ax\geq 0\tag{53}
\]
Negative definite matrix (matrix with all negative eigenvalues): for all $x\neq 0$,
\[
q(x)=x'Ax<0\tag{54}
\]
Negative semi-definite matrix:
\[
q(x)=x'Ax\leq 0\tag{55}
\]
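Evaluating the quadratic form (51) on a few vectors illustrates positive definiteness for the matrix from the eigenvalue example (both of its eigenvalues are positive); checking a handful of vectors is illustrative, not a proof:

```python
def quad_form(A, x):
    # q(x) = x'Ax for a 2x2 matrix, eq. (51)
    return (x[0] * (A[0][0] * x[0] + A[0][1] * x[1])
            + x[1] * (A[1][0] * x[0] + A[1][1] * x[1]))

A = [[5, -1], [-1, 3]]   # eigenvalues 5.41 and 2.59, both positive
for x in [(1, 0), (0, 1), (1, 1), (-2, 3), (1, -5)]:
    assert quad_form(A, x) > 0   # eq. (52) holds on these test vectors

print(quad_form(A, (1, 1)))  # 6
```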
Generalised Least Square
Take a regression
\[
Y=X\beta+e\tag{56}
\]
Suppose the assumptions of homoskedasticity and no autocorrelation are violated:
\[
var(\varepsilon_i)\neq\sigma^2\text{ for all }i\tag{57}
\]
\[
cov(\varepsilon_i,\varepsilon_j)\neq 0\tag{58}
\]
The variance-covariance matrix of the error is then
\[
\Omega=E(ee')=\begin{bmatrix}\sigma_1^2&\sigma_{12}&..&\sigma_{1n}\\\sigma_{21}&\sigma_2^2&..&\sigma_{2n}\\:&:&:&:\\\sigma_{n1}&\sigma_{n2}&..&\sigma_n^2\end{bmatrix}\tag{59}
\]
\[
Q'\Omega Q=\Lambda\tag{60}
\]
Generalised Least Square
\[
\Omega=Q\Lambda Q'=Q\Lambda^{\frac{1}{2}}\Lambda^{\frac{1}{2}}Q'\tag{61}
\]
Define
\[
P=Q\Lambda^{-\frac{1}{2}}\tag{62}
\]
\[
P'\Omega P=I;\qquad PP'=\Omega^{-1}\tag{63}
\]
Transform the model:
\[
P'Y=P'X\beta+P'e\tag{64}
\]
\[
Y^{*}=X^{*}\beta+e^{*}\tag{65}
\]
where $Y^{*}=P'Y$, $X^{*}=P'X$ and $e^{*}=P'e$. Then
\[
\hat\beta_{GLS}=\left(X'PP'X\right)^{-1}\left(X'PP'Y\right)=\left(X'\Omega^{-1}X\right)^{-1}X'\Omega^{-1}Y\tag{66}
\]
Regression Model
Consider a linear regression model:
\[
Y_i=\beta_1+\beta_2X_i+\varepsilon_i\qquad i=1...N\tag{67}
\]
The errors represent all elements missing from this relationship; pluses and minuses cancel out, so the mean of the error is zero, $E(\varepsilon_i)=0$:
\[
\varepsilon_i\sim N\left(0,\sigma^2\right)\tag{68}
\]
Normal equations of the above regression:
\[
\sum Y_i=\hat\beta_1N+\hat\beta_2\sum X_i\tag{69}
\]
\[
\sum Y_iX_i=\hat\beta_1\sum X_i+\hat\beta_2\sum X_i^2\tag{70}
\]
Ordinary Least Square (OLS): Assumptions
The OLS assumptions on the error terms $\varepsilon_i$:
Zero mean (normality of errors):
\[
E(\varepsilon_i)=0\tag{71}
\]
Homoskedasticity:
\[
var(\varepsilon_i)=\sigma^2\text{ for all }i\tag{72}
\]
No autocorrelation:
\[
cov(\varepsilon_i,\varepsilon_j)=0\tag{73}
\]
Independence of errors from the explanatory variables:
\[
cov(\varepsilon_i,X_i)=0\tag{74}
\]
Derivation of normal equations for the OLS estimators
Choose $\hat\beta_1$ and $\hat\beta_2$ to minimise the sum of squared errors:
\[
\min S\left(\hat\beta_1,\hat\beta_2\right)=\sum\varepsilon_i^2=\sum\left(Y_i-\hat\beta_1-\hat\beta_2X_i\right)^2\tag{75}
\]
First order conditions:
\[
\frac{\partial S}{\partial\hat\beta_1}=0;\qquad\frac{\partial S}{\partial\hat\beta_2}=0\tag{76}
\]
\[
\sum\left(Y_i-\hat\beta_1-\hat\beta_2X_i\right)(-1)=0\tag{77}
\]
\[
\sum\left(Y_i-\hat\beta_1-\hat\beta_2X_i\right)(-X_i)=0\tag{78}
\]
\[
\sum Y_i=\hat\beta_1N+\hat\beta_2\sum X_i\tag{79}
\]
\[
\sum Y_iX_i=\hat\beta_1\sum X_i+\hat\beta_2\sum X_i^2\tag{80}
\]
There are two unknowns, $\hat\beta_1$ and $\hat\beta_2$, and two equations. One way to find $\hat\beta_1$ and $\hat\beta_2$ is the substitution and reduced form method.
Slope estimator by the reduced form equation method
Multiply the second equation by N and the first by $\sum X_i$:
\[
\sum X_i\sum Y_i=\hat\beta_1N\sum X_i+\hat\beta_2\left(\sum X_i\right)^2\tag{81}
\]
\[
N\sum Y_iX_i=\hat\beta_1N\sum X_i+\hat\beta_2N\sum X_i^2\tag{82}
\]
By subtraction this reduces to
\[
\sum X_i\sum Y_i-N\sum Y_iX_i=\hat\beta_2\left(\sum X_i\right)^2-\hat\beta_2N\sum X_i^2\tag{83}
\]
\[
\hat\beta_2=\frac{\sum X_i\sum Y_i-N\sum Y_iX_i}{\left(\sum X_i\right)^2-N\sum X_i^2}=\frac{\sum x_iy_i}{\sum x_i^2}\tag{84}
\]
This is the OLS estimator of $\beta_2$, the slope parameter.
Intercept estimator by the reduced form equation method
When $\hat\beta_2$ is known it is easy to find $\hat\beta_1$ by averaging the regression $Y_i=\beta_1+\beta_2X_i+\varepsilon_i$:
\[
\hat\beta_1=\bar Y-\hat\beta_2\bar X\tag{85}
\]
Proof that $\frac{\sum X_i\sum Y_i-N\sum Y_iX_i}{\left(\sum X_i\right)^2-N\sum X_i^2}=\frac{\sum x_iy_i}{\sum x_i^2}$:
\[
LHS=\frac{\sum X_i\sum Y_i-N\sum Y_iX_i}{\left(\sum X_i\right)^2-N\sum X_i^2}
=\frac{N\bar XN\bar Y-N\sum Y_iX_i}{N^2\bar X^2-N\sum X_i^2}
=\frac{N\bar X\bar Y-\sum Y_iX_i}{N\bar X^2-\sum X_i^2}
=\frac{\sum Y_iX_i-N\bar X\bar Y}{\sum X_i^2-N\bar X^2}
=\frac{\sum\left(Y_i-\bar Y\right)\left(X_i-\bar X\right)}{\sum\left(X_i-\bar X\right)^2}
=\frac{\sum x_iy_i}{\sum x_i^2}=RHS\tag{86}
\]
Normal equations in matrix form and OLS Estimators
\[
Y=X\beta+e\tag{87}
\]
\[
\sum Y_i=\hat\beta_1N+\hat\beta_2\sum X_i\tag{88}
\]
\[
\sum Y_iX_i=\hat\beta_1\sum X_i+\hat\beta_2\sum X_i^2\tag{89}
\]
\[
\begin{bmatrix}\sum Y_i\\\sum Y_iX_i\end{bmatrix}=\begin{bmatrix}N&\sum X_i\\\sum X_i&\sum X_i^2\end{bmatrix}\begin{bmatrix}\hat\beta_1\\\hat\beta_2\end{bmatrix}\tag{90}
\]
OLS estimators:
\[
\begin{bmatrix}\hat\beta_1\\\hat\beta_2\end{bmatrix}=\begin{bmatrix}N&\sum X_i\\\sum X_i&\sum X_i^2\end{bmatrix}^{-1}\begin{bmatrix}\sum Y_i\\\sum Y_iX_i\end{bmatrix};\qquad
\hat\beta=\left(X'X\right)^{-1}X'Y\tag{91}
\]
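For a simple regression, equation (91) is a 2x2 system that can be solved directly; a minimal pure-Python sketch using the worked data set that appears later in the slides (X = 1..6, Y = 6,3,4,3,2,1). Exact arithmetic gives 6.067 and -0.829, which the slides round to 6.09 and -0.833:

```python
# OLS beta = (X'X)^{-1} X'Y for a simple regression, eq. (91)
X = [1, 2, 3, 4, 5, 6]
Y = [6, 3, 4, 3, 2, 1]
N = len(X)

sx = sum(X); sxx = sum(x * x for x in X)
sy = sum(Y); sxy = sum(x * y for x, y in zip(X, Y))

# X'X = [[N, sx], [sx, sxx]];  X'Y = [sy, sxy]
det = N * sxx - sx * sx
b1 = (sxx * sy - sx * sxy) / det   # intercept
b2 = (N * sxy - sx * sy) / det     # slope
print(round(b1, 3), round(b2, 3))  # 6.067 -0.829
```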
Analysis of Variance
\[
\sum\left(Y_i-\bar Y\right)^2=\sum\left[\left(\hat Y_i-\bar Y\right)+\hat e_i\right]^2
=\sum\left(\hat Y_i-\bar Y\right)^2+\sum\hat e_i^2+2\sum\left(\hat Y_i-\bar Y\right)\hat e_i
\]
The cross-product term is zero, so
\[
\sum\left(Y_i-\bar Y\right)^2=\sum\left(\hat Y_i-\bar Y\right)^2+\sum\hat e_i^2\tag{92}
\]
\[
TSS=ESS+RSS\tag{93}
\]
For N observations and K explanatory variables:
[Total variation] = [Explained variation] + [Residual variation]
df: N-1 = (K-1) + (N-K)
Relation between R-square and R-bar-square
Prove that the two forms $\bar R^2=1-\left(1-R^2\right)\frac{N-1}{N-K}$ and $\bar R^2=R^2\frac{N-1}{N-K}-\frac{K-1}{N-K}$ are equivalent.
Proof:
\[
\bar R^2=1-\left(1-R^2\right)\frac{N-1}{N-K}
=\frac{(N-K)-(N-1)+R^2(N-1)}{N-K}
=\frac{R^2(N-1)-(K-1)}{N-K}
=R^2\frac{N-1}{N-K}-\frac{K-1}{N-K}\qquad QED\tag{94}
\]
Linearity of slope and intercept parameters (pages 42-46)
Consider a linear regression
\[
Y_i=\beta_1+\beta_2X_i+\varepsilon_i\qquad i=1...N\tag{95}
\]
\[
\varepsilon_i\sim N\left(0,\sigma^2\right)\tag{96}
\]
The intercept and slope are linear in the dependent variable:
\[
\hat\beta_2=\frac{\sum x_iy_i}{\sum x_i^2}=\sum w_iy_i\tag{97}
\]
where $w_i=\frac{x_i}{\sum x_i^2}$ is a constant.
\[
\hat\beta_1=\bar Y-\hat\beta_2\bar X=\frac{\sum Y_i}{N}-\bar X\sum w_iy_i\tag{98}
\]
Thus $\hat\beta_2$ and $\hat\beta_1$ are linear in $Y_i$.
Unbiasedness of Intercept Parameter
\[
\hat\beta_1=\bar Y-\hat\beta_2\bar X\tag{99}
\]
\[
E(\hat\beta_1)=E\left[\frac{\sum\left(\beta_1+\beta_2X_i+\varepsilon_i\right)}{N}\right]-E(\hat\beta_2)\bar X\tag{100}
\]
\[
E(\hat\beta_1)=E\left[\frac{N\beta_1}{N}+\frac{\beta_2\sum X_i}{N}+\frac{\sum\varepsilon_i}{N}\right]-E(\hat\beta_2)\bar X\tag{101}
\]
\[
E(\hat\beta_1)=\beta_1+\beta_2\bar X-E(\hat\beta_2)\bar X\tag{102}
\]
\[
E(\hat\beta_1)-\beta_1=\bar X\left[\beta_2-E(\hat\beta_2)\right]\tag{103}
\]
Since $E(\hat\beta_2)=\beta_2$,
\[
E(\hat\beta_1)=\beta_1\tag{104}
\]
Unbiasedness of Slope Parameter
\[
\hat\beta_2=\frac{\sum x_iy_i}{\sum x_i^2}=\sum w_iy_i;\qquad w_i=\frac{x_i}{\sum x_i^2}\tag{105}
\]
\[
E(\hat\beta_2)=E\left(\sum w_iy_i\right)=E\left[\sum w_i\left(\beta_1+\beta_2X_i+\varepsilon_i\right)\right]\tag{106}
\]
\[
E(\hat\beta_2)=\beta_1E\left(\sum w_i\right)+\beta_2E\left(\sum w_iX_i\right)+E\left(\sum w_i\varepsilon_i\right)\tag{107}
\]
Since $\sum w_i=0$, $\sum w_iX_i=1$ and $E\left(\sum w_i\varepsilon_i\right)=0$,
\[
E(\hat\beta_2)=\beta_2\tag{108}
\]
Minimum Variance of Slope Parameter
\[
\hat\beta_2=\sum w_iy_i\tag{109}
\]
\[
var(\hat\beta_2)=var\left[\sum\frac{X_i-\bar X}{\sum\left(X_i-\bar X\right)^2}y_i\right]=\frac{1}{\sum x_i^2}\sigma^2\tag{110}
\]
Take $b_2$, any other linear and unbiased estimator. We need to prove that $var(b_2)>var(\hat\beta_2)$.
\[
b_2=\sum k_iy_i;\qquad k_i=w_i+c_i\tag{111}
\]
\[
E(b_2)=E\left[\sum k_i\left(\beta_1+\beta_2X_i+\varepsilon_i\right)\right]\tag{112}
\]
\[
=E\left[\sum w_i\left(\beta_1+\beta_2X_i+\varepsilon_i\right)+\sum c_i\left(\beta_1+\beta_2X_i+\varepsilon_i\right)\right]\tag{113}
\]
\[
E(b_2)=E\left[\beta_1\sum w_i+\beta_2\sum w_iX_i+\sum w_i\varepsilon_i+\beta_1\sum c_i+\beta_2\sum c_iX_i+\sum c_i\varepsilon_i\right]\tag{114}
\]
Unbiasedness requires $\sum c_i=0$ and $\sum c_iX_i=0$, so
\[
E(b_2)=\beta_2\tag{115}
\]
Minimum Variance of Slope Parameter (cont.)
\[
E(b_2)=\beta_2\tag{116}
\]
\[
var(b_2)=E\left[b_2-\beta_2\right]^2=E\left(\sum k_i\varepsilon_i\right)^2=E\left[\sum\left(w_i+c_i\right)\varepsilon_i\right]^2\tag{117}
\]
\[
var(b_2)=\frac{1}{\sum x_i^2}\sigma^2+\sigma^2\sum c_i^2=var(\hat\beta_2)+\sigma^2\sum c_i^2\tag{118}
\]
\[
var(b_2)>var(\hat\beta_2)\tag{119}
\]
QED. Thus the OLS slope estimator is the best linear unbiased estimator (BLUE). A similar proof applies to $var(\hat\beta_1)$.
Consistency of OLS Estimator: Large Sample or Asymptotic Property
\[
var(\hat\beta_2)=\frac{1}{\sum x_i^2}\sigma^2\tag{120}
\]
\[
\lim_{N\to\infty}var(\hat\beta_2)=\lim_{N\to\infty}\frac{\sigma^2/N}{\sum x_i^2/N}=0\tag{121}
\]
since $\sum x_i^2/N$ tends to a positive constant while $\sigma^2/N\to 0$.
Covariance between Slope and Intercept Parameters
\[
cov(\hat\beta_1,\hat\beta_2)=E\left[\left(\hat\beta_1-E\hat\beta_1\right)\left(\hat\beta_2-E\hat\beta_2\right)\right]
=E\left[\left(\hat\beta_1-\beta_1\right)\left(\hat\beta_2-\beta_2\right)\right]
=-\bar XE\left(\hat\beta_2-\beta_2\right)^2\tag{122}
\]
using $\hat\beta_1-\beta_1=\bar\varepsilon-\bar X\left(\hat\beta_2-\beta_2\right)$ and $E\left[\bar\varepsilon\left(\hat\beta_2-\beta_2\right)\right]=0$. Hence
\[
cov(\hat\beta_1,\hat\beta_2)=-\bar X\frac{1}{\sum x_i^2}\sigma^2\tag{123}
\]
Regression in Matrix Form
Let Y be an N x 1 vector of dependent variables, X an N x K matrix of explanatory variables, e an N x 1 vector of independently and identically distributed normal random errors with mean zero and constant variance, $e\sim N(0,\sigma^2I)$; $\beta$ is a K x 1 vector of unknown coefficients.
\[
Y=X\beta+e\tag{124}
\]
The objective is to minimise the sum of squared errors:
\[
\min_\beta S(\beta)=e'e=(Y-X\beta)'(Y-X\beta)=Y'Y-Y'X\beta-\beta'X'Y+\beta'X'X\beta\tag{125}
\]
\[
=Y'Y-2\beta'X'Y+\beta'X'X\beta\tag{126}
\]
First order condition in Matrix Method
\[
\frac{\partial S(\beta)}{\partial\beta}=-2X'Y+2X'X\hat\beta=0\tag{127}
\]
\[
\Longrightarrow\hat\beta=\left(X'X\right)^{-1}X'Y\tag{128}
\]
Blue Property in Matrix: Linearity and Unbiasedness
\[
\hat\beta=\left(X'X\right)^{-1}X'Y\tag{129}
\]
\[
\hat\beta=aY;\qquad a=\left(X'X\right)^{-1}X'\tag{130}
\]
Linearity is proved.
\[
E(\hat\beta)=E\left[\left(X'X\right)^{-1}X'(X\beta+e)\right]\tag{131}
\]
\[
E(\hat\beta)=E\left[\left(X'X\right)^{-1}X'X\beta\right]+E\left[\left(X'X\right)^{-1}X'e\right]\tag{132}
\]
\[
E(\hat\beta)=\beta+E\left[\left(X'X\right)^{-1}X'e\right]\tag{133}
\]
\[
E(\hat\beta)=\beta\tag{134}
\]
Unbiasedness is proved.
Blue Property in Matrix: Minimum Variance
\[
\hat\beta-\beta=\left(X'X\right)^{-1}X'e\tag{135}
\]
\[
E\left(\hat\beta-\beta\right)\left(\hat\beta-\beta\right)'=E\left[\left(X'X\right)^{-1}X'ee'X\left(X'X\right)^{-1}\right]\tag{136}
\]
\[
=\left(X'X\right)^{-1}X'E(ee')X\left(X'X\right)^{-1}=\sigma^2\left(X'X\right)^{-1}\tag{137}
\]
Take an alternative estimator b:
\[
b=\left[\left(X'X\right)^{-1}X'+c\right]Y\tag{138}
\]
\[
b=\left[\left(X'X\right)^{-1}X'+c\right](X\beta+e)\tag{139}
\]
\[
b-\beta=\left(X'X\right)^{-1}X'e+ce\tag{140}
\]
where unbiasedness requires $cX=0$.
Blue Property in Matrix: Minimum Variance
It now needs to be shown that
\[
cov(b)>cov(\hat\beta)\tag{141}
\]
For the alternative estimator b,
\[
b-\beta=\left(X'X\right)^{-1}X'e+ce\tag{142}
\]
\[
cov(b)=E\left[(b-\beta)(b-\beta)'\right]
=E\left[\left(\left(X'X\right)^{-1}X'+c\right)ee'\left(\left(X'X\right)^{-1}X'+c\right)'\right]
=\sigma^2\left(X'X\right)^{-1}+\sigma^2cc'\tag{143}
\]
\[
cov(b)>cov(\hat\beta)\tag{144}
\]
Proved. Thus OLS is BLUE: the Best Linear Unbiased Estimator.
What is the statistical inference?
Inference is a statement about a population based on sample information. Economic theory provides the relations; statistical inference empirically tests their validity using cross-section, time-series or panel data. Hypotheses are set up according to economic theory, and parameters are estimated using OLS (or similar) estimators. Consider a linear regression
\[
Y_i=\beta_1+\beta_2X_i+\varepsilon_i\qquad i=1...N\tag{145}
\]
The true values of $\beta_1$ and $\beta_2$ are unknown; they can be estimated using the OLS technique, and $\hat\beta_1$ and $\hat\beta_2$ are such estimators. The validity of these estimates is tested using statistical distributions. The two most important tests are:
1. Significance of an individual coefficient: t-test
2. Overall significance of the model: F-test
Overall fit of the data to the model is indicated by $R^2$ ($\chi^2$, Durbin-Watson and unit root tests follow later).
Standard hypothesis about individual coefficients (t-test)
Null hypothesis: the intercept and slope coefficients are zero.
\[
H_0:\beta_1=0;\qquad H_0:\beta_2=0
\]
Alternative hypotheses: the intercept and slope coefficients are non-zero.
\[
H_A:\beta_1\neq 0;\qquad H_A:\beta_2\neq 0
\]
Parameter $\beta_2$ is the slope, $\frac{\partial Y}{\partial X}$; it measures how much Y changes when X changes by one unit. Parameter $\beta_1$ is the intercept; it shows the amount of Y when X is zero.
Economic theory: a normal demand function should have $\beta_1>0$ and $\beta_2<0$; a normal supply function should have $\beta_1\neq 0$ and $\beta_2>0$. This is the hypothesis to be tested empirically.
Standard hypothesis about the validity of the model (F-test)
Null hypothesis: both intercept and slope coefficients are zero; the model is meaningless and irrelevant:
\[
H_0:\beta_1=\beta_2=0
\]
Alternative hypothesis: at least one of the parameters is non-zero; the model is relevant:
\[
H_A:\text{either }\beta_1\neq 0\text{ or }\beta_2\neq 0\text{ or both}
\]
As is often seen, some coefficients in a regression may be insignificant while the F-statistic is significant and the model is valid.
An Example of regression on deviations from the mean
It is easy to work with a simple example.

Table: Price and Quantity

X: 1 2 3 4 5 6
Y: 6 3 4 3 2 1

What are the estimates of $\hat\beta_1$ and $\hat\beta_2$?
Here $\sum X_i=21$; $\sum Y_i=19$; $\sum Y_iX_i=52$; $\sum X_i^2=91$; $\sum Y_i^2=75$; $\bar Y=3.17$; $\bar X=3.5$.
OLS estimators:
\[
\hat\beta_2=\frac{\sum y_ix_i}{\sum x_i^2};\qquad\hat\beta_1=\bar Y-\hat\beta_2\bar X\tag{146}
\]
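The deviation sums and estimators (146) can be reproduced directly from the data table; a small pure-Python sketch (exact arithmetic gives -0.829 and 6.067, which the slides round to -0.833 and 6.09):

```python
X = [1, 2, 3, 4, 5, 6]
Y = [6, 3, 4, 3, 2, 1]
N = len(X)
xbar, ybar = sum(X) / N, sum(Y) / N

# deviations from the means
Sxx = sum((x - xbar) ** 2 for x in X)                      # sum x_i^2 = 17.5
Sxy = sum((x - xbar) * (y - ybar) for x, y in zip(X, Y))   # sum x_i y_i = -14.5

b2 = Sxy / Sxx             # slope, eq. (146)
b1 = ybar - b2 * xbar      # intercept, eq. (146)
print(round(Sxx, 2), round(Sxy, 2))   # 17.5 -14.5
print(round(b2, 3), round(b1, 3))     # -0.829 6.067
```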
Normal Equations and Deviation Form
Normal equations of the above regression:
\[
\sum Y_i=\hat\beta_1N+\hat\beta_2\sum X_i\tag{147}
\]
\[
\sum Y_iX_i=\hat\beta_1\sum X_i+\hat\beta_2\sum X_i^2\tag{148}
\]
Define deviations as
\[
x_i=\left(X_i-\bar X\right)\tag{149}
\]
\[
y_i=\left(Y_i-\bar Y\right)\tag{150}
\]
\[
\sum\left(X_i-\bar X\right)=0;\qquad\sum\left(Y_i-\bar Y\right)=0\tag{151}
\]
Normal Equations and Deviation Form
Putting these into the normal equations:
\[
\sum\left(Y_i-\bar Y\right)=\hat\beta_1N+\hat\beta_2\sum\left(X_i-\bar X\right)\tag{152}
\]
\[
\sum\left(X_i-\bar X\right)\left(Y_i-\bar Y\right)=\hat\beta_1\sum\left(X_i-\bar X\right)+\hat\beta_2\sum\left(X_i-\bar X\right)^2\tag{153}
\]
The terms $\sum\left(X_i-\bar X\right)=0$ and $\sum\left(Y_i-\bar Y\right)=0$ drop out, with
\[
\sum\left(X_i-\bar X\right)\left(Y_i-\bar Y\right)=\sum x_iy_i\quad\text{and}\quad\sum\left(X_i-\bar X\right)^2=\sum x_i^2
\]
This is a regression through the origin. Therefore the estimator of the slope coefficient in deviation form is
\[
\hat\beta_2=\frac{\sum x_iy_i}{\sum x_i^2}\tag{154}
\]
\[
\hat\beta_1=\bar Y-\hat\beta_2\bar X\tag{155}
\]
The reliability of $\hat\beta_2$ and $\hat\beta_1$ depends on their variances; the t-test is used to determine their significance.
Deviations from the mean
Useful short-cuts (though the matrix method is more accurate, quick short-cuts like these can be handy):
\[
\sum x_i^2=\sum\left(X_i-\bar X\right)^2=\sum X_i^2-N\bar X^2=91-6(3.5)^2=17.5\tag{156}
\]
\[
\sum y_i^2=\sum\left(Y_i-\bar Y\right)^2=\sum Y_i^2-N\bar Y^2=75-6(3.17)^2=14.7\tag{157}
\]
\[
\sum y_ix_i=\sum\left(Y_i-\bar Y\right)\left(X_i-\bar X\right)=\sum Y_iX_i-\bar Y\sum X_i-\bar X\sum Y_i+N\bar Y\bar X
=\sum Y_iX_i-N\bar X\bar Y=52-6(3.5)(3.17)=-14.57\tag{158}
\]
OLS estimates by the deviation method
Estimate of the slope coefficient:
\[
\hat\beta_2=\frac{\sum y_ix_i}{\sum x_i^2}=\frac{-14.57}{17.5}=-0.833\tag{159}
\]
This is negative, as expected. Estimate of the intercept coefficient:
\[
\hat\beta_1=\bar Y-\hat\beta_2\bar X=3.17-(-0.833)(3.5)=6.09\tag{160}
\]
It is positive, as expected. Thus the regression line fitted from the data is
\[
\hat Y_i=\hat\beta_1+\hat\beta_2X_i=6.09-0.833X_i\tag{161}
\]
How reliable is this line? The answer should be based on the analysis of variance and statistical tests.
Variation of Y, Predicted Y and error
Total variation to be explained:
\[
\sum y_i^2=\sum\left(Y_i-\bar Y\right)^2=\sum Y_i^2-N\bar Y^2=75-6(3.17)^2=14.707\tag{162}
\]
Variation explained by the regression:
\[
\sum\hat y_i^2=\sum\left(\hat\beta_2x_i\right)^2=\hat\beta_2^2\sum x_i^2=\left(\frac{\sum y_ix_i}{\sum x_i^2}\right)^2\sum x_i^2
=\frac{\left(\sum y_ix_i\right)^2}{\sum x_i^2}=\frac{(-14.57)^2}{17.5}=\frac{212.28}{17.5}=12.143\tag{163}
\]
Note that in deviation form $\hat y_i=\hat\beta_2x_i$. Unexplained variation (accounted for by various errors):
\[
\sum\hat e_i^2=\sum y_i^2-\sum\hat y_i^2=14.707-12.143=2.564\tag{164}
\]
Measure of Fit: R-square and R-bar-square
The measure of fit $R^2$ is the ratio of the variation explained by the regression, $\sum\hat y_i^2$, to the total variation to be explained, $\sum y_i^2$:
\[
R^2=\frac{\sum\hat y_i^2}{\sum y_i^2}=\frac{12.143}{14.707}=0.826\tag{165}
\]
This regression model explains about 83 percent of the variation in Y.
\[
\bar R^2=1-\left(1-R^2\right)\frac{N-1}{N-K}=1-(1-0.826)\frac{5}{4}=0.78\tag{166}
\]
The variance of the error indicates the unexplained variation:
\[
var(\hat e_i)=\hat\sigma^2=\frac{\sum\hat e_i^2}{N-K}=\frac{2.564}{4}=0.641\tag{167}
\]
\[
var(y_i)=\frac{\sum y_i^2}{N-1}=\frac{14.7}{5}=2.94\tag{168}
\]
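The fit statistics (163)-(167) can be recomputed from the raw data; a minimal pure-Python sketch. Note that the slides carry rounded intermediate sums (-14.57 and 14.707), so exact arithmetic gives slightly different figures (R-square 0.81 rather than 0.826):

```python
X = [1, 2, 3, 4, 5, 6]
Y = [6, 3, 4, 3, 2, 1]
N, K = len(X), 2
xbar, ybar = sum(X) / N, sum(Y) / N

Sxx = sum((x - xbar) ** 2 for x in X)
Syy = sum((y - ybar) ** 2 for y in Y)
Sxy = sum((x - xbar) * (y - ybar) for x, y in zip(X, Y))

ess = Sxy ** 2 / Sxx           # explained variation, eq. (163)
rss = Syy - ess                # residual variation, eq. (164)
r2 = ess / Syy                 # eq. (165)
r2_bar = 1 - (1 - r2) * (N - 1) / (N - K)   # eq. (166)
sigma2 = rss / (N - K)         # eq. (167)
print(round(r2, 3), round(r2_bar, 3), round(sigma2, 3))  # 0.81 0.762 0.705
```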
Variance of Parameters
The reliability of estimated parameters depends on their variances, standard errors and t-values.
\[
var(\hat\beta_2)=\frac{1}{\sum x_i^2}\hat\sigma^2=\frac{0.641}{17.5}=0.037\tag{169}
\]
\[
var(\hat\beta_1)=\left[\frac{1}{N}+\frac{\bar X^2}{\sum x_i^2}\right]\hat\sigma^2=\left[\frac{1}{6}+\frac{3.5^2}{17.5}\right]0.641=(0.867)0.641=0.556\tag{170}
\]
(These formulae are proved later.) Standard errors:
\[
SE(\hat\beta_2)=\sqrt{var(\hat\beta_2)}=\sqrt{0.037}=0.192\tag{171}
\]
\[
SE(\hat\beta_1)=\sqrt{var(\hat\beta_1)}=\sqrt{0.556}=0.746\tag{172}
\]
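Equations (169)-(172), and the t-ratios that follow on the next slide, can be sketched in a few lines of Python. The inputs are the slides' rounded figures, so last-digit differences from the slide values (0.192, 0.746) come from intermediate rounding:

```python
import math

N, K = 6, 2
xbar, Sxx = 3.5, 17.5
sigma2 = 0.641         # slide value of sigma-hat^2, eq. (167)
b1, b2 = 6.09, -0.833  # slide estimates, eqs. (159)-(160)

var_b2 = sigma2 / Sxx                           # eq. (169)
var_b1 = (1 / N + xbar ** 2 / Sxx) * sigma2     # eq. (170)
se_b2, se_b1 = math.sqrt(var_b2), math.sqrt(var_b1)

t_b2 = b2 / se_b2   # eq. (173)
t_b1 = b1 / se_b1   # eq. (174)
print(round(se_b2, 3), round(se_b1, 3))  # 0.191 0.745
print(round(t_b2, 2), round(t_b1, 2))    # -4.35 8.17
```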
T-test
The t statistic is the ratio of a normally distributed variable (here an estimated mean) to its standard error, where the standard error follows a (scaled, square-rooted) $\chi^2$ distribution. The t-distribution was originally established by W.S. Gosset of the Guinness Brewery, publishing as "Student" in 1908.
One- and two-tailed tests: if the area in only one tail of the curve is used in testing a statistical hypothesis, the test is called one-tailed; if the areas of both tails are used, the test is two-tailed. The decision as to whether a one-tailed or a two-tailed test should be used depends on the alternative hypothesis.
One-tailed test: reject if the statistic exceeds $z_\alpha$ in the relevant tail.
Two-tailed test: reject if the statistic falls outside $-z_{\alpha/2}\leq X\leq z_{\alpha/2}$.
Test of significance of parameters (t-test)
\[
t_{\hat\beta_2}=\frac{\hat\beta_2-\beta_2}{SE(\hat\beta_2)}=\frac{\hat\beta_2-0}{SE(\hat\beta_2)}=\frac{-0.833}{0.192}=-4.339\tag{173}
\]
\[
t_{\hat\beta_1}=\frac{\hat\beta_1}{SE(\hat\beta_1)}=\frac{6.09}{0.746}=8.16\tag{174}
\]
These calculated t-values need to be compared with the theoretical values from the t-table.
Decision rule (one-tailed test following economic theory):
Accept $H_0:\beta_1>0$ if $t_{\hat\beta_1}>t_{\alpha,df}$; otherwise accept $H_A$.
Accept $H_0:\beta_2<0$ if $\left|t_{\hat\beta_2}\right|>t_{\alpha,df}$; otherwise accept $H_A$.
P-value: the probability of the test statistic exceeding the table value.
Test of significance of parameters (t-test)
Theoretical values of t are given in the t-table. Columns of the t-table give the level of significance ($\alpha$) and rows the degrees of freedom. Here $t_{\alpha,df}$ is the t-table value for $df=n-k$ degrees of freedom and significance level $\alpha$; $df=6-2=4$.

Table: Relevant t-values (one tail) from the t-table

df \ $\alpha$ | 0.05 | 0.025 | 0.005
1 | 6.314 | 12.706 | 63.657
2 | 2.920 | 4.303 | 9.925
4 | 2.132 | 2.776 | 4.604

$t_{\hat\beta_1}=8.16>t_{0.05,4}=2.132$, so the intercept is statistically significant; $\left|t_{\hat\beta_2}\right|=4.339>t_{0.05,4}=2.132$, so the slope is also statistically significant at the 5% and 2.5% levels of significance.
Confidence interval on the slope parameter
A researcher may be more interested in the interval in which the true parameter lies than in the point estimate, where $\alpha$ is the level of significance (the probability of error, such as 1% or 5%); the accuracy of the estimate is $(1-\alpha)$. A 95% confidence interval for $\beta_2$ is:
\[
P\left[\hat\beta_2-SE(\hat\beta_2)\,t_{\alpha,n}<\beta_2<\hat\beta_2+SE(\hat\beta_2)\,t_{\alpha,n}\right]=(1-\alpha)\tag{175}
\]
\[
P\left[-0.833-0.192(2.132)<\beta_2<-0.833+0.192(2.132)\right]=(1-0.05)=0.95\tag{176}
\]
\[
P\left[-1.242<\beta_2<-0.424\right]=0.95\tag{177}
\]
There is 95% confidence that the true value of the slope $\beta_2$ lies between -1.242 and -0.424.
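The interval arithmetic of (175)-(180), for both the slope and the intercept of the next slide, in a minimal sketch using the slide figures:

```python
t_crit = 2.132           # t(0.05, df=4) from the t-table above
b2, se_b2 = -0.833, 0.192
b1, se_b1 = 6.09, 0.746

lo2, hi2 = b2 - se_b2 * t_crit, b2 + se_b2 * t_crit   # eq. (176)
lo1, hi1 = b1 - se_b1 * t_crit, b1 + se_b1 * t_crit   # eq. (179)
print(round(lo2, 3), round(hi2, 3))  # -1.242 -0.424
print(round(lo1, 2), round(hi1, 2))  # 4.5 7.68
```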
Confidence interval on the intercept parameter
A 95% confidence interval on the intercept parameter:
\[
P\left[\hat\beta_1-SE(\hat\beta_1)\,t_{\alpha,n}<\beta_1<\hat\beta_1+SE(\hat\beta_1)\,t_{\alpha,n}\right]=(1-\alpha)\tag{178}
\]
\[
P\left[6.09-0.746(2.132)<\beta_1<6.09+0.746(2.132)\right]=(1-0.05)=0.95\tag{179}
\]
\[
P\left[4.500<\beta_1<7.680\right]=0.95\tag{180}
\]
There is 95% confidence that the true value of the intercept $\beta_1$ lies between 4.500 and 7.680.
F-Test
The F-value is the ratio of two sums of squared normally distributed variables ($\chi^2$), each adjusted for its degrees of freedom:
\[
F=\frac{V_1/n_1}{V_2/n_2}=F(n_1,n_2)\tag{181}
\]
where $V_1$ and $V_2$ are the variances of the numerator and denominator and $n_1$ and $n_2$ are their degrees of freedom. $H_0$: the variances are the same; $H_A$: the variances are different. $F_{crit}$ values are obtained from the F-distribution table. Accept $H_0$ if $F_{calc}<F_{crit}$ and reject if $F_{calc}>F_{crit}$.
F - is ratio of two χ2 distributed variables with degrees of freedom n2 andn1.
F is the ratio of two $\chi^2$ distributed variables with degrees of freedom $n_1$ and $n_2$:
\[
F_{calc}=\frac{\sum\hat y_i^2/(K-1)}{\sum\hat e_i^2/(N-K)}=\frac{12.143/1}{2.564/4}=\frac{12.143}{0.641}=18.94;\qquad n_1=K-1,\;n_2=N-K\tag{182}
\]

Table: Relevant F-values from the F-table

              1% level of significance   |   5% level of significance
(n2 \ n1)      1        2        3       |    1       2       3
1           4052    4999.5    5403       | 161.4   199.5   215.7
2           98.50    99.00    99.17      | 18.51   19.00   19.16
4           21.20    18.00    16.69      |  7.71    6.94    6.59

$n_1$ = degrees of freedom of the numerator; $n_2$ = degrees of freedom of the denominator. For the 5% level of significance, $F_{n_1,n_2}=F_{1,4}=7.71$ and $F_{calc}>F_{1,4}$; for the 1% level, $F_{1,4}=21.20$ and $F_{calc}<F_{1,4}$. This implies that the model is statistically significant at the 5% but not at the 1% level of significance; the model is meaningful only at the 5% level.
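Equation (182) and the comparison with the table values in one short sketch (using the slides' figures for the explained and residual sums of squares):

```python
ess, rss = 12.143, 2.564   # slide values, eqs. (163)-(164)
N, K = 6, 2
F_calc = (ess / (K - 1)) / (rss / (N - K))   # eq. (182)

F_crit_5, F_crit_1 = 7.71, 21.20   # F(1, 4) at the 5% and 1% levels
print(round(F_calc, 2))                        # 18.94
print(F_calc > F_crit_5, F_calc > F_crit_1)    # True False
```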
Prediction and error of prediction
What is the prediction of Y when X is 0.5?
\[
\hat Y_0=\hat\beta_1+\hat\beta_2X_0=6.09-0.833(0.5)=5.673\tag{183}
\]
Prediction error:
\[
f=Y_0-\hat Y_0=\beta_1+\beta_2X_0+\varepsilon_0-\hat\beta_1-\hat\beta_2X_0\tag{184}
\]
Mean of the prediction error:
\[
E(f)=E\left(\beta_1+\beta_2X_0+\varepsilon_0-\hat\beta_1-\hat\beta_2X_0\right)=0\tag{185}
\]
The predictor is unbiased.
t-test for variance of forecast
\[
t_f=\frac{Y_0-\hat Y_0}{SE(f)}\sim t_{N-2}\tag{186}
\]
Standard error of the forecast (find $var(f)$; derived below):
\[
SE(f)=\sqrt{var(f)}\tag{187}
\]
Confidence interval of the forecast:
\[
\Pr\left[-t_c\leq\frac{Y_0-\hat Y_0}{SE(f)}\leq t_c\right]=(1-\alpha)\tag{188}
\]
\[
\Pr\left[\hat Y_0-t_cSE(f)\leq Y_0\leq\hat Y_0+t_cSE(f)\right]=(1-\alpha)\tag{189}
\]
Variance of Y and error
\[
E(\hat\varepsilon_i)^2=\frac{\sum\hat e_i^2}{N-k}=\hat\sigma^2\tag{190}
\]
where N is the number of observations and k the number of parameters including the intercept.
\[
var(Y_i)=E\left[Y_i-E(Y_i)\right]^2
=E\left[\beta_1+\beta_2X_i+\varepsilon_i-\beta_1-\beta_2X_i\right]^2
=E(\varepsilon_i)^2=\sigma^2\tag{191}
\]
Variance of Slope Parameter
\[
\hat\beta_2=\frac{\sum x_iy_i}{\sum x_i^2}\tag{192}
\]
\[
\hat\beta_2=\sum w_iy_i\tag{193}
\]
where
\[
w_i=\frac{x_i}{\sum x_i^2}=\frac{X_i-\bar X}{\sum\left(X_i-\bar X\right)^2}\tag{194}
\]
\[
var(\hat\beta_2)=var\left[\sum\frac{X_i-\bar X}{\sum\left(X_i-\bar X\right)^2}y_i\right]=\sum w_i^2\,var(y_i)=\frac{1}{\sum x_i^2}\hat\sigma^2\tag{195}
\]
Variance of Intercept Parameter
\[
\hat\beta_1=\bar Y-\hat\beta_2\bar X\tag{196}
\]
\[
var(\hat\beta_1)=var\left(\bar Y-\hat\beta_2\bar X\right)
=E\left[\frac{\sum y_i}{N}-\bar X\frac{\sum x_iy_i}{\sum x_i^2}\right]^2
=E\left[\sum\left(\frac{1}{N}-\bar Xw_i\right)y_i\right]^2
=\left[\frac{N}{N^2}+\bar X^2\sum w_i^2-2\frac{1}{N}\bar X\sum w_i\right]\hat\sigma^2\tag{197}
\]
\[
=\left[\frac{1}{N}+\frac{\bar X^2}{\sum x_i^2}\right]\hat\sigma^2\tag{198}
\]
The term $2\frac{1}{N}\bar X\sum w_i=0$ because $\sum w_i=0$.
Covariance of Parameters (with Matrix)
\[
b=\begin{bmatrix}\hat\beta_1\\\hat\beta_2\end{bmatrix}=\left(X'X\right)^{-1}X'Y=\left(X'X\right)^{-1}X'(X\beta+e)=\beta+\left(X'X\right)^{-1}X'e\tag{199}
\]
\[
b-\beta=\left(X'X\right)^{-1}X'e\tag{200}
\]
\[
cov(b-\beta)=E\left[\left(X'X\right)^{-1}X'ee'X\left(X'X\right)^{-1}\right]=\left(X'X\right)^{-1}\sigma^2\tag{201}
\]
\[
\left(X'X\right)^{-1}=\begin{bmatrix}N&\sum X_i\\\sum X_i&\sum X_i^2\end{bmatrix}^{-1}\tag{202}
\]
\[
cov(\hat\beta)=\left(X'X\right)^{-1}\hat\sigma^2=\frac{1}{N\sum X_i^2-\left(\sum X_i\right)^2}\begin{bmatrix}\sum X_i^2&-\sum X_i\\-\sum X_i&N\end{bmatrix}\hat\sigma^2\tag{203}
\]
Covariance of Parameters (with Matrix)
\[
\left(X'X\right)^{-1}=\begin{bmatrix}N&\sum X_i\\\sum X_i&\sum X_i^2\end{bmatrix}^{-1}\tag{204}
\]
\[
cov(b-\beta)=\begin{bmatrix}var(b_1)&cov(b_1,b_2)\\cov(b_1,b_2)&var(b_2)\end{bmatrix}
=\begin{bmatrix}\frac{\sum X_i^2}{N\sum\left(X_i-\bar X\right)^2}&\frac{-\bar X}{\sum\left(X_i-\bar X\right)^2}\\\frac{-\bar X}{\sum\left(X_i-\bar X\right)^2}&\frac{1}{\sum\left(X_i-\bar X\right)^2}\end{bmatrix}\sigma^2\tag{205}
\]
Variance of Prediction error
\[
var(f)=\left[1+\frac{1}{N}+\frac{\left(X_0-\bar X\right)^2}{\sum\left(X_i-\bar X\right)^2}\right]\hat\sigma^2\tag{206}
\]
Proof:
\[
Y_0=\hat Y_0+\hat\varepsilon_0\tag{207}
\]
\[
var(Y_0)=var(\hat Y_0)+var(\hat\varepsilon_0)\tag{208}
\]
\[
var(\hat Y_0)=var\left(\hat\beta_1+\hat\beta_2X_0\right)=var(\hat\beta_1)+X_0^2var(\hat\beta_2)+2X_0cov(\hat\beta_1,\hat\beta_2)\tag{209}
\]
Variance of Prediction
\[
var(\hat Y_0)=\frac{\sum X_i^2}{N\sum\left(X_i-\bar X\right)^2}\hat\sigma^2+\frac{X_0^2}{\sum\left(X_i-\bar X\right)^2}\hat\sigma^2+2X_0\left(\frac{-\bar X}{\sum\left(X_i-\bar X\right)^2}\right)\hat\sigma^2\tag{210}
\]
Add and subtract $\frac{N\bar X^2}{N\sum\left(X_i-\bar X\right)^2}\hat\sigma^2$, using $\sum X_i^2=\sum\left(X_i-\bar X\right)^2+N\bar X^2$:
\[
var(\hat Y_0)=\frac{\sum\left(X_i-\bar X\right)^2}{N\sum\left(X_i-\bar X\right)^2}\hat\sigma^2+\frac{\bar X^2}{\sum\left(X_i-\bar X\right)^2}\hat\sigma^2+\frac{X_0^2}{\sum\left(X_i-\bar X\right)^2}\hat\sigma^2-\frac{2X_0\bar X}{\sum\left(X_i-\bar X\right)^2}\hat\sigma^2\tag{211}
\]
Variance of forecast
Taking common elements out:
\[
var(\hat Y_0)=\hat\sigma^2\left[\frac{\sum\left(X_i-\bar X\right)^2}{N\sum\left(X_i-\bar X\right)^2}+\frac{X_0^2-2X_0\bar X+\bar X^2}{\sum\left(X_i-\bar X\right)^2}\right]\tag{212}
\]
\[
var(\hat Y_0)=\hat\sigma^2\left[\frac{1}{N}+\frac{\left(X_0-\bar X\right)^2}{\sum\left(X_i-\bar X\right)^2}\right]\tag{213}
\]
Variance of forecast
\[
var(\hat Y_0)=\hat\sigma^2\left[\frac{1}{N}+\frac{\left(X_0-\bar X\right)^2}{\sum\left(X_i-\bar X\right)^2}\right]\tag{214}
\]
\[
var(f)=var(\hat Y_0)+var(\hat\varepsilon_0)\tag{215}
\]
\[
var(f)=\hat\sigma^2\left[\frac{1}{N}+\frac{\left(X_0-\bar X\right)^2}{\sum\left(X_i-\bar X\right)^2}\right]+\hat\sigma^2\tag{216}
\]
\[
var(f)=\hat\sigma^2\left[1+\frac{1}{N}+\frac{\left(X_0-\bar X\right)^2}{\sum\left(X_i-\bar X\right)^2}\right]\tag{217}
\]
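Putting the forecast pieces together for the worked example at $X_0=0.5$; a sketch of equations (183), (187)-(189) and (217) using the slides' rounded inputs (the critical value 2.776 is $t_{0.025,4}$ from the earlier t-table, for a 95% two-tailed interval):

```python
import math

N, xbar, Sxx = 6, 3.5, 17.5
sigma2 = 0.641             # slide value of sigma-hat^2, eq. (167)
b1, b2 = 6.09, -0.833      # slide estimates
X0 = 0.5

y_hat = b1 + b2 * X0                                       # eq. (183)
var_f = sigma2 * (1 + 1 / N + (X0 - xbar) ** 2 / Sxx)      # eq. (217)
se_f = math.sqrt(var_f)                                    # eq. (187)
t_c = 2.776                                                # t(0.025, df=4)
print(round(y_hat, 3), round(se_f, 3))
print(y_hat - t_c * se_f, y_hat + t_c * se_f)              # eq. (189)
```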
Estimation and Inference
Type I and Type II errors (elaborate on the following with relevant diagrams):

              H0 True         H0 False
Accept H0     Correct         Type II error
Reject H0     Type I error    Correct
Distributions: Normal, t, F and chi-square
Normal distribution:
\[
f(x)=\frac{1}{\sigma\sqrt{2\pi}}\exp\left(-\frac{1}{2}\frac{(x-\mu)^2}{\sigma^2}\right)\tag{218}
\]
Lognormal:
\[
f(x)=\frac{1}{x\sigma\sqrt{2\pi}}\exp\left(-\frac{1}{2}\frac{(\ln x-\mu)^2}{\sigma^2}\right)\tag{219}
\]
Standard normal:
\[
e\sim N(0,1)\tag{220}
\]
Any normal distribution can be converted to the standard normal distribution by normalisation, $z=(x-\mu)/\sigma$.
Distributions: Normal, t, F and chi-square
Chi-square: the sum of squares of k independent standard normal variables,
\[
Z=\sum_{i=1}^{k}Z_i^2\tag{221}
\]
with k degrees of freedom.
t-distribution: the ratio of a standard normal variable to the square root of an independent chi-square divided by its degrees of freedom,
\[
t=\frac{Z_0}{\sqrt{Z/k}}\tag{222}
\]
F-distribution: the ratio of two independent chi-square variables with degrees of freedom $k_1$ and $k_2$,
\[
F=\frac{Z_1/k_1}{Z_2/k_2}\tag{223}
\]
Large Sample Theory
Probability limit:
\[
p\lim(\hat\beta)=\beta\tag{224}
\]
Central limit theorem:
\[
\frac{\bar Y-\beta}{\sigma/\sqrt{T}}\sim N(0,1)\tag{225}
\]
Convergence in probability:
\[
\lim_{T\to\infty}P\left(\left|\hat\theta-\theta\right|\leq\varepsilon\right)=1\;\Longrightarrow\;p\lim(\hat\theta)=\theta\tag{226}
\]
The t-distribution is more accurate for finite samples, but the normal distribution asymptotically approximates other distributions according to the central limit theorem.
Large Sample Theory
- The probability limit of a sum of two variables is the sum of the probability limits.
- The probability limit of a product of two variables is the product of the probability limits.
- The probability limit of a function is the function of the probability limit (Slutsky theorem).
Multiple Regression Model in Matrix
Consider a linear regression
\[
Y_i=\beta_0+\beta_1X_{1,i}+\beta_2X_{2,i}+\beta_3X_{3,i}+....+\beta_kX_{k,i}+\varepsilon_i\qquad i=1...N\tag{227}
\]
with assumptions
\[
E(\varepsilon_i)=0\tag{228}
\]
\[
E(\varepsilon_iX_{j,i})=0;\quad var(\varepsilon_i)=\sigma^2\text{ for all }i;\quad\varepsilon_i\sim N\left(0,\sigma^2\right)\tag{229}
\]
\[
cov(\varepsilon_i,\varepsilon_j)=0\tag{230}
\]
The explanatory variables are uncorrelated with each other (no perfect multicollinearity):
\[
E(X_{1,i}X_{2,i})=0\tag{231}
\]
The objective is to choose the parameters that minimise the sum of squared errors:
\[
\min S\left(\hat\beta_0,\hat\beta_1,\hat\beta_2,...,\hat\beta_k\right)=\sum\varepsilon_i^2=\sum\left(Y_i-\hat\beta_0-\hat\beta_1X_{1,i}-\hat\beta_2X_{2,i}-...-\hat\beta_kX_{k,i}\right)^2\tag{232}
\]
Derivation of Normal Equations

\frac{\partial S}{\partial \hat{\beta}_0} = 0; \quad \frac{\partial S}{\partial \hat{\beta}_1} = 0; \quad \frac{\partial S}{\partial \hat{\beta}_2} = 0; \quad \frac{\partial S}{\partial \hat{\beta}_3} = 0; \quad \dots \quad \frac{\partial S}{\partial \hat{\beta}_k} = 0 \tag{233}

Normal equations for the two explanatory variable case:

\sum Y_i = \hat{\beta}_0 N + \hat{\beta}_1 \sum X_{1,i} + \hat{\beta}_2 \sum X_{2,i} \tag{234}

\sum X_{1,i} Y_i = \hat{\beta}_0 \sum X_{1,i} + \hat{\beta}_1 \sum X_{1,i}^2 + \hat{\beta}_2 \sum X_{1,i} X_{2,i} \tag{235}

\sum X_{2,i} Y_i = \hat{\beta}_0 \sum X_{2,i} + \hat{\beta}_1 \sum X_{1,i} X_{2,i} + \hat{\beta}_2 \sum X_{2,i}^2 \tag{236}

\begin{bmatrix} \sum Y_i \\ \sum X_{1,i} Y_i \\ \sum X_{2,i} Y_i \end{bmatrix} = \begin{bmatrix} N & \sum X_{1,i} & \sum X_{2,i} \\ \sum X_{1,i} & \sum X_{1,i}^2 & \sum X_{1,i} X_{2,i} \\ \sum X_{2,i} & \sum X_{1,i} X_{2,i} & \sum X_{2,i}^2 \end{bmatrix} \begin{bmatrix} \hat{\beta}_0 \\ \hat{\beta}_1 \\ \hat{\beta}_2 \end{bmatrix} \tag{237}

The matrix X'X must be non-singular: |X'X| \neq 0 \tag{238}
Normal equations in matrix form

\begin{bmatrix} \hat{\beta}_0 \\ \hat{\beta}_1 \\ \hat{\beta}_2 \end{bmatrix} = \begin{bmatrix} N & \sum X_{1,i} & \sum X_{2,i} \\ \sum X_{1,i} & \sum X_{1,i}^2 & \sum X_{1,i} X_{2,i} \\ \sum X_{2,i} & \sum X_{1,i} X_{2,i} & \sum X_{2,i}^2 \end{bmatrix}^{-1} \begin{bmatrix} \sum Y_i \\ \sum Y_i X_{1,i} \\ \sum Y_i X_{2,i} \end{bmatrix} \tag{239}

\hat{\beta} = (X'X)^{-1} X'Y \tag{240}

\hat{\beta}_0 = \frac{\begin{vmatrix} \sum Y_i & \sum X_{1,i} & \sum X_{2,i} \\ \sum Y_i X_{1,i} & \sum X_{1,i}^2 & \sum X_{1,i} X_{2,i} \\ \sum Y_i X_{2,i} & \sum X_{1,i} X_{2,i} & \sum X_{2,i}^2 \end{vmatrix}}{\begin{vmatrix} N & \sum X_{1,i} & \sum X_{2,i} \\ \sum X_{1,i} & \sum X_{1,i}^2 & \sum X_{1,i} X_{2,i} \\ \sum X_{2,i} & \sum X_{1,i} X_{2,i} & \sum X_{2,i}^2 \end{vmatrix}} \tag{241}
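A minimal numerical sketch of equations (237)-(240) (the data here are made up for illustration): build the normal-equation system from sums and solve it by Gaussian elimination.

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]   # augmented matrix
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x

# Illustrative data generated from Y = 2 + 3*X1 + 0.5*X2, so OLS should
# recover exactly those coefficients.
X1 = [0.0, 1.0, 2.0, 3.0]
X2 = [1.0, 0.0, 1.0, 0.0]
Y  = [2.5, 5.0, 8.5, 11.0]
N  = len(Y)
S  = lambda a, b: sum(ai * bi for ai, bi in zip(a, b))

XtX = [[N,       sum(X1),   sum(X2)],
       [sum(X1), S(X1, X1), S(X1, X2)],
       [sum(X2), S(X1, X2), S(X2, X2)]]
XtY = [sum(Y), S(X1, Y), S(X2, Y)]

beta = solve(XtX, XtY)          # beta_hat = (X'X)^{-1} X'Y
print([round(b, 6) for b in beta])
```

Because the toy data lie exactly on the plane 2 + 3 X1 + 0.5 X2, the solution returns those coefficients up to floating-point rounding.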
Use Cramer's Rule to solve for parameters

\hat{\beta}_1 = \frac{\begin{vmatrix} N & \sum Y_i & \sum X_{2,i} \\ \sum X_{1,i} & \sum Y_i X_{1,i} & \sum X_{1,i} X_{2,i} \\ \sum X_{2,i} & \sum Y_i X_{2,i} & \sum X_{2,i}^2 \end{vmatrix}}{\begin{vmatrix} N & \sum X_{1,i} & \sum X_{2,i} \\ \sum X_{1,i} & \sum X_{1,i}^2 & \sum X_{1,i} X_{2,i} \\ \sum X_{2,i} & \sum X_{1,i} X_{2,i} & \sum X_{2,i}^2 \end{vmatrix}} \tag{242}

\hat{\beta}_2 = \frac{\begin{vmatrix} N & \sum X_{1,i} & \sum Y_i \\ \sum X_{1,i} & \sum X_{1,i}^2 & \sum Y_i X_{1,i} \\ \sum X_{2,i} & \sum X_{1,i} X_{2,i} & \sum Y_i X_{2,i} \end{vmatrix}}{\begin{vmatrix} N & \sum X_{1,i} & \sum X_{2,i} \\ \sum X_{1,i} & \sum X_{1,i}^2 & \sum X_{1,i} X_{2,i} \\ \sum X_{2,i} & \sum X_{1,i} X_{2,i} & \sum X_{2,i}^2 \end{vmatrix}} \tag{243}
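Cramer's rule in (242)-(243) is easy to apply mechanically: replace one column of X'X with X'Y at a time and divide by |X'X|. A small sketch (same illustrative idea as before: Y constructed as 2 + 3 X1 + 0.5 X2, so the sums below follow from that toy sample):

```python
def det3(m):
    """Determinant of a 3x3 matrix by cofactor expansion along the first row."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))

XtX = [[4.0, 6.0, 2.0],     # [[N, sum X1, sum X2], ...] for the toy data
       [6.0, 14.0, 2.0],
       [2.0, 2.0, 2.0]]
XtY = [27.0, 55.0, 11.0]    # [sum Y, sum X1*Y, sum X2*Y]

D = det3(XtX)
beta = []
for j in range(3):
    Mj = [row[:] for row in XtX]
    for i in range(3):
        Mj[i][j] = XtY[i]        # replace column j with X'Y
    beta.append(det3(Mj) / D)

print(beta)
```

The three ratios of determinants reproduce the intercept 2, the first slope 3 and the second slope 0.5.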
Covariance of Parameters

cov(\hat{\beta}) = (X'X)^{-1} \hat{\sigma}^2 \tag{244}

cov(\hat{\beta}) = \begin{pmatrix} var(\hat{\beta}_1) & cov(\hat{\beta}_1\hat{\beta}_2) & cov(\hat{\beta}_1\hat{\beta}_3) \\ cov(\hat{\beta}_1\hat{\beta}_2) & var(\hat{\beta}_2) & cov(\hat{\beta}_2\hat{\beta}_3) \\ cov(\hat{\beta}_1\hat{\beta}_3) & cov(\hat{\beta}_2\hat{\beta}_3) & var(\hat{\beta}_3) \end{pmatrix} \tag{245}

cov(\hat{\beta}) = \begin{bmatrix} N & \sum X_{1,i} & \sum X_{2,i} \\ \sum X_{1,i} & \sum X_{1,i}^2 & \sum X_{1,i} X_{2,i} \\ \sum X_{2,i} & \sum X_{1,i} X_{2,i} & \sum X_{2,i}^2 \end{bmatrix}^{-1} \hat{\sigma}^2 \tag{246}
Determinant and cofactor matrix required for inverse

|X'X| = N \sum X_{1,i}^2 \sum X_{2,i}^2 + \sum X_{1,i} \sum X_{1,i}X_{2,i} \sum X_{2,i} + \sum X_{2,i} \sum X_{1,i}X_{2,i} \sum X_{1,i} - \sum X_{2,i} \sum X_{2,i} \sum X_{1,i}^2 - N \sum X_{1,i}X_{2,i} \sum X_{1,i}X_{2,i} - \sum X_{2,i}^2 \sum X_{1,i} \sum X_{1,i}

Adj(X'X) = C', where the cofactor matrix C (each entry a signed 2x2 minor) is

C = \begin{bmatrix}
+\begin{vmatrix} \sum X_{1,i}^2 & \sum X_{1,i}X_{2,i} \\ \sum X_{1,i}X_{2,i} & \sum X_{2,i}^2 \end{vmatrix} &
-\begin{vmatrix} \sum X_{1,i} & \sum X_{1,i}X_{2,i} \\ \sum X_{2,i} & \sum X_{2,i}^2 \end{vmatrix} &
+\begin{vmatrix} \sum X_{1,i} & \sum X_{1,i}^2 \\ \sum X_{2,i} & \sum X_{1,i}X_{2,i} \end{vmatrix} \\
-\begin{vmatrix} \sum X_{1,i} & \sum X_{2,i} \\ \sum X_{1,i}X_{2,i} & \sum X_{2,i}^2 \end{vmatrix} &
+\begin{vmatrix} N & \sum X_{2,i} \\ \sum X_{2,i} & \sum X_{2,i}^2 \end{vmatrix} &
-\begin{vmatrix} N & \sum X_{1,i} \\ \sum X_{2,i} & \sum X_{1,i}X_{2,i} \end{vmatrix} \\
+\begin{vmatrix} \sum X_{1,i} & \sum X_{2,i} \\ \sum X_{1,i}^2 & \sum X_{1,i}X_{2,i} \end{vmatrix} &
-\begin{vmatrix} N & \sum X_{2,i} \\ \sum X_{1,i} & \sum X_{1,i}X_{2,i} \end{vmatrix} &
+\begin{vmatrix} N & \sum X_{1,i} \\ \sum X_{1,i} & \sum X_{1,i}^2 \end{vmatrix}
\end{bmatrix} \tag{247}
Variance of parameters

var(\hat{\beta}_0) = \frac{\begin{vmatrix} \sum X_{1,i}^2 & \sum X_{1,i}X_{2,i} \\ \sum X_{1,i}X_{2,i} & \sum X_{2,i}^2 \end{vmatrix}}{|X'X|} \hat{\sigma}^2 \tag{248}

var(\hat{\beta}_1) = \frac{\begin{vmatrix} N & \sum X_{2,i} \\ \sum X_{2,i} & \sum X_{2,i}^2 \end{vmatrix}}{|X'X|} \hat{\sigma}^2 \tag{249}

var(\hat{\beta}_2) = \frac{\begin{vmatrix} N & \sum X_{1,i} \\ \sum X_{1,i} & \sum X_{1,i}^2 \end{vmatrix}}{|X'X|} \hat{\sigma}^2 \tag{250}
Standard Error and t-values

SE(\hat{\beta}_0) = \sqrt{var(\hat{\beta}_0)}; \quad t(\hat{\beta}_0) = \frac{\hat{\beta}_0 - \beta_0}{SE(\hat{\beta}_0)} \tag{251}

SE(\hat{\beta}_1) = \sqrt{var(\hat{\beta}_1)}; \quad t(\hat{\beta}_1) = \frac{\hat{\beta}_1 - \beta_1}{SE(\hat{\beta}_1)} \tag{252}

SE(\hat{\beta}_2) = \sqrt{var(\hat{\beta}_2)}; \quad t(\hat{\beta}_2) = \frac{\hat{\beta}_2 - \beta_2}{SE(\hat{\beta}_2)} \tag{253}
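For the two-variable special case the same standard errors and t-values can be computed from sums alone; a sketch with a small made-up sample (the formulas var(β̂₂) = σ̂²/Σx² and t = β̂/SE appear later in the deck, and the numbers here are purely illustrative):

```python
X = [1.0, 2.0, 3.0, 4.0, 5.0]
Y = [2.0, 4.0, 5.0, 4.0, 5.0]
n = len(X)

xbar = sum(X) / n
ybar = sum(Y) / n
Sxx = sum((xi - xbar) ** 2 for xi in X)
Sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(X, Y))

b2 = Sxy / Sxx                 # slope estimate
b1 = ybar - b2 * xbar          # intercept estimate

resid = [yi - (b1 + b2 * xi) for xi, yi in zip(X, Y)]
sigma2 = sum(e ** 2 for e in resid) / (n - 2)    # sigma_hat^2

se_b2 = (sigma2 / Sxx) ** 0.5  # standard error of the slope
t_b2 = b2 / se_b2              # t-value under H0: beta2 = 0

print(round(b2, 3), round(se_b2, 4), round(t_b2, 4))
```

For this sample the slope is 0.6 with standard error about 0.283, giving a t-value near 2.12.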
Analysis of Variance

\sum y_i^2 = Y'Y - N\bar{Y}^2 \tag{254}

\sum \hat{y}^2 = \hat{\beta}_1 \sum x_1 y + \hat{\beta}_2 \sum x_2 y = \hat{\beta}' x' y \tag{255}

\sum \hat{e}_i^2 = \sum y_i^2 - \sum \hat{y}_i^2 \tag{256}
R-square and F Statistics

R^2 = \frac{\hat{\beta}' x' y}{Y'Y} \tag{257}

F_{calc} = \frac{\sum \hat{y}_i^2 / (K-1)}{e'e / (N-K)} = \frac{\sum \hat{y}_i^2}{K-1} \cdot \frac{N-K}{e'e} \tag{258}

F_{calc} = \frac{\sum \hat{y}_i^2 / \sum y_i^2}{e'e / \sum y_i^2} \cdot \frac{N-K}{K-1} = \frac{R^2 / (K-1)}{(1-R^2) / (N-K)} \tag{259}
Numerical Example: Does the level of unemployment depend on claimant counts, strikes and work hours?

How does the level of unemployment (Y_i) relate to the level of claimant counts (X_{1,i}), the number of stoppages (X_{2,i}) caused by industrial strikes, and the number of work hours (X_{3,i}) in the UK? Data from the Labour Force Survey for 19 years; N = 19.

Y_i = \beta_0 + \beta_1 X_{1,i} + \beta_2 X_{2,i} + \beta_3 X_{3,i} + \varepsilon_i \quad i = 1 \dots N \tag{260}

\begin{bmatrix} N & \sum X_{1,i} & \sum X_{2,i} & \sum X_{3,i} \\ \sum X_{1,i} & \sum X_{1,i}^2 & \sum X_{1,i}X_{2,i} & \sum X_{1,i}X_{3,i} \\ \sum X_{2,i} & \sum X_{1,i}X_{2,i} & \sum X_{2,i}^2 & \sum X_{2,i}X_{3,i} \\ \sum X_{3,i} & \sum X_{1,i}X_{3,i} & \sum X_{2,i}X_{3,i} & \sum X_{3,i}^2 \end{bmatrix} = \begin{bmatrix} 19 & 29057 & 4109 & 16904.6 \\ 29057 & 53709128.8 & 6872065.8 & 25461639.46 \\ 4109 & 6872065.8 & 1132419 & 3638145 \\ 16904.6 & 25461639.46 & 3638145 & 15059252.96 \end{bmatrix}
Numerical Example: OLS Setup

\begin{bmatrix} \sum Y_i \\ \sum X_{1,i}Y_i \\ \sum X_{2,i}Y_i \\ \sum X_{3,i}Y_i \end{bmatrix} = \begin{bmatrix} 37326 \\ 63415261.4 \\ 8476146 \\ 32958009 \end{bmatrix}

\begin{bmatrix} \hat{\beta}_0 \\ \hat{\beta}_1 \\ \hat{\beta}_2 \\ \hat{\beta}_3 \end{bmatrix} = \begin{bmatrix} 19 & 29057 & 4109 & 16904.6 \\ 29057 & 53709128.8 & 6872065.8 & 25461639.46 \\ 4109 & 6872065.8 & 1132419 & 3638145 \\ 16904.6 & 25461639.46 & 3638145 & 15059252.96 \end{bmatrix}^{-1} \begin{bmatrix} 37326 \\ 63415261.4 \\ 8476146 \\ 32958009 \end{bmatrix}
Numerical Example: Estimates of parameters, their standard errors and t-values

\begin{bmatrix} \hat{\beta}_0 \\ \hat{\beta}_1 \\ \hat{\beta}_2 \\ \hat{\beta}_3 \end{bmatrix} = \begin{bmatrix} 402.4319485 & -0.018888662 & 0.013959234 & -0.423181717 \\ -0.018888662 & 1.00E{-}06 & -9.85E{-}07 & 1.97E{-}05 \\ 0.013959234 & -9.85E{-}07 & 5.37E{-}06 & -1.53E{-}05 \\ -0.423181717 & 1.97E{-}05 & -1.53E{-}05 & 0.000445415 \end{bmatrix} \begin{bmatrix} 37326 \\ 63415261.4 \\ 8476146 \\ 32958009 \end{bmatrix}

\begin{bmatrix} \hat{\beta}_0 \\ \hat{\beta}_1 \\ \hat{\beta}_2 \\ \hat{\beta}_3 \end{bmatrix} = \begin{bmatrix} -5560.880967 \\ 0.98458293 \\ -0.223288328 \\ 6.820108368 \end{bmatrix}; \quad \begin{bmatrix} SE(\hat{\beta}_0) \\ SE(\hat{\beta}_1) \\ SE(\hat{\beta}_2) \\ SE(\hat{\beta}_3) \end{bmatrix} = \begin{bmatrix} 871.2384101 \\ 0.04348893 \\ 0.100627026 \\ 0.916586018 \end{bmatrix}; \quad \begin{bmatrix} t(\hat{\beta}_0) \\ t(\hat{\beta}_1) \\ t(\hat{\beta}_2) \\ t(\hat{\beta}_3) \end{bmatrix} = \begin{bmatrix} -6.382731641 \\ 22.63985164 \\ -2.218969753 \\ 7.440772858 \end{bmatrix}

t-test hypotheses: H_0: \beta_i = 0 against H_A: \beta_i \neq 0. The critical value of t for 15 degrees of freedom at the 5% level of significance is 2.13. Each of the computed t-values is greater than the table value in absolute terms. Therefore there is enough statistical evidence to reject the null hypothesis: all four parameters are statistically significant.
Numerical Example: Sum Square Error and Covariance of Beta

var(e) = E(\hat{\varepsilon}_i)^2 = \frac{\sum \hat{e}_i^2}{N-k} = \frac{28292.5984}{19-4} = 1886.173228 = \hat{\sigma}^2 \tag{261}

cov(\hat{\beta}) = \begin{bmatrix} 402.4319485 & -0.018888662 & 0.013959234 & -0.423181717 \\ -0.018888662 & 1.00E{-}06 & -9.85E{-}07 & 1.97E{-}05 \\ 0.013959234 & -9.85E{-}07 & 5.37E{-}06 & -1.53E{-}05 \\ -0.423181717 & 1.97E{-}05 & -1.53E{-}05 & 0.000445415 \end{bmatrix} (1886.173228) \tag{262}

= \begin{bmatrix} 759056.3673 & -35.62728928 & 26.32953287 & -798.1940248 \\ -35.62728928 & 0.001891287 & -0.001858782 & 0.037244366 \\ 26.32953287 & -0.001858782 & 0.010125798 & -0.028859446 \\ -798.1940248 & 0.037244366 & -0.028859446 & 0.840129929 \end{bmatrix}
R-square and Adjusted R-square

R^2 = \frac{4428800.138}{4457092.737} = 0.99365223 \tag{263}

\bar{R}^2 = 1 - \left(1 - R^2\right)\frac{N-1}{N-K} = 0.992382676 \tag{264}
Analysis of Variance

Source of Variance            Sum          Degrees of freedom  Mean         F-value
Total sum square (TSS)        4457092.737  18                  247616.2632
Regression Sum Square (RSS)   4428800.138  3                   1476266.713  782.6782243
Sum of square error           28292.59842  15                  1886.173228

Hypothesis: H_0: \beta_0 = \beta_1 = \beta_2 = \beta_3 = 0 (the model is meaningless) against H_A: at least one \beta_i \neq 0 (the model explains something). The critical value of F for degrees of freedom (3, 15) at the 5 percent level of significance is 3.29. The calculated F-statistic is much higher than the critical value. Therefore there is statistical evidence to reject the null hypothesis; in general this model is statistically significant.
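The ANOVA arithmetic above can be reproduced directly from the sums of squares reported in the table:

```python
TSS = 4457092.737      # total sum of squares, 18 df
RSS = 4428800.138      # regression sum of squares, 3 df
ESS = 28292.59842      # sum of squared errors, 15 df
N, K = 19, 4

MSR = RSS / (K - 1)            # mean regression sum of squares
sigma2 = ESS / (N - K)         # sigma_hat^2, the mean squared error
F = MSR / sigma2               # F-statistic, compare with F(3,15) = 3.29

R2 = RSS / TSS                 # R-square from the same decomposition
adjR2 = 1 - (1 - R2) * (N - 1) / (N - K)

print(round(F, 2), round(R2, 5), round(adjR2, 5))
```

The ratios reproduce the table: F ≈ 782.68, R² ≈ 0.99365 and adjusted R² ≈ 0.99238.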
Normal equations for K variables

\sum Y_i = \hat{\beta}_0 N + \hat{\beta}_1 \sum X_{1,i} + \hat{\beta}_2 \sum X_{2,i} + \hat{\beta}_3 \sum X_{3,i} + \dots + \hat{\beta}_k \sum X_{k,i} \tag{265}

\sum Y_i X_{1,i} = \hat{\beta}_0 \sum X_{1,i} + \hat{\beta}_1 \sum X_{1,i}^2 + \hat{\beta}_2 \sum X_{1,i}X_{2,i} + \dots + \hat{\beta}_k \sum X_{1,i}X_{k,i} \tag{266}

...

\sum Y_i X_{k,i} = \hat{\beta}_0 \sum X_{k,i} + \hat{\beta}_1 \sum X_{1,i}X_{k,i} + \hat{\beta}_2 \sum X_{k,i}X_{2,i} + \dots + \hat{\beta}_k \sum X_{k,i}^2 \tag{267}

The process is similar to the three variable model, except that this general model has more coefficients to evaluate and test and requires data on more variables.
Regression in Matrix (pages 42-46)

Let Y be an N x 1 vector of the dependent variable and X an N x K matrix of explanatory variables. The error e is an N x 1 vector of independently and identically distributed normal random variables with mean zero and constant variance, e \sim N(0, \sigma^2 I); \beta is a K x 1 vector of unknown coefficients.

Y = X\beta + e \tag{268}

The objective is to minimise the sum of squared errors:

\min_{\beta} S(\hat{\beta}) = e'e = (Y - X\hat{\beta})'(Y - X\hat{\beta}) = Y'Y - Y'X\hat{\beta} - \hat{\beta}'X'Y + \hat{\beta}'X'X\hat{\beta} \tag{269}

= Y'Y - 2\hat{\beta}'X'Y + \hat{\beta}'X'X\hat{\beta} \tag{270}
First order condition in Matrix Method

\frac{\partial S(\beta)}{\partial \beta} = -2X'Y + 2X'X\hat{\beta} = 0 \tag{271}

\implies \hat{\beta} = (X'X)^{-1}X'Y \tag{272}

\hat{e} = Y - X\hat{\beta} \tag{273}

Estimate of the variance of errors:

\hat{\sigma}^2 = \frac{\sum \hat{e}_i^2}{N-k} = \frac{e'e}{N-k} \tag{274}
Derivation of Parameters (with Matrix Inverse)

For the two variable case, Y_i = \beta_1 + \beta_2 X_i + \varepsilon_i, i = 1 \dots N:

(X'X)^{-1} = \begin{bmatrix} N & \sum X_i \\ \sum X_i & \sum X_i^2 \end{bmatrix}^{-1} = \frac{1}{N\sum X_i^2 - (\sum X_i)^2} \begin{bmatrix} \sum X_i^2 & -\sum X_i \\ -\sum X_i & N \end{bmatrix} \tag{275}

(X'X)^{-1} = \begin{bmatrix} \frac{\sum X_i^2}{N\sum X_i^2 - (\sum X_i)^2} & \frac{-\sum X_i}{N\sum X_i^2 - (\sum X_i)^2} \\ \frac{-\sum X_i}{N\sum X_i^2 - (\sum X_i)^2} & \frac{N}{N\sum X_i^2 - (\sum X_i)^2} \end{bmatrix} \tag{276}

\begin{bmatrix} \hat{\beta}_1 \\ \hat{\beta}_2 \end{bmatrix} = \begin{bmatrix} \frac{\sum X_i^2}{N\sum X_i^2 - (\sum X_i)^2} & \frac{-\sum X_i}{N\sum X_i^2 - (\sum X_i)^2} \\ \frac{-\sum X_i}{N\sum X_i^2 - (\sum X_i)^2} & \frac{N}{N\sum X_i^2 - (\sum X_i)^2} \end{bmatrix} \begin{bmatrix} \sum Y_i \\ \sum X_i Y_i \end{bmatrix} \tag{277}
Derivation of Parameters (with Matrix Inverse)

\begin{bmatrix} \hat{\beta}_1 \\ \hat{\beta}_2 \end{bmatrix} = \begin{bmatrix} \frac{\sum X_i^2 \sum Y_i - \sum X_i \sum X_i Y_i}{N\sum X_i^2 - (\sum X_i)^2} \\ \frac{N \sum X_i Y_i - \sum X_i \sum Y_i}{N\sum X_i^2 - (\sum X_i)^2} \end{bmatrix} \tag{278}

This compares to what we had earlier:

\hat{\beta}_2 = \frac{\sum X_i \sum Y_i - N \sum Y_i X_i}{(\sum X_i)^2 - N \sum X_i^2} = \frac{\sum x_i y_i}{\sum x_i^2} \tag{279}
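Equations (278) and (279) give the same slope; a tiny sketch (illustrative numbers) checks the raw-sum form against the deviation form:

```python
X = [1.0, 2.0, 3.0, 4.0, 5.0]
Y = [2.0, 4.0, 5.0, 4.0, 5.0]
N = len(X)

SX = sum(X); SY = sum(Y)
SXY = sum(x * y for x, y in zip(X, Y))
SXX = sum(x * x for x in X)

# Raw-sum form, equation (278)
b2_raw = (N * SXY - SX * SY) / (N * SXX - SX ** 2)

# Deviation form, equation (279): x_i = X_i - mean(X), y_i = Y_i - mean(Y)
xbar = SX / N; ybar = SY / N
b2_dev = (sum((x - xbar) * (y - ybar) for x, y in zip(X, Y))
          / sum((x - xbar) ** 2 for x in X))

print(b2_raw, b2_dev)
```

Both forms return the same slope, 0.6, for this sample.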
Covariance of Parameters (with Matrix)

cov(\hat{\beta}) = \begin{bmatrix} N & \sum X_i \\ \sum X_i & \sum X_i^2 \end{bmatrix}^{-1} \hat{\sigma}^2 \tag{280}

cov(\hat{\beta}) = (X'X)^{-1}\hat{\sigma}^2 = \frac{1}{N\sum X_i^2 - (\sum X_i)^2} \begin{bmatrix} \sum X_i^2 & -\sum X_i \\ -\sum X_i & N \end{bmatrix} \hat{\sigma}^2 \tag{281}
Variance of Parameters (with Matrix)

Take the corresponding diagonal element for the variance:

var(\hat{\beta}_2) = \frac{N}{N\sum X_i^2 - (\sum X_i)^2} \hat{\sigma}^2 = \frac{\hat{\sigma}^2}{\sum x_i^2} \tag{282}

var(\hat{\beta}_1) = \frac{\sum X_i^2}{N\sum X_i^2 - (\sum X_i)^2} \hat{\sigma}^2 \tag{283}

Standard errors:

SE(\hat{\beta}_2) = \sqrt{var(\hat{\beta}_2)}; \quad SE(\hat{\beta}_1) = \sqrt{var(\hat{\beta}_1)} \tag{284}

t-values:

t(\hat{\beta}_2) = \frac{\hat{\beta}_2 - \beta_2}{SE(\hat{\beta}_2)}; \quad t(\hat{\beta}_1) = \frac{\hat{\beta}_1 - \beta_1}{SE(\hat{\beta}_1)} \tag{285}
Variances

\sum \hat{e}_i^2 = \sum y_i^2 - \sum \hat{y}_i^2 \tag{286}

\sum \hat{y}^2 = \sum (x\hat{\beta})'(x\hat{\beta}); \quad x = X - \bar{X} \tag{287}

\sum \hat{y}_i^2 = \sum (\hat{\beta}_2 x_i)^2 = \hat{\beta}_2^2 \sum x_i^2 \tag{288}

R^2 = \frac{\sum \hat{y}_i^2}{\sum y_i^2} \quad \text{and} \quad F_{calc} = \frac{\sum \hat{y}_i^2/(K-1)}{\sum \hat{e}_i^2/(N-K)}; \quad F_{calc} = \frac{R^2/(K-1)}{(1-R^2)/(N-K)} \tag{289}
Variances in multiple regression

For Y_i = \beta_0 + \beta_1 X_{1,i} + \beta_2 X_{2,i} + \varepsilon_i:

Y = \hat{Y} + e = X\hat{\beta} + e \tag{290}

\sum y_i^2 = Y'Y - N\bar{Y}^2 \tag{291}

Regression with two explanatory variables in deviations from the mean:

\hat{y} = \hat{\beta}_1 x_1 + \hat{\beta}_2 x_2 \tag{292}
Explained variation in multiple regression

\sum \hat{y}^2 = \sum \left(\hat{\beta}_1 x_1 + \hat{\beta}_2 x_2\right)^2 = \hat{\beta}_1^2 \sum x_1^2 + 2\hat{\beta}_1\hat{\beta}_2 \sum x_1 x_2 + \hat{\beta}_2^2 \sum x_2^2

= \hat{\beta}_1 \left(\hat{\beta}_1 \sum x_1^2 + \hat{\beta}_2 \sum x_1 x_2\right) + \hat{\beta}_2 \left(\hat{\beta}_1 \sum x_1 x_2 + \hat{\beta}_2 \sum x_2^2\right)

= \hat{\beta}_1 \sum x_1 y + \hat{\beta}_2 \sum x_2 y = \hat{\beta}' x' y \tag{293}

\sum \hat{e}_i^2 = \sum y_i^2 - \sum \hat{y}_i^2 \tag{294}
Explained variation in multiple regression

\sum \hat{y}^2 = \begin{bmatrix} \hat{\beta}_1 & \hat{\beta}_2 \end{bmatrix} \begin{bmatrix} x_{11} & x_{12} & \dots & x_{1N} \\ x_{21} & x_{22} & \dots & x_{2N} \end{bmatrix} \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_N \end{bmatrix} = \hat{\beta}' x' y \tag{295}

\hat{e}'\hat{e} = Y'Y - \hat{\beta}' x' y \tag{296}
R-square and F-statistics in multiple regression

R^2 = \frac{\hat{\beta}' x' y}{Y'Y} \tag{297}

F_{calc} = \frac{\sum \hat{y}_i^2/(K-1)}{e'e/(N-K)} \tag{298}

F_{calc} = \frac{R^2/(K-1)}{(1-R^2)/(N-K)} \tag{299}
BLUE Property in Matrix: Linearity and Unbiasedness

\hat{\beta} = (X'X)^{-1}X'Y \tag{300}

\hat{\beta} = aY; \quad a = (X'X)^{-1}X' \tag{301}

Linearity is proved.

E(\hat{\beta}) = E\left[(X'X)^{-1}X'(X\beta + e)\right] \tag{302}

E(\hat{\beta}) = E\left[(X'X)^{-1}X'X\beta\right] + E\left[(X'X)^{-1}X'e\right] \tag{303}

E(\hat{\beta}) = \beta + E\left[(X'X)^{-1}X'e\right] \tag{304}

Since E(e) = 0 and X is non-stochastic, the last term vanishes:

E(\hat{\beta}) = \beta \tag{305}

Unbiasedness is proved.
BLUE Property in Matrix: Minimum Variance

\hat{\beta} - \beta = (X'X)^{-1}X'e \tag{306}

E\left[(\hat{\beta} - \beta)(\hat{\beta} - \beta)'\right] = E\left[(X'X)^{-1}X'e\,e'X(X'X)^{-1}\right] \tag{307}

= (X'X)^{-1}X'E(ee')X(X'X)^{-1} = \sigma^2 (X'X)^{-1} \tag{308}

Take an alternative estimator b:

b = \left[(X'X)^{-1}X' + c\right]Y \tag{309}

b = \left[(X'X)^{-1}X' + c\right](X\beta + e) \tag{310}

For b to be unbiased, cX = 0 is required, so

b - \beta = (X'X)^{-1}X'e + ce \tag{311}
BLUE Property in Matrix: Minimum Variance

Now it needs to be shown that

cov(b) > cov(\hat{\beta}) \tag{312}

For the alternative estimator b:

b - \beta = (X'X)^{-1}X'e + ce \tag{313}

cov(b) = E\left[(b - \beta)(b - \beta)'\right] = E\left\{\left[(X'X)^{-1}X'e + ce\right]\left[(X'X)^{-1}X'e + ce\right]'\right\} = \sigma^2 (X'X)^{-1} + \sigma^2 cc' \tag{314}

cov(b) > cov(\hat{\beta}) \tag{315}

This is proved. Thus the OLS estimator is BLUE: the Best Linear Unbiased Estimator.
Multiple Regression Model in Matrix

Consider a linear regression without an intercept term:

Y_i = \beta_1 X_{1,i} + \beta_2 X_{2,i} + \beta_3 X_{3,i} + \varepsilon_i \quad i = 1 \dots N \tag{316}

and assumptions

E(\varepsilon_i) = 0 \tag{317}

E(\varepsilon_i x_{j,i}) = 0 \tag{318}

var(\varepsilon_i) = \sigma^2 \ \forall i \tag{319}

cov(\varepsilon_i \varepsilon_j) = 0 \tag{320}

\varepsilon_i \sim N(0, \sigma^2) \tag{321}

The objective is to choose parameters that minimise the sum of squared errors:

\min S(\hat{\beta}_1, \hat{\beta}_2, \hat{\beta}_3) = \sum \varepsilon_i^2 = \sum \left(Y_i - \hat{\beta}_1 X_{1,i} - \hat{\beta}_2 X_{2,i} - \hat{\beta}_3 X_{3,i}\right)^2 \tag{322}
Derivation of Normal Equations

\frac{\partial S}{\partial \hat{\beta}_1} = 0; \quad \frac{\partial S}{\partial \hat{\beta}_2} = 0; \quad \frac{\partial S}{\partial \hat{\beta}_3} = 0 \tag{323}

Normal equations for the three explanatory variable case:

\sum X_{1,i}Y_i = \hat{\beta}_1 \sum X_{1,i}^2 + \hat{\beta}_2 \sum X_{1,i}X_{2,i} + \hat{\beta}_3 \sum X_{1,i}X_{3,i} \tag{324}

\sum X_{2,i}Y_i = \hat{\beta}_1 \sum X_{1,i}X_{2,i} + \hat{\beta}_2 \sum X_{2,i}^2 + \hat{\beta}_3 \sum X_{2,i}X_{3,i} \tag{325}

\sum X_{3,i}Y_i = \hat{\beta}_1 \sum X_{1,i}X_{3,i} + \hat{\beta}_2 \sum X_{2,i}X_{3,i} + \hat{\beta}_3 \sum X_{3,i}^2 \tag{326}

\begin{bmatrix} \sum X_{1,i}Y_i \\ \sum X_{2,i}Y_i \\ \sum X_{3,i}Y_i \end{bmatrix} = \begin{bmatrix} \sum X_{1,i}^2 & \sum X_{1,i}X_{2,i} & \sum X_{1,i}X_{3,i} \\ \sum X_{1,i}X_{2,i} & \sum X_{2,i}^2 & \sum X_{2,i}X_{3,i} \\ \sum X_{1,i}X_{3,i} & \sum X_{2,i}X_{3,i} & \sum X_{3,i}^2 \end{bmatrix} \begin{bmatrix} \hat{\beta}_1 \\ \hat{\beta}_2 \\ \hat{\beta}_3 \end{bmatrix} \tag{327}
Normal equations in matrix form

\begin{bmatrix} \hat{\beta}_1 \\ \hat{\beta}_2 \\ \hat{\beta}_3 \end{bmatrix} = \begin{bmatrix} \sum X_{1,i}^2 & \sum X_{1,i}X_{2,i} & \sum X_{1,i}X_{3,i} \\ \sum X_{1,i}X_{2,i} & \sum X_{2,i}^2 & \sum X_{2,i}X_{3,i} \\ \sum X_{1,i}X_{3,i} & \sum X_{2,i}X_{3,i} & \sum X_{3,i}^2 \end{bmatrix}^{-1} \begin{bmatrix} \sum X_{1,i}Y_i \\ \sum X_{2,i}Y_i \\ \sum X_{3,i}Y_i \end{bmatrix} \tag{328}

\hat{\beta} = (X'X)^{-1}X'Y \tag{329}

\hat{\beta}_1 = \frac{\begin{vmatrix} \sum X_{1,i}Y_i & \sum X_{1,i}X_{2,i} & \sum X_{1,i}X_{3,i} \\ \sum X_{2,i}Y_i & \sum X_{2,i}^2 & \sum X_{2,i}X_{3,i} \\ \sum X_{3,i}Y_i & \sum X_{2,i}X_{3,i} & \sum X_{3,i}^2 \end{vmatrix}}{\begin{vmatrix} \sum X_{1,i}^2 & \sum X_{1,i}X_{2,i} & \sum X_{1,i}X_{3,i} \\ \sum X_{1,i}X_{2,i} & \sum X_{2,i}^2 & \sum X_{2,i}X_{3,i} \\ \sum X_{1,i}X_{3,i} & \sum X_{2,i}X_{3,i} & \sum X_{3,i}^2 \end{vmatrix}} \tag{330}
Use Cramer's Rule to solve for parameters

\hat{\beta}_2 = \frac{\begin{vmatrix} \sum X_{1,i}^2 & \sum X_{1,i}Y_i & \sum X_{1,i}X_{3,i} \\ \sum X_{1,i}X_{2,i} & \sum X_{2,i}Y_i & \sum X_{2,i}X_{3,i} \\ \sum X_{1,i}X_{3,i} & \sum X_{3,i}Y_i & \sum X_{3,i}^2 \end{vmatrix}}{\begin{vmatrix} \sum X_{1,i}^2 & \sum X_{1,i}X_{2,i} & \sum X_{1,i}X_{3,i} \\ \sum X_{1,i}X_{2,i} & \sum X_{2,i}^2 & \sum X_{2,i}X_{3,i} \\ \sum X_{1,i}X_{3,i} & \sum X_{2,i}X_{3,i} & \sum X_{3,i}^2 \end{vmatrix}} \tag{331}

\hat{\beta}_3 = \frac{\begin{vmatrix} \sum X_{1,i}^2 & \sum X_{1,i}X_{2,i} & \sum X_{1,i}Y_i \\ \sum X_{1,i}X_{2,i} & \sum X_{2,i}^2 & \sum X_{2,i}Y_i \\ \sum X_{1,i}X_{3,i} & \sum X_{2,i}X_{3,i} & \sum X_{3,i}Y_i \end{vmatrix}}{\begin{vmatrix} \sum X_{1,i}^2 & \sum X_{1,i}X_{2,i} & \sum X_{1,i}X_{3,i} \\ \sum X_{1,i}X_{2,i} & \sum X_{2,i}^2 & \sum X_{2,i}X_{3,i} \\ \sum X_{1,i}X_{3,i} & \sum X_{2,i}X_{3,i} & \sum X_{3,i}^2 \end{vmatrix}} \tag{332}
Covariance of Parameters

The matrix must be non-singular: |X'X| \neq 0.

cov(\hat{\beta}) = \begin{pmatrix} var(\hat{\beta}_1) & cov(\hat{\beta}_1\hat{\beta}_2) & cov(\hat{\beta}_1\hat{\beta}_3) \\ cov(\hat{\beta}_1\hat{\beta}_2) & var(\hat{\beta}_2) & cov(\hat{\beta}_2\hat{\beta}_3) \\ cov(\hat{\beta}_1\hat{\beta}_3) & cov(\hat{\beta}_2\hat{\beta}_3) & var(\hat{\beta}_3) \end{pmatrix} \tag{333}

cov(\hat{\beta}) = (X'X)^{-1}\sigma^2 \tag{334}

cov(\hat{\beta}) = \begin{bmatrix} \sum X_{1,i}^2 & \sum X_{1,i}X_{2,i} & \sum X_{1,i}X_{3,i} \\ \sum X_{1,i}X_{2,i} & \sum X_{2,i}^2 & \sum X_{2,i}X_{3,i} \\ \sum X_{1,i}X_{3,i} & \sum X_{2,i}X_{3,i} & \sum X_{3,i}^2 \end{bmatrix}^{-1} \hat{\sigma}^2 \tag{335}
Data (text book example)

Table: Data for a multiple regression

y   1  -1  2  0  4  2  2   0  2
x1  1  -1  1  0  1  0  0   1  0
x2  0   1  0  1  2  3  0  -1  0
x3 -1   0  0  0  0  0  1   1  1
Squares and cross products

X'X = \begin{bmatrix} 1 & -1 & 1 & 0 & 1 & 0 & 0 & 1 & 0 \\ 0 & 1 & 0 & 1 & 2 & 3 & 0 & -1 & 0 \\ -1 & 0 & 0 & 0 & 0 & 0 & 1 & 1 & 1 \end{bmatrix} \begin{bmatrix} 1 & 0 & -1 \\ -1 & 1 & 0 \\ 1 & 0 & 0 \\ 0 & 1 & 0 \\ 1 & 2 & 0 \\ 0 & 3 & 0 \\ 0 & 0 & 1 \\ 1 & -1 & 1 \\ 0 & 0 & 1 \end{bmatrix} = \begin{bmatrix} 5 & 0 & 0 \\ 0 & 16 & -1 \\ 0 & -1 & 4 \end{bmatrix}
Sum and cross products

X'Y = \begin{bmatrix} 1 & -1 & 1 & 0 & 1 & 0 & 0 & 1 & 0 \\ 0 & 1 & 0 & 1 & 2 & 3 & 0 & -1 & 0 \\ -1 & 0 & 0 & 0 & 0 & 0 & 1 & 1 & 1 \end{bmatrix} \begin{bmatrix} 1 \\ -1 \\ 2 \\ 0 \\ 4 \\ 2 \\ 2 \\ 0 \\ 2 \end{bmatrix} = \begin{bmatrix} 8 \\ 13 \\ 3 \end{bmatrix}
\begin{bmatrix} \sum X_{1,i}^2 & \sum X_{1,i}X_{2,i} & \sum X_{1,i}X_{3,i} \\ \sum X_{1,i}X_{2,i} & \sum X_{2,i}^2 & \sum X_{2,i}X_{3,i} \\ \sum X_{1,i}X_{3,i} & \sum X_{2,i}X_{3,i} & \sum X_{3,i}^2 \end{bmatrix} = \begin{bmatrix} 5 & 0 & 0 \\ 0 & 16 & -1 \\ 0 & -1 & 4 \end{bmatrix} \quad \text{and} \quad \begin{bmatrix} \sum X_{1,i}Y_i \\ \sum X_{2,i}Y_i \\ \sum X_{3,i}Y_i \end{bmatrix} = \begin{bmatrix} 8 \\ 13 \\ 3 \end{bmatrix}
Estimation of Parameters

\begin{bmatrix} \hat{\beta}_1 \\ \hat{\beta}_2 \\ \hat{\beta}_3 \end{bmatrix} = \begin{bmatrix} 5 & 0 & 0 \\ 0 & 16 & -1 \\ 0 & -1 & 4 \end{bmatrix}^{-1} \begin{bmatrix} 8 \\ 13 \\ 3 \end{bmatrix} \tag{336}

\begin{bmatrix} \hat{\beta}_1 \\ \hat{\beta}_2 \\ \hat{\beta}_3 \end{bmatrix} = \begin{bmatrix} 0.2 & 0 & 0 \\ 0 & 0.063 & 0.016 \\ 0 & 0.016 & 0.254 \end{bmatrix} \begin{bmatrix} 8 \\ 13 \\ 3 \end{bmatrix} = \begin{bmatrix} 1.6 \\ 0.873 \\ 0.968 \end{bmatrix} \tag{337}

Prediction equation:

\hat{Y}_i = 1.6 X_{1,i} + 0.873 X_{2,i} + 0.968 X_{3,i} \tag{338}
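The estimation step (336)-(337) can be checked by solving the 3x3 system from the slides directly, with X'X and X'Y as computed above:

```python
def det3(m):
    """Determinant of a 3x3 matrix by cofactor expansion along the first row."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))

XtX = [[5.0, 0.0, 0.0],     # X'X from the textbook example
       [0.0, 16.0, -1.0],
       [0.0, -1.0, 4.0]]
XtY = [8.0, 13.0, 3.0]      # X'Y from the textbook example

D = det3(XtX)
beta = []
for j in range(3):          # Cramer's rule, column by column
    Mj = [row[:] for row in XtX]
    for i in range(3):
        Mj[i][j] = XtY[i]
    beta.append(det3(Mj) / D)

print([round(b, 3) for b in beta])
```

The solution reproduces the slide values 1.6, 0.873 and 0.968 (exactly 1.6, 55/63 and 61/63).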
Sum Squares

\sum y_i^2 = \sum Y^2 - N\bar{Y}^2 = 34 - 9(1.3333)^2 = 18.00 \tag{339}

\hat{y} = \hat{\beta}_1 x_1 + \hat{\beta}_2 x_2 + \hat{\beta}_3 x_3 \tag{340}

\sum \hat{y}^2 = \hat{\beta}_1 \sum x_1 y + \hat{\beta}_2 \sum x_2 y + \hat{\beta}_3 \sum x_3 y \tag{341}

= 1.6 \times 4 + 0.873 \times 5 + 0.968 \times 0.333 = 11.087 \tag{342}

\sum \hat{e}_i^2 = \sum y_i^2 - \sum \hat{y}_i^2 = 18 - 11.087 = 6.913 \tag{343}

R^2 is not reliable for regression from the origin.
Estimation of Errors

\hat{e}_i = Y_i - \left[1.6 X_{1,i} + 0.873 X_{2,i} + 0.968 X_{3,i}\right] \tag{344}

\hat{e}_1 = 1 - \left[1.6(1) + 0.873(0) + 0.968(-1)\right] = 0.368 \tag{345}

\hat{e}_2 = -1 - \left[1.6(-1) + 0.873(1) + 0.968(0)\right] = -0.273 \tag{346}

\hat{e}_3 = 2 - \left[1.6(1) + 0.873(0) + 0.968(0)\right] = 0.4 \tag{347}

\hat{e}_4 = 0 - \left[1.6(0) + 0.873(1) + 0.968(0)\right] = -0.873 \tag{348}

\hat{e}_5 = 4 - \left[1.6(1) + 0.873(2) + 0.968(0)\right] = 0.654 \tag{349}

\hat{e}_6 = 2 - \left[1.6(0) + 0.873(3) + 0.968(0)\right] = -0.619 \tag{350}

\hat{e}_7 = 2 - \left[1.6(0) + 0.873(0) + 0.968(1)\right] = 1.032 \tag{351}

\hat{e}_8 = 0 - \left[1.6(1) + 0.873(-1) + 0.968(1)\right] = -1.695 \tag{352}

\hat{e}_9 = 2 - \left[1.6(0) + 0.873(0) + 0.968(1)\right] = 1.032 \tag{353}
Sum of Error square, variance and covariance of Beta

\sum \hat{e}_i^2 = 0.368^2 + (-0.273)^2 + 0.4^2 + (-0.873)^2 + 0.654^2 + (-0.619)^2 + 1.032^2 + (-1.695)^2 + 1.032^2 = 6.9460 \tag{354}

Variance of errors:

var(e) = E(\hat{\varepsilon}_i)^2 = \frac{\sum \hat{e}_i^2}{N-k} = \frac{6.9460}{9-3} = 1.1577 = \hat{\sigma}^2 \tag{355}

cov(\hat{\beta}) = \begin{bmatrix} \sum X_{1,i}^2 & \sum X_{1,i}X_{2,i} & \sum X_{1,i}X_{3,i} \\ \sum X_{1,i}X_{2,i} & \sum X_{2,i}^2 & \sum X_{2,i}X_{3,i} \\ \sum X_{1,i}X_{3,i} & \sum X_{2,i}X_{3,i} & \sum X_{3,i}^2 \end{bmatrix}^{-1} \hat{\sigma}^2 \tag{356}

= \begin{bmatrix} 0.2 & 0 & 0 \\ 0 & 0.063 & 0.016 \\ 0 & 0.016 & 0.254 \end{bmatrix} (1.1577) = \begin{bmatrix} 0.232 & 0 & 0 \\ 0 & 0.074 & 0.018 \\ 0 & 0.018 & 0.294 \end{bmatrix}
var(\hat{\beta}_1) = 0.232; \quad var(\hat{\beta}_2) = 0.074; \quad var(\hat{\beta}_3) = 0.294 \tag{357}

cov(\hat{\beta}_1\hat{\beta}_2) = cov(\hat{\beta}_1\hat{\beta}_3) = 0; \quad cov(\hat{\beta}_2\hat{\beta}_3) = cov(\hat{\beta}_3\hat{\beta}_2) = 0.018 \tag{358}

SE(\hat{\beta}_1) = \sqrt{0.232} = 0.482; \quad SE(\hat{\beta}_2) = \sqrt{0.074} = 0.272; \quad SE(\hat{\beta}_3) = \sqrt{0.294} = 0.542 \tag{359}

t(\hat{\beta}_1) = \frac{1.6}{0.482} = 3.32; \quad t(\hat{\beta}_2) = \frac{0.873}{0.272} = 3.20; \quad t(\hat{\beta}_3) = \frac{0.968}{0.542} = 1.79 \tag{360}
Test of Restrictions

Hypothesis H_0: \beta_1 = \beta_2 = \beta_3 = 0 against H_A: \beta_1 \neq 0, \beta_2 \neq 0, or \beta_3 \neq 0. Here J = 3 is the number of restrictions.

F-test:

F = \frac{(Rb - r)'\left[R\,cov(b)\,R'\right]^{-1}(Rb - r)}{J} \tag{361}

R = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}; \quad b = \begin{bmatrix} \hat{\beta}_1 \\ \hat{\beta}_2 \\ \hat{\beta}_3 \end{bmatrix}; \quad r = \begin{bmatrix} 0 \\ 0 \\ 0 \end{bmatrix} \tag{362}
Test of Restrictions

F = \frac{\left(R\,\hat{\beta} - r\right)'\left[R \begin{bmatrix} 0.232 & 0 & 0 \\ 0 & 0.074 & 0.018 \\ 0 & 0.018 & 0.294 \end{bmatrix} R'\right]^{-1}\left(R\,\hat{\beta} - r\right)}{J}, \quad R = I_3, \ r = 0, \ J = 3 \tag{363}

See matrix_restrictions.xls for calculations.
Test of Restrictions

F = \frac{\begin{bmatrix} 1.6 & 0.873 & 0.968 \end{bmatrix} \begin{bmatrix} 4.3190 & 0 & 0 \\ 0 & 13.821 & -0.8638 \\ 0 & -0.8638 & 3.455 \end{bmatrix} \begin{bmatrix} 1.6 \\ 0.873 \\ 0.968 \end{bmatrix}}{3} \tag{364}

F = \frac{\begin{bmatrix} 1.6 & 0.873 & 0.968 \end{bmatrix} \begin{bmatrix} 6.91042 \\ 11.22943 \\ 2.59141 \end{bmatrix}}{3} = \frac{23.373}{3} = 7.79 \tag{365}

F_{(m_1,m_2),\alpha} = F_{(3,6),5\%} = 4.76: the critical value for F with (3, 6) degrees of freedom at the 5% level of significance is 4.76. The calculated F is bigger than the critical F, so reject the null hypothesis H_0: \beta_1 = \beta_2 = \beta_3 = 0. At least one of these parameters is significant and explains variation in y; in other words, accept H_A: \beta_1 \neq 0, \beta_2 \neq 0, or \beta_3 \neq 0.
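The Wald statistic in (364)-(365) can be reproduced from the covariance matrix computed earlier (entries rounded as on the slides, so the result matches to about two decimals):

```python
def det3(m):
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))

def inv3(m):
    """Inverse of a 3x3 matrix via the adjugate (cyclic cofactor formula)."""
    d = det3(m)
    cof = [[m[(i + 1) % 3][(j + 1) % 3] * m[(i + 2) % 3][(j + 2) % 3]
            - m[(i + 1) % 3][(j + 2) % 3] * m[(i + 2) % 3][(j + 1) % 3]
            for j in range(3)] for i in range(3)]
    return [[cof[j][i] / d for j in range(3)] for i in range(3)]

b = [1.6, 0.873, 0.968]                 # estimated coefficients
cov = [[0.232, 0.0, 0.0],               # cov(beta_hat) from equation (356)
       [0.0, 0.074, 0.018],
       [0.0, 0.018, 0.294]]
J = 3                                   # number of restrictions

Ci = inv3(cov)                          # [R cov R']^{-1} with R = I
Cb = [sum(Ci[i][j] * b[j] for j in range(3)) for i in range(3)]
F = sum(b[i] * Cb[i] for i in range(3)) / J
print(round(F, 2))
```

The statistic comes out near 7.8, comfortably above the 5% critical value of 4.76.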
Multiple Regression Model in Matrix

Consider a linear regression

Y_i = \beta_0 + \beta_1 X_{1,i} + \beta_2 X_{2,i} + \beta_3 X_{3,i} + \dots + \beta_k X_{k,i} + \varepsilon_i \quad i = 1 \dots N \tag{366}

and assumptions

E(\varepsilon_i) = 0 \tag{367}

E(\varepsilon_i x_{j,i}) = 0; \quad var(\varepsilon_i) = \sigma^2 \ \forall i; \quad \varepsilon_i \sim N(0, \sigma^2) \tag{368}

cov(\varepsilon_i \varepsilon_j) = 0 \tag{369}

Explanatory variables are uncorrelated:

E(X_{1,i} X_{2,i}) = 0 \tag{370}

The objective is to choose parameters that minimise the sum of squared errors:

\min S(\hat{\beta}_0, \hat{\beta}_1, \hat{\beta}_2, \dots, \hat{\beta}_k) = \sum \varepsilon_i^2 = \sum \left(Y_i - \hat{\beta}_0 - \hat{\beta}_1 X_{1,i} - \hat{\beta}_2 X_{2,i} - \dots - \hat{\beta}_k X_{k,i}\right)^2 \tag{371}
Derivation of Normal Equations

\frac{\partial S}{\partial \hat{\beta}_0} = 0; \quad \frac{\partial S}{\partial \hat{\beta}_1} = 0; \quad \frac{\partial S}{\partial \hat{\beta}_2} = 0; \quad \frac{\partial S}{\partial \hat{\beta}_3} = 0; \quad \dots \quad \frac{\partial S}{\partial \hat{\beta}_k} = 0 \tag{372}

Normal equations for the two explanatory variable case:

\sum Y_i = \hat{\beta}_0 N + \hat{\beta}_1 \sum X_{1,i} + \hat{\beta}_2 \sum X_{2,i} \tag{373}

\sum X_{1,i} Y_i = \hat{\beta}_0 \sum X_{1,i} + \hat{\beta}_1 \sum X_{1,i}^2 + \hat{\beta}_2 \sum X_{1,i} X_{2,i} \tag{374}

\sum X_{2,i} Y_i = \hat{\beta}_0 \sum X_{2,i} + \hat{\beta}_1 \sum X_{1,i} X_{2,i} + \hat{\beta}_2 \sum X_{2,i}^2 \tag{375}

\begin{bmatrix} \sum Y_i \\ \sum X_{1,i} Y_i \\ \sum X_{2,i} Y_i \end{bmatrix} = \begin{bmatrix} N & \sum X_{1,i} & \sum X_{2,i} \\ \sum X_{1,i} & \sum X_{1,i}^2 & \sum X_{1,i} X_{2,i} \\ \sum X_{2,i} & \sum X_{1,i} X_{2,i} & \sum X_{2,i}^2 \end{bmatrix} \begin{bmatrix} \hat{\beta}_0 \\ \hat{\beta}_1 \\ \hat{\beta}_2 \end{bmatrix} \tag{376}
Normal equations in matrix form

\begin{bmatrix} \hat{\beta}_0 \\ \hat{\beta}_1 \\ \hat{\beta}_2 \end{bmatrix} = \begin{bmatrix} N & \sum X_{1,i} & \sum X_{2,i} \\ \sum X_{1,i} & \sum X_{1,i}^2 & \sum X_{1,i} X_{2,i} \\ \sum X_{2,i} & \sum X_{1,i} X_{2,i} & \sum X_{2,i}^2 \end{bmatrix}^{-1} \begin{bmatrix} \sum Y_i \\ \sum Y_i X_{1,i} \\ \sum Y_i X_{2,i} \end{bmatrix} \tag{377}

\hat{\beta} = (X'X)^{-1} X'Y \tag{378}

\hat{\beta}_0 = \frac{\begin{vmatrix} \sum Y_i & \sum X_{1,i} & \sum X_{2,i} \\ \sum Y_i X_{1,i} & \sum X_{1,i}^2 & \sum X_{1,i} X_{2,i} \\ \sum Y_i X_{2,i} & \sum X_{1,i} X_{2,i} & \sum X_{2,i}^2 \end{vmatrix}}{\begin{vmatrix} N & \sum X_{1,i} & \sum X_{2,i} \\ \sum X_{1,i} & \sum X_{1,i}^2 & \sum X_{1,i} X_{2,i} \\ \sum X_{2,i} & \sum X_{1,i} X_{2,i} & \sum X_{2,i}^2 \end{vmatrix}} \tag{379}
Use Cramer's Rule to solve for parameters

\hat{\beta}_1 = \frac{\begin{vmatrix} N & \sum Y_i & \sum X_{2,i} \\ \sum X_{1,i} & \sum Y_i X_{1,i} & \sum X_{1,i} X_{2,i} \\ \sum X_{2,i} & \sum Y_i X_{2,i} & \sum X_{2,i}^2 \end{vmatrix}}{\begin{vmatrix} N & \sum X_{1,i} & \sum X_{2,i} \\ \sum X_{1,i} & \sum X_{1,i}^2 & \sum X_{1,i} X_{2,i} \\ \sum X_{2,i} & \sum X_{1,i} X_{2,i} & \sum X_{2,i}^2 \end{vmatrix}} \tag{380}

\hat{\beta}_2 = \frac{\begin{vmatrix} N & \sum X_{1,i} & \sum Y_i \\ \sum X_{1,i} & \sum X_{1,i}^2 & \sum Y_i X_{1,i} \\ \sum X_{2,i} & \sum X_{1,i} X_{2,i} & \sum Y_i X_{2,i} \end{vmatrix}}{\begin{vmatrix} N & \sum X_{1,i} & \sum X_{2,i} \\ \sum X_{1,i} & \sum X_{1,i}^2 & \sum X_{1,i} X_{2,i} \\ \sum X_{2,i} & \sum X_{1,i} X_{2,i} & \sum X_{2,i}^2 \end{vmatrix}} \tag{381}
Evaluate the determinant

X'X = \begin{bmatrix} N & \sum X_{1,i} & \sum X_{2,i} \\ \sum X_{1,i} & \sum X_{1,i}^2 & \sum X_{1,i} X_{2,i} \\ \sum X_{2,i} & \sum X_{1,i} X_{2,i} & \sum X_{2,i}^2 \end{bmatrix} \tag{382}

|X'X| = N \sum X_{1,i}^2 \sum X_{2,i}^2 + \sum X_{1,i} \sum X_{1,i}X_{2,i} \sum X_{2,i} + \sum X_{2,i} \sum X_{1,i}X_{2,i} \sum X_{1,i} - \sum X_{2,i} \sum X_{2,i} \sum X_{1,i}^2 - N \sum X_{1,i}X_{2,i} \sum X_{1,i}X_{2,i} - \sum X_{2,i}^2 \sum X_{1,i} \sum X_{1,i}

Determinant = (sum of products along the diagonals running from top left to bottom right) minus (sum of products along the diagonals running from bottom left to top right). For this rule of Sarrus, repeat the first two columns beside the matrix:

\begin{vmatrix} N & \sum X_{1,i} & \sum X_{2,i} \\ \sum X_{1,i} & \sum X_{1,i}^2 & \sum X_{1,i} X_{2,i} \\ \sum X_{2,i} & \sum X_{1,i} X_{2,i} & \sum X_{2,i}^2 \end{vmatrix} \begin{matrix} N & \sum X_{1,i} \\ \sum X_{1,i} & \sum X_{1,i}^2 \\ \sum X_{2,i} & \sum X_{1,i} X_{2,i} \end{matrix} \tag{383}
Multicollinearity problem: Singularity

Significant R^2 but insignificant t-ratios: why? Under exact multicollinearity X'X is singular, i.e. |X'X| = 0. Suppose X_{1,i} = \lambda X_{2,i}. Substituting out X_{1,i} in the expansion of |X'X| gives

|X'X| = N\lambda^2 \sum X_{2,i}^2 \sum X_{2,i}^2 + \lambda^2 \sum X_{2,i} \sum X_{2,i}^2 \sum X_{2,i} + \lambda^2 \sum X_{2,i} \sum X_{2,i}^2 \sum X_{2,i} - \lambda^2 \sum X_{2,i} \sum X_{2,i} \sum X_{2,i}^2 - N\lambda^2 \sum X_{2,i}^2 \sum X_{2,i}^2 - \lambda^2 \sum X_{2,i}^2 \sum X_{2,i} \sum X_{2,i} = 0

|X'X| = \begin{vmatrix} N & \lambda \sum X_{2,i} & \sum X_{2,i} \\ \lambda \sum X_{2,i} & \lambda^2 \sum X_{2,i}^2 & \lambda \sum X_{2,i}^2 \\ \sum X_{2,i} & \lambda \sum X_{2,i}^2 & \sum X_{2,i}^2 \end{vmatrix} = 0 \tag{384}
Parameters are indeterminate in a model with exact multicollinearity

\hat{\beta}_0 = \frac{\begin{vmatrix} \sum Y_i & \sum X_{1,i} & \sum X_{2,i} \\ \sum Y_i X_{1,i} & \sum X_{1,i}^2 & \sum X_{1,i} X_{2,i} \\ \sum Y_i X_{2,i} & \sum X_{1,i} X_{2,i} & \sum X_{2,i}^2 \end{vmatrix}}{0} = \infty \tag{385}

\hat{\beta}_1 = \frac{\begin{vmatrix} N & \sum Y_i & \sum X_{2,i} \\ \sum X_{1,i} & \sum Y_i X_{1,i} & \sum X_{1,i} X_{2,i} \\ \sum X_{2,i} & \sum Y_i X_{2,i} & \sum X_{2,i}^2 \end{vmatrix}}{0} = \infty \tag{386}

\hat{\beta}_2 = \frac{\begin{vmatrix} N & \sum X_{1,i} & \sum Y_i \\ \sum X_{1,i} & \sum X_{1,i}^2 & \sum Y_i X_{1,i} \\ \sum X_{2,i} & \sum X_{1,i} X_{2,i} & \sum Y_i X_{2,i} \end{vmatrix}}{0} = \infty \tag{387}
Covariance of parameters cannot be estimated in a model with exact multicollinearity

(X'X)^{-1} = \infty \tag{388}

cov(\hat{\beta}) = \begin{pmatrix} var(\hat{\beta}_1) & cov(\hat{\beta}_1\hat{\beta}_2) & cov(\hat{\beta}_1\hat{\beta}_3) \\ cov(\hat{\beta}_1\hat{\beta}_2) & var(\hat{\beta}_2) & cov(\hat{\beta}_2\hat{\beta}_3) \\ cov(\hat{\beta}_1\hat{\beta}_3) & cov(\hat{\beta}_2\hat{\beta}_3) & var(\hat{\beta}_3) \end{pmatrix} = \infty \tag{389}

cov(\hat{\beta}) = (X'X)^{-1}\sigma^2 = \infty \tag{390}

cov(\hat{\beta}) = \begin{bmatrix} N & \sum X_{1,i} & \sum X_{2,i} \\ \sum X_{1,i} & \sum X_{1,i}^2 & \sum X_{1,i} X_{2,i} \\ \sum X_{2,i} & \sum X_{1,i} X_{2,i} & \sum X_{2,i}^2 \end{bmatrix}^{-1} \hat{\sigma}^2 = \infty \tag{391}
Numerical example of exact multicollinearity

Table: Data for a multiple regression

y  3 5  7  6  9  6  7
x1 1 2  3  4  5  6  7
x2 5 10 15 20 25 30 35

Evaluate the determinant:

X'X = \begin{bmatrix} N & \sum X_{1,i} & \sum X_{2,i} \\ \sum X_{1,i} & \sum X_{1,i}^2 & \sum X_{1,i} X_{2,i} \\ \sum X_{2,i} & \sum X_{1,i} X_{2,i} & \sum X_{2,i}^2 \end{bmatrix} = \begin{bmatrix} 7 & 28 & 140 \\ 28 & 140 & 700 \\ 140 & 700 & 3500 \end{bmatrix} \tag{392}

\begin{bmatrix} \sum Y_i \\ \sum Y_i X_{1,i} \\ \sum Y_i X_{2,i} \end{bmatrix} = \begin{bmatrix} 43 \\ 188 \\ 940 \end{bmatrix}
Numerical example of exact multicollinearity

|X'X| = N \sum X_{1,i}^2 \sum X_{2,i}^2 + \sum X_{1,i} \sum X_{1,i}X_{2,i} \sum X_{2,i} + \sum X_{2,i} \sum X_{1,i}X_{2,i} \sum X_{1,i} - \sum X_{2,i} \sum X_{2,i} \sum X_{1,i}^2 - N \sum X_{1,i}X_{2,i} \sum X_{1,i}X_{2,i} - \sum X_{2,i}^2 \sum X_{1,i} \sum X_{1,i}

= (7 \times 140 \times 3500 + 28 \times 700 \times 140 + 140 \times 700 \times 28 - 140 \times 140 \times 140 - 7 \times 700 \times 700 - 28 \times 28 \times 3500) = 0

You can evaluate determinants easily in Excel using the following steps:
1. Select the cell where the result should go.
2. Choose the MDETERM function from the Math & Trig category.
3. Select the matrix range for which to evaluate the determinant.
4. Press OK and you will see the result.
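The same zero determinant can be confirmed in exact integer arithmetic (a quick script equivalent of the Excel MDETERM check):

```python
def det3(m):
    """Determinant of a 3x3 matrix by cofactor expansion along the first row."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))

# X2 = 5 * X1, so the third row and column are 5 times the second:
# X'X is singular by construction.
XtX = [[7, 28, 140],
       [28, 140, 700],
       [140, 700, 3500]]

print(det3(XtX))
```

Because the entries are integers, the cofactor expansion returns exactly zero, confirming that (X'X)^{-1} does not exist for this data.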
Normal equations in deviation form

\begin{bmatrix} \hat{\beta}_1 \\ \hat{\beta}_2 \end{bmatrix} = \begin{bmatrix} \sum x_{1,i}^2 & \sum x_{1,i}x_{2,i} \\ \sum x_{1,i}x_{2,i} & \sum x_{2,i}^2 \end{bmatrix}^{-1} \begin{bmatrix} \sum y_i x_{1,i} \\ \sum y_i x_{2,i} \end{bmatrix} \tag{393}

\hat{\beta} = (X'X)^{-1} X'Y \tag{394}

\hat{\beta}_1 = \frac{\begin{vmatrix} \sum y_i x_{1,i} & \sum x_{1,i}x_{2,i} \\ \sum y_i x_{2,i} & \sum x_{2,i}^2 \end{vmatrix}}{\begin{vmatrix} \sum x_{1,i}^2 & \sum x_{1,i}x_{2,i} \\ \sum x_{1,i}x_{2,i} & \sum x_{2,i}^2 \end{vmatrix}} \tag{395}

\hat{\beta}_2 = \frac{\begin{vmatrix} \sum x_{1,i}^2 & \sum y_i x_{1,i} \\ \sum x_{1,i}x_{2,i} & \sum y_i x_{2,i} \end{vmatrix}}{\begin{vmatrix} \sum x_{1,i}^2 & \sum x_{1,i}x_{2,i} \\ \sum x_{1,i}x_{2,i} & \sum x_{2,i}^2 \end{vmatrix}} \tag{396}
Variances of parameters

\begin{bmatrix} \sum x_{1,i}^2 & \sum x_{1,i}x_{2,i} \\ \sum x_{1,i}x_{2,i} & \sum x_{2,i}^2 \end{bmatrix}^{-1} = \frac{1}{\sum x_{1,i}^2 \sum x_{2,i}^2 - (\sum x_{1,i}x_{2,i})^2} \begin{bmatrix} \sum x_{2,i}^2 & -\sum x_{1,i}x_{2,i} \\ -\sum x_{1,i}x_{2,i} & \sum x_{1,i}^2 \end{bmatrix} \tag{397}

var(\hat{\beta}_1) = \frac{\sum x_{2,i}^2}{\sum x_{1,i}^2 \sum x_{2,i}^2 - (\sum x_{1,i}x_{2,i})^2} \hat{\sigma}^2 \tag{398}

var(\hat{\beta}_2) = \frac{\sum x_{1,i}^2}{\sum x_{1,i}^2 \sum x_{2,i}^2 - (\sum x_{1,i}x_{2,i})^2} \hat{\sigma}^2 \tag{399}
Variance Inflation Factor in Inexact Multicollinearity

Let the correlation between X_{1,i} and X_{2,i} be r_{12}. Then the variance inflation factor is \frac{1}{1 - r_{12}^2}:

var(\hat{\beta}_2) = \frac{\sum x_{1,i}^2}{\left[\sum x_{1,i}^2 \sum x_{2,i}^2 - (\sum x_{1,i}x_{2,i})^2\right]} \hat{\sigma}^2 = \frac{1}{\left[\frac{\sum x_{1,i}^2 \sum x_{2,i}^2}{\sum x_{1,i}^2} - \frac{(\sum x_{1,i}x_{2,i})^2}{\sum x_{1,i}^2}\right]} \hat{\sigma}^2

= \frac{1}{\sum x_{2,i}^2 \left[1 - \frac{(\sum x_{1,i}x_{2,i})^2}{\sum x_{2,i}^2 \sum x_{1,i}^2}\right]} \hat{\sigma}^2 = \frac{1}{\sum x_{2,i}^2 \left[1 - r_{12}^2\right]} \sigma^2 = \frac{1}{(1 - r_{12}^2)} \cdot \frac{1}{\sum x_{2,i}^2} \sigma^2 \tag{400}
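The inflation factor 1/(1 - r₁₂²) in (400) is easy to tabulate; a small sketch (the correlation values are chosen for illustration):

```python
def vif(r12):
    """Variance inflation factor for a given correlation between X1 and X2."""
    return 1.0 / (1.0 - r12 ** 2)

# As the correlation between the regressors approaches 1, the variance of
# the slope estimate is inflated without bound.
for r in (0.0, 0.5, 0.9, 0.99):
    print(r, round(vif(r), 3))
```

At r₁₂ = 0 there is no inflation; at 0.9 the variance is multiplied by roughly 5.26, and at 0.99 by about 50.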
Solutions for the Multicollinearity Problem

When the variance is high the standard errors are high, and that makes the t-statistics very small and insignificant:

SE(\hat{\beta}_2) = \sqrt{var(\hat{\beta}_2)}; \quad SE(\hat{\beta}_1) = \sqrt{var(\hat{\beta}_1)}; \quad t(\hat{\beta}_1) = \frac{\hat{\beta}_1 - \beta_1}{SE(\hat{\beta}_1)}; \quad t(\hat{\beta}_2) = \frac{\hat{\beta}_2 - \beta_2}{SE(\hat{\beta}_2)} \tag{401}

Since 0 < r_{12} < 1, multicollinearity raises the variance and hence the standard errors, and lowers the t-values.
First detect the pairwise correlations between explanatory variables, such as r_{12} between X_{1,i} and X_{2,i}. Drop highly correlated variables.
Adopt Klein's rule of thumb: compare R_y^2 from the overall regression to R_x^2 from the auxiliary regression. Multicollinearity is a problem if R_x^2 > R_y^2. Drop highly correlated variables.
Heteroskedasticity

Consider a linear regression

Y_i = \beta_1 + \beta_2 X_i + \varepsilon_i \quad i = 1 \dots N \tag{402}

and OLS assumptions

E(\varepsilon_i) = 0 \tag{403}

E(\varepsilon_i x_i) = 0 \tag{404}

var(\varepsilon_i) = \sigma^2 \ \forall i \tag{405}

cov(\varepsilon_i \varepsilon_j) = 0 \tag{406}

Then the OLS regression coefficients are:

\hat{\beta}_2 = \frac{\sum x_i y_i}{\sum x_i^2}; \quad \hat{\beta}_1 = \bar{Y} - \hat{\beta}_2 \bar{X} \tag{407}

Heteroskedasticity occurs when the variances of errors are not constant: var(\varepsilon_i) = \sigma_i^2, i.e. the variance of the errors varies with i. This is mainly a cross section problem.
Main reasons for heteroskedasticity

Learning reduces errors:
- driving practice reduces driving errors and accidents
- typing practice reduces typing errors
- improved machines reduce defects in production
- experience in jobs reduces the number of errors or wrong decisions

Improved data collection: better formulas and good software.

More heteroskedasticity exists in cross section than in time series data.
Nature of Heteroskedasticity

E(\varepsilon_i)^2 = \sigma_i^2 \tag{408}

\hat{\beta}_2 = \frac{\sum x_i y_i}{\sum x_i^2} \tag{409}

\hat{\beta}_2 = \sum w_i y_i \tag{410}

where

w_i = \frac{x_i}{\sum x_i^2} = \frac{X_i - \bar{X}}{\sum (X_i - \bar{X})^2} \tag{411}

Var(\hat{\beta}_2) = \sum w_i^2 \, var(y_i) = \frac{\sum x_i^2 \sigma_i^2}{\left[\sum x_i^2\right]^2} \tag{412}
OLS Estimator is still unbiased

\hat{\beta}_2 = \frac{\sum x_i y_i}{\sum x_i^2} = \sum w_i y_i \tag{413}

E(\hat{\beta}_2) = E\left(\sum w_i y_i\right) = E \sum w_i (\beta_1 + \beta_2 X_i + \varepsilon_i) \tag{414}

E(\hat{\beta}_2) = \beta_1 E\left(\sum w_i\right) + \beta_2 E\left(\sum w_i X_i\right) + E\left(\sum w_i \varepsilon_i\right) \tag{415}

Because \sum w_i = 0, \sum w_i X_i = 1 and E(\varepsilon_i) = 0:

E(\hat{\beta}_2) = \beta_2 \tag{416}
OLS Parameter is inefficient with Heteroskedasticity

\hat{\beta}_2 = \sum w_i y_i \tag{417}

E(\hat{\beta}_2) = E\left(\sum w_i y_i\right) = E \sum w_i (\beta_1 + \beta_2 X_i + \varepsilon_i) \tag{418}

E(\hat{\beta}_2) = \beta_1 E\left(\sum w_i\right) + \beta_2 E\left(\sum w_i X_i\right) + E\left(\sum w_i \varepsilon_i\right) \tag{419}

E(\hat{\beta}_2) = \beta_2 + E\left(\sum w_i \varepsilon_i\right) \tag{420}

Var(\hat{\beta}_2) = E\left[\hat{\beta}_2 - E(\hat{\beta}_2)\right]^2 = E\left(\sum w_i \varepsilon_i\right)^2 \tag{421}

Var(\hat{\beta}_2) = \sum w_i^2 E(\varepsilon_i^2) + \sum_i \sum_{j \neq i} w_i w_j \, cov(\varepsilon_i \varepsilon_j) \tag{422}

With cov(\varepsilon_i \varepsilon_j) = 0 but var(\varepsilon_i) = \sigma_i^2:

Var(\hat{\beta}_2) = \frac{\sum x_i^2 \sigma_i^2}{\left[\sum x_i^2\right]^2} \tag{423}
OLS Estimator is inconsistent asymptotically

Var(\hat{\beta}_2) = \frac{\sum x_i^2 \sigma_i^2}{\left[\sum x_i^2\right]^2} \tag{424}

\lim_{N \to \infty} Var(\hat{\beta}_2) = \lim_{N \to \infty} \frac{\sum x_i^2 \sigma_i^2}{\left[\sum x_i^2\right]^2}, \text{ which need not converge to zero as } N \to \infty \tag{425}
Various tests of heteroskedasticity

Spearman Rank Test
Park Test
Goldfeld-Quandt Test
Glejser Test
Breusch-Pagan-Godfrey Test
White Test
ARCH Test

(See the food_hetro.xls spreadsheet for some examples of how to compute these. Gujarati (2003) Basic Econometrics, McGraw Hill, is a good text for heteroskedasticity; see the x-hetro test in PcGive.)
GLS Solution of the Heteroskedasticity Problem When Variance is Known

\frac{Y_i}{\sigma_i} = \frac{\beta_1}{\sigma_i} + \beta_2 \frac{X_i}{\sigma_i} + \frac{\varepsilon_i}{\sigma_i} \quad i = 1 \dots N \tag{426}

The variance of the error in this transformed equation equals 1:

var\left(\frac{\varepsilon_i}{\sigma_i}\right) = \frac{\sigma_i^2}{\sigma_i^2} = 1 \tag{427}

If \sigma_i^2 = \sigma^2 X_i^2, divide through by X_i instead:

\frac{Y_i}{X_i} = \frac{\beta_1}{X_i} + \beta_2 + \frac{\varepsilon_i}{X_i}; \quad var\left(\frac{\varepsilon_i}{X_i}\right) = \frac{\sigma^2 x_i^2}{x_i^2} = \sigma^2 \tag{428}
In matrix notation:

\beta_{OLS} = (X'X)^{-1}X'Y \tag{429}

\beta_{GLS} = (X'\Omega^{-1}X)^{-1}X'\Omega^{-1}Y \tag{430}

\Omega^{-1} is the inverse of the variance-covariance matrix of the errors:

\Omega = E(ee') = \begin{bmatrix} \sigma_1^2 & \sigma_{12} & \dots & \sigma_{1n} \\ \sigma_{21} & \sigma_2^2 & \dots & \sigma_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ \sigma_{n1} & \sigma_{n2} & \dots & \sigma_n^2 \end{bmatrix} \tag{431}

P'\Omega P = I; \quad P'P = \Omega^{-1} \tag{432}
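When σᵢ² = σ²Xᵢ², the GLS weighting amounts to the transformed regression of equation (428). A sketch with noiseless illustrative data (Y = 2 + 3X exactly, so the transformed OLS should recover β₁ = 2 and β₂ = 3):

```python
X = [1.0, 2.0, 4.0, 5.0]
Y = [5.0, 8.0, 14.0, 17.0]          # constructed as Y = 2 + 3*X exactly

# Transformed model, equation (428): Y/X = beta2 + beta1*(1/X) + e/X,
# so the intercept of this regression estimates beta2 and the slope beta1.
u = [1.0 / xi for xi in X]          # regressor 1/X
v = [yi / xi for xi, yi in zip(X, Y)]

n = len(X)
ubar = sum(u) / n
vbar = sum(v) / n
slope = (sum((ui - ubar) * (vi - vbar) for ui, vi in zip(u, v))
         / sum((ui - ubar) ** 2 for ui in u))   # estimate of beta1
intercept = vbar - slope * ubar                  # estimate of beta2

print(round(intercept, 6), round(slope, 6))
```

Because the points lie exactly on Y/X = 3 + 2(1/X), the transformed OLS returns 3 and 2 up to floating-point rounding; with heteroskedastic noise it would be the efficient GLS estimator.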
Spearman rank test of heteroskedasticity

r_s = 1 - \frac{6 \sum_i d_i^2}{n(n^2 - 1)} \tag{433}

Steps: run OLS of y on x; obtain the errors e; rank e and y (or x); find the difference d_i of the ranks; use the t-statistic to test whether the rank correlation coefficient \rho = 0, assuming n > 8:

t = \frac{r_s \sqrt{n-2}}{\sqrt{1 - r_s^2}} \quad \text{with df} = (n - 2) \tag{434}

If t_{cal} > t_{crit} there is heteroskedasticity.
Glejser Test of heteroskedasticity

Model: Y_i = β₁ + β₂X_i + e_i,  i = 1...N (435)

There are a number of versions of it:

|e_i| = β₁ + β₂X_i + v_i (436)

|e_i| = β₁ + β₂√X_i + v_i (437)

|e_i| = β₁ + β₂(1/X_i) + v_i (438)

|e_i| = β₁ + β₂(1/√X_i) + v_i (439)

|e_i| = √(β₁ + β₂X_i) + v_i (440)

|e_i| = √(β₁ + β₂X_i²) + v_i (441)

In each case do a t-test of H₀: β₂ = 0 against H_A: β₂ ≠ 0. If β₂ is significant then that is evidence of heteroskedasticity.
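The first Glejser variant (436) needs only a simple regression of |e_i| on X_i and a t-test on the slope. A pure-Python sketch follows; the data are synthetic (|e| grows with x by construction) and the names are illustrative.

```python
import math

def ols_slope_t(x, y):
    """Slope of y on x and its t-statistic (simple OLS, pure Python)."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    b2 = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
    b1 = ybar - b2 * xbar
    rss = sum((yi - b1 - b2 * xi) ** 2 for xi, yi in zip(x, y))
    se_b2 = math.sqrt(rss / (n - 2) / sxx)
    return b2, b2 / se_b2

x = list(range(1, 21))
abs_e = [0.5 * xi + 0.1 * (-1) ** i for i, xi in enumerate(x)]  # |e| rises with x
slope, t = ols_slope_t(x, abs_e)
# a large |t| rejects H0: beta2 = 0 -- evidence of heteroskedasticity
```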
White test

The White test of heteroskedasticity is a more general test.

Y_i = β₀ + β₁X_{1,i} + β₂X_{2,i} + ε_i,  i = 1...N

- run OLS and obtain the squared errors ê_i²
- regress ê_i² = α₀ + α₁X_{1,i} + α₂X_{2,i} + α₃X²_{1,i} + α₄X²_{2,i} + α₅X_{1,i}X_{2,i} + v_i
- compute the test statistic n·R² ~ χ²_df
- if the calculated χ²_df value is greater than the χ²_df table value then there is evidence of heteroskedasticity.
White test of heteroskedasticity

1. This is a more general test.
2. Model: Y_i = β₀ + β₁X_{1,i} + β₂X_{2,i} + β₃X_{3,i} + ε_i
3. Run OLS on this and get ê_i².
4. Regress ê_i² = α₀ + α₁X_{1,i} + α₂X_{2,i} + α₃X_{3,i} + α₄X²_{1,i} + α₅X²_{2,i} + α₆X²_{3,i} + α₇X_{1,i}X_{2,i} + α₈X_{2,i}X_{3,i} + v_i
5. Compute the test statistic
6. n·R² ~ χ²_df
7. Again, if the calculated χ²_df is greater than the table value there is evidence of heteroskedasticity.
Park test of heteroskedasticity

Model: Y_i = β₁ + β₂X_i + e_i,  i = 1...N (442)

Error variance: σ_i² = σ²X_i^{β₂} e^{v_i} (443)

Or, taking logs:

ln σ_i² = ln σ² + β₂ ln X_i + v_i (444)

Steps: run the OLS regression for Y_i and get the estimates of the error terms e_i. Square e_i and then run a regression of ln e_i² on ln X_i. Do a t-test of H₀: β₂ = 0 against H_A: β₂ ≠ 0. If β₂ is significant then that is evidence of heteroskedasticity.
Goldfeld-Quandt test of heteroskedasticity

Model: Y_i = β₁ + β₂X_i + e_i,  i = 1...N (445)

Steps:
- Rank observations in ascending order of one of the x variables.
- Omit c central observations, leaving two groups of (N − c)/2 observations each.
- Fit OLS to the first (N − c)/2 and the last (N − c)/2 observations and find the sum of squared errors from both of them.
- Set the hypothesis σ₁² = σ₂² against σ₁² ≠ σ₂².
- Compute λ = (ERSS₂/df₂)/(ERSS₁/df₁). It follows an F distribution.
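The steps above can be sketched in pure Python with a simple one-regressor OLS. The data below are synthetic, built so the error variance is larger at high x; names and sample sizes are illustrative, not from the lecture.

```python
def ols_rss(x, y):
    """Residual sum of squares from a simple OLS of y on x."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    b2 = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
    b1 = ybar - b2 * xbar
    return sum((yi - b1 - b2 * xi) ** 2 for xi, yi in zip(x, y))

def goldfeld_quandt(x, y, c, k=2):
    """lambda = (RSS2/df)/(RSS1/df) after dropping c central observations."""
    pairs = sorted(zip(x, y))                  # rank by x
    n = len(pairs)
    m = (n - c) // 2
    lo, hi = pairs[:m], pairs[-m:]
    rss1 = ols_rss([p[0] for p in lo], [p[1] for p in lo])
    rss2 = ols_rss([p[0] for p in hi], [p[1] for p in hi])
    df = m - k
    return (rss2 / df) / (rss1 / df)           # ~ F(df, df) under H0

x = list(range(1, 21))
# noise amplitude 0.1 in the low-x half, 1.0 in the high-x half
y = [2 * xi + (0.1 if xi <= 10 else 1.0) * (-1) ** i for i, xi in enumerate(x)]
lam = goldfeld_quandt(x, y, c=4)
# lam well above 1 suggests rising error variance (heteroskedasticity)
```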
Breusch-Pagan-Godfrey test of heteroskedasticity

Y_i = β₀ + β₁X_{1,i} + β₂X_{2,i} + β₃X_{3,i} + ... + β_kX_{k,i} + ε_i,  i = 1...N

- run OLS and obtain the squared errors e_i²
- obtain the average squared error σ̂² = ∑_i e_i²/n and construct p_i = e_i²/σ̂²
- regress p_i on a set of explanatory variables:

p_i = α₀ + α₁X_{1,i} + α₂X_{2,i} + α₃X_{3,i} + ... + α_kX_{k,i} + v_i

- obtain the explained sum of squares (EXSS) of this regression and compute

Θ = ½(EXSS) ~ χ²_{m−1}

H₀: α₁ = α₂ = α₃ = ... = α_k = 0, i.e. no heteroskedasticity and σ_i² = α₀, a constant. If the calculated χ²_{m−1} is greater than the table value there is evidence of heteroskedasticity.
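A one-regressor version of the statistic Θ can be sketched directly, since for a simple regression the explained sum of squares is b²·Sxx. Pure Python; the data are synthetic (error variance proportional to x by construction) and the names are illustrative.

```python
def bp_theta(x, e):
    """Breusch-Pagan-Godfrey Theta = EXSS/2 from regressing
    p_i = e_i^2 / sigma_hat^2 on a single regressor x_i."""
    n = len(x)
    sigma2 = sum(ei * ei for ei in e) / n      # average squared error
    p = [ei * ei / sigma2 for ei in e]
    xbar, pbar = sum(x) / n, sum(p) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    b = sum((xi - xbar) * (pi - pbar) for xi, pi in zip(x, p)) / sxx
    exss = b * b * sxx                          # explained sum of squares
    return exss / 2                             # ~ chi^2(1) here

x = list(range(1, 31))
e = [xi ** 0.5 for xi in x]                     # e_i^2 = x_i: variance rises with x
theta = bp_theta(x, e)
# theta above the chi^2(1) 5% critical value 3.84 signals heteroskedasticity
```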
ARCH test of heteroskedasticity

Engle's (1982) autoregressive conditional heteroskedasticity (ARCH) test is more useful for time series data.

Model: Y_t = β₀ + β₁X_{1,t} + β₂X_{2,t} + β₃X_{3,t} + ... + β_kX_{k,t} + e_t, with ε_t ~ N(0, α₀ + α₁e²_{t−1})

σ_t² = α₀ + α₁e²_{t−1} (446)

1. Here σ_t² is not observed. The simple way is to run OLS for Y_t and get ê_t².
2. ARCH(1):
3. ê_t² = α₀ + α₁ê²_{t−1} + v_t
4. ARCH(p):
5. ê_t² = α₀ + α₁ê²_{t−1} + α₂ê²_{t−2} + ... + α_pê²_{t−p} + v_t
6. Compute the test statistic
7. n·R² ~ χ²_df
8. Again, if the calculated χ²_df is greater than the table value there is evidence of ARCH effects and heteroskedasticity.
9. Both ARCH and GARCH models are estimated using an iterative Maximum Likelihood procedure.
GARCH tests of heteroskedasticity

Bollerslev's generalised autoregressive conditional heteroskedasticity (GARCH) process is more general.

1. GARCH(1,1):

σ_t² = α₀ + α₁ê²_{t−1} + β₁σ²_{t−1} + v_t (447)

2. GARCH(p,q):
3. σ_t² = α₀ + α₁ê²_{t−1} + α₂ê²_{t−2} + ... + α_pê²_{t−p} + β₁σ²_{t−1} + β₂σ²_{t−2} + ... + β_qσ²_{t−q} + v_t
4. Compute the test statistic n·R² ~ χ²_df. Sometimes written as
5. h_t = α₀ + α₁ê²_{t−1} + α₂ê²_{t−2} + ... + α_pê²_{t−p} + β₁h_{t−1} + β₂h_{t−2} + ... + β_qh_{t−q} + v_t
6. where h_t = σ_t².
7. Various functional forms of h_t are used, e.g. h_t = α₀ + α₁ê²_{t−1} + β₁√h_{t−1} + v_t or h_t = α₀ + α₁ê²_{t−1} + √(β₁h_{t−1} + β₂h_{t−2}) + v_t.
8. Both ARCH and GARCH models are estimated using an iterative Maximum Likelihood procedure. The Volatility package in PcGive estimates ARCH-GARCH models.
Autocorrelation

Consider a linear regression

Y_t = β₁ + β₂X_t + ε_t,  t = 1...T (448)

Classical assumptions:

E(ε_t) = 0 (449)

E(ε_t x_t) = 0 (450)

var(ε_t) = σ² for all t;  cov(ε_t ε_{t−1}) = 0 (451)

In the presence of (first order) autocorrelation,

ε_t = ρε_{t−1} + v_t (452)

The OLS regression coefficients are:

β̂₂ = ∑x_t y_t / ∑x_t²;  β̂₁ = Ȳ − β̂₂X̄;  ρ̂ = ∑e_t e_{t−1} / ∑e_t² (453)
Causes and consequences of autocorrelation

Autocorrelation occurs when the covariances of errors are not zero, cov(ε_t ε_{t−1}) ≠ 0. This is mainly a problem observed in time series data.

Causes of autocorrelation:
- inertia, specification bias, cobweb phenomena
- manipulation of data

Consequences of autocorrelation:
- Estimators are still linear and unbiased, but
- they are not the best; they are inefficient.

Remedial measures:
- When ρ is known, transform the model.
- When ρ is unknown, estimate it and then transform the model.
Negative autocorrelation
Cyclical autocorrelation
Nature of Autocorrelation

β̂₂ = ∑x_t y_t / ∑x_t² (454)

β̂₂ = ∑w_t y_t,  where w_t = x_t / ∑x_t² (455)

E(ε_t²) = σ² (456)

E(β̂₂) = β₂ + E(∑w_t ε_t) (457)

E(β̂₂) = β₁E(∑w_t) + β₂E(∑w_t x_t) + E(∑w_t ε_t) (458)

Var(β̂₂) = E[β̂₂ − β₂]² = E(∑w_t ε_t)² (459)

Var(β̂₂) = σ²/∑x_t² + 2∑∑ (x_t x_{t−1} / [∑x_t²]²) cov(ε_t ε_{t−1}) (460)
OLS Estimator is still unbiased

ε_t = ρε_{t−1} + v_t (461)

β̂₂ = ∑x_t y_t / ∑x_t² = ∑w_t y_t (462)

E(β̂₂) = E(∑w_t y_t) = E[∑w_t (β₁ + β₂X_t + ε_t)] (463)

E(β̂₂) = β₁E(∑w_t) + β₂E(∑w_t x_t) + E(∑w_t ε_t) (464)

E(β̂₂) = β₂ (465)

since ∑w_t = 0, ∑w_t x_t = 1 and E(w_t ε_t) = 0.
OLS Parameter is inefficient with Autocorrelation

β̂₂ = ∑w_t y_t (466)

E(β̂₂) = E(∑w_t y_t) = E[∑w_t (β₁ + β₂X_t + ε_t)] (467)

E(β̂₂) = β₁E(∑w_t) + β₂E(∑w_t x_t) + E(∑w_t ε_t) (468)

E(β̂₂) = β₂ + E(∑w_t ε_t) (469)

Var(β̂₂) = E[β̂₂ − β₂]² = E(∑w_t ε_t)² (470)

Var(β̂₂) = E[∑w_t²ε_t²] + 2∑∑ w_t w_{t−1} cov(ε_t ε_{t−1}) (471)
Variance of OLS parameter in presence of autocorrelation

Var(β̂₂) = (σ²/∑x_t²) [1 + 2 (∑x_t x_{t−1}/∑x_t²) · cov(ε_t ε_{t−1}) / (√var(ε_t) √var(ε_{t−1}))],  using var(ε_t) = var(ε_{t−1}) (472)

Var(β̂₂) = (σ²/∑x_t²) [1 + 2 (∑(x_t − x̄)(x_{t−1} − x̄)/∑x_t²) ρ₁ + 2 (∑(x_t − x̄)(x_{t−2} − x̄)/∑x_t²) ρ₂ + ... + 2 (∑(x_t − x̄)(x_{t−s} − x̄)/∑x_t²) ρ_s] (473)
OLS Estimator is inconsistent asymptotically

Var(β̂₂) = (σ²/∑x_t²) [1 + 2 (∑(x_t − x̄)(x_{t−1} − x̄)/∑x_t²) ρ₁ + 2 (∑(x_t − x̄)(x_{t−2} − x̄)/∑x_t²) ρ₂ + ... + 2 (∑(x_t − x̄)(x_{t−s} − x̄)/∑x_t²) ρ_s] (474)

lim_{N→∞} Var(β̂₂) = (σ²/∑x_t²) [1 + 2 (∑(x_t − x̄)(x_{t−1} − x̄)/∑x_t²) ρ₁ + 2 (∑(x_t − x̄)(x_{t−2} − x̄)/∑x_t²) ρ₂ + ... + 2 (∑(x_t − x̄)(x_{t−s} − x̄)/∑x_t²) ρ_s] ⟹ ∞ (475)
Durbin-Watson Distribution
Durbin-Watson test of autocorrelation

d = ∑_{t=2}^T (e_t − e_{t−1})² / ∑_{t=1}^T e_t² (476)

d = ∑_{t=2}^T (e_t² − 2e_t e_{t−1} + e²_{t−1}) / ∑_{t=1}^T e_t² ≈ 2(1 − ρ̂) (477)
Autocorrelation and Durbin-Watson Statistics

d ≈ 2(1 − ρ) (478)

ρ = 0 ⟹ d = 2 (479)

ρ = −1 ⟹ d = 4;  ρ = 1 ⟹ d = 0 (480)
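The relation d ≈ 2(1 − ρ̂) can be checked numerically. A pure-Python sketch, using a made-up perfectly alternating residual series as an extreme case of negative autocorrelation:

```python
def dw(e):
    """Durbin-Watson statistic d from a residual series."""
    num = sum((e[t] - e[t - 1]) ** 2 for t in range(1, len(e)))
    return num / sum(et * et for et in e)

def rho_hat(e):
    """First-order autocorrelation estimate from residuals."""
    num = sum(e[t] * e[t - 1] for t in range(1, len(e)))
    return num / sum(et * et for et in e)

e = [(-1) ** t for t in range(10)]     # perfectly alternating residuals
print(dw(e), rho_hat(e))               # 3.6 -0.9, close to d = 2(1 - rho) = 3.8
```

The small gap between 3.6 and 3.8 comes from the finite-sample end terms; the approximation tightens as T grows.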
Autocorrelation

Estimates of β₁ and β₂ are given in this table. Both are statistically significant, as is the overall model.

Table: Consumption on income and prices (double log model): estimates of elasticities

            Coefficient  Standard Error  t-value  t-prob
Intercept   3.164        0.705           4.49     0.000
Log income  1.143        0.156           7.33     0.000
Log prices  -0.829       0.036           -23.0    0.000
R² = 0.97, F = 266 (0.00), DW = 1.93, N = 17.
χ²₂ = 0.355 [0.837]; ARCH F(1,12) = 1.01 [0.33]

The calculated value of the Durbin-Watson statistic is 1.93. Theoretical Durbin-Watson table values for N = 12 are d_L = 0.812 and d_U = 1.579. Clearly the computed Durbin-Watson statistic of 1.93 is above these values. There is no evidence of statistically significant autocorrelation in this problem.
Steps for testing Autocorrelation in PcGive

1. Run the regression using the single equation dynamic model in the econometrics package.
2. Look at the Durbin-Watson statistic (d = 2 means no autocorrelation).
3. Click Test/Test, select the error autocorrelation test and choose the order of autocorrelation.
4. Error autocorrelation coefficients in the auxiliary regression:
   Lag  Coefficient  Std.Error
   1    -0.12025     0.3316
   2    -0.20083     0.4231
   RSS = 0.0132044  sigma = 0.00110037
   Testing for error autocorrelation from lags 1 to 2:
   Chi^2(2) = 0.51033 [0.7748] and F-form F(2,12) = 0.18569 [0.8329]
5. Read the above estimates. Here the autocorrelation is not significant.
6. Also check the normality of errors.
UK supply function

Estimation of the UK supply function. Here Y_t is the output index or national income and X_t is inflation or the price index.

Y_t = β₀ + β₁X_t + ε_t,  t = 1...T (481)

Estimates below are from quarterly data, 1960:1 to 2008:3.

Table: National income on GDP Deflator of UK (spurious regression)

           Coefficient  Standard Error  t-value  t-prob
Intercept  -26199.8     2729            -9.60    0.000
Deflator   2766.23      46.91           59.0     0.000
R² = 0.95, F(1,183) = 3477 [0.000]**, DW = 0.0269, N = 185
Residual function

The supply function should be positively sloped; the results here show the deflator has the expected positive sign and the t-value is very significant. However, this relation is spurious because R² > DW. This is called a spurious regression because the variables are non-stationary. A spurious regression is meaningless. It is also evident from the non-normality of errors. When DW = 0.0269, the autocorrelation coefficient ρ̂ is close to 1. It can be estimated from the residuals.

Table: Estimation of autocorrelation, UK supply function

              Coefficient  Standard Error  t-value  t-prob
Residual lag1 1.005        0.01237         81.3     0.000
Intercept     237.045      263.0           0.901    0.369
R² = 0.97, F(1,182) = 6604 [0.000]**, DW = 2.2, N = 184

The estimated value of ρ̂ is 1.005 and is significant.
UK supply function

One remedy to solve the autocorrelation problem is to take the first difference of the dependent variable.

Table: Change in income on GDP Deflator of UK

           Coefficient  Standard Error  t-value  t-prob
Intercept  129.5        422.0           0.307    0.759
Deflator   34.005       7.217           4.71     0.000
R² = 0.11, F(1,181) = 22.2 [0.00]*, DW = 2.37, N = 183

Taking the first difference has solved the problem of autocorrelation. For N = 200 the upper and lower bounds of the Durbin-Watson table values are d_L = 1.758 and d_U = 1.778. DW = 2.37 indicates a negative autocorrelation, but this seems to lie in the Durbin-Watson tables' inconclusive region. It is difficult to be definite about the evidence of autocorrelation as the calculated statistic falls in the inconclusive region. Let us check this by estimating the value of ρ̂. This is now -0.19 and is significant, though far away from the 1.005 seen above.
Residual

Table: Estimation of autocorrelation, UK supply function

              Coefficient  Standard Error  t-value  t-prob
Residual lag1 -0.19        0.074           -2.64    0.009
Intercept     6.74         242.1           0.03     0.978
R² = 0.04, F(1,180) = 6.984 [0.000]**, DW = 2.2, N = 182

UK supply function

Table: Growth rate of income on inflation in UK

           Coefficient  Standard Error  t-value  t-prob
Intercept  0.010        0.003           3.10     0.002
Inflation  0.736        0.1578          4.66     0.000
R² = 0.11, F(1,183) = 21.8 [0.00]*, DW = 2.7, N = 182
Growth and GDP deflator

Table: Growth rate of income on GDP Deflator of UK

           Coefficient  Standard Error  t-value  t-prob
Intercept  0.028        0.004           6.95     0.000
Deflator   -0.00014     6.923e-005      -2.04    0.043
R² = 0.022, F(1,183) = 4.2 [0.04]*, DW = 2.7, N = 183

Looking at these last two tables there is evidence for an aggregate supply function for the UK, though there is slight evidence of negative autocorrelation.
Growth and deficit in UK

A table of results summarising the above calculations:

Table: Growth on net borrowing

               Coefficient  Standard Error  t-value
Intercept      3.283        0.783           4.191
Net borrowing  0.349        0.133           2.613
R² = 0.406, F = 6.147, N = 12.
Growth and deficit in UK

Theoretical values of t are given in a t-table. Columns of the t-table give the level of significance (α) and rows give the degrees of freedom. Here t_{α,df} is the t-table value for degrees of freedom df = n − k and significance level α; df = 12 − 2 = 10.

Table: Relevant t-values (one tail) from the t-table

(df, α)  0.05   0.025   0.005
1        6.314  12.706  63.657
2        2.920  4.303   9.925
10       1.812  2.228   3.169

t_{β̂₁} = 4.191 > t_{α,df} = t_{0.05,10} = 1.812, so the intercept is statistically significant; t_{β̂₂} = |2.613| > t_{0.05,10} = 1.812, so the slope is also statistically significant at the 5% and 2.5% levels of significance.
Growth and deficit in UK

F is the ratio of two χ²-distributed variables with degrees of freedom n₁ and n₂:

F_calc = [∑ŷ_i²/(K − 1)] / [∑ê_i²/(N − K)] = 22.961 / (33.629/10) = 6.15 (482)

Table: Relevant F-values from the F-table

1% level of significance:
(n₂, n₁)  1      2       3
1         4052   4999.5  5403
2         98.50  99.00   99.17
10        10.04  7.56    6.55

5% level of significance:
(n₂, n₁)  1      2       3
1         161.4  199.5   215.7
2         18.51  19.00   19.16
10        4.96   4.10    3.71

n₁ = degrees of freedom of the numerator; n₂ = degrees of freedom of the denominator. For the 5% level of significance F_{n₁,n₂} = F_{1,10} = 4.96 and F_calc > F_{1,10}; for the 1% level of significance F_{1,10} = 10.04 and F_calc < F_{1,10}. This implies that the model is not statistically significant at the 1% level but is significant at the 5% level. The model is meaningful.
Growth and deficit in UK

Testing autocorrelation:

d = ∑_{t=2}^T (e_t − e_{t−1})² / ∑_{t=1}^T e_t² = 58.473/33.629 = 1.74 (483)

ρ̂ = ∑_{t=2}^T e_t e_{t−1} / ∑_{t=1}^T e_t² = 1.2832/33.629 = 0.0381 (484)
Durbin-Watson Table (part of it)

Table: Durbin-Watson tables, relevant part (5% level of significance)

     K = 2          K = 3          K = 4
n    d_L    d_U     d_L    d_U     d_L    d_U
9    0.824  1.320   0.629  1.699   0.455  2.128
10   0.879  1.320   0.697  1.641   0.525  2.016
12   0.971  1.331   0.812  1.579   0.658  1.864
Here the calculated Durbin-Watson statistic is d = 1.74. From the table, d_L(12,2) = 0.971 and d_U(12,2) = 1.331. The autocorrelation is positive because d = 1.74 < 2, but it is not statistically significant: the calculated DW value d = 1.74 is clearly out of the inconclusive region, since it does not fall in the range between d_L = 0.971 and d_U = 1.331 and instead exceeds d_U.
Testing for heteroskedasticity

One way is to regress the squared residuals ê_i² on the squared predicted values Ŷ_i². The test statistic is nR² ~ χ²_df, with df = 1 here.

ê_i² = α₀ + α₁Ŷ_i² + v_i (485)

n·R² = 6.089

Table: Table values of chi-square

(df, α)  0.10    0.05    0.01
1        2.7055  3.8415  6.6349
2        4.605   5.991   9.210
10       15.987  18.307  23.209

The null hypothesis is no heteroskedasticity. nR² = 6.089 > χ²₁ = 2.7055 ⟹ there is heteroskedasticity. The White test or ARCH and AR tests suggest there is a slight problem of heteroskedasticity in the errors of this model. However, heteroskedasticity is more serious for cross section data than for time series. Therefore the conclusions of the above model are still valid.
Transformation of the model in the presence of autocorrelation

When the autocorrelation coefficient is known:

Y_t = β₁ + β₂X_t + ε_t,  t = 1...T (486)

ε_t = ρε_{t−1} + v_t (487)

Y_t − ρY_{t−1} = β₁(1 − ρ) + β₂(X_t − ρX_{t−1}) + (ε_t − ρε_{t−1}) (488)

Y*_t = β*₁ + β₂X*_t + ε*_t (489)

Applying OLS to this transformed model, β₁ and β₂ will have BLUE properties.
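The quasi-differencing in (488) is a one-line transformation of each series. A minimal sketch (the series and ρ below are made up for illustration):

```python
def quasi_difference(z, rho):
    """z*_t = z_t - rho * z_{t-1}; drops the first observation."""
    return [z[t] - rho * z[t - 1] for t in range(1, len(z))]

y = [10.0, 12.0, 11.0, 13.0]
ystar = quasi_difference(y, 0.5)   # [7.0, 5.0, 7.5]
# apply the same transformation to X, then run OLS of ystar on xstar
```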
Transformation of the model in the presence of autocorrelation

When the autocorrelation coefficient is unknown, the method is similar to the above, except that it involves multiple iterations for estimating ρ. The steps are as follows:

1. Get estimates β̂₁ and β̂₂ from the original model; get the error terms ê_i and estimate ρ̂.
2. Transform the original model by multiplying through by ρ̂ and taking the quasi-difference.
3. Estimate β̂₁ and β̂₂ from the transformed model and get the errors of this transformed model.
4. Then again estimate ρ̂ and use that value to transform the original model as

Y_t − ρ̂Y_{t−1} = β₁(1 − ρ̂) + β₂(X_t − ρ̂X_{t−1}) + (ε_t − ρ̂ε_{t−1}) (490)

5. Continue this iteration process until ρ̂ converges.

PcGive suggests using differences in variables. The Diagnos/ACF options in OLS in Shazam will generate these iterations.
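The iterative procedure above (a Cochrane-Orcutt type loop) can be sketched in pure Python with a simple one-regressor OLS. The data are simulated with AR(1) errors; the seed, sample size and tolerance are illustrative choices, not from the lecture.

```python
import random

def ols(x, y):
    """Simple OLS of y on x; returns intercept, slope and residuals."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    b2 = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
    b1 = ybar - b2 * xbar
    e = [yi - b1 - b2 * xi for xi, yi in zip(x, y)]
    return b1, b2, e

def cochrane_orcutt(x, y, tol=1e-6, max_iter=100):
    _, b2, e = ols(x, y)
    rho = 0.0
    for _ in range(max_iter):
        new_rho = sum(e[t] * e[t - 1] for t in range(1, len(e))) / \
                  sum(et * et for et in e)
        if abs(new_rho - rho) < tol:       # rho has converged
            break
        rho = new_rho
        # quasi-difference the data and re-estimate the slope
        xs = [x[t] - rho * x[t - 1] for t in range(1, len(x))]
        ys = [y[t] - rho * y[t - 1] for t in range(1, len(y))]
        _, b2, _ = ols(xs, ys)
        # residuals of the ORIGINAL model at the new slope estimate
        b1 = sum(y) / len(y) - b2 * sum(x) / len(x)
        e = [yi - b1 - b2 * xi for xi, yi in zip(x, y)]
    return rho, b2

random.seed(1)
x = list(range(1, 41))
u, y = 0.0, []
for xi in x:
    u = 0.6 * u + random.gauss(0, 1)       # AR(1) errors with rho = 0.6
    y.append(1.0 + 2.0 * xi + u)
rho_est, slope = cochrane_orcutt(x, y)
```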
GLS to solve autocorrelation
In matrix notationβOLS =
X 0X
1 X 0Y (491)
βGLS =X 0Ω1X
1 X 0Ω1Y
(492)
Ω1 is inverse of variance covariance matrix.
Generalised Least Square

Take a regression

Y = Xβ + e (493)

The assumptions of homoskedasticity and no autocorrelation are violated:

var(ε_i) ≠ σ² for all i (494)

cov(ε_i ε_j) ≠ 0 (495)

The variance-covariance matrix of the errors is

Ω = E(ee') = [ σ₁² σ₁₂ .. σ₁ₙ ; σ₂₁ σ₂² .. σ₂ₙ ; : : : : ; σₙ₁ σₙ₂ .. σₙ² ] (496)

Diagonalise Ω with the matrix Q of its eigenvectors:

Q'ΩQ = Λ (497)
Generalised Least Square

Ω = QΛQ' = QΛ^{1/2}Λ^{1/2}Q' (498)

Define P = Λ^{−1/2}Q' (499)

PΩP' = I;  P'P = Ω⁻¹ (500)

Transform the model:

PY = PXβ + Pe (501)

Y* = X*β + e* (502)

where Y* = PY, X* = PX and e* = Pe, so that var(e*) = PΩP' = I. Then

β_GLS = (X'P'PX)⁻¹(X'P'PY) = (X'Ω⁻¹X)⁻¹X'Ω⁻¹Y (503)
Dummy Variables in a Regression Model

A dummy variable represents a qualitative aspect or characteristic in the data:
- Quality: good, bad; Location: south/north/east/west; characteristics: fat/thin or tall/short
- Time: annual, 1970s/1990s; seasonal: Summer, Autumn, Winter, Spring
- Gender: male/female; Education: GCSE/UG/PG/PhD
- Subjects: Math/English/Science/Economics
- Ethnic backgrounds: Black, White, Asian, Caucasian, European, American, Latino

Y_i = β₁ + β₂X_i + β₃D_i + ε_i,  i = 1...N (504)

ε_i ~ N(0, σ²) (505)

Here D_i is a special type of variable:

D_i = { 1 if the certain quality exists; 0 otherwise } (506)
Dummy Variables in a Regression Model

Three types of dummy:
1. Slope dummy
2. Intercept dummy
3. Interaction between slope and intercept

Examples:
- Earning differences by gender, region, ethnicity or religion, occupation, education level.
- Unemployment duration by gender, region, ethnicity or religion, occupation, education level.
- Demand for a product by weather, season, gender, region, ethnicity or religion, occupation, education level.
- Test scores by gender, previous background, ethnic origin.
- Growth rates by decades, countries, exchange rate regimes.
Dummy Variables Trap

Consider seasonal dummies:

Y_i = β₁ + β₂X_i + β₃D₁ + β₄D₂ + β₅D₃ + β₆D₄ + ε_i (507)

where

D₁ = { 1 if summer; 0 otherwise } (508)

D₂ = { 1 if autumn; 0 otherwise } (509)

D₃ = { 1 if winter; 0 otherwise } (510)

D₄ = { 1 if spring; 0 otherwise } (511)

Since ∑D_i = 1 for every observation, this causes perfect multicollinearity with the intercept:

D₁ + D₂ + D₃ + D₄ = 1 (512)

Drop one of the D_i to avoid the dummy variable trap.
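The trap can be seen in miniature: with all four seasonal dummies plus a constant, the dummy columns sum exactly to the constant column, so X'X is singular and OLS cannot be computed. The quarters below are illustrative.

```python
seasons = ["summer", "autumn", "winter", "spring"] * 2   # 8 quarters of data
# one 0/1 dummy column per season
D = {s: [1 if q == s else 0 for q in seasons] for s in set(seasons)}
const = [1] * len(seasons)                                # intercept column

# column-wise sum of the four dummies reproduces the intercept exactly
col_sum = [sum(D[s][t] for s in D) for t in range(len(seasons))]
assert col_sum == const        # exact linear dependence: D1+D2+D3+D4 = 1
# remedy: drop one dummy; the omitted season becomes the base category
```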
Dummy Variables in piecewise linear regression models

- Threshold effects in sales
- Tariff charges by volume of transaction (mobile phones)
- Panel regression: time and individual dummies
- Pay according to hierarchy in an organisation
- Profit from wholesale and retail sales
- Age-dependent earnings: scholarships for students, pensions and allowances for the elderly
- Tax allowances by level of income or business
- Investment credit by size of investment
- Prices, employment, profits or sales for small, medium and large scale corporations
- Requirements according to weight or height of body
Analysis of Structural change in deficit regime

Suppose fiscal policy regimes have changed since 1970. Regress the growth rate of output (Y_i) on net borrowing (X_i) as follows.

Expansionary fiscal policy regime from 1970 to 1990:

Y_i = β₁ + β₂X_i + e_i,  i = 1...T

Balanced fiscal policy regime from 1990 to 2009:

Y_i = γ₁ + γ₂X_i + e_i,  i = 1...T

H₀: the link between growth and deficit has remained the same, β₁ = γ₁; β₂ = γ₂.
H_A: there has been a shift in regime.
Chow Test for stability of parameters or structural change

- Use n₁ and n₂ observations to estimate the pooled and separate regressions, with (n₁+n₂−k), (n₁−k) and (n₂−k) degrees of freedom.
- Obtain the sum of squared residuals SSR₁ with n₁+n₂−k df, assuming β₁ = γ₁; β₂ = γ₂ for the whole sample (restricted estimation).
- SSR₂ (with n₁−k df): first sample.
- SSR₃ (with n₂−k df): second sample.
- SSR₄ = SSR₂ + SSR₃ (with n₁+n₂−2k df): unrestricted sum of squared errors.
- Obtain SSR₅ = SSR₁ − SSR₄.
- Do the F-test:

F = (SSR₅/k) / (SSR₄/(n₁+n₂−2k)) (513)

The advantage of this approach to the Chow test is that it does not require the construction of dummy and interaction variables.
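The arithmetic of the test is a one-liner once the three residual sums of squares are in hand. A sketch with made-up SSR values:

```python
def chow_f(ssr_pooled, ssr1, ssr2, n1, n2, k):
    """Chow F-statistic: ((SSR_restricted - SSR_unrestricted)/k)
    divided by (SSR_unrestricted/(n1+n2-2k))."""
    ssr_u = ssr1 + ssr2                 # unrestricted: separate regressions
    num = (ssr_pooled - ssr_u) / k
    den = ssr_u / (n1 + n2 - 2 * k)
    return num / den

F = chow_f(ssr_pooled=10.0, ssr1=3.0, ssr2=4.0, n1=20, n2=20, k=2)
# compare F with the F(k, n1+n2-2k) critical value
```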
Distributed Lag Model

C_t = β₀ + β₁X_t + β₂X_{t−1} + β₃X_{t−2} + β₄X_{t−3} + ... + β_kX_{t−k} + ε_t,  t = 1...T (514)

Reasons for lags:
- Psychological reasons: it takes time to believe something.
- Technological reasons: it takes time to change to new machines or to update.
- Institutional reasons: rules, regulations, notices, contracts.

The lag coefficients give the marginal effect on consumption of an increase in income at period 0.
Koyck's Model

- short run multiplier: β₁
- intermediate run multiplier: β₁ + β₂ + β₃
- long run multiplier: ∑β_i = β₁ + β₂ + β₃ + ... + β_k
- proportion of the long run impact felt at a certain period: β*_i = β_i/∑β_i

Koyck's procedure sets β₂ = λβ₁; β₃ = λ²β₁; and so on, so the model becomes

C_t = β₀ + β₁X_t + λβ₁X_{t−1} + λ²β₁X_{t−2} + λ³β₁X_{t−3} + ... + λ^k β₁X_{t−k} + ε_t (515)
Koyck's procedure

The Koyck procedure converts the distributed lag model into an autoregressive model. It involves (a) multiplying the model by λ, which lies between 0 and 1, 0 < λ < 1; (b) taking a one period lag of that; and (c) subtracting it from the original model.

λC_t = λβ₀ + λβ₁X_t + λ²β₁X_{t−1} + λ³β₁X_{t−2} + λ⁴β₁X_{t−3} + ... + λ^{k+1}β₁X_{t−k} + λε_t (516)

C_t = β₀ + β₁X_t + λβ₁X_{t−1} + λ²β₁X_{t−2} + λ³β₁X_{t−3} + ... + λ^k β₁X_{t−k} + ε_t (517)

λC_{t−1} = λβ₀ + λβ₁X_{t−1} + λ²β₁X_{t−2} + λ³β₁X_{t−3} + λ⁴β₁X_{t−4} + ... + λ^{k+1}β₁X_{t−k−1} + λε_{t−1} (518)
Koyck's procedure

Take the difference between (517) and (518). All intermediate terms cancel:

C_t − λC_{t−1} = (1 − λ)β₀ + β₁X_t − λ^{k+1}β₁X_{t−k−1} + ε_t − λε_{t−1} (519)

The term λ^{k+1}β₁X_{t−k−1} → 0 as 0 < λ < 1, so the model transforms into the autoregressive equation

C_t = (1 − λ)β₀ + β₁X_t + λC_{t−1} + u_t,  where u_t = ε_t − λε_{t−1} (520)

In the steady state C_t = C_{t−1} = C:

C = β₀ + (β₁/(1 − λ))X_t + u_t/(1 − λ) (521)

The term β₁/(1 − λ) gives the long run impact of a change in X_t on C_t.
Choice of Length of Lag in Koyck Model

- Median lag: −log 2 / log λ — 50% of the long run impact is felt over this lag.
- Mean lag: ∑_{k=0}^∞ kβ_k / ∑_{k=0}^∞ β_k — the mean of the total impact.
- Koyck mean lag: λ/(1 − λ) — the average lag length.

How to choose the lag length? Minimise the Akaike information criterion:

AIC(N) = ln[SSE_N/(T − N)] + 2(n + 2)/(T − N) (522)

or minimise the Schwarz criterion:

SC(N) = ln[SSE_N/(T − N)] + 2(n + 2) ln(T − N)/(T − N) (523)
Problems with Koyck Model

- It is very restrictive: the successive coefficients may not decline geometrically with 0 < λ < 1.
- There is no a-priori guide to the maximum length of lag; Tinbergen suggests trial and error: first regress C_t on X_t and X_{t−1}; if the coefficients are significant, keep introducing lagged terms of higher order.
- But more lags imply fewer degrees of freedom.
- Multicollinearity may appear.
- Data mining.
- The autoregressive term is correlated with the error term, so the Durbin-Watson statistic cannot be applied in this case. Need to use the Durbin h statistic, which is defined as

h = (1 − d/2) √[(T − 1)/(1 − (T − 1)·SE(β̂₂)²)] (524)
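As reconstructed here (with T − 1 in both places, following the slide), the h statistic is a small computation; the d, T and SE values below are made-up inputs.

```python
import math

def durbin_h(d, T, se_lag_coef):
    """Durbin's h statistic; valid only when (T-1)*SE^2 < 1."""
    rho = 1 - d / 2                      # rho implied by the DW statistic
    return rho * math.sqrt((T - 1) / (1 - (T - 1) * se_lag_coef ** 2))

h = durbin_h(d=1.8, T=50, se_lag_coef=0.1)
# |h| > 1.96 would reject no-autocorrelation at the 5% level
```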
Almon's polynomial lag model

The Koyck procedure is very restrictive in that the values of the coefficients decline in geometric proportion:

C_t = β₀ + β₁X_t + λβ₁X_{t−1} + λ²β₁X_{t−2} + λ³β₁X_{t−3} + ... + λ^k β₁X_{t−k} + ε_t (525)

However, the impact of economic variables may be better explained by a quadratic, cubic or higher order polynomial lag structure:

- quadratic impact structure: β_i = α₀ + α₁i + α₂i²
- cubic impact structure: β_i = α₀ + α₁i + α₂i² + α₃i³
- k-order polynomial lag structure: β_i = α₀ + α₁i + α₂i² + α₃i³ + ... + α_k i^k

Substituting into the distributed lag model:

C_t = β₀ + ∑_{i=0}^∞ (α₀ + α₁i + α₂i² + α₃i³ + ... + α_k i^k) X_{t−i} + u_t (527)
Advantages of Almon model over Koyck

a. Flexible; can incorporate a variety of lag structures.
b. Coefficients do not have to decline geometrically; Koyck had a rigid lag structure.
c. No lagged dependent variable in the regression.
d. The number of coefficients estimated is significantly smaller than in the Koyck model.
e. It is less likely to be multicollinear.

Estimates of a polynomial distributed lag model
Autoregressive Distributed Lag Model: ARDL(1,1)

Y_t = μ + β₀X_t + β₁X_{t−1} + γY_{t−1} + ε_t (528)

By repeated substitution for Y_{t−1}, this can be represented as an infinite distributed lag:

Y_t = μ(1 + γ + γ² + ...) + β₀X_t + ∑_{i=1}^∞ γ^{i−1}(β₁ + γβ₀)X_{t−i} + (ε_t + γε_{t−1} + γ²ε_{t−2} + ...) (529)

Lag weights: α₀ = β₀; α₁ = β₁ + γβ₀; α₂ = γα₁; ...; α_s = γ^{s−1}α₁.

ARDL(2,2):

Y_t = μ + β₀X_t + β₁X_{t−1} + β₂X_{t−2} + γ₁Y_{t−1} + γ₂Y_{t−2} + ε_t (530)
Main Features of Simultaneous Equation System
Single equation models have a dependent variable Y determined by an X variable or a set of X variables and the error term.
one way causation from independent variables to the dependentvariable.
However, many variables in economics are interdependent and there istwo way causation.
Consider a market model with demand and supply.
Price determines quantity and quantity determines price.
Same is true in national income determination model withconsumption and income.
Main Features of Simultaneous Equation System
Both quantities and prices ( or income and consumption) aredetermined simultaneously.
A system of equations, not a single equation, need to be estimated inorder to be able to capture this interdependency among variables.
The main features of a simultaneous equation model are:
(i) two or more dependent (endogenous) variables and a set of explanatory (exogenous) variables;
(ii) a set of equations.
Estimation is computationally cumbersome, and errors in one equation are transmitted through the whole system. There can be high non-linearity in the parameters.
Keynesian Model

C_t = β₀ + β₁Y_t + u_t (531)

Y_t = C_t + I_t (532)

Here β₀ and β₁ are structural parameters; income (Y_t) and consumption (C_t) are endogenous variables and investment (I_t) is the exogenous variable.

Table: Coefficient matrix for rank test

          constant  C_t  Y_t  I_t
Eq (531)  -β₀       1    -β₁  0
Eq (532)  0         -1   1    1
Derivation of the Reduced Form Equation

Y_t = β₀ + β₁Y_t + u_t + I_t (533)

Y_t = β₀/(1 − β₁) + (1/(1 − β₁))I_t + (1/(1 − β₁))u_t (534)

C_t = β₀ + β₁[β₀/(1 − β₁) + (1/(1 − β₁))I_t + (1/(1 − β₁))u_t] + u_t (535)

C_t = β₀/(1 − β₁) + (β₁/(1 − β₁))I_t + (1/(1 − β₁))u_t (536)
Keynesian model: Estimation of the reduced form of the model

In the income determination model the reduced form is obtained by expressing the endogenous variables C and Y in terms of I, which is the only exogenous variable in the model.

C_t = Π_{1,1} + Π_{1,2}I_t + V_{1,t} (537)

Y_t = Π_{2,1} + Π_{2,2}I_t + V_{2,t} (538)

Cons = + 3.647*Invest + 1.272e+010
(SE)    (0.017)        (4.34e-013)

GDP = + 5.014*Invest + 5.441e+010
(SE)   (0.0228)       (5.8e-013)

Π_{1,1} = β₀/(1 − β₁) = 12.72;  Π_{1,2} = β₁/(1 − β₁) = 3.65 (539)

Π_{2,1} = β₀/(1 − β₁) = 54.41;  Π_{2,2} = 1/(1 − β₁) = 5.01 (540)
Empirical Part: Exercise in PcGive

- Construct a data set of macroeconomic variables (Y, C, I, G, T, X, M, MS, i, inflation, wage rate, exchange rate, etc.)
- Save the data in *.csv format
- Start GiveWin and PcGive and open the data file
- Choose multiple equation dynamic modelling
- Determine the endogenous and exogenous variables and run the simultaneous equations using 3SLS or FIML
- Study the coefficients
- Change policy variables and construct a few scenarios
Estimation of reduced form

Table: Simultaneous equation model of UK, 1971:2-2010:2

                     Consumption equation               GDP equation
exogenous variables  Coefficient  t-value     prob      Coefficient  t-value     prob
Investment           3.64682      214.0       0.000     5.01427      220.0       0.000
Constant             12.7228      2.932e+022  0.000     54.408       9.379e+022  0.000
Vector Portmanteau(12): 836.726
Vector Normality test: Chi^2(4) = 5.6050 [0.2307]

MOD(2): estimating the model by FIML (using macro_2010.csv). The estimation sample is: 1971(2) to 2010(2).
Retrieval of the structural parameters

β₁ = Π_{1,2}/Π_{2,2} = [β₁/(1 − β₁)] / [1/(1 − β₁)] = 3.64682/5.01427 = 0.727 (541)

β₀ = Π_{1,1}(1 − β₁) = 12.7228 (1 − 0.727) = 3.47 (542)

Estimated system:

Ĉ_t = 3.47 + 0.727Ŷ_t (543)

Ŷ_t = Ĉ_t + I_t (544)

This seems a very plausible result. Its validity is tested by the Vector Portmanteau, Vector Normality and Vector hetero tests.
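The retrieval in (541)-(542) is pure arithmetic on the estimated reduced-form coefficients; a two-line check using the slide's own numbers:

```python
# reduced-form estimates from the slide
pi_11, pi_12, pi_22 = 12.7228, 3.64682, 5.01427

beta1 = pi_12 / pi_22                    # = [b1/(1-b1)] / [1/(1-b1)] = b1
beta0 = pi_11 * (1 - beta1)              # intercept of the consumption function
print(round(beta1, 3), round(beta0, 2))  # 0.727 3.47
```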
Techniques of estimation of simultaneous equation models
Single-equation methods:
  Recursive OLS
  Ordinary Least Squares
  Indirect Least Squares
  Two-Stage Least Squares
System methods:
  Generalised Least Squares
  Seemingly Unrelated Regression Equations
Rank and Order Conditions for Identification

Order condition:

K − k ≥ m − 1  (545)

Rank condition:

ρ(A) = M − 1, where A is a square matrix of order (M − 1) × (M − 1)  (546)

M = number of endogenous variables in the model
K = number of exogenous variables in the model, including the intercept
m = number of endogenous variables in a given equation
k = number of exogenous variables in a given equation

The rank condition is defined by the rank of the matrix A, which should have dimension (M − 1), where M is the number of endogenous variables in the model.
Determining the Rank of the Matrix
Table: Coefficient matrix for rank test

constant   Y_t   I_t
0          1     1
The rank of a matrix is the order of its largest non-singular square submatrix.
The rank matrix is formed from the coefficients of the variables (both endogenous and exogenous) excluded from that particular equation but included in the other equations of the model.
The rank condition tells us whether the equation under consideration is identified or not.
The order condition tells us whether it is exactly identified or overidentified.
Steps for Rank Condition
Write down the system in tabular form.
Strike out the row of coefficients corresponding to the equation to be identified.
Strike out the columns corresponding to the nonzero coefficients in step 2.
The entries left in the table give the coefficients of the variables included in the system but not in the equation under consideration. From these coefficients form all possible A matrices of order M − 1 and obtain the corresponding determinants. If at least one of these determinants is non-zero, the equation is identified.
Summary of Order and Rank Conditions of Identification

If (K − k) > (m − 1) and the rank ρ(A) equals M − 1, the equation is overidentified.
If (K − k) = (m − 1) and the rank ρ(A) equals M − 1, the equation is exactly identified.
If (K − k) > (m − 1) and the rank ρ(A) is less than M − 1, the equation is underidentified.
If (K − k) < (m − 1), the structural equation is unidentified; the rank ρ(A) is less than M − 1 in this case.
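The order-condition bookkeeping above is easy to mechanise; a small sketch (the rank condition still has to be verified separately on the coefficient matrix A):

```python
# Classify an equation by the order condition: compare the number of
# excluded exogenous variables (K - k) with m - 1.
def order_condition(K, k, m):
    """K, k: exogenous vars in model/equation; m: endogenous vars in equation."""
    if K - k > m - 1:
        return "overidentified"
    if K - k == m - 1:
        return "exactly identified"
    return "underidentified"

# Keynesian income model: K = 2 (constant, I); the consumption function
# C = b0 + b1*Y has k = 1 and m = 2 endogenous variables (C and Y).
print(order_condition(K=2, k=1, m=2))  # exactly identified
```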
Keynesian Model: Simultaneity Bias
β̂_1 = Σc_t y_t / Σy_t² = Σ(C_t − C̄) y_t / Σy_t² = ΣC_t y_t / Σy_t²  (547)

β̂_1 = ΣC_t y_t / Σy_t² = Σ(β_0 + β_1 Y_t + u_t) y_t / Σy_t²  (548)

cov(Y, u) = E[(Y_t − E(Y_t)) (u_t − E(u_t))] = E[(u_t/(1 − β_1)) u_t] = σ_u²/(1 − β_1)  (549)

plim β̂_1 = β_1 + plim [(Σu_t y_t)/T] / [(Σy_t²)/T] = β_1 + [σ_u²/(1 − β_1)] / σ_y²  (550)
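The inconsistency in (550) is easy to see by simulation; a sketch with made-up values β_0 = 10, β_1 = 0.7, σ_u = 5 and exogenous investment:

```python
import numpy as np

# Monte Carlo illustration of simultaneity bias: OLS of C on Y in the
# Keynesian model C = b0 + b1*Y + u, Y = C + I overstates b1.
rng = np.random.default_rng(0)
b0, b1, T, reps = 10.0, 0.7, 200, 500
slopes = []
for _ in range(reps):
    I = rng.normal(50.0, 10.0, size=T)   # exogenous investment
    u = rng.normal(0.0, 5.0, size=T)
    Y = (b0 + I + u) / (1.0 - b1)        # reduced form for income
    C = b0 + b1 * Y + u
    y, c = Y - Y.mean(), C - C.mean()
    slopes.append((c @ y) / (y @ y))     # OLS slope of C on Y
print(np.mean(slopes))  # around 0.76: biased above the true 0.7
```

With these values the plim in (550) works out to roughly 0.7 + (25/0.3)/1389 ≈ 0.76, which the simulation reproduces.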
Identification issue in a Market Model
Consider a relation between quantity and price
Q_t = α_0 + α_1 P_t + u_t  (551)
A priori it is impossible to say whether this is a demand or a supply model; both contain the same variables.
If we estimate such a regression, how can we be sure whether the parameters belong to a demand or a supply model?
We need extra information. Economic theory suggests that demand is related to the income of individuals, while supply may respond to costs or weather conditions.
Market Model
Q^d_t = α_0 + α_1 P_t + α_2 I_t + u_{1,t}  (552)

Q^s_t = β_0 + β_1 P_t + β_2 P_{t−1} + u_{2,t}  (553)

In equilibrium quantity demanded equals quantity supplied, Q^d_t = Q^s_t:

α_0 + α_1 P_t + α_2 I_t + u_{1,t} = β_0 + β_1 P_t + β_2 P_{t−1} + u_{2,t}  (554)

Solve for P_t:

α_1 P_t − β_1 P_t = β_0 − α_0 + β_2 P_{t−1} − α_2 I_t + u_{2,t} − u_{1,t}  (555)

P_t = (β_0 − α_0)/(α_1 − β_1) − [α_2/(α_1 − β_1)] I_t + [β_2/(α_1 − β_1)] P_{t−1} + (u_{2,t} − u_{1,t})/(α_1 − β_1)  (556)
Reduced Form of the Market Model
Using this price to solve for quantity:

Q_t = α_0 + α_1 P_t + α_2 I_t + u_{1,t}
    = α_0 + α_1 [(β_0 − α_0)/(α_1 − β_1) − (α_2/(α_1 − β_1)) I_t + (β_2/(α_1 − β_1)) P_{t−1} + (u_{2,t} − u_{1,t})/(α_1 − β_1)] + α_2 I_t + u_{1,t}

Q_t = (α_1 β_0 − α_0 β_1)/(α_1 − β_1) − [α_2 β_1/(α_1 − β_1)] I_t + [α_1 β_2/(α_1 − β_1)] P_{t−1} + (α_1 u_{2,t} − β_1 u_{1,t})/(α_1 − β_1)  (557)

P_t = Π_{1,0} + Π_{1,1} P_{t−1} + Π_{1,2} I_t + V_{1,t}  (558)

Q_t = Π_{2,0} + Π_{2,1} P_{t−1} + Π_{2,2} I_t + V_{2,t}  (559)
Market Model: Reduced form coe¢ cients
Π1,0 =β0 α0α1 β1
Π1,1 =α2
α1 β1Π1,2 =
β2α1 β1
V1,t =u2,t u1,t
α1 β1
Π2,0 =α1β0 α0β1
α1 β1Π2,1 =
α2β1α1 β1
Π2,2 =α1β2
α1 β1;(560)
V1,t =u2,t u1,t
α1 β1;V2,t =
α1u2,t β1u1,tα1 β1
(561)
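The mapping from structural to reduced-form coefficients in (560)-(561) can be evaluated directly; a sketch with illustrative (made-up) structural values:

```python
# Reduced-form coefficients of the market model from structural parameters.
a0, a1, a2 = 100.0, -2.0, 0.5    # demand: Q = a0 + a1*P + a2*I + u1
b0, b1, b2 = 10.0, 3.0, 0.8      # supply: Q = b0 + b1*P + b2*P(-1) + u2

d = a1 - b1                       # common denominator (a1 - b1)
pi_10 = (b0 - a0) / d             # price equation intercept
pi_11 = b2 / d                    # price on lagged price
pi_12 = -a2 / d                   # price on income
pi_20 = (a1 * b0 - a0 * b1) / d   # quantity equation intercept
pi_21 = a1 * b2 / d               # quantity on lagged price
pi_22 = -a2 * b1 / d              # quantity on income

print(pi_10, pi_12, pi_20)  # 18.0 0.1 64.0
```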
Recursive estimation
Y_{1,t} = β_{10} + γ_{11} X_{1,t} + γ_{12} X_{2,t} + e_{1,t}  (562)

Y_{2,t} = β_{20} + β_{21} Y_{1,t} + γ_{21} X_{1,t} + γ_{22} X_{2,t} + e_{2,t}  (563)

Y_{3,t} = β_{30} + β_{31} Y_{1,t} + β_{32} Y_{2,t} + γ_{31} X_{1,t} + γ_{32} X_{2,t} + e_{3,t}  (564)

Apply OLS to (562) and obtain the predicted value Ŷ_{1,t}. Then use Ŷ_{1,t} in equation (563) and apply OLS to obtain the predicted value Ŷ_{2,t}. Finally, use the predicted values Ŷ_{1,t} and Ŷ_{2,t} to estimate equation (564).
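A sketch of equation-by-equation estimation on simulated data (the parameter values are made up). One caveat: when an equation already contains all the exogenous variables, the fitted Ŷ₁ is an exact linear combination of them, so the second regression here uses the actual Y₁, which OLS permits in a recursive system with uncorrelated errors:

```python
import numpy as np

# Equation-by-equation OLS for a recursive system on simulated data.
rng = np.random.default_rng(1)
T = 500
X1, X2 = rng.normal(size=T), rng.normal(size=T)
Y1 = 1.0 + 0.5 * X1 - 0.3 * X2 + rng.normal(size=T)             # eq. (562)
Y2 = 2.0 + 0.8 * Y1 + 0.2 * X1 - 0.4 * X2 + rng.normal(size=T)  # eq. (563)

def ols(y, cols):
    """OLS with an intercept; returns the coefficient vector."""
    X = np.column_stack([np.ones(len(y))] + cols)
    return np.linalg.lstsq(X, y, rcond=None)[0]

beta1 = ols(Y1, [X1, X2])        # first equation
beta2 = ols(Y2, [Y1, X1, X2])    # second equation, Y1 predetermined
print(beta2)  # roughly [2.0, 0.8, 0.2, -0.4]
```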
Normal equations with instrumental variables
Y_t = α_0 + α_1 X_t + α_2 Y_{t−1} + u_t  (565)

Use X_{t−1} as an instrument for Y_{t−1}.

Normal equations for the two-explanatory-variable case:

ΣY_t = α̂_0 N + α̂_1 ΣX_t + α̂_2 ΣY_{t−1}  (566)

ΣX_t Y_t = α̂_0 ΣX_t + α̂_1 ΣX_t² + α̂_2 ΣX_t Y_{t−1}  (567)

ΣX_{t−1} Y_t = α̂_0 ΣX_{t−1} + α̂_1 ΣX_t X_{t−1} + α̂_2 ΣX_{t−1} Y_{t−1}  (568)
Normal equations with instrumental variables
These differ from the normal equations obtained when no instruments are used:
ΣY_t = α̂_0 N + α̂_1 ΣX_t + α̂_2 ΣY_{t−1}  (569)

ΣX_t Y_t = α̂_0 ΣX_t + α̂_1 ΣX_t² + α̂_2 ΣX_t Y_{t−1}  (570)

ΣY_{t−1} Y_t = α̂_0 ΣY_{t−1} + α̂_1 ΣX_t Y_{t−1} + α̂_2 ΣY_{t−1}²  (571)
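In matrix form the instrumental-variable normal equations are Z'W α̂ = Z'Y, with regressors W = [1, X_t, Y_{t−1}] and instruments Z = [1, X_t, X_{t−1}]; a sketch on simulated data (the parameter values 1.0, 0.5, 0.3 are assumptions):

```python
import numpy as np

# Solve the IV normal equations Z'W a = Z'Y for the model
# Y_t = a0 + a1*X_t + a2*Y_{t-1} + u_t, with X_{t-1} as instrument.
rng = np.random.default_rng(2)
T = 500
x = rng.normal(size=T + 1)
y = np.zeros(T + 1)
for t in range(1, T + 1):
    y[t] = 1.0 + 0.5 * x[t] + 0.3 * y[t - 1] + rng.normal(scale=0.5)

Y, X, Ylag, Xlag = y[1:], x[1:], y[:-1], x[:-1]
W = np.column_stack([np.ones(T), X, Ylag])   # regressors
Z = np.column_stack([np.ones(T), X, Xlag])   # instruments
a_iv = np.linalg.solve(Z.T @ W, Z.T @ Y)
print(a_iv)  # roughly [1.0, 0.5, 0.3]
```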
The Sargan test (SARG) is used to check the validity of instruments

Divide the variables into those uncorrelated with the error term, X_1, X_2, ..., X_p, and those correlated with it, Z_1, Z_2, ..., Z_s. Use instruments W_1, W_2, ..., W_s for the latter.
Obtain the residuals û_t from the original regression. Replace Z_1, Z_2, ..., Z_s by the instruments W_1, W_2, ..., W_s. Regress û_t on all the X's and W's and obtain the R² of this regression. Compute the SARG statistic, SARG = (n − k)R², where n is the number of observations and k is the number of coefficients; SARG follows a χ² distribution with df = s − p.
H0: the instruments W are valid. If the computed SARG exceeds the χ² critical value, H0 is rejected and at least one instrument is not valid.
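A sketch of the SARG computation on simulated data, where the two instruments are valid by construction (all names and parameter values are illustrative assumptions):

```python
import numpy as np

# Sargan statistic: regress the IV residuals on all exogenous variables
# and instruments, then SARG = (n - k) * R^2.
rng = np.random.default_rng(3)
n = 400
x1 = rng.normal(size=n)                            # exogenous regressor
w1, w2 = rng.normal(size=n), rng.normal(size=n)    # instruments
v = rng.normal(size=n)
z = 0.6 * w1 + 0.4 * w2 + v                        # endogenous regressor
y = 1.0 + 0.5 * x1 + 0.8 * z + rng.normal(size=n) + 0.5 * v

# 2SLS estimates and residuals
Zm = np.column_stack([np.ones(n), x1, w1, w2])
Xm = np.column_stack([np.ones(n), x1, z])
Xhat = Zm @ np.linalg.lstsq(Zm, Xm, rcond=None)[0]
beta = np.linalg.lstsq(Xhat, y, rcond=None)[0]
u_hat = y - Xm @ beta
k = Xm.shape[1]

# auxiliary regression of residuals on exogenous variables and instruments
fit = Zm @ np.linalg.lstsq(Zm, u_hat, rcond=None)[0]
r2 = 1.0 - ((u_hat - fit) ** 2).sum() / ((u_hat - u_hat.mean()) ** 2).sum()
sarg = (n - k) * r2
print(sarg)  # compare with the chi-square critical value, df = 2 - 1 = 1
```

With valid instruments the auxiliary R² should be close to zero, so SARG stays below the critical value.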
Two Stage Least Square Estimation (2SLS)
Consider a hybrid of the Keynesian and classical models in which income Y_{1,t} is a function of money Y_{2,t}, investment X_{1,t} and government spending X_{2,t}:

Y_{1,t} = β_{1,0} + β_{11} Y_{2,t} + γ_{11} X_{1,t} + γ_{12} X_{2,t} + e_{1,t}  (572)

Y_{2,t} = β_{2,0} + β_{21} Y_{1,t} + e_{2,t}  (573)

First regress Y_{1,t} on all the exogenous variables:

Y_{1,t} = Π̂_{1,0} + Π̂_{1,1} X_{1,t} + Π̂_{1,2} X_{2,t} + ê_{1,t}  (574)

Obtain the predicted Ŷ_{1,t}:

Ŷ_{1,t} = Π̂_{1,0} + Π̂_{1,1} X_{1,t} + Π̂_{1,2} X_{2,t}  (575)
Two Stage Least Square Estimation (2SLS)
Y_{1,t} = Ŷ_{1,t} + ê_{1,t}  (576)

In the second stage substitute this into the money supply equation Y_{2,t} = β_{2,0} + β_{21} Y_{1,t} + e_{2,t}:

Y_{2,t} = β_{2,0} + β_{21} (Ŷ_{1,t} + ê_{1,t}) + e_{2,t}  (577)

Y_{2,t} = β_{2,0} + β_{21} Ŷ_{1,t} + β_{21} ê_{1,t} + e_{2,t}  (578)

Y_{2,t} = β_{2,0} + β_{21} Ŷ_{1,t} + e*_{2,t}  (579)

where

e*_{2,t} = β_{21} ê_{1,t} + e_{2,t}  (580)

Application of OLS to this equation gives consistent estimators.
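The two stages can be carried out by hand on simulated data; a sketch (the structural values are assumptions, and the second-stage standard errors would need correcting in practice):

```python
import numpy as np

# 2SLS for the hybrid model: Y1 = b10 + b11*Y2 + g11*X1 + g12*X2 + e1,
# Y2 = b20 + b21*Y1 + e2, estimated stage by stage.
rng = np.random.default_rng(4)
T = 1000
b10, b11, g11, g12 = 1.0, 0.5, 0.7, 0.3
b20, b21 = 2.0, 0.4
X1, X2 = rng.normal(size=T), rng.normal(size=T)
e1, e2 = rng.normal(size=T), rng.normal(size=T)
d = 1.0 - b11 * b21                           # simultaneity denominator
Y1 = (b10 + b11 * b20 + g11 * X1 + g12 * X2 + e1 + b11 * e2) / d
Y2 = b20 + b21 * Y1 + e2

# stage 1: regress Y1 on the exogenous variables, keep the fitted values
Zx = np.column_stack([np.ones(T), X1, X2])
Y1_hat = Zx @ np.linalg.lstsq(Zx, Y1, rcond=None)[0]

# stage 2: regress Y2 on the fitted Y1
W = np.column_stack([np.ones(T), Y1_hat])
b_2sls = np.linalg.lstsq(W, Y2, rcond=None)[0]
print(b_2sls)  # roughly [2.0, 0.4]
```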
Restricted Least Square Estimation
Restrictions in Multiple Regression: Restricted Least Squares Estimation (Judge, Hill, Griffiths, Lütkepohl and Lee (1988): 236). The OLS procedure minimises the sum of squared errors:

Min_β S(β) = e'e = (Y − Xβ)'(Y − Xβ)
           = Y'Y − Y'Xβ − β'X'Y + β'X'Xβ  (581)
           = Y'Y − 2β'X'Y + β'X'Xβ  (582)

∂S(β)/∂β = −2X'Y + 2X'Xβ̂ = 0  ⟹  β̂ = (X'X)⁻¹X'Y  (583)

Imposing a restriction Rβ = r involves constrained optimisation with a Lagrange multiplier:

L = e'e + 2λ'(r − Rβ)
  = (Y − Xβ)'(Y − Xβ) + 2λ'(r − Rβ)
  = Y'Y − 2β'X'Y + β'X'Xβ + 2λ'(r − Rβ)  (584)
Restricted Least Square Estimation
Partial differentiation of this constrained minimisation (Lagrangian) function with respect to β and λ yields

∂L/∂β = −2X'Y + 2X'Xb − 2R'λ = 0  (585)

∂L/∂λ = 2(r − Rb) = 0  (586)

X'Xb = X'Y + R'λ  (587)

b = (X'X)⁻¹X'Y + (X'X)⁻¹R'λ  (588)

b = β̂ + (X'X)⁻¹R'λ  (589)
Restricted Least Square Estimation
This is the restricted least squares estimator, but it still needs to be solved for λ. To do so, multiply both sides of the above equation by R:

Rb = Rβ̂ + R(X'X)⁻¹R'λ  (590)

λ = [R(X'X)⁻¹R']⁻¹ [Rb − Rβ̂]  (591)

λ = [R(X'X)⁻¹R']⁻¹ [r − Rβ̂]  (592)

b = β̂ + (X'X)⁻¹R'λ = β̂ + (X'X)⁻¹R' [R(X'X)⁻¹R']⁻¹ [r − Rβ̂]  (593)

Thus the restricted least squares estimator is a linear function of the discrepancy in the restriction, [r − Rβ̂].
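Equation (593) translates directly into code; a sketch with a single made-up restriction β₁ + β₂ = 1 on simulated data:

```python
import numpy as np

# Restricted least squares:
# b = bhat + (X'X)^-1 R' [R (X'X)^-1 R']^-1 (r - R bhat).
rng = np.random.default_rng(5)
n = 200
X = np.column_stack([np.ones(n), rng.normal(size=n), rng.normal(size=n)])
beta_true = np.array([0.5, 0.6, 0.4])         # satisfies beta1 + beta2 = 1
y = X @ beta_true + rng.normal(size=n)

XtX_inv = np.linalg.inv(X.T @ X)
bhat = XtX_inv @ X.T @ y                      # unrestricted OLS, eq. (583)
R = np.array([[0.0, 1.0, 1.0]])               # restriction R beta = r
r = np.array([1.0])
mid = np.linalg.inv(R @ XtX_inv @ R.T)
b = bhat + XtX_inv @ R.T @ mid @ (r - R @ bhat)

print(R @ b)  # the restriction holds exactly: [1.]
```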
Restricted Least Square Estimation
E(b) = E(β̂) + (X'X)⁻¹R' [R(X'X)⁻¹R']⁻¹ [r − R E(β̂)]  (594)

When the restriction is true, r = Rβ, so E(b) = E(β̂) = β  (595)
For the variance we need to use the property of an idempotent matrix, AA = A, such as

A = [ 0.4  0.8 ]
    [ 0.3  0.6 ]  (596)
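A quick check that the example matrix in (596) is indeed idempotent:

```python
import numpy as np

# Verify AA = A for the idempotent example matrix.
A = np.array([[0.4, 0.8],
              [0.3, 0.6]])
print(np.allclose(A @ A, A))  # True
```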
Recall that in the unrestricted case β̂ = (X'X)⁻¹X'Y = β + (X'X)⁻¹X'e. Then

b − β = (X'X)⁻¹X'e + (X'X)⁻¹R' [R(X'X)⁻¹R']⁻¹ [r − Rβ − R(X'X)⁻¹X'e]  (597)
Restricted Least Square Estimation
Since r − Rβ = 0 under the restriction,

b − β = M (X'X)⁻¹X'e  (598)

where M is the idempotent matrix

M = I − (X'X)⁻¹R' [R(X'X)⁻¹R']⁻¹ R  (599)
Restricted Least Square Estimation
The variance-covariance matrix of b is

cov(b) = E[(b − β)(b − β)'] = E[M(X'X)⁻¹X'ee'X(X'X)⁻¹M']  (600)

cov(b) = σ²M(X'X)⁻¹M'  (601)

and, using the idempotency of M,

cov(b) = σ²M(X'X)⁻¹  (602)

cov(b) = σ²[(X'X)⁻¹ − (X'X)⁻¹R' [R(X'X)⁻¹R']⁻¹ R(X'X)⁻¹]  (603)

Thus the variance of the restricted least squares estimator is smaller than the variance of the unrestricted least squares estimator.