ECOMET2 Lecture Notes

Notes in ECOMET2

1. Review of Classical Linear Regression Model

Drivers: independent variable (right hand side)

$Y_i = \beta_1 + \beta_2 X_{2i} + \beta_3 X_{3i} + \dots + \beta_k X_{ki} + u_i$

$Y_i$ → real world variable; $u_i$ → stochastic or random variable

Under the condition of uncertainty, we have to expect errors.

X's → exogenous, fixed variables

i = 1, 2, 3, ..., n

$\beta_1$ = intercept; value of Y when all Xs are zero

In matrix form,

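$Y_{(n \times 1)} = X_{(n \times k)} \beta_{(k \times 1)} + u_{(n \times 1)}$

Objective:

1. To find $\hat{\beta}$ (problem of estimation)

2. To perform inferences on $\hat{\beta}$ (problem of inference)

$E(Y_i) = \beta_1 + \sum_{j=2}^{k} \beta_j X_{ji}$ → PRF

$\hat{Y}_i = \hat{\beta}_1 + \sum_{j=2}^{k} \hat{\beta}_j X_{ji}$ → SRF

$\beta_j$ → marginal contribution of $X_j$ to Y, ceteris paribus.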

i. Do (statistical test) individual test

ii. Joint (overall) test

iii. Goodness of fit

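i. $H_0: \beta_j = 0$ vs. $H_1: \beta_j \; (<, >, \neq) \; 0$

j = 1, 2, 3, ..., k

Alternative hypothesis = operational statement of our research agenda

Basis of decision: p-value → true probability of incorrect rejection; p < 0.05 → reject $H_0$

ii. $H_0: \beta_2 = \beta_3 = \dots = \beta_k = 0$ → the X's have nothing to do with Y

iii. Confidence interval: $P[\hat{\beta}_j - t_{\alpha/2}\, se(\hat{\beta}_j) \le \beta_j \le \hat{\beta}_j + t_{\alpha/2}\, se(\hat{\beta}_j)] = 1 - \alpha$; j = 1, 2, ..., k

iv. $R^2$: proportion of the variation in Y collectively explained by the X's

$\bar{R}^2$ → adjusted $R^2$: remedy for loss of degrees of freedom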

Degrees of freedom (df): number of observations left after estimation

Assumptions:

1) u is multivariate normal (MVN)

2) $E(u) = 0$ → vanishing error expectation

3) $E(uu') = \sigma^2 I$ (off-diagonal elements are zero) → no autocorrelation

4) $var(u_i) = \sigma^2$ → constant variances → homoscedasticity

5) $E(u_i X_{ji}) = 0$ → exogeneity assumption on the X's

6) X has full column rank (k < n); the X's should not be correlated (non-multicollinearity)

If assumptions 1-6 are valid, then by the Gauss-Markov Theorem OLS is BLUE

→ Unbiased, consistent, sufficient, and efficient
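The estimation and goodness-of-fit quantities reviewed above can be sketched numerically. This is a minimal illustration, not part of the original notes: the simulated data, the seed, and the values in `beta_true` are assumptions made only for the example. It computes $\hat{\beta} = (X'X)^{-1}X'y$, the t statistics, $R^2$ and adjusted $R^2$ with df = n - k.

```python
import numpy as np

rng = np.random.default_rng(0)
n, k = 200, 3                      # k counts the intercept, as in the notes
X = np.column_stack([np.ones(n), rng.normal(size=(n, k - 1))])
beta_true = np.array([1.0, 2.0, -0.5])   # hypothetical parameter values
y = X @ beta_true + rng.normal(size=n)

# OLS estimator: beta_hat = (X'X)^{-1} X'y, computed through a linear solve
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)

resid = y - X @ beta_hat
df = n - k                          # degrees of freedom left after estimation
sigma2_hat = resid @ resid / df
se = np.sqrt(np.diag(sigma2_hat * np.linalg.inv(X.T @ X)))
t_stats = beta_hat / se             # individual tests of H0: beta_j = 0

tss = np.sum((y - y.mean()) ** 2)
r2 = 1 - resid @ resid / tss
adj_r2 = 1 - (1 - r2) * (n - 1) / df   # penalizes the loss of degrees of freedom
```

The adjusted $R^2$ can never exceed $R^2$, which is one quick sanity check on the output.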

Limited Dependent Variable Models

CLRM requires Y to be quantitative (continuous)

→ measured on an interval or ratio scale

• Binary Response Models (Y is dummy)

→ Differential intercept: dummy on the right-hand side; binary response: dummy on the left-hand side

o LPM (Linear probability model)

o Logit model

o Probit model

• Multinomial Response Model (Y is multinomial)

→ 3 or more response categories

o Multinomial Logit model

o Multinomial Probit model

→ Nominal scale of measurement

• Ordered Response Model (Y is ordinal)

o Ordered Logit model

o Ordered Probit model

• Censored Response Model

→ (selectivity bias)

o Tobit model

o Heckit model

• Discrete Dependent Variable Models

o Count models

- Poisson model

- Negative binomial model

o Duration models

Binary Response Models

$Y_i = \beta_1 + \sum_{j=2}^{k} \beta_j X_{ji} + u_i$

$Y_i = 0$ → failure; $Y_i = 1$ → success; i = 1, 2, 3, ..., n

$P_i = \Pr[Y_i = 1 \mid X_{2i}, X_{3i}, \dots, X_{ki}]$ → the probability that success is attained

• LPM: OLS even though Y is a dummy

Consequences: $\hat{P}_i = \hat{Y}_i$ (predicted probability)

ANCOVA → not all variables on the right-hand side are dummies

ANOVA → all variables on the right-hand side are dummies

• $u_i$ is not normal

Bernoulli → only 1 trial; violates normality

• $E(Y_i \mid X_i) = P_i$ → the conditional mean of Y is the success probability (this relies on $E(u_i) = 0$)

• $var(u_i) = P_i(1 - P_i)$ → heteroscedastic

• $R^2$ is not a reliable measure of goodness of fit

• $\hat{P}_i$ is not sure to fall within [0, 1]

$Z_i = \beta_1 + \sum_{j=2}^{k} \beta_j X_{ji}$ [index function]

Score of the ith observation (utility score)

*The bigger the z, the more probable the success

*The smaller the z, the more probable the failure

*new graph

$P_i = \int_{-\infty}^{Z_i} f(t)\,dt$, where f(t) is the link function

As z becomes smaller, p also becomes smaller

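Logistic Model - cumulative distribution function of the logistic distribution

Logistic link (probability density function): $f(t) = \frac{e^{-t}}{(1 + e^{-t})^2}$; $|t| < \infty$ ⇒ results in the logit model

Standard normal link ($t \sim N(0, 1)$): $f(t) = \frac{1}{\sqrt{2\pi}} e^{-t^2/2}$; $|t| < \infty$

$P_i = \int_{-\infty}^{Z_i} \frac{e^{-t}}{(1 + e^{-t})^2}\,dt = -\int \frac{du}{u^2}$

Let $u = 1 + e^{-t}$, so $du = -e^{-t}\,dt$. Then $P_i = \frac{1}{1 + e^{-Z_i}}$; $1 - P_i = \frac{1}{e^{Z_i} + 1}$; $\frac{P_i}{1 - P_i} = e^{Z_i}$

⇒ $P_i = \frac{e^{Z_i}}{e^{Z_i} + 1}$

⇒ $\ln\left(\frac{P_i}{1 - P_i}\right) = \beta_1 + \sum_{j=2}^{k} \beta_j X_{ji} + u_i$; $L_i$ = logit (natural logarithm of the odds ratio [ratio of success probability to failure probability])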

Odds ratio also measures the probability that Y=1 relative to the probability that Y=0

Estimation using Maximum Likelihood Estimation

- estimating the betas (parameters) in such a way that the likelihood of the sample (joint probability) is maximized

1. get the 1st derivative of the likelihood function and set it to zero

2. get the 2nd derivative to confirm a maximum: the Hessian needs to be negative definite

$\hat{L}_i = \hat{\beta}_1 + \sum_{j=2}^{k} \hat{\beta}_j X_{ji}$ – sample regression function; $\hat{Z}_i = \hat{L}_i$ by maximum likelihood estimation

*test of joint hypothesis using F-statistic
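The two MLE steps (first derivative set to zero, negative definite Hessian) can be illustrated with a small Newton-Raphson routine for the logit model. This is a sketch under stated assumptions: `fit_logit`, the simulated sample and `beta_true` are hypothetical and only serve the illustration; the notes do not prescribe a particular optimizer.

```python
import numpy as np

def fit_logit(X, y, iters=25):
    """Newton-Raphson on the logit log-likelihood; X includes a column of ones."""
    beta = np.zeros(X.shape[1])
    for _ in range(iters):
        p = 1.0 / (1.0 + np.exp(-(X @ beta)))
        score = X.T @ (y - p)                            # first derivative (FOC: score = 0)
        hessian = -(X * (p * (1 - p))[:, None]).T @ X    # second derivative
        beta = beta - np.linalg.solve(hessian, score)    # Newton step
    return beta, score, hessian

rng = np.random.default_rng(1)
n = 2000
X = np.column_stack([np.ones(n), rng.normal(size=n)])
beta_true = np.array([-0.5, 1.0])                        # hypothetical values
p_true = 1.0 / (1.0 + np.exp(-(X @ beta_true)))
y = (rng.uniform(size=n) < p_true).astype(float)

beta_hat, score, hessian = fit_logit(X, y)
```

At convergence the score is numerically zero and all eigenvalues of the Hessian are negative, which is the second-order condition stated above.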

Binary Response Models

Y = 0, 1 [failure, success]

$\hat{Z}_i = \hat{\beta}_1 + \sum_{j=2}^{k} \hat{\beta}_j X_{ji}$ [index function]

$P_i = \Pr[Y_i = 1 \mid Z_i]$ - probability that success is attained

LPM (OLS): Y is dummy

Logit (logistic link)

Probit (standard normal link)

*The link function is a probability distribution function that maps the index from $(-\infty, \infty)$ into [0, 1]

Logistic pdf for the logit model: $P_i = \int_{-\infty}^{Z_i} f(t)\,dt$

$f(t) = \frac{e^{-t}}{(1 + e^{-t})^2}$, where $-\infty < t < \infty$

$P_i = \frac{e^{Z_i}}{1 + e^{Z_i}}$ (Logit Model)

$\ln\left(\frac{P_i}{1 - P_i}\right) = \hat{\beta}_1 + \sum_{j=2}^{k} \hat{\beta}_j X_{ji}$

The logit of the individual is a linear function of the individual's characteristics

→ Solved via MLE (there's already an estimation)

$\hat{L}_i = \hat{\beta}_1 + \sum_{j=2}^{k} \hat{\beta}_j X_{ji} = \hat{Z}_i$

The estimated logit is a linear function of the individual's characteristics

Used in forecasting: if $\hat{P}_i \geq 0.5$ → success → $\hat{Y}_i = 1$; if $\hat{P}_i < 0.5$ → failure → $\hat{Y}_i = 0$

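$\frac{\partial P_i}{\partial X_{ji}}$ = marginal contribution of $X_{ji}$ to $P_i$, ceteris paribus [(k − 1)n of them]

We will be able to determine the role of each success factor for the ith individual by implicit differentiation of $\ln\left(\frac{P_i}{1 - P_i}\right) = \hat{\beta}_1 + \sum_{j=2}^{k} \hat{\beta}_j X_{ji}$ with respect to $X_{ji}$:

$\frac{1 - P_i}{P_i} \cdot \frac{(1 - P_i)\frac{\partial P_i}{\partial X_{ji}} - P_i \frac{\partial P_i}{\partial X_{ji}}(-1)}{(1 - P_i)^2} = \beta_j$

$\frac{\partial P_i}{\partial X_{ji}}\left[(1 - P_i) + P_i\right] = \beta_j P_i (1 - P_i)$ ⇒ $\frac{\partial P_i}{\partial X_{ji}} = \beta_j P_i (1 - P_i)$; j = 2, 3, ..., k; i = 1, 2, 3, ..., n → marginal effects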
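The logit marginal effect $\beta_j P_i(1 - P_i)$ can be checked against a numerical derivative. The coefficient vector `beta` and the individual `x` below are hypothetical values chosen for illustration only.

```python
import numpy as np

def logit_prob(z):
    """Logistic CDF: P = 1 / (1 + exp(-z))."""
    return 1.0 / (1.0 + np.exp(-z))

beta = np.array([-1.0, 0.8, 0.3])   # hypothetical (beta_1, beta_2, beta_3)
x = np.array([1.0, 0.5, 2.0])       # one individual; leading 1 is the intercept term

p = logit_prob(x @ beta)
analytic = beta[1:] * p * (1 - p)   # dP/dX_j = beta_j * P * (1 - P), j = 2, 3

# central finite differences as an independent check
h = 1e-6
numeric = []
for j in (1, 2):
    step = np.zeros(3)
    step[j] = h
    numeric.append((logit_prob((x + step) @ beta) - logit_prob((x - step) @ beta)) / (2 * h))
numeric = np.array(numeric)
```

Because the effect depends on $P_i$, it differs across individuals, which is why there are (k − 1)n marginal effects rather than k − 1.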

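Probit Model - standard normal cumulative distribution function

Link function: $f(t) = \frac{1}{\sqrt{2\pi}} e^{-t^2/2}$; $-\infty < t < \infty$

The ↑ the score, the ↑ the probability of success

*The slope gives the change in probability given a unit change in X (the slope is not constant).

Loans approved not depending on default

$P_i = \Phi\left(\beta_1 + \sum_{j=2}^{k} \beta_j X_{ji}\right)$; probit model, where $P_i$ is the area under the normal curve

Use MLE to estimate the betas

$\hat{Z}_i = \hat{\beta}_1 + \sum_{j=2}^{k} \hat{\beta}_j X_{ji}$

$\hat{P}_i = \Phi(\hat{Z}_i)$; i = 1, 2, ..., n

Marginal effects: $\frac{\partial P_i}{\partial X_{ji}} = \frac{\partial P_i}{\partial Z_i} \cdot \frac{\partial Z_i}{\partial X_{ji}} = \frac{1}{\sqrt{2\pi}} e^{-Z_i^2/2}\, \beta_j$

$\frac{\partial \hat{P}_i}{\partial X_{ji}} = f(\hat{Z}_i)\, \hat{\beta}_j$

Censored Regression Model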

Tobit Model (James Tobin)

Model:

$Y_i = \beta_1 + \sum_{j=2}^{k} \beta_j X_{ji} + u_i$; i = 1, 2, 3, ..., n

$n = n_1 + n_2$, where: $n_1$ → observations on both Y and the X's; $n_2$ → observations only on the X's (censored sample)

E.g. the amount spent on a car by the ith family is linearly dependent on the socioeconomic status of the family ($X_{ji}$ = jth socioeconomic variable)

→ OLS will result in biased and inconsistent estimates

Bias = sample selectivity bias; the solution is the procedure developed by James Tobin via sampling from a censored normal distribution.

*wage of workers = censored variable

- major substitute for OLS

Given: $(Y_i, X_{2i}, \dots, X_{ki})$; i = 1, 2, ..., n

L = likelihood function, built from the marginal density function of the Y and X's with respect to Y.

FOC: $\frac{\partial L}{\partial \hat{\beta}_j} = 0$; j = 1, 2, ..., k

$\hat{\beta}_j$ = f(sample); j = 1, 2, ..., k

SOC: $\frac{\partial^2 L}{\partial \hat{\beta}_j^2} < 0$; Hessian

Heckit Model: James Heckman

- alternative to the Tobit model in addressing the selectivity bias of a censored sample

Two-stage process:

1. Selection stage: implementation of the probit model

$Z_i = 0, 1$ (no, yes)

The index function of the probit model gives $\hat{Z}_i$ for each i.

2. Consumption stage: $Y_i = \beta_1 + \sum_{j=2}^{k} \beta_j X_{ji} + \lambda_i$, where $\lambda_i = \frac{\phi(\hat{Z}_i)}{\Phi(\hat{Z}_i)}$ = (height of the standard normal curve) / (area under the normal curve)

$\frac{\phi(\hat{Z}_i)}{\Phi(\hat{Z}_i)}$ = inverse Mills ratio
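The inverse Mills ratio used in the second stage is just the standard normal density divided by the standard normal CDF, both evaluated at the first-stage index. A minimal sketch using only the standard library (the function names are hypothetical):

```python
import math

def std_normal_pdf(z):
    """Height of the standard normal curve at z."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def std_normal_cdf(z):
    """Area under the standard normal curve up to z."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def inverse_mills(z):
    """lambda = phi(z) / Phi(z)."""
    return std_normal_pdf(z) / std_normal_cdf(z)

lam_at_zero = inverse_mills(0.0)    # equals sqrt(2 / pi)
```

The ratio falls as the index rises: observations that are very likely to be selected need only a small correction.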

Truncated Regression Models

No observation on Y = no observation on the X's; truncated → effectively taken out of the selection process.

Element of a random selection= inference

MLE via Sample Selection from Truncated Normal

Multinomial Logit Model

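$Y_i = j$; j = 1, 2, 3, ..., m

$P_i(j) = \Pr[Y_i = j \mid X_{2i}, X_{3i}, \dots, X_{ki}]$; j = 1, 2, ..., m − 1 (to avoid the dummy variable trap)

- Link between the X's and a probability between 0 and 1.

Link: $Z_i(j) = \beta_1^{(j)} + \sum_{l=2}^{k} \beta_l^{(j)} X_{li}$; the link function is the logistic distribution

Link function for the MLM: $P_i(j) = \frac{e^{Z_i(j)}}{1 + \sum_{h=1}^{m-1} e^{Z_i(h)}}$, where j = 1, 2, ..., m − 1

$P_i(m) = \frac{1}{1 + \sum_{h=1}^{m-1} e^{Z_i(h)}}$; base category

Relative risk ratio (RRR) = $\frac{P_i(j)}{P_i(m)} = e^{Z_i(j)} = \exp\left(\beta_1^{(j)} + \sum_{l=2}^{k} \beta_l^{(j)} X_{li}\right)$

- The RRR gives the probability of being in the jth category relative to the reference/base category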
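The multinomial logit probabilities and the relative risk ratio can be computed directly from the index functions. This is an illustrative sketch: `mnl_probs`, the coefficient matrix `B` and the regressor vector `x` are hypothetical, with m = 3 categories and the last category as the base.

```python
import numpy as np

def mnl_probs(x, B):
    """x: (k,) regressors with a leading 1; B: (m-1, k) coefficients.
    The base category m has index Z = 0, so its numerator is 1."""
    z = B @ x
    denom = 1.0 + np.exp(z).sum()
    return np.append(np.exp(z) / denom, 1.0 / denom)   # last entry = base category

B = np.array([[0.2, 0.5],
              [-0.3, 1.0]])       # hypothetical coefficients for categories 1 and 2
x = np.array([1.0, 0.4])

p = mnl_probs(x, B)
rrr = p[:-1] / p[-1]              # P(j) / P(m) for j = 1, 2
```

The probabilities sum to one, and each RRR reduces to the exponential of that category's index, as in the formula above.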

Panel Data Econometric Models

This represents real-world phenomena (data in two dimensions)

Panel data → combination of cross-section and time-series data.

$Y_{it}$ = t-th time-series observation for the ith individual on the Y variable

i → space dimension indicating microunits

t → time dimension: period in which Y is observed (behaves like a time series)

e.g. effect of education on income with data across individuals

i = 1, 2, 3, ..., n

t = 1, 2, 3, ..., T

unbalanced panel = missing observations

Advantages of Using panel data

1. It allows us to account for unobserved heterogeneity (or effects) of cross-section or time-series observational units which, when neglected, will result in OVB (omitted variable bias)

2. More information, more variability, more degrees of freedom, less collinearity and more efficiency

3. It allows us to model the dynamics of change (structural change)

4. It provides a basis for modelling more sophisticated phenomena which pure CS or TS models cannot handle

5. It mitigates aggregation bias

Panel Data Models

1. Fixed Effects Models

Assumption: effects are fixed parameters to be estimated

a. Naive model: all parameters are fixed (time and space invariant)

$Y_{it} = \beta_1 + \sum_{j=2}^{k} \beta_j X_{jit} + u_{it}$; i = 1, 2, ..., n and t = 1, 2, ..., T

df = nT − k

b. LSDV Models

i) Model 1: intercept varies across microunits but is time invariant; slopes are fixed

$Y_{it} = \beta_{1i} + \sum_{j=2}^{k} \beta_j X_{jit} + u_{it}$

$\beta_{1i} = \beta_1 + \delta_i$; $\delta_i$ is the animal spirit (unobserved heterogeneity)

ii) Model 2: intercept varies over time and slopes are fixed

$Y_{it} = \beta_{1t} + \sum_{j=2}^{k} \beta_j X_{jit} + u_{it}$

$\beta_{1t} = \beta_1 + \delta_t$; $\delta_t$ is the animal spirit of the time period ("shock")

iii) Model 3: intercept varies over time and space; slopes are fixed

$Y_{it} = \beta_{1it} + \sum_{j=2}^{k} \beta_j X_{jit} + u_{it}$

Where: $\beta_{1it} = \beta_1 + \delta_i + \delta_t$ - for every time period there is one $\delta_t$

iv) Model 4: all parameters are time invariant; the slopes and intercepts vary across individual microunits.

$Y_{it} = \beta_{1i} + \sum_{j=2}^{k} \beta_{ji} X_{jit} + u_{it}$
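LSDV Model 1 can be sketched with simulated data: unit dummies absorb the individual effects $\delta_i$, while the naive pooled model ignores them and suffers omitted variable bias when X is correlated with the effects. Everything below (sample sizes, `delta`, the slope of 2.0, the seed) is an assumption made for the illustration.

```python
import numpy as np

rng = np.random.default_rng(2)
n, T = 5, 40
unit = np.repeat(np.arange(n), T)                   # microunit index for each row
delta = np.array([0.0, 1.0, -1.0, 2.0, 0.5])        # hypothetical individual effects
x = rng.normal(size=n * T) + delta[unit]            # X is correlated with the effects
y = 1.0 + delta[unit] + 2.0 * x + rng.normal(scale=0.5, size=n * T)

# LSDV: intercept + (n - 1) unit dummies + X (one dummy dropped to avoid the dummy trap)
D = (unit[:, None] == np.arange(1, n)[None, :]).astype(float)
X_lsdv = np.column_stack([np.ones(n * T), D, x])
b_lsdv = np.linalg.lstsq(X_lsdv, y, rcond=None)[0]

# Naive pooled model: one common intercept, effects ignored
X_pool = np.column_stack([np.ones(n * T), x])
b_pool = np.linalg.lstsq(X_pool, y, rcond=None)[0]

slope_lsdv, slope_pool = b_lsdv[-1], b_pool[-1]
```

In this setup the pooled slope is pushed away from 2.0 because the omitted effects are positively correlated with X, while the LSDV slope stays close to it.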

Simultaneous Equations Model (SEM)

• Economic models for econometric data determined by 2 or more economic relationships

• multi-equation model

- 1 equation per sector

- 1 sector will be represented by 1 endogenous variable

• M sectors (or M endogenous variables)

• K predetermined variables (exogenous + lagged endogenous)

$Y_i = f(X, \text{lagged } Y, \text{other } Y) + u$

CLRM: $Y_i \leftarrow X_{2i}, X_{3i}, \dots, X_{ki}$; i = 1, 2, 3, ..., n

SEM: $Y_i \leftarrow$ X's, lagged Y, other Y's; i = 1, 2, 3, ..., n

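Notations: (time based) → (t = 1, 2, 3, ..., T) time series

1. $Y_i$ → ith endogenous variable

(M endogenous variables in the SEM)

2. predetermined variables (X)

2 types:

1. truly exogenous variables

2. lagged Y's

K → predetermined variables

$B Y_t + \Gamma X_t = u_t$

B → M × M; $Y_t$ → M × 1; Γ → M × K; $X_t$ → K × 1; $u_t$ → M × 1; B, Γ = structural parameters

$\begin{bmatrix} 1 & \cdots & \beta_{1M} \\ \vdots & 1 & \vdots \\ \beta_{M1} & \cdots & 1 \end{bmatrix} \begin{bmatrix} Y_{1t} \\ \vdots \\ Y_{Mt} \end{bmatrix} + \begin{bmatrix} \gamma_{11} & \cdots & \gamma_{1K} \\ \vdots & \ddots & \vdots \\ \gamma_{M1} & \cdots & \gamma_{MK} \end{bmatrix} \begin{bmatrix} X_{1t} \\ \vdots \\ X_{Kt} \end{bmatrix} = \begin{bmatrix} u_{1t} \\ \vdots \\ u_{Mt} \end{bmatrix}$

for i = 1:

$Y_{1t} + \beta_{12} Y_{2t} + \dots + \beta_{1M} Y_{Mt} + \gamma_{11} X_{1t} + \gamma_{12} X_{2t} + \dots + \gamma_{1K} X_{Kt} = u_{1t}$

Simultaneous bias (SB)

- use of OLS when the RHS variables include Y's

- OLS is biased and inconsistent

$B Y_t + \Gamma X_t = u_t$

$Y_t$ → vector of endogenous variables; M → # of endogenous variables in the SEM; $X_t$ → vector of predetermined variables; K → # of predetermined variables

Simultaneous bias → OLS is inconsistent in the presence of endogenous variables at RHS

Reduced form model (RFM): $Y_t = \Pi X_t + v_t$; $\Pi = -B^{-1}\Gamma$; $v_t = B^{-1} u_t$

OLS is BLUE for the RFM: $\hat{\pi}_{ij}$ estimates $\pi_{ij}$ for i = 1, 2, ..., M; j = 1, 2, ..., K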

Problem of identification

per equation:

"How can we recover the β's and γ's from the π's?" → states of identifiability of an equation

o exactly identified (unique solution)

o over identified (multiple solutions)

o unidentified (no solution)

Conditions of identifiability

- order condition (necessary)

- rank condition (necessary and sufficient)

Order condition

"An equation is identifiable if the number of variables excluded from it is at least M − 1, i.e., one less than the number of endogenous variables in the SEM"

Notationally,

M - # of endogenous variables in the SEM

K - # of predetermined variables in the SEM

m - # of endogenous variables in the equation

k - # of predetermined variables in the equation

(M + K) − (m + k) ≥ M − 1

K − k ≥ m − 1

> over

= exact

Rank condition

"An equation is identifiable if and only if there exists at least one nonzero subdeterminant of order (M − 1) × (M − 1) formed by the coefficients of the variables excluded from the equation being evaluated."

L. Klein Closed Economy Basic Model

Equations (structural model):

1. $C_t = a_1 + b_1 Y_t + c_1 T_t + d_1 R_t + u_{1t}$ (consumption function)

2. $I_t = a_2 + b_2 Y_t + c_2 R_t + u_{2t}$ (investment function)

3. $Y_t = C_t + I_t + G_t$ (identity)

4. $M_t = a_4 + b_4 Y_t + c_4 R_t + d_4 P_t + u_{4t}$ (liquidity preference)

5. $Y_t = a_5 + b_5 N_t + u_{5t}$ (production function)

6. $N_t = a_6 + b_6 W_t + c_6 P_t + u_{6t}$ (labor demand function)

7. $N_t = a_7 + b_7 W_t + c_7 P_t + u_{7t}$ (labor supply function)

Standard form:

$B Y_t + \Gamma X_t = u_t$ → $Y_t = \Pi X_t + v_t$

Identification issue: how are we going to retrieve the γ's and β's from the π's?

Maddala System

Included = 1

Excluded = 0

Eqn#  C  I  N  P  R  Y  W  G  T  M  K-k  m-1  Status
1     1  0  0  0  1  1  0  0  1  0   2    2   Exact
2     0  1  0  0  1  1  0  0  0  0   3    2   Over
3     1  1  0  0  0  1  0  1  0  0   2    2   Exact
4     0  0  0  1  1  1  0  0  0  1   2    2   Exact
5     0  0  1  0  0  1  0  0  0  0   3    1   Over
6     0  0  1  1  0  0  1  0  0  0   3    2   Over
7     0  0  1  1  0  0  1  0  0  0   3    2   Over

K → number of predetermined variables (here G, T and M, so K = 3)

k → number of included predetermined variables

m → number of included endogenous variables
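The order condition is mechanical enough to script. The sketch below recomputes the Status column from the inclusion pattern of each equation, treating C, I, N, P, R, Y, W as endogenous and G, T, M as predetermined. The function name `order_condition` and the dictionary layout are illustrative choices, not part of the notes.

```python
def order_condition(K, k, m):
    """Classify an equation by comparing K - k (excluded predetermined) with m - 1."""
    excluded, needed = K - k, m - 1
    if excluded > needed:
        return "over"
    if excluded == needed:
        return "exact"
    return "unidentified"

endog = ["C", "I", "N", "P", "R", "Y", "W"]
predet = ["G", "T", "M"]

# variables included in each equation (the 1s in the table above)
included = {
    1: ["C", "R", "Y", "T"],
    2: ["I", "R", "Y"],
    3: ["C", "I", "Y", "G"],
    4: ["P", "R", "Y", "M"],
    5: ["N", "Y"],
    6: ["N", "P", "W"],
    7: ["N", "P", "W"],
}

status = {
    eq: order_condition(
        K=len(predet),
        k=sum(v in predet for v in vs),
        m=sum(v in endog for v in vs),
    )
    for eq, vs in included.items()
}
```

Since the order condition is only necessary, an equation that passes here still has to be checked with the rank condition.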

• Identity: no coefficient; no problem

• Status should be exact

• All identification equations are almost exact

• An equation is identified if and only if there exists at least one non-singular (M − 1) by (M − 1) sub-matrix formed out of the coefficients, in the other equations, of the variables excluded from that equation.

Rank Condition

Determinant not equal to zero = non-singular

It is singular if there is a row or column that is filled with all zeros.

Via Laplace expansion

For equation 1 (rows: equations 2 to 7; columns: the excluded variables I, N, P, W, G, M):

$\begin{bmatrix} 1 & 0 & 0 & 0 & 0 & 0 \\ 1 & 0 & 0 & 0 & 1 & 0 \\ 0 & 0 & 1 & 0 & 0 & 1 \\ 0 & 1 & 0 & 0 & 0 & 0 \\ 0 & 1 & 1 & 1 & 0 & 0 \\ 0 & 1 & 1 & 1 & 0 & 0 \end{bmatrix}$

Hausman test for:

• Simultaneity

• Exogeneity

Hausman test for Simultaneity

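$Y_{1t} = \beta_1 + \beta_2 Y_{2t} + \beta_3 X_{1t} + u_{1t}$

Step 1: Get the reduced form residuals of $Y_2$

$Y_{2t} = \alpha_1 + \alpha_2 X_{1t} + v_{2t}$; $\hat{v}_{2t}$ = RF residual of $Y_2$

Step 2: Augment the original model by $\hat{v}_{2t}$

$Y_{1t} = \beta_1 + \beta_2 Y_{2t} + \beta_3 X_{1t} + \delta_1 \hat{v}_{2t} + e_t$

If $\hat{\delta}_1$ is significant

→ Reject $H_0: \delta_1 = 0$ (simultaneity is present)

Otherwise, OLS can still be used despite the $Y_{2t}$ (endogenous) variable at RHS.

Hausman test for Exogeneity

Step 1: Get the fitted value ($\hat{Y}$) of the endogenous variable at the right-hand side, $\hat{Y}_{2t}$, and augment the original model:

$Y_{1t} = \beta_1 + \beta_2 Y_{2t} + \beta_3 X_{1t} + \delta_1 \hat{Y}_{2t} + e_t$ (use a t-test on $\delta_1$)

If there are many restrictions → Wald's test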

Simultaneous Equation Methods

- at this point the equations in the system are identified

2 approaches to solving the structural SEM

1. Full information (aka system approach)

- estimating all parameters in the SEM in one fell swoop

- all equations solved together simultaneously

- risky proposition (solving the entire system) - unpopular

- any specification error in one equation of the system will be transmitted to the entire system (contagious)

o OVB

o Wrong functional form

o Non-normality of the error, etc.

- Only do this if you have very high confidence in your model

The techniques include:

1. full information maximum likelihood (FIML)

sometimes you cannot formulate the likelihood function because of some peculiarities due to transmission errors

2. 3SLS (3-stage least squares)

3-dimensional model (L W H)

rarely used

3. seemingly unrelated regression (SURE)

errors of the equations are contemporaneously correlated

multiple-equation solving technique

4. joint generalized least squares (JGLS)

SEM version of GLS

2. Limited information (aka single equation approach)

a. Solve one equation after the other

b. Not susceptible to errors in another equation

c. Specification error is confined to one equation only

Techniques used:

1. OLS

a. When systems are recursive

b. OLS is BLUE

c. Maligned to be avoided

Whenever B is triangular

$B = \begin{bmatrix} 1 & 0 & 0 \\ \beta_{21} & 1 & 0 \\ \beta_{31} & \beta_{32} & 1 \end{bmatrix}$ → matrix of endogenous variable coefficients

E.g.

$Y_{1t} = a_1 + e_1 X_{1t} + f_1 X_{2t} + u_{1t}$ (zero coefficients on the other Y's)

$Y_{2t} = a_2 + b_2 Y_{1t} + e_2 X_{1t} + f_2 X_{2t} + u_{2t}$

$Y_{3t} = a_3 + b_3 Y_{1t} + c_3 Y_{2t} + e_3 X_{1t} + f_3 X_{2t} + u_{3t}$

Recursive (causal chain) structure: $Y_{1t}$ depends only on the X's and $u_{1t}$; $Y_{2t}$ on $Y_{1t}$, the X's and $u_{2t}$; $Y_{3t}$ on $Y_{1t}$, $Y_{2t}$, the X's and $u_{3t}$.

2. ILS (indirect least squares)

a. Applicable to exactly identified equations

b. Mathematical solution exploits the one-to-one correspondence between the π's and the β's

3. Limited information maximum likelihood (LIML)

a. Single equation counterpart of FIML

4. 2SLS

a. best thing that happened to SEM

b. Henri Theil, Robert Basmann

Stage 1: $\hat{Y}$ is determined using the reduced form; equivalent at RHS.

Stage 2: the structural equation is estimated with $\hat{Y}$ from the first stage proxying for the Y at RHS.
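The two stages of 2SLS, and the simultaneity bias of OLS, can be seen in a small simulation. The two-equation system below is hypothetical (its coefficients, sample size and seed are assumptions for the illustration); the first equation excludes `x2`, so it is exactly identified by the order condition.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 5000
x1, x2 = rng.normal(size=n), rng.normal(size=n)     # predetermined variables
u1, u2 = rng.normal(size=n), rng.normal(size=n)     # structural errors

# Hypothetical structural system:
#   y1 = 1 + 0.5*y2 + 1.0*x1 + u1
#   y2 = 2 + 0.8*y1 + 1.0*x2 + u2
# Solving the two equations jointly gives the data actually observed:
y1 = (2 + x1 + 0.5 * x2 + u1 + 0.5 * u2) / 0.6
y2 = 2 + 0.8 * y1 + x2 + u2

def ols(X, y):
    return np.linalg.lstsq(X, y, rcond=None)[0]

ones = np.ones(n)

# OLS on the first structural equation (y2 is endogenous at RHS)
b_ols = ols(np.column_stack([ones, y2, x1]), y1)

# Stage 1: regress y2 on all predetermined variables (reduced form) and keep the fit
Z = np.column_stack([ones, x1, x2])
y2_hat = Z @ ols(Z, y2)

# Stage 2: re-estimate the structural equation with y2_hat proxying for y2
b_2sls = ols(np.column_stack([ones, y2_hat, x1]), y1)
```

Here `b_2sls[1]` should land near the true 0.5, while `b_ols[1]` is pulled upward because `y2` is correlated with `u1`.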

Dynamic Econometric models

Models concerned with the consequences of economic actions over time

→ Static: $Y_t = \beta_1 + \sum_{j=2}^{k} \beta_j X_{jt} + u_t$

→ Dynamic: lapse of time before impact is felt

e.g. Y → target variable

X → proxy variable

When consequences are rarely instantaneous

1. DL(p) → distributed lag model

$Y_t = \alpha + \beta_0 X_t + \beta_1 X_{t-1} + \dots + \beta_p X_{t-p} + u_t$

α → endowment (autonomous Y); $\beta_0$ → impact multiplier; $\beta_i$ → intermediate multiplier, i = 1, 2, 3, ..., p

$Y_t = \alpha + \sum_{i=0}^{p} \beta_i X_{t-i} + u_t$

$\beta_i^*$ = interim multiplier = $\sum_{j=0}^{i} \beta_j$

$\beta_p^* = \beta$ = long-run multiplier (total effect)

2. AR(q) → autoregressive model

$Y_t = \delta + \sum_{j=1}^{q} \alpha_j Y_{t-j} + u_t$

$Y_t = \delta + \alpha_1 Y_{t-1} + \alpha_2 Y_{t-2} + \dots + \alpha_q Y_{t-q} + u_t$

What you are today is a function of what you were before

3. ARDL(q, p)

$Y_t = \delta + \sum_{i=0}^{p} \beta_i X_{t-i} + \sum_{j=1}^{q} \alpha_j Y_{t-j} + u_t$

What you are today is influenced by what you were before plus how the authorities molded you to be how you are today

Focus: DL(p) model

→ Finite DL model (p is finite)

→ Infinite DL model (p → ∞)

$Y_t = \alpha + \sum_{i=0}^{p} \beta_i X_{t-i} + u_t$ → DL(p)

p → finite

→ OLS: there are no endogenous variables at RHS (estimation method)

• Alt-Tinbergen Method

o Sequential OLS

o Bottom-up

o Starts with a simple regression and builds up to a more complicated model

• Hendry Top-down Method

o Starts with a big model

o AIC (choose the model p* with the smallest AIC)

• Almon Model

→ Koyck model

→ Adaptive Expectation

→ Rational Expectation

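Infinite DL Models

$Y_t = \alpha + \sum_{i=0}^{p} \beta_i X_{t-i} + u_t$ → DL(p)

→ Finite DL model (p is finite)

→ Infinite DL model (p → ∞)

$\beta_i$ → intermediate multiplier; i = 0, 1, 2, 3, ..., p

$Y_t = \alpha + \beta_0 X_t + \beta_1 X_{t-1} + \dots + \beta_i X_{t-i} + \dots + u_t$

Koyck Model

$\beta_i = \beta_0 \lambda^i$; i = 0, 1, 2, 3, ...

λ → rate of decay (0 < λ < 1)

(1) $Y_t = \alpha + \beta_0 X_t + \beta_0 \lambda X_{t-1} + \beta_0 \lambda^2 X_{t-2} + \dots + \beta_0 \lambda^i X_{t-i} + \dots + u_t$ → infinite distributed lag model

Lag (1) by 1 period, multiply the result by λ, and subtract the outcome from (1):

$Y_t - \lambda Y_{t-1} = \alpha(1 - \lambda) + \beta_0 X_t + u_t - \lambda u_{t-1}$ (the intercept $\alpha(1 - \lambda)$ needs adjustment)

$Y_t = \alpha(1 - \lambda) + \beta_0 X_t + \lambda Y_{t-1} + u_t - \lambda u_{t-1}$

$Y_t = \alpha^* + \beta_0 X_t + \lambda Y_{t-1} + v_t$ → ARDL(1, 0)

Three issues:

• Autocorrelation

• Simultaneity bias

• Non-linear in parameter

SEM → static/contemporaneous

$Y_{t-1}$ → is endogenous in the dynamic model

$\alpha(1 - \lambda)$ → non-linear in parameter

The Durbin-Watson test cannot be used because of the presence of the lagged dependent variable $Y_{t-1}$ on the RHS.

We cannot use OLS.

Estimation of $\alpha^*$, $\beta_0$ and λ: OLS is out! Use 2 tests for autocorrelation

Durbin h-test → presence/absence of autocorrelation

$h = \hat{\rho} \sqrt{\frac{T}{1 - T\, var(\hat{\lambda})}}$; $\hat{\rho} = 1 - \frac{d}{2}$; d = DW statistic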

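Instrumental Variable / Proxy Method

→ Liviatan Method

• $Y_{t-1}$ has to be proxied by $X_{t-1}$

SRF: $\hat{Y}_t = \hat{\alpha}^* + \hat{\beta}_0 X_t + \hat{\lambda} Y_{t-1}$

$RSS = \sum_{t=1}^{T} \left(Y_t - \hat{\alpha}^* - \hat{\beta}_0 X_t - \hat{\lambda} Y_{t-1}\right)^2 = \sum_{t=1}^{T} (Y_t - \hat{Y}_t)^2$

FOC: $\frac{\partial RSS}{\partial \hat{\alpha}^*} = \frac{\partial RSS}{\partial \hat{\beta}_0} = \frac{\partial RSS}{\partial \hat{\lambda}} = 0$ → normal equations

$\hat{\beta}_i = \hat{\beta}_0 \hat{\lambda}^i$

$\beta_i^*$ = interim multiplier → cumulative effect: $\beta_i^* = \sum_{j=0}^{i} \hat{\beta}_j$

$\beta_2^*$ = sum of $\beta_0 + \beta_1 + \beta_2$ (cumulative)

$\beta_p^* = \beta$ = LR multiplier → total effect; p → ∞

$\beta = \sum_{j=0}^{\infty} \hat{\beta}_j$ → cumulative effect

$\beta = \sum_{j=0}^{\infty} \hat{\beta}_j = \frac{\hat{\beta}_0}{1 - \hat{\lambda}} = \lim_{p \to \infty} \sum_{j=0}^{p} \hat{\beta}_j$ → infinite series function

Median lag → amount of time (lapse of time) within which 50% of the total effect would be manifested

Mean lag → amount of time in which, on the average, the total effect would be perceptible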
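The multipliers and lag summaries of the Koyck scheme follow directly from $\beta_0$ and λ. The sketch below uses hypothetical estimates (`beta0 = 0.4`, `lam = 0.6`) and the standard closed forms for geometric lag weights: long-run multiplier $\beta_0 / (1 - \lambda)$, mean lag $\lambda / (1 - \lambda)$, and median lag $-\ln 2 / \ln \lambda$.

```python
import math

beta0, lam = 0.4, 0.6        # hypothetical estimates of beta_0 and lambda

def interim_multiplier(i):
    """Cumulative effect after i periods: sum_{j=0}^{i} beta0 * lam**j."""
    return sum(beta0 * lam ** j for j in range(i + 1))

long_run = beta0 / (1 - lam)                 # limit of the interim multipliers
mean_lag = lam / (1 - lam)                   # weighted average lag under geometric weights
median_lag = -math.log(2) / math.log(lam)    # time needed to reach 50% of the total effect
```

With these values roughly half of the total effect is felt within about 1.36 periods, and the interim multipliers converge to the long-run multiplier of 1.0.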

Koyck Model

$Y_t = \alpha + \sum_{i=0}^{\infty} \beta_i X_{t-i} + u_t$

With $\beta_i = \beta_0 \lambda^i$; i = 0, 1, 2, 3, ...

→ ARDL(1, 0): $Y_t = \alpha^* + \beta_0 X_t + \lambda Y_{t-1} + v_t$

Problems:

1. $\alpha^* = \alpha(1 - \lambda)$

2. $v_t = u_t - \lambda u_{t-1}$ → MA(1), so $v_t$ is autocorrelated

3. $Y_{t-1}$ → endogenous

OLS is out because we cannot get BLUE due to these problems.

Estimation:

1. IV method of Liviatan

• $X_{t-1}$ proxies for $Y_{t-1}$

2. 2SLS

For autocorrelation: use Durbin h

→ Use large T for problem (1)

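ALMON MODEL

$Y_t = \alpha + \sum_{i=0}^{p} \beta_i X_{t-i} + u_t$

p = finite (Hendry top-down)

$\beta_i = a_0 + \sum_{j=1}^{m} a_j i^j$

- mth degree polynomial

Jan Kmenta

- provided the se's to the SRF

$\hat{\beta}_i = \hat{a}_0 + \sum_{j=1}^{m} \hat{a}_j i^j$

$var(\hat{\beta}_i) = \sum_{j=0}^{m} i^{2j}\, var(\hat{a}_j) + 2 \sum_{j < s} i^{j+s}\, cov(\hat{a}_j, \hat{a}_s)$; j ≠ s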

Time series Econometrics

• Basic Concepts

o Application of econometrics to TS data

o Data on variables captured and recorded at regular intervals of time (e.g. annual, quarterly, monthly, etc.)

o Historical data - frequently/massively available

Challenges in using TS data in research

• Autocorrelation

• Spurious regression (nonsensical)

• Random walk phenomenon

o For tomorrow's stock market price, the best prediction is today's closing price → random walk forecasting

• ARCH effect (Autoregressive Conditional Heteroscedasticity) → conditional volatility in the stock market

• Forecasting

• Non-stationarity of most variables

Stochastic process (SP) → collection of random variables ordered in time

e.g. $\{Y_t\}$, t = 1, 2, 3, ..., T

→ $\{Y_1, Y_2, \dots, Y_T\}$

DGP (Data Generating Process)

→ Unknown mechanism that generates realizations of a SP

Realization → historical data (TS data)

$\{Y_t\} \to \{y_t\}$

If $\{Y_t\}$ is a weakly stationary process, its first 2 moments are said to be time-invariant.

i.e. $E(Y_t) = \mu$

$var(Y_t) = E(Y_t - \mu)^2 = \sigma^2$

$cov(Y_t, Y_{t-k}) = E[(Y_t - \mu)(Y_{t-k} - \mu)] = \gamma_k$

If strongly stationary, time-invariant in all moments

• White noise → most basic of all stationary SPs; building block of all TS models, $u_t$

• Random walk → most basic of all non-stationary stochastic processes, $Y_t = Y_{t-1} + u_t$

$Y_t - Y_{t-1} = u_t$ (transformed variable)

$\Delta Y_t$ (differenced random walk) = $u_t$ (white noise)

Δ = differencing operator; Δ = 1 − L

L = lag operator with the property $L^k Y_t = Y_{t-k}$

$\Delta Y_t = u_t = (1 - L) Y_t = Y_t - L Y_t$

$\Delta^2 Y_t = \Delta \Delta Y_t = \Delta(Y_t - Y_{t-1}) = Y_t - Y_{t-1} - (Y_{t-1} - Y_{t-2}) = Y_t - 2Y_{t-1} + Y_{t-2} = (1 - L)^2 Y_t$

Unit roots → number of times a SP, say $\{Y_t\}$, has to be differenced to make it stationary

i.e. if $Y_t \sim I(d)$, or $Y_t$ is integrated of order d,

d → number of unit roots in $Y_t$; $\Delta^d Y_t \sim I(0)$ → stationary

Integrated SPs

→ $Y_t \sim I(d)$; d → order of integration or # of unit roots in $Y_t$

1st difference of CPI is inflation

$Y_t = CPI_t$; $\Delta Y_t = Inf_t$; $\Delta Y_t = CPI_t - CPI_{t-1}$

$\Delta Inf_t = Inf_t - Inf_{t-1} \sim I(0)$

d = 0, 1, 2
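The differencing identities above can be verified on simulated series. In this sketch (seed and sample size are arbitrary assumptions) a random walk is built by cumulating white noise, so it is I(1), and cumulating it once more gives an I(2) series.

```python
import numpy as np

rng = np.random.default_rng(4)
u = rng.normal(size=300)      # white noise
y = np.cumsum(u)              # random walk: Y_t = Y_{t-1} + u_t, an I(1) series
z = np.cumsum(y)              # cumulating again gives an I(2) series

d1 = np.diff(y)               # first difference: Y_t - Y_{t-1}
d2 = np.diff(z, n=2)          # second difference: Z_t - 2 Z_{t-1} + Z_{t-2}
```

One round of differencing recovers the white noise from the random walk, and two rounds are needed for the I(2) series, matching d = 1 and d = 2.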

Random walks and White noise process

RWM

$Y_t = Y_{t-1} + u_t$; $\Delta Y_t = u_t$

3 types of Random Walks

1. RWM

$Y_t = \rho Y_{t-1} + u_t$; ρ = 1

2. RWMD (with drift)

$Y_t = \delta + \rho Y_{t-1} + u_t$; δ = drift parameter; ρ = autocorrelation coefficient = 1

3. RWDT (with deterministic trend)

$Y_t = \delta + \rho Y_{t-1} + \beta t + u_t$; ρ = 1

When two or more NS variables are regressed on one another, the result is spurious.

Granger-Newbold Rule (aka Classical)

→ symptom of spurious regression

If $R^2$ > Durbin-Watson statistic → spurious

Unit root testing

→ provided by Dickey and Fuller

Dickey-Fuller Test

Auxiliary regression:

RWM:

$Y_t = \rho Y_{t-1} + u_t$

$\Delta Y_t = (\rho - 1) Y_{t-1} + u_t$

$\Delta Y_t = \alpha Y_{t-1} + u_t$, with α = ρ − 1

$H_0: \alpha = 0$ → means ρ = 1 (unit root); $H_1: \alpha < 0$ → stationary

$t = \frac{\hat{\alpha}}{se(\hat{\alpha})}$, derived from the normal distribution → not valid

That's why they developed

→ the τ (tau) distribution (aka Dickey-Fuller distribution)

Shortcoming: in $\Delta Y_t = \alpha Y_{t-1} + u_t$, $u_t$ is highly correlated but Dickey and Fuller assumed that it is white noise.

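Augmented Dickey-Fuller Test (ADF)

RWM:

$\Delta Y_t = \alpha Y_{t-1} + \sum_{i=1}^{p} \phi_i \Delta Y_{t-i} + \varepsilon_t$

RWMD:

$\Delta Y_t = \delta + \alpha Y_{t-1} + \sum_{i=1}^{p} \phi_i \Delta Y_{t-i} + \varepsilon_t$

RWMDT:

$\Delta Y_t = \delta + \alpha Y_{t-1} + \beta t + \sum_{i=1}^{p} \phi_i \Delta Y_{t-i} + \varepsilon_t$

Level series → original time series that we're investigating

$u_t \sim AR(p)$: current error has long memory; current is related to past

$u_t = \rho_1 u_{t-1} + \rho_2 u_{t-2} + \dots + \rho_p u_{t-p} + e_t$

Alternative unit root tests: PP, KPSS, ADF-GLS, DP, NP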
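The Dickey-Fuller auxiliary regression is a plain OLS of the first difference on the lagged level; only the reference distribution of the resulting statistic is non-standard. The sketch below covers the simplest case (no drift, no trend, no augmentation lags) on two simulated series; `df_tau`, the seed and the AR coefficient 0.5 are assumptions made for the illustration.

```python
import numpy as np

def df_tau(y):
    """tau = alpha_hat / se(alpha_hat) from dY_t = alpha * Y_{t-1} + u_t."""
    dy, ylag = np.diff(y), y[:-1]
    alpha_hat = (ylag @ dy) / (ylag @ ylag)
    resid = dy - alpha_hat * ylag
    s2 = resid @ resid / (len(dy) - 1)
    return alpha_hat / np.sqrt(s2 / (ylag @ ylag))

rng = np.random.default_rng(5)
u = rng.normal(size=500)

random_walk = np.cumsum(u)                 # rho = 1: unit root
stationary = np.zeros(500)
for t in range(1, 500):
    stationary[t] = 0.5 * stationary[t - 1] + u[t]   # rho = 0.5: stationary AR(1)

tau_rw, tau_st = df_tau(random_walk), df_tau(stationary)
```

The statistic for the stationary series is strongly negative, while the one for the random walk stays near zero; in practice each is compared with Dickey-Fuller critical values rather than with the usual t table.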

Cointegration Analysis

It is a property of 2 or more non-stationary variables to be linked together by a long-run (LR) equilibrium relationship

Robert Engle and Clive Granger (1987)

→ Work on cointegration analysis to check whether a regression is spurious

Augmented Engle-Granger Test (AEG)

If $X_t, Y_t \sim I(1)$ and, from their SRF $\hat{Y}_t = \hat{\beta}_1 + \hat{\beta}_2 X_t$, the residual $\hat{u}_t = Y_t - \hat{Y}_t \sim I(0)$, then $X_t$ and $Y_t$ are cointegrated

Stage 1: ADF on $X_t$ and $Y_t$; proceed if both are I(1)

We perform unit root tests on both variables and see to it that they are non-stationary.

Stage 2: Run OLS on $X_t$ and $Y_t$ to get the SRF and obtain $\hat{u}_t$; run ADF on $\hat{u}_t$, and if $\hat{u}_t \sim I(0)$ → X and Y are cointegrated

Short-run dynamics

ECM: Error Correction Model

→ Granger Representation Theorem

- If the variables used in the regression are cointegrated, an ECM representation of the LR model is possible.

In k = 2:

$\Delta Y_t = \alpha_0 + \alpha_1 \Delta X_t + \alpha_2 \hat{u}_{t-1} + e_t$; $\hat{\alpha}_2$ → error correction coefficient
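The long-run regression and the ECM can be sketched on simulated cointegrated data. The data-generating values (intercept 2.0, slope 0.5, seed) are hypothetical, and the ADF test on the residuals is left out to keep the example short; only the two OLS steps are shown.

```python
import numpy as np

rng = np.random.default_rng(6)
T = 400
x = np.cumsum(rng.normal(size=T))            # X_t ~ I(1)
y = 2.0 + 0.5 * x + rng.normal(size=T)       # hypothetical long-run relation with an I(0) error

def ols(X, y):
    return np.linalg.lstsq(X, y, rcond=None)[0]

# Long-run (cointegrating) regression and its residuals
b = ols(np.column_stack([np.ones(T), x]), y)
u_hat = y - (b[0] + b[1] * x)

# ECM: dY_t = a0 + a1 * dX_t + a2 * u_hat_{t-1} + e_t
dy, dx = np.diff(y), np.diff(x)
a = ols(np.column_stack([np.ones(T - 1), dx, u_hat[:-1]]), dy)
ec_coef = a[2]                               # error correction coefficient
```

A negative `ec_coef` means that last period's deviation from the long-run relation is corrected in the current period, which is what the Granger Representation Theorem leads one to expect for cointegrated series.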

Cointegration and Error correction

VAR(p) → Christopher Sims

Vector Autoregression

Let $Y_t' = [Y_{1t}\; Y_{2t}\; \dots\; Y_{kt}]$

All variables are endogenous

$Y_t = \mu_t + A_1 Y_{t-1} + A_2 Y_{t-2} + \dots + A_p Y_{t-p} + e_t$

The A's are matrices.

OLS per equation is BLUE

AR(p): the current variable is related to its past values

$\Delta Y_t = \mu_t + \Pi Y_{t-1} + \sum_{i=1}^{p-1} \Gamma_i \Delta Y_{t-i} + e_t$
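The error-correction form is an algebraic rewrite of the VAR. For p = 2 the mapping is $\Pi = A_1 + A_2 - I$ and $\Gamma_1 = -A_2$, which the sketch below checks numerically. The matrices `A1` and `A2` are hypothetical and were chosen so that Π has rank 1 (one cointegrating relation between two variables).

```python
import numpy as np

# hypothetical VAR(2) coefficient matrices for k = 2 variables
A1 = np.array([[0.5, 0.25],
               [0.25, 0.5]])
A2 = np.array([[0.25, 0.0],
               [0.0, 0.25]])

Pi = A1 + A2 - np.eye(2)        # long-run matrix
Gamma1 = -A2                    # short-run matrix when p = 2

rng = np.random.default_rng(7)
y_lag2, y_lag1 = rng.normal(size=2), rng.normal(size=2)   # Y_{t-2}, Y_{t-1}

y_var = A1 @ y_lag1 + A2 @ y_lag2                     # VAR form (no intercept, no error)
dy_vecm = Pi @ y_lag1 + Gamma1 @ (y_lag1 - y_lag2)    # error-correction form

rank = np.linalg.matrix_rank(Pi, tol=1e-10)

# rank 1 means Pi factors into an adjustment vector times a cointegrating vector
alpha = np.array([-0.25, 0.25])
beta = np.array([1.0, -1.0])
```

Both forms give the same change in Y, and the rank of Π (here 1) is the number of cointegrating relations.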

Johansen

Eigenvalue → λ-max test → individual test

"trace" test → cumulative testing

Identification of the cointegration vectors

• Cointegrated vector → economic theory; long-run equation

• $\Pi = \alpha\beta'$

• α = adjustment for the error term; β' = cointegrating vector

Johansen → more versatile and can be used for more than two variables
