Notes in ECOMET2
1. Review of Classical Linear Regression Model
Drivers: independent variables (right-hand side)
$Y_i = \beta_1 + \beta_2X_{2i} + \beta_3X_{3i} + \dots + \beta_kX_{ki} + u_i$; $Y_i$ → real-world variable; $u_i$ → stochastic or random variable
Under conditions of uncertainty, we have to expect errors.
X's → exogenous, fixed (nonstochastic) variables
i = 1, 2, 3, ..., n
$\beta_1$ = intercept; mean value of Y when all Xs are zero
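In matrix form,
$Y_{(n\times 1)} = X_{(n\times k)}\,\beta_{(k\times 1)} + u_{(n\times 1)}$
Objective:
1. To find $\hat\beta$ (problem of estimation)
2. To perform inferences on $\beta$ based on $\hat\beta$ (problem of inference)
$E(Y) = \beta_1 + \sum_{j=2}^{k}\beta_jX_j$ → PRF
$\hat Y = \hat\beta_1 + \sum_{j=2}^{k}\hat\beta_jX_j$ → SRF
$\beta_j$ → marginal contribution of $X_j$ to Y, ceteris paribus.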
i. Individual test (t-test)
ii. Joint (overall) test (F-test)
iii. Goodness of fit
i. $H_0: \beta_j = 0$ vs. $H_1: \beta_j \;(<, >, \neq)\; 0$
j = 1, 2, 3, ..., k
Alternative hypothesis = operational statement of our research agenda
Basis of decision: p-value → exact probability of incorrectly rejecting a true $H_0$; if p < 0.05, reject $H_0$
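ii. $H_0: \beta_2 = \beta_3 = \dots = \beta_k = 0$ → the Xs have nothing to do with Y
iii. Interval estimation: $P\left(\hat\beta_j - t_{\alpha/2}\,se(\hat\beta_j) \le \beta_j \le \hat\beta_j + t_{\alpha/2}\,se(\hat\beta_j)\right) = 1 - \alpha$, j = 1, 2, ..., k
iv. $R^2$: proportion of the variation in Y collectively explained by the X's
$\bar R^2$ → adjusted $R^2$: remedy for the loss of degrees of freedom as regressors are added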
Degrees of freedom (df): number of observations left after estimation
Assumptions:
1) u is multivariate normal (MVN)
2) $E(u) = 0$ → vanishing error expectation
3) $E(uu') = \sigma^2 I$: off-diagonal elements are zero → no autocorrelation
4) diagonal elements all equal $\sigma^2$ → constant variances → homoscedasticity
5) $E(u_iX_{ji}) = 0$ → exogeneity assumption on the X's
6) X has full column rank (rank k, with n > k); the X's should not be perfectly correlated (no multicollinearity)
If assumptions 1-6 are valid, then by the Gauss-Markov theorem OLS is BLUE (assumptions 2-6 suffice for BLUE; normality is what makes exact t and F inference possible)
→ Unbiased, consistent, sufficient, and efficient
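The estimation and inference objectives above can be illustrated for the two-variable case (k = 2). The Python sketch below uses only the standard library. It computes $\hat\beta_1$, $\hat\beta_2$, the t statistic for $H_0: \beta_2 = 0$, and $R^2$. The function name `simple_ols` and the five data points are hypothetical, chosen so the numbers can be checked by hand.

```python
import math

def simple_ols(x, y):
    """OLS for Y_i = b1 + b2*X_i + u_i; returns (b1, b2, se_b2, t_b2, r2)."""
    n = len(x)
    xbar = sum(x) / n
    ybar = sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    b2 = sxy / sxx                      # slope: marginal contribution of X
    b1 = ybar - b2 * xbar               # intercept
    resid = [yi - (b1 + b2 * xi) for xi, yi in zip(x, y)]
    rss = sum(e * e for e in resid)
    tss = sum((yi - ybar) ** 2 for yi in y)
    sigma2 = rss / (n - 2)              # degrees of freedom: n - k with k = 2
    se_b2 = math.sqrt(sigma2 / sxx)
    t_b2 = b2 / se_b2                   # t statistic for H0: beta2 = 0
    r2 = 1 - rss / tss                  # goodness of fit
    return b1, b2, se_b2, t_b2, r2

b1, b2, se_b2, t_b2, r2 = simple_ols([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
print(round(b1, 2), round(b2, 2), round(t_b2, 3), round(r2, 2))  # 2.2 0.6 2.121 0.6
```

With t ≈ 2.12 on n − k = 3 degrees of freedom, the 5% two-sided critical value is about 3.18. $H_0: \beta_2 = 0$ would therefore not be rejected in this toy example.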
Limited Dependent Variable Models
CLRM requires Y to be quantitative (continuous)
→ measured on an interval or ratio scale
• Binary Response Models (Y is a dummy)
- A dummy on the right-hand side acts as a differential intercept; here the dummy is on the left-hand side
o LPM (Linear probability model)
o Logit model
o Probit model
• Multinomial Response Model (Y is multinomial)
- 3 or more response categories
o Multinomial Logit model
o Multinomial Probit model
Scale of measurement: nominal
• Ordered Response Model (Y is ordinal)
o Ordered Logit model
o Ordered Probit model
• Censored Response Model
- (selectivity bias)
o Tobit model
o Heckit model
• Discrete Dependent Variable Models
o Count models
- Poisson model
- Negative binomial model
o Duration models
Binary Response Models
$Y_i = \beta_1 + \sum_{j=2}^{k}\beta_jX_{ji} + u_i$
$Y_i = 0$ → failure; $Y_i = 1$ → success; i = 1, 2, 3, ..., n
$P_i = \Pr(Y_i = 1 \mid X_{2i}, X_{3i}, \dots, X_{ki})$ → the probability that success is attained
• LPM: OLS applied even though Y is a dummy
Consequences: $\hat P_i = \hat Y_i$ (predicted probability)
ANCOVA → not all regressors on the right-hand side are dummies
ANOVA → all regressors on the right-hand side are dummies
$u_i$ is not normal
Bernoulli → only 1 trial ($Y_i$, and hence $u_i$, takes only two values); violates normality
• $E(Y_i \mid X) = P_i$ → the conditional mean of the dummy is the success probability
• $\mathrm{var}(u_i) = P_i(1 - P_i)$ → heteroscedastic
• $R^2$ is not a reliable measure of goodness of fit
• $\hat P_i$ is not guaranteed to fall within [0, 1]
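The last consequence can be seen directly. A minimal Python sketch fits an LPM by OLS to ten hypothetical observations (made-up data, built-ins only). The fitted "probabilities" at both ends of the X range fall outside [0, 1].

```python
def lpm_fit(x, y):
    """OLS of a 0/1 dummy Y on one regressor X (linear probability model)."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxy = sum((a - xbar) * (b - ybar) for a, b in zip(x, y))
    sxx = sum((a - xbar) ** 2 for a in x)
    b2 = sxy / sxx
    b1 = ybar - b2 * xbar
    return b1, b2

x = list(range(1, 11))
y = [0, 0, 0, 0, 1, 0, 1, 1, 1, 1]       # 1 = success, 0 = failure (hypothetical)
b1, b2 = lpm_fit(x, y)
p_hat = [b1 + b2 * xi for xi in x]       # fitted values read as probabilities
print(round(p_hat[0], 3), round(p_hat[-1], 3))  # -0.127 1.127
```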
$Z_i = \beta_1 + \sum_{j=2}^{k}\beta_jX_{ji}$ [index function]
Score of the ith observation (utility score)
*The bigger the z, the more probable the success
*The smaller the z, the more probable the failure
$P_i = \int_{-\infty}^{Z_i} f(t)\,dt$ where f(t) is the link function
As Z becomes smaller, P also becomes smaller
Logistic Model - P is the logistic cumulative distribution function
Logistic link (probability density function): $f(t) = \dfrac{e^{-t}}{(1+e^{-t})^2}$; $|t| < \infty$ ⇒ results in the Logit Model
Standard normal link ($t \sim N(0,1)$): $f(t) = \dfrac{1}{\sqrt{2\pi}}e^{-t^2/2}$; $|t| < \infty$ ⇒ results in the Probit Model
$P_i = \int_{-\infty}^{Z_i}\dfrac{e^{-t}}{(1+e^{-t})^2}\,dt = -\int_{\infty}^{1+e^{-Z_i}}\dfrac{du}{u^2}$
Let $u = 1 + e^{-t}$, $du = -e^{-t}\,dt$ ⇒ $P_i = \dfrac{1}{1+e^{-Z_i}}$, $1 - P_i = \dfrac{e^{-Z_i}}{1+e^{-Z_i}}$, $\dfrac{P_i}{1-P_i} = e^{Z_i}$
⇒ $P_i = \dfrac{e^{Z_i}}{1+e^{Z_i}}$
⇒ $L_i = \ln\left(\dfrac{P_i}{1-P_i}\right) = \beta_1 + \sum_{j=2}^{k}\beta_jX_{ji} + u_i$; $L_i$ = logit (natural logarithm of the odds ratio [success probability / failure probability])
The odds ratio also measures the probability that Y = 1 relative to the probability that Y = 0
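The derivation can be checked numerically. This Python sketch uses the standard library only, and its function names are illustrative. It integrates the logistic density with the trapezoid rule and compares the result with the closed form $P_i = 1/(1+e^{-Z_i})$. It then confirms that the logit recovers $Z_i$ and that the odds equal $e^{Z_i}$.

```python
import math

def logistic_pdf(t):
    """f(t) = e^{-t} / (1 + e^{-t})^2."""
    return math.exp(-t) / (1 + math.exp(-t)) ** 2

def logistic_cdf(z):
    """P_i = 1 / (1 + e^{-Z_i}) = e^{Z_i} / (1 + e^{Z_i})."""
    return 1 / (1 + math.exp(-z))

def logit(p):
    """L_i = ln(P_i / (1 - P_i)), the log of the odds."""
    return math.log(p / (1 - p))

def integrate_pdf(z, lower=-40.0, steps=20000):
    """Trapezoid-rule approximation of the integral of f(t) from about -infinity to z."""
    h = (z - lower) / steps
    total = 0.5 * (logistic_pdf(lower) + logistic_pdf(z))
    total += sum(logistic_pdf(lower + i * h) for i in range(1, steps))
    return total * h

z = 0.8
p = logistic_cdf(z)
print(round(p, 4), round(integrate_pdf(z), 4), round(logit(p), 4))  # 0.69 0.69 0.8
```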
Estimation using Maximum Likelihood Estimation (MLE)
- estimating the betas (parameters) in such a way that the likelihood of the
sample (joint probability) is maximized
1. Set the first derivatives of the (log-)likelihood function to zero (first-order conditions)
2. Check the second derivatives: for a maximum, the Hessian must be negative definite
$\hat L_i = \hat\beta_1 + \sum_{j=2}^{k}\hat\beta_jX_{ji}$ → sample regression function; $\hat Z_i = \hat L_i$ by maximum likelihood estimation
*Test of joint hypothesis: the likelihood ratio (LR) statistic, $\chi^2$-distributed in large samples, plays the role of the F-statistic
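Below is a minimal sketch of the two MLE steps for a logit with one regressor. It uses Newton-Raphson, which is an assumed choice of optimizer; statistical software may use other algorithms. Each iteration uses the score (first derivatives) and the Hessian (second derivatives, negative definite at the maximum). It then forms the LR statistic against the intercept-only model. The data and function names are hypothetical.

```python
import math

def logistic(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_logit(x, y, max_iter=50, tol=1e-10):
    """Newton-Raphson MLE for P(Y=1 | x) = logistic(b1 + b2 * x)."""
    b1, b2 = 0.0, 0.0
    for _ in range(max_iter):
        p = [logistic(b1 + b2 * xi) for xi in x]
        # Step 1: first derivatives (score) of the log-likelihood
        g1 = sum(yi - pi for yi, pi in zip(y, p))
        g2 = sum((yi - pi) * xi for xi, yi, pi in zip(x, y, p))
        # Step 2: second derivatives (Hessian), negative definite at a maximum
        w = [pi * (1 - pi) for pi in p]
        h11 = -sum(w)
        h12 = -sum(wi * xi for wi, xi in zip(w, x))
        h22 = -sum(wi * xi * xi for wi, xi in zip(w, x))
        det = h11 * h22 - h12 * h12
        # Newton update: b_new = b - H^{-1} g (2 x 2 inverse written out)
        s1 = (h22 * g1 - h12 * g2) / det
        s2 = (-h12 * g1 + h11 * g2) / det
        b1, b2 = b1 - s1, b2 - s2
        if abs(s1) + abs(s2) < tol:
            break
    return b1, b2

def log_likelihood(x, y, b1, b2):
    total = 0.0
    for xi, yi in zip(x, y):
        p = logistic(b1 + b2 * xi)
        total += yi * math.log(p) + (1 - yi) * math.log(1 - p)
    return total

x = list(range(1, 11))
y = [0, 0, 0, 0, 1, 0, 1, 1, 1, 1]                 # hypothetical 0/1 outcomes
b1, b2 = fit_logit(x, y)
ll_u = log_likelihood(x, y, b1, b2)               # unrestricted model
ybar = sum(y) / len(y)                            # restricted MLE: P = sample share of successes
ll_r = sum(yi * math.log(ybar) + (1 - yi) * math.log(1 - ybar) for yi in y)
lr = 2 * (ll_u - ll_r)                            # LR ~ chi-square(1) under H0: b2 = 0
```

Compare `lr` with the $\chi^2$ critical value with 1 degree of freedom (3.84 at the 5% level).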
Binary Response Models
Y = 0, 1 [failure, success]; $\hat Z_i = \hat\beta_1 + \sum_{j=2}^{k}\hat\beta_jX_{ji}$ [index function]; $P_i = \Pr(Y_i = 1 \mid Z_i)$ → probability that success is attained
LPM (OLS): Y is a dummy
Logit (logistic link)
Probit (standard normal link)
*The link function is a probability distribution function that maps the index from $(-\infty, \infty)$ to [0, 1]
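Logistic pdf for the logit model: $P_i = \int_{-\infty}^{Z_i} f(t)\,dt$
$f(t) = \dfrac{e^{-t}}{(1+e^{-t})^2}$ where $-\infty < t < \infty$
$P_i = \dfrac{e^{Z_i}}{1+e^{Z_i}}$ (Logit Model)
$\ln\left(\dfrac{P_i}{1-P_i}\right) = \beta_1 + \sum_{j=2}^{k}\beta_jX_{ji}$
The logit of an individual is a linear function of the individual's characteristics
→ Solved via MLE (which yields the estimated logit)
$\hat L_i = \hat\beta_1 + \sum_{j=2}^{k}\hat\beta_jX_{ji} = \hat Z_i$
The estimated logit is a linear function of the individual's characteristics
Used in forecasting: if $\hat P_i \ge 0.5$ → success, $\hat Y_i = 1$; if $\hat P_i < 0.5$ → failure, $\hat Y_i = 0$
$\dfrac{\partial P_i}{\partial X_{ji}}$ = marginal contribution of $X_{ji}$ to $P_i$, ceteris paribus [(k − 1)n such effects]
This lets us determine the role of each success factor for the ith individual. From $\ln\left(\dfrac{P_i}{1-P_i}\right) = \hat\beta_1 + \sum_{j=2}^{k}\hat\beta_jX_{ji}$ (implicit differentiation):
$\dfrac{1-P_i}{P_i}\cdot\dfrac{(1-P_i)\dfrac{\partial P_i}{\partial X_{ji}} - P_i\left(-\dfrac{\partial P_i}{\partial X_{ji}}\right)}{(1-P_i)^2} = \beta_j$
$\dfrac{\partial P_i}{\partial X_{ji}}\cdot\dfrac{1}{P_i(1-P_i)} = \beta_j \;\Rightarrow\; \dfrac{\partial P_i}{\partial X_{ji}} = \beta_jP_i(1 - P_i)$; j = 2, 3, ..., k; i = 1, 2, 3, ..., n → marginal effects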
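A quick numerical check of the result $\partial P_i/\partial X_{ji} = \beta_jP_i(1-P_i)$ is sketched below in Python. The analytic marginal effect is compared with a central finite difference. The coefficient values are hypothetical.

```python
import math

def logit_prob(z):
    return 1 / (1 + math.exp(-z))

def logit_marginal_effect(beta_j, z):
    """dP_i/dX_ji = beta_j * P_i * (1 - P_i) for the logit model."""
    p = logit_prob(z)
    return beta_j * p * (1 - p)

# Hypothetical fitted index Z_i = b1 + b2 * X_2i with b1 = -1.0, b2 = 0.5
b1, b2, x2 = -1.0, 0.5, 3.0
z = b1 + b2 * x2
analytic = logit_marginal_effect(b2, z)
h = 1e-6
numeric = (logit_prob(b1 + b2 * (x2 + h)) - logit_prob(b1 + b2 * (x2 - h))) / (2 * h)
print(round(analytic, 5), round(numeric, 5))
```

The effect is largest at $P_i = 0.5$, where it equals $\beta_j/4$.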
Probit Model - Standard Normal Cumulative Distribution Function
Link function: $f(t) = \dfrac{1}{\sqrt{2\pi}}e^{-t^2/2}$; $-\infty < t < \infty$
The higher the score, the higher the probability of success
*The slope gives the change in probability given a unit change in X (the slope is not constant).
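Below is a minimal Python sketch of the probit link, using `math.erf` for $\Phi$. It shows that the slope $\phi(Z_i)\beta_j$ is largest at $Z_i = 0$ and shrinks in the tails. The coefficient 0.8 is hypothetical.

```python
import math

def std_normal_pdf(t):
    return math.exp(-t * t / 2) / math.sqrt(2 * math.pi)

def std_normal_cdf(z):
    """Phi(z): area under the standard normal curve up to z."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

def probit_marginal_effect(beta_j, z):
    """dP_i/dX_ji = phi(Z_i) * beta_j; the slope changes with Z_i."""
    return std_normal_pdf(z) * beta_j

beta_j = 0.8                            # hypothetical coefficient
for z in (-2.0, 0.0, 2.0):
    print(z, round(std_normal_cdf(z), 4), round(probit_marginal_effect(beta_j, z), 4))
```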
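Example: whether a loan is approved or not, depending on default. $P_i = \Phi\left(\beta_1 + \sum_{j=2}^{k}\beta_jX_{ji}\right)$; probit model, where $P_i$ is the area under the standard normal curve up to $Z_i$
Use MLE to estimate the betas
$\hat Z_i = \hat\beta_1 + \sum_{j=2}^{k}\hat\beta_jX_{ji}$
$\hat P_i = \Phi(\hat Z_i)$; i = 1, 2, ..., n
Marginal effects: $\dfrac{\partial P_i}{\partial X_{ji}} = \dfrac{\partial\Phi(Z_i)}{\partial Z_i}\cdot\dfrac{\partial Z_i}{\partial X_{ji}} = \dfrac{1}{\sqrt{2\pi}}e^{-Z_i^2/2}\,\beta_j$
$\dfrac{\partial\hat P_i}{\partial X_{ji}} = f(\hat Z_i)\,\hat\beta_j$
Censored Regression Model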
Tobit Model (James Tobin)
Model:
$Y_i = \beta_1 + \sum_{j=2}^{k}\beta_jX_{ji} + u_i$; i = 1, 2, 3, ..., n; $n = n_1 + n_2$, where $n_1$ = observations on both Y and the X's; $n_2$ = observations only on the X's (censored sample)
E.g. the amount spent on a car by the ith family is linearly dependent on the socioeconomic status
of the family ($X_{ji}$ = jth socioeconomic variable)
→ OLS will result in biased and inconsistent estimates
Bias = sample selectivity bias; the solution is the procedure developed by James Tobin, based on
sampling from a censored normal distribution.
*Wages of workers = a censored variable
- major substitute for OLS
Given: $(Y_i, X_{2i}, \dots, X_{ki})$; i = 1, 2, ..., n; L = likelihood of the sample, built from the density of Y given the X's (censored observations contribute a probability instead of a density)
FOC: $\dfrac{\partial\ln L}{\partial\beta_j} = 0$, j = 1, 2, ..., k
$\hat\beta_j = f(\text{sample})$, j = 1, 2, ..., k
SOC: $\dfrac{\partial^2\ln L}{\partial\beta_j^2} < 0$; Hessian (negative definite)
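The likelihood whose FOC and SOC appear above can be written out for the common case of censoring at zero. Zero is an assumed censoring point, consistent with families who buy no car at all. The Python sketch below only evaluates the log-likelihood for one regressor; maximizing it over (b1, b2, σ) with a numerical optimizer, as the FOC/SOC describe, is not shown. Names are hypothetical.

```python
import math

def norm_logpdf(t):
    return -0.5 * math.log(2 * math.pi) - 0.5 * t * t

def norm_cdf(t):
    return 0.5 * (1 + math.erf(t / math.sqrt(2)))

def tobit_loglik(y, x, b1, b2, sigma):
    """Log-likelihood of a Tobit model censored at zero (assumed censoring point).

    Uncensored Y_i > 0 contributes ln[(1/sigma) * phi((Y_i - X_i b) / sigma)];
    censored Y_i = 0 contributes ln[1 - Phi(X_i b / sigma)] = ln Pr(Y*_i <= 0).
    """
    ll = 0.0
    for yi, xi in zip(y, x):
        xb = b1 + b2 * xi
        if yi > 0:
            ll += norm_logpdf((yi - xb) / sigma) - math.log(sigma)
        else:
            ll += math.log(1 - norm_cdf(xb / sigma))
    return ll
```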
Heckit Model: James Heckman
- alternative to the Tobit model in addressing the selectivity bias of a censored sample
Two-stage process:
1. Selection stage: estimate a probit model
$Z_i$ = 0, 1 (no, yes)
The probit model gives the fitted index $\hat Z_i$ for each i.
2. Consumption stage: $Y_i = \beta_1 + \sum_{j=2}^{k}\beta_jX_{ji} + \beta_\lambda\hat\lambda_i + u_i$, estimated by OLS on the selected sample, where
$\hat\lambda_i = \dfrac{\phi(\hat Z_i)}{\Phi(\hat Z_i)} = \dfrac{\text{standard normal density}}{\text{standard normal cumulative distribution}}$ = inverse Mills ratio
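In the Heckit second stage, the inverse Mills ratio is computed from each selected observation's fitted probit index and added as an extra regressor. A minimal Python sketch (standard library only; the fitted indices are hypothetical):

```python
import math

def inverse_mills_ratio(z):
    """lambda_i = phi(Z_i) / Phi(Z_i), computed from the fitted probit index Z_i."""
    pdf = math.exp(-z * z / 2) / math.sqrt(2 * math.pi)
    cdf = 0.5 * (1 + math.erf(z / math.sqrt(2)))
    return pdf / cdf

# Hypothetical first-stage (probit) fitted indices for selected observations
z_hat = [-1.0, 0.0, 1.5]
lambdas = [inverse_mills_ratio(z) for z in z_hat]   # extra regressor for stage 2
```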
Truncated Regression Models
No observation on Y = no observation on the X's either; truncated observations are effectively taken out of the
sample by the selection process.
Element of a random selection= inference
MLE via Sample Selection from Truncated Normal
Multinomial Logit Model
$Y_i = j$; j = 1, 2, 3, ..., m; $P_{ij} = \Pr(Y_i = j \mid X_{2i}, \dots, X_{ki})$; j = 1, 2, ..., m − 1 (one category is left out to avoid the dummy variable trap)
- Links the X's to probabilities between 0 and 1.
Link: $Z_{ij} = \beta_{1j} + \sum_{l=2}^{k}\beta_{lj}X_{li}$; the link function is the logistic distribution
Link function for MLM: $P_{ij} = \dfrac{e^{Z_{ij}}}{1 + \sum_{h=1}^{m-1}e^{Z_{ih}}}$ where j = 1, 2, ..., m − 1
$P_{im} = \dfrac{1}{1 + \sum_{h=1}^{m-1}e^{Z_{ih}}}$; base category
Relative risk ratio $= \dfrac{P_{ij}}{P_{im}} = e^{Z_{ij}} = e^{\beta_{1j} + \sum_{l=2}^{k}\beta_{lj}X_{li}}$
- The probability of being in the jth category relative to the reference/base
category (RRR)
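The MLM probabilities and the RRR follow directly from the indices. A minimal Python sketch for one individual with m = 3 categories; the index values are hypothetical, and the base category's index is normalized to zero.

```python
import math

def mnl_probabilities(z):
    """Multinomial logit probabilities for one individual.

    z holds the indices Z_ij for the m-1 non-base categories; the base
    category m has its index normalized to 0, so it contributes e^0 = 1.
    Returns [P_i1, ..., P_i(m-1), P_im].
    """
    exps = [math.exp(zj) for zj in z]
    denom = 1 + sum(exps)
    return [e / denom for e in exps] + [1 / denom]

def relative_risk_ratio(z, j):
    """RRR for category j relative to the base: P_ij / P_im = e^{Z_ij}."""
    probs = mnl_probabilities(z)
    return probs[j] / probs[-1]

z_i = [0.4, -0.7]                  # hypothetical indices for categories 1 and 2
probs = mnl_probabilities(z_i)     # probabilities for categories 1, 2 and base 3
```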
Panel Data Econometric Models
Panel data represent real-world phenomena (data in two dimensions).
Panel data → combination of cross-section and time-series data. $Y_{it}$ = tth time-series observation for the ith individual on the Y variable
i → space dimension indicating micro-units
t → time dimension, the period in which Y is observed (behaves like a time series)
e.g. effect of education on income with data across individuals
i = 1, 2, 3, ..., n
t = 1, 2, 3, ..., T
unbalanced panel = missing observations
Advantages of Using panel data
1. It allows us to account for unobserved heterogeneity (or
effects) of cross-section or time-series observational units,
which, when neglected, results in OVB (omitted variable
bias)
2. more information, more variability, more degrees of freedom,
less collinearity, and more efficiency
3. allows us to model the dynamics of
change (structural change)
4. provides a basis for modelling more sophisticated phenomena
that pure CS or TS models cannot handle
5. to mitigate aggregation bias
Panel Data Models
1. Fixed Effects Models
Assumption: effects are fixed parameters to be estimated
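a. Naive (pooled) model: all parameters are fixed (time- and space-invariant)
$Y_{it} = \beta_1 + \sum_{j=2}^{k}\beta_jX_{jit} + u_{it}$; i = 1, 2, ..., n and t = 1, 2, ..., T
df = nT − k
b. LSDV Models
i) Model 1: intercept is time invariant (it varies across individuals) and slopes are fixed
$Y_{it} = \beta_{1i} + \sum_{j=2}^{k}\beta_jX_{jit} + u_{it}$
$\beta_{1i} = \beta_1 + \delta_i$; $\delta_i$ is the "animal spirit" (unobserved heterogeneity) of individual i
ii) Model 2: intercept varies over time and slopes are fixed
$Y_{it} = \beta_{1t} + \sum_{j=2}^{k}\beta_jX_{jit} + u_{it}$
$\beta_{1t} = \beta_1 + \delta_t$; $\delta_t$ is the "animal spirit" of the time period ("shock")
iii) Model 3: intercept varies over time and space, and slopes are fixed
$Y_{it} = \beta_{1it} + \sum_{j=2}^{k}\beta_jX_{jit} + u_{it}$
Where: $\beta_{1it} = \beta_1 + \delta_i + \delta_t$: one delta for every individual and one for every time period
iv) Model 4: all parameters are time invariant; both the slopes and the intercepts vary
across individual micro-units.
$Y_{it} = \beta_{1i} + \sum_{j=2}^{k}\beta_{ji}X_{jit} + u_{it}$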
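For LSDV Model 1 (individual intercepts, common slope), the slope from OLS with one dummy per individual equals the slope from the "within" transformation. That transformation demeans X and Y inside each individual, which removes $\beta_{1i}$. The Python sketch below uses a hypothetical panel (n = 3, T = 3, true slope 2). It contrasts that estimator with the naive pooled slope, which is biased here because the individual intercepts are correlated with X.

```python
def within_slope(panel):
    """Common slope of LSDV Model 1 via demeaning within each individual."""
    num = den = 0.0
    for obs in panel.values():
        xbar = sum(x for x, _ in obs) / len(obs)
        ybar = sum(y for _, y in obs) / len(obs)
        for x, y in obs:
            num += (x - xbar) * (y - ybar)
            den += (x - xbar) ** 2
    return num / den

def pooled_slope(panel):
    """Naive pooled OLS slope that ignores the individual effects."""
    pairs = [p for obs in panel.values() for p in obs]
    n = len(pairs)
    xbar = sum(x for x, _ in pairs) / n
    ybar = sum(y for _, y in pairs) / n
    num = sum((x - xbar) * (y - ybar) for x, y in pairs)
    den = sum((x - xbar) ** 2 for x, _ in pairs)
    return num / den

# Hypothetical panel: Y_it = beta_1i + 2 * X_it with intercepts 1, 5 and 10
panel = {
    "A": [(1, 1 + 2 * 1), (2, 1 + 2 * 2), (3, 1 + 2 * 3)],
    "B": [(2, 5 + 2 * 2), (4, 5 + 2 * 4), (5, 5 + 2 * 5)],
    "C": [(6, 10 + 2 * 6), (7, 10 + 2 * 7), (9, 10 + 2 * 9)],
}
print(within_slope(panel), round(pooled_slope(panel), 3))
```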
Simultaneous Equations Model (SEM)
• Economic models whose data are determined by 2 or more economic relationships
• Multi-equation model
- 1 equation per sector
- 1 sector is represented by 1 endogenous variable
• M sectors (or M endogenous variables)
• K predetermined variables (exogenous + lagged endogenous): $Y_i = f(X\text{'s}, \text{lagged } Y\text{'s}, \text{other } Y\text{'s}) + u$
CLRM: $Y_i \leftarrow X_{2i}, X_{3i}, \dots, X_{ki}$; i = 1, 2, 3, ..., n
SEM: $Y_i \leftarrow X\text{'s}, \text{lagged } Y\text{'s}, \text{other } Y\text{'s}$; i = 1, 2, 3, ..., n
Notations (time based) → (t = 1, 2, 3, ..., T) time series
1. $y_t$ → endogenous variables
(M endogenous variables in the SEM)
2. predetermined variables (x)
2 types:
1. truly exogenous variables
2. lagged y's
K → number of predetermined variables
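$By_t + \Gamma x_t = u_t$; B → M × M, $y_t$ → M × 1, Γ → M × K, $x_t$ → K × 1, $u_t$ → M × 1; B, Γ = structural parameters
$$\begin{bmatrix}1 & \cdots & \beta_{1M}\\ \vdots & \ddots & \vdots\\ \beta_{M1} & \cdots & 1\end{bmatrix}\begin{bmatrix}y_{1t}\\ \vdots\\ y_{Mt}\end{bmatrix} + \begin{bmatrix}\gamma_{11} & \cdots & \gamma_{1K}\\ \vdots & \ddots & \vdots\\ \gamma_{M1} & \cdots & \gamma_{MK}\end{bmatrix}\begin{bmatrix}x_{1t}\\ \vdots\\ x_{Kt}\end{bmatrix} = \begin{bmatrix}u_{1t}\\ \vdots\\ u_{Mt}\end{bmatrix}$$
for i = 1:
$y_{1t} + \beta_{12}y_{2t} + \dots + \beta_{1M}y_{Mt} + \gamma_{11}x_{1t} + \gamma_{12}x_{2t} + \dots + \gamma_{1K}x_{Kt} = u_{1t}$
Simultaneity Bias (SB)
- use of OLS when the RHS variables include Y's
- OLS is biased and inconsistent
$By_t + \Gamma x_t = u_t$; $y_t$ → vector of endogenous variables; M → number of endogenous variables in the SEM; $x_t$ → vector of predetermined variables; K → number of predetermined variables
Simultaneity bias → OLS is inconsistent in the presence of endogenous variables on the RHS.
Reduced form model (RFM): $y_t = \Pi x_t + v_t$, with $\Pi = -B^{-1}\Gamma$ and $v_t = B^{-1}u_t$. OLS is BLUE for the RFM, giving $\hat\pi_{ij}$ for i = 1, 2, ..., M; j = 1, 2, ..., K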
Problem of identification
per equation:
“How can we recover the B's and Γ's from the Π's?” 3 states of identifiability of an equation:
o exactly identified (unique solution)
o over identified (multiple solution)
o unidentified (no solution)
Conditions of identifiability
- order condition (necessary)
- rank condition (necessary and sufficient)
Order condition
“An equation is identifiable if the number of variables excluded from it is at least equal to the
number of endogenous variables in the SEM minus one.”
Notationally,
M - # of endogenous variables in the SEM
K - # of predetermined variables in the SEM
m - # of endogenous variables in the equation
k - # of predetermined variables in the equation
(M + K) − (m + k) ≥ M − 1
⇔ K − k ≥ m − 1
K − k > m − 1 → over-identified
K − k = m − 1 → exactly identified
K − k < m − 1 → unidentified
Rank condition
“An equation is identifiable if and only if there exists at least one nonzero determinant of order (M − 1) × (M − 1)
formed from the coefficients of the variables excluded from that equation but included in the other equations.”
L. Klein Closed-Economy Basic Model
7 equations: structural model
1. $C_t = a_1 + b_1Y_t + c_1T_t + d_1R_t + u_{1t}$ (consumption function)
2. $I_t = a_2 + b_2Y_t + c_2R_t + u_{2t}$ (investment function)
3. $Y_t = C_t + I_t + G_t$ (identity)
4. $M_t = a_4 + b_4Y_t + c_4R_t + d_4P_t + u_{4t}$ (liquidity preference)
5. $Y_t = \alpha_5 + \beta_5N_t + u_{5t}$ (production function)
6. $N_t = a_6 + b_6W_t + c_6P_t + u_{6t}$ (labor demand function)
7. $N_t = a_7 + b_7W_t + c_7P_t + u_{7t}$ (labor supply function)
Standard form:
$By_t + \Gamma x_t = u_t \rightarrow y_t = \Pi x_t + v_t$. Identification issue: how are we going to retrieve the Γs and βs from the πs?
Maddala System
Included = 1
Excluded = 0
Eqn# C I N P R Y W G T M K-k m-1 Status
1 1 0 0 0 1 1 0 0 1 0 2 2 Exact
2 0 1 0 0 1 1 0 0 0 0 3 2 Over
3 1 1 0 0 0 1 0 1 0 0 2 2 Exact
4 0 0 0 1 1 1 0 0 0 1 2 2 Exact
5 0 0 1 0 0 1 0 0 0 0 3 1 Over
6 0 0 1 1 0 0 1 0 0 0 3 2 Over
7 0 0 1 1 0 0 1 0 0 0 3 2 Over
K → number of predetermined variables
k → number of included predetermined variables
m → number of included endogenous variables
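The order-condition columns of the table can be reproduced mechanically. This Python sketch encodes which variables each equation includes, copied from the table above, and compares K − k with m − 1. It checks only the necessary (order) condition.

```python
ENDOG = ["C", "I", "N", "P", "R", "Y", "W"]   # endogenous variables (M = 7)
PREDET = ["G", "T", "M"]                      # predetermined variables (K = 3)
TABLE = {  # variables included in each equation (the 1s in the table)
    1: {"C", "R", "Y", "T"},
    2: {"I", "R", "Y"},
    3: {"C", "I", "Y", "G"},
    4: {"P", "R", "Y", "M"},
    5: {"N", "Y"},
    6: {"N", "P", "W"},
    7: {"N", "P", "W"},
}

def order_condition(included, predet=PREDET, endog=ENDOG):
    """Return (K - k, m - 1, status) for one equation."""
    K = len(predet)
    k = sum(1 for v in predet if v in included)   # included predetermined
    m = sum(1 for v in endog if v in included)    # included endogenous
    if K - k > m - 1:
        status = "Over"
    elif K - k == m - 1:
        status = "Exact"
    else:
        status = "Under"
    return K - k, m - 1, status

for eq, included in TABLE.items():
    print(eq, *order_condition(included))
```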
• Identity: no coefficients to estimate; no identification problem
• Each equation should be at least exactly identified (exact or over)
• By the order condition, all equations are identified (exact or over)
• An equation is identified if and only if there exists at least one non-singular (M − 1) × (M − 1)
sub-matrix formed from the coefficients of the variables excluded from that equation but included in the other equations.
Rank Condition
Non-singular: the determinant is not equal to zero.
The matrix is singular if, for example, a row or a column is filled with zeros only.
The determinant is evaluated via Laplace (cofactor) expansion.
[Matrix: 6 × 6 array of 0/1 entries for the variables excluded from the equation under test (one row per other equation), checked for a zero determinant]
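The rank condition can also be sketched in Python. The real structural coefficients are unknown here, so this sketch replaces every included coefficient with a distinct prime number as a generic stand-in. This is an assumption and a heuristic: it detects failures forced by the pattern of zeros, which is all the table can tell us. For each equation it builds the matrix of excluded-variable coefficients taken from the other equations. It then searches for a nonzero (M − 1) × (M − 1) determinant, computed by Laplace expansion.

```python
from itertools import combinations

VARS = ["C", "I", "N", "P", "R", "Y", "W", "G", "T", "M"]   # endogenous, then predetermined
TABLE = {  # variables included in each equation (the 1s in the table)
    1: {"C", "R", "Y", "T"},
    2: {"I", "R", "Y"},
    3: {"C", "I", "Y", "G"},
    4: {"P", "R", "Y", "M"},
    5: {"N", "Y"},
    6: {"N", "P", "W"},
    7: {"N", "P", "W"},
}

def first_primes(count):
    found, candidate = [], 2
    while len(found) < count:
        if all(candidate % p for p in found):
            found.append(candidate)
        candidate += 1
    return found

# Hypothetical stand-in coefficients: a distinct prime for every included variable
CELLS = [(eq, v) for eq in sorted(TABLE) for v in VARS if v in TABLE[eq]]
COEF = dict(zip(CELLS, first_primes(len(CELLS))))

def det_laplace(a):
    """Determinant by Laplace (cofactor) expansion along the first row."""
    if len(a) == 1:
        return a[0][0]
    total = 0
    for col, entry in enumerate(a[0]):
        if entry == 0:
            continue                                  # zero entries add nothing
        minor = [row[:col] + row[col + 1:] for row in a[1:]]
        total += (-1) ** col * entry * det_laplace(minor)
    return total

def rank_condition(eq):
    """True if some (M-1) x (M-1) submatrix of excluded-variable coefficients is non-singular."""
    excluded = [v for v in VARS if v not in TABLE[eq]]
    rows = [[COEF.get((other, v), 0) for v in excluded]
            for other in sorted(TABLE) if other != eq]  # one row per other equation
    size = len(rows)                                    # M - 1
    for cols in combinations(range(len(excluded)), size):
        if det_laplace([[row[c] for c in cols] for row in rows]) != 0:
            return True
    return False                                        # also covers len(excluded) < M - 1

for eq in sorted(TABLE):
    print(eq, rank_condition(eq))
```

On the table as transcribed, the sketch gives True for equation 1 but False for equations 6 and 7. Those two equations contain exactly the same variables (N, P, W), so each one's row in the other's submatrix is all zeros, even though the order condition labels them "Over".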
Hausman test for:
• Simultaneity
• Exogeneity
Hausman test for Simultaneity
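$Y_{1t} = \beta_1 + \beta_2Y_{2t} + \beta_3X_{1t} + u_{1t}$
Step 1: Get the reduced-form residuals of $Y_2$:
$Y_{2t} = \alpha_1 + \alpha_2X_{1t} + \alpha_3X_{2t} + v_{2t}$ (all predetermined variables of the system, including $X_{2t}$, which is excluded from the first equation); $\hat v_{2t}$ = RF residual of $Y_2$
Step 2: Augment the original model with $\hat v_{2t}$: $Y_{1t} = \beta_1 + \beta_2Y_{2t} + \beta_3X_{1t} + \delta_1\hat v_{2t} + w_t$
If $\hat\delta_1$ is significant
→ Reject $H_0: \delta_1 = 0$ → simultaneity is present (use a simultaneous-equation method such as 2SLS instead of OLS).
If $\hat\delta_1$ is not significant, OLS can still be used despite $Y_{2t}$ (endogenous) appearing on the RHS.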
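The two Hausman steps can be simulated. The Python sketch below generates hypothetical data in which $u_{1t}$ and $v_{2t}$ share a common shock, so $Y_{2t}$ is endogenous. It assumes, as is standard, that the reduced form of $Y_2$ includes all predetermined variables, here $X_{1t}$ and a hypothetical $X_{2t}$ excluded from the first equation. A small general OLS routine (Gauss-Jordan inversion of X'X, written for illustration) supplies the coefficients and standard errors.

```python
import math
import random

def ols(X, y):
    """OLS via (X'X)^{-1} X'y; X is a list of rows (include 1.0 for the intercept).

    Returns (coefficients, standard errors, residuals).
    """
    n, k = len(X), len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(k)]
    # Gauss-Jordan inversion of X'X with partial pivoting
    aug = [xtx[i] + [1.0 if i == j else 0.0 for j in range(k)] for i in range(k)]
    for c in range(k):
        p = max(range(c, k), key=lambda r: abs(aug[r][c]))
        aug[c], aug[p] = aug[p], aug[c]
        pivot = aug[c][c]
        aug[c] = [v / pivot for v in aug[c]]
        for r in range(k):
            if r != c:
                f = aug[r][c]
                aug[r] = [a - f * b for a, b in zip(aug[r], aug[c])]
    inv = [row[k:] for row in aug]
    coef = [sum(inv[i][j] * xty[j] for j in range(k)) for i in range(k)]
    resid = [yi - sum(b * v for b, v in zip(coef, row)) for row, yi in zip(X, y)]
    s2 = sum(e * e for e in resid) / (n - k)
    se = [math.sqrt(s2 * inv[i][i]) for i in range(k)]
    return coef, se, resid

random.seed(1)
n = 2000
x1 = [random.gauss(0, 1) for _ in range(n)]
x2 = [random.gauss(0, 1) for _ in range(n)]          # excluded from equation 1
v2 = [random.gauss(0, 1) for _ in range(n)]
u1 = [v + random.gauss(0, 1) for v in v2]            # u1 correlated with v2
y2 = [a + b + v for a, b, v in zip(x1, x2, v2)]      # reduced form of Y2
y1 = [1 + 0.5 * b + 1.0 * a + u for a, b, u in zip(x1, y2, u1)]  # structural equation 1

# Step 1: reduced-form residuals of Y2 (regressed on all predetermined variables)
_, _, v2_hat = ols([[1.0, a, b] for a, b in zip(x1, x2)], y2)
# Step 2: augment equation 1 with v2_hat and t-test its coefficient delta_1
coef, se, _ = ols([[1.0, b, a, v] for a, b, v in zip(x1, y2, v2_hat)], y1)
t_delta = coef[3] / se[3]
print(round(t_delta, 1))
```

A large |t| for $\hat\delta_1$, as in this simulated case, leads to rejecting $H_0: \delta_1 = 0$.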
Hausman Test for Exogeneity
Step 1: Get the fitted value ($\hat Y$) of the endogenous variable on the right-hand side,
$\hat Y_{2t}$, and augment the original model: (original model) $+ \delta_1\hat Y_{2t} + w_t$; test $H_0: \delta_1 = 0$. If there are several restrictions → Wald test
Simultaneous Equation Method
- at this point the equations in the system are identified
2 approaches to solving a structural SEM
1. Full information (aka System Approach)
- estimating all parameters in the SEM in one fell swoop
- all equations solved together simultaneously
- a risky proposition (solving the entire system), hence unpopular
- any specification error in one equation of the system will be transmitted to the entire
system (contagious)
o OVB
o Wrong functional form
o Non normality of the error etc.
- Only do this if you have very high confidence in your model
Techniques include:
1. full information maximum likelihood (FIML)
sometimes you can't formulate the likelihood function because of some peculiarities of the system,
due to transmitted errors
2. 3SLS (three-stage least squares)
3-dimensional model (L W H)
rarely used
3. seemingly unrelated regression equations (SURE)
errors of the equations are contemporaneously correlated
multiple-equation solving technique
4. joint generalized least squares (JGLS)
SEM version of GLS
2. Limited information (aka single-equation approach)
a. Solve one equation after the other
b. Not susceptible to errors in other equations
c. Specification error confined to one equation only
Techniques used:
1. OLS
a. When systems are recursive
b. OLS is BLUE
c. Often maligned as something to be avoided in SEMs
Whenever B is triangular
$B = \begin{bmatrix}1 & 0 & 0\\ \beta_{21} & 1 & 0\\ \beta_{31} & \beta_{32} & 1\end{bmatrix}$ → matrix of endogenous-variable coefficients
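E.g.
$Y_{1t} = \alpha_1 + \gamma_{11}X_{1t} + \gamma_{12}X_{2t} + u_{1t}$ (no endogenous variable on the RHS)
$Y_{2t} = \alpha_2 + \beta_{21}Y_{1t} + \gamma_{21}X_{1t} + \gamma_{22}X_{2t} + u_{2t}$
$Y_{3t} = \alpha_3 + \beta_{31}Y_{1t} + \beta_{32}Y_{2t} + \gamma_{31}X_{1t} + \gamma_{32}X_{2t} + u_{3t}$
$Y_{1t}$ depends only on $u_{1t}$; $Y_{2t}$ on $u_{1t}$ and $u_{2t}$. So, if the errors are uncorrelated across equations, each RHS endogenous variable is uncorrelated with the error of the equation in which it appears.
2. ILS (indirect least squares)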
a. Applicable to exactly identified equations
b. A mathematical solution exploiting the one-to-one correspondence between the πs and the βs
3. Limited information maximum likelihood
a. Single equation counterpart of FIML
4. 2SLS
a. best thing that happened to SEM
b. Henri Theil and Robert Basmann
Stage 1: $\hat y$ (the endogenous variable on the RHS) is determined from its reduced form, i.e. OLS on all predetermined variables.
Stage 2: the structural equation is estimated with $\hat y$ from the first stage proxying for the
y on the RHS.
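The two 2SLS stages can be sketched with hypothetical simulated data. There is one RHS endogenous variable and one excluded predetermined variable, which makes 2SLS identical to the simple IV estimator. The structural error is built to be correlated with $y_2$, so plain OLS is inconsistent while 2SLS recovers the true slope of 0.5. Standard errors from the second-stage OLS are not the correct 2SLS standard errors; they are not computed here.

```python
import random

def simple_ols(x, y):
    """Intercept and slope from OLS of y on a constant and x."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    b = sum((a - xbar) * (c - ybar) for a, c in zip(x, y)) / sum((a - xbar) ** 2 for a in x)
    return ybar - b * xbar, b

random.seed(3)
n = 5000
x = [random.gauss(0, 1) for _ in range(n)]            # predetermined variable
v = [random.gauss(0, 1) for _ in range(n)]
u = [vi + random.gauss(0, 1) for vi in v]             # structural error, correlated with v
y2 = [xi + vi for xi, vi in zip(x, v)]                # endogenous RHS variable
y1 = [1 + 0.5 * a + b for a, b in zip(y2, u)]         # structural equation, true slope 0.5

# Stage 1: reduced form of y2 on the predetermined variable; keep fitted values
a1, b1 = simple_ols(x, y2)
y2_hat = [a1 + b1 * xi for xi in x]
# Stage 2: structural equation with y2_hat proxying for y2
_, slope_2sls = simple_ols(y2_hat, y1)
_, slope_ols = simple_ols(y2, y1)                     # inconsistent: y2 correlated with u
print(round(slope_2sls, 2), round(slope_ols, 2))
```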
Dynamic Econometric models
Models concerned with the consequences of economic actions over time
→ Static: $Y_t = \beta_1 + \sum_{j=2}^{k}\beta_jX_{jt} + u_t$
→ Dynamic: a lapse of time before the impact is felt
e.g. Y → target variable
X → proxy variable
Used because consequences are rarely instantaneous
1. DL(p) → distributed lag model: $Y_t = \alpha + \beta_0X_t + \beta_1X_{t-1} + \dots + \beta_pX_{t-p} + u_t$; α → endowment (autonomous Y); $\beta_0$ → impact multiplier; $\beta_i$ → intermediate multiplier, i = 1, 2, 3, ..., p
$Y_t = \alpha + \sum_{i=0}^{p}\beta_iX_{t-i} + u_t$
$\beta^*_i$ = interim multiplier $= \sum_{j=0}^{i}\beta_j$
$\beta^*_p = \beta = \sum_{j=0}^{p}\beta_j$ = long-run multiplier (total effect)
2. AR(q) → Autoregressive model
$Y_t = \delta + \sum_{j=1}^{q}\alpha_jY_{t-j} + u_t$
$Y_t = \delta + \alpha_1Y_{t-1} + \alpha_2Y_{t-2} + \dots + \alpha_qY_{t-q} + u_t$
What you are today is a function of what you were before
3. ARDL(q, p)
$Y_t = \delta + \sum_{i=0}^{p}\beta_iX_{t-i} + \sum_{j=1}^{q}\alpha_jY_{t-j} + u_t$
What you are today is influenced by what you were before plus how the authorities molded you to be how you are today
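The DL(p) multipliers defined above are simple sums of the lag coefficients. A minimal Python sketch with hypothetical estimates $\hat\beta_0, \dots, \hat\beta_3$:

```python
def dl_multipliers(betas):
    """Impact, interim and long-run multipliers of a finite DL(p) model.

    betas = [beta_0, beta_1, ..., beta_p] (estimated lag coefficients)
    """
    impact = betas[0]                  # beta_0: effect in the same period
    interim, running = [], 0.0
    for b in betas:
        running += b
        interim.append(running)        # beta*_i = beta_0 + ... + beta_i
    long_run = interim[-1]             # beta = sum of all lag coefficients
    return impact, interim, long_run

impact, interim, long_run = dl_multipliers([0.4, 0.3, 0.2, 0.1])   # hypothetical DL(3)
```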
Focus: DL(p) model
• Finite DL model (p is finite)
• Infinite DL model (p → ∞)
$Y_t = \alpha + \sum_{i=0}^{p}\beta_iX_{t-i} + u_t$ → DL(p)
p → finite
→ OLS: there are no endogenous variables on the RHS (estimation method)
• Alt-Tinbergen Method
o Sequential OLS
o Bottom-up
o Starts with a simple regression and builds up to a more complicated model
• Hendry Top-down Method
o Starts with a big model
o AIC (choose the model p* with the smallest AIC)
• Almon Model
• Koyck Model
• Adaptive Expectation
• Rational Expectation
Infinite DL Models
$Y_t = \alpha + \sum_{i=0}^{\infty}\beta_iX_{t-i} + u_t$ → DL(∞)
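$\beta_i$ → intermediate multiplier, i = 0, 1, 2, 3, ...
$Y_t = \alpha + \beta_0X_t + \beta_1X_{t-1} + \dots + \beta_iX_{t-i} + \dots + u_t$
Koyck Model
$\beta_i = \beta_0\lambda^i$, i = 0, 1, 2, 3, ...
λ → rate of decay (0 < λ < 1)
1. $Y_t = \alpha + \beta_0X_t + \beta_0\lambda X_{t-1} + \beta_0\lambda^2X_{t-2} + \dots + \beta_0\lambda^iX_{t-i} + \dots + u_t$ → infinite distributed lag model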
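Lag (1) by 1 period, multiply the result by λ, and subtract the outcome from (1):
$Y_t - \lambda Y_{t-1} = \alpha(1-\lambda) + \beta_0X_t + u_t - \lambda u_{t-1}$
$Y_t = \alpha(1-\lambda) + \beta_0X_t + \lambda Y_{t-1} + u_t - \lambda u_{t-1}$
$Y_t = \alpha^* + \beta_0X_t + \lambda Y_{t-1} + v_t$ → ARDL(1, 0), where $\alpha^* = \alpha(1-\lambda)$ and $v_t = u_t - \lambda u_{t-1}$
Three issues:
• Autocorrelation
• Simultaneity bias
• Non-linear in parameters
SEM → static/contemporaneous
$Y_{t-1}$ → is endogenous in the dynamic model (it is correlated with $v_t$ through $u_{t-1}$)
$\alpha(1-\lambda)$ → non-linear in parameters
The Durbin-Watson test cannot be used because of the presence of the lagged dependent variable $Y_{t-1}$ on the RHS.
We cannot use OLS.
Estimation of $\alpha^*$, $\beta_0$, λ: OLS is out! Test for autocorrelation with:
Durbin h-test → presence/absence of autocorrelation
$h = \hat\rho\sqrt{\dfrac{T}{1 - T\,\widehat{\mathrm{var}}(\hat\lambda)}}$; $\hat\rho \approx 1 - \dfrac{d}{2}$; d = Durbin-Watson statistic; $\widehat{\mathrm{var}}(\hat\lambda)$ = estimated variance of the coefficient on $Y_{t-1}$; h ~ N(0, 1) in large samples (defined only if $T\,\widehat{\mathrm{var}}(\hat\lambda) < 1$)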
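The Durbin h statistic is a one-line formula once d, T and the variance of the coefficient on $Y_{t-1}$ are available. A minimal Python sketch with hypothetical inputs:

```python
import math

def durbin_h(d, T, var_lambda):
    """Durbin h = rho_hat * sqrt(T / (1 - T * var(lambda_hat))), with rho_hat = 1 - d/2.

    var_lambda is the estimated variance of the coefficient on Y_{t-1}.
    Under H0 (no autocorrelation), h is approximately N(0, 1) in large samples.
    """
    if T * var_lambda >= 1:
        raise ValueError("h is undefined when T * var(lambda_hat) >= 1")
    rho_hat = 1 - d / 2
    return rho_hat * math.sqrt(T / (1 - T * var_lambda))

h = durbin_h(d=1.6, T=100, var_lambda=0.004)   # hypothetical values
print(round(h, 3))   # 2.582 -> |h| > 1.96, reject H0 at the 5% level
```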
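Instrumental Variable / Proxy Method
• Liviatan Method
• $Y_{t-1}$ has to be proxied by $X_{t-1}$
SRF: $\hat Y_t = \hat\alpha^* + \hat\beta_0X_t + \hat\lambda Y_{t-1}$
$RSS = \sum_{t=2}^{T}\big(\underbrace{Y_t - \hat\alpha^* - \hat\beta_0X_t - \hat\lambda Y_{t-1}}_{\hat v_t}\big)^2$
FOC: $\dfrac{\partial RSS}{\partial\hat\alpha^*} = \dfrac{\partial RSS}{\partial\hat\beta_0} = \dfrac{\partial RSS}{\partial\hat\lambda} = 0$ → normal equations (in Liviatan's IV version, $X_{t-1}$ replaces $Y_{t-1}$ as the multiplier in the third normal equation)
$\hat\beta_i = \hat\beta_0\hat\lambda^i$
$\beta^*_i$ = interim multiplier → cumulative effect: $\beta^*_i = \sum_{j=0}^{i}\hat\beta_j = \hat\beta_0 + \hat\beta_1 + \dots + \hat\beta_i$
$\beta^*_p = \beta$ = LR multiplier → total effect, p → ∞
$\beta = \sum_{j=0}^{\infty}\hat\beta_j = \dfrac{\hat\beta_0}{1-\hat\lambda} = \lim_{p\to\infty}\sum_{j=0}^{p}\hat\beta_j$ → sum of an infinite (geometric) series
Median lag → amount of time (lapse of time) within which 50% of the total effect is
manifested
Mean lag → average lapse of time (the lags weighted by the $\beta_i$) over which the total effect is felt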
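For the Koyck scheme, these quantities have closed forms. The long-run multiplier is $\beta_0/(1-\lambda)$, the median lag is $-\ln 2/\ln\lambda$, and the mean lag is $\lambda/(1-\lambda)$. A minimal Python sketch with hypothetical estimates $\hat\beta_0 = 2$, $\hat\lambda = 0.5$:

```python
import math

def koyck_weights(beta0, lam, n_lags):
    """beta_i = beta0 * lam**i for i = 0 .. n_lags-1 (0 < lam < 1)."""
    return [beta0 * lam ** i for i in range(n_lags)]

def koyck_summary(beta0, lam):
    long_run = beta0 / (1 - lam)                 # sum of the infinite geometric series
    median_lag = -math.log(2) / math.log(lam)    # lag by which 50% of the total effect is felt
    mean_lag = lam / (1 - lam)                   # sum(i * beta_i) / sum(beta_i)
    return long_run, median_lag, mean_lag

beta0, lam = 2.0, 0.5                            # hypothetical estimates
weights = koyck_weights(beta0, lam, 4)           # [2.0, 1.0, 0.5, 0.25]
interim = [sum(weights[: i + 1]) for i in range(len(weights))]  # [2.0, 3.0, 3.5, 3.75]
long_run, median_lag, mean_lag = koyck_summary(beta0, lam)      # 4.0, 1.0, 1.0
```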
Koyck Model
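$Y_t = \alpha + \sum_{i=0}^{\infty}\beta_iX_{t-i} + u_t$
With $\beta_i = \beta_0\lambda^i$; i = 0, 1, 2, 3, ...
→ ARDL(1, 0): $Y_t = \alpha^* + \beta_0X_t + \lambda Y_{t-1} + v_t$
Problems:
1. $\alpha^* = \alpha(1-\lambda)$ 2. $v_t = u_t - \lambda u_{t-1}$ → serially correlated (an MA(1) process if $u_t$ is white noise) 3. $Y_{t-1}$ → endogenous
OLS is out because it cannot give BLUE estimators under these problems.
Estimation:
1. IV method of Liviatan
• $X_{t-1}$ proxies for $Y_{t-1}$
2. 2SLS
For autocorrelation: use Durbin h
→ Use a large T for problem (1)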
ALMON MODEL
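$Y_t = \alpha + \sum_{i=0}^{p}\beta_iX_{t-i} + u_t$
p = finite (Hendry top-down); $\beta_i = a_0 + \sum_{j=1}^{m}a_j\,i^j$
- mth-degree polynomial in the lag i
Jan Kmenta
- provided the standard errors for the SRF
$\hat\beta_i = \hat a_0 + \sum_{j=1}^{m}\hat a_j\,i^j$
$\mathrm{var}(\hat\beta_i) = \sum_{j=0}^{m}i^{2j}\,\mathrm{var}(\hat a_j) + 2\sum_{j<s}i^{j+s}\,\mathrm{cov}(\hat a_j, \hat a_s)$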
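The Almon lag weights and their variances follow from the polynomial coefficients. The variance formula is the quadratic form $z'\,\mathrm{Cov}(\hat a)\,z$ with $z = (1, i, i^2, \dots, i^m)$. A minimal Python sketch; the coefficient estimates and their covariance matrix are hypothetical.

```python
def almon_weights(a, p):
    """beta_i = a_0 + a_1*i + ... + a_m*i**m for lags i = 0 .. p."""
    return [sum(aj * i ** j for j, aj in enumerate(a)) for i in range(p + 1)]

def almon_var(i, cov_a):
    """var(beta_hat_i) = z' Cov(a_hat) z with z = (1, i, i**2, ..., i**m).

    This expands to sum_j i**(2j) var(a_j) + 2 * sum_{j<s} i**(j+s) cov(a_j, a_s).
    """
    size = len(cov_a)
    z = [i ** j for j in range(size)]
    return sum(z[j] * cov_a[j][s] * z[s] for j in range(size) for s in range(size))

a_hat = [1.0, 0.5, -0.1]            # hypothetical 2nd-degree polynomial coefficients
weights = almon_weights(a_hat, 4)   # approx. [1.0, 1.4, 1.6, 1.6, 1.4]
cov_a = [[0.01, 0.0, 0.0], [0.0, 0.04, 0.0], [0.0, 0.0, 0.09]]   # hypothetical
v2 = almon_var(2, cov_a)            # 0.01 + 4*0.04 + 16*0.09 = 1.61
```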
Time series Econometrics
• Basic Concepts
o Application of econometrics to TS data
o Data on variables captured and recorded at regular intervals of time (e.g. annual,
quarterly, monthly, etc.)
o Historical data - frequently/massively available
Challenges in using TS data in research
• Autocorrelation
• Spurious regression (nonsensical)
• Random walk phenomenon
o For tomorrow's stock market price, the best prediction is today's closing price →
random walk forecasting
• ARCH effect (Autoregressive Conditional Heteroscedasticity) → conditional volatility in the
stock market
• Forecasting
• Non-stationarity of most variables
Stochastic process (SP) → collection of random variables ordered in time
e.g. $\{Y_t\}$, t = 1, 2, 3, ..., T
→ $\{Y_1, Y_2, \dots, Y_T\}$ DGP (Data Generating Process)
• Unknown mechanism that generates realizations of an SP
Realization → historical data (TS data)
$\{Y_t\}$ → $\{y_t\}$ (the SP and its realization). If $\{Y_t\}$ is a weakly stationary process, its first 2 moments are time-invariant, and the covariance depends only on the lag k.
i.e. $E(Y_t) = \mu$
$\mathrm{var}(Y_t) = E(Y_t - \mu)^2 = \sigma^2$
$\mathrm{cov}(Y_t, Y_{t-k}) = E[(Y_t - \mu)(Y_{t-k} - \mu)] = \gamma_k$
If strongly stationary: time-invariant in all moments
• White noise → most basic of all stationary SPs; building block of all TS models: $u_t$
• Random walk → most basic of all non-stationary stochastic processes: $Y_t = Y_{t-1} + u_t$, so $Y_t - Y_{t-1} = u_t$
$\Delta Y_t = u_t$ → white noise
Δ = differencing operator; Δ = 1 − L;
L = lag operator, with the property $LY_t = Y_{t-1}$
$\Delta Y_t = u_t = (1-L)Y_t = Y_t - LY_t$
$\Delta^2Y_t = \Delta(\Delta Y_t) = \Delta(Y_t - Y_{t-1}) = Y_t - Y_{t-1} - (Y_{t-1} - Y_{t-2}) = Y_t - 2Y_{t-1} + Y_{t-2} = (1-L)^2Y_t$
Unit roots → number of times an SP, say $\{Y_t\}$, has to be differenced to make it stationary
i.e. if $Y_t \sim I(d)$ → $Y_t$ is integrated of order d
d → number of unit roots in $Y_t$; $\Delta^dY_t \sim I(0)$ → stationary
Integrated SPs
• $Y_t \sim I(d)$; d → order of integration or # of unit roots in $Y_t$
The 1st difference of the CPI is inflation:
$Y_t = CPI_t$; $\Delta Y_t = Inf_t$; $\Delta Y_t = CPI_t - CPI_{t-1}$
$\Delta Inf_t = Inf_t - Inf_{t-1} \sim I(0)$
d = 0, 1, 2
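The difference operator and the I(1) behaviour of a random walk can be checked directly. A minimal Python sketch; the quadratic series and the simulated shocks are hypothetical.

```python
import random

def diff(series):
    """First difference: (1 - L) Y_t = Y_t - Y_{t-1} (one observation is lost)."""
    return [series[t] - series[t - 1] for t in range(1, len(series))]

# Second difference: Delta^2 Y_t = Y_t - 2 Y_{t-1} + Y_{t-2}
y = [1, 4, 9, 16, 25]
d1 = diff(y)        # [3, 5, 7, 9]
d2 = diff(d1)       # [2, 2, 2] -> constant after differencing twice

# A random walk Y_t = Y_{t-1} + u_t is I(1): differencing once returns the white noise u_t
random.seed(0)
u = [random.gauss(0, 1) for _ in range(200)]
rw = [0.0]
for shock in u:
    rw.append(rw[-1] + shock)
recovered = diff(rw)
```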
Random walks and White noise process
RWM
$Y_t = Y_{t-1} + u_t$; $\Delta Y_t = u_t$
3 types of random walks
1. RWM (random walk model, no drift)
$Y_t = \rho Y_{t-1} + u_t$; ρ = 1
2. RWMD (with drift)
$Y_t = \delta + \rho Y_{t-1} + u_t$, where δ is the drift parameter; ρ = 1
3. RWMDT (with drift and deterministic trend): $Y_t = \delta + \rho Y_{t-1} + \gamma t + u_t$; ρ = 1
When two or more non-stationary variables are regressed on one another, the result is likely to be spurious (unless they are cointegrated).
Granger-Newbold Rule (aka Classical)
→ symptom of spurious regression
If $R^2 >$ Durbin-Watson statistic (d) → spurious
Unit root testing
→ provided by Dickey and Fuller
Dickey-Fuller Test
Auxiliary regression:
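RWM:
$Y_t = \rho Y_{t-1} + u_t$ ⇒ $\Delta Y_t = (\rho - 1)Y_{t-1} + u_t$ ⇒ $\Delta Y_t = \alpha Y_{t-1} + u_t$, with α = ρ − 1
$H_0: \alpha = 0$ → means ρ = 1 (unit root); $H_1: \alpha < 0$ → (stationary)
The usual critical values are derived from the normal (t) distribution, so
$t = \hat\alpha / se(\hat\alpha)$ → not valid under $H_0$
That's why they developed
• the τ (tau) distribution (aka Dickey-Fuller distribution)
Shortcoming: in $\Delta Y_t = \alpha Y_{t-1} + u_t$, $u_t$ may be highly (serially) correlated, but Dickey and Fuller assumed that it is white noise.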
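The DF auxiliary regression $\Delta Y_t = \alpha Y_{t-1} + u_t$ (no drift, no trend) is a single-regressor OLS through the origin. The Python sketch below computes $\hat\alpha$ and τ for two hypothetical simulated series: a random walk and a stationary AR(1) with ρ = 0.5. The τ values must be compared with Dickey-Fuller (τ) critical values; this sketch does not compute those.

```python
import math
import random

def dickey_fuller_tau(y):
    """Estimate Delta Y_t = alpha * Y_{t-1} + u_t by OLS (no drift, no trend).

    Returns (alpha_hat, tau). tau must be compared with Dickey-Fuller (tau)
    critical values, not with the usual t table.
    """
    dy = [y[t] - y[t - 1] for t in range(1, len(y))]
    ylag = y[:-1]
    sxx = sum(v * v for v in ylag)
    alpha = sum(a * b for a, b in zip(ylag, dy)) / sxx
    resid = [b - alpha * a for a, b in zip(ylag, dy)]
    s2 = sum(e * e for e in resid) / (len(dy) - 1)
    return alpha, alpha / math.sqrt(s2 / sxx)

random.seed(42)
shocks = [random.gauss(0, 1) for _ in range(500)]
rw, ar = [0.0], [0.0]
for e in shocks:
    rw.append(rw[-1] + e)            # rho = 1: random walk (unit root)
    ar.append(0.5 * ar[-1] + e)      # rho = 0.5: stationary AR(1)
alpha_rw, tau_rw = dickey_fuller_tau(rw)
alpha_ar, tau_ar = dickey_fuller_tau(ar)
print(round(tau_rw, 2), round(tau_ar, 2))
```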
Augmented Dickey-Fuller Test (ADF)
RWM:
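$\Delta Y_t = \alpha Y_{t-1} + \sum_{i=1}^{p}\beta_i\Delta Y_{t-i} + \varepsilon_t$
RWMD:
$\Delta Y_t = \delta + \alpha Y_{t-1} + \sum_{i=1}^{p}\beta_i\Delta Y_{t-i} + \varepsilon_t$
RWMDT:
$\Delta Y_t = \delta + \alpha Y_{t-1} + \gamma t + \sum_{i=1}^{p}\beta_i\Delta Y_{t-i} + \varepsilon_t$
Level series → original time series that we're investigating
$u_t \sim AR(p)$: the current error has a long memory; it is related to past errors
$u_t = \rho_1u_{t-1} + \rho_2u_{t-2} + \dots + \rho_pu_{t-p} + \varepsilon_t$
Alternative unit root tests: PP, KPSS, ADF-GLS, DP, NP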
Cointegration Analysis
It is a property of 2 or more non-stationary variables to be linked together by a long-run (LR) equilibrium
relationship
Robert Engle and Clive Granger (1987)
→ Framework for cointegration analysis, used to check whether a regression is spurious
Augmented Engle-Granger Test (AEG)
If $X_t, Y_t \sim I(1)$ and, in their SRF
$\hat Y_t = \hat\beta_1 + \hat\beta_2X_t$, $\hat u_t = Y_t - \hat Y_t \sim I(0)$, then $X_t$ and $Y_t$ are cointegrated
Stage 1: ADF on $X_t$ and $Y_t$, to check that both are I(1)
We perform a unit root test on both variables and make sure that they are non-stationary.
Stage 2: Run OLS of $Y_t$ on $X_t$ to get the SRF and obtain $\hat u_t$; run ADF on $\hat u_t$; if $\hat u_t \sim I(0)$ → X and Y are
cointegrated (the residual-based test uses Engle-Granger critical values, not the standard DF ones)
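Stage 2 of the procedure can be sketched with hypothetical simulated data. $X_t$ is a random walk, and $Y_t = 1 + 2X_t + e_t$ with white-noise $e_t$, so the two series are cointegrated by construction. The Stage 1 unit-root checks are omitted. The DF τ on the residuals should be compared with Engle-Granger critical values, which are more negative than the standard DF ones.

```python
import math
import random

def ols_simple(x, y):
    """Intercept and slope from OLS of y on a constant and x."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    b2 = sum((a - xbar) * (b - ybar) for a, b in zip(x, y)) / sum((a - xbar) ** 2 for a in x)
    return ybar - b2 * xbar, b2

def df_tau(series):
    """tau from Delta u_t = alpha * u_{t-1} + e_t (no intercept; residuals have mean zero)."""
    d = [series[t] - series[t - 1] for t in range(1, len(series))]
    lag = series[:-1]
    sxx = sum(v * v for v in lag)
    alpha = sum(a * b for a, b in zip(lag, d)) / sxx
    resid = [b - alpha * a for a, b in zip(lag, d)]
    s2 = sum(e * e for e in resid) / (len(d) - 1)
    return alpha / math.sqrt(s2 / sxx)

random.seed(7)
n = 500
x = [0.0]
for _ in range(n - 1):
    x.append(x[-1] + random.gauss(0, 1))                 # X_t is a random walk, I(1)
y = [1.0 + 2.0 * xt + random.gauss(0, 1) for xt in x]    # Y_t cointegrated with X_t
b1, b2 = ols_simple(x, y)                                # SRF: Y_hat = b1 + b2 * X
u_hat = [yt - b1 - b2 * xt for xt, yt in zip(x, y)]
tau = df_tau(u_hat)                                      # strongly negative -> u_hat ~ I(0)
print(round(b2, 3), round(tau, 1))
```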
Short-run dynamics
ECM: Error Correction model
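• Granger Representation Theorem
- If the variables used in a regression are cointegrated, an ECM representation
of the LR model is possible.
In k = 2:
$\Delta Y_t = \alpha_0 + \alpha_1\Delta X_t + \alpha_2\hat u_{t-1} + \varepsilon_t$; $\hat\alpha_2$ → error correction coefficient (expected to be negative: the speed of adjustment toward the LR equilibrium)
Cointegration and Error Correction
VAR(p) → Christopher Sims
Vector Autoregression
Let $Y_t' = [Y_{1t}\; Y_{2t}\; \dots\; Y_{kt}]$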
All variables are endogenous
$Y_t = \mu_t + A_1Y_{t-1} + A_2Y_{t-2} + \dots + A_pY_{t-p} + \varepsilon_t$; the A's are matrices.
OLS per equation is BLUE
AR(p): the current variable is related to its past values
$\Delta Y_t = \mu_t + \Pi Y_{t-1} + \sum_{i=1}^{p-1}\Gamma_i\Delta Y_{t-i} + \varepsilon_t$
Johansen
Eigenvalue → λ-max test → individual test
"Trace" test → cumulative testing
Identification of the cointegrating vectors
• Cointegrating vector → economic theory; long-run equation
• $\Pi = \alpha\beta'$
• β → matrix of cointegrating vectors; α → adjustment (error-correction) coefficients
Johansen → more versatile; can be used for more than two variables