Lecture notes to Stock and Watson chapter 4
Introductory linear regression
Tore Schweder
August 2008
LN3, 25/08
Regression
"Regression" is due to Francis Galton (1822-1911): how is a son's height related to his father's height?
Regression towards the mean
Figure: Galton's original diagram. Parents-child pairs by height.
The geometry of linear regression
Figure: The regression curve is $y = f(x) = E[Y \mid X = x]$.
The linear regression model
Question: how is a response variable Y related to a stimulus variable X (explanatory/control variable)? Assuming a linear relation,
1. what is the slope?
2. is the slope positive?
3. how well does the estimated line fit the observed data?
Example: Y = growth in GDP, X = inflation in the previous year.
D (data): a sample of size $n$ of pairs of an explanatory variable $X$ and a response variable $Y$.
M (model): $(X_1, Y_1), \dots, (X_n, Y_n)$ is an iid random sample from an infinite population, with
$$Y = \beta_0 + \beta_1 X + u, \qquad E[Y \mid X = x] = \beta_0 + \beta_1 x, \qquad E[u] = 0, \qquad \operatorname{cov}(u, X) = 0.$$
$\beta_0$ and $\beta_1$ are parameters.
OLS
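Least squares and regression were known to Laplace and Gauss (80 years before Galton).
$$(\hat\beta_0, \hat\beta_1) = \arg\min_{b_0, b_1} \sum_{i=1}^{n} \bigl(Y_i - (b_0 + b_1 X_i)\bigr)^2$$
$$\hat\beta_1 = \frac{s_{XY}}{s_X^2} = r\,\frac{s_Y}{s_X}, \qquad \hat\beta_0 = \bar Y - \hat\beta_1 \bar X$$
Here $s_{XY}$ is the sample covariance, $s_X$ and $s_Y$ are the sample standard deviations, and $r$ is the sample correlation of $X$ and $Y$.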
Predicted response given the stimulus: $\hat Y_i = \hat\beta_0 + \hat\beta_1 X_i$ is the fitted value.
(Empirical) residual: $\hat u_i = Y_i - \hat Y_i$ is the vertical distance from the fitted value to the observed value of $Y$. The residuals satisfy $\bar{\hat u} = 0$ and $r_{\hat u X} = 0$.
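The closed-form OLS estimates and the two residual properties can be checked numerically. The following is a minimal sketch in plain Python; the data values and the helper name `ols_fit` are made up for illustration and do not come from the lecture.

```python
from statistics import mean

def ols_fit(x, y):
    """Return (b0, b1) from the closed-form OLS formulas."""
    n = len(x)
    x_bar, y_bar = mean(x), mean(y)
    # Sample covariance of X and Y, and sample variance of X.
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / (n - 1)
    s_xx = sum((xi - x_bar) ** 2 for xi in x) / (n - 1)
    b1 = s_xy / s_xx           # slope: s_XY / s_X^2
    b0 = y_bar - b1 * x_bar    # intercept: Y-bar minus slope times X-bar
    return b0, b1

# Hypothetical illustrative data (not from the lecture).
x = [20.0, 22.0, 25.0, 27.0, 30.0, 33.0]
y = [0.0, 1.0, 0.0, 1.0, 1.0, 2.0]

b0, b1 = ols_fit(x, y)
fitted = [b0 + b1 * xi for xi in x]
residuals = [yi - fi for yi, fi in zip(y, fitted)]

# Both quantities should be zero up to floating-point rounding.
x_bar = mean(x)
resid_mean = mean(residuals)
resid_x_cov = sum(ui * (xi - x_bar) for ui, xi in zip(residuals, x))
print(b0, b1, resid_mean, resid_x_cov)
```

The zero mean and zero correlation with X are the first-order conditions of the least squares problem, so they hold for any data set with some variation in X, not only for this example.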
Measures of fit
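Homoscedasticity: $\operatorname{var}(Y \mid X = x) = \sigma^2$ does not depend on $x$.
Then
$$\hat\sigma^2 = \frac{1}{n-2} \sum_{i=1}^{n} \bigl(Y_i - \hat Y_i\bigr)^2 = \frac{1}{n-2}\,RSS$$
is an unbiased estimator of $\sigma^2$.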
$$TSS = \sum_{i=1}^{n} \bigl(Y_i - \bar Y\bigr)^2 = \sum_{i=1}^{n} \bigl(Y_i - \hat Y_i\bigr)^2 + \sum_{i=1}^{n} \bigl(\hat Y_i - \bar Y\bigr)^2 = RSS + ESS$$
$$R^2 = \frac{ESS}{TSS} = 1 - \frac{RSS}{TSS} = r_{XY}^2$$
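The decomposition TSS = RSS + ESS and the identity between R² and the squared correlation can be verified on a small data set. This is a sketch in plain Python on made-up numbers; none of the values come from the lecture.

```python
from statistics import mean

# Hypothetical illustrative data (not from the lecture).
x = [20.0, 22.0, 25.0, 27.0, 30.0, 33.0]
y = [0.0, 1.0, 0.0, 1.0, 1.0, 2.0]
n = len(x)

x_bar, y_bar = mean(x), mean(y)
s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
s_xx = sum((xi - x_bar) ** 2 for xi in x)
s_yy = sum((yi - y_bar) ** 2 for yi in y)

# OLS fit and fitted values.
b1 = s_xy / s_xx
b0 = y_bar - b1 * x_bar
fitted = [b0 + b1 * xi for xi in x]

tss = s_yy                                              # total sum of squares
rss = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))  # residual sum of squares
ess = sum((fi - y_bar) ** 2 for fi in fitted)           # explained sum of squares

r2 = ess / tss
r_xy = s_xy / (s_xx * s_yy) ** 0.5   # sample correlation of X and Y
sigma2_hat = rss / (n - 2)           # variance estimate under homoscedasticity

print(tss, rss + ess, r2, r_xy ** 2, sigma2_hat)
```

The divisor n - 2 in `sigma2_hat` reflects the two estimated parameters; the identity R² = r² is specific to regression with a single explanatory variable.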
Be aware of extreme observations!
Figure: The OLS line hinges on one point.
Figure: Heavy tails in the conditional distribution of $Y \mid X$ distort the OLS line.
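How strongly one point can move the OLS line is easy to see on constructed data. In the sketch below, ten points lie exactly on a line with slope 0.5, and a single extreme observation is then appended; all numbers and the helper name `ols_line` are hypothetical and chosen only for illustration.

```python
from statistics import mean

def ols_line(x, y):
    """Return (intercept, slope) of the OLS line through the points."""
    x_bar, y_bar = mean(x), mean(y)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = s_xy / s_xx
    return y_bar - b1 * x_bar, b1

# Ten points exactly on the line y = 1 + 0.5 x.
x_clean = [float(i) for i in range(1, 11)]
y_clean = [1.0 + 0.5 * xi for xi in x_clean]
b0_clean, b1_clean = ols_line(x_clean, y_clean)

# Append one extreme observation far out in x and far off the line.
x_out = x_clean + [30.0]
y_out = y_clean + [0.0]
b0_out, b1_out = ols_line(x_out, y_out)

print("slope without the extreme point:", b1_clean)
print("slope with the extreme point:   ", b1_out)
```

In this example the single added point is enough to turn a positive slope into a negative one, because it lies far from the mean of x and therefore carries a large weight in the least squares criterion.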
Desired number of children and age for beginning master's students
[Scatter plot: x-axis Age (20 to 34), y-axis Children (0 to 5).]
Figure: Desired number of children by age for 38 students in the ECON4135 class, 2008. The points are slightly jittered. OLS line: Children = 1.884 + 0.002 Age. Mean Children = 1.92, mean Age = 23.8, cor(Children, Age) = 0.004, $R^2 = 0.000016$.
Constructed model and simulated data
[Four-panel figure. UL: y = number of children against x = age (and probability). UR: x = age (and probability) against y = number of children. LL: scatter plot of y against x. LR: var(Y | X = x) against x = age.]
Figure: UL: $E[Y \mid X = x] = -2.02 + 0.101x$, and $P[Y = y \mid X = x]$ shown by horizontal thick lines for some values of $x$; UR: the same, but with the axes interchanged; LL: the scatter plot of a simulated sample of size $n = 43$ (the points are slightly jittered), with the OLS line $y = -1.9 + 0.097x$; LR: $\operatorname{var}[Y \mid X = x]$.
Constructed model - repeated simulations
[Four-panel figure of estimated densities: beta0 (UL), beta1 (UR), T0 (LL), T1 (LR).]
Figure: Approximate densities of $\hat\beta_0$ (UL), $\hat\beta_1$ (UR), $T_0 = (\hat\beta_0 - \beta_0)/SE(\hat\beta_0)$ (LL), and $T_1 = (\hat\beta_1 - \beta_1)/SE(\hat\beta_1)$ (LR), based on 1000 replicates of simulated data of size $n = 43$ in the constructed model for $(X, Y)$. Standard errors by simple OLS.
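A repeated-simulation experiment of this kind can be sketched in a few lines. The notes do not give the full constructed model, so the sketch below makes its own assumptions, which differ from the lecture's count-valued model: ages are drawn uniformly from the integers 20 to 40, and the error term is normal with standard deviation 1 (`SIGMA`). Only the regression line, the sample size and the number of replicates are taken from the notes. The function name `simulate_t1` is hypothetical.

```python
import math
import random
from statistics import mean

BETA0, BETA1 = -2.02, 0.101   # regression line quoted in the notes
N, REPLICATES = 43, 1000      # sample size and number of replicates from the notes
SIGMA = 1.0                   # ASSUMPTION: error standard deviation, not from the notes

def simulate_t1(rng):
    """Simulate one sample; return (slope estimate, T1 statistic)."""
    # ASSUMPTION: X uniform on the integers 20..40, normal errors.
    x = [rng.randint(20, 40) for _ in range(N)]
    y = [BETA0 + BETA1 * xi + rng.gauss(0.0, SIGMA) for xi in x]
    x_bar, y_bar = mean(x), mean(y)
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / s_xx
    b0 = y_bar - b1 * x_bar
    rss = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    # Homoscedasticity-only ("simple OLS") standard error of the slope.
    se_b1 = math.sqrt(rss / (N - 2) / s_xx)
    return b1, (b1 - BETA1) / se_b1

rng = random.Random(2008)     # fixed seed so the run is reproducible
draws = [simulate_t1(rng) for _ in range(REPLICATES)]
b1_hats = [d[0] for d in draws]
t1_values = [d[1] for d in draws]

mean_b1 = mean(b1_hats)
share_within = mean(1.0 if abs(t) < 1.96 else 0.0 for t in t1_values)
print("mean of slope estimates:", mean_b1)
print("share of |T1| < 1.96:   ", share_within)
```

If the sampling distribution of T1 is close to standard normal, the share of replicates with |T1| < 1.96 should be roughly 0.95, and the average slope estimate should be close to the true slope.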
Constructed model - repeated simulations (cont)
[Four-panel figure: beta0.hat, beta1.hat, T0 and T1, each plotted against quantiles of the standard normal.]
Figure: Normal probability plots (QQ-plots against N(0,1)) for the same simulated material as in the previous figure.