Quantitative Methods I: Regression diagnostics

Johan A. Elkink
University College Dublin

10 December 2014
Outline
1 Assumptions and errors
2 Leverage and influence
3 Multicollinearity
4 Heteroscedasticity
Assumptions: specification
- Linear in parameters (i.e. f(Xβ) = Xβ and E[y] = Xβ)
- No extraneous variables in X
- No omitted independent variables
- Parameters to be estimated are constant
- Number of parameters is less than the number of cases, k < n
Assumptions: errors
- Errors have an expected value of zero, E[ε|X] = 0
- Errors are normally distributed, ε ∼ N(0, σ²)
- Errors have a constant variance, Var(ε|X) = σ² < ∞
- Errors are not autocorrelated, Cov(εᵢ, εⱼ|X) = 0 for all i ≠ j
- Errors and X are uncorrelated, Cov(X, ε) = 0
Assumptions: regressors
- X varies
- X is of full column rank (note: requires k < n)
- No measurement error in X
- No endogenous variables in X
Assumptions for unbiasedness
If

- the population regression model is linear in its parameters;
- the sample is a random sample from the population;
- there is no perfect collinearity, r(X) = k; and
- the expected value of the error term is zero conditional on the explanatory variables, E(ε|X) = 0 and Cov(ε, X) = 0,
then the OLS estimators of β are unbiased.
(Glynn, 2007)
Non-constant error variance
Consequences:
- σ̂² is a biased estimator of σ²;
- the probability of a Type I error will not be α;
- the least squares estimator is no longer the best linear unbiased estimator,
but:
- the severity depends on the level of heteroscedasticity;
- β̂ still an unbiased estimator of β.
(King, 2007)
Non-normal errors
Consequences:
- sampling distribution of β̂ not normal;
- test statistics will not have t- and F-distributions;
- the probability of a Type I error will not be α,

but the estimates are still consistent: as n increases, the above problems disappear.
(King, 2007)
Exercise: film reviews
Open the films.dta data set. Create a new variable highrating, which is 1 for films rated 3 or higher, 0 otherwise.
Using matrix formulas,
1 regress desclength on a constant
2 regress desclength on castsize
3 regress desclength on castsize, highrating, length
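A minimal sketch in R (assuming films.dta is a Stata file with variables desclength, castsize, length and a rating variable named rating; read.dta() is from the foreign package):

library(foreign)                      # read.dta() reads Stata .dta files
films <- read.dta("films.dta")
films$highrating <- ifelse(films$rating >= 3, 1, 0)   # assumed 'rating' variable

y  <- films$desclength
X1 <- matrix(1, nrow = length(y))     # 1: constant only
X2 <- cbind(1, films$castsize)        # 2: constant and castsize
X3 <- cbind(1, films$castsize, films$highrating, films$length)   # 3: full model
b3 <- solve(t(X3) %*% X3) %*% t(X3) %*% y   # β̂ = (X′X)⁻¹X′y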
Exercise: film reviews
Based on the last regression:
1 Which observation has the largest residual?
2 Compute mean and median of residuals
3 Compute correlation between residuals and fitted values
4 Compute correlation between residuals and length
5 All other predictors held constant, what would be the difference in predicted description length between high and low rated movies?
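Continuing the sketch above, one possible approach (note that the correlations in 3 and 4 are zero by construction in OLS, since residuals are orthogonal to fitted values and to included regressors):

e <- y - X3 %*% b3            # residuals from the third regression
which.max(abs(e))             # 1: largest residual in absolute value
mean(e); median(e)            # 2: the mean is zero by construction
cor(e, X3 %*% b3)             # 3: residuals vs fitted values
cor(e, films$length)          # 4: residuals vs length
b3[3]                         # 5: coefficient on highrating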
Non-linearity
If there is non-linearity in the variables, but not in the parameters, there is no problem. E.g.

yᵢ = β0 + β1xᵢ + β2xᵢ² + εᵢ
can be estimated with OLS.
If there are other non-linearities, sometimes the equation can be transformed. E.g.

yᵢ = β0 xᵢ^β1 εᵢ

log(yᵢ) = log(β0) + β1 log(xᵢ) + log(εᵢ)

yᵢ* = β0* + β1xᵢ* + εᵢ*
Functional forms for additional non-linear transformations

- log-linear: as with the previous example
- semi-log has two forms:
  yᵢ = β0 + β1 log(xᵢ), where β1 is ∆y due to %∆x
  log(yᵢ) = β0 + β1xᵢ, where β1 is %∆y due to ∆x
- inverse or reciprocal: yᵢ = β0 + β1(1/xᵢ)
- polynomial: yᵢ = β0 + β1xᵢ + β2xᵢ²
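In R, all of these forms can be fitted with lm(); a sketch, assuming a data frame d with variables y and x (hypothetical names):

m1 <- lm(log(y) ~ log(x), data = d)   # log-linear
m2 <- lm(y ~ log(x), data = d)        # semi-log: ∆y due to %∆x
m3 <- lm(log(y) ~ x, data = d)        # semi-log: %∆y due to ∆x
m4 <- lm(y ~ I(1/x), data = d)        # inverse or reciprocal
m5 <- lm(y ~ x + I(x^2), data = d)    # polynomial; I() protects x^2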
2 Leverage and influence
Leverage
A high leverage point i is one where xᵢ is far from the mean of X. These points can be identified using the so-called hat matrix, the matrix that puts a hat on y:

ŷ = Xβ̂ = X(X′X)⁻¹X′y = Hy,

the diagonal of which is a measure of leverage.
(King, 2007)
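In R, the diagonal of H can be extracted directly from a fitted lm object m; flagging points above 2k/n is a common rule of thumb (an addition here, not from the slides):

h <- hatvalues(m)            # diagonal of the hat matrix
which(h > 2 * mean(h))       # mean(h) = k/n, so this flags hᵢ > 2k/n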
Outliers
An outlier is a point whose residual from the regression line is large.

To account for variation in the sampling variances of the residuals, we calculate externally studentized residuals (or studentized deleted residuals), where a large absolute value indicates an outlier. A test could be based on the fact that in a model without outliers, they should follow a t(n − k) distribution.
(Kutner et al., 2005, 390–398)
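In R, rstudent() returns the externally studentized residuals; a sketch of such a test, with a Bonferroni correction as one common choice (degrees-of-freedom conventions vary with how k counts the intercept):

r <- rstudent(m)                     # externally studentized residuals
n <- nobs(m); k <- length(coef(m))
which(abs(r) > qt(1 - 0.05/(2*n), df = n - k - 1))   # flagged outliers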
Influence
An influential point is one which has a strong impact on the estimation of β. An influential point is one which has high leverage and is also an outlier.

We typically look at Cook's Distance to assess the level of influence of each observation.
(King, 2007)
Leverage and influence
A point with high leverage is located far from the other points. A high leverage point that strongly influences the regression line is called an influential point.
Outlier, low leverage, low influence
[Figure: scatter plot of y against x illustrating an outlier with low leverage and low influence.]
High leverage, low influence
[Figure: scatter plot of y against x illustrating a high leverage point with low influence.]
High leverage, high influence
[Figure: scatter plot of y against x illustrating a high leverage point with high influence.]
Cook’s Distance
Dᵢ ≡ Σⱼ(ŷⱼ − ŷⱼ(−i))² / (ks²)
  = (β̂OLS(−i) − β̂OLS)′X′X(β̂OLS(−i) − β̂OLS) / (ks²)
  = (eᵢ / (s√(1 − hᵢ)))² · hᵢ / (k(1 − hᵢ))
  = (tᵢ²/k) · var(ŷᵢ)/var(eᵢ) ∼ F(k, n − k)
The F-test here refers to whether β̂OLS would be significantly different if observation i were to be removed (H0: β = β(−i)).
(Cook, 1979, 168)
Cook’s Distance
Dᵢ = (tᵢ²/k) · var(ŷᵢ)/var(eᵢ)

"tᵢ² is a measure of the degree to which the i-th observation can be considered as an outlier from the assumed model."

"The ratios var(ŷᵢ)/var(eᵢ) measure the relative sensitivity of the estimate, β̂OLS, to potential outlying values at each data point."
(Cook, 1977, 16)
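In R, Cook's distances come directly from the fitted model; the 4/n cut-off below is a common rule of thumb rather than part of Cook's derivation:

d <- cooks.distance(m)
which(d > 4/nobs(m))         # observations worth inspecting more closely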
What to do with outliers?
Options:
1 Ignore the problem
2 Investigate why the data are outliers — what makes them unusual?
3 Consider respecifying the model, either by transforming a variable or by including an additional variable (but beware of overfitting)
4 Consider a variant of "robust regression" that downweights outliers
Diagnosing problems in R
A very easy set of diagnostic plots can be accessed by plotting an lm object, using plot.lm()

This produces, in order:
1 residuals against fitted values
2 normal Q-Q plot
3 scale-location plot of √|eᵢ| against fitted values
4 Cook's distances versus row labels
5 residuals against leverages
6 Cook's distances against leverage/(1 − leverage)

Note that by default, plot.lm() only gives you 1, 2, 3 and 5.
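For example, to see all six panels for a fitted lm object m rather than the default four:

par(mfrow = c(2, 3))         # arrange the six panels in a grid
plot(m, which = 1:6)         # the default is which = c(1, 2, 3, 5)
par(mfrow = c(1, 1))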
Exercise
Open the uswages.dta data set and regress log(wage) on educ, exper and race.
Check for leverage, outliers, influential points and nonlinearities.
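A sketch of the setup (assuming uswages.dta is a Stata file with variables named wage, educ, exper and race, as in the exercise):

library(foreign)
uswages <- read.dta("uswages.dta")
m <- lm(log(wage) ~ educ + exper + race, data = uswages)
plot(m, which = 1:6)         # leverage, outliers, influence, non-linearity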
3 Multicollinearity
Collinearity
When some variables are linear combinations of others then we have exact (or perfect) collinearity, and there is no unique least squares estimate of β.

When X variables are highly correlated, we have multicollinearity.

Detecting multicollinearity:

- look at correlation matrix of predictors for pairwise correlations
- regress each independent variable on all other independent variables to produce R²ⱼ, and look for high values (close to 1.0)
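Both checks are short in R; a sketch, assuming a data frame preds holding only the predictors, one of them named x1 (hypothetical names):

cor(preds)                                    # pairwise correlations
summary(lm(x1 ~ ., data = preds))$r.squared   # R²ⱼ for x1; repeat for each predictor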
Multicollinearity
The extent to which multicollinearity is a problem is debatable.
The issue is comparable to that of sample size: if n is too small, we have difficulty picking up effects even if they really exist; the same holds for variables that are highly multicollinear, making it difficult to separate their effects on y.
Multicollinearity
However, some problems with high multicollinearity:
- Small changes in data can lead to large changes in estimates
- High standard errors but joint significance
- Coefficients may have "wrong" sign or implausible magnitudes
(Greene, 2002, 57)
Variance of β̂OLS

var(β̂ₖOLS) = σ² / [(1 − R²ₖ) Σᵢ(xᵢₖ − x̄ₖ)²]

- σ²: all else equal, the better the fit, the lower the variance
- (1 − R²ₖ): all else equal, the lower the R² from regressing the k-th independent variable on all other independent variables, the lower the variance
- Σᵢ(xᵢₖ − x̄ₖ)²: all else equal, the more variation in x, the lower the variance
(Greene, 2002, 57)
Variance Inflation Factor
var(β̂ₖOLS) = σ² / [(1 − R²ₖ) Σᵢ(xᵢₖ − x̄ₖ)²]

VIFₖ = 1 / (1 − R²ₖ),

thus VIFₖ shows the increase in var(β̂ₖOLS) due to the variable being collinear with other independent variables.
library(faraway)   # provides vif()
vif(lm(...))       # VIF for each predictor in the fitted model
Multicollinearity: solutions
- Check for coding or logical mistakes (esp. in cases of perfect multicollinearity)
- Increase n
- Remove one of the collinear variables (apparently not adding much)
- Combine multiple variables in indices or underlying dimensions
- Formalise the relationship
Exercise
Using demdev.dta data and model
polity2ᵢ = β0 + β1 cwarᵢ + β2 laggdppcᵢ + β3 propdemᵢ + β4 energy2ᵢ + εᵢ,
check whether there are any multicollinearity problems.
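A sketch of the check (assuming demdev.dta is a Stata file containing the variables named above; vif() is from the faraway package introduced earlier):

library(foreign)
library(faraway)
demdev <- read.dta("demdev.dta")
m <- lm(polity2 ~ cwar + laggdppc + propdem + energy2, data = demdev)
vif(m)      # large values signal collinearity; 10 is a common alarm level
cor(demdev[c("cwar", "laggdppc", "propdem", "energy2")])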
4 Heteroscedasticity
Homoscedasticity

[Figure: scatter plot illustrating errors with constant variance.]

Heteroscedasticity

[Figure: scatter plot illustrating errors with non-constant variance.]
Heteroscedasticity
Regression disturbances whose variances are not constant across observations are heteroscedastic.

Under heteroscedasticity, the OLS estimators remain unbiased and consistent, but are no longer BLUE or asymptotically efficient.
(Thomas, 1985, 94)
Causes of heteroscedasticity

- More variation for larger sizes (e.g. profits of firms vary more for larger firms)
- More variation across different groups in the sample
- Learning effects in time series
- Variation in data collection quality (e.g. historical data)
- Turbulence after shocks in time series (e.g. financial markets)
- Omitted variable
- Wrong functional form
- Aggregation with varying sizes of populations
- etc.
Heteroscedasticity
Since OLS is no longer BLUE or asymptotically efficient,
- other linear unbiased estimators exist which have smaller sampling variances;
- other consistent estimators exist which collapse more quickly to the true values as n increases;
- we can no longer trust hypothesis tests, because var(β̂OLS) is biased:
  - cov(Xᵢ², σᵢ²) > 0: var(β̂OLS) is underestimated
  - cov(Xᵢ², σᵢ²) = 0: no bias in var(β̂OLS)
  - cov(Xᵢ², σᵢ²) < 0: var(β̂OLS) is overestimated (inefficient)
(Thomas, 1985, 94–95)
(Judge et al., 1985, 422)
Residual plots: heteroscedasticity
To detect heteroscedasticity (unequal variances), it is useful to plot:

- Residuals against fitted values
- Residuals against dependent variable
- Residuals against independent variable(s)

Usually, the first one is sufficient to detect heteroscedasticity, and can simply be found by:
m <- lm(y ~ x)
plot(m)          # the first panel shows residuals against fitted values
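The plots on the following slides can be reproduced with simulated data along these lines (a hypothetical data-generating process; the slides do not specify one):

set.seed(1)
x <- runif(100, 0, 8)
y <- 2 + 3*x + rnorm(100, sd = x)    # error spread grows with x
m <- lm(y ~ x)
plot(residuals(m) ~ fitted(m))       # a funnel shape indicates heteroscedasticity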
Residual plots: heteroscedasticity
[Figure: scatter plot of heteroscedastic data, y against x.]
m <- lm(y ~ x)
Residual plots: heteroscedasticity
[Figure: residuals plotted against fitted values.]
plot(residuals(m) ~ fitted(m))
Residual plots: heteroscedasticity
[Figure: residuals plotted against y.]
plot(residuals(m) ~ y)
Residual plots: heteroscedasticity
[Figure: residuals plotted against x.]
plot(residuals(m) ~ x)
Residual plots: homoscedasticity
[Figure: scatter plot of homoscedastic data, y against x.]
m <- lm(y ~ x)
Residual plots: homoscedasticity
[Figure: residuals plotted against fitted values.]
plot(residuals(m) ~ fitted(m))
Residual plots: homoscedasticity
[Figure: residuals plotted against y.]
plot(residuals(m) ~ y)
Residual plots: homoscedasticity
[Figure: residuals plotted against x.]
plot(residuals(m) ~ x)
Residual plots: heteroscedasticity
[Figure: scatter plot of heteroscedastic data, y against x.]
m <- lm(y ~ x)
Residual plots: heteroscedasticity
[Figure: residuals plotted against fitted values.]
plot(residuals(m) ~ fitted(m))
Residual plots: heteroscedasticity
[Figure: residuals plotted against y.]
plot(residuals(m) ~ y)
Residual plots: heteroscedasticity
[Figure: residuals plotted against x.]
plot(residuals(m) ~ x)
Cook, R. Dennis. 1977. "Detection of influential observation in linear regression." Technometrics 19(1):15–18.

Cook, R. Dennis. 1979. "Influential observations in linear regression." Journal of the American Statistical Association 74(365):169–174.

Glynn, Adam. 2007. "GOV 2000: Quantitative Methodology for Political Science I." Lecture slides, Harvard University.

Greene, William H. 2002. Econometric Analysis. London: Prentice Hall.

Judge, George G., William E. Griffiths, R. Carter Hill, Helmut Lütkepohl and Tsoung-Chao Lee. 1985. The Theory and Practice of Econometrics. New York: John Wiley and Sons.

King, Gary. 2007. "GOV 2000: Quantitative Methodology for Political Science I." Lecture slides, Harvard University.

Kutner, Michael H., Christopher J. Nachtsheim, John Neter and William Li. 2005. Applied Linear Statistical Models. 5th ed. McGraw-Hill.

Thomas, R. Leighton. 1985. Introductory Econometrics: Theory and Applications. Harlow, Essex: Longman.