2 Multicollinearity Presented by: Shahram Arsang Isfahan University of Medical Sciences Email: shahramarsang@gmail.com April 2014.

Jan 01, 2016


2

Multicollinearity

Presented by: Shahram Arsang

Isfahan University of Medical Sciences

Email: shahramarsang@gmail.com

April 2014

FOCUS

• Definition of multicollinearity

• Diagnosis of multicollinearity

• Remedial measures for multicollinearity

• Example

6 April 2014 3

Multicollinearity

• Definition: the predictor variables are highly correlated among themselves.

• Example: body fat

• Potential harm of collinearity: it becomes difficult to infer the separate influence of each such explanatory variable on the response variable.

Problems with Multicollinearity

1. Adding or deleting a predictor variable changes the regression coefficients.

2. The extra sum of squares associated with a predictor variable varies, depending on which other predictor variables are already included in the model.

3. The estimated standard deviations of the regression coefficients become large.

4. The estimated regression coefficients may individually not be statistically significant, even though a definite statistical relation exists between the response variable and the set of predictor variables.
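Problems 1 and 3 can be illustrated with a minimal sketch on simulated data. Everything here is an assumption made for illustration: the variables `x1`, `x2`, `y`, the noise levels, and the helper `ols` are hypothetical and do not come from the body fat example.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 50
x1 = rng.normal(size=n)
x2 = x1 + 0.05 * rng.normal(size=n)   # hypothetical predictor, nearly a copy of x1
y = 1.0 + 2.0 * x1 + 1.0 * x2 + rng.normal(size=n)

def ols(X, y):
    """Return OLS coefficients (intercept first) and their estimated standard errors."""
    X = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    sigma2 = resid @ resid / (len(y) - X.shape[1])     # error variance estimate (MSE)
    se = np.sqrt(sigma2 * np.diag(np.linalg.inv(X.T @ X)))
    return beta, se

b_one, se_one = ols(x1, y)                           # model with x1 only
b_two, se_two = ols(np.column_stack([x1, x2]), y)    # model with x1 and x2

print("x1 only : b1 = %.2f (SE %.2f)" % (b_one[1], se_one[1]))
print("x1 + x2 : b1 = %.2f (SE %.2f)" % (b_two[1], se_two[1]))
```

With predictors this strongly correlated, the coefficient of `x1` shifts when `x2` enters the model and its standard error grows considerably.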

• Diagnosis consists of two related but separate elements:

1. Detecting the presence of collinear relationships among the predictor variables.

2. Assessing the extent to which these relationships have degraded the estimated parameters.

Informal diagnostics for multicollinearity

1. Large changes in the estimated regression coefficients when a predictor variable is added or deleted, or when an observation is altered or deleted.

2. Nonsignificant results in individual tests on the regression coefficients for important predictor variables.

3. Estimated regression coefficients with an algebraic sign opposite to that expected from theoretical considerations or prior experience.

4. Large coefficients of simple correlation between pairs of predictor variables in the correlation matrix $r_{XX}$.

5. Wide confidence intervals for the regression coefficients representing important predictor variables.

Limitations of informal diagnostics

1. They provide only qualitative, not quantitative, measurements of the impact of multicollinearity.

2. Sometimes the observed behavior may occur without multicollinearity being present.

Multicollinearity diagnostic methods

• Correlation matrix R (or $r_{XX}$) of the x's (the absence of high correlations cannot be viewed as evidence of no problem)

• Variance inflation factor (VIF). Weaknesses:

1. It cannot reveal the presence of several coexisting near dependencies among the explanatory variates.

2. There is no meaningful boundary to distinguish VIF values that should be considered high from those that should not.

• The technique of Farrar and Glauber (partial correlation)

The technique of Farrar and Glauber

• Assumes that the n × p data matrix X is a sample of size n from a p-variate Gaussian (normal) distribution.

• Uses the partial correlation between $X_i$ and $X_j$, adjusted for all other X-variates, to investigate the patterns of interdependence in greater detail.
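The partial correlations themselves can be read off the inverse of the correlation matrix. The sketch below covers only that step, not the significance tests of the full technique. The function name `partial_correlations` and the simulated data are assumptions for illustration.

```python
import numpy as np

def partial_correlations(X):
    """Partial correlation of every pair of columns, adjusted for all other columns."""
    R = np.corrcoef(X, rowvar=False)     # simple correlation matrix of the columns
    P = np.linalg.inv(R)                 # its inverse (precision matrix)
    d = np.sqrt(np.diag(P))
    partial = -P / np.outer(d, d)        # r_ij.rest = -P_ij / sqrt(P_ii * P_jj)
    np.fill_diagonal(partial, 1.0)
    return partial

# Hypothetical data: columns 0 and 1 share a common factor z, column 2 is independent.
rng = np.random.default_rng(1)
n = 200
z = rng.normal(size=n)
X = np.column_stack([
    z + 0.3 * rng.normal(size=n),
    z + 0.3 * rng.normal(size=n),
    rng.normal(size=n),
])

pc = partial_correlations(X)
print(np.round(pc, 2))
```

A large entry in row i, column j points to a dependence between $X_i$ and $X_j$ that remains after the other variates are accounted for.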

Variance inflation factor (VIF)

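VIF: measures how much the variances of the estimated regression coefficients are inflated compared with the case where the $x_i$'s are not linearly related.

Variance-covariance matrices of the estimated coefficients $b$ and the estimated standardized coefficients $b^*$: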
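In the usual textbook notation, the two matrices and the resulting VIF can be written as below. The notation is an assumption here: $b$ and $b^*$ denote the estimated unstandardized and standardized coefficients, $\sigma^2$ and $(\sigma^*)^2$ the corresponding error variances, and $R_k^2$ the coefficient of multiple determination when $x_k$ is regressed on the other x variables.

```latex
\sigma^2\{\mathbf{b}\} = \sigma^2 (\mathbf{X}'\mathbf{X})^{-1},
\qquad
\sigma^2\{\mathbf{b}^*\} = (\sigma^*)^2 \, \mathbf{r}_{XX}^{-1}

(VIF)_k = \left[ \mathbf{r}_{XX}^{-1} \right]_{kk} = \frac{1}{1 - R_k^2},
\qquad
\sigma^2\{b_k^*\} = (\sigma^*)^2 \, (VIF)_k
```

When $x_k$ is uncorrelated with the other predictors, $R_k^2 = 0$ and $(VIF)_k = 1$. As $R_k^2$ approaches 1, $(VIF)_k$ grows without bound.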

Diagnostic uses

Severity of multicollinearity:

1. Largest VIF value: a VIF greater than 10 is commonly taken to indicate that multicollinearity is a problem.

2. Mean of the VIF values: indicates how far the estimated standardized regression coefficients $b_k^*$ are from the true values $\beta_k^*$.

It can be shown that the expected value of the sum of these squared errors $(b_k^* - \beta_k^*)^2$ is given by:

$E\left\{ \sum_{k=1}^{p-1} (b_k^* - \beta_k^*)^2 \right\} = (\sigma^*)^2 \sum_{k=1}^{p-1} (VIF)_k$

where $(\sigma^*)^2$ is the error variance of the standardized regression model.

When no X variable is linearly related to the others in the regression model, $(VIF)_k \equiv 1$, so that:

$\sum_{k=1}^{p-1} (VIF)_k = p - 1$

The ratio of the two results provides useful information about the effect of multicollinearity on the sum of the squared errors:

$\frac{(\sigma^*)^2 \sum_{k=1}^{p-1} (VIF)_k}{(\sigma^*)^2 (p-1)} = \frac{\sum_{k=1}^{p-1} (VIF)_k}{p-1}$

This ratio is the mean of the VIF values, denoted by $\overline{(VIF)}$:

$\overline{(VIF)} = \frac{\sum_{k=1}^{p-1} (VIF)_k}{p-1}$

A mean VIF considerably larger than 1 indicates serious multicollinearity problems.
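Both diagnostics, the largest VIF and the mean VIF, can be computed from the diagonal of the inverse correlation matrix of the predictors. This is a minimal sketch on simulated data. The helper `vif` and the variables `x1`, `x2`, `x3` are hypothetical and are not the body fat data.

```python
import numpy as np

def vif(X):
    """VIF of each predictor: diagonal of the inverse correlation matrix r_XX."""
    r_xx = np.corrcoef(X, rowvar=False)
    return np.diag(np.linalg.inv(r_xx))

rng = np.random.default_rng(2)
n = 100
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
x3 = x1 + x2 + 0.1 * rng.normal(size=n)   # near-linear combination of x1 and x2
X = np.column_stack([x1, x2, x3])

vifs = vif(X)
mean_vif = vifs.mean()                    # sum of the VIF_k divided by p - 1

print("VIF_k   :", np.round(vifs, 1))
print("max VIF :", round(vifs.max(), 1))
print("mean VIF:", round(mean_vif, 1))
```

Note that no single pairwise correlation in this example is close to 1, yet the VIF values are large, because `x3` depends on `x1` and `x2` jointly.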

Body fat example:

The expected sum of the squared errors in the least squares standardized regression coefficients is nearly 460 times as large as it would be if the x variables were uncorrelated.

Is there a multicollinearity problem?

Comments

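1. The reciprocal of the VIF, $1/(VIF)_k = 1 - R_k^2$, is used by some regression programs to exclude x variables that are too highly interdependent with the variables already in the model. Here $R_k^2$ is the coefficient of multiple determination when $x_k$ is regressed on the other x variables.

2. Limitation of the VIF: it cannot distinguish between several simultaneous multicollinearities.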

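Remedial measures

1. If the model is used only for making predictions, multicollinearity is not a problem, provided the new predictor values follow the same pattern of multicollinearity as the data.

2. Center the x's (use centered data).

3. Drop one or more predictors.

4. Add cases that may break the pattern of multicollinearity.

5. Use different data sets to estimate different coefficients.

6. Principal component analysis.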
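Measure 2 is easiest to see in one common case, assumed here for illustration: a model that contains both x and x². The sketch below uses hypothetical predictor values and compares the correlation between the linear and squared terms before and after centering.

```python
import numpy as np

x = np.arange(1.0, 21.0)          # hypothetical predictor values 1, 2, ..., 20
xc = x - x.mean()                 # centered predictor

r_raw = np.corrcoef(x, x**2)[0, 1]          # correlation between x and x^2
r_centered = np.corrcoef(xc, xc**2)[0, 1]   # same correlation after centering

print(f"corr(x, x^2) before centering: {r_raw:.3f}")
print(f"corr(x, x^2) after centering : {r_centered:.3f}")
```

Centering removes the correlation here because the values are symmetric about their mean. With asymmetric data the correlation is usually reduced rather than eliminated.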

Ridge regression

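• Modifies the method of least squares to allow biased estimators of the regression coefficients.

• Ridge estimators: obtained from the least squares normal equations expressed through the correlation transformation.

• The idea is to use a small biasing constant $c \ge 0$ and find the standardized coefficients $b^R$ that satisfy $(r_{XX} + cI)\, b^R = r_{YX}$.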

Ridge regression

$b^R = (r_{XX} + cI)^{-1} r_{YX}$

• $b^R$: vector of standardized ridge coefficients

• c: biasing constant, reflecting the amount of bias in the estimator

• c = 0: OLS in standardized form

• c > 0: the $b^R$'s are biased but more stable than OLS
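The ridge estimator in standardized form can be sketched in a few lines. The helper names `correlation_transform` and `ridge_standardized`, the simulated data, and the value c = 0.02 are all assumptions for illustration.

```python
import numpy as np

def correlation_transform(X, y):
    """Return r_XX (predictor correlations) and r_YX (correlations of y with each predictor)."""
    r = np.corrcoef(np.column_stack([X, y]), rowvar=False)
    return r[:-1, :-1], r[:-1, -1]

def ridge_standardized(r_xx, r_yx, c):
    """Solve (r_XX + c I) b = r_YX for the standardized ridge coefficients."""
    k = r_xx.shape[0]
    return np.linalg.solve(r_xx + c * np.eye(k), r_yx)

# Hypothetical data with two nearly collinear predictors.
rng = np.random.default_rng(3)
n = 60
x1 = rng.normal(size=n)
x2 = x1 + 0.05 * rng.normal(size=n)
y = x1 + x2 + rng.normal(size=n)

r_xx, r_yx = correlation_transform(np.column_stack([x1, x2]), y)
b_ols = ridge_standardized(r_xx, r_yx, 0.0)      # c = 0 reproduces standardized OLS
b_ridge = ridge_standardized(r_xx, r_yx, 0.02)   # small biasing constant

print("c = 0    :", np.round(b_ols, 3))
print("c = 0.02 :", np.round(b_ridge, 3))
```

Solving the linear system with `np.linalg.solve` avoids forming the inverse explicitly, which is the numerically safer choice when $r_{XX}$ is close to singular.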

Ridge regression

• Results:

1. As c increases, the bias increases and the variance of the estimated coefficients decreases.

2. There always exists a value of c for which the total MSE of the ridge estimator is smaller than that of OLS.

3. There is no hard-and-fast rule for finding c.

• Choice of biasing constant c:

1. Ridge trace: a simultaneous plot of the values of the (p − 1) estimated standardized ridge regression coefficients for different values of c, usually between 0 and 1.

2. VIF values

Choice of biasing constant c

• Choose the smallest value of c at which it is deemed that:

1. the regression coefficients have steadied in the ridge trace, and

2. the VIF values have become sufficiently small.
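The two criteria can be tabulated over a grid of c values. In this sketch the ridge VIF values are taken as the diagonal of $(r_{XX} + cI)^{-1} r_{XX} (r_{XX} + cI)^{-1}$, the standard extension of the VIF to ridge estimators. The function `ridge_path`, the grid of c values, and the simulated data are assumptions for illustration.

```python
import numpy as np

def ridge_path(r_xx, r_yx, c_values):
    """Standardized ridge coefficients and ridge VIFs for each biasing constant c."""
    k = r_xx.shape[0]
    coefs, vifs = [], []
    for c in c_values:
        A = np.linalg.inv(r_xx + c * np.eye(k))
        coefs.append(A @ r_yx)                 # b^R = (r_XX + cI)^-1 r_YX
        vifs.append(np.diag(A @ r_xx @ A))     # ridge VIFs
    return np.array(coefs), np.array(vifs)

# Hypothetical data with two nearly collinear predictors.
rng = np.random.default_rng(4)
n = 60
x1 = rng.normal(size=n)
x2 = x1 + 0.05 * rng.normal(size=n)
y = x1 + x2 + rng.normal(size=n)
r = np.corrcoef(np.column_stack([x1, x2, y]), rowvar=False)
r_xx, r_yx = r[:-1, :-1], r[:-1, -1]

c_values = [0.0, 0.002, 0.005, 0.01, 0.02, 0.05, 0.1]
coefs, vifs = ridge_path(r_xx, r_yx, c_values)
for c, b, v in zip(c_values, coefs, vifs):
    print(f"c = {c:<6} b^R = {np.round(b, 3)}  max VIF = {v.max():.1f}")
```

Plotting each column of `coefs` against `c_values` gives the ridge trace. A reasonable c is the smallest grid value after which the printed coefficients change little and the largest VIF is close to 1.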

Comments:

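• Limitations of ridge regression:

1. Ordinary inference procedures do not apply, so the precision of the ridge regression coefficients has to be assessed by other means, such as the bootstrap.

2. The choice of c is a matter of judgment.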
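A bootstrap assessment of precision might look like the sketch below. It resamples whole cases with replacement, which is one of several possible bootstrap schemes and is an assumption here. The data, the value c = 0.02, the number of replicates `B`, and the helper `ridge_standardized` are likewise hypothetical.

```python
import numpy as np

def ridge_standardized(X, y, c):
    """Standardized ridge coefficients computed from the correlation transform of (X, y)."""
    r = np.corrcoef(np.column_stack([X, y]), rowvar=False)
    r_xx, r_yx = r[:-1, :-1], r[:-1, -1]
    return np.linalg.solve(r_xx + c * np.eye(r_xx.shape[0]), r_yx)

# Hypothetical data with two nearly collinear predictors.
rng = np.random.default_rng(5)
n = 80
x1 = rng.normal(size=n)
x2 = x1 + 0.05 * rng.normal(size=n)
X = np.column_stack([x1, x2])
y = x1 + x2 + rng.normal(size=n)

c = 0.02      # biasing constant, held fixed across bootstrap samples
B = 500       # number of bootstrap replicates
boot = np.empty((B, X.shape[1]))
for b in range(B):
    idx = rng.integers(0, n, size=n)          # resample cases with replacement
    boot[b] = ridge_standardized(X[idx], y[idx], c)

boot_se = boot.std(axis=0, ddof=1)                            # bootstrap standard errors
ci_low, ci_high = np.percentile(boot, [2.5, 97.5], axis=0)    # percentile intervals

print("bootstrap SE :", np.round(boot_se, 3))
print("95% interval :", np.round(ci_low, 3), "to", np.round(ci_high, 3))
```

Holding c fixed ignores the uncertainty that comes from choosing c, so these intervals are likely to be somewhat too narrow.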

Using ridge regression to reduce the number of predictor variables:

• Unstable ridge trace with the coefficient tending toward zero: drop the variable

• Stable ridge trace at a very small value: drop the variable

• Unstable ridge trace that does not tend toward zero: candidate for dropping

VIF - SPSS

Output - SPSS

Example 2 – VIF value and remedial measure
