Outliers and Influential Data Points in Regression Analysis James P. Stevens sujin jang november 10, 2008

Outliers and Influential Data Points in Regression Analysis

James P. Stevens

Sujin Jang, November 10, 2008

Beware of Outliers

• Regression is sensitive to outliers
  – Important to detect outliers and influential points

• Summary stats can be misleading…
  – Important to explore the data, rather than relying on just 1–2 summary stats

Look at your Data!

– For all three plots, r, means, and SD are equal
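The slide's three plots are not reproduced in this transcript. A well-known public example of the same effect is Anscombe's (1973) quartet, whose datasets I–III share the same x values; they are used here as a stand-in and are not necessarily the data on the slide. This minimal sketch, assuming NumPy is available, computes r, the mean of y and the SD of y for each dataset.

```python
import numpy as np

# Anscombe's (1973) datasets I-III; all three share the same x values.
x = np.array([10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5], dtype=float)
ys = {
    "I": np.array([8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II": np.array([9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": np.array([7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
}


def summarize(x, y):
    """Correlation r, mean of y and SD of y, each rounded to two decimals."""
    r = np.corrcoef(x, y)[0, 1]
    return round(float(r), 2), round(float(y.mean()), 2), round(float(y.std(ddof=1)), 2)


stats = {name: summarize(x, y) for name, y in ys.items()}
for name, s in stats.items():
    print(name, s)
```

The three rows of summary numbers are essentially identical, yet a scatterplot shows a straight line with scatter (I), a smooth curve (II), and a perfect line spoiled by a single outlier on y (III).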

But it’s not enough to look…

So what should we do?

• Ways of Detecting Outliers:
  – Studentized residuals for outliers on y
  – Mahalanobis distance & hat matrix for outliers in the space of predictors
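For the first detection method, a minimal sketch assuming NumPy and an OLS model with an intercept is shown below. It computes the externally studentized (deleted) residual, one common variant of the studentized residual: each ordinary residual e_i is divided by s_(i)·sqrt(1 − h_ii), where s_(i) is the residual SD with case i left out and h_ii is the case's hat value. The function name and data are hypothetical.

```python
import numpy as np


def studentized_residuals(X, y):
    """Externally studentized (deleted) residuals for an OLS fit.

    X is an n x p design matrix that already contains a column of ones.
    """
    n, p = X.shape
    XtX_inv = np.linalg.inv(X.T @ X)
    h = np.einsum("ij,jk,ik->i", X, XtX_inv, X)   # diagonal of the hat matrix
    e = y - X @ (XtX_inv @ X.T @ y)                # ordinary residuals
    s2 = e @ e / (n - p)                           # residual mean square
    # Residual variance with case i left out, via the usual deletion identity.
    s2_del = ((n - p) * s2 - e**2 / (1 - h)) / (n - p - 1)
    return e / np.sqrt(s2_del * (1 - h))


# Hypothetical data: a noisy line with one case pushed up on y.
x = np.arange(1, 11, dtype=float)
noise = np.array([0.1, -0.2, 0.15, -0.05, 0.0, 0.1, -0.1, 0.05, -0.15, 0.1])
y = 2 * x + 1 + noise
y[4] += 5.0
X = np.column_stack([np.ones_like(x), x])

t = studentized_residuals(X, y)
print(np.round(t, 2))
```

The deletion identity avoids refitting the model n times; case 4 stands out with a studentized residual far larger in magnitude than any other.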

Types of Outliers

• Classifying Outliers:
  – Outliers in the space of outcomes (outliers on y)
  – Outliers in the space of predictors (outliers on x)
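For outliers on x, the hat value h_ii and the Mahalanobis distance D_i² are closely linked: for a model with an intercept, h_ii = 1/n + D_i²/(n − 1) when D_i² uses the sample covariance of the predictors. This minimal sketch, assuming NumPy and hypothetical two-predictor data, computes both so the relationship can be checked.

```python
import numpy as np


def leverage_and_mahalanobis(Z):
    """Hat values and squared Mahalanobis distances for an n x k predictor matrix Z.

    Z holds the predictors only; the intercept column is added here.
    """
    n = Z.shape[0]
    X = np.column_stack([np.ones(n), Z])
    h = np.einsum("ij,jk,ik->i", X, np.linalg.inv(X.T @ X), X)
    centered = Z - Z.mean(axis=0)
    S = np.atleast_2d(np.cov(Z, rowvar=False))     # sample covariance, n - 1 denominator
    d2 = np.einsum("ij,jk,ik->i", centered, np.linalg.inv(S), centered)
    return h, d2


# Hypothetical predictors; the last case is far from the others in x-space.
Z = np.array([[1, 2], [2, 1], [3, 4], [4, 3], [5, 5],
              [6, 7], [7, 6], [8, 8], [20, 2]], dtype=float)
h, d2 = leverage_and_mahalanobis(Z)
n = len(Z)
print(np.round(h, 3))
print(np.round(d2, 2))
```

Because one measure is a shifted and rescaled version of the other, both rank the cases identically; here both single out the last case.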


So what should we do?

• Ways of Detecting Outliers:
  – Studentized residuals for outliers on y
  – Mahalanobis distance & hat matrix for outliers in the space of predictors

BUT… the points these methods identify will not necessarily be influential in affecting the regression coefficients…
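The caution above can be seen with a case that is extreme on x but lies exactly on the line fitted to the other cases. In this minimal sketch (NumPy assumed, data and helper names hypothetical), that case has a very large hat value, so the leverage diagnostics flag it, yet adding it leaves the OLS coefficients unchanged.

```python
import numpy as np


def ols(x, y):
    """Return (intercept, slope) of the least-squares line."""
    X = np.column_stack([np.ones_like(x), x])
    return np.linalg.lstsq(X, y, rcond=None)[0]


def hat_values(x):
    """Diagonal of the hat matrix for a simple regression with an intercept."""
    X = np.column_stack([np.ones_like(x), x])
    return np.einsum("ij,jk,ik->i", X, np.linalg.inv(X.T @ X), X)


# Hypothetical data: a noisy line on x = 1..10.
x = np.arange(1, 11, dtype=float)
noise = np.array([0.1, -0.2, 0.15, -0.05, 0.0, 0.1, -0.1, 0.05, -0.15, 0.1])
y = 2 * x + 1 + noise
b_without = ols(x, y)

# Add one case far out on x that lies exactly on the fitted line.
x_far = 30.0
y_far = b_without[0] + b_without[1] * x_far
x_all, y_all = np.append(x, x_far), np.append(y, y_far)

b_with = ols(x_all, y_all)
h = hat_values(x_all)
print("hat value of the added case:", round(float(h[-1]), 3))
print("coefficients without it:", b_without)
print("coefficients with it:   ", b_with)
```

Since the added case has a zero residual against the existing fit, the normal equations are still satisfied by the same coefficients: an outlier on x, but not an influential point.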

Outliers and Influential Points

[Figure: diagram relating “outliers” and “influential points”]

Example: Influential Points

[Figure: two panels, “Non-influential” and “Influential”]
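The slide's two panels are not reproduced here. A similar contrast can be sketched with hypothetical data (NumPy assumed): the same +10 shift on y is placed either at the mean of x or at the end of the x range. At the mean of x the slope is untouched and only the intercept moves; at the end the slope changes noticeably.

```python
import numpy as np


def ols(x, y):
    """Return (intercept, slope) of the least-squares line."""
    X = np.column_stack([np.ones_like(x), x])
    return np.linalg.lstsq(X, y, rcond=None)[0]


x = np.arange(1, 12, dtype=float)   # 1..11, so mean(x) = 6
y = 2 * x + 1                       # hypothetical cases exactly on a line

y_center = y.copy()
y_center[5] += 10.0                 # y-outlier at x = 6, the mean of x
y_end = y.copy()
y_end[10] += 10.0                   # same-sized y-outlier at x = 11

b_center = ols(x, y_center)
b_end = ols(x, y_end)
print("outlier at mean of x:", b_center)
print("outlier at end of x: ", b_end)
```

The slope shift equals (x_i − x̄)·Δy / Σ(x − x̄)², so a case at x̄ cannot tilt the line, while the case at x = 11 raises the slope by 5·10/110 ≈ 0.45.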

Cook’s Distance: Identifying Influential Points

• A measure of the change in the regression coefficients that would occur if the case were omitted
  – Affected both by the case being an outlier on y and by its being an outlier in the set of predictors
  – Measures the joint (combined) influence of the case being an outlier on y and on x
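The definition above can be computed directly. This minimal sketch assumes NumPy and an OLS model with an intercept, with hypothetical data and function names. It computes Cook's distance twice: by refitting without each case, D_i = (b − b_(i))ᵀ XᵀX (b − b_(i)) / (p s²), and with the standard closed form D_i = e_i² h_ii / (p s² (1 − h_ii)²), where e_i is the residual, h_ii the hat value, p the number of coefficients and s² the residual mean square. The closed form makes the "joint influence" explicit: e_i carries the outlier-on-y part and h_ii the outlier-on-x part.

```python
import numpy as np


def cooks_distance(X, y):
    """Cook's D via the closed form e_i^2 h_ii / (p s^2 (1 - h_ii)^2)."""
    n, p = X.shape
    XtX_inv = np.linalg.inv(X.T @ X)
    h = np.einsum("ij,jk,ik->i", X, XtX_inv, X)   # diagonal of the hat matrix
    e = y - X @ (XtX_inv @ X.T @ y)
    s2 = e @ e / (n - p)
    return e**2 * h / (p * s2 * (1 - h) ** 2)


def cooks_distance_refit(X, y):
    """Same quantity, by literally refitting without each case."""
    n, p = X.shape
    b = np.linalg.lstsq(X, y, rcond=None)[0]
    e = y - X @ b
    s2 = e @ e / (n - p)
    d = np.empty(n)
    for i in range(n):
        keep = np.arange(n) != i
        b_i = np.linalg.lstsq(X[keep], y[keep], rcond=None)[0]
        diff = b - b_i                              # change in coefficients
        d[i] = diff @ (X.T @ X) @ diff / (p * s2)
    return d


# Hypothetical data: a noisy line with a y-outlier at the extreme of x.
x = np.arange(1, 11, dtype=float)
noise = np.array([0.1, -0.2, 0.15, -0.05, 0.0, 0.1, -0.1, 0.05, -0.15, 0.1])
y = 2 * x + 1 + noise
y[9] += 6.0
X = np.column_stack([np.ones_like(x), x])

D = cooks_distance(X, y)
D_refit = cooks_distance_refit(X, y)
print(np.round(D, 3))
```

A commonly cited rule of thumb treats D_i > 1 as large; it is a convention for screening, not a formal test.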

Now what?

Step 1. Detect
Step 2. Isolate
Step 3. Examine
  – Are they qualitatively different?
  – Are they influential?

Another thing to consider: influential “clusters”?
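Steps 1 and 2 can be sketched as one screening pass that computes the three diagnostics named earlier and lists the cases each flags. This is a minimal sketch assuming NumPy and an OLS model with an intercept; the function name `screen_cases` is hypothetical, and the cutoffs (|t| > 3, h > 2p/n, D > 1) are common rules of thumb chosen for illustration, not values taken from the slides. Step 3 remains a judgment about the data, so the code stops at producing the lists.

```python
import numpy as np


def screen_cases(X, y, t_cut=3.0, h_mult=2.0, d_cut=1.0):
    """Detect and isolate: indices flagged by each diagnostic (illustrative cutoffs)."""
    n, p = X.shape
    XtX_inv = np.linalg.inv(X.T @ X)
    h = np.einsum("ij,jk,ik->i", X, XtX_inv, X)   # hat values
    e = y - X @ (XtX_inv @ X.T @ y)                # ordinary residuals
    s2 = e @ e / (n - p)
    s2_del = ((n - p) * s2 - e**2 / (1 - h)) / (n - p - 1)
    t = e / np.sqrt(s2_del * (1 - h))              # externally studentized residuals
    d = e**2 * h / (p * s2 * (1 - h) ** 2)         # Cook's distance
    return {
        "outlier_on_y": np.flatnonzero(np.abs(t) > t_cut),
        "outlier_on_x": np.flatnonzero(h > h_mult * p / n),
        "influential": np.flatnonzero(d > d_cut),
    }


# Hypothetical data: a y-outlier near the middle and a far-out x case on the true line.
x = np.append(np.arange(1, 11, dtype=float), 30.0)
noise = np.array([0.1, -0.2, 0.15, -0.05, 0.0, 0.1, -0.1, 0.05, -0.15, 0.1, 0.0])
y = 2 * x + 1 + noise
y[4] += 5.0
X = np.column_stack([np.ones_like(x), x])

flags = screen_cases(X, y)
for name, idx in flags.items():
    print(name, idx)
```

With this data, case 4 is flagged as an outlier on y and case 10 as an outlier on x, yet neither crosses the Cook's distance cutoff, which mirrors the point that detected outliers need not be influential.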

Example: Groups of Cases
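The slide's example is not reproduced here. One way a group of cases can hide from single-case diagnostics is masking: when two similar cases sit together far from the rest, deleting either one alone changes little because its partner still pulls the line. This minimal sketch (NumPy assumed, data hypothetical) compares the slope with all cases, with one member of the pair deleted, and with the whole cluster deleted.

```python
import numpy as np


def slope(x, y):
    """OLS slope of y on x."""
    xc = x - x.mean()
    return xc @ (y - y.mean()) / (xc @ xc)


x = np.arange(1, 11, dtype=float)
y = x.copy()                                 # ten cases exactly on y = x
# A hypothetical cluster: two identical cases far from the rest.
x_all = np.append(x, [20.0, 20.0])
y_all = np.append(y, [0.0, 0.0])

b_full = slope(x_all, y_all)
b_drop_one = slope(x_all[:-1], y_all[:-1])   # delete one member of the pair
b_drop_both = slope(x, y)                    # delete the whole cluster

print(f"all cases:        {b_full:.3f}")
print(f"drop one case:    {b_drop_one:.3f}")
print(f"drop the cluster: {b_drop_both:.3f}")
```

Deleting one case moves the slope only slightly, while deleting the pair restores a slope of 1, so leave-one-out measures such as Cook's distance can understate the influence of the cluster as a whole.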

Now what?

Step 1. Detect
Step 2. Isolate
Step 3. Examine
  – Are they qualitatively different?
  – Are they influential?

Step 4. Delete or retain as you see fit… or try both
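"Try both" amounts to a simple sensitivity analysis: fit the model with every case and again without the flagged ones, then report both. A minimal sketch, assuming NumPy and a hypothetical helper `fit_both_ways` with made-up data:

```python
import numpy as np


def fit_both_ways(X, y, flagged):
    """OLS coefficients with all cases, and with the flagged cases removed."""
    keep = np.ones(len(y), dtype=bool)
    keep[list(flagged)] = False
    b_all = np.linalg.lstsq(X, y, rcond=None)[0]
    b_kept = np.linalg.lstsq(X[keep], y[keep], rcond=None)[0]
    return b_all, b_kept


# Hypothetical data: cases on y = 2x + 1, except the last one, which is far below.
x = np.arange(1, 11, dtype=float)
y = 2 * x + 1
y[9] -= 20.0
X = np.column_stack([np.ones_like(x), x])

b_all, b_kept = fit_both_ways(X, y, flagged=[9])
print("retain all cases: intercept %.3f, slope %.3f" % tuple(b_all))
print("delete case 9:    intercept %.3f, slope %.3f" % tuple(b_kept))
```

Reporting both fits lets a reader judge how much the conclusions depend on the decision to delete or retain.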

The End