Top Banner
STA291 Statistical Methods Lecture 28
23

STA291 Statistical Methods Lecture 28. Extrapolation and Prediction Extrapolating – predicting a y value by extending the regression model to regions.

Apr 01, 2015

Download

Documents

Brenden Garvis
Welcome message from author
This document is posted to help you gain knowledge. Please leave a comment to let me know what you think about it! Share it to your friends and learn new things together.
Transcript
Page 1: STA291 Statistical Methods Lecture 28. Extrapolation and Prediction Extrapolating – predicting a y value by extending the regression model to regions.

STA291Statistical Methods

Lecture 28

Page 2: STA291 Statistical Methods Lecture 28. Extrapolation and Prediction Extrapolating – predicting a y value by extending the regression model to regions.

Extrapolation and PredictionExtrapolating – predicting a y value by extending the regression model to regions outside the range of the x-values of the data.

Page 3: STA291 Statistical Methods Lecture 28. Extrapolation and Prediction Extrapolating – predicting a y value by extending the regression model to regions.

Why is extrapolation dangerous?

It introduces the questionable and untested assumption that the relationship between x and y does not change.

Extrapolation and Prediction

Page 4: STA291 Statistical Methods Lecture 28. Extrapolation and Prediction Extrapolating – predicting a y value by extending the regression model to regions.

Cautionary Example: Oil Prices in Constant Dollars

Model Prediction (Extrapolation):

On average, a barrel of oil will increase $7.39 per year from 1983 to 1998.

Price = – 0.85 + 7.39 Time

Extrapolation and Prediction

Page 5: STA291 Statistical Methods Lecture 28. Extrapolation and Prediction Extrapolating – predicting a y value by extending the regression model to regions.

Cautionary Example: Oil Prices in Constant Dollars

Actual Price Behavior

Extrapolating the 1971-1982 model to the ’80s and ’90s lead to grossly erroneous forecasts.

Extrapolation and Prediction

Price = – 0.85 + 7.39 Time

Page 6: STA291 Statistical Methods Lecture 28. Extrapolation and Prediction Extrapolating – predicting a y value by extending the regression model to regions.

Remember: Linear models ought not be trusted beyond the span of the x-values of the data.

If you extrapolate far into the future, be prepared for the actual values to be (possibly quite) different from your predictions.

Extrapolation and Prediction

Page 7: STA291 Statistical Methods Lecture 28. Extrapolation and Prediction Extrapolating – predicting a y value by extending the regression model to regions.

Unusual and Extraordinary ObservationsOutliers, Leverage, and Influence

In regression, an outlier can stand out in two ways. It can have…

1) a large residual:

Page 8: STA291 Statistical Methods Lecture 28. Extrapolation and Prediction Extrapolating – predicting a y value by extending the regression model to regions.

Outliers, Leverage, and Influence

In regression, an outlier can stand out in two ways. It can have…

2) a large distance from : x“High-leveragepoint”

A high leverage point is influential if omitting it gives a regression model with a very different slope.

Unusual and Extraordinary Observations

Page 9: STA291 Statistical Methods Lecture 28. Extrapolation and Prediction Extrapolating – predicting a y value by extending the regression model to regions.

Outliers, Leverage, and Influence

Tell whether the point is a high-leverage point, if it has a large residual, and if it is influential.

Unusual and Extraordinary Observations

Not high-leverage

Large residual

Not very influential

Page 10: STA291 Statistical Methods Lecture 28. Extrapolation and Prediction Extrapolating – predicting a y value by extending the regression model to regions.

Outliers, Leverage, and Influence

Tell whether the point is a high-leverage point, if it has a large residual, and if it is influential.

High-leverage

Small residual

Not very influential

Unusual and Extraordinary Observations

Page 11: STA291 Statistical Methods Lecture 28. Extrapolation and Prediction Extrapolating – predicting a y value by extending the regression model to regions.

Outliers, Leverage, and Influence

Tell whether the point is a high-leverage point, if it has a large residual, and if it is influential.

High-leverage

Medium (large?) residual

Very influential (omitting the red point will change the slope dramatically!)

Unusual and Extraordinary Observations

Page 12: STA291 Statistical Methods Lecture 28. Extrapolation and Prediction Extrapolating – predicting a y value by extending the regression model to regions.

Outliers, Leverage, and Influence

What should you do with a high-leverage point?

Sometimes, these points are important. They can indicate that the underlying relationship is in fact nonlinear.

Other times, they simply do not belong with the rest of the data and ought to be omitted.

When in doubt, create and report two models: one with the outlier and one without.

Unusual and Extraordinary Observations

Page 13: STA291 Statistical Methods Lecture 28. Extrapolation and Prediction Extrapolating – predicting a y value by extending the regression model to regions.

Example: Hard Drive Prices

Prices for external hard drives are linearly associated with the Capacity (in GB). The least squares regression line without a 200 GB drive that sold for $299.00 was found to be .

The regression equation with the original data is

How are the two equations different?

The intercepts are different, but the slopes are similar.

Does the new point have a large residual? Explain.

Yes. The hard drive’s price doesn’t fit the pattern since it pulled the line up but didn’t decrease the slope very much.

Price 18.64 0.104Capacity

Price 66.57 0.088Capacity

Unusual and Extraordinary Observations

Page 14: STA291 Statistical Methods Lecture 28. Extrapolation and Prediction Extrapolating – predicting a y value by extending the regression model to regions.

Working with Summary ValuesScatterplots of summarized (averaged) data tend to show less variability than the un-summarized data.

Example:

Wind speeds at two locations, collected at 6AM, noon, 6PM, and midnight.

Raw data:

R2 = 0.736

Daily-averageddata:

R2 = 0.844

Monthly-averaged data:

R2 = 0.942

Page 15: STA291 Statistical Methods Lecture 28. Extrapolation and Prediction Extrapolating – predicting a y value by extending the regression model to regions.

WARNING:

Be suspicious of conclusions based on regressions of summary data.

Regressions based on summary data may look better than they really are!

In particular, the strength of the correlation will be misleading.

Working with Summary Values

Page 16: STA291 Statistical Methods Lecture 28. Extrapolation and Prediction Extrapolating – predicting a y value by extending the regression model to regions.

AutocorrelationTime-series data are sometimes autocorrelated, meaning points near each other in time will be related.First-order autocorrelation:

Adjacent measurements are related

Second-order autocorrelation:Every other measurement is related

etc…

Autocorrelation violates the independence condition. Regression analysis of autocorrelated data can produce misleading results.

Page 17: STA291 Statistical Methods Lecture 28. Extrapolation and Prediction Extrapolating – predicting a y value by extending the regression model to regions.

Transforming (Re-expressing) Data

An aside

On using technology:

Page 18: STA291 Statistical Methods Lecture 28. Extrapolation and Prediction Extrapolating – predicting a y value by extending the regression model to regions.

Linearity

Some data show departures from linearity.

Example: Auto Weight vs. Fuel Efficiency

Linearity condition is not satisfied.

Transforming (Re-expressing) Data

Page 19: STA291 Statistical Methods Lecture 28. Extrapolation and Prediction Extrapolating – predicting a y value by extending the regression model to regions.

Linearity

In cases involving upward bends of negatively-correlated data, try analyzing –1/y (negative reciprocal of y) vs. x instead.

Linearity condition now appears satisfied.

Transforming (Re-expressing) Data

Page 20: STA291 Statistical Methods Lecture 28. Extrapolation and Prediction Extrapolating – predicting a y value by extending the regression model to regions.

The auto weight vs. fuel economy example illustrates the principle of transforming data.

There is nothing sacred about the way x-values or y-values are measured. From the standpoint of measurement, all of the following may be equally-reasonable:

x vs. y

x vs. –1/y

x2 vs. y

x vs. log(y)

One or more of these transformations may be useful for making data more linear, more normal, etc.

Transforming (Re-expressing) Data

Page 21: STA291 Statistical Methods Lecture 28. Extrapolation and Prediction Extrapolating – predicting a y value by extending the regression model to regions.

Goals of Re-expression

Goal 1 Make the distribution of a variable more symmetric.

vs. y x vs. logy x

Transforming (Re-expressing) Data

Page 22: STA291 Statistical Methods Lecture 28. Extrapolation and Prediction Extrapolating – predicting a y value by extending the regression model to regions.

Goals of Re-expression

Goal 2 Make the spread of several groups more alike.

vs. y x log vs. y x

We’ll see methods later in the book that can be applied only to groups with a common standard deviation.

Transforming (Re-expressing) Data

Page 23: STA291 Statistical Methods Lecture 28. Extrapolation and Prediction Extrapolating – predicting a y value by extending the regression model to regions.

Looking backMake sure the relationship is straight enough to fit a regression model.

Beware of extrapolating.

Treat unusual points honestly. You must not eliminate points simply to “get a good fit”.

Watch out when dealing with data that are summaries.

Re-express your data when necessary.