Happiness comes not from material wealth but less desire.

Post on 19-Mar-2016


Happiness comes not from material wealth but less desire. Applied Statistics Using SAS and SPSS. Topic: Simple linear regression By Prof Kelly Fan, Cal State Univ, East Bay. - PowerPoint PPT Presentation

Transcript

1

Happiness comes not from material wealth but less desire.

2

Applied Statistics Using SAS and SPSS

Topic: Simple linear regression

By Prof Kelly Fan, Cal State Univ, East Bay

3

Example: Computer Repair

A company markets and repairs small computers. How quickly an electronic component (Computer Unit) can be repaired (Time) is very important to the company's efficiency. The variables in this example are Time and Units.

4

Hmm… How long will it take me to repair this unit?

Goal: to predict the length of repair Time for a given number of computer Units

5

Computer Repair Data

Units   Minutes   Units   Minutes
  1       23        6       97
  2       29        7      109
  3       49        8      119
  4       64        9      149
  4       74        9      145
  5       87       10      154
  6       96       10      166

6

Graphical Summary of Two Quantitative Variables

Scatterplot of the response variable against the explanatory variable:

What is the overall (average) pattern?

What is the direction of the pattern?

How much do the data points vary from the overall (average) pattern?

Are there any potential outliers?

7

Scatterplot (Time vs Units): Some Simple Conclusions

Summary for Computer Repair Data

Time is linearly related to the number of computer Units.

(The length of) Time increases as (the number of) Units increases.

The data points are close to the line.

There are no potential outliers.

8

Numerical Summary of Two Quantitative Variables

Regression equation

Correlation

9

Review: Math Equation for a Line

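Y: the response variable; X: the explanatory variable

$Y = b_0 + b_1X$

$b_0$: the intercept (the value of Y when X = 0)

$b_1$: the slope (the change in Y when X increases by 1)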
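To make the roles of $b_0$ and $b_1$ concrete, here is a minimal Python sketch. The function name `line` and the values b0 = 2.0 and b1 = 0.5 are hypothetical, chosen only for illustration; the point is that the line's value at X = 0 is the intercept, and a one-unit step in X changes Y by exactly the slope.

```python
def line(x: float, b0: float, b1: float) -> float:
    """Evaluate the straight line Y = b0 + b1 * X at the point x (hypothetical helper)."""
    return b0 + b1 * x


b0, b1 = 2.0, 0.5  # hypothetical intercept and slope, for illustration only

# The intercept b0 is the value of Y when X = 0.
print(line(0, b0, b1))                    # 2.0
# The slope b1 is the change in Y when X increases by one unit.
print(line(4, b0, b1) - line(3, b0, b1))  # 0.5
```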

10

Regression Equation

The regression line models the relationship between X and Y on average.

The mathematical equation of a regression line is called the regression equation.
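The slides let SAS/SPSS find the regression line. As a sketch of what those packages compute, the least-squares estimates are $b_1 = S_{xy}/S_{xx}$ and $b_0 = \bar{y} - b_1\bar{x}$, where $S_{xx} = \sum(x_i - \bar{x})^2$ and $S_{xy} = \sum(x_i - \bar{x})(y_i - \bar{y})$. The Python below (standard library only; the helper name `least_squares` is our own) applies these formulas to the computer repair data and should reproduce the SPSS coefficients 4.162 and 15.509 up to rounding.

```python
# Computer repair data (Units, Minutes) from slide 5.
units = [1, 2, 3, 4, 4, 5, 6, 6, 7, 8, 9, 9, 10, 10]
minutes = [23, 29, 49, 64, 74, 87, 96, 97, 109, 119, 149, 145, 154, 166]


def least_squares(x, y):
    """Return (b0, b1) for the least-squares line y = b0 + b1 * x (hypothetical helper)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = s_xy / s_xx          # slope
    b0 = y_bar - b1 * x_bar   # intercept: the line passes through (x_bar, y_bar)
    return b0, b1


b0, b1 = least_squares(units, minutes)
print(f"Time-hat = {b0:.3f} + {b1:.3f} * Units")  # Time-hat = 4.162 + 15.509 * Units
```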

11

The Usage of Regression Equation

Predict the value of Y for a given X value. E.g., how long will it take to repair 3 computer units?

12

General Notation

$\hat{Y}$ is called “predicted Y,” pronounced “y hat”; it estimates the average Y value for a specified X value:

$\hat{Y} = b_0 + b_1X$

E.g., the predicted repair time for a given number of units:

$\hat{Y} = 4.16 + 15.51X$
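Answering the slide 11 question is a single substitution into $\hat{Y} = b_0 + b_1X$. The sketch below uses the SPSS Coefficients values b0 = 4.162 and b1 = 15.509; the function name `predict` is our own.

```python
B0 = 4.162   # intercept, from the SPSS Coefficients table
B1 = 15.509  # slope for units, from the SPSS Coefficients table


def predict(x: float, b0: float = B0, b1: float = B1) -> float:
    """Predicted average repair time (minutes) for x units: y-hat = b0 + b1 * x (hypothetical helper)."""
    return b0 + b1 * x


print(round(predict(3), 3))  # 50.689
```

So the model predicts an average repair time of about 51 minutes for 3 units.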

13

The Limitation of the Regression Equation

The regression equation should not be used to predict Y for X values that are (far) outside the range in which the data were observed.

E.g., the predicted weight (WT, in pounds) for a given height (HT, in inches):

$\hat{Y} = -205 + 5X$

For a height of 40”, the regression equation gives a weight of -205 + 5 × 40 = -5 pounds!
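One way to honor this limitation in code is to refuse predictions outside the observed X range. The guard below is our own illustration, not a SAS or SPSS feature, and `predict_in_range` is a hypothetical name. The slides do not give the range of heights in the WT/HT data, so the guard is demonstrated on the repair data (Units observed from 1 to 10), and the height arithmetic is shown unguarded.

```python
def predict_in_range(x, b0, b1, x_min, x_max):
    """Return b0 + b1 * x only if x lies in the observed range [x_min, x_max] (hypothetical helper)."""
    if not (x_min <= x <= x_max):
        raise ValueError(f"x={x} is outside the observed range [{x_min}, {x_max}]: extrapolation")
    return b0 + b1 * x


# Repair data: Units were observed between 1 and 10, so X = 3 is allowed.
print(predict_in_range(3, 4.162, 15.509, 1, 10))

# X = 40 units is far beyond the data, so the guard refuses it.
try:
    predict_in_range(40, 4.162, 15.509, 1, 10)
except ValueError as err:
    print("refused:", err)

# Without a guard, the WT/HT line returns a nonsensical weight for HT = 40.
print(-205 + 5 * 40)  # -5
```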

14

The Unpredicted Part

The value $Y - \hat{Y}$ is the part the regression equation (model) cannot predict; it is called the “residual.”
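Residuals are observed minus predicted values. The sketch below (our own variable names) computes $Y - \hat{Y}$ for the repair data. For a least-squares line with an intercept the residuals sum to zero, and their sum of squares is the Residual Sum of Squares reported by SPSS (348.848).

```python
# Computer repair data (Units, Minutes) from slide 5.
units = [1, 2, 3, 4, 4, 5, 6, 6, 7, 8, 9, 9, 10, 10]
minutes = [23, 29, 49, 64, 74, 87, 96, 97, 109, 119, 149, 145, 154, 166]

# Least-squares fit: b1 = Sxy / Sxx, b0 = y_bar - b1 * x_bar.
n = len(units)
x_bar = sum(units) / n
y_bar = sum(minutes) / n
b1 = sum((x - x_bar) * (y - y_bar) for x, y in zip(units, minutes)) / sum(
    (x - x_bar) ** 2 for x in units
)
b0 = y_bar - b1 * x_bar

# Residual = observed Y minus predicted Y-hat.
fitted = [b0 + b1 * x for x in units]
residuals = [y - y_hat for y, y_hat in zip(minutes, fitted)]
sse = sum(e ** 2 for e in residuals)

print(f"sum of residuals = {sum(residuals):.6f}")  # essentially zero
print(f"sum of squared residuals = {sse:.3f}")     # 348.848
```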

15

[Figure: the residual for a data point, marked on the scatterplot]

16

Correlation between X and Y

X and Y might be related to each other in many ways: linear or curved.

17

Examples of Different Levels of Correlation

[Figure: two scatterplots of y vs x: r = .98 (strong linearity) and r = .71 (medium linearity)]

18

Examples of Different Levels of Correlation

[Figure: two scatterplots of y vs x: r = -.09 (nearly uncorrelated) and r = .00 (curved, not linear)]

19

(Pearson) Correlation Coefficient of X and Y

A measure of the strength of the “LINEAR” association between X and Y

$s_x$: the standard deviation of the data values in X; $s_y$: the standard deviation of the data values in Y. The correlation coefficient of X and Y is:

$$r = \frac{\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y})}{(n-1)\,s_x s_y}$$
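The formula translates directly into code. This sketch uses Python's `statistics` module for the means and the sample standard deviations (which divide by n - 1, matching the formula); `pearson_r` is our own name. For the repair data it should give r ≈ 0.994, the R value in the SPSS Model Summary.

```python
import statistics

# Computer repair data (Units, Minutes) from slide 5.
units = [1, 2, 3, 4, 4, 5, 6, 6, 7, 8, 9, 9, 10, 10]
minutes = [23, 29, 49, 64, 74, 87, 96, 97, 109, 119, 149, 145, 154, 166]


def pearson_r(x, y):
    """Pearson correlation: sum of cross-products over (n - 1) * s_x * s_y (hypothetical helper)."""
    n = len(x)
    x_bar, y_bar = statistics.mean(x), statistics.mean(y)
    s_x, s_y = statistics.stdev(x), statistics.stdev(y)  # sample SDs (divide by n - 1)
    cross = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    return cross / ((n - 1) * s_x * s_y)


r = pearson_r(units, minutes)
print(round(r, 3))  # 0.994
```

On Python 3.10 or later, `statistics.correlation(units, minutes)` should agree with this value.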

20

Correlation Coefficient of X and Y

-1 ≤ r ≤ 1. The magnitude of r measures the strength of the linear association between X and Y.

The sign of r indicates the direction of the association: “-” means a negative association; “+” means a positive association.

21

Goodness of Fit

$R^2$ is the proportion of the variance in Y explained (accounted for) by the model used to fit the data.

When there is only one X (simple linear regression), $R^2 = r^2$.
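A quick numerical check of $R^2 = r^2$ on the repair data, sketched in Python with our own variable names: $R^2$ is computed as $1 - SSE/SST$, r from the sums of squares and cross-products, and the two agree (both about 0.987, matching the SPSS R Square).

```python
import math

# Computer repair data (Units, Minutes) from slide 5.
units = [1, 2, 3, 4, 4, 5, 6, 6, 7, 8, 9, 9, 10, 10]
minutes = [23, 29, 49, 64, 74, 87, 96, 97, 109, 119, 149, 145, 154, 166]

n = len(units)
x_bar = sum(units) / n
y_bar = sum(minutes) / n
s_xx = sum((x - x_bar) ** 2 for x in units)
s_yy = sum((y - y_bar) ** 2 for y in minutes)  # total sum of squares (SST)
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(units, minutes))

b1 = s_xy / s_xx
b0 = y_bar - b1 * x_bar
sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(units, minutes))

r_squared = 1 - sse / s_yy         # proportion of Y variation explained by the line
r = s_xy / math.sqrt(s_xx * s_yy)  # Pearson correlation
print(f"R^2 = {r_squared:.3f}, r^2 = {r ** 2:.3f}")  # R^2 = 0.987, r^2 = 0.987
```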

22

SPSS Output

Analyze >> Regression >> Linear

Model Summary(b)

Model   R         R Square   Adjusted R Square   Std. Error of the Estimate
1       .994(a)   .987       .986                5.39172

a. Predictors: (Constant), units
b. Dependent Variable: time

ANOVA(b)

Model 1      Sum of Squares   df   Mean Square   F         Sig.
Regression   27419.509        1    27419.509     943.201   .000(a)
Residual     348.848          12   29.071
Total        27768.357        13

a. Predictors: (Constant), units
b. Dependent Variable: time
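Every number in the ANOVA table and the Model Summary can be rebuilt from the data. The sketch below (our own variable names, standard library only) computes the regression, residual, and total sums of squares, their degrees of freedom (1, n - 2, n - 1), the mean squares, the F statistic, and the Std. Error of the Estimate as the square root of the residual mean square (MSE). The results should match the SPSS output above up to rounding.

```python
import math

# Computer repair data (Units, Minutes) from slide 5.
units = [1, 2, 3, 4, 4, 5, 6, 6, 7, 8, 9, 9, 10, 10]
minutes = [23, 29, 49, 64, 74, 87, 96, 97, 109, 119, 149, 145, 154, 166]

n = len(units)
x_bar = sum(units) / n
y_bar = sum(minutes) / n
s_xx = sum((x - x_bar) ** 2 for x in units)
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(units, minutes))
b1 = s_xy / s_xx
b0 = y_bar - b1 * x_bar

ss_total = sum((y - y_bar) ** 2 for y in minutes)
ss_resid = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(units, minutes))
ss_reg = ss_total - ss_resid

df_reg, df_resid = 1, n - 2     # one predictor; n - 2 residual degrees of freedom
ms_reg = ss_reg / df_reg
ms_resid = ss_resid / df_resid  # MSE
f_stat = ms_reg / ms_resid
root_mse = math.sqrt(ms_resid)  # "Std. Error of the Estimate"

print(f"Regression  SS={ss_reg:10.3f}  df={df_reg:2d}  MS={ms_reg:10.3f}  F={f_stat:.3f}")
print(f"Residual    SS={ss_resid:10.3f}  df={df_resid:2d}  MS={ms_resid:10.3f}")
print(f"Total       SS={ss_total:10.3f}  df={n - 1:2d}")
print(f"Std. Error of the Estimate = {root_mse:.5f}")
```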

23

Confidence Intervals

Coefficients(a)

Model 1      Unstandardized B   Std. Error   Standardized Beta   t        Sig.   95% CI for B: Lower   Upper
(Constant)   4.162              3.355                            1.240    .239   -3.148                11.472
units        15.509             .505         .994                30.712   .000   14.409                16.609

a. Dependent Variable: time
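The confidence intervals follow the usual recipe: estimate ± t critical value × standard error, with $SE(b_1) = s/\sqrt{S_{xx}}$ and $SE(b_0) = s\sqrt{1/n + \bar{x}^2/S_{xx}}$, where s is the Std. Error of the Estimate. To stay within the standard library, this sketch hardcodes the 97.5% t quantile with 12 degrees of freedom (about 2.1788, from a t table) instead of calling a statistics package; the variable names are our own.

```python
import math

# Computer repair data (Units, Minutes) from slide 5.
units = [1, 2, 3, 4, 4, 5, 6, 6, 7, 8, 9, 9, 10, 10]
minutes = [23, 29, 49, 64, 74, 87, 96, 97, 109, 119, 149, 145, 154, 166]

n = len(units)
x_bar = sum(units) / n
y_bar = sum(minutes) / n
s_xx = sum((x - x_bar) ** 2 for x in units)
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(units, minutes))
b1 = s_xy / s_xx
b0 = y_bar - b1 * x_bar

sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(units, minutes))
s = math.sqrt(sse / (n - 2))  # Std. Error of the Estimate

se_b1 = s / math.sqrt(s_xx)
se_b0 = s * math.sqrt(1 / n + x_bar ** 2 / s_xx)
T_CRIT = 2.1788  # t quantile for a two-sided 95% interval, df = n - 2 = 12 (from a t table)

rows = {}
for name, est, se in [("(Constant)", b0, se_b0), ("units", b1, se_b1)]:
    t_stat = est / se  # t statistic for H0: coefficient = 0
    lower, upper = est - T_CRIT * se, est + T_CRIT * se
    rows[name] = (est, se, t_stat, lower, upper)
    print(f"{name:<11} B={est:7.3f}  SE={se:.3f}  t={t_stat:6.3f}  95% CI=({lower:.3f}, {upper:.3f})")
```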

24

Check for Normality
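The plot for this check did not survive in the transcript. A common numerical version of a normal probability plot pairs the sorted standardized residuals with expected standard normal quantiles and checks that the points lie close to a straight line (a correlation near 1). This sketch assumes two things that may differ from SPSS: standardized residual = residual / √MSE, and plotting positions (i - 0.5)/n. It uses `statistics.NormalDist` (Python 3.8+); `corr` is our own helper.

```python
import math
from statistics import NormalDist

# Computer repair data (Units, Minutes) from slide 5.
units = [1, 2, 3, 4, 4, 5, 6, 6, 7, 8, 9, 9, 10, 10]
minutes = [23, 29, 49, 64, 74, 87, 96, 97, 109, 119, 149, 145, 154, 166]

n = len(units)
x_bar = sum(units) / n
y_bar = sum(minutes) / n
b1 = sum((x - x_bar) * (y - y_bar) for x, y in zip(units, minutes)) / sum(
    (x - x_bar) ** 2 for x in units
)
b0 = y_bar - b1 * x_bar
residuals = [y - (b0 + b1 * x) for x, y in zip(units, minutes)]
s = math.sqrt(sum(e ** 2 for e in residuals) / (n - 2))  # root MSE

# Sorted standardized residuals (assumed definition: residual / root MSE).
z_sorted = sorted(e / s for e in residuals)
# Expected standard normal quantiles at plotting positions (i - 0.5) / n (an assumption).
expected = [NormalDist().inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]


def corr(a, b):
    """Pearson correlation of two equal-length sequences (hypothetical helper)."""
    a_bar, b_bar = sum(a) / len(a), sum(b) / len(b)
    cross = sum((ai - a_bar) * (bi - b_bar) for ai, bi in zip(a, b))
    return cross / math.sqrt(sum((ai - a_bar) ** 2 for ai in a) * sum((bi - b_bar) ** 2 for bi in b))


for q, z in zip(expected, z_sorted):
    print(f"{q:6.2f} {z:6.2f}")  # expected normal quantile vs. observed standardized residual
print(f"normal-plot correlation = {corr(expected, z_sorted):.3f}")
```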

25

Check for Equal Variances: scatterplot of ZRESID vs. ZPRED
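The points for this scatterplot can be computed directly. In this sketch we assume ZRESID is the residual divided by √MSE and ZPRED is the predicted value standardized to mean 0 and standard deviation 1; these are our reading of the SPSS variables, not something the slides state. On the plot, an even horizontal band with no funnel shape is consistent with equal variances.

```python
import math
import statistics

# Computer repair data (Units, Minutes) from slide 5.
units = [1, 2, 3, 4, 4, 5, 6, 6, 7, 8, 9, 9, 10, 10]
minutes = [23, 29, 49, 64, 74, 87, 96, 97, 109, 119, 149, 145, 154, 166]

n = len(units)
x_bar = sum(units) / n
y_bar = sum(minutes) / n
b1 = sum((x - x_bar) * (y - y_bar) for x, y in zip(units, minutes)) / sum(
    (x - x_bar) ** 2 for x in units
)
b0 = y_bar - b1 * x_bar

fitted = [b0 + b1 * x for x in units]
residuals = [y - f for y, f in zip(minutes, fitted)]
s = math.sqrt(sum(e ** 2 for e in residuals) / (n - 2))  # root MSE

# Assumed definitions of the two plotted variables.
zresid = [e / s for e in residuals]
mean_fit, sd_fit = statistics.mean(fitted), statistics.stdev(fitted)
zpred = [(f - mean_fit) / sd_fit for f in fitted]

for zp, zr in zip(zpred, zresid):
    print(f"ZPRED={zp:6.2f}  ZRESID={zr:6.2f}")
```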

26

The Influence of Outliers

[Figure: scatterplot of Y3 vs X3]

The slope becomes smaller (the line is pulled toward the outlier).

The r value becomes smaller (the association looks less linear).

27

The Influence of Outliers

[Figure: Scatterplot of y vs x]

A clear slope appears (the line is pulled toward the outlier).

The |r| value becomes larger (looks more linear: from 0.159 to 0.935).

28

Identify Outliers using Residual Plots

Use standardized residuals.

Cases with standardized residuals of magnitude 3 or more are flagged as outliers.
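Applying this rule to the repair data is a short filter. The sketch assumes standardized residual = residual / √MSE (one common definition; SPSS also offers studentized versions); `flagged` is our own name. The largest magnitude in this data set is about 1.7, so no case is flagged, which agrees with the earlier conclusion that there are no potential outliers.

```python
import math

# Computer repair data (Units, Minutes) from slide 5.
units = [1, 2, 3, 4, 4, 5, 6, 6, 7, 8, 9, 9, 10, 10]
minutes = [23, 29, 49, 64, 74, 87, 96, 97, 109, 119, 149, 145, 154, 166]

n = len(units)
x_bar = sum(units) / n
y_bar = sum(minutes) / n
b1 = sum((x - x_bar) * (y - y_bar) for x, y in zip(units, minutes)) / sum(
    (x - x_bar) ** 2 for x in units
)
b0 = y_bar - b1 * x_bar
residuals = [y - (b0 + b1 * x) for x, y in zip(units, minutes)]
s = math.sqrt(sum(e ** 2 for e in residuals) / (n - 2))  # root MSE

# Standardized residual (assumed definition): residual / root MSE.
std_resid = [e / s for e in residuals]

# Flag cases whose standardized residual has magnitude 3 or more.
flagged = [
    (case, x, y, round(z, 2))
    for case, (x, y, z) in enumerate(zip(units, minutes, std_resid), start=1)
    if abs(z) >= 3
]
print(flagged)  # []
```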
