
Simple Linear Regression and Correlation

Jan 03, 2016


kirsten-damia

Simple Linear Regression and Correlation. Introduction. Regression refers to the statistical technique of modeling the relationship between variables. In simple linear regression, we model the relationship between two variables. - PowerPoint PPT Presentation
Transcript
Page 1: Simple Linear Regression and Correlation

Simple Linear Regression and Correlation

Page 2: Simple Linear Regression and Correlation

Introduction

• Regression refers to the statistical technique of modeling the relationship between variables.

• In simple linear regression, we model the relationship between two variables.

• One of the variables, denoted by Y, is called the dependent variable and the other, denoted by X, is called the independent variable.

• The model we will use to depict the relationship between X and Y will be a straight-line relationship.

• A graphical sketch of the pairs (X, Y) is called a scatter plot.

Page 3: Simple Linear Regression and Correlation

This scatterplot locates pairs of observations of advertising expenditures on the x-axis and sales on the y-axis. We notice that:

Larger (smaller) values of sales tend to be associated with larger (smaller) values of advertising.

[Figure: Scatterplot of Advertising Expenditures (X) and Sales (Y). x-axis: Advertising, 0 to 50; y-axis: Sales, 0 to 140.]

The scatter of points tends to be distributed around a positively sloped straight line.

The pairs of values of advertising expenditures and sales are not located exactly on a straight line.

The scatter plot reveals a more or less strong tendency rather than a precise linear relationship.

The line represents the nature of the relationship on average.

Using Statistics

Page 4: Simple Linear Regression and Correlation

Examples of Other Scatterplots

[Figure: several scatterplot panels, each plotting Y against X.]

Page 5: Simple Linear Regression and Correlation

Simple Linear Regression Model

The equation that describes how y is related to x and an error term is called the regression model.

The simple linear regression model is:

$y = a + bx + \varepsilon$

where:

a and b are called parameters of the model; a is the intercept and b is the slope.

$\varepsilon$ is a random variable called the error term.

Page 6: Simple Linear Regression and Correlation

Assumptions of the Simple Linear Regression Model

• The relationship between X and Y is a straight-line relationship.

• The errors $\varepsilon_i$ are normally distributed with mean 0 and variance $\sigma^2$. The errors are uncorrelated (not related) in successive observations.

• That is: $\varepsilon \sim N(0, \sigma^2)$

[Figure: identical normal distributions of errors, all centered on the regression line $E[Y] = \beta_0 + \beta_1 X$, drawn on axes X and Y. Here $\beta_0$ and $\beta_1$ are the intercept and slope, written a and b elsewhere in these slides.]
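To make the model and its error assumptions concrete, the following is a minimal simulation sketch. The parameter values (intercept 2.0, slope 0.5, sigma 1.0), the function name, and the seed are illustrative assumptions and do not come from the slides.

```python
import random

def simulate_simple_linear_model(xs, a, b, sigma, seed=0):
    """Draw y = a + b*x + eps for each x, with independent eps ~ N(0, sigma^2)."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    errors = [rng.gauss(0.0, sigma) for _ in xs]
    ys = [a + b * x + eps for x, eps in zip(xs, errors)]
    return ys, errors

# Hypothetical inputs: 10,000 x values between 0 and 100.
xs = [i / 100 for i in range(10_000)]
ys, errors = simulate_simple_linear_model(xs, a=2.0, b=0.5, sigma=1.0)

# Under the assumptions, the errors should average out close to zero.
mean_error = sum(errors) / len(errors)
print(f"mean of simulated errors: {mean_error:.3f}")
```

Each simulated Y scatters around the line a + bX with the same spread at every X, which is what the identical normal curves in the figure depict.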

Page 7: Simple Linear Regression and Correlation

Errors in Regression

[Figure: the observed data point $(X_i, Y_i)$ and the fitted regression line $\hat{Y} = a + bX$, drawn on axes X and Y.]

$\hat{Y}_i$: the predicted value of Y for $X_i$

Error: $e_i = Y_i - \hat{Y}_i$

Page 8: Simple Linear Regression and Correlation

SIMPLE REGRESSION AND CORRELATION

Estimating Using the Regression Line

First, let's look at the equation of a straight line:

$Y = a + bX$

where Y is the dependent variable, a is the Y-intercept, b is the slope of the line, and X is the independent variable.

Page 9: Simple Linear Regression and Correlation

SIMPLE REGRESSION AND CORRELATION

The Method of Least Squares

To estimate the straight line we have to use the least squares method.

This method minimizes the sum of squares of error between the estimated points on the line and the actual observed points.
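Written out, the quantity being minimized can be sketched as follows. The notation follows these slides (fitted line with intercept a and slope b); the index i and the use of the label SSE for this sum are notational assumptions.

```latex
\[
\mathrm{SSE}(a, b) = \sum_{i=1}^{n} \bigl(Y_i - \hat{Y}_i\bigr)^2
                   = \sum_{i=1}^{n} \bigl(Y_i - a - bX_i\bigr)^2
\]
% Setting the partial derivatives with respect to a and b to zero
% gives the two normal equations:
\[
\sum Y = na + b\sum X, \qquad \sum XY = a\sum X + b\sum X^2
\]
```

Solving these two equations for a and b yields the slope and intercept of the best-fitting line.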

Page 10: Simple Linear Regression and Correlation

SIMPLE REGRESSION AND CORRELATION

The estimating line: $\hat{Y} = a + bX$

Slope of the Best-fitting Regression Line

$b = \frac{n\sum XY - \sum X \sum Y}{n\sum X^2 - \left(\sum X\right)^2}$

Y-intercept of the Best-fitting Regression Line

$a = \bar{Y} - b\bar{X}$

Page 11: Simple Linear Regression and Correlation

SIMPLE REGRESSION - EXAMPLE

Suppose an appliance store conducts a five-month experiment to determine the effect of advertising on sales revenue. The results are shown below. (File PPT_Regr_example.sav)

Month   Advertising Exp. ($100s)   Sales Rev. ($1000s)
1       1                          1
2       2                          1
3       3                          2
4       4                          2
5       5                          4

Page 12: Simple Linear Regression and Correlation

SIMPLE REGRESSION - EXAMPLE

X    Y    X^2   XY
1    1    1     1
2    1    4     2
3    2    9     6
4    2    16    8
5    4    25    20

$\sum X = 15$, $\sum Y = 10$, $\sum X^2 = 55$, $\sum XY = 37$

$\bar{X} = 15/5 = 3$

$\bar{Y} = 10/5 = 2$

Page 13: Simple Linear Regression and Correlation

SIMPLE REGRESSION - EXAMPLE

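Slope:

$b = \frac{n\sum XY - \sum X \sum Y}{n\sum X^2 - \left(\sum X\right)^2} = \frac{5(37) - (15)(10)}{5(55) - (15)^2} = \frac{35}{50} = 0.7$

Intercept:

$a = \bar{Y} - b\bar{X} = 2 - 0.7(3) = -0.1$

Estimating line:

$\hat{Y} = -0.1 + 0.7X$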
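The slope and intercept calculation can be checked with a short script. This is a minimal sketch using only the five data points from the example; the function name `least_squares_fit` is a hypothetical helper, not something from the slides.

```python
# Data from the appliance-store example: advertising ($100s) and sales ($1000s).
x = [1, 2, 3, 4, 5]
y = [1, 1, 2, 2, 4]

def least_squares_fit(x, y):
    """Return (a, b) for the fitted line Y-hat = a + b*X."""
    n = len(x)
    sum_x = sum(x)
    sum_y = sum(y)
    sum_xy = sum(xi * yi for xi, yi in zip(x, y))
    sum_x2 = sum(xi ** 2 for xi in x)
    # Slope: b = (n*SumXY - SumX*SumY) / (n*SumX^2 - (SumX)^2)
    b = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
    # Intercept: a = Y-bar - b * X-bar
    a = sum_y / n - b * sum_x / n
    return a, b

a, b = least_squares_fit(x, y)
print(f"a = {a:.4f}, b = {b:.4f}")
```

Up to floating-point rounding, this reproduces the values a = -0.1 and b = 0.7 obtained by hand.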

Page 14: Simple Linear Regression and Correlation

Standard Error of Estimate

The standard error of estimate is used to measure the reliability of the estimating equation.

It measures the variability or scatter of the observed values around the regression line.

Page 15: Simple Linear Regression and Correlation

Standard Error of Estimate

$s_e = \sqrt{\frac{\sum \left(Y - \hat{Y}\right)^2}{n - 2}}$

Short-cut

$s_e = \sqrt{\frac{\sum Y^2 - a\sum Y - b\sum XY}{n - 2}}$

Page 16: Simple Linear Regression and Correlation

Standard Error of Estimate

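$Y^2$: 1, 1, 4, 4, 16

$\sum Y^2 = 26$

$s_e = \sqrt{\frac{\sum Y^2 - a\sum Y - b\sum XY}{n - 2}} = \sqrt{\frac{26 - (-0.1)(10) - 0.7(37)}{5 - 2}} = \sqrt{\frac{1.1}{3}} = 0.6055$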
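As a check on the arithmetic, the sketch below computes the standard error of estimate two ways: from the residuals directly and from the short-cut formula. It assumes the example data and the fitted values a = -0.1 and b = 0.7; the variable names are illustrative.

```python
import math

# Example data: advertising ($100s) and sales ($1000s).
x = [1, 2, 3, 4, 5]
y = [1, 1, 2, 2, 4]
a, b = -0.1, 0.7  # intercept and slope from the worked example
n = len(x)

# Definitional form: squared residuals around the fitted line.
residuals = [yi - (a + b * xi) for xi, yi in zip(x, y)]
sse = sum(e ** 2 for e in residuals)
se_definition = math.sqrt(sse / (n - 2))

# Short-cut form: uses only the column sums.
sum_y = sum(y)
sum_y2 = sum(yi ** 2 for yi in y)
sum_xy = sum(xi * yi for xi, yi in zip(x, y))
se_shortcut = math.sqrt((sum_y2 - a * sum_y - b * sum_xy) / (n - 2))

print(f"{se_definition:.4f} {se_shortcut:.4f}")
```

Both forms should agree with each other and with the hand-computed value of about 0.6055.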

Page 17: Simple Linear Regression and Correlation

Correlation Analysis

Correlation analysis is used to describe the degree to which one variable is linearly related to another.

There are two measures for describing correlation:

1. The Coefficient of Correlation

2. The Coefficient of Determination

Page 18: Simple Linear Regression and Correlation

Correlation

The correlation between two random variables, X and Y, is a measure of the degree of linear association between the two variables.

The population correlation, denoted by $\rho$, can take on any value from -1 to 1.

$\rho = -1$ indicates a perfect negative linear relationship
$-1 < \rho < 0$ indicates a negative linear relationship
$\rho = 0$ indicates no linear relationship
$0 < \rho < 1$ indicates a positive linear relationship
$\rho = 1$ indicates a perfect positive linear relationship

The absolute value of $\rho$ indicates the strength or exactness of the relationship.

Page 19: Simple Linear Regression and Correlation

Illustrations of Correlation

[Figure: six scatterplot panels of Y against X, labeled $\rho = 0$, $\rho = -.8$, $\rho = .8$, $\rho = 0$, $\rho = -1$, and $\rho = 1$.]

Page 20: Simple Linear Regression and Correlation

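The coefficient of correlation:

$r = \frac{n\sum xy - \sum x \sum y}{\sqrt{\left[n\sum x^2 - \left(\sum x\right)^2\right]\left[n\sum y^2 - \left(\sum y\right)^2\right]}}$

Sample Coefficient of Determination: $r^2$

Alternate Formula

$r^2 = \frac{a\sum Y + b\sum XY - n\bar{Y}^2}{\sum Y^2 - n\bar{Y}^2}$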

Page 21: Simple Linear Regression and Correlation

Sample Coefficient of Determination

Percentage of total variation explained by the regression.

$r^2 = \frac{a\sum Y + b\sum XY - n\bar{Y}^2}{\sum Y^2 - n\bar{Y}^2} = \frac{(-0.1)(10) + 0.7(37) - 5(2)^2}{26 - 5(2)^2} = \frac{4.9}{6} = 0.8167$

Interpretation: We can conclude that 81.67% of the variation in sales revenue is explained by the variation in advertising expenditure.

Page 22: Simple Linear Regression and Correlation

The Coefficient of Correlation or Karl Pearson’s Coefficient of Correlation

The coefficient of correlation is the square root of the coefficient of determination.

The sign of r indicates the direction of the relationship between the two variables X and Y.

The sign of r will be the same as the sign of the coefficient “b” in the regression equation Y = a + b X

Page 23: Simple Linear Regression and Correlation

SIMPLE REGRESSION AND CORRELATION

If the slope of the estimating line is positive: r is the positive square root.

If the slope of the estimating line is negative: r is the negative square root.

$r = \sqrt{r^2}$

$r = \sqrt{0.8167} = 0.9037$

The relationship between the two variables is direct.
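The correlation coefficient can also be computed directly from the column sums rather than as a square root. The sketch below does so for the example data and checks that the sign of r matches the sign of the slope; the variable names are illustrative assumptions.

```python
import math

# Example data: advertising ($100s) and sales ($1000s).
x = [1, 2, 3, 4, 5]
y = [1, 1, 2, 2, 4]
n = len(x)

sum_x, sum_y = sum(x), sum(y)
sum_xy = sum(xi * yi for xi, yi in zip(x, y))
sum_x2 = sum(xi ** 2 for xi in x)
sum_y2 = sum(yi ** 2 for yi in y)

# The numerator is shared by the slope b and the correlation r,
# which is why the two always carry the same sign.
numerator = n * sum_xy - sum_x * sum_y
r = numerator / math.sqrt(
    (n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2)
)
r_squared = r ** 2
b = numerator / (n * sum_x2 - sum_x ** 2)
same_sign = (r > 0) == (b > 0)

print(f"r = {r:.4f}, r^2 = {r_squared:.4f}, same sign as slope: {same_sign}")
```

This should agree with the hand calculation: r of about 0.9037 and a coefficient of determination of about 0.8167.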

Page 24: Simple Linear Regression and Correlation

Hypothesis Tests for the Correlation Coefficient

H0: $\rho = 0$ (No linear relationship)
H1: $\rho \neq 0$ (Some linear relationship)

Test Statistic:

$t_{(n-2)} = \frac{r}{\sqrt{\frac{1 - r^2}{n - 2}}}$
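As an illustration, the test statistic can be evaluated for the advertising example, where r is about 0.9037 and n = 5. This is a minimal sketch; the function name is a hypothetical helper.

```python
import math

def correlation_t_statistic(r, n):
    """t statistic for H0: rho = 0, with n - 2 degrees of freedom."""
    return r / math.sqrt((1 - r ** 2) / (n - 2))

# Values from the advertising example: r = 0.9037 with n = 5 observations.
t_stat = correlation_t_statistic(r=0.9037, n=5)
print(f"t = {t_stat:.2f} with {5 - 2} degrees of freedom")
```

The result is compared with a critical value from the t distribution with n - 2 degrees of freedom to decide whether to reject H0.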

Page 25: Simple Linear Regression and Correlation

Analysis-of-Variance Table and an F Test of the Regression Model

Source of Variation   Sum of Squares   Degrees of Freedom   Mean Square   F Ratio
Regression            SSR              (1)                  MSR           MSR/MSE
Error                 SSE              (n-2)                MSE
Total                 SST              (n-1)                MST

H0: The regression model is not significant
H1: The regression model is significant

Page 26: Simple Linear Regression and Correlation

We pose the question:

Is the independent variable linearly related to the dependent variable?

To answer the question we test the hypothesis

H0: b = 0

H1: b is not equal to zero.

If b is not equal to zero, the model has some validity.

Testing for the existence of linear relationship

Test statistic, with n-2 degrees of freedom:

$t = \frac{b}{s_b}$

where $s_b$ is the standard error of the slope b.

Page 27: Simple Linear Regression and Correlation

Correlations

                                                   Advertising expenses ($00)   Sales revenue ($000)
Advertising expenses ($00)   Pearson Correlation   1                            .904*
                             Sig. (2-tailed)                                    .035
                             N                     5                            5
Sales revenue ($000)         Pearson Correlation   .904*                        1
                             Sig. (2-tailed)       .035
                             N                     5                            5

*. Correlation is significant at the 0.05 level (2-tailed).

Page 28: Simple Linear Regression and Correlation

Model Summary

Model   R       R Square   Adjusted R Square   Std. Error of the Estimate
1       .904a   .817       .756                .606

a. Predictors: (Constant), Advertising expenses ($00)

ANOVAb

Model           Sum of Squares   df   Mean Square   F        Sig.
1 Regression    4.900            1    4.900         13.364   .035a
  Residual      1.100            3    .367
  Total         6.000            4

a. Predictors: (Constant), Advertising expenses ($00)
b. Dependent Variable: Sales revenue ($000)

Alternately, R2 = 1 - [SS(Residual) / SS(Total)] = 1 - (1.1/6.0) = 0.817

When adjusted for degrees of freedom, Adjusted R2 = 1 - [SS(Residual)/(n-k-1)] / [SS(Total)/(n-1)] = 1 - [1.1/3]/[6/4] = 0.756
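The entries of the ANOVA table can be rebuilt from the raw data. The sketch below assumes the five example observations and the fitted line with a = -0.1 and b = 0.7, with k = 1 predictor; the variable names are illustrative.

```python
# Example data: advertising ($100s) and sales ($1000s).
x = [1, 2, 3, 4, 5]
y = [1, 1, 2, 2, 4]
a, b = -0.1, 0.7  # fitted intercept and slope from the worked example
n = len(x)
k = 1             # number of predictors

y_bar = sum(y) / n
fitted = [a + b * xi for xi in x]

# Sums of squares: total, error (residual), and regression.
sst = sum((yi - y_bar) ** 2 for yi in y)
sse = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
ssr = sst - sse

# Mean squares and the F ratio.
msr = ssr / k
mse = sse / (n - k - 1)
f_ratio = msr / mse

# R-squared and its degrees-of-freedom adjustment.
r_squared = 1 - sse / sst
adj_r_squared = 1 - (sse / (n - k - 1)) / (sst / (n - 1))

print(f"SSR={ssr:.3f} SSE={sse:.3f} SST={sst:.3f} F={f_ratio:.3f}")
print(f"R2={r_squared:.3f} adjusted R2={adj_r_squared:.3f}")
```

These should match the table above up to rounding: sums of squares of 4.9, 1.1, and 6.0, an F ratio of about 13.364, and R2 values of about 0.817 and 0.756.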

Page 29: Simple Linear Regression and Correlation

Coefficientsa

                               Unstandardized Coefficients   Standardized Coefficients
Model                          B        Std. Error           Beta                        t       Sig.
1 (Constant)                   -.100    .635                                             -.157   .885
  Advertising expenses ($00)   .700     .191                 .904                        3.656   .035

a. Dependent Variable: Sales revenue ($000)

$\hat{Y} = -0.1 + 0.7X$

Page 30: Simple Linear Regression and Correlation

Test statistic: $F = \frac{MSR}{MSE}$

Value of the test statistic: F = 13.364

The p-value is 0.035

Conclusion: There is sufficient evidence to reject the null hypothesis in favor of the alternative hypothesis. The slope is not equal to zero. Thus, the independent variable is linearly related to y. This linear regression model is valid.

Page 31: Simple Linear Regression and Correlation

Test statistic, with n-2 degrees of freedom:

$t = \frac{b}{s_b}$

Rejection region: $|t| > 3.182$ (the two-tailed critical value at the 0.05 level with 3 degrees of freedom)

Value of the test statistic:

$t = \frac{0.7}{0.191} = 3.66$

Conclusion:

The calculated test statistic is 3.66, which is outside the acceptance region. Alternately, the actual significance is 0.035. Therefore we reject the null hypothesis. Advertising expenses are a significant explanatory variable.
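The slope test can be reproduced from the raw data. The slides report the standard error of the slope (0.191) without showing how it is obtained; the sketch below assumes the standard textbook formula s_b = s_e / sqrt(sum of (X - X-bar)^2), and takes the critical value 3.182 from the slide rather than computing it.

```python
import math

# Example data: advertising ($100s) and sales ($1000s).
x = [1, 2, 3, 4, 5]
y = [1, 1, 2, 2, 4]
a, b = -0.1, 0.7  # fitted intercept and slope from the worked example
n = len(x)

# Standard error of estimate from the residuals.
sse = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
s_e = math.sqrt(sse / (n - 2))

# Standard error of the slope (assumed textbook formula, not shown on the slides).
x_bar = sum(x) / n
sxx = sum((xi - x_bar) ** 2 for xi in x)
s_b = s_e / math.sqrt(sxx)

t_stat = b / s_b
critical_value = 3.182  # two-tailed, 0.05 level, 3 degrees of freedom (from the slide)
reject_h0 = abs(t_stat) > critical_value

print(f"s_b = {s_b:.3f}, t = {t_stat:.2f}, reject H0: {reject_h0}")
```

The computed standard error and t statistic should agree with the Coefficients table (about 0.191 and 3.66), leading to the same decision to reject the null hypothesis.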