Chapter 12
Simple Linear Regression
• Simple Linear Regression Model
• Least Squares Method
• Coefficient of Determination
• Model Assumptions
• Testing for Significance
• Using the Estimated Regression Equation for Estimation and Prediction
• Computer Solution
• Residual Analysis: Validating Model Assumptions
Simple Linear Regression Model
The equation that describes how $y$ is related to $x$ and an error term is called the regression model.
The simple linear regression model is:
$y = \beta_0 + \beta_1 x + \varepsilon$
where: $\beta_0$ and $\beta_1$ are called parameters of the model, and $\varepsilon$ is a random variable called the error term.
Simple Linear Regression Equation
The simple linear regression equation is:
$E(y) = \beta_0 + \beta_1 x$
• $E(y)$ is the expected value of $y$ for a given $x$ value.
• $\beta_1$ is the slope of the regression line.
• $\beta_0$ is the $y$ intercept of the regression line.
• The graph of the regression equation is a straight line.
Estimated Simple Linear Regression Equation
The estimated simple linear regression equation is:
$\hat{y} = b_0 + b_1 x$
• $\hat{y}$ is the estimated value of $y$ for a given $x$ value.
• $b_1$ is the slope of the line.
• $b_0$ is the $y$ intercept of the line.
• The graph is called the estimated regression line.
Reed Auto periodically has a special week-long sale. As part of the advertising campaign Reed runs one or more television commercials during the weekend preceding the sale. Data from a sample of 5 previous sales are shown on the next slide.
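
The data slide itself is not reproduced in this text, so the five (TV ads, cars sold) pairs in the sketch below are an assumption. They were chosen because they reproduce the figures quoted elsewhere in this chapter (a point estimate of 25 cars at 3 ads, t = 4.63, F = 21.43). The function name `least_squares` is illustrative; it applies the usual least squares formulas $b_1 = \sum (x_i - \bar{x})(y_i - \bar{y}) / \sum (x_i - \bar{x})^2$ and $b_0 = \bar{y} - b_1 \bar{x}$.

```python
# ASSUMED data: these five pairs are not listed in this excerpt.
tv_ads = [1, 3, 2, 1, 3]
cars_sold = [14, 24, 18, 17, 27]


def least_squares(x, y):
    """Return (b0, b1) for the estimated line y_hat = b0 + b1 * x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Sum of cross-products and sum of squared deviations of x.
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sxy / sxx            # slope
    b0 = y_bar - b1 * x_bar   # y intercept
    return b0, b1


b0, b1 = least_squares(tv_ads, cars_sold)
print(f"y_hat = {b0:g} + {b1:g} x")
```

Under these assumed values the fitted line gives 25 cars at x = 3, which agrees with the point estimate reported in the interval output later in the chapter.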
Coefficient of Determination
Relationship Among SST, SSR, SSE
$SST = SSR + SSE$
where:
SST = total sum of squares
SSR = sum of squares due to regression
SSE = sum of squares due to error
The coefficient of determination is $r^2 = SSR / SST$.
The regression relationship is very strong; 88% of the variability in the number of cars sold can be explained by the linear relationship between the number of TV ads and the number of cars sold.
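
As a rough check on the 88% figure, the sketch below computes the three sums of squares and $r^2 = SSR / SST$. The five data pairs are an assumption (the data slide is not part of this text); they are used because they yield an $r^2$ that rounds to the quoted 88%.

```python
# ASSUMED data: not listed in this excerpt.
tv_ads = [1, 3, 2, 1, 3]
cars_sold = [14, 24, 18, 17, 27]

n = len(tv_ads)
x_bar = sum(tv_ads) / n
y_bar = sum(cars_sold) / n

# Least squares slope and intercept.
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(tv_ads, cars_sold))
sxx = sum((x - x_bar) ** 2 for x in tv_ads)
b1 = sxy / sxx
b0 = y_bar - b1 * x_bar
y_hat = [b0 + b1 * x for x in tv_ads]

sst = sum((y - y_bar) ** 2 for y in cars_sold)               # total
ssr = sum((yh - y_bar) ** 2 for yh in y_hat)                 # due to regression
sse = sum((y - yh) ** 2 for y, yh in zip(cars_sold, y_hat))  # due to error
r_squared = ssr / sst

print(f"SST = {sst:g}, SSR = {ssr:g}, SSE = {sse:g}, r^2 = {r_squared:.4f}")
```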
Assumptions About the Error Term $\varepsilon$
1. The error $\varepsilon$ is a random variable with mean of zero.
2. The variance of $\varepsilon$, denoted by $\sigma^2$, is the same for all values of the independent variable.
3. The values of $\varepsilon$ are independent.
4. The error $\varepsilon$ is a normally distributed random variable.
Testing for Significance
To test for a significant regression relationship, we must conduct a hypothesis test to determine whether the value of $\beta_1$ is zero.
Two tests are commonly used: the $t$ test and the $F$ test.
Both the $t$ test and $F$ test require an estimate of $\sigma^2$, the variance of $\varepsilon$ in the regression model.
The mean square error (MSE) provides the estimate of $\sigma^2$, and the notation $s^2$ is also used.
Testing for Significance: t Test
5. Compute the value of the test statistic.
$t = b_1 / s_{b_1} = 4.63$
6. Determine whether to reject $H_0$.
$t = 4.541$ provides an area of .01 in the upper tail. Hence, the two-tailed p-value is less than .02. (Also, $t = 4.63 > 3.182$.) We can reject $H_0$.
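
A minimal sketch of how the test statistic could be computed, assuming the five data pairs shown in the code (the data slide is not included in this text). It uses $s^2 = MSE = SSE / (n - 2)$ and $s_{b_1} = s / \sqrt{\sum (x_i - \bar{x})^2}$. The critical value 3.182 is taken from the slide rather than computed.

```python
import math

# ASSUMED data: not listed in this excerpt.
tv_ads = [1, 3, 2, 1, 3]
cars_sold = [14, 24, 18, 17, 27]

n = len(tv_ads)
x_bar = sum(tv_ads) / n
y_bar = sum(cars_sold) / n
sxx = sum((x - x_bar) ** 2 for x in tv_ads)
b1 = sum((x - x_bar) * (y - y_bar) for x, y in zip(tv_ads, cars_sold)) / sxx
b0 = y_bar - b1 * x_bar

# Mean square error: the estimate s^2 of sigma^2, with n - 2 degrees of freedom.
sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(tv_ads, cars_sold))
mse = sse / (n - 2)

# Estimated standard deviation of b1, then the t statistic for H0: beta_1 = 0.
s_b1 = math.sqrt(mse) / math.sqrt(sxx)
t_stat = b1 / s_b1

t_critical = 3.182  # value quoted on the slide
reject_h0 = abs(t_stat) > t_critical
print(f"MSE = {mse:.4f}, s_b1 = {s_b1:.4f}, t = {t_stat:.2f}, reject H0: {reject_h0}")
```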
Confidence Interval for $\beta_1$
We can use a 95% confidence interval for $\beta_1$ to test the hypotheses just used in the $t$ test.
$H_0$ is rejected if the hypothesized value of $\beta_1$ is not included in the confidence interval for $\beta_1$.
The form of a confidence interval for $\beta_1$ is:
$b_1 \pm t_{\alpha/2} \, s_{b_1}$
where $t_{\alpha/2}$ is the $t$ value providing an area of $\alpha/2$ in the upper tail of a $t$ distribution with $n - 2$ degrees of freedom.
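
The interval form above can be sketched as a small helper. The function name and the inputs $b_1 = 5$ and $s_{b_1} = 1.08$ are assumptions, since neither value is stated in this text; they are consistent with the quoted $t = 4.63$ because $5 / 1.08 \approx 4.63$. The critical value 3.182 is the one quoted in the $t$ test.

```python
def slope_confidence_interval(b1, s_b1, t_crit):
    """Return (lower, upper) for b1 +/- t_crit * s_b1."""
    margin = t_crit * s_b1
    return b1 - margin, b1 + margin


# ASSUMED inputs: b1 and s_b1 are not given in this excerpt.
lower, upper = slope_confidence_interval(b1=5.0, s_b1=1.08, t_crit=3.182)

# H0: beta_1 = 0 is rejected when 0 lies outside the interval.
reject_h0 = not (lower <= 0.0 <= upper)
print(f"95% CI for beta_1: {lower:.2f} to {upper:.2f}; reject H0: {reject_h0}")
```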
Testing for Significance: F Test
1. Determine the hypotheses.
$H_0: \beta_1 = 0$, $H_a: \beta_1 \neq 0$
2. Specify the level of significance.
$\alpha = .05$
3. Select the test statistic.
$F = MSR / MSE$
4. State the rejection rule.
Reject $H_0$ if p-value < .05 or $F > 10.13$ (with 1 d.f. in numerator and 3 d.f. in denominator)
where: $F_\alpha$ is based on an $F$ distribution with 1 degree of freedom in the numerator and $n - 2$ degrees of freedom in the denominator
5. Compute the value of the test statistic.
$F = MSR / MSE = 21.43$
6. Determine whether to reject $H_0$.
$F = 17.44$ provides an area of .025 in the upper tail. Because $F = 21.43 > 17.44$, the p-value is less than .025, which is below $\alpha = .05$. Hence, we reject $H_0$.
The statistical evidence is sufficient to conclude that we have a significant relationship between the number of TV ads aired and the number of cars sold.
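
A short sketch of the $F$ statistic for simple linear regression, where MSR = SSR / 1 and MSE = SSE / (n - 2). The function name is illustrative, and the inputs SSR = 100 and SSE = 14 are assumptions (the sums of squares are not printed in this text); they reproduce the quoted $F = 21.43$. The critical value 10.13 comes from the rejection rule.

```python
def f_statistic(ssr, sse, n):
    """F = MSR / MSE for one independent variable (1 and n - 2 d.f.)."""
    msr = ssr / 1          # one independent variable
    mse = sse / (n - 2)    # estimate of sigma^2
    return msr / mse


# ASSUMED sums of squares: not given in this excerpt.
f_stat = f_statistic(ssr=100.0, sse=14.0, n=5)

f_critical = 10.13  # value quoted in the rejection rule
reject_h0 = f_stat > f_critical
print(f"F = {f_stat:.2f}, reject H0: {reject_h0}")
```

With a single independent variable the two tests agree: $F$ equals the square of the $t$ statistic ($4.63^2 \approx 21.4$).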
Some Cautions about the Interpretation of Significance Tests
Just because we are able to reject $H_0: \beta_1 = 0$ and demonstrate statistical significance does not enable us to conclude that there is a linear relationship between $x$ and $y$.
Rejecting $H_0: \beta_1 = 0$ and concluding that the relationship between $x$ and $y$ is significant does not enable us to conclude that a cause-and-effect relationship is present between $x$ and $y$.
Using the Estimated Regression Equation for Estimation and Prediction
Confidence Interval Estimate of $E(y_p)$
$\hat{y}_p \pm t_{\alpha/2} \, s_{\hat{y}_p}$
Prediction Interval Estimate of $y_p$
$\hat{y}_p \pm t_{\alpha/2} \, s_{\text{ind}}$
where: the confidence coefficient is $1 - \alpha$ and $t_{\alpha/2}$ is based on a $t$ distribution with $n - 2$ degrees of freedom
If 3 TV ads are run prior to a sale, we expect the mean number of cars sold to be:
$\hat{y}_p = 25$ cars
Excel's Confidence Interval Output (worksheet columns D-G)
Row   Label                   Value
 1    CONFIDENCE INTERVAL
 2    x_p                     3
 3    xbar                    2.0
 4    x_p - xbar              1.0
 5    (x_p - xbar)^2          1.0
 6    Σ(x_i - xbar)^2         4.0
 7    Variance of yhat        2.1000
 8    Std. Dev of yhat        1.4491
 9    t Value                 3.1824
10    Margin of Error         4.6118
11    Point Estimate          25.0
12    Lower Limit             20.39
13    Upper Limit             29.61
Confidence Interval for $E(y_p)$
The 95% confidence interval estimate of the mean number of cars sold when 3 TV ads are run is:
25 ± 4.61 = 20.39 to 29.61 cars
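
The rows of the confidence interval output can be retraced with a few lines, using $s^2_{\hat{y}_p} = s^2 \left[ 1/n + (x_p - \bar{x})^2 / \sum (x_i - \bar{x})^2 \right]$. The function name is illustrative. The value $s^2 = MSE = 14/3 \approx 4.667$ is not printed in the output; it is an assumption implied by the reported variances (6.76667 - 2.1 = 4.66667). All other inputs are read from the output.

```python
import math


def mean_response_interval(y_hat_p, mse, n, x_p, x_bar, sxx, t_crit):
    """Return (variance of y_hat, lower, upper) for the CI of E(y_p)."""
    var_y_hat = mse * (1 / n + (x_p - x_bar) ** 2 / sxx)
    margin = t_crit * math.sqrt(var_y_hat)
    return var_y_hat, y_hat_p - margin, y_hat_p + margin


var_y_hat, lower, upper = mean_response_interval(
    y_hat_p=25.0,   # point estimate from the output
    mse=14 / 3,     # ASSUMED: implied by the output, not printed in it
    n=5,            # sample of 5 previous sales
    x_p=3,
    x_bar=2.0,
    sxx=4.0,        # sum of squared deviations of x
    t_crit=3.1824,  # t value from the output
)
print(f"Variance of y_hat = {var_y_hat:.4f}; 95% CI: {lower:.2f} to {upper:.2f}")
```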
Excel's Prediction Interval Output (worksheet columns H-I)
Row   Label                   Value
 1    PREDICTION INTERVAL
 2    Variance of y_ind       6.76667
 3    Std. Dev. of y_ind      2.60128
 4    Margin of Error         8.27845
 5    Lower Limit             16.72
 6    Upper Limit             33.28
Prediction Interval for $y_p$
The 95% prediction interval estimate of the number of cars sold in one particular week when 3 TV ads are run is:
25 ± 8.28 = 16.72 to 33.28 cars
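
The prediction interval is wider than the confidence interval because the variance for an individual value adds the error variance to the variance of the estimated mean: $s^2_{\text{ind}} = s^2 + s^2_{\hat{y}_p}$. The sketch below retraces the output under that formula. The function name is illustrative, and $s^2 = 14/3$ is an assumption implied by the two reported variances rather than a value printed in the output.

```python
import math


def prediction_interval(y_hat_p, mse, var_y_hat, t_crit):
    """Return (variance of y_ind, lower, upper) for an individual y_p."""
    var_ind = mse + var_y_hat
    margin = t_crit * math.sqrt(var_ind)
    return var_ind, y_hat_p - margin, y_hat_p + margin


var_ind, lower, upper = prediction_interval(
    y_hat_p=25.0,   # point estimate from the output
    mse=14 / 3,     # ASSUMED: implied by 6.76667 - 2.1000, not printed
    var_y_hat=2.1,  # variance of y_hat from the output
    t_crit=3.1824,  # t value from the output
)
print(f"Variance of y_ind = {var_ind:.5f}; 95% PI: {lower:.2f} to {upper:.2f}")
```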
Residual Analysis: Validating Model Assumptions
If the assumptions about the error term $\varepsilon$ appear questionable, the hypothesis tests about the significance of the regression relationship and the interval estimation results may not be valid.
The residuals provide the best information about $\varepsilon$.
Residual for Observation $i$: $y_i - \hat{y}_i$
Much of the residual analysis is based on an examination of graphical plots.
If the assumption that the variance of $\varepsilon$ is the same for all values of $x$ is valid, and the assumed regression model is an adequate representation of the relationship between the variables, then the residual plot should give an overall impression of a horizontal band of points.
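
A minimal sketch of the residual computation $y_i - \hat{y}_i$, listing each residual against its $x$ value as a text stand-in for a residual plot. The data pairs and the fitted coefficients $b_0 = 10$, $b_1 = 5$ are assumptions, since neither appears in this text; the coefficients are the least squares fit to the assumed pairs.

```python
# ASSUMED data and fitted line: not listed in this excerpt.
tv_ads = [1, 3, 2, 1, 3]
cars_sold = [14, 24, 18, 17, 27]
b0, b1 = 10.0, 5.0  # least squares fit to the assumed pairs above

# Residual for observation i: observed y minus estimated y.
residuals = [y - (b0 + b1 * x) for x, y in zip(tv_ads, cars_sold)]

# Residuals against x: look for a roughly even horizontal band around zero.
for x, e in sorted(zip(tv_ads, residuals)):
    print(f"x = {x}: residual = {e:+.1f}")
```

With only five observations a plot says little about constant variance; the point here is the mechanics of forming the residuals, which sum to zero for a least squares fit with an intercept.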