4.1 Using Multiple Regression
In Chapter 3, the method of least squares was used to describe the relationship between a dependent variable y and an explanatory variable x.
Here we extend that to two or more predictor variables, using an equation of the form:
$\hat{y} = b_0 + b_1 x_1 + b_2 x_2 + \cdots + b_K x_K$
In Chapter 3 our main graphic tool was the X-Y scatter plot.
Exploratory graphics are harder to produce here because they need to be multidimensional.
Even with just two x variables, a 3-D display is needed.
Estimation of Coefficients
We want an equation of the form:
$\hat{y} = b_0 + b_1 x_1 + b_2 x_2 + \cdots + b_K x_K$
As before, we use least squares. The coefficients $b_0, b_1, b_2, \ldots, b_K$ are determined by minimizing the sum of squared residuals, $\sum_{i=1}^{n} (y_i - \hat{y}_i)^2$.
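The least squares step can be sketched in pure Python by solving the normal equations $(X'X)b = X'y$, where X is the data matrix with a leading column of 1s for $b_0$. This is a minimal illustration, not necessarily how regression software computes the fit. The function names `fit_least_squares` and `sse` and the small data set (generated from a hypothetical equation y = 1 + 2x1 + 3x2, then perturbed by made-up noise) are assumptions for illustration only.

```python
# Least squares for y-hat = b0 + b1*x1 + ... + bK*xK, solved through the
# normal equations (X'X) b = X'y. Pure Python, no external libraries.

def fit_least_squares(xs, y):
    """xs: list of n observations, each a list of K predictor values.
    y: list of n responses. Returns [b0, b1, ..., bK]."""
    # Design matrix with a leading column of 1s for the intercept b0.
    X = [[1.0] + list(row) for row in xs]
    p = len(X[0])
    # Build X'X (p x p) and X'y (length p).
    xtx = [[sum(r[i] * r[j] for r in X) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(X, y)) for i in range(p)]
    # Gaussian elimination with partial pivoting on the augmented matrix.
    a = [row[:] + [rhs] for row, rhs in zip(xtx, xty)]
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(col + 1, p):
            factor = a[r][col] / a[col][col]
            for c in range(col, p + 1):
                a[r][c] -= factor * a[col][c]
    # Back substitution.
    b = [0.0] * p
    for i in range(p - 1, -1, -1):
        b[i] = (a[i][p] - sum(a[i][j] * b[j] for j in range(i + 1, p))) / a[i][i]
    return b

def sse(xs, y, b):
    """Sum of squared residuals for coefficients b = [b0, b1, ..., bK]."""
    return sum((yi - (b[0] + sum(bj * xj for bj, xj in zip(b[1:], row)))) ** 2
               for row, yi in zip(xs, y))

# Hypothetical data with two predictors.
xs = [[1, 2], [2, 1], [3, 4], [4, 3], [5, 5]]
y_exact = [1 + 2 * x1 + 3 * x2 for x1, x2 in xs]
noise = [0.3, -0.2, 0.1, -0.4, 0.2]          # made-up disturbances
y_noisy = [yv + e for yv, e in zip(y_exact, noise)]

b_exact = fit_least_squares(xs, y_exact)     # recovers about [1, 2, 3]
b = fit_least_squares(xs, y_noisy)
print(b, sse(xs, y_noisy, b))
```

Moving any coefficient away from the fitted values raises the sum of squared residuals, which is the property that defines the least squares estimates.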
Interpretation of Coefficients
Recall that sales is in $1000s, and advertising and bonus are in $100s.
If advertising is held fixed, sales increase by about $1860 for each $100 of bonus paid.
If bonus is held fixed, sales increase by about $2470 for each $100 spent on ads.
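The unit conversion behind these statements can be checked with a small sketch that uses the Meddicorp coefficients from the regression output (Constant -516.4, ADV 2.4732, BONUS 1.8562). The ADV and BONUS levels plugged in below are hypothetical; the point is that a one-unit ($100) change in one predictor, with the other held fixed, moves predicted sales by that predictor's coefficient no matter where the other predictor sits.

```python
# Fitted Meddicorp equation, coefficients from the regression output.
# Sales are in $1000s; ADV and BONUS are in $100s.
B0, B_ADV, B_BONUS = -516.4, 2.4732, 1.8562

def predicted_sales(adv, bonus):
    """Predicted sales (in $1000s) for ADV and BONUS given in $100s."""
    return B0 + B_ADV * adv + B_BONUS * bonus

def effect_in_dollars(delta_sales_thousands):
    """Convert a change in predicted sales from $1000s to dollars."""
    return delta_sales_thousands * 1000

# Raise BONUS by one unit ($100) while holding ADV fixed, at two
# hypothetical ADV levels: the change in predicted sales is the same.
for adv in (500, 600):
    change = predicted_sales(adv, 250) - predicted_sales(adv, 249)
    print(adv, round(effect_in_dollars(change), 1))

# Raise ADV by one unit ($100) while holding BONUS fixed.
adv_change = predicted_sales(501, 250) - predicted_sales(500, 250)
print(round(effect_in_dollars(adv_change), 1))
```

The bonus effect comes out at about $1856 and the advertising effect at about $2473; the text rounds these to $1860 and $2470.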
4.2 Inferences From a Multiple Regression Analysis
In general, the population regression equation involving K predictors is:
$\mu_{y \mid x_1, \ldots, x_K} = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_K x_K$
This says the mean value of y at a given set of x values is a point on the surface described by the terms on the right-hand side of the equation.
4.2.1 Assumptions Concerning the Population Regression Line
An alternative way of writing the relationship is:
$y_i = \beta_0 + \beta_1 x_{1i} + \beta_2 x_{2i} + \cdots + \beta_K x_{Ki} + e_i$
where i denotes the ith observation and $e_i$ denotes a random error or disturbance (deviation from the mean).
We make certain assumptions about the $e_i$.
The assumptions allow inferences about the population relationship to be made from a sample equation.
The first inferences considered are those about the individual population coefficients $\beta_1, \beta_2, \ldots, \beta_K$.
Chapter 6 examines what happens when the assumptions are violated.
4.2.2 Inferences about the Population Regression Coefficients
If we wish to estimate the effect of a change in one of the x variables on y, use the interval:
$b_j - t\, s_{b_j}$ to $b_j + t\, s_{b_j}$
This refers to the jth of the K regression coefficients, and $s_{b_j}$ is the standard error of $b_j$. The multiplier t is selected from the t-distribution with n-K-1 degrees of freedom; for a $100(1-\alpha)\%$ interval it is $t_{\alpha/2,\,n-K-1}$.
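As a worked instance, here is a short sketch of the interval for the Meddicorp ADV coefficient, using $b_j = 2.4732$ and $s_{b_j} = 0.2753$ from the regression output and the multiplier t = 2.074 (22 df, 5% two-sided) quoted later in the example. Python's standard library has no t-distribution, so the critical value is simply passed in; the helper name `coef_interval` is hypothetical.

```python
# Confidence interval for one regression coefficient:
#   b_j - t * s_bj  to  b_j + t * s_bj,  with t taken from the
#   t-distribution with n - K - 1 degrees of freedom.
def coef_interval(b_j, se_bj, t_mult):
    half_width = t_mult * se_bj
    return b_j - half_width, b_j + half_width

# Meddicorp ADV coefficient and its standard error; t = 2.074 is the
# 95% (two-sided) multiplier for n - K - 1 = 25 - 2 - 1 = 22 df.
low, high = coef_interval(2.4732, 0.2753, 2.074)
print(f"{low:.4f} to {high:.4f}")   # 1.9022 to 3.0442
```

Read in the original units: with bonus held fixed, each additional $100 of advertising is estimated to raise sales by somewhere between roughly $1902 and $3044, at 95% confidence.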
To test $H_0: \beta_j = \beta_j^*$ against $H_a: \beta_j \neq \beta_j^*$, use the standardized test statistic:
$t = \dfrac{b_j - \beta_j^*}{s_{b_j}}$
The most common form of this test is for the hypothesized value $\beta_j^*$ to be 0. In this case the test statistic is just the estimate divided by its standard error.
Example 4.2 Meddicorp (Continued)
Refer again to the portion of the regression output about the individual regression coefficients:
This lists the estimates, their standard errors, and the ratio of the estimates to their standard errors.
Predictor      Coef   SE Coef      T      P
Constant     -516.4     189.9  -2.72  0.013
ADV          2.4732    0.2753   8.98  0.000
BONUS        1.8562    0.7157   2.59  0.017
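The T column in the table above is the test statistic for $H_0: \beta_j = 0$, that is, Coef divided by SE Coef. The sketch below recomputes it from the printed Coef and SE Coef values. The helper `t_ratio` is hypothetical, and its optional `hypothesized` argument plays the role of $\beta_j^*$ for tests against a nonzero value.

```python
# Recompute the T column: each t ratio is (Coef - hypothesized) / SE Coef.
# Values are copied from the Meddicorp coefficient table.
table = {
    "Constant": (-516.4, 189.9, -2.72),
    "ADV": (2.4732, 0.2753, 8.98),
    "BONUS": (1.8562, 0.7157, 2.59),
}

def t_ratio(coef, se, hypothesized=0.0):
    """Standardized test statistic (b_j - beta_j*) / s_bj."""
    return (coef - hypothesized) / se

for name, (coef, se, printed_t) in table.items():
    t = t_ratio(coef, se)
    print(f"{name:8s} t = {t:6.2f}  (printed {printed_t})")
```

Each recomputed ratio agrees with the printed T value to two decimal places.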
Tests For Effect of Advertising
To see if an increase in advertising expenditure affects sales, we can test:
$H_0: \beta_{ADV} = 0$ (an increase in advertising has no effect on sales)
$H_a: \beta_{ADV} \neq 0$ (sales do change when advertising increases)
The df are n-K-1 = 25-2-1 = 22. At a 5% significance level, the critical point from the t-table is 2.074.
t = 2.4732/.2753 = 8.98, which is > 2.074, so we reject $H_0$: advertising does affect sales.
Tests For Effect of Bonus
To see whether bonus payments increase sales, we can test the one-sided hypotheses $H_0: \beta_{BONUS} \le 0$ against $H_a: \beta_{BONUS} > 0$. At a 5% significance level with 22 df, the one-sided critical point from the t-table is 1.717.
t = 1.8562/.7157 = 2.59, which is > 1.717.
We reject $H_0$, but this time make a more specific conclusion: increasing bonus payments increases sales.
The listed p-value (.017) is for a two-sided test. For our one-sided test, cut it in half: .017/2 = .0085.
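Both decisions can be written out as a small sketch. The critical points 2.074 (two-sided) and 1.717 (one-sided) for 22 df are the t-table values used in the text and are hard-coded here rather than computed, since Python's standard library has no t-distribution. The helper names are hypothetical. Halving the printed two-sided p-value is only valid when the t statistic falls on the side named by $H_a$, as it does for BONUS.

```python
# Decision rules for the two Meddicorp coefficient tests (n = 25, K = 2).
n, K = 25, 2
df = n - K - 1                  # 22 degrees of freedom

T_TWO_SIDED_05 = 2.074          # t-table, 22 df, alpha = .05 two-sided
T_ONE_SIDED_05 = 1.717          # t-table, 22 df, alpha = .05 one-sided

def two_sided_reject(t, crit):
    """Reject H0: beta = 0 in favour of beta != 0."""
    return abs(t) > crit

def one_sided_upper_reject(t, crit):
    """Reject H0: beta <= 0 in favour of beta > 0."""
    return t > crit

t_adv = 2.4732 / 0.2753         # about 8.98
t_bonus = 1.8562 / 0.7157       # about 2.59

print(df, two_sided_reject(t_adv, T_TWO_SIDED_05))
print(one_sided_upper_reject(t_bonus, T_ONE_SIDED_05))

# The printed p-value is two-sided; t_bonus is positive (the direction
# of Ha), so the one-sided p-value is half of it.
p_two_sided_bonus = 0.017
p_one_sided_bonus = p_two_sided_bonus / 2
print(p_one_sided_bonus)
```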
4.3.1 The ANOVA Table and R²
These are the same statistics we briefly examined in simple regression.
They are perhaps more important here because they measure how well all the variables in the equation work together.
S = 90.75   R-Sq = 85.5%   R-Sq(adj) = 84.2%
Analysis of Variance
Source          DF       SS       MS      F      P
Regression       2  1067797   533899  64.83  0.000
Residual Error  22   181176     8235
Total           24  1248974
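Every derived entry in this output follows from the SS and DF columns. The sketch below rebuilds them from the printed Meddicorp sums of squares: MS = SS/DF, F = MSR/MSE, S = $\sqrt{MSE}$, R-Sq = SSR/SST, and R-Sq(adj) = 1 - MSE/(SST/(n-1)). The variable names are illustrative only, and small differences from the printout reflect rounding of the displayed SS values.

```python
import math

# Meddicorp ANOVA entries: sums of squares and degrees of freedom.
ss_reg, df_reg = 1067797, 2
ss_res, df_res = 181176, 22
ss_tot, df_tot = 1248974, 24

ms_reg = ss_reg / df_reg                    # regression mean square (MSR)
ms_res = ss_res / df_res                    # residual mean square (MSE)
f_stat = ms_reg / ms_res                    # F = MSR / MSE
s = math.sqrt(ms_res)                       # S, standard error of the regression
r_sq = ss_reg / ss_tot                      # R-Sq = SSR / SST
r_sq_adj = 1 - ms_res / (ss_tot / df_tot)   # R-Sq(adj)

print(round(f_stat, 2), round(s, 2), round(100 * r_sq, 1), round(100 * r_sq_adj, 1))
# 64.83 90.75 85.5 84.2
```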
R² – a Universal Measure of Fit
R² = SSR / SST = proportion of the variation in y explained by the regression equation.
Multiplied by 100, it can be interpreted as a percentage.
If there is only one x, R² is the square of the correlation between x and y.
For multiple regression, R² is the square of the correlation between the y values and the ŷ values.
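These identities can be checked numerically. The sketch below uses a single predictor for brevity: it fits a simple regression to a small made-up data set and confirms that SSR/SST equals both the squared correlation between x and y and the squared correlation between y and ŷ. The data and helper names are hypothetical. The y versus ŷ identity is the one that carries over to multiple regression, with ŷ taken from the multiple fit.

```python
# R^2 = SSR/SST versus squared correlations, for one predictor.
def mean(v):
    return sum(v) / len(v)

def corr(u, v):
    """Pearson correlation between two equal-length lists."""
    mu, mv = mean(u), mean(v)
    suv = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    suu = sum((a - mu) ** 2 for a in u)
    svv = sum((b - mv) ** 2 for b in v)
    return suv / (suu * svv) ** 0.5

x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1, 12.2]     # hypothetical data

# Least squares slope and intercept for simple regression.
mx, my = mean(x), mean(y)
b1 = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum((a - mx) ** 2 for a in x)
b0 = my - b1 * mx
y_hat = [b0 + b1 * a for a in x]

sst = sum((b - my) ** 2 for b in y)      # total variation
ssr = sum((h - my) ** 2 for h in y_hat)  # variation explained by the fit
r_sq = ssr / sst

print(round(r_sq, 6), round(corr(x, y) ** 2, 6), round(corr(y, y_hat) ** 2, 6))
```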
If there are many predictor variables to choose from, the highest R² is always obtained by putting them all in the model, because R² never decreases when a variable is added.
Some of these predictors could be insignificant, suggesting they contribute little to the model's R².
Adjusted R² is a way to balance the desire for a high R² against the desire to include only important variables.
The "adjustment" is for the number of variables in the model.
Although regular R² may decrease when you remove a variable, the adjusted version may actually increase if that variable did not have much significance.
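The effect of the adjustment can be seen with the standard formula $R^2_{adj} = 1 - (1 - R^2)\dfrac{n-1}{n-K-1}$, which is equivalent to the R-Sq(adj) = 1 - MSE/(SST/(n-1)) form. In the sketch below, the two-variable case uses the Meddicorp R² of 85.5% with n = 25. The three-variable case assumes a hypothetical weak third predictor that lifts R² only to 85.6%. R² goes up, but adjusted R² goes down, so by this measure the third variable is not worth keeping.

```python
def adjusted_r_sq(r_sq, n, k):
    """Adjusted R^2 = 1 - (1 - R^2) * (n - 1) / (n - k - 1)."""
    return 1 - (1 - r_sq) * (n - 1) / (n - k - 1)

n = 25
two_vars = adjusted_r_sq(0.855, n, 2)     # Meddicorp: ADV and BONUS
three_vars = adjusted_r_sq(0.856, n, 3)   # hypothetical weak third predictor
print(round(two_vars, 4), round(three_vars, 4))
# 0.8418 0.8354
```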
Since R² is so high here (85.5%), you would certainly think that the model contains significant predictive power.
In other problems it may not be so obvious. For example, would an R² of 20% show any prediction ability at all?
We can test the predictive power of the entire model using the F statistic.
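One way to see why the answer depends on the data set is to express F in terms of R². Because MSR = R²·SST/K and MSE = (1 - R²)·SST/(n-K-1) (using SST = SSR + SSE), $F = \dfrac{R^2/K}{(1-R^2)/(n-K-1)}$. The result is compared with an F critical value having K and n-K-1 degrees of freedom, which is not computed here. The sketch below reproduces the Meddicorp F from its R². It then evaluates a hypothetical R² of 20% at n = 25 and at a hypothetical n = 200. The same R² yields a much larger F when there are more observations. The helper name `f_from_r_sq` is an assumption for illustration.

```python
def f_from_r_sq(r_sq, n, k):
    """Overall F statistic, (R^2 / K) / ((1 - R^2) / (n - K - 1)).
    Compare with an F critical value having K and n - K - 1 df."""
    return (r_sq / k) / ((1 - r_sq) / (n - k - 1))

meddicorp_f = f_from_r_sq(1067797 / 1248974, 25, 2)   # about 64.83
weak_small_n = f_from_r_sq(0.20, 25, 2)               # hypothetical R^2 = 20%
weak_large_n = f_from_r_sq(0.20, 200, 2)              # same R^2, larger n
print(round(meddicorp_f, 2), round(weak_small_n, 2), round(weak_large_n, 2))
```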