Simple Linear Regression. Chapter Topics Types of Regression Models Determining the Simple Linear Regression Equation Measures of Variation Assumptions.
Transcript
Slide 1
Simple Linear Regression
Slide 2
Chapter Topics
Types of Regression Models
Determining the Simple Linear Regression Equation
Measures of Variation
Assumptions of Regression and Correlation
Residual Analysis
Measuring Autocorrelation
Inferences about the Slope
Slide 3
Chapter Topics (continued)
Correlation - Measuring the Strength of the Association
Estimation of Mean Values and Prediction of Individual Values
Pitfalls in Regression and Ethical Issues
Slide 4
Purpose of Regression Analysis
Regression analysis is used primarily to model causality and provide prediction.
Predict the values of a dependent (response) variable based on values of at least one independent (explanatory) variable.
Explain the effect of the independent variables on the dependent variable.
Slide 5
Types of Regression Models
[Figure: four scatter plots showing a positive linear relationship, a negative linear relationship, a relationship that is NOT linear, and no relationship.]
Slide 6
Simple Linear Regression Model
The relationship between variables is described by a linear function.
The change of one variable causes the other variable to change.
A dependency of one variable on the other.
Slide 7
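Simple Linear Regression Model (continued)
Population Regression Line (Conditional Mean): the population regression line is a straight line that describes the dependence of the average value (conditional mean) of one variable on the other.
\( Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i \), with conditional mean \( \mu_{Y|X} = \beta_0 + \beta_1 X_i \)
where \( Y_i \) is the dependent (response) variable, \( X_i \) is the independent (explanatory) variable, \( \beta_0 \) is the population Y intercept, \( \beta_1 \) is the population slope coefficient, and \( \varepsilon_i \) is the random error.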
Slide 8
Simple Linear Regression Model (continued)
[Figure: Y plotted against X. The observed value of Y at \( X_i \) equals the conditional mean \( \mu_{Y|X} = \beta_0 + \beta_1 X_i \) plus the random error \( \varepsilon_i \).]
Slide 9
Linear Regression Equation
The sample regression line provides an estimate of the population regression line as well as a predicted value of Y.
\( Y_i = b_0 + b_1 X_i + e_i \)
\( \hat{Y}_i = b_0 + b_1 X_i \) (simple regression equation: fitted regression line, predicted value)
where \( b_0 \) is the sample Y intercept, \( b_1 \) is the sample slope coefficient, and \( e_i \) is the residual.
Slide 10
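Linear Regression Equation (continued)
\( b_0 \) and \( b_1 \) are obtained by finding the values of \( b_0 \) and \( b_1 \) that minimize the sum of the squared residuals, \( \sum_{i=1}^{n} (Y_i - \hat{Y}_i)^2 = \sum_{i=1}^{n} e_i^2 \).
\( b_0 \) provides an estimate of \( \beta_0 \).
\( b_1 \) provides an estimate of \( \beta_1 \).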
Slide 11
Linear Regression Equation (continued)
[Figure: observed value of Y plotted against X.]
Slide 12
Interpretation of the Slope and Intercept
\( \beta_0 \) is the average value of Y when the value of X is zero.
\( \beta_1 \) measures the change in the average value of Y as a result of a one-unit change in X.
Slide 13
Interpretation of the Slope and Intercept (continued)
\( b_0 \) is the estimated average value of Y when the value of X is zero.
\( b_1 \) is the estimated change in the average value of Y as a result of a one-unit change in X.
Slide 14
Simple Linear Regression: Example
You wish to examine the linear dependency of the annual sales of produce stores on their sizes in square footage. Sample data for 7 stores were obtained. Find the equation of the straight line that fits the data best.

Store   Square Feet   Annual Sales ($1000)
1       1,726         3,681
2       1,542         3,395
3       2,816         6,653
4       5,555         9,543
5       1,292         3,318
6       2,208         5,563
7       1,313         3,760
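The least-squares fit for this example can be sketched in a few lines of plain Python. This is only an illustrative sketch using the textbook formulas \( b_1 = \sum (X_i - \bar{X})(Y_i - \bar{Y}) / \sum (X_i - \bar{X})^2 \) and \( b_0 = \bar{Y} - b_1 \bar{X} \); the function name `fit_line` is an assumption made for the illustration and is not part of the slides, which use Excel and PHStat.

```python
# Produce-store data from the slide: square feet (X), annual sales in $1000 (Y).
square_feet = [1726, 1542, 2816, 5555, 1292, 2208, 1313]
annual_sales = [3681, 3395, 6653, 9543, 3318, 5563, 3760]


def fit_line(x, y):
    """Return (b0, b1) that minimize the sum of squared residuals."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sxy / sxx           # sample slope coefficient
    b0 = y_bar - b1 * x_bar  # sample Y intercept
    return b0, b1


b0, b1 = fit_line(square_feet, annual_sales)
print(f"Y-hat = {b0:.3f} + {b1:.3f} X")
```

Rounded to three decimals, the coefficients should agree with the equation reported from the Excel printout later in the deck.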
Slide 15
Scatter Diagram: Example Excel Output
Slide 16
Simple Linear Regression Equation: Example From Excel
Printout:
Slide 17
Graph of the Simple Linear Regression Equation: Example
\( \hat{Y}_i = 1636.415 + 1.487 X_i \)
Slide 18
Interpretation of Results: Example
The slope of 1.487 means that for each increase of one unit in X, we predict the average of Y to increase by an estimated 1.487 units.
Because sales are measured in $1000, the equation estimates that for each increase of 1 square foot in the size of the store, the expected annual sales are predicted to increase by $1,487.
Slide 19
Simple Linear Regression in PHStat In Excel, use PHStat |
Regression | Simple Linear Regression Excel Spreadsheet of
Regression Sales on Footage
Slide 20
Measures of Variation: The Sum of Squares SST = SSR + SSE Total
Sample Variability = Explained Variability + Unexplained
Variability
Slide 21
Measures of Variation: The Sum of Squares (continued)
SST = Total Sum of Squares: measures the variation of the \( Y_i \) values around their mean, \( \bar{Y} \).
SSR = Regression Sum of Squares: explained variation attributable to the relationship between X and Y.
SSE = Error Sum of Squares: variation attributable to factors other than the relationship between X and Y.
Slide 22
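Measures of Variation: The Sum of Squares (continued)
\( SST = \sum (Y_i - \bar{Y})^2 \)
\( SSE = \sum (Y_i - \hat{Y}_i)^2 \)
\( SSR = \sum (\hat{Y}_i - \bar{Y})^2 \)
[Figure: Y plotted against X, showing at \( X_i \) the deviations that make up SST, SSE, and SSR relative to the regression line and \( \bar{Y} \).]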
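As a quick numerical check of SST = SSR + SSE, the three sums of squares can be computed for the produce-store data. This is a minimal sketch in plain Python; the variable names are illustrative and not taken from the slides.

```python
# Produce-store data: square feet (X), annual sales in $1000 (Y).
x = [1726, 1542, 2816, 5555, 1292, 2208, 1313]
y = [3681, 3395, 6653, 9543, 3318, 5563, 3760]

n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n
b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sum(
    (xi - x_bar) ** 2 for xi in x
)
b0 = y_bar - b1 * x_bar
y_hat = [b0 + b1 * xi for xi in x]  # fitted values

sst = sum((yi - y_bar) ** 2 for yi in y)               # total variation
ssr = sum((yh - y_bar) ** 2 for yh in y_hat)           # explained variation
sse = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))  # unexplained variation

print(f"SST = {sst:.1f}, SSR + SSE = {ssr + sse:.1f}")
```

The identity holds exactly only for a least-squares fit that includes an intercept, which is the case here.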
Slide 23
Venn Diagrams and Explanatory Power of Regression Sales Sizes
Variations in Sales explained by Sizes or variations in Sizes used
in explaining variation in Sales Variations in Sales explained by
the error term or unexplained by Sizes Variations in store Sizes
not used in explaining variation in Sales
Slide 24
The ANOVA Table in Excel

ANOVA        df      SS    MS                  F         Significance F
Regression   k       SSR   MSR = SSR/k         MSR/MSE   P-value of the F Test
Residuals    n-k-1   SSE   MSE = SSE/(n-k-1)
Total        n-1     SST
Slide 25
Measures of Variation: The Sum of Squares: Example
[Excel output for produce stores, annotated with: regression (explained) df, error (residual) df, total df, SSR, SSE, and SST.]
Slide 26
The Coefficient of Determination
\( r^2 = \dfrac{SSR}{SST} \)
Measures the proportion of variation in Y that is explained by the independent variable X in the regression model.
Slide 27
Venn Diagrams and Explanatory Power of Regression Sales
Sizes
Slide 28
Coefficients of Determination (\( r^2 \)) and Correlation (r)
[Figure: four scatter plots of Y against X with the fitted line \( \hat{Y}_i = b_0 + b_1 X_i \): \( r^2 = 1 \) with r = +1; \( r^2 = 1 \) with r = -1; \( r^2 = .81 \) with r = +0.9; \( r^2 = 0 \) with r = 0.]
Slide 29
Standard Error of Estimate
\( S_{YX} = \sqrt{\dfrac{SSE}{n-2}} = \sqrt{\dfrac{\sum (Y_i - \hat{Y}_i)^2}{n-2}} \)
Measures the standard deviation (variation) of the Y values around the regression equation.
Slide 30
Measures of Variation: Produce Store Example
Excel Output for Produce Stores
\( r^2 = .94 \): 94% of the variation in annual sales can be explained by the variability in the size of the store as measured by square footage.
[The Excel output is also annotated with \( S_{YX} \) and n.]
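The two summary measures on this slide can be recomputed from the data. The sketch below assumes the usual definitions \( r^2 = SSR/SST \) and \( S_{YX} = \sqrt{SSE/(n-2)} \); it is plain Python rather than the Excel output the slide refers to.

```python
import math

# Produce-store data: square feet (X), annual sales in $1000 (Y).
x = [1726, 1542, 2816, 5555, 1292, 2208, 1313]
y = [3681, 3395, 6653, 9543, 3318, 5563, 3760]

n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n
sxx = sum((xi - x_bar) ** 2 for xi in x)
b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
b0 = y_bar - b1 * x_bar

sst = sum((yi - y_bar) ** 2 for yi in y)
sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))

r_squared = 1 - sse / sst        # same as SSR / SST
s_yx = math.sqrt(sse / (n - 2))  # standard error of the estimate

print(f"r^2 = {r_squared:.2f}, S_YX = {s_yx:.2f}")
```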
Slide 31
Linear Regression Assumptions
Normality: Y values are normally distributed for each X; the probability distribution of the error is normal.
Homoscedasticity (constant variance)
Independence of errors
Slide 32
Consequences of Violation of the Assumptions
Violation of the Assumptions:
Non-normality (error not normally distributed)
Heteroscedasticity (variance not constant): usually happens in cross-sectional data
Autocorrelation (errors are not independent): usually happens in time-series data
Consequences of Any Violation of the Assumptions:
Predictions and estimations obtained from the sample regression line will not be accurate.
Hypothesis testing results will not be reliable.
It is important to verify the assumptions.
Slide 33
Variation of Errors Around the Regression Line
Y values are normally distributed around the regression line.
For each X value, the spread or variance around the regression line is the same.
[Figure: error distributions f(e) at \( X_1 \) and \( X_2 \) around the sample regression line; axes X and Y.]
Slide 34
Residual Analysis
Purposes: examine linearity; evaluate violations of assumptions.
Graphical Analysis of Residuals: plot residuals vs. X and time.
Slide 35
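Residual Analysis for Linearity
[Figure: Y vs. X scatter plots and the corresponding residual (e) vs. X plots for a relationship that is not linear and for one that is linear.]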
Slide 36
Residual Analysis for Homoscedasticity
[Figure: Y vs. X scatter plots and the corresponding residual (SR) vs. X plots for heteroscedasticity and for homoscedasticity.]
Slide 37
Residual Analysis: Excel Output for Produce Stores Example
Excel Output
Slide 38
Residual Analysis for Independence The Durbin-Watson Statistic
Used when data is collected over time to detect autocorrelation
(residuals in one time period are related to residuals in another
period) Measures violation of independence assumption Should be
close to 2. If not, examine the model for autocorrelation.
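The slide does not spell out the statistic, so the sketch below assumes the usual definition \( D = \sum_{i=2}^{n} (e_i - e_{i-1})^2 / \sum_{i=1}^{n} e_i^2 \). The function name and both residual series are hypothetical values invented for the illustration; they are not data from the slides.

```python
def durbin_watson(residuals):
    """Durbin-Watson D: squared successive differences over squared residuals."""
    numerator = sum(
        (residuals[i] - residuals[i - 1]) ** 2 for i in range(1, len(residuals))
    )
    denominator = sum(e ** 2 for e in residuals)
    return numerator / denominator


# Hypothetical residuals, listed in time order.
trending = [-3, -2, -1, 1, 2, 3]     # neighbors are similar: positive autocorrelation
alternating = [1, -1, 1, -1, 1, -1]  # neighbors flip sign: negative autocorrelation

d_trending = durbin_watson(trending)
d_alternating = durbin_watson(alternating)
print(f"trending: D = {d_trending:.3f}, alternating: D = {d_alternating:.3f}")
```

Values well below 2 point toward positive autocorrelation and values well above 2 toward negative autocorrelation; the formal decision still requires the tabulated critical values.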
Slide 39
Durbin-Watson Statistic in PHStat PHStat | Regression | Simple
Linear Regression Check the box for Durbin-Watson Statistic
Slide 40
Obtaining the Critical Values of Durbin-Watson Statistic Table
13.4 Finding Critical Values of Durbin-Watson Statistic
Slide 41
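Using the Durbin-Watson Statistic
\( H_0 \): No autocorrelation (error terms are independent)
\( H_1 \): There is autocorrelation (error terms are not independent)
Decision regions on the scale from 0 to 4 (centered at 2):
0 to \( d_L \): Reject \( H_0 \) (positive autocorrelation)
\( d_L \) to \( d_U \): Inconclusive
\( d_U \) to \( 4 - d_U \): Accept \( H_0 \) (no autocorrelation)
\( 4 - d_U \) to \( 4 - d_L \): Inconclusive
\( 4 - d_L \) to 4: Reject \( H_0 \) (negative autocorrelation)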
Slide 42
Residual Analysis for Independence
Graphical Approach: the residual is plotted against time to detect any autocorrelation.
[Figure: residual (e) vs. time plots. Not independent: cyclical pattern. Independent: no particular pattern.]
Slide 43
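Inference about the Slope: t Test
t Test for a Population Slope: is there a linear dependency of Y on X?
Null and Alternative Hypotheses:
\( H_0: \beta_1 = 0 \) (no linear dependency)
\( H_1: \beta_1 \neq 0 \) (linear dependency)
Test Statistic: \( t = \dfrac{b_1 - \beta_1}{S_{b_1}} \), where \( S_{b_1} = \dfrac{S_{YX}}{\sqrt{\sum (X_i - \bar{X})^2}} \), with \( df = n - 2 \)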
Slide 44
Example: Produce Store
Data for 7 Stores:

Store   Square Feet   Annual Sales ($000)
1       1,726         3,681
2       1,542         3,395
3       2,816         6,653
4       5,555         9,543
5       1,292         3,318
6       2,208         5,563
7       1,313         3,760

Estimated Regression Equation: \( \hat{Y}_i = 1636.415 + 1.487 X_i \)
The slope of this model is 1.487. Does square footage affect annual sales?
Slide 45
Inferences about the Slope: t Test Example
\( H_0: \beta_1 = 0 \)
\( H_1: \beta_1 \neq 0 \)
\( \alpha = .05 \), \( df = 7 - 2 = 5 \)
Critical Value(s): \( \pm 2.5706 \) (.025 in each tail)
Test Statistic: from Excel printout (t statistic and p-value)
Decision: Reject \( H_0 \).
Conclusion: There is evidence that square footage affects annual sales.
Slide 46
Inferences about the Slope: Confidence Interval Example
Confidence Interval Estimate of the Slope: \( b_1 \pm t_{n-2} S_{b_1} \)
Excel Printout for Produce Stores
At the 95% level of confidence, the confidence interval for the slope is (1.062, 1.911), which does not include 0.
Conclusion: There is a significant linear dependency of annual sales on the size of the store.
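Both the t statistic and the confidence interval for the slope can be recomputed from the seven data points. The sketch assumes the standard formulas \( S_{b_1} = S_{YX} / \sqrt{\sum (X_i - \bar{X})^2} \) and \( t = b_1 / S_{b_1} \) under \( H_0: \beta_1 = 0 \); the critical value 2.5706 is the one quoted in the slides, and the variable names are illustrative.

```python
import math

# Produce-store data: square feet (X), annual sales in $000 (Y).
x = [1726, 1542, 2816, 5555, 1292, 2208, 1313]
y = [3681, 3395, 6653, 9543, 3318, 5563, 3760]
T_CRIT = 2.5706  # two-tailed critical t for alpha = .05, df = 5 (from the slides)

n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n
sxx = sum((xi - x_bar) ** 2 for xi in x)
b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
b0 = y_bar - b1 * x_bar

sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
s_yx = math.sqrt(sse / (n - 2))  # standard error of the estimate
s_b1 = s_yx / math.sqrt(sxx)     # standard error of the slope

t_stat = b1 / s_b1               # test statistic for H0: beta1 = 0
lower = b1 - T_CRIT * s_b1
upper = b1 + T_CRIT * s_b1

print(f"t = {t_stat:.2f}, reject H0: {abs(t_stat) > T_CRIT}")
print(f"95% CI for the slope: ({lower:.3f}, {upper:.3f})")
```

The interval excludes 0 exactly when the two-tailed t test rejects \( H_0 \) at the same level, which is why the two slides reach the same conclusion.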
Slide 47
Inferences about the Slope: F Test
F Test for a Population Slope: is there a linear dependency of Y on X?
Null and Alternative Hypotheses:
\( H_0: \beta_1 = 0 \) (no linear dependency)
\( H_1: \beta_1 \neq 0 \) (linear dependency)
Test Statistic: \( F = \dfrac{MSR}{MSE} = \dfrac{SSR/1}{SSE/(n-2)} \)
Numerator d.f. = 1, denominator d.f. = n-2
Slide 48
Relationship between a t Test and an F Test
Null and Alternative Hypotheses:
\( H_0: \beta_1 = 0 \) (no linear dependency)
\( H_1: \beta_1 \neq 0 \) (linear dependency)
The p-value of a t test and the p-value of an F test are exactly the same.
The rejection region of an F test is always in the upper tail.
Slide 49
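Inferences about the Slope: F Test Example
\( H_0: \beta_1 = 0 \)
\( H_1: \beta_1 \neq 0 \)
\( \alpha = .05 \), numerator df = 1, denominator df = 7 - 2 = 5
Critical Value: F = 6.61 (rejection region in the upper tail, \( \alpha = .05 \))
Test Statistic: from Excel printout (F statistic and p-value)
Decision: Reject \( H_0 \).
Conclusion: There is evidence that square footage affects annual sales.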
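The F statistic for the produce-store example, and its link to the t statistic for the slope, can be checked numerically. The sketch assumes \( F = MSR/MSE \) with 1 and n-2 degrees of freedom; in simple linear regression this F equals the square of the slope's t statistic, which is why the two p-values coincide. The critical value 6.61 is taken from the slide.

```python
import math

# Produce-store data: square feet (X), annual sales in $000 (Y).
x = [1726, 1542, 2816, 5555, 1292, 2208, 1313]
y = [3681, 3395, 6653, 9543, 3318, 5563, 3760]
F_CRIT = 6.61  # upper-tail critical F for alpha = .05, df = 1 and 5 (from the slide)

n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n
sxx = sum((xi - x_bar) ** 2 for xi in x)
b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
b0 = y_bar - b1 * x_bar
y_hat = [b0 + b1 * xi for xi in x]

ssr = sum((yh - y_bar) ** 2 for yh in y_hat)
sse = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))

msr = ssr / 1        # one explanatory variable
mse = sse / (n - 2)
f_stat = msr / mse

t_stat = b1 / math.sqrt(mse / sxx)  # t statistic for the slope

print(f"F = {f_stat:.2f}, t^2 = {t_stat ** 2:.2f}, reject H0: {f_stat > F_CRIT}")
```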
Slide 50
Purpose of Correlation Analysis
Correlation analysis is used to measure the strength of association (linear relationship) between 2 numerical variables.
Only the strength of the relationship is of concern.
No causal effect is implied.
Slide 51
Purpose of Correlation Analysis (continued)
The population correlation coefficient \( \rho \) (rho) is used to measure the strength of the relationship between the variables.
Slide 52
Purpose of Correlation Analysis (continued)
The sample correlation coefficient r is an estimate of \( \rho \) and is used to measure the strength of the linear relationship in the sample observations.
Slide 53
Sample Observations from Various r Values
[Figure: five scatter plots of Y against X for r = -1, r = -.6, r = 0, r = .6, and r = 1.]
Slide 54
Features of \( \rho \) and r
Unit free
Range between -1 and 1
The closer to -1, the stronger the negative linear relationship
The closer to 1, the stronger the positive linear relationship
The closer to 0, the weaker the linear relationship
Slide 55
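t Test for Correlation
Hypotheses:
\( H_0: \rho = 0 \) (no correlation)
\( H_1: \rho \neq 0 \) (correlation)
Test Statistic: \( t = \dfrac{r - \rho}{\sqrt{\dfrac{1 - r^2}{n - 2}}} \), with \( df = n - 2 \)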
Slide 56
Example: Produce Stores
From Excel Printout: r
Is there any evidence of a linear relationship between annual sales of a store and its square footage at the .05 level of significance?
\( H_0: \rho = 0 \) (no association)
\( H_1: \rho \neq 0 \) (association)
\( \alpha = .05 \), \( df = 7 - 2 = 5 \)
Slide 57
Example: Produce Stores Solution
Critical Value(s): \( \pm 2.5706 \) (.025 in each tail)
Decision: Reject \( H_0 \).
Conclusion: There is evidence of a linear relationship at the 5% level of significance.
The value of the t statistic is exactly the same as the t statistic value for the test on the slope coefficient.
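The t test for correlation can be checked against the test on the slope. The sketch assumes the usual formulas \( r = \sum (X_i - \bar{X})(Y_i - \bar{Y}) / \sqrt{\sum (X_i - \bar{X})^2 \sum (Y_i - \bar{Y})^2} \) and \( t = r \sqrt{n-2} / \sqrt{1 - r^2} \) under \( H_0: \rho = 0 \); the variable names are illustrative.

```python
import math

# Produce-store data: square feet (X), annual sales in $000 (Y).
x = [1726, 1542, 2816, 5555, 1292, 2208, 1313]
y = [3681, 3395, 6653, 9543, 3318, 5563, 3760]
T_CRIT = 2.5706  # two-tailed critical t for alpha = .05, df = 5 (from the slides)

n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n
sxx = sum((xi - x_bar) ** 2 for xi in x)
syy = sum((yi - y_bar) ** 2 for yi in y)
sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))

r = sxy / math.sqrt(sxx * syy)                          # sample correlation
t_corr = r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)   # test of H0: rho = 0

# t statistic for the slope, computed independently for comparison.
b1 = sxy / sxx
sse = syy - b1 * sxy
t_slope = b1 / math.sqrt(sse / (n - 2) / sxx)

print(f"r = {r:.4f}, t (correlation) = {t_corr:.3f}, t (slope) = {t_slope:.3f}")
```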
Slide 58
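Estimation of Mean Values
Confidence Interval Estimate for \( \mu_{Y|X=X_i} \): The Mean of Y Given a Particular \( X_i \)
\( \hat{Y}_i \pm t_{n-2} \, S_{YX} \sqrt{\dfrac{1}{n} + \dfrac{(X_i - \bar{X})^2}{\sum_{i=1}^{n} (X_i - \bar{X})^2}} \)
where \( t_{n-2} \) is the t value from the table with df = n-2 and \( S_{YX} \) is the standard error of the estimate.
The size of the interval varies according to the distance away from the mean, \( \bar{X} \).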
Slide 59
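Prediction of Individual Values
Prediction Interval for Individual Response \( Y_i \) at a Particular \( X_i \):
\( \hat{Y}_i \pm t_{n-2} \, S_{YX} \sqrt{1 + \dfrac{1}{n} + \dfrac{(X_i - \bar{X})^2}{\sum_{i=1}^{n} (X_i - \bar{X})^2}} \)
The addition of 1 under the square root increases the width of the interval from that for the mean of Y.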
Slide 60
Interval Estimates for Different Values of X
[Figure: Y plotted against X with the fitted line \( \hat{Y}_i = b_0 + b_1 X_i \), the confidence interval for the mean of Y, and the prediction interval for an individual \( Y_i \) at a given X.]
Slide 61
Example: Produce Stores
Data for 7 Stores:

Store   Square Feet   Annual Sales ($000)
1       1,726         3,681
2       1,542         3,395
3       2,816         6,653
4       5,555         9,543
5       1,292         3,318
6       2,208         5,563
7       1,313         3,760

Regression Model Obtained: \( \hat{Y}_i = 1636.415 + 1.487 X_i \)
Consider a store with 2000 square feet.
Slide 62
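Estimation of Mean Values: Example
Find the 95% confidence interval for the average annual sales for stores of 2,000 square feet.
Predicted Sales: \( \hat{Y}_i = 1636.415 + 1.487 X_i = 4610.45 \) ($000)
\( \bar{X} = 2350.29 \), \( S_{YX} = 611.75 \), \( t_{n-2} = t_5 = 2.5706 \)
Confidence Interval Estimate for \( \mu_{Y|X=X_i} \): \( \hat{Y}_i \pm t_{n-2} \, S_{YX} \sqrt{\dfrac{1}{n} + \dfrac{(X_i - \bar{X})^2}{\sum_{i=1}^{n} (X_i - \bar{X})^2}} \)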
Slide 63
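Prediction Interval for Y: Example
Find the 95% prediction interval for annual sales of one particular store of 2,000 square feet.
Predicted Sales: \( \hat{Y}_i = 1636.415 + 1.487 X_i = 4610.45 \) ($000)
\( \bar{X} = 2350.29 \), \( S_{YX} = 611.75 \), \( t_{n-2} = t_5 = 2.5706 \)
Prediction Interval for Individual \( Y_{X=X_i} \): \( \hat{Y}_i \pm t_{n-2} \, S_{YX} \sqrt{1 + \dfrac{1}{n} + \dfrac{(X_i - \bar{X})^2}{\sum_{i=1}^{n} (X_i - \bar{X})^2}} \)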
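The two interval half-widths for a 2,000-square-foot store can be worked out from the values quoted on the slides. In this sketch the predicted value, \( S_{YX} \), and the t value are copied from the slides, while \( \sum (X_i - \bar{X})^2 \) is computed from the seven data points because the transcript does not show it. The variable names are illustrative.

```python
import math

# Store sizes in square feet (X) from the produce-store data.
x = [1726, 1542, 2816, 5555, 1292, 2208, 1313]

# Values quoted on the slides.
Y_HAT = 4610.45   # predicted sales at X = 2000, in $000
S_YX = 611.75     # standard error of the estimate
T_CRIT = 2.5706   # t value with df = 5 for 95% confidence
X_NEW = 2000

n = len(x)
x_bar = sum(x) / n
sxx = sum((xi - x_bar) ** 2 for xi in x)

# Shared term: 1/n plus the squared distance of X_NEW from the mean, scaled by Sxx.
h = 1 / n + (X_NEW - x_bar) ** 2 / sxx

ci_margin = T_CRIT * S_YX * math.sqrt(h)      # mean of Y at X_NEW
pi_margin = T_CRIT * S_YX * math.sqrt(1 + h)  # individual Y at X_NEW

print(f"Confidence interval: {Y_HAT - ci_margin:.2f} to {Y_HAT + ci_margin:.2f}")
print(f"Prediction interval: {Y_HAT - pi_margin:.2f} to {Y_HAT + pi_margin:.2f}")
```

The half-widths come out at roughly 613 and 1,688 ($000), so the prediction interval for a single store is much wider than the confidence interval for the mean, as the extra 1 under the square root implies.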
Slide 64
Estimation of Mean Values and Prediction of Individual Values
in PHStat In Excel, use PHStat | Regression | Simple Linear
Regression Check the Confidence and Prediction Interval for X= box
Excel Spreadsheet of Regression Sales on Footage
Slide 65
Pitfalls of Regression Analysis
Lacking an awareness of the assumptions underlying least-squares regression
Not knowing how to evaluate the assumptions
Not knowing what the alternatives to least-squares regression are if a particular assumption is violated
Using a regression model without knowledge of the subject matter
Slide 66
Strategy for Avoiding the Pitfalls of Regression
Start with a scatter plot of Y against X to observe a possible relationship.
Perform residual analysis to check the assumptions.
Use a histogram, stem-and-leaf display, box-and-whisker plot, or normal probability plot of the residuals to uncover possible non-normality.
Slide 67
Strategy for Avoiding the Pitfalls of Regression (continued)
If there is a violation of any assumption, use alternative methods to least-squares regression (e.g., least absolute deviation regression or least median of squares regression) or alternative least-squares models (e.g., curvilinear or multiple regression).
If there is no evidence of assumption violation, then test for the significance of the regression coefficients and construct confidence intervals and prediction intervals.
Slide 68
Chapter Summary
Introduced Types of Regression Models
Discussed Determining the Simple Linear Regression Equation
Described Measures of Variation
Addressed Assumptions of Regression and Correlation
Discussed Residual Analysis
Addressed Measuring Autocorrelation
Slide 69
Chapter Summary (continued)
Described Inference about the Slope
Discussed Correlation - Measuring the Strength of the Association
Addressed Estimation of Mean Values and Prediction of Individual Values
Discussed Pitfalls in Regression and Ethical Issues
Slide 70
Introduction to Multiple Regression
Slide 71
Chapter Topics
The Multiple Regression Model
Residual Analysis
Testing for the Significance of the Regression Model
Inferences on the Population Regression Coefficients
Testing Portions of the Multiple Regression Model
Dummy-Variables and Interaction Terms
Slide 72
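The Multiple Regression Model
The relationship between 1 dependent and 2 or more independent variables is a linear function.
\( Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \cdots + \beta_k X_{ki} + \varepsilon_i \)
where \( Y_i \) is the dependent (response) variable, \( X_{1i}, \ldots, X_{ki} \) are the independent (explanatory) variables, \( \beta_0 \) is the population Y-intercept, \( \beta_1, \ldots, \beta_k \) are the population slopes, and \( \varepsilon_i \) is the random error.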
Slide 73
Multiple Regression Model Bivariate model
Slide 74
Multiple Regression Equation Bivariate model Multiple
Regression Equation
Slide 75
Too complicated by hand! Ouch!
Slide 76
Interpretation of Estimated Coefficients
Slope (\( b_j \)): the average value of Y is estimated to change by \( b_j \) for each 1-unit increase in \( X_j \), holding all other variables constant (ceteris paribus).
Example: If \( b_1 = -2 \), then fuel oil usage (Y) is expected to decrease by an estimated 2 gallons for each 1 degree increase in temperature (\( X_1 \)), given the inches of insulation (\( X_2 \)).
Y-Intercept (\( b_0 \)): the estimated average value of Y when all \( X_j = 0 \).
Slide 77
Multiple Regression Model: Example
Develop a model for estimating heating oil used for a single family home in the month of January, based on average temperature (°F) and amount of insulation in inches.
Slide 78
Multiple Regression Equation: Example Excel Output For each
degree increase in temperature, the estimated average amount of
heating oil used is decreased by 5.437 gallons, holding insulation
constant. For each increase in one inch of insulation, the
estimated average use of heating oil is decreased by 20.012
gallons, holding temperature constant.
Slide 79
Multiple Regression in PHStat PHStat | Regression | Multiple
Regression Excel spreadsheet for the heating oil example
Slide 80
Venn Diagrams and Explanatory Power of Regression Oil Temp
Variations in Oil explained by Temp or variations in Temp used in
explaining variation in Oil Variations in Oil explained by the
error term Variations in Temp not used in explaining variation in
Oil
Slide 81
Venn Diagrams and Explanatory Power of Regression Oil Temp
(continued)
Slide 82
Venn Diagrams and Explanatory Power of Regression
[Figure: overlapping circles for Oil, Temp, and Insulation.]
Overlapping variation in both Temp and Insulation is used in explaining the variation in Oil but NOT in the estimation of \( \beta_1 \) nor \( \beta_2 \).
Variation NOT explained by Temp nor Insulation
Slide 83
Coefficient of Multiple Determination
Proportion of total variation in Y explained by all X variables taken together
Never decreases when a new X variable is added to the model, which is a disadvantage when comparing among models
Slide 84
Venn Diagrams and Explanatory Power of Regression Oil Temp
Insulation
Slide 85
Adjusted Coefficient of Multiple Determination
Proportion of variation in Y explained by all the X variables, adjusted for the sample size and the number of X variables used:
\( r^2_{adj} = 1 - \left[ (1 - r^2) \dfrac{n - 1}{n - k - 1} \right] \)
Penalizes excessive use of independent variables
Smaller than \( r^2 \)
Useful in comparing among models
Can decrease if an insignificant new X variable is added to the model
Slide 86
Coefficient of Multiple Determination
Excel Output: adjusted \( r^2 \) reflects the number of explanatory variables and the sample size, and is smaller than \( r^2 \).
Slide 87
Interpretation of Coefficient of Multiple Determination 96.56%
of the total variation in heating oil can be explained by
temperature and amount of insulation 95.99% of the total
fluctuation in heating oil can be explained by temperature and
amount of insulation after adjusting for the number of explanatory
variables and sample size
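The adjustment can be reproduced from the reported \( r^2 \). The sketch assumes the standard formula \( r^2_{adj} = 1 - (1 - r^2)(n-1)/(n-k-1) \) and takes n = 15 from the degrees of freedom (2 and 12) quoted for the overall F test later in the deck; the function name is illustrative.

```python
def adjusted_r_squared(r_squared, n, k):
    """Adjust r^2 for sample size n and number of explanatory variables k."""
    return 1 - (1 - r_squared) * (n - 1) / (n - k - 1)


# Heating oil example: r^2 = 0.9656, k = 2 explanatory variables.
# n = 15 is inferred from the error degrees of freedom n - k - 1 = 12.
adj = adjusted_r_squared(0.9656, n=15, k=2)
print(f"adjusted r^2 = {adj:.4f}")
```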
Slide 88
Simple and Multiple Regression Compared
The slope coefficient in a simple regression picks up the impact of the independent variable plus the impacts of other variables that are excluded from the model but are correlated with the included independent variable and the dependent variable.
Coefficients in a multiple regression net out the impacts of other variables in the equation; hence, they are called the net regression coefficients.
They still pick up the effects of other variables that are excluded from the model but are correlated with the included independent variables and the dependent variable.
Slide 89
Simple and Multiple Regression Compared: Example Two Simple
Regressions: Multiple Regression:
Slide 90
Simple and Multiple Regression Compared: Slope
Coefficients
Slide 91
Simple and Multiple Regression Compared: r 2
Slide 92
Example: Adjusted \( r^2 \) Can Decrease
Adjusted \( r^2 \) decreases when k increases from 2 to 3.
Color is not useful in explaining the variation in oil consumption.
Slide 93
Using the Regression Equation to Make Predictions
Predict the amount of heating oil used for a home if the average temperature is 30 °F and the insulation is 6 inches.
The predicted heating oil used is 278.97 gallons.
Slide 94
Predictions in PHStat PHStat | Regression | Multiple Regression
Check the Confidence and Prediction Interval Estimate box Excel
spreadsheet for the heating oil example
Slide 95
Residual Plots
Residuals vs. \( \hat{Y} \): may need to transform the Y variable
Residuals vs. \( X_1 \): may need to transform the \( X_1 \) variable
Residuals vs. \( X_2 \): may need to transform the \( X_2 \) variable
Residuals vs. time: may have autocorrelation
Slide 96
Residual Plots: Example
[Figure: two residual plots, one annotated "No discernible pattern" and one annotated "Maybe some non-linear relationship".]
Slide 97
Testing for Overall Significance
Shows if Y depends linearly on all of the X variables together as a group
Use F test statistic
Hypotheses:
\( H_0: \beta_1 = \beta_2 = \cdots = \beta_k = 0 \) (no linear relationship)
\( H_1 \): At least one \( \beta_i \neq 0 \) (at least one independent variable affects Y)
The null hypothesis is a very strong statement.
The null hypothesis is almost always rejected.
Slide 98
Testing for Overall Significance (continued)
Test Statistic: \( F = \dfrac{MSR}{MSE} = \dfrac{SSR(\text{all}) / k}{SSE(\text{all}) / (n - k - 1)} \)
where F has k numerator and (n-k-1) denominator degrees of freedom
Slide 99
Test for Overall Significance
Excel Output: Example
[Excel output annotated with: k = 2, the number of explanatory variables; n - 1; p-value.]
Slide 100
Test for Overall Significance: Example Solution
\( H_0: \beta_1 = \beta_2 = \cdots = \beta_k = 0 \)
\( H_1 \): At least one \( \beta_j \neq 0 \)
\( \alpha = .05 \), df = 2 and 12
Critical Value: F = 3.89 (upper tail, \( \alpha = 0.05 \))
Test Statistic: F = 168.47 (Excel Output)
Decision: Reject at \( \alpha = 0.05 \).
Conclusion: There is evidence that at least one independent variable affects Y.
Slide 101
Test for Significance: Individual Variables
Show if Y depends linearly on a single \( X_j \) individually while holding the effects of other Xs fixed
Use t test statistic
Hypotheses:
\( H_0: \beta_j = 0 \) (no linear relationship)
\( H_1: \beta_j \neq 0 \) (linear relationship between \( X_j \) and Y)
Slide 102
t Test Statistic Excel Output: Example t Test Statistic for X 1
(Temperature) t Test Statistic for X 2 (Insulation)
Slide 103
t Test: Example Solution
Does temperature have a significant effect on monthly consumption of heating oil? Test at \( \alpha = 0.05 \).
\( H_0: \beta_1 = 0 \)
\( H_1: \beta_1 \neq 0 \)
df = 12
Critical Values: \( \pm 2.1788 \) (.025 in each tail)
Test Statistic: t = -16.1699
Decision: Reject \( H_0 \) at \( \alpha = 0.05 \).
Conclusion: There is evidence of a significant effect of temperature on oil consumption holding constant the effect of insulation.
Slide 104
Venn Diagrams and Estimation of Regression Model Oil Temp
Insulation Only this information is used in the estimation of This
information is NOT used in the estimation of nor
Slide 105
Confidence Interval Estimate for the Slope
Provide the 95% confidence interval for the population slope \( \beta_1 \) (the effect of temperature on oil consumption).
\( -6.169 \leq \beta_1 \leq -4.704 \)
We are 95% confident that the estimated average consumption of oil is reduced by between 4.70 and 6.17 gallons for each increase of 1 °F, holding insulation constant.
We can also perform the test for the significance of individual variables, \( H_0: \beta_1 = 0 \) vs. \( H_1: \beta_1 \neq 0 \), using this confidence interval.
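The interval can be recovered from numbers already quoted in the deck. This sketch assumes the usual form \( b_1 \pm t_{n-k-1} S_{b_1} \) and backs out the standard error as \( S_{b_1} = b_1 / t \) from the reported slope (-5.437) and t statistic (-16.1699), since the transcript does not show \( S_{b_1} \) directly.

```python
# Values quoted in the slides for the heating oil example.
B1 = -5.437        # estimated temperature slope (gallons per degree F)
T_STAT = -16.1699  # t statistic for temperature
T_CRIT = 2.1788    # two-tailed critical t for alpha = .05, df = 12

# Standard error of the slope, recovered from t = b1 / S_b1.
s_b1 = B1 / T_STAT

lower = B1 - T_CRIT * s_b1
upper = B1 + T_CRIT * s_b1
print(f"95% CI for beta_1: {lower:.3f} to {upper:.3f}")

# The interval excludes 0, so H0: beta_1 = 0 is rejected at the .05 level.
excludes_zero = not (lower <= 0 <= upper)
print(f"Interval excludes 0: {excludes_zero}")
```

Small differences in the third decimal are possible because the inputs are rounded values from the slides.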
Slide 106
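Contribution of a Single Independent Variable
Let \( X_j \) be the independent variable of interest.
\( SSR(X_j \mid \text{all others except } X_j) = SSR(\text{all}) - SSR(\text{all others except } X_j) \)
Measures the additional contribution of \( X_j \) in explaining the total variation in Y with the inclusion of all the remaining independent variables.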
Slide 107
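Contribution of a Single Independent Variable (continued)
\( SSR(X_1 \mid X_2 \text{ and } X_3) = SSR(X_1, X_2, \text{and } X_3) - SSR(X_2 \text{ and } X_3) \)
Measures the additional contribution of \( X_1 \) in explaining Y with the inclusion of \( X_2 \) and \( X_3 \).
The first term comes from the ANOVA section of the regression on \( X_1 \), \( X_2 \), and \( X_3 \); the second comes from the ANOVA section of the regression on \( X_2 \) and \( X_3 \).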
Slide 108
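Coefficient of Partial Determination of \( X_j \)
\( r^2_{Yj.(\text{all others})} = \dfrac{SSR(X_j \mid \text{all others})}{SST - SSR(\text{all}) + SSR(X_j \mid \text{all others})} \)
Measures the proportion of variation in the dependent variable that is explained by \( X_j \) while controlling for (holding constant) the other independent variables.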
Slide 109
Coefficient of Partial Determination for \( X_j \) (continued)
Example: Model with two independent variables
\( r^2_{Y1.2} = \dfrac{SSR(X_1 \mid X_2)}{SST - SSR(X_1, X_2) + SSR(X_1 \mid X_2)} \)
Slide 110
Venn Diagrams and Coefficient of Partial Determination for Oil
Temp Insulation =
Slide 111
Coefficient of Partial Determination in PHStat PHStat |
Regression | Multiple Regression Check the Coefficient of Partial
Determination box Excel spreadsheet for the heating oil
example
Slide 112
Contribution of a Subset of Independent Variables Let X s Be
the Subset of Independent Variables of Interest Measures the
contribution of the subset X s in explaining SST with the inclusion
of the remaining independent variables
Slide 113
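Contribution of a Subset of Independent Variables: Example
Let \( X_s \) be \( X_1 \) and \( X_3 \).
\( SSR(X_1 \text{ and } X_3 \mid X_2) = SSR(X_1, X_2, \text{and } X_3) - SSR(X_2) \)
The first term comes from the ANOVA section of the regression on \( X_1 \), \( X_2 \), and \( X_3 \); the second comes from the ANOVA section of the regression on \( X_2 \).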
Slide 114
Testing Portions of Model Examines the Contribution of a Subset
X s of Explanatory Variables to the Relationship with Y Null
Hypothesis: Variables in the subset do not improve the model
significantly when all other variables are included Alternative
Hypothesis: At least one variable in the subset is significant when
all other variables are included
Slide 115
Testing Portions of Model One-Tailed Rejection Region Requires
Comparison of Two Regressions One regression includes everything
Another regression includes everything except the portion to be
tested (continued)
Slide 116
Partial F Test for the Contribution of a Subset of X Variables
Hypotheses:
\( H_0 \): Variables \( X_s \) do not significantly improve the model given all other variables included
\( H_1 \): Variables \( X_s \) significantly improve the model given all others included
Test Statistic: \( F = \dfrac{SSR(X_s \mid \text{all others}) / m}{MSE(\text{all})} \), with df = m and (n-k-1)
m = number of variables in the subset \( X_s \)
Slide 117
Partial F Test for the Contribution of a Single \( X_j \)
Hypotheses:
\( H_0 \): Variable \( X_j \) does not significantly improve the model given all others included
\( H_1 \): Variable \( X_j \) significantly improves the model given all others included
Test Statistic: \( F = \dfrac{SSR(X_j \mid \text{all others})}{MSE(\text{all})} \), with df = 1 and (n-k-1)
m = 1 here
Slide 118
Testing Portions of Model: Example
Test at the \( \alpha = .05 \) level to determine if the variable of average temperature significantly improves the model, given that insulation is included.
Slide 119
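Testing Portions of Model: Example (continued)
\( H_0 \): \( X_1 \) (temperature) does not improve the model with \( X_2 \) (insulation) included
\( H_1 \): \( X_1 \) does improve the model
\( \alpha = .05 \), df = 1 and 12
Critical Value = 4.75
Test Statistic: computed from the ANOVA tables of two regressions, one for \( X_1 \) and \( X_2 \) and one for \( X_2 \) alone
Conclusion: Reject \( H_0 \); \( X_1 \) does improve the model.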
Slide 120
Testing Portions of Model in PHStat PHStat | Regression |
Multiple Regression Check the Coefficient of Partial Determination
box Excel spreadsheet for the heating oil example
Slide 121
Do We Need to Do This for One Variable? The F Test for the
Contribution of a Single Variable After All Other Variables are
Included in the Model is IDENTICAL to the t Test of the Slope for
that Variable The Only Reason to Perform an F Test is to Test
Several Variables Together
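The equivalence can be illustrated with the temperature variable from the heating oil example. The sketch relies on the identity that the partial F statistic for a single variable equals the square of its t statistic (with 1 and n-k-1 degrees of freedom), and uses only values quoted earlier in the deck.

```python
# Values quoted in the slides for temperature in the heating oil example.
T_STAT = -16.1699  # t statistic for the temperature slope
T_CRIT = 2.1788    # two-tailed critical t, alpha = .05, df = 12
F_CRIT = 4.75      # upper-tail critical F, alpha = .05, df = 1 and 12

# Partial F for a single variable is the square of its t statistic.
partial_f = T_STAT ** 2
print(f"partial F = {partial_f:.2f}")

# The critical values are related the same way, so both tests always agree.
print(f"t_crit^2 = {T_CRIT ** 2:.2f} vs F_crit = {F_CRIT}")

reject_by_t = abs(T_STAT) > T_CRIT
reject_by_f = partial_f > F_CRIT
print(f"Reject H0 by t test: {reject_by_t}, by partial F test: {reject_by_f}")
```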
Slide 122
Dummy-Variable Models
Categorical explanatory variable with 2 or more levels: yes or no, on or off, male or female
Use dummy-variables (coded as 0 or 1)
Only intercepts are different
Assumes equal slopes across categories
The number of dummy-variables needed is (number of levels - 1)
Regression model has the same form: \( Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \cdots + \beta_k X_{ki} + \varepsilon_i \)
Slide 123
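Dummy-Variable Models (with 2 Levels)
Given: \( \hat{Y}_i = b_0 + b_1 X_{1i} + b_2 X_{2i} \)
Y = Assessed Value of House
\( X_1 \) = Square Footage of House
\( X_2 \) = Desirability of Neighborhood = 0 if undesirable, 1 if desirable
Desirable (\( X_2 = 1 \)): \( \hat{Y}_i = (b_0 + b_2) + b_1 X_{1i} \)
Undesirable (\( X_2 = 0 \)): \( \hat{Y}_i = b_0 + b_1 X_{1i} \)
Same slopes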
Slide 124
Dummy-Variable Models (with 2 Levels) (continued)
[Figure: Y (assessed value) plotted against \( X_1 \) (square footage) for desirable and undesirable locations: two lines with the same slopes and different intercepts, \( b_0 + b_2 \) and \( b_0 \).]
Slide 125
Interpretation of the Dummy-Variable Coefficient (with 2 Levels)
Example:
Y: annual salary of college graduate in thousand $
\( X_1 \): GPA
\( X_2 \): 0 non-business degree, 1 business degree
With the same GPA, college graduates with a business degree are making an estimated 6 thousand dollars more than graduates with a non-business degree, on average.
Slide 126
Dummy-Variable Models (with 3 Levels)
Slide 127
Interpretation of the Dummy-Variable Coefficients (with 3 Levels)
With the same footage, a Split-level will have an estimated average assessed value of 18.84 thousand dollars more than a Condo.
With the same footage, a Ranch will have an estimated average assessed value of 23.53 thousand dollars more than a Condo.
Slide 128
Regression Model Containing an Interaction Term Hypothesizes
Interaction between a Pair of X Variables Response to one X
variable varies at different levels of another X variable Contains
a Cross-Product Term Can Be Combined with Other Models E.g.,
Dummy-Variable Model
Slide 129
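Effect of Interaction
Given: \( Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \beta_3 X_{1i} X_{2i} + \varepsilon_i \)
Without the interaction term, the effect of \( X_1 \) on Y is measured by \( \beta_1 \).
With the interaction term, the effect of \( X_1 \) on Y is measured by \( \beta_1 + \beta_3 X_2 \).
The effect changes as \( X_2 \) changes.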
Slide 130
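Interaction Example
\( Y = 1 + 2X_1 + 3X_2 + 4X_1 X_2 \)
For \( X_2 = 1 \): \( Y = 1 + 2X_1 + 3(1) + 4X_1(1) = 4 + 6X_1 \)
For \( X_2 = 0 \): \( Y = 1 + 2X_1 + 3(0) + 4X_1(0) = 1 + 2X_1 \)
The effect (slope) of \( X_1 \) on Y depends on the \( X_2 \) value.
[Figure: the two lines plotted against \( X_1 \) (0 to 1.5) with Y from 0 to 12.]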
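The interaction example can be checked numerically. The sketch evaluates \( Y = 1 + 2X_1 + 3X_2 + 4X_1 X_2 \) and the implied slope of \( X_1 \), which is \( 2 + 4X_2 \); the function names are illustrative.

```python
def y_value(x1, x2):
    """Example model from the slide: Y = 1 + 2*X1 + 3*X2 + 4*X1*X2."""
    return 1 + 2 * x1 + 3 * x2 + 4 * x1 * x2


def slope_of_x1(x2):
    """Change in Y per one-unit change in X1 at a given X2: 2 + 4*X2."""
    return 2 + 4 * x2


for x2 in (0, 1):
    intercept = y_value(0, x2)                 # value of Y at X1 = 0
    slope = y_value(1, x2) - y_value(0, x2)    # rise over one unit of X1
    print(f"X2 = {x2}: Y = {intercept} + {slope} X1")
```

With \( X_2 = 0 \) the line is \( 1 + 2X_1 \) and with \( X_2 = 1 \) it is \( 4 + 6X_1 \), matching the two equations on the slide.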
Slide 131
Interaction Regression Model Worksheet
Multiply \( X_1 \) by \( X_2 \) to get \( X_1 X_2 \).
Run regression with Y, \( X_1 \), \( X_2 \), \( X_1 X_2 \).

Case, i   Y_i   X_1i   X_2i   X_1i X_2i
1         1     1      3      3
2         4     8      5      40
3         1     3      2      6
4         3     5      6      30
:         :     :      :      :
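Building the cross-product column is a one-line operation. The sketch below uses the four worksheet rows shown on the slide; the list-of-tuples layout is just one convenient representation and is an assumption of the illustration.

```python
# Worksheet rows from the slide: (Y, X1, X2) for cases 1 to 4.
rows = [
    (1, 1, 3),
    (4, 8, 5),
    (1, 3, 2),
    (3, 5, 6),
]

# Append the interaction (cross-product) term X1 * X2 to every row.
worksheet = [(y, x1, x2, x1 * x2) for y, x1, x2 in rows]

for case, (y, x1, x2, x1x2) in enumerate(worksheet, start=1):
    print(case, y, x1, x2, x1x2)
```

The regression is then run with Y as the response and the last three columns as explanatory variables.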
Slide 132
Interpretation When There Are 3+ Levels MALE = 0 if female and
1 if male MARRIED = 1 if married; 0 if not DIVORCED = 1 if
divorced; 0 if not MALEMARRIED = 1 if male married; 0 otherwise =
(MALE times MARRIED) MALEDIVORCED = 1 if male divorced; 0 otherwise
= (MALE times DIVORCED)
Slide 133
Interpretation When There Are 3+ Levels (continued)
Slide 134
Interpreting Results FEMALE Single: Married: Divorced: MALE
Single: Married: Divorced: Main Effects : MALE, MARRIED and
DIVORCED Interaction Effects : MALEMARRIED and MALEDIVORCED
Difference
Slide 135
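Evaluating the Presence of Interaction with Dummy-Variable
Suppose \( X_1 \) and \( X_2 \) are numerical variables and \( X_3 \) is a dummy-variable.
To test if the slopes of Y with \( X_1 \) and/or \( X_2 \) are the same for the two levels of \( X_3 \):
Model: \( Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \beta_3 X_{3i} + \beta_4 X_{1i} X_{3i} + \beta_5 X_{2i} X_{3i} + \varepsilon_i \)
Hypotheses:
\( H_0: \beta_4 = \beta_5 = 0 \) (no interaction between \( X_1 \) and \( X_3 \) or \( X_2 \) and \( X_3 \))
\( H_1: \beta_4 \) and/or \( \beta_5 \neq 0 \) (\( X_1 \) and/or \( X_2 \) interacts with \( X_3 \))
Perform a partial F test.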
Slide 136
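Evaluating the Presence of Interaction with Numerical Variables
Suppose \( X_1 \), \( X_2 \), and \( X_3 \) are numerical variables.
To test if the independent variables interact with each other:
Model: \( Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \beta_3 X_{3i} + \beta_4 X_{1i} X_{2i} + \beta_5 X_{1i} X_{3i} + \beta_6 X_{2i} X_{3i} + \varepsilon_i \)
Hypotheses:
\( H_0: \beta_4 = \beta_5 = \beta_6 = 0 \) (no interaction among \( X_1 \), \( X_2 \), and \( X_3 \))
\( H_1 \): at least one of \( \beta_4, \beta_5, \beta_6 \neq 0 \) (at least one pair of \( X_1 \), \( X_2 \), \( X_3 \) interact with each other)
Perform a partial F test.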
Slide 137
Chapter Summary
Developed the Multiple Regression Model
Discussed Residual Plots
Addressed Testing the Significance of the Multiple Regression Model
Discussed Inferences on Population Regression Coefficients
Addressed Testing Portions of the Multiple Regression Model
Discussed Dummy-Variables and Interaction Terms