Transcript
Interpreting Multiple Regression: A Short Overview
Abdel-Salam G. Abdel-Salam
Laboratory for Interdisciplinary Statistical Analysis (LISA)
Department of Statistics
Virginia Polytechnic Institute and State University
Today, we will cover how to do Linear Regression Analysis (LRA) in SPSS and SAS.

We will learn concepts and vocabulary in regression analysis, such as:

1. How to use the F-test to determine whether your predictor variables have a statistically significant relationship with your outcome/response variable.
2. What the assumptions for LRA are, and what you should do to meet them.
3. Why adjusted R2 is smaller than R2, and what these numbers mean when comparing several models.
4. What the difference between regression and ANOVA is, and when they are equivalent.
5. How you can select the best model.
6. Other LR problems (multicollinearity and outlying observations), and what you can do about them.
Linear regression is a general method for estimating/describing the association between a continuous outcome (dependent) variable and one or more predictors in a single equation.

Simple linear regression model:

y_i = β_0 + β_1 x_i + ε_i,  i = 1, 2, ..., n

where y_i is the i-th response value; β_0 is the intercept (the mean value of y at x = 0); β_1 (the slope, or regression coefficient) tells us that, on average, as x increases by 1, y increases by β_1; and ε_i is the error term.

The estimated (fitted) model is

ŷ_i = β̂_0 + β̂_1 x_i,  i = 1, 2, ..., n

and ε̂_i = y_i − ŷ_i is the residual for the i-th observation.
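The least-squares formulas behind the fitted model can be sketched in a few lines. This is a minimal illustration using only the Python standard library; the small data set below is invented for demonstration and is not the study data from the slides.

```python
# Fit a simple linear regression by ordinary least squares.
# Invented toy data: x might be age, y might be hours of sleep.
x = [2, 4, 6, 8, 10]
y = [13.2, 12.1, 11.3, 10.6, 9.9]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n

# Slope: beta1_hat = Sxy / Sxx; intercept: beta0_hat = y_bar - beta1_hat * x_bar
s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
s_xx = sum((xi - x_bar) ** 2 for xi in x)
beta1 = s_xy / s_xx
beta0 = y_bar - beta1 * x_bar

# Residuals e_i = y_i - yhat_i; by construction they sum to (numerically) zero.
residuals = [yi - (beta0 + beta1 * xi) for xi, yi in zip(x, y)]
print(beta0, beta1, sum(residuals))  # beta0 ≈ 13.85, beta1 ≈ -0.405, sum ≈ 0
```

The same estimates are what SAS (PROC REG) or the SPSS Linear Regression dialog report as the unstandardized coefficients.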
What are the reasons for using multiple regression?

Two reasons for using multiple regression:

1. To be able to make stronger causal inferences from observed associations between two or more variables.
2. To predict a dependent variable from the values of a number of other independent variables.

Example: There might be many factors associated with crime, such as poverty, urbanisation, low social cohesion and informal social control, and education.

Therefore, we want to be able to understand the unique contribution of each variable to variation in crime levels.
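With several predictors, the fitted coefficients come from solving the normal equations (XᵀX)b = Xᵀy. The sketch below, in plain Python with invented data, shows that idea end-to-end; in practice you would use SAS, SPSS, or a linear-algebra library rather than hand-rolled elimination.

```python
# Fit a multiple regression by solving the normal equations (X'X) b = X'y.
def fit_ols(X, y):
    """X: list of rows [1, x1, x2, ...] (leading 1 = intercept); returns b."""
    p = len(X[0])
    # Build X'X and X'y.
    XtX = [[sum(r[i] * r[j] for r in X) for j in range(p)] for i in range(p)]
    Xty = [sum(r[i] * yi for r, yi in zip(X, y)) for i in range(p)]
    # Gaussian elimination with partial pivoting.
    for col in range(p):
        piv = max(range(col, p), key=lambda r: abs(XtX[r][col]))
        XtX[col], XtX[piv] = XtX[piv], XtX[col]
        Xty[col], Xty[piv] = Xty[piv], Xty[col]
        for r in range(col + 1, p):
            f = XtX[r][col] / XtX[col][col]
            for c in range(col, p):
                XtX[r][c] -= f * XtX[col][c]
            Xty[r] -= f * Xty[col]
    # Back substitution.
    b = [0.0] * p
    for i in reversed(range(p)):
        b[i] = (Xty[i] - sum(XtX[i][j] * b[j] for j in range(i + 1, p))) / XtX[i][i]
    return b

# Invented data: y is generated exactly as 1 + 2*x1 - 0.5*x2,
# so OLS should recover those coefficients.
X = [[1, 1, 2], [1, 2, 1], [1, 3, 4], [1, 4, 3], [1, 5, 5]]
y = [1 + 2 * x1 - 0.5 * x2 for _, x1, x2 in X]
print(fit_ols(X, y))  # ~ [1.0, 2.0, -0.5]
```

Each coefficient here is the "unique contribution" of its predictor, holding the other predictors fixed, which is exactly the interpretation the crime example calls for.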
Example Problem 1: A group of 13 children participated in a psychological study of the relationship between age and average total sleep time (ATST). The results are displayed below. Determine the SLR model for these data.

Example Problem 2: To study the growth rate of a particular type of bacteria, biologists were interested in the relationship between time and the proportion of total area taken up by a colony of bacteria. The biologists placed samples in four Petri dishes and observed the percentage of total area taken up by the colony after fixed time intervals.

Data and SAS code are posted for further analysis: http://filebox.vt.edu/users/abdo/statwww/Example2.pdf
Outliers can distort the regression results. When an outlier is included in the analysis, it pulls the regression line towards itself. This can result in a solution that is more accurate for the outlier, but less accurate for all of the other cases in the data set.

The problems of satisfying assumptions and detecting outliers are intertwined. For example, if a case has a value on the dependent variable that is an outlier, it will affect the skew, and hence the normality, of the distribution.

Removing an outlier may improve the distribution of a variable.

Transforming a variable may reduce the likelihood that the value for a case will be identified as an outlier.
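The "pulls the line towards itself" effect is easy to demonstrate. In this sketch (invented data, standard library only), the clean points lie exactly on y = 2x, and a single far-off case drags the fitted slope well away from 2.

```python
# Show how one outlier pulls the least-squares slope toward itself.
def slope(x, y):
    """OLS slope for a simple linear regression."""
    n = len(x)
    xb, yb = sum(x) / n, sum(y) / n
    return (sum((a - xb) * (b - yb) for a, b in zip(x, y))
            / sum((a - xb) ** 2 for a in x))

x = [1, 2, 3, 4, 5]
y = [2, 4, 6, 8, 10]            # perfect line, slope exactly 2
print(slope(x, y))               # 2.0

# Add one outlier far below the line; the slope collapses toward it.
print(slope(x + [6], y + [0]))   # ~0.286, far from the true slope of 2
```

One gross case out of six is enough to change the slope by a factor of about seven, which is why the diagnostics in the next section matter.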
Our strategy for solving problems involving violations of assumptions and outliers includes the following steps:

1. Run the type of regression specified in the problem statement on the variables, using the full data set.
2. Test the dependent variable for normality. If it does not satisfy the criteria for normality unless transformed, substitute the transformed variable in the remaining tests that call for the dependent variable.
3. Test for normality, linearity, and homoscedasticity using scripts. Decide which transformations should be used.
4. Substitute the transformations and run the regression entering all independent variables, saving the studentized residuals.
5. Remove the outliers (studentized residual greater than 3 or less than −3) and rerun the regression with the method and variables specified in the problem.
6. Compare the R2 for the analysis using transformed variables and omitting outliers (step 5) to the R2 obtained for the model using all data and original variables (step 1).
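Steps 4 and 5 above can be sketched for the simple-regression case. This uses internally studentized residuals (each residual divided by its estimated standard error, using the leverage h_ii); the data are invented, with one deliberately gross outlier, and only the standard library is used.

```python
import math

def studentized_residuals(x, y):
    """Internally studentized residuals for a simple linear regression."""
    n = len(x)
    xb, yb = sum(x) / n, sum(y) / n
    sxx = sum((a - xb) ** 2 for a in x)
    b1 = sum((a - xb) * (b - yb) for a, b in zip(x, y)) / sxx
    b0 = yb - b1 * xb
    e = [b - (b0 + b1 * a) for a, b in zip(x, y)]   # raw residuals
    s2 = sum(ei ** 2 for ei in e) / (n - 2)          # mean squared error
    h = [1 / n + (a - xb) ** 2 / sxx for a in x]     # leverages h_ii
    return [ei / math.sqrt(s2 * (1 - hi)) for ei, hi in zip(e, h)]

# Invented data: 20 points on the line y = x, with one gross outlier at x = 10.
x = list(range(1, 21))
y = list(range(1, 21))
y[9] += 10                      # the 10th case is pushed far off the line

r = studentized_residuals(x, y)
flagged = [i for i, ri in enumerate(r) if abs(ri) > 3]
print(flagged)                  # [9] — only the outlying case exceeds |3|
```

After dropping the flagged cases, you would refit the model and compare R2 values as in step 6. (SPSS and SAS report these quantities directly, e.g. as saved studentized residuals, so the hand computation is only for understanding.)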
Thank You for Your Attention!

Acknowledgment to Dr. Schabenberger, Dr. J. P. Morgan, Jonathan Duggins, Dingcai Cao, and all our consultants.

Selected references: [Laboratory for Interdisciplinary Statistical Analysis (LISA); Schabenberger and Morgan; Montgomery et al., 2006; Chatterjee and Hadi, 2006; Kutner et al., 2005; Kleinbaum et al., 2007; Myers, 1990; Zar, 1999; Grafarend, 2006; Hastie and Tibshirani, 1990; Rencher, 2000; Vonesh and Chinchilli, 1997; Lee et al., 2006]
References

Chatterjee, S. and Hadi, A. S. (2006). Regression Analysis by Example, 4th edition. ISBN 978-0-471-74696-6.

Laboratory for Interdisciplinary Statistical Analysis (LISA). http://www.stat.vt.edu/consult/index.html.

Grafarend, E. W. (2006). Linear and Nonlinear Models: Fixed Effects, Random Effects, and Mixed Models. Walter de Gruyter.

Hastie, T. J. and Tibshirani, R. J. (1990). Generalized Additive Models. New York: Chapman and Hall.

Kleinbaum, D., Kupper, L., Nizam, A., and Muller, K. (2007). Applied Regression Analysis and Multivariate Methods, 4th edition. ISBN 978-0-495-38496-0.

Kutner, M., Nachtsheim, C., Neter, J., and Li, W. (2005). Applied Linear Statistical Models, 5th edition. ISBN 978-0-073-10874-2.

Lee, Y., Nelder, J. A., and Pawitan, Y. (2006). Generalized Linear Models with Random Effects: Unified Analysis via H-likelihood. Chapman & Hall/CRC.

Montgomery, D. C., Peck, E. A., and Vining, G. G. (2006). Introduction to Linear Regression Analysis, 4th edition. John Wiley & Sons, New Jersey.

Myers, R. H. (1990). Classical and Modern Regression with Applications, 2nd edition. Boston, MA: PWS-KENT.

Rencher, A. C. (2000). Linear Models in Statistics. John Wiley and Sons, New York, NY.

Schabenberger, O. and Morgan, J. P. Regression and ANOVA course pack. STAT 5044.

Vonesh, E. F. and Chinchilli, V. M. (1997). Linear and Nonlinear Models for the Analysis of Repeated Measurements. Marcel Dekker, Inc., New York.

Zar, J. (1999). Biostatistical Analysis, 4th edition. ISBN 978-0-130-81542-2.