Page 1
8/17/2019 Example How to Perform Multiple Regression Analysis Using SPSS Statistics
http://slidepdf.com/reader/full/example-how-to-perform-multiple-regression-analysis-using-spss-statistics 1/14
Example how to perform Multiple Regression Analysis using SPSS Statistics
Introduction
Multiple regression is an extension of simple linear regression. It is used when we want to
predict the value of a variable based on the value of two or more other variables. The variable we
want to predict is called the dependent variable (or sometimes, the outcome, target or criterion
variable). The variables we are using to predict the value of the dependent variable are called the
independent variables (or sometimes, the predictor, explanatory or regressor variables).
For example, you could use multiple regression to understand whether exam performance can be
predicted based on revision time, test anxiety, lecture attendance and gender. Alternatively, you
could use multiple regression to understand whether daily cigarette consumption can be
predicted based on smoking duration, age when started smoking, smoker type, income and
gender.
Multiple regression also allows you to determine the overall fit (variance explained) of the model
and the relative contribution of each of the predictors to the total variance explained. For
example, you might want to know how much of the variation in exam performance can be
explained by revision time, test anxiety, lecture attendance and gender "as a whole", but also the
"relative contribution" of each independent variable in explaining the variance.
This "quick start" guide shows you how to carry out multiple regression using SPSS Statistics, as
well as interpret and report the results from this test. However, before we introduce you to this
procedure, you need to understand the different assumptions that your data must meet in order
for multiple regression to give you a valid result. We discuss these assumptions next.
Assumptions
When you choose to analyse your data using multiple regression, part of the process involves
checking to make sure that the data you want to analyse can actually be analysed using multiple
regression. You need to do this because it is only appropriate to use multiple regression if your
data "passes" eight assumptions that are required for multiple regression to give you a valid
result. In practice, checking for these eight assumptions just adds a little bit more time to your
analysis, requiring you to click a few more buttons in SPSS Statistics when performing your
analysis, as well as think a little bit more about your data, but it is not a difficult task.
Before we introduce you to these eight assumptions, do not be surprised if, when analysing your
own data using SPSS Statistics, one or more of these assumptions is violated (i.e., not met). This
is not uncommon when working with real-world data rather than textbook examples, which often
only show you how to carry out multiple regression when everything goes well. However, don't
worry. Even when your data fails certain assumptions, there is often a solution to overcome this.
First, let's take a look at these eight assumptions:
• Assumption #1: Your dependent variable should be measured on a continuous scale
(i.e., it is either an interval or ratio variable). Examples of variables that meet this
criterion include revision time (measured in hours), intelligence (measured using IQ
score), exam performance (measured from 0 to 100), weight (measured in kg), and so
forth. You can learn more about interval and ratio variables in our article: Types of
Variable. If your dependent variable was measured on an ordinal scale, you will need to
carry out ordinal regression rather than multiple regression. Examples of ordinal
variables include Likert items (e.g., a 7-point scale from "strongly agree" through to
"strongly disagree"), amongst other ways of ranking categories (e.g., a 3-point scale
explaining how much a customer liked a product, ranging from "Not very much" to "Yes,
a lot").
• Assumption #2: You have two or more independent variables, which can be either
continuous (i.e., an interval or ratio variable) or categorical (i.e., an ordinal or nominal
variable). For examples of continuous and ordinal variables, see the bullet above.
Examples of nominal variables include gender (e.g., 2 groups: male and female),
ethnicity (e.g., 3 groups: Caucasian, African American and Hispanic), physical activity
level (e.g., 4 groups: sedentary, low, moderate and high), profession (e.g., 5 groups:
surgeon, doctor, nurse, dentist, therapist), and so forth. Again, you can learn more about
variables in our article: Types of Variable. If one of your independent variables is
dichotomous and considered a moderating variable, you might need to run a
Dichotomous moderator analysis.
• Assumption #3: You should have independence of observations (i.e., independence of
residuals), which you can easily check using the Durbin-Watson statistic, which is a
simple test to run using SPSS Statistics. We explain how to interpret the result of the
Durbin-Watson statistic, as well as showing you the SPSS Statistics procedure required,
in our enhanced multiple regression guide.
• Assumption #4: There needs to be a linear relationship between (a) the dependent
variable and each of your independent variables, and (b) the dependent variable and the
independent variables collectively. Whilst there are a number of ways to check for these
linear relationships, we suggest creating scatterplots and partial regression plots using
SPSS Statistics, and then visually inspecting these scatterplots and partial regression plots
to check for linearity. If the relationship displayed in your scatterplots and partial
regression plots is not linear, you will have to either run a non-linear regression analysis
or "transform" your data, which you can do using SPSS Statistics. In our enhanced
multiple regression guide, we show you how to: (a) create scatterplots and partial
regression plots to check for linearity when carrying out multiple regression using SPSS
Statistics; (b) interpret different scatterplot and partial regression plot results; and (c)
transform your data using SPSS Statistics if you do not have linear relationships between
your variables.
• Assumption #5: Your data needs to show homoscedasticity, which is where the
variances along the line of best fit remain similar as you move along the line. We explain
more about what this means and how to assess the homoscedasticity of your data in our
enhanced multiple regression guide. When you analyse your own data, you will need to
plot the studentized residuals against the unstandardized predicted values. In our
enhanced multiple regression guide, we explain: (a) how to test for homoscedasticity
using SPSS Statistics; (b) some of the things you will need to consider when interpreting
your data; and (c) possible ways to continue with your analysis if your data fails to meet
this assumption.
• Assumption #6: Your data must not show multicollinearity, which occurs when you
have two or more independent variables that are highly correlated with each other. This
leads to problems with understanding which independent variable contributes to the
variance explained in the dependent variable, as well as technical issues in calculating a
multiple regression model. Therefore, in our enhanced multiple regression guide, we
show you: (a) how to use SPSS Statistics to detect multicollinearity through an
inspection of correlation coefficients and Tolerance/VIF values; and (b) how to interpret
these correlation coefficients and Tolerance/VIF values so that you can determine
whether your data meets or violates this assumption.
• Assumption #7: There should be no significant outliers, high leverage points or highly
influential points. Outliers, leverage and influential points are different terms used to
represent observations in your data set that are in some way unusual when you wish to
perform a multiple regression analysis. These different classifications of unusual points
reflect the different impact they have on the regression line. An observation can be
classified as more than one type of unusual point. However, all these points can have a
very negative effect on the regression equation that is used to predict the value of the
dependent variable based on the independent variables. This can change the output that
SPSS Statistics produces and reduce the predictive accuracy of your results as well as the
statistical significance. Fortunately, when using SPSS Statistics to run multiple regression
on your data, you can detect possible outliers, high leverage points and highly influential
points. In our enhanced multiple regression guide, we: (a) show you how to detect
outliers using "casewise diagnostics" and "studentized deleted residuals", which you can
do using SPSS Statistics, and discuss some of the options you have in order to deal with
outliers; (b) check for leverage points using SPSS Statistics and discuss what you should
do if you have any; and (c) check for influential points in SPSS Statistics using a measure
of influence known as Cook's Distance, before presenting some practical approaches in
SPSS Statistics to deal with any influential points you might have.
• Assumption #8: Finally, you need to check that the residuals (errors) are
approximately normally distributed (we explain these terms in our enhanced multiple
regression guide). Two common methods to check this assumption include using: (a) a
histogram (with a superimposed normal curve) and a Normal P-P Plot; or (b) a Normal
Q-Q Plot of the studentized residuals. Again, in our enhanced multiple regression guide,
we: (a) show you how to check this assumption using SPSS Statistics, whether you use a
histogram (with superimposed normal curve) and Normal P-P Plot, or Normal Q-Q Plot;
(b) explain how to interpret these diagrams; and (c) provide a possible solution if your
data fails to meet this assumption.
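The third assumption in the list above relies on the Durbin-Watson statistic. As a rough illustration of what that statistic measures (this is not the SPSS Statistics procedure), the Python sketch below computes it from a list of residuals as the sum of squared successive differences divided by the sum of squared residuals. The function name and the residual values are hypothetical and serve only as an example.

```python
def durbin_watson(residuals):
    """Durbin-Watson statistic for residuals listed in case order."""
    if len(residuals) < 2:
        raise ValueError("need at least two residuals")
    # Sum of squared differences between each residual and the previous one.
    numerator = sum(
        (residuals[t] - residuals[t - 1]) ** 2
        for t in range(1, len(residuals))
    )
    # Sum of squared residuals.
    denominator = sum(e ** 2 for e in residuals)
    if denominator == 0:
        raise ValueError("residuals are all zero")
    return numerator / denominator


# Hypothetical residuals (made up for illustration, not from the guide's data).
example_residuals = [0.5, -0.3, 0.2, -0.4, 0.1, 0.3, -0.2]
dw = durbin_watson(example_residuals)
print(round(dw, 3))
```

The statistic always lies between 0 and 4, and a value close to 2 is usually read as little evidence of first-order autocorrelation in the residuals.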
You can check assumptions #3, #4, #5, #6, #7 and #8 using SPSS Statistics. Assumptions #1 and
#2 should be checked first, before moving onto assumptions #3, #4, #5, #6, #7 and #8. Just
remember that if you do not run the statistical tests on these assumptions correctly, the results
you get when running multiple regression might not be valid. This is why we dedicate a number
of sections of our enhanced multiple regression guide to help you get this right.
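The multicollinearity assumption above refers to Tolerance and VIF values. For a given independent variable, Tolerance is 1 minus the R-squared obtained when that variable is regressed on the other independent variables, and VIF is the reciprocal of Tolerance. The Python sketch below covers only the simplest case of two independent variables, where that R-squared equals the squared Pearson correlation between them. The function names and the data are hypothetical.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def tolerance_and_vif(x1, x2):
    """Tolerance and VIF when the model has exactly two predictors."""
    r = pearson_r(x1, x2)
    tolerance = 1.0 - r ** 2
    # A tolerance of zero means the two predictors are perfectly collinear.
    vif = math.inf if tolerance == 0 else 1.0 / tolerance
    return tolerance, vif


# Hypothetical predictor values (made up for illustration).
ages = [25, 32, 41, 38, 29, 50, 46, 35]
weights = [70, 82, 77, 90, 65, 85, 79, 88]
tol, vif = tolerance_and_vif(ages, weights)
print(round(tol, 3), round(vif, 3))
```

With more than two independent variables, each Tolerance value requires a separate regression of one predictor on all the others, which this sketch does not attempt.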
In the section, Procedure, we illustrate the SPSS Statistics procedure to perform a multiple
regression assuming that no assumptions have been violated. First, we introduce the example that
is used in this guide.
Example
A health researcher wants to be able to predict "VO2max", an indicator of fitness and health.
Normally, to perform this procedure requires expensive laboratory equipment and necessitates
that an individual exercise to their maximum (i.e., until they can no longer continue exercising due
to physical exhaustion). This can put off those individuals who are not very active/fit and those
individuals who might be at higher risk of ill health (e.g., older unfit subjects). For these reasons,
it has been desirable to find a way of predicting an individual's VO2max based on attributes that
can be measured more easily and cheaply. To this end, a researcher recruited 100 participants to
perform a maximum VO2max test, but also recorded their "age", "weight", "heart rate" and
"gender". Heart rate is the average of the last 5 minutes of a 20 minute, much easier, lower
workload cycling test. The researcher's goal is to be able to predict VO2max based on these four
attributes: age, weight, heart rate and gender.
Setup in SPSS Statistics
In SPSS Statistics, we created six variables: (1) VO2max, which is the maximal aerobic capacity;
(2) age, which is the participant's age; (3) weight, which is the participant's weight (technically, it
is their 'mass'); (4) heartrate, which is the participant's heart rate; (5) gender, which is the
participant's gender; and (6) caseno, which is the case number. The caseno variable is used to
make it easy for you to eliminate cases (e.g., "significant outliers", "high leverage points" and
"highly influential points") that you have identified when checking for assumptions. In our
enhanced multiple regression guide, we show you how to correctly enter data in SPSS Statistics
to run a multiple regression when you are also checking for assumptions.
Test Procedure in SPSS Statistics
The seven steps below show you how to analyse your data using multiple regression in SPSS
Statistics when none of the eight assumptions in the previous section, Assumptions, have been
violated. At the end of these seven steps, we show you how to interpret the results from your
multiple regression. If you are looking for help to make sure your data meets assumptions #3, #4,
#5, #6, #7 and #8, which are required when using multiple regression and can be tested using
SPSS Statistics, see our enhanced multiple regression guide.
• Click Analyze > Regression > Linear... on the main menu, as shown below:
Published with written permission from SPSS Statistics, IBM Corporation.
Note: Don't worry that you're selecting Analyze > Regression > Linear... on the main
menu or that the dialogue boxes in the steps that follow have the title, Linear
Regression. You have not made a mistake. You are in the correct place to carry out the
multiple regression procedure. This is just the title that SPSS Statistics gives, even when
running a multiple regression procedure.
• You will be presented with the Linear Regression dialogue box below:
Published with written permission from SPSS Statistics, IBM Corporation.
• Transfer the dependent variable, VO2max, into the Dependent: box and the independent
variables, age, weight, heartrate and gender into the Independent(s): box, using the
arrow buttons, as shown below (all other boxes can be ignored):
Published with written permission from SPSS Statistics, IBM Corporation.
Note: For a standard multiple regression you should ignore the Previous and Next
buttons as they are for sequential (hierarchical) multiple regression. The Method: option
needs to be kept at the default value, which is Enter. If, for whatever reason, Enter
is not selected, you need to change Method: back to Enter. The Enter
method is the name given by SPSS Statistics to standard regression analysis.
• Click the Statistics... button. You will be presented with the Linear Regression: Statistics
dialogue box, as shown below:
Published with written permission from SPSS Statistics, IBM Corporation.
• In addition to the options that are selected by default, select Confidence intervals in the
"Regression Coefficients" area, leaving the Level(%): option at "95". You will end up with
the following screen:
Published with written permission from SPSS Statistics, IBM Corporation.
• Click the Continue button. You will be returned to the Linear Regression dialogue box.
• Click the OK button. This will generate the output.
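For readers who want to see what the procedure above estimates, the following Python sketch fits an ordinary least-squares multiple regression by solving the normal equations directly. It is not how SPSS Statistics is implemented, and it produces none of the diagnostic output; the function name and the toy data are hypothetical. The toy outcome is generated exactly from known coefficients so that the fit can be checked by eye.

```python
def fit_ols(rows, y):
    """Return [intercept, b1, ..., bk] minimising the sum of squared errors."""
    # Design matrix with a leading column of ones for the intercept.
    X = [[1.0] + [float(v) for v in row] for row in rows]
    p = len(X[0])
    # Augmented matrix [X'X | X'y] of the normal equations.
    A = [
        [sum(r[i] * r[j] for r in X) for j in range(p)]
        + [sum(r[i] * yi for r, yi in zip(X, y))]
        for i in range(p)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(A[r][col]))
        if abs(A[pivot][col]) < 1e-12:
            raise ValueError("predictors are perfectly collinear")
        A[col], A[pivot] = A[pivot], A[col]
        pivot_value = A[col][col]
        A[col] = [v / pivot_value for v in A[col]]
        for r in range(p):
            if r != col:
                factor = A[r][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    return [A[i][p] for i in range(p)]


# Toy data (hypothetical): two predictors, outcome built as 2 + 3*x1 - 1*x2.
rows = [(1, 2), (2, 1), (3, 4), (4, 3), (5, 6), (6, 5)]
y = [2 + 3 * a - b for a, b in rows]
coefficients = fit_ols(rows, y)
print([round(c, 6) for c in coefficients])
```

Because the toy outcome contains no noise, the recovered coefficients should match the generating values of 2, 3 and -1 up to floating-point error.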
Interpreting and Reporting the Output of Multiple Regression Analysis
SPSS Statistics will generate quite a few tables of output for a multiple regression analysis. In
this section, we show you only the three main tables required to understand your results from the
multiple regression procedure, assuming that no assumptions have been violated. A complete
explanation of the output you have to interpret when checking your data for the eight
assumptions required to carry out multiple regression is provided in our enhanced guide. This
includes relevant scatterplots and partial regression plots, histogram (with superimposed normal
curve), Normal P-P Plot and Normal Q-Q Plot, correlation coefficients and Tolerance/VIF
values, casewise diagnostics and studentized deleted residuals.
However, in this "quick start" guide, we focus only on the three main tables you need to
understand your multiple regression results, assuming that your data has already met the eight
assumptions required for multiple regression to give you a valid result:
Determining how well the model fits
The first table of interest is the Model Summary table. This table provides the R, R², adjusted
R², and the standard error of the estimate, which can be used to determine how well a regression
model fits the data:
Published with written permission from SPSS Statistics, IBM Corporation.
The "R" column represents the value of R, the multiple correlation coefficient. R can be
considered to be one measure of the quality of the prediction of the dependent variable; in this
case, VO2max. A value of 0.760, in this example, indicates a good level of prediction. The "R
Square" column represents the R² value (also called the coefficient of determination), which is
the proportion of variance in the dependent variable that can be explained by the independent
variables (technically, it is the proportion of variation accounted for by the regression model
above and beyond the mean model). You can see from our value of 0.577 that our independent
variables explain 57.7% of the variability of our dependent variable, VO2max. However, you also
need to be able to interpret "Adjusted R Square" (adj. R²) to accurately report your data. We
explain the reasons for this, as well as the output, in our enhanced multiple regression guide.
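As a hedged illustration of how the adjusted R Square relates to the R Square, the Python sketch below applies the standard adjustment formula, 1 - (1 - R²)(n - 1)/(n - k - 1), to the rounded values quoted above. It assumes n = 100 cases (the number of participants in the example) and k = 4 independent variables. The function name is hypothetical, and because the inputs are rounded the result is only an approximation of what a full-precision calculation would give.

```python
def adjusted_r_squared(r_squared, n_cases, n_predictors):
    """Adjust R-squared for the number of predictors in the model."""
    residual_df = n_cases - n_predictors - 1
    return 1.0 - (1.0 - r_squared) * (n_cases - 1) / residual_df


r = 0.760          # multiple correlation coefficient quoted in the text
r_squared = 0.577  # R Square quoted in the text
adj = adjusted_r_squared(r_squared, n_cases=100, n_predictors=4)

# R squared from the rounded R agrees with the quoted R Square to 3 decimals.
print(round(r ** 2, 3), round(adj, 3))
```

The adjustment penalises each additional independent variable, so the adjusted value is never larger than the unadjusted R Square.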
Statistical significance
The F-ratio in the ANOVA table (see below) tests whether the overall regression model is a good
fit for the data. The table shows that the independent variables statistically significantly predict
the dependent variable, F(4, 95) = 32.393, p < .0005 (i.e., the regression model is a good fit of
the data).
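The F-ratio and its degrees of freedom can be cross-checked against the R Square reported earlier. The Python sketch below uses the standard identity F = (R²/k) / ((1 - R²)/(n - k - 1)), assuming n = 100 cases and k = 4 independent variables as in the example. The function name is hypothetical, and since R² is rounded to three decimals the result only approximately matches the reported F value.

```python
def f_ratio_from_r_squared(r_squared, n_cases, n_predictors):
    """Overall F-ratio and its degrees of freedom from R-squared."""
    df_regression = n_predictors
    df_residual = n_cases - n_predictors - 1
    f = (r_squared / df_regression) / ((1.0 - r_squared) / df_residual)
    return f, df_regression, df_residual


f, df1, df2 = f_ratio_from_r_squared(0.577, n_cases=100, n_predictors=4)
print(round(f, 2), df1, df2)
```

The degrees of freedom come out as 4 and 95, matching those reported with the F-ratio above.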
Published with written permission from SPSS Statistics, IBM Corporation.
Estimated model coefficients
The general form of the equation to predict VO2max from age, weight, heartrate, gender, is:
predicted VO2max = 87.83 - (0.165 x age) - (0.385 x weight) - (0.118 x heartrate) +
(13.208 x gender)
This is obtained from the Coefficients table, as shown below:
Published with written permission from SPSS Statistics, IBM Corporation.
Unstandardized coefficients indicate how much the dependent variable varies with an
independent variable when all other independent variables are held constant. Consider the effect
of age in this example. The unstandardized coefficient, B1, for age is equal to -0.165 (see
Coefficients table). This means that for each one year increase in age, there is a decrease in
VO2max of 0.165 ml/min/kg.
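To make the use of the regression equation concrete, the Python sketch below wraps it in a small function. The coefficients are the ones quoted in this section; the positive sign on the gender term and the 0/1 numeric coding of gender are assumptions, because neither is stated explicitly here, and the example participant is hypothetical.

```python
def predict_vo2max(age, weight, heartrate, gender):
    """Predicted VO2max (ml/min/kg) from the guide's regression equation.

    gender must use the same numeric coding as the data set; a 0/1 coding
    is assumed here because the coding is not stated in this section.
    """
    return (
        87.83
        - 0.165 * age
        - 0.385 * weight
        - 0.118 * heartrate
        + 13.208 * gender  # sign of this term is an assumption
    )


# Hypothetical participant: 30 years old, 80 kg, heart rate 133, gender coded 1.
prediction = predict_vo2max(age=30, weight=80, heartrate=133, gender=1)
print(round(prediction, 3))

# Holding everything else constant, one extra year of age lowers the prediction.
one_year_effect = predict_vo2max(31, 80, 133, 1) - predict_vo2max(30, 80, 133, 1)
print(round(one_year_effect, 3))
```

The second printed value reproduces the interpretation of the age coefficient given above: a decrease of 0.165 ml/min/kg per additional year of age.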
Statistical significance of the independent variables
You can test for the statistical significance of each of the independent variables. This tests
whether the unstandardized (or standardized) coefficients are equal to 0 (zero) in the population.
If p < .05, you can conclude that the coefficients are statistically significantly different to 0
(zero). The t-value and corresponding p-value are located in the "t" and "Sig." columns,
respectively, as highlighted below:
Published with written permission from SPSS Statistics, IBM Corporation.
You can see from the "Sig." column that all independent variable coefficients are statistically
significantly different from 0 (zero). Although the intercept, B0, is tested for statistical
significance, this is rarely an important or interesting finding.
Putting it all together
You could write up the results as follows:
• General
A multiple regression was run to predict VO2max from gender, age, weight and heart rate. These
variables statistically significantly predicted VO2max, F(4, 95) = 32.393, p < .0005, R² = .577.
All four variables added statistically significantly to the prediction, p < .05.
If you are unsure how to interpret regression equations or how to use them to make predictions,
we discuss this in our enhanced multiple regression guide. We also show you how to write up the
results from your assumptions tests and multiple regression output if you need to report this in a
dissertation/thesis, assignment or research report. We do this using the Harvard and APA styles.