SPSS INSTRUCTION – CHAPTER 8
SPSS provides rather straightforward output for regression and correlation analysis. The
program’s graph, regression, and correlation functions can respectively produce
scatterplots, provide regression equation coefficients, and create correlation matrices.
Within the outputs for these functions, you can also find information such as coefficients of
determination and significance values.
Preparing Regression and Correlation Analysis Data in SPSS
The first step in performing regression and correlation analyses in SPSS is, of course,
inputting data into the program. Each variable should receive its own column on SPSS’s
Data View screen. With this arrangement, each subject’s independent and dependent
variable scores should fall into the same row.
Example 8.25 – SPSS Data View Screen for Regression and Correlation Analysis
For a simple example, consider the five-subject sample introduced in Example 8.5 (selected
for this example due to the small sample size, which allows the entire data set to be shown
easily). Figure 8.19 presents the data from this example as it would look in the SPSS Data
View screen.
FIGURE 8.19 – SPSS REGRESSION AND CORRELATION ANALYSIS DATA ARRANGEMENT
Data for the independent variable appears on the left and data for the dependent variable appears on the
right. However, the variables do not need to appear in this order because, in forthcoming steps, SPSS asks the
user to identify the independent and the dependent variable by name. ▄
If your analysis involves more than two variables, you can simply include additional
columns. In the commands that you provide to SPSS about the analysis that you wish to
perform, you must specify which of these columns you wish to represent independent
variables, dependent variables, and intervening variables.
Creating Scatterplots in SPSS
Basic scatterplots are most easily created through SPSS’s Graphs function. SPSS
instructions for Chapter 2 and Chapter 3 explain how to use this function to create bar
graphs, pie charts, and frequency histograms. The process for creating scatterplots in SPSS
begins the same way.
1. From the pull-down menu under the Graphs option at the top of the Data View or
Variable View screen, select “Legacy Dialogs.” A listing of graphs and charts available
through this method should appear.
2. Select “Scatter/Dot.” A window entitled Scatter/Dot should appear. The Scatter/Dot
window contains various options for the graph. A two-variable situation requires a
Simple Scatter. A three-variable situation, such as that described in Section 8.3.1,
requires a 3-D Scatter. After selecting the name of the appropriate scatterplot, click
“Define.”
a. For a simple scatterplot, a new window, entitled Simple Scatterplot, should appear.
FIGURE 8.20 – SPSS SIMPLE SCATTERPLOT WINDOW
The user creates a two-variable scatterplot by identifying the independent (X) and dependent (Y)
variables from those listed on the left side of the window. To do so, highlight the name of each
variable and click on the arrow next to the box labeled with the appropriate axis name.
Identify the independent variable by moving its name from the box on the left to the
box labeled “X Axis.” Identify the dependent variable by moving its name from the
box on the left to the box labeled “Y Axis.”
b. For a 3-D scatterplot, a new window, entitled 3-D Scatterplot, should appear.
FIGURE 8.21 – SPSS 3-D SCATTERPLOT WINDOW
The user creates a three-variable scatterplot by identifying the two independent variables and the
dependent variable from those listed on the left side of the window. To do so, highlight the name of
each variable and click on the arrow next to the box labeled with the appropriate axis name.
Move the names of each of the two independent variables and the dependent
variable from the box on the left to a box on the right marked for one of the axes. The
assignment of the three variables to the X, Y, and Z axes on the graph depends upon
the user’s intentions and preference for the graph’s appearance.
3. Click OK.
Example 8.26 – Simple Scatterplot in SPSS
The steps for producing a simple scatterplot can be applied to the examples from Section
8.2.1. The following graph results from moving the name of the independent variable,
“students,” to the box labeled “X Axis” and moving the name of the dependent variable,
“hedgers,” to the box labeled “Y Axis.”
FIGURE 8.22 – SPSS SIMPLE SCATTERPLOT OUTPUT
The scale for independent-variable scores lies along the X axis and the scale for dependent-variable scores
lies along the Y axis. Each point represents a particular independent and dependent variable score.
This particular scatterplot indicates that, as class size increases, teachers’ use of hedgers
tends to increase. Thus, it suggests a positively sloped regression line. ▄
The basic SPSS scatterplot does not show the regression line. If you would like the graph to
include this line, you must use SPSS’s Chart Editor. To access the Chart Editor, you must
double-click on the scatterplot.
The Chart Editor refers to the least-squares regression line as a fit line. The pull-down
menu for the Elements function in the Chart Editor contains a “Fit Line at Total” option.
(Often, the lowest menu bar in the Chart Editor also contains a shortcut icon for this
process.) Selecting this option begins the process for overlaying the regression line onto
the existing scatterplot.
1. From the “Elements” pull-down menu in the Chart Editor, select “Fit Line at Total.”
2. A new window entitled Properties should appear.
FIGURE 8.23 – SPSS CHART EDITOR PROPERTIES WINDOW
The choice of a fit method determines the line or curve that SPSS superimposes on the scatterplot. Simple
analyses may require only a horizontal line to visually indicate the mean of all Y values. A linear fit
produces a least-squares regression line. Loess, quadratic, and cubic fits refer to curvilinear relationships.
Select the appropriate Fit Method from the options provided. Most analyses require a
linear fit. However, if you wish to investigate a possible curvilinear relationship, you
may wish to request a cubic, quadratic, or loess fit.
3. Click CLOSE.
Example 8.27 – Regression Line in SPSS
Figure 8.24 shows the scatterplot in Figure 8.22 with an added regression line, obtained by
requesting a linear fit within the Chart Editor window. As expected, the line has a positive
slope.
FIGURE 8.24 – SPSS SIMPLE SCATTERPLOT WITH REGRESSION LINE OUTPUT
The regression line indicates the general linear trend of points. This particular line is the one that SPSS
identifies as producing the smallest sum of squared residuals for all points on the scatterplot.
In this case, the points may fit a curvilinear path, particularly a cubic curve, slightly better
than they fit a linear path. Requesting a cubic fit in the Chart Editor window produces
Figure 8.26.
FIGURE 8.26 – SPSS SIMPLE SCATTERPLOT WITH CUBIC CURVE OUTPUT
The curve that appears in Figure 8.26 indicates the general cubic trend of points. This particular cubic curve
is the one SPSS identifies as producing the smallest sum of squared residuals for all points on the scatterplot.
This curve does, in fact, seem to fit the data better than Figure 8.24’s line does. The
researcher may, therefore, wish to characterize the relationship between the number of
students in a class and the number of hedgers used per hour by the teacher as curvilinear.
▄
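The comparison between a linear fit and a cubic fit can also be sketched outside SPSS. The following Python example is only an illustration, not the Chart Editor's own procedure: the function names and the eight data points are hypothetical and do not come from this chapter's examples. It fits a least-squares polynomial by solving the normal equations and then compares the sums of squared residuals for a line and a cubic curve.

```python
def polyfit_least_squares(xs, ys, degree):
    """Return [b0, b1, ..., b_degree] that minimize the sum of squared residuals."""
    n = degree + 1
    # Normal equations (X'X) b = X'y, where X has the columns 1, x, x^2, ...
    a = [[sum(x ** (i + j) for x in xs) for j in range(n)] for i in range(n)]
    b = [sum(y * x ** i for x, y in zip(xs, ys)) for i in range(n)]
    # Gaussian elimination with partial pivoting
    for col in range(n):
        pivot = max(range(col, n), key=lambda row: abs(a[row][col]))
        a[col], a[pivot] = a[pivot], a[col]
        b[col], b[pivot] = b[pivot], b[col]
        for row in range(col + 1, n):
            factor = a[row][col] / a[col][col]
            for k in range(col, n):
                a[row][k] -= factor * a[col][k]
            b[row] -= factor * b[col]
    # Back substitution
    coeffs = [0.0] * n
    for i in range(n - 1, -1, -1):
        known = sum(a[i][k] * coeffs[k] for k in range(i + 1, n))
        coeffs[i] = (b[i] - known) / a[i][i]
    return coeffs


def sum_squared_residuals(xs, ys, coeffs):
    """Sum of squared vertical distances between the points and the fitted curve."""
    def fitted(x):
        return sum(c * x ** power for power, c in enumerate(coeffs))
    return sum((y - fitted(x)) ** 2 for x, y in zip(xs, ys))


# Hypothetical (x, y) pairs used only for illustration
xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [2.1, 2.9, 3.2, 3.4, 3.9, 4.8, 6.5, 8.9]

linear = polyfit_least_squares(xs, ys, 1)
cubic = polyfit_least_squares(xs, ys, 3)
ssr_linear = sum_squared_residuals(xs, ys, linear)
ssr_cubic = sum_squared_residuals(xs, ys, cubic)
print("linear SSR:", round(ssr_linear, 3))
print("cubic SSR:", round(ssr_cubic, 3))
```

Because a line is a special case of a cubic curve (one with zero quadratic and cubic terms), the cubic fit can never produce a larger sum of squared residuals than the linear fit for the same points. A smaller value alone, therefore, does not establish that the relationship is truly curvilinear.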
Example 8.28 – 3-D Scatterplot in SPSS
A three-dimensional scatterplot can represent the two variables from Example 8.26 and
Example 8.27 along with the questions/hour variable used to demonstrate calculation of
the multiple correlation coefficient in Example 8.13. In Example 8.13, x corresponds to the
number of students in a particular class, y corresponds to the number of hedgers used per
hour by the teacher, and z corresponds to the number of student questions per hour.
Assigning these three variables to the appropriate axes in the 3-D Scatterplot window
produces the following scatterplot.
FIGURE 8.25 – SPSS 3-D SCATTERPLOT OUTPUT
Scales for the two independent variables appear along the X axis and the Y axis. The scale for the dependent
variable appears along the Z axis. The researcher, however, can assign the variables to the axes that suit his or
her purposes. Each point represents a particular subject’s scores for the two independent variables and the
dependent variable.
The points on this scatterplot seem to float in space. Actually, though, each point is situated
at the intersection of the planes representing the enrollment for a particular class, the
number of hedgers used per hour by the teacher of that class, and the number of questions
asked per hour by students in the class. ▄
You should know that methods of creating a scatterplot in SPSS other than the “Legacy
Dialogs” option exist. The “Chart Builder” function within the “Graphs” menu, for
instance, also leads you through steps that produce a scatterplot. With the Chart Builder,
you gain more control over the appearance and components of the scatterplot than
you have when using Legacy Dialogs. However, the process needed to use the Chart
Builder is a bit more complicated.
If you need to create a scatterplot that uses data points other than raw values, you may wish
to use a different approach. SPSS’s regression analysis function allows you to create such
scatterplots. By clicking on the “Plots” button in the Linear Regression window, you can
access a new window, entitled Linear Regression: Plots, which allows you to specify scales
based upon standardized values, residuals, and predicted values. This function generally
has the most value for somewhat advanced analyses.
Regression Analysis in SPSS
With the exception of the scatterplot itself, you can obtain all pairwise regression and
correlation values by using SPSS’s “Regression” function. Output from the following steps
includes regression equation coefficients, r, and r².
1. Select “Regression” from SPSS’s Analyze pull-down menu and then, assuming a linear
regression is desired, select the “Linear” option.
2. A window entitled Linear Regression should appear. A box in the upper left of the
window contains the names of all variables.
FIGURE 8.26 – SPSS LINEAR REGRESSION WINDOW
The user obtains regression values by identifying the independent variable(s) and the dependent variable
from those listed on the left side of the window. To do so, highlight the name of the variable and click on
the arrow next to the appropriate box.
Move the names of the independent and dependent variables to the properly labeled
boxes on the right. If the user moves the name of only one variable to the box labeled
“Independent Variable(s),” SPSS performs a bivariate regression analysis. If the names
of more than one variable are moved to the “Independent Variable(s)” box, SPSS
performs a multiple regression analysis.
3. Click OK.
Four output tables result. The first of these tables simply identifies the variables used for
the analysis. The other three tables provide the information that you need to assess the
relationship between the independent and dependent variables. You can find the
correlation coefficient and the coefficient of determination in the Model Summary table and
coefficients for the regression equation in the Coefficients table’s column “B.” SPSS refers
to the y-intercept as the constant and lists each slope next to its corresponding variable’s
name.
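For readers who want to see where the values in the Model Summary and Coefficients tables come from, the following Python sketch computes the same four quantities for a bivariate case with the standard least-squares formulas. It is an illustration rather than SPSS's own procedure. The function name and the five data points are hypothetical and do not come from this chapter's examples.

```python
from math import sqrt


def bivariate_regression(xs, ys):
    """Return the constant (y-intercept), slope, r, and r squared for paired scores."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of squares and cross products around the means
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    constant = mean_y - slope * mean_x
    r = sxy / sqrt(sxx * syy)
    return {"constant": constant, "slope": slope, "r": r, "r_squared": r ** 2}


# Hypothetical scores used only for illustration
x_scores = [1, 2, 3, 4, 5]
y_scores = [2, 4, 5, 4, 5]

result = bivariate_regression(x_scores, y_scores)
for name, value in result.items():
    print(name, round(value, 3))
```

The "constant" and "slope" entries correspond to the two rows of the Coefficients table's “B” column, and "r" and "r_squared" correspond to the R and R Square columns of the Model Summary table.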
The other table included in SPSS output provides ANOVA results. As explained in Section
8.6, some statisticians supplement regression and correlation analysis with an ANOVA.
Although a regression and correlation analysis addresses the trend in changes between
independent and dependent variable scores, it does not measure the sizes of differences
between scores on either factor. So, even if a trend exists, differences in dependent-variable
scores associated with changes in independent-variable scores may be so minuscule that
the trend becomes negligible. Those concerned about this issue may use an ANOVA to
determine whether significant differences exist between dependent-variable scores. In its
regression output, SPSS bases this ANOVA on the regression model itself. The test divides
the total sum of squares for the dependent variable into a portion explained by the
regression and a residual portion, and the F ratio compares the mean squares for these two
portions. A bivariate analysis, thus, has one regression degree of freedom, no matter how
many different independent-variable scores the sample contains. You can interpret the
significance value for this test just as you would interpret that of any ANOVA.
(Please see Chapter 7 for information about ANOVAs.)
Example 8.29 – SPSS Regression Output
To further understand how to locate and interpret relevant regression and correlation
coefficients, consider the four output tables as they apply to the bivariate situation used for
Example 8.26 and Example 8.27.
Variables Entered/Removed(b)
Model   Variables Entered   Variables Removed   Method
1       students(a)         .                   Enter
a. All requested variables entered.
b. Dependent Variable: hedgers
Model Summary
Model   R         R Square   Adjusted R Square   Std. Error of the Estimate
1       .703(a)   .494       .481                2.59045
a. Predictors: (Constant), students
ANOVA(b)
Model          Sum of Squares   df   Mean Square   F        Sig.
1 Regression   249.405          1    249.405       37.167   .000(a)
  Residual     254.995          38   6.710
  Total        504.400          39
a. Predictors: (Constant), students
b. Dependent Variable: hedgers
Coefficients(a)
               Unstandardized Coefficients   Standardized Coefficients
Model          B        Std. Error           Beta                        t       Sig.
1 (Constant)   1.017    .799                                             1.272   .211
  students     .101     .017                 .703                        6.096   .000
a. Dependent Variable: hedgers
TABLE 8.9, TABLE 8.10, TABLE 8.11, AND TABLE 8.12 – SPSS LINEAR REGRESSION OUTPUT
SPSS output for the linear regression command includes four tables. Table 8.9, entitled “Variables
Entered/Removed,” indicates the independent variables and footnotes the name of the dependent variable.
Table 8.10, Table 8.11, and Table 8.12 provide information about the changes in variable scores. The
correlation coefficient (r) and the coefficient of determination (r²), found in the Model Summary, indicate the
strength of the linear trend between the variables. The significance value in the ANOVA table, when compared
to a predetermined α, indicates whether changes in dependent-variable scores that accompany changes in
independent-variable scores are significant. Finally, the Coefficients table provides the y-intercept and the
slope for the regression equation.
The correlation coefficient of .703, from Table 8.10, suggests that the number of students in
a class and number of hedgers used per hour by the teacher have a strong (although barely
so) linear relationship. For those who do not wish to square the correlation coefficient
themselves, this table also includes the coefficient of determination, which indicates that
differences in the number of students in the class can explain 49.4% of differences in
teachers’ use of hedgers. Further, the ANOVA produces a p-value that SPSS displays as .000,
meaning that it rounds to zero at three decimal places and, thus, lies below all commonly
used α values. So, one could conclude that the number of hedgers used by teachers
per hour changes significantly with respect to the number of students in the class. The
regression equation helps to further describe this change. Using the regression equation of
y = 1.017 + .101x, obtained from the values in Table 8.12, one can predict the dependent-variable
score for each independent-variable score. Each x value substituted into the equation and the y
value that results provides an ordered pair that falls on the regression line. This process
produces a best guess for the number of hedgers used based upon class size. ▄
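The values in the four tables of Example 8.29 are related to one another, and a short calculation can confirm that they agree. The following Python sketch uses only numbers copied from those tables. The variable names, the helper function, and the class size of 30 students used for the prediction are hypothetical choices made for illustration.

```python
from math import sqrt

# Values copied from the ANOVA and Coefficients tables in Example 8.29
ss_regression = 249.405
ss_residual = 254.995
df_regression = 1
df_residual = 38
constant = 1.017  # y-intercept, "(Constant)" row of column B
slope = 0.101     # slope, "students" row of column B

ss_total = ss_regression + ss_residual
r_squared = ss_regression / ss_total   # R Square in the Model Summary
r = sqrt(r_squared)                    # positive root because the slope is positive
ms_residual = ss_residual / df_residual
f_ratio = (ss_regression / df_regression) / ms_residual
std_error_estimate = sqrt(ms_residual)
n = df_regression + df_residual + 1    # number of subjects
adjusted_r_squared = 1 - (1 - r_squared) * (n - 1) / df_residual


def predict_hedgers(students):
    """Best guess for hedgers per hour from y = 1.017 + .101x."""
    return constant + slope * students


print("r squared:", round(r_squared, 3))
print("r:", round(r, 3))
print("F:", round(f_ratio, 3))
print("adjusted r squared:", round(adjusted_r_squared, 3))
print("predicted hedgers for a class of 30:", round(predict_hedgers(30), 3))
```

In a bivariate analysis, the F ratio also equals the square of the slope's t value, apart from rounding in the displayed figures: 6.096 squared is approximately 37.16, which agrees with the reported F of 37.167.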
If you input more than one variable name into the Linear Regression window’s
“Independent Variable(s)” box, output looks similar to that shown in Example 8.29. In this
case, however, the Model Summary provides the multiple correlation coefficient and the
coefficient of multiple determination. Also, the “B” column in the Coefficients table includes
a slope for each independent variable.
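The two-predictor case can be sketched with the standard least-squares formulas for two independent variables. The following Python example is an illustration, not SPSS's own computation. The function name and all scores are hypothetical. The variable names only echo the students, questions, and hedgers measures used in this chapter, and the data are not those of Example 8.13.

```python
from math import sqrt


def two_predictor_regression(x1, x2, y):
    """Return (constant, slope for x1, slope for x2, R squared) for two predictors."""
    n = len(y)
    m1, m2, my = sum(x1) / n, sum(x2) / n, sum(y) / n
    # Sums of squares and cross products around the means
    s11 = sum((a - m1) ** 2 for a in x1)
    s22 = sum((b - m2) ** 2 for b in x2)
    s12 = sum((a - m1) * (b - m2) for a, b in zip(x1, x2))
    s1y = sum((a - m1) * (c - my) for a, c in zip(x1, y))
    s2y = sum((b - m2) * (c - my) for b, c in zip(x2, y))
    syy = sum((c - my) ** 2 for c in y)
    # Solve the two normal equations for the slopes
    det = s11 * s22 - s12 ** 2
    b1 = (s22 * s1y - s12 * s2y) / det
    b2 = (s11 * s2y - s12 * s1y) / det
    b0 = my - b1 * m1 - b2 * m2
    # Coefficient of multiple determination: explained sum of squares over total
    r_squared = (b1 * s1y + b2 * s2y) / syy
    return b0, b1, b2, r_squared


# Hypothetical scores used only for illustration
students = [12, 18, 25, 30, 36, 42]
questions = [9, 7, 8, 5, 6, 3]
hedgers = [2, 3, 3, 5, 4, 6]

b0, b1, b2, r_squared = two_predictor_regression(students, questions, hedgers)
multiple_r = sqrt(r_squared)  # multiple correlation coefficient
print("constant:", round(b0, 3))
print("slopes:", round(b1, 3), round(b2, 3))
print("multiple R:", round(multiple_r, 3), "R squared:", round(r_squared, 3))
```

The constant and the two slopes correspond to the three rows that would appear in the “B” column of the Coefficients table, and the last two values correspond to the R and R Square columns of the Model Summary.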
Correlation Matrices in SPSS
You may not always want to obtain all of the information provided by SPSS’s regression
analysis. In some situations, correlation coefficients, alone, suffice. The Correlate function
can not only provide these values without unneeded regression output, but can also display
coefficients for more than one pair of variables at a time and can compute partial
correlation coefficients. Coefficients appear in a correlation matrix. The following steps