SPSS Manual 07
Apr 07, 2018

    Using SPSS version 14

    Joel Elliott, Jennifer Burnaford, Stacey Weiss

SPSS is a program that is very easy to learn and is also very powerful. This manual is designed to introduce you to the program; however, it is not intended to cover every single aspect of SPSS.

There will be situations in which you need to use the SPSS Help Menu or Tutorial to learn how to perform tasks which are not detailed here. You should turn to those resources any time you have questions.

The following document provides some examples of common statistical tests used in Ecology. To decide which test to use, consult your class notes, your Statistical Roadmap, or the Statistics Coach (under the Help menu in SPSS).

Contents

Data entry
Descriptive statistics
Examining assumptions of parametric statistics
    Test for normality
    Test for homogeneity of variances
    Transformations
Comparative Statistics 1: Comparing means among groups
    Comparing two groups using parametric statistics
        Two-sample t-test
        Paired t-test
    Comparing two groups using non-parametric statistics
        Mann-Whitney U test
    Comparing three or more groups using parametric statistics
        One-way ANOVA and post-hoc tests
    Comparing three or more groups using non-parametric statistics
        Kruskal-Wallis test
    For studies with two independent variables
        Two-way ANOVA
        ANCOVA
Comparative Statistics 2: Comparing frequencies of events
    Chi-square Goodness of Fit
    Chi-square Test of Independence
Comparative Statistics 3: Relationships among continuous variables
    Correlation (no causation implied)
    Regression (causation implied)
Graphing your data
    Simple bar graph
    Clustered bar graph
    Box plot
    Scatter plot
Printing from SPSS


Data entry

Start SPSS. When the first box appears asking What would you like to do?, click the button for Type in data.

A spreadsheet will appear. The set-up here is similar to Excel, but at the bottom of the window you will notice two tabs. One is Data View. The other is Variable View. To enter your data, you will need to switch back and forth between these pages by clicking on the tabs.

Suppose you are part of a biodiversity survey group working in the Galapagos Islands and you are studying marine iguanas. After visiting a couple of islands you think that there may be higher densities of iguanas on Island A than on Island B. To examine this hypothesis, you decide to quantify the population densities of the iguanas on each island. You take 20 transects (100 m²) on each island (A and B), counting the number of iguanas in each transect. Your data are shown below.

    A 12 13 10 11 12 12 13 13 14 14 14 14 15 15 15 16 14 12 14 14

    B 15 13 16 10 9 24 13 18 14 16 15 19 14 16 17 15 17 22 15 16

First define the variables to be used. Go to the Variable View of the SPSS Data Editor window as shown below.

The first column (Name) is where you name your variables. For example, you might name one Location (you have 2 locations in your data set, Island A and Island B). You might name the other one Density (this is your response variable, the number of iguanas).

Other important columns are Type, Label, Values, and Measure.

o For now, we will keep Type as Numeric, but look to see what your options are. At some point in the future, you may need to use one of these options.

o The Label column is very helpful. Here, you can expand the description of your variable name. In the Name column you are restricted in the number and type of characters you can use. In the Label column, there are no such restrictions. Type in labels for your iguana data.

o In the Values column, you can assign numbers to represent the different locations (so Island A will be 1 and Island B will be 2). To do this, you need to assign Values to your categorical explanatory variable. Click on the cell in the Values column, and click on the button that shows up. A dialog box will appear as below. Type 1 in the value cell and A in the value label cell, and then hit Add. Type 2 in the value cell and B in the value label cell. Hit Add again. Then hit OK.


o In the Measure column, you can tell the computer what type of variables these are. In this example, island is a categorical variable. So in the Location row, go to the Measure column (the far right) and click on the cell. There are 3 choices for variable types. You want to pick Nominal. Iguana density is a continuous variable; since Scale (meaning continuous) is the default condition, you don't need to change anything.

Now switch to the Data View. You will see that your columns are now titled Location and Density.

To make the value labels appear in the spreadsheet, pull down the View menu and choose Value Labels. The labels will appear as you start to enter data.

You can now enter your data in the columns. Each row is a single observation. Since you have chosen View > Value Labels and entered your Location value labels in the Variable View window, when you type 1 in the Location column, the letter A will appear. After you've entered all the values for Island A, enter the ones from Island B below them. The top of your data table will eventually look like this:


Descriptive statistics

Once you have the data entered, you want to summarize the trends in the data. There are a variety of statistical measures for summarizing your data, and you want to explore your data by making tables and graphs. To help you do this you can use the Statistics Coach found under the Help menu in SPSS, or you can go directly to the Analyze menu and choose the appropriate tests.

    To get a quick view of what your data look like:

Pull down the Analyze menu and choose Descriptive Statistics, then Frequencies. A new window will appear. Put the Density variable in the box, then choose the statistics that you want to use to explore your data by clicking on the Statistics and Charts buttons at the bottom of the box (e.g., mean, median, mode, standard deviation, skewness, kurtosis). This will produce summary statistics for the whole data set. Your results will show up in a new window.

SPSS can also produce statistics and plots for each of the islands separately. To do this, you need to split the file. Pull down the Data menu and choose Split File. Click on Organize output by groups and then select the Island [Location] variable as shown below. Click OK.

Now, if you repeat the Analyze > Descriptive Statistics > Frequencies steps and hit OK again, your output will be similar to the following for each island.

Statistics(a)              Density
  N Valid                  20
  N Missing                0
  Mean                     13.3500
  Median                   14.0000
  Mode                     14.00
  Std. Deviation           1.49649
  Variance                 2.239
  Skewness                 -.463
  Std. Error of Skewness   .512
  Kurtosis                 -.045
  Std. Error of Kurtosis   .992
  Range                    6.00
  Minimum                  10.00
  Maximum                  16.00
a  Island = A

Statistics(b)              Density
  N Valid                  20
  N Missing                0
  Mean                     15.7000
  Median                   15.5000
  Mode                     15.00(a)
  Std. Deviation           3.46562
  Variance                 12.011
  Skewness                 .475
  Std. Error of Skewness   .512
  Kurtosis                 1.302
  Std. Error of Kurtosis   .992
  Range                    15.00
  Minimum                  9.00
  Maximum                  24.00
a  Multiple modes exist. The smallest value is shown
b  Island = B


[Histogram for Island A: Density (x-axis) vs. Frequency (y-axis), with fitted normal curve. Mean = 13.35, Std. Dev. = 1.49649, N = 20]

[Histogram for Island B: Density (x-axis) vs. Frequency (y-axis), with fitted normal curve. Mean = 15.70, Std. Dev. = 3.46562, N = 20]

From these summary statistics you can see that the mean density of iguanas on Island A is smaller than that on Island B. Also, the variation patterns of the data are different on the two islands, as shown by the frequency distributions of the data and their different dispersion parameters. In each histogram, the normal curve indicates the expected frequency curve for a normal distribution with the same mean and standard deviation as your data. The range of data values for Island A is lower, with a lower variance and kurtosis. Also, the distribution for Island A is skewed to the left whereas the data for Island B are skewed to the right.
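If you ever want to double-check the SPSS output outside the program, the same summary statistics can be computed with Python's standard statistics module. This is just a cross-checking sketch using the transect counts listed above, not part of the SPSS workflow:

```python
import statistics

# Iguana counts from the 20 transects on each island (from the data table above)
island_a = [12, 13, 10, 11, 12, 12, 13, 13, 14, 14,
            14, 14, 15, 15, 15, 16, 14, 12, 14, 14]
island_b = [15, 13, 16, 10, 9, 24, 13, 18, 14, 16,
            15, 19, 14, 16, 17, 15, 17, 22, 15, 16]

mean_a = statistics.mean(island_a)      # 13.35, as in the Statistics(a) table
median_a = statistics.median(island_a)  # 14.0
mode_a = statistics.mode(island_a)      # 14
sd_a = statistics.stdev(island_a)       # 1.49649 (sample SD, n-1 denominator)
var_a = statistics.variance(island_a)   # 2.239

mean_b = statistics.mean(island_b)      # 15.7, as in the Statistics(b) table
sd_b = statistics.stdev(island_b)       # 3.46562
```

Note that SPSS reports the sample (n−1) standard deviation and variance, which is exactly what stdev and variance compute.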

You could explore your data more by making box plots, stem-and-leaf plots, and error bar charts. Use the functions under the Analyze and Graphs menus to do this.

After getting an impression of what your data look like, you can now move on to determine whether there is a significant difference between the mean densities of iguanas on the two islands. To do this we have to use comparative statistics.

NOTE: Once you are done looking at your data for the two islands separately, you need to unsplit the data. Go to Data > Split File and select Analyze all cases, do not create groups.

Examining assumptions of parametric statistics

As you know, parametric tests have two main assumptions: 1) approximately normally distributed data, and 2) homogeneous variances among groups. Let's examine each of these assumptions.

Test for normality

Before you conduct any parametric tests, you need to check that the data values come from an approximately normal distribution. To do this, you can compare the frequency distribution of your data values with that of a normalized version of these values (see the Descriptive Statistics section above). If the data are approximately normal, then the distributions should be similar. From your initial descriptive data analysis you know that the distributions of data for Islands A and B did not appear to fit an expected normal distribution perfectly. However, to objectively determine whether the distribution varies significantly from a normal distribution, you have to conduct a normality test. This test will provide you with a statistic that determines whether your data are significantly different from normal. The null hypothesis is that the distribution of your data is NOT different from a normal distribution.

For the marine iguana example, you want to know if the data from the Island A population are normally distributed and if the data from Island B are normally distributed. Thus, your data must be split. (Data > Split File > Organize output by groups, split by Location.) Don't forget to unsplit when you are done!

To conduct a statistical test for normality on your split data, go to Analyze > Nonparametric Tests > 1-Sample K-S. In the window that appears, put the response variable (in this case, Density) into the box on the right. Click Normal in the Test Distribution check box below. Then click OK.

The output shows a Kolmogorov-Smirnov (K-S) table for the data from each island. Your p-value is in the last line of the table: Asymp. Sig. (2-tailed).

If p > 0.05 (i.e., there is a greater than 5% chance that your null hypothesis is true), you should conclude that the distribution of your data is not significantly different from a normal distribution.

If p < 0.05, the distribution of your data is significantly different from a normal distribution. In this example, the data from both islands are not significantly different from normal (p > 0.05). With a sample size of only N = 20, the data would have to be skewed much more or have some large outliers to vary significantly from normal. If your data are not normally distributed, you should try to transform the data to meet this important assumption. (See below.)
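If you are curious what the 1-Sample K-S procedure is doing, an equivalent calculation can be sketched outside SPSS with scipy. Like the classic SPSS K-S test, this version estimates the normal distribution's mean and SD from the sample itself (which makes the p-value somewhat liberal):

```python
from statistics import mean, stdev

from scipy import stats

# Island A transect counts (from the data table earlier in this manual)
island_a = [12, 13, 10, 11, 12, 12, 13, 13, 14, 14,
            14, 14, 15, 15, 15, 16, 14, 12, 14, 14]

# One-sample K-S test against a normal distribution whose mean and SD
# are estimated from the sample
d_stat, p_value = stats.kstest(island_a, "norm",
                               args=(mean(island_a), stdev(island_a)))
# p > 0.05 here: the Island A counts are not significantly non-normal
```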

Test for homogeneity of variances

Another assumption of parametric tests is that the groups you are comparing have relatively similar variances. Most of the comparative tests in SPSS will do this test for you as part of the analysis. For example, when you run a t-test, the output will include columns labeled Levene's Test for Equality of Variances. The p-value is labeled Sig. and will tell you whether or not your data meet the assumption of parametric statistics.

If the variances are not homogeneous, then you must either transform your data (e.g., using a log transformation) to see if you can equalize the variances, or use a nonparametric comparison test that does not require this assumption.
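Levene's test is itself just a one-way ANOVA on each value's absolute deviation from its group mean, and it can be run directly in scipy if you want to verify the Sig. value SPSS prints. A sketch on the iguana densities (center="mean" selects the classic Levene form that SPSS reports; scipy's default, center="median", is the Brown-Forsythe variant):

```python
from scipy import stats

# Iguana counts from the 20 transects on each island
island_a = [12, 13, 10, 11, 12, 12, 13, 13, 14, 14,
            14, 14, 15, 15, 15, 16, 14, 12, 14, 14]
island_b = [15, 13, 16, 10, 9, 24, 13, 18, 14, 16,
            15, 19, 14, 16, 17, 15, 17, 22, 15, 16]

f_stat, p_value = stats.levene(island_a, island_b, center="mean")
# p < 0.05 for these data: the raw densities fail the homogeneity assumption
```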

Transformations

If your data do not meet one or both of the above assumptions of parametric statistics, you may be able to transform the data so that they do. You can use a variety of transformations to try to make the variances of the different groups equal or to normalize the data. If the transformed data meet the assumptions of parametric statistics, you may proceed by running the appropriate test on the transformed data. If, after a number of attempts, the transformed data do not meet the assumptions of parametric statistics, you must run a non-parametric test.

    If the variances were not homogeneous, look at how the variances change with the mean. The usualcase is that larger means have larger variances. If this is the case, a transformation such as commonlog, natural log or square root often makes the variances homogeneous.

    Whenever your data are percents (e.g., % cover) they will generally not be normally distributed. Tomake percent data normal, you should do an arcsine-square root transformation of the percent data(percents/100).

    To transform your data:

Go to Transform > Compute. You will get the Compute Variable window.

In the Target Variable box, name your new transformed variable (for example, Log_Density).

There are 3 ways you can transform your data: 1) using the calculator, 2) choosing functions from the lists on the right, or 3) typing the transformation in the Numeric Expression box.

For this example: In the Function Group box on the right, highlight Arithmetic by clicking on it once. Various functions will show up in the Functions and Special Variables box below. Choose the LG10 function. Double-click on it.

In the Numeric Expression box, it will now say LG10[?]. Double-click on the name of the variable you want to transform (e.g., Density) in the box on the lower left to make Density replace the ?.

Click OK. SPSS will create a new column in your data sheet that has the log values of the iguana densities.

NOTE: you might want to do a transformation such as LN(x + 1). Follow the directions as above but choose LN instead of LG10 from the Functions and Special Variables box. Move your variable into the parentheses to replace the ?. Then type +1 after your variable so it reads, for example, LN[Density+1].

NOTE: for the arcsine-square root transformation, the composite function to be put into the Numeric Expression box would look like: arcsin(sqrt(percent data/100)).
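Outside SPSS, each of these transformations is a one-liner; a sketch with Python's math module (the 25% cover value is a made-up number, just to show the arcsine-square root call):

```python
import math

density = [12, 13, 10, 11]  # first few Island A counts, for illustration

log_density = [math.log10(x) for x in density]   # LG10(Density)
ln_density = [math.log(x + 1) for x in density]  # LN(Density + 1)

def arcsine_sqrt(percent):
    """Arcsine-square root transform for percent data: arcsin(sqrt(p/100))."""
    return math.asin(math.sqrt(percent / 100))

transformed = arcsine_sqrt(25.0)  # hypothetical 25% cover value
```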


After you transform your data, redo the tests of normality and homogeneity of variances to see if the transformed data now meet the assumptions of parametric statistics.

Again, if your data now meet the assumptions of the parametric test, conduct a parametric statistical test using the transformed data. If the transformed data still do not meet the assumptions, you can do a nonparametric test instead, such as a Mann-Whitney U test on the original data. This test is described later in this handout.

Two-sample t-test

This test compares the means from two groups, such as the density data for the two different iguana populations. To run a two-sample t-test on the data:

First, be sure that your data are unsplit. (Data > Split File > Analyze all cases, do not create groups.)

Then, go to Analyze > Compare Means > Independent Samples T-test.

Put the Density variable in the Test Variable(s) box and the Location variable in the Grouping Variable box as shown below.

Now, click on the Define Groups button and put the names of the groups in each box as shown below. Then click Continue and OK.


The output consists of two tables.

Group Statistics
            Island   N    Mean      Std. Deviation   Std. Error Mean
  Density   A        20   13.3500   1.49649          .33462
            B        20   15.7000   3.46562          .77494

Independent Samples Test (Density)
                                Levene's Test for       t-test for Equality of Means
                                Equality of Variances
                                F       Sig.    t        df       Sig.         Mean         Std. Error   95% CI of the Difference
                                                                  (2-tailed)   Difference   Difference   Lower       Upper
  Equal variances assumed       4.234   .047    -2.784   38       .008         -2.35000     .84410       -4.05879    -.64121
  Equal variances not assumed                   -2.784   25.847   .010         -2.35000     .84410       -4.08557    -.61443

The first table shows the means and variances of the two groups. The second table shows the results of Levene's Test for Equality of Variances, the t-value of the t-test, the degrees of freedom of the test, and the p-value, which is labeled Sig. (2-tailed).
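Both rows of the Independent Samples Test table can be reproduced outside SPSS with scipy, which may help when you want to sanity-check your output. The equal_var flag switches between the "equal variances assumed" row and the Welch ("not assumed") row:

```python
from scipy import stats

# Iguana counts from the 20 transects on each island
island_a = [12, 13, 10, 11, 12, 12, 13, 13, 14, 14,
            14, 14, 15, 15, 15, 16, 14, 12, 14, 14]
island_b = [15, 13, 16, 10, 9, 24, 13, 18, 14, 16,
            15, 19, 14, 16, 17, 15, 17, 22, 15, 16]

# "Equal variances assumed" row: pooled-variance t-test
t_pooled, p_pooled = stats.ttest_ind(island_a, island_b, equal_var=True)

# "Equal variances not assumed" row: Welch's t-test
t_welch, p_welch = stats.ttest_ind(island_a, island_b, equal_var=False)
# t = -2.784 in both rows for these data; p = .008 and .010 respectively
```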

Before you look at the results of the t-test, you need to make sure your data fit the assumption of homogeneity of variances. Look at the columns labeled Levene's Test for Equality of Variances. The p-value is labeled Sig.

In this example the data fail Levene's Test for Equality of Variances, so the data will have to be transformed in order to see if we can get them to meet this assumption of the t-test. If you log-transformed the data and re-ran the test, you'd get the following output.

Group Statistics
                Island   N    Mean     Std. Deviation   Std. Error Mean
  Log_Density   A        20   1.1228   .05052           .01130
                B        20   1.1856   .09817           .02195

Independent Samples Test (Log_Density)
                                Levene's Test for       t-test for Equality of Means
                                Equality of Variances
                                F       Sig.    t        df       Sig.         Mean         Std. Error   95% CI of the Difference
                                                                  (2-tailed)   Difference   Difference   Lower       Upper
  Equal variances assumed       2.642   .112    -2.547   38       .015         -.06288      .02469       -.11286     -.01290
  Equal variances not assumed                   -2.547   28.404   .017         -.06288      .02469       -.11342     -.01234

Now the variances of the two groups are not significantly different from each other (p = 0.112), and you can focus on the results of the t-test. For the t-test, p = 0.015, which is < 0.05, so the mean densities of iguanas on the two islands are significantly different from each other.

WHAT TO REPORT: Following a statement that describes the patterns in the data, you should parenthetically report the t-value, df, and p. For example: Iguanas are significantly more dense on Island B than on Island A (t = 2.5, df = 38, p = 0.015).


Paired t-test

The first table shows the summary statistics for the 2 groups. The second table shows information that you can ignore. The third table, the Paired Samples Test table, is the one you want. It shows the mean difference between samples in a pair, the variation of the differences around the mean, your t-value, your df, and your p-value (labeled as Sig. (2-tailed)). In this case, the p-value reads 0.000, which means that it is very low: it is smaller than the program will show in the default 3 decimal places. You can express this in your results section as p < 0.001.


Mann-Whitney U test

The output consists of two tables. The first table shows the parameters used in the calculation of the test. The second table shows the statistical significance of the test. The value of the U statistic is given in the 1st row (Mann-Whitney U). The p-value is labeled as Asymp. Sig. (2-tailed).

Ranks
            Island   N    Mean Rank   Sum of Ranks
  Density   A        20   15.08       301.50
            B        20   25.93       518.50
            Total    40

Test Statistics(b)
                                   Density
  Mann-Whitney U                   91.500
  Wilcoxon W                       301.500
  Z                                -2.967
  Asymp. Sig. (2-tailed)           .003
  Exact Sig. [2*(1-tailed Sig.)]   .003(a)
a  Not corrected for ties.
b  Grouping Variable: Island

In the table above (for the marine iguana data), the p-value = 0.003, which means that the densities of iguanas on the two islands are significantly different from each other (p < 0.05). So, again this statistical test provides strong support for your original hypothesis that the iguana densities are significantly different between the islands.
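The same U and p-value can be checked outside SPSS with scipy. One wrinkle: scipy reports the U statistic for the first group, while SPSS prints the smaller of the two possible U values, so the sketch below converts between them:

```python
from scipy import stats

# Iguana counts from the 20 transects on each island
island_a = [12, 13, 10, 11, 12, 12, 13, 13, 14, 14,
            14, 14, 15, 15, 15, 16, 14, 12, 14, 14]
island_b = [15, 13, 16, 10, 9, 24, 13, 18, 14, 16,
            15, 19, 14, 16, 17, 15, 17, 22, 15, 16]

u_stat, p_value = stats.mannwhitneyu(island_a, island_b,
                                     alternative="two-sided")

# SPSS reports min(U1, U2); the two values always sum to n1*n2
n1, n2 = len(island_a), len(island_b)
u_spss = min(u_stat, n1 * n2 - u_stat)  # 91.5, matching the SPSS table
```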

WHAT TO REPORT: Following a statement that describes the patterns in the data, you should parenthetically report the U-value, df, and p. For example: Iguanas are significantly more dense on Island B than on Island A (U = 91.5, df = 39, p = 0.003).

One-way ANOVA and post-hoc tests

A one-way ANOVA compares the means of three or more groups. A significant result indicates that there is a significant difference among the means of the groups, but it does not tell you which means are different from each other. In order to find out which means are significantly different from each other, you have to conduct post-hoc paired comparisons. They are called post-hoc because you conduct the tests after you have completed an ANOVA and it has shown that significant differences lie somewhere among the groups. One of the post-hoc tests is the Fisher PLSD (Protected Least Significant Difference) test, which gives you a test of all pairwise combinations.

To run the ANOVA test: Go to Analyze > Compare Means > One-way ANOVA.

In the dialog box, put the Density variable in the Dependent List box and the Location variable in the Factor box.

Click on the Post Hoc button, click on the LSD check box, and then click Continue.

Click on the Options button and check 2 boxes: Descriptive and Homogeneity of variance test. Then click Continue and then OK.

The output will include four tables: descriptive statistics, the results of the Levene test, the results of the ANOVA, and the results of the post-hoc tests.

    The first table gives you some basic descriptive statistics for the three islands.

Descriptives (Density)
          N    Mean      Std. Deviation   Std. Error   95% CI for Mean             Minimum   Maximum
                                                       Lower Bound   Upper Bound
  A       20   13.3500   1.49649          .33462       12.6496       14.0504       10.00     16.00
  B       20   15.7000   3.46562          .77494       14.0780       17.3220        9.00     24.00
  C       16   13.3125   1.62147          .40537       12.4485       14.1765       10.00     16.00
  Total   56   14.1786   2.63616          .35227       13.4726       14.8845        9.00     24.00

The second table gives you the results of the Levene Test (which examines the assumption of homogeneity of variances). You must assess the results of this test before looking at the results of your ANOVA.

Test of Homogeneity of Variances (Density)
  Levene Statistic   df1   df2   Sig.
  3.237              2     53    .047

In this case, your variances are not homogeneous (p = 0.047, which is < 0.05), so the data must be transformed and the Levene test re-run. For the log-transformed densities the variances are homogeneous (p > 0.05), and you can continue with the assessment of the ANOVA.

    The third table gives you the results of the ANOVA test, which examined whether there wereany significant differences in mean density among the three island populations of marineiguanas.

ANOVA (Log_Density)
                   Sum of Squares   df   Mean Square   F       Sig.
  Between Groups   .052             2    .026          4.989   .010
  Within Groups    .277             53   .005
  Total            .329             55

Look at the p-value in the ANOVA table (Sig.). If this p-value is > 0.05, then there are no significant differences among any of the means. If the p-value is < 0.05, then at least one mean is significantly different from the others. In this example, p = 0.01 in the ANOVA table, and thus p < 0.05, so the mean densities are significantly different. Now that you know the means are different, you want to find out which pairs of means are different from each other, e.g., is the density on Island A greater than on B? Is it greater than on C? How do B and C compare with each other?

The Post Hoc test, Fisher LSD (Least Significant Difference), allows you to examine all pairwise comparisons of means. The results are listed in the fourth table. Which groups are and are not significantly different from each other? Look at the Sig. column for each comparison. B is different from both A and C, but A and C are not different from each other.
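The F-ratio in an ANOVA table is just the between-group mean square divided by the within-group mean square, which you can verify outside SPSS with scipy. The three groups below are small made-up numbers (not the iguana data), chosen so the sums of squares are easy to check by hand:

```python
from scipy import stats

# Hypothetical measurements for three groups (illustration only)
group_1 = [1, 2, 3]
group_2 = [2, 3, 4]
group_3 = [7, 8, 9]

f_stat, p_value = stats.f_oneway(group_1, group_2, group_3)
# By hand: SS_between = 62.0 (df = 2) and SS_within = 6.0 (df = 6),
# so F = (62.0 / 2) / (6.0 / 6) = 31.0
```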


Multiple Comparisons
Dependent Variable: Log_Density (LSD)
  (I) Island   (J) Island   Mean Difference (I-J)   Std. Error   Sig.   95% CI Lower   95% CI Upper
  A            B            -.06288*                .02284       .008   -.1087         -.0171
  A            C             .00166                 .02423       .946   -.0469          .0503
  B            A             .06288*                .02284       .008    .0171          .1087
  B            C             .06453*                .02423       .010    .0159          .1131
  C            A            -.00166                 .02423       .946   -.0503          .0469
  C            B            -.06453*                .02423       .010   -.1131         -.0159
*  The mean difference is significant at the .05 level.

WHAT TO REPORT: Following a statement that describes the general patterns in the data, you should parenthetically report the F-value, df, and p from the ANOVA. Following statements that describe the differences between specific groups, you should report the p-value from the post-hoc test only. (NOTE: there is no F-value or df associated with the post-hoc tests, only a p-value!) For example: Iguana density varies significantly across the three islands (F = 5.0, df = 2,53, p = 0.01). Iguana populations on Island B are significantly more dense than on Island A (p < 0.01) and on Island C (p = 0.01), whereas densities on Islands A and C do not differ significantly (p > 0.90).

Kruskal-Wallis test

Just as the Mann-Whitney U test is a non-parametric version of the t-test, the Kruskal-Wallis test is the non-parametric version of an ANOVA. The test is used when you want to compare three or more groups of data, and those data do not fit the assumptions of parametric statistics even after attempting standard transformations. Remind yourself of the assumptions of parametric statistics and the downside of using non-parametric statistics by reviewing the information earlier in this handout.

    To run the Kruskal-Wallis test:

Go to Analyze > Nonparametric Tests > K Independent Samples.

Note: Remember that for the Mann-Whitney U test, you went to Nonparametric Tests > 2 Independent Samples. Now you have more than 2 groups, so you go to K Independent Samples instead, where K just stands in for any number of groups greater than 2.

Put your variables in the appropriate boxes, define your groups, and be sure the Kruskal-Wallis box is checked in the Test Type box. Click OK.


The output consists of two tables. The first table shows the parameters used in the calculation of the test. The second table shows you the statistical results of the test. As you will see, the test statistic that gets calculated is a chi-square value, and it is reported in the first row of the second table. The p-value is labeled as Asymp. Sig. (2-tailed).

Ranks
            Location   N    Mean Rank
  density   A          20   23.15
            B          20   38.20
            C          16   23.06
            Total      56

Test Statistics(a,b)
                density
  Chi-Square    11.279
  df            2
  Asymp. Sig.   .004
a  Kruskal Wallis Test
b  Grouping Variable: Location

In the table above, the p-value = 0.004, which means that the densities on the three islands are significantly different from each other (p < 0.01). So, this test also supports the hypothesis that iguana densities differ among islands. We do not yet know which islands are different from which other ones.

Unlike an ANOVA, a Kruskal-Wallis test does not have an easy way to do post-hoc analyses. So, if you have a significant effect for the overall Kruskal-Wallis, you can follow that up with a series of two-group comparisons using Mann-Whitney U tests. In this case, we would follow up the Kruskal-Wallis with three Mann-Whitney U tests: Island A vs. Island B, Island B vs. Island C, and Island C vs. Island A.
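As with the ANOVA, the Kruskal-Wallis chi-square value can be verified outside SPSS with scipy, and the pairwise follow-ups use the same mannwhitneyu call described earlier. The groups here are again made-up numbers for illustration, not the iguana data:

```python
from scipy import stats

# Hypothetical measurements for three groups (illustration only)
group_1 = [1, 2, 3]
group_2 = [4, 5, 6]
group_3 = [7, 8, 9]

h_stat, p_value = stats.kruskal(group_1, group_2, group_3)
# Ranks 1-3, 4-6, and 7-9 give H = 7.2 on 2 df

# If the overall test is significant, follow up with pairwise
# Mann-Whitney U tests, e.g. group 1 vs. group 2:
u_12, p_12 = stats.mannwhitneyu(group_1, group_2, alternative="two-sided")
```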

WHAT TO REPORT: Following a statement that describes the general patterns in the data, you should parenthetically report the chi-square value, df, and p. For example: Iguana density varied significantly across the three islands (χ² = 11.3, df = 2, p = 0.004).

For studies with two independent variables

In many studies, researchers are interested in examining the effects of more than one independent variable (i.e., factor) on a given dependent variable. For example, say you want to know whether the bill size of finches differs between males and females of two different species. In this example, you have two factors (Species and Sex) and both are categorical. They can be examined simultaneously in a two-way ANOVA, a parametric statistical test. The two-way ANOVA will also tell you whether the two factors have joint effects on the dependent variable (bill size), or whether they act independently of each other (i.e., does bill size depend on sex in one species but not in the other species?).

What if we wanted to know, for a single species, how sex and body size affect bill size? We still have two factors, but now one of the factors is categorical (Sex) and one is continuous (Body Size). In this case, we need to use an ANCOVA, an analysis of covariance.

Both tests require that the data are normally distributed and that all of the groups have homogeneous variances, so you need to check these assumptions first. If you want to compare means from two (or more) grouping variables simultaneously, as ANOVA and ANCOVA do, there is no satisfactory non-parametric alternative. So you may need to transform your data.

Enter the data as shown to the right: the two factors (Species and Sex) are put in two separate columns. The dependent variable (Bill length) is entered in a third column.

Before you run a two-way ANOVA, you might want to first run a t-test on bill size just between species, then a t-test on bill size just between sexes. Note the results. Do you think these results accurately represent the data? This exercise will show you how useful a two-way ANOVA can be in telling you more about the patterns in your data.

Now run a two-way ANOVA on the same data. The procedure is much the same as for a one-way ANOVA, with one added step to include the second variable in the analysis.

Go to Analyze > General Linear Model > Univariate. A dialog box appears as below.

Your dependent variable goes in the Dependent Variable box.

    Your explanatory variables are Fixed Factors

Now click Options. A new window will appear. Click on the check boxes for Descriptive Statistics and Homogeneity tests, then click Continue.

Click OK. The output will consist of three tables, which show the descriptive statistics, the results of Levene's test, and the results of the 2-way ANOVA.

From the descriptive statistics, it appears that the means may be different between the sexes and also different between species.

Descriptive Statistics
Dependent Variable: Bill size
  Sex      Species     Mean    Std. Deviation   N
  Female   Species A   17.60   1.140            5
           Species B   23.00   1.581            5
           Total       20.30   3.129            10
  Male     Species A   26.60   2.074            5
           Species B   16.60   2.074            5
           Total       21.60   5.621            10
  Total    Species A   22.10   4.999            10
           Species B   19.80   3.795            10
           Total       20.95   4.478            20

From this second table, you know that your data meet the assumption of homogeneity of variance. So, you are all clear to interpret the results of your 2-way ANOVA.

Levene's Test of Equality of Error Variances(a)
Dependent Variable: Bill size
  F       df1   df2   Sig.
  1.193   3     16    .344
Tests the null hypothesis that the error variance of the dependent variable is equal across groups.
a  Design: Intercept + Sex + Species + Sex * Species


The ANOVA table shows the statistical significance of the differences among the means for each of the independent variables (i.e., factors or main effects; here, they are Sex and Species) and the interaction between the two factors (i.e., Sex * Species). Let's walk through how to interpret this information.

Tests of Between-Subjects Effects
Dependent Variable: Bill size
  Source            Type III Sum of Squares   df   Mean Square   F          Sig.
  Corrected Model   331.350(a)                3    110.450       35.629     .000
  Intercept         8778.050                  1    8778.050      2831.629   .000
  Sex               8.450                     1    8.450         2.726      .118
  Species           26.450                    1    26.450        8.532      .010
  Sex * Species     296.450                   1    296.450       95.629     .000
  Error             49.600                    16   3.100
  Total             9159.000                  20
  Corrected Total   380.950                   19
a  R Squared = .870 (Adjusted R Squared = .845)

Always look at the interaction term FIRST. The p-value of the interaction term tells you whether the two factors act independently of each other, or whether different combinations of the variables have different effects. In this bill-size example, the interaction term shows a significant sex * species interaction (p < 0.001). This means that the effect of sex on bill size differs between the two species. Simply looking at sex or species on their own won't tell you anything.

To get a better idea of what the interaction term means, make a Bar Chart with error bars. See the graphing section of the manual for instructions on how to do this.

If you look at the data, the interaction should become apparent. In Species A, bills are larger in males than in females, but in Species B, bills are larger in females than in males. So simply looking at sex doesn't tell us anything (as you saw when you did the t-test), and neither sex has a consistently larger bill when considered across both species.

The main effects terms in a 2-way ANOVA basically ignore the interaction term and give similar results to the t-tests you may have performed earlier. The p-value associated with each independent variable (i.e., factor or main effect) tells you the probability that the means of the different groups of that variable are the same. So, if p < 0.05, the groups of that variable are significantly different from each other. In this case, it tests whether males and females are different from each other, disregarding the fact that we have males and females from two different species in our data set. And it tests whether the two species are different from each other, disregarding the fact that we have males and females from each species in our data set.
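SPSS computes all of this for you, but if you want to see where the numbers in the ANOVA table come from, the sums of squares can be reconstructed from the cell means and standard deviations in the descriptive statistics. Here is a pure-Python sketch (not an SPSS feature); it assumes a balanced design with equal n per cell, as in this example:

```python
# Reconstructing the two-way ANOVA sums of squares from the cell means
# and standard deviations in the Descriptive Statistics table above.
# Assumes a balanced design (equal n per cell), as in this example.

# (sex, species) -> (mean, std deviation, n)
cells = {
    ("Female", "A"): (17.60, 1.140, 5),
    ("Female", "B"): (23.00, 1.581, 5),
    ("Male",   "A"): (26.60, 2.074, 5),
    ("Male",   "B"): (16.60, 2.074, 5),
}

N = sum(n for _, _, n in cells.values())
grand = sum(m * n for m, _, n in cells.values()) / N

def marginal(index, level):
    """Mean and total n for one level of one factor (0 = sex, 1 = species)."""
    sub = [(m, n) for key, (m, _, n) in cells.items() if key[index] == level]
    n_tot = sum(n for _, n in sub)
    return sum(m * n for m, n in sub) / n_tot, n_tot

# Main effects: n-weighted squared deviations of the marginal means.
ss_sex = sum(n * (m - grand) ** 2
             for m, n in (marginal(0, lv) for lv in ("Female", "Male")))
ss_species = sum(n * (m - grand) ** 2
                 for m, n in (marginal(1, lv) for lv in ("A", "B")))

# Between-cells SS; the interaction is what the main effects leave over.
ss_cells = sum(n * (m - grand) ** 2 for m, _, n in cells.values())
ss_inter = ss_cells - ss_sex - ss_species

# Error SS pools the within-cell variation: sum of (n - 1) * sd^2.
ss_error = sum((n - 1) * sd ** 2 for _, sd, n in cells.values())
ms_error = ss_error / (N - len(cells))     # df_error = 20 - 4 = 16

f_inter = (ss_inter / 1) / ms_error        # interaction has df = 1
print(round(ss_sex, 2), round(ss_species, 2), round(ss_inter, 2))
print(round(f_inter, 1))   # matches the Sex * Species row (F ~ 95.6)
```

The reconstructed sums of squares (8.45, 26.45, 296.45) match the table above; the error SS differs only in the last decimal because the printed standard deviations are rounded.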


The two-way ANOVA found that species was significant if you ignore the interaction. This suggests that Species A has larger bills overall, mainly because of the large size of the males of Species A, but does not always have larger bills, because bill size also depends on gender.

WHAT TO REPORT:

If there is a significant interaction term, the significance of the main effects cannot be fully accepted because of differences in the trends among different combinations of the variables. Thus, you only need to tell your reader about the interaction term of the ANOVA table. Describe the pattern and parenthetically report the appropriate F-value, df, and p. For example: The way that sex affected bill size was different for the two different species (F=95.6, df=1,16, p<0.001).


(under Factors & Covariates) and click the arrow button. That variable should now show up on the right side (under Model). Do the same with the second factor. Now, highlight the two factors on the right simultaneously and click the arrow, making sure the option is set to interaction. In the end, your Model pop-up window should look something like the image below:

Click Continue and then click OK. The output will consist of four tables which show the categorical (between-subjects) variable groupings, some descriptive statistics, the results of the Levene's test, and the results of the ANCOVA.

    From the first and second table, it appears that males and females have similarly sized bills.

Between-Subjects Factors

            Value Label   N
sex   1.00  male          8
      2.00  female        8

Descriptive Statistics
Dependent Variable: bill_size

sex      Mean      Std. Deviation    N
male     21.2625   1.70791           8
female   21.6500   2.24817           8
Total    21.4563   1.93906          16

From the third table, you know that the data meet the assumption of homogeneity of variance. So, you are clear to interpret the results of the ANCOVA (assuming your data are normal).

Levene's Test of Equality of Error Variances(a)
Dependent Variable: bill_size

F      df1   df2   Sig.
.237     1    14   .634

Tests the null hypothesis that the error variance of the dependent variable is equal across groups.
a. Design: Intercept+sex+body_size+sex * body_size

The ANCOVA results are shown in an ANOVA table which is interpreted similarly to the table from the two-way ANOVA. The statistical results regarding the two independent variables and the interaction between them (i.e., sex * body_size) are shown on three separate rows of the table below.

Tests of Between-Subjects Effects
Dependent Variable: bill_size

Source             Type III Sum of Squares   df   Mean Square      F      Sig.
Corrected Model        48.612(a)              3      16.204      24.970   .000
Intercept              10.555                 1      10.555      16.265   .002
sex                      .278                 1        .278        .428   .525
body_size              44.322                 1      44.322      68.299   .000
sex * body_size          .141                 1        .141        .217   .649
Error                   7.787                12        .649
Total                7422.330                16
Corrected Total        56.399                15

a. R Squared = .862 (Adjusted R Squared = .827)

As with the 2-way ANOVA, you must interpret the interaction term FIRST. In this example, the interaction term shows up on the ANOVA table as a row labeled sex * body_size, and it tells you whether or not the way that body size affects bill size is the same for males as it is for females. The null hypothesis is that body size affects bill size in the same way for each of the two sexes. In other words, the null hypothesis is that the two factors (body size and sex) do not interact in the way they affect bill size.

Here, you can see that the interaction term is not significant (p=0.649). Therefore, you can go on to interpret the two factors independently. You can see that there is no effect of Sex on bill size (p=0.525). And, you can see that there is an effect of Body Size on bill size (p < 0.001).
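As a side note on where the F values come from: each F in the table is simply the mean square for that term divided by the error mean square. A small Python sketch of this arithmetic, using the rounded values printed in the table (so the last decimal can differ slightly from SPSS's output, which uses unrounded numbers):

```python
# Where the F values in the ANCOVA table come from: each F is the mean
# square for that term divided by the error mean square. Values are the
# rounded ones from the table above, so the last decimal may differ
# slightly from SPSS's printed F's.
ms_error = 0.649

mean_squares = {
    "sex": 0.278,
    "body_size": 44.322,
    "sex * body_size": 0.141,
}

f_values = {term: ms / ms_error for term, ms in mean_squares.items()}
for term, f in f_values.items():
    print(term, round(f, 3))
```

Running this reproduces F = 0.428 for sex, about 68.3 for body_size, and 0.217 for the interaction, matching the table.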


From the figure you can see 1) that the way that body size affects bill size is the same for males as it is for females (i.e., there is no interaction between the two factors), 2) that males and females do not differ in their mean bill size (there is clear overlap in the distributions of male and female bill sizes), and 3) that body size and bill size are related to each other (as body size increases, bill size also increases).

    WHAT TO REPORT:

If there is a significant interaction term, the significance of the main effects cannot be fully accepted because of differences in the trends among different combinations of the variables. Thus, you only need to tell your reader about the interaction term from the ANOVA table. Describe the pattern and parenthetically report the appropriate F-value, df, and p. For example: The way that prey size affected energy intake rate was different for large and small fish (F=95.6, df=1,16, p<0.001). If the interaction term is not significant, report both main effects and the interaction. For example: Males and females do not differ in bill size (F=0.43, df=1,12, p>0.50), and for both sexes, bill size increases as body size increases (F=68.3, df=1,12, p<0.001). The interaction between sex and body size was not significant (F=0.22, df=1,12, p>0.60).

Chi Square Goodness of Fit

This test allows you to compare observed to expected values within a single group of test subjects. For example: Are guppies more likely to be found in predator or non-predator areas?

You are interested in whether predators influence guppy behavior. So you put guppies in a tank that is divided into a predator-free refuge and an area with predators. The guppies can move between the two sides, but the predators cannot. You count how many guppies were in the predator area and in the refuge after 5 minutes.

Here are your data:

number of guppies
  in predator area    4
  in refuge          16

    Your null hypothesis for this test is that guppies are evenly distributed between the 2 areas.

    To perform the Chi-Square Goodness of fit test:


Open a new data file in SPSS.

In Variable View, name the first variable Location. In the Measure column, choose Ordinal. Assign 2 values: one for Predator Area and one for Refuge. Then create a second variable called Guppies. In the Measure column, choose Scale.

    In Data View, enter the observed number of guppies in the 2 areas.

Go to Data → Weight Cases. In the window that pops up, click on Weight Cases by and select Guppies. Hit OK.

Go to Analyze → Nonparametric Tests → Chi-square.

    Your test variable is Location.

Under Expected Values click on Values. Enter the expected value for the refuge area first, hit Add, then enter the expected value for the predator area and hit Add. Hit OK.

In the Location table, check the values to make sure the test did what you thought it was going to do. Are the observed and expected numbers for the 2 categories correct?

    Your Chi-Square value, df, and p-value are displayed in the Test Statistics Table.

NOTE: Once you are done with this analysis, you will likely want to stop weighting cases. Go to Data → Weight Cases and select Do not weight cases.

WHAT TO REPORT: You want to report the χ2 value, df, and p, parenthetically, following a statement that describes the patterns in the data.
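If you want to check SPSS's arithmetic, the goodness-of-fit statistic is easy to compute by hand. A short Python sketch (not part of SPSS) using the guppy counts from this example:

```python
# A by-hand check of the goodness-of-fit test for the guppy counts.
# Under the null hypothesis of an even split, half of the 20 guppies
# (i.e., 10) are expected in each area.
observed = {"predator area": 4, "refuge": 16}

total = sum(observed.values())
expected = total / len(observed)   # 10 per area under the null

chi_sq = sum((obs - expected) ** 2 / expected for obs in observed.values())
df = len(observed) - 1

print(chi_sq, df)
# chi-square = 7.2 with df = 1; the 0.05 critical value for df = 1 is
# 3.84, so p < 0.05: the guppies are not evenly distributed.
```

Each cell contributes (observed - expected)^2 / expected, and df is the number of categories minus one.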

Chi Square Test of Independence

If you have 2 different test subject groups, you can compare their responses to the independent variable. For example, you could ask the question: Do female guppies have the same response to predators as male guppies?

The chi-square test of independence allows you to determine whether the response of your 2 groups (in this case, female & male guppies) is the same or is different.

You are interested in whether male and female guppies have different responses to predators. So you test 10 male and 10 female guppies in tanks that are divided into a predator-free refuge and an area with predators. Guppies can move between the areas; predators cannot. You count how many guppies were in the predator area and in the refuge after 5 minutes.

Here are the data:

                   number of guppies
                   in predator area   in refuge
  male guppies            1               9
  female guppies          3               7

Your null hypothesis is that guppy gender does not affect response to predators; in other words, that there will be no difference in the response of male and female guppies to predators, and that the effect of predators will not depend on guppy gender.

    To perform the test in SPSS:

In Variable View, set up two variables: Gender and Location. Both are categorical, so they must be Nominal, and you need to set up Values.


    Enter your data in 2 columns. Each row is a single fish.

Go to Analyze → Descriptive Statistics → Crosstabs.

In the pop-up window, move one of your variables into the Rows window and the other one into the Columns window.

Click on the Statistics button on the bottom of the Crosstabs window, then click Chi-square in the new pop-up window.

Click Continue, then OK.

    Your output should look like this:

Case Processing Summary

                                  Cases
                   Valid          Missing         Total
                   N    Percent   N    Percent    N    Percent
Gender * Location  20   100.0%    0    .0%        20   100.0%

Gender * Location Crosstabulation

                     Location
                     predators   refuge   Total
Gender   male            1          9       10
         female          3          7       10
Total                    4         16       20

Chi-Square Tests

                               Value      df   Asymp. Sig.   Exact Sig.   Exact Sig.
                                                (2-sided)     (2-sided)    (1-sided)
Pearson Chi-Square             1.250(b)    1      .264
Continuity Correction(a)        .313       1      .576
Likelihood Ratio               1.297       1      .255
Fisher's Exact Test                                             .582         .291
Linear-by-Linear Association   1.188       1      .276
N of Valid Cases                 20

a Computed only for a 2x2 table
b 2 cells (50.0%) have expected count less than 5. The minimum expected count is 2.00.

    How to interpret your output:

    Ignore the 1st table.

The second table (Gender * Location Crosstabulation) has your observed values for each category. You should check this table to make sure your data were entered correctly. In this example, the table correctly reflects that there were 10 of each type of fish, and that 1 male and 3 females were in the predator side of their respective tanks.

In the 3rd table, look at the Pearson Chi-Square line. Your Chi-square value is χ2 = 1.250. Your p-value is p = 0.264. This suggests that the response to predators was not different between male and female guppies.

WHAT TO REPORT: You want to report the χ2 value, df, and p, parenthetically, following a statement that describes the patterns in the data. For example: Male and female guppies did not differ in their response to predators (chi-square test of independence, χ2=1.25, df=1, p>0.20).


Regardless of gender, more guppies fed in the refuge areas than in the predator areas. Ninety percent of males and seventy percent of females fed in the refuge areas.
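The Pearson Chi-Square value in the output can also be checked by hand: the expected count for each cell is its row total times its column total divided by the grand total. A Python sketch using the guppy table from this example:

```python
# A by-hand check of the Pearson Chi-Square value (1.250) from the
# Chi-Square Tests table above. Expected counts come from the row and
# column totals of the crosstabulation.
table = {
    ("male", "predators"): 1, ("male", "refuge"): 9,
    ("female", "predators"): 3, ("female", "refuge"): 7,
}

n = sum(table.values())
row_tot, col_tot = {}, {}
for (row, col), count in table.items():
    row_tot[row] = row_tot.get(row, 0) + count
    col_tot[col] = col_tot.get(col, 0) + count

# Expected cell count = row total * column total / grand total.
chi_sq = sum((obs - row_tot[r] * col_tot[c] / n) ** 2
             / (row_tot[r] * col_tot[c] / n)
             for (r, c), obs in table.items())
df = (len(row_tot) - 1) * (len(col_tot) - 1)

print(round(chi_sq, 3), df)   # 1.25 with df = 1, as in the SPSS table
```

For a 2x2 table, df = (rows - 1) * (columns - 1) = 1, which is the df SPSS reports.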

Correlation

If the values of two variables appear to be related to one another, but one is not dependent on the other, they are considered to be correlated. For example, fish weight and egg production are generally correlated, but neither variable is dependent on the other. No causation is implied, meaning we have no reason to suspect that fish weight causes egg number or vice versa.

The correlation coefficient, r, provides a quantitative measurement of how closely two variables are related. It ranges from 0 (no correlation) to 1 or -1 (the two variables are perfectly related, positively or negatively).

Let's examine the correlation between bird weight and bill length, using the data displayed below.

Bird #             1   2   3   4   5   6   7   8   9  10  11  12  13  14  15
bird weight (g)   15  13  10  14  12  12   9  17  14  14  11  13  16  12  15
bill length (mm)  43  45  35  41  42  39  39  47  44  48  41  43  42  45  45

    Enter the data above in a new spreadsheet and name the columns Weight and Length.

To visualize what the correlation represents, make a scatterplot of the data. For instructions, go to the graphing section of this manual.

The bird data listed above look like this when graphed:

From this plot you can see that as weight increases there is also an increase in bird bill length. Thus, these two variables appear to be correlated.

    To quantify the extent of the correlation and see if it is statistically significant:

Go to Analyze → Correlate → Bivariate.

[Scatterplot of bill length (mm) on the y-axis against bird weight (g) on the x-axis]


In the dialog box, move your 2 variables to the box on the right. Click on the check box for Pearson. Click OK.

Correlations

                               Weight    Length
Weight   Pearson Correlation    1        .666**
         Sig. (2-tailed)        .        .007
         N                      15       15
Length   Pearson Correlation    .666**   1
         Sig. (2-tailed)        .007     .
         N                      15       15

** Correlation is significant at the 0.01 level.

The first row in your correlation table gives you the Pearson correlation coefficient (r). In this example, r = 0.666, which shows there is a positive correlation between Weight and Length. The results of the statistical test show that it is a statistically significant correlation (p = 0.007, which is less than 0.05).
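As a cross-check on the table above, r can also be computed directly from its definition. A Python sketch (not part of SPSS) using the bird data from this section:

```python
import math

# A by-hand check of the Pearson correlation (r = 0.666) reported in
# the Correlations table above, using the bird data from this section.
weight = [15, 13, 10, 14, 12, 12, 9, 17, 14, 14, 11, 13, 16, 12, 15]
length = [43, 45, 35, 41, 42, 39, 39, 47, 44, 48, 41, 43, 42, 45, 45]

n = len(weight)
mean_w = sum(weight) / n
mean_l = sum(length) / n

# r = sum of cross-deviations / sqrt(product of sums of squared deviations)
sxy = sum((w - mean_w) * (l - mean_l) for w, l in zip(weight, length))
sxx = sum((w - mean_w) ** 2 for w in weight)
syy = sum((l - mean_l) ** 2 for l in length)

r = sxy / math.sqrt(sxx * syy)
print(round(r, 3))   # 0.666
```

This reproduces the r value SPSS prints; SPSS additionally supplies the two-tailed p-value.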


Regression

A regression calculates the equation of the best fitting straight line through the (x,y) points that the data pairs define. In the equation of a line (y = a + bx), a is the y-intercept (where x=0) and b is the slope. The output of a regression will give you estimates for both of these values. If we wanted to predict the length of a fish at a given age, we could do so using the regression equation that best fits these data.

Enter the data above into a new spreadsheet and name the two data columns Age and Length.

To visualize the relationship between these two variables, make a scatterplot of the data. See the graphing section of this manual for instructions on making scatterplots.

The graph shows that there is a strong positive relationship between fish Age and Length. The equation is for the regression line that best describes the relationship between the two variables.

The R-square (R2) value is the coefficient of determination, and can be interpreted as the proportion of the variation in the dependent variable that is explained by variation in the independent variables. R2 ranges from 0 to 1. If it is close to 1, it means that your independent variable has explained almost all of why the value of your dependent variable differs from observation to observation. If R2 is close to 0, it means that it has explained almost none of the variation in your dependent variable.

In this example it appears that 97% of the variation in Length is explained by variation in Age.

    Now what you want to do is determine whether the relationship is statistically significant.

    To run a regression analysis:

Go to Analyze → Regression → Linear.

    Your response variable goes in the Dependent variable box.

    Your explanatory variable goes in the Independent variable box.

The output contains four tables. The first table simply tells you what variables were used in what way.

Variables Entered/Removed(b)

Model   Variables Entered   Variables Removed   Method
1       Age(a)              .                   Enter

a All requested variables entered.
b Dependent Variable: Length

    The model summary table provides the basic data for the analysis, along with the R2 value.

[Scatterplot of Length (cm) on the y-axis against Age (years) on the x-axis, with the fitted line: Length (cm) = 11.34 + 1.33 * Age; R-Square = 0.97]

Model Summary


Model   R         R Square   Adjusted R Square   Std. Error of the Estimate
1       .984(a)   .969       .963                .56208

a Predictors: (Constant), Age

The next table is an ANOVA table, and in fact, a regression analysis is very similar to an ANOVA. (If the independent variable is categorical you use an ANOVA; if it is continuous you use a regression.) The results of the ANOVA table indicate whether the relationship between the two variables is significant. Here, p < 0.001 (in the Sig. column), so we can conclude that age is a significant predictor of length.

ANOVA(b)

Model              Sum of Squares   df   Mean Square     F        Sig.
1   Regression     49.157            1     49.157      155.597   .000(a)
    Residual        1.580            5       .316
    Total          50.737            6

a. Predictors: (Constant), Age
b. Dependent Variable: Length

The fourth table contains the Regression Coefficients, which are the estimates of the y-intercept (in the row titled Constant) and the slope (in the row titled Age). A regression analysis tests whether the y-intercept and slope of the best fit line are each significantly different from zero. The p-value for each row allows you to assess this. If the p-value for the y-intercept is less than 0.05, then the y-intercept is significantly different from zero. If the p-value for the slope is less than 0.05, then the slope is significantly different from zero.

Coefficients(a)

                   Unstandardized Coefficients   Standardized Coefficients
Model              B        Std. Error           Beta                        t        Sig.
1   (Constant)     11.343   .475                                             23.878   .000
    Age             1.325   .106                 .984                        12.474   .000

a. Dependent Variable: Length

From the output, we can see that the very high R2 value reveals that 97% of the variation in length (dependent variable) can be explained by variation in age (independent variable).

    NOTE: There is no p-value associated with an R2!

The very low p-value (Sig. = 0.000) in the ANOVA table indicates that the relationship is highly significant, and thus very unlikely to occur by chance alone.

The output also indicates that the y-intercept (Constant) and the slope (Age) are significantly different from zero.

These statistics support the strong relationship that is evident in the scatterplot shown on the previous page.
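To see what the Coefficients table is doing, note that each t value is just the estimate (B) divided by its standard error, and the two estimates define the prediction equation. A Python sketch using the rounded values from the table (so the t's differ slightly from SPSS's 23.878 and 12.474, which are computed from unrounded numbers); the 4-year-old fish is a hypothetical age chosen for illustration:

```python
# What the Coefficients table is doing: each t is the estimate (B)
# divided by its standard error, and the two estimates define the
# prediction equation. Inputs are the rounded values from the table,
# so the t's differ slightly from SPSS's printed 23.878 and 12.474.
b0, se0 = 11.343, 0.475   # (Constant): the y-intercept
b1, se1 = 1.325, 0.106    # Age: the slope

t_intercept = b0 / se0
t_slope = b1 / se1
print(round(t_intercept, 2), round(t_slope, 2))

# Using the fitted line y = a + b*x to predict length for a
# hypothetical 4-year-old fish:
predicted = b0 + b1 * 4
print(round(predicted, 3))   # about 16.64 cm
```

This is only a reading aid for the output; SPSS also supplies the df and p-value for each t.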

In a paper describing your results you would include a scatterplot of your data along with the equation for the regression line.

WHAT TO REPORT: Following a statement that describes the patterns in the data, you should parenthetically report the F, df, and p from the ANOVA, as well as the R2 value. Remember that there is no p-value associated with an R2! For example: Fish age can significantly predict fish length (F=155.6, df=1,5, p<0.001, R2=0.97).


[Bar chart] Figure 1. Mean (+SE) density of iguanas on two small islands in the Galapagos archipelago. Island B has significantly more iguanas per unit area than Island A (t=2.5, df=38, p<0.05).


[Figure: density of iguanas (y-axis: 12–24) for Island A and Island B]

Keep the key that shows which bars are identified by which colors! Make sure you use colors / patterns that will print well in black and white. (Two dark solid colors will not work.)

Box plots show the median, the interquartile ranges, and any outliers in the data. This is a common way to graphically represent non-parametric data.

Go to Graphs → Interactive → Boxplot.

Put your categorical variable on the x-axis and your response variable on the y-axis.

Click on the Boxes tab in the window. Make sure that the following boxes are all checked: Outliers, Extremes, and Median Line.

Don't forget to add a caption.

    To pretty it up, follow directions as above.

A boxplot has a number of nifty features. The line in the middle of each box represents the median value of the response variable in that category. The box covers the middle 50% of observations in each category. The whiskers outside the box extend between the highest and lowest values in the sample that are within 1.5 box lengths from the edge of the box. Individuals that are outside this limit are shown by circles.

What a boxplot can tell you: a) where the medians are in the 2 groups, and b) how variable the groups are. For example, iguana densities on Island B have a higher median value and are much more variable than iguana densities on Island A.
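The quantities a boxplot draws can also be computed directly. A Python sketch with a small made-up sample of densities (hypothetical numbers for illustration only; they are not the data behind the figures in this manual):

```python
import statistics

# The quantities a boxplot draws, computed for a small hypothetical
# sample of iguana densities (made-up numbers for illustration).
densities = [12, 14, 15, 15, 16, 18, 30]

# statistics.quantiles' default ("exclusive") method uses (n+1)-based
# positions; SPSS may interpolate quartiles slightly differently.
q1, median, q3 = statistics.quantiles(densities, n=4)
iqr = q3 - q1   # the height of the box

# Whiskers reach the most extreme points within 1.5 box lengths of the
# box; anything beyond that is plotted as an outlier circle.
low_fence = q1 - 1.5 * iqr
high_fence = q3 + 1.5 * iqr
outliers = [x for x in densities if x < low_fence or x > high_fence]

print(median, iqr, outliers)   # the value 30 falls outside the fences
```

Here the box would run from 14 to 18 with the median line at 15, and the lone extreme value is flagged as an outlier, just as SPSS marks it with a circle.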

Go to Graphs → Interactive → Scatterplot.

For a dataset being analyzed with correlation, it doesn't really matter which variable goes on the x-axis and which goes on the y-axis.

Click and drag the variables where you want them. For the correlation example in this manual, we put Length on the y-axis and Weight on the x-axis. Click OK.

For a dataset being analyzed with regression, you must put the explanatory variable on the x-axis and the response variable on the y-axis. In the regression example in this manual, we put Age on the x-axis and Length on the y-axis.

Click on the tab for Fit, and for the method choose Regression.

    Click OK.


To print table or graph output from SPSS, click on it, go to Print (under File, or choose the icon) and print the selection. Alternatively, you can copy it into MS Word and print from there. Some students find it easier to produce figures in SPSS without figure legends, copy the figures to Word, and add captions there using text boxes.