Statistical Analysis using SPSS
Dr. Shaikh Shaffi Ahamed
Asst. Professor
Dept. of Family & Community Medicine
Jan 17, 2016
SPSS Windows
• Data View
– Used to display data
– Columns represent variables
– Rows represent individual units or groups of units that share common values of variables
• Variable View
– Used to display information on variables in the dataset
– TYPE: Allows for various styles of displaying
– LABEL: Allows for longer description of variable name
– VALUES: Allows for longer description of variable levels
– MEASURE: Allows choice of measurement scale
• Output View
– Displays results of analyses/graphs
Data Entry Tips
• For large datasets, use a spreadsheet such as EXCEL, which is more flexible for data entry, and import the file into SPSS
• Give each variable a descriptive LABEL in the VARIABLE VIEW
• Keep in mind that columns are variables; you don’t want multiple columns holding the same variable
Importing data into SPSS
To import an EXCEL file, click on:
FILE → OPEN → DATA, then change FILES OF TYPE to EXCEL (.xls)
To import a TEXT or DATA file, click on:
FILE → OPEN → DATA, then change FILES OF TYPE to TEXT (.txt) or DATA (.dat)
You will be prompted through a series of dialog boxes to import the dataset
Descriptive Statistics-Numeric Data
• After importing your dataset, and providing names to variables, click on:
• ANALYZE → DESCRIPTIVE STATISTICS → DESCRIPTIVES
• Choose any variables to be analyzed and place them in the box on the right
• Options include:
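– Mean: $\bar{y} = \frac{\sum_{i=1}^{n} y_i}{n}$
– Sum: $\sum_{i=1}^{n} y_i$
– Std. deviation: $S = \sqrt{\frac{\sum_{i=1}^{n} (y_i - \bar{y})^2}{n-1}}$
– Variance: $S^2 = \frac{\sum_{i=1}^{n} (y_i - \bar{y})^2}{n-1}$
– S.E. Mean: $S_{\bar{y}} = \frac{S}{\sqrt{n}}$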
Descriptive Statistics
                      N   Minimum   Maximum   Sum    Mean    S.E. Mean   Std. Deviation   Variance
CRCL                  8        38       120   621   77.63         8.63           24.401    595.411
Valid N (listwise)    8
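The quantities that DESCRIPTIVES reports can be reproduced by hand. The following is a minimal sketch in Python (not SPSS syntax); the function name `describe` and the four sample values are hypothetical and serve only as an illustration. The last lines check that the CRCL figures in the output table are consistent with one another.

```python
import math

def describe(values):
    """Summary statistics of the kind DESCRIPTIVES reports for one variable."""
    n = len(values)
    total = sum(values)
    mean = total / n
    # Sample variance uses the n - 1 divisor.
    variance = sum((y - mean) ** 2 for y in values) / (n - 1)
    std_dev = math.sqrt(variance)
    return {
        "n": n,
        "min": min(values),
        "max": max(values),
        "sum": total,
        "mean": mean,
        "se_mean": std_dev / math.sqrt(n),
        "std_dev": std_dev,
        "variance": variance,
    }

# Hypothetical data, not taken from the slides.
stats = describe([2.0, 4.0, 4.0, 6.0])
print(stats)

# Consistency check on the reported CRCL output (N, Sum, Std. Deviation).
crcl_n, crcl_sum, crcl_sd = 8, 621, 24.401
crcl_mean = crcl_sum / crcl_n          # reported Mean: 77.63
crcl_se = crcl_sd / math.sqrt(crcl_n)  # reported S.E. Mean: 8.63
crcl_var = crcl_sd ** 2                # reported Variance: 595.411
print(crcl_mean, crcl_se, crcl_var)
```

The raw CRCL values are not shown on the slide, so only the relationships among the reported summaries can be checked here.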
Descriptive Statistics-General Data
• After importing your dataset, and providing names to variables, click on:
• ANALYZE → DESCRIPTIVE STATISTICS → FREQUENCIES
• Choose any variables to be analyzed and place them in the box on the right
• Options include (For Categorical Variables):
– Frequency Tables
– Pie Charts, Bar Charts
• Options include (For Numeric Variables):
– Frequency Tables (Useful for discrete data)
– Measures of Central Tendency, Dispersion, Percentiles
– Pie Charts, Histograms
Example 1.4 - Smoking Status
SMKSTTS
                                   Frequency   Percent   Valid Percent   Cumulative Percent
Valid   Never Smoked                    1990      37.9            37.9                 37.9
        Quit > 10 Years Ago             1063      20.3            20.3                 58.2
        Quit < 10 Years Ago              609      11.6            11.6                 69.8
        Current Cigarette Smoker        1332      25.4            25.4                 95.2
        Other Tobacco User               253       4.8             4.8                100.0
        Total                           5247     100.0           100.0
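The Percent and Cumulative Percent columns follow directly from the frequencies. As a minimal sketch in Python (the function name `frequency_table` is hypothetical), using the smoking status counts from this example:

```python
# Frequencies from Example 1.4 (SMKSTTS), in the order shown in the output.
counts = {
    "Never Smoked": 1990,
    "Quit > 10 Years Ago": 1063,
    "Quit < 10 Years Ago": 609,
    "Current Cigarette Smoker": 1332,
    "Other Tobacco User": 253,
}

def frequency_table(counts):
    """Return (label, count, percent, cumulative percent) rows and the total."""
    total = sum(counts.values())
    rows = []
    running = 0
    for label, count in counts.items():
        running += count
        rows.append((label, count, 100 * count / total, 100 * running / total))
    return rows, total

rows, total = frequency_table(counts)
for label, count, pct, cum_pct in rows:
    print(f"{label:<26}{count:>6}{pct:>8.1f}{cum_pct:>8.1f}")
print(f"{'Total':<26}{total:>6}")
```

The cumulative percent is computed from the running count rather than by adding rounded percentages, which avoids accumulating rounding error.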
Vertical Bar Charts and Pie Charts
• After importing your dataset, and providing names to variables, click on:
• GRAPHS → BAR… → SIMPLE (Summaries for Groups of Cases) → DEFINE
• Bars Represent N of Cases (or % of Cases)
• Put the variable of interest as the CATEGORY AXIS
• GRAPHS → PIE… → (Summaries for Groups of Cases) → DEFINE
• Slices Represent N of Cases (or % of Cases)
• Put the variable of interest as the DEFINE SLICES BY variable
Example 1.5 - Antibiotic Study
[Figure: chart of Count (0 to 80) by OUTCOME (categories 1 to 5)]
Histograms
• After importing your dataset, and providing names to variables, click on:
• GRAPHS → HISTOGRAM
• Select the variable to be plotted
• Click on DISPLAY NORMAL CURVE if you want a normal curve superimposed (see Chapter 3).
Example 1.6 - Drug Approval Times
[Figure: histogram of MONTHS; Mean = 32.1, Std. Dev = 20.97, N = 175]
Side-by-Side Bar Charts
• After importing your dataset, and providing names to variables, click on:
• GRAPHS → BAR… → CLUSTERED (Summaries for Groups of Cases) → DEFINE
• Bars Represent N of Cases (or % of Cases)
• CATEGORY AXIS: Variable that represents groups to be compared (independent variable)
• DEFINE CLUSTERS BY: Variable that represents outcomes of interest (dependent variable)
Example 1.7 - Streptomycin Study
[Figure: clustered bar chart of Count (0 to 30) by TRT (groups 1 and 2), clustered by OUTCOME (categories 1 to 6)]
Scatterplots
• After importing your dataset, and providing names to variables, click on:
• GRAPHS → SCATTER → SIMPLE → DEFINE
• For Y-AXIS, choose the Dependent (Response) Variable
• For X-AXIS, choose the Independent (Explanatory) Variable
Example 1.8 - Theophylline Clearance
[Figure: scatterplot of THCLRNCE (y-axis, 0 to 8) against DRUG (x-axis, 0.5 to 3.5)]
Scatterplots with 2 Independent Variables
• After importing your dataset, and providing names to variables, click on:
• GRAPHS → SCATTER → SIMPLE → DEFINE
• For Y-AXIS, choose the Dependent Variable
• For X-AXIS, choose the Independent Variable with the most levels
• For SET MARKERS BY, choose the Independent Variable with the fewest levels
Example 1.8 - Theophylline Clearance
[Figure: scatterplot of THCLRNCE (y-axis, 0 to 8) against SUBJECT (x-axis, 0 to 16), with markers set by DRUG: Tagamet, Pepcid, Placebo]
Contingency Tables for Conditional Probabilities
• After importing your dataset, and providing names to variables, click on:
• ANALYZE → DESCRIPTIVE STATISTICS → CROSSTABS
• For ROWS, select the variable you are conditioning on (Independent Variable)
• For COLUMNS, select the variable you are finding the conditional probability of (Dependent Variable)
• Click on CELLS
• Click on ROW Percentages
Example 1.10 - Alcohol & Mortality
WINE * DEATH Crosstabulation
                                     DEATH
                                  0         1     Total
WINE    0   Count             10535      2155     12690
            % within WINE     83.0%     17.0%    100.0%
        1   Count               521        74       595
            % within WINE     87.6%     12.4%    100.0%
Total       Count             11056      2229     13285
            % within WINE     83.2%     16.8%    100.0%
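The row percentages are the conditional probabilities of each column category given the row category. A minimal sketch in Python, using the counts from this crosstabulation (the function name `row_proportions` is hypothetical):

```python
# Counts from Example 1.10, indexed as table[wine][death].
table = {
    0: {0: 10535, 1: 2155},
    1: {0: 521, 1: 74},
}

def row_proportions(table):
    """Divide each cell by its row total: P(column category | row category)."""
    result = {}
    for row_key, row in table.items():
        row_total = sum(row.values())
        result[row_key] = {col: count / row_total for col, count in row.items()}
    return result

props = row_proportions(table)
for wine, row in props.items():
    print(f"P(DEATH = 1 | WINE = {wine}) = {row[1]:.3f}")
```

Multiplying by 100 and rounding to one decimal place gives the "% within WINE" rows of the SPSS output.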
Independent Sample t-Test
• After importing your dataset, and providing names to variables, click on:
• ANALYZE → COMPARE MEANS → INDEPENDENT SAMPLES T-TEST
• For TEST VARIABLE, select the dependent (response) variable(s)
• For GROUPING VARIABLE, select the independent variable. Then define the 2 levels to be compared (this can be used even when the full dataset has more than 2 levels for the independent variable).
Example 3.5 - Levocabastine in Renal Patients
Group Statistics
       GROUP            N     Mean     Std. Deviation   Std. Error Mean
AUC    Non-Dialysis     6   563.83            172.032            70.232
       Hemodialysis     6   499.67            131.409            53.647

Independent Samples Test: AUC

Levene's Test for Equality of Variances
                                  F     Sig.
Equal variances assumed        .204     .661

t-test for Equality of Means
                                                                  Mean    Std. Error    95% Confidence Interval of the Difference
                                  t       df   Sig. (2-tailed)   Difference   Difference       Lower       Upper
Equal variances assumed        .726   10.000              .484        64.17       88.377    -132.750     261.083
Equal variances not assumed    .726    9.353              .486        64.17       88.377    -134.613     262.946
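The t statistics and degrees of freedom in this output can be recovered from the Group Statistics alone. The sketch below, in Python, assumes the standard pooled-variance and Welch (unequal variance) formulas; the function name `two_sample_t` is hypothetical. It does not compute P-values, which would need the t distribution.

```python
import math

def two_sample_t(n1, mean1, sd1, n2, mean2, sd2):
    """t statistic, df and standard error from summary statistics."""
    diff = mean1 - mean2
    # Equal variances assumed: pool the two sample variances.
    pooled_var = ((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / (n1 + n2 - 2)
    se_pooled = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    # Equal variances not assumed: Welch standard error and df.
    v1, v2 = sd1**2 / n1, sd2**2 / n2
    se_welch = math.sqrt(v1 + v2)
    df_welch = (v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))
    return {
        "diff": diff,
        "pooled": (diff / se_pooled, n1 + n2 - 2, se_pooled),
        "welch": (diff / se_welch, df_welch, se_welch),
    }

# Group Statistics from Example 3.5 (Non-Dialysis vs. Hemodialysis).
result = two_sample_t(6, 563.83, 172.032, 6, 499.67, 131.409)
t_pooled, df_pooled, se_pooled = result["pooled"]
t_welch, df_welch, se_welch = result["welch"]
print(f"pooled: t = {t_pooled:.3f}, df = {df_pooled}, SE = {se_pooled:.3f}")
print(f"Welch:  t = {t_welch:.3f}, df = {df_welch:.3f}, SE = {se_welch:.3f}")
```

Because the two groups have equal sample sizes, the pooled and Welch standard errors coincide, which is why both rows of the SPSS output show the same t and only the degrees of freedom differ.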
Paired t-test
• After importing your dataset, and providing names to variables, click on:
• ANALYZE → COMPARE MEANS → PAIRED SAMPLES T-TEST
• For PAIRED VARIABLES, select the two dependent (response) variables (the analysis will be based on the first variable minus the second variable)
Example 3.7 - Cmax in SRC&IRC Codeine
Paired Samples Statistics
                   Mean     N   Std. Deviation   Std. Error Mean
Pair 1   SRC    217.838    13          79.7792           22.1268
         IRC    138.815    13          59.3635           16.4645

Paired Samples Correlations
                       N   Correlation   Sig.
Pair 1   SRC & IRC    13          .746   .003

Paired Samples Test
                       Paired Differences
                                                                95% Confidence Interval of the Difference
                         Mean   Std. Deviation   Std. Error Mean      Lower      Upper       t    df   Sig. (2-tailed)
Pair 1   SRC - IRC     79.023          53.0959           14.7262     46.938    111.109   5.366    12              .000
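The paired t statistic is the mean of the differences divided by its standard error. The following Python sketch recomputes the Paired Samples Test row from the reported mean and standard deviation of the differences. The function name `paired_t_summary` is hypothetical, and the critical value 2.179 (two-sided 5% point of the t distribution with 12 df) is taken from a standard t table rather than from the slides.

```python
import math

def paired_t_summary(n, mean_diff, sd_diff, t_crit):
    """Standard error, t statistic and confidence interval for a paired t-test."""
    se = sd_diff / math.sqrt(n)
    t = mean_diff / se
    lower = mean_diff - t_crit * se
    upper = mean_diff + t_crit * se
    return se, t, n - 1, (lower, upper)

# Paired differences (SRC - IRC) from Example 3.7.
# t_crit = 2.179 is an assumed table value for df = 12, 95% two-sided.
se, t, df, (lower, upper) = paired_t_summary(13, 79.023, 53.0959, t_crit=2.179)
print(f"SE = {se:.4f}, t = {t:.3f}, df = {df}")
print(f"95% CI: ({lower:.3f}, {upper:.3f})")
```

The interval excludes zero, in line with the small Sig. (2-tailed) value in the output.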
Chi-Square Test
• After importing your dataset, and providing names to variables, click on:
• ANALYZE → DESCRIPTIVE STATISTICS → CROSSTABS
• For ROWS, Select the Independent Variable
• For COLUMNS, Select the Dependent Variable
• Under STATISTICS, Click on CHI-SQUARE
• Under CELLS, Click on OBSERVED, EXPECTED, ROW PERCENTAGES, and ADJUSTED STANDARDIZED RESIDUALS
• NOTE: Large ADJUSTED STANDARDIZED RESIDUALS (in absolute value) show which cells are inconsistent with the null hypothesis of independence. A common rule of thumb is to check whether any cells have values greater than 3 in absolute value.
Example 5.8 - Marital Status & Cancer
MARITAL * CANCREV Crosstabulation
                                            CANCREV
                                        Cancer   No Cancer    Total
MARITAL   Single    Count                   29          47       76
                    Expected Count        38.1        37.9     76.0
                    % within MARITAL     38.2%       61.8%   100.0%
                    Adjusted Residual     -2.3         2.3
          Married   Count                  116         108      224
                    Expected Count       112.3       111.7    224.0
                    % within MARITAL     51.8%       48.2%   100.0%
                    Adjusted Residual       .7         -.7
          Widowed   Count                   67          56      123
                    Expected Count        61.6        61.4    123.0
                    % within MARITAL     54.5%       45.5%   100.0%
                    Adjusted Residual      1.1        -1.1
          Div/Sep   Count                    5           5       10
                    Expected Count         5.0         5.0     10.0
                    % within MARITAL     50.0%       50.0%   100.0%
                    Adjusted Residual       .0          .0
Total               Count                  217         216      433
                    Expected Count       217.0       216.0    433.0
                    % within MARITAL     50.1%       49.9%   100.0%

Chi-Square Tests
                                  Value      df   Asymp. Sig. (2-sided)
Pearson Chi-Square                5.530(a)    3                    .137
Likelihood Ratio                  5.572       3                    .134
Linear-by-Linear Association      3.631       1                    .057
N of Valid Cases                    433
a. 1 cells (12.5%) have expected count less than 5. The minimum expected count is 4.99.
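The Pearson chi-square statistic, expected counts and adjusted residuals can be computed directly from the observed counts. This is a minimal Python sketch under the usual definitions (expected count = row total × column total / N; adjusted residual = (observed − expected) divided by its estimated standard error). The function name `chi_square_table` is hypothetical, and the P-value is not computed.

```python
import math

def chi_square_table(observed):
    """Pearson chi-square, df, expected counts and adjusted residuals."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    chi2 = 0.0
    expected, adjusted = [], []
    for i, row in enumerate(observed):
        exp_row, adj_row = [], []
        for j, obs in enumerate(row):
            e = row_totals[i] * col_totals[j] / n
            chi2 += (obs - e) ** 2 / e
            # Standard error of (observed - expected) under independence.
            se = math.sqrt(e * (1 - row_totals[i] / n) * (1 - col_totals[j] / n))
            exp_row.append(e)
            adj_row.append((obs - e) / se)
        expected.append(exp_row)
        adjusted.append(adj_row)
    df = (len(observed) - 1) * (len(observed[0]) - 1)
    return chi2, df, expected, adjusted

# Observed counts from Example 5.8: rows Single, Married, Widowed, Div/Sep;
# columns Cancer, No Cancer.
observed = [[29, 47], [116, 108], [67, 56], [5, 5]]
chi2, df, expected, adjusted = chi_square_table(observed)
print(f"chi-square = {chi2:.3f}, df = {df}")
for exp_row, adj_row in zip(expected, adjusted):
    print([round(e, 1) for e in exp_row], [round(a, 1) for a in adj_row])
```

None of the adjusted residuals exceeds 3 in absolute value, which agrees with the non-significant Pearson chi-square in the output.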
Fisher’s Exact Test
• After importing your dataset, and providing names to variables, click on:
• ANALYZE → DESCRIPTIVE STATISTICS → CROSSTABS
• For ROWS, Select the Independent Variable
• For COLUMNS, Select the Dependent Variable
• Under STATISTICS, Click on CHI-SQUARE
• Under CELLS, Click on OBSERVED and ROW PERCENTAGES
• NOTE: Code the data so that the outcome present (Success) category has the lower value (e.g. 1) and the outcome absent (Failure) category has the higher value (e.g. 2). Similarly, give the exposure present category the lower value (e.g. 1) and the exposure absent category the higher value (e.g. 2). Use Value Labels to keep the output straight.
Example 5.5 - Antiseptic Experiment
TRTREV * DEATHREV Crosstabulation
                                           DEATHREV
                                        Death   No Death    Total
TRTREV   Antiseptic   Count                 6         34       40
                      % within TRTREV   15.0%      85.0%   100.0%
         Control      Count                16         19       35
                      % within TRTREV   45.7%      54.3%   100.0%
Total                 Count                22         53       75
                      % within TRTREV   29.3%      70.7%   100.0%

Chi-Square Tests
                                  Value      df   Asymp. Sig. (2-sided)   Exact Sig. (2-sided)   Exact Sig. (1-sided)
Pearson Chi-Square                8.495(b)    1                    .004
Continuity Correction(a)          7.078       1                    .008
Likelihood Ratio                  8.687       1                    .003
Fisher's Exact Test                                                                       .005                   .004
Linear-by-Linear Association      8.382       1                    .004
N of Valid Cases                     75
a. Computed only for a 2x2 table
b. 0 cells (.0%) have expected count less than 5. The minimum expected count is 10.27.
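Fisher's exact test treats the table margins as fixed and uses the hypergeometric distribution of one cell. The following Python sketch is one common formulation, not necessarily the exact algorithm SPSS uses: the one-sided P-value is the tail probability in the observed direction, and the two-sided P-value sums the probabilities of all tables no more likely than the observed one. The function name `fisher_exact_2x2` is hypothetical.

```python
from math import comb

def fisher_exact_2x2(a, b, c, d):
    """Exact P-values for the 2x2 table [[a, b], [c, d]] with fixed margins."""
    row1, row2 = a + b, c + d
    col1 = a + c
    n = row1 + row2
    denom = comb(n, col1)

    def prob(x):
        # Hypergeometric probability that the top-left cell equals x.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    lo, hi = max(0, col1 - row2), min(row1, col1)
    p_obs = prob(a)
    p_lower = sum(prob(x) for x in range(lo, a + 1))
    p_upper = sum(prob(x) for x in range(a, hi + 1))
    # Small relative tolerance guards against floating-point ties.
    p_two = sum(prob(x) for x in range(lo, hi + 1)
                if prob(x) <= p_obs * (1 + 1e-9))
    return min(p_lower, p_upper), p_two

# Example 5.5: Antiseptic (6 deaths, 34 survivors), Control (16 deaths, 19 survivors).
p_one, p_two = fisher_exact_2x2(6, 34, 16, 19)
print(f"one-sided P = {p_one:.4f}, two-sided P = {p_two:.4f}")
```

Both values round to the Exact Sig. figures reported in the output (.004 one-sided, .005 two-sided).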
McNemar’s Test
• After importing your dataset, and providing names to variables, click on:
• ANALYZE → DESCRIPTIVE STATISTICS → CROSSTABS
• For ROWS, Select the outcome for condition/time 1
• For COLUMNS, Select the outcome for condition/time 2
• Under STATISTICS, Click on MCNEMAR
• Under CELLS, Click on OBSERVED and TOTAL PERCENTAGES
• NOTE: Code the data so that the outcome present (Success) category has the lower value (e.g. 1) and the outcome absent (Failure) category has the higher value (e.g. 2). Similarly, give the exposure present category the lower value (e.g. 1) and the exposure absent category the higher value (e.g. 2). Use Value Labels to keep the output straight.
Example 5.6 - Report of Implant Leak
SELFREV * SURGREV Crosstabulation
                                        SURGREV
                                   Present   Absent    Total
SELFREV   Present   Count               69       28       97
                    % of Total       41.8%    17.0%    58.8%
          Absent    Count                5       63       68
                    % of Total        3.0%    38.2%    41.2%
Total               Count               74       91      165
                    % of Total       44.8%    55.2%   100.0%

Chi-Square Tests
                     Value   Exact Sig. (2-sided)
McNemar Test                              .000(a)
N of Valid Cases       165
a. Binomial distribution used.
Note: the Exact Sig. (2-sided) value is the P-value.
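The footnote "Binomial distribution used" means the exact McNemar P-value depends only on the two discordant cells (28 and 5 here). A minimal Python sketch, assuming the usual exact test in which the smaller discordant count is compared with a Binomial(n, 0.5) distribution; the function name `mcnemar_exact` is hypothetical:

```python
from math import comb

def mcnemar_exact(b, c):
    """Two-sided exact McNemar P-value from the two discordant counts."""
    n = b + c
    k = min(b, c)
    # P(X <= k) for X ~ Binomial(n, 0.5), doubled for a two-sided test.
    tail = sum(comb(n, i) for i in range(k + 1)) / 2**n
    return min(1.0, 2 * tail)

# Example 5.6: self-report Present / surgery Absent = 28,
#              self-report Absent / surgery Present = 5.
p_value = mcnemar_exact(28, 5)
print(f"exact two-sided P = {p_value:.6f}")
```

The result is well below .0005, so SPSS displays it as .000.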
Relative Risks and Odds Ratios
• After importing your dataset, and providing names to variables, click on:
• ANALYZE → DESCRIPTIVE STATISTICS → CROSSTABS
• For ROWS, Select the Independent Variable
• For COLUMNS, Select the Dependent Variable
• Under STATISTICS, Click on RISK
• Under CELLS, Click on OBSERVED and ROW PERCENTAGES
• NOTE: Code the data so that the outcome present (Success) category has the lower value (e.g. 1) and the outcome absent (Failure) category has the higher value (e.g. 2). Similarly, give the exposure present category the lower value (e.g. 1) and the exposure absent category the higher value (e.g. 2). Use Value Labels to keep the output straight.
Example 5.1 - Pamidronate Study
PAMIDREV * SKLEVREV Crosstabulation
                                             SKLEVREV
                                            Yes       No    Total
PAMIDREV   Pamidronate   Count               47      149      196
                         % within PAMIDREV  24.0%   76.0%   100.0%
           Placebo       Count               74      107      181
                         % within PAMIDREV  40.9%   59.1%   100.0%
Total                    Count              121      256      377
                         % within PAMIDREV  32.1%   67.9%   100.0%

Risk Estimate
                                                             95% Confidence Interval
                                                    Value      Lower      Upper
Odds Ratio for PAMIDREV (Pamidronate / Placebo)      .456       .293       .710
For cohort SKLEVREV = Yes                            .587       .432       .795
For cohort SKLEVREV = No                            1.286      1.113      1.486
N of Valid Cases                                      377
Example 5.2 - Lip Cancer
PIPESREV * LIPCREV Crosstabulation
                                      LIPCREV
                                     Yes       No    Total
PIPESREV   Yes   Count               339      149      488
                 % within PIPESREV  69.5%   30.5%   100.0%
           No    Count               198      351      549
                 % within PIPESREV  36.1%   63.9%   100.0%
Total            Count               537      500     1037
                 % within PIPESREV  51.8%   48.2%   100.0%

Risk Estimate
                                              95% Confidence Interval
                                     Value      Lower      Upper
Odds Ratio for PIPESREV (Yes / No)   4.033      3.111      5.229
For cohort LIPCREV = Yes             1.926      1.698      2.185
For cohort LIPCREV = No               .478       .412       .554
N of Valid Cases                      1037
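The odds ratio, the relative risk for the "Yes" cohort, and their confidence intervals can be recomputed from the four cell counts. The Python sketch below assumes the standard large-sample intervals built on the log scale; the function name `risk_estimates` is hypothetical. The cells are ordered as in the crosstabulations: a and b are the first row (exposed: outcome Yes, No), c and d the second row.

```python
import math
from statistics import NormalDist

def risk_estimates(a, b, c, d, level=0.95):
    """Odds ratio and relative risk, each as (estimate, lower, upper)."""
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)

    def interval(estimate, se_log):
        # Confidence interval computed on the log scale, then exponentiated.
        return (estimate,
                estimate * math.exp(-z * se_log),
                estimate * math.exp(z * se_log))

    odds_ratio = (a * d) / (b * c)
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    rel_risk = (a / (a + b)) / (c / (c + d))
    se_log_rr = math.sqrt(1 / a - 1 / (a + b) + 1 / c - 1 / (c + d))
    return {
        "odds_ratio": interval(odds_ratio, se_log_or),
        "relative_risk": interval(rel_risk, se_log_rr),
    }

pamidronate = risk_estimates(47, 149, 74, 107)   # Example 5.1
lip_cancer = risk_estimates(339, 149, 198, 351)  # Example 5.2
for name, est in (("Pamidronate", pamidronate), ("Lip cancer", lip_cancer)):
    for key, (value, lower, upper) in est.items():
        print(f"{name} {key}: {value:.3f} ({lower:.3f}, {upper:.3f})")
```

The "For cohort ... = No" row of the SPSS output is the same relative-risk calculation applied to the second column (b and d in place of a and c).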
Correlation
• After importing your dataset, and providing names to variables, click on:
• ANALYZE → CORRELATE → BIVARIATE
• Select the VARIABLES
• Select the PEARSON CORRELATION
• Select the Two-tailed test of significance
• Select Flag significant correlations
Linear Regression
• After importing your dataset, and providing names to variables, click on:
• ANALYZE → REGRESSION → LINEAR
• Select the DEPENDENT VARIABLE
• Select the INDEPENDENT VARIABLE(S)
• Click on STATISTICS, then ESTIMATES, CONFIDENCE INTERVALS, MODEL FIT
Examples 7.1-7.6 - Gemfibrozil Clearance
Coefficients(a)
                     Unstandardized Coefficients   Standardized Coefficients                      95% Confidence Interval for B
Model                      B      Std. Error                            Beta        t     Sig.    Lower Bound    Upper Bound
1       (Constant)   460.828          54.338                                    8.481     .000        345.010        576.646
        CLCR          -3.215           1.181                           -.575   -2.723     .016         -5.732          -.698
a. Dependent Variable: CLGM
Examples 7.1-7.6 - Gemfibrozil Clearance
ANOVA(b)
Model                Sum of Squares    df    Mean Square        F    Sig.
1       Regression         107168.2     1     107168.158    7.413    .016(a)
        Residual           216865.8    15      14457.723
        Total              324034.0    16
a. Predictors: (Constant), CLCR
b. Dependent Variable: CLGM
Model Summary(b)
Model        R    R Square   Adjusted R Square   Std. Error of the Estimate
1      .575(a)        .331                .286                      120.240
a. Predictors: (Constant), CLCR
b. Dependent Variable: CLGM
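The three regression tables are linked by simple identities, which makes it possible to cross-check the output. The Python sketch below recomputes t, F, R Square, Adjusted R Square and the Std. Error of the Estimate from the reported coefficient and sums of squares. The function name `regression_checks` is hypothetical, and the inputs are the rounded values shown in the tables, so agreement is only to rounding.

```python
import math

def regression_checks(b, se_b, ss_reg, ss_res, df_reg, df_res):
    """Recompute summary quantities of a linear regression from its tables."""
    ms_res = ss_res / df_res
    r_square = ss_reg / (ss_reg + ss_res)
    n = df_reg + df_res + 1
    return {
        "t": b / se_b,                          # t = B / Std. Error
        "F": (ss_reg / df_reg) / ms_res,        # F = MS regression / MS residual
        "R": math.sqrt(r_square),
        "R Square": r_square,
        "Adjusted R Square": 1 - (1 - r_square) * (n - 1) / df_res,
        "Std. Error of the Estimate": math.sqrt(ms_res),
    }

# Values from the Coefficients (CLCR row) and ANOVA tables above.
checks = regression_checks(b=-3.215, se_b=1.181,
                           ss_reg=107168.158, ss_res=216865.8,
                           df_reg=1, df_res=15)
for name, value in checks.items():
    print(f"{name}: {value:.3f}")
```

With a single predictor, F equals t squared and R equals the absolute value of the standardized coefficient Beta, which is why .575 appears in both the Coefficients and Model Summary tables.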