Statistical Analysis using SPSS Dr.Shaikh Shaffi Ahamed Asst. Professor Dept. of Family & Community Medicine.

Statistical Analysis using SPSS

Dr.Shaikh Shaffi Ahamed

Asst. Professor

Dept. of Family & Community Medicine

SPSS Windows

• Data View

– Used to display data

– Columns represent variables

– Rows represent individual units or groups of units that share common values of variables

• Variable View

– Used to display information on variables in dataset

– TYPE: Allows for various styles of displaying

– LABEL: Allows for longer description of variable name

– VALUES: Allows for longer description of variable levels

– MEASURE: Allows choice of measurement scale

• Output View

– Displays results of analyses/graphs

Data Entry Tips

• For large datasets, use a spreadsheet such as EXCEL, which is more flexible for data entry, and then import the file into SPSS

• Give each variable a descriptive LABEL in the VARIABLE VIEW

• Keep in mind that columns are variables: the same variable should not be spread over multiple columns

Importing data into SPSS

To import an EXCEL file, click on:

FILE → OPEN → DATA, then change FILES OF TYPE to EXCEL (.xls)

To import a TEXT or DATA file, click on:

FILE → OPEN → DATA, then change FILES OF TYPE to TEXT (.txt) or DATA (.dat)

A series of dialog boxes will then guide you through importing the dataset

Descriptive Statistics-Numeric Data

• After importing your dataset and naming the variables, click on:

• ANALYZE → DESCRIPTIVE STATISTICS → DESCRIPTIVES

• Choose any variables to be analyzed and place them in box on right

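• Options include:

– Mean: $\bar{y} = \frac{\sum_{i=1}^{n} y_i}{n}$

– Sum: $\sum_{i=1}^{n} y_i$

– Std. deviation: $S = \sqrt{\frac{\sum_{i=1}^{n} (y_i - \bar{y})^2}{n-1}}$

– Variance: $S^2 = \frac{\sum_{i=1}^{n} (y_i - \bar{y})^2}{n-1}$

– S.E. Mean: $\frac{S}{\sqrt{n}}$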

Descriptive Statistics

                     N   Minimum   Maximum   Sum   Mean    Std. Error (Mean)   Std. Deviation   Variance
CRCL                 8   38        120       621   77.63   8.63                24.401           595.411
Valid N (listwise)   8
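As a rough illustration of what the DESCRIPTIVES options compute, the sketch below reproduces the same quantities in plain Python. The function name `describe` and the sample values are hypothetical (the raw CRCL measurements are not given on the slide); only the last line uses numbers from the CRCL table, to check that the reported S.E. of the mean equals the standard deviation divided by the square root of N.

```python
import math

def describe(values):
    """Summary statistics in the style of the DESCRIPTIVES output."""
    n = len(values)
    total = sum(values)
    mean = total / n
    # Sample variance uses the n - 1 divisor.
    variance = sum((y - mean) ** 2 for y in values) / (n - 1)
    std_dev = math.sqrt(variance)
    return {
        "n": n,
        "min": min(values),
        "max": max(values),
        "sum": total,
        "mean": mean,
        "std_error": std_dev / math.sqrt(n),
        "std_dev": std_dev,
        "variance": variance,
    }

# Hypothetical measurements, not the CRCL data from the slide.
sample = [2, 4, 4, 4, 5, 5, 7, 9]
stats = describe(sample)
print(stats)

# Consistency check on the CRCL row: S.E. = Std. Deviation / sqrt(N).
crcl_std_error = 24.401 / math.sqrt(8)
print(round(crcl_std_error, 2))
```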

Descriptive Statistics-General Data


• ANALYZE → DESCRIPTIVE STATISTICS → FREQUENCIES

• Choose any variables to be analyzed and place them in box on right

• Options include (For Categorical Variables):

– Frequency Tables

– Pie Charts, Bar Charts

• Options include (For Numeric Variables):

– Frequency Tables (Useful for discrete data)

– Measures of Central Tendency, Dispersion, Percentiles

– Pie Charts, Histograms

Example 1.4 - Smoking Status

SMKSTTS

                           Frequency   Percent   Valid Percent   Cumulative Percent
Never Smoked               1990        37.9      37.9            37.9
Quit > 10 Years Ago        1063        20.3      20.3            58.2
Quit < 10 Years Ago        609         11.6      11.6            69.8
Current Cigarette Smoker   1332        25.4      25.4            95.2
Other Tobacco User         253         4.8       4.8             100.0
Total                      5247        100.0     100.0
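The Percent and Cumulative Percent columns of a frequency table follow directly from the counts. The sketch below recomputes them from the smoking status frequencies; the function name `frequency_table` is a hypothetical helper, not an SPSS feature. With no missing values, Valid Percent equals Percent, so it is not computed separately.

```python
def frequency_table(counts):
    """Return (label, count, percent, cumulative percent) rows and the total."""
    total = sum(counts.values())
    rows = []
    running = 0
    for label, count in counts.items():
        running += count
        rows.append((label, count,
                     round(100 * count / total, 1),
                     round(100 * running / total, 1)))
    return rows, total

# Frequencies from the SMKSTTS table.
smoking_counts = {
    "Never Smoked": 1990,
    "Quit > 10 Years Ago": 1063,
    "Quit < 10 Years Ago": 609,
    "Current Cigarette Smoker": 1332,
    "Other Tobacco User": 253,
}

rows, total = frequency_table(smoking_counts)
for label, count, percent, cumulative in rows:
    print(f"{label:26s}{count:6d}{percent:8.1f}{cumulative:8.1f}")
print(f"{'Total':26s}{total:6d}")
```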

Vertical Bar Charts and Pie Charts


• GRAPHS → BAR… → SIMPLE (Summaries for Groups of Cases) → DEFINE

• Bars Represent N of Cases (or % of Cases)

• Put the variable of interest in CATEGORY AXIS

• GRAPHS → PIE… → (Summaries for Groups of Cases) → DEFINE

• Slices Represent N of Cases (or % of Cases)

• Put the variable of interest in DEFINE SLICES BY

Example 1.5 - Antibiotic Study

[Figure: bar chart and pie chart of OUTCOME (categories 1 to 5); the bar chart's vertical axis is Count, scaled 0 to 80.]

Histograms


• GRAPHS → HISTOGRAM

• Select Variable to be plotted

• Click on DISPLAY NORMAL CURVE if you want a normal curve superimposed (see Chapter 3).

Example 1.6 - Drug Approval Times

[Figure: histogram of MONTHS. Std. Dev = 20.97, Mean = 32.1, N = 175.]

Side-by-Side Bar Charts


• GRAPHS → BAR… → CLUSTERED (Summaries for Groups of Cases) → DEFINE

• Bars Represent N of Cases (or % of Cases)

• CATEGORY AXIS: Variable that represents groups to be compared (independent variable)

• DEFINE CLUSTERS BY: Variable that represents outcomes of interest (dependent variable)

Example 1.7 - Streptomycin Study

[Figure: clustered bar chart of Count (0 to 30) by TRT (1, 2), with clusters defined by OUTCOME (1 to 6).]

Scatterplots


• GRAPHS → SCATTER → SIMPLE → DEFINE

• For Y-AXIS, choose the Dependent (Response) Variable

• For X-AXIS, choose the Independent (Explanatory) Variable

Example 1.8 - Theophylline Clearance

[Figure: scatterplot of THCLRNCE (vertical axis, 0 to 8) against DRUG (horizontal axis, 0.5 to 3.5).]

Scatterplots with 2 Independent Variables


• GRAPHS → SCATTER → SIMPLE → DEFINE

• For Y-AXIS, choose the Dependent Variable

• For X-AXIS, choose the Independent Variable with the most levels

• For SET MARKERS BY, choose the Independent Variable with the fewest levels

Example 1.8 - Theophylline Clearance

[Figure: scatterplot of THCLRNCE (vertical axis, 0 to 8) against SUBJECT (horizontal axis, 0 to 16), with markers set by DRUG: Tagamet, Pepcid, Placebo.]

Contingency Tables for Conditional Probabilities


• ANALYZE → DESCRIPTIVE STATISTICS → CROSSTABS

• For ROWS, select the variable you are conditioning on (Independent Variable)

• For COLUMNS, select the variable you are finding the conditional probability of (Dependent Variable)

• Click on CELLS

• Click on ROW Percentages

Example 1.10 - Alcohol & Mortality

WINE * DEATH Crosstabulation

                           DEATH
WINE                       0        1        Total
0       Count              10535    2155     12690
        % within WINE      83.0%    17.0%    100.0%
1       Count              521      74       595
        % within WINE      87.6%    12.4%    100.0%
Total   Count              11056    2229     13285
        % within WINE      83.2%    16.8%    100.0%
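The row percentages in a crosstabulation are the conditional probabilities of the column variable given the row variable. As a minimal sketch (the helper `row_percentages` and the dictionary layout are illustrative choices, not SPSS output), they can be recomputed from the WINE by DEATH counts:

```python
def row_percentages(table):
    """Percent of each row total falling in each column, rounded to 1 decimal."""
    result = {}
    for row_label, row in table.items():
        row_total = sum(row.values())
        result[row_label] = {
            col_label: round(100 * count / row_total, 1)
            for col_label, count in row.items()
        }
    return result

# Counts from the WINE * DEATH crosstabulation.
wine_death = {
    "WINE=0": {"DEATH=0": 10535, "DEATH=1": 2155},
    "WINE=1": {"DEATH=0": 521, "DEATH=1": 74},
}

pct = row_percentages(wine_death)
for row_label, row in pct.items():
    print(row_label, row)
```

Each value estimates P(DEATH = column | WINE = row); for example, the DEATH=1 entry in the WINE=1 row is 74 / 595.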

Independent Sample t-Test


• ANALYZE → COMPARE MEANS → INDEPENDENT SAMPLES T-TEST

• For TEST VARIABLE, select the dependent (response) variable(s)

• For GROUPING VARIABLE, select the independent variable, then define the 2 levels to be compared. This works even when the independent variable has more than 2 levels in the full dataset.

Example 3.5 - Levocabastine in Renal Patients

Group Statistics (AUC)

GROUP          N   Mean     Std. Deviation   Std. Error Mean
Non-Dialysis   6   563.83   172.032          70.232
Hemodialysis   6   499.67   131.409          53.647

Independent Samples Test (AUC)

Levene's Test for Equality of Variances: F = .204, Sig. = .661

t-test for Equality of Means

                                                                                                 95% Confidence Interval of the Difference
                              t      df      Sig. (2-tailed)   Mean Difference   Std. Error Difference   Lower      Upper
Equal variances assumed       .726   10      .484              64.17             88.377                  -132.750   261.083
Equal variances not assumed   .726   9.353   .486              64.17             88.377                  -134.613   262.946
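The t statistics and degrees of freedom in the two rows of the Independent Samples Test can be recovered from the Group Statistics alone. The sketch below assumes the usual pooled-variance formula for the "equal variances assumed" row and the Welch-Satterthwaite approximation for the "not assumed" row; the function name `two_sample_t` is hypothetical. P-values are omitted because they need the t distribution, which the Python standard library does not provide.

```python
import math

def two_sample_t(mean1, sd1, n1, mean2, sd2, n2):
    """Return (t, df, standard error) for the pooled and Welch t-tests."""
    diff = mean1 - mean2

    # Equal variances assumed: pool the two sample variances.
    pooled_var = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / (n1 + n2 - 2)
    se_pooled = math.sqrt(pooled_var * (1 / n1 + 1 / n2))

    # Equal variances not assumed: Welch-Satterthwaite degrees of freedom.
    v1, v2 = sd1 ** 2 / n1, sd2 ** 2 / n2
    se_welch = math.sqrt(v1 + v2)
    df_welch = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))

    return {
        "pooled": (diff / se_pooled, n1 + n2 - 2, se_pooled),
        "welch": (diff / se_welch, df_welch, se_welch),
    }

# Summary statistics from the Group Statistics table (AUC by GROUP).
result = two_sample_t(563.83, 172.032, 6, 499.67, 131.409, 6)
for name, (t, df, se) in result.items():
    print(f"{name}: t = {t:.3f}, df = {df:.3f}, SE = {se:.3f}")
```

Because both groups have the same size here, the two standard errors coincide and only the degrees of freedom differ.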

Paired t-test


• ANALYZE → COMPARE MEANS → PAIRED SAMPLES T-TEST

• For PAIRED VARIABLES, select the two dependent (response) variables (the analysis is based on the first variable minus the second variable)

Example 3.7 - Cmax in SRC&IRC Codeine

Paired Samples Statistics

               Mean      N    Std. Deviation   Std. Error Mean
Pair 1   SRC   217.838   13   79.7792          22.1268
         IRC   138.815   13   59.3635          16.4645

Paired Samples Correlations

                     N    Correlation   Sig.
Pair 1   SRC & IRC   13   .746          .003

Paired Samples Test

                     Paired Differences
                                                                   95% Confidence Interval of the Difference
                     Mean     Std. Deviation   Std. Error Mean   Lower    Upper     t       df   Sig. (2-tailed)
Pair 1   SRC - IRC   79.023   53.0959          14.7262           46.938   111.109   5.366   12   .000
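The Paired Samples Test is a one-sample t-test on the differences. As a sketch, the t statistic and confidence interval can be recomputed from the mean and standard deviation of the SRC - IRC differences. The function name `paired_t` is hypothetical, and the critical value 2.179 (two-sided 95%, 12 degrees of freedom) is taken from a standard t table rather than computed, so the interval matches the SPSS output only to about two decimals.

```python
import math

def paired_t(mean_diff, sd_diff, n, t_crit):
    """t statistic, degrees of freedom, and CI for the mean paired difference."""
    se = sd_diff / math.sqrt(n)
    t = mean_diff / se
    ci = (mean_diff - t_crit * se, mean_diff + t_crit * se)
    return t, n - 1, se, ci

# Assumed table value: two-sided 95% critical value of t with 12 df.
T_CRIT_DF12 = 2.179

# Mean and standard deviation of the SRC - IRC differences, N = 13 pairs.
t_stat, df, se, ci = paired_t(79.023, 53.0959, 13, T_CRIT_DF12)
print(f"SE = {se:.4f}, t = {t_stat:.3f}, df = {df}")
print(f"95% CI: ({ci[0]:.2f}, {ci[1]:.2f})")
```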

Chi-Square Test

• ANALYZE → DESCRIPTIVE STATISTICS → CROSSTABS

• For ROWS, select the independent variable

• For COLUMNS, select the dependent variable

• Under STATISTICS, click on CHI-SQUARE

• Under CELLS, click on OBSERVED, EXPECTED, ROW PERCENTAGES, and ADJUSTED STANDARDIZED RESIDUALS

• NOTE: Cells with large ADJUSTED STANDARDIZED RESIDUALS (in absolute value) are inconsistent with the null hypothesis of independence. A common rule of thumb is to look for cells with values greater than 3 in absolute value

Example 5.8 - Marital Status & Cancer

MARITAL * CANCREV Crosstabulation

                                CANCREV
MARITAL                         Cancer   No Cancer   Total
Single    Count                 29       47          76
          Expected Count        38.1     37.9        76.0
          % within MARITAL      38.2%    61.8%       100.0%
          Adjusted Residual     -2.3     2.3
Married   Count                 116      108         224
          Expected Count        112.3    111.7       224.0
          % within MARITAL      51.8%    48.2%       100.0%
          Adjusted Residual     .7       -.7
Widowed   Count                 67       56          123
          Expected Count        61.6     61.4        123.0
          % within MARITAL      54.5%    45.5%       100.0%
          Adjusted Residual     1.1      -1.1
Div/Sep   Count                 5        5           10
          Expected Count        5.0      5.0         10.0
          % within MARITAL      50.0%    50.0%       100.0%
          Adjusted Residual     .0       .0
Total     Count                 217      216         433
          Expected Count        217.0    216.0       433.0
          % within MARITAL      50.1%    49.9%       100.0%

Chi-Square Tests

                               Value      df   Asymp. Sig. (2-sided)
Pearson Chi-Square             5.530(a)   3    .137
Likelihood Ratio               5.572      3    .134
Linear-by-Linear Association   3.631      1    .057
N of Valid Cases               433

a. 1 cells (12.5%) have expected count less than 5. The minimum expected count is 4.99.
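To make the EXPECTED and ADJUSTED STANDARDIZED RESIDUALS cell options concrete, the sketch below recomputes them, along with the Pearson chi-square statistic, from the observed MARITAL by CANCREV counts. The function name `chi_square_analysis` is hypothetical; the adjusted residual uses the standard formula (observed - expected) / sqrt(expected × (1 - row share) × (1 - column share)). The p-value is left out because it needs the chi-square distribution.

```python
import math

def chi_square_analysis(observed):
    """Pearson chi-square, df, expected counts, and adjusted residuals."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)

    # Expected count under independence: row total * column total / n.
    expected = [[r * c / n for c in col_totals] for r in row_totals]

    chi2 = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(observed, expected)
        for o, e in zip(obs_row, exp_row)
    )

    adjusted = [
        [
            (o - e) / math.sqrt(e * (1 - r / n) * (1 - c / n))
            for o, e, c in zip(obs_row, exp_row, col_totals)
        ]
        for obs_row, exp_row, r in zip(observed, expected, row_totals)
    ]

    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return chi2, df, expected, adjusted

# Observed counts: rows Single, Married, Widowed, Div/Sep; columns Cancer, No Cancer.
marital_cancer = [[29, 47], [116, 108], [67, 56], [5, 5]]

chi2, df, expected, adjusted = chi_square_analysis(marital_cancer)
print(f"Pearson chi-square = {chi2:.3f}, df = {df}")
for exp_row, adj_row in zip(expected, adjusted):
    print([round(e, 1) for e in exp_row], [round(a, 1) for a in adj_row])
```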

Fisher’s Exact Test

• ANALYZE → DESCRIPTIVE STATISTICS → CROSSTABS

• Under STATISTICS, click on CHI-SQUARE (for a 2x2 table, the Chi-Square Tests output includes Fisher's Exact Test)

• Under CELLS, click on OBSERVED and ROW PERCENTAGES

• NOTE: Code the data so that the outcome present (Success) category has the lower value (e.g. 1) and the outcome absent (Failure) category has the higher value (e.g. 2). Do the same for exposure: exposure present (e.g. 1) and exposure absent (e.g. 2). Use Value Labels to keep the output straight.

Example 5.5 - Antiseptic Experiment

TRTREV * DEATHREV Crosstabulation

                                 DEATHREV
TRTREV                           Death    No Death   Total
Antiseptic   Count               6        34         40
             % within TRTREV     15.0%    85.0%      100.0%
Control      Count               16       19         35
             % within TRTREV     45.7%    54.3%      100.0%
Total        Count               22       53         75
             % within TRTREV     29.3%    70.7%      100.0%

Chi-Square Tests

                               Value      df   Asymp. Sig. (2-sided)   Exact Sig. (2-sided)   Exact Sig. (1-sided)
Pearson Chi-Square             8.495(b)   1    .004
Continuity Correction(a)       7.078      1    .008
Likelihood Ratio               8.687      1    .003
Fisher's Exact Test                                                    .005                   .004
Linear-by-Linear Association   8.382      1    .004
N of Valid Cases               75

a. Computed only for a 2x2 table

b. 0 cells (.0%) have expected count less than 5. The minimum expected count is 10.27.
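Fisher's exact test treats the margins of the 2x2 table as fixed and sums hypergeometric probabilities. The sketch below is one common formulation, offered as an assumption about how the exact significance values arise rather than a description of SPSS internals: the one-sided p-value is the smaller tail, and the two-sided p-value sums every table that is no more probable than the observed one. The function name `fisher_exact_2x2` is hypothetical.

```python
from math import comb

def fisher_exact_2x2(a, b, c, d):
    """One- and two-sided exact p-values for the table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    n = row1 + row2
    denom = comb(n, col1)

    def prob(k):
        # Hypergeometric probability that the top-left cell equals k.
        return comb(row1, k) * comb(row2, col1 - k) / denom

    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    probs = {k: prob(k) for k in range(lowest, highest + 1)}

    p_lower = sum(p for k, p in probs.items() if k <= a)
    p_upper = sum(p for k, p in probs.items() if k >= a)
    p_observed = probs[a]
    # Small tolerance so tables tied with the observed one are included.
    p_two_sided = sum(p for p in probs.values() if p <= p_observed * (1 + 1e-9))

    return min(p_lower, p_upper), min(1.0, p_two_sided)

# Antiseptic experiment: rows Antiseptic, Control; columns Death, No Death.
p_one, p_two = fisher_exact_2x2(6, 34, 16, 19)
print(f"one-sided p = {p_one:.4f}, two-sided p = {p_two:.4f}")
```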

McNemar’s Test

• ANALYZE → DESCRIPTIVE STATISTICS → CROSSTABS

• For ROWS, select the outcome for condition/time 1

• For COLUMNS, select the outcome for condition/time 2

• Under STATISTICS, click on MCNEMAR

• Under CELLS, click on OBSERVED and TOTAL PERCENTAGES


Example 5.6 - Report of Implant Leak

SELFREV * SURGREV Crosstabulation

                           SURGREV
SELFREV                    Present   Absent   Total
Present   Count            69        28       97
          % of Total       41.8%     17.0%    58.8%
Absent    Count            5         63       68
          % of Total       3.0%      38.2%    41.2%
Total     Count            74        91       165
          % of Total       44.8%     55.2%    100.0%

Chi-Square Tests

                   Value   Exact Sig. (2-sided)
McNemar Test               .000(a)
N of Valid Cases   165

a. Binomial distribution used.

The Exact Sig. (2-sided) entry is the P-value.
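The footnote "Binomial distribution used" indicates an exact test on the discordant pairs only (here 28 and 5). A minimal sketch of that calculation, assuming a two-sided binomial test with success probability 1/2 (the function name `mcnemar_exact` is hypothetical):

```python
from math import comb

def mcnemar_exact(b, c):
    """Two-sided exact McNemar p-value from the two discordant counts."""
    n = b + c
    k = min(b, c)
    # P(X <= k) for X ~ Binomial(n, 1/2), doubled for a two-sided test.
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)

# Discordant cells: SELFREV Present / SURGREV Absent = 28,
# SELFREV Absent / SURGREV Present = 5.
p_value = mcnemar_exact(28, 5)
print(f"exact two-sided p = {p_value:.6f}")
```

The concordant cells (69 and 63) do not enter the calculation. The result is far below .0005, which is why SPSS displays it as .000.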

Relative Risks and Odds Ratios

• ANALYZE → DESCRIPTIVE STATISTICS → CROSSTABS

• Under STATISTICS, click on RISK

• Under CELLS, click on OBSERVED and ROW PERCENTAGES


Example 5.1 - Pamidronate Study

PAMIDREV * SKLEVREV Crosstabulation

                                    SKLEVREV
PAMIDREV                            Yes      No       Total
Pamidronate   Count                 47       149      196
              % within PAMIDREV     24.0%    76.0%    100.0%
Placebo       Count                 74       107      181
              % within PAMIDREV     40.9%    59.1%    100.0%
Total         Count                 121      256      377
              % within PAMIDREV     32.1%    67.9%    100.0%

Risk Estimate

                                                           95% Confidence Interval
                                                   Value   Lower   Upper
Odds Ratio for PAMIDREV (Pamidronate / Placebo)    .456    .293    .710
For cohort SKLEVREV = Yes                          .587    .432    .795
For cohort SKLEVREV = No                           1.286   1.113   1.486
N of Valid Cases                                   377

Example 5.2 - Lip Cancer

PIPESREV * LIPCREV Crosstabulation

                               LIPCREV
PIPESREV                       Yes      No       Total
Yes     Count                  339      149      488
        % within PIPESREV      69.5%    30.5%    100.0%
No      Count                  198      351      549
        % within PIPESREV      36.1%    63.9%    100.0%
Total   Count                  537      500      1037
        % within PIPESREV      51.8%    48.2%    100.0%

Risk Estimate

                                             95% Confidence Interval
                                     Value   Lower   Upper
Odds Ratio for PIPESREV (Yes / No)   4.033   3.111   5.229
For cohort LIPCREV = Yes             1.926   1.698   2.185
For cohort LIPCREV = No              .478    .412    .554
N of Valid Cases                     1037
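The Risk Estimate tables can be approximated from the four cell counts. The sketch below assumes the usual large-sample (Wald) intervals on the log scale with z = 1.96; the function name `risk_estimates` and its argument order are hypothetical. The "For cohort ... = Yes" row corresponds to the relative risk of the first column, and the "= No" row is obtained by swapping the columns.

```python
import math

def risk_estimates(a, b, c, d, z=1.96):
    """Odds ratio and relative risk with Wald CIs for [[a, b], [c, d]].

    a, b: first row (exposed) with / without the outcome
    c, d: second row (unexposed) with / without the outcome
    """
    def with_ci(estimate, se_log):
        return (estimate,
                estimate * math.exp(-z * se_log),
                estimate * math.exp(z * se_log))

    odds_ratio = (a * d) / (b * c)
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)

    relative_risk = (a / (a + b)) / (c / (c + d))
    se_log_rr = math.sqrt(1 / a - 1 / (a + b) + 1 / c - 1 / (c + d))

    return {
        "odds_ratio": with_ci(odds_ratio, se_log_or),
        "relative_risk": with_ci(relative_risk, se_log_rr),
    }

# Example 5.1: rows Pamidronate, Placebo; columns SKLEVREV Yes, No.
pamidronate = risk_estimates(47, 149, 74, 107)
# Swapping the columns gives the "For cohort SKLEVREV = No" row.
pamidronate_no = risk_estimates(149, 47, 107, 74)["relative_risk"]

# Example 5.2: rows PIPESREV Yes, No; columns LIPCREV Yes, No.
lip_cancer = risk_estimates(339, 149, 198, 351)

for name, values in pamidronate.items():
    print(name, [round(v, 3) for v in values])
print("relative_risk (No)", [round(v, 3) for v in pamidronate_no])
```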

Correlation

• After importing your dataset and naming the variables, click on:

• ANALYZE → CORRELATE → BIVARIATE

• Select the VARIABLES

• Select the PEARSON correlation coefficient

• Select the TWO-TAILED test of significance

• Select FLAG SIGNIFICANT CORRELATIONS
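For reference, the Pearson correlation that BIVARIATE reports can be sketched as follows. The function names and the data are hypothetical. The two-tailed significance is based on the t statistic shown here with n - 2 degrees of freedom; the p-value itself is omitted because it needs the t distribution.

```python
import math

def pearson_r(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def correlation_t(r, n):
    """t statistic for testing zero correlation, with n - 2 df."""
    return r * math.sqrt((n - 2) / (1 - r ** 2))

# Hypothetical paired observations (not from the slides).
x = [1, 2, 3, 4, 5]
y = [2, 1, 4, 3, 5]

r = pearson_r(x, y)
t = correlation_t(r, len(x))
print(f"r = {r:.3f}, t = {t:.3f}, df = {len(x) - 2}")
```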

Linear Regression


• ANALYZE → REGRESSION → LINEAR

• Select the DEPENDENT VARIABLE

• Select the INDEPENDENT VARIABLE(S)

• Click on STATISTICS, then select ESTIMATES, CONFIDENCE INTERVALS, and MODEL FIT

Examples 7.1-7.6 - Gemfibrozil Clearance

Coefficients(a)

                       Unstandardized Coefficients   Standardized Coefficients                    95% Confidence Interval for B
Model                  B          Std. Error         Beta                        t        Sig.    Lower Bound   Upper Bound
1       (Constant)     460.828    54.338                                         8.481    .000    345.010       576.646
        CLCR           -3.215     1.181              -.575                       -2.723   .016    -5.732        -.698

a. Dependent Variable: CLGM

Examples 7.1-7.6 - Gemfibrozil Clearance

ANOVA(b)

Model                  Sum of Squares   df   Mean Square   F       Sig.
1       Regression     107168.2         1    107168.158    7.413   .016(a)
        Residual       216865.8         15   14457.723
        Total          324034.0         16

a. Predictors: (Constant), CLCR

b. Dependent Variable: CLGM

Model Summary(b)

Model   R         R Square   Adjusted R Square   Std. Error of the Estimate
1       .575(a)   .331       .286                120.240

a. Predictors: (Constant), CLCR

b. Dependent Variable: CLGM
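The three regression tables are linked: the Model Summary and the F test follow from the ANOVA sums of squares, and the slope's t statistic is B divided by its standard error. The sketch below recomputes these from the reported values using standard least-squares identities; the function name `model_summary` is hypothetical, and small differences in the last digit come from rounding in the printed output.

```python
import math

def model_summary(ss_regression, df_regression, ss_residual, df_residual):
    """F, R, R-squared, adjusted R-squared, and SE of the estimate."""
    ss_total = ss_regression + ss_residual
    df_total = df_regression + df_residual
    ms_regression = ss_regression / df_regression
    ms_residual = ss_residual / df_residual
    r_squared = ss_regression / ss_total
    return {
        "F": ms_regression / ms_residual,
        "R": math.sqrt(r_squared),
        "r_squared": r_squared,
        "adj_r_squared": 1 - (1 - r_squared) * df_total / df_residual,
        "std_error_estimate": math.sqrt(ms_residual),
    }

# Sums of squares and df from the ANOVA table (predictor CLCR, response CLGM).
summary = model_summary(107168.2, 1, 216865.8, 15)
for name, value in summary.items():
    print(f"{name}: {value:.3f}")

# Slope test from the Coefficients table: t = B / Std. Error.
t_slope = -3.215 / 1.181
print(f"t for CLCR: {t_slope:.3f}")
```

With a single predictor, the square of the slope's t statistic equals F (up to rounding), and R equals the absolute value of the standardized coefficient Beta.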
