Copyright ©2010 by Pearson Education, Inc., Upper Saddle River, New Jersey 07458
All rights reserved.
Statistics and Data Analysis for Nursing Research, Second Edition, Denise F. Polit
Statistics and Data Analysis for Nursing Research
Second Edition
Chapter 4
Bivariate Description: Crosstabulation, Risk Indexes, and Correlation
Bivariate Descriptive Statistics
• Bivariate descriptive statistics are used to describe relationships between two variables
  – Examples: height and weight; smoking status and lung cancer incidence
• Appropriate statistic depends on the variables’ level of measurement
Crosstabulation
• Researchers crosstabulate the frequencies of all categories of two variables in a two-dimensional frequency distribution
  – Results are displayed in a contingency table (crosstab table)
• Crosstabulated variables should be nominal level (or ordinal level with a small number of categories)
Crosstab Tables
• Crosstab tables are described by the number of categories of each variable
  – E.g., a 2 × 2 table summarizes counts and percentages for two dichotomous variables (e.g., male/female, smoker/nonsmoker)
• The number of cells in the table is the product of the numbers of categories of the two variables:
  – 2 × 2 table = 4 cells
  – 3 × 3 table = 9 cells
Example of a Crosstab Table
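• The Total row and Total column show marginal frequencies (totals)

                          Male      Female    Total
Smoker       Count        10        10        20
             Row %        50.0%     50.0%     100.0%
             Column %     20.0%     20.0%     20.0%
             Total %      10.0%     10.0%     20.0%
Non-smoker   Count        40        40        80
             Row %        50.0%     50.0%     100.0%
             Column %     80.0%     80.0%     80.0%
             Total %      40.0%     40.0%     80.0%
Total        Count        50        50        100
             Row %        50.0%     50.0%     100.0%
             Column %     100.0%    100.0%    100.0%
             Total %      50.0%     50.0%     100.0%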
Example of a Crosstab Table (cont’d)
• In the same table, the four cells (2 × 2) are the Male and Female entries in the Smoker and Non-smoker rows; the Total row and Total column are not cells
How to Read Cells in a Crosstab Table
Male column, with an explanation of each entry:

Smoker: 10 / 50.0% / 20.0% / 10.0%
  • 10 men are smokers
  • 50.0% of all smokers are male
  • 20.0% of all males are smokers
  • 10.0% of all sample members are male smokers
Non-smoker: 40 / 50.0% / 80.0% / 40.0%
  • Cell count
  • Row percentage (40 ÷ 80)
  • Column percentage (40 ÷ 50)
  • Overall percentage (40 ÷ 100)
Total: 50 / 50.0% / 100.0% / 50.0%
  • Column total
  • Row percentage
  • Column percentage
  • Overall percentage
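
The four entries in each cell can be reproduced from the counts alone. The following is a minimal Python sketch, not part of the original slides; the function name `crosstab_percentages` and the dictionary layout are illustrative assumptions, while the counts come from the smoking-by-sex example.

```python
# Cell counts from the smoking-by-sex example
# (rows: smoking status, columns: sex).
counts = {
    "Smoker": {"Male": 10, "Female": 10},
    "Non-smoker": {"Male": 40, "Female": 40},
}


def crosstab_percentages(counts):
    """Return count, row %, column %, and overall % for every cell."""
    row_totals = {row: sum(cells.values()) for row, cells in counts.items()}
    col_totals = {}
    for cells in counts.values():
        for col, n in cells.items():
            col_totals[col] = col_totals.get(col, 0) + n
    grand_total = sum(row_totals.values())

    result = {}
    for row, cells in counts.items():
        for col, n in cells.items():
            result[(row, col)] = {
                "count": n,
                "row_pct": 100 * n / row_totals[row],
                "col_pct": 100 * n / col_totals[col],
                "total_pct": 100 * n / grand_total,
            }
    return result


table = crosstab_percentages(counts)
for (row, col), cell in table.items():
    print(
        f"{row:<11}{col:<7}{cell['count']:>4}  "
        f"row {cell['row_pct']:.1f}%  col {cell['col_pct']:.1f}%  "
        f"total {cell['total_pct']:.1f}%"
    )
```

The row percentage divides by the row total, the column percentage by the column total, and the overall percentage by the full sample size, which is why the same count of 10 yields 50.0%, 20.0%, and 10.0%.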
Risk Indexes
• Risk indexes have been developed to describe risk outcomes and facilitate clinical decision making
• Indexes discussed here: for situations with two dichotomous variables (2 × 2 situation)
  – One is a risk factor, or an intervention status (e.g., smoked/did not smoke)
  – The other is the outcome (e.g., lung cancer/no lung cancer)
Risk Index Scenarios
• Prospective (cohort) design:
  – Some people have exposure to the risk factor, others do not
  – Both groups are followed to assess outcome
• Retrospective (case-control) design:
  – Some people have a bad outcome (cases), others do not (controls)
  – Groups are compared regarding prior exposure to the risk factor
Risk Index Scenarios (cont’d)
• Experimental design (clinical trials):
  – Some people are assigned (often at random) to a control group in which they have ongoing or “baseline” exposure to risks, while others are assigned to an experimental group in which they receive an intervention hypothesized to reduce risk
  – Both groups are followed to assess outcome
Risk Index Types
• Risk indexes capture two aspects of the effects of risk exposure:
  – Absolute risk: indexes quantify the actual amount of risk related to different exposures
  – Relative risk: indexes compare risks in the two risk exposure groups
• Both types are important and should be examined in interpreting the effects of risk (or an intervention)
Hypothetical Data for Risk Index
• Fictitious data for the effect of a shorter versus longer needle on swelling for pediatric immunizations

Needle Length    Swelling: Yes    Swelling: No    Total
16 mm needle     20 (a)           80 (b)          100
25 mm needle     10 (c)           90 (d)          100
Total            30               170             200
Absolute Risk
• Absolute risk is the proportion of people with a negative outcome
  – ARE: proportion in risk-exposed group with the outcome (a ÷ (a + b))
  – ARNE: proportion in nonexposed group with the outcome (c ÷ (c + d))
Absolute Risk (cont’d)
• In our example:
  – ARE = .20 (20 ÷ 100): 20% of those with the shorter needle had swelling
  – ARNE = .10 (10 ÷ 100): 10% of those with the longer needle had swelling
Absolute Risk Reduction
• Absolute risk reduction is the absolute difference between the two risk groups
• ARR = ARE − ARNE
  – In our example: ARR = .20 − .10 = .10
  – That is, there was a 10 percentage point reduction in risk of swelling with the longer needle
Relative Risk
• Relative risk is the ratio of absolute risks (adverse outcomes) in the two groups
• RR = ARE ÷ ARNE
  – In our example, RR = .20 ÷ .10 = 2.00
  – That is, children immunized with the shorter needle were twice as likely to have swelling
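
The absolute and relative indexes so far all follow from the four cell counts a, b, c, and d. A minimal Python sketch using the fictitious needle-length counts is given below; the variable names are illustrative assumptions, not notation from the textbook.

```python
# Cell counts from the fictitious needle-length table.
a, b = 20, 80  # 16 mm needle (risk-exposed): swelling yes / no
c, d = 10, 90  # 25 mm needle (nonexposed): swelling yes / no

are = a / (a + b)   # absolute risk, exposed group
arne = c / (c + d)  # absolute risk, nonexposed group
arr = are - arne    # absolute risk reduction
rr = are / arne     # relative risk

print(f"ARE  = {are:.2f}")
print(f"ARNE = {arne:.2f}")
print(f"ARR  = {arr:.2f}")
print(f"RR   = {rr:.2f}")
```

ARR is a difference (reported in percentage points), whereas RR is a ratio, so the two answer different questions about the same table.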
Relative Risk Reduction
• Relative risk reduction is the proportion of baseline risk that is reduced through nonexposure (or receipt of an intervention)
• RRR = ARR ÷ ARE (here the baseline risk is the risk in the exposed group)
  – In our example, RRR = .10 ÷ .20 = .50
  – That is, being immunized with the longer needle reduced the risk of swelling by 50% relative to the shorter needle
Odds
• The odds is the proportion of people in each risk group who have the adverse outcome, relative to the proportion who do not
• OddsE = a ÷ b
• OddsNE = c ÷ d
Odds (cont’d)
• In our example:
• OddsE = 20 ÷ 80 = .25
  – Among children immunized with the 16 mm needle, the odds of swelling were 1 to 4
• OddsNE = 10 ÷ 90 = .111
  – Among children immunized with the 25 mm needle, the odds of swelling were 1 to 9
Odds Ratio
• The odds ratio is the ratio of the two odds
• OR = OddsE ÷ OddsNE
  – In our example, OR = .25 ÷ .111 = 2.25
  – The odds of swelling are two and a quarter times as high with the shorter needle as with the longer one
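
The odds and the odds ratio can be checked directly from the cell counts. The sketch below is illustrative only; the variable names are assumptions. It also computes the algebraically equivalent cross-product form (a × d) ÷ (b × c), which is not shown on the slides.

```python
# Cell counts from the fictitious needle-length table.
a, b = 20, 80  # 16 mm needle: swelling yes / no
c, d = 10, 90  # 25 mm needle: swelling yes / no

odds_exposed = a / b        # outcome vs. no outcome, exposed group
odds_nonexposed = c / d     # outcome vs. no outcome, nonexposed group
odds_ratio = odds_exposed / odds_nonexposed

# Equivalent cross-product form of the same ratio.
cross_product = (a * d) / (b * c)

print(f"Odds (exposed)    = {odds_exposed:.3f}")
print(f"Odds (nonexposed) = {odds_nonexposed:.3f}")
print(f"Odds ratio        = {odds_ratio:.2f}")
```

Note that odds divide by the number without the outcome (b or d), whereas absolute risk divides by the group total (a + b or c + d).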
Number Needed to Treat
• Number needed to treat: Estimate of how many people would need to avoid the exposure (or get a treatment) to prevent one negative outcome
• NNT = 1 ÷ ARR
  – In our example, NNT = 1 ÷ .10 = 10.0
  – 10 children would need to be immunized with the 25 mm needle rather than the 16 mm one to prevent one case of swelling
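
As a quick check of the NNT arithmetic, here is a small Python sketch. The function name `number_needed_to_treat` is a hypothetical helper, and exact fractions are used only to sidestep floating-point rounding; neither choice comes from the textbook.

```python
from fractions import Fraction


def number_needed_to_treat(a, b, c, d):
    """NNT = 1 / ARR, computed from the four cell counts of a 2 x 2 table."""
    are = Fraction(a, a + b)   # absolute risk, exposed group
    arne = Fraction(c, c + d)  # absolute risk, nonexposed group
    arr = are - arne           # absolute risk reduction
    if arr == 0:
        raise ValueError("ARR is zero, so NNT is undefined")
    return 1 / arr


nnt = number_needed_to_treat(20, 80, 10, 90)
print(f"NNT = {float(nnt):.1f}")
```

Because NNT is the reciprocal of ARR, a small absolute risk reduction translates into a large number needed to treat.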
Risk Index Considerations
• RR is usually preferred to OR as the index of comparative risk, because it is more intuitively meaningful
  – However, RR should not be used in case-control studies because you cannot estimate the probability of a bad outcome for someone with/without exposure
Risk Index Considerations (cont’d)
• In many cases, the values of RR and OR are similar
  – Similarity in values increases as the difference in outcomes between the two risk groups decreases
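
To illustrate how RR and OR converge as the group difference shrinks, the sketch below compares them for three pairs of absolute risks. Only the first pair (.20 and .10) comes from the needle example; the other two pairs are made-up values chosen for illustration, and the function names are assumptions.

```python
def relative_risk(risk_exposed, risk_nonexposed):
    """Ratio of the two absolute risks."""
    return risk_exposed / risk_nonexposed


def odds_ratio(risk_exposed, risk_nonexposed):
    """Ratio of the two odds, each derived from an absolute risk."""
    odds_e = risk_exposed / (1 - risk_exposed)
    odds_ne = risk_nonexposed / (1 - risk_nonexposed)
    return odds_e / odds_ne


# (ARE, ARNE) pairs with a progressively smaller difference between groups.
scenarios = [(0.20, 0.10), (0.15, 0.10), (0.11, 0.10)]

gaps = []
for are, arne in scenarios:
    rr = relative_risk(are, arne)
    oratio = odds_ratio(are, arne)
    gaps.append(oratio - rr)
    print(f"ARE={are:.2f} ARNE={arne:.2f}  RR={rr:.2f}  OR={oratio:.2f}  gap={oratio - rr:.3f}")
```

In these scenarios the gap between OR and RR narrows steadily as ARE approaches ARNE.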
Risk Index Considerations (cont’d)
• Because RR is a relative (comparative) measure, it is insensitive to absolute values. For example:
  – ARE = .60, ARNE = .30, RR = 2.00
  – ARE = .20, ARNE = .10, RR = 2.00
• Despite a threefold reduction in negative outcomes in both risk groups in the second example, the RR remains the same
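
The two examples can be compared side by side to show why an absolute index is worth reporting alongside RR. This is an illustrative Python sketch; the dictionary keys and variable names are assumptions.

```python
# (ARE, ARNE) for the two examples on the slide.
examples = {"first": (0.60, 0.30), "second": (0.20, 0.10)}

results = {}
for label, (are, arne) in examples.items():
    results[label] = {
        "RR": are / arne,   # relative risk: identical in both examples
        "ARR": are - arne,  # absolute risk reduction: differs
    }

for label, values in results.items():
    print(f"{label:<7} RR = {values['RR']:.2f}  ARR = {values['ARR']:.2f}")
```

RR is 2.00 in both cases, but the absolute risk reduction is .30 in the first example and only .10 in the second.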
Correlation
• A correlation is a bond or connection between variables
  – Variation in one variable is systematically related to variation in another
• Correlations between two quantitative variables can be graphed in a scatterplot
Scatterplot
• A scatterplot graphs the values of one variable on the X axis and the values of the second one on the Y axis of a graph
Scatterplot (cont’d)
• A scatterplot indicates whether the variables have a linear relationship with each other
  – A linear (straight line) relationship occurs when there is a constant rate of change between the two variables
• Scatterplots indicate direction and magnitude of the relationship
Scatterplots and Types of Relationships
[Figure: scatterplot of Variable Y (vertical axis) against Variable X (horizontal axis), both scaled 0 to 12]
• Lines sloping from lower left to upper right depict positive relationships:
  – Low values of one variable correspond to low values of the other, and high values in one correspond to high values in the other
• This graph illustrates a perfect relationship
• For each value of X, we can perfectly predict the value of Y
Scatterplots and Negative Relationships
• Lines sloping from upper left to lower right are negative relationships:
  – Low values of one variable correspond to high values of the other, and vice versa
• This graph illustrates a perfect negative relationship
• As before, for each value of X, we can perfectly predict the value of Y
[Figure: scatterplot of Variable Y (vertical axis) against Variable X (horizontal axis), both scaled 0 to 12]
Scatterplots and Relationship Strength
• If data points are tightly packed along the diagonal, it indicates a strong relationship
  – Top graph shows a strong, positive relationship
• If data points are loosely spaced, but suggest a diagonal, it indicates a weak relationship
  – Bottom graph shows a relatively weak negative relationship
[Figure: two scatterplots of Variable Y against Variable X, both axes scaled 0 to 12; top: strong positive relationship; bottom: relatively weak negative relationship]
Other Types of Relationship
• If data points are seemingly random (widely scattered), there is no relationship
  – Top graph shows two unrelated variables
• Sometimes data points are not linearly related: they are positively or negatively correlated, but only up to a point, then the relationship changes
  – Bottom graph shows a curvilinear relationship
[Figure: two scatterplots of Variable Y against Variable X; top: unrelated variables; bottom: curvilinear relationship]
Correlation Coefficients
• A correlation coefficient is a statistic that summarizes the magnitude and direction of relationships between two variables
• Most widely used correlation coefficient: Pearson’s product moment correlation coefficient
  – Often called Pearson’s r
  – Pearson’s r is computed with variables that are interval- or ratio-level measures
Correlation Coefficient Values
• Correlation coefficients range from -1.00 through .00 to 1.00
• The sign of the coefficient indicates direction:
  – Minus sign = negative correlation
  – Plus sign (or no sign) = positive correlation
• The absolute value of the coefficient indicates strength
  – r = -.75 is stronger than r = .50
Correlation Coefficient Computation
• Formula for computing Pearson’s r is cumbersome, though not really difficult
• Formula involves calculating and manipulating the deviation scores from the two variables (i.e., deviation of each score from its own mean)
• Computation is rarely done manually
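
For readers who want to see the deviation-score computation spelled out, here is a minimal Python sketch. It uses the standard deviation-score form of Pearson's r: the sum of cross-products of deviations, divided by the square root of the product of the two sums of squared deviations. The function name `pearson_r` and the small data set are hypothetical and not from the textbook.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson's r computed from deviation scores."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Deviation of each score from its own mean.
    dev_x = [xi - mean_x for xi in x]
    dev_y = [yi - mean_y for yi in y]
    sum_cross = sum(dx * dy for dx, dy in zip(dev_x, dev_y))
    sum_sq_x = sum(dx * dx for dx in dev_x)
    sum_sq_y = sum(dy * dy for dy in dev_y)
    return sum_cross / sqrt(sum_sq_x * sum_sq_y)


# Hypothetical paired scores for five cases.
x_values = [1, 2, 3, 4, 5]
y_values = [2, 4, 5, 4, 5]

r = pearson_r(x_values, y_values)
print(f"r = {r:.2f}")
```

For these made-up data the cross-products sum to 6 and the sums of squared deviations are 10 and 6, so r = 6 ÷ √60, or about .77.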
Correlation Coefficient Examples
• 1.00 = Perfect positive relationship
  – E.g., a flat $1 tax for every $5 earned
• .35 = Weak/moderate positive relationship
  – E.g., nurses’ degree of autonomy and job satisfaction (those with more autonomy are somewhat more satisfied)
• .00 = No relationship
  – E.g., nurses’ degree of autonomy and height (tall and short nurses equally autonomous)
• -.20 = Weak negative relationship
  – E.g., diabetic knowledge and a person’s age (older people are somewhat less knowledgeable)
• -.70 = Strong negative relationship
  – E.g., levels of depression and life satisfaction (those with high levels of depression have lower life satisfaction)
Correlation Coefficients and Scatterplots
[Figure: four scatterplots of Variable Y against Variable X, both axes scaled 0 to 12, labeled r = -1.00 and r = .96 (top row) and r = -.31 and r = .00 (bottom row)]
Interpretation of Correlation Coefficients
• A correlation between two variables does not, by itself, imply that one variable caused the other
  – Correlations indicate a link, not necessarily a causal link
• The square of r indicates the proportion of variability in one variable accounted for or explained by the second variable
  – If the r between height and weight = .60, then 36% of the variation in weight is accounted for by height (r² = .36)
  – The remaining 64% of variation in weight is accounted for by other factors (e.g., caloric intake, amount of exercise, metabolic factors)
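
The squaring step is easy to verify. This short Python sketch (the function name `variance_explained` is an assumption) applies it to the height and weight example and to a few other coefficients used in this chapter.

```python
def variance_explained(r):
    """Proportion of variability in one variable accounted for by the other."""
    return r ** 2


for r in (0.60, -0.75, 0.50, 0.00):
    shared = variance_explained(r)
    print(f"r = {r:+.2f}  r squared = {shared:.2f}  unexplained = {1 - shared:.2f}")
```

Because the coefficient is squared, the sign drops out: r = -.75 accounts for more variability (about 56%) than r = .50 (25%).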
Correlation Matrix
• A two-dimensional correlation matrix is an efficient way to display several correlation coefficients
• A correlation matrix lists all variables in the top row and first column—then information about the correlation between variables is entered in the appropriate “cells”
• Diagonal cells (each variable’s correlation with itself) are usually blank or have the value 1.00
Example of a Correlation Matrix
• Variables A and B are highly correlated
• Other relationships in the matrix are weak to moderate

              A       B       C       D
Variable A   1.00
Variable B    .82    1.00
Variable C   -.23    -.35    1.00
Variable D    .07     .17    -.02    1.00
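
A matrix with this lower-triangle layout can be built by computing Pearson's r for every pair of variables. The Python sketch below is illustrative only: the data values are invented (they do not reproduce the coefficients in the table above), and `pearson_r` is a hypothetical helper written out so the block is self-contained.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson's r computed from deviation scores."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    dev_x = [xi - mean_x for xi in x]
    dev_y = [yi - mean_y for yi in y]
    sum_cross = sum(dx * dy for dx, dy in zip(dev_x, dev_y))
    return sum_cross / sqrt(sum(dx * dx for dx in dev_x) * sum(dy * dy for dy in dev_y))


# Invented scores for four variables measured on the same five cases.
data = {
    "A": [2, 4, 5, 7, 9],
    "B": [1, 3, 6, 6, 10],
    "C": [8, 5, 6, 3, 4],
    "D": [3, 7, 2, 6, 4],
}

names = list(data)
matrix = [[pearson_r(data[row], data[col]) for col in names] for row in names]

# Print only the lower triangle, including the diagonal of 1.00s.
print("          " + "".join(f"{name:>7}" for name in names))
for i, row_name in enumerate(names):
    cells = "".join(f"{matrix[i][j]:>7.2f}" for j in range(i + 1))
    print(f"Variable {row_name}{cells}")
```

Only the lower triangle is printed because the matrix is symmetric: the correlation of A with B equals the correlation of B with A.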
Crosstabs in SPSS
• Use Analyze → Descriptive Statistics → Crosstabs
• Select one variable as the row variable and another as the column variable
• Use the arrows to move variables from the main list
Crosstabs in SPSS (cont’d)
• Statistics pushbutton allows you to select risk index calculation
• Cell pushbutton allows you to decide what statistics appear in the crosstabs table
• You can also request bar charts
Cell Display in SPSS Crosstabs
• Observed Counts is the default
• Select whether you want Row, Column, or Total Percentages (or any combination of these three)
Statistics Options in SPSS Crosstabs
• Several statistics options will be discussed in later chapters
• But here we see where we can obtain risk index statistics
Bivariate Correlation in SPSS
• Use Analyze → Correlate → Bivariate
• Move variables to be correlated from the main list into analysis list, using arrow
• Pearson coefficients are the default