Dec 13, 2015
Chapter 5
Correlation
I Introduction to Correlation and Regression
A. Describing the Linear Relationship Between Two Variables, X and Y
1. Pearson product-moment correlation coefficient (r)
2. Bivariate frequency distributions (scatterplots) for various correlation coefficients (r)
[Figure: six scatterplots of Y against X, both axes running from 10 to 50, for r = +1, r = .80, r = .30, r = 0, r = –.20, and r = –1]
3. Upper and lower limits for r: +1 to –1
B. Correlation and Regression Distinguished
1. Characteristics of regression situations
One dependent variable, Y, and one or more independent variables, X
Levels of independent variables are selected in advance
The value of the dependent variable for a given level of the independent variable is free to vary
The researcher is primarily interested in predicting Y from a knowledge of X
2. Characteristics of correlation situations
Neither variable is considered the independent variable
The researcher is primarily interested in assessing the strength of the relationship between X and Y
X and Y are both free to vary
II Correlation
A. Formula for Pearson Product-Moment Correlation Coefficient
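$$
r = \frac{S_{XY}}{S_X S_Y} = \frac{\dfrac{\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}{n}}{\sqrt{\dfrac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n}} \sqrt{\dfrac{\sum_{i=1}^{n} (Y_i - \bar{Y})^2}{n}}}
$$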
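The formula for r can be sketched in code as follows. This is a minimal illustration, not part of the slides: the function name `pearson_r` and the two small data sets are hypothetical, chosen so that the results sit at the upper and lower limits of r.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson r as the covariance divided by the product of the standard deviations.

    Every term divides by n, as in the slide formula; the n's cancel,
    so dividing by n - 1 throughout would give the same r.
    """
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length samples with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n  # covariance
    s_x = sqrt(sum((x - mean_x) ** 2 for x in xs) / n)  # standard deviation of X
    s_y = sqrt(sum((y - mean_y) ** 2 for y in ys) / n)  # standard deviation of Y
    return s_xy / (s_x * s_y)  # undefined (division by zero) if either variable is constant

# Hypothetical data: points on a rising line and on a falling line.
perfect_pos = pearson_r([1, 2, 3, 4], [10, 20, 30, 40])
perfect_neg = pearson_r([1, 2, 3, 4], [8, 6, 4, 2])
print(perfect_pos, perfect_neg)
```

With points exactly on a straight line, r comes out at +1 or –1 (up to floating-point rounding), which are the limits stated earlier.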
1. Understanding the formula for r; what the numerator tells you
Covariance
$$
S_{XY} = \frac{\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}{n}
$$
Information in the cross products
$$
\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})
$$
[Figure: two scatterplots (a and b) of Variable Y against Variable X, each divided into four quadrants by lines at $\bar{X}$ and $\bar{Y}$. Quadrants 1 and 3: $(X_i - \bar{X})(Y_i - \bar{Y}) > 0$. Quadrants 2 and 4: $(X_i - \bar{X})(Y_i - \bar{Y}) < 0$.]
2. If the majority of the data points fall in quadrants 1 and 3, the sum of the cross products is positive and r > 0
3. If the majority of the data points fall in quadrants 2 and 4, the sum of the cross products is negative and r < 0
4. If the data points are equally dispersed over the four quadrants, the sum of the cross products equals zero and r = 0
5. The sum of the cross products is largest in absolute value when the data points fall on a straight line
6. The sum of the cross products is small when the data points fall in an elongated circle (ellipse)
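The quadrant argument can be checked numerically. The sketch below is illustrative only: `quadrant_summary` and the six data points are hypothetical, and the quadrant numbering (1 = upper right, 2 = upper left, 3 = lower left, 4 = lower right) is an assumption that matches the signs given for each quadrant.

```python
def quadrant_summary(xs, ys):
    """Count points per quadrant (relative to the means) and sum the cross products."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    counts = {1: 0, 2: 0, 3: 0, 4: 0}
    cross_sum = 0.0
    for x, y in zip(xs, ys):
        dx, dy = x - mean_x, y - mean_y
        cross_sum += dx * dy
        if dx > 0 and dy > 0:
            counts[1] += 1  # both above their means: positive cross product
        elif dx < 0 and dy > 0:
            counts[2] += 1  # negative cross product
        elif dx < 0 and dy < 0:
            counts[3] += 1  # both below their means: positive cross product
        elif dx > 0 and dy < 0:
            counts[4] += 1  # negative cross product
        # Points lying exactly on a mean line contribute 0 and are not counted.
    return counts, cross_sum

# Hypothetical data with most points in quadrants 1 and 3.
counts, cross_sum = quadrant_summary([1, 2, 3, 4, 5, 6], [2, 1, 4, 3, 6, 5])
print(counts, cross_sum)
```

Here four of the six points fall in quadrants 1 and 3, so the sum of the cross products is positive, and r is therefore positive as well.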
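Table 1. Height and Weight of Girls' Basketball Team

| (1) Girl | (2) Height, $X_i$ | (3) Weight, $Y_i$ | (4) $(X_i - \bar{X})^2$ | (5) $(Y_i - \bar{Y})^2$ | (6) $(X_i - \bar{X})(Y_i - \bar{Y})$ |
|---|---|---|---|---|---|
| 1 | 7.0 | 140 | .64 | 289 | 13.6 |
| 2 | 6.5 | 130 | .09 | 49 | 2.1 |
| 3 | 6.5 | 140 | .09 | 289 | 5.1 |
| 4 | 6.5 | 130 | .09 | 49 | 2.1 |
| 5 | 6.5 | 120 | .09 | 9 | –0.9 |
| 6 | 6.0 | 120 | .04 | 9 | 0.6 |
| 7 | 6.0 | 130 | .04 | 49 | –1.4 |
| 8 | 6.0 | 110 | .04 | 169 | 2.6 |
| 9 | 5.5 | 100 | .49 | 529 | 16.1 |
| 10 | 5.5 | 110 | .49 | 169 | 9.1 |
| | $\bar{X} = 6.2$ | $\bar{Y} = 123$ | $\sum = 2.10$ | $\sum = 1610$ | $\sum = 49.0$ |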
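C. Computation of r for Data in Table 1
$$
r = \frac{\dfrac{\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}{n}}{\sqrt{\dfrac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n}} \sqrt{\dfrac{\sum_{i=1}^{n} (Y_i - \bar{Y})^2}{n}}} = \frac{\dfrac{49.0}{10}}{\sqrt{\dfrac{2.10}{10}} \sqrt{\dfrac{1610}{10}}} = \frac{4.90}{5.815} = .84
$$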
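The Table 1 computation can be reproduced step by step. This is a sketch using only the heights and weights listed in the table; the variable names are illustrative.

```python
from math import sqrt

# Columns (2) and (3) of Table 1.
heights = [7.0, 6.5, 6.5, 6.5, 6.5, 6.0, 6.0, 6.0, 5.5, 5.5]
weights = [140, 130, 140, 130, 120, 120, 130, 110, 100, 110]

n = len(heights)
mean_x = sum(heights) / n
mean_y = sum(weights) / n

# Column sums (4), (5), and (6).
sum_xx = sum((x - mean_x) ** 2 for x in heights)
sum_yy = sum((y - mean_y) ** 2 for y in weights)
sum_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(heights, weights))

# Covariance over the product of the two standard deviations.
r = (sum_xy / n) / (sqrt(sum_xx / n) * sqrt(sum_yy / n))
print(round(mean_x, 1), round(mean_y, 1), round(sum_xx, 2), round(sum_yy), round(sum_xy, 1), round(r, 2))
```

The column sums agree with the totals in Table 1, and r rounds to .84.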
III Interpretation of the Correlation Coefficient
A. Coefficient of Determination, $r^2$, and Nondetermination, $k^2$
$$
\frac{S_Y^2}{S_Y^2} = r^2 + k^2
$$
$S_Y^2 / S_Y^2$: total Y variance expressed as a proportion
$r^2$: proportion of Y variance explained by X variance
$k^2$: proportion of Y variance not explained by X variance
B. Visual Representation of $r^2$ and $k^2$
[Figure: four diagrams (a to d) relating the variance in Y to the variance in X. a. r = .84: $r^2$ = .71, $k^2$ = .29. b. r = .40: $r^2$ = .16, $k^2$ = .84. c and d. The limiting cases r = 0 ($r^2$ = 0, $k^2$ = 1) and r = 1 ($r^2$ = 1, $k^2$ = 0).]
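The split of Y variance into explained and unexplained proportions follows directly from r. A minimal sketch, with a hypothetical helper named `determination`, applied to the four r values shown in the diagrams:

```python
def determination(r):
    """Return (r squared, k squared) for a correlation coefficient r."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("r must lie between -1 and +1")
    r2 = r * r       # proportion of Y variance explained by X variance
    k2 = 1.0 - r2    # proportion of Y variance not explained by X variance
    return r2, k2

for r in (0.84, 0.40, 0.0, 1.0):
    r2, k2 = determination(r)
    print(f"r = {r:.2f}  r^2 = {r2:.2f}  k^2 = {k2:.2f}")
```

Because the explained proportion depends on the square of r, raising r from .40 to .80 raises $r^2$ from .16 to .64, a fourfold rather than twofold increase.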
IV Common Errors in Interpreting r
A. Interpreting r in Direct Proportion to its Size
B. Interpreting r in Terms of Arbitrary Labels
r ≥ .90 very high
r = .70–.89 high
r = .30–.69 medium
r < .30 low
1. Typical reliability coefficients
2. Typical validity coefficients
C. Inferring Causation from Correlation
V Some Factors That Affect the Correlation Coefficient
A. Nature of the Relationship Between X and Y
[Figure: three scatterplots (a, b, c) of Y against X]
1. Eta or eta squared can be used to describe the curvilinear relation between X and Y
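To see why a separate index is needed for curvilinear relations, the sketch below compares r with eta squared on a small inverted-U data set. Everything here is an assumption for illustration: the data are hypothetical, and since the slides give no formula for eta, the code uses the common definition of the correlation ratio (between-group sum of squares of Y, grouped by X value, over the total sum of squares of Y).

```python
from collections import defaultdict
from math import sqrt

def pearson_r(xs, ys):
    """Pearson r from deviation sums (the n's in the formula cancel)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

def eta_squared(xs, ys):
    """Correlation ratio: SS between X groups divided by SS total, both for Y."""
    groups = defaultdict(list)
    for x, y in zip(xs, ys):
        groups[x].append(y)
    grand_mean = sum(ys) / len(ys)
    ss_total = sum((y - grand_mean) ** 2 for y in ys)
    ss_between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups.values())
    return ss_between / ss_total

# Hypothetical inverted-U data: two Y scores at each of five X levels.
xs = [1, 1, 2, 2, 3, 3, 4, 4, 5, 5]
ys = [5, 7, 8, 10, 9, 11, 8, 10, 5, 7]

r = pearson_r(xs, ys)
eta2 = eta_squared(xs, ys)
print(r, round(eta2, 2))
```

For these data r is zero even though Y clearly depends on X, while eta squared is about .74.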
B. Truncated Range
[Figure: scatterplot of production units per day (Y) against aptitude score (X)]
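The effect of a truncated range can be illustrated with constructed data. The sketch below is hypothetical throughout: the aptitude scores, the production values, and the cutoff of 70 are invented for the example and are not the values plotted in the figure.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson r from deviation sums (the n's in the formula cancel)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Hypothetical data: aptitude scores 30..89; production rises with aptitude,
# with a fixed alternating deviation of +10 or -10 units around the trend.
aptitude = list(range(30, 90))
production = [x + 20 + (10 if x % 2 == 0 else -10) for x in aptitude]

r_full = pearson_r(aptitude, production)

# Truncated range: keep only cases with an aptitude score of 70 or higher.
kept = [(x, y) for x, y in zip(aptitude, production) if x >= 70]
r_truncated = pearson_r([x for x, _ in kept], [y for _, y in kept])

print(round(r_full, 2), round(r_truncated, 2))
```

The scatter around the trend is the same in both cases, but restricting X to the top of its range cuts r from about .86 to about .44.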
C. Subgroups with Different Means or Standard Deviations
[Figure: six scatterplots (a to f) of Y against X for combined subgroups, with the subgroup means marked on both axes. Panels a and b show subgroups A and B and subgroups L and M; one of them plots school achievement (Y) against anxiety (X).
a. Combined r is spuriously high.
b. Combined r is spuriously low.
c. Combined r is spuriously high for B and low for A.
d. Combined r is spuriously low.
e, f. The signs (+ or –) of the subgroup correlations compared with the sign of $r_{\text{combined}}$.]
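A small constructed example shows how combining subgroups with different means can even reverse the sign of r. The two subgroups below are hypothetical and are not the data plotted in the panels.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson r from deviation sums (the n's in the formula cancel)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Hypothetical subgroups: A is low on X and high on Y, B is high on X and low on Y.
group_a_x, group_a_y = [1, 2, 3], [6, 7, 8]
group_b_x, group_b_y = [6, 7, 8], [1, 2, 3]

r_a = pearson_r(group_a_x, group_a_y)  # positive within A
r_b = pearson_r(group_b_x, group_b_y)  # positive within B
r_combined = pearson_r(group_a_x + group_b_x, group_a_y + group_b_y)

print(round(r_a, 2), round(r_b, 2), round(r_combined, 2))
```

Within each subgroup the relation is perfectly positive, yet the combined r is about –.81, because the difference between the subgroup means dominates the pooled cross products.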
D. Discontinuous Distribution
[Figure: scatterplot of son's authoritarianism (Y) against father's authoritarianism (X), with a region of discontinuity marked]
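One way to explore the effect of a discontinuous distribution is to remove the middle of the X range from a constructed data set and recompute r. The sketch below rests on assumptions: the data are hypothetical (not the authoritarianism scores in the figure), and the slide itself does not state the direction of the effect. In this example the gap makes r larger.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson r from deviation sums (the n's in the formula cancel)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Hypothetical scores: X runs from 30 to 89; Y follows X with a fixed
# alternating deviation of +10 or -10 around the trend.
x_all = list(range(30, 90))
y_all = [x + (10 if x % 2 == 0 else -10) for x in x_all]
r_continuous = pearson_r(x_all, y_all)

# Discontinuous version: no cases with X between 50 and 69.
kept = [(x, y) for x, y in zip(x_all, y_all) if x < 50 or x > 69]
r_discontinuous = pearson_r([x for x, _ in kept], [y for _, y in kept])

print(round(r_continuous, 2), round(r_discontinuous, 2))
```

Sampling only the two ends of the X distribution raises r from about .86 to about .90 here, although the scatter around the trend is unchanged.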
E. Non-Normal Distributions
[Figure: four scatterplots of Y against X, each divided into quadrants at $\bar{X}$ and $\bar{Y}$, with a quadrant in each panel labeled "Most scores will fall in this quadrant"]
VI Spearman Rank Correlation (rs)
A. Strength of Monotonic Relationship Based on Ranks, $R_{X_i}$ and $R_{Y_i}$
$$
r_s = 1 - \frac{6 \sum_{i=1}^{n} (R_{X_i} - R_{Y_i})^2}{n(n^2 - 1)}
$$
B. Computational Example
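Table 2. Progress of Patients in Therapy as Ranked by Occupational Therapist, $R_X$, and Physical Therapist, $R_Y$

| (1) Patient | (2) $R_{X_i}$ | (3) $R_{Y_i}$ | (4) $R_{X_i} - R_{Y_i}$ | (5) $(R_{X_i} - R_{Y_i})^2$ |
|---|---|---|---|---|
| 1 | 5 | 7 | –2 | 4 |
| 2 | 3 | 3 | 0 | 0 |
| 3 | 1 | 2 | –1 | 1 |
| 4 | 7 | 6 | 1 | 1 |
| 5 | 4 | 5 | –1 | 1 |
| 6 | 2 | 1 | 1 | 1 |
| 7 | 8 | 8 | 0 | 0 |
| 8 | 6 | 4 | 2 | 4 |
| | | | $\sum = 0$ | $\sum = 12$ |

C. Computation of $r_s$
$$
r_s = 1 - \frac{6(12)}{8[(8)^2 - 1]} = 1 - \frac{72}{504} = .86
$$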
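The Spearman computation can be recomputed directly from the rank columns of Table 2. This is a sketch using the eight pairs of ranks as listed; the variable names are illustrative.

```python
# Columns (2) and (3) of Table 2: ranks from the two therapists.
rank_x = [5, 3, 1, 7, 4, 2, 8, 6]
rank_y = [7, 3, 2, 6, 5, 1, 8, 4]

n = len(rank_x)
diffs = [rx - ry for rx, ry in zip(rank_x, rank_y)]  # column (4)
sum_d = sum(diffs)                                   # should be 0 for any two rankings
sum_d2 = sum(d * d for d in diffs)                   # total of column (5)

# Spearman rank correlation from the sum of squared rank differences.
r_s = 1 - (6 * sum_d2) / (n * (n ** 2 - 1))
print(sum_d, sum_d2, round(r_s, 2))
```

Checking that the rank differences sum to zero is a quick way to catch a ranking or subtraction error before computing $r_s$.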
1. Dealing with tied ranks
$$
r_s = 1 - \frac{6 \sum_{i=1}^{n} (R_{X_i} - R_{Y_i})^2}{n(n^2 - 1)}
$$
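A common convention for tied scores is to give each tied score the mean of the ranks it would otherwise occupy, and then apply the rank formula as usual. The slides do not spell out the procedure, so the sketch below is an assumption: `average_ranks` and the example scores are hypothetical. With ties the formula is an approximation; computing Pearson r on the averaged ranks is the exact alternative.

```python
def average_ranks(values):
    """Rank from 1 (smallest) upward; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions i..j are 0-based, ranks are 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

# Hypothetical scores with one tie on X.
x_scores = [10, 20, 20, 30]
y_scores = [1, 2, 3, 4]

rank_x = average_ranks(x_scores)
rank_y = average_ranks(y_scores)
n = len(rank_x)
sum_d2 = sum((rx - ry) ** 2 for rx, ry in zip(rank_x, rank_y))
r_s = 1 - (6 * sum_d2) / (n * (n ** 2 - 1))
print(rank_x, rank_y, r_s)
```

The two tied X scores each receive rank 2.5, the mean of ranks 2 and 3.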
VII Other Kinds of Correlation Coefficients
| Coefficient | Symbol | Characteristics |
|---|---|---|
| 1. Eta | $\eta$ | X and Y quantitative, curvilinear relationship |
| 2. Biserial correlation | $r_b$ | X and Y quantitative, but one variable forced into a dichotomy |
| 3. Cramér's correlation | $V$ | X and Y both dichotomous |
| 4. Multiple correlation | $R$ | All X's and Y's quantitative, linear relationships |