• Covariance and correlation
• Pearson correlation coefficient
• Tests and confidence interval for correlations
• Spearman correlation
• Pitfalls
Association
Study goal = examine the association between two variables
Some questions arise:
• What measure of association should we use?
• Is there a positive or negative association?
• Is there a linear association?
• Is there a significant association?
Covariance (1)
= measure of how much two random variables vary together
• Difference with variance?
• Formula:
$$\operatorname{cov}(X,Y) = \frac{\sum_i (x_i - \bar{x})(y_i - \bar{y})}{n-1}$$
• Note: cov(X,X) = var(X)
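The covariance formula on this slide can be sketched in a few lines of Python. This is only an illustration: the function name `sample_cov` and the five height/weight pairs are made up for the example and do not come from the slides.

```python
from statistics import mean, variance


def sample_cov(x, y):
    """Sample covariance with the n - 1 denominator."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equally long samples with n >= 2")
    x_bar, y_bar = mean(x), mean(y)
    return sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / (len(x) - 1)


# Made-up illustration data (not from the slides).
height_cm = [165, 172, 180, 186, 193]
weight_kg = [58, 70, 75, 84, 95]

cov_hw = sample_cov(height_cm, weight_kg)   # covariance of height and weight
cov_hh = sample_cov(height_cm, height_cm)   # covariance of height with itself
var_h = variance(height_cm)                 # sample variance of height

print(cov_hw)
print(cov_hh, var_h)  # the two numbers agree: cov(X, X) = var(X)
```

Taller people in the made-up data tend to be heavier, so the products of deviations are mostly positive and the covariance comes out positive.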
Covariance (2)
Example:
• X = height, Y = weight
• Positive or negative covariance?
• $\bar{x} = 181$ cm, $\bar{y} = 76.5$ kg
• Cov(X,Y) = 35.0
– Positive association
– Strong or weak?
• X* = height in meters:
– Cov(X*,Y) = 0.35
[Figure: scatterplot of Weight (kg, 50 to 110) against Height (cm, 150 to 200), with + and - signs marking regions of the plot]
Correlation (1)
= measure of linear association between two random variables
• Notation:
– population: ρ (rho)
– sample: r
• Can take any value from -1 to 1
• Closer to -1: stronger negative association
• Closer to +1: stronger positive association
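A short sketch of why the correlation is easier to interpret than the covariance. It assumes the usual definition r = cov(X,Y) / (s_X · s_Y), which is not spelled out on this slide, and it uses made-up data; `sample_cov` and `pearson_r` are illustrative names.

```python
from math import sqrt
from statistics import mean


def sample_cov(x, y):
    """Sample covariance with the n - 1 denominator."""
    x_bar, y_bar = mean(x), mean(y)
    return sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y)) / (len(x) - 1)


def pearson_r(x, y):
    """Pearson r, assumed here to be cov(X, Y) / (s_X * s_Y)."""
    return sample_cov(x, y) / sqrt(sample_cov(x, x) * sample_cov(y, y))


# Made-up illustration data (not from the slides).
height_cm = [165, 172, 180, 186, 193]
weight_kg = [58, 70, 75, 84, 95]
height_m = [h / 100 for h in height_cm]   # same heights, now in meters

cov_cm = sample_cov(height_cm, weight_kg)
cov_m = sample_cov(height_m, weight_kg)
r_cm = pearson_r(height_cm, weight_kg)
r_m = pearson_r(height_m, weight_kg)

print(cov_cm, cov_m)  # covariance depends on the unit of measurement
print(r_cm, r_m)      # correlation is the same in both units
```

Changing centimeters to meters divides the covariance by 100 but leaves r unchanged, which is why "strong or weak?" can be answered from r and not from the covariance.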
One-sample z-test: H0: ρ = ρ0 (1)
• If ρ0 ≠ 0, r has a skewed distribution
– e.g. H0: ρ = 0.5: more “room” for deviation below 0.5 than above 0.5
– previous t-test for correlations is invalid!
• Solution: Fisher’s z-transformation of r
$$z = \frac{1}{2}\ln\left(\frac{1+r}{1-r}\right)$$
• ln = natural logarithm (base = e ≈ 2.718)
Fisher’s z-transformation
[Figure: plot of z against r, with marked points r = 0.05 → z = 0.05; r = 0.8 → z = 1.1; r = -0.8 → z = -1.1]
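The values marked in the figure can be checked with a few lines of Python. This is a minimal sketch; `fisher_z` is an illustrative name. The transformation is the same function as the inverse hyperbolic tangent, available as `math.atanh`.

```python
from math import atanh, log


def fisher_z(r):
    """Fisher's z-transformation: z = 0.5 * ln((1 + r) / (1 - r))."""
    if not -1 < r < 1:
        raise ValueError("r must lie strictly between -1 and 1")
    return 0.5 * log((1 + r) / (1 - r))


for r in (0.05, 0.8, -0.8):
    # Print r, the transformed value, and math.atanh(r) for comparison.
    print(r, round(fisher_z(r), 2), round(atanh(r), 2))
```

Near r = 0 the transformation barely changes the value; close to -1 or +1 it stretches the scale, which removes most of the skewness in the distribution of r.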
One-sample z-test: H0: ρ = ρ0 (2)
• z is approximately normally distributed under H0 with mean
$$z_0 = \frac{1}{2}\ln\left(\frac{1+\rho_0}{1-\rho_0}\right)$$
and variance 1/(n-3)
• Equivalently,
$$\lambda = (z - z_0)\sqrt{n-3} \sim N(0,1)$$
One-sample z-test: H0: ρ = ρ0 (3)
In conclusion:
• H0: ρ = ρ0 (≠ 0); H1: ρ ≠ ρ0
• Compute sample correlation coefficient r
• Transform r and ρ0 to z and z0, respectively, using Fisher’s z-transformation
• Compute test statistic λ = (z – z0)√(n-3)
• Compute p-value (λ ~ N(0,1))
• Not (yet) available in SPSS!!!
Example: Body weight (1)
Research question:
Is the association between the body weights of father and son different for biological fathers than for non-biological fathers?
Previous research:
A correlation of 0.1 is expected, based on previous research with sons and non-biological fathers
Sample:
• n = 100 biological fathers and sons
• Pearson’s correlation coefficient r = 0.38
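Since the test is not available in SPSS, the steps of the one-sample z-test can be carried out by hand or in a short script. The sketch below applies them to the numbers of this example (r = 0.38, ρ0 = 0.1, n = 100). The function name `corr_z_test` is illustrative, and a two-sided p-value is assumed because H1 is ρ ≠ ρ0.

```python
from math import atanh, erfc, sqrt


def corr_z_test(r, rho0, n):
    """Two-sided one-sample z-test of H0: rho = rho0 via Fisher's z."""
    if n <= 3:
        raise ValueError("n must be larger than 3")
    z = atanh(r)        # atanh(r) = 0.5 * ln((1 + r) / (1 - r))
    z0 = atanh(rho0)    # same transformation applied to rho0
    lam = (z - z0) * sqrt(n - 3)
    # Two-sided p-value under N(0, 1): 2 * (1 - Phi(|lam|)) = erfc(|lam| / sqrt(2))
    p_value = erfc(abs(lam) / sqrt(2))
    return lam, p_value


lam, p = corr_z_test(r=0.38, rho0=0.1, n=100)
print(f"lambda = {lam:.2f}, two-sided p = {p:.4f}")
```

With these inputs λ comes out just under 3, so the p-value is well below 0.05 and the correlation for biological fathers differs significantly from the expected 0.1.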
Rank correlation (1)
• Tests for the Pearson correlation assume that X and Y are normally distributed
• If X and/or Y are either ordinal or have a distribution far from normal (due to outliers), then significance tests based on the Pearson correlation coefficient are no longer valid
• A non-parametric alternative should then be used, for example a test based on the Spearman rank correlation coefficient
Rank correlation (2)
Spearman’s rank correlation coefficient:
= Pearson’s correlation coefficient based on the ranks of X and Y
• Less sensitive to outliers; more general association (not specifically linear)
• n ≥ 10 (or 30): similar tests and CI as for Pearson correlation
• n < 10 (or 30): exact significance levels can be found in a table
• Many ties (same value): use Kendall’s Tau
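The definition above (Pearson's coefficient applied to the ranks) translates directly into code. The sketch below is an illustration with made-up data; the helper names are hypothetical, and tied values are given the average of their ranks, which is a common convention assumed here.

```python
from math import sqrt
from statistics import mean


def average_ranks(values):
    """Ranks 1..n; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1   # average of ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def pearson_r(x, y):
    x_bar, y_bar = mean(x), mean(y)
    sxy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    sxx = sum((a - x_bar) ** 2 for a in x)
    syy = sum((b - y_bar) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def spearman_rho(x, y):
    """Spearman's coefficient = Pearson's r computed on the ranks."""
    return pearson_r(average_ranks(x), average_ranks(y))


# Made-up data: y always increases with x, but not linearly,
# and the last value is an outlier.
x = [1, 2, 3, 4, 5, 6]
y = [1, 4, 9, 16, 25, 400]

r_pearson = pearson_r(x, y)
r_spearman = spearman_rho(x, y)
print(r_pearson, r_spearman)
```

Because y increases whenever x increases, the ranks agree perfectly and Spearman's coefficient equals 1, while Pearson's r is pulled below 1 by the outlier and the curvature.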
Normality check (1)
• Use P-P plots and histograms to check normality (symmetry)
• Problem with (significance) tests for normality:
– Small sample size: no or little power to detect discrepancy from normality
– Medium or large sample size: no or small impact due to central limit theorem
• Data skewed (outliers) & small sample size → data transformation
Normality check (2)
Be aware: significance depends on sample size!
[Figure: two histograms of Outcome (1 to 6) against Frequency; first histogram (frequency axis 0 to 1.5): Shapiro-Wilk p = 0.961; second histogram (frequency axis 0 to 6): Shapiro-Wilk p = 0.039]
Example: Apgar scores
• Apgar score (physical condition) at 1 and 5 minutes for 24 newborns
Pitfalls
• Spurious correlations
• No measurement of agreement
• Change scores (Y-X) always related to baseline X (“regression to the mean”)
• Dependent pairs of observations (xi, yi)
• …
Note:
• No mathematical problem
• Interpretation is incorrect
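The change-score pitfall can be made visible with a small simulation. This is a sketch under assumptions that are not in the slides: each subject has a stable true value, and baseline X and follow-up Y are that value plus independent measurement noise, so nothing really changes between the two measurements.

```python
import random
from math import sqrt
from statistics import mean


def pearson_r(x, y):
    x_bar, y_bar = mean(x), mean(y)
    sxy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    sxx = sum((a - x_bar) ** 2 for a in x)
    syy = sum((b - y_bar) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


random.seed(1)   # fixed seed so the run is reproducible
n = 2000

# Hypothetical setting: true value per subject, measured twice with noise.
true_value = [random.gauss(50, 5) for _ in range(n)]
x = [t + random.gauss(0, 5) for t in true_value]   # baseline
y = [t + random.gauss(0, 5) for t in true_value]   # follow-up, no real change
change = [b - a for a, b in zip(x, y)]             # change score Y - X

r_change_baseline = pearson_r(x, change)
print(r_change_baseline)
```

Although there is no real change, the change score is clearly negatively correlated with the baseline: subjects who happened to measure high at baseline tend to measure lower the second time. The computation is correct; reading it as "high baseline causes a decrease" is the incorrect interpretation.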
Dependent pairs of observations
• Association between study duration and grade
• Plot 1: dependency ignored → negative association
• Plot 2: dependency taken into account (data from same subject connected) → positive association
[Figure: two scatterplots (Plot 1 and Plot 2) of GRADE (3 to 10) against STUDY DURATION (4 to 7)]
Students were measured twice!!!
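The reversal between the two plots can be reproduced with a tiny made-up data set. The numbers below are hypothetical and are not the data behind the plots. Centering each student's values at that student's own means is used here as one simple way to take the dependency into account; the slides do not say which method was used.

```python
from math import sqrt
from statistics import mean


def pearson_r(x, y):
    x_bar, y_bar = mean(x), mean(y)
    sxy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    sxx = sum((a - x_bar) ** 2 for a in x)
    syy = sum((b - y_bar) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Hypothetical data: three students, each measured twice
# as (study duration, grade).
students = {
    "A": [(4, 8), (5, 9)],
    "B": [(5, 6), (6, 7)],
    "C": [(6, 4), (7, 5)],
}

# Plot 1 view: pool all six points and ignore who they belong to.
pooled_x = [d for pairs in students.values() for d, _ in pairs]
pooled_y = [g for pairs in students.values() for _, g in pairs]
r_pooled = pearson_r(pooled_x, pooled_y)

# Plot 2 view: look at deviations from each student's own means.
within_x, within_y = [], []
for pairs in students.values():
    d_bar = mean([d for d, _ in pairs])
    g_bar = mean([g for _, g in pairs])
    within_x.extend(d - d_bar for d, _ in pairs)
    within_y.extend(g - g_bar for _, g in pairs)
r_within = pearson_r(within_x, within_y)

print(r_pooled, r_within)
```

In this made-up example every student gets a higher grade after studying longer, so the within-student association is positive, yet the pooled correlation is negative because students with longer study durations have lower grades overall.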
Relation between two variables
Three main purposes:
• Association
– Pearson or Spearman correlation coefficient
• Agreement (same quantity: X = Y)
– Method of Bland and Altman (Lancet, 1986)