PSY 1950 Correlation November 5, 2008. Definition Correlation quantifies the strength and direction of a linear relationship between two variables.

Post on 15-Jan-2016



Definition
• Correlation quantifies the strength and direction of a linear relationship between two variables


History

The First Scatterplot (Galton, 1885)

Importance
• Prior to correlation, “there was no way to discuss -- let alone measure -- the association between variables that lacked a cause-effect relationship”
• Correlation underlies many advanced statistical techniques
  – Factor analysis
  – Structural equation modeling
• Correlation informs
  – Prediction of an unknown variable
  – Validity of a measure
  – Reliability of a measure
  – Validity of a theory

Covariance
• Covariance measures how much two variables change together
  – The more they change together, the higher the covariance
  – Variance is a special case of covariance
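As a worked definition, covariance can be written as the average product of paired deviation scores. The divisor n − 1 (sample form) is an assumption here; the slides do not state which divisor is used, and the population form divides by n instead. Setting Y = X shows why variance is a special case.

```latex
\operatorname{cov}(X, Y) = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})

\operatorname{cov}(X, X) = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2 = s_X^2
```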

[Figure: two scatterplots of Y against X (X axis 0 to 6, Y axis 3 to 9)]

    Score      Deviation
   X    Y      X     Y    Product
   1    4     -2    -2       4
   2    5     -1    -1       1
   3    6      0     0       0
   4    7      1     1       1
   5    8      2     2       4

    Score      Deviation
   X    Y      X     Y    Product
   1    8     -2     2      -4
   2    5     -1    -1       1
   3    4      0    -2       0
   4    5      1    -1      -1
   5    8      2     2       4
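The deviation and product columns in the two tables can be reproduced with a short sketch. The scores are copied from the tables; the function names are illustrative, and the n − 1 divisor in `sample_covariance` is an assumption, since the slides do not say which divisor they use.

```python
# Scores copied from the two tables.
x = [1, 2, 3, 4, 5]
y_linear = [4, 5, 6, 7, 8]   # first table: Y rises with X
y_curved = [8, 5, 4, 5, 8]   # second table: Y is U-shaped in X


def deviations(values):
    """Return each value minus the mean of the list."""
    mean = sum(values) / len(values)
    return [v - mean for v in values]


def sum_of_products(xs, ys):
    """Sum of the products of paired deviation scores."""
    return sum(dx * dy for dx, dy in zip(deviations(xs), deviations(ys)))


def sample_covariance(xs, ys):
    """Sum of products divided by n - 1 (assumed divisor)."""
    return sum_of_products(xs, ys) / (len(xs) - 1)


sp_linear = sum_of_products(x, y_linear)
sp_curved = sum_of_products(x, y_curved)
cov_linear = sample_covariance(x, y_linear)
cov_curved = sample_covariance(x, y_curved)

print("sum of products:", sp_linear, sp_curved)
print("covariance:", cov_linear, cov_curved)
```

In the second table the positive and negative products cancel, so the covariance is zero even though Y clearly depends on X in a curved way.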

The Problem with Covariation

• It reflects not only the degree of a bivariate relationship, but also the variation of each variable

• In other words, its units depend on the variables

[Figure: two scatterplots of Y against X (X axis 0 to 6); the Y axis runs from 3 to 21 in the first plot and from 3 to 9 in the second]
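A small sketch of this unit dependence: re-expressing Y in different units changes the covariance even though the relationship itself is unchanged. The data and the factor of 10 are hypothetical, and the n − 1 divisor is an assumption.

```python
def sample_covariance(xs, ys):
    """Average product of paired deviations, using an assumed n - 1 divisor."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    products = [(a - mean_x) * (b - mean_y) for a, b in zip(xs, ys)]
    return sum(products) / (len(xs) - 1)


x = [1, 2, 3, 4, 5]
y = [4, 5, 6, 7, 8]

# Hypothetical change of units: the same Y values expressed on a 10x scale.
y_rescaled = [10 * value for value in y]

cov_original = sample_covariance(x, y)
cov_rescaled = sample_covariance(x, y_rescaled)

print("covariance, original units:", cov_original)
print("covariance, rescaled units:", cov_rescaled)
```

The covariance grows by the same factor as the rescaling, which is why it cannot be read directly as a measure of strength.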

Pearson Product-Moment Correlation (r)

• Special case of covariance
  – Standardized covariance
  – Covariance of standardized variables
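The two descriptions of r can be checked against each other in a minimal sketch. The data are hypothetical and the function names are illustrative; sample (n − 1) formulas are assumed throughout, which does not affect r as long as the same divisor is used for the covariance and the standard deviations.

```python
import math


def mean(values):
    return sum(values) / len(values)


def sample_covariance(xs, ys):
    """Average product of paired deviations (assumed n - 1 divisor)."""
    mx, my = mean(xs), mean(ys)
    return sum((a - mx) * (b - my) for a, b in zip(xs, ys)) / (len(xs) - 1)


def sample_sd(values):
    """Standard deviation as the square root of cov(X, X)."""
    return math.sqrt(sample_covariance(values, values))


def r_standardized_covariance(xs, ys):
    """r as the covariance divided by the product of the two SDs."""
    return sample_covariance(xs, ys) / (sample_sd(xs) * sample_sd(ys))


def r_covariance_of_z_scores(xs, ys):
    """r as the covariance of the two variables after converting to z-scores."""
    zx = [(a - mean(xs)) / sample_sd(xs) for a in xs]
    zy = [(b - mean(ys)) / sample_sd(ys) for b in ys]
    return sample_covariance(zx, zy)


# Hypothetical data.
x = [1, 2, 3, 4, 5]
y = [2, 1, 4, 3, 5]

r_a = r_standardized_covariance(x, y)
r_b = r_covariance_of_z_scores(x, y)

# Unlike covariance, r is unchanged when Y is expressed in different units.
r_rescaled = r_standardized_covariance(x, [10 * b for b in y])

print(r_a, r_b, r_rescaled)
```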

Example

Interpreting r
• Things to consider carefully
  – Correlation versus causation
  – Restricted range
  – Group sampling
  – Outliers
  – Linearity
  – Size
  – Homoscedasticity
  – Significance

Correlation versus Causation

Restriction of Range
• When the bivariate range is artificially limited
  – In the case of a linear relationship, the correlation is almost always spuriously attenuated
  – In the case of a curvilinear relationship, restriction can result in a spuriously large correlation
• Possibly a grouping/selection effect
  – e.g., the correlation between height and basketball ability among NBA players

• http://www.ruf.rice.edu/~lane/stat_sim/restricted_range/index.html
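Attenuation under range restriction can be illustrated with a small simulation. Everything here is a hypothetical setup rather than data from the lecture: X is standard normal, Y is X plus independent standard normal noise (so the population correlation is 1/√2, about .71), and the "restricted" sample keeps only cases with X above 1. The seed is fixed so the run is repeatable.

```python
import math
import random


def pearson_r(xs, ys):
    """Pearson correlation from sums of squares and cross-products."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    sp = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    ssx = sum((a - mx) ** 2 for a in xs)
    ssy = sum((b - my) ** 2 for b in ys)
    return sp / math.sqrt(ssx * ssy)


rng = random.Random(2008)  # fixed seed for repeatability
n = 5000

# Hypothetical linear relationship: Y = X + noise.
x = [rng.gauss(0, 1) for _ in range(n)]
y = [xi + rng.gauss(0, 1) for xi in x]

r_full = pearson_r(x, y)

# Restrict the range: keep only cases with X above an arbitrary cutoff of 1.
restricted = [(xi, yi) for xi, yi in zip(x, y) if xi > 1.0]
r_restricted = pearson_r([p[0] for p in restricted],
                         [p[1] for p in restricted])

print("full range r:", round(r_full, 2))
print("restricted range r:", round(r_restricted, 2), "n =", len(restricted))
```

The underlying relationship is identical in both samples; only the spread of X differs, and the correlation in the restricted sample comes out noticeably smaller.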

Grouping
• Combining heterogeneous groups (either a priori via sampling or a posteriori via data segregation) can inflate the correlation
  – e.g., the correlation between height and basketball ability among short people and tall people
  – e.g., the correlation between height and weight in men and women
    • For men, r = .60; for women, r = .49
    • Together, r = .78

• http://www.ruf.rice.edu/~lane/stat_sim/restricted_range/index.html
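The inflation from pooling can be reproduced with a tiny constructed example. The numbers below are hypothetical (they are not the height and weight data): group B is simply group A shifted upward on both variables, so the within-group correlation is the same in each group while the pooled correlation is much larger.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation from sums of squares and cross-products."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    sp = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    ssx = sum((a - mx) ** 2 for a in xs)
    ssy = sum((b - my) ** 2 for b in ys)
    return sp / math.sqrt(ssx * ssy)


# Hypothetical group A.
x_a = [1, 2, 3, 4, 5]
y_a = [2, 1, 4, 3, 5]

# Hypothetical group B: the same pattern, shifted by 10 on both variables.
x_b = [v + 10 for v in x_a]
y_b = [v + 10 for v in y_a]

r_a = pearson_r(x_a, y_a)
r_b = pearson_r(x_b, y_b)
r_pooled = pearson_r(x_a + x_b, y_a + y_b)

print("within A:", r_a, "within B:", r_b, "pooled:", round(r_pooled, 3))
```

Most of the pooled correlation comes from the difference between the group means rather than from the relationship within either group.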

Outliers
• Correlation is very sensitive to outliers
  – For all three plots, r, the means, and the SDs are equal
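A minimal sketch of this sensitivity, using hypothetical numbers rather than the three plots from the slide: five points with a correlation of exactly zero, then the same five points plus a single extreme case.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation from sums of squares and cross-products."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    sp = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    ssx = sum((a - mx) ** 2 for a in xs)
    ssy = sum((b - my) ** 2 for b in ys)
    return sp / math.sqrt(ssx * ssy)


# Five points whose deviation products cancel exactly.
x = [1, 2, 3, 4, 5]
y = [8, 5, 4, 5, 8]
r_without = pearson_r(x, y)

# Add one hypothetical extreme case far from the rest of the data.
x_out = x + [20]
y_out = y + [20]
r_with = pearson_r(x_out, y_out)

print("without outlier:", r_without)
print("with outlier:", round(r_with, 2))
```

One added case is enough to move r from zero to above .9 here, which is why a scatterplot should be inspected before r is interpreted.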

Linearity


Size
• The magnitude of r
• The magnitude of r²
  – The coefficient of determination
  – The proportion of variability in one variable accounted for by variability in the other variable
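The "proportion of variability accounted for" reading of r² can be checked numerically. This sketch uses hypothetical data and assumes the usual least-squares line for predicting Y from X; it compares r² with one minus the ratio of residual to total sum of squares.

```python
import math

# Hypothetical data.
x = [1, 2, 3, 4, 5]
y = [2, 1, 4, 3, 5]

mean_x = sum(x) / len(x)
mean_y = sum(y) / len(y)
sp = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
ss_x = sum((a - mean_x) ** 2 for a in x)
ss_y = sum((b - mean_y) ** 2 for b in y)   # total variability in Y

r = sp / math.sqrt(ss_x * ss_y)
r_squared = r ** 2

# Least-squares line predicting Y from X.
slope = sp / ss_x
intercept = mean_y - slope * mean_x
predicted = [intercept + slope * a for a in x]

# Variability in Y left over after the linear prediction.
ss_residual = sum((b - p) ** 2 for b, p in zip(y, predicted))
proportion_explained = 1 - ss_residual / ss_y

print("r:", round(r, 3))
print("r squared:", round(r_squared, 3))
print("1 - SS_residual / SS_total:", round(proportion_explained, 3))
```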

Homoscedasticity
• Same as the homogeneity of variance assumption
• Variance of Y does not depend on the value of X, and vice versa


Significance
• To test the null hypothesis that the population correlation, ρ (“rho”), equals 0, use:

[Equation image missing from the source]
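The formula from this slide is not preserved in the transcript. A standard test of this null hypothesis, which may or may not be the one the slide showed, is the t test with t = r·√(n − 2) / √(1 − r²) on n − 2 degrees of freedom. The sketch below computes that statistic; the function name and the sample size of 30 are hypothetical, and the resulting t would still need to be compared against a t table or a statistics package to obtain a p-value.

```python
import math


def t_statistic_for_r(r, n):
    """t statistic and degrees of freedom for testing rho = 0.

    Uses t = r * sqrt(n - 2) / sqrt(1 - r**2) with df = n - 2.
    """
    if n < 3:
        raise ValueError("need at least 3 pairs of scores")
    if abs(r) >= 1:
        raise ValueError("t is undefined when |r| equals 1")
    df = n - 2
    t = r * math.sqrt(df) / math.sqrt(1 - r ** 2)
    return t, df


# Hypothetical example: r = .60 observed in a sample of 30 pairs.
t_value, df = t_statistic_for_r(0.60, 30)
print("t =", round(t_value, 2), "df =", df)
```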

Other measures of correlation

• Computationally identical to r
  – Point-biserial
    • One dichotomous variable
  – Phi
    • Two dichotomous variables
  – Spearman
    • Both variables on ordinal scale
    • Tests monotonicity of relationship
    • As X increases, so does Y
    • No accurate significance test
• Computationally novel techniques
  – e.g., Kendall’s tau
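"Computationally identical to r" can be made concrete: Spearman's coefficient is Pearson's r computed on ranks, and the point-biserial coefficient is Pearson's r with the dichotomous variable coded 0/1. The sketch below uses hypothetical data; the helper names are illustrative, and tied values are given average ranks.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation from sums of squares and cross-products."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    sp = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    ssx = sum((a - mx) ** 2 for a in xs)
    ssy = sum((b - my) ** 2 for b in ys)
    return sp / math.sqrt(ssx * ssy)


def average_ranks(values):
    """Rank values from 1 upward, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the whole run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        tied_rank = (i + j) / 2 + 1  # mean of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = tied_rank
        i = j + 1
    return ranks


def spearman_r(xs, ys):
    """Spearman correlation: Pearson's r applied to the ranks."""
    return pearson_r(average_ranks(xs), average_ranks(ys))


# Hypothetical monotonic but nonlinear relationship.
x = [1, 2, 3, 4, 5]
y = [1, 4, 9, 16, 25]
r_pearson = pearson_r(x, y)
r_spearman = spearman_r(x, y)

# Hypothetical point-biserial case: group membership coded 0/1.
group = [0, 0, 0, 1, 1]
score = [2, 1, 4, 3, 5]
r_point_biserial = pearson_r(group, score)

print("Pearson:", round(r_pearson, 3), "Spearman:", r_spearman)
print("point-biserial:", round(r_point_biserial, 3))
```

Because the ranks of X and Y agree perfectly in the first example, Spearman's coefficient is 1 even though Pearson's r on the raw scores is below 1.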
