Correlation and Covariance

Post on 26-Dec-2015


Correlation and Covariance

Overview

[Figure: chart-selection grid. Predictor Variable (X-Axis): Continuous or Categorical. Outcome, Dependent Variable (Y-Axis): Continuous. Chart types shown: Histogram, Scatter, Boxplot. Example variable: Height.]

Correlation

Covariance is High: r ~1

Covariance is Low: r ~0

• It varies between -1 and +1

  • 0 = no relationship

• It is an effect size

  • ±.1 = small effect

  • ±.3 = medium effect

  • ±.5 = large effect

• Coefficient of determination, r²

  • By squaring the value of r you get the proportion of variance in one variable shared by the other.
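The effect-size bands and r² above can be sketched in R. This is only an illustration: the helper name effect_label, the label "below small", and the example r values are assumptions, not part of the slides.

```r
# Hypothetical helper: map |r| onto the effect-size bands listed above.
effect_label <- function(r) {
  a <- abs(r)
  if (a >= 0.5) {
    "large"
  } else if (a >= 0.3) {
    "medium"
  } else if (a >= 0.1) {
    "small"
  } else {
    "below small"   # label invented for values under .1
  }
}

# Invented example correlations
r_values <- c(-0.62, 0.35, 0.12, 0.03)
labels <- vapply(r_values, effect_label, character(1))

# Coefficient of determination: proportion of shared variance
r_squared <- r_values^2

data.frame(r = r_values, effect = labels, r_squared = round(r_squared, 3))
```

The sign of r is ignored when labelling, since the bands are stated as ± values.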

Things to Know about the Correlation

Variables

Y

X’s

Height

Independent Variables

Dependent Variables

Y

X4 X3 X2 X1

Little Correlation

Correlation is For Linear Relationships
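A small R sketch of why this matters, using invented data: y depends perfectly on x, but the relationship is a curve, so the (Pearson) correlation comes out at zero.

```r
# Invented data: a perfect but non-linear (quadratic) relationship
x <- -5:5
y <- x^2

# Pearson correlation measures only the linear part of the relationship
r_curve <- cor(x, y)
r_curve
```

Because x is symmetric around zero, the positive and negative cross-products cancel, which is why a scatter plot is worth checking before trusting r.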

Outliers Can Skew Correlation Values
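As a rough illustration with invented numbers, a single extreme point can change both the size and the sign of r:

```r
# Invented data with a strong positive linear trend
x <- 1:10
y <- c(2, 1, 4, 3, 6, 5, 8, 7, 10, 9)
r_clean <- cor(x, y)

# Add one extreme observation far from the trend
x_out <- c(x, 30)
y_out <- c(y, -20)
r_outlier <- cor(x_out, y_out)

c(without_outlier = r_clean, with_outlier = r_outlier)
```

Here the ten well-behaved points give a strongly positive r, while adding the one outlier makes r negative.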

Correlation and Regression Are Related
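One way to see the link, sketched in R on invented scores: for a simple regression of y on x, the slope equals r times sd(y)/sd(x), and the model's R-squared equals r squared.

```r
# Invented example scores (not from the slides)
x <- c(5, 4, 4, 6, 8)
y <- c(8, 9, 10, 13, 15)

fit <- lm(y ~ x)                  # simple linear regression
slope <- unname(coef(fit)["x"])   # fitted slope
r <- cor(x, y)

# Slope implied by the correlation and the two standard deviations
slope_from_r <- r * sd(y) / sd(x)

c(slope = slope, slope_from_r = slope_from_r)
c(model_r_squared = summary(fit)$r.squared, r_squared = r^2)
```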

$$\operatorname{cov}(x, y) = \frac{\sum_{i=1}^{N} (x_i - \bar{x})(y_i - \bar{y})}{N - 1}$$

Covariance

Y

X

Persons 2, 3, and 5 look to have deviations of similar magnitude from their means

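$$\operatorname{cov}(x, y) = \frac{\sum_{i=1}^{N} (x_i - \bar{x})(y_i - \bar{y})}{N - 1}$$

$$= \frac{(-0.4)(-3) + (-1.4)(-2) + (-1.4)(-1) + (0.6)(2) + (2.6)(4)}{4}$$

$$= \frac{1.2 + 2.8 + 1.4 + 1.2 + 10.4}{4} = \frac{17}{4} = 4.25$$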

Covariance

• Calculate the error [deviation] between the mean and each subject’s score for the first variable (x).

• Calculate the error [deviation] between the mean and their score for the second variable (y).

• Multiply these error values.

• Add these values and you get the cross product deviations.

• The covariance is the average cross-product deviations:

$$\operatorname{cov}(x, y) = \frac{\sum_{i=1}^{N} (x_i - \bar{x})(y_i - \bar{y})}{N - 1}$$
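The steps above can be sketched in R. The raw scores are an assumption: they are hypothetical values chosen so that the covariance comes out at 4.25, the figure used elsewhere in these slides.

```r
# Hypothetical raw scores for five persons (assumed for illustration)
x <- c(5, 4, 4, 6, 8)
y <- c(8, 9, 10, 13, 15)

# Steps 1 and 2: deviation of each score from its variable's mean
dev_x <- x - mean(x)
dev_y <- y - mean(y)

# Step 3: multiply the deviations pairwise
cross_products <- dev_x * dev_y

# Steps 4 and 5: sum the cross-products and divide by N - 1
cov_manual <- sum(cross_products) / (length(x) - 1)

cov_manual   # manual result
cov(x, y)    # built-in function, for comparison
```

R's cov() also divides by N - 1, so the two results should agree.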

Covariance

Age  Income  Education
7    4       3
4    1       8
6    3       5
8    6       1
8    5       7
7    2       9
5    3       3
9    5       8
7    4       5
8    2       2
9    5       2
8    4       2
9    2       3
8    4       7
3    1       4
3    1       3
8    2       6
1    2       5
3    1       7
6    3       3

Do they VARY the same way relative to their own means?

2.47

• It depends upon the units of measurement.

  • E.g. the covariance of two variables measured in miles might be 4.25, but if the same scores are converted to kilometres, the covariance is 11.

• One solution: standardize it! normalize the data

  • Divide by the standard deviations of both variables.

• The standardized version of covariance is known as the correlation coefficient.

  • It is relatively unaffected by units of measurement.
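A minimal R sketch of the miles-to-kilometres point. The scores are hypothetical (chosen to give a covariance of 4.25 in miles), and the conversion factor 1.609344 km per mile is supplied here, not taken from the slides.

```r
# Hypothetical scores, treated as distances in miles
miles_x <- c(5, 4, 4, 6, 8)
miles_y <- c(8, 9, 10, 13, 15)

# Convert the same scores to kilometres
km_per_mile <- 1.609344
km_x <- miles_x * km_per_mile
km_y <- miles_y * km_per_mile

cov_miles <- cov(miles_x, miles_y)   # covariance in miles
cov_km <- cov(km_x, km_y)            # larger, only because the units changed

cor_miles <- cor(miles_x, miles_y)   # standardized: same in both units
cor_km <- cor(km_x, km_y)

c(cov_miles = cov_miles, cov_km = cov_km)
c(cor_miles = cor_miles, cor_km = cor_km)
```

The covariance grows by the square of the conversion factor, while the correlation is the same in both units.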

Limitations of Covariance

$$r = \frac{\operatorname{cov}_{xy}}{s_x s_y} = \frac{\sum_{i=1}^{N} (x_i - \bar{x})(y_i - \bar{y})}{(N - 1)\, s_x s_y}$$

The Correlation Coefficient

$$r = \frac{\operatorname{cov}_{xy}}{s_x s_y} = \frac{4.25}{1.67 \times 2.92} = .87$$
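The same calculation can be checked in R. The raw scores are hypothetical, picked so that the covariance and standard deviations match the figures in the formula (4.25, 1.67, 2.92).

```r
# Hypothetical raw scores (assumed for illustration)
x <- c(5, 4, 4, 6, 8)
y <- c(8, 9, 10, 13, 15)

cov_xy <- cov(x, y)   # covariance
s_x <- sd(x)          # standard deviation of x
s_y <- sd(y)          # standard deviation of y

# Standardize the covariance to get the correlation coefficient
r_manual <- cov_xy / (s_x * s_y)

round(c(cov_xy = cov_xy, s_x = s_x, s_y = s_y, r = r_manual), 2)
cor(x, y)             # built-in function, for comparison
```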

Correlation


Need inter-item/variable correlations > .30

Character Vector: b <- c("one","two","three")

numeric vector

character vector

Numeric Vector: a <- c(1,2,5.3,6,-2,4)

Matrix: y <- matrix(1:20, nrow=5, ncol=4)

Dataframe:
d <- c(1,2,3,4)
e <- c("red", "white", "red", NA)
f <- c(TRUE,TRUE,TRUE,FALSE)
mydata <- data.frame(d,e,f)
names(mydata) <- c("ID","Color","Passed")

List: w <- list(name="Fred", age=5.3)

Data Structures

Framework Source: Hadley Wickham

Correlation Matrix

Correlation and Covariance

$$\operatorname{cov}(x, y) = \frac{\sum_{i=1}^{N} (x_i - \bar{x})(y_i - \bar{y})}{N - 1}$$

Revisiting the Height Dataset

Galton: Height Dataset

cor(heights)
Error in cor(heights) : 'x' must be numeric

Initial workaround: Create data.frame without the Factors

h2 <- data.frame(h$father,h$mother,h$avgp,h$childNum,h$kids)

cor() function does not handle Factors

Later we will RECODE the variable into a 0, 1

Excel correl() does not either
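An alternative to listing the numeric columns by hand is to select them programmatically. This is a sketch only: the data frame below is a small invented stand-in for the heights data, and its column names and values are assumptions.

```r
# Invented stand-in for the heights data (values are not Galton's)
h <- data.frame(
  father = c(70, 68, 72, 66, 69, 71),
  mother = c(64, 65, 63, 62, 66, 64),
  gender = factor(c("M", "F", "M", "F", "F", "M")),
  height = c(71, 64, 73, 62, 65, 72)
)

# cor() fails while a factor column is present
attempt <- try(cor(h), silent = TRUE)
inherits(attempt, "try-error")

# Keep only the numeric columns, then correlate
is_num <- sapply(h, is.numeric)
num_only <- h[, is_num, drop = FALSE]
cor_matrix <- cor(num_only)
round(cor_matrix, 2)
```

This drops the factor without having to name each numeric column, which helps when the data frame has many variables.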

Histogram of Correlation Coefficients

-1 +1

Correlations Matrix: Both Types

library(car)
scatterplotMatrix(heights)

Zoom in on Gender

Correlation Matrix for Continuous Variables

chart.Correlation(num2) (PerformanceAnalytics package)

Categorical: Revisit Box Plot

Factors/Categorical work with Boxplots; however some functions are not set up to handle Factors

Note there is an equation here: Y = mx + b

Correlation will depend on spread of distributions

Manual Calculation: Note Stdev is Lower

Note that with values of 0 and 1 the deviations from the mean are small, so the standard deviation is lower, whereas the continuous variable has a lot of variation (spread).
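A quick R check of this point, using invented values for a 0/1 variable and a continuous one:

```r
# Invented values: a 0/1 coded variable and a continuous height variable
male <- c(1, 0, 1, 0, 0, 1)
height <- c(71, 64, 73, 62, 65, 72)

# Deviations from the mean are bounded for the 0/1 variable
range(male - mean(male))
range(height - mean(height))

sd_binary <- sd(male)
sd_continuous <- sd(height)
c(sd_binary = sd_binary, sd_continuous = sd_continuous)
```

A 0/1 variable can never deviate from its mean by more than 1, so its standard deviation stays small compared with a variable measured in inches.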

Categorical: Recode!

Gender recoded as 0 = Female, 1 = Male

@correl does not work with Factor Variables

Formula now works!
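A minimal sketch of the recode in R, assuming a factor with levels "F" and "M". The data values are invented and the variable names are placeholders.

```r
# Invented data: gender as a factor, height as a numeric variable
gender <- factor(c("M", "F", "M", "F", "F", "M"))
height <- c(71, 64, 73, 62, 65, 72)

# Recode: 0 = Female, 1 = Male
male <- ifelse(gender == "M", 1, 0)

# cor() now accepts the recoded variable
r_gender_height <- cor(male, height)
r_gender_height
```

With a 0/1 variable, the sign of r shows which group has the higher mean: here a positive r means the group coded 1 is taller on average.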

Correlation: Continuous & Discrete

More examples of cor.test()
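For reference, a small sketch of cor.test() on invented data. It returns the estimate together with a significance test and a confidence interval.

```r
# Invented data with a strong positive trend
x <- 1:10
y <- c(2, 1, 4, 3, 6, 5, 8, 7, 10, 9)

# Pearson correlation test (the default method)
result <- cor.test(x, y)

r_est <- unname(result$estimate)   # same value as cor(x, y)
p_val <- result$p.value            # test of H0: true correlation is 0
ci <- result$conf.int              # 95% confidence interval by default

r_est
p_val
ci
```

Rank-based versions are available through the method argument, for example cor.test(x, y, method = "spearman").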

Correlation Regression

[Figure: chart- and model-selection grid. Predictor Variable (X-Axis): Continuous or Categorical. Outcome, Dependent Variable (Y-Axis): Continuous or Categorical. Charts shown: Histogram, Scatter, Bar, Boxplot, Pie, Mosaic, CrossTable. Regression Model: Linear Regression, Logistic Regression. Example variables: Parents Height, Gender (0, 1), Frequency. Summaries: Mean, Median, Standard Deviation; Proportions.]

Summary
