Coefficient of Correlation
The coefficient of correlation, r, measures the strength of the linear relationship between two variables.
$$r = \frac{\sum (x - \bar{x})(y - \bar{y})}{\sqrt{\sum (x - \bar{x})^2}\,\sqrt{\sum (y - \bar{y})^2}} = \frac{\sum xy - (\sum x)(\sum y)/n}{\sqrt{\sum x^2 - (\sum x)^2/n}\,\sqrt{\sum y^2 - (\sum y)^2/n}}$$
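As a rough check that the two forms of the formula agree, here is a minimal Python sketch that computes r both ways. The function names and the five-point data set are made up for illustration; they do not come from the slides.

```python
import math

def correlation_definitional(x, y):
    """r from deviations about the means."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    num = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    den = math.sqrt(sum((xi - x_bar) ** 2 for xi in x)
                    * sum((yi - y_bar) ** 2 for yi in y))
    return num / den

def correlation_computational(x, y):
    """r from raw sums (the second form of the formula)."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    scp_xy = sum(xi * yi for xi, yi in zip(x, y)) - sum_x * sum_y / n
    ss_x = sum(xi ** 2 for xi in x) - sum_x ** 2 / n
    ss_y = sum(yi ** 2 for yi in y) - sum_y ** 2 / n
    return scp_xy / math.sqrt(ss_x * ss_y)

# Hypothetical sample data, chosen only to keep the arithmetic easy to follow.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

r_def = correlation_definitional(x, y)
r_comp = correlation_computational(x, y)
print(round(r_def, 4), round(r_comp, 4))
```

For this data the numerator is 6 and the two sums of squares are 10 and 6, so both functions should give r = 6 / sqrt(60), about 0.7746.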
Coefficient of Correlation Properties
1. r ranges from -1.0 to 1.0
2. The larger |r| is, the stronger the linear relationship
3. The sign of r tells you whether the relationship between X and Y is a positive (direct) or a negative (inverse) relationship
4. r = 1 or r = -1 implies that a perfect linear pattern exists between the two variables; that is, they are perfectly correlated
The least squares line is the line through the data that minimizes the sum of the squared differences between the observations and the line.
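The following Python sketch illustrates the minimizing property. It assumes the standard closed-form estimates for simple regression, slope b1 = SCP_XY / SS_X and intercept b0 = ȳ - b1·x̄, which the slides do not state explicitly; the function names and data are hypothetical.

```python
def least_squares_line(x, y):
    """Return (b0, b1) for the line y = b0 + b1 * x fitted by least squares."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    scp_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    ss_x = sum((xi - x_bar) ** 2 for xi in x)
    b1 = scp_xy / ss_x
    b0 = y_bar - b1 * x_bar
    return b0, b1

def sum_squared_errors(x, y, b0, b1):
    """Sum of squared vertical distances between the observations and a line."""
    return sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))

# Hypothetical sample data.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

b0, b1 = least_squares_line(x, y)
sse_fit = sum_squared_errors(x, y, b0, b1)

# Any other line should do worse; nudge the intercept and slope to see this.
sse_shifted = sum_squared_errors(x, y, b0 + 0.1, b1)
sse_tilted = sum_squared_errors(x, y, b0, b1 + 0.1)
print(b0, b1, sse_fit, sse_shifted, sse_tilted)
```

For this data the fitted line is roughly y = 2.2 + 0.6x with a sum of squared errors near 2.4, and both perturbed lines give a larger value.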
Assumptions for the Simple Regression Model
$$Y = \beta_0 + \beta_1 X + e$$
1. The mean of each error component is zero
2. Each error component (random variable) follows an approximate normal distribution
3. The variance of the error component is the same for each value of X
4. The errors are independent of each other
Danger of Assuming Causality
A high statistical correlation does not imply causality.
There are many situations in which variables are highly correlated because a factor not being studied affects the variables being studied.
Coefficient of Determination
$$SSE = SS_Y - \frac{(SCP_{XY})^2}{SS_X}$$
$$r^2 = \frac{(SCP_{XY})^2}{SS_X\,SS_Y} = 1 - \frac{SSE}{SS_Y}$$
r^2 is the coefficient of determination: the percentage of the variation in the dependent variable that is explained by the simple linear regression model.
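A small Python sketch can confirm that the two expressions for r^2 agree. It assumes SCP_XY, SS_X and SS_Y are the sums of cross-products and squares about the means, which matches the SS_X formula given for leverage; the function name and data are illustrative only.

```python
def determination(x, y):
    """Return (sse, r2_ratio, r2_from_sse) using the SS and SCP quantities."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    ss_x = sum(xi ** 2 for xi in x) - sum_x ** 2 / n
    ss_y = sum(yi ** 2 for yi in y) - sum_y ** 2 / n
    scp_xy = sum(xi * yi for xi, yi in zip(x, y)) - sum_x * sum_y / n

    sse = ss_y - scp_xy ** 2 / ss_x          # unexplained variation
    r2_ratio = scp_xy ** 2 / (ss_x * ss_y)   # r^2 as a ratio of SS terms
    r2_from_sse = 1 - sse / ss_y             # r^2 as 1 - SSE / SS_Y
    return sse, r2_ratio, r2_from_sse

# Hypothetical sample data.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

sse, r2_ratio, r2_from_sse = determination(x, y)
print(sse, r2_ratio, r2_from_sse)
```

Here SS_X = 10, SS_Y = 6 and SCP_XY = 6, so SSE is about 2.4 and both forms give r^2 of about 0.6: the line accounts for roughly 60% of the variation in Y.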
Checking Model Assumptions
1. The errors are normally distributed with a mean of zero
2. The variance of the errors remains constant. For example, you should not observe larger errors associated with larger values of X.
3. The errors are independent
Outlying sample values can be found by calculating the sample leverage
$$h_i = \frac{1}{n} + \frac{(x_i - \bar{x})^2}{SS_X}$$
$$SS_X = \sum x^2 - (\sum x)^2/n$$
A sample is considered an outlier if its leverage is greater than 4/n or 6/n.
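The leverage formula translates directly into code. The sketch below uses made-up X values with one point far from the rest and checks each leverage against both cutoffs; the function name and data are assumptions for illustration.

```python
def leverages(x):
    """Leverage h_i = 1/n + (x_i - x_bar)^2 / SS_X for each observation."""
    n = len(x)
    x_bar = sum(x) / n
    ss_x = sum(xi ** 2 for xi in x) - sum(x) ** 2 / n
    return [1 / n + (xi - x_bar) ** 2 / ss_x for xi in x]

# Hypothetical X values; the last one sits far from the others.
x = [1, 2, 3, 4, 5, 20]
n = len(x)
h = leverages(x)

# Indices of observations whose leverage exceeds each cutoff.
flagged_4n = [i for i, hi in enumerate(h) if hi > 4 / n]
flagged_6n = [i for i, hi in enumerate(h) if hi > 6 / n]
print([round(hi, 3) for hi in h], flagged_4n, flagged_6n)
```

With six observations the last point has a leverage of about 0.967, which exceeds 4/n (about 0.667) but not 6/n (1.0), so the two cutoffs can disagree. In simple regression the leverages always sum to 2, which is a handy sanity check.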
Unusually large or small values of the dependent variable (Y) can generally be detected using the standardized residuals.
An observation is thought to have an outlying value of Y if its standardized residual is > 2 or < -2.
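The slides do not define the standardized residual, so the sketch below assumes the common form e_i / (s · sqrt(1 - h_i)), where e_i is the ordinary residual, s^2 = SSE / (n - 2), and h_i is the leverage. The function name and data are hypothetical.

```python
import math

def standardized_residuals(x, y):
    """Residuals scaled by s * sqrt(1 - h_i) (an assumed, commonly used form)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    ss_x = sum((xi - x_bar) ** 2 for xi in x)
    scp_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = scp_xy / ss_x
    b0 = y_bar - b1 * x_bar

    residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
    s = math.sqrt(sum(e ** 2 for e in residuals) / (n - 2))  # residual std. error
    h = [1 / n + (xi - x_bar) ** 2 / ss_x for xi in x]        # leverages
    return [e / (s * math.sqrt(1 - hi)) for e, hi in zip(residuals, h)]

# Hypothetical sample data.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

std_res = standardized_residuals(x, y)
outlying_y = [i for i, r in enumerate(std_res) if r > 2 or r < -2]
print([round(r, 3) for r in std_res], outlying_y)
```

For this small data set the largest standardized residual in absolute value is about 1.41, so no observation is flagged.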
You may conclude the ith observation is influential if the corresponding $D_i$ measure is > .8.
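The slides do not say how D_i is computed. The sketch below assumes it is Cook's distance, which for a model with p = 2 estimated coefficients can be written as D_i = e_i^2 · h_i / (p · s^2 · (1 - h_i)^2). Treat both that identification and the names used here as assumptions.

```python
def cooks_distance(x, y):
    """Cook's distance for simple linear regression (assumed meaning of D_i)."""
    n = len(x)
    p = 2  # number of estimated coefficients: intercept and slope
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    ss_x = sum((xi - x_bar) ** 2 for xi in x)
    scp_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = scp_xy / ss_x
    b0 = y_bar - b1 * x_bar

    residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
    s2 = sum(e ** 2 for e in residuals) / (n - p)        # residual variance
    h = [1 / n + (xi - x_bar) ** 2 / ss_x for xi in x]   # leverages
    return [e ** 2 * hi / (p * s2 * (1 - hi) ** 2)
            for e, hi in zip(residuals, h)]

# Hypothetical sample data.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

d = cooks_distance(x, y)
influential = [i for i, di in enumerate(d) if di > 0.8]
print([round(di, 3) for di in d], influential)
```

In this example the first observation combines a high leverage (0.6) with a sizeable residual, giving D of about 1.5, so it is the only one flagged as influential under the .8 rule.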