Lecture 5 Correlation and Regression Dr Peter Wheale.
Post on 15-Jan-2016
Lecture 5 Correlation and Regression
Dr Peter Wheale
A Scatter Plot of Monthly Returns
Interpretation of Correlation Coefficient
Correlation coefficient (r) and its interpretation:
• r = +1: perfect positive correlation
• 0 < r < +1: positive linear relationship
• r = 0: no linear relationship
• -1 < r < 0: negative linear relationship
• r = -1: perfect negative correlation
Scatter Plots and Correlation
Covariance of Rates of Return
$$\mathrm{cov}_{1,2} = \frac{\sum_{t=1}^{n} (R_{t,1} - \bar{R}_1)(R_{t,2} - \bar{R}_2)}{n - 1}$$
Example: Calculate the covariance between the returns on the two stocks indicated below:
Covariance Using Historical Data
Mean returns: R̄1 = 0.05, R̄2 = 0.07
Sum of the cross-products of deviations from the means: Σ = 0.0154
Cov = 0.0154 / 2 = 0.0077
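The covariance formula can be sketched in a few lines of Python. This is a minimal illustration, not the lecture's own calculation: the function name `sample_covariance` and the two return series are hypothetical, since the original data table is not reproduced in this transcript.

```python
def sample_covariance(r1, r2):
    """Sample covariance of two equal-length return series (divides by n - 1)."""
    if len(r1) != len(r2) or len(r1) < 2:
        raise ValueError("need two series of equal length with at least 2 observations")
    n = len(r1)
    mean1 = sum(r1) / n
    mean2 = sum(r2) / n
    # Sum of cross-products of deviations from each series' mean
    cross = sum((a - mean1) * (b - mean2) for a, b in zip(r1, r2))
    return cross / (n - 1)


# Hypothetical returns for illustration only (not the lecture's table)
stock1 = [0.02, 0.04, 0.06]
stock2 = [0.01, 0.05, 0.06]
cov = sample_covariance(stock1, stock2)
print(cov)
```

With three observations the divisor is n – 1 = 2, the same divisor that appears in the worked example.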
Sample Correlation Coefficient
Correlation, ρ, is a standardized measure of covariance and is bounded by +1 and –1
$$\rho_{1,2} = \frac{\mathrm{Cov}_{1,2}}{\sigma_1 \sigma_2}$$
Example: The covariance of returns on two assets is 0.0051, with σ1 = 7% and σ2 = 11%. Calculate ρ1,2.
$$\rho_{1,2} = \frac{0.0051}{0.07 \times 0.11} = 0.662$$
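The same arithmetic can be checked with a short Python sketch. The function name `correlation_from_cov` is an assumption made for illustration; the inputs are the figures from the example.

```python
def correlation_from_cov(cov, sigma1, sigma2):
    """Standardize a covariance by the product of the two standard deviations."""
    if sigma1 <= 0 or sigma2 <= 0:
        raise ValueError("standard deviations must be positive")
    return cov / (sigma1 * sigma2)


# Figures from the example: Cov = 0.0051, sigma1 = 7%, sigma2 = 11%
rho = correlation_from_cov(0.0051, 0.07, 0.11)
print(round(rho, 3))  # 0.662
```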
Testing H0: Correlation = 0
The test of whether the true correlation between two random variables is zero (i.e., there is no correlation) is a t-test based on the sample correlation coefficient, r. With n (pairs of) observations the test statistic is:
$$t = \frac{r\sqrt{n - 2}}{\sqrt{1 - r^2}}$$
The degrees of freedom are n – 2.
Example
Data: n = 10, r = 0.475. Determine if the sample correlation is significant at the 5% level of significance.
t = 0.475 (8)^0.5 / [1 – (0.475)^2]^0.5 = 1.3435 / 0.88 = 1.5267
The two-tailed critical t-values at a 5% level of significance with df = 8 (n – 2) are ±2.306.
Since –2.306 ≤ 1.5267 ≤ 2.306, the null hypothesis cannot be rejected, i.e. the correlation between variables X and Y is not significantly different from zero at a 5% significance level.
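The test in this example can be sketched in Python as follows. The function name `correlation_t_stat` is hypothetical, and the critical value 2.306 is simply copied from the example rather than computed, so the sketch needs only the standard library.

```python
import math


def correlation_t_stat(r, n):
    """t-statistic for H0: correlation = 0, with n - 2 degrees of freedom."""
    if n <= 2:
        raise ValueError("need more than 2 observations")
    if not -1 < r < 1:
        raise ValueError("r must lie strictly between -1 and +1")
    return r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)


t_stat = correlation_t_stat(0.475, 10)
critical = 2.306  # two-tailed 5% critical value for df = 8, taken from the example
reject_null = abs(t_stat) > critical
print(round(t_stat, 3), reject_null)
```

Here `reject_null` comes out `False`, which matches the conclusion that the correlation is not significantly different from zero at the 5% level.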
Linear Regression
• Dependent variable: the variable whose changes you are trying to explain
• Independent variable: the variable being used to explain the changes in the dependent variable
• Example: You want to predict housing starts using mortgage interest rates:
Independent variable = mortgage interest rates
Dependent variable = housing starts
Regression Equation
$$Y_i = b_0 + b_1 X_i + \varepsilon_i$$
where:
Y_i = dependent variable
b_0 = y-intercept
b_1 = slope coefficient
X_i = independent variable
ε_i = error term
Assumptions of Linear Regression
• Linear relation between dependent and independent variables
• Independent variable uncorrelated with error term
• Expected value of error term is zero
• Variance of the error term is constant
• Error term is independently distributed
• Error term is normally distributed
Estimated Regression Coefficients
Estimated regression line is:
$$\hat{Y}_i = b_0 + b_1 X_i$$
where b_0 is the estimated y-intercept and b_1 is the estimated slope.
Estimating the slope coefficient
b1 = cov(X,Y) / var(X)
Example: Compute the slope coefficient and intercept term for the least squares regression equation using the following information:
Σ(X – X̄)(Y – Ȳ) = 445 and Σ(X – X̄)² = 374.50. The sample means of X and Y are 25 and 75, respectively.
Because cov(X,Y) and var(X) both divide by n – 1, the divisors cancel and the slope is the ratio of the two sums.
The slope coefficient, b1 = 445/374.5 = 1.188.
The intercept term, b0 = 75 – 1.188 (25) = 45.3.
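A small Python sketch reproduces this calculation from the summary figures. The function name `ols_from_sums` and its parameter names are assumptions for illustration. The intercept uses the standard least squares relation b0 = Ȳ – b1·X̄, which is what the example applies.

```python
def ols_from_sums(sum_xy_dev, sum_xx_dev, mean_x, mean_y):
    """Least squares slope and intercept from deviation sums and sample means."""
    if sum_xx_dev <= 0:
        raise ValueError("sum of squared X deviations must be positive")
    b1 = sum_xy_dev / sum_xx_dev   # slope: ratio of the two deviation sums
    b0 = mean_y - b1 * mean_x      # intercept: the line passes through the means
    return b0, b1


# Figures from the example
b0, b1 = ols_from_sums(445, 374.5, 25, 75)
print(round(b1, 3), round(b0, 1))  # 1.188 45.3
```

Carrying the unrounded slope through gives an intercept of about 45.29, which rounds to the 45.3 shown in the example.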
Calculating the Standard Error of the Estimate (SEE)
• SEE measures the accuracy of the prediction from a regression equation
• It is the standard deviation of the error term
• The lower the SEE, the greater the accuracy
$$\mathrm{SEE} = \sqrt{\frac{\mathrm{SSE}}{n - 2}}$$
where:
SSE = sum of squared errors
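To make the SEE calculation concrete, here is a minimal Python sketch. The helper names `fit_line` and `standard_error_of_estimate` and the five data points are hypothetical, chosen only so the arithmetic is easy to follow by hand.

```python
import math


def fit_line(x, y):
    """Least squares intercept and slope for a simple linear regression."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    b1 = sxy / sxx
    b0 = mean_y - b1 * mean_x
    return b0, b1


def standard_error_of_estimate(x, y):
    """SEE = sqrt(SSE / (n - 2)), where SSE is the sum of squared residuals."""
    n = len(x)
    if n <= 2:
        raise ValueError("need more than 2 observations")
    b0, b1 = fit_line(x, y)
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    return math.sqrt(sse / (n - 2))


# Hypothetical data for illustration only
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
see = standard_error_of_estimate(x, y)
print(round(see, 4))
```

For this data the fitted line is Y = 2.2 + 0.6X, the SSE is 2.4, and SEE = sqrt(2.4 / 3), roughly 0.894.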
Interpreting the Coefficient of Determination (R²)
• R² measures the percentage of the variation in the dependent variable that can be explained by the independent variable
• An R² of 0.25 means the independent variable explains 25% of the variation in the dependent variable
Caution: You cannot conclude causation
Calculating the Coefficient of Determination (R²)
• For simple linear regression, R² is the correlation coefficient (r) squared
Example: Correlation coefficient between X and Y, r = 0.50
Coefficient of determination = 0.50² = 0.25
Coefficient of Determination (R²)
R² can also be calculated with SST and SSR:
SS Total = SS Regression + SS Error
Total variation = explained variation + unexplained variation
$$R^2 = \frac{\mathrm{SSR}}{\mathrm{SST}} = \frac{\mathrm{SST} - \mathrm{SSE}}{\mathrm{SST}} = 1 - \frac{\mathrm{SSE}}{\mathrm{SST}} = \frac{\text{explained variation}}{\text{total variation}}$$
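The decomposition of total variation can be verified numerically. The sketch below is illustrative only: the function name `variation_decomposition` and the data are hypothetical. It fits a simple least squares line, computes the three sums of squares, and compares R² with the squared correlation coefficient.

```python
import math


def variation_decomposition(x, y):
    """Return (SST, SSR, SSE) for a simple least squares regression of y on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    b1 = sxy / sxx
    b0 = mean_y - b1 * mean_x
    fitted = [b0 + b1 * xi for xi in x]
    sst = sum((yi - mean_y) ** 2 for yi in y)                # total variation
    ssr = sum((fi - mean_y) ** 2 for fi in fitted)           # explained variation
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))   # unexplained variation
    return sst, ssr, sse


# Hypothetical data for illustration only
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
sst, ssr, sse = variation_decomposition(x, y)
r_squared = ssr / sst
r_squared_alt = 1 - sse / sst

# Sample correlation coefficient, to compare r squared with R squared
mean_x = sum(x) / len(x)
mean_y = sum(y) / len(y)
sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
sxx = sum((xi - mean_x) ** 2 for xi in x)
r = sxy / math.sqrt(sxx * sst)
print(round(r_squared, 4), round(r_squared_alt, 4), round(r ** 2, 4))
```

For this data SST = 6, SSR = 3.6 and SSE = 2.4, so all three printed values are 0.6.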