Simple Linear Regression and Correlation by Asst. Prof. Dr. Min Aung
Mar 30, 2015
When to use simple linear regression (SLR)?
• Study a relationship between two variables
• Paired-Samples or matched data
• Interval or ratio level measurement
Independent and dependent variables
• You want to guess, estimate, or compute the values of the dependent variable.
• To estimate them, you use the values of the independent variable.
Predictor and Predicted variables
• Predictor = independent variable.
• Predicted variable = dependent variable.
Scatter Diagram
• X-axis = independent variable.
• Y-axis = dependent variable.
• Each pair of data = a point (x, y)
[Figure: scatter diagram with X and Y axes showing the point (2, 3) plotted at x = 2, y = 3]
Purpose of Drawing Scatter Diagram
• Is there a linear relationship between the two variables X and Y?
• Linear relationship = Scatter points (roughly at least) form the shape of a straight line.
[Figure: two scatter diagrams of Y against X, labeled "Linear relationship" and "No linear relationship"]
Measuring Strength of Linear Relationship
• Pearson’s coefficient of correlation r
• Formula (2) (not used in the exam; for reference only)
• Calculator work for the Casio fx-350MS
Switch the calculator on.
1. Set the calculator to LR (Linear Regression) mode:
Press Mode.
Press 3 for Reg (Regression).
Press 1 for Linear.
2. Check n (to see whether old data are still stored):
Press Shift 1, next 3, and then =.
Calculator Work for r
3. Enter Data in Pairs:
x-value , y-value M+
x-value , y-value M+
x-value , y-value M+
4. Check n again: see step 2 above.
5. Press Shift 2, move to the right with the arrow key, press 3 for r, and then press =.
The display now shows the value of r.
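The value of r from the calculator can also be checked in software. The following is a minimal sketch in Python, assuming the usual deviation-from-the-mean form of Pearson's formula (Formula (2) itself is not reproduced in these slides, so this form is an assumption). The function name pearson_r and the data values are hypothetical, made up for illustration.

```python
import math


def pearson_r(xs, ys):
    """Return Pearson's coefficient of correlation r for paired data."""
    if len(xs) != len(ys):
        raise ValueError("x and y must contain the same number of values")
    n = len(xs)  # number of pairs, the value the calculator shows as n
    if n < 2:
        raise ValueError("at least two pairs are needed")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of products and squares of the deviations from the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("r is undefined when one variable is constant")
    return sxy / math.sqrt(sxx * syy)


# Hypothetical paired data, for illustration only.
x_values = [1, 2, 3, 4, 5]
y_values = [2, 4, 5, 4, 5]

r = pearson_r(x_values, y_values)
print(f"n = {len(x_values)}, r = {r:.4f}")  # prints n = 5, r = 0.7746
```

Entering the same five pairs on the calculator should give the same r, up to rounding.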
Interpretation of r (Direct linear relationship)
1. If r is 1 or – 1, then all scatter points are on a straight line.
2. If r is 1, all points are on a straight line with a positive slope.
3. If r is -1, all points are on a straight line with a negative slope.
4. If a straight line has a positive slope, it rises up to the right.
5. On a straight line with a positive slope, y increases as x increases for the points (x, y) on it.
[Figure: a rising line running from (small x, small y) to (large x, large y)]
6. In this situation, we say that the two variables X and Y are directly or positively correlated.
Interpretation of r (Inverse linear relationship)
1. If r is -1, all points are on a straight line with a negative slope.
2. On a straight line with a negative slope, y decreases as x increases for the points (x, y) on it.
[Figure: a falling line running from (small x, large y) to (large x, small y)]
3. In this situation, we say that the two variables X and Y are inversely or negatively correlated.
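The two perfect cases can be checked numerically. The sketch below builds points that lie exactly on a rising line and on a falling line and computes r for each. The helper pearson_r and the two line equations are hypothetical choices for illustration, and the helper assumes the usual deviation-from-the-mean form of Pearson's formula.

```python
import math


def pearson_r(xs, ys):
    """Pearson's r from deviations about the means (illustrative helper)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


xs = [1, 2, 3, 4, 5]
rising = [2 * x + 1 for x in xs]     # points on a line with positive slope
falling = [-3 * x + 10 for x in xs]  # points on a line with negative slope

r_rising = pearson_r(xs, rising)
r_falling = pearson_r(xs, falling)

print("positive slope:", round(r_rising, 6))   # direct correlation
print("negative slope:", round(r_falling, 6))  # inverse correlation
```

The sign of r follows the sign of the slope; the steepness of the line does not affect r.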
Interpretation of r (strength)
1. If r is not exactly 1 or -1 but is 0.9 or -0.9, the points lie around a straight line; they are close to a straight-line shape.
2. If r is 0.8 or -0.8, the points are still close to a straight-line shape, but not as close as with 0.9 or -0.9.
3. Thus, the closer r is to 1 or -1, the closer the points are to a straight-line shape.
4. Likewise, the closer r is to 0, the farther the points are from a straight-line shape.
5. In terms of strength, an r of 0.9 (or -0.9) indicates a stronger linear relationship than an r of 0.8 (or -0.8).
Interpretation of r (strength)
Values of r:
• r = 0: no linear relationship
• r = 0.5 or -0.5: weak linear relationship
• r close to 1 or -1: strong linear relationship
• r = 1 or -1: perfect linear relationship
Testing Linear Relationship
1. Pearson invented a formula to measure the strength and direction of a linear relationship between two variables.
2. The number given by his formula is called the correlation coefficient. We call it Pearson's coefficient of correlation.
3. We write r for this value in a sample, and we write ρ (rho) for this value in a population.
4. Testing whether the correlation is significant means using sample data to judge whether there is a correlation, in the population, between the two variables under consideration.
Null and Alternate Hypothesis
1. Test correlation: H0: ρ = 0 and Ha: ρ ≠ 0
2. Test direct correlation: H0: ρ ≤ 0 and Ha: ρ > 0
3. Test inverse correlation: H0: ρ ≥ 0 and Ha: ρ < 0
4. Test positive correlation: H0: ρ ≤ 0 and Ha: ρ > 0
5. Test negative correlation: H0: ρ ≥ 0 and Ha: ρ < 0
Three types of test
1. H0: ρ = 0 and Ha: ρ ≠ 0 (two-tailed test)
2. H0: ρ ≥ 0 and Ha: ρ < 0 (left-tailed test)
3. H0: ρ ≤ 0 and Ha: ρ > 0 (right-tailed test)
Critical value
1. Read t table.
2. Degrees of freedom (Df) = n - 2
3. n = number of pairs of data
4. Right-tailed test: positive sign
5. Left-tailed test: negative sign
6. Two-tailed test: both positive and negative signs
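The steps above can be sketched in Python as follows. The sketch assumes the magnitude of the critical value has already been read from a t table; it computes the degrees of freedom and attaches the sign that matches the type of test. The function name signed_critical_values and the sample size are hypothetical, chosen for illustration.

```python
def signed_critical_values(table_value, tail):
    """Attach the sign(s) to a critical value read from the t table.

    table_value: positive number read from the t table
    tail: "right", "left", or "two"
    Returns a tuple holding one or two critical values.
    """
    if table_value <= 0:
        raise ValueError("the t table value must be positive")
    if tail == "right":
        return (table_value,)
    if tail == "left":
        return (-table_value,)
    if tail == "two":
        return (-table_value, table_value)
    raise ValueError("tail must be 'right', 'left', or 'two'")


n = 10      # hypothetical number of pairs of data
df = n - 2  # degrees of freedom

# t table values for df = 8: 2.306 (two-tailed, alpha = 0.05)
# and 1.860 (one-tailed, alpha = 0.05).
two_tailed = signed_critical_values(2.306, "two")
left_tailed = signed_critical_values(1.860, "left")
right_tailed = signed_critical_values(1.860, "right")

print("df =", df)
print("two-tailed:", two_tailed)
print("left-tailed:", left_tailed)
print("right-tailed:", right_tailed)
```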
Test Statistic
1. Test statistic = Strength of evidence supporting alternate hypothesis Ha
2. The original test statistic for testing the correlation is r.
3. Convert r to t by Formula (10).
4. Learn to compute t correctly on your calculator.
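The conversion from r to t can be sketched in Python. Formula (10) is not reproduced in these slides, so the sketch assumes it is the standard conversion t = r·√(n − 2) / √(1 − r²), which goes with n − 2 degrees of freedom. The function name t_from_r and the values of r and n are hypothetical.

```python
import math


def t_from_r(r, n):
    """Convert a sample correlation r into a t statistic with n - 2 df.

    Assumes the standard conversion t = r * sqrt(n - 2) / sqrt(1 - r**2).
    """
    if n < 3:
        raise ValueError("at least three pairs are needed (df = n - 2 > 0)")
    if not -1 < r < 1:
        raise ValueError("t is undefined when r is exactly 1 or -1")
    return r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)


# Hypothetical sample: r = 0.8 computed from n = 10 pairs.
t = t_from_r(0.8, 10)
print(f"t = {t:.3f}")  # prints t = 3.771
```

A negative r gives a negative t of the same size, so the sign of t always matches the sign of r.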
Rejection region 1
• For a two-tailed test, the rejection region is to the right of the positive critical value and to the left of the negative critical value.
[Figure: t curve over the real number line for t values, with 0 at the center and rejection regions to the left of the negative critical value and to the right of the positive critical value. Total area of the two rejection regions = level of significance = probability = α]
Rejection region 2
• For a left-tailed test, the rejection region is to the left of the (negative) critical value.
[Figure: t curve over the real number line for t values, with the rejection region to the left of the (negative) critical value. Area of the rejection region = level of significance = probability = α]
Rejection region 3
• For a right-tailed test, the rejection region is to the right of the (positive) critical value.
[Figure: t curve over the real number line for t values, with the rejection region to the right of the (positive) critical value. Area of the rejection region = level of significance = probability = α]
Decision Rule
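• If the test statistic (TS) is in the rejection region, then reject H0; otherwise, fail to reject H0.
• Reject H0 = the sample gives enough evidence against H0, so we accept Ha.
• Fail to reject H0 = the sample does not give enough evidence to support Ha. This does not prove that H0 is true.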
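The decision rule, combined with the three rejection regions, can be sketched in Python. This is a minimal illustration that assumes the t statistic and the critical value are already known. The function name decide and the numbers passed to it are hypothetical.

```python
def decide(t, critical_value, tail):
    """Apply the decision rule for a left-, right-, or two-tailed t test.

    t: the test statistic
    critical_value: magnitude read from the t table (the sign is ignored)
    tail: "left", "right", or "two"
    """
    c = abs(critical_value)
    if tail == "two":
        in_rejection_region = t > c or t < -c
    elif tail == "left":
        in_rejection_region = t < -c
    elif tail == "right":
        in_rejection_region = t > c
    else:
        raise ValueError("tail must be 'left', 'right', or 'two'")
    return "Reject H0" if in_rejection_region else "Fail to reject H0"


# Hypothetical test statistics, with t table values for df = 8.
result_two = decide(3.771, 2.306, "two")    # 3.771 is right of 2.306
result_right = decide(1.200, 1.860, "right")  # 1.200 is left of 1.860
result_left = decide(-2.500, 1.860, "left")   # -2.500 is left of -1.860

print(result_two)
print(result_right)
print(result_left)
```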
Conclusion
• Conclusion = Decision
• The decision is the last step of the statistical procedure.
• Conclusion is the report to the one who asked the original question.