Statistics For Management Unit 12
Sikkim Manipal University 180
Unit 12 Simple Correlation & Regression

Structure
12.1 Introduction
Objectives
12.2 Correlation
12.2.1 Causation and Correlation
12.2.2 Types of Correlation
12.3 Measures of Correlation
12.3.1 Scatter Diagram
12.3.2 Karl Pearson’s Correlation Coefficient
12.3.3 Properties of Karl Pearson’s Correlation Coefficient
12.3.4 Factors Influencing the Size of Correlation Coefficient
12.4 Problems
12.5 Probable Error
12.6 Spearman’s Rank Correlation Coefficient
12.7 Partial Correlation
12.8 Multiple Correlation
12.9 Regression
12.9.1 Regression Analysis
12.9.2 Regression Lines
12.9.3 About Regression Coefficient
12.9.4 Differences Between Correlation Coefficient and Regression Coefficient
12.9.5 Examples
12.10 Standard Error Of Estimate
12.11 Multiple Regression Analysis
12.12 Reliability of Estimates
12.13 Application of Multiple Regression
Self Assessment Questions
12.14 Summary
Terminal Questions
Answers to SAQs and TQs
12.1 Introduction

Both correlation and regression are used to measure the strength of relationships between variables. The following statistical tools measure the relationship between the variables analysed in social science research:

1. Correlation
   a. Simple correlation – the relationship between two variables is studied.
   b. Partial correlation – the relationship between any two variables is studied, keeping all other variables constant.
   c. Multiple correlation – the relationship between three or more variables is studied simultaneously.
2. Regression
   a. Simple regression
   b. Multiple regression
3. Association of Attributes

Correlation measures the relationship (positive or negative, perfect or imperfect) between two variables. Regression analysis establishes the relationship between variables and estimates the value of one variable from a known value of another. Association of Attributes attempts to ascertain the extent of association between two attributes.

Learning Objectives

In this unit students will learn about:
1. Simple, partial and multiple correlation
2. Parametric and non-parametric measures of correlation
3. The method of estimating unknown values from known values through regression equations
12.2 Correlation
When two or more variables move in sympathy with each other, they are said to be correlated. If both variables move in the same direction, they are said to be positively correlated. If the variables move in opposite directions, they are said to be negatively correlated. If they move haphazardly, there is no correlation between them.
Correlation analysis deals with
1) Measuring the relationship between variables.
2) Testing the relationship for its significance.
3) Giving confidence interval for population correlation measure.
12.2.1 Causation and Correlation

The correlation between two variables may arise from the following causes:
i) A small sample size – correlation may be present in the sample but not in the population.
ii) A third factor – correlation between the yields of rice and tea may be due to a third factor, "rain".
12.2.2 Types of Correlation

The types of correlation are given below:
a. Positive or Negative
b. Simple, Partial and Multiple
c. Linear and Nonlinear
Positive correlation: Both the variables (X and Y) will vary in the same direction. If variable X
increases, variable Y also will increase; if variable X decreases, variable Y also will decrease.
Negative Correlation: The given variables vary in opposite directions. If one variable increases, the other decreases.
Simple, Partial and Multiple correlation: In simple correlation, the relationship between two variables is studied. In partial and multiple correlation, three or more variables are involved. In multiple correlation, three or more variables are studied simultaneously. In partial correlation, more than two variables are involved, but the relationship between two of them is studied while the effect of the others is kept constant.
Linear and Non-linear correlation: This classification depends on the constancy of the ratio of change between the variables. In linear correlation, the amount of change in one variable bears a constant ratio to the amount of change in the other, so the points lie along a straight line. This is not so in non-linear correlation.
12.3 Measures of Correlation

i) Scatter Diagram
ii) Karl Pearson's correlation coefficient
iii) Spearman's Rank correlation coefficient

12.3.1 Scatter Diagram

The ordered pairs of observed values are plotted on the x-y plane as dots; therefore it is also known as a Dot Diagram. It is a diagrammatic representation of the relationship.
If the dots lie exactly on a straight line that runs from left bottom to right top, then the variables are said to be perfectly positively correlated (fig. i).
If the dots lie close to a straight line that runs from left bottom to right top, then the variables are
said to be positively correlated (fig.ii).
If the dots lie exactly on a straight line that runs from left top to right bottom then the variables are
said to be perfectly negatively correlated (fig iii).
If the dots lie very close to a straight line that runs from left top to right bottom then the variables
are said to be negatively correlated (fig iv).
If the dots lie all over the graph paper then the variables have zero correlation (fig v).
A scatter diagram tells us the direction in which the variables are related, but it does not give any quantitative measure for comparison between sets of data.
12.3.2 Karl Pearson's Correlation Coefficient

It is defined as

i) r = ∑xy / (N σx σy)     (A)

where x = X – X̄ and y = Y – Ȳ

(Figures i–v: scatter diagrams showing perfect positive, positive, perfect negative, negative and zero correlation respectively.)
σx² = ∑(x – x̄)² / n,   σy² = ∑(y – ȳ)² / n

where n is the number of paired observations. ∑xy / N is called the covariance of x and y. The other forms of this formula are:

ii) r = ∑xy / √{(∑x²)(∑y²)}     (B)

iii) r = [N∑XY – ∑X∑Y] / [{N∑X² – (∑X)²}^1/2 {N∑Y² – (∑Y)²}^1/2]     (C)

iv) r = [N∑dxdy – ∑dx∑dy] / [{N∑dx² – (∑dx)²}^1/2 {N∑dy² – (∑dy)²}^1/2]     (D)

where dx and dy are deviations from assumed means. For all practical purposes we can conveniently use form D. Whenever summary information is given, choose the proper form from A to C.
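Since forms A, C and D are algebraic rearrangements of one another, they must give the same value of r on the same data. A minimal Python sketch, using a small made-up data set (not from this unit), that checks this:

```python
import math

# Hypothetical data set, for illustration only
X = [1, 2, 3, 4, 5]
Y = [2, 4, 5, 4, 5]
N = len(X)

# Form A: r = sum(xy) / (N * sigma_x * sigma_y), with x, y deviations from the means
mx, my = sum(X) / N, sum(Y) / N
x = [xi - mx for xi in X]
y = [yi - my for yi in Y]
sigma_x = math.sqrt(sum(xi**2 for xi in x) / N)
sigma_y = math.sqrt(sum(yi**2 for yi in y) / N)
r_A = sum(xi * yi for xi, yi in zip(x, y)) / (N * sigma_x * sigma_y)

# Form C: raw-score formula
num = N * sum(xi * yi for xi, yi in zip(X, Y)) - sum(X) * sum(Y)
den = math.sqrt(N * sum(xi**2 for xi in X) - sum(X)**2) * \
      math.sqrt(N * sum(yi**2 for yi in Y) - sum(Y)**2)
r_C = num / den

# Form D: deviations dx, dy from assumed means A0, B0 (any values work)
A0, B0 = 3, 4
dx = [xi - A0 for xi in X]
dy = [yi - B0 for yi in Y]
num = N * sum(a * b for a, b in zip(dx, dy)) - sum(dx) * sum(dy)
den = math.sqrt(N * sum(a**2 for a in dx) - sum(dx)**2) * \
      math.sqrt(N * sum(b**2 for b in dy) - sum(dy)**2)
r_D = num / den

print(r_A, r_C, r_D)  # all three values agree
```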
12.3.3 Properties of Karl Pearson's Correlation Coefficient

§ Its value always lies between –1 and +1.
§ It is not affected by a change of origin or a change of scale.
§ It is a relative measure (it does not have any unit attached to it).
12.3.4 Factors Influencing the Size of Correlation Coefficient

The size of r is very much dependent upon the variability of measured values in the correlation
sample. The greater the variability, the higher will be the correlation, everything else being equal.
The size of r is altered when researchers select extreme groups of subjects in order to compare
these groups with respect to certain behaviors. Selecting extreme groups on one variable
increases the size of r over what would be obtained with more random sampling.
Combining two groups which differ in their mean values on one of the variables is not likely to
faithfully represent the true situation as far as the correlation is concerned.
Addition of an extreme case (and conversely dropping of an extreme case) can lead to changes
in the amount of correlation. Dropping of such a case leads to reduction in the correlation while
the converse is also true. (Source: Aggarwal.Y.P, Statistical Methods, Sterling Publishers Pvt
Ltd., New Delhi, 1998, p.131).
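The effect of an extreme case can be illustrated numerically. A small Python sketch with made-up data (not from this unit): adding a single extreme observation to a modest sample noticeably inflates r.

```python
import math

def pearson_r(X, Y):
    """Karl Pearson's r by the raw-score formula (form C above)."""
    N = len(X)
    num = N * sum(x * y for x, y in zip(X, Y)) - sum(X) * sum(Y)
    den = math.sqrt(N * sum(x**2 for x in X) - sum(X)**2) * \
          math.sqrt(N * sum(y**2 for y in Y) - sum(Y)**2)
    return num / den

X = [1, 2, 3, 4, 5]
Y = [2, 4, 5, 4, 5]
r_before = pearson_r(X, Y)               # moderate correlation
r_after = pearson_r(X + [20], Y + [20])  # one extreme case added
print(round(r_before, 3), round(r_after, 3))
```

Dropping the extreme case reverses the effect, which is the point made in the text above.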
12.4 Problems

Example 1: Find Karl Pearson's Correlation Coefficient, given

X: 20 16 12 8 4
Y: 22 14 4 12 8

Solution:

X     Y     X²     Y²     XY
20    22    400    484    440
16    14    256    196    224
12     4    144     16     48
 8    12     64    144     96
 4     8     16     64     32
∑X = 60   ∑Y = 60   ∑X² = 880   ∑Y² = 904   ∑XY = 840
Applying formula (C) and substituting the respective values from the above table, we get:

r = [N∑XY – ∑X∑Y] / [{N∑X² – (∑X)²}^1/2 {N∑Y² – (∑Y)²}^1/2]

= [5(840) – (60)(60)] / [√{5(880) – (60)²} √{5(904) – (60)²}]

= 600 / (√800 √920)

r = 0.70

Example 2: Calculate Karl Pearson's Coefficient of Correlation from the following data:

Year                   1985  1986  1987  1988  1989  1990  1991  1992
Index of Production     100   102   104   107   105   112   103    99
Number of unemployed     15    12    13    11    12    12    19    26
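As a quick check on Example 1, the same raw-score formula can be evaluated in Python (a sketch; the function name is our own):

```python
import math

def pearson_r(X, Y):
    """Karl Pearson's r by the raw-score formula (form C)."""
    N = len(X)
    num = N * sum(x * y for x, y in zip(X, Y)) - sum(X) * sum(Y)
    den = math.sqrt(N * sum(x**2 for x in X) - sum(X)**2) * \
          math.sqrt(N * sum(y**2 for y in Y) - sum(Y)**2)
    return num / den

X = [20, 16, 12, 8, 4]
Y = [22, 14, 4, 12, 8]
print(round(pearson_r(X, Y), 2))  # 0.7, matching the hand computation
```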
Solution:

Year   Index of        x = X – X̄   x²     No. of          y = Y – Ȳ   y²     xy
       Production X                       unemployed Y
1985   100             –4           16    15               0            0       0
1986   102             –2            4    12              –3            9      +6
1987   104              0            0    13              –2            4       0
1988   107             +3            9    11              –4           16     –12
1989   105             +1            1    12              –3            9      –3
1990   112             +8           64    12              –3            9     –24
1991   103             –1            1    19              +4           16      –4
1992    99             –5           25    26             +11          121     –55
∑X = 832   ∑x = 0   ∑x² = 120   ∑Y = 120   ∑y = 0   ∑y² = 184   ∑xy = –92

X̄ = 104,  Ȳ = 15

r = ∑xy / √{(∑x²)(∑y²)} = –92 / √(120 × 184) = –0.619

Therefore the correlation between production and unemployment is negative.
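Example 2 uses the deviation-from-actual-mean method (form B). A Python sketch reproducing it (the function name is our own):

```python
import math

def pearson_r_deviation(X, Y):
    """r = sum(xy) / sqrt(sum(x^2) * sum(y^2)), with x, y deviations from the actual means."""
    N = len(X)
    mx, my = sum(X) / N, sum(Y) / N
    x = [xi - mx for xi in X]
    y = [yi - my for yi in Y]
    return sum(a * b for a, b in zip(x, y)) / \
           math.sqrt(sum(a**2 for a in x) * sum(b**2 for b in y))

production = [100, 102, 104, 107, 105, 112, 103, 99]
unemployed = [15, 12, 13, 11, 12, 12, 19, 26]
print(round(pearson_r_deviation(production, unemployed), 3))  # -0.619
```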
Example 3: Calculate the Correlation Coefficient from the following data:

X: 50 60 58 47 49 33 65 43 46 68
Y: 48 65 50 48 55 58 63 48 50 70

Solution:

X     dx = X – 50   dx²     Y     dy = Y – 55   dy²     dxdy
50      0              0    48     –7            49        0
60    +10            100    65    +10           100     +100
58     +8             64    50     –5            25      –40
47     –3              9    48     –7            49      +21
49     –1              1    55      0             0        0
33    –17            289    58     +3             9      –51
65    +15            225    63     +8            64     +120
43     –7             49    48     –7            49      +49
46     –4             16    50     –5            25      +20
68    +18            324    70    +15           225     +270
∑X = 519   ∑dx = +19   ∑dx² = 1077   ∑Y = 535   ∑dy = +5   ∑dy² = 595   ∑dxdy = 489

Using formula (D),

r = [N∑dxdy – ∑dx∑dy] / [{N∑dx² – (∑dx)²}^1/2 {N∑dy² – (∑dy)²}^1/2]

and substituting the values, we get r = 0.611.
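The assumed-mean shortcut of Example 3 can be sketched in Python (assumed means 50 and 55, as in the table; the helper name is our own):

```python
import math

def pearson_r_assumed_mean(X, Y, A, B):
    """Form D: r from deviations dx = X - A, dy = Y - B taken from assumed means A, B."""
    N = len(X)
    dx = [x - A for x in X]
    dy = [y - B for y in Y]
    num = N * sum(a * b for a, b in zip(dx, dy)) - sum(dx) * sum(dy)
    den = math.sqrt(N * sum(a**2 for a in dx) - sum(dx)**2) * \
          math.sqrt(N * sum(b**2 for b in dy) - sum(dy)**2)
    return num / den

X = [50, 60, 58, 47, 49, 33, 65, 43, 46, 68]
Y = [48, 65, 50, 48, 55, 58, 63, 48, 50, 70]
print(round(pearson_r_assumed_mean(X, Y, 50, 55), 3))  # 0.611
```

The result does not depend on the assumed means chosen; 50 and 55 merely keep the deviations small for hand computation.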
Example 4: In bivariate data on x and y, variance of x = 49, variance of y = 9 and covariance ∑xy/N = –17.5. Find the coefficient of correlation between x and y.

Solution: We know that

r = ∑xy / (N σx σy)

Given ∑xy/N = –17.5, σx = √49 = 7 and σy = √9 = 3,

r = –17.5 / (7 × 3) = –0.833

There is a high negative correlation.
Example 5: Ten observations in Weight (x) and Height (y) of a particular age group gave the
Example 8: Rank Difference Coefficient of Correlation (Case of No Ties)

Student   Score on   Score on   Rank on   Rank on   Difference        Difference
          Test I     Test II    Test I    Test II   between Ranks     squared
          X          Y          R1        R2        D                 D²
A         16          8         2         5         –3                 9
B         14         14         3         3          0                 0
C         18         12         1         4         –3                 9
D         10         16         4         2         +2                 4
E          2         20         5         1         +4                16
                                                    N = 5          ∑D² = 38

Applying the rank correlation formula, we get

ρ = 1 – 6∑D² / (N³ – N) = 1 – 6(38) / (5³ – 5) = 1 – 228/120 = –0.9

The relationship between scores on Test I and Test II is very high and inverse.
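The rank-difference computation of Example 8 can be sketched in Python (ranking helper and function names are our own; the ranking assumes no tied scores):

```python
def ranks_desc(scores):
    """Rank scores so the highest value gets rank 1 (no ties assumed)."""
    order = sorted(scores, reverse=True)
    return [order.index(s) + 1 for s in scores]

def spearman_rho(X, Y):
    """rho = 1 - 6*sum(D^2) / (N^3 - N), where D is the difference of ranks."""
    R1, R2 = ranks_desc(X), ranks_desc(Y)
    N = len(X)
    d2 = sum((a - b) ** 2 for a, b in zip(R1, R2))
    return 1 - 6 * d2 / (N**3 - N)

test1 = [16, 14, 18, 10, 2]
test2 = [8, 14, 12, 16, 20]
print(round(spearman_rho(test1, test2), 2))  # -0.9
```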
Example 9: The sales statistics of 6 sales representatives in two different localities are given below. Find whether there is a relationship between the buying habits of the people in the two localities.

Representative   1   2   3    4   5   6
Locality I      70  40  65  110  60  20
Locality II     70  30  80  100  90  20

Solution:

Representative   Sales in     R1   Sales in      R2   D = R1 – R2   D²
                 Locality I        Locality II
1                 70           2    70            4    –2            4
2                 40           5    30            5     0            0
3                 65           3    80            3     0            0
4                110           1   100            1     0            0
5                 60           4    90            2    +2            4
6                 20           6    20            6     0            0
                                                                ∑D² = 8

ρ = 1 – 6∑D² / (N³ – N) = 1 – 6(8) / {6(6² – 1)} = 1 – 8/35 = 0.771

There is a high positive correlation between the buying habits of the people in the two localities.
iii) When Ranks are Repeated

Example 10: Find the rank correlation coefficient for the following data.

Student            A   B   C   D   E   F   G   H   I   J
Score on Test I   20  30  22  28  32  40  20  16  14  18
Score on Test II  32  32  48  36  44  48  28  20  24  28
Student   Score on   Score on   Rank on   Rank on   Difference        Difference
          Test I     Test II    Test I    Test II   between Ranks     squared
          X          Y          R1        R2        D                 D²
A         20         32         6.5       5.5       +1.0               1.00
B         30         32         3         5.5       –2.5               6.25
C         22         48         5         1.5       +3.5              12.25
D         28         36         4         4          0                 0
E         32         44         2         3         –1.0               1.00
F         40         48         1         1.5       –0.5               0.25
G         20         28         6.5       7.5       –1.0               1.00
H         16         20         9        10         –1.0               1.00
I         14         24        10         9         +1.0               1.00
J         18         28         8         7.5       +0.5               0.25
                                          N = 10              ∑D² = 24
ρ = 1 – [6{∑D² + (m1³ – m1)/12 + (m2³ – m2)/12 + (m3³ – m3)/12 + (m4³ – m4)/12}] / (N³ – N)

where mi represents the number of times a rank is repeated.
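Example 10's tie-corrected computation can be sketched in Python (average ranks for tied scores; function names are our own). With ∑D² = 24 and four tied pairs (one on Test I, three on Test II), each pair contributes (2³ – 2)/12 = 0.5, so by the formula above ρ = 1 – 6(24 + 2)/990 ≈ 0.842.

```python
from collections import Counter

def average_ranks_desc(scores):
    """Average ranks, highest score ranked 1; tied scores share the mean of their positions."""
    order = sorted(scores, reverse=True)
    first = {}  # first (1-based) position of each score in the sorted order
    for pos, s in enumerate(order, start=1):
        first.setdefault(s, pos)
    counts = Counter(scores)
    # a score occupying positions p .. p+m-1 gets the average rank p + (m-1)/2
    return [first[s] + (counts[s] - 1) / 2 for s in scores]

def spearman_rho_ties(X, Y):
    """Tie-corrected rho = 1 - 6*(sum(D^2) + sum((m^3 - m)/12)) / (N^3 - N)."""
    R1, R2 = average_ranks_desc(X), average_ranks_desc(Y)
    N = len(X)
    d2 = sum((a - b) ** 2 for a, b in zip(R1, R2))
    cf = sum((m**3 - m) / 12 for m in Counter(X).values() if m > 1) + \
         sum((m**3 - m) / 12 for m in Counter(Y).values() if m > 1)
    return 1 - 6 * (d2 + cf) / (N**3 - N)

test1 = [20, 30, 22, 28, 32, 40, 20, 16, 14, 18]
test2 = [32, 32, 48, 36, 44, 48, 28, 20, 24, 28]
print(round(spearman_rho_ties(test1, test2), 3))  # 0.842
```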