UNIT-II
SKEWNESS, KURTOSIS, CORRELATION, REGRESSION
(i) Skewness: In a perfectly symmetrical distribution the mean, median and mode coincide. Skewness is a measure of the asymmetry of a statistical distribution. If a distribution is not symmetrical, we say that it is skewed.
(ii) Kurtosis: Kurtosis is a measure of the flatness or peakedness of a distribution.
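(iii) Pearson's coefficient of skewness $= \dfrac{\text{Mean} - \text{Mode}}{\sigma}$
(iv) When the mode is not well defined, Pearson's coefficient of skewness $= \dfrac{3(\text{Mean} - \text{Median})}{\sigma}$.
Bowley's formula for measuring skewness:
Bowley's coefficient of skewness $= \dfrac{Q_3 + Q_1 - 2M}{Q_3 - Q_1}$, where $Q_1$ and $Q_3$ are the lower and upper quartiles and $M$ is the median.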
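The three skewness formulas above can be sketched as small Python functions. This is only an illustrative sketch: the function names and the sample values are assumptions made for this example and do not come from the notes.

```python
def pearson_skewness_mode(mean, mode, sd):
    """Pearson's coefficient using the mode: (mean - mode) / sd."""
    return (mean - mode) / sd


def pearson_skewness_median(mean, median, sd):
    """Pearson's coefficient when the mode is ill defined: 3 * (mean - median) / sd."""
    return 3 * (mean - median) / sd


def bowley_skewness(q1, median, q3):
    """Bowley's quartile coefficient: (q3 + q1 - 2 * median) / (q3 - q1)."""
    return (q3 + q1 - 2 * median) / (q3 - q1)


if __name__ == "__main__":
    # Illustrative values chosen for this sketch (not taken from the notes).
    print(pearson_skewness_mode(mean=50.0, mode=44.0, sd=10.0))
    print(pearson_skewness_median(mean=50.0, median=48.0, sd=10.0))
    print(bowley_skewness(q1=20.0, median=28.0, q3=40.0))
```

A positive result indicates a longer right tail, a negative result a longer left tail.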
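1. In a distribution, mean = 65, median = 70 and the coefficient of skewness is -0.6. Find the coefficient of variation.
Solution: Coefficient of skewness $= \dfrac{3(\text{Mean} - \text{Median})}{\sigma}$
$-0.6 = \dfrac{3(65 - 70)}{\sigma}$
$\sigma = \dfrac{-15}{-0.6} = 25$
Coefficient of variation $= \dfrac{\sigma}{\text{Mean}} \times 100 = \dfrac{25}{65} \times 100 = 38.46\%$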
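2. In a distribution the sum of the two quartiles is 78.2, their difference is 14.3 and the median is 35.7. Find the coefficient of skewness.
Solution: Given $Q_3 + Q_1 = 78.2$, $Q_3 - Q_1 = 14.3$, median $M = 35.7$
Coefficient of skewness $= \dfrac{Q_3 + Q_1 - 2M}{Q_3 - Q_1} = \dfrac{78.2 - 2(35.7)}{14.3} = \dfrac{6.8}{14.3} = 0.4755$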
3. Pearson's coefficient of skewness is -0.7 and the values of the median and standard deviation are 12.8 and 6 respectively. Estimate the value of the mean.
Solution: Pearson's coefficient of skewness = -0.7, Median = 12.8, S.D. = 6
Coefficient of skewness $= \dfrac{3(\text{Mean} - \text{Median})}{\sigma}$
$-0.7 = \dfrac{3(\text{Mean} - 12.8)}{6}$
$-1.4 = \text{Mean} - 12.8$
Mean = 12.8 - 1.4 = 11.4
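4. In a frequency distribution, the coefficient of skewness based upon quartiles is 0.6. If the sum of the upper and lower quartiles is 100 and the median is 38, estimate the value of the upper quartile.
Solution: Given S.K. = 0.6, $Q_3 + Q_1 = 100$, $M = 38$
S.K. $= \dfrac{Q_3 + Q_1 - 2M}{Q_3 - Q_1}$
$0.6 = \dfrac{100 - 2(38)}{Q_3 - Q_1}$
$Q_3 - Q_1 = \dfrac{24}{0.6} = 40$ ---(1)
$Q_3 + Q_1 = 100$ ---(2)
Adding (1) and (2): $2Q_3 = 140$, so $Q_3 = 70$.
5. Find the coefficient of skewness if the difference between the two quartiles is 8, the sum of the two quartiles is 22 and the median is 10.5.
Solution: Given $Q_3 + Q_1 = 22$, $Q_3 - Q_1 = 8$, $M = 10.5$
S.K. $= \dfrac{Q_3 + Q_1 - 2M}{Q_3 - Q_1} = \dfrac{22 - 2(10.5)}{8} = \dfrac{1}{8} = 0.125$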
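6. Calculate the coefficient of variation if Karl Pearson's coefficient of skewness is 0.42, the mean is 86 and the median is 80.
Solution: Given Pearson's coefficient of skewness = 0.42, Mean = 86, Median = 80.
S.K. $= \dfrac{3(\text{Mean} - \text{Median})}{\sigma}$
$\Rightarrow 0.42 = \dfrac{3(86 - 80)}{\sigma}$
$\Rightarrow \sigma = \dfrac{18}{0.42} = 42.857$
Coefficient of variation $= \dfrac{\sigma}{\text{Mean}} \times 100 = \dfrac{42.857}{86} \times 100 = 49.83\%$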
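Problems 1 and 6 both rearrange Pearson's median-based formula to recover the standard deviation and then compute the coefficient of variation. A minimal sketch of that rearrangement follows; the function names are hypothetical, and the inputs are the values of problem 6.

```python
def sd_from_pearson_skewness(mean, median, skewness):
    """Solve skewness = 3 * (mean - median) / sd for sd."""
    if skewness == 0:
        raise ValueError("sd cannot be recovered when the skewness is zero")
    return 3 * (mean - median) / skewness


def coefficient_of_variation(mean, sd):
    """Coefficient of variation in percent: sd / mean * 100."""
    return sd / mean * 100


# Values from problem 6: skewness 0.42, mean 86, median 80.
sd = sd_from_pearson_skewness(mean=86, median=80, skewness=0.42)
cv = coefficient_of_variation(mean=86, sd=sd)
print(round(sd, 3), round(cv, 2))
```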
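7. The first four central moments of a distribution are 0, 2.5, 0.7 and 18.75. Write the skewness and kurtosis of the distribution.
Solution: The coefficient of skewness is given by
$\beta_1 = \dfrac{\mu_3^2}{\mu_2^3} = \dfrac{(0.7)^2}{(2.5)^3} = \dfrac{0.49}{15.625} = 0.0314$
Since $\mu_3$ is positive, the distribution is positively skewed.
The measure of kurtosis is given by
$\beta_2 = \dfrac{\mu_4}{\mu_2^2} = \dfrac{18.75}{(2.5)^2} = \dfrac{18.75}{6.25} = 3$
Since $\beta_2 = 3$, the distribution is mesokurtic, that is, its kurtosis equals that of the normal distribution.
8. The Karl Pearson's coefficient of skewness of a distribution is 0.32, its standard deviation is 6.5 and the mean is 29.6. Calculate the mode and the median. (L3)
Solution: Given S.K. = 0.32, $\sigma = 6.5$, Mean = 29.6
S.K. $= \dfrac{3(\text{Mean} - \text{Median})}{\sigma}$
$\Rightarrow 0.32 = \dfrac{3(29.6 - \text{Median})}{6.5}$
$\Rightarrow 0.32 \times 6.5 = 88.8 - 3\,\text{Median}$
$\Rightarrow 3\,\text{Median} = 88.8 - 2.08 = 86.72$
$\Rightarrow \text{Median} = 28.91$
Mode $= 3\,\text{Median} - 2\,\text{Mean} = 86.72 - 59.2 = 27.52$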
$\mu_4 = 221.84 - 21.2152 + 12.404 - 0.1341 = 212.89$
Measure of kurtosis based on moments: $\beta_2 = \dfrac{\mu_4}{\mu_2^2} = 2.33$
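The moment-based measures used above, $\beta_1 = \mu_3^2 / \mu_2^3$ and $\beta_2 = \mu_4 / \mu_2^2$, can be sketched in Python as follows. The helper names and the sample inputs are assumptions for illustration only; central moments are computed with divisor n.

```python
def central_moment(data, k):
    """k-th central moment of the data (divisor n)."""
    n = len(data)
    mean = sum(data) / n
    return sum((x - mean) ** k for x in data) / n


def moment_coefficients(mu2, mu3, mu4):
    """Return (beta1, beta2) from the second, third and fourth central moments."""
    beta1 = mu3 ** 2 / mu2 ** 3   # skewness measure
    beta2 = mu4 / mu2 ** 2        # kurtosis measure (3 for a normal curve)
    return beta1, beta2


if __name__ == "__main__":
    # Illustrative central moments chosen for this sketch.
    print(moment_coefficients(mu2=2.0, mu3=1.0, mu4=12.0))

    # The same measures starting from hypothetical raw observations.
    sample = [1, 2, 3, 4, 5]
    mu2, mu3, mu4 = (central_moment(sample, k) for k in (2, 3, 4))
    print(moment_coefficients(mu2, mu3, mu4))
```

Because $\beta_1$ squares $\mu_3$, the direction of skewness has to be read from the sign of $\mu_3$ itself.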
CORRELATION
Correlation: Let X and Y be two random variables. Correlation is a measure of the covariability of X and Y that takes the variances of X and Y into account.
Correlation coefficient: Let X and Y be two random variables. The correlation coefficient, denoted by $r(X, Y)$, is defined by
$r(X, Y) = \dfrac{\operatorname{Cov}(X, Y)}{\sqrt{\operatorname{Var}(X)}\,\sqrt{\operatorname{Var}(Y)}}$
Types of correlation: (i) Positive and negative (ii) Simple, partial and multiple (iii) Linear and non-linear.
Lines of regression
Regression is a mathematical measure of the average relationship between two or more variables in terms of the original units of the data.
The line of regression of y on x is given by
$y - \bar{y} = r\,\dfrac{\sigma_y}{\sigma_x}\,(x - \bar{x})$.
The line of regression of x on y is given by
$x - \bar{x} = r\,\dfrac{\sigma_x}{\sigma_y}\,(y - \bar{y})$.
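Regression coefficients: The slopes $b_{yx} = r\,\dfrac{\sigma_y}{\sigma_x}$ and $b_{xy} = r\,\dfrac{\sigma_x}{\sigma_y}$ of the two lines of regression are the regression coefficients of y on x and of x on y respectively.
Covariance: A measure of association between two random variables, obtained as the expected value of the product of their deviations from their means; that is, $\operatorname{Cov}(X, Y) = E(XY) - E(X)\,E(Y)$.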
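The identity $\operatorname{Cov}(X, Y) = E(XY) - E(X)E(Y)$ can be checked numerically against the definition as the mean product of deviations. The sketch below treats a small, made-up data set as the whole population (divisor n); the function names and the data are assumptions for illustration.

```python
def covariance_by_deviations(xs, ys):
    """Mean of the products of deviations from the means."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n


def covariance_by_shortcut(xs, ys):
    """E(XY) - E(X) * E(Y), with expectations taken as plain averages."""
    n = len(xs)
    mean_xy = sum(x * y for x, y in zip(xs, ys)) / n
    return mean_xy - (sum(xs) / n) * (sum(ys) / n)


# Hypothetical paired observations.
xs = [1, 2, 3, 4]
ys = [2, 4, 5, 9]
print(covariance_by_deviations(xs, ys), covariance_by_shortcut(xs, ys))
```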
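1. If the two regression coefficients are 0.8 and 0.6, find the coefficient of correlation. (L1)
Solution: Given $b_{yx} = 0.8$, $b_{xy} = 0.6$
$r^2 = b_{yx}\,b_{xy} = (0.8)(0.6) = 0.48$
$r = \sqrt{0.48} = 0.693$ (positive, since both regression coefficients are positive)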
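The relation $r^2 = b_{yx}\,b_{xy}$ gives only the magnitude of $r$; its sign is the common sign of the two regression coefficients. A minimal sketch, with a hypothetical function name:

```python
import math


def correlation_from_regression(b_yx, b_xy):
    """Return r with r**2 = b_yx * b_xy and the sign shared by both coefficients."""
    product = b_yx * b_xy
    if product < 0:
        raise ValueError("the two regression coefficients must have the same sign")
    if product > 1:
        raise ValueError("the product of the regression coefficients cannot exceed 1")
    r = math.sqrt(product)
    return -r if b_yx < 0 else r


print(round(correlation_from_regression(0.8, 0.6), 3))
```

The two checks reflect that $r^2$ must lie between 0 and 1; inputs that violate them cannot be a valid pair of regression coefficients.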
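2. The two regression equations of the variables X and Y are X = 19.13 - 0.87Y and … . Find the correlation coefficient between X and Y. (L1)
Solution: Given that the regression equation of X on Y is X = 19.13 - 0.87Y, the regression coefficient of X on Y is $b_{xy} = -0.87$.
The regression equation of Y on X is … ; the regression coefficient of Y on X is $b_{yx}$ = … .
The correlation coefficient between X and Y is given by
$r = \pm\sqrt{b_{xy}\,b_{yx}}$, taking the common sign of the two regression coefficients.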
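3. Calculate the coefficient of correlation between x and y from the following data. (L3)
x: 1 3 5 8 9 10
y: 3 4 8 10 12 11
Solution: Here $\bar{x} = \dfrac{\sum x}{n} = \dfrac{36}{6} = 6$ and $\bar{y} = \dfrac{\sum y}{n} = \dfrac{48}{6} = 8$. Write dx = x - 6 and dy = y - 8.
x      y      dx     dy     dx²    dy²    dx·dy
1      3      -5     -5     25     25     25
3      4      -3     -4     9      16     12
5      8      -1     0      1      0      0
8      10     2      2      4      4      4
9      12     3      4      9      16     12
10     11     4      3      16     9      12
Total: 36     48     0      0      64     70     65
$r = \dfrac{\sum (x - \bar{x})(y - \bar{y})}{\sqrt{\sum (x - \bar{x})^2}\,\sqrt{\sum (y - \bar{y})^2}} = \dfrac{65}{\sqrt{64}\,\sqrt{70}} = 0.971$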
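The deviation-table method of problem 3 can be sketched in a few lines of Python. The function name `pearson_r` is an assumption for this example; the data are those of problem 3.

```python
import math


def pearson_r(xs, ys):
    """Correlation coefficient from sums of squared and cross deviations."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Data of problem 3.
x = [1, 3, 5, 8, 9, 10]
y = [3, 4, 8, 10, 12, 11]
print(round(pearson_r(x, y), 3))
```

The three sums computed inside the function correspond to the last three column totals of the table.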
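4. Calculate the coefficient of correlation between x and y. (L3)
x: 1 2 3 4 5 6 7 8 9
y: 12 11 13 15 14 17 16 19 18
Solution: Here $\bar{x} = \dfrac{45}{9} = 5$ and $\bar{y} = \dfrac{135}{9} = 15$. Write dx = x - 5 and dy = y - 15.
x      y      dx     dy     dx²    dy²    dx·dy
1      12     -4     -3     16     9      12
2      11     -3     -4     9      16     12
3      13     -2     -2     4      4      4
4      15     -1     0      1      0      0
5      14     0      -1     0      1      0
6      17     1      2      1      4      2
7      16     2      1      4      1      2
8      19     3      4      9      16     12
9      18     4      3      16     9      12
Total: 45     135    0      0      60     60     56
$r = \dfrac{\sum (x - \bar{x})(y - \bar{y})}{\sqrt{\sum (x - \bar{x})^2}\,\sqrt{\sum (y - \bar{y})^2}} = \dfrac{56}{\sqrt{60}\,\sqrt{60}} = \dfrac{56}{60} = 0.933$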
5. Ten competitors in a musical test were ranked by 3 judges X, Y, Z in the following order. (L2)
Competitor: A B C D E F G H I J
Rank by X: 1 6 5 10 3 2 4 9 7 8
Rank by Y: 3 5 8 4 7 10 2 1 6 9
Rank by Z: 6 4 9 8 1 2 3 10 5 7
Using the rank correlation method, discuss which pair of judges has the nearest approach.
Solution: Write d1 = X - Y, d2 = Y - Z and d3 = X - Z.
Competitor   X     Y     Z     d1    d2    d3    d1²   d2²   d3²
A            1     3     6     -2    -3    -5    4     9     25
B            6     5     4     1     1     2     1     1     4
C            5     8     9     -3    -1    -4    9     1     16
D            10    4     8     6     -4    2     36    16    4
E            3     7     1     -4    6     2     16    36    4
F            2     10    2     -8    8     0     64    64    0
G            4     2     3     2     -1    1     4     1     1
H            9     1     10    8     -9    -1    64    81    1
I            7     6     5     1     1     2     1     1     4
J            8     9     7     -1    2     1     1     4     1
Total                                            200   214   60
The rank correlation between X and Y is
$\rho(X, Y) = 1 - \dfrac{6 \sum d_1^2}{n(n^2 - 1)} = 1 - \dfrac{6(200)}{10(99)} = 1 - 1.2121 = -0.2121$
The rank correlation between Y and Z is
$\rho(Y, Z) = 1 - \dfrac{6 \sum d_2^2}{n(n^2 - 1)} = 1 - \dfrac{6(214)}{10(99)} = 1 - 1.2970 = -0.2970$
The rank correlation between X and Z is
$\rho(X, Z) = 1 - \dfrac{6 \sum d_3^2}{n(n^2 - 1)} = 1 - \dfrac{6(60)}{10(99)} = 1 - 0.3636 = 0.6364$
Since $\rho(X, Z)$ is the maximum and also positive, we conclude that the pair of judges X and Z has the nearest approach to common likings in music.
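The three rank correlations of problem 5 can be reproduced with a short sketch of Spearman's formula $\rho = 1 - \dfrac{6 \sum d^2}{n(n^2 - 1)}$. The function name is an assumption, and the formula in this form assumes there are no tied ranks, as is the case here.

```python
from itertools import combinations


def spearman_rho(ranks_a, ranks_b):
    """Spearman's rank correlation for untied ranks."""
    n = len(ranks_a)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(ranks_a, ranks_b))
    return 1 - 6 * sum_d2 / (n * (n ** 2 - 1))


# Ranks given by the three judges in problem 5 (competitors A to J).
judges = {
    "X": [1, 6, 5, 10, 3, 2, 4, 9, 7, 8],
    "Y": [3, 5, 8, 4, 7, 10, 2, 1, 6, 9],
    "Z": [6, 4, 9, 8, 1, 2, 3, 10, 5, 7],
}

results = {
    (a, b): spearman_rho(judges[a], judges[b])
    for a, b in combinations(judges, 2)
}
for pair, rho in results.items():
    print(pair, round(rho, 4))

# The pair with the largest coefficient has the nearest approach.
closest_pair = max(results, key=results.get)
print(closest_pair)
```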
6. From the following data, calculate (L3)
(i) The two regression equations.
(ii) The coefficient of correlation between the marks in Economics and Statistics.
(iii) The most likely marks in Statistics when the marks in Economics are 30.
Marks in Economics: 25 28 35 32 31 36 29 38 34 32
Marks in Statistics: 43 46 49 41 36 32 31 30 33 39
Solution: Let x denote the marks in Economics and y the marks in Statistics.
x      y      x-32   y-38   (x-32)²   (y-38)²   (x-32)(y-38)
25     43     -7     5      49        25        -35
28     46     -4     8      16        64        -32
35     49     3      11     9         121       33
32     41     0      3      0         9         0
31     36     -1     -2     1         4         2
36     32     4      -6     16        36        -24
29     31     -3     -7     9         49        21
38     30     6      -8     36        64        -48
34     33     2      -5     4         25        -10
32     39     0      1      0         1         0
Total: 320    380    0      0         140       398       -93
Here $\bar{x} = \dfrac{\sum x}{n} = \dfrac{320}{10} = 32$ and $\bar{y} = \dfrac{\sum y}{n} = \dfrac{380}{10} = 38$.
Coefficient of regression of y on x is
$b_{yx} = \dfrac{\sum (x - \bar{x})(y - \bar{y})}{\sum (x - \bar{x})^2} = \dfrac{-93}{140} = -0.6643$
Coefficient of regression of x on y is
$b_{xy} = \dfrac{\sum (x - \bar{x})(y - \bar{y})}{\sum (y - \bar{y})^2} = \dfrac{-93}{398} = -0.2337$
(i) Equation of the line of regression of x on y is $x - \bar{x} = b_{xy}(y - \bar{y})$,
i.e. x - 32 = -0.2337(y - 38) = -0.2337y + 0.2337 × 38
x = -0.2337y + 40.8806
Equation of the line of regression of y on x is $y - \bar{y} = b_{yx}(x - \bar{x})$,
i.e. y - 38 = -0.6643(x - 32) = -0.6643x + 0.6643 × 32
y = -0.6643x + 59.2576
(ii) Coefficient of correlation:
$r^2 = b_{yx}\,b_{xy} = (-0.6643)(-0.2337) = 0.1552$
$r = -\sqrt{0.1552} = -0.394$ (negative, since both regression coefficients are negative)
(iii) When x = 30, y = ?
y = -0.6643x + 59.2576
y = -0.6643 × 30 + 59.2576
y = 39.33 ≈ 39
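The whole of problem 6 (means, both regression coefficients, the correlation coefficient and the prediction at x = 30) can be sketched as follows. The function and variable names are assumptions for this example; the marks are those given in the problem.

```python
import math


def regression_summary(xs, ys):
    """Return (mean_x, mean_y, b_yx, b_xy) from paired observations."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return mean_x, mean_y, sxy / sxx, sxy / syy


economics = [25, 28, 35, 32, 31, 36, 29, 38, 34, 32]
statistics = [43, 46, 49, 41, 36, 32, 31, 30, 33, 39]

mean_x, mean_y, b_yx, b_xy = regression_summary(economics, statistics)

# r takes the common sign of the two regression coefficients.
r = math.copysign(math.sqrt(b_yx * b_xy), b_yx)

# Line of y on x evaluated at x = 30: y = mean_y + b_yx * (x - mean_x).
predicted_statistics = mean_y + b_yx * (30 - mean_x)

print(round(b_yx, 4), round(b_xy, 4), round(r, 4), round(predicted_statistics, 2))
```

Working with unrounded coefficients, as the sketch does, avoids the small rounding drift that appears when four-decimal coefficients are carried through by hand.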
7. Find the regression equation of capacity utilization on production from the following data. (L2)
r = 0.62. Estimate the production when the capacity utilization is 70 percent.
Solution: Let production be denoted by the variable x and capacity utilization by y. Then the regression equation of y on x is given by
$y - \bar{y} = b_{yx}(x - \bar{x})$ ---(1)
where $b_{yx} = r\,\dfrac{\sigma_y}{\sigma_x} = 0.62 \times \dfrac{\sigma_y}{\sigma_x} = 0.5019$, and $\bar{x} = 35.6$, $\bar{y} = 84.8$.
(1) ⇒ y - 84.8 = 0.5019(x - 35.6)
y = 66.9324 + 0.5019x
which is the required regression equation of capacity utilization on production.
To estimate the production, the regression equation of x on y is
$x - \bar{x} = b_{xy}(y - \bar{y})$ ---(2)
where $b_{xy} = r\,\dfrac{\sigma_x}{\sigma_y} = 0.62 \times \dfrac{\sigma_x}{\sigma_y} = 0.7659$.
(2) ⇒ x - 35.6 = 0.7659(y - 84.8)
x = 35.6 + 0.7659y - 64.9483 = 0.7659y - 29.3483
When y = 70, x = 0.7659(70) - 29.3483 = 24.2647
Hence the estimated production is 24.2647, in the units of the production data, when the capacity utilization is 70 percent.
8. The two lines of regression are $8x - 10y + 66 = 0$ and $40x - 18y - 214 = 0$. The variance of x is 9. Evaluate (L6)
(i) The mean values of X and Y.
(ii) The correlation coefficient between X and Y.
Solution:
(i) Since both lines of regression pass through the mean values, the point $(\bar{x}, \bar{y})$ must satisfy the two given regression lines, i.e.
$8\bar{x} - 10\bar{y} = -66$ ---(1)
$40\bar{x} - 18\bar{y} = 214$ ---(2)