Measures of Central Tendency Measures of Variations Measures of Kurtosis
Learning Objectives
- Distinguish between measures of central tendency, measures of variability, measures of shape, and measures of association.
- Understand the meanings of mean, median, mode, quartile, percentile, and range.
- Compute mean, median, mode, percentile, quartile, range, variance, standard deviation, and mean absolute deviation on ungrouped data.
- Differentiate between sample and population variance and standard deviation.

Properties of Quantitative Data
- Central tendency: the numerical value around which most of the data items tend to cluster or group.
- Variation: the extent to which the numerical values in the data set are dispersed around that central value.
- Skewness: the extent to which the values depart from a symmetrical (normal) distribution around the central value.

Objectives of Averaging
- To extract and summarize the characteristics of the entire data set in a precise form.
- To obtain a single value that describes the characteristics of the entire data set.
- To facilitate comparison. The comparison of single figures can be made either at a point in time or over a period of time.
- To provide a base for computing other measures, such as dispersion and skewness, that are used in later phases of statistical analysis.

Classification
1. Mathematical averages
   a. Arithmetic mean (simple and weighted)
   b. Geometric mean
   c. Harmonic mean
2. Measures of location (averages of position)
   - Median
   - Quartiles
   - Deciles
   - Percentiles
   - Mode

Advantages of Arithmetic Mean
- Every data set has one and only one arithmetic mean; that is, it is unique.
- The A.M. is calculated from all the observations in the data set.
- It is a single, reliable value that reflects every value in the data set.
- The arithmetic mean is the average least affected by sampling fluctuations.

Properties of A.M.
- The algebraic sum of the deviations of all observations from the A.M. is always zero: $\sum (X - \bar{X}) = 0$. For grouped data, the frequency-weighted sum of deviations is zero: $\sum f(M - \bar{X}) = 0$. Because of this property the mean is characterized as a point of balance. The sum of the positive deviations from the mean equals, in magnitude, the sum of the negative deviations.
- The sum of the squared deviations of all observations from the A.M. is less than the sum of the squared deviations from any other value: $\sum (X - \bar{X})^2 < \sum (X - c)^2$ for every $c \neq \bar{X}$. This is known as the least-squares property. It is the basis for defining the standard deviation.
- The pooled (combined) arithmetic mean of two or more data sets of the same nature can be calculated: $\bar{X}_{12} = \dfrac{n_1\bar{X}_1 + n_2\bar{X}_2}{n_1 + n_2}$.
- Observations may be read wrongly while the A.M. is being computed. In that case the corrected mean is obtained in three steps:
  1. Subtract the sum of the wrongly recorded observations from the total of all observations.
  2. Add the sum of the correct observations.
  3. Divide by the number of observations.
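The properties of the arithmetic mean listed above can be checked numerically. Below is a minimal Python sketch that assumes plain lists of numbers. The helper names (`mean`, `sum_of_squares_about`, `combined_mean`, `corrected_mean`) are hypothetical. The misreading of 16 as 61 is an invented illustration, not data from this deck. The data set 5, 9, 16, 17, 18 is the one used later in the section on deviations from the mean.

```python
# Hypothetical helpers illustrating properties of the arithmetic mean (a sketch, not a library API).

def mean(values):
    """Simple arithmetic mean: sum of the values divided by their count."""
    return sum(values) / len(values)

def sum_of_squares_about(values, c):
    """Sum of squared deviations of the values from an arbitrary point c."""
    return sum((x - c) ** 2 for x in values)

def combined_mean(groups):
    """Pooled mean of several data sets, each given as (n, mean)."""
    total_n = sum(n for n, _ in groups)
    return sum(n * m for n, m in groups) / total_n

def corrected_mean(recorded, wrong, correct):
    """Mean after replacing wrongly read observations with their correct values."""
    total = sum(recorded) - sum(wrong) + sum(correct)
    return total / len(recorded)

data = [5, 9, 16, 17, 18]
m = mean(data)
print(m)                                  # 13.0
print(sum(x - m for x in data))           # 0.0   (deviations balance out)
print(sum_of_squares_about(data, m))      # 130.0
print(sum_of_squares_about(data, 12))     # 135   (larger than about the mean)
# Hypothetical misreading: 16 was recorded as 61.
print(corrected_mean([5, 9, 61, 17, 18], wrong=[61], correct=[16]))  # 13.0
```

The least-squares property follows from the identity $\sum (X - c)^2 = \sum (X - \bar{X})^2 + n(\bar{X} - c)^2$. That is why the sum about 12 exceeds the sum about 13 by exactly $5 \times 1^2 = 5$.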
Disadvantages
- The A.M. cannot be calculated accurately when the frequency distribution has open-ended class intervals at the beginning or end.
- The mean cannot be obtained graphically, as the median or mode can.
- The mean is unduly affected by extreme values, i.e., very small as well as very large values in the data set.
- The mean is not always a good average; sometimes it gives values that are not practical to use. Example: the mean number of babies born in a hospital in a day may be 10.25, which is not a possible count.
- Often the mean is not one of the values in the data set, unlike the mode.
- Sometimes the mean may lead to false conclusions. Example (scores in four matches):

  Match   Team A   Team B
  1         20        4
  2         30       60
  3         20       10
  4         10        6
  Total     80       80
  Mean      20       20

  Both teams have the same mean score. However, Team A scored consistently between 10 and 30, while Team B's scores ranged from 4 to 60.
- It cannot be calculated for qualitative characteristics such as intelligence, honesty, and loyalty.

Mode
- The most frequently occurring value in a data set.
- Applicable to all levels of data measurement (nominal, ordinal, interval, and ratio).
- Bimodal: data sets that have two modes.
- Multimodal: data sets that contain more than two modes.

Mode: Example
Data: 35 37 37 39 40 40 41 41 43 43 43 43 44 44 44 44 44 45 45 46 46 46 46 48
- The mode is 44.
- There are more 44s (five) than any other value.

Median
- The middle value in an ordered array of numbers.
- Applicable for ordinal, interval, and ratio data.
- Not applicable for nominal data.
- Unaffected by extremely large and extremely small values.

Median: Computational Procedure
- First procedure: arrange the observations in an ordered array. If there is an odd number of terms, the median is the middle term of the ordered array.
  If there is an even number of terms, the median is the average of the middle two terms.
- Second procedure: the median's position in an ordered array is given by (n + 1)/2.

Median: Example with an Odd Number of Terms
Ordered array: 3 4 5 7 8 9 11 14 15 16 16 17 19 19 20 21 22
- There are 17 terms in the ordered array.
- Position of median = (n + 1)/2 = (17 + 1)/2 = 9.
- The median is the 9th term, 15.
- If the 22 is replaced by 100, the median is still 15.
- If the 3 is replaced by -103, the median is still 15.

Median: Example with an Even Number of Terms
Ordered array: 3 4 5 7 8 9 11 14 15 16 16 17 19 19 20 21
- There are 16 terms in the ordered array.
- Position of median = (n + 1)/2 = (16 + 1)/2 = 8.5.
- The median lies between the 8th and 9th terms (14 and 15), so it is 14.5.
- If the 21 is replaced by 100, the median is still 14.5.
- If the 3 is replaced by -88, the median is still 14.5.

Arithmetic Mean
- Commonly called the mean.
- It is the average of a group of numbers.
- Applicable for interval and ratio data.
- Not applicable for nominal or ordinal data.
- Affected by each value in the data set, including extreme values.
- Computed by summing all values in the data set and dividing the sum by the number of values in the data set.

Population Mean
$$\mu = \frac{\sum X}{N} = \frac{X_1 + X_2 + X_3 + \cdots + X_N}{N} = \frac{24 + 13 + 19 + 26 + 11}{5} = \frac{93}{5} = 18.6$$

Sample Mean
$$\bar{X} = \frac{\sum X}{n} = \frac{X_1 + X_2 + X_3 + \cdots + X_n}{n} = \frac{57 + 86 + 42 + 38 + 90 + 66}{6} = \frac{379}{6} = 63.167$$

Percentiles
- Measures of central tendency that divide a group of data into 100 parts.
- At least n% of the data lie below the nth percentile, and at most (100 - n)% of the data lie above it.
- Example: the 90th percentile indicates that at least 90% of the data lie below it and at most 10% of the data lie above it.
- The median and the 50th percentile have the same value.
- Applicable for ordinal, interval, and ratio data.
- Not applicable for nominal data.

Percentiles: Computational Procedure
- Organize the data into an ascending ordered array.
- Calculate the percentile location:
  $$i = \frac{P}{100}(n)$$
  where P is the desired percentile and n is the number of observations.
- Determine the percentile's location and its value:
  - If i is a whole number, the percentile is the average of the values at positions i and (i + 1).
  - If i is not a whole number, the percentile is at the position given by the whole-number part of (i + 1) in the ordered array.

Percentiles: Example
- Raw data: 14, 12, 19, 23, 5, 13, 28, 17
- Ordered array: 5, 12, 13, 14, 17, 19, 23, 28
- Location of the 30th percentile:
  $$i = \frac{30}{100}(8) = 2.4$$
- The location index i is not a whole number. i + 1 = 2.4 + 1 = 3.4, whose whole-number part is 3. The 30th percentile is therefore the 3rd value of the array, 13.

Quartiles
- Measures of central tendency that divide a group of data into four subgroups.
- Q1: 25% of the data set is below the first quartile.
- Q2: 50% of the data set is below the second quartile.
- Q3: 75% of the data set is below the third quartile.
- Q1 is equal to the 25th percentile.
- Q2 is located at the 50th percentile and equals the median.
- Q3 is equal to the 75th percentile.
- Quartile values are not necessarily members of the data set.

[Figure: ordered data divided into four 25% segments by Q1, Q2, and Q3]

Quartiles: Example
Ordered array: 106, 109, 114, 116, 121, 122, 125, 129
$$Q_1:\; i = \frac{25}{100}(8) = 2, \qquad Q_1 = \frac{109 + 114}{2} = 111.5$$
$$Q_2:\; i = \frac{50}{100}(8) = 4, \qquad Q_2 = \frac{116 + 121}{2} = 118.5$$
$$Q_3:\; i = \frac{75}{100}(8) = 6, \qquad Q_3 = \frac{122 + 125}{2} = 123.5$$

Variability
[Figure: cash flows over time with no variability and with variability around the mean]

Measures of Variability: Ungrouped Data
- Measures of variability describe the spread or dispersion of a set of data.
- Common measures of variability:
  - Range
  - Interquartile range
  - Mean absolute deviation
  - Variance
  - Standard deviation
  - Z scores
  - Coefficient of variation

Range
- The difference between the largest and the smallest values in a set of data.
- Simple to compute.
- Ignores all data points except the two extremes.
- Example for the data 35 37 37 39 40 40 41 41 43 43 43 43 44 44 44 44 44 45 45 46 46 46 46 48:
  Range = Largest - Smallest = 48 - 35 = 13

Interquartile Range
- The range of values between the first and third quartiles.
- The range of the middle half of the data.
- Less influenced by extremes.
$$\text{Interquartile range} = Q_3 - Q_1$$

Deviation from the Mean
- Data set: 5, 9, 16, 17, 18
- Mean: $\mu = \frac{\sum X}{N} = \frac{65}{5} = 13$
- Deviations from the mean: -8, -4, +3, +4, +5
[Figure: number line from 0 to 20 marking the deviations -8, -4, +3, +4, +5 from the mean of 13]

Mean Absolute Deviation
- The average of the absolute deviations from the mean.

  X      X - μ    |X - μ|
  5       -8        8
  9       -4        4
  16      +3        3
  17      +4        4
  18      +5        5
  Total    0       24

$$\text{M.A.D.} = \frac{\sum |X - \mu|}{N} = \frac{24}{5} = 4.8$$

Population Variance
- The average of the squared deviations from the arithmetic mean.

  X      X - μ    (X - μ)^2
  5       -8        64
  9       -4        16
  16      +3         9
  17      +4        16
  18      +5        25
  Total    0       130

$$\sigma^2 = \frac{\sum (X - \mu)^2}{N} = \frac{130}{5} = 26.0$$

Population Standard Deviation
- The square root of the population variance.
$$\sigma = \sqrt{\sigma^2} = \sqrt{\frac{\sum (X - \mu)^2}{N}} = \sqrt{\frac{130}{5}} = \sqrt{26.0} = 5.1$$

Sample Variance
- The sum of the squared deviations from the sample mean, divided by n - 1 rather than n. The sample mean here is X̄ = 7,092/4 = 1,773.

  X        X - X̄     (X - X̄)^2
  2,398     625       390,625
  1,844      71         5,041
  1,539    -234        54,756
  1,311    -462       213,444
  Total 7,092    0    663,866

$$S^2 = \frac{\sum (X - \bar{X})^2}{n - 1} = \frac{663{,}866}{3} = 221{,}288.67$$

Sample Standard Deviation
- The square root of the sample variance.
$$S = \sqrt{S^2} = \sqrt{\frac{\sum (X - \bar{X})^2}{n - 1}} = \sqrt{\frac{663{,}866}{3}} = \sqrt{221{,}288.67} = 470.41$$

Uses of Standard Deviation
- Indicator of financial risk
- Quality control
  - construction of quality control charts
  - process capability studies
- Comparing populations
  - household incomes in two cities
  - employee absenteeism at two plants

Standard Deviation as an Indicator of Financial Risk

  Financial security   μ (annualized rate of return)   σ
  A                    15%                             3%
  B                    15%                             7%

Empirical Rule
- Applies when data are normally distributed (or approximately normal).

  Distance from the mean   Percentage of values falling within distance
  μ ± 1σ                   68
  μ ± 2σ                   95
  μ ± 3σ                   99.7

Chebyshev's Theorem
- Applies to all distributions.
$$P(\mu - k\sigma < X < \mu + k\sigma) \ge 1 - \frac{1}{k^2} \quad \text{for } k > 1$$

  Number of standard deviations   Distance from the mean   Minimum proportion of values falling within distance
  k = 2                           μ ± 2σ                   1 - 1/2^2 = 0.75
  k = 3                           μ ± 3σ                   1 - 1/3^2 = 0.89
  k = 4                           μ ± 4σ                   1 - 1/4^2 = 0.94

Coefficient of Variation
- The ratio of the standard deviation to the mean, expressed as a percentage.
- A measurement of relative dispersion.
$$C.V. = \frac{\sigma}{\mu}(100)$$

Coefficient of Variation: Example
$$\mu_1 = 29,\ \sigma_1 = 4.6: \quad CV_1 = \frac{\sigma_1}{\mu_1}(100) = \frac{4.6}{29}(100) = 15.86$$
$$\mu_2 = 84,\ \sigma_2 = 10: \quad CV_2 = \frac{\sigma_2}{\mu_2}(100) = \frac{10}{84}(100) = 11.90$$

Measures of Central Tendency and Variability: Grouped Data
- Measures of central tendency: mean, median, mode
- Measures of variability: variance, standard deviation

Mean of Grouped Data
- Weighted average of class midpoints.
- Class frequencies are the weights.
$$\mu = \frac{\sum fM}{\sum f} = \frac{\sum fM}{N} = \frac{f_1M_1 + f_2M_2 + f_3M_3 + \cdots + f_iM_i}{f_1 + f_2 + f_3 + \cdots + f_i}$$

Calculation of Grouped Mean

  Class interval   Frequency (f)   Class midpoint (M)   fM
  20-under 30       6              25                     150
  30-under 40      18              35                     630
  40-under 50      11              45                     495
  50-under 60      11              55                     605
  60-under 70       3              65                     195
  70-under 80       1              75                      75
  Total            50                                   2,150

$$\mu = \frac{\sum fM}{\sum f} = \frac{2150}{50} = 43.0$$

Median of Grouped Data
$$\text{Median} = L + \frac{\frac{N}{2} - cf_p}{f_{med}}(W)$$
where:
- L = the lower limit of the median class
- cf_p = cumulative frequency of the class preceding the median class
- f_med = frequency of the median class
- W = width of the median class
- N = total of frequencies

Median of Grouped Data: Example

  Class interval   Frequency   Cumulative frequency
  20-under 30       6           6
  30-under 40      18          24
  40-under 50      11          35
  50-under 60      11          46
  60-under 70       3          49
  70-under 80       1          50
  N = 50

The median class is 40-under 50, because it contains the (N/2) = 25th observation.
$$Md = L + \frac{\frac{N}{2} - cf_p}{f_{med}}(W) = 40 + \frac{\frac{50}{2} - 24}{11}(10) = 40.909$$

Mode of Grouped Data
- The midpoint of the modal class.
- The modal class has the greatest frequency.

  Class interval   Frequency
  20-under 30       6
  30-under 40      18
  40-under 50      11
  50-under 60      11
  60-under 70       3
  70-under 80       1

$$\text{Mode} = \frac{30 + 40}{2} = 35$$

Variance and Standard Deviation of Grouped Data
Population:
$$\sigma^2 = \frac{\sum f(M - \mu)^2}{N}, \qquad \sigma = \sqrt{\sigma^2}$$
Sample:
$$S^2 = \frac{\sum f(M - \bar{X})^2}{n - 1}, \qquad S = \sqrt{S^2}$$

Population Variance and Standard Deviation of Grouped Data

  Class interval   f    M    fM      M - μ   (M - μ)^2   f(M - μ)^2
  20-under 30       6   25     150    -18       324        1,944
  30-under 40      18   35     630     -8        64        1,152
  40-under 50      11   45     495      2         4           44
  50-under 60      11   55     605     12       144        1,584
  60-under 70       3   65     195     22       484        1,452
  70-under 80       1   75      75     32     1,024        1,024
  Total            50        2,150                         7,200

$$\sigma^2 = \frac{\sum f(M - \mu)^2}{N} = \frac{7200}{50} = 144, \qquad \sigma = \sqrt{\sigma^2} = \sqrt{144} = 12$$

Measures of Shape
- Skewness
  - Absence of symmetry
  - Extreme values in one side of a distribution
- Kurtosis
  - Peakedness of a distribution
  - Leptokurtic: high and thin
  - Mesokurtic: normal shape
  - Platykurtic: flat and spread out
- Box and whisker plots
  - Graphic display of a distribution
  - Reveals skewness

Skewness
[Figure: negatively skewed, symmetric (not skewed), and positively skewed distributions]
- Negatively skewed: mean < median < mode.
- Symmetric (not skewed): mean = median = mode.
- Positively skewed: mode < median < mean.

Coefficient of Skewness
- A summary measure for skewness:
$$S = \frac{3(\mu - M_d)}{\sigma}$$
- If S < 0, the distribution is negatively skewed (skewed to the left).
- If S = 0, the distribution is symmetric (not skewed).
- If S > 0, the distribution is positively skewed (skewed to the right).

Coefficient of Skewness: Example
$$\mu_1 = 23,\ M_{d1} = 26,\ \sigma_1 = 12.3: \quad S_1 = \frac{3(\mu_1 - M_{d1})}{\sigma_1} = \frac{3(23 - 26)}{12.3} = -0.73$$
$$\mu_2 = 26,\ M_{d2} = 26,\ \sigma_2 = 12.3: \quad S_2 = \frac{3(\mu_2 - M_{d2})}{\sigma_2} = \frac{3(26 - 26)}{12.3} = 0$$
$$\mu_3 = 29,\ M_{d3} = 26,\ \sigma_3 = 12.3: \quad S_3 = \frac{3(\mu_3 - M_{d3})}{\sigma_3} = \frac{3(29 - 26)}{12.3} = +0.73$$

Kurtosis
- Peakedness of a distribution
  - Leptokurtic: high and thin
  - Mesokurtic: normal in shape
  - Platykurtic: flat and spread out
[Figure: leptokurtic, mesokurtic, and platykurtic curves]

Box and Whisker Plot
- Five specific values are used:
  - Median, Q2
  - First quartile, Q1
  - Third quartile, Q3
  - Minimum value in the data set
  - Maximum value in the data set
- Inner fences
  - IQR = Q3 - Q1
  - Lower inner fence = Q1 - 1.5 IQR
  - Upper inner fence = Q3 + 1.5 IQR
- Outer fences
  - Lower outer fence = Q1 - 3.0 IQR
  - Upper outer fence = Q3 + 3.0 IQR

[Figure: box and whisker plot marking the minimum, Q1, Q2, Q3, and maximum]

Skewness: Box and Whisker Plots, and Coefficient of Skewness
[Figure: box and whisker plots for a negatively skewed (S < 0), a symmetric (S = 0), and a positively skewed (S > 0) distribution]
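The position-based measures covered in this deck can be reproduced in a few lines of Python. These are the mode, median, percentiles, quartiles, range, interquartile range, and box-plot fences. The sketch below follows the deck's own rules: the median sits at position (n + 1)/2, and the percentile location is i = (P/100)(n). All function names are hypothetical. Library routines such as NumPy's `numpy.percentile` default to linear interpolation between ranks. They can therefore return different values; for example, NumPy gives 112.75 rather than 111.5 for Q1 of the quartile example.

```python
from collections import Counter

# Hypothetical helper names; a sketch of the deck's rules, not a library API.

def modes(values):
    """All values that occur most often (one -> unimodal, two -> bimodal, more -> multimodal)."""
    counts = Counter(values)
    top = max(counts.values())
    return sorted(v for v, c in counts.items() if c == top)

def median(values):
    """Median from the (n + 1)/2 position of the ordered array."""
    ordered = sorted(values)
    pos = (len(ordered) + 1) / 2            # 1-based position; may end in .5
    lower = ordered[int(pos) - 1]
    if pos.is_integer():
        return lower                        # odd n: the middle term
    return (lower + ordered[int(pos)]) / 2  # even n: average of the middle two terms

def percentile(values, p):
    """Percentile by the location rule i = (P/100)(n)."""
    if not 0 < p < 100:
        raise ValueError("p must be strictly between 0 and 100")
    ordered = sorted(values)
    n = len(ordered)
    i = p * n / 100                         # multiply first so whole-number cases stay exact
    if i.is_integer():
        k = int(i)                          # average of the values at positions i and i + 1
        return (ordered[k - 1] + ordered[k]) / 2
    return ordered[int(i)]                  # whole-number part of i + 1, as a 0-based index

def quartiles(values):
    """Q1, Q2, Q3 as the 25th, 50th, and 75th percentiles."""
    return tuple(percentile(values, p) for p in (25, 50, 75))

def box_plot_fences(q1, q3):
    """Inner fences at 1.5 IQR and outer fences at 3.0 IQR beyond the quartiles."""
    iqr = q3 - q1
    return (q1 - 1.5 * iqr, q3 + 1.5 * iqr), (q1 - 3.0 * iqr, q3 + 3.0 * iqr)

mode_data = [35, 37, 37, 39, 40, 40, 41, 41, 43, 43, 43, 43,
             44, 44, 44, 44, 44, 45, 45, 46, 46, 46, 46, 48]
odd = [3, 4, 5, 7, 8, 9, 11, 14, 15, 16, 16, 17, 19, 19, 20, 21, 22]
even = odd[:-1]

print(modes(mode_data))                    # [44]
print(max(mode_data) - min(mode_data))     # 13 (range)
print(median(odd), median(even))           # 15 14.5
print(percentile([14, 12, 19, 23, 5, 13, 28, 17], 30))  # 13
q1, q2, q3 = quartiles([106, 109, 114, 116, 121, 122, 125, 129])
print(q1, q2, q3, q3 - q1)                 # 111.5 118.5 123.5 12.0
print(box_plot_fences(q1, q3))             # ((93.5, 141.5), (75.5, 159.5))
```

With this location rule, `percentile(values, 50)` agrees with `median(values)` for both the odd and even arrays. This matches the statement that the median and the 50th percentile have the same value.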
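The ungrouped means and measures of variability worked above can be reproduced with a short sketch. These include the population and sample means, M.A.D., population and sample variance, standard deviation, coefficient of variation, Chebyshev's bound, and the empirical rule. The code is a minimal illustration with hypothetical function names. The `sample` flag switches the divisor from N to n - 1. `normal_share` is an added helper that computes the exact normal-distribution share behind the empirical rule using `math.erf`.

```python
import math

# Hypothetical helper names; a minimal sketch of the formulas in this deck.

def mean(values):
    return sum(values) / len(values)

def mean_absolute_deviation(values):
    """Average of the absolute deviations from the mean."""
    m = mean(values)
    return sum(abs(x - m) for x in values) / len(values)

def variance(values, sample=False):
    """Population variance divides by N; sample variance divides by n - 1."""
    m = mean(values)
    squared = sum((x - m) ** 2 for x in values)
    return squared / (len(values) - 1 if sample else len(values))

def std_dev(values, sample=False):
    return math.sqrt(variance(values, sample))

def coefficient_of_variation(sd, mu):
    """Relative dispersion: standard deviation as a percentage of the mean."""
    return sd / mu * 100

def chebyshev_minimum(k):
    """Minimum share of values within k standard deviations, for any distribution."""
    if k <= 1:
        raise ValueError("Chebyshev's bound is stated for k > 1")
    return 1 - 1 / k ** 2

def normal_share(k):
    """Exact share within k standard deviations of a normal distribution."""
    return math.erf(k / math.sqrt(2))

population = [5, 9, 16, 17, 18]
sample_data = [2398, 1844, 1539, 1311]

print(mean([24, 13, 19, 26, 11]), round(mean([57, 86, 42, 38, 90, 66]), 3))  # 18.6 63.167
print(mean_absolute_deviation(population))                                   # 4.8
print(variance(population), round(std_dev(population), 1))                   # 26.0 5.1
print(round(variance(sample_data, sample=True), 2),
      round(std_dev(sample_data, sample=True), 2))                           # 221288.67 470.41
print(round(coefficient_of_variation(4.6, 29), 2),
      round(coefficient_of_variation(10, 84), 2))                            # 15.86 11.9
for k in (2, 3, 4):
    # Chebyshev's guaranteed minimum vs. the share actually seen under normality
    print(k, round(chebyshev_minimum(k), 2), round(normal_share(k), 4))
```

Python's standard `statistics` module also provides `pvariance`, `variance`, `pstdev`, and `stdev`. These can serve as a cross-check on the hand-rolled versions.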
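The grouped-data formulas and the coefficient of skewness can be sketched the same way. These cover the mean of class midpoints, the interpolated median, the midpoint of the modal class, and population or sample variance. The sketch assumes each class is stored as a (lower limit, upper limit, frequency) tuple for "lower-under upper" intervals. That representation, `CLASSES`, and the function names are hypothetical choices for illustration.

```python
import math

# Hypothetical representation: (lower limit, upper limit, frequency) per "lower-under upper" class.
CLASSES = [(20, 30, 6), (30, 40, 18), (40, 50, 11), (50, 60, 11), (60, 70, 3), (70, 80, 1)]

def total_frequency(classes):
    return sum(f for _, _, f in classes)

def grouped_mean(classes):
    """Weighted average of class midpoints, with class frequencies as weights."""
    return sum(f * (lo + hi) / 2 for lo, hi, f in classes) / total_frequency(classes)

def grouped_median(classes):
    """Median = L + ((N/2 - cf_p) / f_med) * W for the class holding the (N/2)th value."""
    half = total_frequency(classes) / 2
    cum = 0                                    # cumulative frequency before the current class
    for lo, hi, f in classes:
        if cum + f >= half:
            return lo + (half - cum) / f * (hi - lo)
        cum += f
    raise ValueError("empty distribution")

def grouped_mode(classes):
    """Midpoint of the modal class (the first class with the greatest frequency)."""
    lo, hi, _ = max(classes, key=lambda c: c[2])
    return (lo + hi) / 2

def grouped_variance(classes, sample=False):
    """Sum of f(M - mean)^2 divided by N (population) or n - 1 (sample)."""
    n = total_frequency(classes)
    mu = grouped_mean(classes)
    ss = sum(f * ((lo + hi) / 2 - mu) ** 2 for lo, hi, f in classes)
    return ss / (n - 1 if sample else n)

def pearson_skewness(mu, md, sigma):
    """Coefficient of skewness S = 3(mean - median) / standard deviation."""
    return 3 * (mu - md) / sigma

print(grouped_mean(CLASSES))                                          # 43.0
print(round(grouped_median(CLASSES), 3))                              # 40.909
print(grouped_mode(CLASSES))                                          # 35.0
print(grouped_variance(CLASSES), math.sqrt(grouped_variance(CLASSES)))  # 144.0 12.0
print([round(pearson_skewness(mu, 26, 12.3), 2) for mu in (23, 26, 29)])  # [-0.73, 0.0, 0.73]
```

As a further illustration not taken from the deck, the grouped example can be fed into the skewness formula: 3(43.0 - 40.909)/12 gives about 0.52. That value indicates a mild positive skew for the grouped frequency distribution.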