NUMERICAL DESCRIPTIVE STATISTICS Measures of Variability.


Another Description of the Data -- Variability

• For Data Set A below, the mean of the 10 observations is 2.60.

SET A: 4,2,3,3,2,2,1,4,3,2

• But each of the following two data sets with 10 observations also has a mean of 2.60

SET B: 2,2,2,2,3,3,3,3,3,3

SET C: 0,0,1,1,4,4,4,4,4,4

• Although sets A, B, and C all have the same mean, the “spread” of the data differs from set to set.
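As a quick check of the claim above, the following minimal sketch (variable names are illustrative, not from the slides) computes the mean of each data set and tallies how often each grade occurs. The means agree, while the tallies show how differently the values are distributed.

```python
from collections import Counter
from statistics import mean

# The three data sets exactly as listed on the slide.
data_sets = {
    "A": [4, 2, 3, 3, 2, 2, 1, 4, 3, 2],
    "B": [2, 2, 2, 2, 3, 3, 3, 3, 3, 3],
    "C": [0, 0, 1, 1, 4, 4, 4, 4, 4, 4],
}

# Mean and frequency tally for each set.
means = {name: mean(values) for name, values in data_sets.items()}
counts = {name: Counter(values) for name, values in data_sets.items()}

for name in data_sets:
    tally = dict(sorted(counts[name].items()))
    print(f"Set {name}: mean = {means[name]:.2f}, counts = {tally}")
```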

The “Spread” of the Data

[Figure: histograms of Grades (0 to 4) for Data Set A, Data Set B, and Data Set C. Data Set C has the most “spread”; Data Set B has the least “spread”.]

Measures of Variability

• Population
– Variance σ²
– Standard Deviation σ

• Sample
– Range
– Variance s²
– Standard Deviation s

The Range

• When we are talking about a sample, the range is the difference between the highest and lowest observation

• In the sample there were some A’s (4’s), and the lowest value in the sample was a D (1)
– Sample range = 4 - 1 = 3
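The range calculation can be sketched in a few lines. The helper name `sample_range` is hypothetical, and the data is assumed to be the 10-observation sample (Set A) used throughout these slides.

```python
def sample_range(values):
    """Difference between the highest and lowest observation."""
    return max(values) - min(values)

# Assumed sample: the 10 grades of Set A (4 = A, 1 = D).
sample = [4, 2, 3, 3, 2, 2, 1, 4, 3, 2]

r = sample_range(sample)
print(f"Sample range = {max(sample)} - {min(sample)} = {r}")
```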

Another Approach to Variability

• The range only takes into account the two most extreme values

• A better approach
– Look at the variability of all the data

• In some sense find the “average” deviation from the mean

• The value of an observation minus the mean can be positive or negative

• The plusses and minuses cancel each other out giving an average value of 0

– Need another measure
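The cancellation described above is easy to verify numerically. This small sketch, assuming the Set A sample, sums the signed deviations from the mean; the total is zero up to floating-point rounding.

```python
# Assumed sample: the 10 grades of Set A.
sample = [4, 2, 3, 3, 2, 2, 1, 4, 3, 2]

x_bar = sum(sample) / len(sample)

# Signed deviations: positive above the mean, negative below it.
deviations = [x - x_bar for x in sample]
total = sum(deviations)

print("deviations:", [round(d, 1) for d in deviations])
print("sum of deviations (zero up to rounding):", round(total, 10))
```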

How to Average Only Positive Deviations

• MEAN ABSOLUTE DEVIATION (MAD)
– Averages the absolute values of these differences
– Used in quality control/inventory analyses
– But this quantity is hard to work with algebraically and analytically

• POPULATION VARIANCE (σ²)
– Averages the squares of the differences from the mean
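To make the MAD concrete, here is a minimal sketch. The function name `mean_absolute_deviation` is hypothetical, and for illustration only the 10 grades of Set A are treated as the full set of values being averaged.

```python
def mean_absolute_deviation(values):
    """Average of the absolute differences from the mean."""
    center = sum(values) / len(values)
    return sum(abs(x - center) for x in values) / len(values)

# Illustrative data: the 10 grades of Set A.
grades = [4, 2, 3, 3, 2, 2, 1, 4, 3, 2]

mad = mean_absolute_deviation(grades)
print(f"MAD = {mad:.2f}")
```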

Population Variance Formulas

(1) Definition: $\sigma^2 = \dfrac{\sum (x_i - \mu)^2}{N}$

(2) Shortcut: $\sigma^2 = \dfrac{\sum x_i^2}{N} - \mu^2$
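The two population formulas should give the same value. The sketch below checks this, under the assumption that the 10 grades of Set A are treated as an entire population (the 2000 GPA's are not reproduced on the slides). The function names are hypothetical; `statistics.pvariance` is the standard-library population variance.

```python
from statistics import pvariance

def pop_variance_definition(values):
    """Formula (1): average squared deviation from the mean."""
    n = len(values)
    mu = sum(values) / n
    return sum((x - mu) ** 2 for x in values) / n

def pop_variance_shortcut(values):
    """Formula (2): mean of the squares minus the square of the mean."""
    n = len(values)
    mu = sum(values) / n
    return sum(x ** 2 for x in values) / n - mu ** 2

# Assumption: Set A treated as a complete population of N = 10 values.
population = [4, 2, 3, 3, 2, 2, 1, 4, 3, 2]

v_def = pop_variance_definition(population)
v_short = pop_variance_shortcut(population)
v_lib = pvariance(population)
print(f"definition = {v_def:.4f}, shortcut = {v_short:.4f}, library = {v_lib:.4f}")
```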

EXAMPLE: Calculation of σ²

Using the numbers from the population of 2000 GPA’s: 4, 2, 1, 3, 3, 3, 2, …, 2

$\sigma^2 = \dfrac{(4-2.39)^2 + (2-2.39)^2 + (1-2.39)^2 + \dots + (2-2.39)^2}{2000} = .92$

$\sigma^2 = \dfrac{(4)^2 + (2)^2 + (1)^2 + \dots + (2)^2}{2000} - (2.39)^2 = .92$

Standard Deviation

• But the unit of measurement for σ² is:
– Square Grade Points (???)

• What is a square grade point?

• To get back to the original units (grade points), take the square root of σ²

• STANDARD DEVIATION (σ) – the square root of the variance, σ²

$\sigma = \sqrt{\sigma^2}$

Calculation of the Standard Deviation (σ)

• For the grade point data:

$\sigma = \sqrt{.92} = .959$

Estimating σ²

• SAMPLE VARIANCE (s²)
– Best estimate for σ² is s²

• s² is found by using the sample data and using the formula for σ² except:
– Use $\bar{x}$ rather than μ
– Divide by (n-1) in the denominator

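Sample Variance Formulas

(1) Definition: $s^2 = \dfrac{\sum (x_i - \bar{x})^2}{n-1}$

(2) Shortcut 1: $s^2 = \dfrac{\sum x_i^2 - \dfrac{(\sum x_i)^2}{n}}{n-1}$

(3) Shortcut 2: $s^2 = \dfrac{\sum x_i^2 - n\bar{x}^2}{n-1}$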

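Calculations for s²

• The data from the sample is: 4,2,3,3,2,2,1,4,3,2

$\bar{x} = 2.6$

Definition: $s^2 = \dfrac{(4-2.6)^2 + (2-2.6)^2 + (3-2.6)^2 + \dots + (2-2.6)^2}{9} = .9333$

Shortcut 1: $s^2 = \dfrac{\left((4)^2 + (2)^2 + (3)^2 + \dots + (2)^2\right) - \dfrac{(4 + 2 + 3 + \dots + 2)^2}{10}}{9} = .9333$

Shortcut 2: $s^2 = \dfrac{\left((4)^2 + (2)^2 + (3)^2 + \dots + (2)^2\right) - 10(2.6)^2}{9} = .9333$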
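The three sample variance calculations can be reproduced with a short sketch. The function names are hypothetical; `statistics.variance` is the standard-library sample variance (n-1 in the denominator) and is included as a cross-check.

```python
from statistics import variance

def s2_definition(values):
    """Sum of squared deviations from the sample mean, divided by n - 1."""
    n = len(values)
    x_bar = sum(values) / n
    return sum((x - x_bar) ** 2 for x in values) / (n - 1)

def s2_shortcut_1(values):
    """(sum of squares - (sum)^2 / n) / (n - 1)."""
    n = len(values)
    return (sum(x ** 2 for x in values) - sum(values) ** 2 / n) / (n - 1)

def s2_shortcut_2(values):
    """(sum of squares - n * mean^2) / (n - 1)."""
    n = len(values)
    x_bar = sum(values) / n
    return (sum(x ** 2 for x in values) - n * x_bar ** 2) / (n - 1)

# The sample from the slide.
sample = [4, 2, 3, 3, 2, 2, 1, 4, 3, 2]

results = {
    "definition": s2_definition(sample),
    "shortcut 1": s2_shortcut_1(sample),
    "shortcut 2": s2_shortcut_2(sample),
    "statistics.variance": variance(sample),
}
for label, value in results.items():
    print(f"{label}: {value:.4f}")
```

All four values agree up to floating-point rounding; the shortcuts are convenient by hand because they need only the sum and the sum of squares.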

Sample Standard Deviation, s

• The best estimate for σ is denoted: s

• It is called the sample standard deviation

• s is found by taking the square root of s²

$s = \sqrt{s^2}$

For this example, $s = \sqrt{.9333} = .9661$
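As a small illustration of the step from s² to s, this sketch takes the square root of the sample variance for the slide's sample and compares it with `statistics.stdev`, the standard-library sample standard deviation.

```python
import math
from statistics import stdev, variance

# The sample from the slides.
sample = [4, 2, 3, 3, 2, 2, 1, 4, 3, 2]

s_squared = variance(sample)      # sample variance, n - 1 denominator
s_from_root = math.sqrt(s_squared)
s_library = stdev(sample)         # same quantity computed directly

print(f"s^2 = {s_squared:.4f}, s = {s_from_root:.4f}")
```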

s² for Grouped Data

• For the grade point example
– 4 occurs 2 times
– 3 occurs 3 times
– 2 occurs 4 times
– 1 occurs 1 time

• To calculate the sample variance, s², rather than write the term down each time:
– Multiply the squared deviations by their class frequencies

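Calculation of s² - Grouped Data

Definition: $s^2 = \dfrac{2(4-2.6)^2 + 3(3-2.6)^2 + 4(2-2.6)^2 + 1(1-2.6)^2}{9} = .9333$

Shortcut 1: $s^2 = \dfrac{\left(2(4)^2 + 3(3)^2 + 4(2)^2 + 1(1)^2\right) - \dfrac{\left(2(4) + 3(3) + 4(2) + 1(1)\right)^2}{10}}{9} = .9333$

Shortcut 2: $s^2 = \dfrac{\left(2(4)^2 + 3(3)^2 + 4(2)^2 + 1(1)^2\right) - 10(2.6)^2}{9} = .9333$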
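The grouped calculation can be sketched by storing each grade with its class frequency. The dictionary layout and the function name `grouped_sample_variance` are illustrative choices, not part of the slides.

```python
def grouped_sample_variance(frequencies):
    """Sample variance from a {value: frequency} table.

    Each squared deviation is multiplied by its class frequency,
    and the total is divided by n - 1.
    """
    n = sum(frequencies.values())
    x_bar = sum(value * f for value, f in frequencies.items()) / n
    weighted = sum(f * (value - x_bar) ** 2 for value, f in frequencies.items())
    return weighted / (n - 1)

# Grade -> number of occurrences, as listed on the slide.
grade_counts = {4: 2, 3: 3, 2: 4, 1: 1}

s2_grouped = grouped_sample_variance(grade_counts)
print(f"n = {sum(grade_counts.values())}, s^2 = {s2_grouped:.4f}")
```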

Empirical Rule: Interpreting s

(Mound Shaped Distribution)

• If data forms a mound shaped distribution
– Within 1s from the mean: approximately 68% of the measurements
– Within 2s from the mean: approximately 95% of the measurements
– Within 3s from the mean: approximately all of the measurements
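One way to see the empirical rule at work is on simulated mound-shaped data. The sketch below is only an illustration: it assumes normally distributed values generated with `random.gauss` (the mean 2.6, spread 1.0, sample size 10,000, and seed are arbitrary choices, not data from the slides) and counts the share of values within 1s, 2s, and 3s of the mean.

```python
import random
from statistics import mean, stdev

random.seed(0)  # arbitrary seed so the run is repeatable

# Assumption: simulated mound-shaped (normal) data, not the grade data.
data = [random.gauss(2.6, 1.0) for _ in range(10_000)]

x_bar = mean(data)
s = stdev(data)

def fraction_within(values, center, spread, k):
    """Share of values lying within k * spread of center."""
    inside = sum(1 for x in values if abs(x - center) <= k * spread)
    return inside / len(values)

fractions = {k: fraction_within(data, x_bar, s, k) for k in (1, 2, 3)}
for k, share in fractions.items():
    print(f"within {k}s of the mean: {share:.1%}")
```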

Chebychev’s Inequality: Interpreting s

(Any Distribution)

• If data is not mound shaped (or shape is unknown)
– Within 2s from the mean: at least 75% of the measurements
– Within 3s from the mean: at least 88.9% of the measurements
– Within ks from the mean (k > 1): at least 1 - 1/k² of the measurements
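The bound 1 - 1/k² is simple to compute. The sketch below (function names are hypothetical) evaluates it for k = 2 and k = 3 and, as an illustration, compares it with the observed share of Set C, which is clearly not mound shaped.

```python
from statistics import mean, stdev

def chebyshev_lower_bound(k):
    """Minimum share of measurements within k standard deviations (k > 1)."""
    if k <= 1:
        raise ValueError("the bound is only informative for k > 1")
    return 1 - 1 / k ** 2

def fraction_within(values, center, spread, k):
    """Observed share of values within k * spread of center."""
    inside = sum(1 for x in values if abs(x - center) <= k * spread)
    return inside / len(values)

# Illustrative non-mound-shaped data: Set C from the earlier slide.
set_c = [0, 0, 1, 1, 4, 4, 4, 4, 4, 4]
x_bar, s = mean(set_c), stdev(set_c)

for k in (2, 3):
    bound = chebyshev_lower_bound(k)
    observed = fraction_within(set_c, x_bar, s, k)
    print(f"k = {k}: at least {bound:.1%}, observed {observed:.1%}")
```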

Coefficient of Variation

• Another measure of variability that is frequently used to compare different data sets (even if measured in different units) is the Coefficient of Variation (CV)

CV = (Standard Deviation/Mean) x 100%
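A minimal sketch of the CV formula follows, using the sample standard deviation and the Set A sample (both assumptions for illustration; the function name is hypothetical). Rescaling the data, as a change of units would, leaves the CV unchanged, which is why it is useful for comparing data sets measured in different units.

```python
from statistics import mean, stdev

def coefficient_of_variation(values):
    """CV = (standard deviation / mean) x 100, returned as a percentage."""
    return stdev(values) / mean(values) * 100

# Illustrative data: the Set A sample of grades.
grades = [4, 2, 3, 3, 2, 2, 1, 4, 3, 2]

cv_grades = coefficient_of_variation(grades)

# The same data expressed in a unit 100 times smaller.
cv_rescaled = coefficient_of_variation([x * 100 for x in grades])

print(f"CV = {cv_grades:.2f}%, CV after rescaling = {cv_rescaled:.2f}%")
```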

Range Approximation for σ

• If data is relatively mound-shaped, a “good” approximation for σ is:

σ ≈ (range)/4

• Sometimes, when one is more certain that the sample range captures the entire population of data, statisticians use:

σ ≈ (range)/6
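The two range-based approximations can be compared with the computed standard deviation. This sketch uses the Set A sample purely for illustration; with only 10 observations the approximations should be expected to be rough.

```python
from statistics import stdev

# Illustrative data: the Set A sample of grades.
sample = [4, 2, 3, 3, 2, 2, 1, 4, 3, 2]

data_range = max(sample) - min(sample)

estimate_range_4 = data_range / 4  # usual approximation for mound-shaped data
estimate_range_6 = data_range / 6  # when the range is believed to span all the data
s = stdev(sample)                  # computed sample standard deviation

print(f"range/4 = {estimate_range_4:.3f}")
print(f"range/6 = {estimate_range_6:.3f}")
print(f"computed s = {s:.4f}")
```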

Using Excel

• Suppose population data is in cells A2 to A2001
– Population variance (σ²) =VARP(A2:A2001)
– Population standard dev. (σ) =STDEVP(A2:A2001)

• Suppose sample data is in cells A2 to A11
– Sample variance (s²) =VAR(A2:A11)
– Sample standard dev. (s) =STDEV(A2:A11)

• Data Analysis
– Enter where data values are stored
– Check Labels
– Check both: Summary Statistics and Confidence Level
– Enter name of output worksheet
– Drag to make Column A wider
– The output includes Sample Standard Deviation and Sample Variance

Review

• Measures of variability for Populations and Samples
– Range
– Variance
– Standard Deviation

• Interpretation of standard deviation
– Empirical Rule for “mound-shaped” data
– Chebychev’s Inequality for “other” data

• Excel
– Functions
– Data Analysis
