Chapter 3


Data Description

In this chapter you will learn:
o Measures of central tendency
o Measures of dispersion
o Measures of position


A statistic is a characteristic or measure obtained by using the data values from a sample.

A parameter is a characteristic or measure obtained by using all the data values for a specific population.

Population Arithmetic Mean

$\mu = \frac{\sum X}{N}$

X: each value; N: total number of values in the population

Sample Arithmetic Mean

$\bar{X} = \frac{\sum X}{n}$

X: each value in the sample; n: total number of observations in the sample (sample size)

Example 1

Find the mean of the following sample data:

7, 4, 8, 8, 10, 12, 12

$\sum X = 7 + 4 + 8 + 8 + 10 + 12 + 12 = 61$

$\bar{X} = \frac{\sum X}{n} = \frac{61}{7} \approx 8.71$
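The calculation in Example 1 can be checked with a few lines of Python. This is only a minimal sketch; the name `sample_mean` is an illustrative choice and does not come from the slides.

```python
# Sample data from Example 1.
data = [7, 4, 8, 8, 10, 12, 12]


def sample_mean(values):
    """Return the arithmetic mean: the sum of the values divided by their count."""
    return sum(values) / len(values)


mean = sample_mean(data)
print(round(mean, 2))
```

Rounded to two decimals, the result agrees with the 8.71 obtained by hand.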

Estimating the Mean of Data Grouped into a Frequency Distribution

$\bar{X} = \frac{\sum f \cdot X_m}{n}$

f: frequency of each class
X_m: midpoint of each class
n: total number of frequencies

Example 2

Given the frequency distribution below, estimate the mean.

Class boundaries   Frequency
5.5-10.5           1
10.5-15.5          2
15.5-20.5          3
20.5-25.5          5
25.5-30.5          4
30.5-35.5          3
35.5-40.5          2

Example 2 (Cont.)

Class         Frequency (f)   Midpoint (X_m)   f · X_m
5.5 - 10.5    1               8                8
10.5 - 15.5   2               13               26
15.5 - 20.5   3               18               54
20.5 - 25.5   5               23               115
25.5 - 30.5   4               28               112
30.5 - 35.5   3               33               99
35.5 - 40.5   2               38               76
Total         $n = \sum f = 20$                $\sum f \cdot X_m = 490$

Example 2 (Cont.)

$\bar{X} = \frac{\sum f \cdot X_m}{n} = \frac{490}{20} = 24.5$
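The grouped-mean estimate from Example 2 can be sketched in Python as follows. The tuple layout (lower boundary, upper boundary, frequency) and the name `grouped_mean` are assumptions made for illustration.

```python
# (lower boundary, upper boundary, frequency) for each class in Example 2.
classes = [
    (5.5, 10.5, 1),
    (10.5, 15.5, 2),
    (15.5, 20.5, 3),
    (20.5, 25.5, 5),
    (25.5, 30.5, 4),
    (30.5, 35.5, 3),
    (35.5, 40.5, 2),
]


def grouped_mean(classes):
    """Estimate the mean as sum(f * X_m) / n, where X_m is the class midpoint."""
    n = sum(f for _, _, f in classes)
    weighted_total = sum(f * (lo + hi) / 2 for lo, hi, f in classes)
    return weighted_total / n


print(grouped_mean(classes))
```

The midpoints are computed from the class boundaries, so the sum of f · X_m is 490 and the estimate is 24.5, as in the worked example.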

Median

The median is the midpoint of the data array.

Steps in finding the median of a data array:
Step 1: Arrange the data in order.
Step 2: Select the midpoint of the array as the median.


Example 3

Find the median of the scores 7 2 3 7 6 9 10 8 9 9 10.

Arrange the data in order to obtain

2 3 6 7 7 8 9 9 9 10 10

We have 11 values. 8 is the exact middle value and hence it is the median.


Example 4

Find the median of the scores 7 2 3 7 6 9 10 8 9 9

Arrange the data in order

2 3 6 7 7 8 9 9 9 10

With these ten scores, no single score is at the exact middle. Instead, the two scores of 7 and 8 share the middle. We therefore find the mean of these two scores.

Example 4 (Cont.)

$\text{Median} = \frac{7 + 8}{2} = 7.5$

The median is 7.5.
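The two cases from Examples 3 and 4 (odd and even number of values) can be sketched in Python. The function name `median` is an illustrative choice; the standard library's `statistics.median` follows the same rule.

```python
def median(values):
    """Return the middle value of the sorted data.

    With an even number of values, return the mean of the two middle values.
    """
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


example_3 = [7, 2, 3, 7, 6, 9, 10, 8, 9, 9, 10]  # 11 scores
example_4 = [7, 2, 3, 7, 6, 9, 10, 8, 9, 9]      # 10 scores

print(median(example_3))
print(median(example_4))
```

The first call returns the single middle score 8; the second averages the two middle scores 7 and 8 to give 7.5.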

Estimating the Median of Data Grouped into a Frequency Distribution

$\text{Median} = LB + \frac{n/2 - CF}{f} \cdot w$

LB: lower boundary of the median class
n: total number of frequencies
f: frequency of the median class
CF: cumulative frequency of the class preceding the median class
w: class width

Example 5

Given the frequency distribution below, estimate the median.

Class    Frequency
30-39    4
40-49    6
50-59    8
60-69    12
70-79    9
80-89    7
90-99    4

Example 5 (Cont.)

First find the cumulative frequency (CF).

Class    Frequency   CF
30-39    4           4
40-49    6           10
50-59    8           18
60-69    12          30
70-79    9           39
80-89    7           46
90-99    4           50

Example 5 (Cont.)

w = 10, n = 50, and hence n/2 = 25. The median falls in the class 60-69 (boundaries 59.5-69.5).

$\text{Median} = LB + \frac{n/2 - CF}{f} \cdot w = 59.5 + \frac{25 - 18}{12} \cdot 10 \approx 65.33$
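The grouped-median formula can be sketched in Python as below. The input layout (lower class boundary, frequency) and the name `grouped_median` are assumptions for illustration; the lower boundaries are the class limits of Example 5 minus 0.5.

```python
def grouped_median(classes, width):
    """Estimate the median as LB + (n/2 - CF) / f * w.

    classes: list of (lower_boundary, frequency) pairs in ascending order.
    width:   common class width w.
    """
    n = sum(f for _, f in classes)
    cf = 0  # cumulative frequency of the classes before the current one
    for lb, f in classes:
        if cf + f >= n / 2:  # this is the median class
            return lb + (n / 2 - cf) / f * width
        cf += f
    raise ValueError("the frequency distribution is empty")


example_5 = [(29.5, 4), (39.5, 6), (49.5, 8), (59.5, 12),
             (69.5, 9), (79.5, 7), (89.5, 4)]

print(round(grouped_median(example_5, 10), 2))
```

The loop stops at the class 60-69, where the cumulative frequency first reaches n/2 = 25, and the estimate rounds to 65.33.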

Example 6

Estimate the median for the frequency distribution below.

Class      Frequency
80-89      5
90-99      9
100-109    20
110-119    8
120-129    6
130-139    2

Mode

o For data grouped into a frequency distribution, the mode can be estimated by the class midpoint of the modal class (the class with the highest frequency).

o It can also be found by the formula

$\text{Mode} = LB + \frac{d_1}{d_1 + d_2} \cdot w$

where
o LB: lower boundary of the modal class
o w: class width
o d1: difference between the frequency of the modal class and that of the class preceding it
o d2: difference between the frequency of the modal class and that of the class right after it

Example 7

Estimate the mode of the distribution below.

Class        Frequency (f)
5.5-10.5     1
10.5-15.5    2
15.5-20.5    3
20.5-25.5    5    (modal class)
25.5-30.5    4
30.5-35.5    3
35.5-40.5    2

Example 7 (cont.)

LB = 20.5

w = 5

d1 = 5 - 3 = 2

d2 = 5 - 4 = 1

$\text{Mode} = 20.5 + \frac{2}{2 + 1} \cdot 5 \approx 23.83$
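The mode formula for grouped data can be sketched in Python. The name `grouped_mode` is illustrative, and treating a missing neighbouring class as having frequency 0 is an assumption of this sketch, not something stated in the slides. The sketch also assumes a single modal class.

```python
def grouped_mode(classes, width):
    """Estimate the mode as LB + d1 / (d1 + d2) * w.

    classes: list of (lower_boundary, frequency) pairs in ascending order.
    width:   common class width w.
    """
    freqs = [f for _, f in classes]
    i = freqs.index(max(freqs))  # position of the modal class
    # Assumption: a missing neighbour counts as frequency 0.
    before = freqs[i - 1] if i > 0 else 0
    after = freqs[i + 1] if i < len(freqs) - 1 else 0
    d1 = freqs[i] - before
    d2 = freqs[i] - after
    return classes[i][0] + d1 / (d1 + d2) * width


example_7 = [(5.5, 1), (10.5, 2), (15.5, 3), (20.5, 5),
             (25.5, 4), (30.5, 3), (35.5, 2)]

print(round(grouped_mode(example_7, 5), 2))
```

For Example 7 the modal class starts at 20.5 with d1 = 2 and d2 = 1, so the estimate rounds to 23.83.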

The Midrange

$MR = \frac{\text{lowest value} + \text{highest value}}{2}$


Example 8

The midrange of this data set: 2, 3, 6, 8, 4, 1 is

MR=(8+1)/2=4.5

The Weighted Mean

$\bar{X}_w = \frac{w_1 X_1 + w_2 X_2 + \cdots + w_n X_n}{w_1 + w_2 + \cdots + w_n} = \frac{\sum_{i=1}^{n} w_i X_i}{\sum_{i=1}^{n} w_i}$

X_i: the values
w_i: the weights

Example 8

A student obtained 40, 50, 60, 80, and 55 marks in Math, Statistics, Physics, Chemistry, and Biology, respectively. Assuming weights of 5, 2, 4, 3, and 1, respectively, for these subjects, find the weighted arithmetic mean per subject.

Example 8 (cont.)

Subject      Marks obtained (x)   Weight (w)   wx
Math         40                   5            200
Statistics   50                   2            100
Physics      60                   4            240
Chemistry    80                   3            240
Biology      55                   1            55
Total                             15           835

Example 8 (cont.)

$\bar{X}_w = \frac{835}{15} \approx 55.667$ marks per subject
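The weighted mean can be sketched in Python using the marks and weights listed in the table of Example 8. The function name `weighted_mean` is an illustrative choice.

```python
# Marks and weights as listed in the table of Example 8
# (Math, Statistics, Physics, Chemistry, Biology).
marks = [40, 50, 60, 80, 55]
weights = [5, 2, 4, 3, 1]


def weighted_mean(values, weights):
    """Return sum(w_i * X_i) / sum(w_i)."""
    return sum(w * x for x, w in zip(values, weights)) / sum(weights)


print(round(weighted_mean(marks, weights), 3))
```

The weighted total is 835 and the weights sum to 15, which gives about 55.667 marks per subject.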

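Distribution Shapes

a) Positively skewed or right-skewed
[Figure: right-skewed distribution curve; labels on the x-axis: Mode, Median, Mean]

b) Negatively skewed or left-skewed
[Figure: left-skewed distribution curve; the mode, median, and mean are marked on the x-axis]

c) Mean = Median = Mode
[Figure: distribution curve in which the mean, median, and mode coincide]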

Range

The range is the highest value minus the lowest value. The symbol R is used for the range.

$R = \text{highest value} - \text{lowest value}$

Mean Deviation

$MD = \frac{\sum |X - \bar{X}|}{n}$


Example 9

The number of patients seen in the emergency room in a hospital for a sample of 5 days last year was: 103, 97, 101, 106, and 103. Determine the mean deviation and interpret.

Example 9 (Cont.)

First find the arithmetic mean:

$\bar{X} = \frac{103 + 97 + 101 + 106 + 103}{5} = 102$

Example 9 (Cont.)

Number of cases   Deviation          Absolute deviation
103               103 - 102 = 1      1
97                97 - 102 = -5      5
101               101 - 102 = -1     1
106               106 - 102 = 4      4
103               103 - 102 = 1      1
Total                                12

Example 9 (Cont.)

$MD = \frac{\sum |X - \bar{X}|}{n} = \frac{12}{5} = 2.4$

Hence the mean deviation is 2.4 patients per day. The number of patients deviates, on average, by 2.4 patients from the mean of 102 patients per day.
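The mean deviation can be sketched in Python as follows. The name `mean_deviation` is illustrative; the two data sets are those of Examples 9 and 10.

```python
def mean_deviation(values):
    """Return the average absolute deviation from the arithmetic mean."""
    mean = sum(values) / len(values)
    return sum(abs(x - mean) for x in values) / len(values)


patients = [103, 97, 101, 106, 103]               # Example 9
crates = [95, 103, 105, 110, 104, 105, 112, 90]   # Example 10

print(mean_deviation(patients))
print(mean_deviation(crates))
```

The first result matches the 2.4 patients per day found above, and the second matches the 5.25 pounds quoted as the answer to Example 10 c).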

Example 10

The weights of a group of crates being shipped to Ireland are (in pounds)

95, 103, 105, 110, 104, 105, 112, and 90.

a) What is the range of the weights?
b) Compute the arithmetic mean weight.
c) Compute the mean deviation of the weights.

(Answers: a) 22, b) 103, c) 5.25 pounds)

Population Variance and Standard Deviation

$\sigma^2 = \frac{\sum (X - \mu)^2}{N}$

$\sigma = \sqrt{\frac{\sum (X - \mu)^2}{N}}$

Remember: the standard deviation is the positive square root of the variance.

Example 11

Find the variance and standard deviation for the population data: 35, 45, 30, 35, 40, 25

Solution

First find the arithmetic mean:

$\sum X = 35 + 45 + 30 + 35 + 40 + 25 = 210$

$\mu = 210/6 = 35$

Then construct the table.

Example 11 (cont.)

$X$     $X - \mu$    $(X - \mu)^2$
35      0            0
45      10           100
30      -5           25
35      0            0
40      5            25
25      -10          100
Total                250

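Example 11 (cont.)

The population variance is

$\sigma^2 = \frac{\sum (X - \mu)^2}{N} = \frac{250}{6} \approx 41.7$

The population standard deviation is

$\sigma = \sqrt{\frac{\sum (X - \mu)^2}{N}} = \sqrt{41.7} \approx 6.5$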
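Example 11 can be reproduced with a short Python sketch. The name `population_variance` is illustrative; the cross-check uses `statistics.pvariance` from the standard library, which also divides by N.

```python
import math
import statistics

population = [35, 45, 30, 35, 40, 25]


def population_variance(values):
    """Return sum((X - mu)^2) / N for a complete population."""
    mu = sum(values) / len(values)
    return sum((x - mu) ** 2 for x in values) / len(values)


variance = population_variance(population)
std_dev = math.sqrt(variance)  # positive square root of the variance

# Cross-check against the standard library's population variance.
assert math.isclose(variance, statistics.pvariance(population))

print(round(variance, 1), round(std_dev, 1))
```

Rounded to one decimal place, the variance is 41.7 and the standard deviation is 6.5, as in the worked example.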

Sample Variance and Standard Deviation

Sample Variance (Conceptual formula)

$s^2 = \frac{\sum (X - \bar{X})^2}{n - 1}$

Sample Variance (Computational formula)

$s^2 = \frac{\sum X^2 - (\sum X)^2 / n}{n - 1}$

Sample Variance and Standard Deviation (Cont.)

Sample Standard Deviation (Conceptual formula)

$s = \sqrt{\frac{\sum (X - \bar{X})^2}{n - 1}}$

Sample Standard Deviation (Computational formula)

$s = \sqrt{\frac{\sum X^2 - (\sum X)^2 / n}{n - 1}}$


Example 12

Find the sample variance and standard deviation for the amount of European auto sales for a sample of 6 years shown. The data are in millions of dollars.

11.2, 11.9, 12.0, 12.8, 13.4, 14.3

Example 12 (Cont.)

Method 1

Find the mean: $\bar{X} = 12.6$

$X$      $X - \bar{X}$   $(X - \bar{X})^2$
11.20    -1.40           1.96
11.90    -0.70           0.49
12.00    -0.60           0.36
12.80    0.20            0.04
13.40    0.80            0.64
14.30    1.70            2.89
Total                    6.38

Example 12 (Cont.)

Method 1

The variance is

$s^2 = \frac{6.38}{6 - 1} \approx 1.28$

and hence, the standard deviation is

$s = \sqrt{1.28} \approx 1.13$

Example 12 (Cont.)

Method 2

We compute

$\sum X = 11.2 + 11.9 + 12.0 + 12.8 + 13.4 + 14.3 = 75.6$

$\sum X^2 = 11.2^2 + 11.9^2 + 12.0^2 + 12.8^2 + 13.4^2 + 14.3^2 = 958.94$

The variance is computed by

$s^2 = \frac{958.94 - (75.6)^2 / 6}{5} \approx 1.28$

The standard deviation is 1.13.
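Both methods of Example 12 can be sketched side by side in Python. The function names are illustrative; the point of the sketch is that the conceptual and computational formulas give the same sample variance.

```python
import math

sales = [11.2, 11.9, 12.0, 12.8, 13.4, 14.3]


def variance_conceptual(values):
    """s^2 = sum((X - mean)^2) / (n - 1)."""
    n = len(values)
    mean = sum(values) / n
    return sum((x - mean) ** 2 for x in values) / (n - 1)


def variance_computational(values):
    """s^2 = (sum(X^2) - (sum(X))^2 / n) / (n - 1)."""
    n = len(values)
    return (sum(x * x for x in values) - sum(values) ** 2 / n) / (n - 1)


s2_a = variance_conceptual(sales)
s2_b = variance_computational(sales)
print(round(s2_a, 2), round(s2_b, 2), round(math.sqrt(s2_a), 2))
```

Both functions return about 1.276, which rounds to the 1.28 shown above, and the square root rounds to 1.13. The computational form avoids computing the mean first, at the cost of subtracting two large, nearly equal numbers.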

Example 13

Suppose the numbers of minutes you spent traveling to school on the last 7 days are 9, 12, 9, 15, 10, 11, 15. Find the variance of the number of minutes using both formulas.

Variance and Standard Deviation for Grouped Data

$s^2 = \frac{\sum f \cdot X_m^2 - (\sum f \cdot X_m)^2 / n}{n - 1}$

f: class frequency
X_m: class midpoint (class mark)
n: total number of frequencies

Example 14

Find the variance and the standard deviation for the frequency distribution of the data representing the number of miles that 20 runners ran during one week.

Example 14 (cont.)

Class        Frequency
5.5-10.5     1
10.5-15.5    2
15.5-20.5    3
20.5-25.5    5
25.5-30.5    4
30.5-35.5    3
35.5-40.5    2

Example 14 (cont.)

Class boundary   Freq. (f)   Midpoint (X_m)   f · X_m   f · X_m²
5.5-10.5         1           8                8         64
10.5-15.5        2           13               26        338
15.5-20.5        3           18               54        972
20.5-25.5        5           23               115       2645
25.5-30.5        4           28               112       3136
30.5-35.5        3           33               99        3267
35.5-40.5        2           38               76        2888
Total            20                           490       13310

Example 14 (cont.)

Hence, the variance is

$s^2 = \frac{13310 - (490)^2 / 20}{20 - 1} \approx 68.7$

and the standard deviation is 8.3.
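The grouped-data variance of Example 14 can be sketched in Python. The input layout (midpoint, frequency) and the name `grouped_variance` are assumptions made for illustration.

```python
import math

# (class midpoint X_m, frequency f) for each class in Example 14.
runners = [(8, 1), (13, 2), (18, 3), (23, 5), (28, 4), (33, 3), (38, 2)]


def grouped_variance(classes):
    """s^2 = (sum(f * X_m^2) - (sum(f * X_m))^2 / n) / (n - 1)."""
    n = sum(f for _, f in classes)
    sum_fx = sum(f * xm for xm, f in classes)
    sum_fx2 = sum(f * xm ** 2 for xm, f in classes)
    return (sum_fx2 - sum_fx ** 2 / n) / (n - 1)


s2 = grouped_variance(runners)
print(round(s2, 1), round(math.sqrt(s2), 1))
```

The column totals are 490 and 13310, so the variance rounds to 68.7 and the standard deviation to 8.3.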

Coefficient of Variation

The coefficient of variation is the standard deviation divided by the mean. The result is expressed as a percentage.

For samples: $CVar = \frac{s}{\bar{X}} \cdot 100\%$

For populations: $CVar = \frac{\sigma}{\mu} \cdot 100\%$


Example 15

The mean of the number of sales of cars over a 3-month period is 87, and the standard deviation is 5. The mean of the commissions is $5225, and the standard deviation is $773. Compare the variations of the two.

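Example 15 (Cont.)

Sales

$CVar = \frac{s}{\bar{X}} \cdot 100\% = \frac{5}{87} \cdot 100\% \approx 5.7\%$

Commissions

$CVar = \frac{s}{\bar{X}} \cdot 100\% = \frac{773}{5225} \cdot 100\% \approx 14.8\%$

Since the coefficient of variation is larger for commissions, the commissions are more variable than the sales.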
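The comparison in Example 15 can be sketched in Python. The name `coefficient_of_variation` is an illustrative choice.

```python
def coefficient_of_variation(std_dev, mean):
    """Return the standard deviation as a percentage of the mean."""
    return std_dev / mean * 100


cv_sales = coefficient_of_variation(5, 87)
cv_commissions = coefficient_of_variation(773, 5225)

print(round(cv_sales, 1), round(cv_commissions, 1))
```

Because the result is a unit-free percentage, the two variables can be compared even though one is a count of cars and the other is in dollars.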

Example 16

The mean for the number of pages of a sample of women’s fitness magazines is 132, with a variance of 23; the mean for the number of advertisements of a sample of women’s fitness magazines is 182, with a variance of 62. Compare the variations.

(Answer: 3.6% pages, 4.3% advertisements)

Chebyshev’s theorem

The proportion of values from a data set that will fall within k standard deviations of the mean will be at least $1 - 1/k^2$, where k is a number greater than 1 (k is not necessarily an integer).

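Chebyshev’s theorem (Cont.)

[Figure: number line centered at $\bar{X}$. At least 75% of the values lie between $\bar{X} - 2s$ and $\bar{X} + 2s$; at least 88.89% lie between $\bar{X} - 3s$ and $\bar{X} + 3s$.]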

Example 17

The mean price of houses in a certain neighborhood is $50,000, and the standard deviation is $10,000. Find the price range for which at least 75% of the houses will sell.

Example 17 (Cont.)

Chebyshev’s theorem states that at least three-fourths, or 75%, of the data values will fall within 2 standard deviations of the mean. Thus,

$50,000 + 2($10,000) = $70,000

and

$50,000 - 2($10,000) = $30,000

Hence, at least 75% of all homes sold in the area will have a price between $30,000 and $70,000.
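Chebyshev's bound and the interval used in Example 17 can be sketched in Python. The function names are illustrative choices, not part of the slides.

```python
def chebyshev_min_proportion(k):
    """Return the minimum proportion of values within k standard deviations."""
    if k <= 1:
        raise ValueError("Chebyshev's theorem requires k > 1")
    return 1 - 1 / k ** 2


def interval_around_mean(mean, std_dev, k):
    """Return (mean - k * s, mean + k * s)."""
    return mean - k * std_dev, mean + k * std_dev


print(chebyshev_min_proportion(2))
print(round(chebyshev_min_proportion(3), 4))
print(interval_around_mean(50_000, 10_000, 2))
```

With k = 2 the bound is 0.75, with k = 3 it is about 0.8889, and the k = 2 interval for the house prices runs from $30,000 to $70,000.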

Example 18

A survey of local companies found that the mean amount of travel allowance for executives was $0.25 per mile. The standard deviation was $0.02. Using Chebyshev’s theorem, find the minimum percentage of the data values that will fall between $0.20 and $0.30.


The Empirical (Normal) Rule

Chebyshev’s theorem applies to any distribution regardless of its shape. However, when a distribution is bell-shaped (or what is called normal), the following statements, which make up the empirical rule, are true.



Approximately 68% of the data values will fall within 1 standard deviation of the mean.

Approximately 95% of the data values will fall within 2 standard deviations of the mean.

Approximately 99.7% (almost all) of the data values will fall within 3 standard deviations of the mean.

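[Figure: bell-shaped curve with the x-axis marked at $\bar{X} - 3s$, $\bar{X} - 2s$, $\bar{X} - 1s$, $\bar{X}$, $\bar{X} + 1s$, $\bar{X} + 2s$, and $\bar{X} + 3s$; about 68% of the values lie within 1 standard deviation of the mean, 95% within 2, and 99.7% within 3.]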

Measures of Position

Standard Scores

A z score or standard score for a value is obtained by subtracting the mean from the value and dividing the result by the standard deviation. The symbol for a standard score is z.

$z = \frac{\text{value} - \text{mean}}{\text{standard deviation}}$

Standard Scores (Cont.)

The z score represents the number of standard deviations that a data value falls above or below the mean.


Example 19 A student scored 65 on a calculus test that had a mean of 50 and a standard deviation of 10; she scored 30 on a history test with a mean of 25 and a standard deviation of 5. Compare her relative position on the two tests.

Example 19 (Cont.)

For calculus, the z score is

$z = \frac{X - \bar{X}}{s} = \frac{65 - 50}{10} = 1.5$

For history, the z score is

$z = \frac{X - \bar{X}}{s} = \frac{30 - 25}{5} = 1.0$

Since the z score for calculus is larger, her relative position in the calculus class is higher than her relative position in the history class.
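The comparison in Example 19 can be sketched in Python. The name `z_score` is an illustrative choice.

```python
def z_score(value, mean, std_dev):
    """Return how many standard deviations the value lies from the mean."""
    return (value - mean) / std_dev


z_calculus = z_score(65, 50, 10)
z_history = z_score(30, 25, 5)

print(z_calculus, z_history)
```

A positive z score means the value is above the mean; here both scores are above their class means, and the calculus score is further above.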

Percentiles

Percentiles divide the data set into 100 equal groups.

There are several mathematical methods for computing percentiles for data. These methods can be used to find the approximate percentile rank of a data value or to find the data value corresponding to a given percentile.

Finding a Percentile Rank Corresponding to a Value

The percentile corresponding to a given value X is computed by using the following formula:

$\text{Percentile} = \frac{(\text{number of values below } X) + 0.5}{\text{total number of values}} \cdot 100\%$


Example 20

A teacher gives a 20-point test to 10 students. The scores are shown here. Find the percentile rank of a score of 12.

18, 15, 12, 6, 8, 2, 3, 5, 20, 10

Example 20 (Cont.)

Arrange the data in order from lowest to highest:

2, 3, 5, 6, 8, 10, 12, 15, 18, 20

Six values lie below 12, so

$\text{Percentile} = \frac{6 + 0.5}{10} \cdot 100\% = 65\text{th percentile}$

Thus, a student whose score was 12 did better than 65% of the class.
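The percentile-rank formula of Example 20 can be sketched in Python. The name `percentile_rank` is illustrative.

```python
scores = [18, 15, 12, 6, 8, 2, 3, 5, 20, 10]


def percentile_rank(values, x):
    """Return (number of values below x + 0.5) / total * 100."""
    below = sum(1 for v in values if v < x)
    return (below + 0.5) * 100 / len(values)


print(percentile_rank(scores, 12))
```

Sorting is not needed for this direction: only the count of values strictly below 12 matters, and that count is 6.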

Finding a Data Value Corresponding to a Given Percentile

o Arrange the data in order from lowest to highest.
o Compute c = (n·p)/100, where n is the total number of observations and p is the percentile.
o If c is not a whole number, round up to the next whole number. Starting at the lowest value, count over to the number that corresponds to the rounded-up value.
o If c is a whole number, use the value halfway between the cth and (c+1)th values when counting up from the lowest value.

Example 21

A teacher gives a 20-point test to 10 students. The scores are shown here. Find the value corresponding to the 25th percentile.

18, 15, 12, 6, 8, 2, 3, 5, 20, 10

Example 21 (Cont.)

o Arrange the data in order from lowest to highest: 2, 3, 5, 6, 8, 10, 12, 15, 18, 20

o n = 10, p = 25, so c = (10 × 25)/100 = 2.5

o We round it up to get c = 3. Start at the lowest value and count over to the third value, which is 5. Hence, the value 5 corresponds to the 25th percentile.

Example 22

A teacher gives a 20-point test to 10 students. The scores are shown here. Find the value corresponding to the 60th percentile.

18, 15, 12, 6, 8, 2, 3, 5, 20, 10

Example 22 (Cont.)

o Arrange the data in order from smallest to largest: 2, 3, 5, 6, 8, 10, 12, 15, 18, 20

o n = 10, p = 60, so c = (10 × 60)/100 = 6

o Since c is a whole number, we use the value halfway between the 6th and 7th values when counting up from the lowest value.

o The 60th percentile is (10 + 12)/2 = 11.
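The counting rule used in Examples 21 and 22 can be sketched in Python. The name `value_at_percentile` is illustrative, and restricting p to values strictly between 0 and 100 is an assumption of this sketch. Other textbooks and libraries use different percentile conventions, so results from, say, `statistics.quantiles` need not match.

```python
def value_at_percentile(values, p):
    """Return the data value at percentile p using c = n * p / 100.

    If c is not whole, round up and take that position.
    If c is whole, average the c-th and (c+1)-th ordered values.
    """
    if not 0 < p < 100:
        raise ValueError("p must be strictly between 0 and 100")
    ordered = sorted(values)
    whole, remainder = divmod(len(ordered) * p, 100)
    whole = int(whole)
    if remainder:  # c is not a whole number: round up
        return ordered[whole]  # position whole + 1, counting from 1
    return (ordered[whole - 1] + ordered[whole]) / 2


scores = [18, 15, 12, 6, 8, 2, 3, 5, 20, 10]

print(value_at_percentile(scores, 25))
print(value_at_percentile(scores, 60))
```

For the test scores, p = 25 gives c = 2.5, so the third ordered value, 5, is returned; p = 60 gives c = 6, so the 6th and 7th values are averaged to 11.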

Quartiles

Quartiles divide the distribution into 4 equal groups, separated by Q1, Q2, and Q3.

[Figure: number line from the lowest value L to the highest value H, divided by Q1, Q2, and Q3 into four groups of 25% each]

Quartiles (Cont.)

Quartiles can be computed using the formula for computing percentiles.
o The 1st quartile corresponds to the 25th percentile.
o The 2nd quartile corresponds to the 50th percentile.
o The 3rd quartile corresponds to the 75th percentile.

2nd quartile = 50th percentile = median

Example 23

Find the first quartile, second quartile, and third quartile for the data set 15, 13, 6, 5, 12, 50, 22, 18.

Arrange the data in order from smallest to largest:

5, 6, 12, 13, 15, 18, 22, 50

Example 23 (Cont.)

o First quartile = 25th percentile
c = (8 × 25)/100 = 2
Hence, the first quartile is equal to the second value plus the third value, divided by 2. That is, Q1 = (6 + 12)/2 = 9

o Second quartile = 50th percentile
c = (8 × 50)/100 = 4
Hence, Q2 = (4th value + 5th value)/2 = (13 + 15)/2 = 14

Example 23 (Cont.)

o Third quartile = 75th percentile
c = (8 × 75)/100 = 6
Hence, Q3 = (6th value + 7th value)/2 = (18 + 22)/2 = 20

Interquartile Range, Quartile Deviation and Midquartile Range

o Interquartile range: $IQR = Q_3 - Q_1$
o Quartile deviation: $QD = (Q_3 - Q_1)/2$
o The quartile deviation is also referred to as the semi-interquartile range.
o Midquartile range: $(Q_3 + Q_1)/2$

Quartiles of Data Grouped into a Freq. Dist.

o First quartile: $Q_1 = LB + \frac{n/4 - CF}{f} \cdot w$

o Second quartile (Median): $Q_2 = LB + \frac{n/2 - CF}{f} \cdot w$

o Third quartile: $Q_3 = LB + \frac{3n/4 - CF}{f} \cdot w$

Example 24

The office manager of the Mallard Glass Co. is investigating the ages in months of the company’s PCs currently in use. The ages of 30 units selected at random were organized into a frequency distribution. Compute the quartile deviation.

Example 24 (Cont.)

Age (in months)   # of PCs
20-24             3
25-29             5
30-34             10
35-39             7
40-44             4
45-49             1

Example 24 (Cont.)

Age (in months)   # of PCs   Cumu. Freq.
20-24             3          3
25-29             5          8
30-34             10         18
35-39             7          25
40-44             4          29
45-49             1          30

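Example 24 (Cont.)

Q1 falls in the class 25-29 (boundaries 24.5-29.5), and Q3 falls in the class 35-39 (boundaries 34.5-39.5).

$Q_1 = 24.5 + \frac{30/4 - 3}{5} \cdot 5 = 29$

$Q_3 = 34.5 + \frac{3 \cdot 30/4 - 18}{7} \cdot 5 \approx 37.71$

Hence, $QD = \frac{37.71 - 29}{2} \approx 4.36$ months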


Example 25

The weekly income of a sample of 60 part time employees of a fast-food restaurant chain was organized into the following frequency distribution. Compute the standard deviation and quartile deviation.

Example 25 (Cont.)

Weekly Incomes   Number of Employees
100-149          5
150-199          9
200-249          20
250-299          18
300-349          5
350-399          3

Outliers

An outlier is an extremely high or an extremely low data value when compared with the rest of the data values.

An outlier can strongly affect the mean and standard deviation of a variable.

There are several ways to check a data set for outliers. One of them is as follows:

Outliers (Cont.)

Step 1: Arrange the data in order and find Q1 and Q3.
Step 2: Find the interquartile range: $IQR = Q_3 - Q_1$.
Step 3: Multiply the IQR by 1.5.
Step 4: Check the data set for any data value that is smaller than $Q_1 - 1.5 \cdot IQR$ or larger than $Q_3 + 1.5 \cdot IQR$.

Outliers: Example 26

Check the following data set for outliers.

5, 6, 12, 13, 15, 18, 22, 50

We found $Q_1 = 9$ and $Q_3 = 20$.
Interquartile range: $IQR = 20 - 9 = 11$
Compute the dividing points:

$Q_1 - 1.5 \cdot IQR = 9 - 1.5 \cdot 11 = -7.5$
$Q_3 + 1.5 \cdot IQR = 20 + 1.5 \cdot 11 = 36.5$

The data value of 50 is greater than the upper dividing point of 36.5, so it is considered an outlier.
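The outlier check of Example 26 can be sketched in Python. The function names are illustrative, and the quartiles are passed in as given values (Q1 = 9, Q3 = 20) rather than recomputed.

```python
def outlier_fences(q1, q3):
    """Return the lower and upper dividing points Q1 - 1.5*IQR and Q3 + 1.5*IQR."""
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr


def find_outliers(values, q1, q3):
    """Return the values that fall outside the two dividing points."""
    low, high = outlier_fences(q1, q3)
    return [v for v in values if v < low or v > high]


data = [5, 6, 12, 13, 15, 18, 22, 50]

print(outlier_fences(9, 20))
print(find_outliers(data, 9, 20))
```

The dividing points are -7.5 and 36.5, and 50 is the only value outside them.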

Exploratory Data Analysis

o In exploratory data analysis (EDA), the data are presented graphically using a box plot (sometimes called a box-and-whisker plot).

o The purpose of exploratory data analysis is to examine data to find out what information can be discovered about the data, such as the center and the spread.

o EDA was developed by John Tukey.

Exploratory Data Analysis (Cont.)

A box plot can be used to graphically represent the data set. These plots involve five specific values:

o The lowest value (i.e., minimum)
o Q1
o Median (Q2)
o Q3
o The highest value (i.e., maximum)

Example 27 (Box-plot)

A stockbroker recorded the number of clients she saw each day over an 11-day period. The data are shown below. Construct a box plot for the data.

33, 38, 43, 30, 29, 40, 51, 27, 42, 23, 31

Example 27 (Box-plot) (Cont.)

o Arrange the data in order from lowest to highest: 23, 27, 29, 30, 31, 33, 38, 40, 42, 43, 51

o We obtain: the lowest value = 23, Q1 = 29, median Q2 = 33, Q3 = 42, and the highest value = 51.

[Figure: box plot on an axis from 20 to 50, with whiskers ending at 23 and 51, box edges at 29 and 42, and the median line at 33]
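The five values needed for the box plot in Example 27 can be sketched in Python, using the chapter's rule c = (n·p)/100 for the quartiles. The function names are illustrative choices, and the sketch does not draw the plot itself.

```python
def value_at_percentile(values, p):
    """Chapter's rule: round c = n * p / 100 up, or average if c is whole."""
    ordered = sorted(values)
    whole, remainder = divmod(len(ordered) * p, 100)
    whole = int(whole)
    if remainder:
        return ordered[whole]
    return (ordered[whole - 1] + ordered[whole]) / 2


def five_number_summary(values):
    """Return (minimum, Q1, median, Q3, maximum)."""
    return (
        min(values),
        value_at_percentile(values, 25),
        value_at_percentile(values, 50),
        value_at_percentile(values, 75),
        max(values),
    )


clients = [33, 38, 43, 30, 29, 40, 51, 27, 42, 23, 31]

print(five_number_summary(clients))
```

With n = 11, the positions c are 2.75, 5.5, and 8.25, which round up to the 3rd, 6th, and 9th ordered values: 29, 33, and 42. Together with the minimum 23 and the maximum 51, these match the values marked on the box plot.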


THE END!
