Data Description Chapter 3 1
102

Chapter 3

Jan 27, 2015


Mong Mara

Measures of central tendency, measures of dispersion, and Measures of Location
Transcript
Page 1: Chapter 3

1

Data Description

Chapter 3

Page 2: Chapter 3

2

In this chapter you will learn about:
o Measures of central tendency
o Measures of dispersion
o Measures of position

Page 3: Chapter 3

3

A statistic is a characteristic or measure obtained by using the data values from a sample.

A parameter is a characteristic or measure obtained by using all the data values for a specific population.

Page 4: Chapter 3

4

Population Arithmetic Mean

$$\mu = \frac{\sum X}{N}$$

X: each value
N: total number of values in the population

Page 5: Chapter 3

5

Sample Arithmetic Mean

$$\bar{X} = \frac{\sum X}{n}$$

X: each value in the sample
n: total number of observations in the sample (sample size)

Page 6: Chapter 3

6

Example 1

Find the mean of the following sample data:

7 4 8 8 10 12 12

$$\sum X = 7 + 4 + 8 + 8 + 10 + 12 + 12 = 61$$

$$\bar{X} = \frac{\sum X}{n} = \frac{61}{7} \approx 8.71$$
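The calculation in Example 1 can be checked with a few lines of Python. This is only a minimal sketch; the function name `sample_mean` is an illustrative choice and does not come from the slides.

```python
def sample_mean(values):
    """Arithmetic mean: the sum of the values divided by how many there are."""
    return sum(values) / len(values)


data = [7, 4, 8, 8, 10, 12, 12]   # sample data from Example 1
total = sum(data)                 # sum of X
mean = sample_mean(data)          # X-bar = (sum of X) / n
print(total, round(mean, 2))
```

The same function works for a population mean; only the interpretation of the data (all values versus a sample) changes.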

Page 7: Chapter 3

7

Estimating the Mean of Data Grouped into a Frequency Distribution

$$\bar{X} = \frac{\sum f X_m}{n}$$

f: frequency of each class
$X_m$: midpoint of each class
n: total number of frequencies

Page 8: Chapter 3

8

Example 2

Given the frequency distribution below, estimate the mean.

Class boundaries    Frequency
5.5-10.5            1
10.5-15.5           2
15.5-20.5           3
20.5-25.5           5
25.5-30.5           4
30.5-35.5           3
35.5-40.5           2

Page 9: Chapter 3

9

Example 2 (Cont.)

Class          Frequency (f)   Midpoint ($X_m$)   $f X_m$
5.5 - 10.5     1               8                  8
10.5 - 15.5    2               13                 26
15.5 - 20.5    3               18                 54
20.5 - 25.5    5               23                 115
25.5 - 30.5    4               28                 112
30.5 - 35.5    3               33                 99
35.5 - 40.5    2               38                 76
Total          $n = \sum f = 20$                  $\sum f X_m = 490$

Page 10: Chapter 3

10

Example 2 (Cont.)

$$\bar{X} = \frac{\sum f X_m}{n} = \frac{490}{20} = 24.5$$
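The grouped-mean estimate of Example 2 can be sketched in Python as follows. The names `grouped_mean` and `runner_classes` are hypothetical, and each class is represented (by assumption) as a tuple of lower boundary, upper boundary, and frequency.

```python
def grouped_mean(classes):
    """Estimate the mean of grouped data.

    classes: list of (lower_boundary, upper_boundary, frequency) tuples.
    Every value in a class is represented by the class midpoint.
    """
    n = sum(f for _, _, f in classes)                        # total frequency
    sum_fx = sum(f * (lo + hi) / 2 for lo, hi, f in classes)  # sum of f * X_m
    return sum_fx / n


# Frequency distribution from Example 2
runner_classes = [
    (5.5, 10.5, 1), (10.5, 15.5, 2), (15.5, 20.5, 3), (20.5, 25.5, 5),
    (25.5, 30.5, 4), (30.5, 35.5, 3), (35.5, 40.5, 2),
]
estimate = grouped_mean(runner_classes)
print(estimate)
```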

Page 11: Chapter 3

11

Median

The median is the midpoint of the data array.

Steps in finding the median of a data array:
Step 1: Arrange the data in order.
Step 2: Select the midpoint of the array as the median.

Page 12: Chapter 3

12

Example 3

Find the median of the scores 7 2 3 7 6 9 10 8 9 9 10.

Arrange the data in order to obtain

2 3 6 7 7 8 9 9 9 10 10

We have 11 values. 8 is the exact middle value and hence it is the median.

Page 13: Chapter 3

13

Example 4

Find the median of the scores 7 2 3 7 6 9 10 8 9 9

Arrange the data in order

2 3 6 7 7 8 9 9 9 10

With these ten scores, no single score is at the exact middle. Instead, the two scores of 7 and 8 share the middle. We therefore find the mean of these two scores.

Page 14: Chapter 3

14

Example 4 (Cont.)

$$\text{Median} = \frac{7 + 8}{2} = 7.5$$

The median is 7.5.
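Examples 3 and 4 differ only in whether the number of values is odd or even. A minimal Python sketch covering both cases is given below; the function name `median` is an illustrative choice.

```python
def median(values):
    """Middle value of the ordered data; mean of the two middle values if n is even."""
    ordered = sorted(values)          # Step 1: arrange the data in order
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:                    # odd count: a single middle value exists
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2   # even count: average the two


scores_odd = [7, 2, 3, 7, 6, 9, 10, 8, 9, 9, 10]   # Example 3 (11 values)
scores_even = [7, 2, 3, 7, 6, 9, 10, 8, 9, 9]      # Example 4 (10 values)
print(median(scores_odd), median(scores_even))
```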

Page 15: Chapter 3

15

Estimating the Median of Data Grouped into a Frequency Distribution

$$\text{Median} = LB + \frac{\frac{n}{2} - CF}{f} \cdot w$$

Page 16: Chapter 3

16

LB: lower boundary of the median class
n: total number of frequencies
f: frequency of the median class
CF: cumulative frequency of the class preceding the median class
w: class width

Page 17: Chapter 3

17

Example 5

Given the frequency distribution below, estimate the median.

Class    Frequency
30-39    4
40-49    6
50-59    8
60-69    12
70-79    9
80-89    7
90-99    4

Page 18: Chapter 3

18

Example 5 (Cont.)

First find the cumulative frequency (CF).

Class    Frequency    CF
30-39    4            4
40-49    6            10
50-59    8            18
60-69    12           30
70-79    9            39
80-89    7            46
90-99    4            50

Page 19: Chapter 3

19

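Example 5 (Cont.)

Here w = 10 and n = 50, and hence n/2 = 25. The median falls in the class 60-69 (class boundaries 59.5-69.5).

$$\text{Median} = LB + \frac{\frac{n}{2} - CF}{f} \cdot w = 59.5 + \frac{25 - 18}{12} \cdot 10 \approx 65.33$$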
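The grouped-median formula can be sketched in Python as shown below. The function name `grouped_median` is hypothetical, and the lower class boundaries are assumed to be the class limits minus 0.5, as the slide does for the class 60-69.

```python
def grouped_median(classes, width):
    """Estimate the median of grouped data.

    classes: list of (lower_boundary, frequency) tuples in ascending order.
    width:   common class width w.
    """
    n = sum(f for _, f in classes)
    cf = 0                                  # cumulative frequency before the current class
    for lb, f in classes:
        if cf + f >= n / 2:                 # first class whose cumulative frequency reaches n/2
            return lb + (n / 2 - cf) / f * width
        cf += f
    raise ValueError("empty frequency distribution")


# Example 5: classes 30-39, 40-49, ..., 90-99
score_classes = [(29.5, 4), (39.5, 6), (49.5, 8), (59.5, 12),
                 (69.5, 9), (79.5, 7), (89.5, 4)]
estimate = grouped_median(score_classes, width=10)
print(round(estimate, 2))
```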

Page 20: Chapter 3

20

Example 6

Estimate the median for the frequency distribution below.

Class      Frequency
80-89      5
90-99      9
100-109    20
110-119    8
120-129    6
130-139    2

Page 21: Chapter 3

21

Mode

o For data grouped into a frequency distribution, the mode can be estimated by the class midpoint of the modal class (the class with the highest frequency).

o It can also be found by the formula

$$\text{Mode} = LB + \frac{d_1}{d_1 + d_2} \cdot w$$

Page 22: Chapter 3

22

where
o LB: lower boundary of the modal class
o w: class width
o $d_1$: difference between the frequency of the modal class and that of the class preceding it
o $d_2$: difference between the frequency of the modal class and that of the class right after it

Page 23: Chapter 3

23

Example 7

Estimate the mode of the distribution below.

Class        Frequency (f)
5.5-10.5     1
10.5-15.5    2
15.5-20.5    3
20.5-25.5    5    (modal class)
25.5-30.5    4
30.5-35.5    3
35.5-40.5    2

Page 24: Chapter 3

24

Example 7 (cont.)

LB = 20.5
w = 5
$d_1 = 5 - 3 = 2$
$d_2 = 5 - 4 = 1$

$$\text{Mode} = 20.5 + \frac{2}{2 + 1} \cdot 5 \approx 23.83$$
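A possible Python sketch of the mode formula is shown below. The name `grouped_mode` is hypothetical, and treating a missing neighbouring class as having frequency 0 is an assumption of this sketch, not something stated in the slides.

```python
def grouped_mode(classes, width):
    """Estimate the mode of grouped data as LB + d1 / (d1 + d2) * w.

    classes: list of (lower_boundary, frequency) tuples in ascending order.
    A missing neighbour (modal class first or last) is treated as frequency 0,
    which is an assumption of this sketch. If the modal class and both of its
    neighbours have equal frequency, d1 + d2 is 0 and the formula is undefined.
    """
    freqs = [f for _, f in classes]
    i = freqs.index(max(freqs))                       # first class with the highest frequency
    before = freqs[i - 1] if i > 0 else 0
    after = freqs[i + 1] if i < len(freqs) - 1 else 0
    d1 = freqs[i] - before
    d2 = freqs[i] - after
    return classes[i][0] + d1 / (d1 + d2) * width


# Example 7
mode_classes = [(5.5, 1), (10.5, 2), (15.5, 3), (20.5, 5),
                (25.5, 4), (30.5, 3), (35.5, 2)]
estimate = grouped_mode(mode_classes, width=5)
print(round(estimate, 2))
```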

Page 25: Chapter 3

25

The Midrange

$$MR = \frac{\text{lowest value} + \text{highest value}}{2}$$

Page 26: Chapter 3

26

Example 8

The midrange of this data set: 2, 3, 6, 8, 4, 1 is

MR=(8+1)/2=4.5

Page 27: Chapter 3

27

The Weighted Mean

$$\bar{X}_w = \frac{w_1 X_1 + w_2 X_2 + \cdots + w_n X_n}{w_1 + w_2 + \cdots + w_n} = \frac{\sum_{i=1}^{n} w_i X_i}{\sum_{i=1}^{n} w_i}$$

$X_i$: the values
$w_i$: the weights

Page 28: Chapter 3

28

Example 8

A student obtained 40, 50, 60, 80, and 55 marks in Math, Statistics, Physics, Chemistry, and Biology respectively. Assuming weights of 5, 2, 4, 3, and 1 respectively for these subjects, find the weighted arithmetic mean per subject.

Page 29: Chapter 3

29

Example 8 (cont.)

Subject       Marks obtained (x)   Weight (w)   wx
Math          40                   5            200
Statistics    50                   2            100
Physics       60                   4            240
Chemistry     80                   3            240
Biology       55                   1            55
Total                              15           835

Page 30: Chapter 3

30

Example 8 (cont.)

$$\bar{X}_w = \frac{835}{15} \approx 55.667 \text{ marks per subject}$$
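The weighted mean can be reproduced with a short Python sketch. The name `weighted_mean` is illustrative, and the marks are taken as listed in the solution table (Biology = 55).

```python
def weighted_mean(values, weights):
    """Sum of weight * value divided by the sum of the weights."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    return sum(w * x for x, w in zip(values, weights)) / sum(weights)


marks = [40, 50, 60, 80, 55]      # Math, Statistics, Physics, Chemistry, Biology
weights = [5, 2, 4, 3, 1]
result = weighted_mean(marks, weights)
print(round(result, 3))
```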

Page 31: Chapter 3

31

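Distribution Shapes

(a) Positively skewed or right-skewed

[Figure: right-skewed distribution curve; from left to right along the x-axis, the mode, median, and mean are marked.]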

Page 32: Chapter 3

32

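Distribution Shapes (cont.)

(b) Negatively skewed or left-skewed

[Figure: left-skewed distribution curve with the mean, median, and mode marked on the x-axis; the mean lies to the left of the median, and the median to the left of the mode.]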

Page 33: Chapter 3

33

Distribution Shapes (cont.)

[Figure: symmetric distribution curve, for which Mean = Median = Mode.]

Page 34: Chapter 3

34

Range

The range is the highest value minus the lowest value. The symbol R is used for the range.

$$R = \text{highest value} - \text{lowest value}$$

Page 35: Chapter 3

35

Mean Deviation

$$\text{Mean Deviation} = \frac{\sum |X - \bar{X}|}{n}$$

Page 36: Chapter 3

36

Example 9

The number of patients seen in the emergency room in a hospital for a sample of 5 days last year was: 103, 97, 101, 106, and 103. Determine the mean deviation and interpret.

Page 37: Chapter 3

37

Example 9

First find the arithmetic mean:

$$\bar{X} = \frac{103 + 97 + 101 + 106 + 103}{5} = 102$$

Page 38: Chapter 3

38

Example 9 (Cont.)

Number of cases   Deviation          Absolute deviation
103               103 - 102 = 1      1
97                97 - 102 = -5      5
101               101 - 102 = -1     1
106               106 - 102 = 4      4
103               103 - 102 = 1      1
Total                                12

Page 39: Chapter 3

39

Example 9 (Cont.)

$$MD = \frac{\sum |X - \bar{X}|}{n} = \frac{12}{5} = 2.4$$

Hence the mean deviation is 2.4 patients per day. The number of patients deviates, on average, by 2.4 patients from the mean of 102 patients per day.
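The mean deviation is straightforward to compute in code. The sketch below uses an illustrative function name, `mean_deviation`, and also checks the answer stated for the crate weights of Example 10.

```python
def mean_deviation(values):
    """Average absolute distance of the values from their arithmetic mean."""
    n = len(values)
    mean = sum(values) / n
    return sum(abs(x - mean) for x in values) / n


patients = [103, 97, 101, 106, 103]               # Example 9
crates = [95, 103, 105, 110, 104, 105, 112, 90]   # Example 10
print(mean_deviation(patients), mean_deviation(crates))
```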

Page 40: Chapter 3

40

Example 10

The weights of a group of crates being shipped to Ireland are (in pounds)

95, 103, 105, 110, 104, 105, 112, and 90.

a) What is the range of the weights?
b) Compute the arithmetic mean weight.
c) Compute the mean deviation of the weights.

(Answer: a) 22, b) 103, c) 5.25 pounds)

Page 41: Chapter 3

41

Population Variance and Standard Deviation

$$\sigma^2 = \frac{\sum (X - \mu)^2}{N}$$

$$\sigma = \sqrt{\frac{\sum (X - \mu)^2}{N}}$$

Remember: the standard deviation is the positive square root of the variance.

Page 42: Chapter 3

42

Example 11

Find the variance and standard deviation for the population data: 35, 45, 30, 35, 40, 25

Solution

First find the arithmetic mean:

$$\sum X = 35 + 45 + 30 + 35 + 40 + 25 = 210$$

$$\mu = \frac{210}{6} = 35$$

Then construct the table.

Page 43: Chapter 3

43

Example 11 (cont.)

X       $X - \mu$    $(X - \mu)^2$
35      0            0
45      10           100
30      -5           25
35      0            0
40      5            25
25      -10          100
Total                250

Page 44: Chapter 3

44

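Example 11 (cont.)

The population variance is

$$\sigma^2 = \frac{\sum (X - \mu)^2}{N} = \frac{250}{6} \approx 41.7$$

The population standard deviation is

$$\sigma = \sqrt{\frac{\sum (X - \mu)^2}{N}} = \sqrt{41.7} \approx 6.5$$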
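Example 11 can be verified with the following Python sketch. The function names are illustrative; the division is by N because the data are treated as a whole population.

```python
import math


def population_variance(values):
    """Mean of the squared deviations from the population mean (divide by N)."""
    n = len(values)
    mu = sum(values) / n
    return sum((x - mu) ** 2 for x in values) / n


def population_std(values):
    """Positive square root of the population variance."""
    return math.sqrt(population_variance(values))


population = [35, 45, 30, 35, 40, 25]   # Example 11
print(round(population_variance(population), 1), round(population_std(population), 1))
```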

Page 45: Chapter 3

45

Sample Variance and Standard Deviation

Sample Variance (Conceptual formula)

$$s^2 = \frac{\sum (X - \bar{X})^2}{n - 1}$$

Sample Variance (Computational formula)

$$s^2 = \frac{\sum X^2 - \frac{(\sum X)^2}{n}}{n - 1}$$

Page 46: Chapter 3

46

Sample Variance and Standard Deviation (Cont.)

Sample Standard Deviation (Conceptual formula)

$$s = \sqrt{\frac{\sum (X - \bar{X})^2}{n - 1}}$$

Sample Standard Deviation (Computational formula)

$$s = \sqrt{\frac{\sum X^2 - \frac{(\sum X)^2}{n}}{n - 1}}$$

Page 47: Chapter 3

47

Example 12

Find the sample variance and standard deviation for the amount of European auto sales for a sample of 6 years shown. The data are in millions of dollars.

11.2, 11.9, 12.0, 12.8, 13.4, 14.3

Page 48: Chapter 3

48

Example 12 (Cont.)

Method 1

Find the mean: $\bar{X} = 12.6$

X        $X - \bar{X}$    $(X - \bar{X})^2$
11.20    -1.40            1.96
11.90    -0.70            0.49
12.00    -0.60            0.36
12.80    0.20             0.04
13.40    0.80             0.64
14.30    1.70             2.89
Total                     6.38

Page 49: Chapter 3

49

Example 12 (Cont.)

Method 1

The variance is given by

$$s^2 = \frac{6.38}{6 - 1} \approx 1.28$$

and hence, the standard deviation is

$$s = \sqrt{1.28} \approx 1.13$$

Page 50: Chapter 3

50

Example 12 (Cont.)

Method 2

We compute

$$\sum X = 11.2 + 11.9 + 12.0 + 12.8 + 13.4 + 14.3 = 75.6$$

$$\sum X^2 = 11.2^2 + 11.9^2 + 12.0^2 + 12.8^2 + 13.4^2 + 14.3^2 = 958.94$$

The variance is computed by

$$s^2 = \frac{958.94 - \frac{75.6^2}{6}}{5} \approx 1.28$$

The standard deviation is 1.13.
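Both methods of Example 12 should give the same result, which can be checked with a small Python sketch. The two function names below are illustrative labels for the conceptual and computational formulas.

```python
import math


def sample_variance_conceptual(values):
    """Sum of squared deviations from the sample mean, divided by n - 1."""
    n = len(values)
    mean = sum(values) / n
    return sum((x - mean) ** 2 for x in values) / (n - 1)


def sample_variance_computational(values):
    """(Sum of X^2 minus (sum of X)^2 / n), divided by n - 1."""
    n = len(values)
    sum_x = sum(values)
    sum_x2 = sum(x * x for x in values)
    return (sum_x2 - sum_x ** 2 / n) / (n - 1)


sales = [11.2, 11.9, 12.0, 12.8, 13.4, 14.3]   # Example 12, millions of dollars
v1 = sample_variance_conceptual(sales)
v2 = sample_variance_computational(sales)
print(round(v1, 2), round(v2, 2), round(math.sqrt(v1), 2))
```

The two results agree up to floating-point rounding; the computational formula avoids computing the mean first but can lose precision when the values are large relative to their spread.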

Page 51: Chapter 3

51

Example 13

Suppose the numbers of minutes you spent traveling to school on the last 7 days are 9, 12, 9, 15, 10, 11, 15. Find the variance of the number of minutes using both formulas.

Page 52: Chapter 3

52

Variance and Standard Deviation for Grouped Data

$$s^2 = \frac{\sum f X_m^2 - \frac{(\sum f X_m)^2}{n}}{n - 1}$$

f: class frequency
$X_m$: class midpoint (class mark)
n: total number of frequencies

Page 53: Chapter 3

53

Example 14

Find the variance and the standard deviation for the frequency distribution of the data representing the number of miles that 20 runners ran during one week.

Page 54: Chapter 3

54

Example 14 (cont.)

Class        Frequency
5.5-10.5     1
10.5-15.5    2
15.5-20.5    3
20.5-25.5    5
25.5-30.5    4
30.5-35.5    3
35.5-40.5    2

Page 55: Chapter 3

55

Example 14 (cont.)

Class boundary   Freq. (f)   Midpoint ($X_m$)   $f X_m$   $f X_m^2$
5.5-10.5         1           8                  8         64
10.5-15.5        2           13                 26        338
15.5-20.5        3           18                 54        972
20.5-25.5        5           23                 115       2645
25.5-30.5        4           28                 112       3136
30.5-35.5        3           33                 99        3267
35.5-40.5        2           38                 76        2888
Total                                           490       13310

Page 56: Chapter 3

56

Example 14 (cont.)

Hence, the variance is

$$s^2 = \frac{13310 - \frac{490^2}{20}}{20 - 1} \approx 68.7$$

and the standard deviation is 8.3.
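The grouped-data variance of Example 14 can be sketched in Python as follows. The name `grouped_sample_variance` and the tuple layout (lower boundary, upper boundary, frequency) are assumptions made for illustration.

```python
import math


def grouped_sample_variance(classes):
    """Sample variance estimated from a frequency distribution.

    classes: list of (lower_boundary, upper_boundary, frequency) tuples.
    """
    n = sum(f for _, _, f in classes)
    sum_fx = sum(f * (lo + hi) / 2 for lo, hi, f in classes)           # sum of f * X_m
    sum_fx2 = sum(f * ((lo + hi) / 2) ** 2 for lo, hi, f in classes)   # sum of f * X_m^2
    return (sum_fx2 - sum_fx ** 2 / n) / (n - 1)


miles = [
    (5.5, 10.5, 1), (10.5, 15.5, 2), (15.5, 20.5, 3), (20.5, 25.5, 5),
    (25.5, 30.5, 4), (30.5, 35.5, 3), (35.5, 40.5, 2),
]
variance = grouped_sample_variance(miles)
print(round(variance, 1), round(math.sqrt(variance), 1))
```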

Page 57: Chapter 3

57

Coefficient of Variation

The coefficient of variation is the standard deviation divided by the mean. The result is expressed as a percentage.

For samples:

$$CVar = \frac{s}{\bar{X}} \cdot 100\%$$

For populations:

$$CVar = \frac{\sigma}{\mu} \cdot 100\%$$

Page 58: Chapter 3

58

Example 15

The mean of the number of sales of cars over a 3-month period is 87, and the standard deviation is 5. The mean of the commissions is $5225, and the standard deviation is $773. Compare the variations of the two.

Page 59: Chapter 3

59

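Example 15 (Cont.)

Sales

$$CVar = \frac{s}{\bar{X}} \cdot 100\% = \frac{5}{87} \cdot 100\% \approx 5.7\%$$

Commissions

$$CVar = \frac{s}{\bar{X}} \cdot 100\% = \frac{773}{5225} \cdot 100\% \approx 14.8\%$$

Since the coefficient of variation is larger for commissions, the commissions are more variable than the sales.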
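Because the coefficient of variation is unit-free, it allows sales counts and dollar commissions to be compared directly. A minimal Python sketch, with the illustrative name `coefficient_of_variation`:

```python
def coefficient_of_variation(std_dev, mean):
    """Standard deviation as a percentage of the mean."""
    return std_dev / mean * 100


cv_sales = coefficient_of_variation(5, 87)            # number of cars sold
cv_commissions = coefficient_of_variation(773, 5225)  # commissions in dollars
print(round(cv_sales, 1), round(cv_commissions, 1))
```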

Page 60: Chapter 3

60

Example 16

The mean for the number of pages of a sample of women’s fitness magazines is 132, with a variance of 23; the mean for the number of advertisements of a sample of women’s fitness magazines is 182, with a variance of 62. Compare the variations. (Answer: 3.6% for pages, 4.3% for advertisements)

Page 61: Chapter 3

61

Chebyshev’s theorem

The proportion of values from a data set that will fall within k standard deviations of the mean will be at least $1 - 1/k^2$, where k is a number greater than 1 (k is not necessarily an integer).

Page 62: Chapter 3

62

Chebyshev’s theorem

[Figure: at least 75% of the values lie between $\bar{X} - 2s$ and $\bar{X} + 2s$; at least 88.89% lie between $\bar{X} - 3s$ and $\bar{X} + 3s$.]

Page 63: Chapter 3

63

Example 17

The mean price of houses in a certain neighborhood is $50,000, and the standard deviation is $10,000. Find the price range for which at least 75% of the houses will sell.

Page 64: Chapter 3

64

Example 17 (Cont.)

Chebyshev’s theorem states that at least three-fourths, or 75%, of the data values will fall within 2 standard deviations of the mean. Thus,

$50,000 + 2($10,000) = $70,000

and

$50,000 - 2($10,000) = $30,000

Hence, at least 75% of all homes sold in the area will have a price from $30,000 to $70,000.
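Chebyshev's bound and the corresponding interval can be sketched in Python as below. Both function names are hypothetical helpers introduced for illustration.

```python
def chebyshev_min_proportion(k):
    """Minimum proportion of values within k standard deviations of the mean."""
    if k <= 1:
        raise ValueError("k must be greater than 1")
    return 1 - 1 / k ** 2


def interval_around_mean(mean, std_dev, k):
    """Interval from mean - k*s to mean + k*s."""
    return mean - k * std_dev, mean + k * std_dev


print(chebyshev_min_proportion(2))             # bound for k = 2
print(round(chebyshev_min_proportion(3), 4))   # bound for k = 3
print(interval_around_mean(50_000, 10_000, 2)) # Example 17
```

Note that k need not be an integer, so the same function applies to questions where the interval half-width is, for example, 2.5 standard deviations.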

Page 65: Chapter 3

65

Example 18

A survey of local companies found that the mean amount of travel allowance for executives was $0.25 per mile. The standard deviation was $0.02. Using Chebyshev’s theorem, find the minimum percentage of the data values that will fall between $0.20 and $0.30.

Page 66: Chapter 3

66

The Empirical (Normal) Rule

Chebyshev’s theorem applies to any distribution regardless of its shape. However, when a distribution is bell-shaped (or what is called normal), the following statements, which make up the empirical rule, are true.

Page 67: Chapter 3

67

The Empirical (Normal) Rule

Approximately 68% of the data values will fall within 1 standard deviation of the mean.

Approximately 95% of the data values will fall within 2 standard deviations of the mean.

Approximately 99.7% (almost all) of the data values will fall within 3 standard deviations of the mean.

Page 68: Chapter 3

68

The Empirical (Normal) Rule

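[Figure: bell-shaped curve with the x-axis marked from $\bar{X} - 3s$ to $\bar{X} + 3s$; about 68% of the values lie within 1 standard deviation of the mean, 95% within 2, and 99.7% within 3.]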

Page 69: Chapter 3

69

Measures of Position

Standard Scores

A z score or standard score for a value is obtained by subtracting the mean from the value and dividing the result by the standard deviation. The symbol for a standard score is z.

$$z = \frac{\text{value} - \text{mean}}{\text{standard deviation}}$$

Page 70: Chapter 3

70

Measures of Position

Standard Scores

The z score represents the number of standard deviations that a data value falls above or below the mean.

Page 71: Chapter 3

71

Example 19 A student scored 65 on a calculus test that had a mean of 50 and a standard deviation of 10; she scored 30 on a history test with a mean of 25 and a standard deviation of 5. Compare her relative position on the two tests.

Page 72: Chapter 3

72

Example 19 (Cont.)

For calculus, the z score is

$$z = \frac{X - \bar{X}}{s} = \frac{65 - 50}{10} = 1.5$$

For history, the z score is

$$z = \frac{X - \bar{X}}{s} = \frac{30 - 25}{5} = 1.0$$

Since the z score for calculus is larger, her relative position in the calculus class is higher than her relative position in the history class.
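The comparison in Example 19 amounts to two calls of a one-line function. A minimal Python sketch, using the illustrative name `z_score`:

```python
def z_score(value, mean, std_dev):
    """Number of standard deviations that value lies above (+) or below (-) the mean."""
    return (value - mean) / std_dev


z_calculus = z_score(65, 50, 10)
z_history = z_score(30, 25, 5)
print(z_calculus, z_history)
```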

Page 73: Chapter 3

73

Percentiles

Percentiles divide the data set into 100 equal groups.

There are several mathematical methods for computing percentiles for data. These methods can be used to find approximate percentile rank of a data value or to find a data value corresponding to a given percentile.

Page 74: Chapter 3

74

Finding a Percentile Rank Corresponding to a Value

The percentile corresponding to a given value X is computed by using the following formula:

$$\text{Percentile} = \frac{(\text{number of values below } X) + 0.5}{\text{total number of values}} \cdot 100\%$$

Page 75: Chapter 3

75

Example 20

A teacher gives a 20-point test to 10 students. The scores are shown here. Find the percentile rank of a score of 12.

18, 15, 12, 6, 8, 2, 3, 5, 20, 10

Page 76: Chapter 3

76

Example 20 (Cont.)

Arrange the data in order from lowest to highest:

2, 3, 5, 6, 8, 10, 12, 15, 18, 20

Six values lie below 12, so

$$\text{Percentile} = \frac{6 + 0.5}{10} \cdot 100\% = 65\text{th percentile}$$

Thus, a student whose score was 12 did better than 65% of the class.
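The percentile-rank formula used in Example 20 can be sketched in Python as follows. The function name `percentile_rank` is illustrative; it implements the slide's formula, which is only one of several conventions for percentile ranks.

```python
def percentile_rank(values, x):
    """Percentile rank of x: (number of values below x + 0.5) / total * 100."""
    below = sum(1 for v in values if v < x)
    # Multiply before dividing to keep the arithmetic exact for small counts.
    return (below + 0.5) * 100 / len(values)


scores = [18, 15, 12, 6, 8, 2, 3, 5, 20, 10]   # Example 20
rank = percentile_rank(scores, 12)
print(rank)
```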

Page 77: Chapter 3

77

Finding a Data Value Corresponding to a Given Percentile

o Arrange the data in order from lowest to highest.
o Compute c = (np)/100, where n is the total number of observations and p the percentile.
o If c is not a whole number, round up to the next whole number. Starting at the lowest value, count over to the number that corresponds to the rounded-up value.
o If c is a whole number, use the value halfway between the cth and (c+1)th values when counting up from the lowest value.

Page 78: Chapter 3

78

Example 21

A teacher gives a 20-point test to 10 students. The scores are shown here. Find the value corresponding to the 25th percentile.

18, 15, 12, 6, 8, 2, 3, 5, 20, 10

Page 79: Chapter 3

79

Example 21 (Cont.)

o Arrange the data in order from lowest to highest: 2, 3, 5, 6, 8, 10, 12, 15, 18, 20

o n = 10, p = 25, so c = (10 × 25)/100 = 2.5

o We round it up to get c = 3. Start at the lowest value and count over to the third value, which is 5. Hence, the value 5 corresponds to the 25th percentile.

Page 80: Chapter 3

80

Example 22

A teacher gives a 20-point test to 10 students. The scores are shown here. Find the value corresponding to the 60th percentile.

18, 15, 12, 6, 8, 2, 3, 5, 20, 10

Page 81: Chapter 3

81

Example 22 (Cont.)

o Arrange the data in order from smallest to largest: 2, 3, 5, 6, 8, 10, 12, 15, 18, 20

o n = 10, p = 60, so c = (10 × 60)/100 = 6

o Since c is a whole number, we use the value halfway between the 6th and 7th values when counting up from the lowest value.

o The 60th percentile is (10 + 12)/2 = 11.
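The counting procedure used in Examples 21 and 22 can be sketched in Python as follows. The function name `value_at_percentile` is hypothetical, and restricting p to values strictly between 0 and 100 is an assumption of this sketch. Other textbooks and libraries use different interpolation rules, so results may differ from theirs.

```python
import math


def value_at_percentile(values, p):
    """Data value at the p-th percentile, following the slide's counting rule."""
    if not 0 < p < 100:
        raise ValueError("p must be strictly between 0 and 100")
    ordered = sorted(values)            # arrange from lowest to highest
    c = len(ordered) * p / 100          # c = (n * p) / 100
    if c != int(c):                     # not a whole number: round up, take that value
        return ordered[math.ceil(c) - 1]
    c = int(c)                          # whole number: halfway between c-th and (c+1)-th
    return (ordered[c - 1] + ordered[c]) / 2


scores = [18, 15, 12, 6, 8, 2, 3, 5, 20, 10]
print(value_at_percentile(scores, 25), value_at_percentile(scores, 60))
```

Calling the same function with p = 25, 50, and 75 gives the quartiles under this rule.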

Page 82: Chapter 3

82

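Quartiles

Quartiles divide the distribution into 4 equal groups, separated by Q1, Q2, and Q3.

[Figure: the data range from the lowest value (L) to the highest value (H) is split by Q1, Q2, and Q3 into four groups, each containing 25% of the values.]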

Page 83: Chapter 3

83

Quartiles

Quartiles can be computed using the formula for computing percentiles.
o The 1st quartile corresponds to the 25th percentile.
o The 2nd quartile corresponds to the 50th percentile.
o The 3rd quartile corresponds to the 75th percentile.

2nd quartile = 50th percentile = median

Page 84: Chapter 3

84

Example 23

Find the first, second, and third quartiles for the data set 15, 13, 6, 5, 12, 50, 22, 18.

Arrange the data in order from smallest to largest:

5 6 12 13 15 18 22 50

Page 85: Chapter 3

85

Example 23 (Cont.)

o First quartile = 25th percentile
c = (8 × 25)/100 = 2
Hence, the first quartile is the value halfway between the second and third values. That is, Q1 = (6 + 12)/2 = 9.

o Second quartile = 50th percentile
c = (8 × 50)/100 = 4
Hence, Q2 = (4th value + 5th value)/2 = (13 + 15)/2 = 14.

Page 86: Chapter 3

86

Example 23 (Cont.)

o Third quartile = 75th percentile
c = (8 × 75)/100 = 6
Hence, Q3 = (6th value + 7th value)/2 = (18 + 22)/2 = 20.

Page 87: Chapter 3

87

Interquartile Range, Quartile Deviation and Midquartile Range

o Interquartile range: IQR = Q3 - Q1
o Quartile deviation: QD = (Q3 - Q1)/2
o The quartile deviation is also called the semi-interquartile range.
o Midquartile range: (Q3 + Q1)/2

Page 88: Chapter 3

88

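Quartiles of Data Grouped into a Frequency Distribution

o First quartile

$$Q_1 = LB + \frac{\frac{n}{4} - CF}{f} \cdot w$$

o Second quartile (median)

$$Q_2 = LB + \frac{\frac{n}{2} - CF}{f} \cdot w$$

o Third quartile

$$Q_3 = LB + \frac{\frac{3n}{4} - CF}{f} \cdot w$$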

Page 89: Chapter 3

89

Example 24

The office manager of the Mallard Glass Co. is investigating the ages in months of the company’s PCs currently in use. The ages of 30 units selected at random were organized into a frequency distribution. Compute the quartile deviation.

Page 90: Chapter 3

90

Example 24 (Cont.)

Age (in months)   # of PCs
20-24             3
25-29             5
30-34             10
35-39             7
40-44             4
45-49             1

Page 91: Chapter 3

91

Example 24 (Cont.)

Age (in months)   # of PCs   Cumu. Freq.
20-24             3          3
25-29             5          8
30-34             10         18
35-39             7          25
40-44             4          29
45-49             1          30

Page 92: Chapter 3

92

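Example 24 (Cont.)

The first quartile falls in the class 25-29 (lower boundary 24.5):

$$Q_1 = 24.5 + \frac{\frac{30}{4} - 3}{5} \cdot 5 = 29$$

The third quartile falls in the class 35-39 (lower boundary 34.5):

$$Q_3 = 34.5 + \frac{\frac{3 \cdot 30}{4} - 18}{7} \cdot 5 \approx 37.71$$

Hence, $QD = \frac{37.71 - 29}{2} \approx 4.36$ months.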
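The three grouped-quartile formulas share one pattern, which the Python sketch below captures in a single hypothetical function, `grouped_quantile`. The lower boundaries are assumed to be the class limits minus 0.5, so the class 25-29 has boundary 24.5 and the class 35-39 has boundary 34.5.

```python
def grouped_quantile(classes, width, fraction):
    """Estimate a quantile of grouped data as LB + (fraction * n - CF) / f * w.

    classes:  list of (lower_boundary, frequency) tuples in ascending order.
    fraction: 0.25 for Q1, 0.5 for Q2 (median), 0.75 for Q3.
    """
    n = sum(f for _, f in classes)
    target = fraction * n
    cf = 0                                   # cumulative frequency before the current class
    for lb, f in classes:
        if cf + f >= target:                 # class that contains the quantile
            return lb + (target - cf) / f * width
        cf += f
    raise ValueError("empty frequency distribution")


# Example 24: ages of 30 PCs, classes 20-24, 25-29, ..., 45-49
pc_classes = [(19.5, 3), (24.5, 5), (29.5, 10), (34.5, 7), (39.5, 4), (44.5, 1)]
q1 = grouped_quantile(pc_classes, width=5, fraction=0.25)
q3 = grouped_quantile(pc_classes, width=5, fraction=0.75)
quartile_deviation = (q3 - q1) / 2
print(round(q1, 2), round(q3, 2), round(quartile_deviation, 2))
```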

Page 93: Chapter 3

93

Example 25

The weekly income of a sample of 60 part time employees of a fast-food restaurant chain was organized into the following frequency distribution. Compute the standard deviation and quartile deviation.

Page 94: Chapter 3

94

Example 25 (Cont.)

Weekly Incomes

Number of Employees

100-149 5

150-199 9

200-249 20

250-299 18

300-349 5

350-399 3

Page 95: Chapter 3

95

Outliers

An outlier is an extremely high or an extremely low data value when compared with the rest of the data values.

An outlier can strongly affect the mean and standard deviation of a variable.

There are several ways to check a data set for outliers. One of them is as follows:

Page 96: Chapter 3

96

Outliers (Cont.)

Step 1: Arrange the data in order and find Q1 and Q3.
Step 2: Find the interquartile range: IQR = Q3 - Q1.
Step 3: Multiply the IQR by 1.5.
Step 4: Check the data set for any data value that is smaller than Q1 - 1.5(IQR) or larger than Q3 + 1.5(IQR).

Page 97: Chapter 3

97

Outliers: Example 26

Check the following data set for outliers.

5, 6, 12, 13, 15, 18, 22, 50

We found Q1 = 9 and Q3 = 20.
Interquartile range: IQR = 20 - 9 = 11
Compute the dividing points:

Q1 - 1.5(IQR) = 9 - 1.5(11) = -7.5
Q3 + 1.5(IQR) = 20 + 1.5(11) = 36.5

The data value of 50 is greater than the upper dividing point of 36.5, so it is considered an outlier.
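The outlier check can be sketched in Python as shown below. The function name `find_outliers` is illustrative, and the quartiles are passed in directly (here the values found in Example 23) rather than recomputed.

```python
def find_outliers(values, q1, q3):
    """Return the two dividing points and the values that fall outside them."""
    iqr = q3 - q1                       # interquartile range
    lower = q1 - 1.5 * iqr              # lower dividing point
    upper = q3 + 1.5 * iqr              # upper dividing point
    outliers = [x for x in values if x < lower or x > upper]
    return lower, upper, outliers


data = [5, 6, 12, 13, 15, 18, 22, 50]
lower, upper, outliers = find_outliers(data, q1=9, q3=20)
print(lower, upper, outliers)
```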

Page 98: Chapter 3

98

Exploratory Data Analysis

o In exploratory data analysis (EDA) the data are presented graphically using a box plot (sometimes called a box-and-whisker plot).

o The purpose of exploratory data analysis is to examine data to find out what information can be discovered about the data, such as the center and the spread.

o EDA was developed by John Tukey.

Page 99: Chapter 3

99

Exploratory Data Analysis

A box plot can be used to graphically represent the data set. These plots involve five specific values:

o The lowest value (i.e., minimum)
o Q1
o Median (Q2)
o Q3
o The highest value (i.e., maximum)

Page 100: Chapter 3

100

Example 27 (Box-plot)

A stockbroker recorded the number of clients she saw each day over an 11-day period. The data are shown below. Construct a box plot for the data.

33, 38, 43, 30, 29, 40, 51, 27, 42, 23, 31

Page 101: Chapter 3

101

Example 27 (Box-plot)

o Arrange the data in order from lowest to highest: 23, 27, 29, 30, 31, 33, 38, 40, 42, 43, 51

o We obtain: the lowest value = 23, Q1 = 29, median Q2 = 33, Q3 = 42, and the highest value = 51.

[Figure: box plot on a scale from 20 to 50, with the whiskers ending at 23 and 51 and the box marked at 29, 33, and 42.]
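The five values needed for the box plot can be computed with the percentile counting rule from earlier in the chapter. The Python sketch below is one possible implementation; the names `value_at_percentile` and `five_number_summary` are hypothetical, and plotting itself is left out.

```python
import math


def value_at_percentile(ordered, p):
    """Percentile value by the chapter's rule: c = n*p/100, round up or average."""
    c = len(ordered) * p / 100
    if c != int(c):                          # not whole: round up and take that value
        return ordered[math.ceil(c) - 1]
    c = int(c)                               # whole: halfway between c-th and (c+1)-th
    return (ordered[c - 1] + ordered[c]) / 2


def five_number_summary(values):
    """Minimum, Q1, median, Q3, and maximum of the data."""
    ordered = sorted(values)
    return (
        ordered[0],
        value_at_percentile(ordered, 25),
        value_at_percentile(ordered, 50),
        value_at_percentile(ordered, 75),
        ordered[-1],
    )


clients = [33, 38, 43, 30, 29, 40, 51, 27, 42, 23, 31]   # Example 27
summary = five_number_summary(clients)
print(summary)
```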

Page 102: Chapter 3

102

THE END!