Topic-3 Describing Data: Measures of Central Tendency
Page 1: Topic-3 Describing Data: Measures of Central Tendency.

Topic-3

Describing Data:

Measures of Central Tendency

Page 2: Topic-3 Describing Data: Measures of Central Tendency.

Average: A Measure of Central Tendency

• Average: A single numerical characteristic that represents a specific feature of a set of data: the center of the values.

– Population Mean (Overall Average): Computed as the total sum divided by the total number of items in the population. It is a parameter, a measurable characteristic of a population.

– Sample Mean (Sample Average): Computed as the total sum of the sample divided by the total number of items in the sample. It is a statistic, a measurable characteristic of a sample.

• Both the population mean and the sample mean are arithmetic means.

Page 3: Topic-3 Describing Data: Measures of Central Tendency.

Important Properties of Arithmetic Mean

• Every set of interval-level or ratio-level data has a unique mean.

• All values are included in computing the mean.

• The sum of the deviations of each value from the mean is always zero.

• The mean can be viewed as a "balance" point.

• The mean is easily affected by unusually large or small values (called "outliers").

• The mean cannot be computed if the data include open-ended classes.

Page 4: Topic-3 Describing Data: Measures of Central Tendency.

Median

• The midpoint of the values after they have been ordered from smallest to largest (or from largest to smallest).

– There are as many values above the median as below it.

– For an even number of values, the median is the arithmetic mean of the two middle values.

– The median is not affected by extreme values in the set.

– The median is unique and can be determined even with open-ended classes.

– The median can be computed for all levels of data except nominal-level data.
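
The claim that extreme values leave the median largely untouched while pulling the mean can be checked numerically. The following is a minimal Python sketch; the data values are hypothetical and chosen only for illustration.

```python
from statistics import mean, median

# Hypothetical data, chosen only for illustration.
values = [10, 11, 12, 12, 13]
with_outlier = values + [100]  # add one unusually large value

# The mean is pulled toward the outlier; the median stays at the middle.
print("without outlier:", mean(values), median(values))
print("with outlier:   ", mean(with_outlier), median(with_outlier))
```

In this sketch the median is 12 in both cases, while the mean more than doubles once the value 100 is added.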

Page 5: Topic-3 Describing Data: Measures of Central Tendency.

Mode

• The value of the observation that appears most frequently.

– The mode can be determined for all levels of data.

– The mode is not affected by extreme values in the set.

– The mode can be determined even with open-ended classes.

– However, the mode may not be unique for some data sets, and it may not exist for others.

• Another measure of central tendency.

Page 6: Topic-3 Describing Data: Measures of Central Tendency.

Weighted Mean

• Allows each item to be weighted according to its importance in the calculation of the mean.

– All weights (w_i) must be determined carefully.

– Different sets of weights may result in different weighted means.

– The weighted mean allows subjective managerial considerations to be included and evaluated.

– Widely used in business statistical applications.

• Examples

Page 7: Topic-3 Describing Data: Measures of Central Tendency.

Geometric Mean

• The nth root of the product of n positive numbers, normally used:

– to average percents, indexes, and relatives, and

– to determine average percent changes (in sales, production) from one time period to another.

• The geometric mean is a more conservative measure of average: for positive values it never exceeds the arithmetic mean.

• Examples

Page 8: Topic-3 Describing Data: Measures of Central Tendency.

Mean, Median, Mode for Grouped Data

• In practice, we often have to make estimates about something of interest from a sample presented as grouped data, that is, as a frequency distribution.

• Mean (arithmetic): the sum of the products of each group's midpoint and its frequency, divided by the total of the frequencies. (Examples)

– This mean may differ from the population mean, because it is only an estimate of the population mean.

• Median: can only be estimated, by locating the median group and then interpolating within it, assuming the values in this group are evenly spaced. (Examples)

– This median is an estimate and may differ from the population median.

– For the calculation, either the frequencies or their percentages are enough, and an open-ended class at either end has no effect.

• Mode: can be estimated by the midpoint of the group with the largest frequency. (Examples)

– There may be more than one mode (multimodal) within a set of data.

– "Bimodal" data have two modes.

Page 9: Topic-3 Describing Data: Measures of Central Tendency.

Symmetric Distribution

• Symmetric distribution: bell-shaped, with the same shape on both sides of the center axis.

– Its mean, median, and mode are equal and located at the center.

• Asymmetric distribution: bell-shaped but skewed toward one side, with different shapes on the two sides of the center axis.

– If positively skewed, the mean is the largest of the three averages (Mean > Median > Mode), because extreme values influence the three averages differently. (Examples)

– If negatively skewed, the mean is the lowest of the three averages (Mean < Median < Mode). (Examples)

– Approximate relationship among the three: when the data set is large enough and not too skewed, the median lies about one-third of the way from the mean to the mode.

– If two averages are known, the third can be estimated with the given formulas.

Page 10: Topic-3 Describing Data: Measures of Central Tendency.

Population Mean

• Definition: For ungrouped data, the population mean is the sum of all the population values divided by the total number of population values. To compute the population mean, use the following formula:

$$\mu = \frac{\sum X}{N}$$

• Where:

– µ (mu) is the population mean

– Σ (sigma) denotes the sum

– X is an individual value

– N is the population size

Page 11: Topic-3 Describing Data: Measures of Central Tendency.

Sample Mean

• Definition: For ungrouped data, the sample mean is the sum of all the sample values divided by the number of sample values. To compute the sample mean, use the following formula:

$$\bar{X} = \frac{\sum X}{n}$$

• Where:

– X̄ (X-bar) is the sample mean

– Σ (sigma) denotes the sum

– X is an individual value

– n is the sample size

Page 12: Topic-3 Describing Data: Measures of Central Tendency.

Example 1

• Parameter: a measurable characteristic of a population.

• The Kiers family owns 4 cars. The following is the mileage attained by each car: 56,000; 23,000; 42,000; and 73,000. Find the average miles covered by each car.

– The mean is (56,000 + 23,000 + 42,000 + 73,000)/4 = 48,500.

Page 13: Topic-3 Describing Data: Measures of Central Tendency.

Example 2

• Statistic: a measurable characteristic of a sample.

• A sample of 5 executives received the following bonuses last year: $14,000; $15,000; $17,000; $16,000 and $15,000. Find the average bonus for these 5 executives.

– Since these values represent a sample of size 5, the sample mean is ($14,000 + $15,000 + $17,000 + $16,000 + $15,000)/5 = $15,400.
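
The arithmetic in Examples 1 and 2 can be reproduced with Python's standard `statistics` module. This is only a sketch; the variable names are illustrative and not part of the original slides.

```python
from statistics import mean

# Example 1: mileage of all 4 cars the family owns (a population).
mileage = [56_000, 23_000, 42_000, 73_000]
population_mean = mean(mileage)

# Example 2: bonuses of a sample of 5 executives.
bonuses = [14_000, 15_000, 17_000, 16_000, 15_000]
sample_mean = mean(bonuses)

print(population_mean)  # 48,500 miles
print(sample_mean)      # 15,400 dollars
```

The computation is identical in both cases; only the interpretation (parameter versus statistic) differs.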

Page 14: Topic-3 Describing Data: Measures of Central Tendency.

Illustration

• Consider the set of values: 3, 8, and 4. The mean is 5. So, (3 – 5) + (8 – 5) + (4 – 5) = -2 + 3 – 1 = 0. Symbolically, we write:

$$\sum (X - \bar{X}) = 0$$
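
The zero-sum property of the deviations can be checked with a few lines of Python. This is a minimal sketch using the values 3, 8, and 4 from the illustration; for other data, floating-point rounding may leave a tiny nonzero remainder, so a tolerance check is used.

```python
import math

values = [3, 8, 4]
mean_value = sum(values) / len(values)

# Deviation of each value from the mean.
deviations = [x - mean_value for x in values]
total_deviation = sum(deviations)

print(deviations)
# Compare against zero with a tolerance rather than exact equality.
print(math.isclose(total_deviation, 0.0, abs_tol=1e-9))
```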

Page 15: Topic-3 Describing Data: Measures of Central Tendency.

The Weighted Mean

• Definition: The weighted mean of a set of numbers X1, X2, …, Xn, with corresponding weights w1, w2, …, wn, is computed from the following formulas:

$$\bar{X}_w = \frac{w_1 X_1 + w_2 X_2 + \cdots + w_n X_n}{w_1 + w_2 + \cdots + w_n}$$

$$\bar{X}_w = \frac{\sum (wX)}{\sum w}$$

Page 16: Topic-3 Describing Data: Measures of Central Tendency.

Example 3

• Compute the median for the following data.

• The ages of a sample of 5 college students are: 21, 25, 19, 20, and 22.

– Arranging the data in ascending order gives: 19, 20, 21, 22, and 25. Thus, the median is 21.

• The heights of 4 basketball players, in inches, are: 76, 73, 80, and 75.

– Arranging the data in ascending order gives: 73, 75, 76, and 80.

– Thus, the median is (75 + 76)/2 = 75.5.
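
Both cases in Example 3 (an odd and an even number of values) can be sketched in Python. The helper `median_by_sorting` is a hypothetical name introduced here to mirror the manual procedure; `statistics.median` from the standard library is used as a cross-check.

```python
from statistics import median

def median_by_sorting(data):
    """Order the values, then take the middle one (or average the two middle ones)."""
    ordered = sorted(data)
    n = len(ordered)
    middle = n // 2
    if n % 2 == 1:
        return ordered[middle]
    return (ordered[middle - 1] + ordered[middle]) / 2

ages = [21, 25, 19, 20, 22]   # odd number of values
heights = [76, 73, 80, 75]    # even number of values

print(median_by_sorting(ages), median(ages))
print(median_by_sorting(heights), median(heights))
```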

Page 17: Topic-3 Describing Data: Measures of Central Tendency.

Example 4

• The exam scores for 10 students are: 81, 93, 84, 75, 68, 87, 81, 75, 81, and 87.

• Since the score of 81 occurs the most, the modal score is 81.
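
The modal score in Example 4 can be found by counting occurrences. The sketch below uses Python's standard library (`statistics.multimode` requires Python 3.8 or later); the variable names are illustrative.

```python
from collections import Counter
from statistics import mode, multimode

scores = [81, 93, 84, 75, 68, 87, 81, 75, 81, 87]

counts = Counter(scores)       # how often each score occurs
modal_score = mode(scores)     # the single most frequent score
all_modes = multimode(scores)  # list of every most-frequent value

print(counts[81], modal_score, all_modes)
```

`multimode` is useful when a data set may have more than one mode, since it returns all values tied for the highest frequency.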

Page 18: Topic-3 Describing Data: Measures of Central Tendency.

Example 5

• During a 1-hour period on a hot Saturday afternoon, cabana boy Chris served 50 drinks.

• Compute the weighted mean of the price of the drinks.

• (Price ($), Number sold): (0.50, 5); (0.75, 15); (0.90, 15); (1.10, 15).

– The weighted mean is: $((0.50 x 5) + (0.75 x 15) + (0.90 x 15) + (1.10 x 15))/50 = $43.75/50 = $0.875.
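
The weighted mean in Example 5 uses the number of drinks sold at each price as the weights. A minimal Python sketch follows; `weighted_mean` is a hypothetical helper name, not a library function.

```python
def weighted_mean(values, weights):
    """Return sum(w * x) / sum(w) for paired values and weights."""
    return sum(w * x for x, w in zip(values, weights)) / sum(weights)

prices = [0.50, 0.75, 0.90, 1.10]  # price of each drink ($)
numbers_sold = [5, 15, 15, 15]     # weights: drinks sold at each price

average_price = weighted_mean(prices, numbers_sold)
print(round(average_price, 3))
```

Note that the unweighted mean of the four prices would be $0.8125, which ignores how many drinks were sold at each price.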

Page 19: Topic-3 Describing Data: Measures of Central Tendency.

The Geometric Mean (I)

• Definition: The geometric mean (GM) of a set of n numbers is defined as the nth root of the product of the n numbers. The formula for the geometric mean is given by:

$$GM = \sqrt[n]{(X_1)(X_2)(X_3)\cdots(X_n)}$$

• One main use of the geometric mean is to average percents, indexes, and relatives.

Page 20: Topic-3 Describing Data: Measures of Central Tendency.

The Geometric Mean (II)

• The other main use of the geometric mean is to determine the average percent increase in sales, production, or other business or economic series from one time period to another.

• The formula for the geometric mean as applied to this type of problem is:

$$GM = \sqrt[n-1]{\frac{\text{Value at end of period}}{\text{Value at beginning of period}}} - 1$$

Page 21: Topic-3 Describing Data: Measures of Central Tendency.

Example 6

• The interest rates on 3 bonds were 5, 7, and 4 percent.

• The geometric mean is:

$$GM = \sqrt[3]{(7)(5)(4)} = 5.192$$

• The arithmetic mean is: (5 + 7 + 4)/3 = 5.333.

• The GM gives a more conservative profit figure because it is not heavily weighted by the rate of 7%.
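
The comparison in Example 6 can be sketched in Python. `statistics.geometric_mean` is part of the standard library from Python 3.8 onward; the manual cube root is included only to show that it matches the definition.

```python
from statistics import geometric_mean, mean

rates = [5, 7, 4]  # interest rates in percent

gm = geometric_mean(rates)                # library computation
gm_manual = (5 * 7 * 4) ** (1 / 3)        # cube root of the product
am = mean(rates)                          # arithmetic mean

print(round(gm, 3), round(gm_manual, 3), round(am, 3))
```

For these rates the geometric mean comes out below the arithmetic mean, consistent with its description as the more conservative figure.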

Page 22: Topic-3 Describing Data: Measures of Central Tendency.

Example 7

• The total number of females enrolled in American colleges increased from 755,000 in 1986 to 835,000 in 1995.

• Here n = 10, so (n – 1) = 9.

$$GM = \sqrt[9]{\frac{835{,}000}{755{,}000}} - 1 \approx 0.0113$$

• That is, the geometric mean rate of increase is about 1.13% per year.
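
The average rate of increase in Example 7 can be sketched as a small Python function. The name `average_growth_rate` and its parameters are hypothetical; the number of periods is taken as n – 1 = 9, the count of year-to-year changes from 1986 to 1995.

```python
def average_growth_rate(begin_value, end_value, periods):
    """Geometric mean rate of change per period: (end / begin) ** (1 / periods) - 1."""
    return (end_value / begin_value) ** (1 / periods) - 1

# 1986 to 1995 spans n = 10 yearly values, so n - 1 = 9 periods of change.
rate = average_growth_rate(755_000, 835_000, periods=9)
print(f"{rate:.2%}")
```

With 9 periods the sketch gives a rate of roughly 1.13% per year; taking an eighth root instead would give roughly 1.27%, so the choice of root matters.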

Page 23: Topic-3 Describing Data: Measures of Central Tendency.

The Mean of Grouped Data

• The mean of a sample of data organized in a frequency distribution is computed by the following formula:

$$\bar{X} = \frac{\sum Xf}{\sum f} = \frac{\sum Xf}{n}$$

• Where:

– X̄ (X-bar) is the sample mean

– In ΣXf, X is the class midpoint and f is the class frequency

– Σf is the sum of the frequencies

– n is the sample size

Page 24: Topic-3 Describing Data: Measures of Central Tendency.

The Median of Grouped Data

• The median of a sample of data organized in a frequency distribution is computed by the following formula:

$$\text{Median} = L + \frac{\frac{n}{2} - CF}{f}\,(i)$$

• Where:

– L is the lower limit of the median class

– n is the sample size

– CF is the cumulative frequency preceding the median class

– f is the frequency of the median class

– i is the median class interval

Page 25: Topic-3 Describing Data: Measures of Central Tendency.

Example 8

• A sample of 10 movie theatres in a large metropolitan area tallied the total number of movies showing last week. Compute the mean number of movies showing.

• To determine the median class for grouped data:

– Construct a cumulative frequency distribution.

– Divide the total number of data values by 2.

– Determine which class contains this value. For example, if n = 50, then 50/2 = 25, so the median class is the class that contains the 25th value.

Page 26: Topic-3 Describing Data: Measures of Central Tendency.

Example 8

Movies Showing | Frequency, f | Class Midpoint, X | (f)(X)
1 – 2 | 1 | 1.5 | 1.5
3 – 4 | 2 | 3.5 | 7.0
5 – 6 | 3 | 5.5 | 16.5
7 – 8 | 1 | 7.5 | 7.5
9 – 10 | 3 | 9.5 | 28.5
Total | 10 | | 61.0

Mean = 61/10 = 6.1 movies
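
The grouped-mean computation for the movie data can be sketched in Python. The tuple layout (lower limit, upper limit, frequency) and the function name `grouped_mean` are assumptions made for this illustration.

```python
def grouped_mean(classes):
    """Estimate the mean from (lower, upper, frequency) classes using class midpoints."""
    total_frequency = sum(f for _, _, f in classes)
    weighted_sum = sum(f * (lower + upper) / 2 for lower, upper, f in classes)
    return weighted_sum / total_frequency

# (lower limit, upper limit, frequency) for the movie theatre sample.
movies = [(1, 2, 1), (3, 4, 2), (5, 6, 3), (7, 8, 1), (9, 10, 3)]

print(grouped_mean(movies))
```

Because every value in a class is represented by the class midpoint, the result is an estimate of the mean of the underlying raw data.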

Page 27: Topic-3 Describing Data: Measures of Central Tendency.

Example 8

Movies Showing | Frequency, f | Cumulative Frequency
1 – 2 | 1 | 1
3 – 4 | 2 | 3
5 – 6 | 3 | 6
7 – 8 | 1 | 7
9 – 10 | 3 | 10

• The median class is 5 – 6, since it contains the 5th value (n/2 = 5).

• From the table, L = 5, n = 10, f = 3, i = 2, and CF = 3.

• Thus, the median is:

$$\text{Median} = 5 + \frac{\frac{10}{2} - 3}{3}\,(2) = 6.33$$
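
The steps for the grouped median (build the cumulative frequencies, locate the class containing the n/2-th value, then interpolate) can be sketched in Python. The function name and argument layout are hypothetical, and the sketch follows the slides in using the stated lower class limit as L and assuming equal class widths.

```python
def grouped_median(lower_limits, frequencies, width):
    """Estimate the median as L + ((n/2 - CF) / f) * i for equal-width classes."""
    n = sum(frequencies)
    if n <= 0:
        raise ValueError("total frequency must be positive")
    cumulative = 0  # CF: cumulative frequency before the current class
    for lower, f in zip(lower_limits, frequencies):
        if cumulative + f >= n / 2:
            # This is the median class: interpolate within it.
            return lower + (n / 2 - cumulative) / f * width
        cumulative += f
    raise ValueError("median class not found")

lower_limits = [1, 3, 5, 7, 9]
frequencies = [1, 2, 3, 1, 3]

print(round(grouped_median(lower_limits, frequencies, width=2), 2))
```

For the movie data the loop stops at the class 5 – 6 with CF = 3, which reproduces the values used in the formula above.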

Page 28: Topic-3 Describing Data: Measures of Central Tendency.

Example 9

• A sample of 20 appliance stores in a large metropolitan area revealed the following number of DVD players sold last week. Compute the mean number sold. The formula and computation are shown below:

$$\bar{X} = \frac{\sum fX}{\sum f} = \frac{\sum fX}{n} = \frac{325}{20} = 16.25$$

Page 29: Topic-3 Describing Data: Measures of Central Tendency.

Example 9

• The table also gives the necessary computations:

Number Sold | Frequency, f | Class Midpoint, X | (f)(X)
5 – 9 | 2 | 7 | 14
10 – 14 | 4 | 12 | 48
15 – 19 | 10 | 17 | 170
20 – 24 | 3 | 22 | 66
25 – 29 | 1 | 27 | 27
Total | Σf = 20 | | ΣfX = 325
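
The Example 9 table can be reproduced in Python, together with the grouped-data estimate of the mode described earlier in the deck (the midpoint of the class with the largest frequency). The data layout and variable names are assumptions for this sketch.

```python
# (lower limit, upper limit, frequency) for the DVD player sample.
dvd_sales = [(5, 9, 2), (10, 14, 4), (15, 19, 10), (20, 24, 3), (25, 29, 1)]

# Grouped mean: sum of f * midpoint divided by the sum of f.
n = sum(f for _, _, f in dvd_sales)
sum_fx = sum(f * (lower + upper) / 2 for lower, upper, f in dvd_sales)
grouped_mean_value = sum_fx / n

# Estimated mode: midpoint of the class with the largest frequency.
modal_lower, modal_upper, _ = max(dvd_sales, key=lambda c: c[2])
estimated_mode = (modal_lower + modal_upper) / 2

print(n, sum_fx, grouped_mean_value, estimated_mode)
```

Here the modal class is 15 – 19, so the estimated mode of 17 sits close to the grouped mean of 16.25.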

Page 30: Topic-3 Describing Data: Measures of Central Tendency.

Table: Comparison of the Three Averages

Characteristic | Mean (M) | Median (Mdn) | Mode (Mo)
Definition | Balance point | Middle point | Most frequent score
When to Use | Symmetrical distributions of interval/ratio data | When mean is inappropriate (except for nominal data) | Nominal data
Frequency of Use in Scientific Reporting | Very frequent | Somewhat frequent | Very infrequent [1]
Relative Values for Positive Skew [2] | Higher than Mdn and Mo | Between M and Mo | Lower than M and Mdn
Relative Values for Negative Skew | Lower than Mdn and Mo | Between M and Mo | Higher than M and Mdn

[1] Although nominal data (for which the mode is appropriate) is frequently reported in scientific writing, most researchers report percentages for each category rather than reporting the modal category. Thus, the mode is seldom used.

[2] See Section 9 to review skewed distributions.

Page 31: Topic-3 Describing Data: Measures of Central Tendency.

Symmetric Distribution

• Zero skewness: Mode = Median = Mean

Page 32: Topic-3 Describing Data: Measures of Central Tendency.

Right Skewed Distribution

• Positively skewed: Mean and Median are to the right of the Mode: Mode < Median < Mean

Page 33: Topic-3 Describing Data: Measures of Central Tendency.

Left Skewed Distribution

• Negatively skewed: Mean and Median are to the left of the Mode: Mean < Median < Mode

Page 34: Topic-3 Describing Data: Measures of Central Tendency.

Skewness

• If two averages of a moderately skewed frequency distribution are known, the third can be approximated.

• Mode = mean – 3(mean – median)

• Mean = [3(median) – mode]/2

• Median = [2(mean) + mode]/3
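
The three approximation formulas above are rearrangements of one relationship, which can be checked with a short Python sketch. The function names and the input values (mean 50, median 48) are hypothetical, chosen only to show that the formulas are mutually consistent.

```python
def estimate_mode(mean, median):
    """Mode = mean - 3(mean - median)."""
    return mean - 3 * (mean - median)

def estimate_mean(median, mode):
    """Mean = [3(median) - mode] / 2."""
    return (3 * median - mode) / 2

def estimate_median(mean, mode):
    """Median = [2(mean) + mode] / 3."""
    return (2 * mean + mode) / 3

# Hypothetical averages of a moderately, positively skewed distribution.
mean_value, median_value = 50, 48
mode_value = estimate_mode(mean_value, median_value)

print(mode_value)
print(estimate_mean(median_value, mode_value))
print(estimate_median(mean_value, mode_value))
```

Starting from the mean and median, the estimated mode can be fed back into the other two formulas to recover the original mean and median. These are approximations for moderately skewed data, not exact identities.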

Page 35: Topic-3 Describing Data: Measures of Central Tendency.

Exercise 3A

• Springers sold 95 Antonelli men’s suits for the regular price of $400. For the spring sale the suits were reduced to $200 and 126 were sold. At the final clearance, the price was reduced to $100 and the remaining 79 suits were sold.

a. What was the weighted mean price of an Antonelli suit?

b. Springers paid $200 a suit for the 300 suits. Comment on the store’s profit per suit if a salesperson receives a $25 commission for each suit sold.

Page 36: Topic-3 Describing Data: Measures of Central Tendency.

Exercises 3B & 3C

B. In 1950 there were 51 countries that belonged to the United Nations. By 1996 this number had increased to 185. What was the geometric mean annual rate of increase in membership during this period?

C. Darenfest and Associates stated that in 1988 hospitals spent $3.9 billion on computer systems. They estimated that by the year 2000 this would increase to $14.0 billion. If expenditures did increase to $14.0 billion, what is the geometric mean annual rate of increase for the period?