Introduction to Statistics for Built Environment Course Code: AED 1222 Compiled by DEPARTMENT OF ARCHITECTURE AND ENVIRONMENTAL DESIGN (AED) CENTRE FOR FOUNDATION STUDIES (CFS) INTERNATIONAL ISLAMIC UNIVERSITY MALAYSIA
Page 1: Lesson 6 measures of central tendency


Page 2: Lesson 6 measures of central tendency

Lecture 7: Measures of central tendency

Today’s lecture: measures of central tendency for grouped and ungrouped data:
● The arithmetic mean / trimmed mean
● The median
● The mode
● Summary of comparative characteristics

Page 3: Lesson 6 measures of central tendency

What are measures of central tendency?

● A measure of central tendency is usually called the average; its purpose is to summarize in a single value the typical size, middle property, or central location of a set of values.

Measures of Central Tendency

● A measure of central tendency is a single value situated at the centre of a data set that can be taken as a summary value for that data set.

● The three most common measures of central tendency are the mean, the median and the mode.

Page 4: Lesson 6 measures of central tendency

Center and Location

An overview of common measures of central tendency and location:

● Mean: $\mu = \frac{\sum_{i=1}^{N} x_i}{N}$ (population), $\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}$ (sample)
● Median
● Mode

Business Statistics: A Decision-Making Approach, 7e © 2008 Prentice-Hall, Inc.

Page 5: Lesson 6 measures of central tendency

The arithmetic mean

What is the mean?

● When people use the word average, they are usually referring to the arithmetic mean.
● The arithmetic mean is the most commonly used measure of central tendency.

● The mean is the sum of all scores divided by the number of scores.
● It is often regarded as the best single number to describe a group of scores.

● It is denoted μ (mu) for a population mean and x̄ (x bar) for a sample mean.

Page 6: Lesson 6 measures of central tendency


The arithmetic mean cont.

• The mean is the arithmetic average of data values
• Mean = sum of values divided by the number of values
  – Population mean (μ, "mu"): $\mu = \frac{\sum_{i=1}^{N} x_i}{N}$, where N = population size
  – Sample mean (x̄, "x bar"): $\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}$, where n = sample size

Page 7: Lesson 6 measures of central tendency


The arithmetic mean cont.

• The mean is affected by extreme values (outliers)

Example 1 (no outliers): data 1, 2, 3, 4, 5

$\bar{x} = \frac{1 + 2 + 3 + 4 + 5}{5} = \frac{15}{5} = 3$

Example 2 (with outliers): data 1, 2, 3, 4, 10

$\bar{x} = \frac{1 + 2 + 3 + 4 + 10}{5} = \frac{20}{5} = 4$
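A minimal Python sketch of the two examples above. It assumes the plotted values are 1, 2, 3, 4, 5 and 1, 2, 3, 4, 10, as read from the worked sums; `arithmetic_mean` is a hypothetical helper name, not from the course material.

```python
def arithmetic_mean(values):
    """Return the sum of the values divided by the number of values."""
    if len(values) == 0:
        raise ValueError("the mean of an empty data set is undefined")
    return sum(values) / len(values)


no_outlier = [1, 2, 3, 4, 5]
with_outlier = [1, 2, 3, 4, 10]  # the largest value replaced by an extreme value

print(arithmetic_mean(no_outlier))    # 3.0
print(arithmetic_mean(with_outlier))  # 4.0
```

Changing a single value from 5 to 10 moves the mean from 3 to 4, which is the sensitivity to outliers the slide illustrates.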

Page 8: Lesson 6 measures of central tendency


The arithmetic mean cont.

• Computing the mean for ungrouped data

Example (cont.): Insulation manufacturer, 20-day high-temperature record. Data array (raw data sorted from low to high):

12, 13, 17, 21, 24, 24, 26, 27, 27, 30, 32, 35, 37, 38, 41, 43, 44, 46, 53, 58

Formula: $\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}$

$\bar{x} = \frac{58 + 53 + 46 + 44 + 43 + 41 + 38 + 37 + 35 + 32 + 30 + 27 + 27 + 26 + 24 + 24 + 21 + 17 + 13 + 12}{20} = \frac{648}{20} = 32.4$
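The same calculation as a short Python sketch, using the 20 temperatures from the data array above (the variable names are illustrative):

```python
# Insulation manufacturer: 20-day high-temperature record (data array from the slide)
temps = [12, 13, 17, 21, 24, 24, 26, 27, 27, 30,
         32, 35, 37, 38, 41, 43, 44, 46, 53, 58]

n = len(temps)          # sample size n
total = sum(temps)      # sum of all x_i
x_bar = total / n       # sample mean = sum / n

print(n, total, x_bar)  # 20 648 32.4
```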

Page 9: Lesson 6 measures of central tendency

The arithmetic mean cont.

• Computing the mean for grouped data

● The process is the same in principle as for ungrouped data.

● However, compressing data into a frequency table loses the actual values of the observations in each class, so it becomes necessary to make an assumption about these values. The assumption is that every observation in a class has a value equal to the class mid-point.

Page 10: Lesson 6 measures of central tendency

The arithmetic mean cont.

• Computing the mean for grouped data

Example:

No. of liters sold | No. of sales staff (f) | Class mid-point (m) | fm
80 and less than 90 | 2 | 85 | 170
90 and less than 100 | 6 | 95 | 570
100 and less than 110 | 10 | 105 | 1050
110 and less than 120 | 14 | 115 | 1610
120 and less than 130 | 9 | 125 | 1125
130 and less than 140 | 7 | 135 | 945
140 and less than 150 | 2 | 145 | 290
Total | Σf = 50 | | Σfm = 5760

Formula: $\bar{x} = \frac{\sum fm}{\sum f}$

$\bar{x} = \frac{5760}{50} = 115.2$ liters sold
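A sketch of the grouped-data calculation in Python. It encodes each class "a and less than b" as a hypothetical `(lower, upper, f)` tuple and applies the class mid-point assumption from the previous slide:

```python
# Each class "a and less than b" as (lower boundary, upper boundary, frequency f)
classes = [
    (80, 90, 2), (90, 100, 6), (100, 110, 10), (110, 120, 14),
    (120, 130, 9), (130, 140, 7), (140, 150, 2),
]


def grouped_mean(classes):
    """Estimate the mean, assuming every observation equals its class mid-point m."""
    sum_f = sum(f for _, _, f in classes)
    sum_fm = sum(f * (lower + upper) / 2 for lower, upper, f in classes)
    return sum_f, sum_fm, sum_fm / sum_f


sum_f, sum_fm, x_bar = grouped_mean(classes)
print(sum_f, sum_fm, x_bar)  # 50 5760.0 115.2
```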

Page 11: Lesson 6 measures of central tendency

The arithmetic mean cont.

● The mean is a good measure for roughly symmetric distributions.

● It can be misleading in skewed distributions because it can be greatly influenced by extreme values (outliers); it is therefore not the most appropriate measure of central tendency for very skewed distributions.

● This problem can be overcome by using a slightly modified measure of central tendency: the trimmed mean.

Page 12: Lesson 6 measures of central tendency

The trimmed mean

●The trimmed mean is calculated by “trimming” or dropping the smallest and largest numbers from the data set and calculating the mean of the remaining numbers.

There is no fixed rule for the number of values to trim; it depends on the data available.

For example, a 5% trimmed mean would be calculated by dropping the smallest 5% and the largest 5% of the data set and computing the mean for the remaining 90% of the original data.

●The trimmed mean is a compromise between the arithmetic (ordinary) mean and the median. Why?
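A minimal sketch of a trimmed mean in Python. Since the slide gives no rule for how many values to trim, this version assumes one common convention: cut floor(proportion × n) values from each end (SciPy's `scipy.stats.trim_mean` also rounds the cut down). The function name and the 20% example are illustrative.

```python
import math


def trimmed_mean(values, proportion):
    """Drop the smallest and largest `proportion` of the values and average the rest.

    Assumption: floor(proportion * n) values are removed from EACH end.
    """
    if not 0 <= proportion < 0.5:
        raise ValueError("proportion must be in the range [0, 0.5)")
    data = sorted(values)
    k = math.floor(proportion * len(data))  # number of values cut from each end
    kept = data[k:len(data) - k]
    if not kept:
        raise ValueError("no values left after trimming")
    return sum(kept) / len(kept)


temps = [12, 13, 17, 21, 24, 24, 26, 27, 27, 30,
         32, 35, 37, 38, 41, 43, 44, 46, 53, 58]

print(trimmed_mean(temps, 0.0))             # 32.4 (no trimming: the ordinary mean)
print(round(trimmed_mean(temps, 0.05), 2))  # 32.11 (drops 12 and 58)
print(trimmed_mean([1, 2, 3, 4, 10], 0.2))  # 3.0 (drops 1 and 10)
```

In the last line the trimmed mean (3.0) equals the median of 1, 2, 3, 4, 10, while the ordinary mean is 4.0; trimming more pushes the result towards the median, which is one way to see why the trimmed mean sits between the two.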

Page 13: Lesson 6 measures of central tendency

The median

What is the median?

● The median is the measure of central tendency that occupies the middle position in an ordered array of values. Half (50%) of the data items fall below the median and the other half (50%) lie above it.

● The median position (not the median value) can be found using the formula $i = \frac{n+1}{2}$ or $i = \frac{1}{2}n$, where n is the number of observations in the data set.

Page 14: Lesson 6 measures of central tendency


The median cont.

• In an ordered array, the median is the “middle” number, i.e., the number that splits the distribution in half
• The median is not affected by extreme values (outliers)

Example 1 (no outliers): median = 3
Example 2 (with outliers): median = 3

Page 15: Lesson 6 measures of central tendency


The median cont.

• To find the median, sort the n data values from low to high (sorted data is called a data array)
• Find the value in the $i = \frac{1}{2}n$ position
• The ith position is called the median index point.
  – If i is not an integer, round up to the next highest integer.
  – If i is an integer, the median is the average of the values in positions i and i + 1.

For example:

Page 16: Lesson 6 measures of central tendency


The median cont.

• Computing the median for ungrouped data

Data array: 4, 4, 5, 5, 9, 11, 12, 14, 16, 19, 22, 23, 24

• Note that n = 13
• Find the $i = \frac{1}{2}n$ position: $i = \frac{1}{2}(13) = 6.5$
• Since 6.5 is not an integer, round up to 7
• The median is the value in the 7th position: Md = 12
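A Python sketch of the median index point rule. The even-n branch (average of positions i and i + 1) follows the usual convention for this rule; `median_index_point` is a hypothetical name.

```python
import math


def median_index_point(values):
    """Median using the median index point i = n/2 (1-based positions).

    If i is not an integer, round up and take that position.
    If i is an integer, average the values in positions i and i + 1.
    """
    data = sorted(values)
    n = len(data)
    if n == 0:
        raise ValueError("the median of an empty data set is undefined")
    if n % 2 == 1:                      # i = n/2 is not an integer
        i = math.ceil(n / 2)            # round up, e.g. 6.5 -> 7
        return data[i - 1]              # convert the 1-based position to a 0-based index
    i = n // 2                          # i = n/2 is an integer
    return (data[i - 1] + data[i]) / 2  # average of positions i and i + 1


data_array = [4, 4, 5, 5, 9, 11, 12, 14, 16, 19, 22, 23, 24]
print(median_index_point(data_array))  # 12
```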

Page 17: Lesson 6 measures of central tendency

The median cont.

● If the formula $i = \frac{n+1}{2}$ gives a non-integer value, take the average of the values in the two nearest positions. For example, with n = 18 the median position is $i = \frac{18+1}{2} = 9.5$, so the median of the data set is the average of the 9th and 10th values.

Another example:

Page 18: Lesson 6 measures of central tendency


The median cont.

• Computing the median for ungrouped data

Example (cont.): Insulation manufacturer, 20-day high-temperature record. Data array (raw data sorted from low to high):

12, 13, 17, 21, 24, 24, 26, 27, 27, 30, 32, 35, 37, 38, 41, 43, 44, 46, 53, 58

Formula: $i = \frac{n+1}{2} = \frac{20+1}{2} = \frac{21}{2} = 10.5$

Since i = 10.5 is not an integer, find the average of the 10th and 11th values (30 and 32):

$Md = \frac{30 + 32}{2} = \frac{62}{2} = 31$

Median = 31
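A Python sketch of the (n + 1)/2 rule, checked against the standard library's `statistics.median`; `median_by_position` is a hypothetical name.

```python
import math
import statistics


def median_by_position(values):
    """Median using the position i = (n + 1) / 2 (1-based positions).

    When i is not an integer, average the values in the two nearest positions.
    """
    data = sorted(values)
    n = len(data)
    if n == 0:
        raise ValueError("the median of an empty data set is undefined")
    i = (n + 1) / 2
    below = data[math.floor(i) - 1]  # value at the position just below i (or at i itself)
    above = data[math.ceil(i) - 1]   # value at the position just above i (or at i itself)
    return (below + above) / 2


temps = [12, 13, 17, 21, 24, 24, 26, 27, 27, 30,
         32, 35, 37, 38, 41, 43, 44, 46, 53, 58]

print((len(temps) + 1) / 2)       # 10.5
print(median_by_position(temps))  # 31.0
print(statistics.median(temps))   # 31.0
```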

Page 19: Lesson 6 measures of central tendency

The median cont.

• Computing the median for grouped data

● Since the actual values of a data set are lost when a frequency distribution is constructed, it is only possible to approximate the median for grouped data.

● The median for grouped data can be estimated using the following formula:

Page 20: Lesson 6 measures of central tendency

Computing the median for grouped data cont.

Formula:

$Med = B_l + \left(\frac{\frac{n}{2} - cf_p}{f_m}\right) i$

Where:
$B_l$ = lower boundary of the class containing the median
n = sample size
$cf_p$ = cumulative frequency of the classes preceding the class containing the median
$f_m$ = number of observations in the class containing the median
i = width of the interval containing the median

Page 21: Lesson 6 measures of central tendency

Computing the median for grouped data cont.

No. of liters sold | No. of sales staff (f) | Cumulative frequency (cf)
80 and less than 90 | 2 | 2
90 and less than 100 | 6 | 8
100 and less than 110 | 10 | 18
110 and less than 120 | 14 | 32
120 and less than 130 | 9 | 41
130 and less than 140 | 7 | 48
140 and less than 150 | 2 | 50

Compute the median for the above data set.

Page 22: Lesson 6 measures of central tendency

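Answer:

Since n/2 = 50/2 = 25 falls between the cumulative frequencies 18 and 32, the median class is 110 and less than 120, so $B_l = 110$, $cf_p = 18$, $f_m = 14$ and i = 10.

$Med = B_l + \left(\frac{\frac{n}{2} - cf_p}{f_m}\right) i$

$= 110 + \left(\frac{\frac{50}{2} - 18}{14}\right)(10)$

$= 110 + \left(\frac{7}{14}\right)(10)$

$= 110 + (0.5)(10)$

$= 110 + 5$

$= 115$ liters sold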
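A sketch of the grouped-median formula in Python. It assumes the classes are listed in ascending order as hypothetical `(lower boundary, upper boundary, f)` tuples, and locates the median class as the first class whose cumulative frequency reaches n/2:

```python
def grouped_median(classes):
    """Estimate the median of grouped data: Med = B_l + ((n/2 - cf_p) / f_m) * i."""
    n = sum(f for _, _, f in classes)
    if n == 0:
        raise ValueError("empty frequency table")
    half = n / 2
    cf_p = 0  # cumulative frequency of the classes preceding the current class
    for lower, upper, f_m in classes:
        if cf_p + f_m >= half:  # this class contains the median
            width = upper - lower
            return lower + (half - cf_p) / f_m * width
        cf_p += f_m
    raise ValueError("median class not found")


classes = [
    (80, 90, 2), (90, 100, 6), (100, 110, 10), (110, 120, 14),
    (120, 130, 9), (130, 140, 7), (140, 150, 2),
]

print(grouped_median(classes))  # 115.0
```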

Page 23: Lesson 6 measures of central tendency

The mode

What is the mode?

● The mode is the most frequently occurring value in a data set. A distribution may have one mode, two modes (bimodal) or more modes (multimodal). It is also possible for a distribution to have no mode.

● The mode may be an important measure to a clothing manufacturer who must decide how many dresses of each size to make: what is the most commonly purchased size?

Page 24: Lesson 6 measures of central tendency


The mode cont.

• A measure of location
• The value that occurs most often
• Not affected by extreme values (outliers)
• Used for either numerical or categorical data
• There may be no mode
• There may be several modes

Example 1: mode = 5
Example 2: no mode

Page 25: Lesson 6 measures of central tendency


The mode cont.

• Estimating the mode for ungrouped data

Example (cont.): Insulation manufacturer, 20-day high-temperature record. Data array:

12, 13, 17, 21, 24, 24, 26, 27, 27, 30, 32, 35, 37, 38, 41, 43, 44, 46, 53, 58

The values 24 and 27 each occur twice and every other value occurs once, so the data set is bimodal: Mode = 24 and Mode = 27.
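A minimal Python sketch that returns all modes. It assumes the convention that a data set in which every value occurs equally often has no mode; Python's own `statistics.multimode` differs here and returns every value in that case. `modes` is a hypothetical helper name.

```python
from collections import Counter


def modes(values):
    """Return every value with the highest frequency, or [] if there is no mode.

    Assumed convention: if all distinct values occur equally often, there is no mode.
    """
    counts = Counter(values)
    if not counts:
        return []
    top = max(counts.values())
    if len(counts) > 1 and all(c == top for c in counts.values()):
        return []  # no value stands out from the others
    return sorted(v for v, c in counts.items() if c == top)


temps = [12, 13, 17, 21, 24, 24, 26, 27, 27, 30,
         32, 35, 37, 38, 41, 43, 44, 46, 53, 58]

print(modes(temps))                 # [24, 27]
print(modes([1, 2, 3, 4, 5]))       # []
print(modes(["M", "S", "M", "L"]))  # ['M'] (works for categorical data too)
```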

Page 26: Lesson 6 measures of central tendency

Estimating the mode for grouped data

●When actual data values are unknown, the class in a distribution with the largest frequency is often referred to as the modal class.

●The mode may then be defined to be the mid-point of that class.

●If two or more classes share the distinction of having the largest frequency, then there are two or more mid-point values representing two or more modes.

Page 27: Lesson 6 measures of central tendency

Computing the mode for grouped data cont.

Formula:

$Mode = L + \left(\frac{f_0 - f_1}{(f_0 - f_1) + (f_0 - f_2)}\right) c$

Where:
L = lower boundary of the class containing the mode
$f_0$ = frequency of the class containing the mode
$f_1$ = frequency of the class preceding the class containing the mode
$f_2$ = frequency of the class following the class containing the mode
c = size (width) of the class containing the mode

Page 28: Lesson 6 measures of central tendency

Estimating the mode for grouped data cont.

No. of liters sold | No. of sales staff (f) | Cumulative frequency (cf)
80 and less than 90 | 2 | 2
90 and less than 100 | 6 | 8
100 and less than 110 | 10 | 18
110 and less than 120 | 14 | 32
120 and less than 130 | 9 | 41
130 and less than 140 | 7 | 48
140 and less than 150 | 2 | 50

Estimate the mode for the above data.

Page 29: Lesson 6 measures of central tendency

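Answer:

The modal class is 110 and less than 120 (largest frequency, 14), so L = 110, $f_0 = 14$, $f_1 = 10$, $f_2 = 9$ and c = 10.

$Mode = L + \left(\frac{f_0 - f_1}{(f_0 - f_1) + (f_0 - f_2)}\right) c$

$= 110 + \left(\frac{14 - 10}{(14 - 10) + (14 - 9)}\right)(10)$

$= 110 + \left(\frac{4}{4 + 5}\right)(10)$

$= 110 + \left(\frac{4}{9}\right)(10)$

$= 110 + 4.44$

$\approx 114.4$ liters sold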
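A Python sketch of both grouped-data estimates: the modal-class mid-point and the interpolation formula for the mode. The `(lower, upper, f)` tuples are a hypothetical encoding, and treating a missing neighbouring class (when the modal class is the first or last class) as frequency 0 is an assumption not covered by the slides.

```python
def modal_class_index(classes):
    """Index of the class with the largest frequency (the first one if tied)."""
    return max(range(len(classes)), key=lambda j: classes[j][2])


def modal_midpoint(classes):
    """Simple estimate: the mid-point of the modal class."""
    lower, upper, _ = classes[modal_class_index(classes)]
    return (lower + upper) / 2


def grouped_mode(classes):
    """Mode = L + ((f0 - f1) / ((f0 - f1) + (f0 - f2))) * c"""
    k = modal_class_index(classes)
    lower, upper, f0 = classes[k]
    f1 = classes[k - 1][2] if k > 0 else 0                 # preceding class (assumed 0 if none)
    f2 = classes[k + 1][2] if k + 1 < len(classes) else 0  # following class (assumed 0 if none)
    denominator = (f0 - f1) + (f0 - f2)
    if denominator == 0:
        raise ValueError("formula undefined: neighbouring classes equal the modal frequency")
    return lower + (f0 - f1) / denominator * (upper - lower)


classes = [
    (80, 90, 2), (90, 100, 6), (100, 110, 10), (110, 120, 14),
    (120, 130, 9), (130, 140, 7), (140, 150, 2),
]

print(modal_midpoint(classes))          # 115.0
print(round(grouped_mode(classes), 2))  # 114.44
```

The formula places the estimate slightly below the mid-point (114.44 rather than 115) because the class before the modal class (f = 10) is more heavily populated than the class after it (f = 9).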

Page 30: Lesson 6 measures of central tendency


Review Example

• Five houses on a hill by the beach

House prices: RM 2,000,000; RM 500,000; RM 300,000; RM 100,000; RM 100,000

Page 31: Lesson 6 measures of central tendency


Review Example cont.

House prices (RM): 2,000,000; 500,000; 300,000; 100,000; 100,000 (sum = 3,000,000)

Summary statistics:
• Mean: RM 3,000,000 / 5 = RM 600,000
• Median: middle value of ranked data = RM 300,000
• Mode: most frequent value = RM 100,000
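The three summary statistics can be checked with Python's standard `statistics` module (a quick sketch; prices in RM):

```python
import statistics

prices = [2_000_000, 500_000, 300_000, 100_000, 100_000]  # house prices in RM

mean_price = statistics.mean(prices)      # sum 3,000,000 divided by 5
median_price = statistics.median(prices)  # middle value of the ranked data
mode_price = statistics.mode(prices)      # most frequent value

print(f"Mean:   RM {mean_price:,.0f}")    # Mean:   RM 600,000
print(f"Median: RM {median_price:,.0f}")  # Median: RM 300,000
print(f"Mode:   RM {mode_price:,.0f}")    # Mode:   RM 100,000
```

The single RM 2,000,000 house pulls the mean above four of the five prices, while the median and the mode are unaffected by it.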

Page 32: Lesson 6 measures of central tendency

Which measure to use?

● Not all measures are appropriate for all kinds of variables.
● Nominal data (e.g. gender, race): the mode is the only valid measure.
● Ordinal data (e.g. salary categories): the mode and the median can be used.

• When to use the arithmetic mean?
  – The best measure for continuous data.
• When to use the median?
  – When you know that a distribution is skewed.
  – When you have a small number of subjects.
• When to use the mode?
  – Only when describing discrete categorical data.

Page 33: Lesson 6 measures of central tendency

Which measure to use? cont.

Page 34: Lesson 6 measures of central tendency

Summary of comparative characteristics

The arithmetic mean:

1. It is the most familiar and most widely used measure.
2. It is affected by the value of every observation in the data set.
3. Its value may be distorted by a relatively small number of extreme values (outliers), so it can lose its representative quality in badly skewed data. The trimmed mean can help overcome this problem.
4. It cannot be computed from a frequency distribution with an open-ended class.

Page 35: Lesson 6 measures of central tendency

Summary of comparative characteristics

The median:

1. It is easy to define and easy to understand.
2. It depends on the number of observations and on the middle value(s), but not on the size of the other observations. Thus extremely high or low values (outliers) do not distort the median.
3. It is frequently used in badly skewed distributions.
4. It may be computed in an open-ended distribution, since the median value is located in the median class interval, which is highly unlikely to be an open-ended interval.

Page 36: Lesson 6 measures of central tendency

Summary of comparative characteristics

The mode:

1. It is generally a less widely used measure than the mean and median.
2. It may not exist in some sets of data, or there may be more than one mode in other data sets.
3. It is not affected by extreme values (outliers) in a distribution.

Page 37: Lesson 6 measures of central tendency

Next class…

The following topics will be discussed: measures of variability / dispersion (Part I):
● The range
● Quartiles and the interquartile range
● Percentiles
● The five-number summary