Measures of central tendency and dispersion Tunis, 28th October 2014 Dr Ghada Abou Mrad Ministry of Public Health, Lebanon [email protected]

medipiet.eu/wp-content/uploads/2017/01/D2_1_MEASURES...

May 12, 2018

Page 1:

Measures of central tendency and dispersion

Tunis, 28th October 2014

Dr Ghada Abou Mrad Ministry of Public Health, Lebanon

[email protected]

Page 2:

Learning objectives

• Define the different types of variables and data within a population or a sample

• Describe data using the common measures of central tendency (Mode, Median, arithmetic Mean)

• Describe data in terms of their measures of dispersion (range, standard deviation/variance, standard error)

Page 3:

Variable

• A population is any complete group of units (such as persons or businesses) with at least one characteristic in common. It needs to be clearly identified at the beginning of a study.

• A sample is a subset of the units in a population, selected to represent all units in the population of interest.

• A variable is any characteristic, number, or quantity that can be measured or counted. It is called a variable because its value may vary in the population and over time; it is represented by “X” in a population and “x” in a sample.

Page 4:

Data

• Data are the measurements, observations, or values that are collected for a specific variable in a population or a sample; an observation can be represented by “Xi” in a population and “xi” in a sample

– A data unit (or unit record, or record) is one entity (such as a person or business) in the population being studied, for which data are collected.

– A data item (or variable) is a characteristic (or attribute) of a data unit which is measured or counted, such as height.

#    age   sex   height
1    20    M     175
2    16    F     163
3    23    F     170

Each row is a data unit; each column is a data item.

Page 5:

Dataset

• A dataset is a complete collection of all observations for a specific variable in a population or a sample; it is called a raw dataset if the data have not been organized; the total number of observations in a dataset can be represented by “N” for a population and “n” for a sample

• Example: Ages of students in a class (years)

Age (raw, n = 20): 27, 30, 28, 31, 28, 36, 29, 37, 29, 34, 30, 30, 27, 30, 28, 31, 32, 30, 29, 29

Page 6:

Types of variables

Variable

– Qualitative: nominal, ordinal

– Quantitative: discrete, continuous

Page 7:

Types of variables

• Qualitative variable: has values that describe a 'quality'; it is also called a categorical variable

– Nominal: Observations can take a value that cannot be organized in a logical sequence, like sex or eye color

– Ordinal: Observations can take a value that can be logically ordered from lowest to highest, like clothing size (e.g. small, medium, large)

• The data collected for a qualitative variable are qualitative data

Page 8:

Types of variables

• Quantitative variable: has values that describe a measurable quantity; it is also called a numeric variable; it can be ordered from lowest to highest

– Discrete: Observations can take a value based on a count from a set of values. It cannot take the value of a fraction between one value and the next closest value. Ex: number of children in a family

– Continuous: Observations can take any value within a certain range of real numbers. Ex: height

• The data collected for a quantitative variable are quantitative data

Page 9:

Descriptive statistics

Statistics describe or summarize data

• Most data can be ordered from lowest to highest

• The frequency is the number of times an observation occurs for a variable; the frequency distribution can be shown in a table or in a graph such as a histogram

• Quantitative data can be described using the common measures of central tendency (Mode, Median, Mean) and the measures of dispersion (range, standard deviation/variance, standard error)

Page 10:

Frequency distribution

Obs 1 to 20, ages in ascending order: 27, 27, 28, 28, 28, 29, 29, 29, 29, 30, 30, 30, 30, 30, 31, 31, 32, 34, 36, 37

Age     Frequency
27      2
28      3
29      4
30      5
31      2
32      1
33      0
34      1
35      0
36      1
37      1
Total   20
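The frequency distribution can be reproduced from the raw ages with a short script. This is a minimal sketch using Python's standard library; the variable names are illustrative and not part of the slides.

```python
from collections import Counter

# Ages of the 20 students, in the raw order listed on the Dataset slide.
ages = [27, 30, 28, 31, 28, 36, 29, 37, 29, 34,
        30, 30, 27, 30, 28, 31, 32, 30, 29, 29]

# Sorting gives the ordered observations (Obs 1 to 20).
sorted_ages = sorted(ages)

# Counter tallies how many times each age occurs.
counts = Counter(ages)

# Include ages with zero frequency (33 and 35) so every value
# between the minimum and the maximum gets a row.
frequency_table = {age: counts.get(age, 0)
                   for age in range(min(ages), max(ages) + 1)}

for age, freq in frequency_table.items():
    # The run of '#' characters is a rough text histogram.
    print(f"{age:>5} {freq:>3} {'#' * freq}")
print(f"Total {sum(frequency_table.values()):>3}")
```

Filling in the zero-frequency ages is a design choice: it keeps the gaps at 33 and 35 visible, as they are in the table.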

Page 11:

Histogram

Obs 1 to 20, ages in ascending order: 27, 27, 28, 28, 28, 29, 29, 29, 29, 30, 30, 30, 30, 30, 31, 31, 32, 34, 36, 37

[Figure: histogram of the 20 ages; x-axis ticks 27 to 37, y-axis ticks 1 to 7]

Page 12:

Histogram - Outliers

[Figure: histogram; x-axis: nights of stay, 0 to 50; y-axis: number of patients, 0 to 6]

Outliers are extreme, or atypical data value(s) that are notably different from the rest of the data.

Page 13:

[Figure: histogram labeled "Epidemic curve"; x-axis: age, in 10-year groups from 0-9 to 90-99; y-axis: number of people, 0 to 20; annotations: "Central Location ?" and "Spread ?"]

Page 14:

Measures of central tendency and spread

Central Location / Position / Tendency

A single value that is a good summary of an entire distribution of data

Spread / Dispersion / Variability

How much the distribution is spread or dispersed from its central location

Page 15:

Measure of Central Tendency

Also known as measure of central position or location

It is a single value that summarizes an entire distribution of data

Common measures

– Mode

– Median

– Arithmetic mean

Page 16:

Mode

Mode is the value that occurs most frequently

Method for identification

1. Arrange data into a frequency distribution or histogram, showing the values of the variable and the frequency with which each value occurs

2. Identify the value that occurs most often

Page 17:

Mode

Obs 1 to 20, ages in ascending order: 27, 27, 28, 28, 28, 29, 29, 29, 29, 30, 30, 30, 30, 30, 31, 31, 32, 34, 36, 37

Age (frequency): 27 (2), 28 (3), 29 (4), 30 (5), 31 (2), 32 (1), 33 (0), 34 (1), 35 (0), 36 (1), 37 (1); total 20

Page 18:

Mode

Obs 1 to 20, ages in ascending order: 27, 27, 28, 28, 28, 29, 29, 29, 29, 30, 30, 30, 30, 30, 31, 31, 32, 34, 36, 37

[Figure: histogram of the 20 ages; x-axis ticks 27 to 37, y-axis ticks 1 to 7]

Mode = 30
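As a quick check of the mode, the sketch below uses `statistics.multimode` (Python 3.8+), which returns every value tied for the highest frequency. The second list, `two_peaks`, is a made-up example to show what a dataset with two modes returns; it does not come from the slides.

```python
from statistics import multimode

# Ages of the 20 students from the slides.
ages = [27, 27, 28, 28, 28, 29, 29, 29, 29, 30,
        30, 30, 30, 30, 31, 31, 32, 34, 36, 37]

# multimode returns a list, so several modes can be reported.
age_modes = multimode(ages)
print(age_modes)  # [30]

# Hypothetical data with two equally frequent values.
two_peaks = [1, 1, 2, 3, 3]
print(multimode(two_peaks))  # [1, 3]
```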

Page 19:

[Figure: two charts titled "Unimodal Distribution" and "Bimodal Distribution"; y-axis: population]

Page 20:

Mode – Properties / Uses

• Easiest measure to understand, explain, identify

• Always equals an original value

• Does not use all the data

• Insensitive to extreme values (outliers)

• May be more than one mode

• May be no mode

Page 21:

Median

Median is the middle value; it splits the distribution into two equal parts

– 50% of observations are below the median

– 50% of observations are above the median

Method for identification

1. Arrange observations in order

2. Find middle position as (n + 1) / 2

3. Identify the value in the middle

Page 22:

Median: uneven number of values

Obs 1 to 19, ages in ascending order: 27, 27, 28, 28, 28, 29, 29, 29, 29, 30, 30, 30, 30, 30, 31, 31, 32, 34, 36

n = 19

Median observation = $\frac{n+1}{2} = \frac{19+1}{2} = \frac{20}{2} = 10$

Median age = 30 years

Page 23:

Median: even number of values

Obs 1 to 20, ages in ascending order: 27, 27, 28, 28, 28, 29, 29, 29, 29, 30, 30, 30, 30, 30, 31, 31, 32, 34, 36, 37

n = 20

Median observation = $\frac{n+1}{2} = \frac{20+1}{2} = \frac{21}{2} = 10.5$

Median age = average value between 10th and 11th observation = $\frac{30+30}{2} = 30$ years
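The (n + 1) / 2 rule for odd and even n can be sketched as follows. The helper name `median_by_position` is illustrative, and the function assumes a non-empty list of numbers.

```python
from statistics import median

def median_by_position(values):
    """Median via the (n + 1) / 2 middle-position rule (non-empty input)."""
    ordered = sorted(values)            # step 1: arrange in order
    n = len(ordered)
    position = (n + 1) / 2              # step 2: 1-based middle position
    lower = ordered[int(position) - 1]  # value at the floor of the position
    if position.is_integer():
        return lower                    # odd n: a single middle value
    upper = ordered[int(position)]      # even n: the next value up
    return (lower + upper) / 2          # average of the two middle values

ages_19 = [27, 27, 28, 28, 28, 29, 29, 29, 29, 30,
           30, 30, 30, 30, 31, 31, 32, 34, 36]
ages_20 = ages_19 + [37]

print(median_by_position(ages_19))  # 30
print(median_by_position(ages_20))  # 30.0
```

In practice `statistics.median` gives the same results; the hand-written version only makes the position rule explicit.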

Page 24:

Median – Properties / Uses

• Does not use all the data available

• Insensitive to extreme values (outliers)

• Measure of choice for skewed data

Page 25:

Arithmetic Mean

Arithmetic mean = “average” value = μ

Method for identification

1. Sum up (Σ) all of the values (xi)

2. Divide the sum by the number of observations (n)

Page 26:

Arithmetic Mean

Obs 1 to 20, ages in ascending order: 27, 27, 28, 28, 28, 29, 29, 29, 29, 30, 30, 30, 30, 30, 31, 31, 32, 34, 36, 37

n = 20, Σxi = 605

$\mu = \frac{\sum x_i}{n} = \frac{605}{20} = 30.25$
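The two steps (sum, then divide by n) amount to a few lines of Python. This is a minimal sketch; the variable names are illustrative.

```python
# Ages of the 20 students from the slides.
ages = [27, 27, 28, 28, 28, 29, 29, 29, 29, 30,
        30, 30, 30, 30, 31, 31, 32, 34, 36, 37]

n = len(ages)           # number of observations
total = sum(ages)       # step 1: sum up all of the values
mean_age = total / n    # step 2: divide the sum by n

print(n, total, mean_age)  # 20 605 30.25
```

Note that 30.25 is not one of the observed ages, which illustrates that the mean is not usually equal to an original value.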

Page 27:

Since the mean uses all data, it is sensitive to outliers

[Figure: two histograms of nights of stay; y-axis: number of patients, 0 to 6. With the x-axis running from 0 to 50, Mean = 12.0; with the x-axis running from 0 to 150, Mean = 15.3]
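The two means can be reproduced from the length-of-stay data listed on the range slide. The sketch assumes the second histogram is the same dataset with the largest value, 49, replaced by 149 (consistent with "Range = 149 - 0" later in the slides); that substitution is an inference, not something the slides state explicitly.

```python
from statistics import median

# Length of stay (nights) for 30 patients, from the slides.
nights = [0, 2, 3, 4, 5, 5, 6, 7, 8, 9,
          9, 9, 10, 10, 10, 10, 10, 11, 12, 12,
          12, 13, 14, 16, 18, 18, 19, 22, 27, 49]

# Assumption: the outlier version replaces the last value (49) with 149.
nights_with_outlier = nights[:-1] + [149]

mean_original = sum(nights) / len(nights)
mean_with_outlier = sum(nights_with_outlier) / len(nights_with_outlier)

print(round(mean_original, 1), round(mean_with_outlier, 1))  # 12.0 15.3

# The median is unchanged by the single extreme value.
print(median(nights), median(nights_with_outlier))
```

One extreme observation moves the mean by more than three nights, while the median stays where it was.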

Page 28:

When to use the arithmetic mean?

– Centered distribution

– Approximately symmetrical

– Few extreme values (outliers)

Page 29:

When to use the arithmetic mean? (ii)

[Figure: four distributions numbered 1 to 4, one of them marked "OK!"]

Page 30:

Arithmetic Mean – Properties / Uses

• Uses all of the data

• Affected by extreme values (outliers)

• Best for normally distributed data

• Not usually equal to one of the original values

Page 31:

How does the shape of a distribution influence the Measures of Central Tendency?

Symmetrical: Mode = Median = Mean

Skewed right: Mode < Median < Mean

Skewed left: Mean < Median < Mode

Page 32:

[Figure: histogram labeled "Epidemic curve"; x-axis: age, in 10-year groups from 0-9 to 90-99; y-axis: number of people, 0 to 20; annotations: "Central Location ?" and "Spread ?"]

Page 33:

Same center but … different dispersions

Page 34:

Measures of Spread

Measures that quantify the variation or dispersion of a set of data from its central location

• Also known as “Measure of dispersion / variation”

• Common measures

– Range

– Variance / standard deviation

– Standard error

Page 35:

Range

Range = Difference between largest and smallest values in a dataset

Properties / Uses:

– Greatly affected by outliers

– Usually used with median

Page 36:

Finding the Range of Length of Stay Data

0, 2, 3, 4, 5, 5, 6, 7, 8, 9,

9, 9, 10, 10, 10, 10, 10, 11, 12, 12,

12, 13, 14, 16, 18, 18, 19, 22, 27, 49

[Figure: histogram; x-axis: nights of stay, 0 to 50; y-axis ticks 0 to 6]

Page 37:

Range – Sensitive to Outliers?

[Figure: two histograms of nights of stay; y-axis: number of patients, 0 to 6. With the x-axis running from 0 to 50, Range = 49 - 0 = 49; with the x-axis running from 0 to 150, Range = 149 - 0 = 149]
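Both ranges follow directly from the largest and smallest values. The sketch below again assumes that the outlier version of the data replaces 49 with 149; the helper name `value_range` is illustrative.

```python
def value_range(values):
    """Difference between the largest and smallest values."""
    return max(values) - min(values)

# Length of stay (nights) for 30 patients, from the slides.
nights = [0, 2, 3, 4, 5, 5, 6, 7, 8, 9,
          9, 9, 10, 10, 10, 10, 10, 11, 12, 12,
          12, 13, 14, 16, 18, 18, 19, 22, 27, 49]

# Assumption: the outlier version replaces the last value (49) with 149.
nights_with_outlier = nights[:-1] + [149]

print(value_range(nights))               # 49
print(value_range(nights_with_outlier))  # 149
```

Because the range depends only on the two extreme observations, a single outlier triples it here.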

Page 38:

Variance and Standard Deviation

Measures of variation that quantify how closely clustered the observed values are to the mean; measures of the spread of the data around the mean

Variance = average of squared deviations from mean = Sum (each value – mean)² / (n – 1)

Standard deviation = square root of variance

Page 39:

Variance and Standard Deviation (ii)

$s^2 = \frac{\sum (x_i - \bar{x})^2}{n - 1}$

$s = \sqrt{\frac{\sum (x_i - \bar{x})^2}{n - 1}}$

$\bar{x}$: mean; $x_i$: value; $n$: number; $s^2$: variance; $s$: standard deviation

Page 40:

Steps to Calculate Variance and Standard Deviation

1. Calculate the arithmetic mean: $\bar{x}$

2. Subtract the mean from each observation: $x_i - \bar{x}$

3. Square the difference: $(x_i - \bar{x})^2$

4. Sum the squared differences: $\sum (x_i - \bar{x})^2$

5. Divide the sum of the squared differences by n – 1: $s^2 = \frac{\sum (x_i - \bar{x})^2}{n - 1}$

6. Take the square root of the variance: $s = \sqrt{s^2}$

$\bar{x}$: mean; $x_i$: value; $n$: number; $s^2$: variance; $s$: standard deviation

Page 41:

Length of Stay Data

(0 – 12)² = 144     (9 – 12)² = 9      (12 – 12)² = 0
(2 – 12)² = 100     (9 – 12)² = 9      (13 – 12)² = 1
(3 – 12)² = 81      (10 – 12)² = 4     (14 – 12)² = 4
(4 – 12)² = 64      (10 – 12)² = 4     (16 – 12)² = 16
(5 – 12)² = 49      (10 – 12)² = 4     (18 – 12)² = 36
(5 – 12)² = 49      (10 – 12)² = 4     (18 – 12)² = 36
(6 – 12)² = 36      (10 – 12)² = 4     (19 – 12)² = 49
(7 – 12)² = 25      (11 – 12)² = 1     (22 – 12)² = 100
(8 – 12)² = 16      (12 – 12)² = 0     (27 – 12)² = 225
(9 – 12)² = 9       (12 – 12)² = 0     (49 – 12)² = 1369

Sum = 2448; Var = 2448 / 29 = 84.4; SD = √84.4 = 9.2
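The six calculation steps can be traced in code on the same length-of-stay data. This is a minimal sketch with illustrative variable names; `statistics.variance` and `statistics.stdev` use the same n – 1 denominator and serve as a cross-check.

```python
import math
import statistics

# Length of stay (nights) for 30 patients, from the slides.
nights = [0, 2, 3, 4, 5, 5, 6, 7, 8, 9,
          9, 9, 10, 10, 10, 10, 10, 11, 12, 12,
          12, 13, 14, 16, 18, 18, 19, 22, 27, 49]

n = len(nights)
mean = sum(nights) / n                          # 1. arithmetic mean
deviations = [x - mean for x in nights]         # 2. subtract the mean
squared = [d ** 2 for d in deviations]          # 3. square each difference
sum_of_squares = sum(squared)                   # 4. sum the squared differences
variance = sum_of_squares / (n - 1)             # 5. divide by n - 1
sd = math.sqrt(variance)                        # 6. square root of the variance

print(sum_of_squares, round(variance, 1), round(sd, 1))  # 2448.0 84.4 9.2

# Cross-check against the standard library's sample statistics.
print(statistics.variance(nights), statistics.stdev(nights))
```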

Page 42:

Standard Deviation

Standard deviation is usually calculated only when data are more or less normally distributed (bell-shaped curve)

For normally distributed data,

• 68.3% of the data fall within plus/minus 1 SD

• 95.4% of the data fall within plus/minus 2 SD

• 95.0% of the data fall within plus/minus 1.96 SD

• 99.7% of the data fall within plus/minus 3 SD

The standard deviation of a normal distribution enables the calculation of confidence intervals
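The coverage figures for a normal distribution can be checked numerically. The sketch below uses `statistics.NormalDist` (Python 3.8+); the helper name `coverage` is illustrative.

```python
from statistics import NormalDist

# Standard normal distribution: mean 0, standard deviation 1.
standard_normal = NormalDist(mu=0, sigma=1)

def coverage(k):
    """Proportion of a normal distribution within plus/minus k SD of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)

for k in (1, 1.96, 2, 3):
    print(f"within plus/minus {k} SD: {coverage(k):.2%}")
```

These proportions hold only for normally distributed data; for skewed data such as the length-of-stay example they are at best a rough guide.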

Page 43: Measures of central tendency and dispersionmedipiet.eu/wp-content/uploads/2017/01/D2_1_MEASURES...Measures of central tendency and dispersion Tunis, 28th October 2014 Dr Ghada Abou

Standard deviation

2.5% 2.5%

68%

Mean

95%

Normal Distribution

Page 44:

Properties of Measures of Central Location and Spread

• For quantitative / continuous variables

• Mode – simple, descriptive, not always useful

• Median – best for skewed data

• Arithmetic mean – best for normally distributed data

• Range – use with median

• Standard deviation – use with mean

• Standard error – used to construct confidence intervals
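The slides list the standard error without giving a formula. A common definition, assumed here, is the standard error of the mean, SE = s / √n, and an approximate 95% confidence interval is then mean ± 1.96 × SE. The sketch below applies this to the length-of-stay data purely as an illustration; because those data are skewed, the normal approximation behind the interval is rough.

```python
import math
import statistics

# Length of stay (nights) for 30 patients, from the slides.
nights = [0, 2, 3, 4, 5, 5, 6, 7, 8, 9,
          9, 9, 10, 10, 10, 10, 10, 11, 12, 12,
          12, 13, 14, 16, 18, 18, 19, 22, 27, 49]

n = len(nights)
mean = statistics.fmean(nights)       # arithmetic mean as a float
sd = statistics.stdev(nights)         # sample SD (n - 1 denominator)

# Assumed formula: standard error of the mean.
standard_error = sd / math.sqrt(n)

# Approximate 95% confidence interval under a normal approximation.
ci_low = mean - 1.96 * standard_error
ci_high = mean + 1.96 * standard_error

print(f"SE = {standard_error:.2f}, 95% CI = ({ci_low:.1f}, {ci_high:.1f})")
```

Unlike the standard deviation, the standard error shrinks as n grows, since it describes the precision of the estimated mean and not the spread of individual observations.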

Page 45:

Name the appropriate measures of central Location and Spread

Distribution                    Central Location    Spread
Single peak, symmetrical
Skewed or data with outliers

Page 46:

Name the appropriate measures of central Location and Spread

Distribution                    Central Location    Spread
Single peak, symmetrical        Mean*               Standard deviation
Skewed or data with outliers    Median              Range or interquartile range

* Median and mode will be similar
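For skewed data, the median and interquartile range can be computed as sketched below, using the length-of-stay data from the earlier slides. `statistics.quantiles` (Python 3.8+) is used with its default "exclusive" method; other tools use different quartile conventions and may give slightly different cut points, so the exact values are an assumption of this method and not taken from the slides.

```python
from statistics import median, quantiles

# Length of stay (nights) for 30 patients, from the slides.
nights = [0, 2, 3, 4, 5, 5, 6, 7, 8, 9,
          9, 9, 10, 10, 10, 10, 10, 11, 12, 12,
          12, 13, 14, 16, 18, 18, 19, 22, 27, 49]

# n=4 splits the data into quarters and returns the three cut points.
q1, q2, q3 = quantiles(nights, n=4)

# Interquartile range: spread of the middle 50% of observations.
iqr = q3 - q1

print(f"median = {median(nights)}, Q1 = {q1}, Q3 = {q3}, IQR = {iqr}")
```

Because the quartiles ignore the extreme tails, the interquartile range is far less affected by the 49-night stay than the range is.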

Page 47:

[Figure: distribution of population by age, annotated with minimum, maximum, range, 1st quartile, 3rd quartile, interquartile interval, mode and median]

Any questions?

Page 48:

Thank you!

Dr Ghada Abou Mrad Ministry of Public Health, Lebanon

[email protected]