Describing Distributions Numerically
Chapter 5

Jan 26, 2016

tamma

Chapter 5. Describing Distributions Numerically. Definitions. Mean: the simple average you learned how to do in 6th grade. Median: using a list arranged in order from low to high, the middle number. Mode: the value that occurs most often. (PowerPoint PPT Presentation)
Transcript
Page 1: Chapter     5

Describing Distributions Numerically

Page 2: Chapter     5

Definitions

Mean: the simple average you learned how to do in 6th grade

Median: using a list arranged in order from low to high, the middle number

Mode: the value that occurs most often

Midrange: the average of the minimum and maximum values

Very sensitive to skewed distributions and outliers.

Page 3: Chapter     5

Mode

The most frequent value in the data set

Ex: Here is a list with the number of years each employee has worked at a company. What is the mode?

1 2 2 4 4 4 5 5 5 5 5 5 5 7 10 11
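The example above can be checked with a short Python sketch. It uses only the standard library; the variable names are illustrative and not part of the slides.

```python
from collections import Counter
from statistics import multimode

# Years worked by each employee, copied from the slide.
years = [1, 2, 2, 4, 4, 4, 5, 5, 5, 5, 5, 5, 5, 7, 10, 11]

# Counter tallies how many times each value occurs.
counts = Counter(years)

# multimode returns every value tied for the highest count,
# so it also works when a data set has more than one mode.
modes = multimode(years)

print(counts.most_common(1))  # most frequent value with its count
print(modes)
```

Here the value 5 occurs seven times, more than any other value, so it is the mode.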

Page 4: Chapter     5

Median: The Middle of Everything

The median is the value with exactly half the data values below it and half above it.

It is the middle data value when the number of values is odd.

It is the average of the two numbers in the middle if there is an even number of values.
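A minimal sketch of the rule above, assuming a non-empty list of numbers. The function name and the two small data sets are made up for illustration.

```python
def median(values):
    """Return the middle value of the sorted data.

    With an even count, return the average of the two middle values.
    Assumes the list is not empty.
    """
    ordered = sorted(values)  # the data must be in order first
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


odd_example = median([7, 1, 3])       # sorted: 1 3 7
even_example = median([7, 1, 3, 5])   # sorted: 1 3 5 7
print(odd_example, even_example)
```

With three values the middle one (3) is the median; with four values the median is the average of 3 and 5.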

Page 5: Chapter     5

Mean

To find the mean, add up all the values and divide by the total number of cases.

Ex: The numbers of boxes sold each month from January to June are: 249, 337, 163, 289, 298, and 104. How many boxes were sold on average?

$\bar{y} = \frac{\text{Total}}{n} = \frac{\sum y}{n}$
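The boxes example can be worked as total divided by number of cases. The sketch below does exactly that; the variable names are illustrative.

```python
# Boxes sold in each of the six months, copied from the slide.
boxes = [249, 337, 163, 289, 298, 104]

total = sum(boxes)   # add up all the values
n = len(boxes)       # total number of cases
mean = total / n     # the mean is the total divided by n

print(total, n, mean)
```

The total is 1440 boxes over 6 months, so the mean is 240 boxes per month.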

Page 6: Chapter     5

Mean or Median?

In symmetric distributions, the mean and median are approximately the same in value, so either measure of center may be used.

For skewed data, though, it’s better to report the median than the mean as a measure of center.
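To see why the median is preferred for skewed data, here is a small sketch with made-up numbers (they do not come from the slides). One large value is added to a symmetric list to skew it.

```python
from statistics import mean, median

# Hypothetical data: a symmetric list, then the same list with one
# very large value added to create a long right tail.
symmetric = [30, 32, 35, 38, 40]
skewed = symmetric + [400]

sym_mean, sym_median = mean(symmetric), median(symmetric)
skew_mean, skew_median = mean(skewed), median(skewed)

print("symmetric:", sym_mean, sym_median)
print("skewed:   ", skew_mean, skew_median)
```

In the symmetric list the mean and median agree. After the large value is added, the mean is pulled far toward it while the median barely moves.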

Page 7: Chapter     5
Page 8: Chapter     5

Range

Always report a measure of spread along with a measure of center when describing a distribution numerically.

The range of the data is the difference between the maximum and minimum values:

Range = max - min

A disadvantage of the range is that a single extreme value can make it very large and, thus, not representative of the data overall.

Page 9: Chapter     5

Spread: The Interquartile Range

The interquartile range (IQR) lets us ignore extreme data values and concentrate on the middle of the data.

To find the IQR, we first need to know what quartiles are…

Page 10: Chapter     5

Spread: The Interquartile Range (cont.)

Quartiles divide the data into four equal sections.

The lower quartile is the point that one quarter of the data lies below.

The upper quartile is the point that one quarter of the data lies above.

The difference between the quartiles is the IQR, so IQR = upper quartile - lower quartile

Page 11: Chapter     5

Spread: The Interquartile Range (cont.)

The lower and upper quartiles are the 25th and 75th percentiles of the data, so…

The IQR (purple) contains the middle 50% of the values of the distribution

Page 12: Chapter     5

The Five-Number Summary

The five-number summary of a distribution reports its median, quartiles, and extremes (maximum and minimum).

Example: The five-number summary for the ages at death for rock concert goers who died from being crushed is

Max 47 years

Q3 22

Median 19

Q1 17

Min 13

Page 13: Chapter     5

Finding the IQR by hand

Sort the values from smallest to largest

Split the data into 2 halves at the median

When N is odd, include the median in both halves

Q1: the median of the lower half of the data

Find the median of just the lower half of the data

Q3: the median of the upper half of the data

Find the median of just the upper half of the data

Remember:

Lower Quartile = 25th percentile = Q1

Median = 50th percentile = Q2

Upper Quartile = 75th percentile = Q3
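The by-hand steps above can be sketched in Python as follows. The function name and the sample lists are made up for illustration. Note that calculators and software may use other quartile conventions, so their answers can differ slightly from this method.

```python
from statistics import median


def quartiles_by_hand(values):
    """Return (Q1, median, Q3) using the split-at-the-median method.

    When the count is odd, the median is included in both halves,
    as the slide describes.
    """
    ordered = sorted(values)          # step 1: sort
    n = len(ordered)
    half = n // 2
    if n % 2 == 1:
        lower = ordered[: half + 1]   # includes the median
        upper = ordered[half:]        # includes the median
    else:
        lower = ordered[:half]
        upper = ordered[half:]
    return median(lower), median(ordered), median(upper)


# Even number of values (hypothetical data).
q1, q2, q3 = quartiles_by_hand([1, 2, 3, 4, 5, 6, 7, 8])
iqr = q3 - q1

# Odd number of values (hypothetical data).
odd_q1, odd_q2, odd_q3 = quartiles_by_hand([1, 2, 3, 4, 5])

print(q1, q2, q3, iqr)
print(odd_q1, odd_q2, odd_q3)
```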

Page 14: Chapter     5

Example Chapter 5 #4

Here are the annual numbers of deaths from tornados in the United States from 1990 through 2000:

53, 39, 39, 33, 69, 30, 25, 67, 130, 94, 40

Find the Mean, Median, Quartiles, Range and IQR

Mean: (53 + 39 + … + 40)/11 = 56.3

Median: 40

Q1 Median of lower half: (33 + 39)/2 = 36

Q3 Median of upper half: (67 + 69)/2 = 68

Range: 130 - 25 = 105

IQR Q3 - Q1: 68 - 36 = 32
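The tornado answers can be double-checked with a few lines of Python. This is only a sketch of the by-hand method for this one data set (11 values, so the median is included in both halves); the variable names are illustrative.

```python
from statistics import mean, median

# Annual tornado deaths, 1990 through 2000, copied from the slide.
deaths = [53, 39, 39, 33, 69, 30, 25, 67, 130, 94, 40]

ordered = sorted(deaths)
mid = len(ordered) // 2           # index of the middle value (n is odd)

avg = round(mean(deaths), 1)
med = ordered[mid]
q1 = median(ordered[: mid + 1])   # lower half, median included
q3 = median(ordered[mid:])        # upper half, median included
data_range = max(deaths) - min(deaths)
iqr = q3 - q1

print(avg, med, q1, q3, data_range, iqr)
```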

Page 15: Chapter     5

Boxplots

A boxplot is a graphical display of the five-number summary.

Boxplots are particularly useful when comparing groups.

Page 16: Chapter     5

Constructing Boxplots

The Crowdsafe Database lists the ages, names, causes, and locations of these unfortunate concert-goers. During the period of 1999 and 2000 there were 66 people who died from “crowd crush.” Here is a 5 number summary of their ages:

Max 47 Years

Q3 22

Median 19

Q1 17

Min 13

Page 17: Chapter     5

Constructing Boxplots

Draw a single vertical axis spanning the range of the data (10 to 50).

Draw short horizontal lines at the lower (17) and upper quartiles (22) and at the median (19).

Then connect them with vertical lines to form a box.

Page 18: Chapter     5

Constructing Boxplots (cont.)

Erect “fences” around the main part of the data.

The IQR is 22 - 17 = 5.

The upper fence is 1.5 IQRs above the upper quartile: 22 + 1.5(5) = 29.5

The lower fence is 1.5 IQRs below the lower quartile: 17 - 1.5(5) = 9.5

Note: the fences only help with constructing the boxplot and should not appear in the final display.
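The fence arithmetic can be written out as a short sketch. The quartiles come from the five-number summary in the slides; the variable names are illustrative.

```python
# Quartiles from the concert-goer five-number summary.
q1, q3 = 17, 22

iqr = q3 - q1                   # interquartile range
upper_fence = q3 + 1.5 * iqr    # 1.5 IQRs above the upper quartile
lower_fence = q1 - 1.5 * iqr    # 1.5 IQRs below the lower quartile

print(iqr, lower_fence, upper_fence)
```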

Page 19: Chapter     5

Constructing Boxplots (cont.)

Use the fences to grow “whiskers.”

Draw lines from the ends of the box up and down to the most extreme data values found within the fences.

We don't have all the data points, so let's just say the whiskers end at 28 and 13.

If a data value falls outside one of the fences, we do not connect it with a whisker.

Page 20: Chapter     5

Constructing Boxplots (cont.)

Add the outliers by displaying any data values beyond the fences with special symbols.

Our outliers are 37 and 47.

We often use a different symbol for “far outliers” that are farther than 3 IQRs from the quartiles (3 x 5 = 15, so above 22 + 15 = 37 or below 17 - 15 = 2).
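The outlier rules can be sketched as a small function. The function name and labels are made up for illustration; the ages tested are the whisker ends (13 and 28) and the outliers (37 and 47) mentioned in the slides.

```python
def classify(value, q1, q3):
    """Label a value using the 1.5 IQR and 3 IQR rules."""
    iqr = q3 - q1
    # Far outliers lie more than 3 IQRs beyond a quartile.
    if value < q1 - 3 * iqr or value > q3 + 3 * iqr:
        return "far outlier"
    # Ordinary outliers lie beyond the fences (1.5 IQRs).
    if value < q1 - 1.5 * iqr or value > q3 + 1.5 * iqr:
        return "outlier"
    return "inside fences"


# Quartiles from the concert-goer summary: Q1 = 17, Q3 = 22.
labels = {age: classify(age, 17, 22) for age in (13, 28, 37, 47)}
for age, label in labels.items():
    print(age, label)
```

Under this rule 37 sits exactly 3 IQRs above Q3, so it counts as an ordinary outlier, while 47 is a far outlier.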

Page 21: Chapter     5

Constructing Boxplots (cont.)

Compare the histogram and boxplot for rock concert deaths:

Page 22: Chapter     5

Comparing Groups With Boxplots

The following set of boxplots compares the effectiveness of various coffee containers:

What does this graphical display tell you?

Page 23: Chapter     5

Summarizing Symmetric Distributions

The distribution of pulse rates for 52 adults is generally symmetric, with a mean of 72.7 beats per minute (bpm) and a median of 73 bpm:

Page 24: Chapter     5

What About Spread? The Standard Deviation

A more powerful measure of spread than the IQR is the standard deviation, which takes into account how far each data value is from the mean.

A deviation is the distance that a data value is from the mean. Since adding all deviations together would

total zero, we square each deviation and find an average of sorts for the deviations.

Page 25: Chapter     5

What About Spread? The Standard Deviation (cont.)

The variance, notated by $s^2$, is found by summing the squared deviations and (almost) averaging them:

$s^2 = \frac{\sum (y - \bar{y})^2}{n - 1}$

The variance will play a role later in our study, but it is problematic as a measure of spread—it is measured in squared units!

Page 26: Chapter     5

What About Spread? The Standard Deviation (cont.)

The standard deviation, $s$, is just the square root of the variance and is measured in the same units as the original data:

$s = \sqrt{\frac{\sum (y - \bar{y})^2}{n - 1}}$

Page 27: Chapter     5

Standard Deviation Ex. (81)

Suppose we are given the values: 4, 3, 10, 12, 8, 9, and 3

The mean, $\bar{y}$ = 7

Original Values    Deviations      Squared Deviations
4                  4 - 7 = -3      (-3)^2 = 9
3                  3 - 7 = -4      (-4)^2 = 16
10                 10 - 7 = 3      3^2 = 9
12                 12 - 7 = 5      5^2 = 25
8                  8 - 7 = 1       1^2 = 1
9                  9 - 7 = 2       2^2 = 4
3                  3 - 7 = -4      (-4)^2 = 16

Page 28: Chapter     5

Standard Deviation Example (cont.)

Add up the squared deviations: 9 + 16 + 9 + 25 + 1 + 4 + 16 = 80

Divide by n – 1: 80/6 = 13.33

Finally take the square root: √13.33 = 3.65
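The same steps can be followed in a short Python sketch and compared against the standard library's `statistics.stdev`, which also divides by n - 1. The variable names are illustrative.

```python
import math
from statistics import stdev

# Values from the worked example.
values = [4, 3, 10, 12, 8, 9, 3]

n = len(values)
y_bar = sum(values) / n                          # the mean
squared_devs = [(y - y_bar) ** 2 for y in values]
sum_sq = sum(squared_devs)                       # sum of squared deviations
variance = sum_sq / (n - 1)                      # divide by n - 1
s = math.sqrt(variance)                          # standard deviation

print(y_bar, sum_sq, round(variance, 2), round(s, 2))
print(stdev(values))                             # library result for comparison
```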

Page 29: Chapter     5

Using your calculator

TI 83/84

STAT

1 or Enter

Lists functions L1 to L6

Enter your data values

STAT

CALC

Enter

1 or Enter (1-Var STATS)

Press 2nd button and L1

Enter

TI 89

APPS

6: Data Matrix Editor

1 and Enter

C1: list your data values

F5 Calc: OneVar Enter

X: type C1 Enter Enter

Page 30: Chapter     5

Using your calculator (cont.)

TI 83/84 Printout

TI 89 Printout

Page 31: Chapter     5

Thinking About Variation

Since Statistics is about variation, spread is an important fundamental concept of Statistics.

Measures of spread help us talk about what we don’t know.

When the data values are tightly clustered around the center of the distribution, the IQR and standard deviation will be small.

When the data values are scattered far from the center, the IQR and standard deviation will be large.

Page 32: Chapter     5

Shape, Center, and Spread

When telling about a quantitative variable, always report the shape of its distribution, along with a center and a spread.

If the shape is skewed, report the median and IQR.

If the shape is symmetric, report the mean and standard deviation and possibly the median and IQR as well.

Page 33: Chapter     5

What About Outliers?

If there are any clear outliers and you are reporting the mean and standard deviation, report them with the outliers present and with the outliers removed. The differences may be quite revealing.

Note: The median and IQR are not likely to be affected by the outliers.
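A sketch of this advice with made-up ages (not the actual concert data): compute each summary twice, once with the outlier present and once with it removed, and compare.

```python
from statistics import mean, median, stdev

# Hypothetical ages with one clear outlier (47).
ages = [15, 16, 17, 18, 19, 20, 21, 22, 47]
without = [a for a in ages if a != 47]

summary = {
    "with outlier": (mean(ages), stdev(ages), median(ages)),
    "outlier removed": (mean(without), stdev(without), median(without)),
}

for label, (m, s, med) in summary.items():
    print(f"{label}: mean={m:.2f} sd={s:.2f} median={med}")

# How far each measure moved when the outlier was removed.
mean_shift = abs(mean(ages) - mean(without))
median_shift = abs(median(ages) - median(without))
```

In this made-up example the mean and standard deviation change a lot when the outlier is dropped, while the median changes only slightly.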

Page 34: Chapter     5

What Can Go Wrong?

Don’t forget to do a reality check; don’t let technology do your thinking for you.

Sort the values before finding the median or percentiles.

Don’t compute numerical summaries of a categorical variable.

Beware of outliers

Make a picture (make a picture and make a picture)

Page 35: Chapter     5

What Can Go Wrong? (cont.)

Be careful when comparing groups that have very different spreads.

Consider these side-by-side boxplots of cotinine levels:

Page 36: Chapter     5

*Re-expressing to Equalize the Spread of Groups

Here are the side-by-side boxplots of the log (cotinine) values:
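To illustrate why taking logs can equalize spread, here is a sketch with made-up numbers (these are not the cotinine data). It uses the range as a simple stand-in for spread, and the helper name `spread` is hypothetical.

```python
import math

# Hypothetical groups: group_b is group_a multiplied by 100,
# so its raw spread is 100 times larger.
group_a = [1, 2, 4, 8]
group_b = [100, 200, 400, 800]


def spread(values):
    """Range of the data: max minus min."""
    return max(values) - min(values)


# Re-express each value on a log (base 10) scale.
log_a = [math.log10(v) for v in group_a]
log_b = [math.log10(v) for v in group_b]

print(spread(group_a), spread(group_b))   # very different raw spreads
print(spread(log_a), spread(log_b))       # nearly identical after logs
```

On the raw scale the two groups are hard to compare in one picture; after the log re-expression their spreads match.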

Page 37: Chapter     5

For Next Week

Homework Chapter 5: 3, 7, 9, 12, 27 a-c, 28

Extra Credit is due (5 points)

Quiz 1: Closed Notes:

Know the difference between categorical and quantitative variables

Calculate the mean, median, and mode

Find percents from a contingency table

Sketch a stem and leaf plot