
Section 6A Characterizing a Data Distribution

Feb 06, 2016


twila

Transcript
Page 1: Section 6A Characterizing a Data Distribution

Section 6A: Characterizing a Data Distribution

Pages 380-388

Page 2

Definition: The distribution of a variable (or data set) describes the values taken on by the variable and the frequency (or relative frequency) of these values.

Example: Lengths of words in the Gettysburg Address

Word length   Frequency
1             1
2             5
3             49
4             53
5             59
6             35
7             24
8             19
9             5
10            10
11            7
Total         267

[Figure: Histogram of word length (horizontal axis: word length; vertical axis: frequency)]
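The definition mentions relative frequency as well as frequency. A minimal Python sketch, assuming the counts are transcribed exactly from the word-length table; the names `word_length_counts` and `relative_freq` are illustrative, not from the slides:

```python
# Frequency table for word lengths in the Gettysburg Address (from the slide table).
word_length_counts = {1: 1, 2: 5, 3: 49, 4: 53, 5: 59, 6: 35,
                      7: 24, 8: 19, 9: 5, 10: 10, 11: 7}

total = sum(word_length_counts.values())  # total number of words counted

# Relative frequency = frequency / total number of values.
relative_freq = {length: count / total
                 for length, count in word_length_counts.items()}

for length, rf in relative_freq.items():
    print(f"{length:>2}  {word_length_counts[length]:>3}  {rf:.3f}")
print("total", total)
```

The relative frequencies always add up to 1, which is a quick check that no row of the table was dropped.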

Page 3

How do we characterize a data distribution?

Center = Average
- Mean
- Median
- Mode
- Effect of an Outlier
- Confusion

Shape of a Distribution
- Number of Peaks
- Symmetry or Skewness
- Variation

(More in Section 6B.)

Page 4

Averages

The word “average” actually has several meanings.

Generally, average = center of a distribution, or a typical representative value.

6-A

Page 5

Measures of Center in a Distribution

The mean is what we most commonly call the average value. It is defined as follows:

$$\text{mean} = \frac{\text{sum of all values}}{\text{total number of values}}$$

The median is the middle value in the sorted data set (or halfway between the two middle values if the number of values is even).

The mode is the most common value (or group of values) in a distribution.
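The three definitions above can be tried out with Python's standard `statistics` module. This is a minimal sketch; the helper name `measures_of_center` and the sample list are illustrative. Note that `statistics.multimode` (Python 3.8+) lists every value when all values are distinct, which differs from the "no mode" convention used later in this section.

```python
import statistics

def measures_of_center(values):
    """Return the mean, median, and mode(s) of a list of numbers."""
    return {
        # mean = sum of all values / total number of values
        "mean": sum(values) / len(values),
        # median = middle of the sorted values (or halfway between the two middle values)
        "median": statistics.median(values),
        # multimode returns every value tied for most common
        "modes": statistics.multimode(values),
    }

example = [5, 5, 5, 3, 1, 5, 1, 4, 3, 5]  # a small sample list
print(measures_of_center(example))
```

For this sample the mean is 37 / 10 = 3.7, the median is halfway between 4 and 5 (4.5), and the mode is 5.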

Page 6

Mean Distance

Planet     Distance from sun (millions of miles)
Mercury    36
Venus      67
Earth      93
Mars       142
Jupiter    484
Saturn     887
Uranus     1,765
Neptune    2,791
*Pluto     3,654

Page 7

Mean distance

$$\text{mean} = \frac{\text{sum of all values}}{\text{total number of values}}$$

$$36 + 67 + 93 + 142 + 484 + 887 + 1{,}765 + 2{,}791 + 3{,}654 = 9{,}919$$

$$\text{mean distance} = \frac{9{,}919}{9} \approx 1{,}102.1 \text{ million miles}$$
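The sum and mean are easy to double-check in a few lines of Python. A minimal sketch using the nine distances from the table (millions of miles); the dictionary name is illustrative:

```python
# Distances from the sun in millions of miles, as listed in the table
# (Pluto included, so there are 9 values).
distances = {
    "Mercury": 36, "Venus": 67, "Earth": 93, "Mars": 142, "Jupiter": 484,
    "Saturn": 887, "Uranus": 1765, "Neptune": 2791, "Pluto": 3654,
}

total = sum(distances.values())              # sum of all values
mean_distance = total / len(distances)       # divide by the number of values
print(total)                    # 9919
print(round(mean_distance, 1))  # 1102.1
```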

Page 8

Median Distance

Planet     Distance from sun (millions of miles)
Mercury    36
Venus      67
Earth      93
Mars       142
Jupiter    484
Saturn     887
Uranus     1,765
Neptune    2,791
*Pluto     3,654

The median is the middle value in the sorted data set (or halfway between the two middle values if the number of values is even).

Page 9

Median Distance

Planet     Distance from sun (millions of miles)
Mercury    36
Venus      67
Earth      93
Mars       142
Jupiter    484
Saturn     887
Uranus     1,765
Neptune    2,791
Pluto      3,654

Jupiter (484) is the middle value, with 4 values below it and 4 values above it, so the median distance is 484 million miles.

Page 10

Steps for Finding the Median

1. Sort the data (put it in order)!
2. Count the data (n pieces). Decide whether n is odd or even.
3. If n is odd, the median is the value in position (n+1)/2.
4. If n is even, the median is halfway between the values in positions n/2 and n/2 + 1.
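The four steps translate almost line for line into code. A minimal sketch; the function name `median` is a hypothetical helper (Python's `statistics.median` does the same job), and positions in the steps are 1-based while Python list indices are 0-based:

```python
def median(values):
    """Median computed by following the four steps for finding the median."""
    if not values:
        raise ValueError("median of empty data is undefined")
    data = sorted(values)                # Step 1: sort the data
    n = len(data)                        # Step 2: count the data
    if n % 2 == 1:                       # Step 3: n odd -> position (n+1)/2
        return data[(n + 1) // 2 - 1]    # convert 1-based position to 0-based index
    # Step 4: n even -> halfway between positions n/2 and n/2 + 1
    lower = data[n // 2 - 1]
    upper = data[n // 2]
    return (lower + upper) / 2

nine_planets = [36, 67, 93, 142, 484, 887, 1765, 2791, 3654]
eight_planets = nine_planets[:-1]  # the 'real' planet list, without Pluto
print(median(nine_planets))   # 484
print(median(eight_planets))  # 313.0
```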

Page 11

Median Distance: ‘Real’ Planet List

Planet     Distance from sun (millions of miles)
Mercury    36
Venus      67
Earth      93
Mars       142
Jupiter    484
Saturn     887
Uranus     1,765
Neptune    2,791

Page 12

Median Distance

Planet     Distance from sun (millions of miles)
Mercury    36
Venus      67
Earth      93
Mars       142
Jupiter    484
Saturn     887
Uranus     1,765
Neptune    2,791

With n = 8 values (even), the median is halfway between the 4th and 5th values, 142 and 484, so

$$\text{median} = \frac{142 + 484}{2} = 313 \text{ million miles}$$

Page 13

Comment about the Median

1. The median splits the data into two equal-sized pieces.
2. About half the data (50%) lies below the median.
3. About half the data (50%) lies above the median.

Page 14

Mode Distance

Planet     Distance from sun (millions of miles)
Mercury    36
Venus      67
Earth      93
Mars       142
Jupiter    484
Saturn     887
Uranus     1,765
Neptune    2,791
*Pluto     3,654

The mode is the most common value (or group of values) in a distribution.

Page 15

Mode Examples

a. 5 5 5 3 1 5 1 4 3 5
b. 1 2 2 2 3 4 5 6 6 6 7 9
c. 1 2 2 6 6 8 9 10 8


Page 18

Mode Examples

a. 5 5 5 3 1 5 1 4 3 5 (mode is 5)
b. 1 2 2 2 3 4 5 6 6 6 7 9 (bimodal: 2 and 6)
c. 1 2 2 6 6 8 9 10 8 (trimodal: 2, 6, and 8)
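Finding every mode means counting each value and keeping all values tied for the highest count. A minimal sketch using `collections.Counter`; the function name `modes` is hypothetical, and the rule "no value repeats means no mode" is an assumption chosen to match the heart-rate example later in this section:

```python
from collections import Counter

def modes(values):
    """Return a sorted list of modes; an empty list means 'no mode'.

    Assumption: if every value occurs exactly once, the data has no mode.
    """
    counts = Counter(values)
    if not counts:
        return []
    top = max(counts.values())
    if top == 1:
        return []  # no value repeats, so there is no mode
    return sorted(v for v, c in counts.items() if c == top)

print(modes([5, 5, 5, 3, 1, 5, 1, 4, 3, 5]))        # [5]
print(modes([1, 2, 2, 2, 3, 4, 5, 6, 6, 6, 7, 9]))  # [2, 6]  (bimodal)
print(modes([1, 2, 2, 6, 6, 8, 9, 10, 8]))          # [2, 6, 8]  (trimodal)
print(modes([130, 135, 140, 145, 325]))             # []  (no mode)
```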

Page 19

The Mode

- You may not have one!
- You could have multiple modes!
- The mode is easy to spot in a graph: it occurs at the peak.
- The mode is the only measure of “center” available for categorical data (e.g., gender).


Page 21

Outliers

An outlier is an observation that is much higher (or much lower) than all the other values in your list, i.e., an extremely unusual observation.

Note: not every data set has outliers. The minimum and maximum values are not necessarily outliers!

Page 22

The Effect of an Outlier

Definition: An outlier is a data value that is much higher or much lower than almost all other values.

Five graduating seniors on a college basketball team receive the following first-year contract offers to play in the National Basketball Association: $0, $0, $0, $0, $3,500,000.

$$\text{mean} = \frac{0 + 0 + 0 + 0 + 3{,}500{,}000}{5} = \$700{,}000$$

Median (middle of $0, $0, $0, $0, $3,500,000): $0

Mode: $0

- Including an outlier can pull the mean significantly upward or downward.
- Including an outlier does not significantly affect the median.
- Including an outlier does not affect the mode.
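The contrast between the three measures can be reproduced with Python's `statistics` module. A minimal sketch using the five offers from the example; variable names are illustrative:

```python
import statistics

offers = [0, 0, 0, 0, 3_500_000]  # first-year offers in dollars (from the example)

mean_offer = statistics.mean(offers)
median_offer = statistics.median(offers)
modes_offer = statistics.multimode(offers)

print(f"mean = ${mean_offer:,.0f}")      # the single outlier pulls the mean far up
print(f"median = ${median_offer:,.0f}")  # the middle (3rd) sorted value is unaffected
print(f"mode(s) = {modes_offer}")
```

One large value moves the mean from $0 to $700,000, while the median and mode stay at $0.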

Page 23

The Effect of an Outlier

A track coach wants to determine an appropriate heart rate for her athletes during their workouts. In the middle of the workout, she reads the following heart rates (beats/min) from five athletes: 130, 135, 140, 145, 325.

Clearly 325 is an outlier, and it is clearly a mistake (faulty heart monitor?).

$$\text{mean} = \frac{130 + 135 + 140 + 145 + 325}{5} = 175 \text{ bpm}$$

Median (middle of 130, 135, 140, 145, 325): 140 bpm

Mode: none

Throw out the outlier?

$$\text{mean} = \frac{130 + 135 + 140 + 145}{4} = 137.5 \text{ bpm}$$

Median (halfway between 135 and 140 in 130, 135, 140, 145): 137.5 bpm

Mode: none
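Comparing the measures with and without the suspect reading takes only a few lines. A minimal sketch; dropping the value 325 is the judgment call the slide asks about, hard-coded here as a hypothetical cleaning step:

```python
import statistics

rates = [130, 135, 140, 145, 325]  # beats/min, from the example
# Hypothetical cleaning step: drop the reading believed to be a monitor error.
trimmed = [r for r in rates if r != 325]

mean_all, median_all = statistics.mean(rates), statistics.median(rates)
mean_trimmed, median_trimmed = statistics.mean(trimmed), statistics.median(trimmed)

print("all five:       ", mean_all, median_all)
print("outlier removed:", mean_trimmed, median_trimmed)
```

Removing the outlier changes the mean by 37.5 bpm but the median by only 2.5 bpm.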


Page 25

Mean vs. Median

A news article reports that of the 411 players on the NBA roster in February 1988, only 139 “made more than the league average salary of $2.36 million.”

Recall that the word “average” can have several interpretations. In this case, is $2.36 million the mean or the median salary for 1988 NBA players? Explain.

Page 26

Confusion about “Average”

A newspaper surveys wages for assembly workers and reports an average of $22 per hour. The workers at one large firm immediately request a pay raise, claiming that they work as hard as workers at other companies but their average wage is only $19. The management rejects their request, telling them that they are overpaid because their average wage, in fact, is $23 per hour. Can they both be right?

Salaries: $19, $19, $19, $19, $39

Median: $19

$$\text{mean} = \frac{19 + 19 + 19 + 19 + 39}{5} = \frac{\$115}{5} = \$23$$

Page 27

Confusion about “Average”

A newspaper surveys wages for assembly workers and reports an average of $22 per hour. The workers at one large firm immediately request a pay raise, claiming that they work as hard as workers at other companies but their average wage is only $19. The management rejects their request, telling them that they are overpaid because their average wage, in fact, is $23 per hour. Can they both be right?

Salaries: $6, $20, $23, $23, $23

Median: $23

$$\text{mean} = \frac{6 + 20 + 23 + 23 + 23}{5} = \frac{\$95}{5} = \$19$$
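Both wage scenarios can be checked side by side. A minimal sketch using the two five-worker wage lists from the slides (dollars per hour); the names `scenario_1` and `scenario_2` are illustrative:

```python
import statistics

# Hourly wages for a five-worker firm in the two slide scenarios.
scenario_1 = [19, 19, 19, 19, 39]  # workers quote the median, management the mean
scenario_2 = [6, 20, 23, 23, 23]   # workers quote the mean, management the median

for name, wages in (("scenario 1", scenario_1), ("scenario 2", scenario_2)):
    print(name, "mean =", statistics.mean(wages), "median =", statistics.median(wages))
```

In each scenario the two sides are both "right"; they are simply using different measures of center.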

Page 28

Confusion about “Average”

All 100 first-year students at a small college take three courses in the Core Studies Program. The first two courses are taught in large lectures, with all 100 students in a single class. The third course is taught in ten classes of 10 students each. The students claim that the mean size of their Core Studies classes is 70. The administrators claim that the mean class size is only 25 students. Explain.

Students say “my average class size” is (mean class size per student):

$$\frac{100 + 100 + 10}{3} = 70$$

Administrators say the average Core Studies class size is (mean number of students per class):

$$\frac{\text{total students enrolled in all Core Studies classes}}{\text{number of Core Studies classes}} = \frac{300}{12} = 25$$
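The two answers come from averaging over different things: over classes, or over students. A minimal sketch using the class sizes from the example; the last line is an added check (not from the slides) showing that, because every student takes exactly three classes, the student view equals the enrollment-weighted mean, the sum of squared class sizes divided by total enrollment:

```python
# Core Studies class sizes: two 100-student lectures and ten 10-student sections.
class_sizes = [100, 100] + [10] * 10

# Administrators: mean number of students per class.
per_class = sum(class_sizes) / len(class_sizes)

# Students: each student sits in both lectures and one small section.
student_view = [100, 100, 10]
per_student = sum(student_view) / len(student_view)

# Same student view, computed by weighting each class by its enrollment.
weighted = sum(s * s for s in class_sizes) / sum(class_sizes)

print(per_class)    # 25.0
print(per_student)  # 70.0
print(weighted)     # 70.0
```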


Page 30

Describing a distribution

Page 31

Shape of a Distribution: Symmetry and Skewness

[Figure: SYMMETRIC distribution; Mode = Mean = Median]

A distribution is symmetric if its left half is a mirror image of its right half.

Page 32

Shape of a Distribution: Symmetry and Skewness

[Figure: SKEWED LEFT (negatively) distribution with the mean, median, and mode marked]

A distribution is left-skewed if its ‘tail’ is on the left.

Page 33

Shape of a Distribution: Symmetry and Skewness

[Figure: SKEWED RIGHT (positively) distribution with the mean, median, and mode marked]

A distribution is right-skewed if its ‘tail’ is on the right.

Page 34

Symmetric and Skewed Distributions

[Figure: three panels: SYMMETRIC (Mode = Mean = Median), SKEWED LEFT (negatively), and SKEWED RIGHT (positively), with the mean, median, and mode marked on each]

- Symmetric: use the mean to describe center.
- Skewed (left or right): use the median to describe center.
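A small experiment shows why the median is preferred for skewed data: the mean is pulled toward the long tail. A minimal sketch on made-up data; both lists are hypothetical, chosen only to have an obvious tail:

```python
import statistics

# Hypothetical right-skewed sample: mostly small values plus a long right tail.
right_skewed = [1, 2, 2, 3, 3, 3, 4, 4, 5, 20, 40]
# Its mirror image has a long left tail.
left_skewed = [-x for x in right_skewed]

for label, data in (("right-skewed", right_skewed), ("left-skewed", left_skewed)):
    center_mean = statistics.mean(data)
    center_median = statistics.median(data)
    print(f"{label}: mean = {center_mean:.2f}, median = {center_median}")
```

In the right-skewed sample the mean (about 7.91) sits well above the median (3); in the mirrored sample it sits well below, so the mean alone would misdescribe a typical value.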

Page 35

Shape of a Distribution: Symmetry and Skewness

Do you expect the distribution of heights of 100 women to be symmetric, left-skewed, or right-skewed? Explain.

Do you expect the distribution of speeds of cars on a road where a visible patrol car is using radar to be symmetric, left-skewed, or right-skewed? Explain.

Page 36

Variation = horizontal spread

[Figure: three distributions illustrating high, moderate, and low variation]

How would you expect the variation to differ between times in the Olympic marathon and times in the New York Marathon? Explain.

Page 37

Describing a Distribution

- Shape: number of peaks, symmetry/skewness. Outliers?
- Center: use the mean if the data is symmetric; use the median if there is a strong skew or there are outliers.
- Spread: horizontal spread. Is the data tightly clustered around the center? (Low or high variation?)

Page 38

Homework

Pages 388-390

#10, 14, 18, 21, 22, 27, 28, 30, 35, 38*

* It is not necessary to draw the sketch for this one.