Chapter 2 Methods for Describing Sets of Data

Mar 06, 2018


buingoc


Objectives

Describe Data using Graphs

Describe Data using Charts

Describing Qualitative Data

•Qualitative data are nonnumeric in nature

•Best described by using classes

•Descriptive measures

–class frequency – number of data points in a class

–$\text{class relative frequency} = \frac{\text{class frequency}}{\text{total number of data points in the data set}}$

–class percentage = class relative frequency × 100
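Here is a minimal Python sketch of these measures. It assumes the raw data are a list of category labels. The helper name `class_summary` and the blood-type data are hypothetical and serve only as an illustration.

```python
from collections import Counter


def class_summary(observations):
    """Hypothetical helper: map each class to (frequency, relative frequency, percentage)."""
    counts = Counter(observations)   # class frequency: data points per class
    total = len(observations)        # total number of data points in the data set
    summary = {}
    for cls, freq in counts.items():
        rel = freq / total           # class relative frequency
        summary[cls] = (freq, rel, rel * 100)  # class percentage = relative frequency x 100
    return summary


# Hypothetical qualitative data (not from the slides)
blood_types = ["O", "A", "O", "B", "A", "O", "AB", "O"]
for cls, (freq, rel, pct) in class_summary(blood_types).items():
    print(f"{cls:<2}  {freq}  {rel:.3f}  {pct:.1f}%")
```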

Describing Qualitative Data – Displaying Descriptive Measures

Summary Table

Columns: Class, Frequency, Class percentage (class relative frequency × 100)

Describing Qualitative Data – Qualitative Data Displays

Bar Graph

Describing Qualitative Data – Qualitative Data Displays

Pie Chart

Describing Qualitative Data – Qualitative Data Displays

Pareto Diagram
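A Pareto diagram is a bar graph whose bars are arranged by height, with the largest class first. Many versions also overlay a cumulative percentage. The sketch below computes that ordering. The helper `pareto_table` and the defect categories are hypothetical. Ties keep the order in which the classes first appear.

```python
from collections import Counter


def pareto_table(observations):
    """Hypothetical helper: classes by decreasing frequency, with cumulative percentage."""
    counts = Counter(observations).most_common()  # largest frequency first
    total = sum(freq for _, freq in counts)
    rows, cumulative = [], 0
    for cls, freq in counts:
        cumulative += freq
        rows.append((cls, freq, 100 * cumulative / total))
    return rows


# Hypothetical defect categories (not from the slides)
defects = ["scratch", "dent", "scratch", "paint", "scratch", "dent", "other"]
for cls, freq, cum_pct in pareto_table(defects):
    print(f"{cls:<8} {freq}  {cum_pct:5.1f}%")
```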

Graphical Methods for Describing Quantitative Data

The Data

Company Percentage Company Percentage Company Percentage Company Percentage

1 13.5 14 9.5 27 8.2 39 6.5

2 8.4 15 8.1 28 6.9 40 7.5

3 10.5 16 13.5 29 7.2 41 7.1

4 9.0 17 9.9 30 8.2 42 13.2

5 9.2 18 6.9 31 9.6 43 7.7

6 9.7 19 7.5 32 7.2 44 5.9

7 6.6 20 11.1 33 8.8 45 5.2

8 10.6 21 8.2 34 11.3 46 5.6

9 10.1 22 8.0 35 8.5 47 11.7

10 7.1 23 7.7 36 9.4 48 6.0

11 8.0 24 7.4 37 10.5 49 7.8

12 7.9 25 6.5 38 6.9 50 6.5

13 6.8 26 9.5

Percentage of Revenues Spent on Research and Development

Graphical Methods for Describing Quantitative Data

Dot Plot

Graphical Methods for Describing Quantitative Data

Stem-and-Leaf Display
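A short sketch can build a stem-and-leaf display of the R&D percentages. It assumes the stem is the whole-number part and the leaf is the tenths digit, which is one common choice for one-decimal data. The helper name `stem_and_leaf` is hypothetical.

```python
from collections import defaultdict

# R&D percentages for companies 1-50, copied from the data table
rd_pct = [13.5, 8.4, 10.5, 9.0, 9.2, 9.7, 6.6, 10.6, 10.1, 7.1, 8.0, 7.9, 6.8,
          9.5, 8.1, 13.5, 9.9, 6.9, 7.5, 11.1, 8.2, 8.0, 7.7, 7.4, 6.5, 9.5,
          8.2, 6.9, 7.2, 8.2, 9.6, 7.2, 8.8, 11.3, 8.5, 9.4, 10.5, 6.9,
          6.5, 7.5, 7.1, 13.2, 7.7, 5.9, 5.2, 5.6, 11.7, 6.0, 7.8, 6.5]


def stem_and_leaf(values):
    """Hypothetical helper: stem = whole-number part, leaf = tenths digit (one decimal assumed)."""
    leaves = defaultdict(list)
    for v in sorted(values):
        tenths = round(v * 10)            # 13.5 -> 135; round() guards against float error
        leaves[tenths // 10].append(tenths % 10)
    # Include empty stems so gaps in the data stay visible
    return {stem: leaves.get(stem, []) for stem in range(min(leaves), max(leaves) + 1)}


for stem, leaf_digits in stem_and_leaf(rd_pct).items():
    print(f"{stem:>3} | {''.join(map(str, leaf_digits))}")
```

With these stems, stem 12 prints empty because no company spends between 12.0% and 12.9%.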

Graphical Methods for Describing Quantitative Data

Histogram

Graphical Methods for Describing Quantitative Data

More on Histograms

Number of Observations in Data Set    Number of Classes

Less than 25                          5-6

25-50                                 7-14

More than 50                          15-20
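The class-count guideline can be paired with an equal-width binning step to get histogram class frequencies. This sketch rests on several assumptions. The helper names `suggested_classes` and `class_frequencies` are hypothetical. Classes are equal width. A value that lands exactly on a class boundary may be affected by floating-point rounding.

```python
# R&D percentages for companies 1-50, copied from the data table
rd_pct = [13.5, 8.4, 10.5, 9.0, 9.2, 9.7, 6.6, 10.6, 10.1, 7.1, 8.0, 7.9, 6.8,
          9.5, 8.1, 13.5, 9.9, 6.9, 7.5, 11.1, 8.2, 8.0, 7.7, 7.4, 6.5, 9.5,
          8.2, 6.9, 7.2, 8.2, 9.6, 7.2, 8.8, 11.3, 8.5, 9.4, 10.5, 6.9,
          6.5, 7.5, 7.1, 13.2, 7.7, 5.9, 5.2, 5.6, 11.7, 6.0, 7.8, 6.5]


def suggested_classes(n):
    """Hypothetical helper: class-count range from the rule-of-thumb table."""
    if n < 25:
        return (5, 6)
    if n <= 50:
        return (7, 14)
    return (15, 20)


def class_frequencies(values, num_classes):
    """Hypothetical helper: equal-width classes from min to max, with counts.
    Each class includes its lower boundary; the last class also includes the maximum."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / num_classes
    counts = [0] * num_classes
    for v in values:
        k = min(int((v - lo) / width), num_classes - 1)  # keep the maximum in the last class
        counts[k] += 1
    return [(lo + k * width, lo + (k + 1) * width, c) for k, c in enumerate(counts)]


low, high = suggested_classes(len(rd_pct))   # 50 observations -> 7 to 14 classes
for lower, upper, count in class_frequencies(rd_pct, low):
    print(f"{lower:5.2f} - {upper:5.2f}: {'*' * count}")
```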

Summation Notation

Used to simplify summation instructions

Each observation in a data set is identified by a subscript:

$x_1, x_2, x_3, x_4, x_5, \ldots, x_n$

Notation used to sum the above numbers together is

$\sum_{i=1}^{n} x_i = x_1 + x_2 + x_3 + x_4 + \cdots + x_n$

Summation Notation

Data set of 1, 2, 3, 4

Are these the same? $\sum_{i=1}^{4} x_i^2$ and $\left(\sum_{i=1}^{4} x_i\right)^2$

$\sum_{i=1}^{4} x_i^2 = 1^2 + 2^2 + 3^2 + 4^2 = 1 + 4 + 9 + 16 = 30$

$\left(\sum_{i=1}^{4} x_i\right)^2 = (1 + 2 + 3 + 4)^2 = 10^2 = 100$
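You can check the comparison directly. This short Python sketch reproduces both sums for the data set 1, 2, 3, 4.

```python
x = [1, 2, 3, 4]  # x_1 = 1, x_2 = 2, x_3 = 3, x_4 = 4

sum_of_squares = sum(xi ** 2 for xi in x)   # sum_{i=1}^{4} x_i^2: square first, then add
square_of_sum = sum(x) ** 2                 # (sum_{i=1}^{4} x_i)^2: add first, then square

print(sum_of_squares, square_of_sum)  # → 30 100
```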

Numerical Measures of Central Tendency

•Central Tendency – tendency of data to center about certain numerical values

•3 commonly used measures of Central Tendency

–Mean

–Median

–Mode

Numerical Measures of Central Tendency

The Mean

•Arithmetic average of the elements of the data set

•Sample mean denoted by $\bar{x}$

•Population mean denoted by $\mu$

•Calculated as

$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}$ and $\mu = \frac{\sum_{i=1}^{N} x_i}{N}$, where $N$ is the number of elements in the population

Numerical Measures of Central Tendency

The Median

•Middle number when observations are arranged in order

•Median denoted by $m$

•Identified as the $\frac{n+1}{2}$th observation if $n$ is odd, and the mean of the $\frac{n}{2}$th and $\left(\frac{n}{2}+1\right)$th observations if $n$ is even
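The odd/even rule for the median translates directly into code. Below is a minimal sketch with a hypothetical `median` helper. Python's `statistics.median` follows the same rule.

```python
def median(values):
    """Hypothetical helper: middle of the ordered data; mean of the two middle values if n is even."""
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("median requires at least one observation")
    mid = n // 2              # 0-based index of the ((n + 1) / 2)th value when n is odd
    if n % 2 == 1:
        return ordered[mid]
    # n even: average the (n/2)th and (n/2 + 1)th observations (0-based mid - 1 and mid)
    return (ordered[mid - 1] + ordered[mid]) / 2


print(median([1, 3, 5, 6, 8, 8, 9, 11, 12]))  # → 8
print(median([1, 2, 3, 4]))                   # → 2.5
```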

Numerical Measures of Central Tendency

The Mode

•The most frequently occurring value in the data set

•Data set can be multimodal – have more than one mode

•Data displayed in a histogram will have a modal class – the class with the largest frequency
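Python's standard library returns every mode with `statistics.multimode` (Python 3.8+). For a histogram, the modal class is simply the class with the largest frequency. The helper `modal_classes` and the class counts below are hypothetical.

```python
from statistics import multimode


def modal_classes(class_counts):
    """Hypothetical helper: class label(s) with the largest frequency."""
    top = max(class_counts.values())
    return [label for label, count in class_counts.items() if count == top]


print(multimode([1, 3, 5, 6, 8, 8, 9, 11, 12]))  # → [8]
print(multimode([2, 2, 4, 7, 7, 9]))             # → [2, 7]  (bimodal)

# Hypothetical histogram class frequencies
print(modal_classes({"5-7": 12, "7-9": 20, "9-11": 12}))  # → ['7-9']
```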

Numerical Measures of Central Tendency

The data set: 1 3 5 6 8 8 9 11 12

Mean: $\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n} = \frac{1 + 3 + 5 + 6 + 8 + 8 + 9 + 11 + 12}{9} = \frac{63}{9} = 7$

Median is the $\frac{n+1}{2}$th, or 5th, observation: 8

Mode is 8
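Python's standard `statistics` module offers a quick cross-check of the worked example.

```python
import statistics

data = [1, 3, 5, 6, 8, 8, 9, 11, 12]  # data set from the slide

print(statistics.mean(data))    # → 7
print(statistics.median(data))  # → 8
print(statistics.mode(data))    # → 8
```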

Numerical Measures of Variability

•Variability – the spread of the data across possible values

•3 commonly used measures of variability

–Range

–Variance

–Standard Deviation

Numerical Measures of Variability

The Range

•Largest measurement minus the smallest measurement

•Loses sensitivity when data sets are large

These 2 distributions have the same range. How much does the range tell you about the data variability?
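Two hypothetical data sets make the point concrete. Both have a range of 10, yet their standard deviations (covered on the following slides) differ. The data and the helper `data_range` are for illustration only.

```python
import statistics


def data_range(values):
    """Hypothetical helper: largest measurement minus the smallest measurement."""
    return max(values) - min(values)


# Two hypothetical data sets with the same range but different spread
clustered = [0, 5, 5, 5, 5, 5, 10]     # most values at the centre
spread_out = [0, 0, 0, 5, 10, 10, 10]  # most values at the extremes

print(data_range(clustered), data_range(spread_out))  # → 10 10
print(round(statistics.stdev(clustered), 2),
      round(statistics.stdev(spread_out), 2))         # → 2.89 5.0
```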

Numerical Measures of Variability

The Sample Variance ($s^2$)

•The sum of the squared deviations from the mean divided by $(n-1)$; expressed in squared units

•Why square the deviations? The sum of the deviations from the mean is always zero

$s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$
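The formula translates directly into code. This sketch applies it to the data set from the central-tendency example (1 3 5 6 8 8 9 11 12, mean 7). It also shows that the raw deviations sum to zero. `sample_variance` is a hypothetical helper name.

```python
def sample_variance(values):
    """Hypothetical helper: sum of squared deviations from the mean, divided by (n - 1)."""
    n = len(values)
    if n < 2:
        raise ValueError("sample variance needs at least two observations")
    x_bar = sum(values) / n
    return sum((x - x_bar) ** 2 for x in values) / (n - 1)


data = [1, 3, 5, 6, 8, 8, 9, 11, 12]
x_bar = sum(data) / len(data)            # 7.0
print(sum(x - x_bar for x in data))      # → 0.0  (raw deviations cancel out)
print(sample_variance(data))             # → 13.0 (squared deviations 104, divided by 8)
```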

Numerical Measures of Variability

The Sample Standard Deviation ($s$)

•The positive square root of the sample variance

•Expressed in the original units of measurement

$s = \sqrt{s^2} = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}}$

Numerical Measures of Variability

Samples and Populations – Notation

                      Sample     Population

Variance              $s^2$      $\sigma^2$

Standard Deviation    $s$        $\sigma$
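Python's `statistics` module provides both versions. `variance` and `stdev` divide by n - 1 (sample), while `pvariance` and `pstdev` divide by N (population). The sketch below uses the same nine values. It treats them first as a sample and then, hypothetically, as an entire population.

```python
import statistics

data = [1, 3, 5, 6, 8, 8, 9, 11, 12]  # data set from the central-tendency example

s2 = statistics.variance(data)        # sample variance s^2: divides by n - 1
s = statistics.stdev(data)            # sample standard deviation s
sigma2 = statistics.pvariance(data)   # population variance sigma^2: divides by N
sigma = statistics.pstdev(data)       # population standard deviation sigma

print(s2, round(s, 3))
print(round(sigma2, 3), round(sigma, 3))
```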

Interpreting the Standard Deviation

How many observations fit within ± k standard deviations of the mean?

Interval                                 Chebyshev’s Rule         Empirical Rule

$\bar{x} \pm s$ or $\mu \pm \sigma$      No useful information    Approximately 68%

$\bar{x} \pm 2s$ or $\mu \pm 2\sigma$    At least 75%             Approximately 95%

$\bar{x} \pm 3s$ or $\mu \pm 3\sigma$    At least 8/9             Approximately 99.7%

Chebyshev’s Rule applies to any data set; the Empirical Rule applies to mound-shaped, roughly symmetric distributions.
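Chebyshev's bounds in the table come from the general formula 1 - 1/k² for k > 1: k = 2 gives 3/4 and k = 3 gives 8/9. The small sketch below tabulates both rules. `chebyshev_lower_bound` is a hypothetical helper name. The Empirical Rule values are the approximations from the table.

```python
def chebyshev_lower_bound(k):
    """Hypothetical helper: Chebyshev's Rule says at least 1 - 1/k^2 of ANY data set
    lies within k standard deviations of the mean. For k <= 1 the bound is 0."""
    if k <= 1:
        return 0.0   # no useful information
    return 1 - 1 / k ** 2


# Empirical Rule: approximate, and only for mound-shaped distributions
EMPIRICAL_RULE = {1: 0.68, 2: 0.95, 3: 0.997}

for k in (1, 2, 3):
    print(f"±{k}s  Chebyshev: at least {chebyshev_lower_bound(k):.3f}   "
          f"Empirical: about {EMPIRICAL_RULE[k]}")
```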

Interpreting the Standard Deviation

You have purchased compact fluorescent light bulbs for your home. The average life length is 500 hours, the standard deviation is 24 hours, and the frequency distribution for life length is mound-shaped. One of your bulbs burns out at 450 hours. Would you send the bulb back for a refund?

Interval            Range        % of observations included    % of observations excluded

$\bar{x} \pm s$     476 - 524    Approximately 68%             Approximately 32%

$\bar{x} \pm 2s$    452 - 548    Approximately 95%             Approximately 5%

$\bar{x} \pm 3s$    428 - 572    Approximately 99.7%           Approximately 0.3%
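A few lines of arithmetic check the interval endpoints and show how far the 450-hour bulb lies from the mean, measured in standard deviations. 450 hours falls just outside the 452 - 548 interval. By the Empirical Rule, that interval excludes only about 5% of bulbs, counting both tails.

```python
mean_life, sd = 500, 24   # hours, from the light-bulb example
burnout = 450             # hours

# The k-standard-deviation intervals from the table
for k in (1, 2, 3):
    print(f"mean ± {k}s: {mean_life - k * sd} to {mean_life + k * sd} hours")

# How many standard deviations below the mean the failed bulb is
distance_in_sds = (burnout - mean_life) / sd
print(round(distance_in_sds, 2))  # → -2.08
```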

Numerical Measures of Relative Standing

Descriptive measures of the relationship of a measurement to the rest of the data

Common measures:

• percentile ranking or percentile score

• z-score

Numerical Measures of Relative Standing

Percentile rankings make use of the pth percentile

The median is an example of a percentile

The median is the 50th percentile – 50% of observations lie above it, and 50% lie below it

For any p, the pth percentile has p% of the measurements lying below it, and (100 - p)% above it
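Software and textbooks compute percentiles in several slightly different ways. Below is a minimal sketch of one common percentile-rank convention: count the values below x, add half of any values equal to x, and express the result as a percentage. `percentile_rank` and the scores are hypothetical.

```python
def percentile_rank(values, x):
    """Hypothetical helper: percent of measurements below x, counting ties as half below."""
    below = sum(1 for v in values if v < x)
    equal = sum(1 for v in values if v == x)
    return 100 * (below + 0.5 * equal) / len(values)


scores = [2, 4, 6, 8, 10]           # hypothetical measurements
print(percentile_rank(scores, 6))   # → 50.0 (the median)
print(percentile_rank(scores, 10))  # → 90.0
```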

Numerical Measures of Relative Standing

z-score – the distance between a measurement x and the mean, expressed in standard units (standard deviations)

Use of standard units allows comparison across data sets

$z = \frac{x - \mu}{\sigma}$ (population)    $z = \frac{x - \bar{x}}{s}$ (sample)
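Because a z-score is measured in standard deviations, it lets you compare scores from different data sets. The sketch below uses hypothetical exam results. A 90 on the second exam sits 2 standard deviations above its mean. It therefore stands out more than an 85 that sits 1.5 standard deviations above a different mean.

```python
def z_score(x, mean, sd):
    """Hypothetical helper: distance of x from the mean, in standard deviations."""
    return (x - mean) / sd


# Hypothetical exam results, not from the slides
first_exam = z_score(85, mean=70, sd=10)   # → 1.5
second_exam = z_score(90, mean=80, sd=5)   # → 2.0
print(first_exam, second_exam)
```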

Numerical Measures of Relative Standing

More on z-scores

z-scores follow the Empirical Rule for mound-shaped distributions: about 68% of z-scores lie between -1 and 1, about 95% between -2 and 2, and about 99.7% between -3 and 3


Methods for Detecting Outliers

Outlier – an observation that is unusually large or small relative to the data values being described

Causes

• Invalid measurement

• Misclassified measurement

• A rare (chance) event

2 detection methods

• Box Plots

• z-scores

Methods for Detecting Outliers

Box Plots

• based on quartiles, values that divide the data set into 4 groups

• Lower Quartile $Q_L$ – 25th percentile

• Middle Quartile – median

• Upper Quartile $Q_U$ – 75th percentile

• Interquartile Range (IQR) = $Q_U - Q_L$
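Quartile conventions vary between texts and software, so results may differ slightly from a hand calculation. This sketch uses Python's `statistics.quantiles` (default `method="exclusive"`, Python 3.8+) on the nine-value example data.

```python
import statistics

data = [1, 3, 5, 6, 8, 8, 9, 11, 12]  # data set from the central-tendency example

q_l, median, q_u = statistics.quantiles(data, n=4)  # default method="exclusive"
iqr = q_u - q_l                                      # interquartile range
print(q_l, median, q_u, iqr)  # → 4.0 8.0 10.0 6.0
```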

Methods for Detecting Outliers

Box Plots

Not on plot – inner and outer fences, which determine potential outliers

[Figure: box plot labeling $Q_L$ (hinge), the median, $Q_U$ (hinge), the whiskers, and a potential outlier]

Methods for Detecting Outliers

Rules of thumb

•Box Plots

–measurements between the inner and outer fences are suspect

–measurements beyond the outer fences are highly suspect

•z-scores

–z-scores greater than 3 in absolute value in mound-shaped distributions (greater than 2 in highly skewed distributions) are considered outliers
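The slides do not give fence formulas. The sketch below therefore assumes a common convention: inner fences sit 1.5 × IQR beyond the hinges, and outer fences sit 3 × IQR beyond them. Quartiles come from `statistics.quantiles`. The helpers `classify_by_fences` and `z_outliers` are hypothetical. The data are the earlier nine values plus a hypothetical 30.

```python
import statistics


def classify_by_fences(values):
    """Hypothetical helper: (suspect, highly_suspect) using assumed 1.5*IQR / 3*IQR fences."""
    q_l, _, q_u = statistics.quantiles(values, n=4)  # default method="exclusive"
    iqr = q_u - q_l
    inner = (q_l - 1.5 * iqr, q_u + 1.5 * iqr)
    outer = (q_l - 3.0 * iqr, q_u + 3.0 * iqr)
    suspect, highly_suspect = [], []
    for v in values:
        if v < outer[0] or v > outer[1]:
            highly_suspect.append(v)      # beyond the outer fences
        elif v < inner[0] or v > inner[1]:
            suspect.append(v)             # between inner and outer fences
    return suspect, highly_suspect


def z_outliers(values, cutoff=3.0):
    """Hypothetical helper: values whose |z| exceeds the cutoff (3, or 2 for highly skewed data)."""
    mean, s = statistics.mean(values), statistics.stdev(values)
    return [v for v in values if abs(v - mean) / s > cutoff]


data = [1, 3, 5, 6, 8, 8, 9, 11, 12, 30]   # hypothetical: example data plus 30
print(classify_by_fences(data))
print(z_outliers(data), z_outliers(data, cutoff=2.0))
```

Here 30 falls between the inner and outer fences, so it is suspect. Its z-score is about 2.6: below the cutoff of 3, but above the cutoff of 2 suggested for highly skewed data.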

Graphing Bivariate Relationships

Bivariate relationship – the relationship between two quantitative variables

Graphically represented with the scattergram

The Time Series Plot

Time Series Data – data produced and monitored over time

Graphically represented with the time series plot

[Figures: one time series plot with time on the x axis, one with order on the x axis]

Distorting the Truth with Descriptive Techniques

•Graphical techniques

–Scale manipulation

[Figure: same data, different scales]

Distorting the Truth with Descriptive Techniques

•Graphical techniques

–More scale manipulation

Distorting the Truth with Descriptive Techniques

•Graphical techniques

–More scale manipulation

Distorting the Truth with Descriptive Techniques

•Numerical techniques

–Mismatch of measure of central tendency and distribution shape

[Figures: one distribution where use of the mean overstates the average, and one where use of the mean understates it]
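A quick numerical illustration uses hypothetical, right-skewed income data. One large value pulls the mean well above the median, so the mean overstates the typical value.

```python
import statistics

# Hypothetical right-skewed data: annual incomes in $1000s, one very large value
incomes = [30, 32, 35, 38, 40, 250]

print(round(statistics.mean(incomes), 2))  # → 70.83
print(statistics.median(incomes))          # → 36.5
```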

Distorting the Truth with Descriptive Techniques

•Numerical techniques

–Discussion of central tendency with no information on variability

Which model would you purchase if you knew only the average MPG?

Would knowing the standard deviation affect your choice? Why?

Distorting the Truth with Descriptive Techniques

•Graphical techniques

–Look past the pictures to the data they represent

•Numerical techniques

–Is the measure being used the most appropriate one for the underlying distribution?

–Are you provided with information on both central tendency and variability?

Summary

•Graphical methods for Qualitative Data

–Pie chart

–Bar graph

–Pareto diagram

•Graphical methods for Quantitative Data

–Dot plot

–Stem-and-leaf display

–Histogram

Summary

•Numerical measures of central tendency

–Mean

–Median

–Mode

•Numerical measures of variation

–Range

–Variance

–Standard Deviation

Summary

•Distribution Rules

–Chebyshev’s Rule

–Empirical Rule

•Measures of relative standing

–Percentile scores

–z-scores

•Methods for detecting Outliers

–Box plots

–z-scores

Summary

•Method for graphing the relationship between two quantitative variables

–Scatterplot