Copyright © The McGraw-Hill Companies, Inc. Permission required for reproduction or display. CHAPTER 3 Data Description
Chap03

Jan 07, 2016

Soulz Zampa

Transcript
7/17/2019 Chap03

http://slidepdf.com/reader/full/chap03-568e4ec61f353 1/38

CHAPTER 3

Data Description

Objectives

Summarize data using measures of central tendency, such as the mean, median, mode, and midrange.

Describe data using the measures of variation, such as the range, variance, and standard deviation.

Identify the position of a data value in a data set using various measures of position, such as percentiles, deciles, and quartiles.

Objectives (cont’d.)

Use the techniques of exploratory data analysis, including stem-and-leaf plots, boxplots, and five-number summaries, to discover various aspects of data.

Introduction

Statistical methods can be used to summarize data.

Measures of average are also called measures of central tendency and include the mean, median, mode, and midrange.

Measures that determine the spread of data values are called measures of variation or measures of dispersion and include the range, variance, and standard deviation.

Introduction (cont’d.)

Measures of position tell where a specific data value falls within the data set or its relative position in comparison with other data values.

The most common measures of position are percentiles, deciles, and quartiles.

Introduction (cont’d.)

The measures of central tendency, variation, and position are part of what is called traditional statistics. These measures are typically used to confirm conjectures about the data.

Introduction (cont’d.)

Another type of statistics is called exploratory data analysis. These techniques include the stem-and-leaf plot, the boxplot, and the five-number summary. They can be used to explore data to see what they show.

Basic Vocabulary

A statistic is a characteristic or measure obtained by using the data values from a sample.

A parameter is a characteristic or measure obtained by using all the data values for a specific population.

When the data in a data set are ordered, the data set is called a data array.

General Rounding Rule

In statistics, the basic rounding rule is that when computations are done in a calculation, rounding should not be done until the final answer is calculated.

Central Tendency: The Mean

The mean, also known as the arithmetic average, is the sum of the values divided by the total number of values.

Rounding rule: the mean should be rounded to one more decimal place than occurs in the raw data.

The type of mean that considers an additional factor is called the weighted mean.
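
The definitions above can be sketched in Python. This is a minimal illustration, not the textbook's own procedure: the data values, the credit-hour weights, and the helper name `weighted_mean` are hypothetical. Rounding is applied only at the end, to one more decimal place than the whole-number raw data.

```python
from statistics import mean


def weighted_mean(values, weights):
    """Sum of value * weight divided by the sum of the weights."""
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("weights must not sum to zero")
    return sum(v * w for v, w in zip(values, weights)) / total_weight


# Hypothetical raw data in whole numbers, so the mean is reported to one decimal place.
ages = [20, 26, 40, 36, 23, 42, 35, 24, 30]
sample_mean = round(mean(ages), 1)  # 276 / 9, rounded only at the end

# Hypothetical grade points weighted by credit hours (the "additional factor").
grade_points = [4, 2, 3, 1]
credits = [3, 3, 4, 2]
gpa = round(weighted_mean(grade_points, credits), 1)  # 32 / 12, rounded at the end

print(sample_mean, gpa)
```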

Central Tendency: The Mean (cont’d.)

One computes the mean by using all the values of the data.

The mean varies less than the median or mode when samples are taken from the same population and all three measures are computed for these samples.

The mean is used in computing other statistics, such as variance.

Central Tendency: The Mean (cont’d.)

The mean for the data set is unique, and not necessarily one of the data values.

The mean cannot be computed for an open-ended frequency distribution.

The mean is affected by extremely high or low values and may not be the appropriate average to use in these situations.

Central Tendency: Median and Mode

The median is the halfway point in a data set. The symbol for the median is MD.

The median is found by arranging the data in order and selecting the middle point.

The value that occurs most often in a data set is called the mode.

For grouped data, the class with the highest frequency is called the modal class.
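
As a rough illustration of these definitions, the sketch below uses Python's standard `statistics` module. The data sets and class boundaries are made up for the example. Note that for an even number of values, `median` averages the two middle values.

```python
from statistics import median, mode

odd_set = [7, 3, 9, 3, 5]   # sorted: 3, 3, 5, 7, 9
even_set = [8, 2, 6, 4]     # sorted: 2, 4, 6, 8

md_odd = median(odd_set)    # the single middle value
md_even = median(even_set)  # average of the two middle values, 4 and 6
most_common = mode(odd_set) # the value that occurs most often

# Hypothetical grouped data: class boundaries mapped to frequencies.
class_freq = {
    "5.5-10.5": 1,
    "10.5-15.5": 2,
    "15.5-20.5": 3,
    "20.5-25.5": 5,
    "25.5-30.5": 4,
}
# The modal class is the class with the highest frequency.
modal_class = max(class_freq, key=class_freq.get)

print(md_odd, md_even, most_common, modal_class)
```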

Central Tendency: The Median

The median is used when one must find the center or middle value of a data set.

The median is used when one must determine whether the data values fall into the upper half or lower half of the distribution.

The median is used to find the average of an open-ended distribution.

The median is affected less than the mean by extremely high or extremely low values.

Central Tendency: The Mode

The mode is used when the most typical case is desired.

The mode is the easiest average to compute.

The mode can be used when the data are nominal, such as religious preference, gender, or political affiliation.

The mode is not always unique. A data set can have more than one mode, or the mode may not exist for a data set.
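
The three cases (one mode, several modes, no mode) can be sketched as follows. The helper `modes` and the sample data are hypothetical, and treating "every value occurs once" as "no mode" is a convention assumed here rather than something a library enforces.

```python
from collections import Counter


def modes(data):
    """Return all values tied for the highest frequency, or [] if no value repeats."""
    if not data:
        return []
    counts = Counter(data)
    top = max(counts.values())
    if top == 1:
        return []  # assumed convention: every value occurs once, so there is no mode
    return sorted(value for value, count in counts.items() if count == top)


one_mode = modes([8, 9, 9, 14, 8, 8, 10])    # 8 occurs three times
two_modes = modes([1, 2, 2, 3, 3])           # 2 and 3 each occur twice
no_mode = modes([4, 5, 6])                   # nothing repeats
nominal = modes(["D", "R", "I", "R", "D"])   # works for nominal (category) data too

print(one_mode, two_modes, no_mode, nominal)
```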

Central Tendency: The Midrange

The midrange is defined as the sum of the lowest and highest values in the data set divided by 2.

The symbol for midrange is MR.
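
A short sketch of the midrange computation follows. The function name and data are illustrative only.

```python
def midrange(data):
    """MR = (lowest value + highest value) / 2."""
    return (min(data) + max(data)) / 2


values = [2, 3, 6, 8, 4, 1]  # hypothetical data
mr = midrange(values)        # (1 + 8) / 2

print(mr)
```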

Central Tendency: The Midrange (cont’d.)

The midrange is easy to compute.

The midrange gives the midpoint.

The midrange is affected by extremely high or low values in a data set.

Distribution Shapes

In a positively skewed or right skewed distribution, the majority of the data values fall to the left of the mean and cluster at the lower end of the distribution.

Distribution Shapes (cont’d.)

In a symmetrical distribution, the data values are evenly distributed on both sides of the mean.

Distribution Shapes (cont’d.)

When the majority of the data values fall to the right of the mean and cluster at the upper end of the distribution, with the tail to the left, the distribution is said to be negatively skewed or left skewed.
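
One informal way to see these shapes in numbers is to count what share of the values lies below the mean. The sketch below is only a rough check on made-up data, not a formal skewness measure, and the helper name `share_below_mean` is an assumption for illustration.

```python
from statistics import mean


def share_below_mean(data):
    """Fraction of the values that are strictly less than the mean."""
    m = mean(data)
    return sum(1 for x in data if x < m) / len(data)


right_skewed = [1, 2, 2, 3, 3, 3, 4, 20]         # long tail to the right
left_skewed = [1, 17, 18, 18, 18, 19, 19, 20]    # long tail to the left

# Most values sit to the left of the mean when the tail is on the right,
# and to the right of the mean when the tail is on the left.
print(share_below_mean(right_skewed))
print(share_below_mean(left_skewed))
```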

The Range

The range is the highest value minus the lowest value in a data set.

The symbol R is used for the range.
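
As a one-line illustration (function name and data are hypothetical):

```python
def data_range(data):
    """R = highest value - lowest value."""
    return max(data) - min(data)


values = [10, 60, 50, 30, 40, 20]  # hypothetical data
r = data_range(values)             # 60 - 10

print(r)
```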

Variance and Standard Deviation

The variance is the average of the squares of the distance each value is from the mean. The symbol for the population variance is σ².

The standard deviation is the square root of the variance. The symbol for the population standard deviation is σ.

Rounding rule: The final answer should be rounded to one more decimal place than the original data.
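
The population formulas can be sketched directly from these definitions. The data are hypothetical, and the hand-written helper is cross-checked against `statistics.pvariance`. Rounding is deferred until the final answers.

```python
from math import sqrt
from statistics import pvariance


def population_variance(data):
    """Average of the squared distances of each value from the mean."""
    mu = sum(data) / len(data)
    return sum((x - mu) ** 2 for x in data) / len(data)


data = [10, 60, 50, 30, 40, 20]  # hypothetical whole-number data
variance = population_variance(data)
std_dev = sqrt(variance)

# The standard library gives the same population variance.
library_variance = pvariance(data)

# Round only the final answers, to one more decimal place than the raw data.
print(round(variance, 1), round(std_dev, 1))
```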

Coefficient of Variation

The coefficient of variation is the standard deviation divided by the mean. The result is expressed as a percentage.

The coefficient of variation is used to compare standard deviations when the units are different for the two variables being compared.
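
A small sketch of such a comparison follows. The figures (units sold versus commissions in dollars) are invented for illustration, and the function name is an assumption.

```python
def coefficient_of_variation(std_dev, mean):
    """Standard deviation divided by the mean, expressed as a percentage."""
    if mean == 0:
        raise ValueError("mean must not be zero")
    return std_dev / mean * 100


# Hypothetical summary figures measured in different units.
cv_sales = coefficient_of_variation(std_dev=5, mean=87)            # units sold
cv_commissions = coefficient_of_variation(std_dev=773, mean=5225)  # dollars

# The larger percentage indicates the more variable quantity.
print(round(cv_sales, 1), round(cv_commissions, 1))
```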

Variance and Standard Deviation

Variances and standard deviations can be used to determine the spread of the data. If the variance or standard deviation is large, the data are more dispersed. This information is useful in comparing two or more data sets to determine which is more variable.

The measures of variance and standard deviation are used to determine the consistency of a variable.

Variance and Standard Deviation (cont’d.)

The variance and standard deviation can be used to estimate the percentage of data values that fall within a specified interval in a distribution.

The variance and standard deviation are used quite often in inferential statistics.

Chebyshev’s Theorem

The proportion of values from a data set that will fall within k standard deviations of the mean will be at least 1 - 1/k², where k is a number greater than 1.

This theorem applies to any distribution regardless of its shape.
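
The bound is easy to evaluate for a given k. The sketch below assumes a hypothetical data set with mean 50 and standard deviation 5. The function names are illustrative.

```python
def chebyshev_lower_bound(k):
    """Minimum proportion of values within k standard deviations of the mean."""
    if k <= 1:
        raise ValueError("k must be greater than 1")
    return 1 - 1 / k**2


def interval(mean, std_dev, k):
    """Endpoints of the interval mean +/- k standard deviations."""
    return (mean - k * std_dev, mean + k * std_dev)


# Hypothetical summary values: mean 50, standard deviation 5.
bound_k2 = chebyshev_lower_bound(2)   # at least 75% of the values ...
range_k2 = interval(50, 5, 2)         # ... fall between 40 and 60
bound_k3 = chebyshev_lower_bound(3)   # at least about 88.9% within 35 to 65

print(bound_k2, range_k2, round(bound_k3, 3))
```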

Empirical Rule for Normal Distributions

Approximately 68% of the data values fall within one standard deviation of the mean.

Approximately 95% of the data values will fall within two standard deviations of the mean.

Approximately 99.7% of the data values will fall within three standard deviations of the mean.
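
These three percentages can be checked against the standard normal curve with `statistics.NormalDist` (available in Python 3.8 and later). The mean of 480 and standard deviation of 90 used for the intervals are hypothetical values chosen for the example.

```python
from statistics import NormalDist


def share_within(k):
    """Proportion of a normal distribution within k standard deviations of the mean."""
    std_normal = NormalDist()  # mean 0, standard deviation 1
    return std_normal.cdf(k) - std_normal.cdf(-k)


def interval(mean, std_dev, k):
    """Endpoints of the interval mean +/- k standard deviations."""
    return (mean - k * std_dev, mean + k * std_dev)


for k in (1, 2, 3):
    low, high = interval(480, 90, k)  # hypothetical mean and standard deviation
    print(f"k={k}: about {share_within(k):.1%} of values between {low} and {high}")
```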

Standard Scores

A standard score or z score is used when direct comparison of raw scores is impossible.

A standard score or z score for a value is obtained by subtracting the mean from the value and dividing the result by the standard deviation.
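
A brief sketch of the calculation, using two made-up tests with different scales, shows how z scores make raw scores comparable:

```python
def z_score(value, mean, std_dev):
    """z = (value - mean) / standard deviation."""
    if std_dev == 0:
        raise ValueError("standard deviation must not be zero")
    return (value - mean) / std_dev


# Hypothetical scores on two tests with different means and spreads.
z_test_a = z_score(65, mean=50, std_dev=10)
z_test_b = z_score(30, mean=25, std_dev=5)

# The higher z score marks the better relative position.
print(z_test_a, z_test_b)
```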

Percentiles

Percentiles are position measures used in educational and health-related fields to indicate the position of an individual in a group.

Percentiles divide the data set into 100 equal parts.

The Pth percentile is a value where P% of the data values are less than or equal to the value.
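
Taking the definition above literally, the percentage of values at or below a given score can be counted directly. This is only a sketch of that wording on made-up scores. Textbooks and software use several slightly different percentile formulas, so results may differ from other methods.

```python
def percent_at_or_below(data, value):
    """Percentage of the data values that are less than or equal to `value`."""
    return 100 * sum(1 for x in data if x <= value) / len(data)


scores = [18, 15, 12, 6, 8, 2, 3, 5, 20, 10]  # hypothetical test scores
p = percent_at_or_below(scores, 12)           # 7 of the 10 scores are <= 12

print(p)
```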

Quartiles and Deciles

Quartiles divide the distribution into four groups. The quartiles are denoted by Q1, Q2, and Q3. Note that Q1 is the same as the 25th percentile; Q2 is the same as the 50th percentile or the median; and Q3 corresponds to the 75th percentile.

Deciles divide the distribution into 10 groups. They are denoted by D1, D2, etc.
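
One way to compute these cut points is `statistics.quantiles` (Python 3.8 and later), sketched below on made-up data. Its default "exclusive" method is only one of several conventions, so hand calculations that follow a different rule may give slightly different quartiles.

```python
from statistics import median, quantiles

data = list(range(1, 12))  # hypothetical data: 1, 2, ..., 11

q1, q2, q3 = quantiles(data, n=4)  # three cut points divide the data into four groups
deciles = quantiles(data, n=10)    # nine cut points D1..D9 divide it into ten groups

# Q2 and D5 both coincide with the median.
print(q1, q2, q3)
print(deciles[4], median(data))
```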

Outliers

An outlier is an extremely high or an extremely low data value when compared with the rest of the data values.

Outliers can be the result of measurement or observational error.

When a distribution is normal or bell-shaped, data values that are beyond three standard deviations of the mean can be considered suspected outliers.
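
The three-standard-deviation check can be sketched as below. The readings are invented, the function name is hypothetical, and the use of the population standard deviation (`pstdev`) is an assumption made for this example.

```python
from statistics import mean, pstdev


def suspected_outliers(data, limit=3.0):
    """Values lying more than `limit` standard deviations from the mean."""
    mu = mean(data)
    sigma = pstdev(data)  # assumption: population standard deviation
    if sigma == 0:
        return []
    return [x for x in data if abs(x - mu) / sigma > limit]


# Hypothetical readings: twenty values near 50 plus one extreme value.
readings = [48, 49, 50, 51, 52] * 4 + [90]
flagged = suspected_outliers(readings)

print(flagged)
```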

Exploratory Data Analysis

The purpose of exploratory data analysis is to examine data in order to find out what information can be discovered. For example:

Are there any gaps in the data?

Can any patterns be discerned?

Stem-and-Leaf Plots

A stem-and-leaf plot is a data plot that uses part of a data value as the stem and part of the data value as the leaf to form groups or classes.

It has the advantage over a grouped frequency distribution of retaining the actual data while showing them in graphic form.
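
A minimal sketch of building such a plot follows, assuming non-negative whole numbers with the tens digit as the stem and the ones digit as the leaf. The data and function names are hypothetical. Empty stems are kept so that gaps in the data remain visible.

```python
def stem_and_leaf(data):
    """Group non-negative integers by stem (tens) and leaf (ones digit)."""
    ordered = sorted(data)
    stems = range(ordered[0] // 10, ordered[-1] // 10 + 1)
    plot = {stem: [] for stem in stems}  # keep empty stems so gaps show up
    for value in ordered:
        stem, leaf = divmod(value, 10)
        plot[stem].append(leaf)
    return plot


def render(plot):
    """Format the plot as text, one stem per line."""
    return "\n".join(
        f"{stem:>2} | {' '.join(str(leaf) for leaf in leaves)}"
        for stem, leaves in plot.items()
    )


values = [12, 15, 21, 23, 23, 28, 41, 44, 47, 50]  # hypothetical data
plot = stem_and_leaf(values)
print(render(plot))
```

Every original value can be read back from the plot (stem 2 with leaves 1, 3, 3, 8 gives 21, 23, 23, 28), and the empty stem 3 shows a gap in the thirties.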

Boxplots and Five-Number Summaries

Boxplots are graphical representations of a five-number summary of a data set. The five specific values that make up a five-number summary are:

The lowest value of the data set (minimum)

Q1 (or 25th percentile)

The median (or 50th percentile)

Q3 (or 75th percentile)

The highest value of the data set (maximum)
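
The five values listed above can be gathered as in the sketch below. The data are made up, and the quartiles come from `statistics.quantiles` with its default method, which is one convention among several and may not match every hand calculation.

```python
from statistics import quantiles


def five_number_summary(data):
    """Return (minimum, Q1, median, Q3, maximum)."""
    q1, q2, q3 = quantiles(data, n=4)  # default "exclusive" method
    return (min(data), q1, q2, q3, max(data))


values = [5, 7, 8, 12, 13, 14, 18, 21, 23, 23, 27]  # hypothetical data
summary = five_number_summary(values)

print(summary)
```

A boxplot would draw the box from Q1 to Q3 with a line at the median and whiskers out to the minimum and maximum.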

Summary

Some basic ways to summarize data include measures of central tendency, measures of variation or dispersion, and measures of position.

The three most commonly used measures of central tendency are the mean, median, and mode. The midrange is also used to represent an average.

Summary (cont’d.)

The three most commonly used measurements of variation are the range, variance, and standard deviation.

The most common measures of position are percentiles, quartiles, and deciles.

Data values are distributed according to Chebyshev’s theorem and, in special cases, the empirical rule.

Summary (cont’d.)

The coefficient of variation is used to describe the standard deviation in relationship to the mean.

These methods are commonly called traditional statistics.

Other methods, such as the stem-and-leaf plot, the boxplot, and the five-number summary, are part of exploratory data analysis; they are used to examine data to see what they reveal.

Conclusions

By combining all of these techniques, the student is now able to collect, organize, summarize, and present data.