Copyright © The McGraw-Hill Companies, Inc. Permission required for reproduction or display.
CHAPTER 3
Data Description
Objectives
Summarize data using measures of central tendency, such as the mean, median, mode,
and midrange.
Describe data using the measures of variation, such as the range, variance, and
standard deviation.
Identify the position of a data value in a dataset using various measures of position, such
as percentiles, deciles, and quartiles.
Objectives (cont’d.)
Use the techniques of exploratory data analysis, including stem-and-leaf plots, boxplots, and five-number summaries, to discover various aspects of data.
Introduction
Statistical methods can be used to summarize data.
Measures of averages are also called measures of central tendency and include the mean, median, mode, and midrange.
Measures that determine the spread of data values are called measures of variation or measures of dispersion and include the range, variance, and standard deviation.
Introduction (cont’d.)
Measures of position tell where a specific data value falls within the data set or its relative
position in comparison with other data
values.
The most common measures of position are
percentiles, deciles, and quartiles.
Introduction (cont’d.)
The measures of central tendency, variation, and position are part of what is called traditional statistics. These measures are typically used to confirm conjectures about the data.
Introduction (cont’d.)
Another type of statistics is called exploratory data analysis. These techniques include the stem-and-leaf plot, the boxplot, and the five-number summary. They can be used to explore data to see what they show.
Basic Vocabulary
A statistic is a characteristic or measure obtained by using the data values from a sample.
A parameter is a characteristic or measure obtained by using all the data values for a specific population.
When the data in a data set are ordered, the result is called a data array.
General Rounding Rule
In statistics the basic rounding rule is that when computations are done in the calculation, rounding should not be done until the final answer is calculated.
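A minimal sketch of why this rule matters, using made-up numbers. The data values and the idea of projecting a total for 30 observations are assumptions for illustration only; they do not come from the chapter.

```python
# Hypothetical data and scenario, chosen only to illustrate the rounding rule.
values = [2, 3, 3]
n_future = 30  # assumed number of observations to project a total for

mean = sum(values) / len(values)  # 2.666..., kept at full precision

# Recommended: carry full precision and round only the final answer.
total_late = round(mean * n_future, 1)

# Not recommended: round the intermediate mean first, then keep computing.
total_early = round(round(mean, 1) * n_future, 1)

print(total_late, total_early)
```

The two totals differ by a full unit, even though the only change is when the rounding happened.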
Central Tendency: The Mean
The mean, also known as the arithmetic average, is the sum of the values divided by the total number of values.
Rounding rule: the mean should be rounded to one more decimal place than occurs in the raw data.
The type of mean that considers an additional factor is called the weighted mean.
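The two definitions above can be sketched as follows. The slide does not give a formula for the weighted mean, so the usual form (sum of weight times value, divided by the sum of the weights) is assumed here, and the grade and credit-hour numbers are hypothetical.

```python
def mean(values):
    """Arithmetic average: sum of the values divided by the number of values."""
    return sum(values) / len(values)


def weighted_mean(values, weights):
    """Assumed standard form: sum(w * x) / sum(w)."""
    return sum(w * x for x, w in zip(values, weights)) / sum(weights)


grade_points = [4, 3, 2]  # hypothetical grade points for three courses
credit_hours = [3, 3, 4]  # hypothetical weights (credit hours)

plain = mean(grade_points)
weighted = weighted_mean(grade_points, credit_hours)

# Raw data are whole numbers, so the rounding rule gives one decimal place.
print(round(plain, 1), round(weighted, 1))
```

Here the weighted mean is lower than the plain mean because the lowest grade carries the most credit hours.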
Central Tendency: The Mean (cont’d.)
One computes the mean by using all the values of the data.
The mean varies less than the median or
mode when samples are taken from the same
population and all three measures are
computed for these samples.
The mean is used in computing other
statistics, such as variance.
Central Tendency: The Mean (cont’d.)
The mean for the data set is unique, and not necessarily one of the data values.
The mean cannot be computed for an open-ended frequency distribution.
The mean is affected by extremely high or low values and may not be the appropriate average to use in these situations.
Central Tendency: Median and Mode
The median is the halfway point in a data set. The symbol for the median is MD.
The median is found by arranging the data in order and selecting the middle point.
The value that occurs most often in a data set is called the mode.
For grouped data, the class with the highest frequency is called the modal class.
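A short sketch of finding the median and the modal class. For an even number of values the slide does not say how to pick the middle point; averaging the two middle values is the usual convention and is assumed here. The data and class labels are hypothetical.

```python
def median(values):
    """Arrange the data in order and select the middle point (MD)."""
    data = sorted(values)
    n = len(data)
    mid = n // 2
    if n % 2 == 1:
        return data[mid]
    # Assumed convention for an even count: average the two middle values.
    return (data[mid - 1] + data[mid]) / 2


md_odd = median([7, 1, 5])
md_even = median([7, 1, 5, 3])

# Hypothetical grouped data: class label -> frequency.
frequencies = {"10-19": 4, "20-29": 9, "30-39": 6}
modal_class = max(frequencies, key=frequencies.get)

print(md_odd, md_even, modal_class)
```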
Central Tendency: The Median
The median is used when one must find the center or middle value of a data set.
The median is used when one must determine whether the data values fall into the upper half or lower half of the distribution.
The median is used to find the average of an open-ended distribution.
The median is affected less than the mean by extremely high or extremely low values.
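The last point can be checked with a small experiment. The salary figures below are hypothetical; the sketch uses Python's standard `statistics` module and adds a single extremely high value to see how each measure reacts.

```python
import statistics

# Hypothetical salaries in thousands of dollars.
salaries = [30, 32, 35, 38, 40]
with_extreme = salaries + [400]  # add one extremely high value

mean_before = statistics.mean(salaries)
median_before = statistics.median(salaries)
mean_after = statistics.mean(with_extreme)
median_after = statistics.median(with_extreme)

print(mean_before, median_before)
print(round(mean_after, 1), median_after)
```

The mean moves far from the bulk of the data, while the median shifts only slightly.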
Central Tendency: The Mode
The mode is used when the most typical case is desired.
The mode is the easiest average to compute.
The mode can be used when the data are
nominal, such as religious preference, gender,
or political affiliation.
The mode is not always unique. A data set can
have more than one mode, or the mode may
not exist for a data set.
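A sketch of the three situations described above: one mode, more than one mode, and no mode. The convention that a data set has no mode when no value occurs more than once is an assumption here, and the function name `modes` is made up for illustration.

```python
from collections import Counter


def modes(values):
    """Return every value tied for the highest frequency.

    Assumed convention: if no value occurs more than once,
    the data set has no mode and an empty list is returned.
    """
    counts = Counter(values)
    highest = max(counts.values())
    if highest == 1:
        return []
    return [value for value, count in counts.items() if count == highest]


one_mode = modes([1, 2, 2, 3])
two_modes = modes([1, 1, 2, 2, 3])
no_mode = modes([1, 2, 3])
nominal = modes(["A", "B", "A"])  # works for nominal data too

print(one_mode, two_modes, no_mode, nominal)
```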
Central Tendency: The Midrange
The midrange is defined as the sum of the lowest and highest values in the data set divided by 2.
The symbol for midrange is MR.
Central Tendency: The Midrange (cont’d.)
The midrange is easy to compute.
The midrange gives the midpoint.
The midrange is affected by extremely high or low values in a data set.
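A minimal sketch of the midrange and its sensitivity to an extreme value. The data are hypothetical.

```python
def midrange(values):
    """MR = (lowest value + highest value) / 2."""
    return (min(values) + max(values)) / 2


data = [2, 3, 6, 8, 4, 1]  # hypothetical data
mr = midrange(data)
mr_with_extreme = midrange(data + [100])  # one extremely high value added

print(mr, mr_with_extreme)
```

Because only the two end values are used, a single extreme value moves the midrange a long way.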
Distribution Shapes
In a positively skewed or right-skewed distribution, the majority of the data values fall to the left of the mean and cluster at the lower end of the distribution.
Distribution Shapes (cont’d.)
In a symmetric distribution, the data values are evenly distributed on both sides of the mean.
Distribution Shapes (cont’d.)
When the majority of the data values fall to the right of the mean and cluster at the upper end of the distribution, with the tail to the left, the distribution is said to be negatively skewed or left-skewed.
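The three shapes can be illustrated by counting what share of the values falls on each side of the mean. This is only a rough sketch on small hypothetical data sets, not a formal test of skewness, and the function name is made up.

```python
def share_on_each_side(values):
    """Return (fraction below the mean, fraction above the mean)."""
    m = sum(values) / len(values)
    below = sum(1 for x in values if x < m) / len(values)
    above = sum(1 for x in values if x > m) / len(values)
    return below, above


right_skewed = [1, 2, 2, 3, 3, 3, 4, 20]       # long tail to the right
symmetric = [1, 2, 3, 4, 5]
left_skewed = [1, 17, 18, 18, 19, 19, 19, 20]  # long tail to the left

right_shares = share_on_each_side(right_skewed)
symmetric_shares = share_on_each_side(symmetric)
left_shares = share_on_each_side(left_skewed)

print(right_shares, symmetric_shares, left_shares)
```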
The Range
The range is the highest value minus the lowest value in a data set.
The symbol R is used for the range.
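As a quick sketch on hypothetical data, the range takes one line to compute:

```python
def data_range(values):
    """R = highest value - lowest value."""
    return max(values) - min(values)


data = [10, 60, 50, 30, 40, 20]  # hypothetical data
R = data_range(data)

print(R)
```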
Variance and Standard Deviation
The variance is the average of the squares of the distance each value is from the mean. The symbol for the population variance is σ².
The standard deviation is the square root of the variance. The symbol for the population standard deviation is σ.
Rounding rule: The final answer should be rounded to one more decimal place than the original data.
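A minimal sketch of these definitions for a population, using hypothetical data. The function names are made up; Python's `statistics.pvariance` and `statistics.pstdev` compute the same quantities.

```python
import math


def population_variance(values):
    """Average of the squared distances of each value from the mean."""
    mu = sum(values) / len(values)
    return sum((x - mu) ** 2 for x in values) / len(values)


def population_std_dev(values):
    """Square root of the population variance."""
    return math.sqrt(population_variance(values))


data = [10, 60, 50, 30, 40, 20]  # hypothetical population
variance = population_variance(data)
std_dev = population_std_dev(data)

# The data are whole numbers, so round the final answers to one decimal place.
print(round(variance, 1), round(std_dev, 1))
```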
Coefficient of Variation
The coefficient of variation is the standard deviation divided by the mean. The result is expressed as a percentage.
The coefficient of variation is used to compare
standard deviations when the units are
different for the two variables being compared.
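A sketch of that comparison with hypothetical summary figures: heights measured in centimeters and weights measured in kilograms. All numbers are assumed for illustration.

```python
def coefficient_of_variation(std_dev, mean):
    """Standard deviation divided by the mean, expressed as a percentage."""
    return std_dev / mean * 100


# Hypothetical summary statistics in different units.
cv_height = coefficient_of_variation(std_dev=8.5, mean=170)  # centimeters
cv_weight = coefficient_of_variation(std_dev=10.5, mean=70)  # kilograms

print(round(cv_height, 1), round(cv_weight, 1))
```

In this made-up case the weights are relatively more variable than the heights, a comparison the raw standard deviations could not support because their units differ.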
Variance and Standard Deviation
Variances and standard deviations can be used to determine the spread of the data. If the variance or standard deviation is large, the data are more dispersed. The information is useful in comparing two or more data sets to determine which is more variable.
The measures of variance and standard deviation are used to determine the consistency of a variable.
Variance and Standard Deviation (cont’d.)
The variance and standard deviation can be used to estimate the percentage of data values that fall within a specified interval in a distribution.
The variance and standard deviation are used
quite often in inferential statistics.
Chebyshev’s Theorem
The proportion of values from a data set that will fall within k standard deviations of the mean will be at least 1 − 1/k², where k is a number greater than 1.
This theorem applies to any distribution
regardless of its shape.
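A sketch of the bound, checked against a small and deliberately skewed hypothetical data set. The function names are made up, and the population standard deviation from Python's `statistics` module is used.

```python
import statistics


def chebyshev_bound(k):
    """Minimum proportion of values within k standard deviations: 1 - 1/k^2."""
    if k <= 1:
        raise ValueError("k must be greater than 1")
    return 1 - 1 / k**2


def share_within(values, k):
    """Observed proportion of values within k standard deviations of the mean."""
    mu = statistics.mean(values)
    sigma = statistics.pstdev(values)
    inside = [x for x in values if abs(x - mu) <= k * sigma]
    return len(inside) / len(values)


skewed = [1, 2, 2, 3, 3, 3, 4, 20]  # hypothetical, clearly not bell-shaped

bound = chebyshev_bound(2)
observed = share_within(skewed, 2)

print(bound, observed)
```

The observed share is at least as large as the bound, as the theorem guarantees for any shape.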
Empirical Rule for Normal Distributions
Approximately 68% of the data values will fall within one standard deviation of the mean.
Approximately 95% of the data values will fall within two standard deviations of the mean.
Approximately 99.7% of the data values will fall within three standard deviations of the mean.
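These three percentages can be recovered from the normal distribution itself. The sketch below relies on the standard fact that the proportion of a normal distribution within k standard deviations of the mean equals erf(k/√2); the function name is made up.

```python
import math


def normal_share_within(k):
    """Proportion of a normal distribution within k standard deviations of the mean."""
    return math.erf(k / math.sqrt(2))


one_sd = normal_share_within(1)
two_sd = normal_share_within(2)
three_sd = normal_share_within(3)

print(round(one_sd * 100, 1), round(two_sd * 100, 1), round(three_sd * 100, 1))
```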
Standard Scores
A standard score or z score is used when direct comparison of raw scores is impossible.
A standard score or z score for a value is obtained by subtracting the mean from the value and dividing the result by the standard deviation.
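A sketch of using z scores to compare results from two tests with different scales. The scores, means, and standard deviations are hypothetical.

```python
def z_score(value, mean, std_dev):
    """Subtract the mean from the value and divide by the standard deviation."""
    return (value - mean) / std_dev


# Hypothetical results on two tests with different scales.
z_test_a = z_score(value=65, mean=50, std_dev=10)
z_test_b = z_score(value=30, mean=25, std_dev=5)

print(z_test_a, z_test_b)
```

The raw scores cannot be compared directly, but the z scores show that the first result sits further above its mean in standard deviation units.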
Percentiles
Percentiles are position measures used in educational and health-related fields to indicate the position of an individual in a group.
Percentiles divide the data set into 100 equal parts.
The Pth percentile is a value where P% of the data values are less than or equal to the value.
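A sketch that follows the definition above literally. Textbooks and software use several slightly different percentile conventions, so treat this as one possible reading rather than the definitive method; the function names and data are hypothetical, and `p` is assumed to be a whole number from 1 to 100.

```python
def percent_at_or_below(values, x):
    """Percentage of data values that are less than or equal to x."""
    return sum(1 for v in values if v <= x) / len(values) * 100


def percentile_value(values, p):
    """Smallest data value with at least p% of the data at or below it."""
    data = sorted(values)
    # Integer ceiling of p * n / 100, avoiding floating-point error.
    position = (p * len(data) + 99) // 100
    return data[max(position, 1) - 1]


data = [2, 3, 5, 6, 8, 10, 12, 15, 18, 20]  # hypothetical data

pct = percent_at_or_below(data, 12)
p70 = percentile_value(data, 70)
p25 = percentile_value(data, 25)

print(pct, p70, p25)
```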
Quartiles and Deciles
Quartiles divide the distribution into four groups. The quartiles are denoted by Q1, Q2, and Q3. Note that Q1 is the same as the 25th percentile; Q2 is the same as the 50th percentile or the median; and Q3 corresponds to the 75th percentile.
Deciles divide the distribution into 10 groups. They are denoted by D1, D2, etc.
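A sketch using `statistics.quantiles` from the Python standard library (Python 3.8 or later). Its default interpolation method is only one of several conventions, so hand methods may give slightly different values for Q1 and Q3. The data are hypothetical.

```python
import statistics

data = [2, 3, 5, 6, 8, 10, 12, 15, 18, 20, 25]  # hypothetical data

# Three cut points divide the data into four groups: Q1, Q2, Q3.
quartiles = statistics.quantiles(data, n=4)

# Nine cut points divide the data into ten groups: D1 through D9.
deciles = statistics.quantiles(data, n=10)

q1, q2, q3 = quartiles
d5 = deciles[4]

print(quartiles)
print(q2 == statistics.median(data), d5 == q2)
```

Q2 and D5 both coincide with the median, matching the note above about the 50th percentile.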
Outliers
An outlier is an extremely high or an extremely low data value when compared with the rest of the data values.
Outliers can be the result of measurement or observational error.
When a distribution is normal or bell-shaped,
data values that are beyond three standard
deviations of the mean can be considered
suspected outliers.
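A sketch of the three-standard-deviation check on hypothetical data. The function name is made up, the population standard deviation is assumed, and the rule itself presumes a roughly bell-shaped distribution.

```python
import statistics


def suspected_outliers(values, k=3):
    """Values lying more than k standard deviations from the mean."""
    mu = statistics.mean(values)
    sigma = statistics.pstdev(values)
    return [x for x in values if abs(x - mu) > k * sigma]


# Hypothetical data: twenty values near 50 plus one extreme value.
data = [48, 49, 50, 51, 52] * 4 + [100]
flagged = suspected_outliers(data)

print(flagged)
```

A flagged value is only a suspect; whether it is an error or a genuine observation still has to be judged from the context.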
Exploratory Data Analysis
The purpose of exploratory data analysis is to examine data in order to find out what information can be discovered. For example:
Are there any gaps in the data?
Can any patterns be discerned?
Stem-and-Leaf Plots
A stem-and-leaf plot is a data plot that uses part of a data value as the stem and part of the data value as the leaf to form groups or classes.
It has the advantage over a grouped frequency distribution of retaining the actual data while showing them in graphic form.
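A minimal sketch of building such a plot as text, assuming non-negative whole numbers with the tens digit(s) as the stem and the units digit as the leaf. The function name and data are hypothetical.

```python
from collections import defaultdict


def stem_and_leaf(values):
    """Return the plot as a list of text lines, one per stem.

    Assumes a non-empty list of non-negative integers.
    """
    groups = defaultdict(list)
    for v in sorted(values):
        stem, leaf = divmod(v, 10)
        groups[stem].append(leaf)
    lines = []
    # Include empty stems so that gaps in the data stay visible.
    for stem in range(min(groups), max(groups) + 1):
        leaves = " ".join(str(leaf) for leaf in groups.get(stem, []))
        lines.append(f"{stem} | {leaves}".rstrip())
    return lines


plot = stem_and_leaf([23, 15, 41, 12, 28, 23, 45, 21])
print("\n".join(plot))
```

Every original value can be read back from the plot, and the empty stem for the 30s shows a gap in the data.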
Boxplots and Five-Number Summaries
Boxplots are graphical representations of a five-number summary of a data set. The five specific values that make up a five-number summary are:
The lowest value of the data set (minimum)
Q1 (or 25th percentile)
The median (or 50th percentile)
Q3 (or 75th percentile)
The highest value of the data set (maximum)
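The five values listed above can be collected with a short helper. This sketch uses `statistics.quantiles` (Python 3.8 or later) with its default method, which may differ slightly from hand methods for Q1 and Q3; the function name and data are hypothetical, and drawing the boxplot itself is left out.

```python
import statistics


def five_number_summary(values):
    """Return (minimum, Q1, median, Q3, maximum)."""
    q1, med, q3 = statistics.quantiles(values, n=4)
    return (min(values), q1, med, q3, max(values))


data = [2, 3, 5, 6, 8, 10, 12, 15, 18, 20, 25]  # hypothetical data
summary = five_number_summary(data)

print(summary)
```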
Summary
Some basic ways to summarize data include measures of central tendency, measures of variation or dispersion, and measures of position.
The three most commonly used measures of
central tendency are the mean, median, and
mode. The midrange is also used to represent
an average.
Summary (cont’d.)
The three most commonly used measures of variation are the range, variance, and standard deviation.
The most common measures of position are percentiles, quartiles, and deciles.
Data values are distributed according to Chebyshev’s theorem and, in special cases, the empirical rule.
Summary (cont’d.)
The coefficient of variation is used to describe the standard deviation in relationship to the mean.
These methods are commonly called traditional statistics.
Other methods, such as the stem-and-leaf plot,
the boxplot, and five-number summary, are part
of exploratory data analysis; they are used to
examine data to see what they reveal.
Conclusions
By combining all of these techniques, the student is now able to collect, organize, summarize, and present data.