Chapter 3
Describing Data Using Numerical Measures
Business Statistics: A Decision-Making Approach, 8th Edition
Copyright ©2011 Pearson Education, Inc. publishing as Prentice Hall
Sep 11, 2015
Chapter Goals
After completing this chapter, you should be able to:
Compute and interpret the mean, median, and mode for a set
of data
Compute the range, variance, and standard deviation and
know what these values mean
Construct and interpret a box and whisker graph
Compute and explain the coefficient of variation and z scores
Compute and explain the covariance and coefficient of
correlation
Descriptive Procedures
Describing Data Numerically
Center and Location: Mean, Median, Mode, Weighted Mean
Other Measures of Location: Percentiles, Quartiles
Variation: Range, Interquartile Range, Variance, Standard Deviation, Coefficient of Variation
Measures of Center and Location
Mean (balance point): sample $\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}$, population $\mu = \frac{\sum_{i=1}^{N} x_i}{N}$
Median: midpoint of ranked values
Mode: most frequently observed value
Weighted Mean: sample $\bar{X}_W = \frac{\sum w_i x_i}{\sum w_i}$, population $\mu_W = \frac{\sum w_i x_i}{\sum w_i}$
Mean (Arithmetic Average)
The most common measure of central tendency
Mean = sum of values divided by the number of values
Affected by extreme values (outliers)
Example: for the values 1, 2, 3, 4, 5, Mean = (1 + 2 + 3 + 4 + 5)/5 = 15/5 = 3
Replacing 5 with the extreme value 10: Mean = (1 + 2 + 3 + 4 + 10)/5 = 20/5 = 4
Mean (Arithmetic Average) (continued)
The mean is the arithmetic average of the data values
Population mean: $\mu = \frac{\sum_{i=1}^{N} x_i}{N} = \frac{x_1 + x_2 + \cdots + x_N}{N}$, where N = population size
Sample mean: $\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n} = \frac{x_1 + x_2 + \cdots + x_n}{n}$, where n = sample size
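The mean formula and its sensitivity to extreme values can be checked with a few lines of Python. This is a minimal sketch: the function name `arithmetic_mean` is an illustrative choice, and the two data sets are the ones used in the example above (1 to 5, then with 5 replaced by 10).

```python
def arithmetic_mean(data):
    """Sum of the values divided by the number of values."""
    return sum(data) / len(data)

values = [1, 2, 3, 4, 5]
with_outlier = [1, 2, 3, 4, 10]  # same data, 5 replaced by an extreme value

print(arithmetic_mean(values))        # 3.0
print(arithmetic_mean(with_outlier))  # 4.0
```

A single extreme value moves the mean from 3 to 4, which is why the mean is described as affected by outliers.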
Median
In an ordered array (lowest to highest), the median is the
middle number, i.e., the number that splits the distribution
in half numerically
50% of the data is above the median, 50% is below
Represented as Md
The median is not affected by extreme values
[Figure: two dot plots on a 0 to 10 axis, one containing an extreme value; Median = 3 in both]
Median (continued)
To find the median, sort the n data values from low to high (sorted data is called a data array)
Find the value in the i = (1/2)n position
The ith position is the Median Index Point
If i is not an integer, round up to the next highest integer
If i is an integer, the median is the average of the values in positions i and i + 1
Median Example
Data array: 4, 4, 5, 5, 9, 11, 12, 14, 16, 19, 22, 23, 24
Note that n = 13
Find the i = (1/2)n position: i = (1/2)(13) = 6.5
Since 6.5 is not an integer, round up to 7
The median is the value in the 7th position: Md = 12
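The Median Index Point rule can be sketched in Python as follows. The function name `median_by_index_point` is hypothetical, positions are 1-based as on the slides, and the sketch assumes a non-empty list of numbers.

```python
import math

def median_by_index_point(data):
    """Median via the i = (1/2)n index point rule (1-based positions)."""
    ordered = sorted(data)  # the data array
    n = len(ordered)
    if n % 2 == 1:
        # i = n/2 is not an integer: round up and take that position
        position = math.ceil(n / 2)
        return ordered[position - 1]
    # i = n/2 is an integer: average the values in positions i and i + 1
    i = n // 2
    return (ordered[i - 1] + ordered[i]) / 2

data_array = [4, 4, 5, 5, 9, 11, 12, 14, 16, 19, 22, 23, 24]
print(median_by_index_point(data_array))  # 12
```

For an even number of values the same function averages the two middle values, which agrees with the (n + 1)/2 position rule.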
Finding the Median
The location of the median: Median position = (n + 1)/2 position in the ordered data
If the number of values is odd, the median is the middle number
If the number of values is even, the median is the average of the two middle numbers
Note that (n + 1)/2 is not the value of the median, only the position of the median in the ranked data
Median Example
Data array: 60 68 75 77 80 80 80 85 88 90 95 95 95 95 99
n = 15 (odd)
Median position is (15 + 1)/2 = 8, so the median is the middle number = 85
Data array: 23 25 25 34 35 45 46 47 52 54
n = 10 (even)
Median position is (n + 1)/2 = 11/2 = 5.5
Median value is (35 + 45)/2 = 40
Shape of a Distribution
Describes how data is distributed
Symmetric or skewed
The greater the difference between the mean and the median,
the more skewed the distribution
Left-Skewed: Mean < Median (longer tail extends to left)
Symmetric: Mean = Median
Right-Skewed: Median < Mean (longer tail extends to right)
Mode
A measure of location
The value that occurs most often
Not affected by extreme values
Used for either numerical or categorical data
There may be no mode
There may be several modes (2 modes = bimodal)
[Figure: dot plot on a 0 to 14 axis with Mode = 5; dot plot on a 0 to 6 axis with No Mode]
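The cases listed for the mode (one mode, several modes, no mode, categorical data) can be illustrated with a short Python sketch. The function name `modes` and all data values here are illustrative assumptions, not the data behind the figures, and "no mode" is taken to mean that every value occurs exactly once.

```python
from collections import Counter

def modes(data):
    """Return the most frequent value(s); an empty list means no mode."""
    counts = Counter(data)
    highest = max(counts.values())
    if highest == 1:
        return []  # every value occurs once: no mode
    return [value for value, count in counts.items() if count == highest]

print(modes([1, 3, 5, 5, 5, 8, 9, 14]))   # [5]
print(modes([0, 1, 2, 3, 4, 5, 6]))       # []
print(modes([1, 1, 2, 2, 3]))             # [1, 2]  (bimodal)
print(modes(["red", "blue", "red"]))      # ['red'] (categorical data)
```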
Weighted Mean
Used when values are grouped by frequency or relative
importance
Example: Sample of 26 Repair Projects
Days to Complete    Frequency
5                   4
6                   12
7                   8
8                   2
Weighted Mean Days to Complete:
$\bar{X}_W = \frac{\sum w_i x_i}{\sum w_i} = \frac{(4 \times 5) + (12 \times 6) + (8 \times 7) + (2 \times 8)}{4 + 12 + 8 + 2} = \frac{164}{26} = 6.31$ days
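The repair-project calculation can be reproduced in Python. This is a minimal sketch in which the frequencies act as the weights; the names `weighted_mean`, `days`, and `frequency` are illustrative.

```python
def weighted_mean(values, weights):
    """Sum of w_i * x_i divided by the sum of the weights w_i."""
    return sum(w * x for x, w in zip(values, weights)) / sum(weights)

days = [5, 6, 7, 8]        # x_i: days to complete
frequency = [4, 12, 8, 2]  # w_i: number of projects

result = weighted_mean(days, frequency)
print(round(result, 2))  # 6.31
```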
Review Example
Five houses on a hill by the beach
House Prices: $2,000,000; $500,000; $300,000; $100,000; $100,000
Summary Statistics
House Prices: $2,000,000; $500,000; $300,000; $100,000; $100,000
Sum: $3,000,000
Mean: ($3,000,000/5), μ = $600,000
Median: middle value of ranked data, Md = $300,000
Mode: most frequent value, Mode = $100,000
Which measure of location is the best?
Mean is generally used, unless extreme values (outliers) exist
Then Median is often used, since the median is not sensitive to
extreme values.
Example: Median home prices may be reported for a region because the median is less sensitive to outliers
Mode is good for determining the value most likely to occur
Other Location Measures
Percentiles
The pth percentile in a data array:
p% are less than or equal to this value
(100 - p)% are greater than or equal to this value
(where 0 ≤ p ≤ 100)
Quartiles
1st quartile = 25th percentile
2nd quartile = 50th percentile (also the median)
3rd quartile = 75th percentile
Percentiles
The pth percentile in an ordered array of n values is the value in the ith position, where
Percentile Location Index: $i = \frac{p}{100}(n)$
If i is not an integer, round up to the next higher integer value
Example: Find the 60th percentile in an ordered array of 19 values.
$i = \frac{p}{100}(n) = \frac{60}{100}(19) = 11.4$
So use the value in the i = 12th position
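The Percentile Location Index can be sketched in Python as follows. The function name `percentile_position` is hypothetical, and p and n are assumed to be whole numbers so that integer arithmetic can be used to avoid floating-point rounding. This slide only covers the non-integer case; when i is an integer the sketch simply returns i, which is an assumption rather than a rule stated here.

```python
def percentile_position(p, n):
    """1-based position to read for the pth percentile of n ordered values.

    i = (p / 100) * n; a non-integer i is rounded up to the next integer.
    """
    whole, remainder = divmod(p * n, 100)  # integer arithmetic, no float error
    if remainder == 0:
        return whole       # i is an integer (case not covered on this slide)
    return whole + 1       # i is not an integer: round up

print(percentile_position(60, 19))  # 12
```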
Quartiles
Quartiles split the ranked data into 4 equal groups:
25% | Q1 | 25% | Q2 | 25% | Q3 | 25%
Note that
Q1 is the value for which 25% of the observations are smaller and 75% are larger
Q2 (the 50th percentile) is the median
Only 25% of the observations are greater than Q3
IQR (interquartile range) = Q3 - Q1
Quartile Formulas
Find a quartile by determining the value in the appropriate position in the ranked data, where
First quartile position: Q1 = (n+1)/4
Second quartile position: Q2 = (n+1)/2 (the median position)
Third quartile position: Q3 = 3(n+1)/4
where n is the number of observed values
Quartiles
Sample Data in Ordered Array: 11 12 13 16 16 17 18 21 22
Example: Find the first quartile (n = 9)
Q1 = 25th percentile, so find i: $i = \frac{25}{100}(9) = 2.25$
Since 2.25 is not an integer, round up to 3 and use the value in the 3rd position: Q1 = 13
Interpretation: 25% of the data is below 13
Quartiles (continued)
Example:
Sample Data in Ordered Array: 11 12 13 16 16 17 18 21 22
(n = 9)
Q1 is in the (9+1)/4 = 2.5 position of the ranked data, so Q1 = 12.5
Q2 is in the (9+1)/2 = 5th position of the ranked data, so Q2 = median = 16
Q3 is in the 3(9+1)/4 = 7.5 position of the ranked data, so Q3 = 19.5
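The (n + 1) position formulas can be sketched in Python. The example above only shows positions that fall exactly halfway between two values; resolving any fractional position by linear interpolation is an assumption of this sketch, as are the names `value_at_position` and `quartiles`. It assumes at least 3 data values.

```python
def value_at_position(ordered, position):
    """Value at a 1-based, possibly fractional, position in ordered data."""
    lower = int(position)            # whole part of the position
    fraction = position - lower
    if fraction == 0:
        return ordered[lower - 1]
    below, above = ordered[lower - 1], ordered[lower]
    # assumption: interpolate linearly between the two neighbouring values
    return below + fraction * (above - below)

def quartiles(data):
    ordered = sorted(data)
    n = len(ordered)
    q1 = value_at_position(ordered, (n + 1) / 4)
    q2 = value_at_position(ordered, (n + 1) / 2)
    q3 = value_at_position(ordered, 3 * (n + 1) / 4)
    return q1, q2, q3

sample = [11, 12, 13, 16, 16, 17, 18, 21, 22]
print(quartiles(sample))  # (12.5, 16, 19.5)
```

Note that this position rule gives Q1 = 12.5, while the percentile location index gives Q1 = 13 for the same data; the two conventions can differ slightly.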
Box and Whisker Plot
A graphical display of data using a central box and extended
whiskers
[Figure: example box and whisker plot marking outliers (*), lower limit, 1st quartile, median, 3rd quartile, and upper limit, with 25% of the data in each of the four sections]
Constructing the Box and Whisker Plot
The center box extends from Q1 to Q3
The line within the box is the median
The lower limit is Q1 - 1.5 (Q3 - Q1)
The upper limit is Q3 + 1.5 (Q3 - Q1)
The whiskers extend to the smallest and largest values within the calculated limits
Outliers are plotted outside the calculated limits
Shape of Box and Whisker Plots
The Box and central line are centered between the endpoints if data is symmetric around the median
A Box and Whisker plot can be shown in either vertical or horizontal format
Distribution Shape and Box and Whisker Plot
[Figure: box and whisker plots for left-skewed, symmetric, and right-skewed distributions, each marked with Q1, Q2, Q3]
Constructing a Box and Whisker Plot
1. Sort values from lowest to highest
2. Find Q1, Q2, Q3
3. Draw the box so that the ends are at Q1 and Q3
4. Draw a vertical line through the median
5. Calculate the interquartile range (Q3 - Q1)
6. Extend dashed lines from each end to the highest and lowest values within the limits
7. Identify outliers with an asterisk (*)
Box-and-Whisker Plot Example
Below is a Box-and-Whisker plot for the following data:
0 2 2 2 3 3 4 5 6 11 27
This data is right skewed, as the plot depicts
[Figure: box and whisker plot with Min = 0, Q1 = 2, Q2 = 3, Q3 = 6, Max = 27; axis marks at 0, 2, 3, 6, 12, 27; the value 27 is marked with an asterisk]
Upper limit = Q3 + 1.5 (Q3 - Q1) = 6 + 1.5 (6 - 2) = 12
27 is above the upper limit, so it is shown as an outlier
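The limit, whisker, and outlier calculations for this example can be sketched in Python. The quartiles Q1 = 2 and Q3 = 6 are taken as given from the example; the variable names are illustrative.

```python
data = [0, 2, 2, 2, 3, 3, 4, 5, 6, 11, 27]
q1, q3 = 2, 6  # quartiles as given in the example

iqr = q3 - q1
lower_limit = q1 - 1.5 * iqr
upper_limit = q3 + 1.5 * iqr

# Whiskers reach the smallest and largest values inside the limits;
# anything outside the limits is flagged as an outlier.
inside = [x for x in data if lower_limit <= x <= upper_limit]
outliers = [x for x in data if x < lower_limit or x > upper_limit]
whiskers = (min(inside), max(inside))

print(lower_limit, upper_limit)  # -4.0 12.0
print(whiskers)                  # (0, 11)
print(outliers)                  # [27]
```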
Measures of Variation
Variation
Range
Interquartile Range
Variance: Population Variance, Sample Variance
Standard Deviation: Population Standard Deviation, Sample Standard Deviation
Coefficient of Variation
Variation
Measures of variation give information on the spread or
variability of the data values.
Smaller value: less variation
Larger value: more variation
[Figure: distributions with the same center, different variation]
Range
Simplest measure of variation
Difference between the largest and the smallest observations:
Range = $x_{\text{maximum}} - x_{\text{minimum}}$
Example: Range = 14 - 1 = 13 (data plotted on a 0 to 14 axis)
Disadvantages of the Range
Ignores the way in which data are distributed
[Figure: two differently distributed data sets on a 7 to 12 axis, both with Range = 12 - 7 = 5]
Sensitive to outliers
1,1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,3,3,3,3,4,5: Range = 5 - 1 = 4
1,1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,3,3,3,3,4,120: Range = 120 - 1 = 119
Interquartile Range
Can eliminate some outlier problems by using the interquartile range
Eliminate some high-and low-valued observations and calculate the range from the remaining values.
Interquartile range = Q3 - Q1
Interquartile Range Example
Example:
X minimum = 12, Q1 = 30, Median (Q2) = 45, Q3 = 57, X maximum = 70 (25% of the data in each of the four sections)
Interquartile range = 57 - 30 = 27
Variance
Average of squared deviations of values from the mean (in squared units)
Population variance: $\sigma^2 = \frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}$
Sample variance: $s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$
Standard Deviation
Most commonly used measure of variation
Shows variation about the mean
Has the same units as the original data
Population standard deviation: $\sigma = \sqrt{\frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}}$
Sample standard deviation: $s = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}}$
Calculation Example : Sample Standard Deviation
Sample Data ($x_i$): 10 12 14 15 17 18 18 24
n = 8, Mean = $\bar{x}$ = 16
$s = \sqrt{\frac{(10 - \bar{x})^2 + (12 - \bar{x})^2 + (14 - \bar{x})^2 + \cdots + (24 - \bar{x})^2}{n - 1}} = \sqrt{\frac{(10 - 16)^2 + (12 - 16)^2 + (14 - 16)^2 + \cdots + (24 - 16)^2}{8 - 1}} = \sqrt{\frac{130}{7}} = 4.3095$
A measure of the average scatter around the mean
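The sample standard deviation calculation can be reproduced step by step in Python. This is a minimal sketch; `sample_std` is an illustrative name, and the result is compared against the standard library's `statistics.stdev`, which also uses the n - 1 divisor.

```python
import math
import statistics

def sample_std(data):
    """Square root of the sum of squared deviations divided by n - 1."""
    n = len(data)
    x_bar = sum(data) / n
    squared_deviations = sum((x - x_bar) ** 2 for x in data)
    return math.sqrt(squared_deviations / (n - 1))

sample = [10, 12, 14, 15, 17, 18, 18, 24]
s = sample_std(sample)
print(round(s, 4))                         # 4.3095
print(round(statistics.stdev(sample), 4))  # 4.3095
```

For a population, the divisor is N instead of n - 1; `statistics.pstdev` computes that version.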
Measuring variation
[Figure: two distributions, one with a small standard deviation and one with a large standard deviation]
Comparing Standard Deviations
Same mean, but different standard deviations:
Data A: Mean = 15.5, s = 3.338
Data B: Mean = 15.5, s = .9258
Data C: Mean = 15.5, s = 4.57
[Figure: dot plots of Data A, Data B, and Data C, each on an 11 to 21 axis]
Coefficient of Variation
Measures relative variation
Always in percentage (%)
Shows variation relative to mean
Is used to compare two or more sets of data measured in
different units
Population: $CV = \frac{\sigma}{\mu} \cdot 100\%$
Sample: $CV = \frac{s}{\bar{x}} \cdot 100\%$
Comparing Coefficients of Variation
Stock A:
Average price last year = $50
Standard deviation = $5
Stock B:
Average price last year = $100
Standard deviation = $5
$CV_A = \frac{s}{\bar{x}} \cdot 100\% = \frac{\$5}{\$50} \cdot 100\% = 10\%$
$CV_B = \frac{s}{\bar{x}} \cdot 100\% = \frac{\$5}{\$100} \cdot 100\% = 5\%$
Both stocks have the same standard deviation, but stock B is less variable relative to its price
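The stock comparison can be checked with a short Python sketch. The function name `coefficient_of_variation` is an illustrative choice; the inputs are the means and standard deviations stated for Stock A and Stock B.

```python
def coefficient_of_variation(std_dev, mean):
    """Standard deviation as a percentage of the mean."""
    return 100 * std_dev / mean

cv_a = coefficient_of_variation(5, 50)    # Stock A
cv_b = coefficient_of_variation(5, 100)   # Stock B
print(cv_a, cv_b)  # 10.0 5.0
```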
The Empirical Rule
If the data distribution is bell-shaped (mean = median, thus not skewed), then the interval:
$\mu \pm 1\sigma$ contains about 68% of the values in the population or the sample
The Empirical Rule
$\mu \pm 2\sigma$ contains about 95% of the values in the population or the sample
$\mu \pm 3\sigma$ contains about 99.7% of the values in the population or the sample
Tchebysheff's Theorem
Regardless of how the data are distributed, at least $(1 - 1/k^2)$ of the values will fall within k standard deviations of the mean
Examples:
At least $(1 - 1/1^2)$ = 0% of the values fall within k = 1 ($\mu \pm 1\sigma$)
At least $(1 - 1/2^2)$ = 75% of the values fall within k = 2 ($\mu \pm 2\sigma$)
At least $(1 - 1/3^2)$ = 89% of the values fall within k = 3 ($\mu \pm 3\sigma$)
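The Tchebysheff bound can be computed and compared with an actual data set in Python. This is a sketch under stated assumptions: the function names are hypothetical, the skewed data set is the 25-value list with the outlier 120 from the range example, and it is treated as a population (so `statistics.pstdev` is used).

```python
from statistics import mean, pstdev

def tchebysheff_bound(k):
    """Minimum proportion of values within k standard deviations of the mean."""
    return 1 - 1 / k ** 2

def proportion_within(data, k):
    """Observed proportion of values within k standard deviations of the mean."""
    center, spread = mean(data), pstdev(data)
    inside = [x for x in data if abs(x - center) <= k * spread]
    return len(inside) / len(data)

skewed = [1] * 11 + [2] * 8 + [3] * 4 + [4, 120]  # strongly right-skewed

for k in (1, 2, 3):
    print(k, f"{tchebysheff_bound(k):.0%}", proportion_within(skewed, k))
```

Even for this heavily skewed data, the observed proportion stays at or above the bound for each k, as the theorem guarantees.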
Standardized Data Values
A standardized data value refers to the number of standard deviations a value is from the mean
Standardized data values are sometimes referred to as z-scores
Can be used to compare datasets
Will be addressed in more detail later in the text
Standardized Population Values
$z = \frac{x - \mu}{\sigma}$
where:
x = original data value
μ = population mean
σ = population standard deviation
z = standard score (number of standard deviations x is from μ)
Standardized Sample Values
$z = \frac{x - \bar{x}}{s}$
where:
x = original data value
$\bar{x}$ = sample mean
s = sample standard deviation
z = standard score (number of standard deviations x is from $\bar{x}$)
Standardized Value Example
IQ scores in a large population have a bell-shaped distribution with mean μ = 100 and standard deviation σ = 15
Find the standardized score (z-score) for a person with an IQ of 121.
Answer: $z = \frac{x - \mu}{\sigma} = \frac{121 - 100}{15} = 1.4$
Someone with an IQ of 121 is 1.4 standard deviations above the mean
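The IQ example can be verified with a one-function Python sketch. The name `z_score` is illustrative; the same function applies to sample values when the sample mean and sample standard deviation are passed in.

```python
def z_score(x, mean, std_dev):
    """Number of standard deviations that x lies from the mean."""
    return (x - mean) / std_dev

print(z_score(121, 100, 15))  # 1.4
```

A negative result would indicate a value below the mean; for instance, an IQ of 85 gives a z-score of -1.0.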
Using Microsoft Excel
Descriptive Statistics are easy to obtain from Microsoft Excel
Use menu choice:
Data / data analysis / descriptive statistics
Enter details in dialog box
Using Excel
Select: Data / data analysis / descriptive statistics
Using Excel (continued)
Enter dialog box details
Check box for summary statistics
Click OK
Excel output
Microsoft Excel descriptive statistics output using the house price data:
House Prices: $2,000,000; $500,000; $300,000; $100,000; $100,000
The Sample Covariance
The sample covariance measures the strength of the
linear relationship between two variables (called
bivariate data)
The sample covariance: $\text{cov}(X, Y) = \frac{\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}{n - 1}$
Only concerned with the strength of the relationship
No causal effect is implied
Interpreting Covariance
Covariance between two random variables:
cov(X,Y) > 0: X and Y tend to move in the same direction
cov(X,Y) < 0: X and Y tend to move in opposite directions
cov(X,Y) = 0: X and Y have no linear relationship (this alone does not establish independence)
Coefficient of Correlation
Measures the relative strength of the linear relationship
between two variables
Sample coefficient of correlation: $r = \frac{\text{cov}(X, Y)}{S_X S_Y}$
where
$\text{cov}(X, Y) = \frac{\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}{n - 1}$
$S_X = \sqrt{\frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1}}$
$S_Y = \sqrt{\frac{\sum_{i=1}^{n} (Y_i - \bar{Y})^2}{n - 1}}$
Features of Correlation Coefficient, r
Unit free
Ranges between -1 and 1
The closer to -1, the stronger the negative linear relationship
The closer to 1, the stronger the positive linear relationship
The closer to 0, the weaker the linear relationship
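The sample covariance and coefficient of correlation formulas can be sketched in Python. The function names and the small data set are illustrative assumptions (they are not the test score data used later in the chapter), and both lists are assumed to have the same length of at least 2.

```python
import math

def sample_covariance(xs, ys):
    """Sum of (X_i - X_bar)(Y_i - Y_bar) divided by n - 1."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    return sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / (n - 1)

def sample_std(values):
    """Sample standard deviation (n - 1 divisor)."""
    n = len(values)
    mean = sum(values) / n
    return math.sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))

def correlation(xs, ys):
    """r = cov(X, Y) / (S_X * S_Y)."""
    return sample_covariance(xs, ys) / (sample_std(xs) * sample_std(ys))

x = [1, 2, 3, 4, 5]   # hypothetical data
y = [2, 4, 5, 4, 5]   # hypothetical data

print(sample_covariance(x, y))        # 1.5
print(round(correlation(x, y), 4))    # 0.7746
```

Rescaling x (for example, multiplying every value by 100) changes the covariance but leaves r unchanged, which is what "unit free" means in practice.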
Scatter Plots of Data with Various Correlation Coefficients
[Figure: scatter plots of Y against X with r = -1, r = -.6, r = 0, r = +.3, r = +1, and a second pattern with r = 0]
Using Excel to Find the Correlation Coefficient
Select Tools/Data Analysis
Choose Correlation from the selection menu
Click OK
Using Excel to Find the Correlation Coefficient (continued)
Input data range and select appropriate options
Click OK to get output
Interpreting the Result
r = .733
There is a relatively strong positive linear relationship between test score #1 and test score #2
Students who scored high on the first test tended to score high on second test, and students who scored low on the first test tended to score low on the second test
[Figure: Scatter Plot of Test Scores; x-axis Test #1 Score, y-axis Test #2 Score, both from 70 to 100]
Pitfalls in Numerical Descriptive Measures
Data analysis is objective
Should report the summary measures that best meet the
assumptions about the data set
Data interpretation is subjective
Should be done in a fair, neutral, and clear manner
Ethical Considerations
Numerical descriptive measures:
Should document both good and bad results
Should be presented in a fair, objective and neutral manner
Should not use inappropriate summary measures to distort
facts
Chapter Summary
Described measures of center and location
Mean, median, mode, weighted mean
Discussed percentiles and quartiles
Created Box and Whisker Plots
Illustrated distribution shapes
Symmetric, skewed
Chapter Summary (continued)
Described measures of variation
Range, interquartile range, variance, standard deviation, coefficient of variation
Discussed Tchebysheff's Theorem
Calculated standardized data values