When summarizing and describing numerical variables, you need to do more than ... you need to consider the central ... these measures would give the customers of The Choice Is Yours service the answers they seek.

3.1 MEASURES OF CENTRAL TENDENCY

Most sets of data show a distinct tendency to group around a central point. When people talk about an "average value" or the "middle value" or the "most frequent value," they are talking informally about the mean, median, and mode, three measures of central tendency.
The arithmetic mean (typically referred to as the mean) is the most common measure of central tendency. The mean is the only common measure in which all the values play an equal role. The mean serves as a "balance point" in a set of data (like the fulcrum on a seesaw). You calculate the mean by adding together all the values in a data set and then dividing that sum by the number of values in the data set.
The symbol $\bar{X}$, called X-bar, is used to represent the mean of a sample. For a sample containing n values, the equation for the mean of a sample is written as

$$\bar{X} = \frac{\text{sum of the values}}{\text{number of values}}$$

Using the series $X_1, X_2, \ldots, X_n$ to represent the set of n values and n to represent the number of values, the equation becomes:

$$\bar{X} = \frac{X_1 + X_2 + \cdots + X_n}{n}$$
By using summation notation (discussed fully in Appendix B), you replace the numerator $X_1 + X_2 + \cdots + X_n$ by the term $\sum_{i=1}^{n} X_i$, which means sum all the $X_i$ values from the first X value, $X_1$, to the last X value, $X_n$, to form Equation (3.1), a formal definition of the sample mean.
SAMPLE MEAN
The sample mean is the sum of the values divided by the number of values:

$$\bar{X} = \frac{\sum_{i=1}^{n} X_i}{n} \qquad (3.1)$$

where
$\bar{X}$ = sample mean
n = number of values or sample size
$X_i$ = ith value of the variable X
$\sum_{i=1}^{n} X_i$ = summation of all $X_i$ values in the sample
Because all the values play an equal role, a mean is greatly affected by any value that is greatly different from the others in the data set. When you have such extreme values, you should avoid using the mean.
The mean can suggest a typical or central value for a data set. For example, if you knew the typical time it takes you to get ready in the morning, you might be able to better plan your morning and minimize any excessive lateness (or earliness) going to your destination. Suppose you define the time to get ready as the time (rounded to the nearest minute) from when you get out of bed to when you leave your home. You collect the times shown below for 10 consecutive workdays (stored in the accompanying data file):
Fund | Category
Baron Growth | Small Cap
Columbia Acorn Z | Small Cap
FBR Small Cap | Small Cap
Perritt Micro Cap Opportunities | Small Cap
Schroder Capital US Opportunities Inv | Small Cap
Value Line Emerging Opportunities | Small Cap
Wells Fargo Advtg Small Cap Opp Adm | Small Cap
Compute the mean three-year annualized return for the small-cap growth funds with low risk.
SOLUTION The mean three-year annualized return for the small-cap growth funds with low risk is 23.61, calculated as follows:
$$\bar{X} = \frac{\text{sum of the values}}{\text{number of values}} = \frac{\sum_{i=1}^{n} X_i}{n} = \frac{165.3}{7} = 23.6143$$
The ordered array for the seven small-cap growth funds with low risk is:
19.0 20.8 22.3 22.4 24.9 26.0 29.9
Four of these returns are below the mean of 23.61, and three of them are above the mean.
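The arithmetic of Equation (3.1) can be checked with a few lines of Python. This is a minimal sketch, not part of the original text; the function and list names are illustrative, and the values are the seven small-cap fund returns above and the ten ranked getting-ready times used throughout this section.

```python
def sample_mean(values):
    """Sample mean, Equation (3.1): (X1 + X2 + ... + Xn) / n."""
    return sum(values) / len(values)

# Three-year annualized returns (%) for the seven small-cap growth funds with low risk
fund_returns = [19.0, 20.8, 22.3, 22.4, 24.9, 26.0, 29.9]
# Getting-ready times (minutes) for the 10 workdays, ranked
ready_times = [29, 31, 35, 39, 39, 40, 43, 44, 44, 52]

print(round(sample_mean(fund_returns), 4))  # 23.6143
print(sample_mean(ready_times))             # 39.6

# Four returns lie below the mean and three above it
below = [r for r in fund_returns if r < sample_mean(fund_returns)]
print(len(below), len(fund_returns) - len(below))  # 4 3
```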
The Median
The median is the middle value in a set of data that has been ranked from smallest to largest. Half the values are smaller than or equal to the median, and half the values are larger than or equal to the median. The median is not affected by extreme values, so you can use the median when extreme values are present.
To calculate the median for a set of data, you first rank the values from smallest to largest and then use Equation (3.2) to compute the rank of the value that is the median.
MEDIAN
$$\text{Median} = \frac{n + 1}{2} \text{ ranked value} \qquad (3.2)$$
You compute the median value by following one of two rules:
Rule 1 If there is an odd number of values in the data set, the median is the middle-ranked value.
Rule 2 If there is an even number of values in the data set, then the median is the average of the two middle-ranked values.
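The two rules above translate directly into code. The following is a minimal sketch (the function name is illustrative, not from the text) that ranks the data and then applies Rule 1 or Rule 2 depending on whether n is odd or even.

```python
def median(values):
    """Median via Equation (3.2): the (n + 1) / 2 ranked value."""
    ranked = sorted(values)          # rank from smallest to largest
    n = len(ranked)
    mid = n // 2
    if n % 2 == 1:
        # Rule 1: odd n, so the median is the middle-ranked value
        return ranked[mid]
    # Rule 2: even n, so average the two middle-ranked values
    return (ranked[mid - 1] + ranked[mid]) / 2

ready_times = [29, 31, 35, 39, 39, 40, 43, 44, 44, 52]
fund_returns = [19.0, 20.8, 22.3, 22.4, 24.9, 26.0, 29.9]

print(median(ready_times))   # 39.5 (average of the 5th and 6th ranked values)
print(median(fund_returns))  # 22.4 (the 4th ranked value)
```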
To compute the median for the sample of 10 times to get ready in the morning, you rank the daily times as follows:
29 31 35 39 39 40 43 44 44 52

EXAMPLE 3.2 COMPUTING THE MEDIAN FROM AN ODD-SIZED SAMPLE

There are two modes, 39 minutes and 44 minutes, because each of these values occurs twice.
100 CHAPTER THREE
3.1: Measures of Central Tendency 101
COMPUTING THE MODE
A systems manager in charge of a company's network keeps track of the number of server failures that occur in a day. Compute the mode for the following data, which represent the number of server failures in a day for the past two weeks:
1 3 0 3 26 2 7 4 0 2 3 3 6 3
SOLUTION The ordered array for these data is
0 0 1 2 2 3 3 3 3 3 4 6 7 26
Because 3 appears five times, more times than any other value, the mode is 3. Thus, the systems manager can say that the most common occurrence is having three server failures in a day. For this data set, the median is also equal to 3, and the mean is equal to 4.5. The extreme value 26 is an outlier. For these data, the median and the mode better measure central tendency than the mean.
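A short sketch of the mode calculation, using `collections.Counter` from the Python standard library. The `modes` helper is hypothetical; it returns every most-frequent value (so data with two modes return both) and an empty list when no value repeats, which matches the "no mode" case discussed next.

```python
from collections import Counter
from statistics import mean, median

# Server failures per day for the past two weeks
failures = [1, 3, 0, 3, 26, 2, 7, 4, 0, 2, 3, 3, 6, 3]

def modes(values):
    """Return all most-frequent values, or [] if no value occurs more than once."""
    counts = Counter(values)
    top = max(counts.values())
    if top == 1:
        return []  # no value is "most typical": the data have no mode
    return sorted(v for v, c in counts.items() if c == top)

print(modes(failures))                   # [3]
print(median(failures), mean(failures))  # 3.0 4.5 (the outlier 26 pulls the mean up)
print(modes([19.0, 20.8, 22.3, 22.4, 24.9, 26.0, 29.9]))  # [] (no mode)
print(modes([29, 31, 35, 39, 39, 40, 43, 44, 44, 52]))    # [39, 44] (two modes)
```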
A set of data has no mode if none of the values is "most typical." Example 3.4 presents a data set with no mode.
EXAMPLE 3.4 DATA WITH NO MODE
Compute the mode for the three-year annualized returns for the small-cap growth funds with low risk (see page 99).
SOLUTION The ordered array for these data is
19.0 20.8 22.3 22.4 24.9 26.0 29.9
These data have no mode. None of the values is most typical because each value appears once.
Quartiles
Quartiles split a set of data into four equal parts. The first quartile, $Q_1$, divides the smallest 25.0% of the values from the other 75.0% that are larger. The second quartile, $Q_2$, is the median: 50.0% of the values are smaller than the median, and 50.0% are larger. The third quartile, $Q_3$, divides the smallest 75.0% of the values from the largest 25.0%. Equations (3.3) and (3.4) define the first and third quartiles.
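To find the first quartile, $Q_1$:

$$Q_1 = \frac{n + 1}{4} \text{ ranked value} = \frac{7 + 1}{4} \text{ ranked value} = 2\text{nd ranked value}$$

Therefore, using Rule 1, $Q_1$ is the second ranked value. Because the second ranked value is 20.8, the first quartile, $Q_1$, is 20.8.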
To find the third quartile, $Q_3$:

$$Q_3 = \frac{3(n + 1)}{4} \text{ ranked value} = \frac{3(7 + 1)}{4} \text{ ranked value} = 6\text{th ranked value}$$
Therefore, using Rule 1, $Q_3$ is the sixth ranked value. Because the sixth ranked value is 26.0, $Q_3$ is 26.0.
The first quartile of 20.8 indicates that 25% of the returns are below or equal to 20.8 and 75% are greater than or equal to 20.8. The third quartile of 26.0 indicates that 75% of the returns are below or equal to 26.0 and 25% are greater than or equal to 26.0.
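The quartile calculation can be sketched in Python as well. Only the whole-number case (Rule 1, where the rank comes out as an integer) is worked in this section; for ranks that come out fractional, the sketch assumes a common textbook convention: average the two neighboring ranked values when the rank ends in .5, and otherwise round to the nearest whole rank. Treat those two fractional-rank branches as an assumption, not as rules stated here. The helper names are hypothetical.

```python
import math

def ranked_value(ranked, position):
    """Value at a 1-based, possibly fractional, rank position.

    Assumed conventions:
      - whole-number position: use that ranked value (Rule 1)
      - position ending in .5: average the two neighboring ranked values
      - any other fraction: round to the nearest whole rank
    """
    if position == int(position):
        return ranked[int(position) - 1]
    if position - math.floor(position) == 0.5:
        lower = math.floor(position)
        return (ranked[lower - 1] + ranked[lower]) / 2
    return ranked[round(position) - 1]

def quartiles(values):
    ranked = sorted(values)
    n = len(ranked)
    q1 = ranked_value(ranked, (n + 1) / 4)      # Equation (3.3)
    q3 = ranked_value(ranked, 3 * (n + 1) / 4)  # Equation (3.4)
    return q1, q3

fund_returns = [19.0, 20.8, 22.3, 22.4, 24.9, 26.0, 29.9]
ready_times = [29, 31, 35, 39, 39, 40, 43, 44, 44, 52]
print(quartiles(fund_returns))  # (20.8, 26.0): the 2nd and 6th ranked values
print(quartiles(ready_times))   # (35, 44): ranks 2.75 and 8.25 rounded to 3 and 8
```

For the getting-ready times this reproduces the quartiles that appear in the five-number summary 29 35 39.5 44 52 later in the chapter.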
The Geometric Mean
The geometric mean measures the rate of change of a variable over time. Equation (3.5) defines the geometric mean.
The geometric mean rate of return measures the average percentage return of an investment over time. Equation (3.6) defines the geometric mean rate of return.
To illustrate these measures, consider an investment of $100,000 that declined to a value of $50,000 at the end of Year 1 and then rebounded back to its original $100,000 value at the end of Year 2.
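Equations (3.5) and (3.6) are not legible in this copy, so the sketch below assumes the usual definition of the geometric mean rate of return, $R_G = [(1 + R_1)(1 + R_2)\cdots(1 + R_n)]^{1/n} - 1$, with each rate written as a decimal. Applied to the $100,000 investment, it contrasts the arithmetic mean of the two yearly rates with the geometric mean rate. The function name is hypothetical, and `math.prod` requires Python 3.8 or later.

```python
import math

def geometric_mean_rate(returns):
    """Assumed form of Equation (3.6): [(1+R1)(1+R2)...(1+Rn)] ** (1/n) - 1."""
    growth = math.prod(1 + r for r in returns)
    return growth ** (1 / len(returns)) - 1

# Year 1: $100,000 -> $50,000 (-50%); Year 2: $50,000 -> $100,000 (+100%)
returns = [-0.50, 1.00]

print(sum(returns) / len(returns))   # 0.25: the arithmetic mean suggests +25% per year
print(geometric_mean_rate(returns))  # 0.0: the investment ended where it started
```

The geometric mean rate of 0% reflects what actually happened to the investment, which is why it is the appropriate average for rates of change over time.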
The geometric mean rate of return in the Russell 2000 Index for the two years is 11.23%.
3.2 VARIATION AND SHAPE
In addition to central tendency, every data set can be characterized by its variation and shape. Variation measures the spread, or dispersion, of values in a data set. One simple measure of variation is the range, the difference between the largest and smallest values. More commonly used in statistics are the standard deviation and variance, two measures explained later in this section. The shape of a data set represents a pattern of all the values, from the lowest to highest value. As you will learn later in this section, many data sets have a pattern that looks approximately like a bell, with a peak of values somewhere in the middle.
The Range
The range is the simplest numerical descriptive measure of variation in a set of data.

RANGE
The range is equal to the largest value minus the smallest value.
$$\text{Range} = X_{\text{largest}} - X_{\text{smallest}} \qquad (3.7)$$
To determine the range of the times to get ready in the morning, you rank the data from smallest to largest:
29 31 35 39 39 40 43 44 44 52
Using Equation (3.7), the range is 52 - 29 = 23 minutes. The range of 23 minutes indicates that the largest difference between any two days in the time to get ready in the morning is 23 minutes.
COMPUTING THE RANGE IN THE THREE-YEAR ANNUALIZED RETURNS FOR SMALL-CAP GROWTH MUTUAL FUNDS WITH LOW RISK
The 838 mutual funds that are part of the Using Statistics scenario (see page 96) are classified according to the category (small cap, mid cap, and large cap), the type (growth or value), and the risk level of the mutual funds (low, average, and high). Compute the range of the three-year annualized returns for the small-cap growth funds with low risk (see page 99).
SOLUTION Ranked from smallest to largest, the three-year annualized returns for the seven small-cap growth funds with low risk are
19.0 20.8 22.3 22.4 24.9 26.0 29.9
Therefore, using Equation (3.7), the range = 29.9 - 19.0 = 10.9. The largest difference between any two returns is 10.9.
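Equation (3.7) needs only the two extreme values, which a minimal Python sketch makes plain (the function name is illustrative). Rounding is applied to the fund result because floating-point subtraction of 19.0 from 29.9 is not exactly 10.9.

```python
def value_range(values):
    """Range, Equation (3.7): largest value minus smallest value."""
    return max(values) - min(values)

ready_times = [29, 31, 35, 39, 39, 40, 43, 44, 44, 52]
fund_returns = [19.0, 20.8, 22.3, 22.4, 24.9, 26.0, 29.9]

print(value_range(ready_times))             # 23 (minutes)
print(round(value_range(fund_returns), 1))  # 10.9
```

Because only `max` and `min` are used, any rearrangement of the values between the two extremes leaves the result unchanged, which is exactly the limitation described in the next paragraph.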
The range measures the total spread in the set of data. Although the range is a simple measure of the total variation in the data, it does not take into account how the data are distributed between the smallest and largest values. In other words, the range does not indicate whether the values are evenly distributed throughout the data set, clustered near the middle, or clustered near one or both extremes. Thus, using the range as a measure of variation when at least one value is an extreme value is misleading.
A simple measure of variation around the mean might take the difference between each value and the mean and then sum these differences. However, if you did that, you would find that because the mean is the balance point in a set of data, for every set of data, these differences would sum to zero. One measure of variation that differs from data set to data set squares the difference between each value and the mean and then sums these squared differences. In statistics, this quantity is called a sum of squares (or SS). This sum is then divided by the number of values minus 1 (for sample data) to get the sample variance ($S^2$). The square root of the sample variance is the sample standard deviation (S).
Because the sum of squares is a sum of squared differences, which by the rules of arithmetic will always be nonnegative, neither the variance nor the standard deviation can ever be negative. For virtually all sets of data, the variance and standard deviation will be positive values, although both of these statistics will be zero if there is no variation at all in a set of data and each value in the sample is the same.
For a sample containing n values, $X_1, X_2, X_3, \ldots, X_n$, the sample variance (given by the symbol $S^2$) is

$$S^2 = \frac{(X_1 - \bar{X})^2 + (X_2 - \bar{X})^2 + \cdots + (X_n - \bar{X})^2}{n - 1}$$
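Equation (3.9) expresses the sample variance in summation notation, and Equation (3.10) expresses the sample standard deviation.

SAMPLE VARIANCE
$$S^2 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1} \qquad (3.9)$$

SAMPLE STANDARD DEVIATION
$$S = \sqrt{S^2} = \sqrt{\frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1}} \qquad (3.10)$$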
If the denominator were n instead of n - 1, Equation (3.9) [and the inner term in Equation (3.10)] would calculate the average of the squared differences around the mean. However, n - 1 is used because of certain desirable mathematical properties possessed by the statistic $S^2$ that
Because the variance is in squared units (in squared minutes, for these data), to compute the standard deviation, you take the square root of the variance. Using Equation (3.10) on page 107, the sample standard deviation, S, is

$$S = \sqrt{\frac{412.4}{10 - 1}} = \sqrt{45.82} = 6.77$$
This indicates that the getting-ready times in this sample are clustering within 6.77 minutes around the mean of 39.6 minutes (i.e., clustering between $\bar{X} - 1S = 32.83$ and $\bar{X} + 1S = 46.37$). In fact, 7 out of 10 getting-ready times lie within this interval.

Using the second column of Table 3.1, you can also calculate the sum of the differences between each value and the mean to be zero. For any set of data, this sum will always be zero:

$$\sum_{i=1}^{n} (X_i - \bar{X}) = 0 \quad \text{for all sets of data}$$
This property is one of the reasons that the mean is used as the most common measure of central tendency.
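Equations (3.9) and (3.10), and the zero-sum property just stated, can be verified numerically. This is a sketch with an illustrative helper name; the standard-library `statistics.stdev` is used only as a cross-check, since it also divides by n - 1.

```python
import math
import statistics

ready_times = [29, 31, 35, 39, 39, 40, 43, 44, 44, 52]

def sample_variance(values):
    """Equation (3.9): sum of squared deviations from the mean, divided by n - 1."""
    n = len(values)
    x_bar = sum(values) / n
    sum_of_squares = sum((x - x_bar) ** 2 for x in values)  # SS = 412.4 here
    return sum_of_squares / (n - 1)

s2 = sample_variance(ready_times)
s = math.sqrt(s2)                 # Equation (3.10)
print(round(s2, 2), round(s, 2))  # 45.82 6.77

# The deviations around the mean sum to zero (up to floating-point error)
x_bar = sum(ready_times) / len(ready_times)
print(abs(sum(x - x_bar for x in ready_times)) < 1e-9)  # True

# Cross-check with the standard library
print(round(statistics.stdev(ready_times), 2))  # 6.77
```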
COMPUTING THE VARIANCE AND STANDARD DEVIATION OF THE THREE-YEAR ANNUALIZED RETURNS FOR SMALL-CAP GROWTH MUTUAL FUNDS WITH LOW RISK
The 838 mutual funds that are part of the Using Statistics scenario (see page 96) are classified according to the category (small cap, mid cap, and large cap), the type (growth or value), and the risk level of the mutual funds (low, average, and high). Compute the variance and standard deviation of the three-year annualized returns for the small-cap growth funds with low risk (see page 99).
SOLUTION Table 3.2 illustrates the computation of the variance and standard deviation for the three-year annualized returns for the small-cap growth funds with low risk.
• The more the data are spread out or dispersed, the larger the range, interquartile range, variance, and standard deviation.
• The more the data are concentrated or homogeneous, the smaller the range, interquartile range, variance, and standard deviation.
• If the values are all the same (so that there is no variation in the data), the range, interquartile range, variance, and standard deviation will all equal zero.

Unlike the previous measures of variation presented, the coefficient of variation is a relative measure of variation that is always expressed as a percentage rather than in terms of the units of the particular data. The coefficient of variation, denoted by the symbol CV, measures the scatter in the data relative to the mean.
3.2: Variation and Shape 111
For the sample of 10 getting-ready times, because $\bar{X} = 39.6$ and $S = 6.77$, the coefficient of variation is

$$CV = \left(\frac{S}{\bar{X}}\right)100\% = \left(\frac{6.77}{39.6}\right)100\% = 17.1\%$$
For the getting-ready times, the standard deviation is 17.1% of the size of the mean.
The coefficient of variation is very useful when comparing two or more sets of data that are measured in different units, as Example 3.10 illustrates.
EXAMPLE 3.10 COMPARING TWO COEFFICIENTS OF VARIATION WHEN TWO VARIABLES HAVE DIFFERENT UNITS OF MEASUREMENT
The operations manager of a package delivery service is deciding whether to purchase a new fleet of trucks. When packages are stored in the trucks in preparation for delivery, you need to consider two major constraints: the weight (in pounds) and the volume (in cubic feet) for each item.
The operations manager samples 200 packages and finds that the mean weight is 26.0 pounds, with a standard deviation of 3.9 pounds, and the mean volume is 8.8 cubic feet, with a standard deviation of 2.2 cubic feet. How can the operations manager compare the variation of the weight and the volume?
SOLUTION Because the measurement units differ for the weight and volume constraints, the operations manager should compare the relative variability in the two types of measurements.
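For weight, the coefficient of variation is

$$CV_W = \left(\frac{3.9}{26.0}\right)100\% = 15.0\%$$

For volume, the coefficient of variation is

$$CV_V = \left(\frac{2.2}{8.8}\right)100\% = 25.0\%$$

Thus, relative to the mean, the package volume is much more variable than the package weight.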
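Because the coefficient of variation is a ratio, it is a one-line function. The sketch below (function name illustrative) reproduces the three coefficients computed above; the units cancel, which is what makes the weight and volume figures comparable.

```python
def coefficient_of_variation(mean, std_dev):
    """Coefficient of variation in percent: (S / X-bar) * 100%."""
    return std_dev / mean * 100

print(round(coefficient_of_variation(39.6, 6.77), 1))  # 17.1 getting-ready times (minutes)
print(round(coefficient_of_variation(26.0, 3.9), 1))   # 15.0 package weight (pounds)
print(round(coefficient_of_variation(8.8, 2.2), 1))    # 25.0 package volume (cubic feet)
```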
Z Scores
An extreme value or outlier is a value located far away from the mean. Z scores are useful in identifying outliers. The larger the absolute value of the Z score, the greater the distance from the value to the mean. The Z score is the difference between the value and the mean, divided by the standard deviation.

Z SCORES
$$Z = \frac{X - \bar{X}}{S} \qquad (3.12)$$
For the time-to-get-ready data, the mean is 39.6 minutes, and the standard deviation is 6.77 minutes. The time to get ready on the first day is 39.0 minutes. You compute the Z score for Day 1 by using Equation (3.12):

$$Z = \frac{39.0 - 39.6}{6.77} = -0.09$$
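Equation (3.12) and the outlier rule of thumb stated with Table 3.3 (|Z| greater than 3.0) can be sketched as follows. The helper name is illustrative; the standard deviation is rounded to 6.77 so that the results match the hand calculations. `statistics.fmean` requires Python 3.8 or later.

```python
import statistics

ready_times = [29, 31, 35, 39, 39, 40, 43, 44, 44, 52]  # ranked getting-ready times
x_bar = statistics.fmean(ready_times)                  # 39.6
s = round(statistics.stdev(ready_times), 2)            # 6.77, rounded as in the text

def z_score(x, mean, std_dev):
    """Equation (3.12): distance from the mean in standard-deviation units."""
    return (x - mean) / std_dev

print(round(z_score(39, x_bar, s), 2))  # -0.09 (Day 1)
print(round(z_score(29, x_bar, s), 2))  # -1.57 (lowest time)
print(round(z_score(52, x_bar, s), 2))  # 1.83 (highest time)

# Rule of thumb: a value with Z below -3.0 or above +3.0 is treated as an outlier
outliers = [x for x in ready_times if abs(z_score(x, x_bar, s)) > 3.0]
print(outliers)  # []
```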
TABLE 3.3
Z Scores for the 10 Getting-Ready Times

Time (X) | Z score
39 | -0.09
29 | -1.57
43 | 0.50
52 | 1.83
39 | -0.09
44 | 0.65
40 | 0.06
31 | -1.27
44 | 0.65
35 | -0.68
Mean = 39.6; Standard deviation = 6.77

Table 3.3 shows the Z scores for all 10 days. The largest Z score is 1.83 for Day 4, on which the time to get ready was 52 minutes. The lowest Z score was -1.57 for Day 2, on which the time to get ready was 29 minutes. As a general rule, a Z score is considered an outlier if it is less than -3.0 or greater than +3.0. None of the times met that criterion to be considered outliers.

COMPUTING THE Z SCORES OF THE THREE-YEAR ANNUALIZED RETURNS FOR SMALL-CAP GROWTH MUTUAL FUNDS WITH LOW RISK
Shape influences the relationship of the mean to the median in the following ways:
Mean < median: negative, or left-skewed
Mean = median: symmetric, or zero skewness
Mean > median: positive, or right-skewed
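The three cases above amount to comparing two numbers. The following sketch (function name and the small tolerance are illustrative choices, not from the text) applies that comparison to data sets from this chapter; it is only the mean-versus-median rule of thumb, not a formal skewness statistic.

```python
from statistics import fmean, median

def shape_from_mean_median(values, tol=1e-9):
    """Classify shape by comparing the mean with the median."""
    m, md = fmean(values), median(values)
    if abs(m - md) <= tol:
        return "symmetric"
    return "right-skewed" if m > md else "left-skewed"

print(shape_from_mean_median([29, 31, 35, 39, 39, 40, 43, 44, 44, 52]))       # right-skewed (39.6 > 39.5)
print(shape_from_mean_median([1, 3, 0, 3, 26, 2, 7, 4, 0, 2, 3, 3, 6, 3]))    # right-skewed (4.5 > 3)
print(shape_from_mean_median([1, 2, 3, 4, 5]))                                 # symmetric
```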
Figure 3.1 depicts three data sets, each with a different shape.
FIGURE 3.1 Comparison of three data sets differing in shape
Panel A: Negative, or left-skewed
Panel B: Symmetrical
Panel C: Positive, or right-skewed
The data in Panel A are negative, or left-skewed. In this panel, most of the values are in the upper portion of the distribution. A long tail and distortion to the left are caused by some extremely small values. These extremely small values pull the mean downward so that the mean is less than the median.
The data in Panel B are symmetrical. Each half of the curve is a mirror image of the other half of the curve. The low and high values on the scale balance, and the mean equals the median.
The data in Panel C are positive, or right-skewed. In this panel, most of the values are in the lower portion of the distribution. A long tail on the right is caused by some extremely large values. These extremely large values pull the mean upward so that the mean is greater than the median.
VISUAL EXPLORATIONS Exploring Descriptive Statistics
You can use the Visual Explorations Descriptive Statistics procedure to see the effect of changing data values on measures of central tendency, variation, and shape. Open the Visual Explorations add-in workbook (see Appendix D) and select Visual Explorations → Descriptive Statistics (Excel 97-2003) or Add-ins → Visual Explorations → Descriptive Statistics (Excel 2007) from the menu bar. Read the instructions in the pop-up box and click OK to examine a dot-scale diagram for the sample of 10 getting-ready times used throughout this chapter.
Experiment by entering an extreme value such as 10 minutes into one of the tinted cells of column A. Which measures are affected by this change? Which ones are not? You can flip between the "before" and "after" diagrams by repeatedly pressing Ctrl+Z (undo) followed by Ctrl+Y (redo) to help see the changes the extreme value caused in the diagram.
...three groups. Each of the risk levels showed some evidence of pos...
...and coefficient of variation.
...Z scores. Are there any outliers?
...shape of the data set.
Suppose the rate of return for a particular ... during the past two years was 10% and 30%. Compute the geometric mean rate of return. (Note: A rate of return of 10% is recorded as 0.10, and a rate of return of 30% is recorded as 0.30.)
Applying the Concepts

3.6 The operations manager of a plant that manufactures tires wants to compare the actual inner diameters of two grades of tires, each of which is expected to be 575 millimeters. A sample of five tires of each grade was selected, and the results representing the inner diameters of the tires, ranked from smallest to largest, are as follows:

Grade X: 568 570 575 578 584
Grade Y: 573 574 575 577 578

a. For each of the two grades of tires, compute the mean, median, and standard deviation.
b. Which grade of tire is providing better quality? Explain.
c. What would be the effect on your answers in (a) and (b) if the last value for grade Y were 588 instead of 578? Explain.

3.7 The data in the accompanying file contain the price for two tickets with online service charges, large popcorn, and two medium soft drinks at a sample of six theatre chains:

$36.15 $31.00 $35.05 $40.25 $33.75 $43.00

Source: Extracted from K. Kelly, "The Multiplex Under Siege," The Wall Street Journal, December 24-25, 2005, pp. P1, P5.

a. Compute the mean, median, first quartile, and third quartile.
b. Compute the variance, standard deviation, range, interquartile range, and coefficient of variation.
c. Are the data skewed? If so, how?
d. Based on the results of (a) through (c), what conclusions can you reach concerning the cost of going to the movies?

3.8 A total of 92,000 new single-family homes were sold in the United States during February 2006. The median price of the homes was $230,400, a decrease of 2.9% from February 2005 (U.S. Census Bureau, www.census.gov). Why do you think the Census Bureau refers to the median price instead of the mean price?

3.9 The data in the accompanying file contain the bounced check fees, in dollars, for a sample of 23 banks for direct-deposit customers who maintain a $100 balance:

26 28 20 20 21 22 25 25 18 25 15 20
18 20 25 25 22 30 30 30 15 20 29

Source: Extracted from "The New Face of Banking," June 2000. Copyright © 2000 by Consumers Union of U.S., Inc., Yonkers, NY 10703-1057.

a. Compute the mean, median, first quartile, and third quartile.
b. Compute the variance, standard deviation, range, interquartile range, coefficient of variation, and Z scores.
c. Are the data skewed? If so, how?
d. Based on the results of (a) through (c), what conclusions can you reach concerning the bounced check fees?
6L'E 6r '9 9V'9 Zy S 8€'0 0I '9 js 'v
ff i i l i lE V9'E VE Z LL'V E I '9 Z0'E S9'S rZ'V
the variance, standard deviation, range, interquartile range, coefficient of variation, and Z
Are there any outliers? Explain. Are the data skewed? If so, how?
walks into the branch office during the ... she asks the branch manager how long she can expect to wait. The branch manager replies, "Almost ... less than five minutes." On the basis of the results of (a) through (c), evaluate the accuracy of this
that another branch, located in a residential area, ... concerned with the noon-to-1 p.m. lunch
... waiting time, in minutes (defined as the time ... enters the line to when he or she reaches the ...), of a sample of 15 customers during this
over a period of one week. The results
5 ar) 8.02 5.79 9.73 3.82 9.01 9.35
ffi"ffi8 5.64 4.09 6.t7 g.gI 5.47
the mean, median, first quartile, and third
the variance, standard deviation, range, interquartile range, and coefficient of variation. Are
... finance.yahoo.com, April 17, 2006).
the geometric mean rate of increase for the
... $1,000 of GE stock at the start of 2004, what was its value at the end of 2005?
... the result of (b) to that of Problem 3.18 (b).
International, Inc., develops, manufactures, and sells nonlethal self-defense devices known as
tions institutions, and the military, TASER's popularity has enjoyed a roller-coaster ride. The stock price in 2004 increased 361.4%, but in 2005, it decreased 78.0% (Source: Extracted from finance.yahoo.com, April 17, 2006).
a. Compute the geometric mean rate of increase for the two-year period 2004-2005. (Hint: Denote an increase of 361.4% as $R_1 = 3.614$.)
b. If you purchased $1,000 of TASER stock at the start of 2004, what was its value at the end of 2005?
c. Compare the result of (b) to that of Problem 3.17 (b).
3.19 In 2002, all the major stock market indexes decreased dramatically as the attacks on 9/11 drove stock prices spiraling downward. Stocks soon rebounded, but what type of mean return did investors experience over the four-year period from 2002 to 2005? The data in the following table (contained in the accompanying data file) represent the total rate of return (in percentage) for the Dow Jones Industrial Average (DJIA), the Standard & Poor's 500 (S&P 500), and the technology-heavy NASDAQ Composite (Nasdaq).
Year DJIA s&P 500 NASDAQ
2005200412001200t/
-0._63.4
30.0* 16.8
2.9g.L
26.4-24.2
r .48.6
50.0-3 1.5
Are there any outliers? Explain. Are the data skewed? If so, how?
walks into the branch office during the ... he asks the branch manager how long he can expect to wait. The branch manager replies, "Almost ... less than five minutes." On the basis of the results of (a) through (c), evaluate the accuracy of this
Electric (GE) is one of the world's largest ... develops, manufactures, and markets a wide ..., including medical diagnostic imaging, ... engines, lighting products, and chemicals. ... NBC Universal, GE produces and delivers television and motion pictures. In 2004, GE's stock price increased 20.6%, but in 2005, the price dropped 1.4%.
Year Platinum
12.3
5.736.024.6
period 2004-2005. (Hint: Denote an increase of 20.6% as $R_1 = 0.206$.)
Source: Extracted from finance.yahoo.com, April 14, 2006.
a. Calculate the geometric mean rate of return for the DJIA, S&P 500, and Nasdaq.
b. What conclusions can you reach concerning the geometric rates of return of the three market indexes?
c. Compare the results of (b) to those of Problem 3.20 (b).
3.20 In 2002-2005 precious metals changed rapidly in value. The data in the following table (contained in the accompanying data file) represent the total rate of return (in percentage) for platinum, gold, and silver:
Gold Silver
200520042003i2402
17.84.6
19.92,5.6
29.514,:C77 i&'."3'3
Source: Extracted from www.kitco.com, April 14, 2006.
a. Calculate the geometric mean rate of return for platinum, gold, and silver.
b. What conclusions can you reach concerning the geometric rates of return of the three precious metals?
c. Compare the results of (b) to those of Problem 3.19 (b).ing primarily to law enforcement, correc-
3.3 NUMERICAL DESCRIPTIVE MEASURES FOR A POPULATION

Sections 3.1 and 3.2 present various statistics that describe the properties of central tendency and variation for a sample. If your data set represents numerical measurements for an entire population, you need to calculate and interpret parameters, summary measures for a population. In this section, you will learn about three descriptive population parameters: the population mean, population variance, and population standard deviation.

To help illustrate these parameters, first refer to Table 3.5, which contains the one-year return for the five largest bond funds (in terms of total assets) as of January 2006.
3.3: Numerical Descriptive Measures for a Population 119
The Population Variance and Standard Deviation
The population variance and the population standard deviation measure variation in a population. Like the related sample statistics, the population standard deviation is the square root of the population variance. The symbol $\sigma^2$, the Greek lowercase letter sigma squared, represents the population variance, and the symbol $\sigma$, the Greek lowercase letter sigma, represents the population standard deviation. Equations (3.14) and (3.15) define these parameters. The denominators for the right-side terms in these equations use N and not the (n - 1) term that is used in the equations for the sample variance and standard deviation [see Equations (3.9) and (3.10) on page 107].
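POPULATION VARIANCE
$$\sigma^2 = \frac{\sum_{i=1}^{N} (X_i - \mu)^2}{N} \qquad (3.14)$$

POPULATION STANDARD DEVIATION
$$\sigma = \sqrt{\frac{\sum_{i=1}^{N} (X_i - \mu)^2}{N}} \qquad (3.15)$$

where
$\mu$ = population mean
$X_i$ = ith value of the variable X
N = number of values in the population
$\sum_{i=1}^{N} (X_i - \mu)^2$ = summation of all the squared differences between the $X_i$ values and $\mu$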
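The only computational difference between Equation (3.14) and the sample variance of Equation (3.9) is the denominator. The sketch below uses a small set of hypothetical values (not from the text) treated as an entire population; the standard-library functions `statistics.pvariance` (divides by N) and `statistics.variance` (divides by n - 1) are shown for comparison.

```python
import statistics

def population_variance(values):
    """Equation (3.14): sum of squared deviations from mu, divided by N."""
    mu = sum(values) / len(values)
    return sum((x - mu) ** 2 for x in values) / len(values)

data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical values, treated as a whole population

sigma2 = population_variance(data)
sigma = sigma2 ** 0.5             # Equation (3.15)
print(sigma2, sigma)              # 4.0 2.0
print(statistics.pvariance(data))  # same value: the library also divides by N
print(statistics.variance(data))   # larger: the sample version divides by n - 1
```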
To compute the population variance for the data of Table 3.5, you use Equation (3.14):
Thus, the variance of the one-year returns is 0.46 squared percentage return. The squared units make the variance hard to interpret. You should use the standard deviation that is expressed in the original units of the data (percentage return). From Equation (3.15),
A population of 12-ounce cans of cola is known to have a mean fill-weight of 12.06 ounces and a standard deviation of 0.02. The population is known to be bell shaped. Describe the distribution of fill-weights. Is it very likely that a can will contain less than 12 ounces of cola?
You can use this rule for any value of k greater than 1. Consider k = 2. The Chebyshev rule states that at least $[1 - (1/2)^2] \times 100\% = 75\%$ of the values must be found within ±2 standard deviations of the mean.
The Chebyshev rule is very general and applies to any type of distribution. The rule indicates at least what percentage of the values fall within a given distance from the mean. However, if the data set is approximately bell shaped, the empirical rule will more accurately reflect the greater concentration of data close to the mean. Table 3.6 compares the Chebyshev and empirical rules.
TABLE 3.6
% of Values Found in Intervals Around the Mean

Interval | Chebyshev (any distribution) | Empirical Rule (bell-shaped distribution)
$(\mu - \sigma, \mu + \sigma)$ | At least 0% | Approximately 68%
$(\mu - 2\sigma, \mu + 2\sigma)$ | At least 75% | Approximately 95%
$(\mu - 3\sigma, \mu + 3\sigma)$ | At least 88.89% | Approximately 99.7%
EXAMPLE 3.13 USING THE CHEBYSHEV RULE
As in Example 3.12, a population of 12-ounce cans of cola is known to have a mean fill-weight of 12.06 ounces and a standard deviation of 0.02. However, the shape of the population is unknown, and you cannot assume that it is bell shaped. Describe the distribution of fill-weights. Is it very likely that a can will contain less than 12 ounces of cola?
SOLUTION $\mu \pm \sigma = 12.06 \pm 0.02 = (12.04, 12.08)$
$\mu \pm 2\sigma = 12.06 \pm 2(0.02) = (12.02, 12.10)$
$\mu \pm 3\sigma = 12.06 \pm 3(0.02) = (12.00, 12.12)$
Because the distribution may be skewed, you cannot use the empirical rule. Using the Chebyshev rule, you cannot say anything about the percentage of cans containing between 12.04 and 12.08 ounces. You can state that at least 75% of the cans will contain between 12.02 and 12.10 ounces and at least 88.89% will contain between 12.00 and 12.12 ounces. Therefore, between 0 and 11.11% of the cans will contain less than 12 ounces.
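The Chebyshev bounds used in this solution follow from the single expression $(1 - 1/k^2) \times 100\%$. A minimal sketch (function name illustrative) that reproduces the intervals and minimum percentages for the cola example:

```python
def chebyshev_min_pct(k):
    """Chebyshev rule: at least (1 - 1/k**2) * 100% of values lie within k SDs of the mean."""
    return max(0.0, 1 - 1 / k**2) * 100

mu, sigma = 12.06, 0.02  # cola fill-weights (ounces)
for k in (1, 2, 3):
    low, high = mu - k * sigma, mu + k * sigma
    print(f"k={k}: ({low:.2f}, {high:.2f}) at least {chebyshev_min_pct(k):.2f}%")

# At most 100 - 88.89 = 11.11% of cans fall outside (12.00, 12.12),
# so at most 11.11% can contain less than 12 ounces.
print(round(100 - chebyshev_min_pct(3), 2))  # 11.11
```

For k = 1 the bound is 0%, which is why the solution says nothing can be concluded about the interval (12.04, 12.08).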
You can use these two rules for understanding how data are distributed around the mean when you have sample data. In each case, you use the value you calculated for $\bar{X}$ in place of $\mu$ and the value you calculated for S in place of $\sigma$. The results you compute using the sample statistics are approximations because you used sample statistics ($\bar{X}$, S) and not population parameters ($\mu$, $\sigma$).
Learning the Basics
3.21 The following is a set of data for a population with N = 10:
8 3 62 98
a. Compute the population mean.
b. Compute the population standard deviation.
3.22 The following is a set of data for a population with N = 10:
7 5 6 6 6 4 8 6 9 3
a. Compute the population mean.
b. Compute the population standard deviation.
a. Compute the mean, variance, and standard deviation for this population.
b. What proportion of these businesses have quarterly sales tax receipts within ±1, ±2, or ±3 standard deviations of the mean?
...summary and the box-and-whisker plot (references ...).
9'8 6 '8 I '0I 9 ' L S'0I
8 ' L 9 ' I I IU S' I I E '6
9'0I tz l t 'OI E'6 9 ' ( ' r
8 '7,1 0 '0I z '6 6 'Z1 0 '0I
9 'L s'9 9'ZI I 'S I 9 ' I I
I ' I I Z 'OI I ' I I
L '8 8 ' I I 0 '8
E'L Z' IT O'€I
0 ' I I L '9 0 '€ I
9 '6 I ' I I € 'OI
3.23 The data in the accompanying file represent the quarterly sales tax receipts (in thousands of dollars) submitted to the comptroller of the Village of Fair Lake for the period ending March 2006 by all 50 business establishments in that locale:
Applying the Concepts
pury€4uoJ ,,qllepld
V:r{s16 spund u€clretuv
V:VOI spun{ u€clreurv
^q ixopul 00S P.ren8ue.6
y iorg spunC u€clroruv
8'6 6'6s'6 9 '0I9 'Zr E'S
€'0I t '8s'vl 0'6
r '09v'290'Lgi'eBg',tL
s lseErel
slessv punf,
'spury
o^lJ or{} Jo 'sre11op Jo suoIIIIq uI 'slessu
@ollJ or{} ur ewp oqJ, g?'tlueseJder
, I
; illilril
3.4: Exploratory Data Analysis 123

The Five-Number Summary

A five-number summary that consists of

X_smallest   Q1   Median   Q3   X_largest

provides a way to determine the shape of a distribution. Table 3.7 explains how the relationships among the "five numbers" allow you to recognize the shape of a data set.
TABLE 3.7 Relationships Among the Five-Number Summary and the Type of Distribution

Comparison: the distance from X_smallest to the median versus the distance from the median to X_largest
  Left-skewed: The distance from X_smallest to the median is greater than the distance from the median to X_largest.
  Symmetric: Both distances are the same.
  Right-skewed: The distance from X_smallest to the median is less than the distance from the median to X_largest.

Comparison: the distance from X_smallest to Q1 versus the distance from Q3 to X_largest
  Left-skewed: The distance from X_smallest to Q1 is greater than the distance from Q3 to X_largest.
  Symmetric: Both distances are the same.
  Right-skewed: The distance from X_smallest to Q1 is less than the distance from Q3 to X_largest.

Comparison: the distance from Q1 to the median versus the distance from the median to Q3
  Left-skewed: The distance from Q1 to the median is greater than the distance from the median to Q3.
  Symmetric: Both distances are the same.
  Right-skewed: The distance from Q1 to the median is less than the distance from the median to Q3.

For the sample of 10 getting-ready times, the smallest value is 29 minutes and the largest value is 52 minutes (see page 100). Calculations done in Section 3.1 show that the median = 39.5, Q1 = 35, and Q3 = 44. Therefore, the five-number summary is

29   35   39.5   44   52

The distance from X_smallest to the median (39.5 - 29 = 10.5) is slightly less than the distance from the median to X_largest (52 - 39.5 = 12.5). The distance from X_smallest to Q1 (35 - 29 = 6) is slightly less than the distance from Q3 to X_largest (52 - 44 = 8). Therefore, the getting-ready times are slightly right-skewed.
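The comparisons in Table 3.7 are mechanical enough to script. In the sketch below (the function name `compare_distances` and the tolerance parameter are hypothetical), each comparison reports the distribution type it points to on its own; the text weighs the comparisons together, so treat the output as an aid rather than a verdict. Applied to the getting-ready times (29, 35, 39.5, 44, 52), the first two comparisons point to right skew, while the third finds Q1 and Q3 equidistant from the median (4.5 minutes each).

```python
def compare_distances(x_smallest, q1, median, q3, x_largest, tol=1e-9):
    """Apply the three comparisons of Table 3.7 to a five-number summary.

    Returns a dict mapping each comparison to the distribution type it suggests.
    Distances within `tol` of each other are treated as equal (float rounding).
    """
    pairs = {
        "X_smallest-median vs median-X_largest": (median - x_smallest, x_largest - median),
        "X_smallest-Q1 vs Q3-X_largest": (q1 - x_smallest, x_largest - q3),
        "Q1-median vs median-Q3": (median - q1, q3 - median),
    }
    result = {}
    for label, (left, right) in pairs.items():
        if abs(left - right) <= tol:
            result[label] = "symmetric"
        elif left < right:
            result[label] = "right-skewed"
        else:
            result[label] = "left-skewed"
    return result


for label, verdict in compare_distances(29, 35, 39.5, 44, 52).items():
    print(f"{label}: {verdict}")
```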
EXAMPLE 3.14

COMPUTING THE FIVE-NUMBER SUMMARY OF THE THREE-YEAR ANNUALIZED RETURNS FOR SMALL-CAP GROWTH MUTUAL FUNDS WITH LOW RISK
The 838 mutual funds that are part of the Using Statistics scenario (see page 96) are classified according to the category (small cap, mid cap, and large cap), the type (growth or value), and the risk level of the mutual funds (low, average, and high). Compute the five-number summary of the three-year annualized returns for the small-cap growth funds with low risk (see page 99).
SOLUTION From previous computations for the three-year annualized returns for the small-cap growth funds with low risk (see pages 100, 102, and 103), the median = 22.4, Q1 = 20.8, and Q3 = 26.0. In addition, the smallest value in the data set is 19.0, and the largest value is 29.9. Therefore, the five-number summary is

19.0   20.8   22.4   26.0   29.9

The three comparisons listed in Table 3.7 are used to evaluate skewness. The distance from X_smallest to the median (22.4 - 19.0 = 3.4) is less than the distance from the median to X_largest (29.9 - 22.4 = 7.5). The distance from X_smallest to Q1 (20.8 - 19.0 = 1.8) is less than the distance from Q3 to X_largest (29.9 - 26.0 = 3.9). The distance from Q1 to the median (22.4 - 20.8 = 1.6) is less than the distance from the median to Q3 (26.0 - 22.4 = 3.6). Therefore, the three-year annualized returns are right-skewed.
FIGURE 3.4 Excel box-and-whisker plots of the three-year annualized returns for low-risk, average-risk, and high-risk funds (chart title: Three-Year Annualized Return By Risk)
Figure 3.5 demonstrates the relationship between the box-and-whisker plot and the polygon for four different types of distributions. (Note: The area under each polygon is split into quartiles corresponding to the five-number summary for the box-and-whisker plot.)
FIGURE 3.5 Box-and-whisker plots and corresponding polygons for four distributions. Panel A: bell-shaped distribution. Panel B: left-skewed distribution. Panel C: right-skewed distribution. Panel D: rectangular distribution.
Panels A and D of Figure 3.5 are symmetrical. In these distributions, the mean and median are equal. In addition, the length of the left whisker is equal to the length of the right whisker, and the median line divides the box in half.
Panel B of Figure 3.5 is left-skewed. The few small values distort the mean toward the left tail. For this left-skewed distribution, the skewness indicates that there is a heavy clustering of values at the high end of the scale (i.e., the right side); 75% of all values are found between the left edge of the box (Q1) and the end of the right whisker (X_largest). Therefore, the long left whisker contains the smallest 25% of the values, demonstrating the distortion from symmetry in this data set.
Panel C of Figure 3.5 is right-skewed. The concentration of values is on the low end of the scale (i.e., the left side of the box-and-whisker plot). Here, 75% of all data values are found between the beginning of the left whisker (X_smallest) and the right edge of the box (Q3), and the remaining 25% of the values are dispersed along the long right whisker at the upper end of the scale.
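The pull of a few extreme values on the mean, described for Panels B and C, shows up even in tiny samples. The two data sets below are made up for illustration; comparing the mean with the median is only a rough heuristic for the direction of skew, not a formal measure of skewness.

```python
from statistics import mean, median

# Hypothetical samples: a few small values (long left tail) and
# a few large values (long right tail).
samples = {
    "left-skewed": [2, 8, 9, 9, 10, 10, 10],
    "right-skewed": [1, 1, 1, 2, 2, 3, 12],
}

for name, data in samples.items():
    m, md = mean(data), median(data)
    direction = "below" if m < md else "above" if m > md else "equal to"
    print(f"{name}: mean {m:.2f} is {direction} median {md}")
```

In the left-skewed sample the single small value drags the mean (about 8.29) below the median of 9; in the right-skewed sample the single large value lifts the mean (about 3.14) above the median of 2.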
eAJ ree.(-ent; e PTffi'CO-:ro 'lunocc€ 1o>lJeur .(euoul erpJo plelf eqt JoJ Suo4
A bank branch located in a commercial district of a city has developed an improved process for serving customers during the noon-to-1:00 p.m. lunch period. The waiting time, in minutes (defined as the time the customer enters the line to when he or she reaches the teller window), of a sample of 15 customers during this hour is recorded over a period of one week. The results are contained in the data file and are listed below:

… 3.02 5.13 4.77 2.34 3.54 3.20
… 0.38 5.12 6.46 6.19 3.79

Another branch, located in a residential area, is also concerned with the noon-to-1 p.m. lunch hour. The waiting
3.5: The Covariance and the Coefficient of Correlation I27
time, in minutes (defined as the time the customer enters the line to when he or she reaches the teller window), of a sample of 15 customers during this hour is recorded over a period of one week. The results are contained in the data file and are listed below:
9.66 5.90 8.02 5.79 8.73 3.82 8.01 8.35
10.49 6.68 5.64 4.08 6.17 9.91 5.47
a. List the five-number summaries of the waiting times at the two bank branches.
b. Construct box-and-whisker plots and describe the shape of the distribution of each for the two bank branches.
c. What similarities and differences are there in the distributions of the waiting time at the two bank branches?
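Part (a) asks for five-number summaries. Section 3.1, which defines how quartiles are located, is not part of this excerpt, so the sketch below assumes a common textbook convention: Q1 is the (n + 1)/4 ranked value, the median the (n + 1)/2 ranked value, and Q3 the 3(n + 1)/4 ranked value; a position ending in .5 averages the two neighboring ranked values, and any other fractional position is rounded to the nearest whole position. Check these rules against Section 3.1 before relying on them; `statistics.quantiles` in Python's standard library interpolates instead and can give slightly different quartiles. The function names are hypothetical.

```python
def ranked_value(sorted_values, position):
    """Value at a 1-based ranked position under the assumed rounding rules."""
    whole = int(position)
    fraction = position - whole
    if fraction == 0:
        return sorted_values[whole - 1]
    if fraction == 0.5:
        # Fractional half: average the two neighboring ranked values.
        return (sorted_values[whole - 1] + sorted_values[whole]) / 2
    # Any other fraction (.25 or .75): round to the nearest ranked position.
    return sorted_values[round(position) - 1]


def five_number_summary(values):
    """Return (X_smallest, Q1, median, Q3, X_largest)."""
    if len(values) < 2:
        raise ValueError("need at least two values")
    s = sorted(values)
    n = len(s)
    return (
        s[0],
        ranked_value(s, (n + 1) / 4),
        ranked_value(s, (n + 1) / 2),
        ranked_value(s, 3 * (n + 1) / 4),
        s[-1],
    )


print(five_number_summary(range(1, 11)))  # (1, 3, 5.5, 8, 10)
```

For each branch, pass the 15 waiting times to `five_number_summary`; with n = 15 every position is a whole number (4, 8, and 12), so no rounding rule is needed.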
3.5 THE COVARIANCE AND THE COEFFICIENT OF CORRELATION

In Section 2.5, you used scatter plots to visually examine the relationship between two numerical variables. This section presents two numerical measures that examine the relationship between two numerical variables: the covariance and the coefficient of correlation.
The Covariance
The covariance measures the strength of the linear relationship between two numerical variables (X and Y). Equation (3.16) defines the sample covariance, and Example 3.16 illustrates its use.
THE SAMPLE COVARIANCE

$$\operatorname{cov}(X, Y) = \frac{\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}{n - 1} \qquad (3.16)$$
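Equation (3.16) translates directly into code. This is a minimal sketch in plain Python (the function name is hypothetical); on Python 3.10 or later, `statistics.covariance` computes the same n - 1 sample covariance.

```python
def sample_covariance(xs, ys):
    """cov(X, Y) = sum((X_i - X_bar) * (Y_i - Y_bar)) / (n - 1), Equation (3.16)."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two samples of equal length, n >= 2")
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    return sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / (n - 1)


# Hypothetical paired data: Y rises with X, so the covariance is positive.
print(sample_covariance([1, 2, 3, 4, 5], [2, 4, 6, 8, 10]))  # 5.0
```

The size of the covariance depends on the units in which X and Y are measured, so its sign is easier to interpret than its magnitude; the coefficient of correlation, discussed next, removes that dependence.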
EXAMPLE 3.16

COMPUTING THE SAMPLE COVARIANCE
In Section 2.5 on page 58, you examined the relationship between the cost of a fast-food hamburger meal and the cost of two movie tickets in 10 cities around the world (extracted from K. Spors, "Keeping Up with . . . Yourself," The Wall Street Journal, April 11, 2005, p. R4). The data file contains the complete data set. Compute the sample covariance.
SOLUTION Table 3.8 provides the cost of a fast-food hamburger meal and the cost of two movie tickets in 10 cities around the world.
TABLE 3.8 Cost of a fast-food hamburger meal and cost of two movie tickets in 10 cities around the world
In Panel A of Figure 3.7, there is a perfect negative linear relationship between X and Y. Thus, the coefficient of correlation, ρ, equals -1, and when X increases, Y decreases in a perfectly predictable manner. Panel B shows a situation in which there is no relationship between X and Y. In this case, the coefficient of correlation, ρ, equals 0, and as X increases, there is no tendency for Y to increase or decrease. Panel C illustrates a perfect positive relationship where ρ equals +1. In this case, Y increases in a perfectly predictable manner when X increases.
When you have sample data, the sample coefficient of correlation, r, is calculated. When using sample data, you are unlikely to have a sample coefficient of exactly +1, 0, or -1. Figure 3.8 presents scatter plots along with their respective sample coefficients of correlation, r, for six data sets, each of which contains 100 values of X and Y.
FIGURE 3.8 Six scatter plots created from Microsoft Excel and their sample coefficients of correlation, r (Panel A: r = -0.9; Panel B: r = -0.6; Panel C: r = -0.3; Panel D: r = 0.3; Panel F: r = 0.9)
In Panel A, the coefficient of correlation, r, is -0.9. You can see that for small values of X, there is a very strong tendency for Y to be large. Likewise, the large values of X tend to be paired with small values of Y. The data do not all fall on a straight line, so the association between X and Y cannot be described as perfect. The data in Panel B have a coefficient of correlation equal to -0.6, and the small values of X tend to be paired with large values of Y. The linear relationship between X and Y in Panel B is not as strong as that in Panel A, and the coefficient of correlation in Panel B is not as negative as that in Panel A. In Panel C, the linear relationship between X and Y is very weak, r = -0.3, and there is only a slight tendency for the small values of X to be paired with the larger values of Y. Panels D through F depict data sets that have positive coefficients of correlation because small values of X tend to be paired with small values of Y and the large values of X tend to be associated with large values of Y.

In the discussion of Figure 3.8, the relationships were deliberately described as tendencies and not as causes and effects. This wording was used on purpose. Correlation alone cannot prove that there is a causation effect, that is, that the change in the value of one variable caused the change in the other variable. A strong correlation can be produced simply by chance or by the effect of a third variable not considered in the calculation of the correlation.

Equation (3.17) defines the sample coefficient of correlation, r, and Example 3.17 illustrates its use.

THE SAMPLE COEFFICIENT OF CORRELATION

$$r = \frac{\operatorname{cov}(X, Y)}{S_X S_Y} \qquad (3.17)$$

where

$$\operatorname{cov}(X, Y) = \frac{\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}{n - 1}$$

$$S_X = \sqrt{\frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1}}$$

$$S_Y = \sqrt{\frac{\sum_{i=1}^{n} (Y_i - \bar{Y})^2}{n - 1}}$$
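Equation (3.17) and its component formulas can be sketched the same way. The function below (name hypothetical) builds r from the sample covariance and the two sample standard deviations, all with n - 1 divisors; because the divisors cancel, using n throughout would give the same r. The usage lines check two limiting cases from the discussion of Figure 3.7, on arbitrary X values: points lying exactly on an upward-sloping line give r = +1, and on a downward-sloping line r = -1.

```python
import math


def sample_correlation(xs, ys):
    """r = cov(X, Y) / (S_X * S_Y), Equation (3.17)."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two samples of equal length, n >= 2")
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    cov = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / (n - 1)
    s_x = math.sqrt(sum((x - x_bar) ** 2 for x in xs) / (n - 1))
    s_y = math.sqrt(sum((y - y_bar) ** 2 for y in ys) / (n - 1))
    if s_x == 0 or s_y == 0:
        raise ValueError("r is undefined when either variable is constant")
    return cov / (s_x * s_y)


xs = [7, 5, 8, 3, 6, 10, 12, 4, 9, 15, 18]
print(round(sample_correlation(xs, [3 * x for x in xs]), 6))   # 1.0
print(round(sample_correlation(xs, [-2 * x for x in xs]), 6))  # -1.0
```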
EXAMPLE 3.17

COMPUTING THE SAMPLE COEFFICIENT OF CORRELATION
Consider the cost of a fast-food hamburger meal and the cost of two movie tickets in 10 cities around the world (see Table 3.8 on page 127). From Figure 3.9 and Equation (3.17), compute the sample coefficient of correlation.
The cost of a fast-food hamburger meal and the cost of two movie tickets are positively correlated. Those cities with the lowest cost of a fast-food hamburger meal tend to be associated with the lowest cost of two movie tickets. Those cities with the highest cost of a fast-food hamburger meal tend to be associated with the highest cost of two movie tickets. This relationship is fairly strong, as indicated by a coefficient of correlation, r = 0.8348.
You cannot assume that having a low cost of a fast-food hamburger meal caused the low cost of two movie tickets. You can only say that this is what tended to happen in the sample.
In summary, the coefficient of correlation indicates the linear relationship, or association, between two numerical variables. When the coefficient of correlation gets closer to +1 or -1, the linear relationship between the two variables is stronger. When the coefficient of correlation is near 0, little or no linear relationship exists. The sign of the coefficient of correlation indicates whether the data are positively correlated (i.e., the larger values of X are typically paired with the larger values of Y) or negatively correlated (i.e., the larger values of X are typically paired with the smaller values of Y). The existence of a strong correlation does not imply a causation effect. It only indicates the tendencies present in the data.
M,, i , , f i I i l r , i l r r ) r { i l i l i l . . i i l l l l l
dec II€us Ieuolleurelul pu€ spuoq 'S'O 'tl'g- s€.,K s>lcols
dec e8rel Ieuorl€uralul pue spuoq 'S'n Jo luetulsonuluo uJnleJ er{} uee,&ueq uor}€leJJoc Jo }uolclileoc eq} wqlpelels spuoq u8rerog ur ]uerulselur pqssncslp wqt ( t C'd 6
nyz 'gZ JegruenoN 'pu,tnoy pa"tts na/ll aqJ ,.'spunguEre;og ur or loJuod {co}S r leql Jo %08 o} dn }ndplnoqs srolsenul {q4,, 'slueulolJ 'f) elcl}r€ uV 6g'g
' (e) $turolqord Jo esoql ol (e) Jo sllnser er{} ereduro3 'q
eslueIulse^ul
Jo sedfi rer{}o e^lJ esoqt Jo r{c€e pue s>lco}s 'S 'n Joluorulsonul uo uJnleJ eq] uao.ttleq dlqsuoll€leJ er{} Joql8uer1s eql lnoqe e>leru nof uec suolsnlouoc 13q16 'E
'89'0 seln lqep sle>lreru Eur8rerue pu€ s>lcols 'S'n pue
dec lleurs Iauortaurelul pu€ s>lcols 'S'n '08'0 sen{ s>lco}s
dec e8rel Ieuolleurelul pue s>lcols 'S'n Jo luetulsonuluo uJnloJ eI{} uee./Kleq uol}sleJJoc Jo }uelclJJeoc eq} Wqlpel€ls s>lcols u8rerog ur luetulse^ul pessncslp wql (tC'd 6
n1Z 'gZ Jeqruolo5l 'lnutnoy paus Ua/U aqJ ..'spunguSrerog ur orloJuod {co}S rlaql Jo %08 o} dn }ndplnoqs srolsalul f,{16,, 'slueulel3 'f) elclue uV 8t't
Learning the Basics

3.37 The following is a set of data from a sample of n = 11 items:
X: 7 5 8 3 6 10 12 4 9 15 18
Y: 21 15 24 9 18 30 36 12 27 45 54
a. Compute the covariance.
b. Compute the coefficient of correlation.
c. How strong is the relationship between X and Y? Explain.

Applying the Concepts
3.6: Pitfalls in Numerical Descriptive Measures and Ethical Issues 133
College basketball is big business, with coaches' salaries, revenues, and expenses in millions of dollars. The data in the file contains the coaches' salary and revenue for college basketball at selected schools in a recent year (extracted from R. Adams, "Pay ...," The Wall Street Journal, March 11-12, 2006).
a. Compute the covariance.
b. Compute the coefficient of correlation.
c. What conclusions can you reach about the relationship between a coach's salary and revenue?
3.6 PITFALLS IN NUMERICAL DESCRIPTIVE MEASURES AND ETHICAL ISSUES

In this chapter, you have studied how a set of numerical data can be characterized by various statistics that measure the properties of central tendency, variation, and shape. Your next step is analysis and interpretation of the calculated statistics. Your analysis is objective; your interpretation is subjective. You must avoid errors that may arise either in the objectivity of your analysis or in the subjectivity of your interpretation.
The analysis of the mutual funds is objective and reveals several impartial findings. Objectivity in data analysis means reporting the most appropriate numerical descriptive measures for a given data set. Now that you have read the chapter and have become familiar with various numerical descriptive measures and their strengths and weaknesses, how should you proceed with the objective analysis? Because the data distribute in a slightly asymmetrical manner, shouldn't you report the median in addition to the mean? Doesn't the standard deviation provide more information about the property of variation than the range? Should you describe the data set as right-skewed?
On the other hand, data interpretation is subjective. Different people form different conclusions when interpreting the analytical findings. Everyone sees the world from different perspectives. Thus, because data interpretation is subjective, you must do it in a fair, neutral, and clear manner.
Ethical Issues

Ethical issues are vitally important to all data analysis. As a daily consumer of information, you need to question what you read in newspapers and magazines, what you hear on the radio or television, and what you see while surfing the Internet. Over time, much skepticism has been expressed about the purpose, the focus, and the objectivity of published studies. Perhaps no comment on this topic is more telling than a quip often attributed to the famous nineteenth-century British statesman Benjamin Disraeli: "There are three kinds of lies: lies, damned lies, and statistics."
Ethical considerations arise when you are deciding what results to include in a report. You should document both good and bad results. In addition, when making oral presentations and presenting written reports, you need to give results in a fair, objective, and neutral manner. Unethical behavior occurs when you willfully choose an inappropriate summary measure (for example, the mean for a very skewed set of data) to distort the facts in order to support a particular position. In addition, unethical behavior occurs when you selectively fail to report pertinent findings because it would be detrimental to the support of a particular position.
3.43 College football players trying out for the NFL are given the Wonderlic standardized intelligence test. The data in the file contains the average Wonderlic score of football players trying out for the NFL and the graduation rate for football players at selected schools (extracted from S. Walker, "The NFL's Smartest Team," The Wall Street Journal, September 30, 2005, pp. W1, W10).
a. Compute the covariance.
b. Compute the coefficient of correlation.
c. What conclusions can you reach about the relationship between the average Wonderlic score and graduation rate?
Summary of Numerical Descriptive Measures

Sample mean:
$$\bar{X} = \frac{\sum_{i=1}^{n} X_i}{n}$$

Median = (n + 1)/2 ranked value

First quartile: Q1 = (n + 1)/4 ranked value

Third quartile: Q3 = 3(n + 1)/4 ranked value

Geometric mean:
$$\bar{X}_G = (X_1 \times X_2 \times \cdots \times X_n)^{1/n}$$

Geometric mean rate of return:
$$\bar{R}_G = [(1 + R_1) \times (1 + R_2) \times \cdots \times (1 + R_n)]^{1/n} - 1$$

Range = X_largest - X_smallest

Interquartile range = Q3 - Q1

Sample variance:
$$S^2 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1}$$

Sample standard deviation:
$$S = \sqrt{S^2}$$

Variance, coefficient of variation, Z scores, box-and-whisker plot (Sections 3.2-3.4)

Covariance, coefficient of correlation (Section 3.5)
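The two geometric-mean formulas in this summary can be sketched in Python as follows. This is a minimal sketch (the function names are hypothetical); it uses `math.prod`, available from Python 3.8, and expects rates of return as decimals (0.10 for 10%). The usage lines show why the geometric rate matters: a 100% gain followed by a 50% loss leaves an investment where it started, so the geometric mean rate of return is 0, even though the arithmetic mean of the two rates is 25%.

```python
import math


def geometric_mean(values):
    """X_G = (X_1 * X_2 * ... * X_n) ** (1/n); every value must be positive."""
    if not values or any(v <= 0 for v in values):
        raise ValueError("geometric mean needs one or more positive values")
    return math.prod(values) ** (1 / len(values))


def geometric_mean_rate_of_return(rates):
    """R_G = [(1 + R_1) * (1 + R_2) * ... * (1 + R_n)] ** (1/n) - 1."""
    return geometric_mean([1 + r for r in rates]) - 1


print(geometric_mean([2, 8]))                        # 4.0
print(geometric_mean_rate_of_return([1.00, -0.50]))  # 0.0
print(sum([1.00, -0.50]) / 2)                        # 0.25, the arithmetic mean
```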
a. Compute the mean, median, range, and standard deviation for the width. Interpret these measures of central tendency and variability.
b. List the five-number summary.
c. Construct a box-and-whisker plot and describe its shape.
d. What can you conclude about the number of troughs that will meet the company's requirement of troughs being between 8.31 and 8.61 inches wide?
3.59 The manufacturing company in Problem 3.58 also produces electric insulators. If the insulators break when in use, a short circuit is likely to occur. To test the strength of the insulators, destructive testing is carried out to determine how much force is required to break the insulators. Force is measured by observing how many pounds must be applied to the insulator before it breaks. The data from 30 insulators subjected to this experiment are contained in the file:
trfffii56 il .6 10 1,634 I ,7 84 | ,522 I ,696 I ,592 | ,662
mLru4 N.662 1,734 1,774 1,550 r,756 r,762 1,966
il[/ff i i fr [,688 1,910 1,752 1,690 1,910 1,652 1,736
a. Compute the mean, median, range, and standard deviation for the force variable.
b. Interpret the measures of central tendency and variability.
c. Construct a box-and-whisker plot and describe its shape.
d. What can you conclude about the strength of the insulators if the company requires a force measurement of at least ... pounds before breakage?
3.60 Problems with a telephone line that prevent a customer from receiving or making calls are disconcerting to both the customer and the telephone company. The data contained in the file represent samples of 20 problems reported to two different offices of a telephone company and the time to clear these problems, in minutes, from the customers' lines:
Central Office I Time to Clear Problems (minutes)
, f f i ' r-78 2.85 0.52 1.60 4.15 3.97 1.48 3.10
rffit93 1.60 0.80 1.05 6.32 3.93 5.45 0.97
Central Office II Time to Clear Problems (minutes)
t f f i i [0 1.10 0.60 0.52 3.30 2.10 0.58 4.A2
m.97 0.60 1.s3 4.23 0.08 1.48 r.65 0.72
For each of the two central office locations:
a. Compute the mean, median, first quartile, and third quartile.
b. Compute the range, interquartile range, variance, standard deviation, and coefficient of variation.
Chapter Review Problems I37
c. Construct side-by-side box-and-whisker plots. Are the data skewed? If so, how?
d. On the basis of the results of (a) through (c), are there any differences between the two central offices? Explain.
3.61 In many manufacturing processes, the term work-in-process (often abbreviated WIP) is used. In a book manufacturing plant, the WIP represents the time it takes for sheets from a press to be folded, gathered, sewn, tipped on end sheets, and bound. The data contained in the file represent samples of 20 books at each of two production plants and the processing time (operationally defined as the time, in days, from when the books came off the press to when they were packed in cartons) for these jobs:

For each of the two plants:
a. Compute the mean, median, first quartile, and third quartile.
b. Compute the range, interquartile range, variance, standard deviation, and coefficient of variation.
c. Construct side-by-side box-and-whisker plots. Are the data skewed? If so, how?
d. On the basis of the results of (a) through (c), are there any differences between the two plants? Explain.
3.62 The data contained in the file consist of the in-state tuition and fees and the out-of-state tuition and fees for four-year colleges with the highest percentage of students graduating within six years.

Source: U.S. Department of Education, 2006.

For each variable:
a. Compute the mean, median, first quartile, and third quartile.
b. Compute the range, interquartile range, variance, standard deviation, and coefficient of variation.
c. Construct a box-and-whisker plot. Are the data skewed? If so, how?
d. Compute the coefficient of correlation between the in-state tuition and fees and the out-of-state tuition and fees.
e. What conclusions can you reach concerning the in-state tuition and fees and the out-of-state tuition and fees?
… measurements made on the company's Boston shingles and 140 measurements made on Vermont shingles.
a. List the five-number summary for the Boston shingles and for the Vermont shingles.
b. Construct side-by-side box-and-whisker plots for the two brands of shingles and describe the shapes of the distributions.
c. Comment on the shingles' ability to achieve a granule loss of ... or less.
The data in the file represent the results of the American Community Survey, a sampling of 700,000 households in each state during the 2000 U.S. Census. For each of the variables average travel-to-work time in minutes, percentage of homes with eight or more rooms, median household income, and percentage of mortgage-paying homeowners whose housing costs exceed 30% of income:
a. Compute the mean, median, first quartile, and third quartile.
b. Compute the range, interquartile range, variance, standard deviation, and coefficient of variation.
c. Construct a box-and-whisker plot. Are the data skewed? If so, how?
d. What conclusions can you reach concerning the mean travel-to-work time in minutes, percentage of homes with eight or more rooms, median household income, and percentage of mortgage-paying homeowners whose housing costs exceed 30% of income?
The economics of baseball has caused a great deal of controversy, with owners arguing that they are losing money, players arguing that owners are making money, and fans complaining about how expensive it is to attend a game and watch games on cable television. In addition to team statistics for the 2001 season, the file contains team-by-team statistics on ticket prices; the fan cost index; regular season gate receipts; local television, radio, and cable receipts; all other operating revenue; player compensation and benefits; national and other local expenses; and income from baseball operations. For each variable:
a. Compute the mean, median, first quartile, and third quartile.
b. Compute the range, interquartile range, variance, standard deviation, and coefficient of variation.
c. Construct a box-and-whisker plot. Are the data skewed?
d. Compute the correlation between the number of wins and player compensation and benefits. How strong is the relationship between these two variables?
e. What conclusions can you reach concerning the regular season gate receipts; local television, radio, and cable receipts; all other operating revenue; player compensation and benefits; national and other local expenses; and income from baseball operations?
3.69 In Section 3.5 on page 131, the correlation coefficient between the cost of a fast-food hamburger meal and the cost of movie tickets in 10 different cities was computed. The data file also includes the overall cost index, the monthly rent for a two-bedroom apartment, and the costs of a cup of coffee with service, dry cleaning for a men's blazer, and toothpaste.
a. Compute the correlation coefficient between the overall cost index and the monthly rent for a two-bedroom apartment, the cost of a cup of coffee with service, the cost of a fast-food hamburger meal, the cost of dry cleaning a men's blazer, the cost of toothpaste, and the cost of movie tickets. (There will be six separate correlation coefficients.)
b. What conclusions can you reach about the relationship of the overall cost index to each of these six variables?
3.70 The data in the file contains the characteristics for a sample of 20 chicken sandwiches from fast-food chains.
a. Compute the correlation coefficient between calories and carbohydrates.
b. Compute the correlation coefficient between calories and sodium.
c. Compute the correlation coefficient between calories and total fat.
d. Which variable (total fat, carbohydrates, or sodium) seems to be most closely related to calories? Explain.
3.71 The data in the file represent the total compensation (in $millions) of CEOs of the 100 largest companies, by revenue (extracted from "Special Report: Executive Compensation," USA Today, April 10, 2006, pp. 3B, 4B).
a. Compute the mean, median, first quartile, and third quartile.
b. Compute the range, interquartile range, variance, standard deviation, and coefficient of variation.
c. Construct a box-and-whisker plot. Are the data skewed? If so, how?
d. What conclusions can you draw concerning the total compensation (in $millions) of CEOs?
3.72 The data in the file is the per capita spending, in thousands of dollars, for each state in 2004.
a. Compute the mean, median, first quartile, and third quartile.
b. Compute the range, interquartile range, variance, standard deviation, and coefficient of variation.
c. Construct side-by-side box-and-whisker plots. Are the data skewed? If so, how?
d. What conclusions can you reach concerning per capita spending, in thousands of dollars, for each state in 2004?
The data in the file contain the following variables from a sample of 838 mutual funds:
… Type of stocks comprising the mutual fund (small cap, mid cap, large cap)
… Objective of stocks comprising the mutual fund (growth or value)
… In millions of dollars
… Sales charges (no or yes)

For the variables expense ratio in percentage, 2005 return, three-year return, and five-year return,
a. Compute the mean, median, first quartile, and third quartile.
b. Compute the range, interquartile range, variance, standard deviation, and coefficient of variation.
c. Construct a box-and-whisker plot. Are the data skewed? If so, how?
d. What conclusions can you reach concerning these variables?
d. What conclusions can you reach about differences between mutual funds that have a growth objective and those that have a value objective?
3.79 You wish to compare small cap, mid cap, and large cap mutual funds. For each of these three groups, for the variables expense ratio in percentage, 2005 return, three-year return, and five-year return,
a. Compute the mean, median, first quartile, and third quartile.
b. Compute the range, interquartile range, variance, standard deviation, and coefficient of variation.
c. Construct a box-and-whisker plot. Are the data skewed? If so, how?
d. What conclusions can you reach about differences between small cap, mid cap, and large cap mutual funds?
Student Survey Data Base

3.80 Problem 1.27 on page 15 describes a survey of 50 undergraduate students (see the file). For these data, for each numerical variable,
a. Compute the mean, median, first quartile, and third quartile.
b. Compute the range, interquartile range, variance, standard deviation, and coefficient of variation.
c. Construct a box-and-whisker plot. Are the data skewed? If so, how?
d. Write a report summarizing your conclusions.
3.81 Problem 1.27 on page 15 describes a survey of 50 undergraduate students (see the file).
a. Select a sample of 50 undergraduate students at your school and conduct a similar survey for those students.
b. For the data collected in (a), repeat (a) through (d) of Problem 3.80.
c. Compare the results of (b) to those of Problem 3.80.
3.82 Problem 1.28 on page 15 describes a survey of 50 MBA students (see the file). For these data, for each numerical variable,
a. Compute the mean, median, first quartile, and third quartile.
b. Compute the range, interquartile range, variance, standard deviation, and coefficient of variation.
c. Construct a box-and-whisker plot. Are the data skewed? If so, how?
d. Write a report summarizing your conclusions.
3.83 Problem 1.28 on page 15 describes a survey of 50 MBA students (see the file).
a. Select a sample of 50 graduate students from your MBA program and conduct a similar survey for those students.
b. For the data collected in (a), repeat (a) through (d) of Problem 3.82.
c. Compare the results of (b) to those of Problem 3.82.
to net assets ln per-
mutual fund (low,
You wish to compare mutual funds that have fees to those that do not have fees. For each of these two groups, for the variables expense ratio in percentage, 2005 return, three-year return, and five-year return,
a. Compute the mean, median, first quartile, and third quartile.
b. Compute the range, interquartile range, variance, standard deviation, and coefficient of variation.
c. Construct a box-and-whisker plot. Are the data skewed?
d. What conclusions can you reach about differences between mutual funds that have fees and those that do not have fees?
You wish to compare mutual funds that have a growth objective to those that have a value objective. For each of these two groups, for the variables expense ratio in percentage, 2005 return, three-year return, and five-year return,
a. Compute the mean, median, first quartile, and third quartile.
b. Compute the range, interquartile range, variance, standard deviation, and coefficient of variation.
c. Construct a box-and-whisker plot. Are the data skewed?
Tukey, J., Exploratory Data Analysis (Reading, MA: Addison-Wesley, 1977).
Velleman, P. F., and D. C. Hoaglin, Applications, Basics, and Computing of Exploratory Data Analysis (Boston: Duxbury Press, 1981).