Chapter 1
Data and Statistics

Statistics in practice: The Economist
1.1 Applications in business and economics
    Accounting
    Finance
    Marketing
    Production
    Economics
1.2 Data
    Elements, variables and observations
    Scales of measurement
    Qualitative and quantitative data
    Cross-sectional and time series data
1.3 Data sources
    Existing sources
    Statistical studies
    Data acquisition errors
1.4 Descriptive statistics
1.5 Statistical inference
1.6 Computers and statistical analysis
CHAPTER 1 DATA AND STATISTICS
After reading this chapter and doing the exercises, you should be able to:
1 Appreciate the breadth of statistical applications in business and economics.
2 Understand the meaning of the terms elements, variables, and observations as they are used in statistics.
3 Understand the difference between qualitative, quantitative, cross-sectional and time series data.
4 Find out about data sources available for statistical analysis both internal and external to the firm.
5 Appreciate how errors can arise in data.
6 Understand the meaning of descriptive statistics and statistical inference.
7 Distinguish between a population and a sample.
8 Understand the role a sample plays in making statistical inferences about the population.
Frequently, we see the following kinds of statements in newspaper and magazine articles:
• The Ifo World Economic Climate Index fell again substantially in January 2009. The climate indicator stands at 50.1 (1995 = 100), its historically lowest level since introduction in the early 1980s (CESifo, April 2009).
• The IMF projected the global economy would shrink 1.3 per cent in 2009 (Fin24, 23 April 2009).
• The Footsie finished the week on a winning streak despite shock figures that showed the economy has contracted by almost 2 per cent already in 2009 (This is Money, 25 April 2009).
• China's growth rate fell to 6.1 per cent in the year to the first quarter (The Economist, 16 April 2009).
• GM receives further $2 bn in loans (BBC News, 24 April 2009).
• Handset shipments to drop by 20 per cent ('In-Stat' 2009).
The numerical facts in the preceding statements (50.1, 1.3 per cent, 2 per cent, 6.1 per cent, $2 bn, 20 per cent) are called statistics. Thus, in everyday usage, the term statistics refers to numerical facts. However, the field, or subject, of statistics involves much more than numerical facts. In a broad sense, statistics is the art and science of collecting, analyzing, presenting and interpreting data. Particularly in business and economics, the information provided by collecting, analyzing, presenting and interpreting data gives managers and decision-makers a better understanding of the business and economic environment and thus enables them to make more informed and better decisions. In this text, we emphasize the use of statistics for business and economic decision-making.
Chapter 1 begins with some illustrations of the applications of statistics in business and economics. In Section 1.2 we define the term data and introduce the concept of a data set. This section also introduces key terms such as variables and observations, discusses the difference between quantitative and qualitative data, and illustrates the uses of cross-sectional and time series data. Section 1.3 discusses how data can be obtained from existing sources or through survey and experimental studies designed to obtain new data. The important role that the Internet now plays in obtaining data is also highlighted. The use of data in developing descriptive statistics and in making statistical inferences is described in Sections 1.4 and 1.5.
Statistics in practice: The Economist

Founded in 1843, The Economist is an international weekly news and business magazine written for top-level business executives and political decision makers. The publication aims to provide readers with in-depth analyses of international politics, business news and trends, global economics and culture.

The Economist is published by the Economist Group, an international company employing nearly 1000 staff worldwide, with offices in London, Frankfurt, Paris and Vienna; in New York, Boston and Washington DC; and in Hong Kong, mainland China, Singapore and Tokyo. Between 1998 and 2008 the magazine's worldwide circulation grew by 100 per cent, recently exceeding 180 000 in the UK, 230 000 in continental Europe, 780 000 plus copies in North America and nearly 30 000 in the Asia-Pacific region. It is read in more than 200 countries and, with a readership of 4 million, is one of the world's most influential business publications. Along with the Financial Times, it is arguably one of the two most successful print publications to be introduced in the US market during the past decade.

Complementing The Economist brand within the Economist Brand family, the Economist Intelligence Unit provides access to a comprehensive database of worldwide indicators and forecasts covering more than 200 countries, 45 regions and eight key industries. The Economist Intelligence Unit aims to help executives make informed business decisions through dependable intelligence delivered online, in print, in customized research as well as through conferences and peer interchange.

Alongside the Economist Brand family, the Group manages and runs the CFO and Government brand families for the benefit of senior finance executives and government decision makers (in Brussels and Washington) respectively.

Source: Economist Intelligence Unit website. Reproduced with permission.

1.1 Applications in business and economics
In today's global business and economic environment, anyone can access vast amounts of statistical information. The most successful managers and decision-makers understand the information and know how to use it effectively. In this section, we provide examples that illustrate some of the uses of statistics in business and economics.
Accounting
Public accounting firms use statistical sampling procedures when conducting audits for their clients. For instance, suppose an accounting firm wants to determine whether the amount of accounts receivable shown on a client's balance sheet fairly represents the actual amount of accounts receivable. Usually the large number of individual accounts receivable makes reviewing and validating every account too time-consuming and expensive. As common practice in such situations, the audit staff selects a subset of the accounts called a sample. After reviewing the accuracy of the sampled accounts, the auditors draw a conclusion as to whether the accounts receivable amount shown on the client's balance sheet is acceptable.
Finance
Financial analysts use a variety of statistical information to guide their investment recommendations. In the case of stocks, the analysts review a variety of financial data including price/earnings ratios and dividend yields. By comparing the information for an individual stock with information about the stock market averages, a financial analyst can begin to draw a conclusion as to whether an individual stock is over- or under-priced. Similarly, historical trends in stock prices can provide a helpful indication of when investors might consider entering (or re-entering) the market. For example, Money Week (3 April 2009) reported a Goldman Sachs analysis indicating that, because stocks were unusually cheap at the time, real average returns of up to 6 per cent in the US and 7 per cent in Britain might be possible over the next decade, based on long-term cyclically adjusted price/earnings ratios.
Marketing
Electronic scanners at retail checkout counters collect data for a variety of marketing research applications. For example, data suppliers such as ACNielsen purchase point-of-sale scanner data from grocery stores, process the data and then sell statistical summaries of the data to manufacturers. Manufacturers spend vast amounts per product category to obtain this type of scanner data. Manufacturers also purchase data and statistical summaries on promotional activities such as special pricing and the use of in-store displays. Brand managers can review the scanner statistics and the promotional activity statistics to gain a better understanding of the relationship between promotional activities and sales. Such analyses often prove helpful in establishing future marketing strategies for the various products.
Production
Today's emphasis on quality makes quality control an important application of statistics in production. A variety of statistical quality control charts are used to monitor the output of a production process. In particular, an x-bar chart can be used to monitor the average output. Suppose, for example, that a machine fills containers with 330 g of a soft drink. Periodically, a production worker selects a sample of containers and computes the average number of grams in the sample. This average, or x-bar value, is plotted on an x-bar chart. A plotted value above the chart's upper control limit indicates overfilling, and a plotted value below the chart's lower control limit indicates underfilling. The process is termed 'in control' and allowed to continue as long as the plotted x-bar values fall between the chart's upper and lower control limits. Properly interpreted, an x-bar chart can help determine when adjustments are necessary to correct a production process.
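The x-bar check described above can be sketched in a few lines of Python. This is only an illustration: the text gives the 330 g target but no control limits, so the limits of 325 g and 335 g and the sample weights below are invented for the example, and the function name classify_sample is our own.

```python
from statistics import mean

# Hypothetical control limits in grams; the text gives only the 330 g target.
LOWER_CONTROL_LIMIT = 325.0
UPPER_CONTROL_LIMIT = 335.0


def classify_sample(fill_weights):
    """Return the x-bar value of one sample and where it sits against the limits."""
    x_bar = mean(fill_weights)
    if x_bar > UPPER_CONTROL_LIMIT:
        status = "overfilling"
    elif x_bar < LOWER_CONTROL_LIMIT:
        status = "underfilling"
    else:
        status = "in control"
    return x_bar, status


# Invented samples of container weights (grams), one list per periodic check.
samples = [
    [329.0, 331.0, 330.0, 332.0],
    [336.0, 337.0, 335.0, 338.0],
    [322.0, 324.0, 323.0, 325.0],
]

for weights in samples:
    x_bar, status = classify_sample(weights)
    print(f"x-bar = {x_bar:.1f} g -> {status}")
```

In practice the control limits would be derived from the process data rather than chosen by hand; the sketch only shows how each plotted x-bar value is compared with them.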
Economics
Economists frequently provide forecasts about the future of the economy or some aspect of it. They use a variety of statistical information in making such forecasts. For instance, in forecasting inflation rates, economists use statistical information on such indicators as the Producer Price Index, the unemployment rate, and manufacturing capacity utilization. Often these statistical indicators are entered into computerized forecasting models that predict inflation rates.

Applications of statistics such as those described in this section are an integral part of this text. Such examples provide an overview of the breadth of statistical applications. To supplement these examples, chapter-opening Statistics in Practice articles obtained from a variety of topical sources are used to introduce the material covered in each chapter. These articles show the importance of statistics in a wide variety of business and economic situations.

1.2 Data
Data are the facts and figures collected, analyzed and summarized for presentation and interpretation. All the data collected in a particular study are referred to as the data set for the study. Table 1.1 shows a data set summarizing information for equity (share) trading at the 22 European Stock Exchanges in March 2009.
Elements, variables and observations
Elements are the entities on which data are collected. For the data set in Table 1.1, each individual European exchange is an element; the element names appear in the first column. With 22 exchanges, the data set contains 22 elements.
A variable is a characteristic of interest for the elements. The data set in Table 1.1 includes the following three variables:
• Exchange: at which the equities were traded.
• Trades: number of trades during the month.
• Turnover: value of trades (€m) during the month.
Measurements collected on each variable for every element in a study provide the data. The set of measurements obtained for a particular element is called an observation. Referring to Table 1.1, we see that the set of measurements for the first observation (Athens Exchange) is 599 192 and 2009.8. The set of measurements for the second observation (Borsa Italiana) is 5 921 099 and 44 385.9; and so on. A data set with 22 elements contains 22 observations.
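To make the vocabulary concrete, the short Python sketch below stores the two observations quoted above. The class name Observation and the field names are our own choices for illustration, and only the two elements whose measurements appear in the text are included; the remaining 20 exchanges are left out rather than invented.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class Observation:
    """One observation: the measurements recorded for a single element."""
    exchange: str          # element name (where the equities were traded)
    trades: int            # number of trades during the month
    turnover_eur_m: float  # value of trades (EUR m) during the month


# Only the two observations quoted in the text are reproduced here.
data_set = [
    Observation("Athens Exchange", 599_192, 2009.8),
    Observation("Borsa Italiana", 5_921_099, 44_385.9),
]

variables = [field.name for field in fields(Observation)]
elements = [observation.exchange for observation in data_set]

print("Variables:", variables)
print("Elements:", elements)
print("Number of observations:", len(data_set))
```

Each row is an observation, each field is a variable, and the number of observations equals the number of elements.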
Scales of measurement
Data collection requires one of the following scales of measurement: nominal, ordinal, interval or ratio. The scale of measurement determines the amount of information contained in the data and indicates the most appropriate data summarization and statistical analyses.
When the data for a variable consist of labels or names used to identify an attribute of the element, the scale of measurement is considered a nominal scale. For example, referring to the data in Table 1.1, we see that the scale of measurement for the exchange variable is nominal because Athens Exchange, Borsa Italiana . . . Wiener Börse are labels used to identify where the equities are traded. In cases where the scale of measurement is nominal, a numeric code as well as non-numeric labels may be used. For example, to facilitate data collection and to prepare the data for entry into a computer database, we might use a numeric code by letting 1 denote the Athens Exchange, 2, the Borsa Italiana . . . and 22, Wiener Börse. In this case the numeric values 1, 2, . . . 22 provide the labels used to identify where the stock is traded. The scale of measurement is nominal even though the data appear as numeric values.
The scale of measurement for a variable is called an ordinal scale if the data exhibit the properties of nominal data and the order or rank of the data is meaningful. For example, Eastside Automotive sends customers a questionnaire designed to obtain data on the quality of its automotive repair service. Each customer provides a repair service rating of excellent, good or poor. Because the data obtained are the labels (excellent, good or poor), the data have the properties of nominal data. In addition, the data can be ranked, or ordered, with respect to the service quality. Data recorded as excellent indicate the best service, followed by good and then poor. Thus, the scale of measurement is ordinal. Note that the ordinal data can also be recorded using a numeric code. For example, we could use 1 for excellent, 2 for good and 3 for poor to maintain the properties of ordinal data. Thus, data for an ordinal scale may be either non-numeric or numeric.
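The difference between nominal and ordinal codes can be sketched in Python as follows. The codes are the ones suggested in the text (1, 2, . . . 22 for the exchanges; 1, 2, 3 for the ratings); the list of customer ratings is invented for the illustration.

```python
# Nominal scale: the codes are labels only, so their order carries no meaning.
exchange_codes = {"Athens Exchange": 1, "Borsa Italiana": 2, "Wiener Börse": 22}
code_to_exchange = {code: name for name, code in exchange_codes.items()}

# Ordinal scale: the codes also preserve the ranking of the labels.
rating_codes = {"excellent": 1, "good": 2, "poor": 3}

# Invented questionnaire responses, for illustration only.
ratings = ["good", "excellent", "poor", "good"]

# Sorting by code is meaningful for ordinal data: best service comes first.
ranked = sorted(ratings, key=lambda rating: rating_codes[rating])

print(code_to_exchange[2])
print(ranked)
```

Sorting the exchange codes would run without error as well, but the resulting order would say nothing about the exchanges; only the ordinal codes can be ranked meaningfully.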
The scale of measurement for a variable becomes an interval scale if the data show the properties of ordinal data and the interval between values is expressed in terms
Many situations require data for a large group of elements (individuals, companies, voters, households, products, customers and so on). Because of time, cost and other considerations, data can be collected from only a small portion of the group. The larger group of elements in a particular study is called the population, and the smaller group is called the sample. Formally, we use the following definitions.
Population
A population is the set of all elements of interest in a particular study.

Sample
A sample is a subset of the population.
The process of conducting a survey to collect data for the entire population is called a census. The process of conducting a survey to collect data for a sample is called a sample survey. As one of its major contributions, statistics uses data from a sample to make estimates and test hypotheses about the characteristics of a population through a process referred to as statistical inference.
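The relationship between a population, a sample and an estimate can be sketched as follows. Everything in this example is hypothetical: the population of 1000 account balances is generated at random purely for illustration, and the sample size of 50 is an arbitrary choice.

```python
import random
from statistics import mean

rng = random.Random(42)  # fixed seed so the illustration is repeatable

# Hypothetical population: 1000 invented account balances.
population = [round(rng.uniform(100.0, 5000.0), 2) for _ in range(1000)]

# A sample is a subset of the population, here selected at random.
sample = rng.sample(population, k=50)

# A census would use every element; a sample survey uses only the sample.
population_mean = mean(population)
sample_mean = mean(sample)

print(f"Population mean (census):        {population_mean:.2f}")
print(f"Sample mean (estimate, n = 50):  {sample_mean:.2f}")
```

The sample mean serves as an estimate of the population mean; how good such an estimate is likely to be is the subject of statistical inference.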
Hours until failure for a sample of 200 light bulbs for the Electronica Nieves example
tedious without a computer. To facilitate computer usage, the larger data sets in this book are available on the CD that accompanies the text. A logo in the left margin of the text (e.g. Nieves) identifies each of these data sets. The data files are available in MINITAB, PASW and EXCEL formats. In addition, we provide instructions at the end of chapters for carrying out many of the statistical procedures using MINITAB, PASW and EXCEL.
1 Discuss the differences between statistics as numerical facts and statistics as a discipline or field of study.
2 Every year Condé Nast Traveler conducts an annual survey of subscribers to determine the best new places to stay throughout the world. Table 1.6 shows the ten hotels that were most highly ranked in their 2006 'hot list' survey. Note that (daily) rates quoted are for double rooms and are variously expressed in US dollars, British pounds or euros.
a. How many elements are in this data set?
b. How many variables are in this data set?
c. Which variables are qualitative and which variables are quantitative?
d. What type of measurement scale is used for each of the variables?
3 Refer to Table 1.6.
a. What is the average number of rooms for the ten hotels?
b. If €1 = US$1.3149 = £0.8986, compute the average room rate in euros.
c. What is the percentage of hotels located in Portugal?
d. What is the percentage of hotels with 20 rooms or fewer?
COMPUTERS AND STATISTICAL ANALYSIS
4 Audio systems are typically made up of an MP3 player, a mini disk player, a cassette player, a CD player and separate speakers. The data in Table 1.7 show the product rating and retail price range for a popular selection of systems. Note that the code Y is used to confirm when a player is included in the system, N when it is not. Output power (watts) details are also provided (Kelkoo Electronics 2006).
a. How many elements does this data set contain?
b. What is the population?
c. Compute the average output power for the sample.
5 Consider the data set for the sample of eight audio systems in Table 1.7.
a. How many variables are in the data set?
b. Which of the variables are quantitative and which are qualitative?
c. What percentage of the audio systems has a four star rating or higher?
d. What percentage of the audio systems includes an MP3 player?
Table 1.7 column headings: Brand and model; Product rating (# of stars); Price (£); MP3 player; Mini disk player; CD player; Cassette player; Output (watts)
Technics I
SCEHT9OYamaha 3
r'1 170
Panasonic 5
SCPM29
Pure Digltal 3
DMX5OSony 5
CI.4TNEZ3Philips 4
FWI4589
PHILIPS 5
l"lcl'19
Samsung 5
IYM C6
Source: Kelkoo (http://audiovisual.kelkoo.co.uk)
320-400
167-)90
IBB
I B0 230
60- I 00
I 43-200
93 t10
t00-t30
N 360
N 50
7A
BO
30
400
r00
40
N
N
N
6 Columbia House provides CDs to its mail order club members. A Columbia House Music Survey asked new club members to complete an 11-question survey. Some of the questions asked were:
a. How many CDs have you bought in the last 12 months?
b. Are you currently a member of a national mail-order book club? (Yes or No)
c. What is your age?
d. Including yourself, how many people (adults and children) are in your household?
e. What kinds of music are you interested in buying? (15 categories were listed, including hard rock, soft rock, adult contemporary, heavy metal, rap and country.)
Comment on whether each question provides qualitative or quantitative data.
7 The Health & Wellbeing Survey ran over a three week period (ending 19 October 2007) and 389 respondents took part. The survey asked the respondents to respond to the statement, 'How would you describe your own physical health at this time?' (http://inform.glam.ac.uk/news/2007/10/24/health-wellbeing-staff-survey-results/). Response categories were strongly agree, agree, neither agree or disagree, disagree, and strongly disagree.
a. What was the sample size for this survey?
b. Are the data qualitative or quantitative?
c. Would it make more sense to use averages or percentages as a summary of the data for this question?
d. Of the respondents, 57 per cent agreed with the statement. How many individuals provided this response?
8 State whether each of the following variables is qualitative or quantitative and indicate its measurement scale.
a. Age.
b. Gender.
c. Class rank.
d. Make of car.
e. Number of people favouring closer European integration.
9 Figure 1.7 provides a bar chart summarizing the actual earnings for Volkswagen for the years 2000 to 2008 (Source: Volkswagen AG Annual Reports 2001-2008).
a. Are the data qualitative or quantitative?
b. Are the data time series or cross-sectional?
c. What is the variable of interest?
d. Comment on the trend in Volkswagen's earnings over time. Would you expect to see an increase or decrease in 2009?
10 Refer again to the data in Table 1.7 for the audio systems. Are the data cross-sectional or time series? Why?
11 The marketing group at your company developed a new diet soft drink that it claims will capture a large share of the young adult market.
a. What data would you want to see before deciding to invest substantial funds in introducing the new product into the marketplace?
b. How would you expect the data mentioned in part (a) to be obtained?
Figure 1.7 Bar chart of earnings for Volkswagen, 2000 to 2008 (horizontal axis: Year; vertical axis scale: 0 to 120 000)
12 In a recent study of causes of death in men 60 years of age and older, a sample of 120 men indicated that 48 died as a result of some form of heart disease.
a. Develop a descriptive statistic that can be used as an estimate of the percentage of men 60 years of age or older who die from some form of heart disease.
b. Are the data on cause of death qualitative or quantitative?
c. Discuss the role of statistical inference in this type of medical research.
13 In 2007, 75.4 per cent of Economist readers had stayed in a hotel on business in the previous 12 months with 32.4 per cent of readers using first/business class for travel.
a. What is the population of interest in this study?
b. Is class of travel a qualitative or quantitative variable?
c. If a reader had stayed in a hotel on business in the previous 12 months would this be classed as a qualitative or quantitative variable?
d. Does this study involve cross-sectional or time series data?
e. Describe any statistical inferences The Economist might make on the basis of the survey.
CHAPTER 2 DESCRIPTIVE STATISTICS: TABULAR AND GRAPHICAL PRESENTATIONS
Table 2.1 Data from a sample of 50 soft drink purchases

Coke Classic   Diet Coke      Pepsi-Cola     Diet Coke      Coke Classic
Coke Classic   Dr Pepper      Diet Coke      Pepsi-Cola     Pepsi-Cola
Coke Classic   Dr Pepper      Sprite         Coke Classic   Diet Coke
Coke Classic   Coke Classic   Sprite         Coke Classic   Diet Coke
Coke Classic   Diet Coke      Coke Classic   Sprite         Pepsi-Cola
Coke Classic   Coke Classic   Coke Classic   Pepsi-Cola     Coke Classic
Sprite         Dr Pepper      Pepsi-Cola     Diet Coke      Pepsi-Cola
Coke Classic   Coke Classic   Coke Classic   Pepsi-Cola     Dr Pepper
Coke Classic   Diet Coke      Pepsi-Cola     Pepsi-Cola     Pepsi-Cola
Pepsi-Cola     Coke Classic   Dr Pepper      Pepsi-Cola     Sprite
more insight than the original data shown in Table 2.1. We see that Coke Classic is the leader, Pepsi-Cola is second, Diet Coke is third and Sprite and Dr Pepper are tied for fourth.
Relative frequency and percentage frequency distributions
A frequency distribution shows the number (frequency) of items in each of several non-overlapping classes. We are often interested in the proportion, or percentage, of items in each class. The relative frequency of a class equals the fraction or proportion of items belonging to a class. For a data set with n observations, the relative frequency of each class is:
Relative frequency
\[ \text{Relative frequency of a class} = \frac{\text{Frequency of the class}}{n} \qquad (2.1) \]

The percentage frequency of a class is the relative frequency multiplied by 100.
Table 2.2 Frequency distribution of soft drink purchases

Soft drink      Frequency
Coke Classic    19
Diet Coke        8
Dr Pepper        5
Pepsi-Cola      13
Sprite           5
Total           50
SUMMARIZING QUALITATIVE DATA
Table 2.3 Relative and percentage frequency distributions of soft drink purchases

Soft drink      Relative frequency   Percentage frequency
Coke Classic    0.38                  38
Diet Coke       0.16                  16
Dr Pepper       0.10                  10
Pepsi-Cola      0.26                  26
Sprite          0.10                  10
Total           1.00                 100
A relative frequency distribution is a tabular summary showing the relative frequency for each class. A percentage frequency distribution summarizes the percentage frequency for each class. Table 2.3 shows these distributions for the soft drink data. The relative frequency for Coke Classic is 19/50 = 0.38, the relative frequency for Diet Coke is 8/50 = 0.16 and so on. From the percentage frequency distribution, we see that 38 per cent of the purchases were Coke Classic, 16 per cent of the purchases were Diet Coke and so on. We can also note that 38 per cent + 26 per cent + 16 per cent = 80 per cent of the purchases were of the top three soft drinks.
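The calculations above follow directly from equation (2.1) and can be reproduced with a short Python sketch. The frequencies are those of the soft drink data; the variable names are our own.

```python
# Frequencies for the sample of 50 soft drink purchases.
frequencies = {
    "Coke Classic": 19,
    "Diet Coke": 8,
    "Dr Pepper": 5,
    "Pepsi-Cola": 13,
    "Sprite": 5,
}

n = sum(frequencies.values())  # number of observations

# Equation (2.1): relative frequency = frequency of the class / n.
relative = {drink: f / n for drink, f in frequencies.items()}

# Percentage frequency = relative frequency multiplied by 100.
percentage = {drink: 100 * f / n for drink, f in frequencies.items()}

# Combined share of the three most frequently purchased soft drinks.
top_three_share = sum(sorted(percentage.values(), reverse=True)[:3])

for drink in frequencies:
    print(f"{drink:<13} {relative[drink]:.2f} {percentage[drink]:5.0f}")
print(f"Top three soft drinks: {top_three_share:.0f} per cent")
```

The relative frequencies always sum to 1.00 and the percentage frequencies to 100, which provides a quick check on the arithmetic.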
Bar charts and pie charts
A bar chart, or bar graph, is a graphical device for depicting qualitative data summarized in a frequency, relative frequency, or percentage frequency distribution. On one axis of the chart (usually the horizontal axis), we specify the labels for the classes (categories) of data. A frequency, relative frequency or percentage frequency scale can be used for the other axis of the chart (usually the vertical axis). Then, using a bar of fixed width drawn above each class label, we make the length of the bar equal the frequency, relative frequency, or percentage frequency of the class. For qualitative data, the bars should be separated to emphasize the fact that each class is separate. Figure 2.1 shows a bar chart of the frequency distribution for the 50 soft drink purchases. The graphical presentation shows Coke Classic, Pepsi-Cola and Diet Coke to be the most preferred brands.

Figure 2.1 Bar chart of soft drink purchases (horizontal axis: Soft Drink; vertical axis: Frequency)
A pie chart is another way of presenting relative frequency and percentage frequency distributions for qualitative data. We first draw a circle to represent all of the data. Then we use the relative frequencies to subdivide the circle into sectors, or parts, that correspond to the relative frequency for each class. For example, because a circle contains 360 degrees and Coke Classic shows a relative frequency of 0.38, the sector of the pie chart labelled Coke Classic consists of 0.38(360) = 136.8 degrees. The sector of the pie chart labelled Diet Coke consists of 0.16(360) = 57.6 degrees. Similar calculations for the other classes give the pie chart in Figure 2.2. The numerical values shown for each sector can be frequencies, relative frequencies or percentage frequencies.
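The sector calculation can be sketched in Python as follows, using the soft drink frequencies. The function name sector_degrees is our own; it simply multiplies each relative frequency by 360.

```python
# Frequencies for the sample of 50 soft drink purchases.
frequencies = {
    "Coke Classic": 19,
    "Diet Coke": 8,
    "Dr Pepper": 5,
    "Pepsi-Cola": 13,
    "Sprite": 5,
}


def sector_degrees(class_frequencies):
    """Return the pie chart sector angle, in degrees, for each class."""
    n = sum(class_frequencies.values())
    # Relative frequency (f / n) multiplied by the 360 degrees in a circle.
    return {label: 360 * f / n for label, f in class_frequencies.items()}


angles = sector_degrees(frequencies)
for label, angle in angles.items():
    print(f"{label:<13} {angle:6.1f} degrees")
```

Because the relative frequencies sum to 1, the sector angles sum to 360 degrees and together fill the circle.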
Often the number of classes in a frequency distribution is the same as the number of categories found in the data, as is the case for the soft drink purchase data in this section. Data that included all soft drinks would require many categories, most of which would have a small number of purchases. Classes with smaller frequencies can be grouped into an aggregate class labelled 'other'. Classes with frequencies of 5 per cent or less would most often be treated in this fashion.
In quality control applications, bar charts are used to identify the most important causes of problems. When the bars are arranged in descending order of height from left to right with the most frequently occurring cause appearing first, the bar chart is called a Pareto diagram, named after its founder, Vilfredo Pareto, an Italian economist.
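The ordering that turns a bar chart into a Pareto diagram can be sketched as follows. The causes and their counts are invented for illustration; the cumulative percentage column is a common companion to a Pareto diagram rather than something the text requires.

```python
# Invented counts of problem causes, for illustration only.
cause_counts = {
    "Underfilled": 12,
    "Label misaligned": 31,
    "Cap loose": 7,
    "Dented": 50,
}

# Pareto ordering: descending frequency, most frequent cause first.
pareto = sorted(cause_counts.items(), key=lambda item: item[1], reverse=True)

total = sum(cause_counts.values())
running = 0
cumulative_percentages = []
for cause, count in pareto:
    running += count
    cumulative_percentages.append(100 * running / total)
    print(f"{cause:<17} {count:3d} {cumulative_percentages[-1]:6.1f}%")
```

Reading the bars from left to right then shows at a glance which few causes account for most of the problems.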
Methods
1 The response to a question has three alternatives: A, B and C. A sample of 120 responses provides 60 A, 24 B and 36 C. Construct the frequency and relative frequency distributions.
2 A partial relative frequency distribution is given below.

Class   Relative frequency
A       0.22
B       0.18
C       0.40
D

a. What is the relative frequency of class D?
b. The total sample size is 200. What is the frequency of class D?
c. Construct the frequency distribution.
d. Construct the percentage frequency distribution.
3 A questionnaire provides 58 Yes, 42 No and 20 No-opinion answers.
a. In the construction of a pie chart, how many degrees would be in the sector of the pie showing the Yes answers?
b. How many degrees would be in the sector of the pie showing the No answers?
c. Construct a pie chart.
d. Construct a bar chart.
Applications
4 Figures available on the Broadcasters' Audience Research Board website in October 2008 showed that four of the most popular shows broadcast on terrestrial television in the UK were The X Factor, Coronation Street, A Touch of Frost and Strictly Come Dancing. Data indicating the favourite show of a sample of 50 viewers follows.
a. Are these data qualitative or quantitative?
b. Construct frequency and percentage frequency distributions.
c. Construct a bar chart and a pie chart.
d. On the basis of the sample, which television show was the most popular? Which one was second?
5 A Wikipedia article (November 2008) listed the five most common last names in Israel as (in alphabetical order): Biton, Cohen, Levi, Mizrachi and Peretz. A sample of 50 individuals with one of these last names provided the following data.

Cohen   Cohen    Peretz   Cohen    Cohen      Cohen   Levi       Levi       Cohen      Mizrachi
Biton   Levi     Cohen    Peretz   Levi       Levi    Cohen      Cohen      Levi       Levi
Cohen   Cohen    Cohen    Levi     Cohen      Cohen   Mizrachi   Biton      Biton      Cohen
Levi    Peretz   Cohen    Cohen    Mizrachi   Cohen   Cohen      Mizrachi   Mizrachi   Cohen

Summarize the data by constructing the following:
a. Relative and percentage frequency distributions.
b. A bar chart.
c. A pie chart.
d. Based on these data, what are the three most common last names?
Strictly     Strictly     X Factor     Coronation   X Factor     X Factor     Coronation   X Factor     X Factor     Strictly
Strictly     Frost        Coronation   X Factor     Coronation   Strictly     X Factor     X Factor     X Factor     Coronation
Coronation   X Factor     Frost        X Factor     Coronation   Frost        Strictly     Coronation   Strictly     X Factor
Strictly     Frost        Frost        X Factor     Strictly     Strictly     X Factor     X Factor     Coronation   X Factor
X Factor     Coronation   Coronation   Coronation   X Factor     Strictly     X Factor     Frost        Frost        Strictly
6 The flexitime system at Electronics Associates allows employees to begin their working day at 7:00, 7:30, 8:00, 8:30, or 9:00 a.m. The following data represent a sample of the starting times selected by the employees.

7:00   8:30   9:00   8:00   7:30   7:30   8:30   8:30   7:30   7:00
8:30   8:30   8:00   8:00   7:30   8:30   7:00   9:00   8:30   8:00

Summarize the data by constructing the following:
a. A frequency distribution.
b. A percentage frequency distribution.
c. A bar chart.
d. A pie chart.
e. What do the summaries tell you about employee preferences in the flexitime system?
7 A Merrill Lynch Client Satisfaction Survey asked clients to indicate how satisfied they were with their financial consultant. Client responses were coded 1 to 7, with 1 indicating 'not at all satisfied' and 7 indicating 'extremely satisfied'. The following data are from a sample of 60 responses for a particular financial consultant.
5
7
6
5
6
5
a. Comment on why these data are qualitative.
b. Construct a frequency distribution and a relative frequency distribution for the data.
c. Construct a bar chart.
d. On the basis of your summaries, comment on the clients' overall evaluation of the financial consultant.
766716666441151653776617
557365567761676415766666556466
Frequency distribution
As defined in Section 2.1, a frequency distribution is a tabular summary of data showing the number (frequency) of items in each of several non-overlapping classes. This definition holds for quantitative as well as qualitative data. However, with quantitative data there is usually more work involved in defining the non-overlapping classes to be used in the frequency distribution.
Consider the quantitative data in Table 2.4. These data show the time in days required to complete year-end audits for a sample of 20 clients of Sanderson and Clifford, a small accounting firm. The data are rounded to the nearest day. The three steps necessary to define the classes for a frequency distribution with quantitative data are:
1 Determine the number of non-overlapping classes.
2 Determine the width of each class.
3 Determine the class limits.
SUMMARIZING QUANTITATIVE DATA
Table 2.4 Year-end audit times (in days)

12   14   19   18
15   15   18   17
20   27   22   23
22   21   33   28
14   18   16   13
We demonstrate these steps by constructing a frequency distribution for the audit time data in Table 2.4.
Number of classes
Classes are formed by specifying ranges that will be used to group the data. As a general guideline, we recommend using between 5 and 20 classes. For a small number of data items, as few as five or six classes may be used to summarize the data. For a larger number of data items, a larger number of classes is usually required. The goal is to use enough classes to show the variation in the data, but not so many classes that some contain only a few data items. Because the number of data items in Table 2.4 is relatively small (n = 20), we chose to construct a frequency distribution with five classes.
Width of the c/osses
The second step is to choose a width for the classes. As a general gLrideline, we recom-mend that the width be the same tbr each class, which reduces the chance of inappropri-ate interpretations by the user. The choices for the number of classes and the width ofclasses are not independent decisions. A larger number of classes means a smaller classwidth and vice versa. To determine an approximate class width, we identify the largestand smallest data values. Then we can Llse the following expression to determine theapproximate class width.
Approximate class width
Largest data value - Smallest data value
Number of classes(2.2)
The approximate class width given by equation (2.2) can be rounded to a more conven-ient value. For example, an approximate class width of 9.28 might be rounded to 10.
For the year-end audit times, the largest value is 33 and the smallest value is 12. Wedecided to summarize the data with flve classes, so equation (2.2) provides an approxi-mate class width of (33 - IZ)/5 : 4.2.We decided to round up and use a class width offive days in the frequency distribution.
In practice, the number of classes and the appropriate class width are determined bytrial and error. Once a possible number of classes is chosen, equation (2.2) is used to findthe approximate class width. The process can be repeated for a diÍferent number of classes.Ultimately, the analyst uses judgment to determine the combination of the nr-rmber ofclasses and class width that provides a good frequency distribution Íbr summarizing thedata. Different people may construct different, but equally acceptable, frequency distribu-tions. The goal is to reveal the natural grouping and variation in the data.
For the audit time data, after deciding to use five classes, each with a width of fivedays, the next task is to specify the class limits for each of the classes.
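The trial-and-error use of equation (2.2) can be sketched in a few lines of Python. This is only an illustration: the function name `approximate_class_width` is our own, and the only inputs are the largest value (33), smallest value (12) and candidate numbers of classes quoted in the text.

```python
import math

def approximate_class_width(largest, smallest, number_of_classes):
    """Equation (2.2): range of the data divided by the number of classes."""
    return (largest - smallest) / number_of_classes

# Audit time data: largest value 33, smallest value 12, five classes.
width = approximate_class_width(33, 12, 5)
rounded_width = math.ceil(width)  # round up to a convenient whole number of days

print(f"approximate width = {width:.1f}, rounded width = {rounded_width}")

# Trial and error: repeat the calculation for other candidate numbers of classes.
for k in (5, 6, 7):
    print(k, round(approximate_class_width(33, 12, k), 2))
```

Rounding up rather than to the nearest integer is a choice made here to mirror the text's decision to use a five-day width; an analyst could equally settle on another convenient value.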
CHAPTER 2 DESCRIPTIVE STATISTICS: TABULAR AND GRAPHICAL PRESENTATIONS
Applications
14 A doctor's office staff studied the waiting times for patients who arrive at the office with a request for emergency service. The following data with waiting times in minutes were collected over a one-month period.
2 5 t0 t2 4 4 5 l7 | 8 9 I 1) )l 6 I 7 t3 t8 3
Use classes of 0-4, 5-9 and so on in the following:
a. Show the frequency distribution.
b. Show the relative frequency distribution.
c. Show the cumulative frequency distribution.
l0
lt
t2
6.3
d. Show the cumulative relative frequency distribution.
e. What proportion of patients needing emergency service wait nine minutes or less?
15 Data for the numbers of units produced by a production employee during the most recent 20 days are shown here.
160 170 t8t t56 )76 t4B
16) 15ó 179 l]8 ]5l l57
Summarize the data by constructing the following:
a. A frequency distribution.
b. A relative frequency distribution.
c. A cumulative frequency distribution.
d. A cumulative relative frequency distribution.
e. An ogive.
16 The closing prices of 40 company shares (in euros) follow.
29.63 34.00 4325 8.75 37,88 8.63 7.63 30,38
35.25 t9.38 925 t6.50 38.00 53,38 t6.63 1.25
48.38 t8.00 9.38 9.75 t0.00 75.02 t8,00 8.00
28.50 2425 )t .63 I 8.50 33.&3 3 | . I 3 3225 )9.63
79.38 I t.3B 38,88 I i.50 52,00 t4.00 9.00 33.50
a. Construct frequency and relative frequency distributions.
b. Construct cumulative frequency and cumulative relative frequency distributions.
c. Construct a histogram.
d. Using your summaries, make comments and observations about the price of shares.
17 The table below shows the estimated 2009 mid-year population of Zambia, by age group, rounded to the nearest thousand (from the US Census Bureau International Data Base).
Age group Population (000s)
| 98 179 )62 | 50
t54 t9 148 |56
045- 9
to- t4
15 19
70 -)4lq-lq
30-3435 39
4A -4445-4950-5455-5960-6465 69
7A -74aF 70
B0+
2005
t749
159 I
l44Ar 253l07)f70536
36s288
721
r86
t46
l13B3
50_
a. Construct a percentage frequency distribution.
b. Construct a cumulative percentage frequency distribution.
c. Construct an ogive.
d. Using the ogive, estimate the median age of the population.
18 The Nielsen Home Technology Report provided information about home technology and its usage by individuals aged 12 and older. The following data are the hours of personal computer usage during one week for a sample of 50 individuals.
4.t t,5 5,9 3.4 57I L | 3.5 4.1 4.1 8,8
4,0 9.2 4.4 5.t 7.7
r4.B 5.4 42 3.9 4.1
6, 3.0 3.7 3,
4,3 7.t t0,3 6.2
5.7 5,9 4.1 3.9
9.5 tZ.9 6.1 3.I
4.8 7,0 3,3
7.6 l0.B 4,7
3.7 3. I 12. I
t0.4
t.6
5.6
6.t
).8
Summarize the data by constructing the following:
a. A frequency distribution (use a class width of three hours).
b. A relative frequency distribution.
c. A histogram.
d. An ogive.
e. Comment on what the data indicate about personal computer usage at home.
19 The daily high and low temperatures (in degrees Celsius) for 20 cities on one particular day follow.
City High Low City High Low
t0
ilt3
\6
t7
r0
)4t3
t5
6
a. Prepare a stem-and-leaf display for the high temperatures.
b. Prepare a stem-and-leaf display for the low temperatures.
c. Compare the stem-and-leaf displays from parts (a) and (b), and comment on the differences between daily high and low temperatures.
d. Use the stem-and-leaf display from part (a) to determine the number of cities having a high temperature of 25 degrees or above.
e. Provide frequency distributions for both high and low temperature data.
Athens 74
Bangkok 33
Cairo 29
Copenhagen I B
Dublin lB
Havana 30
Hong Kong 27
Johannesburg l6London 23
lvlanila 34
17 Melboume
)3 Montreal
14 Paris
4 Rio de JaneiroI Rome
)0 Seoul
)7 Singapore
l0 Sydney
9 Tokyo
)4 Vancouver
lo
IB
25
27
27
IB
32
20
26I4
So far in this chapter, we have focused on tabular and graphical methods used to summarize the data for one variable at a time. Often a manager or decision-maker requires tabular and graphical methods that will assist in the understanding of the relationship between two variables. Cross-tabulation and scatter diagrams are two such methods.
Cross-tabulation
A cross-tabulation is a tabular summary of data for two variables. Consider the following data from a consumer restaurant review, based on a sample of 300 restaurants located in a large European city. Table 2.9 shows the data for the first five restaurants. Data on a restaurant's quality rating and typical meal price are reported. Quality rating is a qualitative variable with rating categories of good, very good and excellent. Meal price is a quantitative variable that ranges from €10 to €49.

Restaurant   Quality rating   Meal price (€)
1            Good             18
2            Very Good        22
3            Good             28
4            Excellent        38
5            Very Good        33
A cross-tabulation of the data is shown in Table 2.10. The left and top margin labels define the classes for the two variables. In the left margin, the row labels (good, very good and excellent) correspond to the three classes of the quality rating variable. In the top margin, the column labels (€10-19, €20-29, €30-39 and €40-49) correspond to the four classes of the meal price variable. Each restaurant in the sample provides a quality rating and a meal price, and so is associated with a cell appearing in one of the rows and one of the columns of the cross-tabulation. For example, restaurant 5 is identified as having a very good quality rating and a meal price of €33. This restaurant belongs to the cell in row 2 and column 3 of Table 2.10. In constructing a cross-tabulation, we simply count the number of restaurants that belong to each of the cells in the cross-tabulation.
We see that the greatest number of restaurants in the sample (64) have a very good rating and a meal price in the €20-29 range. Only two restaurants have an excellent rating and a meal price in the €10-19 range. In addition, note that the right and bottom margins of the cross-tabulation provide the frequency distributions for quality rating and meal price separately. From the frequency distribution in the right margin, we see that data on quality ratings show 84 good restaurants, 150 very good restaurants and 66 excellent restaurants.
Dividing the totals in the right margin of the cross-tabulation by the total for that column provides relative and percentage frequency distributions for the quality rating variable.
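The counting step described above can be sketched in Python. This is a minimal illustration using only the first five restaurants of Table 2.9 (not the full sample of 300); the helper `price_class` and the variable names are our own, and the class labels assume the €10-wide classes used in Table 2.10.

```python
from collections import Counter

# First five restaurants from Table 2.9: (quality rating, meal price in euros).
restaurants = [
    ("Good", 18),
    ("Very Good", 22),
    ("Good", 28),
    ("Excellent", 38),
    ("Very Good", 33),
]

def price_class(price):
    """Map a meal price to its class label, for example 33 -> '30-39'."""
    lower = (price // 10) * 10
    return f"{lower}-{lower + 9}"

# Each restaurant falls in exactly one (row, column) cell; count the cells.
cell_counts = Counter(
    (rating, price_class(price)) for rating, price in restaurants
)

for (rating, price_range), count in sorted(cell_counts.items()):
    print(f"{rating:10s} {price_range:6s} {count}")
```

Restaurant 5 (very good, €33) lands in the ("Very Good", "30-39") cell, matching the row 2, column 3 position described in the text.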
Quality rating   Relative frequency   Percentage frequency
Good             0.28                 28
Very good        0.50                 50
Excellent        0.22                 22
Total            1.00                 100
                              Meal price
Quality rating   €10-19   €20-29   €30-39   €40-49   Total
Good               42       40        2        0       84
Very good          34       64       46        6      150
Excellent           2       14       28       22       66
Total              78      118       76       28      300
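As a quick check on the margins, the following Python sketch recomputes the row and column totals of Table 2.10 and the relative and percentage frequencies for quality rating. The dictionary layout and variable names are illustrative choices, not part of the text.

```python
# Cell counts from Table 2.10; columns are the meal price classes below.
price_classes = ["10-19", "20-29", "30-39", "40-49"]
crosstab = {
    "Good": [42, 40, 2, 0],
    "Very good": [34, 64, 46, 6],
    "Excellent": [2, 14, 28, 22],
}

# Right margin: frequency distribution for quality rating.
row_totals = {rating: sum(counts) for rating, counts in crosstab.items()}
# Bottom margin: frequency distribution for meal price.
column_totals = [sum(column) for column in zip(*crosstab.values())]
grand_total = sum(row_totals.values())

# Relative and percentage frequencies for quality rating.
relative_frequency = {r: t / grand_total for r, t in row_totals.items()}
percentage_frequency = {r: 100 * f for r, f in relative_frequency.items()}

for rating in crosstab:
    print(rating, row_totals[rating],
          round(relative_frequency[rating], 2),
          round(percentage_frequency[rating]))
print("Column totals:", dict(zip(price_classes, column_totals)))
```

The same division applied to the bottom margin would give the relative frequency distribution for meal price.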
for the original cross-tabulation, we see that the type of agreement is a hidden variable that should not be ignored when evaluating the records of the sales executives.
Because of Simpson's paradox, we need to be especially careful when drawing conclusions using aggregated data. Before drawing any conclusions about the relationship between two variables shown for a cross-tabulation (or, indeed, any type of display involving two variables, like the scatter diagram illustrated in the next section) you should consider whether any hidden variable or variables could affect the results.
Scatter diagram and trend line
A scatter diagram is a graphical presentation of the relationship between two quantitative variables, and a trend line is a line that provides an approximation of the relationship. Consider the advertising/sales relationship for a hi-fi equipment store. On ten occasions during the past three months, the store used weekend television commercials to promote sales at its stores. The managers want to investigate whether a relationship exists between the number of commercials shown and sales at the store during the following week. Sample data for the ten weeks with sales in thousands of euros (€000s) are shown in Table 2.12.
Figure 2.7 shows the scatter diagram and the trend line* for the data in Table 2.12. The number of commercials (x) is shown on the horizontal axis and the sales (y) are shown on the vertical axis. For week 1, x = 2 and y = 50. A point with those coordinates is plotted on the scatter diagram. Similar points are plotted for the other nine weeks. Note that during two of the weeks one commercial was shown, during two of the weeks two commercials were shown, and so on.
The completed scatter diagram in Figure 2.7 indicates a positive relationship between the number of commercials and sales. Higher sales are associated with a higher number of commercials. The relationship is not perfect in that all points are not on a straight line. However, the general pattern of the points and the trend line suggest that the overall relationship is positive.
Some general scatter diagram patterns and the types of relationships they suggest are shown in Figure 2.8. The top left panel depicts a positive relationship similar to the one
Week   Number of commercials   Sales in €000s
1      2                       50
2      5                       57
3      1                       41
4      3                       54
5      4                       54
6      1                       38
7      5                       63
8      3                       48
9      4                       59
10     2                       46
*The equation of the trend line is y = 4.95x + 36.15. The slope of the trend line is 4.95 and the y-intercept (the point where the line intersects the y-axis) is 36.15. We will discuss in detail the interpretation of the slope and y-intercept for a linear trend line in Chapter 14 when we study simple linear regression.
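The slope and intercept quoted in the footnote can be reproduced with a short Python sketch. It assumes the trend line is the ordinary least-squares line (the text defers the method to the regression chapter) and uses the ten weeks of Table 2.12; the function name `fit_trend_line` is our own.

```python
# Table 2.12: number of commercials (x) and sales in €000s (y) for ten weeks.
commercials = [2, 5, 1, 3, 4, 1, 5, 3, 4, 2]
sales = [50, 57, 41, 54, 54, 38, 63, 48, 59, 46]

def fit_trend_line(x, y):
    """Return (slope, intercept) of the least-squares line y = slope*x + intercept."""
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    # Sum of cross-products and sum of squared deviations about the means.
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    return slope, intercept

slope, intercept = fit_trend_line(commercials, sales)
print(f"y = {slope:.2f}x + {intercept:.2f}")
```

Under that assumption the fitted line agrees with the footnote's slope of 4.95 and y-intercept of 36.15.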
Methods
20 The following data are for 30 observations involving two qualitative variables, X and Y. The categories for X are A, B and C; the categories for Y are 1 and 2.
Observation Observation
I
2
3
4
5
6
7o
9
t0
It2
t3
t4
t5
AIBIBIC)Ól
C)BIC)AIBIAIBIC7C7C2
t6
t7
t8
t9
70
2l)))374
)526
27
2B
29
30
B)CIB]ctBIC)BIC)AIBIC)C)AIBIB2
a. Construct a cross-tabulation for the data, with X as the row variable and Y as the column variable.
b. Calculate the row percentages.
c. Calculate the column percentages.
d. What is the relationship, if any, between X and Y?
21 The following 20 observations are for two quantitative variables.
Observation Observation XI
)3
4
5
6
1
B
9
t0
-)7 ))-33 49
2B79 -t613 t0
)t -28
-t3 27
-)3 35
t453 -3
lt
t7
r3
t4
t5
t6
t7
IB
t9
1A
-37
34
9
-33
)0
-3-15
2
-20
-7
4B
-79
-t83l
-t614
l8
\1
ll-))
a. Construct a scatter diagram for the relationship between X and Y.
b. What is the relationship, if any, between X and Y?
CROSS.TABULATIONS AND SCATTER DIAGRAMS
Applications
22 Recently, management at Oak Tree Golf Course received a few complaints about the condition of the greens. Several players complained that the greens are too fast. Rather than react to the comments of just a few, the Golf Association conducted a survey of 100 male and 100 female golfers. The survey results are summarized here.

Male golfers
               Greens condition
Handicap       Too fast   Fine
Under 15       10         40
15 or more     25         25

Female golfers
               Greens condition
Handicap       Too fast   Fine
Under 15       1          9
15 or more     39         51

a. Combine these two cross-tabulations into one with male, female as the row labels and the column labels too fast and fine. Which group shows the highest percentage saying that the greens are too fast?
b. Refer to the initial cross-tabulations. For those players with low handicaps (better players), which group (male or female) shows the highest percentage saying the greens are too fast?
c. Refer to the initial cross-tabulations. For those players with higher handicaps, which group (male or female) shows the highest percentage saying the greens are too fast?
d. What conclusions can you draw about the preferences of men and women concerning the speed of the greens? Are the conclusions you draw from part (a) as compared with parts (b) and (c) consistent? Explain any apparent inconsistencies.
23 The file 'House Sales' on the accompanying CD contains data for a sample of 50 houses advertised for sale in a regional UK newspaper in autumn 2008. The first five rows of data are shown for illustration below.
Reception Bedrooms * GaragePrice (f) Location House type Bedrooms rooms Receptions capacity
4
4
)4
3
7
7
l
2
)
6
6
3
6
5
I
I
0
)I
a. Prepare a cross-tabulation using sale price (rows) and house type (columns). Use classes of 100 000-199 999, 200 000-299 999, etc. for sale price.
b. Compute row percentages and comment on any relationship between the variables.
24 Refer to the data in Exercise 23.
a. Prepare a cross-tabulation using number of bedrooms and house type.
b. Prepare a frequency distribution for number of bedrooms.
c. Prepare a frequency distribution for house type.
d. How has the cross-tabulation helped in preparing the frequency distributions in parts (b) and (c)?
25 The file 'Income Inequality' on the accompanying CD contains data for 29 countries prepared by the Organization for Economic Cooperation & Development (OECD) and published in an article in the Guardian newspaper in October 2008. The two variables in the file are the Gini coefficient for each country and the percentage of children in the country estimated
234995 Town319 000 Town
154995 Town
349 950 V llage
244995 Town
Detached
Detached
Semi-detached
Detached
Detached
24
25
to be living in poverty. The Gini coefficient is a widely used measure of income inequality. It varies between 0 and 1, with higher coefficients indicating more inequality. The first five rows of data are shown for illustration below.
Country   Child poverty (%)   Income inequality
Turkey    24.6                0.430
Mexico    27.2                0.474
Poland    21.5                0.372
US        24.6                0.381
Spain     17.3                0.319
a. Prepare a scatter diagram using the data on child poverty and income inequality.
b. Comment on the relationship, if any, between the variables.
For additional online summary questions and answers go to the companion website at www.cengage.co.uk/aswsbe2
CASE PROBLEM IN THE MODE FASHION STORES
[Data table for a sample of customer transactions. Recoverable column headings: Customer, Items, Discount, Sales, Gender, Age, with further columns recording marital status (Married/Single) and method of payment (for example, Store Card). The individual values are not legible in this copy.]
Managerial report
Use tabular and graphical descriptive statistics to help management develop a customer profile and to evaluate the promotional campaign. At a minimum, your report should include the following:
1 Percentage frequency distributions for key variables.
2 A bar chart or pie chart showing the percentage of customer purchases possibly attributable to the promotional campaign.
3 A cross-tabulation of type of customer (regular or promotional) versus sales. Comment on any similarities or differences present.
4 A scatter diagram of sales versus discount for only those customers responding to the promotion. Comment on any relationship apparent between sales and discount.
5 A scatter diagram to explore the relationship between sales and customer age.
[Photo: Clothes on a rail at a women's fashion store. © Martin McElligott.]
Software Section for Chapter 2
MINITAB offers extensive capabilities for constructing tabular and graphical summaries of data. In this section we show how MINITAB can be used to construct several graphical summaries and a cross-tabulation. The graphical methods presented are the dot plot, the histogram and the scatter diagram.
Dot plot
Assume the audit times data of Table 2.4 are in column C1 of a MINITAB worksheet. The following steps will generate a dot plot.
Step 1 Graph > Dotplot [Main menu bar]
Step 2 Select One Y, Simple [Dotplots panel]
       Click OK
Step 3 Enter C1 in the Graph Variables box [Dotplot - One Y, Simple panel]
       Click OK
Histogram
Again, assume the audit times data are in column C1 of a MINITAB worksheet. The following steps will generate a histogram.
Step 1 Graph > Histogram [Main menu bar]
Step 2 Select Simple [Histogram panel]
       Click OK
Step 3 Enter C1 in the Graph Variables box [Histogram - Simple panel]
       Click OK
TABULAR AND GRAPHICAL PRESENTATIONS USING MINITAB
When the Histogram appears:
Step 4 Position the mouse pointer over any one of the bars, and Double Click
       Select the Binning tab [Edit Bars panel]
       Select Midpoint for Interval Type
       Select Midpoint/Cutpoint positions for Interval Definition
       Enter 12:32/5 in the Midpoint/Cutpoint positions box*
       Click OK
Scatter diagram
We use the hi-fi equipment store data in Table 2.12 to demonstrate the construction of a scatter diagram. The weeks are numbered from 1 to 10 in column C1, the data for number of commercials are in column C2, and the data for sales are in column C3 of a MINITAB worksheet. The following steps will generate the scatter diagram shown in Figure 2.7.
Step 1 Graph > Scatterplot [Main menu bar]
Step 2 Select Simple [Scatterplot panel]
       Click OK
Step 3 Enter C3 under Y Variables [Scatterplot - Simple panel]
       Enter C2 under X Variables
       Click OK
Cross-tabulation
We use the data from the restaurant review of Section 2.4, part of which is shown in Table 2.9, to demonstrate. The restaurants are numbered from 1 to 300 in column C1 of the MINITAB worksheet. The quality ratings are in column C2, and the meal prices are in column C3. MINITAB can create a cross-tabulation only for qualitative variables, so we need to first code the meal price data by specifying a category (class) to which each meal price belongs. The following steps will code the meal price data to create four categories of meal price in column C4: €10-19, €20-29, €30-39 and €40-49.
Step 1 Data > Code > Numeric to Text [Main menu bar]
Step 2 Enter C3 in the Code data from columns box [Code - Numeric to Text panel]
       Enter C4 in the Store coded data in columns box
       Enter 10:19 in the first Original values box
       Enter €10-19 in the first New box
       Repeat the last two operations using 20:29, 30:39 and 40:49 in the second, third and fourth Original values boxes, and using €20-29, €30-39 and €40-49 in the second, third and fourth New boxes
       Click OK
*The entry 12:32/5 indicates that 12 is the midpoint of the first class, 32 is the midpoint of the last class, and 5 is the class width.
For each meal price in column C3 the associated meal price category will now appear in column C4. We can now construct a cross-tabulation for quality rating and the meal price categories by using the data in columns C2 and C4. The following steps will create a cross-tabulation containing the same information as shown in Table 2.10.
Step 3 Stat > Tables > Cross Tabulation and Chi-Square [Main menu bar]
Step 4 Enter C2 in the For rows box [Cross Tabulation and Chi-Square panel]
       Enter C4 in the For columns box
       Select Counts under Display
       Click OK
EXCEL offers extensive capabilities for constructing tabular and graphical summaries of data. In this appendix, we show how EXCEL can be used to construct a frequency distribution, bar chart, pie chart, histogram, scatter diagram and cross-tabulation. We will demonstrate two of EXCEL's most powerful tools for data analysis: creating charts and creating PivotTable Reports.
Frequency distribution and bar chart for qualitative data
In this section we show how EXCEL can be used to construct a frequency distribution and a bar chart for qualitative data. We illustrate each using the data on soft drink purchases in Table 2.1.
Frequency distribution
We begin by showing how the COUNTIF function can be used to construct a frequency distribution. Refer to Figure 2.10 as we describe the steps involved. The formula worksheet (showing the functions and formulae used) is set in the background, and the value worksheet (showing the results obtained using the functions and formulae) appears in the foreground.
The label 'Brand Purchased' and the data for the 50 soft drink purchases are in cells A1:A51. We also entered the labels 'Soft Drink' and 'Frequency' in cells C1:D1. The five soft drink names are entered into cells C2:C6. EXCEL's COUNTIF function can now be used to count the number of times each soft drink appears in cells A2:A51. The following steps are used.
Step 1 Select cell D2
Step 2 Enter =COUNTIF($A$2:$A$51,C2)
Step 3 Copy cell D2 to cells D3:D6
The formula worksheet in Figure 2.10 shows the cell formulae inserted by applying these steps. The value worksheet shows the values computed by the cell formulae. This worksheet shows the same frequency distribution that we constructed in Table 2.2.
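For readers working outside a spreadsheet, the COUNTIF steps have a close analogue in Python. The sketch below is illustrative only: the `purchases` list is a short made-up sample of ten brand names, not the 50 purchases of Table 2.1, and the variable names are our own.

```python
from collections import Counter

# Illustrative sample only (ten purchases), standing in for cells A2:A51.
purchases = [
    "Coke Classic", "Diet Coke", "Pepsi-Cola", "Diet Coke", "Coke Classic",
    "Coke Classic", "Dr Pepper", "Diet Coke", "Pepsi-Cola", "Pepsi-Cola",
]

# The five soft drink names, standing in for cells C2:C6.
soft_drinks = ["Coke Classic", "Diet Coke", "Dr Pepper", "Pepsi-Cola", "Sprite"]

# Counter tallies every value once; looking up each name plays the role of
# COUNTIF applied to one criterion cell. Missing names count as 0.
counts = Counter(purchases)
frequency = {drink: counts[drink] for drink in soft_drinks}

for drink, count in frequency.items():
    print(f"{drink:13s} {count}")
```

Listing the categories explicitly, as the worksheet does in C2:C6, keeps brands with zero purchases in the frequency distribution.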
TABULAR AND GRAPHICAL PRESENTATIONS USING EXCEL
Figure 2.10 Frequency distribution for soft drink purchases constructed using EXCEL's COUNTIF function
Bar chart
Here we show how EXCEL's chart tools can be used to construct a bar chart for the soft drink data. Refer to the frequency distribution shown in the value worksheet of Figure 2.10. The bar chart that we are going to develop is an extension of this worksheet. The worksheet and the bar chart developed are shown in Figure 2.11. The steps are as follows:
Step 1 Select cells C2:D6
Step 2 Click the Insert tab on the Ribbon
Step 3 In the Charts group, click Column
Step 3 In the Charts group, click Scatter
Step 4 When the list of scatter diagram subtypes appears:
       Click Scatter with only Markers (the chart in the upper-left corner)
Step 5 In the Chart Layouts group, click Layout 1
Step 6 Select the Chart Title and replace it with Scatter Diagram for the Hi-Fi Equipment Store
Step 7 Select the Horizontal (Value) Axis Title and replace it with Number of Commercials
Step 8 Select the Vertical (Value) Axis Title and replace it with Sales Volume
Step 9 Right-click the Series 1 Legend Entry
       Click Delete
Step 10 Right-click the vertical axis
        Click Format Axis
Step 11 When the Format Axis panel appears:
        Go to the Axis Options section
        Select Fixed for Minimum and enter 35 in the corresponding box
        Select Fixed for Maximum and enter 65 in the corresponding box
        Select Fixed for Major Unit and enter 5 in the corresponding box
        Click Close
A trendline can be added to the scatter diagram as follows.
Step 12 Position the mouse pointer over any data point in the scatter diagram and right-click to display a list of options
Step 13 Choose Add Trendline
Step 14 When the Add Format Trendline dialog box appears:
        Go to the Trendline Options section
        Choose Linear in the Trend/Regression Type section
        Click Close
The worksheet in Figure 2.13 shows the scatter diagram with the trendline added.
PivotTable report
EXCEL's PivotTable Report provides a valuable tool for managing data sets involving more than one variable. We will illustrate its use by showing how to develop a cross-tabulation using the restaurant data in Figure 2.14. Labels are entered in row 1, and the data for each of the 300 restaurants are entered into cells A2:C301.
Creating the initial worksheet
The following steps are needed to create a worksheet containing the initial PivotTable Report and PivotTable Field List.
Figure 2.14 EXCEL worksheet containing restaurant data

Restaurant   Quality Rating   Meal Price (€)
1            Good             18
2            Very Good        22
3            Good             28
4            Excellent        38
5            Very Good        33
6            Good             28
7            Very Good        19
8            Very Good        11
9            Very Good        23
10           Good             13
...          ...              ...
291          Very Good        23
292          Very Good        24
293          Excellent        45
294          Good             14
295          Good             18
296          Good             17
297          Good             16
298          Good             15
299          Very Good        38
300          Very Good        31
Step 1 Click the Insert tab on the Ribbon
Step 2 In the Tables group, click the icon above PivotTable
Step 3 When the Create PivotTable panel appears:
       Choose Select a table or range
       Enter A1:C301 in the Table/Range box
       Select New Worksheet
       Click OK
The resulting PivotTable Field List is shown in Figure 2.15.
Figure 2.15 PivotTable Field List
Using the PivotTable Field List
Each column in Figure 2.14 (Restaurant, Quality Rating, and Meal Price) is considered a field by EXCEL. The following steps show how to use EXCEL's PivotTable Field List to move the Quality Rating field to the row section, the Meal Price (€) field to the column section, and the Restaurant field to the values section of the PivotTable report.
Step 1 In the PivotTable Field List, go to Choose Fields to add to report:
       Drag the Quality Rating field to the Row Labels area
       Drag the Meal Price (€) field to the Column Labels area
       Drag the Restaurant field to the Values area
Figure 2.16 Completed PivotTable Field List and a portion of PivotTable Report
Step 3 When the Value Field Settings panel appears:
       Under Summarize value field by, choose Count
       Click OK
Figure 2.16 shows the completed PivotTable Field List and a portion of the PivotTable Report.
Finalizing the PivotTable Report
To complete the PivotTable Report, the following steps are used to group the columns representing meal prices and place the row labels for quality rating in the proper order.
Step 1 Right-click in cell B4 or in any other cell containing meal prices
       Select Group
Step 2 When the Grouping panel appears:
       Enter 10 in the Starting at box
       Enter 49 in the Ending at box
       Enter 10 in the By box
       Click OK
Figure 2.17 Final PivotTable Report
Step 3 Right-click on Excellent in cell A5
       Choose Move
       Select Move "Excellent" to End
Step 4 Close the PivotTable Field List dialog box
The final PivotTable Report is shown in Figure 2.17. Note that it provides the same information as the cross-tabulation shown in Table 2.10.
PASW offers extensive capabilities for constructing tabular and graphical summaries ofdata. In this section we show how PASW can be used to construct a histogram, a scatterdiagram, and a cross-tabulation.
Histogram
Assume the audit times data of Table 2.4 are in the first column of the PASW Data Editor. The following steps will generate a histogram.
Step 1 Graph > Chart Builder [Main menu bar]
Step 2 Under Gallery, choose Histogram [Chart Builder panel]
       Drag and drop the Simple Histogram icon into the Chart Preview area
       Drag and drop the audit times variable to the X-axis area in Chart Preview
       Click OK
Scatter diagram
We use the hi-fi equipment store data in Table 2.12 to demonstrate the construction of a scatter diagram. The weeks are numbered from 1 to 10 in the first column of the Data Editor, the data for number of commercials are in column 2 and the data for sales are in column 3. The following steps will generate the scatter diagram shown in Figure 2.7.
TABULAR AND GRAPHICAL PRESENTATIONS USING PASW
Drag and drop the Simple Scatter icon into the Chart Preview areaDrag and drop the sales volume varjable to the Y-axis area in Chart PreviewDrag and drop the number of commercials variable to the X-axis area in ChartPreviewCllcl< OK
Cross-tabulationWe use the data from the restaurant review of section 2.4, part of which is shown in Table2.9, to demonstrate. The restaurants are numbered from i to 300 in the first column ofthe PASW Data Editor. The quality ratings are in column 2 and the meal prices are incolumn 3. PASW can create a cross-tabulation only for categorized variables, so we needto Írrst code the meal price data by speciÍying a category (class) to which each meal pricebelongs. The following steps will code the meal price data to create four categories ofmeal price in column 4: €l0-19, €20-29, €30-39 and€4049.
Step I Transform > Recode lnto Different variables fMain menu bar]
Step 2 Transferthe meal price vadable to the lnput Variable->Output Variable box
fRecode !nto Different variables panel]Under Output Variable, give the new variable a name and labelClick ChangeClicl< OId and New Values
Step 3 Under Old Values, check Range, and enter l0 and l9 in the two boxes
fRecode lnto Different variables: Old and New Values panel]Under New Value, check Value and enter I in the boxClick Add
Step 3 aIlocates code l to the € l 0- l 9 meal price range' Repeat this step for the)0 29,30-39 and 40-49 ranges, allocatingthem codes 2,3 and 4 respectively,
Clicl< Continue
Step 4 Click OK [Recode lnto Different variables panel]
The new categorized variable will be added to the Data Editor, in column 4. Appropriate labels can be defined for the codes of this new variable in the Variables view of the Data Editor.
We can now construct a cross-tabulation for quality rating and the meal price categories by using the data in columns 2 and 4 of the Data Editor. The following steps will create a cross-tabulation containing the same information as shown in Table 2.10.
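For readers working outside PASW, the recode-then-tabulate idea can be sketched in a few lines of Python. This is only an illustration: the function names and the six records are hypothetical (they are not the data of Table 2.9), and meal prices are assumed to be whole euros so that each falls inside one of the four coded ranges.

```python
from collections import Counter

# Price ranges and their codes, as described in the recode step above
PRICE_RANGES = [(10, 19), (20, 29), (30, 39), (40, 49)]


def price_category(price):
    """Return code 1-4 for the 10-19, 20-29, 30-39 and 40-49 euro ranges."""
    for code, (low, high) in enumerate(PRICE_RANGES, start=1):
        if low <= price <= high:
            return code
    raise ValueError(f"meal price {price} is outside the coded ranges")


def cross_tabulate(records):
    """Count restaurants for each (quality rating, price code) pair."""
    return Counter((quality, price_category(price)) for quality, price in records)


# Hypothetical (quality rating, meal price) records, for illustration only
sample = [("Good", 18), ("Very Good", 22), ("Good", 28),
          ("Excellent", 38), ("Excellent", 47), ("Very Good", 22)]
table = cross_tabulate(sample)

for (quality, code), count in sorted(table.items()):
    print(quality, code, count)
```

Each key of `table` pairs a quality rating with a price code, so the counts correspond to the cells of a cross-tabulation.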
The sample mean class size is 44 students.

Here is a second illustration. Suppose a university careers office has sent a questionnaire to a sample of business school graduates requesting information on monthly starting salaries. Table 3.1 shows the data collected. The mean monthly starting salary for the sample of 12 business school graduates is computed as

$$\bar{x} = \frac{\sum x_i}{n} = \frac{x_1 + x_2 + \cdots + x_{12}}{12} = \frac{2020 + 2075 + \cdots + 2040}{12} = \frac{24\,840}{12} = 2070$$

Equation (3.1) shows how the mean is computed for a sample with n observations. The formula for computing the mean of a population remains the same, but we use different notation to indicate that we are working with the entire population. We denote the number of observations in a population by N, and the population mean as μ.
Graduate   Monthly starting salary (€)    Graduate   Monthly starting salary (€)
1          2020                           7          2050
2          2075                           8          2165
3          2125                           9          2070
4          2040                           10         2260
5          1980                           11         2060
6          1955                           12         2040
MEASURES OF LOCATION
Again, because i is an integer, step 3(b) indicates that the third quartile, or 75th percentile, is the average of the ninth and tenth data values; hence,

$$Q_3 = \frac{2075 + 2125}{2} = 2100$$

The quartiles divide the starting salary data into four parts, with each part containing 25 per cent of the observations.

1955 1980 2020 | 2040 2040 2050 | 2060 2070 2075 | 2125 2165 2260
          Q1 = 2030        Q2 = 2055        Q3 = 2100
                           (Median)

We defined the quartiles as the 25th, 50th and 75th percentiles. Hence, we computed the quartiles in the same way as percentiles. However, other conventions are sometimes used to compute quartiles and the actual values reported for quartiles may vary slightly depending on the convention used (see the Software Section at the end of the chapter). Nevertheless, the objective of all procedures for computing quartiles is to divide the data into four equal parts.
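The percentile procedure can be sketched in Python as follows. The sketch assumes the index rule i = (p/100)n: when i is an integer the percentile is the average of the values in positions i and i + 1 (step 3(b) above); when i is not an integer it is rounded up to the next position. That rounding step is not quoted in this passage and is an assumption here. The function name is illustrative.

```python
def percentile(data, p):
    """p-th percentile (0 < p < 100) using the index rule i = (p/100) * n."""
    if not 0 < p < 100:
        raise ValueError("p must be strictly between 0 and 100")
    values = sorted(data)
    n = len(values)
    whole, remainder = divmod(p * n, 100)
    whole = int(whole)
    if remainder == 0:
        # i is an integer: average the values in positions i and i + 1 (1-based)
        return (values[whole - 1] + values[whole]) / 2
    # i is not an integer: round up to the next position (assumed step 3(a))
    return values[whole]


# Monthly starting salaries (euros) for the 12 graduates
salaries = [2020, 2075, 2125, 2040, 1980, 1955,
            2050, 2165, 2070, 2260, 2060, 2040]

q1 = percentile(salaries, 25)
q2 = percentile(salaries, 50)
q3 = percentile(salaries, 75)
print(q1, q2, q3)
```

With this convention the three quartiles agree with the values marked on the ordered data above.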
miniRank (www.minirank.com) rates the popularity of websites in most countries of the
world, using a points system. The 25 most popular sites in Cyprus as listed in November
2008 were as follows (the points scores have been rounded to one decimal place):
Website Points Website Points
www.dad.com,cywr,vw.dvds.com,cy
wlvw.íitness'com 'cy
w ww,ai rl inetrckets.com.cy
w ww.weightloss.com.cy
www,cyprus.gov.cy
www.netcars,com,cy
wtr,w,vis itcypru s. org. cy
w^ww'í|owershop'com'Cy
wvrw'netinío'com 'cyw wvr,interprom.cy
www.c)ta.com.cy
www.drivenet,com,cy
www,ch ris-mr chael.com.cy
w ww.music.net.cy
drivenet.com.cy
www.prismastore.com.cy
w^vtw'íorce'com.cy
www.prisma.com.cy
www.prismanet,cy
wr.lvr,ebos.com.cy
w ww.cytanet.com.cy
www,hrdauth,org.cy
wvvw.ucy.ac.cy
w ww,eplaza.com,cy
59.)21020.s
200t9.B
t].314.3
t4.3
t3lt).5
959.4
9.1
BB8.7
868,6
B5B58.5
736.1
6.7
\R57
a. Compute the mean and median.
b. Do you think it would be better to use the mean or the median as the measure of central location for these data? Explain.
c. Compute the first and third quartiles.
d. Compute and interpret the 85th percentile.
Following is a sample of age data for individuals working from home by 'telecommuting'.
rB 5'1 2a 46 25 48 53 )7 )6 37
40 36 42 25 )7 33 28 4a 45 75
a. Compute the mean and the mode.
b. Suppose the median age of the population of all adults is 35.5 years. Use the median age
of the preceding data to comment on whether the at-home workers tend to be younger
or older than the population of all adults.
c. Compute the first and third quartiles.
d. Compute and interpret the 32nd percentile.
In addition to measures of location, it is often desirable to consider measures of variability, or dispersion. For example, suppose you are a purchasing agent for a large manufacturing firm and that you regularly place orders with two different suppliers. After several months of operation, you find that the mean number of days required to fill orders is ten days for both of the suppliers. The histograms summarizing the number of working days required to fill orders from the suppliers are shown in Figure 3.2. Although the mean number of days is ten for both suppliers, do the two suppliers demonstrate the same degree of reliability in terms of making deliveries on schedule?
MEASURES OF VARIABILITY
Coefficient of variation

$$\left(\frac{\text{Standard deviation}}{\text{Mean}} \times 100\right)\% \qquad (3.8)$$

For the class size data, we found a sample mean of 44 and a sample standard deviation of 8. The coefficient of variation is [(8/44) × 100]% = 18.2%. The coefficient of variation tells us that the sample standard deviation is 18.2 per cent of the value of the sample mean. For the starting salary data with a sample mean of 2070 and a sample standard deviation of 82.2, the coefficient of variation, [(82.2/2070) × 100]% = 4.0%, tells us the sample standard deviation is only 4.0 per cent of the value of the sample mean. In general, the coefficient of variation is a useful statistic for comparing the variability of variables that have different standard deviations and different means.
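As a quick check on the arithmetic, the coefficient of variation for the starting salary data can be computed with Python's standard `statistics` module. This is a minimal sketch; the function name is illustrative, and `statistics.stdev` is used because it returns the sample (n - 1) standard deviation.

```python
import statistics


def coefficient_of_variation(data):
    """Sample standard deviation expressed as a percentage of the sample mean."""
    return statistics.stdev(data) / statistics.mean(data) * 100


# Monthly starting salaries (euros) for the 12 graduates
salaries = [2020, 2075, 2125, 2040, 1980, 1955,
            2050, 2165, 2070, 2260, 2060, 2040]

cv_salaries = coefficient_of_variation(salaries)

# Class size example, from the summary figures quoted in the text
cv_class_size = 8 / 44 * 100

print(round(cv_salaries, 1), round(cv_class_size, 1))
```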
Methods
9 Consider a sample with data values of 10, 20, 12, 17 and 16. Calculate the range and
14 The following data were used to construct the histograms of the number of days required to fill orders for Dawson Supply and for J.C. Clark Distributors (see Figure 3.2).
Dawson Supply days for delivery:     11 10 9 10 11 11 10 11 10 10
Clark Distributors days for delivery: 8 10 13 7 10 11 10 7 15 12
Use the range and standard deviation to support the previous observation that Dawson Supply provides the more consistent and reliable delivery times.
15 Police records show the following numbers of daily crime reports for a sample of days during
the winter months and a sample of days during the summer months.
Winter: 18 20 15 16 21 20 12 16 19 20
Summer: 28 18 24 32 18 29 23 38 28 18
a. Compute the range and interquartile range for each period.
b. Compute the variance and standard deviation for each period.
c. Compute the coefficient of variation for each period.
d. Compare the variability of the two periods.
16 A production department uses a sampling procedure to test the quality of newly produced items. The department employs the following decision rule at an inspection station: if a sample of 14 items has a variance of more than 0.005, the production line must be shut down for repairs. Suppose the following data have just been collected:
3.43 3.45
3.48 3.41
Should the production line be shut down? Why or why not?
We described several measures of location and variability for data distributions. It is also often important to have a measure of the shape of a distribution. In Chapter 2 we noted that a histogram offers an excellent graphical display showing the shape of a distribution. An important numerical measure of the shape of a distribution is skewness.

Distributional shape
Four histograms constructed from relative frequency distributions are shown in Figure 3.3. The histograms in Panels A and B are moderately skewed. The one in Panel A is skewed to the left: its skewness is -0.85 (negative skewness). The histogram in Panel B is skewed to the right: its skewness is +0.85 (positive skewness). The histogram in Panel C is symmetrical: its skewness is zero. The histogram in Panel D is highly skewed to the right: its skewness is 1.62. The formula used to compute skewness is somewhat complex.* However, the skewness can be easily computed using statistical software (see Software Section at the end of this chapter).

Detecting outliers
Sometimes a data set will have one or more observations with unusually large or unusually small values. These extreme values are called outliers. Experienced statisticians take steps to identify outliers and then review each one carefully. An outlier may be a data value that has been incorrectly recorded. If so, it can be corrected before further analysis. An outlier may also be from an observation that was incorrectly included in the data set. If so, it can be removed. Finally, an outlier may be an unusual data value that has been recorded correctly and belongs in the data set. In such cases it should remain.

Standardized values (z-scores) can be used to identify outliers. The empirical rule allows us to conclude that for data with a bell-shaped distribution, almost all the data values will be within three standard deviations of the mean. Hence, we recommend treating any data value with a z-score less than -3 or greater than +3 as an outlier, if the sample is small or moderately sized. Such data values can then be reviewed for accuracy and to determine whether they belong in the data set.

Refer to the z-scores for the class size data in Table 3.4. The z-score of -1.50 shows the fifth class size is furthest from the mean. However, this standardized value is well within the -3 to +3 guideline for outliers. Hence, the z-scores do not indicate that outliers are present in the class size data.
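The same screening can be applied to the starting salary data. The following Python sketch computes z-scores with the sample standard deviation and applies the -3 to +3 guideline; the function names and the `threshold` parameter are illustrative choices, not part of any package discussed in this chapter.

```python
import statistics


def z_scores(data):
    """Standardized value (x - mean) / s for every observation."""
    mean = statistics.mean(data)
    s = statistics.stdev(data)  # sample standard deviation
    return [(x - mean) / s for x in data]


def z_score_outliers(data, threshold=3.0):
    """Observations whose z-score lies outside -threshold to +threshold."""
    return [x for x, z in zip(data, z_scores(data)) if abs(z) > threshold]


# Monthly starting salaries (euros) for the 12 graduates
salaries = [2020, 2075, 2125, 2040, 1980, 1955,
            2050, 2165, 2070, 2260, 2060, 2040]

largest_z = max(z_scores(salaries))
flagged = z_score_outliers(salaries)
print(round(largest_z, 2), flagged)
```

The largest salary, 2260, has a z-score a little above 2, so the z-score guideline flags no outliers in these data.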
Methods
17 Consider a sample with data values of 10, 20, 12, 17 and 16. Calculate the z-score for each
of the five observations.
18 Consider a sample with a mean of 500 and a standard deviation of 100. What are the z-scores for the following data values: 520, 650, 500, 450 and 280?
19 Consider a sample with a mean of 30 and a standard deviation of 5. Use Chebyshev's theorem to determine the percentage of the data within each of the following ranges.
a. 20 to 40   b. 15 to 45   c. 22 to 38   d. 18 to 42   e. 12 to 48
MEASURES OF DISTRIBUTIONAL SHAPE, RELATIVE LOCATION AND DETECTING OUTLIERS
20 Suppose the data have a bell-shaped distribution with a mean of 30 and a standard deviation of 5.
Use the empirical rule to determine the percentage of data within each of the following ranges.
during the working week. Suppose that the standard deviation is 1.2 hours.
a. Use Chebyshev's theorem to calculate the percentage of individuals who sleep between 4.5 and 9.3 hours per day.
b. Use Chebyshev's theorem to calculate the percentage of individuals who sleep between 3.9 and 9.9 hours per day.
c. Assume that the number of hours of sleep follows a bell-shaped distribution. Use the empirical rule to calculate the percentage of individuals who sleep between 4.5 and
9.3 hours per day. How does this result compare to the value that you obtained using
Chebyshev's theorem in part (a)?
22 Suppose that IQ scores have a bell-shaped distribution with a mean of 100 and a standard
deviation of 15.
a. What percentage of people have an IQ score between 85 and 115?
b. What percentage of people have an IQ score between 70 and 130?
c. What percentage of people have an IQ score of more than 130?
d. A person with an IQ score greater than 145 is considered a genius. Does the empirical rule support this statement? Explain.
23 Suppose the average hourly labour cost for car servicing in Johannesburg is ZAR (South African rand) 75.00, and the standard deviation is ZAR 20.00.
a. What is the z-score for a car service with an hourly labour cost of ZAR 56.00?
b. What is the z-score for a car service with an hourly labour cost of ZAR 153.00?
c. Interpret the z-scores in parts (a) and (b). Comment on whether either should be
considered an outlier.
24 Consumer Review posts reviews and ratings of a variety of products on the internet. The following
is a sample of 20 speaker systems and their ratings, on a scale of 1 to 5, with 5 being best.
a. Compute the mean and the median.
b. Compute the first and third quartiles.
c. Compute the standard deviation.
d. The skewness of this data is 1.67. Comment on the shape of the distribution.
e. What are the z-scores associated with Allison One and Omni Audio?
f. Do the data contain any outliers? Explain.
loseph Audio RMTsiYlarlin Logan AeriusOmni Audio SA l2 3
PolkAudo RTl2SunÍlre True SubwooíerYamaha N5-A636
4.004.t)_
3824004.564374.33
4.504.644.)O
4.677.t44.094.17
4884.76).374,504.t72.17
EXPLORATORY DATA ANALYSIS
In Figure 3.5 we included lines showing the location of the upper and lower limits. These lines were drawn to show how the limits are computed and where they are located for the salary data. Although the limits are always computed, generally they are not drawn on the box plots. Figure 3.6 shows the usual appearance of a box plot for the salary data. Box plots provide another way to identify outliers. But they do not necessarily identify the same values as those with a z-score less than -3 or greater than +3. Either, or both, procedures may be used.
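The limit calculation can be sketched in Python. The sketch assumes the common convention that the limits lie 1.5 interquartile ranges below the first quartile and above the third quartile; the multiplier is not restated in this passage, so it is an assumption here, exposed as the parameter `k`. The function names are illustrative, and the quartiles Q1 = 2030 and Q3 = 2100 are those computed in Section 3.1.

```python
def box_plot_limits(q1, q3, k=1.5):
    """Lower and upper limits placed k interquartile ranges beyond the quartiles."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def box_plot_outliers(data, q1, q3, k=1.5):
    """Observations falling outside the box plot limits."""
    lower, upper = box_plot_limits(q1, q3, k)
    return [x for x in data if x < lower or x > upper]


# Monthly starting salaries (euros) for the 12 graduates
salaries = [2020, 2075, 2125, 2040, 1980, 1955,
            2050, 2165, 2070, 2260, 2060, 2040]

limits = box_plot_limits(2030, 2100)
outliers = box_plot_outliers(salaries, 2030, 2100)
print(limits, outliers)
```

Under this assumption the salary of 2260 lies above the upper limit, even though its z-score is inside the -3 to +3 guideline, which illustrates how the two procedures can disagree.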
Methods
25 Consider a sample with data values of 27, 25, 20, 15, 30, 34, 28 and 25. Provide the five-number
summary for the data.
26 Construct a box plot for the data in Exercise 25.
27 Prepare the five-number summary and the box plot for the following data: 5,
12, 16, 10, 6.
18, 10, 8,
28 A data set has a first quartile of 42 and a third quartile of 50. Compute the lower and
upper limits for the corresponding box plot. Should a data value of 65 be considered an outlier?
8408 t374 tB72 8879 7459 I t4t 3 60814 r 38 6457 | 850 28 I B I 356 I 0498 747840t9 434t 139 7t)1 3653 5794 8305
a. Provide a five-number summary.
b. Compute the lower and upper limits (for the box plot).
c. Do the data contain any outliers?
d. Johnson & Johnson's sales are the largest on the list at $14 138 million. Suppose a
data entry error (a transposition) had been made and the sales had been entered as
$41 138 million. Would the method of detecting outliers in part (c) identify this problem
and allow for correction of the data entry error?
e. Construct a box plot.
A goal of management is to help their company earn as much as possible relative to the capital invested. One measure of success is return on equity, the ratio of net income to stockholders' equity. Return on equity percentages are shown here for 25 companies.
9.0 t9.6 7).9 41.6 t t.4 15.8 57.7 t7.3 tZ.3 5. t
17.3 3t.t 9.6 8.6 lt.z t2.8 17.?, t45 9.2 16.6
5.0 30.3 t4.7 192 6.2
a. Provide a five-number summary.
b. Compute the lower and upper limits (for the box plot).
c. Do the data contain any outliers? How would this information be helpful to a financial analyst?
d. Construct a box plot.
In 2008, stock markets around the world lost value. The website www.owneverystock.com
listed the following percentage falls in stock market indices between the start of the year and
the beginning of October.
Country % Fall Country % Fall
30
3t
New ZealandCanada
Switzerland
MexicoAustralia
KoreaUnited Kingdom
Spain
Malaysia
ArgentinaFrance
lsrael
GermanyTaiwan
Brazil 39.59japan 39,88
Sweden 40.35Egypt 4l.57Singapore 4lr.60
Italy 42.88
Belgium 43.70lndia 44.16Hong Kong 44.52Netherlands 44.61
Norway 46.98
lndonesia 47.13Austria 50.06China. 6024
27.05
27,30
28.47
29.99
3 r.95
32.t832.37
32.69
3)..86
36.83
37.71
31.84
37.85
38.79
a. What are the mean and median percentage changes for these countries?
b. What are the first and third quartiles?
c. Do the data contain any outliers? Construct a box plot.
d. What percentile would you report for Belgium?
MEASURES OF ASSOCIATION BETWEEN TWO VARIABLES
variable causes the other. For instance, we may find that a restaurant's quality rating and its typical meal price are positively correlated. However, increasing the meal price will not cause quality to increase.
Methods
32 Five observations taken for two variables follow.
a. Construct a scatter diagram with the x_i values on the horizontal axis.
b. What does the scatter diagram developed in part (a) indicate about the relationship between the two variables?
c. Compute and interpret the sample covariance.
d. Compute and interpret the sample correlation coefficient.
33 Five observations taken for two variables follow.
x_i   6   11   15   21   27
y_i   6    9    6   17   12
a. Construct a scatter diagram for these data.
b. What does the scatter diagram indicate about a relationship between x and y?
c. Compute and interpret the sample covariance.
d. Compute and interpret the sample correlation coefficient.
Applications
34 PC World provided performance scores and ratings for 15 notebook PCs. The performance score
is a measure of how fast a PC can run a mix of common business applications as compared to
a baseline machine. For example, a PC with a performance score of 200 is twice as fast as the
baseline machine. A 100-point scale was used to provide an overall rating for each notebook
tested in the study, with higher scores indicating a better rating. The data are shown below.

Notebook                         Performance score   Overall rating
AMS Tech Roadster 15CTA380       115                 67
Compaq Armada M700               191                 78
Compaq Prosignia Notebook 150    153                 79
Dell Inspiron 3700 C466GT        194                 80
Dell Inspiron 7500 R500W         236                 84
Dell Latitude Cpi A366XT         184                 76
Enpower ENP-313 Pro              184                 77
Gateway Solo 9300LS              216                 92
HP Pavillion Notebook PC         185                 83
IBM ThinkPad I Series 1480       183                 78
Micro Express NP7400             189                 77
Micron TransPort NX PII-400      202                 78
NEC Versa SX                     192                 78
Sceptre Soundx 5200              141                 73
Sony VAIO PCG-F340               187                 77

a. Construct a scatter diagram with performance score on the horizontal axis.
b. Is there any relationship between performance score and overall rating? Explain.
c. Compute and interpret the sample covariance.
d. Compute and interpret the sample correlation coefficient.
e. What does the sample correlation coefficient tell you about the relationship between the performance score and the overall rating?
35 The Dow Jones Industrial Average (DJIA) and the Standard & Poor's (S&P) 500 Index
are both used as measures of overall movement in the US stock market. The DJIA is based
on the price movements of 30 large companies; the S&P 500 is an index composed of 500 stocks. Some say the S&P 500 is a better measure of stock market performance because it
is broader based. The index levels of the DJIA and the S&P 500 for 10 weeks beginning with
1 July 2008 are shown below (file 'DowS&P08' on the accompanying CD).
Date DJrA s&P
I July8 july
l5 July77)uly29 )uly5 Augustl2 Augustl9 August26 August
2 September
1t 387.26
r I 384.21
t0 962.54
r r 602.s0I I 397.56
t I 615.77
t I 642.47
r | 348.55
I I 4t2.87I I 5169)
t284.91
t273.70
tzt4.9tt277.00
1763.70
r284.88
r289.59
t266.69
127 t.51
t277.58
a. Compute the sample correlation coefficient for these data.
b. Are they poorly correlated, or do they have a close association?
Methods
36 Consider the following data and corresponding weights.

x      Weight
3.2    6
2.0    3
2.5    2
5.0    8

a. Compute the weighted mean.
b. Compute the sample mean of the four data values without weighting. Note the difference
in the results provided by the two computations.
37 Consider the sample data in the following frequency distribution.

Class    Midpoint   Frequency
3-7      5          4
8-12     10         7
13-17    15         9
18-22    20         5
THE WEIGHTED MEAN AND WORKING WITH GROUPED DATA
a. Compute the sample mean.
b. Compute the sample variance and sample standard deviation.
Applications
38 Bloomberg Personal Finance (July/August 2001) included the following companies in its
recommended investment portfolio. For a portfolio value of €25 000, the recommended euro amounts allocated to each stock are shown.

Company              Portfolio (€)   Estimated growth rate (%)   Dividend yield (%)
Citigroup            3000            15                          1.21
General Electric     5500            14                          1.48
Kimberley-Clark      4200            12                          1.72
Oracle               3000            25                          0.00
Pharmacia            3000            20                          0.96
SBC Communications   3800            12                          2.48
WorldCom             2500            35                          0.00

a. Using the portfolio euro amounts as the weights, what is the weighted average estimated
growth rate for the portfolio?
b. What is the weighted average dividend yield for the portfolio?
39 A petrol station recorded the following frequency distribution for the number of litres of petrol sold per car in a sample of 680 cars.

Petrol (litres)   Frequency
1-15              74
16-30             192
31-45             280
46-60             105
61-75             23
76-90             6
Total             680

Compute the mean, variance and standard deviation for these grouped data. If the petrol
station expects to serve petrol to about 120 cars on a given day, estimate the total number of litres of petrol that will be sold.
For additional online summary questions and answers go to the companion website at www.cengage.co.uk/aswsbe2
Software Section for Chapter 3

Table 3.1 listed the starting salaries for 12 business school graduates. Panel A of Figure 3.11 shows the descriptive statistics obtained by using MINITAB to summarize these data. Definitions of the headings in Panel A follow.

N         number of data values
N*        number of missing data values
Mean      mean
SE Mean   standard error of mean
StDev     standard deviation
Min       minimum data value
Q1        first quartile
Median    median
Q3        third quartile
Max       maximum data value

The label SE Mean refers to the standard error of the mean, which is computed by dividing the standard deviation by the square root of the number of data values. This statistic is discussed in Chapter 7 when we introduce the topics of sampling and sampling distributions. Although the range, interquartile range, variance and coefficient of variation do not appear on the MINITAB output, these values can be easily computed from the results in Figure 3.11 as follows.

Range = Max - Min
IQR = Q3 - Q1
Variance = (StDev)²
Coefficient of Variation = (StDev/Mean) × 100
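These four derived quantities are simple enough to compute by hand, but a short Python sketch shows the calculation in one place. The function name is illustrative. The inputs are assumptions about what the summary output displays: Min, Max and Mean follow from the Table 3.1 data, Q1 = 2025 and Q3 = 2112.5 are the MINITAB quartiles quoted in this section, and 82.19 is the sample standard deviation of the Table 3.1 data rounded to two decimals.

```python
def derived_statistics(minimum, maximum, q1, q3, st_dev, mean):
    """Range, IQR, variance and coefficient of variation from summary output."""
    return {
        "range": maximum - minimum,
        "iqr": q3 - q1,
        "variance": st_dev ** 2,
        "coefficient_of_variation": st_dev / mean * 100,
    }


# Assumed summary values for the starting salary data (see the lead-in above)
stats = derived_statistics(minimum=1955, maximum=2260,
                           q1=2025, q3=2112.5,
                           st_dev=82.19, mean=2070)

for name, value in stats.items():
    print(name, round(value, 1))
```

Because the standard deviation is entered in rounded form, the variance obtained this way differs slightly from the variance computed directly from the raw data.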
Note that MINITAB's quartiles Q1 = 2025 and Q3 = 2112.5 are slightly different from the quartiles Q1 = 2030 and Q3 = 2100 computed in Section 3.1. The different conventions* used to identify the quartiles explain this difference. The values provided by one convention may not be identical to the values by another convention, but the differences tend to be negligible so far as interpretation is concerned.
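The convention behind the software values locates the first and third quartiles at positions (n + 1)/4 and 3(n + 1)/4 in the ordered data and interpolates when a position is fractional. The Python sketch below is an illustration of that rule, not the packages' actual code; the function name is hypothetical, and the clamping of positions that fall outside the data is an added assumption for very small samples.

```python
def interpolated_quartile(data, k):
    """k-th quartile (k = 1, 2, 3) at position k(n + 1)/4 with linear interpolation."""
    values = sorted(data)
    n = len(values)
    position = k * (n + 1) / 4        # 1-based position, possibly fractional
    lower = int(position)             # whole part of the position
    fraction = position - lower
    if lower < 1:                     # assumption: clamp below the first value
        return values[0]
    if lower >= n:                    # assumption: clamp above the last value
        return values[-1]
    below, above = values[lower - 1], values[lower]
    return below + fraction * (above - below)


# Monthly starting salaries (euros) for the 12 graduates
salaries = [2020, 2075, 2125, 2040, 1980, 1955,
            2050, 2165, 2070, 2260, 2060, 2040]

q1 = interpolated_quartile(salaries, 1)
q3 = interpolated_quartile(salaries, 3)
print(q1, q3)
```

For n = 12 the positions are 3.25 and 9.75, which reproduces the quartiles 2025 and 2112.5 reported by the software.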
*With the n observations arranged in ascending order (smallest value to largest value), MINITAB uses the positions given by (n + 1)/4 and 3(n + 1)/4 to locate Q1 and Q3, respectively. When a position is fractional, MINITAB interpolates between the two adjacent ordered data values to determine the corresponding quartile.

The statistics in Figure 3.11 are generated as follows. The starting salary data are in column C2 of a MINITAB worksheet.
DESCRIPTIVE STATISTICS USING MINITAB
Step 1 Stat > Basic Statistics > Display Descriptive Statistics [Main menu bar]
Step 2 Enter C2 in the Variables box [Descriptive Statistics panel]
       Click OK

Panel B of Figure 3.11 is a MINITAB box plot. The box drawn from the first to third quartiles contains the middle 50 per cent of the data. The line within the box locates the median. The asterisk indicates an outlier at 2260. The following steps generate the box plot.

Step 1 Graph > Boxplot
Step 2 Select Simple
       Click OK
Step 3 Enter C2 in the Graph variables box
       Click OK
The skewness measure also does not appear as part of MINITAB's standard descriptive statistics output. However, we can include it in the descriptive statistics display by following these steps.

Step 1 Stat > Basic Statistics > Display Descriptive Statistics [Main menu bar]
Step 2 Enter C2 in the Variables box [Descriptive Statistics panel]
       Click the Statistics button
Step 3 Check Skewness [Descriptive Statistics - Statistics panel]
       Click OK
Step 4 Click OK [Descriptive Statistics panel]

The skewness measure of 1.07 will then appear in your Session window.

Figure 3.12 shows the covariance and correlation output that MINITAB provided for the hi-fi equipment store data in Table 3.5. In the covariance portion of the figure, No. of Commercia denotes the number of weekend television commercials and Sales Volume denotes the sales during the following week. The value in column No. of Commercia and row Sales Volume, 11.00, is the sample covariance as computed in Section 3.5. The value in column No. of Commercia and row No. of Commercia, 2.22, is the sample variance for the number of commercials and the value in column Sales Volume and row Sales Volume, 62.89, is the sample variance for sales. The sample correlation coefficient, 0.93, is shown in the correlation portion of the output. The interpretation and use of the p-value provided in the output are discussed in Chapter 9.

To obtain the information in Figure 3.12, we entered the data for the number of commercials into column C2 and the data for sales volume into column C3 of a MINITAB worksheet. The steps necessary to generate the covariance output are:

Step 1 Stat > Basic Statistics > Covariance [Main menu bar]
Step 2 Enter C2 C3 in the Variables box [Covariance panel]
       Click OK

To obtain the correlation output in Figure 3.12, only one change is necessary to the steps for obtaining the covariance: on the Basic Statistics menu (step 1), choose Correlation rather than Covariance.
Figure 3.12 Covariance and correlation output provided by MINITAB for the number of commercials and sales data
Covariances: No. of Commercials, Sales Volume
Correlations: No. of Commercials, Sales Volume
Pearson correlation of No. of Commercials and Sales Volume = 0.930
DESCRIPTIVE STATISTICS USING EXCEL
We show how EXCEL can be used to generate several measures of location and variability for a single variable and to generate the covariance and correlation coefficient as measures of association between two variables.

Using EXCEL Functions
EXCEL provides functions for computing the mean, median, mode, sample variance, and sample standard deviation. We illustrate the use of these EXCEL functions by computing the mean, median, mode, sample variance and sample standard deviation for the starting salary data in Table 3.1. Refer to Figure 3.13 as we describe the steps involved. The data are entered in column B.

EXCEL's AVERAGE function can be used to compute the mean by entering the following formula into cell E1:

=AVERAGE(B2:B13)

Similarly, the formulae =MEDIAN(B2:B13), =MODE(B2:B13), =VAR(B2:B13), and =STDEV(B2:B13) are entered into cells E2:E5, respectively, to compute the median, mode, variance, and standard deviation. The worksheet in the foreground shows that the values computed using the EXCEL functions are the same as we computed earlier in the chapter.
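For comparison, the same five statistics can be obtained with Python's standard `statistics` module, whose `variance` and `stdev` functions use the sample (n - 1) formulas, as the EXCEL VAR and STDEV functions do. This is a minimal sketch; the `describe` helper is an illustrative name, not a library function.

```python
import statistics


def describe(data):
    """Mean, median, mode, sample variance and sample standard deviation."""
    return {
        "mean": statistics.mean(data),
        "median": statistics.median(data),
        "mode": statistics.mode(data),
        "variance": statistics.variance(data),  # divides by n - 1
        "st_dev": statistics.stdev(data),       # square root of the sample variance
    }


# Monthly starting salaries (euros) for the 12 graduates
salaries = [2020, 2075, 2125, 2040, 1980, 1955,
            2050, 2165, 2070, 2260, 2060, 2040]

summary = describe(salaries)
for name, value in summary.items():
    print(name, round(float(value), 2))
```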
Figure 3.13 Using EXCEL functions for computing the mean, median, mode, variance and standard deviation
EXCEL also provides functions for computing the covariance and correlation coefficient. You must be careful when using these functions because the covariance function treats the data as a population and the correlation function treats the data as a sample. So the result obtained using EXCEL's covariance function must be adjusted to provide the sample covariance. We show here how these functions can be used to compute the sample covariance and the sample correlation coefficient for the stereo and sound equipment store data in Table 3.7. Refer to Figure 3.14 as we present the steps involved.

EXCEL's covariance function, COVAR, can be used to compute the population covariance by entering the following formula into cell F1:

=COVAR(B2:B11,C2:C11)

Similarly, the formula =CORREL(B2:B11,C2:C11) is entered into cell F2 to compute the sample correlation coefficient. The worksheet in the foreground shows the values computed using the EXCEL functions. Note that the value of the sample correlation coefficient (0.93) is the same as computed using equation (3.12). However, the result provided by the EXCEL COVAR function, 9.9, was obtained by treating the data as a population. We must adjust the EXCEL result of 9.9 to obtain the sample covariance. The adjustment is rather simple. First, note that the formula for the population covariance, equation (3.11), requires dividing by the total number of observations in the data set. But the formula for the sample covariance, equation (3.10), requires dividing by the total number of observations minus 1. So, to use the EXCEL result of 9.9 to compute the sample covariance, we simply multiply 9.9 by n/(n - 1). Because n = 10, we obtain

$$s_{xy} = \left(\frac{10}{9}\right) 9.9 = 11$$

The sample covariance for the stereo and sound equipment data is 11.
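The adjustment from a population covariance to a sample covariance is a one-line calculation, sketched here in Python. The function name is illustrative; the inputs 9.9 and n = 10 are the values quoted above.

```python
def sample_covariance_from_population(population_covariance, n):
    """Rescale a covariance that divides by n so that it divides by n - 1."""
    if n < 2:
        raise ValueError("at least two observations are required")
    return population_covariance * n / (n - 1)


# EXCEL COVAR result and number of observations from the text
sample_cov = sample_covariance_from_population(9.9, 10)
print(round(sample_cov, 2))
```

The same factor n/(n - 1) converts a population variance into a sample variance.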
Figure 3.14 Using EXCEL functions for computing covariance and correlation
[Worksheet with columns Week, No. of Commercials and Sales Volume. Population Covariance, =COVAR(B2:B11,C2:C11), gives 9.90; Sample Correlation, =CORREL(B2:B11,C2:C11), gives 0.93.]
Using EXCEL's descriptive statistics tool
As we already demonstrated, EXCEL provides statistical functions to compute descriptive statistics for a data set. These functions can be used to compute one statistic at a time (e.g. mean, variance, etc.). EXCEL also provides a set of Data Analysis Tools. One of these tools, called Descriptive Statistics, allows the user to compute a variety of descriptive statistics at once. We show here how it can be used to compute descriptive statistics for the starting salary data in Table 3.1. Refer to Figure 3.15 as we describe the steps involved.

Step 1 Click the Data tab on the Ribbon
Step 2 In the Analysis group, click Data Analysis
Step 3 Choose Descriptive Statistics [Data Analysis panel]
       Click OK
Step 4 Enter B1:B13 in the Input Range box [Descriptive Statistics panel]
       Select Grouped By Columns
       Check Labels in First Row
       Select Output Range
       Enter D1 in the Output Range box
       Check Summary statistics
       Click OK

Cells D1:E15 of Figure 3.15 show the descriptive statistics provided by EXCEL. The boldface entries are the descriptive statistics we covered in this chapter. The descriptive statistics that are not boldface are either covered subsequently in the text or discussed in more advanced texts.
In PASW, a limited set of descriptive statistics can be produced as follows:

Step 1 Analyze > Descriptive Statistics > Descriptives [Main menu bar]
Step 2 Transfer the variable(s) to be analyzed to the Variables box [Descriptives panel]
       Click OK

The default PASW output for the graduate starting salaries data is shown in the first part of Figure 3.16. As you can see there, PASW calculates the mean, the standard deviation, the minimum and the maximum. The variance, the range and the skewness can be added to these defaults by using the Options button on the Descriptives panel.

To produce the median and quartiles, a different PASW routine is required:

Step 1 Analyze > Descriptive Statistics > Frequencies [Main menu bar]
Step 2 Transfer the variable(s) to be analyzed to the Variables box [Frequencies panel]
       Click Statistics
Step 3 Check the statistics you wish to calculate [Frequencies: Statistics panel]
       Click Continue
Step 4 Remove the check in the Display frequency tables box [Frequencies panel]
       Click OK

Output for the starting salaries data is shown in the second part of Figure 3.16. User-defined percentiles can also be produced using this routine, by making the appropriate choices on the Frequencies: Statistics dialogue panel.

Note that PASW's quartiles (25th percentile = 2025 and 75th percentile = 2112.5) are slightly different from the quartiles Q1 = 2030 and Q3 = 2100 computed in Section 3.1. The different conventions* used to identify the quartiles explain this difference. The values provided by one convention may not be identical to the values by another convention, but any differences tend to be negligible for interpretation purposes.

Figure 3.17 is a box plot produced by PASW for the graduate starting salaries data. The box drawn from the first to third quartiles contains the middle 50 per cent of the data. The line within the box locates the median. The small open circle indicates an outlier at 2260 (identified as the 10th data value). The following steps generate the box plot.

Step 1 Graphs > Legacy Dialogs > Boxplot [Main menu bar]
Step 2 Select Simple [Boxplot panel]
       Check Summaries of separate variables
       Click Define
Step 3 Transfer the variable(s) to be analyzed to the Boxes Represent box [Define Simple Boxplot: Summaries of Separate Variables panel]
       Click OK

*With the n observations arranged in ascending order (smallest value to largest value), PASW uses the positions given by (n + 1)/4 and 3(n + 1)/4 to locate Q1 and Q3, respectively. When a position is fractional, PASW interpolates between the two adjacent ordered data values to determine the corresponding quartile.
DESCRIPTIVE STATISTICS USING PASW
Figure 3.16 Descriptive statistics provided by PASW
[Descriptive Statistics table for Starting Salary (€) with columns N, Minimum, Maximum, Mean and Std. Deviation, followed by a Statistics table.]

Figure 3.17 Box plot provided by PASW
Figure 3.18 shows the covariance and correlation output that PASW provided for the hi-fi equipment store data in Table 3.5. The bottom left and top right panels in the table are identical and each shows the sample correlation coefficient (0.930) and the sample covariance (11.00). Also shown, in the row labelled Sum of Squares and Cross-products, is the numerator in the covariance calculation
$\sum (x_i - \bar{x})(y_i - \bar{y}) = 99$
The interpretation and use of the figure in the row labelled Sig. (2-tailed), and the asterisked note below the table, are discussed in Chapter 9.
The top left panel in the table shows the sample variance for the number of commercials (2.22), and the numerator in the variance calculation
$\sum (x_i - \bar{x})^2 = 20$
Similarly, the bottom right panel shows the sample variance for the sales volume (62.89), and the numerator in the variance calculation
$\sum (y_i - \bar{y})^2 = 566$
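As a quick check on how the panels relate, the following sketch recomputes the covariance, variances and correlation from the three sums quoted above and n = 10. The variable names are illustrative, and the n - 1 divisor is the usual sample formula from Chapter 3.

```python
import math

n = 10                       # number of observations, from the N row of the output
sum_cross_products = 99.0    # sum of (x_i - x_bar)(y_i - y_bar)
sum_squares_x = 20.0         # sum of (x_i - x_bar)^2, number of commercials
sum_squares_y = 566.0        # sum of (y_i - y_bar)^2, sales volume

sample_covariance = sum_cross_products / (n - 1)
sample_variance_x = sum_squares_x / (n - 1)
sample_variance_y = sum_squares_y / (n - 1)

# Pearson correlation: covariance divided by the product of the standard deviations.
correlation = sample_covariance / math.sqrt(sample_variance_x * sample_variance_y)

print(round(sample_covariance, 3))   # 11.0
print(round(sample_variance_x, 3))   # 2.222
print(round(sample_variance_y, 3))   # 62.889
print(round(correlation, 3))         # 0.93
```

The rounded results agree with the Covariance and Pearson Correlation rows of the PASW table.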
To obtain the information in Figure 3.18, we entered the data for the number of commercials into the second column of the PASW Data Editor and the data for sales volume into the third column.
Step 1 Analyze > Correlate > Bivariate [Main menu bar]
Step 2 Transfer the two variables to the Variables box [Bivariate Correlations panel]
Under Correlation Coefficients, ensure that the Pearson box is checked
Click Options
Step 3 Check the Cross-product deviations and covariances box [Bivariate Correlations:Options panel]
Click Continue
Click OK
Figure 3.18 Covariance and correlation provided by PASW for the number of commercials and sales data

Correlations
                                                           Number of Commercials   Sales Volume (€000s)
Number of Commercials   Pearson Correlation                        1                     .930**
                        Sig. (2-tailed)                                                  .000
                        Sum of Squares and Cross-products     20.000                   99.000
                        Covariance                             2.222                   11.000
                        N                                         10                       10
Sales Volume (€000s)    Pearson Correlation                   .930**                        1
                        Sig. (2-tailed)                         .000
                        Sum of Squares and Cross-products     99.000                  566.000
                        Covariance                            11.000                   62.889
                        N                                         10                       10
**. Correlation is significant at the 0.01 level (2-tailed).
Sample point
EXPERIMENTS, COUNTING RULES AND ASSIGNING PROBABILITIES
Project completion time   Probability of sample point
In using the data in Table 4.2 to compute probabilities, we note that outcome (2, 6) - stage 1 completed in two months and stage 2 completed in six months - occurred six times in the 40 projects. We can use the relative frequency method to assign a probability of 6/40 = 0.15 to this outcome. Similarly, outcome (2, 7) also occurred in six of the 40 projects, providing a 6/40 = 0.15 probability. Continuing in this manner, we obtain the probability assignments for the sample points of the KPL project shown in Table 4.3. Note that P(2, 6) represents the probability of the sample point (2, 6), P(2, 7) represents the probability of the sample point (2, 7) and so on.
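The relative frequency method amounts to dividing each observed count by the number of repetitions. The sketch below uses only the two counts quoted in the paragraph above; the dictionary layout and names are illustrative, and exact fractions are used to avoid rounding noise.

```python
from fractions import Fraction

total_projects = 40
# Only the two counts quoted in the text; the other entries of Table 4.2 are not shown here.
observed_counts = {(2, 6): 6, (2, 7): 6}

# Relative frequency method: probability = count / number of repetitions.
probabilities = {
    point: Fraction(count, total_projects)
    for point, count in observed_counts.items()
}

for point, probability in probabilities.items():
    print(point, float(probability))   # each prints 0.15
```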
Methods
1 An experiment has three steps with three outcomes possible for the first step, two outcomes possible for the second step, and four outcomes possible for the third step. How many experimental outcomes exist for the entire experiment?
2 How many ways can three items be selected from a group of six items? Use the letters A, B, C, D, E and F to identify the items, and list each of the different combinations of three items.
3 How many permutations of three items can be selected from a group of six? Use the letters A, B, C, D, E and F to identify the items, and list each of the permutations of items B, D and F.
4 Consider the experiment of tossing a coin three times.
a. Develop a tree diagram for the experiment.
b. List the experimental outcomes.
c. What is the probability for each experimental outcome?
5 Suppose an experiment has five equally likely outcomes: E1, E2, E3, E4, E5. Assign probabilities to each outcome and show that the requirements in equations (4.3) and (4.4) are satisfied. What method did you use?
CHAPTER 4 INTRODUCTION TO PROBABILITY
6 An experiment with three outcomes has been repeated 50 times, and it was learned that E1 occurred 20 times, E2 occurred 13 times, and E3 occurred 17 times. Assign probabilities to the outcomes. What method did you use?
7 A decision-maker subjectively assigned the following probabilities to the four outcomes of an experiment: P(E1) = 0.10, P(E2) = 0.15, P(E3) = 0.40 and P(E4) = 0.20. Are these probability assignments valid? Explain.
Applications
8 Applications for zoning changes in a large metropolitan city go through a two-step process: a review by the planning commission and a final decision by the city council. At step 1 the planning commission reviews the zoning change request and makes a positive or negative recommendation concerning the change. At step 2 the city council reviews the planning commission's recommendation and then votes to approve or to disapprove the zoning change. Suppose the developer of an apartment complex submits an application for a zoning change. Consider the application process as an experiment.
a. How many sample points are there for this experiment? List the sample points.
b. Construct a tree diagram for the experiment.
9 Simple random sampling uses a sample of size n from a population of size N to obtain data that can be used to make inferences about the characteristics of a population. Suppose that, from a population of 50 bank accounts, we want to take a random sample of four accounts in order to learn about the population. How many different random samples of four accounts are possible?
10 A company that franchises coffee houses conducted taste tests for a new coffee product. Four blends were prepared, then randomly chosen individuals were asked to taste the blends and state which one they liked best. Results of the taste test for 100 individuals are given.
Blend   Number choosing
1       20
2       30
3       35
4       15
a. Define the experiment being conducted. How many times was it repeated?
b. Prior to conducting the experiment, it is reasonable to assume preferences for the four blends are equal. What probabilities would you assign to the experimental outcomes prior to conducting the taste test? What method did you use?
c. After conducting the taste test, what probabilities would you assign to the experimental outcomes? What method did you use?
11 A company that manufactures toothpaste is studying five different package designs. Assuming that one design is just as likely to be selected by a consumer as any other design, what selection probability would you assign to each of the package designs? In an actual experiment, 100 consumers were asked to pick the design they preferred. The following data were obtained. Do the data confirm the belief that one design is just as likely to be selected as another? Explain.
Design   Number of times preferred
1        5
2        15
3        30
4        40
5        10
EVENTS AND THEIR PROBABILITIES
In the introduction to this chapter we used the term event much as it would be used in everyday language. Then, in Section 4.1 we introduced the concept of an experiment and its associated experimental outcomes or sample points. Sample points and events provide the foundation for the study of probability. We must now introduce the formal definition of an event as it relates to sample points. Doing so will provide the basis for determining the probability of an event.
Event
An event is a collection of sample points.
For example, let us return to the KPL project and assume that the project manager is interested in the event that the entire project can be completed in ten months or less. Referring to Table 4.3, we see that six sample points - (2, 6), (2, 7), (2, 8), (3, 6), (3, 7) and (4, 6) - provide a project completion time of ten months or less. Let C denote the event that the project is completed in ten months or less; we write
C = {(2, 6), (2, 7), (2, 8), (3, 6), (3, 7), (4, 6)}
Event C is said to occur if any one of these six sample points appears as the experimental outcome.
Other events that might be of interest to KPL management include the following.
L = The event that the project is completed in less than ten months
M = The event that the project is completed in more than ten months
Using the information in Table 4.3, we see that these events consist of the following sample points.
L = {(2, 6), (2, 7), (3, 6)}
M = {(3, 8), (4, 7), (4, 8)}
A variety of additional events can be defined for the KPL project, but in each case the event must be identified as a collection of sample points for the experiment.
Given the probabilities of the sample points shown in Table 4.3, we can use the following definition to compute the probability of any event that KPL management might want to consider.
Probability of an event
The probability of any event is equal to the sum of the probabilities of the sample points for the event.
Using this definition, we calculate the probability of a particular event by adding the probabilities of the sample points (experimental outcomes) that make up the event. We can now compute the probability that the project will take ten months or less to complete. Because this event is given by C = {(2, 6), (2, 7), (2, 8), (3, 6), (3, 7), (4, 6)}, the probability of event C, denoted P(C), is given by
$P(C) = P(2, 6) + P(2, 7) + P(2, 8) + P(3, 6) + P(3, 7) + P(4, 6)$
Similarly, because the event that the project is completed in less than ten months is given by L = {(2, 6), (2, 7), (3, 6)}, the probability of this event is given by
$P(L) = P(2, 6) + P(2, 7) + P(3, 6)$
Using these probability results, we can now tell KPL management that there is a 0.70 probability that the project will be completed in ten months or less, a 0.40 probability that the project will be completed in less than ten months, and a 0.30 probability that the project will be completed in more than ten months. This procedure of computing event probabilities can be repeated for any event of interest to the KPL management.
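The procedure can be sketched as a sum over the sample points in an event. Table 4.3 is not reproduced in this section, so the sample point probabilities below are assumptions: the values for (2, 6) and (2, 7) are the 0.15 quoted in the text, and the remaining values are illustrative figures chosen to sum to 1 and to be consistent with the 0.40 and 0.30 results quoted above. The function and variable names are likewise hypothetical.

```python
from fractions import Fraction

# ASSUMED sample point probabilities (in hundredths); only (2, 6) and (2, 7) are quoted in the text.
hundredths = {
    (2, 6): 15, (2, 7): 15, (2, 8): 5,
    (3, 6): 10, (3, 7): 20, (3, 8): 5,
    (4, 6): 5, (4, 7): 10, (4, 8): 15,
}
sample_point_probability = {pt: Fraction(v, 100) for pt, v in hundredths.items()}


def event_probability(event):
    """Probability of an event = sum of the probabilities of its sample points."""
    return sum(sample_point_probability[point] for point in event)


# Each sample point is (stage 1 months, stage 2 months); completion time is their sum.
C = {pt for pt in sample_point_probability if sum(pt) <= 10}   # ten months or less
L = {pt for pt in sample_point_probability if sum(pt) < 10}    # less than ten months
M = {pt for pt in sample_point_probability if sum(pt) > 10}    # more than ten months

print(float(event_probability(C)), float(event_probability(L)), float(event_probability(M)))
```

Defining the events by a condition on the completion time, rather than listing sample points by hand, makes it easy to repeat the calculation for any other event of interest.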
Any time that we can identify all the sample points of an experiment and assign probabilities to each, we can compute the probability of an event using the definition. However, in many experiments the large number of sample points makes the identification of the sample points, as well as the determination of their associated probabilities, extremely cumbersome if not impossible. In the remaining sections of this chapter, we present some basic probability relationships that can be used to compute the probability of an event without knowledge of all the sample point probabilities.
Methods
12 An experiment has four equally likely outcomes: E1, E2, E3 and E4.
a. What is the probability that E2 occurs?
b. What is the probability that any two of the outcomes occur (e.g. E1 or E3)?
c. What is the probability that any three of the outcomes occur (e.g. E1 or E2 or E4)?
13 Consider the experiment of selecting a playing card from a deck of 52 playing cards. Each card corresponds to a sample point with a 1/52 probability.
a. List the sample points in the event an ace is selected.
b. List the sample points in the event a club is selected.
c. List the sample points in the event a face card (jack, queen or king) is selected.
d. Find the probabilities associated with each of the events in parts (a), (b) and (c).
14 Consider the experiment of rolling a pair of dice. Suppose that we are interested in the sum of the face values showing on the dice.
a. How many sample points are possible? (Hint: Use the counting rule for multiple-step experiments.)
b. List the sample points.
c. What is the probability of obtaining a value of 7?
d. What is the probability of obtaining a value of 9 or greater?
e. Because each roll has six possible even values (2, 4, 6, 8, 10 and 12) and only five possible odd values (3, 5, 7, 9 and 11), the dice should show even values more often than odd values. Do you agree with this statement? Explain.
f. What method did you use to assign the probabilities requested?
Applications
15 Refer to the KPL sample points and sample point probabilities in Tables 4.2 and 4.3.
a. The design stage (stage 1) will run over budget if it takes four months to complete. List the sample points in the event the design stage is over budget.
b. What is the probability that the design stage is over budget?
c. The construction stage (stage 2) will run over budget if it takes eight months to complete. List the sample points in the event the construction stage is over budget.
d. What is the probability that the construction stage is over budget?
e. What is the probability that both stages are over budget?
16 Suppose that a manager of a large apartment complex provides the following subjective probability estimates about the number of vacancies that will exist next month.
Vacancies   Probability
0           0.10
1           0.15
2           0.30
3           0.20
4           0.15
5           0.10
Provide the probability of each of the following events.
a. No vacancies.
b. At least four vacancies.
c. Two or fewer vacancies.
17 A survey of 50 college students about the number of extracurricular activities resulted in the data shown.
Number of activities   Frequency
0                      8
1                      20
2                      12
3                      6
4                      3
5                      1
a. Let A be the event that a student participates in at least one activity. Find P(A).
b. Let B be the event that a student participates in three or more activities. Find P(B).
c. What is the probability that a student participates in exactly two activities?
Complement of an event
Given an event A, the complement of A is defined to be the event consisting of all sample points that are not in A. The complement of A is denoted by Ā. Figure 4.4 is a diagram, known as a Venn diagram, which illustrates the concept of a complement. The rectangular area represents the sample space for the experiment and as such contains all possible sample points. The circle represents event A and contains only the sample points that belong to A. The shaded region of the rectangle contains all sample points not in event A, and is by definition the complement of A.
In any probability application, either event A or its complement Ā must occur. Therefore, we have
$P(A) + P(\bar{A}) = 1$
Solving for P(A), we obtain the following result.
Computing probability using the complement
$P(A) = 1 - P(\bar{A})$ (4.5)
Figure 4.4 Venn diagram showing the sample space S, event A and the complement of event A
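Equation (4.5) is a one-line calculation, sketched below for the KPL events. The sketch relies on the observation that 'more than ten months' and 'ten months or less' are complements of each other; the function name is hypothetical.

```python
from fractions import Fraction


def complement_probability(p_event):
    """Return the probability of the complement, 1 - P(event)."""
    if not 0 <= p_event <= 1:
        raise ValueError("a probability must lie between 0 and 1")
    return 1 - p_event


# P(project takes more than ten months) = 0.30, as quoted earlier in the chapter.
p_more_than_ten_months = Fraction(30, 100)
p_ten_months_or_less = complement_probability(p_more_than_ten_months)
print(float(p_ten_months_or_less))   # 0.7
```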
INTRODUCTION TO PROBABILITY
then the event of interest is A ∩ B. Given no other information, we can reasonably assume that A and B are independent events. Thus,
$P(A \cap B) = P(A)P(B) = 0.80 \times 0.80 = 0.64$
To summarize this section, we note that our interest in conditional probability is motivated by the fact that events are often related. In such cases, we say the events are dependent and the conditional probability formulae in equations (4.7) and (4.8) must be used to compute the event probabilities. If two events are not related, they are independent; in this case neither event's probability is affected by whether the other event occurred.
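A minimal sketch of these ideas follows. It assumes the usual definition of conditional probability, P(A | B) = P(A ∩ B)/P(B), and checks independence through the multiplication rule. The function names are hypothetical, and the second set of numbers is made up purely to show a dependent case.

```python
from fractions import Fraction


def conditional_probability(p_a_and_b, p_b):
    """P(A | B) = P(A and B) / P(B), assuming P(B) > 0."""
    return p_a_and_b / p_b


def are_independent(p_a, p_b, p_a_and_b):
    """A and B are independent when P(A and B) equals P(A) * P(B)."""
    return p_a_and_b == p_a * p_b


# Independent case from the text: P(A) = P(B) = 0.80.
p_a = Fraction(80, 100)
p_b = Fraction(80, 100)
p_both = p_a * p_b
print(float(p_both))                                # 0.64
print(conditional_probability(p_both, p_b) == p_a)  # True: knowing B does not change P(A)

# Hypothetical dependent case: P(A) = 0.30, P(B) = 0.50, P(A and B) = 0.10.
print(are_independent(Fraction(3, 10), Fraction(1, 2), Fraction(1, 10)))  # False
```

Exact fractions are used so that the equality test in `are_independent` is not upset by floating-point rounding.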
Methods
21 Suppose that we have two events, A and B, with P(A) = 0.50, P(B) = 0.60, and P(A ∩ B) = 0.40.
a. Find P(A | B).
b. Find P(B | A).
c. Are A and B independent? Why or why not?
22 Assume that we have two events, A and B, that are mutually exclusive. Assume further that we know P(A) = 0.30 and P(B) = 0.40.
a. What is P(A ∩ B)?
b. What is P(A | B)?
c. A student in statistics argues that the concepts of mutually exclusive events and independent events are really the same, and that if events are mutually exclusive they must be independent. Do you agree with this statement? Use the probability information in this problem to justify your answer.
d. What general conclusion would you make about mutually exclusive and independent events given the results of this problem?
Applications
23 A Paris nightclub obtains the following data on the age and marital status of 140 customers.
                  Marital status
Age           Single   Married
Under 30      77       14
30 or over    28       21
a. Develop a joint probability table for these data.
b. Use the marginal probabilities to comment on the age of customers attending the club.
c. Use the marginal probabilities to comment on the marital status of customers attending the club.
d. What is the probability of finding a customer who is single and under the age of 30?
e. If a customer is under 30, what is the probability that he or she is single?
f. Is marital status independent of age? Explain, using probabilities.
BAYES' THEOREM
24 In a survey of MBA students, the following data were obtained on 'students' first reason for application to the school in which they matriculated'.
                          Reason for application
Enrolment status   School quality   School cost or convenience   Other   Totals
Full time          421              393                          76      890
Part time          400              593                          46      1039
Totals             821              986                          122     1929
a. Develop a joint probability table for these data.
b. Use the marginal probabilities of school quality, school cost or convenience, and other to comment on the most important reason for choosing a school.
c. If a student goes full time, what is the probability that school quality is the first reason for choosing a school?
d. If a student goes part time, what is the probability that school quality is the first reason for choosing a school?
e. Let A denote the event that a student is full time and let B denote the event that the student lists school quality as the first reason for applying. Are events A and B independent? Justify your answer.
25 A sample of convictions and compensation orders issued at a number of Scottish courts was followed up to see whether the offender had paid the compensation to the victim. Details by gender of offender are as follows:
                          Payment outcome
Offender gender   Paid in full   Part paid   Nothing paid
Male              754            62          61
Female            157            7           6
a. What is the probability that no compensation was paid?
b. What is the probability that the offender was not male given that compensation was part paid?
26 A purchasing agent placed rush orders for a particular raw material with two different suppliers, A and B. If neither order arrives in four days, the production process must be shut down until at least one of the orders arrives. The probability that supplier A can deliver the material in four days is 0.55. The probability that supplier B can deliver the material in four days is 0.35.
a. What is the probability that both suppliers will deliver the material in four days? Because two separate suppliers are involved, we are willing to assume independence.
b. What is the probability that at least one supplier will deliver the material in four days?
c. What is the probability that the production process will be shut down in four days because of a shortage of raw material (that is, both orders are late)?
In the discussion of conditional probability, we indicated that revising probabilities when new information is obtained is an important phase of probability analysis. Often, we begin the analysis with initial or prior probability estimates for specific events of interest. Then,
Methods
27 The prior probabilities for events A1 and A2 are P(A1) = 0.40 and P(A2) = 0.60. It is also known that P(A1 ∩ A2) = 0. Suppose P(B | A1) = 0.20 and P(B | A2) = 0.05.
a. Are A1 and A2 mutually exclusive? Explain.
b. Compute P(A1 ∩ B) and P(A2 ∩ B).
c. Compute P(B).
d. Apply Bayes' theorem to compute P(A1 | B) and P(A2 | B).
28 The prior probabilities for events A1, A2 and A3 are P(A1) = 0.20, P(A2) = 0.50 and P(A3) = 0.30. The conditional probabilities of event B given A1, A2 and A3 are P(B | A1) = 0.50, P(B | A2) = 0.40 and P(B | A3) = 0.30.
a. Compute P(B ∩ A1), P(B ∩ A2) and P(B ∩ A3).
b. Apply Bayes' theorem, equation (4.19), to compute the posterior probability P(A2 | B).
c. Use the tabular approach to applying Bayes' theorem to compute P(A1 | B), P(A2 | B) and P(A3 | B).
Applications
29 A consulting firm submitted a bid for a large research project. The firm's management initially felt they had a 50-50 chance of getting the project. However, the agency to which the bid was submitted subsequently requested additional information on the bid. Past experience indicates that for 75 per cent of the successful bids and 40 per cent of the unsuccessful bids the agency requested additional information.
a. What is the prior probability of the bid being successful (that is, prior to the request for additional information)?
b. What is the conditional probability of a request for additional information given that the bid will ultimately be successful?
c. Compute the posterior probability that the bid will be successful given a request for additional information.
30 A local bank reviewed its credit card policy with the intention of recalling some of its credit cards. In the past approximately 5 per cent of cardholders defaulted, leaving the bank unable to collect the outstanding balance. Hence, management established a prior probability of 0.05 that any particular cardholder will default. The bank also found that the probability of missing a monthly payment is 0.20 for customers who do not default. Of course, the probability of missing a monthly payment for those who default is 1.
a. Given that a customer missed one or more monthly payments, compute the posterior probability that the customer will default.
b. The bank would like to recall its card if the probability that a customer will default is greater than 0.20. Should the bank recall its card if the customer misses a monthly payment? Why or why not?
31 In 2006, there were 3172 fatalities recorded on Britain's roads, 169 of which were for children (Department of Transport, 2007). Correspondingly, serious injuries totalled 28 390, of which 25 625 were for adults.
a. What is the probability of a serious injury given the victim was a child?
b. What is the probability that the victim was an adult given a fatality occurred?
32 The following cross-tabulation shows industry type and Price/Earnings (P/E) ratio for 100 companies in the consumer products and banking industries.
                        P/E ratio
Industry    5-9   10-14   15-19   20-24   25-29   Total
Consumer    4     10      18      10      8       50
Banking     14    14      12      6       4       50
Total       18    24      30      16      12      100
a. What is the probability that a company had a P/E greater than 9 and belonged to the consumer industry?
b. What is the probability that a company with a P/E in the range 15-19 belonged to the banking industry?
33 A large investment advisory service has a number of analysts who prepare detailed studies of individual companies. On the basis of these studies the analysts make 'buy' or 'sell' recommendations on the companies' shares. The company classes an excellent analyst as one who will be correct 80 per cent of the time, a good analyst as one who will be correct 60 per cent of the time, and a poor analyst as one who will be correct 40 per cent of the time.
Two years ago, the advisory service hired Mr Smith who came with considerable experience from the research department of another firm. At the time he was hired it was thought that the probability was 0.90 that he was an excellent analyst, 0.09 that he was a good analyst and 0.01 that he was a poor analyst. In the past two years he has made ten recommendations of which only three have been correct.
Assuming that each recommendation is an independent event, what probability would you assign to Mr Smith being:
a. An excellent analyst?
b. A good analyst?
c. A poor analyst?
34 An electronic component is produced by four production lines in a manufacturing operation. The components are costly, are quite reliable and are shipped to suppliers in 50-component lots. Because testing is destructive, most buyers of the components test only a small number before deciding to accept or reject lots of incoming components. All four production lines usually only produce 1 per cent defective components which are randomly dispersed in the output. Unfortunately, production line 1 suffered mechanical difficulty and produced 10 per cent defectives during the month of April. This situation became known to the manufacturer after the components had been shipped. A customer received a lot in April and tested five components. Two failed. What is the probability that this lot came from production line 1?
For additional online summary questions and answers go to the companion website at www.cengage.co.uk/aswsbe2
RANDOM VARIABLES
Methods
1 Consider the experiment of tossing a coin twice.
a. List the experimental outcomes.
b. Define a random variable that represents the number of heads occurring on the two tosses.
c. Show what value the random variable would assume for each of the experimental outcomes.
d. Is this random variable discrete or continuous?
2 Consider the experiment of a worker assembling a product.
a. Define a random variable that represents the time in minutes required to assemble the product.
b. What values may the random variable assume?
c. Is the random variable discrete or continuous?
Applications
3 Three students have interviews scheduled for summer employment. In each case the interview results in either an offer for a position or no offer. Experimental outcomes are defined in terms of the results of the three interviews.
a. List the experimental outcomes.
b. Define a random variable that represents the number of offers made. Is the random variable continuous?
c. Show the value of the random variable for each of the experimental outcomes.
4 Suppose we know home mortgage rates for 12 Danish lending institutions. Assume that the random variable of interest is the number of lending institutions in this group that offers a 30-year fixed rate of 1.5 per cent or less. What values may this random variable assume?
5 To perform a certain type of blood analysis, lab technicians must perform two procedures. The first procedure requires either 1 or 2 separate steps, and the second procedure requires either 1, 2 or 3 steps.
a. List the experimental outcomes associated with performing the blood analysis.
b. If the random variable of interest is the total number of steps required to do the complete analysis (both procedures), show what value the random variable will assume for each of the experimental outcomes.
6 Listed is a series of experiments and associated random variables. In each case, identify the values that the random variable can assume and state whether the random variable is discrete or continuous.
Experiment                                              Random variable (X)
a. Take a 20-question examination                       Number of questions answered correctly
b. Observe cars arriving at a tollbooth for one hour    Number of cars arriving at tollbooth
c. Audit 50 tax returns                                 Number of returns containing errors
d. Observe an employee's work                           Number of non-productive hours in an eight-hour workday
e. Weigh a shipment of goods                            Number of kilograms
CHAPTER 5 DISCRETE PROBABILITY DISTRIBUTIONS
The possible values of the random variable and the associated probabilities are shown.
x   p(x)
1   1/6
2   1/6
3   1/6
4   1/6
5   1/6
6   1/6
As another example, consider the random variable X with the following discrete probability distribution.
x   p(x)
1   1/10
2   2/10
3   3/10
4   4/10
This probability distribution can be defined by the formula
$p(x) = \frac{x}{10}$ for x = 1, 2, 3 or 4
Evaluating p(x) for a given value of the random variable will provide the associated probability. For example, using the preceding probability function, we see that p(2) = 2/10 provides the probability that the random variable assumes a value of 2. The more widely used discrete probability distributions generally are specified by formulae. Three important cases are the binomial, Poisson and hypergeometric distributions; these are discussed later in the chapter.
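A probability function given by a formula can be written directly as a small function. The sketch below evaluates p(x) = x/10 and checks the two conditions a discrete probability function must satisfy (no negative probabilities, and probabilities summing to 1). The function and variable names are illustrative.

```python
from fractions import Fraction

possible_values = (1, 2, 3, 4)


def p(x):
    """Probability function p(x) = x/10 for x = 1, 2, 3 or 4, and 0 otherwise."""
    if x in possible_values:
        return Fraction(x, 10)
    return Fraction(0)


distribution = {x: p(x) for x in possible_values}

# Required conditions for a discrete probability function.
all_non_negative = all(prob >= 0 for prob in distribution.values())
sums_to_one = sum(distribution.values()) == 1

print(p(2))                           # 1/5, that is 2/10
print(all_non_negative, sums_to_one)  # True True
```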
Methods
7 The probability distribution for the random variable X follows.
x    p(x)
20   0.20
25   0.15
30   0.25
35   0.40
a. Is this probability distribution valid? Explain.
b. What is the probability that X = 30?
c. What is the probability that X is less than or equal to 25?
d. What is the probability that X is greater than 30?
Applications
8 The following data were collected by counting the number of operating rooms in use at a general hospital over a 20-day period. On three of the days only one operating room was used, on five of the days two were used, on eight of the days three were used, and on four days all four of the hospital's operating rooms were used.
a. Use the relative frequency approach to construct a probability distribution for the number of operating rooms in use on any given day.
b. Draw a graph of the probability distribution.
c. Show that your probability distribution satisfies the required conditions for a valid discrete probability distribution.
9 Table 5.4 shows the percent frequency distributions of job satisfaction scores for a sample of information systems (IS) senior executives and IS middle managers. The scores range from a low of 1 (very dissatisfied) to a high of 5 (very satisfied).
Job satisfaction score   IS senior executives (%)   IS middle managers (%)
1                        5                          4
2                        9                          10
3                        3                          12
4                        42                         46
5                        41                         28
a. Develop a probability distribution for the job satisfaction score of a senior executive.
b. Develop a probability distribution for the job satisfaction score of a middle manager.
c. What is the probability a senior executive will report a job satisfaction score of 4 or 5?
d. What is the probability a middle manager is very satisfied?
e. Compare the overall job satisfaction of senior executives and middle managers.
10 A technician services mailing machines at companies in the Berne area. Depending on the type of malfunction, the service call can take 1, 2, 3 or 4 hours. The different types of malfunctions occur at about the same frequency.
a. Develop a probability distribution for the duration of a service call.
b. Draw a graph of the probability distribution.
c. Show that your probability distribution satisfies the conditions required for a discrete probability function.
d. What is the probability a service call will take three hours?
e. A service call has just come in, but the type of malfunction is unknown. It is 3:00 p.m. and service technicians usually get off at 5:00 p.m. What is the probability the service technician will have to work overtime to fix the machine today?
11 A college admissions tutor subjectively assessed a probability distribution for X, the number of entering students, as follows.
x      p(x)
1000   0.15
1100   0.20
1200   0.30
1300   0.25
1400   0.10
a. Is this probability distribution valid? Explain.
b. What is the probability of 1200 or fewer entering students?
12 A psychologist determined that the number of sessions required to obtain the trust of a new patient is either 1, 2 or 3. Let X be a random variable indicating the number of sessions required to gain the patient's trust. The following probability function has been proposed.
$p(x) = \frac{x}{6}$ for x = 1, 2 or 3
a. Is this probability function valid? Explain.
b. What is the probability that it takes exactly two sessions to gain the patient's trust?
c. What is the probability that it takes at least two sessions to gain the patient's trust?
13 The following table is a partial probability distribution for the MRA Company's projected profits (X = profit in €000s) for the first year of operation (the negative value denotes a loss).
x      p(x)
-100   0.10
0      0.20
50     0.30
100    0.25
150    0.10
200
a. What is the proper value for p(200)? What is your interpretation of this value?
b. What is the probability that MRA will be profitable?
c. What is the probability that MRA will make at least €100 000?
Expected value
The expected value, or mean, of a random variable is a measure of the central location for the random variable. The formula for the expected value of a discrete random variable X follows.
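As an illustration, the sketch below assumes the standard definition of the expected value of a discrete random variable, E(X) = Σ x p(x), in which each value is weighted by its probability. It is applied to two distributions from earlier in the chapter: the six equally likely values 1 to 6 and the probability function p(x) = x/10. The function name is hypothetical.

```python
from fractions import Fraction


def expected_value(distribution):
    """E(X) = sum of x * p(x) over all values x, assuming the probabilities sum to 1."""
    if sum(distribution.values()) != 1:
        raise ValueError("probabilities must sum to 1")
    return sum(x * prob for x, prob in distribution.items())


# Six equally likely values, each with probability 1/6.
uniform_one_to_six = {x: Fraction(1, 6) for x in range(1, 7)}
# The distribution defined by p(x) = x/10 for x = 1, 2, 3 or 4.
x_over_ten = {x: Fraction(x, 10) for x in range(1, 5)}

print(float(expected_value(uniform_one_to_six)))   # 3.5
print(float(expected_value(x_over_ten)))           # 3.0
```

Note that the expected value need not be a value the random variable can actually take, as the 3.5 for the first distribution shows.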
CHAPTER 7 SAMPLING AND SAMPLING DISTRIBUTIONS
The head of personnel services for E-Applications & Informatics plc (EAI) has been given the task of developing a profile of the company's 2500 managers. The characteristics to be identified include the mean annual salary for the managers and the proportion of managers who have completed the company's management training programme. The 2500 managers are the population for this study. We can find the annual salary and training programme status for each individual by referring to the firm's personnel records. The data file containing this information for all 2500 managers in the population accompanies the text, in the file EAI.
Using the EAI data set and the formulae presented in Chapter 3, we calculate the population mean and the population standard deviation for the annual salary data.
Population mean: μ = €51 800
Population standard deviation: σ = €4000
The data for training programme status show that 1500 of the 2500 managers completed the training programme. Let π denote the proportion of the population that completed the training programme: π = 1500/2500 = 0.60. The population mean annual salary (μ = €51 800), the population standard deviation of annual salary (σ = €4000), and the population proportion that completed the training programme (π = 0.60) are parameters of the population of EAI managers.
Now, suppose the necessary information on all the EAI managers was not readily available in the company's database. How can the firm's head of personnel services obtain estimates of the population parameters by using a sample of managers, rather than all 2500 managers in the population? Suppose a sample of 30 managers will be used. Clearly, the time and the cost of developing a profile would be substantially less for 30 managers than for the entire population. If the head of personnel could be assured that a sample of 30 managers would provide adequate information about the population of 2500 managers, working with a sample would be preferable to working with the entire population. Often the cost of collecting information from a sample is substantially less than from a population, especially when personal interviews must be conducted to collect the information.
First we consider how we can identify a sample of 30 managers.
Several methods can be used to select a sample from a population. One of the most common is simple random sampling. The definition of a simple random sample and the process of selecting such a sample depend on whether the population is finite or infinite. We first consider sampling from a finite population, because the EAI sampling problem involves a finite population of 2500 managers.
Sampling from a finite population
A simple random sample of size n from a finite population of size N is defined as a sample selected such that each possible sample of size n has the same probability of being selected.
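The selection of a simple random sample from a finite population can be sketched with Python's standard library. This is only an illustration: the manager identifiers 1 to 2500 stand in for the EAI personnel list, and the helper name `simple_random_sample` and the seed are assumptions made for reproducibility.

```python
import random


def simple_random_sample(population, n, seed=None):
    """Draw n distinct elements so that every subset of size n is equally likely."""
    rng = random.Random(seed)          # a seed makes the draw repeatable
    return rng.sample(population, n)   # sampling without replacement


# Hypothetical identifiers standing in for the 2500 EAI managers.
manager_ids = list(range(1, 2501))
sample_ids = simple_random_sample(manager_ids, 30, seed=1)
print(sorted(sample_ids))
```

Because `random.sample` draws without replacement, no manager can appear twice in the sample.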
Population parameter                                     Parameter value         Point estimator                                      Point estimate
Population mean annual salary                            $\mu$ = €51 800         Sample mean annual salary                            $\bar{x}$ = €51 814
Population standard deviation for annual salary          $\sigma$ = €4000        Sample standard deviation for annual salary          $s$ = €3348
Population proportion who have completed the             $\pi$ = 0.60            Sample proportion who have completed the             $p$ = 0.63
management training programme                                                    management training programme
Methods
7 The following data are from a simple random sample.
5 8 10 7 10 14
a. Calculate a point estimate of the population mean.
b. Calculate a point estimate of the population standard deviation.
8 A survey question for a sample of 150 individuals yielded 75 Yes responses, 55 No responses, and 20 No Opinion responses.
a. Calculate a point estimate of the proportion in the population who respond Yes.
b. Calculate a point estimate of the proportion in the population who respond No.
Applications
9 A simple random sample of five months of sales data provided the following information.
Month:       1    2    3    4    5
Units sold:  94   100  85   94   92
a. Calculate a point estimate of the population mean number of units sold per month.
b. Calculate a point estimate of the population standard deviation.
10 The data set 'Mutual Fund' contains data on a sample of 40 mutual funds. These were randomly selected from 283 funds featured in Business Week. Use the data set to answer the following questions.
a. Compute a point estimate of the proportion of the Business Week mutual funds that are load funds.
b. Compute a point estimate of the proportion of the funds that are classified as high risk.
c. Compute a point estimate of the proportion of the funds that have a below-average risk rating.
11 In an ICM poll for the Guardian newspaper in October 2008, during the turbulence in the world's financial markets, respondents were asked to what extent they felt they and their families would be affected financially. The opinions of the 1007 adult respondents were:
98 Suffer a great deal
320 Suffer quite a lot
426 Suffer a little
137 Not suffer at all
31 Don't know
Calculate point estimates of the following population parameters.
a. The proportion of all adults who feel they would suffer a little.
b. The proportion of all adults who feel they would not suffer at all.
c. The proportion of all adults who feel they would suffer quite a lot or a great deal.
12 Many drugs used to treat cancer are expensive. BusinessWeek reported on the cost per treatment of Herceptin, a drug used to treat breast cancer. Typical treatment costs (in dollars) for Herceptin are provided by a simple random sample of 10 patients.
4376 5578 2717 4920 4495
4198 6446 4119 4237 3814
a. Calculate a point estimate of the mean cost per treatment with Herceptin.
b. Calculate a point estimate of the standard deviation of the cost per treatment with Herceptin.
INTRODUCTION TO SAMPLING DISTRIBUTIONS
For the simple random sample of 30 EAI managers shown in Table 7.2, the point estimate of $\mu$ is $\bar{x}$ = €51 814 and the point estimate of $\pi$ is $p$ = 0.63. Suppose we select another simple random sample of 30 EAI managers and obtain the following point estimates:
Sample mean: $\bar{x}$ = €52 610
Sample proportion: $p$ = 0.70
Note that different values of the sample mean and sample proportion were obtained. A second simple random sample of 30 EAI managers cannot be expected to provide exactly the same point estimates as the first sample.
Now, suppose we repeat the process of selecting a simple random sample of 30 EAI managers over and over again, each time computing the values of the sample mean and sample proportion. Table 7.4 contains a portion of the results obtained for 500 simple random samples, and Table 7.5 shows the frequency and relative frequency distributions for the 500 $\bar{x}$ values. Figure 7.1 shows the relative frequency histogram for the $\bar{x}$ values.
In Chapter 5 we defined a random variable as a numerical description of the outcome of an experiment. If we consider selecting a simple random sample as an experiment, the sample mean is a numerical description of the outcome of the experiment. So, the sample mean is a random variable. In accordance with the naming conventions for random variables described in Chapters 5 and 6 (i.e. use of capital letters for names of random variables), we denote this random variable $\bar{X}$. Just like other random variables, $\bar{X}$ has a mean or expected value, a standard deviation, and a probability distribution. Because the various possible values of $\bar{X}$ are the result of different simple random samples, the probability distribution of $\bar{X}$ is called the sampling distribution of $\bar{X}$. Knowledge of this sampling distribution will enable us to make probability statements about how close the sample mean is to the population mean $\mu$.
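The repeated-sampling idea can be imitated in a short simulation. The sketch below is an assumption-laden illustration: it does not read the EAI file but builds a stand-in population of 2500 salaries from a normal distribution with mean 51 800 and standard deviation 4000, then draws 500 simple random samples of size 30 and records each sample mean.

```python
import random
import statistics

rng = random.Random(42)  # fixed seed so the run is repeatable

# Stand-in population (assumption): 2500 simulated salaries, not the real EAI data.
population = [rng.gauss(51800, 4000) for _ in range(2500)]

# Draw 500 simple random samples of 30 managers and keep each sample mean.
sample_means = []
for _ in range(500):
    sample = rng.sample(population, 30)
    sample_means.append(statistics.mean(sample))

# The 500 sample means approximate the sampling distribution of the sample mean.
print(round(statistics.mean(sample_means)))
print(round(statistics.stdev(sample_means)))
```

A histogram of `sample_means` would play the same role as the relative frequency histogram of the 500 sample means described in the text.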
Let us return to Figure 7.1. We would need to enumerate every possible sample of 30 managers and compute each sample mean to completely determine the sampling
Software Section for Chapter 7
If a list of the elements in a population is available in a MINITAB worksheet, MINITAB can be used to select a simple random sample. For example, a list of the top 100 golfers in the official world rankings, as at July 2008, is given in the MINITAB file 'Golfers.MTW'. Column 1 contains the ranking, column 2 the name and country of the golfer, column 3 the golfer's points average, and column 4 the number of events over which the average has been calculated. The first five rows in the data set are shown in Table 7.6. Suppose that you would like to select a simple random sample of 20 golfers from the top 100. The following steps can be used to select the sample.
Step 1 Calc > Random Data > Sample From Columns [Main menu bar]
Step 2 Enter 20 in the Number of rows to sample box [Sample From Columns panel]
       Enter C1-C4 in the From columns box
       Enter C5-C8 in the Store samples in box
       Click OK
The random sample of 20 golfers appears in columns C5-C8.
If a list of the elements in a population is available in an EXCEL file, EXCEL can be used to select a simple random sample. For example, a list of the top 100 golfers in the official world rankings, as at July 2008, is given in the EXCEL file 'Golfers.XLS'. Column 1 contains the ranking, column 2 the name and country of the golfer, column 3 the golfer's points average, and column 4 the number of events over which the average has been calculated. The first five rows in the data set are shown in Table 7.6. Assume that you would like to select a simple random sample of 20 golfers from the top 100.
The rows of any EXCEL data set can be placed in a random order by adding an extra column to the data set and filling the column with random numbers using the =RAND() function. Then, using EXCEL's sorting capability on the random number column, the rows of the data set will be reordered randomly. The random sample of size n appears in the first n rows of the reordered data set. In the Golfers data set, labels are in row 1 and the 100 golfers are in rows 2 to 101. The following steps can be used to select a simple random sample of 20 golfers.
Step 1 Enter =RAND() in cell E2
Step 2 Copy cell E2 to cells E3:E101
Step 3 Select any cell in Column E
Step 4 Click the Home tab on the Ribbon
Step 5 In the Editing group, click Sort & Filter
Step 6 Click Sort Smallest to Largest
The random sample of 20 golfers appears in rows 2 to 21 of the reordered data set. The random numbers in column E are no longer necessary and can be deleted.
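The random-number-and-sort procedure above can be mirrored outside a spreadsheet. The following is a minimal sketch, assuming placeholder golfer names rather than the contents of the Golfers file; the function name `sample_by_random_key` is hypothetical.

```python
import random


def sample_by_random_key(rows, n, seed=None):
    """Attach a random number to each row, sort on it, and keep the first n rows."""
    rng = random.Random(seed)
    keyed = [(rng.random(), row) for row in rows]   # like filling column E with random numbers
    keyed.sort(key=lambda pair: pair[0])            # like Sort Smallest to Largest
    return [row for _, row in keyed[:n]]            # first n rows of the reordered data


# Placeholder names standing in for the top 100 golfers (assumption).
golfers = [f"Golfer {rank}" for rank in range(1, 101)]
sample = sample_by_random_key(golfers, 20, seed=7)
print(sample)
```

Sorting on independent random keys puts the rows in a random order, so the first 20 rows form a simple random sample.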
If a list of the elements in a population is available in a PASW data file, PASW can be used to select a simple random sample. For example, a list of the top 100 golfers in the official world rankings, as at July 2008, is given in the PASW data file 'Golfers'. Column 1 contains the ranking, column 2 the name and country of the golfer, column 3 the golfer's points average, and column 4 the number of events over which the average has been calculated. The first five rows in the data set are shown in Table 7.6. Suppose that you would like to select a simple random sample of 20 golfers from the top 100. The following steps can be used to select the sample.
Step 1 Data > Select Cases [Main menu bar]
Step 2 Select Random sample of cases [Select Cases panel]
       Click on the Sample button
Step 3 Specify Exactly 20 cases from the first 100 cases [Select Cases: Random Sample panel]
       Click Continue to return to the Select Cases panel
Step 4 Select Deleted if you want to create a file containing only the 20 sampled golfers [Select Cases panel]
       Click OK
If you opt to delete the non-selected cases, the 20 randomly selected cases can be saved in a new data file.
CHAPTER 8 INTERVAL ESTIMATION
to a random sample of customers who placed an order or requested service during the previous month. The questionnaire asks customers to rate their satisfaction with such things as ease of placing orders, timely delivery, accurate order filling and technical advice. The team summarizes each customer's questionnaire by computing an overall satisfaction score $x$ that ranges from 0 (worst possible score) to 100 (best possible score). A sample mean customer satisfaction score is then computed.
The sample mean satisfaction score provides a point estimate of the mean satisfaction score $\mu$ for the population of all CJW customers. With this regular measure of customer service, CJW can promptly take corrective action if a low customer satisfaction score results. The company conducted this satisfaction survey for a number of months and consistently obtained an estimate near 12 for the standard deviation of satisfaction scores. Based on these historical data, CJW now assumes a known value of $\sigma = 12$ for the population standard deviation. The historical data also indicate that the population of satisfaction scores follows an approximately normal distribution.
During the most recent month, the quality assurance team surveyed 100 customers ($n = 100$) and obtained a sample mean satisfaction score of $\bar{x} = 72$. This provides a point estimate of the population mean satisfaction score $\mu$. We show how to compute the margin of error for this estimate and construct an interval estimate of the population mean.
Margin of error and the interval estimate
In Chapter 7 we showed that the sampling distribution of the sample mean $\bar{X}$ can be used to compute the probability that $\bar{X}$ will be within a given distance of $\mu$. In the CJW example, the historical data show that the population of satisfaction scores is normally distributed with a standard deviation of $\sigma = 12$. So, using what we learned in Chapter 7, we can conclude that the sampling distribution of $\bar{X}$ follows a normal distribution with a standard error of
$$\sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}} = \frac{12}{\sqrt{100}} = 1.2$$
This sampling distribution is shown in Figure 8.1.* The sampling distribution of $\bar{X}$ provides information about the possible differences between $\bar{X}$ and $\mu$.
Using the table of cumulative probabilities for the standard normal distribution, we find that 95 per cent of the values of any normally distributed random variable are within $\pm 1.96$ standard deviations of the mean. So, 95 per cent of the $\bar{X}$ values must be within $\pm 1.96\sigma_{\bar{X}}$ of the mean $\mu$. In the CJW example, we know that the sampling distribution of $\bar{X}$ is normal with a standard error of $\sigma_{\bar{X}} = 1.2$. Because $\pm 1.96\sigma_{\bar{X}} = \pm 1.96(1.2) = \pm 2.35$, we conclude that 95 per cent of all $\bar{X}$ values obtained using a sample size of $n = 100$ will be within $\pm 2.35$ units of the population mean $\mu$. See Figure 8.1.
In the introduction to this chapter we said that the general form of an interval estimate of the population mean $\mu$ is $\bar{x} \pm$ Margin of error. For the CJW example, suppose we set the margin of error equal to 2.35 and compute the interval estimate of $\mu$ using $\bar{x} \pm 2.35$. To provide an interpretation for this interval estimate, let us consider the values of $\bar{x} \pm 2.35$ that could be obtained if we took three different simple random samples, each consisting of 100 CJW customers.
*The population of satisfaction scores has a normal distribution, so we can conclude that the sampling distribution of $\bar{X}$ is a normal distribution. If the population did not have a normal distribution, we could rely on the central limit theorem, and the sample size of $n = 100$, to conclude that the sampling distribution of $\bar{X}$ is approximately normal. In either case, the sampling distribution would appear as shown in Figure 8.1.
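The arithmetic for the CJW example can be checked with a few lines of Python. This is a sketch using the values given in the text ($\sigma = 12$, $n = 100$, $\bar{x} = 72$); the variable names are our own, and the z value is taken from the standard library's `NormalDist` rather than from a printed table.

```python
from math import sqrt
from statistics import NormalDist

sigma, n, x_bar = 12, 100, 72            # values from the CJW example

standard_error = sigma / sqrt(n)         # sigma / sqrt(n) = 1.2
z = NormalDist().inv_cdf(0.975)          # upper-tail area 0.025, about 1.96
margin = z * standard_error              # about 2.35
interval = (x_bar - margin, x_bar + margin)

print(round(standard_error, 2), round(margin, 2))
print(tuple(round(limit, 2) for limit in interval))
```

For the most recent month's sample the interval estimate is therefore roughly 69.65 to 74.35.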
margin of error is then $\pm t_{\alpha/2}\, s/\sqrt{n}$, and the general expression for an interval estimate of a population mean when $\sigma$ is unknown is:
Interval estimate of a population mean: $\sigma$ unknown
$$\bar{x} \pm t_{\alpha/2}\frac{s}{\sqrt{n}} \qquad (8.2)$$
where $s$ is the sample standard deviation, $(1 - \alpha)$ is the confidence coefficient, and $t_{\alpha/2}$ is the $t$ value providing an area of $\alpha/2$ in the upper tail of the $t$ distribution with $n - 1$ degrees of freedom*.
Consider a study designed to estimate the mean credit card debt for a defined population of households. A sample of $n = 85$ households provided the credit card balances in the file 'Balance' on the accompanying CD. The first few rows of this data set are shown in the EXCEL screenshot in Figure 8.4 below. For this situation, no previous estimate of the population standard deviation $\sigma$ is available. As a consequence, the sample data must be used to estimate both the population mean and the population standard deviation. Using the data in the 'Balance' file, we compute the sample mean $\bar{x} = 5900$ (€) and the sample standard deviation $s = 3058$ (€).
Figure 8.4 First few data rows and summary statistics for credit card balances
[EXCEL screenshot: column 'Balance'; mean = 5900, standard deviation = 3058]
*The reason the number of degrees of freedom associated with the $t$ value in expression (8.2) is $n - 1$ concerns the use of $s$ as an estimate of the population standard deviation. The expression for the sample standard deviation is $s = \sqrt{\sum (x_i - \bar{x})^2/(n - 1)}$. Degrees of freedom refers to the number of independent pieces of information that go into the computation of $\sum (x_i - \bar{x})^2$. The $n$ pieces of information involved in computing $\sum (x_i - \bar{x})^2$ are as follows: $x_1 - \bar{x}, x_2 - \bar{x}, \ldots, x_n - \bar{x}$. In Section 3.2 we indicated that $\sum (x_i - \bar{x}) = 0$. Hence, only $n - 1$ of the $x_i - \bar{x}$ values are independent; that is, if we know $n - 1$ of the values, the remaining value can be determined exactly by using the condition that $\sum (x_i - \bar{x}) = 0$. So $n - 1$ is the number of degrees of freedom associated with $\sum (x_i - \bar{x})^2$ and hence the number of degrees of freedom for the $t$ distribution in expression (8.2).
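Expression (8.2) can be sketched in Python for the credit card balance example. The function name `t_interval` is hypothetical, and the $t$ value is an assumption read from a $t$ table: about 1.663 for an upper-tail area of 0.05 with 84 degrees of freedom, which corresponds to 90 per cent confidence.

```python
from math import sqrt


def t_interval(x_bar, s, n, t_value):
    """Return (standard error, margin of error, interval) for x_bar +/- t * s / sqrt(n)."""
    standard_error = s / sqrt(n)
    margin = t_value * standard_error
    return standard_error, margin, (x_bar - margin, x_bar + margin)


# Credit card example: x_bar = 5900, s = 3058, n = 85.
# t_value = 1.663 is taken from a t table (assumption), 84 degrees of freedom, 90 per cent confidence.
se, margin, (lower, upper) = t_interval(5900, 3058, 85, 1.663)
print(round(se), round(lower), round(upper))
```

Under these assumptions the rounded results agree with the standard error of €332 and the 90 per cent interval of €5348 to €6452 reported for this data set in the software section of the chapter.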
9 Find the t value(s) for each of the following cases.
a. Upper tail area of 0.025 with 12 degrees of freedom
b. Lower tail area of 0.05 with 50 degrees of freedom
c. Upper tail area of 0.01 with 30 degrees of freedom
d. Where 90 per cent of the area falls between these two t values with 25 degrees of freedom
e. Where 95 per cent of the area falls between these two t values with 45 degrees of freedom
10 The following sample data are from a normal population: 10, 8, 12, 15, 13, 11, 6, 5.
a. What is the point estimate of the population mean?
b. What is the point estimate of the population standard deviation?
c. With 95 per cent confidence, what is the margin of error for the estimation of the population mean?
d. What is the 95 per cent confidence interval for the population mean?
11 A simple random sample with n = 54 provided a sample mean of 22.5 and a sample standard deviation of 4.4.
a. Construct a 90 per cent confidence interval for the population mean.
b. Construct a 95 per cent confidence interval for the population mean.
c. Construct a 99 per cent confidence interval for the population mean.
d. What happens to the margin of error and the confidence interval as the confidence level is increased?
contacts made during the week. A sample of 65 weekly reports showed a sample mean of 19.5 customer contacts per week. The sample standard deviation was 5.2. Provide 90 per cent and 95 per cent confidence intervals for the population mean number of weekly customer contacts for the sales personnel.
13 Consumption of alcoholic beverages by young women of drinking age has been increasing in the UK, Europe and the US (The Wall Street Journal, 15 February, 2006). Data (annual consumption in litres) consistent with the findings reported in The Wall Street Journal article are shown for a sample of 20 European young women.
266   82  199  174   97
170  222  115  130  169
164  102  113  171    0
 93    0   93  110  130
Assuming the population is roughly symmetrically distributed, construct a 95 per cent confidence interval for the mean annual consumption of alcoholic beverages by young European women.
14 The International Air Transport Association surveys business travellers to develop quality ratings for international airports. The maximum possible rating is ten. Suppose a simple random sample of business travellers is selected and each traveller is asked to provide a rating for Singapore Changi International Airport. The ratings obtained from the sample of 50 business travellers follow. Construct a 95 per cent confidence interval estimate of the population mean rating for Changi.
DETERMINING THE SAMPLE SIZE
8
6
9
7
z6
5
.5
15 Suppose a survey of 40 first-time home buyers finds that the mean of annual household income is €40 000 and the sample standard deviation is €15 300.
a. At 95 per cent confidence, what is the margin of error for estimating the population mean household income?
b. What is the 95 per cent confidence interval for the population mean annual household income for first-time home buyers?
16 Thirty fast-food restaurants including McDonald's and Burger King were visited during the summer of 2009. During each visit, the customer went to the drive-through and ordered a basic meal such as a burger, fries and drink. The time between pulling up to the order kiosk and receiving the filled order was recorded. The times in minutes for the 30 visits are as follows:
a. Provide a point estimate of the population mean drive-through time at fast-food restaurants.
b. At 95 per cent confidence, what is the margin of error?
c. What is the 95 per cent confidence interval estimate of the population mean?
d. Discuss skewness that may be present in this population. What suggestion would you make for a repeat of this study?
17 A survey by Accountemps asked a sample of 200 executives to provide data on the number of minutes per day office workers waste trying to locate mislabelled, misfiled or misplaced items. Data consistent with this survey are contained in the data set 'ActTemps'.
a. Use 'ActTemps' to develop a point estimate of the number of minutes per day office workers waste trying to locate mislabelled, misfiled or misplaced items.
b. What is the sample standard deviation?
c. What is the 95 per cent confidence interval for the mean number of minutes wasted per day?
In providing practical advice in the two preceding sections, we commented on the role of the sample size in providing good approximate confidence intervals when the population is not normally distributed. In this section, we focus on another aspect of the sample size issue. We describe how to choose a sample size large enough to provide a desired margin of error. To understand how this process is done, we return to the $\sigma$ known case presented in Section 8.1. Using expression (8.1), the interval estimate is $\bar{x} \pm z_{\alpha/2}\,\sigma/\sqrt{n}$. We see that $z_{\alpha/2}$, the population standard deviation $\sigma$, and the sample size $n$ combine to determine the margin of error. Once we select a confidence coefficient $1 - \alpha$, $z_{\alpha/2}$ can be determined. Then, if we have a value for $\sigma$, we can determine the sample size $n$ needed to provide any desired margin of error. Let $E$ = the desired margin of error.
$$E = z_{\alpha/2}\frac{\sigma}{\sqrt{n}}$$
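Solving $E = z_{\alpha/2}\,\sigma/\sqrt{n}$ for $n$ gives $n = (z_{\alpha/2})^2\sigma^2/E^2$, rounded up to the next whole number. The sketch below applies this with $\sigma = 12$ from the CJW example; the desired margin of error $E = 2$ and the function name `sample_size_for_mean` are illustrative assumptions.

```python
from math import ceil
from statistics import NormalDist


def sample_size_for_mean(sigma, margin, confidence=0.95):
    """Smallest n such that z * sigma / sqrt(n) does not exceed the desired margin."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return ceil((z * sigma / margin) ** 2)   # round up to a whole observation


# sigma = 12 as in the CJW example; E = 2 is a hypothetical target.
n_needed = sample_size_for_mean(sigma=12, margin=2)
print(n_needed)
```

Rounding up rather than to the nearest integer ensures the margin of error is no larger than the target.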
6
9
9
4
5
9
6
IB
B
4
4
7
3
B
7
I9
6
5
9
3
5rJ
3
4
9
I4
l
l0 4
4B83
8745t0 B
ts
ló
The general expression for an interval estimate of a population proportion is:
Interval estimate of a population proportion
$$p \pm z_{\alpha/2}\sqrt{\frac{p(1 - p)}{n}} \qquad (8.6)$$
where $1 - \alpha$ is the confidence coefficient and $z_{\alpha/2}$ is the $z$ value providing an area of $\alpha/2$ in the upper tail of the standard normal distribution.
Consider the following example. A national survey of 900 women golfers was conducted to learn how women golfers view their treatment at golf courses. (The data are available in the file 'TeeTimes' on the CD.) The survey found that 396 of the women golfers were satisfied with the availability of tee times. So, the point estimate of the proportion of the population of women golfers who are satisfied with the availability of tee times is 396/900 = 0.44. Using expression (8.6) and a 95 per cent confidence level,
$$p \pm z_{\alpha/2}\sqrt{\frac{p(1 - p)}{n}} = 0.44 \pm 1.96\sqrt{\frac{0.44(1 - 0.44)}{900}} = 0.44 \pm 0.0324$$
The margin of error is 0.0324 and the 95 per cent confidence interval estimate of the population proportion is 0.408 to 0.472. Using percentages, the survey results enable us to state that with 95 per cent confidence between 40.8 per cent and 47.2 per cent of all women golfers are satisfied with the availability of tee times.
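The women golfers calculation can be reproduced with a short Python sketch of expression (8.6). The function name `proportion_interval` is our own; the counts 396 and 900 come from the text.

```python
from math import sqrt
from statistics import NormalDist


def proportion_interval(successes, n, confidence=0.95):
    """Return (p, margin of error, interval) using p +/- z * sqrt(p(1 - p)/n)."""
    p = successes / n
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z * sqrt(p * (1 - p) / n)
    return p, margin, (p - margin, p + margin)


# 396 of the 900 women golfers were satisfied with the availability of tee times.
p, margin, (lower, upper) = proportion_interval(396, 900)
print(round(p, 2), round(margin, 4), round(lower, 3), round(upper, 3))
```

The same function gives a 90 or 99 per cent interval by changing the `confidence` argument.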
Determining the sample size
The rationale for the sample size determination in developing interval estimates of $\pi$ is similar to the rationale used in Section 8.3 to determine the sample size for estimating a population mean.
Previously in this section we said that the margin of error associated with an interval estimate of a population proportion is $z_{\alpha/2}\sqrt{p(1 - p)/n}$. The margin of error is based on the values of $z_{\alpha/2}$, the sample proportion $p$, and the sample size $n$. Larger sample sizes provide a smaller margin of error and better precision. Let $E$ denote the desired margin of error.
$$E = z_{\alpha/2}\sqrt{\frac{p(1 - p)}{n}}$$
Solving this equation for $n$ provides a formula for the sample size that will provide a margin of error of size $E$.
$$n = \frac{(z_{\alpha/2})^2\, p(1 - p)}{E^2}$$
Note, however, that we cannot use this formula to compute the sample size that will provide the desired margin of error because $p$ will not be known until after we select the sample. What we need, then, is a planning value for $p$ that can be used to make the computation. Using $p^*$ to denote the planning value for $p$, the following formula can be used to compute the sample size that will provide a margin of error of size $E$.
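Sample size for an interval estimate of a population mean
$$n = \frac{(z_{\alpha/2})^2\sigma^2}{E^2}$$
Interval estimate of a population proportion
$$p \pm z_{\alpha/2}\sqrt{\frac{p(1 - p)}{n}} \qquad (8.6)$$
Sample size for an interval estimate of a population proportion
$$n = \frac{(z_{\alpha/2})^2\, p^*(1 - p^*)}{E^2}$$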
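The sample size formula for a proportion can be sketched as follows. The planning value $p^* = 0.5$ and the margin of error of 0.05 are illustrative assumptions; $p^* = 0.5$ is a cautious choice because $p^*(1 - p^*)$ is largest there, so it never understates the sample size required.

```python
from math import ceil
from statistics import NormalDist


def sample_size_for_proportion(planning_value, margin, confidence=0.95):
    """n = z^2 * p*(1 - p*) / E^2, rounded up to a whole observation."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return ceil(z ** 2 * planning_value * (1 - planning_value) / margin ** 2)


# Hypothetical planning value and margin of error, for illustration only.
n_needed = sample_size_for_proportion(planning_value=0.5, margin=0.05)
print(n_needed)
```

A planning value further from 0.5, for example one taken from an earlier survey, gives a smaller required sample size for the same margin of error.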
The manager of a city-centre branch of a well-known international bank commissioned a customer satisfaction survey. The survey investigated three areas of customer satisfaction: their experience waiting for service at a till, their experience being served at the till, and their experience of self-service facilities at the branch. Within each of these categories, respondents to the survey were asked to give ratings on a number of aspects of the bank's service. These ratings were then summed to give an overall satisfaction rating in each of the three areas of service. The summed ratings are scaled such that they lie between 0 and 100, with 0 representing extreme dissatisfaction and 100 representing extreme satisfaction. The data file for this case study ('IntnlBank' on the accompanying CD) contains the 0 to 100 ratings for the three areas of service, together with particulars of respondents' gender and whether they would recommend the bank to other people (a simple Yes/No response was required to this question). A table containing the first few rows of the data file is shown below.
People using automated self-service machines at a main bank branch. © David Pearson/Alamy.
Waiting   Service   Self-service   Gender   Recommend
  55        65         50          male     no
  50        80         88          male     no
  30        40         44          male     no
  65        60         69          male     yes
  55        65         63          male     no
  40        60         56          male     no
  15        65         38          male     yes
  45        60         56          male     no
  55        65         75          male     no
  50        50         69          male     yes
Managerial report
1 Use descriptive statistics to summarize each of the five variables in the data file (the three service ratings, customer gender and customer recommendation).
2 Calculate a 95 per cent confidence interval estimate of the mean service rating for the population of customers of the branch, for each of the three service areas. Provide a managerial interpretation of each interval estimate.
3 Calculate a 95 per cent confidence interval estimate of the proportion of the branch's customers who would recommend the bank, and a 95 per cent confidence interval estimate of the proportion of the branch's customers who are female. Provide a managerial interpretation of each interval estimate.
4 Suppose the branch manager required an estimate of the percentage of branch customers who would recommend the branch within a margin of error of ... percentage points. Using 95 per cent confidence, how large should the sample size be?
5 Suppose the branch manager required an estimate of the percentage of branch customers who are female within a margin of error of 5 percentage points. Using 95 per cent confidence, how large should the sample size be?
CASE PROBLEM 2 YOUNG PROFESSIONAL MAGAZINE
Young Professional magazine was developed for a target audience of recent university graduates who are in their first 10 years in a business/professional career. In its two years of publication, the magazine has been fairly successful. Now the publisher is interested in expanding the magazine's advertising base. Potential advertisers continually ask about the demographics and interests of subscribers to Young Professional. To collect this information, the magazine commissioned a survey to develop a profile of its subscribers. The survey results will be used to help the magazine choose articles of interest and provide advertisers with a profile of subscribers. As a new employee of the magazine, you have been asked to help analyze the survey results.
... and woman reading Young Professional magazine. © Marcin Balcerzak.
Some of the survey questions follow (these are not necessarily in the order they were asked in the survey):
What is your age?
Are you: Male _ Female _
Do you plan to make any real estate purchases in the next two years? Yes _ No _
What is the approximate total value of financial investments, exclusive of your home, owned by you or members of your household?
How many stock/bond/mutual fund transactions have you made in the past year?
Do you have broadband access to the Internet at home? Yes _ No _
Please indicate your total household income last year.
Do you have children? Yes _ No _
The file entitled 'Professional' contains the responses to these questions. The file is on the CD accompanying the text.
Managerial Report
Prepare a managerial report summarizing the results of the survey. In addition to statistical summaries, discuss how the magazine might use these results to attract advertisers. You might also comment on how the survey results could be used by the magazine's editors to identify topics that would be of interest to readers. Your report should address the following issues, but do not limit your analysis to just these areas.
1 Develop appropriate descriptive statistics to summarize the data.
Software Section for Chapter 8
We describe the use of MINITAB in constructing confidence intervals for a population mean and a population proportion.
Population mean: $\sigma$ known
We illustrate using the CJW example in Section 8.1 (file 'CJW.MTW' on the accompanying CD). The satisfaction scores for the sample of 100 customers are in column C1 of a MINITAB worksheet. The population standard deviation $\sigma = 20$ is assumed known. The following steps can be used to compute a 95 per cent confidence interval estimate of the population mean.
Step 1 Stat > Basic Statistics > 1-Sample Z [Main menu bar]
Step 2 Enter C1 in the Samples in columns box [1-Sample Z (Test and Confidence Interval) panel]
       Enter 20 in the Standard deviation box
       Click OK
The MINITAB default is a 95 per cent confidence level. To specify a different confidence level such as 90 per cent:
Step 2 Enter C1 in the Samples in columns box [1-Sample Z (Test and Confidence Interval) panel]
       Enter 20 in the Standard deviation box
       Select Options
Step 3 Enter 90 in the Confidence level box [1-Sample Z - Options panel]
       Click OK
Step 4 Click OK [1-Sample Z (Test and Confidence Interval) panel]
Population mean: $\sigma$ unknown
We illustrate using the credit card balance data for a sample of 85 households that was an example in Section 8.2 (file 'Balance.MTW' on the accompanying CD). The data are in column C1 of a MINITAB worksheet. In this case the population standard deviation $\sigma$ will be estimated by the sample standard deviation $s$. The following steps can be used to compute a 90 per cent confidence interval estimate of the population mean. The dialogue panels involved are quite similar to those above (but in this case do not involve inputting the value for the standard deviation).
Step 1 Stat > Basic Statistics > 1-Sample t [Main menu bar]
Step 2 Enter C1 in the Samples in columns box [1-Sample t (Test and Confidence Interval) panel]
       Click OK
The MINITAB default is a 95 per cent confidence level. To specify a different confidence level such as 90 per cent:
Step 2 Enter C1 in the Samples in columns box [1-Sample t (Test and Confidence Interval) panel]
       Select Options
Step 3 Enter 90 in the Confidence level box [1-Sample t - Options panel]
       Click OK
Step 4 Click OK [1-Sample t (Test and Confidence Interval) panel]
The results of the MINITAB interval estimation procedure are shown in the accompanying figure. The sample of 85 households provides a sample mean credit card balance of €5900, a sample standard deviation of €3058, an estimate (after rounding) of the standard error of the mean of €332, and a 90 per cent confidence interval of €5348 to €6452.
MINITAB confidence interval for the credit card balance survey
[MINITAB output. Results for: Balance.MTW. One-Sample T: Balance. N = 85, Mean = 5900, StDev = 3058, SE Mean = 332, 90% CI = (5348, 6452)]
Population proportion
We illustrate using the survey data for women golfers presented in Section 8.4 (file 'TeeTimes.MTW' on the accompanying CD). The data are in column C1 of a MINITAB worksheet. Individual responses are recorded as Yes if the golfer is satisfied with the availability of tee times and No otherwise. The following steps can be used to compute a 95 per cent confidence interval estimate of the proportion of women golfers who are satisfied with the availability of tee times. The main dialogue panel is quite similar to those for the population mean procedures described above.
Step 1 Stat > Basic Statistics > 1 Proportion [Main menu bar]
Step 2 Enter C1 in the Samples in columns box [1 Proportion (Test and Confidence Interval) panel]
Select Options
Step 3 Check Use test and interval based on normal distribution [1 Proportion - Options panel]
(The MINITAB default is a 95 per cent confidence level. To specify a different confidence level, enter the appropriate figure in the Confidence Level box)
Click OK
Step 4 Click OK [1 Proportion (Test and Confidence Interval) panel]

MINITAB's 1 Proportion routine uses an alphabetical ordering of the responses and selects the second response for the population proportion of interest. In the women golfers example, MINITAB uses the alphabetical ordering No-Yes and then provides the confidence interval for the proportion of Yes responses. Because Yes was the response of interest, the MINITAB output was fine. However, if MINITAB's alphabetical ordering does not provide the response of interest, select any cell in the column and use the sequence: Editor > Column > Value Order. It will provide you with the option of entering a user-specified order. You must list the response of interest second in the define-an-order box.
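The calculation behind the steps above can be sketched in a few lines of Python. This is only an illustration of the normal-approximation interval and of the "second response in alphabetical order" convention. The function name `proportion_interval` and the response counts are hypothetical and are not taken from the TeeTimes file.

```python
from math import sqrt
from statistics import NormalDist


def proportion_interval(responses, confidence=0.95):
    """Normal-approximation confidence interval for a population proportion.

    Follows the convention described in the text: the two response
    categories are sorted alphabetically and the second one is treated
    as the response of interest.
    """
    categories = sorted(set(responses))
    if len(categories) != 2:
        raise ValueError("expected exactly two response categories")
    of_interest = categories[1]  # 'Yes' in the ordering No-Yes
    n = len(responses)
    p = sum(r == of_interest for r in responses) / n
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z * sqrt(p * (1 - p) / n)
    return of_interest, p, (p - margin, p + margin)


# Hypothetical responses for illustration only (not the survey data).
responses = ["Yes"] * 44 + ["No"] * 56
label, p_hat, (lower, upper) = proportion_interval(responses)
print(label, round(p_hat, 2), round(lower, 3), round(upper, 3))
```

If the response of interest were the alphabetically first category, the list would need to be recoded first, which plays the same role as the Value Order step in MINITAB.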
We describe the use of EXCEL in constructing confidence intervals for a population mean (there is no inbuilt routine for a population proportion).

Population mean: σ known
We illustrate using the CJW example in Section 8.1 (file 'CJW.XLS' on the accompanying CD). The population standard deviation σ = 20 is assumed known. The satisfaction scores for the sample of 100 customers are in column A of an EXCEL worksheet. The following steps can be used to compute the margin of error for an estimate of the population mean. We begin by using EXCEL's Descriptive Statistics Tool described in Chapter 3.
Step 1 Click the Data tab on the Ribbon
Step 2 In the Analysis group, click Data Analysis
Step 3 Choose Descriptive Statistics from the list of Analysis Tools
Step 4 Enter A1:A101 in the Input Range box [Descriptive Statistics panel]
Select Grouped by Columns
Select Labels in First Row
Select Output Range
Enter C1 in the Output Range box
Select Summary Statistics
Click OK
The summary statistics will appear in columns C and D. Continue by computing the margin of error using EXCEL's Confidence function as follows:
Step 5 Select cell C16 and enter the label Margin of Error
Step 6 Select cell D16 and enter the EXCEL formula =CONFIDENCE(.05,20,100)
The three parameters of the Confidence function are
Alpha = 1 - confidence coefficient = 1 - 0.95 = 0.05
The population standard deviation = 20
The sample size = 100 (Note: This parameter appears as Count in cell D15.)
The point estimate of the population mean is in cell D3 and the margin of error is in cell D16. The point estimate (82) and the margin of error (3.92) allow the confidence interval for the population mean to be easily computed.
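As a cross-check on the worksheet, the same margin of error can be reproduced outside EXCEL. The sketch below assumes only the three inputs named above (alpha, σ and n) and the point estimate of 82. The helper name `confidence_margin` is our own and is not an EXCEL or library function.

```python
from math import sqrt
from statistics import NormalDist


def confidence_margin(alpha, sigma, n):
    """Margin of error z_(alpha/2) * sigma / sqrt(n) for the sigma known case."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return z * sigma / sqrt(n)


margin = confidence_margin(0.05, 20, 100)  # same inputs as the worksheet formula
x_bar = 82                                 # point estimate from cell D3
interval = (x_bar - margin, x_bar + margin)
print(round(margin, 2), tuple(round(v, 2) for v in interval))
```

The interval is simply the point estimate minus and plus the margin of error, which is the final step the text leaves to the reader.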
Population mean: σ unknown
We illustrate using the credit card balance data for a sample of 85 households that was an example in Section 8.2 (file 'Balance.XLS' on the accompanying CD). The data are in column A of an EXCEL worksheet. The following steps can be used to compute the point estimate and the margin of error for an interval estimate of a population mean. We will use EXCEL's Descriptive Statistics Tool described in Chapter 3.
Step 1 Click the Data tab on the Ribbon
Step 2 In the Analysis group, click Data Analysis
Step 3 Choose Descriptive Statistics from the list of Analysis Tools
Click OK
Step 4 Enter A1:A86 in the Input Range box [Descriptive Statistics panel]
Select Grouped by Columns
Check Labels in First Row
Select Output Range
Enter C1 in the Output Range box
Check Summary Statistics
Check Confidence Level for Mean
Enter 95 in the Confidence Level for Mean box
Click OK
The summary statistics will appear in columns C and D. The point estimate of the population mean appears in cell D3. The margin of error, labelled 'Confidence Level (95.0 per cent)', appears in cell D16. The point estimate (€5900) and the margin of error (€660) allow the confidence interval for the population mean to be easily computed. The output from this EXCEL procedure is shown in Figure 8.9.

Figure 8.9 Interval estimation of the population mean credit card balance using EXCEL
[Worksheet: column A holds the Balance data; columns C and D hold the summary statistics (Mean, Standard Error, Median, Mode, Standard Deviation, Sample Variance, Kurtosis, Skewness, Range, Minimum, Maximum, Sum, Count, Confidence Level (95.0%))]
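The 'Confidence Level (95.0 per cent)' figure is the margin of error t × s/√n, with t taken from the t distribution with n - 1 degrees of freedom. The sketch below reproduces it from the summary figures quoted in this chapter (n = 85, s = €3058) using only the Python standard library. The helper functions are our own illustration: they integrate the t density numerically and find the critical value by bisection, which is not necessarily how EXCEL computes it.

```python
from math import exp, lgamma, pi, sqrt


def t_pdf(x, df):
    """Density of the t distribution with df degrees of freedom."""
    coeff = exp(lgamma((df + 1) / 2) - lgamma(df / 2)) / sqrt(df * pi)
    return coeff * (1 + x * x / df) ** (-(df + 1) / 2)


def t_upper_tail(t, df, steps=2000):
    """Area to the right of t, via Simpson's rule on [0, |t|]."""
    b = abs(t)
    h = b / steps
    total = t_pdf(0.0, df) + t_pdf(b, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    tail = 0.5 - total * h / 3
    return tail if t >= 0 else 1 - tail


def t_critical(upper_tail_area, df):
    """t value with the given upper tail area, found by bisection."""
    lo, hi = 0.0, 50.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_upper_tail(mid, df) > upper_tail_area:
            lo = mid  # tail still too large, so move right
        else:
            hi = mid
    return (lo + hi) / 2


n, s, x_bar = 85, 3058, 5900  # summary values quoted in the text
margin = t_critical(0.025, n - 1) * s / sqrt(n)
print(round(margin), round(x_bar - margin), round(x_bar + margin))
```

Rounded to the nearest euro, the margin agrees with the €660 reported by the Descriptive Statistics tool.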
We describe the use of PASW in constructing confidence intervals for a population mean in the σ unknown condition. There are no inbuilt routines in PASW for the σ known condition, nor for a population proportion.

Population mean: σ unknown
We illustrate using the credit card balance data for a sample of 85 households that was an example in Section 8.2 (file 'Balance.SAV' on the accompanying CD). The data are in the first column of the data file. The following steps can be used to compute the point estimate and the margin of error for an interval estimate of a population mean.

CHAPTER 8 INTERVAL ESTIMATION

Figure 8.10 PASW confidence interval for the credit card balance survey
[One-Sample Statistics table: Balance, N = 85, Mean = 5900.00, Std. Deviation = 3058, Std. Error Mean = 331.7]

Step 1 Analyze > Compare Means > One-Sample T Test [Main menu bar]
Step 2 Transfer the Balance variable to the Test Variable(s) box [One-Sample T Test panel]
The PASW default is a 95 per cent confidence level. To specify a different confidence level, click Options
Step 3 Enter the appropriate figure in the Confidence Interval box [One-Sample T Test Options panel]
Click Continue
Step 4 Click OK [One-Sample T Test panel]

PASW produces two tables, shown in Figure 8.10. These include the sample mean (€5900), the sample standard deviation (€3058), the estimated standard error of the mean (€331.7) and the confidence interval (this is labelled as a confidence interval for 'the Difference'). The second table also includes the result of a hypothesis test (we deal with the hypothesis test in Chapter 9).
Because a p-value is a probability, it ranges from 0 to 1. A small p-value indicates a sample result that is unusual given the assumption that H0 is true. Small p-values lead to rejection of H0, whereas large p-values indicate the null hypothesis should not be rejected.

Two steps are required to use the p-value approach. First, we must use the value of the test statistic to compute the p-value. The method used to compute a p-value depends on whether the test is lower tail, upper tail, or a two-tailed test. For a lower tail test, the p-value is the probability of obtaining a value for the test statistic at least as small as that provided by the sample. To compute the p-value for the lower tail test in the σ known case, we must find the area under the standard normal curve to the left of the test statistic. After computing the p-value, we must then decide whether it is small enough to reject the null hypothesis. As we will show, this involves comparing it to the level of significance.

We now illustrate the p-value approach by computing the p-value for the cola bottling lower tail test. Suppose the sample of 36 cola bottles provides a sample mean of x̄ = 2.92 litres. Is x̄ = 2.92 small enough to cause us to reject H0? Because this is a lower tail test, the p-value is the area under the standard normal curve to the left of the test statistic. Using x̄ = 2.92, σ = 0.18, and n = 36, we compute the value of the test statistic:

$$z = \frac{\bar{x} - \mu_0}{\sigma/\sqrt{n}} = \frac{2.92 - 3}{0.18/\sqrt{36}} = -2.67$$

The p-value is the probability that the test statistic Z is less than or equal to -2.67 (the area under the standard normal curve to the left of z = -2.67).
Using the standard normal distribution table, we find that the cumulative probability for z = -2.67, which in this case is the p-value, is 0.0038. Figure 9.2 shows that x̄ = 2.92 …
… mean μ = 295 by a significant amount, H0 will not be rejected and no action will be taken to adjust the manufacturing process.

The quality control team selected α = 0.05 as the level of significance for the test. Data from previous tests conducted when the process was known to be in adjustment show that the population standard deviation can be assumed known with a value of σ = 12. With a sample size of n = 50, the standard error of the sample mean is

$$\sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}} = \frac{12}{\sqrt{50}} = 1.7$$

Because the sample size is large, the central limit theorem (see Chapter 7) allows us to conclude that the sampling distribution of X̄ can be approximated by a normal distribution. Figure 9.4 shows the sampling distribution of X̄ for the MaxFlight hypothesis test with a hypothesized population mean of μ0 = 295.

Figure 9.4 Sampling distribution of X̄ for the MaxFlight hypothesis test (standard error σ/√n = 12/√50 = 1.7)

Suppose that a sample of 50 golf balls is selected and that the sample mean is x̄ = 297.6 metres. This sample mean suggests that the population mean may be larger than 295 metres. Is this value x̄ = 297.6 sufficiently larger than 295 to cause us to reject H0 at the 0.05 level of significance? In the previous section we described two approaches that can be used to answer this question: the p-value approach and the critical value approach.

p-value approach

Recall that the p-value is a probability, computed using the test statistic, that measures the support (or lack of support) provided by the sample for the null hypothesis. For a two-tailed test, values of the test statistic in either tail show a lack of support for the null hypothesis. For a two-tailed test, the p-value is the probability of obtaining a value for the test statistic at least as unlikely as that provided by the sample. Let us see how the p-value is computed for the MaxFlight hypothesis test.

First we compute the value of the test statistic. For the σ known case, the test statistic Z is a standard normal random variable. Using equation (9.1) with x̄ = 297.6, the value of the test statistic is

$$z = \frac{\bar{x} - \mu_0}{\sigma/\sqrt{n}} = \frac{297.6 - 295}{12/\sqrt{50}} = 1.53$$
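The MaxFlight calculation can be followed through to a decision with a short Python sketch. It uses only the figures given in the text (x̄ = 297.6, μ0 = 295, σ = 12, n = 50, α = 0.05). The function name `z_test` and the dictionary keys are our own choices for illustration.

```python
from math import sqrt
from statistics import NormalDist


def z_test(x_bar, mu0, sigma, n):
    """Test statistic and p-values for a test about a mean, sigma known."""
    std_error = sigma / sqrt(n)
    z = (x_bar - mu0) / std_error
    lower = NormalDist().cdf(z)  # area to the left of z
    upper = 1 - lower            # area to the right of z
    return z, {"lower": lower, "upper": upper,
               "two-tailed": 2 * min(lower, upper)}


z, p_values = z_test(x_bar=297.6, mu0=295, sigma=12, n=50)
alpha = 0.05

# p-value approach: reject H0 when the two-tailed p-value is at most alpha.
reject_by_p_value = p_values["two-tailed"] <= alpha

# Critical value approach: reject H0 when |z| is at least z_(alpha/2).
z_critical = NormalDist().inv_cdf(1 - alpha / 2)
reject_by_critical_value = abs(z) >= z_critical

print(round(z, 2), round(p_values["two-tailed"], 4),
      reject_by_p_value, reject_by_critical_value)
```

Both approaches lead to the same decision, as they must, because they compare the same sample evidence with the same level of significance.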
… provided a sample mean rating of x̄ = 7.25 and a sample standard deviation of s = 1.052. Do the data indicate that Munich should be designated as a superior service airport?

We want to develop a hypothesis test for which the decision to reject H0 will lead to the conclusion that the population mean rating for Munich Airport is greater than seven. Accordingly, an upper tail test with H1: μ > 7 is required. The null and alternative hypotheses for this upper tail test are as follows:

H0: μ ≤ 7
H1: μ > 7

We will use α = 0.05 as the level of significance for the test.
Using expression (9.4) with x̄ = 7.25, s = 1.052, and n = 60, the value of the test statistic is

$$t = \frac{\bar{x} - \mu_0}{s/\sqrt{n}} = \frac{7.25 - 7}{1.052/\sqrt{60}} = 1.84$$

The sampling distribution of t has n - 1 = 60 - 1 = 59 degrees of freedom. Because the test is an upper tail test, the p-value is the area under the curve of the t distribution to the right of t = 1.84.

The t distribution table provided in most textbooks will not contain sufficient detail to determine the exact p-value, such as the p-value corresponding to t = 1.84. For instance, using Table 2 in Appendix B, the t distribution with 59 degrees of freedom provides the following information.

Area in upper tail   0.20    0.10    0.05    0.025   0.01    0.005
t value (59 df)      0.848   1.296   1.671   2.001   2.391   2.662
                                         t = 1.84

We see that t = 1.84 is between 1.671 and 2.001. Although the table does not provide the exact p-value, the values in the 'Area in upper tail' row show that the p-value must be less than 0.05 and greater than 0.025. With a level of significance of α = 0.05, this placement is all we need to know to make the decision to reject the null hypothesis and conclude that Munich should be classified as a superior service airport. Computer packages such as MINITAB, PASW and EXCEL can easily determine the exact p-value associated with the test statistic t = 1.84. Each of these packages will show that the p-value is 0.035 for this example. A p-value = 0.035 < 0.05 leads to the rejection of the null hypothesis and to the conclusion Munich should be classified as a superior service airport.

The critical value approach can also be used to make the rejection decision. With α = 0.05 and the t distribution with 59 degrees of freedom, t0.05 = 1.671 is the critical value for the test. The rejection rule is therefore

Reject H0 if t ≥ 1.671

With the test statistic t = 1.84 > 1.671, H0 is rejected and we can conclude that Munich can be classified as a superior service airport.
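For readers without access to the packages named above, the exact p-value can also be approximated numerically. The sketch below is one possible approach using only the Python standard library: it integrates the t density with Simpson's rule. The helper names `t_pdf` and `t_upper_tail` are our own, and this is not the algorithm used by MINITAB, PASW or EXCEL.

```python
from math import exp, lgamma, pi, sqrt


def t_pdf(x, df):
    """Density of the t distribution with df degrees of freedom."""
    coeff = exp(lgamma((df + 1) / 2) - lgamma(df / 2)) / sqrt(df * pi)
    return coeff * (1 + x * x / df) ** (-(df + 1) / 2)


def t_upper_tail(t, df, steps=2000):
    """Area under the t curve to the right of t (Simpson's rule on [0, |t|])."""
    b = abs(t)
    h = b / steps
    total = t_pdf(0.0, df) + t_pdf(b, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    tail = 0.5 - total * h / 3  # the curve is symmetric about zero
    return tail if t >= 0 else 1 - tail


# Munich Airport figures from the text.
x_bar, mu0, s, n = 7.25, 7, 1.052, 60
t_stat = (x_bar - mu0) / (s / sqrt(n))
p_value = t_upper_tail(t_stat, n - 1)  # upper tail test, 59 degrees of freedom
print(round(t_stat, 2), round(p_value, 3))
```

For a two-tailed test the same function can be used by doubling the tail area beyond |t|.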
Two-tailed test
To illustrate how to do a two-tailed test about a population mean for the σ unknown case, let us consider the hypothesis testing situation facing Mega Toys. The company manufactures and distributes its products through more than 1000 retail outlets. In planning production levels for the coming winter season, Mega Toys must decide how many units of each product to produce prior to knowing the actual demand at the retail level. For this year's most important new toy, Mega Toys' marketing director is expecting demand to average 40 units per retail outlet. Prior to making the final production decision based upon this estimate, Mega Toys decided to survey a sample of 25 retailers in order to develop more information about the demand for the new product. Each retailer was provided with information about the features of the new toy along with the cost and the suggested selling price. Then each retailer was asked to specify an anticipated order quantity.

With μ denoting the population mean order quantity per retail outlet, the sample data will be used to conduct the following two-tailed hypothesis test:

H0: μ = 40
H1: μ ≠ 40

If H0 cannot be rejected, Mega Toys will continue its production planning based on the marketing director's estimate that the population mean order quantity per retail outlet will be μ = 40 units. However, if H0 is rejected, Mega Toys will immediately re-evaluate its production plan for the product. A two-tailed hypothesis test is used because Mega Toys wants to re-evaluate the production plan if the population mean quantity per retail outlet is less than anticipated or greater than anticipated. Because no historical data are available (it is a new product), the population mean and the population standard deviation must both be estimated using x̄ and s from the sample data.

The sample of 25 retailers provided a mean of x̄ = 37.4 and a standard deviation of s = 11.79 units. Before going ahead with the use of the t distribution, the analyst constructed a histogram of the sample data in order to check on the form of the population distribution. The histogram of the sample data showed no evidence of skewness or any extreme outliers, so the analyst concluded that the use of the t distribution with n - 1 = 24 degrees of freedom was appropriate. Using equation (9.4) with x̄ = 37.4, μ0 = 40, s = 11.79, and n = 25, the value of the test statistic is

$$t = \frac{\bar{x} - \mu_0}{s/\sqrt{n}} = \frac{37.4 - 40}{11.79/\sqrt{25}} = -1.10$$

Because we have a two-tailed test, the p-value is two times the area under the curve for the t distribution to the left of t = -1.10. Using Table 2 in Appendix B, the t distribution table for 24 degrees of freedom provides the following information.

Area in upper tail   0.20    0.10    0.05    0.025   0.01    0.005
t value (24 df)      0.857   1.318   1.711   2.064   2.492   2.797
                        t = 1.10

The t distribution table only contains positive t values. Because the t distribution is symmetrical, however, we can find the area under the curve to the right of t = 1.10 and double it to find the p-value. We see that t = 1.10 is between 0.857 and 1.318. From the 'Area in upper tail' row, we see that the area in the tail to the right of t = 1.10 is between 0.20 and 0.10. Doubling these amounts, we see that the p-value must be between 0.40 and 0.20. With a level of significance of α = 0.05, we now know that the …
POPULATION PROPORTION
24 Joan's Nursery specializes in custom-designed landscaping for residential areas. The estimated labour cost associated with a particular landscaping proposal is based on the number of plantings of trees, shrubs, and so on to be used for the project. For cost-estimating purposes, managers use two hours of labour time for the planting of a medium-sized tree. Actual times from a sample of ten plantings during the past month follow (times in hours).
t.7 1.5 tl ?^ .4 2.3
With a 0.05 level of significance, test to see whether the mean planting time differs from two hours.
a. State the null and alternative hypotheses.
b. Compute the sample mean.
c. Compute the sample standard deviation.
d. What is the p-value?
e. What is your conclusion?
).4).2).6
In this section we show how to conduct a hypothesis test about a population proportion π. Using π0 to denote the hypothesized value for the population proportion, the three forms for a hypothesis test about a population proportion are as follows.

H0: π ≥ π0     H0: π ≤ π0     H0: π = π0
H1: π < π0     H1: π > π0     H1: π ≠ π0

The first form is called a lower tail test, the second form is called an upper tail test, and the third form is called a two-tailed test.

Hypothesis tests about a population proportion are based on the difference between the sample proportion p and the hypothesized population proportion π0. The methods used to do the hypothesis test are similar to those used for hypothesis tests about a population mean. The only difference is that we use the sample proportion and its standard error to compute the test statistic. The p-value approach or the critical value approach is then used to determine whether the null hypothesis should be rejected.

Let us consider an example involving a situation faced by Aspire gymnasium. Over the past year, 20 per cent of the users of Aspire were women. In an effort to increase the proportion of women users, Aspire implemented a special promotion designed to attract women. One month after the promotion was implemented, the gym manager requested a statistical study to determine whether the proportion of women users at Aspire had increased. Because the objective of the study is to determine whether the proportion of women users increased, an upper tail test with H1: π > 0.20 is appropriate. The null and alternative hypotheses for the Aspire hypothesis test are as follows:

H0: π ≤ 0.20
H1: π > 0.20

If H0 can be rejected, the test results will give statistical support for the conclusion that the proportion of women users increased and the promotion was beneficial. The gym manager specified that a level of significance of α = 0.05 be used in carrying out this hypothesis test.
CHAPTER 9 HYPOTHESIS TESTS
Methods
25 Consider the following hypothesis test:
H0: π = 0.20
H1: π ≠ 0.20

A sample of 400 provided a sample proportion p = 0.175.
a. Compute the value of the test statistic.
b. What is the p-value?
c. At α = 0.05, what is your conclusion?
d. What is the rejection rule using the critical value? What is your conclusion?

26 Consider the following hypothesis test:

H0: π ≥ 0.75
H1: π < 0.75

A sample of 300 items was selected. At α = 0.05, compute the p-value and state your conclusion for each of the following sample results.
a. p = 0.68
b. p = 0.72
c. p = 0.70
d. p = 0.77

Applications

27 An airline promotion to business travellers is based on the assumption that two-thirds of business travellers use a laptop computer on overnight business trips.
a. State the hypotheses that can be used to test the assumption.
b. What is the sample proportion from an American Express sponsored survey that found 355 of 546 business travellers use a laptop computer on overnight business trips?
c. What is the p-value?
d. Use α = 0.05. What is your conclusion?

28 Eagle Outfitters is a chain of stores specializing in outdoor clothing and camping gear. They are considering a promotion that involves sending discount coupons to all their credit card customers by direct mail. This promotion will be considered a success if more than 10 per cent of those receiving the coupons use them. Before going nationwide with the promotion, coupons were sent to a sample of 100 credit card customers.
a. Formulate hypotheses that can be used to test whether the population proportion of those who will use the coupons is sufficient to go national.
b. The file 'Eagle' contains the sample data. Compute a point estimate of the population proportion.
c. Use α = 0.05 to conduct your hypothesis test. Should Eagle go national with the promotion?

29 Before the Iraqi election in January 2005, an Abu Dhabi TV/Zogby International poll asked a sample of Iraqi adults whether they would prefer an Islamic or a secular government.
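Test statistic for hypothesis tests about a population mean: σ unknown

$$t = \frac{\bar{x} - \mu_0}{s/\sqrt{n}}$$

Test statistic for hypothesis tests about a population proportion

$$z = \frac{p - \pi_0}{\sqrt{\dfrac{\pi_0(1 - \pi_0)}{n}}}$$

Sample size for a one-tailed hypothesis test about a population mean

$$n = \frac{(z_\alpha + z_\beta)^2 \sigma^2}{(\mu_0 - \mu_1)^2}$$

In a two-tailed test, replace $z_\alpha$ with $z_{\alpha/2}$.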
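The sample size formula is easy to evaluate directly. The sketch below is a minimal illustration: the function name `required_sample_size` and all of the input values (α, β, σ, μ0 and the alternative mean μ1) are hypothetical and chosen only to show the arithmetic. The result is rounded up because a sample size must be a whole number.

```python
from math import ceil
from statistics import NormalDist


def required_sample_size(alpha, beta, sigma, mu0, mu1, two_tailed=False):
    """Sample size n = (z_alpha + z_beta)^2 * sigma^2 / (mu0 - mu1)^2.

    For a two-tailed test z_alpha is replaced by z_(alpha/2).
    """
    tail_area = alpha / 2 if two_tailed else alpha
    z_alpha = NormalDist().inv_cdf(1 - tail_area)
    z_beta = NormalDist().inv_cdf(1 - beta)
    n = (z_alpha + z_beta) ** 2 * sigma ** 2 / (mu0 - mu1) ** 2
    return ceil(n)  # round up to the next whole observation


# Hypothetical inputs for illustration only.
n_one_tailed = required_sample_size(0.05, 0.10, sigma=12, mu0=120, mu1=115)
n_two_tailed = required_sample_size(0.05, 0.10, sigma=12, mu0=120, mu1=115,
                                    two_tailed=True)
print(n_one_tailed, n_two_tailed)
```

With the same inputs, the two-tailed version always needs at least as many observations, because z for α/2 is larger than z for α.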
CASE PROBLEM QUALITY ASSOCIATES

Quality Associates, a consulting firm, advises its clients about sampling and statistical procedures that can be used to control their manufacturing processes. In one particular application, a client gave Quality Associates a sample of 800 observations taken during a time in which that client's process was operating satisfactorily. The sample standard deviation for these data was 0.21; hence, with so much data, the population standard deviation was assumed to be 0.21. Quality Associates then suggested that random samples of size 30 be taken periodically to monitor the process on an ongoing basis. By analyzing the new samples, the client could quickly learn whether the process was operating satisfactorily. When the process was not operating satisfactorily, corrective action could be taken to eliminate the problem. The design specification indicated the mean for the process should be 12. The hypothesis test suggested by Quality Associates follows.

[Photo: Quality control inspector checking that an electrical transformer meets standard requirements. © Edward Todd.]

H0: μ = 12
H1: μ ≠ 12

Corrective action will be taken any time H0 is rejected.
… the first day of operation of the new … control procedure.

Managerial report
1 Conduct a hypothesis test for each sample at the … level of significance and determine what action … should be taken. Provide the test statistic and … for each test.
2 Compute the standard deviation for each of the four samples. Does the assumption of 0.21 for the population standard deviation appear reasonable?
3 Compute limits for the sample mean X̄ around μ = 12 such that, as long as a new sample mean is within those limits, the process will be considered to be operating satisfactorily. If X̄ exceeds the upper limit or if X̄ is below the lower limit, corrective action will be taken. These limits are referred to as upper and lower control limits for quality control purposes.
4 Discuss the implications of changing the level of significance to a larger value. What mistake or error could increase if the level of significance is increased?
[Data table: observations for Sample 1, Sample 2, Sample 3 and Sample 4]
Software Section for Chapter 9
We describe the use of MINITAB to conduct hypothesis tests about a population mean and a population proportion. MINITAB provides both hypothesis testing and interval estimation results simultaneously, so the routines illustrated here were also used in Chapter 8.

Population mean: σ known
We illustrate using the MaxFlight golf ball distance example in Section 9.3. The data are in column C1 of a MINITAB worksheet (file 'GolfTest.MTW' on the accompanying CD). The population standard deviation σ = 12 is assumed known and the level of significance is α = 0.05. The following steps can be used to test the hypothesis H0: μ = 295 versus H1: μ ≠ 295.

Step 1 Stat > Basic Statistics > 1-Sample Z [Main menu bar]
Step 2 Enter C1 in the Samples in columns box [1-Sample Z (Test and Confidence Interval) panel]
Enter 12 in the Standard deviation box
Check the Perform Hypothesis Test box
Enter 295 in the Hypothesized mean box
Click Options
Step 3 Enter 95 in the Confidence level box [1-Sample Z - Options panel]
Select not equal on the Alternative menu
Click OK
Step 4 Click OK [1-Sample Z (Test and Confidence Interval) panel]

In addition to the hypothesis testing results, MINITAB provides a 95 per cent confidence interval for the population mean. The MINITAB output is shown below as Figure 9.12. The procedure can be easily modified for a one-tailed hypothesis test by selecting the less than or greater than option on the Alternative drop-down menu (Step 3).

Figure 9.12 MINITAB output for the MaxFlight hypothesis test
[Output: Results for GolfTest.MTW; One-Sample Z: Metres]

Population mean: σ unknown
The ratings that 60 business travellers gave for Munich Airport are entered in column C1 of a MINITAB worksheet (file 'AirRating.MTW' on the accompanying CD). The level of significance for the test is α = 0.05, and the population standard deviation σ will be estimated by the sample standard deviation s. The following steps can be used to test the hypothesis H0: μ ≤ 7 against H1: μ > 7.

Step 1 Stat > Basic Statistics > 1-Sample t [Main menu bar]
Step 2 Enter C1 in the Samples in columns box [1-Sample t (Test and Confidence Interval) panel]
Check the Perform Hypothesis Test box
Enter 7 in the Hypothesized mean box
Click Options
Step 3 Enter 95 in the Confidence level box [1-Sample t - Options panel]
Select greater than on the Alternative menu
Click OK
Step 4 Click OK [1-Sample t (Test and Confidence Interval) panel]

The MINITAB results are shown below in Figure 9.13. The Munich Airport rating study involved a 'greater than' alternative hypothesis. The preceding steps can be easily modified for other hypothesis tests by selecting the less than or not equal options on the Alternative drop-down menu (Step 3).
Population proportion
We illustrate using the Aspire gymnasium example in Section 9.5. The data with responses Female and Male are in column C1 of a MINITAB worksheet (file 'WomenGym' on the accompanying CD). MINITAB uses an alphabetical ordering of the responses and selects the second response for the population proportion of interest. In this example, MINITAB by default uses the ordering Female-Male and gives results for the population proportion of Male responses. Because Female is the response of interest, we change MINITAB's ordering as follows. Select any cell in the column and use the sequence:

Step 1 Editor > Column > Value Order [Main menu bar]
Step 2 Choose User-specified order [Value Order for C1 (Gym User) panel]
Enter the responses Male Female in the Define-an-order (one value per line) box
Click OK

Then proceed as follows to test the hypothesis H0: π ≤ 0.2 against H1: π > 0.2. The MINITAB results are shown in Figure 9.14.

Step 3 Stat > Basic Statistics > 1 Proportion [Main menu bar]
Step 4 Enter C1 in the Samples in columns box [1 Proportion (Test and Confidence Interval) panel]
Check the Perform Hypothesis Test box
Enter 0.20 in the Test proportion box
Select Options
Step 5 Check Use test and interval based on normal distribution [1 Proportion - Options panel]
Enter 95 in the Confidence Level box
Select greater than on the Alternative menu
Click OK
Step 6 Click OK [1 Proportion (Test and Confidence Interval) panel]
EXCEL does not provide inbuilt routines for the hypothesis tests presented in this chapter. To handle these situations, we present EXCEL worksheets that we designed to test hypotheses about a population mean and a population proportion. The worksheets are easy to use and can be modified to handle any sample data. The worksheets are available on the CD that accompanies this book.

Population mean: σ known
We illustrate using the MaxFlight golf ball distance example in Section 9.3. The data are in column A of an EXCEL worksheet. The population standard deviation σ = 12 is assumed known and the level of significance is α = 0.05. The following steps can be used to test the hypothesis H0: μ = 295 versus H1: μ ≠ 295. Refer to Figure 9.15 as we describe the procedure. The data are entered into cells A2:A51. The following steps are necessary to use the template for this data set.

Step 1 Enter the data range A2:A51 into the =COUNT cell formula in cell D4
Step 2 Enter the data range A2:A51 into the =AVERAGE cell formula in cell D5
Step 3 Enter the population standard deviation σ = 12 into cell D6
Step 4 Enter the hypothesized value for the population mean 295 into cell D8

The remaining cell formulae automatically provide the standard error, the value of the test statistic z, and three p-values. Because the alternative hypothesis (μ ≠ 295) indicates a two-tailed test, the p-value (Two Tail) in cell D15 is used to make the rejection decision. With p-value = 0.1255 > α = 0.05, the null hypothesis cannot be rejected. The p-values in cells D13 or D14 would be used if the hypotheses involved a one-tailed test.

This template can be used to do hypothesis test computations for other applications. For example, to conduct a hypothesis test for a new data set, enter the new sample data into column A of the worksheet. Modify the formulae in cells D4 and D5 to correspond to the new data range. Enter the population standard deviation into cell D6 and the hypothesized value for the population mean into cell D8 to obtain the results. If the new sample data have already been summarized, the new sample data do not have to be entered into the worksheet. In this case, enter the sample size into cell D4, the sample mean into cell D5, the population standard deviation into cell D6, and the hypothesized value for the population mean into cell D8 to obtain the results. The worksheet in Figure 9.15 is available in the file Hyp Sigma Known on the CD that accompanies this book.
Figure 9.15 EXCEL worksheet for hypothesis tests about a population mean with σ known
[Worksheet: column A holds the Metres data in cells A2:A51. Column D holds Sample Size =COUNT(A2:A51); Sample Mean =AVERAGE(A2:A51); Population Std. Deviation 12; Hypothesized Value 295; Standard Error =D6/SQRT(D4); Test Statistic z =(D5-D8)/D10]
Population mean: σ unknown
We illustrate using the Munich Airport rating example in Section 9.4. The data are entered into cells A2:A61 of an EXCEL worksheet. The population standard deviation σ is unknown and will be estimated by the sample standard deviation s. The level of significance is α = 0.05. The following steps are necessary to use the template for this data set, to test the hypothesis H0: μ ≤ 7 versus H1: μ > 7.

Step 1 Enter the data range A2:A61 into the =COUNT cell formula in cell D4
Step 2 Enter the data range A2:A61 into the =AVERAGE cell formula in cell D5
Step 3 Enter the data range A2:A61 into the =STDEV cell formula in cell D6
Step 4 Enter the hypothesized value for the population mean 7 into cell D8

The remaining cell formulae automatically provide the standard error, the value of the test statistic t, the number of degrees of freedom, and three p-values. Because the alternative hypothesis (μ > 7) indicates an upper tail test, the p-value (Upper Tail) in cell D15 is used to make the decision. With p-value = 0.0353 < α = 0.05, the null hypothesis is rejected. The p-values in cells D14 or D16 would be used if the hypotheses involved a lower tail test or a two-tailed test.

This template can be used to do hypothesis test computations for other applications. For instance, to conduct a hypothesis test for a new data set, enter the new sample data into column A of the worksheet and modify the formulae in cells D4, D5, and D6 to correspond to the new data range. Enter the hypothesized value for the population mean into cell D8 to obtain the results. If the new sample data have already been summarized, the new sample data do not have to be entered into the worksheet. In this case, enter the sample size into cell D4, the sample mean into cell D5, the sample standard deviation into cell D6, and the hypothesized value for the population mean into cell D8 to obtain the results. The worksheet is available in the file Hyp Sigma Unknown on the CD that accompanies this book.
Population proportionWe illustrate using the Aspire gymnasium survey data presented in Section 9.5. Thelevel of significance is a: 0.05. The data of Male or Female user are in column A ofan EXCEL worksheet. The data are entered into cells A2:A401. The followin-e stepscan be used to test the hypothesis H,,'. 7t= 0.20 versus H,: tt) 0.20.
Step 1 Enter the data range A2:A401 into the =COUNTA cell formula in cell D3
Step 2 Enter Female as the response of interest in cell D4
Step 3 Enter the data range A2:A401 into the =COUNTIF cell formula in cell D5
Step 4 Enter the hypothesized value for the population proportion 0.20 into cell D8
The remaining cell formulae automatically provide the standard error, the value of the test statistic z, and three p-values. Because the alternative hypothesis (π > 0.20) indicates an upper tail test, the p-value (Upper Tail) in cell D14 is used to make the decision. With p-value = 0.0062 < α = 0.05, the null hypothesis is rejected. The p-values in cells D13 or D15 would be used if the hypothesis involved a lower tail test or a two-tailed test.
This template can be used to do hypothesis test computations for other applications. For instance, to conduct a hypothesis test for a new data set, enter the new sample data into column A of the worksheet. Modify the formulae in cells D3 and D5 to correspond to the new data range. Enter the response of interest into cell D4 and the hypothesized value for the population proportion into cell D8 to obtain the results. If the new sample data have already been summarized, the new sample data do not have to be entered into the worksheet. In this case, enter the sample size into cell D3, the sample proportion into cell D6, and the hypothesized value for the population proportion into cell D8 to obtain the results. There is a worksheet available in the file 'Hypothesis p' on the CD that accompanies this book.
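The proportion template can be mirrored in a few lines of Python. The sketch below is an illustration, not the worksheet itself: the function name is hypothetical, and the split of 100 Female and 300 Male responses is an assumed data set of the same size as the range A2:A401, not the survey file. The standard error is computed under the null hypothesis as √(π₀(1 − π₀)/n), and the p-values come from the standard normal distribution.

```python
import math
from statistics import NormalDist


def proportion_z_test(responses, response_of_interest, pi0):
    """Mirror the worksheet: count responses, then compute z and three p-values."""
    n = len(responses)                              # =COUNTA
    count = responses.count(response_of_interest)   # =COUNTIF
    p_bar = count / n                               # sample proportion
    std_error = math.sqrt(pi0 * (1 - pi0) / n)      # standard error under H0
    z = (p_bar - pi0) / std_error
    p_lower = NormalDist().cdf(z)
    p_upper = 1 - p_lower
    return {
        "n": n, "count": count, "p_bar": p_bar, "std_error": std_error,
        "z": z, "p_lower": p_lower, "p_upper": p_upper,
        "p_two": 2 * min(p_lower, p_upper),
    }


# Assumed data for illustration only: 400 responses, 100 of them Female.
responses = ["Female"] * 100 + ["Male"] * 300
result = proportion_z_test(responses, "Female", pi0=0.20)
print(result)
```

With summarized data, the same formulas apply directly to the sample size, the sample proportion and the hypothesized value, which is what entering values into cells D3, D6 and D8 achieves.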
We describe the use of PASW to construct a hypothesis test for a population mean in the σ unknown condition. There are no inbuilt routines in PASW for the σ known condition, nor for a population proportion.
Population mean: σ unknown
The One-Sample T Test routine in PASW constructs both a confidence interval and a hypothesis test.
Step 1 Analyze > Compare Means > One-Sample T Test [Main menu bar]
Step 2 Transfer the Rating variable to the Test Variable(s) box [One-Sample T Test panel]
Enter 7 in the Test Value box
Click OK
The routine was illustrated in Chapter 8 using the credit card balance data for a sample of 85 households. The PASW results were displayed in Figure 8.10. Similar results are shown here in Figure 9.16 for the Munich Airport ratings, which are in the first column of the PASW data file ('AirRating.SAV' on the accompanying CD). The PASW routine constructs a two-tailed test. The p-value for a one-tailed test can be computed as half the two-tailed p-value shown in the output: 0.071/2 = 0.035.
Figure 9.16 PASW output for the Munich Airport rating hypothesis test
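The halving rule used above to turn a two-tailed p-value into a one-tailed one holds only when the test statistic falls on the side predicted by the alternative hypothesis; otherwise the one-tailed p-value is one minus half the two-tailed value. A minimal Python sketch of this conversion follows. The function name and its arguments are hypothetical, and the statistic value of 1.0 is an arbitrary positive number used only to indicate direction.

```python
def one_tailed_p(two_tailed_p, statistic, alternative):
    """Convert a two-tailed p-value to a one-tailed p-value.

    alternative is "greater" for an upper tail test, "less" for a lower tail test.
    statistic is the observed t (or z) value; only its sign is used.
    """
    if alternative not in ("greater", "less"):
        raise ValueError("alternative must be 'greater' or 'less'")
    half = two_tailed_p / 2
    if alternative == "greater":
        in_direction = statistic > 0
    else:
        in_direction = statistic < 0
    # Halve when the statistic supports H1; otherwise take the complement.
    return half if in_direction else 1 - half


# Two-tailed p-value as reported in the text; the statistic is assumed positive.
p_upper = one_tailed_p(0.071, statistic=1.0, alternative="greater")
print(p_upper)
```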