
Introduction (source: esminfo.prenhall.com/takealook2006/stats/mendenhall/pdf/ch01.pdf)

Jul 07, 2018


CHAPTER 1

Introduction

OBJECTIVE

To identify the role of statistics in the analysis of data from engineering and the sciences

CONTENTS

1.1 Statistics: The Science of Data

1.2 Fundamental Elements of Statistics

1.3 Types of Data

1.4 The Role of Statistics in Critical Thinking

1.5 A Guide to Statistical Methods Presented in This Text

STATISTICS IN ACTION

Contamination of Fish in the Tennessee River: Collecting the Data

MENDMC01_0131877062.QXD 3/29/06 11:48 PM Page 1

MARKED SET


1.1 Statistics: The Science of Data

According to The Random House College Dictionary, statistics is “the science that deals with the collection, classification, analysis, and interpretation of numerical facts or data.” In short, statistics is the science of data.

Definition 1.1 Statistics is the science of data. This involves collecting, classifying, summarizing, organizing, analyzing, and interpreting data.

The science of statistics is commonly applied to two types of problems:

1. Summarizing, describing, and exploring data

2. Using sample data to infer the nature of the data set from which the sample was selected

As an illustration of the descriptive applications of statistics, consider the United States census, which involves the collection of a data set that purports to characterize the socioeconomic characteristics of the approximately 295 million people living in the United States. Managing this enormous mass of data is a problem for the computer software engineer, and describing the data utilizes the methods of statistics. Similarly, an environmental engineer uses statistics to describe the data set consisting of the daily emissions of sulfur oxides of an industrial plant recorded for 365 days last year. The branch of statistics devoted to these applications is called descriptive statistics.

Definition 1.2 The branch of statistics devoted to the organization, summarization, and description of data sets is called descriptive statistics.
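As a minimal sketch of what describing a data set can look like in practice, the Python snippet below computes a few numerical summaries for a short list of daily sulfur oxide emission readings. The readings are invented for illustration (the text gives no data), and the names `daily_emissions` and `describe` are hypothetical.

```python
import statistics

# Hypothetical daily sulfur oxide emissions (tons); invented values, not from the text.
daily_emissions = [15.8, 22.7, 26.8, 19.1, 18.5, 14.4, 8.3, 25.9, 26.4, 9.8]

def describe(values):
    """Return a few numerical summaries of a data set."""
    return {
        "n": len(values),
        "min": min(values),
        "max": max(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "stdev": statistics.stdev(values),  # sample standard deviation
    }

summary = describe(daily_emissions)
for name, value in summary.items():
    print(f"{name}: {value}")
```

With a full year of readings the same function would apply unchanged; only the length of the list differs.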

Sometimes the phenomenon of interest is characterized by a data set that is either physically unobtainable or too costly or time-consuming to obtain. In such situations, we obtain a subset of the data, called a sample, and use the sample information to infer its nature. To illustrate, suppose the phenomenon of interest is the drinking-water quality on an inhabited, but remote, Pacific island. You might expect water quality to depend on such factors as temperature of the water, the level of the most recent rainfall, etc. In fact, if you were to measure the water quality repeatedly within the same hour at the same location, the quality measurements would vary, even for the same water temperature. Thus, the phenomenon “drinking-water quality” is characterized by a large data set that consists of many (actually, an infinite number of) water quality measurements, a data set that exists only conceptually. To determine the nature of this data set, we sample it; i.e., we record quality for n water specimens collected at specified times and locations, and then use this sample of n quality measurements to infer the nature of the large conceptual data set of interest. The branch of statistics used to solve this problem is called inferential statistics.

Definition 1.3 The branch of statistics concerned with using sample data to make an inference about a large set of data is called inferential statistics.

1.2 Fundamental Elements of Statistics

In statistical terminology, the data set that we want to describe, the one that characterizes a phenomenon of interest to us, is called a population. Then, we can define a sample as a subset of data selected from a population. Sometimes, the words population and sample are used to represent the objects upon which the measurements are taken (i.e., the experimental units). In a particular study, the meaning attached to these terms will be clear by the context in which they are used.

Definition 1.4 A statistical population is a data set (usually large, sometimes conceptual) that is our target of interest.

Definition 1.5 A sample is a subset of data selected from the target population.

Definition 1.6 The object (e.g., person, thing, transaction, specimen, or event) upon which measurements are collected is called the experimental unit. (Note: A population consists of data collected on many experimental units.)

In studying populations and samples, we focus on one or more characteristics or properties of the experimental units in the population. The science of statistics refers to these characteristics as variables. For example, in the drinking-water quality study, two variables of interest to engineers are the chlorine-residual (measured in parts per million) and the number of fecal coliforms in a 100-milliliter water specimen.

Definition 1.7 A variable is a characteristic or property of an individual experimental unit.

Example 1.1 Engineers with the University of Kentucky Transportation Research Program have collected data on accidents occurring at intersections in Lexington, Kentucky. One of the goals of the study was to estimate the rate at which left-turn accidents occur at intersections without left-turn-only lanes. This estimate will be used to develop numerical warrants (or guidelines) for the installation of left-turn lanes at all major Lexington intersections. The engineers collected data at each of 50 intersections without left-turn-only lanes over a 1-year period. At each intersection, they monitored traffic and recorded the total number of cars turning left that were involved in an accident.

a. Identify the variable and experimental unit for this study.

b. Describe the target population and the sample.

c. What inference do the transportation engineers want to make?

Solution

a. Since the engineers collected data at each of 50 intersections, the experimental unit is an intersection without a left-turn-only lane. The variable measured is the total number of cars turning left that were involved in an accident.

b. The goal of the study is to develop guidelines for the installation of left-turn lanes at all major Lexington intersections; consequently, the target population consists of all major intersections in the city. The sample consists of the subset of 50 intersections monitored by the engineers.

c. The engineers will use the sample data to estimate the rate at which left-turn accidents occur at all major Lexington intersections. (We learn, in Chapter 7, that this estimate is the number of left-turn accidents in the sample divided by the total number of cars making left turns in the sample.)
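The estimate described in part c can be sketched in a few lines of Python. The counts below are invented for five intersections purely to show the arithmetic (the study's actual data are not given here), and the function name `left_turn_accident_rate` is hypothetical.

```python
# Hypothetical counts for five sampled intersections (invented for illustration).
# Each pair is (left-turn accidents, cars making left turns) over the study period.
intersections = [(3, 41_000), (1, 28_500), (0, 19_200), (4, 52_300), (2, 33_000)]

def left_turn_accident_rate(records):
    """Pooled rate: total left-turn accidents divided by total left-turning cars."""
    total_accidents = sum(accidents for accidents, _ in records)
    total_left_turns = sum(cars for _, cars in records)
    if total_left_turns == 0:
        raise ValueError("no left-turning cars recorded")
    return total_accidents / total_left_turns

rate = left_turn_accident_rate(intersections)
print(f"Estimated rate: {rate * 1_000_000:.1f} accidents per million left turns")
```

Note that the rate pools the counts across intersections rather than averaging one rate per intersection, which matches the wording of the estimate in part c.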

The preceding definitions and example identify four of the five elements of an inferential statistical problem: a population, one or more variables of interest, a sample, and an inference. The fifth element pertains to knowing how good the inference is, that is, the reliability of the inference. The measure of reliability that accompanies an inference separates the science of statistics from the art of fortune-telling. A palm reader, like a statistician, may examine a sample (your hand) and make inferences about the population (your future life). However, unlike statistical inferences, the palm reader’s inferences include no measure of how likely the inference is to be true.

To illustrate, consider the transportation engineers’ estimate of the left-turn accident rate at Lexington, Kentucky, intersections in Example 1.1. The engineers are interested in the error of estimation (i.e., the difference between the sample accident rate and the accident rate for the target population). Using statistical methods, we can determine a bound on the estimation error. This bound is simply a number (e.g., 10%) that our estimation error is not likely to exceed. In later chapters, we learn that this bound is used to help measure our “confidence” in the inference. The reliability of statistical inferences is discussed throughout this text. For now, simply realize that an inference is incomplete without a measure of reliability.

Definition 1.8 A measure of reliability is a statement (usually quantified) about the degree of uncertainty associated with a statistical inference.
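To make the idea of a bound on the estimation error concrete, here is a rough Python sketch for a sample proportion. It is not the method developed in this chapter: it assumes the common large-sample approximation of two standard errors, 2·sqrt(p̂(1 − p̂)/n), which corresponds to roughly 95% confidence. The counts (12 items of interest in a sample of 200) and the function name `error_bound` are invented for illustration.

```python
from math import sqrt

def error_bound(successes, n):
    """Approximate bound on the error of a sample proportion.

    Assumption: large-sample rule of thumb of two standard errors,
    2 * sqrt(p_hat * (1 - p_hat) / n); not a formula given in this chapter.
    """
    if n <= 0 or not 0 <= successes <= n:
        raise ValueError("need 0 <= successes <= n and n > 0")
    p_hat = successes / n
    return 2 * sqrt(p_hat * (1 - p_hat) / n)

# Hypothetical sample: 12 items of interest observed among 200 sampled.
p_hat = 12 / 200
bound = error_bound(12, 200)
print(f"estimate = {p_hat:.3f}, error unlikely to exceed about {bound:.3f}")
```

Reporting the estimate together with the bound is what turns a bare number into an inference with a measure of reliability.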

A summary of the elements of both descriptive and inferential statistical problems is given in the following boxes.

Four Elements of Descriptive Statistical Problems

1. The population or sample of interest

2. One or more variables (characteristics of the population or sample units) that are to be investigated

3. Tables, graphs, or numerical summary tools

4. Identification of patterns in the data

Five Elements of Inferential Statistical Problems

1. The population of interest

2. One or more variables (characteristics of the experimental units) that are to be investigated

3. The sample of experimental units

4. The inference about the population based on information contained in the sample

5. A measure of reliability for the inference

Applied Exercises

1.1 Steel anticorrosion study. Researchers at the Department of Materials Science and Engineering, National Technical University (Athens, Greece), examined the anticorrosive behavior of different epoxy coatings on steel. (Pigment & Resin Technology, Vol. 32, 2003.) Flat panels cut from steel sheets were coated with one of four different types of epoxy (S1, S2, S3, and S4). After exposing the panels to water for one day, the corrosion rate (nanoamperes per square centimeter) was determined for each panel.

a. What are the experimental units for the study?

b. Suppose you are interested in describing only the corrosion rates of steel panels coated with epoxy type S1. Define the target population and relevant sample.

1.2 Success/failure of software reuse. The PROMISE Software Engineering Repository, hosted by the University of Ottawa, is a collection of publicly available data sets to serve researchers in building prediction software models. A PROMISE data set on software reuse, saved in the SWREUSE file, provides information on the success or failure of reusing previously developed software for each project in a sample of 24 new software development projects. (Data source: IEEE Transactions on Software Engineering, Vol. 28, 2002.) Of the 24 projects, 9 were judged failures and 15 were successfully implemented.

a. Identify the experimental units for this study.

b. Describe the population from which the sample is selected.

c. Use the sample information to make an inference about the population.

1.3 Orchard contamination from insecticides. Pesticides applied to an extensively grown crop can result in inadvertent ambient air contamination. Environmental Science & Technology (Oct. 1993) reported on thion residues of the insecticide chlorpyrifos used on dormant orchards in the San Joaquin Valley, California. Ambient air specimens were collected daily at an orchard site during an intensive period of spraying (a total of 13 days) and the thion level (ng/m3) was measured each day.

a. Identify the population of interest to the researchers.

b. Identify the sample.

1.4 Ground motion of earthquakes. In the Journal of Earthquake Engineering (Nov., 2004), a team of civil and environmental engineers studied the ground motion characteristics of 15 earthquakes that occurred around the world between 1940 and 1995. Three (of many) variables measured on each earthquake were the type of ground motion (short, long, or forward directive), earthquake magnitude (Richter scale), and peak ground acceleration (feet per second per second). One of the goals of the study was to estimate the inelastic spectra of any ground motion cycle.

a. Identify the experimental units for this study.

b. Do the data for the 15 earthquakes represent a population or a sample? Explain.

1.5 Computer power system load currents. Electrical engineers recognize that high neutral current in computer power systems is a potential problem. To determine the extent of the problem, a survey of the computer power system load currents at 146 U.S. sites was taken (IEEE Transactions on Industry Applications, July/Aug. 1990). The survey revealed that less than 10% of the sites had high neutral to full-load current ratios.

a. Identify the population of interest.

b. Identify the sample.

c. Use the sample information to make an inference about the population.

1.6 Precooling vegetables. Researchers have developed a new precooling method for preparing Florida vegetables for market. The system employs an air and water mixture designed to yield effective cooling with a much lower water flow than conventional hydrocooling. To compare the effectiveness of the two systems, 20 batches of green tomatoes were divided into two groups; one group was precooled with the new method, and the other with the conventional method. The water flow (in gallons) required to effectively cool each batch was recorded.

a. Identify the population, the samples, and the type of statistical inference to be made for this problem.

b. How could the sample data be used to compare the cooling effectiveness of the two systems?

1.7 Weekly carbon monoxide data. The World Data Centre for Greenhouse Gases collects and archives data for greenhouse and related gases in the atmosphere. One such data set lists the level of carbon monoxide gas (measured in parts per billion) in the atmosphere each week at the Cold Bay, Alaska, weather station. The weekly data for the years 2000–2002 are saved in the COGAS file.

a. Identify the variable measured and the corresponding experimental unit.

b. If you are interested in describing only the weekly carbon monoxide values at Cold Bay station for the years 2000–2002, do the data represent a population or a sample? Explain.

1.8 Monitoring defective items. Checking all manufactured items coming off an assembly line for defectives would be a costly and time-consuming procedure. One effective and economical method of checking for defectives involves the selection and examination of a portion of the items by a quality control engineer. The percentage of examined items that are defective is computed and then used to estimate the percentage of all items manufactured on the line that are defective. Identify the population, the sample, and a type of statistical inference to be made for this problem.

1.3 Types of Data

Data can be one of two types, quantitative or qualitative. Quantitative data are those that represent the quantity or amount of something, measured on a numerical scale. For example, the power frequency (measured in megahertz) of a semiconductor is a quantitative variable, as is the breaking strength (measured in pounds per square inch) of steel pipe. In contrast, qualitative (or categorical) data possess no quantitative interpretation. They can only be classified. The set of n occupations corresponding to a group of n engineering graduates is a qualitative data set. The type of pigment (zinc or mica) used in an anticorrosion epoxy coating also represents qualitative data.*

*A finer breakdown of data types into nominal, ordinal, interval, and ratio data is possible. Nominal data are qualitative data with categories that cannot be meaningfully ordered. Ordinal data are also qualitative data, but a distinct ranking of the groups from high to low exists. Interval and ratio data are two different types of quantitative data. For most statistical applications (and all the methods presented in this introductory text), it is sufficient to classify data as either quantitative or qualitative.


Definition 1.9 Quantitative data are those that represent the quantity or amount of something.

Definition 1.10 Qualitative data are those that have no quantitative interpretation, i.e., they can only be classified into categories.

Example 1.2 The Journal of Performance of Constructed Facilities (Feb., 1990) reported on the performance dimensions of water distribution networks in the Philadelphia area. For one part of the study, the following variables were measured for each sampled water pipe section. Identify the data produced by each as quantitative or qualitative.

a. Pipe diameter (measured in inches)

b. Pipe material (steel or PVC)

c. Pipe location (Center City or suburbs)

d. Pipe length (measured in feet)

Solution

Both pipe diameter (in inches) and pipe length (in feet) are measured on a meaningful numerical scale; hence, these two variables produce quantitative data. Both type of pipe material and pipe location can only be classified: material is either steel or PVC; location is either Center City or the suburbs. Consequently, pipe material and pipe location are both qualitative variables.

The proper statistical tool used to describe and analyze data will depend on the type of data. Consequently, it is important to differentiate between quantitative and qualitative data.
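The dependence of the summary tool on the data type can be sketched in Python using the four pipe variables of Example 1.2. The records below are invented (the study's data are not reproduced here), and the field names and the helper `summarize` are hypothetical. The sketch assumes, as a simplification, that numeric values are quantitative and everything else is qualitative.

```python
import statistics
from collections import Counter

# Hypothetical records for four sampled pipe sections (values invented).
pipes = [
    {"diameter_in": 6.0, "material": "steel", "location": "Center City", "length_ft": 120.0},
    {"diameter_in": 8.0, "material": "PVC", "location": "suburbs", "length_ft": 250.0},
    {"diameter_in": 6.0, "material": "PVC", "location": "suburbs", "length_ft": 180.0},
    {"diameter_in": 12.0, "material": "steel", "location": "Center City", "length_ft": 90.0},
]

def summarize(records, variable):
    """Pick a summary by data type: a mean for numbers, category counts otherwise."""
    values = [record[variable] for record in records]
    is_numeric = all(
        isinstance(v, (int, float)) and not isinstance(v, bool) for v in values
    )
    if is_numeric:
        return {"type": "quantitative", "mean": statistics.mean(values)}
    return {"type": "qualitative", "counts": dict(Counter(values))}

for name in ("diameter_in", "material", "location", "length_ft"):
    print(name, summarize(pipes, name))
```

The type check is only a heuristic: numeric codes such as a vendor ID or a route number are stored as numbers but are still qualitative, so in practice the analyst, not the storage type, decides how a variable is classified.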

Applied Exercises

1.9 Drinking-water quality study. Disasters (Vol. 28, 2004) published a study of the effects of a tropical cyclone on the quality of drinking water on a remote Pacific island. Water samples (size 500 milliliters) were collected approximately 4 weeks after Cyclone Ami hit the island. The following variables were recorded for each water sample. Identify each variable as quantitative or qualitative.

a. Town where sample was collected

b. Type of water supply (river intake, stream, or borehole)

c. Acidic level (pH scale, 1 to 14)

d. Turbidity level (nephelometric turbidity units = NTUs)

e. Temperature (degrees Centigrade)

f. Number of fecal coliforms per 100 milliliters

g. Free chlorine-residual (milligrams per liter)

h. Presence of hydrogen sulphide (yes or no)

1.10 Extinct New Zealand birds. Environmental engineers at the University of California (Riverside) are studying the patterns of extinction in the New Zealand bird population. (Evolutionary Ecology Research, July 2003.) The following characteristics were determined for each bird species that inhabited New Zealand at the time of the Maori colonization (i.e., prior to European contact).

a. Flight capability (volant or flightless)

b. Habitat type (aquatic, ground terrestrial, or aerial terrestrial)

c. Nesting site (ground, cavity within ground, tree, cavity above ground)

d. Nest density (high or low)

e. Diet (fish, vertebrates, vegetables, or invertebrates)

f. Body mass (grams)

g. Egg length (millimeters)

h. Extinct status (extinct, absent from island, present)

1.11 Computer power system load currents. Refer to the IEEE Transactions on Industry Applications (July/Aug. 1990) survey of computer power system load currents in Exercise 1.5. In addition to the ratio of neutral current to full-load current, the researchers also recorded the type of load (line-to-line or line-to-neutral) and the computer system vendor. Identify the type of data for each variable recorded.

1.12 CT scanning for lung cancer. A new type of screening for lung cancer, computed tomography (CT), has been developed. Medical physicists believe CT scans are more sensitive than regular X-rays in pinpointing small tumors. The H. Lee Moffitt Cancer Center at the University of South Florida is currently conducting a clinical trial of 50,000 smokers nationwide to compare the effectiveness of CT scans with X-rays for detecting lung cancer. (Todays’ Tomorrows, Fall 2002.) Each participating smoker is randomly assigned to one of two screening methods, CT or chest X-ray, and their progress tracked over time. In addition to the type of screening method used, the physicists recorded the age at which the scanning method first detects a tumor for each smoker.

a. Identify the experimental units of the study.

b. Identify the two variables measured for each experimental unit.

c. Identify the type (quantitative or qualitative) of the variables measured.

d. What is the inference that will ultimately be drawn from the clinical trial?

1.13 Deep hole drilling. “Deep hole” drilling is a family of drilling processes used when the ratio of hole depth to hole diameter exceeds 10. Successful deep hole drilling depends on the satisfactory discharge of the drill chip. An experiment was conducted to investigate the performance of deep hole drilling when chip congestion exists (Journal of Engineering for Industry, May 1993). Some important variables in the drilling process are described here. Identify the data type for each variable.

a. Chip discharge rate (number of chips discarded per minute)

b. Drilling depth (millimeters)

c. Oil velocity (millimeters per second)

d. Type of drilling (single-edge, BTA, or ejector)

e. Quality of hole surface

1.14 National Bridge Inventory. All highway bridges in the United States are inspected periodically for structural deficiency by the Federal Highway Administration (FHWA). Data from the FHWA inspections are compiled into the National Bridge Inventory (NBI). Several of the nearly 100 variables maintained by the NBI are listed below. Classify each variable as quantitative or qualitative.

a. Length of maximum span (feet)

b. Number of vehicle lanes

c. Toll bridge (yes or no)

d. Average daily traffic

e. Condition of deck (good, fair, or poor)

f. Bypass or detour length (miles)

g. Route type (interstate, U.S., state, county, or city)

1.4 The Role of Statistics in Critical Thinking

Experimental research in engineering and the sciences typically involves the use of experimental data (a sample) to infer the nature of some conceptual population that characterizes a phenomenon of interest to the experimenter. This inferential process is an integral part of the scientific method. Inference based on experimental data is first used to develop a theory about some phenomenon. Then the theory is tested against additional sample data.

How does the science of statistics contribute to this process? To answer this question, we must note that inferences based on sample data will almost always be subject to error, because a sample will not provide an exact image of the population. The nature of the information provided by a sample depends on the particular sample chosen and thus will change from sample to sample. For example, suppose you want to estimate the proportion of all steel alloy failures at U.S. petrochemical plants caused by stress corrosion cracking. You investigate the cause of failure for a sample of 100 steel alloy failures and find that 47 were caused by stress corrosion cracking. Does this mean that exactly 47% of all steel alloy failures at petrochemical plants are caused by stress corrosion cracking? Of course, the answer is “no.” Suppose that, unknown to you, the true percentage of steel alloy failures caused by stress corrosion cracking is 44%. One sample of 100 failures might yield 47 that were caused by cracking, whereas another sample of 100 might yield only 42. Thus, an inference based on sampling is always subject to uncertainty.
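The sample-to-sample variation described above is easy to see in a small simulation. The Python sketch below assumes the population described in the illustration (a true proportion of 44%) and treats each sampled failure as an independent draw; the seed, the number of repeated samples, and the function name `sample_count` are arbitrary choices made for this sketch.

```python
import random

TRUE_PROPORTION = 0.44  # the "unknown" population value assumed in the illustration
SAMPLE_SIZE = 100

def sample_count(rng, n=SAMPLE_SIZE, p=TRUE_PROPORTION):
    """Number of failures in a sample of n attributed to stress corrosion cracking."""
    return sum(rng.random() < p for _ in range(n))

rng = random.Random(1)  # fixed seed so the run is repeatable
counts = [sample_count(rng) for _ in range(10)]
print("Cracking failures per sample of 100:", counts)
print("Smallest and largest count:", min(counts), max(counts))
```

Each entry in `counts` is a different estimate of the same population percentage, which is why a single sample value such as 47 cannot be taken as exact.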

On the other hand, suppose one petrochemical plant experienced a steel alloy failure rate of 81%. Is this an unusually high failure rate, given the sample rate of 47%? The theory of statistics uses probability to measure the uncertainty associated with an inference. It enables engineers and scientists to calculate the probabilities of observing specific samples or data measurements, under specific assumptions about the population. These probabilities are used to evaluate the uncertainties associated with sample inferences; for example, we can determine whether the plant’s steel alloy failure rate of 81% is unusually high by calculating the chance of observing such a high rate given the sample information.

FIGURE 1.1 Flowchart of statistical methods described in the text

Thus, a major contribution of statistics is that it enables engineers and scientists to make inferences (estimates and decisions about the target population) with a known measure of reliability. With this ability, an engineer can make intelligent decisions and inferences from data; that is, statistics helps engineers to think critically about their results.
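As a hedged illustration of calculating the chance of observing a sample result under an assumption about the population, the Python sketch below uses a binomial model. The model itself is an assumption not stated in this section: it treats 100 sampled failures as independent, takes the population proportion to be the 44% used in the earlier illustration, and reads a rate of 81% as 81 out of 100. The function name `binomial_tail` is hypothetical.

```python
from math import comb

def binomial_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p), summed exactly term by term."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))

# Assumed population proportion of 0.44 and samples of size 100.
p_47_or_more = binomial_tail(47, 100, 0.44)
p_81_or_more = binomial_tail(81, 100, 0.44)
print(f"P(at least 47 of 100): {p_47_or_more:.3f}")
print(f"P(at least 81 of 100): {p_81_or_more:.2e}")
```

Under these assumptions a count of 47 is unremarkable, while a count as high as 81 would be extremely improbable, which is the kind of comparison that lets an engineer call a result unusually high.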


Definition 1.11 Statistical thinking involves applying rational thought and the science of statistics to critically assess data and inferences.

1.5 A Guide to Statistical Methods Presented in This Text

Although we present some useful methods for exploring and describing data sets (Chapter 2), the major emphasis in this text and in modern statistics is in the area of inferential statistics. The flowchart in Figure 1.1 is provided as an outline of the chapters in this text and as a guide to selecting the statistical method appropriate for your particular analysis.

Quick Review

Key Terms

Data

Descriptive statistics

Experimental unit

Inference

Inferential statistics

Measure of reliability

Measurement error

Population

Qualitative data

Quantitative data

Reliability

Sample

Statistical thinking

Statistics

Variable

Chapter Summary Notes

• Two types of statistical applications: descriptive and inferential

• Fundamental elements of statistics: population, experimental units, variable, sample, inference, measure of reliability

• Descriptive statistics involves summarizing and describing data sets.

• Inferential statistics involves using a sample to make inferences about a population.

• Two types of data: quantitative and qualitative

Supplementary Exercises

1.15 Reliability of a computer system. The reliability of a computer system is measured in terms of the lifelength of a specified hardware component (e.g., the hard disk drive). To estimate the reliability of a particular system, 100 computer components are tested until they fail, and their lifelengths are recorded.

a. What is the population of interest?

b. What is the sample?

c. Are the data quantitative or qualitative?

d. How could the sample information be used to estimate the reliability of the computer system?

1.16 Traveling turtle hatchlings. Hundreds of sea turtle hatchlings, instinctively following the bright lights of condominiums, wandered to their deaths across a coastal highway in Florida (Tampa Tribune, Sept. 16, 1990). This incident led researchers to begin experimenting with special low-pressure sodium lights. One night, 60 turtle hatchlings were released on a dark beach and their direction of travel noted. The next night, the special lights were installed and the same 60 hatchlings were released. Finally, on the third night, tar paper was placed over the sodium lights. Consequently, the direction of travel was recorded for each hatchling under three experimental conditions: darkness, sodium lights, and sodium lights covered with tar paper.

a. Identify the population of interest to the researchers.

b. Identify the sample.

c. What type of data were collected, quantitative or qualitative?

1.17 Acid neutralizer experiment. A chemical engineer conducts an experiment to determine the amount of hydrochloric acid necessary to neutralize 2 milliliters (ml) of a newly developed cleaning solution. The chemist prepares five 2-ml portions of the solution and adds a known concentration of hydrochloric acid to each. The amount of acid necessary to achieve neutrality of the solution is determined for each of the five portions.

a. Identify the experimental units for the study.

b. Identify the variable measured.

c. Describe the population of interest to the chemical engineer.

d. Describe the sample.


1.18 PCB in soil. A preliminary study was conducted to obtaininformation on the background levels of the toxic sub-stance polychlorinated biphenyl (PCB) in soil samples inthe United Kingdom (Chemosphere, Feb. 1986). For eachsoil sample taken, the researchers recorded the location(rural or urban) and the PCB level (measured in grams perkilogram of soil). Identify the variables measured as quan-titative or qualitative.

1.19 Intellectual development of engineering students. Perry’smodel of intellectual development was applied to under-graduate engineering students at Penn State (Journal ofEngineering Education, Jan. 2005). Perry scores (rangingfrom 1 to 5) were determined for 21 students in a first-year, project-based design course. (Note: A Perry score of1 indicates the lowest level of intellectual development,and a Perry score of 5 indicates the highest level.) The av-erage Perry score for the 21 students was 3.27.

a. Identify the experimental units for this study.
b. What is the population of interest? The sample?
c. What type of data, quantitative or qualitative, are collected?
d. Use the sample information to make an inference about the population.

1.20 Type of data. State whether each of the following data sets is quantitative or qualitative.

a. Arrival times of 16 reflected seismic waves
b. Types of computer software used in a database management system
c. Brands of calculator used by 100 engineering students on campus
d. Ash contents in pieces of coal from three different mines
e. Mileages attained by 12 automobiles powered by alcohol
f. Numbers of print characters per line of computer output for 20 line printers
g. Shift supervisors in charge of computer operations at an airline company
h. Accident rates at 46 machine shops

1.21 Structurally deficient bridges. Refer to Exercise 1.14. The most recent NBI data were analyzed, and the results published in the Journal of Infrastructure Systems (June 1995). Using the FHWA inspection ratings, each of the 470,515 highway bridges in the United States was categorized as structurally deficient, functionally obsolete, or safe. About 26% of the bridges were found to be structurally deficient, and 19% were functionally obsolete.

a. What is the variable of interest to the researchers?
b. Is the variable of part a quantitative or qualitative?
c. Is the data set analyzed a population or a sample? Explain.
d. How did the researchers obtain the data for their study?

FISHDDT

STATISTICS IN ACTION
Contamination of Fish in the Tennessee River: Collecting the Data

Chemical and manufacturing plants often discharge toxic waste materials into nearby rivers and streams. These toxicants have a detrimental effect on the plant and animal life inhabiting the river and the river’s bank. One type of pollutant, commonly known as DDT, is especially harmful to fish and, indirectly, to people. The Food and Drug Administration sets the limit for DDT content in individual fish at 5 parts per million (ppm). Fish with DDT content exceeding this limit are considered potentially hazardous to people if consumed. A study was undertaken to examine the DDT content of fish inhabiting the Tennessee River (in Alabama) and its tributaries.
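The 5-ppm rule can be written as a simple check. The Python sketch below is only an illustration: the function names are hypothetical, and the readings in the usage note are made-up placeholder values, not measurements from the study. It assumes that "exceeding" means strictly greater than the limit, so a fish at exactly 5 ppm is not flagged.

```python
from typing import Iterable

# Limit stated in the text: 5 parts per million (ppm) for an individual fish.
FDA_DDT_LIMIT_PPM = 5.0


def exceeds_fda_limit(ddt_ppm: float, limit_ppm: float = FDA_DDT_LIMIT_PPM) -> bool:
    """Return True when a DDT reading is strictly above the limit."""
    if ddt_ppm < 0:
        raise ValueError("a DDT concentration cannot be negative")
    return ddt_ppm > limit_ppm


def fraction_over_limit(readings_ppm: Iterable[float]) -> float:
    """Return the proportion of readings that exceed the limit."""
    readings = list(readings_ppm)
    if not readings:
        raise ValueError("at least one reading is required")
    flagged = sum(exceeds_fda_limit(r) for r in readings)
    return flagged / len(readings)
```

For example, with the placeholder readings `[0.8, 5.0, 12.4, 3.1]`, only the 12.4-ppm reading is flagged, so `fraction_over_limit` returns 0.25.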

The Tennessee River flows in a west–east direction across the northern part of the state of Alabama, through Wheeler Reservoir, a national wildlife refuge. Ecologists fear that contaminated fish migrating from the mouth of the river to the reservoir could endanger other wildlife that prey on the fish. This concern is more than academic. A manufacturing plant was once located along Indian Creek, which enters the Tennessee River 321 miles upstream from the mouth. Although the plant has been inactive for over 10 years, there is evidence that the plant discharged toxic materials into the creek, contaminating all the fish in the immediate area. Have the fish in the Tennessee River and its tributary creeks also been contaminated? And if so, how far upstream have the contaminated fish migrated? To answer these and other questions, members of the U.S. Army Corps of Engineers collected fish specimens at different locations along the Tennessee River and three tributary creeks: Flint Creek (which enters the river 309 miles upstream from the river’s mouth), Limestone Creek (310 miles upstream), and Spring Creek (282 miles upstream). Each fish was first weighed (in grams) and measured (length in centimeters), then the fillet of the fish was extracted and the DDT concentration (in parts per million) in the fillet was measured.

The FISHDDT file contains the length, weight, and DDT measurements for a total of 144 fish specimens. Obviously, not all the fish in the Tennessee River and its tributaries were captured. Consequently, the data are based on a sample collected from the population of all fish inhabiting the Tennessee River. Here, the words population and sample are used to describe the objects upon which the measurements are taken, i.e., the fish. We could also use the terms to represent data sets. For example, the 144 DDT measurements represent a sample collected from the population consisting of DDT measurements for all fish inhabiting the river.

In addition to the quantitative variables length, weight, and DDT concentration, notice that the data set also contains information on the qualitative variables location (i.e., where the fish were captured) and species of the fish. Three species of fish were examined: channel catfish, largemouth bass, and smallmouth buffalo. The different symbols for location are interpreted as follows. The first two characters represent the river or creek, and the remaining characters represent the distance (in miles) from the mouth of the river or creek. For example, FCM5 indicates that the fish was captured in Flint Creek (FC), 5 miles upstream from the mouth of the creek (M5). Similarly, TRM380 denotes a fish sample collected from the Tennessee River (TR), 380 miles upstream from the river’s mouth (M380).
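The location coding scheme just described can be decoded mechanically. The Python sketch below is one possible illustration, not part of the FISHDDT file or the Corps of Engineers' analysis. The names `parse_location`, `Location`, and `WATERWAYS` are hypothetical. Only the prefixes FC and TR are given in the text; LC and SC are assumed abbreviations for Limestone Creek and Spring Creek.

```python
import re
from typing import NamedTuple

# Two-letter waterway prefixes. FC and TR come from the text;
# LC and SC are ASSUMED abbreviations, included only for illustration.
WATERWAYS = {
    "FC": "Flint Creek",
    "TR": "Tennessee River",
    "LC": "Limestone Creek",  # assumed
    "SC": "Spring Creek",     # assumed
}


class Location(NamedTuple):
    waterway: str          # river or creek where the fish was captured
    miles_from_mouth: int  # distance upstream from the mouth, in miles


# Two letters for the waterway, then "M" followed by the mileage.
_CODE_PATTERN = re.compile(r"([A-Z]{2})M(\d+)")


def parse_location(code: str) -> Location:
    """Split a code such as 'FCM5' into its waterway and mileage parts."""
    match = _CODE_PATTERN.fullmatch(code.strip().upper())
    if match is None:
        raise ValueError(f"unrecognized location code: {code!r}")
    prefix, miles = match.groups()
    # An unknown prefix is returned unchanged rather than guessed at.
    return Location(WATERWAYS.get(prefix, prefix), int(miles))
```

Applied to the two codes in the text, `parse_location("FCM5")` gives Flint Creek at 5 miles and `parse_location("TRM380")` gives the Tennessee River at 380 miles. Keeping the waterway (qualitative) and the mileage (quantitative) in separate fields matches the distinction between the two types of data.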

The U.S. Army Corps of Engineers used the data in the FISHDDT file to compare the DDT contents of fish at different locations and among the different species, and to determine the relationship (if any) of length and weight to DDT content. In subsequent chapters, we demonstrate several of these analyses in examples, exercises, and a Statistics in Action section.
