3–4 4. 1.5. Measures of Positionrichmath.org/texts/stats/Elementary Statistics - A Step...Using the formula, find the coefficient of skewness for each distribution, and describe

Section 3–4 Measures of Position 115

c. Subtract 5 from each value, and find the standard deviation.
d. Multiply each value by 5, and find the standard deviation.
e. Divide each value by 5, and find the standard deviation.
f. Generalize the results of parts b through e.
g. Compare these results with those in Exercise 3–38.

*3–87. The mean deviation is found by using the following formula:

$$\text{mean deviation} = \frac{\sum |X - \bar{X}|}{n}$$

where

X = value
X̄ = mean
n = number of values
| | = absolute value

Find the mean deviation for the following data.

5, 9, 10, 11, 11, 12, 15, 18, 20, 22

*3–88. A measure to determine the skewness of a distribution is called the Pearson coefficient of skewness. The formula is

$$\text{skewness} = \frac{3(\bar{X} - \text{MD})}{s}$$

where MD is the median and s is the standard deviation.

The values of the coefficient usually range from −3 to +3. When the distribution is symmetrical, the coefficient is zero; when the distribution is positively skewed, it is positive; and when the distribution is negatively skewed, it is negative.

Using the formula, find the coefficient of skewness for each distribution, and describe the shape of the distribution.
a. Mean = 10, median = 8, standard deviation = 3.
b. Mean = 42, median = 45, standard deviation = 4.
c. Mean = 18.6, median = 18.6, standard deviation = 1.5.
d. Mean = 98, median = 97.6, standard deviation = 4.
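Both exercise formulas are easy to check numerically. The Python sketch below is only an illustration: `mean_deviation` and `pearson_skewness` are hypothetical helper names (not from the text), and `statistics.mean` comes from the Python standard library. The usage lines use small illustrative inputs rather than the exercise data.

```python
from statistics import mean


def mean_deviation(values):
    """Mean deviation: the average absolute distance of each value from the mean."""
    x_bar = mean(values)
    return sum(abs(x - x_bar) for x in values) / len(values)


def pearson_skewness(x_bar, median, s):
    """Pearson coefficient of skewness: 3(mean - median) / standard deviation."""
    return 3 * (x_bar - median) / s


# Illustrative inputs only: mean 4, absolute deviations 2, 0, 2.
print(round(mean_deviation([2, 4, 6]), 3))  # 1.333
# Mean equal to median gives a coefficient of zero (symmetric shape).
print(pearson_skewness(50, 50, 5))          # 0.0
```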

3–4 Measures of Position

Objective 3. Identify the position of a data value in a data set using various measures of position such as percentiles, deciles, and quartiles.

In addition to measures of central tendency and measures of variation, there are also measures of position or location. These measures include standard scores, percentiles, deciles, and quartiles. They are used to locate the relative position of a data value in the data set. For example, if a value is located at the 80th percentile, it means that 80% of the values fall below it in the distribution and 20% of the values fall above it. The median is the value that corresponds to the 50th percentile, since half of the values fall below it and half of the values fall above it. This section discusses these measures of position.

Standard Scores

There is an old saying that states, “You can’t compare apples and oranges.” But with the use of statistics, it can be done to some extent. Suppose that a student scored 90 on a music test and 45 on an English exam. Direct comparison of raw scores is impossible, since the exams might not be equivalent in terms of number of questions, value of each question, and so on. However, a comparison of a relative standard similar to both can be made. This comparison uses the mean and standard deviation and is called a standard score or z score. (We also use z scores in later chapters.)

A standard score or z score for a value is obtained by subtracting the mean from the value and dividing the result by the standard deviation. The symbol for a standard score is z. The formula is

$$z = \frac{\text{value} - \text{mean}}{\text{standard deviation}}$$

For samples, the formula is

$$z = \frac{X - \bar{X}}{s}$$

For populations, the formula is

$$z = \frac{X - \mu}{\sigma}$$

The z score represents the number of standard deviations a data value falls above or below the mean.
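As a quick numerical check of the sample formula, the Python sketch below computes the z scores used in Examples 3–29 and 3–30. The helper name `z_score` is hypothetical; it simply evaluates (X − X̄)/s.

```python
def z_score(x, mean, sd):
    """Number of standard deviations that x lies above (+) or below (-) the mean."""
    return (x - mean) / sd


# Example 3-29: calculus vs. history
calculus = z_score(65, 50, 10)   # 1.5
history = z_score(30, 25, 5)     # 1.0
print("calculus" if calculus > history else "history")  # calculus

# Example 3-30: both scores are below their means, but test A less so
test_a = z_score(38, 40, 5)      # -0.4
test_b = z_score(94, 100, 10)    # -0.6
print("A" if test_a > test_b else "B")  # A
```

A larger z score means a better relative position, even when both values are negative.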

116 Chapter 3 Data Description

Example 3–29
A student scored 65 on a calculus test that had a mean of 50 and a standard deviation of 10; she scored 30 on a history test with a mean of 25 and a standard deviation of 5. Compare her relative positions on the two tests.

Solution

First, find the z scores. For calculus the z score is

$$z = \frac{X - \bar{X}}{s} = \frac{65 - 50}{10} = 1.5$$

For history the z score is

$$z = \frac{X - \bar{X}}{s} = \frac{30 - 25}{5} = 1.0$$

Since the z score for calculus is larger, her relative position in the calculus class is higher than her relative position in the history class.

Note that if the z score is positive, the score is above the mean. If the z score is 0, the score is the same as the mean. And if the z score is negative, the score is below the mean.

Example 3–30
Find the z score for each test and state which is higher.

Test A: X = 38, X̄ = 40, s = 5
Test B: X = 94, X̄ = 100, s = 10

Solution

For test A,

$$z = \frac{X - \bar{X}}{s} = \frac{38 - 40}{5} = -0.4$$

For test B,

$$z = \frac{X - \bar{X}}{s} = \frac{94 - 100}{10} = -0.6$$

The score for test A is relatively higher than the score for test B, since −0.4 is greater than −0.6.

When all data for a variable are transformed into z scores, the resulting distribution will have a mean of 0 and a standard deviation of 1. A z score, then, is actually the number of standard deviations each data value is from the mean for a specific distribution. In Example 3–29, the calculus score of 65 was actually 1.5 standard deviations above the mean of 50. This will be explained in more detail in Chapter 7.

Interesting Facts
The average number of faces a person learns to recognize and remember during his or her lifetime is 10,000. (The Harper’s Index Book, p. 86)

Percentiles

Percentiles are position measures used in educational and health-related fields to indicate the position of an individual in a group.

A percentile P is an integer (1 ≤ P ≤ 99) such that the Pth percentile is a value where P% of the data values are less than or equal to the value and (100 − P)% of the data values are greater than or equal to the value.


In many situations, the graphs and tables showing the percentiles for various measures such as test scores, heights, or weights have already been completed. Table 3–3 shows the percentile ranks for scaled scores on the Test of English as a Foreign Language. If a student had a scaled score of 58 for Section 1 (listening and comprehension), that student would have a percentile rank of 81. Hence, that student did better than 81% of the students who took Section 1 of the exam.

Table 3–3 Percentile Ranks and Scaled Scores on the Test of English as a Foreign Language*

Columns: scaled score; percentile rank for Section 1 (listening comprehension), Section 2 (structure and written expression), and Section 3 (vocabulary and reading comprehension); total scaled score; percentile rank for the total scaled score.

68 99 98

66 98 96 98 660 99

64 96 94 96 640 97

62 92 90 93 620 94

60 87 84 88 600 89

→58 81 76 81 580 82

56 73 68 72 560 73

54 64 58 61 540 62

52 54 48 50 520 50

50 42 38 40 500 39

48 32 29 30 480 29

46 22 21 23 460 20

44 14 15 16 440 13

42 9 10 11 420 9

40 5 7 8 400 5

38 3 4 5 380 3

36 2 3 3 360 1

34 1 2 2 340 1

32 1 1 320

30 1 1 300

Mean 51.5 52.2 51.4 Mean 517

S.D. 7.1 7.9 7.5 S.D. 68

*Based on the total group of 1,178,193 examinees tested from July 1989 through June 1991.

Source: Reprinted by permission of Educational Testing Service, the copyright owner.

Figure 3–5 shows percentiles in graphic form of weights of girls from ages 2 to 18. To find the percentile rank of an 11-year-old who weighs 82 pounds, start at the 82-pound weight on the left axis and move horizontally to the right. Find the 11 on the horizontal axis and move up vertically. The two lines meet at the 50th percentile curved line; hence, an 11-year-old girl who weighs 82 pounds is in the 50th percentile for her age group. If the lines do not meet exactly on one of the curved percentile lines, then the percentile rank must be approximated.


Percentiles are also used to compare an individual’s test score with the national norm. For example, tests such as the National Educational Development Test (NEDT) are taken by students in ninth or tenth grade. A student’s scores are compared with those of other students locally and nationally by using percentile ranks. A similar test for elementary school students is called the California Achievement Test.

Percentiles are not the same as percentages. That is, if a student gets 72 correct answers out of a possible 100, she obtains a percentage score of 72. There is no indication of her position with respect to the rest of the class. She could have scored the highest, the lowest, or somewhere in between. On the other hand, if a raw score of 72 corresponds to the 64th percentile, then she did better than 64% of the students in her class.

Percentiles are symbolized by

P1, P2, P3, . . . , P99

and divide the distribution into 100 groups.

[Diagram: from the smallest data value to the largest data value, the percentiles P1, P2, P3, . . . , P97, P98, P99 divide the data into 100 groups of 1% each.]

Figure 3–5 Weights of Girls by Age and Percentile Rankings
[Graph: weight in pounds (20 to 190) and kilograms (10 to 90) against age in years (2 to 18), with curves for the 5th, 10th, 25th, 50th, 75th, 90th, and 95th percentiles; the 82-pound mark is labeled on the pounds axis.]
Source: Distributed by Mead Johnson Nutritional Division. Reprinted with permission.

Percentile graphs can be constructed as shown in the next example. Percentile graphs use the same values as the cumulative relative frequency graphs described in Section 2–3, except that the proportions have been converted to percents.

Example 3–31
The frequency distribution for the systolic blood pressure readings (in millimeters of mercury, mm Hg) of 200 randomly selected college students follows. Construct a percentile graph.

A: Class boundaries; B: Frequency; C: Cumulative frequency; D: Cumulative percent

A              B     C     D
89.5–104.5     24
104.5–119.5    62
119.5–134.5    72
134.5–149.5    26
149.5–164.5    12
164.5–179.5     4
              200

Solution

STEP 1 Find the cumulative frequencies and place them in column C.

STEP 2 Find the cumulative percentages and place them in column D. To do this step, use the formula

$$\text{cumulative \%} = \frac{\text{cumulative frequency}}{n} \cdot 100\%$$

For the first class,

$$\text{cumulative \%} = \frac{24}{200} \cdot 100\% = 12\%$$

The completed table is shown next.

A              B     C     D
89.5–104.5     24    24    12
104.5–119.5    62    86    43
119.5–134.5    72   158    79
134.5–149.5    26   184    92
149.5–164.5    12   196    98
164.5–179.5     4   200   100
              200

STEP 3 Graph the data, using class boundaries for the x axis and the percentages for the y axis, as shown in Figure 3–6.
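Steps 1 and 2 amount to a running total followed by a division by n. A minimal Python sketch using the frequencies of Example 3–31 (the function name `cumulative_percents` is a hypothetical helper):

```python
from itertools import accumulate


def cumulative_percents(frequencies):
    """Return (cumulative frequencies, cumulative percents) for a frequency column."""
    cumulative = list(accumulate(frequencies))      # column C: running totals
    n = cumulative[-1]                              # total number of values
    percents = [100 * c / n for c in cumulative]    # column D
    return cumulative, percents


freqs = [24, 62, 72, 26, 12, 4]  # column B of Example 3-31
cum, pct = cumulative_percents(freqs)
print(cum)  # [24, 86, 158, 184, 196, 200]
print(pct)  # [12.0, 43.0, 79.0, 92.0, 98.0, 100.0]
```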

Once a percentile graph has been constructed, one can find the approximate corresponding percentile ranks for given blood pressure values and find approximate blood pressure values for given percentile ranks.

For example, to find the percentile rank of a blood pressure reading of 130, find 130 on the x axis of Figure 3–6, and draw a vertical line to the graph. Then move horizontally to the value on the y axis. Note that a blood pressure of 130 corresponds to approximately the 70th percentile.

If the value that corresponds to the 40th percentile is desired, start on the y axis at 40 and draw a horizontal line to the graph. Then draw a vertical line to the x axis, and read the value. In Figure 3–6, the 40th percentile corresponds to a value of approximately 118. Thus, if a person has a blood pressure of 118, he or she is at approximately the 40th percentile.
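Reading the graph by eye amounts to moving along the straight-line segments that join the plotted points. The Python sketch below does the same thing arithmetically with linear interpolation, assuming the graph starts at 0% at the lowest class boundary (89.5) and joins the cumulative percents of Example 3–31 with straight segments. The helper names are hypothetical. It gives about 68% for a reading of 130 and about 118 for the 40th percentile, close to the approximate values read from Figure 3–6.

```python
from bisect import bisect_right

# Points of the percentile graph for Example 3-31:
# class boundaries on the x axis, cumulative percents on the y axis.
BOUNDARIES = [89.5, 104.5, 119.5, 134.5, 149.5, 164.5, 179.5]
PERCENTS = [0, 12, 43, 79, 92, 98, 100]


def interpolate(x, xs, ys):
    """Linear interpolation of y at x along the polyline through (xs, ys)."""
    if x <= xs[0]:
        return ys[0]
    if x >= xs[-1]:
        return ys[-1]
    i = bisect_right(xs, x)  # xs[i - 1] <= x < xs[i]
    x0, x1, y0, y1 = xs[i - 1], xs[i], ys[i - 1], ys[i]
    return y0 + (x - x0) * (y1 - y0) / (x1 - x0)


def rank_from_graph(value):
    """Approximate percentile rank of a blood pressure value."""
    return interpolate(value, BOUNDARIES, PERCENTS)


def value_from_graph(p):
    """Approximate value at percentile p (the graph is increasing, so swap axes)."""
    return interpolate(p, PERCENTS, BOUNDARIES)


print(round(rank_from_graph(130), 1))   # 68.2
print(round(value_from_graph(40), 1))   # 118.0
```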

Figure 3–6 Percentile Graph for Example 3–31
[Graph: cumulative percentages (y axis, 10% to 100%) against class boundaries (x axis, 89.5 to 179.5).]

Finding values and the corresponding percentile ranks by using a graph yields only approximate answers. Several mathematical methods exist for computing percentiles for data. They can be used to find the approximate percentile rank of a data value or to find a data value corresponding to a given percentile. When the data set is large (100 or more), these methods yield better results. The next several examples show these methods.

Percentile Formula
The percentile corresponding to a given value (X) is computed by using the following formula:

$$\text{percentile} = \frac{(\text{number of values below } X) + 0.5}{\text{total number of values}} \cdot 100\%$$

Example 3–32
A teacher gives a 20-point test to 10 students. The scores are shown below. Find the percentile rank of a score of 12.

18, 15, 12, 6, 8, 2, 3, 5, 20, 10

Solution

Arrange the data in order from lowest to highest.

2, 3, 5, 6, 8, 10, 12, 15, 18, 20

Then substitute in the formula.

$$\text{percentile} = \frac{(\text{number of values below } X) + 0.5}{\text{total number of values}} \cdot 100\%$$

Since there are six values below a score of 12, the solution is

$$\text{percentile} = \frac{6 + 0.5}{10} \cdot 100\% = 65\text{th percentile}$$

Thus, a student whose score was 12 did better than 65% of the class.

Note: One assumes that a score of 12 in Example 3–32, for instance, means theoretically any value between 11.5 and 12.5.

Example 3–33
Using the data in Example 3–32, find the percentile rank for a score of 6.

Solution

There are three values below 6. Thus

$$\text{percentile} = \frac{3 + 0.5}{10} \cdot 100\% = 35\text{th percentile}$$

A student who scored 6 did better than 35% of the class.

The next two examples show a procedure for finding a value corresponding to a given percentile.

Example 3–34
Using the scores in Example 3–32, find the value corresponding to the 25th percentile.

Solution

STEP 1 Arrange the data in order from lowest to highest.

2, 3, 5, 6, 8, 10, 12, 15, 18, 20

STEP 2 Compute

$$c = \frac{n \cdot p}{100}$$

where

n = total number of values
p = percentile

Thus,

$$c = \frac{10 \cdot 25}{100} = 2.5$$

STEP 3 If c is not a whole number, round it up to the next whole number; in this case, c = 3. (If c is a whole number, see the next example.) Start at the lowest value and count over to the third value, which is 5. Hence, the value 5 corresponds to the 25th percentile.
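The percentile formula translates directly into code. In the Python sketch below, `percentile_rank_of` is a hypothetical helper; it counts the values strictly below X and applies the formula's 0.5 adjustment, reproducing Examples 3–32 and 3–33. Multiplying by 100 before dividing keeps these results exact.

```python
def percentile_rank_of(data, x):
    """Percentile rank of x: ((number of values below x) + 0.5) / n * 100."""
    below = sum(1 for v in data if v < x)
    return (below + 0.5) * 100 / len(data)


scores = [18, 15, 12, 6, 8, 2, 3, 5, 20, 10]  # Example 3-32; order does not matter
print(percentile_rank_of(scores, 12))  # 65.0
print(percentile_rank_of(scores, 6))   # 35.0
```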

Example 3–35
Using the data set in Example 3–32, find the value that corresponds to the 60th percentile.

Solution

STEP 1 Arrange the data in order from smallest to largest.

2, 3, 5, 6, 8, 10, 12, 15, 18, 20

STEP 2 Substitute in the formula.

$$c = \frac{n \cdot p}{100} = \frac{10 \cdot 60}{100} = 6$$

STEP 3 If c is a whole number, use the value halfway between the cth and (c + 1)st values when counting up from the lowest value, in this case, the 6th and 7th values.

2, 3, 5, 6, 8, 10, 12, 15, 18, 20
(the 6th value is 10; the 7th value is 12)

The value halfway between 10 and 12 is 11. Find it by adding the two values and dividing by 2.

$$\frac{10 + 12}{2} = 11$$

Hence, 11 corresponds to the 60th percentile. Anyone scoring 11 would have done better than 60% of the class.

The steps for finding a value corresponding to a given percentile are summarized in the Procedure Table.

Procedure Table
Finding a Data Value Corresponding to a Given Percentile

STEP 1 Arrange the data in order from lowest to highest.

STEP 2 Substitute in the formula

$$c = \frac{n \cdot p}{100}$$

where

n = total number of values
p = percentile

STEP 3A If c is not a whole number, round up to the next whole number. Starting at the lowest value, count over to the number that corresponds to the rounded-up value.

STEP 3B If c is a whole number, use the value halfway between the cth and (c + 1)st values when counting up from the lowest value.
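The Procedure Table can be sketched in Python as follows. This is an illustration of the table's own rules (round c up when it is fractional; average the cth and (c + 1)st values when it is whole). `value_at_percentile` is a hypothetical name, and the sketch assumes p is an integer with 1 ≤ p ≤ 99, so the (c + 1)st value always exists.

```python
import math


def value_at_percentile(data, p):
    """Data value for the pth percentile (integer p, 1 <= p <= 99), per the Procedure Table."""
    ordered = sorted(data)                  # STEP 1
    n = len(ordered)
    if (n * p) % 100 == 0:                  # STEP 3B: c = n*p/100 is a whole number
        c = n * p // 100
        # The cth and (c+1)st values sit at 0-based indexes c-1 and c.
        return (ordered[c - 1] + ordered[c]) / 2
    c = math.ceil(n * p / 100)              # STEP 3A: round c up
    return ordered[c - 1]


scores = [18, 15, 12, 6, 8, 2, 3, 5, 20, 10]  # Example 3-32
print(value_at_percentile(scores, 25))  # 5     (Example 3-34: c = 2.5, rounded up to 3)
print(value_at_percentile(scores, 60))  # 11.0  (Example 3-35: c = 6)
```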


Quartiles and Deciles

Quartiles divide the distribution into four groups, denoted by Q1, Q2, Q3. Note that Q1 is the same as the 25th percentile; Q2 is the same as the 50th percentile or the median; Q3 corresponds to the 75th percentile, as shown.

[Diagram: from the smallest data value to the largest data value, Q1, Q2 (the median, MD), and Q3 divide the data into four groups of 25% each.]

Quartiles can be computed using the formulas given for percentiles; however, it is much easier to arrange the data in order from smallest to largest and find the median. This is Q2. To find Q1, find the median of the data values less than the median. To find Q3, find the median of the data values that are larger than the median.

Example 3–36
Find Q1, Q2, and Q3 for the data set 15, 13, 6, 5, 12, 50, 22, 18.

Solution

STEP 1 Arrange the data in order:

5, 6, 12, 13, 15, 18, 22, 50

STEP 2 Find the median (Q2).

5, 6, 12, 13, 15, 18, 22, 50

$$\text{MD} = \frac{13 + 15}{2} = 14$$

STEP 3 Find the median of the data values less than 14.

5, 6, 12, 13

$$Q_1 = \frac{6 + 12}{2} = 9$$

Q1 is 9.

STEP 4 Find the median of the data values greater than 14.

15, 18, 22, 50

$$Q_3 = \frac{18 + 22}{2} = 20$$

Here Q3 is 20. Hence, Q1 = 9, Q2 = 14, Q3 = 20.

Deciles divide the distribution into 10 groups, each containing 10% of the data values. They are denoted by D1, D2, etc.
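The median-of-halves method of Example 3–36 can be sketched with the standard library's `statistics.median`. The helper name `quartiles` is hypothetical. The sketch splits the sorted list by position and, when n is odd, leaves the middle value out of both halves; other software may use different quartile conventions and can give slightly different answers.

```python
from statistics import median


def quartiles(data):
    """Q1, Q2, Q3 by the median-of-halves method (assumes at least 2 values)."""
    ordered = sorted(data)
    n = len(ordered)
    lower = ordered[: n // 2]          # values below the median position
    upper = ordered[(n + 1) // 2 :]    # values above the median position
    return median(lower), median(ordered), median(upper)


print(quartiles([15, 13, 6, 5, 12, 50, 22, 18]))  # (9.0, 14.0, 20.0)
```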


Note that D1 corresponds to P10; D2 corresponds to P20, etc. Deciles can be found using the formulas given for percentiles. Taken altogether, then, these are the relationships among percentiles, deciles, and quartiles.

Deciles are denoted by D1, D2, D3, . . . , D9, and they correspond to P10, P20, P30, . . . , P90.
Quartiles are denoted by Q1, Q2, Q3, and they correspond to P25, P50, P75.
The median is the same as P50 or Q2 or D5.

The position measures are summarized in Table 3–4.

Table 3–4 Summary of Position Measures

Measure | Definition | Symbol(s)
Standard score or z score | Number of standard deviations a data value is above or below the mean | z
Percentile | Position in hundredths a data value is in the distribution | Pn
Decile | Position in tenths a data value is in the distribution | Dn
Quartile | Position in fourths a data value is in the distribution | Qn

[Diagram: from the smallest data value to the largest data value, the deciles D1, D2, D3, . . . , D9 divide the data into 10 groups of 10% each.]

Outliers

A data set should be checked for extremely high or extremely low values. These values are called outliers.

An outlier is an extremely high or an extremely low data value when compared with the rest of the data values.

There are several ways to check for outliers. One method is shown in the next example.

Example 3–37
Check the following data set for outliers.

5, 6, 12, 13, 15, 18, 22, 50

Solution

The data value 50 is extremely suspect. The steps in checking for an outlier follow.

STEP 1 Find Q1 and Q3. This was done in the previous example; Q1 is 9 and Q3 is 20.

STEP 2 Find the interquartile range (IQR), which is Q3 − Q1.

IQR = Q3 − Q1 = 20 − 9 = 11

STEP 3 Multiply this value by 1.5.

1.5(11) = 16.5

STEP 4 Subtract the value obtained in Step 3 from Q1 and add the value obtained in Step 3 to Q3.

9 − 16.5 = −7.5 and 20 + 16.5 = 36.5

STEP 5 Check the data set for any data values that fall outside the interval from −7.5 to 36.5. The value 50 is outside this interval; hence, it can be considered an outlier.
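The five steps of Example 3–37 can be sketched in Python. `iqr_outliers` is a hypothetical helper; it takes Q1 and Q3 as inputs (here the values found in Example 3–36), so it does not depend on any particular quartile convention, and it uses the 1.5 × IQR fences from Steps 3 and 4.

```python
def iqr_outliers(data, q1, q3):
    """Return the lower fence, upper fence, and the values outside the fences."""
    iqr = q3 - q1                          # STEP 2
    step = 1.5 * iqr                       # STEP 3
    lower, upper = q1 - step, q3 + step    # STEP 4
    outside = [x for x in data if x < lower or x > upper]  # STEP 5
    return lower, upper, outside


print(iqr_outliers([5, 6, 12, 13, 15, 18, 22, 50], q1=9, q3=20))
# (-7.5, 36.5, [50])
```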

There are several reasons to check a data set for outliers. First, the data value may have resulted from a measurement or observational error. Perhaps the researcher measured the variable incorrectly. Second, the data value may have resulted from a recording error. That is, it may have been written or typed incorrectly. Third, the data value may have been obtained from a subject that is not in the defined population. For example, suppose test scores were obtained from a seventh-grade class, but a student in that class was actually in the sixth grade and had special permission to attend the class. This student might have scored extremely low on that particular exam on that day. Fourth, the data value might be a legitimate value that occurred by chance (although the probability is extremely small).

There are no hard-and-fast rules on what to do with outliers, nor is there complete agreement among statisticians on ways to identify them. Obviously, if they occurred as a result of an error, an attempt should be made to correct the error or else the data value should be omitted entirely. When they occur naturally by chance, the statistician must make a decision about whether to include them in the data set.

When a distribution is normal or bell-shaped, data values that are more than three standard deviations from the mean can be considered suspected outliers.

Exercises

3–89. What is a z score?

3–90. Define percentile rank.

3–91. What is the difference between a percentage and a percentile?

3–92. Define quartile.

3–93. What is the relationship between quartiles and percentiles?

3–94. What is a decile?

3–95. How are deciles related to percentiles?

3–96. To which percentile, quartile, and decile does the median correspond?

3–97. If a history test has a mean of 100 and a standard deviation of 10, find the corresponding z score for each test score.
a. 115
b. 124
c. 93
d. 100
e. 85

3–98. The reaction time to a stimulus for a certain test has a mean of 2.5 seconds and a standard deviation of 0.3 second. Find the corresponding z score for each reaction time.
a. 2.7
b. 3.9
c. 2.8
d. 3.1
e. 2.2

3–99. A final examination for a psychology course has a mean of 84 and a standard deviation of 4. Find the corresponding z score for each raw score.
a. 87
b. 79
c. 93
d. 76
e. 82

3–100. An aptitude test has a mean of 220 and a standard deviation of 10. Find the corresponding z score for each exam score.
a. 200
b. 232
c. 218
d. 212
e. 225

3–101. Which of the following exam grades has a better relative position?
a. A grade of 43 on a test with X̄ = 40 and s = 3.
b. A grade of 75 on a test with X̄ = 72 and s = 5.

3–102. A student scores 60 on a mathematics test that has a mean of 54 and a standard deviation of 3, and she scores 80 on a history test with a mean of 75 and a standard deviation of 2. On which test did she do better than the rest of the class?

3–103. Which score indicates the highest relative position?
a. A score of 3.2 on a test with X̄ = 4.6 and s = 1.5.
b. A score of 630 on a test with X̄ = 800 and s = 200.
c. A score of 43 on a test with X̄ = 50 and s = 5.


3–104. The following distribution represents the data for weights of fifth-grade boys. Find the approximate weights corresponding to each percentile given by constructing a percentile graph.

Weight (pounds)   Frequency
52.5–55.5          9
55.5–58.5         12
58.5–61.5         17
61.5–64.5         22
64.5–67.5         15

a. 25th
b. 60th
c. 80th
d. 95th

3–105. For the data in Exercise 3–104, find the approximate percentile ranks of the following weights.
a. 57 pounds
b. 62 pounds
c. 64 pounds
d. 59 pounds

3–106. (ans) The data below represent the scores on a national achievement test for a group of tenth-grade students. Find the approximate percentile ranks of the following scores by constructing a percentile graph.
a. 220
b. 245
c. 276
d. 280
e. 300

Score          Frequency
196.5–217.5     5
217.5–238.5    17
238.5–259.5    22
259.5–280.5    48
280.5–301.5    22
301.5–322.5     6

3–107. For the data in Exercise 3–106, find the approximate scores that correspond to the following percentiles.
a. 15th
b. 29th
c. 43rd
d. 65th
e. 80th

3–108. (ans) The airborne speeds in miles per hour of 21 planes are shown next. Find the approximate values that correspond to the given percentiles by constructing a percentile graph.

Class      Frequency
366–386    4
387–407    2
408–428    3
429–449    2
450–470    1
471–491    2
492–512    3
513–533    4
          21

a. 9th
b. 20th
c. 45th
d. 60th
e. 75th

Source: Reprinted with permission from The World Almanac and Book of Facts 1995. Copyright © 1994 PRIMEDIA Reference Inc. All rights reserved.

3–109. Using the data in Exercise 3–108, find the approximate percentile ranks of the following miles per hour.
a. 380 mph
b. 425 mph
c. 455 mph
d. 505 mph
e. 525 mph

3–110. Find the percentile ranks of each weight in the data set. The weights are in pounds.

78, 82, 86, 88, 92, 97

3–111. In Exercise 3–110, what value corresponds to the 30th percentile?

3–112. Find the percentile rank for each test score in the data set.

12, 28, 35, 42, 47, 49, 50

3–113. In Exercise 3–112, what value corresponds to the 60th percentile?

3–114. Find the percentile rank for each test score in the data set.

5, 12, 15, 16, 20, 21

3–115. What test score in Exercise 3–114 corresponds to the 33rd percentile?

3–116. Using the procedure shown in Example 3–37, check each data set for outliers.
a. 16, 18, 22, 19, 3, 21, 17, 20
b. 24, 32, 54, 31, 16, 18, 19, 14, 17, 20
c. 321, 343, 350, 327, 200
d. 88, 72, 97, 84, 86, 85, 100
e. 145, 119, 122, 118, 125, 116
f. 14, 16, 27, 18, 13, 19, 36, 15, 20

*3–117. Another measure of average is called the midquartile; it is the numerical value halfway between Q1 and Q3, and the formula is

$$\text{midquartile} = \frac{Q_1 + Q_3}{2}$$

Using this formula and other formulas, find Q1, Q2, Q3, the midquartile, and the interquartile range for each data set.
a. 5, 12, 16, 25, 32, 38
b. 53, 62, 78, 94, 96, 99, 103
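A small sketch, assuming the median-of-halves quartiles of Example 3–36 (the helper names are hypothetical), shows how the midquartile and the interquartile range both follow from Q1 and Q3. It is demonstrated on the Example 3–36 data rather than the exercise data.

```python
from statistics import median


def quartiles(data):
    """Q1, Q2, Q3 by taking medians of the lower and upper halves of the sorted data."""
    ordered = sorted(data)
    n = len(ordered)
    return median(ordered[: n // 2]), median(ordered), median(ordered[(n + 1) // 2 :])


def midquartile(q1, q3):
    """Value halfway between Q1 and Q3."""
    return (q1 + q3) / 2


q1, q2, q3 = quartiles([15, 13, 6, 5, 12, 50, 22, 18])  # Example 3-36 data
print(midquartile(q1, q3), q3 - q1)  # 14.5 11.0
```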


Technology Step by Step

MINITAB Step by Step

Finding the Mean and Standard Deviation

Example MT3–1
1. Type the data from Example 3–39 (in the following section) into C1 of MINITAB. Name the column CARS-THEFT.

52 58 75 79 57 65 62 77 56 59 51 53 51 66 55 68 63 78 50 53 67 65 69 66 69 57 73 72 75 55

2. Select Stat>Basic Statistics>Display Descriptive Statistics.
3. The cursor will be blinking in the Variables text box. Double-click C1.
4. Click [OK].

The results will be displayed in the Session Window as shown. The column label “CARS-THEFT” is truncated to 8 letters in the display. The standard deviation is the unbiased estimate, s. The trimmed mean or TrMean is the mean for the data after the lowest and highest 5% are discarded. If the trimmed mean is different from the mean, there may be outliers.

[Screenshot: Session Window with Descriptive Statistics]

TI-83 Step by Step

To calculate various descriptive statistics:
1. Enter data into L1.
2. Press STAT to get the menu.
3. Press ► to move the cursor to CALC; then press 1 for 1-Var Stats.
4. Press 2nd [L1], then ENTER.

The calculator will display
x̄ sample mean
Σx sum of the data values
Σx² sum of the squares of the data values
Sx sample standard deviation
σx population standard deviation
n number of data values
minX smallest data value
Q1 lower quartile
Med median
Q3 upper quartile
maxX largest data value

Example TI3–1
Find the various descriptive statistics for the auto sales data from Example 3–23:

11.2 11.9 12.0 12.8 13.4 14.3


Following the steps above, we obtain the following results, as shown on the screen:
The mean is 12.6.
The sum is 75.6.
The sum of x² is 958.94.
The unbiased estimator of the standard deviation, Sx, is 1.1296017.
The population standard deviation, σx, is 1.031180553.
The sample size n is 6.
The smallest data value is 11.2.
Q1 is 11.9.
The median is 12.4.
Q3 is 13.4.
The largest data value is 14.3.
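For readers without a TI-83, a similar summary can be produced with Python's standard `statistics` module. This is a sketch, not a reproduction of the calculator's internals: `stdev` and `pstdev` play the roles of Sx and σx, and the quartiles use the median-of-halves method of Example 3–36, which gives the same Q1 and Q3 as the screen for this data set. The dictionary labels are illustrative.

```python
from statistics import mean, median, pstdev, stdev

data = [11.2, 11.9, 12.0, 12.8, 13.4, 14.3]  # auto sales data, Example 3-23

ordered = sorted(data)
n = len(ordered)
summary = {
    "mean": mean(data),
    "sum": sum(data),
    "sum of squares": sum(x * x for x in data),
    "Sx (sample SD)": stdev(data),
    "sigma x (population SD)": pstdev(data),
    "n": n,
    "minX": ordered[0],
    "Q1": median(ordered[: n // 2]),         # median of the lower half
    "Med": median(ordered),
    "Q3": median(ordered[(n + 1) // 2 :]),   # median of the upper half
    "maxX": ordered[-1],
}
for name, value in summary.items():
    print(f"{name}: {round(value, 7)}")
```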

Excel Step by Step

Finding the Central Tendency

Example XL3–1
To find the mean, mode, and median of a data set:

1. Enter the numbers in a range of cells (here shown as the numbers in cells A2 to A12). We use the data from Example 3–11 on stopping distances:

15 18 18 18 20 22 24 24 24 26 26

2. For the mean, enter =AVERAGE(A2:A12) in a blank cell.
3. For the mode, enter =MODE(A2:A12) in a blank cell.
4. For the median, enter =MEDIAN(A2:A12) in a blank cell.

These three functions are available from the standard toolbar by clicking the fx icon and scrolling down the list of statistical functions. Note: for distributions that are bimodal, like this one, the Excel MODE function reports the first mode only. A better practice is to use the Histogram routine from the Data Analysis Add-In, which reports actual counts in a table.

[Screenshot: Output]
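Python's standard `statistics` module offers the same three measures. Unlike Excel's MODE, `statistics.multimode` (available in Python 3.8 and later) returns every mode, which makes the bimodal shape of this data set visible. A short sketch with the stopping-distance data:

```python
from statistics import mean, median, multimode

distances = [15, 18, 18, 18, 20, 22, 24, 24, 24, 26, 26]  # Example 3-11

print(round(mean(distances), 2))  # 21.36
print(median(distances))          # 22
print(multimode(distances))       # [18, 24]
```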


Finding Measures of Variation

Example XL3–2
To find values that estimate the spread of a distribution of numbers:

1. Enter the numbers in a range (here A1:A6). We use the data from Example 3–23 on European automobile sales.
2. For the sample variance, enter =VAR(A1:A6) in a blank cell.
3. For the sample standard deviation, enter =STDEV(A1:A6) in a blank cell.
4. For the range, you can compute the value =MAX(A1:A6)-MIN(A1:A6).

There are also functions STDEVP for population standard deviation and VARP for population variance.
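The corresponding calls in Python's `statistics` module are `variance` and `stdev` (sample versions, comparable to VAR and STDEV) and `pvariance` and `pstdev` (population versions, comparable to VARP and STDEVP); the range is simply the maximum minus the minimum. A sketch with the Example 3–23 data:

```python
from statistics import pstdev, pvariance, stdev, variance

sales = [11.2, 11.9, 12.0, 12.8, 13.4, 14.3]  # Example 3-23

print(round(variance(sales), 4))   # 1.276   (sample variance, like VAR)
print(round(stdev(sales), 4))      # 1.1296  (sample SD, like STDEV)
print(round(pvariance(sales), 4))  # 1.0633  (population variance, like VARP)
print(round(pstdev(sales), 4))     # 1.0312  (population SD, like STDEVP)
print(round(max(sales) - min(sales), 4))  # 3.1  (range)
```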

Descriptive Statistics in Excel

Example XL3–3
Excel’s Data Analysis options include an item called Descriptive Statistics that reports all the standard measures of a data set.

1. Enter the data set shown (9 numbers) in column A of a new worksheet.

12 17 15 16 16 14 18 13 10

2. Select Tools>Data Analysis.
3. Use this data (A1:A9) as the Input Range in the Descriptive Statistics dialog box.
4. Check the Summary statistics option, and click [OK].

[Screenshot: Descriptive Statistics Dialog Box]

Here’s the summary output for this data set. Note that this one operation reports most of the statistics used in this chapter.

3–5 Exploratory Data Analysis

Objective 4. Use the techniques of exploratory data analysis, including stem and leaf plots, boxplots, and five-number summaries, to discover various aspects of data.

In traditional statistics, data are organized by using a frequency distribution. From this distribution various graphs such as the histogram, frequency polygon, and ogive can be constructed to determine the shape or nature of the distribution. In addition, various statistics such as the mean and standard deviation can be computed to summarize the data.

The purpose of traditional analysis is to confirm various conjectures about the nature of the data. For example, from a carefully designed study, a researcher might want to know if the proportion of Americans who are exercising today has increased from 10 years ago. This study would contain various assumptions about the population, various definitions such as exercise, and so on.

In exploratory data analysis (EDA), data are organized using a stem and leaf plot. The summary statistics used are the median and interquartile range. Finally, a boxplot can be constructed to determine visually the nature of the distribution. The purpose of exploratory data analysis is to examine data in order to find out what information can be discovered. For example, are there any gaps in the data? Can any patterns be discerned? Here the researcher starts out with few or no assumptions.

Exploratory data analysis was developed by John Tukey and presented in his book Exploratory Data Analysis (Addison-Wesley, 1977).

Stem and Leaf Plots

The stem and leaf plot is a method of organizing data and is a combination of sorting and graphing. It has the advantage over a grouped frequency distribution of retaining the actual data while showing them in graphic form.

A stem and leaf plot is a data plot that uses part of a data value as the stem and part of the data value as the leaf to form groups or classes.
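A minimal Python sketch of the idea, assuming two-digit whole numbers where the tens digit is the stem and the units digit is the leaf (other splits are possible). The helper name `stem_and_leaf` is hypothetical, and the data are the 30 car-theft values listed in Example MT3–1.

```python
from collections import defaultdict


def stem_and_leaf(values):
    """Group whole numbers by tens digit (stem); the units digit is the leaf."""
    plot = defaultdict(list)
    for v in sorted(values):          # sorting keeps stems and leaves in order
        stem, leaf = divmod(v, 10)
        plot[stem].append(leaf)
    return dict(plot)


data = [52, 58, 75, 79, 57, 65, 62, 77, 56, 59, 51, 53, 51, 66, 55,
        68, 63, 78, 50, 53, 67, 65, 69, 66, 69, 57, 73, 72, 75, 55]

for stem, leaves in stem_and_leaf(data).items():
    print(stem, "|", " ".join(str(leaf) for leaf in leaves))
```

Each printed row keeps the actual data values (stem followed by leaves), which is the advantage over a grouped frequency distribution described above.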
