Section 3–4 Measures of Position 115
c. Subtract 5 from each value, and find the standard deviation.
d. Multiply each value by 5, and find the standard deviation.
e. Divide each value by 5, and find the standard deviation.
f. Generalize the results of parts b through e.
g. Compare these results with those in Exercise 3–38.
*3–87. The mean deviation is found by using the following formula:

$$\text{mean deviation} = \frac{\sum |X - \bar{X}|}{n}$$

where
X = value
X̄ = mean
n = number of values
| | = absolute value

Find the mean deviation for the following data.
5, 9, 10, 11, 11, 12, 15, 18, 20, 22

*3–88. A measure to determine the skewness of a distribution is called the Pearson coefficient of skewness. The formula is

$$\text{skewness} = \frac{3(\bar{X} - MD)}{s}$$

The values of the coefficient usually range from 3 to −3. When the distribution is symmetrical, the coefficient is zero; when the distribution is positively skewed, it is positive; and when the distribution is negatively skewed, it is negative.

Using the formula, find the coefficient of skewness for each distribution, and describe the shape of the distribution.
a. Mean = 10, median = 8, standard deviation = 3.
b. Mean = 42, median = 45, standard deviation = 4.
c. Mean = 18.6, median = 18.6, standard deviation = 1.5.
d. Mean = 98, median = 97.6, standard deviation = 4.
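The two formulas in Exercises 3–87 and 3–88 translate directly into a few lines of Python. This is only a sketch: the function names are ours, and the numbers in the usage lines are illustrative values, not the exercise data.

```python
from statistics import mean

def mean_deviation(data):
    """Mean deviation: sum of |X - X̄| divided by n."""
    x_bar = mean(data)
    return sum(abs(x - x_bar) for x in data) / len(data)

def pearson_skewness(x_bar, median, s):
    """Pearson coefficient of skewness: 3(X̄ - MD) / s."""
    return 3 * (x_bar - median) / s

print(mean_deviation([2, 4, 6]))    # 1.3333333333333333
print(pearson_skewness(50, 47, 6))  # 1.5 (mean above median: positive skew)
```

Because the deviations are taken in absolute value, positive and negative distances from the mean cannot cancel, which is why this measure is not automatically zero the way the plain sum of deviations is.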
In addition to measures of central tendency and measures of variation, there are also measures of position or location. These measures include standard scores, percentiles, deciles, and quartiles. They are used to locate the relative position of a data value in the data set. For example, if a value is located at the 80th percentile, it means that 80% of the values fall below it in the distribution and 20% of the values fall above it. The median is the value that corresponds to the 50th percentile, since half of the values fall below it and half of the values fall above it. This section discusses these measures of position.
There is an old saying that states, “You can’t compare apples and oranges.” But with the use of statistics, it can be done to some extent. Suppose that a student scored 90 on a music test and 45 on an English exam. Direct comparison of raw scores is impossible, since the exams might not be equivalent in terms of number of questions, value of each question, and so on. However, each score can be compared with a relative standard common to both exams. This comparison uses the mean and standard deviation and is called a standard score or z score. (We also use z scores in later chapters.)
A standard score or z score for a value is obtained by subtracting the mean from the value and dividing the result by the standard deviation. The symbol for a standard score is z. The formula is

$$z = \frac{\text{value} - \text{mean}}{\text{standard deviation}}$$

For samples, the formula is

$$z = \frac{X - \bar{X}}{s}$$

For populations, the formula is

$$z = \frac{X - \mu}{\sigma}$$

The z score represents the number of standard deviations a data value falls above or below the mean.
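A minimal Python sketch of the z score formula follows. The same function serves samples (pass X̄ and s) and populations (pass μ and σ). The names `z_score` and `standardize` are ours, and the eight-value list is just a small illustrative data set.

```python
from statistics import mean, stdev

def z_score(x, center, spread):
    """Number of standard deviations x lies above (+) or below (-) the mean."""
    return (x - center) / spread

def standardize(data):
    """Convert every value to a z score using the sample mean X̄ and sample standard deviation s."""
    x_bar, s = mean(data), stdev(data)
    return [z_score(x, x_bar, s) for x in data]

# Example 3–29: calculus X = 65, X̄ = 50, s = 10; history X = 30, X̄ = 25, s = 5
print(z_score(65, 50, 10))  # 1.5
print(z_score(30, 25, 5))   # 1.0

zs = standardize([2, 4, 4, 4, 5, 5, 7, 9])
print(mean(zs), stdev(zs))  # approximately 0 and 1, up to floating-point rounding
```

The last two lines illustrate the property stated later in the section: once every value is converted to a z score, the transformed data have a mean of 0 and a standard deviation of 1.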
3–4 Measures of Position
Objective 3. Identify the position of a data value in a data set using various measures of position such as percentiles, deciles, and quartiles.
Standard Scores
116 Chapter 3 Data Description
Example 3–29
A student scored 65 on a calculus test that had a mean of 50 and a standard deviation of 10; she scored 30 on a history test with a mean of 25 and a standard deviation of 5. Compare her relative positions on the two tests.

Solution
First, find the z scores. For calculus the z score is

$$z = \frac{X - \bar{X}}{s} = \frac{65 - 50}{10} = 1.5$$

For history the z score is

$$z = \frac{X - \bar{X}}{s} = \frac{30 - 25}{5} = 1.0$$

Since the z score for calculus is larger, her relative position in the calculus class is higher than her relative position in the history class.

Note that if the z score is positive, the score is above the mean. If the z score is 0, the score is the same as the mean. And if the z score is negative, the score is below the mean.

Example 3–30
Find the z score for each test and state which is higher.
Test A: X = 38, X̄ = 40, s = 5
Test B: X = 94, X̄ = 100, s = 10

Solution
For test A,

$$z = \frac{X - \bar{X}}{s} = \frac{38 - 40}{5} = -0.4$$

For test B,

$$z = \frac{X - \bar{X}}{s} = \frac{94 - 100}{10} = -0.6$$

Since −0.4 is greater than −0.6, the score for test A is relatively higher than the score for test B.

When all data for a variable are transformed into z scores, the resulting distribution will have a mean of 0 and a standard deviation of 1. A z score, then, is actually the number of standard deviations each data value is from the mean for a specific distribution. In Example 3–29, the calculus score of 65 was actually 1.5 standard deviations above the mean of 50. This will be explained in more detail in Chapter 7.

Interesting Facts
The average number of faces a person learns to recognize and remember during his or her lifetime is 10,000. (The Harper’s Index Book, p. 86)

Percentiles
Percentiles are position measures used in educational and health-related fields to indicate the position of an individual in a group.

A percentile P is an integer (1 ≤ P ≤ 99) such that the Pth percentile is a value where P% of the data values are less than or equal to the value and (100 − P)% of the data values are greater than or equal to the value.
In many situations, the graphs and tables showing the percentiles for various measures such as test scores, heights, or weights have already been completed. Table 3–3 shows the percentile ranks for scaled scores on the Test of English as a Foreign Language. If a student had a scaled score of 58 for Section 1 (listening and comprehension), that student would have a percentile rank of 81. Hence, that student did better than 81% of the students who took Section 1 of the exam.
Table 3–3 Percentile Ranks and Scaled Scores on the Test of English as a Foreign Language*
Scaled score | Section 1: Listening comprehension | Section 2: Structure and written expression | Section 3: Vocabulary and reading comprehension | Total scaled score | Percentile rank
68 99 98
66 98 96 98 660 99
64 96 94 96 640 97
62 92 90 93 620 94
60 87 84 88 600 89
→58 81 76 81 580 82
56 73 68 72 560 73
54 64 58 61 540 62
52 54 48 50 520 50
50 42 38 40 500 39
48 32 29 30 480 29
46 22 21 23 460 20
44 14 15 16 440 13
42 9 10 11 420 9
40 5 7 8 400 5
38 3 4 5 380 3
36 2 3 3 360 1
34 1 2 2 340 1
32 1 1 320
30 1 1 300
Mean 51.5 52.2 51.4 Mean 517
S.D. 7.1 7.9 7.5 S.D. 68
*Based on the total group of 1,178,193 examinees tested from July 1989 through June 1991.
Source: Reprinted by permission of Educational Testing Service, the copyright owner.
Figure 3–5 shows percentiles in graphic form of weights of girls from ages 2 to 18. To find the percentile rank of an 11-year-old who weighs 82 pounds, start at the 82-pound weight on the left axis and move horizontally to the right. Find the 11 on the horizontal axis and move up vertically. The two lines meet at the 50th percentile curved line; hence, an 11-year-old girl who weighs 82 pounds is in the 50th percentile for her age group. If the lines do not meet exactly on one of the curved percentile lines, then the percentile rank must be approximated.
Percentiles are also used to compare an individual’s test score with the national norm. For example, tests such as the National Educational Development Test (NEDT) are taken by students in ninth or tenth grade. A student’s scores are compared with those of other students locally and nationally by using percentile ranks. A similar test for elementary school students is called the California Achievement Test.
Percentiles are not the same as percentages. That is, if a student gets 72 correct answers out of a possible 100, she obtains a percentage score of 72. There is no indication of her position with respect to the rest of the class. She could have scored the highest, the lowest, or somewhere in between. On the other hand, if a raw score of 72 corresponds to the 64th percentile, then she did better than 64% of the students in her class.
Percentiles are symbolized by
P1, P2, P3, . . . , P99
and divide the distribution into 100 groups.

Figure 3–5 Weights of Girls by Age and Percentile Rankings (graph of weight in pounds and in kilograms against age, 2 to 18 years, with curves for the 5th, 10th, 25th, 50th, 75th, 90th, and 95th percentiles)
Source: Distributed by Mead Johnson Nutritional Division. Reprinted with permission.
Percentile graphs can be constructed as shown in the next example. Percentile graphs use the same values as the cumulative relative frequency graphs described in Section 2–3, except that the proportions have been converted to percents.
Example 3–31
The frequency distribution for the systolic blood pressure readings (in millimeters of mercury, mm Hg) of 200 randomly selected college students follows. Construct a percentile graph.

A: Class boundaries | B: Frequency | C: Cumulative frequency | D: Cumulative percent
89.5–104.5 | 24
104.5–119.5 | 62
119.5–134.5 | 72
134.5–149.5 | 26
149.5–164.5 | 12
164.5–179.5 | 4
Total | 200

Solution
STEP 1 Find the cumulative frequencies and place them in column C.
STEP 2 Find the cumulative percentages and place them in column D. To do this step, use the formula

$$\text{cumulative \%} = \frac{\text{cumulative frequency}}{n} \cdot 100\%$$

For the first class,

$$\text{cumulative \%} = \frac{24}{200} \cdot 100\% = 12\%$$

The completed table is shown next.

A: Class boundaries | B: Frequency | C: Cumulative frequency | D: Cumulative percent
89.5–104.5 | 24 | 24 | 12
104.5–119.5 | 62 | 86 | 43
119.5–134.5 | 72 | 158 | 79
134.5–149.5 | 26 | 184 | 92
149.5–164.5 | 12 | 196 | 98
164.5–179.5 | 4 | 200 | 100
Total | 200
STEP 3 Graph the data, using class boundaries for the x axis and the percentages for the y axis, as shown in Figure 3–6.

Once a percentile graph has been constructed, one can find the approximate corresponding percentile ranks for given blood pressure values and find approximate blood pressure values for given percentile ranks.

For example, to find the percentile rank of a blood pressure reading of 130, find 130 on the x axis of Figure 3–6, and draw a vertical line to the graph. Then move horizontally to the value on the y axis. Note that a blood pressure of 130 corresponds to approximately the 70th percentile.

If the value that corresponds to the 40th percentile is desired, start on the y axis at 40 and draw a horizontal line to the graph. Then draw a vertical line to the x axis, and read the value. In Figure 3–6, the 40th percentile corresponds to a value of approximately 118. Thus, if a person has a blood pressure of 118, he or she is at the 40th percentile.
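Reading Figure 3–6 by eye can be imitated numerically. The sketch below builds column D for Example 3–31 and then interpolates along straight segments, assuming the graph starts at 0% at the lowest boundary (89.5) and joins the plotted points with straight lines. The function names are ours, and straight-line interpolation is our stand-in for reading the graph.

```python
from bisect import bisect_left

# Example 3–31: class boundaries and frequencies (n = 200)
boundaries = [89.5, 104.5, 119.5, 134.5, 149.5, 164.5, 179.5]
frequencies = [24, 62, 72, 26, 12, 4]

def cumulative_percents(freqs):
    """Column D: running total of the frequencies as a percent of n."""
    n, running, out = sum(freqs), 0, []
    for f in freqs:
        running += f
        out.append(running * 100 / n)  # multiply first so whole percents stay exact
    return out

# Graph points: 0% at the lowest boundary, then each class's cumulative
# percent plotted at its upper boundary.
xs = boundaries
ys = [0.0] + cumulative_percents(frequencies)

def interpolate(x, x_points, y_points):
    """Read a value off the straight segments joining (x_points[i], y_points[i])."""
    i = bisect_left(x_points, x)
    if i == 0:
        return y_points[0]
    if i == len(x_points):
        return y_points[-1]
    x0, x1 = x_points[i - 1], x_points[i]
    y0, y1 = y_points[i - 1], y_points[i]
    return y0 + (x - x0) * (y1 - y0) / (x1 - x0)

def graph_percentile_rank(value):
    return interpolate(value, xs, ys)

def graph_value_at_percentile(p):
    # ys strictly increases here, so the same routine works with the axes swapped.
    return interpolate(p, ys, xs)

print(ys)                                       # [0.0, 12.0, 43.0, 79.0, 92.0, 98.0, 100.0]
print(round(graph_percentile_rank(130), 1))     # 68.2
print(round(graph_value_at_percentile(40), 1))  # 118.0
```

Interpolation gives about 68 for a reading of 130, which the text reads from the graph as approximately the 70th percentile, and about 118 for the 40th percentile, in line with the graphical answer.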
Finding values and the corresponding percentile ranks by using a graph yields only approximate answers. Several mathematical methods exist for computing percentiles for data. They can be used to find the approximate percentile rank of a data value or to find a data value corresponding to a given percentile. When the data set is large (100 or more), these methods yield better results. The next several examples show these methods.
Figure 3–6 Percentile Graph for Example 3–31 (cumulative percentages, 10% to 100%, on the y axis; class boundaries 89.5 to 179.5 on the x axis)

Percentile Formula
The percentile corresponding to a given value (X) is computed by using the following formula:

$$\text{percentile} = \frac{(\text{number of values below } X) + 0.5}{\text{total number of values}} \cdot 100\%$$

Example 3–32
A teacher gives a 20-point test to 10 students. The scores are shown below. Find the percentile rank of a score of 12.
18, 15, 12, 6, 8, 2, 3, 5, 20, 10
Solution
Arrange the data in order from lowest to highest.
2, 3, 5, 6, 8, 10, 12, 15, 18, 20
Then substitute in the formula.

$$\text{percentile} = \frac{(\text{number of values below } X) + 0.5}{\text{total number of values}} \cdot 100\%$$

Since there are six values below a score of 12, the solution is

$$\text{percentile} = \frac{6 + 0.5}{10} \cdot 100\% = 65\text{th percentile}$$

Thus, a student whose score was 12 did better than 65% of the class.

Note: One assumes that a score of 12 in Example 3–32, for instance, means theoretically any value between 11.5 and 12.5.

Example 3–33
Using the data in Example 3–32, find the percentile rank for a score of 6.

Solution
There are three values below 6. Thus

$$\text{percentile} = \frac{3 + 0.5}{10} \cdot 100\% = 35\text{th percentile}$$

A student who scored 6 did better than 35% of the class.

The next two examples show a procedure for finding a value corresponding to a given percentile.

Example 3–34
Using the scores in Example 3–32, find the value corresponding to the 25th percentile.

Solution
STEP 1 Arrange the data in order from lowest to highest.
2, 3, 5, 6, 8, 10, 12, 15, 18, 20
STEP 2 Compute

$$c = \frac{n \cdot p}{100}$$

where
n = total number of values
p = percentile
Thus,

$$c = \frac{10 \cdot 25}{100} = 2.5$$

STEP 3 If c is not a whole number, round it up to the next whole number; in this case, c = 3. (If c is a whole number, see the next example.) Start at the lowest value and count over to the third value, which is 5. Hence, the value 5 corresponds to the 25th percentile.
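The percentile formula used in Examples 3–32 and 3–33 can be sketched in Python as follows. The function name is ours. It counts the values strictly below X, so the list does not have to be sorted first; sorting in the examples only makes the counting easier by hand.

```python
def percentile_rank(data, x):
    """Percentile rank of x: (number of values below x + 0.5) / n * 100."""
    below = sum(1 for value in data if value < x)
    return (below + 0.5) * 100 / len(data)

scores = [18, 15, 12, 6, 8, 2, 3, 5, 20, 10]  # Example 3–32
print(percentile_rank(scores, 12))  # 65.0
print(percentile_rank(scores, 6))   # 35.0
```

The added 0.5 reflects the note above: a score of 12 is treated as occupying the interval from 11.5 to 12.5, so half of that score's own share is counted as being below it.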
Example 3–35
Using the data set in Example 3–32, find the value that corresponds to the 60th percentile.

Solution
STEP 1 Arrange the data in order from smallest to largest.
2, 3, 5, 6, 8, 10, 12, 15, 18, 20
STEP 2 Substitute in the formula.

$$c = \frac{n \cdot p}{100} = \frac{10 \cdot 60}{100} = 6$$

STEP 3 If c is a whole number, use the value halfway between the cth and (c + 1)th values when counting up from the lowest value (in this case, the 6th and 7th values).
2, 3, 5, 6, 8, 10, 12, 15, 18, 20
The 6th value is 10 and the 7th value is 12.
The value halfway between 10 and 12 is 11. Find it by adding the two values and dividing by 2.

$$\frac{10 + 12}{2} = 11$$

Hence, 11 corresponds to the 60th percentile. Anyone scoring 11 would have done better than 60% of the class.

The steps for finding a value corresponding to a given percentile are summarized in the Procedure Table.
Procedure Table
Finding a Data Value Corresponding to a Given Percentile
STEP 1 Arrange the data in order from lowest to highest.
STEP 2 Substitute in the formula

$$c = \frac{n \cdot p}{100}$$

where
n = total number of values
p = percentile
STEP 3A If c is not a whole number, round up to the next whole number. Starting at the lowest value, count over to the number that corresponds to the rounded-up value.
STEP 3B If c is a whole number, use the value halfway between the cth and (c + 1)th values when counting up from the lowest value.
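The Procedure Table can be sketched as one Python function, assuming p is an integer from 1 to 99 as in the definition of a percentile. The function name is ours. Using integer `divmod` on n · p decides whether c is a whole number without any floating-point comparison.

```python
def value_at_percentile(data, p):
    """Data value at the pth percentile (integer p, 1 <= p <= 99), per the Procedure Table."""
    if not 1 <= p <= 99:
        raise ValueError("p must be between 1 and 99")
    ordered = sorted(data)                            # STEP 1: lowest to highest
    whole, remainder = divmod(len(ordered) * p, 100)  # STEP 2: c = n * p / 100
    if remainder:
        # STEP 3A: c is not whole, so round up to whole + 1; that position is index `whole`
        return ordered[whole]
    # STEP 3B: c is whole; average the cth and (c + 1)th values (indexes c - 1 and c)
    return (ordered[whole - 1] + ordered[whole]) / 2

scores = [18, 15, 12, 6, 8, 2, 3, 5, 20, 10]  # Example 3–32
print(value_at_percentile(scores, 25))  # 5     (Example 3–34)
print(value_at_percentile(scores, 60))  # 11.0  (Example 3–35)
```

Python lists are indexed from 0, so the cth value in the ordered list sits at index c − 1; the comments track that shift explicitly.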
Quartiles and Deciles
Quartiles divide the distribution into four groups, denoted by Q1, Q2, Q3. Note that Q1 is the same as the 25th percentile; Q2 is the same as the 50th percentile or the median; Q3 corresponds to the 75th percentile, as shown.

Smallest data value | 25% | Q1 | 25% | Q2 (MD) | 25% | Q3 | 25% | Largest data value

Quartiles can be computed using the formulas given for percentiles; however, it is much easier to arrange the data in order from smallest to largest and find the median. This is Q2. To find Q1, find the median of the data values less than the median. To find Q3, find the median of the data values that are larger than the median.

Example 3–36
Find Q1, Q2, and Q3 for the data set 15, 13, 6, 5, 12, 50, 22, 18.

Solution
STEP 1 Arrange the data in order:
5, 6, 12, 13, 15, 18, 22, 50
STEP 2 Find the median (Q2).
5, 6, 12, 13, 15, 18, 22, 50

$$MD = \frac{13 + 15}{2} = 14$$

STEP 3 Find the median of the data values less than 14.
5, 6, 12, 13

$$Q_1 = \frac{6 + 12}{2} = 9$$

Q1 is 9.
STEP 4 Find the median of the data values greater than 14.
15, 18, 22, 50

$$Q_3 = \frac{18 + 22}{2} = 20$$

Here Q3 is 20. Hence, Q1 = 9, Q2 = 14, Q3 = 20.

Deciles divide the distribution into 10 groups, each containing 10% of the data values. They are denoted by D1, D2, etc.
Note that D1 corresponds to P10; D2 corresponds to P20, etc. Deciles can be found using the formulas given for percentiles. Taken altogether then, these are the relationships among percentiles, deciles, and quartiles.
Deciles are denoted by D1, D2, D3, . . . , D9
and they correspond to P10, P20, P30, . . . , P90
Quartiles are denoted by Q1, Q2, Q3
and they correspond to P25, P50, P75
The median is the same as P50 or Q2 or D5
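The median-of-halves method for quartiles can be sketched with Python's `statistics.median`. The function name is ours. One assumption to flag: the sketch splits the sorted list by position (the middle value is left out when n is odd), which agrees with the text's "values less than / greater than the median" wording whenever no data values tie with the median.

```python
from statistics import median

def quartiles(data):
    """Q1, Q2, Q3: Q2 is the median; Q1 and Q3 are the medians of the lower and upper halves."""
    ordered = sorted(data)
    n = len(ordered)
    if n < 2:
        raise ValueError("need at least two values")
    half = n // 2
    lower = ordered[:half]            # values before the median position
    upper = ordered[half + n % 2:]    # values after it (middle value skipped when n is odd)
    return median(lower), median(ordered), median(upper)

print(quartiles([15, 13, 6, 5, 12, 50, 22, 18]))  # (9.0, 14.0, 20.0), as in Example 3–36
```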
The position measures are summarized in Table 3–4.
Table 3–4 Summary of Position Measures
Measure | Definition | Symbol(s)
Standard score or z score | Number of standard deviations a data value is above or below the mean | z
Percentile | Position in hundredths a data value is in the distribution | Pn
Decile | Position in tenths a data value is in the distribution | Dn
Quartile | Position in fourths a data value is in the distribution | Qn
Outliers
A data set should be checked for extremely high or extremely low values. These values are called outliers.

An outlier is an extremely high or an extremely low data value when compared with the rest of the data values.

There are several ways to check for outliers. One method is shown in the next example.

Example 3–37
Check the following data set for outliers.
5, 6, 12, 13, 15, 18, 22, 50

Solution
The data value 50 is extremely suspect. The steps in checking for an outlier follow.
STEP 1 Find Q1 and Q3. This was done in the previous example; Q1 is 9 and Q3 is 20.
STEP 2 Find the interquartile range (IQR), which is Q3 − Q1.
IQR = Q3 − Q1 = 20 − 9 = 11
STEP 3 Multiply this value by 1.5.
1.5(11) = 16.5
STEP 4 Subtract the value obtained in Step 3 from Q1 and add the value obtained in Step 3 to Q3.
9 − 16.5 = −7.5 and 20 + 16.5 = 36.5
STEP 5 Check the data set for any data values that fall outside the interval from −7.5 to 36.5. The value 50 is outside this interval; hence, it can be considered an outlier.
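The five steps of Example 3–37 map onto a short Python sketch. The function names are ours, and the `k=1.5` parameter simply exposes the multiplier from Step 3. Quartiles are taken as medians of the lower and upper halves of the sorted data, as in Example 3–36.

```python
from statistics import median

def lower_upper_quartiles(data):
    """Q1 and Q3 as medians of the lower and upper halves of the sorted data."""
    ordered = sorted(data)
    n, half = len(ordered), len(ordered) // 2
    return median(ordered[:half]), median(ordered[half + n % 2:])

def iqr_outliers(data, k=1.5):
    """Return (lower fence, upper fence, values outside the fences)."""
    q1, q3 = lower_upper_quartiles(data)                 # STEP 1
    iqr = q3 - q1                                        # STEP 2
    step = k * iqr                                       # STEP 3
    low, high = q1 - step, q3 + step                     # STEP 4
    flagged = [x for x in data if x < low or x > high]   # STEP 5
    return low, high, flagged

print(iqr_outliers([5, 6, 12, 13, 15, 18, 22, 50]))  # (-7.5, 36.5, [50])
```

As the discussion that follows stresses, a flagged value is only a candidate for investigation, not automatically an error to delete.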
There are several reasons to check a data set for outliers. First, the data value may have resulted from a measurement or observational error. Perhaps the researcher measured the variable incorrectly. Second, the data value may have resulted from a recording error. That is, it may have been written or typed incorrectly. Third, the data value may have been obtained from a subject that is not in the defined population. For example, suppose test scores were obtained from a seventh-grade class, but a student in that class was actually in the sixth grade and had special permission to attend the class. This student might have scored extremely low on that particular exam on that day. Fourth, the data value might be a legitimate value that occurred by chance (although the probability is extremely small).
There are no hard-and-fast rules on what to do with outliers, nor is there complete agreement among statisticians on ways to identify them. Obviously, if they occurred as a result of an error, an attempt should be made to correct the error or else the data value should be omitted entirely. When they occur naturally by chance, the statistician must make a decision about whether to include them in the data set.
When a distribution is normal or bell-shaped, data values that are more than three standard deviations from the mean can be considered suspected outliers.
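The three-standard-deviation guideline can be sketched as a z score check. This is a minimal sketch under stated assumptions: it uses the sample mean and sample standard deviation s (the text does not specify which), the function name is ours, and the readings are hypothetical illustration data, not from the text. The guideline is meant for roughly bell-shaped data.

```python
from statistics import mean, stdev

def beyond_three_sd(data):
    """Values more than three sample standard deviations from the sample mean."""
    x_bar, s = mean(data), stdev(data)
    return [x for x in data if abs(x - x_bar) > 3 * s]

readings = [10, 11] * 10 + [30]   # hypothetical: twenty typical values and one extreme value
print(beyond_three_sd(readings))  # [30]
```

Because an extreme value also inflates s, this check can miss outliers in small data sets, which is one reason the quartile-based method of Example 3–37 is often preferred.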
Exercises
3–89. What is a z score?
3–90. Define percentile rank.
3–91. What is the difference between a percentage and a percentile?
3–92. Define quartile.
3–93. What is the relationship between quartiles and percentiles?
3–94. What is a decile?
3–95. How are deciles related to percentiles?
3–96. To which percentile, quartile, and decile does the median correspond?
3–97. If a history test has a mean of 100 and a standard deviation of 10, find the corresponding z score for each test score.
a. 115  b. 124  c. 93  d. 100  e. 85
3–98. The reaction time to a stimulus for a certain test has a mean of 2.5 seconds and a standard deviation of 0.3 second. Find the corresponding z score for each reaction time.
a. 2.7  b. 3.9  c. 2.8  d. 3.1  e. 2.2
3–99. A final examination for a psychology course has a mean of 84 and a standard deviation of 4. Find the corresponding z score for each raw score.
a. 87  b. 79  c. 93  d. 76  e. 82
3–100. An aptitude test has a mean of 220 and a standard deviation of 10. Find the corresponding z score for each exam score.
a. 200  b. 232  c. 218  d. 212  e. 225
3–101. Which of the following exam grades has a better relative position?
a. A grade of 43 on a test with X̄ = 40 and s = 3.
b. A grade of 75 on a test with X̄ = 72 and s = 5.
3–102. A student scores 60 on a mathematics test that has a mean of 54 and a standard deviation of 3, and she scores 80 on a history test with a mean of 75 and a standard deviation of 2. On which test did she do better than the rest of the class?
3–103. Which score indicates the highest relative position?
a. A score of 3.2 on a test with X̄ = 4.6 and s = 1.5.
b. A score of 630 on a test with X̄ = 800 and s = 200.
c. A score of 43 on a test with X̄ = 50 and s = 5.
3–104. The following distribution represents the data for weights of fifth-grade boys. Find the approximate weights corresponding to each percentile given by constructing a percentile graph.
3–105. For the data in Exercise 3–104, find the approximate percentile ranks of the following weights.
a. 57 pounds  b. 62 pounds  c. 64 pounds  d. 59 pounds
3–106. (ans) The data below represent the scores on a national achievement test for a group of tenth-grade students. Find the approximate percentile ranks of the following scores by constructing a percentile graph.
a. 220  b. 245  c. 276  d. 280  e. 300
3–107. For the data in Exercise 3–106, find the approximate scores that correspond to the following percentiles.
a. 15th  b. 29th  c. 43rd  d. 65th  e. 80th
3–108. (ans) The airborne speeds in miles per hour of 21 planes are shown next. Find the approximate values that correspond to the given percentiles by constructing a percentile graph.
3–109. Using the data in Exercise 3–108, find the approximate percentile ranks of the following miles per hour.
a. 380 mph  b. 425 mph  c. 455 mph  d. 505 mph  e. 525 mph
3–110. Find the percentile ranks of each weight in the data set. The weights are in pounds.
78, 82, 86, 88, 92, 97
3–111. In Exercise 3–110, what value corresponds to the 30th percentile?
3–112. Find the percentile rank for each test score in the data set.
12, 28, 35, 42, 47, 49, 50
3–113. In Exercise 3–112, what value corresponds to the 60th percentile?
3–114. Find the percentile rank for each test score in the data set.
5, 12, 15, 16, 20, 21
3–115. What test score in Exercise 3–114 corresponds to the 33rd percentile?
3–116. Using the procedure shown in Example 3–37, check each data set for outliers.
a. 16, 18, 22, 19, 3, 21, 17, 20
b. 24, 32, 54, 31, 16, 18, 19, 14, 17, 20
c. 321, 343, 350, 327, 200
d. 88, 72, 97, 84, 86, 85, 100
e. 145, 119, 122, 118, 125, 116
f. 14, 16, 27, 18, 13, 19, 36, 15, 20
*3–117. Another measure of average is called the midquartile; it is the numerical value halfway between Q1 and Q3, and the formula is

$$\text{midquartile} = \frac{Q_1 + Q_3}{2}$$

Using this formula and other formulas, find Q1, Q2, Q3, the midquartile, and the interquartile range for each data set.
a. 5, 12, 16, 25, 32, 38
b. 53, 62, 78, 94, 96, 99, 103
Technology Step by Step

MINITAB Step by Step
Finding the Mean and Standard Deviation
Example MT3–1
1. Type the data from Example 3–39 (in the following section) into C1 of MINITAB. Name the
3. The cursor will be blinking in the Variables text box. Double-click C1.
4. Click [OK].
The results will be displayed in the Session Window as shown. The column label “CARS-THEFT” is truncated to 8 letters in the display. The standard deviation shown is the sample standard deviation, s (computed with n − 1 in the denominator). The trimmed mean or TrMean is the mean for the data after the lowest and highest 5% are discarded. If the trimmed mean is different from the mean, there may be outliers.
Figure: Session Window with Descriptive Statistics

TI-83 Step by Step
To calculate various descriptive statistics:
1. Enter data into L1.
2. Press STAT to get the menu.
3. Press ▶ to move the cursor to CALC; then press 1 for 1-Var Stats.
4. Press 2nd [L1], then ENTER.
The calculator will display
x̄ sample mean
Σx sum of the data values
Σx² sum of the squares of the data values
Sx sample standard deviation
σx population standard deviation
n number of data values
minX smallest data value
Q1 lower quartile
Med median
Q3 upper quartile
maxX largest data value

Example TI3–1
Find the various descriptive statistics for the auto sales data from Example 3–23:
11.2 11.9 12.0 12.8 13.4 14.3
Following the steps above, we obtain the following results, as shown on the screen:
The mean is 12.6.
The sum is 75.6.
The sum of x² is 958.94.
The sample standard deviation Sx is 1.1296017.
The population standard deviation σx is 1.031180553.
The sample size n is 6.
The smallest data value is 11.2.
Q1 is 11.9.
The median is 12.4.
Q3 is 13.4.
The largest data value is 14.3.
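For readers without a TI-83, the same summary can be sketched with Python's standard `statistics` module. The dictionary keys mimic the calculator's labels but are our own choice. Quartiles use the median-of-halves method from this section, which reproduces Q1 = 11.9 and Q3 = 13.4 for these data.

```python
from statistics import mean, median, pstdev, stdev

def one_var_stats(data):
    """Rough Python analogue of the TI-83 1-Var Stats screen (key names are ours)."""
    ordered = sorted(data)
    n, half = len(ordered), len(ordered) // 2
    return {
        "mean": mean(ordered),
        "sum": sum(ordered),
        "sum_sq": sum(x * x for x in ordered),
        "Sx": stdev(ordered),                  # sample standard deviation (n - 1)
        "sigma_x": pstdev(ordered),            # population standard deviation (n)
        "n": n,
        "minX": ordered[0],
        "Q1": median(ordered[:half]),          # median of the lower half
        "Med": median(ordered),
        "Q3": median(ordered[half + n % 2:]),  # median of the upper half
        "maxX": ordered[-1],
    }

stats = one_var_stats([11.2, 11.9, 12.0, 12.8, 13.4, 14.3])  # Example TI3–1
for name, value in stats.items():
    print(name, round(value, 7))
```

Compare the printed values with the calculator results listed above; small differences in the last digits can come from floating-point rounding.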
Excel Step by Step
Finding the Central Tendency
Example XL3–1
To find the mean, mode, and median of a data set:
1. Enter the numbers in a range of cells (here shown as the numbers in cells A2 to A12). We use the data from Example 3–11 on stopping distances:
15 18 18 18 20 22 24 24 24 26 26
2. For the mean, enter =AVERAGE(A2:A12) in a blank cell.
3. For the mode, enter =MODE(A2:A12) in a blank cell.
4. For the median, enter =MEDIAN(A2:A12) in a blank cell.
These three functions are available from the standard toolbar by clicking the fx icon and scrolling down the list of statistical functions. Note: for distributions that are bimodal, like this one, the Excel MODE function reports the first mode only. A better practice is to use the Histogram routine from the Data Analysis Add-In, which reports actual counts in a table.
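As a point of comparison with Excel's MODE, Python's `statistics.multimode` (available in Python 3.8 and later) returns every mode, so the bimodal stopping-distance data report both 18 and 24. This is a sketch using only the standard library.

```python
from statistics import mean, median, multimode

distances = [15, 18, 18, 18, 20, 22, 24, 24, 24, 26, 26]  # Example 3–11 data

print(mean(distances))       # about 21.36
print(median(distances))     # 22
print(multimode(distances))  # [18, 24]
```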
Finding Measures of Variation
Example XL3–2
To find values that estimate the spread of a distribution of numbers:
1. Enter the numbers in a range (here A1:A6). We use the data from Example 3–23 on European automobile sales.
2. For the sample variance, enter =VAR(A1:A6) in a blank cell.
3. For the sample standard deviation, enter =STDEV(A1:A6) in a blank cell.
4. For the range, you can compute the value =MAX(A1:A6)-MIN(A1:A6).
There are also functions STDEVP for population standard deviation and VARP for population variance.

Descriptive Statistics in Excel
Example XL3–3
Excel’s Data Analysis options include an item called Descriptive Statistics that reports all the standard measures of a data set.
1. Enter the data set shown (9 numbers) in column A of a new worksheet.
12 17 15 16 16 14 18 13 10
2. Select Tools>Data Analysis.
3. Use this data (A1:A9) as the Input Range in the Descriptive Statistics dialog box.
4. Check the Summary statistics option, and click [OK].
Figure: Descriptive Statistics Dialog Box
Here’s the summary output for this data set. Note that this one operation reports most of the statistics used in this chapter.
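A comparable summary can be sketched in Python for the nine values of Example XL3–3. The labels mirror several lines of Excel's Descriptive Statistics output, but the function is ours and covers only a subset (kurtosis and skewness are omitted). "Standard Error" is computed as s/√n.

```python
from math import sqrt
from statistics import mean, median, mode, stdev, variance

def summary(data):
    """A subset of the measures in a descriptive statistics summary (labels chosen to echo Excel's)."""
    s = stdev(data)
    return {
        "Mean": mean(data),
        "Standard Error": s / sqrt(len(data)),
        "Median": median(data),
        "Mode": mode(data),          # first mode found; this data set has only one
        "Standard Deviation": s,     # sample standard deviation
        "Sample Variance": variance(data),
        "Range": max(data) - min(data),
        "Minimum": min(data),
        "Maximum": max(data),
        "Sum": sum(data),
        "Count": len(data),
    }

for label, value in summary([12, 17, 15, 16, 16, 14, 18, 13, 10]).items():
    print(label, value)
```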
3–5 Exploratory Data Analysis
Objective 4. Use the techniques of exploratory data analysis, including stem and leaf plots, boxplots, and five-number summaries to discover various aspects of data.

In traditional statistics, data are organized by using a frequency distribution. From this distribution various graphs such as the histogram, frequency polygon, and ogive can be constructed to determine the shape or nature of the distribution. In addition, various statistics such as the mean and standard deviation can be computed to summarize the data.

The purpose of traditional analysis is to confirm various conjectures about the nature of the data. For example, from a carefully designed study, a researcher might want to know if the proportion of Americans who are exercising today has increased from 10 years ago. This study would contain various assumptions about the population, various definitions such as exercise, and so on.

In exploratory data analysis (EDA), data are organized using a stem and leaf plot. The summary statistics used are the median and interquartile range. Finally, a boxplot can be constructed to determine visually the nature of the distribution. The purpose of exploratory data analysis is to examine data in order to find out what information can be discovered. For example, are there any gaps in the data? Can any patterns be discerned? Here the researcher starts out with few or no assumptions.

Exploratory data analysis was developed by John Tukey and presented in his book Exploratory Data Analysis (Addison-Wesley, 1977).

The stem and leaf plot is a method of organizing data and is a combination of sorting and graphing. It has the advantage over a grouped frequency distribution of retaining the actual data while showing them in graphic form.

A stem and leaf plot is a data plot that uses part of a data value as the stem and part of the data value as the leaf to form groups or classes.
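A minimal text stem and leaf plot can be sketched in Python. This assumes nonnegative whole numbers, with the tens part as the stem and the units digit as the leaf; the function name is ours, and other stem choices are possible for other data. Empty stems are kept so that gaps in the data remain visible.

```python
from collections import defaultdict

def stem_and_leaf(data):
    """Stem and leaf plot for nonnegative integers: tens part is the stem, units digit is the leaf."""
    groups = defaultdict(list)
    for x in sorted(data):
        groups[x // 10].append(x % 10)
    lines = []
    for stem in range(min(groups), max(groups) + 1):  # include empty stems to show gaps
        leaves = " ".join(str(leaf) for leaf in groups.get(stem, []))
        lines.append(f"{stem} | {leaves}".rstrip())
    return "\n".join(lines)

print(stem_and_leaf([5, 6, 12, 13, 15, 18, 22, 50]))  # data from Example 3–36
```

For these data the empty stems 3 and 4 make the gap before the suspected outlier 50 easy to see.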