Excursions in Modern Mathematics, 7e: 14.1 - 1 Copyright © 2010 Pearson Education, Inc.
14 Descriptive Statistics
14.1 Graphical Descriptions of Data
14.2 Variables
14.3 Numerical Summaries
14.4 Measures of Spread
A data set is a collection of data values.
Statisticians often refer to the individual
data values in a data set as data points.
For the sake of simplicity, we will work with
data sets in which each data point consists
of a single number, but in more complicated
settings, a single data point can consist of
many numbers.
Data Set
• Use the letter N to represent the size of
the data set. In real-life applications,
data sets can range in size from
reasonably small (a dozen or so data
points) to very large (hundreds of millions
of data points), and the larger the data
set is, the more we need a good way to
describe and summarize it.
Data Set
The day after the midterm exam in his Stat 101 class, Dr. Blackbeard posted the results online. The data set consists of N = 75 data points (the number of students who took the test). Each data point (listed in the second column) is a score between 0 and 25 (Dr. Blackbeard gives no partial credit). Notice that the numbers listed in the first column are not data points–they are numerical IDs used as substitutes for names to protect the students’ right to privacy.
Example 14.1 Stat 101 Test Scores
Example 14.1 Stat 101 Test Scores
How do we package the results into a compact, organized, and intelligible whole?
Example 14.1 Stat 101 Test Scores
The first step in summarizing the information in Table 14-1 is to organize the scores in a frequency table such as Table 14-2. In this table, the number below each score gives the frequency of the score–that is, the number of students getting that particular score.
Example 14.2 Stat 101 Test Scores:
Part 2
We can readily see from Table 14-2 that there was one student with a score of 1, one with a score of 6, two with a score of 7, six with a score of 8, and so on. Notice that the scores with a frequency of zero are not listed in the table.
Example 14.2 Stat 101 Test Scores:
Part 2
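As a rough sketch of how a frequency table like Table 14-2 could be built from raw scores, the snippet below counts how often each score occurs. The list `scores` and the helper name `frequency_table` are illustrative assumptions; the values are made up and are not the Table 14-1 data.

```python
from collections import Counter

# Hypothetical scores for illustration only; these are NOT the Table 14-1 data.
scores = [8, 9, 9, 10, 10, 10, 11, 24]

def frequency_table(values):
    """Return (value, frequency) pairs sorted by value.

    Values that never occur are simply absent, just as scores with a
    frequency of zero are not listed in a frequency table.
    """
    return sorted(Counter(values).items())

for score, freq in frequency_table(scores):
    print(score, freq)
```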
• Figure 14-1 (next slide) shows the same information in a much more visual way called a bar graph, with the test scores listed in increasing order on a horizontal axis and the frequency of each test score displayed by the height of the column above that test score.
• Notice that in the bar graph, even the test scores with a frequency of zero show up–there simply is no column above these scores.
Example 14.2 Stat 101 Test Scores:
Part 2
Figure 14-1
Example 14.2 Stat 101 Test Scores:
Part 2
• Bar graphs are easy to read, and they are a nice way to present a good general picture of the data.
• Outliers are extreme data points that do not fit into the overall pattern of the data. In this example there are two obvious outliers–the score of 24 (head and shoulders above the rest of the class) and the score of 1 (lagging way behind the pack).
Example 14.2 Stat 101 Test Scores:
Part 2
• Sometimes it is more convenient to express the bar graph in terms of relative frequencies–that is, the frequencies expressed as percentages of the total population.
• Figure 14-2 shows a relative frequency bar graph for the Stat 101 data set. Notice that we indicated on the graph that we are dealing with percentages rather than total counts and that the size of the data set is N = 75.
Example 14.2 Stat 101 Test Scores:
Part 2
Figure 14-2
Example 14.2 Stat 101 Test Scores:
Part 2
• This allows anyone who wishes to do so to compute the actual frequencies. For example, Fig. 14-2 indicates that 12% of the 75 students scored a 12 on the exam, so the actual frequency is given by 75 × 0.12 = 9 students.
• The change from actual frequencies to percentages (or vice versa) does not change the shape of the graph–it is basically a change of scale.
Example 14.2 Stat 101 Test Scores:
Part 2
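The change of scale between actual and relative frequencies can be sketched in a few lines. The function names `to_percent` and `to_count` are hypothetical helpers, and the only numbers used are the ones quoted above (N = 75, 12%, 9 students).

```python
def to_percent(count, n):
    """Relative frequency, as a percentage of a data set of size n."""
    return 100 * count / n

def to_count(percent, n):
    """Actual frequency recovered from a percentage and the data set size n."""
    # round() guards against tiny floating-point errors in the division
    return round(n * percent / 100)

print(to_percent(9, 75))  # 12.0
print(to_count(12, 75))   # 9
```

Because both directions only multiply or divide by a constant, the shape of the bar graph is unchanged.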
• Page 545, problem 6
Examples
• Frequency charts that use icons or
pictures instead of bars to show the
frequencies are commonly referred to as
pictograms.
Bar Graph versus Pictogram
• The point of a pictogram is that a graph
is often used not only to inform but also
to impress and persuade, and, in such
cases, a well-chosen icon or picture can
be a more effective tool than just a bar.
• Here’s a pictogram displaying the same
data as in Figure 14-2.
Bar Graph versus Pictogram
Figure 14-3
Bar Graph versus Pictogram
This figure is a pictogram showing the growth
in yearly sales of the XYZ Corporation
between 2001 and 2006. It’s a good picture to
show at a shareholders’ meeting, but the picture
is actually quite misleading.
Example 14.3 Selling the XYZ
Corporation
This figure shows a pictogram for exactly the
same data with a much more accurate and
sobering picture of how well the XYZ
Corporation had been doing.
Example 14.3 Selling the XYZ
Corporation
The difference between the two pictograms
can be attributed to a couple of standard
tricks of the trade: (1) stretching the scale of
the vertical axis and (2) “cheating” on the
choice of starting value on the vertical axis.
As an educated consumer, you should always
be on the lookout for these tricks. In graphical
descriptions of data, a fine line separates
objectivity from propaganda.
Example 14.3 Selling the XYZ
Corporation
14.2 Variables
• A variable is any characteristic that varies
with the members of a population.
• A variable that represents a measurable
quantity is called a numerical (or
quantitative) variable.
Variable
• Example: the students in Dr. Blackbeard’s
Stat 101 course (the population) did not
all perform equally on the exam (see
Section 14.1). Thus, the test score is a
variable, which in this particular case is a
whole number between 0 and 25.
Variable
In some instances, such as when the
instructor gives partial credit, a test score
may take on a fractional value, such as 18.5
or 18.25. Even in these cases, however, the
possible increments for the values of the
variable are given by some minimum
amount–a quarter-point, a half-point,
whatever. In these cases, the variable is a
discrete variable.
Variable
• In contrast to a discrete variable, the amount of time each student studied for the exam is a continuous variable. In this case the variable can take on values that differ by any amount: an hour, a minute, a second, a tenth of a second, and so on.
Variable
• When the difference between the values of a numerical variable can be arbitrarily small, we call the variable continuous (person’s height, weight, foot size, time it takes to run one mile);
• when possible values of the numerical variable change by minimum increments, the variable is called discrete (person’s IQ, SAT score, shoe size, score of a basketball game).
Numerical Variable
Variables can also describe characteristics that cannot be measured numerically: nationality, gender, hair color, and so on. Variables of this type are called categorical (or qualitative) variables.
Categorical Variable
• In some ways, categorical variables must be treated differently from numerical variables–they cannot, for example, be added, multiplied, or averaged.
• In other ways, categorical variables can be treated much like discrete numerical variables, particularly when it comes to graphical descriptions, such as bar graphs and pictograms.
Categorical Variable
Table 14-3 shows undergraduate enrollments
in each of the five schools at Tasmania State
University. A sixth category (“other”) includes
undeclared students, interdisciplinary majors,
and so on. The variable “school” is a categorical
variable.
Example 14.4 Enrollments at Tasmania
State University
Vertical and horizontal bar graphs displaying the data in Table 14-3.
Example 14.4 Enrollments at Tasmania
State University
When the number of categories is small, as is the case here, another common way to describe the relative frequencies of the categories is by using a pie chart. In a pie chart the “pie” represents the entire population (100%), and the “slices” represent the categories (or classes), with the size (angle) of each slice being proportional to the relative frequency of the corresponding category.
Example 14.4 Enrollments at Tasmania
State University
This figure shows an accurate pie chart for the school-enrollment data given in Table 14-3.
Example 14.4 Enrollments at Tasmania
State University
• Page 546, problem 14(a)
Examples
• When it comes to deciding how best to display graphically the frequencies of a population, a critical issue is the number of categories into which numerical (quantitative) data can fall.
• When the number of categories is too big (say, in the dozens), a bar graph or pictogram can become muddled and ineffective.
How Many Categories
The SAT consists of three sections: a math
section, a writing section, and a critical
reading section, with the scores for each
section ranging from a minimum of 200 to a
maximum of 800 and going up in increments
of 10 points. In 2007, there were 1,494,531
college-bound seniors who took the SAT.
How do we describe the math section results
for this group of students?
Example 14.6 2007 SAT Math Scores
We could set up a frequency table (or a bar
graph) with the number of students scoring
each of the possible scores–
200, 210, 220,… ,790, 800
The problem is that there are 61 different
possible scores between 200 and 800, and
this number is too large for an effective bar
graph.
Example 14.6 2007 SAT Math Scores
• In this case the data can be grouped
together, or aggregated, into sets of scores
called class intervals.
• The decision as to how the class intervals
are defined and how many there are will
depend on how much or how little detail is
desired, but as a general rule of thumb, the
number of class intervals should be
somewhere between 5 and 20.
Example 14.6 2007 SAT Math Scores
SAT scores are usually
aggregated into 12 class
intervals of essentially the
same size:
200–249,
250–299,
300–349,
…,
700–749,
750–800.
Example 14.6 2007 SAT Math Scores
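The grouping into 12 class intervals can be sketched as a small function that maps a score to its interval. The name `sat_class_interval` is a hypothetical helper; the interval boundaries follow the list above, with the last interval (750–800) slightly wider so that it absorbs the top score.

```python
def sat_class_interval(score):
    """Return (lower, upper) bounds of the class interval containing score."""
    if not 200 <= score <= 800:
        raise ValueError("SAT section scores range from 200 to 800")
    # Intervals start every 50 points; a score of 800 is folded into 750-800.
    lower = min(200 + 50 * ((score - 200) // 50), 750)
    upper = 800 if lower == 750 else lower + 49
    return lower, upper

# The 61 possible scores collapse into 12 class intervals.
possible_scores = range(200, 801, 10)
intervals = sorted({sat_class_interval(s) for s in possible_scores})
print(len(possible_scores), len(intervals))  # 61 12
```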
Here is the associated bar graph.
Example 14.6 2007 SAT Math Scores
• Page 546,
problem 7
Example
• Page 546, problem 7
Solution:
Examples
When a numerical variable is continuous, its
possible values can vary by infinitesimally
small increments. As a consequence, there
are no gaps between the class intervals,
and our old way of doing things (using
separated columns or stacks) will no longer
work. In this case we use a variation of a
bar graph called a histogram.
Histogram
Suppose we want to use a graph to display the distribution of starting salaries for last year’s graduating class at Tasmania State University.
The starting salaries of the N = 3258 graduates range from a low of $40,350 to a high of $74,800.
Example 14.8 Starting Salaries of TSU
Graduates
Based on this range and the amount of detail we want to show, we must decide on the length of the class intervals. A reasonable choice would be to use class intervals defined in increments of $5000.
Example 14.8 Starting Salaries of TSU
Graduates
Example 14.8 Starting Salaries of TSU
Graduates
• The superscript “plus” marks in Table 14-6
indicate how we chose to deal with the
endpoints in Fig. 14-11.
• 45,000+–50,000 indicates numbers greater
than 45,000 but less than or equal to
50,000
Example 14.8 Starting Salaries of TSU
Graduates
• A starting salary of exactly $50,000, for
example, would be listed under the
45,000+–50,000 class interval rather than
the 50,000+–55,000 class interval.
Example 14.8 Starting Salaries of TSU
Graduates
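The endpoint convention can be made concrete with a short sketch. It assumes class intervals of width $5000 aligned to multiples of 5000 (consistent with the 45,000+–50,000 example) and whole-dollar salaries; `salary_class_interval` is a hypothetical helper name.

```python
def salary_class_interval(salary, width=5000):
    """Return (lower, upper) such that lower < salary <= upper.

    The upper endpoint belongs to the interval and the lower one does not,
    matching the "plus" notation: 45,000+-50,000 means 45,000 < x <= 50,000.
    """
    upper = -(-salary // width) * width  # round up to a multiple of width
    return upper - width, upper

print(salary_class_interval(50_000))  # (45000, 50000)
print(salary_class_interval(50_001))  # (50000, 55000)
```

With this rule the lowest salary ($40,350) and the highest ($74,800) fall in the first and last of seven intervals.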
Here is the histogram
showing the relative
frequency of each
class interval. As we
can see, a histogram
is very similar to a bar
graph.
Example 14.8 Starting Salaries of TSU
Graduates
• The differences between a bar graph and a
histogram are:
– Unlike a bar graph, a histogram is used for
continuous variables and there can be no
gaps between the class intervals
– Unlike a bar graph, the bars of a histogram
touch each other
Example 14.8 Starting Salaries of TSU
Graduates
When creating histograms, we should try,
as much as possible, to define class
intervals of equal length.
Use Class Intervals of Equal Length
• Page 547, problem 21
Examples
14.3 Numerical Summaries
Measures of Location
Measures of location such as the mean (or
average), the median, and the quartiles,
are numbers that provide information about
the values of the data.
Numerical Summaries of a Data Set
The best known of all numerical summaries
of data is the average, also called the mean.
There is no universal agreement as to which
of these names is a better choice–in some
settings mean is a better choice than
average, in other settings it’s the other way
around. In this chapter we will use
whichever seems the better choice at the
moment.
Average or Mean
The average (or mean) of a set of N
numbers is found by adding the numbers
and dividing the total by N. In other words,
the average of the numbers
d1, d2, d3,…, dN
is
A = (d1 + d2 + d3 +…+ dN)/N.
Average or Mean
• Page 548, problem 24(a)
Examples
• Page 548, problem 24(a)
• Solution: 0.1625
Examples
In this example we will find the average test
score in the Stat 101 exam first introduced in
Example 14.1. To find this average we need
to add all the test scores and divide by 75.
Test scores are given in Table 14-1 (next
slide)
Example 14.9 Stat 101 Test Scores:
Part 4
Example 14.1 Stat 101 Test Scores
The addition of the 75 test scores can be
simplified considerably if we use a frequency
table.
Example 14.9 Stat 101 Test Scores:
Part 4
From the frequency table we can find the sum
S of all the test scores as follows: Multiply
each test score by its corresponding
frequency and then add these products. Thus,
the sum of all the test scores is
S = (1 × 1) + (6 × 1) + (7 × 2) + (8 × 6) + … +
(16 × 1) + (24 × 1) = 814
If we divide this sum by N = 75, we get the
average test score A = 814/75 ≈ 10.85 points.
Example 14.9 Stat 101 Test Scores:
Part 4
In general, to find the average A of a data
set given by a frequency table such as
Table 14-8 we do the following:
Step 1. S = d1•f1 + d2•f2 + … + dk•fk
Step 2. N = f1 + f2 + … + fk
Step 3. A = S/N
To Find the Average
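The three steps translate almost directly into code. This is a minimal sketch: `average_from_frequency_table` is a hypothetical helper, and the small table is made-up data for illustration, not Table 14-8.

```python
def average_from_frequency_table(table):
    """table is a list of (data value d, frequency f) pairs."""
    s = sum(d * f for d, f in table)  # Step 1: S = d1*f1 + ... + dk*fk
    n = sum(f for _, f in table)      # Step 2: N = f1 + ... + fk
    return s / n                      # Step 3: A = S/N

# Hypothetical frequency table: two 7's, six 8's, two 9's.
table = [(7, 2), (8, 6), (9, 2)]
print(average_from_frequency_table(table))  # 8.0
```

An empty table has N = 0, so the sketch raises `ZeroDivisionError` rather than returning a meaningless average.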
• Page 548, problem 29(a)
Examples
• Page 548, problem 29(a)
• Solution: 1.5875
Examples
This example (next slide) shows that the average can be a misleading statistic.
Example 14.10 Starting Salaries of
Philosophy Majors
The average starting salary of philosophy majors who recently graduated from Tasmania State University is $76,400 a year! But, one of the graduating philosophy majors happens to be basketball star “Hoops” Tallman, who is doing his thing in the NBA for a starting salary of $3.5 million a year.
Example 14.10 Starting Salaries of
Philosophy Majors
If we were to take this one outlier out of the population of 75 philosophy majors, we would have a more realistic picture of what philosophy majors are making.
Example 14.10 Starting Salaries of
Philosophy Majors
■ The total of the other 74 salaries (excluding
Hoops’s cool 3.5 mill) is $2,230,000
■ The average of the remaining 74 salaries is
then
$2,230,000/74 ≈ $30,135
Example 14.10 Starting Salaries of
Philosophy Majors
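The arithmetic behind this example can be checked with a short sketch that works backward from the reported average. The function name `average_without_outlier` is a hypothetical helper; the figures (75 majors, a $76,400 average, one $3.5 million salary) are the ones quoted in the example.

```python
def average_without_outlier(n, average, outlier):
    """Average of the remaining n - 1 values after removing one outlier."""
    total = n * average             # sum of all n values
    remaining = total - outlier     # sum of the other n - 1 values
    return remaining / (n - 1)

total_all = 75 * 76_400             # 5,730,000
total_rest = total_all - 3_500_000  # 2,230,000
print(total_all, total_rest)
print(round(average_without_outlier(75, 76_400, 3_500_000)))  # 30135
```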
Let p be any integer between 0 and 100.
The pth percentile of a data set is a value
for which p percent of the values in the data
set are less than or equal to this value.
Percentile
First, sort the data from small to large.
If you are finding the pth percentile of a data set of size N, calculate p percent of N, which is the position locator L:
L = (p/100) × N
Calculate pth percentile
If L is an integer, the pth percentile is the
average of the data values in positions L
and L+1.
If L is not an integer, round up and use the
value in this position as the pth percentile.
Calculate pth percentile
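The locator rule can be sketched as a short function. The name `percentile` is a hypothetical helper, and the sketch restricts p to whole numbers from 1 to 99 so that positions L and L + 1 always exist. It tests whether L is whole using integer arithmetic, because a floating-point product such as 0.8 × 15 may not come out as exactly 12.

```python
def percentile(data, p):
    """pth percentile of data using the locator rule L = (p/100) * N."""
    if not 1 <= p <= 99:
        raise ValueError("this sketch handles whole p from 1 to 99")
    values = sorted(data)            # sort from small to large
    n = len(values)
    if (p * n) % 100 == 0:           # L is a whole number
        L = p * n // 100
        # average the values in positions L and L + 1 (1-based positions)
        return (values[L - 1] + values[L]) / 2
    L = -(-p * n // 100)             # L is not whole: round up
    return values[L - 1]

twelve = [53, 37, 38, 50, 65, 44, 47, 39, 36, 57, 44, 69]
print(percentile(twelve, 25), percentile(twelve, 75))  # 38.5 55.0
```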
Example of Finding Percentile
• Find the 25th and 75th percentiles
53 37 38 50 65 44 47 39 36 57 44 69
Example of Finding Percentile
• Sort the data from small to large:
36 37 38 39 44 44 47 50 53 57 65 69
There are 12 data values.
Example of Finding Percentile
25% of 12:
L = (25/100) × 12 = 3
Example of Finding Percentile
• The data can be grouped as follows:
36 37 38 39 44 44 47 50 53 57 65 69
25% of the data is less than or equal to 38.5
(the average of 38 and 39).
The 25th percentile is 38.5
3rd position
Example of Finding Percentile
75% of 12:
L = (75/100) × 12 = 9
Example of Finding Percentile
36 37 38 39 44 44 47 50 53 57 65 69
75% of the data is less than or equal to 55
(the average of 53 and 57).
The 75th percentile is 55
9th position
Example of Finding Percentile
• Find the 25th and 75th percentiles
44 50 39 36 47 38 65
Example of Finding Percentile
• First sort the data
36 38 39 44 47 50 65
There are 7 data values.
Example of Finding Percentile
25% of 7:
L = (25/100) × 7 = 1.75
1.75 rounds up to 2
Example of Finding Percentile
36 38 39 44 47 50 65
2nd position
The 25th percentile is 38
Approximately 25% of the data is less than or equal to 38
Example of Finding Percentile
75% of 7:
L = (75/100) × 7 = 5.25
5.25 rounds up to 6
Example of Finding Percentile
36 38 39 44 47 50 65
6th position
The 75th percentile is 50
Approximately 75% of the data is less than or equal to 50
To reward good academic performance from
its athletes, Tasmania State University has a
program in which athletes with GPAs in the
top 20th percentile of their team’s GPAs get a
$5000 scholarship and athletes with GPAs in
the top 45th percentile of their team’s
GPAs who did not get the $5000 scholarship
get a $2000 scholarship.
Example 14.12 Scholarships by
Percentiles
The women’s soccer team has N = 15
players. A list of their GPAs is as follows:
3.42, 3.91, 3.33, 3.65, 3.57, 3.45, 4.0, 3.71,
3.35, 3.82, 3.67, 3.88, 3.76, 3.41, 3.62
When we sort these GPAs we get the list
3.33, 3.35, 3.41, 3.42, 3.45, 3.57, 3.62, 3.65,
3.67, 3.71, 3.76, 3.82, 3.88, 3.91, 4.0
Example 14.12 Scholarships by
Percentiles
Since this list goes from lowest to highest
GPA, we are looking for the 80th percentile
and above (top 20th percentile) for the $5000
scholarships and the 55th percentile and
above (top 45th percentile) for the $2000
scholarships.
Example 14.12 Scholarships by
Percentiles
$5000 scholarships: The locator for the 80th
percentile is (0.8) × 15 = 12. Here the locator
is a whole number, so the 80th percentile is
given by 3.85 (the average of 3.82 and
3.88). Thus, three students (the ones with
GPAs of 3.88, 3.91 and 4.0) get $5000
scholarships.
Example 14.12 Scholarships by
Percentiles
$2000 scholarships: The locator for the 55th
percentile is (0.55) × 15 = 8.25. Here the
locator is not a whole number, so we round it
up to 9, and the 55th percentile is given by
d9 = 3.67. Thus, the students with GPAs of
3.67, 3.71, 3.76 and 3.82 (all students with
GPAs of 3.67 or higher except the ones that
already received $5000 scholarships) get
$2000 scholarships.
Example 14.12 Scholarships by
Percentiles
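The scholarship assignment can be reproduced with a brief sketch. It uses the two locators worked out in the example (12 and 9) and assumes that a GPA at or above a cutoff qualifies; the variable names are illustrative.

```python
gpas = [3.42, 3.91, 3.33, 3.65, 3.57, 3.45, 4.0, 3.71,
        3.35, 3.82, 3.67, 3.88, 3.76, 3.41, 3.62]
ranked = sorted(gpas)  # lowest to highest, positions 1..15

# 80th percentile: locator 12 is whole, so average positions 12 and 13.
cut_5000 = (ranked[11] + ranked[12]) / 2
# 55th percentile: locator 8.25 rounds up to 9, so take position 9.
cut_2000 = ranked[8]

winners_5000 = [g for g in ranked if g >= cut_5000]
winners_2000 = [g for g in ranked if cut_2000 <= g < cut_5000]

print(winners_5000)  # [3.88, 3.91, 4.0]
print(winners_2000)  # [3.67, 3.71, 3.76, 3.82]
```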
• The 50th percentile of a data set is
known as the median and denoted by M.
• The median splits a data set into two
halves–half of the data is at or below the
median and half of the data is at or
above the median.
Median
First, sort the data from small to large.
The median is the 50th percentile, so the locator is 50 percent of N:
L = (50/100) × N = 0.50 × N = N/2
Calculate Median
If L is an integer, the median is the average
of the data values in positions L and L+1.
If L is not an integer, round up and use the
value in this position as the median.
Calculate Median
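For the median, the locator rule agrees with the usual definition: the middle value when N is odd, and the average of the two middle values when N is even. As a quick check, the sketch below uses `statistics.median` from Python's standard library on the two data sets used in the median examples.

```python
from statistics import median

even_sized = [53, 37, 38, 50, 65, 44, 47, 39, 36, 57, 44, 69]  # N = 12
odd_sized = [44, 50, 39, 36, 47, 38, 65]                       # N = 7

# N = 12: L = 6 is whole, so average the 6th and 7th sorted values (44, 47).
print(median(even_sized))  # 45.5
# N = 7: L = 3.5 rounds up to 4, so take the 4th sorted value.
print(median(odd_sized))   # 44
```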
Example of Finding Median
• Find the median
53 37 38 50 65 44 47 39 36 57 44 69
• Sort the data from small to large:
36 37 38 39 44 44 47 50 53 57 65 69
There are 12 data values.
Example of Finding Median
50% of 12:
L = (50/100) × 12 = 6
Example of Finding Median
• The data can be grouped as follows:
36 37 38 39 44 44 47 50 53 57 65 69
50% of the data is less than or equal to 45.5
(the average of 44 and 47).
The median is 45.5
6th position
Example of Finding Median
• Find the median
44 50 39 36 47 38 65
Example of Finding Median
• First sort the data
36 38 39 44 47 50 65
There are 7 data values.
Example of Finding Median
50% of 7:
L = (50/100) × 7 = 3.5
3.5 rounds up to 4
Example of Finding Median
36 38 39 44 47 50 65
4th position
The median is 44
Approximately 50% of the data is less than or equal to 44
Example of Finding Median
After the median, the next most commonly
used set of percentiles are the first and third
quartiles. The first quartile (denoted by Q1)
is the 25th percentile, and the third quartile
(denoted by Q3) is the 75th percentile.
Quartiles
During the last year, 11 homes sold in the
Green Hills subdivision. The selling prices, in
chronological order, were $267,000,
$252,000, $228,000, $234,000, $292,000,
$263,000, $221,000, $245,000, $270,000,
$238,000, and $255,000. We are going to find
the median and the quartiles of the N = 11
home prices.
Example 14.13 Home Prices in Green
Hills
Sorting the home prices from smallest to
largest (and dropping the 000’s) gives the
sorted list
221, 228, 234, 238, 245, 252, 255, 263, 267,
270, 292
Example 14.13 Home Prices in Green
Hills
The locator for the median is (0.5) × 11 = 5.5,
the locator for the first quartile is
(0.25) × 11 = 2.75, and the locator for the third
quartile is (0.75) × 11 = 8.25. Since these
locators are not whole numbers, they must be
rounded up:
5.5 to 6,
2.75 to 3, and
8.25 to 9.
Example 14.13 Home Prices in Green
Hills
Thus, the median home price is given by
252 (i.e., M = $252,000),
the first quartile is given by
234 (i.e., Q1 = $234,000),
and the third quartile is given by
267 (i.e., Q3 = $267,000).
Example 14.13 Home Prices in Green
Hills
Oops! Just this morning a home sold in Green Hills for $264,000. We need to recalculate the median and quartiles for what are now N = 12 home prices. We can use the sorted data set that we already had–all we have to do is insert the new home price (264) in the right spot (remember, we drop the 000’s!). This gives
221, 228, 234, 238, 245, 252, 255, 263, 264, 267, 270, 292
Example 14.13 Another Home Sells in
Green Hills
Now N = 12 and in this case the median is the
average of 252 and 255. It follows that the
median home price is M = $253,500. The
locator for the first quartile is
(0.25) × 12 = 3. Since the locator is a whole
number, the first quartile is the average of
234 and 238 (i.e., Q1 = $236,000).
Similarly, the third quartile is Q3 = $265,500
(the average of 264 and 267).
Example 14.13 Another Home Sells in
Green Hills
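The locator rule used in both versions of this example can be sketched in Python. This is a minimal sketch, not code from the textbook: the function name percentile and the choice to pass p as a whole-number percent are assumptions made here for illustration. A locator that is not a whole number is rounded up, and a whole-number locator L means averaging the L-th and (L+1)-th values of the sorted list.

```python
def percentile(sorted_data, p):
    """Return the p-th percentile of an already sorted list.

    p is a whole-number percent with 0 < p < 100 (an assumption of this sketch).
    """
    n = len(sorted_data)
    if (p * n) % 100 == 0:
        # Whole-number locator L: average the L-th and (L+1)-th values (1-based).
        loc = (p * n) // 100
        return (sorted_data[loc - 1] + sorted_data[loc]) / 2
    # Otherwise round the locator up and take that single value.
    loc = (p * n + 99) // 100  # integer ceiling of p * n / 100
    return sorted_data[loc - 1]


# Home prices in thousands of dollars, before and after the new sale.
prices_11 = [221, 228, 234, 238, 245, 252, 255, 263, 267, 270, 292]
prices_12 = [221, 228, 234, 238, 245, 252, 255, 263, 264, 267, 270, 292]

for prices in (prices_11, prices_12):
    q1, m, q3 = (percentile(prices, p) for p in (25, 50, 75))
    print(len(prices), q1, m, q3)
# prints: 11 234 252 267
# prints: 12 236.0 253.5 265.5
```

Working with p * N as an integer avoids floating-point surprises when deciding whether the locator is a whole number.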
• We can calculate the median using a
frequency table as the following example
shows.
Calculate median and quartiles using
a Frequency Table
Find the median and quartile scores for the
Stat 101 data set (shown again in Table 14-10).
Having the frequency table available
eliminates the need for sorting the scores–the
frequency table has, in fact, done this for us.
Example 14.14 Stat 101 Test Scores:
Part 5
• Here N = 75 (odd), so the median is the thirty-eighth score (counting from the left) in the frequency table.
• To find the thirty-eighth number in Table 14-10, we tally frequencies as we move from left to right: 1 + 1= 2; 1 + 1 + 2 = 4; 1 + 1 + 2 + 6 = 10; 1 + 1 + 2 + 6 + 10 = 20; 1 + 1 + 2 + 6 + 10 + 16 = 36.
Example 14.14 Stat 101 Test Scores:
Part 5
• The 36th test score on the list is a 10 (the last of the 10’s) and the next 13 scores are all 11’s. We can conclude that the 38th test score is 11. Thus, M = 11.
Example 14.14 Stat 101 Test Scores:
Part 5
• The locator for the first quartile is
L = (0.25)(75) = 18.75, which is rounded up to 19.
• To find the nineteenth score in the
frequency table, we tally frequencies from
left to right: 1 + 1 = 2; 1 + 1 + 2 = 4; 1 + 1 +
2 + 6 = 10; 1 + 1 + 2 + 6 + 10 = 20.
Example 14.14 Stat 101 Test Scores:
Part 5
• The tenth test score is 8 (the last of the 8’s)
and the next ten test scores are all 9.
Hence, the first quartile of the Stat 101
midterm scores is the 19th test score which
is Q1 = 9.
Example 14.14 Stat 101 Test Scores:
Part 5
Similarly, we find the third quartile of the Stat
101 data set is Q3 = 12.
Example 14.14 Stat 101 Test Scores:
Part 5
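The tallying procedure in this example amounts to walking a running total across the frequency table until it reaches the position we want. Here is a minimal Python sketch of that idea. The function name value_at_position is hypothetical, and the table holds only the leading rows whose frequencies are quoted in the tallies above (1, 1, 2, 6, 10, 16, 13). The score labels 1, 6 and 7 for the first three rows are an assumption, since this part of the example names only the 8's through the 11's.

```python
def value_at_position(freq_table, position):
    """Return the value at a 1-based position in the sorted data.

    freq_table is a list of (value, frequency) pairs sorted by value.
    """
    running_total = 0
    for value, frequency in freq_table:
        running_total += frequency  # tally frequencies from left to right
        if position <= running_total:
            return value
    raise ValueError("position exceeds the number of data points in the table")


# Leading rows only; labels 1, 6 and 7 are assumed (see the note above).
partial_table = [(1, 1), (6, 1), (7, 2), (8, 6), (9, 10), (10, 16), (11, 13)]

print(value_at_position(partial_table, 19))  # 19th score (first quartile): 9
print(value_at_position(partial_table, 38))  # 38th score (median): 11
```

Finding Q3 the same way would need the remaining rows of Table 14-10, which are not reproduced here.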
• The five-number summary is
(1) smallest value in the data set (called the Min),
(2) first quartile Q1,
(3) median M,
(4) third quartile Q3, and
(5) the largest value in the data set (called the
Max).
The Five-Number Summary
For the Stat 101 data set, the five-number
summary is Min = 1, Q1 = 9, M = 11, Q3 = 12,
Max = 24. What useful information can we get
out of this?
• The N = 75 test scores were not evenly spread
out over the range of possible scores. For
example, from M = 11 and Q3 = 12 we can
conclude that at least 25% of the class (that
means at least 19 students) scored either 11 or 12
on the test.
Example 14.16 Stat 101 Test Scores:
Part 6
• From Q3 = 12 and Max = 24 we can
conclude that less than one-fourth of the
class (i.e., at most 18 students) had scores
in the 13–24 point range.
• Using similar arguments, we can conclude
that at least 19 students had scores
between Q1 = 9 and M = 11 points and no
more than 18 students scored in the 1–8
point range.
Example 14.16 Stat 101 Test Scores:
Part 6
• A box plot (also known as a box-and-whisker plot) is a picture of the five-number summary of a data set.
• The box plot consists of a rectangular box that sits above a scale and extends from the first quartile Q1 to the third quartile Q3 on that scale.
Box Plots
• A vertical line crosses the box, indicating the position of the median M. On both sides of the box are “whiskers” extending to the smallest value, Min, and largest value, Max, of the data.
Box Plots
This figure shows a generic box plot for a
data set.
Box Plots
This figure shows a box plot for the Stat 101
data set. The long whiskers in this box plot
are largely due to the outliers 1 and 24.
Box Plots
This figure shows a variation of the same
box plot, but with the two outliers, marked
with two crosses, segregated from the rest
of the data.
Box Plots
This figure shows box plots for the starting
salaries of two different populations: first-year
agriculture and engineering graduates of
Tasmania State University.
Example 14.17 Comparing Agriculture
and Engineering Salaries
Superimposing the two box plots on the same scale allows us to make some useful comparisons. It is clear, for instance, that engineering graduates are doing better overall than agriculture graduates, even though at the very top levels agriculture graduates are better paid.
Example 14.17 Comparing Agriculture
and Engineering Salaries
Another interesting point is that the median salary of agriculture graduates ($43,000) is less than the first quartile of the salaries of engineering graduates ($45,000).
Example 14.17 Comparing Agriculture
and Engineering Salaries
The very short whisker on the left side of the agriculture box plot tells us that the bottom 25% of agriculture salaries are concentrated in a very narrow salary range ($32,500–$35,000).
Example 14.17 Comparing Agriculture
and Engineering Salaries
We can also see that agriculture salaries are much more spread out than engineering salaries, even though most of the spread occurs at the higher end of the salary scale.
Example 14.17 Comparing Agriculture
and Engineering Salaries
14 Descriptive Statistics
14.1 Graphical Descriptions of Data
14.2 Variables
14.3 Numerical Summaries
14.4 Measures of Spread
• The difference between the highest and
lowest values of the data in a data set is
called the range of the data set, denoted
by R. Thus,
R = Max – Min
• The range of a data set is a useful piece
of information when there are no outliers
in the data. In the presence of outliers
the range tells a distorted story.
The Range
Example 14.1 Stat 101 Test Scores
The range of the test scores in the Stat 101
exam is 24 – 1 = 23 points.
• The range is sensitive to outliers.
• In the previous example, if we discount
the two outliers, the remaining 73 test
scores would have a much smaller range
of 16 – 6 = 10 points.
The Range
• To eliminate the possible distortion
caused by outliers, a common practice
when measuring the spread of a data set
is to use the interquartile range,
denoted by the acronym IQR.
• The interquartile range is the difference
between the third quartile and the first
quartile
IQR = Q3 – Q1
The Interquartile Range
• The IQR tells us how spread out the
middle 50% of the data values are.
• For many types of real-world data, the
interquartile range is a useful measure of
spread.
The Interquartile Range
For the Stat 101 data set, the five-number
summary is
Min = 1, Q1 = 9, M = 11, Q3 = 12, Max = 24
IQR = Q3 – Q1 = 12 – 9 = 3
The middle 50% of the exam scores differ by
at most 3 points.
Example 14.16 Stat 101 Test Scores
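Both measures of spread seen so far come straight from the five-number summary. The short Python sketch below simply restates the two formulas R = Max – Min and IQR = Q3 – Q1 with the values from this example. The dictionary layout and variable names are assumptions made for illustration.

```python
# Five-number summary of the Stat 101 scores, as stated in the example.
summary = {"Min": 1, "Q1": 9, "M": 11, "Q3": 12, "Max": 24}

data_range = summary["Max"] - summary["Min"]  # R = Max - Min
iqr = summary["Q3"] - summary["Q1"]           # IQR = Q3 - Q1

print(data_range)  # 23
print(iqr)         # 3
```

The range of 23 is driven by the two outliers, while the IQR of 3 describes only the middle 50% of the scores.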
• Page 550, problem 52
• Note: use the solutions from exercise 37
which are that the first quartile is 29 and
the third quartile is 32
Examples
• Page 550, problem 52
• Solution: there are 10 outliers which are 6
ages of 37 and 4 ages of 39.
Examples
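The slides do not restate the rule used to identify these outliers. A common convention, assumed here and not confirmed by this section, is to flag any value more than 1.5 times the IQR below Q1 or above Q3. The Python sketch below applies that assumed rule to the quartiles quoted from exercise 37. The function name outlier_fences and the parameter k are hypothetical.

```python
def outlier_fences(q1, q3, k=1.5):
    """Return (lower, upper) cutoffs under the assumed k * IQR rule."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


# Quartiles quoted from exercise 37.
low, high = outlier_fences(29, 32)
print(low, high)  # 24.5 36.5

# The two ages named in the solution both fall above the upper cutoff.
for age in (37, 39):
    print(age, age < low or age > high)
```

Under this assumption, the ages 37 and 39 are both flagged, which agrees with the stated solution.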
• The most important and most commonly used measure of spread for a data set is the standard deviation.
• The key concept for understanding the standard deviation is the concept of deviation from the mean. If A is the average of the data set and x is an arbitrary data value, the difference x – A is the deviation from the mean for the data value x.
Standard Deviation
• The deviations from the mean tell us how “far” the data values are from the average value of the data.
• We will use this information to figure out how spread out the data is.
Standard Deviation
■ Let A denote the mean of the data set. For each number x in the data set, compute its deviation from the mean (x – A) and square each of these numbers. These numbers are called the squared deviations.
■ Find the average of the squared deviations. This number is called the variance V.
THE STANDARD DEVIATION
OF A DATA SET
■ The standard deviation, denoted by the Greek letter σ, is the square
root of the variance:
σ = √V
THE STANDARD DEVIATION
OF A DATA SET
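Written out as formulas, the steps above read as follows. The subscripted notation x_1, ..., x_N for the individual data values is an assumption introduced here. A, V and N are used as in the slides.

```latex
\[
V = \frac{1}{N}\sum_{i=1}^{N} (x_i - A)^2,
\qquad
\sigma = \sqrt{V}
\]
```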
Over the course of the semester, Angela
turned in all of her homework assignments.
Her grades in the 10 assignments (sorted
from lowest to highest) were
85, 86, 87, 88, 89, 91, 92, 93, 94, 95
Calculate the standard deviation of this data
set.
Example 14.19 Calculation of a SD
First calculate the average of the data set:
85, 86, 87, 88, 89, 91, 92, 93, 94, 95
Adding all of the data values gives 900.
Average = 900/10 = 90
Example 14.19 Calculation of a SD
Next calculate the deviation from the mean of
each data value in the data set. After subtracting 90
from each data value we get:
-5, -4, -3, -2, -1, 1, 2, 3, 4, 5
Next square each of these values to get the
squared deviations:
25, 16, 9, 4, 1, 1, 4, 9, 16, 25
Example 14.19 Calculation of a SD
Next we average the squared deviations to
get the variance:
V = 110/10 = 11
Finally we take the square root of the
variance to get the standard deviation:
σ = √11 ≈ 3.3
Example 14.19 Calculation of a SD
Calculating the deviations
and squared deviations
can be easily summarized in
a table. Then we add the
numbers in the last column
(squared deviations) and
divide by 10 to get the
variance; its square root is
the standard deviation.
Example 14.19
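The whole calculation for Angela's scores can be sketched in a few lines of Python. This is an illustrative sketch, and the function name variance_and_sd is hypothetical. It follows the definition used in these slides, in which the variance is the sum of the squared deviations divided by N, the number of data values.

```python
import math


def variance_and_sd(data):
    """Return (variance, standard deviation), dividing by N as in the slides."""
    n = len(data)
    mean = sum(data) / n                                   # the average A
    squared_deviations = [(x - mean) ** 2 for x in data]   # (x - A)^2 for each x
    variance = sum(squared_deviations) / n                 # V
    return variance, math.sqrt(variance)                   # sigma = sqrt(V)


angela = [85, 86, 87, 88, 89, 91, 92, 93, 94, 95]
variance, sd = variance_and_sd(angela)
print(variance)      # 11.0
print(round(sd, 2))  # 3.32
```

Some software divides by N – 1 instead of N when computing a variance, which gives a slightly larger result, so it is worth checking which definition a given tool uses.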
• It is clear from just a casual look at
Angela’s homework scores that she was
pretty consistent in her homework, never
straying too much above or below her
average score of 90 points.
• The standard deviation is, in effect, a way
to measure this degree of consistency (or
lack thereof).
Interpreting the Standard Deviation
• A small standard deviation tells us that the
data are consistent and the spread of the
data is small, as is the case with Angela’s
homework scores.
Interpreting the Standard Deviation
The ultimate in consistency within a data set
is when all the data values are the same (like
Angela’s friend Chloe, who got a 20 in every
homework assignment). When this happens
the standard deviation is 0.
Interpreting the Standard Deviation
On the other hand, when there is a lot of
inconsistency within the data set, we are
going to get a large standard deviation. This
is illustrated by Angela’s other friend, Tiki,
whose homework scores were
5, 15, 25, 35, 45, 55, 65, 75, 85, 95
The standard deviation of this data is almost
29 points.
Interpreting the Standard Deviation
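The two extremes described here can be checked with Python's standard library. The function statistics.pstdev computes the standard deviation by dividing by N, which matches the definition in these slides. This is a quick check rather than part of the original example, and the assumption that Chloe turned in 10 assignments, like Angela, is made here for illustration only.

```python
import statistics

chloe = [20] * 10  # the same score every time (10 assignments is assumed)
tiki = [5, 15, 25, 35, 45, 55, 65, 75, 85, 95]

print(statistics.pstdev(chloe))           # 0.0
print(round(statistics.pstdev(tiki), 2))  # 28.72
```

Tiki's variance is 8250/10 = 825, and its square root is about 28.7, in line with the "almost 29 points" quoted above.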
The standard deviation is arguably the most
important and frequently used measure of
data spread. Here are a few facts.
Summary of the Standard Deviation
• The standard deviation of a data set is
measured in the same units as the original
data.
• For example, if the data points are
dollar amounts then the standard deviation
is given in dollars; if the data have units of
gallons then the standard deviation has
units of gallons.
Summary of the Standard Deviation
• If the standard deviation is small, we can
conclude that the data points are all
bunched together–there is very little
spread. A standard deviation of 0 means
that every data value is the same.
• If the standard deviation is large, we can
conclude that the data points are spread
out.
Summary of the Standard Deviation