Math 120 Notes – Chapter 3 – Numerically Summarizing Data Math 203 Section 3… · 2020-01-03 · Math 120 Notes – Chapter 3 – Numerically Summarizing Data Section 3.1 –
Math 120 Notes – Chapter 3 – Numerically Summarizing Data
Section 3.1 – Measures of Central Tendency
Objectives:
1. Determine the arithmetic mean of a variable from raw data
2. Determine the median of a variable from raw data
3. Explain what it means for a statistic to be resistant
4. Determine the mode of a variable from raw data
Objective – Determine the Arithmetic Mean of a Variable from Raw Data
- Arithmetic mean – a.k.a. the average – computed by adding all the values of the variable in the data set and dividing by the number of observations
- Population mean, µ (pronounced “mew”) – computed using all the individuals in a population (this is a parameter)
o If x1, x2, …, xN are the N observations of a variable from a population, then the population mean, µ, is
$$\mu = \frac{x_1 + x_2 + \cdots + x_N}{N} = \frac{\sum x_i}{N}$$
- Sample mean, x̄ (pronounced “x-bar”) – computed using sample data (this is a statistic)
o If x1, x2, …, xn are the n observations of a variable from a sample, then the sample mean, x̄, is
$$\bar{x} = \frac{x_1 + x_2 + \cdots + x_n}{n} = \frac{\sum x_i}{n}$$
*Note: You will have to do this by hand on the exam
Math 203
HillMath120,Page 2
Example: Computing a Population Mean and a Sample Mean
The following data represent the travel times (in minutes) to work for all seven employees of a start-up web
development company.
23, 36, 23, 18, 5, 26, 43
a) Compute the population mean of this data
b) Use the sample of 4 employees given below to compute the sample mean.
23, 36, 5, 26
c) Use the sample of 4 employees given below to compute the sample mean.
36, 23, 43, 26
d) What do you notice about the three different means found in parts (a) – (c)?
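As a way to check hand work on parts (a) through (c), here is a minimal Python sketch. The helper name `mean` and the variable names are illustrative choices, not part of the notes.

```python
# Travel times (minutes) for all seven employees: this is the whole population.
population = [23, 36, 23, 18, 5, 26, 43]
# The two samples of four employees from parts (b) and (c).
sample_1 = [23, 36, 5, 26]
sample_2 = [36, 23, 43, 26]

def mean(values):
    """Add all the values and divide by the number of observations."""
    return sum(values) / len(values)

mu = mean(population)      # parameter: uses every individual
x_bar_1 = mean(sample_1)   # statistic: uses only sample 1
x_bar_2 = mean(sample_2)   # statistic: uses only sample 2

print(f"population mean = {mu:.2f}")       # 24.86
print(f"sample 1 mean   = {x_bar_1:.2f}")  # 22.50
print(f"sample 2 mean   = {x_bar_2:.2f}")  # 32.00
```

The same formula gives three different values: the population mean is fixed, while each sample mean depends on which individuals happen to be in the sample.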
Objective – Determine the Median of a Variable from Raw Data
- Median – the value of a variable that lies in the middle of the data when arranged in ascending order
o The median divides the bottom 50% of the data from the top 50%
Steps in Finding the Median of a Data Set
1. Arrange the data in ascending order (smallest to biggest).
2. Determine the number of observations
3. Determine the observation in the middle of the data set.
a. If the number of observations is odd, then the median is the data value that is exactly in the middle of the data set.
b. If the number of observations is even, then the median is the mean of the two middle observations in the data set.
*Note: You will have to do this by hand on the exam
Example: Computing a Median of a Data Set with an Odd Number of Observations
Determine the median travel time for all seven employees of the company from the previous example.
23, 36, 23, 18, 5, 26, 43
Example: Computing a Median of a Data Set with an Even Number of Observations
1. Determine the median of the first sample data from the previous example.
23, 36, 5, 26
2. Determine the median of the second sample data from the previous example.
36, 23, 43, 26
3. What do you notice about the medians found in the previous problems, as compared to the means?
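The median steps (sort, count, take the middle) can be sketched in Python as follows. The function name `median` is an illustrative choice; it is applied to the seven travel times and to both samples of four.

```python
def median(values):
    """Median by the by-hand steps: sort, count, pick the middle."""
    ordered = sorted(values)          # Step 1: ascending order
    n = len(ordered)                  # Step 2: number of observations
    mid = n // 2
    if n % 2 == 1:                    # Step 3a: odd count, exact middle value
        return ordered[mid]
    # Step 3b: even count, mean of the two middle observations
    return (ordered[mid - 1] + ordered[mid]) / 2

median_all = median([23, 36, 23, 18, 5, 26, 43])  # 23
median_s1 = median([23, 36, 5, 26])               # 24.5
median_s2 = median([36, 23, 43, 26])              # 31.0
print(median_all, median_s1, median_s2)
```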
- Resistant – a numerical summary (mean, median, etc.) of data is resistant if extreme values (very large or small) relative to the data do not affect its value substantially
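To see resistance in action, the sketch below replaces the largest travel time with a made-up extreme value (430 is a hypothetical number chosen only for illustration) and compares how the mean and the median respond.

```python
import statistics

times = [23, 36, 23, 18, 5, 26, 43]
# Hypothetical data-entry error: 43 typed as 430.
altered = [23, 36, 23, 18, 5, 26, 430]

mean_shift = statistics.mean(altered) - statistics.mean(times)
median_shift = statistics.median(altered) - statistics.median(times)

print(f"mean moved by   {mean_shift:.2f} minutes")
print(f"median moved by {median_shift} minutes")
```

The mean moves by more than 50 minutes while the median does not move at all, which is what it means to say the median is resistant and the mean is not.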
Now we are going to find the mean and median on our calculators:
First, to input the values into L1 (or any list) hit:
Stat, highlight Edit, Enter, then hit enter after inputting each number:
23, 36, 23, 18, 5, 26, 43
To find the mean hit:
Stat, right arrow once to Calc, Enter to execute 1-Var Stats. Then hit 2nd, L1 (the #1), Enter
Note: To clear a list, arrow to the top of the list until the list name (Such as L1) is highlighted then
hit the clear button followed by enter. Make sure to hit clear, not delete.
Relation Between the Mean, Median, and Distribution Shape
Objective – Determine the Mode of a Variable from Raw Data
- Mode – the most frequent observation of a variable that occurs in the data set
o A set of data can have no mode, one mode, or more than one mode
o If all observations have the same frequency, then the data set has no mode.
o If there are two observations that occur with the highest frequency, then the data is bimodal.
o If there are more than two observations that occur with the highest frequency, then the data
is multimodal.
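The rules above for no mode, one mode, and several modes can be sketched in Python. The function name `modes` and the small test lists are illustrative; a data set with only one distinct value is treated here as having that value as its mode, which is an assumption the notes do not address.

```python
from collections import Counter

def modes(values):
    """Return a list of modes; an empty list means 'no mode'."""
    if not values:
        return []
    counts = Counter(values)
    highest = max(counts.values())
    # Every distinct value tied at the same frequency: no mode.
    if len(counts) > 1 and all(c == highest for c in counts.values()):
        return []
    return [value for value, c in counts.items() if c == highest]

print(modes([23, 36, 23, 18, 5, 26, 43]))  # [23]   one mode
print(modes([1, 2, 3]))                    # []     no mode
print(modes([1, 1, 2, 2, 3]))              # [1, 2] bimodal
```

Because it only counts occurrences, the same function works for qualitative data such as survey responses.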
Example: Finding the Mode of a Data Set
A sample of 30 registered voters was surveyed in which the respondents were asked, “Do you consider
your political views to be conservative, moderate, or liberal?” The results of the survey are shown in the
table.
Determine the mode political view.
Summary of the Measures of Central Tendency
Section 3.2 – Measures of Dispersion
Objectives
1. Determine the range of a variable from raw data
2. Determine the standard deviation of a variable from raw data
3. Determine the variance of a variable from raw data
4. Use the Empirical Rule to describe data that are bell shaped
5. Use Chebyshev’s Inequality to describe any data set
Quick example discussing Dispersion:
To order food at a McDonald’s restaurant, one must choose from multiple lines, while at Wendy’s
Restaurant, one enters a single line.
The following data represent the wait time (in minutes) in line for a simple random sample of 30 customers
at each restaurant during the lunch hour.
The mean wait time at both restaurants is 1.39 minutes
Looking at the graphs, McDonald’s wait time appears to be more dispersed.
Objective – Determine the range of a variable from raw data
- Range, R – the difference between the largest data value and the smallest data value of a variable (the difference between the max and the min)
o That is, Range = R = Largest Data Value – Smallest Data Value
Example: Finding the Range of a Set of Data
The following data represent the travel times (in minutes) to work for all seven employees of a start-up web
development company. Find the range of the data.
23, 36, 23, 18, 5, 26, 43
Range =
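A one-line check of the range for the travel-time data, as a minimal Python sketch (the variable names are illustrative):

```python
times = [23, 36, 23, 18, 5, 26, 43]

# Range = largest data value - smallest data value
data_range = max(times) - min(times)
print(data_range)  # 38
```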
*We will have to do this by hand and on calculators on exams (and show how you did it)
Objective – Determine the Standard Deviation of a Variable from Raw Data
- Population standard deviation (σ) is given by the formula:
$$\sigma = \sqrt{\frac{(x_1-\mu)^2 + (x_2-\mu)^2 + \cdots + (x_N-\mu)^2}{N}} = \sqrt{\frac{\sum (x_i-\mu)^2}{N}}$$
where x1, x2, . . . , xN are the N observations in the population and µ is the population mean.
Example: Computing a Population Standard Deviation
Back to the travel times (in minutes) to work for all seven employees of a start-up web development company. Compute the population standard deviation of this data.
23, 36, 23, 18, 5, 26, 43
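The by-hand procedure for the population standard deviation (find the mean, square each deviation, average the squares, take the square root) can be sketched in Python. Variable names are illustrative.

```python
import math

times = [23, 36, 23, 18, 5, 26, 43]

N = len(times)
mu = sum(times) / N                                # population mean
squared_deviations = [(x - mu) ** 2 for x in times]
sigma = math.sqrt(sum(squared_deviations) / N)     # divide by N for a population

print(f"mu = {mu:.2f}, sigma = {sigma:.2f}")       # mu = 24.86, sigma = 11.36
```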
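- Sample standard deviation (s) is given by the formula:
$$s = \sqrt{\frac{(x_1-\bar{x})^2 + (x_2-\bar{x})^2 + \cdots + (x_n-\bar{x})^2}{n-1}} = \sqrt{\frac{\sum (x_i-\bar{x})^2}{n-1}}$$
where x1, x2, . . . , xn are the n observations in the sample and x̄ is the sample mean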
Example: Computing a Sample Standard Deviation
Take a sample of 4 of the travel times to work from the previous example. Compute the sample standard
deviation of this data.
Sample:
*We will have to do this by hand and on calculators on exams (and show how you did it)
Careful!!! It’s similar to the formula for the population standard deviation, right?
Using your Calculator to find Population and Sample Standard Deviations
We can use our calculators to find population and sample standard deviations by executing the 1-Var Stats
program on the data and knowing how to interpret the results.
Let’s use the travel times of the start-up company data: 23, 36, 23, 18, 5, 26, 43
Input the data into L1 (if you don’t have it already), execute 1-Var Stats L1 (I hope you remember how).
The population standard deviation is σx =
The sample standard deviation is sx =
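For readers without the calculator at hand, Python's standard `statistics` module reports the same two quantities: `pstdev` divides by N (the calculator's σx) and `stdev` divides by n − 1 (the calculator's sx). This is a sketch for checking results, not a replacement for the by-hand work.

```python
import statistics

times = [23, 36, 23, 18, 5, 26, 43]

sigma_x = statistics.pstdev(times)  # population standard deviation (divide by N)
s_x = statistics.stdev(times)       # sample standard deviation (divide by n - 1)

print(f"sigma_x = {sigma_x:.2f}")   # 11.36
print(f"s_x     = {s_x:.2f}")       # 12.27
```

The sample value is always the larger of the two for the same data, because it divides the same sum of squared deviations by a smaller number.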
Extra Example: Computing a Population and Sample Standard Deviation
Given are the scores of 6 different Statistics students’ first exam:
81, 68, 91, 72, 55, 70
a) Treating the scores as a population, find the standard deviation of the test scores.
b) Now treat the scores as a sample, and find the standard deviation of the test scores.
- Degrees of freedom – we call n - 1 the degrees of freedom because the first n - 1 observations have freedom to be whatever value they wish, but the nth value has no freedom
o The nth must be whatever value forces the sum of the deviations about the mean to equal
zero (it cancels out all the deviations)
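The degrees-of-freedom idea can be illustrated numerically. The sketch below uses the sample 23, 36, 5, 26 from the earlier mean example (chosen here only for illustration) and shows that once the first n − 1 deviations are known, the last one is forced.

```python
sample = [23, 36, 5, 26]
n = len(sample)
x_bar = sum(sample) / n                      # 22.5

deviations = [x - x_bar for x in sample]     # [0.5, 13.5, -17.5, 3.5]
total = sum(deviations)                      # deviations about the mean sum to 0

# Knowing only the first n - 1 deviations, the nth is determined:
forced_last = -sum(deviations[:-1])

print(deviations, total, forced_last)
```

Only n − 1 of the deviations carry independent information, which is why the sample formula divides by n − 1.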
Example: Comparing Standard Deviations
Recall the Wendy’s and McDonald’s data. Which data has a larger standard deviation and how do you
know?
Remember we said that the McDonald’s data is more dispersed.
Overall Idea: More dispersed data results in a larger standard deviation.
Objective – Determine the Variance of a Variable from Raw Data
- Variance – the square of the standard deviation
o The population variance is σ²
o The sample variance is s²
Example: Computing a Population Variance
We previously computed the standard deviation of the travel times to work for all seven employees of the
start-up web development company.
23, 36, 23, 18, 5, 26, 43
Recall that the population standard deviation was σ = 11.36 minutes
So the population variance is σ² =
If we were told to treat the data as a sample, the sample standard deviation would be
s = 12.27 minutes
So the sample variance is s² =
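Both variances can be checked with Python's `statistics` module; this is a sketch for verifying hand work, with illustrative variable names.

```python
import statistics

times = [23, 36, 23, 18, 5, 26, 43]

pop_var = statistics.pvariance(times)   # sigma squared (divide by N)
samp_var = statistics.variance(times)   # s squared (divide by n - 1)

print(f"population variance = {pop_var:.2f}")   # 128.98
print(f"sample variance     = {samp_var:.2f}")  # 150.48

# Squaring the already-rounded standard deviation drifts slightly:
print(round(11.36 ** 2, 2))                     # 129.05
```

The small gap between 128.98 and 129.05 is rounding error from squaring 11.36 rather than the unrounded standard deviation, so expect answers to differ in the last digits depending on which route is taken.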
Objective – Use the Empirical Rule to Describe Data That Are Bell Shaped
The Empirical Rule: 68, 95, 99.7
If a distribution is roughly bell shaped, then
o Approx. 68% of the data will lie within 1 standard deviation of the mean
o Approx. 95% of the data will lie within 2 standard deviations of the mean
o Approx. 99.7% of the data will lie within 3 standard deviations of the mean
Note: We can also use the Empirical Rule based on sample data with x̄ used in place of µ and s used in place of σ.
Example: Using the Empirical Rule
The following data represent the serum HDL cholesterol of the 54 female patients of a family doctor.
41 48 43 38 35 37 44 44 44
62 75 77 58 82 39 85 55 54
67 69 69 70 65 72 74 74 74
60 60 60 61 62 63 64 64 64
54 54 55 56 56 56 57 58 59
45 47 47 48 48 50 52 52 53
We are told that the data has a bell-shaped distribution. Also, the population mean, µ, is 57.4 and the population standard deviation, σ, is 11.7.
a) According to the Empirical Rule determine the percentage of all patients that have serum HDL
within 3 standard deviations of the mean.
b) According to the Empirical Rule, determine the percentage of all patients that have serum HDL
between 34 and 69.1
c) According to the Empirical Rule, determine the percentage of all patients that have serum HDL greater
than 69.1.
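One way to organize parts (a) through (c) is to first locate the cut points in units of standard deviations and then combine the 68, 95, 99.7 percentages. The Python sketch below does this; the helper name `interval` is an illustrative choice, and the percentages are the Empirical Rule approximations, not counts from the 54 data values.

```python
mu, sigma = 57.4, 11.7

def interval(k):
    """Endpoints k standard deviations either side of the mean."""
    return (round(mu - k * sigma, 1), round(mu + k * sigma, 1))

print(interval(1))  # (45.7, 69.1)
print(interval(2))  # (34.0, 80.8)
print(interval(3))  # (22.3, 92.5)

within_3 = 99.7                      # (a) within 3 standard deviations
# (b) 34 is mu - 2*sigma and 69.1 is mu + 1*sigma:
between_34_and_69_1 = 95 / 2 + 68 / 2
# (c) above mu + 1*sigma: half of what lies outside 1 standard deviation
above_69_1 = (100 - 68) / 2

print(within_3, between_34_and_69_1, above_69_1)  # 99.7 81.5 16.0
```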
Extra Example: Using the Empirical Rule
A random sample of 30 statistics exams from a previous semester has a mean score, x̄, of 72.1 with a
standard deviation of 4.3. The distribution of exam scores is known to be approximately bell-shaped.
Determine the following:
a) The approximate percentage of exams with scores greater than 80.7.
b) About 95% of scores will lie between what two scores?
c) About 68% of scores will lie between what two scores?
Objective – Use Chebyshev’s Inequality to Describe Any Set of Data
Chebyshev’s Inequality:
For any data set or distribution, at least $\left(1 - \frac{1}{k^2}\right) \cdot 100\%$ of the observations lie within k standard deviations of the mean, where k is any number greater than 1.
Note: We can also use Chebyshev’s Inequality based on sample data.
Example: Using Chebyshev’s Theorem
Using the data from the previous example, use Chebyshev’s Theorem to determine the following:
a) The minimum percentage of exam scores within 3 standard deviations of the mean, 72.1.
b) The minimum percentage of exam scores between 63.5 and 80.7.
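Chebyshev’s Inequality is a single formula, so it can be sketched as a short Python function. The function name `chebyshev_min_percent` is an illustrative choice; the mean 72.1 and standard deviation 4.3 come from the exam-score example.

```python
def chebyshev_min_percent(k):
    """Minimum percent of observations within k standard deviations (k > 1)."""
    if k <= 1:
        raise ValueError("Chebyshev's Inequality requires k > 1")
    return (1 - 1 / k ** 2) * 100

# (a) within 3 standard deviations of the mean
part_a = chebyshev_min_percent(3)
print(f"{part_a:.1f}%")            # 88.9%

# (b) 63.5 and 80.7 are each k standard deviations from 72.1; find k first
x_bar, s = 72.1, 4.3
k = (80.7 - x_bar) / s             # about 2
part_b = chebyshev_min_percent(k)
print(f"{part_b:.1f}%")            # 75.0%
```

Compare with the Empirical Rule: for bell-shaped data about 95% lies within 2 standard deviations, while Chebyshev only guarantees at least 75% because it must hold for any shape.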
Objective – Compute the Weighted Mean
- Weighted mean, x̄w – the mean found by multiplying each value of the variable by its corresponding weight, adding these products, and dividing this sum by the sum of the weights:
$$\bar{x}_w = \frac{\sum w_i x_i}{\sum w_i}$$
where wi is the weight of the ith observation and xi is the value of the ith observation
Example: Computing a Weighted Mean
Bob goes to the “Buy the Weigh” Nut store and creates his own bridge mix. He combines 1 pound of
raisins, 2 pounds of chocolate covered peanuts, and 1.5 pounds of cashews.
The raisins cost $1.25 per pound, the chocolate covered peanuts cost $3.25 per pound, and the cashews cost
$5.40 per pound. What is the cost per pound of this mix?
Gathering the data:
Raisins:
Peanuts:
Cashews:
*Again we will have to do this by hand and on calculators on exams (and show/explain how you did)
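For the bridge mix, the weights are the pounds of each ingredient and the values are the prices per pound. A minimal Python sketch (function and variable names are illustrative):

```python
def weighted_mean(values, weights):
    """Sum of weight*value products divided by the sum of the weights."""
    total = sum(w * x for x, w in zip(values, weights))
    return total / sum(weights)

prices = [1.25, 3.25, 5.40]   # dollars per pound: raisins, peanuts, cashews
pounds = [1, 2, 1.5]          # weights: pounds of each ingredient

cost_per_pound = weighted_mean(prices, pounds)
print(f"${cost_per_pound:.2f} per pound")   # $3.52 per pound
```

The ordinary mean of the three prices would be $3.30, which is too low here because the most expensive ingredients make up most of the mix.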
Section 3.4 - Measures of Position and Outliers
Objectives:
1. Determine and interpret z-scores
2. Interpret percentiles
3. Determine and interpret quartiles
4. Determine and interpret the interquartile range
5. Check a set of data for outliers
Objective – Determine and interpret z-scores
- z-score – measures how many standard deviations the data value is above or below the mean.
o The z-score of a data value can be found using the following formulas. There is both a population z-score and a sample z-score formula:
Population z-score: $$z = \frac{x - \mu}{\sigma}$$
Sample z-score: $$z = \frac{x - \bar{x}}{s}$$
o A negative z-score would indicate that the data value is below the mean.
o A positive z-score would indicate that the data value is above the mean.
o Z-scores are typically rounded to two decimal places
Examples: Using Z-Scores
1. You are filling out an application for college. The application requests either your ACT score or your
SAT I score. You scored a 32 on the ACT and a 635 on the SAT I. On the ACT exam, the mean score
is 30 with a standard deviation of 4, while the SAT I has a mean score of 505 with a standard deviation
of 109. Which test score should you provide on your application? Why?
2. A highly selective boarding school will only admit students who place at least 1.5 standard deviations
above the mean on a standardized test that has a mean of 200 and a standard deviation of 26. What is
the minimum score that an applicant must make on the test to be accepted?
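Both problems use the same z-score relationship, once solved for z and once solved for x. A minimal Python sketch with illustrative names:

```python
def z_score(x, mean, sd):
    """How many standard deviations x lies above (+) or below (-) the mean."""
    return (x - mean) / sd

# Problem 1: compare the ACT and SAT I scores on a common scale
z_act = z_score(32, 30, 4)
z_sat = z_score(635, 505, 109)
print(f"ACT z = {z_act:.2f}, SAT z = {z_sat:.2f}")   # ACT z = 0.50, SAT z = 1.19

# Problem 2: solve z = (x - mean) / sd for x when z = 1.5
min_score = 200 + 1.5 * 26
print(min_score)                                     # 239.0
```

The SAT I score sits farther above its mean in standard-deviation units, so it is the stronger score relative to other test takers.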
Example: Using Z-Scores
Sports enthusiasts love to debate who is a “better” player when a direct comparison cannot occur. For
example, in 2010, Josh Johnson of the Florida Marlins had the lowest earned-run average (ERA is the
mean number of runs yielded per nine innings pitched) of any starting pitcher in the National League, with
an ERA of 2.30. Meanwhile, Clay Buchholz of the Boston Red Sox finished 2nd in the American League
with an ERA of 2.33.
In the National League, the mean ERA in 2010 was 3.622 and the standard deviation was 0.743. In the
American League, the mean ERA in 2010 was 3.929 and the standard deviation was 0.775. Which player
had the better year relative to his peers? Why?
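Clay Buchholz had the better year relative to his peers, based on his z-score. His ERA was just over 2 standard deviations below his league’s mean (z ≈ −2.06), while Josh Johnson’s was less than 2 standard deviations below his league’s mean (z ≈ −1.78). A lower ERA is better, so the more negative z-score indicates the better year.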
Objective 2 - Interpret Percentiles
- The kth percentile – a value such that approximately k percent of the observations in a data set are less than or equal to the value.
Example: Interpret a Percentile
The Graduate Record Examination (GRE) is a test required for admission to many U.S. graduate schools.
The University of Pittsburgh Graduate School of Public Health requires a GRE score no less than the 70th
percentile for admission into their Human Genetics MPH or MS program. Interpret this admissions requirement.
Interpretation: In order to be admitted to this program, an applicant must score as high as or higher than 70% of the people who take the GRE. Put another way, the individual’s score must be in the top 30%.
Another (quick) Example: Interpret a Percentile
Sticking with the GRE idea, let’s say that Stanford requires a GRE score no less than the 85th percentile for
admission into their M.B.A. program. In order to be admitted, an applicant would have to score in the top
what percentage on the test?
Objective 3 - Determine and Interpret Quartiles
- Quartiles – divide data sets into fourths, or four equal parts
• The 1st quartile, Q1, is equivalent to the 25th percentile.
o Q1 divides the bottom 25% of the data from the top 75%.
• The 2nd quartile, Q2, is equivalent to the median, or the 50th percentile.
o Q2 divides the bottom 50% of the data from the top 50% of the data.
• The 3rd quartile, Q3, is equivalent to the 75th percentile.
o Q3 divides the bottom 75% of the data from the top 25% of the data.
• The 4th quartile is just the maximum value (we don’t really use a Q4)
Finding Quartiles
Step 1 Arrange the data in ascending order.
Step 2 Determine the median, M, or second quartile, Q2.
Step 3 Divide the data set into halves: the observations below (to the left of) M and the observations above M.
- The first quartile, Q1, is the median of the bottom half.
- The third quartile, Q3, is the median of the top half.
*Note: If the number of observations is odd, do not include the median when determining Q1 and Q3 by hand.
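The three steps, including the rule about leaving out the median when the count is odd, can be sketched in Python. The function names are illustrative, and the travel-time data from Section 3.1 is reused here as an assumption since it is fully listed in these notes.

```python
def median(ordered):
    """Median of an already-sorted list."""
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2

def quartiles(values):
    """Q1, Q2, Q3 by the median-of-halves method described in the steps."""
    ordered = sorted(values)                       # Step 1
    n = len(ordered)
    half = n // 2
    q2 = median(ordered)                           # Step 2
    lower = ordered[:half]                         # Step 3: bottom half
    # For an odd count, skip the median itself when forming the top half.
    upper = ordered[half + 1:] if n % 2 == 1 else ordered[half:]
    return median(lower), q2, median(upper)

q1, q2, q3 = quartiles([23, 36, 23, 18, 5, 26, 43])
print(q1, q2, q3)   # 18 23 36
```

Software packages use several different quartile conventions, so results from other tools may differ slightly from this by-hand method.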
Example: Finding and Interpreting Quartiles
A group of BYU students collected data on the speed of vehicles traveling through a construction zone on a
state highway, where the posted speed was 25 mph. The recorded speed of 14 randomly selected vehicles is
Step 2: (we previously found) The interquartile range is
Step 3: Thus the fences are
Lower Fence: Q1 – 1.5(IQR) =
Upper Fence: Q3 + 1.5(IQR) =
Step 4: Conclusion:
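The fence calculation in Steps 2 through 4 can be sketched in Python. Because the vehicle-speed values are not reproduced in these notes, the sketch reuses the travel-time data from Section 3.1, with Q1 = 18 and Q3 = 36 found by the median-of-halves method; that substitution is an assumption made only for illustration.

```python
times = [23, 36, 23, 18, 5, 26, 43]
q1, q3 = 18, 36            # quartiles of the travel times (assumed data set)

iqr = q3 - q1                          # Step 2: interquartile range
lower_fence = q1 - 1.5 * iqr           # Step 3: fences
upper_fence = q3 + 1.5 * iqr

# Step 4: any value outside the fences is flagged as an outlier
outliers = [x for x in times if x < lower_fence or x > upper_fence]

print(iqr, lower_fence, upper_fence, outliers)   # 18 -9.0 63.0 []
```

For this data no value falls outside the fences, so the conclusion would be that there are no outliers.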
Extra Practice: Finding the Mean and Standard Deviation of Grouped Data
The following data represent SAT Mathematics scores for 2010. Approximate the mean and standard deviation of the scores both by hand and using your calculator.
SAT Math Score    Frequency
200–299           36,305
300–399           193,968
400–499           459,010
500–599           467,855
600–699           286,518
700–800           104,334
Source: The College Board
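With grouped data the individual scores are unknown, so each class is represented by its midpoint and weighted by its frequency. The Python sketch below makes two assumptions that the notes do not state: the midpoint is taken as (lower limit + upper limit) / 2, and the sample formula (dividing by n − 1) is used for the standard deviation. Other textbook conventions shift the results slightly.

```python
import math

# (lower limit, upper limit, frequency) from the SAT Mathematics table
classes = [
    (200, 299, 36305),
    (300, 399, 193968),
    (400, 499, 459010),
    (500, 599, 467855),
    (600, 699, 286518),
    (700, 800, 104334),
]

midpoints = [(lo + hi) / 2 for lo, hi, _ in classes]   # assumed convention
freqs = [f for _, _, f in classes]

n = sum(freqs)
approx_mean = sum(m * f for m, f in zip(midpoints, freqs)) / n
approx_var = sum(f * (m - approx_mean) ** 2
                 for m, f in zip(midpoints, freqs)) / (n - 1)
approx_sd = math.sqrt(approx_var)

print(f"n = {n}")
print(f"approximate mean = {approx_mean:.1f}")
print(f"approximate standard deviation = {approx_sd:.1f}")
```

These are approximations only: every score in a class is treated as if it sat at the class midpoint.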
Section 3.5 – The Five-Number Summary and Boxplots
Objectives:
1. Compute the five-number summary
2. Draw and interpret boxplots
Objective 1 – Compute the Five-Number Summary
- Five-number summary – the minimum, Q1, median, Q3, and maximum values of a set of data
*Min = smallest data value; Max = largest data value
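A five-number summary combines the minimum and maximum with the quartiles found by the median-of-halves method. The Python sketch below uses illustrative function names and, as an assumption, the travel-time data from Section 3.1, since the interest-rate values are not reproduced in these notes.

```python
def median(ordered):
    """Median of an already-sorted list."""
    n = len(ordered)
    mid = n // 2
    return ordered[mid] if n % 2 == 1 else (ordered[mid - 1] + ordered[mid]) / 2

def five_number_summary(values):
    """Return (min, Q1, median, Q3, max) using median-of-halves quartiles."""
    ordered = sorted(values)
    n = len(ordered)
    half = n // 2
    lower = ordered[:half]
    # Exclude the median from the top half when the count is odd.
    upper = ordered[half + 1:] if n % 2 == 1 else ordered[half:]
    return (ordered[0], median(lower), median(ordered),
            median(upper), ordered[-1])

summary = five_number_summary([23, 36, 23, 18, 5, 26, 43])
print(summary)   # (5, 18, 23, 36, 43)
```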
We organize the Five-Number summary as follows:
Example: Obtaining the Five-Number Summary
Every six months, the United States Federal
Reserve Board conducts a survey of credit card
plans in the U.S.
The following data are the interest rates charged
by 10 credit card issuers randomly selected.
Determine the five-number summary of the data
by hand.
Now, using your calculator, input the data (in any list) and execute the 1-Var Stats program. Scroll down to obtain the Five-Number Summary.
*Note: Input values without percent signs, i.e. 6.5 rather than 6.5%
Five Number Summary:
Objective 2 – Draw and Interpret Boxplots
Example: Drawing a Boxplot
Back to the interest rates charged by the 10 credit card issuers in a previous example. Below is the data (in