AP Statistics
Monday, 31 August 2015
• OBJECTIVE TSW learn (1) the reasons for studying statistics, and (2) vocabulary.
• FORM DUE (only if it is signed)
– Information Sheet (wire basket)
• If you have T-shirt money, bring it up at the beginning of the period (after the bell rings).
• Assignments (WS and newspaper article) will be collected on Wednesday, 09/02/2015.
Chapter 1 Assignments
1) WS Chapter 1
– Due on Wednesday, 02 September 2015.
2) Newspaper article (You may type or hand-write this, but your answers must be complete sentences.)
– Look in the newspaper (you may have to go on-line if you do not get a newspaper) for an article that uses statistics to reach a conclusion.
– In your own words, describe the situation and conclusion.
– Based on the information in the article, is the conclusion reasonable? Why or why not?
Histograms: Displaying the Distribution of Earthquake Magnitudes (cont.)
A relative frequency histogram displays the percentage of cases in each bin instead of the counts. In this way, relative frequency histograms are faithful to the area principle.
Here is a relative frequency histogram of earthquake magnitudes:
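As a rough illustration of how the bin percentages are obtained, here is a minimal Python sketch. The magnitudes, the bin width, and the function name relative_frequencies are made up for illustration; they are not the earthquake data from the slide.

```python
from collections import Counter

def relative_frequencies(values, bin_width, start):
    """Return {lower bin edge: percent of cases} for equal-width bins."""
    # Assign each value to the bin whose lower edge is start + k * bin_width.
    counts = Counter(start + bin_width * int((v - start) // bin_width)
                     for v in values)
    n = len(values)
    return {edge: 100 * counts[edge] / n for edge in sorted(counts)}

# Hypothetical magnitudes, NOT the data behind the slide's histogram.
magnitudes = [6.1, 6.4, 7.0, 7.2, 7.3, 7.6, 7.8, 8.1, 8.4, 9.0]
rel = relative_frequencies(magnitudes, bin_width=1.0, start=6.0)

for edge, pct in rel.items():
    # One '#' per 10 percentage points gives a crude text histogram.
    print(f"[{edge:.1f}, {edge + 1.0:.1f}): {pct:.0f}% " + "#" * round(pct / 10))
```

The percentages always add up to 100, so the shape of the display is the same as the count histogram; only the vertical scale changes.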
Stem-and-Leaf Diagram
A quick technique for picturing the distributional pattern associated with numerical data is to create a picture called a stem-and-leaf diagram (commonly called a stem plot).
1. We want to break up the data into a reasonable number of groups.
2. Looking at the range of the data, we choose the stems (one or more of the leading digits) to get the desired number of groups.
3. The next digits (or digit) after the stem become(s) the leaf.
4. Typically, we truncate (leave off) the remaining digits.
When to Use Stem-and-Leaf Displays
Numerical data sets with a small to moderate number of observations.
This does NOT work well with very large data sets.
How to Construct a Stem-and-Leaf Display
1. Select one or more leading digits for the stem values. The trailing digits (or sometimes just the first one of the trailing digits) become the leaves.
2. List possible stem values in a vertical column.
3. Record the leaf for every observation beside the corresponding stem value.
4. Indicate the units for stems and leaves somewhere in the display.
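The construction steps above can be sketched in Python. This is only an illustration under stated assumptions: the data are whole numbers, the leaf is the ones digit, and the weights below are hypothetical (they are not the class data). The function name stem_and_leaf is also just a label chosen here.

```python
def stem_and_leaf(values):
    """Map each stem (all digits but the last) to its sorted leaves (last digit)."""
    rows = {}
    for v in values:
        stem, leaf = divmod(v, 10)        # e.g. 135 -> stem 13, leaf 5
        rows.setdefault(stem, []).append(leaf)
    # List every possible stem between the smallest and largest,
    # including stems with no leaves, so that gaps stay visible.
    return {s: sorted(rows.get(s, [])) for s in range(min(rows), max(rows) + 1)}

# Hypothetical weights in pounds, for illustration only.
weights = [103, 115, 121, 124, 130, 135, 135, 148, 152, 170, 205]
plot = stem_and_leaf(weights)

for stem, leaves in plot.items():
    print(f"{stem:>2} | {''.join(str(leaf) for leaf in leaves)}")
print("Stem: hundreds and tens digits   Leaf: ones digit")
```

Keeping the empty stems in the display is a deliberate choice: it is what lets a gap or a far-off value stand out.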
AP Statistics
Tuesday, 01 September 2015
• OBJECTIVE TSW explore (1) histograms, (2) stem-and-leaf plots, (3) dot plots, and (4) boxplots and (5) describe the center, shape, and spread of a distribution.
• FORM DUE (only if it is signed)
– Information Sheet (wire basket)
• Get out WS Chapter 1.
• If you have T-shirt money, bring it up at the beginning of the period (after the bell rings).
• QUIZ: Ch. 1 & 2 will be tomorrow, 09/02/15.
– I will TRY (very hard) to post both Ch. 1 and Ch. 2 PowerPoints.
• ASSIGNMENTS DUE TOMORROW (09/02/15)
– WS Chapter 1
– Newspaper Article
WS Chapter 1
1) categorical (qualitative)
2) categorical (qualitative)
3) quantitative
4) quantitative
5) who: 2500 cars; what: distance from the bicycle to the passing car; population of interest: all cars passing bicyclists
6) who: workers who buy coffee in an office; what: amount of money contributed to the collection tray; population of interest: all people in honor-system payment situations
What a Stem-and-Leaf Display Shows
1. A representative or typical value in the data set.
2. The extent of the spread about such a value.
3. The presence of any gaps in the data.
4. The extent of the symmetry in the distribution of values.
5. The number and location of peaks.
6. The presence of any outliers.
Stem Plot
For our first example, we use the weights of 25 female students.
Choosing the 1st two digits as the stem and the 3rd digit as the leaf, we have the following:
[Stem plot: stems 10 to 20, leaves unsorted]
Typically we sort the leaves on each stem in increasing order.
[Stem plot: stems 10 to 20, leaves sorted]
We also note on the diagram the units for stems and leaves:
Stem: Hundreds and tens digits
Leaf: Ones digit
The diagram also marks the probable outliers.
Stem Plot
Definition: Outlier
An outlier is an unusually small or large data value.
When to Use Stem-and-Leaf Displays
Use with numerical data sets with a small to moderate number of observations.
NOTE: Stem-and-leaf displays do not work well with very large data sets.
The following are the GPAs for the 20 advisees of a faculty member.
If the ones digit is used as the stem, you only get three groups. You can expand this a little by using each stem twice, letting second digits 0-4 go with the first copy of the stem and second digits 5-9 go with the second copy.
The next slide gives two versions of the stem-and-leaf diagram.
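As an illustration of the split-stem idea, here is a minimal Python sketch. The advisees' GPAs are not included in this transcript, so the values below are made up, and the labels L (leaves 0-4) and H (leaves 5-9) are just one possible way to mark the two copies of each stem.

```python
def split_stem_plot(values):
    """Stem = ones digit, leaf = tenths digit; each stem is used twice."""
    rows = {}
    for v in values:
        tenths = round(v * 10)              # values are given to one decimal place
        stem, leaf = divmod(tenths, 10)     # e.g. 3.2 -> stem 3, leaf 2
        half = 0 if leaf <= 4 else 1        # 0: leaves 0-4, 1: leaves 5-9
        rows.setdefault((stem, half), []).append(leaf)
    return {key: sorted(rows[key]) for key in sorted(rows)}

# Hypothetical GPAs, for illustration only.
gpas = [2.1, 2.4, 2.6, 2.9, 3.0, 3.2, 3.3, 3.6, 3.8, 4.0]
plot = split_stem_plot(gpas)

for (stem, half), leaves in plot.items():
    label = f"{stem}L" if half == 0 else f"{stem}H"
    print(f"{label} | {''.join(str(leaf) for leaf in leaves)}")
print("Stem: ones digit   Leaf: tenths digit")
```

With these made-up values, the three original stems (2, 3, 4) become five rows, which shows the shape of the distribution in a bit more detail.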
From this comparative stem-and-leaf diagram, it is clear that the male ages are more closely grouped than the female ages. Also, the females have a number of outliers.
If you can fold the histogram along a vertical line through the middle and have the edges roughly match, the histogram is symmetric.
AP Statistics
Wednesday, 02 September 2015
• OBJECTIVE TSW explore (1) histograms, (2) stem-and-leaf plots, (3) dot plots, and (4) boxplots and (5) describe the center, shape, and spread of a distribution.
• ASSIGNMENTS DUE
– WS Chapter 1 (wire basket)
– Newspaper Article (black tray)
• If you have T-shirt money, bring it up at the beginning of the period (after the bell rings).
The 5-number summary of a distribution reports its median, quartiles, and extremes (maximum and minimum).
The 5-number summary for the recent tsunami earthquake magnitudes looks like this:
Why use boxplots?
• ease of construction
• convenient handling of outliers
• construction is not subjective (as it is with histograms)
• used with medium or large size data sets (n > 10)
• useful for comparative displays
Disadvantage of boxplots
• does not retain the individual observations
• should not be used with small data sets (n < 10)
How to construct
• find five-number summary: Min, Q1, Med, Q3, Max
• draw box from Q1 to Q3
• draw median as center line in the box
• extend whiskers to min & max
Modified boxplots
• display outliers
• fences mark off mild & extreme outliers
• whiskers extend to largest (smallest) data value inside the fence
ALWAYS use modified boxplots in this class!!!
Inner fence
Interquartile Range (IQR) – the range (length) of the box: IQR = Q3 - Q1
Lower inner fence: Q1 - 1.5(IQR)
Upper inner fence: Q3 + 1.5(IQR)
Any observation outside this fence is an outlier! Put a dot for the outliers.
Modified Boxplot . . .
Draw the “whisker” from each quartile to the most extreme observation that is within the fence!
Outer fence
Lower outer fence: Q1 - 3(IQR)
Upper outer fence: Q3 + 3(IQR)
Any observation outside this fence is an extreme outlier!
Any observation between the inner and outer fences is considered a mild outlier.
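The fence rules can be written as a short Python sketch. The function name classify and the quartiles used in the example (Q1 = 10, Q3 = 20) are hypothetical, chosen only to make the arithmetic easy to follow.

```python
def classify(x, q1, q3):
    """Label x using the 1.5(IQR) inner fences and 3(IQR) outer fences."""
    iqr = q3 - q1
    if x < q1 - 3 * iqr or x > q3 + 3 * iqr:
        return "extreme outlier"       # outside the outer fence
    if x < q1 - 1.5 * iqr or x > q3 + 1.5 * iqr:
        return "mild outlier"          # between the inner and outer fences
    return "not an outlier"            # inside the inner fence

# Hypothetical quartiles: IQR = 10, inner fences at -5 and 35,
# outer fences at -20 and 50.
for x in (30, 40, 60):
    print(x, "->", classify(x, q1=10, q3=20))
```

The outer-fence test comes first because any value beyond the outer fence is also beyond the inner fence.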
For the AP Exam . . .
. . . you just need to find outliers; you DO NOT need to identify them as mild or extreme.
Therefore, you just need to use the 1.5(IQR) fences.
A report from the U.S. Department of Justice gave the following percent increase in federal prison populations in 20 northeastern & mid-western states in 1999.
5.9 1.3 5.0 5.9 4.5 5.6 4.1 6.3 4.8 6.9
4.5 3.5 7.2 6.4 5.5 5.3 8.0 4.4 7.2 3.2
Create a modified boxplot. Describe the distribution.
Use the calculator to create a modified boxplot.
The median is 5.4.
There is an outlier at 1.3.
The distribution is fairly symmetrical.
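The same numbers can be checked by hand or with a short Python sketch. One assumption is labeled here: the quartiles are taken as the medians of the lower and upper halves of the sorted data, which is a common textbook and calculator convention; other software may use a slightly different quartile rule. The function name five_number_summary is just a label chosen for this sketch.

```python
from statistics import median

def five_number_summary(data):
    """Return (min, Q1, median, Q3, max); quartiles are medians of the halves."""
    xs = sorted(data)
    n = len(xs)
    half = n // 2
    lower = xs[:half]          # values below the median position
    upper = xs[n - half:]      # values above it (the median itself is skipped when n is odd)
    return xs[0], median(lower), median(xs), median(upper), xs[-1]

# Percent increases in prison populations from the slide above.
increases = [5.9, 1.3, 5.0, 5.9, 4.5, 5.6, 4.1, 6.3, 4.8, 6.9,
             4.5, 3.5, 7.2, 6.4, 5.5, 5.3, 8.0, 4.4, 7.2, 3.2]

low, q1, med, q3, high = five_number_summary(increases)
iqr = q3 - q1
low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
outliers = [x for x in increases if x < low_fence or x > high_fence]
inside = [x for x in increases if low_fence <= x <= high_fence]

print("five-number summary:", low, q1, med, q3, high)
print("IQR:", iqr, "fences:", low_fence, high_fence)
print("outliers:", outliers)
print("whiskers end at:", min(inside), "and", max(inside))
```

Under this quartile convention, Q1 is 4.45 and Q3 is 6.35, so the IQR is 1.9 and the fences sit at about 1.6 and 9.2. Only 1.3 falls outside them, which agrees with the outlier stated on the slide, and the whiskers end at 3.2 and 8.0.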
Evidence suggests that a high indoor radon concentration might be linked to the development of childhood cancers. The data that follow are the radon concentrations in two different samples of houses. The first sample consisted of houses in which a child was diagnosed with cancer. Houses in the second sample had no recorded cases of childhood cancer.
(see data on note page)
Create parallel boxplots. Compare the distributions.
[Parallel boxplots: Cancer and No Cancer groups; horizontal axis: Radon, with ticks at 100 and 200]
• The median radon concentration for the no cancer group is lower than the median for the cancer group.
• The range of the cancer group is larger than the range for the no cancer group.
• Both distributions are skewed right.
• The cancer group has outliers at 39, 45, 57, and 210. The no cancer group has outliers at 55 and 85.
Because the median considers only the order of values, it is resistant to values that are extraordinarily large or small; it simply notes that they are one of the “big ones” or “small ones” and ignores their distance from center.
To choose between the mean and median, start by looking at the data. If the histogram is symmetric and there are no outliers, use the mean.
However, if the histogram is skewed or has outliers, you are better off with the median.
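A small Python sketch makes the resistance of the median concrete. The values are hypothetical and chosen only so the arithmetic is easy to verify.

```python
from statistics import mean, median

# Hypothetical data set, then the same data with one very large value added.
values = [4, 5, 5, 6, 7]
with_outlier = values + [60]

print("without outlier: mean =", mean(values), "median =", median(values))
print("with outlier:    mean =", mean(with_outlier), "median =", median(with_outlier))
```

Adding the single value 60 moves the mean from 5.4 to 14.5, while the median only moves from 5 to 5.5. The median treats 60 as just one more "big one" and ignores how far it is from the center.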
Tell -- What About Unusual Features?
If there are multiple modes, try to understand why.
If you identify a reason for the separate modes, it may be good to split the data into two groups.
If there are any clear outliers and you are reporting the mean and standard deviation, report them with the outliers present and with the outliers removed. The differences may be quite revealing. Note: The median and IQR are not likely to be