Feb 23, 2016
Outline
1. Histograms and boxplots
2. Mean and standard deviation
3. Proportions and bar charts
4. Sampling and allocation
5. Inference and confidence intervals
6. t tests and alternatives
7. ANOVA
8. Regression and correlation
9. More ANOVA and regression
10. Categorical data analysis
Histograms and Boxplots
Learning outcomes
Statisticalese
Making histograms
- deciding type and bin width
- the macro/micro distinction in graphing
Making boxplots
- ranking and ordering data
- learning the 5-point summary
Statisticalese
I will probably have a bagel today.
Probability of having a bagel > 50%
It takes about 20 minutes to cook rice.
The central tendency (more on what this means throughout the course) for cooking rice is 20 minutes.
Statisticalese takes English phrases that include numerical information and uncertainty and translates them, often making them more precise.
Today's data set: DNA exonerations
• Hundreds of people were found guilty of crimes, spent time in prison, and were later exonerated by DNA evidence.
• http://www.innocenceproject.org/
• http://www.fiu.edu/~dwright/steps/dnaphotos.pptx
caseno_i  first_i  last_i     state_i      year1_i  year2_i  time_i
1         Gary     Dotson     Illinois     1979     1989     10
2         David    Vasquez    Virginia     1985     1989     4
3         Edward   Green      DC           1989     1990     1
:         :        :          :            :        :        :
162       Leo      Waters     N. Carolina  1981     2005     24
163       George   Rodriquez  Texas        1987     2005     18
This is what a data file looks like in most statistics packages
The focus is on the time_i variable, the number of years in prison. The subscript i indexes the case and shows that the values vary from case to case.
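A minimal sketch of how the rows shown above could be held in code, using only the five cases visible on the slide. The `Case` class and its field names are illustrative assumptions that mirror the column headers. The slide does not define year1 and year2, but in these five rows the time value equals year2 minus year1, which the sketch checks.

```python
from dataclasses import dataclass


@dataclass
class Case:
    # Field names mirror the column headers on the slide (an assumption).
    caseno: int
    first: str
    last: str
    state: str
    year1: int
    year2: int
    time: int  # years in prison


# Only the five rows visible on the slide; the full file has 163 cases.
cases = [
    Case(1, "Gary", "Dotson", "Illinois", 1979, 1989, 10),
    Case(2, "David", "Vasquez", "Virginia", 1985, 1989, 4),
    Case(3, "Edward", "Green", "DC", 1989, 1990, 1),
    Case(162, "Leo", "Waters", "N. Carolina", 1981, 2005, 24),
    Case(163, "George", "Rodriquez", "Texas", 1987, 2005, 18),
]

# The variable of interest: one time value per case.
times = [c.time for c in cases]

# In the visible rows, time equals year2 minus year1.
consistent = all(c.time == c.year2 - c.year1 for c in cases)

print(times)
print(consistent)
```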
Frequency Table
Histogram: With dots
[Figure: histograms of years in prison. x-axis: Years in prison (0 to 30); y-axis: Frequency (0 to 20)]
Stem and leaf diagram
Deciding bin width
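To see why bin width matters, here is a small sketch that counts the same values with two different widths. It uses only the five time values visible in the data listing, not the full data set, and `bin_counts` is a hypothetical helper written for this illustration.

```python
from collections import Counter


def bin_counts(values, width):
    """Count how many values fall in each bin [start, start + width)."""
    counts = Counter((v // width) * width for v in values)
    # Return the bins in increasing order of their starting point.
    return dict(sorted(counts.items()))


# The five time values visible on the data slide (years in prison).
times = [10, 4, 1, 24, 18]

narrow = bin_counts(times, 5)   # bins start at 0, 5, 10, ...
wide = bin_counts(times, 10)    # bins start at 0, 10, 20, ...

print(narrow)
print(wide)
```

Narrow bins show more detail but leave gaps with few values; wide bins smooth the picture but hide detail.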
Name histogram
5 point summary
values: 2 4 8 3 8 7 2 2 12
sorted: 2 2 2 3 4 7 8 8 12
ranks:  1 2 3 4 5 6 7 8 9
↑ minimum   ↑ first quartile   ↑ median   ↑ third quartile   ↑ maximum
Median when n is even: the mid-rank falls between two values, so take the mean of the two middle values
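The 5 point summary can be sketched in a few lines, using the nine sorted values from the slide. The quartile rule here is an assumption: each quartile is the median of one half of the sorted data, with the middle value included in both halves when n is odd. Other conventions give slightly different quartiles. The function name `five_point_summary` is illustrative.

```python
from statistics import median


def five_point_summary(values):
    """Return (minimum, first quartile, median, third quartile, maximum).

    Assumed quartile rule: medians of the lower and upper halves of the
    sorted data, with the middle value in both halves when n is odd.
    """
    data = sorted(values)
    n = len(data)
    half = (n + 1) // 2          # size of each half
    lower = data[:half]          # smallest values
    upper = data[n - half:]      # largest values
    return (data[0], median(lower), median(data), median(upper), data[-1])


nine = [2, 2, 2, 3, 4, 7, 8, 8, 12]
summary = five_point_summary(nine)
print(summary)

# With an even number of values the median is the mean of the two middle
# values. Dropping the largest value leaves n = 8.
eight = nine[:-1]
even_median = median(eight)
print(even_median)
```

For the eight-value case the two middle values are 3 and 4, so the median is 3.5.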
Boxplots (Box and Whiskers)
[Figure: boxplots of years in prison. x-axis: Years in Prison (0 to 30)]
Comparing histograms and boxplots
Summary
• Statisticalese. A language for numbers and chance.
• Histograms. Decide bin width.
• Boxplot. Shows outliers well.
• Graphs. Make clear. Avoid adding frills.