Statistics for Everyone Workshop Fall 2010 Part 1 Statistics as a Tool in Scientific Research: Summarizing & Graphically Representing Data Workshop presented by Linda Henkel and Laura McSweeney of Fairfield University Funded by the Core Integration Initiative and the Center for Academic Excellence at Fairfield University
Statistics as a Tool in Scientific Research
Types of Research Questions
• Descriptive (What does X look like?)
• Correlational (Is there an association between X and Y? As X increases, what does Y do?)
• Experimental (Do changes in X cause changes in Y?)
Different statistical procedures allow us to answer the different kinds of research questions
Statistics as a Tool in Scientific Research
Start with the science and use statistics as a tool to answer the research question
Get your students to formulate a research question first:
• How often does this happen?
• Did all plants/people/chemicals act the same?
• What happens when I add more sunlight, give more praise, pour in more water?
Statistics as a Tool in Scientific Research
Can collect data in class
Can use already collected data (yours or database)
Helping students to formulate a research question: Ask them to think about what would be interesting to know. What do they want to find out? What do they expect?
For example, what research questions might you ask from the survey?
Nominal: numbers are arbitrary labels; e.g., 1 = male, 2 = female
Ordinal: numbers have order (i.e., more or less) but you do not know how much more or less; e.g., the 1st place runner was faster than the 2nd place runner, but you do not know how much faster
Interval: numbers have order and equal intervals, so you know how much more or less; e.g., a temperature of 102 is 2 degrees higher than one of 100
Ratio: same as interval, but because there is an absolute zero you can talk meaningfully about twice as much and half as much; e.g., weighing 200 pounds is twice as heavy as weighing 100 pounds
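As a rough illustration of how the four levels build on one another, the sketch below encodes the properties listed above and reports which comparisons make sense at each level. The names `LEVELS` and `meaningful_statements` are made up for this example, and the "same or different" comparison for nominal data is added here as an assumption about the most basic comparison any label supports.

```python
# Properties of each level of measurement, following the definitions above.
LEVELS = {
    "nominal":  {"order": False, "equal_intervals": False, "absolute_zero": False},
    "ordinal":  {"order": True,  "equal_intervals": False, "absolute_zero": False},
    "interval": {"order": True,  "equal_intervals": True,  "absolute_zero": False},
    "ratio":    {"order": True,  "equal_intervals": True,  "absolute_zero": True},
}


def meaningful_statements(level):
    """Return the kinds of comparisons that make sense at the given level."""
    props = LEVELS[level]
    statements = ["same or different"]  # assumption: labels can always be matched
    if props["order"]:
        statements.append("more or less")
    if props["equal_intervals"]:
        statements.append("how much more or less")
    if props["absolute_zero"]:
        statements.append("twice as much or half as much")
    return statements


for level in LEVELS:
    print(f"{level}: {', '.join(meaningful_statements(level))}")
```

Each level keeps every comparison allowed at the level before it and adds one more, which is why ratio data supports the longest list.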
Types of Data on Questionnaire
1. What College are you from?
   CAS   Business   Engineering   Nursing   Other
2. How many years have you been teaching at Fairfield University?
3. How important do you think it is to integrate statistics into your courses?
   Not at all   Somewhat Important   Important   Very Important
4. How excited are you about integrating statistics into your courses?
   1   2   3   4   5   6   7
   (1 = Not at all excited, 7 = Extremely excited)
Types of Data on Questionnaire
5. Are you male or female?
6. How many hours a week do you watch television on average?
7. How many hours a day do you spend on the internet on average?
8. Of the following reality/game shows, which one would you most like to be on?
   (a) Dancing With the Stars (b) American Idol (c) Bachelor/Bachelorette (d) The Apprentice
9. Can you roll your tongue? Yes No
10. How many siblings do you have?
Types of Statistical Procedures
Descriptive: Organize and summarize data
Inferential: Draw inferences about the relations between variables; use samples to generalize to population
Descriptive Statistics
The first step is ALWAYS getting to know your data
Summarize and visualize your data
It is a big mistake to just throw numbers into the computer and look at the output of a statistical test without any idea what those numbers are trying to tell you or without checking if the assumptions for the test are met.
Descriptive Statistics
Numerical Summaries:
• Frequencies
• Contingency tables
• Measures of central tendency
• Measures of variability
• Representing numerical summaries in tables
Graphical Summaries:
• Bar graphs or Pie graphs
• Histograms
• Scatterplots
• Time series plot
Summarizing and Reporting Categorical Data
Frequency = number of times each score occurs in a set of data
Relative Frequency = percent or proportion of times each score occurs in a set of data
Frequency Table
Marital Status    Frequency (f)    Relative Frequency (rel f)
Married           34               .14
Widowed           129              .54
Divorced          35               .15
Separated         30               .12
Never Married     13               .05
Total             241              1.00
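The frequency and relative frequency columns can be reproduced with a few lines of Python. This is a minimal sketch, assuming the raw data is a list with one marital status per person; the list is rebuilt here from the counts in the table, and `frequency_table` is a name chosen for this example.

```python
from collections import Counter


def frequency_table(observations):
    """Map each category to (frequency, relative frequency)."""
    counts = Counter(observations)
    total = sum(counts.values())
    return {category: (f, f / total) for category, f in counts.items()}


# Rebuild one observation per person from the counts in the table.
table_counts = {
    "Married": 34,
    "Widowed": 129,
    "Divorced": 35,
    "Separated": 30,
    "Never Married": 13,
}
observations = [status for status, f in table_counts.items() for _ in range(f)]

table = frequency_table(observations)
for status, (f, rel_f) in table.items():
    print(f"{status:<15}{f:>5}{rel_f:>8.2f}")
print(f"{'Total':<15}{len(observations):>5}{1.0:>8.2f}")
```

Relative frequency is just frequency divided by the total number of observations, so the relative frequencies always sum to 1 (before rounding).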
Contingency Table
A display to summarize two categorical variables in a table.
Each entry in the table represents the number of observations in a sample with a certain outcome for the 2 variables.
Contingency Tables
Gender    Binge Drinker    Non-binge Drinker    Total
Male      1908             2017                 3925
Female    2854             4125                 6979
Total     4762             6142                 10904
Contingency Tables
Gender    Binge Drinker    Non-binge Drinker    Total
Male      49%              51%                  3925
Female    41%              59%                  6979
Total     4762             6142                 10904
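The percentages in the second table are each count divided by its row total, so they describe drinking status within each gender. A minimal sketch of that calculation, using the counts from the first table (`row_proportions` is a name made up for this example):

```python
# Counts from the contingency table: gender by drinking status.
counts = {
    "Male": {"Binge": 1908, "Non-binge": 2017},
    "Female": {"Binge": 2854, "Non-binge": 4125},
}


def row_proportions(table):
    """Divide each cell by its row total (proportions conditioned on the row)."""
    result = {}
    for row, cells in table.items():
        row_total = sum(cells.values())
        result[row] = {col: n / row_total for col, n in cells.items()}
    return result


props = row_proportions(counts)
for gender, cells in props.items():
    formatted = ", ".join(f"{col}: {p:.0%}" for col, p in cells.items())
    print(f"{gender}: {formatted}")
```

Converting to row proportions makes the two groups comparable even though the sample contains many more females (6979) than males (3925).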
Choosing the Appropriate Type of Graph
One categorical variable (e.g., Political party): Bar Chart or Pie Graph
Two categorical variables (e.g., Political party vs. Gender): Side-by-side Bar Chart
*Notice with 2 variables, one variable may be treated as the dependent variable and one variable may be treated as the independent variable.
Choosing the Appropriate Type of Graph
One numerical variable (e.g., Height): Histogram
One numerical variable and one categorical variable (e.g., Height vs. Gender): Side-by-side Histograms
Two paired numerical variables (e.g., Weight vs. Exercise per week): Scatterplot
One numerical variable over time (e.g., Number of Cells vs. Minutes): Time Series Plot
*Notice with 2 variables, one variable may be treated as the dependent variable and one variable may be treated as the independent variable.
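The choices on these two slides amount to a small lookup from the types of variables to a graph. The sketch below writes that lookup out; the function name `choose_graph` and the `over_time` flag are assumptions made for this example, not part of the workshop materials.

```python
def choose_graph(variable_types, over_time=False):
    """Suggest a graph for one or two variables.

    variable_types: a sequence of "categorical" and/or "numerical".
    over_time: True when a single numerical variable is tracked over time.
    """
    kinds = tuple(sorted(variable_types))
    if over_time and kinds == ("numerical",):
        return "time series plot"
    lookup = {
        ("categorical",): "bar chart or pie graph",
        ("categorical", "categorical"): "side-by-side bar chart",
        ("numerical",): "histogram",
        ("categorical", "numerical"): "side-by-side histograms",
        ("numerical", "numerical"): "scatterplot",
    }
    if kinds not in lookup:
        raise ValueError(f"no suggestion for variable types {kinds}")
    return lookup[kinds]


print(choose_graph(["categorical"]))                # e.g., Political party
print(choose_graph(["numerical", "categorical"]))   # e.g., Height vs. Gender
print(choose_graph(["numerical", "numerical"]))     # e.g., Weight vs. Exercise
print(choose_graph(["numerical"], over_time=True))  # e.g., Cells vs. Minutes
```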
Bar Graph (Frequency)
Bar Graph (Relative Frequency)
Pie Chart
Simple Frequency Tables and Bar Graphs
Side by Side Bar Charts
[Figure: side-by-side bar chart of Proportions (y-axis, 0.00 to 0.70) for Binge and Non-binge drinkers, with separate bars for males (M) and females (F)]
Conditioned Proportions
          Binge Drinker    Nonbinge Drinker
Male      0.49             0.51
Female    0.41             0.59
Histogram of Simple Frequency Data
[Figure: Size Frequency Distribution for Male Crabs. Histogram; x-axis: CW (mm) in 4 mm intervals from 11.00 to 51.00; y-axis: Counts, 0 to 14]
[Figure: Size Frequency Distribution for Female Crabs. Histogram; x-axis: CW (mm) in 4 mm intervals from 11.00 to 51.00; y-axis: Counts, 0 to 4.5]
Side by Side Histograms
Percent Overweight    Threshold of Pain
89                    2
90                    3
75                    4
30                    4.5
51                    5.5
75                    7
62                    9
45                    13
90                    15
20                    14
Scatterplot
Time Series Plot
[Figure: Software File Updates. Time series plot; x-axis: Month Number (0 to 14); y-axis: Number of Updates (0 to 500)]
Month    Number of Updates
1        323
2        268
3        290
4        405
5        383
6        368
...      ...
12       75
Shapes of Distributions
• Normal (approximately symmetric)
• Skewed
• Unimodal/Bimodal/Uniform/Other
• Outliers
The Normal Curve
“Bell-shaped”
Most scores in center, tapering off symmetrically in both tails
Amount of peakedness (kurtosis) can vary
Variations to Normal Distribution
Skew = Asymmetrical distribution
• Positive/right skew = greater frequency of low scores than high scores (longer tail on high end/right)
• Negative/left skew = greater frequency of high scores than low scores (longer tail on low end/left)
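One way to put a number on the direction of skew is the standardized third moment: it is positive when the longer tail is on the high end and negative when it is on the low end. The sketch below is an illustration under stated assumptions, not a method from the workshop: the scores are hypothetical, and the cutoff of 0.5 for calling a distribution "approximately symmetric" is an arbitrary choice. Looking at the histogram remains the first step.

```python
def skewness(scores):
    """Standardized third moment: > 0 for right skew, < 0 for left skew."""
    n = len(scores)
    mean = sum(scores) / n
    m2 = sum((x - mean) ** 2 for x in scores) / n
    m3 = sum((x - mean) ** 3 for x in scores) / n
    if m2 == 0:
        return 0.0  # all scores identical, so there is no skew to measure
    return m3 / m2 ** 1.5


def describe_skew(scores, tolerance=0.5):
    """Label the direction of skew; tolerance is an arbitrary cutoff."""
    g = skewness(scores)
    if g > tolerance:
        return "positive/right skew"
    if g < -tolerance:
        return "negative/left skew"
    return "approximately symmetric"


# Hypothetical scores: mostly low values with one high score in the right tail.
right_skewed = [1, 1, 1, 2, 2, 2, 3, 3, 4, 9]
# Mirror image of the same scores, so the long tail is on the low end.
left_skewed = [10 - x for x in right_skewed]

print(describe_skew(right_skewed))
print(describe_skew(left_skewed))
print(describe_skew([1, 2, 3, 4, 5]))
```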
Histogram Showing Positive (Right) Skew
Variations to Normal Distribution
Bimodal distribution: two peaks
Rectangular/Uniform: all scores occur with equal frequency
Potential Outlier: An observation that is well above or below the overall bulk of the data
Important to determine normality (look at the histogram of the data) so you can choose appropriate measures of central tendency and variability
Examples of Bad Graphs: What is Wrong With the Picture?
[Figure: graph of Pollen Count by Location (On campus, In town); y-axis runs from 7.7 to 8.4]
Examples of Bad Graphs: What is Wrong With the Picture?
[Figure: graph of Pollen Count by Location (On campus, In town); y-axis runs from 0 to 50]
A BETTER Graph
[Figure: graph of Pollen Count by Location (On campus, In town); y-axis runs from 0 to 10]
[Figure: graph of Average Pollen Count by Year (2006, 2007); y-axis runs from 0 to 5]
[Figure: graph of Pollen Count by Year (2000 to 2008); y-axis runs from 0 to 6]
[Figure: graph of Pollen Count by Year (2000 to 2008); y-axis runs from 0 to 20]
Example of a Bad Graph
Graph the distribution of the first digits.
First Digit    Frequency
1              109
2              75
3              77
4              99
5              72
6              117
7              89
8              62
9              43
[Figure: bar chart with y-axis from 0 to 140 and x-axis from 1 to 9; the only labels are "FIRST" and "NUMBER"]
Example of a Bad Graph
Graph the distribution of the first digits.
First Digit    Frequency
1              109
2              75
3              77
4              99
5              72
6              117
7              89
8              62
9              43
[Figure: bar chart with y-axis from 0 to 140 and x-axis from 1 to 9; the only label is "NUMBER"]
Example of a GOOD Graph
Graph the distribution of the first digits.
First Digit    Frequency
1              109
2              75
3              77
4              99
5              72
6              117
7              89
8              62
9              43
[Figure: Distribution of the First Digits. Bar chart; x-axis: First Digit (1 to 9); y-axis: Number of Occurrences (0 to 140)]
Example of a Bad Graph
Graph the distribution of the number of bacteria in the cultures sampled.
Number of Bacteria
41
33
43
52
46
37
44
49
53
30
[Figure: Bacteria. Graph with x-axis from 1 to 10 and y-axis from 0 to 60; legend: BAC]
Example of a Bad Graph
Graph the distribution of the number of bacteria in the cultures sampled.
Number of Bacteria
41
33
43
52
46
37
44
49
53
30
[Figure: Distribution of Bacteria. Histogram; x-axis: Number of Bacteria in intervals of 3 from 30.00 to 60.00; y-axis: Number of Samples (0 to 2.5)]
Example of a GOOD Graph
Graph the distribution of the number of bacteria in the cultures sampled.
Number of Bacteria
41
33
43
52
46
37
44
49
53
30
[Figure: Distribution of Bacteria. Histogram; x-axis: Number of Bacteria in intervals of 10 from 30.00 to 130.00; y-axis: Number of Samples (0 to 6)]
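The effect of interval width on a histogram of the ten bacteria counts can be checked by counting how many samples fall in each interval. This is a minimal sketch: `bin_counts` is a name chosen for this example, and placing a boundary value such as 33 in the upper interval (33 to 36) is an assumption, since the slides do not say how boundary values were assigned.

```python
def bin_counts(scores, start, width, n_bins):
    """Count scores in equal-width intervals.

    Interval i covers start + i*width up to, but not including,
    start + (i+1)*width (an assumed convention for boundary values).
    """
    counts = [0] * n_bins
    for x in scores:
        i = int((x - start) // width)
        if 0 <= i < n_bins:
            counts[i] += 1
    return counts


bacteria = [41, 33, 43, 52, 46, 37, 44, 49, 53, 30]

narrow = bin_counts(bacteria, start=30, width=3, n_bins=10)  # intervals of 3
wide = bin_counts(bacteria, start=30, width=10, n_bins=3)    # intervals of 10

for label, width, counts in [("Width 3", 3, narrow), ("Width 10", 10, wide)]:
    print(label)
    for i, c in enumerate(counts):
        low = 30 + i * width
        print(f"  {low}-{low + width}: {'#' * c}")
```

With intervals of 3, no interval holds more than two samples and the overall shape is hard to see; with intervals of 10, the counts of 3, 5, and 2 show where most samples fall.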
Guidelines for Good Graphs
• Label both axes and provide a heading to make clear what the graph is representing.
• Vertical axes should usually start at 0 to help our eyes compare relative sizes.
• Remove any clutter that isn’t needed or is distracting.
• The axes may need to be resized to remove extra white space.
• Be careful when using unusual bars, since they make it easy to misjudge the relative percentages that the figures represent.
• Displaying information for more than one group on the same graph can be difficult, especially when the values differ greatly. Consider using relative frequencies or separate graphs instead.
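To see why the vertical axis should usually start at 0, compare how tall two bars look relative to each other under different axis starting points. The pollen counts below are hypothetical values chosen only for illustration (they are not read from the workshop's graphs), and `apparent_ratio` is a name made up for this sketch.

```python
def apparent_ratio(taller, shorter, axis_start=0.0):
    """Ratio of drawn bar heights when the vertical axis starts at axis_start."""
    if shorter <= axis_start:
        raise ValueError("axis_start must be below both values")
    return (taller - axis_start) / (shorter - axis_start)


# Hypothetical pollen counts for two locations.
in_town, on_campus = 8.3, 7.9

truncated = apparent_ratio(in_town, on_campus, axis_start=7.7)
from_zero = apparent_ratio(in_town, on_campus, axis_start=0.0)

print(f"Axis starting at 7.7: taller bar looks {truncated:.1f}x as tall")
print(f"Axis starting at 0:   taller bar looks {from_zero:.2f}x as tall")
```

With the axis starting at 7.7, one bar is drawn about three times as tall as the other even though the values differ by only about 5%.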
Other Guidelines to Making Graphs
• The Y axis should be about ¾ as tall as the X axis is long
• When the number of score values on X axis is large, scores should be collapsed so there are at least 5 intervals but no more than 12
• The width of each interval on the X axis should be equal
• Frequency on the Y axis must be continuous and regular
• Range on the Y axis and X axis must neither unduly compress nor unduly stretch the data
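The guidelines on the number and width of intervals can be sketched as a small helper that builds equal-width interval boundaries and rejects counts outside the 5 to 12 range. The function name `interval_edges` is an assumption made for this example; the range of 11 to 51 with 10 intervals matches the CW (mm) axis of the crab histograms shown earlier.

```python
def interval_edges(low, high, n_intervals):
    """Return boundaries for n_intervals equal-width intervals from low to high."""
    if not 5 <= n_intervals <= 12:
        raise ValueError("use at least 5 and no more than 12 intervals")
    if high <= low:
        raise ValueError("high must be greater than low")
    width = (high - low) / n_intervals
    return [low + i * width for i in range(n_intervals + 1)]


edges = interval_edges(11, 51, 10)
labels = [f"{a:.2f}-{b:.2f}" for a, b in zip(edges, edges[1:])]
print(labels)
```

Because every interval has the same width, the bar heights can be compared directly without adjusting for interval size.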
• Looking at shape of distribution
• Making graphs
Teaching tips: • Hands-on practice is important for your