Preliminaries Introduction to Statistical Investigations
Feb 24, 2016
Statistics vs. Anecdotal Evidence
Smoking causes cancer. Seat belts save lives.
Do Vaccines Cause Autism?
Nelson says it wasn't long after her son Parker's shots at 15 months that she noticed something was wrong. "He had run a slight fever after the vaccinations, but I didn't think anything of it," said Nelson. "… about a week after that he just completely stopped talking." After months of worrying, wondering, and going back and forth with doctors, an official diagnosis was made: autism. Nelson believes it started with the vaccines. "Gradually, I started piecing it together. He got sick after his vaccinations and about a week later everything changed. He was a completely different little boy then," said Nelson.
http://www.wsaz.com/charleston/headlines/19376044.html
Statistics
Scientific conclusions cannot be based on anecdotal evidence. We need evidence from data.
Statistics is the science of producing useful data to address a research question, analyzing the resulting data, and drawing appropriate conclusions from the data.
Six-Step Statistical Investigation Method
1. Ask a research question (research hypothesis)
2. Design a study and collect data
3. Explore the data
4. Draw inferences (logic of inference: significance, estimation)
5. Formulate conclusions (scope of inference: generalization, causation)
6. Look back and ahead
Example P.1: Organ Donations
While a majority of people approve of organ donation in principle, far fewer actually sign up when getting a driver’s license.
Different states have different recruiting methods.
Do these different methods result in different sign-up rates?
Recruiting Organ Donors
Step 1: Ask a research question
In general: Is there a method that will increase the likelihood that a person agrees to become an organ donor?
More specifically: Does the default option presented to driver’s license applicants influence the likelihood of someone becoming an organ donor?
Recruiting Organ Donors
Step 2: Design a study and collect data
The researchers decided to recruit various participants and ask them to pretend to apply for a new driver’s license.
The participants did not know in advance that different options were given for the donor question, or even that this issue was the main focus of the study.
They offered an incentive of $4.00 for completing an online survey. After the results were collected, the researchers removed data arising from multiple responses from the same IP address, surveys completed in less than five seconds, and respondents whose residential address could not be verified.
Recruiting Organ Donors
Step 2: Design a study and collect data
Some of the participants were forced to make a choice of becoming a donor or not, without being given a default option (the “neutral” group).
Other participants were told that the default option was not to be a donor but that they could choose to become a donor if they wished (the “opt-in” group).
The remaining participants were told that the default option was to be a donor but that they could choose not to become a donor if they wished (the “opt-out” group).
Recruiting Organ Donors
Step 3: Explore the data
44 of the 56 (78.6%) participants in the neutral group agreed to become organ donors,
23 of the 55 (41.8%) participants in the opt-in group agreed to become organ donors, and
41 of the 50 (82.0%) participants in the opt-out group agreed to become organ donors.
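The percentages in Step 3 are each group's count of donors divided by the group size. A minimal Python sketch of that arithmetic, using only the counts reported above (the names `counts`, `proportion`, and `rates` are chosen here for illustration):

```python
# Counts reported in Step 3: (agreed to donate, group size).
counts = {
    "neutral": (44, 56),
    "opt-in": (23, 55),
    "opt-out": (41, 50),
}


def proportion(agreed, total):
    """Return the fraction of a group that agreed to become donors."""
    return agreed / total


rates = {group: proportion(a, n) for group, (a, n) in counts.items()}

for group, rate in rates.items():
    agreed, total = counts[group]
    # The .1% format multiplies by 100 and keeps one decimal place.
    print(f"{group:8s} {agreed}/{total} = {rate:.1%}")
```

Comparing proportions rather than raw counts matters here because the three groups are not the same size.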
Recruiting Organ Donors
Step 4: Draw inferences beyond the data
Using methods that you will learn in this course, the researchers analyzed whether the observed differences between the groups were large enough to indicate that the default option had a genuine effect.
In particular, they reported strong evidence that the neutral and opt-out versions do lead to a higher chance of agreeing to become a donor, as compared to the opt-in version currently used in many states.
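The slides do not say which method the researchers used. One way to get a feel for "strong evidence" is a simulation that repeatedly re-shuffles the group labels and checks how often chance alone produces a difference as large as the one observed. The sketch below is an assumption made for illustration, not the researchers' analysis. It compares the neutral and opt-in groups, and the function name `simulate_p_value`, the number of shuffles, and the seed are all hypothetical choices.

```python
import random


def simulate_p_value(agreed_a, n_a, agreed_b, n_b, shuffles=10_000, seed=1):
    """Approximate a two-sided p-value by re-randomizing group labels."""
    rng = random.Random(seed)
    observed = agreed_a / n_a - agreed_b / n_b
    # Pool all responses: 1 = agreed to donate, 0 = did not.
    total_yes = agreed_a + agreed_b
    responses = [1] * total_yes + [0] * (n_a + n_b - total_yes)
    extreme = 0
    for _ in range(shuffles):
        rng.shuffle(responses)
        # The first n_a shuffled responses play the role of group A.
        diff = sum(responses[:n_a]) / n_a - sum(responses[n_a:]) / n_b
        # Small tolerance guards against floating-point ties.
        if abs(diff) >= abs(observed) - 1e-12:
            extreme += 1
    return observed, extreme / shuffles


# Neutral group (44 of 56) versus opt-in group (23 of 55).
observed_diff, p_value = simulate_p_value(44, 56, 23, 55)
print(f"observed difference: {observed_diff:.3f}")
print(f"approximate p-value: {p_value:.4f}")
```

A very small proportion of shuffles matching or beating the observed difference suggests that the difference is unlikely to be due to random assignment alone.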
In fact, they could be quite confident that the neutral version increases the chances that a person agrees to become a donor by between 20 and 54 percentage points, a difference large enough to save thousands of lives per year in the United States.
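As a rough check on the size of that interval, the sketch below computes a standard large-sample 95% confidence interval for a difference in two proportions, applied to the neutral and opt-in counts. The choice of this particular formula is an assumption, since the slides do not state how the researchers obtained their interval, and the function name `diff_in_proportions_ci` is hypothetical.

```python
import math


def diff_in_proportions_ci(agreed_a, n_a, agreed_b, n_b, z=1.96):
    """Large-sample interval for p_a - p_b; z = 1.96 gives about 95%."""
    p_a = agreed_a / n_a
    p_b = agreed_b / n_b
    # Standard error of the difference between two independent proportions.
    se = math.sqrt(p_a * (1 - p_a) / n_a + p_b * (1 - p_b) / n_b)
    diff = p_a - p_b
    return diff - z * se, diff + z * se


# Neutral group (44 of 56) versus opt-in group (23 of 55).
lower, upper = diff_in_proportions_ci(44, 56, 23, 55)
print(f"{lower:.3f} to {upper:.3f}")
```

Under these assumptions the endpoints round to about 20 and 54 percentage points, in line with the range quoted above.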
Recruiting Organ Donors
Step 5: Formulate conclusions
Based on the analysis of the data and the design of the study, it is reasonable for these researchers to conclude that the neutral version causes an increase in the proportion who agree to become donors.
But because the participants in the study were volunteers recruited from internet bulletin boards, generalizing conclusions beyond these participants is only legitimate if they are representative of a larger group of people.
Recruiting Organ Donors
Step 6: Look back and ahead
One limitation of the study is that participants were asked to imagine how they would respond, which might not mirror how people would actually respond in such a situation.
A new study might look at people’s actual responses to questions about organ donation or could monitor donor rates for states that adopt a new policy.
Researchers could also examine whether presenting educational material on organ donation might increase people’s willingness to donate.
Another improvement would be to include participants from wider demographic groups than these volunteers.
Terminology
The individual entities on which data are recorded are called observational units.
The recorded characteristics of the observational units are the variables of interest.
Variables can be:
◦ Quantitative: You can add, subtract, etc. with the values. Height, weight, distance, time…
◦ Categorical: Labels for which arithmetic does not make sense. Sex, ethnicity, eye color…
What are the observational units and variables in the Organ Donation Study?
More Terminology
The distribution of a variable describes the pattern of value/category outcomes.
For the organ donation study the bar chart shown displays the distribution of responses.
Example P.2: Old Faithful
How faithful is Old Faithful?
Can the time of the next eruption be accurately predicted?
Old Faithful
Researchers collected data on 222 eruptions taken over a number of days in the summers of 1978 and 1979.
The results are shown in a dotplot.
[Figure: dotplot of time until next eruption (min), axis from 40 to 100]
Old Faithful
What are the observational units and variable in this study? Is the variable quantitative or categorical?
We can see from the dotplot that Old Faithful is not perfectly predictable. The time until the next eruption varies from eruption to eruption.
This variability is the most fundamental property in studying Statistics. Without variability, we wouldn’t need statistics.
Old Faithful
Let’s take another look at the dotplot and describe the distribution.
What could be some explanations for the variability?
[Figure: dotplot of time until next eruption (min), axis from 40 to 100]
Old Faithful
One explanation could be the duration of the previous eruption (short: < 3.5 min., or long: > 3.5 min.).
[Figure: dotplots of time until next eruption (min), axis from 40 to 100, by eruption type (short, long)]
Old Faithful
Summer 2005
Old Faithful
One way to measure the center of a distribution is with the average, also called the mean.
One way to measure variability is with the standard deviation, which is roughly the average distance between a data value in the distribution and the mean of the distribution.
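To make "roughly the average distance from the mean" concrete, here is a small Python sketch. The five waiting times are invented for illustration only and are not the Old Faithful measurements. The code uses the sample standard deviation (dividing by n - 1), which is an assumption about the intended definition.

```python
import statistics

# Hypothetical waiting times in minutes, invented for illustration only;
# these are NOT the Old Faithful measurements.
times = [55, 60, 70, 80, 85]

mean = statistics.mean(times)
# Sample standard deviation: square root of the sum of squared
# deviations from the mean, divided by (n - 1).
sd = statistics.stdev(times)
# Plain average of the absolute distances from the mean, for comparison.
avg_distance = sum(abs(t - mean) for t in times) / len(times)

print(f"mean = {mean}")
print(f"standard deviation = {sd:.2f}")
print(f"average distance from mean = {avg_distance:.2f}")
```

The two measures of spread come out close but not identical, which is why the standard deviation is described as only roughly the average distance from the mean.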
Old Faithful
                        Mean    Standard deviation
Overall                 71.0    12.8
After short duration    56.3    8.5
After long duration     78.7    6.3
[Figure: dotplots of time until next eruption (min), axis from 40 to 100, by eruption type (short, long)]
Old Faithful
Basic Terminology
Some aspects to look for in a distribution of a quantitative variable are:
◦ Shape: Is the distribution symmetric? Mound-shaped? Are there several peaks or clusters?
◦ Center: Where is the distribution centered? What is a typical value?
◦ Variability: How spread out are the data? Are most within a certain range of values?
◦ Unusual observations: Are there outliers that deviate markedly from the overall pattern of the other data values? Are there other unusual features in the distribution?
Exploration P.3: Cars or Goats
Pages P-13 to P-17