1 Where do data come from and Why we don’t (always) trust statisticians.

Dec 14, 2015

Gideon Peres

Induction vs. Deduction: the gist of statistics

• Deduction: “What is true about the whole, must be true about a part.”

• Induction: “What is true about the part might be true about the whole.”


Population vs. Sample

• A population is the entire group of individuals about which we want information.

• A sample is the part of the population from which we actually collect information.

• We use samples to study populations because populations are often impossible or impractical to study in full.


Real-Life Example of a Bad Sample

• Ann Landers, a famous columnist, collected a sample of 10,000 people who wrote in to answer this question: “If you could do it all over again, would you have children?”

• 70% of the respondents said that they would not have children.

• When a sample was selected at random, 91% of the people said that they would have children.


Potential problems with sample surveys

• Undercoverage occurs when some groups in the population are left out of the process of choosing the sample.

• Nonresponse occurs when an individual chosen for the sample cannot be contacted or refuses to respond.


Another Real-Life Example of a Bad Sample

• In 1936 the Literary Digest mailed out 10,000,000 ballots asking respondents whom they were going to vote for: A. Landon or F.D. Roosevelt.

• 2,300,000 ballots were returned, predicting a strong win (57%) for Landon.


Another Real-Life Example of a Bad Sample

• George Gallup surveyed 50,000 people chosen randomly.

• Comparison of forecasts (percentage of the vote for Roosevelt):

– Gallup’s prediction: 56%

– Gallup’s prediction of the Digest’s forecast: 44%

– The Digest’s own prediction: 43%

– Actual vote: 62%

• The Literary Digest drew its mailing list from its own subscription list, phone directories, lists of car owners, and club membership lists.


Right and Wrong Ways to Sample

• A simple random sample (SRS) of size n is a sample chosen so that (1) each unit of the population has an equal chance of being chosen and (2) every possible set of n units is equally likely to be the sample.

• A sampling method is biased if it gives at least one group of individuals a greater chance of being selected than others.


Example of a good sample

• You want to study the effect of computers on GPA. You don’t have the resources to study all students.

• To select a sample of students for the study, you:

– Get a list of all students,

– Select students from the list at random,

– Collect information from the students selected,

– Compare those who have a computer with those who don’t.
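
The selection steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the roster, the `has_computer` and `gpa` fields, and the sample size of 100 are all made up for the example and do not come from the slides.

```python
import random
from statistics import mean

rng = random.Random(42)  # fixed seed so the sketch is reproducible

# Hypothetical roster standing in for "a list of all students".
roster = [
    {"id": i, "has_computer": rng.random() < 0.6, "gpa": round(rng.uniform(2.0, 4.0), 2)}
    for i in range(1000)
]

def simple_random_sample(population, n, rng):
    """Draw n units without replacement; every subset of size n is equally likely."""
    return rng.sample(population, n)

sample = simple_random_sample(roster, 100, rng)

# Compare students who have a computer with those who do not.
with_pc = [s["gpa"] for s in sample if s["has_computer"]]
without_pc = [s["gpa"] for s in sample if not s["has_computer"]]
print(f"with computer:    n={len(with_pc)}, mean GPA={mean(with_pc):.2f}")
print(f"without computer: n={len(without_pc)}, mean GPA={mean(without_pc):.2f}")
```

In this simulated roster GPA is generated independently of computer ownership, so the two means should come out close; with real data the comparison would be the point of the study.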


Example of a bad sample

• You want to study the effect of computers on GPA. You don’t have the resources to study all students.

• To select a sample of students for the study, you:

– Use your friends.

– Hang an ad in the computer lab.

– Post an online questionnaire on the WKU site.


Stratified Random Sample

• When we know the proportion of each group in the population, a stratified random sample can be better than a simple random sample.

• In a stratified sample, the number of people chosen from each group is proportional to the size of that group in the population.
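
A minimal sketch of proportional allocation, assuming a made-up population split into three class years (the group names and sizes are hypothetical). Each group contributes a share of the sample equal to its share of the population, and units are chosen at random within each group.

```python
import random

def stratified_sample(strata, n, rng):
    """Proportional allocation: sample each stratum in proportion to its size."""
    total = sum(len(units) for units in strata.values())
    sample = {}
    for name, units in strata.items():
        # Rounding can make the overall total differ slightly from n.
        k = round(n * len(units) / total)
        sample[name] = rng.sample(units, k)
    return sample

rng = random.Random(0)

# Hypothetical population: 600 freshmen, 300 sophomores, 100 seniors.
strata = {
    "freshman": [f"F{i}" for i in range(600)],
    "sophomore": [f"S{i}" for i in range(300)],
    "senior": [f"R{i}" for i in range(100)],
}

chosen = stratified_sample(strata, 50, rng)
for name, units in chosen.items():
    print(name, len(units))
```

With these sizes a sample of 50 takes 30 freshmen, 15 sophomores, and 5 seniors, matching the 60/30/10 split of the population.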


Confounding

• Two explanatory variables are confounded when their effects on the response variable cannot be distinguished from each other.

• Confounding is often a problem in studies that use sample surveys to collect data (even if the sampling is done right).


Observation vs. Experiment

• An observational study observes individuals and measures variables but does not attempt to influence responses.

• An experiment imposes a treatment on individuals in order to observe their responses.


How to design an Experiment

• The purpose of an experiment is to find out how one variable (the response variable) changes in response to a change in another variable (the explanatory variable).

• Experiment: Subject → Treatment → Response


Placebo Effect

• The placebo effect is a change in behavior due to participation in the experiment itself, rather than to the treatment.

• The placebo effect is a problem when an experiment does not have a control group (a basis for comparison).

• To avoid the problem, design a randomized comparative experiment.


How to design a Randomized Comparative Experiment

• Randomly split the subjects into two groups:

– control group: receives no treatment (typically a placebo instead)

– treatment group: receives the treatment

• Compare the results.

• Both groups will be equally affected by the placebo effect, so the difference between the groups shows whether the treatment works.
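
The random split can be sketched as follows. The subject labels and the even two-way split are assumptions made for illustration; the slides do not prescribe group sizes.

```python
import random

def randomize(subjects, rng):
    """Shuffle the subjects and split them into (control, treatment) halves."""
    shuffled = list(subjects)  # copy so the caller's list is left untouched
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]

rng = random.Random(7)
subjects = [f"subject_{i}" for i in range(20)]  # hypothetical participants
control, treatment = randomize(subjects, rng)

print("control:  ", sorted(control))
print("treatment:", sorted(treatment))
```

Because chance alone decides who lands in which group, neither the experimenter nor the subjects can steer particular kinds of people toward the treatment.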


How to interpret results of an experiment

• Observe outcomes for the treatment and control groups.

• If the outcomes differ so much that such a difference would rarely occur by chance alone, we conclude that the difference is statistically significant.
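
One way to make "would rarely occur by chance" concrete is a permutation test, sketched below. The slides do not name a specific test, so this choice is an assumption, and the outcome numbers are invented. The idea: if the treatment did nothing, the group labels are arbitrary, so we reshuffle them many times and count how often the shuffled difference is at least as large as the observed one.

```python
import random
from statistics import mean

def permutation_p_value(treatment, control, n_perm=10_000, seed=0):
    """Estimate how often a random relabeling gives a difference in means
    at least as large (in absolute value) as the observed one."""
    rng = random.Random(seed)
    observed = abs(mean(treatment) - mean(control))
    pooled = list(treatment) + list(control)
    k = len(treatment)
    extreme = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        diff = abs(mean(pooled[:k]) - mean(pooled[k:]))
        if diff >= observed:
            extreme += 1
    return extreme / n_perm

# Hypothetical outcomes for the two groups.
treatment = [12.1, 13.4, 11.8, 14.0, 13.1, 12.7, 13.9, 12.5]
control = [10.2, 11.0, 9.8, 10.9, 11.4, 10.1, 10.7, 11.2]

p = permutation_p_value(treatment, control)
print(f"estimated p-value: {p:.4f}")
```

A small estimated p-value means the observed difference would rarely arise from random labeling alone, which is the sense of "statistically significant" used above.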


Population vs. Sample

• A population is the entire group of individuals about which we want information.

• A sample is the part of the population from which we actually collect information.

• Based on the sample, we draw conclusions about the whole population.


Parameter vs. Statistic

• A parameter is a number that describes the population.

• A statistic is a number that describes the sample.

• We use statistics to estimate parameters.


Sampling Distribution

• The result of your study is a statistic, which can vary from sample to sample.

• The sampling distribution of a statistic is the distribution of values taken by the statistic in all possible samples of the same size from the same population.

• Estimate = True Parameter + Sampling Error
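
A sampling distribution can be approximated by simulation: draw many samples of the same size and record the statistic each time. The sketch below assumes a made-up population in which 30% of units have some attribute; the population size, sample size, and number of repetitions are all arbitrary choices for illustration.

```python
import random
from statistics import mean, stdev

rng = random.Random(1)

# Hypothetical population of 10,000 units; 1 = has the attribute, 0 = does not.
population = [1] * 3000 + [0] * 7000
true_p = sum(population) / len(population)  # the parameter

def sample_proportion(n):
    """The statistic: proportion with the attribute in one random sample of size n."""
    return sum(rng.sample(population, n)) / n

# Approximate the sampling distribution with 2,000 samples of size 100.
estimates = [sample_proportion(100) for _ in range(2000)]

# Estimate = True Parameter + Sampling Error, so the error is the difference.
errors = [est - true_p for est in estimates]

print(f"true parameter:        {true_p:.3f}")
print(f"mean of the estimates: {mean(estimates):.3f}")
print(f"spread (sd):           {stdev(estimates):.3f}")
print(f"largest error seen:    {max(abs(e) for e in errors):.3f}")
```

Individual estimates miss the parameter in both directions, but with random sampling the estimates center on the true value.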


Bias and variability

• A statistic is biased if the mean of the sampling distribution is not equal to the true value of the parameter being estimated.

• The variability of a statistic is the spread of its sampling distribution.

• Bias does not go away with larger samples.

• Variability shrinks with larger samples.
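
The last two points can be checked with a small simulation. It assumes a hypothetical population and a biased sampling frame that undercovers one group; all the numbers are invented for illustration.

```python
import random
from statistics import mean, stdev

rng = random.Random(2)

# Hypothetical population: 20% answer 0 and 80% answer 1.
population = [0] * 2000 + [1] * 8000
true_mean = mean(population)

# A biased frame: only a quarter of the "0" group can be reached (undercoverage).
biased_frame = [0] * 500 + [1] * 8000

def sampling_distribution(frame, n, reps=1000):
    """Sample means from `reps` random samples of size n drawn from `frame`."""
    return [sum(rng.sample(frame, n)) / n for _ in range(reps)]

results = {}
for n in (25, 400):
    fair = sampling_distribution(population, n)
    biased = sampling_distribution(biased_frame, n)
    results[n] = (fair, biased)
    print(f"n={n:3d}  fair: mean={mean(fair):.3f} sd={stdev(fair):.3f}   "
          f"biased: mean={mean(biased):.3f} sd={stdev(biased):.3f}")
```

Going from 25 to 400 units narrows the spread for both frames, but the biased frame keeps centering on the wrong value: a larger sample only gives a more precise wrong answer.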