1 Where do data come from and Why we don’t (always) trust statisticians.

1

Where do data come from and Why we don’t (always) trust statisticians.

2

Induction vs. Deduction: the gist of statistics

• Deduction: “What is true about the whole, must be true about a part.”

• Induction: “What is true about the part might be true about the whole.”

3

Population vs. Sample

• A population is the entire group of individuals about which we want information.

• A sample is the part of the population from which we actually collect information.

• We use samples to study a population because studying the entire population is often impossible or impractical.

4

Real-Life Example of a Bad Sample

• Ann Landers, a famous columnist, collected a sample of 10,000 people who wrote in to answer this question: “If you could do it all over again, would you have children?”

• 70% of the respondents said that they would not have children.

• When a sample was selected at random, 91% of the people said that they would have children.

5

Potential problems with sample surveys

• Undercoverage occurs when some groups in the population are left out of the process of choosing the sample.

• Nonresponse occurs when an individual chosen for the sample cannot be contacted or refuses to respond.

6

Another Real-Life Example of a Bad Sample

• In 1936, the Literary Digest mailed out 10,000,000 ballots asking respondents whom they were going to vote for: A. Landon or F. D. Roosevelt.

• 2,300,000 ballots were returned, predicting a strong win (57%) for Landon.

7

Another Real-Life Example of a Bad Sample

• George Gallup surveyed 50,000 people chosen randomly.

• Comparison of forecasts:

– Gallup’s prediction for Roosevelt: 56%

– Gallup’s prediction of the Digest’s result: 44%

– Digest’s prediction for Roosevelt: 43%

– Actual vote: 62%

• The Literary Digest drew its mailing list from its own subscription list, phone directories, lists of car owners, and club membership lists, sources that in 1936 left out many lower-income voters.

8

9

Right and Wrong Ways to Sample

• A simple random sample is a sample where (1) each unit of the population has an equal chance of being chosen and (2) all units are chosen independently.

• A sample is biased if at least one group of individuals has a greater chance of being selected than the others.
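
As a minimal sketch of drawing a simple random sample, the Python snippet below picks 50 units from a hypothetical list of 1,000 student IDs. The function name, the ID range, and the seed are assumptions made for illustration. Note that `random.sample` draws without replacement, so no unit can be picked twice.

```python
import random

def simple_random_sample(units, n, seed=None):
    """Draw n distinct units, each equally likely to be chosen."""
    rng = random.Random(seed)  # seeded only so the example is repeatable
    return rng.sample(list(units), n)

# Hypothetical sampling frame: student IDs 1 through 1000.
population = range(1, 1001)
sample = simple_random_sample(population, 50, seed=42)
print(sorted(sample)[:5])  # a peek at the first few selected IDs
```

Because the draw depends only on the random number generator, no group of students is favored over another.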

10

Example of a good sample

• You want to study the effects of computers on GPA. You don’t have the resources to study all students.

• To select a sample of students for the study, you:

– Get a list of all students,

– Select students from the list at random,

– Collect information from the students selected,

– Compare those who have a computer with those who don’t.

11

Example of a bad sample

• You want to study the effects of computers on GPA. You don’t have the resources to study all students.

• To select a sample of students for the study, you:

– Use your friends.

– Hang an ad in the computer lab.

– Post an online questionnaire on the WKU site.

12

Stratified Random Sample

• When we know the proportion of each group in the population, a stratified random sample is usually better than a simple random sample (SRS).

• In a stratified sample, the number of people chosen from each group is proportional to the size of that group in the population.
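
The proportional rule above can be sketched in Python as follows. The group names, group sizes, and the helper `stratified_sample` are hypothetical, chosen only to illustrate the idea; rounding may make the total differ slightly from the requested size when the proportions do not divide evenly.

```python
import random

def stratified_sample(strata, n, seed=None):
    """strata maps a group name to the list of units in that group."""
    rng = random.Random(seed)
    total = sum(len(units) for units in strata.values())
    sample = {}
    for name, units in strata.items():
        # Proportional allocation: the group's share of n matches its share of the population.
        k = round(n * len(units) / total)
        # Within each group, units are drawn by simple random sampling.
        sample[name] = rng.sample(units, k)
    return sample

# Hypothetical population: 600 undergraduates and 400 graduate students.
strata = {
    "undergrad": [f"U{i}" for i in range(600)],
    "grad": [f"G{i}" for i in range(400)],
}
picked = stratified_sample(strata, 50, seed=1)
print({name: len(units) for name, units in picked.items()})
# prints {'undergrad': 30, 'grad': 20}
```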

13

Confounding

• Two explanatory variables are confounded when their effects on the response variable cannot be distinguished from each other.

• Confounding is often a problem in studies that use sample surveys to collect data (even if the sampling is done right).

14

Observation vs. Experiment

• An observational study observes individuals and measures variables but does not attempt to influence the responses.

• An experiment imposes a treatment on individuals in order to observe their responses.

15

How to design an Experiment

• The purpose of an experiment is to find out how one variable (the response variable) changes in response to a change in another variable (the explanatory variable).

• Experiment: Subject → Treatment → Response

16

Placebo Effect

• The placebo effect is a change in behavior due to participation in an experiment.

• The placebo effect is a problem when an experiment does not have a control group (a basis for comparison).

• To avoid the problem, design a randomized comparative experiment.

17

How to design a Randomized Comparative Experiment

• Randomly split the subjects into two groups:

– Control group: receives no treatment

– Treatment group: receives the treatment

• Compare the results.

• Both groups will be equally affected by the placebo effect, so the difference between the groups shows whether the treatment works.
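
The random split can be sketched in Python by shuffling the subjects and cutting the list in half. The subject labels, the group size of 20, and the function name `randomize` are assumptions for illustration only.

```python
import random

def randomize(subjects, seed=None):
    """Randomly split subjects into a control group and a treatment group."""
    rng = random.Random(seed)
    shuffled = list(subjects)   # copy, so the caller's list is left untouched
    rng.shuffle(shuffled)       # chance alone decides who lands in which group
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]

# Hypothetical subjects.
subjects = [f"subject_{i}" for i in range(20)]
control, treatment = randomize(subjects, seed=7)
print(len(control), len(treatment))
```

Letting chance make the assignment keeps the experimenter (and the subjects) from steering particular people into a particular group.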

18

How to interpret results of an experiment

• Observe outcomes for treatment and control groups.

• If outcomes are different enough so that we can say that this difference would rarely occur by chance, we conclude that the difference is statistically significant.
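
One way to check whether a difference "would rarely occur by chance" is a permutation test, sketched below in Python. This is only one possible approach, not necessarily the one intended here; the outcome scores are made up, and the 0.05 cutoff is a common convention assumed for the example.

```python
import random

def permutation_p_value(treatment, control, n_shuffles=10_000, seed=0):
    """Share of random regroupings whose mean difference is at least as large as the observed one."""
    rng = random.Random(seed)
    avg = lambda xs: sum(xs) / len(xs)
    observed = abs(avg(treatment) - avg(control))
    pooled = list(treatment) + list(control)
    k = len(treatment)
    extreme = 0
    for _ in range(n_shuffles):
        # If the treatment did nothing, the group labels are arbitrary, so reshuffle them.
        rng.shuffle(pooled)
        if abs(avg(pooled[:k]) - avg(pooled[k:])) >= observed:
            extreme += 1
    return extreme / n_shuffles

# Made-up outcome scores for illustration.
treatment_scores = [78, 85, 90, 88, 84, 91, 87, 83]
control_scores = [72, 75, 80, 70, 77, 74, 79, 73]
p = permutation_p_value(treatment_scores, control_scores)
print("statistically significant" if p < 0.05 else "could plausibly be chance")
```

A small p-value means that few random regroupings produce a gap as large as the observed one, which is the sense in which the difference is statistically significant.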

19

Population vs. Sample

• A population is the entire group of individuals about which we want information.

• A sample is the part of the population from which we actually collect information.

• Based on the sample, we draw conclusions about the whole population.

20

Parameter vs. Statistic

• A parameter is a number that describes the population.

• A statistic is a number that describes the sample.

• We use statistics to estimate parameters.

21

Sampling Distribution

• The result of your study is a statistic, which can vary from sample to sample.

• The sampling distribution of a statistic is the distribution of values taken by the statistic in all possible samples of the same size from the same population.

• Estimate = True Parameter + Sampling Error
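
A sampling distribution can be approximated by simulation. The Python sketch below assumes a hypothetical population of 10,000 people, 60% of whom answer "yes", and draws 2,000 random samples of size 100 rather than enumerating all possible samples; all of these numbers are illustrative assumptions.

```python
import random
from statistics import mean, stdev

rng = random.Random(0)

# Hypothetical population: 1 means "yes", 0 means "no".
population = [1] * 6000 + [0] * 4000
true_parameter = mean(population)  # the population proportion, 0.6

def sample_proportion(n):
    """The statistic: the proportion of "yes" answers in one random sample of size n."""
    return mean(rng.sample(population, n))

# Repeating the study many times approximates the sampling distribution.
estimates = [sample_proportion(100) for _ in range(2000)]

# Each estimate equals the true parameter plus its own sampling error.
sampling_errors = [est - true_parameter for est in estimates]

print("true parameter:", true_parameter)
print("center of the estimates:", round(mean(estimates), 3))
print("spread of the estimates:", round(stdev(estimates), 3))
```

The individual estimates differ from sample to sample, but with random sampling they cluster around the true parameter.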

22

Bias and variability

• A statistic is biased if the mean of the sampling distribution is not equal to the true value of the parameter being estimated.

• The variability of a statistic is the spread of its sampling distribution.

• Bias does not go away with larger samples.

• Variability decreases as the sample size grows.
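
The contrast between bias and variability can be illustrated with a small Python simulation. The population below is hypothetical: 70% would answer "yes" overall, but the list used for sampling covers only a subgroup in which 40% would answer "yes" (an undercoverage problem). All numbers and names are assumptions for illustration.

```python
import random
from statistics import mean, stdev

rng = random.Random(1)

# Hypothetical population, 1 = "yes", 0 = "no".
covered = [1] * 1600 + [0] * 2400      # people on the list: 40% "yes"
not_covered = [1] * 5400 + [0] * 600   # people left off the list: 90% "yes"
population = covered + not_covered
true_value = mean(population)           # 0.7

def simulate(frame, n, repeats=1000):
    """Center and spread of the sample proportion over many samples of size n."""
    estimates = [mean(rng.sample(frame, n)) for _ in range(repeats)]
    return mean(estimates), stdev(estimates)

results = {}
for label, frame in [("full population", population), ("covered list only", covered)]:
    for n in (25, 400):
        results[(label, n)] = simulate(frame, n)
        center, spread = results[(label, n)]
        print(f"{label}, n={n}: center={center:.3f}, spread={spread:.3f}")
```

In this sketch the spread shrinks when the sample size goes from 25 to 400 for both sampling frames, while the estimates drawn from the incomplete list stay centered near 0.4 instead of the true 0.7.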
