Objective: What methods can we use to select samples that are representative of the population? In what ways can samples be biased?

Posted on 01-Jan-2016


BIAS

• Sampling methods that tend to overemphasize or underemphasize some characteristics of the population are said to be biased.

• What kinds of design problems can bias a survey?


• Judgment Sample (random rectangles activity)

• Voluntary response sample (phone-ins)

• Convenience sampling (my friends)

• Bad sampling frame (list of a select group)

• Undercoverage (missing a whole segment)

• Nonresponse (getting 5 surveys back from 150)

• Response Bias (question wording or the way data are collected)
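To see how voluntary response and nonresponse distort results, here is a minimal simulation sketch. The population size, the 30% true "yes" rate, and the reply probabilities are all made-up assumptions chosen only for illustration: people who answer "yes" (the ones who feel strongly) are assumed to be six times as likely to reply as everyone else.

```python
import random


def simulate_voluntary_response(pop_size=100_000, true_yes=0.30,
                                p_reply_yes=0.30, p_reply_no=0.05, seed=1):
    """Compare the true 'yes' proportion with the proportion among volunteers.

    All default parameter values are hypothetical, chosen only to show bias.
    """
    rng = random.Random(seed)
    # Build the population: True means the person would answer "yes".
    population = [rng.random() < true_yes for _ in range(pop_size)]
    # Each person decides whether to reply; "yes" people reply more often.
    replies = [person for person in population
               if rng.random() < (p_reply_yes if person else p_reply_no)]
    parameter = sum(population) / len(population)   # truth for everyone
    statistic = sum(replies) / len(replies)          # what the volunteers say
    return parameter, statistic, len(replies)


parameter, statistic, n_replies = simulate_voluntary_response()
print(f"true proportion:      {parameter:.3f}")
print(f"volunteers' estimate: {statistic:.3f} (from {n_replies} replies)")
```

Under these assumed rates, the volunteers' estimate should land near 0.3·0.3 / (0.3·0.3 + 0.7·0.05) ≈ 0.72, far above the true 30%, even though more than ten thousand people reply.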

Comment on these statements

a) Stopping students on their way out of the cafeteria is a good way to sample if we want to know about the quality of food there.

b) We drew a sample of 100 from 3000 students in a school. To get the same level of precision for a town of 30000, we should sample 1000 townspeople.

c) A poll taken at our favorite web site (www.statisfun.com) garnered 12,357 responses. The majority said they enjoy doing statistics homework. With a sample that large, we can be pretty sure that most Statistics students feel this way too.

Some questions from student surveys…

1) How much money do you spend on holiday gift shopping?

• $0-10 / $11-20 / $21-30 / … / $91-100

• More

2) On a scale of 1-10, how much importance do you give to the well-being of the environment?

• Pepsi or Coke?

• How many marshmallows can fit in your mouth?

• What is your annual family income?

A DISMAL FAILURE

• During the 1936 presidential campaign between Alf Landon and Franklin Delano Roosevelt, the Literary Digest mailed more than 10 million ballots (surveys of public opinion to forecast the election), and got back 2.4 million ballots (which is quite a lot). The results were clear. Alf Landon would win by a landslide: 57% to 43%.

• Landon carried only 2 states, and Roosevelt won 62% to 37%.

• What went wrong?

The sample was not representative…

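• The 10 million names were taken from telephone lists. Other lists included automobile registrations and country club memberships. In 1936, in the middle of the Great Depression, people with telephones, cars, and club memberships tended to be wealthier than the typical voter, so the sample overrepresented them.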

• How can we avoid the Digest’s errors?

RANDOMIZE!

• Try to select people at random.

• Randomizing protects us from the influences of ALL the features of our population. It does that by making sure that, on average, our sample looks like the rest of the population.
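A minimal sketch of the Digest's mistake, using invented numbers: assume 25% of voters appear on the phone/car/club lists, that listed voters favor Landon 60% of the time, and that unlisted voters favor him only 30% of the time. A huge sample drawn only from the lists is compared with a small simple random sample from the whole electorate.

```python
import random

rng = random.Random(1936)

# Hypothetical electorate: each voter is (on_list, votes_landon).
# Assumption: 25% of voters are on the lists; listed voters back Landon
# 60% of the time, unlisted voters only 30% of the time.
voters = []
for _ in range(200_000):
    on_list = rng.random() < 0.25
    p_landon = 0.60 if on_list else 0.30
    voters.append((on_list, rng.random() < p_landon))


def landon_share(sample):
    """Proportion of a sample that votes for Landon."""
    return sum(votes for _, votes in sample) / len(sample)


true_share = landon_share(voters)            # the population parameter

# Huge sample, but drawn only from the biased sampling frame (the lists).
listed = [v for v in voters if v[0]]
big_biased = rng.sample(listed, 20_000)

# Small simple random sample from the entire electorate.
small_random = rng.sample(voters, 1_000)

print(f"true Landon share:      {true_share:.3f}")
print(f"20,000 from the lists:  {landon_share(big_biased):.3f}")
print(f"1,000 chosen at random: {landon_share(small_random):.3f}")
```

Under these assumptions the 20,000-person sample lands near the listed voters' 60%, while the much smaller random sample stays close to the true share of about 37.5%: size does not make up for a bad sampling frame.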

Sample Size

• How big a sample do you need?

• The fraction of the population that you’ve sampled doesn’t matter, as long as the population is much larger than the sample.

• It’s the sample size itself that’s important.

You need a sample large enough to be representative of the population.
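One way to see why the fraction sampled barely matters: the standard error of a sample proportion is √(p(1−p)/n), multiplied by the finite population correction √((N−n)/(N−1)) when an SRS of size n is drawn without replacement from a population of size N. The sketch below assumes p = 0.5 (the worst case) and compares the school and town numbers from statement (b).

```python
import math


def se_proportion(n, N, p=0.5):
    """Standard error of a sample proportion for an SRS of size n drawn
    without replacement from a population of size N.

    p = 0.5 is an assumption: it gives the largest possible SE.
    """
    fpc = math.sqrt((N - n) / (N - 1))   # finite population correction
    return math.sqrt(p * (1 - p) / n) * fpc


for n, N in [(100, 3_000), (100, 30_000), (1_000, 30_000)]:
    print(f"n = {n:>5}, N = {N:>6}: SE = {se_proportion(n, N):.4f}")
```

This prints SEs of about 0.0492, 0.0499, and 0.0155. A sample of 100 gives nearly the same precision whether the population is 3,000 or 30,000; sampling 1,000 townspeople shrinks the SE by roughly a factor of √10 ≈ 3.2, which is far more precision than the school sample, not the same.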

Why can’t we just “sample” the entire population?

• This is called a CENSUS.

• A census is difficult, time-consuming, often impractical, and complex, and populations rarely stand still.

• The National Center for Chronic Disease Prevention & Health Promotion reports that 21.7% of US teens never or rarely wear seatbelts.

• What does this statement mean? They probably did not take a census.

• The 21.7% is a sample statistic. The corresponding value in our model of the whole population (the true percentage for all US teens) is a population parameter.

Notation Alert

NAME                     STATISTIC   PARAMETER
Mean                     ȳ           μ
Standard Deviation       s           σ
Correlation              r           ρ
Regression Coefficient   b           β
Proportion               p̂           p

CALVIN & HOBBES

• Let’s design a school-wide survey to actually distribute to the school. Take a look at the original questions you came up with, and let’s pick (as a class) 10-20 good questions to ask and word them as best we can.

• Let’s also think about how we want to distribute it to the students.

BIAS VS. ERROR

• Sampling error = sampling variation

• It describes the natural variability in results that might be observed from one sample to the next. Error is found in every sample.

• Bias is found in the sampling method. Bias means that something about the design systematically distorts the results so that they are unlikely to reflect reality.


“ERROR” IS OKAY! BIAS IS BAD!
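A minimal simulation sketch of the difference, using invented data: a hypothetical school of 2,000 students in which the 500 easiest-to-reach students average 7 hours of sleep and everyone else averages 8. Repeating each sampling method 1,000 times shows that random samples scatter around the true mean (sampling error), while convenience samples scatter around the wrong value (bias).

```python
import random
import statistics

rng = random.Random(42)

# Hypothetical school: the first 500 students (the easy-to-reach group)
# average 7 hours of sleep; the other 1,500 average 8. Invented numbers.
population = ([rng.gauss(7, 1) for _ in range(500)] +
              [rng.gauss(8, 1) for _ in range(1500)])
true_mean = statistics.mean(population)        # the parameter


def random_sample_mean(n=50):
    """Mean of a simple random sample from the whole school."""
    return statistics.mean(rng.sample(population, n))


def convenience_sample_mean(n=50):
    """Mean of a sample drawn only from the easy-to-reach group (biased)."""
    return statistics.mean(rng.sample(population[:500], n))


random_means = [random_sample_mean() for _ in range(1000)]
biased_means = [convenience_sample_mean() for _ in range(1000)]

for label, means in [("random", random_means), ("convenience", biased_means)]:
    print(f"{label:>11}: center {statistics.mean(means):.2f}, "
          f"spread {statistics.stdev(means):.2f} (true mean {true_mean:.2f})")
```

Both methods vary from sample to sample; only the convenience method is centered away from the truth, and drawing more samples the same way would not fix that.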

Good Sampling Methods

• Simple Random Sample – every individual in the population, and every possible combination of individuals of the chosen sample size, has an equal chance of being selected.

• Sampling Frame – a list of individuals from which the sample is drawn.

• Samples drawn at random generally differ from each other. This is known as sampling variability. Sampling variability is not a problem.
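A minimal sketch of drawing an SRS from a sampling frame with Python's standard `random` module. The roster of 3,000 student IDs is hypothetical. `random.sample` selects without replacement, so every group of the chosen size is equally likely. Drawing twice illustrates sampling variability: the two samples almost surely contain different students.

```python
import random

# Hypothetical sampling frame: a roster of 3,000 student ID numbers.
frame = [f"S{i:04d}" for i in range(1, 3001)]

rng = random.Random(2016)

# Sampling without replacement: every group of 100 students is equally
# likely to be chosen, which is what makes this a simple random sample.
sample_a = rng.sample(frame, 100)
sample_b = rng.sample(frame, 100)

print(sample_a[:5])
print(sample_b[:5])
print("students in both samples:", len(set(sample_a) & set(sample_b)))
```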

More Good Sampling Methods

• What if we wanted to find out about how males and females differed on a certain topic?

• Or how upperclassmen and lowerclassmen compare?

• Or what if we wanted to make sure that all ethnicities or all religions were represented in our sample?

Stratified Sampling

• Sometimes the population can be sliced into homogeneous groups, called strata, before the sample is selected.

• Then simple random sampling is used within each stratum before the results are combined.

• For example, if we wanted to figure out if there was a difference in attitudes towards dress codes between the grade levels, we could do a simple random sample of each grade level (instead of the entire school).
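The grade-level example can be sketched as follows. The roster and its grade sizes (400, 350, 300, and 250 students) are invented for illustration, and the helper `stratified_sample` is a hypothetical name: it groups students into strata by grade, then takes a separate simple random sample within each stratum.

```python
import random

rng = random.Random(7)

# Hypothetical roster: each student is (name, grade). Grade sizes are invented.
roster = ([(f"student{i}", 9) for i in range(400)] +
          [(f"student{i}", 10) for i in range(400, 750)] +
          [(f"student{i}", 11) for i in range(750, 1050)] +
          [(f"student{i}", 12) for i in range(1050, 1300)])


def stratified_sample(units, stratum_of, n_per_stratum, rng):
    """Split units into strata, then take an SRS of n_per_stratum from each."""
    strata = {}
    for unit in units:
        strata.setdefault(stratum_of(unit), []).append(unit)
    return {key: rng.sample(members, n_per_stratum)
            for key, members in sorted(strata.items())}


sample = stratified_sample(roster, lambda s: s[1], 25, rng)
for grade, students in sample.items():
    print(grade, len(students), students[:2])
```

Taking the same number from each grade makes grade-to-grade comparisons straightforward. If the stratum results were combined into a single school-wide estimate, each grade's result would need to be weighted by that grade's share of the school.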

Cluster / Multi-stage Sampling

• Sometimes you can also split the population into similar parts or clusters. Then we could select one or a few clusters at random and perform a census within each of them.

• Book example – suppose we want to assess the reading level of a textbook based on the lengths of the sentences.

• We could perform a simple random sample by numbering all the sentences in the book and then using a random number generator to pick out 50 sentences in which to count the words.

• What might be an easier method?

• Instead, we could randomly select a few pages (the pages of the textbook are already numbered) and then count the words in every sentence on those pages.

• Each page represents a “cluster.”
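The page-based plan can be sketched as follows, assuming a made-up book of 300 pages whose sentence lengths are randomly generated placeholders, not real data. Each page is a cluster: a few pages are chosen at random, and every sentence on them is measured.

```python
import random
import statistics

rng = random.Random(3)

# Hypothetical textbook: 300 pages, each a list of sentence lengths (in words).
# The lengths are random stand-ins, not measurements from a real book.
book = [[rng.randint(5, 30) for _ in range(rng.randint(15, 25))]
        for _ in range(300)]


def cluster_sample(clusters, k, rng):
    """Randomly choose k clusters and keep every unit inside them
    (a census of each chosen cluster)."""
    chosen = rng.sample(range(len(clusters)), k)
    return [unit for i in chosen for unit in clusters[i]]


sentence_lengths = cluster_sample(book, 3, rng)
print(f"{len(sentence_lengths)} sentences from 3 pages, "
      f"mean length {statistics.mean(sentence_lengths):.1f} words")
```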

Multi-Stage Sampling

• Uses a variety of sampling methods together

• Example: Let’s discuss the book example again – what if chapters later on in the textbook are generally more difficult?


• Step 1: Divide the book into sections (units)

• Step 2: Randomly select a chapter from each section

• Step 3: Randomly select a few pages from each chapter

• Step 4: Systematically select a few sentences from each page to count
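The four steps can be sketched as nested loops. The book's shape (4 sections of 5 chapters, 20 pages per chapter, 20 sentences per page) and its sentence lengths are invented for illustration; the systematic step assumes a random starting position, which the slide does not specify.

```python
import random

rng = random.Random(11)

# Hypothetical book: 4 sections x 5 chapters x 20 pages x 20 sentences.
# Each innermost value is a made-up sentence length in words.
book = [[[[rng.randint(5, 30) for _ in range(20)]   # sentences on a page
          for _ in range(20)]                       # pages in a chapter
         for _ in range(5)]                         # chapters in a section
        for _ in range(4)]                          # sections (Step 1)


def systematic(units, step, rng):
    """Every step-th unit, starting from a random position among the first step."""
    start = rng.randrange(step)
    return units[start::step]


sample = []
for section in book:                                # Step 1: each section
    chapter = rng.choice(section)                   # Step 2: one random chapter
    for page in rng.sample(chapter, 3):             # Step 3: three random pages
        sample.extend(systematic(page, 5, rng))     # Step 4: every 5th sentence

print(len(sample), "sentence lengths sampled")
```

With these sizes the sketch always collects 4 sections × 3 pages × 4 sentences = 48 sentence lengths.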

Systematic Sampling

• A sample drawn by selecting individuals systematically from a sampling frame.

• For example: selecting every 10th person from an alphabetical list, or from a list whose order was generated randomly using a random number table.

• Examples: airport searches and border-crossing checks.
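A minimal sketch of "every 10th person" from a sampling frame. The alphabetical roster of 95 names is hypothetical, and starting at a randomly chosen position among the first 10 is an assumption (a common practice) rather than something the slide states.

```python
import random


def systematic_sample(frame, k, rng=random):
    """Select every k-th individual from the frame.

    Assumption: the starting position is chosen at random among the first k,
    so every individual has the same 1-in-k chance of being picked.
    """
    start = rng.randrange(k)
    return frame[start::k]


# Hypothetical alphabetical roster of 95 names.
roster = sorted(f"Student {chr(65 + i % 26)}{i:02d}" for i in range(95))
picked = systematic_sample(roster, 10, random.Random(5))
print(len(picked), picked[:3])
```

Systematic sampling is only as good as the list's order: if the order is unrelated to what is being measured, the result behaves much like a random sample.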

ActivStats

Estimating a total activity

Diseased Forest

• The dots on the page in front of you represent diseased trees in a forest. They are randomly scattered throughout the forest – perhaps the disease is spread via flying insects. We want a count of the diseased trees without having to inspect every tree.

• What should we do?

Diseased Forest #2

• This second “forest” represents a disease that must be spread by contact. It clusters in some regions and is almost unseen in others.

• Would a simple random sample work in this case?
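One possible approach to both forests, sketched under invented assumptions: divide the forest into a 10 × 10 grid of equal plots, inspect a simple random sample of 10 plots, and multiply the average count per plot by the number of plots. The 500 diseased trees, the grid size, and the 8 "hot" plots standing in for contact-spread patches are all hypothetical. Repeating the estimate 1,000 times shows how much it varies in each forest.

```python
import random
import statistics

GRID = 10                # hypothetical: forest divided into a 10 x 10 grid
N_PLOTS = GRID * GRID


def scattered_forest(rng, n_trees=500):
    """Diseased trees placed uniformly at random (spread by flying insects)."""
    counts = [0] * N_PLOTS
    for _ in range(n_trees):
        counts[rng.randrange(N_PLOTS)] += 1
    return counts


def clustered_forest(rng, n_trees=500, hot_plots=8):
    """Diseased trees concentrated in a few randomly chosen plots
    (a simple stand-in for disease spread by contact)."""
    hot = rng.sample(range(N_PLOTS), hot_plots)
    counts = [0] * N_PLOTS
    for _ in range(n_trees):
        counts[rng.choice(hot)] += 1
    return counts


def estimate_total(counts, n_sampled, rng):
    """Inspect an SRS of plots and scale the average count up to the forest."""
    plots = rng.sample(range(N_PLOTS), n_sampled)
    return sum(counts[p] for p in plots) / n_sampled * N_PLOTS


rng = random.Random(2016)
forests = {"scattered": scattered_forest(rng),
           "clustered": clustered_forest(rng)}
spread = {}
for name, forest in forests.items():
    estimates = [estimate_total(forest, 10, rng) for _ in range(1000)]
    spread[name] = statistics.stdev(estimates)
    print(f"{name}: true total {sum(forest)}, "
          f"mean estimate {statistics.mean(estimates):.0f}, "
          f"spread {spread[name]:.0f}")
```

Under these assumptions both sets of estimates center near the true total, but the clustered forest's estimates spread out far more, because a small random sample of plots can easily miss, or over-represent, the few diseased patches.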
