STT 200

Ashoke Kumar Sinha

Acknowledgement: Author is indebted to Dr. Jennifer Kaplan and Dr. Parthanil Roy for allowing him to use/edit many of their slides.

This note is based on Chapters 2, 11 and 12 of the textbook.

01. Introduction, sampling - Michigan State University

Jul 19, 2018


Read The Textbook

• Today’s material can be read in Chapters 2, 11 and 12 of the textbook.

• I am going to cover only part of the above chapters.

• The parts I do not cover are not important for this course.


Course Outline

• Collecting Data

– Surveys

• Exploratory Data Analysis

– Data Representations

– Numerical Summaries of Data

– Data Models (probability)

• Inference


What is Statistics?

• Statistics - a word with 2 meanings

– A subject, like mathematics or physics.

– A value we compute from (sample) data.

Wikipedia: Statistics is a “mathematical science pertaining to the collection, analysis, interpretation or explanation, and presentation of data”.


Important Questions

• We need to know whether the data are “good enough” to use as a basis for decisions:

– Who are the data about?

– What do the data represent?

– When were the data collected?

– Where were the data collected?

– How were the data collected?

– Why were the data collected?


A Few Terms

• Population is the complete set of all items that we are interested in studying.

The number of items in a population is called the population size, usually denoted by N.

• A sample is a subset of the population.

Usually n denotes sample size (the number of observations in a sample).

• A variable is a characteristic or property of an item on which we take measurements. [Answers to the question what.]

• The items or the individuals from whom/which the data are collected are often called cases. [Answers to the question who.]

• Data are the observed values of the variable.

• The study of a whole population is called a census, and the study of a sample is called a sample survey.


Census vs. Survey

• Every 10 years, the U.S. government takes a census of the population of the U.S. and finds the values of certain parameters, like average family size or income.

• But a census is costly, so if we want to know something about the population, we usually survey a sample of the population, find the values of statistics such as average family size or income, and use these statistics as estimates of the corresponding parameters.

So a survey lets us learn something about the population by asking only a sample.


Population and Sample

• Suppose we would like to estimate the fraction of East Lansing residents who are students.

• In this case, the population is all East Lansing residents.

• However, surveying the entire population may be costly, time-consuming and laborious. Therefore, we can do the job by selecting a sample that is “a good representative of the population”.
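As a minimal Python sketch of this idea, the code below builds an entirely hypothetical population of N = 50,000 residents, 30% of them students. These figures are made up and are not real East Lansing data. It computes the fraction a census would find and compares it with the fraction in a simple random sample of n = 500 residents. The population size, student share, sample size and seeds are all illustrative assumptions.

```python
import random

def make_population(N, student_share, seed=0):
    """Build a hypothetical population: True = student, False = non-student."""
    n_students = round(N * student_share)
    population = [True] * n_students + [False] * (N - n_students)
    random.Random(seed).shuffle(population)  # the order of residents is irrelevant
    return population

def estimate_student_fraction(population, n, seed=1):
    """Draw a simple random sample of n residents (without replacement)
    and return the fraction of students in it."""
    sample = random.Random(seed).sample(population, n)
    return sum(sample) / n

population = make_population(N=50_000, student_share=0.30)  # hypothetical
census_fraction = sum(population) / len(population)        # what a census would find
sample_fraction = estimate_student_fraction(population, n=500)
print(f"census: {census_fraction:.3f}  sample estimate: {sample_fraction:.3f}")
```

With a different seed the estimate changes a little from sample to sample, but it should stay close to the census value.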


Parameter and Statistic

• Parameters are the values we calculate from the population data.

Population mean, population variance, population median, etc. are examples of parameters.

• Statistics - a word with 2 meanings

– A subject, like mathematics or physics.

– Values we compute from sample data.

Sample mean, sample variance, sample proportion, etc. are examples of statistics.

The singular of statistics (in this second sense) is “statistic”.
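The sketch below uses Python's standard `statistics` module. It computes a few parameters from a hypothetical population of 1,000 household sizes, then computes the corresponding statistics from a simple random sample of 50 of them. The data are randomly generated, not real. The code follows the common convention of dividing by N for the population variance (`pvariance`) and by n - 1 for the sample variance (`variance`). The slides do not discuss this detail.

```python
import random
import statistics

rng = random.Random(42)
# Hypothetical population of N = 1,000 household sizes (randomly generated).
population = [rng.randint(1, 7) for _ in range(1000)]

# Parameters: computed from the whole population.
pop_mean = statistics.mean(population)
pop_var = statistics.pvariance(population)    # divides by N
pop_median = statistics.median(population)

# Statistics: computed from a simple random sample of n = 50.
sample = rng.sample(population, 50)
samp_mean = statistics.mean(sample)
samp_var = statistics.variance(sample)        # divides by n - 1
samp_median = statistics.median(sample)

print(f"parameters: mean={pop_mean:.2f}, variance={pop_var:.2f}, median={pop_median}")
print(f"statistics: mean={samp_mean:.2f}, variance={samp_var:.2f}, median={samp_median}")
```

Each run with a different seed gives different statistics, while the parameters of a fixed population never change.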


3 Main Ideas

• Examine only a smaller fraction (i.e., a sample) selected from the population.

• Select the sample “suitably using a randomization scheme” so that the sample becomes a good representative of the population.

• The fraction of the population that has been sampled does not matter. It is the sample size itself that is important.
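The standard error of a sample proportion shows why the sample size matters, not the fraction sampled. Under simple random sampling without replacement it is sqrt(p(1 - p)/n) × sqrt((N - n)/(N - 1)), where the second factor is the finite population correction. This formula is not part of these slides. The Python sketch below simply evaluates it with an assumed p = 0.5.

```python
import math

def se_sample_proportion(p, n, N):
    """Standard error of a sample proportion under simple random sampling
    without replacement, including the finite population correction."""
    fpc = (N - n) / (N - 1)
    return math.sqrt(p * (1 - p) / n * fpc)

# Same sample size n = 100, very different population sizes (and sampled fractions).
for N in (10_000, 1_000_000, 100_000_000):
    se = se_sample_proportion(0.5, 100, N)
    print(f"n=100, N={N:>11,}: fraction sampled={100 / N:.6f}, SE={se:.4f}")

# Same population size, larger sample.
print(f"n=400, N=1,000,000: SE={se_sample_proportion(0.5, 400, 1_000_000):.4f}")
```

For n = 100, the standard error stays at about 0.05 for every population size from ten thousand to a hundred million. Quadrupling the sample size to n = 400 roughly halves it.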


Examples of Surveys

• Exit polling in elections

• Public opinion polls

• Nielsen television ratings

• J.D. Power car ratings


Ann Landers Children Debate

Ann Landers asked her readers “If you had to do it all over again, would you have children?”

She received nearly 10,000 responses, about 70% of which said “No!”.

A nationwide random poll of 1373 parents found that 91% would have children again.

Why do we have such a difference?

Voluntary Response Bias.

• Bias is any systematic failure of a sample to represent its population.
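A small worked calculation in Python shows how voluntary response can flip a result. The numbers are purely hypothetical. Suppose 9% of all parents would not have children again, roughly in line with the random poll. Suppose also that those parents are 20 times as likely to write in (a 20% response rate versus 1%). The share of “No” answers among the letters is then the “No” responses divided by all responses.

```python
def share_no_among_respondents(p_no, rate_no, rate_yes):
    """Fraction of "No" answers among volunteers when people who would say
    "No" write in at rate_no and people who would say "Yes" at rate_yes."""
    no_letters = p_no * rate_no
    yes_letters = (1 - p_no) * rate_yes
    return no_letters / (no_letters + yes_letters)

# Hypothetical inputs: 9% of parents would say "No", and they are assumed
# to be 20 times as likely to respond (20% versus 1%).
biased = share_no_among_respondents(p_no=0.09, rate_no=0.20, rate_yes=0.01)
# Same response rate for everyone: the voluntary sample is not biased.
unbiased = share_no_among_respondents(p_no=0.09, rate_no=0.10, rate_yes=0.10)
print(f"'No' among letters: {biased:.0%} (biased), {unbiased:.0%} (equal response rates)")
```

Under these made-up response rates, about two thirds of the letters say “No” even though only 9% of parents feel that way. When both groups respond at the same rate, the bias disappears.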


More on Voluntary Response

• In general, internet polls in which anyone who sees the poll can choose to respond ALL suffer from voluntary response bias.

• There is NO WAY to overcome this bias.

• So, basically, we can NEVER use the results of such an internet poll to make predictions about the population.

• This is because people who voluntarily respond are a special group of people who are passionate about the topic and do not represent the general public.


Designing Surveys

• Identify the population of interest and design a good sampling plan.

• Write questions that are not likely to produce response bias.

• There is no way to recover from a biased sample, so choose your sample well. (But how?)


Sampling Schemes

In this course, we shall learn about 5 different sampling schemes:

1. Simple Random Sampling

2. Stratified Sampling

3. Cluster Sampling

4. Multistage Sampling

5. Systematic Sampling


Simple Random Sampling

• A simple random sample is a subset of individuals (a sample) chosen from a population in such a way that any subset of k individuals has the same probability of being chosen for the sample as any other subset of k individuals.

• The process of choosing a simple random sample is known as simple random sampling.


Simple Random Sampling: An Example

Suppose I would like to draw a simple random sample of size n = 4 from a class of 50 students. How would I do that?

• I shall assign the numbers 01, 02, …, 50 to the 50 students, write these numbers on 50 similar-looking pieces of paper, mix them well in a basket and then pick 4 numbers from the basket without replacement.

• Or, I can use a random number table to select 4 students.

• Or, I can write a computer program that will do the job.
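The computer-program option can be sketched in Python with the standard library's `random.sample`. It draws without replacement, so every subset of the requested size is equally likely. The function name `simple_random_sample` and the seed value are illustrative choices, not from the course materials.

```python
import random

def simple_random_sample(labels, n, seed=None):
    """Draw a simple random sample of size n without replacement:
    every subset of n labels is equally likely to be chosen."""
    return random.Random(seed).sample(labels, n)

# Label the 50 students 01, 02, ..., 50, as on the slide.
students = [f"{i:02d}" for i in range(1, 51)]
chosen = simple_random_sample(students, n=4, seed=2024)
print(chosen)
```

Leaving `seed=None` gives a fresh sample on every run. A fixed seed makes the draw reproducible.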


Simple Random Sampling (SRS)

• Simple Random Sampling (SRS) is usually done using random numbers.

• Random number tables are available in

– our textbook (Appendix D),

– the Internet (random number generator websites).

• A TI-83/84 calculator can also generate random numbers.

• An example: 43900 44304 30419 02647 27619 26146 57122 64194 69535 53513 01579 30823 16533 85961 51118 55649 95170 50049 58854 85557 05447 45777 71671 47104 20805 73144 16128 13733 67803 32150 65667 38559 46441 96238 46845 68467 56717 91966 86221 30014 72076 19333 04120 96643 19074 51781 80216 21469
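Software random number generators play the same role as the calculator. As an illustration, the Python sketch below prints rows of five-digit groups in the same layout as the table above. It uses the standard `random` module, which is a pseudo-random generator. The function name and seed are hypothetical.

```python
import random

def random_digit_table(rows, groups_per_row, seed=None):
    """Return rows of five-digit groups of random digits 0-9."""
    rng = random.Random(seed)
    table = []
    for _ in range(rows):
        groups = ["".join(str(rng.randrange(10)) for _ in range(5))
                  for _ in range(groups_per_row)]
        table.append(" ".join(groups))
    return table

for line in random_digit_table(rows=3, groups_per_row=6, seed=7):
    print(line)
```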


With or without replacement?

• In small populations and often in large ones, such sampling is typically done "without replacement", i.e., one deliberately avoids choosing any member of the population more than once.

• Although simple random sampling can also be conducted with replacement, this is less common and would normally be described more fully as simple random sampling with replacement.
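In Python's standard library, the two schemes correspond to two functions. `random.sample` draws without replacement, and `random.choices` draws with replacement. A minimal sketch, using 55 items labelled 0 to 54 and an arbitrary seed:

```python
import random

rng = random.Random(11)      # arbitrary seed, for reproducibility
items = list(range(55))      # items labelled 0, 1, ..., 54

without_replacement = rng.sample(items, k=4)   # no item can appear twice
with_replacement = rng.choices(items, k=4)     # an item may appear again
print("without replacement:", without_replacement)
print("with replacement:   ", with_replacement)
```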


With or without replacement

• In a with-replacement sampling scheme, an item may be selected several times.

• In a without-replacement sampling scheme, no item may be selected more than once.

Example: Suppose we are selecting 4 items out of 55 (identified with numbers 00, 01, …, 54). We read the following random number table two digits at a time, skipping any number above 54 (and, when sampling without replacement, any number already chosen):

43900 44304 30419 02647 27619 26146

With replacement sampling will produce: {43, 04, 43, 04}.

Without replacement sampling will produce: {43, 04, 30, 41}.
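The reading rule behind this example can be written as a short Python sketch. Read the table two digits at a time and skip numbers above 54. When sampling without replacement, also skip labels already chosen. The function `select_from_table` is a hypothetical helper written for illustration. It reproduces the two samples listed above.

```python
def select_from_table(table, k, max_label, replace):
    """Read a random number table two digits at a time and return k labels.

    Two-digit numbers larger than max_label are skipped. When replace is
    False, a label that has already been chosen is also skipped.
    """
    digits = "".join(table.split())          # drop the spaces between groups
    chosen = []
    for i in range(0, len(digits) - 1, 2):   # consecutive two-digit numbers
        label = digits[i:i + 2]
        if int(label) > max_label:
            continue                          # not a valid item number
        if not replace and label in chosen:
            continue                          # already selected
        chosen.append(label)
        if len(chosen) == k:
            return chosen
    raise ValueError("table ran out of digits before k labels were found")

table = "43900 44304 30419 02647 27619 26146"
print(select_from_table(table, k=4, max_label=54, replace=True))   # ['43', '04', '43', '04']
print(select_from_table(table, k=4, max_label=54, replace=False))  # ['43', '04', '30', '41']
```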


Stratified Sampling

• Sometimes the population is first sliced into homogeneous groups called strata, and simple random sampling is used within each stratum. Finally, these subsamples are combined into a sample. This sampling scheme is known as “stratified random sampling” or simply “stratified sampling”.


When to use Stratified Sampling?

• Suppose we would like to know how students feel about funding for the football team in a large university and the student population consists of 40% men and 60% women. Suppose we feel that men and women would have different views on the funding.

In this case, a simple random sample may not do a good job, because by chance it could contain too many men or too many women. Instead, it is better to divide the student population into two strata: male and female students. For a sample of 100 students, we can then choose a stratified sample consisting of 40 male students and 60 female students. This will be a better representative of the population.

Moral: Whenever the population is heterogeneous but can be divided into homogeneous groups, it is better to use stratified sampling.
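Here is a minimal Python sketch of the football-funding example. It assumes proportional allocation, meaning each stratum gets a share of the sample equal to its share of the population. It also assumes a made-up student body of 4,000 men and 6,000 women. Within each stratum, `random.sample` draws a simple random sample.

```python
import random

def stratified_sample(strata, n, seed=None):
    """Stratified random sample with proportional allocation.

    strata maps a stratum name to the list of its members. Each stratum gets
    a share of the n draws proportional to its size (rounded), and a simple
    random sample is taken within each stratum.
    """
    rng = random.Random(seed)
    N = sum(len(members) for members in strata.values())
    sample = {}
    for name, members in strata.items():
        n_h = round(n * len(members) / N)     # proportional allocation
        sample[name] = rng.sample(members, n_h)
    return sample

# Hypothetical student body: 4,000 men and 6,000 women (40% / 60%).
students = {
    "male": [f"M{i}" for i in range(4000)],
    "female": [f"F{i}" for i in range(6000)],
}
sample = stratified_sample(students, n=100, seed=3)
print({name: len(chosen) for name, chosen in sample.items()})  # {'male': 40, 'female': 60}
```

Because the allocation is rounded, the stratum sample sizes may not add up exactly to n when the proportions do not divide evenly. A real design would adjust for that.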


Cluster Sampling

• Splitting the population into representative clusters can make sampling more practical. Then we could simply select one or a few clusters at random and perform a census within each of them. This sampling scheme is called cluster sampling.


When to use Cluster Sampling?

• Suppose I am trying to find out what MSU freshmen think about the dining service on campus and I know that freshmen at MSU are all housed in 10 freshman dorms.

In this case, I shall select two or three of these 10 dorms at random and contact all the residents of these selected dorms.
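The dorm example can be sketched in Python: choose whole clusters at random, then keep everyone in them. The dorm names and the 300 residents per dorm are hypothetical placeholders.

```python
import random

def cluster_sample(clusters, k, seed=None):
    """Pick k whole clusters at random and return every member of each
    (a census within the chosen clusters)."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), k)   # sorted() turns the keys into a list
    return {name: list(clusters[name]) for name in chosen}

# Hypothetical: 10 freshman dorms, each with 300 residents (made-up numbers).
dorms = {f"Dorm {d}": [f"D{d}-R{r}" for r in range(300)] for d in range(1, 11)}
selected = cluster_sample(dorms, k=3, seed=5)
for name, residents in selected.items():
    print(name, len(residents))
```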


Stratified vs. Cluster Sampling

• Strata are homogeneous but different from one another while clusters are heterogeneous and resemble the overall population.

• We perform simple random sampling in ALL strata, whereas we only choose a few clusters at random and perform a census in those clusters.


Multistage Sampling

• Sampling schemes that combine several methods are called multistage sampling. Most surveys conducted by professional organizations use multistage sampling.

• The exact scheme depends on the nature of the population and the nature of the survey.


An Example of Multistage Sampling

• Suppose I am trying to find out what MSU freshmen think about the dining service on campus and I know that freshmen at MSU are all housed in 10 freshman dorms. Suppose I am concerned about possible differences of opinion between men and women, and these dorms have men and women on alternate floors.

Now I can use a combination of stratified and cluster sampling as follows: I would first choose 2 freshman dorms at random (out of 10) and then select some dorm floors at random from among those that house men and, separately, from among those that house women. I could then treat each selected floor as a cluster and interview everyone on that floor.
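The multistage design above can be sketched as three nested steps in Python. The dorm layout is an assumption made for illustration. There are 10 dorms, and each has three men's floors (1, 3, 5) and three women's floors (2, 4, 6) with 40 residents per floor. One floor is chosen from each group in each selected dorm.

```python
import random

def build_dorm(d, residents_per_floor=40):
    """Hypothetical dorm d: men on floors 1, 3, 5 and women on floors 2, 4, 6."""
    def floor(f):
        return [f"D{d}F{f}-{i}" for i in range(residents_per_floor)]
    return {
        "men": {f: floor(f) for f in (1, 3, 5)},
        "women": {f: floor(f) for f in (2, 4, 6)},
    }

def multistage_sample(dorms, n_dorms, floors_per_group, seed=None):
    rng = random.Random(seed)
    interviewed = []
    # Stage 1: choose n_dorms dorms at random.
    for dorm in rng.sample(sorted(dorms), n_dorms):
        # Stage 2: within each dorm, sample men's and women's floors separately (strata).
        for group in ("men", "women"):
            floors = dorms[dorm][group]
            for f in rng.sample(sorted(floors), floors_per_group):
                # Stage 3: treat the floor as a cluster and interview everyone on it.
                interviewed.extend(floors[f])
    return interviewed

dorms = {d: build_dorm(d) for d in range(1, 11)}
people = multistage_sample(dorms, n_dorms=2, floors_per_group=1, seed=9)
print(len(people))  # 2 dorms x 2 floors x 40 residents = 160
```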


Types of Samples

• Simple Random Sample (SRS) - every possible sample of the same size has an equal probability of being chosen.

• Cluster - entire groups are randomly selected.

• Stratified Random - the population is divided into homogeneous groups and a simple random sample is chosen from each group.

• Multistage - used in national polling; usually starts with random selection of states, then counties, and then houses to call.

• Convenience - individuals who are conveniently available.

• Systematic – individuals are picked in a predetermined order (for example, every k-th individual on a list).


Example

To represent the population of MSU students:

• Simple Random Sample (SRS) - randomly generate a subset of the PIDs of all students or put all the names in a hat, shake it up and draw some out.

• Cluster - randomly choose a few large lecture classes of different disciplines and survey every student in them.

• Stratified Random - randomly generate a set of PIDs for each class: freshmen, sophomores, juniors and seniors.

• Multistage - randomly choose 3 dorms on campus, then randomly choose 2 floors of each dorm and sample from each of the floors using SRS.

• Convenience - our STT 200 class.

• Systematic - every 5th student I meet in the food court.
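The slides describe systematic sampling only as picking individuals in a predetermined order. A common way to do this, assumed in the Python sketch below, is to choose a random starting point among the first k individuals and then take every k-th one. The list of 100 students is hypothetical.

```python
import random

def systematic_sample(units, k, seed=None):
    """Systematic sample: pick a random starting point among the first k
    units, then take every k-th unit after it."""
    start = random.Random(seed).randrange(k)
    return units[start::k]

# Hypothetical: the first 100 students met in the food court, in order.
students_met = [f"student {i}" for i in range(1, 101)]
sample = systematic_sample(students_met, k=5, seed=8)
print(len(sample), sample[:3])
```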


Variable types

Variables (and hence data) can be of two types:

(a) Qualitative or categorical,

(b) Quantitative or numerical.

• A qualitative or categorical variable usually cannot be measured on a numerical scale; it simply records a quality (a category).

One may use numbers to code the values of qualitative data, but those numbers are arbitrary.

• A quantitative or numerical variable naturally takes numerical values, for which arithmetic operations, such as averaging, make sense.

Caution! Some numerical data, such as phone numbers, order numbers and zip codes, are not variables but identifiers. Though often numerical, they serve to identify or keep track of individuals/cases. Summing or averaging those numbers means nothing.
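Numeric codes for a categorical variable are arbitrary. To see this, the Python sketch below codes the same hypothetical list of majors in two different ways. The “average code” changes with the coding, while the category counts do not.

```python
from collections import Counter
from statistics import mean

# Hypothetical answers to "What is your major?" from six students.
majors = ["biology", "history", "math", "biology", "economics", "biology"]

# Two equally valid, arbitrary numeric codings of the same categories.
coding_a = {"biology": 1, "economics": 2, "history": 3, "math": 4}
coding_b = {"biology": 4, "economics": 3, "history": 2, "math": 1}

avg_a = mean(coding_a[m] for m in majors)
avg_b = mean(coding_b[m] for m in majors)
print(avg_a, avg_b)       # the "average major" depends on the coding
print(Counter(majors))    # counts per category are a meaningful summary
```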
