
CHAPTER 5: DATA COLLECTION AND SAMPLING

Jan 01, 2016


kadeem-bell


Outline

• Population and sample
• Sources of data
• Sampling
• Sampling plans
    – Simple random sampling
    – Stratified random sampling
    – Cluster sampling


POPULATION AND SAMPLE

• Parameter: a summary measure of a population; its value is usually unknown, or known only from published sources

• Statistic: a summary measure of a sample


SOURCES OF DATA

• Primary data:
    – Data published (in printed form, on data tapes, on disks, or on the internet) by the same organization that collected the data
    – Some government agencies:
        http://www.census.gov/
        http://www.statcan.ca/

Sample data available from the Statistics Canada website


SOURCES OF DATA

• Secondary data:
    – Data published by an organization different from the one that originally collected and published the data
    – A popular source of secondary data is the Statistical Abstract of the United States


SOURCES OF DATA

• Observational and experimental studies:
    – Observational study: data are collected and recorded without controlling any factor, unlike in an experimental study
    – Experimental study: if more than one factor may cause the same outcome, it may be desirable to vary one factor at a time and control (keep unchanged) the other factors, e.g.,
        • aircraft primer paints are applied to improve the adhesion force of the finish paint, which depends on
            – primer application method: dipping and spraying
            – type of primer paint: type 1, 2, 3


SOURCES OF DATA

• An experiment was designed in which three specimens were painted with each primer using each application method, a finish paint was applied, and the adhesion force was measured. The resulting data are shown below:

Adhesion Force Data

Primer Type    Dipping          Spraying
1              4.0, 4.5, 4.3    5.4, 4.9, 5.6
2              5.6, 4.9, 5.4    5.8, 6.1, 6.3
3              3.8, 3.7, 4.0    5.5, 5.0, 5.0
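
One way to start reading the adhesion force data is to average the three specimens in each primer/method combination, then average across each factor. The following Python sketch does only that; the variable and function names are illustrative and do not come from the slides, and no formal analysis (such as a two-factor ANOVA) is attempted.

```python
from statistics import mean

# Adhesion force readings from the table, keyed by (primer type, method).
adhesion = {
    (1, "dipping"): [4.0, 4.5, 4.3],
    (1, "spraying"): [5.4, 4.9, 5.6],
    (2, "dipping"): [5.6, 4.9, 5.4],
    (2, "spraying"): [5.8, 6.1, 6.3],
    (3, "dipping"): [3.8, 3.7, 4.0],
    (3, "spraying"): [5.5, 5.0, 5.0],
}

# Mean of the three specimens in each primer/method cell.
cell_means = {key: mean(values) for key, values in adhesion.items()}


def method_mean(method):
    """Mean adhesion force over all primers for one application method."""
    return mean(v for (_, m), vals in adhesion.items() if m == method for v in vals)


def primer_mean(primer):
    """Mean adhesion force over both methods for one primer type."""
    return mean(v for (p, _), vals in adhesion.items() if p == primer for v in vals)


for (primer, method), value in cell_means.items():
    print(f"primer {primer}, {method}: {value:.2f}")
for method in ("dipping", "spraying"):
    print(f"{method} overall: {method_mean(method):.2f}")
for primer in (1, 2, 3):
    print(f"primer {primer} overall: {primer_mean(primer):.2f}")
```

Because every primer is tested with every method, the two factors can be compared separately, which is the point of controlling factors in an experimental study.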


SOURCES OF DATA

• Surveys:
    – Personal interview
    – Telephone interview
    – Questionnaire survey


SAMPLING

• Target population
    – The population about which inference is desired
• Sampled population
    – The actual population from which the sample has been taken
• Self-selected samples
    – Respondents decide for themselves whether to mail or call in a response
    – Such samples are usually biased


SAMPLING PLANS

• Simple random sampling
• Stratified random sampling
• Cluster sampling


SIMPLE RANDOM SAMPLING

• Suppose we have data about the annual incomes of 40 families in a spreadsheet file RANDSAMP.XLS.

• We want to choose a simple random sample of size 10 from this frame.

• How can this be done?

• And how do summary statistics of the chosen families compare to the corresponding summary statistics of the population?


SIMPLE RANDOM SAMPLING

The family income data are shown on the right.


SIMPLE RANDOM SAMPLING

• A simple random sample is a sample in which the sampling units are chosen from the population by means of a random mechanism such as a random number table so that every possible sample with the same number of observations is equally likely to be chosen.

• For example, let sample 1 consist of families 1, 2, 3, 4, 5, 6, 7, 8, 9, 10 and sample 2 consist of families 1, 2, 3, 4, 5, 6, 7, 8, 9, 11. If a simple random sample is chosen, then Samples 1 and 2 will be equally likely to be chosen.
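
To make "equally likely" concrete, the short Python sketch below counts how many distinct samples of 10 families can be drawn from the 40 in the frame. Under simple random sampling each of them, including Samples 1 and 2, has the same probability. The variable names are illustrative only.

```python
from math import comb

population_size = 40   # families in the frame
sample_size = 10       # families to be chosen

# Number of distinct unordered samples of 10 families out of 40.
n_samples = comb(population_size, sample_size)

# Under simple random sampling every one of them is equally likely.
prob_each = 1 / n_samples

print(n_samples)           # 847660528
print(f"{prob_each:.3e}")
```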


SIMPLE RANDOM SAMPLING

• Solution: The idea is very simple. We first generate a column of random numbers in column C. Then we sort the rows according to the random numbers and choose the first 10 families in the sorted rows.

• The following procedure produces the results.

– Random numbers. Enter the formula =RAND() in cell C10 and copy it down column C.

– Replace with values. To enable sorting we must “freeze” the random numbers - that is, replace their formulas with values. To do this, select the range C10:C49, use Edit/Copy, and then use Edit/Paste Special with the Values option.


SIMPLE RANDOM SAMPLING

– Copy to a new range. Copy the range A10:C49 to the range E10:G49.

– Sort. Select the range E10:G49 and use the Data/Sort menu item. Sort according to the Random # column in ascending order. Then the 10 families with the 10 smallest random numbers are the ones in the sample.

– Means. Use the AVERAGE, MEDIAN and STDEV functions in row 6 to calculate summary statistics of the first 10 incomes in column F.
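
The same sort-by-random-number idea can be sketched outside a spreadsheet. The Python below is a minimal illustration, not a reproduction of the workbook: the 40 incomes are made-up stand-ins because the contents of RANDSAMP.XLS are not reproduced in the slides, and the fixed seed is an assumption added only to make the run repeatable.

```python
import random
from statistics import mean, median, stdev

rng = random.Random(2016)  # fixed seed: an assumption, for repeatability only

# Hypothetical incomes for families 1..40 (stand-in for RANDSAMP.XLS).
incomes = {family: rng.randint(20_000, 90_000) for family in range(1, 41)}

# Random numbers: attach one to every family (the =RAND() column).
keyed = [(rng.random(), family, income) for family, income in incomes.items()]

# Sort: ascending by the random number (the Data/Sort step).
keyed.sort()

# The 10 families with the 10 smallest random numbers form the sample.
sample = [(family, income) for _, family, income in keyed[:10]]
sample_incomes = [income for _, income in sample]
population_incomes = list(incomes.values())

# Means: compare sample statistics with the population values.
# stdev is the sample standard deviation, like the spreadsheet STDEV function.
for name, fn in (("mean", mean), ("median", median), ("stdev", stdev)):
    print(f"{name:>6}: sample {fn(sample_incomes):10.1f}   "
          f"population {fn(population_incomes):10.1f}")
```

Because the random numbers are generated once and stored in a list, no separate "freeze" step is needed here; that step exists in the spreadsheet only because =RAND() recalculates.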


SIMPLE RANDOM SAMPLING

The results of all the operations are shown on the right.


STRATIFIED RANDOM SAMPLING

• Suppose we can identify various sub-populations within the total population. We call these sub-populations strata.

• It makes sense to select a simple random sample from each stratum instead of from the entire population. This is called stratified sampling.

• This method is particularly useful when there is considerable variation between the various strata but relatively little variation within a given stratum.


STRATIFIED RANDOM SAMPLING

• To obtain a stratified random sample we must choose a total sample size n, and we must choose a sample size n_i for each stratum i.

• There are many ways to choose these numbers, but the most popular method is to use proportional sample sizes, in which each stratum's share of the sample matches its share of the population.

• The advantage of proportional sample sizes is that they are very easy to determine. The disadvantage is that they ignore differences in variability among the strata.


STRATIFIED RANDOM SAMPLING

• Sears has data on all 1000 people in the city of Smalltown who have Sears credit cards.

• Sears is interested in estimating the average number of other credit cards these people own, as well as other information about their use of credit.

• The company decides to stratify these customers by age, select a stratified sample of size 100 with proportional sample sizes, and then contact these 100 people by phone.


STRATIFIED RANDOM SAMPLING

• First, Sears must decide exactly how to stratify by age.

• The reasoning is that different age groups probably have different attitudes and behavior regarding credit.

• After preliminary investigation they decide to have three age categories: 18-30, 31-62, and 63-80.

• The number of customers in each category is as follows:

Category    Number of Customers
18 to 30    132
31 to 62    766
63 to 80    102
Total       1000


STRATIFIED RANDOM SAMPLING

• In stratified random sampling with proportional sample sizes, the total sample size of 100 is distributed among the 3 categories as follows:

Category    Number of Customers    Sample Size
18 to 30    132                    132*100/1000 ≈ 13
31 to 62    766                    766*100/1000 ≈ 77
63 to 80    102                    102*100/1000 ≈ 10
Total       1000                   100
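
The allocation above can be reproduced with a few lines of Python. This is a sketch under one stated assumption: the slides simply round each product to a whole number, whereas the function below uses largest-remainder rounding so that the stratum sizes always add up to the total sample size. The function name `proportional_allocation` is illustrative, not from the source.

```python
def proportional_allocation(stratum_sizes, total_sample_size):
    """Split total_sample_size across strata in proportion to their sizes.

    Uses largest-remainder rounding (an assumption beyond the slides) so the
    returned sample sizes sum exactly to total_sample_size.
    """
    population_size = sum(stratum_sizes.values())
    quotas, remainders = {}, {}
    for name, size in stratum_sizes.items():
        # Integer part and leftover of size * n / N, kept in exact arithmetic.
        quotas[name], remainders[name] = divmod(
            size * total_sample_size, population_size
        )
    # Hand out the units lost to truncation, largest remainder first.
    leftover = total_sample_size - sum(quotas.values())
    for name in sorted(remainders, key=remainders.get, reverse=True)[:leftover]:
        quotas[name] += 1
    return quotas


strata = {"18 to 30": 132, "31 to 62": 766, "63 to 80": 102}
allocation = proportional_allocation(strata, 100)
print(allocation)  # {'18 to 30': 13, '31 to 62': 77, '63 to 80': 10}
```

For these numbers the result agrees with plain rounding (13.2, 76.6, and 10.2 round to 13, 77, and 10); the two approaches can differ when rounded values do not sum to n.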


CLUSTER SAMPLING

• Suppose a company is interested in various characteristics of households in a particular city. The sampling units are households.

• We could proceed with the sampling methods already discussed, but another approach would be more convenient.

• We could divide the city into city blocks as sampling units and then sample all the households in the chosen blocks.

• In this case the city blocks are called clusters and the sampling is called cluster sampling.


CLUSTER SAMPLING

• The advantage of cluster sampling is sampling convenience (and possibly less cost).

• It is straightforward to select a cluster sample. The key is to define the sampling units as the clusters, then select a simple random sample of clusters. Then sample all the population members in each selected cluster.
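
The selection procedure just described can be sketched in Python as follows. Everything about the city here is hypothetical: the block and household labels, the number of blocks, and the seed are made up for illustration, and `cluster_sample` is an illustrative name rather than anything from the source.

```python
import random

rng = random.Random(5)  # fixed seed: an assumption, for repeatability only

# Hypothetical city: 20 blocks, each with between 3 and 8 households.
city = {
    f"block-{b}": [f"block-{b}/house-{h}" for h in range(1, rng.randint(3, 8) + 1)]
    for b in range(1, 21)
}


def cluster_sample(clusters, n_clusters, rng):
    """Pick a simple random sample of clusters and keep every member of each."""
    chosen = rng.sample(sorted(clusters), n_clusters)  # the sampling units are clusters
    return {name: clusters[name] for name in chosen}


sample = cluster_sample(city, 4, rng)

# Every household in each chosen block is part of the sample.
households = [house for members in sample.values() for house in members]
print(sorted(sample))
print(len(households), "households sampled")
```

Note that the number of households sampled is not fixed in advance; it depends on which blocks happen to be chosen.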