Session 2A:
The limitations of Simple Random Sampling and
The practice of sampling for household surveys
Juan Muñoz, Sistemas Integrales
Delhi, March 18, 2013
Limitations of Simple Random Sampling
• SRS may be an option in certain cases,
• but it may not be practical if
– we need estimates for subgroups of the population, especially if some of the subgroups are small
– we do not have an adequate sample frame
– a Simple Random Sample would be too scattered across the territory
• We then resort to other techniques
– Stratification
– Sampling in stages
Elections in an archipelago
Stratification
• We divide the population into subgroups, called strata
• We take a separate sample in each stratum
• Stratification may be needed if:
– We want to reduce the standard error, by gaining control of the composition of the sample
– We want to ensure the representativeness of certain groups
• The selection probabilities may differ across strata
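The two objectives of stratification (reducing the standard error and ensuring the representativeness of certain groups) are contradictory in practice
Differing selection probabilities impose the use of weights, to ensure the external validity of our impact evaluation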
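The role of the weights can be illustrated with a minimal Python sketch. The population, the stratum names, and the sample sizes below are hypothetical, and the weight in each stratum is assumed to be the inverse of the selection probability, N_h / n_h.

```python
import random

def stratified_sample(strata, n_per_stratum, seed=0):
    """Draw a simple random sample within each stratum and attach weights.

    strata: dict mapping stratum name -> list of values (one per household)
    n_per_stratum: dict mapping stratum name -> sample size in that stratum
    Returns a list of (stratum, value, weight) tuples.
    """
    rng = random.Random(seed)
    sample = []
    for name, values in strata.items():
        n_h = n_per_stratum[name]
        weight = len(values) / n_h  # inverse of the selection probability n_h / N_h
        for value in rng.sample(values, n_h):
            sample.append((name, value, weight))
    return sample

def weighted_mean(sample):
    total_weight = sum(w for _, _, w in sample)
    return sum(v * w for _, v, w in sample) / total_weight

def unweighted_mean(sample):
    return sum(v for _, v, _ in sample) / len(sample)

# Hypothetical population: a large stratum and a small, oversampled one.
strata = {
    "large": [10.0] * 9000,
    "small": [50.0] * 1000,
}
sample = stratified_sample(strata, {"large": 100, "small": 100})

print(unweighted_mean(sample))  # 30.0, far from the population mean
print(weighted_mean(sample))    # 14.0, the population mean
```

The small stratum is sampled at ten times the rate of the large one, so the unweighted mean overstates its share; the weights restore the population proportions.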
Electoral survey in two stages
Two‐stage sampling
• Instead of taking an SRS,
• we divide the territory into small areas, called Primary Sampling Units (PSUs).
– In the first stage, we choose PSUs.
– In the second stage, we select households in the chosen PSUs
Two‐stage sampling
• Solves the problems of SRS
– Reduces transportation costs
– Reduces sample frame problems
• The sample can be made self‐weighted if
– We choose PSUs with Probability Proportional to Size (PPS), and then
– We take a fixed number of households in each PSU
• The price to pay is the cluster effect
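Why PPS followed by a fixed take gives a self‐weighted sample can be checked with a few lines of arithmetic. The sketch below assumes a hypothetical frame of five PSUs; the function name and the figures are illustrative only.

```python
def household_probability(psu_size, total_households, k, m):
    """Overall selection probability of one household in a two-stage design.

    Stage 1: the PSU is chosen with probability proportional to size,
             k * psu_size / total_households (assumed not to exceed 1).
    Stage 2: m households are chosen with equal probability inside the PSU,
             m / psu_size.
    """
    p_stage1 = k * psu_size / total_households
    p_stage2 = m / psu_size
    return p_stage1 * p_stage2

# Hypothetical frame: number of households in each PSU.
psu_sizes = [50, 80, 120, 200, 150]
total = sum(psu_sizes)
k, m = 2, 10  # 2 PSUs in the sample, 10 households per PSU

probs = [household_probability(size, total, k, m) for size in psu_sizes]
for size, p in zip(psu_sizes, probs):
    print(f"PSU with {size:>3} households: probability {p:.4f}")
```

The PSU size cancels out, so every household has the same probability, k·m divided by the total number of households, whatever the size of its PSU.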
Cluster effect
Standard error grows if, instead of taking a Simple Random Sample of n households, we take a two‐stage sample, with k PSUs and m households per PSU (n = k·m)

$$e_{TSS}^2 = e_{SRS}^2 \, [1 + \rho (m - 1)]$$

where $e_{TSS}$ is the standard error of the Two-Stage Sample, $e_{SRS}$ is the standard error of the Simple Random Sample, and $\rho$ is the Intra‐Cluster Correlation.
Cluster effect, by number of households per PSU and Intra-Cluster Correlation

HHs per PSU    Intra-Cluster Correlation
               0.01   0.02   0.05   0.10   0.20
  4            1.03   1.06   1.15   1.30   1.60
  6            1.05   1.10   1.25   1.50   2.00
  8            1.07   1.14   1.35   1.70   2.40
 12            1.11   1.22   1.55   2.10   3.20
 15            1.14   1.28   1.70   2.40   3.80
 20            1.19   1.38   1.95   2.90   4.80
 30            1.29   1.58   2.45   3.90   6.80
 40            1.39   1.78   2.95   4.90   8.80
 60            1.59   2.18   3.95   6.90  12.80
 80            1.79   2.58   4.95   8.90  16.80
120            2.19   3.38   6.95  12.90  24.80
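Cluster Effect
For a total sample size of 12,000 households

Number of PSUs   HHs per PSU
     3,000             4
     2,000             6
     1,500             8
     1,000            12
       800            15
       600            20
       400            30
       300            40
       200            60
       150            80
       100           120

Highlighted example: with 600 PSUs of 20 households each and an Intra-Cluster Correlation of 0.05, the cluster effect is 1.95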
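The trade‐off between the number of PSUs and the number of households per PSU can be recomputed with a short Python sketch. It assumes the usual expression for the cluster effect, 1 + ρ(m − 1), which agrees with the tabulated values; the function name and the choice of ρ = 0.05 are illustrative.

```python
def cluster_effect(m, rho):
    """Ratio of squared standard errors, two-stage sample over SRS."""
    return 1 + rho * (m - 1)

TOTAL_HOUSEHOLDS = 12_000
RHO = 0.05  # illustrative intra-cluster correlation

for m in (4, 6, 8, 12, 15, 20, 30, 40, 60, 80, 120):
    k = TOTAL_HOUSEHOLDS // m          # number of PSUs needed
    effect = cluster_effect(m, RHO)
    equivalent_srs = TOTAL_HOUSEHOLDS / effect
    print(f"{k:>5} PSUs x {m:>3} HHs: cluster effect {effect:.2f}, "
          f"equivalent SRS size {equivalent_srs:,.0f}")
```

For the same 12,000 households, fewer and larger PSUs are cheaper to visit but are worth a much smaller Simple Random Sample.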
Design effect
• In a two‐stage sample
Cluster effect = $e_{TSS}^2 / e_{SRS}^2$
• In a more complex design (with two or more stages, stratification, etc.)
Design effect = Deff = $e_{\text{Complex design}}^2 / e_{SRS}^2$
• Can be interpreted as an apparent contraction of the sample size, as a result of clustering and stratification
• Can be estimated with special software (e.g., Stata’s svy commands)
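As a rough illustration of what such an estimate involves, the sketch below computes a design effect for the sample mean from clustered data. It is not what survey software does internally: it assumes equal‐size PSUs and equal weights, ignores stratification and finite population corrections, and uses hypothetical data.

```python
from statistics import mean, variance

def estimate_deff(clusters):
    """Estimate the design effect of the sample mean for equal-size clusters.

    clusters: list of lists, one inner list of observations per PSU.
    """
    k = len(clusters)
    values = [y for cluster in clusters for y in cluster]
    n = len(values)
    # Variance of the mean under the clustered design: spread of the PSU means.
    var_clustered = variance([mean(c) for c in clusters]) / k
    # Variance of the mean if the same n values were treated as an SRS.
    var_srs = variance(values) / n
    return var_clustered / var_srs

# Hypothetical data: households within a PSU resemble each other.
clustered = [
    [1.0, 1.0, 2.0, 2.0],
    [4.0, 5.0, 5.0, 4.0],
    [8.0, 9.0, 9.0, 8.0],
]
deff = estimate_deff(clustered)
n = sum(len(c) for c in clustered)
print(f"Deff = {deff:.2f}, apparent sample size = {n / deff:.1f} instead of {n}")
```

Dividing the actual sample size by Deff gives the apparent, contracted sample size mentioned above.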
Household samples
Choosing the PSUs
• PSUs are Census Enumeration Areas (EAs), or groups of EAs
• PSUs typically have 50 to 200 households
• The sample frame is a small file. Can easily be managed with Excel
• PSUs in the sample are generally selected with Probability Proportional to Size (PPS)
• The selected PSUs must be recognizable in the field
Implies collaboration with the National Census Office
Outsized PSUs may require some work
See how to do it in the UN Manual for HH Surveys in developing and transition countries
Computer files are not enough. We also need maps
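One common way of selecting PSUs with PPS is systematic selection along the cumulated sizes of the frame. The Python sketch below is a minimal version under stated assumptions: the frame is a plain list of household counts, the function name is hypothetical, and outsized PSUs are not treated specially.

```python
import random
from itertools import accumulate

def pps_systematic(sizes, k, seed=None):
    """Select k PSUs with probability proportional to size (systematic PPS).

    sizes: list of PSU sizes (e.g., household counts from the census frame)
    Returns the indices of the selected PSUs. A PSU larger than the sampling
    interval can be hit more than once, which is one reason outsized PSUs
    need special treatment.
    """
    total = sum(sizes)
    interval = total / k
    start = random.Random(seed).random() * interval
    cumulative = list(accumulate(sizes))
    selected = []
    i = 0
    for j in range(k):
        point = start + j * interval
        # Advance to the PSU whose cumulated size range contains the point.
        while i < len(sizes) - 1 and cumulative[i] <= point:
            i += 1
        selected.append(i)
    return selected

# Hypothetical frame of five PSUs.
psu_sizes = [50, 80, 120, 200, 150]
chosen = pps_systematic(psu_sizes, k=2, seed=1)
print(chosen)
```

Sorting the frame geographically before the selection spreads the sample across the territory, since the selection points are evenly spaced along the list.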
Household samples
Choosing the households
• The best sample frame is the full list of all households in the selected PSUs
• The household listing operation requires time and money. Relative to the project’s overall calendar and budget, these are
– Marginal, if they are accounted for beforehand
– Large enough to be a big headache, if they are not
• Information to be reported on the listing
– Name and address, as a minimum
– Additional data required for the selection (e.g., presence of pregnant women, or children)
• Households are generally selected from the listing by systematic equal probability sampling
Do not ask for additional information that is not essential
Beware of imitations, such as random walks, snowballing, and expert opinion
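Systematic equal probability sampling from the household listing can be sketched in a few lines of Python. The version below assumes a fractional interval and a random start; the function name and the listing of 87 households are hypothetical.

```python
import random

def systematic_sample(listing, m, seed=None):
    """Pick m households from a listing by systematic equal probability sampling.

    Uses a fractional interval len(listing) / m and a random start, so the
    method also works when the listing size is not a multiple of m.
    """
    if not 0 < m <= len(listing):
        raise ValueError("m must be between 1 and the number of listed households")
    interval = len(listing) / m
    start = random.Random(seed).random() * interval
    positions = [int(start + j * interval) for j in range(m)]
    # Guard against a rounding artefact at the very end of the listing.
    return [listing[min(p, len(listing) - 1)] for p in positions]

# Hypothetical listing: household serial numbers 1 to 87 in one PSU.
listing = list(range(1, 88))
picked = systematic_sample(listing, m=12, seed=3)
print(picked)
```

Unlike a random walk, the selection depends only on the listing and on one random number, so it can be done in the office and verified afterwards.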
The best way of dealing with non response is to…
1. Replace non respondents by similar households
2. Increase the sample size to compensate for non response
3. Use correction formulas
4. Use imputation techniques to simulate the answers of non respondents
5. None of the above
The best way of dealing with non response is to prevent it
Lohr, Sharon L. Sampling: Design & Analysis (1999)
The big problem with non response is not the reduction of sample size. The problem is bias.
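A small numeric sketch shows why a larger sample does not help. It relies on the identity that the population mean is the response‐rate‐weighted average of the means of respondents and non respondents; the figures are hypothetical.

```python
def nonresponse_bias(response_rate, mean_respondents, mean_nonrespondents):
    """Bias of the respondent mean as an estimate of the full-population mean.

    Population mean = r * mean_R + (1 - r) * mean_NR, so
    bias = mean_R - population mean = (1 - r) * (mean_R - mean_NR).
    """
    return (1 - response_rate) * (mean_respondents - mean_nonrespondents)

# Hypothetical figures: 80% response; respondents average 100, non respondents 70.
bias = nonresponse_bias(0.80, 100.0, 70.0)
print(round(bias, 2))  # 6.0
```

The sample size does not appear in the expression: the bias stays the same however many households are interviewed, unless the response rate improves or respondents and non respondents become more alike.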
Non response
[Figure: factors affecting non response, grouped under Interviewer, Questionnaire, and Respondent. Labels shown: Training, Work load, Motivation, Qualification, Work plan, Demographic, Socio‐economic, Biologic testing, Fatigue, Motivation, Proxy, Availability]
Source: “Some factors affecting Non‐Response,” by R. Platek. 1977. Survey Methodology, 3, 191‐214