
MICS4 Survey Design Workshop Multiple Indicator Cluster Surveys Survey Design Workshop Advanced Sampling.

Mar 27, 2015


Noah Holloway
Transcript
Page 1: MICS4 Survey Design Workshop Multiple Indicator Cluster Surveys Survey Design Workshop Advanced Sampling.

MICS4 Survey Design Workshop

Multiple Indicator Cluster Surveys
Survey Design Workshop

Advanced Sampling

Page 2:

Major steps in designing MICS sample

• Define objectives
  – Key indicators
  – Desired level of precision
  – Subnational domains of estimation
• Identify most appropriate sampling frame
  – Sample for another survey conducted recently
  – Most recent census of population and housing
• Determine sample size and allocation
  – Determine availability of previous MICS or DHS results to provide measures of sampling parameters

Page 3:

Sampling frame

• Sampling frame should be nationally representative and have complete coverage, with measures of size (households or population)
• Most countries conduct Census of Population and Housing every 10 years
  – Generally provides most effective sampling frame for household surveys
• Sample for another survey conducted recently
• In case of older frame, geographic areas with substantial changes, such as peri-urban areas in large cities, may need to be updated
• When no census is available
  – Identify most complete geographic frame available
  – Example: Southern Sudan, list of villages from WHO immunization program with estimated population

Page 4:

MICS recommendations on sample size determinants

FACTOR                                                        RECOMMENDATION
1.  Expected size estimate of indicators                      (next slide)
2.  Expected size estimate of target population 12-23 mos     [3%]
3.  Average household size                                    6 persons
4.  Relative margin of error wanted                           12% of coverage rate
5.  Level of confidence wanted                                95 percent
6.  Design effect in cluster surveys                          1.5
7.  Expected non-response rate                                10 percent
8.  Number of clusters or PSUs - minimum                      [300-400]
9.  Cluster size                                              [15-35 households]
10. Number of estimation “domains” wanted                     [5 or fewer]
11. Survey budget                                             (country specific)

For items 2, 3, 6, 7 use available country data (recent survey or census); if not available, use value above.

Page 5:

Indicators for sample size determination

• Sample size is different for each MICS indicator.
• Must choose a key indicator, since only one sample size can be used in MICS
• Recommendations for choosing key indicator:
  – Choose from among main indicators of interest in your country
  – Choose the one which will yield largest sample size
  – Usually for a single-year age group
  – Usually DPT, measles, polio or tuberculosis immunization - or birth weight below 2.5 kg
• Exceptions: Do not choose infant or maternal mortality rates as the key indicators. Do not choose a low prevalence indicator that is desirably low (such as malnutrition prevalence).

Page 6:

Sample size formula

$$ n = \frac{4\,(r)(1-r)(deff)(1.1)}{(0.12\,r)^2\,(p)(\bar{n})} $$

where
  – n is the required sample size, expressed as number of households, for the KEY indicator,
  – 4 is factor to achieve 95 percent level of confidence,
  – r is anticipated prevalence rate for key indicator,
  – 1.1 is factor to raise sample size by 10 percent for potential nonresponse,
  – deff is shortened symbol for design effect,
  – 0.12r is margin of error to be tolerated, defined as 12 percent of r (12 percent thus represents the relative margin of error of r),
  – p is proportion of total population that smallest group comprises, and
  – $\bar{n}$ is average household size.

Page 7:

Example

• Target group: Children 12 to 23 months old
• Percent of population: 3.5 percent
• Key indicator: DPT immunization coverage
• Prevalence (Coverage): 25 percent
• Deff: 1.6
• Non-response adjustment: 1.05 (response rate 95%)
• Average household size: 6

$$ n = \frac{4\,(.25)(.75)(1.6)(1.05)}{(.12 \times .25)^2\,(.035)(6)} = \frac{1.26}{.000189} = 6667 $$
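The sample size calculation can be sketched in a few lines of Python. This is an illustrative sketch only: the function name `mics_sample_size` and its parameter names are assumptions made for this example, while the constants (4, 1.1, 0.12) and the worked inputs come from the slides.

```python
def mics_sample_size(r, p, hh_size, deff=1.5, nonresponse=1.1, rme=0.12, z2=4.0):
    """Households needed for the key indicator.

    r           anticipated prevalence (coverage) of the key indicator
    p           proportion of total population in the smallest target group
    hh_size     average household size
    deff        design effect
    nonresponse factor that inflates the sample for nonresponse
    rme         relative margin of error (0.12 in MICS3 and MICS4)
    z2          factor for the 95 percent level of confidence (4 in the slides)
    """
    numerator = z2 * r * (1.0 - r) * deff * nonresponse
    denominator = (rme * r) ** 2 * p * hh_size
    return numerator / denominator


# Worked example from the slide: DPT coverage of 25 percent.
n = mics_sample_size(r=0.25, p=0.035, hh_size=6, deff=1.6, nonresponse=1.05)
print(round(n))  # 6667
```

With the defaults (deff 1.5, nonresponse factor 1.1) and p = 0.025, the same function gives 13,750 households for r = 0.25 and an average household size of 4.0.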

Page 8:

Sample size (Households) to estimate coverage rates for smallest target population

Average household size    Estimated rate   Estimated rate   Estimated rate   Estimated rate
(number of persons)       r = 0.25         r = 0.30         r = 0.35         r = 0.40
4.0                       13,750           10,694           8,512            6,875
4.5                       12,222           9,506            7,566            6,111
5.0                       11,000           8,556            6,810            5,500
5.5                       10,000           7,778            6,191            5,000
6.0                       9,167            7,130            5,675            4,583

Use this table when your
1. Target population is 2.5% of total population; this is generally children 12-23 months old
2. Sample design effect, deff, is assumed to be 1.5 and nonresponse is expected to be 10 percent
3. Relative margin of error is set at 12 percent of estimate of coverage rate, r

Page 9:

Note on precision requirements

• In case of MICS2, precision requirements expressed in terms of acceptable margin of error (ME), which varied according to the size of the estimate (5% absolute error for high rate indicators or 3% for low rate indicators)
• For MICS3 and MICS4, this was simplified to a relative margin of error (RME) of 0.12
• Follow guidelines in sampling chapter carefully; avoid indicators with a high rate
• Final criterion for acceptable precision: is the confidence interval useful?
  – If confidence interval is too wide, estimate may not be useful
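To judge whether an interval is useful, it can help to convert the relative margin of error into an absolute one. The sketch below assumes the interval is simply r plus or minus RME times r; the function names are made up for this illustration.

```python
def margin_of_error(r, rme=0.12):
    """Absolute margin of error implied by a relative margin of error."""
    return rme * r


def confidence_interval(r, rme=0.12):
    """Lower and upper bounds of the interval r +/- rme * r."""
    me = margin_of_error(r, rme)
    return (r - me, r + me)


# Illustrative coverage rates only; they are not taken from the slides.
for r in (0.10, 0.25, 0.50):
    low, high = confidence_interval(r)
    print(f"r = {r:.2f}: +/- {margin_of_error(r):.3f}, interval {low:.3f} to {high:.3f}")
```

For r = 0.25 this gives a margin of 0.03, which is the 0.12r term in the denominator of the sample size formula.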

Page 10:

Stratification and sample allocation

• Stratification is the process of dividing the sampling frame into sub-groups (strata) of homogeneous (similar) PSUs
• Advantages: better precision, flexible design, sub-national estimates for smaller domains (differential sampling rates)
  – Reduced variance within stratum given similarity of units
• Example of stratification: region, urban/rural
• Existing sampling frame, such as master sample, may have socioeconomic stratification for large cities
  – Should improve statistical efficiency of sample design
• Geographic domains defined as strata
  – Possible to use variable sampling rates by domain to ensure sufficient sample size for each

Page 11:

Implicit stratification

• Sort the sampling frame according to certain characteristics such as regions, urban-rural residence, sub-regions, districts, etc., then select a systematic pps sample.

• Ensures a representative sample for each subgroup

• Automatically provides proportional allocation by size of subgroup

Page 12:

Allocation of sample to strata

• Proportional allocation
  – Effective for precision of estimates at the national level
• Equal allocation to each domain
  – Used when each domain requires same level of precision
• Optimum allocation: takes into account differential variance and costs by stratum
  – For example, variability may be higher in urban areas and enumeration costs may be higher in rural areas; in that case use higher sampling rate for urban areas
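The three allocation rules can be compared with a short sketch. The slides do not give a formula for optimum allocation; the version below assumes the standard textbook form, in which the stratum sample is proportional to N_h * S_h / sqrt(c_h). The stratum sizes, standard deviations and unit costs are hypothetical values chosen only to illustrate the urban/rural example.

```python
import math


def proportional_allocation(n, sizes):
    """Allocate n in proportion to stratum size N_h."""
    total = sum(sizes.values())
    return {h: n * size / total for h, size in sizes.items()}


def equal_allocation(n, sizes):
    """Give every stratum (domain) the same sample size."""
    return {h: n / len(sizes) for h in sizes}


def optimum_allocation(n, sizes, std_devs, unit_costs):
    """Allocate n in proportion to N_h * S_h / sqrt(c_h) (assumed textbook form)."""
    scores = {h: sizes[h] * std_devs[h] / math.sqrt(unit_costs[h]) for h in sizes}
    total = sum(scores.values())
    return {h: n * score / total for h, score in scores.items()}


# Hypothetical frame: households per stratum, variability and relative cost.
sizes = {"urban": 300_000, "rural": 700_000}
std_devs = {"urban": 0.50, "rural": 0.40}    # higher variability in urban areas
unit_costs = {"urban": 1.0, "rural": 1.5}    # higher enumeration cost in rural areas

for name, alloc in (
    ("proportional", proportional_allocation(8000, sizes)),
    ("equal", equal_allocation(8000, sizes)),
    ("optimum", optimum_allocation(8000, sizes, std_devs, unit_costs)),
):
    print(name, {h: round(v) for h, v in alloc.items()})
```

Under these assumed inputs the optimum rule gives the urban stratum a larger share than proportional allocation does, which matches the direction described in the slide.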

Page 13:

Number of PSUs and cluster size

• Survey costs depend not only on number of households but also on their distribution among primary sampling units (PSUs)
• Important to determine effective balance between number of sample PSUs and cluster size
• In general, the more PSUs the better for reliability but the greater the cost (mostly costs of travel and listing)
• At national level, minimum of 300 to 400 PSUs should be selected
  – Subnational domains require larger samples
• Cluster size should be as small as practical for reliability
• Example: 8000 households selected in 400 PSUs of 20 sample households each is a much more reliable sample than 200 PSUs of 40 households each, but more expensive

Page 14:

Design effect

• Deff: ratio of variance of estimate based on stratified multi-stage sample design to corresponding variance from simple random sample of same size
• Measure of the relative efficiency of the sample design
• Effective stratification reduces the deff
• Cluster sampling increases the deff
• Deft = square root of Deff, expressed as ratio of standard errors
  – Generally presented in tables of standard errors for the DHS

Page 15:

Design effect (continued)

• In case of cluster sampling, deff generally measures effect of clustering

$$ deff = 1 + \delta\,(\bar{m} - 1) $$

• δ = intraclass correlation coefficient, or measure of homogeneity within cluster
• $\bar{m}$ = average cluster size (households per cluster)
• Design effect increases with intraclass correlation and cluster size
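A small sketch of the relationship deff = 1 + δ(m̄ − 1), applied to the earlier comparison of 400 PSUs of 20 households against 200 PSUs of 40 households. The intraclass correlation of 0.05 is a hypothetical value picked for illustration, and dividing the sample size by deff to get an "effective" sample size is a common convention that the slides do not state.

```python
import math


def design_effect(delta, m_bar):
    """deff = 1 + delta * (m_bar - 1) for a clustered sample."""
    return 1.0 + delta * (m_bar - 1.0)


def effective_sample_size(n, deff):
    """Size of a simple random sample with the same variance (assumed convention)."""
    return n / deff


delta = 0.05  # hypothetical intraclass correlation
for n_psus, m_bar in ((400, 20), (200, 40)):
    deff = design_effect(delta, m_bar)
    deft = math.sqrt(deff)
    n_eff = effective_sample_size(n_psus * m_bar, deff)
    print(f"{n_psus} PSUs x {m_bar}: deff = {deff:.2f}, deft = {deft:.2f}, "
          f"effective n = {n_eff:.0f}")
```

Both designs interview 8000 households, but under this assumed δ the design with smaller clusters has the lower deff and therefore the larger effective sample size.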

Page 16:

MICS Sampling Option 1 – use an existing sample

• Design MICS as a rider to another survey if timely and feasible

• Use sample from a previous survey and re-interview households for MICS

• Or, use old survey sample EAs and construct new listing of households to select for MICS

• Old sample must be probability-based, national in scope

• Possibilities – DHS, other national health survey, recent labour force survey

• Important: design parameters must be known (such as selection probability, stratification, etc.)

Page 17:

Sampling option 1 - continued

• Advantages of using previous sample
  - cost savings
  - maps available for interviewers
  - appropriate sampling plan available
  - simplicity
• Limitations of using old sample
  - burden on respondents
  - sample design may need modification
    * sample size
    * sub-national coverage
    * number of PSUs or clusters
• Balance between loss and gain

Page 18:

MICS Sampling Option 2 – new sample with household listing

• Design new MICS sample based on prototype
• Two stages with census as frame
• Use of implicit stratification, systematic selection of census EAs at first stage with pps
• Create standard segments (DHS approach)
• List households in selected segments
• Select households systematically from listing
• Interview selected households, no replacement will be allowed

Page 19:

Sampling Option 2 - continued

• Advantages of option 2
  - simple design
  - probability-based
  - if possible self-weighting (national level)
• Limitations of option 2
  - expense of listing households
  - time necessary to list households

[Example, sample size of 5000 households may require 25000 to 50000 households to be listed]

Page 20:

DHS Method - Option 2

• Create “standard” segments
• Divide census population in each EA by 500 to determine number of standard segments
• Map sketch segments in each EA
• Choose 1 segment at random
• List households in selected segment only (instead of entire EA)
• Purpose is to reduce listing workload to a manageable size

Page 21:

MICS Sampling Option 3 – use “compact clusters” with no listing

• (Modified segment, or cluster, design)
• Design new MICS sample based on prototype
• Two stages with census as frame
• Use of implicit stratification, systematic selection of census EAs at first stage with pps
• Pre-determine number of segments (measure of size) based on desired cluster size
• Map sketch segments in each EA
• Choose 1 segment at random
• Interview all households in selected segment

Page 22:

MICS Sampling Option 3 - continued

• Illustration:
• Suppose desired cluster size is 20 households.
• Suppose first sample EA contains 112 census households (according to frame)
• Divide 112 by 20 = 5.6 (round to 6)
• Map sketch exactly 6 segments based on canvass of EA
• Select one segment at random
• Interview all households (no matter how many are currently in the selected segment)
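The segment arithmetic in this illustration can be sketched as follows. The helper names are hypothetical, and rounding half up with a minimum of one segment is an assumption; the slides only show 5.6 rounding to 6.

```python
import math
import random


def number_of_segments(frame_households, desired_cluster_size):
    """Segments to sketch in an EA: frame count / cluster size, rounded half up."""
    ratio = frame_households / desired_cluster_size
    return max(1, math.floor(ratio + 0.5))  # assumed: at least one segment per EA


def choose_segment(n_segments, rng=None):
    """Pick one segment label (1..n_segments) with equal probability."""
    rng = rng or random.Random()
    return rng.randint(1, n_segments)


k = number_of_segments(112, 20)
print(k)  # 6
print(choose_segment(k, random.Random(4)))  # a segment label between 1 and 6
```

Because the segment count is fixed from the frame before fieldwork, the number of households actually interviewed in the chosen segment can differ from 20, which is why overall sample size is harder to control under this option.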

Page 23:

MICS Sampling Option 3 - continued

• Advantages of option 3
  – avoids listing completely
  – probability-based
  – self-weighting (national level)
• Limitations of option 3
  – less reliable than option 2 (households are “clustered” together in compact segments)
  – segmentation itself can be time-consuming and complicated
  – difficult to control overall sample size

Page 24:

Common sampling option used by some countries

• Select EAs systematically with PPS, where measure of size is based on number of households (or population)

• In case of large EAs in sample, subdivide into standard segments, similar to Option 2

• Advantage: measures of size more exact, easier to implement a self-weighting design and control sample size

Page 25:

PPS systematic selection of PSUs

• Selection of PSUs with PPS provides a self-weighting sample when a fairly constant number of sample households is selected in each PSU at second stage
• Systematic sampling of PSUs from a geographically ordered list ensures that the sample is geographically representative, with a proportional allocation to the different levels of geography
• Examine template for PPS systematic sampling
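The workshop template itself is not reproduced in this transcript, so the following is only a generic sketch of systematic PPS selection from a sorted frame. The function name, the EA identifiers and the household counts are all hypothetical. Sorting the frame by region and urban/rural residence before selection is what provides the implicit stratification.

```python
import random


def pps_systematic(frame, n_sample, rng=None):
    """Systematic PPS selection.

    frame    list of (psu_id, measure_of_size), already in the desired sort order
    returns  list of (psu_id, first-stage selection probability)
    """
    rng = rng or random.Random()
    total = sum(size for _, size in frame)
    interval = total / n_sample
    start = rng.random() * interval          # random start within the first interval
    selected = []
    cumulative = 0
    k = 0                                    # index of the next selection point
    for psu_id, size in frame:
        cumulative += size
        # A PSU larger than the interval can be hit more than once.
        while k < n_sample and start + k * interval < cumulative:
            selected.append((psu_id, n_sample * size / total))
            k += 1
    return selected


# Hypothetical frame: (region, residence, EA id, households in frame).
eas = [
    ("South", "urban", "EA-07", 110), ("North", "rural", "EA-01", 120),
    ("North", "urban", "EA-03", 150), ("South", "rural", "EA-05", 100),
    ("North", "rural", "EA-02", 80),  ("South", "urban", "EA-08", 70),
    ("North", "urban", "EA-04", 60),  ("South", "rural", "EA-06", 90),
    ("South", "urban", "EA-09", 130), ("South", "urban", "EA-10", 90),
]
eas.sort(key=lambda ea: (ea[0], ea[1], ea[2]))   # implicit stratification
frame = [(ea_id, size) for _, _, ea_id, size in eas]

for psu_id, prob in pps_systematic(frame, n_sample=4, rng=random.Random(2015)):
    print(psu_id, round(prob, 3))
```

If a fixed number b of households is then selected in each sampled EA and the current household count equals the measure of size, the overall probability is (n × size / total) × (b / size) = n × b / total, the same for every household, which is the self-weighting property described above.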

Page 26:

Listing of households in sample segments

• Importance of new listing to represent current population
• Problems with using previous listing (older than 1 year)
  – Does not represent newer households
  – Distribution of sample population by age group distorted, generally with higher median age
  – Difficulty of finding households in old list

Page 27:

Listing of households (continued)

• Common problems found in listing operations
  – Problem with quality of sketch maps: difficult to determine segment boundaries
  – Sometimes large differences found between number of households in frame (census) and number listed

Page 28:

Selection of sample households from listing

• Selection of households in the office following listing operation
  – Advantages: conducted by specialized staff, possible to avoid selection bias in the field, possible to control overall sample size
  – Disadvantage: increased costs from having two field visits
• Selection of households in field
  – Advantage: cost savings of having one integrated field operation
  – Disadvantage: correct sampling may be difficult for field staff, selection may be biased
• Self-weighting samples: cluster sizes somewhat variable
• Selection of fixed number of sample households per cluster
  – Controls sample size, allowing weights to vary somewhat by EA
• Use of household selection table in field
  – Easy to use, minimizes selection bias

Page 29:

Considerations for designing self-weighting samples

• Main advantage of self-weighting sample is to simplify the estimation procedures
  – Also effective for national-level estimates
• Disadvantages of self-weighting samples
  – May not be possible to obtain reliable estimates for smaller subnational groups, given proportional allocation of sample
  – Difficult to control overall sample size
  – Use of SPSS and other software packages that automatically weight survey tables reduces advantages of self-weighting samples
• Most countries are not using self-weighting samples for MICS
  – Prefer selection of fixed number of households per EA

Page 30:

Subnational estimates

• Number of separate areas (domains) for which separate, equally reliable estimates are wanted affects sample size
• For example, if 10 regional estimates are wanted, theoretically the sample should be increased by factor of 10
• As a compromise, larger sampling errors accepted for subnational estimates
  – One proposal (by Dr. Vijay Verma): increase national sample size by factor of D^0.65, where D is the number of domains
  – Results in an average increase in the sampling errors for domain estimates by a factor of about 1.5
  – Minimum number of PSUs required for each domain, for example 30 clusters
• Allocation of sample to domains
  – Equal allocation
  – Modified proportional allocation, with a minimum and maximum number of sample PSUs per domain
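The D^0.65 rule is easy to tabulate. The sketch below also estimates the resulting increase in domain standard errors under two assumptions that are not stated in the slides: the inflated sample is split equally among the D domains, and standard errors scale with one over the square root of the sample size. Function names are hypothetical.

```python
def domain_inflation_factor(num_domains, exponent=0.65):
    """Factor applied to the national sample size: D ** 0.65."""
    return num_domains ** exponent


def domain_se_ratio(num_domains, exponent=0.65):
    """Domain standard error relative to a domain given the full national n.

    Assumes equal split across domains and SE proportional to 1 / sqrt(n).
    """
    share = domain_inflation_factor(num_domains, exponent) / num_domains
    return (1.0 / share) ** 0.5


for d in (2, 5, 10):
    print(f"D = {d:2d}: sample x {domain_inflation_factor(d):.2f}, "
          f"domain SE x {domain_se_ratio(d):.2f}")
```

For D = 10 the sample grows by a factor of about 4.5 rather than 10, and under these assumptions the domain standard errors come out close to the factor of 1.5 quoted in the slide.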

Page 31:

Survey weighting procedures

• All analysis based on survey data must apply survey weights in order to prevent biased results
• Survey weighting is design-specific
  – Overall probability of selection has component from each sampling stage.
  – Design weight is inverse of final probability of selection
• Non-response must be taken into account
  – Separate non-response adjustment for households, women age 15-49 years and children under 5 years

Page 32:

Survey weighting procedures

• Formulas for calculating weights depend on the exact sample design used in each country
• Design weights important for validating calculation of weights and coverage of frame
  – Weighted total number of households by region, urban and rural strata should be compared to corresponding distribution from census data or projections
• Normalized weights: each weight is divided by the overall average weight
  – Using normalized weights, the weighted and unweighted total number of sample cases (households, women and children) are equal
• Review of templates for calculating weights
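The weight templates are not included in this transcript, so the sketch below only illustrates the general logic for a two-stage design: design weight as the inverse of the product of stage probabilities, a nonresponse adjustment, and normalization. The adjustment shown (selected divided by responding households, applied per cluster) is one common form and is an assumption here, as are all probabilities and counts.

```python
def design_weight(p_stage1, p_stage2):
    """Inverse of the overall selection probability in a two-stage design."""
    return 1.0 / (p_stage1 * p_stage2)


def nonresponse_adjusted(weight, n_selected, n_responded):
    """Inflate the weight by the inverse of the response rate (assumed form)."""
    return weight * n_selected / n_responded


def normalize(weights):
    """Divide each weight by the overall average weight."""
    mean_weight = sum(weights) / len(weights)
    return [w / mean_weight for w in weights]


# Hypothetical clusters: stage probabilities and household response counts.
clusters = [
    {"p1": 0.010, "p2": 0.20, "selected": 20, "responded": 18},
    {"p1": 0.015, "p2": 0.10, "selected": 20, "responded": 20},
    {"p1": 0.008, "p2": 0.25, "selected": 20, "responded": 19},
]

household_weights = []
for c in clusters:
    w = nonresponse_adjusted(design_weight(c["p1"], c["p2"]),
                             c["selected"], c["responded"])
    household_weights.extend([w] * c["responded"])  # one weight per responding household

normalized = normalize(household_weights)
print(len(normalized), round(sum(normalized), 6))
```

The sum of the normalized weights equals the number of responding households, which is the property stated in the slide. The unnormalized weights are the ones to sum when checking weighted household totals against census figures.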

Page 33:

Sampling error estimation

• Calculation of sampling errors necessary to evaluate reliability of survey estimates
• Should be done for 30-50 important indicators
• Methodology is complex and design-specific
• There are several software options for sampling error calculations:
  – SPSS Complex Samples add-on: calculation of standard errors, confidence intervals and design effects
  – Other existing software can be used (Stata, Clusters, WesVar, CENVAR, PCCarp, etc.)
  – Soon variance component will be added to CSPro
• Review of SPSS sampling error application

Page 34:

Reducing bias

• Accuracy of survey results depends on both variance and bias (mostly from nonsampling errors)
• Bias should be minimized with quality control for all survey operations
• Basic data quality determined during enumeration
  – Important to have good training and supervision in the field
• Data capture should include 100% or sample verification
• Important to have quality control for editing and coding procedures
• Computer consistency and range checks

Page 35:

Country example

• 2008 Mozambique MICS3
• Use of existing survey
• Subsample of EAs from the other survey
• Shared listing with another survey
• Different households selected for each survey