2To put your footer here go to View > Header and Footer
Learning Objectives
By the end of this session, you will be able to:
• explain the role of sampling weights in estimating population parameters
• calculate sampling weights for very simple sampling designs
• appreciate that calculating sampling weights for complex survey designs is non-trivial and requires professional expertise
What is meant by sampling weights?
• Real surveys are generally multi-stage
• At each stage, probabilities of selecting units at that stage are not generally equal
• When a population parameter such as a mean or proportion is to be estimated, results from lower levels need to be scaled up from the sample to the population
• This scaling-up factor, applied to each unit in the sample, is called its sampling weight.
A simple example
• Suppose, for example, a simple random sample of 500 HHs in a rural district (having 7349 HHs in total) showed 140 were living below the poverty line
• Hence the estimated total number of HHs in the population living below the poverty line = (140/500)*7349 ≈ 2058
• The data for each HH was a 0/1 variable, with 1 allocated if the HH was below the poverty line.
• Multiplying this variable by 7349/500 ≈ 14.7 and summing would lead to the same answer.
• i.e. the sampling weight for each HH = 14.7
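The arithmetic in this example can be checked with a minimal Python sketch. The figures are those quoted above; the variable names and the layout of the 0/1 indicator list are illustrative assumptions, since the individual HH records are not given.

```python
# Figures quoted in the example: 500 sampled HHs out of 7349, 140 below the poverty line.
N_HOUSEHOLDS = 7349
SAMPLE_SIZE = 500
N_POOR_IN_SAMPLE = 140

# 0/1 indicator for each sampled HH (the order is irrelevant; this layout is illustrative).
below_poverty = [1] * N_POOR_IN_SAMPLE + [0] * (SAMPLE_SIZE - N_POOR_IN_SAMPLE)

# Sampling weight: number of population HHs represented by each sampled HH.
weight = N_HOUSEHOLDS / SAMPLE_SIZE

# Route 1: scale up the sample proportion.
estimate_by_proportion = (N_POOR_IN_SAMPLE / SAMPLE_SIZE) * N_HOUSEHOLDS

# Route 2: multiply each 0/1 value by the weight and sum.
estimate_by_weights = sum(weight * y for y in below_poverty)

print(round(weight, 1))
print(round(estimate_by_proportion), round(estimate_by_weights))
```

Both routes give the same estimate because every HH carries the same weight.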
Why are weights needed?
• Above was a trivial example with equal probabilities of selection
• In general, units in the sample have very different probabilities of selection, i.e. it is rare to get a self-weighting design
• To allow for unequal probabilities of selection, each unit is weighted by the reciprocal of its probability of selection
• Thus sampling weight=(1/prob of selection)
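As a rough illustration of weighting by the reciprocal of the selection probability, the sketch below estimates a population total from four units. The values and probabilities are hypothetical, invented purely for illustration, and `sampling_weight` is an illustrative helper name.

```python
# Hypothetical sample: (observed value, probability of selection) for each unit.
sample = [(12.0, 0.10), (7.0, 0.05), (20.0, 0.25), (5.0, 0.02)]


def sampling_weight(prob):
    """Return the sampling weight 1/prob for a valid selection probability."""
    if not 0 < prob <= 1:
        raise ValueError("probability of selection must lie in (0, 1]")
    return 1.0 / prob


weights = [sampling_weight(p) for _, p in sample]

# Estimated population total: sum of (weight * observed value).
estimated_total = sum(w * y for (y, _), w in zip(sample, weights))

for (y, p), w in zip(sample, weights):
    print(f"value={y}, prob={p}, weight={w:.1f}")
print(f"estimated total = {estimated_total:.1f}")
```

Units with a small probability of selection receive a large weight, since each one stands in for many unsampled units.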
Weights in stratified sampling
• Consider the “To the Woods” example data set discussed in Session 10.
• The mean numbers of large trees were:
– 97.875 in region 1, based on n1=8
– 83.500 in region 2, based on n2=6
• Hence the total number of large trees in the forest can be computed as (96*97.875) + (72*83.5) = 15408
• So what are the sampling weights used for each unit (plot)?
Self-weighting again
• The sampling weights are the same for all plots, whether in region 1 or region 2. Why is this?
• What are the probabilities of selection here?
– In region 1, each unit is selected with prob=8/96
– In region 2, each unit is selected with prob=6/72
• Recall that a design where probabilities of selection are equal for all selected units is called a self-weighting design.
• So regarding the sample as a simple random sample should give us the correct mean.
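The selection probabilities and resulting weights for the two regions can be worked through in a short Python sketch. It uses only the stratum sizes, sample sizes and means quoted in these slides; the dictionary layout and names are illustrative assumptions, and exact fractions are used to avoid rounding.

```python
from fractions import Fraction

# Stratum sizes (N), sample sizes (n) and sample means quoted in the slides.
strata = {
    "region 1": {"N": 96, "n": 8, "mean": Fraction("97.875")},
    "region 2": {"N": 72, "n": 6, "mean": Fraction("83.5")},
}

estimated_total = Fraction(0)
for name, s in strata.items():
    prob = Fraction(s["n"], s["N"])        # probability of selecting a plot
    weight = 1 / prob                      # sampling weight = 1 / probability
    sample_total = s["n"] * s["mean"]      # sum of the observations in the stratum
    estimated_total += weight * sample_total
    s["prob"], s["weight"] = prob, weight
    print(f"{name}: prob={prob}, weight={weight}")

print(f"estimated total number of large trees = {estimated_total}")
```

Both regions have the same selection probability, so every plot carries the same weight and the design is self-weighting.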
Results for means
• The mean number of large trees, using the formula for stratified sampling, is [(96/168)*97.875] + [(72/168)*83.5] = 91.71
• Treating the 14 observations as if they had been drawn as a simple random sample also gives 91.71.
• The results for variances, however, differ:
– Variance of stratified sample mean = 1.28
– Variance of mean ignoring stratification = 2.18
Results for means
• Important to note that the weights used in computing a mean, i.e.
– (96/168)*(1/8) = 1/14 for plots in region 1, &
– (72/168)*(1/6) = 1/14 for plots in region 2,
are not sampling weights
• Sampling weights refer to the multiplying factor when estimating a total.
• Essentially they represent the number of elements in the population that an individual sampling unit represents.
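The distinction between the weights used for a mean and the sampling weights can be checked numerically. The sketch below uses the figures quoted in these slides; the variable names are illustrative, and the plot-level data are not reproduced, so the variances are not recomputed here.

```python
from fractions import Fraction

# Stratum sizes, sample sizes and sample means quoted in the slides.
N1, n1, mean1 = 96, 8, Fraction("97.875")
N2, n2, mean2 = 72, 6, Fraction("83.5")
N = N1 + N2  # 168 plots in total

# Mean from the stratified formula, and mean treating all 14 plots as one sample.
stratified_mean = Fraction(N1, N) * mean1 + Fraction(N2, N) * mean2
pooled_mean = (n1 * mean1 + n2 * mean2) / (n1 + n2)

# Weight each plot receives when computing the mean.
mean_weight_1 = Fraction(N1, N) * Fraction(1, n1)
mean_weight_2 = Fraction(N2, N) * Fraction(1, n2)

# Sampling weight: number of population plots each sampled plot represents.
sampling_weight_1 = Fraction(N1, n1)
sampling_weight_2 = Fraction(N2, n2)

print(float(stratified_mean), float(pooled_mean))
print(mean_weight_1, mean_weight_2)
print(sampling_weight_1, sampling_weight_2)
print(sampling_weight_1 / N)  # sampling weight rescaled by the population size
```

Dividing the sampling weight by the population size recovers the weight used for the mean, which shows that the two differ only by a scaling factor.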
Other uses of weights
• Weights are also used to deal with non-response and missing values
• If measurements are not available on all units for some reason, the sampling weights may be re-computed to allow for this.
• e.g. In conducting the Household Budget Survey 2000/2001 in Tanzania, not all rural areas planned in the sampling scheme were visited. As a result, sampling weights had to be re-calculated and used in the analysis.
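One simple way such a re-calculation might be done is sketched below. This is not the procedure used in the Tanzanian survey, which is not described here. It assumes a basic adjustment in which the weight within each stratum is inflated by the ratio of selected to measured units, and it uses a hypothetical scenario in which one of the six sampled plots in region 2 of the forest example could not be measured. The function name `adjusted_weight` is illustrative.

```python
from fractions import Fraction


def adjusted_weight(stratum_size, n_selected, n_responded):
    """Design weight inflated by n_selected / n_responded (assumed simple adjustment)."""
    if not 0 < n_responded <= n_selected:
        raise ValueError("need 0 < n_responded <= n_selected")
    design_weight = Fraction(stratum_size, n_selected)
    response_factor = Fraction(n_selected, n_responded)
    return design_weight * response_factor


# Region 1: all 8 selected plots measured, so the weight is unchanged.
w_region1 = adjusted_weight(96, 8, 8)

# Region 2: hypothetical case where only 5 of the 6 selected plots were measured.
w_region2 = adjusted_weight(72, 6, 5)

print(w_region1, float(w_region2))

# The adjusted weights of the measured plots still sum to the stratum size.
print(5 * w_region2)
```

An adjustment of this kind implicitly treats the unmeasured units as similar to the measured ones in the same stratum, which needs to be justified in any real survey.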
Computation of weights
• General approach is to find the probability of selecting a unit at every stage of the sample selection process
• e.g. in a 3-stage design, three sets of probabilities will result
• Probability of selecting each final stage unit is then the product of these three probabilities
• The reciprocal of the above probability is then the sampling weight
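The general approach can be illustrated with a small Python sketch for a 3-stage design. The stage counts (districts, villages, households) are hypothetical and chosen only to make the arithmetic easy to follow. Equal-probability selection within each stage is assumed, which real designs often do not satisfy.

```python
from fractions import Fraction
from math import prod

# Hypothetical 3-stage design, assuming equal-probability selection at each stage:
#   stage 1: 10 of 50 districts, stage 2: 5 of 40 villages in a selected district,
#   stage 3: 20 of 200 households in a selected village.
stage_probabilities = [
    Fraction(10, 50),
    Fraction(5, 40),
    Fraction(20, 200),
]

# Overall probability of selecting a final-stage unit: product over the stages.
overall_probability = prod(stage_probabilities)

# Sampling weight: reciprocal of the overall probability.
weight = 1 / overall_probability

print(f"overall probability = {overall_probability}")
print(f"sampling weight = {weight}")
```

In practice the probabilities at each stage usually differ from unit to unit, so this product has to be formed separately for every final-stage unit.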
Difficulties in computations
• Standard methods, as illustrated in textbooks on sampling, often do not apply in real surveys
• Complex sampling designs are common
• Computing correct probabilities of selection can then be very challenging
• Usually professional assistance is needed to determine the correct sampling weights and to use them correctly in the analysis
Software for dealing with weights
• When analysing data from complex survey designs, it is important to check that the software can deal with sampling weights
• Packages such as Stata, SAS and Epi-info have facilities for dealing with sampling weights
• However, care is needed to ensure that the approaches used are appropriate for your own survey design
References
• Brogan, D. (2004) Sampling error estimation for survey data. Chapter XII, pp. 447-490, of the UN publication An Analysis of Operating Characteristics of Household Surveys in Developing and Transition Countries: Survey Costs, Design Effects and Non-Sampling Errors. Available at http://unstats.un.org/unsd/hhsurveys/index.htm (accessed 10th September 2007).
• Lohr, S.L. (1999) Sampling: Design and Analysis. International Thomson Publishing. ISBN 0-534-35361-4
• Rao, P.S.R.S. (2000) Sampling Methodologies: with applications. Chapman and Hall, London.