Sampling weights: an appreciation

SADC Course in Statistics


(Session 19)


Learning Objectives

By the end of this session, you will be able to

• explain the role of sampling weights in estimating population parameters

• calculate sampling weights for very simple sampling designs

• appreciate that calculating sampling weights for complex survey designs is non-trivial and requires professional expertise


What is meant by sampling weights?

• Real surveys are generally multi-stage

• At each stage, probabilities of selecting units at that stage are not generally equal

• When population parameters like a mean or proportion are to be estimated, results from lower levels need to be scaled up from the sample to the population

• This scaling-up factor, applied to each unit in the sample, is called its sampling weight.


A simple example

• Suppose, for example, a simple random sample of 500 households (HHs) in a rural district (having 7349 HHs in total) showed 140 were living below the poverty line

• Hence the estimated total number of HHs in the population living below the poverty line = (140/500)*7349 ≈ 2058

• Data for each HH was a 0,1 variable, 1 being allocated if HH was below poverty line.

• Multiplying this variable by 7349/500 ≈ 14.7 and summing over the sample would lead to the same answer.

• i.e. sampling weight for each HH = 14.7
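The arithmetic on this slide can be checked with a short Python sketch. The figures come from the example; the variable names are illustrative only, and the 0/1 data are rebuilt from the counts rather than taken from a real data file.

```python
# Figures taken from the example above.
N_HOUSEHOLDS = 7349   # households in the district
n_sample = 500        # households in the simple random sample
n_poor = 140          # sampled households below the poverty line

# 0/1 indicator for each sampled household (the order does not matter here).
below_poverty_line = [1] * n_poor + [0] * (n_sample - n_poor)

# Under simple random sampling every household has the same weight N/n.
weight = N_HOUSEHOLDS / n_sample

# Weighted sum of the indicator = estimated population total.
estimated_total = sum(weight * y for y in below_poverty_line)

print(f"sampling weight: {weight:.1f}")
print(f"estimated total below poverty line: {estimated_total:.0f}")
```

Rounded as on the slide, this should give a weight of 14.7 and an estimated total of 2058.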


Why are weights needed?

• Above was a trivial example with equal probabilities of selection

• In general, units in the sample have very different probabilities of selection, i.e. it is rare to get a self-weighting design (one in which all units have the same probability of selection)

• To allow for unequal probabilities of selection, each unit is weighted by the reciprocal of its probability of selection

• Thus sampling weight = 1/(probability of selection)
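As a minimal sketch of the reciprocal rule, the following Python uses a small, entirely hypothetical sample of three units with unequal selection probabilities. The probabilities, the observed values and the helper name `sampling_weight` are assumptions made for illustration; they do not come from any survey discussed in this session.

```python
from fractions import Fraction

# Hypothetical sample: (probability of selection, observed value) per unit.
sample = [
    (Fraction(1, 10), 12),
    (Fraction(1, 4), 7),
    (Fraction(1, 2), 3),
]

def sampling_weight(prob):
    """Sampling weight = reciprocal of the probability of selection."""
    if not 0 < prob <= 1:
        raise ValueError("selection probability must be in (0, 1]")
    return 1 / prob

weights = [sampling_weight(p) for p, _ in sample]

# Each observation is scaled up by its own weight before summing.
estimated_total = sum(w * y for w, (_, y) in zip(weights, sample))

print([str(w) for w in weights])
print(estimated_total)
```

Units that were less likely to be selected receive larger weights, since each of them stands in for more units of the population.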


Weights in stratified sampling

• Consider the “To the Woods” example data set discussed in Session 10.

• The mean numbers of large trees per plot were:
– 97.875 in region 1, based on n1=8
– 83.500 in region 2, based on n2=6

• Hence the total number of large trees in the forest can be estimated as (96*97.875) + (72*83.5) = 15408, where 96 and 72 are the numbers of plots in regions 1 and 2

• So what are the sampling weights used for each unit (plot)?


Self-weighting again

• The sampling weights are the same for all plots, whether in region 1 or region 2. Why is this?

• What are the probabilities of selection here?
– In region 1, each unit is selected with prob=8/96
– In region 2, each unit is selected with prob=6/72

• Recall that a design where probabilities of selection are equal for all selected units is called a self-weighting design.

• So regarding the sample as a simple random sample should give us the correct mean.
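The sampling weights for the plots can be worked out in a few lines of Python. This is only a sketch using the summary figures quoted on the slides (the individual plot counts are not reproduced here), and the dictionary layout is an illustrative choice.

```python
# Stratum sizes (N), sample sizes (n) and sample means from the slides.
strata = {
    "region 1": {"N": 96, "n": 8, "mean": 97.875},
    "region 2": {"N": 72, "n": 6, "mean": 83.5},
}

estimated_total = 0.0
weights = {}
for name, s in strata.items():
    weights[name] = s["N"] / s["n"]      # sampling weight = 1 / (n / N)
    sample_sum = s["n"] * s["mean"]      # sum of counts over sampled plots
    estimated_total += weights[name] * sample_sum

print(weights)
print(estimated_total)
```

Both regions give a weight of 12, because 96/8 = 72/6, and applying that weight to the sampled counts recovers the estimated total of 15408.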


Results for means

• The mean number of large trees per plot, using the formula for stratified sampling, is [(96/168)*97.875] + [(72/168)*83.5] = 91.71

• Treating the 14 observations as if they had been drawn as a simple random sample also gives 91.71.

• The results for variances, however, differ
– Variance of stratified sample mean = 1.28
– Variance of mean ignoring stratification = 2.18
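The agreement between the two means can be verified from the summary figures alone, as in the Python sketch below. The variances are not recomputed, because they depend on the individual plot counts, which are not reproduced in these slides.

```python
# (stratum size N_h, sample size n_h, sample mean) for each region.
strata = [(96, 8, 97.875), (72, 6, 83.5)]

N = sum(N_h for N_h, _, _ in strata)   # total number of plots in the forest
n = sum(n_h for _, n_h, _ in strata)   # total number of sampled plots

# Stratified estimate: stratum means weighted by stratum share N_h / N.
stratified_mean = sum((N_h / N) * mean_h for N_h, _, mean_h in strata)

# Simple mean of all sampled plots, ignoring the stratification.
srs_mean = sum(n_h * mean_h for _, n_h, mean_h in strata) / n

print(round(stratified_mean, 2), round(srs_mean, 2))
```

The two values coincide here only because the design is self-weighting; with unequal sampling fractions in the two regions they would generally differ.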


Results for means

• It is important to note that the weights used in computing a mean, i.e.
– (96/168)*(1/8) = 1/14 for plots in region 1, and
– (72/168)*(1/6) = 1/14 for plots in region 2,
are not sampling weights

• Sampling weights refer to the multiplying factor when estimating a total.

• Essentially, a sampling weight represents the number of elements in the population that an individual sampling unit represents.
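One way to see the difference between the two kinds of weight is to add them up over the 14 sampled plots. The sketch below uses exact fractions; the variable names are illustrative assumptions.

```python
from fractions import Fraction

# (stratum size N_h, sample size n_h) for each region, from the slides.
strata = {"region 1": (96, 8), "region 2": (72, 6)}
N = sum(N_h for N_h, _ in strata.values())   # 168 plots in total

sum_mean_weights = Fraction(0)
sum_sampling_weights = Fraction(0)
for name, (N_h, n_h) in strata.items():
    mean_weight = Fraction(N_h, N) * Fraction(1, n_h)   # weight used for a mean
    sampling_weight = Fraction(N_h, n_h)                # weight used for a total
    print(name, mean_weight, sampling_weight)
    # Accumulate over the n_h sampled plots in this region.
    sum_mean_weights += n_h * mean_weight
    sum_sampling_weights += n_h * sampling_weight

print(sum_mean_weights, sum_sampling_weights)
```

The weights for the mean add up to 1, whereas the sampling weights add up to 168, the number of plots in the population.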


Other uses of weights

• Weights are also used to deal with non-response and missing values

• If measurements are not available on all sampled units for some reason, the sampling weights may be re-computed to allow for this.

• e.g. In conducting the Household Budget Survey 2000/2001 in Tanzania, not all rural areas planned in the sampling scheme were visited. As a result, sampling weights had to be re-calculated and used in the analysis.
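One common way of re-computing weights, sketched below, is to inflate the weights of the units actually observed so that they also account for the units that were planned but missed. This is an assumed illustration of the general idea only: the function name and all figures are hypothetical, and it is not a description of how the weights were re-calculated in the Tanzanian survey.

```python
def adjust_for_nonresponse(base_weight, n_planned, n_observed):
    """Scale a base sampling weight by planned/observed units in a group.

    Hypothetical helper: assumes the missed units resemble the observed
    units in the same group, which needs checking in any real survey.
    """
    if not 0 < n_observed <= n_planned:
        raise ValueError("need 0 < n_observed <= n_planned")
    return base_weight * n_planned / n_observed

# Hypothetical group: 20 areas planned, each with base weight 50,
# but only 16 were actually visited.
adjusted_weight = adjust_for_nonresponse(50, n_planned=20, n_observed=16)

print(adjusted_weight)
```

After adjustment the 16 visited areas carry the same total weight as the 20 planned areas would have done.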


Computation of weights

• The general approach is to find the probability of selecting a unit at every stage of the sample selection process

• e.g. in a 3-stage design, three sets of probabilities will result

• Probability of selecting each final stage unit is then the product of these three probabilities

• The reciprocal of the above probability is then the sampling weight
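The steps above can be sketched in Python for a single final-stage unit. The three stage probabilities are hypothetical numbers chosen for illustration; in a real survey each would have to be derived from the actual selection procedure at that stage.

```python
from fractions import Fraction
from math import prod

# Hypothetical probabilities of selection for one final-stage unit:
# stage 1 (e.g. district), stage 2 (e.g. village), stage 3 (e.g. household).
stage_probs = [Fraction(5, 50), Fraction(4, 40), Fraction(10, 200)]

# Overall probability of selection = product over the stages.
overall_prob = prod(stage_probs)

# Sampling weight = reciprocal of the overall probability.
weight = 1 / overall_prob

print(overall_prob, weight)
```

Exact fractions are used so that the product and its reciprocal are not affected by floating-point rounding.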


Difficulties in computations

• Standard methods, as illustrated in textbooks on sampling, often do not apply in real surveys

• Complex sampling designs are common

• Computing correct probabilities of selection can then be very challenging

• Usually professional assistance is needed to determine the correct sampling weights and to use them correctly in the analysis


Software for dealing with weights

• When analysing data from complex survey designs, it is important to check that the software can deal with sampling weights

• Packages such as Stata, SAS and Epi-info have facilities for dealing with sampling weights

• However, you need to check carefully that the approaches used by the software are appropriate for your own survey design


References

• Brogan, D. (2004) Sampling error estimation for survey data. Chapter XII, pp. 447-490, of the UN Publication An Analysis of Operating Characteristics of Household Surveys in Developing and Transition Countries: Survey Costs, Design Effects and Non-Sampling Errors. Available at http://unstats.un.org/unsd/hhsurveys/index.htm (accessed 10th September 2007)

• Lohr, S.L. (1999) Sampling: Design and Analysis. International Thomson Publishing. ISBN 0-534-35361-4

• Rao, P.S.R.S. (2000) Sampling Methodologies: with applications. Chapman and Hall, London.

