GS Training and Outreach Workshop on Agricultural Surveys Training Seminar: Sampling and Estimation in Agricultural Surveys Cristiano Ferraz 24 October 2016
Download a free copy of the Handbook at:
http://gsars.org/wp-content/uploads/2016/02/MSF-010216-web.pdf
Objective: To give participants the opportunity to become familiar with the key concepts and practical aspects of designing a sample to produce agricultural estimates.
Overview:
• Agricultural Surveys: Challenging Features
• Typical Frames in Agricultural Surveys
• Single Frame Surveys
• Multiple Frame Designs
• Dual Frame Survey
Agricultural Surveys: Challenging Features
• Cover a large spectrum of subjects
• Involve a great variety of variables of interest
• Are often multi-subject, multi-purpose surveys
• Are influenced by nature and culture
• Require periodicity
Typical Frames in Agriculture:
• List Frames
• Area Frames
• Dual Frames
• Multiple Frames
What is a Sampling Frame?
A sampling frame can be defined as a reference system composed of a set of materials, devices or coordinates that identifies and provides access to sampling units, so that a sample can be selected and its elements can be reached.
What is a List Frame?
A list frame is characterized by listing its components. Examples of list frames include:
• a list of farmers from a country or region;
• a list of associates of a cooperative association;
• a list of beneficiaries of a type of government policy program, etc.
What is an Area Frame?
Area frames are used to cover a target population geographically. Typical area frames use technological devices to identify and provide access (through coordinates) to well-defined segments of land.
What is a Master Sampling Frame?
[Diagram: four separate sampling frames, one each for the Household Survey, Agricultural Survey, Grain Survey and Livestock Survey, contrasted with a single Master Sampling Frame serving all four surveys.]
A Master Sampling Frame is a single sampling frame system from which samples for different surveys can be selected, each using its own probability sample design. Used in this way, Master Frames can be an efficient tool to integrate surveys.
[Diagram: a single Master Sampling Frame supplying samples for the Agricultural Survey at times T1, T2, T3 and T4.]
A Master Sampling Frame can also be used to select samples for the same survey at different points in time. Used in this way, Master Frames provide the sampling support for longitudinal and panel-type surveys.
Challenge: Master Sampling Frames for agricultural surveys must satisfy the needs of three statistical units:
• the farm or agricultural holding;
• the household; and
• the land.
While in many cases there is a one-to-one relationship between the agricultural holding, the household and the land parcel, this is not always the case.
Single Frame Surveys:
Basic concepts:
• Population or target population
• Subpopulation
• Frame and sampled population
• Sampling unit
• Observation unit
• Reporting unit
Case study: Dairy Cattle Production System of the Agreste Meridional de Pernambuco (Sistema de Produção de Gado de Leite do Agreste Meridional de Pernambuco)
Population or Target Population:
Subpopulation:
• Multi-purpose aspects of agricultural surveys may require estimates for subpopulations of interest.
• These are specific subsets of elementary units for which inferences are required.
• For example, inference for the subpopulation of milk producers that have received technical support from local governmental agencies could be necessary.
Frame and Sampled Population:
Target population: set of all milk producers from the Agreste Meridional de Pernambuco (AMPE)
Frame: list of all milk producers from AMPE that sell their milk to a given industry
Frame coverage level: 95%
Sampling Unit, Observation Unit and Reporting Unit:
Survey error = sampling error + non-sampling error
• Sample surveys: sampling error and non-sampling error
• Census: non-sampling error only
Design-based inference for finite populations
Suppose that $N$ is the size of the target population, and let $U$ be the set of indices that uniquely identify its units: $U = \{1, 2, \ldots, N\}$. Let $S \subset U$ be a sample of size $n$ from $U$.
Let $y_k$ be the value of the variable of interest $y$ for unit $k$ of the target population $U$.
The inclusion of $k$ in the sample is indicated by the following random variable:
$$I_k = I_k(S) = \begin{cases} 1, & \text{if } k \in S \\ 0, & \text{otherwise} \end{cases}$$
In general, when sampling from finite populations,
$$I_k \sim \text{Bernoulli}(\pi_k)$$
$$E(I_k) = \pi_k$$
$$Var(I_k) = \pi_k (1 - \pi_k)$$
$$Cov(I_k, I_l) = \pi_{kl} - \pi_k \pi_l$$
The randomization role:
Probability sampling designs determine the exact distribution of $I_k$, providing the sample inclusion probabilities:
• First-order inclusion probability: $\pi_k = P(I_k = 1)$
• Second-order inclusion probability: $\pi_{kl} = P(I_k I_l = 1)$
Probability sampling designs require that $\pi_k > 0$ for all $k \in U$.
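As an illustration of how a design determines the inclusion probabilities, the sketch below enumerates every possible sample of a small hypothetical design and sums the sample probabilities. The function name `inclusion_probabilities`, the four-unit population and the equal-probability design are assumptions made for this example only.

```python
from itertools import combinations


def inclusion_probabilities(design):
    """Compute first- and second-order inclusion probabilities.

    design: dict mapping each possible sample (a frozenset of unit labels)
    to its selection probability p(s).
    """
    units = sorted(set().union(*design))
    pi = {k: 0.0 for k in units}   # first-order: P(I_k = 1)
    pi2 = {}                       # second-order: P(I_k I_l = 1), k < l
    for s, p in design.items():
        for k in s:
            pi[k] += p
        for k, l in combinations(sorted(s), 2):
            pi2[(k, l)] = pi2.get((k, l), 0.0) + p
    return pi, pi2


# Hypothetical design: every sample of size 2 from U = {1, 2, 3, 4},
# all equally likely.
U = [1, 2, 3, 4]
samples = [frozenset(c) for c in combinations(U, 2)]
design = {s: 1 / len(samples) for s in samples}

pi, pi2 = inclusion_probabilities(design)
print(pi[1], pi2[(1, 2)])
```

Any other design can be examined in the same way by changing the dictionary of samples and probabilities.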
Parameter and estimator:
Given a probability sampling design, a unifying result due to Horvitz and Thompson (1952) ensures unbiased estimation of parameters such as means, totals and percentages. Let us focus on the problem of estimating a population total (parameter):
$$Y = \sum_{k \in U} y_k$$
The Horvitz-Thompson estimator for $Y$ is given by:
$$\hat{Y} = \sum_{k \in S} \frac{y_k}{\pi_k}$$
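The Horvitz-Thompson estimator of a total can be sketched in a few lines: each sampled value is divided by its inclusion probability and the results are added up. The function name and the three sampled values below are hypothetical and serve only to show the arithmetic.

```python
def horvitz_thompson_total(sample_values, inclusion_probs):
    """Estimate a population total as the sum of y_k / pi_k over the sample."""
    if len(sample_values) != len(inclusion_probs):
        raise ValueError("one inclusion probability is needed per sampled unit")
    if any(p <= 0 or p > 1 for p in inclusion_probs):
        raise ValueError("inclusion probabilities must lie in (0, 1]")
    return sum(y / p for y, p in zip(sample_values, inclusion_probs))


# Hypothetical sample of three farms (values and probabilities are made up).
y_sample = [120.0, 80.0, 200.0]
pi_sample = [0.5, 0.25, 0.5]

estimate = horvitz_thompson_total(y_sample, pi_sample)
print(estimate)  # 120/0.5 + 80/0.25 + 200/0.5
```

The weight 1/pi_k can be read as the number of population units that sampled unit k represents.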
The variance of the Horvitz-Thompson estimator can be written as:
$$Var_p(\hat{Y}) = \sum_{k \in U} \sum_{l \in U} (\pi_{kl} - \pi_k \pi_l) \frac{y_k}{\pi_k} \frac{y_l}{\pi_l}$$
In addition, an unbiased estimate of this variance may be obtained using:
$$\widehat{Var}_p(\hat{Y}) = \sum_{k \in S} \sum_{l \in S} \frac{(\pi_{kl} - \pi_k \pi_l)}{\pi_{kl}} \frac{y_k}{\pi_k} \frac{y_l}{\pi_l}$$
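A minimal sketch of the variance estimator follows. It assumes the usual convention that $\pi_{kk} = \pi_k$ and that all second-order inclusion probabilities are positive. The function name, the data layout (dictionaries keyed by unit label) and the numbers are hypothetical; the example uses a simple random sample of 3 units from a population of 5, for which the result should match the textbook expression $N^2 (1 - n/N) s^2 / n$.

```python
from itertools import combinations


def ht_variance_estimate(y, pi, pi2):
    """Double sum over sampled units of
    (pi_kl - pi_k * pi_l) / pi_kl * (y_k / pi_k) * (y_l / pi_l).

    y, pi: dicts keyed by sampled unit label.
    pi2: dict keyed by frozenset({k, l}) for k != l; pi_kk is taken as pi_k.
    """
    total = 0.0
    for k in y:
        for l in y:
            pkl = pi[k] if k == l else pi2[frozenset((k, l))]
            total += (pkl - pi[k] * pi[l]) / pkl * (y[k] / pi[k]) * (y[l] / pi[l])
    return total


# Hypothetical simple random sample: n = 3 units out of N = 5.
N, n = 5, 3
y = {1: 2.0, 3: 4.0, 5: 9.0}
pi = {k: n / N for k in y}
pi2 = {frozenset(pair): n * (n - 1) / (N * (N - 1)) for pair in combinations(y, 2)}

v_hat = ht_variance_estimate(y, pi, pi2)

# Textbook SRS expression, used here only as a cross-check.
mean = sum(y.values()) / n
s2 = sum((v - mean) ** 2 for v in y.values()) / (n - 1)
v_srs = N ** 2 * (1 - n / N) * s2 / n
print(v_hat, v_srs)
```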
Frame and sample design:
An important characteristic of a frame is the nature of its sampling units. On one hand, it is possible to identify either LIST or AREA frames. On the other, it is possible to identify:
• Type A: frames whose sampling units are elements of the population;
• Type B: frames whose sampling units are sets of elements of the population.
The availability of a type A frame allows direct element sampling designs to be used.
Suppose a type A frame is available:
• Simple Random Sampling
• Systematic Sampling
• Probability Proportional to Size Design (PPS)
• Multivariate Probability Proportional to Size Design (MPPS)
• Stratified Sampling
Some of these designs (such as PPS, MPPS and stratified sampling) need auxiliary information.
Simple Random Sampling
Samples selected from a population of size $N$ according to a simple random sampling design have a pre-assigned size $n$, and are such that the probability of selecting a given sample $s$ is
$$P(s) = \binom{N}{n}^{-1}$$
In a simple random sample, the first- and second-order inclusion probabilities are
$$\pi_k = \frac{n}{N} \quad \text{and} \quad \pi_{kl} = \frac{n(n-1)}{N(N-1)}, \quad k \neq l$$
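The quantities above can be computed directly. The sketch below draws a simple random sample without replacement from a hypothetical frame of 10 units and evaluates the three probabilities; the function names, the frame and the seed are assumptions for illustration.

```python
import math
import random


def srs_probabilities(N, n):
    """Return (P(s), pi_k, pi_kl) for simple random sampling without replacement."""
    p_s = 1 / math.comb(N, n)                 # every sample of size n is equally likely
    pi_k = n / N                              # first-order inclusion probability
    pi_kl = n * (n - 1) / (N * (N - 1))       # second-order, for k != l
    return p_s, pi_k, pi_kl


def draw_srs(frame, n, rng):
    """Draw n distinct units from the frame, all samples equally likely."""
    return rng.sample(frame, n)


frame = list(range(1, 11))    # hypothetical frame with N = 10 units
rng = random.Random(2016)     # fixed seed so the draw can be repeated
sample = draw_srs(frame, 4, rng)
p_s, pi_k, pi_kl = srs_probabilities(len(frame), 4)
print(sample, p_s, pi_k, pi_kl)
```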
Systematic Sampling
Suppose that a sample of size $n$ is to be selected from a population of size $N$ using a systematic sampling design. First, a sampling interval, given by
$$a = \frac{N}{n}$$
is calculated. Suppose that $a$ is an integer. Then, one element is randomly selected from the first $a$ elements identified by the frame. Thereafter, every $a$-th element of the frame is also included in the sample.
In systematic sampling, the inclusion probabilities are
$$\pi_k = \frac{n}{N} \quad \text{and} \quad \pi_{kl} = \begin{cases} \dfrac{n}{N}, & \text{if } k \text{ and } l \text{ are in the same sample} \\ 0, & \text{otherwise} \end{cases}$$
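A minimal sketch of the selection procedure, assuming as in the text that the interval $a = N/n$ is an integer. The function name, the frame of 20 units and the seed are hypothetical.

```python
import random


def systematic_sample(frame, n, rng):
    """Select every a-th unit after a random start among the first a units."""
    N = len(frame)
    if N % n != 0:
        raise ValueError("this sketch assumes that a = N / n is an integer")
    a = N // n
    start = rng.randrange(a)      # 0-based position among the first a units
    return frame[start::a]


frame = list(range(1, 21))        # hypothetical frame, N = 20
sample = systematic_sample(frame, 5, random.Random(1))
print(sample)
```

Only $a$ distinct samples are possible under this design, which is why two units that never fall in the same sample have a joint inclusion probability of zero.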
Probability Proportional to Size Sampling (PPS)
In the previous examples, each population unit had the same chance of being selected, regardless of the method of selection or the population unit's actual size.
If a measure of size (relevance) can be attached to each unit, a probability-proportional-to-size (PPS) sample can be drawn.
The following example is used to illustrate PPS sampling:
Name   Measure of Size   Accumulated Measure
1      10                10
2      1                 11
3      4                 15
4      15                30
5      5                 35
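One common way to use the accumulated measure, assumed here because the slide shows only the table, is to draw a random integer between 1 and the grand total (35) and select the unit whose accumulated range contains it. The sketch below implements that cumulative-total method with replacement; the function names are hypothetical.

```python
import random
from bisect import bisect_left
from itertools import accumulate

# Size measures taken from the table above.
sizes = {1: 10, 2: 1, 3: 4, 4: 15, 5: 5}
names = list(sizes)
cumulative = list(accumulate(sizes.values()))   # accumulated measure column


def select_by_number(r):
    """Return the unit whose accumulated range contains r (1 <= r <= total)."""
    return names[bisect_left(cumulative, r)]


def pps_draw(n, rng):
    """Draw n units with replacement, each draw proportional to size."""
    total = cumulative[-1]
    return [select_by_number(rng.randint(1, total)) for _ in range(n)]


print(cumulative)
print(select_by_number(16))   # 16 falls in the range 16 to 30, which belongs to unit 4
print(pps_draw(2, random.Random(7)))
```

Under this scheme unit 4 is selected on any single draw with probability 15/35, while unit 2 is selected with probability 1/35.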
The following example is used to illustrate Systematic PPS sampling:
Name   Measure of Size   Accumulated Measure
1      10                10
2      1                 11
3      4                 15
4      15                30
5      5                 35
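A possible sketch of systematic PPS selection on the same table: the interval is the grand total divided by the sample size (35/2 = 17.5 for a hypothetical sample of two units), a random start is taken within the first interval, and each selection point is matched against the accumulated measure. The function name and the choice of $n = 2$ are assumptions; the slide itself does not spell out the steps.

```python
import random
from bisect import bisect_left
from itertools import accumulate

sizes = {1: 10, 2: 1, 3: 4, 4: 15, 5: 5}   # measures from the table above


def systematic_pps(sizes, n, start):
    """Select n units at points start, start + interval, ... along the accumulated measure."""
    names = list(sizes)
    cumulative = list(accumulate(sizes.values()))
    interval = cumulative[-1] / n
    if not 0 < start <= interval:
        raise ValueError("start must lie in (0, interval]")
    selected = []
    for i in range(n):
        point = start + i * interval
        # Guard against floating-point overshoot at the very end of the list.
        index = min(bisect_left(cumulative, point), len(names) - 1)
        selected.append(names[index])
    return selected


rng = random.Random(3)
start = rng.uniform(0, 17.5) or 17.5        # random start; 0 is mapped to the interval end
print(systematic_pps(sizes, 2, start))
print(systematic_pps(sizes, 2, 7.0))        # points 7 and 24.5
```

In this sketch a unit whose size exceeds the interval could be selected more than once; none of the five units in the table is that large when $n = 2$.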
Multivariate probability-proportional-to-size (MPPS)
The same example as that described for PPS, with two available measures of size, follows:
Name   Measure 1 of Size   Measure 2 of Size   Improved Size Measure   Accumulated Measure
1      10                  8                   10                      10
2      1                   2                   2                       12
3      3                   4                   4                       16
4      15                  10                  15                      31
5      5                   19                  19                      50
Suppose that there are $J \geq 2$ variables of interest (items), each having at least one auxiliary variable that can be used as a measure of size. Let $x_{jk}$ be the value of size measure $j$ for element $k$ in a given frame $f$. Let also
$$X_j = \sum_{k \in f} x_{jk}$$
be the total of auxiliary variable $j$ over frame $f$.
In addition, let $n_j$ be the sample size needed for the variable of interest $j$. Then, the inclusion probability under an MPPS design is given by
$$\pi_k^f = \min\left\{1, \max_{j = 1, 2, \ldots, J} \left( n_j \frac{x_{jk}}{X_j} \right)\right\}$$
The remaining steps for selecting the sample are identical to PPS sampling.
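The formula can be evaluated on the two size measures from the MPPS table. The required sample sizes $n_1 = n_2 = 2$ used below are hypothetical, as is the function name; the slides do not state them.

```python
# Two size measures per unit, taken from the MPPS table: (measure 1, measure 2).
measures = {1: (10, 8), 2: (1, 2), 3: (3, 4), 4: (15, 10), 5: (5, 19)}

# Hypothetical sample sizes needed for each variable of interest.
n_required = (2, 2)


def mpps_inclusion_probabilities(measures, n_required):
    """pi_k = min(1, max over j of n_j * x_jk / X_j)."""
    J = len(n_required)
    totals = [sum(x[j] for x in measures.values()) for j in range(J)]   # X_j
    return {
        k: min(1.0, max(n_required[j] * x[j] / totals[j] for j in range(J)))
        for k, x in measures.items()
    }


pi = mpps_inclusion_probabilities(measures, n_required)
for k, p in pi.items():
    print(k, round(p, 3))
```

Unit 5 is small on the first measure but large on the second, so its inclusion probability is driven by the second measure; taking the maximum protects each variable of interest.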
Stratified Sampling
In stratified sampling, the population is first divided into subgroups called strata, in a process called stratification. Then, elements are sampled from each stratum (subgroup) on the basis of a given probability sample design, such as simple random sampling.
Stratification can be used for several purposes, but each requires some information on the sample units. Sometimes, stratification is used when estimates are to be made for subpopulations of interest, such as geographic or administrative areas or rare items.
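A minimal sketch of stratified simple random sampling with a Horvitz-Thompson estimate of the total. The two strata, the allocation, the values of $y$ and the function names are all hypothetical.

```python
import random


def stratified_srs(strata, allocation, rng):
    """Draw an independent simple random sample of allocation[h] units in each stratum h."""
    return {h: rng.sample(units, allocation[h]) for h, units in strata.items()}


def estimate_total(strata, sample, y):
    """Horvitz-Thompson total: in stratum h every unit has pi_k = n_h / N_h."""
    total = 0.0
    for h, sampled_units in sample.items():
        weight = len(strata[h]) / len(sampled_units)    # N_h / n_h
        total += weight * sum(y[k] for k in sampled_units)
    return total


# Hypothetical frame of 10 farms split into two strata by size.
strata = {"small": [1, 2, 3, 4, 5, 6], "large": [7, 8, 9, 10]}
y = {k: 10.0 * k for k in range(1, 11)}     # made-up values of the variable of interest

rng = random.Random(24)
sample = stratified_srs(strata, {"small": 3, "large": 2}, rng)
estimate = estimate_total(strata, sample, y)
print(sample, estimate)
```

Separate estimates for each stratum are obtained by keeping the per-stratum terms instead of adding them up, which is how stratification supports estimates for subpopulations of interest.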
Suppose a type B frame is available:
• Cluster Sampling
• Two-stage Sampling
Cluster Sampling
The main characteristic of cluster sampling is that the sampling unit is a cluster of units.
To select a cluster sample, a simple random sample of clusters is taken and each unit in the selected clusters is investigated.
Systematic sampling can also be used to select a cluster sample.
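The procedure can be sketched as follows, assuming a simple random sample of clusters. The four clusters, the values of $y$ and the function names are hypothetical.

```python
import random


def cluster_sample(clusters, m, rng):
    """Select m clusters by simple random sampling and keep every unit in them."""
    chosen = rng.sample(sorted(clusters), m)
    return {c: clusters[c] for c in chosen}


def estimate_total(clusters, sampled, y):
    """Each unit in a selected cluster has pi_k = m / M, so the weight is M / m."""
    M, m = len(clusters), len(sampled)
    return M / m * sum(y[k] for units in sampled.values() for k in units)


# Hypothetical type B frame: four clusters (for example, villages) of farms.
clusters = {"A": [1, 2, 3], "B": [4, 5], "C": [6, 7, 8, 9], "D": [10]}
y = {k: 10.0 * k for k in range(1, 11)}     # made-up values

sampled = cluster_sample(clusters, 2, random.Random(5))
print(sampled, estimate_total(clusters, sampled, y))
```

Because whole clusters are observed, the number of units in the final sample is random unless all clusters have the same size.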
Two-stage Sampling
Two-stage sampling is the sampling procedure that results when each selected cluster is subsampled for population elements.
Suppose that 50 farms clustered into 15 villages are to be surveyed.
Suppose further that it is decided to select five villages at random, obtain a listing of all farms within each selected village, and then select two farms from within each selected village. In this case, each farm has a chance of appearing in the sample together with each of the other farms, and the overall sample size (ten farms) and survey workload can thus be controlled.
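The village example can be sketched as below. The way the 50 farms are split across the 15 villages (five villages of 4 farms and ten of 3) is an assumption, since the slides give only the totals; the function name is also hypothetical. Simple random sampling is used at both stages.

```python
import random

# Hypothetical split of 50 farms into 15 villages.
village_sizes = [4] * 5 + [3] * 10
villages = {}
next_farm = 1
for v, size in enumerate(village_sizes, start=1):
    villages[v] = list(range(next_farm, next_farm + size))
    next_farm += size


def two_stage_sample(villages, m, n_per_village, rng):
    """Stage 1: SRS of m villages. Stage 2: SRS of n_per_village farms in each.

    Returns a dict mapping each sampled farm to its inclusion probability,
    pi_k = (m / M) * (n_per_village / N_i) for a farm in village i.
    """
    chosen = rng.sample(sorted(villages), m)
    sample = {}
    for v in chosen:
        farms = rng.sample(villages[v], n_per_village)
        pi = (m / len(villages)) * (n_per_village / len(villages[v]))
        for farm in farms:
            sample[farm] = pi
    return sample


sample = two_stage_sample(villages, 5, 2, random.Random(10))
print(len(sample), sample)
```

Only the five selected villages need a farm listing, and the final sample always contains ten farms.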
Cochran (1977) suggests planning a survey according to the following general topics:
I. Identification of the goals of the survey;
II. Definition of the target population;
III. Definition of the variables of interest and the data to be collected;
IV. Identification of the desired degree of precision;
V. Selection of the data collection instrument;
VI. Identification of a frame;
VII. Design of the sample;
VIII. Pre-test;
IX. Selection of the sample and collection of the data / organization of the fieldwork;
X. Data description and analysis;
XI. Summary of the obtained information and recommendations for future surveys.
Sampling Design
The choice of sample design depends on the type of frame and the availability of auxiliary information.
Multiple Frame Design
[Diagram: a population covered by four frames (Frame 1 to Frame 4), with samples S1, S2, S3 and S4 added one at a time.]
Dual Frame Design
• Very flexible approach
• Can accommodate a variety of estimators
• Compromise solution for dealing with the disadvantages of area and list frames
• Accommodates the advantages of area and list frames
Dual Frame Assumptions:
1. Completeness
2. Identifiability
[Diagram: the population and the list frame; the area frame provides full coverage; elements in the area frame sample are identified in the list frame.]
Thank You
Cristiano Ferraz
Universidade Federal de Pernambuco
Departamento de Estatística
CAST – Computational Agriculture Statistics [email protected] | [email protected]