Mar 26, 2015
SAMPLING METHODS
Stratification and Clustering
Richard L. Scheaffer
University of Florida
http://courses.ncssm.edu/math/Stat_Inst/Notes.htm
The Sampling Process
• Population
• Sampling Unit
• Sampling Frame
• Sampling Design
• Sample
…and Process Failures
• Sampling errors
• Non-sampling errors*
PSI Example
CEMENT PLANT
                          Avenues
Streets      1    2    3    4    5    6    7    8    9   10
   1       121  118  124  123  116  118  120  118  114  122
   2       116  118  118  113  117  116  117  112  112  115
   3       114  107  109  106  112  108  112  110  111  111
   4       105  104  103  101  103  105  104  106  109  107
   5       100  100  101   96   98   96  100  100  105  100
   6        97   95   96   94   96   95   96   97   96   97
   7        92   90   91   89   93   94   93   92   92   90
   8        86   81   85   87   85   85   86   87   83   84
   9        80   78   80   79   77   81   81   79   84   81
  10        76   77   74   77   75   74   80   75   77   74
Display 1: Grid of houses showing PSI measurements
[Box plot: Sampling Design Comparisons. Horizontal axis: Sample_Mean (80 to 120). One box per design: Cluster_Column, Cluster_Row, Random_Sample, Stratify_Column, Stratify_Row.]
Display 2: Samples of size 10 by various designs
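The comparison in Display 2 can be imitated with a short simulation on the Display 1 grid. This is only a sketch: the slides do not spell out how each design draws its 10 houses, so the definitions below (a cluster is one whole street or avenue; a stratified sample takes one house from each street or avenue) are assumptions, as are the function names, the seed, and the number of replications.

```python
import random
import statistics

# PSI grid from Display 1: GRID[street][avenue]
GRID = [
    [121, 118, 124, 123, 116, 118, 120, 118, 114, 122],
    [116, 118, 118, 113, 117, 116, 117, 112, 112, 115],
    [114, 107, 109, 106, 112, 108, 112, 110, 111, 111],
    [105, 104, 103, 101, 103, 105, 104, 106, 109, 107],
    [100, 100, 101, 96, 98, 96, 100, 100, 105, 100],
    [97, 95, 96, 94, 96, 95, 96, 97, 96, 97],
    [92, 90, 91, 89, 93, 94, 93, 92, 92, 90],
    [86, 81, 85, 87, 85, 85, 86, 87, 83, 84],
    [80, 78, 80, 79, 77, 81, 81, 79, 84, 81],
    [76, 77, 74, 77, 75, 74, 80, 75, 77, 74],
]
DESIGNS = ["Cluster_Column", "Cluster_Row", "Random_Sample",
           "Stratify_Column", "Stratify_Row"]


def sample_mean(design, rng):
    """Mean PSI of one sample of 10 houses under the named design (assumed definitions)."""
    if design == "Random_Sample":        # 10 houses from all 100
        houses = rng.sample([v for row in GRID for v in row], 10)
    elif design == "Cluster_Row":        # every house on one street
        houses = rng.choice(GRID)
    elif design == "Cluster_Column":     # every house on one avenue
        j = rng.randrange(10)
        houses = [row[j] for row in GRID]
    elif design == "Stratify_Row":       # one house from each street
        houses = [rng.choice(row) for row in GRID]
    elif design == "Stratify_Column":    # one house from each avenue
        houses = [GRID[rng.randrange(10)][j] for j in range(10)]
    else:
        raise ValueError(design)
    return statistics.mean(houses)


def simulate(reps=2000, seed=0):
    """Standard deviation of the sample mean for each design over repeated samples."""
    rng = random.Random(seed)
    return {d: statistics.pstdev([sample_mean(d, rng) for _ in range(reps)])
            for d in DESIGNS}


spread = simulate()
for design, sd in spread.items():
    print(f"{design:16s} {sd:6.2f}")
```

Because PSI falls steadily with distance from the cement plant, streets differ a great deal from each other while avenues look alike, which is why clustering by street spreads the sample means widely and stratifying by street tightens them.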
SIMPLE RANDOM SAMPLING
The observations $y_1, y_2, \ldots, y_n$ are to be sampled from a population with mean $\mu$, standard deviation $\sigma$, and size $N$ in such a way that every possible sample of size $n$ has an equal chance of being selected. If the sample mean is denoted by $\bar{y}$, it follows that
$$E(\bar{y}) = \mu$$
and
$$V(\bar{y}) = \frac{\sigma^2}{n}\left(\frac{N-n}{N-1}\right)$$
For the sample variance $s^2$, it can be shown that
$$E(s^2) = \frac{N}{N-1}\,\sigma^2 = S^2$$
and, thus, that an unbiased estimator of the variance of the sample mean is given by
$$\hat{V}(\bar{y}) = \frac{s^2}{n}\left(\frac{N-n}{N}\right) = \frac{s^2}{n}\,(1-f)$$
where $f = n/N$ is the sampling fraction.
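The identities for simple random sampling can be checked by brute force on a tiny population. The sketch below enumerates every possible sample of size 2 from a hypothetical population of 4 values (the numbers are made up for illustration) and compares the results with the formulas.

```python
from itertools import combinations
from statistics import mean, pvariance, variance

population = [3, 5, 8, 12]          # hypothetical population, N = 4
N, n = len(population), 2

# Every possible simple random sample of size n is equally likely.
samples = list(combinations(population, n))
sample_means = [mean(s) for s in samples]

sigma2 = pvariance(population)                     # sigma^2, divisor N
expected_mean = mean(sample_means)                 # E(ybar)
var_of_mean = pvariance(sample_means)              # V(ybar) by enumeration
formula_var = sigma2 / n * (N - n) / (N - 1)       # V(ybar) by formula

expected_s2 = mean([variance(s) for s in samples])  # E(s^2), s^2 uses divisor n - 1
S2 = N / (N - 1) * sigma2                           # S^2

print(expected_mean, mean(population))
print(var_of_mean, formula_var)
print(expected_s2, S2)
```

Each printed pair should agree up to floating-point rounding.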
STRATIFIED RANDOM SAMPLING
Stratified sampling designs:
• Convenience (administration efficiency)
• Estimates desired for each of the strata
• Reduced variation (statistical efficiency)
Let $\bar{y}_i$ denote the sample mean for the simple random sample selected from stratum i, $n_i$ the sample size for stratum i, $\mu_i$ the population mean for stratum i, and $N_i$ the size of stratum i.
$$\bar{y}_{st} = \frac{1}{N}\sum_{i=1}^{L} N_i\,\bar{y}_i = \sum_{i=1}^{L} W_i\,\bar{y}_i$$
$$\hat{V}(\bar{y}_{st}) = \frac{1}{N^2}\sum_{i=1}^{L} N_i^2\left(\frac{N_i - n_i}{N_i}\right)\left(\frac{s_i^2}{n_i}\right)$$
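As a worked illustration of the two stratified formulas, the sketch below computes the stratified mean and its estimated variance. The function name and the sample values are hypothetical; only the stratum sizes 45 and 15 echo the class example used later in the slides.

```python
from statistics import mean, variance


def stratified_estimate(strata):
    """strata: list of (N_i, sample_values) pairs, one pair per stratum.

    Returns (ybar_st, estimated variance of ybar_st).
    """
    N = sum(N_i for N_i, _ in strata)
    y_st = sum(N_i * mean(ys) for N_i, ys in strata) / N
    v_hat = sum(
        N_i ** 2 * ((N_i - len(ys)) / N_i) * (variance(ys) / len(ys))
        for N_i, ys in strata
    ) / N ** 2
    return y_st, v_hat


# Hypothetical samples: 4 values from a stratum of 45, 2 values from a stratum of 15.
strata = [(45, [3.2, 3.6, 3.4, 3.8]), (15, [2.5, 3.1])]
y_st, v_hat = stratified_estimate(strata)
print(round(y_st, 4), round(v_hat, 6))
```

Here `variance` is the sample variance with divisor $n_i - 1$, matching $s_i^2$ in the formula.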
Allocation
From a total sample size of n, how many observations should be allocated to stratum i? For a fixed total sample size, the variance of the stratified estimator is minimized by the allocation
$$n_i = n\,\frac{N_i\,\sigma_i}{\sum_j N_j\,\sigma_j}$$
Proportional Allocation
$$n_i = n\,\frac{N_i}{\sum_j N_j} = n\,\frac{N_i}{N}$$
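The two allocation rules are easy to compare numerically. The sketch below uses the stratum sizes and standard deviations reported for the two gender groups in the class GPA example (45 and 15 students, standard deviations 0.436 and 0.698) with n = 20; the function names are illustrative, and the variance-minimizing rule is the one usually called Neyman allocation. The results are left unrounded, so turning them into whole sample sizes is a separate step.

```python
def proportional_allocation(n, sizes):
    """n_i = n * N_i / N."""
    N = sum(sizes)
    return [n * N_i / N for N_i in sizes]


def neyman_allocation(n, sizes, sigmas):
    """n_i = n * N_i * sigma_i / sum_j(N_j * sigma_j)."""
    weights = [N_i * s_i for N_i, s_i in zip(sizes, sigmas)]
    total = sum(weights)
    return [n * w / total for w in weights]


sizes = [45, 15]            # stratum sizes N_i
sigmas = [0.436, 0.698]     # stratum standard deviations
prop = proportional_allocation(20, sizes)
opt = neyman_allocation(20, sizes, sigmas)
print(prop)
print([round(x, 2) for x in opt])
```

The variance-minimizing rule shifts observations toward the smaller but more variable stratum, while proportional allocation reproduces the split n1 = 15, n2 = 5.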
Comparison to SRS
The following comparisons apply for situations in which the Ni are all relatively large. (Here Wi = Ni/N.)
$$V_{ran} - V_{prop} = \frac{1-f}{n}\sum_i W_i\,(\mu_i - \mu)^2$$
and
$$V_{prop} - V_{opt} = \frac{1}{n}\sum_i W_i\,(S_i - \bar{S})^2$$
where $\bar{S} = \sum_i W_i S_i$.
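To get a feel for the size of these two gains, the sketch below plugs in the gender strata from the class GPA example (Figure 3) with n = 20 and N = 60. Two assumptions are made loudly here: the reported stratum standard deviations are used as the $S_i$, and the large-$N_i$ approximation is applied even though one stratum has only 15 students, so the numbers are rough.

```python
W = [45 / 60, 15 / 60]      # stratum weights W_i = N_i / N
mu = [3.39, 2.92]           # stratum means
S = [0.436, 0.698]          # stratum standard deviations (assumed to play the role of S_i)
n, N = 20, 60
f = n / N

mu_bar = sum(w * m for w, m in zip(W, mu))   # overall mean as the weighted stratum mean
S_bar = sum(w * s for w, s in zip(W, S))     # weighted average standard deviation

# Gain of proportional allocation over simple random sampling
gain_prop = (1 - f) / n * sum(w * (m - mu_bar) ** 2 for w, m in zip(W, mu))
# Further gain of optimal allocation over proportional allocation
gain_opt = 1 / n * sum(w * (s - S_bar) ** 2 for w, s in zip(W, S))

print(round(gain_prop, 6), round(gain_opt, 6))
```

The first gain grows with the spread of the stratum means; the second grows with the spread of the stratum standard deviations.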
CLUSTER SAMPLING
Commonly used when:
• Frame for elements is relatively difficult to obtain
• Frame for clusters of elements is relatively easy to obtain
Examples:
• Classrooms versus students
• City blocks versus residents
• Cartons of items stored in a warehouse versus individual items
Examples of Clusters
• Polls (home or workplace)
• Crop yield surveys (trees, corn, sugar cane)
• Animal studies (traps, colonies)
• Systematic sample
Single-stage cluster sample:
Select a simple random sample of clusters and then measure each element within the sampled clusters.
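N = number of clusters in the population
n = number of clusters selected in a simple random sample
$m_i$ = number of elements in cluster i, i = 1, . . . , N
$\bar{m} = \frac{1}{n}\sum_{i=1}^{n} m_i$ = average cluster size for the sample
$y_i$ = total of all observations in the ith cluster
$$\bar{y} = \frac{\sum_{i=1}^{n} y_i}{\sum_{i=1}^{n} m_i} = \frac{\bar{y}_t}{\bar{m}}$$
with $\bar{y}_t = \frac{1}{n}\sum_{i=1}^{n} y_i$ the mean of the sampled cluster totals, and
$$\hat{V}(\bar{y}) = \left(\frac{N-n}{N}\right)\frac{1}{n\,\bar{m}^2}\,s_r^2$$
where
$$s_r^2 = \frac{\sum_{i=1}^{n}\left(y_i - \bar{y}\,m_i\right)^2}{n-1}$$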
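The single-stage estimator is a ratio of totals, and its variance estimate is built from the residuals $y_i - \bar{y}\,m_i$. A minimal sketch follows; the function name and the four sampled clusters are hypothetical.

```python
def cluster_ratio_estimate(sizes, totals, N):
    """Single-stage cluster sample.

    sizes:  m_i, number of elements in each sampled cluster
    totals: y_i, total of the observations in each sampled cluster
    N:      number of clusters in the population
    Returns (ybar, estimated variance of ybar).
    """
    n = len(sizes)
    m_bar = sum(sizes) / n
    y_bar = sum(totals) / sum(sizes)
    s_r2 = sum((y - y_bar * m) ** 2 for y, m in zip(totals, sizes)) / (n - 1)
    v_hat = (N - n) / N * s_r2 / (n * m_bar ** 2)
    return y_bar, v_hat


# Hypothetical data: 4 sampled clusters out of N = 40
sizes = [4, 6, 5, 5]
totals = [2, 3, 3, 2]
y_bar, v_hat = cluster_ratio_estimate(sizes, totals, N=40)
print(y_bar, v_hat)
```

When cluster totals are close to proportional to cluster sizes, the residuals are small and so is the estimated variance.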
[Scatter plot: Rented Residences. Horizontal axis: Residences_m (0 to 14); vertical axis: Renters_y (0 to 8); fitted line Renters_y = 0.488 Residences_m.]
Two-stage cluster sampling with equal cluster sizes
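$$\hat{\mu} = \left(\frac{N}{M}\right)\frac{\sum_{i=1}^{n} M_i\,\bar{y}_i}{n} = \frac{1}{\bar{M}}\,\frac{\sum_{i=1}^{n} M_i\,\bar{y}_i}{n}$$
Here $M_i$ is the number of elements in cluster i and $\bar{y}_i$ is the sample mean within cluster i.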
Two-stage Cluster Sampling with Equal Cluster Sizes
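$$\hat{V}(\hat{\mu}) = (1-f_1)\,\frac{MSB}{nm} + (1-f_2)\left(\frac{1}{N}\right)\frac{MSW}{m}$$
where $f_1 = n/N$, $f_2 = m/M$,
$$MSB = \frac{m}{n-1}\sum_{i=1}^{n}\left(\bar{y}_i - \hat{\mu}\right)^2$$
and
$$MSW = \frac{1}{n(m-1)}\sum_{i=1}^{n}\sum_{j=1}^{m}\left(y_{ij} - \bar{y}_i\right)^2 = \frac{1}{n}\sum_{i=1}^{n} s_i^2$$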
1. If N is large, $\hat{V}(\hat{\mu}) \approx MSB/(nm)$ and depends only on the cluster means.
2. If m = M (or $f_2 = 1$), then two-stage cluster sampling reduces to one-stage cluster sampling.
3. If n = N, then two-stage cluster sampling becomes stratified random sampling with N strata and m observations from each.
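The equal-cluster-size variance estimator can be sketched in a few lines. The function name and data are hypothetical, and the code assumes that M denotes the common number of elements per cluster, so that $f_2 = m/M$ is the within-cluster sampling fraction. With equal cluster sizes the estimator $\hat{\mu}$ is simply the mean of the sampled cluster means.

```python
from statistics import mean, variance


def two_stage_equal(clusters, N, M):
    """Two-stage cluster sample with equal cluster sizes.

    clusters: n lists, each holding the m values sampled from one cluster
    N: number of clusters in the population
    M: number of elements in each cluster (assumed common size)
    Returns (mu_hat, MSB, MSW, estimated variance of mu_hat).
    """
    n, m = len(clusters), len(clusters[0])
    cluster_means = [mean(c) for c in clusters]
    mu_hat = mean(cluster_means)
    msb = m / (n - 1) * sum((yb - mu_hat) ** 2 for yb in cluster_means)
    msw = mean([variance(c) for c in clusters])   # average within-cluster s_i^2
    f1, f2 = n / N, m / M
    v_hat = (1 - f1) * msb / (n * m) + (1 - f2) * (1 / N) * msw / m
    return mu_hat, msb, msw, v_hat


# Hypothetical data: n = 3 clusters sampled from N = 10, m = 2 of M = 4 elements each
mu_hat, msb, msw, v_hat = two_stage_equal([[1, 3], [2, 4], [5, 7]], N=10, M=4)
print(round(mu_hat, 4), round(msb, 4), msw, round(v_hat, 4))
```

Setting M equal to m makes the second term vanish, which is the one-stage case described in note 2.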
FIGURE 1: Distribution of class GPA’s
[Histogram of GPA, axis 2.00 to 4.00]
N = 60, mean 3.27, standard deviation 0.55
FIGURE 2: Distribution of sample means from simple random sampling
[Histogram of sample means, axis 3.00 to 3.40]
SRS 20: mean 3.27, standard deviation 0.105
FIGURE 3: GPA’S by gender
[Plot of GPA (2.0 to 4.0) by gender (1, 2)]
Gender   Count   Mean   Standard Deviation
F        45      3.39   0.436
M        15      2.92   0.698
FIGURE 4: Distribution of sample means from stratified random sampling
[Histogram of sample means, axis 3.00 to 3.40]
StRS 20: mean 3.27, standard deviation 0.091 (n = 20, n1 = 15, n2 = 5)
FIGURE 5: Distribution of sample means from cluster sampling; ordered clusters
[Histogram of sample means, axis 2.5 to 3.5]
MeanOrd: mean 3.30, standard deviation 0.232 (n = 4, m = 5)
FIGURE 6: Distribution of sample means from cluster sampling; random clusters
[Histogram of sample means, axis 2.5 to 3.5]
MeanRan: mean 3.30, standard deviation 0.089
FIGURE 7: Distribution of sample means from cluster sampling; systematic clusters
[Histogram of sample means, axis 2.50 to 3.25]
Mean: mean 3.27, standard deviation 0.055
Summary Chart
Design                 Mean   Standard Deviation
Population             3.27   0.550
Simple random          3.27   0.105
Stratified (gender)    3.27   0.091
Cluster-ordered        3.30   0.232
Cluster-random         3.30   0.089
Cluster-systematic     3.27   0.055
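One way to read the summary chart is to express each design's variance as a multiple of the simple random sampling variance. The sketch below does that arithmetic on the standard deviations in the chart; the dictionary layout and variable names are illustrative, and the ratios inherit whatever simulation noise is in the reported figures.

```python
# Standard deviations of the sample mean, taken from the summary chart
sd = {
    "Simple random": 0.105,
    "Stratified (gender)": 0.091,
    "Cluster-ordered": 0.232,
    "Cluster-random": 0.089,
    "Cluster-systematic": 0.055,
}

srs_var = sd["Simple random"] ** 2
# Variance of each design relative to simple random sampling
ratio = {design: s ** 2 / srs_var for design, s in sd.items()}

for design, r in ratio.items():
    print(f"{design:22s} {r:5.2f}")
```

A ratio above 1 means the design is less precise than simple random sampling for these data; a ratio below 1 means it is more precise.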
Names to Explore
• P. C. Mahalanobis
  – "The I.S.I. has taken the lead in the original development of the technique of sample surveys, the most potent fact finding process available to the administration." R. A. Fisher
• Walter Shewhart
• Jerzy Neyman
• William Cochran
• Edwards Deming
• Warren Mitofsky
References
• Groves, Robert; Dillman, Donald; Eltinge, John; and Little, Roderick, editors. 2002. Survey Nonresponse. New York: Wiley.
• Lohr, S. 1999. Sampling: Design and Analysis. Pacific Grove, CA: Brooks Cole.
• Scheaffer, Richard; Mendenhall, William; and Ott, R. Lyman. 1996. Elementary Survey Sampling, 5th ed. Belmont, CA: Duxbury Press.
EXTRAS!
Equal cluster sizes-comparison to SRS
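$$MSB = \frac{SSB}{n-1} = \frac{m}{n-1}\sum_{i=1}^{n}\left(\bar{y}_i - \bar{y}_c\right)^2$$
$$MSW = \frac{SSW}{n(m-1)} = \frac{1}{n(m-1)}\sum_{i=1}^{n}\sum_{j=1}^{m}\left(y_{ij} - \bar{y}_i\right)^2$$
$$\hat{V}(\bar{y}_c) = \left(\frac{N-n}{N}\right)\frac{1}{nm}\,MSB$$
For a random sample of size mn,
$$\hat{V}(\bar{y}) = \left(\frac{Nm - nm}{Nm}\right)\frac{s^2}{nm} = \left(\frac{N-n}{N}\right)\frac{s^2}{nm}$$
$$\hat{s}^2 \approx \frac{1}{m}\left[(m-1)\,MSW + MSB\right]$$
$$\widehat{RE}(\bar{y}_c/\bar{y}) = \frac{\hat{s}^2}{MSB}$$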
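The estimated relative efficiency can be computed directly from the sampled clusters. In the sketch below the function name and the data are hypothetical; each inner list holds all m elements of one sampled cluster, and the approximation for $\hat{s}^2$ is the one given above.

```python
from statistics import mean, variance


def relative_efficiency(clusters):
    """Estimated efficiency of cluster sampling relative to SRS of the same nm elements.

    clusters: n lists of equal length m (all elements of each sampled cluster).
    Returns s2_hat / MSB.
    """
    n, m = len(clusters), len(clusters[0])
    cluster_means = [mean(c) for c in clusters]
    y_c = mean(cluster_means)
    msb = m / (n - 1) * sum((yb - y_c) ** 2 for yb in cluster_means)
    msw = mean([variance(c) for c in clusters])
    s2_hat = ((m - 1) * msw + msb) / m      # approximate element variance
    return s2_hat / msb


re_hat = relative_efficiency([[1, 3], [2, 4], [5, 7]])
print(round(re_hat, 4))
```

Since the ratio equals $\hat{V}(\bar{y})/\hat{V}(\bar{y}_c)$, a value below 1 indicates that clustering was less precise than simple random sampling would have been; this happens when clusters differ from one another more than elements differ within clusters (MSB larger than MSW).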