SAMPLING METHODS Stratification and Clustering Richard L. Scheaffer University of Florida [email protected] .

SAMPLING METHODS

Stratification and Clustering

Richard L. Scheaffer

University of Florida

[email protected]

http://courses.ncssm.edu/math/Stat_Inst/Notes.htm








The Sampling Process

• Population

• Sampling Unit

• Sampling Frame

• Sampling Design

• Sample

…and Process Failures

• Sampling errors

• Non-sampling errors*

PSI Example

CEMENT PLANT

                            Avenues
Street     1    2    3    4    5    6    7    8    9   10
   1     121  118  124  123  116  118  120  118  114  122
   2     116  118  118  113  117  116  117  112  112  115
   3     114  107  109  106  112  108  112  110  111  111
   4     105  104  103  101  103  105  104  106  109  107
   5     100  100  101   96   98   96  100  100  105  100
   6      97   95   96   94   96   95   96   97   96   97
   7      92   90   91   89   93   94   93   92   92   90
   8      86   81   85   87   85   85   86   87   83   84
   9      80   78   80   79   77   81   81   79   84   81
  10      76   77   74   77   75   74   80   75   77   74

Display 1: Grid of houses showing PSI measurements

[Box plot: Sampling Design Comparisons. Axis: Sample_Mean (80 to 120). One box per design: Cluster_Column, Cluster_Row, Random_Sample, Stratify_Column, Stratify_Row.]

Display 2: Samples of size 10 by various designs
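The comparison of designs on the PSI grid can be explored with a small simulation. The sketch below is only an illustration under stated assumptions: the function names, the seed, the number of repetitions, and the reading of each design (a cluster is one whole street or avenue, a stratified sample takes one house from each street or avenue) are choices made here, not taken from the slides.

```python
import random

# PSI readings from Display 1: PSI[street][avenue], streets 1-10 from the top.
PSI = [
    [121, 118, 124, 123, 116, 118, 120, 118, 114, 122],
    [116, 118, 118, 113, 117, 116, 117, 112, 112, 115],
    [114, 107, 109, 106, 112, 108, 112, 110, 111, 111],
    [105, 104, 103, 101, 103, 105, 104, 106, 109, 107],
    [100, 100, 101, 96, 98, 96, 100, 100, 105, 100],
    [97, 95, 96, 94, 96, 95, 96, 97, 96, 97],
    [92, 90, 91, 89, 93, 94, 93, 92, 92, 90],
    [86, 81, 85, 87, 85, 85, 86, 87, 83, 84],
    [80, 78, 80, 79, 77, 81, 81, 79, 84, 81],
    [76, 77, 74, 77, 75, 74, 80, 75, 77, 74],
]


def mean(values):
    values = list(values)
    return sum(values) / len(values)


def simple_random(rng):
    # 10 houses drawn without replacement from all 100.
    houses = [v for row in PSI for v in row]
    return mean(rng.sample(houses, 10))


def cluster_by_street(rng):
    # One whole street (row) is the sample.
    return mean(rng.choice(PSI))


def cluster_by_avenue(rng):
    # One whole avenue (column) is the sample.
    j = rng.randrange(10)
    return mean(row[j] for row in PSI)


def stratify_by_street(rng):
    # One randomly chosen house from each street.
    return mean(rng.choice(row) for row in PSI)


def stratify_by_avenue(rng):
    # One randomly chosen house from each avenue.
    return mean(PSI[rng.randrange(10)][j] for j in range(10))


def spread(design, reps=2000, seed=1):
    # Standard deviation of the sample mean over repeated samples.
    rng = random.Random(seed)
    means = [design(rng) for _ in range(reps)]
    centre = mean(means)
    return (sum((x - centre) ** 2 for x in means) / reps) ** 0.5


DESIGNS = {
    "simple random": simple_random,
    "cluster by street": cluster_by_street,
    "cluster by avenue": cluster_by_avenue,
    "stratify by street": stratify_by_street,
    "stratify by avenue": stratify_by_avenue,
}
SPREADS = {name: spread(fn) for name, fn in DESIGNS.items()}
```

Because PSI changes mainly from street to street, designs that spread the sample across streets should show the smallest spread in this sketch, and a design that takes a single street should show the largest.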

SIMPLE RANDOM SAMPLING

The observations $y_1, y_2, \ldots, y_n$ are to be sampled from a population with mean $\mu$, standard deviation $\sigma$, and size $N$ in such a way that every possible sample of size $n$ has an equal chance of being selected. If the sample mean is denoted by $\bar{y}$, it follows that

\[ E(\bar{y}) = \mu \]

and

\[ V(\bar{y}) = \frac{\sigma^2}{n}\left(\frac{N-n}{N-1}\right). \]

For the sample variance $s^2$, it can be shown that

\[ E(s^2) = \frac{N}{N-1}\,\sigma^2 = S^2 \]

and, thus, that an unbiased estimator of the variance of the sample mean is given by

\[ \hat{V}(\bar{y}) = \frac{s^2}{n}\left(\frac{N-n}{N}\right) = \frac{s^2}{n}\,(1-f), \]

where $f = n/N$ is the sampling fraction.
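As a minimal sketch of the estimator of the variance of the sample mean, the function below computes the sample mean and (s²/n)(1 − f) with f = n/N. The function name and the four data values are made up for illustration.

```python
import statistics


def srs_estimate(sample, N):
    """Sample mean and estimated variance of the mean under simple random sampling."""
    n = len(sample)
    ybar = statistics.mean(sample)
    s2 = statistics.variance(sample)  # sample variance with divisor n - 1
    f = n / N                         # sampling fraction
    v_hat = (s2 / n) * (1 - f)        # finite population correction applied
    return ybar, v_hat


# Hypothetical data: 4 observations from a population of size 20.
ybar, v_hat = srs_estimate([2.0, 4.0, 6.0, 8.0], N=20)
```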

STRATIFIED RANDOM SAMPLING

Reasons for using stratified sampling designs:

• Convenience (administrative efficiency)

• Estimates desired for each of the strata

• Reduced variation (statistical efficiency)

Let $\bar{y}_i$ denote the sample mean for the simple random sample selected from stratum $i$, $n_i$ the sample size for stratum $i$, $\mu_i$ the population mean for stratum $i$, and $N_i$ the size of stratum $i$. Then

\[ \bar{y}_{st} = \frac{1}{N}\sum_{i=1}^{L} N_i\,\bar{y}_i = \sum_{i=1}^{L} W_i\,\bar{y}_i \]

\[ \hat{V}(\bar{y}_{st}) = \frac{1}{N^2}\sum_{i=1}^{L} N_i^2\left(\frac{N_i-n_i}{N_i}\right)\left(\frac{s_i^2}{n_i}\right) \]
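The stratified estimator and its estimated variance can be sketched as follows. The function name and the two small strata are hypothetical; each stratum is assumed to hold a simple random sample of at least two observations so that the sample variance exists.

```python
import statistics


def stratified_estimate(samples, sizes):
    """Stratified mean and its estimated variance.

    samples: one list of observations per stratum
    sizes:   the stratum sizes N_i, in the same order
    """
    N = sum(sizes)
    mean_st = 0.0
    var_st = 0.0
    for y, N_i in zip(samples, sizes):
        n_i = len(y)
        s2_i = statistics.variance(y)          # within-stratum sample variance
        mean_st += N_i * statistics.mean(y) / N
        var_st += N_i ** 2 * ((N_i - n_i) / N_i) * (s2_i / n_i) / N ** 2
    return mean_st, var_st


# Hypothetical strata of sizes 6 and 4 with two observations sampled from each.
mean_st, var_st = stratified_estimate([[1.0, 3.0], [10.0, 14.0]], [6, 4])
```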

Allocation

From a total sample size of $n$, how many observations should be allocated to stratum $i$? The sample size and allocation across strata may be chosen to minimize variance for a fixed sample size:

\[ n_i = n\,\frac{N_i\,\sigma_i}{\sum_j N_j\,\sigma_j} \]

Proportional Allocation

\[ n_i = n\,\frac{N_i}{\sum_j N_j} = \frac{n\,N_i}{N} \]
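The two allocation rules are easy to compare numerically. The sketch below uses the stratum counts and standard deviations from the GPA-by-gender example in this deck (45 and 15 students, standard deviations 0.436 and 0.698, n = 20), treating those standard deviations as the stratum values of σ_i; the function names are chosen here for illustration. The results are not rounded to whole observations.

```python
def proportional_allocation(n, sizes):
    """n_i = n * N_i / N."""
    N = sum(sizes)
    return [n * N_i / N for N_i in sizes]


def optimal_allocation(n, sizes, sigmas):
    """n_i = n * N_i * sigma_i / sum_j(N_j * sigma_j), for fixed total n."""
    weights = [N_i * s_i for N_i, s_i in zip(sizes, sigmas)]
    total = sum(weights)
    return [n * w / total for w in weights]


prop = proportional_allocation(20, [45, 15])
opt = optimal_allocation(20, [45, 15], [0.436, 0.698])
```

Proportional allocation gives 15 and 5, the split used for Figure 4. The variance-minimizing rule shifts about two observations toward the smaller but more variable stratum.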

Comparison to SRS

The following comparisons apply for situations in which the Ni are all relatively large. (Here Wi = Ni/N.)

\[ V_{ran} - V_{prop} = \frac{1-f}{n}\sum_i W_i\,(\mu_i - \mu)^2 \]

and

\[ V_{prop} - V_{opt} = \frac{1}{n}\sum_i W_i\,(S_i - \bar{S})^2, \quad \text{where } \bar{S} = \sum_i W_i\,S_i. \]
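A short numerical sketch of the two differences follows. The stratum weights, means, and standard deviations are invented for illustration, and the function names are not from the slides. The formulas assume the large-stratum setting stated above.

```python
def gain_prop_over_srs(n, f, W, mu):
    """V_ran - V_prop = ((1 - f) / n) * sum_i W_i * (mu_i - mu)^2."""
    mu_all = sum(w * m for w, m in zip(W, mu))  # overall mean
    return (1 - f) / n * sum(w * (m - mu_all) ** 2 for w, m in zip(W, mu))


def gain_opt_over_prop(n, W, S):
    """V_prop - V_opt = (1 / n) * sum_i W_i * (S_i - S_bar)^2."""
    S_bar = sum(w * s for w, s in zip(W, S))    # weighted mean of stratum SDs
    return 1 / n * sum(w * (s - S_bar) ** 2 for w, s in zip(W, S))


# Hypothetical population: two equal strata, n = 10, sampling fraction 0.1.
d1 = gain_prop_over_srs(n=10, f=0.1, W=[0.5, 0.5], mu=[10.0, 20.0])
d2 = gain_opt_over_prop(n=10, W=[0.5, 0.5], S=[2.0, 4.0])
```

In this sketch the first difference grows with the separation of the stratum means, and the second grows with the separation of the stratum standard deviations.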

CLUSTER SAMPLING

Commonly used when:

• Frame for elements is relatively difficult to obtain

• Frame for clusters of elements is relatively easy to obtain

Examples:

• Classrooms versus students

• City blocks versus residents

• Cartons of items stored in a warehouse versus individual items

Examples of Clusters

• Polls (home or workplace)

• Crop yield surveys (trees, corn, sugar cane)

• Animal studies (traps, colonies)

• Systematic sample

Single-stage cluster sample:

Select a simple random sample of clusters and then measure each element within the sampled clusters.

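N = number of clusters in the population

n = number of clusters selected in a simple random sample

$m_i$ = number of elements in cluster $i$, $i = 1, \ldots, N$

$\bar{m} = \frac{1}{n}\sum_{i=1}^{n} m_i$ = average cluster size for the sample

$y_i$ = total of all observations in the $i$th cluster

\[ \bar{y} = \frac{\sum_{i=1}^{n} y_i}{\sum_{i=1}^{n} m_i} = \frac{\bar{y}_t}{\bar{m}}, \]

where $\bar{y}_t = \frac{1}{n}\sum_{i=1}^{n} y_i$ is the mean of the sampled cluster totals.

\[ \hat{V}(\bar{y}) = \left(\frac{N-n}{N}\right)\frac{1}{n\,\bar{m}^2}\,s_r^2 \]

where

\[ s_r^2 = \frac{\sum_{i=1}^{n}\,(y_i - \bar{y}\,m_i)^2}{n-1} \]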
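The single-stage cluster estimator is a ratio of cluster totals to cluster sizes, and can be sketched as below. The function name and the three hypothetical clusters are assumptions for illustration only.

```python
def cluster_ratio_estimate(totals, sizes, N):
    """Mean per element and its estimated variance for single-stage cluster sampling.

    totals: cluster totals y_i for the n sampled clusters
    sizes:  cluster sizes m_i for the same clusters
    N:      number of clusters in the population
    """
    n = len(totals)
    ybar = sum(totals) / sum(sizes)   # ratio estimator of the mean per element
    m_bar = sum(sizes) / n            # average sampled cluster size
    # Residuals of cluster totals about the fitted line through the origin.
    s_r2 = sum((y - ybar * m) ** 2 for y, m in zip(totals, sizes)) / (n - 1)
    v_hat = ((N - n) / N) * s_r2 / (n * m_bar ** 2)
    return ybar, v_hat


# Hypothetical sample: 3 of 10 clusters, sizes 2, 4, 6, each with total 2.
ybar_c, v_hat_c = cluster_ratio_estimate([2, 2, 2], [2, 4, 6], N=10)
```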

[Scatter plot: Rented Residences. Horizontal axis: Residences_m (0 to 14). Vertical axis: Renters_y (0 to 8). Fitted line: Renters_y = 0.488 Residences_m.]

Two-stage cluster sampling with equal cluster sizes

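\[ \hat{\mu} = \left(\frac{N}{M}\right)\frac{\sum_{i=1}^{n} M_i\,\bar{y}_i}{n} = \frac{1}{\bar{M}}\,\frac{\sum_{i=1}^{n} M_i\,\bar{y}_i}{n}, \quad \text{with } \bar{M} = M/N. \]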

Two-stage Cluster Sampling with Equal Cluster Sizes

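\[ \hat{V}(\hat{\mu}) = (1-f_1)\,\frac{MSB}{nm} + (1-f_2)\left(\frac{1}{N}\right)\frac{MSW}{m} \]

where $f_1 = n/N$, $f_2 = m/\bar{M}$,

\[ MSB = \frac{m}{n-1}\sum_{i=1}^{n}\,(\bar{y}_i - \hat{\mu})^2 \]

and

\[ MSW = \frac{1}{n(m-1)}\sum_{i=1}^{n}\sum_{j=1}^{m}\,(y_{ij} - \bar{y}_i)^2 = \frac{1}{n}\sum_{i=1}^{n} s_i^2 \]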

1. If N is large, $\hat{V}(\hat{\mu}) = MSB/(nm)$ and depends only on the cluster means.

2. If $m = \bar{M}$ (or $f_2 = 1$), then two-stage cluster sampling reduces to one-stage cluster sampling.

3. If n = N, then two-stage cluster sampling becomes stratified random sampling with N strata and m observations from each.
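The two-stage variance estimator with equal cluster sizes can be sketched as below. The function name, the argument `M_bar` (the common number of elements per cluster), and the tiny data set are assumptions for illustration. With equal cluster sizes the estimate of the mean is taken here as the plain average of the sampled cluster means.

```python
def two_stage_estimate(clusters, N, M_bar):
    """Two-stage cluster estimate with equal cluster sizes.

    clusters: n lists, each holding the m sampled values from one cluster
    N:        number of clusters in the population
    M_bar:    number of elements in each cluster
    """
    n = len(clusters)
    m = len(clusters[0])
    means = [sum(c) / m for c in clusters]
    mu_hat = sum(means) / n
    # Between-cluster and within-cluster mean squares.
    msb = m / (n - 1) * sum((yb - mu_hat) ** 2 for yb in means)
    msw = sum((y - yb) ** 2 for c, yb in zip(clusters, means) for y in c) / (n * (m - 1))
    f1 = n / N
    f2 = m / M_bar
    v_hat = (1 - f1) * msb / (n * m) + (1 - f2) * (1 / N) * msw / m
    return mu_hat, msb, msw, v_hat


# Hypothetical sample: 2 of 4 clusters, 2 of 4 elements from each.
mu_hat, msb, msw, v_hat_2s = two_stage_estimate([[1.0, 3.0], [5.0, 7.0]], N=4, M_bar=4)
```

Setting `M_bar` equal to m makes the second term vanish, which matches the reduction to one-stage cluster sampling noted in item 2.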

FIGURE 1: Distribution of class GPAs

[Plot: GPA, axis 2.00 to 4.00.] N = 60, mean 3.27, standard deviation 0.55

FIGURE 2: Distribution of sample means from simple random sampling

[Plot: SRS 20 sample means, axis 3.00 to 3.40.] Mean 3.27, standard deviation 0.105

FIGURE 3: GPAs by gender

[Plot: GPA (2.0 to 4.0) by Gender (1, 2).]

Gender   Count   Mean   Standard Deviation
F           45   3.39   0.436
M           15   2.92   0.698

FIGURE 4: Distribution of sample means from stratified random sampling

[Plot: StRS 20 sample means, axis 3.00 to 3.40.] Mean 3.27, standard deviation 0.091 (n = 20, n1 = 15, n2 = 5)

FIGURE 5: Distribution of sample means from cluster sampling; ordered clusters

[Plot: MeanOrd, axis 2.5 to 3.5.] Mean 3.30, standard deviation 0.232 (n = 4, m = 5)

FIGURE 6: Distribution of sample means from cluster sampling; random clusters

[Plot: MeanRan, axis 2.5 to 3.5.] Mean 3.30, standard deviation 0.089

FIGURE 7: Distribution of sample means from cluster sampling; systematic clusters

[Plot: Mean, axis 2.50 to 3.25.] Mean 3.27, standard deviation 0.055

Summary Chart

                      Mean   Standard Deviation
Population            3.27   0.550
Simple random         3.27   0.105
Stratified (gender)   3.27   0.091
Cluster-ordered       3.30   0.232
Cluster-random        3.30   0.089
Cluster-systematic    3.27   0.055
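One way to read the summary chart is to express each design's variance as a ratio to the simple random sampling variance. The sketch below does only that arithmetic on the standard deviations reported in the chart; the variable names are chosen here, and the ratios inherit whatever simulation noise is in the reported values.

```python
# Standard deviations of the sample mean, copied from the summary chart.
SD = {
    "Simple random": 0.105,
    "Stratified (gender)": 0.091,
    "Cluster-ordered": 0.232,
    "Cluster-random": 0.089,
    "Cluster-systematic": 0.055,
}

base = SD["Simple random"] ** 2

# Variance of each design relative to simple random sampling (below 1 is better).
VARIANCE_RATIO = {name: sd ** 2 / base for name, sd in SD.items()}
```

Ratios below 1 indicate a design whose sample means varied less than those from simple random sampling in this example.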

Names to Explore

• P. C. Mahalanobis: "The I.S.I. has taken the lead in the original development of the technique of sample surveys, the most potent fact finding process available to the administration." (R. A. Fisher)

• Walter Shewhart

• Jerzy Neyman

• William Cochran

• Edwards Deming

• Warren Mitofsky


EXTRAS!

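Equal cluster sizes: comparison to SRS

\[ MSB = \frac{SSB}{n-1} = \frac{m}{n-1}\sum_{i=1}^{n}\,(\bar{y}_i - \bar{y}_c)^2 \]

\[ MSW = \frac{SSW}{n(m-1)} = \frac{1}{n(m-1)}\sum_{i=1}^{n}\sum_{j=1}^{m}\,(y_{ij} - \bar{y}_i)^2 \]

\[ \hat{V}(\bar{y}_c) = \left(\frac{N-n}{N}\right)\frac{1}{nm}\,MSB \]

For a random sample of size mn,

\[ \hat{V}(\bar{y}) = \left(\frac{Nm-nm}{Nm}\right)\frac{s^2}{nm} = \left(\frac{N-n}{N}\right)\frac{s^2}{nm} \]

\[ \hat{s}^2 \approx \frac{1}{m}\left[(m-1)\,MSW + MSB\right] \]

\[ \widehat{RE}(\bar{y}_c/\bar{y}) = \frac{\hat{s}^2}{MSB} \]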
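The estimated relative efficiency can be computed directly from the sampled clusters. The sketch below assumes equal cluster sizes with every element of each sampled cluster observed; the function name and the two tiny clusters are hypothetical.

```python
def relative_efficiency(clusters):
    """Estimated RE of cluster sampling to SRS, equal cluster sizes.

    clusters: n lists, each holding the m observations from one sampled cluster
    """
    n = len(clusters)
    m = len(clusters[0])
    means = [sum(c) / m for c in clusters]
    ybar_c = sum(means) / n
    msb = m / (n - 1) * sum((yb - ybar_c) ** 2 for yb in means)
    msw = sum((y - yb) ** 2 for c, yb in zip(clusters, means) for y in c) / (n * (m - 1))
    # Approximate element variance recovered from the two mean squares.
    s2_hat = ((m - 1) * msw + msb) / m
    return s2_hat / msb


# Hypothetical data: 2 clusters of 2 elements, means far apart.
re_hat = relative_efficiency([[1.0, 3.0], [5.0, 7.0]])
```

A value below 1, as in this made-up case where the cluster means differ widely, suggests cluster sampling would be less precise than a simple random sample of the same number of elements.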
