
SAMPLING METHODS: Stratification and Clustering
Richard L. Scheaffer, University of Florida
RLS907@bellsouth.net

Mar 26, 2015


The Sampling Process

• Population

• Sampling Unit

• Sampling Frame

• Sampling Design

• Sample


…and Process Failures

• Sampling errors

• Non-sampling errors*


PSI Example

CEMENT PLANT

                            Avenue
Street    1    2    3    4    5    6    7    8    9   10
     1  121  118  124  123  116  118  120  118  114  122
     2  116  118  118  113  117  116  117  112  112  115
     3  114  107  109  106  112  108  112  110  111  111
     4  105  104  103  101  103  105  104  106  109  107
     5  100  100  101   96   98   96  100  100  105  100
     6   97   95   96   94   96   95   96   97   96   97
     7   92   90   91   89   93   94   93   92   92   90
     8   86   81   85   87   85   85   86   87   83   84
     9   80   78   80   79   77   81   81   79   84   81
    10   76   77   74   77   75   74   80   75   77   74

Display 1: Grid of houses showing PSI measurements


[Box plots, "Sampling Design Comparisons": Sample_Mean (axis 80 to 120) for the designs Cluster_Column, Cluster_Row, Random_Sample, Stratify_Column, and Stratify_Row]

Display 2: Samples of size 10 by various designs
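Display 2 can be approximated by simulation on the Display 1 grid. The slide does not say exactly how each design was carried out, so the sketch below assumes: Cluster_Row takes every house on one randomly chosen street, Cluster_Column every house on one randomly chosen avenue, Random_Sample a simple random sample of 10 houses, Stratify_Row one randomly chosen house from each street, and Stratify_Column one randomly chosen house from each avenue. The function names, the seed, and the number of repetitions are illustrative choices, not part of the original.

```python
import random
from statistics import mean, stdev

# PSI readings from Display 1: rows are Streets 1-10, columns are Avenues 1-10.
PSI = [
    [121, 118, 124, 123, 116, 118, 120, 118, 114, 122],
    [116, 118, 118, 113, 117, 116, 117, 112, 112, 115],
    [114, 107, 109, 106, 112, 108, 112, 110, 111, 111],
    [105, 104, 103, 101, 103, 105, 104, 106, 109, 107],
    [100, 100, 101, 96, 98, 96, 100, 100, 105, 100],
    [97, 95, 96, 94, 96, 95, 96, 97, 96, 97],
    [92, 90, 91, 89, 93, 94, 93, 92, 92, 90],
    [86, 81, 85, 87, 85, 85, 86, 87, 83, 84],
    [80, 78, 80, 79, 77, 81, 81, 79, 84, 81],
    [76, 77, 74, 77, 75, 74, 80, 75, 77, 74],
]

def cluster_row(rng):        # hypothetical: all 10 houses on one random street
    return PSI[rng.randrange(10)]

def cluster_column(rng):     # hypothetical: all 10 houses on one random avenue
    j = rng.randrange(10)
    return [row[j] for row in PSI]

def random_sample(rng):      # simple random sample of 10 of the 100 houses
    return rng.sample([x for row in PSI for x in row], 10)

def stratify_row(rng):       # one random house from each street (stratum)
    return [rng.choice(row) for row in PSI]

def stratify_column(rng):    # one random house from each avenue (stratum)
    return [rng.choice([row[j] for row in PSI]) for j in range(10)]

DESIGNS = {
    "Cluster_Column": cluster_column,
    "Cluster_Row": cluster_row,
    "Random_Sample": random_sample,
    "Stratify_Column": stratify_column,
    "Stratify_Row": stratify_row,
}

def simulate(reps=2000, seed=1):
    """Sample mean of each design, repeated reps times."""
    rng = random.Random(seed)
    return {name: [mean(design(rng)) for _ in range(reps)]
            for name, design in DESIGNS.items()}

results = simulate()
for name, means in results.items():
    print(f"{name:16s} mean={mean(means):7.2f}  sd={stdev(means):6.2f}")
```

Because PSI falls steadily from street to street but varies little along a street, whole streets make poor clusters and good strata; the printed standard deviations should show Cluster_Row far more variable than Stratify_Row.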


SIMPLE RANDOM SAMPLING

The observations $y_1, y_2, \ldots, y_n$ are to be sampled from a population of size N with mean $\mu$ and standard deviation $\sigma$ in such a way that every possible sample of size n has an equal chance of being selected. If the sample mean is denoted by $\bar{y}$, it follows that

$$E(\bar{y}) = \mu$$

and

$$V(\bar{y}) = \frac{\sigma^2}{n}\left(\frac{N-n}{N-1}\right)$$


For the sample variance $s^2$, it can be shown that

$$E(s^2) = \frac{N}{N-1}\,\sigma^2 = S^2$$

and, thus, that an unbiased estimator of the variance of the sample mean is given by

$$\hat{V}(\bar{y}) = \frac{s^2}{n}\left(\frac{N-n}{N}\right) = \frac{s^2}{n}(1-f)$$

where $f = n/N$ is the sampling fraction.
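These results can be checked by listing every possible simple random sample from a small population. The sketch below uses a hypothetical population of five values and exact fractions, so each identity can be tested with == rather than approximately; the helper names mean and sample_var are illustrative.

```python
from fractions import Fraction
from itertools import combinations

def mean(xs):
    xs = list(xs)
    return Fraction(sum(xs)) / len(xs)

def sample_var(xs):
    """s^2 with divisor n - 1."""
    m = mean(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

population = [3, 7, 8, 12, 15]   # hypothetical population, N = 5
N, n = len(population), 2
f = Fraction(n, N)

mu = mean(population)
sigma2 = sum((y - mu) ** 2 for y in population) / N    # divisor N
S2 = Fraction(N, N - 1) * sigma2                       # S^2 = N sigma^2 / (N - 1)

samples = list(combinations(population, n))            # every SRS, equally likely
ybars = [mean(s) for s in samples]
E_ybar = mean(ybars)
V_ybar = mean((yb - E_ybar) ** 2 for yb in ybars)
E_s2 = mean(sample_var(s) for s in samples)
E_vhat = mean(sample_var(s) / n * (1 - f) for s in samples)

print(E_ybar == mu,                                      # E(ybar) = mu
      V_ybar == sigma2 / n * Fraction(N - n, N - 1),     # V(ybar) formula
      E_s2 == S2,                                        # E(s^2) = S^2
      E_vhat == V_ybar)                                  # V-hat is unbiased
```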


STRATIFIED RANDOM SAMPLING

Reasons for using a stratified sampling design:

• Convenience (administrative efficiency)
• Estimates are desired for each of the strata
• Reduced variation (statistical efficiency)


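Let $\bar{y}_i$ denote the sample mean for the simple random sample selected from stratum i, $n_i$ the sample size for stratum i, $\mu_i$ the population mean for stratum i, and $N_i$ the size of stratum i, for i = 1, ..., L, with $N = N_1 + \cdots + N_L$ and $W_i = N_i/N$. The stratified estimator of the population mean and its estimated variance are

$$\bar{y}_{st} = \frac{1}{N}\sum_{i=1}^{L} N_i \bar{y}_i = \sum_{i=1}^{L} W_i \bar{y}_i$$

$$\hat{V}(\bar{y}_{st}) = \frac{1}{N^2}\sum_{i=1}^{L} N_i^2 \left(\frac{N_i - n_i}{N_i}\right)\frac{s_i^2}{n_i}$$

where $s_i^2$ is the sample variance within stratum i.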
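The two formulas above translate directly into code. This is a minimal sketch; stratified_estimate is a hypothetical helper name and the data are made up. It also checks that with a single stratum the variance reduces to the simple random sampling formula $(1-f)s^2/n$.

```python
import math
from statistics import mean, variance

def stratified_estimate(samples, sizes):
    """Stratified mean and its estimated variance.

    samples: one list of observations per stratum (an SRS with n_i >= 2)
    sizes:   the stratum sizes N_i, in the same order
    """
    N = sum(sizes)
    ybar_st = sum(Ni * mean(s) for Ni, s in zip(sizes, samples)) / N
    v_hat = sum(
        Ni ** 2 * ((Ni - len(s)) / Ni) * variance(s) / len(s)   # variance() uses n_i - 1
        for Ni, s in zip(sizes, samples)
    ) / N ** 2
    return ybar_st, v_hat

# Hypothetical data: two strata of sizes 30 and 20.
ybar_st, v_hat = stratified_estimate([[2, 4, 6], [10, 14]], [30, 20])
print(ybar_st, v_hat)

# With one stratum the result should match the SRS estimator (1 - f) s^2 / n.
srs_check = math.isclose(stratified_estimate([[3, 7, 8]], [50])[1],
                         (1 - 3 / 50) * variance([3, 7, 8]) / 3)
```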


Allocation

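From a total sample size of n, how many observations should be allocated to stratum i? The allocation across strata may be chosen to minimize the variance of $\bar{y}_{st}$ for a fixed total sample size, which gives optimal (Neyman) allocation, where $\sigma_i$ is the standard deviation within stratum i:

$$n_i = n\,\frac{N_i \sigma_i}{\sum_j N_j \sigma_j}$$

Proportional Allocation

$$n_i = n\,\frac{N_i}{\sum_j N_j} = n\,\frac{N_i}{N}$$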
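As a worked example, the two rules can be applied to the GPA class used later in the deck (Figure 3: 45 F with standard deviation 0.436, 15 M with standard deviation 0.698) with n = 20. The sketch treats those sample standard deviations as stand-ins for $\sigma_i$, and the function names are hypothetical. Rounding to whole students is a simplification; in general rounded allocations need not add up to n.

```python
def neyman_allocation(n, sizes, sds):
    """n_i = n * N_i sigma_i / sum_j N_j sigma_j."""
    weights = [Ni * si for Ni, si in zip(sizes, sds)]
    total = sum(weights)
    return [n * w / total for w in weights]

def proportional_allocation(n, sizes):
    """n_i = n * N_i / N."""
    N = sum(sizes)
    return [n * Ni / N for Ni in sizes]

sizes = [45, 15]        # F, M counts from Figure 3
sds = [0.436, 0.698]    # F, M standard deviations from Figure 3, standing in for sigma_i

prop = proportional_allocation(20, sizes)
ney = neyman_allocation(20, sizes, sds)
print(prop, [round(x) for x in ney])
```

Proportional allocation gives 15 and 5, the split used for Figure 4; Neyman allocation shifts one or two observations toward the more variable M stratum.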


Comparison to SRS

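The following comparisons apply when the $N_i$ are all relatively large. (Here $W_i = N_i/N$, and $V_{ran}$, $V_{prop}$, $V_{opt}$ are the variances of the estimated mean under simple random sampling, proportional allocation, and optimal allocation with the same total n.)

$$V_{ran} - V_{prop} = \frac{1-f}{n}\sum_i W_i(\mu_i - \mu)^2$$

and

$$V_{prop} - V_{opt} = \frac{1}{n}\sum_i W_i(S_i - \bar{S})^2$$

where $\bar{S} = \sum_i W_i S_i$. Both right-hand sides are sums of squares, so proportional allocation is never worse than SRS, and optimal allocation is never worse than proportional allocation.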
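The sketch below computes the three variances from their large-$N_i$ approximations ($V_{prop} = \frac{1-f}{n}\sum W_i S_i^2$, $V_{opt} = \bar{S}^2/n - \sum W_i S_i^2/N$, and $S^2 \approx \sum W_i S_i^2 + \sum W_i(\mu_i-\mu)^2$) and confirms the second identity numerically. It assumes the GPA gender strata of Figure 3, with the reported standard deviations standing in for $S_i$; allocation_variances is a hypothetical helper.

```python
import math

def allocation_variances(W, mu_i, S_i, n, N):
    """Large-N_i approximations to V_ran, V_prop and V_opt for a total sample of n."""
    f = n / N
    mu = sum(w * m for w, m in zip(W, mu_i))
    within = sum(w * s ** 2 for w, s in zip(W, S_i))            # sum of W_i S_i^2
    between = sum(w * (m - mu) ** 2 for w, m in zip(W, mu_i))   # sum of W_i (mu_i - mu)^2
    S_bar = sum(w * s for w, s in zip(W, S_i))
    v_ran = (1 - f) / n * (within + between)    # S^2 is about within + between
    v_prop = (1 - f) / n * within
    v_opt = S_bar ** 2 / n - within / N         # Neyman allocation
    return v_ran, v_prop, v_opt, S_bar

# GPA example (Figure 3): F and M strata; sample SDs stand in for S_i.
W, mu_i, S_i = [45 / 60, 15 / 60], [3.39, 2.92], [0.436, 0.698]
n, N = 20, 60
v_ran, v_prop, v_opt, S_bar = allocation_variances(W, mu_i, S_i, n, N)

gap_prop_opt = sum(w * (s - S_bar) ** 2 for w, s in zip(W, S_i)) / n
assert math.isclose(v_prop - v_opt, gap_prop_opt)
print(math.sqrt(v_ran), math.sqrt(v_prop), math.sqrt(v_opt))
```

The square roots for SRS and proportional allocation come out near 0.101 and 0.094, in the same range as the simulated 0.105 (Figure 2) and 0.091 (Figure 4).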


CLUSTER SAMPLING

Commonly used when:
• A frame for elements is relatively difficult to obtain
• A frame for clusters of elements is relatively easy to obtain

Examples:
• Classrooms versus students
• City blocks versus residents
• Cartons of items stored in a warehouse versus individual items


Examples of Clusters

• Polls (home or workplace)

• Crop yield surveys (trees, corn, sugar cane)

• Animal studies (traps, colonies)

• Systematic sample


Single-stage cluster sample:

Select a simple random sample of clusters and then measure each element within the sampled clusters.

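N = number of clusters in the population
n = number of clusters selected in a simple random sample
$m_i$ = number of elements in cluster i, i = 1, ..., N
$\bar{m} = \frac{1}{n}\sum_{i=1}^{n} m_i$ = average cluster size for the sample
$y_i$ = total of all observations in the ith cluster

$$\bar{y} = \frac{\sum_{i=1}^{n} y_i}{\sum_{i=1}^{n} m_i} = \frac{\bar{y}_t}{\bar{m}}$$

where $\bar{y}_t = \frac{1}{n}\sum_{i=1}^{n} y_i$ is the average cluster total in the sample.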


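The estimated variance of $\bar{y}$ is

$$\hat{V}(\bar{y}) = \left(\frac{N-n}{N}\right)\frac{1}{n\bar{m}^2}\,s_r^2$$

where

$$s_r^2 = \frac{\sum_{i=1}^{n}(y_i - \bar{y}\,m_i)^2}{n-1}$$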
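A minimal sketch of the single-stage estimator and its estimated variance, written directly from the two formulas above. The helper name cluster_ratio_estimate and the three-cluster data set are hypothetical.

```python
def cluster_ratio_estimate(totals, sizes, N):
    """Single-stage cluster sample: estimated mean per element and its variance.

    totals: y_i, the total of the observations in each sampled cluster
    sizes:  m_i, the number of elements in each sampled cluster
    N:      number of clusters in the population
    """
    n = len(totals)
    ybar = sum(totals) / sum(sizes)                 # ratio of sample totals
    mbar = sum(sizes) / n                           # average sampled cluster size
    sr2 = sum((y - ybar * m) ** 2 for y, m in zip(totals, sizes)) / (n - 1)
    v_hat = (N - n) / N * sr2 / (n * mbar ** 2)
    return ybar, v_hat

# Hypothetical sample: 3 of N = 30 clusters, with totals y_i and sizes m_i.
ybar, v_hat = cluster_ratio_estimate(totals=[1, 2, 3], sizes=[2, 4, 4], N=30)
print(ybar, v_hat)
```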


Rented Residences Scatter Plot
[Scatter plot: Renters_y (vertical axis) versus Residences_m (horizontal axis, 0 to 14), with fitted line Renters_y = 0.488 Residences_m]


Two-stage cluster sampling with equal cluster sizes

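$$\hat{\mu} = \frac{N}{M}\left(\frac{\sum_{i=1}^{n} M_i \bar{y}_i}{n}\right) = \frac{1}{\bar{M}}\left(\frac{\sum_{i=1}^{n} M_i \bar{y}_i}{n}\right)$$

where $M_i$ is the number of elements in cluster i, $M = M_1 + \cdots + M_N$ is the number of elements in the population, $\bar{M} = M/N$, and $\bar{y}_i$ is the mean of the elements subsampled from cluster i. When every cluster has the same size ($M_i = \bar{M}$), $\hat{\mu}$ is simply the average of the sampled cluster means.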


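Two-stage Cluster Sampling with Equal Cluster Sizes

$$\hat{V}(\hat{\mu}) = (1-f_1)\frac{MSB}{nm} + (1-f_2)\frac{1}{N}\,\frac{MSW}{m}$$

where $f_1 = n/N$, $f_2 = m/M$ (m elements are subsampled from each sampled cluster; M here denotes the common cluster size),

$$MSB = \frac{m}{n-1}\sum_{i=1}^{n}(\bar{y}_i - \hat{\mu})^2$$

and

$$MSW = \frac{1}{n(m-1)}\sum_{i=1}^{n}\sum_{j=1}^{m}(y_{ij} - \bar{y}_i)^2 = \frac{1}{n}\sum_{i=1}^{n} s_i^2$$

with $s_i^2$ the sample variance within cluster i.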


1. If N is large, $\hat{V}(\hat{\mu}) \approx MSB/(nm)$ and depends only on the cluster means.

2. If m = M (or $f_2 = 1$), then two-stage cluster sampling reduces to one-stage cluster sampling.

3. If n = N, then two-stage cluster sampling becomes stratified random sampling with N strata and m observations from each.
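Remarks 2 and 3 can be checked numerically by coding the variance formulas from the preceding slides side by side. In this sketch two_stage_estimate, one_stage_estimate, and stratified_variance are hypothetical helpers that transcribe those formulas, and the data are made up: three clusters of M = 4 elements.

```python
import math
from statistics import mean, variance

def two_stage_estimate(clusters, N, M):
    """Two-stage cluster sample with equal cluster sizes.

    clusters: n subsamples, each a list of m observations from one sampled cluster
    N: number of clusters in the population; M: number of elements in every cluster
    """
    n, m = len(clusters), len(clusters[0])
    ybar = [mean(c) for c in clusters]              # cluster sample means
    mu_hat = mean(ybar)                             # equal sizes: average of cluster means
    msb = m * sum((yb - mu_hat) ** 2 for yb in ybar) / (n - 1)
    msw = mean(variance(c) for c in clusters)       # (1/n) * sum of s_i^2
    f1, f2 = n / N, m / M
    v_hat = (1 - f1) * msb / (n * m) + (1 - f2) * (1 / N) * msw / m
    return mu_hat, v_hat

def one_stage_estimate(totals, sizes, N):
    """Single-stage cluster sample: ratio estimator and its estimated variance."""
    n = len(totals)
    ybar = sum(totals) / sum(sizes)
    mbar = sum(sizes) / n
    sr2 = sum((y - ybar * mi) ** 2 for y, mi in zip(totals, sizes)) / (n - 1)
    return ybar, (N - n) / N * sr2 / (n * mbar ** 2)

def stratified_variance(samples, sizes):
    """Estimated variance of the stratified mean (one SRS per stratum)."""
    total = sum(sizes)
    return sum(Ni ** 2 * ((Ni - len(s)) / Ni) * variance(s) / len(s)
               for s, Ni in zip(samples, sizes)) / total ** 2

# Hypothetical data: three sampled clusters, each containing M = 4 elements.
data = [[1, 3, 4, 6], [5, 7, 6, 9], [2, 4, 3, 5]]

# Remark 2: measure every element (m = M, so f2 = 1) -> one-stage result.
mu2, v2 = two_stage_estimate(data, N=10, M=4)
mu1, v1 = one_stage_estimate([sum(c) for c in data], [len(c) for c in data], N=10)
remark2_holds = math.isclose(mu1, mu2) and math.isclose(v1, v2)

# Remark 3: sample every cluster (n = N = 3) with m = 2 -> stratified, N strata.
sub = [c[:2] for c in data]
_, v3 = two_stage_estimate(sub, N=3, M=4)
remark3_holds = math.isclose(v3, stratified_variance(sub, [4, 4, 4]))

print(remark2_holds, remark3_holds)
```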


FIGURE 1: Distribution of class GPAs
[Histogram of GPA (2.00 to 4.00): N = 60, mean 3.27, standard deviation 0.55]


FIGURE 2: Distribution of sample means from simple random sampling
[Histogram of sample means, simple random samples of size 20: mean 3.27, standard deviation 0.105]


FIGURE 3: GPAs by gender
[Box plots of GPA (2.0 to 4.0) by Gender]

Gender   Count   Mean   Standard Deviation
F           45   3.39   0.436
M           15   2.92   0.698


FIGURE 4: Distribution of sample means from stratified random sampling
[Histogram of sample means, stratified random samples with n = 20 (n1 = 15, n2 = 5): mean 3.27, standard deviation 0.091]


FIGURE 5: Distribution of sample means from cluster sampling; ordered clusters
[Histogram of sample means (MeanOrd), n = 4 clusters of m = 5: mean 3.30, standard deviation 0.232]


FIGURE 6: Distribution of sample means from cluster sampling; random clusters
[Histogram of sample means (MeanRan): mean 3.30, standard deviation 0.089]


FIGURE 7: Distribution of sample means from cluster sampling; systematic clusters
[Histogram of sample means: mean 3.27, standard deviation 0.055]


Summary Chart

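Design                 Mean   Standard Deviation
Population             3.27   0.550
Simple random          3.27   0.105
Stratified (gender)    3.27   0.091
Cluster-ordered        3.30   0.232
Cluster-random         3.30   0.089
Cluster-systematic     3.27   0.055

(For the population row the standard deviation is that of individual GPAs; for each sampling design it is the standard deviation of the sample means.)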
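One way to read the chart is as relative efficiency versus simple random sampling, the ratio of variances $V_{SRS}/V_{design}$, the same idea as the RE defined in the extras. The sketch assumes every design samples 20 students, as stated for the SRS, stratified, and ordered-cluster figures; the dictionary names are illustrative.

```python
# Standard deviations of the sample mean, read from the Summary Chart.
SD = {
    "Simple random": 0.105,
    "Stratified (gender)": 0.091,
    "Cluster-ordered": 0.232,
    "Cluster-random": 0.089,
    "Cluster-systematic": 0.055,
}

# Relative efficiency versus SRS: V_SRS / V_design (values above 1 beat SRS).
rel_eff = {name: (SD["Simple random"] / sd) ** 2 for name, sd in SD.items()}
for name, value in rel_eff.items():
    print(f"{name:20s} {value:5.2f}")
```

Ordered clusters (similar students grouped together) come out well below 1, while systematic clusters, each of which spans the range of GPAs, are the most efficient design in the chart.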


Names to Explore

• P. C. Mahalanobis: "The I.S.I. has taken the lead in the original development of the technique of sample surveys, the most potent fact finding process available to the administration." (R. A. Fisher)
• Walter Shewhart
• Jerzy Neyman
• William Cochran
• W. Edwards Deming
• Warren Mitofsky


References

• Groves, Robert; Dillman, Donald; Eltinge, John; and Little, Roderick, editors. 2002. Survey Nonresponse. New York: Wiley.

• Lohr, Sharon. 1999. Sampling: Design and Analysis. Pacific Grove, CA: Brooks/Cole.

• Scheaffer, Richard; Mendenhall, William; and Ott, R. Lyman. 1996. Elementary Survey Sampling, 5th ed. Belmont, CA: Duxbury Press.


EXTRAS!


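Equal cluster sizes: comparison to SRS

$$MSB = \frac{SSB}{n-1} = \frac{m}{n-1}\sum_{i=1}^{n}(\bar{y}_i - \bar{y}_c)^2$$

$$MSW = \frac{SSW}{n(m-1)} = \frac{1}{n(m-1)}\sum_{i=1}^{n}\sum_{j=1}^{m}(y_{ij} - \bar{y}_i)^2$$

$$\hat{V}(\bar{y}_c) = \left(\frac{N-n}{N}\right)\frac{1}{nm}\,MSB$$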


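For a random sample of size mn,

$$\hat{V}(\bar{y}) = \left(\frac{Nm - nm}{Nm}\right)\frac{s^2}{nm} = \left(\frac{N-n}{N}\right)\frac{s^2}{nm}$$

where, from the cluster sample,

$$\hat{s}^2 \approx \frac{1}{m}\left[(m-1)\,MSW + MSB\right]$$

The estimated relative efficiency of the cluster mean to the SRS mean is the ratio of the two variances:

$$\widehat{RE}(\bar{y}_c/\bar{y}) = \frac{\hat{s}^2}{MSB}$$

Values above 1 favor the cluster sample.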
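The sketch below applies these formulas to an equal-size one-stage cluster sample. The helper cluster_vs_srs and both data sets are hypothetical: the values 1 to 9 grouped once so that similar values share a cluster and once so that each cluster spans the range.

```python
from statistics import mean

def cluster_vs_srs(clusters):
    """Equal-size one-stage cluster sample.

    Returns MSB, MSW, the approximate element variance s2_hat, and the
    estimated relative efficiency RE(ybar_c / ybar) = s2_hat / MSB.
    """
    n, m = len(clusters), len(clusters[0])
    ybar_i = [mean(c) for c in clusters]
    ybar_c = mean(ybar_i)
    msb = m * sum((yb - ybar_c) ** 2 for yb in ybar_i) / (n - 1)
    msw = sum((y - yb) ** 2 for c, yb in zip(clusters, ybar_i) for y in c) / (n * (m - 1))
    s2_hat = ((m - 1) * msw + msb) / m
    return msb, msw, s2_hat, s2_hat / msb

# Hypothetical data: n = 3 clusters of m = 3 elements.
homogeneous = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]    # similar values share a cluster
heterogeneous = [[1, 4, 7], [2, 5, 8], [3, 6, 9]]  # each cluster spans the range

re_homog = cluster_vs_srs(homogeneous)[3]
re_heter = cluster_vs_srs(heterogeneous)[3]
print(round(re_homog, 3), round(re_heter, 3))
```

The homogeneous grouping gives an RE below 1 and the heterogeneous grouping an RE above 1, which mirrors the ordered versus systematic GPA clusters in the Summary Chart.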