3.2. Systematic sampling plan Jiahua,Chen Week3b

Apr 13, 2019


LeKhuong

3.2. Systematic sampling plan

Jiahua,Chen Week3b


Suppose the population consists of $N = nk$ units.

The exact factorization is assumed only to simplify the presentation.

If the population size cannot be factorized exactly, we will use some ad hoc remedies.


Suppose further that the sampling units have been lined up.

This is the case for the name list of a class, or the list of employees in a large company.

If the customers of a store on a particular day are to be sampled, the order in which they enter the store serves as the line-up.

Another perfect example is offered by dealing cards in a card game.


The systematic sampling plan randomly selects the first unit from the set of units $\{1, 2, \ldots, k\}$.

Once the first unit is chosen, we sample every $k$th unit after it in the population.

If we do not have an exact factorization $N = nk$, some ad hoc steps will be used.
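
The selection step can be sketched in a few lines of Python. This is a minimal illustration that assumes the exact factorization $N = nk$; the function name `systematic_sample` and the use of the standard `random` module are choices made here, not part of the original slides.

```python
import random

def systematic_sample(N, k, rng=random):
    """Return the 1-based labels of a systematic sample from units 1..N.

    Assumes N is an exact multiple of k (N = n * k); the ad hoc remedies
    for other cases are not covered by this sketch.
    """
    if N % k != 0:
        raise ValueError("this sketch assumes N = n * k exactly")
    start = rng.randint(1, k)            # random first unit from {1, ..., k}
    return list(range(start, N + 1, k))  # then every k-th unit after it

# Example: N = 54 units, sampling interval k = 6, hence n = 9 units.
sample = systematic_sample(54, 6, random.Random(0))
print(sample)
```

Passing a seeded `random.Random` instance makes the draw reproducible; whichever start is drawn, the sample always has $n = N/k$ units spaced $k$ apart.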


Let

$A_j = \{j, j + k, j + 2k, \ldots, j + (n-1)k\}$

for $j = 1, 2, \ldots, k$ be $k$ subsets of the population $P = \{1, 2, \ldots, N\}$.

The systematic sampling plan clearly places equal probability on them.

That is, it selects one of $A_j$, $j = 1, 2, \ldots, k$, with equal probability $1/k$.

It is helpful to recall here the abstract mathematical definition of a probability sampling plan.

Let the response variable be called y as usual.


Suppose a systematic sampling plan has been implemented.

Suppose $A_j$ is chosen (sampled) for some $j$, and the response values $y_i$, $i \in A_j$, are observed.

The resulting sample mean is then

$$\bar{y}_{sys} = n^{-1} \sum_{i \in A_j} y_i = n^{-1} \sum_{i=1}^{n} y_{j+(i-1)k}.$$


Let the following matrix represent a population with sampling units labeled $1, 2, \ldots, 54$.

1 7 13 19 25 31 37 43 49

2 8 14 20 26 32 38 44 50

3 9 15 21 27 33 39 45 51

4 10 16 22 28 34 40 46 52

5 11 17 23 29 35 41 47 53

6 12 18 24 30 36 42 48 54

We see that $N = 54 = 6 \times 9$.


A systematic sampling plan to get a sample of size $n = 9$ is to randomly select one of the rows of the following matrix.

1 7 13 19 25 31 37 43 49

2 8 14 20 26 32 38 44 50

3 9 15 21 27 33 39 45 51

4 10 16 22 28 34 40 46 52

5 11 17 23 29 35 41 47 53

6 12 18 24 30 36 42 48 54

If you play cards in a group of 6 (including yourself) and deal 54 cards evenly, one card at a time in turn, then each player obtains a sample of 9 cards according to a systematic sampling plan.


The abstract notation for the sample

$A_j = \{j, j + k, j + 2k, \ldots, j + (n-1)k\}$

with $n = 9$, $k = 6$ and $N = 54$ becomes, for $j = 2$,

$A_2 = \{2, 8, 14, 20, 26, 32, 38, 44, 50\}$.


Suppose the response values in the population, arranged in the same layout as the unit labels, are given by

0.823 0.269 0.392 0.282 0.075 0.452 0.619 0.588 0.514

0.532 0.733 0.367 0.346 0.800 0.903 0.630 0.111 0.804

0.674 0.628 0.501 0.190 0.016 0.656 0.554 0.138 0.620

0.009 0.046 0.478 0.228 0.401 0.630 0.392 0.753 0.440

0.992 0.270 0.147 0.140 0.045 0.324 0.664 0.205 0.386

0.503 0.687 0.050 0.427 0.077 0.924 0.992 0.993 0.074

When $A_2$ is the outcome of the systematic sampling plan, the observed $y$-values will be

$\{y_i\} = \{0.532, 0.733, 0.367, 0.346, 0.800, 0.903, 0.630, 0.111, 0.804\}$.

The sample mean and variance are given by

$\bar{y}_{sys} = 0.581; \quad s^2_{sys} = 0.0689$.
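
These two numbers can be reproduced with Python's standard `statistics` module. The sketch below only re-enters the nine values listed for $A_2$; the variable names are illustrative.

```python
from statistics import mean, variance

# y-values observed when A_2 is selected (copied from the slide)
y_A2 = [0.532, 0.733, 0.367, 0.346, 0.800, 0.903, 0.630, 0.111, 0.804]

y_bar_sys = mean(y_A2)    # sample mean
s2_sys = variance(y_A2)   # sample variance with divisor n - 1

print(round(y_bar_sys, 3), round(s2_sys, 4))
```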


It is easy to check that the six possible sample means are

0.446 0.581 0.442 0.375 0.353 0.525

The average of these six sample means is 0.4535926, which is exactly the population mean $\bar{Y}$.

In statistical terminology, what property does $\bar{y}_{sys}$ have?


The variance of the sample mean is given by

$$\frac{1}{6}\{0.446^2 + 0.581^2 + 0.442^2 + 0.375^2 + 0.353^2 + 0.525^2\} - 0.4535926^2 = 0.006303941.$$

Under SRSWOR, the sample mean would have variance

$$(1 - 9/54) \times S^2/9 = 0.007468412.$$

The difference between $\mathrm{Var}(\bar{y}_{sys})$ and $\mathrm{Var}(\bar{y}_{srswor})$ is not due to round-off error.
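
The comparison can be checked numerically from the 54 response values. The following is a sketch using only the standard library; it assumes that row $j$ of the data matrix holds the values of the sample $A_j$, and the variable names are illustrative.

```python
from statistics import mean, pvariance, variance

# Response values from the slide; row j holds the y-values of sample A_j.
rows = [
    [0.823, 0.269, 0.392, 0.282, 0.075, 0.452, 0.619, 0.588, 0.514],
    [0.532, 0.733, 0.367, 0.346, 0.800, 0.903, 0.630, 0.111, 0.804],
    [0.674, 0.628, 0.501, 0.190, 0.016, 0.656, 0.554, 0.138, 0.620],
    [0.009, 0.046, 0.478, 0.228, 0.401, 0.630, 0.392, 0.753, 0.440],
    [0.992, 0.270, 0.147, 0.140, 0.045, 0.324, 0.664, 0.205, 0.386],
    [0.503, 0.687, 0.050, 0.427, 0.077, 0.924, 0.992, 0.993, 0.074],
]
k, n = len(rows), len(rows[0])
N = n * k
population = [y for row in rows for y in row]

sample_means = [mean(row) for row in rows]  # the k possible values of y-bar_sys
pop_mean = mean(population)                 # population mean Y-bar

# Each A_j has probability 1/k, so Var(y-bar_sys) is the variance with
# divisor k of the k possible sample means.
var_sys = pvariance(sample_means)

S2 = variance(population)                   # population variance, divisor N - 1
var_srswor = (1 - n / N) * S2 / n

print([round(m, 3) for m in sample_means])
print(round(pop_mean, 7), round(var_sys, 9), round(var_srswor, 9))
```

Working from the unrounded sample means rather than the three-decimal values avoids any doubt about rounding when the two variances are compared.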


We can show that

(1) $\bar{y}_{sys}$ is unbiased for $\bar{Y}$.

(2) $\mathrm{Var}(\bar{y}_{sys}) \neq \mathrm{Var}(\bar{y}_{srswor})$ in general (even if both have sample size $n$).


Let us denote

$$\bar{y}_j = n^{-1} \sum_{i=1}^{n} y_{j+(i-1)k}$$

for the sample mean when $A_j$ is chosen.

Averaging over all possible $\bar{y}_{sys}$ values, we have

$$E(\bar{y}_{sys}) = k^{-1} \sum_{j=1}^{k} \bar{y}_j = \bar{Y}.$$

That is, it is an unbiased estimator.


The variance of $\bar{y}_{sys}$ is the average squared distance of its possible values from $\bar{Y}$:

$$\mathrm{Var}(\bar{y}_{sys}) = k^{-1} \sum_{j=1}^{k} (\bar{y}_j - \bar{Y})^2.$$

It cannot be expressed as a simple function of the population variance $S^2$.

The relationship will be given later.


Comparison with SRSWOR

Why do we introduce the systematic sampling plan?

It is apparently easier to implement.

Everyone who has ever played cards knows it.


Is the systematic sampling plan statistically superior?

First, systematic sampling is ideal when we can arrange for each $\bar{y}_j \approx \bar{Y}$.

In this case, there may be variation among the $y$-values within each sample, but there is almost no variation between the sample means $\bar{y}_j$.

Thus, $\bar{y}_{sys}$ has much lower variance than $\bar{y}$ under SRSWOR.

The pity is that we probably never know when this actually happens.


Is the systematic sampling plan statistically superior? (continued)

When the response values show a linear trend along the line-up, $\bar{y}_{sys}$ may have lower variance than $\bar{y}$ under SRSWOR.

When the response values show a cyclic trend along the line-up, use systematic sampling with a big dose of caution: it performs badly if the sampling interval $k$ matches the cycle of the population.

When the population is lined up in random order, the two sampling plans are practically the same.

Again, anyone who has ever played cards should agree.
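
The first two cases can be illustrated with two small made-up populations. Both are hypothetical and are not taken from the slides: one has a steady linear trend, the other has a cycle whose period equals the sampling interval $k$. The helper names `var_sys` and `var_srswor` are also illustrative. The sketch computes both variances exactly rather than by simulation.

```python
from statistics import mean, pvariance, variance

def var_sys(y, k):
    """Exact variance of the systematic sample mean for a lined-up population y."""
    # Sample A_j takes every k-th value starting at position j (0-based here).
    return pvariance([mean(y[j::k]) for j in range(k)])

def var_srswor(y, n):
    """Variance of the sample mean under SRSWOR with sample size n."""
    N = len(y)
    return (1 - n / N) * variance(y) / n

N, k = 54, 6
n = N // k

# Hypothetical populations, chosen only to illustrate the two cases.
linear = [float(i) for i in range(1, N + 1)]             # steady linear trend
cyclic = [1.0 if i % k == 0 else 0.0 for i in range(N)]  # period equal to k

for name, y in [("linear", linear), ("cyclic", cyclic)]:
    print(name, round(var_sys(y, k), 4), round(var_srswor(y, n), 4))
```

In the linear case every systematic sample spans the whole range of the population, so the sample means differ little. In the cyclic case every unit in a given sample sits at the same point of the cycle, so the sample means differ as much as they possibly can.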


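Decomposition of the population variation:

Let $s_j^2 = (n-1)^{-1} \sum_{i \in A_j} (y_i - \bar{y}_j)^2$ be the sample variance within $A_j$. The average of the $s_j^2$ is the within-sample variance.

Let $S^2$ be the population variance, and define

$$S^2_{wsys} = k^{-1} \sum_{j=1}^{k} s_j^2.$$

We have

$$\mathrm{Var}(\bar{y}_{sys}) = \Bigl(1 - \frac{1}{N}\Bigr) S^2 - \Bigl(1 - \frac{1}{n}\Bigr) S^2_{wsys}.$$

This leads to the variance comparison

$$\mathrm{Var}(\bar{y}_{SRSWOR}) - \mathrm{Var}(\bar{y}_{sys}) = \Bigl(1 - \frac{1}{n}\Bigr)(S^2_{wsys} - S^2).$$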
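
One way to see where these identities come from is the usual analysis-of-variance split of the total sum of squares. The following is a sketch in the notation of this section, not necessarily the derivation intended in the lecture. It uses $N = nk$, the fact that the $A_j$ partition the population, and $\mathrm{Var}(\bar{y}_{SRSWOR}) = (1 - n/N) S^2 / n$.

```latex
\begin{align*}
(N-1)\,S^2
  &= \sum_{j=1}^{k} \sum_{i \in A_j} (y_i - \bar{Y})^2 \\
  &= \sum_{j=1}^{k} \sum_{i \in A_j} (y_i - \bar{y}_j)^2
     + n \sum_{j=1}^{k} (\bar{y}_j - \bar{Y})^2 \\
  &= k(n-1)\,S^2_{wsys} + nk\,\mathrm{Var}(\bar{y}_{sys}).
\end{align*}
% Dividing by N = nk and rearranging:
\begin{align*}
\mathrm{Var}(\bar{y}_{sys})
  &= \Bigl(1 - \frac{1}{N}\Bigr) S^2 - \Bigl(1 - \frac{1}{n}\Bigr) S^2_{wsys}, \\
\mathrm{Var}(\bar{y}_{SRSWOR}) - \mathrm{Var}(\bar{y}_{sys})
  &= \Bigl(\frac{1}{n} - \frac{1}{N}\Bigr) S^2
     - \Bigl(1 - \frac{1}{N}\Bigr) S^2
     + \Bigl(1 - \frac{1}{n}\Bigr) S^2_{wsys} \\
  &= \Bigl(1 - \frac{1}{n}\Bigr)\bigl(S^2_{wsys} - S^2\bigr).
\end{align*}
```

So systematic sampling beats SRSWOR exactly when the within-sample variance $S^2_{wsys}$ exceeds the population variance $S^2$, that is, when each systematic sample is at least as heterogeneous as the population as a whole.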


Other properties of the systematic sampling plan

The inclusion probability of each unit is

$\pi_i = n/N = 1/k$

when the exact factorization $N = nk$ holds.

The joint inclusion probability of two distinct units $i$, $j$ is

(1) $\pi_{i,j} = 1/k$ when $i - j$ is a multiple of $k$;

(2) $\pi_{i,j} = 0$ when $i - j$ is not a multiple of $k$.
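
These inclusion probabilities can be confirmed by brute-force enumeration of the $k$ possible samples. The sketch below uses the example values $n = 9$, $k = 6$ and exact fractions; the helper `inclusion` is a name introduced here for illustration.

```python
from fractions import Fraction
from itertools import combinations

n, k = 9, 6
N = n * k
samples = [set(range(j, N + 1, k)) for j in range(1, k + 1)]  # A_1, ..., A_k
p = Fraction(1, k)  # each A_j is selected with probability 1/k

def inclusion(*units):
    """Probability that all the given units appear together in the sample."""
    return sum(p for A in samples if all(u in A for u in units))

# First-order inclusion probabilities: every unit has probability 1/k.
assert all(inclusion(i) == Fraction(1, k) for i in range(1, N + 1))

# Joint inclusion probabilities: 1/k if i - j is a multiple of k, else 0.
for i, j in combinations(range(1, N + 1), 2):
    expected = Fraction(1, k) if (i - j) % k == 0 else 0
    assert inclusion(i, j) == expected

print(inclusion(2, 8), inclusion(2, 9))
```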


Because $\pi_{i,j} = 0$ when $i - j$ is not a multiple of $k$, there is no statistically sound (unbiased) estimator of $\mathrm{Var}(\bar{y}_{sys})$.

If there are reasonable grounds to believe that the population units are in “random order”, we may carry out the data analysis by regarding the sample as obtained via SRSWOR.


Concluding remarks

1. The systematic sampling plan is very practical.

2. The theory for this plan is more complex.

3. In applications, the data analysis based on “SRSWOR” is “wrong” but “reasonable”.
