Mar 27, 2015

Maria Ortega
Page 1: 1 Sampling designs for spawning data on The Middle Fork Salmon River *a lot like the Middle Fork Salmon R. What sampling design should be used for estimating.

1

Sampling designs for spawning data on the Middle Fork Salmon River

• What sampling design should be used for estimating the number of chinook redds on a river network*?

– Estimation of status: the number of spring chinook redds in the Middle Fork Salmon River in one year.

– Measurement design: not considered here. We assume there is some way to identify and count redds once a crew reaches a location.

*a lot like the Middle Fork Salmon R.

2

The Middle Fork Salmon River

3

Redd data – “the Truth”

[Figure: redd counts by year (1995, 1996, 1997, 1998, 2001, 2002); legend: Index, Other; vertical axis ticks 0, 500, 1000, 1500.]

• The IDFG dataset (Russ Thurow) gives the number of redds counted in the Middle Fork Salmon River via helicopter.

• All spawning reaches were censused each year.

• Sampling was done by helicopter and, where necessary, on foot.

• Six years of data: 1995-1998, 2001, and 2002.

• These data can be considered the truth.

year         1995  1996  1997  1998  2001  2002
Total redds    20    83   424   661  1789  1730

4

Objectives

• Compare several designs to see which estimates the number of redds (and only redds) best

– unbiased designs (estimators)

– “best” determined by

• standard error of the estimator

• coverage probability (how often the 95% confidence interval actually contains the true number of redds)

• cost

– Keep things fair by sampling the same total length of stream: the index covers 976 segments, or 195.2 km of stream.

• Does not imply equal cost

• Although some standard errors can be calculated analytically, the coverage needs to be addressed via simulation.
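The coverage check described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: it assumes simple random sampling without replacement, the usual expansion estimator of the total with a finite population correction, and a normal-approximation interval. The population of per-segment redd counts is made up (the real IDFG data are not reproduced here), and the names `srs_total_estimate` and `coverage` are hypothetical.

```python
import math
import random

def srs_total_estimate(population, n, rng):
    """Draw a simple random sample of size n and estimate the population total."""
    N = len(population)
    sample = rng.sample(population, n)  # sampling without replacement
    mean = sum(sample) / n
    var = sum((y - mean) ** 2 for y in sample) / (n - 1)
    total_hat = N * mean
    # Standard error of the expansion estimator, with finite population correction.
    se = N * math.sqrt((1 - n / N) * var / n)
    return total_hat, se

def coverage(population, n, reps=2000, z=1.96, seed=1):
    """Fraction of simulated confidence intervals that contain the true total."""
    rng = random.Random(seed)
    true_total = sum(population)
    hits = 0
    for _ in range(reps):
        total_hat, se = srs_total_estimate(population, n, rng)
        if total_hat - z * se <= true_total <= total_hat + z * se:
            hits += 1
    return hits / reps

# Hypothetical population: redd counts for 2000 stream segments (made-up values).
_rng = random.Random(0)
population = [_rng.choice([0, 0, 0, 0, 1, 2, 5]) for _ in range(2000)]
print(coverage(population, n=200))
```

Swapping `srs_total_estimate` for another design and estimator pair is how the designs on the following slides could be compared on the same footing.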

5

Methods

• Compare sampling strategies using the IDFG data as the truth.

• Sampling strategies include both the sampling design and the estimator.

[Diagram: sample design, followed by the estimator for the total and its confidence interval.]

6

Methods

• Use simulation by resampling the population over and over

[Figure: histogram; horizontal axis 1400 to 2200, vertical axis Frequency (0 to 250).]

7

Cost & Crew-trips

• Each segment is assigned an access point.

• Travel to access sites depends on the mode of access:

– airplane

– auto

• Travel from access sites to sampling reaches is the maximum distance from the access site to the furthest sampling reach in each “direction” along a tributary.

• Cost = Fn(km by foot)

[Figure annotation: 4 round trips required]
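One possible reading of this cost rule is sketched below. It is an assumption-laden illustration rather than the authors' cost function: each sampled segment is described by a hypothetical tuple of access point, direction, and stream distance, the walking distance per access point and direction is taken as the distance to the furthest sampled segment, and doubling for the round trip is an assumption controlled by the `round_trip` flag.

```python
from collections import defaultdict

def foot_travel_km(sampled_segments, round_trip=True):
    """Kilometers walked for a sample.

    sampled_segments: iterable of (access_point, direction, distance_km),
    where distance_km is the stream distance from the access point.
    """
    furthest = defaultdict(float)
    for access_point, direction, distance_km in sampled_segments:
        key = (access_point, direction)
        # Only the furthest sampled reach in each direction adds distance;
        # nearer reaches are passed on the way.
        furthest[key] = max(furthest[key], distance_km)
    one_way = sum(furthest.values())
    return 2 * one_way if round_trip else one_way  # doubling is an assumption

# Made-up sample: two access points, distances in km.
example = [
    ("A", "upstream", 3.0),
    ("A", "upstream", 7.5),
    ("A", "downstream", 2.0),
    ("B", "upstream", 12.0),
]
print(foot_travel_km(example))  # 2 * (7.5 + 2.0 + 12.0) = 43.0
```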

8

Distances are shown in 5 km intervals.

Many areas require a hike of over 20 km.

Maximum distance is 33 km.

9

The sampling designs

• Index – sample the index reaches

• Simple random sampling – using the unbiased estimator

• Systematic sampling – sort tributaries in random order, systematically sample along the resulting line.

• Stratify by Index – sample independently within and outside the index regions.

• Adaptive cluster sampling – choose segments with a simple random sample. If sampled sites have redds, sample adjacent segments.

• Spatially balanced design – based on the EMAP design, though selecting segments within primary sampling units rather than points (not yet implemented)

10

Index sampling

• When the sample size is smaller than the overall size of the index region, a simple random sample of the segments within the index is assumed.

• Two possibilities to estimate the number of redds from the index sample:

1. Assume there are no redds outside of the index – estimates will be too small.

2. Assume that the average number of redds per segment outside the index is the same as inside, and simply inflate the index estimator – estimates will be too large.
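The two options can be written as a pair of expansion estimators. The sketch below is illustrative only: the function name `index_estimates` and the sample counts are made up, and the total number of segments in the network (3000 here) is a placeholder, since the slides give only the index size of 976 segments.

```python
def index_estimates(sample_counts, index_size, total_segments):
    """Return (estimate assuming no redds outside the index,
               estimate assuming the index density holds everywhere)."""
    mean_per_segment = sum(sample_counts) / len(sample_counts)
    index_only = index_size * mean_per_segment     # option 1: tends to be too small
    inflated = total_segments * mean_per_segment   # option 2: tends to be too large
    return index_only, inflated

# Made-up redd counts from 8 sampled index segments; 3000 is a hypothetical network size.
print(index_estimates([0, 2, 1, 0, 3, 0, 1, 1], index_size=976, total_segments=3000))
```

The true total lies between the two whenever redds occur outside the index at a lower density than inside it.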

11

Bias of Inflating Estimator from Index Sample

[Figure: estimates by year (1995, 1996, 1997, 1998, 2001, 2002), with the true value marked by *; vertical axis: Redds, 0 to 5000.]

12

Systematic sampling

• Order the tributaries in random order along a line

• Choose the sampling interval, k, so that the final sample size is approximately n

• Select a random number, r, between 1 and k

• Sample reaches r, r+k, r+2k, …, r+(n-1)k

• Systematic sampling is cluster sampling where each cluster is made up of units far apart in space and one cluster is sampled

[Diagram: sampled reaches r, r+k, r+2k, r+3k, r+4k along the line, spaced k apart]
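The selection steps above can be sketched as follows. This is a minimal illustration under stated assumptions: tributaries are given as a hypothetical dictionary mapping a name to its segments in stream order, and the function name `systematic_sample` and segment labels are made up.

```python
import random

def systematic_sample(tributaries, k, rng):
    """Systematic sample of segments with interval k and a random start."""
    names = list(tributaries)
    rng.shuffle(names)  # put the tributaries in random order along a line
    line = [seg for name in names for seg in tributaries[name]]
    r = rng.randint(1, k)  # random start between 1 and k (1-based)
    return line[r - 1 :: k]  # reaches r, r+k, r+2k, ...

# Made-up network: three tributaries with 8, 5, and 10 segments.
tributaries = {
    "A": [f"A{i}" for i in range(1, 9)],
    "B": [f"B{i}" for i in range(1, 6)],
    "C": [f"C{i}" for i in range(1, 11)],
}
print(systematic_sample(tributaries, k=5, rng=random.Random(7)))
```

Because the line length is rarely an exact multiple of k, the realized sample size varies by one depending on r, which is why the slide says the sample size is only approximately n.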

13

Stratify by Index

• Stratify by index and oversample index reaches

• Simple random sample in each stratum

• Allocation:

– Equal allocation: usually does not perform well

– Proportional allocation: does not oversample index sites, so will probably not have good precision

– Optimal allocation: need to know the standard deviation

year                 1995  1996  1997  1998  2001  2002
proportion in index  0.76  0.54  0.48  0.48  0.42  0.46
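For reference, the textbook form of optimal allocation with equal per-unit costs is Neyman allocation, where the sample size in stratum h is proportional to N_h times S_h (stratum size times stratum standard deviation). The sketch below assumes that is what "optimal" means here; the slides do not spell out the formula. The size of the non-index stratum and both standard deviations are made-up values.

```python
def neyman_allocation(n, strata):
    """Split a total sample size n across strata in proportion to N_h * S_h.

    strata: dict mapping stratum name -> (N_h, S_h).
    Returns unrounded sample sizes per stratum.
    """
    weights = {h: N_h * S_h for h, (N_h, S_h) in strata.items()}
    total = sum(weights.values())
    return {h: n * w / total for h, w in weights.items()}

# 976 index segments is from the slides; the other numbers are hypothetical.
strata = {"index": (976, 2.0), "other": (3000, 0.5)}
for name, n_h in neyman_allocation(976, strata).items():
    print(name, round(n_h, 1))
```

Setting every S_h to the same value reduces this to proportional allocation, which is why proportional allocation cannot oversample the more variable index stratum.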

14

Adaptive cluster sampling

• Original sample is a simple random sample

• If a sampled site meets the criterion, also sample the sites in its neighborhood

– Criterion: presence of redds

– Neighborhood: segments directly upstream and downstream

• Continue until sites do not meet the criterion

– Both legs of confluences

[Diagram: six segments numbered 1 to 6. Segment 2 is in the original sample and meets the criterion; segment 4 also meets the criterion; neighbors are included until they do not meet the criterion (segments 1, 3, and 6).]

Final sample includes: 2 1 3 4 6
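The expansion rule can be sketched as a breadth-first search over neighboring segments. The adjacency below is a guess chosen to reproduce the six-segment example (the slide's diagram did not survive extraction), and the redd counts and the function name `adaptive_sample` are hypothetical.

```python
from collections import deque

def adaptive_sample(initial, neighbors, redds):
    """Expand an initial sample: whenever a sampled segment has redds,
    add its upstream and downstream neighbors, and repeat."""
    sampled = set()
    queue = deque(initial)
    while queue:
        seg = queue.popleft()
        if seg in sampled:
            continue
        sampled.add(seg)
        if redds.get(seg, 0) > 0:  # criterion: presence of redds
            queue.extend(nb for nb in neighbors.get(seg, ()) if nb not in sampled)
    return sampled

# Assumed adjacency: segment 2 sits at a confluence and touches 1, 3, and 4.
neighbors = {1: [2], 2: [1, 3, 4], 3: [2, 5], 4: [2, 6], 5: [3], 6: [4]}
redds = {2: 3, 4: 1}  # made-up counts; segments 2 and 4 meet the criterion
print(sorted(adaptive_sample([2], neighbors, redds)))  # [1, 2, 3, 4, 6]
```

Segment 5 is never visited because its only neighbor, segment 3, has no redds, so the expansion stops there.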

15

Results: Normalized standard error of estimators

Design (by run size)    20    83   424   661  1789  1730
SRS                   39.0  18.6   9.9   8.2   7.3   7.1
Cluster (1km)         47.7  22.7  15.1  12.7  12.5  12.1
SYS                   24.9  17.1   9.2   8.0   5.0   6.1
STRS (optimal)        27.2  15.3   8.6   7.2   6.7   6.2
ADAPT                 39.3  18.8  10.5   7.6   8.7   8.4

Coverage Probability (.95)

Design (by run size)    20    83   424   661  1789  1730
SRS                   86.6  93.0  94.5  95.3  93.3  94.3
Cluster (1km)         89.0  91.2  92.5  92.6  92.9  94.3
SYS                   96.4  95.2  97.0  95.0  99.0  97.3
STRS (optimal)        92.0  95.0  94.3  95.5  92.9  93.5
ADAPT                 87.6  94.8  94.7  94.9  94.7  93.9
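A quick way to read the normalized standard error table is to pick the design with the smallest value for each run size. The snippet below simply transcribes the table; the variable names are made up.

```python
run_sizes = [20, 83, 424, 661, 1789, 1730]
normalized_se = {
    "SRS": [39.0, 18.6, 9.9, 8.2, 7.3, 7.1],
    "Cluster (1km)": [47.7, 22.7, 15.1, 12.7, 12.5, 12.1],
    "SYS": [24.9, 17.1, 9.2, 8.0, 5.0, 6.1],
    "STRS (optimal)": [27.2, 15.3, 8.6, 7.2, 6.7, 6.2],
    "ADAPT": [39.3, 18.8, 10.5, 7.6, 8.7, 8.4],
}

# For each run size, the design with the lowest normalized standard error.
best = {
    run: min(normalized_se, key=lambda design: normalized_se[design][i])
    for i, run in enumerate(run_sizes)
}
for run, design in best.items():
    print(run, design)
```

In this table the stratified design has the lowest value for the runs of 83, 424, and 661 redds, and systematic sampling has the lowest for the runs of 20, 1789, and 1730.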

16

Costs

[Figure: costs by design (SRS, SRS-1km, SYS, STRS, Adaptive); adaptive sampling shown by year (‘95, ‘96, ‘97, ‘98, ‘01, ‘02); axis ticks 400, 500, 600, 700.]

17

Precision per cost (10% sampling fraction)

big is good: high precision per km traveled

[Figure: precision per cost against run size (0 to 1500) for SRS, SRS-1km, SYS, STR, and ADP; vertical axis: Precision per cost, 0.002 to 0.014.]

18

Conclusions

• Stratifying by index results in the most precise estimates except in the large runs where systematic sampling seems to work best.

• The index sites should be oversampled in the stratified design. Proportional allocation (based on the size of the strata) results in poor precision.

• Although the systematic sampling strategy often is the most precise, there is not a good estimator for the variance. The estimator that assumes a simple random sample is conservative.

• Same pattern for different sampling fractions.

19

Conclusions

• The cluster sampling design is not very precise but reduces costs significantly.

• Adaptive cluster sampling is not as precise as other designs.

– It is optimal for rare, clustered populations

– during small years the redds are not clustered enough

– during large years they are not rare enough

– only during the medium years does it compete with other designs.

• When cost and precision are analyzed together

– small runs – either stratified by index or SRS-1km work best

– large runs – either systematic or stratified by index work best

20

Not yet finished

• EMAP type design.

• Successive difference variance estimator for the systematic sampling

• Adaptive sampling with same initial sample size (cost function does not penalize this much)

• Cost function

– including road travel

– crew trips/day units

21

Points vs. Lines

• Pick points – points are picked along the stream continuum and the measurement unit is constructed around the point

• advantages:

– different size measurement units are easily implemented

• disadvantages:

– difficulty with overlapping units

– inadvertent variable probability design because of confluences and headwaters

– analysis may be complicated

• Pick segments – the universe is segmented before sampling and segments are picked from the population of segments

• advantages:

– simple to implement

– simple estimators

• disadvantages:

– difficult frame construction before sampling

– cannot accommodate varying lengths of sampling unit

22

Adaptive Cluster Sampling

• Use the draw-by-draw probability estimator:

– Let w_i be the average number of redds in the network to which segment i belongs, then

– with variance

Thompson 1992
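The formulas on this slide did not survive extraction. For orientation only, the estimator usually presented under this name for adaptive cluster sampling is the modified Hansen-Hurwitz form below; the notation (N for the number of segments in the population, n for the initial sample size, μ for the mean redds per segment, τ for the total) is an assumption and may differ from the slide.

```latex
% Mean of the network averages over the n units of the initial sample
\hat{\mu} = \frac{1}{n} \sum_{i=1}^{n} w_i ,
\qquad
\hat{\tau} = N \hat{\mu}

% Variance of the estimator, summing over all N units in the population
\operatorname{Var}(\hat{\mu}) = \frac{N - n}{N n (N - 1)} \sum_{i=1}^{N} (w_i - \mu)^2 ,
\qquad
\operatorname{Var}(\hat{\tau}) = N^2 \operatorname{Var}(\hat{\mu})

% Sample-based estimate of the variance
\widehat{\operatorname{Var}}(\hat{\mu}) = \frac{N - n}{N n (n - 1)} \sum_{i=1}^{n} (w_i - \hat{\mu})^2
```

A segment that does not meet the criterion forms a network of size one, so its w_i is just its own count.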

23

COSTS

• Our costs are based on the number of kilometers traveled by foot.

• Each segment in the MF is assigned to an access point, and the distance along the stream from that access point is calculated. The assignment is not optimized: in some rare instances the assigned access point is not the closest.

• There are two types of access points: airfields and trailheads. For this exercise they both have the same price.

• Because we are tallying the number of km along the streams, this cost function also models other types of sampling, including via helicopter and raft.

24

Six years

1995 1996

25

Six years

1997 1998

26

Six years

2001 2002

27

Access to MFSR

• Roadless area

• Airplane access possible

28

29

air vs. car access

30

Index sample

• Not sure how to build estimates for the total number of redds in the Middle Fork.

– expand the current estimator (assume the same density outside of the index)

– use the current estimate (assume 0 redds outside of the index)

year                     1995  1996  1997  1998  2001  2002
Number counted in index    19    62   290   448  1178  1199
Total number of redds      20    83   424   661  1789  1730
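The size of the undercount from assuming no redds outside the index follows directly from the table. The short calculation below uses only the numbers on this slide; the variable names are made up.

```python
years = [1995, 1996, 1997, 1998, 2001, 2002]
index_counts = [19, 62, 290, 448, 1178, 1199]
totals = [20, 83, 424, 661, 1789, 1730]

# Share of all redds that fall inside the index reaches, and the number missed.
share = {y: c / t for y, c, t in zip(years, index_counts, totals)}
missed = {y: t - c for y, c, t in zip(years, index_counts, totals)}

for y in years:
    print(y, f"{share[y]:.2f}", missed[y])
```

The share inside the index is 0.95 in 1995 but falls to roughly two thirds in the larger runs, so using the index count as the total misses about a third of the redds in those years.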

31

32

Stratify by Index

• Oversample index sites where most redds are located

• Simple random sample in each stratum

• Equal allocation:

year            1995   1996   1997   1998    2001    2002
standard error  5.33  12.68  36.61  47.34  121.43  106.98
coverage        90.4   94.6   94.2   94.8    92.9    93.4

• Proportional allocation:

year            1995   1996   1997   1998    2001    2002
standard error  7.77  15.26  41.08  52.37  124.90  115.56
coverage        88.0   94.7   95.0   94.9    94.4    93.6

33

Stratify by index

• Optimal allocation

• Using

year                 1995  1996  1997  1998  2001  2002
proportion in index  0.76  0.54  0.48  0.48  0.42  0.46
n index               746   530   475   464   407   445
n other               230   446   501   512   569   531

year            1995   1996   1997   1998    2001    2002
standard error  5.49  12.76  36.60  47.26  120.50  106.58
coverage        92.0   95.0   94.3   95.5    92.9    93.5

34

Stratify by index

• Using

year     1995  1996  1997  1998  2001  2002
n index   746   530   475   464   407   445
n other   230   446   501   512   569   531