Comparison of Design-Based and Model-Based Techniques for Selecting Spatially Balanced Samples of Environmental Resources. Don L. Stevens, Jr. Department of Statistics Oregon State University

Jan 10, 2016


Designs and Models for Aquatic Resource Surveys (DAMARS). R82-9096-01.
Transcript
Page 1: Don L. Stevens, Jr. Department of Statistics Oregon State University

Comparison of Design-Based and Model-Based Techniques for Selecting

Spatially Balanced Samples of Environmental Resources.

Don L. Stevens, Jr.

Department of Statistics

Oregon State University

Page 2: Don L. Stevens, Jr. Department of Statistics Oregon State University

The research described in this presentation has been funded by the U.S. Environmental Protection Agency through the STAR Cooperative Agreement CR82-9096-01 Program on Designs and Models for Aquatic Resource Surveys at Oregon State University. It has not been subjected to the Agency's review and therefore does not necessarily reflect the views of the Agency, and no official endorsement should be inferred.

R82-9096-01

Page 3: Don L. Stevens, Jr. Department of Statistics Oregon State University

Preview

• Two conceptual frameworks to support inference from sample properties to population characteristics: model-based & design-based

• Both encompass inference and sample selection methodologies

• Both sets of selection methodologies have techniques to incorporate prior information and knowledge

Page 4: Don L. Stevens, Jr. Department of Statistics Oregon State University

Preview

• Conjecture: With same prior information & knowledge, probabilistic samples can be near-optimal judged by model-based criteria

• Conjecture: Probabilistic samples can be more robust than optimal model-based samples

Page 5: Don L. Stevens, Jr. Department of Statistics Oregon State University

Preview

• Claim: With same prior information & knowledge, probabilistic samples can be near-optimal judged by model-based criteria

• Claim: Probabilistic samples can be more robust than optimal model-based samples

• There’s a catch: what is “optimal”?

Page 6: Don L. Stevens, Jr. Department of Statistics Oregon State University

Study Context

• Environmental monitoring and assessment application, particularly aquatics

• Response is a condition measure
  – Water quality
  – Chemical contamination
  – Biological quantity, e.g., IBI
  – Physical habitat metric
  – Salmon population levels

Page 7: Don L. Stevens, Jr. Department of Statistics Oregon State University

Study Context

• Populations distributed over space

• Sample sites will be visited more than once, possibly over a period of many years

• Overall sample may be split into panels

Page 8: Don L. Stevens, Jr. Department of Statistics Oregon State University

Study Context

• Environmental populations have spatial structure
  – Things close together tend to be influenced by same set of factors
  – Things close together tend to share similar substrates

Page 9: Don L. Stevens, Jr. Department of Statistics Oregon State University

Study Context

• But the spatial structure is almost certainly not stationary

Page 10: Don L. Stevens, Jr. Department of Statistics Oregon State University

Study Context

• Structure may be patchy rather than smoothly changing
  – Localized management practices
  – Localized contamination
  – Localized development
  – Natural discontinuities
    • Slope
    • Substrate: soil, geology
    • Watercourse

Page 11: Don L. Stevens, Jr. Department of Statistics Oregon State University

Study Context

• Most populations of interest have existing samples in place
  – Frequently convenience samples
  – Preservation of historical continuity important

• Most large populations (e.g., covering a substantial portion of a state) will have accessibility issues

Page 12: Don L. Stevens, Jr. Department of Statistics Oregon State University

Study Context

• Some portions of the population will require a higher intensity sample
  – Scientific, economic, or political interest

• Sample allocation may need to be modified
  – Emerging issues
  – Problems solved

Page 13: Don L. Stevens, Jr. Department of Statistics Oregon State University

Study Strategy

• Compare some techniques for optimal model-based design to Generalized Random Tessellation Stratified (GRTS) design

• Various scenarios:
  – Variety of optimality criteria
  – Existing sample points
  – Variable interest variable spatial density
  – Inaccessible regions a priori & a posteriori

Page 14: Don L. Stevens, Jr. Department of Statistics Oregon State University

Optimality Criteria

• Statisticians think “best” means minimum variance, so the optimum design is the one that gives the minimum variance estimator, but…

Page 15: Don L. Stevens, Jr. Department of Statistics Oregon State University

Optimality Criteria

• Not always straightforward to decide on appropriate variance!

Page 16: Don L. Stevens, Jr. Department of Statistics Oregon State University

Optimality Criteria

• For example, suppose we need a value for $\int_B z(s)\,ds \,/\, |B|$
  – Usual approach in spatial statistics is to use kriging to “predict” a mean, so we need the prediction variance
  – But we need a variogram to krige, which usually has to be estimated assuming some model, so we should include variogram parameter uncertainty
  – But what about variogram model itself?

Page 17: Don L. Stevens, Jr. Department of Statistics Oregon State University

Optimality Criteria

• For example, suppose we need a value for $\int_B z(s)\,ds \,/\, |B|$
  – Usual design-based approach is to minimize sampling variance over repeated sample selections
  – But even the design-based variance is dependent on spatial structure
  – So we could adopt a super-population model, and minimize expected variance
    • Which puts us back in the spatial stats arena

Page 18: Don L. Stevens, Jr. Department of Statistics Oregon State University

Optimality Criterion

• Minimal assumptions: Points that are close together contain redundant information, so we want a design that gives maximal dispersion

• A point pattern that is “regular” in the stochastic point process sense gives maximal dispersion

• Thus, we need to look at regularity criteria to select optimality criterion

Page 19: Don L. Stevens, Jr. Department of Statistics Oregon State University

Study Strategy

• Compare using several optimality criteria
  – Regularity of point process
    • K-function
    • Van Groenigen & Stein MMSD
    • Fractal dimension
    • Mean square deviation of distance to side, vertex, boundary of Voronoi polygon
  – Variance of estimated population mean
    • Over replicated sample selection
    • Over replicated population realizations
    • With models for non-stationary mean structure

Page 20: Don L. Stevens, Jr. Department of Statistics Oregon State University

Optimal Design

• A number of recent papers have used spatial simulated annealing to locate optimal sampling points
  – Begin with a random set of points
  – Cycle through points, perturbing one at a time
  – At each step, calculate an optimality criterion
  – If better than old optimum, keep
  – If worse, accept with some probability that decreases with the number of cycles
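
The annealing loop listed on this slide can be sketched as follows. This is a minimal illustration, not the code used in the papers or in this study: the criterion (mean distance from grid nodes to the nearest sample point), the step size, the starting temperature `t0`, and the geometric `cooling` schedule are all assumptions chosen for the example.

```python
import math
import random


def msd(points, grid):
    """Mean distance from each grid node to its nearest sample point."""
    return sum(min(math.dist(g, p) for p in points) for g in grid) / len(grid)


def anneal(n=10, cycles=30, step=0.2, t0=0.05, cooling=0.9, seed=1):
    """Spatial simulated annealing on the unit square (illustrative settings)."""
    rng = random.Random(seed)
    m = 15  # evaluation grid is m x m cell centres (assumed resolution)
    grid = [((i + 0.5) / m, (j + 0.5) / m) for i in range(m) for j in range(m)]
    pts = [(rng.random(), rng.random()) for _ in range(n)]  # random start
    current = msd(pts, grid)
    start = current
    best_pts, best = list(pts), current
    temp = t0
    for _ in range(cycles):
        for i in range(n):  # perturb one point at a time
            x, y = pts[i]
            cand = (min(1.0, max(0.0, x + rng.uniform(-step, step))),
                    min(1.0, max(0.0, y + rng.uniform(-step, step))))
            trial = pts[:i] + [cand] + pts[i + 1:]
            value = msd(trial, grid)
            # Keep improvements; accept a worse pattern with a probability
            # that shrinks as the temperature falls over the cycles.
            if value < current or rng.random() < math.exp((current - value) / temp):
                pts, current = trial, value
                if current < best:
                    best_pts, best = list(pts), current
        temp *= cooling
    return start, best, best_pts
```

Calling `anneal()` returns the criterion for the random starting pattern, the best value found, and the corresponding points. Any other criterion from this talk could be substituted for `msd`.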

Page 21: Don L. Stevens, Jr. Department of Statistics Oregon State University

Optimal Design

• Van Groenigen & Stein MMSD
  – Minimized the Mean Shortest Distance: S a set of sample points, x a point in target domain D, let d(x,S) be the distance from x to the nearest point in S. Then

    $$\mathrm{MSD} = \int_D d(x,S)\,dx \,/\, |D|$$

  – Note that for C(s) the Voronoi polygon of s

    $$\mathrm{MSD} = \sum_{s_i \in S} \int_{C(s_i)} d(x,s_i)\,dx \,/\, |D|$$
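
The MSD integral can be approximated numerically. The sketch below assumes the domain D is the unit square (so |D| = 1) and uses plain Monte Carlo integration. The function name and the number of evaluation points are illustrative choices, not part of the original method.

```python
import math
import random


def mean_shortest_distance(sample, n_eval=20000, seed=0):
    """Monte Carlo estimate of MSD over the unit square.

    Draws n_eval uniform points x in D and averages d(x, S), the distance
    from x to the nearest sample point. |D| = 1, so no rescaling is needed.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_eval):
        x = (rng.random(), rng.random())
        total += min(math.dist(x, s) for s in sample)
    return total / n_eval


# Usage: a regular 2 x 2 pattern versus a single central point.
regular = [(0.25, 0.25), (0.25, 0.75), (0.75, 0.25), (0.75, 0.75)]
print(mean_shortest_distance(regular), mean_shortest_distance([(0.5, 0.5)]))
```

Because each evaluation point is assigned to its nearest sample point, the same sum is also an estimate of the Voronoi-cell form of the integral.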

Page 22: Don L. Stevens, Jr. Department of Statistics Oregon State University

Optimal Design

• Ripley’s K function:

K(r) : average number of additional sites within radius r of a site divided by the intensity of the process
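
A direct reading of this definition gives a naive estimator, sketched below. It assumes the intensity is estimated as n / area and applies no edge correction, so it will underestimate K(r) near the domain boundary. Production estimators normally include an edge correction.

```python
import math


def ripley_k(points, r, area=1.0):
    """Naive estimate of Ripley's K(r) with no edge correction."""
    n = len(points)
    # Count ordered pairs (i, j), i != j, whose separation is at most r.
    pairs = sum(1
                for i, p in enumerate(points)
                for j, q in enumerate(points)
                if i != j and math.dist(p, q) <= r)
    mean_additional_sites = pairs / n   # average number of other sites within r
    intensity = n / area                # estimated sites per unit area
    return mean_additional_sites / intensity
```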

Page 23: Don L. Stevens, Jr. Department of Statistics Oregon State University

Optimal Design

• Di Zio, Fontanella & Ippoliti used a measure related to the fractal dimension:

Let D be the slope of the best fitting line produced when log(K(r)) is regressed against log(r)

As sites become more evenly dispersed, D should approach 2, so 2-D is a measure of irregularity.
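
The slope-based measure can be sketched as an ordinary least-squares fit of log(K(r)) on log(r). The K estimator here is a naive one without edge correction, and the choice of radii is left to the caller. Both are assumptions of this example rather than details given on the slide.

```python
import math


def ripley_k(points, r, area=1.0):
    """Naive K(r): mean number of other sites within r, divided by intensity."""
    n = len(points)
    pairs = sum(1
                for i, p in enumerate(points)
                for j, q in enumerate(points)
                if i != j and math.dist(p, q) <= r)
    return (pairs / n) / (n / area)


def loglog_slope(radii, k_values):
    """Least-squares slope of log(K) regressed on log(r)."""
    xs = [math.log(r) for r in radii]
    ys = [math.log(k) for k in k_values]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxy / sxx


def irregularity(points, radii, area=1.0):
    """Return 2 - D, where D is the fitted log-log slope."""
    ks = [ripley_k(points, r, area) for r in radii]
    usable = [(r, k) for r, k in zip(radii, ks) if k > 0]  # log needs K > 0
    return 2.0 - loglog_slope(*zip(*usable))
```

At least two radii with K(r) > 0 are needed for the fit. When K(r) is exactly proportional to r squared, the slope is 2 and the irregularity is 0.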

Page 24: Don L. Stevens, Jr. Department of Statistics Oregon State University

Optimal Design

• Proposed criterion: Let B(C(s)) be the boundary of the Voronoi polygon of s. Define

$$\mathrm{SVB} = \sum_{s_i \in S} \int_{B(C(s_i))} \left( d(b,s_i) - \bar{d} \right)^2 db \,/\, \mathrm{SVB}_{NOM}$$

• SVB is approximated by the mean square deviation of the distance from a sample point to Sides, Vertices, and Boundaries, relative to a nominal value (a side is an edge that separates two sample points; a boundary is an edge determined by the domain)

Page 25: Don L. Stevens, Jr. Department of Statistics Oregon State University


Page 26: Don L. Stevens, Jr. Department of Statistics Oregon State University

Point Pattern Comparison

• Examine point patterns with 50 points in the unit square

• Show Voronoi polygons for each point pattern

Page 27: Don L. Stevens, Jr. Department of Statistics Oregon State University

Optimal Design

• MMSD seemed very slow to compute

• SVB seems to be comparable to MMSD, but much quicker to compute

Page 28: Don L. Stevens, Jr. Department of Statistics Oregon State University

Figure: Uniform Random Sample (plot on the unit square; both axes 0.0 to 1.0)

Page 29: Don L. Stevens, Jr. Department of Statistics Oregon State University

Figure: GRTS Sample (plot on the unit square; both axes 0.0 to 1.0)

Page 30: Don L. Stevens, Jr. Department of Statistics Oregon State University

Figure: Optimal Point Pattern - Fractal Dimension (plot on the unit square; both axes 0.0 to 1.0)

Page 31: Don L. Stevens, Jr. Department of Statistics Oregon State University

Figure: Optimal Point Pattern - SVB (plot on the unit square; both axes 0.0 to 1.0)

Page 32: Don L. Stevens, Jr. Department of Statistics Oregon State University

Figure: MMSD Point Pattern - n = 50 (plot on the unit square; both axes 0.0 to 1.0)

Page 33: Don L. Stevens, Jr. Department of Statistics Oregon State University

Figure: MMSD Point Pattern - n = 20 (plot on the unit square; both axes 0.0 to 1.0)

Page 34: Don L. Stevens, Jr. Department of Statistics Oregon State University

Figure: Optimal Point Pattern - Fractal Dimension. log(K(r)) plotted against log(r); Slope = 1.81

Page 35: Don L. Stevens, Jr. Department of Statistics Oregon State University

Figure: GRTS Point Pattern - Fractal Dimension. log(K(r)) plotted against log(r); Slope = 1.69

Page 36: Don L. Stevens, Jr. Department of Statistics Oregon State University

Figure: Uniform Random - Fractal Dimension. log(K(r)) plotted against log(r); Slope = 1.52

Page 37: Don L. Stevens, Jr. Department of Statistics Oregon State University

Existing Points

• SSA can optimize placement of new sample points given some existing points

• Can do something similar with GRTS:
  – Determine limits on grid resolution & placement such that existing points are all in distinct cells
  – Do GRTS design conditional on those limits, and “select” cells with existing points

Page 38: Don L. Stevens, Jr. Department of Statistics Oregon State University

Existing Points

• Illustrate with 25-point design
  – Unrestricted
  – 5 points fixed, 20 unrestricted

Page 39: Don L. Stevens, Jr. Department of Statistics Oregon State University

Figure: Optimal Fractal Dimension - n = 25 (plot on the unit square; both axes 0.0 to 1.0)

Page 40: Don L. Stevens, Jr. Department of Statistics Oregon State University

Figure: Optimal n = 25 - Fractal Dimension. log(K(r)) plotted against log(r); Slope = 1.93

Page 41: Don L. Stevens, Jr. Department of Statistics Oregon State University

Figure: Optimal n = 20 + 5 Fixed - Fractal Dimension (plot on the unit square; both axes 0.0 to 1.0)

Page 42: Don L. Stevens, Jr. Department of Statistics Oregon State University

Figure: Optimal n = 20 + 5 Fixed - Fractal Dimension. log(K(r)) plotted against log(r); Slope = 1.92

Page 43: Don L. Stevens, Jr. Department of Statistics Oregon State University

Figure: SVB Optimal Pattern, n = 25, SVB = 0.211 (plot on the unit square; both axes 0.0 to 1.0)

Page 44: Don L. Stevens, Jr. Department of Statistics Oregon State University

Figure: SVB Optimal Pattern, n = 20 + 5, SVB = 0.217 (plot on the unit square; both axes 0.0 to 1.0)

Page 45: Don L. Stevens, Jr. Department of Statistics Oregon State University

Figure: GRTS Sample, n = 25, SVB = 0.38, log(K) slope = 1.81 (plot on the unit square; both axes 0.0 to 1.0)

Page 46: Don L. Stevens, Jr. Department of Statistics Oregon State University

Figure: GRTS Sample, n = 25, SVB = 0.65, log(K) slope = 1.72 (plot on the unit square; both axes 0.0 to 1.0)

Page 47: Don L. Stevens, Jr. Department of Statistics Oregon State University

Simulation Study

• Model-based approach: vary the surface, not the sample

• Created a patchy surface by “mixing” 3 smooth surfaces: a plane, a normal density, and a surface with several bumps, plus random noise

Page 48: Don L. Stevens, Jr. Department of Statistics Oregon State University

Figure: surface plot (axes X, Y, Z) of X + Y

Page 49: Don L. Stevens, Jr. Department of Statistics Oregon State University

Figure: surface plot (axes X, Y, Z) of N(0.25,0.075) * N(0.25,0.1)

Page 50: Don L. Stevens, Jr. Department of Statistics Oregon State University

Figure: surface plot (axes X, Y) of xy[,1]^2 + xy[,2]^2 - xy[,1] - xy[,2] + 0.1*cos(20*xy[,1]) + 0.1*sin(15*xy[,2]) + 1.5

Page 51: Don L. Stevens, Jr. Department of Statistics Oregon State University

Simulation Study

• Patches were random tessellations of the unit square, generated as Voronoi polygons of 10 random points

Page 52: Don L. Stevens, Jr. Department of Statistics Oregon State University

Figure: plot on the unit square (axes xg, yg; both 0.0 to 1.0)

Page 53: Don L. Stevens, Jr. Department of Statistics Oregon State University
Page 54: Don L. Stevens, Jr. Department of Statistics Oregon State University

Simulation Study

• Generated 1000 replicates of the random surface

• Sample each replicate with the Uniform Random, Fractal, SVB, and GRTS design points

• Calculate mean for each replicate, & variance of estimated mean over all replicates
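
The replicate loop can be sketched as below, under several loudly stated assumptions. The three component surfaces follow the labels on the surface-plot slides, reading N(m, s) as a normal density with mean m and standard deviation s. The noise level, the way components are assigned to patches, and the absence of any rescaling are placeholders, since the slides do not specify them. The design passed in is any fixed list of points and is not a GRTS, LogK, or SVB design.

```python
import math
import random
from statistics import mean, variance


def plane(x, y):
    return x + y


def normal_bump(x, y):
    # Product of two normal densities; parameters read off the slide label.
    def pdf(v, m, s):
        return math.exp(-0.5 * ((v - m) / s) ** 2) / (s * math.sqrt(2 * math.pi))
    return pdf(x, 0.25, 0.075) * pdf(y, 0.25, 0.1)


def bumpy(x, y):
    return (x * x + y * y - x - y
            + 0.1 * math.cos(20 * x) + 0.1 * math.sin(15 * y) + 1.5)


SURFACES = (plane, normal_bump, bumpy)


def random_surface(rng, n_patches=10, noise_sd=0.05):
    """Patchy surface: Voronoi patches of random seeds, one component each."""
    seeds = [(rng.random(), rng.random()) for _ in range(n_patches)]
    labels = [rng.randrange(len(SURFACES)) for _ in range(n_patches)]

    def z(x, y):
        # The nearest seed determines which patch (x, y) falls in.
        k = min(range(n_patches), key=lambda i: math.dist((x, y), seeds[i]))
        return SURFACES[labels[k]](x, y) + rng.gauss(0.0, noise_sd)

    return z


def variance_of_mean(design, n_rep=200, seed=0):
    """Vary the surface, not the sample: variance of the sample mean."""
    rng = random.Random(seed)
    means = []
    for _ in range(n_rep):
        z = random_surface(rng)
        means.append(mean(z(x, y) for x, y in design))
    return variance(means)
```

Running `variance_of_mean` on two fixed designs with the same seed compares them over the same sequence of surfaces only if they have the same number of points, because the noise draws share one random stream.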

Page 55: Don L. Stevens, Jr. Department of Statistics Oregon State University

Figure: four histograms of the estimated mean over replicates, with panels Uniform (rslt[, 1]), GRTS (rslt[, 2]), LogK (rslt[, 3]), and SVB (rslt[, 4]); x-axis 0.8 to 1.2, y-axis Frequency

Page 56: Don L. Stevens, Jr. Department of Statistics Oregon State University

Simulation Study

            Uniform   GRTS     LogK     SVB
Mean        1.079     1.091    1.088    1.083
Variance    0.0057    0.0039   0.0030   0.0035

Page 57: Don L. Stevens, Jr. Department of Statistics Oregon State University
Page 58: Don L. Stevens, Jr. Department of Statistics Oregon State University

Mean Structure Model

• Express the response as

$$y(s) = \mu(s) + z(s)$$

where $\mu(s)$ is mean structure, and z(s) is a random field (hopefully stationary)

• Following a suggestion by Cressie, we’ll use a model based on applying a median polish to determine mean structure

Page 59: Don L. Stevens, Jr. Department of Statistics Oregon State University

Mean Structure Model

• Median polish is analogous to ANOVA, in that the mean is expressed as sum of overall, row, & column effects

• Effects are estimated in an iterative procedure:
  – Extract row-wise medians
  – Extract column-wise medians
  – Add sum of median of row medians & median of column medians to overall effect
  – Iterate several times.
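
The iterative procedure can be sketched as follows. This is the standard Tukey median polish on a complete rectangular table, written with the standard library only. The function name and the fixed iteration count are choices made for this example, and tables with missing cells are not handled.

```python
from statistics import median


def median_polish(table, n_iter=10):
    """Decompose table[i][j] into overall + row[i] + col[j] + residual[i][j]."""
    resid = [list(map(float, row)) for row in table]
    n_rows, n_cols = len(resid), len(resid[0])
    overall = 0.0
    row_eff = [0.0] * n_rows
    col_eff = [0.0] * n_cols
    for _ in range(n_iter):
        # Extract row-wise medians from the residuals.
        for i in range(n_rows):
            m = median(resid[i])
            resid[i] = [v - m for v in resid[i]]
            row_eff[i] += m
        # Move the median of the column effects into the overall effect.
        m = median(col_eff)
        col_eff = [c - m for c in col_eff]
        overall += m
        # Extract column-wise medians from the residuals.
        for j in range(n_cols):
            m = median(resid[i][j] for i in range(n_rows))
            for i in range(n_rows):
                resid[i][j] -= m
            col_eff[j] += m
        # Move the median of the row effects into the overall effect.
        m = median(row_eff)
        row_eff = [r - m for r in row_eff]
        overall += m
    return overall, row_eff, col_eff, resid
```

The fitted mean structure at cell (i, j) is `overall + row_eff[i] + col_eff[j]`, and the residual table plays the role of z(s).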

Page 60: Don L. Stevens, Jr. Department of Statistics Oregon State University

Mean Structure Model

• Median polish will extract some kinds of structure, but doesn’t handle a patch-like response

• Try CART, with x,y coordinates as “classifying” variables

Page 61: Don L. Stevens, Jr. Department of Statistics Oregon State University

Example Data Set

• ODFW Coho Salmon spawners
  – Basic response is density (fish/km) of adult fish at a site
  – Pooled data set over five years
  – Normalized each year by total number of fish counted that year
  – Response is then proportion of total run at the site

Page 62: Don L. Stevens, Jr. Department of Statistics Oregon State University

Figure: map of sites (axes xp, yp), with a legend classed by response: z = 0; -10 < log(z) < -9; -9 < log(z) < -8; -8 < log(z) < -7; -7 < log(z) < -6; -6 < log(z) < -5; -5 < log(z) < -4; -4 < log(z) < -3; -3 < log(z) < -2

Page 63: Don L. Stevens, Jr. Department of Statistics Oregon State University

Example

• To fit into median polish/CART framework, we binned x,y coordinates and straightened the coastline

Page 64: Don L. Stevens, Jr. Department of Statistics Oregon State University

Figure: two panels, one with axes x and y, the other with axes "Dist from coast" and "Dist from California"