CH1 Introductionrw

8/10/2019 CH1 Introductionrw


INTRODUCTION

This book presents an introduction to the set of tools that has become
known commonly as geostatistics. Many statistical tools are useful in
developing qualitative insights into a wide variety of natural
phenomena; many others can be used to develop quantitative answers to
specific questions. Unfortunately, most classical statistical methods
make no use of the spatial information in earth science data sets.
Geostatistics offers a way of describing the spatial continuity that is
an essential feature of many natural phenomena and provides adaptations
of classical regression techniques to take advantage of this continuity.

The presentation of geostatistics in this book is not heavily
mathematical. Few theoretical derivations or formal proofs are given;
instead, references are provided to more rigorous treatments of the
material. The reader should be able to recall basic calculus and be
comfortable with finding the minimum of a function by using the first
derivative and representing a spatial average as an integral. Matrix
notation is used in some of the later chapters since it offers a compact
way of writing systems of simultaneous equations. The reader should also
have some familiarity with the statistical concepts presented in
Chapters 2 and 3.

Though we have avoided mathematical formalism, the presentation is not
simplistic. The book is built around a series of case studies on a
distressingly real data set. As we soon shall see, analysis of earth
science data can be both frustrating and fraught with difficulty. We
intend to trudge through the muddy spots, stumble into the pitfalls, and
wander into some of the dead ends. Anyone who has already tackled a
geostatistical study will sympathize with us in our many dilemmas.

Our case studies differ from those that practitioners encounter in only
one aspect: throughout our study we will have access to the correct
answers. The data set with which we perform the studies is in fact a
subset of a much larger, completely known data set. This gives us a
yardstick by which we can measure the success of several different
approaches.

A warning is appropriate here. The solutions we propose in the various
case studies are particular to the data set we use. It is not our
intention to propose these as general recipes. The hallmark of a good
geostatistical study is customization of the approach to the problem at
hand. All we intend in these studies is to cultivate an understanding of
what various geostatistical tools can do and, more importantly, what
their limitations are.

The Walker Lake Data Set

The focus of this book is a data set that was derived from a digital
elevation model from the western United States: the Walker Lake area in
Nevada.

We will not be using the original elevation values as variables in our
case studies. The variables we do use, however, are related to the
elevation and, as we shall see, their maps exhibit features which are
related to the topographic features in Figure 1.1. For this reason, we
will be referring to specific subareas within the Walker Lake area by
the geographic names given in Figure 1.1.

The original digital elevation model contained elevations for about
2 million points on a regular grid. These elevations have been
transformed to produce a data set consisting of three variables measured
at each of 78,000 points on a 260 x 300 rectangular grid. The first two
variables are continuous and their values range from zero to several
thousands. The third variable is discrete and its value is either one or
two. Details on how to obtain the digital elevation model and reproduce
this data set are given in Appendix A.

We have tried to avoid writing a book that is too specific to one field
of application. For this reason the variables in the Walker Lake data
set are referred to anonymously as V, U and T. Unfortunately, a bias
toward mining applications will occasionally creep in; this reflects
both the historical roots of geostatistics and the experience of the
authors. The methods discussed here, however, are quite generally
applicable to any data set in which the values are spatially continuous.

Figure 1.1 A location map of the Walker Lake area in Nevada. The small
rectangle on the outline of Nevada shows the relative location of the
area within the state. The larger rectangle shows the major topographic
features within the area.

The continuous variables, V and U, could be thicknesses of a geologic
horizon or the concentration of some pollutant; they could be soil
strength measurements or permeabilities; they could be rainfall
measurements or the diameters of trees. The discrete variable, T, can be
viewed as a number that assigns each point to one of two possible
categories; it could record some important color difference or two
different species; it could separate different rock types or different
soil lithologies; it could record some chemical difference such as the
presence or absence of a particular element.

For the sake of convenience and consistency we will refer to V and U as
concentrations of some material and will give both of them units of
parts per million (ppm). We will treat T as an indicator of two types
that will be referred to as type 1 and type 2. Finally, we will assign
units of meters to our grid even though its original dimensions are much
larger than 260 x 300 m².

The Walker Lake data set consists of V, U and T measurements at each of
78,000 points on a 1 x 1 m² grid. From this extremely dense data set a
subset of 470 sample points has been chosen to represent a typical
sample data set. To distinguish between these two data sets, the
complete set of all information for the 78,000 points is called the
exhaustive data set, while the smaller subset of 470 points is called
the sample data set.
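
The relationship between the two data sets can be sketched in a few
lines of Python. This is only an illustration under stated assumptions:
the values are synthetic placeholders rather than the Walker Lake data,
and the uniform random choice of 470 locations is an assumed scheme, not
the way the actual sample was selected. The names make_exhaustive and
draw_sample are hypothetical.

```python
import random

NX, NY = 260, 300      # grid dimensions given in the text
N_SAMPLES = 470        # size of the sample data set given in the text


def make_exhaustive(seed=0):
    """Return {(x, y): v} with a synthetic value at every grid point.

    The values are placeholders, NOT the Walker Lake data.
    """
    rng = random.Random(seed)
    return {(x, y): rng.uniform(0.0, 1000.0)
            for x in range(1, NX + 1)
            for y in range(1, NY + 1)}


def draw_sample(exhaustive, n=N_SAMPLES, seed=1):
    """Keep the values at n randomly chosen locations (assumed scheme)."""
    rng = random.Random(seed)
    locations = rng.sample(sorted(exhaustive), n)
    return {loc: exhaustive[loc] for loc in locations}


exhaustive = make_exhaustive()
sample = draw_sample(exhaustive)

# Because the exhaustive set is completely known, any estimate made
# from the sample can be compared with the corresponding true value.
true_mean = sum(exhaustive.values()) / len(exhaustive)
sample_mean = sum(sample.values()) / len(sample)
print(len(exhaustive), len(sample))
print(f"exhaustive mean {true_mean:.1f}, sample mean {sample_mean:.1f}")
```

The point of the sketch is the last step: the exhaustive set plays the
role of the yardstick against which sample-based estimates are checked.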

Goals of the Case Studies

Using the 470 samples in the sample data set, we will address the
following problems:

1. The description of the important features of the data.
2. The estimation of an average value over a large area.
3. The estimation of an unknown value at a particular location.
4. The estimation of an average value over small areas.
5. The use of the available sampling to check the performance of an
   estimation methodology.
6. The use of sample values of one variable to improve the estimation
   of another variable.
7. The estimation of a distribution of values over a large area.
8. The estimation of a distribution of values over small areas.
9. The estimation of a distribution of block averages.
10. The assessment of the uncertainty of our various estimates.



The first question, despite being largely qualitative, is very
important. Organization and presentation is a vital step in
communicating the essential features of a large data set. In the first
part of this book we will look at descriptive tools. Univariate and
bivariate description are covered in Chapters 2 and 3. In Chapter 4 we
will look at various ways of describing the spatial features of a data
set. We will then take all of the descriptive tools from these first
chapters and apply them to the Walker Lake data sets. The exhaustive
data set is analyzed in Chapter 5 and the sample data set is examined in
Chapters 6 and 7.

The remaining questions all deal with estimation, which is the topic of
the second part of the book. Using the information in the sample data
set we will estimate various unknown quantities and see how well we have
done by using the exhaustive data set to check our estimates. Our
approach to estimation, as discussed in Chapter 8, is first to consider
what it is we are trying to estimate and then to adopt a method that is
suited to that particular problem. Three important considerations form
the framework for our presentation of estimation in this book. First, do
we want an estimate over a large area or estimates for specific local
areas? Second, are we interested only in some average value or in the
complete distribution of values? Third, do we want our estimates to
refer to a volume of the same size as our sample data or do we prefer to
have our estimates refer to a different volume?

In Chapter 9 we will discuss why models are necessary and introduce the
probabilistic models common to geostatistics. In Chapter 10 we will
present two methods for estimating an average value over a large area.
We then turn to the problem of local estimation. In Chapter 11 we will
look at some nongeostatistical methods that are commonly used for local
estimation. This is followed in Chapter 12 by a presentation of the
geostatistical method known as ordinary point kriging. The adaptation of
point estimation methods to handle the problem of local block estimates
is discussed in Chapter 13.

Following the discussion in Chapter 14 of the important issue of the
search strategy, we will look at cross validation in Chapter 15 and show
how this procedure may be used to improve an estimation methodology. In
Chapter 16 we will address the practical problem of modeling variograms,
an issue that arises in geostatistical approaches to estimation.

In Chapter 17 we will look at how to use related information to improve
estimation. This is a complication that commonly arises in practice when
one variable is undersampled. When we analyze the sample data set in
Chapter 6, we will see that the measurements of the second variable, U,
are missing at many sample locations. The method of cokriging presented
in Chapter 17 allows us to incorporate the more abundant V sample values
in the estimation of U, taking advantage of the relationship between the
two to improve our estimation of the more sparsely sampled U variable.

The estimation of a complete distribution is typically of more use in
practice than is the estimation of a single average value. In many
applications one is interested not in an overall average value but in
the average value above some specified threshold. This threshold is
often some extreme value, and the estimation of the distribution above
extreme values calls for different techniques than the estimation of the
overall mean. In Chapter 18 we will explore the estimation of local and
global distributions. We will present the indicator approach, one of
several advanced techniques developed specifically for the estimation of
local distributions.
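
The difference between an overall average and an average above a
threshold can be illustrated with a small sketch. The values and the
threshold below are hypothetical, and the indicator convention used here
(1 at or below the threshold, 0 above it) is an assumption made for
illustration, not a statement of how the method is defined in Chapter 18.

```python
def mean_above(values, threshold):
    """Average of the values strictly above the threshold, or None."""
    above = [v for v in values if v > threshold]
    return sum(above) / len(above) if above else None


def indicator(values, threshold):
    """Assumed convention: 1 if value <= threshold, otherwise 0."""
    return [1 if v <= threshold else 0 for v in values]


# Hypothetical concentrations in ppm; not taken from the Walker Lake data.
values = [0.0, 12.0, 35.0, 80.0, 150.0, 420.0, 900.0, 1500.0]
threshold = 400.0

overall_mean = sum(values) / len(values)
high_mean = mean_above(values, threshold)

# The mean of the indicators is the proportion of values at or below
# the threshold, i.e. one point on the cumulative distribution.
ind = indicator(values, threshold)
prop_below = sum(ind) / len(ind)

print(overall_mean, high_mean, prop_below)
```

Repeating the indicator step for several thresholds traces out the
cumulative distribution, which is why a method aimed at the mean alone
says little about the values above an extreme threshold.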

A further complication arises if we want our estimates to refer to a
volume different from the volume of our samples. This is commonly
referred to as the support problem and frequently occurs in practical
applications. For example, in a model of a petroleum reservoir one does
not need estimated permeabilities for core-sized volumes but rather for
much larger blocks. In a mine, one will be mining and processing volumes
much larger than the volume of the samples that are typically available
for a feasibility study. In Chapter 19 we will show that the
distribution of point values is not the same as the distribution of
average block values and present two methods for accounting for this
discrepancy.
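
A minimal sketch of why point and block distributions differ follows.
It averages synthetic point values over non-overlapping square blocks
and compares the spread of the two sets. The grid size, block size and
values are all hypothetical, and the function block_averages is an
illustrative helper, not one of the two methods presented in Chapter 19.

```python
import random
import statistics


def block_averages(grid, block):
    """Average a 2-D list over non-overlapping block x block cells.

    Assumes both grid dimensions are exact multiples of `block`.
    """
    n_rows, n_cols = len(grid), len(grid[0])
    averages = []
    for r0 in range(0, n_rows, block):
        for c0 in range(0, n_cols, block):
            cell = [grid[r][c]
                    for r in range(r0, r0 + block)
                    for c in range(c0, c0 + block)]
            averages.append(sum(cell) / len(cell))
    return averages


rng = random.Random(42)
# Hypothetical skewed point values on a 20 x 20 grid (not real data).
points = [[rng.expovariate(1.0 / 100.0) for _ in range(20)]
          for _ in range(20)]
flat = [v for row in points for v in row]
blocks = block_averages(points, block=5)   # 16 blocks of 5 x 5 points

print("mean:    ", statistics.mean(flat), statistics.mean(blocks))
print("variance:", statistics.pvariance(flat), statistics.pvariance(blocks))
print("maximum: ", max(flat), max(blocks))
```

With equal-sized blocks the mean is unchanged, while the variance and
the extremes of the block averages cannot exceed those of the points.
The synthetic values here are spatially independent; with spatially
continuous data the reduction in variance is generally less pronounced.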

In Chapter 20 we will look at the assessment of uncertainty, an issue
that is typically muddied by a lack of a clear objective meaning for the
various uncertainty measures that probabilistic models can provide. We
will look at several common problems, discuss how our probabilistic
model might provide a relevant answer, and use the exhaustive data set
to check the performance of various methods.

The final chapter provides a recap of the tools discussed in the book,
recalling their strengths and their limitations. Since this book
attempts an introduction to basic methods, many advanced methods have
not been touched; however, the types of problems that require more
advanced methods are discussed and further references are given.



Before we begin exploring some basic geostatistical tools, we would like
to emphasize that the case studies used throughout the book are
presented for their educational value and not necessarily to provide a
definitive case study of the Walker Lake data set. It is our hope that
this book will enable a reader to explore new and creative combinations
of the many available tools and to improve on the rather simple studies
we have presented here.
