8/10/2019 CH1 Introductionrw
INTRODUCTION
This book presents an introduction to the set of tools that has become known commonly as geostatistics. Many statistical tools are useful in developing qualitative insights into a wide variety of natural phenomena; many others can be used to develop quantitative answers to specific questions. Unfortunately, most classical statistical methods make no use of the spatial information in earth science data sets. Geostatistics offers a way of describing the spatial continuity that is an essential feature of many natural phenomena and provides adaptations of classical regression techniques to take advantage of this continuity.
The presentation of geostatistics in this book is not heavily mathematical. Few theoretical derivations or formal proofs are given; instead, references are provided to more rigorous treatments of the material. The reader should be able to recall basic calculus and be comfortable with finding the minimum of a function by using the first derivative and representing a spatial average as an integral. Matrix notation is used in some of the later chapters since it offers a compact way of writing systems of simultaneous equations. The reader should also have some familiarity with the statistical concepts presented in Chapters 2 and 3.
Though we have avoided mathematical formalism, the presentation is not simplistic. The book is built around a series of case studies on a distressingly real data set. As we soon shall see, analysis of earth science data can be both frustrating and fraught with difficulty. We intend to trudge through the muddy spots, stumble into the pitfalls, and wander into some of the dead ends. Anyone who has already
An Introduction to Applied Geostatistics
tackled a geostatistical study will sympathize with us in our many dilemmas.
Our case studies differ from those that practitioners encounter in only one aspect: throughout our study we will have access to the correct answers. The data set with which we perform the studies is in fact a subset of a much larger, completely known data set. This gives us a yardstick by which we can measure the success of several different approaches.
A warning is appropriate here. The solutions we propose in the various case studies are particular to the data set we use. It is not our intention to propose these as general recipes. The hallmark of a good geostatistical study is customization of the approach to the problem at hand. All we intend in these studies is to cultivate an understanding of what various geostatistical tools can do and, more importantly, what their limitations are.
The Walker Lake Data Set
The focus of this book is a data set that was derived from a digital elevation model from the western United States: the Walker Lake area in Nevada.
We will not be using the original elevation values as variables in our case studies. The variables we do use, however, are related to the elevation and, as we shall see, their maps exhibit features which are related to the topographic features in Figure 1.1. For this reason, we will be referring to specific subareas within the Walker Lake area by the geographic names given in Figure 1.1.
The original digital elevation model contained elevations for about 2 million points on a regular grid. These elevations have been transformed to produce a data set consisting of three variables measured at each of 78,000 points on a 260 x 300 rectangular grid. The first two variables are continuous and their values range from zero to several thousands. The third variable is discrete and its value is either one or two. Details on how to obtain the digital elevation model and reproduce this data set are given in Appendix A.
We have tried to avoid writing a book that is too specific to one field of application. For this reason the variables in the Walker Lake data set are referred to anonymously as V, U, and T. Unfortunately, a bias toward mining applications will occasionally creep in; this reflects both the historical roots of geostatistics and the experience of the authors. The methods discussed here, however, are quite generally applicable to any data set in which the values are spatially continuous.
Figure 1.1 A location map of the Walker Lake area in Nevada. The small rectangle on the outline of Nevada shows the relative location of the area within the state. The larger rectangle shows the major topographic features within the area.
The continuous variables V and U could be thicknesses of a geologic horizon or the concentration of some pollutant; they could be soil strength measurements or permeabilities; they could be rainfall measurements or the diameters of trees. The discrete variable T can be viewed as a number that assigns each point to one of two possible categories; it could record some important color difference or two different species; it could separate different rock types or different soil lithologies; it could record some chemical difference such as the presence or absence of a particular element.
For the sake of convenience and consistency we will refer to V and U as concentrations of some material and will give both of them units of parts per million (ppm). We will treat T as an indicator of two types that will be referred to as type 1 and type 2. Finally, we will assign units of meters to our grid even though its original dimensions are much larger than 260 x 300 m^2.
The Walker Lake data set consists of V, U, and T measurements at each of 78,000 points on a 1 x 1 m^2 grid. From this extremely dense data set a subset of 470 sample points has been chosen to represent a typical sample data set. To distinguish between these two data sets, the complete set of all information for the 78,000 points is called the exhaustive data set, while the smaller subset of 470 points is called the sample data set.
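The relationship between the exhaustive and sample data sets can be sketched in a few lines of Python. This is only an illustration under loud assumptions: the values are random placeholders rather than the Walker Lake data, the names `make_exhaustive` and `draw_sample` are hypothetical, and the 470 locations are drawn by simple random selection, which the text does not claim is how the real sample was chosen. Only the grid dimensions and the sample size come from the text.

```python
import random

N_COLS, N_ROWS = 260, 300   # grid dimensions quoted in the text
N_SAMPLES = 470             # sample size quoted in the text


def make_exhaustive(seed=0):
    """Map every (x, y) grid node to a synthetic placeholder value.

    These are random numbers, NOT the real Walker Lake measurements.
    """
    rng = random.Random(seed)
    return {(x, y): rng.uniform(0.0, 1000.0)
            for x in range(N_COLS) for y in range(N_ROWS)}


def draw_sample(exhaustive, n=N_SAMPLES, seed=1):
    """Pick n grid nodes at random (an assumed sampling scheme)."""
    rng = random.Random(seed)
    locations = rng.sample(sorted(exhaustive), n)
    return {loc: exhaustive[loc] for loc in locations}


def mean(values):
    """Arithmetic mean of an iterable of numbers."""
    values = list(values)
    return sum(values) / len(values)


exhaustive = make_exhaustive()
sample = draw_sample(exhaustive)

# The exhaustive set plays the role of the "correct answer";
# the sample set is all an analyst would normally see.
true_mean = mean(exhaustive.values())
estimate = mean(sample.values())
error = estimate - true_mean

print(f"{len(exhaustive)} exhaustive points, {len(sample)} samples")
print(f"estimate - truth = {error:.2f}")
```

The point of the sketch is the last few lines: because the exhaustive set is fully known, any estimate built from the sample can be scored against the truth, which is rarely possible in practice.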
Goals of the Case Studies
Using the 470 samples in the sample data set we will address the following problems:
1. The description of the important features of the data.
2. The estimation of an average value over a large area.
3. The estimation of an unknown value at a particular location.
4. The estimation of an average value over small areas.
5. The use of the available sampling to check the performance of an estimation methodology.
6. The use of sample values of one variable to improve the estimation of another variable.
7. The estimation of a distribution of values over a large area.
8. The estimation of a distribution of values over small areas.
9. The estimation of a distribution of block averages.
10. The assessment of the uncertainty of our various estimates.
The first question, despite being largely qualitative, is very important. Organization and presentation is a vital step in communicating the essential features of a large data set. In the first part of this book we will look at descriptive tools. Univariate and bivariate description are covered in Chapters 2 and 3. In Chapter 4 we will look at various ways of describing the spatial features of a data set. We will then take all of the descriptive tools from these first chapters and apply them to the Walker Lake data sets. The exhaustive data set is analyzed in Chapter 5 and the sample data set is examined in Chapters 6 and 7.
The remaining questions all deal with estimation, which is the topic of the second part of the book. Using the information in the sample data set, we will estimate various unknown quantities and see how well we have done by using the exhaustive data set to check our estimates. Our approach to estimation, as discussed in Chapter 8, is first to consider what it is we are trying to estimate and then to adopt a method that is suited to that particular problem. Three important considerations form the framework for our presentation of estimation in this book. First, do we want an estimate over a large area or estimates for specific local areas? Second, are we interested only in some average value or in the complete distribution of values? Third, do we want our estimates to refer to a volume of the same size as our sample data or do we prefer to have our estimates refer to a different volume?
In Chapter 9 we will discuss why models are necessary and introduce the probabilistic models common to geostatistics. In Chapter 10 we will present two methods for estimating an average value over a large area. We then turn to the problem of local estimation. In Chapter 11 we will look at some nongeostatistical methods that are commonly used for local estimation. This is followed in Chapter 12 by a presentation of the geostatistical method known as ordinary point kriging. The adaptation of point estimation methods to handle the problem of local block estimates is discussed in Chapter 13.
Following the discussion in Chapter 14 of the important issue of the search strategy, we will look at cross validation in Chapter 15 and show how this procedure may be used to improve an estimation methodology. In Chapter 16 we will address the practical problem of modeling variograms, an issue that arises in geostatistical approaches to estimation.
In Chapter 17 we will look at how to use related information to improve estimation. This is a complication that commonly arises in practice when one variable is undersampled. When we analyze the sample data set in Chapter 6, we will see that the measurements of the second variable, U, are missing at many sample locations. The method of cokriging presented in Chapter 17 allows us to incorporate the more abundant V sample values in the estimation of U, taking advantage of the relationship between the two to improve our estimation of the more sparsely sampled U variable.
The estimation of a complete distribution is typically of more use in practice than is the estimation of a single average value. In many applications one is interested not in an overall average value but in the average value above some specified threshold. This threshold is often some extreme value, and the estimation of the distribution above extreme values calls for different techniques than the estimation of the overall mean. In Chapter 18 we will explore the estimation of local and global distributions. We will present the indicator approach, one of several advanced techniques developed specifically for the estimation of local distributions.
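The difference between an overall average and an average above a threshold can be made concrete with a small Python sketch. The values and the threshold below are hypothetical, and the indicator convention used here (1 where a value is at or below the threshold, 0 otherwise) is an assumption chosen for illustration; the text names the indicator approach but does not define it at this point.

```python
def indicator(values, threshold):
    """Return 1 for each value at or below the threshold, else 0.

    This coding convention is an assumption made for this sketch.
    """
    return [1 if v <= threshold else 0 for v in values]


def proportion_below(values, threshold):
    """The mean of the indicators is the fraction of values <= threshold."""
    ind = indicator(values, threshold)
    return sum(ind) / len(ind)


def mean_above(values, threshold):
    """Average of the values that exceed the threshold (None if none do)."""
    above = [v for v in values if v > threshold]
    return sum(above) / len(above) if above else None


# Hypothetical concentrations in ppm; not taken from any real data set.
values = [0.0, 12.0, 45.0, 80.0, 150.0, 300.0, 950.0, 1200.0]
threshold = 500.0

overall_mean = sum(values) / len(values)
tail_mean = mean_above(values, threshold)
fraction_below = proportion_below(values, threshold)

print(overall_mean, tail_mean, fraction_below)
```

With these made-up numbers the overall mean says little about the two high values that dominate the tail, which is why a single average can be a poor summary when the quantity of interest lies above a threshold.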
A further complication arises if we want our estimates to refer to a volume different from the volume of our samples. This is commonly referred to as the support problem and frequently occurs in practical applications. For example, in a model of a petroleum reservoir one does not need estimated permeabilities for core-sized volumes but rather for much larger blocks. In a mine, one will be mining and processing volumes much larger than the volume of the samples that are typically available for a feasibility study. In Chapter 19 we will show that the distribution of point values is not the same as the distribution of average block values and present two methods for accounting for this discrepancy.
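A minimal Python sketch can show why point values and block averages have different distributions. It assumes a small synthetic grid of spatially independent, skewed random values and a hypothetical helper `block_averages`; neither comes from the text. Real earth science data are spatially continuous, so the reduction in spread would typically be less dramatic than in this independent case.

```python
import random
from statistics import mean, pvariance


def block_averages(grid, block):
    """Average a 2-D list of point values over non-overlapping
    block x block cells (the grid size must be a multiple of block)."""
    n_rows, n_cols = len(grid), len(grid[0])
    averages = []
    for r0 in range(0, n_rows, block):
        for c0 in range(0, n_cols, block):
            cell = [grid[r][c]
                    for r in range(r0, r0 + block)
                    for c in range(c0, c0 + block)]
            averages.append(mean(cell))
    return averages


# Synthetic, spatially independent values with a skewed distribution.
# These are placeholders, not the Walker Lake data.
rng = random.Random(42)
grid = [[rng.expovariate(1 / 100.0) for _ in range(40)] for _ in range(40)]

points = [v for row in grid for v in row]
blocks = block_averages(grid, block=10)

print("mean of points:", mean(points), " mean of blocks:", mean(blocks))
print("variance of points:", pvariance(points))
print("variance of blocks:", pvariance(blocks))
```

Because the blocks tile the grid exactly, the two means agree, while the block averages are far less variable than the point values. An estimate of, say, the proportion above a high threshold therefore depends on the volume to which it refers.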
In Chapter 20 we will look at the assessment of uncertainty, an issue that is typically muddied by a lack of a clear objective meaning for the various uncertainty measures that probabilistic models can provide. We will look at several common problems, discuss how our probabilistic model might provide a relevant answer, and use the exhaustive data set to check the performance of various methods.
The final chapter provides a recap of the tools discussed in the book, recalling their strengths and their limitations. Since this book attempts an introduction to basic methods, many advanced methods have not been touched; however, the types of problems that require more advanced methods are discussed and further references are given.
Before we begin exploring some basic geostatistical tools, we would like to emphasize that the case studies used throughout the book are presented for their educational value and not necessarily to provide a definitive case study of the Walker Lake data set. It is our hope that this book will enable a reader to explore new and creative combinations of the many available tools and to improve on the rather simple studies we have presented here.