A GENERAL THEORY FOR EVALUATING JOINT DATA
INTERACTION WHEN COMBINING DIVERSE DATA SOURCES
A DISSERTATION
SUBMITTED TO THE DEPARTMENT OF GEOLOGICAL AND
ENVIRONMENTAL SCIENCES
AND THE COMMITTEE ON GRADUATE STUDIES
OF STANFORD UNIVERSITY
IN PARTIAL FULFILLMENT OF THE REQUIREMENTS
FOR THE DEGREE OF
DOCTOR OF PHILOSOPHY
Evgenia I. Polyakova
April 2008
UMI Number: 3313643
Copyright 2008 by
Polyakova, Evgenia I.
All rights reserved.
UMI Microform 3313643
Copyright 2008 by ProQuest LLC.
All rights reserved. This microform edition is protected against
unauthorized copying under Title 17, United States Code.
We should also note here that the validity of the conditional independence assumption will strongly depend on the support of the unknown A. If the support of the unknown A is larger than those of the conditioning data, such an assumption of conditional independence can be justified.

Conveniently, another approach to obtain the joint data probability P(D_1, ..., D_n) in equation (2.5) is to assume full data independence. Assuming that the data events D_1, ..., D_n are jointly independent leads to:

    P(D_1, ..., D_n) = ∏_{i=1}^{n} P(D_i)    (2.7)

Hence, under both the assumption of conditional independence given A = a and the assumption of data independence, the sought-after fully conditional probability is written as:

    P(A = a | D_1, ..., D_n) / P(A = a) = ∏_{i=1}^{n} [ P(A = a | D_i) / P(A = a) ]    (2.8)

that is, the updating ratio associated with all data equals the product of the elementary updating ratios.
2.1.2 Heteroscedasticity
All the probabilities and conditional probabilities presented thus far are data-values and unknown-value dependent: P(A = a | D_i = d_i, i = 1, ..., n), written concisely as P(A | D_i, i = 1, ..., n). Critically, the spread (e.g. variance) of such probabilities, which relates to uncertainty, is data-values dependent, a situation we will refer to as "heteroscedastic". Conversely, independence of that spread from the data values and from the unknown value is referred to as homoscedasticity.
2.1. CONDITIONAL INDEPENDENCE 15
The roots of the word homoscedasticity, or invariability of the error variance, come from regression theory. It has been treated in some detail by many statistics and economics texts ([19], [38], [42], [72]). Most often the assumption of homoscedasticity is made as a matter of pure convenience, as it reduces considerably the modeling requirements. However, as has been pointed out by Downs [19], it is the study of heteroscedasticity which "may provide the only available evidence of interacting variables". Such interaction between data for any given unknown may change the naive assessment made from an association of individual data ignoring their interaction. Just like conditional independence, the homoscedasticity assumption should be documented and made with caution, not accepted blindly as a matter of pure convenience. Unfortunately, the assumption of homoscedasticity is much too often taken for granted, heteroscedasticity being seen as an illness to cure [19].

Some examples of homoscedasticity are:

• In regression theory and traditional geostatistics, the (regression) kriging weights are homoscedastic in that they depend only on the variogram/covariance model and the spatial geometry of the data, but are independent of the data values. Critically, the kriging (estimation) error variance is also data-values independent: an assumption, or a result, that is contrary to what is observed in practice.

For example, consider the two identical data configurations with different data values shown in Figure 2.1. The two geometric configurations of Figure 2.1 are identical: in both cases the two data D_1 and D_2 are located at the same distance from the unknown A. However, in Figure 2.1 (2), the much different data values (D_1 = 1%, D_2 = 15%) would most likely carry a greater error potential in the estimation of the unknown A than in Figure 2.1 (1), where the unknown A is surrounded by two consistently small values (D_1 = 1%, D_2 = 1.5%).
Figure 2.1: Spatial geometry of the data for two different data-value combinations. (1) unknown A is surrounded by two points with small data values; (2) unknown A is surrounded by the same two points but with very different data values, which can potentially lead to greater error.

16 CHAPTER 2. A REVIEW OF EXISTING MODELS

• The in-built assumption of homoscedasticity in regression has led to much effort to justify it [19], most notably by calling on the properties of the Gaussian random function. Indeed, a characteristic property of such a multivariate Gaussian distribution is that all conditional distributions are Gaussian, fully characterized by the conditional mean, which identifies the linear regression estimate, i.e. kriging, and the conditional variance, which is homoscedastic and identifies the non-conditional error variance or kriging variance:

    E{[A − A*]² | D_i = d_i, i = 1, ..., n} = E{[A − A*]²} = σ²_K    (2.9)

If one accepts without question the multivariate Gaussian distribution model, then the homoscedastic assumption need not call for any further discussion.
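The homoscedasticity of kriging can be made concrete with a small sketch. Assuming a simple kriging system with a hypothetical exponential covariance model and the two-data configuration of Figure 2.1, the weights and the kriging variance come out identical whichever data values are observed:

```python
# Sketch: simple kriging weights and variance depend only on the data
# geometry and the covariance model, never on the data values.
# Covariance model and coordinates are hypothetical.
import numpy as np

def simple_kriging(coords, target, values, cov=lambda h: np.exp(-h / 10.0)):
    # Data-to-data covariance matrix and data-to-unknown covariance vector
    d = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=2)
    K = cov(d)
    k = cov(np.linalg.norm(coords - target, axis=1))
    w = np.linalg.solve(K, k)          # kriging weights: geometry only
    var = cov(0.0) - w @ k             # kriging (error) variance
    est = w @ values                   # estimate (zero mean assumed, for brevity)
    return w, var, est

coords = np.array([[0.0, 0.0], [4.0, 0.0]])
target = np.array([2.0, 0.0])
w1, var1, _ = simple_kriging(coords, target, np.array([1.0, 1.5]))   # Fig 2.1 (1)
w2, var2, _ = simple_kriging(coords, target, np.array([1.0, 15.0]))  # Fig 2.1 (2)
print(np.allclose(w1, w2), np.isclose(var1, var2))  # True True
```

The two calls differ only in the data values, yet return the same weights and the same error variance, which is exactly the homoscedasticity criticized above.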
• The homoscedasticity of data errors.

Assume the availability of n data events D_1, ..., D_n that inform the unknown A, with corresponding error terms R_1, ..., R_n. These n data can then be modeled as:

    D_i = f_i(A) + R_i(A)    (2.10)

The measurement D_i is seen as a physical deterministic function f_i of the unknown A plus a random error or deviation R_i [48]. One can argue that the model D_i = f_i(A) + R_i(A) is absolutely general as long as the distribution of the errors R_i is accepted as dependent on the variable A. Hence, for A = a, the data remain random, D_i = f_i(a) + R_i(a), with the actual datum value d_i corresponding to a particular realization r_i (unknown) of the error random variable: d_i = f_i(a) + r_i. Then:

    P(D_i = d_i | A = a) = P(R_i = r_i | A = a)  ∀i

and:

    P(D_i = d_i, i = 1, ..., n | A = a) = P(R_i = r_i, i = 1, ..., n | A = a)    (2.11)
The joint data likelihood calls for the equally difficult-to-get joint likelihood of the n error RVs. Therefore several simplifying hypotheses are made, often without further justification. The errors R_i are assumed:

1. conditionally independent given A = a,
2. with (homoscedastic) distribution independent of A.

Under these two hypotheses, the joint data likelihood (2.11) becomes:

    P(D_i = d_i, i = 1, ..., n | A = a) = ∏_{i=1}^{n} P(R_i = r_i | A = a) = ∏_{i=1}^{n} P(R_i = r_i)    (2.12)

Lastly, a third hypothesis of Gaussian error distributions is commonly made.

We argue that errors are often directly related to the unknown A: a change in the unknown value should also be reflected in the distribution of the error term in equation (2.10). In geostatistics, one particular form of such heteroscedasticity is the commonly observed "proportional effect" [49], which refers to an increase in the spatial variance in areas with a greater local mean. In such cases, Var(R_i) is directly affected by the specific unknown value a [50].
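A heteroscedastic error model of type (2.10) is easy to simulate. The sketch below (all numbers hypothetical) draws measurements D = a + R(a) with an error standard deviation proportional to the unknown value a, mimicking the proportional effect:

```python
# Illustration of a heteroscedastic error model D_i = f_i(A) + R_i(A):
# the error spread grows with the unknown value a (proportional effect).
# The 20% relative error and sample size are hypothetical.
import numpy as np

rng = np.random.default_rng(0)

def measure(a, n=100_000, rel_err=0.2):
    # Error standard deviation proportional to the unknown value a
    return a + rng.normal(0.0, rel_err * a, size=n)

low, high = measure(1.0), measure(10.0)
print(low.std() < high.std())  # True: Var(R_i) depends on A = a
```

An estimator built under the homoscedastic hypothesis would assign the same error variance to both cases.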
2.1.3 Bayesian networks
Bayesian networks are often used to make a set of variables and their dependencies visually explicit. One example of such a network is the bi-directional Bayesian network [63], which is used to represent a joint multivariate probability distribution. For example, consider the tri-variate distribution of variables A, D_1, and D_2 shown in Figure 2.2. The graph of Figure 2.2 depicts all possible joint combinations of the variables, with the dependencies between these variables represented by the bi-directional arrows. Traditionally these dependencies are modeled by covariance-related measures of similarity. The Bayesian graph of Figure 2.2 considers not only the data dependence between the two data (nodes D_1 and D_2) but also that between the two data taken jointly (node D_1D_2). As seen in this figure, Bayesian nets are necessarily data-values dependent, requiring that dependencies be remodeled for each new data-value combination {a', d_1', d_2'} different from {a, d_1, d_2}.

To obtain the fully conditioned probability P(A = a | D_1 = d_1, D_2 = d_2) one would need to consider all the dependencies (bi-directional arrows) between the unknown A and the data events D_1, D_2 and, most critically, the joint data event D_1D_2. For example, the joint probability P(A, D_1, D_2) is derived as:

    P(A, D_1, D_2) = P(A) P(D_1 | A) P(D_2 | A, D_1)    (2.13)

If one assumes conditional independence of the data D_1 and D_2 given the third variable A, equation (2.13) simplifies into:

    P(A, D_1, D_2) = P(A) P(D_1 | A) P(D_2 | A)    (2.14)

Figure 2.2: Graphical representation of joint dependencies between variables A, D_1, and D_2.

This simplification is shown in Figure 2.3, with the resulting net requiring less modeling effort than that of Figure 2.2: most arrows starting from the joint data event D_1D_2 are no longer shown (not needed).
In climate studies, such a bi-directional relationship is called a "feedback". Positive feedbacks work to enhance the effect of the original forcing; negative feedbacks decrease or remove it. For example, the ice-albedo feedback [7] is the mechanism in which warming temperatures (D_1) lead to a reduction of ice and snow coverage (D_2), decreasing albedo (i.e. the reflection coefficient of the Earth's surface) and resulting in further snow and ice retreat, more absorption of heat, and warming of the air. Thus, the temperature (D_1) impacts the ice/snow cover (D_2). In return, the ice/snow cover (D_2) influences the temperature (D_1). Based on this polar amplification concept, high latitudes are the areas where global warming is expected to be most pronounced.

Figure 2.3: Graphical representation under conditional independence between variables D_1 and D_2 given A.

An example of a simplified bi-directional graph is shown in Figure 2.4 (1). In this graph, the variables B and C interact, affecting each other. However, at times the relationship between the variables can take a simpler form where only one variable B influences the other variable C; in a Bayesian network such a form of dependence is represented by a uni-directional arrow, as in Figure 2.4 (2). For example, a change of incoming radiation (B) may result in a change of ocean circulation (C) via a change of its thermal structure. However, ocean circulation has no impact on incoming radiation. Hence the relationship between radiative forcing and ocean circulation may be considered a uni-directional relationship.
Figure 2.4: Graphical representations of bi-directional (1) and uni-directional (2) Bayesian nets.

As another example, consider the joint probability P(A = a, B = b, C = c) using the three different joint representations of uni-directional Bayesian nets shown in Figure 2.5. In this Figure, the leftmost graph (1) represents the situation in which data event B is independent of both A and C, while data event C is dependent on A. The middle graph (2) represents the uni-directional dependence of data event C on data events B and A. Finally, graph (3) of Figure 2.5 is more complex, since the data event B also influences A.

The joint probability P(A = a, B = b, C = c) can be written for each of the three uni-directional graphs:

(1). P(A = a, B = b, C = c) = P(C = c | A = a) P(A = a) P(B = b)

(2). P(A = a, B = b, C = c) = P(C = c | A = a, B = b) P(A = a) P(B = b)

Figure 2.5: Graphical interpretation of the joint probability P(A, B, C) based on different sets of relationships between the three variables A, B, and C.
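Factorizations (1) and (2) can be evaluated directly once the marginal and conditional tables are specified. The sketch below uses hypothetical binary tables and checks that each factorization defines a valid joint distribution:

```python
# The factorizations (1) and (2) read off the graphs of Figure 2.5,
# evaluated on hypothetical marginal/conditional probability tables.
pA = {1: 0.3, 0: 0.7}                 # P(A = a)
pB = {1: 0.6, 0: 0.4}                 # P(B = b), independent of A here
pC_A = {(1, 1): 0.8, (1, 0): 0.2,     # P(C = c | A = a), keyed (a, c)
        (0, 1): 0.1, (0, 0): 0.9}
pC_AB = {(1, 1, 1): 0.9, (1, 1, 0): 0.1, (1, 0, 1): 0.5, (1, 0, 0): 0.5,
         (0, 1, 1): 0.3, (0, 1, 0): 0.7, (0, 0, 1): 0.1, (0, 0, 0): 0.9}
                                      # P(C = c | A = a, B = b), keyed (a, b, c)

def joint_graph1(a, b, c):
    # Graph (1): C depends on A only; B independent of both
    return pC_A[(a, c)] * pA[a] * pB[b]

def joint_graph2(a, b, c):
    # Graph (2): C depends on both A and B
    return pC_AB[(a, b, c)] * pA[a] * pB[b]

# Each factorization defines a valid joint distribution (sums to 1)
for joint in (joint_graph1, joint_graph2):
    total = sum(joint(a, b, c) for a in (0, 1) for b in (0, 1) for c in (0, 1))
    print(round(total, 10))  # 1.0
```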
simplify the computational cost associated with Bayesian nets [8], [10], [76]. This reduces considerably the effort to simulate the required dependencies. One such simplification is shown in Figure 2.6, where the two data events D_1 and D_2 are assumed to be conditionally independent relative to the third variable A. At the same time, the variable A is assumed independent of the variable B. In this Figure, the node A is referred to as the parent node, while nodes D_1 and D_2 represent its children. Assuming conditional independence then amounts to ignoring an important link (arrow) between the two conditioning data children D_1 and D_2. However, more often than not the data interact with each other. Such a link can be critical in modeling the joint probabilities.

A possible way to avoid the reliance of Bayesian nets on the assumption of conditional independence is to use a global representation (proxy image) of the joint distribution of all variables involved. Such a global representation provides a possible image of the joint interaction between data and unknown. In geostatistics such a representation of the joint distribution is called a "training image". This concept was introduced back in 1992, when Guardiano and Srivastava [35]

Figure 2.6: Graphical representation of conditional independence between variables D_1 and D_2 given A, and data independence between variables A and B.
and Journel [47] proposed to use a training image to represent the "type of heterogeneities that the geologists expect to be present in the actual subsurface reservoir" [70]. Such an image can be borrowed directly from a physical outcrop or could be obtained by computer simulation of the physics that govern data interaction and their relation with the unknown [70], [74]. For example, a training image could be obtained from an unconditional realization generated by an object-based algorithm [36]. Geologist expertise combined with massive modern computer power allows the generation of such an image. By scanning this image, one can retrieve directly all the required conditional probabilities as observed proportions, without any call for conditional independence.
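The scanning idea can be sketched in a few lines. The training image below is a synthetic stand-in (a smoothed random binary field, not a real geological image), and the single-pixel data template is hypothetical; the point is that P(A | B), P(A | C), and P(A | B, C) are all read off as observed proportions, with no independence assumption:

```python
# Sketch of the training-image idea: scan a synthetic binary image and
# read conditional probabilities directly as observed proportions.
# The image, template, and data events are all hypothetical.
import numpy as np

rng = np.random.default_rng(1)
ti = (rng.random((200, 200)) < 0.3).astype(int)          # stand-in image
# smooth horizontally to mimic EW-elongated sand bodies
ti = ((ti + np.roll(ti, 1, axis=1) + np.roll(ti, 2, axis=1)) >= 2).astype(int)

# Template: unknown A at u; datum B one pixel west; datum C one pixel north
A = ti[1:-1, 1:-1]
B = ti[1:-1, :-2]
C = ti[:-2, 1:-1]

mask = (B == 1) & (C == 1)                # joint data event B = 1, C = 1
p_joint = A[mask].mean()                  # P(A = 1 | B = 1, C = 1)
p_B = A[B == 1].mean()                    # P(A = 1 | B = 1)
p_C = A[C == 1].mean()                    # P(A = 1 | C = 1)
print(round(p_B, 3), round(p_C, 3), round(p_joint, 3))
```

Every reported value is a proportion over template replicates of the scanned image, so the joint B, C interaction given A is captured exactly as the image depicts it.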
As an example, consider the assessment of an unsampled event A from data events B and C where:

- A is the presence/absence of a subsurface channel sand at an unsampled location
- B indicates the presence/absence of sand data at nearby well locations
- C is the result of a seismic survey whose analysis leads to an indirect indication about channel occurrence [70]

Figure 2.7: Example of dual training images depicting the interaction between two data types B and C. Left: (1) facies map B for sand/no sand data. Right: (2) seismic signature C for seismic data.

A binary sand/no sand training image such as that of Figure 2.7(1) would give a concept of the spatial distribution of sand (here EW channels). Computer-based simulation of the seismic survey would provide the seismic signature of the training image (Figure 2.7(2)). The joint availability of the two related training images shown in Figure 2.7 allows retrieving all corresponding training probabilities of the type P(A|B), P(A|C), P(A|B, C), and thus evaluating the data B, C interaction given A.
Link to Markov chains

A commonly used model to represent a discrete-time (or 1D) stochastic process is that of a Markov chain [57]. The Bayesian net shown in Figure 2.6 can be
2.2. PROBABILITY COMBINATION ALGORITHMS 25
seen as a special case of such a chain. In a Markov process, any previous state is assumed irrelevant for predicting the probability of subsequent states given the current state.

The expression (2.42) is a ratio of data log-likelihoods. It is important to note that this τ_2 interaction weight is data-value and unknown-value dependent.

The distance x of expression (2.40) can then be re-written as:

    x = x_1 · ∏_{i=2}^{n} [ P(D_i | nonA, D_1, ..., D_{i-1}) / P(D_i | A, D_1, ..., D_{i-1}) ]    (2.43)

where τ_1 = 1.
38 CHAPTER 2. A REVIEW OF EXISTING MODELS
Generalizing then equation (2.41) to all (n − 1) weights leads to:

    P(D_i | nonA, D_1, ..., D_{i-1}) / P(D_i | A, D_1, ..., D_{i-1}) = [ P(D_i | nonA) / P(D_i | A) ]^{τ_i}

with the key result:

    τ_i(d_1, ..., d_n; a) = log[ P(D_i = d_i | A = nona, D̄_{i-1} = d̄_{i-1}) / P(D_i = d_i | A = a, D̄_{i-1} = d̄_{i-1}) ] / log[ P(D_i = d_i | A = nona) / P(D_i = d_i | A = a) ]  ∈ [−∞, +∞]    (2.44)-(2.45)

where D̄_{i-1} = d̄_{i-1} denotes the joint event {D_1 = d_1, ..., D_{i-1} = d_{i-1}}.

Substituting expression (2.44) into (2.43) for i = 3, ..., n, we get the Bordley-Journel expression:

    x / x_0 = ∏_{i=1}^{n} ( x_i / x_0 )^{τ_i}    (2.46)
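The tau model with the exact weights (2.45) reproduces the fully conditioned probability exactly. The sketch below builds a hypothetical joint distribution over binary A, D_1, D_2, computes τ_2 from (2.45) with τ_1 = 1, and checks that the tau-combined distance recovers P(A | D_1, D_2) obtained by direct conditioning:

```python
# Verify the tau model x/x0 = prod (x_i/x0)^tau_i on a synthetic binary
# joint distribution. All joint probabilities are hypothetical.
import math

# Joint probabilities P(A=a, D1=d1, D2=d2), keyed (a, d1, d2); sum = 1
p = {(1, 1, 1): 0.18, (1, 1, 0): 0.12, (1, 0, 1): 0.06, (1, 0, 0): 0.14,
     (0, 1, 1): 0.04, (0, 1, 0): 0.16, (0, 0, 1): 0.10, (0, 0, 0): 0.20}

def P(**ev):
    # Marginal/joint probability of any subset of {a, d1, d2} outcomes
    return sum(v for (a, d1, d2), v in p.items()
               if all({'a': a, 'd1': d1, 'd2': d2}[k] == x for k, x in ev.items()))

a, d1, d2 = 1, 1, 1
x0 = P(a=0) / P(a=1)                       # prior distance
x1 = P(a=0, d1=d1) / P(a=1, d1=d1)         # elementary distance, datum D1
x2 = P(a=0, d2=d2) / P(a=1, d2=d2)         # elementary distance, datum D2

# Exact tau_2 (eq. 2.45): conditional over elementary log likelihood ratios
num = math.log((P(a=0, d1=d1, d2=d2) / P(a=0, d1=d1)) /
               (P(a=1, d1=d1, d2=d2) / P(a=1, d1=d1)))
den = math.log((P(a=0, d2=d2) / P(a=0)) / (P(a=1, d2=d2) / P(a=1)))
tau2 = num / den

x = x0 * (x1 / x0) ** 1.0 * (x2 / x0) ** tau2     # tau model, tau_1 = 1
p_tau = 1.0 / (1.0 + x)
p_exact = P(a=1, d1=d1, d2=d2) / P(d1=d1, d2=d2)  # direct conditioning
print(round(p_exact, 6), round(p_tau, 6))         # the two values match
```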
Interpretation of the tau expression

The denominator of the τ-expression (2.45) measures how datum D_i = d_i discriminates the outcome A = a from nona. The numerator measures the same, but in the presence of the previous data D̄_{i-1} = d̄_{i-1} = {D_1 = d_1, ..., D_{i-1} = d_{i-1}}. Thus the ratio (of ratios) τ_i indicates how the discrimination power of D_i = d_i is changed by knowledge of the previous data D̄_{i-1} = d̄_{i-1} taken all together. Critically, this weight is specific, as mentioned before, to the ordering of the n data events D_1, ..., D_n, and is data-value and unknown-value dependent.

Consider the following specific values for the tau weights:

• τ_i = 1

This condition is satisfied when the two ratios in expression (2.45) are equal:

    P(D_i = d_i | A = nona, D̄_{i-1} = d̄_{i-1}) / P(D_i = d_i | A = a, D̄_{i-1} = d̄_{i-1}) = P(D_i = d_i | A = nona) / P(D_i = d_i | A = a)    (2.47)

When τ_i = 1, the ability of the datum (or data event) D_i = d_i to discriminate a from nona is unchanged by knowledge of the previous (i − 1) data events d̄_{i-1} = {D_1 = d_1, ..., D_{i-1} = d_{i-1}}. Relation (2.47) entails the following equality of log ratios:

    log[ P(D_i = d_i | A = nona, D̄_{i-1} = d̄_{i-1}) / P(D_i = d_i | A = a, D̄_{i-1} = d̄_{i-1}) ] = log[ P(D_i = d_i | A = nona) / P(D_i = d_i | A = a) ]    (2.48)
Note that the tau model with unit tau weights is less constraining than the assumption of conditional independence. While data conditional independence given both the unknown event A and its complement nonA leads to unit tau weights, the reverse is not true: unit tau weights need not imply any data conditional independence. It suffices that the two likelihoods in equation (2.48) be multiplied by a same constant factor; any such factor different from one would also result in τ_i = 1.

• τ_i = 0

A zero tau interaction weight occurs when the numerator log[ P(D_i = d_i | A = nona, D̄_{i-1} = d̄_{i-1}) / P(D_i = d_i | A = a, D̄_{i-1} = d̄_{i-1}) ] of expression (2.45) is equal to 0, leading to:

    P(D_i = d_i | A = nona, D̄_{i-1} = d̄_{i-1}) = P(D_i = d_i | A = a, D̄_{i-1} = d̄_{i-1})    (2.49)

In the presence of the previously used data D̄_{i-1}, the datum D_i is non-informative in that it does not discriminate event a from nona. Note, however, that considering a different data sequence might result in a τ_i weight different from 0. In such a case, the datum D_i does add valuable information about the unknown event A = a.

• τ_i > 1

1. If x_i > x_0, that is, if datum D_i by itself increases the distance to event A = a occurring as compared to the prior distance x_0, then the interaction factor τ_i > 1 makes that increase even greater.

2. Similarly, if x_i < x_0, that is, if datum D_i by itself decreases the prior distance to event A = a occurring, then the interaction factor τ_i > 1 makes that decrease even greater.

• If τ_i < 1, the previous conclusions are reversed.
In summary, Krishnan has provided a solution to the difficult task of obtaining the exact conditional probability P(A | D_1, ..., D_n). This is done through relation (2.46) by decomposing the problem into two simpler tasks:

• obtaining the information content through the individually conditioned probabilities P(A | D_i), with i = 1, ..., n;

• deriving the multiple-point joint data interaction tau parameters, whose exact expressions are known.

The tau interaction weights, in addition to being dependent on the specific ordering of the n data events D_1, ..., D_n, are data-values and unknown-value dependent. While such a form of dependence allows for a more comprehensive representation of the fully conditioned probability P(A | D_1, ..., D_n), it is too complex to be used in practice. This calls for approximations built from Krishnan's exact tau parameter expression. We argue that while sequence-dependent interaction weights are important in some applications, most often it is the global representation of such interaction that is desirable. Krishnan's derivation fails to provide a measure of such global data interaction.

Moreover, the τ_i weights are likely to exhibit an unstable behavior versus data values. When the information is non-discriminating, the denominator of expression (2.45) tends toward log 1 = 0, leading to an infinite tau weight τ_i → ∞, hence creating an inference problem.

Krishnan [50] did note that the inference of the interaction τ_i weights is quite difficult and that the behavior of these tau weights is not fully understood. He pointed out the need for further analysis, starting with synthetic data sets. These data sets should not only help develop a better understanding of the tau interaction parameters, but also lead to further theoretical developments.
Chapter 3
The nu representation
The overview presented in the previous Chapter pointed to the need to remain alert against any simplifying but potentially crippling hypothesis when it comes to data dependence and interaction. The need to consider data jointly, rather than one or two at a time, was also brought out. This is particularly critical when dealing with spatially distributed phenomena, where patterns of similar data carry valuable information beyond that carried by each datum individually.

This chapter builds the theoretical basis of the nu expression, which is a sister of the tau model proposed by Bordley [6] and Journel [48] and further developed by Krishnan [50]. In his thesis Krishnan gave the exact expression of the tau weights and showed them to be directly related to the data interaction associated with any specific sequence of data. With these weights the original tau model leads to an exact analytical solution to the problem of probabilistic data integration.

The major contribution of the nu expression proposed in the present thesis is the derivation of a single, data-sequence-independent, interaction parameter ν_0. The derivation of this ν_0 parameter and its estimation rely on the original idea of Journel's paper [48] that ratios of probabilities are more stable than the probabilities themselves, a well-proven engineering paradigm. The exact ν_0 expression given hereafter is too complex to be practical. However, the availability of such an exact expression leads to avenues for its approximation. In this chapter we propose two such approximations.
3.1 Derivation of the nu representation
3.1.1 The nu expression
Consider an unknown event A informed by n data events D_1, ..., D_n. These data have been evaluated for their individual information content related to the unknown event A through the elementary probabilities P(A | D_i)¹. The challenge is then to recombine the prior probability P(A) and the n single-event conditional probabilities P(A | D_i) into the posterior probability P(A | D_1, ..., D_n) while accounting for possible interaction among the data. The nu representation provides an exact expression for such recombination.

One of the well-proven paradigms for engineering approximation is the permanence of ratios: rates of increments are typically more stable than the increments themselves [48], [62]. Using this key idea, define the following distances to the unknown event A, prior to and after observing any single data event:

    x_0 = (1 − P(A)) / P(A) = P(Ā) / P(A) ∈ [0, ∞]: prior distance to A occurring, with Ā = nonA;

    x_i = (1 − P(A | D_i)) / P(A | D_i) = P(Ā | D_i) / P(A | D_i) ∈ [0, ∞]: updated distance knowing datum D_i.

x_i equals zero if P(A | D_i) = 1, and equals infinity if P(A | D_i) = 0.

The updated distance knowing jointly the n data is then:

¹Throughout this paper the short notation P(A | D_i) is used for the (a, d_i) values-specific expression P(A = a | D_i = d_i).
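The distance transform and its inverse are one-liners. The sketch below uses hypothetical probability values to illustrate the prior and elementary distances just defined:

```python
# The prior and elementary distances of Section 3.1.1, with the inverse
# map back to a probability. The probability values are hypothetical.
def distance(p):
    # x = (1 - P) / P in [0, +inf]; 0 when P = 1, +inf when P = 0
    return float('inf') if p == 0 else (1.0 - p) / p

def prob(x):
    # Inverse map: P = 1 / (1 + x)
    return 1.0 / (1.0 + x)

x0 = distance(0.4)   # prior distance to A, with P(A) = 0.4
x1 = distance(0.9)   # updated distance knowing D1, with P(A|D1) = 0.9
print(round(x0, 4), round(x1, 4), round(prob(x1), 4))  # 1.5 0.1111 0.9
```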
The conditional probability is immediately retrieved as:
P*(A = 0\B = 0,C = 0) = - (3.14)
Figure 3.1 gives the scatter plots of these two estimates versus the reference true probability (3.12). The x-axis relates to the 10,000 exact P(A = 0 | B = 0, C = 0) values and the y-axis to the approximations of this exact probability based on the ν_0 = 1 model (left graph) and on conditional independence (right graph).

The approximation of conditional independence leads to 585 illicit probabilities (greater than 1), that is, approximately 585/10,000 ≈ 6% of all estimated probabilities. The most severe case leads to the estimated probability P(A = 0 | B = 0, C = 0) = 14.5. Of course, in any application any such violation of the laws of probability would be corrected. In most cases, such correction would involve changing the elementary single datum-conditioned probabilities.

The probabilities estimated under the ν_0 = 1 model are always licit by definition when working with binary data. Also, observe the high correlation coefficient (0.83) of the ν_0 = 1 model results with the reference. This robustness of the ν_0 = 1 results is attributed to the original paradigm stating that ratios of probabilities are likely to be more stable than the probabilities themselves.

Table 3.2 gives the summary statistics of the 10,000 reference probabilities and the two sets of 10,000 approximations. Not only does the ν_0 = 1 model lead to licit probabilities better correlated with the reference probabilities, but, critically, it also reproduces the sample statistics of the reference much better than the model based on the conditional independence assumption. That assumption tends to over-compound the information of the two data, hence generating a positive bias (overestimation).
3.3. APPROXIMATIONS BASED ON THE NU DERIVATION 59
Figure 3.1: The scatterplots for the ν_0 = 1 model (left) and the conditional independence estimator (right) versus the reference. The illicit probabilities are shown above the red line. Beware of the different y-axis scaling of the two panels.

              reference    ν_0 = 1    conditional independence
mean            0.50         0.50         0.55
variance        0.056        0.038        0.150

Table 3.2: Summary statistics: means and variances of the 10,000 conditional probabilities P(A|B, C) and their approximations.

              reference    ν_0 = 1
mean            0.50         0.50
variance        0.0560       0.0562

Table 3.3: Summary statistics: means and variances of the 10,000 conditional probabilities P(A|B, C) and their transformed estimator.

To ensure that conditional independence does hold, one might consider transforming the eight original joint probabilities p_k of Table 3.1 for each of the 10,000 realizations. Such a transform amounts to tampering with actual observations (the p_k's) to fit convenient models and is generally not recommended.
One such transformation, which ensures conditional independence given both A and nonA, is:

    p_3^trans = p_1 (p_3 + p_4) / (p_1 + p_2)
    p_4^trans = p_2 (p_3 + p_4) / (p_1 + p_2)
    p_7^trans = p_5 (p_7 + p_8) / (p_5 + p_6)
    p_8^trans = p_6 (p_7 + p_8) / (p_5 + p_6)    (3.15)

This transformation makes the two approximations, given by conditional independence (given A and nonA) and by the ν_0 = 1 model, identical.
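A sketch of transformation (3.15), under the assumption that p_1, ..., p_4 index the four (B, C) outcomes within A (rows B = 0, 1; columns C = 0, 1) and p_5, ..., p_8 the same outcomes within nonA; the input probabilities are hypothetical:

```python
# Transformation (3.15): re-allocate the joint probabilities so that B and
# C are conditionally independent given A and given nonA. Indexing
# convention assumed as stated above; input values are hypothetical.
def make_ci(p):
    p1, p2, p3, p4, p5, p6, p7, p8 = p
    return [p1, p2,
            p1 * (p3 + p4) / (p1 + p2), p2 * (p3 + p4) / (p1 + p2),
            p5, p6,
            p5 * (p7 + p8) / (p5 + p6), p6 * (p7 + p8) / (p5 + p6)]

p = [0.10, 0.15, 0.05, 0.20, 0.12, 0.08, 0.18, 0.12]   # sums to 1
q = make_ci(p)

# Within the A block, P(B, C | A) must now factor as P(B | A) P(C | A)
pa = sum(q[:4])
assert abs(q[0] - (q[0] + q[1]) * (q[0] + q[2]) / pa) < 1e-12
print(round(sum(q), 6))  # 1.0: total probability is preserved
```

The transform keeps P(B | A) and the total mass within each block intact; only the split of the B = 1 row over C is altered, which is precisely the "tampering with observations" criticized in the text.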
Table 3.3 gives the summary statistics of the reference conditional probability P(A|B, C) and the estimator of that reference based on transformation (3.15). The mean of the 10,000 resulting approximations is equal to the mean of the exact probability (0.5). However, as can be seen from Figure 3.2, the resulting estimator based on that transform is poorly correlated with the reference true probability, with only a 0.41 coefficient of correlation. Note that the correlation coefficient of the ν_0 = 1 model applied to the original non-transformed data was 0.83.

Figure 3.2: The scatterplot of the estimator of the fully conditional probability P(A = 0 | B = 0, C = 0) based on transformed probabilities (y-axis) versus the reference (x-axis). The correlation coefficient between them is 0.41.
Instead of tampering with "data", i.e. changing the elementary single-datum conditioned probabilities P(A|B) or P(A|C), Journel [48] suggested standardizing the two conditional independence-based estimates P**(A = a | B, C) and P**(A = nona | B, C) as follows. Consider the conditional probability P(A | B, C) given by:

    P(A | B, C) = P(B, C, A) / [ P(B, C, A) + P(B, C, Ā) ] = P(B, C | A) P(A) / [ P(B, C | A) P(A) + P(B, C | Ā) P(Ā) ]

Assuming conditional independence given A and nonA leads to:

    P(A | B, C) = P(B | A) P(C | A) P(A) / [ P(B | A) P(C | A) P(A) + P(B | Ā) P(C | Ā) P(Ā) ]

That is:

    P(A | B, C) = S(A) / [ S(A) + S(Ā) ] ∈ [0, 1]    (3.16)

where S(A) = P(B | A) P(C | A) P(A) and S(Ā) = P(B | Ā) P(C | Ā) P(Ā).
Were expression (3.16) applied to the estimation of the complement event Ā = nonA, the following probability would be obtained:

    P(Ā | B, C) = S(Ā) / [ S(A) + S(Ā) ]

which ensures that:

    P(A | B, C) + P(Ā | B, C) = 1    (3.17)

Note that neither expression (3.16) nor (3.17) corresponds to conditional independence.
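The standardized estimator (3.16) is direct to implement. The sketch below uses hypothetical likelihoods and prior, and checks the closure property (3.17) by evaluating (3.16) for the event and for its complement:

```python
# Journel's standardization (3.16): normalize the two conditional-
# independence-based scores so the two complementary probabilities sum
# to one. Likelihood and prior values are hypothetical.
def standardized(pB_A, pC_A, pA, pB_nA, pC_nA):
    s_a = pB_A * pC_A * pA                 # S(A)
    s_na = pB_nA * pC_nA * (1.0 - pA)      # S(nonA)
    return s_a / (s_a + s_na)

p = standardized(0.7, 0.8, 0.4, 0.3, 0.5)  # P(A | B, C)
q = standardized(0.3, 0.5, 0.6, 0.7, 0.8)  # same formula for nonA
print(round(p + q, 6))  # 1.0: closure relation (3.17) holds
```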
The ν_0 = 1 model can be shown to identify the standardized expression (3.16). Indeed:

    P(A | B, C) = 1 / (1 + x) = 1 / (1 + x_1 x_2 / x_0) = S(A) / [ S(A) + S(Ā) ]    (3.18)

In summary, the no-data-interaction ν_0 = 1 model represents a significant contribution beyond the traditional hypothesis of data conditional independence. Conditional independence given A and nonA does lead to the ν_0 = 1 model, but there are many patterns of data dependence that also lead to the same ν_0 = 1 model.

Notwithstanding these advantages, the ν_0 = 1 model can be restrictive in some applications. It is thus important to consider the case when the global interaction weight ν_0 is different from one, as shown by the stockbrokers example below.
The stockbrokers case and ν_0 ≠ 1

Consider an uncertain decision to be made about buying a particular stock (A = 1). The prior probability is uninformative: P(A = 1) = 0.5, hence x_0 = 1.

1. Two stockbrokers (D_1, D_2) strongly advise to buy that stock:
P(A = 1 | D_1 = 1) = P(A = 1 | D_2 = 1) = 0.9, with x_1 = x_2 = 0.1/0.9 = 1/9 ≈ 0.11.

The likelihood of having the second broker advising a buy (D_2 = 1) in the presence of (A = 1, D_1 = 1) is much greater than in the presence of (A = 0, D_1 = 1),
Figure 3.3: Four training classes and their respective representative scores.
be exportable to the application field, much more so than the conditional probabilities. For example, one would not export to an actual subsurface hydrocarbon field direct porosity or permeability measurements taken from an analog outcrop. Instead, one may retain the more stable permeability ratio K_v(u)/K_h(u), with K_v being the vertical permeability at location u and K_h the horizontal permeability at that same location u.
In any particular application, we suggest that
1. the n elementary distances Xi, i. e. the n elementary single datum-conditioned
probabilities P(A\Di = di) be evaluated directly using the actual data values
di. As we mentioned before, this problem has received many solutions. For
3.3. APPROXIMATIONS BASED ON THE NU DERIVATION 71
example, numerous literature sources propose algorithms for obtaining conditional
probability via neural networks [3], [40], [41]. In geostatistics, one could
consider an indicator algorithm for modeling the elementary conditional distribution
functions [34]. Obtaining such elementary, single datum-conditioned,
probabilities is not in the scope of this thesis.
2. the single weight ν₀ modeling the joint data interaction be borrowed from a
proxy experiment (or training set) where the relations x/x₀ versus xᵢ/x₀ are known,
thus providing proxy values for ν₀.
The difficulty with borrowing such a proxy weight ν₀ is that this weight is (a; dᵢ, i =
1, . . . , n)-values dependent. The proposed ν₀ classification accounts for such data values
dependence, although approximating it through summaries (scores) of these data
values. That heteroscedasticity has its positive side: the ν₀ weights measuring joint
data interaction do depend on data values, as opposed to the homoscedastic kriging
variance and regression weights.
An example of the classified ν₀ approach
As an example of the classified ν₀ approach, consider obtaining the posterior conditional
probability P(A = 1|D, B), where D and B are two data events informing the unknown
A. For example, A could be a binary variable indicating the presence of sand at the
location u of a potential reservoir site. Data event D could be an indicator of sand
facies at nearby well locations, and data event B the indicator of sand facies at
more remote wells. Assume then the availability of the following prior, pre-posterior
probabilities, and training image:
• P(A = 1): prior probability of A occurring. Such probability could be obtained
from historic data. Note, this prior is common to both data events D and B.
• P(A = 1|D) and P(A = 1|B): probabilities of A occurring given the information
provided by data events D and B taken separately. This is equivalent (through
Bayes' relation) to knowing the respective likelihood functions P(D|A = 1) and
72 CHAPTER 3. THE NU REPRESENTATION
P(B|A = 1) of observing data event D or B given the unknown A = 1. For
example:

P(A = 1|D) = P(A = 1, D)/P(D) = P(D|A = 1) P(A = 1) / P(D)
           = P(D|A = 1) P(A = 1) / Σ_{a=0,1} P(D|A = a) P(A = a),    (3.20)

and similarly for P(A = 1|B).
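The Bayes inversion in (3.20) can be checked numerically; a minimal sketch, where the likelihood values are hypothetical illustration numbers rather than values from this case study:

```python
# Bayes relation (3.20): updating the prior P(A = 1) by a data event D.
# The likelihood values below are hypothetical illustration numbers.
def bayes_update(prior_a1, lik_d_a1, lik_d_a0):
    """P(A=1|D) = P(D|A=1)P(A=1) / sum_a P(D|A=a)P(A=a)."""
    num = lik_d_a1 * prior_a1
    return num / (num + lik_d_a0 * (1.0 - prior_a1))

p_a1_given_d = bayes_update(prior_a1=0.5, lik_d_a1=0.9, lik_d_a0=0.1)
print(round(p_a1_given_d, 2))  # 0.9
```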
• the training image of Figure 3.4. Such a training image is a synthetic representation
of the interaction between data events D and B and between them and the
unknown A. In practice a training image could be obtained from an outcrop or
built using a process-based simulation algorithm [70], [74]. The training image
allows retrieving the data values-dependent global interaction parameter ν₀ of
equation (3.5).
Next consider the templates defining the data events D and B as shown in
Figure 3.5 (1) and (2):
— the closest data event D comprises 4 data locations 10 meters
away from the unknown A(u). These 4 data are located at the corners of
a square centered at location u.
— the second data event B also comprises 4 data locations with the same
geometry, but located 15 meters away from the unknown u.
When conditioning only to either D or B alone, there are 2⁴ = 16 possible
combinations of binary data values to consider. When conditioning jointly to
both D and B data events, there are 2⁸ = 256 possible combinations of binary
data values. Note that ideally a training image such as that shown in Figure
3.4 should be large enough to depict all possible 256 data value combinations.
The training image provides replicates of the (D, B) joint data event and the
corresponding A value. That training image thus provides all probabilities of
Figure 3.4: Training image depicting the interactions between data and unknown (proportions 0.72 and 0.28).

Figure 3.5: Data events definitions.
the type P(A|B), P(A|D), and P(A|B, D), and consequently proxy values of the
ν₀ data interaction parameter, as defined in equation (3.5).
Regarding the inference issue, if the training image is not large and "rich"
enough to display enough replicates of all data events (taken jointly) found
in the actual field, one can reduce these data events through a few summary
statistics or scores. For example, the 256 (D, B) data values combinations of
this example could be summarized by two scores S₁ and S₂, where:
1. Score S₁ is the arithmetic average of the (4+4) = 8 data values
2. Score S₂ could be a measure of east-west connectivity calculated on the same
8 data values, as suggested by Zhang [74].
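The two scores above can be sketched in code. S₁ is the plain average; for S₂, Zhang's actual connectivity score is not reproduced here, so a simple stand-in is used: the fraction of east-west corner pairs whose two values are both sand (an assumption for illustration, not the published measure):

```python
# Two summary scores for an 8-value binary data event (corners of the
# D and B templates). S2 is a simple stand-in for Zhang's east-west
# connectivity score: the fraction of east-west corner pairs that are
# both sand (1). Corner ordering (NW, NE, SW, SE) is assumed.
def scores(d_vals, b_vals):
    vals = list(d_vals) + list(b_vals)
    s1 = sum(vals) / len(vals)            # arithmetic average of 8 values
    ew_pairs = [(0, 1), (2, 3)]           # (NW,NE) and (SW,SE) pairs
    pairs = [(v[i], v[j]) for v in (d_vals, b_vals) for i, j in ew_pairs]
    s2 = sum(a == b == 1 for a, b in pairs) / len(pairs)
    return s1, s2

s1, s2 = scores((1, 1, 0, 0), (1, 1, 1, 1))
print(s1, s2)  # 0.75 0.75
```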
That is, the 256 data value combinations of the data template of Figure 3.5(2)
have been summarized by only two scores S₁ and S₂. Figure 3.6 shows schematically
such dimension reduction, where the two scores S₁ and S₂ are plotted on the
x and y axes, respectively. Each pair (S₁, S₂) on the score map (Figure 3.6,
Figure 3.6: Training image (left) is summarized by the distribution of two summary scores shown on the score map (right).
right) corresponds to a particular training data occurrence with the configuration
shown in Figure 3.5(2).
Further, using a traditional classification technique such as k-means partitioning,
the score space is divided into clusters or classes of similar score values.
For example, Figure 3.6 shows nine such classes. For each such class, we can
retrieve a prototype ν₀ value by, for example, taking the average or median
of that class's training ν₀ values. These nine prototype ν₀ values are likely all
different from the value 1; they allow us to step away from the assumption of
no-data-interaction of the ν₀ = 1 model.
In the application phase, it is a simple task to find the training class closest to the
actual conditioning data scores and retrieve that class's prototype ν₀ value to combine
the elementary probabilities. The classified ν₀ paradigm is general in that:
1. The actual conditioning data event can be quite complex. In the example of
Figure 3.5 the conditioning data events D and B comprise 4 data points each.
In an actual application, the joint conditioning data event might comprise many
more than eight data points. Particularly important are the actual data score
values retained to find the closest training class and retrieve its prototype ν₀
value.
2. The actual conditioning scores need not match exactly any of the training class
scores. In other words, the actual conditioning data event does not need to have
exact replicates in the training image: it suffices to find the training class with
the closest set of score values.
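The application-phase lookup described above can be sketched as a nearest-centroid search in score space; the class centers and prototype ν₀ values below are hypothetical placeholders, not values from the case study:

```python
# Application phase of the classified nu0 approach (a sketch): find the
# training class whose score prototype is closest (Euclidean) to the
# actual conditioning-data scores, then retrieve that class's nu0.
# Class centers and nu0 prototypes below are hypothetical.
def closest_class(actual_scores, class_centers):
    def dist2(p, q):
        return sum((a - b) ** 2 for a, b in zip(p, q))
    return min(class_centers, key=lambda c: dist2(actual_scores, class_centers[c]))

class_centers = {"c1": (0.2, 0.1), "c2": (0.5, 0.4), "c3": (0.9, 0.8)}
nu0_prototypes = {"c1": 1.0, "c2": 1.3, "c3": 1.8}

c = closest_class((0.55, 0.45), class_centers)
print(c, nu0_prototypes[c])  # c2 1.3
```

The retrieved prototype ν₀ would then be plugged into equation (3.5) to combine the elementary probabilities.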
Of course, the set of scores retained should be chosen so that it reflects the main
characteristics of any specific joint conditioning data set. With too many scores, the
training image available may not offer enough replicates to fill in the score space
reliably.
Chapter 4
Application to binary data
The purpose of this chapter is to illustrate the nu model with applications to binary
data sets. A binary data set consists of two values coded as either zero or one. We will
sometimes refer to the category zero as mud/no sand and to the category one as sand,
following petroleum engineering convention. The reference binary data sets presented
in this work are assumed exhaustively known. Such reference data sets provide the
exact fully conditioned proportions and allow checking any approximation, including
those resulting from the ν₀ = 1 model and the classified ν₀ approach, against traditional
estimators based on data independence and conditional independence. Various
important parameters controlling data interaction are investigated. Particular focus
is given to the dependence of data interaction on data values. This heteroscedastic
dependence makes the inference of an accurate ν₀-model more difficult. The levels
of heteroscedasticity of the tau and nu parameters are compared; we expect the nu
weights to be more stable versus data values and hence easier to infer.
4.1 An elementary case study
4.1.1 Equilateral configuration
To investigate how the nu and tau parameters relate to data interaction, the following
simple experiment is proposed. It involves one unknown A located at the center of an
equilateral triangle and three data I₁, I₂, I₃ located at its three apices (Figure 4.1).
Figure 4.1: Spatial locations of the three data I₁, I₂, I₃ and the unknown A. The data-to-unknown distance is 5.77 and the data-to-data distance is 10.0.
All four variables are binary (0,1) and were generated by truncation of a simulated
Gaussian field.
More precisely, 100,000 unconditional joint realizations of the four corresponding Gaussian
random variables Z(u) are generated by LU decomposition of their 4×4 covariance
matrix (program LUSIM in [11]).
The isotropic covariance model used to build the covariance matrix is:

C(h) = exp(−h/r), with practical range 3r.
That range 3r is made variable from one set of 100,000 realizations to another set of
equal size. Each set allows one to study data dependence and interaction in evaluating
the central value A.
All standard Gaussian realizations, denoted z(u), are truncated at the median value
z = 0 to generate joint realizations of the four binary indicator variables:

i(u_α) = 1 if z(u_α) > 0, and i(u_α) = 0 otherwise,

where u₀ is the location of the central value A to be evaluated, and u_α, α = 1, 2, 3
are the three data locations.
The four variables being binary, there are a total of 2⁴ = 16 possible joint combinations
of their values. A joint probability of occurrence p_k, k = 1, . . . , 16, is assigned
to each of these 16 joint combinations (Table 4.1). Note that Σ_{k=1}^{16} p_k = 1.
      A  I1  I2  I3           A  I1  I2  I3
p1    0   0   0   0    p9    1   0   0   0
p2    0   0   0   1    p10   1   0   0   1
p3    0   0   1   0    p11   1   0   1   0
p4    0   0   1   1    p12   1   0   1   1
p5    0   1   0   0    p13   1   1   0   0
p6    0   1   0   1    p14   1   1   0   1
p7    0   1   1   0    p15   1   1   1   0
p8    0   1   1   1    p16   1   1   1   1

Table 4.1: Probability notation for the 16 joint occurrences.
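The simulation experiment above can be reproduced in miniature; the sketch below uses numpy's Cholesky factorization as a stand-in for GSLIB's LUSIM, takes the distances of Figure 4.1, and picks the practical range 3r = 20 as one arbitrary member of the range sweep:

```python
# Monte Carlo sketch of the LUSIM experiment: simulate the four correlated
# Gaussians (A, I1, I2, I3) by Cholesky decomposition of their covariance
# matrix, truncate at the median z = 0, and tabulate the 16 joint
# proportions p_k of Table 4.1.
import numpy as np

rng = np.random.default_rng(0)
r = 20.0 / 3.0                              # practical range 3r = 20
d = np.array([[0.00, 5.77, 5.77, 5.77],     # pairwise distances for
              [5.77, 0.00, 10.0, 10.0],     # (A, I1, I2, I3), Figure 4.1
              [5.77, 10.0, 0.00, 10.0],
              [5.77, 10.0, 10.0, 0.00]])
cov = np.exp(-d / r)                        # isotropic exponential covariance
L = np.linalg.cholesky(cov)
z = rng.standard_normal((100_000, 4)) @ L.T
ind = (z > 0).astype(int)                   # truncate at the median

codes = ind @ np.array([8, 4, 2, 1])        # bit code A*8 + I1*4 + I2*2 + I3
p = np.bincount(codes, minlength=16) / len(codes)
print(round(p.sum(), 6))  # 1.0
```

Under the positive spatial correlation, the concordant combinations (all zeros, all ones) are more frequent than the 1/16 that independence would give.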
The prior or marginal probability associated with any of the four binary variables is
p₀ = 0.5, corresponding to the prior distance:

x₀ = (1 − p₀)/p₀ = 1, with p₀ = P(A = 1) = 0.5.
Figure 4.2: Conditional probabilities. Concordant data case: A = I₁ = I₂ = I₃ = 1. The ν₀ = 1 model outperforms the model based on the conditional independence assumption, as seen from the ν₀ = 1 model values being closer to the reference probability.
The 16 probabilities p_k are set equal to the corresponding 16 proportions of joint
occurrence calculated from each set of 100,000 simulated realizations of the four variables
A, I₁, I₂, I₃. From such a consistent set of probabilities of joint occurrence,
all conditional probabilities can be retrieved and plotted versus the practical range 3r
(Figure 4.2).
For example, the probability that A = 1 given that all three indicator data are 1 is
This ratio is none other than expression (3.4) of the distance x under data conditional
independence given A = 1 and A = 0, i.e. expression (4.5) entails the ν₀ = 1 model.
However, the ν₀ = 1 model is not necessarily based on the two previous assumptions of
conditional independence (given A = 1 and A = 0); in that regard the ν₀ = 1 model
is a less restrictive hypothesis, that of no-data-interaction.
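That conditional independence given both A = 1 and A = 0 entails the ν₀ = 1 model can be verified numerically. The sketch below builds a hypothetical p-table (Table 4.1 indexing) under conditional independence with likelihoods P(Iⱼ = 1 | A = 1) = 0.9 and P(Iⱼ = 1 | A = 0) = 0.1, and checks that the ν₀ = 1 combination reproduces the exact fully conditioned probability:

```python
# p-table built under conditional independence given both A = 1 and A = 0
# (hypothetical likelihoods 0.9 and 0.1). Index = A*8 + I1*4 + I2*2 + I3,
# matching Table 4.1 (p_k <-> index k-1).
def marginal_a1(p):
    return sum(p[8:])                # codes 8..15 have A = 1

def cond_a1_single(p, j):            # P(A = 1 | I_j = 1), j in {1, 2, 3}
    bit = {1: 4, 2: 2, 3: 1}[j]
    num = sum(p[k] for k in range(8, 16) if k & bit)
    den = sum(p[k] for k in range(16) if k & bit)
    return num / den

def nu0_1_estimate(p):               # combine the singles with nu0 = 1
    p0 = marginal_a1(p)
    x0 = (1 - p0) / p0
    x = x0
    for j in (1, 2, 3):
        pj = cond_a1_single(p, j)
        x *= ((1 - pj) / pj) / x0
    return 1 / (1 + x)

def exact(p):                        # P(A=1 | I1=I2=I3=1) = p16/(p8+p16)
    return p[15] / (p[7] + p[15])

p = []
for a in (0, 1):
    q = 0.9 if a == 1 else 0.1       # P(I_j = 1 | A = a)
    for i1 in (0, 1):
        for i2 in (0, 1):
            for i3 in (0, 1):
                pr = 0.5
                for i in (i1, i2, i3):
                    pr *= q if i else 1 - q
                p.append(pr)

print(round(exact(p), 5), round(nu0_1_estimate(p), 5))  # both 0.99863
```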
Figure 4.3: Data values-dependent error associated with the ν₀ = 1 model. The largest error is attributed to the cases when all three data I₁, I₂, I₃ are concordant (the cases [1,1,1] and [0,0,0]), deviating most from the assumption of no-data-interaction.
From Figure 4.2, the ν₀ = 1 model is seen to provide a better approximation than the
conditional independence hypothesis (given A = 1), increasingly better as the correlation
range 3r increases. The ν₀ = 1 model corresponds to a hypothesis of no-data-interaction
which becomes increasingly poorer as the correlation within the data and between the
data and the unknown increases. In the case of concordant data (I₁ = I₂ = I₃ = 1) used
to evaluate the probability of A = 1, ignoring data interaction by assuming ν₀ = 1
leads to over-compounding the three individual probabilities
P(A = 1 | I_k = 1), an overestimation increasing with the correlation range.
Interestingly, the conditional independence approximation (given A = 1) leads to an
underestimation of the exact fully conditioned probability P(A = 1 | I₁ = I₂ = I₃ = 1).
Dependence on Data Values
To evaluate how the ν₀ = 1 approximation fares depending on the set of three data
values, Figure 4.3 plots the error
Figure 4.4: The sequence-dependent νᵢ weights for the data concordant case A = I₁ = I₂ = I₃ = 1. The first weight ν₁ is equal to 1 by definition. The third weight ν₃ reflects the greatest interaction. All three weights increase with the correlation range since all data are concordant.
[P*_{ν₀=1}(A = 1 | i₁, i₂, i₃) − P(A = 1 | i₁, i₂, i₃)]

for the 2³ = 8 possible sets of data values (I₁ = i₁, I₂ = i₂, I₃ = i₃).
As expected, ignoring data interaction leads to increasing errors as the correlation
range increases, with overestimation when two or more of the data are valued
1 and underestimation in the other cases. Also, the largest error occurs when the
three data values are concordant, (1, 1, 1) or (0, 0, 0), the cases which contradict most
the assumption of no-data-interaction.
Exact Nu Weights
Availability of the exhaustive set of 16 joint probabilities p_k allows calculation of the
exact (νᵢ, ν₀) weights as defined by expression (3.5). Figure 4.4 shows the three data
sequence-dependent νᵢ weights calculated for the case A = I₁ = I₂ = I₃ = 1. Three data
I₁, I₂, I₃ produce 3! = 6 possible data sequences. However, because of the equilateral
data configuration (Figure 4.1) associated with an isotropic correlation, the data
sequence does not matter here. The first datum in any sequence always receives a
unit weight ν₁ = 1. As the correlation range increases, the interaction between the
first two data increases, leading to an increasing second nu weight ν₂. The third weight
ν₃ reflects the even greater interaction between the first two data and the last one.
Note that data interaction, and hence the nu weights, are data values-dependent; that
interaction is maximal here when all data are concordant, I₁ = I₂ = I₃ = 1, and it
increases with the correlation range.
Figure 4.5: The single sequence-independent ν₀ weight: (1) with only two concordant data A = Iᵢ = Iⱼ = 1 and I_k = 0, with i ≠ j ≠ k (solid line), and (2) with all data concordant, A = I₁ = I₂ = I₃ = 1 (circled line). The interaction is greatest when all three data I₁, I₂, I₃ = 1 are concordant. This interaction increases with the correlation range.
Figure 4.5 gives the single, data sequence-independent, exact ν₀ weight for the case
when all data are concordant, A = I₁ = I₂ = I₃ = 1 (solid curve marked by circles),
and for the case with only two concordant data, A = Iᵢ = Iⱼ = 1 with i ≠ j = 1, 2, 3
Figure 4.6: The averaged error associated with the data values-dependent ν₀ model and with the ν₀ = 1 model. The data values-dependent ν₀ model shows a significant improvement, reflected in smaller errors.
(solid curve). Note that in this case it does not matter which two of the three data are
concordant, because of the equilateral data configuration (Figure 4.1). In the presence of
concordant values A = I₁ = I₂ = I₃ = 1, the strong data interaction is expressed
through an exact ν₀ value increasingly different from 1 as the data become more dependent
on one another. With only two concordant data, the ν₀ weight still increases
with the range. However, as expected, this increase is less dramatic than for the case
with three concordant data.
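The exact ν₀ weight plotted above is read directly off its definition. A minimal sketch, with hypothetical probability values (the true values would come from the p_k table or the training image):

```python
# Exact nu0 from its definition (expression (3.5)):
#   x/x0 = nu0 * prod_i (x_i/x0)   =>   nu0 = (x/x0) / prod_i (x_i/x0),
# with distances x = (1 - P)/P. Probability values are hypothetical.
def distance(prob):
    return (1.0 - prob) / prob

def nu0(prior, singles, joint):
    x0 = distance(prior)
    prod = 1.0
    for pi in singles:
        prod *= distance(pi) / x0
    return (distance(joint) / x0) / prod

# three concordant data, each alone giving P(A=1|I_i=1) = 0.8; the redundant
# data jointly raise the probability only to 0.9 instead of compounding
v = nu0(prior=0.5, singles=(0.8, 0.8, 0.8), joint=0.9)
print(round(v, 3))  # 7.111
```

A value ν₀ ≠ 1 here expresses that the joint distance is larger than the no-data-interaction product would predict, i.e. the three redundant data should not be fully compounded.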
Our inference paradigm consists of two steps. We first evaluate the single datum-conditioned
probabilities P(A = 1 | Iⱼ = 1), j = 1, 2, 3, using the actual data from the
actual field under study. We then use some training image or expert catalog to obtain
the data values-dependent ν₀ weight and export it to the actual field under study to
combine the previous single datum-conditioned probabilities.
For example, assume that from some prior expertise (perhaps built from experiments
on training data sets similar to that used in this study) we have access to the following
ν₀ weight function:

ν₀ = 1, whatever the data values, for any small range 3r < 6;
ν₀ = 1 + (…)/12000, an increasing function of the practical range, for 3r > 6.
This function is assumed applicable only when two or more data are valued 1. The
error graphs (Figure 4.3) are re-calculated using this improved ν₀-model. The results
of Figure 4.6 show a significant reduction of the error and demonstrate that the
worth and practicality of the nu/tau approach depends on the ability to go beyond the
approximation ν₀ = 1.
4.1.2 Non-equilateral configuration
For this second example, a non-equilateral configuration of three data was retained to
observe the impact of data locations on data interaction. Figure 4.7 shows the data
configuration, and Table 4.2 gives the corresponding Euclidean distances.
The study built around this data configuration is similar to that done for the equilateral
case. Figure 4.8 shows the conditional probabilities associated with the case
A = I₁ = I₂ = I₃ = 1. Note that data values concordance represents an unfavorable
case for any independence-related approximation.
For this example we also included one more estimator for comparison with the results
based on the ν₀ = 1 model. This estimator considers a hypothesis of data independence
combined with the hypothesis of conditional independence given A = 1.
      A       I1      I2      I3
I1    10.63   0.00    21.40   3.61
I2    11.18   21.40   0.00    22.83
I3    11.66   3.61    22.83   0.00

Table 4.2: Distances between data-to-unknown and data-to-data.
Figure 4.7: Non-equilateral data configuration.
We will call that combination of hypotheses "full independence". The resulting
approximation is written:

P*(A | I₁, I₂, I₃) = P(A, I₁, I₂, I₃) / P(I₁, I₂, I₃)
                  = P(A) P(I₁ | A) P(I₂ | A, I₁) P(I₃ | A, I₁, I₂) / P(I₁, I₂, I₃)
The numerator per conditional independence given A = 1 is written:
Figure 4.8: Conditional probabilities for the non-equilateral case with A = I₁ = I₂ = I₃ = 1. The estimate based on the full independence assumption (line marked by points) leads to a large over-compounding of the concordant information. The conditional independence estimate (line marked by plus signs) gives a probability that is less than 0.5 for small (< 21) ranges. The ν₀ = 1 model (dash-dotted line) provides consistently better results.
P(A) P(I₁ | A) P(I₂ | A) P(I₃ | A). The denominator per data independence is written:
P(I₁) P(I₂) P(I₃). Thus,

P*(A | I₁, I₂, I₃) = P(A) P(I₁ | A) P(I₂ | A) P(I₃ | A) / [P(I₁) P(I₂) P(I₃)]
Figure 4.9: Checking the consistency relation. Case I₁ = I₂ = I₃ = 1. The ν₀ = 1 model produces licit probabilities. The estimates based on data independence assumptions (both conditional and full independence) do not follow the general law of probabilities, which requires the probabilities to sum to 1.
Or, equivalently:

P*(A | I₁, I₂, I₃) / P(A) = [P(A | I₁)/P(A)] · [P(A | I₂)/P(A)] · [P(A | I₃)/P(A)]    (4.7)
For example, in terms of the p_k's of Table 4.1, the probability P(A = 1 | I₁ = 1) is
obtained as:

P(A = 1 | I₁ = 1) = Σ_{k=13}^{16} p_k / (Σ_{k=13}^{16} p_k + Σ_{k=5}^{8} p_k),

and P(A = 1) = Σ_{k=9}^{16} p_k.
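The consistency check of Figure 4.9 can be reproduced in miniature. The sketch below builds a hypothetical exchangeable p-table (the probability of a joint outcome depends only on its count of ones, a symmetric case with data dependence), then sums the estimates of P(A = 1 | ·) and P(A = 0 | ·) at I₁ = I₂ = I₃ = 1 for the three estimators:

```python
# Hypothetical exchangeable p-table; index = A*8 + I1*4 + I2*2 + I3.
w = {0: 0.2, 1: 0.05, 2: 1 / 30, 3: 0.05, 4: 0.2}
p = [w[bin(k).count("1")] for k in range(16)]
A, I1, I2, I3 = 3, 2, 1, 0                   # bit positions

def P(event):                    # probability of a partial assignment
    return sum(p[k] for k in range(16)
               if all((k >> s) & 1 == v for s, v in event))

def est_nu0_1(a):                # the nu0 = 1 combination
    x0 = (1 - P([(A, a)])) / P([(A, a)])
    x = x0
    for j in (I1, I2, I3):
        pj = P([(A, a), (j, 1)]) / P([(j, 1)])
        x *= ((1 - pj) / pj) / x0
    return 1 / (1 + x)

def est_cond_indep(a):           # conditional independence given A = a
    num = P([(A, a)])
    for j in (I1, I2, I3):
        num *= P([(A, a), (j, 1)]) / P([(A, a)])
    return num / P([(I1, 1), (I2, 1), (I3, 1)])

def est_full_indep(a):           # "full independence", expression (4.7)
    r = P([(A, a)])
    for j in (I1, I2, I3):
        r *= (P([(A, a), (j, 1)]) / P([(j, 1)])) / P([(A, a)])
    return r

for est in (est_nu0_1, est_cond_indep, est_full_indep):
    print(est.__name__, round(est(1) + est(0), 4))
# only the nu0 = 1 estimate sums to 1
```

On this table the conditional independence sums fall short of 1 and the full independence sums exceed 1 (the latter even yielding an illicit P* > 1 for A = 1), while the ν₀ = 1 estimates sum to 1 by construction.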
From Figure 4.8, we observe that the estimate (4.7) based on "full independence"
leads to a large over-compounding of the concordant information I_k = 1. Conditional
independence (4.4) gives an estimate which is less than the prior probability (0.5)
at small ranges; this represents a severe error since all three individual probabilities
P(A = 1 | Iⱼ = 1) are above the prior. Again the ν₀ = 1 approximation (4.3) provides
Figure 4.10: Approximation errors for the eight data value configurations. The conditional probability estimated through the ν₀ = 1 model (solid lines) has more stable and smaller errors; the conditional independence assumption (lines marked with stars) leads to the largest errors.
Figure 4.9 shows the sum of the two estimates P*(A | I₁, I₂, I₃) + P*(nonA | I₁, I₂, I₃)
for the three sets of approximations. That sum should be equal to 1. It appears
that only the estimate associated with ν₀ = 1 verifies that consistency relation for all
ranges. The two independence-based estimates (4.4) and (4.7) are not self-consistent
(over A and nonA), particularly the estimate based on conditional independence. This
consistency represents a valuable in-built property of the ν₀ = 1 approximation in
presence of data dependence.
The approximation errors, defined as:

[P*_{ν₀=1}(A = 1 | i₁, i₂, i₃) − P(A = 1 | i₁, i₂, i₃)]
[P*_{CI}(A = 1 | i₁, i₂, i₃) − P(A = 1 | i₁, i₂, i₃)]
[P*_{FI}(A = 1 | i₁, i₂, i₃) − P(A = 1 | i₁, i₂, i₃)]
Figure 4.11: Error linked to ν₀ = 1 (non-equilateral case). The errors attributed to the ν₀ = 1 model are small and stable, attesting that this model is the best among those presented.
Figure 4.12: Error linked to the "full independence" hypothesis (non-equilateral case). The largest error is attributed to the case when all three data are equal to 1. This is the case where the consequence of the wrong assumption of full independence is most severe.
Figure 4.13: Error linked to conditional independence (non-equilateral case). The errors are large and unstable. The positive errors, associated with overestimation of the true conditional probability (A = 1), are higher than the negative errors associated with underestimation.
for each of the eight data values combinations when estimating A = 1 are plotted in
Figure 4.10. Again, the conditional probability estimated through the ν₀ = 1 assumption
has more stable and smaller errors; the conditional independence assumption leads
to the largest errors. Figures 4.11, 4.12, and 4.13 give the errors specific to each estimate
with indication of the three data values. Beware of the different ordinate axis scaling.
The errors associated with the ν₀ = 1 estimate (4.3) are small and centered around
zero (Figure 4.11). That error is smallest when the two close-by data are different
(I₁ ≠ I₃), corresponding to data values less conflicting with the underlying no-data-interaction
hypothesis. The ν₀ = 1 model appears to downplay the contribution of
the isolated I₂ datum value: in Figure 4.11 the two error curves for I₂ = 0 and I₂ = 1
are similar for any given combination of the I₁, I₃ data values. The smallest errors
for the ν₀ = 1 model are related to cases of non-concordant data values, particularly
non-concordant I₁ and I₃ values, i.e. 001, 011, and 110.
Figure 4.12 shows the errors for the "full independence" estimate. The error is largest
Figure 4.14: Bias (error) averaged over all data values combinations (non-equilateral case). The full independence and ν₀ = 1 models provide reasonably unbiased estimates, while conditional independence leads to severe overestimation.
for the case of data I₁ = I₂ = I₃ = 1 concordant with the outcome A = 1 being
evaluated. In such a case, the assumption of data independence is most invalid. The
most significant result is the large error associated with the conditional independence
estimate; see Figure 4.13. The errors are much larger and more unstable than for
the other two estimates. Also, the positive errors associated with overestimation of
the true conditional probability that A = 1 are much higher than the negative errors
associated with underestimation, leading to an overall bias.
Figure 4.14 shows the bias, or error averaged over the eight data value combinations,
when estimating the probability that A = 1. On average, the ν₀ = 1 model (4.3) and
the "full independence" model (4.7) provide reasonably unbiased estimates, while the
estimate based on conditional independence leads to a severe overestimation of the
reference posterior probability.
4.2 A 3D case study
The applicability of the ν₀ inference paradigm is now tested using a large 3D reference
binary data set where all conditional probabilities involved in the tau and nu expressions
(2.32) and (3.6) are known, including the exact fully data-conditioned probability
P(A = a | Dᵢ = dᵢ, i = 1, . . . , n). Various approximations of that reference probability
can be evaluated. The heteroscedasticity of the ν₀, νᵢ and τᵢ weights, i.e. their level
of dependence on the data values (dᵢ, i = 1, . . . , n), can be evaluated. The greater that
heteroscedasticity, the more difficult the inference of these data interaction parameters
in practice.
4.2.1 The reference data set
We start by generating a reasonably large 3D non-conditional realization of a Gaussian
field using the sequential Gaussian simulation code sgsim of the GSLIB software [11].
This 3D field is of size 100×100×50, comprising 500,000 nodes. The variogram model
used is spherical with a small nugget (10%), an isotropic horizontal range equal to 50
pixel units, and a shorter vertical range equal to 20 pixel units. This Gaussian field is
then truncated at its upper quartile value, yielding the reference binary indicator field
shown in Figure 4.15.
Denote that reference field by S : {A(u) = 0 or 1, u ∈ S}, with P(A(u) = 1) =
0.25. In this work, we will sometimes refer to the binary data valued 1 as sand;
conversely, the binary data valued 0 will be referred to as non-sand or mud. We
borrowed this convention from petroleum engineering, where the location of channel
sand is of great interest. Figure 4.16 gives the reference indicator variograms
in the x, y, z directions calculated from indicator data of the top 35 layers of S;
the reason for excluding the bottom 15 layers will become apparent shortly.
Those indicator variograms reflect the horizontal-to-vertical anisotropy of the original
Gaussian field.
Figure 4.15: Reference binary image generated by truncating a continuous Gaussian realization at its upper quartile (P(A = 1) = 0.25; the A = 0 category, of proportion 0.75, is shown in black).
Figure 4.16: Exhaustive indicator variograms, calculated over the 35 top layers. EW is the east-west direction and NS is north-south direction.
Figure 4.17: Data events definition: (1) conditioning to one data event D; (2) conditioning to two data events D, B; (3) conditioning to three data events D, B, C.
4.2.2 The estimation configuration
Consider the evaluation of the conditional probability of an unsampled value A(u) = 1,
given any combination of the following three multiple-point data events (Figure 4.17
(1), (2), (3)):
• the closest data event D comprises four data locations at the level just below
that of A(u). These four data are at the corners of a square centered on the
projection of location u on their level (Figure 4.17 (1));
• the next closest data event B also comprises four data locations with the same
geometry as for data event D, but located five levels below that of A(u) (Figure
4.17 (2));
• the furthest away data event C again comprises four data locations, but located
15 levels below that of A(u) (Figure 4.17 (3)).
If the unsampled location u of A(u) spans only the eroded field
S₀ = {x = 11, . . . , 90; y = 11, . . . , 90; z = 16, . . . , 50}, then each value A(u) can be
evaluated by any of the 3 data events D, B, C. From here on, all statistics will refer
to that "common denominator" field S₀, comprising 224,000 nodes. Over that central
field S₀, the marginal statistic for the event A = 1 being assessed is P(A) = 0.274.
The definition of an "eroded" field S₀ common to all data configurations entails that
the spatial averages of conditional probabilities (proportions) remain the same no
matter the conditioning data event retained. For example, if conditioning is only to
the sole D-data event:

P(A|D) = (1/|S₀|) Σ_{u∈S₀} P(A(u) = 1 | D = d(u)) = P(A = 1) = 0.274
where the data event D can take 2⁴ = 16 possible combinations of data values. When
conditioning jointly to the two D and B data events:

P(A|D, B) = (1/|S₀|) Σ_{u∈S₀} P(A(u) = 1 | D = d(u), B = b(u)) = P(A = 1) = 0.274

where the data event (D, B) can take 2⁸ = 256 possible combinations of binary data
values.
When conditioning jointly to the three data events D, B and C:

P(A|D, B, C) = (1/|S₀|) Σ_{u∈S₀} P(A(u) = 1 | D = d(u), B = b(u), C = c(u)) = P(A = 1) = 0.274

where the data event (D, B, C) can take 2¹² = 4096 possible combinations of data
values. Because S₀ is not that large, |S₀| = 224,000 nodes, not all 2¹² data values
4.2. A 3D CASE STUDY 99
combinations are present in S₀; this does not affect, however, the previous equality:
P(A = 1|D, B, C) = P(A = 1) = 0.274.
Note also that the nu representation (3.6) does not restrict us to the point
support of the unknown A (as in this example). The unknown event A can similarly be
defined as a data event, provided we can find enough replicates of such a data event
in our reference binary data set.
4.2.3 Conditional probabilities and estimates
As an example, Figure 4.18(1) gives the S₀-volume of the 224,000 exact probability
values P(A = 1|D, B, C), which are valued in the interval [0, 1] with mean 0.274 and
variance 0.067. Again, the mean is equal to that of the reference binary values. The
histogram of the probability values is given in Figure 4.18(2). We will use this field
as the comparison tool for future analysis. Similar figures and statistics are available
for all the following conditional probabilities, although not all are given here:
• single data event-conditioned: P(A(u) = 1|D), P(A(u) = 1|B), P(A(u) = 1|C)
• two data events-conditioned: P(A(u) = 1|D, B), P(A(u) = 1|D, C),
P(A(u) = 1|B, C)
• all three data events-conditioned: P(A(u) = 1|D, B, C)
• the estimated probability P*(A(u) = 1|D, B, C) using the ν₀ = 1 model (3.6)
to combine the previous single data event-conditioned probabilities.
Under the ν₀ = 1 model, at each location u the estimate is:
\[ P^*(A(u) = 1 \mid D, B, C) = P^*(A(u) = 1 \mid D = d(u),\ B = b(u),\ C = c(u)) = \frac{1}{1 + x^*(u)}, \]
with the estimated distance x*(u) such that:
\[ \frac{x^*(u)}{x_0} = \frac{x_D(u)}{x_0} \cdot \frac{x_B(u)}{x_0} \cdot \frac{x_C(u)}{x_0} \]
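The combination above is simple to implement. The sketch below (a minimal version; the elementary probabilities passed in are hypothetical illustration values, only the prior P(A = 1) = 0.274 comes from the case study) turns single data-event conditioned probabilities into distances, multiplies their standardized ratios, and converts back to a probability:

```python
def distance(p):
    """Distance x = P(A != 1) / P(A = 1) associated with a probability p."""
    return (1.0 - p) / p

def nu0_estimate(p_prior, p_singles, nu0=1.0):
    """Combine single data-event conditioned probabilities under the nu model.

    p_prior   : marginal probability P(A = 1)
    p_singles : elementary probabilities P(A = 1 | D_i)
    nu0       : global interaction parameter (nu0 = 1 means no data interaction)
    """
    x0 = distance(p_prior)
    ratio = nu0
    for p in p_singles:
        ratio *= distance(p) / x0
    x_star = x0 * ratio           # x*(u)/x0 = nu0 * prod_i x_i(u)/x0
    return 1.0 / (1.0 + x_star)   # always a licit probability in [0, 1]

# Hypothetical elementary probabilities for the three data events D, B, C:
p = nu0_estimate(0.274, [0.5, 0.4, 0.3])
assert 0.0 < p < 1.0
```

Note that a single datum carrying no information (P(A = 1|D) equal to the prior) leaves the estimate at the prior, as it should.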
Figure 4.18: (1) The reference eroded data set S0 (N = 224,000, m = 0.274, σ² = 0.067), (2) its histogram, and (3) the binary reference field with the prior P(A = 1) = 0.274. The mean of the eroded data set is equal to that of the reference binary values. The size of the probability map (1) is 80 × 80 × 35 = 224,000 nodes. This probability map will be used as the comparison tool for future analysis.
where:
\[ x_0 = \frac{1 - P(A = 1)}{P(A = 1)} = \frac{1 - 0.274}{0.274} = 2.65 \ \text{is the marginal distance;} \]
\[ x_D(u) = \frac{P(A(u) \neq 1 \mid D = d(u))}{P(A(u) = 1 \mid D = d(u))} \ \text{is the distance to } A(u) = 1 \text{ updated by the data event } D = d(u). \]
The distance x_D(u) varies from one location u to another. It is obtained by scanning the reference image S0 with the template definition of Figure 4.17(1) for the proportion of D-replicates identifying the data values combination d(u) which also features at their upper center (one level above) a value A(u) = 1. Note that our estimation paradigm assumes that all elementary conditional probabilities P(A|D), P(A|B), P(A|C) are known. This analysis addresses only the problem of combining these elementary probabilities into an estimate of the fully conditioned probability P(A|D, B, C)
while accounting for data interaction. Similarly, from the training image one can
retrieve the other two elementary distances x_B(u) and x_C(u).
The ν₀ = 1 model then provides an estimate of the fully conditioned probability, P*(A(u) = 1|D, B, C) (Figure 4.19(1)).
Figure 4.19: (1) The estimate of the fully conditioned probability P(A | D,B,C) using the ν₀ = 1 model, (2) its histogram, and (3) the reference binary field with the prior P(A = 1) = 0.274. The spatial mean and variance of the estimated probabilities using the ν₀ = 1 model are greater than the corresponding statistics of the reference case, leaving room for improvement.
This estimate is necessarily valued in the interval [0, 1]: its spatial mean is 0.288 and
spatial variance is 0.098. Its histogram is given in Figure 4.19(2). The histogram and
scattergram of the error defined as
P*(A(u) = 1|D, B, C) - P(A(u) = 1|D, B, C)
are shown in Figure 4.20.
The spatial variance and the spatial mean of the estimated probabilities using the ν₀ = 1 model are greater than the corresponding statistics of the exact conditional
Figure 4.20: (1) Histogram of the error P*(A | D,B,C) − P(A | D,B,C) (P*: m = 0.288, σ² = 0.098; P: m = 0.274, σ² = 0.067) and (2) the corresponding scatterplot of P*(A | D,B,C) based on the ν₀ = 1 model versus the reference P(A | D,B,C).
probability of Figure 4.18, leading to a positive mean error of 0.014. One would expect smoothing (smaller spatial variance) from an estimation. Note that the ν₀ = 1 model corresponds to an approximation of no-data-interaction, which is a poor assumption in the presence of the two well-correlated data events D and B. This ignorance of data interaction results in over-compounding of the individual single-datum conditioned probabilities, leading to an overestimation of the fully conditioned probability and an associated greater variance.
4.2.4 Ordering the data values combinations
The statistics presented in Figures 4.19 and 4.20 pool together the 224,000 estimated
conditional probabilities over So, irrespective of the actual conditioning data values.
Recall that there are 4 × 3 = 12 binary indicator data grouped four by four into the three data events D, B, and C; therefore there is only a total of 2^12 = 4096 possible data values combinations.
To study the heteroscedasticity of the nu and tau parameters, that is, their dependence on data values, we should first rank or classify the 4096 possible data values combinations, then plot the ν_i, τ_i, ν₀ parameters versus data values combinations and observe their data values dependence. Note that the lesser that data dependence, particularly of the single parameter ν₀, the easier would be its inference in practice; this would justify our paradigm of separating individual data event contribution and data interaction.
Out of the total of 4096 possible data values combinations, 96 were not found in S0, and of the remaining 4,000 only 931 combinations were found with at least 10 replicates. To ensure statistical significance we retain only the latter. These 931 data
values combinations were ranked along the abscissa axes of Figures 4.21 and 4.22
with increasing proportion of binary data valued 1, starting at abscissa 1 with 12
binary data all valued 0 (which may be interpreted as "no sand" event) and ending
at abscissa 931 with all 12 data valued 1 (which may be interpreted as all "sand").
The combinations with the same proportion of binary data valued 1 were then ranked
by physical distance to the unknown event A. From the template definition (Figure
4.17(3)), the data event D is closest to the unknown event A, followed by data event
B; then by data event C which is the furthest from that unknown A.
The proportion of binary data valued 1 increases toward the higher abscissa values, and since we are evaluating the probability of the event A = 1, we expect an increase in data interaction; hence any hypothesis of no-data-interaction or data independence would become worse.
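The ordering just described can be sketched as follows (a minimal version that ranks by the proportion of data valued 1 only; the secondary tie-breaking by physical distance to the unknown A is omitted):

```python
from itertools import product

# Enumerate all 2^12 possible binary data value combinations for the
# 12 conditioning data (three 4-point data events D, B, C) and rank them
# by increasing proportion of data valued 1, as along the abscissa of
# Figures 4.21 and 4.22.
combos = sorted(product([0, 1], repeat=12), key=sum)

assert len(combos) == 2 ** 12     # 4096 combinations
assert sum(combos[0]) == 0        # "no sand": all 12 data valued 0
assert sum(combos[-1]) == 12      # all "sand": all 12 data valued 1
```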
The next section discusses the advantages of the nu model versus the tau model for the above cases and in a general setting.
Figure 4.21: Sequence-dependent interaction parameters τ₃ (red) and ν₃ (blue) versus data value combination id, for data sequences (1) DBC/BDC (σ²(ν) = 0.48, σ²(τ) = 18.6), (2) DCB/CDB (σ²(ν) = 2.85, σ²(τ) = 80.2), and (3) CBD/BCD (σ²(ν) = 2.63, σ²(τ) = 5.24).
4.2.5 Heteroscedasticity of the tau and nu weights
It follows from expressions (3.5) and (2.45) that:
• for any data sequence: ν₁ = τ₁ = 1
• the ν_i and τ_i parameters are data sequence-dependent. For example, the last parameter, ν₃ or τ₃, is not the same whether it applies to the sequence BDC or the sequence CDB (Figures 4.21(1), (2)). However, this last parameter remains unchanged from sequence BCD to sequence CBD (Figure 4.21(3)), or from sequence BDC to sequence DBC (Figure 4.21(1)). Indeed, the last parameter
Figure 4.22: Exact ν₀ parameter for the 931 data value combinations. The ν₀ = 1 model is excellent for the first 600 data value combinations, as can be seen from the ν₀ values being close to 1.
(ν₃ or τ₃) measures the data interaction between the last data event (D in Figure 4.21(3)) and the undifferentiated ensemble of all previous data (BC or CB).
• the single global data interaction parameter ν₀ = ν₁ν₂ν₃ = ν₂ν₃ is data sequence-independent. The greater |1 − ν₀|, the larger the global data interaction.
Figure 4.21 gives the (τ₃, ν₃) parameter values applied to the last data event in the data sequence, as calculated from their exact expressions (3.5) and (2.45) using the exhaustive proportions read from the reference field S0. The following observations
can be made:
• the tau parameter is more unstable than the corresponding nu parameter, as seen from the higher variability of the tau series compared to the nu series (Figure 4.21). This is due to the denominator of the tau expression (2.45) becoming close to log 1 = 0 whenever a datum is little or non-informative in discriminating A = a from A ≠ a, as is the case for the furthest-away data event C.
• the ν₃ parameter accounting for interaction of the last data event in the sequence is smallest when applied to the non-informative remote data event C (Figure 4.21(1)) and largest when applied to the two most informative closer data events D or B (Figures 4.21(2) and 4.21(3)).
• the ν₃ parameter increases along the abscissa, indicating that the data interaction |1 − ν₃| is data values-dependent and that data interaction increases as more of the elementary binary indicator data are valued 1; note that the event being assessed is A = 1. This is particularly notable when ν₃ applies to D, the most informative data event (Figure 4.21(3)).
Figure 4.22 gives the ν₀ global data interaction parameter as calculated from the exact ν-expressions (3.5) with ν₀ = 1 · ν₂ · ν₃. This ν₀ value is seen to be data value-dependent, increasing as the three data events D, B, C become more redundant in assessing the probability of event A = 1 by displaying a greater proportion of elementary binary data valued 1 (higher abscissa values). However, for all but the last 300 data values combinations out of the total 931 retained, the approximation ν₀ = 1 appears quite robust, i.e. essentially data value-independent (homoscedastic). For the last 300 data values combinations, a quadratic model of the type
\[ \nu_0 = 1 + \lambda (p - p_c)^2, \quad \forall\, p > p_c \]
would provide a good approximation of the data value dependence of that single global correction parameter ν₀, where:
• p is the proportion of sand in the two closest data events D and B pooled together;
• p_c is a threshold proportion below which the ν₀ = 1 model would be applied, above which the quadratic model would be applied;
• λ > 0 is a fitting parameter.
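A direct transcription of this piecewise model (λ and p_c are fitting parameters to be calibrated on the training set; the numeric values below are illustrative only):

```python
def nu0_quadratic(p, p_c, lam):
    """Quadratic model for the global interaction parameter:
    nu0 = 1 + lam * (p - p_c)^2 for p > p_c, else nu0 = 1 (no data interaction).

    p   : proportion of sand in the pooled closest data events D and B
    p_c : threshold proportion below which the nu0 = 1 model applies
    lam : non-negative fitting parameter
    """
    if p <= p_c:
        return 1.0
    return 1.0 + lam * (p - p_c) ** 2

# Illustrative values: below the threshold the no-interaction model holds;
# above it nu0 grows quadratically with the sand proportion.
assert nu0_quadratic(0.25, p_c=0.5, lam=4.0) == 1.0
assert nu0_quadratic(0.75, p_c=0.5, lam=4.0) == 1.25
```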
With the previous quadratic approximation, the dimension 3 × 4 = 12 for the data value dependence of ν₀ has been reduced to 2 (the two parameters λ and p_c). In a real application, S0 would be a training image built to mimic the actual data interaction. A study of data interaction would be developed on that training data set, resulting in some approximation of the global ν₀ parameter, say:
\[ \nu_0 = \varphi(s_j,\ j = 1, \ldots, n), \quad \text{with } n \text{ small}, \]
where φ is a function of a few easily accessible statistics s_j summarizing the possibly much larger space of variability of conditioning data events and values. That function ν₀ = φ(·) is then exported to the study of combining the various single data event-conditioned probabilities. These single data event-conditioned probabilities should not be read from the training set; only the interaction parameter ν₀, or equivalently the function φ, is to be borrowed from the training set.
4.2.6 Independence-based estimates
To evaluate comparatively the performance of the ν₀ = 1 (no-data-interaction) model, from the same S0 reference field we calculate estimates of the fully conditioned probability P(A|D,B,C) stemming from two common approaches calling for data (conditional) independence. The expressions for these two estimators were given in Section 2.1.
The "conditional independence" (CI) estimator is written:
\[ \frac{P_{CI}(A \mid D, B, C)}{P(A)} = \frac{P(A \mid D)}{P(A)} \cdot \frac{P(A \mid B)}{P(A)} \cdot \frac{P(A \mid C)}{P(A)} \cdot \frac{P(D)P(B)P(C)}{P(D, B, C)} \quad (4.8) \]
The "full independence" (FI) estimator is written:
\[ \frac{P_{FI}(A \mid D, B, C)}{P(A)} = \frac{P(A \mid D)}{P(A)} \cdot \frac{P(A \mid B)}{P(A)} \cdot \frac{P(A \mid C)}{P(A)} \quad (4.9) \]
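The danger of illicit probabilities under full independence is easy to demonstrate numerically. The sketch below (hypothetical elementary probabilities, not taken from the case study) applies expression (4.9):

```python
def fi_estimate(p_a, p_singles):
    """'Full independence' estimator (4.9):
    P_FI(A | D1,...,Dn) = P(A) * prod_i [ P(A | Di) / P(A) ].
    Nothing constrains the result to [0, 1] when the data actually interact."""
    est = p_a
    for p in p_singles:
        est *= p / p_a
    return est

# Three individually informative but redundant data events over-compound:
p_fi = fi_estimate(0.274, [0.8, 0.8, 0.8])
assert p_fi > 1.0   # an illicit probability, to be truncated in practice
```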
The two sets of estimated probabilities P_CI(A = 1|D,B,C) and P_FI(A = 1|D,B,C), given by expression (4.8) for conditional independence and expression (4.9) for full independence, are retrieved from the reference set S0. These and the ν₀ = 1 model estimated probability P*(A = 1|D,B,C) are plotted against the S0-exact probability P(A = 1|D,B,C) in Figure 4.23.
Figure 4.23: Scatterplots of estimated probabilities P*(A | D,B,C) versus the reference P(A | D,B,C): (1) for the estimate based on the ν₀ = 1 model (no order relation violations, best correlation), (2) for the estimate based on the conditional independence assumption (fewer order relation violations, poorest correlation), (3) for the estimate based on the full independence assumption (most order relation violations, second best correlation).
Although there is clearly data interaction (essentially between data events D, B, and A), the no-data-interaction model ν₀ = 1 (Figure 4.23(1)) gives reasonable results with the largest correlation coefficient ρ = 0.82 and with estimated values necessarily valued in the interval [0, 1]. The full independence approximation (Figure 4.23(3)) may appear at first sight to give equivalently good results (ρ = 0.70), but expression (4.9) does not guarantee that the resulting estimate P_FI(A|D,B,C) lies in the interval [0, 1] whenever there is actual data dependence and interaction: a large number of these probability estimates are valued above 1. Assuming independence between data might lead to severe violations such as probabilities greater than 1. In practice, these violations need to be corrected, for example by setting all illicit probabilities to 1. However, such artificial correction may add to the overall bias of the estimates. For
example, the conditional independence estimator (Figure 4.23(2)) has fewer order violations than the estimator based on the full independence assumption (Figure 4.23(3)), yet its correlation coefficient with the reference case (ρ = 0.36) is considerably lower compared to the ν₀ = 1 model (ρ = 0.82) or the full independence model (ρ = 0.70).
Using the no-data-interaction ν₀ = 1 model calls for considering distances which are ratios of conditional probabilities. In the presence of departure from the data independence hypothesis, one is better off approximating ratios of probabilities (the ν₀ = 1 model), which are generally more stable than the probabilities themselves. This was the original point made by Journel [48]. However, much better results than those provided by the ν₀ = 1 model (as will be shown in the next section) can be obtained with little additional effort by modeling the heteroscedastic variability of ν₀ using a training/calibration data set mimicking the actual data interaction. No matter how approximate that training model of data interaction is, it is likely to be better than a blanket and wrong hypothesis of no-data-interaction, or worse, of data conditional independence. We consider this approach hereafter.
4.2.7 The classified ν₀ approach
The proposed classified ν₀ approach described in Section 3.3.2 can be summarized in two phases: a training phase and an application phase.
In the training phase we need to:
1. build a training data set mimicking (even only roughly) the actual data interaction. From that set, retrieve the training data values-dependent ν₀-values, called proxy ν₀-values
2. reduce each set of training data values to a few summary statistics or filter scores. Based on these scores, classify the proxy values ν₀. Each class is identified by a single (average or median) ν₀-value, called a "class ν₀-prototype"
The application phase then consists of returning to the actual study field and:
1. Finding the class closest to the actual conditioning data scores.
2. Retrieving that class "prototype" value ν₀.
3. Using that ν₀ value to combine the elementary probabilities. These elementary probabilities must be evaluated from the actual study field, not from the training data set.
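A minimal sketch of the two phases, assuming a summary score equal to the average proportion of data valued 1 (function names and the toy data are illustrative, not the thesis implementation):

```python
from collections import defaultdict

def build_prototypes(training_combos, proxy_nu0):
    """Training phase: classify the proxy nu0 values by a summary score
    (here the average proportion of data valued 1) and reduce each class
    to a single prototype, the class average."""
    classes = defaultdict(list)
    for combo, nu0 in zip(training_combos, proxy_nu0):
        score = sum(combo) / len(combo)
        classes[score].append(nu0)
    return {s: sum(v) / len(v) for s, v in classes.items()}

def lookup_nu0(prototypes, combo):
    """Application phase: find the class closest to the actual conditioning
    data score and return its prototype nu0 (the elementary probabilities
    themselves come from the actual study field, not the training set)."""
    score = sum(combo) / len(combo)
    closest = min(prototypes, key=lambda s: abs(s - score))
    return prototypes[closest]

# Toy training set: three 2-point data combinations with proxy nu0 values.
protos = build_prototypes([(0, 0), (0, 1), (1, 1)], [1.0, 2.0, 3.0])
assert lookup_nu0(protos, (1, 0)) == 2.0   # score 0.5 -> middle class
```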
In the following example, the training data set is the reference data set (an ideal
case). Later in this section, the more realistic and less favorable case of a training set
different from the reference one will be considered.
For demonstration purposes, consider then the classified ν₀ approach applied to the reference data set shown in Figure 4.15. The goal is to estimate P(A = 1|D,B,C). Each of the three conditioning data events D, B, and C comprises 4 binary data points (refer to Figure 4.17 for the geometry of the three data events). There are 931 possible data value combinations for which we can reliably estimate such probability.
Consider as data summary (score) the single statistic defined as the average sand proportion (i.e. the average of the 3 × 4 binary data, where sand is defined by the binary data valued 1). That statistic can take only twelve possible values corresponding to the 12 classes of data events. Each class prototype ν₀ value is the average of the proxy ν₀ values falling into that class. In Figure 4.24, the prototype ν₀ values are shown in red for each of the 12 classes. The mean of these 12 proxy ν₀ values is equal to 2.21, which indicates a significant deviation from the assumption of no data interaction (i.e. the ν₀ = 1 model). For each set of actual data values we look for the closest training class and use the corresponding prototype ν₀ value (instead of ν₀ = 1)
for building the fully conditioned probability P(A = 1|D,B,C). An important remark: the uncertainty of these prototype ν₀ interaction weights is different for each of the 12 classes. For example, the variance of the proxy ν₀ weights for the last class, as seen on the right of Figure 4.24, is much larger than the variance of the proxy ν₀ weights for the classes in the middle of this figure. To account for such uncertainty, we can then consider evaluating the lower and upper quantiles (e.g. the 0.1-quantile and 0.9-quantile) of the proxy ν₀ weights for each class, and then evaluating the fully conditional probability P(A = 1|D,B,C) for these two quantiles.
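The class uncertainty can be bracketed with empirical quantiles of the proxy ν₀ weights of each class, for instance (a hand-rolled linear-interpolation quantile; the sample values are hypothetical):

```python
def empirical_quantile(values, q):
    """Linear-interpolation empirical q-quantile (0 <= q <= 1) of a sample,
    used here to bracket the proxy nu0 weights of one class."""
    v = sorted(values)
    pos = q * (len(v) - 1)
    lo_i = int(pos)
    hi_i = min(lo_i + 1, len(v) - 1)
    return v[lo_i] + (pos - lo_i) * (v[hi_i] - v[lo_i])

proxy_nu0 = [1.1, 0.9, 1.4, 2.0, 1.0]   # hypothetical proxy weights of one class
lo = empirical_quantile(proxy_nu0, 0.1)
hi = empirical_quantile(proxy_nu0, 0.9)
assert lo <= empirical_quantile(proxy_nu0, 0.5) <= hi
```

Evaluating P(A = 1|D,B,C) once with `lo` and once with `hi` then gives the two bounding probability estimates described above.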
Figure 4.24: Exact ν₀ values versus average sand values defined over the three data events D, B, C. The average ν₀ values and their statistics are shown in red.
Comparison of the classified ν₀ approach with the ν₀ = 1 model performance is shown in Figure 4.25.
The left graph shows a 0.82 coefficient of correlation between the reference true probability and the ν₀ = 1 model for the 931 data value combinations. We observe only a small increase in that correlation when using the classified ν₀ approach, with ρ = 0.85.
Linear correlation, however, is not a fair measure of comparison between these two
Figure 4.25: Scattergram of the ν₀ = 1 model (left) and the classified ν₀ model (right) relative to the reference probability. The correlation coefficient of the classified ν₀ approach with the reference case (ρ = 0.85) is somewhat improved compared to that of the ν₀ = 1 model (ρ = 0.82).
models, as it measures only linear dependence. The significant improvement brought by the classified ν₀ approach can be observed in the reproduction of the reference statistics for the 931 data value combinations retained; see Table 4.3.
Because of data over-compounding, the ν₀ = 1 model overestimates the reference spatial mean and variance of the 931 exact probabilities. The classified ν₀ approach reproduces the statistics of the reference case much better. The class-dependent ν₀ model provides a significant improvement which is not fully reflected by the correlation coefficient.
             reference    ν₀ = 1    classified ν₀ model
mean           0.44        0.52          0.41
variance       0.04        0.07          0.04

Table 4.3: Summary statistics: means and variances of the reference conditional probabilities and the approximations stemming from the nu representation, for the 931 data value combinations.
Experiments with a different training set
To further test the robustness of the previous results, consider two different data sets. The first one provides the information content (the actual data), and the other offers training data from which data interaction is borrowed. For this, ten independent Gaussian realizations were truncated at their respective upper quartiles, generating ten independent binary fields similar to that shown in Figure 4.15. The means and variances of the ten eroded realizations S0 are given in Table 4.4.
Table 4.4: Means and variances of the 10 independent realizations S0.
We can now consider different combinations of these ten realizations for retrieval of the various conditional probabilities:
• information content: we obtained the individually conditioned probabilities P(A|B), P(A|C), P(A|D) from realization i = 1, …, n = 10.
• proxy ν₀ values: for training we then used any realization j ≠ i.
There is a total of n(n − 1) = 90 possible combinations of the pair (actual vs. training) realizations that can be used for the approximation of P(A = 1|D,B,C) using the classified ν₀ approach. These estimates are then compared to the results of the ν₀ = 1 model.
Figure 4.26 shows the histograms of the means of the 90 reference P(A = 1|D,B,C) values and their estimators based on the classified ν₀ approach and on the ν₀ = 1 model. The average of these 90 reference mean values is 0.399 (Figure 4.26, left). The respective averages for the classified ν₀ approach and for the ν₀ = 1 model are 0.386 and 0.458 (Figure 4.26, right and center respectively). The ν₀ = 1 model leads to significant overestimation by over-compounding the individual probabilities. The classified ν₀ approach reproduces well the mean value of the reference case. This similarity is a highly desirable property, as it indicates that the classified ν₀ approach is unbiased.
Figure 4.27 shows the histograms of the variances of the 90 reference P(A = 1|D,B,C) values and their estimators based on the classified ν₀ approach and on the ν₀ = 1 model. The average of these 90 reference variance values is 0.041 (Figure 4.27, left). The respective averages for the classified ν₀ approach and for the ν₀ = 1 model are 0.041 and 0.076 (Figure 4.27, right and center respectively). The classified ν₀ approach reproduces almost exactly the variance of the reference case.
             reference    ν₀ = 1    classified ν₀
mean           0.399       0.458        0.386
variance       0.041       0.076        0.041

Table 4.5: The average means and variances of P(A = 1|D,B,C) over the 90 combinations.
Table 4.5 summarizes the average means and variances of the 90 reference fully conditioned probabilities P(A = 1|D,B,C) and the same statistics based on the ν₀ = 1
Figure 4.26: The histograms of the means of the 90 reference P(A = 1|D,B,C) values (left), and their estimators based on the ν₀ = 1 model (center) and the classified ν₀ approach (right).
and the classified ν₀ approaches. The ν₀ = 1 model significantly over-compounds the elementary probabilities, leading to a significant overestimation (bias). In contrast, the classified ν₀ approach accounts better for the joint data interaction, and thus decreases the over-compounding of information content and hence reduces the overall bias.
Figure 4.27: The histograms of the variances of the 90 reference P(A = 1|D,B,C) values (left), and their estimators based on the ν₀ = 1 model (center) and the classified ν₀ approach (right).
Chapter 5
Application to non-binary data
As was shown in Chapter 4, in the presence of actual data dependence, the ν₀ = 1 model significantly outperforms the traditional estimators based on any data independence hypothesis. Estimators defined by the independence assumptions can lead to illicit probabilities, e.g. greater than one. The ν₀ = 1 model guarantees licit probabilities regardless of the level of data dependence. In this chapter, we generalize the nu model to the case of non-binary variables, with extensive testing using a ternary variable data set.
5.1 A single constraint
Consider the evaluation of the posterior probability P(A = k|D) for all k = 1, …, K, where k is a particular outcome of the unknown A. For example, category k could indicate the presence/absence of a channel sand.
The conditioning information D is constituted of n elementary data events D_i:
\[ D = \bigcap_{i=1}^{n} D_i \]
Using the notations of Chapter 3, the distances to the event A = k are written:
\[ x_0^{(k)} = \frac{P(A \neq k)}{P(A = k)}, \quad x_i^{(k)} = \frac{P(A \neq k \mid D_i)}{P(A = k \mid D_i)},\ i = 1, \ldots, n, \quad x^{(k)} = \frac{P(A \neq k \mid D)}{P(A = k \mid D)} \quad (5.1) \]
The fully conditioned posterior probability P(A = k|D) is then:
\[ P(A = k \mid D) = \frac{1}{1 + x^{(k)}} \quad (5.2) \]
These posterior probabilities must verify the law of total probability, whatever the data set D, i.e. for all x_0^(k), x_i^(k), and x^(k):
\[ \sum_{k=1}^{K} P(A = k \mid D) = \sum_{k=1}^{K} \frac{1}{1 + x^{(k)}} = 1 \quad (5.3) \]
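The consistency requirement (5.3) can be checked numerically. Using, for illustration, the three marginal distances x0^(k) of the ternary case study of Section 5.2 (the third value below, 3.0161, is computed from the prior proportion 0.2490):

```python
def posteriors(distances):
    """Posterior probabilities P(A = k | D) = 1 / (1 + x^(k)), expression (5.2)."""
    return [1.0 / (1.0 + x) for x in distances]

# Marginal distances of the ternary case study: the corresponding
# probabilities are the prior proportions and must sum to 1 (expression 5.3).
probs = posteriors([0.5733, 3.0161, 7.6655])
assert abs(sum(probs) - 1.0) < 1e-3
```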
For each category k, the nu expression from Chapter 3 is written:
\[ \frac{x^{(k)}}{x_0^{(k)}} = \nu_0^{(k)} \prod_{i=1}^{n} \frac{x_i^{(k)}}{x_0^{(k)}}, \quad \text{with} \quad \nu_0^{(k)} = \prod_{i=2}^{n} \nu_i^{(k)} \]
The sequence-dependent interaction parameter ν_i^(k) is written as:
\[ \nu_i^{(k)} = \frac{P(D_i \mid A \neq k,\ D_1, \ldots, D_{i-1}) \,/\, P(D_i \mid A \neq k)}{P(D_i \mid A = k,\ D_1, \ldots, D_{i-1}) \,/\, P(D_i \mid A = k)} \]
Note that the prior distances should verify the constraints:
These estimates verify the consistency relation \( \sum_{k=1}^{3} P(A = k \mid D_1, D_2) = 1.00 \). Without the proposed constraint (5.11) such consistency would not be possible, leading to an order violation problem and possibly to a biased estimate of the fully conditional posterior probability P(A = k|D_1, D_2).
5.2 Large non-Gaussian ternary case study
In Chapter 3, we suggested the data combination paradigm in which the posterior probability P(A = k|D_1, …, D_n) is obtained by completely separating the single-datum information content, through the n elementary probabilities P(A = k|D_i), i = 1, …, n, from the data interaction, through the nu interaction weights ν₀^(k), k = 1, …, K. The elementary probabilities should be evaluated from the actual data. The interaction weights can be obtained from a training data set providing proxies, or replicates, of the data interaction. The constraint (5.11) on the K interaction weights ν₀^(k) ensures that the resulting K fully conditioned posterior probabilities P*(A = k|D_1, …, D_n) are all licit probability estimates.
The applicability of the proposed ν₀^(k) inference paradigm is now tested using a large 3D reference ternary data set where all conditional probabilities involved in the nu expression (3.6) are known, including the exact fully data-conditioned probability P(A = k|D_i = d_i, i = 1, …, n). Various approximations of that reference probability can then be evaluated.
5.2.1 The reference data set
We start by generating a reasonably large 3D non-Gaussian field using the training image generator code [56] of the SGEMS software [66]. This code generates various geological structures using a non-iterative, unconditional Boolean simulation [36]. For this data set, we generated a ternary image with three mutually exhaustive categories: category 1 for mud, category 2 for channel sand, and category 3 for fractures. This 3D field is of size 100 × 100 × 50, comprising 500,000 nodes and yielding the reference categorical field shown in Figure 5.1.
Denote that reference field by S: {A(u) = 1, 2, 3, u ∈ S}, with P(A(u) = 1) = 0.67, P(A(u) = 2) = 0.23, and P(A(u) = 3) = 0.10, such that \( \sum_{k=1}^{3} P(A(u) = k) = 1.00 \); u denotes the location coordinates vector.
Figure 5.2 gives the reference indicator variograms in the x, y, z directions calculated
from indicator data from the top 35 layers of S for each of three categories k = 1,2,3;
the reason for excluding the bottom 15 layers will become apparent soon hereafter.
Those indicator variograms reflect the horizontal-to-vertical anisotropy of the original
categorical field.
Figure 5.1: Reference categorical image generated using a training image generator (the representation of the two categories A = 2 and A = 3 does not reflect their proportions).
5.2.2 The estimation configuration
Consider the evaluation of the conditional probability of an unsampled value A(u)=k,
given any combination of the following three multiple-point data events (Figure 5.3).
As seen from Figure 5.3, the closest data event D comprises four data locations at the
level just below that of A(u). These four data are at the corners of a square centered
on the projection of A(u) on their level. The next closest data event B comprises
also four data locations with the same geometry as for data event D, but located five
levels below that of A(u). The furthest away data event C again comprises four data
locations but located 15 levels below that of A(u).
If the unsampled location u of A(u) spans only the eroded field S0 = {x = 11, …, 90; y = 11, …, 90; z = 16, …, 50}, then each value A(u) can be evaluated by any of the three data events D, B, C. From here on, all statistics will refer to that "common denominator" field S0 comprising 224,000 nodes. Over that
Figure 5.2: Exhaustive indicator variograms in the x, y and z directions (horizontal EW, horizontal NS, vertical), calculated over the 35 top layers for k = 1, 2, 3; the indicator means are p = 0.636 (k = 1), p = 0.249 (k = 2), and p = 0.115 (k = 3).
Figure 5.3: Data events definition: (1) one data event D, (2) two data events D, B, (3) three data events D, B, C.
central field S0, the marginal statistics (prior proportions) are: P(A = 1) = 0.636, P(A = 2) = 0.249, and P(A = 3) = 0.115, with \( \sum_{k=1}^{3} P(A = k) = 1 \).
Figure 5.4: (1) Spatial distribution, (2) histogram of the conditional proportions P(A(u) = 1|D, B, C) defined over the reference eroded volume S0, and (3) the reference categorical field with respective proportions.
• two data events-conditioned: P(A(u) = k|D, B), P(A(u) = k|D, C), P(A(u) = k|B, C), for all k.
• all three data events-conditioned: P(A(u) = k|D, B, C).
• the estimated probability P*(A(u) = k|D, B, C) using the ν₀ model (3.6) to combine the previous single data-event conditioned probabilities.
Again, the mean of these proportions will be equal to that of the reference categorical field, as expected, since for all the above proportions we scan the same S0-volume.
When the model ν₀^(k) = 1 is used, at each location u the estimate of the fully conditioned posterior probability is:
\[ P^*(A(u) = k \mid D = d(u),\ B = b(u),\ C = c(u)) = \frac{1}{1 + x^{*(k)}(u)} \quad (5.18) \]
with the estimated distance x*(k)(u) being such that:
\[ \frac{x^{*(k)}(u)}{x_0^{(k)}} = \frac{x_D^{(k)}(u)}{x_0^{(k)}} \cdot \frac{x_B^{(k)}(u)}{x_0^{(k)}} \cdot \frac{x_C^{(k)}(u)}{x_0^{(k)}} \]
where \( x_0^{(k)} = \frac{1 - P(A = k)}{P(A = k)} \) is the marginal distance, with
\[ x_0^{(1)} = \frac{1 - 0.6356}{0.6356} = 0.5733, \quad x_0^{(2)} = \frac{1 - 0.2490}{0.2490} = 3.0161, \quad x_0^{(3)} = \frac{1 - 0.1154}{0.1154} = 7.6655 \]
and \( x_D^{(k)}(u) = \frac{P(A(u) \neq k \mid D = d(u))}{P(A(u) = k \mid D = d(u))} \) is the distance to A(u) = k updated by the sole data event D = d(u).
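These marginal distances follow directly from the prior proportions of the eroded field; a quick numerical check:

```python
def marginal_distance(p):
    """Marginal distance x0^(k) = (1 - P(A = k)) / P(A = k)."""
    return (1.0 - p) / p

# Prior proportions of the eroded ternary field S0:
priors = {1: 0.6356, 2: 0.2490, 3: 0.1154}
x0 = {k: marginal_distance(p) for k, p in priors.items()}

assert abs(x0[1] - 0.5733) < 1e-3
assert abs(x0[3] - 7.6655) < 1e-3
```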
The distance x_D^(k)(u) is obtained by scanning the reference image S0 with the template definition shown in Figure 5.3(1) for the proportion of D-replicates identifying the data values combination d(u). Our estimation paradigm assumes that all elementary conditional probabilities P(A = k|D), P(A = k|B), P(A = k|C) are known. This study addresses only the problem of combining these elementary probabilities into an estimate of the fully conditioned probability P(A = k|D, B, C) while accounting for data interaction.
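The combination recipe of (5.18) can be sketched in a few lines of code. This is a minimal illustration of the nu expression x = ν0 · (x_D x_B x_C)/(x_0)^2; the function names are our own and the code is a sketch, not taken from this study:

```python
def distance(p):
    """Distance associated with a probability: x = (1 - p) / p."""
    return (1.0 - p) / p

def nu_model(p_marginal, p_singles, nu0=1.0):
    """Combine single-event conditioned probabilities (e.g. P(A|D), P(A|B), P(A|C))
    with the marginal P(A) via x = nu0 * prod(x_i) / x0**(n-1),
    then return the combined estimate P* = 1 / (1 + x)."""
    x0 = distance(p_marginal)
    x = nu0
    for p in p_singles:
        x *= distance(p)
    x /= x0 ** (len(p_singles) - 1)
    return 1.0 / (1.0 + x)

# With nu0 = 1 (no data interaction) and every elementary probability equal to
# the marginal, the combined estimate falls back to the marginal itself.
p = nu_model(0.6356, [0.6356, 0.6356, 0.6356])
```

Note that for a single category with ν0 = 1 the estimate always lies in [0, 1], consistent with the behaviour reported below for Figure 5.5.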
Similarly, from the training image one can retrieve the other two elementary distances x_B^(k)(u) and x_C^(k)(u). The ν0 = 1 model (5.18) then provides an estimate of the fully conditioned probability P*(A(u) = k|D, B, C). For example, the spatial distribution and histogram of the estimates P*(A(u) = 1|D, B, C) using the ν0 = 1 model are shown in Figure 5.5(1).
These estimates are necessarily valued in the interval [0, 1]: their spatial mean is 0.635 with spatial variance 0.010. Their histogram is given in Figure 5.5(2). Comparing Figure 5.5 to Figure 5.4 shows that the bias of the ν0 = 1 estimates relative to
Figure 5.5: (1) Spatial distribution, (2) histogram of the conditional probabilities P*(A(u) = 1|D, B, C) estimated with the model ν0 = 1, and (3) categorical reference field with respective proportions.
the reference probabilities is small (compare their spatial mean of 0.635 and 0.636).
Note, however, that the estimates based on the ν0 = 1 model have the lesser spatial variance (0.010 < 0.084) due to the estimation smoothing effect.
The histogram and scattergram of the error, defined as

P*(A(u) = 1|D, B, C) − P(A(u) = 1|D, B, C),

are shown in Figure 5.6. The correlation between the local probability estimate and the actual true proportion is low (ρ = 0.34), leaving room for finding a better data interaction parameter ν0^(k) different from 1.
Note that the ν0 = 1 model cannot be extended to all k = 1, 2, 3. For example, we calculated the estimates P*(A = 2|D, B, C) and P*(A = 3|D, B, C) using the nu parameter values ν0^(2) = 1 and ν0^(3) = 1. Figure 5.7 shows the histogram of the sum
[Figure 5.6 plot area: (1) histogram of the ν0 model error with mean 0.0002 and variance 0.0749; (2) scatterplot of estimate (m = 0.635, σ² = 0.010) versus reference (m = 0.636, σ² = 0.084).]
Figure 5.6: (1) Histogram of error P*(A = 1|D, B, C) − P(A = 1|D, B, C) and (2) the corresponding scatterplot of estimate P*(A = 1|D, B, C) versus reference P(A = 1|D, B, C).
138 CHAPTER 5. APPLICATION TO NON-BINARY DATA
[Figure 5.7 plot area: histogram of the summed estimates over 224,000 values; mean 1.0003, maximum 1.0962, upper quartile 1.0029, median 0.9996, lower quartile 0.9996, minimum 0.9207.]
Figure 5.7: Histogram of Σ_{k=1}^{3} P*(A = k | D, B, C). The ν0^(k) = 1 model cannot be extended to all categories k since the mean of Σ_{k=1}^{3} P*(A = k | D, B, C) is greater than 1, which contradicts the general law of probabilities.
of the estimates Σ_{k=1}^{3} P*(A(u) = k|D, B, C) resulting from the model ν0^(k) = 1 for all k. The spatial mean is equal to 1.00034, which is slightly greater than one. Out of the 224,000 estimated values, 104,392 (that is, about a half) were outside the required interval [0, 1].
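The violation just described is easy to reproduce: applying the single-category combination independently for each k does not force the K estimates to sum to one. The sketch below uses hypothetical input values of our own choosing, purely to illustrate the mechanism:

```python
def nu_estimate(p_marginal, p_singles, nu0=1.0):
    """Single-category nu-model estimate P* = 1 / (1 + x),
    with x = nu0 * prod((1 - p) / p) / x0**(n-1)."""
    x0 = (1.0 - p_marginal) / p_marginal
    x = nu0
    for p in p_singles:
        x *= (1.0 - p) / p
    x /= x0 ** (len(p_singles) - 1)
    return 1.0 / (1.0 + x)

# Three categories; each data event's probabilities sum to 1 across k,
# yet the per-category combined estimates need not.
marginals = [0.6356, 0.2490, 0.1154]
by_event = [[0.7, 0.2, 0.1],            # P(A = k | D), hypothetical
            [0.5, 0.3, 0.2],            # P(A = k | B), hypothetical
            [0.6356, 0.2490, 0.1154]]   # P(A = k | C), set to the marginals
estimates = [nu_estimate(marginals[k], [ev[k] for ev in by_event])
             for k in range(3)]
total = sum(estimates)  # generally differs from 1
```

Each individual estimate is a valid probability, but their sum is not constrained to 1; this is exactly the inconsistency that the constrained ν0^(3) of Section 5.2.4 is designed to remove.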
In the previous runs, the single data event-conditioned probabilities P(A(u) = k|D), P(A(u) = k|B), and P(A(u) = k|C) were set equal to the corresponding proportions over S0. Each data event D, B or C can take 3^4 = 81 possible combinations of data values. This small number of possible data value combinations ensures that most likely all such 81 combinations are present in the training image with a number of replicates greater than 10. This in turn ensures that the spatial estimates P*(A = k|D, B, C), such as the one given in Figure 5.5 based on the ν0 = 1 model,
are statistically significant. However, the statistical significance of the fully conditioned proportion P(A = k|D, B, C) shown in Figure 5.4 could be questioned. The statistics shown in Figures 5.5 and 5.6 pool together the 224,000 estimated conditional probabilities over S0, irrespective of the actual values of the conditioning data and the corresponding number of replicates. Note that there are 4 × 3 = 12 categorical (K = 3) indicator data grouped four by four into the three data events D, B, and C; therefore there is a total of K^12 = 3^12 = 531,441 possible data value combinations. Out of these 531,441 possible combinations, 475,271 (89%) were not found in S0, and of the remaining 56,170 only 195 combinations were found with at least 10 replicates. To ensure statistical significance we retain only these 195 data combinations for further analysis.
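The replicate filtering described above amounts to counting how often each 12-value data combination occurs over the scanned volume and retaining only the well-informed ones. A minimal sketch (the function name and toy data are our own, purely illustrative):

```python
from collections import Counter

def significant_combinations(combos, min_replicates=10):
    """Count replicates of each data-value combination (here a tuple of the
    12 concatenated indicator values of D, B, C) and keep only combinations
    observed at least `min_replicates` times."""
    counts = Counter(map(tuple, combos))
    return {c: n for c, n in counts.items() if n >= min_replicates}

# With K = 3 categories and 12 indicator data there are 3**12 possible combinations.
n_possible = 3 ** 12  # 531,441
```

In the case study this filter reduces the 56,170 combinations actually observed in S0 to the 195 used for all subsequent comparisons.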
5.2.4 Determining the ν0^(k) to ensure consistent probabilities
As was shown in Figure 5.7, the model ν0^(k) = 1 cannot be extended to all k = 1, 2, 3 since it may lead to inconsistent probabilities. To ensure consistency, the K weights ν0^(k) must verify the single but non-linear relation (5.11). Consider the model ν0^(1) = ν0^(2) = 1, where the two first nu parameters are set to 1, indicating no data interaction when evaluating A = 1 and A = 2. The third weight ν0^(3) is determined using the
The CI estimate (5.21) and the nu-model estimated probability P*(A = k|D, B, C) as given by (5.20) are plotted against the S0-exact proportion P(A = k|D, B, C) in Figure 5.8 for each category k = 1, 2, 3.
In the presence of strong data interaction (essentially between the two close-by data events D, B and the unknown event A), the no-data-interaction model ν0^(1) = ν0^(2) = 1 significantly outperforms the estimator based on the conditional independence assumption. That can be seen from Figure 5.8 (bottom), which shows that for the 195 retained data value combinations the correlation coefficients based on ν0^(1) = 1 and ν0^(2) = 1 are both equal to 0.67, while for the CI estimator these coefficients are only 0.14 and 0.34 for k = 1 and k = 2, respectively (Figure 5.8, top panels). Also, for k = 1, the CI estimator leads to illicit probabilities, i.e. probabilities greater than one (Figure 5.8, top left panel). This inconsistency comes from the fact that, out of the 195 retained data value combinations, the category k = 1 is more likely to be present in the training image than the other two categories, which makes the conditional independence assumption more likely to be invalid when evaluating P(A = 1|D, B, C) than for the other two categories. For k = 3, the constrained ν0^(3) model also significantly outperforms the estimator defined by conditional independence, with a correlation coefficient equal to 0.43 versus 0.18 (Figure 5.8, top and bottom right panels).
Table 5.1 also shows that the estimator defined by the nu model allows for a better reproduction of the spatial mean and variance of the reference across the 195 data value combinations. The conditional independence estimator tends to underestimate the spatial mean and to overestimate the spatial variance.
"S" a. i u S 0.8
5 0.6 s o 53 °-4
k=l k=2 k=3
o u
• ;
f
H
f * 1
4 * : • •
: .;
••
•.! •* * :• " * - • •
••,*.r . • v< • t * •
• . *. * •
' P=0-;14 :
*. . . . . ,«.* . . . . . .?
* • * » • • A.. , « V , * * •
.*V-"rV/ • » • .. •
. . ** . . * .i».... . ..
*
p40.34
.....; ? . ^ w ; . •.^•..•| ,.
0.3 0.4 0.5 0.6 0.7 0.8
P=o.i8 ';
J | • i i i
• • * •
• • . *
V*,-A
. . j . . . ; i . r 1 ' i M • i '
0.1 0.2 0.3 0.4 0.5 0.1 0.15 0.2 0.25 0 J 0.35
0.8 -
del
o *=«-— II -> 0.5-
0.4 -
9:
,
•
w .
=o,67 ; : ; , : .: :. : :•
</ •: .-.'^r-" ' . ». . : « . • • : . . •
• . • . . . f
« ' • * ' : '•• • * • « - • • • - . " - • »
.: «, . : . : •
•• i . 1 • i 1 1 1 i 0.3 0.4 0.5 0.6 0.7 0.8
reference
0.5 -
0.4 -
0 .3 -
0.2-
0.1 •
p=0.67
' . • . . . . •:
- . - . , v ; • : . - • •
- p=0.43
0.1 0.2 0.3 0.4 0.5
reference
1 1 i . 1 1 1 1 . 1 1 1 1 1 1 1 1 1 1 .
0.1 0.15 0.2 0.25 0.3 0.35
reference
Figure 5.8: Top: the reference proportion P(A = k|D, B, C) (x-axis) versus the estimated proportion P(A = k|D, B, C) based on the conditional independence assumption (y-axis) for k = 1, 2, 3. Bottom: the reference P(A = k|D, B, C) (x-axis) versus the estimated P(A = k|D, B, C) based on the ν0^(k) model (y-axis) for k = 1, 2, 3.
k = 1                      mean    variance
reference                  0.61    0.0161
ν0^(1) = 1                 0.66    0.0078
conditional independence   0.54    0.0574

k = 2                      mean    variance
reference                  0.24    0.0117
ν0^(2) = 1                 0.22    0.0073
conditional independence   0.18    0.0174

k = 3                      mean    variance
reference                  0.15    0.0030
ν0^(3)                     0.12    0.0002
conditional independence   0.096   0.0023

Table 5.1: Summary statistics for k = 1, 2, 3: spatial means and variances of reference conditional proportions and of approximations defined by the nu model and by the conditional independence assumption.
The ν0^(k) model based on equation (5.20) ensures consistency of the estimated probabilities.

Figure 5.9: Each axis represents the scores m^(1), m^(2), m^(3), respectively. Note, the available 195 data value combinations cluster into only 11 points in the 3D m^(k) space. The resulting classification of these 11 points is shown by different geometric shapes.
classes of similar scores by performing, for example, a k-means algorithm [59] which partitions the points into classes. Each training class prototype ν0^(k) value is the average of the proxy ν0^(k) values falling into that class. Figure 5.9 shows such a classification. Each axis represents the scores m^(1), m^(2), m^(3), respectively. Note, the available 195 data value combinations cluster into only 11 points in the 3D m^(k) space. The resulting classification of these 11 points is shown by different geometric shapes. In this case, there are four classes marked by stars, triangles, diamonds and squares, respectively.
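The classification step above can be sketched with a plain k-means followed by per-class averaging of the proxy nu values. The function names and toy points below are our own illustrative assumptions, not the 11 score points of Figure 5.9:

```python
import random

def assign(points, centers):
    """Label each point with the index of its nearest center (squared Euclidean)."""
    return [min(range(len(centers)),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])))
            for p in points]

def kmeans_labels(points, k, iters=50, seed=0):
    """Plain k-means on the 3D score points m = (m1, m2, m3); returns class labels."""
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(points, k)]
    for _ in range(iters):
        labels = assign(points, centers)
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = [sum(xs) / len(members) for xs in zip(*members)]
    return assign(points, centers)

def class_prototype_nu(proxy_nu, labels, k):
    """Prototype nu value per class = average of the proxy nu values in that class."""
    protos = {}
    for c in range(k):
        vals = [v for v, lab in zip(proxy_nu, labels) if lab == c]
        if vals:
            protos[c] = sum(vals) / len(vals)
    return protos
```

A new set of conditioning data is then mapped to its nearest class center and inherits that class's prototype ν0 value.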
For each set of actual data values (i.e. of conditioning data values), we then look for the training class (out of four possible) that is closest to the actual data statistics, and use the corresponding class prototype ν0^(k) value (instead of ν0^(k) = 1) for building the fully conditioned probability P(A = k|D, B, C), to ensure consistent probabilities. Had we used the original exact training ν0^(k) values without any classification and consequent averaging, the consistency relation (5.23) would have been met exactly.
Figure 5.10: Scattergram of reference proportion P(A = k|D, B, C) along the x-axis versus estimate P*(A = k|D, B, C) based on the classified ν0^(k) model for k = 1 (left), k = 2 (center), k = 3 (right) along the y-axis. The highest correlations are attributed to the categories k = 1 and k = 2. The smallest correlation is attributed to category k = 3, which has less spatial structure than the other two categories.
5.2.6 Inference robustness
To test the robustness of the proposed inference paradigm, we now consider a training data set different from the reference data set: the reference data set provides the conditioning data and the exact conditional proportions, while the training data set provides the proxy interaction nu parameter values.

For this purpose we draw 50 new realizations, S0^(l), l = 1, ..., 50, utilizing the same image generator code [56] previously used to draw the reference data set. These 50 training images are again of size 100 × 100 × 50. The averages of the 50 eroded realizations' (S0^(l)) means are given in Table 5.3 for k = 1, 2, 3. The corresponding reference means over S0 are also shown in that table.
k = 1                                mean    variance
reference                            0.61    0.0161
classified ν0^(1) model              0.60    0.0107

k = 2                                mean    variance
reference                            0.24    0.0117
classified ν0^(2) model              0.22    0.0073

k = 3                                mean    variance
reference                            0.15    0.0030
constrained classified ν0^(3)        0.19    0.0011

Table 5.2: Summary statistics: spatial means and variances of reference conditional probabilities and of estimates based on a classified nu representation.
The single datum event-conditioned probabilities P(A|B), P(A|C), P(A|D) are retrieved from the reference data set S0 shown in Figure 5.4. The proxy ν0^(k) values are retrieved from all 50 training realizations (S0^(l)) pooled into a single inference pool.
        training   reference
k = 1   0.665      0.636
k = 2   0.222      0.240
k = 3   0.113      0.115

Table 5.3: The average means of the 50 eroded training data sets for k = 1, 2, 3. For comparison, the right column shows the reference means.
In Table 5.4, we compare the spatial mean and variance of the 195 reference probability values P(A|D, B, C) to the corresponding spatial statistics of the estimate based on the nu representation. We define the classified proxy ν0^(k) model as the model based on the 50 training images, all different from the reference data set. In Table 5.5, the correlation coefficients of the 195 reference probability values P(A|D, B, C) with the estimate based on the nu representation are given for k = 1, 2, 3.
k = 1                                  mean    variance
reference                              0.61    0.0161
classified proxy ν0^(1) model          0.61    0.0092

k = 2                                  mean    variance
reference                              0.24    0.0117
classified proxy ν0^(2) model          0.22    0.0072

k = 3                                  mean    variance
reference                              0.15    0.0030
constrained classified proxy ν0^(3)    0.17    0.0005

Table 5.4: Summary statistics: spatial means and variances of the 195 reference conditional probabilities and of the estimates built from a nu representation.
Comparing these two tables with Figure 5.10 and Table 5.2, the classified proxy ν0^(k) model appears quite robust since the two tables provide very similar results. For example, where Table 5.4 tends to overestimate (underestimate) the spatial statistics, a similar trend can be observed in Table 5.2. This allows us to conclude that, no matter how approximate the training image is, it significantly improves on the results provided by the ν0 = 1 model, since such a training image provides insight into the data interaction seen in the study field.
                           k = 1   k = 2   k = 3
classified proxy ν0^(k)    0.69    0.66    0.52

Table 5.5: Correlations of the 195 reference proportion values P(A|D, B, C) with estimates based on a classified nu representation.
The key lesson learned from these case studies is that we must check the assumptions underlying any model (whether it is a no-data-interaction model or a model based on data independence). The applicability of each model ultimately depends on the physics of the data. For example, if conditional independence is inappropriate (as is often the case in geology-related applications), it should not be imposed for mere convenience, as doing so might lead to large bias and various order relation violations.
Chapter 6
Summary and conclusions
6.1 Summary of major theoretical developments
This thesis addresses the problem of integrating diverse data sources while accounting for the interaction between these data. We consider n data events D_1, ..., D_n that inform the same unknown event A. Each of the n + 1 events can be very complex, involving multiple locations in time and/or space. As an example, the unknown A could be indicative of the presence of channel sand connecting two wells. Data event D_1 is the indicator of facies at these two wells. Data event D_2 is the result of a seismic survey providing "soft" probabilities of the presence of channel at or around the same two wells.
In this thesis, we assume that each of the n + 1 events had been previously processed, providing the following probabilities:

1. the prior probability P(A = a), available from historic data, and

2. the datum-specific conditional probabilities P(A = a|D_i = d_i). Each of these probabilities captures the specific information about the unknown event A brought by the datum event D_i taken alone. This step is crucial. Many algorithms exist to process the information brought by a single individual data event into such conditioned probabilities P(A = a|D_i = d_i), e.g. indicator kriging [34], various
regressions including neural networks [3], [40], and [41], among others. However, the task of obtaining the probabilities P(A = a|D_i = d_i) is outside the scope of this thesis.
The goal of this thesis is to combine the prior probability P(A = a) and the n individually conditioned probabilities P(A = a|D_i = d_i) into an estimate or model of the fully conditioned probability P(A = a|D_1 = d_1, ..., D_n = d_n):