Nat. Hazards Earth Syst. Sci., 14, 2605–2626, 2014
www.nat-hazards-earth-syst-sci.net/14/2605/2014/
doi:10.5194/nhess-14-2605-2014
© Author(s) 2014. CC Attribution 3.0 License.

Bayesian network learning for natural hazard analyses

K. Vogel1,*, C. Riggelsen2, O. Korup1, and F. Scherbaum1

1Institute of Earth and Environmental Sciences, University of Potsdam, Germany
2Pivotal Software Inc., Palo Alto, USA
*Invited contribution by K. Vogel, recipient of the Outstanding Student Poster (OSP) Award 2012.

Correspondence to: K. Vogel ([email protected])

Received: 14 August 2013 – Published in Nat. Hazards Earth Syst. Sci. Discuss.: 22 October 2013
Revised: 26 June 2014 – Accepted: 19 August 2014 – Published: 29 September 2014
Abstract. Modern natural hazards research requires dealing with several uncertainties that arise from limited process knowledge, measurement errors, censored and incomplete observations, and the intrinsic randomness of the governing processes. Nevertheless, deterministic analyses are still widely used in quantitative hazard assessments despite the pitfall of misestimating the hazard and any ensuing risks.

In this paper we show that Bayesian networks offer a flexible framework for capturing and expressing a broad range of uncertainties encountered in natural hazard assessments. Although Bayesian networks are well studied in theory, their application to real-world data is far from straightforward, and requires specific tailoring and adaptation of existing algorithms. We offer suggestions as to how to tackle frequently arising problems in this context and mainly concentrate on the handling of continuous variables, incomplete data sets, and the interaction of both. By way of three case studies from earthquake, flood, and landslide research, we demonstrate the method of data-driven Bayesian network learning, and showcase the flexibility, applicability, and benefits of this approach.

Our results offer fresh and partly counterintuitive insights into well-studied multivariate problems of earthquake-induced ground motion prediction, accurate flood damage quantification, and spatially explicit landslide prediction at the regional scale. In particular, we highlight how Bayesian networks help to express information flow and independence assumptions between candidate predictors. Such knowledge is pivotal in providing scientists and decision makers with well-informed strategies for selecting adequate predictor variables for quantitative natural hazard assessments.
1 Introduction
Natural hazards such as earthquakes, tsunamis, floods, landslides, or volcanic eruptions have a wide range of differing causes, triggers, and consequences. Yet the art of predicting such hazards essentially addresses very similar issues in terms of model design: the underlying physical processes are often complex, while the number of influencing factors is large. The single and joint effects of the driving forces are not always fully understood, which introduces a potentially large degree of uncertainty into any quantitative analysis. Additionally, observations that form the basis for any inference are often sparse, inaccurate, and incomplete, adding yet another layer of uncertainty. For example, Merz et al. (2013) point out the various sources of uncertainty (scarce data, poor understanding of the damaging process, etc.) in the context of flood damage assessments, while Berkes (2007) calls attention to the overall complexity of human–environment systems, as well as the importance of understanding underlying uncertainties to improve resilience. Similarly, Bommer and Scherbaum (2005) discuss the importance of capturing uncertainties in seismic hazard analyses to balance between investments in provisions of seismic resistance and possible consequences in the case of insufficient resistance.

Nevertheless, deterministic approaches are still widely used in natural hazards assessments. Such approaches rarely provide information on the uncertainty related to parameter estimates beyond the use of statistical measures of dispersion such as standard deviations or standard errors about empirical means. However, uncertainty is a carrier of information to the same extent as a point estimate, and ignoring it or dismissing it as simply an error may entail grave consequences. Ignoring uncertainties in quantitative hazard appraisals may have disastrous effects, since it often leads to over- or underestimates of certain event magnitudes. Yet deterministic approaches persist as the state of the art in many applications. For example, tsunami early warning systems evaluate pre-calculated synthetic databases and pick out the scenario that appears closest to a given situation in order to estimate its hazard (Blaser et al., 2011). Recently developed models for flood damage assessments use classification approaches, where the event under consideration is assigned to its corresponding class, and the caused damage is estimated by taking the mean damage of all observed events belonging to the same class (Elmer et al., 2010). In seismic hazard analysis the usage of regression-based ground motion models is common practice, restricting the model to the chosen functional form, which is defined based on physical constraints (Kuehn et al., 2009).

Published by Copernicus Publications on behalf of the European Geosciences Union.
In this paper we consider Bayesian networks (BNs), which we argue are an intuitive, consistent, and rigorous way of quantifying uncertainties. Straub (2005) underlines the large potential of BNs for natural hazard assessments, heralding not only the ability of BNs to model various interdependences but also their intuitive format: the representation of (in)dependences between the involved variables in a graphical network enables improved understanding and direct insights into the relationships and workings of a natural hazard system. The conditional relationships between dependent variables are described by probabilities, from which not only the joint distribution of all variables but any conditional probability distribution of interest can be derived. BNs thus support quantitative analyses of specific hazard scenarios or process-response chains.

In recent years, BNs have been used in avalanche risk assessment (e.g., Grêt-Regamey and Straub, 2006), tsunami early warning (e.g., Blaser et al., 2009, 2011), earthquake risk management (e.g., Bayraktarli and Faber, 2011), probabilistic seismic hazard analysis (e.g., Kuehn et al., 2011), and earthquake-induced landslide susceptibility (e.g., Song et al., 2012). Aguilera et al. (2011) give an overview of applications of BNs in the environmental sciences between 1990 and 2010, and conclude that the potential of BNs remains underexploited in this field. This is partly because, even though BNs are well studied in theory, their application to real-world data is not straightforward. Handling of continuous variables and incomplete observations remains the key problem. This paper aims to overcome these challenges. Our objective is to briefly review the technique of learning BNs from data, and to suggest possible solutions to implementation problems that derive from the uncertainties mentioned above. We use three examples of natural hazard assessments to discuss the demands of analyzing real-world data, and highlight the benefits of applying BNs in this regard.
In our first example (Sect. 3), we develop a seismic ground motion model based on a synthetic data set, which serves to showcase some typical BN properties. In this context we demonstrate a method to deal with continuous variables without any prior assumptions on their distributional family. In Sect. 4 we use data that were collected after the 2002 and 2005/2006 floods in the Elbe and Danube catchments, Germany, to learn a BN for flood damage assessments. This example is emblematic of situations where data are incomplete, and requires a treatment of missing observations, which can be challenging in combination with continuous variables. Our final example in Sect. 5 deals with a regional landslide susceptibility model for Japan, where we investigate how the same set of potential predictors of slope stability may produce nearly equally well performing, though structurally different, BNs that reveal important and often overlooked variable interactions in landslide studies. This application further illustrates the model uncertainty related to BN learning.

Figure 1. The BN for the burglary example. The graph structure illustrates the dependence relations of the involved variables: the alarm can be triggered by a burglary or an earthquake. An earthquake might be reported in the radio newscast. The joint distribution of all variables can be decomposed into the product of its conditionals accordingly: P(B, E, A, R) = P(B)P(E)P(A|B, E)P(R|E).
2 Bayesian networks (BNs)
The probabilistic framework of BNs relies on the theorem formulated by Reverend Thomas Bayes (1702–1761), and expresses how to update probabilities in light of new evidence (McGrayne, 2011). By combining probability theory with graph theory, BNs depict probabilistic dependence relations in a graph: the nodes of the graph represent the considered random variables, while (missing) edges between the nodes illustrate the conditional (in)dependences between the variables. Textbooks often refer to the burglary alarm scenario for a simple illustration of BNs (Pearl, 1988). In this example, the alarm of your home may not only be triggered by burglary but also by earthquakes. Moreover, earthquakes have a chance to be reported in the news. Figure 1 shows the dependence relations of these variables as captured by a BN. Now, imagine you get a call from your neighbor notifying you that the alarm went off. Supposing the alarm was triggered by burglary, you drive home. On your way home you hear the radio reporting a nearby earthquake. Even though burglaries and earthquakes may be assumed to occur independently, the radio announcement changes your belief in the burglary, as the earthquake "explains away" the alarm. BNs
Table 1. Conditional probabilities in the burglary example, giving the conditional probabilities for earthquake (e), burglary (b), alarm (a), and earthquake reported (r). The parameters that define the conditional distributions correspond for discrete variables to the conditional (point) probabilities. Note that the conditional probability values for no earthquake (ē), no burglary (b̄), etc. can be derived from the fact that the conditionals sum up to 1.

θ_e = p(e) = 0.001          θ_{a|e,b} = p(a|e, b) = 0.98
θ_b = p(b) = 0.01           θ_{a|ē,b} = p(a|ē, b) = 0.95
θ_{r|e} = p(r|e) = 0.95     θ_{a|e,b̄} = p(a|e, b̄) = 0.95
θ_{r|ē} = p(r|ē) = 0.001    θ_{a|ē,b̄} = p(a|ē, b̄) = 0.03
offer a mathematically consistent framework to conduct and specify reasonings of this kind. A detailed introduction to BNs is provided in Koller and Friedman (2009) and Jensen and Nielsen (2001), while Fenton and Neil (2012) offers easy and intuitive access. In this paper we restrict ourselves to several key aspects of the BN formalism.
2.1 Properties and benefits
Applying BNs to natural hazard assessments, we define the specific variables of the hazard domain to be the nodes in a BN. In the following we denote this set of random variables as X = {X1, ..., Xk}. The dependence relations between the variables are encoded in the graph structure, generating a directed acyclic graph (DAG). The directions of the edges define the flow of information, but do not necessarily indicate causality. As we shall see in subsection "Learned ground motion model" of Sect. 3.2, it may prove beneficial to direct edges counterintuitively in order to fulfill regularization constraints. The set of nodes from which edges are directed to a specific node, Xi, is called the parent set, XPa(i), of Xi (see Fig. 2). Table 2 summarizes the notations used in this paper.
Apart from the graph structure, a BN is defined by conditional probabilities that specify the dependence relations encoded in the graph structure. The conditional probability distribution for each variable, Xi, is given conditioned on its parent set: p(Xi|XPa(i)). For simplification we restrict ourselves here to discrete variables, for which θ is the set of conditional (point) probabilities for each combination of states for Xi and XPa(i): θ = {θ_{xi|xPa(i)} = p(xi|xPa(i))}. The conditional probabilities for the burglary BN example are given in Table 1. For continuous variables, the design of the parameters depends on the family of distributions of the particular densities p(·|·).

Given the BN structure (DAG) and parameters (θ), it follows from the axioms of probability theory that the joint distribution of all variables can be factorized into a product of conditional distributions:

P(X|DAG, θ) = ∏_{i=1}^{k} p(Xi|XPa(i)).   (1)
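Equation (1) can be checked numerically for the burglary network: multiplying the four conditionals of Table 1 for every combination of states yields a properly normalized joint distribution. A minimal sketch, with states coded 0/1:

```python
from itertools import product

# conditionals of the burglary BN (Table 1)
p_e = {1: 0.001, 0: 0.999}
p_b = {1: 0.01, 0: 0.99}
p_r_given_e = {1: {1: 0.95, 0: 0.05}, 0: {1: 0.001, 0: 0.999}}
p_a1_given_eb = {(1, 1): 0.98, (0, 1): 0.95, (1, 0): 0.95, (0, 0): 0.03}

def joint(b, e, a, r):
    """Eq. (1) for the burglary DAG: P(B)P(E)P(A|B,E)P(R|E)."""
    pa = p_a1_given_eb[(e, b)] if a else 1.0 - p_a1_given_eb[(e, b)]
    return p_b[b] * p_e[e] * pa * p_r_given_e[e][r]

# summing the product of conditionals over all 16 states gives 1
total = sum(joint(*s) for s in product((0, 1), repeat=4))
```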
Figure 2. Illustration of a parent set in a BN. XPa(i) is the parent set of Xi.
Further, applying Bayes' theorem, P(A|B) = P(A, B)/P(B) = P(B|A)P(A)/P(B), each conditional probability of interest can be derived. In this way a BN is characterized by many attractive properties that we may profit from in a natural hazard setting, including the following:
– Property 1 – graphical representation: the interactions of the variables of the entire "system" are encoded in the DAG. The BN structure thus provides information about the underlying processes and the way various variables communicate and share "information" as it is propagated through the network.

– Property 2 – use prior knowledge: the intuitive interpretation of a BN makes it possible to define the BN based on prior knowledge; alternatively it may be learned from data, or even a combination of the two (cast as a Bayesian statistical problem) by posing a prior BN and updating it based on observations (see below for details).

– Property 3 – identify relevant variables: by learning the BN from data we may identify the variables that are (according to the data) relevant; "islands" or isolated single unconnected nodes indicate potentially irrelevant variables.

– Property 4 – capture uncertainty: uncertainty can easily be propagated between any nodes in the BN; we effectively compute or estimate probability distributions rather than single-point estimates.

– Property 5 – allow for inference: instead of explicitly modeling the conditional distribution of a predefined target variable, the BN captures the joint distribution of all variables. Via inference, we can express any given or all conditional distribution(s) of interest, and reason in any direction (including forensic and inverse reasoning): for example, for a given observed damage we may infer the likely intensity of the causing event. A detailed example for reasoning is given in Sect. 4.3.
Table 2. Summary of notations used in this paper.

Notation – Meaning
Xi – a specific variable
xi – a realization of Xi
X = {X1, ..., Xk} – set of the considered variables
XPa(i) – parent set of Xi
xPa(i) – a realization of the parent set
X−Y – all variables but Y
DAG – directed acyclic graph (graph structure)
p(Xi|XPa(i)) – conditional probability of a variable conditioned on its parent set
θ_{xi|xPa(i)} – parameter that defines the probability for xi given xPa(i)
θ = {θ_{xi|xPa(i)}} – set of model parameters that defines the conditional distributions
Θ – random variable for the set of model parameters
BN: (DAG, θ) – Bayesian network, defined by the pair of structure and parameters
d – discrete/discretized data set that is used for BN learning
d^c – (partly) continuous data set that is used for BN learning
Λ – discretization that bins the original data d^c into d
XMB(i) – set of variables that form the Markov blanket of Xi (Sect. 4.2)
Ch(i) – variable indices of the children of Xi (Sect. 4.2)
Note that inference in BNs is closed under restriction, marginalization, and combination, allowing for fast (close to immediate) and exact inference.

– Property 6 – use incomplete observations: during predictive inference (i.e., computing a conditional distribution), incomplete observations of data are not a problem for BNs. By virtue of the probability axioms, it merely impacts the overall uncertainty involved.
In the following we will refer to these properties 1–6 in order to clarify what is meant. For "real-life" modeling problems, including those encountered in natural hazard analysis, adhering strictly to the BN formalism is often a challenging task. Hence, the properties listed above may seem unduly theoretical. Yet many typical natural hazard problems can be formulated around BNs by taking advantage of these properties. We take a data-driven stance and thus aim to learn BNs from collected observations.
2.2 Learning Bayesian networks
Data-based BN learning can be seen as an exercise in finding a BN which, according to the decomposition in Eq. (1), could have been "responsible for generating the data". For this we traverse the space of BNs (Castelo and Kocka, 2003) looking for a candidate maximizing a fitness score that reflects the "usefulness" of the BN. This should, however, be done with careful consideration of the issues always arising in the context of model selection, i.e., over-fitting, generalization, etc. Several suggestions for BN fitness scoring are derived from different theoretical principles and ideas (Bouckaert, 1995). Most of them are based on the maximum likelihood estimation for different DAG structures according to Eq. (1).
In this paper we opt for a Bayesian approach to learn BNs (note that BNs are not necessarily to be interpreted from a Bayesian statistical perspective). Searching for the most probable BN, (DAG, θ), given the observed data, d, we aim to maximize the BN MAP (Bayesian network maximum a posteriori) score suggested by Riggelsen (2008):

P(DAG, Θ|d) ∝ P(d|DAG, Θ) P(Θ, DAG),   (2)

where the left-hand side is the posterior and the right-hand side the product of likelihood and prior. The likelihood term decomposes according to Eq. (1). The prior encodes our prior belief in certain BN structures and parameters. This allows us to assign domain-specific prior preferences to specific BNs before seeing the data (Property 2) and thus to compensate for sparse data, artifacts, bias, etc. In the following applications we use a non-informative prior, which nevertheless fulfills a significant function. Acting as a penalty term, the prior regularizes the DAG complexity and thus avoids over-fitting. Detailed descriptions of the prior and likelihood terms are given in Appendix A1 and Riggelsen (2008).
The following section illustrates the BN formalism "in action" and will also underscore some theoretical and practical problems along with potential solutions in the context of BN learning. We will learn a ground motion model, which is used in probabilistic seismic hazard analysis, as a BN; the data are synthetically generated. Subsequently, we consider two other natural hazard assessments where we learn BNs from real-world data.
Table 3. Variables used in the ground motion model and the corresponding distributions used for the generation of the synthetic data set which is used for BN learning.

Xi – Description – Distribution [range]

Predictors:
M – moment magnitude of the earthquake – U[5, 7.5]
R – source-to-site distance – Exp[1 km, 200 km]
SD – stress released during the earthquake – Exp[0 bar, 500 bar]
Q0 – attenuation of seismic wave amplitudes in deep layers – Exp[0 s^−1, 5000 s^−1]
κ0 – attenuation of seismic wave amplitudes near the surface – Exp[0 s, 0.1 s]
VS30 – average shear-wave velocity in the upper 30 m – U[600 m s^−1, 2800 m s^−1]

Ground motion parameter:
PGA – horizontal peak ground acceleration – according to the stochastic model (Boore, 2003)
3 Seismic hazard analysis: ground motion models
When it comes to decision making on the design of high-risk facilities, the hazard arising from earthquakes is an important aspect. In probabilistic seismic hazard analysis (PSHA) we calculate the probability of exceeding a specified ground motion for a given site and time interval. One of the most critical elements in PSHA, often carrying the largest amount of uncertainty, is the ground motion model. It describes the conditional probability of a ground motion parameter, Y, such as (horizontal) peak ground acceleration, given earthquake- and site-related predictor variables, X−Y. Ground motion models are usually regression functions, where the functional form is derived from expert knowledge and the ground motion parameter is assumed to be lognormally distributed: ln Y = f(X−Y) + ε, with ε ~ N(0, σ²). The definition of the functional form of f(·) is guided by physical model assumptions about the single and joint effects of the different parameters, but also contains some ad hoc elements (Kuehn et al., 2011). Using the Bayesian network approach, no prior knowledge is required per se, but if present it can be accounted for by encoding it in the prior term of Eq. (2). If no reliable prior knowledge is available, we work with a non-informative prior, and the learned graph structure provides insight into the dependence structure of the variables and helps in gaining a better understanding of the underlying mechanism (Property 1). Modeling the joint distribution of all variables, X = {X−Y, Y}, the BN implicitly provides the conditional distribution P(Y|X−Y, DAG, Θ), which gives the probability of the ground motion parameter for specific event situations needed for the PSHA (Property 5).
3.1 The data
The event situation is described by the predictor variables X−Y = {M, R, SD, Q0, κ0, VS30}, which are explained in Table 3. We generate a synthetic data set consisting of
Figure 3. When working with continuous variables, we have to make assumptions about the functional form of the probability distributions (gray), e.g., (a) exponential, (b) normal, and (c) uniform. Thus we restrict the distributions to certain shapes that may not match reality. In contrast, using a discrete multinomial distribution (black), each continuous distribution can be approximated and we avoid prior restrictions on the shape. Rather, the shape is learned from the data by estimating the probability for each interval.
10 000 records. The ground motion parameter, Y, is the horizontal peak ground acceleration (PGA). It is generated by a so-called stochastic model, which is described in detail by Boore (2003). The basic idea is to distort the shape of a random time series according to physical principles and thus to obtain a time series with properties that match the ground-motion characteristics. The predictor variables are either uniformly (U) or exponentially (Exp) distributed within a particular interval (see Table 3).
The stochastic model does not have good analytical properties, and its usage is non-trivial and time consuming. Hence, surrogate models, which describe the stochastic model in a more abstract sense (e.g., regressions), are used in PSHA instead. We show that BNs may be seen as a viable alternative to the classical regression approach. However, before doing so, we need to touch upon some practical issues arising when learning BNs from continuous data.

For continuous variables we need to define the distributional family for the conditionals p(·|·) and thus make assumptions about the functional form of the distribution. To avoid such assumptions and "let the data speak", we discretize the continuous variables, thus allowing for
Figure 4. Representation of the dependency assumptions in the discretization approach: the dependency relations of the variables are captured by their discrete representations (gray-shaded area). A continuous variable, Xc_i, depends only on its discrete counterpart, Xi.
completely data-driven and distribution-free learning (see Fig. 3). In the following subsection we describe an automatic discretization, which is part of the BN learning procedure and takes the dependences between the single variables into account. However, the automatic discretization does not necessarily result in a resolution that matches the requirements for prediction purposes or decision support. To increase the potential accuracy of predictions, we approximate, once the network structure is learned, the continuous conditionals with mixtures of truncated exponentials (MTE), as suggested by Moral et al. (2001). More on this follows in Sect. 3.3.
3.2 Automatic discretization for structure learning
The range of existing discretization procedures differs in their course of action (supervised vs. unsupervised, global vs. local, top-down vs. bottom-up, direct vs. incremental, etc.), their speed, and their accuracy. Liu et al. (2002) provide a systematic study of different discretization techniques, while Hoyt (2008) concentrates on their usage in the context of BN learning. The choice of a proper discretization technique is anything but trivial, as the different approaches result in different levels of information loss. For example, a discretization conducted as a pre-processing step to BN learning does not account for the interplay of the variables and often misses information hidden in the data. To keep the information loss small, we use a multivariate discretization approach that takes the BN structure into account. The discretization is defined by a set of interval boundary points for all variables, forming a grid. All data points of the original continuous (or partly continuous) data set, d^c, that lie in the same grid cell correspond to the same value in the discretized data set, d. In a multivariate approach, the "optimal" discretization, denoted by Λ, depends on the structure of the BN and the observed data, d^c. Similar to Sect. 2.2, we again cast the problem in a Bayesian framework, searching for the combination of (DAG, θ, Λ) that has the highest posterior probability given the data,
Figure 5. For the discretization approach each multivariate continuous distribution (a) is characterized by a discrete distribution that captures the dependence relations (b) and a continuous uniform distribution over each grid cell (c). For exemplification assume we consider two dependent, continuous variables: $X^c_1$ and $X^c_2$. (a) shows a possible realization of a corresponding sample. According to Monti and Cooper (1998) we now assume that we can find a discretization, such that the resulting discretized variables $X_1$ and $X_2$ capture the dependence relation between $X^c_1$ and $X^c_2$. This is illustrated by (b), where the shading of the grid cells corresponds to their probabilities (which are defined by $\theta$). A darker color means that we expect more realizations in this grid cell. Further, we say that, within each grid cell, the realizations are uniformly distributed, as illustrated in (c).
$$\underbrace{P(\mathrm{DAG}, \Theta, \Lambda \mid d^c)}_{\text{posterior}} \;\propto\; \underbrace{P(d^c \mid \mathrm{DAG}, \Theta, \Lambda)}_{\text{likelihood}} \;\underbrace{P(\mathrm{DAG}, \Theta, \Lambda)}_{\text{prior}}. \qquad (3)$$
Let us consider the likelihood term: expanding on an idea by Monti and Cooper (1998), we assume that all communication/flow of information between the variables can be captured by their discrete representations (see Fig. 4) and is defined by the parameters $\theta$. Thus only the distribution of the discrete data $d$ depends on the network structure, while the distribution of the continuous data $d^c$ is, for given $d$, independent of the DAG (see Figs. 4 and 5). Consequently the likelihood for observing $d^c$ (for a given discretization, network structure and parameters) can be written as

$$P(d^c \mid \mathrm{DAG}, \Theta, \Lambda) = P(d^c \mid d, \Lambda)\, P(d \mid \mathrm{DAG}, \Theta, \Lambda) \qquad (4)$$

and Eq. (3) decomposes into

$$P(\mathrm{DAG}, \Theta, \Lambda \mid d^c) \propto \underbrace{P(d^c \mid d, \Lambda)}_{\text{continuous data}} \;\underbrace{P(d \mid \mathrm{DAG}, \Theta, \Lambda)}_{\text{likelihood (discrete)}} \;\underbrace{P(\mathrm{DAG}, \Theta, \Lambda)}_{\text{prior}}.$$
Nat. Hazards Earth Syst. Sci., 14, 2605–2626, 2014
www.nat-hazards-earth-syst-sci.net/14/2605/2014/
K. Vogel et al.: Bayesian network learning for natural hazard
analyses 2611
Figure 6. Theoretic BN for the ground motion model. It captures the known dependences of the data-generating model.
The likelihood (discrete) term is now defined as for the separate BN learning for discrete data (Sect. 2.2), and we use a non-informative prior again. For the continuous data, we assume that all continuous observations within the same interval defined by $\Lambda$ have the same probability (Fig. 5). More information about the score definition can be found in Appendix A1, and technical details are given in Vogel et al. (2012, 2013). In the following we discuss the BN and discretization learned from the synthetic seismic data set.
Learned ground motion model
Since we generated the data ourselves, we know which (in)dependences the involved variables should adhere to; this is expected to be reflected in the BN DAG we learn from the synthetic data (Properties 1, 3). Due to data construction, the predictor variables M, R, SD, Q0, κ0, and VS30 are independent of each other and PGA depends on the predictors. Figure 6 shows the dependence structure of the variables. The converging edges at PGA indicate that the predictors become conditionally dependent for a given PGA. This means that, for a given PGA, they carry information about each other; for example, for an observed large PGA value, a small stress drop indicates a close distance to the earthquake. The knowledge about the dependence relations gives the opportunity to use the seismic hazard application for an inspection of the BN learning algorithm regarding the reconstruction of the dependences from the data, which is done in the following.

The network that we found to maximize $P(\mathrm{DAG}, \Theta, \Lambda \mid d^c)$ for the 10 000 synthetic seismic data records is shown in Fig. 7. The corresponding discretization that was found is plotted in Fig. 8, which shows the marginal distributions of the discretized variables. The learned BN differs from the original one, mainly due to regularization constraints, as we will explain in the following: as mentioned in Sect. 2, the joint distribution
Figure 7. BN for the ground motion model learned from the generated synthetic data. It captures the most dominant dependences. Less distinctive dependences are neglected for the sake of parameter reduction.
of all variables can be decomposed into the product of the conditionals according to the network structure (see Eq. 1). For discrete/discretized variables, the number of parameters needed for the definition of $p(X_i \mid X_{\mathrm{Pa}(i)})$ in Eq. (1) corresponds to the number of possible state combinations for $(X_i, X_{\mathrm{Pa}(i)})$. Taking the learned discretization shown in Fig. 8, the BN of the data-generating process (Fig. 6) is defined by 3858 parameters, 3840 needed alone for the description of $p(\mathrm{PGA} \mid M, R, \mathrm{SD}, Q_0, \kappa_0, V_{S30})$. A determination of that many parameters from 10 000 records would lead to a strongly over-fitted model. Instead we learn a BN that compromises between model complexity and its ability to generate the original data. The BN learned under these requirements (Fig. 7) consists of only 387 parameters and still captures the most relevant dependences.
Figure 9 shows the ln PGA values of the data set plotted against the single predictors. A dependence on stress drop (SD) and distance (R) is clearly visible. These are also the two variables with remaining converging edges on PGA, revealing that, for a given PGA, SD contains information about R and vice versa. The dependences between PGA and the remaining predictors are much less distinctive, such that the conditional dependences between the predictors are negligible and the edges can be reversed for the benefit of parameter reduction. The connection to VS30 is neglected completely, since its impact on PGA is of minor interest compared to the variation caused by the other predictors.

Note that the DAG of a BN actually maps the independences (not the dependences) between the variables. This means that each (conditional) independence statement encoded in the DAG must be true, while encoded dependence relations need not hold per se (see Fig. 10 for explanation). In turn this implies that each dependence holding for the data should be encoded in the DAG. The learning approach
Figure 8. Marginal distribution of the variables included in the ground motion model, discretized according to the discretization learned for the BN in Fig. 7. The number of intervals per variable ranges from 2 to 8.
applied here fulfills the task quite well, detecting the relevant dependences, while keeping the model complexity at a moderate level.

The model complexity depends not only on the DAG but also on the discretization. A complex DAG will enforce a small number of intervals, and a large number of intervals will only be chosen for variables with a strong influence on other variables. This effect is also visible for the learned discretization (Fig. 8). PGA is split into eight intervals, distance and stress drop into four and five, respectively, and the other variables consist of only two to three intervals.
3.3 Approximation of continuous distributions with mixtures of truncated exponentials (MTEs)

A major purpose of the ground motion model is the prediction of the ground motion (ln PGA) based on observations of the predictors; hence, although the BN captures the joint distribution (Property 5) of all involved variables, the focus in this context is on a single variable. The accuracy of the prediction is limited by the resolution of the discretization learned for the variable. For the BN shown above, the discretization of the target variable into eight intervals enables a quite precise approximation of the continuous distribution, but this is not the case per se. Complex network structures and smaller data sets used for BN learning lead to a coarser discretization of the variables. To enable precise estimates, we may search for alternative approximations of the continuous conditional distributions (or at least some of them, in particular those of the primary variable(s) of interest) once the BN has been learned.
Moral et al. (2001) suggest using MTEs for this purpose, since they allow for the approximation of a variety of functional shapes with a limited number of parameters (Langseth and Nielsen, 2008) and they are closed under the operations used for BN inference: restriction, combination, and marginalization (Langseth et al., 2009). The basic idea is to approximate conditional distributions $p(X_i \mid X_{\mathrm{Pa}(i)})$ with a combination/mixture of truncated exponential distributions. For this purpose the domain of $(X_i, X_{\mathrm{Pa}(i)})$ is partitioned into hypercubes $D_1, \ldots, D_L$, and the density within each hypercube, $D_l$, is defined such that it follows the form

$$p^{\downarrow D_l}\bigl(X_i \mid X_{\mathrm{Pa}(i)}\bigr) = a_0 + \sum_{j=1}^{J} a_j\, e^{\,b_j X_i + c_j^{T} X_{\mathrm{Pa}(i)}}. \qquad (5)$$
The determination of the hypercubes and the number of exponential terms in each hypercube, as well as the estimation of the single parameters, is done according to the maximum likelihood approach described in Langseth et al. (2010). In the following we show how the MTE approximation improves the BN prediction performance compared to the usage of the discretized variables, and we compare the results to those from a regression approach.
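As a toy illustration of Eq. (5), the density on one hypercube with a single continuous parent and two exponential terms can be evaluated directly. All coefficients below are made up for illustration, not fitted values:

```python
# Sketch: evaluating an MTE density of the form of Eq. (5) on one
# hypercube D_l, for one variable x_i with a single continuous parent
# x_pa and J = 2 exponential terms. Coefficients are illustrative.
import math

def mte_density(x_i, x_pa, a0, terms):
    """terms: list of (a_j, b_j, c_j); returns a0 + sum_j a_j * exp(b_j*x_i + c_j*x_pa)."""
    return a0 + sum(a * math.exp(b * x_i + c * x_pa) for a, b, c in terms)

val = mte_density(0.5, 0.2, a0=0.1,
                  terms=[(0.5, -1.0, 0.3), (0.2, 0.8, -0.5)])
print(round(val, 3))  # 0.692
```

Fitting replaces these hand-picked coefficients with maximum likelihood estimates per hypercube, as in Langseth et al. (2010).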
Prediction performance
We conduct a 10-fold cross validation to evaluate the prediction performance of the BN compared to the regression
Figure 9. The individual panels show the dependences between the predictor variables M, R, SD, Q0, κ0, and VS30 and the target variable ln PGA by plotting the data used to learn the BN for ground motion modeling.
approach: the complete data set is divided into 10 disjoint subsamples, of which one is defined as a test set in each trial while the others are used to learn the model (regression function or BN). The functional form of the regression function is determined by expert knowledge based on the description of the Fourier spectrum of seismic ground motion and follows the form

$$f(X_{\vec{Y}}) = a_0 + a_1 M + a_2 M \ln \mathrm{SD} + (a_3 + a_4 M) \ln\sqrt{a_5^2 + R^2} + a_6 \kappa R + a_7 V_{S30} + a_8 \ln \mathrm{SD},$$

with $\kappa = \kappa_0 + t^*$, $t^* = R/(Q_0 V_{sq})$ and $V_{sq} = 3.5$ km s$^{-1}$.
We compare the regression approach in terms of prediction performance to the BN with discretized variables and with MTE approximations. For this purpose we determine the conditional density distributions of ln PGA given the predictor variables for each approach and consider how much probability it assigns to the real ln PGA value in each observation. For the regression approach the conditional density follows a normal distribution, $\mathcal{N}(f(X_{\vec{Y}}), \sigma^2)$, while it is defined via the DAG and the parameters $\theta$ using the BN models. Table 4a shows for each test set the conditional density value of the observed ln PGA averaged over the individual records. Another measure for the prediction performance is the mean squared error of the estimates for ln PGA (Table 4b). Here the point estimate for ln PGA is defined as the mean value of the conditional density. For example, in the regression model the estimate corresponds to $f(x_{\vec{Y}})$.
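The evaluation protocol can be sketched as follows; an ordinary least-squares fit stands in for the actual BN and regression learners, and the data are synthetic placeholders:

```python
# Sketch of the 10-fold cross-validation loop: split the records into
# 10 disjoint subsamples, hold one out per trial, fit on the rest, and
# record the mean squared error on the held-out records.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(scale=0.1, size=100)

folds = np.array_split(rng.permutation(100), 10)
mse = []
for test_idx in folds:
    train_idx = np.setdiff1d(np.arange(100), test_idx)
    # stand-in learner: ordinary least squares on the training folds
    coef, *_ = np.linalg.lstsq(X[train_idx], y[train_idx], rcond=None)
    resid = y[test_idx] - X[test_idx] @ coef
    mse.append(np.mean(resid ** 2))

print(f"avg MSE over 10 trials: {np.mean(mse):.3f}")
```

Averaging the per-trial scores, as in Table 4, keeps the comparison between models on identical train/test splits.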
Even though the discretization of ln PGA is relatively precise using the discrete BNs (eight intervals in each trial, except for the first trial, where ln PGA is split into seven intervals), the MTE approximation of the conditional distributions improves the prediction performance of the BN. Still, it does not entirely match the precision of the regression function. However, the prediction performances are on the same order of magnitude, and we must not forget that the success of the regression approach relies on the expert knowledge used to define its functional form, while the structure of the BN is learned in a completely data-driven manner. Further, the regression approach profits in this example from the fact that the target variable (ln PGA) is normally distributed, which is not necessarily the case for other applications. Focusing on the prediction of the target variable, the regression approach also does not have the flexibility of the BN, which is designed to capture the joint distribution of all variables and thus allows for inference in all directions (Property 5),
Table 4. Results of a 10-fold cross validation to test the prediction performance of the BN (with discrete and MTE approximations of the conditional distributions) and the regression approach. (a) contains the calculated conditional densities for the observed ln PGA values averaged over each trial. (b) contains the mean squared error of the predicted ln PGA for each trial.
(a) Averaged conditional density

Trial | BN (discrete) | BN (MTE) | Regression
1     | 0.237 | 0.320 | 0.331
2     | 0.240 | 0.297 | 0.329
3     | 0.239 | 0.298 | 0.331
4     | 0.218 | 0.255 | 0.323
5     | 0.216 | 0.260 | 0.339
6     | 0.222 | 0.257 | 0.339
7     | 0.215 | 0.252 | 0.332
8     | 0.243 | 0.317 | 0.330
9     | 0.212 | 0.249 | 0.328
10    | 0.243 | 0.315 | 0.331
Avg.  | 0.229 | 0.282 | 0.331

(b) Mean squared error

Trial | BN (discrete) | BN (MTE) | Regression
1     | 1.021 | 0.749 | 0.663
2     | 1.197 | 0.963 | 0.680
3     | 1.082 | 0.821 | 0.673
4     | 1.262 | 0.951 | 0.723
5     | 1.201 | 0.851 | 0.629
6     | 1.298 | 1.059 | 0.625
7     | 1.297 | 1.077 | 0.672
8     | 1.149 | 0.713 | 0.701
9     | 1.343 | 1.161 | 0.692
10    | 1.169 | 0.841 | 0.666
Avg.  | 1.202 | 0.919 | 0.672
as exemplified in Sect. 4.3. Additional benefits of BNs, like their ability to make use of incomplete observations, will be revealed in the following sections, where we investigate real-world data.
4 Flood damage assessment
In the previous section we dealt with a fairly small BN (a few variables/nodes) and a synthetic data set. In this section we go one step further and focus on learning a larger BN from real-life observations on damage caused to residential buildings by flood events. Classical approaches, so-called stage–damage functions, relate the damage for a certain class of objects to the water stage or inundation depth, while other characteristics of the flooding situation and the flooded object are rarely taken into account (Merz et al., 2010). Even though it is known that the flood damage is influenced by a
Figure 10. The graph structure of a BN dictates how the joint distribution of all variables decomposes into a product of conditionals. Thus for a valid decomposition each independence assumption mapped into the BN must hold. Usually this applies to a variety of graphs; e.g., the complete graph is always a valid independence map as it does not make any independence assumption. (a) and (b) show two valid BN structures and the corresponding decompositions, (a) P(B)P(E)P(A|B,E)P(R|E) and (b) P(B)P(A|B)P(E|A,B)P(R|E), for the burglary example. The independence assumptions made in both BNs hold; however, (b) does not capture the independence between earthquakes and burglaries. An independence map that maps all independences (a) is called a perfect map, yet perfect maps do not exist for all applications. Furthermore, for parameter reduction it might be beneficial to work with an independence map that differs from the perfect map.
variety of factors (Thieken et al., 2005), stage–damage functions are still widely used. This is because the number of potential influencing factors is large and the single and joint effects of these parameters on the degree of damage are largely unknown.
4.1 Real-life observations
The data collected after the 2002 and 2005/2006 flood events in the Elbe and Danube catchments in Germany (see Fig. 11) offer a unique opportunity to learn about the driving forces of flood damage from a BN perspective. The data result from computer-aided telephone interviews with flood-affected households, and contain 1135 records for which the degree of damage could be reported. The data describe the flooding and warning situation, building and household characteristics, and precautionary measures. The raw data were supplemented by estimates of return periods, building values, and loss ratios, as well as indicators for flow velocity, contamination, flood warning, emergency measures, precautionary measures, flood experience, and socioeconomic factors. Table 5 lists the 29 variables allocated to their domains. A detailed description of the derived indicators and the survey is given by Thieken et al. (2005) and Elmer et al. (2010).

In Sect. 3.2 we dealt with the issue of continuous data when learning BNs; here we will apply the methodology presented there. However, in contrast to the synthetic data from the previous section, many real-world data sets are, for different reasons, lacking some observations for various variables. For the data set at hand, the percentage of missing values is below 20 % for most variables, yet for others it reaches almost 70 %. In the next subsection we show how we deal with the missing values in the setting of the automatic discretization described in Sect. 3.2 when learning BNs.
Table 5. Variables used in the flood damage assessment and their corresponding ranges. C: continuous; O: ordinal; N: nominal.

Variable | Scale and range | Percentage of missing data

Flood parameters
Water depth | C: 248 cm below ground to 670 cm above ground | 1.1
Inundation duration | C: 1 to 1440 h | 1.6
Flow velocity indicator | O: 0 = still to 3 = high velocity | 1.1
Contamination indicator | O: 0 = no contamination to 6 = heavy contamination | 0.9
Return period | C: 1 to 848 years | 0

Warning and emergency measures
Early warning lead time | C: 0 to 336 h | 32.3
Quality of warning | O: 1 = receiver of warning knew exactly what to do to 6 = receiver of warning had no idea what to do | 55.8
Indicator of flood warning source | N: 0 = no warning to 4 = official warning through authorities | 17.4
Indicator of flood warning information | O: 0 = no helpful information to 11 = many helpful information | 19.1
Lead time period elapsed without using it for emergency measures | C: 0 to 335 h | 53.6
Emergency measures indicator | O: 1 = no measures undertaken to 17 = many measures undertaken | 0

Precaution
Precautionary measures indicator | O: 0 = no measures undertaken to 38 = many efficient measures undertaken | 0
Perception of efficiency of private precaution | O: 1 = very efficient to 6 = not efficient at all | 2.9
Flood experience indicator | O: 0 = no experience to 9 = recent flood experience | 68.6
Knowledge of flood hazard | N (yes/no) | 32.7

Building characteristics
Building type | N: 1 = multifamily house, 2 = semi-detached house, 3 = one-family house | 0.1
Number of flats in building | C: 1 to 45 flats | 1.2
Floor space of building | C: 45 to 18 000 m² | 1.9
Building quality | O: 1 = very good to 6 = very bad | 0.6
Building value | C: EUR 92 244 to 3 718 677 | 0.2

Socioeconomic factors
Age of the interviewed person | C: 16 to 95 years | 1.6
Household size, i.e., number of persons | C: 1 to 20 people | 1.1
Number of children (< 14 years) in household | C: 0 to 6 | 10.1
Number of elderly persons (> 65 years) in household | C: 0 to 4 | 7.6
Ownership structure | N: 1 = tenant; 2 = owner of flat; 3 = owner of building | 0
Monthly net income in classes | O: 11 = below EUR 500 to 16 = EUR 3000 and more | 17.6
Socioeconomic status according to Plapp (2003) | O: 3 = very low socioeconomic status to 13 = very high socioeconomic status | 25.5
Socioeconomic status according to Schnell et al. (1999) | O: 9 = very low socioeconomic status to 60 = very high socioeconomic status | 31.7

Flood loss
rloss – loss ratio of residential building | C: 0 = no damage to 1 = total damage | 0
Figure 11. Catchments investigated for the flood damage assessment and location of communities reporting losses from the 2002, 2005, and 2006 floods in the Elbe and Danube catchments (Schroeter et al., 2014).
4.2 Handling of incomplete records
To learn the BN, we again maximize the joint posterior for the given data (Eq. 3). This requires the number of counts for each combination of states for $(X_i, X_{\mathrm{Pa}(i)})$, considering all variables, $i = 1, \ldots, k$ (see Appendix A1). However, this is only given for complete data, and for missing values it can only be estimated by using expected completions of the data. We note that a reliable and unbiased treatment of incomplete data sets (no matter which method is applied) is only possible for missing data mechanisms that are ignorable according to the missing (completely) at random (M(C)AR) criteria as defined in Little and Rubin (1987), i.e., the absence/presence of a data value is independent of the unobserved data. For the data sets considered in this paper, we assume the MAR criterion to hold and derive the predictive function/distribution based on the observed part of the data in order to estimate the part which is missing.
In the context of BNs a variety of approaches has been developed to estimate the missing values (so-called "imputation"). Most of these principled approaches are iterative algorithms based on expectation maximization (e.g., Friedman, 1997, 1998) or stochastic simulations (e.g., Tanner and Wong, 1987). In our case we already have to run several iterations of BN learning and discretization, each iteration requiring the estimation of the missing values. Using an iterative approach for the missing value prediction will thus easily become infeasible. Instead we use a more efficient albeit approximate method, using the Markov blanket predictor developed by Riggelsen (2006).
The idea is to generate a predictive function which enables the prediction of a missing variable $X_i$ based on the observations of its Markov blanket (MB), $X_{\mathrm{MB}(i)}$. The Markov blanket identifies the variables that directly influence $X_i$, i.e., the parents and children of $X_i$, as well as the parents of $X_i$'s children. An example is given in Fig. 12. Assuming the MB is fully observed, it effectively blocks influence from all other variables, i.e., the missing value depends only on its MB. When some of the variables in the MB are missing, it does not shield off $X_i$. However, for predictive approximation purposes, we choose to always ignore the impact from outside the MB. Hence, the prediction of $X_i$ based on the observed data reduces to a prediction based on the observations of the
Figure 12. Illustration of a Markov blanket (gray-shaded nodes) on a blood group example: let us assume that I do not know my blood group for some reason, but I know the genotypes of my relatives. The genotypes of my parents provide information about my own blood group specification – in the pictured example they restrict the list of opportunities to the four options AB, A0, B0 and BB – just as the genotype of my child reveals information, excluding BB from the list of possible options. Considering the genotype of the father/mother of my child alone does not provide any information about my blood type (our blood groups are independent from each other), but together with the information about our child it again restricts the list of opportunities, leaving only AB and A0 as possible options (conditioned on our child, our blood groups become dependent). All these variables (blood type of my parents, my children, and the parents of my children) provide direct information about the considered variable (my blood type) and form its Markov blanket. If I know the values of the Markov blanket, further variables do not provide any additional information. For example, knowing the genotypes of my parents, the knowledge about my grandparents does not deliver any further information about myself (the information is "blocked" by my parents). Yet, if the blood type of my parents is unknown, the information about my grandparents can "flow" and provides new insights.
MB and factorizes according to the DAG in Fig. 13a:

$$P\bigl(X_i \mid X_{\mathrm{MB}(i)}, \theta, \mathrm{DAG}\bigr) \propto \theta_{X_i \mid X_{\mathrm{Pa}(i)}} \prod_{j \in \mathrm{Ch}(i)} \theta_{X_j \mid X_{\mathrm{Pa}(j)}}, \qquad (6)$$
where $\mathrm{Ch}(i)$ are the variable indices for the children of $X_i$. Thus the prediction of $X_i$ requires, according to Eq. (6), inference in the BN (albeit very simple) where correct estimates of $\theta$ are assumed. These in general cannot be given without resorting to iterative procedures. To avoid this we define a slightly modified version of the predictive function, for which we define all variables that belong to the MB of $X_i$ to be the parents of $X_i$ in a modified DAG′ (see Fig. 13 for illustration). Thus $X^{\mathrm{DAG}'}_{\mathrm{Pa}(i)}$ corresponds to $X^{\mathrm{DAG}}_{\mathrm{MB}(i)}$. The resulting DAG′ preserves all dependences given in DAG and can
Figure 13. (a) The Markov blanket of $X_i$ comprises its parents and children, as well as the parents of its children. The prediction of missing values is based on the observations of the variables in the Markov blanket. To avoid inference that requires unknown parameters, the subgraph of DAG that spans the Markov blanket (a) is modified by directing all edges towards $X_i$, receiving the DAG′ pictured in (b).
alternatively be used for the prediction of $X_i$,

$$P\bigl(X_i \mid X^{\mathrm{DAG}'}_{\mathrm{Pa}(i)}, \theta^{\mathrm{DAG}'}, \mathrm{DAG}'\bigr) \stackrel{\mathrm{def}}{=} \theta^{\mathrm{DAG}'}_{X_i \mid X_{\mathrm{Pa}(i)}}. \qquad (7)$$
For this predictive distribution we need to estimate the parameters $\theta^{\mathrm{DAG}'}_{X_i \mid X_{\mathrm{Pa}(i)}}$. Note that more parameters are required for the newly derived predictive distribution, but now at least all influencing variables are considered jointly and an iterative proceeding can be avoided. The parameters are estimated with a similar-cases approach, which is described in Appendix A2. A detailed description for the generation of the predictive distribution is given in Riggelsen (2006) and Vogel et al. (2013).
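A minimal sketch of such a count-based predictive distribution for a missing discrete value given observed MB states is shown below; it is a simplification of the actual similar-cases estimator of Appendix A2, and the data are made up for illustration:

```python
# Sketch: predicting a missing discrete X from observed Markov blanket
# variables A and B by counting similar cases (Laplace-smoothed).
# This is an illustrative simplification, not the estimator of App. A2.
import numpy as np

records = np.array([  # columns: X, A, B; rows = complete observations
    [0, 0, 1],
    [0, 0, 1],
    [1, 0, 0],
    [1, 1, 0],
    [1, 1, 0],
])

def predictive(a, b, data, alpha=1.0):
    """Return P(X | A=a, B=b) over X in {0, 1}, smoothed by alpha."""
    mask = (data[:, 1] == a) & (data[:, 2] == b)
    counts = np.bincount(data[mask, 0], minlength=2) + alpha
    return counts / counts.sum()

print(predictive(0, 1, records))  # favors X = 0: [0.75 0.25]
```

The smoothing term keeps the prediction defined even when no record matches the observed MB configuration exactly.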
It is worth noting that, as the MBs of variables change during the BN learning procedure, the prediction of missing values (depending on the MB) needs to be updated as well.
4.3 Results
Coming back to the flood damage data, we have three variables with more than one-third of the observations missing: flood experience (69 % missing), warning quality (56 % missing) and lead time elapsed without emergency measures (54 % missing). In a first "naive" application (Vogel et al., 2012), no special attention was paid to a proper treatment of missing values; the missing values were simply randomly imputed, resulting in the isolation of two variables (flood experience and lead time elapsed) in the network; no connection to any other variable was learned (Fig. 14a). With application of the Markov blanket predictor, the situation changes and a direct connection from the relative building damage, rloss, to flood experience is found, as well as a connection between warning source and elapsed lead time (Fig. 14b). These relations, especially the first one, match with experts' expectations and speak for an improvement in the learned BN structure.
www.nat-hazards-earth-syst-sci.net/14/2605/2014/ Nat. Hazards
Earth Syst. Sci., 14, 2605–2626, 2014
Figure 14. BNs learned for flood damage assessments, showing the effect of the applied missing value estimator. The algorithm used to learn (a) replaces missing values randomly, while the one used to learn (b) applies the Markov blanket predictor for the estimation of missing values. Nodes with a bold frame belong to the Markov blanket of relative building loss and are thus assumed to have a direct impact on the caused flood damage.
Using the graphical representation (Property 1), as mentioned in Sect. 2.1, the learned DAG (Fig. 14b) gives insight into the dependence relations of the variables. It reveals a number of direct links connecting the damage-describing variable with almost all subdomains. This supports the demand for improved flood damage assessments that take several variables into account (Merz et al., 2010). Moreover, the DAG shows which variables are the most relevant for the prediction of rloss. The domains "precaution" and "flood parameters" in particular are densely connected to building damage and should be included in any damage assessment (Property 3).
Existing approaches for flood damage assessments usually consider fewer variables, and the employment of a large number of variables is often considered disadvantageous, since complete observations for all involved variables are rare.
Figure 15 compares the densities of the relative building loss for a good precaution (precautionary measures indicator > 14) and a bad precaution (precautionary measures indicator ≤ 14) in a general case (Fig. 15a: all other variables are unknown and summed out) and for a specific flood event (Fig. 15b: 7.5 m ≤ water depth < 96.5 m; 82 h ≤ duration < 228 h; 1 ≤ velocity). We may appreciate how a good precaution increases the chance for no or only small building losses.
Similar investigations may support the identification of efficient precautionary measures, not only in the context of flood events but also for natural hazards in general. They may also help to convince authorities or private persons to undertake the suggested precautions. Using the flexibility of BNs and their ability to model specific situations, BNs may thus contribute to a better communication between scientists and non-scientific stakeholders. BNs can also be used for forensic reasoning, i.e., we can turn around the direction of reasoning in the example just considered and ask what a likely state of precaution is for a given observed damage in a specific or general event situation. Forensic reasoning might be of
interest, for instance, for insurance companies.
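The two directions of reasoning mentioned above can be made concrete on a toy model. The probabilities below are invented for illustration and are not the paper's flood model; the sketch only shows how a predictive query P(loss | precaution) and a forensic query P(precaution | loss) relate via Bayes' rule on a minimal two-node BN, Precaution → Loss.

```python
# Illustrative two-node BN with assumed (made-up) probabilities.
p_prec = {"good": 0.4, "bad": 0.6}          # prior on precaution state
p_loss_given_prec = {                        # CPT of Loss given Precaution
    "good": {"low": 0.8, "high": 0.2},
    "bad":  {"low": 0.3, "high": 0.7},
}

def forensic(loss):
    """Invert the reasoning direction: P(precaution | loss) by Bayes' rule."""
    joint = {pr: p_prec[pr] * p_loss_given_prec[pr][loss] for pr in p_prec}
    z = sum(joint.values())                  # normalizing constant P(loss)
    return {pr: v / z for pr, v in joint.items()}

post = forensic("high")
# An observed high loss shifts the posterior towards bad precaution.
print(post)
```

In a full BN the same inversion is carried out by standard inference algorithms that sum out all unobserved variables; the principle is identical.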
5 Landslides
So far we assumed the existence of a unique model that explains the data best. In practical problems, however, there may be many models almost as good as the best, i.e., ones that explain the data similarly well. This results in an uncertainty about which BN structure to use. We consider this problem in our last application, where we apply BN learning to landslides, which are another ubiquitous natural hazard in many parts of the world.
A key theme in many landslide studies is the search for those geological, hydroclimatological, topographic, and environmental parameters that sufficiently predict the susceptibility to slope failure in a given region. A wide range of multivariate data analysis techniques has been proposed to meet this challenge. Amongst the more prominent methods are logistic regression, artificial neural networks, and Bayesian weights of evidence. The popularity of such methods is only matched by their seeming success: a recent review of 674 scientific papers on the topic indicates that most reported success rates are between 75 and 95 % (Korup and Stolle, 2014), where in the majority of studies the success rate is defined as the percentage of correctly (true positives and true negatives) identified locations that were subject to slope instability in the past. This raises the question as to why landslides still continue to cause massive losses despite this seemingly high predictive accuracy. Moreover, success rates do not show any significant increase over the last 10 years regardless of the number of landslide data or predictors used (Korup and Stolle, 2014). An often overlooked key aspect in these analyses is the potential for correlated or interacting predictor candidates. Few studies have stringently explored whether this likely limitation is due to physical or statistical (sampling) reasons.
5.1 Data
The landslide data are taken from an inventory of ∼ 300 000 digitally mapped landslide deposit areas across the Japanese islands (Korup et al., 2014). These landslides were mapped systematically, mostly from stereographic image interpretation of air photos, and compiled by the National Research Institute for Earth Science and Disaster Prevention NIED (http://lsweb1.ess.bosai.go.jp/gis-data/index.html). The dominant types of failure in this database are deep-seated slow-moving earthflows and more rapid rockslides. The mapped size range of the deposits from these landslides spans from 10² to 10⁷ m² footprint area and is distinctly heavy tailed (Korup et al., 2012). Many of the landslide deposits are covered by vegetation. Individual deposits do not carry any time-stamp information, and so the inventory contains both historic and prehistoric slope failures, likely containing landslides up to several thousands of years old. Smaller rockfalls or soil slips are not included. Similarly, the inventory contains no data on specific trigger mechanisms (such as earthquakes, rainfall, or snowmelt), the dominant type of materials mobilized, or absolute age information for the bulk of individual landslides. In this context, the data nicely reflect common constraints that scientists encounter when compiling large landslide databases from remote sensing data covering different time slices. Yet this type of inventory is frequently used as a key input for assessing and mapping regional landslide susceptibility with a number of statistical techniques, including BNs. However, data-driven learning of BNs containing landslide information has, to the best of our knowledge, not been attempted before.

We have compiled a number of geological, climatic, and topographic metrics for individual catchments throughout the Japanese islands to test their influence on the average fraction of landslide-affected terrain that we computed within a 10 km radius. Most of our candidate predictors (Table 7) have been used in modified form in other studies (Korup et al., 2014). While all of these candidate predictors may be physically related to slope instability, our choice of predictors is intentionally arbitrary in order to learn more about their effects on BN learning and structure. The final data set used for the BN learning consists of landslide and predictor data that we averaged at the scale of 553 catchments that are up to 10³ km² large, and that we sampled randomly from the drainage network across Japan. This averaging approach produced ∼ 0.4 % missing data in the subset, and aptly simulates further commonly encountered constraints on the quality of large landslide inventories.
5.2 Uncertainty in BN structure
Ideally, a given model should adequately encapsulate natural phenomena such as the causes and triggers of slope instability. However, there may be several equally well poised, but competing, models because of the intrinsic uncertainty tied to the governing processes. In practice we also face other limitations that prevent us from focusing on one single best model. The finite number of observations we have at our disposal for learning, and the fact that it is unclear which relevant predictor variables to consider for landslide prediction, imply that several models may be justifiable. This is a general problem when attempting to formally model natural systems. In our case this means that several BNs might explain the data (almost) equally well, i.e., they receive a similar score according to Eq. (2).
An additional source of uncertainty stems from the structure learning algorithm used to maximize the score defined in Eq. (2) or – for continuous variables – in Eq. (3). For infinite data sets the algorithm terminates, according to Meek's conjecture, in the (unique) optimal equivalence class of DAGs
Table 7. Variables used in the landslide model.

Name – Definition [Unit]
Mean elevation – Average of elevation values within catchment boundaries [m]
Catchment area – Log-transformed catchment area [a.u.]
Catchment perimeter – Total length of catchment divides [m]
Mean local topographic relief – Maximum elevation difference in a 10 km radius [m]
Mean annual precipitation^a – Based on interpolated rainfall station data (reference period 1980–2010) [mm]
Mean coefficient of variation of annual precipitation^a – Based on interpolated rainfall station data, with standard deviation divided by mean (reference period 1980–2010) [1]
Mean coefficient of variation of monthly precipitation^a – Based on interpolated rainfall station data, with standard deviation divided by mean (reference period 1980–2010) [1]
Mean surface uplift 2001–2011^b – GPS-derived accumulated surface uplift 2001–2011 [m]
Mean surface uplift 2010–2011^b – GPS-derived accumulated surface uplift 2010–2011 [m]
Mean fraction of 10 % steepest bedrock channels – Average fraction of 10 % steepest channels per unit length of bedrock-river drainage network in a 10 km radius, based on an arbitrarily set reference concavity θ = 0.45 [1]
Mean bedrock channel steepness – Average of channel steepness index per reach length, based on an arbitrarily set reference concavity θ = 0.45 [1]
Regionalized river sinuosity – Average bedrock-channel sinuosity weighted by drainage network length in a 10 km radius, calculated as the flow length of a given channel segment divided by its shortest vertex distance [1]
Fraction of volcanic rocks^c – Fraction of catchment area underlain by volcanic rocks [1]
Fraction of lakes – Fraction of catchment area covered by lakes [1]
Fraction of plutonic rocks^c – Fraction of catchment area underlain by plutonic rocks [1]
Fraction of sedimentary rocks^c – Fraction of catchment area underlain by sedimentary rocks [1]
Fraction of accretionary complex rocks^c – Fraction of catchment area underlain by accretionary complex rocks [1]
Fraction of metamorphic rocks^c – Fraction of catchment area underlain by metamorphic rocks [1]
Median area of landslide-affected terrain – Fraction of landslide terrain per unit catchment area within a 10 km radius, calculated using an inventory of mostly prehistoric landslide-deposit areas [1]

^a Calculated using data provided by the Japan Meteorological Agency (JMA, http://www.jma.go.jp/jma/indexe.html). ^b Calculated from secular high-precision leveling data (Kimura et al., 2008). ^c Calculated using the seamless digital geological map of Japan (1:200 000) available from the Geological Survey of Japan (https://gbank.gsj.jp/seamless).
(Chickering, 2002), but this does not necessarily hold for finite data sets, incomplete observations and a search space extended by the discretization. The algorithm for the traversal of the BN hypothesis space contains stochastic elements and may get stuck in local optima, providing slightly different results for different runs.
To analyze this random behavior, we run the BN learning and discretization algorithm 10 times on the same set of landslide data. We do not expect to end up with the same BN in each trial, as the constraints to meet Meek's conjecture are not fulfilled. Instead, we are more interested in documenting how strongly the results differ from each other.
Figure 16 gives a summarized representation of the BN DAG structures. The frequency with which an edge between two variables is learned is encoded by the width of the corresponding arrow. Despite the differences in DAG structures, all learned BNs seem to model the data-generating process almost equally well, which can be gathered from the score obtained by Eq. (3): for the BNs learned, we observed scores between −64 364.42 and −64 253.98. This is a promising result, since it indicates that, even though the algorithm gets stuck in local maxima, the quality of the results does not differ significantly. This supports the assumption that the quality of the learned BN is not seriously affected by random effects of the learning algorithm. Multiple runs of the algorithm on other data sets confirm this assumption.
In the literature on BN learning (and on model learning based on data in general), ideas of how to handle several competing, but all justifiable, BNs have been investigated. Friedman et al. (1999) use bootstrap sampling to learn BNs from different variations of the data set. Based on those they develop a confidence measure on features of a network (e.g., the presence of an edge or the membership of a node in a certain Markov blanket). A Bayesian approach is presented by Friedman and Koller (2000) and Riggelsen (2005), who approximate the Bayesian posterior on the DAG space using a Markov chain Monte Carlo approach. An adaptation of these methods for the extended MAP score introduced in this paper is left for future work.
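The bootstrap confidence idea of Friedman et al. (1999) can be sketched in a few lines. The `learn_structure` function below is a deliberately trivial stand-in (a correlation-threshold rule on a two-variable toy data set), not any learner from the paper; it exists only so the resampling loop runs end to end.

```python
# Sketch of bootstrap edge confidence: resample the data with replacement,
# relearn a structure on each replicate, and report how often a feature
# (here: the edge x -> y) appears. Learner and data are toy assumptions.
import random

def learn_structure(data):
    """Toy stand-in learner: connect x -> y if x and y agree in > 80 %
    of the records."""
    agree = sum(1 for x, y in data if x == y) / len(data)
    return {("x", "y")} if agree > 0.8 else set()

def edge_confidence(data, n_boot=200, seed=0):
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_boot):
        sample = [rng.choice(data) for _ in data]  # resample with replacement
        if ("x", "y") in learn_structure(sample):
            hits += 1
    return hits / n_boot

# Strongly coupled toy data: the edge is recovered in most replicates.
data = [(1, 1)] * 90 + [(0, 1)] * 10
print(edge_confidence(data))
```

Features that survive most bootstrap replicates (edges, Markov-blanket memberships) can then be reported with a confidence value instead of committing to one single learned structure.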
5.3 Results
Despite (or rather thanks to) the DAG structural differences, we can glean some instructive insights from the learned BNs.
Fig. 17. Illustration for the calculation of s(·) used for the parameter estimation in DAG′. The graph on the left shows a DAG′ for the estimation of C conditioned on A and B. The three variables take the values t and f. An exemplary data set is given in the table on the right, together with the contribution of each record to s(C = t, (A = t, B = f)).
Figure 16. Summary of 10 learned network structures modeling landslide susceptibility, all based on the same data set. Arrow widths between the variables are scaled to the number of times they occur in the learned BNs. Likewise, we color-coded the variables according to the frequency with which they occur as part of the Markov blanket of fraction of landslide-affected terrain (circular node shape), where darker hues indicate more frequent occurrences.
The fact that we can learn something about the landslide-affected terrain from several BN structures indicates that the different predictors are highly interacting, and that a missed link between two variables can often be compensated for by other interactions. To understand which variables are most relevant for the prediction of landslide-affected terrain, we coded the variables in Fig. 16 according to the frequency with which they occur as part of the target variable's Markov blanket, where darker hues indicate more frequent occurrences.
Perhaps the most surprising aspect of the learned BNs is that only a few of the predictors that have traditionally been invoked to explain landslide susceptibility are duly represented in the Markov blanket. These include mean annual precipitation (part of the MB in each run) – including some derivatives such as precipitation variability (either annual or mont