Data-driven modelling in water-related problems. PART 3
Dimitri P. Solomatine
www.ihe.nl/hi/sol [email protected]
UNESCO-IHE Institute for Water Education, Hydroinformatics Chair
Finding groups (clusters) in data (unsupervised learning)
Clustering
classification is aimed at identifying a mapping (function) that maps any given input x_i to a nominal variable (class) y_i
finding the groups (clusters) in an input data set is clustering
Clustering is often the preparation phase for classification:
the identified clusters can be labelled as classes; each input instance can then be associated with an output value (class), and the set of instances {x_i, y_i} can be built
(Figure: panels a and b showing a data set and the identified Cluster 1, Cluster 2 and Cluster 3.)
Reasons to use clustering
labelling large data sets can be very costly;
clustering may actually give an insight into the data and help discover classes which are not known in advance;
clustering may find features that can be used for categorization.
Voronoi diagrams
A Voronoi diagram partitions the input space into cells, one per centre, each cell containing the points that are closer to its centre than to any other centre.
Methods for clustering
partition-based clustering (K-means, fuzzy C-means, based on Euclidean distance);
hierarchical clustering (agglomerative hierarchical clustering, nearest-neighbour algorithm);
feature extraction methods: principal component analysis (PCA), self-organizing feature (SOF) maps (also referred to as Kohonen neural networks).
k-means clustering
find the best division of N samples into K clusters C_i such that the total distance between the clustered samples and their respective centers (that is, the total variance) is minimized:

J = \sum_{i=1}^{K} \sum_{n \in C_i} \lVert x_n - \mu_i \rVert^2

where \mu_i is the center of cluster C_i.
k-means clustering: algorithm
1 randomly assign instances to the clusters
2 compute the centers according to

\mu_i = \frac{1}{N_i} \sum_{n \in C_i} x_n

where N_i is the number of instances in cluster C_i
3 reassign the instances to the nearest cluster centers
4 recalculate the centers
5 reassign the instances to the new centers
repeat 2-5 until the total variance J stops decreasing (or the centers stop moving).
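A minimal sketch of this loop in Python (illustrative only: the two-blob data set, K = 2 and the stopping test are invented, and NumPy is assumed):

```python
import numpy as np

def k_means(X, K, iters=100, seed=0):
    """Minimise J = sum_i sum_{n in C_i} ||x_n - mu_i||^2 by alternating steps 2-5."""
    rng = np.random.default_rng(seed)
    labels = rng.integers(0, K, size=len(X))          # step 1: random assignment
    for _ in range(iters):
        # steps 2/4: centers are the means of the current clusters
        centers = np.array([X[labels == i].mean(axis=0) if np.any(labels == i)
                            else X[rng.integers(len(X))] for i in range(K)])
        # steps 3/5: reassign every instance to its nearest center
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        new_labels = d.argmin(axis=1)
        if np.array_equal(new_labels, labels):        # centers stopped moving
            break
        labels = new_labels
    return labels, centers

X = np.vstack([np.random.randn(50, 2), np.random.randn(50, 2) + 5])
labels, centers = k_means(X, K=2)
print(centers)    # two centers, near (0, 0) and (5, 5)
```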
k-means clustering: illustration
Kohonen network (Self-organizing feature map - SOFM)
SOFM: main idea
(Figure: (a) inputs x_1 ... x_M connected through weights w_11 ... w_NM to a two-dimensional grid of output nodes; (b) the topological neighbourhood of a winning node j shrinking over time, from j(0) to j(t_1) to j(t_2).)
SOFM: algorithm (1)
0 Initialize weights, normally with small random values.
Set the topological neighborhood parameters.
Set the learning rate parameters.
Iteration number t = 1.
1 While the stopping condition is false, do iteration t (steps 2-8):
2 For each input vector x = {x_1, ..., x_N} do steps 3-8:
3 For each output node k calculate the similarity measure (in this case the Euclidean distance) between the input and the weight vector:

D(k) = \sum_{i=1}^{N} (w_{ik} - x_i)^2
SOFM: algorithm (2)
4 Find the index k_max such that D(k) is a minimum; this identifies the winning node.
5 Update the weights for the node k_max and for all nodes k within a specified neighborhood radius r of k_max:

w_{ik}(t+1) = w_{ik}(t) + \eta(t) \, N(r,t) \, [x_i - w_{ik}(t)]

6 Update the learning rate \eta(t).
7 Reduce the radius r used in the neighborhood function N (this can be done less frequently than at each iteration).
8 Test the stopping condition.
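A compact sketch of steps 0-8 in Python (assumptions: a Gaussian neighbourhood function N(r, t) and exponential decay of the learning rate and radius, whose exact schedules the handout leaves open):

```python
import numpy as np

def train_sofm(X, grid=(10, 10), epochs=20, eta0=0.5, r0=5.0, seed=0):
    rng = np.random.default_rng(seed)
    rows, cols = grid
    W = rng.random((rows * cols, X.shape[1])) * 0.1   # step 0: small random weights
    # (row, col) position of every output node, used for grid distances
    coords = np.array([(i, j) for i in range(rows) for j in range(cols)], dtype=float)
    T = epochs * len(X)
    t = 0
    for _ in range(epochs):                           # steps 1-2: loop over inputs
        for x in X:
            # steps 3-4: Euclidean distance D(k) to each node; pick the winner
            k_max = np.argmin(((W - x) ** 2).sum(axis=1))
            eta = eta0 * np.exp(-t / T)               # step 6: decaying learning rate
            r = max(r0 * np.exp(-t / T), 0.5)         # step 7: shrinking radius
            # step 5: update the winner and all nodes in its grid neighbourhood
            grid_dist = np.linalg.norm(coords - coords[k_max], axis=1)
            nbh = np.exp(-grid_dist ** 2 / (2 * r ** 2))   # N(r, t)
            W += eta * nbh[:, None] * (x - W)
            t += 1
    return W

# square with a denser central region, as in the example that follows
rng = np.random.default_rng(1)
X = np.vstack([rng.random((200, 2)), 0.45 + 0.1 * rng.random((400, 2))])
W = train_sofm(X)   # most weight vectors migrate towards the dense centre
```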
SOFM: example
Input set: points sampled randomly in a square (the probability of sampling a point in the central square region was 20 times greater than elsewhere in the square)
The target space is discrete and includes 100 output nodes arranged in 2 dimensions
SOFM is able to find the cluster: the area where the points concentrate
SOFM: visualisation and interpretation
count maps: the easiest and most used method. This is a plot showing, for each output node, the number of times it was the winning one. It can be interpolated into colour shading as well
distance matrix (of size K x K) whose elements are the Euclidean distances of each output unit to its immediate neighbouring units
SOFM: visualization and interpretation
vector position or cluster maps:
colours are coded according to their similarity in the input space
each dot corresponds to one output map unit
each map unit is connected to its neighbours by a line
SOFM: visualization and interpretation
vector position or cluster maps: in 3D
Instance-based learning (lazy learning)
Lazy and eager learning
Eager learning:
first, a ML (data-driven) model is built
then it is tested and used
Lazy learning:
no ML model is built (hence "lazy")
when new examples come, the output is generated immediately on the basis of the training examples
Other names for lazy learning:
Instance-based
Exemplar-based
Case-based
Experience-based
Edited k-nearest neighbor
k-Nearest neighbors method: classification
instances are points in 2-dim. space, output is boolean (+ or -)
a new instance x_q is classified w.r.t. the proximity of the nearest training instances:
to class + (if 1 neighbor is considered)
to class - (if 4 neighbors are considered)
for discrete-valued outputs assign the most common value among the neighbors
(Figure: Voronoi diagram for the 1-Nearest neighbor classifier.)
Notations
an instance x is described as {a_1(x), ..., a_n(x)}, where a_r(x) denotes the value of the r-th attribute of instance x.
the distance between two instances x_i and x_j is defined to be d(x_i, x_j), where

d(x_i, x_j) = \sqrt{ \sum_{r=1}^{n} (a_r(x_i) - a_r(x_j))^2 }
k-Nearest neighbor algorithm
Training
Build the set of training examples D.
Classification
Given a query instance x_q to be classified:
let x_1 ... x_k denote the k instances from D that are nearest to x_q; return

F(x_q) = \arg\max_{v \in V} \sum_{i=1}^{k} \delta(v, f(x_i))

where \delta(a, b) = 1 if a = b and \delta(a, b) = 0 otherwise, and V = {v_1, ..., v_s} is the set of possible output values.
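A sketch of this classifier in Python (the toy instances and k = 3 are invented for illustration):

```python
import numpy as np
from collections import Counter

def knn_classify(X_train, y_train, x_q, k=3):
    """Return argmax_v sum_i delta(v, f(x_i)) over the k nearest neighbours."""
    d = np.linalg.norm(X_train - x_q, axis=1)   # d(x_q, x_i) for every instance
    nearest = np.argsort(d)[:k]                 # indices of x_1 ... x_k
    return Counter(y_train[nearest]).most_common(1)[0][0]   # most common value

X = np.array([[1, 1], [1, 2], [2, 1], [6, 6], [6, 7], [7, 6]], dtype=float)
y = np.array(['+', '+', '+', '-', '-', '-'])
print(knn_classify(X, y, np.array([2.0, 2.0]), k=3))   # -> '+'
```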
k-Nearest neighbors: regression (target function is real-valued)
model a real-valued target function f: ℝⁿ → ℝ
instances are points in n-dim. space, the output is a real number
a new instance x_q is valued w.r.t.:
the values of the nearest training instances (the average of k instances is taken, or the weighted average)
the values and proximity of the nearest training instances (a locally weighted regression model is built and used to predict the value of the new instance)
In this case the final line of the k-NN algorithm should be replaced by the line

F(x_q) = \frac{\sum_{i=1}^{k} f(x_i)}{k}
Distance-weighted k-NN algorithm (classification)
weigh the contribution of each of the k neighbors according to their distance to the query point x_q, giving greater weight w_i to closer neighbors
This can be accomplished by replacing the final line in the algorithm by

F(x_q) = \arg\max_{v \in V} \sum_{i=1}^{k} w_i \, \delta(v, f(x_i))

where the weight is

w_i = \frac{1}{d(x_q, x_i)^2}
Distance-weighted k-NN algorithm (numerical prediction)
for real-valued output this is accomplished by replacing the final line in the algorithm by

F(x_q) = \frac{\sum_{i=1}^{k} w_i f(x_i)}{\sum_{i=1}^{k} w_i}

where the weight is

w_i = \frac{1}{d(x_q, x_i)^2}
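A sketch implementing the two formulas above (synthetic data; the guard against a zero distance, when the query coincides with a training point, is an added practical detail):

```python
import numpy as np

def weighted_knn_predict(X_train, y_train, x_q, k=5):
    """F(x_q) = sum_i w_i f(x_i) / sum_i w_i, with w_i = 1 / d(x_q, x_i)^2."""
    d = np.linalg.norm(X_train - x_q, axis=1)
    nearest = np.argsort(d)[:k]
    if d[nearest[0]] == 0:               # exact match: return its value directly
        return y_train[nearest[0]]
    w = 1.0 / d[nearest] ** 2
    return np.sum(w * y_train[nearest]) / np.sum(w)

rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(200, 1))
y = np.sin(X[:, 0]) + 0.1 * rng.standard_normal(200)
print(weighted_knn_predict(X, y, np.array([3.0]), k=5))   # near sin(3) = 0.14
```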
k-Nearest neighbors: using all examples
for classification:

F(x_q) = \arg\max_{v \in V} \sum_{i=1}^{\text{all instances}} w_i \, \delta(v, f(x_i))

for regression:

F(x_q) = \frac{\sum_{i=1}^{\text{all instances}} w_i f(x_i)}{\sum_{i=1}^{\text{all instances}} w_i}
k-Nearest neighbors: comments
k-NN creates a local model in the proximity of the new instance, instead of a global model of all training instances
robust to noisy training data
requires a considerable amount of data
the distance between instances is calculated based on all attributes (and not on 1 as in decision trees). Possible problem: imagine instances described by 20 attributes, but only 2 are relevant to the target function
curse of dimensionality: the nearest neighbor method is easily misled when X is high-dimensional
solution: stretch the j-th axis by a weight z_j chosen to minimize the prediction error
as the number of training instances grows to infinity, k-NN approaches Bayes optimal classification
Locally weighted regression (1)
construct an explicit approximation F(x) of the target function f(x) over a local region surrounding the new query point x_q
If F(x) is linear, then this is called locally weighted linear regression:

F(x) = w_0 + w_1 a_1(x) + ... + w_n a_n(x)

Instead of minimizing the global error E, here the local error E(x_q) has to be minimized
Locally weighted regression (2)
Various approaches to minimizing the error E(x_q):
1 Minimize the squared error over just the k nearest neighbors:

E_1(x_q) = \frac{1}{2} \sum_{x \in k \text{ nearest nbrs of } x_q} (f(x) - F(x))^2

2 Minimize the squared error over the entire set D of training examples, while weighting the error of each training example by some decreasing function K of its distance from x_q:

E_2(x_q) = \frac{1}{2} \sum_{x \in D} (f(x) - F(x))^2 \, K(d(x_q, x))

3 Combine 1 and 2 (to reduce computational costs):

E_3(x_q) = \frac{1}{2} \sum_{x \in k \text{ nearest nbrs of } x_q} (f(x) - F(x))^2 \, K(d(x_q, x))
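A sketch of approach 2 (a Gaussian kernel K and its bandwidth tau are assumptions; the weighted least-squares fit is solved in closed form):

```python
import numpy as np

def lwr_predict(X_train, y_train, x_q, tau=0.5):
    """Fit F(x) = w0 + w1 a1(x) + ... around x_q and evaluate it at x_q.
    Each example is weighted by K(d(x_q, x)) = exp(-d^2 / (2 tau^2))."""
    A = np.hstack([np.ones((len(X_train), 1)), X_train])   # bias term + attributes
    d = np.linalg.norm(X_train - x_q, axis=1)
    K = np.exp(-d ** 2 / (2 * tau ** 2))                   # decreasing with distance
    W = np.diag(K)
    # minimise sum_x K(d) (f(x) - F(x))^2  ->  weighted normal equations
    w = np.linalg.solve(A.T @ W @ A, A.T @ W @ y_train)
    return np.array([1.0, *x_q]) @ w

rng = np.random.default_rng(0)
X = rng.uniform(0, 6, size=(300, 1))
y = np.sin(X[:, 0]) + 0.1 * rng.standard_normal(300)
print(lwr_predict(X, y, np.array([2.0])))   # close to sin(2) = 0.91
```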
Case-based reasoning (CBR)
instance-based learning, but the output is not real-valued: it is represented by symbolic descriptions
the methods used to retrieve similar instances are more elaborate (not just Euclidean distance)
Applications:
conceptual design of mechanical devices based on a stored library of previous designs (Sycara 1992)
new legal cases based on previous rulings (Ashley 1990)
selection of an appropriate hydrological model based on previous experience (Kukuric 1997, PhD of IHE)
Remarks on Lazy and Eager learning
Lazy methods: k-NN, locally weighted regression, CBR
Eager learners are "eager": before they observe the testing instance x_q they have already built a global approximation of the target function.
Lazy learners:
defer the decision of how to generalize beyond the training data until each new instance is encountered,
when new examples come, the output is generated immediately on the basis of the nearest training examples
Lazy learners have a richer set of hypotheses: they select an appropriate hypothesis (e.g. a linear function) for each new instance
So lazy methods are better suited to customize to unknown future instances
Fuzzy rule-based systems
Fuzzy logic
introduced in 1965 by Lotfi ZADEH, Univ. of California, Berkeley
Boolean logic is two-valued (False, True). Fuzzy logic is multi-valued (False ... AlmostFalse ... AlmostTrue ... True)
Fuzzy set theory deals with the degree of truth that an outcome belongs to a certain category (partial truth)
a fuzzy set A on a universe U: for any u ∈ U there is a corresponding real number μ_A(u) ∈ [0,1] called the grade of membership of u belonging to A
the mapping μ_A: U → [0,1] is called the membership function of A
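A grade-of-membership mapping in code, for a hypothetical triangular fuzzy set "tall people" (the breakpoints are invented):

```python
def triangular_mf(a_minus, a_kernel, a_plus):
    """Membership function mu_A: U -> [0, 1] of a triangular fuzzy set
    with support [a_minus, a_plus] and kernel a_kernel."""
    def mu(u):
        if u <= a_minus or u >= a_plus:
            return 0.0
        if u <= a_kernel:
            return (u - a_minus) / (a_kernel - a_minus)
        return (a_plus - u) / (a_plus - a_kernel)
    return mu

# fuzzy set "tall people" over height in cm
tall = triangular_mf(160.0, 190.0, 220.0)
for h in (150, 170, 185, 190):
    print(h, round(tall(h), 2))   # grades of membership: 0.0, 0.33, 0.83, 1.0
```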
Example of an ordinary and a fuzzy set "tall people"
Various shapes of membership functions
[α⁻, α⁺] is the support of the fuzzy set, α¹ is its kernel
(Figure: four membership function shapes over the support [α⁻, α⁺] with kernel α¹: a) triangular, b) bell-shaped, c) dome-shaped, d) inverted cycloid.)
Example of a membership function "appropriate water level in the reservoir"
(Figure: a membership function with its support and kernel indicated.)
Alpha-cut
the α-cut of a fuzzy set is the crisp set of all u with μ_A(u) ≥ α; for the set in the figure, the 0.5-cut = [4.5, 7.0]
Fuzzy numbers
Special cases of fuzzy sets are fuzzy numbers
A fuzzy subset A of the set of real numbers is called a fuzzy number if:
there is at least one z such that μ_A(z) = 1 (normality assumption)
for all real numbers a, b, c with a < c < b: μ_A(c) ≥ min(μ_A(a), μ_A(b)) (convexity assumption, meaning that the membership function of a fuzzy number consists of an increasing part and a decreasing part, and possibly flat parts)
Linguistic variable: example
(Figure: the linguistic variable WATER LEVEL takes fuzzy values such as "enough volume for flood detention", "navigable" and "environmentally friendly"; compatibility links connect these values, via fuzzy restrictions (membership functions), to the base variable water level (m) on the axis 0-50.)
A linguistic variable can take linguistic values (like low, high, navigable) associated with fuzzy subsets M of the universe U (here U = [0, 50])
Operations on fuzzy sets
(Figure; the standard operations are typically the complement μ_{¬A}(u) = 1 − μ_A(u), the intersection min(μ_A(u), μ_B(u)) and the union max(μ_A(u), μ_B(u)).)
Fuzzy rules
Fuzzy rules are linguistic constructs of the type
IF A THEN B
where A and B are collections of propositions containing linguistic variables (i.e. variables with linguistic values). A is called the premise and B is the consequence of the rule.
If there are K premises in a system, the i-th rule has the form:

If a_1 is A_{i,1} ⊗ a_2 is A_{i,2} ⊗ ... ⊗ a_K is A_{i,K} then B_i

where a is a crisp input, A and B are linguistic variables, and ⊗ is one of the operators AND, OR, XOR.
Additive model of combining rules
Fuzzy rule-based systems (FS)
use linguistic variables based on fuzzy logic
based on encoding the relationships between variables in the form of rules
rules are generated through the analysis of large data samples
such rules are used to produce the values of the output variables given new input values
Example: Fuzzy rules in control
(Figure: membership functions of TEMPERATURE (°C, axis 5-35) with fuzzy values COLD, COOL, RIGHT, WARM, HOT, and of AIR MOTOR SPEED (axis 0-100) with fuzzy values STOP, SLOW, MEDIUM, FAST, BLAST; five rules connect them: If Cold then stop; If Cool then slow; If Right then medium; If Warm then fast; If Hot then blast. The combined output is shown for the weighted sum and the crested weighted sum combination methods, with defuzzification using the centroid of the area.)
rules like: IF Temperature is Cool THEN AirMotorSpeed := Slow
Input: Temperature = 22. What will be the AirMotorSpeed?
Temperature is RIGHT with degree of fulfillment (DOF) = 0.6, and WARM with DOF = 0.2
two rules are fired
Combining premises in a rule
Degree of fulfillment (DOF) is the extent to which the premise (left) part of a fuzzy rule is satisfied
The means of combining the memberships of the inputs to the corresponding fuzzy sets into a DOF is called inference
Product inference for rule i is defined as:

DOF_i = \prod_{k=1}^{K} \mu_{A_{i,k}}(a_k)

(the rule is sensitive to a change in the amount of truth contained in each premise)
Minimum inference for rule i is defined as:

DOF_i = \min_{k=1,...,K} \mu_{A_{i,k}}(a_k)
Combining rules: example for 2 inputs
(Figure: a rule matrix for two inputs; each combination of the fuzzy values (L, M, H) of Input 1 and Input 2 is mapped to a fuzzy value of the Output.)
Combining rules: weighted sum combination
the weighted sum combination uses the DOF of each rule as a weight
If there are I rules, each having a response fuzzy set B_i with DOF Φ_i, the combined membership function is

\mu_B(x) = \frac{\sum_{i=1}^{I} \Phi_i \, \mu_{B_i}(x)}{\max_u \sum_{i=1}^{I} \Phi_i \, \mu_{B_i}(u)}

(Figure: the weighted sum combination method over AIR MOTOR SPEED, axis 0-100.)
Combining rules: crested weighted sum combination
in the crested weighted sum combination, each output membership function is clipped off at a height corresponding to the rule's degree of fulfillment
If there are I rules, each having a response fuzzy set B_i with DOF Φ_i, the combined membership function is

\mu_B(x) = \frac{\sum_{i=1}^{I} \min(\Phi_i, \mu_{B_i}(x))}{\max_u \sum_{i=1}^{I} \min(\Phi_i, \mu_{B_i}(u))}

(Figure: the crested weighted sum combination method over AIR MOTOR SPEED, axis 0-100.)
Combining rules: defuzzification
Defuzzification is a mapping from the fuzzy combination of consequences B_i to a crisp consequence
this is actually the identification of the fuzzy mean
the most widely used method: find the centroid (center of gravity) of the area below the membership function and take its abscissa coordinate as the crisp output
(Figure: defuzzification of the weighted sum combination using the centroid of the area, AIR MOTOR SPEED axis 0-100.)
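The whole chain for the temperature example, sketched in Python: DOFs of the two fired rules, weighted sum combination, and centroid defuzzification (the triangular shapes and breakpoints of the output sets "medium" and "fast" are assumed, so the crisp value is only indicative):

```python
import numpy as np

def tri(a, b, c):
    """Triangular membership function with support [a, c] and kernel b."""
    return lambda u: np.maximum(np.minimum((u - a) / (b - a), (c - u) / (c - b)), 0.0)

# output fuzzy sets on AIR MOTOR SPEED, 0-100 (breakpoints invented)
medium = tri(30.0, 50.0, 70.0)
fast = tri(50.0, 70.0, 90.0)

# the two fired rules for Temperature = 22:
# "If Right then medium" with DOF 0.6, "If Warm then fast" with DOF 0.2
rules = [(0.6, medium), (0.2, fast)]

u = np.linspace(0.0, 100.0, 1001)
# weighted sum combination: sum of DOF_i * mu_Bi(u), rescaled so the max is 1
mu = sum(dof * mf(u) for dof, mf in rules)
mu = mu / mu.max()

# defuzzification: abscissa of the centroid of the area under mu_B
crisp = (u * mu).sum() / mu.sum()
print(round(crisp, 1))   # a speed between "medium" and "fast", closer to medium
```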
In the previous example the rules were given. But how to build them from data?
the following is given/assumed:
the rule structure, that is, the number of premises in each rule
the shapes of the membership functions
the number of rules
the training set T is given: a set of S observed input (a) and output (b) real-valued vectors:

T = \{ (a_1(s), ..., a_K(s); b(s)), \; s = 1, ..., S \}

It is assumed that we are training I rules with K premises in a system, where the i-th rule has the form:

If a_1 is A_{i,1} AND a_2 is A_{i,2} AND ... AND a_K is A_{i,K} then B_i

where a is a crisp input, and A and B are triangular fuzzy numbers.
the parameters of A and B (supports and kernels) are to be found
Building rules from data: weighted counting algorithm (1)
Building rules from data: weighted counting algorithm (2)
uses the subset of the training set that satisfies the premises of a rule at least to a degree of fulfilment equal to a threshold ε to construct the shape of the corresponding consequence
It is accomplished with the following steps (i is the rule number, k is the premise number):
Building rules from data: weighted counting algorithm (3)
1 Define the support (α⁻_{i,k}, α⁺_{i,k}) of the i-th rule's premise A_{i,k}.
2 A_{i,k} is assumed to be a triangular fuzzy number (α⁻_{i,k}, α¹_{i,k}, α⁺_{i,k})_T, where the kernel α¹_{i,k} is the mean of all possible a_k(s) values which fulfil the i-th rule at least partially:

\alpha^1_{i,k} = \frac{1}{N_i} \sum_{s \in R_i} a_k(s)

where R_i is the set of those N_i instances.
3 Calculate the DOFs Φ_i(s) for each premise vector (a_1(s), ..., a_K(s)) in the training set T and each rule i whose premises were determined in step 1.
4 Select a threshold ε > 0 such that only responses with DOF Φ_i(s) > ε will be considered in the construction of the rule response. The corresponding response is assumed to be also a triangular fuzzy number (β⁻_i, β¹_i, β⁺_i)_T defined by:

\beta^-_i = \min_{s: \Phi_i(s) > ε} b(s)

\beta^1_i = \frac{\sum_{s: \Phi_i(s) > ε} \Phi_i(s) \, b(s)}{\sum_{s: \Phi_i(s) > ε} \Phi_i(s)}

\beta^+_i = \max_{s: \Phi_i(s) > ε} b(s)
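A sketch of steps 3-4 for a single rule, assuming product inference and already-defined premise triangles (the data set, the premise parameters and the threshold are invented):

```python
import numpy as np

def tri_mu(a_minus, a1, a_plus, u):
    if u <= a_minus or u >= a_plus:
        return 0.0
    return (u - a_minus) / (a1 - a_minus) if u <= a1 else (a_plus - u) / (a_plus - a1)

def fit_consequence(A, b, premises, eps=0.1):
    """premises[k] = (alpha-, alpha1, alpha+) of premise k of one rule.
    Returns the triangular consequence (beta-, beta1, beta+) by weighted counting."""
    # step 3: DOF of every training instance (product inference)
    dof = np.array([np.prod([tri_mu(*premises[k], A[s, k]) for k in range(A.shape[1])])
                    for s in range(len(A))])
    sel = dof > eps          # step 4: keep only responses with DOF above the threshold
    return (b[sel].min(),                                    # beta-
            np.sum(dof[sel] * b[sel]) / np.sum(dof[sel]),    # beta1: DOF-weighted mean
            b[sel].max())                                    # beta+

rng = np.random.default_rng(0)
A = rng.uniform(0, 10, size=(500, 2))            # inputs a_1(s), a_2(s)
b = A.sum(axis=1) + rng.standard_normal(500)     # outputs b(s)
rule = [(2.0, 5.0, 8.0), (2.0, 5.0, 8.0)]        # "a_1 is Medium AND a_2 is Medium"
print(fit_consequence(A, b, rule))               # consequence centred near 10
```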
Fuzzy rule-based system: learning rules from data
(Diagram: HISTORICAL DATA → TRAINING → RULES; at run time, CRISP INPUT (X) → FUZZIFIER → FUZZY INFERENCE ENGINE → DEFUZZIFIER → CRISP OUTPUT (Y); EXPERT JUDGEMENTS could also supply rules but are not considered here.)
Case study: catchment in Veneto region, Italy
Modeling the spatial rainfall distribution using a fuzzy rule-based system:
filling missing data in past records
estimating the rainfall depth at the station Caprile (based on data for Arabba and Andraz) in case of a sudden equipment failure
(Map: the stations Arabba, Andraz and Caprile.)
Problem formulation
Daily precipitation at three stations in 1985-91
Data split for training and verification
Daily precipitation at Andraz & Arabba used to determine the daily precipitation at Caprile
Performance indices:
mean square error (MSE) between modeled & observed data
percentage of predictions within a predefined tolerance target (5% is used)
Problems:
missing records in the training data
non-uniform distribution of data
Methods considered
Traditional normal ratio method:

P_X = \frac{1}{3} \left( \frac{N_X}{N_A} P_A + \frac{N_X}{N_B} P_B + \frac{N_X}{N_C} P_C \right)

(N denotes the normal precipitation at the target station X and at the stations A, B, C; P denotes the measured precipitation)
Neural network
Fuzzy rule-based system
How many rules to use?
Too many rules lead to overfitting and a higher error on verification
(Chart: effect of the number of rules; mean square error vs the number of rules (4, 9, 16, 25, 36) for the training period 1988-91 (T) and the verification period 1985-87 (V).)
Results: best performance
(Figures: scatter plots of simulated vs observed precipitation (both axes 0-80) for the training performance (1989-91) and the verification performance (1985-88); a time series of the precipitation at CAPRILE for the first 120 days of 1987.)
Veneto case study: comparison of fuzzy rules, neural network and the normal ratio method
(Charts: performance comparison (Case 1) of the FRBS, the neural network and the traditional normal ratio method: the mean square error and the percentage of predictions within the 5% tolerance, for the periods 1989-91 (T), 1985 (V), 1986 (V), 1987 (V), 1988 (V) and 1985-88 (V).)
Veneto case study: conclusions
FRBS was more accurate than the ANN and the normal ratio method
its training is faster than that of an ANN
Issues to pay attention to:
curse of dimensionality: more than 5 inputs is very difficult to handle
too many rules may cause overfitting
non-uniformly distributed data lead to empty areas where rules cannot be trained
Case study Delfland: training an ANN or Fuzzy controller on data obtained from an optimal controller in water level control
(Diagram: hydrological processes in the polders determine the water level y(t); the Aquarius optimal controller computes the pumping rate u(t) from the difference between the target water level y(t)_d and y(t); the ANN or FRBS model is trained on the error in the control signal to reproduce u(t).)
a data-driven controller (ANN or Fuzzy rule-based system) is trained on data generated by the optimal controller, and can then replace it
Case study: Delfland
Replicating the controller by ANN (output: pump status at time t)
Input variables in Local control:
water level at time t-1
water level at time t
pump status at time t-1
Input variables in Centralised dynamic control:
precipitation at times t-2, t-1 and t
water level at times t-1 and t
groundwater level at time t
pump status at time t-1
Performance of the Neural network reproducing the behaviour of an optimal controller
(Figure: pump status over time.)
Fuzzy rules reproducing optimal control of the water level in Delfland
(Figure: pump status over time.)
Bayesian learning
Bayes' theorem
we are interested in determining the best hypothesis h from some space H, given the observed data D
Some notation:
P(h) = prior probability that hypothesis h holds
P(D) = prior probability that training data D will be observed (without knowledge of which hypothesis holds)
P(D|h) = probability of observing data D given that h holds
P(h|D) = probability that h holds given the observed data D
Bayes' theorem:

P(h|D) = \frac{P(D|h) \, P(h)}{P(D)}
Selecting the "best" hypothesis using Bayes' theorem
learning in the Bayesian sense: selecting the most probable hypothesis (the maximum a posteriori hypothesis, MAP):

h_{MAP} = \arg\max_{h \in H} P(h|D) = \arg\max_{h \in H} \frac{P(D|h) P(h)}{P(D)} = \arg\max_{h \in H} P(D|h) P(h)

P(D|h) is called the likelihood of data D given h
if all hypotheses are a priori equally probable, this becomes the maximum likelihood (ML) hypothesis:

h_{ML} = \arg\max_{h \in H} P(D|h)
Bayesian learning: example
hypothesis h = "patient has cancer", alternative = "no cancer"
prior knowledge (without data): P(h)=0.008
data that can be observed: test with 2 outcomes (+ or -):
right results:
P(+/cancer) = 0.98 P(-/nocancer) = 0.97
errors:
P(-/cancer) = 0.02 P(+/nocancer) = 0.03
suppose data is observed: a patient is tested and result is +is then hypothesis correct?: choose hypothesis with MAP, that ishypothesis for which P(D/h)P(h) = max
P(+/cancer) P(cancer) = 0.98 * 0.008 = 0.0078
P(+/nocancer) P(nocancer) = 0.03 * 0.992 = 0.0298
--> hypothesis "no cancer" wins
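The same MAP comparison in a few lines of Python (the normalisation at the end, which turns the scores into the posterior P(cancer | +) of about 0.21, is an addition to the slide):

```python
p_cancer = 0.008
p_pos_given_cancer, p_pos_given_nocancer = 0.98, 0.03

# unnormalised posteriors P(D|h) P(h) for the observation D = "+"
scores = {
    "cancer": p_pos_given_cancer * p_cancer,           # = 0.0078
    "nocancer": p_pos_given_nocancer * (1 - p_cancer)  # = 0.0298
}
h_map = max(scores, key=scores.get)
print(scores, "->", h_map)                             # "nocancer" wins
print(scores["cancer"] / sum(scores.values()))         # posterior, about 0.21
```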
Naive Bayes classifier
assume that each instance x of the data set is characterized by several attributes {a_1, ..., a_n}
the target function F(x) can take on any value from a finite set V
a set of training examples {x_i} is provided
when a new instance <a_1, ..., a_n> is presented, the classifier should identify the most probable target value v_MAP.
Naive Bayes classifier (2)
This condition can be written like this:

v_{MAP} = \arg\max_{v_j \in V} P(v_j | a_1, ..., a_n)

or, by applying Bayes' theorem:

v_{MAP} = \arg\max_{v_j \in V} \frac{P(a_1, ..., a_n | v_j) P(v_j)}{P(a_1, ..., a_n)} = \arg\max_{v_j \in V} P(a_1, ..., a_n | v_j) P(v_j)

P(v_j) can be estimated simply by counting the frequency with which each target value v_j occurs in the data
Naive Bayes classifier (3)
the terms P(a_1, ..., a_n | v_j) can be estimated by counting in a similar way; however, the total number of these terms is equal to the number of possible instances times the number of possible target values, so this is difficult
The solution is a simplifying assumption: the attribute values a_1, ..., a_n are conditionally independent given the target value. In this case P(a_1, ..., a_n | v_j) = \prod_i P(a_i | v_j), and estimating P(a_i | v_j) is much easier, again by counting frequencies.
This gives the rule of the naive Bayes classifier:

v_{NB} = \arg\max_{v_j \in V} P(v_j) \prod_i P(a_i | v_j)
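A frequency-counting sketch of this classifier (the toy weather data are invented; note that plain counting lets a zero frequency zero out the whole product, so a smoothing term would be added in practice):

```python
from collections import Counter, defaultdict

def train_naive_bayes(X, y):
    """Estimate P(v_j) and P(a_i | v_j) by counting frequencies."""
    prior = Counter(y)                    # counts of each target value v_j
    cond = defaultdict(Counter)           # cond[(i, v)][a] = count of a_i = a given v
    for x, v in zip(X, y):
        for i, a in enumerate(x):
            cond[(i, v)][a] += 1
    def classify(x):
        def score(v):
            p = prior[v] / len(y)                        # P(v_j)
            for i, a in enumerate(x):
                p *= cond[(i, v)][a] / prior[v]          # P(a_i | v_j)
            return p
        return max(prior, key=score)                     # v_NB
    return classify

# toy data: attributes (outlook, wind) -> play?
X = [("sunny", "weak"), ("sunny", "strong"), ("rain", "weak"),
     ("overcast", "weak"), ("rain", "strong"), ("overcast", "strong")]
y = ["yes", "no", "yes", "yes", "no", "yes"]
classify = train_naive_bayes(X, y)
print(classify(("rain", "weak")))   # -> 'yes'
```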
Modular models: committee machines, ensembles, mixtures of experts, boosting
Committee machine (modular model)
Instead of building one model, several models are built, each responsible for a particular situation
Consider a forecasting model Q(t+1) = f(R(t-2), R(t-3), Q(t-1))
(Figure: in the space of Rainfall(t-3), Rainfall(t-2) and Flow Q(t), separate models are built from past records for high, medium and low flows; a new record (hydrometeorological condition) is attributed to one (or several) classes, and the corresponding models are run.)
Committee machines (modular model)
input data is split into subsets and separate data-driven models are trained:
hard split: sort according to position in the input space (low vs high rainfall); this allows one to bring in physical insight
no split: do not sort, but train several models on the same data and then combine the results by some voting scheme (committee machine); voting by majority, by weighted majority, or by averaging
soft split: split according to how well a given model trained with this data, and then train also other models. Example: boosting (see the sketch after this list):
present the original training data set (N examples) to machine 1
assign higher probability to the samples that are badly classified
sample N examples from the training set based on the new distribution
train machine 2
continue, ending with n machines
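A sketch of this resampling scheme in Python (the decision-stump weak machines, the doubling of the sampling probability of misclassified examples, and the unweighted majority vote are simplifications of real boosting algorithms such as AdaBoost):

```python
import numpy as np

def fit_stump(X, y):
    """Weak machine: the best single-feature threshold classifier, y in {-1, +1}."""
    best = None
    for j in range(X.shape[1]):
        for t in np.unique(X[:, j]):
            for s in (1, -1):
                acc = np.mean(s * np.sign(X[:, j] - t + 1e-12) == y)
                if best is None or acc > best[0]:
                    best = (acc, j, t, s)
    _, j, t, s = best
    return lambda Z: s * np.sign(Z[:, j] - t + 1e-12)

def boost_by_resampling(X, y, n_machines=5, seed=0):
    rng = np.random.default_rng(seed)
    p = np.full(len(X), 1.0 / len(X))     # initial sampling distribution
    machines = []
    for _ in range(n_machines):
        idx = rng.choice(len(X), size=len(X), p=p)   # sample N examples
        machines.append(fit_stump(X[idx], y[idx]))   # train the next machine
        wrong = machines[-1](X) != y
        p = np.where(wrong, 2.0 * p, p)   # badly classified -> higher probability
        p /= p.sum()
    return lambda Z: np.sign(sum(m(Z) for m in machines))   # majority vote

rng = np.random.default_rng(1)
X = rng.standard_normal((200, 2))
y = np.where(X[:, 0] + X[:, 1] > 0, 1, -1)
predict = boost_by_resampling(X, y)
print(np.mean(predict(X) == y))           # accuracy of the committee
```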
Committee machine with hard split: expert (specialised) models trained on subsets
(Diagram: the input x is routed by a splitting (gating) machine to Machine 1, Machine 2, ..., Machine n, producing outputs y1, y2, ..., yn.)
Committee machine with no split (ensemble): all models are trained on the same set
(Diagram: the input x goes, without splitting, to Machine 1, Machine 2, ..., Machine n; their outputs y1, y2, ..., yn are merged by a combiner (averaging scheme) into y.)
Committee machine with soft split of data: boosting
(Diagram: N training examples are sampled from a distribution in which badly predicted examples are given higher probability; after each machine is trained the distribution is recomputed; the outputs y1, y2, ..., yn of Machine 1, Machine 2, ..., Machine n are merged by a combiner (weighted averaging scheme) into y.)
Using a mixture of experts (models): each model is for a particular hydrological condition
(Diagram: a tree of conditions routes each input to a module; e.g. Condition 3 tests Pa(t-1) > 50 and compares the moving average Pa_Mov2(t-2) with 200; the Y/N branches lead to Module 1 or Module 2, each of which can be an M5 model tree or an ANN.)
Combining physically-based and data-driven models. Complementary use of a data-driven model
(Diagram: the PHYSICAL SYSTEM provides the input data and the observed output; a HYDROLOGIC FORECASTING MODEL with its model parameters produces the model output; the model errors (observed output minus model output) are used to train a DATA-DRIVEN error forecasting model; the forecasted errors are added to the model output to produce the improved output.)
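A sketch of this scheme with synthetic data: a deliberately biased stand-in for the hydrologic model, a simple linear error forecaster, and the corrected output (every number and function here is invented for illustration):

```python
import numpy as np

def hydrologic_model(rain):
    """Stand-in for a physically-based forecasting model with a systematic error."""
    return 0.7 * rain + 2.0

rng = np.random.default_rng(0)
rain = rng.gamma(2.0, 3.0, size=300)                       # input data
observed = 0.9 * rain + 0.5 * np.sqrt(rain) + rng.normal(0, 0.3, 300)

model_out = hydrologic_model(rain)
errors = observed - model_out              # model errors: the training target

# data-driven error forecaster: here simply a linear fit error = f(input)
coef = np.polyfit(rain, errors, deg=1)
forecasted_errors = np.polyval(coef, rain)

improved = model_out + forecasted_errors   # improved output
print("MSE before:", round(np.mean(errors ** 2), 3))
print("MSE after: ", round(np.mean((observed - improved) ** 2), 3))
```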
End of Part 3