Knowledge Discovery in Databases T4: Machine Learning
P. Berka, 2019
Machine Learning
The field of machine learning is concerned with
the question of how to construct computer
programs that automatically improve with
experience.
(Mitchell, 1997)
Things learn when they change their behavior in
a way that makes them perform better in the
future.
(Witten, Frank, 1999)
Types of learning:
- knowledge acquisition
- skill refinement
[Figure: Relation between machine learning and data mining]
[Figure: General scheme of a learning system. A learning module selects a representation and turns object descriptions into knowledge; a decision-making module then uses this knowledge to derive a decision for a described object.]
Learning methods:
1. rote learning,
2. learning from instruction (learning by being told),
3. learning by analogy, instance-based learning, lazy
learning,
4. explanation-based learning,
5. learning from examples,
6. learning from observation and discovery.
Learning methods:
- statistical methods: regression methods, discriminant analysis, cluster analysis,
- symbolic machine learning methods: decision trees and rules, case-based reasoning (CBR),
- sub-symbolic machine learning methods: neural networks, Bayesian networks, or genetic algorithms.
Feedback during learning:
- pre-classified examples (supervised learning)
- a small number of pre-classified examples and a large number of examples without a known class (semi-supervised learning)
- the algorithm can query the teacher for the class membership of unclassified examples (active learning)
- indirect hints derived from the teacher's behavior (apprenticeship learning)
- no feedback (unsupervised learning)
Representation of examples:
1. attributes: categorical (binary, nominal, ordinal) and numeric
   [hair=black & height=180 & beard=yes & education=univ]
2. relations
   father(jan_lucembursky, karel_IV)
Algorithms:
- batch: all examples are processed at once
- incremental: examples are processed sequentially; the system can be "re-trained"
Learning methods:
- empirical: uses a large set of (training) examples and limited (or no) background knowledge
- analytic: uses extensive background knowledge and a few (one or even no) illustrative examples
Principles of empirical concept learning
1. Examples of the same class have similar characteristics (similarity-based learning), i.e. examples of the same class create clusters in the attribute space. The goal of learning is to find and represent these clusters.
The "garbage in, garbage out" problem: hence the importance of data understanding and preprocessing.
2. General knowledge is inferred from a finite set of examples (inductive learning).
Examples are divided into 2 (or 3) sets:
o training set to build a model
o (validation set to tune the parameters)
o testing set to test the model
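A minimal Python sketch of such a split, assuming scikit-learn is available; the DataFrame and the column name "class" are made up for illustration:

```python
# A minimal sketch of a training / validation / test split;
# the DataFrame and the target column name "class" are hypothetical.
import pandas as pd
from sklearn.model_selection import train_test_split

data = pd.DataFrame({"a": range(10), "b": range(10, 20),
                     "class": [0, 1] * 5})
X, y = data.drop(columns=["class"]), data["class"]

# First split off the test set, then carve a validation set out of the
# remaining part: 60 % training, 20 % validation, 20 % test overall.
X_tmp, X_test, y_tmp, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
X_train, X_val, y_train, y_val = train_test_split(X_tmp, y_tmp, test_size=0.25, random_state=42)
```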
General definition of (supervised) machine
learning
Analyzed data:
$$\mathbf{D} = \begin{pmatrix} x_{11} & x_{12} & \cdots & x_{1m} \\ x_{21} & x_{22} & \cdots & x_{2m} \\ \vdots & & & \vdots \\ x_{n1} & x_{n2} & \cdots & x_{nm} \end{pmatrix}$$
Rows in the table represent objects (examples, instances)
Columns in the table correspond to attributes (variables)
When we add a target attribute to the data table, we obtain data suitable for supervised learning methods (so-called training data):
$$\mathbf{D}_{TR} = \begin{pmatrix} x_{11} & x_{12} & \cdots & x_{1m} & y_1 \\ x_{21} & x_{22} & \cdots & x_{2m} & y_2 \\ \vdots & & & \vdots & \vdots \\ x_{n1} & x_{n2} & \cdots & x_{nm} & y_n \end{pmatrix}$$
Classification task: to find knowledge (represented by a decision function f) that assigns a value of the target attribute y to an object described by values of the input attributes x:

$$f: \mathbf{x} \rightarrow y.$$
During classification we infer, from the values of the input attributes x of an object, the value of the target attribute:

$$\hat{y} = f(\mathbf{x}).$$

The derived value ŷ can differ from the real value y. We can thus compute for every object $\mathbf{o}_i \in D_{TR}$ the classification error $Q_f(\mathbf{o}_i, \hat{y}_i)$:
for a numeric attribute C e.g. as

$$Q_f(\mathbf{o}_i, \hat{y}_i) = (y_i - \hat{y}_i)^2,$$

for a categorical attribute C e.g. as

$$Q_f(\mathbf{o}_i, \hat{y}_i) = \begin{cases} 1 & \text{if } y_i \neq \hat{y}_i \\ 0 & \text{if } y_i = \hat{y}_i. \end{cases}$$
We can compute the overall error Err(f,DTR) for the whole
training set DTR e.g. as mean error:
$$\mathrm{Err}(f, D_{TR}) = \frac{1}{n} \sum_{i=1}^{n} Q_f(\mathbf{o}_i, \hat{y}_i).$$
The goal of learning is to find such knowledge f*, that will
minimize this error
$$\mathrm{Err}(f^*, D_{TR}) = \min_f \mathrm{Err}(f, D_{TR}).$$
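A small Python sketch of these two per-object error measures and of the mean error over a training set; the arrays are made up for illustration:

```python
import numpy as np

# Hypothetical true and predicted target values for a training set.
y_true = np.array([1.0, 0.0, 1.0, 1.0])
y_pred = np.array([0.8, 0.1, 0.9, 1.0])

# Per-object squared error for a numeric target: Q = (y - y_hat)^2.
squared_errors = (y_true - y_pred) ** 2

# Per-object 0/1 error for a categorical target: Q = 1 iff y != y_hat.
zero_one_errors = (y_true != y_pred).astype(float)

# Overall error Err(f, D_TR) as the mean of per-object errors.
mse = squared_errors.mean()
error_rate = zero_one_errors.mean()
print(mse, error_rate)
```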
1. Learning as search
Looking for both the structure and the parameters of the model, e.g. the number of clusters and their location.
Models as cluster descriptions:
MGM – most general model (one cluster for all
examples)
MSM – most specific model(s) (each example
creates a cluster)
M1 is more general than M2, M2 is more specific
than M1
The number of ways to partition n examples into clusters is given by the Bell numbers:

$$B(n+1) = \sum_{k=0}^{n} \binom{n}{k} B(k), \qquad B(0) = 1$$

n:    1  2  3  4   5   10
B(n): 1  2  5  15  52  115975
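A short Python sketch of this recurrence, as a direct transcription using math.comb:

```python
from math import comb

def bell(n):
    """Compute the Bell number B(n) via B(n+1) = sum_k C(n,k) * B(k)."""
    b = [1]  # B(0) = 1
    for i in range(n):
        b.append(sum(comb(i, k) * b[k] for k in range(i + 1)))
    return b[n]

# Reproduces the table above.
print([bell(n) for n in (1, 2, 3, 4, 5, 10)])  # [1, 2, 5, 15, 52, 115975]
```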
Search methods:
- Direction
  - top-down (from general to specific models)
  - bottom-up (from specific to general models)
- Strategy
  - blind (we consider every possibility of specializing/generalizing the given model)
  - heuristic (we use some criterion to select only the "best" possibilities of specializing/generalizing the given model)
  - random
- Bandwidth
  - single (we consider only one transformation of the current model)
  - parallel (we consider more transformations)
Entia non sunt multiplicanda praeter necessitatem.
("Entities should not be multiplied beyond necessity.")
(William of Ockham, 1285 – 1327)
Example:
Let us assume that both the input attributes and the target attribute are categorical; let us call a value of an attribute a category.
1. an atomic formula that expresses a property of an object:

$$A_j(v_{jk})(\mathbf{o}_i) = \begin{cases} 1 & \text{for } x_{ij} = v_{jk} \\ 0 & \text{for } x_{ij} \neq v_{jk} \end{cases}$$
2. the set of objects that fulfill the given property:

$$A_j(v_{jk}) = \{\mathbf{o}_i : x_{ij} = v_{jk}\}$$
Combinations are created from categories using logical AND:

$$\mathrm{Comb}[A_{j_1}(v_{j_1k_1}), A_{j_2}(v_{j_2k_2}), \ldots, A_{j_l}(v_{j_lk_l})] = A_{j_1}(v_{j_1k_1}) \wedge A_{j_2}(v_{j_2k_2}) \wedge \ldots \wedge A_{j_l}(v_{j_lk_l})$$

1. $$\mathrm{Comb}(\mathbf{o}_i) = \begin{cases} 1 & \text{if } x_{ij_1} = v_{j_1k_1} \wedge x_{ij_2} = v_{j_2k_2} \wedge \ldots \wedge x_{ij_l} = v_{j_lk_l} \\ 0 & \text{else} \end{cases}$$

2. $$\mathrm{Comb} = \{\mathbf{o}_i : x_{ij_1} = v_{j_1k_1} \wedge x_{ij_2} = v_{j_2k_2} \wedge \ldots \wedge x_{ij_l} = v_{j_lk_l}\}.$$
Comb covers object $\mathbf{o}_i$ iff $\mathrm{Comb}(\mathbf{o}_i) = 1$.
We can create supercombinations by adding categories to a
combination and create subcombinations by removing
categories from a combination.
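A possible Python encoding of a combination and its covering test; the dict representation and the attribute names are illustrative only:

```python
# A combination as a dict {attribute: required category}; an example as a
# dict {attribute: value}. Both encodings are hypothetical illustrations.
def covers(comb, example):
    """Return True iff the combination covers the example."""
    return all(example.get(attr) == val for attr, val in comb.items())

comb = {"income": "high", "unemployed": "no"}
example = {"income": "high", "account": "high", "unemployed": "no"}
print(covers(comb, example))  # True

# A supercombination adds a category (more specific, covers fewer objects);
# a subcombination removes one (more general, covers more objects).
```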
Partial ordering between combinations:
If combination Comb1 is a subcombination of
combination Comb2, then combination Comb1 is more
general than combination Comb2 and combination
Comb2 is more specific than combination Comb1.
If combination Comb1 is more general than combination
Comb2, then Comb1 covers at least all objects that are
covered by Comb2. (downward-closure property)
The resulting knowledge will be represented by
combinations that cover only examples of given class.
Combination Comb is consistent iff it covers only examples of a single class:

$$\exists C(v_t)\ \forall \mathbf{o}_i \in D_{TR}:\ \mathrm{Comb}(\mathbf{o}_i) = 1 \Rightarrow y_i = v_t$$
Example data (translated from Czech; attributes: income (příjem), account (konto), sex (pohlaví), unemployed (nezaměstnaný), car (auto), housing (bydlení), credit (úvěr)):

income  account  sex     unemployed  car  housing  credit
high    high     female  no          yes  own      Yes
high    high     male    no          yes  own      Yes
low     low      male    no          yes  rented   No
high    high     male    no          no   rented   Yes
Combination Comb (a hypothesis representing the concept "credit") can contain the following values of an attribute:
- "?" to indicate that the value of this attribute is irrelevant,
- a specific value of the attribute,
- "∅" to indicate that no value of this attribute is applicable.
Hypothesis space
We can traverse the hypothesis space using two methods:
from general to specific (top-down, specialization),
from specific to general (bottom-up, generalization).
[Figure: Hypothesis space for the example data (attribute order: income, account, sex, unemployed, car, housing), ordered from the most general hypothesis [?, ?, ?, ?, ?, ?] at the top, through hypotheses such as [high, ?, ?, ?, ?, ?], [?, high, ?, ?, ?, ?], [?, ?, female, ?, ?, ?] and [?, ?, ?, ?, ?, own], then [high, high, ?, no, ?, ?] and its specializations (e.g. [high, high, ?, no, yes, ?], [high, high, ?, no, ?, own], [high, high, male, no, ?, ?]), down to the training examples themselves (e.g. [high, high, female, no, yes, own]) and the empty hypothesis [∅, ∅, ∅, ∅, ∅, ∅] at the bottom.]
[Figure: Trace of the Find-S algorithm on the example data. Starting from the most specific hypothesis, the positive examples [high, high, female, no, yes, own], [high, high, male, no, yes, own] and [high, high, male, no, no, rented] generalize the hypothesis step by step, yielding S: [high, high, ?, no, ?, ?].]
Find-S algorithm
1. Initialize h to the most specific hypothesis in H
2. For each positive training example x
   2.1. For each attribute ai of hypothesis h:
        if the value of attribute ai does not correspond to x,
        then replace the value of ai by the next more general value that corresponds to x
3. Output h
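A runnable Python sketch of Find-S for this conjunctive representation; None stands for the empty value "∅", and the positive examples are taken from the translated table:

```python
# Find-S for conjunctive hypotheses: None = empty value, "?" = any value.
POSITIVES = [
    ("high", "high", "female", "no", "yes", "own"),
    ("high", "high", "male",   "no", "yes", "own"),
    ("high", "high", "male",   "no", "no",  "rented"),
]

def find_s(positives):
    # 1. Start with the most specific hypothesis (all attributes empty).
    h = [None] * len(positives[0])
    # 2. Minimally generalize h on every positive example.
    for x in positives:
        for i, v in enumerate(x):
            if h[i] is None:    # empty -> take the example's value
                h[i] = v
            elif h[i] != v:     # conflicting values -> generalize to "?"
                h[i] = "?"
    # 3. Output h.
    return h

print(find_s(POSITIVES))  # ['high', 'high', '?', 'no', '?', '?']
```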
[Figure: Final version space for the example data.
G: {[high, ?, ?, ?, ?, ?], [?, high, ?, ?, ?, ?]}
intermediate hypotheses: [high, ?, ?, no, ?, ?], [high, high, ?, ?, ?, ?], [?, high, ?, no, ?, ?]
S: {[high, high, ?, no, ?, ?]}]
Candidate-Elimination algorithm
1. Initialize G to the set of maximally general hypotheses in H
2. Initialize S to the set of maximally specific hypotheses in H
3. For each example x
   3.1. if x is a positive example then
        remove from G any hypothesis inconsistent with x
        for each hypothesis s in S that is not consistent with x
            remove s from S
            add to S every minimal generalization h of s such that h is consistent with x and some member of G is more general than h
        remove from S any hypothesis that is more general than another hypothesis in S
   3.2. if x is a negative example then
        remove from S any hypothesis inconsistent with x
        for each hypothesis g in G that is not consistent with x
            remove g from G
            add to G every minimal specialization h of g such that h is consistent with x and some member of S is more specific than h
        remove from G any hypothesis that is more specific than another hypothesis in G
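The following Python sketch runs Candidate-Elimination on the translated example table. It is simplified for the conjunctive representation (each hypothesis in S has a unique minimal generalization, so S stays a singleton); the encoding (None for "∅", "?" for any value) and all names are illustrative:

```python
# Candidate-Elimination for conjunctive hypotheses over the example data.
DATA = [  # (income, account, sex, unemployed, car, housing) -> credit
    (("high", "high", "female", "no", "yes", "own"),    True),
    (("high", "high", "male",   "no", "yes", "own"),    True),
    (("low",  "low",  "male",   "no", "yes", "rented"), False),
    (("high", "high", "male",   "no", "no",  "rented"), True),
]
N = 6
DOMAINS = [sorted({x[i] for x, _ in DATA}) for i in range(N)]

def consistent(h, x):
    """True iff hypothesis h covers example x (None covers nothing)."""
    return all(hv == "?" or hv == xv for hv, xv in zip(h, x))

def more_general_eq(h1, h2):
    """True iff h1 is more general than or equal to h2."""
    return all(a == "?" or a == b for a, b in zip(h1, h2))

def generalize(s, x):
    """The unique minimal generalization of s that covers positive x."""
    if None in s:
        return tuple(x)
    return tuple(sv if sv == xv else "?" for sv, xv in zip(s, x))

def specializations(g, x):
    """Minimal specializations of g that exclude negative x."""
    for i, gv in enumerate(g):
        if gv == "?":
            for v in DOMAINS[i]:
                if v != x[i]:
                    yield g[:i] + (v,) + g[i + 1:]

G, S = [("?",) * N], [(None,) * N]
for x, positive in DATA:
    if positive:
        G = [g for g in G if consistent(g, x)]
        S = [generalize(s, x) if not consistent(s, x) else s for s in S]
        S = [s for s in S if any(more_general_eq(g, s) for g in G)]
    else:
        S = [s for s in S if not consistent(s, x)]
        new_G = []
        for g in G:
            if not consistent(g, x):
                new_G.append(g)
                continue
            new_G += [h for h in specializations(g, x)
                      if any(more_general_eq(h, s) for s in S)]
        # keep only the maximally general hypotheses
        G = [g for g in new_G
             if not any(h != g and more_general_eq(h, g) for h in new_G)]

print("S:", S)  # [('high', 'high', '?', 'no', '?', '?')]
print("G:", G)  # [('high', '?', ...), ('?', 'high', ...)]
```

The final S and G boundaries match the version space shown in the figure above.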
2. Learning as approximation
Looking “only” for parameters of the model
within a given class of models
Example:
using data points $[x_i, y_i]$ to find the parameters of a linear function that best fits the data:

$$f(x) = q_1 x + q_0$$
least squares method:
the problem of finding the minimum of the overall error

$$\min \sum_i (y_i - f(x_i))^2$$

is transformed into solving the equation

$$\frac{d}{dq} \sum_i (y_i - f(x_i))^2 = 0$$
[Figure: Data points fitted by a linear function y = f(x).]
solution:
1) analytic (we know the type of the function)
solving the equations for the parameters of the function:

$$q_0 = \frac{(\sum_k y_k)(\sum_k x_k^2) - (\sum_k x_k y_k)(\sum_k x_k)}{n(\sum_k x_k^2) - (\sum_k x_k)^2}$$

$$q_1 = \frac{n(\sum_k x_k y_k) - (\sum_k x_k)(\sum_k y_k)}{n(\sum_k x_k^2) - (\sum_k x_k)^2}$$
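A NumPy sketch of these closed-form formulas on made-up data points:

```python
import numpy as np

# Illustrative data points [x_i, y_i], roughly along y = 2x + 1.
x = np.array([0.0, 1.0, 2.0, 3.0])
y = np.array([1.1, 2.9, 5.2, 6.8])
n = len(x)

# Closed-form least-squares estimates for f(x) = q1*x + q0.
denom = n * (x**2).sum() - x.sum()**2
q0 = (y.sum() * (x**2).sum() - (x*y).sum() * x.sum()) / denom
q1 = (n * (x*y).sum() - x.sum() * y.sum()) / denom
print(q0, q1)  # approx. 1.09 and 1.94, close to the line y = 2x + 1
```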
2) numeric (we do not know the type of function)
gradient methods
$$\nabla \mathrm{Err}(\mathbf{q}) = \left[\frac{\partial \mathrm{Err}}{\partial q_0}, \frac{\partial \mathrm{Err}}{\partial q_1}, \ldots, \frac{\partial \mathrm{Err}}{\partial q_Q}\right].$$
Modification of the knowledge $\mathbf{q} = [q_0, q_1, \ldots, q_Q]$ according to the rule

$$q_j \leftarrow q_j + \Delta q_j,$$

where

$$\Delta q_j = -\eta \frac{\partial \mathrm{Err}}{\partial q_j}$$

and η is a parameter expressing the "step" used to approach the minimum of the function Err.
E.g. for the error function

$$\mathrm{Err}(f, D_{TR}) = \frac{1}{2}\sum_{i=1}^{n}(y_i - \hat{y}_i)^2 = \frac{1}{2}\sum_{i=1}^{n}\big(y_i - f(\mathbf{x}_i)\big)^2$$

and the expected function f as a linear combination of the inputs

$$f(\mathbf{x}) = \mathbf{q} \cdot \mathbf{x},$$
we can derive the gradient of the function Err as

$$\frac{\partial \mathrm{Err}}{\partial q_j} = \frac{\partial}{\partial q_j}\,\frac{1}{2}\sum_{i=1}^{n}(y_i - \hat{y}_i)^2 = \frac{1}{2}\sum_{i=1}^{n} 2\,(y_i - \hat{y}_i)\,\frac{\partial}{\partial q_j}(y_i - \hat{y}_i) = \sum_{i=1}^{n}(y_i - \hat{y}_i)\,\frac{\partial}{\partial q_j}(y_i - \mathbf{q}\cdot\mathbf{x}_i) = \sum_{i=1}^{n}(y_i - \hat{y}_i)(-x_{ij}).$$
So

$$\Delta q_j = \eta \sum_{i=1}^{n} (y_i - \hat{y}_i)\, x_{ij}.$$
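A Python sketch of gradient descent with this update rule; the data, the learning rate η and the iteration count are made up, and the first column of X is fixed to 1 so that q0 acts as the intercept:

```python
import numpy as np

# Illustrative data (same points as in the analytic example above).
X = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]])
y = np.array([1.1, 2.9, 5.2, 6.8])

q = np.zeros(X.shape[1])   # initial parameters [q0, q1]
eta = 0.05                 # learning-rate parameter (the "step")

for _ in range(2000):
    y_hat = X @ q                  # f(x) = q . x
    q += eta * (y - y_hat) @ X     # delta q_j = eta * sum_i (y_i - y_hat_i) * x_ij

print(q)  # converges towards the least-squares solution [~1.09, ~1.94]
```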
Problem: the method may converge only to a local minimum.
Tuning hyperparameters:
Incorporating search into learning as approximation, i.e. searching for the optimal class of models:
- specifying the number of clusters for k-means clustering
- specifying the type of function for regression analysis
- specifying the topology of a neural network
Approaches: heuristic methods, e.g. genetic algorithms
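As a minimal illustration of the k-means case, a blind search over the number of clusters scored by the silhouette coefficient; the data and the search range are made up, and a heuristic method such as a genetic algorithm would replace the exhaustive loop:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

# Illustrative data: two well-separated blobs.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 0.5, (50, 2)), rng.normal(5, 0.5, (50, 2))])

# Blind search over the hyperparameter k (number of clusters).
best_k, best_score = None, -1.0
for k in range(2, 7):
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
    score = silhouette_score(X, labels)
    if score > best_score:
        best_k, best_score = k, score

print(best_k)  # 2 for this data
```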