Supervised Automatic Learning models: A new perspective
Eugenio F. Sánchez Úbeda
Modelling and Simulation in Science, 6th Workshop of DAA, 15-22 April 2007, Erice, Italy
Supervised Automatic Learning models: A new perspective (E. Sánchez-Úbeda) - 2Erice, April 2007
Contents
• Summary of main concepts
• Nature of the learning problem
• Learning difficulties
• Multidimensional approaches
• Summary and future research
Motivation
• Huge amounts of data are available in many disciplines of Science (& Industry)
• A large number of “different” learning approaches have been proposed
• Each domain uses its own terminology
Objective
• Present a new outlook on the main existing learning strategies by:
– Providing a rich overview of the main principles and methods underlying most supervised models
– Using a taxonomy that highlights the similarity of models whose original motivation comes from different fields
[Figure: Data → universe of automatic learning models → Knowledge]
Data Mining process: typical procedure
• Problem definition (variables)
• Attribute selection
• Model generation
• Interpretation and validation of results
• Model application
[Figure: Real system → collection of data → DB of examples → automatic learning → knowledge about the system]
Automatic learning: main goals
[Figure: System → data → MODEL]
• Running the model: estimate
• Looking at the model: understand
Supervised learning: Idea
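[Figure: a real system with observed inputs $x_1, \dots, x_p$ and non-observed inputs $z_1, \dots, z_\Omega$ producing outputs $y_1, \dots, y_q$, next to a model that sees only the observed inputs]
• Real system: $y_i = g_i(x_1, \dots, x_p, z_1, \dots, z_\Omega)$
• Model: $y_i = \varphi_i(x_1, \dots, x_p) + \varepsilon_i$, where $\varepsilon_i$ absorbs the effect of the non-observed inputs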
Data base of examples
• Via collection: observe the inputs $\mathbf{x}$ and outputs $\mathbf{y}$ of the real system (the non-observed inputs $\mathbf{z}$ still act on it) and store the pairs $(\mathbf{x}, \mathbf{y})$ in the DB of examples
• Via simulation: generate a set of possible situations (i.e. input vectors), simulate the behaviour of the real system (i.e. obtain the output vectors) and store the pairs $(\mathbf{x}, \mathbf{y})$ in the DB of examples
Sets of examples
$\{(x_{1e}, \dots, x_{pe}, y_{1e}, \dots, y_{qe}),\ e = 1, \dots, N\}$
• Growing set and pruning set: build the model (learning process)
• Test or validation set: evaluate the future effectiveness of the model
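A minimal sketch of the three-way split, assuming the examples are shuffled at random and split 50/25/25 (the proportions, the function name `split_examples` and the toy data are illustrative assumptions, not taken from the slides):

```python
import random

def split_examples(examples, fractions=(0.5, 0.25, 0.25), seed=0):
    """Randomly split examples into growing, pruning and test sets.

    The default fractions are an illustrative assumption.
    """
    if abs(sum(fractions) - 1.0) > 1e-9:
        raise ValueError("fractions must sum to 1")
    shuffled = list(examples)
    random.Random(seed).shuffle(shuffled)
    n = len(shuffled)
    n_grow = int(fractions[0] * n)
    n_prune = int(fractions[1] * n)
    growing = shuffled[:n_grow]
    pruning = shuffled[n_grow:n_grow + n_prune]
    test = shuffled[n_grow + n_prune:]   # whatever is left is kept for testing
    return growing, pruning, test

# Each example e is a pair (x_e, y_e): a p-vector of inputs and a q-vector of outputs.
examples = [((float(e), 2.0 * e), (e ** 2,)) for e in range(1, 101)]
growing, pruning, test = split_examples(examples)
print(len(growing), len(pruning), len(test))  # 50 25 25
```

The test set is never used while building the model, so its error estimates how the model will behave on future data.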
Learning problem definition
• Simple to state:
– “Find a model of a desired dependence using a limited number of observations”
• … but difficult to solve in general
– Two main difficulties:
• Finite size of the set of examples
• Random noise
Learning difficulties: Lack of examples
[Figure, noise-free case: (a) ideally, an infinite number of examples covers the whole input space and the model coincides with $\varphi(x)$; (b) in reality only a finite number of points is available, so the model must interpolate between them and extrapolate beyond them]
Learning difficulties: Noise
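[Figure, noisy case $y = \varphi(x) + \varepsilon$: (a) with an infinite number of points covering the whole input space there is no problem, since the mean of $y$ at each $x$ is $\varphi(x)$; (b) in true reality only a finite number of noisy points is available, so $\varphi(x)$ between and beyond them is uncertain]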
Learning problem: Statistical viewpoint (I)
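$y_i = g_i(x_1, \dots, x_p, z_1, \dots, z_\Omega) = g_i(\mathbf{x}, \mathbf{z}), \quad i = 1, \dots, q$
$p(\mathbf{x}) = \int p(\mathbf{x}, \mathbf{z})\, d\mathbf{z}$
[Figure: ideally, (a) $y = g(x, z)$ and (b) the joint probability density function $p(x, z)$ on $[0, 10] \times [0, 10]$; in reality, (c) $y = \varphi(x) + \text{noise}$ and (d) the marginal probability density function $p(x)$]
$y = \varphi(x) + \varepsilon$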
Learning problem: Statistical viewpoint (II)
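• For a given point $\mathbf{x}$, the outputs can be regarded as random variables whose density is obtained by integrating over all values of the non-observed inputs that produce $\mathbf{y}$:
$p(\mathbf{y} \mid \mathbf{x}) = \int_{\forall \mathbf{z} :\ \mathbf{y} = g(\mathbf{x}, \mathbf{z})} p(\mathbf{z} \mid \mathbf{x})\, d\mathbf{z}$
$y_i = \varphi_i(x_1, \dots, x_p) + \varepsilon_i, \quad i = 1, \dots, q$, with $\varphi_i(\mathbf{x})$ the mean of $y_i$ at $\mathbf{x}$
[Figure: at a fixed $x$, the observed values of $y$ scatter as $\varphi(x) + \varepsilon$ around $\varphi(x)$]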
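To make the conditional view concrete, the sketch below assumes a hypothetical system $g(x, z) = \sin(x) + 0.5z$ with a non-observed input $z \sim N(0, 1)$ independent of $x$. At a fixed $x$ the sampled outputs scatter around $\varphi(x) = \sin(x)$, and their spread is the noise $\varepsilon = 0.5z$. The function `g` and the sampling setup are assumptions for illustration only.

```python
import math
import random
import statistics

def g(x, z):
    """Hypothetical real system: observed input x, non-observed input z."""
    return math.sin(x) + 0.5 * z

def outputs_at(x, n=20_000, seed=1):
    """Sample y = g(x, z) at a fixed x, drawing the hidden input z ~ N(0, 1)."""
    rng = random.Random(seed)
    return [g(x, rng.gauss(0.0, 1.0)) for _ in range(n)]

x0 = 1.0
ys = outputs_at(x0)
phi_hat = statistics.fmean(ys)          # estimate of phi(x0), the mean of y at x0
noise_var = statistics.pvariance(ys)    # estimate of Var(epsilon) at x0
print(f"phi(x0) ~ {phi_hat:.3f}  (sin(1) = {math.sin(x0):.3f})")
print(f"Var(eps) ~ {noise_var:.3f}  (0.5**2 = 0.25)")
```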
Overfitting and oversmoothing
[Figure: oversmoothing and overfitting, each illustrated for a classification problem and a regression problem]
• Learning algorithms must avoid both overfitting and oversmoothing:
– Oversmoothing is typical in “conservative” algorithms
– Overfitting is typical in “risky” algorithms
The bias-variance trade-off: Idea (I)
• We are playing darts:
– The dart player's objective is to hit the bull's-eye
– Imagine that the player's eyes are bandaged
– The player has to estimate where the bull's-eye is before throwing the dart
[Figure: the true underlying function (unknown) plays the role of the bull's-eye; each model, built from a different learning set (learning set 1, learning set 2, ...), is one throw]
The bias-variance trade-off: Idea (II)
[Figure: target, bias and variance of several realizations (throws)]
• The dart player's error can be decomposed into two components:
– Systematic error (bias)
– Random error (variance)
$\text{Error} = \text{bias}^2 + \text{variance}$
• Several realizations (for the same true target) are needed to measure the dart player's accuracy.
• In practice, the bias and variance cannot be easily estimated, since the true target is unknown and usually only one learning set is available.
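When many realizations are available (which rarely happens in practice), the decomposition can be checked numerically. This sketch assumes a hypothetical one-dimensional "dart player" with a systematic offset of 0.3 and a random spread of 0.2; with the population variance, the identity MSE = bias² + variance holds exactly up to rounding.

```python
import random
import statistics

def decompose(realizations, target):
    """Return (MSE, bias^2, variance) of repeated estimates of one target."""
    mean_estimate = statistics.fmean(realizations)
    bias_sq = (mean_estimate - target) ** 2               # systematic error
    variance = statistics.pvariance(realizations)         # random error
    mse = statistics.fmean((r - target) ** 2 for r in realizations)
    return mse, bias_sq, variance

rng = random.Random(42)
target = 1.0
# Hypothetical player: aims 0.3 too high, with random spread 0.2.
throws = [target + 0.3 + rng.gauss(0.0, 0.2) for _ in range(10_000)]
mse, bias_sq, variance = decompose(throws, target)
print(f"MSE = {mse:.4f}, bias^2 + variance = {bias_sq + variance:.4f}")
```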
The bias-variance trade-off: Example I
• Model: running-line smoother
[Figure: true function and two learning sets, LS1 and LS2 (N = 100 each), for $x \in [-4, 4]$]
• Pass over the data: for each point $x_e$, an (auxiliary) straight line is fitted by least squares to the $2w + 1$ points $x_{e-w}, \dots, x_{e+w}$, and the smoothed value is read from it (averaging process):
$s_e = \mathrm{ave}_w(x_e) = a\,x_e + b$
• Problem: 100 learning sets, each with 100 examples
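A possible implementation of the running-line smoother described above, assuming the inputs are sorted and the window is simply truncated near the ends of the data (the slides do not specify the boundary treatment, so this convention is an assumption; the function name is hypothetical):

```python
def running_line(xs, ys, w):
    """Running-line smoother: s_e = a * x_e + b, where (a, b) is the least
    squares line through the points with indices e - w, ..., e + w.

    Assumes xs is sorted; near the ends the window is truncated.
    """
    n = len(xs)
    smooth = []
    for e in range(n):
        lo, hi = max(0, e - w), min(n, e + w + 1)
        wx, wy = xs[lo:hi], ys[lo:hi]
        m = hi - lo
        mx, my = sum(wx) / m, sum(wy) / m
        sxx = sum((x - mx) ** 2 for x in wx)
        sxy = sum((x - mx) * (y - my) for x, y in zip(wx, wy))
        a = sxy / sxx if sxx > 0 else 0.0   # slope (0 if the window has one x value)
        b = my - a * mx                      # intercept
        smooth.append(a * xs[e] + b)
    return smooth

xs = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
ys = [0.1, 0.9, 2.2, 2.8, 4.1, 5.0]
print([round(s, 2) for s in running_line(xs, ys, 1)])
```

Small $w$ follows every point of the learning set (flexible), while large $w$ approaches a single global straight line (rigid).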
The bias-variance trade-off: Example I
[Figure: (a) the models obtained with $w = 1$ and (b) with $w = 30$, shown against the true function; below, the corresponding average models for $w = 1$ and $w = 30$]
• Too flexible models (small $w$): small bias, high variance → overfitting
• Too rigid models (large $w$): high bias, small variance → oversmoothing
The bias-variance trade-off: Example I
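[Figure: squared bias, variance and MSE of the running-line smoother as functions of $w$ ($0 \le w \le 30$); as $w$ grows the variance decreases and the squared bias increases]
$\text{MSE} = \text{bias}^2 + \text{variance}$
• The minimum MSE (estimated on a validation set) gives a good compromise between bias and variance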
The bias-variance trade-off: Example II
• Straight line: large MSE (mostly bias)
• 2nd-degree polynomial: smallest MSE
• 100 learning sets, each with 500 examples
Curse of dimensionality (I)
[Figure: N = 1000 points drawn uniformly for p = 1, p = 2 and p = 3]
• The same number of points is generated uniformly on $[0, 1]^p$
• This curse replaces the geometrical intuition gained from low-dimensional spaces with surprising and unexpected properties of high-dimensional ones: as p grows, the same points become increasingly sparse
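The growing sparsity can be seen by measuring how far each point is from its nearest neighbour. This sketch keeps the number of points fixed (300 here rather than 1000, only to keep the brute-force search fast) and increases p; the function name is hypothetical.

```python
import math
import random

def mean_nn_distance(n, p, seed=0):
    """Average distance from each of n uniform points in [0,1]^p to its nearest neighbour."""
    rng = random.Random(seed)
    pts = [[rng.random() for _ in range(p)] for _ in range(n)]
    total = 0.0
    for i, a in enumerate(pts):
        # Brute-force search over all other points (O(n^2), fine for small n).
        total += min(math.dist(a, b) for j, b in enumerate(pts) if j != i)
    return total / n

for p in (1, 2, 3, 5, 10):
    print(f"p={p:2d}  mean NN distance = {mean_nn_distance(300, p):.3f}")
```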
Curse of dimensionality (II)
• Maintaining the sampling density:
[Figure: sampling density as a function of the number of points N (from 1 to $10^{10}$, log scale) for p = 1, ..., 5]
• High-dimensional spaces are always very sparse
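Assuming the common convention that the sampling density of N points in p dimensions is proportional to $N^{1/p}$, keeping the density of $N_1$ one-dimensional points requires $N_1^p$ points in p dimensions. A tiny sketch of that arithmetic (function names hypothetical):

```python
def sampling_density(n, p):
    """Sampling density of n points in p dimensions, taken as n**(1/p) (assumed convention)."""
    return n ** (1.0 / p)

def points_needed(n1, p):
    """Points needed in p dimensions to match the density of n1 points in 1-D."""
    return n1 ** p

for p in range(1, 6):
    # 100 points on a line correspond to 100**p points in p dimensions.
    print(p, points_needed(100, p))
```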
Supervised models
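• Two main parts:
– Internal structure (set of possible functions)
– Parameters (select one function from that set)
• Example:
$y = f(x) = \beta_0 + \beta_1 x + \beta_2 x^2 + \dots + \beta_M x^M$
– Parameters: $\{\beta_0, \dots, \beta_M\}$
– Structure: the degree $M$ (e.g. $M = 2$, which gives $M + 1 = 3$ parameters)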
Learning strategies
• To be successful, several decisions need to be made correctly:
– Select the type of model (human decision)
– Select the internal structure of the model (human decision or automatic)
– Adjust the parameters of the model (automatic)
– Try with another structure? If yes, select a new internal structure
– Try with another type? If yes, select a new type of model
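The loop over structures can be sketched for the polynomial family of the previous slide: the structure is the degree M (tried from 0 to 5), the parameters are adjusted automatically by least squares on one set, and the structures are compared on a separate validation set. The data generator, the degree range and all function names are assumptions for illustration.

```python
import random

def fit_polynomial(xs, ys, m):
    """Least squares parameters beta_0..beta_m for the structure 'degree M = m'."""
    k = m + 1
    # Normal equations A beta = c, with A = X^T X and c = X^T y.
    a = [[sum(x ** (i + j) for x in xs) for j in range(k)] for i in range(k)]
    c = [sum(y * x ** i for x, y in zip(xs, ys)) for i in range(k)]
    # Gaussian elimination with partial pivoting.
    for col in range(k):
        piv = max(range(col, k), key=lambda r: abs(a[r][col]))
        a[col], a[piv] = a[piv], a[col]
        c[col], c[piv] = c[piv], c[col]
        for r in range(col + 1, k):
            factor = a[r][col] / a[col][col]
            for j in range(col, k):
                a[r][j] -= factor * a[col][j]
            c[r] -= factor * c[col]
    # Back substitution.
    beta = [0.0] * k
    for i in reversed(range(k)):
        beta[i] = (c[i] - sum(a[i][j] * beta[j] for j in range(i + 1, k))) / a[i][i]
    return beta

def predict(beta, x):
    return sum(b * x ** j for j, b in enumerate(beta))

def mse(beta, xs, ys):
    return sum((predict(beta, x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

def select_structure(train, valid, max_m=5):
    """Try structures M = 0..max_m; fit on train, compare on valid."""
    best = None
    for m in range(max_m + 1):              # "try with another structure?"
        beta = fit_polynomial(*train, m)    # automatic parameter adjustment
        err = mse(beta, *valid)
        if best is None or err < best[1]:
            best = (m, err, beta)
    return best

rng = random.Random(0)

def make_set(n):
    """Hypothetical noisy parabola on [-4, 4]."""
    xs = [rng.uniform(-4.0, 4.0) for _ in range(n)]
    return xs, [0.1 * x * x - 0.5 * x + rng.gauss(0.0, 0.2) for x in xs]

m, err, beta = select_structure(make_set(200), make_set(200))
print("selected M =", m, "validation MSE =", round(err, 4))
```

Pure Python keeps the sketch self-contained; the normal equations become ill-conditioned for high degrees, so a QR-based solver would be preferable beyond small M.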
Dealing with high dimensions
$y = f(\mathbf{x})$
• Standard approach:
– Divide/combine and conquer via additive models
• Weighted sum of basis functions $B_j(\mathbf{x})$:
$f(\mathbf{x}) = \sum_{j=0}^{M} \beta_j B_j(\mathbf{x})$
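A minimal sketch of evaluating such an additive model for given coefficients and basis functions. The three basis functions below (a constant, a sigmoid of $x_1 + x_2$ and a Gaussian bump) are hypothetical examples, not taken from the slides.

```python
import math
from typing import Callable, Sequence

Vector = Sequence[float]
Basis = Callable[[Vector], float]

def additive_model(betas: Sequence[float], bases: Sequence[Basis]) -> Callable[[Vector], float]:
    """Return f(x) = sum_j beta_j * B_j(x)."""
    if len(betas) != len(bases):
        raise ValueError("one coefficient per basis function is required")

    def f(x: Vector) -> float:
        return sum(b * basis(x) for b, basis in zip(betas, bases))

    return f

bases = [
    lambda x: 1.0,                                            # B_0: constant
    lambda x: 0.5 * (1.0 + math.tanh(0.5 * (x[0] + x[1]))),   # B_1: logistic sigmoid
    lambda x: math.exp(-0.5 * (x[0] ** 2 + x[1] ** 2)),       # B_2: Gaussian bump
]
f = additive_model([0.5, 2.0, -1.0], bases)
print(f([0.0, 0.0]))  # 0.5 + 2.0 * 0.5 - 1.0 * 1.0 = 0.5
```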
Multidimensional models: strategies
• Approaches:
– Partitioning the input space (e.g. classification and regression trees)
– Projecting (e.g. MLPs)
– Using norms (e.g. RBFNs)
$y = f(\mathbf{x}) = \sum_{j=0}^{M} \beta_j B_j(\mathbf{x})$
Partitioning the input space (I)
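• Membership function:
$\mu_R(\mathbf{x}) = \begin{cases} 1 & \text{if } \mathbf{x} \in R \\ 0 & \text{otherwise} \end{cases}$
• Using axis-oriented membership functions:
$\mu_R(\mathbf{x}) = \prod_{u=1}^{p} \mu_{[a_u, b_u]}(x_u)$
[Figure: a rectangular region R in the $(x_1, x_2)$ plane defined by the intervals $[a_1, b_1]$ and $[a_2, b_2]$, and its membership function $\mu_R(\mathbf{x})$]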
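A sketch of the axis-oriented membership function as a product of one-dimensional interval indicators, assuming closed intervals $[a_u, b_u]$; the function names are hypothetical.

```python
def interval_membership(a, b):
    """Crisp 1-D membership mu_[a,b](x): 1 on the closed interval [a, b], else 0."""
    return lambda x: 1.0 if a <= x <= b else 0.0

def box_membership(bounds):
    """mu_R(x) = prod_u mu_[a_u,b_u](x_u) for the axis-oriented box R."""
    mus = [interval_membership(a, b) for a, b in bounds]

    def mu_r(x):
        prod = 1.0
        for mu, xu in zip(mus, x):
            prod *= mu(xu)
        return prod

    return mu_r

mu_R = box_membership([(0.0, 2.0), (1.0, 3.0)])   # R = [0, 2] x [1, 3]
print(mu_R([1.0, 2.0]), mu_R([2.5, 2.0]))         # 1.0 0.0
```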
Partitioning the input space (II)
• Crisp vs fuzzy partitioning:
[Figure: (a) crisp partitioning, with non-overlapping regions and membership values 0 or 1; (b) fuzzy partitioning, with overlapping membership functions $\mu_R(x)$ labelled very_low, low, medium, high, very_high]
Partitioning the input space (III)
• Example: regression tree
[Figure: (a) tree model, true function and scatter plot of the learning set for $x \in [-3.5, 4]$; (b) the model as a weighted sum of basis functions, e.g. 0.255·B2, 0.644·B4, 0.968·B6, 0.734·B7]
Learning set: parabola_noise.tr (N = 81). Algorithm: regression, Var_min = 0.012. Number of nodes = 19 (number of basis functions = 10).
Tree (number of examples in brackets; Y = condition true, N = condition false; leaf values are the model outputs):
x1 < -2.45? [81]
  Y: x1 < -3.25? [16]
    Y: x1 < -3.55? [8]
      Y: 0.030 [5] (B1)
      N: 0.255 [3] (B2)
    N: 0.467 [8] (B3)
  N: x1 < 2.65? [65]
    Y: x1 < -1.95? [51]
      Y: 0.644 [5] (B4)
      N: x1 < 1.75? [46]
        Y: x1 < -1.15? [37]
          Y: 0.848 [8] (B5)
          N: 0.968 [29] (B6)
        N: 0.734 [9] (B7)
    N: x1 < 3.45? [14]
      Y: 0.403 [8] (B8)
      N: x1 < 3.95? [6]
        Y: 0.165 [5] (B9)
        N: -0.199 [1] (B10)
$y = f(\mathbf{x}) = \sum_{j=0}^{M} \beta_j B_j(\mathbf{x})$
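The tree can be rewritten as the weighted sum of basis functions in the formula above: each leaf defines an interval of $x_1$, its basis function is the indicator of that interval, and its coefficient is the leaf value. The split thresholds and leaf values below are read off the tree output on this slide; numbering the leaves B1 to B10 from left to right is an assumption that matches the labels B2, B4, B6 and B7 shown in the figure.

```python
# Split thresholds on x1 and leaf values read off the tree on this slide.
SPLITS = [-3.55, -3.25, -2.45, -1.95, -1.15, 1.75, 2.65, 3.45, 3.95]
BETAS = [0.030, 0.255, 0.467, 0.644, 0.848, 0.968, 0.734, 0.403, 0.165, -0.199]

def basis(j, x):
    """Indicator B_j(x) of the j-th leaf interval, j = 1..10.

    A test "x1 < s?" sends x1 < s to the Y branch, so each interval is
    closed on the left and open on the right: [s_(j-1), s_j).
    """
    lo = SPLITS[j - 2] if j >= 2 else float("-inf")
    hi = SPLITS[j - 1] if j <= len(SPLITS) else float("inf")
    return 1.0 if lo <= x < hi else 0.0

def tree_model(x):
    """y = sum_j beta_j B_j(x); exactly one B_j is 1 for any x (crisp partition)."""
    return sum(beta * basis(j, x) for j, beta in enumerate(BETAS, start=1))

for x in (-4.0, -3.0, 0.0, 3.0, 4.0):
    print(x, tree_model(x))
```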
Projecting (I)
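• Inner product:
$\mathbf{u}^T\mathbf{v} = \mathbf{v}^T\mathbf{u} = \sum_{k=1}^{p} u_k v_k, \qquad \cos\theta = \frac{\mathbf{u}^T\mathbf{v}}{\|\mathbf{u}\|\,\|\mathbf{v}\|}$
• To obtain the standard projection of $\mathbf{v}$ along $\mathbf{u}$, the inner product must be normalized by dividing it by the norm of $\mathbf{u}$: $\mathbf{u}^T\mathbf{v} / \|\mathbf{u}\|$
[Figure: vectors u and v at angle θ, with the projection $\mathbf{u}^T\mathbf{v}/\|\mathbf{u}\|$ of v along u and the projection $\mathbf{u}^T\mathbf{v}/\|\mathbf{v}\|$ of u along v]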
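The inner product, the angle and the standard projection translate directly into code; a minimal sketch with hypothetical function names:

```python
import math

def inner(u, v):
    """u^T v = sum_k u_k v_k."""
    return sum(uk * vk for uk, vk in zip(u, v))

def norm(u):
    return math.sqrt(inner(u, u))

def cos_angle(u, v):
    return inner(u, v) / (norm(u) * norm(v))

def projection_length(v, u):
    """Signed length of the standard projection of v along u: u^T v / ||u||."""
    return inner(u, v) / norm(u)

u, v = [3.0, 0.0], [2.0, 2.0]
print(inner(u, v), projection_length(v, u), round(math.degrees(math.acos(cos_angle(u, v))), 6))
# 6.0 2.0 45.0
```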
Projecting (II)
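• Example: straight hyperplane
$y = f(\mathbf{x}) = \boldsymbol\alpha^T\mathbf{x} + \alpha_0, \qquad \boldsymbol\alpha = (\alpha_1, \dots, \alpha_p)$
– $\|\boldsymbol\alpha\|$: controls the slope
– $\boldsymbol\alpha / \|\boldsymbol\alpha\|$: controls the orientation
– $\alpha_0$: mean value
[Figure: (b) neural network representation for p = 2: inputs -1, $x_1$, $x_2$ with weights $\alpha_0$, $\alpha_1$, $\alpha_2$]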
Sigmoid-like basis functions (I)
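• S-shaped functions:
$B_j(\mathbf{x}) = f(-\alpha_0 + \boldsymbol\alpha^T\mathbf{x}) = f(\eta) = \frac{1}{1 + \exp(-\eta)}, \qquad \eta = -\alpha_0 + \boldsymbol\alpha^T\mathbf{x}$
with the inner product $\boldsymbol\alpha^T\mathbf{x} = \sum_{k=1}^{p} \alpha_k x_k$
[Figure: (a) contour lines of the sigmoid in the $(x_1, x_2)$ plane, with the orientation set by $(\alpha_1, \alpha_2)$ and the position set by $\alpha_0$; (b) neural network representation: inputs -1, $x_1$, $x_2$ with weights $\alpha_0$, $\alpha_1$, $\alpha_2$]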
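A sketch of one sigmoid-like basis function, computing $f(\eta) = 1/(1 + \exp(-\eta))$ in the equivalent form $\tfrac{1}{2}(1 + \tanh(\eta/2))$ to avoid overflow. It also checks the geometry of the following slides: the centre line $\eta = 0$ lies at distance $\alpha_0/\|\boldsymbol\alpha\|$ from the origin, and scaling $\alpha_0$ and $\boldsymbol\alpha$ together keeps that centre while making the S-shape steeper. The parameter values and function name are arbitrary examples.

```python
import math

def sigmoid_basis(alpha0, alpha):
    """B(x) = f(-alpha0 + alpha^T x) with the logistic f(eta) = 1 / (1 + exp(-eta))."""
    def B(x):
        eta = -alpha0 + sum(a * xi for a, xi in zip(alpha, x))
        # 0.5 * (1 + tanh(eta / 2)) equals 1 / (1 + exp(-eta)) but never overflows.
        return 0.5 * (1.0 + math.tanh(0.5 * eta))
    return B

alpha0, alpha = 2.0, [3.0, 4.0]
norm_alpha = math.hypot(*alpha)                 # ||alpha|| = 5.0
B = sigmoid_basis(alpha0, alpha)

# The centre line alpha^T x = alpha0 (eta = 0); its point closest to the
# origin is alpha0 * alpha / ||alpha||^2.
closest = [alpha0 * a / norm_alpha ** 2 for a in alpha]
print(round(B(closest), 6))                     # 0.5
print(alpha0 / norm_alpha)                      # 0.4: distance of the centre line from the origin

# Same centre (alpha0 / ||alpha|| unchanged), ten times larger ||alpha||: a steeper S.
steep = sigmoid_basis(10 * alpha0, [10 * a for a in alpha])
print(B([1.0, 0.0]), steep([1.0, 0.0]))
```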
Sigmoid: varying the orientation
• $(\alpha_1, \alpha_2)$: controls the orientation
$B_j(\mathbf{x}) = f(\eta) = \frac{1}{1 + \exp(-\eta)}, \qquad \eta = -\alpha_0 + \boldsymbol\alpha^T\mathbf{x}$
Sigmoid: varying the slope
[Figure: sigmoid for increasing $\|\boldsymbol\alpha\|$]
• $\|\boldsymbol\alpha\|$: controls the slope
Sigmoid: varying the position
[Figure: sigmoid for increasing $\alpha_0$ (with $\boldsymbol\alpha$ fixed)]
• $\alpha_0 / \|\boldsymbol\alpha\|$: center of the sigmoid
Norms
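• The (squared) Euclidean distance:
$\|\mathbf{u} - \mathbf{v}\|^2 = (\mathbf{u} - \mathbf{v})^T(\mathbf{u} - \mathbf{v}) = \sum_{k=1}^{p} (u_k - v_k)^2$
• Weighted norms:
$\|\mathbf{u} - \mathbf{v}\|_W^2 = (\mathbf{u} - \mathbf{v})^T W (\mathbf{u} - \mathbf{v})$
• Elliptical norms:
$W = \mathrm{diag}\left(\frac{1}{\sigma_1^2}, \dots, \frac{1}{\sigma_p^2}\right)$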
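A sketch of the weighted squared norm, with the elliptical case built from per-coordinate spreads $\sigma_k$; matrices are plain nested lists and the function names are hypothetical.

```python
def sq_weighted_norm(u, v, w):
    """(u - v)^T W (u - v) for a p x p weight matrix W given as nested lists."""
    d = [ui - vi for ui, vi in zip(u, v)]
    p = len(d)
    return sum(d[i] * w[i][j] * d[j] for i in range(p) for j in range(p))

def elliptical_weights(sigmas):
    """W = diag(1/sigma_1^2, ..., 1/sigma_p^2)."""
    p = len(sigmas)
    return [[1.0 / sigmas[i] ** 2 if i == j else 0.0 for j in range(p)] for i in range(p)]

u, v = [1.0, 2.0], [0.0, 0.0]
identity = elliptical_weights([1.0, 1.0])
print(sq_weighted_norm(u, v, identity))                        # 5.0 (squared Euclidean)
print(sq_weighted_norm(u, v, elliptical_weights([1.0, 2.0])))  # 1 + 4/4 = 2.0
```

With $W = I$ the weighted norm reduces to the squared Euclidean distance; larger $\sigma_k$ make differences along coordinate k count less.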
Gaussian-like Basis Functions (III)
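$\|\mathbf{u} - \mathbf{v}\|_W^2 = (\mathbf{u} - \mathbf{v})^T W (\mathbf{u} - \mathbf{v})$
• One-dimensional:
$f(x) = \exp\left(-\frac{(x - \varsigma)^2}{2\sigma^2}\right)$
– $\varsigma$: center (position)
– $\sigma$: spread
[Figure: (a) one-dimensional Gaussians around the center for sigma = 0.1, 0.2 and 0.5]
• Two-dimensional (elliptical norm):
$f(\mathbf{x}) = \exp\left(-\frac{1}{2}\begin{bmatrix} x_1 - \varsigma_1 \\ x_2 - \varsigma_2 \end{bmatrix}^T \begin{bmatrix} 1/\sigma_1^2 & 0 \\ 0 & 1/\sigma_2^2 \end{bmatrix} \begin{bmatrix} x_1 - \varsigma_1 \\ x_2 - \varsigma_2 \end{bmatrix}\right)$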
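Both Gaussian-like basis functions in a few lines. The two-dimensional one uses the elliptical norm with $W = \mathrm{diag}(1/\sigma_1^2, 1/\sigma_2^2)$, so it factorizes into a product of one-dimensional Gaussians. Function names are hypothetical.

```python
import math

def gaussian_1d(x, center, sigma):
    """f(x) = exp(-(x - center)^2 / (2 sigma^2))."""
    return math.exp(-((x - center) ** 2) / (2.0 * sigma ** 2))

def gaussian_2d(x, center, sigmas):
    """f(x) = exp(-1/2 (x - c)^T W (x - c)) with W = diag(1/sigma_1^2, 1/sigma_2^2)."""
    q = sum(((xi - ci) / si) ** 2 for xi, ci, si in zip(x, center, sigmas))
    return math.exp(-0.5 * q)

print(gaussian_1d(0.0, 0.0, 0.1))                                  # 1.0 at the center
print(round(gaussian_1d(0.2, 0.0, 0.1), 4))                        # exp(-2) ~ 0.1353
print(round(gaussian_2d([1.0, 2.0], [0.0, 0.0], [1.0, 2.0]), 4))   # exp(-1) ~ 0.3679
```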
Gaussian: varying the position
Gaussian: varying the orientation
Gaussian: varying the spread (both dir.)
Gaussian: varying the spread (one dir.)
[Figure: $\sigma_1$ increasing]
Norms and projections (I)
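[Figure: contour lines of (a) the Euclidean distance $\|\mathbf{x} - \mathbf{z}\|$, which are circles centred on z, and (b) the inner product $\mathbf{z}^T\mathbf{x}$, which are straight lines perpendicular to z, with sample points A, B and C in the $(x_1, x_2)$ plane; inset: vectors u and v at angle θ, the projections $\mathbf{u}^T\mathbf{v}/\|\mathbf{u}\|$ and $\mathbf{u}^T\mathbf{v}/\|\mathbf{v}\|$, and the distance $\|\mathbf{u} - \mathbf{v}\|$]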
Norms and projections (II)
• They are mathematical rules that provide:
– A particular grouping of the data in the input space (clusters)
– A ranking of these clusters
• All points within a particular cluster have the same index. These indexes provide an order relationship between clusters.
[Figure: multidimensional input space → clustering → index assignment → one-dimensional space]
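A sketch of the two index assignments, assuming a reference vector $\mathbf{z}$: the Euclidean distance groups points on circles around $\mathbf{z}$, while the inner product groups points on lines perpendicular to $\mathbf{z}$. The points A, B and C here are arbitrary examples, not the ones in the figure, and the function names are hypothetical.

```python
import math

def distance_index(z):
    """Index x -> ||x - z||: clusters are spheres (circles in 2-D) around z."""
    return lambda x: math.dist(x, z)

def projection_index(z):
    """Index x -> z^T x: clusters are hyperplanes (lines in 2-D) orthogonal to z."""
    return lambda x: sum(zk * xk for zk, xk in zip(z, x))

z = [1.0, 0.0]
A, B, C = [2.0, 0.0], [1.0, 1.0], [1.0, 5.0]
by_dist = distance_index(z)
by_proj = projection_index(z)
print([by_dist(pt) for pt in (A, B, C)])  # [1.0, 1.0, 5.0]: A and B share a cluster
print([by_proj(pt) for pt in (A, B, C)])  # [2.0, 1.0, 1.0]: B and C share a cluster
```

Both rules map the multidimensional input space to a single number, and the order of those numbers ranks the clusters.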
Taxonomy of models: criteria
• Number of input variables:
– One-dimensional models (e.g. straight line)
– Multidimensional models (e.g. hyperplane)
• Complexity:
– Basic models (e.g. Gaussian)
– Sophisticated models (e.g. RBFN)
• Internal structure of the model:
– Structured models (e.g. linear regression)
– Unstructured models (e.g. K-NN)
– Hybrid models
Taxonomy of one-dimensional models
One-dimensional models
• Structured
– Basic: constant, straight line, sigmoid-like, Gaussian-like, ...
– Sophisticated: polynomials, splines, wavelets, hinges, ...
• Unstructured
– Basic: running-mean, running-line, ...
– Sophisticated: supersmoother, ...
Taxonomy of multidimensional models
Multidimensional models
• Structured
– Basic: constant, straight hyperplane, sigmoid-like, Gaussian-like, hinges, ...
– Sophisticated: RBFN, MLP, CART, fuzzy trees, MARS, ORTHO & OBLIQUE, ...
• Hybrid: SMART
• Unstructured
– Basic: standard K-NN, ...
– Sophisticated: Machete, DT GANN, ...
Example: Sophisticated structured models
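• Radial Basis Function neural networks (RBFNs): Gaussian-like basis functions (Euclidean distance)
• MultiLayer Perceptrons (MLPs): sigmoid-like basis functions (inner product)
[Figure: network with an input layer ($x_1, \dots, x_k, \dots, x_p$), a hidden layer of basis functions $B_0, B_1, \dots, B_j, \dots, B_M$ and an output layer that combines them with weights $\beta_0, \beta_1, \dots, \beta_j, \dots, \beta_M$]
$y = f(\mathbf{x}) = \sum_{j=0}^{M} \beta_j B_j(\mathbf{x})$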
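A forward-pass sketch of both networks with one hidden layer, assuming $B_0 \equiv 1$ (so $\beta_0$ acts as a constant term), isotropic Gaussian units for the RBFN and logistic units on $-\alpha_0 + \boldsymbol\alpha^T\mathbf{x}$ for the MLP. All weights and function names are arbitrary examples; no training is shown.

```python
import math

def rbf_unit(center, sigma):
    """Gaussian-like basis function based on the Euclidean distance to a center."""
    return lambda x: math.exp(-math.dist(x, center) ** 2 / (2.0 * sigma ** 2))

def mlp_unit(alpha0, alpha):
    """Sigmoid-like basis function based on the inner product alpha^T x."""
    def B(x):
        eta = -alpha0 + sum(a * xi for a, xi in zip(alpha, x))
        return 0.5 * (1.0 + math.tanh(0.5 * eta))   # = 1 / (1 + exp(-eta))
    return B

def network(beta, hidden):
    """y = beta_0 * 1 + sum_{j>=1} beta_j B_j(x), assuming B_0 is the constant 1."""
    def f(x):
        return beta[0] + sum(b * unit(x) for b, unit in zip(beta[1:], hidden))
    return f

rbfn = network([0.1, 1.0, -0.5], [rbf_unit([0.0, 0.0], 1.0), rbf_unit([2.0, 2.0], 0.5)])
mlp = network([0.1, 1.0, -0.5], [mlp_unit(0.0, [1.0, 1.0]), mlp_unit(1.0, [1.0, -1.0])])
print(round(rbfn([0.0, 0.0]), 4), round(mlp([0.0, 0.0]), 4))
```

Only the hidden units differ: the output layer is the same weighted sum of basis functions in both cases.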
Summary
• The learning problem is difficult to solve in general
• Be careful when estimating the generalization capability of a model
• Multidimensional models (like buildings) are made of small, simple pieces
• Norms and projections are powerful mathematical tools for building models
• The proposed taxonomy provides a rational approach to automatic learning models
Expected future research
[Figure: interpretability (low to high) versus precision (low to high)]
• Artificial neural networks: very accurate models, but still black boxes (high precision, low interpretability)
• Decision trees: very comprehensible models, but not very accurate (high interpretability, low precision)
• Goal for future models: very comprehensible and accurate models?
SUPERVISED AUTOMATIC LEARNING MODELS:
A NEW PERSPECTIVE
Eugenio Fco. Sánchez Úbeda