Interpolation RBF Regularized RBF Generalized RBF XOR problem
Optimization Methods for Machine Learning
Radial Basis function
Laura Palagi
http://www.dis.uniroma1.it/∼palagi
Dipartimento di Ingegneria informatica automatica e gestionale A. Ruberti
Sapienza Università di Roma
Via Ariosto 25
RBF Networks L. Palagi
Interpolation problem
Given P distinct points in R^n:
X = {x^i ∈ R^n, i = 1, . . . , P},
and a corresponding set of real numbers
Y = {y^i ∈ R, i = 1, . . . , P}.
The interpolation problem consists in finding a function f : R^n → R, in a given class of real functions F, which satisfies:
f(x^i) = y^i, i = 1, . . . , P. (1)
Interpolation properties
For n = 1 the interpolation problem can be solved explicitly using polynomials
f(t) = ∑_{i=0}^{P−1} c_i t^i
For n > 1, the 2-layer MLP with g not polynomial satisfies
∑_{j=1}^{P} v_j g(w_j^T x^i − b_j) = y^i, i = 1, . . . , P
for some w_j ∈ R^n, and v_j, b_j ∈ R.
MLP can approximate arbitrarily well a continuous function, provided that an arbitrarily large number of units is available.
Interpolation properties
Being a universal approximator may not be enough from a theoretical point of view. An important property is the
existence of a best approximation
Informally: given a function f belonging to some set of functions F and given a subset A of F, find an element of A which is closest to f. If d(f, g) is the distance between two elements f, g in F, we consider the problem
d*_A = inf_{a∈A} d(f, a)
If there exists a* ∈ A that attains the infimum, namely d*_A = d(f, a*), then a* is the best approximation to f from A.
Best approximation properties
MLP does not have the best approximation property.
Consider another approximation scheme based on Radial Basis functions (RBF)
φ(‖x − x^j‖), j = 1, . . . , P.
φ : R+ → R is a suitable continuous function, called a radial basis function since it is assumed that the argument of φ is the radius r = ‖x − x^j‖.
Gaussian
φ(r) = e^{−(r/σ)²}
with r > 0
Multiquadric
φ(r) = (r² + σ²)^{1/2}
Inverse Multiquadric
φ(r) = (r² + σ²)^{−1/2}
Other RBF
φ(r) = r, linear spline
φ(r) = r³, cubic spline
φ(r) = r² log r, thin plate spline.
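As an aside (not part of the slides), the basis functions above are straightforward to code; a minimal sketch in Python with NumPy, where `sigma` plays the role of σ:

```python
import numpy as np

sigma = 1.0  # width parameter of the basis function

def gaussian(r):
    return np.exp(-(r / sigma) ** 2)

def multiquadric(r):
    return np.sqrt(r ** 2 + sigma ** 2)

def inverse_multiquadric(r):
    return 1.0 / np.sqrt(r ** 2 + sigma ** 2)

def linear_spline(r):
    return r

def cubic_spline(r):
    return r ** 3

def thin_plate_spline(r):
    # r^2 log r, extended by continuity with value 0 at r = 0
    return np.where(r > 0, r ** 2 * np.log(np.maximum(r, 1e-300)), 0.0)
```

Each function maps the radius r = ‖x − x^j‖ ≥ 0 to a real value; only the Gaussian and the inverse multiquadric decay as r grows.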
Interpolation by RBF
Given P distinct points in R^n:
X = {x^i ∈ R^n, i = 1, . . . , P},
consider functions of the form
f(x) = ∑_{j=1}^{P} w_j φ(‖x − x^j‖), (2)
where the data points x^j ∈ X are the so-called centers and the coefficients w_j ∈ R are the weights.
Interpolation by RBF
By imposing the interpolation conditions we get:
∑_{j=1}^{P} w_j φ(‖x^i − x^j‖) = y^i, i = 1, . . . , P. (3)
This is a linear system of P equations in P unknowns. Define the vectors w = (w_1 · · · w_P)^T and y = (y^1 · · · y^P)^T, and the symmetric P × P matrix Φ with elements
Φ_{i,j} = φ(‖x^i − x^j‖), 1 ≤ i, j ≤ P.
System (3) can be written as:
Φw = y.
Matrix Φ is nonsingular, provided that P ≥ 2, that the interpolation points x^j, j = 1, . . . , P are distinct, and using
• the Gaussian (Φ positive definite)
• the multiquadric
• the inverse multiquadric (Φ positive definite)
• the linear spline
Thus, the interpolation problem Φw = y admits a unique solution. When φ is positive definite, it can be computed by minimizing the (strictly) convex quadratic function in R^P
F(w) = (1/2) w^T Φw − y^T w,
whose gradient is given by ∇F(w) = Φw − y.
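A minimal numerical sketch of this construction (assuming the Gaussian basis and NumPy; the data here is synthetic and illustrative): build Φ from the pairwise distances, solve Φw = y, and check that the solution interpolates the data and zeroes the gradient ∇F(w) = Φw − y.

```python
import numpy as np

rng = np.random.default_rng(0)
P, n, sigma = 8, 2, 1.0

X = rng.standard_normal((P, n))   # distinct interpolation points x^i
y = rng.standard_normal(P)        # target values y^i

# Phi_ij = phi(||x^i - x^j||) with the Gaussian phi(r) = exp(-(r/sigma)^2)
D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
Phi = np.exp(-(D / sigma) ** 2)

w = np.linalg.solve(Phi, y)       # unique solution: Phi is positive definite

f = Phi @ w                       # f(x^i) = sum_j w_j phi(||x^i - x^j||)
grad = Phi @ w - y                # gradient of F(w) = 1/2 w^T Phi w - y^T w
assert np.allclose(f, y, atol=1e-6) and np.allclose(grad, 0.0, atol=1e-6)
```

Minimizing F(w) and solving the linear system are equivalent here, since ∇F(w) = 0 is exactly Φw = y.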
From Interpolation to approximation properties
Because of the remarkable properties of the RBFs, the RBF method is one of the most often applied approaches in multivariable interpolation.
This has motivated the attempt of employing RBFs also within approximation algorithms for the solution of classification and regression problems in data mining.
Regularized RBF neural networks
Suppose that the set {(x^p, y^p), p = 1, . . . , P} of data has been obtained by random sampling of a function belonging to some space of functions X in the presence of noise.
The problem of recovering the function, or an estimate of it, from the set of data is clearly ill-posed, since it has an infinite number of solutions.
In order to choose one particular solution we need some a priori knowledge of the function that has to be reconstructed.
The most common form of a priori knowledge consists in assuming that the function is smooth, in the sense that two similar inputs correspond to two similar outputs.
Regularized RBF neural networks
The solution can be obtained from a variational principle which contains both the data and smoothness information.
Smoothness is a measure of the "oscillatory" behavior of f. Within a class of differentiable functions, one function is said to be smoother than another one if it oscillates less. A smoothness functional E_2(f) is defined and we consider
min_f E(f) = E_1(f) + λE_2(f) = (1/2) ∑_{i=1}^{P} [y^i − f(x^i)]² + λE_2(f),
where the first term enforces closeness to the data and the second smoothness, while the regularization parameter λ > 0 controls the tradeoff between these two terms.
Regularized RBF neural networks
It can be shown that for a wide class of smoothness functionals E_2(f), the solutions of the minimization all have the same form
f(x) = ∑_{i=1}^{P} w_i φ(‖x − c^i‖).
The centers coincide with the inputs,
c^i = x^i, i = 1, . . . , P,
and the weights solve the regularized system
(Φ + λI)w = y
where
Φ = {Φ_{ij}}_{i,j=1,...,P} = {φ(‖x^i − x^j‖)}_{i,j=1,...,P}
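A sketch of the regularized fit (illustrative data and names; Gaussian basis assumed): with centers c^i = x^i, the weights come from the linear system (Φ + λI)w = y, and for λ > 0 the fit trades exact interpolation for smoothness.

```python
import numpy as np

rng = np.random.default_rng(1)
P, sigma, lam = 30, 0.5, 1e-2

X = np.sort(rng.uniform(-1, 1, P))                        # 1-D inputs x^i
y = np.sin(2 * np.pi * X) + 0.1 * rng.standard_normal(P)  # noisy samples

D = np.abs(X[:, None] - X[None, :])       # pairwise distances |x^i - x^j|
Phi = np.exp(-(D / sigma) ** 2)           # Gaussian RBF, centers c^i = x^i

# regularized weights: (Phi + lambda I) w = y
w = np.linalg.solve(Phi + lam * np.eye(P), y)

# the fit no longer interpolates exactly; lambda controls the tradeoff
residual = Phi @ w - y
```

Cross-validating λ amounts to re-solving this system for each candidate value, as noted on the next slide.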
2-layer Regularized RBF network
[Figure: the input x feeds P hidden units φ(‖x − x^1‖), . . . , φ(‖x − x^P‖); their outputs, weighted by w_1, . . . , w_P, are summed to produce the output y(x).]
2-layer Regularized RBF network
• RBF are universal approximators: any continuous function can be approximated arbitrarily well on a compact set, provided a sufficiently large number of units and an appropriate choice of the parameters
• RBF possess the best approximation property, namely the best approximation exists and in most cases (under assumptions often satisfied) is unique (RBF is linear in the parameters w)
• The value of λ can be selected by employing cross-validation techniques, and this may require that the system (Φ + λI)w = y be solved several times.
2-layer Generalized RBF network
When P is very large, the cost of constructing a regularized RBF network can be prohibitive. Indeed, the computation of the weights w ∈ R^P requires the solution of a possibly ill-conditioned linear system, which costs O(P³).
Generalized RBF neural networks are used, where the number N of neural units is much less than P.
The output of the network can be defined by
y(x) = ∑_{j=1}^{N} w_j φ_j(‖x − c^j‖), (4)
where both the centers c^j ∈ R^n and the weights w_j, j = 1, . . . , N, must be selected appropriately.
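The generalized network with fixed centers can be sketched as follows (illustrative: the centers here are simply a random subset of the inputs, one of several common choices). Once the c^j are fixed, the weights solve a linear least-squares problem.

```python
import numpy as np

rng = np.random.default_rng(2)
P, N, n, sigma = 200, 10, 2, 1.0   # N << P hidden units

X = rng.standard_normal((P, n))            # training inputs x^p
y = np.sin(X[:, 0]) * np.cos(X[:, 1])      # synthetic targets y^p

# pick N centers c^j (here: a random subset of the inputs)
C = X[rng.choice(P, size=N, replace=False)]

# design matrix Phi_pj = phi(||x^p - c^j||), Gaussian phi
D = np.linalg.norm(X[:, None, :] - C[None, :, :], axis=2)
Phi = np.exp(-(D / sigma) ** 2)

# with centers fixed, the problem is linear in w: least-squares fit
w, *_ = np.linalg.lstsq(Phi, y, rcond=None)

y_hat = Phi @ w                    # network output y(x^p) on the data
```

Note the cost: the P × N least-squares problem replaces the P × P system of the regularized network.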
2-layer Generalized RBF network
[Figure: the same architecture as before, but with N hidden units φ(‖x − c^1‖), . . . , φ(‖x − c^N‖) centered at c^1, . . . , c^N; their outputs, weighted by w_1, . . . , w_N, are summed to produce the output y(x).]
2-layer Generalized RBF network
• GRBF are universal approximators: any continuous function can be approximated arbitrarily well on a compact set, provided a sufficiently large number of units and an appropriate choice of the parameters
• GRBF may NOT possess the best approximation property. However, if the centers are fixed, the approximation problem becomes linear with respect to w and the existence of a best approximation is guaranteed
• in the general case, both the centers and the weights are treated as variable parameters and the approximation is nonlinear
• as N ≪ P, GRBF inherently performs a structural stabilization which may prevent the occurrence of overtraining.
An example: Exclusive OR
The logical function XOR
p   x_1   x_2   y^p
1   −1    −1    −1
2   −1     1     1
3    1    −1     1
4    1     1    −1

[Figure: the four points in the (x_1, x_2) plane; points 2 and 3 (y = 1) lie on one diagonal and points 1 and 4 (y = −1) on the other, so no single line separates the two classes.]
Perceptron (linear separator) doesn’t work
Two layer MLP
[Figure: a 2-layer MLP with inputs x_1, x_2 feeding two hidden sign(·) units through weights w_11, w_12, w_21, w_22 and biases b_1, b_2; the hidden outputs a_1, a_2 feed an output sign(·) unit through weights v_1, v_2 and bias b_3, producing y(x).]
Two layer MLP
Choose w_11 = w_22 = 1 and w_12 = w_21 = −1, b_1 = b_2 = −1, v_1 = v_2 = 1, b_3 = 0.1 (output bias). We get a network that correctly classifies all four XOR points.
Two layer MLP
This MLP network with two hidden nodes realizes a nonlinear separation (each hidden node describes one of the two lines). The output node combines the outputs of the two hidden units.
[Figure: the two separating lines in the (x_1, x_2) plane, one per hidden unit; together they isolate points 2 and 3 from points 1 and 4.]
RBF network
Consider an RBF network with two units (N = 2) with centers c^1, c^2, and assume the activation function is a Gaussian g_j = e^{−(‖x − c^j‖/σ)²}.
[Figure: inputs x_1, x_2 feed the two Gaussian units z_1 = e^{−‖x−c^1‖²/σ²} and z_2 = e^{−‖x−c^2‖²/σ²}; their outputs, weighted by w_1, w_2 and shifted by the bias b, enter a sign(·) output unit producing y(x).]
RBF network
Choose σ = √2 and c^1 = (1, 1)^T, c^2 = (−1, −1)^T. We transform the problem into a linearly separable form.

p   z_1 = e^{−‖x−c^1‖²/σ²}   z_2 = e^{−‖x−c^2‖²/σ²}   y^p
1   e^{−4}   1         −1
2   e^{−2}   e^{−2}     1
3   e^{−2}   e^{−2}     1
4   1        e^{−4}    −1
[Figure: the four points in the (z_1, z_2) plane; points 2 and 3 coincide at (e^{−2}, e^{−2}), and a single line now separates them from points 1 and 4.]
The output takes the form
y(x) = w_1 e^{−‖x−c^1‖²/σ²} + w_2 e^{−‖x−c^2‖²/σ²} + b.
Minimizing the training error
min_{w,b} ∑_{p=1}^{4} (y(x^p) − y^p)²
we get the optimal solution (w*, b*) that gives E = 0.
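The whole XOR construction can be checked numerically (a sketch with NumPy; a least-squares fit stands in for minimizing the training error):

```python
import numpy as np

# XOR data: inputs x^p and targets y^p in {-1, +1}
X = np.array([[-1, -1], [-1, 1], [1, -1], [1, 1]], dtype=float)
y = np.array([-1.0, 1.0, 1.0, -1.0])

# two Gaussian units with centers c^1 = (1,1), c^2 = (-1,-1), sigma = sqrt(2)
C = np.array([[1.0, 1.0], [-1.0, -1.0]])
sigma2 = 2.0  # sigma^2
Z = np.exp(-np.sum((X[:, None, :] - C[None, :, :]) ** 2, axis=2) / sigma2)

# fit w_1, w_2, b by least squares on y(x) = w_1 z_1 + w_2 z_2 + b
A = np.hstack([Z, np.ones((4, 1))])
w1, w2, b = np.linalg.lstsq(A, y, rcond=None)[0]

# in feature space the four points are linearly separable
assert np.all(np.sign(A @ np.array([w1, w2, b])) == y)
```

Since points 2 and 3 map to the same feature vector (e^{−2}, e^{−2}), the system has an exact solution and the training error E is zero.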