Lecture 3: Theory of Kernel Functions
Pavel Laskov and Blaine Nelson
Cognitive Systems Group, Wilhelm Schickard Institute for Computer Science, Universität Tübingen, Germany
Advanced Topics in Machine Learning, 2012
April 17, 2012


Part I

Introduction: Kernel Functions


Overview

In this lecture, we will formally define kernel functions.

Recall the advantages of kernel-based learning:
1 Kernels allow for learning in high-dimensional feature spaces without explicit mapping into feature space
2 Kernels make learning in high-dimensional feature spaces computationally feasible
3 Kernel methods learn non-linear functions with the machinery of algorithms for learning linear functions
4 Kernels provide an abstraction that separates data representation & learning

Questions to be addressed:
1 What properties do kernels have & what properties does a function need to be a kernel?
2 How can we verify that a kernel function is valid?
3 How does one construct a kernel function?


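Recall: Kernel Magic
Example 2: 2-dimensional Polynomials of Degree 3

Consider a (slightly modified) feature space for 2-dimensional polynomials of degree 3:

\[
\Phi(\mathbf{x}) = \left[ x_1^3,\ x_2^3,\ \sqrt{3}\,x_1^2 x_2,\ \sqrt{3}\,x_1 x_2^2,\ \sqrt{3}\,x_1^2,\ \sqrt{3}\,x_2^2,\ \sqrt{6}\,x_1 x_2,\ \sqrt{3}\,x_1,\ \sqrt{3}\,x_2,\ 1 \right]
\]

Let us compute the inner product between two points in the feature space:

\[
\begin{aligned}
\Phi(\mathbf{x})^\top \Phi(\mathbf{y}) &= x_1^3 y_1^3 + x_2^3 y_2^3 + 3 x_1^2 x_2 y_1^2 y_2 + 3 x_1 x_2^2 y_1 y_2^2 + 3 x_1^2 y_1^2 + 3 x_2^2 y_2^2 \\
&\quad + 6 x_1 x_2 y_1 y_2 + 3 x_1 y_1 + 3 x_2 y_2 + 1 \\
&= (x_1 y_1 + x_2 y_2 + 1)^3
\end{aligned}
\]

Complexity: 3 multiplications instead of 10.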
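The identity above can be checked numerically. The following is a minimal sketch, assuming NumPy is available; the function names `phi` and `poly3_kernel` are illustrative choices and not part of the lecture.

```python
import numpy as np

def phi(x):
    """Explicit 10-dimensional feature map for degree-3 polynomials in 2-D."""
    x1, x2 = x
    s3, s6 = np.sqrt(3.0), np.sqrt(6.0)
    return np.array([
        x1**3, x2**3,
        s3 * x1**2 * x2, s3 * x1 * x2**2,
        s3 * x1**2, s3 * x2**2,
        s6 * x1 * x2,
        s3 * x1, s3 * x2,
        1.0,
    ])

def poly3_kernel(x, y):
    """Same inner product computed implicitly as (x.y + 1)^3."""
    return (np.dot(x, y) + 1.0) ** 3

rng = np.random.default_rng(0)
x, y = rng.normal(size=2), rng.normal(size=2)

explicit = phi(x) @ phi(y)      # inner product in feature space
implicit = poly3_kernel(x, y)   # kernel evaluation in input space
print("feature-space and kernel values agree:", np.isclose(explicit, implicit))
```

The kernel form never builds the 10 features, which is the point of the example.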


Kernel Questions

Which of the following functions are kernels?

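\[
\kappa_1(\mathbf{x}, \mathbf{z}) = \sum_{i=1}^{D} (x_i + z_i)
\qquad
\kappa_2(\mathbf{x}, \mathbf{z}) = \prod_{i=1}^{D} h\!\left(\frac{x_i - c}{a}\right) h\!\left(\frac{z_i - c}{a}\right)
\]

\[
\kappa_3(\mathbf{x}, \mathbf{z}) = -\frac{\langle \mathbf{x}, \mathbf{z} \rangle}{\|\mathbf{x}\|_2 \|\mathbf{z}\|_2}
\qquad
\kappa_4(\mathbf{x}, \mathbf{z}) = \sqrt{\|\mathbf{x} - \mathbf{z}\|_2^2 + 1}
\]

where $h(x) = \cos(1.75x)\exp(-x^2/2)$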


Part II

Linear Algebra Review


Vector Space

Definition 1

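A set $\mathcal{X}$ is a vector space (over the reals) if it is closed under an addition operator '+' (i.e., $\forall\, \mathbf{x}, \mathbf{z} \in \mathcal{X}$: $\mathbf{x} + \mathbf{z} \in \mathcal{X}$) & a scalar multiplication operator '·' (i.e., $\forall\, \mathbf{x} \in \mathcal{X}, a \in \mathbb{R}$: $a \cdot \mathbf{x} \in \mathcal{X}$) and these operators satisfy:

1 (Additive Associativity) $\mathbf{u} + (\mathbf{v} + \mathbf{w}) = (\mathbf{u} + \mathbf{v}) + \mathbf{w}$

2 (Additive Commutativity) $\mathbf{u} + \mathbf{v} = \mathbf{v} + \mathbf{u}$

3 (Additive Identity) $\exists\, \mathbf{0} \in \mathcal{X}$ s.t. $\forall\, \mathbf{u} \in \mathcal{X}$: $\mathbf{u} + \mathbf{0} = \mathbf{u}$

4 (Additive Inverse) $\forall\, \mathbf{u} \in \mathcal{X}$ $\exists\, -\mathbf{u} \in \mathcal{X}$ s.t. $\mathbf{u} + (-\mathbf{u}) = \mathbf{0}$

5 (Distributivity) $a \cdot (\mathbf{u} + \mathbf{v}) = a \cdot \mathbf{u} + a \cdot \mathbf{v}$ & $(a + b) \cdot \mathbf{u} = a \cdot \mathbf{u} + b \cdot \mathbf{u}$

6 (Multiplicative Associativity) $a \cdot (b \cdot \mathbf{u}) = (a \cdot b) \cdot \mathbf{u}$

7 (Multiplicative Identity) $1 \cdot \mathbf{u} = \mathbf{u}$

Example: For any $D \in \mathbb{N}$, $\mathbb{R}^D$ is a vector space.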


Vectors I

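A D-dimensional vector $\mathbf{x}$ is a list of D reals in the vector space $\mathbb{R}^D$

Vectors $\{\mathbf{x}_i\}_{i=1}^N$ are linearly dependent if there exist $c_1, c_2, \ldots, c_N$ (at least one not 0) such that

\[
\sum_{i=1}^{N} c_i \cdot \mathbf{x}_i = \mathbf{0};
\]

otherwise, they are linearly independent.

The inner product of $\mathbf{x}$ and $\mathbf{z}$ is defined as: $\mathbf{x}^\top \mathbf{z} = \sum_{i=1}^{D} x_i \cdot z_i$

Non-trivial vectors $\{\mathbf{x}_i\}_{i=1}^N$ are orthogonal if for $i \neq j$, $\mathbf{x}_i^\top \mathbf{x}_j = 0$. They are normal vectors if for all $i$, $\mathbf{x}_i^\top \mathbf{x}_i = 1$. They are orthonormal if both hold; i.e., $\forall\, i, j$: $\langle \mathbf{x}_i, \mathbf{x}_j \rangle = \delta_{i,j} = I[i = j]$

A set of orthogonal vectors is linearly independent

Euclidean norm of a vector: $\|\mathbf{x}\|_2 = \sqrt{\langle \mathbf{x}, \mathbf{x} \rangle}$

Angle between vectors: $\theta(\mathbf{u}, \mathbf{v}) = \arccos \dfrac{\langle \mathbf{u}, \mathbf{v} \rangle}{\|\mathbf{u}\| \|\mathbf{v}\|}$

Projection onto vector $\mathbf{z}$: $\mathrm{proj}_{\mathbf{z}}(\mathbf{x}) = \dfrac{\langle \mathbf{z}, \mathbf{x} \rangle}{\langle \mathbf{z}, \mathbf{z} \rangle}\, \mathbf{z}$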


Vectors II

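Vectors $\{\mathbf{x}_i\}_{i=1}^N$ span $\mathcal{X}$ if every $\mathbf{x} \in \mathcal{X}$ can be written as $\mathbf{x} = \sum_{i=1}^{N} \alpha_i \mathbf{x}_i$ for some coefficients $\alpha_i \in \mathbb{R}$

$\{\mathbf{x}_i\}_{i=1}^N$ is a basis for $\mathcal{X}$ if it spans $\mathcal{X}$ & is linearly independent

The dimension of $\mathcal{X}$ is the number of elements in any basis of $\mathcal{X}$

Every vector in $\mathcal{X}$ can be represented as the sum of its projections onto an orthonormal basis $\{\mathbf{x}_i\}_{i=1}^N$ of $\mathcal{X}$:

\[
\mathbf{z} = \sum_{i=1}^{N} \mathrm{proj}_{\mathbf{x}_i}(\mathbf{z}) = \sum_{i=1}^{N} \langle \mathbf{x}_i, \mathbf{z} \rangle \cdot \mathbf{x}_i
\]

This is a Fourier decomposition of z in that basis.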
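As an illustration of the Fourier decomposition, the sketch below builds an orthonormal basis of a small Euclidean space and reconstructs a vector from its coefficients. It assumes NumPy; obtaining the basis from a QR factorisation of a random matrix is just one convenient choice made for this example.

```python
import numpy as np

rng = np.random.default_rng(0)
D = 4

# QR factorisation of a random square matrix: the columns of Q are orthonormal.
Q, _ = np.linalg.qr(rng.normal(size=(D, D)))
basis = [Q[:, i] for i in range(D)]

z = rng.normal(size=D)

# Fourier coefficients <x_i, z> with respect to the orthonormal basis.
coeffs = np.array([b @ z for b in basis])

# Reconstruction: z = sum_i <x_i, z> x_i
z_rec = sum(c * b for c, b in zip(coeffs, basis))

print("basis is orthonormal:", np.allclose(Q.T @ Q, np.eye(D)))
print("z recovered from its coefficients:", np.allclose(z_rec, z))
```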


Matrices

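A matrix is an N × D grid of reals.

\[
A =
\begin{bmatrix}
A_{1,1} & A_{1,2} & \cdots & A_{1,D} \\
A_{2,1} & A_{2,2} & \cdots & A_{2,D} \\
\vdots & \vdots & \ddots & \vdots \\
A_{N,1} & A_{N,2} & \cdots & A_{N,D}
\end{bmatrix}
=
\begin{bmatrix}
\text{--}\; A_{1,\bullet}^\top \;\text{--} \\
\text{--}\; A_{2,\bullet}^\top \;\text{--} \\
\vdots \\
\text{--}\; A_{N,\bullet}^\top \;\text{--}
\end{bmatrix}
=
\begin{bmatrix}
| & | & & | \\
A_{\bullet,1} & A_{\bullet,2} & \cdots & A_{\bullet,D} \\
| & | & & |
\end{bmatrix}
\]

Matrix-Vector Multiplication ($\mathbf{x}$ is a D-vector & $\mathbf{z}$ is an N-vector):

\[
A\mathbf{x} =
\begin{bmatrix}
\text{--}\; A_{1,\bullet}^\top \;\text{--} \\
\text{--}\; A_{2,\bullet}^\top \;\text{--} \\
\vdots \\
\text{--}\; A_{N,\bullet}^\top \;\text{--}
\end{bmatrix}
\begin{bmatrix} | \\ \mathbf{x} \\ | \end{bmatrix}
=
\begin{bmatrix}
A_{1,\bullet}^\top \mathbf{x} \\
A_{2,\bullet}^\top \mathbf{x} \\
\vdots \\
A_{N,\bullet}^\top \mathbf{x}
\end{bmatrix}
\qquad
\mathbf{z}^\top A =
\begin{bmatrix}
\mathbf{z}^\top A_{\bullet,1} & \mathbf{z}^\top A_{\bullet,2} & \cdots & \mathbf{z}^\top A_{\bullet,D}
\end{bmatrix}
\]

Matrix-Matrix Multiplication:

\[
AB =
\begin{bmatrix}
\text{--}\; A_{1,\bullet}^\top \;\text{--} \\
\text{--}\; A_{2,\bullet}^\top \;\text{--} \\
\vdots \\
\text{--}\; A_{N,\bullet}^\top \;\text{--}
\end{bmatrix}
\begin{bmatrix}
| & | & & | \\
B_{\bullet,1} & B_{\bullet,2} & \cdots & B_{\bullet,D} \\
| & | & & |
\end{bmatrix}
=
\begin{bmatrix}
A_{1,\bullet}^\top B_{\bullet,1} & A_{1,\bullet}^\top B_{\bullet,2} & \cdots & A_{1,\bullet}^\top B_{\bullet,D} \\
A_{2,\bullet}^\top B_{\bullet,1} & A_{2,\bullet}^\top B_{\bullet,2} & \cdots & A_{2,\bullet}^\top B_{\bullet,D} \\
\vdots & \vdots & \ddots & \vdots \\
A_{N,\bullet}^\top B_{\bullet,1} & A_{N,\bullet}^\top B_{\bullet,2} & \cdots & A_{N,\bullet}^\top B_{\bullet,D}
\end{bmatrix}
\]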


Matrices

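Matrix Multiplication as summations:

\[
[A\mathbf{x}]_i = \sum_{\ell} A_{i,\ell}\, x_\ell
\qquad
[\mathbf{z}^\top A]_j = \sum_{k} z_k A_{k,j}
\]

\[
\mathbf{z}^\top A \mathbf{x} = \sum_{k,\ell} z_k A_{k,\ell}\, x_\ell
\qquad
[AB]_{i,k} = \sum_{j} A_{i,j} B_{j,k}
\]

Special forms of matrices:

\[
\underbrace{\begin{bmatrix}
\bullet & 0 & 0 & \cdots & 0 \\
\bullet & \bullet & 0 & \cdots & 0 \\
\bullet & \bullet & \bullet & \cdots & 0 \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
\bullet & \bullet & \bullet & \cdots & \bullet
\end{bmatrix}}_{\text{Lower Triangular}}
\qquad
\underbrace{\begin{bmatrix}
\bullet & 0 & 0 & \cdots & 0 \\
0 & \bullet & 0 & \cdots & 0 \\
0 & 0 & \bullet & \cdots & 0 \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
0 & 0 & 0 & \cdots & \bullet
\end{bmatrix}}_{\text{Diagonal Matrix}}
\qquad
\underbrace{\begin{bmatrix}
\bullet & \bullet & \bullet & \cdots & \bullet \\
0 & \bullet & \bullet & \cdots & \bullet \\
0 & 0 & \bullet & \cdots & \bullet \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
0 & 0 & 0 & \cdots & \bullet
\end{bmatrix}}_{\text{Upper Triangular}}
\]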


Basic Linear Algebra

Suppose matrix A is an N × D matrix

A square matrix has same # of rows & columns; i.e., N = D

The identity matrix $I_N$ is an N × N diagonal matrix of 1's

For any A, $A I_D = A$ and $I_N A = A$

The transpose of A is denoted by $A^\top$ (it is D × N)

A symmetric matrix is its own transpose: $A = A^\top$

The inverse of A is denoted by $A^{-1}$: $AA^{-1} = I$ & $A^{-1}A = I$

An orthonormal (or unitary) matrix has its transpose as its inverse: $AA^\top = A^\top A = I$; both the columns & rows of A form an orthonormal basis

The rank of matrix A is the maximum number of columns of A that are linearly independent (i.e., the dimension of its column space). A is full-rank if $\mathrm{rank}(A) = \min(N, D)$


Singularity

Definition 2

A matrix A is singular if there exists some $\mathbf{x} \neq \mathbf{0}$ such that $A\mathbf{x} = \mathbf{0}$; otherwise, A is nonsingular.

Theorem 3

The following are equivalent:

Matrix A is invertible

Matrix A is nonsingular

Matrix A is full-rank

The spectrum of A does not contain 0; i.e., $0 \notin \mathrm{eig}(A)$

If A is square, the (linear) function $f(\mathbf{x}) = A\mathbf{x}$ is one-to-one & onto, $f(\mathbf{x}) = \mathbf{b}$ has at least 1 solution, and $f(\mathbf{x}) = \mathbf{0}$ only has solution $\mathbf{x} = \mathbf{0}$


Eigenvalues & Eigenvectors

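Given an N × N matrix A, an eigenvector of A is a non-trivial vector $\mathbf{v}$ that satisfies

\[
A\mathbf{v} = \lambda \mathbf{v};
\]

the corresponding value $\lambda$ is an eigenvalue

The Rayleigh quotient is defined by

\[
\lambda = \frac{\mathbf{v}^\top A \mathbf{v}}{\mathbf{v}^\top \mathbf{v}}
\]

In fact, for symmetric A, the maximum eigen-value/vector pair of A is a solution to

\[
\max_{\|\mathbf{x}\| = 1} \frac{\mathbf{x}^\top A \mathbf{x}}{\mathbf{x}^\top \mathbf{x}}
\]

with $\mathbf{x}$ restricted to have norm 1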


Eigen-Decomposition & Deflation

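Deflation: for any eigen-value/vector pair $(\lambda, \mathbf{v})$ of A with $\|\mathbf{v}\| = 1$, the transform

\[
A \leftarrow A - \lambda \mathbf{v}\mathbf{v}^\top
\]

deflates the matrix; i.e., $\mathbf{v}$ is still an eigenvector of the deflated matrix but now has eigenvalue 0

A symmetric N × N matrix has N orthonormal eigenvectors $\{\mathbf{v}_i\}$ corresponding to N eigenvalues (its spectrum, $\mathrm{eig}(A)$), ordered as

\[
\lambda_1(A) \geq \lambda_2(A) \geq \ldots \geq \lambda_N(A)
\]

The eigenvectors & eigenvalues form an orthonormal matrix $V$ & a diagonal matrix $\Lambda$

\[
V =
\begin{bmatrix}
| & | & & | \\
\mathbf{v}_1 & \mathbf{v}_2 & \cdots & \mathbf{v}_N \\
| & | & & |
\end{bmatrix}
\qquad
\Lambda =
\begin{bmatrix}
\lambda_1(A) & 0 & \cdots & 0 \\
0 & \lambda_2(A) & \cdots & 0 \\
\vdots & \vdots & \ddots & \vdots \\
0 & 0 & \cdots & \lambda_N(A)
\end{bmatrix}
\]

which form the eigen-decomposition of A: $A = V \Lambda V^\top$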
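The eigen-decomposition and deflation steps can be tried out numerically. This is a small sketch under the assumption that NumPy is available; `np.linalg.eigh` is used because the test matrix is symmetric, and it returns eigenvalues in ascending order (the reverse of the ordering on the slide).

```python
import numpy as np

rng = np.random.default_rng(0)
M = rng.normal(size=(4, 4))
A = (M + M.T) / 2.0  # symmetric test matrix

# Eigenvalues (ascending) and orthonormal eigenvectors (columns of V).
lam, V = np.linalg.eigh(A)
A_rebuilt = V @ np.diag(lam) @ V.T  # A = V Lambda V^T

# Largest eigenpair and its Rayleigh quotient.
lam_max, v = lam[-1], V[:, -1]
rayleigh = (v @ A @ v) / (v @ v)

# Deflation: subtract lam_max * v v^T; v now maps to the zero vector.
A_defl = A - lam_max * np.outer(v, v)
residual = A_defl @ v

print("A == V Lambda V^T:", np.allclose(A_rebuilt, A))
print("Rayleigh quotient equals largest eigenvalue:", np.isclose(rayleigh, lam_max))
print("deflated matrix sends v to 0:", np.allclose(residual, 0.0))
```

For symmetric matrices the remaining eigenpairs are untouched by the deflation, because the other eigenvectors are orthogonal to v.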


The Spectral Theorem

Theorem 4

If A is a symmetric N × N real-valued matrix, it can be written as

\[
A = \sum_{i=1}^{N} \lambda_i(A)\, \mathbf{v}_i \mathbf{v}_i^\top
\]

where $(\lambda_i, \mathbf{v}_i)$ are eigen-value/vector pairs of A. This is called the spectral decomposition of A

For a matrix with rank K < N, the spectral decomposition of A only has K non-zero summands


Matrix Functions

Properties of diagonal matrix D with entries $D_{i,i}$:

For k = 0, 1, 2, . . .: $D^k$ is diagonal with entries $[D^k]_{i,i} = (D_{i,i})^k$

If $\nexists\, i$ s.t. $D_{i,i} = 0$ then $D^{-1}$ exists, is diagonal, & $[D^{-1}]_{i,i} = (D_{i,i})^{-1}$

$\sqrt{D}$ is diagonal with entries $[\sqrt{D}]_{i,i} = \sqrt{D_{i,i}}$

Functions of A are defined by its eigen-decomposition $A = V \Lambda V^\top$ & the fact that $V^\top V = I$

For k = 0, 1, 2, . . .: $A^k = V \Lambda^k V^\top$

If A is non-singular, then $A^{-1} = V \Lambda^{-1} V^\top$

$\sqrt{A} = V \sqrt{\Lambda} V^\top$ (Note: this satisfies $\sqrt{A}\sqrt{A} = A$)

$\exp(A) = V \exp(\Lambda) V^\top$

$\log(A) = V \log(\Lambda) V^\top$
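A short sketch of these matrix functions follows, assuming NumPy. The helper `matrix_function` is a hypothetical name introduced here, and the test matrix is made positive definite so that the square root, logarithm, and inverse are all real-valued.

```python
import numpy as np

rng = np.random.default_rng(0)
B = rng.normal(size=(5, 3))
A = B.T @ B + 0.1 * np.eye(3)  # symmetric positive definite test matrix

lam, V = np.linalg.eigh(A)     # A = V diag(lam) V^T with V^T V = I

def matrix_function(f):
    """Apply a scalar function f to A through its eigenvalues."""
    return V @ np.diag(f(lam)) @ V.T

sqrt_A = matrix_function(np.sqrt)
inv_A = matrix_function(lambda t: 1.0 / t)
cube_A = matrix_function(lambda t: t ** 3)
log_A = matrix_function(np.log)

print("sqrt(A) sqrt(A) == A:", np.allclose(sqrt_A @ sqrt_A, A))
print("A^-1 A == I:", np.allclose(inv_A @ A, np.eye(3)))
print("A^3 == A A A:", np.allclose(cube_A, A @ A @ A))
```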


Part III

Positive (Semi-)Definiteness


Positive (Semi-)Definite Matrices

Definition 5 (Positive Semi-Definite Matrix)

An N × N matrix A is positive semi-definite (PSD) if all its eigenvalues are non-negative ($\forall\, i$: $\lambda_i(A) \geq 0$); i.e., for all $\mathbf{x} \in \mathbb{R}^N$:

\[
\mathbf{x}^\top A \mathbf{x} \geq 0
\]

from the Rayleigh quotient. We use $A \succeq 0$ to denote that A is PSD

Definition 6 (Positive Definite Matrix)

Matrix A is positive definite if all its eigenvalues are positive ($\forall\, i$: $\lambda_i(A) > 0$); i.e., A is PSD &

\[
\mathbf{x}^\top A \mathbf{x} = 0 \Leftrightarrow \mathbf{x} = \mathbf{0}
\]

We denote this as $A \succ 0$.


PSD Matrices

Proposition 7

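Matrix A is PSD iff there exists a real matrix B such that $A = B^\top B$

Proof

Case ⇐: Suppose $A = B^\top B$, then for any $\mathbf{x}$

\[
\mathbf{x}^\top A \mathbf{x} = \mathbf{x}^\top B^\top B \mathbf{x} = \|B\mathbf{x}\|^2 \geq 0
\]

Case ⇒: If $A \succeq 0$ then its eigen-decomposition ($A = V \Lambda V^\top$) has only non-negative eigenvalues and thus $\sqrt{\Lambda}$ is a real-valued matrix. Thus, let $B = \sqrt{\Lambda} V^\top$ and we have

\[
B^\top B = V \sqrt{\Lambda} \sqrt{\Lambda} V^\top = V \Lambda V^\top = A
\]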
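Both directions of the proposition can be illustrated with a few lines of NumPy. This is only a sketch: the matrix `C` is an arbitrary random factor chosen for the example, and the clipping step is an assumption added to guard against tiny negative eigenvalues produced by floating-point round-off.

```python
import numpy as np

rng = np.random.default_rng(0)
C = rng.normal(size=(3, 5))
A = C.T @ C                      # 5x5 PSD matrix of rank at most 3

# Direction "=>": recover a real factor B = sqrt(Lambda) V^T from A.
lam, V = np.linalg.eigh(A)
lam = np.clip(lam, 0.0, None)    # remove round-off negatives before the square root
B = np.diag(np.sqrt(lam)) @ V.T

# Direction "<=": x^T A x equals ||B x||^2, which cannot be negative.
x = rng.normal(size=5)
quad_form = x @ A @ x
norm_sq = np.linalg.norm(B @ x) ** 2

print("B^T B == A:", np.allclose(B.T @ B, A))
print("x^T A x == ||Bx||^2:", np.isclose(quad_form, norm_sq))
```

Note that the factor B is not unique: here both `C` and `B` satisfy the proposition for the same A.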


Part IV

Reproducing Kernel Hilbert Spaces


Inner Product Space

Definition 8

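An inner product space $\mathcal{X}$ is a vector space with an associated inner product $\langle \cdot, \cdot \rangle : \mathcal{X} \times \mathcal{X} \to \mathbb{R}$ that satisfies:

1 (Symmetry) $\langle \mathbf{x}, \mathbf{z} \rangle = \langle \mathbf{z}, \mathbf{x} \rangle$

2 (Linearity) $\langle a \cdot \mathbf{x}, \mathbf{z} \rangle = a \cdot \langle \mathbf{x}, \mathbf{z} \rangle$ & $\langle \mathbf{w} + \mathbf{x}, \mathbf{z} \rangle = \langle \mathbf{w}, \mathbf{z} \rangle + \langle \mathbf{x}, \mathbf{z} \rangle$

3 (Positive Semi-Definiteness) $\langle \mathbf{x}, \mathbf{x} \rangle \geq 0$

The inner product space is strict if $\langle \mathbf{x}, \mathbf{x} \rangle = 0 \Leftrightarrow \mathbf{x} = \mathbf{0}$

A strict inner product space $\mathcal{X}$ has a natural norm given by $\|\mathbf{x}\|_2 = \sqrt{\langle \mathbf{x}, \mathbf{x} \rangle}$. The associated metric is $d(\mathbf{x}, \mathbf{z}) = \|\mathbf{x} - \mathbf{z}\|_2$

The space $\mathbb{R}^D$ has the inner product $\langle \mathbf{x}, \mathbf{z} \rangle = \mathbf{x}^\top \mathbf{z}$ which yields the Euclidean norm:

\[
\|\mathbf{x}\|_2^2 = \sum_{i=1}^{D} x_i^2
\]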


Hilbert Space

Definition 9

A strict inner product space $\mathcal{F}$ is a Hilbert space if it is

1 Complete: Every Cauchy sequence $\{h_i \in \mathcal{F}\}_{i=1}^{\infty}$, i.e., every sequence such that

\[
\lim_{n \to \infty} \sup_{m > n} \|h_n - h_m\| = 0,
\]

converges to an element $h \in \mathcal{F}$; i.e., $h_i \to h$

(Technical condition required for potentially infinite-dimensional sets)

2 Separable: There is a countable subset $\hat{\mathcal{F}} = \{h_i \in \mathcal{F}\}_{i=1}^{\infty}$ such that for all $h \in \mathcal{F}$ and $\epsilon > 0$, there exists $h_i \in \hat{\mathcal{F}}$ such that

\[
\|h_i - h\| < \epsilon
\]

(Condition required to make Hilbert space isomorphisms)

Hilbert Space Examples: the reals $\mathbb{R}$, the complex numbers $\mathbb{C}$, & Euclidean spaces $\mathbb{R}^D$ for $D \in \mathbb{N}$. (The interval [0, 1] is complete & separable as a metric space, but it is not a vector space and hence not a Hilbert space.)


Function Spaces

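What is a vector? An ordered list of D elements from $\mathbb{R}$ indexed by the index set $I_D = \{1, 2, \ldots, D\}$. The set of all such lists is $\mathbb{R}^D$

We can extend this notion to countable sequences $\mathbf{x} = (x_1, x_2, \ldots)$ by using the index set $I = \mathbb{N}$

Inner-product generalizes naturally as

\[
\langle \mathbf{x}, \mathbf{y} \rangle = \sum_{i \in \mathbb{N}} x_i \cdot y_i
\]

However, we need additional restrictions to make such spaces well-behaved

The subspace $\ell_2$ for which $\forall\, \mathbf{x}$: $\langle \mathbf{x}, \mathbf{x} \rangle < \infty$ is a Hilbert space

Further, a function $f : \mathcal{X} \to \mathbb{R}$ maps each $\mathbf{x} \in \mathcal{X}$ to exactly one $y \in \mathbb{R}$; i.e., it is also a vector with an uncountable index set (e.g., $I = \mathbb{R}^D$)

Inner-product again generalizes naturally as

\[
\langle f, g \rangle = \int_{\mathcal{X}} f(\mathbf{x})\, g(\mathbf{x})\, d\mathbf{x}
\]

Subspace $L_2(\mathcal{X})$ defined on $\mathcal{X}$, a compact subspace of $\mathbb{R}^D$, for which $\forall\, f \in L_2(\mathcal{X})$: $\langle f, f \rangle < \infty$ is a Hilbert space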


Properties of (Separable) Hilbert Spaces

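Hilbert space $\mathcal{F}$ is isomorphic to $\mathcal{H}$ if there is a one-to-one linear mapping $T : \mathcal{F} \to \mathcal{H}$ such that for all $\mathbf{x}, \mathbf{z} \in \mathcal{F}$

\[
\langle T(\mathbf{x}), T(\mathbf{z}) \rangle_{\mathcal{H}} = \langle \mathbf{x}, \mathbf{z} \rangle_{\mathcal{F}}
\]

Every separable Hilbert space (A) of dimension D is isomorphic to $\mathbb{R}^D$ and (B) of infinite dimension is isomorphic to $\ell_2$

Since Hilbert space $\mathcal{F}$ is isomorphic to $\mathbb{R}^D$ or $\ell_2$, $\mathcal{F}$ has an orthonormal basis $\{\phi_i\}$ & every element $\mathbf{x} \in \mathcal{F}$ has a Fourier decomposition:

\[
\mathbf{x} = \sum_{i} \langle \phi_i, \mathbf{x} \rangle_{\mathcal{F}} \cdot \phi_i
\]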


Part V

Characterizing Kernel Functions


Kernel Terminology

Definition 10

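A kernel is a two-argument real-valued function over $\mathcal{X} \times \mathcal{X}$ ($\kappa : \mathcal{X} \times \mathcal{X} \to \mathbb{R}$) such that for any $\mathbf{x}, \mathbf{z} \in \mathcal{X}$

\[
\kappa(\mathbf{x}, \mathbf{z}) = \langle \phi(\mathbf{x}), \phi(\mathbf{z}) \rangle_{\mathcal{F}} \qquad (1)
\]

for some inner-product space $\mathcal{F}$ such that $\forall\, \mathbf{x} \in \mathcal{X}$: $\phi(\mathbf{x}) \in \mathcal{F}$

Kernel functions must be symmetric since inner products are symmetric

To show that $\kappa$ is a valid kernel, it is sufficient to show that a mapping $\phi$ exists that yields Eq. 1. However, such a mapping is generally difficult to construct.

In the rest of this lecture, we will demonstrate additional ways to construct & validate kernels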


Kernel Matrices

Definition 11

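A kernel matrix (or Gram matrix) K is the matrix that results from applying $\kappa$ to all pairs of datapoints in set $\{\mathbf{x}_i\}_{i=1}^N$

\[
K =
\begin{bmatrix}
\kappa(\mathbf{x}_1, \mathbf{x}_1) & \kappa(\mathbf{x}_1, \mathbf{x}_2) & \cdots & \kappa(\mathbf{x}_1, \mathbf{x}_N) \\
\kappa(\mathbf{x}_2, \mathbf{x}_1) & \kappa(\mathbf{x}_2, \mathbf{x}_2) & \cdots & \kappa(\mathbf{x}_2, \mathbf{x}_N) \\
\vdots & \vdots & \ddots & \vdots \\
\kappa(\mathbf{x}_N, \mathbf{x}_1) & \kappa(\mathbf{x}_N, \mathbf{x}_2) & \cdots & \kappa(\mathbf{x}_N, \mathbf{x}_N)
\end{bmatrix}
\]

that is, $K_{i,j} = \kappa(\mathbf{x}_i, \mathbf{x}_j)$

Kernel matrices are square & symmetric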


Kernel Matrices are PSD

Proposition 12

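Kernel matrices, which are constructed from a kernel corresponding to a strict inner product space $\mathcal{F}$, are PSD.

Proof

By definition of a kernel matrix, for all $i, j \in 1, \ldots, N$

\[
K_{i,j} = \kappa(\mathbf{x}_i, \mathbf{x}_j) = \langle \phi(\mathbf{x}_i), \phi(\mathbf{x}_j) \rangle_{\mathcal{F}}
\]

Thus, for any $\mathbf{v} \in \mathbb{R}^N$:

\[
\begin{aligned}
\mathbf{v}^\top K \mathbf{v} &= \sum_{i,j} v_i K_{i,j} v_j = \sum_{i,j} v_i \langle \phi(\mathbf{x}_i), \phi(\mathbf{x}_j) \rangle_{\mathcal{F}}\, v_j \\
&= \Big\langle \sum_{i} v_i \phi(\mathbf{x}_i), \sum_{j} v_j \phi(\mathbf{x}_j) \Big\rangle_{\mathcal{F}} \\
&= \Big\| \sum_{i} v_i \phi(\mathbf{x}_i) \Big\|_{\mathcal{F}}^2 \geq 0
\end{aligned}
\]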


Reproducing Kernel Function

Definition 13 (Reproducing Kernel Function (Aronszajn, 1950) [1])

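Suppose $\mathcal{F}$ is a Hilbert space of functions over $\mathcal{X}$; the function $\kappa : \mathcal{X} \times \mathcal{X} \to \mathbb{R}$ is a reproducing kernel of $\mathcal{F}$ if

1 For every $\mathbf{x} \in \mathcal{X}$, the function $f_{\mathbf{x}}(\cdot) = \kappa(\cdot, \mathbf{x})$ is in $\mathcal{F}$.

2 Reproducing Property: for every $\mathbf{z} \in \mathcal{X}$ and every $f \in \mathcal{F}$

\[
f(\mathbf{z}) = \langle f, \kappa(\cdot, \mathbf{z}) \rangle_{\mathcal{F}}
\]

Further, the space is called a Reproducing Kernel Hilbert Space (RKHS)

By 1st property & closure of $\mathcal{F}$, for any $\alpha_i \in \mathbb{R}$ and $\mathbf{x}_i \in \mathcal{X}$, we have $\sum_{i=1}^{N} \alpha_i \cdot \kappa(\cdot, \mathbf{x}_i) \in \mathcal{F}$

Applying $f_{\mathbf{x}}$ from 1st property to 2nd property, for any $\mathbf{x}, \mathbf{z} \in \mathcal{X}$, we have

\[
\kappa(\mathbf{x}, \mathbf{z}) = \langle \kappa(\cdot, \mathbf{x}), \kappa(\cdot, \mathbf{z}) \rangle_{\mathcal{F}}
\]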


Kernel Functions

Definition 14 (Finitely Positive Semi-definite)

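A function $\kappa : \mathcal{X} \times \mathcal{X} \to \mathbb{R}$ is finitely positive semi-definite (FPSD) if

It is symmetric; i.e., $\forall\, \mathbf{x}, \mathbf{z} \in \mathcal{X}$: $\kappa(\mathbf{x}, \mathbf{z}) = \kappa(\mathbf{z}, \mathbf{x})$

The matrix K formed by applying $\kappa$ to any finite subset of $\mathcal{X}$ is positive semi-definite: $K \succeq 0$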


Kernel Functions I

Theorem 15

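$\kappa : \mathcal{X} \times \mathcal{X} \to \mathbb{R}$ (either continuous or with a countable domain) is FPSD iff $\exists$ Hilbert space $\mathcal{F}$ with feature map $\phi : \mathcal{X} \to \mathcal{F}$ s.t.

\[
\kappa(\mathbf{x}, \mathbf{z}) = \langle \phi(\mathbf{x}), \phi(\mathbf{z}) \rangle
\]

Proof

Case ⇐: Follows from Proposition 12.

Case ⇒: Suppose $\kappa$ is FPSD & we construct Hilbert space $\mathcal{F}_\kappa$ with $\kappa$ as its reproducing kernel; i.e., $\mathcal{F}_\kappa$ is the closure of functions: $f_{\mathbf{x}}(\cdot) = \kappa(\cdot, \mathbf{x})$. Thus, for any $\alpha_i$, $\mathbf{x}_i$, $g(\cdot) = \sum_{i} \alpha_i \kappa(\cdot, \mathbf{x}_i)$ is in $\mathcal{F}_\kappa$ &, by the reproducing property,

\[
\langle g, g \rangle = \sum_{i,j} \alpha_i \alpha_j \kappa(\mathbf{x}_i, \mathbf{x}_j) = \boldsymbol{\alpha}^\top K \boldsymbol{\alpha}
\]

where K is the kernel matrix of $\{\mathbf{x}_i\}$, & thus $\boldsymbol{\alpha}^\top K \boldsymbol{\alpha} \geq 0$ since $K \succeq 0$.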


Kernel Functions II

(Completeness) Follows from the Cauchy-Schwarz inequality, but beyondthe scope of this course.

(Separability) Separability follows from κ being continuous or having acountable domain, but is not shown here.

Finally, the mapping φ is specified by κ and φ (x) = κ (·, x) ∈ Fκ .

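Note, the inner product defined above is strict since if $\|f\| = 0$, then for all $\mathbf{x}$, the reproducing property and the Cauchy-Schwarz inequality give $|f(\mathbf{x})| = |\langle f, \phi(\mathbf{x}) \rangle| \leq \|f\| \|\phi(\mathbf{x})\| = 0$, so $f = 0$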


Part VI

Kernel Constructions


Simple Kernels

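Clearly, the linear kernel defined by

\[
\kappa_{\mathrm{lin}}(\mathbf{x}, \mathbf{z}) = \langle \mathbf{x}, \mathbf{z} \rangle = \mathbf{x}^\top \mathbf{z}
\]

is a valid kernel function since it is an inner product in $\mathcal{X}$

For any N × N matrix $B \succeq 0$,

\[
\kappa_B(\mathbf{x}, \mathbf{z}) = \langle \mathbf{x} | B | \mathbf{z} \rangle = \mathbf{x}^\top B \mathbf{z}
\]

is a valid kernel function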


Closure Properties of Kernels I

Proposition 16

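Suppose $\kappa_1$ & $\kappa_2$ are kernels on $\mathcal{X}$, $a > 0$, $f : \mathcal{X} \to \mathbb{R}$, $\phi : \mathcal{X} \to \mathbb{R}^M$, & $\kappa_3$ is a kernel on $\mathbb{R}^M$. Then these are all kernel functions on $\mathcal{X}$:

16.1 $\kappa(\mathbf{x}, \mathbf{z}) = \kappa_1(\mathbf{x}, \mathbf{z}) + \kappa_2(\mathbf{x}, \mathbf{z})$

16.2 $\kappa(\mathbf{x}, \mathbf{z}) = a \cdot \kappa_1(\mathbf{x}, \mathbf{z})$

16.3 $\kappa(\mathbf{x}, \mathbf{z}) = \kappa_1(\mathbf{x}, \mathbf{z}) \cdot \kappa_2(\mathbf{x}, \mathbf{z})$

16.4 $\kappa(\mathbf{x}, \mathbf{z}) = f(\mathbf{x})\, f(\mathbf{z})$

16.5 $\kappa(\mathbf{x}, \mathbf{z}) = \kappa_3(\phi(\mathbf{x}), \phi(\mathbf{z}))$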


Closure Properties of Kernels II

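Proof

Let $K_1$ & $K_2$ be the kernel matrices of $\kappa_1$ & $\kappa_2$ applied to any set $\{\mathbf{x}_i\}_{i=1}^N$; both these matrices are PSD. Also let $\boldsymbol{\alpha}$ be any N-vector:

(Part 1): $K = K_1 + K_2 \Rightarrow \boldsymbol{\alpha}^\top K \boldsymbol{\alpha} = \boldsymbol{\alpha}^\top K_1 \boldsymbol{\alpha} + \boldsymbol{\alpha}^\top K_2 \boldsymbol{\alpha} \geq 0$

(Part 2): $K = aK_1 \Rightarrow \boldsymbol{\alpha}^\top K \boldsymbol{\alpha} = a \cdot \boldsymbol{\alpha}^\top K_1 \boldsymbol{\alpha} \geq 0$

(Part 3): Take the spectral decompositions $K_1 = \sum_{i=1}^{N} \lambda_i \mathbf{v}_i \mathbf{v}_i^\top$ and $K_2 = \sum_{j=1}^{N} \gamma_j \mathbf{w}_j \mathbf{w}_j^\top$. Their element-wise product, $K = K_1 \odot K_2$, can then be written as

\[
K = \sum_{i,j=1}^{N} \lambda_i \gamma_j\, (\mathbf{v}_i \odot \mathbf{w}_j)(\mathbf{v}_i \odot \mathbf{w}_j)^\top;
\]

i.e., a summation of rank-1 PSD matrices with non-negative coefficients ⇒ PSD.

(Part 4): $\kappa(\mathbf{x}, \mathbf{z}) = \langle \psi(\mathbf{x}), \psi(\mathbf{z}) \rangle$ where $\psi : \mathbf{x} \mapsto f(\mathbf{x})$ is a one-dimensional feature map; thus, $\kappa$ is PSD.

(Part 5): Since $\kappa_3$ is a kernel, applying it to any set of vectors $\{\phi(\mathbf{x}_i)\}_{i=1}^N$ yields a PSD matrix.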
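The five closure properties can be spot-checked on a random sample by forming the corresponding kernel matrices and testing each for positive semi-definiteness. This is a sketch assuming NumPy; the base kernels (linear and degree-3 polynomial), the function `f = sin`, the map `tanh`, and the helper `is_psd` with its round-off tolerance are all illustrative assumptions.

```python
import numpy as np

def is_psd(K, rtol=1e-9):
    """True if the symmetric matrix K has no eigenvalue below -rtol * scale."""
    eig = np.linalg.eigvalsh(K)
    return bool(eig.min() >= -rtol * max(1.0, np.abs(eig).max()))

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 3))

K1 = X @ X.T                       # linear kernel
K2 = (X @ X.T + 1.0) ** 3          # degree-3 polynomial kernel (element-wise power)

K_sum = K1 + K2                    # 16.1: sum of kernels
K_scaled = 2.5 * K1                # 16.2: positive scaling
K_prod = K1 * K2                   # 16.3: element-wise (Hadamard) product
f = np.sin(X[:, 0])
K_f = np.outer(f, f)               # 16.4: f(x) f(z)
Xm = np.tanh(X)
K_map = (Xm @ Xm.T + 1.0) ** 3     # 16.5: kernel applied to mapped points

for name, K in [("sum", K_sum), ("scaled", K_scaled), ("product", K_prod),
                ("f(x)f(z)", K_f), ("composition", K_map)]:
    print(name, "is PSD:", is_psd(K))
```

Note that `K1 * K2` is the element-wise product from Part 3, not the matrix product `K1 @ K2`, which is in general not even symmetric.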


Closure Properties of Kernels III

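The feature spaces for these kernels are as follows:

For kernel $\kappa_1(\mathbf{x}, \mathbf{z}) + \kappa_2(\mathbf{x}, \mathbf{z})$, the new feature map is equivalent to stacking the feature maps of $\kappa_1$ & $\kappa_2$:

\[
\phi(\mathbf{x}) = \begin{bmatrix} \phi_1(\mathbf{x}) \\ \phi_2(\mathbf{x}) \end{bmatrix}
\]

For kernel $a \cdot \kappa_1(\mathbf{x}, \mathbf{z})$, its feature space is scaled by $\sqrt{a}$

For kernel $\kappa_1(\mathbf{x}, \mathbf{z}) \cdot \kappa_2(\mathbf{x}, \mathbf{z})$, if $\phi_1$ has dimension $N_1$ and $\phi_2$ has dimension $N_2$, $\phi$ has $N_1 N_2$ features given by

\[
[\phi(\mathbf{x})]_{ij} = [\phi_1(\mathbf{x})]_i \cdot [\phi_2(\mathbf{x})]_j
\]

It follows that the features of $\kappa_1(\mathbf{x}, \mathbf{z})^d$ are all monomials of the form

\[
[\phi_1(\mathbf{x})]_1^{d_1}\, [\phi_1(\mathbf{x})]_2^{d_2} \cdots [\phi_1(\mathbf{x})]_N^{d_N}
\qquad \text{with} \quad \sum_{i} d_i = d
\]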
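These feature-space constructions can be verified on a small case. The sketch below assumes NumPy and picks two concrete base feature maps for illustration: `phi1` is the identity (linear kernel) and `phi2` is a hypothetical quadratic map whose kernel is $(\mathbf{x}^\top \mathbf{z})^2$.

```python
import numpy as np

def phi1(x):
    """Feature map of the linear kernel k1(x, z) = x.z (dimension N1 = 2)."""
    return np.asarray(x, dtype=float)

def phi2(x):
    """Feature map of k2(x, z) = (x.z)^2 in 2-D (dimension N2 = 3)."""
    x1, x2 = x
    return np.array([x1**2, x2**2, np.sqrt(2.0) * x1 * x2])

def phi_sum(x):
    """Stacked features: inner product gives k1 + k2."""
    return np.concatenate([phi1(x), phi2(x)])

def phi_scaled(x, a):
    """Features scaled by sqrt(a): inner product gives a * k1."""
    return np.sqrt(a) * phi1(x)

def phi_prod(x):
    """All pairwise products phi1_i * phi2_j (N1 * N2 = 6 features): gives k1 * k2."""
    return np.outer(phi1(x), phi2(x)).ravel()

rng = np.random.default_rng(0)
x, z = rng.normal(size=2), rng.normal(size=2)
k1 = x @ z
k2 = (x @ z) ** 2

print("sum:", np.isclose(phi_sum(x) @ phi_sum(z), k1 + k2))
print("scaling:", np.isclose(phi_scaled(x, 3.0) @ phi_scaled(z, 3.0), 3.0 * k1))
print("product:", np.isclose(phi_prod(x) @ phi_prod(z), k1 * k2))
```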


Additional Kernel Functions

Proposition 17

Suppose $\kappa_1$ is a kernel on $\mathcal{X}$ & $p : \mathbb{R} \to \mathbb{R}$ is a polynomial with non-negative coefficients. Then, the following are kernels:

1 $\kappa(\mathbf{x}, \mathbf{z}) = p(\kappa_1(\mathbf{x}, \mathbf{z}))$

2 $\kappa(\mathbf{x}, \mathbf{z}) = \exp(\kappa_1(\mathbf{x}, \mathbf{z}))$

3 Gaussian or RBF kernel: $\kappa(\mathbf{x}, \mathbf{z}) = \exp\left(-\dfrac{\|\mathbf{x} - \mathbf{z}\|_2^2}{2\sigma^2}\right)$

Proof

(Part 1) Constructing a polynomial kernel from base kernel $\kappa_1$ proceeds directly from Proposition 16.1, 16.2, & 16.3

(Part 2) Consider that $\exp(x) = 1 + x + \frac{1}{2}x^2 + \ldots + \frac{1}{i!}x^i + \ldots$. Thus, it is a limit of polynomials & the PSD property is closed under pointwise limits.

(Part 3) Left as an exercise.
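One possible route for Part 3 is sketched below as a hint rather than as the intended solution. It expands the squared distance and uses only the closure properties from Proposition 16 together with Part 2.

```latex
\[
\exp\!\left(-\frac{\|\mathbf{x}-\mathbf{z}\|_2^2}{2\sigma^2}\right)
= \underbrace{\exp\!\left(-\frac{\|\mathbf{x}\|_2^2}{2\sigma^2}\right)
              \exp\!\left(-\frac{\|\mathbf{z}\|_2^2}{2\sigma^2}\right)}_{f(\mathbf{x})\,f(\mathbf{z})\ \text{(16.4)}}
  \cdot
  \underbrace{\exp\!\left(\frac{\mathbf{x}^\top\mathbf{z}}{\sigma^2}\right)}_{\exp\ \text{of a scaled linear kernel (16.2, Part 2)}}
\]
```

The identity follows from $\|\mathbf{x}-\mathbf{z}\|_2^2 = \|\mathbf{x}\|_2^2 + \|\mathbf{z}\|_2^2 - 2\,\mathbf{x}^\top\mathbf{z}$, and the product of the two factors is a kernel by 16.3.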


Common Kernel Functions

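Linear Kernel: $\kappa_{\mathrm{lin}}(\mathbf{x}, \mathbf{z}) = \mathbf{x}^\top \mathbf{z}$

Polynomial Kernel: $\kappa_{\mathrm{poly}}(\mathbf{x}, \mathbf{z}) = (\mathbf{x}^\top \mathbf{z} + R)^d$

RBF Kernel: $\kappa_{\mathrm{rbf}}(\mathbf{x}, \mathbf{z}) = \exp\left(-\dfrac{\|\mathbf{x} - \mathbf{z}\|_2^2}{2\sigma^2}\right)$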


Kernel Questions

Which of the following functions are kernels?

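$\kappa_1(\mathbf{x}, \mathbf{z}) = \sum_{i=1}^{D} (x_i + z_i)$

$\kappa_2(\mathbf{x}, \mathbf{z}) = \prod_{i=1}^{D} h\left(\frac{x_i - c}{a}\right) h\left(\frac{z_i - c}{a}\right)$

$\kappa_3(\mathbf{x}, \mathbf{z}) = -\frac{\langle \mathbf{x}, \mathbf{z} \rangle}{\|\mathbf{x}\|_2 \|\mathbf{z}\|_2}$

$\kappa_4(\mathbf{x}, \mathbf{z}) = \sqrt{\|\mathbf{x} - \mathbf{z}\|_2^2 + 1}$

where $h(x) = \cos(1.75x) \exp(-x^2/2)$

$\kappa_1$ is not a kernel. Consider $\mathbf{x}_1 = [1 \; 0]^\top$ & $\mathbf{x}_2 = [0 \; 2]^\top$. Their kernel matrix $\begin{bmatrix} 2 & 3 \\ 3 & 4 \end{bmatrix}$ has eigenvalues $3 \pm \sqrt{10}$, i.e., approximately $-0.16$ and $6.16$.

$\kappa_2$ is a kernel because it can be written as the product $f(\mathbf{x}) f(\mathbf{z})$ where $f(\mathbf{x}) = \prod_{i=1}^{D} h\left(\frac{x_i - c}{a}\right)$

$\kappa_3$ is not a kernel because it is the negation of a valid non-trivial kernel & thus will have negative eigenvalues

$\kappa_4$ is not a kernel. Consider $\mathbf{x}_1 = [1 \; 0]^\top$ & $\mathbf{x}_2 = [0 \; 1]^\top$. Again, their kernel matrix has a negative eigenvalue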
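The counterexamples above can be reproduced by building each kernel matrix explicitly and inspecting its eigenvalues. This is a sketch under stated assumptions: the helper `gram`, the function names `k1` to `k4`, the choice $c = 0$, $a = 1$ for the second function, and the random points used for it are all illustrative and not taken from the slides.

```python
import numpy as np

def gram(kernel, points):
    """Kernel matrix of `kernel` evaluated on all pairs of `points`."""
    return np.array([[kernel(x, z) for z in points] for x in points])

def h(t):
    return np.cos(1.75 * t) * np.exp(-t**2 / 2.0)

def k1(x, z):
    return np.sum(x + z)

def k2(x, z, c=0.0, a=1.0):          # c and a are assumed values
    return np.prod(h((x - c) / a)) * np.prod(h((z - c) / a))

def k3(x, z):
    return -np.dot(x, z) / (np.linalg.norm(x) * np.linalg.norm(z))

def k4(x, z):
    return np.sqrt(np.sum((x - z) ** 2) + 1.0)

pts_a = [np.array([1.0, 0.0]), np.array([0.0, 2.0])]   # points used for the first function
pts_b = [np.array([1.0, 0.0]), np.array([0.0, 1.0])]   # points used for the fourth function
pts_rand = list(np.random.default_rng(0).normal(size=(6, 2)))

K_k1 = gram(k1, pts_a)
eig_k1 = np.linalg.eigvalsh(K_k1)
eig_k2 = np.linalg.eigvalsh(gram(k2, pts_rand))
eig_k3 = np.linalg.eigvalsh(gram(k3, pts_b))
eig_k4 = np.linalg.eigvalsh(gram(k4, pts_b))

print("k1 eigenvalues:", eig_k1)          # one is negative: not a kernel
print("k2 min eigenvalue:", eig_k2.min()) # non-negative up to round-off
print("k3 eigenvalues:", eig_k3)          # negative: not a kernel
print("k4 eigenvalues:", eig_k4)          # one is negative: not a kernel
```

A single negative eigenvalue on any finite point set is enough to rule a function out, whereas the check for the second function is only consistent with, not a proof of, its being a kernel.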


Part VII

Transforming Kernel Matrices


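Operations on Kernel Matrices: Simple Transformations

Adding a non-negative constant to the kernel matrix: corresponds to adding a new constant feature to each training example; i.e., given the matrix $\Phi$ of features such that $K = \Phi\Phi^\top$,

$$\begin{bmatrix} \Phi & c\mathbf{1} \end{bmatrix} \begin{bmatrix} \Phi & c\mathbf{1} \end{bmatrix}^\top = K + c^2 \mathbf{1}\mathbf{1}^\top$$

Adding a non-negative constant to its diagonal: corresponds to adding an indicator feature for every data point

$$\begin{bmatrix} \phi(\mathbf{x}_1) & c & 0 & \cdots & 0 \\ \phi(\mathbf{x}_2) & 0 & c & \cdots & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ \phi(\mathbf{x}_N) & 0 & 0 & \cdots & c \end{bmatrix} \begin{bmatrix} \phi(\mathbf{x}_1) & c & 0 & \cdots & 0 \\ \phi(\mathbf{x}_2) & 0 & c & \cdots & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ \phi(\mathbf{x}_N) & 0 & 0 & \cdots & c \end{bmatrix}^\top = K + c^2 I$$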
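Both identities are easy to confirm with an explicit feature matrix. In the sketch below the feature matrix `Phi`, its size, and the constant `c` are arbitrary values chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
N, D, c = 5, 3, 0.7                 # arbitrary sizes and constant
Phi = rng.normal(size=(N, D))       # rows are feature vectors phi(x_i)
K = Phi @ Phi.T
ones = np.ones((N, 1))

# Append one constant feature c to every example: K gains c^2 in every entry
Phi_const = np.hstack([Phi, c * ones])
K_const = Phi_const @ Phi_const.T

# Append one indicator feature per data point: K gains c^2 on the diagonal
Phi_diag = np.hstack([Phi, c * np.eye(N)])
K_diag = Phi_diag @ Phi_diag.T

print(np.allclose(K_const, K + c**2 * (ones @ ones.T)))
print(np.allclose(K_diag, K + c**2 * np.eye(N)))
```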


Operations on Kernel Matrices: Centering Data

Suppose we want to translate the origin to the data’s center of mass. . .

[Figure: scatter plot of data points $\mathbf{x}_1, \dots, \mathbf{x}_{12}$ with their center of mass $\bar{\mathbf{x}}$ marked.]

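As we will see next lecture, this transformation can be expressed as a kernel transform:

$$K \leftarrow K - \frac{1}{N}\mathbf{1}\mathbf{1}^\top K - \frac{1}{N} K \mathbf{1}\mathbf{1}^\top + \frac{\mathbf{1}^\top K \mathbf{1}}{N^2} \mathbf{1}\mathbf{1}^\top$$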
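The centering transform can be checked against explicit centering of the features. The following is a minimal sketch: the function name `center_kernel` and the random feature matrix are assumptions for this example, and the last term is computed as `J @ K @ J`, which equals $\frac{\mathbf{1}^\top K \mathbf{1}}{N^2}\mathbf{1}\mathbf{1}^\top$ when `J` is $\frac{1}{N}\mathbf{1}\mathbf{1}^\top$.

```python
import numpy as np

def center_kernel(K):
    """Kernel matrix of the data after moving the origin to the feature-space mean."""
    N = K.shape[0]
    J = np.ones((N, N)) / N          # (1/N) * 1 1^T
    return K - J @ K - K @ J + J @ K @ J

rng = np.random.default_rng(0)
Phi = rng.normal(size=(6, 3))        # arbitrary explicit features for the check
K = Phi @ Phi.T

K_centered = center_kernel(K)

# Reference: subtract the mean feature vector, then form the kernel matrix
Phi_centered = Phi - Phi.mean(axis=0)
K_reference = Phi_centered @ Phi_centered.T

print(np.allclose(K_centered, K_reference))
```

The point of the kernel form is that `center_kernel` never touches `Phi`, so it also applies when the feature map is only implicit.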


Operations on Kernel Matrices: Normalizing Data

Suppose we want to project all data to be norm 1; i.e., ‖x‖ = 1. . .

[Figure: scatter plot of data points $\mathbf{x}_1, \dots, \mathbf{x}_{12}$.]

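This transformation can be achieved using only the information from the kernel matrix:

$$\hat{\kappa}(\mathbf{x}, \mathbf{z}) = \frac{\kappa(\mathbf{x}, \mathbf{z})}{\sqrt{\kappa(\mathbf{x}, \mathbf{x}) \, \kappa(\mathbf{z}, \mathbf{z})}}$$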
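A small sketch of this normalization, assuming the usual square-root form $\kappa(\mathbf{x}, \mathbf{z}) / \sqrt{\kappa(\mathbf{x}, \mathbf{x})\,\kappa(\mathbf{z}, \mathbf{z})}$ and assuming every diagonal entry of the kernel matrix is strictly positive. The function name `normalize_kernel` and the random features are illustrative choices, not part of the lecture.

```python
import numpy as np

def normalize_kernel(K):
    """Kernel matrix of the data after scaling every feature vector to norm 1."""
    d = np.sqrt(np.diag(K))          # feature-space norms ||phi(x_i)||
    return K / np.outer(d, d)

rng = np.random.default_rng(0)
Phi = rng.normal(size=(6, 3))        # arbitrary explicit features for the check
K = Phi @ Phi.T

K_hat = normalize_kernel(K)

# Reference: normalize the feature vectors explicitly, then form the kernel matrix
Phi_unit = Phi / np.linalg.norm(Phi, axis=1, keepdims=True)
K_reference = Phi_unit @ Phi_unit.T

print(np.allclose(K_hat, K_reference))
print(np.diag(K_hat))                # all ones: every point now has norm 1
```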


Summary

We explored a formal framework for kernels

We saw a formal definition for kernel functions & matrices

We saw the properties that kernels must exhibit and how those properties can be used to validate kernel functions & construct new kernels from existing kernels

We explored some operations that allow us to manipulate data in feature space

Next Lecture: we will see basic kernel-based learning algorithms

We will explore how to take the mean of data in feature space & use that to construct a novelty detection algorithm

We will explore how to project data in feature space & use that for a basic subspace algorithm


Bibliography I

The majority of the work from this talk can be found in the lecture's accompanying book, “Kernel Methods for Pattern Analysis.”

[1] N. Aronszajn. Theory of reproducing kernels. Transactions of the American Mathematical Society, 68(3):337–404, 1950.
