Transcript
Page 1: Ran El-Yaniv and Dmitry Pechyony Technion – Israel Institute of Technology, Haifa, Israel 24.08.2007 Transductive Rademacher Complexity and its Applications.


Page 2: Induction vs. Transduction

Inductive learning: an unknown distribution $D$ of examples $(x,y)$ generates the training set; the learning algorithm outputs a hypothesis $h$, which is then applied to unlabeled examples.

Goal: minimize the expected risk $E_{(x,y)\sim D}\{\ell(h(x),y)\}$.

Transductive learning (Vapnik '74, '98): the learning algorithm receives the training set $S_m \triangleq \{(x_i,y_i)\}_{i=1}^{m}$ and the test set $X_u \triangleq \{x_i\}_{i=m+1}^{m+u}$, and outputs the labels of the test set.

Goal: minimize the test error $L_u \triangleq E_{(x,y)\in X_u}\{\ell(h(x),y)\}$.

Page 3: Distribution-free Model [Vapnik '74,'98]

[Figure: a scatter of unlabeled points.]

Given: "Full sample" of $m+u$ unlabeled examples, each with its true (unknown) label.

Page 4: Distribution-free Model [Vapnik '74,'98]

[Figure: the same scatter of unlabeled points.]

Given: "Full sample" of $m+u$ unlabeled examples, each with its true (unknown) label.

Full sample is partitioned: training set ($m$ points), test set ($u$ points).

Page 5: Distribution-free Model [Vapnik '74,'98]

[Figure: the scatter of points, with the training points marked.]

Given: "Full sample" of $m+u$ unlabeled examples, each with its true (unknown) label.

Full sample is partitioned: training set ($m$ points), test set ($u$ points).

Labels of the training examples are revealed.

Page 6: Distribution-free Model [Vapnik '74,'98]

[Figure: labeled training points ("X") among unlabeled test points ("?").]

Given: "Full sample" of $m+u$ unlabeled examples, each with its true (unknown) label.

Full sample is partitioned: training set ($m$ points), test set ($u$ points).

Labels of the training points are revealed.

Goal: label the test examples.

Page 7: Rademacher complexity

Induction:
- Hypothesis space $F$: a set of functions $f: D \to \mathbb{R}$.
- $X_m$ - the training points.
- $\sigma = \{\sigma_i\}_{i=1}^{m}$ - i.i.d. Rademacher random variables: $\Pr\{\sigma_i = 1\} = \Pr\{\sigma_i = -1\} = \frac{1}{2}$.
- $R_m(F) = \frac{1}{m} \, E_{X_m} E_\sigma \left\{ \sup_{f\in F} \sum_{i=1}^{m} \sigma_i f(x_i) \right\}$.

Transduction (version 1):
- Hypothesis space $H \subseteq \mathbb{R}^{m+u}$: a set of vectors $h$.
- $X_{m+u}$ - the full sample, with $m$ training and $u$ test points.
- $\sigma = \{\sigma_i\}_{i=1}^{m+u}$ - distributed as in induction.
- $R_{m+u}(H) = \left(\frac{1}{m} + \frac{1}{u}\right) \cdot E_\sigma \left\{ \sup_{h\in H} \sum_{i=1}^{m+u} \sigma_i h_i \right\}$.
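For a tiny finite hypothesis space, the version-1 transductive complexity above can be computed exactly by enumerating all $2^{m+u}$ sign vectors. The two vectors in `H` below are made-up toy values, for illustration only.

```python
# Exact computation of the version-1 transductive Rademacher complexity
# R_{m+u}(H) = (1/m + 1/u) * E_sigma[ sup_{h in H} sum_i sigma_i * h_i ]
# for a tiny finite hypothesis space, by enumerating all sign assignments.
from itertools import product

def transductive_rademacher_v1(H, m, u):
    n = m + u
    total = 0.0
    for sigma in product((-1, 1), repeat=n):  # all 2^(m+u) sign vectors
        total += max(sum(s * hi for s, hi in zip(sigma, h)) for h in H)
    expectation = total / 2 ** n
    return (1.0 / m + 1.0 / u) * expectation

H = [(1, 1, -1, -1), (1, -1, 1, -1)]  # two toy "soft label" vectors
print(transductive_rademacher_v1(H, m=2, u=2))  # -> 1.0
```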

Page 8: Transductive Rademacher complexity

Version 1:
- $X_{m+u}$ - the full sample, with $m$ training and $u$ test points.
- $H$ - the transductive hypothesis space.
- $\sigma = \{\sigma_i\}_{i=1}^{m+u}$ - i.i.d. random variables distributed by $D_1$: $\Pr\{\sigma_i = 1\} = \Pr\{\sigma_i = -1\} = \frac{1}{2}$.
- Rademacher complexity: $R_{m+u}(H; D_1) = \left(\frac{1}{m} + \frac{1}{u}\right) \cdot E_{\sigma\sim D_1} \left\{ \sup_{h\in H} \sum_{i=1}^{m+u} \sigma_i h_i \right\}$.

Version 2: a sparse distribution, $D_s$, of Rademacher variables:
$\Pr\{\sigma_i = 1\} = \Pr\{\sigma_i = -1\} = \frac{mu}{(m+u)^2}$, $\Pr\{\sigma_i = 0\} = 1 - \frac{2mu}{(m+u)^2}$.

Lemma 1: $R_{m+u}(H; D_s) \le R_{m+u}(H; D_1)$.

We develop risk bounds with $R_{m+u}(H) \triangleq R_{m+u}(H; D_s)$.
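The sparse distribution $D_s$ is straightforward to sample; a minimal sketch (the choice of `m`, `u`, and the seed are illustrative assumptions):

```python
# Sampling a vector of "version 2" sparse Rademacher variables: each sigma_i
# is +1 or -1 with probability p = m*u/(m+u)^2 each, and 0 with the remaining
# probability 1 - 2p (so most entries are 0 when m << u or u << m).
import random

def sparse_rademacher(m, u, rng):
    p = m * u / (m + u) ** 2  # Pr{sigma_i = +1} = Pr{sigma_i = -1} = p
    sigma = []
    for _ in range(m + u):
        r = rng.random()
        sigma.append(1 if r < p else (-1 if r < 2 * p else 0))
    return sigma

sigma = sparse_rademacher(m=30, u=70, rng=random.Random(0))
print(len(sigma), set(sigma) <= {-1, 0, 1})  # -> 100 True
```

Note that for $m = u$ the probability of a zero entry is $\frac{1}{2}$, and it grows as the split becomes more unbalanced.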

Page 9: Risk bound

Notation:
- $L_u(h)$ - the 0/1 error of $h$ on the test examples $X_u$.
- $L_m^\gamma(h)$ - the empirical $\gamma$-margin error of $h$ on the training examples $S_m$.

Theorem: For any $\delta > 0$ and $\gamma > 0$, with probability at least $1-\delta$ over the random partition of the full sample $S_{m+u}$ into $(S_m, X_u)$, for all hypotheses $h \in H$ it holds that
$$L_u(h) \le L_m^\gamma(h) + \frac{1}{\gamma} R_{m+u}(H) + O\left(\sqrt{\left(\tfrac{1}{m} + \tfrac{1}{u}\right) \ln\tfrac{1}{\delta}}\right).$$

Proof: based on and inspired by the results of [McDiarmid '89], [Bartlett and Mendelson '02] and [Meir and Zhang '03].

Previous results: [Lanckriet et al. '04] - the case of $m = u$.
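To get a feel for the bound's last term, the sketch below evaluates the bare quantity inside the $O(\cdot)$ for a few sample sizes; the constants hidden by the $O(\cdot)$ are not stated on the slide, so these numbers show only how the term scales.

```python
# Order-of-magnitude look at the slack term in the risk bound. The O(.)
# hides constants not stated on the slide, so this evaluates the bare
# quantity sqrt((1/m + 1/u) * ln(1/delta)) to show how it shrinks as
# both m and u grow.
import math

def slack(m, u, delta):
    return math.sqrt((1.0 / m + 1.0 / u) * math.log(1.0 / delta))

for m, u in [(50, 50), (500, 500), (5000, 5000)]:
    print(m, u, round(slack(m, u, delta=0.05), 4))
```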

Page 10: Ran El-Yaniv and Dmitry Pechyony Technion – Israel Institute of Technology, Haifa, Israel 24.08.2007 Transductive Rademacher Complexity and its Applications.

Inductive vs. Transductive hypothesis spacesInduction: To use the risk bounds, the hypothesis space

shouldbe defined before observing the training set.

Transduction: The hypothesis space can be defined afterobserving , but before observing the actual

partition .

Conclusion: Transduction allows for the choosing a data-dependent hypothesis space. For example, it can beoptimized to have low Rademacher complexity.

This cannot be done in induction!

X m+uX m+u

(Sm;X u)(Sm;X u)

Page 11: Another view on transductive algorithms

Unlabeled-Labeled Decomposition (ULD): given the full sample $X_{m+u}$ and the partition $(S_m, X_u)$, the learner computes an $(m+u)\times r$ matrix $K$ and an $r\times 1$ vector $\alpha$, and outputs the hypothesis $h = K\alpha$.

Example: $K$ - the inverse of the graph Laplacian; $\alpha_i = y_i$ if $x_i \in S_m$, and $\alpha_i = 0$ otherwise.
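The decomposition can be sketched directly: $K$ depends only on the unlabeled data, $\alpha$ only on the revealed labels, and the prediction is their product. The matrix `K` below is a made-up toy matrix standing in for an inverse graph Laplacian, not a real one.

```python
# Sketch of the Unlabeled-Labeled Decomposition: the prediction vector is
# h = K @ alpha, where K is an (m+u) x r matrix built from the unlabeled full
# sample (e.g. an inverse graph Laplacian) and alpha encodes the labels:
# alpha_i = y_i for training points, 0 for test points.

def uld_predict(K, alpha):
    """h = K @ alpha, written out for plain lists of lists."""
    return [sum(K_ij * a_j for K_ij, a_j in zip(row, alpha)) for row in K]

# Full sample of m+u = 4 points; points 0 and 1 are training, 2 and 3 are test.
K = [
    [0.8, 0.1, 0.1, 0.0],
    [0.1, 0.8, 0.0, 0.1],
    [0.6, 0.2, 0.2, 0.0],
    [0.2, 0.6, 0.0, 0.2],
]
labels = {0: +1, 1: -1}                       # revealed training labels
alpha = [labels.get(i, 0) for i in range(4)]  # alpha = [+1, -1, 0, 0]
h = uld_predict(K, alpha)
print(h)  # soft scores: sign(h_i) is the predicted label of point i
```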

Page 12: Bounding the Rademacher complexity

Hypothesis space $H_A$: the set of all $h$ obtained by operating the transductive algorithm $A$ on all possible partitions $(S_m, X_u)$.

Notation:
- $h = K\alpha$; $T$ - the set of $\alpha$'s generated by $A$; $\mu = \sup_{\alpha\in T}\{\|\alpha\|_2\}$.
- $\Omega = \{\omega_i\}_{i=1}^{r}$ - all singular values of $K$.

Lemma 2: $R_{m+u}(H_A) \le \mu \sqrt{\frac{2}{mu} \sum_{i=1}^{r} \omega_i^2}$.

Lemma 2 justifies the spectral transformations performed to improve the performance of transductive algorithms ([Chapelle et al. '02], [Joachims '03], [Zhang and Ando '05]).
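Evaluating the Lemma 2 bound is a one-liner once the singular values are known; the spectrum and $\mu$ below are made-up toy numbers.

```python
# Evaluating the Lemma 2 bound:
#   R_{m+u}(H_A) <= mu * sqrt( (2/(m*u)) * sum_i omega_i^2 )
# from the singular values of K. The bound only sees the sum of *squared*
# singular values, which is why shrinking the large ones (spectral
# transformations) directly tightens it.
import math

def lemma2_bound(mu, singular_values, m, u):
    return mu * math.sqrt(2.0 / (m * u) * sum(w * w for w in singular_values))

omegas = [4.0, 2.0, 1.0, 0.5]  # toy spectrum of K
print(round(lemma2_bound(mu=3.0, singular_values=omegas, m=20, u=80), 4))  # -> 0.4889
```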

Page 13: Bounds for graph-based algorithms

Consistency method [Zhou, Bousquet, Lal, Weston, Scholkopf '03]:
$$R_{m+u}(H_A) \le \sqrt{\frac{2}{u} \sum_{i=1}^{m+u} \omega_i^2},$$
where $\{\omega_i\}_{i=1}^{m+u}$ are the singular values of $K$.

Similar bounds hold for the algorithms of [Joachims '03], [Belkin et al. '04], etc.
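This form follows from Lemma 2: per the ULD example on page 11, the consistency method's $\alpha$ has exactly $m$ entries equal to $\pm 1$ and zeros elsewhere, so $\|\alpha\|_2 = \sqrt{m}$, and plugging $\mu = \sqrt{m}$ into Lemma 2 gives $\sqrt{m}\cdot\sqrt{\tfrac{2}{mu}\sum_i \omega_i^2} = \sqrt{\tfrac{2}{u}\sum_i \omega_i^2}$. The sketch below checks this algebra numerically on a made-up toy spectrum.

```python
# Numerical check that Lemma 2 with mu = sqrt(m) reduces to the page-13 form
# sqrt( (2/u) * sum omega_i^2 ) for the consistency method. Toy spectrum only.
import math

def lemma2_bound(mu, singular_values, m, u):
    return mu * math.sqrt(2.0 / (m * u) * sum(w * w for w in singular_values))

def consistency_bound(singular_values, u):
    return math.sqrt(2.0 / u * sum(w * w for w in singular_values))

omegas = [3.0, 1.5, 0.5]
m, u = 10, 40
assert abs(lemma2_bound(math.sqrt(m), omegas, m, u) - consistency_bound(omegas, u)) < 1e-12
print("bounds agree:", round(consistency_bound(omegas, u), 4))
```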

Page 14: Topics not covered

- Bounding the Rademacher complexity when $K$ is a kernel matrix.
- For some algorithms: a data-dependent method of computing probabilistic upper and lower bounds on the Rademacher complexity.
- A risk bound for transductive mixtures.

Page 15: Directions for future research

- Tighten the risk bound to allow effective model selection: a bound depending on the 0/1 empirical error; use of variance information to obtain a better convergence rate.
- Local transductive Rademacher complexity.
- Clever data-dependent choice of low-Rademacher hypothesis spaces.

Page 16:

Page 17:

Monte Carlo estimation of transductive Rademacher complexity

Rademacher complexity: $R_{m+u}(H) = \left(\frac{1}{m} + \frac{1}{u}\right) \cdot E_\sigma\{\sup_{h\in H} \sigma\cdot h\}$.

Draw uniformly $n$ vectors of Rademacher variables, $\sigma^{(1)},\ldots,\sigma^{(n)}$.

By the Hoeffding inequality: for any $\delta > 0$, with probability at least $1-\delta$,
$$R_{m+u}(H) \le \left(\frac{1}{m} + \frac{1}{u}\right) \cdot \frac{1}{n}\sum_{i=1}^{n} \sup_{h\in H} \sigma^{(i)}\cdot h + O\left(\sqrt{\tfrac{1}{n}\ln\tfrac{1}{\delta}}\right).$$

How to compute the supremum? For the Consistency Method of [Zhou et al. '03], $\sup_{h\in H} \sigma^{(i)}\cdot h$ can be computed in $O\left((m+u)^2\right)$ time.

A symmetric Hoeffding inequality yields a probabilistic lower bound on the transductive Rademacher complexity.
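The estimator above can be sketched end-to-end for a small finite hypothesis space, using the sparse $\sigma$-distribution $D_s$ from page 8. The hypothesis vectors are made-up toy values, and the supremum is taken by direct maximization over the finite set rather than the algorithm-specific $O((m+u)^2)$ procedure mentioned on the slide; the Hoeffding term $\sqrt{\tfrac{1}{n}\ln\tfrac{1}{\delta}}$ would be added on top for the high-probability upper bound.

```python
# Monte Carlo estimate of the transductive Rademacher complexity for a small
# finite hypothesis space: average sup_{h in H} sigma.h over n sampled sparse
# Rademacher vectors, then scale by (1/m + 1/u).
import random

def sparse_sigma(m, u, rng):
    p = m * u / (m + u) ** 2  # Pr{+1} = Pr{-1} = p, Pr{0} = 1 - 2p
    return [1 if (r := rng.random()) < p else (-1 if r < 2 * p else 0)
            for _ in range(m + u)]

def mc_rademacher(H, m, u, n, rng):
    total = 0.0
    for _ in range(n):
        sigma = sparse_sigma(m, u, rng)
        total += max(sum(s * hi for s, hi in zip(sigma, h)) for h in H)
    return (1.0 / m + 1.0 / u) * total / n

H = [(1, 1, -1, -1), (1, -1, 1, -1)]  # two toy "soft label" vectors
est = mc_rademacher(H, m=2, u=2, n=5000, rng=random.Random(0))
print(round(est, 3))  # close to the exact value 0.75 for this toy H
```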

Page 18: Induction vs. Transduction: differences

- Induction: unknown underlying distribution. Transduction: no unknown distribution; each example has a unique label.
- Induction: test examples are not known, but will be sampled from the same distribution. Transduction: test examples are known.
- Induction: generate a general hypothesis (want generalization!). Transduction: only classify the given examples (no generalization!).
- Induction: independent training examples. Transduction: dependent training and test examples.

Page 19: Justification of spectral transformations

Notation: $h = K\alpha$; $T$ - the set of $\alpha$'s generated by $A$; $\mu = \sup_{\alpha\in T}\{\|\alpha\|_2\}$; $\Omega = \{\omega_i\}_{i=1}^{r}$ - all singular values of $K$.

Lemma 2: $R_{m+u}(H_A) \le \mu \sqrt{\frac{2}{mu} \sum_{i=1}^{r} \omega_i^2}$.

Lemma 2 justifies the spectral transformations performed to improve the performance of transductive algorithms ([Chapelle et al. '02], [Joachims '03], [Zhang and Ando '05]).

