Transductive Rademacher Complexity and its Applications
Ran El-Yaniv and Dmitry Pechyony
Technion – Israel Institute of Technology, Haifa, Israel
24.08.2007
Transcript
Page 1:

Transductive Rademacher Complexity and its Applications

Ran El-Yaniv and Dmitry Pechyony
Technion – Israel Institute of Technology, Haifa, Israel
24.08.2007

Page 2:

Induction vs. Transduction

Inductive learning: a distribution $D$ over labeled examples $(x, y)$ generates the training set $S_m \triangleq \{(x_i, y_i)\}_{i=1}^{m}$; the learning algorithm produces a hypothesis $h$, which is then used to assign labels to new unlabeled examples.

Goal: minimize $\mathbb{E}_{(x,y) \sim D}\{\ell(h(x), y)\}$.

Transductive learning (Vapnik '74, '98): the learning algorithm receives the training set $S_m \triangleq \{(x_i, y_i)\}_{i=1}^{m}$ together with the test set $X_u \triangleq \{x_i\}_{i=m+1}^{m+u}$, and outputs the labels of the test set.

Goal: minimize $L_u \triangleq \mathbb{E}_{(x,y) \in X_u}\{\ell(h(x), y)\}$.
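To make the transductive goal concrete, here is a minimal sketch (my own toy construction, not from the slides) computing $L_u(h)$ as the average 0/1 loss over the fixed test set:

```python
import numpy as np

# Toy setup: m training points and u test points drawn as one full sample.
rng = np.random.default_rng(0)
m, u = 10, 15
X_full = rng.normal(size=(m + u, 2))         # the full sample of m+u points
y_full = (X_full[:, 0] > 0).astype(int)      # true (unknown) labels

# Some labeling h of the u test points (here: a noisy threshold rule).
h = (X_full[m:, 0] + 0.3 * rng.normal(size=u) > 0).astype(int)

# Transductive risk: average 0/1 loss over the fixed test set X_u.
L_u = np.mean(h != y_full[m:])
print(f"L_u(h) = {L_u:.3f}")
```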

Page 3:

Distribution-free Model [Vapnik '74, '98]

[Figure: a cloud of $m+u$ points, each marked X (the full sample).]

Given: a "full sample" of $m+u$ unlabeled examples, each with its true (unknown) label.

Page 4:

Distribution-free Model [Vapnik '74, '98]

[Figure: the same cloud of $m+u$ points, now split into two groups.]

Given: a "full sample" of $m+u$ unlabeled examples, each with its true (unknown) label.
The full sample is partitioned into a training set ($m$ points) and a test set ($u$ points).

Page 5:

Distribution-free Model [Vapnik '74, '98]

[Figure: the same point cloud, with the training points now carrying their revealed labels.]

Given: a "full sample" of $m+u$ unlabeled examples, each with its true (unknown) label.
The full sample is partitioned into a training set ($m$ points) and a test set ($u$ points).
The labels of the training examples are revealed.

Page 6:

Distribution-free Model [Vapnik '74, '98]

[Figure: the same point cloud; the training points remain marked X and the test points are now marked "?".]

Given: a "full sample" of $m+u$ unlabeled examples, each with its true (unknown) label.
The full sample is partitioned into a training set ($m$ points) and a test set ($u$ points).
The labels of the training points are revealed.
Goal: label the test examples.

Page 7:

Rademacher complexity

Induction: hypothesis space $F$ is a set of functions $f : D \to \mathbb{R}$; $X_m$ denotes the training points; $\sigma = \{\sigma_i\}_{i=1}^{m}$ are i.i.d. random variables with $\Pr\{\sigma_i = 1\} = \Pr\{\sigma_i = -1\} = \frac{1}{2}$. The Rademacher complexity is

$R_m(F) = \frac{1}{m}\,\mathbb{E}_{X_m}\mathbb{E}_{\sigma}\left\{ \sup_{f \in F} \sum_{i=1}^{m} \sigma_i f(x_i) \right\}$.

Transduction (version 1): hypothesis space $H$ is a set of vectors $h$, $H \subseteq \mathbb{R}^{m+u}$; $X_{m+u}$ is the full sample with $m$ training and $u$ test points; $\sigma = \{\sigma_i\}_{i=1}^{m+u}$ are distributed as in induction. The Rademacher complexity is

$R_{m+u}(H) = \left(\frac{1}{m} + \frac{1}{u}\right) \cdot \mathbb{E}_{\sigma}\left\{ \sup_{h \in H} \sum_{i=1}^{m+u} \sigma_i h_i \right\}$.
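As an illustration of this version-1 definition (my own sketch; the small finite hypothesis set and the number of draws are arbitrary assumptions), the expectation over $\sigma$ can be estimated by sampling:

```python
import numpy as np

def transductive_rademacher_v1(H, m, u, n_draws=20000, seed=0):
    """Monte Carlo estimate of R_{m+u}(H) = (1/m + 1/u) E_sigma sup_{h in H} sum_i sigma_i h_i
    for a finite H given as a (num_hypotheses, m+u) array; sigma_i = +/-1 with prob. 1/2."""
    rng = np.random.default_rng(seed)
    sigma = rng.choice([-1.0, 1.0], size=(n_draws, m + u))
    sups = (sigma @ H.T).max(axis=1)          # sup over H, one value per draw
    return (1.0 / m + 1.0 / u) * sups.mean()

m, u = 5, 10
H = np.random.default_rng(1).normal(size=(8, m + u))   # 8 arbitrary hypothesis vectors
print(transductive_rademacher_v1(H, m, u))
```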

Page 8:

Transductive Rademacher complexity

Version 1: $X_{m+u}$ is the full sample with $m$ training and $u$ test points; $H$ is the transductive hypothesis space; $\sigma = \{\sigma_i\}_{i=1}^{m+u}$ are i.i.d. random variables distributed by $D_1$: $\Pr\{\sigma_i = 1\} = \Pr\{\sigma_i = -1\} = \frac{1}{2}$. The Rademacher complexity is

$R_{m+u}(H, D_1) = \left(\frac{1}{m} + \frac{1}{u}\right) \cdot \mathbb{E}_{\sigma \sim D_1}\left\{ \sup_{h \in H} \sum_{i=1}^{m+u} \sigma_i h_i \right\}$.

Version 2: a sparse distribution $D_s$ of Rademacher variables:

$\Pr\{\sigma_i = 1\} = \Pr\{\sigma_i = -1\} = \frac{mu}{(m+u)^2}$, $\quad \Pr\{\sigma_i = 0\} = 1 - \frac{2mu}{(m+u)^2}$.

Lemma 1: $R_{m+u}(H, D_s) \le R_{m+u}(H, D_1)$.

We develop risk bounds with $R_{m+u}(H) \triangleq R_{m+u}(H, D_s)$.
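A quick empirical companion to Lemma 1 (my own sketch; the tiny finite $H$ is an assumption): sample $\sigma$ from the sparse $D_s$ and from $D_1$ and compare the two estimates:

```python
import numpy as np

def rademacher_estimate(H, m, u, probs, n_draws=20000, seed=0):
    """Estimate (1/m + 1/u) E_sigma sup_{h in H} sigma . h, where each sigma_i is
    drawn i.i.d. from {+1, 0, -1} with probabilities probs = (p_plus, p_zero, p_minus)."""
    rng = np.random.default_rng(seed)
    sigma = rng.choice([1.0, 0.0, -1.0], p=probs, size=(n_draws, m + u))
    return (1.0 / m + 1.0 / u) * (sigma @ H.T).max(axis=1).mean()

m, u = 5, 10
p = m * u / (m + u) ** 2                      # D_s: P{+1} = P{-1} = mu/(m+u)^2
H = np.random.default_rng(1).normal(size=(8, m + u))
R_D1 = rademacher_estimate(H, m, u, probs=(0.5, 0.0, 0.5))
R_Ds = rademacher_estimate(H, m, u, probs=(p, 1.0 - 2.0 * p, p))
print(f"R(H, D_s) = {R_Ds:.3f} <= R(H, D_1) = {R_D1:.3f}")   # Lemma 1, empirically
```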

Page 9:

Risk bound

Notation: $L_u(h)$ is the 0/1 error of $h$ on the test examples $X_u$; $L_m^{\gamma}(h)$ is the empirical $\gamma$-margin error of $h$ on the training examples $S_m$.

Theorem: For any $\delta > 0$ and $\gamma > 0$, with probability at least $1 - \delta$ over the random partition of the full sample $S_{m+u}$ into $(S_m, X_u)$, for all hypotheses $h \in H$ it holds that

$L_u(h) \le L_m^{\gamma}(h) + \frac{1}{\gamma} R_{m+u}(H) + O\left( \sqrt{ \left(\frac{1}{m} + \frac{1}{u}\right) \ln \frac{1}{\delta} } \right)$.

Proof: based on and inspired by the results of [McDiarmid '89], [Bartlett and Mendelson '02] and [Meir and Zhang '03].

Previous results: [Lanckriet et al. '04] treated the case of $m = u$.
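For intuition about the rate, here is the bound specialized to the balanced case $m = u$ treated by [Lanckriet et al. '04] (a routine substitution on my part, not a claim from the slides):

```latex
% With m = u we have 1/m + 1/u = 2/m, so the theorem reads:
\[
  L_u(h) \;\le\; L_m^{\gamma}(h) \;+\; \frac{1}{\gamma}\, R_{2m}(H)
         \;+\; O\!\left( \sqrt{ \tfrac{2}{m} \ln \tfrac{1}{\delta} } \right),
\]
% i.e. the slack term decays at the familiar O(sqrt(ln(1/delta)/m)) rate.
```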

Page 10:

Inductive vs. Transductive hypothesis spaces

Induction: to use the risk bounds, the hypothesis space must be defined before observing the training set.

Transduction: the hypothesis space can be defined after observing the full sample $X_{m+u}$, but before observing the actual partition $(S_m, X_u)$.

Conclusion: transduction allows choosing a data-dependent hypothesis space; for example, it can be optimized to have low Rademacher complexity. This cannot be done in induction!

Page 11:

Another view on transductive algorithms

Unlabeled-Labeled Decomposition (ULD): the learner computes an $(m+u) \times r$ matrix $K$ from the full sample $X_{m+u}$, and an $r \times 1$ vector $\alpha$ from the labeled partition $(S_m, X_u)$; the output hypothesis is $h = K\alpha$.

Example: $K$ is the inverse of the graph Laplacian; $\alpha_i = y_i$ if $x_i \in S_m$, and $\alpha_i = 0$ otherwise.
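A minimal sketch of this ULD example (the similarity graph, and the pseudo-inverse in place of the inverse, since an unnormalized Laplacian is singular, are my assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)
m, u = 10, 20
X = rng.normal(size=(m + u, 2))              # full sample X_{m+u}
y_train = np.sign(X[:m, 0])                  # revealed training labels (+/-1)

# Similarity graph over the full sample and its (unnormalized) Laplacian.
D2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
W = np.exp(-D2)                              # Gaussian edge weights
np.fill_diagonal(W, 0.0)
L = np.diag(W.sum(axis=1)) - W

# ULD: K from the unlabeled data alone, alpha from the labeled partition.
K = np.linalg.pinv(L)                        # pseudo-inverse: L is singular
alpha = np.zeros(m + u)
alpha[:m] = y_train                          # alpha_i = y_i on S_m, 0 on X_u

h = K @ alpha                                # h = K alpha: soft labels for all points
test_predictions = np.sign(h[m:])
```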

Page 12:

Bounding the Rademacher complexity

Hypothesis space $H_A$: the set of all $h = K\alpha$ obtained by operating the transductive algorithm $A$ on all possible partitions $(S_m, X_u)$.

Notation: $T$ is the set of $\alpha$'s generated by $A$; $\mu = \sup_{\alpha \in T}\{\|\alpha\|_2\}$; $\{\omega_i\}_{i=1}^{r}$ are all the singular values of $K$.

Lemma 2: $R_{m+u}(H_A) \le \mu \sqrt{ \frac{2}{mu} \sum_{i=1}^{r} \omega_i^2 }$.

Lemma 2 justifies the spectral transformations performed to improve the performance of transductive algorithms ([Chapelle et al. '02], [Joachims '03], [Zhang and Ando '05]).
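The bound of Lemma 2 is directly computable from the data; a short sketch (the example $K$ and the choice of $\mu$ are assumptions):

```python
import numpy as np

def lemma2_bound(K, mu, m, u):
    """Lemma 2 upper bound: mu * sqrt(2/(m*u) * sum_i omega_i^2), where the
    omega_i are the singular values of K; their squared sum equals ||K||_F^2."""
    omega = np.linalg.svd(K, compute_uv=False)
    return mu * np.sqrt(2.0 / (m * u) * np.sum(omega ** 2))

m, u = 10, 20
K = np.random.default_rng(0).normal(size=(m + u, 5))  # some (m+u) x r matrix
mu = np.sqrt(m)   # e.g. alpha a +/-1 label vector supported on the m training points
print(lemma2_bound(K, mu, m, u))
```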

Page 13:

Bounds for graph-based algorithms

Consistency method [Zhou, Bousquet, Lal, Weston, Schölkopf '03]:

$R_{m+u}(H_A) \le \sqrt{ \frac{2}{u} \sum_{i=1}^{m+u} \omega_i^2 }$,

where $\{\omega_i\}_{i=1}^{m+u}$ are the singular values of $K$.

Similar bounds hold for the algorithms of [Joachims '03], [Belkin et al. '04], etc.
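For this specialization the computation is the same Frobenius-norm calculation; a brief sketch under the same assumptions as above:

```python
import numpy as np

def consistency_method_bound(K, u):
    """Bound for the Consistency Method: sqrt(2/u * sum_i omega_i^2),
    with omega_i the singular values of the (m+u) x (m+u) matrix K."""
    omega = np.linalg.svd(K, compute_uv=False)
    return np.sqrt(2.0 / u * np.sum(omega ** 2))
```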

Page 14:

Topics not covered

- Bounding the Rademacher complexity when $K$ is a kernel matrix.
- For some algorithms: a data-dependent method of computing probabilistic upper and lower bounds on the Rademacher complexity.
- A risk bound for transductive mixtures.

Page 15:

Directions for future research

- Tighten the risk bound to allow effective model selection: a bound depending on the 0/1 empirical error; use of variance information to obtain a better convergence rate.
- Local transductive Rademacher complexity.
- Clever data-dependent choice of low-Rademacher hypothesis spaces.

Page 17:

Monte Carlo estimation of transductive Rademacher complexity

Rademacher complexity: $R_{m+u}(H) = \left(\frac{1}{m} + \frac{1}{u}\right) \cdot \mathbb{E}_{\sigma}\left\{ \sup_{h \in H} \sigma \cdot h \right\}$.

Draw uniformly $n$ vectors of Rademacher variables, $\sigma^{(1)}, \ldots, \sigma^{(n)}$.

By Hoeffding's inequality: for any $\delta > 0$, with probability at least $1 - \delta$,

$R_{m+u}(H) \le \left(\frac{1}{m} + \frac{1}{u}\right) \cdot \frac{1}{n} \sum_{i=1}^{n} \sup_{h \in H} \sigma^{(i)} \cdot h + O\left( \sqrt{ \frac{1}{n} \ln \frac{1}{\delta} } \right)$.

How to compute the supremum? For the Consistency Method of [Zhou et al. '03], $\sup_{h \in H} \sigma^{(i)} \cdot h$ can be computed in $O\left((m+u)^2\right)$ time.

The symmetric Hoeffding inequality yields a probabilistic lower bound on the transductive Rademacher complexity.
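A sketch of this estimator for a finite $H$ (the brute-force supremum and the explicit Hoeffding width, using the crude range bound $|\sigma \cdot h| \le \|h\|_1$, are my assumptions; the slide's $O(\cdot)$ hides the constant):

```python
import numpy as np

def mc_rademacher_upper(H, m, u, n=5000, delta=0.05, seed=0):
    """Monte Carlo upper bound on R_{m+u}(H) for a finite H (rows of an array):
    average sup_h sigma.h over n Rademacher draws, then add a Hoeffding term.
    Each draw lies in [-B, B] with B = (1/m + 1/u) * max_h ||h||_1."""
    rng = np.random.default_rng(seed)
    c = 1.0 / m + 1.0 / u
    sigma = rng.choice([-1.0, 1.0], size=(n, m + u))
    sups = c * (sigma @ H.T).max(axis=1)               # one supremum per draw
    B = c * np.abs(H).sum(axis=1).max()
    return sups.mean() + B * np.sqrt(2.0 * np.log(1.0 / delta) / n)

m, u = 5, 10
H = np.random.default_rng(1).normal(size=(8, m + u))
print(mc_rademacher_upper(H, m, u))
```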

Page 18:

Induction vs. Transduction: differences

- Induction: an unknown underlying distribution. Transduction: no unknown distribution; each example has a unique label.
- Induction: test examples are not known, but will be sampled from the same distribution. Transduction: test examples are known.
- Induction: generate a general hypothesis; we want generalization! Transduction: only classify the given examples; no generalization!
- Induction: independent training examples. Transduction: dependent training and test examples.

Page 19:

Justification of spectral transformations

Notation: $T$ is the set of $\alpha$'s generated by $A$; $\mu = \sup_{\alpha \in T}\{\|\alpha\|_2\}$; $\{\omega_i\}_{i=1}^{r}$ are all the singular values of $K$.

Lemma 2: $R_{m+u}(H_A) \le \mu \sqrt{ \frac{2}{mu} \sum_{i=1}^{r} \omega_i^2 }$.

Lemma 2 justifies the spectral transformations performed to improve the performance of transductive algorithms ([Chapelle et al. '02], [Joachims '03], [Zhang and Ando '05]).