A two-step method to incorporate task features for large output spaces. Michiel Stock (1), Tapio Pahikkala (2), Antti Airola (2), Bernard De Baets (1) & Willem Waegeman (1). (1) KERMIT, Department of Mathematical Modelling, Statistics and Bioinformatics, Ghent University; (2) Department of Computer Science, University of Turku. NIPS: extreme classification workshop, December 12, 2015. Twitter: @michielstock. Outline: Motivation; Introductory example; Relational learning; Other applications; Pairwise learning methods; Kronecker kernel ridge regression; Two-step kernel ridge regression; Computational aspects; Cross-validation; Exact online learning; Take home messages.

A two-step method to incorporate task features for large output spaces

Apr 15, 2017

Michiel Stock
Transcript
Page 1: A two-step method to incorporate task features for large output spaces

A two-step method for large output spaces

Michiel Stock
twitter: @michielstock

Motivation
Introductory example
Relational learning
Other applications
Pairwise learning methods
Kronecker kernel ridge regression
Two-step kernel ridge regression
Computational aspects
Cross-validation
Exact online learning
Take home messages

KERMIT

A two-step method to incorporate task features for large output spaces

Michiel Stock (1), Tapio Pahikkala (2), Antti Airola (2), Bernard De Baets (1) & Willem Waegeman (1)

(1) KERMIT, Department of Mathematical Modelling, Statistics and Bioinformatics, Ghent University

(2) Department of Computer Science, University of Turku

NIPS: extreme classification workshop, December 12, 2015

Page 2: A two-step method to incorporate task features for large output spaces

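What will we read next?

Observed ratings, one row per book and one column per reader ("?" = no rating):

Alice   Bob   Cedric   Daphne
5       ?     4        ?
1       ?     ?        4
?       4     ?        ?
2       ?     1        ?
4       ?     3        ?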

Page 3: A two-step method to incorporate task features for large output spaces

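What will we read next?

Observed ratings, one row per book and one column per reader ("?" = no rating), with binary genre features for each book:

Alice   Bob   Cedric   Daphne   |  Genre
5       ?     4        ?        |  1 1 0 1
1       ?     ?        4        |  0 0 1 0
?       4     ?        ?        |  1 1 0 0
2       ?     1        ?        |  0 0 1 0
4       ?     3        ?        |  0 1 0 1

Social graph (figure linking the readers)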

Page 4: A two-step method to incorporate task features for large output spaces

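What will we read next?

Ratings with the missing entries filled in (whole numbers are the observed ratings, decimals are the filled-in values), with binary genre features for each book:

Alice   Bob   Cedric   Daphne   |  Genre
5       2.3   4        3.1      |  1 1 0 1
1       4.5   1.3      4        |  0 0 1 0
3.9     4     3.8      0.8      |  1 1 0 0
2       5.2   1        4.5      |  0 0 1 0
4       2.5   3        3.6      |  0 1 0 1

Social graph (figure linking the readers)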

Page 5: A two-step method to incorporate task features for large output spaces

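What will we read next?

Ratings with the missing entries filled in, plus a new book (last row) described only by its genre features:

Alice   Bob   Cedric   Daphne   |  Genre
5       2.3   4        3.1      |  1 1 0 1
1       4.5   1.3      4        |  0 0 1 0
3.9     4     3.8      0.8      |  1 1 0 0
2       5.2   1        4.5      |  0 0 1 0
4       2.5   3        3.6      |  0 1 0 1
4.8     1.1   3.7      2.3      |  1 1 0 1   (new book)

Social graph (figure linking the readers)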

Page 6: A two-step method to incorporate task features for large output spaces

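What will we read next?

Ratings with the missing entries filled in, plus a new book (last row) and a new reader, Eric (last rating column; "?" = not shown on this slide):

Alice   Bob   Cedric   Daphne   Eric   |  Genre
5       2.3   4        3.1      2.3    |  1 1 0 1
1       4.5   1.3      4        4.0    |  0 0 1 0
3.9     4     3.8      0.8      1.7    |  1 1 0 0
2       5.2   1        4.5      4.8    |  0 0 1 0
4       2.5   3        3.6      2.9    |  0 1 0 1
4.8     1.1   3.7      2.3      ?      |  1 1 0 1   (new book)

Social graph (figure linking the readers)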

Page 7: A two-step method to incorporate task features for large output spaces

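What will we read next?

Ratings with the missing entries filled in, plus a new book (last row), a new reader, Eric (last rating column), and the value for the new book and the new reader together (2.4):

Alice   Bob   Cedric   Daphne   Eric   |  Genre
5       2.3   4        3.1      2.3    |  1 1 0 1
1       4.5   1.3      4        4.0    |  0 0 1 0
3.9     4     3.8      0.8      1.7    |  1 1 0 0
2       5.2   1        4.5      4.8    |  0 0 1 0
4       2.5   3        3.6      2.9    |  0 1 0 1
4.8     1.1   3.7      2.3      2.4    |  1 1 0 1   (new book)

Social graph (figure linking the readers)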

Page 8: A two-step method to incorporate task features for large output spaces

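Learning relations

Figure: the label matrix is divided into a Training block and four prediction settings (Setting A, Setting B, Setting C, Setting D). Axis labels: in-sample instances, out-of-sample instances, in-sample tasks, out-of-sample tasks.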

Page 9: A two-step method to incorporate task features for large output spaces

Other cool applications: drug design

Predicting interaction between proteins and small compounds

Page 10: A two-step method to incorporate task features for large output spaces

Other cool applications: social network analysis

Predicting links between people

Page 11: A two-step method to incorporate task features for large output spaces

Other cool applications: food pairing

Finding ingredients that pair well

Page 12: A two-step method to incorporate task features for large output spaces

Learning with pairwise feature representations

Figure labels: features of books, features of readers.

d: instance (e.g. book)

φ(d): instance features (e.g. genre)

t: task (e.g. reader)

ψ(t): task features (e.g. social network)

Pairwise prediction function: f(d, t) = wᵀ(φ(d) ⊗ ψ(t))

Page 13: A two-step method to incorporate task features for large output spaces

Learning with pairwise feature representations

Figure: features of books ⊗ features of readers = pairwise features.

d: instance (e.g. book)

φ(d): instance features (e.g. genre)

t: task (e.g. reader)

ψ(t): task features (e.g. social network)

Pairwise prediction function: f(d, t) = wᵀ(φ(d) ⊗ ψ(t))
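
As a small numerical check of the pairwise prediction function, the sketch below evaluates wᵀ(φ(d) ⊗ ψ(t)) explicitly and compares it with the bilinear form obtained by reshaping w into a matrix. It uses Python with NumPy and random toy vectors; the dimensions and every variable name are illustrative assumptions, not values from the slides.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy dimensions (assumed): 4 instance features, 3 task features.
phi_d = rng.normal(size=4)    # instance features φ(d), e.g. genre indicators
psi_t = rng.normal(size=3)    # task features ψ(t), e.g. social-network features
w = rng.normal(size=4 * 3)    # one weight per (instance feature, task feature) pair

# Pairwise prediction with the explicit Kronecker product feature vector.
f_kron = w @ np.kron(phi_d, psi_t)

# np.kron of two vectors is ordered row-major, so reshaping w to (4, 3)
# gives a matrix W with f(d, t) = φ(d)ᵀ W ψ(t).
W = w.reshape(4, 3)
f_bilinear = phi_d @ W @ psi_t

print(f_kron, f_bilinear)  # the two values agree up to rounding
```

The bilinear form never builds the Kronecker product, which matters once the number of pairwise features (here 4 × 3) becomes large.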

Page 15: A two-step method to incorporate task features for large output spaces

Learning relations in two steps

Figure labels: in-sample tasks, out-of-sample tasks, in-sample instances, out-of-sample instances, instance KRR, task KRR, virtual instances.

1. Build a ridge regression model to generalize to new instances

2. Build a ridge regression model to generalize to new tasks
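
The two steps can be sketched in kernel form as follows. This is a minimal illustration in Python with NumPy, assuming linear kernels and random toy data; the function name `two_step_krr_predict` and all variable names are hypothetical, and the slides do not specify an implementation.

```python
import numpy as np

def two_step_krr_predict(K, G, Y, k_new, g_new, lam_d, lam_t):
    """Predict the label of a new instance for a new task.

    K: (n, n) instance kernel, G: (m, m) task kernel, Y: (n, m) labels,
    k_new: (n,) kernel values between the new instance and training instances,
    g_new: (m,) kernel values between the new task and training tasks.
    """
    n, m = Y.shape
    # Step 1 (instance KRR): predict the new instance for every in-sample
    # task, giving one row of "virtual" labels.
    virtual = k_new @ np.linalg.solve(K + lam_d * np.eye(n), Y)
    # Step 2 (task KRR): regress on the virtual labels to reach the new task.
    return g_new @ np.linalg.solve(G + lam_t * np.eye(m), virtual)

rng = np.random.default_rng(1)
Phi = rng.normal(size=(5, 4))   # 5 training instances, 4 features (assumed)
Psi = rng.normal(size=(4, 3))   # 4 training tasks, 3 features (assumed)
Y = rng.normal(size=(5, 4))     # labels for every (instance, task) pair
phi_new = rng.normal(size=4)    # features of an unseen instance
psi_new = rng.normal(size=3)    # features of an unseen task

K, G = Phi @ Phi.T, Psi @ Psi.T  # linear kernels
pred = two_step_krr_predict(K, G, Y, Phi @ phi_new, Psi @ psi_new,
                            lam_d=1.0, lam_t=0.1)
print(pred)
```

Each step is an ordinary kernel ridge regression, so the instance step and the task step each carry their own regularization parameter.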

Page 16: A two-step method to incorporate task features for large output spaces

The two-step ridge regression

Prediction function:

f(d, t) = φ(d)ᵀ W ψ(t)

Parameters can be found by solving:

ΦᵀYΨ = (ΦᵀΦ + λ_d I) W (ΨᵀΨ + λ_t I)

Two hyperparameters: λ_d and λ_t!
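
Because both bracketed matrices are invertible for positive regularization, the equation can be solved for W with two linear solves. The sketch below does this in Python with NumPy on random toy data and checks the result against the equation; the helper name `fit_two_step_ridge` and the data shapes are assumptions made for illustration.

```python
import numpy as np

def fit_two_step_ridge(Phi, Psi, Y, lam_d, lam_t):
    """Solve ΦᵀYΨ = (ΦᵀΦ + λ_d I) W (ΨᵀΨ + λ_t I) for W."""
    p, q = Phi.shape[1], Psi.shape[1]
    A_d = Phi.T @ Phi + lam_d * np.eye(p)   # instance-side matrix
    A_t = Psi.T @ Psi + lam_t * np.eye(q)   # task-side matrix
    rhs = Phi.T @ Y @ Psi
    left = np.linalg.solve(A_d, rhs)        # A_d⁻¹ ΦᵀYΨ
    # Right-multiply by A_t⁻¹; A_t is symmetric, so solve on the transpose.
    return np.linalg.solve(A_t, left.T).T

rng = np.random.default_rng(2)
Phi = rng.normal(size=(6, 4))   # 6 instances, 4 instance features (assumed)
Psi = rng.normal(size=(5, 3))   # 5 tasks, 3 task features (assumed)
Y = rng.normal(size=(6, 5))     # label matrix
lam_d, lam_t = 0.5, 2.0

W = fit_two_step_ridge(Phi, Psi, Y, lam_d, lam_t)

# Check that W satisfies the equation from the slide.
lhs = Phi.T @ Y @ Psi
rhs = (Phi.T @ Phi + lam_d * np.eye(4)) @ W @ (Psi.T @ Psi + lam_t * np.eye(3))
print(np.allclose(lhs, rhs))

# Prediction for a new instance and a new task: f(d, t) = φ(d)ᵀ W ψ(t).
f_new = rng.normal(size=4) @ W @ rng.normal(size=3)
```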

Page 18: A two-step method to incorporate task features for large output spaces

Four ways of cross validation

Figure: one panel per setting (Setting A, Setting B, Setting C, Setting D). Legend: Train, Test, Discarded.

Analytic shortcuts can be derived to perform LOOCV for each setting!

Tuning λ_d and λ_t is essentially free!
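
To give a flavour of such a shortcut, the sketch below handles one case only: leaving out a whole instance (one row of the label matrix) and predicting it for the in-sample tasks. The slides do not show their derivations, so this uses the textbook hat-matrix identity for ridge regression applied to the instance step, with linear kernels and random toy data; all names are illustrative assumptions, and no claim is made about which of the four settings it corresponds to.

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 6, 4
Phi = rng.normal(size=(n, 3))   # instance features (assumed)
Psi = rng.normal(size=(m, 2))   # task features (assumed)
Y = rng.normal(size=(n, m))     # label matrix
lam_d, lam_t = 0.5, 2.0

K, G = Phi @ Phi.T, Psi @ Psi.T
H_k = K @ np.linalg.inv(K + lam_d * np.eye(n))   # hat matrix, instance step
H_g = G @ np.linalg.inv(G + lam_t * np.eye(m))   # hat matrix, task step

Z = Y @ H_g                  # labels after the task step
F = H_k @ Z                  # in-sample two-step predictions
h = np.diag(H_k)[:, None]    # leverage of each instance

# Shortcut: leave-one-instance-out predictions from a single fit.
F_loo = (F - h * Z) / (1.0 - h)

# Brute force for comparison: refit the instance step without row i.
F_brute = np.empty_like(Y)
for i in range(n):
    keep = np.arange(n) != i
    alpha = np.linalg.solve(K[np.ix_(keep, keep)] + lam_d * np.eye(n - 1),
                            Y[keep] @ H_g)
    F_brute[i] = K[i, keep] @ alpha

print(np.allclose(F_loo, F_brute))
```

The shortcut costs one fit instead of n refits, which is why repeating it over a grid of regularization values stays cheap.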

Page 20: A two-step method to incorporate task features for large output spaces

Effect of regularization for the four settings

Data: protein-ligand interactions. Evaluation by AUC (lighter = better performance).

Figure: four contour plots of AUC on the nr data, one per setting (CV for Setting A, CV for Setting B, CV for Setting C, CV for Setting D), with "lambda drugs" on the horizontal axis and "lambda targets" on the vertical axis. Labelled contour levels run from 0.550 to 0.850 for Setting A, from 0.560 to 0.760 for Setting B, from 0.790 to 0.850 for Setting C and from 0.600 to 0.725 for Setting D. The shared colour scale runs from 0.50 to 1.00.

Clear difference between the four settings and between λ_d and λ_t!

Page 21: A two-step method to incorporate task features for large output spaces

Learning with mini-batches

Figure labels: initial training data, new training instances, even more training instances, new training tasks. Axes: tasks, instances.

Exact updating of the parameters when new training instances and/or tasks become available

scalable for “Big Data” applications

updating the model in a dynamic environment
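
One way to see why an exact update is possible: in the primal form, the parameters depend on the instances only through the sums ΦᵀΦ and ΦᵀYΨ, and both can be accumulated batch by batch. The sketch below covers new instances only (the task set is fixed) and is an assumption-laden illustration in Python with NumPy, not the update rule used by the authors; the class `OnlineTwoStepRidge` and its methods are hypothetical names.

```python
import numpy as np

class OnlineTwoStepRidge:
    """Accumulates ΦᵀΦ and ΦᵀYΨ so that W stays exact after each batch."""

    def __init__(self, Psi, n_instance_features, lam_d, lam_t):
        q = Psi.shape[1]
        self.Psi = Psi
        self.A_t = Psi.T @ Psi + lam_t * np.eye(q)       # fixed task side
        self.A_d = lam_d * np.eye(n_instance_features)   # grows with the data
        self.B = np.zeros((n_instance_features, q))      # running ΦᵀYΨ

    def partial_fit(self, Phi_batch, Y_batch):
        # New instances only add terms to the two running sums.
        self.A_d += Phi_batch.T @ Phi_batch
        self.B += Phi_batch.T @ Y_batch @ self.Psi
        return self

    @property
    def W(self):
        left = np.linalg.solve(self.A_d, self.B)
        return np.linalg.solve(self.A_t, left.T).T       # A_t is symmetric

rng = np.random.default_rng(3)
Phi = rng.normal(size=(9, 3))   # 9 instances arriving in 3 batches (assumed)
Psi = rng.normal(size=(4, 2))   # 4 tasks, fixed throughout
Y = rng.normal(size=(9, 4))
lam_d, lam_t = 1.0, 0.5

model = OnlineTwoStepRidge(Psi, n_instance_features=3, lam_d=lam_d, lam_t=lam_t)
for start in range(0, 9, 3):
    model.partial_fit(Phi[start:start + 3], Y[start:start + 3])

# Fitting on all data at once gives the same parameters.
W_batch = (np.linalg.inv(Phi.T @ Phi + lam_d * np.eye(3)) @ Phi.T @ Y @ Psi
           @ np.linalg.inv(Psi.T @ Psi + lam_t * np.eye(2)))
print(np.allclose(model.W, W_batch))
```

Memory use depends on the number of features, not on the number of instances seen so far, so old batches can be discarded after they are absorbed.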

Page 23: A two-step method to incorporate task features for large output spaces

Exact online learning for hierarchical text classification

Hierarchical text classification (> 12,000 labels): from 5,000 to 350,000 instances in steps of 1,000 instances.

Page 24: A two-step method to incorporate task features for large output spaces

Why two-step ridge regression?

Zero-shot learning, transfer learning, multi-task learning... in one line of code

Theoretically well founded

Allows for nifty computational tricks

‘free’ tuning for the hyperparameters
‘free’ LOOCV for all four settings!
closed-form solution for updating with mini-batches

Page 27: A two-step method to incorporate task features for large output spaces

A two-step method to incorporate task features for large output spaces

Michiel Stock (1), Tapio Pahikkala (2), Antti Airola (2), Bernard De Baets (1) & Willem Waegeman (1)

(1) KERMIT, Department of Mathematical Modelling, Statistics and Bioinformatics, Ghent University

(2) Department of Computer Science, University of Turku

NIPS: extreme classification workshop, December 12, 2015