Page 1
A two-step method for large output spaces
Michiel Stock
twitter: @michielstock
Motivation
Introductory example
Relational learning
Other applications
Pairwise learning methods
Kronecker kernel ridge regression
Two-step kernel ridge regression
Computational aspects
Cross-validation
Exact online learning
Take home messages
KERMIT
A two-step method to incorporate task features for large output spaces
Michiel Stock¹, Tapio Pahikkala², Antti Airola², Bernard De Baets¹ & Willem Waegeman¹
¹KERMIT, Department of Mathematical Modelling, Statistics and Bioinformatics, Ghent University
²Department of Computer Science, University of Turku
NIPS: extreme classification workshop, December 12, 2015
Page 2
What will we read next?
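[Figure: book-by-reader rating matrix; rows are books, columns are readers, "?" marks a missing rating]
Alice  Bob  Cedric  Daphne
5      ?    4       ?
1      ?    ?       4
?      4    ?       ?
2      ?    1       ?
4      ?    3       ?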
Page 3
What will we read next?
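[Figure: the rating matrix with side information: a social graph over the readers and binary genre features for the books; "?" marks a missing rating]
Alice  Bob  Cedric  Daphne  |  Genre
5      ?    4       ?       |  1 1 0 1
1      ?    ?       4       |  0 0 1 0
?      4    ?       ?       |  1 1 0 0
2      ?    1       ?       |  0 0 1 0
4      ?    3       ?       |  0 1 0 1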
Page 4
What will we read next?
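[Figure: the rating matrix with the missing entries filled in; whole numbers are observed ratings, decimals are predictions. Side information: a social graph over the readers and binary genre features for the books]
Alice  Bob  Cedric  Daphne  |  Genre
5      2.3  4       3.1     |  1 1 0 1
1      4.5  1.3     4       |  0 0 1 0
3.9    4    3.8     0.8     |  1 1 0 0
2      5.2  1       4.5     |  0 0 1 0
4      2.5  3       3.6     |  0 1 0 1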
Page 5
What will we read next?
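[Figure: the filled-in rating matrix plus one extra row, a book without any ratings whose scores are predicted from its genre features. Whole numbers are observed ratings, decimals are predictions. Side information: a social graph over the readers and binary genre features for the books]
          Alice  Bob  Cedric  Daphne  |  Genre
          5      2.3  4       3.1     |  1 1 0 1
          1      4.5  1.3     4       |  0 0 1 0
          3.9    4    3.8     0.8     |  1 1 0 0
          2      5.2  1       4.5     |  0 0 1 0
          4      2.5  3       3.6     |  0 1 0 1
new book  4.8    1.1  3.7     2.3     |  1 1 0 1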
Page 6
What will we read next?
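[Figure: the filled-in rating matrix plus an extra row (a book without ratings) and an extra column (a new reader, Eric). Whole numbers are observed ratings, decimals are predictions. Side information: a social graph over the readers and binary genre features for the books]
          Alice  Bob  Cedric  Daphne  Eric  |  Genre
          5      2.3  4       3.1     2.3   |  1 1 0 1
          1      4.5  1.3     4       4.0   |  0 0 1 0
          3.9    4    3.8     0.8     1.7   |  1 1 0 0
          2      5.2  1       4.5     4.8   |  0 0 1 0
          4      2.5  3       3.6     2.9   |  0 1 0 1
new book  4.8    1.1  3.7     2.3           |  1 1 0 1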
Page 7
What will we read next?
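[Figure: the filled-in rating matrix with an extra row (a book without ratings), an extra column (a new reader, Eric), and a prediction for the pair of new book and new reader. Whole numbers are observed ratings, decimals are predictions. Side information: a social graph over the readers and binary genre features for the books]
          Alice  Bob  Cedric  Daphne  Eric  |  Genre
          5      2.3  4       3.1     2.3   |  1 1 0 1
          1      4.5  1.3     4       4.0   |  0 0 1 0
          3.9    4    3.8     0.8     1.7   |  1 1 0 0
          2      5.2  1       4.5     4.8   |  0 0 1 0
          4      2.5  3       3.6     2.9   |  0 1 0 1
new book  4.8    1.1  3.7     2.3     2.4   |  1 1 0 1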
Page 8
Learning relations
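[Figure: the instance-by-task label matrix, divided along one axis into in-sample and out-of-sample instances and along the other into in-sample and out-of-sample tasks. The in-sample block holds the training data; the regions to predict are labelled Setting A, Setting B, Setting C and Setting D.]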
Page 9
Other cool applications: drug design
Predicting interaction between proteins and small compounds
Page 10
Other cool applications: social network analysis
Predicting links between people
Page 11
Other cool applications: food pairing
Finding ingredients that pair well
Page 12
Learning with pairwise feature representations
[Figure: the book features and the reader features, combined into pairwise features with the Kronecker product ⊗]
d: instance (e.g. book)
φ(d): instance features (e.g. genre)
t: task (e.g. reader)
ψ(t): task features (e.g. social network)
Pairwise prediction function: f(d, t) = wᵀ(φ(d) ⊗ ψ(t))
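The pairwise prediction function can be sketched in a few lines of NumPy. The feature dimensions `p` and `q` and the random vectors below are illustrative assumptions, not values from the talk. The sketch also shows that reshaping w into a p × q matrix gives the same prediction without forming the Kronecker product.

```python
import numpy as np

rng = np.random.default_rng(0)

p, q = 4, 3                     # assumed feature dimensions (illustrative only)
phi_d = rng.normal(size=p)      # phi(d): instance features (e.g. genre)
psi_t = rng.normal(size=q)      # psi(t): task features (e.g. social network)
w = rng.normal(size=p * q)      # one weight per (instance feature, task feature) pair


def pairwise_predict(w, phi_d, psi_t):
    """f(d, t) = w^T (phi(d) kron psi(t))."""
    return w @ np.kron(phi_d, psi_t)


f_kron = pairwise_predict(w, phi_d, psi_t)

# The same value as a bilinear form: reshape w row-major into a p x q matrix W,
# so that W[i, j] multiplies phi_d[i] * psi_t[j].
W = w.reshape(p, q)
f_bilinear = phi_d @ W @ psi_t
```

The bilinear form avoids building the length p·q pairwise feature vector for every pair.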
Page 15
Learning relations in two steps
[Figure: the label matrix split into in-sample and out-of-sample instances and in-sample and out-of-sample tasks, annotated with "Instance KRR", "Task KRR" and a block of "virtual instances"]
1. Build a ridge regression model to generalize to new instances
2. Build a ridge regression model to generalize to new tasks
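The two steps can be sketched as follows for plain (non-kernel) ridge regression. This is a minimal illustration under assumptions: `Phi` (n × p) holds the instance features, `Psi` (m × q) the task features, `Y` (n × m) the labels, and all data are random placeholders. It is not the authors' implementation.

```python
import numpy as np


def ridge_fit(X, Y, lam):
    """Ridge regression coefficients: solve (X^T X + lam I) B = X^T Y."""
    n_features = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(n_features), X.T @ Y)


rng = np.random.default_rng(1)
n, m, p, q = 30, 20, 5, 4            # assumed sizes (illustrative only)
Phi = rng.normal(size=(n, p))        # features of the in-sample instances
Psi = rng.normal(size=(m, q))        # features of the in-sample tasks
Y = rng.normal(size=(n, m))          # labels for all in-sample pairs
lam_d, lam_t = 1.0, 0.5              # the two regularization parameters

phi_new = rng.normal(size=p)         # an out-of-sample instance
psi_new = rng.normal(size=q)         # an out-of-sample task

# Step 1: instance ridge regression, one output per in-sample task.
# It yields "virtual" labels of the new instance for every in-sample task.
B = ridge_fit(Phi, Y, lam_d)         # shape (p, m)
y_virtual = phi_new @ B              # shape (m,)

# Step 2: task ridge regression on the virtual labels,
# which generalizes the prediction to the new task.
c = ridge_fit(Psi, y_virtual, lam_t) # shape (q,)
f_new = psi_new @ c
```

Because both steps are linear in the labels, chaining them gives a bilinear prediction φ(d)ᵀWψ(t) with a single weight matrix W.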
Page 16
The two-step ridge regression
Prediction function:
f(d, t) = φ(d)ᵀ W ψ(t)
Parameters can be found by solving:
ΦᵀYΨ = (ΦᵀΦ + λd I) W (ΨᵀΨ + λt I)
Two hyperparameters: λd and λt!
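Solving this equation for W takes two linear solves. The sketch below assumes Φ is an n × p instance feature matrix, Ψ an m × q task feature matrix and Y the n × m label matrix. The function name `two_step_weights` and the random data are illustrative, not part of the talk.

```python
import numpy as np


def two_step_weights(Phi, Psi, Y, lam_d, lam_t):
    """Solve Phi^T Y Psi = (Phi^T Phi + lam_d I) W (Psi^T Psi + lam_t I) for W."""
    A = Phi.T @ Phi + lam_d * np.eye(Phi.shape[1])   # p x p, symmetric
    B = Psi.T @ Psi + lam_t * np.eye(Psi.shape[1])   # q x q, symmetric
    C = Phi.T @ Y @ Psi                              # p x q
    left = np.linalg.solve(A, C)                     # A^{-1} C
    # Right-multiplying by B^{-1}: (left B^{-1}) = (B^{-1} left^T)^T since B is symmetric.
    return np.linalg.solve(B, left.T).T


rng = np.random.default_rng(2)
Phi = rng.normal(size=(8, 3))        # 8 training instances, 3 features
Psi = rng.normal(size=(6, 2))        # 6 training tasks, 2 features
Y = rng.normal(size=(8, 6))
lam_d, lam_t = 0.1, 10.0

W = two_step_weights(Phi, Psi, Y, lam_d, lam_t)

# Predictions for 2 new instances and 4 new tasks: F[i, j] = phi_i^T W psi_j.
Phi_new = rng.normal(size=(2, 3))
Psi_new = rng.normal(size=(4, 2))
F_new = Phi_new @ W @ Psi_new.T
```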
Page 18
Four ways of cross validation
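[Figure: four panels, Setting A, Setting B, Setting C and Setting D, each showing which parts of the label matrix are used for training, used for testing, or discarded]
Analytic shortcuts can be derived to perform LOOCV for each setting!
Tuning λd and λt is essentially free!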
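To illustrate the kind of shortcut meant here, the sketch below applies the classical leave-one-out identity for ridge regression to the instance step: each instance is held out with all of its labels, without refitting. This is an assumption-laden illustration for primal features with assumed shapes `Phi` (n × p), `Psi` (m × q), `Y` (n × m). It is not necessarily the authors' derivation, and the slides do not say which setting letter this corresponds to. A brute-force loop is included for comparison.

```python
import numpy as np


def hat_matrix(X, lam):
    """H = X (X^T X + lam I)^{-1} X^T."""
    return X @ np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T)


def loo_instances_shortcut(Phi, Psi, Y, lam_d, lam_t):
    """Leave-one-instance-out predictions from a single fit."""
    H_d = hat_matrix(Phi, lam_d)
    H_t = hat_matrix(Psi, lam_t)
    h = np.diag(H_d)[:, None]                 # leverages of the instances
    # Ridge identity: the held-out prediction for row i is
    # ((H_d Y)_i - h_i Y_i) / (1 - h_i); the task step is then applied unchanged.
    return ((H_d @ Y - h * Y) / (1.0 - h)) @ H_t


def loo_instances_naive(Phi, Psi, Y, lam_d, lam_t):
    """Same quantity by refitting the instance step n times."""
    n, p = Phi.shape
    H_t = hat_matrix(Psi, lam_t)
    F = np.empty_like(Y)
    for i in range(n):
        keep = np.arange(n) != i
        B = np.linalg.solve(Phi[keep].T @ Phi[keep] + lam_d * np.eye(p),
                            Phi[keep].T @ Y[keep])
        F[i] = (Phi[i] @ B) @ H_t
    return F


rng = np.random.default_rng(3)
Phi = rng.normal(size=(12, 4))
Psi = rng.normal(size=(9, 3))
Y = rng.normal(size=(12, 9))

F_fast = loo_instances_shortcut(Phi, Psi, Y, lam_d=1.0, lam_t=0.5)
F_slow = loo_instances_naive(Phi, Psi, Y, lam_d=1.0, lam_t=0.5)
```

Since the shortcut reuses one fit, evaluating a grid of λd and λt values costs little more than a single training run.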
Page 20
Effect of regularization for the four settings
Data: protein-ligand interactions. Evaluation by AUC (lighter = better performance).
[Figure: four contour plots of cross-validated AUC on the "nr" data, one each for Setting A, Setting B, Setting C and Setting D; x-axis: lambda drugs, y-axis: lambda targets; shared colour scale from 0.50 to 1.00. Contour labels run from 0.550 to 0.850 (Setting A), 0.560 to 0.760 (Setting B), 0.790 to 0.850 (Setting C) and 0.600 to 0.725 (Setting D).]
Clear difference between the four settings and between λd and λt!
Page 21
Learning with mini-batches
[Figure: the label matrix (axes: tasks and instances) growing in blocks: initial training data, new training instances, even more training instances, and new training tasks]
Exact updating of the parameters when new training instances and/or tasks become available
  scalable for “Big Data” applications
  updating model in dynamic environment
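One standard way to obtain such exact updates is the Woodbury identity, sketched below for the instance step only (new tasks could be handled symmetrically on the task side). The slides do not say which update the authors use, so treat this as an assumption. The class name `OnlineInstanceRidge` and the random data are hypothetical.

```python
import numpy as np


class OnlineInstanceRidge:
    """Keeps (Phi^T Phi + lam I)^{-1} and Phi^T Y exact as mini-batches arrive."""

    def __init__(self, n_features, n_tasks, lam):
        self.A_inv = np.eye(n_features) / lam        # inverse of lam * I
        self.C = np.zeros((n_features, n_tasks))     # running Phi^T Y

    def update(self, Phi_new, Y_new):
        # Woodbury: (A + U^T U)^{-1} = A^{-1} - A^{-1} U^T (I + U A^{-1} U^T)^{-1} U A^{-1}
        AU = self.A_inv @ Phi_new.T                  # p x b
        S = np.eye(Phi_new.shape[0]) + Phi_new @ AU  # b x b, cheap for small batches
        self.A_inv -= AU @ np.linalg.solve(S, AU.T)
        self.C += Phi_new.T @ Y_new

    def coefficients(self):
        return self.A_inv @ self.C


rng = np.random.default_rng(4)
n, m, p, lam = 60, 7, 5, 2.0
Phi = rng.normal(size=(n, p))
Y = rng.normal(size=(n, m))

# Feed the data in mini-batches of 10 instances.
model = OnlineInstanceRidge(n_features=p, n_tasks=m, lam=lam)
for start in range(0, n, 10):
    model.update(Phi[start:start + 10], Y[start:start + 10])
online_B = model.coefficients()

# Reference: refit from scratch on all data.
batch_B = np.linalg.solve(Phi.T @ Phi + lam * np.eye(p), Phi.T @ Y)
```

Each update costs a solve of the size of the mini-batch rather than a refit on all data seen so far.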
Page 23
Exact online learning for hierarchical text classification
Hierarchical text classification (> 12,000 labels): from 5,000 to 350,000 instances in steps of 1,000 instances.
Page 24
Why two-step ridge regression?
Zero-shot learning, transfer learning, multi-task learning... in one line of code
Theoretically well founded
Allows for nifty computational tricks
  ‘free’ tuning for the hyperparameters
  ‘free’ LOOCV for all four settings!
  closed-form solution for updating with mini-batches