Bipartite Edge Prediction via Transductive Learning over Product Graphs
Hanxiao Liu, Yiming Yang
School of Computer Science, Carnegie Mellon University
ICML 2015, July 8, 2015
Problem Description
Many applications involve predicting the edges of a bipartite graph.
Sometimes the vertex sets on both sides are intrinsically structured (by Graph G on one side and Graph H on the other).
Heterogeneous information: G + H + partial observations of the edges.
Can we combine them to make better edge predictions?
The Proposed Framework
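[Figure: a bipartite graph between vertices I, II on one side and A, B, C on the other, with Graph G and Graph H structuring the two sides. Edge (I, A) is labeled -2, edge (II, B) is labeled +5, and the remaining edges are unlabeled (?).]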
Transductive learning should be effective:
1. Labeled edges (red) are highly sparse.
2. Unlabeled edges (gray) are massively available.
Assumption: similar edges should have similar labels.
Prerequisite: a similarity measure among the edges, i.e., a “Graph of Edges” (not directly provided).
It can be induced from G and H via a graph product!
The “Graph of Edges” can be induced by taking the product of G and H.
In the product graph G ◦ H:
Each vertex ∼ an edge (in the original bipartite graph).
Each edge ∼ an edge-edge similarity.
The adjacency matrix of the product graph is defined by “◦” (to be discussed later).
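The slides defer the definition of “◦”. Here is a minimal sketch that uses two standard graph products as stand-ins: the Cartesian product, which the diffusion example later refers to, and the tensor (Kronecker) product. Under either one, the adjacency matrix of G ◦ H can be assembled from Kronecker products, and bipartite edge (i, j) becomes product-graph vertex i·n + j. The toy graphs and all helper names (`kron`, `cartesian_product`, `tensor_product`, `neighbors`) are hypothetical. Dense lists of lists are only viable at toy sizes.

```python
# Dense lists of lists: fine for toy graphs only.
def kron(X, Y):
    """Kronecker product X ⊗ Y; row (i, j) of the result has index i*len(Y) + j."""
    return [[X[i][k] * Y[j][l] for k in range(len(X[0])) for l in range(len(Y[0]))]
            for i in range(len(X)) for j in range(len(Y))]

def eye(n):
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]

def add(X, Y):
    return [[a + b for a, b in zip(rx, ry)] for rx, ry in zip(X, Y)]

def cartesian_product(G, H):
    """Cartesian product adjacency: G ⊗ I_n + I_m ⊗ H."""
    return add(kron(G, eye(len(H))), kron(eye(len(G)), H))

def tensor_product(G, H):
    """Tensor (Kronecker) product adjacency: G ⊗ H."""
    return kron(G, H)

def neighbors(A, i, j, n):
    """Neighbors of product-graph vertex (i, j), i.e. of bipartite edge (i, j)."""
    return [(k // n, k % n) for k, w in enumerate(A[i * n + j]) if w != 0]

# Hypothetical toy graphs: G is a single edge on {I, II}, H is the path A - B - C.
G = [[0, 1], [1, 0]]
H = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
n = len(H)
print(neighbors(cartesian_product(G, H), 0, 0, n))  # [(0, 1), (1, 0)]: (I, B) and (II, A)
print(neighbors(tensor_product(G, H), 0, 0, n))     # [(1, 1)]: (II, B)
```

The two products encode different notions of edge similarity. In the Cartesian product, edge (I, A) is similar to edges that keep one endpoint and move the other to an adjacent vertex. In the tensor product, both endpoints must move.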
Problem Mapping
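Edge Prediction (original problem): given G, H, and the labeled edges, predict the unlabeled edges.
[Figure: bipartite graph between I, II and A, B, C, with edge (I, A) labeled -2, edge (II, B) labeled +5, and the remaining edges unknown (?).]
Vertex Prediction (equivalent problem): given G ◦ H and the labeled vertices, predict the unlabeled vertices.
[Figure: the six vertices of G ◦ H, one per bipartite edge: (I, A) = -2, (I, B) = ?, (I, C) = ?, (II, A) = ?, (II, B) = +5, (II, C) = ?.]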
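The problem mapping can be made concrete with the example in the figure. This sketch assumes that I, II lie on the side structured by G and A, B, C on the side structured by H. It uses row-major indexing, so edge (i, j) becomes vertex i·n + j. The function name `edges_to_vertex_labels` is hypothetical.

```python
# Hypothetical toy data mirroring the figure: I, II on G's side; A, B, C on H's side.
left = ["I", "II"]          # m = 2 vertices, structured by G
right = ["A", "B", "C"]     # n = 3 vertices, structured by H
labeled_edges = {("I", "A"): -2.0, ("II", "B"): 5.0}

def edges_to_vertex_labels(left, right, labeled_edges):
    """Map each bipartite edge (u, v) to product-graph vertex index i*n + j.

    Returns a list of length m*n holding the known label, or None if unlabeled.
    """
    n = len(right)
    y = [None] * (len(left) * n)
    for (u, v), label in labeled_edges.items():
        y[left.index(u) * n + right.index(v)] = label
    return y

y = edges_to_vertex_labels(left, right, labeled_edges)
print(y)  # [-2.0, None, None, None, 5.0, None]
```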
Optimization
Transductive Learning over Product Graph
$$\min_f \; \ell(f) + \lambda \underbrace{f^\top \kappa(A)^{-1} f}_{r(f)} \qquad (6)$$
Challenge: $\kappa(A) = \kappa(\underbrace{G}_{m \times m} \circ \underbrace{H}_{n \times n})$ is a huge $mn \times mn$ matrix!
Prohibitive to load it into memory: no need to store $\kappa(A)$.
Prohibitive to compute its inverse: no need for a matrix inverse.
Even if $\kappa(A)^{-1}$ is given, computing $\nabla r(f)$ naively is expensive: it can be performed much more efficiently.
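A back-of-the-envelope sketch puts “huge” in numbers. The sizes m = n = 10,000 are hypothetical, and 8-byte (float64) entries are assumed. The function names are also hypothetical.

```python
def dense_kernel_bytes(m, n, bytes_per_entry=8):
    """Memory for a dense mn x mn matrix such as kappa(A), float64 by default."""
    return (m * n) ** 2 * bytes_per_entry

def factored_bytes(m, n, bytes_per_entry=8):
    """Memory for m x m and n x n matrices plus the m x n score matrix F."""
    return (m * m + n * n + m * n) * bytes_per_entry

m, n = 10_000, 10_000   # hypothetical sizes, chosen only for illustration
print(dense_kernel_bytes(m, n) / 1e15, "PB")   # 80.0 PB
print(factored_bytes(m, n) / 1e9, "GB")        # 2.4 GB
```

At these sizes, working only with m×m and n×n matrices and the m×n score matrix keeps memory in the gigabyte range. The dense $mn \times mn$ matrix would need tens of petabytes.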
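Keys for complexity reduction:
1. Eigenvalues instead of matrices:
κ only manipulates eigenvalues (κ(A) keeps the eigenvectors of A and maps each eigenvalue x to κ(x)).
◦ only manipulates the interplay of eigenvalues (e.g., each eigenvalue of the Cartesian product of G and H is a sum of an eigenvalue of G and an eigenvalue of H).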
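2. The “vec” trick. The bottleneck is the multiplication $(X \otimes Y) f$, where $f = \mathrm{vec}(F)$ and $F_{ij}$ is defined as the system-predicted score for edge $(i, j)$:
$$\underbrace{(X \otimes Y) f}_{O(m^2 n^2)\ \text{time/space}} = (X \otimes Y)\,\mathrm{vec}(F) \equiv \underbrace{\mathrm{vec}(X F Y^\top)}_{O(mn(m+n))\ \text{time},\ O((m+n)^2)\ \text{space}} \qquad (7)$$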
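Eq. (7) can be checked numerically in plain Python. The identity as written holds when vec stacks the rows of F, so that $F_{ij}$ lands at position $i \cdot n + j$. Under the column-major convention common in linear-algebra texts, the roles of X and Y swap. The sizes, random entries, and helper names below are arbitrary choices for illustration, not taken from the paper.

```python
import random

def kron(X, Y):
    """Dense Kronecker product X ⊗ Y; row (i, j) has index i*len(Y) + j."""
    return [[X[i][k] * Y[j][l] for k in range(len(X[0])) for l in range(len(Y[0]))]
            for i in range(len(X)) for j in range(len(Y))]

def matmul(P, Q):
    return [[sum(P[i][t] * Q[t][j] for t in range(len(Q))) for j in range(len(Q[0]))]
            for i in range(len(P))]

def transpose(P):
    return [list(r) for r in zip(*P)]

def vec(F):
    """Row-major vectorization: F[i][j] goes to position i*n + j."""
    return [x for row in F for x in row]

def naive(X, Y, F):
    """(X ⊗ Y) vec(F) with the mn x mn matrix materialized: O(m^2 n^2)."""
    f = vec(F)
    return [sum(k * x for k, x in zip(row, f)) for row in kron(X, Y)]

def vec_trick(X, Y, F):
    """vec(X F Y^T): two small matrix products, O(mn(m + n))."""
    return vec(matmul(matmul(X, F), transpose(Y)))

random.seed(0)
m, n = 3, 4
X = [[random.gauss(0, 1) for _ in range(m)] for _ in range(m)]
Y = [[random.gauss(0, 1) for _ in range(n)] for _ in range(n)]
F = [[random.gauss(0, 1) for _ in range(n)] for _ in range(m)]
err = max(abs(a - b) for a, b in zip(naive(X, Y, F), vec_trick(X, Y, F)))
print(err < 1e-12)  # True
```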
Optimization with Low-rank Constraint
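Further speedup is possible by factorizing F into two low-rank matrices.
The cost of each alternating gradient step is proportional to rank(F) · rank(Σ), where Σ is a “Characteristic Matrix” with $\Sigma_{ij} = \frac{1}{\kappa(\lambda_i \circ \mu_j)}$, and $\lambda_i$ and $\mu_j$ are the eigenvalues of G and H.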
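Combining the two keys shows where Σ enters. Assume $A = G \circ H$ has eigenvectors $u_i \otimes v_j$ with eigenvalues $\lambda_i \circ \mu_j$, as holds for the Cartesian product. Then $\kappa(A)^{-1} = (U \otimes V)\,\mathrm{diag}(\mathrm{vec}\,\Sigma)\,(U \otimes V)^\top$. Since $\kappa(A)$ is symmetric, the vec trick gives $\nabla r(f) = 2\kappa(A)^{-1} f = 2\,\mathrm{vec}(U(\Sigma \odot (U^\top F V))V^\top)$, where $\odot$ is the elementwise product.

The sketch below is one way to compute this and is not necessarily the paper's implementation. It assumes:
- the diffusion kernel $\kappa(x) = e^x$ over the Cartesian product, matching the example on this slide, where $\Sigma_{ij} = e^{-(\lambda_i + \mu_j)}$;
- hypothetical toy graphs with closed-form eigenpairs.

It then checks the result against a dense $2e^{-A}f$ computed by a truncated Taylor series.

```python
import math

def matmul(P, Q):
    return [[sum(P[i][t] * Q[t][j] for t in range(len(Q))) for j in range(len(Q[0]))]
            for i in range(len(P))]

def transpose(P):
    return [list(r) for r in zip(*P)]

def kron(X, Y):
    return [[X[i][k] * Y[j][l] for k in range(len(X[0])) for l in range(len(Y[0]))]
            for i in range(len(X)) for j in range(len(Y))]

def eye(n):
    return [[float(i == j) for j in range(n)] for i in range(n)]

# Hypothetical toy graphs with closed-form eigenpairs; columns of U, V are eigenvectors.
s2 = math.sqrt(2.0)
G, lam = [[0.0, 1.0], [1.0, 0.0]], [1.0, -1.0]
U = [[1 / s2, 1 / s2], [1 / s2, -1 / s2]]
H, mu = [[0.0, 1.0, 0.0], [1.0, 0.0, 1.0], [0.0, 1.0, 0.0]], [s2, 0.0, -s2]
V = [[0.5, 1 / s2, 0.5], [s2 / 2, 0.0, -s2 / 2], [0.5, -1 / s2, 0.5]]

# Assumed diffusion kernel over the Cartesian product: kappa(x) = exp(x), so
# Sigma[i][j] = 1 / kappa(lam_i + mu_j) = exp(-(lam_i + mu_j)).
Sigma = [[math.exp(-(l + u)) for u in mu] for l in lam]

def grad_r_spectral(F):
    """2 kappa(A)^{-1} vec(F) = 2 vec(U (Sigma ⊙ (U^T F V)) V^T); no mn x mn matrix."""
    M = matmul(matmul(transpose(U), F), V)                             # eigenbasis coordinates
    M = [[s * x for s, x in zip(rs, rm)] for rs, rm in zip(Sigma, M)]  # elementwise with Sigma
    back = matmul(matmul(U, M), transpose(V))                          # back to original basis
    return [2 * x for row in back for x in row]                        # row-major vec, times 2

def grad_r_dense(F, terms=40):
    """Reference: 2 exp(-A) vec(F) with A = G ⊗ I + I ⊗ H, via a truncated Taylor series."""
    A = [[a + b for a, b in zip(r1, r2)]
         for r1, r2 in zip(kron(G, eye(len(H))), kron(eye(len(G)), H))]
    f = [x for row in F for x in row]
    term, total = f[:], f[:]
    for k in range(1, terms):
        # term now holds (-A)^k f / k!, built one matrix-vector product at a time
        term = [-sum(a * t for a, t in zip(row, term)) / k for row in A]
        total = [x + t for x, t in zip(total, term)]
    return [2 * x for x in total]

F = [[0.3, -1.2, 0.5], [2.0, 0.1, -0.7]]    # arbitrary 2 x 3 score matrix
err = max(abs(a - b) for a, b in zip(grad_r_spectral(F), grad_r_dense(F)))
print(err < 1e-9)  # True
```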
An interesting observation: rank(Σ) is usually a small constant!
Example: diffusion process over the Cartesian product graph (PG)
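$$\Sigma = \begin{pmatrix} e^{-(\lambda_1+\mu_1)} & \cdots & e^{-(\lambda_1+\mu_n)} \\ \vdots & \ddots & \vdots \\ e^{-(\lambda_m+\mu_1)} & \cdots & e^{-(\lambda_m+\mu_n)} \end{pmatrix} = \begin{pmatrix} e^{-\lambda_1} \\ \vdots \\ e^{-\lambda_m} \end{pmatrix} \begin{pmatrix} e^{-\mu_1} & \cdots & e^{-\mu_n} \end{pmatrix} \implies \mathrm{rank}(\Sigma) = 1$$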
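One way to see why a small rank(Σ) pays off: the rank of an elementwise product is at most the product of the ranks, so $\mathrm{rank}(\Sigma \odot M) \le \mathrm{rank}(\Sigma) \cdot \mathrm{rank}(M)$. When $\Sigma = a b^\top$, the product $\Sigma \odot M$ is simply $\mathrm{diag}(a)\, M\, \mathrm{diag}(b)$, a row rescaling followed by a column rescaling. The sketch checks both the factorization on this slide and that rescaling identity. The eigenvalues and the matrix M are hypothetical values.

```python
import math

# Hypothetical eigenvalues of G (lam) and H (mu); any real numbers work here.
lam = [1.0, -0.5, 0.2]
mu = [0.7, -1.1]

# Diffusion over the Cartesian product: Sigma[i][j] = exp(-(lam_i + mu_j)).
Sigma = [[math.exp(-(l + u)) for u in mu] for l in lam]

# Rank-1 factorization from the slide: Sigma = a b^T, a_i = exp(-lam_i), b_j = exp(-mu_j).
a = [math.exp(-l) for l in lam]
b = [math.exp(-u) for u in mu]
outer_err = max(abs(Sigma[i][j] - a[i] * b[j]) for i in range(len(lam)) for j in range(len(mu)))

# Consequence: Sigma ⊙ M equals diag(a) M diag(b), a row scaling and a column scaling,
# so the elementwise product never needs the full Sigma.
M = [[0.4, -2.0], [1.5, 0.3], [-0.8, 0.9]]
hadamard = [[Sigma[i][j] * M[i][j] for j in range(len(mu))] for i in range(len(lam))]
scaled = [[a[i] * M[i][j] * b[j] for j in range(len(mu))] for i in range(len(lam))]
scale_err = max(abs(h - s) for rh, rs in zip(hadamard, scaled) for h, s in zip(rh, rs))
print(outer_err < 1e-12, scale_err < 1e-12)  # True True
```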
Experiment