Holographic Embeddings of Knowledge Graphs
Maximilian Nickel, Lorenzo Rosasco, Tomaso Poggio
Knowledge Graphs → Search Engines
Knowledge Graphs → Digital Assistants
Relational Knowledge Representation
Knowledge graphs provide machine-interpretable data by modeling
knowledge ≈ entities + their relationships
Facts are represented as binary relations R_p(e_s, e_o).
vicePresident(Obama, Biden)
memberOf(Obama, Democrats)
memberOf(Biden, Democrats)
[Figure: graph with nodes Barack Obama, Joe Biden, and Democratic Party, connected by a vicePresident edge and two party edges]
Modern knowledge graphs like Freebase, YAGO, DBpedia are
• Very large (FB: 40M entities, 35K relations, 637M facts)
• Very incomplete (FB: nationality missing for 71% of persons)
Relational Knowledge Representation
Multigraph structure
• Entity = Node
• Fact = Edge
• Relation type = Edge type
Machine Learning on Knowledge Graphs
Learn a statistical model of a knowledge graph
Predict probability of any edge (link prediction)
[Figure: graph of Barack Obama, Joe Biden, Bill Clinton, Al Gore, and Democratic Party with vicePresident and party edges; a dashed edge marked "?" is to be predicted]
Applications
• KG Completion
• “Structured” prior for Machine Reading
• Probabilistic QA
Challenges
• Relational nature of data
• Size of modern KGs
Knowledge Graph Embeddings
Knowledge graph embeddings consist of
Entity Embeddings + Relation Embeddings + Score Function
Goal: Learn embeddings that best explain the data according to the score function
RESCAL (Nickel, Tresp, et al., 2011)
score(R_p(e_s, e_o)) = e_s^T R_p e_o
• Interpretation as tensor completion
• State-of-the-art results on SRL benchmarks
• Runtime & memory complexity O(d²)
[Figure: tensor factorization X_k ≈ E R_k E^T, with tensor modes indexing the i-th entity, j-th entity, and k-th relation]
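As a minimal sketch of the bilinear score above (variable names are illustrative, not taken from any RESCAL implementation):

```python
import numpy as np

def rescal_score(e_s, R_p, e_o):
    """Bilinear RESCAL score: e_s^T R_p e_o.

    e_s, e_o are d-dimensional entity embeddings and R_p is a d x d
    relation matrix, which is where the O(d^2) cost per relation comes from.
    """
    return e_s @ R_p @ e_o

rng = np.random.default_rng(0)
d = 4
e_s, e_o = rng.standard_normal(d), rng.standard_normal(d)
R_p = rng.standard_normal((d, d))
score = rescal_score(e_s, R_p, e_o)
```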
Knowledge Graph Embeddings
TransE (Bordes et al., 2013):
score(R_p(e_s, e_o)) = −‖e_s + r_p − e_o‖_1
• Inspired by Word2Vec
• Runtime & memory complexity O(d)
• Less powerful than RESCAL
[Figure: translation in embedding space, e_s + r_p ≈ e_o]
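A corresponding sketch of the translational score (again with illustrative names): a fact holds when translating the subject by the relation vector lands near the object.

```python
import numpy as np

def transe_score(e_s, r_p, e_o):
    """Translational TransE score: -||e_s + r_p - e_o||_1.

    Every embedding is a d-dimensional vector, so the score costs only O(d).
    """
    return -np.abs(e_s + r_p - e_o).sum()

# A perfectly translated triple attains the maximum score of 0.
e_s = np.array([1.0, 2.0])
r_p = np.array([0.5, -1.0])
e_o = e_s + r_p
```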
Holographic Embeddings
Interlude: Relations ≈ Classification of Tuples
Let E be the set of all entities in a domain
A binary relation R ⊆ E × E is the subset of all pairs of entities for which the relationship is true
[Figure: partyOf as a subset of E × E]
Characteristic Function of Relations
φ_p(s, o) = 1 if (s, o) ∈ R_p, and 0 otherwise
Observation: this is what we want to learn in link prediction
Relational Learning ≈ Classification of Tuples
(Nickel, Rosasco, et al., 2016)
Holographic Embeddings (HOLE)
Holographic Embeddings
• model entities as vectors
• model relation types as vectors
• represent pairs of entities as
e_i ↦ e_i ∈ ℝ^d
R_k ↦ r_k ∈ ℝ^d
(e_s, e_o) ↦ e_s ⋆ e_o ∈ ℝ^d
where ⋆ : ℝ^d × ℝ^d → ℝ^d denotes circular correlation
[a ⋆ b]_k = Σ_{i=0}^{d−1} a_i b_{(k+i) mod d}
Model relationships via the classification of pairs of entities
Pr(R_p(e_s, e_o) = 1 | Θ) = σ(r_p^T (e_s ⋆ e_o))
where Θ = {e_i}_{i=1}^{n_e} ∪ {r_k}_{k=1}^{n_r}
(Nickel, Rosasco, et al., 2016)
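As a concrete sketch of the model above (the variable names are illustrative, not taken from the authors' code), the probability of a triple can be computed directly from the definition:

```python
import numpy as np

def ccorr(a, b):
    """Circular correlation: [a * b]_k = sum_i a_i b_{(k+i) mod d} (naive O(d^2))."""
    d = len(a)
    # np.roll(b, -k)[i] == b[(i + k) % d], so each component is one dot product
    return np.array([np.dot(a, np.roll(b, -k)) for k in range(d)])

def hole_prob(e_s, r_p, e_o):
    """Pr(R_p(e_s, e_o) = 1 | Theta) = sigma(r_p^T (e_s * e_o))."""
    return 1.0 / (1.0 + np.exp(-(r_p @ ccorr(e_s, e_o))))

rng = np.random.default_rng(0)
e_s, r_p, e_o = (rng.standard_normal(8) for _ in range(3))
prob = hole_prob(e_s, r_p, e_o)
```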
Holographic Embeddings (HOLE)
Holographic embeddings use circular correlation ⋆ : ℝ^d × ℝ^d → ℝ^d
(e_s, e_o) ≈ e_s ⋆ e_o
which is defined for a, b ∈ ℝ^d as
[a ⋆ b]_k = Σ_{i=0}^{d−1} a_i b_{(k+i) mod d}
Compressed Tensor Product
[Figure: circulant pattern of pairwise products a_i b_j summed into components c_0, c_1, c_2]
(Plate, 1995)
Circular Correlation as a Compositional Operator
Components of entity embeddings ≈ latent features of entities
Model relation instances via interactions of latent features
e.g., partyOf relation in the US presidents example:
Liberal persons are members of liberal parties
Conservative persons are members of conservative parties
HOLE as a Neural Network
[Figure: HOLE as a neural network — subject units e_s1, e_s2, e_s3 and object units e_o1, e_o2, e_o3 are composed by the correlation layer ⋆ and scored against r_p as r_p^T(e_s ⋆ e_o); the latent features realize (Liberal Person ∧ Liberal Party) ∨ (Conserv. Person ∧ Conserv. Party)]
Computing Holographic Embeddings
Runtime Complexity: We can compute circular correlation efficiently via fast Fourier transforms (FFT) in O(d log d)
a ⋆ b = F⁻¹(conj(F(a)) ⊙ F(b))
where F and F⁻¹ denote the FFT and its inverse, conj(·) the complex conjugate, and ⊙ the elementwise (Hadamard) product.
Memory Complexity: Since circular correlation is a function ℝ^d × ℝ^d → ℝ^d, the memory complexity is O(d)
[Figure: compositional layers compared — tensor product network r_p^T(e_s ⊗ e_o) vs. circular correlation network r_p^T(e_s ⋆ e_o)]
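A sketch of this O(d log d) computation with NumPy's FFT; note the complex conjugate, which turns the convolution theorem into the correlation theorem:

```python
import numpy as np

def ccorr_naive(a, b):
    """Reference O(d^2) circular correlation: [a * b]_k = sum_i a_i b_{(k+i) mod d}."""
    d = len(a)
    return np.array([np.dot(a, np.roll(b, -k)) for k in range(d)])

def ccorr_fft(a, b):
    """Circular correlation in O(d log d): F^{-1}(conj(F(a)) . F(b))."""
    return np.fft.ifft(np.conj(np.fft.fft(a)) * np.fft.fft(b)).real

rng = np.random.default_rng(0)
a, b = rng.standard_normal(128), rng.standard_normal(128)
```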
Holographic Associative Memory
Let (a_i, b_i) be stimulus-response pairs
Storage: m ← Σ_i a_i ∗ b_i
Retrieval: b′ ← a ⋆ m
Clean-up: b ← argmax_{b_i} b_i^T (a ⋆ m)
Holographic Embeddings
Let S_o = {(s, p) | R_p(e_s, e_o) = 1}
Storage: e_o ← Σ_{(s,p)} r_p ∗ e_s
Retrieval: r′ ← e_s ⋆ e_o
Probability: σ(r_p^T (e_s ⋆ e_o))
Generalization, not memorization
(Plate, 1995; Poggio, 1973; Gabor, 1969; Willshaw, 1985)
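The storage/retrieval/clean-up scheme above can be simulated directly. This is a sketch with random, purely illustrative stimulus-response pairs; correlation approximately inverts convolution because conj(F(a))·F(a) = |F(a)|² ≈ 1 componentwise for random a of this scale.

```python
import numpy as np

def cconv(a, b):
    """Circular convolution (the storage operator *)."""
    return np.fft.ifft(np.fft.fft(a) * np.fft.fft(b)).real

def ccorr(a, b):
    """Circular correlation (the retrieval operator)."""
    return np.fft.ifft(np.conj(np.fft.fft(a)) * np.fft.fft(b)).real

rng = np.random.default_rng(42)
d, n = 512, 5
# random pairs, scaled so the expected power spectrum of each vector is ~1
stimuli = rng.standard_normal((n, d)) / np.sqrt(d)
responses = rng.standard_normal((n, d)) / np.sqrt(d)

m = sum(cconv(a, b) for a, b in zip(stimuli, responses))  # storage
noisy = ccorr(stimuli[0], m)                              # retrieval: response 0 plus noise
best = int(np.argmax(responses @ noisy))                  # clean-up: nearest stored response
```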
Experiments
Link Prediction on WordNet
• WordNet consists of lexical relationships between words
• WN18 subset (Bordes et al., 2013)
Entities 40,943
Relation types 18
Facts 151,442
[Figure: example WordNet subgraph relating Optics, Holography, and optical via hypernym and derivational-form edges]
Results (WN18):
Model    MRR   Hits@1  Hits@3  Hits@10
HOLE     0.94  0.93    0.95    0.95
TRANSE   0.50  0.11    0.89    0.94
TRANSR   0.61  0.34    0.88    0.94
RESCAL   0.89  0.84    0.90    0.93
ER-MLP   0.71  0.63    0.78    0.86
Link Prediction on Freebase
• Freebase consists of general facts about the world (e.g., harvested from Wikipedia, MusicBrainz, etc.)
• FB15k subset (Bordes et al., 2013)
Entities 14,951
Relation types 1,345
Facts 592,213
[Figure: example Freebase subgraph — Barack Obama, Joe Biden, and Democratic Party with party and vicePresident edges]
Results (FB15k):
Model    MRR   Hits@1  Hits@3  Hits@10
HOLE     0.52  0.40    0.61    0.74
TRANSE   0.46  0.30    0.58    0.75
TRANSR   0.35  0.22    0.40    0.58
RESCAL   0.35  0.24    0.41    0.59
ER-MLP   0.29  0.17    0.32    0.50
MRR vs Number of Parameters
[Figure: MRR on FB15k vs. number of parameters in millions (0-35) for TRANSE, TRANSR, RESCAL, ER-MLP, and HOLE]
Summary
• HOLE combines state-of-the-art relational learning and high scalability in a single model
• Enables complex models of knowledge graphs
• Interpretation in terms of associative memory
Future Work
Since circular correlation is a function ℝ^d × ℝ^d → ℝ^d,
the representation of a tuple is a vector of the same size as an entity embedding
[Figure: believes(Tom, loves(John, Mary)) drawn as a nested structure]
Essential property to create recursive representations
Nested Facts: believes(Tom, loves(John, Mary))
Higher-arity Relations: taughtAt(Tom, AI, MIT)
Thank you
Software
• Open-Source Library for Knowledge Graph Embeddings
  http://github.com/mnick/scikit-kge
• Experiments for this Paper
  https://github.com/mnick/holographic-embeddings
Recent Review Article
Maximilian Nickel, Kevin Murphy, et al. (2016). “A Review of Relational Machine Learning for Knowledge Graphs”. In: Proc. of the IEEE
Simple Reasoning
• Task: Predict the region of countries
• Setting: 10-fold cross validation over countries
[Figure: countries dataset — train and test countries connect via partOf edges to subregions and regions, and via neighbors edges to other countries]
(Nickel et al., 2015)
Simple Reasoning
S1: partOf(c, s) ∧ partOf(s, r) ⇒ partOf(c, r)
AUC-PR: RANDOM 0.32, RULE 1.00, MLN-S 0.34, TRANSE 1.00, ER-MLP 0.96, RESCAL 1.00, HOLE 1.00
(Nickel et al., 2015)
Simple Reasoning
S2: neighbors(c1, c2) ∧ partOf(c2, r) ⇒ partOf(c1, r)
AUC-PR: RANDOM 0.32, RULE 0.78, MLN-S 0.34, TRANSE 0.74, ER-MLP 0.73, RESCAL 0.75, HOLE 0.77
(Nickel et al., 2015)
Simple Reasoning
S3: neighbors(c1, c2) ∧ partOf(c2, s) ∧ partOf(s, r) ⇒ partOf(c1, r)
AUC-PR: RANDOM 0.32, RULE 0.78, MLN-S 0.34, TRANSE 0.69, ER-MLP 0.65, RESCAL 0.65, HOLE 0.71
(Nickel et al., 2015)
Link Prediction on SRL Benchmarks
• Holographic Embeddings keep excellent performance on SRL benchmark datasets
• Other knowledge graph embedding models perform worse
[Bar chart: AUC-PR of MLN, TRANSE, ER-MLP, RESCAL, and HOLE on the Kinships, Nations, and UMLS benchmarks]
(Nickel, Tresp, et al., 2011; Garcia-Duran et al., 2015)
Relational Learning with HOLE
MAP estimates for Θ = {e_i}_{i=1}^{n} ∪ {r_k}_{k=1}^{m} for the joint distribution
Pr(Y | Θ) = ∏_{s=1}^{n} ∏_{p=1}^{m} ∏_{o=1}^{n} Pr(y_spo = 1 | σ(r_p^T (e_s ⋆ e_o)))
Shared representations enable relational learning
[Figure: plate model — observed variable y_spo depends on entity embeddings e_s, e_o (prior λ_e) and relation embedding r_p (prior λ_r), with plates over N entities and M relations]
• Entities have the same embeddings as subjects, as objects, and across all relations
• Embeddings are learned jointly: allows propagating information between triples
• Decoupling effect
  • Known parameters: local computation
  • Parameter learning: global dependencies
• Holds for many compositional models
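The joint learning above can be sketched as plain stochastic gradient steps on a logistic loss. The gradient identities ∂η/∂r_p = e_s ⋆ e_o, ∂η/∂e_s = r_p ⋆ e_o, and ∂η/∂e_o = r_p ∗ e_s (for η = r_p^T(e_s ⋆ e_o), with ∗ circular convolution) follow from the correlation definition; the toy triples, dimensions, and hyperparameters are purely illustrative.

```python
import numpy as np

def ccorr(a, b):
    """Circular correlation."""
    return np.fft.ifft(np.conj(np.fft.fft(a)) * np.fft.fft(b)).real

def cconv(a, b):
    """Circular convolution."""
    return np.fft.ifft(np.fft.fft(a) * np.fft.fft(b)).real

def sgd_step(E, R, s, p, o, y, lr=0.1):
    """One logistic-loss gradient step on the triple (s, p, o) with label y in {0, 1}."""
    eta = R[p] @ ccorr(E[s], E[o])
    g = 1.0 / (1.0 + np.exp(-eta)) - y          # d(loss)/d(eta) = sigma(eta) - y
    gr, gs, go = ccorr(E[s], E[o]), ccorr(R[p], E[o]), cconv(R[p], E[s])
    R[p] -= lr * g * gr                          # all three embeddings are updated
    E[s] -= lr * g * gs                          # jointly, which is what propagates
    E[o] -= lr * g * go                          # information between triples

rng = np.random.default_rng(1)
E = rng.normal(scale=0.1, size=(3, 16))          # 3 toy entity embeddings
R = rng.normal(scale=0.1, size=(1, 16))          # 1 toy relation embedding
for _ in range(200):
    sgd_step(E, R, 0, 0, 1, 1.0)                 # observed triple (0, 0, 1)
    sgd_step(E, R, 0, 0, 2, 0.0)                 # corrupted negative triple (0, 0, 2)
```

After training, the model assigns a higher score to the observed triple than to the corrupted one.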