Unsupervised Relation Extraction with General Domain Knowledge
Mirella Lapata
Institute for Language, Cognition and Computation, School of Informatics, University of Edinburgh
Joint Symposium on Semantic Processing, JSSP 2013
Mintz et al. (2009), Hoffmann et al. (2011), Surdeanu et al. (2012)
What about relations not in the knowledge base?
Motivation | Problem Formulation | Modeling Framework | Evaluation
Meaning Acquisition
Unsupervised Relation Extraction
Mikel Laboa was born in June 15, 1934 in Pasaia, Gipuzkoa
Sherlock Holmes was written by Sir Arthur Conan Doyle
Elizabeth II is the head of 53-member Commonwealth of Nations
Hasegawa et al. (2004), Banko et al. (2007), Yao et al. (2011).
Weakly Supervised Relation Extraction
C1: Mikel Laboa was born in June 15, 1934 in Pasaia, Gipuzkoa.
C2: Sherlock Holmes was written by Sir Arthur Conan Doyle.
C3: Elizabeth II is the head of the 53-member Commonwealth of Nations.
Hasegawa et al. (2004), Banko et al. (2007), Yao et al. (2011).
Domain-, language-independent, less accurate alternative!
Our approach
Adopt unsupervised learning paradigm
Use topic model to infer relations
Impose domain knowledge (e.g., task specific constraints)
Fold-all framework (Andrzejewski et al., 2011) for relation extraction:
Relational LDA (Yao et al., 2011): represents relations via statistics of tuples (features) across documents
First-order logic rules for capturing domain knowledge as in MLN (Richardson and Domingos, 2006)
Learning setting
Input: Corpus of documents
Corpus: bag of relation tuples obtained from a dependency parser
Tuple: syntactic relationship between two named entities (NE)
Source NE: Mikel Laboa (PER)
Target NE: Pasaia, Gipuzkoa (LOC)
Dependency path: →nsubjpass→born→prep→in→pobj→
Output: assign tuples to clusters of semantic relations
Mikel Laboa was born in Pasaia, Gipuzkoa → relation#14
Relational LDA | First Order Logic and Relational LDA
Modeling Framework
The model contains two main components:
Relational LDA: statistics on tuples (features) across documents; captures local information
First-order logic (FOL) rules: domain knowledge as in MLN; supply global constraints and capture intuitions about NE behavior.
1 “ORG-LOC entities express the same relation”
2 “These two syntactic patterns do not express the same relation”
3 “This relation is incompatible with this named entity”
4 “The tuples are similar and should express same relation type”
Relational LDA (Yao et al., 2011)
Mohammed Zahir Shah, said the body of the former monarch’s third son, Prince Shah Mahmood [parents, title]
Mahmood who was 56, died Monday in Italy [age, date of death, country of death]
The king’s wife Homeria, who died four months ago in Italy [wife, date of death, country of death]
Relational LDA
Figure: Documents; Relation proportions and assignments; Relations
Example relations (feature, probability):
PER LOC 0.05, Live in 0.03, New York 0.02, native 0.01, ...
PER ORG 0.04, President of 0.03, Republican 0.02, White House 0.01, ...
ORG LOC 0.04, Based in 0.03, Division of 0.01, Headquarters 0.01, ...
PER PER 0.08, Wife of 0.03, Wedding 0.02, Husband 0.01, ...
Posterior Distribution
Figure: Documents; Relation proportions and assignments; Relations
Relational LDA: Graphical Model
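\[
P(z, \phi, \theta \mid \alpha, \beta, d) \propto \prod_{r}^{R} p(\phi_r \mid \beta) \prod_{j}^{D} p(\theta_j \mid \alpha) \prod_{i}^{N} \theta_{d_i}(z_i) \prod_{k \in p_i} \phi_{z_i}(f_k)
\]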
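As a rough illustration of the factorisation above, the sketch below evaluates the unnormalised log posterior for a toy configuration. It assumes symmetric Dirichlet priors with scalar concentrations alpha and beta (the slides only write p(θj|α) and p(φr|β)), and the function and argument names are hypothetical, not taken from the talk.

```python
import math

def log_dirichlet_pdf(p, conc):
    """Log density of a symmetric Dirichlet(conc) at probability vector p."""
    k = len(p)
    log_norm = math.lgamma(k * conc) - k * math.lgamma(conc)
    return log_norm + sum((conc - 1.0) * math.log(x) for x in p)

def log_posterior(z, phi, theta, alpha, beta, d, tuples):
    """Unnormalised log P(z, phi, theta | alpha, beta, d).

    z[i]      relation assigned to tuple i
    d[i]      document index of tuple i
    tuples[i] list of feature ids in tuple i (the set p_i)
    phi[r]    distribution over feature ids for relation r
    theta[j]  distribution over relations for document j
    """
    # Prior terms: one Dirichlet factor per relation and per document.
    total = sum(log_dirichlet_pdf(phi_r, beta) for phi_r in phi)
    total += sum(log_dirichlet_pdf(theta_j, alpha) for theta_j in theta)
    # Likelihood terms: theta_{d_i}(z_i) times the product of phi_{z_i}(f_k).
    for i, feats in enumerate(tuples):
        total += math.log(theta[d[i]][z[i]])
        total += sum(math.log(phi[z[i]][f]) for f in feats)
    return total

# Toy example: one document, one tuple holding feature 0, two relations.
phi = [[0.9, 0.1], [0.1, 0.9]]
theta = [[0.5, 0.5]]
score_r0 = log_posterior([0], phi, theta, 1.0, 1.0, [0], [[0]])
score_r1 = log_posterior([1], phi, theta, 1.0, 1.0, [0], [[0]])
```

In this toy case the assignment to relation 0 scores higher than relation 1, because relation 0 puts more mass on the tuple's only feature.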
Features
George Bush traveled to France on Thursday for a summit
source: first entity mention of relation tuple, e.g. George Bush
dest: second entity mention of relation tuple, e.g. France
nepair: type and order of entity mentions, e.g. PER-LOC
path: dependency path between entity mentions, e.g. [subj, traveled, prep, to, pobj]
trigger: content words occurring in dependency path, e.g. travel
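The feature set above can be sketched as a small data structure. This is only an illustration: the `RelationTuple` class, the `NON_CONTENT` stop list and the `extract_features` helper are hypothetical names, and the trigger is left unlemmatised (the slide shows the lemma "travel", which would need a lemmatiser not included here).

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class RelationTuple:
    source: str        # first entity mention
    source_type: str   # named-entity type of the source, e.g. PER
    dest: str          # second entity mention
    dest_type: str     # named-entity type of the destination, e.g. LOC
    path: tuple        # dependency path tokens between the two mentions

# Assumed stop list: dependency labels and function words that are not triggers.
NON_CONTENT = {"subj", "nsubj", "nsubjpass", "prep", "pobj", "dobj",
               "to", "in", "of", "at", "for", "by", "on"}

def extract_features(t):
    """Map a relation tuple to the five feature types listed on the slide."""
    triggers = [tok for tok in t.path if tok not in NON_CONTENT]
    return {
        "source": t.source,
        "dest": t.dest,
        "nepair": f"{t.source_type}-{t.dest_type}",
        "path": "|".join(t.path),
        "trigger": triggers,
    }

example = RelationTuple("George Bush", "PER", "France", "LOC",
                        ("subj", "traveled", "prep", "to", "pobj"))
features = extract_features(example)
```

Here `features["nepair"]` is `"PER-LOC"` and `features["trigger"]` is `["traveled"]`.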
Adding logic to Relational LDA
Represent Relational LDA as a Markov Logic Network (Richardson and Domingos, 2006)
Encode rules and model variables in weighted FOL knowledge base.
Learn relations (φ) influenced by:
Feature-document statistics (Relational LDA)
Domain knowledge rules (as in MLN)
Undirected Relational LDA
We represent the directed graphical model as a factor graph (MRF)
Convenient representation for introducing logic knowledge
Logical predicates
Define logical predicates for each relational LDA variable:
Value     Predicate   Description
zi = r    Z(i, r)     latent relation
fk = v    F(k, v)     feature of relation tuple
pi = i    P(i, fk)    tuple i contains feature fk
di = j    D(i, j)     observed document
Z(i, r) is true if zi = r, false otherwise
Same for the rest of the predicates
Rules form a weighted knowledge base in Conjunctive Normal Form:
KB = {(λ1, ψ1), ..., (λL, ψL)}
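One way to picture the predicates and the weighted knowledge base is the sketch below. The encoding is an assumption made for illustration: a CNF formula is a list of clauses, a clause is a list of `(is_positive, predicate_name, args)` literals, and `make_predicates`, `holds` and `satisfied_weight` are hypothetical helper names.

```python
def make_predicates(z, d, features, tuples):
    """Build the four predicates from concrete variable values.

    z[i]: relation of tuple i; d[i]: document of tuple i;
    features[k]: value of feature k; tuples[i]: set of feature indices in tuple i.
    """
    return {
        "Z": lambda i, r: z[i] == r,
        "F": lambda k, v: features[k] == v,
        "P": lambda i, k: k in tuples[i],
        "D": lambda i, j: d[i] == j,
    }

def holds(cnf, preds):
    """A CNF formula holds if every clause has at least one true literal."""
    return all(
        any(preds[name](*args) == positive for positive, name, args in clause)
        for clause in cnf
    )

def satisfied_weight(kb, preds):
    """Sum the weights of the formulas in KB = [(weight, cnf), ...] that hold."""
    return sum(weight for weight, cnf in kb if holds(cnf, preds))

features = ["nepair:PER-ORG", "path:is the president of"]
tuples = [{0, 1}, {0, 1}]
# One ground formula with weight 2.0: not Z(0, 1) or Z(1, 1).
kb = [(2.0, [[(False, "Z", (0, 1)), (True, "Z", (1, 1))]])]
preds_same = make_predicates([1, 1], [0, 0], features, tuples)
preds_diff = make_predicates([1, 0], [0, 0], features, tuples)
```

With both tuples assigned relation 1 the formula holds and contributes its weight; with tuple 1 moved to relation 0 it is violated and contributes nothing.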
Logic Rules
Must-Link Tuple: tuples that share features express the same relation
∀ i, j, k : F(i, path:is the president of) ∧ P(j, fi) ∧ P(k, fi)
⇒ ¬Z(j, r) ∨ Z(k, r)
Cannot-Link Tuple: define rules to express incompatibilities over tuples
∀ i, j, k, l : F(i, nepair:PER-PER) ∧ F(j, nepair:ORG-LOC) ∧ P(k, fi) ∧ P(l, fj)
⇒ ¬Z(k, r) ∨ ¬Z(l, r)
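A minimal sketch of how such rules might be grounded and checked against a hard assignment follows. It reads must-link as "the two tuples receive the same relation" and cannot-link as "the two tuples receive different relations"; the helper names and the toy tuples are hypothetical.

```python
from itertools import combinations

def must_link_groundings(tuples, feature):
    """Pairs of tuple indices (j, k) that both contain `feature`."""
    idx = [i for i, feats in enumerate(tuples) if feature in feats]
    return list(combinations(idx, 2))

def cannot_link_groundings(tuples, feat_a, feat_b):
    """Pairs (k, l) where tuple k contains feat_a and tuple l contains feat_b."""
    a = [i for i, feats in enumerate(tuples) if feat_a in feats]
    b = [i for i, feats in enumerate(tuples) if feat_b in feats]
    return [(k, l) for k in a for l in b if k != l]

def violations(z, must, cannot):
    """Return the groundings that the relation assignment z violates."""
    bad_must = [(j, k) for j, k in must if z[j] != z[k]]
    bad_cannot = [(k, l) for k, l in cannot if z[k] == z[l]]
    return bad_must, bad_cannot

tuples = [
    {"path:is the president of", "nepair:PER-ORG"},
    {"path:is the president of", "nepair:PER-ORG"},
    {"nepair:ORG-LOC"},
    {"nepair:PER-PER"},
]
must = must_link_groundings(tuples, "path:is the president of")
cannot = cannot_link_groundings(tuples, "nepair:PER-PER", "nepair:ORG-LOC")
bad_must, bad_cannot = violations([0, 1, 2, 2], must, cannot)
```

For the assignment `[0, 1, 2, 2]` both rules are violated: tuples 0 and 1 share the path but get different relations, and tuples 3 and 2 share a relation despite incompatible entity pairs.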
Inference
Rule Grounding
The KB is tied to the probabilistic model via its groundings
G(ψl): set of ground formulas g for the rule ψl
Get the tuples indices that satisfy the rule
Alternating Optimization with Mirror Descent
Combinatorial explosion: continuous relaxation and random sampling of groundings to do Stochastic Mirror Descent over latent relation type assignments (Andrzejewski et al., 2011)
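The idea of relaxing assignments and sampling groundings can be sketched with an exponentiated-gradient update, which is mirror descent with an entropic mirror map on the probability simplex. This is a simplified illustration under stated assumptions, not the procedure of Andrzejewski et al. (2011): each tuple i keeps a relaxed assignment q[i] over relations, `local_score[i][r]` stands in for the Relational LDA log score of relation r for tuple i, rules are pairwise must-link and cannot-link groundings with a shared weight `lam`, and the step size is constant.

```python
import math
import random

def eg_step(q_i, grad, eta):
    """Exponentiated-gradient ascent step that stays on the simplex."""
    w = [p * math.exp(eta * g) for p, g in zip(q_i, grad)]
    s = sum(w)
    return [x / s for x in w]

def stochastic_mirror_descent(q, local_score, must, cannot,
                              lam=1.0, eta=0.5, steps=2000, seed=0):
    """Sample one objective term per step and update the tuples it touches."""
    rng = random.Random(seed)
    terms = ([("local", i) for i in range(len(q))]
             + [("must", g) for g in must]
             + [("cannot", g) for g in cannot])
    for _ in range(steps):
        kind, arg = rng.choice(terms)
        if kind == "local":
            # Linear term sum_r q[i][r] * local_score[i][r].
            q[arg] = eg_step(q[arg], local_score[arg], eta)
        else:
            # Relaxed rule term (+/-) lam * sum_r q[a][r] * q[b][r].
            a, b = arg
            sign = 1.0 if kind == "must" else -1.0
            grad_a = [sign * lam * x for x in q[b]]
            grad_b = [sign * lam * x for x in q[a]]
            q[a] = eg_step(q[a], grad_a, eta)
            q[b] = eg_step(q[b], grad_b, eta)
    return q

# Tuple 0 prefers relation 0; tuples 1 and 2 have no local preference.
q0 = [[0.5, 0.5], [0.5, 0.5], [0.5, 0.5]]
scores = [[0.0, -3.0], [-1.0, -1.0], [-1.0, -1.0]]
q_final = stochastic_mirror_descent(q0, scores, must=[(0, 1)], cannot=[(0, 2)])
z_hat = [max(range(len(qi)), key=qi.__getitem__) for qi in q_final]
```

In this toy run the must-link pulls tuple 1 toward tuple 0's relation and the cannot-link pushes tuple 2 away from it; a hard assignment is read off with an argmax at the end.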
nerpair:ORG-LOC vs nerpair:PER-PER
nerpair:LOC-LOC vs trigger:president
nerpair:PER-LOC vs trigger:member
nerpair:PER-PER vs trigger:sell
Experimental Setup
Experiments: Evaluation
Baseline models:
Hasegawa et al (2004): clustering of co-occurring named entities
LDA: estimate relations over bag-of-words
Relational LDA: without domain knowledge
F-score measure (Agirre and Soroa, 2007)
Precision: correct members of cluster / items in cluster
Recall: correct members of cluster / items in gold-standard class
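The precision and recall definitions above can be turned into a single clustering score as sketched below. The aggregation is an assumption: each gold class is matched to the cluster that gives it the best F-score, and class scores are averaged weighted by class size, which is a common way to compute a clustering F-score. The function name is hypothetical.

```python
from collections import Counter

def clustering_fscore(gold, pred):
    """Class-size-weighted average of each gold class's best cluster F-score."""
    n = len(gold)
    class_sizes = Counter(gold)
    cluster_sizes = Counter(pred)
    overlap = Counter(zip(gold, pred))  # items shared by a class and a cluster
    total = 0.0
    for c, c_size in class_sizes.items():
        best = 0.0
        for k, k_size in cluster_sizes.items():
            n_ck = overlap[(c, k)]
            if n_ck == 0:
                continue
            precision = n_ck / k_size  # correct members / items in cluster
            recall = n_ck / c_size     # correct members / items in gold class
            best = max(best, 2 * precision * recall / (precision + recall))
        total += (c_size / n) * best
    return total

gold = ["employment", "employment", "sports", "sports"]
perfect = clustering_fscore(gold, [0, 0, 1, 1])
lumped = clustering_fscore(gold, [0, 0, 0, 0])
```

A clustering that matches the gold classes scores 1.0, while lumping everything into one cluster is penalised through precision.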
Results
Figure: bar chart of F-score (%), axis from 20 to 50, for the systems HASEGAWA, LDA, RELLDA, MLT ALL, CLT ALL, CLT+MLT ALL, MLT NE, CLT NE, CLT+MLT NE; series: Relation Subtype and Relation Type.
Examples
Employment
source        path            dest
Republican    president of    Senate
Senate        director of     Yankees
House         professor at    Republican
Bush          chairman of     Congress
Democrat      spokesman for   House
Mr. Bush      executive of    Mets
Democrats     director at     U. of California
Republican    analyst at      United Nations

Sports
source          path            dest
Yankees         defeat          World Series
Mets            win             Olympic
United States   beat            World Cup
Giants          play            Yankees
Jets            win             Super Bowl
Nets            lose            Olympics
Knicks          sign            Mets
Rangers         victory over    Giants
Conclusions
New model for unsupervised relation extraction
Cluster tuples into underlying semantic relations
Incorporates domain knowledge expressed in First-order Logic
Competitive results on ACE 2007
In the future...
Explore additional rule types (e.g. seed rules)
Learn different rule weights
Exploit Freebase-like knowledge bases to learn new rules