Unsupervised Relation Extraction with General Domain Knowledge Mirella Lapata Institute for Language, Cognition and Computation School of Informatics University of Edinburgh [email protected] Joint Symposium on Semantic Processing, JSSP 2013 Lapata Unsupervised Relation Extraction 1

Unsupervised Relation Extraction with General Domain Knowledge · 2014. 3. 25. · Mikel Laboa was born in June 15, 1934 in Pasaia, Gipuzkoa bornInDate bornInLoc LapataUnsupervised

Feb 26, 2021


dariahiddleston

Unsupervised Relation Extraction with General Domain Knowledge

Mirella Lapata

Institute for Language, Cognition and Computation

School of Informatics

University of Edinburgh

[email protected]

Joint Symposium on Semantic Processing, JSSP 2013

Lapata Unsupervised Relation Extraction 1


Joint work with

Oier Lopez de Lacalle


Motivation, Problem Formulation, Modeling Framework, Evaluation

Part I

Motivation


Meaning Acquisition

What is Relation Extraction?

Relation extraction identifies factual relationships between entities. It is a subtask of information extraction. For example:

identify that a person is employed by a particular company

identify places in a particular region

identify which protein-protein interactions exist

Applications can usefully employ such relations: question answering, summarization, information retrieval, knowledge base population.


Example: Knowledge Base Population

Create and maintain up-to-date summary infoboxes for a knowledge base such as Wikipedia or DBpedia.


More Formally

Relation extraction: detect and characterize the semantic relation between two named entities.

Mikel Laboa was born on June 15, 1934 in Pasaia, Gipuzkoa.

bornInDate(Mikel Laboa, June 15, 1934)

bornInLoc(Mikel Laboa, Pasaia, Gipuzkoa)


Supervised Relation Extraction

Mikel Laboa was born on June 15, 1934 in Pasaia, Gipuzkoa.

BornInDate(Mikel Laboa, June 15, 1934): YES

BornInLoc(Mikel Laboa, Pasaia, Gipuzkoa): YES

WriterOf(Arthur Conan Doyle, Sherlock Holmes): YES

Employee(Queen Elizabeth, US government): NO

...

Culotta and Sorensen (2004), Surdeanu and Ciaramita (2007)

High performance, but costly and does not scale well!


Weakly Supervised Relation Extraction

Mikel Laboa was born on June 15, 1934 in Pasaia, Gipuzkoa.

BornInDate(Mikel Laboa, June 15, 1934): YES

BornInLoc(Mikel Laboa, Pasaia, Gipuzkoa): YES

WriterOf(Arthur Conan Doyle, Sherlock Holmes): YES

Employee(Queen Elizabeth, US government): NO

...

Mintz et al. (2009), Hoffmann et al. (2011), Surdeanu et al. (2012)

What about relations that are not in the knowledge base?
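The labeling heuristic behind distant supervision (Mintz et al., 2009) can be sketched as follows under simplifying assumptions. If the knowledge base asserts a relation between two entities, every sentence mentioning both is treated as a (noisy) positive example of that relation. Pairs unknown to the KB are labeled `NA`. In practice, Mintz et al. pool features across all mentions of a pair, and Hoffmann et al. (2011) and Surdeanu et al. (2012) relax the every-sentence assumption with multi-instance models. The function `distant_label` and its data layout are hypothetical.

```python
def distant_label(mentions, kb):
    """Distant-supervision labeling (simplified sketch).

    mentions: list of (sentence, entity1, entity2) triples.
    kb: dict mapping (entity1, entity2) to a set of relation names.
    Returns (sentence, entity1, entity2, relation) training examples.
    """
    examples = []
    for sentence, e1, e2 in mentions:
        relations = kb.get((e1, e2), set())
        if relations:
            # Every KB relation for the pair labels this sentence (noisy).
            examples.extend((sentence, e1, e2, r) for r in sorted(relations))
        else:
            # Pairs unknown to the KB get no relation label.
            examples.append((sentence, e1, e2, "NA"))
    return examples


kb = {("Mikel Laboa", "Pasaia"): {"BornInLoc"}}
mentions = [
    ("Mikel Laboa was born in Pasaia.", "Mikel Laboa", "Pasaia"),
    ("Sherlock Holmes was written by Sir Arthur Conan Doyle.",
     "Arthur Conan Doyle", "Sherlock Holmes"),
]
for example in distant_label(mentions, kb):
    print(example[1], "|", example[2], "->", example[3])
```

The WriterOf pair receives `NA` only because this toy KB lacks it. Relations missing from the knowledge base can never be learned this way.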


Unsupervised Relation Extraction

Mikel Laboa was born on June 15, 1934 in Pasaia, Gipuzkoa.

Sherlock Holmes was written by Sir Arthur Conan Doyle.

Elizabeth II is the head of the 53-member Commonwealth of Nations.

Hasegawa et al. (2004), Banko et al. (2007), Yao et al. (2011).


Unsupervised Relation Extraction

C1: Mikel Laboa was born on June 15, 1934 in Pasaia, Gipuzkoa.

C2: Sherlock Holmes was written by Sir Arthur Conan Doyle.

C3: Elizabeth II is the head of the 53-member Commonwealth of Nations.

Hasegawa et al. (2004), Banko et al. (2007), Yao et al. (2011).

A domain- and language-independent, but less accurate, alternative!
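Here is a minimal sketch of context-based clustering in the spirit of Hasegawa et al. (2004):

1. Pool the words between each co-occurring named-entity pair into one context vector per pair.
2. Compare pairs by cosine similarity.
3. Merge pairs with complete-linkage agglomerative clustering until no two clusters are more similar than a threshold.

The sketch makes several simplifications and assumptions:

- It uses raw word counts with no term weighting.
- It does not group pairs by entity types.
- The `threshold` value is hypothetical.
- The function names are illustrative.

```python
from collections import Counter
from math import sqrt


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[w] for w, count in a.items() if w in b)
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def context_vectors(mentions):
    """Pool the intervening words of every (entity1, entity2) pair."""
    vectors = {}
    for e1, e2, words_between in mentions:
        vectors.setdefault((e1, e2), Counter()).update(w.lower() for w in words_between)
    return vectors


def complete_linkage(vectors, threshold=0.2):
    """Agglomerative clustering; cluster similarity = least similar member pair."""
    clusters = [[pair] for pair in vectors]

    def similarity(c1, c2):
        return min(cosine(vectors[a], vectors[b]) for a in c1 for b in c2)

    while len(clusters) > 1:
        best, bi, bj = -1.0, -1, -1
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                s = similarity(clusters[i], clusters[j])
                if s > best:
                    best, bi, bj = s, i, j
        if best < threshold:
            break  # no remaining pair of clusters is similar enough
        merged = clusters.pop(bj)  # bj > bi, so index bi is unaffected
        clusters[bi].extend(merged)
    return clusters


mentions = [
    ("Sony", "Tokyo", ["based", "in"]),
    ("Ford", "Detroit", ["headquartered", "based", "in"]),
    ("Bush", "Laura", ["wife"]),
    ("Obama", "Michelle", ["wife", "of"]),
]
print(complete_linkage(context_vectors(mentions)))
```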


Our approach

Adopt the unsupervised learning paradigm

Use a topic model to infer relations

Impose domain knowledge (e.g., task-specific constraints)

Fold-all framework (Andrzejewski et al., 2011) for relation extraction:

Relational LDA (Yao et al., 2011): represents relations via statistics of tuples (features) across documents

First-order logic rules capture domain knowledge, as in MLN (Richardson and Domingos, 2006)


Learning setting

Input: corpus of documents

Corpus: bag of relation tuples obtained from a dependency parser

Tuple: syntactic relationship between two named entities (NEs)

Source NE: Mikel Laboa (PER)

Target NE: Pasaia, Gipuzkoa (LOC)

Dependency path: ←nsubjpass←born→prep→in→pobj→

Output: assign tuples to clusters of semantic relations

Mikel Laboa was born in Pasaia, Gipuzkoa → relation#14
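Tuple construction can be sketched as follows. The sketch assumes a parse given as head indices and dependency labels; the toy parse of the example sentence is hand-written, not parser output. The path climbs from the source entity's head word to the lowest common ancestor and then descends to the target, using the slides' arrow notation. Paths longer than `max_edges` are discarded, mirroring the 10-edge cutoff applied to the training data. The functions `ancestors` and `dep_path` are hypothetical helpers.

```python
def ancestors(i, heads):
    """Return [i, head(i), head(head(i)), ..., root]; heads[root] == -1."""
    chain = [i]
    while heads[i] != -1:
        i = heads[i]
        chain.append(i)
    return chain


def dep_path(src, dst, tokens, heads, labels, max_edges=10):
    """Dependency path between two token indices, or None if too long."""
    up = ancestors(src, heads)
    down = ancestors(dst, heads)
    down_set = set(down)
    lca = next(a for a in up if a in down_set)  # lowest common ancestor
    up = up[: up.index(lca) + 1]        # src ... lca
    down = down[: down.index(lca) + 1]  # dst ... lca
    if (len(up) - 1) + (len(down) - 1) > max_edges:
        return None  # discard overly long paths
    parts = []
    for node in up[:-1]:  # climb from the source towards the ancestor
        if node != src:
            parts.append(tokens[node])
        parts.append("←" + labels[node] + "←")
    if lca not in (src, dst):  # entity words themselves are not part of the path
        parts.append(tokens[lca])
    for node in reversed(down[:-1]):  # descend from the ancestor to the target
        parts.append("→" + labels[node] + "→")
        if node != dst:
            parts.append(tokens[node])
    return "".join(parts)


tokens = ["Mikel", "Laboa", "was", "born", "in", "Pasaia"]
heads = [1, 3, 3, -1, 3, 4]  # index of each token's head; -1 marks the root
labels = ["nn", "nsubjpass", "auxpass", "root", "prep", "pobj"]
print(dep_path(1, 5, tokens, heads, labels))  # ←nsubjpass←born→prep→in→pobj→
```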


Relational LDA; First-Order Logic and Relational LDA

Modeling Framework

The model contains two main components:

Relational LDA: statistics on tuples (features) across documents; captures local information

First-order logic (FOL) rules: domain knowledge as in MLN; supply global constraints and capture intuitions about NE behavior, e.g.:

1. “ORG-LOC entities express the same relation”

2. “These two syntactic patterns do not express the same relation”

3. “This relation is incompatible with this named entity”

4. “The tuples are similar and should express the same relation type”


Relational LDA (Yao et al., 2011)

Example sentences annotated with relations:

"Mohammed Zahir Shah, said the body of the former monarch’s third son, Prince Shah Mahmood" (parents, title)

"Mahmood who was 56, died Monday in Italy" (age, date of death, country of death)

"The king’s wife Homeria, who died four month ago in Italy" (wife, date of death, country of death)


Relational LDA

Figure: documents, relation proportions and assignments, and relations. Each relation is shown with its top features and their probabilities:

PER-LOC 0.05, Live in 0.03, New York 0.02, native 0.01, ...

PER-ORG 0.04, President of 0.03, Republican 0.02, White House 0.01, ...

ORG-LOC 0.04, Based in 0.03, Division of 0.01, Headquarters 0.01, ...

PER-PER 0.08, Wife of 0.03, Wedding 0.02, Husband 0.01, ...


Posterior Distribution

Figure: documents, relation proportions and assignments, and relations.


Relational LDA: Graphical Model

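$$P(\mathbf{z}, \phi, \theta \mid \alpha, \beta, \mathbf{d}) \propto \prod_{r=1}^{R} p(\phi_r \mid \beta) \prod_{j=1}^{D} p(\theta_j \mid \alpha) \prod_{i=1}^{N} \theta_{d_i}(z_i) \prod_{k \in p_i} \phi_{z_i}(f_k)$$

Here:

- R, D, and N are the numbers of relations, documents, and tuples.
- φ_r is the feature distribution of relation r, and θ_j the relation proportions of document j.
- z_i is the latent relation of tuple i, and d_i is its document.
- p_i is the set of indices of the features f_k of tuple i.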
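To make the factorization concrete, here is a minimal collapsed Gibbs sampler for a model of this form. It is a sketch, not the inference used in the talk, which relies on alternating optimization with mirror descent. The sampler integrates out θ and φ under symmetric priors α and β. As a simplifying assumption, it uses a single shared feature vocabulary of size `n_feat`. Because every feature of tuple i is generated from the same relation z_i, the conditional multiplies one factor per feature, with counts incremented for repeated features within the tuple. The name `gibbs_rel_lda`, its parameters, and the data layout are hypothetical.

```python
import numpy as np


def gibbs_rel_lda(docs, n_rel, n_feat, alpha=0.1, beta=0.01, iters=200, seed=0):
    """Collapsed Gibbs sampling for a Relational-LDA-style model (sketch).

    docs: list of documents; each document is a list of tuples and each
          tuple is a list of feature ids in range(n_feat).
    Returns per-tuple relation assignments z, phi (R x F) and theta (D x R).
    """
    rng = np.random.default_rng(seed)
    n_dr = np.zeros((len(docs), n_rel))  # tuples of document d assigned to r
    n_rf = np.zeros((n_rel, n_feat))     # occurrences of feature f under r
    n_r = np.zeros(n_rel)                # all feature occurrences under r

    def update(d, feats, r, delta):
        n_dr[d, r] += delta
        for f in feats:
            n_rf[r, f] += delta
        n_r[r] += delta * len(feats)

    z = []
    for d, tuples in enumerate(docs):
        z_d = rng.integers(n_rel, size=len(tuples))  # random initial relations
        for feats, r in zip(tuples, z_d):
            update(d, feats, r, +1)
        z.append(z_d)

    for _ in range(iters):
        for d, tuples in enumerate(docs):
            for i, feats in enumerate(tuples):
                update(d, feats, z[d][i], -1)  # remove tuple i from the counts
                # log p(z_i = r | rest): document term times one factor per
                # feature, since all features of the tuple share relation z_i.
                logp = np.log(n_dr[d] + alpha)
                seen = {}
                for k, f in enumerate(feats):
                    c = seen.get(f, 0)  # earlier copies of f within this tuple
                    logp += np.log(n_rf[:, f] + beta + c) - np.log(n_r + n_feat * beta + k)
                    seen[f] = c + 1
                p = np.exp(logp - logp.max())
                z[d][i] = rng.choice(n_rel, p=p / p.sum())
                update(d, feats, z[d][i], +1)

    phi = (n_rf + beta) / (n_r[:, None] + n_feat * beta)
    theta = (n_dr + alpha) / (n_dr.sum(axis=1, keepdims=True) + n_rel * alpha)
    return z, phi, theta


docs = [[[0, 1], [0, 1, 1]], [[2, 3], [3], [0, 2]]]  # feature ids per tuple
z, phi, theta = gibbs_rel_lda(docs, n_rel=2, n_feat=4)
print(z, phi.round(2))
```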


Features

George Bush traveled to France on Thursday for a summit

source: first entity mention of the relation tuple (George Bush)

dest: second entity mention of the relation tuple (France)

nepair: type and order of the entity mentions (PER-LOC)

path: dependency path between the entity mentions ([subj, traveled, prep, to, pobj])

trigger: content words occurring in the dependency path (travel)
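The five feature types can be sketched as a small extractor. The sketch makes three assumptions. The tuple arrives with its path already extracted. Each path word is already lemmatized and POS-tagged with Penn Treebank tags. "Content word" is approximated by noun, verb, and adjective tags. The class `RelationTuple`, the helper `extract_features`, and the string encoding of each feature are hypothetical.

```python
from dataclasses import dataclass

# Penn Treebank tag prefixes treated as content words (an assumption).
CONTENT_TAG_PREFIXES = ("NN", "VB", "JJ")


@dataclass
class RelationTuple:
    source: str       # first entity mention
    dest: str         # second entity mention
    source_type: str  # NE type of the source, e.g. "PER"
    dest_type: str    # NE type of the dest, e.g. "LOC"
    path: list        # dependency labels and words between the mentions
    path_words: list  # (word, lemma, POS tag) for each word on the path


def extract_features(t):
    """Return the source, dest, nepair, path and trigger features as strings."""
    features = [
        f"source:{t.source}",
        f"dest:{t.dest}",
        f"nepair:{t.source_type}-{t.dest_type}",
        "path:" + " ".join(t.path),
    ]
    # Triggers: lemmas of content words on the dependency path.
    features += [
        f"trigger:{lemma}"
        for _word, lemma, tag in t.path_words
        if tag.startswith(CONTENT_TAG_PREFIXES)
    ]
    return features


t = RelationTuple("George Bush", "France", "PER", "LOC",
                  ["subj", "traveled", "prep", "to", "pobj"],
                  [("traveled", "travel", "VBD"), ("to", "to", "TO")])
print(extract_features(t))
# ['source:George Bush', 'dest:France', 'nepair:PER-LOC', 'path:subj traveled prep to pobj', 'trigger:travel']
```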


Adding logic to Relational LDA

Represent Relational LDA as a Markov Logic Network (Richardson and Domingos, 2006)

Encode rules and model variables in a weighted FOL knowledge base

Learn relations (φ) influenced by:

Feature-document statistics (Relational LDA)

Domain knowledge rules (as in MLN)


Undirected Relational LDA

We represent the directed graphical model as an undirected factor graph (a Markov random field)

A convenient representation for introducing logical knowledge


Logical predicates

Define logical predicates for each Relational LDA variable:

Value | Predicate | Description
z_i = r | Z(i, r) | latent relation of tuple i
f_k = v | F(k, v) | feature of a relation tuple
k ∈ p_i | P(i, f_k) | tuple i contains feature f_k
d_i = j | D(i, j) | observed document of tuple i

Z(i, r) is true if z_i = r and false otherwise. The other predicates are defined analogously.

Rules form a weighted knowledge base in Conjunctive Normal Form (CNF), where λ_l is the weight of rule ψ_l:

$$\mathrm{KB} = \{(\lambda_1, \psi_1), \ldots, (\lambda_L, \psi_L)\}$$


Logic Rules

Must-Link Tuple: tuples that share a feature express the same relation

$$\forall i, j, k, t:\; F(i, \text{path:is the president of}) \land P(j, f_i) \land P(k, f_i) \Rightarrow \neg Z(j, t) \lor Z(k, t)$$

Cannot-Link Tuple: rules expressing incompatibilities between tuples

$$\forall i, j, k, l, r:\; F(i, \text{nepair:PER-PER}) \land F(j, \text{nepair:ORG-LOC}) \land P(k, f_i) \land P(l, f_j) \Rightarrow \neg Z(k, r) \lor \neg Z(l, r)$$
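One way to see what these clauses demand of a hard assignment z is to count violated groundings. This sketch is an illustration, not the talk's code, and it uses the following setup:

- Each tuple is a set of feature strings.
- A must-link rule names one shared feature.
- A cannot-link rule names two features.
- An unordered pair of distinct tuples violates a must-link rule when z_j ≠ z_k.
- A pair violates a cannot-link rule when z_k = z_l.

For a fixed pair, exactly one ground clause is violated in each case: t = z_j for must-link, r = z_k for cannot-link. Counting pairs therefore counts violated ground clauses. For must-link, each unordered pair stands for two ordered groundings. The weights play the role of λ_l. The function name and data layout are hypothetical.

```python
from itertools import combinations, product


def weighted_violations(tuples, z, must_link, cannot_link):
    """Weighted count of violated ground clauses for a hard assignment z.

    tuples: list of feature-string sets, one per relation tuple.
    z: list of relation ids, z[i] for tuple i.
    must_link: list of (weight, feature) rules.
    cannot_link: list of (weight, feature_a, feature_b) rules.
    """
    def having(feature):
        return [i for i, feats in enumerate(tuples) if feature in feats]

    penalty = 0.0
    for weight, feat in must_link:
        # Tuples sharing the feature should take the same relation.
        penalty += weight * sum(z[j] != z[k] for j, k in combinations(having(feat), 2))
    for weight, feat_a, feat_b in cannot_link:
        # Tuples with incompatible features should not share a relation.
        penalty += weight * sum(
            z[k] == z[l] for k, l in product(having(feat_a), having(feat_b)) if k != l
        )
    return penalty


tuples = [
    {"path:is the president of", "nepair:PER-ORG"},
    {"path:is the president of", "nepair:PER-ORG"},
    {"nepair:PER-PER"},
    {"nepair:ORG-LOC"},
]
must = [(1.0, "path:is the president of")]
cannot = [(2.0, "nepair:PER-PER", "nepair:ORG-LOC")]
print(weighted_violations(tuples, [0, 0, 1, 1], must, cannot))  # 2.0
print(weighted_violations(tuples, [0, 1, 1, 2], must, cannot))  # 1.0
```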


Inference

Rule Grounding

The KB is tied to the probabilistic model via its groundings

G(ψ_l): the set of ground formulas g of rule ψ_l

Obtain the tuple indices that satisfy the rule

Alternating Optimization with Mirror Descent

Combinatorial explosion: there are far too many groundings to enumerate. We therefore use a continuous relaxation and random sampling of groundings to perform Stochastic Mirror Descent over the latent relation-type assignments (Andrzejewski et al., 2011)

Iterate between: 1. MAP(φ, θ), 2. arg max(z), 3. MirrorDescent(z_KB)
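Step 3 can be illustrated with entropic mirror descent (exponentiated gradient), the standard mirror-descent update on the probability simplex. This sketch rests on stated assumptions:

- Each tuple's assignment z_i is relaxed to a distribution q_i over the R relations.
- φ and θ are held fixed, with their contribution precomputed in a `log_lik` matrix.
- Ground must-link and cannot-link clauses are scored with a simple bilinear relaxation (+q_j·q_k and −q_j·q_k), chosen here for illustration. Andrzejewski et al. (2011) use their own relaxation of the clause indicators.

A random mini-batch of groundings is sampled at each step, which makes the method stochastic. The function `mirror_descent` and all of its parameters are hypothetical.

```python
import numpy as np


def mirror_descent(log_lik, must_pairs, cannot_pairs, lam=1.0, eta=0.5,
                   steps=100, batch=8, seed=0):
    """Stochastic entropic mirror descent over relaxed relation assignments.

    log_lik: (N, R) array; row i holds log theta_{d_i}(r) + sum_k log phi_r(f_k)
             for every relation r, computed from the current (fixed) phi, theta.
    must_pairs, cannot_pairs: lists of (j, k) tuple-index pairs (ground clauses).
    Returns the arg max assignment z and the relaxed assignments q.
    """
    rng = np.random.default_rng(seed)
    n, n_rel = log_lik.shape
    q = np.full((n, n_rel), 1.0 / n_rel)  # start from uniform distributions
    for _ in range(steps):
        grad = log_lik.copy()  # gradient of sum_i q_i . log_lik_i
        for pairs, sign in ((must_pairs, 1.0), (cannot_pairs, -1.0)):
            if not pairs:
                continue
            # Sample a mini-batch of groundings; rescale for an unbiased gradient.
            pick = rng.choice(len(pairs), size=min(batch, len(pairs)), replace=False)
            scale = len(pairs) / len(pick)
            for p in pick:
                j, k = pairs[p]
                # Relaxed clause score: +lam * q_j . q_k (must) or -lam * q_j . q_k (cannot).
                grad[j] += sign * lam * scale * q[k]
                grad[k] += sign * lam * scale * q[j]
        # Exponentiated-gradient (entropic mirror ascent) step, row by row.
        q = q * np.exp(eta * (grad - grad.max(axis=1, keepdims=True)))
        q /= q.sum(axis=1, keepdims=True)
    return q.argmax(axis=1), q


log_lik = np.log(np.array([[0.9, 0.1], [0.45, 0.55], [0.1, 0.9]]))
z, q = mirror_descent(log_lik, must_pairs=[(0, 1)], cannot_pairs=[], lam=2.0)
print(z)  # [0 0 1]
```

Without the must-link clause, tuple 1 would follow its own slight preference for relation 1. The clause pulls it towards the relation of tuple 0.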


Experimental Setup

Experiments: Data

Training data set:

Documents from the New York Times (2000-2007)

Tokenized, sentence-split, POS-tagged, NE-tagged, and dependency-parsed

Discard tuples whose dependency paths are longer than 10 edges

Evaluation data set:

Test on the ACE 2007 RDC (relation detection and characterization) test set

6 general relation types and 18 subtypes

25% of the ACE 2007 training set used for development

Logic Rules:

Extracted automatically from the NYT corpus

Heuristic approach based on feature co-occurrences


Rule examples

Must-link Tuple

nepair:PER-PER ∧ trigger:wife

nepair:PER-LOC ∧ trigger:die

path:←nsubj←die→prep→in→pobj→

source:Kobe ∧ dest:Lakers

Cannot-link Tuple

nepair:ORG-LOC vs nepair:PER-PER

nepair:LOC-LOC vs trigger:president

nepair:PER-LOC vs trigger:member

nepair:PER-PER vs trigger:sell


Experiments: Evaluation

Baseline models:

Hasegawa et al. (2004): clustering of co-occurring named entities

LDA: estimates relations over a bag of words

Relational LDA: without domain knowledge

F-score measure (Agirre and Soroa, 2007)

Precision: correct members of cluster / items in cluster

Recall: correct members of cluster / items in gold-standard class
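The cluster F-score can be sketched as follows. It assumes the formulation common in word-sense-induction evaluation: each gold class is matched to the cluster with the best harmonic mean of the precision and recall defined above. The per-class scores are then averaged, weighted by class size. The function name `cluster_fscore` is hypothetical, and details such as tie handling may differ from Agirre and Soroa (2007).

```python
from collections import Counter


def cluster_fscore(gold, pred):
    """Class-size-weighted average of each gold class's best F over clusters."""
    n = len(gold)
    class_size = Counter(gold)
    cluster_size = Counter(pred)
    joint = Counter(zip(gold, pred))  # items of class c placed in cluster k
    total = 0.0
    for c, n_c in class_size.items():
        best = 0.0
        for k, n_k in cluster_size.items():
            n_ck = joint[(c, k)]
            if n_ck == 0:
                continue
            precision = n_ck / n_k  # correct members / items in cluster
            recall = n_ck / n_c     # correct members / items in gold class
            best = max(best, 2 * precision * recall / (precision + recall))
        total += n_c / n * best
    return total


gold = ["employment", "employment", "sports", "sports", "sports"]
pred = [0, 0, 1, 1, 0]
print(round(cluster_fscore(gold, pred), 3))  # 0.8
```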


Results

Figure: F-score (%) by relation subtype and relation type (y-axis from 20 to 50) for HASEGAWA, LDA, RELLDA, MLT ALL, CLT ALL, CLT+MLT ALL, MLT NE, CLT NE, and CLT+MLT NE.


Examples

Employment

source | path | dest
Republican | president of | Senate
Senate | director of | Yankees
House | professor at | Republican
Bush | chairman of | Congress
Democrat | spokesman for | House
Mr. Bush | executive of | Mets
Democrats | director at | U. of California
Republican | analyst at | United Nations

Sports

source | path | dest
Yankees | defeat | World Series
Mets | win | Olympic
United States | beat | World Cup
Giants | play | Yankees
Jets | win | Super Bowl
Nets | lose | Olympics
Knicks | sign | Mets
Rangers | victory over | Giants


Conclusions

New model for unsupervised relation extraction

Clusters tuples into underlying semantic relations

Incorporates domain knowledge expressed in first-order logic

Competitive results on ACE 2007

In the future...

Explore additional rule types (e.g., seed rules)

Learn different rule weights

Exploit Freebase-like knowledge bases to learn new rules

Generalise rules to improve their impact
