Joint Information Extraction Ph.D. Thesis Defense Qi Li Advisor: Heng Ji Computer Science Department Rensselaer Polytechnic Institute April 7th, 2015 Doctoral.

Joint Information Extraction

Ph.D. Thesis Defense

Qi Li

Advisor: Heng Ji

Computer Science Department

Rensselaer Polytechnic Institute

April 7th, 2015

Doctoral Committee

Dr. Heng Ji, Chair, RPI

Dr. James Hendler, RPI

Dr. Peter Fox, RPI

Dr. Dan Roth, UIUC

Dr. Daniel Bikel, Google

2

…, dozens of Israeli tanks advanced into the northern Gaza Strip backed by helicopters which fired at least three rockets in the Jabaliya area, Palestinian security sources said. …

advanced

fired

Owner Vehicle

Destination

Instrument

Place

Instrument

Event:Attack

Event: Transport

AFP 2003/03/05

Israeli

GPE

tanks

VEHICLE

Gaza Strip

LOCATION

helicopters

VEHICLE

Jabaliya area

LOCATION

rockets

WEAPON

Relation

Background

Entity Mention: a mention of an entity in the world

Relation: a semantic relationship between two entity mentions

ACE Relation Type: Example

Physical: a town(GPE) some 50 miles south of Salzburg(GPE)

Person-Social: Relatives(PER) of the dead(PER)

EMP-ORG: The tire maker(ORG) still employs 1,400(PER)

Agent-Artifact: Rubin Military Design, the makers(ORG) of the Kursk(VEH)

PER/ORG Affiliation: Republican(ORG) senators(PER)

GPE-Affiliation: Salzburg(GPE) Red Cross officials(PER)

• ACE: Automatic Content Extraction

3

Background


Event Mention: an occurrence of an event with a particular type and subtype

Event Mention Trigger (anchor): the word (or phrase) that most clearly expresses the event mention

Event Argument: an entity mention that serves as a participant or attribute of the event

“In Baghdad, a cameraman died when an American tank fired on the Palestine Hotel.”

Trigger: fired

ACE Type: Attack

Place: Baghdad

Target: Palestine Hotel

Instrument: tank

• ACE: Automatic Content Extraction

4

Outline
• Background
• Overview
• Joint Extraction Framework
  o Leverage Cross-component Dependencies
• Cross-document Inference
  o Leverage Cross-doc Dependencies
• Bilingual Name Tagging
  o Leverage Cross-lingual Dependencies
• Conclusions & Future Directions
• Related Publications

5

6

IE Hill-climbing

Texts → Entity Mentions → Relations → Events

• Entity Mentions: 90% [McCallum et al., 2003; Finkel et al., 2005; Florian et al., 2006; Ji&Grishman, 2006]
• Relations: 50% [Sun2011; Jiang2007; Bunescu 2005; Zhao2005; Qian2010; Chan&Roth2011; Plank 2013]
• Events: 40% [Ji&Grishman2008, Liao2010, Hong2011]

Traditional Approach
• Focused on each individual subtask
• Lack of global inference about the entire results
• End-to-end performance is limited
  o Relation: 40.8%
  o Event Arg: 36.6%

7

Overview

This thesis investigates cross-component, cross-document, and cross-lingual dependencies to improve Information Extraction.

8

Cross-component Dependencies

Different components have various dependencies
• Long-distance dependencies
• Dependencies among multiple subtasks

9

“Since the June 4 summit in Jordan between Abbas, Sharon and George W. Bush, Hamas has been a thorn in the side of Abbas ...”

Member of (George W. Bush, Republican Party): 0.91

Member of (George W. Bush, Hamas): 0.47

Cross-document Dependencies

“The list included Sheik Ahmed Yassin, Hamas’ founder and spiritual leader, senior Hamas official Abdel Aziz Rantisi”

10

Cross-lingual Dependencies

Different languages in parallel corpora are complementary
• Resources
• Patterns, features, and language phenomena

11

Related Joint Modeling Methods

• Constrained Conditional Models, ILP Inference [Roth2004; Punyakanok2005; Roth2007; Chang2012; Yang2013; Jindal&Roth2013; Cheng&Roth2013]
  o Our method is a single unified joint model for both learning and inference
• Re-ranking Methods [Ji2005; Huang2002; Chen2010; McClosky2011]
  o Their models were separately learned
  o Need additional training data for re-ranking
• Probabilistic Graphical Models [Sutton2004; Poon2007; Poon2010; Kiddon2012; Wick2012; Singh2013]
  o Computationally expensive
  o Our method uses beam search, and thus can explore global features at low cost

12

Contributions

I. Leverage Cross-component Dependencies
  • Proposed a new representation: information networks
  • The first attempt to extract entity mentions, relations and events in a single joint model
  • Achieved state-of-the-art performance for each subtask

II. Leverage Cross-doc Dependencies
  • Our method can effectively remove incorrect IE results without using any additional training data

III. Leverage Cross-lingual Dependencies
  • Modeled the task of name tagging in parallel corpora as a single structured prediction problem

Outline
• Background
• Overview
I. Joint Extraction Framework
II. Cross-document Inference
  o Leverage Cross-doc Dependencies
III. Bilingual Name Tagging
  o Leverage Cross-lingual Dependencies
• Conclusions & Future Directions
• Related Publications

13

14

Input: “Asif Mohammed Hanif detonated explosives in Tel Aviv”

• Typical pipelined architecture

Entity Mention Boundaries + Types
  PER: Asif Mohammed Hanif
  GPE: Tel Aviv
  WEA: explosives

Relation Extraction
  Relation: Physical
  arg-1: PER Asif Mohammed Hanif
  arg-2: GPE Tel Aviv

Event Extraction
  Event: Attack
  trigger: detonated
  attacker: PER Asif Mohammed Hanif
  place: GPE Tel Aviv
  instrument: WEA explosives

Motivation

• A Closer Look at Event Pipeline [Ji and Grishman 2008, Liao and Grishman 2010, Hong et al. 2011]

Event Extraction
1. Trigger Detection: Pattern matching / Classifier
2. Argument Classification: Maxent / SVMs Binary Classifier
3. Arg Role Classification: Maxent / SVMs Multiclass Classifier
4. Reportability Classification: Maxent / SVMs Binary Classifier

☹ Components do not talk to each other
  • Error propagation without feedback
☹ Incapable of dealing with global dependencies

Interactions among Multiple Components

• “fired” is ambiguous: Attack, End-Position, NIL …
• Argument labeling can benefit event trigger decisions

1) “In Baghdad, a cameraman died when an American tank fired on the Palestine Hotel.”
   American tank: Instrument (Attack)

2) “He has fired his air defense chief.”
   Air defense chief: Position (End-Position)

(Figure arc labels: victim, target)

Interactions among Multiple Components

• Make use of arbitrary global features
• For example:
  o An Attack event usually has one Place
  o Penalize assignments with more than one Place

In Baghdad, a cameraman died when a tank fired on a hotel.
(arguments of “fired”: Baghdad = place, hotel = place)

In Baghdad, a cameraman died when a tank fired on a hotel.
(arguments of “fired”: Baghdad = place, hotel = target)
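The Place example can be sketched as a single global feature. This is a minimal illustration, not the thesis implementation: the function names, the role strings, and the weight of -2.0 are assumptions made for the example.

```python
from collections import Counter

def count_roles(assignment):
    """Count how often each role is used among one trigger's arguments."""
    return Counter(role for _, role in assignment)

def multiple_place_feature(assignment):
    """Global feature: 1 if the trigger has more than one Place, else 0."""
    return 1 if count_roles(assignment)["place"] > 1 else 0

def score(assignment, local_score, weight=-2.0):
    # weight is a hypothetical learned weight; a negative value penalizes
    # assignments where the global feature fires.
    return local_score + weight * multiple_place_feature(assignment)

# The two candidate assignments for the trigger "fired"
two_places = [("Baghdad", "place"), ("hotel", "place")]
place_and_target = [("Baghdad", "place"), ("hotel", "target")]
```

With equal local scores, the assignment with one Place and one Target ranks above the one with two Places.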

Joint Extraction w/ Inexact Search

18

Propose a novel representation: Information Networks

Joint Extraction of Entity Mentions, Relations, and Events

Example: “Kiichiro Toyoda founded the automaker”

(Figure labels: Joint Search Algorithm, search space, beam)

• Jointly Construct Information Networks
  o Nodes: entity mentions, event triggers
  o Edges: relations, event argument links

1. Joint Search Algorithm
  • beam search
2. Evaluate Candidates
  • local and global features
3. Estimate Feature Weights
  • structured perceptron
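One way to picture an information network is as a small graph over token spans. The sketch below is an assumption about a convenient data layout, not the representation used in the thesis; the class names and the specific labels chosen for the example sentence (Start-Org, Agent, Org, ORG-AFF) are illustrative.

```python
from dataclasses import dataclass, field

@dataclass(frozen=True)
class Node:
    start: int   # index of the first token
    end: int     # index one past the last token
    label: str   # entity type or event type

@dataclass(frozen=True)
class Edge:
    head: Node
    tail: Node
    label: str   # relation type or argument role

@dataclass
class InformationNetwork:
    tokens: list
    nodes: list = field(default_factory=list)
    edges: list = field(default_factory=list)

    def text(self, node):
        """Return the surface string covered by a node."""
        return " ".join(self.tokens[node.start:node.end])

tokens = "Kiichiro Toyoda founded the automaker".split()
net = InformationNetwork(tokens)
person = Node(0, 2, "PER")
trigger = Node(2, 3, "Start-Org")   # illustrative event type
org = Node(3, 5, "ORG")
net.nodes += [person, trigger, org]
net.edges += [
    Edge(trigger, person, "Agent"),   # event argument link
    Edge(trigger, org, "Org"),        # event argument link
    Edge(person, org, "ORG-AFF"),     # relation between two entity mentions
]
```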

Joint Extraction w/ Inexact Search

19

(Figure: the search space is exponentially large; the beam covers only part of it)

Search Algorithms
• Joint search algorithm for multiple IE components
  o Beam search
    • Node-step: extract triggers, entity mentions, etc.
    • Edge-step: extract relation links, event-argument links.
  o Token-based vs. Segment-based decoding
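A minimal sketch of beam search over node labels, assuming a token-based decoder and a caller-supplied scoring function. It covers only the node-step; the edge-step is omitted. The toy scorer and its gazetteer are hypothetical stand-ins for a learned model.

```python
def beam_search(tokens, labels, score_fn, beam_size=2):
    """Label tokens left to right, keeping the best beam_size prefixes."""
    beam = [()]
    for _ in tokens:
        # Node-step: extend every kept prefix with every candidate label
        candidates = [prefix + (label,) for prefix in beam for label in labels]
        candidates.sort(key=lambda p: score_fn(tokens, p), reverse=True)
        beam = candidates[:beam_size]
    return beam[0]

def toy_score(tokens, prefix):
    # Hypothetical scorer: one point per token whose label matches a tiny
    # hand-written gazetteer; a real model would use weighted features.
    gazetteer = {"Baghdad": "LOC", "tank": "VEH"}
    return sum(1.0 for tok, lab in zip(tokens, prefix)
               if gazetteer.get(tok, "O") == lab)

best = beam_search(["In", "Baghdad"], ["O", "LOC", "VEH"], toy_score)
```

Because only beam_size prefixes survive each step, the search is inexact: the best full structure can be pruned early, which is what motivates the early-update training strategy.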

20

Token-based vs. Segment-based

Token-based (Florian et al., 2006) (Ratinov & Roth, 2009):
The/O tire/B-ORG maker/L-ORG still/O employs/O 1,400/U-PER

Segment-based (Sarawagi & Cohen, 2004) (Zhang & Clark, 2008):
The [tire maker]ORG still employs [1,400]PER

BILOU schema: B-X: beginning of X; I-X: inside of X; L-X: last token of X; U-X: single token of X; O: no type
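The relation between the two views can be shown by converting segments into BILOU tags. This is a small sketch; the function name and the (start, end, label) segment format with an exclusive end index are assumptions for the example.

```python
def segments_to_bilou(n_tokens, segments):
    """Convert (start, end, label) segments (end exclusive) to BILOU tags."""
    tags = ["O"] * n_tokens
    for start, end, label in segments:
        if end - start == 1:
            tags[start] = f"U-{label}"        # single-token mention
        else:
            tags[start] = f"B-{label}"        # first token
            for i in range(start + 1, end - 1):
                tags[i] = f"I-{label}"        # inside tokens
            tags[end - 1] = f"L-{label}"      # last token
    return tags

tokens = ["The", "tire", "maker", "still", "employs", "1,400"]
tags = segments_to_bilou(len(tokens), [(1, 3, "ORG"), (5, 6, "PER")])
```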

Parameter Estimation

• Structured Perceptron with Beam Search
  o Update Weights:
    • Perceptron Update
    • K-best MIRA (Margin Infused Relaxed Algorithm) [McDonald et al., 2005]

21

(Figure: input → beam search → prediction; the prediction is compared with the ground-truth to update weights)

Standard-update vs. Early-update [Collins and Roark 2004, Huang et al. 2012]
• standard update: update against the global 1-best z after full decoding; this can be an invalid update when the correct solution y has already fallen off the beam
• early update: stop as soon as the ground-truth prefix falls off the beam, and update against the 1-best prefix z
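Early update can be sketched as follows for a toy token-labeling task. This is an illustration under simplifying assumptions (word-label indicator features, plain perceptron update, no averaging); the function names are hypothetical and the code is not the thesis implementation.

```python
from collections import defaultdict

def features(tokens, labels):
    """Indicator features: one (word, label) feature per labeled token."""
    feats = defaultdict(float)
    for tok, lab in zip(tokens, labels):
        feats[(tok, lab)] += 1.0
    return feats

def score(weights, tokens, labels):
    return sum(weights[f] * v for f, v in features(tokens, labels).items())

def update(weights, tokens, gold_prefix, pred_prefix):
    """Perceptron update: add gold features, subtract predicted features."""
    for f, v in features(tokens, gold_prefix).items():
        weights[f] += v
    for f, v in features(tokens, pred_prefix).items():
        weights[f] -= v

def train_step(weights, tokens, gold, label_set, beam_size=1):
    """Decode with a beam; return the step where an update happened, or None."""
    beam = [()]
    for i in range(len(tokens)):
        cands = [p + (l,) for p in beam for l in label_set]
        cands.sort(key=lambda p: score(weights, tokens, p), reverse=True)
        beam = cands[:beam_size]
        gold_prefix = tuple(gold[:i + 1])
        if gold_prefix not in beam:
            # Early update: the ground-truth prefix fell off the beam
            update(weights, tokens, gold_prefix, beam[0])
            return i + 1
    if beam[0] != tuple(gold):
        update(weights, tokens, tuple(gold), beam[0])
        return len(tokens)
    return None

weights = defaultdict(float)
tokens, gold = ["tank", "fired"], ["O", "Attack"]
first = train_step(weights, tokens, gold, ["Attack", "O"])
second = train_step(weights, tokens, gold, ["Attack", "O"])
```

In the first pass the gold prefix is pruned at the first token, so the update uses only that prefix; the second pass decodes the sentence correctly and makes no update.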

Token-based Search Algorithm

• Assume argument candidates are given
• Decoding example (beam size = 1):

In Baghdad, a cameraman died when an American tank fired on the Palestine Hotel.

(Figure: entity mention types LOC, PER, VEH, FAC; trigger labels Die for “died” and Attack for “fired”, O for all other tokens; argument links place, victim, target, instrument)

Segment-based Search Algorithm

• Limitations of the Token-based decoder
  o unfair to compare nodes with different boundaries
    • Complete mention is biased by the model
  o difficult to synchronize edge steps
    • (New[B-FAC] York[I-FAC]) is not yet a complete mention, so no link can be made at this step

(Figure labels: “Not parsed yet”, ✓, ✗)


• Node-step (search for entity mentions and event triggers)
  o propose various nodes at the current token
  o append to previous assignments
  o evaluate and rank new assignments

Example: “Asif Mohammed Hanif detonated explosives in Tel Aviv”

(Figure sequence: candidate nodes such as PER, ORG, O, … are proposed; Segment-based Feature example with Context Features: noun phrase, person gazetteer, previous word: “the”, …; trigger candidates such as Attack, Injure, PER, O, …; each candidate is appended to the previous prefixes; buffer at “hanif”)


• Edge-step (search for relation/argument links)
  o At each sub-step, connect each new node with a previous one by a typed edge, or NIL.

(Figure: PER and WEAPON nodes are linked to an Attack trigger by Attacker and Instrument edges, and to each other by an agent-artifact relation. Relation-Event Feature: Attacker, Instrument, Agent-artifact)

• Return the candidate with the highest model score as the final prediction

(Figure: candidates; the selected network has nodes PER, Attack, WEAPON, O, GPE and edges attacker, instrument, place, agent-artifact, physical)

• The maximal length of each node type
  o ORG example: “Pearl River Hang Cheong Real Estate Consultants Ltd”


31

Local Features
• Local Features
  o Similar to the features in pipelined approaches
  o Only care about local decisions

In Baghdad, a cameraman died when an American tank fired on the Palestine Hotel

(Figure arc labels: place, target, instrument)

1. Trigger Word: “fired”
2. Trigger POS: VBD
3. Argument Word: “Baghdad”
4. Dependency Path: argument, prep_in, “died”, advcl, trigger
…

Global Features
• Global Features
  o Involve a wider range of the output structure
  o Ask arbitrary questions about the entire structure

In Baghdad, a cameraman died when an American tank fired on the Palestine Hotel

(Figure arc labels: place, target, instrument)

1. Does “fired” have only one Place?
2. Is “Baghdad” an argument to “died”?
3. …

Global Trigger Feature

34

“a cameraman died when an American tank fired on …”

• Dependency link: Die and Attack connected by advcl
• Context word: “when” between Die and Attack

advcl: adverbial clause modifier

Global Argument Feature

o two triggers share the same mention as arguments

“a cameraman died when an American tank fired on …”

(Figure: Die(“died”) and Attack(“fired”) are linked by advcl; Entity(“cameraman”) is the Victim of Die and the Target of Attack)

Global Entity Mention Features

• Neighbor entity mentions should have coherent types

36

“Barbara Starr was reporting from the Pentagon”

Positive feature: PER and FAC linked by prep_from

Negative feature: PER and PER linked by prep_from

prep_from: prepositional modifier “from”

Global Relation Features
• Dependency compatibility
  o two dependent mentions should have compatible relations

“U.S. forces in Somalia, Haiti and Kosovo”

(Figure: GPE(“Somalia”) and GPE(“Kosovo”) are linked by conj_and; PER(“forces”) has a PHYS relation with each)

conj_and: conjunction by “and”

• Data Sets
  o ACE’05 corpus: excluding informal genres cts and un
  o ACE’04 corpus: bnews and nwire subsets

• Evaluate the performance of each subtask and of the end-to-end systems using the F1 measure

Data Set       # sents  # mentions  # relations  # triggers  # args
ACE’05 Train   7.2k     26.4k       4.7k         2.8k        4.5k
ACE’05 Dev     1.7k     6.4k        1.1k         0.7k        1.1k
ACE’05 Test    1.5k     5.4k        1.1k         0.6k        1.0k
ACE’04         6.7k     22.7k       4.3k         N/A         N/A

Experiments

• Results on ACE’05 With gold-standard entity mentions, values and timex

Experiments

39

Token-based Decoder

[Q. Li, H. Ji, L. Huang. ACL 2013]

• Results on ACE’05 (Li and Ji, ACL 2014)

40

End-to-end Relation Extraction

Experiments

[Q. Li, H. Ji. ACL 2014]

• Three types of loss functions in K-best MIRA
  o F1 Measure
  o 0-1 loss
  o Similar to F1 loss, but sensitive to the size of structures

(Figure: an Injure trigger with a PER argument in the Victim role)
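The first two loss functions can be sketched over sets of predicted structure elements (for example, labeled nodes and edges). This is a minimal illustration with hypothetical function names; the third, size-sensitive loss is not reproduced because its formula does not survive in the slide text.

```python
def f1_loss(gold, pred):
    """1 - F1 between the gold and predicted sets of structure elements."""
    gold, pred = set(gold), set(pred)
    if not gold and not pred:
        return 0.0
    correct = len(gold & pred)
    return 1.0 - 2.0 * correct / (len(gold) + len(pred))

def zero_one_loss(gold, pred):
    """0 if the predicted structure matches the gold exactly, else 1."""
    return 0.0 if set(gold) == set(pred) else 1.0

# Toy structure: one trigger node and one argument edge
gold = {("node", "died", "Injure"), ("edge", "died", "cameraman", "Victim")}
pred = {("node", "died", "Injure")}
```

The F1 loss gives partial credit for the correct trigger, while the 0-1 loss treats any deviation from the gold structure as a full error.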

Experiments: Complete Model (Entity Mention, Relation, Event)

• Overall Performance [Q. Li, H. Ji, Y. Hong, S. Li. EMNLP 2014]

Approach                     Entity Mention  Relation  Event Trigger  Event Argument
Preliminary Results
  Pipelined Baseline         79.5            51.6      64.4           35.7
  Pipelined + Token-based    -               -         64.5           43.1
  Li and Ji (2014)           80.8            52.1      -              -
Complete Joint Model
  Joint w/ Avg. Perceptron   81.0            52.0      65.3           45.6
  Joint w/ MIRA w/ F1 Loss   79.0            49.2      61.5           47.4
  Joint w/ MIRA w/ 0-1 Loss  80.0            51.0      63.2           47.9
  Joint w/ MIRA w/ Loss 3    80.7            52.8      65.2           46.8

Remaining Challenges
• Capture world knowledge
  o Williams picked up the child and this time, threw[Attack] her out the window.
  o We believe that the likelihood of them using[Attack] those weapons goes up.
• Disambiguate physical and non-physical events
  o Sam Brownback vowed Monday to defend Kansas' ban of ...
  o it is still hurts me to read this. (“hurt” is not an attack here)
• Pronoun resolution
  o It’s important that people know that we don’t believe in the war[Attack].
  o Nobody questions whether this[Attack] is right or not.
• Semantic inference
  o Negotiations between Washington and Pyongyang on their nuclear dispute have been set for April 23 in Beijing and are widely seen here as a blow to Moscow efforts to stamp authority on the region by organizing such a meeting.

This work
• Provided a novel view of the whole task.
• Significantly improved the end-to-end performance.
• Is limited to a single sentence and a single language.

Can we go beyond the sentence boundaries, and break the barrier of different languages?

Next: we study cross-doc dependencies and cross-lingual dependencies.





• Conclusions & Future Directions
• Related Publications

45

Joint Inference for Cross-doc IE

• Input and output (figure): A large collection of documents → Sentence-level IE System → Information graph → Joint Inference → Cleaned graph


47

• Estimate constraints from the ACE 2005 corpus using point-wise mutual information (PMI)
  o computed from the frequency of each relation Li and the co-occurrence frequency of Li and Lj, for any two entities

• PMI < threshold → incompatible results; use as hard constraint
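A minimal sketch of the PMI test, assuming the standard definition log(p(Li, Lj) / (p(Li) p(Lj))) estimated from counts; the exact formula on the slide is not recoverable, so this may differ in detail. The observation format (one set of labels per entity pair), the toy data, and the threshold of 0 are assumptions for illustration.

```python
import math

def pmi(a, b, observations):
    """PMI of labels a and b, where each observation is the set of
    relation/event labels seen for one pair of entities."""
    n = len(observations)
    fa = sum(1 for o in observations if a in o)
    fb = sum(1 for o in observations if b in o)
    fab = sum(1 for o in observations if a in o and b in o)
    if fab == 0:
        return float("-inf")   # never co-occur
    return math.log(fab * n / (fa * fb))

def incompatible(a, b, observations, threshold=0.0):
    """Treat a low-PMI label pair as a hard constraint."""
    return pmi(a, b, observations) < threshold

# Toy label sets, not real ACE statistics
obs = [{"Business"}, {"Business"}, {"Person-Social"}, {"Person-Social"},
       {"Member", "Meet"}, {"Member", "Meet"}]
```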

• Example of hard constraints
  o 34 pairwise and 16 triangle

Pairwise (Li, Lj)
  o Li: Person A founded Organization B
    Lj: Organization B hired Person A
  o Li: Person A has a Business relation with Person B
    Lj: Person A has a Person-Social relation (e.g. family) with Person B

Triangle Entity (Li, Lj)
  o Li: Person A and B are involved in Meet events
    Lj: Person A and Person B are members of the same Organization C

Triangle Link (Li, Lj, Lk)
  o Li: Entity A is involved in a Transport event originated from Location B
    Lj: Person C is affiliated with/member of Entity A
    Lk: Person C is located in Location B

(Figure: example graph with Meet and Member links and the entity Democratic Party; legend: relation/event, entity)


48


• ILP inference framework

49

p(i,j) is the local confidence for the j-th occurrence of the i-th fact

If xi is 1, the i-th fact is selected; otherwise, the i-th fact is discarded
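The selection variables xi can be illustrated with a tiny brute-force search in place of an ILP solver. The objective used here (sum of p(i,j) - 0.5 over selected facts) is an assumption made for the sketch, since the slide's objective function is not recoverable; only the confidences 0.91 and 0.47 come from the earlier George W. Bush example.

```python
from itertools import product

def select_facts(confidences, incompatible):
    """confidences[i] lists p(i, j) for each occurrence j of fact i;
    incompatible is a set of (i, k) fact pairs that cannot both be kept."""
    n = len(confidences)
    best_x, best_value = None, float("-inf")
    for x in product([0, 1], repeat=n):
        # Hard constraints: never select two incompatible facts together
        if any(x[i] and x[k] for i, k in incompatible):
            continue
        # Assumed objective: reward confident facts, penalize weak ones
        value = sum(x[i] * sum(p - 0.5 for p in confidences[i])
                    for i in range(n))
        if value > best_value:
            best_x, best_value = x, value
    return best_x

# Fact 0: Member of (George W. Bush, Republican Party), confidence 0.91
# Fact 1: Member of (George W. Bush, Hamas), confidence 0.47
x = select_facts([[0.91], [0.47]], {(0, 1)})
```

Exhaustive enumeration is exponential in the number of facts, so it only serves to show the constraint logic; a real system would hand the same variables and constraints to an ILP solver.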


• Data Set: DARPA GALE distillation task data, 381,588 newswire

• Scoring metric: Browsing Cost (i) = the number of incorrect or redundant facts that a user must examine before finding i correct facts
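The metric can be computed directly from a ranked list of facts. A small sketch, assuming each item carries a correctness flag and that a repeated correct fact counts as redundant; the function name and input format are hypothetical.

```python
def browsing_cost(ranked_facts, i):
    """Number of incorrect or redundant facts examined before finding
    i distinct correct facts; None if fewer than i correct facts exist."""
    seen, found, cost = set(), 0, 0
    for fact, is_correct in ranked_facts:
        if is_correct and fact not in seen:
            seen.add(fact)
            found += 1
            if found == i:
                return cost
        else:
            cost += 1   # incorrect, or a correct fact already seen
    return None

ranked = [("a", True), ("x", False), ("a", True), ("b", True)]
```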

50

(Figure axis labels: # of correct relations in objective function, # of incorrect relations, # of removals)

Limitations:
• Only took 1-best results as input
• Cross-document co-reference is accomplished by string matching

[Qi Li et al. CIKM 2011]





• Conclusions & Future Work
• Related Publications

51

Bilingual Name Tagging

52

• Monolingual name tagging for parallel corpora
  ☹ lack of cross-lingual consistency
  ☺ features from two languages are complementary

o Chinese side is ambiguous between Person name and Organization name

o English side has more clues for Organization: capitalization, key word “Bank”, and “the”

BIO schema: B-X: beginning of X;I-X: within X; O: no type

Bilingual Name Tagging
• Linear-chain CRFs with cross-lingual features
  o two separate classifiers
  o propagate features from one language to the other using word alignment
  o implicitly enforce bilingual consistency

English Features for “亚行”:
  first word = the
  word = Asian
  word = Development
  last word = Bank
  Capitalized = True
  …
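Feature propagation through word alignment can be sketched as follows. The function name, the feature keys, and the alignment format (Chinese token index mapped to a list of English token indices) are assumptions for illustration; a real system would feed these features into the CRF for the Chinese side.

```python
def propagate_features(zh_tokens, en_tokens, alignment):
    """For each Chinese token, collect features from its aligned English words."""
    features = []
    for i in range(len(zh_tokens)):
        words = [en_tokens[j] for j in sorted(alignment.get(i, []))]
        feats = {}
        if words:
            feats["first word"] = words[0]
            feats["last word"] = words[-1]
            feats["words"] = words
            feats["capitalized"] = any(w[0].isupper() for w in words)
        features.append(feats)
    return features

zh = ["亚行"]
en = ["the", "Asian", "Development", "Bank"]
feats = propagate_features(zh, en, {0: [0, 1, 2, 3]})
```

Here the ambiguous Chinese token inherits the English clues for an organization name: the key word “Bank”, capitalization, and the leading “the”.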


54

• Joint Bilingual CRFs
  o factor graph representation

Monolingual bigram factors
• Linear-chains on each sentence

Bilingual factors
• Based on word alignment
• Explicitly model the cross-language dependency

Inference
• Apply loopy belief propagation to do approximate inference (Wainwright et al., 2001; Sutton et al., 2007)

Bilingual Name Tagging
• Training/Test Data
  o 288 Chinese-English documents from Parallel Treebank
  o 230 documents for training; 58 documents for test
• Evaluation Metric: bilingual name pair metric
  o Precision/Recall/F-measure on bilingual name pairs

Type                         English  Chinese  Bilingual Pairs
GPE (Geo-political entity)   4.0k     4.0k     4.0k
Person                       1.0k     1.0k     1.0k
Organization                 1.5k     1.5k     1.5k
All                          6.6k     6.6k     6.6k
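The pair-level metric can be sketched as precision, recall, and F-measure over sets of name pairs. The pair format (English name, Chinese name, type) and the exact-match criterion are assumptions for this example; the slides do not spell out the matching rule.

```python
def pair_prf(gold_pairs, predicted_pairs):
    """Precision, recall, and F-measure over bilingual name pairs."""
    gold, pred = set(gold_pairs), set(predicted_pairs)
    correct = len(gold & pred)
    precision = correct / len(pred) if pred else 0.0
    recall = correct / len(gold) if gold else 0.0
    total = precision + recall
    f_measure = 2 * precision * recall / total if total else 0.0
    return precision, recall, f_measure

gold = {("Asian Development Bank", "亚行", "ORG"), ("Beijing", "北京", "GPE")}
pred = {("Asian Development Bank", "亚行", "ORG"), ("Beijing", "北京", "PER")}
p, r, f = pair_prf(gold, pred)
```

Under this criterion a pair with the wrong type on either side counts as an error, so inconsistent tags across the two languages are penalized.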


56

• Overall Performance (Li et al., CIKM 2012)

[Q. Li et al. CIKM 2012]


57

• Bilingual name tagging improves name-aware machine translation & word alignment (Haibo Li et al., ACL 2013)
  o Baseline: Hierarchical Phrase-based Machine Translation (Zheng et al., 2009)

Task              Metric           Baseline MT  Name-aware MT
Name Translation  Weak Accuracy    66.5%        72.9%
Overall MT        BLEU             35.8%        36.3%
Overall MT        Name-aware BLEU  36.1%        39.4%
Name Alignment    F-measure        46.0%        50.3%

[H. Li et al. ACL 2013]

Conclusions

58

• We investigated cross-component, cross-document, and cross-lingual dependencies to improve IE performance


1. For the first time, we formulated the problem of IE as the task of constructing information networks. We showed that performing structured learning with global features is possible and very useful to this task. Our joint framework achieved state-of-the-art in each subtask.

2. Beyond sentence-level, our cross-document inference method can effectively remove conflicting results.

3. Our bilingual name tagger significantly outperforms the traditional monolingual method. It can improve name-aware machine translation.

Future Directions
• Expand Information Types
  o “A Germanwings flight 9525 crashed in the French Alps”
    • Germanwings: Commercial ORG
    • Flight 9525: Air Vehicle
    • New Event Types: Accident, Rescue, Evacuation, etc.
• Knowledge Acquisition for IE
  o Use world knowledge to guide IE (Chan & Roth 2010, etc.)

Related Publications
• Constructing Information Networks Using One Single Model. Qi Li, Heng Ji, Yu Hong, Sujian Li. EMNLP 2014
• Incremental Joint Extraction of Entities and Relations. Qi Li, Heng Ji. ACL 2014
• Joint Event Extraction via Structured Prediction with Global Features. Qi Li, Heng Ji, Liang Huang. ACL 2013
• Joint Bilingual Name Tagging for Parallel Corpora. Qi Li, Haibo Li, Heng Ji, Wen Wang, Jing Zheng, Fei Huang. CIKM 2012
• Name-aware Machine Translation. Haibo Li, Jing Zheng, Heng Ji, Qi Li, Wen Wang. ACL 2013
• Joint Inference for Cross-document Information Extraction. Qi Li, Sam Anzaroot, Wen-Pin Lin, Xiang Li, and Heng Ji. CIKM 2011

61

36 citations

18 citations

19 citations

Results in International Evaluations

62

Task                            Year  Ranking
KBP Temporal Slot Filling       2011  1st
KBP Event Argument Extraction   2014  3rd
Event Nugget Evaluation         2014  1st

63

Thank You

Thanks to my committee, Blender members, friends, and family for their advice and support!
