Learning to Extract Relations from the Web using Minimal Supervision Razvan C. Bunescu Machine Learning Group Department of Computer Sciences University of Texas at Austin [email protected] Raymond J. Mooney Machine Learning Group Department of Computer Sciences University of Texas at Austin [email protected]

Jun 14, 2015

Page 1: Learning to Extract Relations from the Web using Minimal Supervision


Page 2

Introduction: Relation Extraction

• People are often interested in finding relations between entities:

– What proteins interact with IRAK1?

– Which companies were acquired by Google?

– In which city was Mozart born?

• Relation Extraction (RE) is the task of automatically locating predefined types of relations in text documents.

Page 3

Introduction: Relation Extraction

• Relation Examples:

1) Protein Interactions:

– The phosphorylation of Pellino2 by activated IRAK1 could trigger the translocation of IRAKs from complex I to II.

2) Company Acquisitions:

– Search engine giant Google has bought video-sharing website YouTube in a controversial $1.6 billion deal.

3) People Birthplaces:

– Wolfgang Amadeus Mozart was born to Leopold and Anna Maria Mozart, in the front room of Getreidegasse 9 in Salzburg.

Page 4

Motivation: Minimal Supervision

• Developing an RE system usually requires a significant amount of human effort:

– Extraction patterns designed by a human expert [Blaschke et al., 2002].

– Extraction patterns learned from a corpus of manually annotated examples [Zelenko et al., 2003; Culotta and Sorensen, 2004].

• A different RE approach:

– Extraction patterns learned from weak supervision derived from a significantly reduced amount of human supervision.

Page 5

Relation Extraction with Minimal Supervision

• Human supervision → a handful of pairs of entities known to exhibit (+) or not exhibit (–) a particular relation.

• Weak supervision → bags of sentences containing the pairs, automatically extracted from a very large corpus.

• Use bags of sentences in a Multiple Instance Learning framework [Dietterich et al., 1997] to train a relation extraction model.

Page 6

Types of Supervision for RE

• Single Instance Learning (SIL):

– A corpus of positive and negative sentence examples, with the two entity names annotated.

– A sentence example is positive iff it explicitly asserts the target relationship between the two annotated entities.

• Multiple Instance Learning (MIL):

– A corpus of positive and negative bags of sentences.

– A bag is positive iff it contains at least one positive sentence example.
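The SIL/MIL distinction can be made concrete with a small Python sketch. The `Bag` class and its fields are hypothetical names for illustration only. The per-sentence labels are included just to show the MIL assumption; in actual MIL training they are unknown and only the bag label is given.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Bag:
    """A bag of sentences for one entity pair (hypothetical structure)."""
    a1: str
    a2: str
    sentences: List[str]
    # Per-sentence SIL labels: True iff the sentence asserts the relation.
    # Shown only to illustrate the MIL assumption; MIL training never sees them.
    sentence_labels: List[bool] = field(default_factory=list)

    @property
    def label(self) -> bool:
        # MIL assumption: a bag is positive iff at least one sentence is positive.
        return any(self.sentence_labels)


google_youtube = Bag(
    "Google", "YouTube",
    ["Search engine giant Google has bought video-sharing website YouTube ...",
     "The companies will merge Google's search expertise with YouTube's video expertise ..."],
    [True, False],
)
yahoo_microsoft = Bag(
    "Yahoo", "Microsoft",
    ["Whatever it is, Yahoo is dashing in front, with Microsoft close behind."],
    [False],
)
print(google_youtube.label, yahoo_microsoft.label)  # True False
```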

Page 7

RE from Web with Minimal Supervision

+/−   Argument a1      Argument a2
+     Google           YouTube
+     Adobe Systems    Macromedia
+     Viacom           DreamWorks
+     Novartis         Eon Labs
−     Yahoo            Microsoft
−     Pfizer           Teva

Example pairs of named entities for R = Corporate Acquisitions.

Page 8

Minimal Supervision: Positive bags

Use a search engine to extract bags of sentences containing both entities in a pair.

Google, YouTube

S1: Search engine giant Google has bought video-sharing website YouTube in a controversial $1.6 billion deal.

S2: The companies will merge Google's search expertise with YouTube's video expertise, pushing what executives believe is a hot emerging market of video offered over the Internet.

⋮

Sn: Google has acquired social media company YouTube for $1.65 billion in a stock-for-stock transaction as announced by Google Inc. on October 9, 2006.

Page 11

Minimal Supervision: Negative Bags

Use a search engine to extract bags of sentences containing both entities in a pair.

Yahoo, Microsoft

S1: Yahoo is starting to look more like Microsoft and less like the innovative, unified service that got my loyalty in the first place.

S2: Whatever it is, Yahoo is dashing in front, with Microsoft close behind.

⋮

Sn: Yahoo and Microsoft teamed up on October 12 to make their instant messaging software compatible.

Page 13

MIL Background: Domains

• Originally introduced to solve a Drug Activity prediction problem in biochemistry [Dietterich et al., 1997]:

– Each molecule has a limited set of low-energy conformations → bags of 3D conformations.

– A bag is positive if at least one of the conformations binds to a predefined target.

– MUSK dataset [Dietterich et al., 1997]: a bag is positive if the molecule smells “musky”.

• Content-Based Image Retrieval [Zhang et al., 2002].

• Text categorization [Andrews et al., 2003], [Ray et al., 2005].

Page 14

MIL Background: Algorithms

• Axis-Parallel Rectangles [Dietterich et al., 1997]

• Diverse Density [Maron, 1998]

• Multiple Instance Logistic Regression [Ray & Craven, 2005]

• Multi-Instance SVM kernels of [Gartner et al., 2002]

– Normalized Set Kernel.

– Statistic Kernel.

Page 15

MIL for Relation Extraction

• Focus on SVM approaches:

– Through kernels, they can work efficiently with instances that implicitly belong to a high-dimensional feature space.

– They can reuse existing relation extraction kernels.

• The Multi-Instance kernels of [Gartner et al., 2002] are not appropriate when there are very few bags:

– Bags (not instances) are considered as training examples.

– The number of SVs is upper bounded by the number of bags.

– Very few bags → very few SVs → insufficient capacity.
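A minimal sketch of why capacity is limited: with a bag-level kernel, the dual SVM has one variable per bag, so the Gram matrix is indexed by bags rather than sentences. The averaged sum-of-instance-kernels below is one simple normalization, assumed here for illustration; it is not claimed to be exactly the Normalized Set Kernel of [Gartner et al., 2002]. The toy vectors stand in for sentence features, and all function names are hypothetical.

```python
from typing import Callable, List, Sequence

Vector = Sequence[float]
Bag = List[Vector]


def linear(u: Vector, v: Vector) -> float:
    """Instance-level kernel; stands in for a relation extraction kernel."""
    return sum(a * b for a, b in zip(u, v))


def set_kernel(X: Bag, Y: Bag, k: Callable[[Vector, Vector], float] = linear) -> float:
    """Sum of instance kernels over all cross-bag pairs, averaged by bag sizes."""
    total = sum(k(x, y) for x in X for y in Y)
    return total / (len(X) * len(Y))


def bag_gram(bags: List[Bag]) -> List[List[float]]:
    """Gram matrix over bags: one row/column per bag, not per sentence."""
    return [[set_kernel(A, B) for B in bags] for A in bags]


bags = [[[1.0, 0.0], [0.0, 1.0]], [[1.0, 1.0]], [[2.0, 0.0], [0.0, 2.0], [1.0, 1.0]]]
G = bag_gram(bags)
print(len(G), len(G[0]))  # 3 3
```

With the 6 training pairs used in the experiments, such a Gram matrix is only 6 × 6, so at most 6 support vectors can be selected no matter how many sentences the bags contain.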

Page 16

MIL for Relation Extraction

• A simple approach to MIL is to transform it into a standard supervised learning problem:

– Apply the bag label to all instances inside the bag.

– Train a standard supervised algorithm on the transformed dataset.

– Despite the class noise, this obtains competitive results [Ray & Craven, 2005].

Google, YouTube

S1: Search engine giant Google has bought video-sharing website YouTube in a controversial $1.6 billion deal.

S2: The companies will merge Google's search expertise with YouTube's video expertise, pushing what executives believe is a hot emerging market of video offered over the Internet.

⋮

Sn: Google has acquired social media company YouTube for $1.65 billion in a stock-for-stock transaction as announced by Google Inc. on October 9, 2006.
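The label-propagation transformation described on this slide can be sketched in a few lines of Python. The function name and the `(sentences, bag_label)` tuple layout are illustrative assumptions; labels use +1/−1 to match the SVM formulation.

```python
from typing import List, Tuple

# A bag is (sentences, bag_label); hypothetical layout for illustration.
Bag = Tuple[List[str], bool]


def propagate_bag_labels(bags: List[Bag]) -> Tuple[List[str], List[int]]:
    """Turn an MIL dataset into a (noisy) supervised one by copying bag labels."""
    X: List[str] = []
    y: List[int] = []
    for sentences, positive in bags:
        for sentence in sentences:
            X.append(sentence)
            y.append(+1 if positive else -1)  # every sentence inherits its bag's label
    return X, y


bags = [
    (["Google has bought YouTube ...", "Google's search expertise with YouTube's video ..."], True),
    (["Yahoo is dashing in front, with Microsoft close behind."], False),
]
X, y = propagate_bag_labels(bags)
print(y)  # [1, 1, -1]
```

The second Google/YouTube sentence receives +1 even though it does not assert an acquisition; this is the class noise the slide refers to.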

Page 18

SVM Framework with MIL Supervision

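minimize:

$$J(w, b, \xi) = \frac{1}{2}\|w\|^2 + \frac{C}{L}\left(c_p \frac{L_n}{L} \sum_{x \in X_p} \xi_x + c_n \frac{L_p}{L} \sum_{x \in X_n} \xi_x\right)$$

subject to:

$$w \cdot \phi(x) + b \ge +1 - \xi_x, \quad \forall x \in X_p$$

$$w \cdot \phi(x) + b \le -1 + \xi_x, \quad \forall x \in X_n$$

$$\xi_x \ge 0, \quad \forall x \in X_p \cup X_n$$

where X_p and X_n are the sets of sentences from the positive and negative bags, L_p = |X_p|, L_n = |X_n|, and L = L_p + L_n.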

Page 19

SVM Framework with MIL Supervision

• Regularization term: $\frac{1}{2}\|w\|^2$, the first term of the objective.

Page 20

SVM Framework with MIL Supervision

• Error on positive bags: $\frac{C}{L}\, c_p \frac{L_n}{L} \sum_{x \in X_p} \xi_x$, the weighted slack of the sentences from positive bags.

Page 21

SVM Framework with MIL Supervision

• Error on negative bags: $\frac{C}{L}\, c_n \frac{L_p}{L} \sum_{x \in X_n} \xi_x$, the weighted slack of the sentences from negative bags.

Page 22

SVM Framework with MIL Supervision

• c_p, c_n > 0 with c_p + c_n = 1 control the relative influence that false negatives vs. false positives have on the objective function.

• We want c_p < 0.5, i.e., false negatives are penalized less than false positives, since positive bags also contain many sentences that do not assert the relation; c_p = 0.1 was used.
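To make the weighting concrete, the sketch below evaluates the objective J(w, b, ξ) for an explicit feature map, setting each slack ξ_x to the smallest value its constraint allows (a hinge loss). Explicit vectors stand in for φ(x); an actual implementation would solve the dual with a kernel instead, and the function names are illustrative assumptions.

```python
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def dot(u: Vector, v: Vector) -> float:
    return sum(a * b for a, b in zip(u, v))


def slack_weights(L_p: int, L_n: int, C: float = 1.0, c_p: float = 0.1) -> Tuple[float, float]:
    """Per-sentence slack weights for sentences from positive and negative bags."""
    c_n = 1.0 - c_p                    # c_p + c_n = 1
    L = L_p + L_n
    w_pos = C / L * c_p * L_n / L      # multiplies each xi_x, x in X_p
    w_neg = C / L * c_n * L_p / L      # multiplies each xi_x, x in X_n
    return w_pos, w_neg


def objective(w: Vector, b: float, X_p: List[Vector], X_n: List[Vector],
              C: float = 1.0, c_p: float = 0.1) -> float:
    """J(w, b, xi) with every slack set to its smallest feasible value."""
    w_pos, w_neg = slack_weights(len(X_p), len(X_n), C, c_p)
    xi_pos = [max(0.0, 1.0 - (dot(w, x) + b)) for x in X_p]  # w.phi(x) + b >= 1 - xi
    xi_neg = [max(0.0, 1.0 + (dot(w, x) + b)) for x in X_n]  # w.phi(x) + b <= -1 + xi
    return 0.5 * dot(w, w) + w_pos * sum(xi_pos) + w_neg * sum(xi_neg)
```

One consequence of the L_n/L and L_p/L factors: the total weight on positive-bag errors is C·c_p·L_p·L_n/L² and on negative-bag errors C·c_n·L_p·L_n/L², so the two error masses stay in the ratio c_p : c_n however unbalanced the bag sizes are.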

Page 23

SVM Framework with MIL Supervision

• Dual formulation → kernel between bag instances: K(x1, x2) = φ(x1) · φ(x2).

• Use SSK, a subsequence kernel customized for relation extraction [Bunescu & Mooney, 2005].

Page 24

The Subsequence Kernel for Relation Extraction

• Implicit features are sequences of words anchored at the two entity names.

s = e1 … bought … e2 … billion … deal. (s: a word sequence)

x = Google has bought video-sharing website YouTube in a controversial $1.6 billion deal. (x: an example sentence, containing s as a subsequence)

Gaps: g1 = 1, g2 = 3, g3 = 4, g4 = 0

φ_s(x): the value of feature s in example x [Bunescu & Mooney, 2005]:

$$\phi_s(x) = \lambda^{gap(s,x)} = \lambda^{1+3+4+0} = \lambda^{8}$$

where λ < 1 is a decay factor that penalizes gaps.
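The feature value φ_s(x) can be sketched in Python, assuming whitespace-tokenized sentences in which the entity names were already replaced by e1 and e2, and λ = 0.5 as an arbitrary illustrative value. When s occurs several times in x, the sketch sums λ^gap over all occurrences, as string subsequence kernels do; this is an assumption, since the slide shows a single occurrence.

```python
from typing import Optional, Sequence


def phi(s: Sequence[str], x: Sequence[str], lam: float,
        start: int = 0, prev: Optional[int] = None) -> float:
    """Value of feature s in tokenized sentence x: sum of lam**gap over every
    occurrence of s as a (non-contiguous) subsequence of x."""
    if not s:
        return 1.0
    total = 0.0
    for j in range(start, len(x)):
        if x[j] == s[0]:
            gap = 0 if prev is None else j - prev - 1  # words skipped since the last match
            total += lam ** gap * phi(s[1:], x, lam, j + 1, j)
    return total


s = ["e1", "bought", "e2", "billion", "deal"]
x = "e1 has bought video-sharing website e2 in a controversial $1.6 billion deal .".split()
print(phi(s, x, lam=0.5))  # 0.0078125, i.e. 0.5 ** 7
```

With this whitespace tokenization the gap between "bought" and e2 is 2 rather than the 3 shown on the slide, so the total is λ^7; gap counts depend on the tokenizer. The naive recursion can take exponential time on long sentences, which is why the kernel itself is computed by dynamic programming.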

Page 25

The Subsequence Kernel for Relation Extraction

• K(x1, x2) = φ(x1) · φ(x2) = the number of common “anchored” subsequences between x1 and x2, weighted by their total gap.

• Many relations require at least one content word → modify the kernel to optionally ignore sequences formed exclusively of stop words and punctuation signs.

• Kernel is computed efficiently by a generalized version of the dynamic programming procedure from [Lodhi et al., 2002].

[Bunescu & Mooney, 2005].
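A brute-force sketch of the kernel, useful only for checking intuitions on short sentences: it enumerates every anchored subsequence of x1 up to a hypothetical length cap `max_len` and sums φ_s(x1)·φ_s(x2). The stop-word list is a small hypothetical example, and the filter skips any s whose non-anchor words are all stop words or punctuation. This is not the dynamic-programming procedure used in practice.

```python
from itertools import combinations
from typing import Optional, Sequence, Set, Tuple

ANCHORS = {"e1", "e2"}
STOP = {"the", "a", "an", "has", "in", "of", "and", ",", ".", "$"}  # hypothetical list


def phi(s: Sequence[str], x: Sequence[str], lam: float,
        start: int = 0, prev: Optional[int] = None) -> float:
    """Sum of lam**gap over every occurrence of s as a subsequence of x."""
    if not s:
        return 1.0
    total = 0.0
    for j in range(start, len(x)):
        if x[j] == s[0]:
            gap = 0 if prev is None else j - prev - 1
            total += lam ** gap * phi(s[1:], x, lam, j + 1, j)
    return total


def anchored_subsequences(x: Sequence[str], max_len: int) -> Set[Tuple[str, ...]]:
    """Distinct subsequences of x (length 2..max_len) containing both e1 and e2."""
    found = set()
    for n in range(2, max_len + 1):
        for idx in combinations(range(len(x)), n):
            s = tuple(x[i] for i in idx)
            if "e1" in s and "e2" in s:
                found.add(s)
    return found


def kernel(x1: Sequence[str], x2: Sequence[str], lam: float = 0.5,
           max_len: int = 4, skip_stop_only: bool = True) -> float:
    """K(x1, x2) = sum over anchored s of phi_s(x1) * phi_s(x2)."""
    total = 0.0
    for s in anchored_subsequences(x1, max_len):
        words = [w for w in s if w not in ANCHORS]
        if skip_stop_only and all(w in STOP for w in words):
            continue  # no content word: ignore this feature
        total += phi(s, x1, lam) * phi(s, x2, lam)
    return total


x = ["e1", "bought", "e2"]
print(kernel(x, x, lam=0.5, max_len=3))  # 1.0
```

Only the pattern "e1 bought e2" survives the filter in the usage line; disabling it (`skip_stop_only=False`) also counts the bare "e1 … e2" pattern.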

Page 26

Two Types of Bias

• The MIL approach to RE differs from other MIL problems in two respects:

– The training dataset contains very few bags.

– The bags can be very large.

• These properties lead to two types of bias:

– [Type I] Combinations of words that are correlated with the two relation arguments are given too much weight in the learned model.

– [Type II] Words specific to a particular relation instance are given too much weight.

Page 27

Type I Bias

Google, YouTube

S1: Search engine giant Google has bought video-sharing website YouTube in a controversial $1.6 billion deal.

S2: The companies will merge Google's search expertise with YouTube's video expertise, pushing what executives believe is a hot emerging market of video offered over the Internet.

• Overweighted Patterns:

– search … e1 … video … e2

– … e1 … video … e2

– e1 … search … e2

– e1 … search … e2 … video

Page 28

Type II Bias

Google, YouTube

S1: Ever since Google paid $1.65 billion for YouTube in October, plenty of pundits from Mark Cuban to yours truly have been waiting for the other shoe to drop.

S2: Google Gobbles Up YouTube for $1.6 BILLION October 9, 2006

S3: Google has acquired social media company YouTube for $1.65 billion in a stock-for-stock transaction as announced by Google Inc. on October 9, 2006.

• Overweighted Patterns:

– … e1 … for … e2 … October

– … e1 … has … e2 … October

Page 29

A Solution for Type I Bias

• Use the SSK approach, with a new feature weight:

$$\phi_s(x) = \lambda^{gap(s,x)} \prod_{w \in s} \lambda(w)$$

• Modify the subsequence kernel computations to use word weights λ(w).

• Want small λ(w) for words w correlated with either of the two relation arguments.

Page 30

A Solution for Type I Bias: Word Weights

Use a formula for word weights λ(w) that discounts the effect of correlations of w with either of the two arguments a1 and a2:

$$\lambda(w) = \frac{C(X,w) - P(w \mid X.a_1, X.a_2) \cdot C(X)}{C(X,w)}$$

Page 31

A Solution for Type I Bias: Word Weights

• C(X): the number of sentences in bag X.

Page 32

A Solution for Type I Bias: Word Weights

• C(X, w): the number of sentences in bag X that contain word w.

Page 33

A Solution for Type I Bias: Word Weights

• P(w | X.a1, X.a2): the probability that the word w appears in a sentence due only to the presence of X.a1 or X.a2, assuming X.a1 and X.a2 are independent causes for w:

$$P(w \mid X.a_1, X.a_2) = 1 - \bigl(1 - P(w \mid X.a_1)\bigr)\bigl(1 - P(w \mid X.a_2)\bigr)$$

$$= P(w \mid X.a_1) + P(w \mid X.a_2) - P(w \mid X.a_1)\,P(w \mid X.a_2)$$

• P(w | a) is the probability that w appears in a sentence due to the presence of a.

• Estimate P(w | a) using counts from a separate bag of sentences containing a.
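A minimal sketch of the word-weight computation λ(w), under these assumptions: each sentence is given as a set of tokens; P(w | a) is estimated as the fraction of sentences in the separate bag for a that contain w; and the result is clipped at 0, because the formula turns negative when the arguments over-explain w (the clipping is not stated on the slides). Function and variable names are hypothetical.

```python
from typing import Collection, List


def p_word_given_arg(word: str, arg_bag: List[Collection[str]]) -> float:
    """Estimate P(w|a) as the fraction of sentences in a's bag that contain w."""
    if not arg_bag:
        return 0.0
    return sum(word in sent for sent in arg_bag) / len(arg_bag)


def word_weight(word: str, bag: List[Collection[str]],
                bag_a1: List[Collection[str]], bag_a2: List[Collection[str]]) -> float:
    """lambda(w) = (C(X,w) - P(w|X.a1, X.a2) * C(X)) / C(X,w), clipped at 0."""
    c_x = len(bag)                               # C(X)
    c_xw = sum(word in sent for sent in bag)     # C(X, w)
    if c_xw == 0:
        return 1.0  # w never occurs in X, so its weight never matters for this bag
    p1 = p_word_given_arg(word, bag_a1)
    p2 = p_word_given_arg(word, bag_a2)
    p = p1 + p2 - p1 * p2                        # noisy-or: independent causes
    return max(0.0, (c_xw - p * c_x) / c_xw)


bag = [{"e1", "search", "bought", "e2"}, {"e1", "search", "e2", "video"},
       {"e1", "acquired", "e2"}, {"e1", "acquired", "e2", "deal"}]
bag_google = [{"search"}, {"maps"}]     # toy sentences containing a1
bag_youtube = [{"video"}, {"upload"}]   # toy sentences containing a2
print(word_weight("search", bag, bag_google, bag_youtube))    # 0.0
print(word_weight("acquired", bag, bag_google, bag_youtube))  # 1.0
```

In this toy bag "search" is fully explained by its correlation with Google and gets weight 0, while "acquired", which neither argument predicts, keeps weight 1.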

Page 34

MIL Relation Extraction Datasets

• Given two arguments a1 and a2, submit query string “a1 * * * * * * * a2” to Google.

• Download the resulting documents (fewer than 1,000).

• Split text into sentences and tokenize using the OpenNLP package.

• Keep only sentences containing both a1 and a2.

• Replace the closest occurrences of a1 and a2 with generic tags e1 and e2.
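The last two steps (keep sentences containing both arguments, tag the closest occurrences) can be sketched in Python. The search, download, and OpenNLP sentence splitting and tokenization steps are not reproduced here; whitespace-tokenized sentences are assumed instead. "Closest" is taken to mean the fewest tokens between the two mentions, which is an assumption. All function names are hypothetical.

```python
from typing import List, Optional


def find_spans(tokens: List[str], arg: List[str]) -> List[int]:
    """Start indices of every occurrence of the (possibly multi-word) argument."""
    n = len(arg)
    return [i for i in range(len(tokens) - n + 1) if tokens[i:i + n] == arg]


def tag_closest(tokens: List[str], a1: List[str], a2: List[str]) -> Optional[List[str]]:
    """Replace the closest non-overlapping mentions of a1 and a2 with e1 and e2.
    Returns None when the sentence does not contain both arguments."""
    best = None  # (distance in tokens, start of a1, start of a2)
    for i in find_spans(tokens, a1):
        for j in find_spans(tokens, a2):
            if i < j:
                if i + len(a1) > j:
                    continue  # mentions overlap
                dist = j - (i + len(a1))
            else:
                if j + len(a2) > i:
                    continue
                dist = i - (j + len(a2))
            if best is None or dist < best[0]:
                best = (dist, i, j)
    if best is None:
        return None
    _, i, j = best
    out = list(tokens)
    # Replace the later mention first so the earlier start index stays valid.
    if i < j:
        out[j:j + len(a2)] = ["e2"]
        out[i:i + len(a1)] = ["e1"]
    else:
        out[i:i + len(a1)] = ["e1"]
        out[j:j + len(a2)] = ["e2"]
    return out


def build_bag(sentences: List[List[str]], a1: List[str], a2: List[str]) -> List[List[str]]:
    """Keep only sentences containing both arguments, tagged with e1/e2."""
    tagged = (tag_closest(s, a1, a2) for s in sentences)
    return [t for t in tagged if t is not None]


print(tag_closest("Adobe Systems acquired Macromedia".split(),
                  ["Adobe", "Systems"], ["Macromedia"]))  # ['e1', 'acquired', 'e2']
```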

Page 35

MIL Relation Extraction Datasets

Corporate Acquisitions Dataset

Training Pairs:

+/−   Argument a1      Argument a2           Bag size
+     Google           YouTube               1375
+     Adobe Systems    Macromedia            622
+     Viacom           DreamWorks            323
+     Novartis         Eon Labs              311
−     Yahoo            Microsoft             163
−     Pfizer           Teva                  247

Testing Pairs (all bag sentences manually labeled):

+/−   Argument a1      Argument a2           Bag size
+     Pfizer           Rinat Neuroscience    50 (41)
+     Yahoo            Inktomi               433 (115)
−     Google           Apple                 281
−     Viacom           NBC                   231

Page 36

MIL Relation Extraction Datasets

Person–Birthplace Dataset

Training Pairs:

+/−   Argument a1         Argument a2    Bag size
+     Franz Kafka         Prague         522
+     Andre Agassi        Las Vegas      386
+     Charlie Chaplin     London         292
+     George Gershwin     New York       260
−     Luc Besson          New York       74
−     W. A. Mozart        Vienna         288

Testing Pairs (all bag sentences manually labeled):

+/−   Argument a1         Argument a2    Bag size
+     Luc Besson          Paris          126 (6)
+     Marie Antoinette    Vienna         39 (10)
−     Charlie Chaplin     Hollywood      266
−     George Gershwin     London         104

Page 37

Experimental Results: Systems

• [SSK-MIL] MIL formulation using the original SSK.

• [SSK-T1] MIL formulation with the SSK modified to use word weights in order to reduce Type I bias.

• [BW-MIL] MIL formulation using a bag-of-words kernel.

• [SSK-SIL] SIL formulation using the original subsequence kernel:

– Use manually labeled instances from the test bags.

– Train on instances from one positive bag and one negative bag, test on instances from the other two bags.

– Average results over all four combinations.

Page 38

Experimental Results: Evaluation

1) Plot Precision vs. Recall (PR) graphs:

– vary a threshold on the extraction confidence.

2) Report Area Under PR Curve (AUC).
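The two evaluation steps can be sketched in Python: rank extractions by confidence, sweep the threshold down the ranking to obtain (recall, precision) points, and integrate. The slides do not say how the area was computed; trapezoidal integration starting at recall 0 is an assumption here, and tied scores are simply broken by input order. Function names are hypothetical.

```python
from typing import List, Sequence, Tuple


def pr_points(scores: Sequence[float], labels: Sequence[int]) -> List[Tuple[float, float]]:
    """(recall, precision) after accepting the top-k extractions, for k = 1..n."""
    ranked = sorted(zip(scores, labels), key=lambda t: t[0], reverse=True)
    n_pos = sum(labels)
    points, tp = [], 0
    for k, (_, y) in enumerate(ranked, start=1):
        tp += y
        points.append((tp / n_pos, tp / k))
    return points


def pr_auc(points: List[Tuple[float, float]]) -> float:
    """Trapezoidal area under the PR curve, starting at recall 0."""
    area = 0.0
    prev_r, prev_p = 0.0, points[0][1]  # assume precision at recall 0 equals the first point's
    for r, p in points:
        area += (r - prev_r) * (p + prev_p) / 2
        prev_r, prev_p = r, p
    return area


scores = [0.9, 0.8, 0.7, 0.6]
labels = [1, 1, 0, 1]
print(pr_auc(pr_points(scores, labels)))  # about 0.9028 (= 65/72)
```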

Page 39

Company Acquisitions

Page 40

Person–Birthplace

Page 41

Experimental Results: AUC

• SSK-T1 is significantly more accurate than SSK-MIL.

• SSK-T1 is competitive with SSK-SIL, however:

– SSK-T1 supervision: only 6 pairs (4 positive).

– SSK-SIL average supervision:

• ~500 manually labeled sentences (78 positive) for Acquisitions.

• ~300 manually labeled sentences (22 positive) for Birthplaces.

Dataset                 SSK-MIL   SSK-T1   BW-MIL   SSK-SIL
Company Acquisitions    76.9%     81.1%    45.8%    80.4%
People Birthplace       72.5%     78.2%    69.2%    73.4%

Page 42

Applications & Extensions

• A “Google Sets” system for relation extraction:

– Ideally, the user provides only positive pairs.

– Likely negative examples are created by pairing the argument entity with other named entities in the same sentence.

– Any pair of entities different from the relation pair is likely to be negative → implicit negative evidence.

Input:

Google, YouTube
Adobe Systems, Macromedia
Viacom, DreamWorks
Novartis, Eon Labs

Output:

Pfizer, Rinat Neuroscience
Yahoo, Inktomi
⋮
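The implicit-negative idea can be sketched in Python, assuming the named entities of each sentence have already been recognized (NER is not shown). Here both arguments of a positive pair are paired with the other co-occurring entities, each argument kept in its own slot; pairing both arguments rather than only one is an assumption. The function name is hypothetical.

```python
from typing import Iterable, List, Set, Tuple

Pair = Tuple[str, str]


def implicit_negatives(positives: Iterable[Pair],
                       sentence_entities: Iterable[List[str]]) -> Set[Pair]:
    """Pair each argument of a positive pair with the other named entities
    that co-occur with it in a sentence; keep the argument in its own slot."""
    positive_set = set(positives)
    sentences = [list(ents) for ents in sentence_entities]
    negatives: Set[Pair] = set()
    for a1, a2 in positive_set:
        for ents in sentences:
            others = [e for e in ents if e not in (a1, a2)]
            if a1 in ents:
                negatives.update((a1, e) for e in others)
            if a2 in ents:
                negatives.update((e, a2) for e in others)
    return negatives - positive_set  # never label a known positive as negative


negs = implicit_negatives([("Google", "YouTube")],
                          [["Google", "YouTube", "Sequoia Capital"], ["Google", "Microsoft"]])
print(sorted(negs))
```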

Page 43

Future Work

• Investigate methods for reducing Type II bias.

• Experiment with other, more sophisticated MIL algorithms.

• Explore the effect of Type I and Type II bias when using dependency information in the relation extraction kernel.

Page 44

Conclusion

• Presented a new approach to Relation Extraction, trained using only a handful of pairs of entities known to exhibit or not exhibit the target relationship.

• Extended an existing subsequence kernel to resolve problems caused by the minimal supervision provided.

• The new MIL approach is competitive with its SIL counterpart that uses significantly more human supervision.
