Distant supervision for relation extraction without labeled data. Mike Mintz, Steven Bills, Rion Snow, Dan Jurafsky. ACL 2009. Introduced by Makoto Morishita.
[Paper Introduction] Distant supervision for relation extraction without labeled data

Apr 15, 2017

Page 1: [Paper Introduction] Distant supervision for relation extraction without labeled data

Distant supervision for relation extraction without labeled data
Mike Mintz, Steven Bills, Rion Snow, Dan Jurafsky
ACL 2009

Introduced by Makoto Morishita

Page 2: [Paper Introduction] Distant supervision for relation extraction without labeled data

Contribution of this paper

• Proposed “distant supervision” for the first time.

• With distant supervision, we can extract relations between entities from sentences without manual annotation work.

Page 3: [Paper Introduction] Distant supervision for relation extraction without labeled data

Current training methods

• Supervised learning

• Unsupervised learning

• Self-training

• Active learning

Page 4: [Paper Introduction] Distant supervision for relation extraction without labeled data

Supervised learning

• Uses only annotated data to train a model.

• Creating the annotated data is expensive.

Annotated data

Page 5: [Paper Introduction] Distant supervision for relation extraction without labeled data

Unsupervised learning

• Uses only unannotated data.

• The output (for example, clusters of relation strings) may not map onto the relations a particular application needs.

Unannotated data

Page 6: [Paper Introduction] Distant supervision for relation extraction without labeled data

Self-training

• Train a seed model on a small amount of annotated data, then let the model itself label the unannotated data and retrain on those labels.

• Precision can be low, and the model inherits the bias of the seed data.

Unannotated data

Annotated data
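
The self-training loop above can be sketched as follows. This is only an illustration under stated assumptions: the one-dimensional nearest-centroid classifier, the `min_margin` confidence threshold, and all names and data are hypothetical and do not come from the slides.

```python
def centroids(labeled):
    """Mean feature value per label; `labeled` is a list of (x, label) pairs."""
    sums, counts = {}, {}
    for x, y in labeled:
        sums[y] = sums.get(y, 0.0) + x
        counts[y] = counts.get(y, 0) + 1
    return {y: sums[y] / counts[y] for y in sums}


def predict(model, x):
    """Return (label, margin); margin is the distance gap between the two nearest centroids.

    Assumes the model holds at least two labels.
    """
    ranked = sorted(model, key=lambda y: abs(x - model[y]))
    margin = abs(x - model[ranked[1]]) - abs(x - model[ranked[0]])
    return ranked[0], margin


def self_train(seed, unlabeled, min_margin=1.0, max_rounds=10):
    """Repeatedly label confident points with the current model and retrain on them."""
    labeled, pool = list(seed), list(unlabeled)
    for _ in range(max_rounds):
        model = centroids(labeled)
        predictions = [(x, predict(model, x)) for x in pool]
        accepted = [(x, label) for x, (label, margin) in predictions if margin >= min_margin]
        if not accepted:
            break  # nothing confident enough is left to self-label
        labeled.extend(accepted)
        accepted_xs = {x for x, _ in accepted}
        pool = [x for x in pool if x not in accepted_xs]
    return labeled, pool


# Hypothetical toy data: two seed points and three unlabeled points.
seed = [(0.0, "neg"), (10.0, "pos")]
labeled, remaining = self_train(seed, [1.0, 9.0, 5.2])
```

In this toy run the points near the seeds are absorbed, while the ambiguous point stays unlabeled. Everything the model learns is anchored to the two seed examples, which is the bias the slide warns about.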

Page 7: [Paper Introduction] Distant supervision for relation extraction without labeled data

Active learning

• Use the current model to decide which unannotated examples would be most useful to label next, then have an annotator label the selected examples.

(Diagram: the model evaluates the unannotated data; the selected examples are annotated and added to the annotated data.)
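
One common way to choose what to annotate next is uncertainty sampling. The sketch below assumes a binary classifier that outputs a positive-class probability for each example. The function name, the `budget` parameter, and the probabilities are hypothetical, and the slides do not say which selection strategy is meant.

```python
def select_most_uncertain(scored_pool, budget):
    """Pick the `budget` examples whose predicted probability is closest to 0.5.

    scored_pool: list of (example_id, positive_class_probability) pairs.
    """
    ranked = sorted(scored_pool, key=lambda item: abs(item[1] - 0.5))
    return [example_id for example_id, _ in ranked[:budget]]


# Hypothetical model outputs for four unannotated sentences.
pool = [("s1", 0.97), ("s2", 0.52), ("s3", 0.10), ("s4", 0.41)]
to_annotate = select_most_uncertain(pool, budget=2)
```

The selected examples would then go to a human annotator, and the model would be retrained on the enlarged annotated set.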

Page 8: [Paper Introduction] Distant supervision for relation extraction without labeled data

Distant supervision

• Use an existing database together with unannotated data to train a classifier, then use the classifier to annotate new data.

(Diagram: the existing database and unannotated data are used to train a classifier, which then annotates further unannotated data.)

Page 9: [Paper Introduction] Distant supervision for relation extraction without labeled data

In this paper…

Page 10: [Paper Introduction] Distant supervision for relation extraction without labeled data

Page 11: [Paper Introduction] Distant supervision for relation extraction without labeled data

Page 12: [Paper Introduction] Distant supervision for relation extraction without labeled data

What we want to do

• Extract the relation between entities from sentences.

• e.g.
- sentence: Kyoto, the famous place in Japan.
- entities: Japan, Kyoto
- relation: location-contains <Japan, Kyoto>

Page 13: [Paper Introduction] Distant supervision for relation extraction without labeled data

In this work…

• Freebase: 102 relations, 940k entities, 1.8M instances.

(Diagram: Freebase and unannotated Wikipedia text are used to train a multiclass logistic regression classifier, which then annotates further Wikipedia text.)
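
For reference, a multiclass logistic regression classifier assigns each candidate relation a probability through a softmax over linear scores. The notation below is chosen here and is not taken from the slides: x is the feature vector of an entity pair, R is the set of relation labels, and w_r is the weight vector for relation r.

```latex
P(r \mid \mathbf{x}) = \frac{\exp\left(\mathbf{w}_r^{\top}\mathbf{x}\right)}{\sum_{r' \in R} \exp\left(\mathbf{w}_{r'}^{\top}\mathbf{x}\right)}
```

The predicted relation is the r with the highest probability.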

Page 14: [Paper Introduction] Distant supervision for relation extraction without labeled data

Freebase

Page 15: [Paper Introduction] Distant supervision for relation extraction without labeled data

Training

• Find every sentence that contains both entities of a known Freebase relation.
- Such a sentence is likely to express that relation.
- Entities are found by a named entity tagger.

• Train the classifier on these sentences.
- The features are explained later.

Page 16: [Paper Introduction] Distant supervision for relation extraction without labeled data

Example

• Known relations:
- location-contains <Virginia, Richmond>
- location-contains <France, Nantes>

• We find sentences such as:
- Richmond, the capital of Virginia.
- Henry’s Edict of Nantes helped the Protestants of France.

• Train the classifier using these sentences.
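
The labeling step in this example can be sketched as below. This is a minimal illustration, assuming the entity mentions of each sentence have already been produced by a named entity tagger. The entity lists are written by hand here, and `distant_label` is a hypothetical helper, not code from the paper.

```python
from itertools import permutations


def distant_label(known_relations, tagged_sentences):
    """Label a sentence with a relation if it mentions both entities of a known pair.

    known_relations: dict mapping (entity1, entity2) -> relation name.
    tagged_sentences: list of (sentence text, entity strings found by a tagger).
    """
    examples = []
    for text, entities in tagged_sentences:
        # Try both orders, because the database pair is ordered.
        for pair in permutations(entities, 2):
            relation = known_relations.get(pair)
            if relation is not None:
                examples.append((text, pair, relation))
    return examples


known = {
    ("Virginia", "Richmond"): "location-contains",
    ("France", "Nantes"): "location-contains",
}
sentences = [
    ("Richmond, the capital of Virginia.", ["Richmond", "Virginia"]),
    ("Henry's Edict of Nantes helped the Protestants of France.", ["Nantes", "France"]),
    ("Kyoto, the famous place in Japan.", ["Kyoto", "Japan"]),  # pair not in `known`
]
training_examples = distant_label(known, sentences)
```

The match is made purely on the entity pair, so a sentence receives the label whether or not it actually states the relation. The resulting training data is therefore noisy.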

Page 17: [Paper Introduction] Distant supervision for relation extraction without labeled data

Testing

• Find every sentence that contains two entities.
- Such a sentence may express a relation between them.
- Entities are found by a named entity tagger.

• The trained classifier predicts which relation, if any, holds between the two entities.
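
Candidate generation at test time can be sketched as follows, again assuming entity mentions are already available. Grouping sentences by entity pair is an illustrative choice made here so that evidence from several sentences can be combined before classification. `collect_candidates` is a hypothetical helper and the sentences are made up.

```python
from collections import defaultdict
from itertools import combinations


def collect_candidates(tagged_sentences):
    """Map each unordered entity pair to the sentences that mention both entities."""
    by_pair = defaultdict(list)
    for text, entities in tagged_sentences:
        # Sorting makes ("Japan", "Kyoto") and ("Kyoto", "Japan") the same key.
        for pair in combinations(sorted(set(entities)), 2):
            by_pair[pair].append(text)
    return dict(by_pair)


sentences = [
    ("Kyoto, the famous place in Japan.", ["Kyoto", "Japan"]),
    ("Japan's old capital is Kyoto.", ["Japan", "Kyoto"]),
    ("Richmond, the capital of Virginia.", ["Richmond", "Virginia"]),
]
candidates = collect_candidates(sentences)
```

Each key of `candidates` is a potential relation instance. Features extracted from its sentences would then be passed to the trained classifier.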

Page 18: [Paper Introduction] Distant supervision for relation extraction without labeled data

Features

• Lexical features:
- specific words between and surrounding the two entities in the sentence.

• Syntactic features:
- dependency path

Page 19: [Paper Introduction] Distant supervision for relation extraction without labeled data

Lexical features

• The sequence of words between the two entities.

• The part-of-speech tags of these words.

• A flag indicating which entity came first in the sentence.

• A window of k words to the left of Entity 1 and their part-of-speech tags.

• A window of k words to the right of Entity 2 and their part-of-speech tags.

Astronomer Edwin Hubble was born in Marshfield, Missouri.
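
The lexical features listed above can be computed for this sentence roughly as follows. The tokenization and part-of-speech tags are assigned by hand for illustration, entity spans are given as token offsets, and the two spans are assumed not to overlap. `lexical_features` is a hypothetical helper, and the windows are taken around the leftmost and rightmost entity.

```python
def lexical_features(tokens, pos_tags, span1, span2, k=2):
    """Build the lexical features for two entity spans (start, end), end exclusive."""
    first, second = sorted([span1, span2])  # order the spans by position in the sentence
    left_start = max(0, first[0] - k)
    right_end = min(len(tokens), second[1] + k)
    return {
        "between_words": tokens[first[1]:second[0]],
        "between_pos": pos_tags[first[1]:second[0]],
        "entity1_first": span1[0] < span2[0],
        "left_words": tokens[left_start:first[0]],
        "left_pos": pos_tags[left_start:first[0]],
        "right_words": tokens[second[1]:right_end],
        "right_pos": pos_tags[second[1]:right_end],
    }


tokens = ["Astronomer", "Edwin", "Hubble", "was", "born", "in", "Marshfield", ",", "Missouri", "."]
# Hand-assigned Penn Treebank style tags, for illustration only.
pos_tags = ["NN", "NNP", "NNP", "VBD", "VBN", "IN", "NNP", ",", "NNP", "."]
features = lexical_features(tokens, pos_tags, span1=(1, 3), span2=(6, 7), k=2)
```

Here Entity 1 is "Edwin Hubble" and Entity 2 is "Marshfield". The left window is shorter than k because only one token precedes the first entity.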

Page 20: [Paper Introduction] Distant supervision for relation extraction without labeled data

Syntactic features

• A dependency path between the two entities.

• For each entity, one “window” node that is not part of the dependency path.
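
A dependency path and the window nodes can be sketched with a breadth-first search over the parse tree. The edges below are a simplified, hand-written parse of the Hubble sentence, not the output of the parser used in the paper. Each entity is reduced to a single token, and the edge labels and helper names are hypothetical.

```python
from collections import deque


def dependency_path(edges, source, target):
    """Shortest path between two words, alternating words and labeled steps.

    edges: list of (head, dependent, label). "->" means head to dependent,
    "<-" means dependent to head.
    """
    neighbors = {}
    for head, dep, label in edges:
        neighbors.setdefault(head, []).append((dep, "->" + label))
        neighbors.setdefault(dep, []).append((head, "<-" + label))
    queue = deque([(source, [source])])
    seen = {source}
    while queue:
        node, path = queue.popleft()
        if node == target:
            return path
        for nxt, step in neighbors.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, path + [step, nxt]))
    return None  # the two words are not connected


def window_nodes(edges, entity, path):
    """Words attached to `entity` that do not lie on the dependency path."""
    on_path = set(path[::2])  # even positions of the path hold the words
    result = []
    for head, dep, _ in edges:
        if entity in (head, dep):
            other = dep if head == entity else head
            if other not in on_path:
                result.append(other)
    return result


# Simplified, hand-written parse (illustrative labels).
edges = [
    ("born", "Hubble", "subj"),
    ("born", "in", "mod"),
    ("in", "Marshfield", "pcomp"),
    ("Hubble", "Astronomer", "mod"),
    ("Marshfield", "Missouri", "mod"),
]
path = dependency_path(edges, "Hubble", "Marshfield")
left_window = window_nodes(edges, "Hubble", path)
right_window = window_nodes(edges, "Marshfield", path)
```

Under this toy parse the path runs from "Hubble" up to "born" and down through "in" to "Marshfield", with "Astronomer" and "Missouri" as the window nodes.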

Page 21: [Paper Introduction] Distant supervision for relation extraction without labeled data

Result

Page 22: [Paper Introduction] Distant supervision for relation extraction without labeled data

Trained features

Page 23: [Paper Introduction] Distant supervision for relation extraction without labeled data

Automatic evaluation

Page 24: [Paper Introduction] Distant supervision for relation extraction without labeled data

Human evaluation

Page 25: [Paper Introduction] Distant supervision for relation extraction without labeled data

Conclusion

• With this method, relations can be extracted from unlabeled text.

• Because the labels come from an existing database, the extracted relations fit that database's schema.

• The extracted relations appear to be accurate: the paper reports 10,000 instances of 102 relations at a precision of 67.6%.

Page 26: [Paper Introduction] Distant supervision for relation extraction without labeled data

Example usage of distant supervision

Existing database → Target annotation

• Freebase (relations between entities) → Wikipedia sentences (find new relations)

• Emoticons → Tweets (annotate positive, negative)

• Dependency parse trees, knowledge base → Semantic parser

Page 27: [Paper Introduction] Distant supervision for relation extraction without labeled data

Comments

• Distant supervision can be useful for other tasks.
- Currently, this method is used mainly for the relation extraction task.

• However, it assumes that a large database already exists.

Page 28: [Paper Introduction] Distant supervision for relation extraction without labeled data

END