1 Joint Inference for Knowledge Extraction from Biomedical Literature Hoifung Poon Dept. Computer Science & Eng. University of Washington (Joint work with Lucy Vanderwende at Microsoft Research)
Mar 26, 2015


Joshua Kerr

Joint Inference for Knowledge Extraction from Biomedical Literature

Hoifung Poon
Dept. Computer Science & Eng.
University of Washington

(Joint work with Lucy Vanderwende at Microsoft Research)

Outline

Motivation
Bio-event extraction
Our system
Experimental results
Conclusion

Knowledge Extraction From Web……

WWW

Knowledge Extraction From Web

If we succeed …
  Breach knowledge acquisition bottleneck
  Semantic search, question answering, …

But where should we start?
  More urgent and/or amenable
  General approaches

Knowledge Extraction From Biomedical Literature

PubMed: 18 million abstracts; += 2000 / mo.
Success would mean:
  Revolutionize biomedical research
  Dramatic speed-up in drug design
Grammatical English
General challenges:
  Beyond traditional information extraction
  Complex, nested structures
  Naturally call for joint inference

BioNLP: An Emerging Field

Protein name recognition
Protein-protein interaction (top F1 ~ 60%)
Bio-event extraction: Shared task of 2009 [Kim et al. 2009] (this talk)
Pathway Network
……

This Talk: Bio-Event Extraction

We present the first joint approach that achieves state-of-the-art results
  Based on Markov logic [Domingos & Lowd 2009]
  Novel formulation that expands the scope of joint inference
  Adding a few joint inference formulas to simple logistic regression doubles the F1

Bio-Event: State change of bio-molecules

Gene expression
Transcription
Protein catabolism
Localization
Phosphorylation
Binding
Regulation
Positive regulation
Negative regulation

Example

Involvement of p70(S6)-kinase activation in IL-10 up-regulation in human monocytes by gp41 envelope protein of human immunodeficiency virus type 1 ...

T1 Protein 15 29 p70(S6)-kinase
T2 Protein 44 49 IL-10
T3 Protein 86 90 gp41

T4 Regulation 0 11 Involvement
T5 Positive_regulation 30 40 activation
E1 Regulation:T4 Theme:E2 Cause:T3
E2 Positive_regulation:T5 Theme:T1
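
The annotation lines on this slide can be read programmatically. The following is a minimal sketch, assuming the space-separated layout shown here (the shared-task files themselves may use different delimiters). The names Span, Event, and parse_standoff are illustrative and do not come from any shared-task tooling.

```python
from dataclasses import dataclass


@dataclass
class Span:
    """A text-bound annotation (protein or event trigger)."""
    id: str
    type: str
    start: int
    end: int
    text: str


@dataclass
class Event:
    """An event: its type, its trigger span id, and role -> argument id."""
    id: str
    type: str
    trigger: str
    args: dict


def parse_standoff(lines):
    """Parse 'T...' span lines and 'E...' event lines into two dicts."""
    spans, events = {}, {}
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith("T"):
            # id, type, start offset, end offset, covered text
            sid, stype, start, end, text = line.split(None, 4)
            spans[sid] = Span(sid, stype, int(start), int(end), text)
        elif line.startswith("E"):
            # id, Type:trigger, then Role:argument pairs
            eid, head, *rest = line.split()
            etype, trigger = head.split(":")
            events[eid] = Event(eid, etype, trigger,
                                dict(r.split(":") for r in rest))
    return spans, events


sentence = ("Involvement of p70(S6)-kinase activation in IL-10 up-regulation "
            "in human monocytes by gp41 envelope protein of human "
            "immunodeficiency virus type 1")
annotations = [
    "T1 Protein 15 29 p70(S6)-kinase",
    "T2 Protein 44 49 IL-10",
    "T3 Protein 86 90 gp41",
    "T4 Regulation 0 11 Involvement",
    "T5 Positive_regulation 30 40 activation",
    "E1 Regulation:T4 Theme:E2 Cause:T3",
    "E2 Positive_regulation:T5 Theme:T1",
]
spans, events = parse_standoff(annotations)
```

Because E1 takes E2 as its theme, the parsed events already show the nesting that the following slides discuss: an argument id can point either to a protein span (T...) or to another event (E...).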

Why Is It Hard?

Involvement of p70(S6)-kinase activation in IL-10 up-regulation in human monocytes by gp41 envelope protein of human immunodeficiency virus type 1 ...

[Figure: nested event structure for the sentence, with nodes involvement, up-regulation, IL-10, human monocyte, gp41, p70(S6)-kinase, and activation, linked by Theme, Cause, and Site arguments]

Traditional information extraction ignores this

Why Is It Hard?

Variations in denoting same events
E.g., negative regulation

532 inhibited, 252 inhibition, 218 inhibit, 207 blocked, 175 inhibits, 157 decreased, 156 reduced, 112 suppressed, 108 decrease, 86 inhibitor, 81 Inhibition, 68 inhibitors, 67 abolished, 66 suppress, 65 block, 63 prevented, 48 suppression, 47 blocks, 44 inhibiting, 42 loss, 39 impaired, 38 reduction, 32 down-regulated, 29 abrogated, 27 prevents, 27 attenuated, 26 repression, 26 decreases, 26 down-regulation, 25 diminished, 25 downregulated, 25 suppresses, 22 interfere, 21 absence, 21 repress ……

Why Is It Hard?

Same word denotes different events
E.g., appearance
  “in the nucleus” → Localization
  “mRNA” → Transcription
  “IL-2 activity” → Positive-regulation

……

Participants

Top System: UTurku

Adopts the pipeline architecture
  First, determines event candidates and types
  Then, classifies for each pair of candidates whether the latter is a theme or cause
No way to feed back information to events given evidence of arguments
Decisions are made independently

Joint Inference for Bio-Event Extraction

Complex, nested structures naturally argue for joint inference
However, under-explored for this task
Previous best joint approach [Riedel et al. 2009] still lags UTurku by a large margin

Design Desiderata

Jointly predict events and arguments
Incorporate prior knowledge, e.g.,
  Each event has a theme
  Only regulation events can have cause
Expand scope of joint inference to include individual dependency edges

Markov Logic [Domingos & Lowd 2009]

Syntax: Weighted first-order formulas
Semantics: Feature templates for Markov nets
A Markov Logic Network (MLN) is a set of pairs (Fi, wi) where
  Fi is a formula in first-order logic
  wi is a real number

$$P(x) = \frac{1}{Z} \exp\left( \sum_i w_i N_i(x) \right)$$

$N_i(x)$: number of true groundings of Fi
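
The MLN distribution on this slide can be illustrated by brute-force enumeration over a tiny set of ground atoms. This is only a sketch under stated assumptions: the two atoms, the two formulas, and the weights 1.0 and 2.0 are hypothetical, and real systems such as Alchemy use approximate inference rather than enumerating every world.

```python
import itertools
import math


def mln_distribution(atoms, formulas):
    """Return [(world, probability)] for every truth assignment to atoms.

    formulas is a list of (weight, count_fn) pairs, where count_fn(world)
    gives the number of true groundings of that formula in the world.
    """
    worlds = [dict(zip(atoms, values))
              for values in itertools.product([False, True], repeat=len(atoms))]
    # Unnormalized score of a world: exp(sum_i w_i * N_i(x)).
    scores = [math.exp(sum(w * n(world) for w, n in formulas))
              for world in worlds]
    z = sum(scores)  # normalization constant Z
    return [(world, s / z) for world, s in zip(worlds, scores)]


# Hypothetical toy model: one candidate event and one candidate theme edge.
atoms = ["Event", "Theme"]
formulas = [
    # Evidence in favor of the word being an event trigger (weight assumed).
    (1.0, lambda w: int(w["Event"])),
    # Soft version of "an event has a theme": Event => Theme (weight assumed).
    (2.0, lambda w: int((not w["Event"]) or w["Theme"])),
]
dist = mln_distribution(atoms, formulas)
prob = {(w["Event"], w["Theme"]): p for w, p in dist}
```

In this toy model a world with an event but no theme violates the second formula, so it is less probable than the world with both by a factor of exp(2.0). Raising that weight toward infinity would turn the formula into a hard constraint.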

Markov Logic

Unifying framework for joint inference
A plethora of efficient algorithms available
Open-source implementation: Alchemy
  alchemy.cs.washington.edu

Input: Stanford Dependencies

[Figure: Stanford dependency tree with nodes involvement, up-regulation, IL-10, human monocyte, gp41, p70(S6)-kinase, and activation; edge labels: prep_of, prep_in (twice), prep_by, nn (twice)]

Involvement of p70(S6)-kinase activation in IL-10 up-regulation in human monocyte by gp41 …

Joint Predictions

[Figure: the dependency tree for the example sentence; each of the seven nodes carries the questions "Trigger word? Event type?"]

Joint Predictions

[Figure: the dependency tree for the example sentence; each of the six edges carries the questions "In theme path? In cause path?"]

Why Individual Dependencies?

… regulate IL-10 …            regulate -dobj-> IL-10
… regulate IL-10 protein …    regulate -dobj-> protein -nn-> IL-10
… regulate IL-8 and IL-10 …   regulate -dobj-> IL-8 -conj-> IL-10

Beginning of theme paths: the dobj edge from "regulate" in each phrase
Continuation of a path: the nn and conj edges …

MLN For Bio-Event Extraction

Logistic regression
Hard constraints
Linguistically motivated joint formulas

Logistic Regression

Lexical evidence
  E.g.: “activation” probably refers to positive-regulation
Syntactic evidence
  E.g.: “nsubj” probably leads to a cause
Lexical-syntactic evidence
  E.g.: “nsubj” from “binds” probably leads to a theme

Hard Constraints

Events
  E.g.: Event must have a theme
Argument paths
  E.g.: If edge s → t is in a theme path, then either s is an event or there is some edge p → s in the theme path

Decisions about events and argument edges are interdependent
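
The two example constraints on this slide can be checked on a candidate labeling with a few set operations. This is a sketch, not the system's actual formulas: it assumes that "has a theme" can be approximated as "at least one theme-path edge leaves the event word", and the function name violated_constraints and the sample words are illustrative.

```python
def violated_constraints(events, theme_edges):
    """List violations of the two example hard constraints.

    events: set of words labeled as events.
    theme_edges: set of (source, target) dependency edges labeled
    as being in a theme path.
    """
    violations = []
    sources = {s for s, _ in theme_edges}
    targets = {t for _, t in theme_edges}
    # Constraint 1 (approximated): every event starts some theme path.
    for event in sorted(events):
        if event not in sources:
            violations.append(f"event '{event}' has no theme path")
    # Constraint 2: an edge s -> t in a theme path must start at an event
    # or continue some other theme-path edge p -> s.
    for s, t in sorted(theme_edges):
        if s not in events and s not in targets:
            violations.append(f"edge {s}->{t} does not extend a theme path")
    return violations


# "... regulate IL-10 protein ...": dobj starts the path, nn continues it.
consistent = violated_constraints(
    {"regulate"}, {("regulate", "protein"), ("protein", "IL-10")})

# Same edges, but "regulate" is not labeled as an event.
inconsistent = violated_constraints(
    set(), {("regulate", "protein"), ("protein", "IL-10")})
```

The second call shows the interdependence: rejecting the event leaves the first edge without a valid start, so a joint model has to revise the event decision and the edge decisions together.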

Linguistically-Motivated Joint Formulas

Syntactic alternations, e.g.:
  A increases the level of B
  The level of B increases
Add context-specific formula
  E.g., if “increases” signifies an event, and it has both nsubj and dobj dependencies, then nsubj probably leads to a cause

Correct Syntactic Error with Semantic Information

Coordination: expression of IL-8 and IL-10

[Figure: two alternative dependency trees over expression, IL-8, and IL-10, each with one prep_of edge and one conj edge]

Correct Syntactic Error with Semantic Information

PP-attachment: involvement of IL-8 in IL-10 regulation

[Figure: two alternative dependency trees over involvement, IL-8, regulation, and IL-10, each with edges prep_of, prep_in, and nn; they differ in where the prepositional phrase attaches]

Dataset

BioNLP-09 Shared Task (PubMed abstracts)
  Training: 800
  Development: 150
  Test: 260

Main evaluation criteria for the task
  Event-level recall, precision, F1
  Account for nested event structures
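
For reference, precision, recall, and F1 follow the standard definitions over counts of predicted, gold, and correctly predicted events. The sketch below assumes those counts are already available. How the shared task decides that a predicted event matches a gold event, including nested arguments, is not reproduced here, and the function name prf1 is illustrative.

```python
def prf1(num_correct, num_predicted, num_gold):
    """Return (precision, recall, F1) from event counts.

    num_correct: predicted events that match a gold event.
    num_predicted: all events the system predicted.
    num_gold: all events in the gold annotation.
    """
    precision = num_correct / num_predicted if num_predicted else 0.0
    recall = num_correct / num_gold if num_gold else 0.0
    if precision + recall == 0.0:
        return precision, recall, 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1
```

Because F1 is a harmonic mean, it stays low when either precision or recall is low, which is why it is a stricter summary than a simple average of the two.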

Experiment Objectives

Relative contributions of feature components
Identify the bottlenecks for performance
Comparison with state-of-the-art systems

Results: Development Set

[Figure: bar chart of F1 (axis from 25 to 55) for LR, LR+HARD, FULL, NO-SYN-FIX, and UTurku]

LR to LR+HARD: add hard joint inference formulas (annotated F1 difference: 26)
LR+HARD to FULL: add soft joint inference formulas (annotated F1 difference: 2)
FULL to NO-SYN-FIX: without fixing syntactic errors (annotated F1 difference: 4)

Per-Type Performance

Event             Event F1   Trigger-Word F1
Catabolism        92         91
Phosphorylation   87         90
Expression        77         80
Localization      75         73
Transcription     71         70
Binding           48         71
Negative-Reg.     46         64
Positive-Reg.     46         68
Regulation        37         51

Results: Test Set

[Figure: bar chart of F1 (axis from 25 to 55) for UTurku, JULIELab, ConcordU, Riedel et al., and Our MLN]

Reduce F1 error by over 10% compared to the previous best joint approach

Future Work

Incorporate more features
More joint inference opportunities
Leverage discourse (e.g., coreference)
Joint syntactic / semantic processing

Conclusion

First joint approach for bio-event extraction with state-of-the-art results
Based on Markov Logic
Novel formulation with expanded joint inference
Correcting syntactic errors with semantic information helps