Statistical Relational Learning for Knowledge Extraction from the Web
Hoifung Poon
Dept. of Computer Science & Eng.
University of Washington
Jan 08, 2016

“Drowning in Information, Starved for Knowledge”

WWW

Great Vision: Knowledge Extraction from the Web

Also need: Knowledge representation and reasoning
Close the loop: Apply knowledge to extraction

Machine reading [Etzioni et al., 2007]

Craven et al., “Learning to Construct Knowledge Bases from the World Wide Web,” Artificial Intelligence, 1999.

Machine Reading: Text → Knowledge

……

Rapidly Growing Interest

AAAI-07 Spring Symposium on Machine Reading
DARPA Machine Reading Program (2009-2014)
NAACL-10 Workshop on Learning By Reading
Etc.

Great Impact

Scientific inquiry and commercial applications
Literature-based discovery, robot scientists
Question answering, semantic search
Drug design, medical diagnosis
Break the knowledge acquisition bottleneck for AI and natural language understanding
Automatically semantify the Web
Etc.

This Talk

Statistical relational learning offers promising solutions to machine reading
Markov logic is a leading unifying framework
A success story: USP
Unsupervised, end-to-end machine reading
Extracts five times as many correct answers as the state of the art, with the highest accuracy of 91%

USP: Question-Answer Example

Q: What does IL-2 control?

A: The DEX-mediated IkappaBalpha induction

Interestingly, the DEX-mediated IkappaBalpha induction was completely inhibited by IL-2, but not IL-4, in Th1 cells, while the reverse profile was seen in Th2 cells.

Overview

Machine reading: Challenges
Statistical relational learning
Markov logic
USP: Unsupervised Semantic Parsing
Research directions

Key Challenges

Complexity
Uncertainty
Pipeline accumulates errors
Supervision is scarce

Languages Are Structural

IL-4 induces CD11B

Involvement of p70(S6)-kinase activation in IL-10 up-regulation in human monocytes by gp41......

George Walker Bush was the 43rd President of the United States. …… Bush was the eldest son of President G. H. W. Bush and Barbara Bush. …… In November 1977, he met Laura Welch at a barbecue.

governments

lm$pxtm (Hebrew: according to their families)

Languages Are Structural

govern-ment-s

l-m$px-t-m (Hebrew: according to their families)

IL-4 induces CD11B
[Figure: syntactic parse tree, S → NP VP, VP → V NP]

Involvement of p70(S6)-kinase activation in IL-10 up-regulation in human monocytes by gp41......

[Figure: event structure for this sentence, with event nodes involvement, up-regulation, and activation; arguments IL-10, human monocyte, gp41, and p70(S6)-kinase; and role labels Theme, Cause, and Site]

George Walker Bush was the 43rd President of the United States. …… Bush was the eldest son of President G. H. W. Bush and Barbara Bush. …… In November 1977, he met Laura Welch at a barbecue.

Knowledge Is Heterogeneous

Individuals, e.g.: Socrates is a man
Types, e.g.: Man is mortal
Inference rules, e.g.: Syllogism
Ontological relations, e.g.: HUMAN ISA MAMMAL; EYE ISPART FACE
Etc.

Complexity

Can handle using first-order logic
Trees, graphs, dependencies, hierarchies, etc. easily expressed
Inference algorithms (satisfiability testing, theorem proving, etc.)
But … logic is brittle with uncertainty

Languages Are Ambiguous

G. W. Bush …… Laura Bush …… Mrs. Bush ……

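I saw the man with the telescope
[Figure: two parses, one with “the man with the telescope” as a single NP, the other with “the man” as NP and “with the telescope” as an adverbial phrase (ADVP) modifying “saw”]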

Here in London, Frances Deek is a retired teacher … In the Israeli town …, Karen London says … Now London says …

London: PERSON or LOCATION?

Microsoft buys Powerset

Microsoft acquires Powerset

Powerset is acquired by Microsoft Corporation

The Redmond software giant buys Powerset

Microsoft’s purchase of Powerset, …

……

Which one?

Knowledge Has Uncertainty

We need to model correlations
Our information is always incomplete
Our predictions are uncertain

Uncertainty

Statistics provides the tools to handle this:
Mixture models
Hidden Markov models
Bayesian networks
Markov random fields
Maximum entropy models
Conditional random fields
Etc.

But … statistical models assume i.i.d. data (independently and identically distributed): objects → feature vectors

Pipeline Is Suboptimal

E.g., NLP pipeline:
Tokenization → Morphology → Chunking → Syntax → …

Accumulates and propagates errors
Wanted: Joint inference
Across all processing stages
Among all interdependent objects

Supervision Is Scarce

Tons of text … but most is not annotated
Labeling is expensive (cf. Penn Treebank)
Need to leverage indirect supervision

Redundancy

Key source of indirect supervision
State-of-the-art systems depend on this, e.g., TextRunner [Banko et al., 2007]
But … the Web is heterogeneous: long tail
Redundancy is only present in the head regime

Statistical Relational Learning

Burgeoning field in machine learning
Offers promising solutions for machine reading
Unify statistical and logical approaches
Replace pipeline with joint inference
Principled framework to leverage both direct and indirect supervision

Machine Reading: A Vision

Challenge: Long tail

Challenges in Applying Statistical Relational Learning

Learning is much harder
Inference becomes a crucial issue
Greater complexity for user

Progress to Date

Probabilistic logic [Nilsson, 1986]
Statistics and beliefs [Halpern, 1990]
Knowledge-based model construction [Wellman et al., 1992]
Stochastic logic programs [Muggleton, 1996]
Probabilistic relational models [Friedman et al., 1999]
Relational Markov networks [Taskar et al., 2002]
Markov logic [Domingos & Lowd, 2009]: leading unifying framework
Etc.

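Markov Networks
Undirected graphical models
[Figure: Markov network over Smoking, Cancer, Asthma, and Cough]

Log-linear model:
$P(x) = \frac{1}{Z} \exp\left( \sum_i w_i f_i(x) \right)$
where $w_i$ is the weight of feature $i$ and $f_i$ is feature $i$, e.g.:
$f_1(\text{Smoking}, \text{Cancer}) = 1$ if $\text{Smoking} \Rightarrow \text{Cancer}$, and $0$ otherwise
$w_1 = 1.5$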
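To make the log-linear model concrete, the Python sketch below computes P(x) exactly for the Smoking/Cancer pair, using only the feature f_1 and the weight w_1 = 1.5 from the slide. Leaving out the Asthma and Cough nodes is a simplification for illustration, and enumerating every state to obtain Z is only feasible for toy models.

```python
import itertools
import math

W1 = 1.5  # weight w_1 from the slide


def f1(smoking: bool, cancer: bool) -> int:
    """Feature f_1(Smoking, Cancer): 1 if Smoking => Cancer holds, 0 otherwise."""
    return 1 if (not smoking) or cancer else 0


def unnormalized(smoking: bool, cancer: bool) -> float:
    """exp(sum_i w_i f_i(x)); this toy model has a single feature."""
    return math.exp(W1 * f1(smoking, cancer))


# Partition function Z: sum over all four joint states of (Smoking, Cancer).
worlds = list(itertools.product([False, True], repeat=2))
Z = sum(unnormalized(s, c) for s, c in worlds)
probs = {(s, c): unnormalized(s, c) / Z for s, c in worlds}

for (s, c), p in probs.items():
    print(f"Smoking={str(s):5} Cancer={str(c):5} P(x)={p:.3f}")
```

The one state that violates Smoking ⇒ Cancer gets probability 1/(3e^{1.5} + 1) ≈ 0.069, and each of the other three states gets about 0.310: the constraint is softened rather than enforced.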

First-Order Logic

Constants, variables, functions, predicates, e.g.: Anna, x, MotherOf(x), Friends(x,y)

Grounding: Replace all variables by constants, e.g.: Friends(Anna, Bob)

World (model, interpretation): Assignment of truth values to all ground predicates
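Grounding can be illustrated with a short Python sketch. It assumes a domain with just the two constants Anna and Bob, and the helper `ground` is a hypothetical name for illustration, not part of any MLN software.

```python
import itertools

# Hypothetical domain: the two constants used in the slide examples.
constants = ["Anna", "Bob"]


def ground(predicate: str, arity: int, consts: list[str]) -> list[str]:
    """Return every ground atom obtained by substituting constants for the predicate's variables."""
    return [f"{predicate}({', '.join(args)})"
            for args in itertools.product(consts, repeat=arity)]


friends_atoms = ground("Friends", 2, constants)
print(friends_atoms)
# A world assigns a truth value to each ground atom, so n atoms give 2**n possible worlds.
print(2 ** len(friends_atoms))
```

The number of ground atoms grows as |constants|^arity, which is why grounding alone quickly becomes expensive in relational domains.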

Markov Logic

Intuition: Soften logical constraints
Syntax: Weighted first-order formulas
Semantics: Feature templates for Markov networks

A Markov Logic Network (MLN) is a set of pairs $(F_i, w_i)$ where
$F_i$ is a formula in first-order logic
$w_i$ is a real number

$P(x) = \frac{1}{Z} \exp\left( \sum_i w_i n_i(x) \right)$
where $n_i(x)$ is the number of true groundings of $F_i$

Example: Friends & Smokers

Smoking causes cancer.

Friends have similar smoking habits.

Example: Friends & Smokers

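$1.5 \quad \forall x\; \mathit{Smokes}(x) \Rightarrow \mathit{Cancer}(x)$
$1.1 \quad \forall x, y\; \mathit{Friends}(x, y) \Rightarrow \left( \mathit{Smokes}(x) \Leftrightarrow \mathit{Smokes}(y) \right)$

Two constants: Anna (A) and Bob (B)
[Figure: ground Markov network over Smokes(A), Smokes(B), Cancer(A), Cancer(B), Friends(A,A), Friends(A,B), Friends(B,A), and Friends(B,B)]

Probabilistic graphical models and first-order logic are special cases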
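Because this ground network has only eight atoms, the distribution it defines can be computed by brute force. The Python sketch below is a minimal illustration that assumes only the two weighted formulas and the two constants shown: it computes the true-grounding counts n_i(x) for each of the 2^8 = 256 worlds and uses them to evaluate a conditional probability. Practical MLN systems such as Alchemy rely on approximate inference rather than enumeration.

```python
import itertools
import math

PEOPLE = ["A", "B"]        # Anna and Bob
W_SMOKING_CANCER = 1.5     # weight of  forall x: Smokes(x) => Cancer(x)
W_FRIENDS = 1.1            # weight of  forall x,y: Friends(x,y) => (Smokes(x) <=> Smokes(y))

# The eight ground atoms of the ground Markov network.
ATOMS = ([f"Smokes({p})" for p in PEOPLE] + [f"Cancer({p})" for p in PEOPLE]
         + [f"Friends({x},{y})" for x in PEOPLE for y in PEOPLE])


def counts(world: dict[str, bool]) -> tuple[int, int]:
    """n_i(x): number of true groundings of each formula in a world."""
    n1 = sum(1 for x in PEOPLE
             if (not world[f"Smokes({x})"]) or world[f"Cancer({x})"])
    n2 = sum(1 for x in PEOPLE for y in PEOPLE
             if (not world[f"Friends({x},{y})"])
             or (world[f"Smokes({x})"] == world[f"Smokes({y})"]))
    return n1, n2


def score(world: dict[str, bool]) -> float:
    """Unnormalized weight exp(sum_i w_i n_i(x))."""
    n1, n2 = counts(world)
    return math.exp(W_SMOKING_CANCER * n1 + W_FRIENDS * n2)


# Brute-force partition function over all 2**8 = 256 worlds.
all_worlds = [dict(zip(ATOMS, values))
              for values in itertools.product([False, True], repeat=len(ATOMS))]
Z = sum(score(w) for w in all_worlds)


def prob(world: dict[str, bool]) -> float:
    return score(world) / Z


# Conditional probability P(Cancer(A) | Smokes(A)) by summing over consistent worlds.
smokers = [w for w in all_worlds if w["Smokes(A)"]]
p_cancer_given_smokes = (sum(prob(w) for w in smokers if w["Cancer(A)"])
                         / sum(prob(w) for w in smokers))
print(f"P(Cancer(A) | Smokes(A)) = {p_cancer_given_smokes:.3f}")
```

Since Cancer(A) appears only in the grounding Smokes(A) ⇒ Cancer(A), this conditional reduces to e^{1.5}/(1 + e^{1.5}) ≈ 0.82, which the enumeration reproduces.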

MLN Algorithms: The First Three Generations

Problem              First generation          Second generation    Third generation
MAP inference        Weighted satisfiability   Lazy inference       Cutting planes
Marginal inference   Gibbs sampling            MC-SAT               Lifted inference
Weight learning      Pseudo-likelihood         Voted perceptron     Scaled conj. gradient
Structure learning   Inductive logic progr.    ILP + PL (etc.)      Clustering + pathfinding

Efficient Inference

Logical or statistical inference is already hard
But … can do approximate inference
Suffices to perform well in most cases
Combine ideas from both camps
E.g., MC-SAT = MCMC + SAT solver
Can also leverage sparsity in relational domains

More: Poon & Domingos, “Sound and Efficient Inference with Probabilistic and Deterministic Dependencies”, in Proc. AAAI-2006.

More: Poon, Domingos & Sumner, “A General Method for Reducing the Complexity of Relational Inference and its Application to MCMC”, in Proc. AAAI-2008.
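MC-SAT itself combines MCMC with a SAT solver and does not fit in a short example. As a simpler stand-in, the Python sketch below runs plain Gibbs sampling (the first-generation marginal inference algorithm) on the Friends & Smokers network with weights 1.5 and 1.1. The evidence (Anna smokes, and Anna and Bob are friends in both directions) is a hypothetical choice. The sketch only shows approximate inference propagating evidence through a ground network; it is not MC-SAT and does not address deterministic dependencies, which MC-SAT was designed to handle.

```python
import math
import random

PEOPLE = ["A", "B"]
W_SC, W_F = 1.5, 1.1  # weights of the two Friends & Smokers formulas


def weighted_count(world: dict[str, bool]) -> float:
    """sum_i w_i n_i(x) for the two formulas in the current world."""
    n1 = sum(1 for x in PEOPLE
             if (not world[f"Smokes({x})"]) or world[f"Cancer({x})"])
    n2 = sum(1 for x in PEOPLE for y in PEOPLE
             if (not world[f"Friends({x},{y})"])
             or (world[f"Smokes({x})"] == world[f"Smokes({y})"]))
    return W_SC * n1 + W_F * n2


def gibbs_marginals(evidence: dict[str, bool], n_sweeps: int = 20000,
                    burn_in: int = 1000, seed: int = 0) -> dict[str, float]:
    """Estimate P(atom = True | evidence) for every non-evidence atom."""
    rng = random.Random(seed)
    atoms = ([f"Smokes({p})" for p in PEOPLE] + [f"Cancer({p})" for p in PEOPLE]
             + [f"Friends({x},{y})" for x in PEOPLE for y in PEOPLE])
    world = {a: evidence.get(a, rng.random() < 0.5) for a in atoms}
    query = [a for a in atoms if a not in evidence]
    true_counts = {a: 0 for a in query}
    for sweep in range(burn_in + n_sweeps):
        for a in query:
            # Resample one atom from its conditional given all other atoms.
            # Recomputing the full score is wasteful but keeps the sketch simple.
            world[a] = True
            s_true = weighted_count(world)
            world[a] = False
            s_false = weighted_count(world)
            p_true = 1.0 / (1.0 + math.exp(s_false - s_true))
            world[a] = rng.random() < p_true
        if sweep >= burn_in:
            for a in query:
                true_counts[a] += world[a]
    return {a: c / n_sweeps for a, c in true_counts.items()}


evidence = {"Smokes(A)": True, "Friends(A,B)": True, "Friends(B,A)": True}
marginals = gibbs_marginals(evidence)
print(f"P(Smokes(B) | evidence) ~ {marginals['Smokes(B)']:.3f}")
print(f"P(Cancer(B) | evidence) ~ {marginals['Cancer(B)']:.3f}")
```

For this evidence, enumerating Smokes(B) and Cancer(B) gives exact conditionals of about 0.85 for Smokes(B) and 0.77 for Cancer(B); the sampled estimates should land close to these. The friendship link raises Bob's probability of smoking, which in turn raises his probability of cancer.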

Weight Learning

Probability model P(X)
X: Observable in training data
Maximize likelihood of observed data
Regularization to prevent overfitting

Weight Learning

$\frac{\partial}{\partial w_i} \log P_w(x) = n_i(x) - E_w[n_i(x)]$

$n_i(x)$: No. of times clause i is true in data
$E_w[n_i(x)]$: Expected no. of times clause i is true according to the MLN (requires inference)

Gradient descent
Use MC-SAT for inference
Can also leverage second-order information [Lowd & Domingos, 2007]
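The gradient can be checked on a toy case. The Python sketch below assumes a hypothetical database of ten constants and the single clause Smokes(x) ⇒ Cancer(x). Because these groundings share no atoms, E_w[n(x)] can be computed exactly by enumerating one constant's four states; in general it requires approximate inference such as MC-SAT. The optional L2 term stands in for regularization to prevent overfitting.

```python
import math

# Hypothetical training database: observed (Smokes(x), Cancer(x)) for ten constants x.
data = [(True, True)] * 4 + [(False, False)] * 3 + [(False, True)] * 2 + [(True, False)]


def satisfied(s: bool, c: bool) -> bool:
    """Truth value of the ground clause Smokes(x) => Cancer(x)."""
    return (not s) or c


def n_true(world: list[tuple[bool, bool]]) -> int:
    """n(x): number of true groundings of the clause in the observed database."""
    return sum(satisfied(s, c) for s, c in world)


def expected_n(w: float, num_constants: int) -> float:
    """E_w[n(x)]: groundings are independent here, so enumerating one constant is exact."""
    states = [(s, c) for s in (False, True) for c in (False, True)]
    scores = [math.exp(w * satisfied(s, c)) for s, c in states]
    p_true = sum(sc for (s, c), sc in zip(states, scores) if satisfied(s, c)) / sum(scores)
    return num_constants * p_true


def learn_weight(world, lr: float = 0.05, steps: int = 2000, l2: float = 0.0) -> float:
    """Gradient ascent on log P_w(x), i.e. gradient descent on its negative."""
    w = 0.0
    observed = n_true(world)
    for _ in range(steps):
        # n(x) - E_w[n(x)], minus the gradient of an optional L2 penalty (l2 / 2) * w**2
        grad = observed - expected_n(w, len(world)) - l2 * w
        w += lr * grad
    return w


w_hat = learn_weight(data)
print(f"learned weight: {w_hat:.3f}")
```

With 9 of the 10 groundings true, learning stops where E_w[n(x)] = 9, i.e. 10 · 3e^w/(3e^w + 1) = 9, which gives w = log 3 ≈ 1.10. Setting `l2` above zero pulls the weight toward 0.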

Unsupervised Learning: How?

I.I.D. learning: Sophisticated model requires more labeled data
Statistical relational learning: Sophisticated model may require less labeled data
Ambiguities vary among objects
Joint inference → propagate information from unambiguous objects to ambiguous ones
One formula is worth a thousand labels
Small amount of domain knowledge → large-scale joint inference

Unsupervised Weight Learning

Probability model P(X, Z)
X: Observed in training data
Z: Hidden variables
E.g., clustering with mixture models
Z: Cluster assignment
X: Observed features
$P(X, Z) = P(Z)\, P(X \mid Z)$

Maximize likelihood of observed data by summing out hidden variables Z: $P(X) = \sum_Z P(X, Z)$

Unsupervised Weight Learning

$\frac{\partial}{\partial w_i} \log P_w(x) = E_{z \mid x}[n_i(x, z)] - E_{x, z}[n_i(x, z)]$
First term: sum over z, conditioned on observed x
Second term: summed over both x and z

Gradient descent
Use MC-SAT to compute both expectations
May also combine with contrastive estimation

More: Poon, Cherry, & Toutanova, “Unsupervised Morphological Segmentation with Log-Linear Models”, in Proc. NAACL-2009.
Best Paper Award
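The two expectations can be seen side by side in a tiny model with one observed binary variable x and one hidden binary variable z. The Python sketch below uses two hypothetical features, computes both expectations by exact enumeration instead of MC-SAT, and checks the resulting gradient against finite differences of log P_w(x).

```python
import itertools
import math

ALL_STATES = list(itertools.product((0, 1), repeat=2))  # every (x, z) pair


def features(x: int, z: int) -> list[int]:
    """Hypothetical feature counts n_i(x, z): n_1 = [x == z], n_2 = [z == 1]."""
    return [int(x == z), int(z == 1)]


def score(w: list[float], x: int, z: int) -> float:
    """Unnormalized log-probability sum_i w_i n_i(x, z)."""
    return sum(wi * ni for wi, ni in zip(w, features(x, z)))


def log_marginal(w: list[float], x_obs: int) -> float:
    """log P_w(x) = log sum_z exp(score(x, z)) - log Z."""
    log_z = math.log(sum(math.exp(score(w, x, z)) for x, z in ALL_STATES))
    return math.log(sum(math.exp(score(w, x_obs, z)) for z in (0, 1))) - log_z


def expected_features(w: list[float], states: list[tuple[int, int]]) -> list[float]:
    """Expectation of each n_i under the distribution proportional to exp(score) on `states`."""
    weights = [math.exp(score(w, x, z)) for x, z in states]
    total = sum(weights)
    return [sum(p * features(x, z)[i] for p, (x, z) in zip(weights, states)) / total
            for i in range(len(w))]


def gradient(w: list[float], x_obs: int) -> list[float]:
    clamped = expected_features(w, [(x_obs, z) for z in (0, 1)])  # E_{z|x}[n_i(x, z)]
    free = expected_features(w, ALL_STATES)                        # E_{x,z}[n_i(x, z)]
    return [c - f for c, f in zip(clamped, free)]


def numerical_gradient(w: list[float], x_obs: int, h: float = 1e-6) -> list[float]:
    """Central finite differences of log P_w(x), used only as a check."""
    grads = []
    for i in range(len(w)):
        up, down = list(w), list(w)
        up[i] += h
        down[i] -= h
        grads.append((log_marginal(up, x_obs) - log_marginal(down, x_obs)) / (2 * h))
    return grads


w = [0.7, -0.3]
print(gradient(w, x_obs=1))
print(numerical_gradient(w, x_obs=1))
```

The only change from the fully observed case is the first term: it averages feature counts over the hidden z with x clamped, rather than reading them off the data.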

Markov Logic

Unified inference and learning algorithms
Can handle millions of variables, billions of features, tens of thousands of parameters
Easy-to-use software: Alchemy
Many successful applications
E.g.: Information extraction, coreference resolution, semantic parsing, ontology induction

Pipeline → Joint Inference

Combine segmentation and entity resolution for information extraction

Extract complex and nested bio-events from PubMed abstracts

More: Poon & Domingos, “Joint Inference for Information Extraction”, in Proc. AAAI-2007.

More: Poon & Vanderwende, “Joint Inference for Knowledge Extraction from Biomedical Literature”, in Proc. NAACL-2010.

Unsupervised Learning: Example

Coreference resolution: Accuracy comparable to previous supervised state of the art

More: Poon & Domingos, “Joint Unsupervised Coreference Resolution with Markov Logic”, in Proc. EMNLP-2008.

Unsupervised Semantic Parsing

USP [Poon & Domingos, EMNLP-09]
First unsupervised approach for semantic parsing
End-to-end machine reading system
Read text, answer questions
OntoUSP = USP + Ontology Induction [Poon & Domingos, ACL-10]
Encoded in a few Markov logic formulas

Best Paper Award

Semantic Parsing

Goal: Microsoft buys Powerset → BUY(MICROSOFT, POWERSET)

Challenge:
Microsoft buys Powerset
Microsoft acquires semantic search engine Powerset
Powerset is acquired by Microsoft Corporation
The Redmond software giant buys Powerset
Microsoft’s purchase of Powerset, …

Limitations of Existing Approaches

Manual grammar or supervised learning
Applicable to restricted domains only
For general text:
Not clear what predicates and objects to use
Hard to produce consistent meaning annotation
Also, often learn both syntax and semantics
Fail to leverage advanced syntactic parsers
Make semantic parsing harder

USP: Key Idea # 1

Target predicates and objects can be learned
Viewed as clusters of syntactic or lexical variations of the same meaning

BUY(-,-): buys, acquires, ’s purchase of, …
Cluster of various expressions for acquisition

MICROSOFT: Microsoft, the Redmond software giant, …
Cluster of various mentions of Microsoft

USP: Key Idea # 2

Relational clustering: cluster relations with the same objects

USP: recursively cluster arbitrary expressions with similar subexpressions

Microsoft buys Powerset

Microsoft acquires semantic search engine Powerset

Powerset is acquired by Microsoft Corporation

The Redmond software giant buys Powerset

Microsoft’s purchase of Powerset, …

Cluster same forms at the atom level

Cluster forms in composition with same forms

USP: Key Idea # 3

Start directly from syntactic analyses
Focus on translating them to semantics
Leverage rapid progress in syntactic parsing
Much easier than learning both

Joint Inference in USP

Forms a canonical meaning representation by recursively clustering synonymous expressions
Text → logical form in this representation
Induces an ISA hierarchy among clusters and applies hierarchical smoothing (shrinkage)

USP: System Overview

Input: Dependency trees for sentences
Converts dependency trees into quasi-logical forms (QLFs)
Starts with QLF clusters at the atom level
Recursively builds up clusters of larger forms
Output:
Probability distribution over QLF clusters and their composition
MAP semantic parses of sentences

Generating Quasi-Logical Forms

[Figure: dependency tree for “Microsoft buys Powerset”: buys → Microsoft (nsubj), buys → Powerset (dobj)]

Convert each node into a unary atom

Generating Quasi-Logical Forms

Nodes converted: buys(n1), Microsoft(n2), Powerset(n3); edges nsubj (buys → Microsoft) and dobj (buys → Powerset) not yet converted

n1, n2, n3 are Skolem constants

Generating Quasi-Logical Forms

Convert each edge into a binary atom

buys(n1) ∧ Microsoft(n2) ∧ Powerset(n3) ∧ nsubj(n1,n2) ∧ dobj(n1,n3)
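The node and edge conversions can be sketched in a few lines of Python. The input format (a list of node words plus (head, label, dependent) index triples) is an assumption for illustration; the slides only specify dependency trees as input.

```python
def to_qlf(nodes: list[str], edges: list[tuple[int, str, int]]) -> list[str]:
    """Convert a dependency tree into a quasi-logical form (a conjunction of atoms).

    nodes: word of each tree node, indexed from 0.
    edges: (head index, dependency label, dependent index) triples.
    """
    # Each node i gets a Skolem constant n{i+1} and a unary atom word(n{i+1}).
    skolem = [f"n{i + 1}" for i in range(len(nodes))]
    unary = [f"{word}({skolem[i]})" for i, word in enumerate(nodes)]
    # Each edge becomes a binary atom label(head, dependent).
    binary = [f"{label}({skolem[h]},{skolem[d]})" for h, label, d in edges]
    return unary + binary


# "Microsoft buys Powerset": buys is the root, with nsubj and dobj dependents.
nodes = ["buys", "Microsoft", "Powerset"]
edges = [(0, "nsubj", 1), (0, "dobj", 2)]
print(" ∧ ".join(to_qlf(nodes, edges)))
```

This prints the same five atoms as the slide: one unary atom per node and one binary atom per dependency edge.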

A Semantic Parse

buys(n1) ∧ Microsoft(n2) ∧ Powerset(n3) ∧ nsubj(n1,n2) ∧ dobj(n1,n3)

Partition QLF into subformulas

A Semantic Parse

buys(n1), λx2.nsubj(n1,x2), λx3.dobj(n1,x3)
Microsoft(n2), Powerset(n3)

Lambda form of a subformula: replace each Skolem constant that is not in a unary atom with a unique lambda variable

A Semantic Parse

Core form: buys(n1)
Argument forms: λx2.nsubj(n1,x2), λx3.dobj(n1,x3)
Microsoft(n2), Powerset(n3)

Core form: No lambda variable
Argument form: One lambda variable
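The lambda-form rule and the core/argument distinction can be sketched in Python. This is one reading of the rule, assuming that the part containing buys(n1) also holds the nsubj and dobj atoms, so that n2 and n3 are the constants not covered by a unary atom in that part. The regex atom syntax and the n<k> to x<k> naming are hypothetical conventions for illustration.

```python
import re

# Hypothetical atom syntax: predicate(arg1,arg2,...), with Skolem constants named n1, n2, ...
ATOM = re.compile(r"(\w+)\(([^)]*)\)")


def lambda_forms(part: list[str]) -> tuple[list[str], list[str]]:
    """Split one part of a QLF partition into core forms and argument forms.

    Every Skolem constant that does not occur in a unary atom of the part is
    replaced by a unique lambda variable (x2 for n2, and so on).
    """
    parsed = [(m.group(1), m.group(2).split(",")) for m in map(ATOM.fullmatch, part)]
    in_unary = {args[0] for _, args in parsed if len(args) == 1}
    core, arguments = [], []
    for pred, args in parsed:
        lam = [a for a in args if a not in in_unary]
        body = f"{pred}({','.join('x' + a[1:] if a in lam else a for a in args)})"
        if not lam:
            core.append(body)  # core form: no lambda variable
        elif len(lam) == 1:
            arguments.append(f"λ{'x' + lam[0][1:]}.{body}")  # argument form: one lambda variable
        else:
            raise ValueError(f"{pred} has more than one lambda variable")
    return core, arguments


part = ["buys(n1)", "nsubj(n1,n2)", "dobj(n1,n3)"]
print(lambda_forms(part))
```

Under this reading the part yields the core form buys(n1) and the argument forms λx2.nsubj(n1,x2) and λx3.dobj(n1,x3), matching the slide.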

A Semantic Parse

Assign subformula to object cluster:
buys(n1), λx2.nsubj(n1,x2), λx3.dobj(n1,x3) → BUY
Microsoft(n2) → MICROSOFT
Powerset(n3) → POWERSET

Object Cluster: BUY

Distribution over core forms:
buys(n1) 0.1
acquires(n1) 0.2
……

One formula in MLN
Learn weights for each pair of cluster and core form

Object Cluster: BUY

buys(n1) 0.1
acquires(n1) 0.2
……

May contain variable number of property clusters:
BUYER, BOUGHT, PRICE, ……

Property Cluster: BUYER

Distributions over argument forms, clusters, and number:
Argument forms: λx2.nsubj(n1,x2) 0.5, λx2.agent(n1,x2) 0.4, ……
Argument clusters: MICROSOFT 0.2, GOOGLE 0.1, ……
Number of arguments: Zero 0.1, One 0.8, ……

Three MLN formulas

Probabilistic Model

Exponential prior on number of parameters
Cluster mixtures with hierarchical smoothing:

Object Cluster: BUY
buys 0.1, acquires 0.4, ……

Property Cluster: BUYER
Argument forms: nsubj 0.5, agent 0.4
Argument clusters: MICROSOFT 0.2, GOOGLE 0.1
Number: Zero 0.1, One 0.8

E.g., picking MICROSOFT as BUYER argument depends not only on BUY, but also on its ISA ancestors
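The slides do not spell out the smoothing formula. The Python sketch below assumes a common form of shrinkage, linear interpolation of each cluster's distribution with its smoothed ISA parent, to show how an ancestor can support an argument the cluster itself has rarely seen. The parent cluster name `BUY_PARENT`, its distribution, and the interpolation weight `lam` are all hypothetical; the listed probabilities are truncated, as on the slide.

```python
from typing import Optional


def smoothed_prob(cluster: str, value: str,
                  dists: dict[str, dict[str, float]],
                  parent: dict[str, Optional[str]],
                  lam: float = 0.7) -> float:
    """Assumed shrinkage by linear interpolation along the ISA chain:
    p~(v | c) = lam * p(v | c) + (1 - lam) * p~(v | parent(c)); the root uses its own estimate.
    """
    own = dists[cluster].get(value, 0.0)
    up = parent.get(cluster)
    if up is None:
        return own
    return lam * own + (1 - lam) * smoothed_prob(up, value, dists, parent, lam)


# BUYER argument-cluster probabilities for BUY (from the slide) and for a hypothetical ancestor.
dists = {
    "BUY": {"MICROSOFT": 0.2, "GOOGLE": 0.1},
    "BUY_PARENT": {"MICROSOFT": 0.1, "GOOGLE": 0.1, "IBM": 0.3},
}
parent = {"BUY": "BUY_PARENT", "BUY_PARENT": None}

print(smoothed_prob("BUY", "MICROSOFT", dists, parent))
print(smoothed_prob("BUY", "IBM", dists, parent))
```

Here MICROSOFT gets 0.7 · 0.2 + 0.3 · 0.1 = 0.17, and IBM, never seen as a BUYER of BUY, still receives 0.09 through the ancestor.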

Abstract Lambda Form

buys(n1)   λx2.nsubj(n1,x2)   λx3.dobj(n1,x3)
→ BUY(n1)   λx2.BUYER(n1,x2)   λx3.BOUGHT(n1,x3)

Final logical form is obtained via lambda reduction
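Lambda reduction can be sketched in Python by applying each abstract argument form to the Skolem constant of the part it points to. This is a minimal sketch under stated assumptions: it uses the cluster name BUY from the earlier slides, assumes the argument parts were assigned to MICROSOFT and POWERSET, and assumes that the final logical form is the conjunction of the reduced atoms.

```python
def beta_reduce(arg_form: str, constant: str) -> str:
    """Apply an argument form such as 'λx2.BUYER(n1,x2)' to a Skolem constant."""
    binder, body = arg_form[1:].split(".", 1)  # drop 'λ', split 'x2' from 'BUYER(n1,x2)'
    args_start = body.index("(")
    args = body[args_start + 1:-1].split(",")
    return body[:args_start] + "(" + ",".join(constant if a == binder else a for a in args) + ")"


# Abstract lambda form of the BUY part and the clusters assigned to its argument parts.
core = "BUY(n1)"
applications = [("λx2.BUYER(n1,x2)", "n2"), ("λx3.BOUGHT(n1,x3)", "n3")]
argument_parts = ["MICROSOFT(n2)", "POWERSET(n3)"]

logical_form = [core] + [beta_reduce(f, c) for f, c in applications] + argument_parts
print(" ∧ ".join(logical_form))
```

The result, BUY(n1) ∧ BUYER(n1,n2) ∧ BOUGHT(n1,n3) ∧ MICROSOFT(n2) ∧ POWERSET(n3), is the same logical form for every paraphrase whose parts land in these clusters.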

Challenge: State Space Too Large

Number of potential clusters is exponential in the number of tokens
Also, meaning units and clusters are often small
Use combinatorial search

Inference: Find MAP Parse

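Initialize
Search operator: lambda reduction
[Figure: dependency tree of “IL-4 protein induces CD11B” (induces → protein via nsubj, induces → CD11B via dobj, protein → IL-4 via nn); lambda reduction combines IL-4 and protein, linked by nn, into a single part]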

Learning: Greedily Maximize Posterior

Initialize: enhances 1.0 | induces 1.0 | amino 1.0 | acid 1.0

Search operators:
MERGE: enhances 1.0 + induces 1.0 → induces 0.2, enhances 0.8
COMPOSE: amino 1.0 + acid 1.0 → amino acid 1.0
……
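The effect of the two operators on a cluster's distribution over core forms can be sketched in Python with count tables. The counts below are hypothetical, chosen to reproduce the slide's 0.2/0.8 split, and the greedy search that decides which operator to apply by its gain in posterior probability is not shown.

```python
from collections import Counter


def to_distribution(counts: Counter) -> dict[str, float]:
    """Normalize core-form counts into the cluster's distribution over core forms."""
    total = sum(counts.values())
    return {form: n / total for form, n in counts.items()}


def merge(a: Counter, b: Counter) -> Counter:
    """MERGE: pool the core-form counts of two clusters into a single cluster."""
    return a + b


def compose(pairs: list[tuple[str, str]]) -> Counter:
    """COMPOSE: count each observed (dependent, head) pair, e.g. ('amino', 'acid'),
    as one larger core form, written in surface order as for noun compounds."""
    return Counter(f"{dep} {head}" for dep, head in pairs)


# Hypothetical counts: 'induces' seen once and 'enhances' four times in the corpus.
induces = Counter({"induces": 1})
enhances = Counter({"enhances": 4})
merged = to_distribution(merge(induces, enhances))
print(merged)

# Hypothetical parses in which 'amino' modifies 'acid' three times.
composed = to_distribution(compose([("amino", "acid")] * 3))
print(composed)
```

MERGE turns two single-form clusters into one cluster with a mixed distribution, while COMPOSE creates a new single-form cluster for the larger expression.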

Operator: Abstract

INDUCE: induces 0.6, up-regulates 0.2, …
INHIBIT: inhibits 0.4, suppresses 0.2, …
MERGE with REGULATE?
ABSTRACT: add a common parent cluster, linked to INDUCE and INHIBIT by ISA: induces 0.3, enhances 0.1, inhibits 0.2, suppresses 0.1, …
Captures substantial similarities

Experiments

Apply to machine reading:
Extract knowledge from text and answer questions
Evaluation: Number of answers and accuracy
GENIA dataset: 1999 PubMed abstracts
Use simple factoid questions, e.g.:
What does anti-STAT1 inhibit?
What regulates MIP-1 alpha?

Total and Correct Answers

[Chart: total and correct answers (axis 0 to 500) for KW-SYN, TextRunner, RESOLVER, DIRT, and USP]

USP extracted five times as many correct answers as TextRunner
Highest precision of 91%

Qualitative Analysis

Resolve many nontrivial variations
Argument forms that mean the same, e.g., “expression of X” and “X expression”
“X stimulates Y” and “Y is stimulated with X”
Active vs. passive voices
Synonymous expressions
Etc.

Clusters And Compositions

Clusters in core forms:
investigate, examine, evaluate, analyze, study, assay
diminish, reduce, decrease, attenuate
synthesis, production, secretion, release
dramatically, substantially, significantly
……

Compositions:
amino acid, t cell, immune response, transcription factor, initiation site, binding site …

Web-Scale Joint Inference

Challenge: Efficiently identify what is relevant
Key: Induce and leverage an ontology
Ontology: captures essential properties and abstracts away unimportant variations
Upper-level nodes → skip irrelevant branches
Wanted: Combine the following
Probabilistic ontology induction (e.g., USP)
Coarse-to-fine learning and inference [Felzenszwalb & McAllester, 2007; Petrov, Ph.D. Thesis]

Knowledge Reasoning

Most facts/rules are not explicitly stated
“Dark matter” in the natural language universe
E.g.: kale contains calcium ∧ calcium prevents osteoporosis ⇒ kale prevents osteoporosis

Keys:
Induce generic reasoning patterns
Incorporate reasoning in extraction
Additional sources of indirect supervision
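The kale example can be reproduced with simple forward chaining. In the Python sketch below the rule contains(X, Y) ∧ prevents(Y, Z) ⇒ prevents(X, Z) is hand-written and hard, which is an assumption for illustration; the research direction above is to induce such patterns, which in Markov logic would be weighted formulas rather than hard rules.

```python
def forward_chain(facts: set[tuple[str, str, str]]) -> set[tuple[str, str, str]]:
    """Apply the hypothetical rule contains(X, Y) ∧ prevents(Y, Z) ⇒ prevents(X, Z)
    until no new facts appear."""
    derived = set(facts)
    while True:
        new = {("prevents", x, z)
               for (r1, x, y1) in derived if r1 == "contains"
               for (r2, y2, z) in derived if r2 == "prevents" and y2 == y1}
        if new <= derived:
            return derived
        derived |= new


facts = {("contains", "kale", "calcium"), ("prevents", "calcium", "osteoporosis")}
inferred = forward_chain(facts) - facts
print(inferred)
```

The one new fact, prevents(kale, osteoporosis), is never stated in the input, which is the kind of "dark matter" the slide refers to.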

Harness Social Computing
Bootstrap online community
Incorporate human & end tasks in the loop, e.g.:
“Tell me everything about dicer applied to synapse …”
“Your extraction from my paper is correct except for blah …”
Form positive feedback loop
[Figure: online community interacting with a Knowledge Base]

Acknowledgments

Pedro Domingos, Colin Cherry, Kristina Toutanova, Lucy Vanderwende, Oren Etzioni, Dan Weld, Matt Richardson, Parag Singla, Stanley Kok, Daniel Lowd, Marc Sumner

ARO, AFRL, ONR, DARPA, NSF

Summary

Statistical relational learning offers promising solutions for machine reading

Markov logic provides a language for this
Syntax: Weighted first-order logical formulas
Semantics: Feature templates of Markov nets

Open-source software: Alchemy

A success story: USP

Three key research directions

alchemy.cs.washington.edu

alchemy.cs.washington.edu/papers/poon09
