What’s Wrong With current Semantic Web Reasoning (and how to fix it)


This talk (and this workshop)

• Current state of Web Reasoning?

• What's wrong with it?

• What are we going to do about it?

• LarKC: one large new effort to do something about it


What’s wrong with current SemWeb reasoning methods?


Characteristics of current Semantic Web reasoning

• centralised, algorithmic, boolean, deterministic

• examples of current attempts at scalability:
  – identify subsets of OWL
    • OWL DL
    • OWL DLP
    • OWL Horst
  – identify alternative semantics for OWL
    • e.g. LP-style semantics
  – scalability by muscle power


Scalability by muscle power

| Task/Data | System | Statements (millions) | Time (sec) | Speed (statements/sec) | Inference |
|---|---|---|---|---|---|
| LUBM(500) [15] | Sesame’s Native Store (v. .0alpha3) | 70 | 10 800 | 6 481 | RDFS +/- |
| LUBM(600) | SwiftOWLIM v0.92b | 83 | 3 941 | 20 979 | OWL-Horst +/- |
| LUBM(8000) | BigOWLIM v0.92b | 1 060 | 251 413 | 4 216 | OWL-Horst +/-, complete |
| Subset of UniProt | ORACLE 10gR2 | 100 | 277 200 | 361 | RDFS + |
| FOAF, DC, PRISM | Jena v2.3 | 200 | no info. | no info. | RDFS +/- |
| Reif. LUBM + Movie & Actor DB | AllegroGraph RDFS++ Reasoner 1.2 | 1 000 | 50 580 | 19 771 | RDFS +/- |


Moving in the right direction: new BigOWLIM (Ontotext, Sirma)

• 4 switchable inference modes (owl-max, owl-horst-, rdfs-s, optimised rdf-s, none)

• custom rules for definable semantics

• < 100 ms query performance on a billion triples (but 34 hours to upload)

• http://www.ontotext.com/owlim/OWLIMPres.pdf
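Rule-based stores of this kind compute inferences by forward chaining: rules are applied to the stored triples until nothing new can be derived, and the chosen rule set fixes the semantics. The Python sketch below illustrates that idea only; it is not OWLIM's implementation. The mode names `rdfs-core` and `none`, the two rules, and the bare `type`/`subClassOf` predicate names (standing in for `rdf:type` and `rdfs:subClassOf`) are illustrative assumptions.

```python
from typing import Callable, Dict, List, Set, Tuple

Triple = Tuple[str, str, str]          # (subject, predicate, object)
Rule = Callable[[Set[Triple]], Set[Triple]]


def subclass_transitivity(kb: Set[Triple]) -> Set[Triple]:
    """(a subClassOf b), (b subClassOf c)  =>  (a subClassOf c)"""
    sub = [(s, o) for s, p, o in kb if p == "subClassOf"]
    return {(a, "subClassOf", c) for a, b in sub for b2, c in sub if b == b2}


def type_inheritance(kb: Set[Triple]) -> Set[Triple]:
    """(x type a), (a subClassOf b)  =>  (x type b)"""
    types = [(s, o) for s, p, o in kb if p == "type"]
    sub = [(s, o) for s, p, o in kb if p == "subClassOf"]
    return {(x, "type", b) for x, a in types for a2, b in sub if a == a2}


# Hypothetical inference modes: a mode is just a named list of rules,
# so a "custom rule" set is one more entry in this dictionary.
MODES: Dict[str, List[Rule]] = {
    "rdfs-core": [subclass_transitivity, type_inheritance],
    "none": [],
}


def materialize(kb: Set[Triple], mode: str) -> Set[Triple]:
    """Naive forward chaining: apply the mode's rules until nothing new is derived."""
    closure = set(kb)
    while True:
        derived: Set[Triple] = set()
        for rule in MODES[mode]:
            derived |= rule(closure)
        derived -= closure
        if not derived:
            return closure
        closure |= derived


if __name__ == "__main__":
    kb = {
        ("postdoc", "subClassOf", "researcher"),
        ("researcher", "subClassOf", "person"),
        ("alice", "type", "postdoc"),
    }
    for mode in MODES:
        print(mode, sorted(materialize(kb, mode) - kb))
```

Adding a custom rule means writing another function with the same signature and listing it under a new mode. The naive loop re-scans every triple on each round, which is exactly the kind of cost that production stores work hard to avoid.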


Why we need “something different”

Gartner (May 2007, G00148725): “By 2012, 70% of public Web pages will have some level of semantic markup, 20% will use more extensive Semantic Web-based ontologies”

• Semantic Technologies at Web scale?
  – 20% of 30 billion pages @ 1000 triples per page = 6 trillion triples
  – 30 billion and 1000 are underestimates; imagine 6 years from now…
  – data integration and semantic search at Web scale?

• Inference will have to become distributed, heuristic, approximate, probabilistic; not centralised, algorithmic, boolean, deterministic
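The 6-trillion figure, and what it would mean at the loading speeds in the benchmark table, can be checked with a few lines of Python. This is a back-of-the-envelope sketch that assumes a constant single-machine loading rate; real loading rates generally change as a store grows, so the results only indicate orders of magnitude.

```python
# Back-of-the-envelope arithmetic using the figures quoted on these slides.
PAGES = 30_000_000_000          # public Web pages
SEMANTIC_PERCENT = 20           # pages using "more extensive" ontologies
TRIPLES_PER_PAGE = 1000

total_triples = PAGES * SEMANTIC_PERCENT // 100 * TRIPLES_PER_PAGE   # 6 trillion

SECONDS_PER_YEAR = 365.25 * 24 * 3600


def load_years(statements_per_sec: float) -> float:
    """Years needed to load `total_triples` at a constant loading rate (assumption)."""
    return total_triples / statements_per_sec / SECONDS_PER_YEAR


# Single-machine loading rates taken from the benchmark table above.
RATES = {
    "BigOWLIM v0.92b": 4_216,
    "AllegroGraph RDFS++ Reasoner 1.2": 19_771,
}

if __name__ == "__main__":
    print(f"{total_triples:,} triples")
    for system, rate in RATES.items():
        print(f"{system}: about {load_years(rate):.0f} years")
```

At the BigOWLIM rate this comes to roughly 45 years, and at the AllegroGraph rate roughly 10 years, for loading alone.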


Why we need “something different”

• Problem: pharmaceutical R&D in early clinical development is stagnating

FDA white paper Innovation or Stagnation (March 2004):

“developers have no choice but to use the tools of the last century to assess this century's candidate solutions.”

“industry scientists often lack cross-cutting information about an entire product area, or information about techniques that may be used in areas other than theirs”


“Show me any potential liver toxicity associated with the compound’s drug class, target, structure and disease.”

Q1 (Genetics): “Show me all liver toxicity associated with the target or the pathway.”

Q2 (Chemistry): “Show me all liver toxicity associated with compounds with similar structure.”

Q3 (Literature): “Show me all liver toxicity from the public literature and internal reports that are related to the drug class, disease and patient population.”

Current NCBI: linking but no inference


Why we need“something different”

• Our cities face many challenges
• Urban Computing is the ICT way to address them

Is public transportation where the people are?

Which landmarks attract more people?

Where are people concentrating?

Where is traffic moving?


What’s wrong with current Semantic Web Reasoning

• Properties of current inference techniques:

Based on logic as guiding paradigm:

• Exact

• Abrupt

• Expensive


Current inference is exact

• “yes” or “no”

• not: “almost”, “not by a long shot”, “yes, except for a few”, etc. (think of subClassOf)

• This was OK, as long as ontologies were clean:
  – hand-crafted
  – well-designed
  – carefully populated
  – well maintained
  – etc.


Current inference is exact

But current ontologies are sloppy

(and will be increasingly so)

• made by non-experts

• made by machines:
  – scraping from
    • file hierarchies
    • mail folders
    • to-do lists & phone books on PDAs
  – machine learning from examples


Sloppy ontologies need sloppy inference

“almost subClassOf”
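One simple way to make subClassOf graded, assumed here for illustration and not necessarily the measure the talk has in mind, is extensional: the fraction of the known instances of one class that are also instances of the other. A boolean reasoner answers “no” as soon as a single instance breaks the pattern; the graded version reports how close the pair comes. The class names and instances are hypothetical.

```python
from typing import Dict, Set


def subclass_degree(extension: Dict[str, Set[str]], sub: str, sup: str) -> float:
    """Fraction of the known instances of `sub` that are also instances of `sup`.

    1.0 is ordinary (extensional) subClassOf; values close to 1.0 read as
    "almost subClassOf". An empty `sub` is vacuously a subclass (1.0).
    """
    members = extension.get(sub, set())
    if not members:
        return 1.0
    return len(members & extension.get(sup, set())) / len(members)


# Hypothetical, deliberately sloppy instance data.
extension = {
    "Bird": {"robin", "sparrow", "eagle", "penguin"},
    "Flyer": {"robin", "sparrow", "eagle", "bat", "bee"},
}

if __name__ == "__main__":
    print(subclass_degree(extension, "Bird", "Flyer"))   # 0.75: "almost" subClassOf
    print(subclass_degree(extension, "Flyer", "Bird"))   # 0.6
```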


Combined ontologies need sloppy inference

Mapping ontologies is almost always messy (e.g. post-doc vs. young-researcher)

“almost equal”
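For “almost equal” across two ontologies, a symmetric overlap measure such as the Jaccard coefficient (shared instances divided by all instances of either concept) is one common choice. The sketch below uses it to propose mappings above a threshold; the measure, the threshold, and the instance data are assumptions chosen for illustration, not a particular ontology-matching tool.

```python
from typing import Dict, List, Set, Tuple


def equivalence_degree(ext_a: Set[str], ext_b: Set[str]) -> float:
    """Jaccard overlap |A ∩ B| / |A ∪ B|: 1.0 = same instances, 0.0 = disjoint."""
    union = ext_a | ext_b
    if not union:
        return 1.0
    return len(ext_a & ext_b) / len(union)


def candidate_mappings(
    onto1: Dict[str, Set[str]], onto2: Dict[str, Set[str]], threshold: float
) -> List[Tuple[str, str, float]]:
    """All concept pairs across two ontologies that are at least `threshold`-equal."""
    pairs = []
    for c1, ext1 in onto1.items():
        for c2, ext2 in onto2.items():
            degree = equivalence_degree(ext1, ext2)
            if degree >= threshold:
                pairs.append((c1, c2, degree))
    return pairs


# Hypothetical instance data from two independently built ontologies.
university_a = {"PostDoc": {"ann", "bob", "carl", "dana"}}
university_b = {
    "YoungResearcher": {"ann", "bob", "carl", "dana", "eve"},
    "Professor": {"frank", "gina"},
}

if __name__ == "__main__":
    print(candidate_mappings(university_a, university_b, threshold=0.7))
```

A strict equivalence check would reject PostDoc = YoungResearcher because of a single extra instance; the graded score (0.8 here) keeps it as a strong candidate.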


Properties of current inference techniques

Based on logic as guiding paradigm:
• Exact → approximate
• Abrupt
• Expensive


Current inference is abrupt

• nothing……………….. yes!

we want gradual answers:

• anytime computation
  – agent (human or machine) can decide how good is good enough

• deadline computation
  – pay for quality
  – load balancing
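Anytime and deadline computation can be pictured as a generator that keeps producing better answers and a caller that decides when to stop. The Python sketch below computes the superclasses of a class one hierarchy level at a time; each intermediate answer is sound but possibly incomplete. The deadline is counted in steps rather than seconds so that the behaviour is deterministic (in practice one would compare against `time.monotonic()`), and the hierarchy is hypothetical.

```python
from typing import Dict, Iterator, Set


def anytime_superclasses(parents: Dict[str, Set[str]], cls: str) -> Iterator[Set[str]]:
    """Yield ever larger sets of superclasses of `cls`, one hierarchy level per step.

    Every yielded answer is correct but possibly incomplete, and each one
    extends the previous one, so the caller may stop at any point.
    """
    found: Set[str] = set()
    frontier = {cls}
    while frontier:
        next_level: Set[str] = set()
        for c in frontier:
            next_level |= parents.get(c, set())
        next_level -= found | {cls}       # skip already-known classes (guards against cycles)
        if not next_level:
            return
        found |= next_level
        frontier = next_level
        yield set(found)


def answer_by_deadline(parents: Dict[str, Set[str]], cls: str, max_steps: int) -> Set[str]:
    """Deadline computation: the best answer available after at most `max_steps` steps."""
    if max_steps <= 0:
        return set()
    best: Set[str] = set()
    for step, answer in enumerate(anytime_superclasses(parents, cls), start=1):
        best = answer
        if step >= max_steps:
            break
    return best


# Hypothetical class hierarchy.
parents = {"PostDoc": {"Researcher"}, "Researcher": {"Employee"}, "Employee": {"Person"}}

if __name__ == "__main__":
    for steps in (1, 2, 10):
        print(steps, sorted(answer_by_deadline(parents, "PostDoc", steps)))
```

Paying for more steps buys a more complete answer, which is the “pay for quality” trade-off above.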


Current inference is expensive

• approximate answers are cheap

• gradual answers are arbitrarily cheap (WYPIWYG: what you pay is what you get)


Properties of current inference techniques

Based on logic as guiding paradigm:
• Exact → approximate
• Abrupt → gradual
• Expensive → cheap


What’s wrong with current Semantic Web Reasoning

• obsession with worst-case asymptotic complexity


Who cares about decidability?

Decidability ≈ completeness: the guarantee to find an answer, or to tell you it doesn’t exist, given enough run-time & memory

Sources of incompleteness:
• incompleteness of the input data
• insufficient run-time to wait for the answer

Completeness is unachievable in practice anyway, regardless of the completeness of the algorithm


Who cares about undecidability?

Undecidability ≠ always guaranteed not to find an answer

Undecidability = not always guaranteed to find an answer

Undecidability may be harmless in many cases; in all cases that matter
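The difference between “not always guaranteed to find an answer” and “guaranteed not to find one” shows up in any semi-decision procedure. Reachability in a string-rewriting (semi-Thue) system is undecidable in general, yet a breadth-first search still finds every derivation that exists, given enough time; it only fails to terminate on some “no” instances. The sketch below adds a budget and returns `None` (“don't know”) when it runs out. It is a generic illustration rather than a Semantic Web reasoner, and the rules are made up.

```python
from collections import deque
from typing import List, Optional, Tuple


def derivable(start: str, goal: str, rules: List[Tuple[str, str]], budget: int) -> Optional[bool]:
    """Breadth-first search for a rewrite sequence turning `start` into `goal`.

    Returns True if a derivation is found, False if the search space is finite
    and exhausted without one, and None ("don't know") when `budget`
    expansions have been used up. Rules are (lhs, rhs) pairs with non-empty lhs.
    """
    seen = {start}
    queue = deque([start])
    expanded = 0
    while queue:
        if expanded >= budget:
            return None
        word = queue.popleft()
        expanded += 1
        if word == goal:
            return True
        for lhs, rhs in rules:
            i = word.find(lhs)
            while i != -1:                      # one successor per occurrence of lhs
                successor = word[:i] + rhs + word[i + len(lhs):]
                if successor not in seen:
                    seen.add(successor)
                    queue.append(successor)
                i = word.find(lhs, i + 1)
    return False


if __name__ == "__main__":
    grow = [("a", "ab")]                        # infinite search space
    swap = [("ab", "ba")]                       # finite search space
    print(derivable("a", "abb", grow, budget=100))   # True: found quickly
    print(derivable("a", "b", grow, budget=100))     # None: budget runs out
    print(derivable("ab", "aa", swap, budget=100))   # False: provably not derivable
```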


Who cares about complexity?

• worst-case: may be exponentially rare

• asymptotic: ignores constants
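Asymptotic notation hides constants, so an asymptotically “worse” algorithm can be faster on every input size that matters in practice. The cost functions below are hypothetical step counts: n² with constant 1 against 1000·n. The quadratic one is cheaper for every n below 1000 and only overtakes the linear one at n = 1001.

```python
from typing import Callable, Optional


def first_n_where_slower(cost_a: Callable[[int], int], cost_b: Callable[[int], int],
                         n_max: int = 10**6) -> Optional[int]:
    """Smallest input size n at which algorithm A costs more than algorithm B."""
    for n in range(1, n_max + 1):
        if cost_a(n) > cost_b(n):
            return n
    return None


def quadratic_small_constant(n: int) -> int:   # O(n^2), constant 1
    return n * n


def linear_large_constant(n: int) -> int:      # O(n), constant 1000
    return 1000 * n


if __name__ == "__main__":
    print(first_n_where_slower(quadratic_small_constant, linear_large_constant))  # 1001
```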


What to do instead?

• No good framework for “average case” complexity

• 2nd best: do more experimental performance profiles with realistic data
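An experimental performance profile simply measures run time across a range of input sizes. A minimal Python harness is sketched below. The `chain_closure` workload is a synthetic stand-in (a subClassOf chain), which is precisely the kind of unrealistic data the slide warns against; in practice `task` would load and reason over real datasets of increasing size.

```python
import statistics
import time
from typing import Callable, List, Tuple


def profile(task: Callable[[int], object], sizes: List[int],
            repeats: int = 3) -> List[Tuple[int, float]]:
    """Median wall-clock seconds of task(n) for each input size n."""
    results = []
    for n in sizes:
        timings = []
        for _ in range(repeats):
            start = time.perf_counter()
            task(n)
            timings.append(time.perf_counter() - start)
        results.append((n, statistics.median(timings)))
    return results


def chain_closure(n: int) -> int:
    """Synthetic workload: count all direct and inferred subClassOf links
    in a chain C0 < C1 < ... < Cn, i.e. n(n+1)/2 links."""
    parent = {i: i + 1 for i in range(n)}
    links = 0
    for start in range(n):
        node = start
        while node in parent:
            node = parent[node]
            links += 1
    return links


if __name__ == "__main__":
    for size, seconds in profile(chain_closure, [100, 200, 400, 800]):
        print(f"n={size:4d}  {seconds * 1000:8.2f} ms")
```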


What’s wrong with current Semantic Web Reasoning

• obsession with worst-case asymptotic complexity
  – not even a good framework for "average" complexity

• obsession with recall & precision


Need for approximation

• Trade-off recall for precision or vice versa
  – security: prefer recall
  – medicine: prefer precision

• Trade-off both for speed

Logician’s nightmare: drop soundness & completeness!
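The recall/precision trade-off can be made concrete by thresholding scored answers: a strict threshold returns fewer, safer answers (higher precision), while a lenient one returns more, including some wrong ones (higher recall). The scores and the set of correct answers below are hypothetical, and treating an empty answer set as precision 1.0 is a convention chosen here.

```python
from typing import Dict, Set, Tuple


def precision_recall(returned: Set[str], relevant: Set[str]) -> Tuple[float, float]:
    """Precision (~soundness) and recall (~completeness) of a returned answer set."""
    hits = len(returned & relevant)
    precision = hits / len(returned) if returned else 1.0  # nothing returned: nothing wrong
    recall = hits / len(relevant) if relevant else 1.0
    return precision, recall


def answers_at(scores: Dict[str, float], threshold: float) -> Set[str]:
    """Return every candidate answer whose confidence reaches `threshold`."""
    return {answer for answer, score in scores.items() if score >= threshold}


# Hypothetical scored candidate answers and the (normally unknown) correct answers.
scores = {"a": 0.9, "b": 0.8, "c": 0.6, "d": 0.4, "e": 0.2}
relevant = {"a", "b", "d"}

if __name__ == "__main__":
    # A strict threshold favours precision (the medicine setting) ...
    print(precision_recall(answers_at(scores, 0.7), relevant))
    # ... a lenient one favours recall (the security setting).
    print(precision_recall(answers_at(scores, 0.3), relevant))
```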


[Figure: A logician’s nightmare (Dieter Fensel). Precision (soundness) plotted against recall (completeness), with regions labelled logic, IR and Semantic Web.]


What’s wrong with current Semantic Web Reasoning

• obsession with worst-case asymptotic complexity– no good framework for "average" complexity

• obsession with recall & precision– no good framework for “good enough”

• separation of reasoning and search


Integrating Search with Reasoning

[Figure: Search → Axioms (a hasType b; b subClassOf c) → Reasoning → Conclusion]
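One way to read the diagram, assumed here rather than taken from LarKC, is a loop in which reasoning runs over whatever search has fetched so far, and search is widened only when reasoning cannot yet answer the question. In the Python sketch below, search is naive term matching over an in-memory list, reasoning is the single RDFS-style rule (x hasType b), (b subClassOf c) ⇒ (x hasType c) applied to a fixpoint, and all names and data are hypothetical.

```python
from typing import List, Set, Tuple

Triple = Tuple[str, str, str]


def search(store: List[Triple], terms: Set[str]) -> Set[Triple]:
    """Search: fetch the triples whose subject or object is one of `terms`."""
    return {t for t in store if t[0] in terms or t[2] in terms}


def reason(axioms: Set[Triple]) -> Set[Triple]:
    """Reasoning: apply (x hasType b), (b subClassOf c) => (x hasType c) to a fixpoint."""
    facts = set(axioms)
    while True:
        new = {
            (x, "hasType", c)
            for x, p, b in facts if p == "hasType"
            for b2, q, c in facts if q == "subClassOf" and b2 == b
        } - facts
        if not new:
            return facts
        facts |= new


def ask(store: List[Triple], entity: str, cls: str, max_rounds: int = 3) -> Tuple[bool, int]:
    """Alternate search and reasoning until `entity hasType cls` is derived.

    Returns (True, round) on success, or (False, max_rounds) if the budget runs
    out; False means "not derived within budget", not "provably false".
    """
    terms = {entity}
    axioms: Set[Triple] = set()
    for round_no in range(1, max_rounds + 1):
        axioms |= search(store, terms)                       # widen the context
        if (entity, "hasType", cls) in reason(axioms):      # try to answer from it
            return True, round_no
        terms |= {s for s, _, _ in axioms} | {o for _, _, o in axioms}
    return False, max_rounds


# Hypothetical store: two unrelated chains.
STORE = [
    ("a", "hasType", "b"),
    ("b", "subClassOf", "c"),
    ("c", "subClassOf", "d"),
    ("x", "hasType", "y"),
    ("y", "subClassOf", "z"),
]

if __name__ == "__main__":
    print(ask(STORE, "a", "c"))
    print(ask(STORE, "a", "d"))
```

`ask(STORE, "a", "c")` succeeds in the second round, once search has fetched `b subClassOf c`; the `x`/`y`/`z` triples are never fetched.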


Summary of analysis

• Based on logic, which is strict, abrupt, expensive

• Obsession with complexity

• Obsession with traditional soundness/completeness & recall/precision

• No recognition that different use-cases need different performance trade-offs


Goals of LarKC

1. Scaling to infinity
  – by giving up soundness & completeness
  – by switching between reasoning and search

2. Reasoning pipeline
  – by plugin architecture

3. Large computing platform
  – by cluster computing
  – by wide-area distribution


Scaling to infinity

Possible approaches:

• Markov Logic (probability in the logic, judging truth of formula):
  – adds a learnable weight to each FOL formula, specifying a probability distribution over Herbrand interpretations (possible worlds)

• weighted RDF graphs (probability as a heuristic, judging relevance of formula):
  – weighted activation spreading (for selection),
  – followed by classical inference over the selected subgraph

• model sampling (probability in the logic):
  – sampling the space of all truth assignments, driven by the probability of the model

• and others
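In Markov logic a world x has probability P(x) = exp(Σᵢ wᵢ nᵢ(x)) / Z, where wᵢ is the weight of formula i, nᵢ(x) the number of its groundings that are true in x, and Z the sum of the same quantity over all worlds. The Python sketch below enumerates the worlds of a hypothetical one-constant network with a single weighted formula (so each nᵢ is 0 or 1), then estimates a marginal by sampling worlds in proportion to their probability, a toy version of the model-sampling approach. Exhaustive enumeration only works for tiny examples; practical systems sample with MCMC instead.

```python
import itertools
import math
import random
from typing import Callable, Dict, List, Tuple

World = Dict[str, bool]

# Hypothetical ground Markov logic network: one constant A, two ground atoms,
# and one weighted formula Smokes(A) => Cancer(A) with weight 1.5.
ATOMS = ["Smokes(A)", "Cancer(A)"]
FORMULAS: List[Tuple[float, Callable[[World], bool]]] = [
    (1.5, lambda w: (not w["Smokes(A)"]) or w["Cancer(A)"]),
]


def all_worlds() -> List[World]:
    """Every truth assignment to the ground atoms (the possible worlds)."""
    return [dict(zip(ATOMS, values))
            for values in itertools.product([False, True], repeat=len(ATOMS))]


def weight_of(world: World) -> float:
    """exp(sum of the weights of the ground formulas that are true in `world`)."""
    return math.exp(sum(w for w, formula in FORMULAS if formula(world)))


def distribution() -> List[Tuple[World, float]]:
    """P(world) = weight_of(world) / Z, with Z summing weight_of over all worlds."""
    worlds = all_worlds()
    z = sum(weight_of(w) for w in worlds)
    return [(w, weight_of(w) / z) for w in worlds]


def sampled_marginal(atom: str, n: int, seed: int = 0) -> float:
    """Model sampling: draw n worlds by probability and report how often `atom` is true."""
    rng = random.Random(seed)
    worlds, probs = zip(*distribution())
    draws = rng.choices(worlds, weights=probs, k=n)
    return sum(world[atom] for world in draws) / n


if __name__ == "__main__":
    for world, p in distribution():
        print(world, round(p, 3))
    print("P(Cancer(A)) estimated by sampling:", sampled_marginal("Cancer(A)", 20_000))
```

The only world that violates the formula still gets non-zero probability, just less than the others: the formula is soft, not a hard constraint.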


What is the Large Knowledge Collider?

Plug-in architecture:

1. Retrieve
2. Abstract
3. Select
4. Reason
5. Decide
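The five stages can be sketched as interchangeable plug-ins that all take and return a shared state. The Python below illustrates the plug-in idea only and is not the LarKC API: the stage bodies, the state keys, and the reading of “Abstract” as turning raw text into triples are assumptions. The Select plug-in uses weighted activation spreading, one of the approaches listed earlier, and Reason then works only on the selected subgraph.

```python
from typing import Callable, Dict, List

State = Dict[str, object]
Plugin = Callable[[State], State]          # every stage has the same shape


def retrieve(state: State) -> State:
    """Retrieve: fetch raw statements from the source (here an in-memory list)."""
    state["raw"] = list(state["source"])
    return state


def abstract(state: State) -> State:
    """Abstract: turn 'subject predicate object' strings into triples (3 tokens each)."""
    state["triples"] = {tuple(line.split()) for line in state["raw"]}
    return state


def select(state: State) -> State:
    """Select: weighted activation spreading from the query entity.

    Activation starts at 1.0 on the query, is multiplied by `decay` per hop and
    stops spreading below `threshold`; only triples whose subject and object
    are both activated are kept.
    """
    decay, threshold = state.get("decay", 0.5), state.get("threshold", 0.2)
    triples = state["triples"]
    activation = {state["query"]: 1.0}
    frontier = [state["query"]]
    while frontier:
        node = frontier.pop()
        for s, _, o in triples:
            if node not in (s, o):
                continue
            neighbour = o if s == node else s
            level = activation[node] * decay
            if level >= threshold and level > activation.get(neighbour, 0.0):
                activation[neighbour] = level
                frontier.append(neighbour)
    state["selected"] = {t for t in triples if t[0] in activation and t[2] in activation}
    return state


def reason(state: State) -> State:
    """Reason: (x hasType b), (b subClassOf c) => (x hasType c), on the selection only."""
    sel = state["selected"]
    state["conclusions"] = {
        (x, "hasType", c)
        for x, p, b in sel if p == "hasType"
        for b2, q, c in sel if q == "subClassOf" and b2 == b
    }
    return state


def decide(state: State) -> State:
    """Decide: stop if there is any conclusion about the query entity."""
    state["done"] = any(t[0] == state["query"] for t in state["conclusions"])
    return state


def run_pipeline(plugins: List[Plugin], state: State) -> State:
    for plugin in plugins:
        state = plugin(state)
    return state


# Hypothetical source data.
SOURCE = ["a hasType b", "b subClassOf c", "d hasType e", "e subClassOf f"]

if __name__ == "__main__":
    final = run_pipeline([retrieve, abstract, select, reason, decide],
                         {"source": SOURCE, "query": "a"})
    print(final["conclusions"], final["done"])
```

Swapping a stage means passing a different function with the same signature to `run_pipeline`. With the default decay (0.5) and threshold (0.2), the `d`/`e`/`f` triples are never selected.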


What is the Large Knowledge Collider?

• Integrating reasoning and search

• dynamic, Web-scale, and open-world

• in a pluggable architecture

• Combining consortium competence:
  – IR, Cognition
  – ML, Ontologies
  – Statistics, ML, Cognition, DB
  – Logic, DB, Probabilistic Inference
  – Economics, Decision Theory


Two parallel implementations

1. Medium-size tight cluster parallel computing
  – ≈ O(10²) nodes
  – fully available
  – fully reliable
  – (almost) no bandwidth restrictions

2. Large-scale wide-area distributed computing
  – ≈ O(10⁴) nodes
  – unpredictable, unreliable, very limited bandwidth
  – Thinking@home (SETI@home, folding@home)


How & when will others get access to the results?

• Public releases of LarKC platform

• Public APIs enabling others to develop plug-ins

• Create Early Access Group

• Encourage uptake through Challenge Tasks

• Encourage participation through Thinking@home

• World Health Org. use-case is public domain data

• Give access to best practice through contributions to W3C SWBPD, SWEO, HCLS


Who will build this?

| Organisation | Country |
|---|---|
| DERI Innsbruck | Austria |
| AstraZeneca | Sweden |
| CEFRIEL | Italy |
| Cycorp Europe | Slovenia |
| Universität Stuttgart, High Performance Computing | Germany |
| Max Planck Psychology | Germany |
| Ontotext Lab | Bulgaria |
| Saltlux | Korea |
| Siemens | Germany |
| Sheffield | United Kingdom |
| Vrije Universiteit Amsterdam | Netherlands |
| Beijing University of Technology | PRC |
| World Health Organisation: Cancer Research | France |


Timing

• Start in April ’08

• First prototype after 1 year

• Limited open access after 2 years

• Open access after 2.5 years

• Open APIs, competition

• First demonstrators after 2.5 years

• Run-time 3.5 years


Most important results?

“An implemented configurable platform for large-scale semantic computing, together with a library of plug-ins and APIs enabling development by others, the practicality of which is shown in three demonstrated deployments in medical research, drug development and urban computing using mobile data”

Open to the community

come and play with us