Machine Learning Methods for Analysing and Linking RDF Data Jens Lehmann September 16, 2014 Jens Lehmann (AKSW, Uni Leipzig) Analysing and Linking RDF Data September 16, 2014 1 / 35
Dec 05, 2014
Structured Machine Learning
How to analyse structured data?
Detecting Crime Patterns: Series Finder
Construct the "modus operandi" of criminals: identified 9 new crime patterns in Cambridge, MA, USA
Wang, Tong, et al. "Detecting Patterns of Crime with Series Finder." AAAI 2013.
Discovery of Laws of Physics
Background data generated using experiments
Mathematical functions on input variables form the hypothesis space
Schmidt, Lipson. "Distilling free-form natural laws from experimental data." Science 2009.
Protein Interaction
Rules learned via Inductive Logic Programming (ProGolem)
Understandable by experts and competitive with statistical learners
Possibly better drug design and reduction of side effects
Santos et al. "Automated identification of protein-ligand interaction features using Inductive Logic Programming: a hexose binding case study." BMC Bioinformatics 2012.
Background Knowledge
RDF and the Linked Data Principles
RDF Triple:
Example:
  Subject: http://cs.ox.ac.uk/John
  Predicate: http://cs.ox.ac.uk/studies
  Object: http://cs.ox.ac.uk/CS
The term Linked Data refers to a set of best practices for publishing and interlinking structured data on the Web.
Linked Data principles (simplified version):
1. Use RDF and URLs as identifiers
2. Include links to other datasets
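The example triple can be written down as plain data. The following is a minimal sketch, not tied to any RDF library; the `Triple` type and the `to_ntriples` helper are hypothetical names introduced only for illustration, and the helper assumes all three terms are IRIs.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    """One RDF statement: subject, predicate, object (all IRIs in this sketch)."""
    subject: str
    predicate: str
    obj: str


def to_ntriples(t: Triple) -> str:
    """Render a triple whose three terms are IRIs as one N-Triples line."""
    return f"<{t.subject}> <{t.predicate}> <{t.obj}> ."


john_studies_cs = Triple(
    "http://cs.ox.ac.uk/John",
    "http://cs.ox.ac.uk/studies",
    "http://cs.ox.ac.uk/CS",
)
line = to_ntriples(john_studies_cs)
```

Using URLs for all three positions is what allows other datasets to link to the same subject or object.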
OWL Ontologies
Web Ontology Language (OWL) builds on RDF and Description Logics
Objects: specific resources (constants). Examples: MARIA, LEIPZIG
Classes: sets of objects (unary predicates). Examples: Student, Car, Country
Properties: connections between objects (binary predicates). Examples: hasChild, partOf
Can be combined to complex concepts (OWL Class Expressions), e.g.: Child ⊓ ∃hasParent.Professor
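To make the class expression concrete, here is a minimal sketch that evaluates Child ⊓ ∃hasParent.Professor over a tiny hand-made dataset. All individuals and the helper names (`atomic`, `exists`) are invented for illustration, and the evaluation is closed-world set arithmetic, which is simpler than what an OWL reasoner does under the open-world assumption.

```python
# Toy data: class memberships and property assertions (all invented).
classes = {
    "Child": {"anna", "ben"},
    "Professor": {"carla"},
}
properties = {
    "hasParent": {("anna", "carla"), ("ben", "dave")},
}


def atomic(name):
    """Instances of a named class."""
    return set(classes.get(name, set()))


def exists(prop, filler):
    """Instances with at least one `prop` successor inside `filler`."""
    return {s for (s, o) in properties.get(prop, set()) if o in filler}


# Child ⊓ ∃hasParent.Professor: intersection of a class and an existential restriction.
result = atomic("Child") & exists("hasParent", atomic("Professor"))
```

Only `anna` qualifies here: `ben` is a child, but his only listed parent is not a professor.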
Learning OWL Class Expressions - Definition
Given:
  Background knowledge (OWL ontologies and RDF datasets)
  Positive and negative examples (objects in datasets)
Goal:
  Find an OWL class expression describing the positive but not the negative examples
Application Example: Therapy Response Prediction
≈ 0.5-1% of the population affected by rheumatoid arthritis
Anti-TNF not effective for several million persons for unknown reasons
Learning OWL Class Expressions - Approaches
Least common subsumers: Cohen et al. "Computing least common subsumers in description logics." AAAI 1992
Terminological decision trees: Fanizzi et al. "Induction of concepts in web ontologies through terminological decision trees." ECML PKDD 2010
Rule-based: Fanizzi et al. "DL-FOIL concept learning in description logics." ILP 2008
Genetic Programming: Lehmann, Jens. "Hybrid learning of ontology classes." MLDM 2007
Refinement operators:
  Lehmann et al. "Concept learning in description logics using refinement operators." ML 2010
  Iannone et al. "An algorithm based on counterfactuals for concept learning in the semantic web." AI 2007
Refinement Operators - Definitions
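Given a DL $\mathcal{L}$, consider the quasi-ordered space $\langle C(\mathcal{L}), \sqsubseteq_T \rangle$ over concepts of $\mathcal{L}$
$\rho : C(\mathcal{L}) \to 2^{C(\mathcal{L})}$ is a downward $\mathcal{L}$ refinement operator if for any $C \in C(\mathcal{L})$:
$$D \in \rho(C) \text{ implies } D \sqsubseteq_T C$$
Notation: Write $C \rightsquigarrow_\rho D$ instead of $D \in \rho(C)$
Example refinement chain:
$$\top \rightsquigarrow_\rho \text{Person} \rightsquigarrow_\rho \text{Man} \rightsquigarrow_\rho \text{Man} \sqcap \exists \text{hasChild}.\top$$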
Learning using Refinement Operators
Search tree (heuristic score per node):
⊤: 0.45
  Car: too weak
  Person: 0.73
    Person ⊓ ∃attends.⊤: 0.78
      Person ⊓ ∃attends.Talk: 0.97
      ...
Start with most general concept (top down)
Heuristic evaluates using pos/neg examples
Operator specialises
Continue until termination criterion met
= Learning Algorithm
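The four steps can be sketched as a small best-first search. This is a toy illustration under strong assumptions, not the DL-Learner implementation: concepts are restricted to conjunctions of atomic classes, the heuristic is plain accuracy, "too weak" is taken to mean "fails to cover every positive example", and all data and function names are invented.

```python
import heapq

# Toy closed-world data: individual -> atomic classes it belongs to (invented).
membership = {
    "anna": {"Person", "Attendee"},
    "ben": {"Person", "Attendee"},
    "carl": {"Person"},
    "kitt": {"Car"},
}
atomic_classes = ["Person", "Car", "Attendee"]
positives = {"anna", "ben"}
negatives = {"carl", "kitt"}


def covers(concept, individual):
    """A concept is a frozenset of atomic classes read as a conjunction; frozenset() is top."""
    return concept <= membership[individual]


def accuracy(concept):
    """Heuristic: share of examples classified correctly by the concept."""
    tp = sum(covers(concept, i) for i in positives)
    tn = sum(not covers(concept, i) for i in negatives)
    return (tp + tn) / (len(positives) + len(negatives))


def refine(concept):
    """Downward refinement: add one conjunct, which can only shrink the extension."""
    return [concept | {a} for a in atomic_classes if a not in concept]


def learn(max_steps=50):
    start = frozenset()                      # most general concept
    best = start
    frontier = [(-accuracy(start), sorted(start))]  # max-heap via negated score
    seen = {start}
    for _ in range(max_steps):
        if not frontier or accuracy(best) == 1.0:   # termination criterion
            break
        _, names = heapq.heappop(frontier)
        for child in refine(frozenset(names)):
            if child in seen:
                continue
            seen.add(child)
            if not all(covers(child, i) for i in positives):
                continue  # too weak: specialising further cannot regain positives
            if accuracy(child) > accuracy(best):
                best = child
            heapq.heappush(frontier, (-accuracy(child), sorted(child)))
    return best


learned = learn()
```

In this toy run the top concept scores 0.5, `Car` is discarded as too weak, and the search stops as soon as a concept separates the positive from the negative examples.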
Properties of Refinement Operators
An $\mathcal{L}$ downward refinement operator $\rho$ is called
  Finite iff $\rho(C)$ is finite for any concept $C \in C(\mathcal{L})$
  Redundant iff there exist two different $\rho$ refinement chains from a concept $C$ to a concept $D$
  Proper iff for $C, D \in C(\mathcal{L})$, $C \rightsquigarrow_\rho D$ implies $C \not\equiv_T D$
  Complete iff for $C, D \in C(\mathcal{L})$ with $D \sqsubset_T C$ there is a concept $E$ with $E \equiv_T D$ and a refinement chain $C \rightsquigarrow_\rho \dots \rightsquigarrow_\rho E$
  Weakly complete iff for any concept $C$ with $C \sqsubset_T \top$ we can reach a concept $E$ with $E \equiv_T C$ from $\top$ by $\rho$
[Figure: small refinement-chain diagrams illustrating the properties]
Properties of Refinement Operators
Properties indicate how suitable a refinement operator is for solving the learning problem:
  Incomplete operators may miss solutions
  Redundant operators may lead to duplicate concepts in the search tree
  Improper operators may produce equivalent concepts (which cover the same examples)
  For infinite operators it may not be possible to compute all refinements of a given concept
Key question: Which properties can be combined?
Theorem: Properties of L Refinement Operators
Theorem
Maximal sets of combinable properties of $\mathcal{L}$ refinement operators for $\mathcal{L} \in \{\mathcal{ALC}, \mathcal{ALCN}, \mathcal{SHOIN}, \mathcal{SROIQ}\}$ are:
1. {weakly complete, complete, finite}
2. {weakly complete, complete, proper}
3. {weakly complete, non-redundant, finite}
4. {weakly complete, non-redundant, proper}
5. {non-redundant, finite, proper}
Foundations of Refinement Operators for Description Logics; Lehmann, Hitzler; ILP conference, 2008
Concept Learning in Description Logics Using Refinement Operators; Lehmann, Hitzler; Machine Learning journal, 2010
Definition of ρ
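$$\rho(C) = \begin{cases}
\{\bot\} \cup \rho_\top(C) & \text{if } C = \top \\
\rho_\top(C) & \text{otherwise}
\end{cases}$$

$$\rho_B(C) = \begin{cases}
\emptyset & \text{if } C = \bot \\
\{C_1 \sqcup \dots \sqcup C_n \mid C_i \in M_B\ (1 \le i \le n)\} & \text{if } C = \top \\
\{A' \mid A' \in sh_\downarrow(A)\} \cup \{A \sqcap D \mid D \in \rho_B(\top)\} & \text{if } C = A\ (A \in N_C) \\
\{\neg A' \mid A' \in sh_\uparrow(A)\} \cup \{\neg A \sqcap D \mid D \in \rho_B(\top)\} & \text{if } C = \neg A\ (A \in N_C) \\
\{\exists r.E \mid A = ar(r),\ E \in \rho_A(D)\} \cup \{\exists r.D \sqcap E \mid E \in \rho_B(\top)\} \cup \{\exists s.D \mid s \in sh_\downarrow(r)\} & \text{if } C = \exists r.D \\
\{\forall r.E \mid A = ar(r),\ E \in \rho_A(D)\} \cup \{\forall r.D \sqcap E \mid E \in \rho_B(\top)\} \cup \{\forall r.\bot \mid D = A \in N_C \text{ and } sh_\downarrow(A) = \emptyset\} \cup \{\forall s.D \mid s \in sh_\downarrow(r)\} & \text{if } C = \forall r.D \\
\{C_1 \sqcap \dots \sqcap C_{i-1} \sqcap D \sqcap C_{i+1} \sqcap \dots \sqcap C_n \mid D \in \rho_B(C_i),\ 1 \le i \le n\} & \text{if } C = C_1 \sqcap \dots \sqcap C_n\ (n \ge 2) \\
\{C_1 \sqcup \dots \sqcup C_{i-1} \sqcup D \sqcup C_{i+1} \sqcup \dots \sqcup C_n \mid D \in \rho_B(C_i),\ 1 \le i \le n\} \cup \{(C_1 \sqcup \dots \sqcup C_n) \sqcap D \mid D \in \rho_B(\top)\} & \text{if } C = C_1 \sqcup \dots \sqcup C_n\ (n \ge 2)
\end{cases}$$
Base Operator (Excerpt)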
Definition of ρ
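$$\{\exists r.E \mid A = ar(r),\ E \in \rho_A(D)\} \cup \{\exists r.D \sqcap E \mid E \in \rho_B(\top)\} \cup \{\exists s.D \mid s \in sh_\downarrow(r)\} \quad \text{if } C = \exists r.D$$
Examples, starting from C = ∃takesPartIn.SocialEvent (one refinement per set):
  ∃takesPartIn.Meeting
  Student ⊓ ∃takesPartIn.SocialEvent
  ∃leads.SocialEvent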
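The three sets of the ∃r.D case can be mimicked on strings. This is a rough sketch under several assumptions: the class and property hierarchies are invented so that the slide's examples come out, the filler is only refined by stepping to direct subclasses, the range restriction A = ar(r) is ignored, and ρ_B(⊤) is replaced by a one-element stand-in.

```python
# Invented toy hierarchies: direct subclasses / subproperties (sh↓ in the definition).
sub_classes = {"SocialEvent": ["Meeting"], "Meeting": []}
sub_properties = {"takesPartIn": ["leads"], "leads": []}
# Finite stand-in for the refinements of top; the real set is not finite.
top_refinements = ["Student"]


def refine_filler(concept):
    """Refine an atomic filler by stepping to its direct subclasses."""
    return sub_classes.get(concept, [])


def refine_exists(role, filler):
    """Refinements of ∃role.filler, one list comprehension per set in the definition."""
    refinements = []
    # 1. Specialise the filler.
    refinements += [f"∃{role}.{e}" for e in refine_filler(filler)]
    # 2. Add a conjunct (written conjunct-first, as in the slide's example).
    refinements += [f"{e} ⊓ ∃{role}.{filler}" for e in top_refinements]
    # 3. Replace the property by a direct subproperty.
    refinements += [f"∃{s}.{filler}" for s in sub_properties.get(role, [])]
    return refinements


steps = refine_exists("takesPartIn", "SocialEvent")
```

Each returned expression is more specific than the input, which is what makes the operator a downward one.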
Properties of ρ
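$\rho_\downarrow$ is complete
$\rho_\downarrow$ is infinite, e.g. there are infinitely many refinement steps of the form:
$$\top \rightsquigarrow_{\rho_\downarrow} C_1 \sqcup C_2 \sqcup C_3 \sqcup \dots$$
$\rho^{cl}_\downarrow$ is proper
$\rho_\downarrow$ is redundant, since two different refinement chains lead to the same concept:
$$\forall r_1.A_1 \sqcup \forall r_2.A_1 \rightsquigarrow_{\rho_\downarrow} \forall r_1.(A_1 \sqcap A_2) \sqcup \forall r_2.A_1 \rightsquigarrow_{\rho_\downarrow} \forall r_1.(A_1 \sqcap A_2) \sqcup \forall r_2.(A_1 \sqcap A_2)$$
$$\forall r_1.A_1 \sqcup \forall r_2.A_1 \rightsquigarrow_{\rho_\downarrow} \forall r_1.A_1 \sqcup \forall r_2.(A_1 \sqcap A_2) \rightsquigarrow_{\rho_\downarrow} \forall r_1.(A_1 \sqcap A_2) \sqcup \forall r_2.(A_1 \sqcap A_2)$$
“DL-Learner: Learning Concepts in Description Logics”, Jens Lehmann, Journal of Machine Learning Research (JMLR), 2009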
Learning using Refinement Operators
Search tree (heuristic score per node as the search proceeds, expansion value in brackets):
⊤: 0.47 [0], 0.45 [1]
  Car: too weak
  Person: 0.79 [0], 0.78 [1], 0.77 [2], 0.75 [3], 0.74 [4], 0.73 [5]
    Person ⊓ ∃attends.⊤: 0.79 [4], 0.78 [5]
      Person ⊓ ∃attends.Talk: 0.97 [4]
      ...
Redundancy elimination technique with polynomial complexity wrt. search tree size
Length of children limited by expansion value
Infinite ρ applicable
he (the expansion value) used by heuristic (bias towards short concepts, Occam's Razor)
Scalability
Refinement operator should build coherent concepts
Inference:
  Complete & sound vs. approximation
  Open World Assumption (OWA) vs. Closed World Assumption (CWA)
Stochastic coverage computation
  Pick random example → perform instance check → compute confidence interval (e.g. via Wald method) wrt. objective function (e.g. F-measure)
  Up to 99% fewer instance checks in test examples
  Low influence on accuracy shown for 380 learning tasks using 7 ontologies (0.2% ± 0.4% F-measure difference)
Fragment extraction for application on large knowledge bases
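The sampling loop behind stochastic coverage computation can be sketched as follows. This is a simplified illustration, not the published method: it puts a Wald interval on the plain coverage proportion rather than on an objective function such as F-measure, and `estimate_coverage`, its parameters, and the stand-in instance check are assumptions made for the example.

```python
import math
import random


def wald_interval(successes, n, z=1.96):
    """Wald confidence interval for a proportion (z = 1.96 is roughly 95%)."""
    p = successes / n
    half_width = z * math.sqrt(p * (1 - p) / n)
    return max(0.0, p - half_width), min(1.0, p + half_width)


def estimate_coverage(examples, instance_check, max_width=0.1, min_checks=100, seed=0):
    """Check examples in random order until the interval is narrow enough."""
    order = list(examples)
    if not order:
        raise ValueError("need at least one example")
    random.Random(seed).shuffle(order)
    covered = 0
    for n, example in enumerate(order, start=1):
        covered += bool(instance_check(example))      # one (expensive) instance check
        low, high = wald_interval(covered, n)
        if n >= min_checks and high - low <= max_width:
            break                                     # remaining checks are skipped
    return covered / n, (low, high), n


# Hypothetical stand-in for a reasoner call: every fourth example is covered.
examples = range(10_000)
estimate, interval, checks = estimate_coverage(examples, lambda e: e % 4 == 0)
```

With an interval width of 0.1 at z = 1.96 the loop never needs more than 385 checks, whatever the true proportion, which is where the savings over checking all 10,000 examples come from. The `min_checks` floor guards against the Wald interval collapsing to zero width when the first few samples all agree.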
Class Expression Learning for Ontology Engineering; Jens Lehmann, Sören Auer, Lorenz Bühmann, Sebastian Tramp; Journal of Web Semantics (JWS), 2011
Carcinogenesis
Goal: predict whether a substance causes cancer
Why:
  Each year 1000 new substances are developed
  Substances can often only be validated using time-consuming and expensive experiments with mice → prioritise those with high risk
Background knowledge:
  Database of the US National Toxicology Program (NTP)
“Obtaining accurate structural alerts for the causes of chemical cancers is a problem of great scientific and humanitarian value.” (A. Srinivasan, R.D. King, S.H. Muggleton, M.J.E. Sternberg 1997)
Knowledge Base Enrichment
Pattern Based Knowledge Base Enrichment; Lorenz Bühmann, Jens Lehmann; International Semantic Web Conference (ISWC) 2013
Universal OWL Axiom Enrichment for Large Knowledge Bases; Lorenz Bühmann, Jens Lehmann; Knowledge Engineering and Knowledge Management (EKAW) 2012
Protégé Plugin
Support for ontology creation and maintenance
Ontology Debugging: ORE
ORE - A Tool for Repairing and Enriching Knowledge Bases; Lehmann, Bühmann; International Semantic Web Conference (ISWC) 2010
Data Quality Measurement: RDFUnit
Test-driven Evaluation of Linked Data Quality; World Wide Web Conference (WWW), ACM, 2014; Dimitris Kontokostas, Patrick Westphal, Sören Auer, Sebastian Hellmann, Jens Lehmann, Roland Cornelissen, Amrapali J. Zaveri
Robot Scientists Adam & Eve
Abduction to form hypotheses and ≈ 1,000 experiments per day
12 new scientific discoveries regarding functions of genes in yeast
King, Ross D et al. "The automation of science." Science 324 (2009): 85-89.
Link Discovery - Motivation
Links are the backbone of the traditional WWW and the Data Web
Links are central for data integration, deduplication, cross-ontology question answering, reasoning, federated queries . . .
Central problem for many large IT companies
Automated tools (LIMES, SILK) can create a high number of links between RDF resources by using heuristics
Link Discovery - Definition
Definition (Link Discovery)
Given sets S and T of resources and relation R (often owl:sameAs)
Common approach: Find M = {(s, t) ∈ S × T : δ(s, t) ≤ θ}
S: DBpedia
  rdfs:label: "African Elephant"
T: BBC Wildlife
  dc:title: "African Bush Elephant"
dbpedia:AfricanElephant owl:sameAs bbc:hfzw82929 ?
δ = levenshtein(S.rdfs:label, T.dc:title)
δ(dbpedia:AfricanElephant, bbc:hfzw82929) = 5
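The distance in the example can be reproduced with a textbook Levenshtein implementation. The sketch below also shows the naive form of the "common approach", scanning all of S × T; the dictionaries and the `discover_links` name are illustrative assumptions, and tools such as LIMES avoid this quadratic scan.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance with unit costs, computed row by row."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # delete ca
                current[j - 1] + 1,            # insert cb
                previous[j - 1] + (ca != cb),  # substitute, or keep on a match
            ))
        previous = current
    return previous[-1]


def discover_links(source, target, theta):
    """All (s, t) pairs whose label distance is at most theta (naive S x T scan)."""
    return {
        (s, t)
        for s, label_s in source.items()
        for t, label_t in target.items()
        if levenshtein(label_s, label_t) <= theta
    }


source = {"dbpedia:AfricanElephant": "African Elephant"}
target = {"bbc:hfzw82929": "African Bush Elephant"}
distance = levenshtein("African Elephant", "African Bush Elephant")
links = discover_links(source, target, theta=5)
```

The distance is 5 because the two labels differ exactly by the inserted substring "Bush " (four letters and a space), so a threshold θ = 5 accepts the pair and θ = 4 rejects it.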
Example: Link Specification
f(edit(:socId, :socId), 0.5) ⊔ f(trigrams(:name, :label), 0.5)
Link Specification Syntax and Semantics
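Syntax ($LS$) and semantics ($[\![LS]\!]$):
$f(m, \theta, M)$: $\{(s, t, r) \mid (s, t, r) \in M \wedge m(s, t) \ge \theta\}$
$LS_1 \sqcap LS_2$: $\{(s, t, r) \mid (s, t, r_1) \in [\![LS_1]\!] \wedge (s, t, r_2) \in [\![LS_2]\!] \wedge r = \min(r_1, r_2)\}$
$LS_1 \sqcup LS_2$: $\{(s, t, r)\}$ with
$$r = \begin{cases}
r_1 & \text{if } \exists (s, t, r_1) \in [\![LS_1]\!] \wedge \neg(\exists r_2 : (s, t, r_2) \in [\![LS_2]\!]) \\
r_2 & \text{if } \exists (s, t, r_2) \in [\![LS_2]\!] \wedge \neg(\exists r_1 : (s, t, r_1) \in [\![LS_1]\!]) \\
\max(r_1, r_2) & \text{if } (s, t, r_1) \in [\![LS_1]\!] \wedge (s, t, r_2) \in [\![LS_2]\!]
\end{cases}$$
Syntax and semantics allow an ordering similar to subsumption to be defined (more specific specs generate fewer links)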
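These semantics translate almost directly into dictionary operations. The sketch below assumes a mapping is a Python dict from (s, t) pairs to a score r; the function names and the example scores are invented, and each measure is given a mapping M that already carries its own scores.

```python
def atomic(measure, theta, mapping):
    """f(m, θ, M): keep the entries of M whose similarity m(s, t) is at least θ."""
    return {(s, t): r for (s, t), r in mapping.items() if measure(s, t) >= theta}


def conjunction(left, right):
    """LS1 ⊓ LS2: pairs present in both mappings, scored with the minimum."""
    return {pair: min(left[pair], right[pair]) for pair in left.keys() & right.keys()}


def disjunction(left, right):
    """LS1 ⊔ LS2: pairs present in either mapping; the maximum where both have the pair."""
    merged = dict(left)
    for pair, r in right.items():
        merged[pair] = max(r, merged[pair]) if pair in merged else r
    return merged


# Invented similarity scores for two measures over candidate pairs.
id_scores = {("s1", "t1"): 1.0, ("s2", "t2"): 0.4}
name_scores = {("s1", "t1"): 0.6, ("s2", "t2"): 0.9, ("s3", "t3"): 0.3}

by_id = atomic(lambda s, t: id_scores[(s, t)], 0.5, id_scores)
by_name = atomic(lambda s, t: name_scores[(s, t)], 0.5, name_scores)
both = conjunction(by_id, by_name)
either = disjunction(by_id, by_name)
```

The conjunction yields a subset of the links of each operand and the disjunction a superset, which is the subsumption-like ordering the slide refers to.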
Link Specification Refinement Operator
$$\rho_\downarrow(LS) = \begin{cases}
\{f(m_1, 1, \Delta) \sqcap \dots \sqcap f(m_n, 1, \Delta) \mid m_i \in SM,\ 1 \le i \le n,\ n \le 2|SM|\} & \text{if } LS = \bot \\
\{f(m, dt(\theta), M)\} \cup \{LS \sqcup f(m', 1, M) \mid m \in SM,\ m \ne m'\} & \text{if } LS = f(m, \theta, M) \text{ (atomic)} \\
\{LS_1 \sqcap \dots \sqcap LS_{i-1} \sqcap LS' \sqcap LS_{i+1} \sqcap \dots \sqcap LS_n \mid LS' \in \rho_\downarrow(LS_i)\} & \text{if } LS = LS_1 \sqcap \dots \sqcap LS_n\ (n \ge 2) \\
\{LS_1 \sqcup \dots \sqcup LS_{i-1} \sqcup LS' \sqcup LS_{i+1} \sqcup \dots \sqcup LS_n \mid LS' \in \rho_\downarrow(LS_i)\} \cup \{LS \sqcup f(m, 1, M) \mid m \in SM,\ m \text{ not used in } LS\} & \text{if } LS = LS_1 \sqcup \dots \sqcup LS_n\ (n \ge 2)
\end{cases}$$
Upward refinement operator
Positive: weakly complete, finite
Negative: not complete, redundant, not proper
Refinement Chain Example
1. f(edit(:socId, :socId), 1.0)
2. f(edit(:socId, :socId), 0.5)
3. f(edit(:socId, :socId), 0.5) ⊔ f(trigrams(:name, :label), 1.0)
4. f(edit(:socId, :socId), 0.5) ⊔ f(trigrams(:name, :label), 0.5)
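The chain can be replayed with a small sketch of the atomic and ⊔ cases of a link specification refinement operator. Several things here are assumptions: the slides do not define the threshold-decrease function dt, so a fixed step of 0.5 is used; specifications are encoded as nested tuples; the ⊥ and ⊓ cases are left out; and nested disjunctions are not flattened.

```python
EDIT = "edit(:socId, :socId)"
TRIGRAMS = "trigrams(:name, :label)"


def dt(theta, step=0.5):
    """Assumed threshold-decrease function; the slides leave dt undefined."""
    return round(theta - step, 10)


def refine(spec, measures):
    """Atomic and ⊔ cases only; specs are ("f", measure, θ) or ("or", (spec, ...))."""
    results = []
    if spec[0] == "f":
        _, measure, theta = spec
        if dt(theta) > 0:
            results.append(("f", measure, dt(theta)))            # lower the threshold
        results += [("or", (spec, ("f", other, 1.0)))            # add a new measure
                    for other in measures if other != measure]
    elif spec[0] == "or":
        children = spec[1]
        for i, child in enumerate(children):
            for refined in refine(child, measures):              # refine one operand
                results.append(("or", children[:i] + (refined,) + children[i + 1:]))
        used = {c[1] for c in children if c[0] == "f"}
        results += [("or", children + (("f", m, 1.0),))          # add an unused measure
                    for m in measures if m not in used]
    return results


measures = [EDIT, TRIGRAMS]
chain = [
    ("f", EDIT, 1.0),
    ("f", EDIT, 0.5),
    ("or", (("f", EDIT, 0.5), ("f", TRIGRAMS, 1.0))),
    ("or", (("f", EDIT, 0.5), ("f", TRIGRAMS, 0.5))),
]
reachable = all(nxt in refine(cur, measures) for cur, nxt in zip(chain, chain[1:]))
```

Each step either lowers a threshold or adds a disjunct, so every specification in the chain accepts at least the links of its predecessor, matching the upward direction of the operator.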
Projects: DL-Learner and LIMES
DL-Learner
  Open-source project: http://dl-learner.org
  Extensible platform for concept learning algorithms
  Supports all RDF/OWL serialisations and major reasoners
  Several thousand downloads
LIMES (http://aksw.org/Projects/LIMES.html)
  Highly scalable engine (fastest RDF link discovery tool)
  Several machine learning approaches integrated (including the one presented)
“DL-Learner: Learning Concepts in Description Logics”, Jens Lehmann, Journal of Machine Learning Research (JMLR), 2009
Summary & Conclusions
Many interesting applications of structured machine learning (therapy response prediction, disease prediction, protein folding, data quality measurement, ontology debugging)
Still few machine learning tools for working with RDF/OWL although more and more data is available
Refinement operators make it possible to apply supervised machine learning on complex background knowledge
Can be applied to other languages like link specifications