SSSW 2015 Bertinoro–Italy July 10, 2015
Sense Making
Axel-Cyrille Ngonga Ngomo &
Philippe Cudré-Mauroux
On Making Sense
• ½ of Computer Science is about making sense of some input data
  – KDD (cf. Claudia & Laura tutorial)
  – NLP (cf. Roberto's talk)
  – Multimedia Analysis
  – Social Media / Big Data Analytics
  – Visualization
  – etc.
On the Menu Today
• Making Sense of Semantic Data
  – Making sense of SPARQL & Semantic Web predicates
  – Trust on Semantic Web data
  – Emergent Semantics
• Leveraging Semantic Data for Sense Making
  – Making sense of textual entities
  – Making sense of relational data
  – Making sense of webtables
Introduction
"At some point in the early twenty-first century, all of mankind was united in celebration. We marveled at our own magnificence as we gave birth to AI."
– Morpheus, The Matrix
Linked Data Web
Sense Making
Helping end users to make sense of the Semantic Web.
Gaps
Language Gap
Semantic Web speaks languages that normal users do not understand
Language Gap
Problem
What does it mean?
PREFIX dbo: <http://dbpedia.org/ontology/>
PREFIX res: <http://dbpedia.org/resource/>
SELECT DISTINCT ?person WHERE {
  ?person dbo:team ?sportsTeam .
  ?sportsTeam dbo:league res:Premier_League .
  ?person dbo:birthDate ?date .
  ?person dbo:birthPlace ?place .
  { ?place dbo:locatedIn res:Africa . }
  UNION
  { ?place dbo:locatedIn res:Asia . }
}
ORDER BY DESC(?date)
OFFSET 0 LIMIT 1

⇒ Give me the youngest person who plays in a Premier League team and was born in Africa or Asia.
Language Gap
Solution
Verbalization frameworks for the Semantic Web
Document planner → Microplanner → Realizer

http://github.com/AKSW/SemWeb2NL
Language Gap: Triple2NL/BGP2NL
Approach
1. ρ(s p o) ⇒ poss(ρ(p), ρ(s)) ∧ subj(BE, ρ(p)) ∧ dobj(BE, ρ(o))
2. ρ(s p o) ⇒ subj(ρ(p), ρ(s)) ∧ dobj(ρ(p), ρ(o))

1. :Momo :author :Ende ⇒ Momo's author is Michael Ende.
2. ?x :author :Ende ⇒ ?x's author is Michael Ende.
3. :Momo :writtenBy :Ende ⇒ Momo was written by Michael Ende.
4. ?x :writtenBy :Ende ⇒ ?x was written by Michael Ende.
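As a rough illustration, a minimal Python sketch of the two rules (the lexicon and helper names are hypothetical; this is not the SemWeb2NL API):

# Minimal sketch of the two Triple2NL rules (hypothetical lexicon
# and helpers; not the actual SemWeb2NL implementation).

LEXICON = {
    ":Momo": "Momo",
    ":Ende": "Michael Ende",
    ":author": ("noun", "author"),         # nominal predicate -> Rule 1
    ":writtenBy": ("verb", "written by"),  # verbal predicate  -> Rule 2
}

def rho(term):
    """Realize a resource or variable as natural language."""
    entry = LEXICON.get(term, term)        # variables like ?x pass through
    return entry[1] if isinstance(entry, tuple) else entry

def verbalize(s, p, o):
    kind = LEXICON.get(p, ("verb",))[0]
    if kind == "noun":
        # Rule 1: poss(rho(p), rho(s)) ∧ subj(BE, rho(p)) ∧ dobj(BE, rho(o))
        return f"{rho(s)}'s {rho(p)} is {rho(o)}."
    # Rule 2: subj(rho(p), rho(s)) ∧ dobj(rho(p), rho(o)), realized passively here
    return f"{rho(s)} was {rho(p)} {rho(o)}."

print(verbalize(":Momo", ":author", ":Ende"))   # Momo's author is Michael Ende.
print(verbalize("?x", ":writtenBy", ":Ende"))   # ?x was written by Michael Ende.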
Language Gap: SPARQL2NL/RDF2NL
Approach
Combination rules
1. ρ((s, p, o1) . (s, p, o2)) ⇒ poss(ρ(p), ρ(s)) ∧ subj(BE, ρ(p)) ∧ dobj(BE, cc(ρ(o1), ρ(o2)))

?x's author is Paul Erdos and ?x's author is Kevin Bacon.
⇒ ?x's authors are Paul Erdos and Kevin Bacon.
Language Gap: SPARQL2NL/RDF2NL
?place is Shakespeare's birth place or ?place is Shakespeare's death place.
⇒ ?place is Shakespeare's birth or death place.

This query retrieves values ?height such that ?height is Claudia Schiffer's height.
⇒ This query retrieves Claudia Schiffer's height.

?person's team is ?sportsTeam. ?person's birth date is ?date. ?sportsTeam's league is Premier League.
⇒ ?person's team is ?sportsTeam, ?person's birth date is ?date, and ?sportsTeam's league is Premier League.
Language Gap: Evaluation
125 participants, 49 SPARQL experts, 3 tasks
94% of verbalizations were understandable
5.31 ± 1.08 average adequacy score

Figure: Adequacy and fluency results in survey (ratings 1 to 6 by number of survey answers)
Language Gap: Evaluation
125 participants, 49 SPARQL experts, 3 tasks
Slightly larger error with NL for experts
Non-experts were enabled to understand the meaning of the queries

Figure: Error rate over the three tasks (SPARQL vs. NL vs. NL for SPARQL experts)
Language Gap: Evaluation
125 participants, 49 SPARQL experts, 3 tasks
Non-experts faster with NL than experts with SPARQL
Experts faster with NL than with SPARQL

Figure: Average time needed (SPARQL and NL, filtered and unfiltered; purple = standard deviation)
Language Gap: Challenges
Complex queries
Sacrifice adequacy for fluency
Other languages
Hybrid approach
Personalization
Gaps
Semantic Gap
Decentralized content generation
Contextualization mismatch
Semantic Gap
Problem
How do I communicate with it?
Semantic Gap
Solution
Question Answering Systems
Example:
Where did Abraham Lincoln die?
SELECT ?x WHERE {
  res:Abraham_Lincoln dbo:deathPlace ?x .
}

PowerAqua:
Triple representation: ⟨state/place, die, Abraham Lincoln⟩
Ontology mappings: ⟨Place, deathPlace, Abraham Lincoln⟩
Semantic Gap: Mismatch
Triples do not always provide a faithful representation of the semantic structure of the question
Thus more expressive queries cannot be answered

Example 1:
Which cities have more than three universities?

SELECT ?y WHERE {
  ?x rdf:type dbo:University .
  ?x dbo:city ?y .
}
GROUP BY ?y
HAVING (COUNT(?x) > 3)

Triple representation: ⟨cities, more than, universities three⟩
Semantic Gap: Mismatch
Example 2:
Who produced the most films?

SELECT ?y WHERE {
  ?x rdf:type dbo:Film .
  ?x dbo:producer ?y .
}
GROUP BY ?y
ORDER BY DESC(COUNT(?x)) LIMIT 1

Triple representation: ⟨person/organization, produced, most films⟩
Semantic Gap: Approach
To understand a user question, we need to understand:
The words
Abraham Lincoln → res:Abraham_Lincoln
died in → dbo:deathPlace

The semantic structure
the most N → ORDER BY DESC(COUNT(?n)) LIMIT 1
more than three N → HAVING (COUNT(?n) > 3)

Template-Based Question Answering:
1. Template generation: understanding the semantic structure
2. Template instantiation: understanding the words
Semantic Gap: Example
Query: Who produced the most films?
1. SPARQL template:

SELECT ?x WHERE {
  ?y rdf:type ?c .
  ?y ?p ?x .
}
GROUP BY ?x
ORDER BY DESC(COUNT(?y)) LIMIT 1

?c CLASS [films]
?p PROPERTY [produced]

2. Instantiations:

?c = <http://dbpedia.org/ontology/Film>
?p = <http://dbpedia.org/ontology/producer>
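A minimal sketch of the instantiation step in Python, filling the template's slots with the top-ranked URIs (the template string and slot names follow the slide; the code itself is illustrative, not TBSL's implementation):

# Sketch: filling a SPARQL template's slots with the top-ranked
# URI candidates (illustrative; not TBSL's actual code).

TEMPLATE = """SELECT ?x WHERE {{
  ?y rdf:type {c} .
  ?y {p} ?x .
}}
GROUP BY ?x
ORDER BY DESC(COUNT(?y)) LIMIT 1"""

slots = {
    "c": "<http://dbpedia.org/ontology/Film>",      # CLASS [films]
    "p": "<http://dbpedia.org/ontology/producer>",  # PROPERTY [produced]
}

print(TEMPLATE.format(**slots))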
Semantic Gap: Architecture
[Figure: the natural language question is POS-tagged and parsed using a domain-independent and a domain-dependent lexicon into a semantic representation, which is translated into SPARQL templates; entity identification against resources and classes, and against the BOA pattern library for properties, fills the URI slots; type checking and prominence yield ranked SPARQL queries; the selected query is run against a SPARQL endpoint over the LOD cloud to produce the answer]
Semantic Gap: Template Generation

1. The natural language question is tagged with part-of-speech information.
2. Based on POS tags, lexical entries are built on the fly.
3. These lexical entries, together with domain-independent lexical entries, are used for parsing the question.
4. The resulting semantic representation is translated into a SPARQL template.
Semantic Gap: Who produced the most films?
domain-independent: who, the most
domain-dependent: produced/VBD, films/NNS

SPARQL template 1:

SELECT ?x WHERE {
  ?x ?p ?y .
  ?y rdf:type ?c .
}
GROUP BY ?x
ORDER BY DESC(COUNT(?y)) LIMIT 1

?c CLASS [films]
?p PROPERTY [produced]
Semantic Gap: Who produced the most films?
SPARQL template 2:

SELECT ?x WHERE {
  ?x ?p ?y .
}
GROUP BY ?x
ORDER BY DESC(COUNT(?y)) LIMIT 1

?p PROPERTY [films]
Semantic Gap: Template Instantiation

1. For resources and classes: identify synonyms of the label using WordNet; retrieve entities with a label similar to the slot label based on string similarities (trigram, Levenshtein, substring).
2. For property labels, the label is additionally compared to natural language expressions stored in the BOA pattern library.
3. The highest-ranking entities are returned as candidates for filling the query slots.
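A toy sketch of the string-similarity part of step 1, using trigram similarity only (the label index and slot label are illustrative; Levenshtein and substring similarity are omitted):

# Sketch: rank label-index entries against a slot label by trigram
# similarity (one of the three measures named above).

def trigrams(s):
    s = f"  {s.lower()} "                  # pad so short strings work
    return {s[i:i + 3] for i in range(len(s) - 2)}

def trigram_sim(a, b):
    ta, tb = trigrams(a), trigrams(b)
    return len(ta & tb) / len(ta | tb)

labels = {                                 # illustrative label index
    "http://dbpedia.org/ontology/Film": "film",
    "http://dbpedia.org/ontology/FilmFestival": "film festival",
}

slot = "films"                             # possibly WordNet-expanded first
ranked = sorted(labels, key=lambda u: trigram_sim(slot, labels[u]), reverse=True)
print(ranked[0])                           # .../ontology/Film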
BOA
The BOA pattern library is a repository of natural language representations of Semantic Web predicates.

Idea:
1. For each predicate P in a data repository (e.g. DBpedia), collect the set of entities S and O connected through P.
2. Search a text corpus (e.g. Wikipedia) for all sentences containing the labels of S and O.
3. For all retrieved sentences, the natural language predicate is a potential pattern for P. The potential patterns are then scored by a neural network (e.g. according to frequency) and filtered.
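A toy sketch of the extraction step, run on the example from the next slide (data is illustrative; the scoring and filtering stages are omitted):

# Sketch of the BOA extraction idea: the text between the labels of a
# connected entity pair is a candidate pattern for the predicate.

pairs = [("Google", "Youtube")]   # (s, o) label pairs for dbo:subsidiary
corpus = [
    "Google's acquisition of Youtube comes as online video is starting to hit its stride.",
    "Youtube, a division of Google, is exploring a new way to get more clips on its site.",
]

patterns = []
for s_label, o_label in pairs:
    for sentence in corpus:
        i, j = sentence.find(s_label), sentence.find(o_label)
        if i < 0 or j < 0:
            continue                      # both labels must occur
        if i < j:                         # pattern of the form "S ... O"
            patterns.append("S " + sentence[i + len(s_label):j].strip() + " O")
        else:                             # pattern of the form "O ... S"
            patterns.append("O " + sentence[j + len(o_label):i].strip() + " S")

print(patterns)
# ["S 's acquisition of O", 'O , a division of S']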
BOA: Example
Predicate: http://dbpedia.org/ontology/subsidiary

RDF snippet:

<http://dbpedia.org/resource/Google>
    <http://dbpedia.org/ontology/subsidiary>
        <http://dbpedia.org/resource/YouTube> .
<http://dbpedia.org/resource/Google> rdfs:label "Google"@en .
<http://dbpedia.org/resource/YouTube> rdfs:label "Youtube"@en .

Sentences:
  Google's acquisition of Youtube comes as online video is really starting to hit its stride.
  Youtube, a division of Google, is exploring a new way to get more high-quality clips on its site: financing amateur video creators.

Patterns:
  subsidiary: S's acquisition of O
  subsidiary: O, a division of S
BOA
The use of BOA patterns allows us to match natural language expressions and ontology concepts even if they are not string-similar and not covered by WordNet.

Examples:
  married to → http://dbpedia.org/ontology/spouse
  was born in → http://dbpedia.org/ontology/birthPlace
  graduated from → http://dbpedia.org/ontology/almaMater
  write → http://dbpedia.org/ontology/author
Example: Who produced the most films?
Candidates for filling query slots:
?c CLASS [films]
<http://dbpedia.org/ontology/Film>
<http://dbpedia.org/ontology/FilmFestival>
. . .
?p PROPERTY [produced]
<http://dbpedia.org/ontology/producer>
<http://dbpedia.org/property/producer>
<http://dbpedia.org/ontology/wineProduced>
. . .
Semantic Gap: Query Ranking and Selection

1. Every entity receives a score considering string similarity and prominence.
2. The score of a query is then computed as the average of the scores of the entities used to fill its slots.
3. In addition, type checks are performed.
4. Of the remaining queries, the one with the highest score that returns a result is chosen to retrieve an answer.
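A minimal sketch of steps 1 and 2 (the weighting alpha and all scores are assumptions for illustration; the actual combination in the system may differ):

# Sketch: entities scored by combining string similarity and
# prominence; a query scored as the average of its slot fillers.

def entity_score(similarity, prominence, alpha=0.6):
    # alpha is an assumed weighting, not the system's actual one
    return alpha * similarity + (1 - alpha) * prominence

def query_score(slot_fillers):
    return sum(slot_fillers.values()) / len(slot_fillers)

q1 = {"dbo:producer": entity_score(0.9, 0.8), "dbo:Film": entity_score(0.6, 0.9)}
q2 = {"dbo:producer": entity_score(0.9, 0.8), "dbo:FilmFestival": entity_score(0.5, 0.2)}
print(query_score(q1), query_score(q2))   # the first query ranks higher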
Example: Who produced the most films?
SELECT ?x WHERE {
  ?x <http://dbpedia.org/ontology/producer> ?y .
  ?y rdf:type <http://dbpedia.org/ontology/Film> .
}
GROUP BY ?x
ORDER BY DESC(COUNT(?y)) LIMIT 1

Score: 0.759

SELECT ?x WHERE {
  ?x <http://dbpedia.org/ontology/film> ?y .
}
GROUP BY ?x
ORDER BY DESC(COUNT(?y)) LIMIT 1

Score: 0.626

SELECT ?x WHERE {
  ?x <http://dbpedia.org/ontology/producer> ?y .
  ?y rdf:type <http://dbpedia.org/ontology/FilmFestival> .
}
GROUP BY ?x
ORDER BY DESC(COUNT(?y)) LIMIT 1

Score: 0.601
Evaluation Setup
Question set: 39 DBpedia training questions from QALD-1
  5 could not be parsed due to unknown syntactic constructions or uncovered domain-independent expressions
  19 were answered exactly as required by the benchmark (precision and recall 1.0)
  Another 2 were answered almost correctly (precision and recall greater than 0.8)

Mean precision: 0.61
Mean recall: 0.63
F-measure: 0.62
Main Sources of Error
Incorrect templates
  Template structure does not coincide with the structure of the data:
  When did Germany join the EU?
  res:Germany dbp:accessioneudate ?x .

Predicate detection fails
  inhabitants ↛ dbp:population, dbp:populationTotal
  owns ↛ dbo:keyPerson
  higher ↛ dbp:elevationM

Wrong query is selected
  Who wrote The Pillars of the Earth?
  res:The_Pillars_of_the_Earth_(TV_Miniseries) dbo:writer ?x .
  res:The_Pillars_of_the_Earth dbo:author ?x .
Semantic Gap: Challenges
Schema-agnostic QA
Query Ranking
Relation Extraction
Ontology Lexicalization
Extraction of surface forms
Justification Gap
Problem
Are you sure? Prove it to me.
Justification Gap
Solution
Gathering natural-language evidence?
http://aksw.org/Projects/defacto
Justification Gap: Automatic Query Generation
[Figure: web search queries generated automatically from the natural-language evidence of the input triple]
Justification Gap: Evidence Generation
(s, p, o) ⇒ "ρ(s)" "ρ(p)" "ρ(o)"

:Momo :author :Ende
1. "Momo" "author" "Michael Ende"
2. "Momo" "written by" "Michael Ende"
3. "Momo" "book by" "Michael Ende"
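A minimal sketch of this step in Python (the pattern table and function names are hypothetical; they mirror the example above):

# Sketch: one web search query per natural-language realization
# (BOA pattern) of the predicate.

predicate_patterns = {":author": ["author", "written by", "book by"]}

def evidence_queries(s_label, p, o_label):
    return [f'"{s_label}" "{pat}" "{o_label}"' for pat in predicate_patterns[p]]

for q in evidence_queries("Momo", ":author", "Michael Ende"):
    print(q)
# "Momo" "author" "Michael Ende"
# "Momo" "written by" "Michael Ende"
# "Momo" "book by" "Michael Ende"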
Justification Gap: Proof Scoring
Combination of features including:
1. Score of BOA pattern
2. Token distance
3. Total occurrence of resource labels
4. Similarity to title
Justification Gap: Trustworthiness
Combination of features including:
1. Topic majority on the Web
2. Topic majority in results
3. Topic terms
4. PageRank
Justification Gap: Fact Confirmation
Combination of features including:
1. Combined trustworthiness and proof score
2. Number of proofs
3. Total hit count
4. Domain/range check
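A sketch of how such features could be combined into one confirmation score (hand-set weights for illustration only; DeFacto instead trains classifiers such as J48 over these features, as the evaluation below shows):

# Sketch: weighted combination of the fact-confirmation features
# (weights are assumptions, not DeFacto's learned model).

WEIGHTS = {
    "trust_and_proof": 0.5,   # combined trustworthiness and proof score
    "num_proofs": 0.2,        # normalized number of proofs
    "hit_count": 0.2,         # normalized total hit count
    "domain_range": 0.1,      # 1.0 if domain/range restrictions hold
}

def confirmation_score(features):
    return sum(WEIGHTS[name] * value for name, value in features.items())

fact = {"trust_and_proof": 0.8, "num_proofs": 0.6, "hit_count": 0.7, "domain_range": 1.0}
print(confirmation_score(fact))   # 0.76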
Justification Gap: Evaluation
10 triples per property
Top-60 most used properties
473 of 600 triples manually verified to be true
Justification Gap: Evaluation
J48 is the overall best classifier (78.8% to 87.6%)
Easiest data set: random
Hardest data set: mixed
Summary
Language Gap
Semantic Gap
Justification Gap
Access Gap
Data Gap
Noise Gap
. . .
The End
Thank you! Questions?
Axel Ngonga
http://aksw.org/AxelNgonga
@NgongaAxel

AKSW Research Group
University of Leipzig, Germany
@akswgroup
The Semantics of the Semantic Web
• A priori: top-down semantics
  – Logical assertions
  – Crisp reuse of conceptualization
• In practice: hybrid bottom-up/top-down approach
  – (Human/software) agents are sloppy/ignorant
  – Agents do not agree (for various reasons)
  => Centralized view on a decentralized construct?
Semantic Grounding
The meaning of symbols can be explained by their semantic correspondences to other symbols alone ["Understanding Understanding", Rapaport 93]

• Type 1 semantics: understanding in terms of something else
  • Problem: how to ground semantics?
• Type 2 semantics: understanding something in terms of itself
  • "Syntactic semantics": grounding through recursive understanding
Emergent Semantics
• Semantics as a posteriori agreement on conceptualizations
  => Don't believe / enforce the schema!
• Semantics of symbols as recursive correspondences to other symbols
  • Analyzing transitive closures of mappings
• Self-organizing, bottom-up approach
  • Global semantics (stable states) emerging from multiple local interactions
• Syntactic semantics
  • Studying semantics from a syntactic perspective
3 Concrete Examples
1. Emergence of semantic interoperability
2. Entity disambiguation using same-as networks
3. A posteriori schema for LOD properties
Semantic Connectivity

• How many links do you need to make a semantic network interoperable?
• Semantic interoperability as an emergent property!
  ⇒ Connectivity indicator: c_i = Σ_{j,k} (j·k − j·(b_c + c_c) − k) · p_{jk}
  ⇒ Necessary condition for semantic interoperability in the large: c_i ≥ 0

Philippe Cudré-Mauroux, Karl Aberer: A Necessary Condition for Semantic Interoperability in the Large. CoopIS/DOA/ODBASE 2004: 859-872.
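A sketch of evaluating the indicator as reconstructed above from the joint degree distribution p_jk of the mapping network (the distribution and the constants b_c, c_c are illustrative values, not figures from the paper):

# Sketch: connectivity indicator over a joint degree distribution
# (formula as reconstructed above; all values illustrative).

def connectivity_indicator(p, b_c, c_c):
    return sum((j * k - j * (b_c + c_c) - k) * p_jk
               for (j, k), p_jk in p.items())

p = {(1, 1): 0.3, (2, 2): 0.5, (3, 3): 0.2}   # illustrative p_jk
ci = connectivity_indicator(p, b_c=0.4, c_c=0.3)
print(ci, "interoperable in the large?", ci >= 0)   # 0.87 True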
Graph-Based Disambiguation
• The great thing about unique identifiers is that there are so many to choose from
  – URI jungle
  – Disambiguation based on transitive closures on equality links

Philippe Cudré-Mauroux, Parisa Haghani, Michael Jost, Karl Aberer, Hermann de Meer: idMesh: Graph-Based Disambiguation of Linked Data. WWW 2009: 591-600.
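A minimal sketch of the transitive-closure step with union-find (identifiers are made up; idMesh additionally weighs link trustworthiness, which this sketch omits):

# Sketch: cluster identifiers by the transitive closure of
# equality (sameAs) links.

from collections import defaultdict

parent = {}

def find(x):
    parent.setdefault(x, x)
    while parent[x] != x:
        parent[x] = parent[parent[x]]   # path halving
        x = parent[x]
    return x

def union(x, y):
    parent[find(x)] = find(y)

same_as = [("dbp:Bern", "fb:bern"), ("fb:bern", "geo:2661552")]
for a, b in same_as:
    union(a, b)

clusters = defaultdict(set)
for node in list(parent):
    clusters[find(node)].add(node)
print(list(clusters.values()))   # [{'dbp:Bern', 'fb:bern', 'geo:2661552'}]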
A Posteriori Schema
• Instance data use schema constructs in creative ways!
  ⇒ Retro-engineering of schema constructs based on the deployment of instance data
  ⇒ Context-dependent, retro-compatible

Alberto Tonon, Michele Catasta, Gianluca Demartini, Philippe Cudré-Mauroux: Fixing the Domain and Range of Properties in Linked Data by Context Disambiguation. LDOW 2015.
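A toy sketch of the retro-engineering idea: infer a property's domain and range from the types actually observed on its subjects and objects (data is illustrative; the LDOW 2015 approach adds context disambiguation on top of such counts):

# Sketch: a posteriori domain/range from instance data.

from collections import Counter

types = {
    "dbr:Google": ["dbo:Company"],
    "dbr:YouTube": ["dbo:Company", "dbo:Website"],
}
triples = [("dbr:Google", "dbo:subsidiary", "dbr:YouTube")]

domain, range_ = Counter(), Counter()
for s, p, o in triples:
    if p == "dbo:subsidiary":
        domain.update(types.get(s, []))
        range_.update(types.get(o, []))

print(domain.most_common(1), range_.most_common(1))
# [('dbo:Company', 1)] [('dbo:Company', 1)]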
Research Directions

• Tons of research opportunities in this field
• Understanding the emergent properties of LOD networks (and how to exploit them)
• Analyzing the deployment / use of semantic data (a priori vs. a posteriori views)
• Capturing user disagreement (e.g., multi-view ontologies, fuzzy ontologies, results diversification)
Opportunity: The 3-Vs of Big Data

• Volume: amount of data
• Velocity: speed of data in and out
• Variety: range of data types and sources

[Gartner 2012] "Big Data are high-volume, high-velocity, and/or high-variety information assets that require new forms of processing to enable enhanced decision making, insight discovery and process optimization"
Information Management
• The story so far: strict separation between unstructured and structured data management infrastructures

[Figure: two silos, a DBMS accessed via SQL over JDBC, and an inverted index accessed via keywords over HTTP]
Information Integration
• Information integration is still one of the biggest CS problems out there (according to many, e.g., Gartner)
• Information integration typically requires some sort of mediation
  1. Unstructured data: keywords, synsets
  2. Structured data: global schema, transitive closure of schemas (mostly syntactic)
⇒ Nightmarish if 1 and 2 are taken separately, a horror marathon if considered together
Entities as Mediation

• Rising paradigm
  – Store information at the entity granularity
  – Integrate information by inter-linking entities
• Advantages?
  – Coarser granularity compared to keywords
    • More natural, e.g., the brain functions similarly (or is it the other way around?)
  – Denormalized information compared to RDBMSs
    • Schema-later, heterogeneity, sparsity
    • Pre-computed joins, "semantic" linking
• Drawbacks?
Exposing Textual Data
• The XI Pipeline: Mention Extraction → NER → Entity Linking → Entity Typing
• Runs on massive amounts of data (Spark)
Named Entity Recognition (NER)
[Pipeline figure: text extraction (Apache Tika) → list of extracted n-grams → n-gram indexing and frequency reweighting → candidate selection → feature extraction per selected n-gram (POS tagging, lemmatization, n+1-gram merging) → supervised classifier → ranked list of n-grams]

Roman Prokofyev, Gianluca Demartini, Philippe Cudré-Mauroux: Effective Named Entity Recognition for Idiosyncratic Web Collections. WWW 2014: 397-408.
Entity Linking
• Linking entities to text is an old problem...
  – ... and is extremely hard, esp. for machines
• Dozens of approaches have been suggested
• What if
  – We want to combine approaches / frameworks?
  – We want to leverage both human computations & algorithms?

ZenCrowd

• Integrate textual data w/ the Web of Data
• Uses sets of algorithmic matchers to match entities to online concepts
• Uses dynamic templating to create micro-matching-tasks and publish them on MTurk
• Combines both algorithmic and human matchers using probabilistic networks

Gianluca Demartini, Djellel Eddine Difallah, Philippe Cudré-Mauroux: ZenCrowd: Leveraging Probabilistic Reasoning and Crowdsourcing Techniques for Large-Scale Entity Linking. WWW 2012: 469-478.
ZenCrowd Architecture
[Figure: ZenCrowd architecture; HTML and HTML+RDFa pages feed entity extractors; algorithmic matchers look up candidate entities in a LOD index built over the LOD Open Data Cloud; a micro-task manager publishes micro matching tasks on a crowdsourcing platform; worker decisions and algorithmic matches are combined by a probabilistic network and decision engine to produce the output]
Probabilistic Inference
• Probabilistic network to integrate a priori & a posteriori information
  – Agreement of good turkers & algorithms
• Learning process
  – Constraints
    • Unicity
    • Equality (SameAs)
  – Giant probabilistic graph
    • Instantiated selectively
[Figure: example probabilistic network with worker nodes w1, w2 and worker priors pw( ), link nodes l1, l2, l3 with link priors pl( ) and link factors lf( ), observed worker answers c11 ... c23, and same-as / unicity constraint factors sa1-2( ), u2-3( )]
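A toy sketch of the underlying intuition: fuse an algorithmic matcher's confidence with crowd votes weighted by estimated worker reliability. This is a naive-Bayes simplification, not ZenCrowd's full network with unicity and same-as constraints; all numbers are illustrative:

# Sketch: naive-Bayes fusion of matcher confidence and crowd votes.

def fuse(prior, votes):
    """prior: matcher confidence; votes: (worker_reliability, said_yes) pairs."""
    p_yes, p_no = prior, 1.0 - prior
    for reliability, said_yes in votes:
        p_yes *= reliability if said_yes else 1.0 - reliability
        p_no *= 1.0 - reliability if said_yes else reliability
    return p_yes / (p_yes + p_no)

# two reliable workers confirm the link, one borderline worker disagrees
print(fuse(0.6, [(0.9, True), (0.8, True), (0.55, False)]))   # ~0.98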
Does it Work?
• Improves avg. precision by 0.14 on average!
  – Minimal crowd involvement
  – Embarrassingly parallel problem

[Plots: worker precision vs. number of tasks for US and IN workers, with the top US worker highlighted; precision (0.6 to 0.8) vs. top-K workers (K = 1 to 9)]
Entity Typing
• Entities can have many types (facets)
• Which fine-grained types are most relevant given the context?

[Figure: type graph for an example entity, from owl:Thing via Agent and Person down to fine-grained types such as Living People, American Billionaires, People from King County, People from Seattle, Windows People, American People of Scottish Descent, Harvard University People, American Computer Programmers, American Philanthropists]
TRank

• Fine-grained typing
  – Tree of 447,260 types
  – Rooted on <owl:Thing>
  – Depth of 19
• Ranks relevant types by analyzing the context
  – Textual context
  – Graph context
  – Decision trees
  – Linear regression

Alberto Tonon, Michele Catasta, Gianluca Demartini, Philippe Cudré-Mauroux, Karl Aberer: TRank: Ranking Entity Types Using the Web of Data. ISWC 2013: 640-656.
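A toy sketch of one textual-context signal: score candidate types by word overlap between the type label and the surrounding text (crude plural-stripping normalization; the full system combines such signals with graph context and learned rankers):

# Sketch: rank candidate types by overlap with the textual context.

def context_score(type_label, context_words):
    words = {w.rstrip("s").lower() for w in type_label.split()}
    return len(words & context_words) / len(words)

context = "the American billionaire and philanthropist spoke in Seattle"
ctx_words = {w.rstrip("s").lower() for w in context.split()}

candidates = ["American Billionaires", "Windows People", "People from Seattle"]
for t in sorted(candidates, key=lambda t: context_score(t, ctx_words), reverse=True):
    print(t, context_score(t, ctx_words))
# American Billionaires 1.0 / People from Seattle 0.33 / Windows People 0.0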
Exposing Relational Data

• A mapping language file describes the relation between the ontology and the RDB
• The server provides HTML and linked data views and a SPARQL 1.1 endpoint
• A rewriting engine uses the mappings to rewrite Jena & Sesame API calls to SQL queries and generates RDF dumps in various formats

http://d2rq.org/, http://aksw.org/Projects/Sparqlify.html, etc.
Exposing Webtables

• Wealth of data in (HTML) tables
• Yet another type of content to expose

Sreeram Balakrishnan, Alon Y. Halevy, Boulos Harb, Hongrae Lee, Jayant Madhavan, Afshin Rostamizadeh, Warren Shen, Kenneth Wilder, Fei Wu, Cong Yu: Applying WebTables in Practice. CIDR 2015.
Cui Tao, David W. Embley: Automatic Hidden-Web Table Interpretation, Conceptualization, and Semantic Annotation. Data & Knowledge Engineering 68.7 (2009): 683-703.
Application 1: Enterprise Search
• How can end-users reach entities?
  ⇒ Structured search
  ⇒ Keyword search
• On their names or attributes
  – Obviously not ideal
    • BM25 on TREC 2011 AOR: MAP = 0.15, P@10 = 0.20
    • Query expansion, query completion or pseudo-relevance feedback yield comparable (or worse) results
Hybrid Entity Search
[Figure: example entity graph; the movie The Descendants (title "The Descendants") is connected via playsIn edges to George Clooney (name "George Clooney", dateOfBirth May 6, 1961) and Shailene Woodley (name "Shailene Woodley", dateOfBirth Nov. 15, 1991)]

• Main idea: combine unstructured and structured search
  – Inverted index to locate first candidates
  – Graph queries to refine the results
    • Graph traversals (queries on object properties)
    • Graph neighborhoods (queries on datatype properties)
Architecture

[Figure: the user's keyword query is annotated and expanded (using WordNet, pseudo-relevance feedback, and 3rd-party search engines); a structured inverted index over the LOD cloud returns intermediate top-k results; graph traversals (queries on object properties) and graph neighborhoods (queries on datatype properties) against an RDF store enrich these results; ranking functions compute the final graph-enriched ranking]

Alberto Tonon, Gianluca Demartini, Philippe Cudré-Mauroux: Combining Inverted Indices and Structured Search for Ad-hoc Object Retrieval. SIGIR 2012: 125-134.
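A toy sketch of the hybrid idea: the inverted index proposes candidates with text scores, then graph evidence from the RDF store boosts candidates connected to entities recognized in the query (entities, scores, and the boost are all illustrative, not the SIGIR 2012 ranking functions):

# Sketch: inverted-index candidates re-ranked with graph evidence.

candidates = {                        # intermediate top-k, text scores
    "dbr:The_Descendants_(novel)": 1.9,
    "dbr:The_Descendants_(film)": 1.8,
}
graph = {("dbr:The_Descendants_(film)", "dbo:starring", "dbr:George_Clooney")}

def rerank(query_entity, boost=0.5):
    scores = dict(candidates)
    for s, p, o in graph:
        if s in scores and o == query_entity:
            scores[s] += boost        # graph-traversal evidence
    return sorted(scores, key=scores.get, reverse=True)

print(rerank("dbr:George_Clooney"))   # the film now ranks first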
Application 3: Co-Reference Resolution
• Better co-reference resolution through the knowledge base
Barack Obama called Angela Merkel last week; the president asked the chancellor whether...

Roman Prokofyev, Alberto Tonon, Michael Luggen, Loic Vouilloz, Djellel Eddine Difallah, Philippe Cudré-Mauroux: SANAPHOR: Ontology-Based Coreference Resolution. ISWC 2015.
Research Opportunities

• NER in vertical domains
• Crowdsourcing parts of the processing
• Predicate extraction
• Summarization
• Exposing further types of content
• Updates / transactions
• Parallelization
• Higher-level applications