SSSW 2015 Bertinoro–Italy July 10, 2015
Sense Making
Axel-Cyrille Ngonga Ngomo &
Philippe Cudré-Mauroux
On Making Sense
• ½ of Computer Science is about making sense of some input data
  – KDD (cf. Claudia & Laura tutorial)
  – NLP (cf. Roberto's talk)
  – Multimedia Analysis
  – Social Media / Big Data Analytics
  – Visualization
  – etc.
On the Menu Today
• Making Sense of Semantic Data
  – Making sense of SPARQL & Semantic Web predicates
  – Trust on Semantic Web data
  – Emergent Semantics
• Leveraging Semantic Data for Sense Making
  – Making sense of textual entities
  – Making sense of relational data
  – Making sense of webtables
Introduction
"At some point in the early twenty-first century, all of mankind was united in celebration. We marveled at our own magnificence as we gave birth to AI."
– Morpheus, The Matrix
Linked Data Web
Sense Making
Helping end users to make sense of the Semantic Web.
Gaps
Language Gap
Semantic Web speaks languages that normal users do not understand
Language Gap
Problem
What does it mean?
PREFIX dbo: <http://dbpedia.org/ontology/>
PREFIX res: <http://dbpedia.org/resource/>
SELECT DISTINCT ?person WHERE {
  ?person dbo:team ?sportsTeam .
  ?sportsTeam dbo:league res:Premier_League .
  ?person dbo:birthDate ?date .
  ?person dbo:birthPlace ?place .
  { ?place dbo:locatedIn res:Africa . }
  UNION
  { ?place dbo:locatedIn res:Asia . }
}
ORDER BY DESC(?date)
OFFSET 0 LIMIT 1

⇒ Give me the youngest person who plays in a Premier League team and was born in Africa or Asia.
Language Gap
Solution
Verbalization frameworks for the Semantic Web
Document planner → Microplanner → Realizer

http://github.com/AKSW/SemWeb2NL
Language Gap: Triple2NL/BGP2NL
Approach
1. ρ(s p o) ⇒ poss(ρ(p), ρ(s)) ∧ subj(BE, ρ(p)) ∧ dobj(BE, ρ(o))
2. ρ(s p o) ⇒ subj(ρ(p), ρ(s)) ∧ dobj(ρ(p), ρ(o))

1. :Momo :author :Ende ⇒ Momo's author is Michael Ende.
2. ?x :author :Ende ⇒ ?x's author is Michael Ende.
3. :Momo :writtenBy :Ende ⇒ Momo was written by Michael Ende.
4. ?x :writtenBy :Ende ⇒ ?x was written by Michael Ende.
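As a rough illustration, a minimal Python sketch of the two rules (the lexicon and helper names are hypothetical; this is not the SemWeb2NL API):

# Minimal sketch of the two Triple2NL rules (hypothetical lexicon
# and helpers; not the actual SemWeb2NL implementation).

LEXICON = {
    ":Momo": "Momo",
    ":Ende": "Michael Ende",
    ":author": ("noun", "author"),         # nominal predicate -> Rule 1
    ":writtenBy": ("verb", "written by"),  # verbal predicate  -> Rule 2
}

def rho(term):
    """Realize a resource or variable as natural language."""
    entry = LEXICON.get(term, term)        # variables like ?x pass through
    return entry[1] if isinstance(entry, tuple) else entry

def verbalize(s, p, o):
    kind = LEXICON.get(p, ("verb",))[0]
    if kind == "noun":
        # Rule 1: poss(rho(p), rho(s)) ∧ subj(BE, rho(p)) ∧ dobj(BE, rho(o))
        return f"{rho(s)}'s {rho(p)} is {rho(o)}."
    # Rule 2: subj(rho(p), rho(s)) ∧ dobj(rho(p), rho(o)), realized passively here
    return f"{rho(s)} was {rho(p)} {rho(o)}."

print(verbalize(":Momo", ":author", ":Ende"))   # Momo's author is Michael Ende.
print(verbalize("?x", ":writtenBy", ":Ende"))   # ?x was written by Michael Ende.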
Language Gap: SPARQL2NL/RDF2NL
Approach
Combination rules
1. ρ((s, p, o1) . (s, p, o2)) ⇒ poss(ρ(p), ρ(s)) ∧ subj(BE, ρ(p)) ∧ dobj(BE, cc(ρ(o1), ρ(o2)))

?x's author is Paul Erdos and ?x's author is Kevin Bacon.
⇒ ?x's authors are Paul Erdos and Kevin Bacon.
Language Gap: SPARQL2NL/RDF2NL
?place is Shakespeare's birth place or ?place is Shakespeare's death place.
⇒ ?place is Shakespeare's birth or death place.

This query retrieves values ?height such that ?height is Claudia Schiffer's height.
⇒ This query retrieves Claudia Schiffer's height.

?person's team is ?sportsTeam. ?person's birth date is ?date. ?sportsTeam's league is Premier League.
⇒ ?person's team is ?sportsTeam, ?person's birth date is ?date, and ?sportsTeam's league is Premier League.
Language Gap: Evaluation
125 participants, 49 SPARQL experts, 3 tasks
94% of verbalizations were understandable
5.31 ± 1.08 average adequacy score

Figure: Adequacy and fluency results in survey (ratings 1 to 6 by number of survey answers)
Language Gap: Evaluation
125 participants, 49 SPARQL experts, 3 tasks
Slightly larger error with NL for experts
Non-experts were enabled to understand the meaning of the queries

Figure: Error rate over the three tasks (SPARQL vs. NL vs. NL for SPARQL experts)
Language Gap: Evaluation
125 participants, 49 SPARQL experts, 3 tasks
Non-experts faster with NL than experts with SPARQL
Experts faster with NL than with SPARQL

Figure: Average time needed (SPARQL and NL, filtered and unfiltered; purple = standard deviation)
Language Gap: Challenges
Complex queries
Sacrifice adequacy for fluency
Other languages
Hybrid approach
Personalization
Gaps
Semantic Gap
Decentralized content generation
Contextualization mismatch
Semantic Gap
Problem
How do I communicate with it?
Semantic Gap
Solution
Question Answering Systems
Example:
Where did Abraham Lincoln die?
SELECT ?x WHERE {
  res:Abraham_Lincoln dbo:deathPlace ?x .
}

PowerAqua:
Triple representation: ⟨state/place, die, Abraham Lincoln⟩
Ontology mappings: ⟨Place, deathPlace, Abraham Lincoln⟩
Semantic Gap: Mismatch
Triples do not always provide a faithful representation of the semantic structure of the question
Thus more expressive queries cannot be answered

Example 1:
Which cities have more than three universities?

SELECT ?y WHERE {
  ?x rdf:type dbo:University .
  ?x dbo:city ?y .
}
GROUP BY ?y
HAVING (COUNT(?x) > 3)

Triple representation: ⟨cities, more than, universities three⟩
Semantic Gap: Mismatch
Example 2:
Who produced the most films?

SELECT ?y WHERE {
  ?x rdf:type dbo:Film .
  ?x dbo:producer ?y .
}
GROUP BY ?y
ORDER BY DESC(COUNT(?x)) LIMIT 1

Triple representation: ⟨person/organization, produced, most films⟩
Semantic Gap: Approach
To understand a user question, we need to understand:
The words
Abraham Lincoln → res:Abraham_Lincoln
died in → dbo:deathPlace

The semantic structure
the most N → ORDER BY DESC(COUNT(?n)) LIMIT 1
more than three N → HAVING (COUNT(?n) > 3)

Template-Based Question Answering:
1. Template generation: understanding the semantic structure
2. Template instantiation: understanding the words
Semantic Gap: Example
Query: Who produced the most films?
1. SPARQL template:

SELECT ?x WHERE {
  ?y rdf:type ?c .
  ?y ?p ?x .
}
GROUP BY ?x
ORDER BY DESC(COUNT(?y)) LIMIT 1

?c CLASS [films]
?p PROPERTY [produced]

2. Instantiations:

?c = <http://dbpedia.org/ontology/Film>
?p = <http://dbpedia.org/ontology/producer>
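A minimal sketch of the instantiation step in Python, filling the template's slots with the top-ranked URIs (the template string and slot names follow the slide; the code itself is illustrative, not TBSL's implementation):

# Sketch: filling a SPARQL template's slots with the top-ranked
# URI candidates (illustrative; not TBSL's actual code).

TEMPLATE = """SELECT ?x WHERE {{
  ?y rdf:type {c} .
  ?y {p} ?x .
}}
GROUP BY ?x
ORDER BY DESC(COUNT(?y)) LIMIT 1"""

slots = {
    "c": "<http://dbpedia.org/ontology/Film>",      # CLASS [films]
    "p": "<http://dbpedia.org/ontology/producer>",  # PROPERTY [produced]
}

print(TEMPLATE.format(**slots))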
Semantic Gap: Architecture
[Figure: the natural language question is POS-tagged and parsed using a domain-independent and a domain-dependent lexicon into a semantic representation, which is translated into SPARQL templates; entity identification against resources and classes, and against the BOA pattern library for properties, fills the URI slots; type checking and prominence yield ranked SPARQL queries; the selected query is run against a SPARQL endpoint over the LOD cloud to produce the answer]
Semantic Gap: Template Generation

1. The natural language question is tagged with part-of-speech information.
2. Based on POS tags, lexical entries are built on the fly.
3. These lexical entries, together with domain-independent lexical entries, are used for parsing the question.
4. The resulting semantic representation is translated into a SPARQL template.
Semantic Gap: Who produced the most films?
domain-independent: who, the most
domain-dependent: produced/VBD, films/NNS

SPARQL template 1:

SELECT ?x WHERE {
  ?x ?p ?y .
  ?y rdf:type ?c .
}
GROUP BY ?x
ORDER BY DESC(COUNT(?y)) LIMIT 1

?c CLASS [films]
?p PROPERTY [produced]
Semantic Gap: Who produced the most films?
SPARQL template 2:

SELECT ?x WHERE {
  ?x ?p ?y .
}
GROUP BY ?x
ORDER BY DESC(COUNT(?y)) LIMIT 1

?p PROPERTY [films]
Semantic Gap: Template Instantiation

1. For resources and classes: identify synonyms of the label using WordNet; retrieve entities with a label similar to the slot label based on string similarities (trigram, Levenshtein, substring).
2. For property labels, the label is additionally compared to natural language expressions stored in the BOA pattern library.
3. The highest-ranking entities are returned as candidates for filling the query slots.
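A toy sketch of the string-similarity part of step 1, using trigram similarity only (the label index and slot label are illustrative; Levenshtein and substring similarity are omitted):

# Sketch: rank label-index entries against a slot label by trigram
# similarity (one of the three measures named above).

def trigrams(s):
    s = f"  {s.lower()} "                  # pad so short strings work
    return {s[i:i + 3] for i in range(len(s) - 2)}

def trigram_sim(a, b):
    ta, tb = trigrams(a), trigrams(b)
    return len(ta & tb) / len(ta | tb)

labels = {                                 # illustrative label index
    "http://dbpedia.org/ontology/Film": "film",
    "http://dbpedia.org/ontology/FilmFestival": "film festival",
}

slot = "films"                             # possibly WordNet-expanded first
ranked = sorted(labels, key=lambda u: trigram_sim(slot, labels[u]), reverse=True)
print(ranked[0])                           # .../ontology/Film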
BOA
The BOA pattern library is a repository of natural language representations of Semantic Web predicates.

Idea:
1. For each predicate P in a data repository (e.g. DBpedia), collect the set of entities S and O connected through P.
2. Search a text corpus (e.g. Wikipedia) for all sentences containing the labels of S and O.
3. For all retrieved sentences, the natural language predicate is a potential pattern for P. The potential patterns are then scored by a neural network (e.g. according to frequency) and filtered.
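A toy sketch of the extraction step, run on the example from the next slide (data is illustrative; the scoring and filtering stages are omitted):

# Sketch of the BOA extraction idea: the text between the labels of a
# connected entity pair is a candidate pattern for the predicate.

pairs = [("Google", "Youtube")]   # (s, o) label pairs for dbo:subsidiary
corpus = [
    "Google's acquisition of Youtube comes as online video is starting to hit its stride.",
    "Youtube, a division of Google, is exploring a new way to get more clips on its site.",
]

patterns = []
for s_label, o_label in pairs:
    for sentence in corpus:
        i, j = sentence.find(s_label), sentence.find(o_label)
        if i < 0 or j < 0:
            continue                      # both labels must occur
        if i < j:                         # pattern of the form "S ... O"
            patterns.append("S " + sentence[i + len(s_label):j].strip() + " O")
        else:                             # pattern of the form "O ... S"
            patterns.append("O " + sentence[j + len(o_label):i].strip() + " S")

print(patterns)
# ["S 's acquisition of O", 'O , a division of S']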
BOA: Example
Predicate: http://dbpedia.org/ontology/subsidiary

RDF snippet:

<http://dbpedia.org/resource/Google>
    <http://dbpedia.org/ontology/subsidiary>
        <http://dbpedia.org/resource/YouTube> .
<http://dbpedia.org/resource/Google> rdfs:label "Google"@en .
<http://dbpedia.org/resource/YouTube> rdfs:label "Youtube"@en .

Sentences:
  Google's acquisition of Youtube comes as online video is really starting to hit its stride.
  Youtube, a division of Google, is exploring a new way to get more high-quality clips on its site: financing amateur video creators.

Patterns:
  subsidiary: S's acquisition of O
  subsidiary: O, a division of S
BOA
The use of BOA patterns allows us to match natural language expressions and ontology concepts even if they are not string-similar and not covered by WordNet.

Examples:
  married to → http://dbpedia.org/ontology/spouse
  was born in → http://dbpedia.org/ontology/birthPlace
  graduated from → http://dbpedia.org/ontology/almaMater
  write → http://dbpedia.org/ontology/author
Example: Who produced the most films?
Candidates for filling query slots:
?c CLASS [films]
<http://dbpedia.org/ontology/Film>
<http://dbpedia.org/ontology/FilmFestival>
. . .
?p PROPERTY [produced]
<http://dbpedia.org/ontology/producer>
<http://dbpedia.org/property/producer>
<http://dbpedia.org/ontology/wineProduced>
. . .
Semantic Gap: Query Ranking and Selection

1. Every entity receives a score considering string similarity and prominence.
2. The score of a query is then computed as the average of the scores of the entities used to fill its slots.
3. In addition, type checks are performed.
4. Of the remaining queries, the one with the highest score that returns a result is chosen to retrieve an answer.
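A minimal sketch of steps 1 and 2 (the weighting alpha and all scores are assumptions for illustration; the actual combination in the system may differ):

# Sketch: entities scored by combining string similarity and
# prominence; a query scored as the average of its slot fillers.

def entity_score(similarity, prominence, alpha=0.6):
    # alpha is an assumed weighting, not the system's actual one
    return alpha * similarity + (1 - alpha) * prominence

def query_score(slot_fillers):
    return sum(slot_fillers.values()) / len(slot_fillers)

q1 = {"dbo:producer": entity_score(0.9, 0.8), "dbo:Film": entity_score(0.6, 0.9)}
q2 = {"dbo:producer": entity_score(0.9, 0.8), "dbo:FilmFestival": entity_score(0.5, 0.2)}
print(query_score(q1), query_score(q2))   # the first query ranks higher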
Example: Who produced the most films?
SELECT ?x WHERE {
  ?x <http://dbpedia.org/ontology/producer> ?y .
  ?y rdf:type <http://dbpedia.org/ontology/Film> .
}
GROUP BY ?x
ORDER BY DESC(COUNT(?y)) LIMIT 1

Score: 0.759

SELECT ?x WHERE {
  ?x <http://dbpedia.org/ontology/film> ?y .
}
GROUP BY ?x
ORDER BY DESC(COUNT(?y)) LIMIT 1

Score: 0.626

SELECT ?x WHERE {
  ?x <http://dbpedia.org/ontology/producer> ?y .
  ?y rdf:type <http://dbpedia.org/ontology/FilmFestival> .
}
GROUP BY ?x
ORDER BY DESC(COUNT(?y)) LIMIT 1

Score: 0.601
Evaluation Setup
Question set: 39 DBpedia training questions from QALD-1
  5 could not be parsed due to unknown syntactic constructions or uncovered domain-independent expressions
  19 were answered exactly as required by the benchmark (precision and recall 1.0)
  Another 2 were answered almost correctly (precision and recall greater than 0.8)

Mean precision: 0.61
Mean recall: 0.63
F-measure: 0.62
Main Sources of Error
Incorrect templates
  Template structure does not coincide with the structure of the data:
  When did Germany join the EU?
  res:Germany dbp:accessioneudate ?x .

Predicate detection fails
  inhabitants ↛ dbp:population, dbp:populationTotal
  owns ↛ dbo:keyPerson
  higher ↛ dbp:elevationM

Wrong query is selected
  Who wrote The Pillars of the Earth?
  res:The_Pillars_of_the_Earth_(TV_Miniseries) dbo:writer ?x .
  res:The_Pillars_of_the_Earth dbo:author ?x .
Semantic Gap: Challenges
Schema-agnostic QA
Query Ranking
Relation Extraction
Ontology Lexicalization
Extraction of surface forms
Justification Gap
Problem
Are you sure? Prove it to me.
Justification Gap
Solution
Gathering natural-language evidence?
http://aksw.org/Projects/defacto
Justification Gap: Automatic Query Generation
[Figure: web search queries generated automatically from the natural-language evidence of the input triple]
Justification Gap: Evidence Generation
(s, p, o) ⇒ "ρ(s)" "ρ(p)" "ρ(o)"

:Momo :author :Ende
1. "Momo" "author" "Michael Ende"
2. "Momo" "written by" "Michael Ende"
3. "Momo" "book by" "Michael Ende"
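A minimal sketch of this step in Python (the pattern table and function names are hypothetical; they mirror the example above):

# Sketch: one web search query per natural-language realization
# (BOA pattern) of the predicate.

predicate_patterns = {":author": ["author", "written by", "book by"]}

def evidence_queries(s_label, p, o_label):
    return [f'"{s_label}" "{pat}" "{o_label}"' for pat in predicate_patterns[p]]

for q in evidence_queries("Momo", ":author", "Michael Ende"):
    print(q)
# "Momo" "author" "Michael Ende"
# "Momo" "written by" "Michael Ende"
# "Momo" "book by" "Michael Ende"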
Justification Gap: Proof Scoring
Combination of features including:
1. Score of BOA pattern
2. Token distance
3. Total occurrence of resource labels
4. Similarity to title
Justification Gap: Trustworthiness
Combination of features including:
1. Topic majority on the Web
2. Topic majority in results
3. Topic terms
4. PageRank
Justification Gap: Fact Confirmation
Combination of features including:
1. Combined trustworthiness and proof score
2. Number of proofs
3. Total hit count
4. Domain/range check
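A sketch of how such features could be combined into one confirmation score (hand-set weights for illustration only; DeFacto instead trains classifiers such as J48 over these features, as the evaluation below shows):

# Sketch: weighted combination of the fact-confirmation features
# (weights are assumptions, not DeFacto's learned model).

WEIGHTS = {
    "trust_and_proof": 0.5,   # combined trustworthiness and proof score
    "num_proofs": 0.2,        # normalized number of proofs
    "hit_count": 0.2,         # normalized total hit count
    "domain_range": 0.1,      # 1.0 if domain/range restrictions hold
}

def confirmation_score(features):
    return sum(WEIGHTS[name] * value for name, value in features.items())

fact = {"trust_and_proof": 0.8, "num_proofs": 0.6, "hit_count": 0.7, "domain_range": 1.0}
print(confirmation_score(fact))   # 0.76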
Justification Gap: Evaluation
10 triples per property
Top-60 most used properties
473 of 600 triples manually verified to be true
Justification Gap: Evaluation
J48 is the overall best classifier (78.8% to 87.6%)
Easiest data set: random
Hardest data set: mixed
Summary
Language Gap
Semantic Gap
Justification Gap
Access Gap
Data Gap
Noise Gap
. . .
The End
Thank you! Questions?
Axel Ngonga
http://aksw.org/AxelNgonga
@NgongaAxel

AKSW Research Group
University of Leipzig, Germany
@akswgroup
The Semantics of the Semantic Web
• A priori: top-down semantics
  – Logical assertions
  – Crisp reuse of conceptualization
• In practice: hybrid bottom-up/top-down approach
  – (Human/software) agents are sloppy/ignorant
  – Agents do not agree (for various reasons)
  => Centralized view on a decentralized construct?
Semantic Grounding
The meaning of symbols can be explained by their semantic correspondences to other symbols alone ["Understanding Understanding", Rapaport 93]

• Type 1 semantics: understanding in terms of something else
  • Problem: how to ground semantics?
• Type 2 semantics: understanding something in terms of itself
  • "Syntactic semantics": grounding through recursive understanding
Emergent Semantics
• Semantics as a posteriori agreement on conceptualizations
  => Don't believe / enforce the schema!
• Semantics of symbols as recursive correspondences to other symbols
  • Analyzing transitive closures of mappings
• Self-organizing, bottom-up approach
  • Global semantics (stable states) emerging from multiple local interactions
• Syntactic semantics
  • Studying semantics from a syntactic perspective
3 Concrete Examples
1. Emergence of semantic interoperability
2. Entity disambiguation using same-as networks
3. A posteriori schema for LOD properties
Semantic Connectivity

• How many links do you need to make a semantic network interoperable?
• Semantic interoperability as an emergent property!
  ⇒ Connectivity indicator: c_i = Σ_{j,k} (j·k − j·(b_c + c_c) − k) · p_{jk}
  ⇒ Necessary condition for semantic interoperability in the large: c_i ≥ 0

Philippe Cudré-Mauroux, Karl Aberer: A Necessary Condition for Semantic Interoperability in the Large. CoopIS/DOA/ODBASE 2004: 859-872.
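A sketch of evaluating the indicator as reconstructed above from the joint degree distribution p_jk of the mapping network (the distribution and the constants b_c, c_c are illustrative values, not figures from the paper):

# Sketch: connectivity indicator over a joint degree distribution
# (formula as reconstructed above; all values illustrative).

def connectivity_indicator(p, b_c, c_c):
    return sum((j * k - j * (b_c + c_c) - k) * p_jk
               for (j, k), p_jk in p.items())

p = {(1, 1): 0.3, (2, 2): 0.5, (3, 3): 0.2}   # illustrative p_jk
ci = connectivity_indicator(p, b_c=0.4, c_c=0.3)
print(ci, "interoperable in the large?", ci >= 0)   # 0.87 True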
Graph-Based Disambiguation
• The great thing about unique identifiers is that there are so many to choose from
  – URI jungle
  – Disambiguation based on transitive closures on equality links

Philippe Cudré-Mauroux, Parisa Haghani, Michael Jost, Karl Aberer, Hermann de Meer: idMesh: Graph-Based Disambiguation of Linked Data. WWW 2009: 591-600.
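A minimal sketch of the transitive-closure step with union-find (identifiers are made up; idMesh additionally weighs link trustworthiness, which this sketch omits):

# Sketch: cluster identifiers by the transitive closure of
# equality (sameAs) links.

from collections import defaultdict

parent = {}

def find(x):
    parent.setdefault(x, x)
    while parent[x] != x:
        parent[x] = parent[parent[x]]   # path halving
        x = parent[x]
    return x

def union(x, y):
    parent[find(x)] = find(y)

same_as = [("dbp:Bern", "fb:bern"), ("fb:bern", "geo:2661552")]
for a, b in same_as:
    union(a, b)

clusters = defaultdict(set)
for node in list(parent):
    clusters[find(node)].add(node)
print(list(clusters.values()))   # [{'dbp:Bern', 'fb:bern', 'geo:2661552'}]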
A Posteriori Schema
• Instance data use schema constructs in creative ways!
  ⇒ Retro-engineering of schema constructs based on the deployment of instance data
  ⇒ Context-dependent, retro-compatible

Alberto Tonon, Michele Catasta, Gianluca Demartini, Philippe Cudré-Mauroux: Fixing the Domain and Range of Properties in Linked Data by Context Disambiguation. LDOW 2015.
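A toy sketch of the retro-engineering idea: infer a property's domain and range from the types actually observed on its subjects and objects (data is illustrative; the LDOW 2015 approach adds context disambiguation on top of such counts):

# Sketch: a posteriori domain/range from instance data.

from collections import Counter

types = {
    "dbr:Google": ["dbo:Company"],
    "dbr:YouTube": ["dbo:Company", "dbo:Website"],
}
triples = [("dbr:Google", "dbo:subsidiary", "dbr:YouTube")]

domain, range_ = Counter(), Counter()
for s, p, o in triples:
    if p == "dbo:subsidiary":
        domain.update(types.get(s, []))
        range_.update(types.get(o, []))

print(domain.most_common(1), range_.most_common(1))
# [('dbo:Company', 1)] [('dbo:Company', 1)]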
Research Directions

• Tons of research opportunities in this field
• Understanding the emergent properties of LOD networks (and how to exploit them)
• Analyzing the deployment / use of semantic data (a priori vs. a posteriori views)
• Capturing user disagreement (e.g., multi-view ontologies, fuzzy ontologies, results diversification)
Opportunity: The 3-Vs of Big Data

• Volume: amount of data
• Velocity: speed of data in and out
• Variety: range of data types and sources

[Gartner 2012] "Big Data are high-volume, high-velocity, and/or high-variety information assets that require new forms of processing to enable enhanced decision making, insight discovery and process optimization"
Information Management
• The story so far: strict separation between unstructured and structured data management infrastructures

[Figure: two silos, a DBMS accessed via SQL over JDBC, and an inverted index accessed via keywords over HTTP]
Information Integration
• Information integration is still one of the biggest CS problems out there (according to many, e.g., Gartner)
• Information integration typically requires some sort of mediation
  1. Unstructured data: keywords, synsets
  2. Structured data: global schema, transitive closure of schemas (mostly syntactic)
⇒ Nightmarish if 1 and 2 are taken separately, a horror marathon if considered together
Entities as Mediation

• Rising paradigm
  – Store information at the entity granularity
  – Integrate information by inter-linking entities
• Advantages?
  – Coarser granularity compared to keywords
    • More natural, e.g., the brain functions similarly (or is it the other way around?)
  – Denormalized information compared to RDBMSs
    • Schema-later, heterogeneity, sparsity
    • Pre-computed joins, "semantic" linking
• Drawbacks?
Exposing Textual Data
• The XI Pipeline: Mention Extraction → NER → Entity Linking → Entity Typing
• Runs on massive amounts of data (Spark)
Named Entity Recognition (NER)
[Pipeline figure: text extraction (Apache Tika) → list of extracted n-grams → n-gram indexing and frequency reweighting → candidate selection → feature extraction per selected n-gram (POS tagging, lemmatization, n+1-gram merging) → supervised classifier → ranked list of n-grams]

Roman Prokofyev, Gianluca Demartini, Philippe Cudré-Mauroux: Effective Named Entity Recognition for Idiosyncratic Web Collections. WWW 2014: 397-408.
Entity Linking
• Linking entities to text is an old problem...
  – ... and is extremely hard, esp. for machines
• Dozens of approaches have been suggested
• What if
  – We want to combine approaches / frameworks?
  – We want to leverage both human computations & algorithms?

ZenCrowd

• Integrate textual data w/ the Web of Data
• Uses sets of algorithmic matchers to match entities to online concepts
• Uses dynamic templating to create micro-matching-tasks and publish them on MTurk
• Combines both algorithmic and human matchers using probabilistic networks

Gianluca Demartini, Djellel Eddine Difallah, Philippe Cudré-Mauroux: ZenCrowd: Leveraging Probabilistic Reasoning and Crowdsourcing Techniques for Large-Scale Entity Linking. WWW 2012: 469-478.
ZenCrowd Architecture
[Figure: ZenCrowd architecture; HTML and HTML+RDFa pages feed entity extractors; algorithmic matchers look up candidate entities in a LOD index built over the LOD Open Data Cloud; a micro-task manager publishes micro matching tasks on a crowdsourcing platform; worker decisions and algorithmic matches are combined by a probabilistic network and decision engine to produce the output]
Probabilistic Inference
• Probabilistic network to integrate a priori & a posteriori information
  – Agreement of good turkers & algorithms
• Learning process
  – Constraints
    • Unicity
    • Equality (SameAs)
  – Giant probabilistic graph
    • Instantiated selectively
[Figure: example probabilistic network with worker nodes w1, w2 and worker priors pw( ), link nodes l1, l2, l3 with link priors pl( ) and link factors lf( ), observed worker answers c11 ... c23, and same-as / unicity constraint factors sa1-2( ), u2-3( )]
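A toy sketch of the underlying intuition: fuse an algorithmic matcher's confidence with crowd votes weighted by estimated worker reliability. This is a naive-Bayes simplification, not ZenCrowd's full network with unicity and same-as constraints; all numbers are illustrative:

# Sketch: naive-Bayes fusion of matcher confidence and crowd votes.

def fuse(prior, votes):
    """prior: matcher confidence; votes: (worker_reliability, said_yes) pairs."""
    p_yes, p_no = prior, 1.0 - prior
    for reliability, said_yes in votes:
        p_yes *= reliability if said_yes else 1.0 - reliability
        p_no *= 1.0 - reliability if said_yes else reliability
    return p_yes / (p_yes + p_no)

# two reliable workers confirm the link, one borderline worker disagrees
print(fuse(0.6, [(0.9, True), (0.8, True), (0.55, False)]))   # ~0.98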
Does it Work?
• Improves avg. precision by 0.14 on average!
  – Minimal crowd involvement
  – Embarrassingly parallel problem

[Plots: worker precision vs. number of tasks for US and IN workers, with the top US worker highlighted; precision (0.6 to 0.8) vs. top-K workers (K = 1 to 9)]
Entity Typing
• Entities can have many types (facets)
• Which fine-grained types are most relevant given the context?

[Figure: type graph for an example entity, from owl:Thing via Agent and Person down to fine-grained types such as Living People, American Billionaires, People from King County, People from Seattle, Windows People, American People of Scottish Descent, Harvard University People, American Computer Programmers, American Philanthropists]
TRank

• Fine-grained typing
  – Tree of 447,260 types
  – Rooted on <owl:Thing>
  – Depth of 19
• Ranks relevant types by analyzing the context
  – Textual context
  – Graph context
  – Decision trees
  – Linear regression

Alberto Tonon, Michele Catasta, Gianluca Demartini, Philippe Cudré-Mauroux, Karl Aberer: TRank: Ranking Entity Types Using the Web of Data. ISWC 2013: 640-656.
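A toy sketch of one textual-context signal: score candidate types by word overlap between the type label and the surrounding text (crude plural-stripping normalization; the full system combines such signals with graph context and learned rankers):

# Sketch: rank candidate types by overlap with the textual context.

def context_score(type_label, context_words):
    words = {w.rstrip("s").lower() for w in type_label.split()}
    return len(words & context_words) / len(words)

context = "the American billionaire and philanthropist spoke in Seattle"
ctx_words = {w.rstrip("s").lower() for w in context.split()}

candidates = ["American Billionaires", "Windows People", "People from Seattle"]
for t in sorted(candidates, key=lambda t: context_score(t, ctx_words), reverse=True):
    print(t, context_score(t, ctx_words))
# American Billionaires 1.0 / People from Seattle 0.33 / Windows People 0.0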
Exposing Relational Data

• A mapping language file describes the relation between the ontology and the RDB
• The server provides HTML and linked data views and a SPARQL 1.1 endpoint
• A rewriting engine uses the mappings to rewrite Jena & Sesame API calls to SQL queries and generates RDF dumps in various formats

http://d2rq.org/, http://aksw.org/Projects/Sparqlify.html, etc.
Exposing Webtables

• Wealth of data in (HTML) tables
• Yet another type of content to expose

Sreeram Balakrishnan, Alon Y. Halevy, Boulos Harb, Hongrae Lee, Jayant Madhavan, Afshin Rostamizadeh, Warren Shen, Kenneth Wilder, Fei Wu, Cong Yu: Applying WebTables in Practice. CIDR 2015.
Cui Tao, David W. Embley: Automatic Hidden-Web Table Interpretation, Conceptualization, and Semantic Annotation. Data & Knowledge Engineering 68.7 (2009): 683-703.
Application 1: Enterprise Search
• How can end-users reach entities?
  ⇒ Structured search
  ⇒ Keyword search
• On their names or attributes
  – Obviously not ideal
    • BM25 on TREC 2011 AOR: MAP = 0.15, P@10 = 0.20
    • Query expansion, query completion or pseudo-relevance feedback yield comparable (or worse) results
Hybrid Entity Search
[Figure: example entity graph; the movie The Descendants (title "The Descendants") is connected via playsIn edges to George Clooney (name "George Clooney", dateOfBirth May 6, 1961) and Shailene Woodley (name "Shailene Woodley", dateOfBirth Nov. 15, 1991)]

• Main idea: combine unstructured and structured search
  – Inverted index to locate first candidates
  – Graph queries to refine the results
    • Graph traversals (queries on object properties)
    • Graph neighborhoods (queries on datatype properties)
Architecture

[Figure: the user's keyword query is annotated and expanded (using WordNet, pseudo-relevance feedback, and 3rd-party search engines); a structured inverted index over the LOD cloud returns intermediate top-k results; graph traversals (queries on object properties) and graph neighborhoods (queries on datatype properties) against an RDF store enrich these results; ranking functions compute the final graph-enriched ranking]

Alberto Tonon, Gianluca Demartini, Philippe Cudré-Mauroux: Combining Inverted Indices and Structured Search for Ad-hoc Object Retrieval. SIGIR 2012: 125-134.
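A toy sketch of the hybrid idea: the inverted index proposes candidates with text scores, then graph evidence from the RDF store boosts candidates connected to entities recognized in the query (entities, scores, and the boost are all illustrative, not the SIGIR 2012 ranking functions):

# Sketch: inverted-index candidates re-ranked with graph evidence.

candidates = {                        # intermediate top-k, text scores
    "dbr:The_Descendants_(novel)": 1.9,
    "dbr:The_Descendants_(film)": 1.8,
}
graph = {("dbr:The_Descendants_(film)", "dbo:starring", "dbr:George_Clooney")}

def rerank(query_entity, boost=0.5):
    scores = dict(candidates)
    for s, p, o in graph:
        if s in scores and o == query_entity:
            scores[s] += boost        # graph-traversal evidence
    return sorted(scores, key=scores.get, reverse=True)

print(rerank("dbr:George_Clooney"))   # the film now ranks first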
Application 3: Co-Reference Resolution
• Better co-reference resolution through the knowledge base
Barack Obama called Angela Merkel last week; the president asked the chancellor whether...

Roman Prokofyev, Alberto Tonon, Michael Luggen, Loic Vouilloz, Djellel Eddine Difallah, Philippe Cudré-Mauroux: SANAPHOR: Ontology-Based Coreference Resolution. ISWC 2015.
Research Opportunities

• NER in vertical domains
• Crowdsourcing parts of the processing
• Predicate extraction
• Summarization
• Exposing further types of content
• Updates / transactions
• Parallelization
• Higher-level applications