Copyright 2009 Digital Enterprise Research Institute. All rights reserved.
The Semantic Web vision & Linked Data
Multi-disciplinary perspective
Linked Data, IR, NLP
Case study: Treo
Talking to the Linked Data Web
Semantic application patterns
Take-away message
2001:
Software which is able to understand meaning (intelligent, flexible)
Leveraging the Web for information scale
What was the plan to achieve it?
Build a Semantic Web stack
Which covers both representation and reasoning
Adoption:
No significant data growth
Ontologies are not straightforward to build:
People are not familiar with the tools and principles
Difficult to keep consistency at Web scale
Scalability
Problems:
Consistency
Scalability
Logic World
Web World
The Web as a Huge Database
Fundamental step for data
creation
2006:
Where is the intelligence and flexibility?
We will come back to this point in a minute
Data Model Features:
Graph-based data model
Extensible schema
Entity-centric data integration
Specific Features:
Designed over open Web standards
Based on the Web infrastructure (HTTP, URIs)
Positives:
Solid adoption in the Open Data context (eGovernment, eScience, etc.)
Existing data is relevant (you can build real applications)
Negatives:
Data consumption is a problem
Data generation beyond database mapping/triplification is also a problem
Still far from the Semantic Web vision
How to address the previous challenges?
Linked Data:
Web-scale structured data representation
Information Retrieval:
Search, approximation, ranking strategies
Scalability
Natural Language Processing (NLP):
Analysing natural language
Semantic approximation (distributional semantics)
IBM Watson approach
From which university did the wife of
Barack Obama graduate?
With Linked Data we are still in the DB world (but slightly worse)
From which university did the wife of Barack Obama graduate?
Demonstration
Transform natural language queries into triple patterns
Steps:
Entity Recognition
Dependency parsing
Query Pattern detection
Query Planning
“From which university did the wife of Barack Obama graduate?”
prep(graduate-10, From-1)
det(university-3, which-2)
pobj(From-1, university-3)
aux(graduate-10, did-4)
det(wife-6, the-5)
nsubj(graduate-10, wife-6)
prep(wife-6, of-7)
nn(Obama-9, Barack-8)
pobj(of-7, Obama-9)
root(ROOT-0, graduate-10)
From/IN
which/WDT
university/NN
did/VBD
the/DT
wife/NN
of/IN
Barack/NNP
Obama/NNP
graduate/VB
?/.
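The four steps above can be sketched end-to-end in a few lines. This is a minimal toy, assuming a hand-written entity gazetteer and reusing the dependency parse shown above; the dbpedia: URI shorthand and all helper names are illustrative, not Treo's actual API.

```python
# Sketch of the query interpretation pipeline (illustrative, not Treo's API).

# Step 1: Entity Recognition -- match query substrings against a known-entity list.
KNOWN_ENTITIES = {"Barack Obama": "dbpedia:Barack_Obama"}  # toy gazetteer

def recognize_entities(query):
    return {name: uri for name, uri in KNOWN_ENTITIES.items() if name in query}

# Step 2: Dependency parsing -- here we reuse the Stanford-style parse above,
# encoded as (relation, head, dependent) triples.
PARSE = [
    ("prep", "graduate", "From"), ("pobj", "From", "university"),
    ("nsubj", "graduate", "wife"), ("prep", "wife", "of"),
    ("pobj", "of", "Obama"),
]

# Step 3: Query pattern detection -- walk from the entity token up to the
# root, collecting the content words (skipping prepositions) along the way.
def detect_pattern(parse, entity_token):
    heads = {dep: head for _, head, dep in parse}
    path, node = [], entity_token
    while node in heads:
        node = heads[node]
        if node not in ("of", "From"):
            path.append(node)
    return path  # e.g. ['wife', 'graduate']

# Step 4: Query planning -- turn the path into a chain of triple patterns.
def plan(entity_uri, path):
    patterns, subject = [], entity_uri
    for i, prop in enumerate(path):
        var = f"?x{i}"
        patterns.append((subject, prop, var))
        subject = var
    return patterns

query = "From which university did the wife of Barack Obama graduate?"
entities = recognize_entities(query)
path = detect_pattern(PARSE, "Obama")
patterns = plan(entities["Barack Obama"], path)
```

The resulting chain (Barack_Obama → wife → ?x0 → graduate → ?x1) is what still has to be matched against actual dataset predicates, which is where IR and distributional semantics come in below.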
Using NLP
Query:
Entity Search:
Build an entity index (instances)
Extract terms from URIs and index the terms using your
favourite IR framework
Search instances by keywords
Using IR
[Diagram: keyword query matched against the entity index over the Linked Data Web]
Use distributional semantics to semantically match
query terms to predicates and classes
Distributional principle: words that co-occur tend to have related meanings
Allows the creation of a comprehensive semantic model from
unstructured text
Based on statistical patterns over large amounts of text
No human annotations
Distributional semantics can be used to compute a
semantic relatedness measure between two words
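The relatedness measure can be illustrated with a toy distributional model: build co-occurrence vectors from a tiny hand-written corpus and compare words by cosine similarity. A real model (e.g. one built over Wikipedia) works the same way at a vastly larger scale; the corpus and numbers here are purely illustrative.

```python
import math
from collections import Counter

# Tiny corpus standing in for large amounts of unstructured text.
corpus = [
    "the wife married her husband",
    "his spouse is his wife",
    "the spouse married him",
    "the university awarded the degree",
    "she studied at the university for a degree",
]

def vectors(sentences):
    """Co-occurrence vector per word: counts of words sharing a sentence."""
    vecs = {}
    for s in sentences:
        words = s.split()
        for w in words:
            ctx = vecs.setdefault(w, Counter())
            ctx.update(x for x in words if x != w)
    return vecs

def cosine(a, b):
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

vecs = vectors(corpus)
rel_spouse = cosine(vecs["wife"], vecs["spouse"])  # high: shared contexts
rel_degree = cosine(vecs["wife"], vecs["degree"])  # low: different contexts
```

No human annotation is involved: the relatedness of 'wife' and 'spouse' emerges purely from statistical co-occurrence patterns.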
Using NLP and IR
Computation of a measure of “semantic proximity” between two terms
Allows approximate semantic matching between query terms and dataset predicates and classes
It supports a reasoning-like behavior based on the knowledge embedded in the corpus
Using NLP and IR
[Diagram: query terms semantically matched to predicates and classes over the Linked Data Web]
Which properties are semantically related to ‘wife’?
Using NLP and IR
Semantic approximation in databases (as in any IR
system): semantic best-effort
Need some level of user disambiguation,
refinement and feedback
As we move in the direction of semantic systems
we should expect the need for principled dialog
mechanisms (like in human communication)
Pull the user interaction back into the system
Derived from the experience developing Treo
Not restricted to queries over Linked Data
The following list is not intended to be complete
Pattern #1: Maximize the amount of knowledge in
your semantic application
Meaning interpretation depends on knowledge
Using LOD: DBpedia, Freebase, YAGO can give you
a very comprehensive set of instances and their
types
Wikipedia can provide you with a comprehensive distributional semantic model
Pattern #2: Allow your database to grow
Dynamic schema
Entity-centric data integration
Pattern #3: Once the database grows in complexity, use semantic search instead of structured queries
Instances can be used as pivot entities to reduce
the search space
They are easier to search
Higher specificity and lower vocabulary variation
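A sketch of the pivot idea: once an instance is recognized, only the triples in its neighborhood need to be matched against the query, instead of every predicate in the dataset. The triple store here is a toy; the dbo: predicate names follow the DBpedia ontology but are used for illustration only.

```python
# Toy triple store (subject, predicate, object).
triples = [
    ("dbpedia:Barack_Obama", "dbo:spouse", "dbpedia:Michelle_Obama"),
    ("dbpedia:Barack_Obama", "dbo:birthPlace", "dbpedia:Honolulu"),
    ("dbpedia:Michelle_Obama", "dbo:almaMater", "dbpedia:Princeton_University"),
    ("dbpedia:Angela_Merkel", "dbo:birthPlace", "dbpedia:Hamburg"),
]

def neighborhood(pivot):
    """Triples where the pivot entity appears as subject or object."""
    return [t for t in triples if pivot in (t[0], t[2])]

# The recognized instance acts as the pivot, shrinking the search space:
candidates = neighborhood("dbpedia:Barack_Obama")
# only these predicates need to be semantically matched against the
# query term ('wife' -> dbo:spouse), not the whole dataset vocabulary.
predicates = {p for _, p, _ in candidates}
```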
Pattern #4: Use distributional semantics and semantic relatedness for robust semantic matching
Distributional semantics allows your application to
digest (and make use of) large amounts of
unstructured information
Multilingual solution
Can be complemented with WordNet
Pattern #5: POS tags, syntactic parsing + rules will go a long way toward interpreting natural language queries and sentences
Use them to explore the regularities in natural
language
Define a scope for natural language processing in
your application (restrict by domain, syntactic
complexity)
These tools are easy to use and quite robust (at
least for English)
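One such rule, applied to the running example: in a POS-tagged query, a noun immediately following a wh-determiner names the expected answer type. Tags follow the Penn Treebank convention shown earlier; the tagging is hard-coded here (a real system would call a tagger such as NLTK's), and the rule is a deliberately narrow illustration.

```python
# POS-tagged query (Penn Treebank tags, hard-coded from the example above).
tagged = [
    ("From", "IN"), ("which", "WDT"), ("university", "NN"),
    ("did", "VBD"), ("the", "DT"), ("wife", "NN"), ("of", "IN"),
    ("Barack", "NNP"), ("Obama", "NNP"), ("graduate", "VB"), ("?", "."),
]

def answer_type(tagged_query):
    """Rule: the noun right after a wh-determiner (WDT) is the answer type."""
    for (w1, t1), (w2, t2) in zip(tagged_query, tagged_query[1:]):
        if t1 == "WDT" and t2 == "NN":
            return w2
    return None

atype = answer_type(tagged)  # 'university'
```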
Pattern #6: Provide a user dialog mechanism in the
application
Improve the semantic model with user feedback
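A minimal sketch of such a feedback loop, under assumed mappings and an assumed update rule: when the user confirms or rejects a suggested property mapping, nudge its score so future queries rank it better.

```python
# Candidate (query term -> property) mappings with initial scores
# (illustrative values, e.g. from semantic relatedness).
scores = {("wife", "dbo:spouse"): 0.6, ("wife", "dbo:partner"): 0.6}

def feedback(term, prop, accepted, lr=0.2):
    """Move the mapping score toward 1.0 on acceptance, 0.0 on rejection."""
    key = (term, prop)
    target = 1.0 if accepted else 0.0
    scores[key] += lr * (target - scores[key])

feedback("wife", "dbo:spouse", accepted=True)    # user confirms the mapping
feedback("wife", "dbo:partner", accepted=False)  # user rejects it
```

After a single round of dialog the two initially tied candidates are already separated, so the confirmed mapping wins on the next query.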
Part of the Semantic Web vision can be addressed
today with a multi-disciplinary perspective
Linked Data, IR and NLP
You can build your own IBM Watson-like application
Both data and tools are available and ready to use:
the barrier is the mindset
Large opportunity for new solutions
NLP
WordNet
VerbNet
Stanford parser
C&C parser/Boxer
NLTK
DBpedia Spotlight
Gate
UIMA
IR
Lucene/Solr
Terrier
Datasets
DBpedia
Freebase
YAGO
Tools that will be
available soon:
Treo
Treo-ESA
Graphia
André Freitas, Edward Curry, João Gabriel Oliveira, Sean O'Riain,
. IEEE Internet
Computing, Special Issue on Internet-Scale Data, 2012.
André Freitas, Edward Curry, João Gabriel Oliveira, Sean O'Riain,
International Journal of Semantic Computing (IJSC),
2012.
André Freitas, Sean O'Riain, Edward Curry,
. 27th ACM Applied Computing Symposium, Semantic Web and Its
Applications Track, 2012.
André Freitas, João Gabriel Oliveira, Sean O'Riain, Edward Curry, João Carlos Pereira da
Silva, In
Proceedings of the 16th International Conference on Applications of Natural Language to
Information Systems (NLDB) 2011.
André Freitas, Danilo S. Carvalho, João Carlos Pereira da Silva, Sean O'Riain, Edward Curry, A
Semantic Best-Effort Approach for Extracting Structured Discourse Graphs from Wikipedia. In
Proceedings of the 1st Workshop on the Web of Linked Entities (WoLE 2012) at the 11th
International Semantic Web Conference (ISWC), 2012.
andrefreitas.org
andre (dot) freitas – at – deri (dot) org