Semantic Search Ready to Use? Dr Victoria Uren

NetIKX Semantic Search Presentation

Oct 17, 2014


The slides discuss the research agenda for search of the semantic web and current available search tools. The slides were prepared for an audience of information
Transcript
Page 1: NetIKX Semantic Search Presentation

Semantic Search

Ready to Use?

Dr Victoria Uren

Page 2: NetIKX Semantic Search Presentation

Motivation

“A little semantics goes a long way”

Jim Hendler

“The classic keyword search box exerts a powerful gravitational pull. Academics and industry researchers need to achieve the intellectual ‘escape velocity’ necessary to revolutionize search. They must invest much more in bold strategies that can achieve natural-language searching and answering, rather than providing the electronic equivalent of the index at the back of a reference book.”

Oren Etzioni, Search needs a shake up, Nature, 4 Aug. 2011, v.476, pp. 25-26

Page 3: NetIKX Semantic Search Presentation

Plan

Introduction - What is semantic search?

Research Background

How it works

Interface types

Research Issues

What is usable?

For web search

For corporate data management

Page 4: NetIKX Semantic Search Presentation

Introduction

Page 5: NetIKX Semantic Search Presentation

Search as we know it

Full text search

TF-IDF & other statistical approaches

PageRank – exploiting hyperlink graph

Controlled term search

OPAC

MESH etc.

Other metadata

Date of publication, author etc.

Output typically ranked pages, records, documents
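To make the "TF-IDF & other statistical approaches" bullet concrete, here is a minimal sketch of ranking documents by a basic TF-IDF sum. The function names, the whitespace tokenizer and the three sample documents are illustrative assumptions, not taken from the slides, and real search engines use more refined weighting.

```python
import math
from collections import Counter

# Hypothetical toy corpus, for illustration only.
DOCS = [
    "semantic search over linked data",
    "keyword search with an inverted index",
    "recipes for baking bread",
]

def tf_idf_scores(query_terms, documents):
    """Score each document against the query with a basic TF-IDF sum."""
    tokenized = [doc.lower().split() for doc in documents]
    n_docs = len(tokenized)
    # Document frequency: how many documents contain each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    scores = []
    for tokens in tokenized:
        tf = Counter(tokens)
        score = 0.0
        for term in query_terms:
            if tf[term]:
                idf = math.log(n_docs / df[term])
                score += (tf[term] / len(tokens)) * idf
        scores.append(score)
    return scores

def rank(query, documents):
    """Return document indices ordered from best to worst match."""
    scores = tf_idf_scores(query.lower().split(), documents)
    return sorted(range(len(documents)), key=lambda i: scores[i], reverse=True)
```

Calling rank("semantic search", DOCS) puts the first document on top: the rarer term "semantic" carries more weight than "search", which occurs in two of the three documents.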

Page 6: NetIKX Semantic Search Presentation

Semantic Search

Classic IR perspective

Improve statistical/link based search of documents / webpages by better understanding user’s information need

Resolve ambiguity

Clustering

Query expansion

Past searches, WordNet etc. to suggest related terms
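Query expansion can be sketched as a lookup of related terms that are appended to the user's query. The thesaurus below is a hand-made stand-in (an assumption for illustration); a real system might draw the related terms from WordNet or from logs of past searches, as the slide suggests.

```python
# Hypothetical hand-made thesaurus standing in for WordNet or search logs.
RELATED_TERMS = {
    "car": ["automobile", "vehicle"],
    "seat": ["chair"],
}

def expand_query(query, related=RELATED_TERMS):
    """Return the query terms followed by their related terms, without duplicates."""
    expanded = []
    for term in query.lower().split():
        for candidate in [term] + related.get(term, []):
            if candidate not in expanded:
                expanded.append(candidate)
    return expanded
```

The expanded term list would then be passed to the underlying statistical search in place of the original query.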

Page 7: NetIKX Semantic Search Presentation
Page 8: NetIKX Semantic Search Presentation
Page 9: NetIKX Semantic Search Presentation

Semantic Search

Web 3.0 perspective

Improve search over machine understandable data which may, or may not, include annotated documents

Search for entities (people, products …)

Search for facts (capital of Georgia?)

Fuse knowledge from different sources

Exploit structure of formal knowledge

Broader / narrower plus much more

Page 10: NetIKX Semantic Search Presentation

Web 3.0 Search is

Metadata search

So more like

Searching a relational database

E.g. an OPAC

Search of the deep web

BUT linked data is “heterogeneous”

Multiple domains mixed together

Microformats & RDFa are from multiple sources

Quality & consistency variable

Page 11: NetIKX Semantic Search Presentation

Benefits of Semantic Search

Machine understandability

i.e. controlled by “ontologies” so you can reason over it

Supports entity search

Ambiguity

Seat/SEAT

Broader/narrower

Exploiting hierarchical class relations

Complex queries over triples

E.g. Joint between mild steel and stainless steel

Heterogeneity

Mappings between ontologies (silo bridging)
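The broader/narrower benefit can be illustrated with a tiny class hierarchy. This is a sketch under stated assumptions: the class names and the SUBCLASS_OF table are invented for illustration (loosely echoing the steel example above) and do not come from any real ontology.

```python
# Hypothetical subclass table: child class -> parent class.
SUBCLASS_OF = {
    "StainlessSteel": "Steel",
    "MildSteel": "Steel",
    "Steel": "Metal",
    "Aluminium": "Metal",
}

def broader(cls, hierarchy=SUBCLASS_OF):
    """Return all ancestors of cls, nearest first."""
    chain = []
    while cls in hierarchy:
        cls = hierarchy[cls]
        chain.append(cls)
    return chain

def narrower(cls, hierarchy=SUBCLASS_OF):
    """Return all descendants of cls (transitive closure of the subclass relation)."""
    result = []
    for child, parent in hierarchy.items():
        if parent == cls:
            result.append(child)
            result.extend(narrower(child, hierarchy))
    return result
```

A search for "Metal" could then be widened to every class in narrower("Metal"), so that records annotated only as stainless steel are still found.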

Page 12: NetIKX Semantic Search Presentation

Research Systems

Page 13: NetIKX Semantic Search Presentation

Formal queries over RDF

SQL-like languages

SPARQL, SeRQL

XPath-like languages

XQuery, RPath

Others

Metalog (controlled English)

F-logic

RDF-QBE (query by example)

James Bailey et al., Web and Semantic Web Query Languages: A Survey. Reasoning Web 2005: 35-133

Page 14: NetIKX Semantic Search Presentation

Sample SPARQL

SELECT ?x

WHERE { ?x <http://www.w3.org/2001/vcard-rdf/3.0#FN> "John Smith" }

PREFIX vcard: <http://www.w3.org/2001/vcard-rdf/3.0#>

SELECT ?y ?givenName

WHERE { ?y vcard:Family "Smith" .

?y vcard:Given ?givenName . }

Examples from http://jena.sourceforge.net/ARQ/Tutorial/

Triple pattern labels: Subject, Predicate, Object
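The two sample queries are triple patterns matched against stored triples. The sketch below shows that matching idea in plain Python rather than through a real SPARQL engine. The TRIPLES data, the "ex:" identifiers and the helper names are assumptions made for illustration; only the vCard namespace comes from the slide.

```python
VCARD = "http://www.w3.org/2001/vcard-rdf/3.0#"

# Hypothetical data: (subject, predicate, object) triples.
TRIPLES = [
    ("ex:JohnSmith", VCARD + "FN", "John Smith"),
    ("ex:JohnSmith", VCARD + "Family", "Smith"),
    ("ex:JohnSmith", VCARD + "Given", "John"),
    ("ex:BeckySmith", VCARD + "Family", "Smith"),
    ("ex:BeckySmith", VCARD + "Given", "Becky"),
    ("ex:SarahJones", VCARD + "Family", "Jones"),
    ("ex:SarahJones", VCARD + "Given", "Sarah"),
]

def match_pattern(pattern, triple, binding):
    """Try to extend a variable binding so that pattern matches triple."""
    new_binding = dict(binding)
    for term, value in zip(pattern, triple):
        if term.startswith("?"):  # terms starting with "?" are variables
            if new_binding.get(term, value) != value:
                return None  # variable already bound to a different value
            new_binding[term] = value
        elif term != value:
            return None  # constant does not match
    return new_binding

def query(patterns, triples=TRIPLES):
    """Return every binding that satisfies all patterns (a conjunctive query)."""
    bindings = [{}]
    for pattern in patterns:
        next_bindings = []
        for binding in bindings:
            for triple in triples:
                extended = match_pattern(pattern, triple, binding)
                if extended is not None:
                    next_bindings.append(extended)
        bindings = next_bindings
    return bindings
```

The first sample query corresponds to query([("?x", VCARD + "FN", "John Smith")]). The second uses two patterns that share the variable ?y, which is what joins the family name to the given name.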

Page 15: NetIKX Semantic Search Presentation

Interfaces for Query Generation

Keyword

Forms

Graph based

Question answering

Tabular browsers

Page 16: NetIKX Semantic Search Presentation

Keyword based

Aims to be as close as possible to Google-like keyword search

Pluses

Minimal learning curve for users

Can handle heterogeneity

Minus

Query complexity is limited to entity search & simple triples

Page 17: NetIKX Semantic Search Presentation

SemSearch

Y. Lei, V. Uren, and E. Motta, A Ranking-Driven Approach to Semantic Search, Poster in ASWC 2008

Page 18: NetIKX Semantic Search Presentation

SemSearch

4 matches

(2 classes & 2 individuals)

6 matches

(relations)

Total queries generated = 4*6 = 24 for “News: Victoria”
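The count of 24 follows from pairing every match for one keyword with every match for the other. A minimal sketch of that combination step is below. The match names are placeholders (the slide gives only the counts, not the matched entities), and this is not the actual SemSearch implementation.

```python
from itertools import product

def generate_queries(matches_per_keyword):
    """Combine one candidate match per keyword into every possible query."""
    return list(product(*matches_per_keyword))

# Placeholder names: the slide reports 4 matches (2 classes, 2 individuals)
# and 6 matches (relations), without listing them.
first_matches = ["class_1", "class_2", "individual_1", "individual_2"]
second_matches = ["relation_%d" % i for i in range(1, 7)]

queries = generate_queries([first_matches, second_matches])
```

Because the number of candidate queries is the product of the match counts, it grows quickly with more keywords, which is one reason a ranking-driven approach matters.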

Page 19: NetIKX Semantic Search Presentation

Forms

Familiar interface metaphor

Database search

Product search

Plus

Allows construction of more complex searches

Minus

Can’t handle heterogeneous open web - forms need to be pre-defined

Page 20: NetIKX Semantic Search Presentation
Page 21: NetIKX Semantic Search Presentation
Page 22: NetIKX Semantic Search Presentation

Graph-based Search

Aim is to expose the structure of the ontology to the user to scaffold query formulation

Pluses

Good for single ontology environments

Helps the user comprehend the domain

Minuses

Can become unwieldy with big and complex domains

Page 23: NetIKX Semantic Search Presentation
Page 24: NetIKX Semantic Search Presentation

Question Answering

Natural language input

“What is the capital of Georgia?”

Translation process transforms the natural language into a formal query

Pluses

Relatively complex queries possible (intersection of 2 triples)

Can deal with heterogeneity

User doesn’t need to understand the ontology

Minuses

Heavy computation

Page 25: NetIKX Semantic Search Presentation

AquaLog: question answering

Pipeline: Natural Language Query -> Linguistic Triple -> Logical Triples -> Answer

Components: GATE components, Relation Similarity Service, Semantic match

Example: “What are the projects of Vanessa?” -> (which is, projects, vanessa) -> (project, has-project-member/has-project-leader, vanessa) -> AKT, Dot.KoM

Lopez, V., Uren, V., Motta, E. and Pasin, M. (2007) AquaLog: An ontology-driven question answering system for organizational semantic intranets, Journal of Web Semantics, 5, 2, pp. 72-105.
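The step from a linguistic triple to logical triples and then to an answer can be sketched as follows. This is a toy illustration, not AquaLog's actual code: the relation table, the naive singularisation and the knowledge-base triples are all assumptions, including which project uses which relation.

```python
# Hypothetical knowledge base. Which relation links which project to
# "vanessa" is an arbitrary choice made for this illustration.
KB = [
    ("AKT", "has-project-member", "vanessa"),
    ("Dot.KoM", "has-project-leader", "vanessa"),
    ("OtherProject", "has-project-member", "someone-else"),
]

# Hypothetical stand-in for a relation similarity lookup:
# ontology class -> relations that may express the query relation.
RELATION_CANDIDATES = {
    "project": ["has-project-member", "has-project-leader"],
}

def to_logical_triples(linguistic_triple):
    """Map (question type, term, entity) to ontology-level triples."""
    _question_type, term, entity = linguistic_triple
    cls = term.rstrip("s")  # naive singularisation, assumption for the sketch
    return [(cls, rel, entity) for rel in RELATION_CANDIDATES.get(cls, [])]

def answer(linguistic_triple, kb=KB):
    """Collect KB subjects that satisfy any of the logical triples."""
    results = []
    for _cls, relation, entity in to_logical_triples(linguistic_triple):
        for subject, predicate, obj in kb:
            if predicate == relation and obj == entity and subject not in results:
                results.append(subject)
    return results
```

With the toy data, answer(("which is", "projects", "vanessa")) yields the two projects named on the slide.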

Page 26: NetIKX Semantic Search Presentation

Tabular Browsing

Start with keyword search, then expand by browsing through links

Pluses

Supports data exploration

Output as sets of facts

Minuses

Not suitable for heterogeneous datasets

Can be slow

Page 27: NetIKX Semantic Search Presentation

Parallax (http://www.freebase.com/labs/parallax/)

Page 28: NetIKX Semantic Search Presentation

Research Challenges

Usability / expressivity trade off

Heterogeneity

Ontologies, quality, provenance

Mapping, filtering

Security & Privacy

Personal data, social web

Scalability

Page 29: NetIKX Semantic Search Presentation

Near Commercial Systems

Page 30: NetIKX Semantic Search Presentation

Usable Web3.0 Tools

For Web search

For Corporate data management

NOTE – a personal selection – I’m not endorsing any of these!

Page 31: NetIKX Semantic Search Presentation

Sig.ma (Semantic Information Mashup) http://sig.ma

Runs off Sindice crawl of pages with embedded RDFa and other microformats

Uses a keyword search for entities

No attempt at fusion or disambiguation

Page 32: NetIKX Semantic Search Presentation

Web Search - Sig.ma

Page 33: NetIKX Semantic Search Presentation
Page 34: NetIKX Semantic Search Presentation
Page 35: NetIKX Semantic Search Presentation

Google Rich Snippets

Entity data based on microformats, RDFa, microdata

Reviews

People

Products (GoodRelations)

Businesses & Organizations

Recipes

Events

Video

Supports entity search, with keyword search & facetted browsing

Harvested from sites which supply the data in the required formats
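As a rough illustration of what "harvested from sites which supply the data in the required formats" involves, the sketch below pulls microdata properties out of a page with Python's standard html.parser module. The sample markup, the schema.org item type and the ItempropParser class are assumptions for illustration; this is not how Google's harvester is implemented, and it ignores nested items.

```python
from html.parser import HTMLParser

# Hypothetical page fragment carrying microdata for a recipe.
SAMPLE_HTML = """
<div itemscope itemtype="http://schema.org/Recipe">
  <h1 itemprop="name">Soda bread</h1>
  <meta itemprop="cookTime" content="PT40M">
  <span itemprop="recipeYield">1 loaf</span>
</div>
"""

class ItempropParser(HTMLParser):
    """Collect itemprop name/value pairs from flat (non-nested) microdata."""

    def __init__(self):
        super().__init__()
        self.properties = {}
        self._current = None  # itemprop waiting for its text content

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if "itemprop" in attrs:
            if "content" in attrs:
                # Value supplied in the attribute, e.g. on <meta>.
                self.properties[attrs["itemprop"]] = attrs["content"]
            else:
                self._current = attrs["itemprop"]

    def handle_data(self, data):
        if self._current and data.strip():
            self.properties[self._current] = data.strip()
            self._current = None

    def handle_endtag(self, tag):
        self._current = None

def extract_properties(html):
    parser = ItempropParser()
    parser.feed(html)
    return parser.properties
```

The extracted dictionary is the kind of entity record that can be shown as a snippet or used as a facet.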

Page 36: NetIKX Semantic Search Presentation
Page 37: NetIKX Semantic Search Presentation

Wolfram|Alpha http://www.wolframalpha.com/

Focus is on computational knowledge

Natural language question input

Uses its own proprietary knowledge base

Page 38: NetIKX Semantic Search Presentation
Page 39: NetIKX Semantic Search Presentation

DBpedia http://dbpedia.neofonie.de/browse/

Searches factual information extracted from Wikipedia as RDF

Facetted browse approach in the home page

BUT used in many other research & Open Linked Data sites (e.g. Sig.ma)

Page 40: NetIKX Semantic Search Presentation
Page 41: NetIKX Semantic Search Presentation

Usable Web3.0 Tools

For Web Search

For Corporate Data Management

Opportunity for bridging data silos

Keyword search has never been as good for CMS and intranet as for the internet

Need experts to configure free text search well

Distribution of terms can be skewed – impossible to configure

Web3.0 is a network native technology

Page 42: NetIKX Semantic Search Presentation

Drupal 7

One of the most popular CMS

E.g. Recovery.gov was originally on Drupal

Semantic Drupal research pioneered by DERI Galway

Open Source

Developers often prefer it to SharePoint

RDFa export as standard from CMS structure (no annotation needed)

Publish structured data that Google, Sindice etc. can harvest

API methods built in

Search NOT built in

Page 43: NetIKX Semantic Search Presentation
Page 44: NetIKX Semantic Search Presentation

Virtuoso (http://virtuoso.openlinksw.com/)

Hybrid server

XML

SQL

RDF

Free Text

Supporting

Merging of data silos in different formats

Production of Web applications & services

Large Scale

Open Source version

Page 45: NetIKX Semantic Search Presentation

Ready to use?

Beyond the TRL3-5 “Valley of Death”

TRL7? for facetted browse, server technology

Not yet a stable market - technologies like SearchMonkey may come & go

Page 46: NetIKX Semantic Search Presentation

Acknowledgements

People: Fabio Ciravegna, Aba-Sah Dadzie, Khadija Elbedweihy, Miriam Fernandez, Yuangui Lei, Vanessa Lopez, Enrico Motta

Projects: X-Media, OpenKnowledge, AKT, SmartProducts