Source: ebiquity.umbc.edu/_file_directory_/resources/178.ppt (PowerPoint presentation)

Post on 15-Mar-2018

UMBC, an Honors University in Maryland

Search Engines for Semantic Web Knowledge

Tim Finin

University of Maryland, Baltimore County

Joint work with Li Ding, Anupam Joshi, Yun Peng, Pranam Kolari, Pavan Reddivari, Sandor Dornbush, Rong Pan, Akshay Java, Joel Sachs, Scott Cost and Vishal Doshi

http://creativecommons.org/licenses/by-nc-sa/2.0/

This work was partially supported by DARPA contract F30602-97-1-0215, NSF grants CCR007080 and IIS9875433 and grants from IBM, Fujitsu and HP.

This talk
• Motivation
• Semantic web 101
• Swoogle Semantic Web search engine
• Use cases and applications
• Conclusions

Once there were only a few large computers

Then there were many,

All connected 24x7: Internet, cellular telephony, IRDA, 802.11, Bluetooth, Ultra Wide Band, RFID, and more to come

Interoperating: tcp/ip ftp smtp rpc corba ssh http html xml gif jpg mpg mp3 pdf …

Access to the world’s knowledge

del.icio.us

Google has made us smarter

But what about our agents?

tell

register

Agents still have a very minimal understanding of text and images.

This talk
• Motivation
• Semantic web 101
• Swoogle Semantic Web search engine
• Use cases and applications
• Conclusions

XML helps

“XML is Lisp's bastard nephew, with uglier syntax and no semantics. Yet XML is poised to enable the creation of a Web of data that dwarfs anything since the Library at Alexandria.”

-- Philip Wadler, Et tu XML? The fall of the relational empire, VLDB, Rome, September 2001.

“The Semantic Web will globalize KR, just as the WWW globalized hypertext”

-- Tim Berners-Lee

Semantic Web adds semantics

Semantic Web 101

<?xml version="1.0" encoding="utf-8"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:foaf="http://xmlns.com/foaf/0.1/"
         xmlns:uni="http://ebiquity.umbc.edu/ontologies/uni/">
  <uni:Student>
    <foaf:name>Li Ding</foaf:name>
    <foaf:mbox rdf:resource="mailto:dingli1@umbc.edu"/>
  </uni:Student>
</rdf:RDF>

• RDF/XML
• rdf:RDF tag
• namespaces ontologies
• Semantic graph, URIs as nodes & links
• triples

[Figure: RDF graph in which one node has the foaf:name "Li Ding" and the rdf:type uni:Student]
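
As a rough illustration of how the RDF/XML on this slide maps to triples, here is a minimal Python sketch. It is not a full RDF/XML parser: it assumes flat, typed node elements like the one shown, and the helper names (`to_uri`, `simple_triples`) and the blank-node label `_:b0` are made up for this example.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

# The slide's example, as bytes so the XML encoding declaration is honored.
RDF_XML = b"""<?xml version="1.0" encoding="utf-8"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:foaf="http://xmlns.com/foaf/0.1/"
         xmlns:uni="http://ebiquity.umbc.edu/ontologies/uni/">
  <uni:Student>
    <foaf:name>Li Ding</foaf:name>
    <foaf:mbox rdf:resource="mailto:dingli1@umbc.edu"/>
  </uni:Student>
</rdf:RDF>"""


def to_uri(tag):
    """Convert ElementTree's '{namespace}local' name into a full URI."""
    namespace, local = tag[1:].split("}", 1)
    return namespace + local


def simple_triples(rdf_xml):
    """Return (subject, predicate, object) tuples for flat, typed node elements."""
    triples = []
    root = ET.fromstring(rdf_xml)
    for i, node in enumerate(root):
        # Use rdf:about when present, otherwise mint a blank-node label.
        subject = node.get("{%s}about" % RDF_NS, "_:b%d" % i)
        if to_uri(node.tag) != RDF_NS + "Description":
            # A typed node element implies an rdf:type triple.
            triples.append((subject, RDF_NS + "type", to_uri(node.tag)))
        for prop in node:
            resource = prop.get("{%s}resource" % RDF_NS)
            obj = resource if resource is not None else (prop.text or "")
            triples.append((subject, to_uri(prop.tag), obj))
    return triples


TRIPLES = simple_triples(RDF_XML)
for triple in TRIPLES:
    print(triple)
```

The three resulting triples correspond to the graph on the slide: one rdf:type link to uni:Student, one foaf:name literal, and one foaf:mbox link.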

Where’s the semantics?
• URIs as “rigid designators”
• Conventions for URIs denoting things in the “real world”
• Namespaces and URIs provide an unambiguous shared vocabulary
• RDF, RDFS and OWL have semantics defined using model theory and also axioms
• Ontologies allow agents to draw inferences
  – uni:Student is a subclass of foaf:Person
  – Every uni:Student has at least one uni:school, which must be an instance of uni:School
  – A foaf:Person with a uni:school is necessarily a uni:Student
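
The kind of inference listed above can be sketched with a tiny forward-chaining loop over triples. This is only an illustration under stated assumptions: it implements two RDFS-style rules (subclass and property domain), it approximates the last statement on the slide with an rdfs:domain axiom on uni:school, it does not model the cardinality statement, and the individuals `ex:li` and `ex:umbc` are hypothetical.

```python
RDF_TYPE = "rdf:type"
SUBCLASS_OF = "rdfs:subClassOf"
DOMAIN = "rdfs:domain"

# Ontology statements, simplified from the slide (assumption: the third
# statement is approximated by a domain axiom on uni:school).
ONTOLOGY = [
    ("uni:Student", SUBCLASS_OF, "foaf:Person"),
    ("uni:school", DOMAIN, "uni:Student"),
]

# Hypothetical instance data.
DATA = [("ex:li", "uni:school", "ex:umbc")]


def infer(triples):
    """Apply the subclass and domain rules until no new triples appear."""
    facts = set(triples)
    while True:
        derived = set()
        for s, p, o in facts:
            for s2, p2, o2 in facts:
                # Subclass rule: x type C, C subClassOf D  =>  x type D
                if p == RDF_TYPE and p2 == SUBCLASS_OF and s2 == o:
                    derived.add((s, RDF_TYPE, o2))
                # Domain rule: x P y, P domain C  =>  x type C
                if p2 == DOMAIN and s2 == p:
                    derived.add((s, RDF_TYPE, o2))
        if derived <= facts:
            return facts
        facts |= derived


CLOSURE = infer(ONTOLOGY + DATA)
for fact in sorted(CLOSURE):
    print(fact)
```

From the single fact that ex:li has a uni:school, the loop derives that ex:li is a uni:Student and therefore also a foaf:Person.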

RDF/a

RDF/a is a W3C proposal for embedding RDF in XHTML documents

An HTML document with RDF embedded:

<html xmlns:foaf="http://xmlns.com/foaf/0.1/">
  <head><title>Jo Lambda's Home Page</title></head>
  <body>
    Hello. This is <span property="foaf:name">Jo Lambda</span>'s home page.
    <h2>Work</h2>
    If you want to contact me at work, you can either
    <a rel="foaf:mbox" href="mailto:jo.lambda@example.org">email me</a>,
    or call <span property="foaf:phone">+1 777 888 9999</span>.
  </body>
</html>

The triples, in N3 notation:

<> foaf:name "Jo Lambda"^^rdf:XMLLiteral ;
   foaf:mbox <mailto:jo.lambda@example.org> ;
   foaf:phone "+1 777 888 9999"^^rdf:XMLLiteral .
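
To make the mapping from markup to triples concrete, the following Python sketch pulls `property` and `rel` annotations out of the slide's HTML using the standard-library `html.parser`. It is a toy under stated assumptions, not an RDFa processor: it ignores datatypes and namespaces, assumes annotated elements contain only text, and the class name `TinyRDFaExtractor` is invented for this example.

```python
from html.parser import HTMLParser

PAGE = """<html xmlns:foaf="http://xmlns.com/foaf/0.1/">
  <head><title>Jo Lambda's Home Page</title></head>
  <body>
    Hello. This is <span property="foaf:name">Jo Lambda</span>'s home page.
    <h2>Work</h2>
    If you want to contact me at work, you can either
    <a rel="foaf:mbox" href="mailto:jo.lambda@example.org">email me</a>,
    or call <span property="foaf:phone">+1 777 888 9999</span>.
  </body>
</html>"""


class TinyRDFaExtractor(HTMLParser):
    """Collect (subject, predicate, object) tuples about the page itself ("<>")."""

    def __init__(self):
        super().__init__()
        self.triples = []
        self._open_property = None  # predicate of the element being read
        self._text = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if "rel" in attrs and "href" in attrs:
            # rel + href: the object is a resource.
            self.triples.append(("<>", attrs["rel"], "<%s>" % attrs["href"]))
        elif "property" in attrs:
            # property: the object is the element's text content.
            self._open_property = attrs["property"]
            self._text = []

    def handle_data(self, data):
        if self._open_property is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        # Assumes no nested elements inside an annotated element.
        if self._open_property is not None:
            literal = '"%s"' % "".join(self._text)
            self.triples.append(("<>", self._open_property, literal))
            self._open_property = None


extractor = TinyRDFaExtractor()
extractor.feed(PAGE)
extractor.close()
for triple in extractor.triples:
    print(triple)
```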

But what about our agents?

A Google for knowledge on the Semantic Web is needed by software agents and programs

[Figure: agents exchanging "tell" and "register" messages, with "Swoogle" repeated as a label throughout]

This talk
• Motivation
• Semantic web 101
• Swoogle Semantic Web search engine
• Use cases and applications
• Conclusions

• http://swoogle.umbc.edu/
• Running since summer 2004
• 1.4M RDF documents, 250M RDF triples, 10K ontologies
• Semantic Web archive: many dynamic RDF documents

Swoogle Architecture

[Figure: architecture diagram with stages Discovery, Analysis, Index and Search Services. Labels: Candidate URLs, Bounded Web Crawler, Google Crawler, SwoogleBot, SWD classifier, Ranking, SWD Indexer, IR Indexer, document cache, Semantic Web metadata, Web Server, Web Service, human, machine, html, rdf/xml, the Web, Semantic Web. Legend: information flow; Swoogle’s web interface.]
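
The architecture names an "SWD classifier" step that separates Semantic Web documents (SWDs) from other candidates. The slides do not say how it works, so the Python sketch below uses a deliberately crude stand-in rule: a document counts as an SWD if it is well-formed XML whose root element is rdf:RDF. The function names and example URLs are hypothetical.

```python
import xml.etree.ElementTree as ET

RDF_ROOT = "{http://www.w3.org/1999/02/22-rdf-syntax-ns#}RDF"


def looks_like_swd(content):
    """Crude stand-in test: well-formed XML with an rdf:RDF root element."""
    try:
        root = ET.fromstring(content)
    except ET.ParseError:
        return False
    return root.tag == RDF_ROOT


def classify(candidates):
    """Split a {url: content} mapping into (swd_urls, non_swd_urls)."""
    swd, nswd = [], []
    for url, content in candidates.items():
        (swd if looks_like_swd(content) else nswd).append(url)
    return swd, nswd


# Hypothetical candidate URLs with already-fetched content.
CANDIDATES = {
    "http://example.org/people.rdf": (
        '<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"/>'
    ),
    "http://example.org/index.html": "<html><body>Hello<br></body></html>",
}

SWD_URLS, NSWD_URLS = classify(CANDIDATES)
print("swd:", SWD_URLS)
print("nswd:", NSWD_URLS)
```

A real classifier would also need to handle other RDF serializations and RDF embedded in other documents, which this rule ignores.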

A Hybrid Harvesting Framework

[Figure: harvesting framework over the Web. Labels: Manual submission, Inductive learner, Swoogle Sample Dataset, Seeds M, Seeds H, Seeds R, Meta crawling (Google API call), Bounded HTML crawling (crawl), RDF crawling (crawl).]

Performance – crawlers’ contribution
• High SWD ratio: 42% of URLs are confirmed as SWDs
• Consistent growth rate: 3000 SWDs per day
• RDF crawler: best harvesting method
• HTML crawler: best accuracy
• Meta crawler: best in detecting websites

[Figure: bar chart of the number of documents (0 to 1,500,000) for html crawler, meta crawler, rdf crawler and swoogle2, broken down into swd, nswd, failed and unpinged]

This talk
• Motivation
• Swoogle overview
• Bots navigate the Semantic Web
• Ranking Semantic Web content
• Use cases and applications
• Conclusions

Applications and use cases
• Supporting Semantic Web developers
  – Ontology designers, vocabulary discovery, who’s using my ontologies or data?, use analysis, errors, statistics, etc.
• Searching specialized collections
  – Spire: aggregating observations and data from biologists
  – InferenceWeb: searching over and enhancing proofs
  – SemNews: Text Meaning of news stories
• Supporting SW tools
  – Triple shop: finding data for SPARQL queries

Web-scale semantic web data access

[Figure: message flow among an agent, a data access service and the Web. Labels: ask (“person”), Search vocabulary, inform (“foaf:Person”), ask (“?x rdf:type foaf:Person”), Search URIrefs in SW vocabulary, Search URLs in SWD index, inform (doc URLs), Compose query, Fetch docs, Populate RDF database, Query local RDF database, Index RDF data.]
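
The ask/inform exchange in this diagram can be sketched in Python as follows. This is an in-memory toy under stated assumptions: the "Web" is a dictionary of hypothetical URLs mapped to triples, the class and method names are invented, and term search is plain substring matching rather than anything Swoogle actually does.

```python
# Hypothetical "Web": document URL -> triples it contains.
WEB = {
    "http://example.org/li.rdf": [
        ("ex:li", "rdf:type", "foaf:Person"),
        ("ex:li", "foaf:name", "Li Ding"),
    ],
    "http://example.org/uni.rdf": [
        ("ex:umbc", "rdf:type", "uni:School"),
    ],
}


class DataAccessService:
    """Indexes RDF data and answers vocabulary and document searches."""

    def __init__(self, web):
        self.term_index = {}  # term -> set of URLs of documents using it
        for url, triples in web.items():
            for triple in triples:
                for term in triple:
                    self.term_index.setdefault(term, set()).add(url)

    def search_vocabulary(self, keyword):
        """ask("person") -> inform(matching terms)."""
        return sorted(t for t in self.term_index if keyword.lower() in t.lower())

    def search_documents(self, pattern):
        """ask(triple pattern) -> inform(URLs using every constant in it)."""
        urls = None
        for term in pattern:
            if term.startswith("?"):
                continue  # variables do not constrain the document search
            found = self.term_index.get(term, set())
            urls = found if urls is None else urls & found
        return sorted(urls or [])


def match(pattern, triples):
    """Agent side: bind the pattern's variables against a local triple store."""
    results = []
    for triple in triples:
        binding = {}
        for want, got in zip(pattern, triple):
            if want.startswith("?"):
                if binding.setdefault(want, got) != got:
                    break
            elif want != got:
                break
        else:
            results.append(binding)
    return results


service = DataAccessService(WEB)
terms = service.search_vocabulary("person")        # search vocabulary
pattern = ("?x", "rdf:type", terms[0])             # compose query
urls = service.search_documents(pattern)           # doc URLs
local_db = [t for url in urls for t in WEB[url]]   # fetch docs, populate database
answers = match(pattern, local_db)                 # query local RDF database
print(terms, urls, answers)
```

The design point carried over from the slide is that the service only returns terms and document URLs; the agent fetches the documents and runs the query against its own local store.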

UMBC Triple Shop
• Online SPARQL RDF query processing based on HP’s Joseki with two features
• Selectable level of reasoning (inference)
• Automatically finds SWDs for given queries using Swoogle backend database
  – Provide dataset creation wizard and server-side dataset storage
  – Tag and share saved datasets

SPARQL: a query language for getting information from RDF graphs (dataset)

UMBC Triple Shop

Querying the Semantic Web is as easy as shopping
(1) Go to http://sparql.cs.umbc.edu/
(2) You provide a SPARQL query and constraints on what sources to use
(3) Swoogle finds and suggests documents with relevant data, producing a dataset
(4) You specify the amount of reasoning to do, possibly resulting in an enhanced dataset
(5) We run the query and give you the results
(6) You can also download the dataset or save it on the server and give it tags

This talk
• Motivation
• Swoogle overview
• Bots navigate the Semantic Web
• Ranking Semantic Web content
• Use cases and applications
• Conclusions

Will it Scale? How?

Here’s a rough estimate of the data in RDF documents on the semantic web based on Swoogle’s crawling

System/date  Terms     Documents  Individuals  Triples   Bytes
Swoogle2     1.5x10^5  3.5x10^5   7x10^6       5x10^7    7x10^9
Swoogle3     2x10^5    7x10^5     1.5x10^7     7.5x10^7  1x10^10
2006         1x10^6    5x10^7     5x10^7       5x10^9    5x10^11
2008         5x10^6    5x10^9     5x10^9       5x10^11   5x10^13

We think Swoogle’s centralized approach can be made to work for the next few years if not longer.
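
To get a feel for what these estimates imply, the short Python sketch below computes two derived quantities from the table: bytes per triple in each row, and the growth factor between rows. The numbers are copied from the table; the function names are made up for this example.

```python
# Estimates copied from the table above.
ESTIMATES = {
    "Swoogle2": {"documents": 3.5e5, "triples": 5e7, "bytes": 7e9},
    "Swoogle3": {"documents": 7e5, "triples": 7.5e7, "bytes": 1e10},
    "2006": {"documents": 5e7, "triples": 5e9, "bytes": 5e11},
    "2008": {"documents": 5e9, "triples": 5e11, "bytes": 5e13},
}


def bytes_per_triple(system):
    """Average storage per triple implied by one row of the table."""
    row = ESTIMATES[system]
    return row["bytes"] / row["triples"]


def growth(earlier, later, column):
    """Factor by which a column grows between two rows."""
    return ESTIMATES[later][column] / ESTIMATES[earlier][column]


for system in ESTIMATES:
    print(system, "bytes per triple:", round(bytes_per_triple(system), 1))

print("triples, Swoogle3 to 2008: x%.0f" % growth("Swoogle3", "2008", "triples"))
print("triples, 2006 to 2008: x%.0f" % growth("2006", "2008", "triples"))
```

The per-triple size stays in the range of 100 to 140 bytes across all rows, while the projected triple count for 2008 is several thousand times the Swoogle3 figure, so the scaling question is mainly about the number of triples rather than their size.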

How much reasoning?
• SwoogleN (N<=3) does limited reasoning
  – It’s expensive
  – It’s not clear how much should be done
• More reasoning would benefit many use cases
  – e.g., type hierarchy
• Recognizing specialized metadata
  – e.g., that ontology A maps some terms from B to C

Conclusion
• The web will contain the world’s knowledge in forms accessible to people and computers
  – We need better ways to discover, index, search and reason over SW knowledge
• SW search engines address different tasks than html search engines
  – So they require different techniques and APIs
• Swoogle-like systems can help create consensus ontologies and foster best practices
  – Swoogle is for Semantic Web 1.0
  – Semantic Web 2.0 will make different demands

For more information

http://ebiquity.umbc.edu/

Annotated in OWL
