@ Presented by eBiquity group, UMBC CIKM’04, Nov 12, 2004 Swoogle search and metadata for the semantic web Partial research support was provided by DARPA contract F30602-00-0591 and by NSF by awards NSF-ITR-IIS-0326460 and NSF-ITR-IDM-0219649.
Jan 20, 2016

Akanladi Elijah

Page 1: Presented by eBiquity group, UMBC CIKM’04, Nov 12, 2004

Presented by eBiquity group, UMBC

CIKM’04, Nov 12, 2004

Swoogle: search and metadata for the semantic web

Partial research support was provided by DARPA contract F30602-00-0591 and by NSF by awards NSF-ITR-IIS-0326460 and NSF-ITR-IDM-0219649.

Page 2: Presented by eBiquity group, UMBC CIKM’04, Nov 12, 2004

Swoogle, cikm'04 -- http://swoogle.umbc.edu/

Outline
• Motivation
• Concepts
• Demo
• Architecture
  • document discovery
  • metadata creation
  • ontology rank
• Status
• Summary

http://swoogle.umbc.edu/

Page 3: Presented by eBiquity group, UMBC CIKM’04, Nov 12, 2004

Motivation
• (Google + Web) has made us all smarter.
• Something similar is needed by people and software agents for information on the semantic web.

Page 4: Presented by eBiquity group, UMBC CIKM’04, Nov 12, 2004

Motivation – Common Questions
• Find an ontology
  • What are the ontologies about “time”?
  • Shall I use an existing ontology or create one?
• Find instance data
  • Show me the instances of a class “http://foo.com/Person”?
  • Gather relevant information for my application.
• Characterize the Semantic Web
  • How many RDF documents are online?
  • What are the most popular ontologies?
  • What graph properties does the semantic web have?
  • Does a namespace URI link to the corresponding ontology?

Page 5: Presented by eBiquity group, UMBC CIKM’04, Nov 12, 2004

The Role of Swoogle in Semantic Web

[Figure: diagram of Semantic Web services. Node labels: Software Agents, Applications; Data Service; SW data service; database; (Web) document; RDF document; Directory/Digest Service; Service Finder; Data Finder (Swoogle). Edge labels: uses, digests, searches.]

Page 6: Presented by eBiquity group, UMBC CIKM’04, Nov 12, 2004

Related work
• Ontology based annotation & search
  • Annotate web documents: SHOE (UMCP, 1997), Ontobroker (AIFB, Karlsruhe, 1998), WebKB (Martin & Eklund, 1999), QuizRDF (BT, 2002)
  • Annotate proper reference & relations: CREAM (AIFB, 2003)
• Ontology repositories
  • Ontology level: DAML Ontology Library, Schema Web, SemWebCentral
  • Term level: W3C’s Ontaria (2004)
• Ontology management systems
  • Stanford’s Ontolingua
  • IBM’s Snobase

Based on both ontology and instance document

Automated discovery

Search and rank ontologies and terms

Digest but not store

Create metadata based on RDF and OWL semantics

Provide services to both human and software agents

Swoogle aims to be a Google-like online ontology repository

Page 7: Presented by eBiquity group, UMBC CIKM’04, Nov 12, 2004

Concepts
• Document
  • A Semantic Web Document (SWD) is an online document written in semantic web languages (i.e. RDF and OWL).
  • An ontology document (SWO) is a SWD that contains mostly term definitions (i.e. classes and properties). It corresponds to the T-Box in Description Logic.
  • An instance document (SWI or SWDB) is a SWD that contains mostly class individuals. It corresponds to the A-Box in Description Logic.
• Term
  • A term is a non-anonymous RDF resource which is the URI reference of either a class or a property.
• Individual
  • An individual refers to a non-anonymous RDF resource which is the URI reference of a class member.

In Swoogle, a document D is a valid SWD iff JENA* correctly parses D and produces at least one triple.

*JENA is a Java framework for writing Semantic Web applications. http://www.hpl.hp.com/semweb/jena2.htm

[Figure: example triples]
foaf:Person rdf:type rdfs:Class
http://.../foaf.rdf#finin rdf:type foaf:Person

Page 8: Presented by eBiquity group, UMBC CIKM’04, Nov 12, 2004

Concepts Example

[Figure: an example SWD pair. Terms are classes and properties; an SWD is either an SWO or an SWI.]

SWO: http://xmlns.com/foaf/1.0/
  foaf:Person rdf:type rdfs:Class (Class)
  foaf:Person rdfs:subClassOf wordNet:Agent
  foaf:mbox rdf:type rdf:Property (Property)
  foaf:mbox rdfs:domain foaf:Person

SWI:
  http://foo.com/foaf.rdf#finin rdf:type foaf:Person (Individual)
  http://foo.com/foaf.rdf#finin foaf:mbox [email protected]

NOTE: Qualified Names (QNames) are used to shorten well-known namespaces as follows:

rdf: => http://www.w3.org/1999/02/22-rdf-syntax-ns#
rdfs: => http://www.w3.org/2000/01/rdf-schema#
foaf: => http://xmlns.com/foaf/1.0/
wordNet: => http://xmlns.com/wordnet/1.6/
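The QName convention in the note can be illustrated with a small Python sketch. The prefix table copies the namespaces listed on the slide (with the standard trailing `#` on the RDF and RDF Schema namespaces); the function name `expand_qname` is hypothetical and is not part of Swoogle.

```python
# Prefix table taken from the slide's NOTE; expand_qname is an illustrative helper.
NAMESPACES = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
    "foaf": "http://xmlns.com/foaf/1.0/",
    "wordNet": "http://xmlns.com/wordnet/1.6/",
}


def expand_qname(qname, namespaces=NAMESPACES):
    """Expand 'prefix:local' into a full URI reference."""
    prefix, sep, local = qname.partition(":")
    if not sep or prefix not in namespaces:
        raise ValueError(f"unknown or missing prefix in {qname!r}")
    return namespaces[prefix] + local


print(expand_qname("foaf:Person"))
print(expand_qname("rdf:type"))
```

Prefixes are matched exactly, so `foaf:person` and `foaf:Person` expand to different URI references.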

Page 9: Presented by eBiquity group, UMBC CIKM’04, Nov 12, 2004

Demo
1. Find “Time” Ontology (Swoogle Search)
2. Digest “Time” Ontology
   • Document view
   • Term view
3. Find Term “Person” (Ontology Dictionary)
4. Digest Term “Person”
   • Class properties
   • (Instance) properties
5. Swoogle Statistics

Page 10: Presented by eBiquity group, UMBC CIKM’04, Nov 12, 2004

Find “Time” Ontology

We can use a set of keywords to search for an ontology. For example, “time”, “before”, and “after” are basic concepts of a “Time” ontology.

Demo1

Page 11: Presented by eBiquity group, UMBC CIKM’04, Nov 12, 2004

Usage of Terms in SWD

[Figure: the example graph annotated with term-usage labels.]

Ontology document: http://xmlns.com/foaf/1.0/
  foaf:Person rdf:type rdfs:Class (defined Class)
  foaf:Person rdfs:subClassOf wordNet:Agent
  foaf:mbox rdf:type rdf:Property (defined Property)
  foaf:mbox rdfs:domain foaf:Person

Instance document (labelled http://www.cs.umbc.edu/~finin/foaf.rdf and http://foo.com/foaf.rdf in the slide):
  http://foo.com/foaf.rdf#finin rdf:type foaf:Person (populated Class, defined Individual)
  http://foo.com/foaf.rdf#finin foaf:mbox [email protected] (populated Property)
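A minimal Python sketch of the usage labels above, under stated assumptions: a class or property counts as "defined" when a document types it as `rdfs:Class`, `owl:Class`, or `rdf:Property`, a class is "populated" when something is typed with it, and a property is "populated" when it appears as a predicate. This reading is inferred from the figure labels, not from Swoogle's code. Triples are plain `(subject, predicate, object)` tuples of QName strings, and the mailbox value is a placeholder.

```python
# Assumed vocabulary for recognising definitions (not an exhaustive list).
CLASS_TYPES = {"rdfs:Class", "owl:Class"}
PROPERTY_TYPES = {"rdf:Property"}


def term_usage(triples):
    """Label how one document uses terms, following the slide's five labels."""
    usage = {
        "defined_class": set(),
        "populated_class": set(),
        "defined_property": set(),
        "populated_property": set(),
        "defined_individual": set(),
    }
    for subject, predicate, obj in triples:
        if predicate == "rdf:type":
            if obj in CLASS_TYPES:
                usage["defined_class"].add(subject)
            elif obj in PROPERTY_TYPES:
                usage["defined_property"].add(subject)
            else:
                # Typing a resource with a class populates that class.
                usage["populated_class"].add(obj)
                usage["defined_individual"].add(subject)
        else:
            usage["populated_property"].add(predicate)
    return usage


# Instance document from the slide; the mailbox literal is a placeholder.
swi = [
    ("http://foo.com/foaf.rdf#finin", "rdf:type", "foaf:Person"),
    ("http://foo.com/foaf.rdf#finin", "foaf:mbox", "mailto:someone@example.org"),
]
print(term_usage(swi))
```

Note that this simple rule also counts schema predicates such as `rdfs:subClassOf` as populated properties when it is run over an ontology document.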

Page 12: Presented by eBiquity group, UMBC CIKM’04, Nov 12, 2004

Digest “Time” Ontology (term view)

Demo2(a)

………….

TimeZone

before

intAfter

Page 13: Presented by eBiquity group, UMBC CIKM’04, Nov 12, 2004

Document Metadata
• Web document metadata
  • When/how discovered/fetched
  • Suffix of URL
  • Last modified time
  • Document size
• SWD metadata
  • Language features
    • OWL species
    • RDF encoding
  • Statistical features
    • Defined/used terms
    • Declared/used namespaces
    • Ontology Ratio
    • Ontology Rank
  • Ontology annotation
    • Label
    • Version
    • Comment
• Related Relational Metadata
  • Links to other SWDs
    • Imported SWDs
    • Referenced SWDs
    • Extended SWDs
    • Prior version
  • Links to terms
    • Classes/Properties defined/used

Page 14: Presented by eBiquity group, UMBC CIKM’04, Nov 12, 2004

Digest “Time” Ontology (document view)

Demo2(b)

Page 15: Presented by eBiquity group, UMBC CIKM’04, Nov 12, 2004

Find Term “Person”

Demo3

Not capitalized! URIref is case sensitive!

Page 16: Presented by eBiquity group, UMBC CIKM’04, Nov 12, 2004

Term Metadata: An integrated definition

Class Definition
• rdfs:subClassOf -- foaf:Agent
• rdfs:label -- “Person”

Properties (from SWI)
• foaf:name
• dc:title

Properties (from SWO)
• foaf:mbox
• foaf:name

[Figure: RDF graph combining three documents (Onto 1, Onto 2, SWD3) around foaf:Person. Edges shown: rdf:type owl:Class; rdfs:label “Person”; rdfs:subClassOf foaf:Agent; rdfs:domain from foaf:name and from foaf:mbox; an instance with rdf:type foaf:Person and foaf:name “Tim Finin”.]
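One way to picture the integrated definition is to merge triples from several documents and group what they say about a single class. The sketch below is illustrative only: the function name, the tuple representation, the `ex:finin` identifier, and the split of triples across the three documents are assumptions made for the example, not Swoogle's data model.

```python
def integrated_definition(term, documents):
    """Collect what a set of documents says about one class.

    documents maps a document name to a list of (subject, predicate, object).
    """
    definition = set()        # (predicate, object) pairs stated about the class
    properties_swo = set()    # properties declared with rdfs:domain == term
    instances = set()         # resources typed with the class
    for triples in documents.values():
        for s, p, o in triples:
            if s == term:
                definition.add((p, o))
            if p == "rdfs:domain" and o == term:
                properties_swo.add(s)
            if p == "rdf:type" and o == term:
                instances.add(s)
    # Properties actually used on instances of the class (from instance data).
    properties_swi = {
        p
        for triples in documents.values()
        for s, p, o in triples
        if s in instances and p != "rdf:type"
    }
    return {
        "class_definition": sorted(definition),
        "properties_from_swo": sorted(properties_swo),
        "properties_from_swi": sorted(properties_swi),
    }


# Hypothetical split of the slide's triples across three documents.
docs = {
    "Onto 1": [
        ("foaf:Person", "rdf:type", "owl:Class"),
        ("foaf:Person", "rdfs:label", "Person"),
        ("foaf:Person", "rdfs:subClassOf", "foaf:Agent"),
    ],
    "Onto 2": [
        ("foaf:name", "rdfs:domain", "foaf:Person"),
        ("foaf:mbox", "rdfs:domain", "foaf:Person"),
    ],
    "SWD3": [
        ("ex:finin", "rdf:type", "foaf:Person"),
        ("ex:finin", "foaf:name", "Tim Finin"),
    ],
}
print(integrated_definition("foaf:Person", docs))
```

Keeping the two property lists separate mirrors the slide's distinction between properties declared in ontologies and properties observed on instances.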

Page 17: Presented by eBiquity group, UMBC CIKM’04, Nov 12, 2004

Digest Term “Person”

Demo4

167 different properties

562 different properties

Page 18: Presented by eBiquity group, UMBC CIKM’04, Nov 12, 2004

Demo5

Swoogle Statistics

Page 19: Presented by eBiquity group, UMBC CIKM’04, Nov 12, 2004

Swoogle Architecture

[Figure: architecture diagram. Stages: SWD discovery, metadata creation, data analysis, interface. Components: The Web, Candidate URLs, Web Crawler, SWD Reader, SWD Cache, SWD Metadata, IR analyzer, SWD analyzer, Web Server, Web Service, Agent Service.]

Page 20: Presented by eBiquity group, UMBC CIKM’04, Nov 12, 2004

1. SWD Discovery
• Swoogle uses three crawlers to discover likely SWD URLs
  • A Google Crawler uses Google to find URLs using
    • keywords: http://www.w3.org/2000/01/rdf-schema,...
    • file type suffixes: .rdf, .owl
  • A Focused Crawler crawls through HTML files recursively within a given website.
  • A SWD Crawler crawls through SWDs and discovers URLs according to term semantics.
• To determine the likely SWD URLs:
  • Non-SWD extension filter: .jpg, .mp3, etc.
  • Protocol filter: file://, urn:, etc.
  • Namespace of RDF resources in SWD
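The extension and protocol filters can be sketched in a few lines of Python using only the standard library. The filter sets contain just the examples named on the slide (the slide's "etc." implies longer lists), and `is_candidate_swd_url` is a hypothetical name, not a Swoogle function.

```python
import posixpath
from urllib.parse import urlparse

# Only the examples named on the slide; real filter lists would be longer.
NON_SWD_EXTENSIONS = {".jpg", ".mp3"}
REJECTED_SCHEMES = {"file", "urn"}


def is_candidate_swd_url(url):
    """Return True if the URL survives the extension and protocol filters."""
    parsed = urlparse(url)
    if parsed.scheme.lower() in REJECTED_SCHEMES:
        return False
    extension = posixpath.splitext(parsed.path)[1].lower()
    return extension not in NON_SWD_EXTENSIONS


for url in [
    "http://xmlns.com/foaf/1.0/",
    "http://example.org/photo.JPG",
    "urn:isbn:0451450523",
    "file:///tmp/foaf.rdf",
]:
    print(url, is_candidate_swd_url(url))
```

Passing these filters only makes a URL a candidate; per the Concepts slide, a document counts as an SWD only once it parses and yields at least one triple.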

Page 21: Presented by eBiquity group, UMBC CIKM’04, Nov 12, 2004

2. Metadata Creation
• Document metadata
  • General metadata
  • SWD metadata
  • Ontology metadata
• Term Metadata (definition)
  • Class property
  • (Instance) property: i.e. class-property bond
• Relational metadata

             | Term                           | Document
  Term       | rdfs:subClassOf, rdfs:domain…  | rdfs:seeAlso, …
  Document   | Uses, Defines,…                | owl:imports,…

Page 22: Presented by eBiquity group, UMBC CIKM’04, Nov 12, 2004

2.1 Ontology Ratio
• Why?
  • The fuzzy distinction between ontology and instance document
• Given a SWD foo, let
  • C(foo): the set of classes defined in foo
  • P(foo): the set of properties defined in foo
  • I(foo): the set of instances defined in foo

$$\mathrm{Ratio}_{\mathrm{ontology}}(foo) = \frac{|C(foo)| + |P(foo)|}{|C(foo)| + |P(foo)| + |I(foo)|}$$

• Ontology Ratio as a heuristic to do the classification
  • 0: pure SWI
  • 1: pure SWO
  • > 0.8: foo is said to be an ontology.
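As a worked illustration of the ratio, here is a small Python sketch. It takes the three counts |C(foo)|, |P(foo)|, |I(foo)| as plain integers; the function names are hypothetical, the 0.8 threshold is the one quoted on the slide, and the example counts are made up.

```python
def ontology_ratio(num_classes, num_properties, num_instances):
    """(|C| + |P|) / (|C| + |P| + |I|) for one document."""
    total = num_classes + num_properties + num_instances
    if total == 0:
        raise ValueError("document defines no classes, properties, or instances")
    return (num_classes + num_properties) / total


def is_ontology(ratio, threshold=0.8):
    """Apply the slide's heuristic: a ratio above 0.8 means 'ontology'."""
    return ratio > threshold


# Made-up counts: 12 classes, 6 properties, 2 instances.
ratio = ontology_ratio(12, 6, 2)
print(ratio, is_ontology(ratio))
```

A document with only definitions scores 1 (pure SWO) and one with only instances scores 0 (pure SWI), matching the two end points listed on the slide.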

Page 23: Presented by eBiquity group, UMBC CIKM’04, Nov 12, 2004

2.2 Relational Metadata
• Inter-document relation
  • rdfs:seeAlso
  • IMport (IM), e.g. owl:imports
  • Similar/Equal SWD
• Inter-term relation
  • EXtension (EX), e.g. rdfs:subClassOf
  • use-TerM (TM), e.g. rdfs:range
  • use-INdividual (IN), e.g. owl:sameAs
  • Prior Version (PV, IPV, CPV)
• Generalized inter-document relations
  • Generalized from individual-level relations
  • Capture more relations with less complexity
• Usage
  • Link SWDs
  • Ontology rank
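The idea of generalizing term-level statements into document-level links can be sketched as follows. This is a simplified illustration, not Swoogle's actual rule set: the predicate table contains only the four examples from the slide (using the standard spellings `owl:imports` and `rdfs:range`), and the `defined_in` lookup from a term to the document that defines it is a hypothetical input.

```python
# Only the slide's four examples; the real mapping covers more predicates.
RELATION_OF_PREDICATE = {
    "owl:imports": "IM",
    "rdfs:subClassOf": "EX",
    "rdfs:range": "TM",
    "owl:sameAs": "IN",
}


def generalize(doc, triples, defined_in):
    """Lift term-level triples in `doc` to (source, relation, target) links."""
    relations = set()
    for _subject, predicate, obj in triples:
        kind = RELATION_OF_PREDICATE.get(predicate)
        if kind is None:
            continue
        # owl:imports names a document directly; other objects are terms.
        target = obj if kind == "IM" else defined_in.get(obj)
        if target is not None and target != doc:
            relations.add((doc, kind, target))
    return relations


foaf = "http://xmlns.com/foaf/1.0/"
wordnet = "http://xmlns.com/wordnet/1.6/"
links = generalize(
    foaf,
    [("foaf:Person", "rdfs:subClassOf", "wordNet:Person")],
    defined_in={"wordNet:Person": wordnet, "foaf:Person": foaf},
)
print(links)
```

Many term-level statements between the same pair of documents collapse into a single typed link, which is the sense in which the generalized relations carry less complexity.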

Page 24: Presented by eBiquity group, UMBC CIKM’04, Nov 12, 2004

3. Data analysis: Ranking SWD
• Why?
  • Ranking captures page importance and popularity
  • Ranking has been proven useful in HTML search.
  • SWD is different from HTML and has more semantics
  • So, a new SWD ranking mechanism is needed!
• Related ideas?
  • Google’s PageRank
  • Kleinberg’s HITS

[Figure labels: SWOs, SWIs, HTML documents, Images, Audio files, Video files]

Page 25: Presented by eBiquity group, UMBC CIKM’04, Nov 12, 2004

3.1 Random surfer model (PageRank)
• How is PageRank computed?
  • Page A’s rank is

$$PR(A) = (1-d) + d \sum_{i=1}^{n} \frac{PR(T_i)}{C(T_i)}$$

    where {T_i} are the pages that link to A, C(X) is the number of page X’s out-links, and d is a damping factor (e.g., 0.85).
  • Compute by iterating until convergence.
• Uniform probability of following any link is the convention on the Web but not in the SW
  • Links have semantics that influence the probability of following them
  • Rational users read an ontology and all ontologies it references.

[Flowchart nodes: Jump to a random page; read page; bored? (yes/no); Follow a random link]
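The iteration can be sketched in plain Python. This is a minimal illustration of the formula PR(A) = (1 - d) + d * sum(PR(Ti) / C(Ti)), not Swoogle's or Google's implementation; the graph representation, the starting value of 1.0, and the stopping tolerance are assumptions.

```python
def pagerank(out_links, d=0.85, tol=1e-10, max_iter=1000):
    """Iterate PR(A) = (1 - d) + d * sum(PR(T) / C(T)) over pages T linking to A.

    out_links maps each page to the list of pages it links to.
    """
    pages = set(out_links) | {t for targets in out_links.values() for t in targets}
    rank = {page: 1.0 for page in pages}
    for _ in range(max_iter):
        new_rank = {page: 1.0 - d for page in pages}
        for source, targets in out_links.items():
            if not targets:
                continue
            # Each out-link carries an equal share of the source's rank.
            share = d * rank[source] / len(targets)
            for target in targets:
                new_rank[target] += share
        if max(abs(new_rank[p] - rank[p]) for p in pages) < tol:
            return new_rank
        rank = new_rank
    return rank


# Two pages that both link to a third one.
print(pagerank({"A": ["C"], "B": ["C"]}))
```

In this form every link out of a page is followed with the same probability, which is exactly the uniformity assumption the slide questions for the Semantic Web.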

Page 26: Presented by eBiquity group, UMBC CIKM’04, Nov 12, 2004

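3.2 Rational Random Surfer Model

• Weighted random behavior

$$\mathrm{rawPR}(A) = (1-d) + d \sum_{i=1}^{n} \mathrm{rawPR}(X_i)\,\frac{\mathrm{flow}(X_i, A)}{\mathrm{flow}(X_i)}$$

$$\mathrm{flow}(X_i, A) = \sum_{l \in \mathrm{links}(X_i, A)} \mathrm{weight}(l)$$

$$\mathrm{flow}(X_i) = \sum_{j=1}^{m} \mathrm{flow}(X_i, A_j)$$

• Rational behavior
  • Rank of a SWI

$$PR(A) = \mathrm{rawPR}(A)$$

  • Rank of a SWO

$$PR(A) = \sum_{X_i \in TC(A)} \mathrm{rawPR}(X_i)$$

    where TC(A) is the transitive closure of SWOs referencing A.

[Flowchart nodes: Jump to a random page; read page; SWO? (yes/no); Read referenced SWOs; bored? (yes/no); Follow a random link]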
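The weighted random step can be sketched in Python as below. It follows the rawPR and flow definitions on this slide; the graph representation, the starting value, the tolerance, and the example weights are illustrative assumptions, since the slide does not give the weight assigned to each link type.

```python
def raw_pagerank(links, d=0.85, tol=1e-10, max_iter=1000):
    """Iterate rawPR(A) = (1 - d) + d * sum(rawPR(X) * flow(X, A) / flow(X)).

    links maps source -> {target: [weight of each link from source to target]}.
    """
    nodes = set(links) | {t for targets in links.values() for t in targets}
    # flow(X, A): total weight of all links from X to A.
    flow = {x: {a: sum(ws) for a, ws in links.get(x, {}).items()} for x in nodes}
    # flow(X): total weight of all links leaving X.
    out_flow = {x: sum(flow[x].values()) for x in nodes}
    rank = {n: 1.0 for n in nodes}
    for _ in range(max_iter):
        new_rank = {}
        for a in nodes:
            incoming = sum(
                rank[x] * flow[x][a] / out_flow[x] for x in nodes if a in flow[x]
            )
            new_rank[a] = (1.0 - d) + d * incoming
        if max(abs(new_rank[n] - rank[n]) for n in nodes) < tol:
            return new_rank
        rank = new_rank
    return rank


# Arbitrary example weights: X links to A with total weight 3 and to B with weight 1.
print(raw_pagerank({"X": {"A": [2.0, 1.0], "B": [1.0]}}))
```

Compared with the uniform model, a heavier link type simply receives a larger fraction of the source document's rank.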

Page 27: Presented by eBiquity group, UMBC CIKM’04, Nov 12, 2004

3.3 Ontology Rank Example

[Figure: four linked documents and the triples they contain.]

http://www.cs.umbc.edu/~finin/foaf.rdf
  an individual with rdf:type foaf:Person and foaf:mbox [email protected]

http://xmlns.com/foaf/1.0/
  foaf:Person rdf:type rdfs:Class
  foaf:Person rdfs:subClassOf wordNet:Person

http://xmlns.com/wordnet/1.6/
  wordNet:Person rdf:type rdfs:Class
  wordNet:Person rdfs:subClassOf wordNet:Individual

http://www.w3.org/2000/01/rdf-schema
  rdfs:subClassOf rdf:type rdf:Property

Inter-document links shown: three labelled TM and one labelled EX.

Page 28: Presented by eBiquity group, UMBC CIKM’04, Nov 12, 2004

3.3 Ontology Rank Example (cont’d)

[Figure: four documents (http://www.cs.umbc.edu/~finin/foaf.rdf, http://xmlns.com/wordnet/1.6/, http://xmlns.com/foaf/1.0/, http://www.w3.org/2000/01/rdf-schema) joined by one EX link and three TM links, annotated with rank values.]

rawPR values shown: 0.2, 100, 3, 300
PR values shown: 0.2, 100, 103, 403
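The PR values can be reproduced from the rawPR values with a short Python sketch. Two assumptions are made here and should be read as such: first, the assignment of values to documents (foaf.rdf 0.2, FOAF 100, WordNet 3, RDF Schema 300) and the direction of the links are one reading of the diagram that is consistent with the rank formulas; second, TC(A) is taken to include A itself, which is what makes 403 = 300 + 100 + 3 work out.

```python
def ontology_rank(raw_pr, references, is_swo):
    """PR(A) = rawPR(A) for an SWI; for an SWO, sum rawPR over A plus every
    SWO that references A directly or transitively."""
    # Invert the reference graph, keeping only SWO sources.
    referrers = {}
    for source, targets in references.items():
        if not is_swo[source]:
            continue
        for target in targets:
            referrers.setdefault(target, set()).add(source)
    rank = {}
    for doc in raw_pr:
        if not is_swo[doc]:
            rank[doc] = raw_pr[doc]
            continue
        closure, stack = {doc}, [doc]
        while stack:
            current = stack.pop()
            for referrer in referrers.get(current, ()):
                if referrer not in closure:
                    closure.add(referrer)
                    stack.append(referrer)
        rank[doc] = sum(raw_pr[x] for x in closure)
    return rank


# Assumed reading of the slide's diagram (see the note above).
raw_pr = {"finin_foaf": 0.2, "foaf": 100, "wordnet": 3, "rdfs": 300}
is_swo = {"finin_foaf": False, "foaf": True, "wordnet": True, "rdfs": True}
references = {
    "finin_foaf": {"foaf"},        # TM
    "foaf": {"wordnet", "rdfs"},   # EX, TM
    "wordnet": {"rdfs"},           # TM
}
print(ontology_rank(raw_pr, references, is_swo))
```

Under this reading the instance document keeps its raw rank, while each ontology accumulates the raw rank of the ontologies that build on it.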

Page 29: Presented by eBiquity group, UMBC CIKM’04, Nov 12, 2004

Current Status
• Swoogle Watch reported (Nov 7, 2004)
  • 40 M triples
  • 270 K SWDs: 4 K ontologies
  • 144 K terms: 91 K classes & 51 K properties
• Ongoing work
  • Ontology Dictionary
  • Swoogle Statistics
  • Web Service interface (see Swoogle website)
  • IR with the Semantic Web (Content search)
    • Character N-Grams
    • Bag of URIrefs
    • Swangling

Page 30: Presented by eBiquity group, UMBC CIKM’04, Nov 12, 2004

Summary

Swoogle (Mar, 2004)
• Automated SWD discovery
• SWD metadata creation and search
• Ontology rank (rational surfer model)
• Swoogle watch
• Web Interface

Swoogle2 (Sep, 2004)
• Ontology dictionary
• Swoogle statistics
• Web service interface (WSDL)
• Bag of URIref IR search

Swoogle3
• Better crawl & refresh strategies
• More metadata (ontology mapping)
• More IR features
• Better web service interfaces
• Capture and store all triples
• More reasoning

[Timeline axis: 2004, 2005]

Page 31: Presented by eBiquity group, UMBC CIKM’04, Nov 12, 2004

The End

Website: http://swoogle.umbc.edu
Slides at: http://ebiquity.umbc.edu/v2.1/resource/html/id/66/
Demo: http://ebiquity.umbc.edu/v2.1/resource/html/id/65/