UMBC, an Honors University in Maryland
Search Engines for Semantic Web Knowledge
Tim Finin
University of Maryland, Baltimore County
Joint work with Li Ding, Anupam Joshi, Yun Peng, Pranam Kolari, Pavan Reddivari, Sandor Dornbush, Rong Pan, Akshay Java, Joel Sachs, Scott Cost and Vishal Doshi
http://creativecommons.org/licenses/by-nc-sa/2.0/
This work was partially supported by DARPA contract F30602-97-1-0215, NSF grants CCR007080 and IIS9875433 and grants from IBM, Fujitsu and HP.
This talk
• Motivation
• Semantic web 101
• Swoogle Semantic Web search engine
• Use cases and applications
• Conclusions
Once there were only a few large computers
Then there were many,
All connected 24x7,
• Internet
• Cellular telephony
• IrDA
• 802.11
• Bluetooth
• Ultra Wide Band
• RFID
• and more to come
Interoperating:
tcp/ip ftp smtp rpc corba ssh
http html xml
gif jpg mpg mp3 pdf …
Access to the world’s knowledge
del.icio.us
Google has made us smarter
But what about our agents?
tell
register
Agents still have a very minimal understanding of text and images.
This talk
• Motivation
• Semantic web 101
• Swoogle Semantic Web search engine
• Use cases and applications
• Conclusions
XML helps
“XML is Lisp's bastard nephew, with uglier syntax and no semantics. Yet XML is poised to enable the creation of a Web of data that dwarfs anything since the Library at Alexandria.”
-- Philip Wadler, Et tu XML? The fall of the relational empire, VLDB, Rome, September 2001.
Semantic Web adds semantics
“The Semantic Web will globalize KR, just as the WWW globalized hypertext”
-- Tim Berners-Lee
Semantic Web 101
<?xml version="1.0" encoding="utf-8"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:foaf="http://xmlns.com/foaf/0.1/"
         xmlns:uni="http://ebiquity.umbc.edu/ontologies/uni/">
Where’s the semantics?
• URIs as “rigid designators”
• Conventions for URIs denoting things in the “real world”
• Namespaces and URIs provide an unambiguous shared vocabulary
• RDF, RDFS and OWL have semantics defined using model theory and also axioms
• Ontologies allow agents to draw inferences
– uni:Student is a subclass of foaf:Person
– Every uni:Student has at least one uni:school, which must be an instance of uni:School
– A foaf:Person with a uni:school is necessarily a uni:Student
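The kind of inference listed above can be illustrated with a minimal sketch in plain Python. It is not how Swoogle or any particular reasoner works: triples are plain tuples, the `ex:` names are hypothetical, and the third statement ("a foaf:Person with a uni:school is necessarily a uni:Student") is approximated with an rdfs:domain declaration. The "at least one uni:school" constraint needs OWL cardinality and is not modelled here.

```python
RDF_TYPE = "rdf:type"
SUBCLASS = "rdfs:subClassOf"
DOMAIN = "rdfs:domain"
RANGE = "rdfs:range"

# Ontology statements taken from the slide, expressed as RDFS-style triples.
ONTOLOGY = {
    ("uni:Student", SUBCLASS, "foaf:Person"),
    ("uni:school", DOMAIN, "uni:Student"),   # approximation, see lead-in
    ("uni:school", RANGE, "uni:School"),
}

# Hypothetical instance data: all we are told is that alice has a school.
DATA = {("ex:alice", "uni:school", "ex:umbc")}


def infer(triples):
    """Forward-chain subclass, domain and range rules until nothing new appears."""
    facts = set(triples)
    while True:
        new = set()
        for s, p, o in facts:
            if p == RDF_TYPE:
                # x type C, C subClassOf D  =>  x type D
                for c, q, d in facts:
                    if q == SUBCLASS and c == o:
                        new.add((s, RDF_TYPE, d))
            else:
                for prop, q, cls in facts:
                    if prop != p:
                        continue
                    if q == DOMAIN:      # subject of p is a cls
                        new.add((s, RDF_TYPE, cls))
                    elif q == RANGE:     # object of p is a cls
                        new.add((o, RDF_TYPE, cls))
        if new <= facts:
            return facts
        facts |= new


closure = infer(ONTOLOGY | DATA)
inferred_types = sorted(o for s, p, o in closure if s == "ex:alice" and p == RDF_TYPE)
print(inferred_types)
```

From the single fact that alice has a uni:school, an agent can conclude that she is a uni:Student and therefore a foaf:Person, and that ex:umbc is a uni:School.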
RDF/a
RDF/a is a W3C proposal for embedding RDF in XHTML documents
<html xmlns:foaf="http://xmlns.com/foaf/0.1/">
  <head><title>Jo Lambda's Home Page</title></head>
  <body>
    Hello. This is <span property="foaf:name">Jo Lambda</span>'s home page.
    <h2>Work</h2>
    If you want to contact me at work, you can either
    <a rel="foaf:mbox" href="mailto:[email protected]">email me</a>, or call
    <span property="foaf:phone">+1 777 888 9999</span>.
  </body>
</html>
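To show how an agent might read such markup, here is a rough sketch that pulls triples out of a page like the one above using only Python's standard `html.parser`. It is not a conforming RDF/a processor: it handles only un-nested `property` elements and `rel`/`href` links, and it assumes the subject of every statement is the page itself. The page URL and the `example.org` mail address are placeholders introduced for this example.

```python
from html.parser import HTMLParser

# Abridged copy of the slide's page; the mailto address is a placeholder.
PAGE = """<html xmlns:foaf="http://xmlns.com/foaf/0.1/">
<head><title>Jo Lambda's Home Page</title></head>
<body>
Hello. This is <span property="foaf:name">Jo Lambda</span>'s home page.
<a rel="foaf:mbox" href="mailto:jo@example.org">email me</a>, or call
<span property="foaf:phone">+1 777 888 9999</span>.
</body></html>"""


class RdfaSketch(HTMLParser):
    """Collect (subject, predicate, object) tuples from property/rel attributes."""

    def __init__(self, subject):
        super().__init__()
        self.subject = subject      # assumed subject for every triple
        self.triples = []
        self._property = None       # property of the element being read, if any
        self._text = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if "rel" in attrs and "href" in attrs:
            # A link: the object is the resource named by href.
            self.triples.append((self.subject, attrs["rel"], attrs["href"]))
        if "property" in attrs:
            # A literal: the object is the element's text content.
            self._property = attrs["property"]
            self._text = []

    def handle_data(self, data):
        if self._property is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        # Nested markup inside a property element is not handled in this sketch.
        if self._property is not None:
            literal = "".join(self._text).strip()
            self.triples.append((self.subject, self._property, literal))
            self._property = None


parser = RdfaSketch("http://example.org/jo")  # hypothetical page URL
parser.feed(PAGE)
parser.close()
for triple in parser.triples:
    print(triple)
```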
Performance – crawlers’ contribution
• High SWD ratio: 42% of URLs are confirmed as SWDs
• Consistent growth rate: 3000 SWDs per day
• RDF crawler: best harvesting method
• HTML crawler: best accuracy
• Meta crawler: best in detecting websites
[Chart: # of documents (axis 0 to 1,500,000) for the html crawler, meta crawler, rdf crawler and swoogle2, broken down into swd, nswd, failed and unpinged]
This talk
• Motivation
• Swoogle overview
• Bots navigate the Semantic Web
• Ranking Semantic Web content
• Use cases and applications
• Conclusions
Applications and use cases
• Supporting Semantic Web developers
– Ontology designers, vocabulary discovery, who’s using my ontologies or data?, use analysis, errors, statistics, etc.
• Searching specialized collections
– Spire: aggregating observations and data from biologists
– InferenceWeb: searching over and enhancing proofs
– SemNews: Text Meaning of news stories
• Supporting SW tools
– Triple shop: finding data for SPARQL queries
Web-scale semantic web data access
[Diagram: message flow among an agent, a data access service and the Web. Labels: ask (“person”); Search vocabulary; Search URIrefs in SW vocabulary; inform (“foaf:Person”); Compose query; ask (“?x rdf:type foaf:Person”); Search URLs in SWD index; inform (doc URLs); Fetch docs; Populate RDF database; Query local RDF database; Index RDF data]
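One plausible reading of this flow, sketched in Python with in-memory stand-ins for the services. The ordering of the steps is an assumption reconstructed from the diagram labels, and every name below (`VOCABULARY_INDEX`, `SWD_INDEX`, `WEB`, the `example.org` URLs and the `ex:` resources) is hypothetical; none of it is Swoogle's actual API.

```python
# Hypothetical stand-in for "Search URIrefs in SW vocabulary".
VOCABULARY_INDEX = {"person": ["foaf:Person"]}

# Hypothetical stand-in for "Search URLs in SWD index".
SWD_INDEX = {
    "foaf:Person": ["http://example.org/a.rdf", "http://example.org/b.rdf"],
}

# Hypothetical stand-in for the Web: document URL -> triples it contains.
WEB = {
    "http://example.org/a.rdf": [("ex:alice", "rdf:type", "foaf:Person")],
    "http://example.org/b.rdf": [
        ("ex:bob", "rdf:type", "foaf:Person"),
        ("ex:bob", "foaf:name", "Bob"),
    ],
}


def search_vocabulary(keyword):
    """ask("person") -> inform("foaf:Person")"""
    return VOCABULARY_INDEX.get(keyword.lower(), [])


def search_documents(term):
    """ask("?x rdf:type foaf:Person") -> inform(doc URLs)"""
    return SWD_INDEX.get(term, [])


def fetch(url):
    """Fetch docs: return the triples found in one document."""
    return WEB.get(url, [])


def find_instances(keyword):
    """Run the whole flow for one keyword and return instances per term."""
    local_db = set()                                  # the agent's local RDF database
    results = {}
    for term in search_vocabulary(keyword):           # 1. search vocabulary
        for url in search_documents(term):            # 2. find relevant documents
            local_db.update(fetch(url))               # 3. fetch and populate
        results[term] = sorted(                       # 4. query the local database
            s for s, p, o in local_db if p == "rdf:type" and o == term
        )
    return results


instances = find_instances("person")
print(instances)
```

The point of the split is that the search engine only answers "which terms?" and "which documents?"; the agent fetches the data and runs the actual query locally.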
UMBC Triple Shop
• Online SPARQL RDF query processing based on HP’s Joseki with two features
• Selectable level of reasoning
• Automatically finds SWDs for given queries using the Swoogle backend database
– Provides a dataset creation wizard and server-side dataset storage
– Tag and share saved datasets
SPARQL: a query language for getting information from RDF graphs (datasets)
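To make "getting information from RDF graphs" concrete, the sketch below evaluates a basic graph pattern, the core of a SPARQL WHERE clause, over a small in-memory list of triples. It is a toy matcher written for illustration, not Joseki or the Triple Shop implementation, and the graph data is hypothetical. The pattern corresponds roughly to `SELECT ?x ?name WHERE { ?x rdf:type foaf:Person . ?x foaf:name ?name }`.

```python
# Hypothetical dataset: triples as (subject, predicate, object) tuples.
GRAPH = [
    ("ex:alice", "rdf:type", "foaf:Person"),
    ("ex:alice", "foaf:name", "Alice"),
    ("ex:bob", "rdf:type", "foaf:Person"),          # bob has no foaf:name
    ("ex:report", "rdf:type", "foaf:Document"),
    ("ex:report", "foaf:name", "Annual report"),
]


def match(pattern, triple, bindings):
    """Extend bindings so that pattern matches triple, or return None."""
    result = dict(bindings)
    for term, value in zip(pattern, triple):
        if term.startswith("?"):                    # variable
            if result.setdefault(term, value) != value:
                return None                         # already bound to something else
        elif term != value:                         # constant that does not match
            return None
    return result


def query(graph, patterns):
    """Return every set of variable bindings that satisfies all patterns."""
    solutions = [{}]
    for pattern in patterns:
        extended = []
        for bindings in solutions:
            for triple in graph:
                candidate = match(pattern, triple, bindings)
                if candidate is not None:
                    extended.append(candidate)
        solutions = extended
    return solutions


people_with_names = query(GRAPH, [
    ("?x", "rdf:type", "foaf:Person"),
    ("?x", "foaf:name", "?name"),
])
print(people_with_names)
```

Only alice satisfies both patterns: bob is a person without a name in the data, and the report has a name but is not a person.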
UMBC Triple Shop
Querying the Semantic Web is as easy as shopping
(1) Go to http://sparql.cs.umbc.edu/
(2) You provide a SPARQL query and constraints on what sources to use
(3) Swoogle finds and suggests documents with relevant data, producing a dataset
(4) You specify the amount of reasoning to do, possibly resulting in an enhanced dataset
(5) We run the query and give you the results
(6) You can also download the dataset or save it on the server and give it tags
This talk
• Motivation
• Swoogle overview
• Bots navigate the Semantic Web
• Ranking Semantic Web content
• Use cases and applications
• Conclusions
Will it Scale? How?
Here’s a rough estimate of the data in RDF documents on the semantic web based on Swoogle’s crawling
We think Swoogle’s centralized approach can be made to work for the next few years if not longer.
How much reasoning?
• SwoogleN (N<=3) does limited reasoning
– It’s expensive
– It’s not clear how much should be done
• More reasoning would benefit many use cases
– e.g., type hierarchy
• Recognizing specialized metadata
– e.g., that ontology A maps some terms from B to C
Conclusion
• The web will contain the world’s knowledge in forms accessible to people and computers
– We need better ways to discover, index, search and reason over SW knowledge
• SW search engines address different tasks than HTML search engines
– So they require different techniques and APIs
• Swoogle-like systems can help create consensus ontologies and foster best practices
– Swoogle is for Semantic Web 1.0
– Semantic Web 2.0 will make different demands