
Problems in Semantic Search

Feb 08, 2016

Page 1: Problems in Semantic Search

Problems in Semantic Search

Krishnamurthy Viswanathan and Varish Mulwad

{krishna3, varish1} AT umbc DOT edu


Page 2: Problems in Semantic Search

Agenda

• Introduction
• Swoogle
• Cool things others do
• Swoogle facts/figures
• Our ideas
• References

Page 3: Problems in Semantic Search

Why is Semantic Search significant?


Page 4: Problems in Semantic Search

Swoogle

• Swoogle is a search engine for Semantic Web (SW) documents
• It offers the following services:
  – Search SW ontologies and documents
  – Search SW terms, i.e. URIs that have been defined as classes and properties
  – Provide metadata of SW documents and support browsing the Semantic Web

Page 5: Problems in Semantic Search

Swoogle

• Swoogle supports two relevant query types:
  – Ontology: searches a small collection that consists only of Semantic Web ontologies
  – Document: searches all SW documents. This search space is much larger
• Swoogle indexes only the document’s URL, the terms defined in the document, explicit descriptions of the document, and the namespaces used by the document
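As a rough illustration of the four indexed fields listed above, the sketch below pulls them out of a small RDF/XML document. It is not Swoogle's actual indexer: the IndexEntry record, the build_entry function, the set of "defining" tags, the use of rdfs:comment as the description, and the example.org sample are all assumptions made for this example.

```python
import io
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RDFS = "http://www.w3.org/2000/01/rdf-schema#"
OWL = "http://www.w3.org/2002/07/owl#"

# Elements that this sketch treats as defining a term (an assumption).
DEFINING_TAGS = {
    f"{{{OWL}}}Class", f"{{{RDFS}}}Class", f"{{{RDF}}}Property",
    f"{{{OWL}}}ObjectProperty", f"{{{OWL}}}DatatypeProperty",
}


@dataclass
class IndexEntry:
    """Hypothetical record holding the four kinds of indexed data."""
    url: str
    terms: list = field(default_factory=list)
    descriptions: list = field(default_factory=list)
    namespaces: list = field(default_factory=list)


def build_entry(url, rdf_xml):
    """Collect defined terms, descriptions and namespaces from RDF/XML text."""
    entry = IndexEntry(url=url)
    stream = io.BytesIO(rdf_xml.encode("utf-8"))
    for event, payload in ET.iterparse(stream, events=("start-ns", "end")):
        if event == "start-ns":
            _prefix, uri = payload  # namespace declaration
            if uri not in entry.namespaces:
                entry.namespaces.append(uri)
        elif payload.tag in DEFINING_TAGS:
            name = payload.get(f"{{{RDF}}}about") or payload.get(f"{{{RDF}}}ID")
            if name:
                entry.terms.append(name)
        elif payload.tag == f"{{{RDFS}}}comment" and payload.text:
            entry.descriptions.append(payload.text.strip())
    return entry


# Made-up sample document for illustration only.
SAMPLE = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
    xmlns:owl="http://www.w3.org/2002/07/owl#">
  <owl:Class rdf:about="http://example.org/onto#Person">
    <rdfs:comment>A human being.</rdfs:comment>
  </owl:Class>
</rdf:RDF>"""

entry = build_entry("http://example.org/onto", SAMPLE)
print(entry.terms, entry.descriptions)
```

Everything else in the document (instance data, literal values on other properties) is ignored here, which mirrors the point that only a small part of each document is indexed.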

Page 6: Problems in Semantic Search

Swoogle capabilities

• Web search:
  – Basic metadata: e.g. url, desc, ns, etc.
  – Document metadata: hasEncoding, hasLength, etc.
  – RDF metadata: hasGrammar, hasCntTriple, etc.
• Advanced search using Lucene features
• REST-based services: compose an HTTP GET query and retrieve the results as RDF/XML

Page 7: Problems in Semantic Search

Examples of REST queries

• A query is represented as a URL:
  – REST_QUERY ::= SERVICE_URI ? PARAMS
• Example: search SW documents which are classified as ontologies (ontoRatio > 0)
  – queryType: e.g. search_swd_ontology
  – searchString: user constructed (see manual)
  – key: e.g. demo

http://logos.cs.umbc.edu:8080/swoogle31/q?queryType=search_swd_ontology&searchString=person&key=demo
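The REST_QUERY ::= SERVICE_URI ? PARAMS rule can be sketched in a few lines of Python. The parameter names and the service URI are taken from the example URL on this slide. The build_rest_query helper is a hypothetical name introduced here, and the sketch only builds the URL without contacting the service.

```python
from urllib.parse import urlencode

# Service URI copied from the slide's example URL.
SERVICE_URI = "http://logos.cs.umbc.edu:8080/swoogle31/q"


def build_rest_query(query_type, search_string, key="demo", service_uri=SERVICE_URI):
    """Compose REST_QUERY ::= SERVICE_URI ? PARAMS for an HTTP GET request."""
    params = {
        "queryType": query_type,        # e.g. search_swd_ontology
        "searchString": search_string,  # user-constructed search string
        "key": key,                     # access key, "demo" in the slide
    }
    return f"{service_uri}?{urlencode(params)}"


url = build_rest_query("search_swd_ontology", "person")
print(url)
```

Using urlencode also takes care of escaping search strings that contain spaces or other reserved characters.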

Page 8: Problems in Semantic Search

Cool things other semantic search engines do …


Page 9: Problems in Semantic Search

Sindice

• Sindice is a Semantic Web search engine created at the Digital Enterprise Research Institute (DERI)
• Interesting things to note about Sindice:
  – Architecture
  – Indexing

Page 10: Problems in Semantic Search

Sindice

• Sindice uses cloud computing paradigms for its architecture
• Sindice uses Hadoop / Nutch to distribute crawling across multiple machines
• Collected data is stored in HBase, a distributed column store

Page 11: Problems in Semantic Search

Sindice

• Sindice indexes based on:
  – Inverse Functional Properties (IFP)
  – URIs
  – Literals (keywords)
• IFP: an OWL property characteristic under which a given value identifies at most one subject, so it acts like a cardinality restriction
• Benefits: faster retrieval
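A minimal sketch of the three lookup keys named above (URIs, IFP values and literal keywords), assuming a simple in-memory inverted index that maps each key to the documents mentioning it. This is not Sindice's implementation: the LookupIndex class, the triple layout (subject, predicate, object, object_is_literal) and the sample data are invented for illustration. foaf:mbox is used as the example IFP.

```python
from collections import defaultdict

MBOX = "http://xmlns.com/foaf/0.1/mbox"
NAME = "http://xmlns.com/foaf/0.1/name"

# Properties treated as inverse functional in this sketch (an assumption).
IFPS = {MBOX}


class LookupIndex:
    """Hypothetical inverted index: lookup key -> set of document URLs."""

    def __init__(self):
        self.by_uri = defaultdict(set)
        self.by_ifp = defaultdict(set)
        self.by_keyword = defaultdict(set)

    def add(self, doc_url, triples):
        """Index one document given (subject, predicate, object, is_literal) tuples."""
        for subject, predicate, obj, obj_is_literal in triples:
            self.by_uri[subject].add(doc_url)
            if predicate in IFPS:
                # An IFP value pins down a single resource, so the pair is a key.
                self.by_ifp[(predicate, obj)].add(doc_url)
            if obj_is_literal:
                for word in obj.lower().split():
                    self.by_keyword[word].add(doc_url)
            else:
                self.by_uri[obj].add(doc_url)


DOC_A = "http://example.org/a.rdf"
DOC_B = "http://example.org/b.rdf"

index = LookupIndex()
index.add(DOC_A, [
    ("http://example.org/a#me", MBOX, "mailto:alice@example.org", False),
    ("http://example.org/a#me", NAME, "Alice Smith", True),
])
index.add(DOC_B, [
    ("http://example.org/b#alice", MBOX, "mailto:alice@example.org", False),
])

# Both documents describe the resource with this mailbox, under different URIs.
print(sorted(index.by_ifp[(MBOX, "mailto:alice@example.org")]))
```

The IFP table is what lets a lookup find documents about the same resource even when they use different URIs for it. Each lookup is a single dictionary access, which is one way to read the "faster retrieval" benefit.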

Page 12: Problems in Semantic Search

Watson – A gateway to the Semantic Web

• From the Knowledge Media Institute (KMi) at the Open University in the UK
• Interesting things to note about Watson:
  – Considers implicit semantic relationships
  – Quality of semantic documents
  – “Rich access” to semantic data

Page 13: Problems in Semantic Search

Watson

• Implicit relationships between Semantic Web documents
  – Equivalence (duplicate detection)
• Quality of semantic documents
• “Richer” access to semantic data
  – Web interface for humans
  – SPARQL endpoint
  – Java/SOAP and REST APIs
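One simple way to picture the equivalence (duplicate detection) relationship is to fingerprint each document's set of triples and group documents that share a fingerprint. The slides do not say how Watson detects duplicates, so the approach below is only an assumed stand-in. The fingerprint and group_duplicates names and the sample data are hypothetical, and the sketch ignores blank-node renaming.

```python
import hashlib
from collections import defaultdict


def fingerprint(triples):
    """Hash the triple set in a canonical order, so statement order is irrelevant."""
    canonical = "\n".join(sorted(" ".join(triple) for triple in set(triples)))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def group_duplicates(docs):
    """Return groups of document URLs whose triple sets are identical."""
    groups = defaultdict(list)
    for url, triples in docs.items():
        groups[fingerprint(triples)].append(url)
    return [sorted(urls) for urls in groups.values() if len(urls) > 1]


# Made-up documents: a.rdf and b.rdf hold the same triples in a different order.
docs = {
    "http://example.org/a.rdf": [("ex:Cat", "rdfs:subClassOf", "ex:Animal"),
                                 ("ex:Dog", "rdfs:subClassOf", "ex:Animal")],
    "http://example.org/b.rdf": [("ex:Dog", "rdfs:subClassOf", "ex:Animal"),
                                 ("ex:Cat", "rdfs:subClassOf", "ex:Animal")],
    "http://example.org/c.rdf": [("ex:Cat", "rdfs:subClassOf", "ex:Pet")],
}

duplicates = group_duplicates(docs)
print(duplicates)
```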

Page 14: Problems in Semantic Search

Others

• Semantic Web Search Engine (SWSE)
  – Pipelined architecture for crawling and indexing
  – Improved index and storage structure
• Falcons
  – Class subsumption reasoning
  – Includes a triple store
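Class subsumption reasoning, as listed for Falcons, can be illustrated by computing the transitive closure of subclass links. A search restricted to a class can then also match objects typed with any of its subclasses. This is a generic sketch, not Falcons' algorithm, and the superclasses function and class names are made up.

```python
from collections import defaultdict


def superclasses(sub_class_of, cls):
    """Return every direct and inferred superclass of cls.

    sub_class_of is an iterable of (child, parent) pairs.
    """
    parents = defaultdict(set)
    for child, parent in sub_class_of:
        parents[child].add(parent)

    seen, stack = set(), [cls]
    while stack:
        current = stack.pop()
        for parent in parents.get(current, ()):
            if parent not in seen:  # also guards against cycles
                seen.add(parent)
                stack.append(parent)
    return seen


# Hypothetical hierarchy: Student is a Person, and Person is an Agent.
hierarchy = [("Student", "Person"), ("Person", "Agent")]

inferred = superclasses(hierarchy, "Student")
print(sorted(inferred))
```

With this closure, an object typed only as Student would also be returned for a query restricted to Agent.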

Page 15: Problems in Semantic Search

PowerAqua

• Multi-ontology based QA system powered by PowerMap and Watson
• Takes input in the form of natural language (NL) queries
• Handles factual queries that can be expressed as one or more linguistic triples
• Handles common wh-questions

Page 16: Problems in Semantic Search

PowerAqua

• Key challenges in answering NL questions:
  – Locating the ontologies relevant to a particular query
  – Identifying semantically sound relationships
  – Combining information from multiple queries

Page 17: Problems in Semantic Search

Swoogle facts/figures

• The search engine components currently run on 4 machines
• These machines host the crawler, the Lucene index, the MySQL database, etc. and access the NFS
• Approximately 20,000 pages are accessed by Swoogle every day (which get queued)
• About 1,731,371 pure SW documents have been discovered
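To put the two figures above side by side, here is a back-of-the-envelope calculation. It assumes, purely for illustration, that each daily page access revisits one distinct discovered document. The slides do not state this, and the function name is hypothetical.

```python
import math

PAGES_PER_DAY = 20_000        # approximate daily accesses, from the slide
DISCOVERED_DOCS = 1_731_371   # pure SW documents discovered, from the slide


def days_for_full_pass(documents, pages_per_day):
    """Days needed to touch every document once at a fixed daily rate."""
    return documents / pages_per_day


days = days_for_full_pass(DISCOVERED_DOCS, PAGES_PER_DAY)
print(math.ceil(days))  # 87
```

Under that assumption, a single pass over the known documents takes close to three months, before counting any newly discovered pages.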

Page 18: Problems in Semantic Search

Swoogle facts/figures

• The Swoogle crawler has a large queue of documents to be crawled and indexed
• Swoogle accesses metadata and index files over NFS, which makes information retrieval slower

Page 19: Problems in Semantic Search

Our Ideas: Research and Engineering

• Acquire new hardware
• Parallelize Swoogle
• Focus on a particular domain
• Project Swoogle as a search engine for agents

Page 20: Problems in Semantic Search

Our Ideas: Research and Engineering

• Improve Swoogle’s indexing scheme
• Analyze Swoogle’s ranking scheme
• Use of Swoogle metadata
• Improve the usability of the website
• Google-like services

Page 21: Problems in Semantic Search

References

• Li Ding et al., “Swoogle: A Search and Metadata Engine for the Semantic Web”, Proceedings of the Thirteenth ACM Conference on Information and Knowledge Management, November 2004.
• P. Mika, G. Tummarello, “Web Semantics in the Clouds”, IEEE Intelligent Systems, Volume 23, Issue 5, September 2008.
• E. Oren, R. Delbru, M. Catasta, R. Cyganiak, H. Stenzhorn, G. Tummarello, “Sindice.com: A document-oriented lookup index for open linked data”, International Journal of Metadata, Semantics and Ontologies, 3(1), 2008.
• Mathieu d’Aquin et al., “Watson: A Gateway for the Semantic Web”, Poster session of the European Semantic Web Conference, ESWC 2007.
• Gong Cheng, Weiyi Ge, Honghan Wu, Yuzhong Qu, “Searching Semantic Web Objects Based on Class Hierarchies”, WWW 2008 Workshop on Linked Data on the Web, 2008.

Page 22: Problems in Semantic Search

Questions?