Top-k Exploration of Query Candidates for Efficient Keyword Search on Graph- Shaped (RDF) Data Thanh Tran 1 , Haofen Wang 2 , Sebastian Rudolph 1 , Philipp Cimiano 3 1 Institute AIFB, University Karlsruhe, Germany 2 APEX Lab, Shanghai Jiao Tong University, China 3 Web Information Systems, TU Delft, Netherlands

Aug 29, 2014

Thanh Tran

ICDE paper presentation
Transcript
Page 1: Top-k Exploration of Query Candidates for Efficient Keyword Search on Graph-Shaped (RDF) Data

Top-k Exploration of Query Candidates for Efficient Keyword Search on Graph-Shaped (RDF) Data

Thanh Tran¹, Haofen Wang², Sebastian Rudolph¹, Philipp Cimiano³

¹ Institute AIFB, University Karlsruhe, Germany
² APEX Lab, Shanghai Jiao Tong University, China
³ Web Information Systems, TU Delft, Netherlands

Page 2

Motivation

• Semantic search
  – Access to KB facts and semantically described documents
  – Support for expressive / precise information needs
• How to capture the user’s information need?
  – Expressive queries with difficult syntax (SQL, SPARQL) vs. limited but intuitive queries (keywords)
  – Expressive power is crucial!
  – Supporting the user in specifying information needs in an intuitive way is also crucial!
• Goal: Interpreting Complex Information Needs by Translating Keywords to Expressive Formal Queries

Page 3

Related Work

• Translation of NL questions
  – Can the user specify a precise question when the information need is vague?
• Relaxed-structure query models
  – Require some knowledge about the query syntax and the structure of the underlying data
• Labeled query models
  – Require some knowledge about schema elements
• In keyword search, the user does not need to know about the query syntax and data schema
  – Crucial for environments like the Web, where most data sources to be queried are unknown to the user

Page 4

Scenario – Interpreting Information Needs

$D,\ Q,\ F,\ R(q_i, d_j)$

User Information Need
RDF Data Graph

„2006 Philipp Cimiano X-Media“

Query Specification

SELECT ?x, ?y, ?z
WHERE {
  ?x type Publication .
  ?x year 2006 .
  ?x author ?y .
  ?y name 'P. Cimiano' .
  ?y worksAt ?z .
  ?z name 'AIFB'
}

Query Translation

Query Processing

Page 5

Keyword Search – An Overview

• Mapping of keywords to “labels” of data elements
  – Results in a set of keyword elements
  – Through imprecise matching, the user does not even need to know the labels of data elements (cf. precise matching in [G. Bhalotia et al.])
• Data graph exploration
  – Search for substructures (query graphs) connecting keyword elements
  – Query graph vs. answer trees [H. He et al.]
  – Exploration of query graphs operates on a summary of the data graph only
• Top-k computation
  – Search guided by a scoring function to output only the top-k results
  – Guaranteed top-k vs. approximate top-k [V. Kacholia et al.]
• Mapping the query graph to a conjunctive query
• Processing the conjunctive query using a standard query engine

Page 6

Keyword Search – The Workflow

• Offline: Summarization, Scoring, Term Expansion
• Online: Query Computation, Query Processing

Page 7

Graph Summarization

Example RDF Graph
Summary Graph

• Goal: preserve sufficient information to compute the elements and structure of the query, while reducing the exploration space
• The summary graph captures relations between entity classes, thus preserving structural information of the original data graph
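
The summarization idea can be illustrated with a minimal sketch. It assumes the data graph is a list of (subject, predicate, object) triples, that class membership is stated with a `type` predicate, and that untyped entities fall under a hypothetical `Thing` class. The function name, the sample triples, and the treatment of attribute values are illustrative assumptions, not the authors' implementation.

```python
from collections import defaultdict


def summarize(triples, type_pred="type", default_class="Thing"):
    """Collapse a data graph into a class-level summary graph (sketch)."""
    # Record the classes of every typed entity.
    classes = defaultdict(set)
    for s, p, o in triples:
        if p == type_pred:
            classes[s].add(o)

    # Assumption: anything that occurs as a subject is an entity;
    # objects that never occur as subjects are treated as attribute values.
    entities = {s for s, _, _ in triples}

    summary = set()
    for s, p, o in triples:
        if p == type_pred or o not in entities:
            continue  # skip typing statements and attribute values
        # One summary edge per pair of classes connected by this relation.
        for cs in classes.get(s, {default_class}):
            for co in classes.get(o, {default_class}):
                summary.add((cs, p, co))
    return summary


# Hypothetical sample data, loosely modeled on the running example.
triples = [
    ("pub1", "type", "Publication"), ("pub1", "year", "2006"),
    ("pub1", "author", "re1"),
    ("pub2", "type", "Publication"), ("pub2", "author", "re1"),
    ("re1", "type", "Researcher"), ("re1", "name", "P. Cimiano"),
    ("re1", "worksAt", "inst1"),
    ("inst1", "type", "Institute"), ("inst1", "name", "AIFB"),
]
summary = summarize(triples)
print(sorted(summary))
```

Both `author` edges collapse into a single class-level edge, which is how the summary shrinks the exploration space while keeping the structure between classes.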

Page 8

Keyword Mapping & Graph Augmentation

Summary Graph

Keyword Query: „2006 Philipp Cimiano AIFB“

Augmented Summary Graph

• Summary graph captures information for exploration of the query structure
• Online augmentation with elements & scores obtained from keyword mapping
• Augmented graph contains further information for exploration of query elements
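
A rough sketch of keyword mapping followed by augmentation is given here. It uses `difflib.SequenceMatcher` from the Python standard library as a stand-in for imprecise matching; the slides do not say which matcher or threshold is used. The `attachments` table, which records the class and attribute a value label hangs on, is a hypothetical structure introduced only for this illustration.

```python
from difflib import SequenceMatcher


def map_keyword(keyword, labels, threshold=0.5):
    """Return (label, score) pairs for labels similar to the keyword."""
    matches = []
    for label in labels:
        score = SequenceMatcher(None, keyword.lower(), label.lower()).ratio()
        if score >= threshold:  # threshold is an assumed cut-off
            matches.append((label, score))
    return sorted(matches, key=lambda m: -m[1])


def augment(summary_edges, matches, attachments):
    """Add matched value vertices and their scores to a copy of the summary graph."""
    augmented = set(summary_edges)
    scores = {}
    for element, score in matches:
        cls, attribute = attachments[element]
        augmented.add((cls, attribute, element))  # attribute edge to the value
        scores[element] = score
    return augmented, scores


# Hypothetical summary graph and value labels.
summary_edges = {
    ("Publication", "author", "Researcher"),
    ("Researcher", "worksAt", "Institute"),
}
attachments = {
    "2006": ("Publication", "year"),
    "P. Cimiano": ("Researcher", "name"),
    "AIFB": ("Institute", "name"),
}
labels = list(attachments)

matches = []
for keyword in ["2006", "Philipp Cimiano", "AIFB"]:
    matches.extend(map_keyword(keyword, labels))

augmented, scores = augment(summary_edges, matches, attachments)
print(sorted(augmented))
```

The keyword "Philipp Cimiano" still reaches the label "P. Cimiano", which is the point of imprecise matching: the user does not have to know the exact label.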

Page 9

Top-k Graph Exploration

• Cost-directed exploration of the graph, starting from keyword elements N_k
• Explore all possible distinct paths starting from n_k ∈ N_k
• At each step, take the cursor (“path”) with the lowest cost from the queues for exploration
• When a connecting element n_c is found,
  • Paths from n_k to n_c are merged to construct the query graph
  • Top-k is invoked to add the query graph to the candidate list
• Top-k terminates when the highest cost in the candidate list (the cost of the k-ranked query graph) is found to be lower than the lowest possible cost that can be achieved with paths in the queues yet to be explored

Augmented Summary Graph
Explored Paths
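
The exploration loop can be sketched as follows. This is a simplified illustration, not the paper's algorithm: it assumes an undirected graph with unit edge costs, one keyword element per keyword, a single shared priority queue instead of per-keyword queues, and a hypothetical `max_edges` bound on path length. The function name and the sample graph are made up for the example.

```python
import heapq
from itertools import product


def top_k_subgraphs(adj, keyword_nodes, k=3, max_edges=4):
    """Cost-directed exploration from keyword nodes (simplified sketch)."""
    m = len(keyword_nodes)
    # A cursor is (cost, keyword index, path); all cursors share one heap.
    heap = [(0, i, (n,)) for i, n in enumerate(keyword_nodes)]
    heapq.heapify(heap)
    reached = {}     # node -> per-keyword lists of (cost, path) ending there
    candidates = {}  # frozenset of undirected edges -> lowest cost seen

    while heap:
        # Stop once the k-th best candidate is cheaper than any cursor left:
        # every new candidate must contain at least one unexplored path.
        if len(candidates) >= k:
            kth_cost = sorted(candidates.values())[k - 1]
            if kth_cost < heap[0][0]:
                break

        cost, i, path = heapq.heappop(heap)
        node = path[-1]
        slots = reached.setdefault(node, [[] for _ in range(m)])
        slots[i].append((cost, path))

        # The node is a connecting element if every keyword has a path to it.
        if all(slots):
            choices = [slots[j] if j != i else [(cost, path)] for j in range(m)]
            for combo in product(*choices):
                edges = frozenset(
                    frozenset(e) for _, p in combo for e in zip(p, p[1:])
                )
                total = sum(c for c, _ in combo)
                if total < candidates.get(edges, float("inf")):
                    candidates[edges] = total

        # Extend the cursor along distinct (cycle-free) paths only.
        if len(path) - 1 < max_edges:
            for nxt in adj[node]:
                if nxt not in path:
                    heapq.heappush(heap, (cost + 1, i, path + (nxt,)))

    return sorted(candidates.items(), key=lambda kv: kv[1])[:k]


# Hypothetical augmented summary graph as an adjacency list.
adj = {
    "2006": ["Publication"],
    "Publication": ["2006", "Researcher"],
    "Researcher": ["Publication", "P. Cimiano", "Institute"],
    "P. Cimiano": ["Researcher"],
    "Institute": ["Researcher", "AIFB"],
    "AIFB": ["Institute"],
}
result = top_k_subgraphs(adj, ["2006", "P. Cimiano", "AIFB"], k=1)
for edges, cost in result:
    print(cost, sorted(sorted(e) for e in edges))
```

Because each new candidate must include at least one path still waiting in the queue, the cheapest queued cursor is a valid lower bound, so stopping at that point keeps the top-k guarantee under these simplified costs.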

Page 10

Mapping Query Graph to Conjunctive Query

Query Graph
Conjunctive Query

• Conjunctive query obtained by exhaustive application of mapping rules
  • Every value vertex v_vertex → a term
  • Every class vertex c_vertex → a distinct variable
  • Every A-edge e(c_vertex, v_vertex) → a query predicate e[var(c_vertex), term(v_vertex)]
  • Every R-edge e(c_vertex1, c_vertex2) → a query predicate e[var(c_vertex1), var(c_vertex2)]
• Treat all query variables as distinguished
• Specific mechanisms can be provided for the user to choose distinguished variables
• The query chosen by the user is finally translated to the query formalism supported by the query engine (SPARQL) for retrieving query answers
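
The mapping rules can be illustrated with a short sketch. It assumes the query graph is given as two lists of edges, A-edges of the form (class, attribute, value) and R-edges of the form (class, relation, class). The function names, the `?x1`-style variable naming, and the bare predicate names are assumptions for illustration; a real SPARQL query would use IRIs or prefixed names.

```python
def to_conjunctive_query(a_edges, r_edges):
    """Apply the four mapping rules to a query graph (sketch)."""
    variables = {}

    def var(class_vertex):
        # Every class vertex is mapped to its own variable.
        if class_vertex not in variables:
            variables[class_vertex] = f"?x{len(variables) + 1}"
        return variables[class_vertex]

    def term(value_vertex):
        # Every value vertex is mapped to a constant term.
        return f"'{value_vertex}'"

    atoms = []
    for c, attribute, v in a_edges:
        atoms.append((attribute, var(c), term(v)))   # A-edge rule
    for c1, relation, c2 in r_edges:
        atoms.append((relation, var(c1), var(c2)))   # R-edge rule

    head = list(variables.values())  # all variables are distinguished
    return head, atoms


def to_sparql_like(head, atoms):
    """Render the atoms in the simplified triple notation of the slides."""
    body = " . ".join(f"{s} {p} {o}" for p, s, o in atoms)
    return f"SELECT {' '.join(head)} WHERE {{ {body} }}"


# Hypothetical query graph for the running example.
a_edges = [
    ("Publication", "year", "2006"),
    ("Researcher", "name", "P. Cimiano"),
    ("Institute", "name", "AIFB"),
]
r_edges = [
    ("Publication", "author", "Researcher"),
    ("Researcher", "worksAt", "Institute"),
]
head, atoms = to_conjunctive_query(a_edges, r_edges)
query = to_sparql_like(head, atoms)
print(query)
```

The sketch applies only the four rules listed above, so it does not emit class-membership atoms such as `?x type Publication`; adding them would be a further assumption.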

Page 11

Rich Client Demo – xXploreKnow!

http://ontoware.org/projects/xxplore/

Page 12

Web Demo – Q2Semantic

http://q2semantic.apexlab.org/UI.html

Page 13

Evaluation – Effectiveness

• 12 users provide 30 keyword queries on DBLP, along with the NL description of the information need
• Reciprocal Rank = 1/r, where r is the rank of the correct query
• A query is correct if it matches the information need
• The information need can be interpreted in most cases, in particular when path length, matching score, and popularity of graph elements are incorporated into the scoring function (C3)
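
As a small worked example of the metric, the sketch below computes the reciprocal rank for one ranked list of query candidates and the mean over several lists. The function names and the sample runs are hypothetical, and scoring a list with no correct query as 0 is an assumption that the slides do not state.

```python
def reciprocal_rank(ranked_queries, correct_query):
    """Return 1/r, where r is the 1-based rank of the correct query."""
    for rank, query in enumerate(ranked_queries, start=1):
        if query == correct_query:
            return 1.0 / rank
    return 0.0  # assumption: no credit if the correct query is missing


def mean_reciprocal_rank(runs):
    """Average the reciprocal rank over (ranked list, correct query) pairs."""
    return sum(reciprocal_rank(r, c) for r, c in runs) / len(runs)


# Hypothetical runs: correct query at rank 1, at rank 2, and not found.
runs = [
    (["q_a", "q_b", "q_c"], "q_a"),
    (["q_a", "q_b", "q_c"], "q_b"),
    (["q_a", "q_b"], "q_z"),
]
mrr = mean_reciprocal_rank(runs)
print(mrr)
```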

[Figure: MRRs of different Scoring Functions on DBLP; queries Q1 to Q29 labeled on the x-axis, y-axis from 0 to 1; series: C1, C2, C3]

Page 14

Evaluation – Usability of Query Interpretation

- Standard approaches return top-k results
- Our approach is based on the interpretation of keywords as queries, i.e. it computes top-k queries instead of top-k answer trees [V. Kacholia et al.] [H. He et al.]
- Queries are then transformed to simple natural language and presented to the user
- 90% of users prefer to obtain the question first, since it facilitates understanding of the results
- All users prefer to do refinement on the structured query, rather than on the keywords, since the structured query can be manipulated in a more precise and predictable way

Page 15

Evaluation – Efficiency

• Comparison with bidirectional search [V. Kacholia et al.] and search based on graph indexing (1000 BFS, 1000 METIS, 300 BFS, 300 METIS in [H. He et al.])
• We measure the time for query computation plus the time for processing several queries until 10 answers are found
• Outperforms bidirectional search by at least one order of magnitude
• Performs fairly well when compared to indexing-based approaches

[Figure: Query Performance on DBLP Data; queries Q1 to Q10 on the x-axis, logarithmic y-axis from 1 to 100000; series: Our Solution, Bidirect, 1000 BFS, 1000 METIS, 300 BFS, 300 METIS]

Page 16

Conclusions and Future Work

• Conclusions
  – A new approach for keyword search on graph-structured data, RDF in particular
  – Novel algorithms for the top-k exploration of subgraphs to compute queries as an additional intermediate step
  – Query computation is performed on an aggregated graph, while query processing can leverage the optimization capabilities of the database
• Future Work
  – Indexing connectivity and scores for further speed-up
  – Consider special query operations (e.g. filters) as keywords

Page 17

Thank you for your attention!

Q&A