Reasoning and Identifying Relevant Matches for XML Keyword Search

Ziyang Liu, Yi Chen
Arizona State University

Dec 18, 2015


VLDB 2008, Auckland, New Zealand

Motivation

Identifying relevant matches is a critical step of processing XML search.

Query: “Gasol, position”

[Figure: example XML tree. A league (name: NBA, founded: 1946) contains team elements. The team shown has name Grizzlies, founded 1995, division southwest, arena FedExForum, and a players element with three player children: Gasol (nationality Spain, position forward), Miller (nationality USA, position guard), and Brown (nationality USA, position forward). Matches are marked as relevant or irrelevant.]

How to Evaluate Various Strategies?

Existing approaches for identifying relevant matches:

XKSearch (SLCA) [Xu and Papakonstantinou 2005]

XRank [Guo et al. 2003]

XSEarch [Cohen et al. 2003]: star-semantics, all-semantics

Schema-free XQuery (MLCA) [Li et al. 2004]

CVLCA [Li et al. 2007]

How to Evaluate Various Strategies?

The traditional approach
Obtain ground truth of query results by user studies on a large number of documents and queries.
Measure the precision and recall of a strategy with respect to the ground truth.
Costly.

An axiomatic approach
Formalize broad intuitions as a collection of simple axioms and evaluate strategies based on the axioms.
It has been successful in many areas, e.g. mathematical economics, clustering, location theory, and collaborative filtering.
Cost-effective.

Problem: Is it possible to evaluate and reason about XML keyword search strategies in a formal axiomatic framework?

Roadmap

Motivation and Problem Definition

Challenges and Contributions

Four properties that an XML search engine should satisfy: Query Monotonicity/Consistency, Data Monotonicity/Consistency

MaxMatch: the first system that satisfies all four properties

Experimental Evaluation

Conclusions

Challenge

It is easy for an individual to assess the relevance of matches.

But it is extremely difficult to formalize the relevance assessment, independently of any query, data, algorithm, and user.

Query: “Gasol, position”

[Figure: the example XML tree (league NBA; team Grizzlies; players Gasol, Miller, and Brown), with matches marked as relevant or irrelevant.]

Example: Similar Queries

Interestingly, we discovered that some abnormal behaviors can be clearly observed when examining the results of two similar queries, or of one query on two similar documents, produced by the same search engine.

Q1: “Gasol, position”
Q2: “Grizzlies, Gasol, position”

[Figure: the example XML tree (league NBA; team Grizzlies; players Gasol, Miller, and Brown), with two “position” nodes highlighted.]

These two “position” nodes should still be irrelevant.

Example: Similar Data

Q: “Grizzlies, Gasol, Brown, position”

[Figure: the example XML tree (league NBA; team Grizzlies; players Gasol, Miller, and Brown), except that player Brown initially has no position node; a position node with value “forward” is then inserted under Brown.]

An empty result after data insertion is abnormal.

How to capture the logical connection between query results?

Contributions of This Work

The first work that formally reasoned about keyword search in an axiomatic framework.

We identified four desirable properties that an XML search engine should satisfy.
Data/Query Monotonicity captures the desirable changes to the number of query results.
Data/Query Consistency captures the desirable changes to the content of a query result.

We reasoned about existing XML keyword search strategies.

We proposed MaxMatch, the only XML keyword search strategy that possesses all four properties.

Experiments verified our intuition and demonstrated the effectiveness and efficiency of MaxMatch.


Properties wrt Similar Queries

Query Monotonicity
When we add a keyword to the query, the query becomes more restrictive, therefore the number of query results should not increase.

Query Consistency
When we add a new keyword to the query, each delta subtree that newly becomes (part of) a query result should contain the new keyword.
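The two query properties can be checked mechanically once query results are available as sets of nodes. The following is a minimal sketch under stated assumptions, not the paper's formal definition: nodes are identified by Dewey-style tuples, a "delta subtree" is taken to be a maximal subtree of a new result that shares no node with any old result, and all names (`subtree`, `delta_subtrees`, `query_monotonic`, `query_consistent`, the node constants) are hypothetical. The sample data mirrors the queries “Gasol, position” and “Grizzlies, Gasol, position”.

```python
from typing import List, Set, Tuple

Node = Tuple[int, ...]   # Dewey-style id (assumption): (0, 1) is child 1 of the root (0,)
Result = Set[Node]       # one query result, as the set of nodes in its tree


def subtree(n: Node, result: Result) -> Set[Node]:
    """Nodes of `result` at or below node `n`."""
    return {m for m in result if m[:len(n)] == n}


def delta_subtrees(old: List[Result], new_result: Result) -> List[Set[Node]]:
    """Maximal subtrees of `new_result` that share no node with any old result."""
    old_nodes = set().union(*old)
    deltas = []
    for n in new_result:
        if subtree(n, new_result) & old_nodes:
            continue  # overlaps an old result, so it is not newly added
        parent = n[:-1]
        if parent not in new_result or subtree(parent, new_result) & old_nodes:
            deltas.append(subtree(n, new_result))  # parent is absent or not entirely new
    return deltas


def query_monotonic(old: List[Result], new: List[Result]) -> bool:
    """Adding a keyword must not increase the number of results."""
    return len(new) <= len(old)


def query_consistent(old: List[Result], new: List[Result],
                     keyword_matches: Set[Node]) -> bool:
    """Every delta subtree must contain a match to the added keyword."""
    return all(d & keyword_matches for r in new for d in delta_subtrees(old, r))


# Hypothetical node ids for a fragment of the example tree.
TEAM, TEAM_NAME, PLAYERS = (0,), (0, 0), (0, 1)           # TEAM_NAME holds "Grizzlies"
GASOL, GASOL_NAME, GASOL_POS = (0, 1, 0), (0, 1, 0, 0), (0, 1, 0, 1)
MILLER, MILLER_POS = (0, 1, 1), (0, 1, 1, 0)
BROWN, BROWN_POS = (0, 1, 2), (0, 1, 2, 0)

q1_results = [{GASOL, GASOL_NAME, GASOL_POS}]              # "Gasol, position"
good_q2 = [{TEAM, TEAM_NAME, PLAYERS, GASOL, GASOL_NAME, GASOL_POS}]
bad_q2 = [good_q2[0] | {MILLER, MILLER_POS, BROWN, BROWN_POS}]
grizzlies_matches = {TEAM_NAME}

print(query_monotonic(q1_results, good_q2))                       # True
print(query_consistent(q1_results, good_q2, grizzlies_matches))   # True
print(query_consistent(q1_results, bad_q2, grizzlies_matches))    # False
```

In the last call, the subtrees for Miller and Brown are new but contain no match to “Grizzlies”, which is why this sketch flags the result as inconsistent.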

Example: Query Monotonicity/Consistency

Q1: “forward, name”
Q2: “forward, USA, name” (new keyword: “USA”)

[Figure: the example XML tree (league NBA; team Grizzlies; players Gasol, Miller, and Brown).]

Monotonicity: the number of query results reduces from 2 to 1.

Consistency: in each result, the delta subtree (if it exists) contains “USA”.

Example Revisited: Violation of Query Consistency

Q1: “Gasol, position”
Q2: “Grizzlies, Gasol, position”

[Figure: the example XML tree (league NBA; team Grizzlies; players Gasol, Miller, and Brown), with the nodes in question highlighted.]

An XML keyword search engine that considers these nodes as relevant for the new query violates query consistency.

Properties wrt Similar Data

Data Monotonicity
When we add a node to the data, the data content becomes richer, and the number of query results should not decrease.

Data Consistency
After we add a node to the data, each delta subtree that becomes (part of) a query result should contain the newly inserted node.
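The data properties can be checked by comparing the results of one query before and after an insertion. The following is a minimal sketch under stated assumptions: nodes are Dewey-style tuples, a "delta subtree" is taken to be a maximal subtree of a new result that shares no node with any earlier result (one plausible reading, not the paper's formal definition), and the function and node names are hypothetical. The sample data mirrors the query “forward, name” with a position node inserted under Brown.

```python
from typing import List, Set, Tuple

Node = Tuple[int, ...]   # Dewey-style id (assumption): (0, 1) is child 1 of the root (0,)
Result = Set[Node]       # one query result, as the set of nodes in its tree


def data_monotonic(before: List[Result], after: List[Result]) -> bool:
    """Inserting a node must not decrease the number of results."""
    return len(after) >= len(before)


def data_consistent(before: List[Result], after: List[Result], inserted: Node) -> bool:
    """Every maximal, entirely new subtree of a result must contain the inserted node."""
    old_nodes = set().union(*before)
    for result in after:
        for n in result:
            below = {m for m in result if m[:len(n)] == n}
            parent_below = {m for m in result if m[:len(n) - 1] == n[:-1]}
            is_new = not (below & old_nodes)
            parent_is_new = n[:-1] in result and not (parent_below & old_nodes)
            if is_new and not parent_is_new and inserted not in below:
                return False  # a delta subtree appeared without the inserted node
    return True


# Hypothetical node ids for the three players of the example tree.
GASOL, GASOL_NAME, GASOL_POS = (0, 1, 0), (0, 1, 0, 0), (0, 1, 0, 1)
MILLER, MILLER_NAME = (0, 1, 1), (0, 1, 1, 0)
BROWN, BROWN_NAME, BROWN_POS = (0, 1, 2), (0, 1, 2, 0), (0, 1, 2, 1)

before = [{GASOL, GASOL_NAME, GASOL_POS}]                  # "forward, name" before insertion
after = before + [{BROWN, BROWN_NAME, BROWN_POS}]          # Brown's position "forward" inserted
unrelated = before + [{MILLER, MILLER_NAME}]               # a new result unrelated to the insert

print(data_monotonic(before, after))                 # True
print(data_monotonic(before, []))                    # False: empty result after insertion
print(data_consistent(before, after, BROWN_POS))     # True
print(data_consistent(before, unrelated, BROWN_POS)) # False
```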

Example: Data Monotonicity/Consistency

Q: “forward, name”

[Figure: the example XML tree (league NBA; team Grizzlies; players Gasol, Miller, and Brown), except that player Brown initially has no position node; a position node with value “forward” is inserted under Brown and becomes a new match.]

Monotonicity: the number of query results increases from 1 to 2.

Consistency: in each result, the delta subtree (if it exists) contains the new data node.

Example Revisited: Violation of Data Monotonicity

Q: “Grizzlies, Gasol, Brown, position”

[Figure: the example XML tree (league NBA; team Grizzlies; players Gasol, Miller, and Brown), except that player Brown initially has no position node; a position node with value “forward” is then inserted under Brown.]

An XML keyword search engine that outputs an empty result on the updated data violates data monotonicity.

The Proposed Axiomatic Framework

Four desirable properties: Query Monotonicity, Query Consistency, Data Monotonicity, Data Consistency

These properties are:
Non-trivial: no prior XML keyword search system satisfies all of them.
Non-redundant: an algorithm may violate any one of them while satisfying the others.
Satisfiable: we propose a novel technique, MaxMatch, that satisfies all four properties.


MaxMatch

MaxMatch’s name comes from “Maximal Match”.

MaxMatch preserves each subtree whose set of descendant keyword matches is “maximal” among its siblings.
Intuitively, the subtrees that are removed are strictly less relevant to the query, since they contain fewer keywords.
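The pruning rule can be illustrated with a small sketch. This is not the algorithm from the paper (the slides defer that to the paper); it assumes the root of the result is already known, treats a keyword as matching a node when it equals the node's tag or text, also drops children with no match at all, and recomputes keyword sets naively. The `Node` class and the helper names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import FrozenSet, List, Set


@dataclass
class Node:
    tag: str
    text: str = ""
    children: List["Node"] = field(default_factory=list)


def matched_keywords(node: Node, query: Set[str]) -> FrozenSet[str]:
    """Query keywords matched by `node` or by any of its descendants."""
    found = {k for k in query if k in (node.tag.lower(), node.text.lower())}
    for child in node.children:
        found |= matched_keywords(child, query)
    return frozenset(found)


def prune(node: Node, query: Set[str]) -> Node:
    """Copy of `node` keeping only children whose keyword set is maximal among siblings."""
    sets = [matched_keywords(c, query) for c in node.children]
    kept = []
    for child, s in zip(node.children, sets):
        dominated = any(s < other for other in sets)  # strict subset of a sibling's set
        if s and not dominated:
            kept.append(prune(child, query))
    return Node(node.tag, node.text, kept)


def player(name: str, nationality: str, position: str) -> Node:
    return Node("player", children=[
        Node("name", name),
        Node("nationality", nationality),
        Node("position", position),
    ])


team = Node("team", children=[
    Node("name", "Grizzlies"),
    Node("founded", "1995"),
    Node("players", children=[
        player("Gasol", "Spain", "forward"),
        player("Miller", "USA", "guard"),
        player("Brown", "USA", "forward"),
    ]),
])

query = {"grizzlies", "gasol", "brown", "position"}   # keywords in lower case
result = prune(team, query)
kept_players = [p.children[0].text for p in result.children[1].children]
print(kept_players)   # ['Gasol', 'Brown']
```

Miller's subtree matches only “position”, a strict subset of what the Gasol and Brown subtrees match, so it is dropped; the team name and the players subtree match incomparable keyword sets, so both are kept.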

MaxMatch

Q: “Grizzlies, Gasol, Brown, position”

[Figure: the example XML tree (league NBA; team Grizzlies; players Gasol, Miller, and Brown), with one subtree marked “Not as informative as its siblings: discarded”.]

MaxMatch satisfies all four properties.

Proof details and algorithms can be found in the paper.


Search Quality

Data set: Baseball, Mondial

Query set: 36 queries in total

Ground truth: obtained by user study.

User perception of search results on query pairs and document pairs confirms our intuition of the proposed properties

F-measure of MaxMatch vs. Existing Approaches
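For reference, precision, recall, and F-measure against a ground-truth set can be computed as in the sketch below. The slides do not say which F-measure variant was used; the balanced form (beta = 1) is an assumption here, and the function name and the sample node labels are hypothetical.

```python
from typing import Set, Tuple


def precision_recall_f(returned: Set[str], relevant: Set[str],
                       beta: float = 1.0) -> Tuple[float, float, float]:
    """Precision, recall, and F-measure of `returned` against ground truth `relevant`."""
    hits = len(returned & relevant)
    precision = hits / len(returned) if returned else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    b2 = beta * beta
    f = (1 + b2) * precision * recall / (b2 * precision + recall)
    return precision, recall, f


# Hypothetical example: 4 nodes returned, 3 relevant in the ground truth, 2 in common.
p, r, f = precision_recall_f({"a", "b", "c", "d"}, {"a", "b", "e"})
print(p, r, f)
```

Here precision is 2/4, recall is 2/3, and the balanced F-measure is 4/7.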

Processing Time

[Figure: processing time on Mondial Data (515KB) and Baseball Data (1014KB)]

Conclusions

This is the first work on reasoning about and evaluating XML keyword search strategies using a formal axiomatic framework.

Four intuitive and elegant properties are proposed: query monotonicity/consistency, data monotonicity/consistency.

We designed and developed MaxMatch - the only XML keyword search strategy that satisfies all properties.

Experiments verified the intuition of the properties and the effectiveness and efficiency of MaxMatch.

MaxMatch is incorporated as part of XSeek [Liu & Chen, SIGMOD 07].

Thank You!

Questions?

Welcome to try MaxMatch at: xseek.asu.edu