Cooperative XML (CoXML) Query Answering

Cooperative XML (CoXML) Query Answering. 2 Motivation X ML has become the standard format for information representation and data exchange An explosive.

Dec 22, 2015


Cooperative XML (CoXML) Query Answering


2

Motivation

XML has become the standard format for information representation and data exchange.

There has been an explosive increase in the amount of XML data available on the web, e.g.:
Bills at the Library of Congress
IEEE Computer Society's publications
SwissProt: protein sequence databases
XMark: online auction data
…

Effective XML search methods are needed!


3

Challenges

XML schemas are usually very complex, e.g., the schema for the IEEE Computer Society publication dataset contains about 170 distinct tags and more than 1000 distinct paths.

It is often unrealistic for users to fully understand a schema before asking queries.

Exact query answering is inadequate; approximate query answering is more appropriate!


4

Approach: CoXML

Figure: a query is submitted to Cooperative XML Query Answering, which runs on top of an XML Database Engine over XML Documents and returns approximate answers.

Derive approximate answers by relaxing query conditions, i.e., query relaxation


5

Roadmap Introduction Background CoXML Related Work Conclusion


6

XML Data Model

XML data is often modeled as an ordered labeled tree.

Tree nodes: elements. Tree edges: element-nesting relationships.

Example data tree (node number, element, and content):

1 article
    2 title: "Search engine spam detection"
    3 author
        4 name: "XYZ"
        5 title: "IEEE Fellow"
    6 year: "2003"
    7 body
        8 section: "..a spam detection technique by content analysis…"
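To make the data model concrete, the sample article can be parsed and its elements numbered in document order, which for a tree is a preorder traversal and reproduces node numbers 1 to 8 above. This is a minimal sketch with Python's standard xml.etree.ElementTree; the XML serialization of the sample and the helper number_nodes are assumptions for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical XML serialization of the sample article (nodes 1-8 above).
SAMPLE = """
<article>
  <title>Search engine spam detection</title>
  <author><name>XYZ</name><title>IEEE Fellow</title></author>
  <year>2003</year>
  <body><section>..a spam detection technique by content analysis...</section></body>
</article>
"""


def number_nodes(root):
    """Return (preorder number, tag, text) for every element, starting at 1."""
    # Element.iter() walks the subtree in document order, i.e. preorder.
    result = []
    for i, elem in enumerate(root.iter(), start=1):
        text = (elem.text or "").strip()
        result.append((i, elem.tag, text))
    return result


root = ET.fromstring(SAMPLE.strip())
for number, tag, text in number_nodes(root):
    print(number, tag, repr(text))
```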


7

XML Query Model

XML queries are often modeled as trees.

Structure conditions: a set of query nodes connected by
parent-to-child ('/') edges: directly connected
ancestor-to-descendant ('//') edges: connected either directly or indirectly

Content conditions: either value predicates or keyword constraints on query nodes

Example query tree:

article
    title: "search engine"
    section: "spam detection"
    year: 2003


8

XML Query Answer

An answer for a query is a set of nodes in a data tree that satisfies both the structure and the content conditions.

Example figure: the data tree and the query tree from the previous two slides, shown side by side. The data nodes article (1), title (2), year (6), and section (8) satisfy the query's structure and content conditions, so together they form an answer.
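The answer definition can be illustrated with a tiny tree-pattern matcher in which each query node carries a label, an optional content predicate, and child edges tagged '/' or '//'. This is a simplified sketch, not CoXML's evaluation algorithm: keyword constraints are approximated by case-insensitive substring tests, query labels are assumed unique, and the edge types of the example query (article/title, article/year, article//section) are assumptions.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class QNode:
    label: str
    pred: Optional[Callable[[str], bool]] = None   # content condition; None means none
    children: list = field(default_factory=list)   # (axis, QNode) pairs; axis is "/" or "//"


def contains(phrase):
    # Keyword constraint approximated as a case-insensitive substring test.
    return lambda text: phrase.lower() in text.lower()


def match(q, elem, number):
    """Return {query label: data node number} if q matches at elem, else None."""
    if elem.tag != q.label:
        return None
    if q.pred is not None and not q.pred((elem.text or "").strip()):
        return None
    binding = {q.label: number[elem]}
    for axis, qc in q.children:
        # '/' considers children only; '//' considers all proper descendants.
        candidates = list(elem) if axis == "/" else list(elem.iter())[1:]
        for cand in candidates:
            sub = match(qc, cand, number)
            if sub is not None:
                binding.update(sub)
                break
        else:
            return None  # no data node satisfies this query child
    return binding


DATA = """<article><title>Search engine spam detection</title>
<author><name>XYZ</name><title>IEEE Fellow</title></author>
<year>2003</year>
<body><section>..a spam detection technique by content analysis...</section></body>
</article>"""

root = ET.fromstring(DATA)
number = {e: i for i, e in enumerate(root.iter(), start=1)}  # preorder numbers 1..8
query = QNode("article", children=[
    ("/", QNode("title", contains("search engine"))),
    ("/", QNode("year", lambda s: s == "2003")),
    ("//", QNode("section", contains("spam detection"))),
])
print(match(query, root, number))  # {'article': 1, 'title': 2, 'year': 6, 'section': 8}
```

Changing the section edge to '/' makes match return None, because section (8) sits under body (7) rather than directly under article; this is the kind of mismatch the relaxation types on the following slides address.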


9

XML Query Relaxation Types

Value relaxation: enlarging a value condition's search scope

Node relabel: changing the label of a node to a similar or a more general label using domain knowledge

Figure: the query tree (article with title "search engine", year 2003, and section "spam detection") before and after each relaxation. Value relaxation enlarges the year condition from 2003 to 2000-2005; node relabel changes article to document.

[1] Tree Pattern Relaxation (S. Amer-Yahia et al., 2002)


10

XML Query Relaxation Types

Edge generalization: relaxing a '/' edge to a '//' edge

Node deletion: dropping a node from a query tree

Figure: the same query tree (article with title "search engine", year 2003, and section "spam detection") before and after edge generalization, and after node deletion, which drops the title node.
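The four relaxation types can be sketched as pure functions that return a new query tree. The representation (a frozen dataclass with a label, an optional numeric range, and '/' or '//' child edges) and all function names are hypothetical, labels are assumed unique, and node deletion reattaches the deleted node's children to its parent with '//' edges, which is one reasonable convention rather than necessarily the one CoXML uses. The example query follows the article/title, article/year, article/body/section shape of the RLXQuery example later in the deck.

```python
from dataclasses import dataclass, replace
from typing import Optional, Tuple


@dataclass(frozen=True)
class Q:
    label: str
    value: Optional[Tuple[int, int]] = None     # inclusive numeric range, e.g. (2003, 2003)
    text: Optional[str] = None                  # keyword constraint (untouched by these ops)
    children: Tuple[Tuple[str, "Q"], ...] = ()  # (axis, child) pairs; axis is "/" or "//"


def _rewrite(q, target, fn):
    """Apply fn to the node labelled target (labels assumed unique)."""
    if q.label == target:
        return fn(q)
    return replace(q, children=tuple((ax, _rewrite(c, target, fn)) for ax, c in q.children))


def value_relax(q, target, lo, hi):
    # Value relaxation: enlarge the search scope of a value condition.
    return _rewrite(q, target, lambda n: replace(n, value=(lo, hi)))


def relabel(q, old, new):
    # Node relabel: replace a label by a similar or more general one.
    return _rewrite(q, old, lambda n: replace(n, label=new))


def generalize_edge(q, parent, child):
    # Edge generalization: turn the parent/child edge into parent//child.
    return _rewrite(q, parent, lambda n: replace(n, children=tuple(
        ("//" if c.label == child else ax, c) for ax, c in n.children)))


def delete_node(q, parent, child):
    # Node deletion: drop child; its children are reattached to parent by '//' edges.
    def fn(n):
        kids = []
        for ax, c in n.children:
            if c.label == child:
                kids.extend(("//", gc) for _, gc in c.children)
            else:
                kids.append((ax, c))
        return replace(n, children=tuple(kids))
    return _rewrite(q, parent, fn)


def show(q):
    cond = f"={q.value[0]}..{q.value[1]}" if q.value else ""
    parts = [ax + show(c) for ax, c in q.children]
    return q.label + cond + (f"[{', '.join(parts)}]" if parts else "")


query = Q("article", children=(
    ("/", Q("title", text="search engine")),
    ("/", Q("year", value=(2003, 2003))),
    ("/", Q("body", children=(("/", Q("section", text="spam detection")),))),
))
print(show(value_relax(query, "year", 2000, 2005)))
print(show(relabel(query, "article", "document")))
print(show(generalize_edge(query, "body", "section")))
print(show(delete_node(query, "article", "title")))
```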


11

XML Relaxation Properties

Definition
Relaxation operation: an application of a relaxation type to a specific query node or edge

Lemma
Given a query tree with n applicable relaxation operations, there are potentially up to 2^n relaxed trees.

Possible combinations, choosing k = 1, …, n of the n operations:

$$\binom{n}{1} + \binom{n}{2} + \cdots + \binom{n}{n} = 2^n - 1$$

Together with the unrelaxed query itself, this gives 2^n trees.
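The counting argument can be checked by brute force: each relaxed tree corresponds to a subset of the applicable operations, so enumerating subsets by size k reproduces the binomial sum. The operation labels are illustrative, written in the gen/del notation used later for the XTAH, and the sketch assumes all subsets are mutually compatible, which is why the lemma says "up to".

```python
from itertools import combinations


def relaxation_sets(ops):
    """Yield every subset of ops, from the empty set (the original query) upward."""
    for k in range(len(ops) + 1):
        for subset in combinations(ops, k):
            yield frozenset(subset)


# Illustrative operation labels (n = 4).
ops = ["gen(e$1,$2)", "gen(e$1,$3)", "gen(e$3,$4)", "del($2)"]
sets = list(relaxation_sets(ops))
print(len(sets))                    # 2**4 = 16, including the unrelaxed query
print(sum(1 for s in sets if s))    # 2**4 - 1 = 15 relaxed trees
```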


12

Roadmap Introduction Background CoXML Related Work Conclusion


13

Challenges

Query relaxation is often user-specific: different users may have different approximate matching specifications for a given query tree. How can we provide user-specific approximate query answering?

A query with n relaxation operations has potentially up to 2^n relaxed queries. How can we systematically relax a query?

Query relaxation generates a set of approximate answers. How can we effectively rank the returned approximate answers?


14

CoXML System Overview

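Figure: CoXML consists of a Relaxation Engine, a Relaxation Index Builder that produces the XTAH relaxation indexes, and a Ranking Module, built on top of an XML Database Engine over XML Documents. The user submits an RLXQuery and receives ranked results; the other labeled flows are query, exact answers, relaxed query, results, relaxation language, relaxation indexes, and similarity metrics.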


15

Roadmap Introduction Background CoXML

Relaxation Language Relaxation Indexes Ranking Evaluation Testbed

Related Work Conclusion


16

Relaxation Language

Motivation: enabling users to specify approximate conditions in queries and to control the approximate matching process

RLXQuery (relaxation-enabled XQuery) extends the standard XML query language, XQuery, with relaxation constructs and controls, such as:
~ : approximate conditions
! : non-relaxable conditions
REJECT : unacceptable relaxations
AT-LEAST : minimum number of answers to be returned
RELAX-ORDER : relaxation order among multiple conditions
USE : allowable relaxation types


17

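RLXQuery Example

FOR $a in doc("bib.xml")//article
WHERE $a/year = ~2003 V-COND-LABEL t1 and
      ~($a[about(./!title, "search engine")]/body/section)[about(., "spam detection")] S-COND-LABEL t2
RETURN $a
RELAX-ORDER (t1, t2)
USE (edge generalization, node deletion)
AT-LEAST 20

Figure: the corresponding query tree: article with children title ("search engine", marked ! as non-relaxable), year (2003), and body, whose child section has the content condition "spam detection". The value condition on year is labeled t1; the structure condition over title, body, and section is labeled t2.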
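Some of these controls can be pictured as a filter over candidate relaxation operations. The RelaxSpec class below is a hypothetical sketch, not the CoXML implementation: it checks USE, the ! marker, and REJECT (modeled, as an assumption, as forbidden (node, new label) relabelings, which is how REJECT(fm, bb) reads in the INEX Topic 267 example later in the deck), and it treats AT-LEAST as a stopping test. RELAX-ORDER and the ~ marker are not modeled.

```python
from dataclasses import dataclass, field


@dataclass
class RelaxSpec:
    """Hypothetical in-memory form of some RLXQuery relaxation controls."""
    use: set                                           # USE: allowed relaxation types
    non_relaxable: set = field(default_factory=set)    # nodes marked with '!'
    reject: set = field(default_factory=set)           # REJECT: forbidden (node, new label) pairs
    at_least: int = 1                                  # AT-LEAST: minimum number of answers

    def allows(self, ops):
        """ops: list of (relaxation type, target node, new label or None)."""
        for kind, node, new_label in ops:
            if kind not in self.use or node in self.non_relaxable:
                return False
            if (node, new_label) in self.reject:
                return False
        return True

    def satisfied(self, n_answers):
        # Relaxation can stop once at least this many answers are available.
        return n_answers >= self.at_least


# Controls of the RLXQuery example above (RELAX-ORDER is not modeled here).
spec = RelaxSpec(use={"edge_generalization", "node_deletion"},
                 non_relaxable={"title"}, at_least=20)
print(spec.allows([("edge_generalization", "section", None)]))   # True
print(spec.allows([("node_deletion", "title", None)]))           # False: title is marked '!'
print(spec.allows([("node_relabel", "article", "document")]))    # False: type not in USE
print(spec.satisfied(12))                                        # False: fewer than 20 answers
```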


18

Roadmap Introduction Background CoXML

Relaxation Language Relaxation Indexes Ranking Evaluation Testbed

Related Work Conclusion


19

Relaxation Index

Naïve approach: generate all possible relaxed queries and iteratively select the best relaxed query to derive approximate answers. Exhaustive, but not scalable.

Observation: many queries share the same (or similar) tree structures.

Our approach: relaxation index
Consider the structure of a query tree T as a template.
Build indexes on the relaxed trees of T.
Use the index to guide the relaxations of any query with the same (or similar) tree structure as that of T.


20

Relaxation Index: XTAH

XTAH: a hierarchical, multi-level, labeled cluster of relaxed trees

Building an XTAH:
Given a query structure template T, generate all possible relaxed trees; each relaxed tree uses a unique set of relaxation operations.
Cluster the relaxed trees into groups based on their relaxation operations and distances, similar to "suffix-tree" clustering.
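A deliberately simplified version of XTAH construction can be sketched for the template used on the next slide (article $1 with title $2 and body $3; body with section $4). Every compatible set of operations stands for one relaxed tree, and trees are grouped by the relaxation types they use, listed from fewer to more operations. The compatibility rule (an edge cannot be generalized once one of its endpoints is deleted) is an assumption, node relabeling is omitted because it needs domain knowledge, and the suffix-tree-style clustering by operation distances is not reproduced.

```python
from collections import defaultdict
from itertools import combinations

# Operations applicable to the template T (node relabel omitted).
EDGES = {"gen(e$1,$2)": ("$1", "$2"), "gen(e$1,$3)": ("$1", "$3"), "gen(e$3,$4)": ("$3", "$4")}
DELETES = {"del($2)": "$2", "del($3)": "$3", "del($4)": "$4"}
OPS = list(EDGES) + list(DELETES)


def compatible(ops):
    # Assumption: an edge cannot be generalized if either endpoint is deleted.
    deleted = {DELETES[o] for o in ops if o in DELETES}
    return all(not (set(EDGES[o]) & deleted) for o in ops if o in EDGES)


def kind(op):
    return "edge_generalization" if op.startswith("gen") else "node_deletion"


def build_groups():
    """Group every compatible, non-empty operation set by the relaxation types it uses."""
    groups = defaultdict(list)
    for k in range(1, len(OPS) + 1):          # fewer operations = less relaxed tree
        for ops in combinations(OPS, k):
            if compatible(ops):
                key = tuple(sorted({kind(o) for o in ops}))
                groups[key].append(frozenset(ops))
    return groups


groups = build_groups()
for key in sorted(groups):
    print(key, len(groups[key]))
```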


21

XTAH Example

Figure: a sample XTAH for the template structure T. The template T is article ($1) with children title ($2) and body ($3); body has a child section ($4). The root group, relax, has three subgroups: node_relabel, edge_generalization, and node_deletion. Lower-level groups are labeled with operation sets such as {gen(e$1,$2)}, …, {gen(e$3,$4)}, {gen(e$1,$2), gen(e$3,$4)}, {gen(e$3,$4), gen(e$1,$3)}, {del($2)}, and {del($2), del($3)}. The leaves are relaxed trees T1, …, T7, e.g., T6 (article, body, section; title deleted) and T7 (article and section only).

gen(e$u, $v) – relaxing the edge between $u and $v

del($u) – deleting the node $u


22

XTAH Properties

Each group consists of a set of relaxed trees obtained by using similar relaxation operations, which allows relaxed trees to be located efficiently based on their relaxation operations.

The higher the level of a group, the less relaxed the trees in the group. Queries can therefore be relaxed at different granularities by traversing up and down the XTAH.


23

XTAH-Guided Query Relaxation

Problem: given a query with relaxation specifications (constructs and controls), how can we search an XTAH for relaxed queries that satisfy the specifications?

Approach:
First, prune the XTAH groups containing trees that use unacceptable relaxations as specified in the query. This step can be done efficiently using the internal node labels.
Then, iteratively search the XTAH for the best relaxed query.
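The two steps can be sketched on a hand-made XTAH fragment. The group contents, distances, and answers below are made-up illustrative values, and a priority queue ordered by structure distance stands in for whatever search strategy CoXML actually uses; pruning looks only at the internal group labels, as described above.

```python
import heapq

# Hypothetical XTAH fragment: internal group label -> leaf relaxed trees with
# made-up structure distances (smaller = closer to the original query).
XTAH = {
    "node_relabel": [("T5", 0.4)],
    "edge_generalization": [("T1", 0.2), ("T2", 0.3), ("T4", 0.5)],
    "node_deletion": [("T6", 0.6), ("T7", 0.9)],
}


def guided_relaxation(xtah, allowed_types, answers_of, at_least):
    """Prune groups by label, then relax with the closest remaining trees first."""
    # Step 1: pruning inspects only internal group labels, never individual leaves.
    heap = [(dist, name)
            for label, trees in xtah.items() if label in allowed_types
            for name, dist in trees]
    heapq.heapify(heap)
    used, answers = [], []
    # Step 2: iteratively take the best remaining relaxed tree until enough answers.
    while heap and len(answers) < at_least:
        _, name = heapq.heappop(heap)
        used.append(name)
        answers.extend(answers_of(name))
    return used, answers


# Hypothetical answers produced by evaluating each relaxed tree.
TOY_ANSWERS = {"T1": ["a1", "a2"], "T2": ["a3"], "T4": ["a4", "a5"], "T6": ["a6"], "T7": []}
used, answers = guided_relaxation(
    XTAH, {"edge_generalization", "node_deletion"},
    lambda name: TOY_ANSWERS.get(name, []), at_least=4)
print(used, answers)  # ['T1', 'T2', 'T4'] ['a1', 'a2', 'a3', 'a4', 'a5']
```

Because USE excludes node relabeling, T5 is never considered even though its distance is smaller than that of T4.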


24

Query Relaxation Process Example

Figure: the sample XTAH for the template structure T (as on the XTAH Example slide), shown next to a sample RLXQuery: article with a non-relaxable (!) title "search engine", year 2003 (condition t1), and body/section "spam detection" (condition t2), with the relaxation control USE (edge generalization, node deletion) and AT-LEAST 20.


25

XTAH-Guided Query Relaxation

Problem: given a query and an XTAH, how can we efficiently locate the best relaxation candidate at the leaf level?

Approach: M-tree
Assign representatives to internal groups.
Representatives summarize the distance properties of the trees within their groups.
Use the representatives to guide the search path to the best relaxation candidate.

Figure: a group hierarchy with representative R0 at the root, R1, R2, and R3 at the next level, and R5, R8, and R11 further down, leading to a relaxed tree j.

[2] M-tree: An Efficient Access Method for Similarity Search in Metric Spaces (P. Ciaccia et al., VLDB 97)
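The representative-guided descent can be sketched with the M-tree pruning rule: each group keeps a representative relaxed tree and a covering radius, and the triangle inequality gives the lower bound d(q, rep) - radius on the distance from the query to any tree in the group. In this hypothetical sketch, relaxed trees are sets of operations, the distance between two trees is the total cost of the operations in which they differ (so the distance from the original query, the empty set, is the sum of its operation costs), and the per-operation costs are made-up numbers.

```python
import heapq
from dataclasses import dataclass, field

# Hypothetical per-operation costs.
COST = {"g12": 0.2, "g13": 0.3, "g34": 0.1, "d2": 0.8, "d3": 0.5}


def dist(a, b):
    """Total cost of the operations in which two relaxed trees differ (a metric)."""
    return sum(COST[o] for o in a ^ b)


@dataclass
class Group:
    rep: frozenset                                  # representative relaxed tree
    children: list = field(default_factory=list)    # sub-groups
    trees: list = field(default_factory=list)       # leaf relaxed trees (operation sets)
    radius: float = 0.0                             # max distance from rep to any tree below


def finalize(g):
    """Compute covering radii bottom-up; return every tree below g."""
    below = list(g.trees)
    for c in g.children:
        below += finalize(c)
    g.radius = max((dist(g.rep, t) for t in below), default=0.0)
    return below


def nearest(root, query):
    """Best-first descent that skips groups whose lower bound cannot beat the best so far."""
    best, best_d = None, float("inf")
    heap, tie = [(0.0, 0, root)], 1
    while heap:
        bound, _, g = heapq.heappop(heap)
        if bound >= best_d:
            break  # every remaining group is at least this far away
        for t in g.trees:
            d = dist(query, t)
            if d < best_d:
                best, best_d = t, d
        for c in g.children:
            # Triangle inequality: dist(query, t) >= dist(query, c.rep) - c.radius.
            heapq.heappush(heap, (max(0.0, dist(query, c.rep) - c.radius), tie, c))
            tie += 1
    return best, best_d


gen = Group(frozenset({"g12"}), trees=[frozenset({"g12"}), frozenset({"g13"}), frozenset({"g12", "g34"})])
dele = Group(frozenset({"d2"}), trees=[frozenset({"d2"}), frozenset({"d2", "d3"})])
root = Group(frozenset(), children=[gen, dele])
finalize(root)
print(nearest(root, frozenset()))  # closest relaxation of the original query: {'g12'} at 0.2
```

With these numbers the node-deletion group is never opened for the original query: its lower bound (0.8 - 0.5) already exceeds the best distance found (0.2).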


26

Roadmap Introduction Background CoXML

Relaxation Language Relaxation Indexes Ranking Evaluation Testbed

Related Work Conclusion


27

Ranking

Ranking criteria: based on both content and structure similarities between a query and an answer, i.e., a set of data nodes

Approach:
Content similarity: extended vector space model
Structure similarity: tree editing distance with a model for assigning operation costs
Overall relevancy: a ranking model combining both content and structure similarities


28

Content Similarity

Traditional IR ranking: Term Frequency, Inverse Document Frequency, and the Vector Space Model give the content similarity between a query and a document.

XML content ranking: Weighted Term Frequency, Inverse Element Frequency, and the Extended Vector Space Model give the content similarity between a query and an answer (i.e., a set of data nodes).


29

Weighted Term Frequency

Terms under different paths of a node are weighted differently.

The weighted term frequency for a term t in a node v is:

$$\mathrm{tf}_w(v, t) = \sum_{i=1}^{m} w(p_i) \cdot \mathrm{tf}(p_i, t)$$

p_i: a path under the node v to the term t
m: number of different paths under the node v that contain the term t
w(p_i): the weight of the path p_i; tf(p_i, t): the frequency of t under the path p_i

Example: the query node section has the content condition "spam detection". In the data, section (5) contains "spam" under three paths: title (6) "Spam Detection By Content Analysis", paragraph (8) "…an approach to detect spam by …", and reference (12) "Spam detection taxonomy".
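The formula can be computed directly once each path under v has a weight. The sketch below uses the example's section (5); the path weights (title 3.0, reference 0.5, others 1.0) are hypothetical, terms are matched as whole words, and only the visible words of the elided paragraph are counted.

```python
import re


def term_freq(text, term):
    """Raw frequency of a single-word term in text (case-insensitive)."""
    return re.findall(r"\w+", text.lower()).count(term.lower())


def weighted_tf(paths, term, weight):
    """tf_w(v, t) = sum_i w(p_i) * tf(p_i, t) over the paths p_i under node v.

    paths:  {path under v: text reached by that path}
    weight: {path: w(path)}; paths without an explicit weight get 1.0
    Paths where the term does not occur contribute 0, so summing over all
    paths equals summing over the m paths that contain the term.
    """
    return sum(weight.get(p, 1.0) * term_freq(text, term) for p, text in paths.items())


# Data node section (5) from the example; path weights are hypothetical.
paths = {
    "section/title": "Spam Detection By Content Analysis",
    "section/paragraph": "...an approach to detect spam by ...",
    "section/reference": "Spam detection taxonomy",
}
weight = {"section/title": 3.0, "section/reference": 0.5}
print(weighted_tf(paths, "spam", weight))       # 3.0 + 1.0 + 0.5 = 4.5
print(weighted_tf(paths, "detection", weight))  # 3.0 + 0.0 + 0.5 = 3.5
```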


30

Inverse Element Frequency

The more XML elements contain a term, the less disambiguating power the term has; e.g., the term "spam" is less disambiguating than the term "detection".

The inverse element frequency for a query term t is:

$$\mathrm{ief}(\$u, t) = \log \frac{N_1}{N_2}$$

$u: a query node whose content condition contains the term t
N_1: number of data nodes that match the structure condition related to $u
N_2: number of data nodes that match the structure condition related to $u and contain t
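A direct transcription of the formula, with hypothetical counts chosen so that "spam" occurs in more matching elements than "detection", as in the example above. The natural logarithm is an assumption; the slide does not fix the base.

```python
import math


def ief(n1, n2):
    """ief($u, t) = log(N1 / N2).

    n1: number of data nodes matching the structure condition related to $u
    n2: number of those nodes that also contain the term t
    """
    if not 0 < n2 <= n1:
        raise ValueError("need 0 < N2 <= N1")
    return math.log(n1 / n2)


# Hypothetical counts for 1000 matching section nodes.
print(round(ief(1000, 400), 3))  # "spam" occurs in 400 of them       -> 0.916
print(round(ief(1000, 100), 3))  # "detection" occurs in 100 of them  -> 2.303
```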


31

Extended Vector Space Model

The content similarity between an answer A and a query Q is:

$$\mathrm{cont\_sim}(A, Q) = \sum_{i=1}^{n} \sum_{j=1}^{|\$u_i.\mathrm{cont}|} \mathrm{tf}_w(v_i, t_{ij}) \cdot \mathrm{ief}(\$u_i, t_{ij})$$

n: number of nodes in Q
{$u_1, …, $u_n}: the set of query nodes in Q
{v_1, …, v_n}: the set of data nodes in A, where v_i matches $u_i (1 ≤ i ≤ n)
|$u_i.cont|: the number of terms in the content conditions on the node $u_i
t_ij: a term in the content condition on the query node $u_i
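The double sum can be written as one generator expression. The input layout (dictionaries keyed by query node and term) is an assumption for illustration, and all statistics in the example are hypothetical; in practice the tf_w and ief values would come from the two previous formulas.

```python
def cont_sim(query_terms, tf_w, ief):
    """cont_sim(A, Q) = sum_i sum_j tf_w(v_i, t_ij) * ief($u_i, t_ij).

    query_terms: {query node $u_i: terms t_ij of its content condition}
    tf_w:        {($u_i, t): weighted tf of t in the data node v_i matched to $u_i}
    ief:         {($u_i, t): inverse element frequency of t for $u_i}
    """
    return sum(tf_w.get((u, t), 0.0) * ief[(u, t)]
               for u, terms in query_terms.items()
               for t in terms)


# Hypothetical statistics for an answer A = {v_title, v_section}.
query_terms = {"$title": ["search", "engine"], "$section": ["spam", "detection"]}
tf_w_vals = {("$title", "search"): 1.0, ("$title", "engine"): 1.0,
             ("$section", "spam"): 4.5, ("$section", "detection"): 3.5}
ief_vals = {("$title", "search"): 0.5, ("$title", "engine"): 1.0,
            ("$section", "spam"): 0.25, ("$section", "detection"): 0.5}
print(cont_sim(query_terms, tf_w_vals, ief_vals))  # 0.5 + 1.0 + 1.125 + 1.75 = 4.375
```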


32

Structure Distance Function

Both XML data and queries are modeled as trees. Similarities between trees are often computed by editing distances, i.e., the cost of the cheapest sequence of editing operations that transforms one tree into the other.

The structure distance between an answer A and a query Q can be measured as the total cost of the relaxation operations used to derive A:

$$\mathrm{struct\_dist}(A, Q) = \sum_{i=1}^{k} \mathrm{cost}(r_i)$$

{r_1, …, r_k}: the set of relaxation operations used to derive A
cost(r_i): the cost of r_i (0 ≤ cost(r_i) ≤ 1)
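A direct transcription that also checks the stated bound on each operation cost. The function name and input format are hypothetical.

```python
def struct_dist(op_costs):
    """struct_dist(A, Q) = cost(r_1) + ... + cost(r_k).

    op_costs: costs of the relaxation operations used to derive A;
    an empty list means A is an exact answer (k = 0).
    """
    for c in op_costs:
        if not 0.0 <= c <= 1.0:
            raise ValueError(f"operation cost {c} is outside [0, 1]")
    return sum(op_costs)


print(struct_dist([]))            # exact answer: 0
print(struct_dist([0.25, 0.5]))   # two relaxations: 0.75
```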


33

Relaxation Operation Cost

Naïve approach: assign a uniform cost to all relaxation operations. Simple but ineffective.

Our approach: assign an operation cost based on the similarity between the two nodes being approximated by the operation. The closer the two nodes, the lower the operation cost:

$$\mathrm{cost}(r_i) = 1 - \mathrm{similarity}(\$u, \$v)$$

r_i: a relaxation operation
$u, $v: the two nodes being approximated by r_i
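The cost rule can be sketched with a lookup table of node similarities. How similarity($u, $v) is computed is not specified here, so the table below is hypothetical; its pairs are the examples from the next slide, and unknown pairs fall back to similarity 0, i.e., the maximum cost 1.

```python
# Hypothetical node similarities, e.g. derived from domain knowledge.
SIMILARITY = {
    ("article", "document"): 0.75,               # node relabel
    ("section", "body"): 0.5,                    # node deletion (child, parent)
    ("article/title", "article//title"): 0.875,  # edge generalization
}


def op_cost(u, v, similarity=SIMILARITY):
    """cost(r_i) = 1 - similarity($u, $v); unknown pairs get similarity 0 (cost 1)."""
    s = similarity.get((u, v), similarity.get((v, u), 0.0))
    return 1.0 - s


for pair in SIMILARITY:
    print(pair, op_cost(*pair))
# costs: 0.25 for (article, document), 0.5 for (section, body), 0.125 for the edge
```

A uniform-cost baseline would instead return the same constant for every pair, regardless of how close the two nodes are.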


34

Nodes Approximated By Relaxation Operations

For each relaxation operation, the nodes being approximated by the operation, ($u, $v), and an example:
Node relabel: (a node with the old label, a node with the new label); e.g., (article, document)
Node deletion: (a child node, the parent node); e.g., (section, body)
Edge generalization: (a child node, a descendant node); e.g., (article/title, article//title)

Figure: the query tree T1 (article with children title and body; body has a child section) and its relaxations T2 (node relabel: article becomes document), T3 (node deletion: section is dropped), and T4 (edge generalization: article/title becomes article//title).


35

Figure: overall relevancy combines content similarity and structure distance.


36

Overall Relevancy Function

The overall relevancy of an answer A to a query Q, sim(A, Q), is a function of cont_sim(A, Q) and struct_dist(A, Q).

Properties:
sim(A, Q) = cont_sim(A, Q) if struct_dist(A, Q) = 0
sim(A, Q) increases as cont_sim(A, Q) increases
sim(A, Q) decreases as struct_dist(A, Q) increases

Implementation:

$$\mathrm{sim}(A, Q) = \alpha^{\mathrm{struct\_dist}(A, Q)} \cdot \mathrm{cont\_sim}(A, Q)$$

α is a small constant between 0 and 1
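A direct transcription of the implementation formula, useful for checking the three properties: α^0 = 1 leaves exact answers at their content score, and since 0 < α < 1 the factor shrinks as the structure distance grows. The default α = 0.5 is an arbitrary choice for illustration.

```python
def sim(cont_sim, struct_dist, alpha=0.5):
    """sim(A, Q) = alpha ** struct_dist(A, Q) * cont_sim(A, Q), with 0 < alpha < 1."""
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must lie strictly between 0 and 1")
    return alpha ** struct_dist * cont_sim


print(sim(4.0, 0.0))             # exact answer keeps its content score: 4.0
print(sim(4.0, 1.0))             # one unit of structure distance halves it: 2.0
print(sim(4.0, 1.0, alpha=0.9))  # a larger alpha penalizes relaxation less (about 3.6)
```

A smaller α penalizes relaxed answers more strongly; a larger α lets content similarity dominate the ranking.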


37

Roadmap Introduction Background CoXML

Relaxation Indexes Relaxation Language Ranking Evaluation Testbed

Related Work Conclusion


38

Evaluation Studies

INEX (Initiative for the Evaluation of XML Retrieval), similar to TREC for text retrieval

Document collections: scientific articles from the IEEE Computer Society, 1995-2002; about 500 MB; each article consists of 1500 XML nodes on average

Queries: strict content and structure (SCAS) and vague content and structure (VCAS)

Gold standard: relevance assessments provided by INEX


39

Evaluation of Content Similarity

Dataset: INEX 03 test collection
Query set: 30 SCAS queries
Comparison: 38 submissions in INEX 03

Figure: recall-precision curve (x-axis: recall; y-axis: precision; both from 0 to 1).

Average precision: 0.3309


40

Evaluation of the Cost Model

Dataset: INEX 05 test collection
Query set: 22 simple VCAS queries
Evaluation metric: normalized extended cumulated gain (nxCG), the official evaluation metric used in INEX 05. Given a number i (i ≥ 1), nxCG@i, similar to precision@i, measures the relative gain users have accumulated up to rank i, e.g., nxCG@10, nxCG@25, nxCG@50, …
Cost models:
UCost: uniform cost for each relaxation operation (baseline)
SCost: our proposed cost model
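nxCG@i divides the gain accumulated by the system's top i answers by the gain of an ideal top-i ranking. The sketch below assumes the ideal ranking is obtained simply by sorting the available gain values in decreasing order; INEX's actual construction of the ideal gain vector (including its handling of overlapping elements) is more involved, and the gain values in the example are hypothetical.

```python
def nxcg_at(gains, ideal_gains, i):
    """nxCG@i = xCG@i / xCI@i.

    gains:       gain of each returned answer, in rank order
    ideal_gains: gains available in the recall base; the ideal ranking sorts them
                 in decreasing order (a simplification of INEX's ideal vector)
    """
    if i < 1:
        raise ValueError("i must be >= 1")
    xcg = sum(gains[:i])                                # gain accumulated by the system
    xci = sum(sorted(ideal_gains, reverse=True)[:i])    # best achievable gain up to rank i
    return xcg / xci if xci > 0 else 0.0


gains = [1.0, 0.0, 0.5]            # hypothetical gains of a ranked answer list
ideal = [1.0, 0.5, 0.5, 0.25]      # hypothetical recall base
print(nxcg_at(gains, ideal, 2))    # 1.0 / 1.5
print(nxcg_at(gains, ideal, 3))    # 1.5 / 2.0 = 0.75
```

A value of 1.0 means the system's top i answers are as good as an ideal top-i list.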


41

Retrieval performance improvements with semantic cost model

nxCG@10 for each (α, cost model) pair, with the % improvement over the uniform cost model in parentheses:

Cost model  α = 0.1           α = 0.3           α = 0.5           α = 0.7           α = 0.9
Uniform     0.2584            0.2616            0.2828            0.2894            0.2916
Semantic    0.3319 (+28.44%)  0.3190 (+21.94%)  0.3196 (+13.04%)  0.3068 (+6%)      0.2957 (+4.08%)

$$\mathrm{sim}(A, Q) = \alpha^{\mathrm{struct\_dist}(A, Q)} \cdot \mathrm{cont\_sim}(A, Q)$$

Assigning different costs to relaxation operations based on the similarities of the nodes being operated on improves retrieval performance!

nxCG@25 and nxCG@50 yield similar results.

Query set: all content-and-structure queries in INEX 05


42

Evaluation of the Cost Model: Result

Cost model  α = 0.1           α = 0.3           α = 0.5           α = 0.7           α = 0.9
UCost       0.2584            0.2616            0.2828            0.2894            0.2916
SCost       0.3319 (+28.44%)  0.3190 (+21.94%)  0.3196 (+13.04%)  0.3068 (+6%)      0.2957 (+4.08%)

$$\mathrm{sim}(A, Q) = \alpha^{\mathrm{struct\_dist}(A, Q)} \cdot \mathrm{cont\_sim}(A, Q)$$

Each cell: nxCG@10 for a given pair (α, cost model), with the % improvement over the baseline in parentheses.

Utilizing node similarities to distinguish the costs of different operations improves retrieval performance!

Similar results are observed using nxCG@25 and nxCG@50.


43

Expressiveness of the Relaxation Language

INEX 05 Topic 267:

<inex_topic topic_id="267" query_type="CAS">
  <castitle>//article//fm//atl[about(., "digital libraries")]</castitle>
  <description>Articles containing "digital libraries" in their title.</description>
  <narrative>I'm interested in articles discussing Digital Libraries as their main subject. Therefore I require that the title of any relevant article mentions "digital library" explicitly. Documents that mention digital libraries only under the bibliography are not relevant, as well as documents that do not have the phrase "digital library" in their title.</narrative>
</inex_topic>

Expressing Topic 267 using RLXQuery:

FOR $a in doc("inex.xml")//article
LET $b = $a//fm//!atl REJECT(fm, bb)
WHERE $b[about(., "digital libraries")]
RETURN $b


44

Expressing Topic 267 with RLXQuery

FOR $a in doc("inex.xml")//article
LET $b = $a//fm//!atl REJECT(fm, bb)
WHERE $b[about(., "digital libraries")]
RETURN $b

Results: Effectiveness of the Relaxation Control

Method                     nxCG@10   nxCG@25
No relaxation control      0.1013    0.2365
With relaxation control    1.0       0.8986

With relaxation control, nxCG@10 reaches 1.0, i.e., perfect accuracy.

Relaxation control enables the system to provide answers with greater relevancy!


45

Evaluation of the Ranking Function

Dataset: INEX 05 test collection
Query set: 4 official VCAS queries with available relevance assessments
Comparison: top-1 submission in INEX 05

Results:

Topic     nxCG@10                         nxCG@25
          Top-1           CoXML           Top-1           CoXML
256       0.4293          0.4248          0.4733          0.5555
264       0.0             0.0069          0.0             0.0033
275       0.7715          0.638           0.589           0.5922
284       0.0             0.1259          0.0             0.1233
Average   0.3002 (+0.4%)  0.2989          0.2656          0.3186 (+20%)

The systematic relaxation approach enables our system to derive more approximate answers!

Our ranking function, based on both content and structure relevancy, outperforms other ranking functions that use content similarity only!


46

Roadmap Introduction Background CoXML

Relaxation Indexes – XTAH Relaxation Language – RLXQuery Ranking Evaluation Testbed

Related Work Conclusion


47

CoXML Testbed

Team Members: Prof. Chu, S. Liu, T. Lee, E. Sung, C. Cardenas, A. Putnam, J. Chen, R. Shahinian

Figure: testbed architecture with the components RLXQuery Preprocessor, RLXQuery Parser, Relaxation Manager, Relaxation Controller, Database Manager, Ranking Module, Relaxation Index Builder, and XTAH, running on an XML Database Engine over XML Documents. Input: RLXQuery; output: approximate answers.


48

Relaxation Examples using the Testbed


49

Relaxation Examples using the Testbed


50

Roadmap Introduction Background CoXML Related Work Conclusion


51

Related Work: Query Relaxation

Relaxation based on schema conversions ([LC01, LMC01], [LMC03]): no structure relaxation

Native XML relaxation:
Propose structure relaxation types [e.g., KS01, ACS02]; we use the relaxation types introduced in [ACS02]
Investigate efficient algorithms for deriving top-K answers based on the supported relaxation types [e.g., Sch02, ACS02, ALP04, AKM05]
No relaxation control


52

Related Work: XML Ranking

Content ranking:
Most approaches extend ranking models for text retrieval to the XML scenario, e.g., HyRex, XXL, JuruXML, XSearch.
We use structure to assign different weights to terms occurring in different parts of a document.

Structure ranking:
Based on tree editing distance algorithms without considering operation cost [NJ02]
Based on the occurrence frequency of the query trees, paths, or predicates in data [MAK05, AKM05]
Our structure ranking is similar to editing distance, but we consider operation cost.


53

Conclusion

Cooperative XML (CoXML) query answering:

RLXQuery enables users to effectively express approximate query conditions and to control the approximate matching process

XTAH provides systematic query relaxation guidance

Both content and structure similarity metrics for evaluating the relevancy of approximate answers

Evaluation studies with the INEX test collections demonstrate the effectiveness of our methodology