Cooperative Query Answering for Semistructured Data Speakers: Chuan Lin & Xi Zhang By Michael Barg and Raymond K. Wong.

Dec 11, 2015

Bailey Knop
Transcript
Page 1

Cooperative Query Answering for Semistructured Data

Speakers:

Chuan Lin & Xi Zhang

By Michael Barg and Raymond K. Wong

Page 2

Outline

Motivations
Overview
Basic Concepts
Cooperative Query Processing
Experiment

Page 3

Motivations

XML data
– same semantic content
– very different structures

Page 4

Example: same semantics, different structures

User Query: “insurance claims” related to “smoking” for “woman”

Court Transcript (tree; node labels in slide order): insurance claim, plaintiff, woman, smoking

Insurance Record (tree; node labels in slide order): insurance claim, insurer, woman, smoking

Page 5

Motivations

No exact query result

User Query:
– “phone number” of “Bob”
– Who is the new “sales manager”

Data (tree; node labels in slide order): personnel, sales manager, Joe, phone number, assistant sales manager, Bob, phone number, salesman, salesman

Page 6

Overview

Goal:
– Return approximate answers for XML queries
– “approximate”: semantically and structurally similar

Solution:
– Return a set of results
– Ranked by an overall score

Score: indicates how well the subgraph containing the result satisfies the query criteria.

Page 7

Basic Concepts: Query Tree

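Query: /restaurant[.//Soho]/phone_number

For each edge:
– “head”: the end which is closer to the nearest result term
– “tail”: the other end
– In case of a tie, “head” is the end closer to the root

Query Tree (figure): restaurant is the root, with soho and phone_number below it; phone_number is marked as the Result Term, and each edge end is labelled h (head) or t (tail).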
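The head/tail rule above can be sketched in a few lines. This is only an illustration under stated assumptions: the query tree is stored as a hypothetical child-to-parent dictionary, and the names `PARENT`, `RESULT_TERMS` and `head_and_tail` are invented here, not taken from the talk.

```python
from collections import deque

# Hypothetical query tree for /restaurant[.//Soho]/phone_number,
# stored as child -> parent. "restaurant" is the root.
PARENT = {"soho": "restaurant", "phone_number": "restaurant"}
RESULT_TERMS = {"phone_number"}


def neighbours(node):
    """Nodes adjacent to `node` when the tree is read as undirected."""
    adjacent = [child for child, parent in PARENT.items() if parent == node]
    if node in PARENT:
        adjacent.append(PARENT[node])
    return adjacent


def dist_to_nearest_result(start):
    """Breadth-first search for the number of edges to the closest result term."""
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        node, dist = queue.popleft()
        if node in RESULT_TERMS:
            return dist
        for nxt in neighbours(node):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    raise ValueError("query tree has no result term")


def head_and_tail(child):
    """Return (head, tail) for the edge between `child` and its parent."""
    parent = PARENT[child]
    if dist_to_nearest_result(child) < dist_to_nearest_result(parent):
        return child, parent
    # The parent is closer, or the distances tie; on a tie the head is the
    # end closer to the root, which is always the parent.
    return parent, child


EDGE_TO_PHONE = head_and_tail("phone_number")
EDGE_TO_SOHO = head_and_tail("soho")
```

Under these assumptions the edge to phone_number has phone_number as head, and the edge to soho has restaurant as head.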

Page 8

Basic Concepts: Converging Order

Order of edges considered in query processing

Converge on a result term
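The slide does not spell out how the order is built. One plausible reading, shown here purely as an assumption, is that edges farthest from the result term are handled first so that scores flow toward it. The edge list and distances below are hypothetical inputs for the query /restaurant[.//Soho]/phone_number.

```python
# Hypothetical (head, tail) pairs for the example query tree, plus each
# node's distance in edges to the result term phone_number.
EDGES = [("phone_number", "restaurant"), ("restaurant", "soho")]
DIST_TO_RESULT = {"phone_number": 0, "restaurant": 1, "soho": 2}


def converging_order(edges, dist_to_result):
    """Sort edges so that those whose head is farthest from the result come first."""
    return sorted(edges, key=lambda edge: dist_to_result[edge[0]], reverse=True)


ORDER = converging_order(EDGES, DIST_TO_RESULT)
```

With this ordering the (restaurant, soho) edge is processed before the (phone_number, restaurant) edge, so restaurant already has a score when it is used as a tail.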

Page 9

Basic Concepts: Similarity

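Semantically similar topologies

[Figure: five example topologies, (a) to (e), each placing restaurant and soho differently. Node labels in slide order: restaurant, address, soho; restaurant, soho; soho, restaurant; soho, eating_places, restaurant; shopping_center, soho, restaurant.]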

Page 10

Basic Concepts: Similarity (cont.)

Deviation Proximity (DP)
– Measures how far one structure deviates from a desired structure
– Given:
  ra: data node with value a
  rb: data node with value b
  Q(a,b): query tree edge
– DP: the distance from the actual position of rb to the nearest position, r’b, which satisfies the topological relationship specified by Q(a,b)

Topological relationship: parent-child or ancestor-descendant
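A small sketch may make the definition concrete. It assumes the data is a tree stored as a hypothetical child-to-parent dictionary, and it takes the "nearest satisfying position" to be either an existing slot on the path below ra or a new child of ra. Both the rule and the four sample topologies are one interpretation, not the authors' implementation.

```python
def path_to_root(parent, node):
    """List of nodes from `node` up to the root, inclusive."""
    path = [node]
    while node in parent:
        node = parent[node]
        path.append(node)
    return path


def tree_distance(parent, u, v):
    """Number of edges between u and v via their lowest common ancestor."""
    up_v = {n: i for i, n in enumerate(path_to_root(parent, v))}
    for i, n in enumerate(path_to_root(parent, u)):
        if n in up_v:
            return i + up_v[n]
    raise ValueError("nodes are not in the same tree")


def deviation_proximity(parent, ra, rb, relation):
    """Distance from rb to the nearest position satisfying `relation` under ra.

    relation is "parent-child" or "anc-desc" (assumed labels).
    """
    up_b = path_to_root(parent, rb)
    if ra in up_b and ra != rb:
        depth_below_ra = up_b.index(ra)
        if relation == "anc-desc":
            return 0  # already a descendant
        return depth_below_ra - 1  # move up to the slot directly under ra
    # rb is not below ra: walk to ra, then one step down to a new child.
    return tree_distance(parent, rb, ra) + 1


# Hypothetical topologies, each as child -> parent.
CHAIN = {"address": "restaurant", "soho": "address"}
DIRECT = {"soho": "restaurant"}
REVERSED = {"restaurant": "soho"}
SIBLINGS = {"restaurant": "eating_places", "soho": "eating_places"}

PARENT_CHILD = [
    deviation_proximity(t, "restaurant", "soho", "parent-child")
    for t in (CHAIN, DIRECT, REVERSED, SIBLINGS)
]
ANC_DESC = [
    deviation_proximity(t, "restaurant", "soho", "anc-desc")
    for t in (CHAIN, DIRECT, REVERSED, SIBLINGS)
]
```

Under this reading, the chain restaurant/address/soho deviates by 1 when a parent-child edge is required and by 0 when an ancestor-descendant edge is enough.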

Page 11

Deviation Proximity

Q(restaurant, soho) requires a parent-child relationship

[Figure: the five restaurant/soho topologies, each marked with (soho’), the nearest position that satisfies the relationship.]

DP(restaurant, soho), in slide order: 0, 2, 3, 1, 3

Page 12

Deviation Proximity

Q(restaurant, soho) requires an ancestor-descendant relationship

[Figure: the five restaurant/soho topologies, each marked with (soho’), the nearest position that satisfies the relationship.]

DP(restaurant, soho), in slide order: 0, 2, 3, 0, 3

Page 13

Cooperative Query Processing

Input: a Query Tree QT, an XML Document Tree DT

Output: ordered list of <r_result_term, score>

Cooperative Query Processing
– Structural proximity calculation
– Progressive Score

Page 14

Cooperative Query Processing (cont.)

Progressively matching edges in QT with DT
– Consider edges in converging order
– For each edge QT(a,b), where a is head and b is tail, get a list of <ra, score>
  ra is a node in DT with value a
  score is the progressive score of ra w.r.t. the nearest rb
  use graph encoding to calculate the structural proximity of ra and rb
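The per-edge step can be sketched as follows. The function `match_edge`, the node identifiers and the proximity table are all hypothetical; the sketch also assumes that a lower score is better and that the score of ra is the smallest "score of rb plus proximity(ra, rb)" over all rb, which the slide does not state explicitly.

```python
def match_edge(head_nodes, tail_scores, proximity):
    """Score every head node against the tail nodes of one query edge.

    head_nodes  -- data nodes whose value matches the edge's head
    tail_scores -- {tail node: progressive score so far}; must be non-empty
    proximity   -- function (ra, rb) -> structural deviation, supplied by caller
    Returns [(ra, score)] sorted with the lowest score first.
    """
    scored = []
    for ra in head_nodes:
        best = min(prev + proximity(ra, rb) for rb, prev in tail_scores.items())
        scored.append((ra, best))
    return sorted(scored, key=lambda pair: pair[1])


# Hypothetical deviations between two restaurant nodes and two soho nodes.
PROXIMITY = {
    ("restaurant#1", "soho#1"): 0,
    ("restaurant#1", "soho#2"): 3,
    ("restaurant#2", "soho#1"): 3,
    ("restaurant#2", "soho#2"): 1,
}
TAIL_SCORES = {"soho#1": 0, "soho#2": 0}  # first edge: tails start at 0

RANKED = match_edge(
    ["restaurant#1", "restaurant#2"],
    TAIL_SCORES,
    lambda ra, rb: PROXIMITY[(ra, rb)],
)
```

The returned list can then be turned into the `tail_scores` of the next edge in the converging order.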

Page 15

Structural Proximity Calculation

Encodings and Compressed Arrays
– Compact
– Preserve relationship to a larger graph
– Facilitate distance calculations

Proximity Searching

Page 16

Encodings and Compressed Arrays

Basic Concepts:
– Common Node
– Terminal Node
– Annotated Node

Path representation
– Representing Single Path
– Representing Multiple Paths
– Representing Multiple Elements

Compressed Arrays
– Each encoding is a path/multi-path for a node/a set of nodes

Page 17

Encodings and Compressed Arrays

[Figure: example XML tree with numbered edges. Node labels in slide order: Food Guide, Shopping Mall, City Guide, Restaurant, Address, Menu, Name, Street, Suburb, Mains, Mains, Something_fishy, BBQ, Sea Food, Soho, Brown St, Soho, Place to eat, Seafood, Restaurant, Category, Fastfood, neptunes, Apple_bee, Name, Shop, Branch, Soho, Bonza mall, product, Restaurant, Type, Seafood.]

Page 18

Representing Single Path

[Figure: the example XML tree with numbered edges, with nodes y1 and y2 marked.]

Path to y1: 1.1.1 y1

Path to y2: 1.2.1.1.1.1 y2
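To show why such path strings make distance calculations easy, here is a minimal sketch. It assumes the data is a tree and that each number is the edge label followed at that level, so two nodes share an ancestor for as long as their paths share a prefix. The helper names are invented for this example.

```python
def parse_path(encoding):
    """Turn a dotted path such as "1.2.1" into a tuple of edge numbers."""
    return tuple(int(step) for step in encoding.split("."))


def path_distance(p, q):
    """Edges between two nodes: steps up to the shared prefix, then down."""
    common = 0
    for a, b in zip(p, q):
        if a != b:
            break
        common += 1
    return (len(p) - common) + (len(q) - common)


Y1 = parse_path("1.1.1")
Y2 = parse_path("1.2.1.1.1.1")
DIST_Y1_Y2 = path_distance(Y1, Y2)
```

Here the two paths share only the first step, so the distance is 2 steps up from y1 plus 5 steps down to y2.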

Page 19

Representing Multiple Paths

[Figure: the example XML tree with numbered edges, with nodes A, B and C marked.]

Multi-path encoding: 1.3 B.B.2.1.1 C.3 C.C.2 y3
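One way to read the multi-path string is as a list of segments: 1.3 leads from the root to B, 2.1.1 and 3 both lead from B to C, and 2 leads from C to y3. That reading is an assumption, and the sketch below stores the segments as plain tuples rather than parsing the notation. `ROOT` is a placeholder name introduced here.

```python
# Segments as (origin node, edge numbers, end node), following the assumed
# reading of "1.3 B.B.2.1.1 C.3 C.C.2 y3".
SEGMENTS = [
    ("ROOT", (1, 3), "B"),
    ("B", (2, 1, 1), "C"),
    ("B", (3,), "C"),
    ("C", (2,), "y3"),
]


def expand_paths(segments, target, start="ROOT"):
    """Enumerate every full edge sequence from `start` to `target`.

    Assumes the segments form an acyclic graph.
    """
    if start == target:
        return [()]
    paths = []
    for origin, steps, end in segments:
        if origin == start:
            for rest in expand_paths(segments, target, end):
                paths.append(steps + rest)
    return paths


Y3_PATHS = expand_paths(SEGMENTS, "y3")
```

Under this reading y3 is reachable by two root paths, and the shared prefix 1.3 and suffix 2 are stored only once.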

Page 20

Representing Multiple Elements

[Figure: the example XML tree with numbered edges, with nodes A, B and C marked.]

Multi-element encoding: 1 A.A.1.1 y1.2.1.1.1.1 y2.3 B.B.2.1.1 C.3 C.C.2 y3

Page 21

Compressed Arrays

[Figure: two compressed arrays drawn as rows of bits. Each path encoding segment is annotated with a Boundary Pattern and an Identifier Pattern. The first array is labelled with segments originating from node “A” (Node A, At node A, At node y1); the second with segments originating from node “B” and node “C” (Node C, At node C, At node y3).]

Page 22

Drawback of Encoding

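Encoding: 1A.A.1B.B.1D.2E.?.2C.C.1F.2G

[Figure: tree with root A (edge 1); A has children B (edge 1) and C (edge 2); B has children D (edge 1) and E (edge 2); C has children F (edge 1) and G (edge 2).]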

Page 23

Proximity Searching

Multi-Element Comparison
– Input:
  A compressed array, caN, containing the multi-element encoding of the Near Set.
  A compressed array, caF, containing the multi-path encoding or path encoding of all paths from the root to the specified element of the Find Set, EF.
– Output: dist, the length of the shortest path from EF to the closest element in the Near Set
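The comparison can be illustrated without the compressed arrays. The sketch below swaps in a plain prefix tree (trie) over root paths for the Near Set, which is a stand-in chosen for this example and not the encoding from the talk. It assumes tree-shaped data and paths given as tuples of edge numbers.

```python
def build_trie(near_paths):
    """Merge Near Set root paths into a trie.

    Each trie node stores `nearest`: the fewest edges from that node down
    to any Near Set element in its subtree.
    """
    root = {"children": {}, "nearest": None}
    for path in near_paths:
        node, remaining = root, len(path)
        node["nearest"] = remaining if node["nearest"] is None else min(node["nearest"], remaining)
        for step in path:
            node = node["children"].setdefault(step, {"children": {}, "nearest": None})
            remaining -= 1
            node["nearest"] = remaining if node["nearest"] is None else min(node["nearest"], remaining)
    return root


def min_dist(trie, find_path):
    """Shortest distance from the Find element to any Near Set element."""
    node, depth, best = trie, 0, None
    while True:
        if node["nearest"] is not None:
            # Climb from the Find element to this ancestor, then descend.
            candidate = (len(find_path) - depth) + node["nearest"]
            best = candidate if best is None else min(best, candidate)
        if depth == len(find_path):
            break
        node = node["children"].get(find_path[depth])
        if node is None:
            break
        depth += 1
    return best


NEAR = build_trie([(1, 1, 1), (1, 3, 2)])
# An element with several root paths is handled by taking the minimum.
DIST_DEEP = min(min_dist(NEAR, p) for p in [(1, 2, 1, 1, 1, 1)])
DIST_CLOSE = min_dist(NEAR, (1, 3))
```

Walking the Find path once visits every possible common ancestor, so the cost is proportional to the path length rather than to the size of the Near Set.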

Page 24

Proximity Searching

[Figure: the example XML tree with numbered edges, with elements x1, x2, x3 and y1, y2, y3 and nodes A, B and C marked.]

MinDist = 5, MinDist = 4, MinDist = 2

Page 25

Progressive Score

Accumulative Deviation Proximity (DP)
– Calculated from structural proximity

Boolean operator at Query Tree branches

[Figure: two query trees, each with root a and children b and c, one per Boolean operator.]
– prog(a) = prog(b) + prog(c)
– prog(a) = min(prog(b), prog(c))
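The two formulas can be combined in a short recursive sketch. The slide text does not say which formula belongs to which operator; the code assumes the sum is used for AND branches and the minimum for OR branches. The dictionary layout and the leaf scores are hypothetical.

```python
def prog(node):
    """Progressive score of a query-tree node.

    Leaves carry a score already computed from structural proximity.
    Assumed mapping: "and" branches add child scores, "or" branches keep
    the smallest one.
    """
    children = node.get("children", [])
    if not children:
        return node["score"]
    child_scores = [prog(child) for child in children]
    if node["op"] == "and":
        return sum(child_scores)
    if node["op"] == "or":
        return min(child_scores)
    raise ValueError("unknown operator: " + repr(node["op"]))


B = {"score": 1}
C = {"score": 3}
AND_SCORE = prog({"op": "and", "children": [B, C]})
OR_SCORE = prog({"op": "or", "children": [B, C]})
```

With leaf scores 1 and 3, the AND branch scores 4 and the OR branch scores 1.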

Page 26

Experiment

Query: //restaurant/soho

XML: [Figure: the example XML tree (Food Guide, City Guide, Shopping Mall, Restaurant, ...).]

Query Result: <soho, 2>, <soho, 3>, <soho, 4>

Page 27

Thank you!

Page 28

Questions & Answers