Cooperative Query Answering for Semistructured Data. Speakers: Chuan Lin & Xi Zhang. By Michael Barg and Raymond K. Wong
Dec 11, 2015
Outline
– Motivations
– Overview
– Basic Concepts
– Cooperative Query Processing
– Experiment
Motivations
XML data
– same semantic content
– very different structures
Example: same semantics, different structures
User Query: “insurance claims” related to “smoking” for “woman”
Court Transcript (node labels in the figure): insurance claim, plaintiff, woman, smoking
Insurance Record (node labels in the figure): insurance claim, insurer, woman, smoking
Motivations
No exact query result
User Query:
– “phone number” of “Bob”
– Who is the new “sales manager”?
Data (node labels in the figure): personnel, sales manager, Joe, phone number, assistant sales manager, Bob, phone number, salesman, salesman
Overview
Goal:
– Return approximate answers for XML queries
– “approximate”: semantically and structurally similar
Solution:
– Return a set of results
– ranked by an overall score
score: indicates how well the subgraph containing the result satisfies the query criteria.
Basic Concepts: Query Tree
Query: /restaurant[.//Soho]/phone_number
Result term: phone_number
For each edge:
– “head”: the end that is closer to the nearest result term
– “tail”: the other end
– In case of a tie, “head” is the end closer to the root
Query Tree: restaurant is the root, with children soho and phone_number; in the figure each edge end is marked h (head) or t (tail).
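The head/tail rule above can be sketched in a few lines of Python. This is an illustrative reading of the slide, not the authors' implementation: the function name `label_edges` and the edge-list representation are assumptions made for the example.

```python
from collections import deque

def label_edges(edges, result_terms, root):
    """Return {(u, v): (head, tail)} for each query-tree edge (hypothetical helper)."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)

    def bfs(sources):
        # Breadth-first distances from the nearest source node.
        dist = {s: 0 for s in sources}
        queue = deque(sources)
        while queue:
            n = queue.popleft()
            for m in adj.get(n, ()):
                if m not in dist:
                    dist[m] = dist[n] + 1
                    queue.append(m)
        return dist

    to_result = bfs(list(result_terms))
    to_root = bfs([root])
    labels = {}
    for u, v in edges:
        # Head: the end closer to the nearest result term; ties go to the end closer to the root.
        head, tail = sorted((u, v), key=lambda n: (to_result[n], to_root[n]))
        labels[(u, v)] = (head, tail)
    return labels

edges = [("restaurant", "soho"), ("restaurant", "phone_number")]
labels = label_edges(edges, {"phone_number"}, "restaurant")
```

For the query tree on this slide, the sketch marks phone_number as the head of the (restaurant, phone_number) edge and restaurant as the head of the (restaurant, soho) edge.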
Basic Concepts: Converging Order
– Order of edges considered in query processing
– Converge on a result term
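The slide does not spell out how the converging order is computed. One plausible reading, stated here as an assumption, is that edges farthest from the result term are processed first so that processing ends at the result term. The sketch below implements that reading; `converging_order` is a hypothetical name.

```python
from collections import deque

def converging_order(edges, result_term):
    """Order edges from farthest to nearest the result term (assumed interpretation)."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)

    # Breadth-first distances from the result term.
    dist = {result_term: 0}
    queue = deque([result_term])
    while queue:
        n = queue.popleft()
        for m in adj.get(n, ()):
            if m not in dist:
                dist[m] = dist[n] + 1
                queue.append(m)

    # An edge's distance is that of its end nearer to the result term.
    return sorted(edges, key=lambda e: min(dist[e[0]], dist[e[1]]), reverse=True)

order = converging_order(
    [("restaurant", "phone_number"), ("restaurant", "soho")], "phone_number"
)
```

Under this assumption the (restaurant, soho) edge is matched before the (restaurant, phone_number) edge.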
Basic Concepts: Similarity
Semantically similar topologies
[Figure: five topologies, (a) to (e), each placing a restaurant node and a soho node in a different structural relationship; other node labels shown are address, eating_places, and shopping_center.]
Basic Concepts: Similarity (cont.)
Deviation Proximity (DP)
– Measures how far one structure deviates from a desired structure
– Given:
ra: data node with value a
rb: data node with value b
Q(a,b): query tree edge
– DP: the distance from the actual position of rb to the nearest position, r’b, which satisfies the topological relationship specified by Q(a,b)
Topological relationship: parent-child, ancestor-descendant
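As a rough illustration of the definition above, the following Python sketch computes DP on a tree stored as a child-to-parent map. It assumes that "distance" means the number of edges between rb and r’b, and that the nearest satisfying position outside ra's subtree is a new child of ra; the function names and the tree representation are hypothetical.

```python
def path_to_root(parent, node):
    """List of nodes from `node` up to the root (parent maps child -> parent)."""
    path = [node]
    while node in parent:
        node = parent[node]
        path.append(node)
    return path

def tree_distance(parent, x, y):
    """Number of edges on the path between x and y."""
    steps_from_x = {n: i for i, n in enumerate(path_to_root(parent, x))}
    for j, n in enumerate(path_to_root(parent, y)):
        if n in steps_from_x:
            return steps_from_x[n] + j
    raise ValueError("nodes are not in the same tree")

def deviation_proximity(parent, ra, rb, relation):
    """DP of rb with respect to ra; relation is 'parent-child' or 'anc-desc'."""
    d = tree_distance(parent, ra, rb)
    if ra in path_to_root(parent, rb)[1:]:
        # rb is already below ra: anc-desc is satisfied, parent-child needs d - 1 steps up.
        return 0 if relation == "anc-desc" else d - 1
    # Otherwise the nearest satisfying position is a child of ra, one edge beyond ra.
    return d + 1

child = {"soho": "restaurant"}
grandchild = {"address": "restaurant", "soho": "address"}
inverted = {"restaurant": "soho"}
siblings = {"restaurant": "eating_places", "soho": "eating_places"}
```

Under these assumptions, a soho node that is a direct child of restaurant has DP 0 for both relationships, a grandchild has DP 1 for parent-child and 0 for ancestor-descendant, a soho node that is the parent of restaurant has DP 2, and a sibling has DP 3.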
Deviation Proximity
Q(restaurant, soho) requires parent-child relationship
DP(restaurant, soho), as listed in the figure: 0, 2, 3, 1, 3
[Figure: the five restaurant/soho topologies; (soho’) marks the nearest position that satisfies the parent-child relationship.]
Deviation Proximity
Q(restaurant, soho) requires anc-desc relationship
DP(restaurant, soho), as listed in the figure: 0, 2, 3, 0, 3
[Figure: the five restaurant/soho topologies; (soho’) marks the nearest position that satisfies the ancestor-descendant relationship.]
Cooperative Query Processing
Input: a Query Tree QT, an XML Document Tree DT
Output: ordered list of <rresult_term, score>, where rresult_term is a data node matching the result term
Cooperative Query Processing
– Structural proximity calculation
– Progressive Score
Cooperative Query Processing (cont.)
Progressively matching edges in QT with DT
– Consider edges in converging order
– For each edge QT(a,b), where a is head and b is tail, get a list of <ra, score>
ra is a node in DT with value a
score is the progressive score of ra w.r.t. the nearest rb
use graph encoding to calculate structural proximity of ra and rb
Structural Proximity Calculation
Encodings and Compressed Arrays
– Compact
– Preserve relationship to a larger graph
– Facilitate distance calculations
Proximity Searching
Encodings and Compressed Arrays
Basic Concepts:
– Common Node
– Terminal Node
– Annotated Node
Path representation
– Representing Single Path
– Representing Multiple Paths
– Representing Multiple Elements
Compressed Arrays
– Each encoding is a path/multi-path for a node/a set of nodes
[Figure: example data graph with numbered edges. Node labels include Food Guide, Shopping Mall, City Guide, Restaurant, Address, Menu, Name, Street, Suburb, Mains, Something_fishy, BBQ, Sea Food, Soho, Brown St, Place to eat, Seafood, Category, Fastfood, neptunes, Apple_bee, Shop, Branch, Bonza mall, product, and Type.]
Encodings and Compressed Arrays
Representing Single Path
Path encodings shown in the figure: 1.1.1 y1 and 1.2.1.1.1.1 y2
[Figure: the example data graph (Food Guide, Shopping Mall, City Guide) with numbered edges and nodes y1 and y2 marked.]
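The single-path encodings on this slide appear to be dot-separated edge numbers along the path from the root, followed by the terminal node's label. A minimal sketch of that idea, assuming a plain tree given as nested `(label, children)` tuples and 1-based child numbering, follows. The toy tree and the name `path_encoding` are made up for illustration and do not reproduce the slide's graph.

```python
def path_encoding(tree, target):
    """Dot-separated 1-based child indices from the root to `target`, or None."""
    def walk(node, prefix):
        label, children = node
        if label == target:
            return prefix
        for i, child in enumerate(children, start=1):
            found = walk(child, prefix + [i])
            if found is not None:
                return found
        return None

    steps = walk(tree, [])
    return None if steps is None else ".".join(map(str, steps))

# Toy tree, not the one in the figure.
toy = ("root", [
    ("guide", [("restaurant", [("address", []), ("menu", [])])]),
    ("mall", [("shop", [])]),
])
menu_path = path_encoding(toy, "menu")
shop_path = path_encoding(toy, "shop")
```

In the toy tree, "menu" is reached by taking the first child twice and then the second child, and "shop" by taking the second child and then the first. The common-node annotations (A, B, C) used for multiple paths are outside the scope of this sketch.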
Representing Multiple Paths
Multi-path encoding shown in the figure: 1.3 B.B.2.1.1 C.3 C.C.2 y3
[Figure: the example data graph (Food Guide, Shopping Mall, City Guide) with numbered edges and nodes A, B, and C marked.]
Representing Multiple Elements
Multi-element encoding shown in the figure: 1 A.A.1.1y1.2.1.1.1.1 y2.3 B.B.2.1.1 C.3 C.C.2 y3
[Figure: the example data graph (Food Guide, Shopping Mall, City Guide) with numbered edges and nodes A, B, and C marked.]
Compressed Arrays
[Figure: two compressed arrays drawn as rows of bits. Labels in the figure: Boundary Pattern, Identifier Pattern, Path encoding segment. The first array carries the annotations "Originate from node A", "Node A", "At node A", and "At node y1"; the second carries "Originate from node B", "Originate from node C", "Node C", "At node C", and "At node y3".]
Drawback of Encoding
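Encoding: 1A.A.1B.B.1D.2E.?.2C.C.1F.2G
[Figure: tree with root A; A has children B (edge 1) and C (edge 2); B has children D (edge 1) and E (edge 2); C has children F (edge 1) and G (edge 2).]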
Proximity Searching
Multi-Element Comparison
– Input:
A compressed array, caN, containing the multi-element encoding of the Near Set.
A compressed array, caF, containing the multi-path encoding or path encoding of all paths from the root to the specified element of the Find Set, EF.
– Output: dist, the length of the shortest path from EF to the closest element in the Near Set
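To make the input/output contract concrete, here is a simplified sketch that works on plain dot-separated path strings rather than compressed arrays. It assumes a tree in which each element has exactly one path from the root, so the distance between two elements is the number of steps each lies below their longest common prefix. The names `path_distance` and `min_dist` are hypothetical, and multi-path encodings with common nodes are not handled.

```python
def path_distance(p, q):
    """Edges between two nodes given their root paths, e.g. '1.1.2' and '2.1'."""
    a, b = p.split("."), q.split(".")
    common = 0
    for x, y in zip(a, b):
        if x != y:
            break
        common += 1
    # Climb from each node to the deepest shared ancestor.
    return (len(a) - common) + (len(b) - common)

def min_dist(find_path, near_paths):
    """Shortest distance from the Find Set element to any Near Set element."""
    return min(path_distance(find_path, n) for n in near_paths)

dist = min_dist("1.1.2", ["2.1", "1.1.1"])
```

Here the Find Set element at 1.1.2 is two edges from the Near Set element at 1.1.1 (they share a parent) and five edges from the one at 2.1, so the sketch reports the smaller value.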
Proximity Searching
[Figure: the example data graph (Food Guide, Shopping Mall, City Guide) with nodes x1, x2, x3, y1, y2, y3, A, B, and C marked. Distances shown: MinDist = 5, MinDist = 4, MinDist = 2.]
Progressive Score
Accumulative Deviation Proximity (DP)
– Calculated from structural proximity
Boolean operator at Query Tree branches
[Figure: two query trees, each with root a and children b and c, one for each formula below.]
prog(a) = prog(b) + prog(c)
prog(a) = min(prog(b), prog(c))
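The two formulas can be combined into one recursive function. The slide does not label which Boolean operator goes with which formula; the sketch below assumes that the sum applies to an AND branch (both children must be matched, so their deviations accumulate) and the minimum to an OR branch (the better-matching child is enough). The tuple representation and operator names are hypothetical.

```python
def prog(node):
    """Progressive score of a query-tree node.

    A leaf is a number (its own score); an internal node is (op, children)
    with op equal to 'and' or 'or' (assumed operator names).
    """
    if isinstance(node, (int, float)):
        return node
    op, children = node
    scores = [prog(c) for c in children]
    # Assumed mapping: AND accumulates deviations, OR keeps the smallest one.
    return sum(scores) if op == "and" else min(scores)

and_score = prog(("and", [1, 3]))
or_score = prog(("or", [1, 3]))
nested_score = prog(("and", [2, ("or", [5, 1])]))
```

With child scores 1 and 3, the AND branch yields 4 and the OR branch yields 1. A lower score indicates a smaller accumulated deviation from the query structure.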
Experiment
Query: //restaurant/soho
XML: [Figure: the example data graph (Food Guide, Shopping Mall, City Guide).]
Query Result: <soho, 2> <soho, 3> <soho, 4>
Thank you!
Questions & Answers