ICDCS 2008 @ Beijing China: Routing of XML and XPath Queries in Data Dissemination Networks. Guoli Li, Shuang Hou, Hans-Arno Jacobsen, Middleware Systems Research Group, University of Toronto


Jan 21, 2016

Page 1: Routing of XML and XPath Queries in Data Dissemination Networks

ICDCS 2008 @ Beijing China

Routing of XML and XPath Queries in Data Dissemination Networks

Guoli Li, Shuang Hou

Hans-Arno Jacobsen

Middleware Systems Research Group

University of Toronto

Page 2

Agenda

Motivation
Advertisement-based routing
Covering
Evaluation
Conclusions

Page 3

Motivation

Data sources publish XML data
Data users register XPath queries
The data dissemination network delivers matching results to a large and dynamically changing group of users

[Figure: Content-based Data Dissemination, with XML data, queries, and results]

Page 4

Publish/Subscribe

[Figure: publish/subscribe roles; a publisher issues an advertisement (DTD) and publications (XML), and subscribers issue subscriptions (XPath)]

Matching of XML documents and XPaths [ICDE’06]
Matching of advertisements and XPaths
Exploring relations among XPaths

Page 5

Covering-based Routing

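Covering-based routing can be sketched with the usual forwarding rule from content-based routing: a broker does not forward a new subscription to a neighbor if a subscription it already forwarded there covers the new one. This is a minimal sketch under assumptions, not the authors' implementation: the Broker class and the neighbor names X, Y, Z are hypothetical, and covers() handles only absolute XPEs without // (element-wise, * covers any element, and a shorter XPE covers any XPE it is a prefix of).

```python
def steps(xpe):
    """Element names of an absolute XPE without //."""
    return xpe.strip("/").split("/")


def covers(s1, s2):
    """True if every publication matching s2 also matches s1 (simple absolute XPEs only)."""
    a, b = steps(s1), steps(s2)
    # s1 may be shorter: it only has to match a prefix of every path that s2 matches.
    return len(a) <= len(b) and all(x == "*" or x == y for x, y in zip(a, b))


class Broker:
    """Hypothetical broker that suppresses subscriptions covered by ones already forwarded."""

    def __init__(self, name, neighbors):
        self.name = name
        self.neighbors = list(neighbors)
        # Subscriptions already forwarded to each neighbor.
        self.forwarded = {n: [] for n in self.neighbors}

    def on_subscription(self, sub, from_hop):
        """Return the neighbors the new subscription is forwarded to."""
        sent_to = []
        for n in self.neighbors:
            if n == from_hop:
                continue  # never send a subscription back where it came from
            if any(covers(s, sub) for s in self.forwarded[n]):
                continue  # a covering subscription already reached n
            self.forwarded[n].append(sub)
            sent_to.append(n)
        return sent_to


b = Broker("B", ["X", "Y", "Z"])
print(b.on_subscription("/a/*", "X"))    # ['Y', 'Z']
print(b.on_subscription("/a/b/c", "X"))  # [] (covered by /a/*)
print(b.on_subscription("/b", "Y"))      # ['X', 'Z']
```

The saving comes from the second call: /a/b/c never leaves the broker because /a/* already travels toward Y and Z.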

Page 6

Language Model
Advertisements: generated from DTDs
Non-recursive advertisement, e.g., A = /t1/t2/t3…/tn-1/tn
Recursive advertisements:
Simple: A = A1(A2)+A3
Series: A = A1(A2)+A3(A4)+A5
Embedded: A = A1(A2(A3)+A4)+A5
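One way to read the recursive forms is with regular-expression semantics, taking (X)+ to mean one or more repetitions of the sub-path X. Under that assumption, the sketch below checks whether a concrete element path conforms to a simple, series, or embedded advertisement by compiling the advertisement into a Python regular expression in which * matches exactly one element. The function names and the space-free string notation are illustrative, not taken from the paper.

```python
import re


def adv_to_regex(adv):
    """Compile an advertisement such as /a/b(/c/*/b)+/e into a regex over path strings."""
    out = []
    # Tokens: "(", ")+", "/", or an element name (including the wildcard *).
    for tok in re.findall(r"\(|\)\+|/|[^/()+]+", adv):
        if tok == "(":
            out.append("(?:")       # start of a repeated sub-path
        elif tok == ")+":
            out.append(")+")        # one or more repetitions
        elif tok == "/":
            out.append("/")
        elif tok == "*":
            out.append("[^/]+")     # wildcard: exactly one element name
        else:
            out.append(re.escape(tok))
    return re.compile("".join(out))


def conforms(path, adv):
    """True if the whole element path can be produced by the advertisement."""
    return adv_to_regex(adv).fullmatch(path) is not None


simple = "/a/b(/c/*/b)+/e"           # A1(A2)+A3
series = "/a(/b)+/c(/d/*)+/e"        # A1(A2)+A3(A4)+A5
embedded = "/a(/b(/c)+/d)+/e"        # A1(A2(A3)+A4)+A5
print(conforms("/a/b/c/x/b/c/y/b/e", simple))    # True
print(conforms("/a/b/e", simple))                # False: at least one repetition needed
print(conforms("/a/b/b/c/d/x/e", series))        # True
print(conforms("/a/b/c/c/d/b/c/d/e", embedded))  # True
```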

<?xml encoding="UTF-8"?>

<!ELEMENT personnel (person)+>

<!ELEMENT person (name,email*,url*,link?)>

<!ATTLIST person id ID #REQUIRED>

<!ELEMENT name ((family,given)|(given,family))>

<!ELEMENT family (#PCDATA)>

<!ELEMENT given (#PCDATA)>

<!ELEMENT email (#PCDATA)>

<!ELEMENT url EMPTY>

<!ATTLIST url href CDATA 'http://'>

<!ELEMENT link EMPTY>

<!ATTLIST link manager IDREF #IMPLIED>

… …

Advertisements generated from the DTD above:
/personnel/person
/personnel/person/name
/personnel/person/name/family
/personnel/person/name/given
/personnel/person/email
/personnel/person/url
/personnel/person/link
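The step from DTD to advertisements can be sketched as a depth-first walk over the element declarations that emits one root-to-element path per reachable element. This is a minimal sketch under stated assumptions: only ELEMENT declarations are read; every element name in a content model is treated as a possible child, ignoring occurrence indicators (*, ?, +) and the difference between sequence and choice; the first declared element is taken as the root; and a child already on the current path is skipped, leaving recursive structures to the recursive advertisement forms on this slide. The printed list also includes /personnel, which the slide's list does not show.

```python
import re

# ELEMENT declarations from the DTD on this slide (ATTLIST lines do not affect paths).
DTD = """
<!ELEMENT personnel (person)+>
<!ELEMENT person (name,email*,url*,link?)>
<!ELEMENT name ((family,given)|(given,family))>
<!ELEMENT family (#PCDATA)>
<!ELEMENT given (#PCDATA)>
<!ELEMENT email (#PCDATA)>
<!ELEMENT url EMPTY>
<!ELEMENT link EMPTY>
"""


def parse_elements(dtd):
    """Map each declared element to the distinct child names in its content model."""
    children = {}
    for name, model in re.findall(r"<!ELEMENT\s+(\S+)\s+([^>]*)>", dtd):
        model = model.strip()
        if model in ("EMPTY", "ANY"):
            children[name] = []  # no declared children (ANY is not expanded here)
            continue
        names = re.findall(r"[A-Za-z_][\w.-]*", model.replace("#PCDATA", ""))
        children[name] = list(dict.fromkeys(names))  # de-duplicate, keep order
    return children


def advertisements(dtd):
    """One root-to-element path per reachable element, in depth-first order."""
    children = parse_elements(dtd)
    root = next(iter(children))  # assumption: the first declaration is the root
    ads = []

    def walk(elem, path):
        path = path + [elem]
        ads.append("/" + "/".join(path))
        for child in children.get(elem, []):
            if child not in path:  # recursion is left to the (X)+ advertisement forms
                walk(child, path)

    walk(root, [])
    return ads


for ad in advertisements(DTD):
    print(ad)
```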

Page 7

Language Model

Subscriptions: XPath expressions (XPEs)
Absolute, e.g., /c/d/*/e
Relative, e.g., c/d/*/e
With descendant operators, e.g., c//e/*/c

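How the three subscription forms differ can be sketched by matching a linear XPE against a single root-to-leaf element path. Assumptions for this sketch: a relative XPE may start at any depth (as if prefixed with //), * matches exactly one element, and an XPE matches as soon as all of its steps are embedded in the path, whatever follows. The names parse_xpe and matches are hypothetical.

```python
import re


def parse_xpe(xpe):
    """Split a linear XPE into (axis, element) steps, axis '/' (child) or '//' (descendant)."""
    steps = re.findall(r"(//|/)?([^/]+)", xpe)
    # Only a relative XPE's first step lacks an axis; treat it as '//'.
    return [(axis or "//", tag) for axis, tag in steps]


def matches(xpe, path):
    """True if the XPE selects some element on the given element path."""
    steps = parse_xpe(xpe)

    def match_from(i, pos):
        # Place steps[i] at path[pos] ('/') or at any position from pos on ('//').
        if i == len(steps):
            return True
        axis, tag = steps[i]
        candidates = [pos] if axis == "/" else range(pos, len(path))
        for p in candidates:
            if p < len(path) and tag in ("*", path[p]) and match_from(i + 1, p + 1):
                return True
        return False

    return match_from(0, 0)


path = ["a", "c", "d", "x", "e"]
print(matches("/c/d/*/e", path))   # False: an absolute XPE must start at the root
print(matches("c/d/*/e", path))    # True: a relative XPE may start at any depth
print(matches("c//e/*/c", ["a", "c", "x", "e", "y", "c"]))  # True
```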

Page 8

Advertisement-based Routing

[Figure: broker overlay with paths labeled P(A) and P(S)]

Broker receiving a subscription (S), with advertisements:
A1: /a/b/*/e
A2: /b/e
A3: /a/b/d
A4: /a/b/e
… …
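The routing decision can be sketched for the basic case only, assuming absolute XPEs without // and non-recursive advertisements: a subscription overlaps an advertisement if it can be aligned with a prefix of the advertisement so that every aligned pair agrees under the Adv/Sub overlap table on Page 9 (* overlaps anything, two element names only if equal). The broker then forwards S only toward the last hops of overlapping advertisements. The advertisement table and the broker names B1 to B4 are hypothetical.

```python
def steps(xpe):
    """Element names of an absolute XPE without //."""
    return xpe.strip("/").split("/")


def overlaps(adv, sub):
    """Basic case: can some publication conforming to adv match sub?"""
    a, s = steps(adv), steps(sub)
    if len(s) > len(a):
        return False  # sub needs elements deeper than adv ever produces
    return all(x == "*" or y == "*" or x == y for x, y in zip(a, s))


def next_hops(sub, adv_table):
    """Forward sub only toward the last hops of overlapping advertisements."""
    return sorted({hop for adv, hop in adv_table.items() if overlaps(adv, sub)})


# Hypothetical table: advertisement -> neighbor it was received from.
adv_table = {"/a/b/*/e": "B1", "/b/e": "B2", "/a/b/d": "B3", "/a/b/e": "B4"}
print(next_hops("/a/*/e", adv_table))  # ['B1', 'B4']
print(next_hops("/a/b/d", adv_table))  # ['B1', 'B3']
```

Neighbors whose advertisements cannot produce a match (B2 in both calls) never see the subscription.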

Page 9

Overlapping Algorithms

Basic case (t: element name):
Adv   Sub   Overlap
*     *     Y
*     t     Y
t     *     Y
t     t     Y
t1    t2    N

Other cases, e.g., S = /a /b //c /* /b //e
Next Table for S = /a /b /c /* /b /e: -1 0 0 0 1 2
A = /a /b /c /* /b /c /* /b /e
[Figure: successive alignments of S = /a /b /c /* /b /e against A]
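The Next Table row -1 0 0 0 1 2 shown for S = /a /b /c /* /b /e is consistent with a KMP-style failure table in which * is compatible with any element. Assuming that reading (the slide does not define the table), entry 0 is -1 and entry j is the length of the longest proper prefix of the first j steps that is compatible with the suffix of the same length. The sketch computes it by brute force, since the usual incremental KMP construction relies on equality being transitive, which compatibility with * is not. It only builds the table; it does not reproduce how the paper slides S along A.

```python
def compatible(x, y):
    """Adv/Sub compatibility: * matches any element, two names only if equal."""
    return x == "*" or y == "*" or x == y


def next_table(steps):
    """KMP-style failure table in which equality is replaced by compatible()."""
    table = [-1]
    for j in range(1, len(steps)):
        prefix = steps[:j]
        best = 0
        # Longest k < j such that prefix[:k] is compatible with prefix[j-k:].
        for k in range(j - 1, 0, -1):
            if all(compatible(prefix[i], prefix[j - k + i]) for i in range(k)):
                best = k
                break
        table.append(best)
    return table


S = ["a", "b", "c", "*", "b", "e"]
print(next_table(S))  # [-1, 0, 0, 0, 1, 2]
```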

Page 10

Subscription Tree
Subscriptions are maintained in a hierarchical tree
A child may have more than one parent
Siblings may intersect
If a publication does not match a node, it does not match any of its descendants

[Figure: example subscription tree rooted at ROOT, with XPE nodes such as /a and /b/d/a; one edge is labeled "pointer"]
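A minimal sketch of such a subscription tree, assuming that a parent covers each of its children and using the simple covering test for absolute XPEs without // (element-wise, * covers any element, a shorter XPE covers the XPEs it is a prefix of). A new XPE is attached below every most specific covering node, so a child can end up with more than one parent, and matching never expands a node the publication path fails. The class and method names are hypothetical; the sketch assumes more general XPEs are inserted first and omits deletion and relinking existing nodes under a newly inserted one.

```python
def steps(xpe):
    """Element names of an absolute XPE without //."""
    return xpe.strip("/").split("/")


def covers(s1, s2):
    """s1 covers s2: element-wise at least as general on a prefix of s2."""
    a, b = steps(s1), steps(s2)
    return len(a) <= len(b) and all(x == "*" or x == y for x, y in zip(a, b))


def matches(xpe, path):
    """The XPE must match a prefix of the publication's element path."""
    s = steps(xpe)
    return len(s) <= len(path) and all(x == "*" or x == y for x, y in zip(s, path))


class SubscriptionTree:
    """Hypothetical covering tree: each parent covers its children; ROOT covers all."""

    def __init__(self):
        self.children = {"ROOT": []}

    def insert(self, xpe):
        """Attach xpe under every most specific covering node; return its parents."""
        if xpe in self.children:
            return []  # already stored
        parents, frontier = [], ["ROOT"]
        while frontier:
            node = frontier.pop()
            covering = [c for c in self.children[node] if covers(c, xpe)]
            if covering:
                frontier.extend(covering)   # descend toward more specific coverers
            elif node not in parents:
                parents.append(node)        # no child covers xpe: node is a parent
        self.children[xpe] = []
        for p in parents:
            self.children[p].append(xpe)
        return sorted(parents)

    def match(self, path):
        """Return matching XPEs, never expanding a node the path fails."""
        result, stack = set(), ["ROOT"]
        while stack:
            node = stack.pop()
            for child in self.children[node]:
                if child not in result and matches(child, path):
                    result.add(child)
                    stack.append(child)     # only matching nodes are expanded
        return sorted(result)


t = SubscriptionTree()
t.insert("/a")
t.insert("/*/b")
print(t.insert("/a/b/d"))        # ['/*/b', '/a']: two parents
print(t.match(["a", "b", "d"]))  # ['/*/b', '/a', '/a/b/d']
print(t.match(["c", "b", "d"]))  # ['/*/b']
```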

Page 11

Tree Maintenance

Insert, Delete

Page 12

Covering Algorithms

Similar to the Adv-Sub overlapping algorithms:
Absolute simple XPEs
Relative simple XPEs
XPEs with the // operator

S1    S2    Cover
*     *     Y
*     t     Y
t     *     N
t     t     Y
t1    t2    N

e.g., S2 = /a /a /* //c /e /c /d and S1 = /* /a //e /c
[Figure: step-by-step alignment of S1 against S2]
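For XPEs with the // operator, one sufficient test, assumed here rather than taken from the paper, is to look for an embedding of S1 into S2: each S1 step is mapped to a later S2 step whose element it covers per the S1/S2 table above; a / step in S1 must land on the very next S2 step, and that step must itself be a / step; a // step may land on any later S2 step. If such an embedding exists, every path that matches S2 also matches S1, so S1 covers S2; the test can miss some covering pairs. Relative XPEs are treated as starting with //, and the function names are hypothetical.

```python
import re


def parse_xpe(xpe):
    """(axis, element) steps; a relative XPE is treated as starting with '//'."""
    return [(axis or "//", tag) for axis, tag in re.findall(r"(//|/)?([^/]+)", xpe)]


def covers(s1, s2):
    """Sufficient test: True if S1 embeds into S2, so every match of S2 matches S1."""
    a, b = parse_xpe(s1), parse_xpe(s2)

    def embed(i, prev):
        # Place a[i], given that a[i-1] was placed on b[prev] (prev = -1 is the root).
        if i == len(a):
            return True
        axis, tag = a[i]
        candidates = [prev + 1] if axis == "/" else range(prev + 1, len(b))
        for p in candidates:
            if p >= len(b):
                continue
            b_axis, b_tag = b[p]
            if axis == "/" and b_axis != "/":
                continue  # S2 only guarantees a descendant here, not a child
            if tag != "*" and tag != b_tag:
                continue  # covering table: t covers only t, * covers anything
            if embed(i + 1, p):
                return True
        return False

    return embed(0, -1)


S1, S2 = "/*/a//e/c", "/a/a/*//c/e/c/d"
print(covers(S1, S2))  # True
print(covers(S2, S1))  # False
```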

Page 13

Merging Rules

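Rules
XPEs with one difference (e.g., element, operator)
e.g., S1 = /a/*/c/d and S2 = /a/*/c/e are merged into S = /a/*/c/*
XPEs with different sub-XPEs
e.g., S1 = … … XPE1 … … and S2 = … … XPE2 … … are merged into S = … … // … …
Merge degree
[Figure: paths P(S1) and P(S2) of the original subscriptions and path P(S) of the merged subscription]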
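The one-difference rule can be sketched for linear XPEs: if two XPEs of equal length differ in exactly one step, and only in its element, that step becomes *; if they differ only in its operator, the merged step uses //, which covers /. The merged XPE covers both inputs but may also match publications that neither input matches, so it can introduce false positives. This is a sketch under assumptions: the function names are hypothetical, XPEs are written without spaces, a relative XPE would come back with a leading // (treated as equivalent here), and the sub-XPE rule is not covered.

```python
import re


def parse_xpe(xpe):
    """(axis, element) steps; a relative XPE is treated as starting with '//'."""
    return [(axis or "//", tag) for axis, tag in re.findall(r"(//|/)?([^/]+)", xpe)]


def render(steps):
    return "".join(axis + tag for axis, tag in steps)


def merge_one_difference(s1, s2):
    """Merge two XPEs that differ in exactly one step, or return None."""
    a, b = parse_xpe(s1), parse_xpe(s2)
    if len(a) != len(b):
        return None
    diffs = [i for i, (x, y) in enumerate(zip(a, b)) if x != y]
    if len(diffs) != 1:
        return None
    i = diffs[0]
    (axis1, tag1), (axis2, tag2) = a[i], b[i]
    if axis1 == axis2:
        merged = (axis1, "*")       # element difference: wildcard
    elif tag1 == tag2:
        merged = ("//", tag1)       # operator difference: '//' covers '/'
    else:
        return None                 # both element and operator differ
    return render(a[:i] + [merged] + a[i + 1:])


print(merge_one_difference("/a/*/c/d", "/a/*/c/e"))  # /a/*/c/*
print(merge_one_difference("/a/b/c", "/a//b/c"))     # /a//b/c
print(merge_one_difference("/a/b/c", "/x/b/d"))      # None
```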

Page 14

Evaluation

Setup
Implemented in C++
Overlay with 127 content-based routers
Cluster (each node: 1.86 GHz, 4 GB) vs. PlanetLab
Workloads generated from two DTDs: NITF and PSD

Metrics
Number of subscriptions per router
Network traffic
XPE processing time
Notification delay

Page 15

Routing Table Size

[Figure: routing table size (# of XPath queries, 0 to 100,000) vs. number of XPath queries (0 to 100,000); series: No Covering (Set A and B), 50% Covering (Set A), 90% Covering (Set B)]

Page 16

Routing Table Size

[Figure: routing table size (0 to 50,000) vs. number of subscriptions (0 to 100,000); series: Covering (Set B), Perfect Merging (Set B), Imperfect Merging (Set B)]

Page 17

Network Traffic

Method                  Network Traffic   Delay (ms)
No-Adv-No-Cov           654,871           97.82
No-Adv-With-Cov         572,890           20.74
With-Adv-No-Cov         398,810           98.09
With-Adv-With-Cov       326,796           20.89
With-Adv-With-CovPM     254,900           16.78
With-Adv-With-CovIPM    257,567           12.24
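The relative savings behind these numbers can be computed directly from the table, taking No-Adv-No-Cov as the baseline; traffic is in the slide's own units. This is plain arithmetic on the published figures, not an additional measurement.

```python
BASELINE = "No-Adv-No-Cov"

# (network traffic, delay in ms) as listed in the table above
results = {
    "No-Adv-No-Cov": (654_871, 97.82),
    "No-Adv-With-Cov": (572_890, 20.74),
    "With-Adv-No-Cov": (398_810, 98.09),
    "With-Adv-With-Cov": (326_796, 20.89),
    "With-Adv-With-CovPM": (254_900, 16.78),
    "With-Adv-With-CovIPM": (257_567, 12.24),
}


def reduction(base, value):
    """Percentage reduction of value relative to base (negative means an increase)."""
    return 100.0 * (base - value) / base


base_traffic, base_delay = results[BASELINE]
for method, (traffic, delay) in results.items():
    print(f"{method:22s} traffic reduction {reduction(base_traffic, traffic):5.1f}%  "
          f"delay reduction {reduction(base_delay, delay):5.1f}%")
```

On these figures, the covering-based variants cut delay by between about 78.6% and 87.5% relative to the baseline, and advertisements combined with covering and merging cut traffic by about 61%.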

Page 18

Process Time

Page 19

Notification Delay (PSD)

Page 20

Notification Delay (NITF)

Page 21

Related Work
Locating data sources in large distributed systems [Galanis et al. 2003]
DHT-based approach; data summary
Query aggregation for scalable data dissemination [Chan et al. 2002]
Equivalence between the original query set and the aggregated set
ONYX [Diao et al. 2004]
Delivers parts of XML documents; shares common prefixes among queries using an NFA
XTreeNet [Fenner et al. 2005]
Unifies the pub/sub model and the query/response model; avoids repeated matching at each hop

Page 22

Conclusions
Investigate advertisement-based routing for XML data dissemination networks
Propose a novel data structure to maintain covering & merging relationships among XPEs
Perform an experimental evaluation on a 127-broker overlay to demonstrate the approach
Reduce the routing table size by up to 90%
Improve routing latency by roughly 85%

Future work
Extend to tree patterns
Share common prefixes among XPEs in the overlapping and covering algorithms

Page 23

Q & A


Middleware Systems Research Group, University of Toronto, www.msrg.eecg.toronto.edu

Page 24

Process Time

[Figure: time (ms, 0 to 140) vs. number of subscriptions (500 to 5,000)]

Page 25

Notification Delay (NITF)

Page 26

Notification Delay (PSD)

[Figure: notification delay (ms, 0 to 16) vs. number of hops (2 to 6)]

Page 27

False Positives

[Figure: false positives (%, 0 to 8) vs. imperfect degree (0 to 0.2)]

Page 28

Conclusions
Investigate advertisement-based routing for XML data dissemination networks
Present algorithms to determine the covering relations among arbitrary XPEs
Propose a novel data structure to maintain covering & merging relationships among XPEs
Explore rules to merge similar XPEs in order to further reduce the routing table size
Perform an experimental evaluation on a 127-broker overlay to demonstrate the approach
Reduce the routing table size by up to 90%
Improve routing latency by roughly 85%