Distributed Structural and Value XML Filtering
Iris Miliaraki and Manolis Koubarakis
Department of Informatics and Telecommunications
National and Kapodistrian University of Athens
*The paper will be presented at the “4th ACM International Conference on Distributed Event-Based Systems (DEBS 2010)”, Cambridge, UK.
9th Hellenic Data Management Symposium, Ayia Napa, Cyprus
Outline
• XML Filtering scenario
• Background: DHTs
• Structural matching
• Value matching
• Experiments
• Sum up and future work
XML Filtering scenario
[Figure: XML Filtering system with Subscriber and Publisher nodes; query language label: XPath/XQuery?]
Related systems (centralized and distributed)
• YFilter
• XTrie
• FiST
• Index-Filter
• ONYX
• Gong et al. [ICDE05]
• XPush
• Parallel/Hierarchical XTrie
• Snoeren [SOSP 2001]
• Miliaraki [WWW 2008]
Background: DHTs
• Structured overlay networks
• Solve the item location problem in a distributed and dynamic network of N nodes (in O(log N) hops): let x be some data item. Find x!
• Distributed version of the hash table data structure: id = Hash(K)
• Main operations:
– Put: given a key (for a data item), map the key onto a node
– Get: find the location of a data item with
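The Put and Get operations can be illustrated with a minimal single-process sketch. It is not the overlay used in this work: the class name `ToyDHT`, the 32-bit identifier space, and the Chord-style successor rule (a key is stored on the first node whose identifier is greater than or equal to the key's identifier) are all assumptions made for illustration, and the lookup here is a local binary search rather than O(log N) network hops.

```python
import bisect
import hashlib


def hash_id(key, bits=32):
    """Map a string key to an identifier in [0, 2**bits): id = Hash(K)."""
    digest = hashlib.sha1(key.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % (2 ** bits)


class ToyDHT:
    """Hypothetical in-memory stand-in for a DHT (no networking, no churn)."""

    def __init__(self, node_names):
        # Place every node on the identifier ring by hashing its name.
        self.ring = sorted((hash_id(name), name) for name in node_names)
        self.ids = [node_id for node_id, _ in self.ring]
        self.storage = {name: {} for name in node_names}

    def responsible_node(self, key):
        # Assumed successor rule: first node id >= key id, wrapping around.
        idx = bisect.bisect_left(self.ids, hash_id(key))
        return self.ring[idx % len(self.ring)][1]

    def put(self, key, item):
        """Map the key onto a node and store the item there."""
        node = self.responsible_node(key)
        self.storage[node][key] = item
        return node

    def get(self, key):
        """Return the node responsible for the key and the stored item, if any."""
        node = self.responsible_node(key)
        return node, self.storage[node].get(key)
```

Because both operations hash the key the same way, a Get for a key always lands on the node that an earlier Put for that key selected.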
Huge increase of NFA states!
Destroy sharing of path expressions!
Bottom-up evaluation
• Common rule in relational query optimization: apply selections as early as possible
• Works well for relational query processing
• pFist [Kwon et al. 2005]
A lot of effort is spent evaluating predicates while the structure may not be matched
Step-by-step evaluation
• XPath queries consist of distinct steps
• Each step contains one or more value-based predicates
• Perform value matching with structural matching in a stepwise manner
• YFilter – Inline [Diao et al. 2003]: processes predicates when an NFA state is reached
Effort is spent evaluating predicates while the structure may not be fully matched
Top-down evaluation
• Check predicates after structural matching
• YFilter – Selection-Postponed [Diao et al. 2003]: performs predicate evaluation after the execution of the NFA
• VA-RoXSum [Vagena et al. 2007]: focus on message aggregation
Depending on predicate selectivity, the number of false positives may be very large
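The ordering used by top-down evaluation, and where its false positives come from, can be shown with a small sketch. This is a deliberate simplification and not the YFilter implementation: queries are flat linear paths rather than NFA states, and the names `structural_match`, `top_down_filter`, and the dictionary layout of a query are hypothetical.

```python
def structural_match(query_path, element_path):
    """True if the paths have equal length and agree step by step.

    A '*' step in the query matches any single element name.
    """
    if len(query_path) != len(element_path):
        return False
    return all(q == "*" or q == e for q, e in zip(query_path, element_path))


def top_down_filter(queries, element_path, attributes):
    """Check structure first, then evaluate equality predicates on survivors."""
    # Phase 1: structural matching only.
    candidates = [q for q in queries if structural_match(q["path"], element_path)]
    # Phase 2: value matching, postponed until the structure has matched.
    matches = [
        q for q in candidates
        if all(attributes.get(name) == value
               for name, value in q["predicates"].items())
    ]
    return candidates, matches
```

Every query that appears in `candidates` but not in `matches` is a structural false positive: it passed phase 1 and was only discarded once its predicates were checked. With selective predicates that set can be large, which is the cost noted above.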
Moving on to details
• Parse XML document and generate a set of candidate
• At each step of the execution, part of the NFA is revealed
• Applies to equality predicates
IDEA: Use a compact summary of predicate information to stop NFA execution (prune) if we can deduce that no match can be found
[Figure: example NFA with states 0–8 and transitions labeled bib, phdthesis, *, conference, author, article, cite]
TD with pruning – Details
• Each peer is responsible for storing many NFA fragments
• Each peer keeps one Bloom filter, the value filter (VF), which summarizes the predicates of the queries indexed in the relevant NFA fragments
• Assuming a peer p and a state st, for each query q whose NFA accepting path contains st, we insert one predicate of q in the VF of p
TD with pruning – Main idea cont.
• Predicates are inserted as a whole in VFs using their string representation:
– element[@attr=value] → element + attr + value
– element[text()=value] → element + text + value
• VFs are updated during query indexing
• Since we traverse the NFA accepting path of a query to index it, all relevant VFs will be updated
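A value filter of this kind can be sketched as a small Bloom filter keyed by predicate strings. The sketch below is illustrative only: the filter size, the number of hash functions, the use of salted SHA-1, the literal "+" separator in the keys, and the names `ValueFilter`, `attr_predicate_key`, `text_predicate_key`, and `can_prune` are all assumptions, not details taken from the presented system.

```python
import hashlib


def attr_predicate_key(element, attr, value):
    """String form of element[@attr=value]: element + attr + value."""
    return f"{element}+{attr}+{value}"


def text_predicate_key(element, value):
    """String form of element[text()=value]: element + text + value."""
    return f"{element}+text+{value}"


class ValueFilter:
    """Hypothetical Bloom filter summarizing the predicates indexed at a peer."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits)  # one byte per bit, for readability

    def _positions(self, key):
        # Derive num_hashes positions by salting SHA-1 with an index.
        for i in range(self.num_hashes):
            digest = hashlib.sha1(f"{i}:{key}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key):
        """Called during query indexing for a predicate of the query."""
        for pos in self._positions(key):
            self.bits[pos] = 1

    def might_contain(self, key):
        """False means definitely absent; True may be a false positive."""
        return all(self.bits[pos] for pos in self._positions(key))


def can_prune(value_filter, document_keys):
    """Stop NFA execution if no value in the document can satisfy a predicate."""
    return not any(value_filter.might_contain(k) for k in document_keys)
```

A Bloom filter never reports an inserted key as absent, so pruning on a negative answer cannot discard a real match; a false positive only means that NFA execution continues when it could have stopped.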
• Described methods to combine both structural and value XML filtering in a distributed environment
• Experimental evaluation of our methods
Future work
• Potential improvements for the SBS method
• More sophisticated methods for selectivity estimation
• Range predicates
• Textual predicates
Questions?
PlanetLab (2 predicates per query)
Performance improvement
Structural vs. value matching (small query set)
Structural vs. value matching (large query set)
<?xml version="1.0" encoding="UTF-8"?>
<statuses>
  <status>
    <created_at>Tue Apr 07 22:52:51 +0000 2009</created_at>
    <id>1472669360</id>
    <text>At least I can get your humor through tweets. RT @abdur: I don't mean this in a bad way, but genetically speaking your a cul-de-sac.</text>
    <source><a href="http://www.tweetdeck.com/">TweetDeck</a></source>
    <truncated>false</truncated>
    <in_reply_to_status_id></in_reply_to_status_id>
    <in_reply_to_user_id></in_reply_to_user_id>
    <favorited>false</favorited>
    <in_reply_to_screen_name></in_reply_to_screen_name>
    <user>
      <id>1401881</id>
      <name>Doug Williams</name>
      <screen_name>dougw</screen_name>
      <location>San Francisco, CA</location>
      <description>Twitter API Support. Internet, greed, users, dougw and opportunities are my passions.</description>
      <profile_image_url>http://s3.amazonaws.com/twitter_production/profile_images/59648642/avatar_normal.png</profile_image_url>
      <url>http://www.igudo.com</url>
      <protected>false</protected>
      <followers_count>1027</followers_count>
      <profile_background_color>9ae4e8</profile_background_color>
      <profile_text_color>000000</profile_text_color>
      <profile_link_color>0000ff</profile_link_color>
      <profile_sidebar_fill_color>e0ff92</profile_sidebar_fill_color>
      <profile_sidebar_border_color>87bc44</profile_sidebar_border_color>
      <friends_count>293</friends_count>
      <created_at>Sun Mar 18 06:42:26 +0000 2007</created_at>
      <favourites_count>0</favourites_count>
      <utc_offset>-18000</utc_offset>
      <time_zone>Eastern Time (US &amp; Canada)</time_zone>
      <profile_background_image_url>http://s3.amazonaws.com/twitter_production/profile_background_images/2752608/twitter_bg_grass.jpg</profile_background_image_url>
      <profile_background_tile>false</profile_background_tile>
      <statuses_count>3390</statuses_count>
      <notifications>false</notifications>
      <following>false</following>
      <verified>true</verified>
    </user>
    <geo/>
  </status>
  ... truncated ...
</statuses>