One Torus to Rule Them All: Multi-dimensional Queries in P2P Systems
Prasanna Ganesan, Beverly Yang, Hector Garcia-Molina
Stanford University
Motivation
P2P Systems
– Dynamic set of nodes
– Dynamic data distributed over nodes
– No centralization
– Traditionally: simple point queries over data
New P2P applications desire multi-dimensional queries
– Photo sharing: find all labels for photos in a geographical area
– Multi-player games: find all objects in an area
Problem
Devise P2P system to store relation R with:
Route query/insert/delete to relevant nodes
– No centralization!
– Replicated directory too expensive!
– Trade-off between cost of query and cost of maintaining routing structure
Roadmap
Two Different Approaches
– SCRAP: Space-filling curves with Range partitioning
2. Range partition 1-d data
– Preserves locality!
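The SCRAP pipeline (map each multi-dimensional point onto a 1-d space-filling curve, then range-partition the 1-d keys across nodes) can be sketched as below. This is a minimal illustration, assuming a 2-d Z-order (Morton) curve; the slides do not say which curve is used, and `z_order`, `owner`, `BITS`, and the `boundaries` layout are hypothetical names chosen for this example.

```python
from bisect import bisect_right

BITS = 16  # hypothetical: bits per coordinate


def z_order(x: int, y: int, bits: int = BITS) -> int:
    """Interleave the bits of x and y into one Z-order (Morton) key.

    Bit i of x lands at position 2*i and bit i of y at position 2*i + 1.
    """
    key = 0
    for i in range(bits):
        key |= ((x >> i) & 1) << (2 * i)
        key |= ((y >> i) & 1) << (2 * i + 1)
    return key


def owner(key: int, boundaries: list) -> int:
    """Return the index of the node whose 1-d range holds `key`.

    `boundaries` is sorted; boundaries[i] is the first key owned by
    node i + 1, so node 0 owns [0, boundaries[0]) and the last node
    owns everything from boundaries[-1] upward.
    """
    return bisect_right(boundaries, key)


# Four nodes range-partition the 64 keys of an 8 x 8 grid (3 bits per axis).
boundaries = [16, 32, 48]
for point in [(3, 5), (2, 4), (6, 1)]:
    key = z_order(*point, bits=3)
    print(point, "-> key", key, "-> node", owner(key, boundaries))
```

Here (3, 5) and (2, 4), which are diagonal neighbours in the grid, get keys 39 and 36 and both land on node 2. Because nearby cells tend to receive nearby Z-keys, range-partitioning the keys keeps many spatially close points on one node; the curve does have jumps, so locality is only partly preserved.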
Load Balancing with SCRAP
Adjust partitions when unbalanced
– Adjust boundary with neighbor
– Migrate to new area
– Guarantees: all loads within a factor of 4.24; constant amortized number of tuple movements per insert/delete [GBGM04]
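The neighbor-adjust idea can be illustrated with a simplified step in which an overloaded node shifts its shared boundary with a lighter neighbor. This is a hypothetical sketch (`adjust_with_neighbor` and the `ratio` trigger are assumptions), not the [GBGM04] algorithm: it omits the migrate step and does not use that paper's thresholds, so it carries none of the 4.24 guarantee.

```python
def adjust_with_neighbor(nodes: list, i: int, ratio: float = 2.0) -> bool:
    """Simplified neighbor-adjust step (hypothetical, not the [GBGM04] rule).

    nodes[i] is the sorted list of 1-d keys stored at node i, and adjacent
    list entries own adjacent key ranges. If node i holds more than `ratio`
    times as many keys as its lighter adjacent neighbor, the shared boundary
    moves so that the two nodes split their combined keys evenly.
    Returns True if any keys moved.
    """
    candidates = [j for j in (i - 1, i + 1) if 0 <= j < len(nodes)]
    if not candidates:
        return False  # a lone node has no neighbor to shed load to
    j = min(candidates, key=lambda k: len(nodes[k]))
    if len(nodes[i]) <= ratio * len(nodes[j]):
        return False  # not unbalanced enough to act
    left, right = (i, j) if i < j else (j, i)
    merged = nodes[left] + nodes[right]  # still sorted: left's keys < right's
    half = len(merged) // 2
    nodes[left], nodes[right] = merged[:half], merged[half:]
    return True


nodes = [list(range(1, 11)), [11], [12, 13]]
adjust_with_neighbor(nodes, 0)
print([len(n) for n in nodes])  # prints [5, 6, 2]
```

Moving keys only across the shared boundary keeps every node's 1-d range contiguous. When both neighbors are also heavy, a boundary shift cannot help, which is where the slide's second option (migrating a lightly loaded node to the hot area) comes in.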
Query Routing
Map multi-dim query to set of 1-d ranges
Send each 1-d range query to relevant node
Use a linked list to interconnect nodes
– Add “skip” pointers for fast routing
– O(log n) messages for routing/node joins/leaves
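Both routing steps can be sketched together: decomposing a 2-d rectangle query into runs of consecutive Z-keys, and greedy forwarding along a list whose nodes keep skip pointers. This is a minimal illustration under stated assumptions: `query_to_ranges` enumerates every cell in the rectangle (fine for a toy grid, not for 32-bit coordinates), and the deterministic power-of-two pointer layout in `route_hops` is a hypothetical choice in the spirit of skip lists, not necessarily the structure the authors use.

```python
def z_order(x: int, y: int, bits: int) -> int:
    """Interleave bits: bit i of x goes to 2*i, bit i of y to 2*i + 1."""
    key = 0
    for i in range(bits):
        key |= ((x >> i) & 1) << (2 * i)
        key |= ((y >> i) & 1) << (2 * i + 1)
    return key


def query_to_ranges(x_lo, x_hi, y_lo, y_hi, bits):
    """Map the inclusive rectangle [x_lo, x_hi] x [y_lo, y_hi] to maximal
    runs of consecutive Z-keys, returned as inclusive (lo, hi) pairs.
    Brute force: enumerates every cell, so only suitable for small grids."""
    keys = sorted(z_order(x, y, bits)
                  for x in range(x_lo, x_hi + 1)
                  for y in range(y_lo, y_hi + 1))
    runs = []
    for k in keys:
        if runs and runs[-1][1] == k - 1:
            runs[-1][1] = k          # extend the current run
        else:
            runs.append([k, k])      # start a new run
    return [tuple(r) for r in runs]


def route_hops(src: int, dst: int) -> int:
    """Messages needed to walk from list position src to dst (src <= dst)
    when node i keeps skip pointers to nodes i + 1, i + 2, i + 4 and so on.
    Each hop greedily takes the longest pointer that does not overshoot."""
    hops, cur = 0, src
    while cur != dst:
        step = 1
        while cur + 2 * step <= dst:
            step *= 2
        cur += step
        hops += 1
    return hops


print(query_to_ranges(0, 1, 0, 1, bits=3))  # one 2 x 2 block: [(0, 3)]
print(query_to_ranges(1, 2, 0, 0, bits=3))  # straddles a curve jump: [(1, 1), (4, 4)]
print(route_hops(0, 1023))                  # 10
```

With pointers at distances 1, 2, 4 and so on, each greedy hop more than halves the remaining distance, so a route between any two of n nodes takes at most about log2 n messages, in line with the O(log n) bound on the slide.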
Roadmap
Two Different Approaches
– SCRAP: Space-filling curves with Range partitioning
Node leave
– Sibling takes over
– If no sibling, find someone in sibling sub-tree
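The node-leave rule above operates on a kd-tree whose leaves are the nodes' rectangular zones. The sketch below is a hypothetical illustration: `Zone`, `join`, `leave`, and `leaves` are invented names, and `join` simply halves a zone along its longest side, since the slides here do not show how a split point is chosen. When the sibling is not a single leaf, `leave` descends into the sibling sub-tree to a pair of leaf siblings; one absorbs the other's zone and the freed node takes over the departing node's zone, which is one reasonable reading of "find someone in sibling sub-tree".

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass(eq=False)
class Zone:
    """A kd-tree node: a leaf is a peer's rectangle [lo, hi]; an internal
    node has exactly two children that split its rectangle."""
    lo: Tuple[float, ...]
    hi: Tuple[float, ...]
    owner: Optional[str] = None                  # peer id, set only on leaves
    left: Optional["Zone"] = None
    right: Optional["Zone"] = None
    parent: Optional["Zone"] = field(default=None, repr=False)  # avoid repr cycles

    def is_leaf(self) -> bool:
        return self.left is None


def join(leaf: Zone, new_owner: str) -> Zone:
    """Halve `leaf` along its longest side; the new peer gets the upper half."""
    d = max(range(len(leaf.lo)), key=lambda k: leaf.hi[k] - leaf.lo[k])
    mid = (leaf.lo[d] + leaf.hi[d]) / 2
    hi_left = leaf.hi[:d] + (mid,) + leaf.hi[d + 1:]
    lo_right = leaf.lo[:d] + (mid,) + leaf.lo[d + 1:]
    leaf.left = Zone(leaf.lo, hi_left, owner=leaf.owner, parent=leaf)
    leaf.right = Zone(lo_right, leaf.hi, owner=new_owner, parent=leaf)
    leaf.owner = None
    return leaf.right


def leave(leaf: Zone) -> None:
    """Remove the peer owning `leaf` from the tree."""
    parent = leaf.parent
    if parent is None:
        raise ValueError("the last peer cannot leave")
    sib = parent.right if parent.left is leaf else parent.left
    if sib.is_leaf():
        # Sibling takes over: the parent's whole rectangle becomes its zone.
        parent.owner, parent.left, parent.right = sib.owner, None, None
        return
    # No leaf sibling: descend the sibling sub-tree to a pair of leaf siblings.
    z = sib
    while not (z.left.is_leaf() and z.right.is_leaf()):
        z = z.left if not z.left.is_leaf() else z.right
    helper = z.right.owner
    z.owner, z.left, z.right = z.left.owner, None, None  # the pair merges
    leaf.owner = helper  # the freed peer takes over the departed zone


def leaves(z: Zone) -> List[Zone]:
    # Leaves in left-to-right order.
    return [z] if z.is_leaf() else leaves(z.left) + leaves(z.right)


root = Zone((0.0, 0.0), (1.0, 1.0), owner="A")
join(root, "B")
join(root.right, "C")
leave(root.left)  # A leaves; its sibling sub-tree {B, C} supplies C
print([(z.owner, z.lo, z.hi) for z in leaves(root)])
```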
MURK Properties
Locality: Rectangulation better than SCRAP
Load Balance
– OK if data distribution is static
– ??? if data distribution is dynamic
Routing Queries
Build a grid of nodes
– Adjacent nodes link to each other
– Analogous to linked list in higher dimensions
Problems
– Node managing large space has many neighbors!
– Routing on grid is too slow. Need skip pointers
– Not easy to add skip pointers (see paper)
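The grid of adjacent nodes, and why routing on it is slow, can be illustrated by treating zones as rectangles, linking zones that share a face, and counting the fewest messages between two zones with a breadth-first search. This is an illustrative sketch: `adjacent`, `containing`, `min_hops`, and `cells` are hypothetical names, and BFS computes the shortest path over the neighbor graph. That is a lower bound on what any forwarding rule restricted to neighbor links can achieve, not a distributed protocol.

```python
from collections import deque


def adjacent(a, b) -> bool:
    """True if rectangles a = (lo, hi) and b share a face: they touch in
    exactly one dimension and overlap with positive length in all others."""
    touching = 0
    for k in range(len(a[0])):
        if a[1][k] == b[0][k] or b[1][k] == a[0][k]:
            touching += 1
        elif min(a[1][k], b[1][k]) <= max(a[0][k], b[0][k]):
            return False  # no positive-length overlap in dimension k
    return touching == 1


def containing(zones, point) -> int:
    """Index of the first zone whose closed rectangle contains point."""
    for i, (lo, hi) in enumerate(zones):
        if all(lo[k] <= point[k] <= hi[k] for k in range(len(point))):
            return i
    raise ValueError("point lies outside every zone")


def min_hops(zones, src_point, dst_point) -> int:
    """Fewest messages from src's zone to dst's zone when every node links
    only to zones adjacent to its own (breadth-first search)."""
    nbrs = [[j for j in range(len(zones)) if j != i and adjacent(zones[i], zones[j])]
            for i in range(len(zones))]
    src, dst = containing(zones, src_point), containing(zones, dst_point)
    dist = {src: 0}
    queue = deque([src])
    while queue:
        i = queue.popleft()
        if i == dst:
            return dist[i]
        for j in nbrs[i]:
            if j not in dist:
                dist[j] = dist[i] + 1
                queue.append(j)
    raise ValueError("zones are not connected")


def cells(xs, ys):
    """Unit-square zones with lower-left corners at every (x, y)."""
    return [((float(x), float(y)), (float(x + 1), float(y + 1))) for x in xs for y in ys]


grid = cells(range(4), range(4))               # 16 equal zones
print(min_hops(grid, (0.5, 0.5), (3.5, 3.5)))  # corner to corner: 6
big = ((0.0, 0.0), (2.0, 4.0))                 # one node owns a large zone
mixed = [big] + cells(range(2, 4), range(4))
print(sum(adjacent(big, z) for z in mixed[1:]))  # 4 neighbors
```

On a k × k grid of equal zones, corner-to-corner routing takes 2(k - 1) hops, so the cost grows roughly as the square root of n in two dimensions rather than as log n. The large 2 × 4 zone must keep a link to every small zone along its border, which is the "many neighbors" problem.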
Evaluation
Datasets
– Uniform: 32-bit ints drawn at random
– Skewed: photo co-ordinates from a real collection
Nodes join one at a time to build network
Evaluate
– Locality: #nodes that process a query
– Routing: #messages transmitted per query
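The uniform dataset and the locality metric can be sketched as below. This is an assumption-laden illustration: `uniform_points`, `rect_locality`, and `range_locality` are hypothetical helpers, rectangular zones are taken as half-open `[lo, hi)`, and for a 1-d range partition `boundaries[i]` is assumed to be the first key owned by node `i + 1`.

```python
import random
from bisect import bisect_right


def uniform_points(n, dims, seed=0):
    """Uniform dataset: each coordinate is a random 32-bit unsigned int."""
    rng = random.Random(seed)
    return [tuple(rng.getrandbits(32) for _ in range(dims)) for _ in range(n)]


def rect_locality(zones, q_lo, q_hi) -> int:
    """Locality of a rectangle query over half-open zones [lo, hi):
    the number of zones meeting the closed box [q_lo, q_hi]. Ideal is 1."""
    return sum(
        all(lo[k] <= q_hi[k] and q_lo[k] < hi[k] for k in range(len(q_lo)))
        for lo, hi in zones
    )


def range_locality(boundaries, ranges) -> int:
    """Locality of a query given as inclusive 1-d key ranges over a range
    partition: boundaries[i] is the first key owned by node i + 1."""
    touched = set()
    for lo, hi in ranges:
        touched.update(range(bisect_right(boundaries, lo),
                             bisect_right(boundaries, hi) + 1))
    return len(touched)


grid = [((float(x), float(y)), (float(x + 1), float(y + 1)))
        for x in range(4) for y in range(4)]
print(rect_locality(grid, (0.5, 0.5), (1.5, 0.5)))       # 2 nodes
print(range_locality([16, 32, 48], [(1, 1), (4, 4)]))    # 1 node
print(range_locality([16, 32, 48], [(10, 20)]))          # 2 nodes
```

A query whose 1-d ranges are scattered along the curve can touch more nodes than its spatial extent suggests, which is the effect the locality plots measure.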
Dimensionality vs. Locality
[Figure: locality vs. dimensionality; #nodes = 8192; ideal locality = 1]
Selectivity vs. Locality
Network Size vs. Routing Cost
MURK
– Much better locality than SCRAP
– Routing still OK
– Load balance is more complex and heuristic
More Information
Load Balancing, Range Queries and P2P
– “Online Balancing of Range-Partitioned Data with Applications to P2P Systems”, VLDB 2004
– “Distributed Balanced Tables: Not Making a Hash of it All”, Stanford Tech Report
– Google: “Prasanna Ganesan”
More work on P2P