SkipNet: A Scalable Overlay Network with Practical Locality Properties
Nick Harvey, Mike Jones, Stefan Saroiu, Marvin Theimer, Alec Wolman
Microsoft Research, University of Washington
Overlay Networks
- Overlays have achieved several goals:
  - Scalable and decentralized infrastructure
  - Uniform and random load and data distribution
- But, at the price of data controllability:
  - Data may be stored far from its users
  - Data may be stored outside its domain
  - Local accesses leave the local organization
- Basic trade-off: data controllability vs. data uniformity
- SkipNet:
  - Traditional overlay functionality
  - Provides an abstraction to control this trade-off
Key Locality Properties and Abstraction
- In practice, two properties are important:
  - Content Locality: ability to explicitly place data
    - Placement on a single node or on a set of nodes
  - Path Locality: ability to guarantee that local traffic remains local
- One abstraction is important: CLB (constrained load balancing)
  - SkipNet abstraction to control the trade-off
  - Multiple DHT scopes within one single overlay
Practical Requirements
- Data Controllability:
  - Organizations want control over their own data
  - Even if local data is globally available
- Manageability:
  - Data control allows for data administration, provisioning and manageability
  - Data center/cluster = constrained set of nodes
  - CLB ensures load balance across the data center
Key property: two address spaces
  1. Name ID space: nodes are sorted by their names (e.g. DNS names)
  2. Numeric ID space: nodes are randomly distributed
- Combining both spaces achieves:
  - Content + Path locality
  - Other uses could emerge: range queries [AS ’03]
- Scalable peer-to-peer overlay network
  - O(log N) routing performance in both spaces
  - O(log N) routing state per node
SkipNet Ring
- Pointers at level h skip over 2^h nodes
- Nodes are ordered by names
[Figure: ring of nodes ordered by name (A, D, M, O, T, V, X, Z), repeated over several slides as pointers are added level by level]
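The pointer rule on this slide can be sketched in a few lines of Python. This is only an illustration under a simplifying assumption: the ring is idealized, so the level-h pointer of a node lands exactly 2^h positions clockwise in the name-sorted ring. The function name build_ring_pointers is hypothetical and not taken from the talk.

```python
from typing import Dict, List


def build_ring_pointers(names: List[str]) -> Dict[str, List[str]]:
    """Return, for every node, its clockwise pointer at each level h.

    Assumption: an idealized ring where the level-h pointer lands
    exactly 2**h positions further along the name-sorted ring.
    """
    ring = sorted(names)  # nodes are ordered by name
    n = len(ring)
    table: Dict[str, List[str]] = {}
    for i, name in enumerate(ring):
        pointers = []
        h = 0
        while 2 ** h < n:  # stop once a pointer would wrap all the way around
            pointers.append(ring[(i + 2 ** h) % n])
            h += 1
        table[name] = pointers
    return table


pointers = build_ring_pointers(["A", "D", "M", "O", "T", "V", "X", "Z"])
print(pointers["A"])  # ['D', 'M', 'T']
```

With eight nodes each node keeps three pointers, which matches the O(log N) routing state per node claimed earlier in the deck.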
SkipNet Global View
[Figure: nodes A, D, M, O, T, V, X and Z arranged in a hierarchy of rings. Level L = 0 is the Root Ring with all nodes; L = 1 splits them into Ring 0 and Ring 1; L = 2 into Rings 00, 01, 10 and 11; L = 3 into Rings 000 through 111.]
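The ring hierarchy in this view can be derived from numeric ID prefixes, as in the sketch below. The 3-bit numeric IDs in NUMERIC_ID are hypothetical values picked for illustration (the transcript does not list them), and rings_at_level is an illustrative helper name, not part of SkipNet.

```python
from collections import defaultdict
from typing import Dict, List

# Hypothetical 3-bit numeric IDs; assumed for illustration only.
NUMERIC_ID = {
    "A": "000", "T": "001", "M": "010", "X": "011",
    "D": "100", "V": "101", "Z": "110", "O": "111",
}


def rings_at_level(numeric_id: Dict[str, str], level: int) -> Dict[str, List[str]]:
    """Group nodes into rings by the first `level` bits of their numeric ID.

    Inside each ring the nodes stay sorted by name.
    """
    rings: Dict[str, List[str]] = defaultdict(list)
    for name in sorted(numeric_id):
        rings[numeric_id[name][:level]].append(name)
    return dict(rings)


for level in range(4):
    print(level, rings_at_level(NUMERIC_ID, level))
```

Level 0 yields the single root ring, and each further level splits every ring in two according to the next ID bit, until every node sits alone in its own ring at level 3.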
Two Address Spaces
- SkipNet can route efficiently in both address spaces:
  - Name ID space (e.g. DNS names)
  - Numeric ID space
Routing by Name ID
- Example: route from A to V
- Simple rule: forward the message to the node that is closest to the destination, without going too far
[Figure: animation over the ring hierarchy (levels L = 0 to L = 3, Root Ring down to Rings 000 through 111), highlighting node A's routing table and then node T's routing table as the message is forwarded]
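The simple rule above can be sketched as a greedy loop. This is a minimal illustration, not the authors' implementation: the numeric IDs are hypothetical, only clockwise pointers are modeled, and the source name is assumed to sort before the destination name so the route does not wrap around the ring. The names clockwise_pointers and route_by_name are made up for this example.

```python
from typing import Dict, List

# Hypothetical 3-bit numeric IDs; assumed for illustration only.
NUMERIC_ID = {
    "A": "000", "T": "001", "M": "010", "X": "011",
    "D": "100", "V": "101", "Z": "110", "O": "111",
}


def clockwise_pointers(node: str, numeric_id: Dict[str, str]) -> List[str]:
    """Level-h pointer: the next node by name among nodes sharing h ID bits."""
    pointers = []
    for level in range(len(numeric_id[node]) + 1):
        prefix = numeric_id[node][:level]
        ring = sorted(n for n in numeric_id if numeric_id[n].startswith(prefix))
        pointers.append(ring[(ring.index(node) + 1) % len(ring)])
    return pointers


def route_by_name(src: str, dest: str, numeric_id: Dict[str, str]) -> List[str]:
    """Greedy name-ID routing; assumes src < dest and that dest is a node."""
    if dest not in numeric_id or not src < dest:
        raise ValueError("sketch only handles src < dest with an existing dest")
    path = [src]
    current = src
    while current != dest:
        # Keep pointers that move forward without passing the destination.
        candidates = [p for p in clockwise_pointers(current, numeric_id)
                      if current < p <= dest]
        current = max(candidates)  # closest to dest, without going too far
        path.append(current)
    return path


print(route_by_name("A", "V", NUMERIC_ID))  # ['A', 'T', 'V']
```

The level-0 pointer always qualifies as a candidate while the destination has not been reached, so the loop makes progress on every hop.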
Routing by Numeric ID
- Provides the basic DHT primitive
- To store file “Foo.c”:
  - Hash(“Foo.c”) yields a random numeric ID
  - Find highest ring matching that numeric ID
  - Store file on node in that ring
- O(log N) routing efficiency
DHT Example
- Store file “Foo.c” from node A
- Hash(“Foo.c”) = 101…
- Route from A to V in numeric space
[Figure: ring hierarchy at levels L = 0 to L = 2, showing Ring 00, Ring 01, Ring 10 and Ring 11]
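The numeric-ID lookup in this example can be sketched as climbing the ring hierarchy one bit at a time. The code below is an illustration under stated assumptions: the numeric IDs are hypothetical, the hash prefix 101 is taken from the slide rather than computed, and the sketch stops at the current node when no higher ring matches. The name route_by_numeric_id is made up for this example.

```python
from typing import Dict, List

# Hypothetical 3-bit numeric IDs; assumed for illustration only.
NUMERIC_ID = {
    "A": "000", "T": "001", "M": "010", "X": "011",
    "D": "100", "V": "101", "Z": "110", "O": "111",
}


def route_by_numeric_id(start: str, target: str,
                        numeric_id: Dict[str, str]) -> List[str]:
    """Walk each ring clockwise until a node matches one more target bit."""
    path = [start]
    current = start
    matched = 0  # leading target bits shared by every node in the current ring
    while matched < len(target):
        ring = sorted(n for n in numeric_id
                      if numeric_id[n].startswith(target[:matched]))
        first = ring.index(current)
        walk = [ring[(first + k) % len(ring)] for k in range(len(ring))]
        hit = next((k for k, n in enumerate(walk)
                    if numeric_id[n].startswith(target[:matched + 1])), None)
        if hit is None:
            break  # no higher ring matches; stay on the current node
        path.extend(walk[1:hit + 1])  # record every hop of the walk
        current = walk[hit]
        matched += 1
    return path


print(route_by_numeric_id("A", "101", NUMERIC_ID))  # ['A', 'D', 'V']
```

Starting at A, the walk around the root ring reaches D, the first node whose ID begins with 1; D already matches 10, and one hop in Ring 10 reaches V, whose ID matches all of 101.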
Multiple DHTs with differing scopes using a single SkipNet structure
- A result of the ability to route in both address spaces
- Divide data object names into 2 parts using the ‘!’ special character: CLB Domain and CLB Suffix
- To read file “com.microsoft!skipnet.html”:
  - Route by name ID to “com.microsoft”
  - Route by numeric ID to Hash(“skipnet.html”) within the “com.microsoft” constraint
[Figure: SkipNet ring with segments labeled com.sun, edu.ucb, gov.irs and com.microsoft, with skipnet.html placed in the com.microsoft segment]
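The two-part name can be illustrated with a small placement function. This is a sketch, not SkipNet's algorithm: it splits the object name at the ‘!’ character as described above, but then picks a node inside the domain with a SHA-256 hash taken modulo the number of candidate nodes, which stands in for routing by numeric ID within the constraint. The node names and the helpers split_clb_name and clb_place are hypothetical.

```python
import hashlib
from typing import List, Tuple

# Hypothetical node names in reverse-DNS form; assumed for illustration.
NODES = [
    "com.microsoft.research.n1", "com.microsoft.research.n2",
    "com.microsoft.sales.n1", "com.sun.n1", "edu.ucb.n1", "gov.irs.n1",
]


def split_clb_name(object_name: str) -> Tuple[str, str]:
    """Split an object name into (CLB domain, CLB suffix) at the first '!'."""
    domain, _, suffix = object_name.partition("!")
    return domain, suffix


def clb_place(object_name: str, node_names: List[str]) -> str:
    """Pick a node for the object, restricted to nodes inside the CLB domain.

    Stand-in for numeric-ID routing: hash the suffix and index into the
    name-sorted list of nodes that belong to the domain.
    """
    domain, suffix = split_clb_name(object_name)
    candidates = sorted(n for n in node_names
                        if n == domain or n.startswith(domain + "."))
    if not candidates:
        raise ValueError(f"no nodes inside domain {domain!r}")
    digest = hashlib.sha256(suffix.encode("utf-8")).digest()
    return candidates[int.from_bytes(digest[:8], "big") % len(candidates)]


print(split_clb_name("com.microsoft!skipnet.html"))
print(clb_place("com.microsoft!skipnet.html", NODES))
```

Whatever node the hash selects, it is always one of the com.microsoft nodes, which is the point of the constraint: load is spread by hashing, but only across the chosen domain.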
SkipNet Path Locality
- Organizations correspond to contiguous SkipNet segments
  - Internal routing by NameID remains internal
- Nodes have left / right pointers
[Figure: SkipNet ring with segments labeled com.sun, edu.ucb, gov.irs, com.microsoft and com.microsoft.research]
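Why internal routing stays internal follows from name ordering: every name that sorts between two names sharing a prefix shares that prefix too. The sketch below checks this on a few hypothetical node names; nodes_between is an illustrative helper, and it assumes the source sorts before the destination.

```python
from typing import List

# Hypothetical node names; assumed for illustration only.
NAMES = [
    "com.microsoft.a", "com.microsoft.research.b", "com.microsoft.z",
    "com.sun.x", "edu.ucb.y", "gov.irs.z",
]


def nodes_between(names: List[str], src: str, dest: str) -> List[str]:
    """Nodes a name-ID route from src to dest may visit (src sorts before dest)."""
    ring = sorted(names)
    return ring[ring.index(src):ring.index(dest) + 1]


segment = nodes_between(NAMES, "com.microsoft.a", "com.microsoft.z")
print(segment)
print(all(name.startswith("com.microsoft") for name in segment))  # True
```

Because name-ID routing never goes past the destination, a message between two com.microsoft nodes can only touch nodes in this contiguous segment.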
Fault Tolerance
- Many failures occur along organizational boundaries:
  - Gateway/firewall failure, BGP misconfiguration, physical network cut, …
- SkipNet handles organizational disconnect gracefully
  - Results in two well-connected, partitioned SkipNets
  - Efficient remerging algorithms
- Independent node failures
  - Same resiliency as systems such as Chord and Pastry
  - Similar approach to repair (Leaf Set)
Primary Security Benefit & Weakness
Benefit (+): SkipNet + name access control mechanism:
  - Content locality ensures that content stays within organization
  - Path locality prevents:
    - malicious forwarders
    - analysis of internal traffic
    - external tampering
Weakness (-): Easier to target organizations:
  - Someone creates one million nodes with name prefixes microsofa.com and microsort.com
  - Most traffic to/from Microsoft will go through a microsofa / microsort intermediate node
Pastry and Chord implementation
- Uses Mercator and GT-ITM network topologies
- Experimentally evaluated:
  - Name ID routing performance
  - Tolerance to organizational disconnect
  - Numeric ID routing performance
  - Effectiveness of network proximity optimizations
  - Effectiveness of CLB routing optimizations
Routing by Name ID Performance
- Benefits come at no extra cost
Disconnected org size = 15% of all nodes
Conclusions
- SkipNet:
  - Traditional overlay functionality
  - Explicit control of data placement
  - Constrained load balancing
- Content + Path Locality are basic ingredients to:
  - Data controllability
  - Manageability
  - Security
  - Data availability
  - Performance