Tapestry Architecture and Status
UCB ROC / Sahara Retreat, January 2002
Ben Y. Zhao
ROC/Sahara Retreats, 1/2002 2
Why Tapestry
  Today's Internet
    Route failures not uncommon
      BGP too slow to recover, redundant routes unexploited
    IPv4 constrains deployment of new protocols
      IP multicast, security protocols (DDoS traceback), …
    Wide-area applications straining existing systems
      Scalable management of large scale resources
  Our goals
    Wide-area scalable network overlay
    Highly fault-tolerant routing / location
    Introspective / self-tuning platform
    Support application-specific protocols
    Efficient (b/w, latency) data delivery
      Pass on wide-area solutions to application layer
What is Tapestry?
  A prototype of a decentralized, fault-tolerant, adaptive overlay infrastructure (Zhao, Kubiatowicz, Joseph et al. 2000)
    Network substrate of OceanStore
  Routing: suffix-based hypercube
    Similar to Plaxton, Rajaraman, Richa (SPAA 97)
  Decentralized location:
    Virtual hierarchy per object with cached location references
  Dynamic algorithms using local information
  Core API:
    publishObject(ObjectID)
    routeMsgToObject(ObjectID)
    routeMsgToNode(NodeID)
Routing and Location
  Namespace (nodes and objects)
    160 bits in length, 2^80 names before name collision
    Each object has its own hierarchy rooted at Root
      f(ObjectID) = RootID, via a dynamic mapping function
  Suffix routing from A to B
    At hth hop, arrive at nearest node hop(h) such that:
      hop(h) shares suffix with B of length h digits
    Example: 5324 routes to 0629 via
      5324 → 2349 → 1429 → 7629 → 0629
  Object location:
    Root responsible for storing object's location
    Publish / search both route incrementally to root
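The suffix-routing rule above can be tried on the slide's own example. The sketch below is a simplification, not the Tapestry implementation: it assumes a global view of all nodes, and the `POSITION` table is a made-up 1-D coordinate standing in for network distance. A real node would consult only its local routing table.

```python
from typing import Dict, List


def shared_suffix_len(a: str, b: str) -> int:
    """Number of trailing digits that a and b have in common."""
    n = 0
    while n < min(len(a), len(b)) and a[-1 - n] == b[-1 - n]:
        n += 1
    return n


def suffix_route(src: str, dst: str, position: Dict[str, float]) -> List[str]:
    """Route from src to dst, resolving one more suffix digit per hop.

    `position` (hypothetical) maps every node ID, including dst, to a
    coordinate used to pick the "nearest" qualifying node at each hop.
    """
    path = [src]
    current = src
    for h in range(1, len(dst) + 1):
        if shared_suffix_len(current, dst) >= h:
            continue  # the current node already matches h digits
        candidates = [n for n in position if shared_suffix_len(n, dst) >= h]
        current = min(candidates,
                      key=lambda n: abs(position[n] - position[current]))
        path.append(current)
    return path


# Hypothetical coordinates, chosen so that the slide's example path emerges.
POSITION = {"5324": 0.0, "2349": 1.0, "1429": 2.0, "7629": 3.0, "0629": 4.0}
print(" -> ".join(suffix_route("5324", "0629", POSITION)))
```

With these coordinates, each hop lands on the closest node that matches one more trailing digit of 0629, which reproduces the path listed on the slide.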
Tapestry Mesh
  Incremental suffix-based routing
  [Figure: mesh of nodes linked by edges labeled 1 to 4. NodeIDs shown: 0x43FE, 0x13FE, 0xABFE, 0x1290, 0x239E, 0x73FE, 0x423E, 0x79FE, 0x23FE, 0x73FF, 0x555E, 0x035E, 0x44FE, 0x9990, 0xF990, 0x993E, 0x04FE]
Object Location
  Randomization and Locality
Fault-tolerant Routing
  Strategy:
    Detect failures via soft-state probe packets
    Route around problematic hop via backup pointers
  Handling:
    3 forward pointers per outgoing route (2 backups)
    2nd chance algorithm for intermittent failures
    Upgrade backup pointers and replace
  Protocols:
    First Reachable Link Selection (FRLS)
    Proactive Duplicate Packet Routing
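One way to read the handling policy is sketched below. The details are assumptions, not taken from the Tapestry code: a pointer that misses one probe is marked suspect but kept (its second chance), a second consecutive miss evicts it so the next backup is upgraded, and `replace` refills the free slot. The class and method names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class RouteEntry:
    """One outgoing route: a primary next hop followed by up to two backups."""
    hops: List[str]                              # hops[0] is the primary
    suspect: Set[str] = field(default_factory=set)

    def probe_result(self, hop: str, ok: bool) -> None:
        """Apply one soft-state probe result with a second-chance policy."""
        if hop not in self.hops:
            return
        if ok:
            self.suspect.discard(hop)            # recovered: forget the miss
        elif hop not in self.suspect:
            self.suspect.add(hop)                # first miss: second chance
        else:
            self.suspect.discard(hop)            # second miss in a row: evict,
            self.hops.remove(hop)                # which upgrades the next backup

    def replace(self, new_hop: str) -> None:
        """Refill a free slot, keeping at most three forward pointers."""
        if new_hop not in self.hops and len(self.hops) < 3:
            self.hops.append(new_hop)

    def next_hop(self) -> str:
        """Prefer the first pointer that is not currently under suspicion."""
        if not self.hops:
            raise LookupError("no forward pointers left")
        for hop in self.hops:
            if hop not in self.suspect:
                return hop
        return self.hops[0]


entry = RouteEntry(hops=["A", "B", "C"])
entry.probe_result("A", ok=False)    # intermittent failure: A kept but avoided
print(entry.next_hop())
entry.probe_result("A", ok=False)    # repeated failure: A evicted, B upgraded
entry.replace("D")
print(entry.hops)
```

Keeping a suspect pointer for one more probe round avoids discarding a route over a single lost packet. The route still fails over to a backup immediately.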
Talk Outline
Tapestry overview
Architecture
Evaluation
Brocade
Conclude
Architecture Background
  OceanStore implementation
    Java with asynchronous I/O
    Event-based, stage driven architecture (Sandstorm, M. Welsh)
  [Figure: layer stack, listed in slide order: Operating System, Java Virtual Machine, Sandstorm (async I/O, event arch.), Tapestry, OceanStore, Applications]
Key Stages
  StaticTClient / Federation
    Uses config files to bootstrap initial Tapestry
  DynamicTClient
    Integrates new nodes into static Tapestry
  Router
    Primary handler of routing and location
  Patchwork
    Introspective monitoring and fault-detection
  [Figure labels: Sandstorm (async I/O, event arch.), OceanStore, Applications, Router, Static TClient, Dynamic TClient, Patchwork]
Static TClient
Federation used as rendezvous point
Pair-wise pings to generate route tables
Federation used as global barrier to begin
  [Figure: federation node F and static nodes S]
  1. Si says hello to F
  2. F informs group of Si
  3. Nodes do pair-wise pings
  4. Nodes signal readiness
  5. Barrier reached at F, signals start
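The five bootstrap steps can be walked through in a single process. This is a minimal sketch under assumptions: `Federation`, `hello`, `signal_ready`, and `pairwise_pings` are hypothetical names, and the `ping` function is a stand-in for a real network measurement.

```python
import itertools
from typing import Callable, Dict, List, Set, Tuple


class Federation:
    """Rendezvous point and start barrier for a fixed-size static group."""

    def __init__(self, expected: int) -> None:
        self.expected = expected          # group size, e.g. from a config file
        self.members: List[str] = []
        self.ready: Set[str] = set()

    def hello(self, node: str) -> List[str]:
        """Steps 1-2: register a node and return the group known so far."""
        if node not in self.members:
            self.members.append(node)
        return list(self.members)

    def signal_ready(self, node: str) -> bool:
        """Steps 4-5: record readiness; True once every node has reported."""
        self.ready.add(node)
        return len(self.ready) == self.expected


def pairwise_pings(nodes: List[str],
                   ping: Callable[[str, str], float]
                   ) -> Dict[Tuple[str, str], float]:
    """Step 3: measure latency for every unordered pair of nodes."""
    return {(a, b): ping(a, b) for a, b in itertools.combinations(nodes, 2)}


fed = Federation(expected=3)
for name in ["S1", "S2", "S3"]:
    fed.hello(name)

# Hypothetical ping: latency grows with the difference in node index.
latencies = pairwise_pings(fed.members,
                           lambda a, b: float(abs(int(a[1:]) - int(b[1:]))))
started = [fed.signal_ready(name) for name in fed.members]
print(latencies)
print(started)   # only the last signal releases the barrier
```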
Dynamic TClient
  Node Integration
  1. Hill-climb to find nearest Gateway
  2. Route to surrogate / copy routes
  3. Move relevant objects to new root
  4. Directed multicast notifies nearby nodes
  [Figure labels: G, S, F; Routes Request, Routes Response, Moving Object Pointers, Directed Multicast?]
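Step 1 can be pictured as a greedy descent over measured latency. The sketch below is an illustration under assumptions, not the DynamicTClient code: `neighbors` is a hypothetical adjacency map of existing nodes, and `latency_to` stands in for a ping from the joining node.

```python
from typing import Callable, Dict, List


def hill_climb(start: str,
               neighbors: Dict[str, List[str]],
               latency_to: Callable[[str], float]) -> str:
    """Move to the neighbor closest to the joining node until no
    neighbor improves on the current candidate."""
    current = start
    while True:
        best = min(neighbors.get(current, []), key=latency_to, default=current)
        if latency_to(best) >= latency_to(current):
            return current                # no closer neighbor: stop here
        current = best


# Hypothetical topology and latencies (ms) seen from the joining node.
NEIGHBORS = {"n1": ["n2", "n3"], "n2": ["n1", "n4"], "n3": ["n1"], "n4": ["n2"]}
LATENCY = {"n1": 40.0, "n2": 25.0, "n3": 30.0, "n4": 10.0}

print(hill_climb("n1", NEIGHBORS, LATENCY.__getitem__))
```

Because latency strictly decreases at each move, the loop terminates. Like any greedy search, it can stop at a local minimum rather than the globally nearest gateway.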
Routing / Location
Router class
  Maintains:
    RoutingTable: [ ][ ] of RouteEntries
    ObjectPointers:
      Hash(Guid) → PublishInfo
      Hash(Guid) → LastHop
  Handles:
    Object publication / unpublication / mobile objects
    Route / location message handling
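The role of the object pointers in publication and location can be sketched as follows. This is a simplified model with assumptions: the paths toward the root are supplied by hand rather than computed by the routing table, only the publishing server is stored per GUID (no LastHop), and the function names are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

# node ID -> {object GUID -> ID of the server that published the object}
ObjectPointers = Dict[str, Dict[str, str]]


def publish(pointers: ObjectPointers, path_to_root: List[str], guid: str) -> None:
    """Leave a location pointer at every hop from the server to the root."""
    server = path_to_root[0]
    for node in path_to_root:
        pointers.setdefault(node, {})[guid] = server


def locate(pointers: ObjectPointers, path_to_root: List[str],
           guid: str) -> Optional[Tuple[str, int]]:
    """Walk toward the root; stop at the first node holding a pointer.

    Returns (server, hops taken before the pointer was found), or None.
    """
    for hops, node in enumerate(path_to_root):
        server = pointers.get(node, {}).get(guid)
        if server is not None:
            return server, hops
    return None


pointers: ObjectPointers = {}
# Hypothetical: the object's root is node 0629 and server 8129 publishes it.
publish(pointers, ["8129", "7629", "0629"], "some-guid")
# A client at 5324 routes toward the same root and meets a pointer early.
print(locate(pointers, ["5324", "2349", "1429", "7629", "0629"], "some-guid"))
```

The client's search ends at 7629, one hop short of the root, because the publish path left a cached pointer there.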
Patchwork
  Fault-handling / introspective stage
    Granulated periodic beacons measure loss and network latency to entries in routing table
    Promote/demote routes in single RouteEntry
  [Figure: Router and network with routes A (marked X), B, C; RouteEntry order A B C becomes B C A]
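A possible shape for the promote/demote step is sketched below. The smoothing weight, the loss threshold, and the names `LinkStats` and `MonitoredEntry` are assumptions made for illustration; the slides do not specify how Patchwork ranks routes.

```python
import math
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class LinkStats:
    latency_ms: Optional[float] = None   # smoothed RTT; None until first reply
    loss: float = 0.0                    # smoothed fraction of lost beacons


class MonitoredEntry:
    ALPHA = 0.5      # hypothetical weight given to the newest sample
    MAX_LOSS = 0.5   # hypothetical loss level at which a route is demoted

    def __init__(self, routes: List[str]) -> None:
        self.routes = list(routes)
        self.stats: Dict[str, LinkStats] = {r: LinkStats() for r in routes}

    def beacon(self, route: str, rtt_ms: Optional[float]) -> None:
        """Record one beacon result; rtt_ms is None when no reply arrived."""
        s = self.stats[route]
        lost = 1.0 if rtt_ms is None else 0.0
        s.loss = (1 - self.ALPHA) * s.loss + self.ALPHA * lost
        if rtt_ms is not None:
            if s.latency_ms is None:
                s.latency_ms = rtt_ms
            else:
                s.latency_ms = ((1 - self.ALPHA) * s.latency_ms
                                + self.ALPHA * rtt_ms)
        self.routes.sort(key=self._rank)  # stable sort keeps ties in place

    def _rank(self, route: str) -> Tuple[bool, float]:
        """Lossy routes sort last; the rest are ordered by latency."""
        s = self.stats[route]
        latency = math.inf if s.latency_ms is None else s.latency_ms
        return (s.loss >= self.MAX_LOSS, latency)


entry = MonitoredEntry(["A", "B", "C"])
entry.beacon("A", 10.0)
entry.beacon("B", 20.0)
entry.beacon("C", 30.0)
entry.beacon("A", None)     # A stops answering and is demoted
print(entry.routes)
```

With these made-up measurements the entry is reordered from A, B, C to B, C, A, the same change the slide's figure depicts.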
Deployment Status
  Object Location
    Publish / unpublish / route to object
    Mobile objects (backtracking unpublish)
    Active deletes, confirmation of non-existence
  General Routing
    Route to node, redundant routes
    Soft-state fault-detection, limited optimization
    Advanced policies for fault recovery
  Dynamic Integration
    Integration w/ limited optimizations
    Best effort fault-resilient integration mechanisms
    Background threads for optimization / refresh
Generalized Results
  Cached object pointers
    Efficient lookup for nearby objects
    Reasonable storage overhead
  Multiple object roots
    Improves availability under attack
    Improves performance and perf. stability
  Reliable packet delivery
    Redundant pointers approximate optimal reachability
    FRLS, a simple fault-tolerant UDP protocol
First Reachable Link Selection
Use periodic UDP packets to gauge link condition
Packets routed to shortest “good” link
Assumes IP cannot correct routing table in time for packet delivery
[Figure: nodes A, B, C, D, E; routes labeled IP and Tapestry; caption "No path exists to dest."]
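The selection rule can be sketched without real sockets. Everything here is an assumption for illustration: probe outcomes are fed in by hand instead of arriving over UDP, and the window size and the number of replies that make a link "good" are invented parameters.

```python
from collections import deque
from typing import Deque, Dict, List, Optional


class LinkMonitor:
    """Tracks the outcome of the most recent probes on each link."""

    def __init__(self, window: int = 4, min_replies: int = 3) -> None:
        self.window = window              # hypothetical: probes remembered
        self.min_replies = min_replies    # hypothetical: replies for "good"
        self.history: Dict[str, Deque[bool]] = {}

    def record_probe(self, link: str, replied: bool) -> None:
        self.history.setdefault(link, deque(maxlen=self.window)).append(replied)

    def is_good(self, link: str) -> bool:
        return sum(self.history.get(link, ())) >= self.min_replies


def first_reachable_link(links_by_length: List[str],
                         monitor: LinkMonitor) -> Optional[str]:
    """Return the shortest link currently judged good, or None."""
    for link in links_by_length:
        if monitor.is_good(link):
            return link
    return None


monitor = LinkMonitor()
for replied in (True, False, False, False):
    monitor.record_probe("A", replied)    # shortest link, but mostly silent
for _ in range(4):
    monitor.record_probe("B", True)
    monitor.record_probe("C", True)

print(first_reachable_link(["A", "B", "C"], monitor))
```

The decision uses only locally observed probe results, so it does not wait for IP routing to converge. That matches the slide's stated assumption that IP cannot correct its routing table in time.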
Some Numbers
  Measurements
    PIII 800, Linux 2.2.18, IBM JDK 1.3
    Simulating 6 nodes (4 staticTC, 1 federation, 1 dynamicTC)
    Publishing / locating ~10 objects
    PublishMsg, RouteMsg: ~0-2 ms
    Integration: ~2600 ms (w/ pings)
  Integration messages:
    Assuming latency data available
    2 x n (routing and objects)
    16M (directed multicast notification) (M 3)
Landmark Routing on P2P
  Brocade
    Exploit non-uniformity
    Minimize wide-area routing hops / bandwidth
  Secondary overlay on top of Tapestry
    Select super-nodes by admin. domain
      Divide network into cover sets
    Super-nodes form secondary Tapestry
      Advertise cover set as local objects
    Routing (A→B) uses brocade to route directly into B's local network
Brocade Mechanisms
  Selective utilization
    Nodes cache local cover set
    Only utilize brocade if dest. not in cache
  Forwarding messages to supernodes
    1. Super-node does IP-snooping
    2. Direct: cover set caches supernode
  Inter-domain routing: A→B
    1. A→SN(A) via IP
    2. SN(A) finds SN(B) via Tapestry location
    3. SN(B)→B via Tapestry/Chord/Pastry/CAN
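The three inter-domain steps, together with the cover-set shortcut, can be traced with a small model. This is a sketch under assumptions: a plain dictionary stands in for Tapestry object location on the secondary overlay, each supernode is listed in its own cover set, and the `Brocade` class and its method names are hypothetical.

```python
from typing import Dict, List, Set


class Brocade:
    def __init__(self, cover_sets: Dict[str, Set[str]]) -> None:
        # supernode ID -> nodes in its cover set (one set per admin. domain)
        self.cover_sets = cover_sets
        # Stand-in for location on the secondary overlay: every supernode
        # advertises the members of its cover set as local objects.
        self.advertised: Dict[str, str] = {
            node: sn for sn, nodes in cover_sets.items() for node in nodes
        }

    def supernode_of(self, node: str) -> str:
        return self.advertised[node]

    def route(self, src: str, dst: str) -> List[str]:
        """Return the overlay-level legs of a message from src to dst."""
        sn_src = self.supernode_of(src)
        if dst in self.cover_sets[sn_src]:
            return [src, dst]              # dest. in local cover set: no brocade
        sn_dst = self.supernode_of(dst)    # step 2: SN(A) finds SN(B)
        path = [src, sn_src, sn_dst, dst]  # steps 1 and 3 are the outer legs
        # Drop a leg when an endpoint is itself the supernode.
        return [n for i, n in enumerate(path) if i == 0 or n != path[i - 1]]


overlay = Brocade({"SN1": {"SN1", "a1", "a2"}, "SN2": {"SN2", "b1"}})
print(overlay.route("a1", "b1"))   # crosses domains via both supernodes
print(overlay.route("a1", "a2"))   # stays inside the local cover set
```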
Brocade Routing RDP
  [Figure: "Brocade Latency RDP 3:1". x-axis: Interdomain-adjusted Latency on Optimal Route (2 to 26); y-axis: Relative Delay Penalty (0 to 5); series: Original Tapestry, IP Snooping Brocade, Directed Brocade]
  Local cover set cache on; interdomain:intradomain = 3:1
  Packet simulator, Transit-stub, 4096 T nodes, 16 SuperN
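Relative Delay Penalty is the ratio of the latency of the overlay route to the latency of the optimal route. The sketch below shows the arithmetic under one assumption: the 3:1 ratio is read as an interdomain hop costing three times an intradomain hop. The hop lists are invented and are not simulator data.

```python
from typing import Sequence

INTER_COST = 3.0   # assumed cost of one interdomain hop
INTRA_COST = 1.0   # assumed cost of one intradomain hop


def path_latency(hops: Sequence[str]) -> float:
    """Sum hop costs; each hop is tagged "inter" or "intra"."""
    return sum(INTER_COST if hop == "inter" else INTRA_COST for hop in hops)


def rdp(overlay_hops: Sequence[str], optimal_hops: Sequence[str]) -> float:
    """Relative Delay Penalty of an overlay route versus the optimal route."""
    return path_latency(overlay_hops) / path_latency(optimal_hops)


# Invented routes: the overlay path crosses a domain boundary twice.
optimal = ["intra", "inter", "intra"]
overlay = ["intra", "inter", "inter", "intra", "intra", "intra"]
print(rdp(overlay, optimal))
```

An RDP of 1 means the overlay route is as fast as the optimal route. Larger values measure the extra delay caused by detours, and wide-area detours weigh most heavily.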
Brocade Bandwidth Usage
  [Figure: "Brocade Aggregate Bandwidth Usage". x-axis: Physical Hops in Optimal Route (2 to 14); y-axis: Approx. BW per Message (0 to 60); series: Original Tapestry, IP Snooping Brocade, Directed Brocade]
  Local cover set cache on
  B/W unit: (sizeof (Msg) * Hops)
Ongoing / Future Work
  Fill in full functionality
    Fault-handling policies, introspection, self-repair
  More realistic experiments
    Artificial topologies on SOSS simulator
    Larger scale dynamic integration experiments
  Code development
    External deployment / Code release
      Sprint programmable routers
      Academic networks
    Introspective measurement platform
    Implementing applications (Bayeux, Brocade … )
For More Information
  Tapestry and related projects (and these slides):
    http://www.cs.berkeley.edu/~ravenben/tapestry
  OceanStore:
    http://oceanstore.cs.berkeley.edu
  Related papers:
    http://oceanstore.cs.berkeley.edu/publications
    http://www.cs.berkeley.edu/~ravenben/publications