Jan 12, 2016
Relation of: Chord, CAN, INS, CFS, DCS
[Comparison figure]
Scalability:
- Chord: O(log n)
- CAN: (d/4)(n^(1/d))
- INS (early binding): polynomial in the number of attributes in name specifiers
Data stored:
- CFS: file blocks
- DCS: sensor data
Outline of Chord
Chord protocol
- What is Chord?
- Why use Chord?
- Basic Chord protocol
- Extended Chord protocol
- Simulation results
Chord system
- APIs provided by the Chord system
Outline of CFS
CFS system structure
Chord layer
- Server selection
DHash layer
- Replication
- Caching
- Load balance
- Quota
- Update and delete
Experimental results
Part One: Chord
What is Chord?
[Diagram: an application on node n asks "Value of key?"; Chord routes the lookup to the node n' that stores the key's value, which returns "Value of key!"]
Why use Chord?
- High efficiency: resolves a lookup via O(log N) messages to other nodes
- Fault tolerance
- Scalability: each node maintains information about only O(log N) other nodes
Base Chord protocol
- What is the identifier circle?
- Where to store a key?
- How to look up a key quickly?
- Node joins (departures)
The base Chord protocol
What is the identifier circle?
- Node's IP address --SHA-1--> node's m-bit identifier
- Key (a string) --SHA-1--> key's m-bit identifier
- SHA-1 (Secure Hash Algorithm): takes a message of less than 2^64 bits in length and produces a 160-bit message digest
[Diagram: Key1, Key2, and node n each mapped to a position on the identifier circle.]
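The identifier mapping above can be sketched as follows (the function name and string encoding are illustrative; the slide only specifies that SHA-1 hashes IP addresses and key strings to m-bit identifiers):

```python
import hashlib

M = 160  # SHA-1 produces a 160-bit digest, so identifiers here are m = 160 bits


def chord_id(value: str, m: int = M) -> int:
    """Map a node's IP address or a key string to an m-bit Chord identifier."""
    digest = hashlib.sha1(value.encode()).digest()
    return int.from_bytes(digest, "big") % (2 ** m)
```

Both nodes and keys end up on the same circle of 2^m positions, which is what lets successor(k) be defined uniformly over them.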
Where is a key stored?
- Successor(k): the first node encountered when moving clockwise from k on the identifier circle
- Key k is stored at successor(k)
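With all node identifiers known, successor(k) is just the first node id at or clockwise after k, wrapping around the circle. A minimal sketch, assuming a global view of the ring (the real protocol resolves this with messages, not a global list):

```python
import bisect


def successor(k: int, node_ids: list) -> int:
    """First node identifier encountered moving clockwise from k (inclusive)."""
    ids = sorted(node_ids)
    i = bisect.bisect_left(ids, k)
    return ids[i % len(ids)]  # wrap around past the highest identifier
```

For example, with nodes {0, 1, 3} on a 3-bit ring, key 2 is stored at node 3, and key 5 wraps around to node 0.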
How to look up a key quickly?
[Diagram: a naive lookup walks the ring node by node (n1 → n2 → n3 → …), following successor pointers until it reaches successor(key), which returns the value. This takes O(N) hops.]
How to look up a key quickly? (cont.)
We need a finger table.
How to look up a key quickly? (cont.)
[Diagram: finger table for node 1.]
How to look up a key quickly? (cont.)
Example: node 3 looks up key 1 (m = 3, nodes {0, 1, 3}):
- Node 3: am I predecessor(1)? No
- Node 3: try finger entry 3, and find node 0
- Node 3: send the lookup to node 0
- Node 0: am I predecessor(1)? Yes: successor(1) is node 1
- Node 0: return the answer to node 3 (RPC)
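The lookup walk above can be sketched end to end, assuming the 3-bit example ring with nodes {0, 1, 3} and a global view used only to build each finger table (names like `closest_preceding_finger` follow common Chord descriptions):

```python
M = 3            # identifier bits
RING = 2 ** M    # ring size: 8 positions


def between(x, a, b):
    """True if identifier x lies in the open ring interval (a, b), wrapping mod 2^M."""
    return a < x < b if a < b else x > a or x < b


class Node:
    def __init__(self, id, all_ids):
        self.id = id
        ids = sorted(all_ids)
        succ = lambda k: next((i for i in ids if i >= k % RING), ids[0])
        # finger[i] points to successor(id + 2^i)
        self.finger = [succ(id + 2 ** i) for i in range(M)]
        self.successor = self.finger[0]


def closest_preceding_finger(n, key):
    for f in reversed(n.finger):      # scan highest finger first
        if between(f, n.id, key):
            return f
    return n.id


def find_successor(nodes, start_id, key):
    n = nodes[start_id]
    # forward the query until key falls in (n, successor(n)]
    while not (between(key, n.id, n.successor) or key == n.successor):
        n = nodes[closest_preceding_finger(n, key)]
    return n.successor


ids = [0, 1, 3]
nodes = {i: Node(i, ids) for i in ids}
```

Starting at node 3, a query for key 1 is forwarded to node 0 via the finger table, and node 0 answers that successor(1) is node 1, matching the slide's walkthrough.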
Node joins
Two challenges:
- Each node's finger table must be correctly filled
- Each key k must be stored at node successor(k)
Three operations:
- Initialize the predecessor and fingers of the new node n
- Update the fingers and predecessors of existing nodes
- Copy to n all keys for which node n has become their successor
Initialize the predecessor and fingers of node n
- Idea: ask an existing node for the information needed
[Diagram: the new node joins the ring.]
Update the fingers and predecessors of existing nodes
Observation: when node n joins the network, n will become the ith finger of a node p when both conditions hold:
- p precedes n by at least 2^(i-1)
- the ith finger of node p succeeds n
Solution: find predecessor(n - 2^(i-1)) for all 1 <= i <= m; check whether n is its ith finger, and whether n is its predecessor's ith finger.
Update the fingers and predecessors of existing nodes (cont.)
Example (m = 3, node 6 joins the ring {0, 1, 3}):
- i = 1: predecessor(6 - 2^0) = 3 → node 3 updates its 1st finger to 6; predecessor(3) = 1, no update
- i = 2: predecessor(6 - 2^1) = 3 → node 3 updates its 2nd finger to 6; predecessor(3) = 1, no update
- i = 3: predecessor(6 - 2^2) = 1 → node 1 updates its 3rd finger to 6; predecessor(1) = 0, node 0 also updates; predecessor(0) = 3, no update
Copy to n all keys for which node n has become their successor
- Idea: node n can become the successor only for keys stored by the node immediately following n
[Diagram: keys are transferred from successor(n) to n when n joins.]
Extended Chord protocol
- Concurrent joins
- Failures and replication
Concurrent joins
When multiple nodes with similar identifiers join at the same time, they all tell the same predecessor that they are its successor. How are predecessors and successors updated?
- notify: only the newcomer with the lowest (highest) identifier becomes the predecessor (successor)
- stabilize: periodically check whether new nodes have inserted themselves between a node and its immediate neighbors
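The notify/stabilize pair can be sketched as below; this is a minimal model of the pointer updates only (node objects stand in for remote nodes, and RPCs become direct calls):

```python
class Node:
    """Minimal node state touched by stabilization (not the full protocol)."""
    def __init__(self, id):
        self.id = id
        self.successor = self
        self.predecessor = None


def between(x, a, b):
    """True if identifier x lies in the open ring interval (a, b), wrapping."""
    return a < x < b if a < b else x > a or x < b


def notify(n, candidate):
    """candidate tells n: 'I think I am your predecessor'."""
    if n.predecessor is None or between(candidate.id, n.predecessor.id, n.id):
        n.predecessor = candidate


def stabilize(n):
    """Run periodically by every node: adopt a newcomer that inserted itself
    between n and n.successor, then notify the (possibly new) successor."""
    x = n.successor.predecessor
    if x is not None and between(x.id, n.id, n.successor.id):
        n.successor = x
    notify(n.successor, n)
```

If node b (id 3) joins between a (id 0) and c (id 5), b first notifies c; the next time a stabilizes, it learns about b from c and adopts it as its successor.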
Concurrent joins (cont.)
Example: node b joins between a and c (ring a → c → d):
- t0: Succ(a) = c, Pred(c) = a, Succ(c) = d, Pred(d) = c
- t1: b joins and sets Succ(b) = c; a and c do not yet know about b
- t1+∆t: b notifies c, so Pred(c) = b
- t2: a stabilizes, learns from c that Pred(c) = b, sets Succ(a) = b, and notifies b, so Pred(b) = a; the ring is now a → b → c → d
Failures and replication
Challenges:
- When node n fails, successor(n) must be found
- Successor(n) must have a copy of n's key/value pairs
- n's failure must not disrupt queries in progress while the system re-stabilizes
How to find n's successor after n fails?
- To detect: n's predecessor n2 finds during stabilization that n does not respond
- To recover: n2 looks through its finger table for the first live node n1; n2 asks n1 for successor(n2) and uses the result as its new successor
How to ensure that n's successor has a copy after n fails?
- Each Chord node maintains a list of its r nearest successors, each holding a copy of the node's data
- After node n fails, queries for its keys automatically end up at its successor
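A sketch of the fallback this enables (function name and error handling are illustrative): since each of the r successors holds a replica, a query only needs to find the first live entry in the list.

```python
def first_live_successor(successor_list, is_alive):
    """Return the first live node in a node's successor list. Because every
    listed successor holds a replica, the query still finds the data as long
    as at least one of the r successors survives."""
    for s in successor_list:
        if is_alive(s):
            return s
    raise RuntimeError("all r successors failed at once")
```

Data is lost only if all r successors fail simultaneously, which is made unlikely by choosing r large enough for the expected failure rate.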
How to keep in-progress queries undisrupted?
- Detect failure: before stabilization has completed, node failure can be detected by timing out requests
- Continue query: any node with an identifier close to the failed node's will have similar table entries; such a node can route requests at a slight extra cost in route length
Simulation results
- Path length (the number of nodes traversed by a lookup operation) averages O(log N)
Simulation results (cont.)
- Miss rate due to state inconsistency increases quickly with failure frequency
- Miss rate due to node failures (keys actually lost) stays below it
APIs provided by the Chord system
Part Two: CFS
CFS system structure
- FS layer: interprets blocks as files; presents a file system interface to applications
- DHash layer: stores data blocks reliably
- Chord layer: maintains routing tables to find blocks
Chord layer --- Server selection
Factors to consider:
- distance around the ID ring
- RPC latency
Cost of contacting candidate n_i: d_i + H(n_i) * d̄, where
- d_i is the latency from node n to node n_i
- d̄ is the average latency of all the RPCs that node n has ever issued
- H(n_i) is an estimate of the number of Chord hops that would remain after contacting n_i
[Diagram: node n chooses among candidate next hops n1, n2, n3 on the way to the key.]
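The cost estimate combines the three quantities defined above; a small sketch (function and variable names are illustrative):

```python
def estimated_lookup_latency(d_i, hops_remaining, avg_rpc_latency):
    """Expected total latency of routing via candidate n_i: one RPC to n_i
    (d_i) plus the estimated remaining hops, each at the average RPC cost."""
    return d_i + hops_remaining * avg_rpc_latency


def select_server(candidates, avg_rpc_latency):
    """candidates: list of (node, d_i, H(n_i)) tuples; pick the cheapest."""
    return min(
        candidates,
        key=lambda c: estimated_lookup_latency(c[1], c[2], avg_rpc_latency),
    )[0]
```

The trade-off this captures: a nearby server that barely advances around the ring can lose to a farther server that cuts more remaining hops.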
DHash layer --- Replication
- DHash places a block's replicas at the k servers immediately following successor(block)
- After successor(block) fails, the block is immediately available at the new successor(block)
- Failure independence is provided: close to each other on the ID ring ≠ physically close to each other
DHash layer --- Caching
- How to cache? Servers along the lookup path cache the block
- Cache replacement: LRU
- Cache vs. replication: replication is good for surviving node failures; caching is good for load balancing
[Diagram: servers n1 … n4 on the lookup path cache the block as it is returned.]
DHash layer --- Load balance
- Break file systems into many distributed blocks
- Caching
- A real server can act as multiple virtual servers; a virtual server's ID is derived from hashing both the real server's IP address and the index of the virtual server
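A sketch of the virtual-server ID derivation (the "ip:index" string encoding is an illustrative assumption; the slide only specifies that both the IP address and the index are hashed):

```python
import hashlib


def virtual_server_id(ip: str, index: int, m: int = 160) -> int:
    """Identifier of the index-th virtual server hosted by one physical
    server; each index lands at an independent position on the ring."""
    digest = hashlib.sha1(f"{ip}:{index}".encode()).digest()
    return int.from_bytes(digest, "big") % (2 ** m)
```

Because the positions are effectively random, a powerful machine can run more virtual servers and thus take a proportionally larger share of the ID space.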
DHash layer --- Quota
Why a quota?
- Without one, the total amount of storage an IP address can consume grows linearly with the total number of CFS servers
- A per-IP quota prevents malicious injection of large quantities of data
- Example: if each CFS server limits any one IP address to 0.1% of its storage, an attacker would have to mount the attack from about 1000 machines for it to succeed
DHash layer --- Update and delete
Update:
- Content-hash block: supplied key = SHA-1(block's content)
- Root block: only the publisher, who holds the private key, can change it
Delete:
- No delete; useful for recovering from malicious data insertion (good or not?)
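The content-hash rule makes blocks self-certifying: a client can verify a fetched block against the key it asked for, so no server can substitute forged data. A minimal sketch (function name is illustrative):

```python
import hashlib

def valid_content_block(key: bytes, content: bytes) -> bool:
    """Accept a fetched block only if SHA-1(content) equals the requested key,
    i.e. key = SHA-1(block's content) as CFS requires for content-hash blocks."""
    return hashlib.sha1(content).digest() == key


# Publishing: the key under which a block is stored is its own digest.
key = hashlib.sha1(b"block data").digest()
```

Any tampering with the content changes its digest, so the check fails and the client can retry another replica.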
Experimental results
- Lookup cost: O(log N)
Experimental results (cont.)
- Caching (1000 servers)
Experimental results (cont.)
- Effect of node failures: a lookup fails only when all replicas fail (6 replicas, 1000 blocks, 1000 servers)
Some discussion
- Chord in a sensor network?
- Do you want to use CFS (since there is no delete)?
- Build CFS over CAN and INS?
- Lazy replica copying?