Jan 12, 2016
Relation of: Chord, CAN, INS, CFS, DCS
[Comparison figure]
Scalability:
- Chord: O(log n)
- CAN: (d/4)(n^(1/d))
- INS (early binding): polynomial in the number of attributes in name specifiers
Data stored:
- CFS: file blocks
- DCS: sensor data
Outline of Chord
Chord protocol
- What is Chord?
- Why use Chord?
- Basic Chord protocol
- Extended Chord protocol
- Simulation results
Chord system
- APIs provided by the Chord system
Outline of CFS
CFS system structure
Chord layer
- Server selection
DHash layer
- Replication
- Caching
- Load balance
- Quota
- Update and delete
Experimental results
Part One: Chord
What is Chord?
[Diagram: an application on node n asks "Value of key?"; Chord routes the lookup to the node n' that stores the key's value, which returns "Value of key!"]
Why use Chord?
- High efficiency: resolves a lookup via O(log N) messages to other nodes
- Fault tolerance
- Scalability: each node maintains information about only O(log N) other nodes
Base Chord protocol
- What is the identifier circle?
- Where to store a key?
- How to look up a key quickly?
- Node joins (departures)
The base Chord protocol
What is the identifier circle?
- Node's IP address --SHA-1--> node's m-bit identifier
- Key (a string) --SHA-1--> key's m-bit identifier
- SHA-1 (Secure Hash Algorithm): takes a message of less than 2^64 bits in length and produces a 160-bit message digest
[Diagram: Key1, Key2, and node n each mapped to a position on the identifier circle.]
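The identifier mapping above can be sketched as follows (the function name and string encoding are illustrative; the slide only specifies that SHA-1 hashes IP addresses and key strings to m-bit identifiers):

```python
import hashlib

M = 160  # SHA-1 produces a 160-bit digest, so identifiers here are m = 160 bits


def chord_id(value: str, m: int = M) -> int:
    """Map a node's IP address or a key string to an m-bit Chord identifier."""
    digest = hashlib.sha1(value.encode()).digest()
    return int.from_bytes(digest, "big") % (2 ** m)
```

Both nodes and keys end up on the same circle of 2^m positions, which is what lets successor(k) be defined uniformly over them.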
Where is a key stored?
- Successor(k): the first node encountered when moving clockwise from k on the identifier circle
- Key k is stored at successor(k)
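With all node identifiers known, successor(k) is just the first node id at or clockwise after k, wrapping around the circle. A minimal sketch, assuming a global view of the ring (the real protocol resolves this with messages, not a global list):

```python
import bisect


def successor(k: int, node_ids: list) -> int:
    """First node identifier encountered moving clockwise from k (inclusive)."""
    ids = sorted(node_ids)
    i = bisect.bisect_left(ids, k)
    return ids[i % len(ids)]  # wrap around past the highest identifier
```

For example, with nodes {0, 1, 3} on a 3-bit ring, key 2 is stored at node 3, and key 5 wraps around to node 0.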
How to look up a key quickly?
[Diagram: a naive lookup walks the ring node by node (n1 → n2 → n3 → …), following successor pointers until it reaches successor(key), which returns the value. This takes O(N) hops.]
How to look up a key quickly? (cont.)
We need a finger table.
How to look up a key quickly? (cont.)
[Diagram: finger table for node 1.]
How to look up a key quickly? (cont.)
Example: node 3 looks up key 1 (m = 3, nodes {0, 1, 3}):
- Node 3: am I predecessor(1)? No
- Node 3: try finger entry 3, and find node 0
- Node 3: send the lookup to node 0
- Node 0: am I predecessor(1)? Yes: successor(1) is node 1
- Node 0: return the answer to node 3 (RPC)
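The lookup walk above can be sketched end to end, assuming the 3-bit example ring with nodes {0, 1, 3} and a global view used only to build each finger table (names like `closest_preceding_finger` follow common Chord descriptions):

```python
M = 3            # identifier bits
RING = 2 ** M    # ring size: 8 positions


def between(x, a, b):
    """True if identifier x lies in the open ring interval (a, b), wrapping mod 2^M."""
    return a < x < b if a < b else x > a or x < b


class Node:
    def __init__(self, id, all_ids):
        self.id = id
        ids = sorted(all_ids)
        succ = lambda k: next((i for i in ids if i >= k % RING), ids[0])
        # finger[i] points to successor(id + 2^i)
        self.finger = [succ(id + 2 ** i) for i in range(M)]
        self.successor = self.finger[0]


def closest_preceding_finger(n, key):
    for f in reversed(n.finger):      # scan highest finger first
        if between(f, n.id, key):
            return f
    return n.id


def find_successor(nodes, start_id, key):
    n = nodes[start_id]
    # forward the query until key falls in (n, successor(n)]
    while not (between(key, n.id, n.successor) or key == n.successor):
        n = nodes[closest_preceding_finger(n, key)]
    return n.successor


ids = [0, 1, 3]
nodes = {i: Node(i, ids) for i in ids}
```

Starting at node 3, a query for key 1 is forwarded to node 0 via the finger table, and node 0 answers that successor(1) is node 1, matching the slide's walkthrough.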
Node joins
Two challenges:
- Each node's finger table must be correctly filled
- Each key k must be stored at node successor(k)
Three operations:
- Initialize the predecessor and fingers of the new node n
- Update the fingers and predecessors of existing nodes
- Copy to n all keys for which node n has become their successor
Initialize the predecessor and fingers of node n
- Idea: ask an existing node for the information needed
[Diagram: the new node joins the ring.]
Update the fingers and predecessors of existing nodes
Observation: when node n joins the network, n will become the ith finger of a node p when both conditions hold:
- p precedes n by at least 2^(i-1)
- the ith finger of node p succeeds n
Solution: find predecessor(n - 2^(i-1)) for all 1 <= i <= m; check whether n is its ith finger, and whether n is its predecessor's ith finger.
Update the fingers and predecessors of existing nodes (cont.)
Example (m = 3, node 6 joins the ring {0, 1, 3}):
- i = 1: predecessor(6 - 2^0) = 3 → node 3 updates its 1st finger to 6; predecessor(3) = 1, no update
- i = 2: predecessor(6 - 2^1) = 3 → node 3 updates its 2nd finger to 6; predecessor(3) = 1, no update
- i = 3: predecessor(6 - 2^2) = 1 → node 1 updates its 3rd finger to 6; predecessor(1) = 0, node 0 also updates; predecessor(0) = 3, no update
Copy to n all keys for which node n has become their successor
- Idea: node n can become the successor only for keys stored by the node immediately following n
[Diagram: keys are transferred from successor(n) to n when n joins.]
Extended Chord protocol
- Concurrent joins
- Failures and replication
Concurrent joins
When multiple nodes with similar identifiers join at the same time, they all tell the same predecessor that they are its successor. How are predecessors and successors updated?
- notify: only the newcomer with the lowest (highest) identifier becomes the predecessor (successor)
- stabilize: periodically check whether new nodes have inserted themselves between a node and its immediate neighbors
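The notify/stabilize pair can be sketched as below; this is a minimal model of the pointer updates only (node objects stand in for remote nodes, and RPCs become direct calls):

```python
class Node:
    """Minimal node state touched by stabilization (not the full protocol)."""
    def __init__(self, id):
        self.id = id
        self.successor = self
        self.predecessor = None


def between(x, a, b):
    """True if identifier x lies in the open ring interval (a, b), wrapping."""
    return a < x < b if a < b else x > a or x < b


def notify(n, candidate):
    """candidate tells n: 'I think I am your predecessor'."""
    if n.predecessor is None or between(candidate.id, n.predecessor.id, n.id):
        n.predecessor = candidate


def stabilize(n):
    """Run periodically by every node: adopt a newcomer that inserted itself
    between n and n.successor, then notify the (possibly new) successor."""
    x = n.successor.predecessor
    if x is not None and between(x.id, n.id, n.successor.id):
        n.successor = x
    notify(n.successor, n)
```

If node b (id 3) joins between a (id 0) and c (id 5), b first notifies c; the next time a stabilizes, it learns about b from c and adopts it as its successor.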
Concurrent joins (cont.)
Example: node b joins between a and c (ring a → c → d):
- t0: Succ(a) = c, Pred(c) = a, Succ(c) = d, Pred(d) = c
- t1: b joins and sets Succ(b) = c; a and c do not yet know about b
- t1+∆t: b notifies c, so Pred(c) = b
- t2: a stabilizes, learns from c that Pred(c) = b, sets Succ(a) = b, and notifies b, so Pred(b) = a; the ring is now a → b → c → d
Failures and replication
Challenges:
- When node n fails, successor(n) must be found
- Successor(n) must have a copy of n's key/value pairs
- n's failure must not disrupt queries in progress while the system re-stabilizes
How to find n's successor after n fails?
- To detect: n's predecessor n2 finds during stabilization that n does not respond
- To recover: n2 looks through its finger table for the first live node n1; n2 asks n1 for successor(n2) and uses the result as its new successor
How to ensure that n's successor has a copy after n fails?
- Each Chord node maintains a list of its r nearest successors, each holding a copy of the node's data
- After node n fails, queries for its keys automatically end up at its successor
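A sketch of the fallback this enables (function name and error handling are illustrative): since each of the r successors holds a replica, a query only needs to find the first live entry in the list.

```python
def first_live_successor(successor_list, is_alive):
    """Return the first live node in a node's successor list. Because every
    listed successor holds a replica, the query still finds the data as long
    as at least one of the r successors survives."""
    for s in successor_list:
        if is_alive(s):
            return s
    raise RuntimeError("all r successors failed at once")
```

Data is lost only if all r successors fail simultaneously, which is made unlikely by choosing r large enough for the expected failure rate.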
How to keep in-progress queries undisrupted?
- Detect failure: before stabilization has completed, node failure can be detected by timing out requests
- Continue query: any node with an identifier close to the failed node's will have similar table entries; such a node can route requests at a slight extra cost in route length
Simulation results
- Path length (the number of nodes traversed by a lookup operation) averages O(log N)
Simulation results (cont.)
- Miss rate due to state inconsistency increases quickly with failure frequency
- Miss rate due to node failures (keys actually lost) stays below it
APIs provided by the Chord system
Part Two: CFS
CFS system structure
- FS layer: interprets blocks as files; presents a file system interface to applications
- DHash layer: stores data blocks reliably
- Chord layer: maintains routing tables to find blocks
Chord layer --- Server selection
Factors to consider:
- distance around the ID ring
- RPC latency
Cost of contacting candidate n_i: d_i + H(n_i) * d̄, where
- d_i is the latency from node n to node n_i
- d̄ is the average latency of all the RPCs that node n has ever issued
- H(n_i) is an estimate of the number of Chord hops that would remain after contacting n_i
[Diagram: node n chooses among candidate next hops n1, n2, n3 on the way to the key.]
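The cost estimate combines the three quantities defined above; a small sketch (function and variable names are illustrative):

```python
def estimated_lookup_latency(d_i, hops_remaining, avg_rpc_latency):
    """Expected total latency of routing via candidate n_i: one RPC to n_i
    (d_i) plus the estimated remaining hops, each at the average RPC cost."""
    return d_i + hops_remaining * avg_rpc_latency


def select_server(candidates, avg_rpc_latency):
    """candidates: list of (node, d_i, H(n_i)) tuples; pick the cheapest."""
    return min(
        candidates,
        key=lambda c: estimated_lookup_latency(c[1], c[2], avg_rpc_latency),
    )[0]
```

The trade-off this captures: a nearby server that barely advances around the ring can lose to a farther server that cuts more remaining hops.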
DHash layer --- Replication
- DHash places a block's replicas at the k servers immediately following successor(block)
- After successor(block) fails, the block is immediately available at the new successor(block)
- Failure independence is provided: close to each other on the ID ring ≠ physically close to each other
DHash layer --- Caching
- How to cache? Servers along the lookup path cache the block
- Cache replacement: LRU
- Cache vs. replication: replication is good for surviving node failures; caching is good for load balancing
[Diagram: servers n1 … n4 on the lookup path cache the block as it is returned.]
DHash layer --- Load balance
- Break file systems into many distributed blocks
- Caching
- A real server can act as multiple virtual servers; a virtual server's ID is derived from hashing both the real server's IP address and the index of the virtual server
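A sketch of the virtual-server ID derivation (the "ip:index" string encoding is an illustrative assumption; the slide only specifies that both the IP address and the index are hashed):

```python
import hashlib


def virtual_server_id(ip: str, index: int, m: int = 160) -> int:
    """Identifier of the index-th virtual server hosted by one physical
    server; each index lands at an independent position on the ring."""
    digest = hashlib.sha1(f"{ip}:{index}".encode()).digest()
    return int.from_bytes(digest, "big") % (2 ** m)
```

Because the positions are effectively random, a powerful machine can run more virtual servers and thus take a proportionally larger share of the ID space.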
DHash layer --- Quota
Why a quota?
- Without one, the total amount of storage an IP address can consume grows linearly with the total number of CFS servers
- A per-IP quota prevents malicious injection of large quantities of data
- Example: if each CFS server limits any one IP address to 0.1% of its storage, an attacker would have to mount the attack from about 1000 machines for it to succeed
DHash layer --- Update and delete
Update:
- Content-hash block: supplied key = SHA-1(block's content)
- Root block: only the publisher, who holds the private key, can change it
Delete:
- No delete; useful for recovering from malicious data insertion (good or not?)
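The content-hash rule makes blocks self-certifying: a client can verify a fetched block against the key it asked for, so no server can substitute forged data. A minimal sketch (function name is illustrative):

```python
import hashlib

def valid_content_block(key: bytes, content: bytes) -> bool:
    """Accept a fetched block only if SHA-1(content) equals the requested key,
    i.e. key = SHA-1(block's content) as CFS requires for content-hash blocks."""
    return hashlib.sha1(content).digest() == key


# Publishing: the key under which a block is stored is its own digest.
key = hashlib.sha1(b"block data").digest()
```

Any tampering with the content changes its digest, so the check fails and the client can retry another replica.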
Experimental results
- Lookup cost: O(log N)
Experimental results (cont.)
- Caching (1000 servers)
Experimental results (cont.)
- Effect of node failures: a lookup fails only when all replicas fail (6 replicas, 1000 blocks, 1000 servers)
Some discussion
- Chord in a sensor network?
- Do you want to use CFS (since there is no delete)?
- Build CFS over CAN and INS?
- Lazy replica copying?