Chord & CFS
Presenter: Gang Zhou
Nov. 11th, 2002
Email: gz5d@cs.virginia.edu
University of Virginia

Relation of Chord, CAN, INS, CFS, and DCS

[Figure: scalability of the lookup schemes: Chord $O(\log n)$; CAN $(d/4)\,n^{1/d}$; INS (early binding) polynomial in the number of attributes in name specifiers. Storage systems: CFS (file blocks), DCS (sensor data).]

Outline of Chord

Chord protocol
  What is Chord?
  Why use Chord?
  Basic Chord protocol
  Extended Chord protocol
  Simulation results
Chord system
  APIs provided by the Chord system

Outline of CFS

CFS system structure
  Chord layer
    Server selection
  DHash layer
    Replication
    Caching
    Load balance
    Quota
    Update and delete
Experimental results

Part One: Chord

What is Chord?

[Figure: an application on node n asks "Value of key?"; the request reaches node n', which stores the key's value, and "Value of key!" is returned to the application in n.]

Why use Chord?

High efficiency: resolves a lookup via O(log N) messages to other nodes

Fault tolerance

Scalability: each node maintains information about only O(log N) other nodes

Basic Chord protocol

What is the identifier circle?
Where is a key stored?
How can a key be looked up quickly?
Node joins (and departures)

The basic Chord protocol

What is the identifier circle?

Node's IP address → SHA-1 → node's m-bit identifier

Key (a string) → SHA-1 → key's m-bit identifier

SHA-1 (Secure Hash Algorithm): takes a message of less than $2^{64}$ bits in length and produces a 160-bit message digest

[Figure: identifier circle showing Key1, Key2, and node n; the legend distinguishes a position on the identifier circle from a node.]
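The identifier mapping can be sketched with Python's standard `hashlib` module. Keeping the top m bits of the 160-bit SHA-1 digest is an illustrative assumption (the slide only says identifiers are m bits); the node address and key below are hypothetical.

```python
import hashlib

def chord_id(name: str, m: int = 160) -> int:
    """Map a node address or key string to an m-bit identifier.

    SHA-1 yields 160 bits; for m < 160 this sketch keeps the top m bits
    (an assumed reduction; any consistent one would do).
    """
    digest = int.from_bytes(hashlib.sha1(name.encode("utf-8")).digest(), "big")
    return digest >> (160 - m)

node_id = chord_id("128.143.67.10:4000", m=8)  # hypothetical node address
key_id = chord_id("report.pdf", m=8)           # hypothetical key
print(node_id, key_id)
```

Nodes and keys go through the same function, so both land on one shared circle of size 2^m.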

Where is a key stored?

successor(k): the first node encountered when moving clockwise around the identifier circle starting from k (k itself, if a node has that identifier). Key k is stored at successor(k).
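A minimal sketch of successor(k), assuming all node identifiers are known in one sorted list. A real Chord node never has this global view; it is used here only to make the definition concrete. The three-node ring with m = 3 is a hypothetical example.

```python
from bisect import bisect_left

def successor(k: int, node_ids: list[int], m: int) -> int:
    """Return the first node at or clockwise after identifier k on a 2^m ring."""
    ring = sorted(node_ids)
    i = bisect_left(ring, k % 2 ** m)   # index of the first node id >= k
    return ring[i] if i < len(ring) else ring[0]  # wrap past 2^m - 1 to the smallest id

# Ring with nodes 0, 1, 3 and m = 3: key 2 is stored at node 3, key 6 wraps to node 0.
print(successor(2, [0, 1, 3], 3), successor(6, [0, 1, 3], 3))
```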

How to look up a key quickly?

[Figure: node n1 asks "Value of key?", i.e. which node is successor(key). Moving node by node around the ring (n1, n2, n3, n4, ...) is slow; jumping ahead toward the key is faster. The answer "Value of key!" is returned to n1.]

How to look up a key quickly? (cont.)

We need a finger table.

How to look up a key quickly? (cont.)

[Figure: finger table for node 1.]

How to look up a key quickly? (cont.)

Example: node 3 asks for the value of key 1.

Node 3: Am I predecessor(1)? No.
Node 3: Try finger entry 3, and find node 0.
Node 3: Send the lookup to node 0.
Node 0: Am I predecessor(1)? Yes.
Node 0: successor(1) is node 1; return it to node 3 (RPC).
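The lookup walk above can be simulated as follows. It assumes a 3-bit ring containing exactly nodes 0, 1, and 3, and uses the finger definition from the Chord paper: entry i of node n points to successor((n + 2^(i-1)) mod 2^m). Every table is computed from a global node list for brevity, so this simulates the routing decisions rather than a networked implementation.

```python
from bisect import bisect_left

M = 3                # identifier bits (assumed, matches the 3-bit example ring)
NODES = [0, 1, 3]    # assumed node identifiers, kept sorted

def successor(k: int) -> int:
    """First node at or clockwise after identifier k."""
    i = bisect_left(NODES, k % 2 ** M)
    return NODES[i] if i < len(NODES) else NODES[0]

def in_open(x: int, a: int, b: int) -> bool:
    """True if x lies strictly inside the clockwise interval (a, b)."""
    if a < b:
        return a < x < b
    return x > a or x < b  # wraps past 0; a == b means the whole ring except a

def in_half_open(x: int, a: int, b: int) -> bool:
    """True if x lies in the clockwise interval (a, b]."""
    return in_open(x, a, b) or x == b

def finger_table(n: int) -> list[tuple[int, int]]:
    """Entry i (1-based): start = (n + 2^(i-1)) mod 2^m, node = successor(start)."""
    table = []
    for i in range(1, M + 1):
        start = (n + 2 ** (i - 1)) % 2 ** M
        table.append((start, successor(start)))
    return table

def find_successor(n: int, key: int) -> tuple[int, list[int]]:
    """Resolve successor(key) starting at node n; also return the nodes visited."""
    path = [n]
    # n is predecessor(key) once key falls in (n, n's successor].
    while not in_half_open(key, n, finger_table(n)[0][1]):
        # Otherwise forward to the closest finger that precedes the key.
        nxt = next((node for _, node in reversed(finger_table(n))
                    if in_open(node, n, key)), n)
        if nxt == n:  # defensive: no closer finger known
            break
        n = nxt
        path.append(n)
    return finger_table(n)[0][1], path

print(finger_table(1))       # [(2, 3), (3, 3), (5, 0)]
print(find_successor(3, 1))  # (1, [3, 0])
```

Node 3's third finger (start 7) is node 0, which is why the lookup jumps straight to node 0 and finishes there.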

Node joins

Two challenges:
  Each node's finger table is correctly filled
  Each key k is stored at node successor(k)

Three operations:
  Initialize the predecessor and fingers of the new node n
  Update the fingers and predecessors of existing nodes
  Copy to n all keys for which node n has become the successor

Initialize the predecessor and fingers of node n

Idea: ask an existing node for the information needed

[Figure: node n joins the ring.]

Update the fingers and predecessors of existing nodes

Observation: when node n joins the network, n becomes the ith finger of a node p when both of the following conditions are met:
  p precedes n by at least $2^{i-1}$
  The ith finger of node p succeeds n

Solution: find predecessor($n - 2^{i-1}$) for all $1 \le i \le m$, and check whether n should become its ith finger and whether n should become its predecessor's ith finger.

Update the fingers and predecessors of existing nodes (cont.)

Example: node 6 joins.

predecessor($6 - 2^{1-1}$) = 3: update
  predecessor(3) = 1: no update
predecessor($6 - 2^{2-1}$) = 3: update
  predecessor(3) = 1: no update
predecessor($6 - 2^{3-1}$) = 1: update
  predecessor(1) = 0: update
  predecessor(0) = 3: no update

[Figure: node 6 joins the ring.]
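One way to check the example above is to recompute every finger table before and after the join and compare them. This brute-force diff is only a sanity check under an assumed global view of the ring (m = 3, existing nodes 0, 1, and 3); the protocol itself reaches the same entries incrementally by walking from predecessor(n - 2^(i-1)).

```python
from bisect import bisect_left

M = 3  # identifier bits, as in the example (assumed)

def successor(k: int, nodes: list[int]) -> int:
    """First node at or clockwise after identifier k."""
    ring = sorted(nodes)
    i = bisect_left(ring, k % 2 ** M)
    return ring[i] if i < len(ring) else ring[0]

def fingers(n: int, nodes: list[int]) -> list[int]:
    """Finger i (1-based) of node n = successor((n + 2^(i-1)) mod 2^M)."""
    return [successor(n + 2 ** (i - 1), nodes) for i in range(1, M + 1)]

def changed_fingers(old_nodes: list[int], new_node: int) -> dict[int, list[int]]:
    """For each existing node, the finger indices that change once new_node joins."""
    new_nodes = old_nodes + [new_node]
    changes = {}
    for p in old_nodes:
        before, after = fingers(p, old_nodes), fingers(p, new_nodes)
        idx = [i + 1 for i in range(M) if before[i] != after[i]]
        if idx:
            changes[p] = idx
    return changes

print(changed_fingers([0, 1, 3], 6))  # {0: [3], 1: [3], 3: [1, 2]}
```

The four changed entries correspond to the four "update" lines of the example: fingers 1 and 2 of node 3, finger 3 of node 1, and finger 3 of node 0.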

Copy to n all keys for which node n has become the successor

Idea: node n can become the successor only for keys stored by the node immediately following n

[Figure: node n joins the ring.]
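The key handoff can be sketched as a filter over the keys held by n's successor: exactly those in the clockwise interval (predecessor(n), n] move to n. The identifiers below (node 6 joining between nodes 3 and 0, where node 0 holds keys 6, 7, and 0) are a hypothetical example.

```python
def in_half_open(x: int, a: int, b: int) -> bool:
    """True if x lies in the clockwise interval (a, b]."""
    if a < b:
        return a < x <= b
    return x > a or x <= b  # interval wraps past 0 (or covers the whole ring if a == b)

def keys_to_move(new_node: int, pred: int, succ_keys: set[int]) -> set[int]:
    """Keys held by the successor that now belong to new_node: those in (pred, new_node]."""
    return {k for k in succ_keys if in_half_open(k, pred, new_node)}

print(keys_to_move(6, 3, {6, 7, 0}))  # {6}
```

Keys 7 and 0 stay at node 0 because they still lie in (6, 0].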

Extended Chord protocol

Concurrent joins
Failures and replication

Concurrent joins

When multiple nodes with similar identifiers join at the same time, they all tell the same predecessor that they are its successor. How should the predecessor and successor pointers be updated?

notify: only the newcomer with the lowest (highest) identifier ends up as the predecessor's successor (the successor's predecessor)

stabilize: each node periodically checks whether new nodes have inserted themselves between it and its immediate neighbors
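The following single-process sketch models stabilize and notify in the style of the Chord paper's pseudocode. The identifiers (a = 10, b = 20, c = 30, d = 40), the join order, and the number of rounds are assumptions. Nodes c and b both start out believing d is their successor, and a few rounds of stabilization converge to the ring a → b → c → d.

```python
from typing import Optional

def between(x: int, a: int, b: int) -> bool:
    """True if x lies strictly inside the clockwise interval (a, b)."""
    if a < b:
        return a < x < b
    return x > a or x < b

class Node:
    def __init__(self, ident: int):
        self.id = ident
        self.successor: "Node" = self
        self.predecessor: Optional["Node"] = None

    def stabilize(self) -> None:
        # Ask our successor for its predecessor; adopt it if it sits between us.
        x = self.successor.predecessor
        if x is not None and between(x.id, self.id, self.successor.id):
            self.successor = x
        self.successor.notify(self)

    def notify(self, cand: "Node") -> None:
        # cand believes it might be our predecessor; accept if it is closer.
        if self.predecessor is None or between(cand.id, self.predecessor.id, self.id):
            self.predecessor = cand

a, b, c, d = Node(10), Node(20), Node(30), Node(40)
a.successor, a.predecessor = d, d   # initial two-node ring a <-> d
d.successor, d.predecessor = a, a
c.successor = d                     # c joins and learns d as its successor
b.successor = d                     # b joins with the same (now stale) view
for _ in range(3):
    for n in (a, b, c, d):
        n.stabilize()
print([(n.id, n.successor.id, n.predecessor.id) for n in (a, b, c, d)])
# [(10, 20, 40), (20, 30, 10), (30, 40, 20), (40, 10, 30)]
```

Three rounds suffice for this particular calling order; lookups issued meanwhile may be slower but the pointers end up correct.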

Concurrent joins (cont.)

Example: node b joins (c joins at t1, b at t1+∆t; pointers are updated by notify and stabilize)

Time    Succ(a) Pred(b) Succ(b) Pred(c) Succ(c) Pred(d)
t0      d       -       -       -       -       a
t1      c       -       -       a       d       c
t1+∆t   b       a       d       a       d       c
t2      b       a       c       b       d       c

Failures and replication

Challenges:
  When node n fails, how is successor(n) found?
  successor(n) must have a copy of the key/value pairs
  n's failure must not disrupt queries in progress while the system is re-stabilizing

How to find n's successor after n fails?

To detect: n's predecessor (n2) finds during stabilization that n does not respond.

To recover:
  n2 looks through its finger table for the first live node n1
  n2 asks n1 for successor(n2) and uses the result as its new successor

How to ensure that n's successor has a copy after n fails?

Each Chord node maintains a list of its r nearest successors, and each of these successors holds a copy of the data. After node n fails, queries for its keys automatically end up at its successor.
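A minimal sketch of successor-list failover, assuming liveness is known as a set (a stand-in for pinging nodes and timing out). The node identifiers and r = 3 are hypothetical.

```python
from typing import Optional

def first_live_successor(successor_list: list[int], alive: set[int]) -> Optional[int]:
    """Return the nearest successor that still responds, or None if all r have failed."""
    for node in successor_list:   # ordered from nearest to farthest
        if node in alive:         # stands in for a ping or RPC with a timeout
            return node
    return None

# Node 3 keeps r = 3 successors [6, 0, 1]; node 6 has failed.
print(first_live_successor([6, 0, 1], alive={0, 1, 3}))
```

Because node 6's successors already hold copies of its data, switching to node 0 keeps node 6's keys reachable; only the failure of all r successors at once loses the pointer.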

How to keep in-progress queries undisrupted?

Detect failure: before stabilization has completed, a node failure can be detected by timing out the requests.

Continue the query: any node with an identifier close to the failed node's identifier will have similar table entries, so such a node can be used to route requests at a slight extra cost in route length.

Simulation results

Path length: the number of nodes traversed by a lookup operation

Average: O(log N)

Simulation results (cont.)

Miss rate due to state inconsistency: increases fast with failure frequency

miss rate due to node failures (key lost) <

APIs provided by the Chord system

Part Two: CFS

CFS system structure

FS layer: interprets blocks as files; presents a file system interface to applications
DHash layer: stores data blocks reliably
Chord layer: maintains routing tables to find blocks

Chord layer --- Server selection

Factors to consider: distance around the ID ring, and RPC latency

$C(n_i) = d_i + \bar{d} \cdot H(n_i)$

$d_i$: the latency from node n to node $n_i$
$\bar{d}$: the average latency of all the RPCs that node n has ever issued
$H(n_i)$: an estimate of the number of Chord hops that would remain after contacting $n_i$

[Figure: node n choosing among candidate next hops n1, n2, n3 on the way to the key.]
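The selection rule can be sketched as choosing the candidate with the smallest estimated cost d_i + d̄ · H(n_i), assuming the terms are combined as in the CFS paper. The candidate names, latencies, and hop estimates below are hypothetical inputs; how H(n_i) is estimated is not modeled.

```python
from typing import NamedTuple

class Candidate(NamedTuple):
    node: str
    latency_ms: float   # d_i: measured latency from n to this candidate
    hops_left: float    # H(n_i): estimated Chord hops remaining after it

def pick_next_hop(cands: list[Candidate], avg_latency_ms: float) -> Candidate:
    """Pick the candidate minimizing d_i + d_bar * H(n_i)."""
    return min(cands, key=lambda c: c.latency_ms + avg_latency_ms * c.hops_left)

cands = [Candidate("n1", 10.0, 3), Candidate("n2", 80.0, 1), Candidate("n3", 40.0, 2)]
print(pick_next_hop(cands, avg_latency_ms=50.0).node)
```

A large average latency d̄ favors candidates that cut many hops, while a small d̄ favors the nearest candidate even if more hops remain.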

DHash layer --- Replication

DHash places a block's replicas at the k servers immediately after successor(block).

After successor(block) fails, the block is immediately available at the new successor(block).

Failure independence is provided: servers close to each other in the ID ring are not necessarily physically close to each other.
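A sketch of this replica placement, assuming a global sorted list of server IDs; the 6-bit ring, the server IDs, and k = 2 are hypothetical.

```python
from bisect import bisect_left

def replica_set(block_id: int, node_ids: list[int], k: int, m: int) -> list[int]:
    """successor(block) followed by the k servers immediately after it on the ring."""
    ring = sorted(node_ids)
    i = bisect_left(ring, block_id % 2 ** m) % len(ring)  # index of successor(block)
    count = min(k + 1, len(ring))                          # never list a server twice
    return [ring[(i + j) % len(ring)] for j in range(count)]

print(replica_set(34, [5, 20, 33, 47, 60], k=2, m=6))  # [47, 60, 5]
print(replica_set(34, [5, 20, 33, 60], k=2, m=6))      # [60, 5, 20] after server 47 fails
```

After server 47 fails, the new successor(block) is server 60, which already held a replica, so the block stays available without any copying on the lookup path.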

DHash layer --- Caching

How to cache?
Cache replacement: LRU
Cache vs. replication?

[Figure: a lookup for a key passes through n1, n2, n3, n4, ...; the nodes along the path are candidate cache locations ("Cache?").]

Replication is good for coping with node failures; caching is good for load balance.
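An LRU block cache can be sketched with `collections.OrderedDict`. The class name and capacity are hypothetical, and the details of CFS caching (such as which nodes on the lookup path receive copies) are not modeled.

```python
from collections import OrderedDict
from typing import Optional

class LRUBlockCache:
    """Fixed-capacity block cache that evicts the least recently used block."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self._blocks: "OrderedDict[int, bytes]" = OrderedDict()

    def get(self, block_id: int) -> Optional[bytes]:
        if block_id not in self._blocks:
            return None
        self._blocks.move_to_end(block_id)  # mark as most recently used
        return self._blocks[block_id]

    def put(self, block_id: int, data: bytes) -> None:
        self._blocks[block_id] = data
        self._blocks.move_to_end(block_id)
        if len(self._blocks) > self.capacity:
            self._blocks.popitem(last=False)  # evict the least recently used block

cache = LRUBlockCache(capacity=2)
cache.put(1, b"block one")
cache.put(2, b"block two")
cache.get(1)                    # block 1 becomes the most recently used
cache.put(3, b"block three")    # evicts block 2
```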

DHash layer --- Load balance

Break file systems into many distributed blocks
Caching
A real server can act as multiple virtual servers; each virtual server's ID is derived from hashing both the real server's IP address and the index of the virtual server
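A sketch of deriving virtual-server IDs by hashing the real server's IP address together with the virtual server's index. The exact encoding (`ip#index`) and the helper name are assumptions; the slide only states that both values are hashed.

```python
import hashlib

def virtual_server_ids(ip: str, count: int) -> list[int]:
    """160-bit IDs for `count` virtual servers hosted on one real server (assumed encoding)."""
    return [
        int.from_bytes(hashlib.sha1(f"{ip}#{index}".encode("utf-8")).digest(), "big")
        for index in range(count)
    ]

ids = virtual_server_ids("192.0.2.7", 4)  # hypothetical address, 4 virtual servers
```

A server with more capacity can run more virtual servers and so take responsibility for a proportionally larger share of the ID ring.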

DHash layer --- Quota

Why quota? To prevent malicious injection of large quantities of data.

With a per-server quota, the total amount of storage an IP address can consume grows linearly with the total number of CFS servers.

Example: if each CFS server limits any one IP address to using 0.1% of its storage, then an attacker would have to mount an attack from about 1000 machines for it to be successful.
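The quota arithmetic in the example can be written out as follows, assuming N servers with capacities $s_1$ through $s_N$, the same per-IP quota fraction q on every server, and that a "successful" attack means filling the system's total storage S.

```latex
\begin{aligned}
\text{storage usable by one IP} &\le \sum_{j=1}^{N} q\, s_j = q\, S, \qquad S = \sum_{j=1}^{N} s_j \\
\text{IP addresses needed to fill } S &\ge \frac{S}{q\, S} = \frac{1}{q} = \frac{1}{0.001} = 1000
\end{aligned}
```

The bound on one address, $qS$, grows linearly as servers are added, but the number of addresses needed, $1/q$, does not depend on N.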

DHash layer --- Update and delete

Update:
  Content-hash block: the supplied key must equal SHA-1(block's content)
  Root block: only the publisher with the private key can change it

Delete: no delete; this is useful for recovering from malicious data insertion (good or not?)
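Checking a content-hash block can be sketched with `hashlib`: a server accepts the block only if the supplied key equals the SHA-1 of the content. The function names are hypothetical; root blocks, which are verified with the publisher's public key, are not covered.

```python
import hashlib

def content_hash_key(block: bytes) -> bytes:
    """Key under which a content-hash block is stored: SHA-1 of its bytes."""
    return hashlib.sha1(block).digest()

def verify_content_block(key: bytes, block: bytes) -> bool:
    """Accept the block only if the supplied key matches SHA-1(content)."""
    return hashlib.sha1(block).digest() == key

key = content_hash_key(b"hello")
print(verify_content_block(key, b"hello"), verify_content_block(key, b"hellO"))
```

Because the key is derived from the content, any modified block gets a different key, so content-hash blocks are effectively immutable.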

Experimental results

Lookup cost: O(log N)

Experimental results (cont.)

Caching (1000 servers)

Experimental results (cont.)

Effect of node failures: lookups fail when all replicas of a block have failed

Setup: 6 replicas, 1000 blocks, 1000 servers

Some discussion

Chord in sensor networks?
Would you want to use CFS (since it has no delete)?
Build CFS over CAN and INS?
Lazy replica copying?