Lecture 10 Naming services for flat namespaces

Lecture 10 Naming services for flat namespaces. EECE 411: Design of Distributed Software Applications Logistics / reminders Project Send Samer and me.

Dec 22, 2015



EECE 411: Design of Distributed Software Applications

Logistics / reminders

Project: send Samer and me your group membership by the end of the week.

Quizzes: Q1 next time; Q2 on 11/16.


Implementation options: Flat namespace

Problem: given an essentially unstructured name, how can we design a scalable solution that associates names with addresses?

Possible designs:
• Simple solutions (broadcasting, forwarding pointers) [last time]
• Hash table-like approaches: consistent hashing, distributed hash tables


Functionality to implement: map names → access points (addresses)

Similar to a hash table: manage a (huge) list of pairs (name, address), or (key, value)

Put(key, value)
Lookup(key) → value

Key idea: partitioning. Allocate parts of the list to different nodes.
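As a minimal sketch of put/lookup over a partitioned list, the Python below assigns each key to a node by hashing it and taking the result modulo the number of nodes. The node names, the SHA-1 hash, and the modulo-N rule are illustrative assumptions (a naive scheme, not the one this lecture recommends); each "node" is just an in-memory dict.

```python
from __future__ import annotations

import hashlib


class PartitionedStore:
    """Naive partitioning: node index = hash(key) mod N (illustration only)."""

    def __init__(self, node_names: list[str]) -> None:
        self.node_names = list(node_names)
        # One in-memory dict stands in for the storage held by each node.
        self.partitions: dict[str, dict[str, str]] = {n: {} for n in self.node_names}

    def node_for(self, key: str) -> str:
        # Hash the key to a large integer, then pick a node modulo N.
        h = int(hashlib.sha1(key.encode()).hexdigest(), 16)
        return self.node_names[h % len(self.node_names)]

    def put(self, key: str, value: str) -> None:
        self.partitions[self.node_for(key)][key] = value

    def lookup(self, key: str) -> str | None:
        # Only the node that owns this part of the list is consulted.
        return self.partitions[self.node_for(key)].get(key)


store = PartitionedStore(["N1", "N2", "N3"])
store.put("title", "address of the file")
print(store.lookup("title"))    # address of the file
print(store.lookup("missing"))  # None
```

Any node can compute where a pair lives without a central directory, but a key's owner depends on N: when a node joins or leaves, most keys map to a different node. Consistent hashing is designed to avoid exactly that.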


Why the put()/get() interface?

The API supports a wide range of applications: it imposes no structure or meaning on keys.

Key/value pairs are persistent and global. Keys can be stored inside other values (indirection), which makes it possible to build complex data structures.
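To make the indirection point concrete, here is a small sketch that builds a linked list on top of nothing but put/get: each stored value carries the key of the next cell. A plain dict stands in for the DHT, and naming each cell by the SHA-1 hash of its contents is a hypothetical choice for illustration.

```python
from __future__ import annotations

import hashlib

# A plain dict stands in for the DHT; put/get mirror the put()/get() interface.
dht: dict[str, tuple] = {}


def put(key: str, value: tuple) -> None:
    dht[key] = value


def get(key: str) -> tuple:
    return dht[key]


def key_of(value: tuple) -> str:
    # Hypothetical naming scheme: a value's key is the hash of its contents.
    return hashlib.sha1(repr(value).encode()).hexdigest()


def store_list(items: list[str]) -> str | None:
    """Store items as a linked list; each value embeds the key of the next cell."""
    next_key = None
    for item in reversed(items):
        cell = (item, next_key)   # indirection: a key stored inside a value
        next_key = key_of(cell)
        put(next_key, cell)
    return next_key               # key of the head cell (None for an empty list)


def read_list(head_key: str | None) -> list[str]:
    out = []
    key = head_key
    while key is not None:
        item, key = get(key)      # follow the embedded key to the next cell
        out.append(item)
    return out


head = store_list(["intro.txt", "ch1.txt", "ch2.txt"])
print(read_list(head))  # ['intro.txt', 'ch1.txt', 'ch2.txt']
```

Only the head key has to be known in advance; everything else is reached by following keys found inside values.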


Why Might The Design Be Hard?

• Decentralized: no central authority
• Scalable: low network traffic overhead
• Efficient: find items quickly (latency)
• Dynamic: nodes fail, new nodes join
• General-purpose: flexible naming


The Lookup Problem

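[Figure: nodes N1 through N6 connected via the Internet. A publisher calls Put(key=“title”, value=file data…) at one node; a client elsewhere calls Get(key=“title”). Which node should the client ask?]

• This lookup problem is at the heart of all these services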


Motivation: Centralized Lookup (Napster)

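[Figure: a publisher at N4 holds Key=“title”, Value=file data… and registers SetLoc(“title”, N4) with a central database (DB); a client sends Lookup(“title”) to the DB. Other nodes: N1, N2, N3, N6, N7, N8, N9.]

Simple, but O(N) state and a single point of failure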


Motivation: Flooded Queries (Gnutella)

[Figure: a publisher at N4 holds Key=“title”, Value=file data…; a client floods Lookup(“title”) to its neighbours, which forward it through nodes N1 to N9 until it reaches N4.]

Robust, but worst case O(N) messages per lookup
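A rough sketch of flooding, to show where the O(N) message cost comes from. The overlay topology, the hop limit (TTL), and dropping duplicate copies on arrival are assumptions made for this illustration, not details taken from the slide.

```python
from collections import deque


def flood_lookup(graph, holders, start, key, ttl):
    """Flood a query from `start` for up to `ttl` hops.

    graph:   node -> list of neighbour nodes (overlay links)
    holders: node -> set of keys that node stores
    Returns (nodes reached that hold the key, number of messages sent).
    """
    seen = {start}
    queue = deque([(start, None, ttl)])
    found, messages = set(), 0
    while queue:
        node, sender, hops_left = queue.popleft()
        if key in holders.get(node, set()):
            found.add(node)
        if hops_left == 0:
            continue
        for nbr in graph[node]:
            if nbr == sender:
                continue              # do not send the query straight back
            messages += 1             # every forwarded copy costs a message
            if nbr not in seen:       # duplicate copies are dropped on arrival
                seen.add(nbr)
                queue.append((nbr, node, hops_left - 1))
    return found, messages


chain = {"N1": ["N2"], "N2": ["N1", "N3"], "N3": ["N2", "N4"], "N4": ["N3"]}
print(flood_lookup(chain, {"N4": {"title"}}, "N1", "title", ttl=3))  # ({'N4'}, 3)
print(flood_lookup(chain, {"N4": {"title"}}, "N1", "title", ttl=2))  # (set(), 2)
```

With a TTL large enough to reach every node, the query crosses every overlay link, so the cost grows with the size of the network; a smaller TTL saves messages but can miss the item.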


Motivation: FreeDB, Routed DHT Queries (Chord, etc.)

[Figure: a publisher at N4 stores Key=H(audio data), Value={artist, album title, track title}; a client's Lookup(H(audio data)) is routed through nodes N1 to N9 to the node responsible for the key.]


Hash table-like approaches: consistent hashing, distributed hash tables


Partition Solution: Consistent hashing

Consistent hashing: the output range of a hash function is treated as a fixed circular space, or “ring”.

[Figure: circular ID space from 0 to 128, where 128 wraps around to 0. Node IDs N10, N32, N60, N80, N100 and key IDs K5, K11, K30, K33, K52, K99 are hashed onto the same ring.]


Partition Solution: Consistent hashing

Mapping keys to nodes. Advantages: incremental scalability, load balancing.

[Figure: the same ring; each key is stored at the first node at or after its ID: N10 holds K5, K10; N32 holds K11, K30; N60 holds K33, K40, K52; N80 holds K65, K70; N100 holds K99.]
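The mapping in the figure can be reproduced with a few lines of Python: hash names onto a 2^m ring and assign each key to its successor. The 7-bit ring (0..127) matches the figure's 0/128 label; using SHA-1 and the sample IP string are assumptions for illustration.

```python
from __future__ import annotations

import hashlib
from bisect import bisect_left

M = 7  # bits of ID space: 2**7 = 128, matching the 0..128 ring in the figure


def ring_id(name: str, m: int = M) -> int:
    """Hash a node address or a key name onto the ring [0, 2**m)."""
    digest = hashlib.sha1(name.encode()).hexdigest()
    return int(digest, 16) % (2 ** m)


def successor(key_id: int, node_ids: list[int]) -> int:
    """First node whose ID is equal to or follows key_id, wrapping past the top."""
    ring = sorted(node_ids)
    i = bisect_left(ring, key_id)  # index of the first node ID >= key_id
    return ring[i] if i < len(ring) else ring[0]


nodes = [10, 32, 60, 80, 100]
for k in [5, 10, 11, 30, 33, 40, 52, 65, 70, 99]:
    print(f"K{k} -> N{successor(k, nodes)}")
print(ring_id("10.0.0.1"))  # some ID in 0..127 for a hypothetical node address
```

The loop prints the same assignment as the figure (K5, K10 at N10; K11, K30 at N32; and so on). A key above 100, such as 110, wraps around to N10.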


Consistent hashing

How do store & lookup work?

[Figure: the same ring and key assignment. A query asks “What node stores K5?” and gets the answer “Key 5 is at N10”.]
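One simple way store and lookup can work on this ring is sketched below, assuming every node keeps the full list of node IDs. Then any node can compute successor(k) locally and contact the owner directly in one hop. The class name and the in-process `network` dict (standing in for RPC) are hypothetical.

```python
from __future__ import annotations

from bisect import bisect_left


class FullMembershipNode:
    """A node that knows every node ID on the ring (O(N) state per node)."""

    def __init__(self, node_id: int, all_ids: list[int], network: dict) -> None:
        self.node_id = node_id
        self.ring = sorted(all_ids)   # full membership list: grows with N
        self.network = network        # node_id -> node object (stands in for RPC)
        self.data: dict[int, str] = {}

    def responsible(self, key_id: int) -> int:
        # successor(key_id): first node ID >= key_id, wrapping around.
        i = bisect_left(self.ring, key_id)
        return self.ring[i] if i < len(self.ring) else self.ring[0]

    def store(self, key_id: int, value: str) -> int:
        owner = self.responsible(key_id)
        self.network[owner].data[key_id] = value  # one hop: straight to the owner
        return owner

    def lookup(self, key_id: int) -> tuple[int, str | None]:
        owner = self.responsible(key_id)
        return owner, self.network[owner].data.get(key_id)


ids = [10, 32, 60, 80, 100]
network: dict[int, FullMembershipNode] = {}
for n in ids:
    network[n] = FullMembershipNode(n, ids, network)

network[80].store(5, "file data")
print(network[32].lookup(5))  # (10, 'file data')
```

Lookups are fast, but `self.ring` holds every node ID, which is the O(N) per-node state the lecture raises as a problem next.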


Additional trick: Virtual Nodes

Problem: How to do load balancing when nodes are heterogeneous?

Solution idea: each node owns a portion of the ID space proportional to its ‘power’.

Virtual nodes: each physical node hosts multiple (similar) virtual nodes, and all virtual nodes are treated the same.

Advantages: load balancing, incremental scalability, dealing with failures.
• Dealing with heterogeneity: the number of virtual nodes that a node is responsible for can be decided based on its capacity, accounting for heterogeneity in the physical infrastructure.
• When a node joins (if it hosts many virtual nodes), it accepts a roughly equivalent amount of load from each of the other existing nodes.
• If a node becomes unavailable, the load it handled is evenly dispersed across the remaining available nodes.
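A small sketch of virtual nodes: each physical node places a capacity-proportional number of points on the ring, and owns the arc that ends at each of its points. The 32-bit ID space, the "node#vnK" labels, SHA-1, and the 50/200 virtual-node counts are all assumptions chosen for illustration.

```python
from __future__ import annotations

import hashlib
from bisect import bisect_left
from collections import Counter

M = 32          # bits in the ID space (assumed)
SIZE = 2 ** M


def ring_id(label: str) -> int:
    return int(hashlib.sha1(label.encode()).hexdigest(), 16) % SIZE


def build_ring(vnodes_per_node: dict[str, int]) -> list[tuple[int, str]]:
    """Place `count` virtual nodes per physical node; return sorted (id, node)."""
    ring = [
        (ring_id(f"{node}#vn{v}"), node)
        for node, count in vnodes_per_node.items()
        for v in range(count)
    ]
    return sorted(ring)


def owner(ring: list[tuple[int, str]], key_id: int) -> str:
    """Physical node owning key_id: the host of the successor virtual node."""
    ids = [vid for vid, _ in ring]
    i = bisect_left(ids, key_id)
    return ring[i % len(ring)][1]


def id_space_share(ring: list[tuple[int, str]]) -> Counter:
    """Fraction of the ID space owned by each physical node (needs >= 2 vnodes)."""
    share: Counter = Counter()
    for idx, (vid, node) in enumerate(ring):
        prev = ring[idx - 1][0]                      # idx 0 wraps to the last vnode
        share[node] += ((vid - prev) % SIZE) / SIZE  # arc (prev, vid] belongs to vid
    return share


ring = build_ring({"small": 50, "big": 200})  # "big" has about 4x the capacity
print(id_space_share(ring))                   # "big" should own roughly 80%
```

Because "big" has four times as many points, it ends up owning roughly four times as much of the ID space. Removing one host spreads its many small arcs over many different successors.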


Consistent Hashing – Summary so far

Mechanism: nodes get an identity by hashing their IP address; keys are also hashed into the same space. A key with ID k (the value it hashes to) is assigned to the first node whose hashed ID is equal to or follows k in the circular space: successor(k).

Advantages: incremental scalability, load balancing.

Theoretical results [N = number of nodes, K = number of keys in the system]:
• With high probability, each node is responsible for at most $(1+\varepsilon)K/N$ keys.
• With high probability, the joining or leaving of a node relocates O(K/N) keys (and only to or from the responsible node).
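To see the relocation result on the lecture's example ring, the sketch below counts which keys change owner when a node N45 joins. For contrast, it also counts them under naive `key mod N` partitioning. Treating the 128 ring IDs themselves as the keys, the joining ID 45, and the mod-N comparison are assumptions for illustration, not part of the lecture.

```python
from __future__ import annotations

from bisect import bisect_left

RING_SIZE = 128  # 7-bit ID space, as in the figures


def successor(key_id: int, node_ids: list[int]) -> int:
    ring = sorted(node_ids)
    i = bisect_left(ring, key_id)
    return ring[i] if i < len(ring) else ring[0]


def moved_by_join(keys, old_nodes: list[int], new_node: int) -> dict[int, tuple[int, int]]:
    """Keys whose owner changes when new_node joins: key -> (old owner, new owner)."""
    new_nodes = old_nodes + [new_node]
    moved = {}
    for k in keys:
        before, after = successor(k, old_nodes), successor(k, new_nodes)
        if before != after:
            moved[k] = (before, after)
    return moved


def moved_by_modulo(keys, n_old: int) -> list[int]:
    """Naive `key mod N`: keys whose node index changes when N grows by one."""
    return [k for k in keys if k % n_old != k % (n_old + 1)]


keys = range(RING_SIZE)                        # one key per ID (illustrative)
nodes = [10, 32, 60, 80, 100]
moved = moved_by_join(keys, nodes, 45)
print(len(moved), set(moved.values()))         # 13 {(60, 45)}
print(len(moved_by_modulo(keys, len(nodes))))  # 103
```

Only the keys in (32, 45] move, all from N60 to the new node. That is on the order of K/N, and only the responsible node is involved. Under mod-N partitioning, 103 of the 128 keys would change node.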


BUT Consistent hashing – problem

How large is the state maintained at each node? O(N), where N is the number of nodes: every node must know every other node to answer lookups directly.

[Figure: the same ring and key assignment as before; “Key 5 is at N10”.]


Basic Lookup (nonsolution)

[Figure: ring with nodes N5, N10, N20, N32, N40, N60, N80, N99, N110. The query “Where is key 50?” is passed around the ring, and the answer is “Key 50 is at N60”.]

• Lookups find the ID’s successor
• Correct if successors are correct
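A sketch of this basic lookup, assuming each node knows only its immediate successor. The query walks the ring until the key falls between a node and that node's successor. The function names are hypothetical; the ring is the one in the figure.

```python
from __future__ import annotations


def in_half_open(x: int, a: int, b: int) -> bool:
    """True if x lies in the ring interval (a, b]; a == b covers the whole ring."""
    return a < x <= b if a < b else (x > a or x <= b)


def basic_lookup(start: int, key_id: int, succ: dict[int, int]) -> tuple[int, int]:
    """Follow successor pointers until key_id is in (node, succ[node]].

    Returns (node responsible for key_id, number of forwarding hops).
    """
    node, hops = start, 0
    while not in_half_open(key_id, node, succ[node]):
        node = succ[node]   # each node only knows its successor
        hops += 1
    return succ[node], hops


ring = sorted([5, 10, 20, 32, 40, 60, 80, 99, 110])
succ = {n: ring[(i + 1) % len(ring)] for i, n in enumerate(ring)}
print(basic_lookup(10, 50, succ))  # (60, 3): N10 -> N20 -> N32 -> N40, then N60
```

The answer is correct as long as every successor pointer is correct, but the number of hops grows linearly with the number of nodes.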


Successor Lists Ensure Robust Lookup

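• Each node remembers r successors
• Lookup can skip over dead nodes

[Figure: ring with nodes N5, N10, N20, N32, N40, N60, N80, N99, N110; with r = 3, each node's successor list is:]
N5: 10, 20, 32
N10: 20, 32, 40
N20: 32, 40, 60
N32: 40, 60, 80
N40: 60, 80, 99
N60: 80, 99, 110
N80: 99, 110, 5
N99: 110, 5, 10
N110: 5, 10, 20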
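A sketch of successor lists: build the r = 3 lists shown above, then run the basic successor walk while skipping dead entries. The choice of failed nodes (N40 and N60) and the function names are assumptions for illustration.

```python
from __future__ import annotations


def in_half_open(x: int, a: int, b: int) -> bool:
    """x in the ring interval (a, b]; a == b covers the whole ring."""
    return a < x <= b if a < b else (x > a or x <= b)


def successor_lists(node_ids: list[int], r: int) -> dict[int, list[int]]:
    """Each node remembers its next r successors on the ring."""
    ring = sorted(node_ids)
    n = len(ring)
    return {node: [ring[(i + k) % n] for k in range(1, r + 1)]
            for i, node in enumerate(ring)}


def first_alive(candidates: list[int], alive: set[int]) -> int:
    for node in candidates:
        if node in alive:
            return node
    raise RuntimeError("all r successors are dead: the ring is broken at this node")


def robust_lookup(start: int, key_id: int, lists: dict[int, list[int]],
                  alive: set[int]) -> tuple[int, int]:
    """Successor walk that skips over dead nodes. Returns (owner, hops)."""
    node, hops = start, 0
    while True:
        nxt = first_alive(lists[node], alive)   # nearest successor still up
        if in_half_open(key_id, node, nxt):
            return nxt, hops
        node, hops = nxt, hops + 1


nodes = [5, 10, 20, 32, 40, 60, 80, 99, 110]
lists = successor_lists(nodes, r=3)
print(lists[5], lists[110])                 # [10, 20, 32] [5, 10, 20]
alive = set(nodes) - {40, 60}               # N40 and N60 have failed
print(robust_lookup(10, 50, lists, alive))  # (80, 2)
```

With N40 and N60 down, key 50's successor is now N80, and the walk still finds it by jumping from N32 directly to N80. Data that lived only on the failed nodes is another matter, which is why blocks are also replicated.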


“Finger Table” Accelerates Lookups

[Figure: node N80's fingers point ½, ¼, 1/8, 1/16, 1/32, 1/64 and 1/128 of the way around the ring.]
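The finger table can be computed directly from its definition: finger i of node n points at successor(n + 2^i mod 2^m). The sketch below does this for N80 on a 7-bit ring. The membership list is assumed to be the ring from the neighbouring slides (N5 to N110), and the function names are hypothetical.

```python
from __future__ import annotations

from bisect import bisect_left


def successor(key_id: int, node_ids: list[int]) -> int:
    ring = sorted(node_ids)
    i = bisect_left(ring, key_id)
    return ring[i] if i < len(ring) else ring[0]


def finger_targets(n: int, m: int) -> list[int]:
    """IDs n + 2**i (mod 2**m), i = 0..m-1: 1/2**m, ..., 1/4, 1/2 of the way round."""
    return [(n + 2 ** i) % 2 ** m for i in range(m)]


def finger_table(n: int, node_ids: list[int], m: int) -> list[int]:
    """finger[i] = successor(n + 2**i): the first node at or after each target."""
    return [successor(t, node_ids) for t in finger_targets(n, m)]


nodes = [5, 10, 20, 32, 40, 60, 80, 99, 110]  # assumed membership, 7-bit ring
print(finger_targets(80, 7))       # [81, 82, 84, 88, 96, 112, 16]
print(finger_table(80, nodes, 7))  # [99, 99, 99, 99, 99, 5, 20]
```

Each node stores only m = O(log N) fingers. As a cross-check, `finger_targets(36, 8)` gives [37, 38, 40, 44, 52, 68, 100, 164], the IDs the joining node N36 looks up on the join slide (164 = 36 + 128 implies an 8-bit ring there).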


Lookups take O(log N) hops

[Figure: ring with nodes N5, N10, N20, N32, N60, N80, N99, N110; Lookup(K19) is forwarded along finger pointers to K19's successor, N20.]
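A sketch of a finger-based lookup in the style of Chord's "closest preceding finger" rule. At each hop, forward to the farthest finger that does not pass the key, and stop once the key lies between the current node and its successor. Starting the query at N80 is an assumption, since the figure's starting node is not recoverable, and the code is iterative for simplicity.

```python
from __future__ import annotations

from bisect import bisect_left


def successor(key_id: int, node_ids: list[int]) -> int:
    ring = sorted(node_ids)
    i = bisect_left(ring, key_id)
    return ring[i] if i < len(ring) else ring[0]


def in_half_open(x: int, a: int, b: int) -> bool:
    """x in the ring interval (a, b]; a == b covers the whole ring."""
    return a < x <= b if a < b else (x > a or x <= b)


def in_open(x: int, a: int, b: int) -> bool:
    """x in the ring interval (a, b), exclusive at both ends."""
    return a < x < b if a < b else (x > a or x < b)


def finger_table(n: int, node_ids: list[int], m: int) -> list[int]:
    return [successor((n + 2 ** i) % 2 ** m, node_ids) for i in range(m)]


def closest_preceding_finger(n: int, key_id: int, fingers: list[int]) -> int:
    # Try the farthest finger first; take the first one that does not pass key_id.
    for f in reversed(fingers):
        if in_open(f, n, key_id):
            return f
    return n


def lookup(start: int, key_id: int, node_ids: list[int], m: int) -> tuple[int, int]:
    """Iterative finger-based lookup. Returns (successor(key_id), hops taken)."""
    ring = sorted(node_ids)
    succ = {x: ring[(i + 1) % len(ring)] for i, x in enumerate(ring)}
    fingers = {x: finger_table(x, ring, m) for x in ring}
    node, hops = start, 0
    while not in_half_open(key_id, node, succ[node]):
        node = closest_preceding_finger(node, key_id, fingers[node])
        hops += 1
    return succ[node], hops


nodes = [5, 10, 20, 32, 60, 80, 99, 110]
print(lookup(80, 19, nodes, m=7))  # (20, 2): N80 -> N5 -> N10, whose successor is N20
```

Because fingers[0] is always the node's own successor, every hop makes progress. Each hop roughly halves the remaining distance to the key, which is where the O(log N) bound comes from.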


Summary of Performance Characteristics

• Efficient: O(log N) messages per lookup
• Scalable: O(log N) state per node
• Robust: survives massive membership changes


Joining the Ring: a three-step process
1. Initialize all fingers of the new node
2. Update the fingers of existing nodes
3. Transfer keys from the successor to the new node

Two invariants to maintain to ensure correctness:
• Each node’s successor list is maintained
• successor(k) is responsible for monitoring k


Join: Initialize New Node’s Finger Table

• Locate any node p in the ring
• Ask node p to look up the fingers of the new node

[Figure: new node N36 joins a ring containing N5, N20, N40, N60, N80, N99. Step 1: Lookup(37, 38, 40, …, 100, 164).]


Join: Update Fingers of Existing Nodes

• The new node calls an update function on existing nodes
• Existing nodes recursively update the fingers of other nodes

[Figure: the same ring, now including N36.]


Join: Transfer Keys

[Figure: ring with N5, N20, N36, N40, N60, N80, N99. Keys 21..36 (e.g., K30) are copied from N40 to N36; the others (e.g., K38) stay at N40.]

• Only keys in the range are transferred
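A sketch of the key-transfer step for the example above. The new node takes over exactly the keys in (predecessor, new ID] from its successor. The figure says the keys are copied; this sketch also removes them from N40 afterwards, which is an assumption. The dict-of-sets store and function names are hypothetical.

```python
from __future__ import annotations

from bisect import bisect_left


def in_half_open(x: int, a: int, b: int) -> bool:
    """x in the ring interval (a, b]; a == b covers the whole ring."""
    return a < x <= b if a < b else (x > a or x <= b)


def neighbours(new_id: int, node_ids: list[int]) -> tuple[int, int]:
    """(predecessor, successor) that new_id will have once it joins."""
    ring = sorted(node_ids)
    i = bisect_left(ring, new_id)
    return ring[i - 1], ring[i % len(ring)]  # i - 1 == -1 wraps to the last node


def transfer_keys(new_id: int, store: dict[int, set[int]]) -> set[int]:
    """Move keys in (predecessor, new_id] from the successor to the new node."""
    pred, succ = neighbours(new_id, list(store))
    moved = {k for k in store[succ] if in_half_open(k, pred, new_id)}
    store[succ] -= moved   # the other keys stay at the successor
    store[new_id] = moved
    return moved


store = {5: set(), 20: set(), 40: {30, 38}, 60: set(), 80: set(), 99: set()}
print(transfer_keys(36, store))  # {30}
print(store[40], store[36])      # {38} {30}
```

Only the successor (N40) is involved, and only keys in 21..36 move, matching the figure.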


Handling Failures

Problem: failures could cause incorrect lookups.

Solution (fallback): keep track of the successor’s successor, i.e., keep a list of r successors.

[Figure: ring with nodes N10, N80, N85, N102, N113, N120 and a Lookup(90).]


Choosing Successor List Length

r: length of the successor list; N: number of nodes in the system.

Assume 50% of the nodes fail. Then P(successor list all dead for a specific node) = $(1/2)^r$, i.e., P(this node breaks the ring). This depends on the independent-failure assumption.

P(no broken nodes) = $(1 - (1/2)^r)^N$

Choosing $r = 2\log_2 N$ gives $(1/2)^r = 1/N^2$, so P(no broken nodes) = $(1 - 1/N^2)^N \approx 1 - 1/N$.
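The formula can be checked numerically. The sketch below evaluates P(no broken nodes) under the same independent-failure assumption, for N = 1000 and r = 2 log2 N. It also tries a 20-entry successor list, the size used in the failure experiment with 1,000 servers.

```python
import math


def p_no_broken_node(n: int, r: float, p_fail: float = 0.5) -> float:
    """P(no node loses its entire successor list), assuming independent failures."""
    p_node_breaks = p_fail ** r      # all r successors of one node are dead
    return (1 - p_node_breaks) ** n  # every one of the n nodes survives


n = 1000
r = 2 * math.log2(n)                 # about 19.93, so (1/2)**r is about 1/n**2
print(p_no_broken_node(n, r))        # about 0.999, i.e. about 1 - 1/n
print(p_no_broken_node(n, 20))       # a 20-entry list, as in the experiment
```

Since 2·log2(1000) ≈ 19.9, a 20-entry list is essentially the r = 2 log2 N rule applied to 1,000 nodes.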


DHT – Summary so far

Mechanism: nodes get an identity by hashing their IP address; keys are also hashed into the same space. A key with ID k (the value it hashes to) is assigned to the first node whose hashed ID is equal to or follows k in the circular space: successor(k).

Properties:
• Incremental scalability, good load balancing
• Efficient: O(log N) messages per lookup
• Scalable: O(log N) state per node
• Robust: survives massive membership changes


Some experimental results


Chord Lookup Cost Is O(log N)

[Plot: average messages per lookup (y-axis) vs. number of nodes (x-axis). Constant is 1/2, i.e., about ½·log N messages per lookup.]


Failure Experimental Setup
• Start 1,000 CFS/Chord servers (successor list has 20 entries)
• Wait until they stabilize
• Insert 1,000 key/value pairs (five replicas of each)
• Stop X% of the servers
• Immediately perform 1,000 lookups


DHash Replicates Blocks at r Successors

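[Figure: ring with nodes N5, N10, N20, N40, N50, N60, N68, N80, N99, N110; Block17 is stored at its successor and replicated at the following nodes.]

• Replicas are easy to find if the successor fails
• Hashed node IDs ensure independent failure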
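A sketch of replica placement at r successors for the figure's Block17. Placing copies on successor(17) and the next r - 1 nodes clockwise, with r = 3, is an assumption for illustration. The second call shows why replicas are easy to find: when N20 fails, the block's new successor already holds a copy.

```python
from __future__ import annotations

from bisect import bisect_left


def replica_nodes(block_id: int, node_ids: list[int], r: int) -> list[int]:
    """successor(block_id) followed by the next r - 1 nodes clockwise."""
    ring = sorted(node_ids)
    i = bisect_left(ring, block_id)  # may equal len(ring): then wrap to ring[0]
    return [ring[(i + k) % len(ring)] for k in range(min(r, len(ring)))]


nodes = [5, 10, 20, 40, 50, 60, 68, 80, 99, 110]
print(replica_nodes(17, nodes, r=3))                          # [20, 40, 50]
print(replica_nodes(17, [n for n in nodes if n != 20], r=3))  # [40, 50, 60]
```

Replicas land on the nodes a failed successor's lookups already fall back to, so no extra directory is needed to locate them.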


Massive Failures Have Little Impact

[Plot: failed lookups (percent, 0 to 1.4) vs. failed nodes (percent, 5 to 50).]

$(1/2)^6 \approx 1.6\%$


Applications


An Example Application: The CD Database

[Figure: compute disc fingerprint → recognize fingerprint? → album & track titles.]


An Example Application: The CD Database

[Figure: no such fingerprint → type in album and track titles → album & track titles.]


A DHT-Based FreeDB Cache
• FreeDB is a volunteer service: it has suffered outages as long as 48 hours, and service costs are borne largely by volunteer mirrors
• Idea: build a cache of FreeDB with a DHT, to add to the availability of the main service
• Goal: explore how easy this is to do


Cache Illustration

[Figure: new albums are added to the DHT; a disc fingerprint is looked up in the DHT and the disc info is returned.]
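One way to realize the cache idea is a read-through cache. Look up the disc fingerprint in the DHT first; on a miss, ask FreeDB and put the answer into the DHT for later readers. `DhtCache`, `get_disc_info`, and the fetch callback are hypothetical names. A dict stands in for the DHT and a fake function stands in for FreeDB; no real FreeDB API is used.

```python
from __future__ import annotations

from typing import Callable


class DhtCache:
    """Read-through cache: look in the DHT first, fall back to the origin service."""

    def __init__(self, fetch_from_origin: Callable[[str], dict | None]) -> None:
        self.dht: dict[str, dict] = {}          # stands in for DHT put()/get()
        self.fetch_from_origin = fetch_from_origin

    def get_disc_info(self, fingerprint: str) -> dict | None:
        info = self.dht.get(fingerprint)
        if info is None:
            info = self.fetch_from_origin(fingerprint)  # e.g. a query to FreeDB
            if info is not None:
                self.dht[fingerprint] = info            # later readers hit the DHT
        return info


calls: list[str] = []


def fake_freedb(fp: str) -> dict | None:
    calls.append(fp)
    return {"album": "Example", "tracks": ["One", "Two"]} if fp == "abc123" else None


cache = DhtCache(fake_freedb)
cache.get_disc_info("abc123")
cache.get_disc_info("abc123")
print(len(calls))  # 1: the second request is served from the DHT
```

Entries already in the DHT stay available even while the origin service is down, which is how such a cache adds to availability.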


Trackerless BitTorrent:

A client that wants to download the file:
• Contacts the tracker identified in the .torrent file (using HTTP)
• The tracker sends the client a (random) list of peers who have or are downloading the file
• The client contacts peers on the list to see which segments of the file they have
• The client requests segments from peers
• The client reports to other peers it knows about that it has the segment
• Other peers start to contact the client to get the segment (while the client is getting other segments)


Next

A distributed system is a collection of independent computers that appears to its users as a single coherent system.

Components need to communicate and cooperate => support needed:
• Naming: enables some resource sharing
• Synchronization