CS 443 Advanced OS Fabián E. Bustamante, Spring 2005 Porcupine: A Highly Available Cluster-based Mail Service Y. Saito, B. Bershad, H. Levy U. Washington SOSP 1999 Presented by: Fabián E. Bustamante

Dec 23, 2015

CS 443 Advanced OS

Fabián E. Bustamante, Spring 2005

Porcupine: A Highly Available Cluster-based Mail Service

Y. Saito, B. Bershad, H. Levy

U. Washington

SOSP 1999

Presented by: Fabián E. Bustamante


Porcupine – goals & requirements

Use commodity hardware to build a large, scalable mail service

Main goal: scalability in terms of
– Manageability: large but easy to manage
  • Self-configures with respect to load and data distribution
  • Self-heals with respect to failure and recovery
– Availability: survive failures gracefully
  • A failure may prevent some users from accessing email
– Performance: scale linearly with cluster size
  • Target: 100s of machines, ~billions of mail msgs/day


Key Techniques and Relationships

[Figure: three layers relating the framework to the techniques and the goals]
Framework: Functional homogeneity ("any node can perform any task")
Techniques: Automatic reconfiguration, Dynamic scheduling, Replication
Goals: Manageability, Performance, Availability


Why Email?

Mail is important
– Real demand
– Saito now works for Google

Mail is hard
– Write intensive
– Low locality

Mail is easy
– Well-defined API
– Large parallelism
– Weak consistency


Conventional Mail Solution

Static partitioning

Performance problems:
– No dynamic load balancing

Manageability problems:
– Manual data partition

Availability problems:
– Limited fault tolerance

[Figure: SMTP/IMAP/POP servers and NFS servers holding the mailboxes (Jeanine's mbox, Luca's mbox, Joe's mbox, Suzy's mbox)]


Porcupine Architecture

[Figure: nodes Node A, Node B, ..., Node Z connected by RPC. Components shown on a node:]
– SMTP server, POP server, IMAP server
– Load balancer
– User map
– Membership manager
– Replication manager
– Mail map, mailbox storage, user profile


Porcupine Operations

Stages: DNS-RR selection, protocol handling, user lookup, load balancing, message store

[Figure: a message arrives from the Internet at a cluster of nodes A, B, ..., C and is handled in six steps]
1. "send mail to luca"
2. Who manages luca? A
3. "Verify luca"
4. "OK, luca has msgs on C and D"
5. Pick the best nodes to store new msg: C
6. "Store msg"
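The six delivery steps can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not Porcupine's code: the function names, the CRC32 hash, the dictionary layout, and the "fewest pending I/O requests" rule used for step 5 are all hypothetical choices made for the example.

```python
import zlib

# Hypothetical 8-bucket user map: bucket index -> node that manages those users.
USER_MAP = ["B", "C", "A", "C", "A", "B", "A", "C"]


def manager_of(user):
    """Step 2: hash the user name into the user map to find the managing node."""
    return USER_MAP[zlib.crc32(user.encode("utf-8")) % len(USER_MAP)]


def deliver(user, msg, mail_maps, pending_io, mailboxes):
    """Steps 2-6 for one incoming message; returns the node that stored it."""
    manager = manager_of(user)                      # step 2: who manages the user?
    if user not in mail_maps[manager]:              # step 3: "Verify luca"
        raise KeyError(f"unknown user: {user}")
    holders = mail_maps[manager][user]              # step 4: nodes already holding msgs
    # Step 5 (assumed policy): the holder with the fewest pending I/O requests.
    target = min(sorted(holders), key=lambda n: pending_io[n])
    mailboxes[target].setdefault(user, []).append(msg)  # step 6: "Store msg"
    return target


nodes = ["A", "B", "C", "D"]
mail_maps = {n: {} for n in nodes}                  # per-node mail map (soft state)
mailboxes = {n: {} for n in nodes}                  # per-node mailbox storage
mail_maps[manager_of("luca")]["luca"] = {"C", "D"}  # luca has msgs on C and D
pending_io = {"A": 0, "B": 0, "C": 1, "D": 5}
stored_on = deliver("luca", "hello", mail_maps, pending_io, mailboxes)
```

With these made-up load figures the message lands on C, the less loaded of luca's two nodes, which mirrors the choice shown in step 5.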


Basic Data Structures

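[Figure: a user name ("luca") goes through a hash function that indexes the user map; the mail map/user info lists the nodes holding each user's messages; mailbox storage holds the messages on nodes A, B, C]

Apply hash function: "luca" → user map bucket
User map (three identical copies shown): B C A C A B A C
Mail map/user info: luca: {A,C}, ann: {B}, suzy: {A,C}, joe: {B}
Mailbox storage (nodes A, B, C): Luca's MSGs, Suzy's MSGs, Bob's MSGs, Joe's MSGs, Ann's MSGs, Suzy's MSGs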
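One way to picture how the three structures fit together is the read path: hash the user name into the user map, ask the managing node for the mail map entry, then collect messages from every node it lists. The sketch below assumes a `Node` class, a CRC32 hash, and a `fetch_mail` helper, none of which come from the slides.

```python
import zlib
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    mail_map: dict = field(default_factory=dict)  # user -> set of node names
    mailbox: dict = field(default_factory=dict)   # user -> list of messages


def manager_name(user, user_map):
    """Hash the user name to a user map bucket; the bucket names the manager."""
    return user_map[zlib.crc32(user.encode("utf-8")) % len(user_map)]


def fetch_mail(user, user_map, cluster):
    """Gather the user's messages from every node listed in the mail map."""
    manager = cluster[manager_name(user, user_map)]
    messages = []
    for holder in sorted(manager.mail_map.get(user, ())):
        messages.extend(cluster[holder].mailbox.get(user, []))
    return messages


user_map = list("BCACABAC")                        # the user map from the slide
cluster = {name: Node(name) for name in "ABC"}
cluster[manager_name("luca", user_map)].mail_map["luca"] = {"A", "C"}
cluster["A"].mailbox["luca"] = ["msg-1"]
cluster["C"].mailbox["luca"] = ["msg-2"]
inbox = fetch_mail("luca", user_map, cluster)      # messages from A, then C
```

The user map is small and the same on every node, so any node can do the first lookup locally; only the mail map and the mailboxes are partitioned.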


Porcupine Advantages

Advantages:
– Optimal resource utilization
– Automatic reconfiguration and task re-distribution upon node failure/recovery
– Fine-grain load balancing

Results:
– Better availability
– Better manageability
– Better performance


Performance

Goals:
– Scale performance linearly with cluster size

Strategy: avoid creating hot spots
– Partition data uniformly among nodes
– Fine-grain data partition


Measurement Environment

30-node cluster of not-quite-all-identical PCs
– 100Mb/s Ethernet + 1Gb/s hubs
– Linux 2.2.7
– 42,000 lines of C++ code

Synthetic load

Compare to sendmail+popd


How does Performance Scale?

[Figure: messages/second (0 to 800) vs. cluster size (0 to 30 nodes). Series: Porcupine (68m/day) and sendmail+popd (25m/day)]
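The per-day annotations and the messages/second axis describe the same quantity in different units. A quick conversion, using only the two figures on the plot and the number of seconds in a day, shows that 68m/day sits just under the 800 messages/second top of the axis:

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def per_day_to_per_second(msgs_per_day):
    """Convert a daily message count into an average messages/second rate."""
    return msgs_per_day / SECONDS_PER_DAY


porcupine_rate = per_day_to_per_second(68e6)   # about 787 messages/second
baseline_rate = per_day_to_per_second(25e6)    # about 289 messages/second
speedup = 68e6 / 25e6                          # 2.72x, from the annotations alone
```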


Availability

Goals:
– Maintain function after failures
– React quickly to changes regardless of cluster size
– Graceful performance degradation / improvement

Strategy: two complementary mechanisms
– Hard state (email messages, user profile): optimistic fine-grain replication
– Soft state (user map, mail map): reconstruction after membership change


Soft-state Reconstruction

1. Membership protocol, user map recomputation
2. Distributed disk scan

[Figure: timeline for nodes A, B, C showing each node's user map and mail map entries as reconstruction proceeds]
User map states shown: B C A B A B A C; B A A B A B A B; A C A C A C A C
Mail map entries shown: luca: {A,C}, joe: {C}, ann: {B}, suzy: {A,B} (the suzy and ann entries appear empty at one point and are filled in again later)
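The two reconstruction steps can be sketched as follows. This is an assumption-laden illustration, not the algorithm from the paper: `recompute_user_map` reassigns only the buckets whose manager has left, giving each to the surviving node that currently manages the fewest buckets, and `disk_scan` has a node report the users it stores mail for to their (possibly new) managers. Both function names and the `bucket_of` parameter are hypothetical.

```python
def recompute_user_map(old_map, live_nodes):
    """Step 1 (sketch): keep live managers, reassign buckets of departed nodes."""
    live = sorted(live_nodes)
    counts = {node: 0 for node in live}
    for owner in old_map:
        if owner in counts:
            counts[owner] += 1
    new_map = []
    for owner in old_map:
        if owner not in counts:
            # Departed manager: hand the bucket to the least-loaded survivor.
            owner = min(live, key=lambda node: counts[node])
            counts[owner] += 1
        new_map.append(owner)
    return new_map


def disk_scan(node_name, stored_users, user_map, bucket_of):
    """Step 2 (sketch): group this node's stored users by their manager."""
    reports = {}
    for user in stored_users:
        manager = user_map[bucket_of(user)]
        reports.setdefault(manager, {}).setdefault(user, set()).add(node_name)
    return reports


old_map = list("BCABABAC")
new_map = recompute_user_map(old_map, {"A", "B"})   # node C has left
# Node B scans its disk; bucket numbers here are made up for the example.
buckets = {"ann": 0, "suzy": 1}
reports = disk_scan("B", ["ann", "suzy"], new_map, buckets.__getitem__)
```

For the first user map on the slide, with only A and B alive, this particular reassignment rule happens to produce B A A B A B A B, the second map shown. Each manager would then merge the reports it receives from all nodes to rebuild its mail map entries.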


Reaction to Configuration Changes

[Figure: messages/second (300 to 700) vs. time (0 to 800 seconds). Series: no failure, one node failure, three node failures, six node failures. Marked events, in order: nodes fail, new membership determined, nodes recover, new membership determined]


Hard-state Replication

Goals:
– Keep serving hard state after failures
– Handle unusual failure modes

Strategy: exploit Internet semantics
– Optimistic, eventually consistent replication
– Per-message, per-user-profile replication
– Efficient during normal operation
– Small window of inconsistency
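A minimal sketch of optimistic, eventually consistent replication at per-object granularity follows. It assumes timestamped updates where the newest timestamp wins and a coordinator that keeps retrying until every replica has acknowledged; the `Replica`, `Update`, and `push` names are hypothetical, and the real protocol is not described on the slide.

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    name: str
    up: bool = True
    store: dict = field(default_factory=dict)  # object id -> (timestamp, value)

    def apply(self, obj_id, timestamp, value):
        """Apply an update unless a newer one is stored; return True as the ack."""
        if not self.up:
            return False
        current = self.store.get(obj_id)
        if current is None or timestamp > current[0]:
            self.store[obj_id] = (timestamp, value)
        return True


@dataclass
class Update:
    obj_id: str
    timestamp: int
    value: str
    pending: set  # names of replicas that have not acknowledged yet


def push(update, replicas):
    """One propagation round; True once every replica has acknowledged."""
    for name in sorted(update.pending):
        if replicas[name].apply(update.obj_id, update.timestamp, update.value):
            update.pending.discard(name)
    return not update.pending


replicas = {"A": Replica("A"), "C": Replica("C", up=False)}
update = Update("luca/msg-1", timestamp=1, value="hello", pending={"A", "C"})
first_round = push(update, replicas)    # C is down, so the update stays pending
replicas["C"].up = True
second_round = push(update, replicas)   # C catches up; replicas now agree
```

Between the two rounds A and C disagree, which is the "small window of inconsistency" the slide refers to; nothing blocks while C is unreachable.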


Replication Efficiency

[Figure: messages/second (0 to 800) vs. cluster size (0 to 30 nodes). Series: Porcupine no replication (68m/day) and Porcupine with replication=2 (24m/day)]


Replication Efficiency

[Figure: messages/second (0 to 800) vs. cluster size (0 to 30 nodes). Series: Porcupine no replication (68m/day), Porcupine with replication=2 (24m/day), and Porcupine with replication=2, NVRAM (33m/day)]

The NVRAM case is "pretend" NVRAM: disk flushing is removed from the disk logging routines.


Load balancing: Storing messages

Goals:
– Handle skewed workload well
– Support hardware heterogeneity
– No voodoo parameter tuning

Strategy: spread-based load balancing
– Spread: soft limit on # of nodes per mailbox
  • Large spread: better load balance
  • Small spread: better affinity
– Load balanced within spread
– Use # of pending I/O requests as the load measure
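Spread-based selection can be sketched as: derive a candidate set of `spread` nodes for the user, then store on the candidate with the fewest pending I/O requests. How the candidate set is derived is an assumption here (a hash-based window over the sorted node list), and the soft nature of the limit is not modeled; `candidates` and `choose_node` are hypothetical names.

```python
import zlib


def candidates(user, nodes, spread):
    """Assumed rule: a window of `spread` nodes starting at the user's hash."""
    ordered = sorted(nodes)
    start = zlib.crc32(user.encode("utf-8")) % len(ordered)
    width = min(spread, len(ordered))
    return [ordered[(start + i) % len(ordered)] for i in range(width)]


def choose_node(user, nodes, spread, pending_io):
    """Balance load within the spread using pending I/O requests as the measure."""
    return min(candidates(user, nodes, spread), key=lambda node: pending_io[node])


nodes = list("ABCDEF")
pending_io = {node: 10 for node in nodes}
window = candidates("luca", nodes, spread=4)
pending_io[window[2]] = 0                       # make one candidate idle
chosen = choose_node("luca", nodes, 4, pending_io)
pinned = choose_node("luca", nodes, 1, pending_io)
```

With spread=4 the idle candidate is chosen; with spread=1 the user is pinned to a single node regardless of load, which is the affinity end of the trade-off listed above.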


Support of Heterogeneous Clusters

[Figure: throughput increase (0% to 30%) vs. number of fast nodes (0%, 3%, 7%, 10% of total). Series: Spread=4, annotated +16.8m/day (+25%), and Static, annotated +0.5m/day (+0.8%)]

Node heterogeneity: at 0% all nodes run at about the same speed; 3%, 7%, and 10% are the percentage of nodes with very fast disks.

The plot shows relative performance improvement.


Conclusions

Fast, available, and manageable clusters can be built for write-intensive services

Key ideas can be extended beyond mail
– Functional homogeneity
– Automatic reconfiguration
– Replication
– Load balancing

Ongoing work
– More efficient membership protocol
– Extending Porcupine beyond mail: Usenet, Calendar, etc.
– More generic replication mechanism