Scalable Self-Repairing Publish/Subscribe. Robbert van Renesse, Ken Birman, Werner Vogels. Cornell University.

Scalable Self-Repairing Publish/Subscribe

Jan 19, 2016

Transcript
Page 1: Scalable Self-Repairing Publish/Subscribe

Scalable Self-Repairing Publish/Subscribe

Robbert van Renesse

Ken Birman

Werner Vogels

Cornell University

Page 2: Scalable Self-Repairing Publish/Subscribe

Background

• ISIS, Horus, Ensemble systems

– Strong properties (for replicated data)

– Adaptive (changing network/app behavior)

• Problems…

– as fast as slowest receiver

– “Jim Gray effect”

– no IP Multicast

Page 3: Scalable Self-Repairing Publish/Subscribe

New Direction

• Probabilistically Strong Guarantees

– Randomized protocols

• Compartmentalization

• No reliance on IP multicast, clock sync

• Auto-configuration, self-repair

JBI

Page 4: Scalable Self-Repairing Publish/Subscribe

Three Main Components

• Astrolabe

– Aggregation Service

• SelectCast

– Dissemination Service

• Bimodal Multicast

– End-to-end reliability

Page 5: Scalable Self-Repairing Publish/Subscribe

Aggregation

• Ability to summarize information from distributed sources.

• aka data fusion in sensor networks.

• The basis for scalability!

• Standard service in databases.

• Why not in distributed systems?

Page 6: Scalable Self-Repairing Publish/Subscribe

Examples

• Barrier Synchronization

• Voting

• Resource Location

• Multicast Routing


Page 7: Scalable Self-Repairing Publish/Subscribe

Astrolabe

• Astrolabe takes continuous snapshots of the global state of a distributed system, and aggregates this information in user-specified ways.

Page 8: Scalable Self-Repairing Publish/Subscribe

Four Design Principles

• Scalability through Hierarchy

• Flexibility through Mobile SQL

• Robustness through p2p Gossip

• Security through Certificates

Page 9: Scalable Self-Repairing Publish/Subscribe

DNS-like Domain Hierarchy

Figure label: Attribute list

Domains identified by path names

Page 10: Scalable Self-Repairing Publish/Subscribe

MIB

• Each domain has an attribute list called a “MIB” (management information base).

• MIBs of internal domains generated by aggregating child domains’ MIBs.

Page 11: Scalable Self-Repairing Publish/Subscribe

Domain Table

• No servers for any domain: a MIB is replicated on all hosts in its domain!

• Each host maintains not only the MIBs of its own domains, but also those of its sibling domains.

• Sibling MIBs organized in “domain tables”.
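
As a rough illustration of which domain tables a single host ends up holding, the sketch below walks a host's path name and lists every ancestor domain. Under this reading, the host replicates one table per ancestor, with one row per child of that ancestor (its own domain plus the siblings). The path `/usa/cornell/pc3` and the function name are hypothetical, not taken from the slides.

```python
def replicated_tables(host_path):
    """Return the domains whose domain table (one row per child) a host stores.

    Assumption: a host is a leaf, and it keeps one table for each proper
    ancestor on its path, starting at the root "/".
    """
    parts = [p for p in host_path.split("/") if p]
    tables = ["/"]
    for i in range(1, len(parts)):
        tables.append("/" + "/".join(parts[:i]))
    return tables


# Hypothetical host three levels below the root.
tables = replicated_tables("/usa/cornell/pc3")
```

The number of tables equals the depth of the host in the hierarchy, which is why the per-host state stays small as the system grows.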

Page 12: Scalable Self-Repairing Publish/Subscribe

Domain Table Example

ID    CONTACTS            ISSUED  NMEMBERS  MIN(LOAD)
dom1  10.0.0.1, 10.0.0.2  T1      5         0.31
dom2  10.0.1.1            T2      10        0.13
dom3  10.0.2.3            T3      8         1.5
dom4  10.1.2.5, 10.3.2.1  T4      18        0.0

Page 13: Scalable Self-Repairing Publish/Subscribe

Aggregation

Domain1

id        Load  Weblogic?  SMTP?  Word Version
swift     2.0   0          1      6.2
falcon    1.5   1          0      4.1
cardinal  4.5   1          0      6.0

Domain2

id        Load  Weblogic?  SMTP?  Word Version
gazelle   1.7   0          0      4.5
zebra     3.2   0          1      6.2
gnu       .5    1          0      6.2

id       Min Load  WL contact   SMTP contact
domain1  1.5       123.45.61.3  123.45.61.17
domain2  1.7       127.16.77.6  127.16.77.11
domain3  3.1       14.66.71.8   14.66.71.12

SQL query “summarizes” data

Dynamically changing query output is visible domain-wide (like spreadsheet)

Page 14: Scalable Self-Repairing Publish/Subscribe

Example queries

– SELECT SUM(nmembers) AS nmembers

– SELECT MAX(depth) + 1 AS depth

– SELECT MIN(minl) AS minl (minimum load)

– …

• Functions gossiped with everything else.
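
The three example queries can be mimicked in a few lines of Python. This is only a sketch of what the aggregation computes over the rows of one domain table, not Astrolabe's SQL engine; the sample rows and the `aggregate` helper are made up for illustration.

```python
def aggregate(child_mibs):
    """Compute a parent MIB from the MIBs of its child domains.

    Mirrors the example queries:
      SELECT SUM(nmembers) AS nmembers
      SELECT MAX(depth) + 1 AS depth
      SELECT MIN(minl) AS minl
    """
    return {
        "nmembers": sum(m["nmembers"] for m in child_mibs),
        "depth": max(m["depth"] for m in child_mibs) + 1,
        "minl": min(m["minl"] for m in child_mibs),
    }


# Hypothetical child rows (leaf domains, so depth is 0).
children = [
    {"nmembers": 2, "depth": 0, "minl": 0.31},
    {"nmembers": 1, "depth": 0, "minl": 0.13},
    {"nmembers": 2, "depth": 0, "minl": 1.5},
    {"nmembers": 3, "depth": 0, "minl": 0.0},
]
parent = aggregate(children)
```

Because each output attribute has the same name as its input attribute, the same function can be applied again one level up, which is what lets a single query summarize the whole hierarchy.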

Page 15: Scalable Self-Repairing Publish/Subscribe

Aggregation

Domain1

Name      Load  Weblogic?  SMTP?  Word Version
swift     2.0   0          1      6.2
falcon    1.5   1          0      4.1
cardinal  4.5   1          0      6.0

Domain2

Name      Load  Weblogic?  SMTP?  Word Version
gazelle   1.7   0          0      4.5
zebra     3.2   0          1      6.2
gnu       .5    1          0      6.2

Name   Avg Load  WL contact   SMTP contact
SF     2.6       123.45.61.3  123.45.61.17
NJ     1.8       127.16.77.6  127.16.77.11
Paris  3.1       14.66.71.8   14.66.71.12

Page 16: Scalable Self-Repairing Publish/Subscribe

Aggregation


O(log n) info per host
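
A back-of-the-envelope count makes the O(log n) claim concrete. The sketch assumes a perfectly balanced hierarchy with a fixed branching factor, which the slides do not state; both the function and the sample sizes are illustrative.

```python
def rows_per_host(n_hosts, branching):
    """Rows a host stores, assuming a balanced tree with a fixed branching factor.

    The host keeps one domain table per level, and each table has
    `branching` rows, so the total is levels * branching.
    """
    levels = 0
    capacity = 1
    while capacity < n_hosts:
        capacity *= branching
        levels += 1
    return levels * branching


small = rows_per_host(1000, 10)        # 3 levels of 10 rows each
large = rows_per_host(1_000_000, 64)   # 4 levels of 64 rows each
```

Growing the system a thousandfold adds only a level or so of tables per host under these assumptions.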

Page 17: Scalable Self-Repairing Publish/Subscribe

Other Examples

1. Which are the three lowest loaded hosts?

2. Which domains contain hosts with an out-of-date virus database?

3. Do >30% of hosts measure elevated radiation?

4. Which domains contain subscribers interested in some topic?

5. Where is the nearest logging server?
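
Question 1 is a good example of an aggregate that composes up the hierarchy: each domain reports only its own three lowest-loaded hosts, and the parent merges those short lists. The sketch below uses the host loads from the earlier Aggregation slides; the `lowest_k` helper is hypothetical and stands in for whatever aggregation function a deployment would install.

```python
import heapq


def lowest_k(rows, k=3):
    """Return the k smallest (load, host) pairs, lowest load first."""
    return heapq.nsmallest(k, rows)


# Per-domain summaries computed from the leaf tables.
domain1 = lowest_k([(2.0, "swift"), (1.5, "falcon"), (4.5, "cardinal")])
domain2 = lowest_k([(1.7, "gazelle"), (3.2, "zebra"), (0.5, "gnu")])

# The parent only sees the summaries, never the full host tables.
overall = lowest_k(domain1 + domain2)
```

Merging per-domain top-3 lists gives the same answer as scanning every host, while each level only handles a bounded number of rows.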

Page 18: Scalable Self-Repairing Publish/Subscribe

Epidemic or Gossip Protocols

• Used to keep domain tables up-to-date

• Randomized Communication between (nearby) hosts:

– Fast (latency grows O(log n))

– Hard to stop (robust even in the face of Denial-of-Service attacks)

– Probabilistically Real-Time guarantees on latency (based on epidemiological analysis).
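
The logarithmic latency can be observed in a toy simulation. The sketch below is a plain push epidemic in which every informed host contacts one uniformly random host per round; it ignores the "nearby" bias mentioned above and is not the Astrolabe protocol itself. The function name, seed, and host count are arbitrary choices.

```python
import random


def rounds_to_spread(n, seed=0):
    """Count gossip rounds until an update that starts at host 0 reaches all n hosts."""
    rng = random.Random(seed)
    informed = {0}
    rounds = 0
    while len(informed) < n:
        # Each host informed at the start of the round pushes to one random host.
        for _ in list(informed):
            informed.add(rng.randrange(n))
        rounds += 1
    return rounds


rounds_1024 = rounds_to_spread(1024)
```

Since the informed set can at most double per round, 1024 hosts need at least 10 rounds; in practice the count lands not far above that, which is the O(log n) behaviour the slide refers to.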

Page 19: Scalable Self-Repairing Publish/Subscribe

How it works…

ID    CONTACTS            ISSUED  NMEMBERS  MIN(LOAD)
dom1  10.1.0.1, 10.2.0.1  T1      5         0.23
dom2  10.3.0.1            T3      1         0.3
dom3  10.4.0.1            T4      8         0.0

ID    CONTACTS            ISSUED  NMEMBERS  MIN(LOAD)
domA  10.0.0.1, 10.0.0.2  T5      2         0.31
domB  10.0.1.1            T6      1         0.13
domC  10.0.2.3            T7      2         1.5
domD  10.1.2.5, 10.3.2.1  T8      3         0.0

Figure labels: gossip, SQL
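
One way to picture the gossip step is as a row-by-row merge of two hosts' copies of the same domain table, keeping whichever row was issued more recently. The sketch assumes the ISSUED column is a comparable timestamp (plain integers here) and that "newest wins" is the merge rule; the slides show the column but do not spell out the rule, and the sample rows are invented.

```python
def merge_tables(mine, theirs):
    """Merge two copies of a domain table, keeping the most recently issued row per domain."""
    merged = dict(mine)
    for dom, row in theirs.items():
        if dom not in merged or row["issued"] > merged[dom]["issued"]:
            merged[dom] = row
    return merged


# Hypothetical table copies held by two gossiping hosts.
mine = {
    "dom1": {"issued": 1, "nmembers": 5},
    "dom2": {"issued": 3, "nmembers": 1},
}
theirs = {
    "dom2": {"issued": 4, "nmembers": 2},
    "dom3": {"issued": 2, "nmembers": 8},
}
merged = merge_tables(mine, theirs)
```

After both sides apply the merge they hold identical tables, so repeated random exchanges drive all replicas toward the same state.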

Page 20: Scalable Self-Repairing Publish/Subscribe

SelectCast

• Disseminate messages through Astrolabe hierarchy

• (Application-level) Routers selected through domain aggregation:

SELECT
    FIRST(3, routers) AS routers,
    MIN(minload) AS minload
ORDER BY minload

Exploit heterogeneity, don’t hide it!
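
The router-selection query can be read as "sort the child rows by load, then take the first three router addresses". The sketch below implements that reading; the exact semantics of FIRST and ORDER BY in Astrolabe's SQL dialect are an assumption here, and the sample rows are invented.

```python
def select_routers(children, k=3):
    """Pick up to k routers for a domain, preferring the least-loaded child domains.

    Assumed reading of:
      SELECT FIRST(3, routers) AS routers, MIN(minload) AS minload
      ORDER BY minload
    """
    ordered = sorted(children, key=lambda c: c["minload"])
    routers = [r for c in ordered for r in c["routers"]][:k]
    return {"routers": routers, "minload": ordered[0]["minload"]}


# Hypothetical child rows, each already carrying its own router list.
children = [
    {"routers": ["10.0.1.1"], "minload": 0.13},
    {"routers": ["10.0.0.1", "10.0.0.2"], "minload": 0.31},
    {"routers": ["10.1.2.5", "10.3.2.1"], "minload": 0.0},
]
row = select_routers(children)
```

Because the output row has the same shape as its inputs, the selection repeats at every level, so lightly loaded hosts end up forwarding for larger domains.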

Page 21: Scalable Self-Repairing Publish/Subscribe

Multicast Tree

Page 22: Scalable Self-Repairing Publish/Subscribe

Fault Masking

Page 23: Scalable Self-Repairing Publish/Subscribe

Filtering (Pub/Sub)

• SQL condition on each message

• For example:

– MIN(version) < 3

– MAX(radiation) > 300

– OR(subject) // BLOOM FILTERS

– TRUE

• Generalization of topic based publishing
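
The OR(subject) condition works because Bloom filters combine with a bitwise OR: a domain's filter is the OR of its children's filters, and a message is forwarded into a domain only if its subject might be present. The filter size, hash count, and helper names below are illustrative choices, not parameters from the slides.

```python
import hashlib
from functools import reduce
from operator import or_

BITS = 64     # filter width (assumed)
HASHES = 3    # hash functions per subject (assumed)


def bloom(subjects):
    """Build a Bloom filter, stored as an int bitmask, for a set of subjects."""
    bits = 0
    for s in subjects:
        for i in range(HASHES):
            digest = hashlib.sha256(f"{i}:{s}".encode()).digest()
            bits |= 1 << (int.from_bytes(digest[:4], "big") % BITS)
    return bits


def or_aggregate(filters):
    """Aggregate child filters into the parent's filter, as OR(subject) would."""
    return reduce(or_, filters, 0)


def might_contain(bits, subject):
    """True if the subject may be subscribed (false positives are possible)."""
    probe = bloom([subject])
    return (bits & probe) == probe


domain = or_aggregate([bloom(["sports", "weather"]), bloom(["finance"])])
forward = might_contain(domain, "finance")
```

A subscribed subject is never filtered out; the price is an occasional false positive, where a message enters a domain that has no interested subscriber.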

Page 24: Scalable Self-Repairing Publish/Subscribe

Filtering Example

Page 25: Scalable Self-Repairing Publish/Subscribe

Scalability

• Latency, memory use, CPU load, and load on network links all grow O(log N) and are independent of update rate.

• Highly robust to omission and crash failures.

• Confirmed by analysis, simulation, and experiment.

• O(1) lookup for most useful queries.

Page 26: Scalable Self-Repairing Publish/Subscribe

Emulab topology (U. Utah)

Page 27: Scalable Self-Repairing Publish/Subscribe

Experiments

Page 28: Scalable Self-Repairing Publish/Subscribe

Real vs. Simulation

Figure panels: “The real thing” and “Simulation”

Page 29: Scalable Self-Repairing Publish/Subscribe

Membership

• Domain failure detected when its attributes are no longer being updated.

• Domains discovered (and partitions repaired) through

– gossip

– occasional broadcast and multicast

– configuration

• Special precautions for domains separated by firewalls and NAT boxes
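
Failure detection by staleness can be sketched as a timeout on each row's issue time: a domain whose row has not been refreshed recently is dropped from the table. The integer clock, the timeout value, and the function name are assumptions made for the example.

```python
def live_domains(table, now, timeout):
    """Keep only the rows that were refreshed within the last `timeout` time units."""
    return {dom: row for dom, row in table.items() if now - row["issued"] <= timeout}


# Hypothetical table: dom2 has not been updated for a long time.
table = {"dom1": {"issued": 100}, "dom2": {"issued": 40}}
alive = live_domains(table, now=110, timeout=30)
```

No explicit "leave" message is needed under this scheme; a crashed or partitioned domain simply ages out, and it reappears once fresh rows arrive again.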

Page 30: Scalable Self-Repairing Publish/Subscribe

Security

• Integrated PKI

– integrity, no confidentiality

– prevents “Sybil” Attacks

• Remove outliers

– Summarize in a robust way

• Compartmentalize

– Exploit domain hierarchy
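
To see why a robust summary matters, compare a mean with a median when one host reports a wildly wrong value. The median is used here only as one familiar robust statistic; the slides do not say which one Astrolabe uses, and the load values are invented.

```python
import statistics

# Four plausible load readings plus one bogus value from a faulty or malicious host.
loads = [2.0, 1.5, 4.5, 1.7, 1_000_000.0]

mean_load = statistics.mean(loads)      # dragged far off by the outlier
median_load = statistics.median(loads)  # unaffected by a single outlier
```

A single bad reporter can move the mean arbitrarily far, while the median stays at a value that an honest host actually reported.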

Page 31: Scalable Self-Repairing Publish/Subscribe

Bimodal Multicast

• Probabilistic end-to-end reliability

• Uses IP Multicast or SelectCast for initial dissemination

• Runs a background gossip protocol to do repairs of message loss

• Performance improves with scale

– share buffering load
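
The two phases can be imitated with a small simulation: an unreliable first pass that drops some messages at each receiver, followed by gossip rounds in which hosts copy whatever a random peer has and they lack. This is a simplified pull-style repair under assumed parameters (host count, loss rate, seed), not the actual Bimodal Multicast protocol with its digests and retransmission requests.

```python
import random


def bimodal_multicast(n_hosts=50, n_msgs=20, loss=0.3, seed=1, max_rounds=1000):
    """Simulate lossy dissemination followed by gossip repair.

    Returns the per-host sets of received messages and the number of repair rounds.
    """
    rng = random.Random(seed)
    everything = set(range(n_msgs))
    # Phase 1: unreliable multicast; host 0 is the sender and holds every message.
    received = [everything.copy()] + [
        {m for m in everything if rng.random() >= loss} for _ in range(n_hosts - 1)
    ]
    # Phase 2: each round, every host pulls missing messages from one random peer.
    rounds = 0
    while rounds < max_rounds and any(r != everything for r in received):
        for host in range(n_hosts):
            peer = rng.randrange(n_hosts)
            received[host] |= received[peer]
        rounds += 1
    return received, rounds


received, rounds = bimodal_multicast()
```

Every host that already holds a message can serve it to others, so the repair work is spread across the group rather than falling on the sender.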

Page 32: Scalable Self-Repairing Publish/Subscribe

Work in Progress

• Evaluate Scalability and Performance

– emulation, simulation, deployment

• Improve support for low power apps

– self configuration

• Improve expressiveness

– pattern matching