Ensemble and Beyond Presentation to David Tennenhouse, DARPA ITO Ken Birman Dept. of Computer Science Cornell University


Jan 03, 2016


Garey Paul
Page 1

Ensemble and Beyond

Presentation to David Tennenhouse, DARPA ITO

Ken Birman

Dept. of Computer Science

Cornell University

Page 2

Quick Timeline

• Cornell has developed 3 generations of reliable group communication technology
  – Isis Toolkit: 1987-1990
  – Horus System: 1990-1994
  – Ensemble System: 1994-1999

• Today we are starting a major shift in emphasis
  – Spinglass Project: 1999-

Page 3

Questions to consider

• Have these projects been successful?

• What is the future of Ensemble if we move to a new and different focus?

• What is the nature of the new opportunity we now perceive?

Page 4

Timeline

Isis Horus Ensemble

• Introduced reliability into group computing
• Virtual synchrony execution model
• Fairly elaborate, monolithic, but adequate speed
• Many transition successes
  – New York, Swiss Stock Exchanges
  – French Air Traffic Control console system
  – Southwestern Bell Telephone network management
  – Hiper-D (next generation AEGIS)

Page 5

Virtual Synchrony Model

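[Figure: timeline for processes p, q, r, s, t. Initial view G0={p,q}. r and s request to join; r, s added with state transfer, giving G1={p,q,r,s}. p fails (crash), giving G2={q,r,s}. t requests to join; t added with state transfer, giving G3={q,r,s,t}.]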
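The view sequence in the figure can be replayed mechanically. The following Python sketch is a hypothetical illustration, not Isis, Horus, or Ensemble code: `View` and `install_views` are names invented here. It only computes the ordered list of membership views that, in the virtual synchrony model, every member would see in the same order; it does not model message delivery or the state transfer to joiners.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class View:
    view_id: int          # G0, G1, ... in the slide's notation
    members: frozenset    # processes in this view


def install_views(initial, events):
    """Replay membership events and return the resulting sequence of views.

    Each event is ("join", {names}) or ("fail", {names}); every event yields
    exactly one new view. State transfer to joiners is not modeled here.
    """
    views = [View(0, frozenset(initial))]
    for kind, who in events:
        current = views[-1].members
        if kind == "join":
            nxt = current | frozenset(who)
        elif kind == "fail":
            nxt = current - frozenset(who)
        else:
            raise ValueError(f"unknown event kind: {kind!r}")
        views.append(View(views[-1].view_id + 1, nxt))
    return views


# The scenario from the figure: r and s join, p crashes, then t joins.
history = install_views({"p", "q"},
                        [("join", {"r", "s"}), ("fail", {"p"}), ("join", {"t"})])
for view in history:
    print(f"G{view.view_id} = {sorted(view.members)}")
```

Running it prints G0 through G3 with the same memberships as the figure.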

Page 6

Why a “model”?

• Models can be reduced to theory – we can prove the properties of the model, and can decide if a protocol achieves it

• Enables rigorous application-level reasoning

• Otherwise, the application must guess at possible misbehaviors and somehow overcome them

Page 7

French ATC system (simplified)

[Figure: components of the simplified system: Controllers; Air Traffic Database (flight plans, etc.); X.500 Directory; Radar; Onboard]

Page 8

A center contains...

• Perhaps 50 “teams” of 3-5 controllers each

• Each team supported by workstation cluster

• Cluster-style database server has flight plan information

• Radar server distributes real-time updates

• Connections to other control centers (40 or so in all of Europe, for example)

Page 9

Process groups arise here:

• Cluster of servers running critical database server programs

• Cluster of controller workstations supports ATC by teams of controllers

• Radar must send updates to the relevant group of control consoles

• Flight plan updates must be distributed to the “downstream” control centers

Page 10

Use of our model?

• French government knows requirements for safety in ATC application

• With our model, we can reduce their needs to a formal set of statements

• This lets us establish that our solution will really be safe in their setting

• Contrast with usual ad-hoc methodologies...

Page 11

Timeline

Isis Horus Ensemble

• Simpler, faster group communication system
• Uses a modular layered architecture; layers are “compiled” and headers compressed for speed
• Supports dynamic adaptation and real-time apps
• Partitionable version of virtual synchrony
• Transitioned primarily through Stratus Computer
  – Phoenix system
  – Basis of Stratus fault-tolerance proposal to OMG

Page 12

Layered Microprotocols in Horus

Interface to Horus is extremely flexible

Horus manages group abstraction

Group semantics (membership, actions, events) defined by stack of modules

[Figure: example stack of modules: encrypt, vsync, filter, sign, ftol]

Ensemble stacks plug-and-play modules to give design flexibility to developer
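A minimal Python sketch of the stacking idea, under stated assumptions: the `HeaderLayer`, `ToyEncrypt`, and `Stack` classes are hypothetical and are not the Horus or Ensemble interface. Each layer pushes its own header on the way down and pops it on the way up, and `ToyEncrypt` XORs with a fixed byte only as a placeholder for a real encryption layer (it provides no security). Because each layer expects exactly the header its peer pushed, sender and receiver must run the same stack.

```python
from dataclasses import dataclass, field


@dataclass
class Message:
    payload: bytes
    headers: list = field(default_factory=list)  # innermost first, outermost last


class HeaderLayer:
    """Hypothetical microprotocol: adds its header on send, strips it on receive."""

    def __init__(self, name):
        self.name = name

    def down(self, msg):
        msg.headers.append(self.name)
        return msg

    def up(self, msg):
        top = msg.headers.pop()
        if top != self.name:  # the peer must run the same layer at the same depth
            raise ValueError(f"layer {self.name!r} received header {top!r}")
        return msg


class ToyEncrypt(HeaderLayer):
    """Stand-in for an 'encrypt' layer: XOR with a fixed byte. NOT real crypto."""

    def __init__(self, key=0x5A):
        super().__init__("encrypt")
        self.key = key

    def _xor(self, data):
        return bytes(b ^ self.key for b in data)

    def down(self, msg):
        msg.payload = self._xor(msg.payload)
        return super().down(msg)

    def up(self, msg):
        msg = super().up(msg)
        msg.payload = self._xor(msg.payload)
        return msg


class Stack:
    """Layers are listed top-first: sends go top-down, receives go bottom-up."""

    def __init__(self, layers):
        self.layers = layers

    def send(self, msg):
        for layer in self.layers:
            msg = layer.down(msg)
        return msg

    def receive(self, msg):
        for layer in reversed(self.layers):
            msg = layer.up(msg)
        return msg


def make_stack():
    return Stack([ToyEncrypt(), HeaderLayer("vsync"), HeaderLayer("ftol")])


wire = make_stack().send(Message(b"hello"))
print(wire.headers)                        # ['encrypt', 'vsync', 'ftol']
print(make_stack().receive(wire).payload)  # b'hello'
```

Adding, removing, or reordering a layer changes the group's behavior without touching the other layers; a receiver whose layers are in a different order raises `ValueError`.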

Page 13

Processes Communicate Through Identical Multicast Protocol Stacks

[Figure: three processes, each running an identical encrypt / vsync / ftol stack]

Page 14

Superimposed Groups in Application With Multiple Subsystems

[Figure: six processes, each running an encrypt / vsync / ftol stack, joined in two superimposed groups]

Magenta group for video communication

Orange group for control and coordination
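To make the idea of superimposed groups concrete, here is a small hypothetical Python sketch: `GroupRegistry` and `Process` are invented names, the six processes and their split between the magenta and orange groups are assumptions (the slide does not give the membership), and protocol stacks are omitted. A process in both groups receives traffic from each, while a multicast in one group never reaches non-members.

```python
class Process:
    def __init__(self, name):
        self.name = name
        self.delivered = []  # (group, payload) pairs in delivery order


class GroupRegistry:
    """Hypothetical registry: each group keeps its own member list, and one
    process may belong to several groups at the same time."""

    def __init__(self):
        self.members = {}

    def join(self, group, process):
        self.members.setdefault(group, []).append(process)

    def multicast(self, group, sender, payload):
        members = self.members.get(group, [])
        if sender not in members:
            raise ValueError(f"{sender.name} is not in group {group!r}")
        for p in members:
            p.delivered.append((group, payload))


procs = {n: Process(n) for n in "abcdef"}
groups = GroupRegistry()
for n in "abcd":               # hypothetical magenta membership
    groups.join("magenta", procs[n])
for n in "cdef":               # hypothetical orange membership
    groups.join("orange", procs[n])

groups.multicast("magenta", procs["a"], "video frame 1")
groups.multicast("orange", procs["e"], "pause stream")
print(procs["c"].delivered)  # [('magenta', 'video frame 1'), ('orange', 'pause stream')]
print(procs["f"].delivered)  # [('orange', 'pause stream')]
```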

Page 15

Timeline

Isis Horus Ensemble

• Horus-like stacking architecture, equally fast
• Includes an innovative group-key mechanism for secure group multicast and key management
• Uses a high-level language and can be formally proved correct, an unexpected and major success
• Many early transition successes
  – SC-21, Quorum via collaboration with BBN
  – Nortel, STC: potential commercial users
  – Discussions with MS (COM+), Sun (RMI.next): could be basis of standards

Page 16

Proving Ensemble Correct

• Unlike Isis and Horus, Ensemble is coded in a language with strong semantics (ML)

• So we took a spec. of virtual synchrony from MIT’s IOA group (Nancy Lynch)

• And are actually able to prove that our code implements the spec. and that the spec captures the virtual synchrony property!

Page 17

What Next?

• Continue some work with Ensemble
  – Keep it alive, support and extend it
  – Play an active role in transition
  – Assist standards efforts

• But shift in focus to a completely new effort
  – Emphasize adaptive behavior, extreme scalability, robustness against local disruption
  – Fits “Intrinsically Survivable Systems” initiative

Page 18

Throughput Stability: Achilles Heel of Group Multicast

• When scaled to even modest environments, overheads of virtual synchrony become a problem
  – One serious challenge involves management of group membership information
  – But multicast throughput also becomes unstable with high data rates and large system size

• Stability of protocols like SRM is unknown

Page 19

Stock Exchange Problem: Vsync. multicast is too “fragile”

Most members are healthy…

… but one is slow
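The slide does not spell out why one slow member hurts everyone, so the following is only a toy model under one assumption: the sender must keep each multicast buffered until every member has consumed it, and the buffer holds at most `window` messages. The function `sender_throughput` and all rates are hypothetical. Under that assumption the sender is throttled to roughly the slowest member's rate.

```python
def sender_throughput(rates, window, ticks):
    """Toy model of a sender whose buffer holds at most `window` messages
    that some member has not yet consumed.

    rates[i] is how many messages member i can consume per tick (hypothetical
    numbers). A buffered message is freed only once *every* member has
    consumed it, so the sender's progress is tied to the slowest member.
    Returns the average number of messages sent per tick.
    """
    sent = 0
    consumed = [0] * len(rates)
    for _ in range(ticks):
        unfreed = sent - min(consumed)        # messages still held for someone
        sent += max(0, window - unfreed)      # refill the buffer
        for i, rate in enumerate(rates):
            consumed[i] = min(sent, consumed[i] + rate)
    return sent / ticks


print(sender_throughput([10, 10, 10, 10], window=20, ticks=100))  # 10.1
print(sender_throughput([10, 10, 10, 1], window=20, ticks=100))   # 1.19
```

Slowing a single member from 10 to 1 message per tick cuts the whole group's throughput by nearly an order of magnitude, even though the other three members are healthy.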

Page 20

Effect of Perturbation

Figure 1: Multicast throughput in an 8-member group perturbed by transient failures

[Plot: throughput (msgs/sec, 0 to 250) vs. amount perturbed (0.1 to 0.9), comparing the Virtual Synchrony Protocol with the Pbcast Protocol; legend also marks ideal and actual curves.]

Page 21

Bimodal Multicast in Spinglass

• A new family of protocols with stable throughput, extremely scalable, fixed and low overhead per process and per message

• Gives tunable probabilistic guarantees

• Includes a membership protocol and a multicast protocol

• Requires some very weak QoS assumptions

Page 22

Start by using unreliable multicast to rapidly distribute the message. But some messages may not get through, and some processes may be faulty. So initial state involves partial distribution of multicast(s)
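A minimal sketch of this first phase, assuming independent per-receiver message loss at a fixed `loss_rate` (a hypothetical loss model; crashed processes are not modeled). `initial_distribution` is an invented name, and each process's history maps message identifiers to payloads.

```python
import random


def initial_distribution(members, sender, msg_id, loss_rate, rng):
    """Best-effort multicast: each receiver independently misses the message
    with probability `loss_rate`. Returns a per-process history mapping
    msg_id -> payload; the sender always keeps its own copy."""
    histories = {m: {} for m in members}
    for m in members:
        if m == sender or rng.random() >= loss_rate:
            histories[m][msg_id] = f"payload of {msg_id}"
    return histories


rng = random.Random(42)
hist = initial_distribution(range(10), sender=0, msg_id="m1", loss_rate=0.3, rng=rng)
print(sorted(p for p, h in hist.items() if "m1" in h))  # the processes that got m1
```

The result is the partial distribution described above: some processes hold `m1` and some do not, which the gossip phase then repairs.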

Page 23

Periodically (e.g. every 100ms) each process sends a digest describing its state to some randomly selected group member. The digest identifies messages. It doesn’t include them.

Page 24

Recipient checks the gossip digest against its own history and solicits a copy of any missing message from the process that sent the gossip
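The digest-and-solicit exchange can be sketched as pure functions. This is a hypothetical Python illustration: `make_digest`, `pick_gossip_target`, and `solicitation` are invented names, and the periodic timer (e.g. every 100ms) is left out. The digest is only the set of message identifiers, so its size does not depend on message size.

```python
import random


def make_digest(history):
    """A digest lists the identifiers of the messages held, not their bodies."""
    return frozenset(history)


def pick_gossip_target(me, members, rng):
    """Each period, gossip goes to one randomly selected other member."""
    return rng.choice([m for m in members if m != me])


def solicitation(digest, my_history):
    """Recipient side: identifiers in the gossiper's digest that are missing
    locally. These are the messages to request back from the gossiper."""
    return sorted(digest - set(my_history))


gossiper = {"m1": b"...", "m2": b"...", "m3": b"..."}
recipient = {"m2": b"..."}
target = pick_gossip_target("P0", ["P0", "P1", "P2"], random.Random(3))
print(target in {"P1", "P2"})                          # True
print(solicitation(make_digest(gossiper), recipient))  # ['m1', 'm3']
```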

Page 25

Processes respond to solicitations received during a round of gossip by retransmitting the requested message. The round lasts much longer than a typical RPC time.
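Putting the steps together, the sketch below simulates complete gossip rounds until every process holds every message. Assumptions, all hypothetical: each process gossips to exactly one random peer per round, there are no crashes or losses during repair, and every solicitation is answered before the round ends (consistent with a round lasting much longer than an RPC). `gossip_round` and `rounds_until_complete` are invented names; this is not Spinglass code.

```python
import random


def gossip_round(histories, rng):
    """One round: each process sends its digest to one random peer, the peer
    solicits what it lacks, and the gossiper retransmits those messages.
    Digests are snapshotted at the start of the round."""
    members = sorted(histories)
    digests = {p: set(histories[p]) for p in members}
    for p in members:
        q = rng.choice([m for m in members if m != p])
        for msg_id in digests[p] - set(histories[q]):    # q's solicitation
            histories[q][msg_id] = histories[p][msg_id]  # p's retransmission


def rounds_until_complete(histories, rng, max_rounds=1000):
    """Run rounds until every process holds every message; None if it never does."""
    all_ids = set().union(*histories.values())
    rounds = 0
    while not all(set(h) == all_ids for h in histories.values()):
        if rounds == max_rounds:
            return None
        gossip_round(histories, rng)
        rounds += 1
    return rounds


rng = random.Random(7)
# Hypothetical start: 16 processes; only 0-3 got m1 from the initial multicast.
hist = {p: ({"m1": "data"} if p < 4 else {}) for p in range(16)}
print(rounds_until_complete(hist, rng))  # number of rounds; varies with the seed
```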

Page 26

Figure 5: Graphs of analytical results

• Pbcast bimodal delivery distribution: p{#processes = k} (log scale, 1E-30 to 1E+00) vs. number of processes to deliver pbcast (0 to 50)

• Scalability of Pbcast reliability: P{failure} (log scale, 1E-35 to 1E-05) vs. #processes in system (10 to 60), for Predicate I and Predicate II

• Effects of fanout on reliability: P{failure} (log scale, 1E-16 to 1E+00) vs. fanout (1 to 10), for Predicate I and Predicate II

• Fanout required for a specified reliability: fanout (4 to 9) vs. #processes in system (20 to 50), for Predicate I at 1E-8 reliability and Predicate II at 1E-12 reliability

Page 27

High Bandwidth measurements with varying numbers of sleepers

[Plot: throughput measured at an unperturbed process (0 to 200) vs. probability of sleep event (0.05 to 0.95), for the traditional protocol and Pbcast, each with 1, 3, and 5 sleepers]

Page 28

Spinglass: Summary of objectives

• Radically different approach yields stable, scalable protocols with steady throughput

• Small footprint, tunable to match conditions

• Completely asynchronous, hence demands new style of application development

• But opens the door to a new lightweight reliability technology supporting large autonomous environments that adapt