Robustness of Computer Networks is a Matter of the Heart or How to Make Distributed Computer Networks Extremely Robust. Ariel Daliot (1), Danny Dolev (1), Hanna Parnas (2). (1) School of Engineering and Computer Science, (2) Department of Neurobiology and the Otto Loewi Center for Cellular and Molecular Neurobiology


Dec 19, 2015


Robustness of Computer Networks is a Matter of the Heart
or
How to Make Distributed Computer Networks Extremely Robust

Ariel Daliot (1), Danny Dolev (1), Hanna Parnas (2)

(1) School of Engineering and Computer Science
(2) Department of Neurobiology and the Otto Loewi Center for Cellular and Molecular Neurobiology

The Hebrew University of Jerusalem, Israel



Lecture Outline

Definition of Robustness
Robustness of Biological Systems vs. Engineered Systems
Carrying Over a Mechanism for Robustness (pulse synchronization) from Biology to Distributed Networks
“Riding Two Tigers” – Superimposing two orthogonal fault models to attain extreme robustness of almost any distributed algorithm


General Definition of Robustness

“Robustness is what enables a system to maintain its functionalities in spite of external and internal perturbations”


Robustness as modeled in non-linear dynamics

The system state is a point in phase space
There is an attractor (point or limit cycle) in phase space which represents the desired functionality of the system
Perturbations forcefully move the point representing the system’s state
Robustness is the property of attraction
The degree of robustness is characterized by the basin of attraction
Upon a perturbation, robustness can manifest itself in one of two ways:
– The system returns to its current attractor (e.g. heart-beat when resting)
– The system moves to a new attractor that maintains the system’s functionality (e.g. heart-beat when walking at a constant speed)
Otherwise, unstable regions of phase space can be reached (e.g. a heart-beat rate that starts to do damage to the organism)
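
As a minimal illustration of the phase-space picture on this slide, the sketch below uses a hypothetical one-dimensional system, dx/dt = x - x^3, which is not taken from the lecture. It has two point attractors, x = +1 and x = -1, whose basins of attraction are x > 0 and x < 0. A perturbation that stays inside the current basin is absorbed; one that crosses the basin boundary moves the system to the other attractor.

```python
def settle(x, dt=0.01, steps=5000):
    """Integrate dx/dt = x - x**3 with forward Euler and return the final state."""
    for _ in range(steps):
        x += dt * (x - x**3)
    return x

# The system rests on the attractor x = +1 and is then perturbed.
small_kick = settle(1.0 - 0.7)   # pushed to about 0.3: still in the basin of +1
large_kick = settle(1.0 - 1.3)   # pushed to about -0.3: now in the basin of -1

print(small_kick, large_kick)
```

The size of the basin (here, how far the state can be pushed before it crosses x = 0) is what the slide calls the degree of robustness.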


Some Intuitive Conjectures on Robustness

It is advantageous for a system to have at least some degree of robustness
Demand for more robustness implies higher complexity
There is a tradeoff between robustness and performance
There is a tradeoff between robustness and cost
There is a tradeoff between robustness and resource demands
The degree of robustness of a system is a function of the nature and probability of the faults
The world is dynamic, thus robustness facilitates evolvability (the capacity for non-lethal heritable variation) and evolution selects robust traits


Robustness in Biology

Biological systems rarely “blue-screen”


Robustness in Biology

Biological systems have extraordinary evolvability
Biological systems are complex and fine-tuned over the long course of evolution
Power laws are ubiquitous in nature (Pareto distribution), i.e. “20% of the causes account for 80% of the cases”
Thus biological systems have evolved to be robust to most of the perturbations that actually occur
I.e. most perturbations fall on trajectories in the basin of attraction of some attractor that maintains system functionality
On the other hand, they are very fragile, as certain perturbations can cause catastrophic, cascading failures (e.g. dinosaurs vs. mammals)
Thus they are very robust to random failures but very vulnerable to targeted attacks; they are, or behave like, scale-free networks (Barabasi, Nature, 2000)
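
The contrast between random failures and targeted attacks can be made concrete with a toy network. The sketch below is an assumption made for illustration only: a hub-and-spoke graph, the most extreme form of a skewed degree distribution, stands in for a scale-free network, and `largest_component` is a hypothetical helper. Losing a typical node barely matters, while losing the hub disconnects everything.

```python
from collections import deque

def largest_component(adj, removed):
    """Size of the largest connected component after deleting `removed` nodes."""
    seen = set(removed)
    best = 0
    for start in adj:
        if start in seen:
            continue
        seen.add(start)
        queue = deque([start])
        size = 0
        while queue:                      # breadth-first search from `start`
            node = queue.popleft()
            size += 1
            for nb in adj[node]:
                if nb not in seen:
                    seen.add(nb)
                    queue.append(nb)
        best = max(best, size)
    return best

# Hypothetical network: node 0 is the hub, nodes 1..9 are leaves.
adj = {0: set(range(1, 10))}
for leaf in range(1, 10):
    adj[leaf] = {0}

random_failure = largest_component(adj, removed={7})   # a typical node fails
targeted_attack = largest_component(adj, removed={0})  # the hub is attacked
print(random_failure, targeted_attack)
```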


Examples of Robustness in Biology

Protein Interaction Networks (Giot et al., Science, 2003)
Chemotaxis in bacteria (Alon et al., Nature, 1999)
Circadian clock
Robustness against mutations
Homeostasis (e.g. mammalian temperature regulation)
Adaptability of organisms to changing environments
Mammals vs. dinosaurs
Cardiac pacemaker (Sivan, Dolev & Parnas, 2000)


Mechanisms that Facilitate Biological Robustness

(Kitano, Nature 2004)

System Control (feedback mechanisms)
Fail-safe mechanisms (redundancy and diversity)
Modularity
Decoupling (containment of faults inside the modules)

Similar principles are used to attain robustness in engineered systems!

Thus it makes much sense to search for and understand biological mechanisms for robustness that can be carried over to computer systems


Robustness in Engineered Systems

Low evolvability, sometimes constrained by historical and non-technical considerations
They are starting to become complex, but much less so than biological systems
Systems are typically designed according to the very extremes of the power laws, i.e. robust only to the most frequent perturbations, if at all
Design effort is usually invested in performance and cost, less in robustness, as this is costly and “not needed in the average case”
Engineered systems are typically not robust to many perturbations
Furthermore, they are very fragile, as certain perturbations can cause catastrophic or cascading failures (e.g. NYC blackout); they are typically vulnerable to targeted attacks, though they shouldn't be!


Importance of Robustness in Distributed Computer Systems

Distributed systems are becoming an integral part of everyday systems

Distributed systems are becoming increasingly complex

This leads to an increased need for robustness


Example of Robustness of an Engineered System – AFCS

(maintaining direction, altitude and velocity)


Fault Models in Distributed Computer Systems

Link/Omission faults
Crash/Stop/Napping faults
Byzantine failures (ongoing “malicious” faults)
Transient faults (system temporarily forced into an arbitrary state or total chaos)

Tolerating on-going perturbations and converging from any point in state space at the same time is a “wishful” property for a robust distributed system


Byzantine Faults

Maliciousness; two-faced behavior; code bugs that express themselves over time; hardware corruption; unpredictable behavior; unpredictable local faults; code corruption
Tolerating f faults usually requires n > 3f nodes (without authentication)
Byzantine algorithms typically focus on confining the influence of ongoing faults, assuming an initial consistent state of the correct nodes
Can be modeled by arbitrary perturbations in a fraction of the n dimensions of state space in a certain time window
Within that time window no perturbations whatsoever are allowed in the rest of the dimensions
A transient violation of the above will typically throw the system state forever into an unstable region of state space
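
The n > 3f bound on this slide translates directly into arithmetic. The following small sketch (function names are hypothetical) computes how many Byzantine faults a system of n nodes can tolerate without authentication, and how many nodes are needed for a given f.

```python
def max_byzantine_faults(n: int) -> int:
    """Largest f satisfying n > 3f, i.e. floor((n - 1) / 3)."""
    if n < 1:
        raise ValueError("n must be at least 1")
    return (n - 1) // 3

def min_nodes(f: int) -> int:
    """Smallest n satisfying n > 3f."""
    if f < 0:
        raise ValueError("f must be non-negative")
    return 3 * f + 1

for n in (3, 4, 7, 10):
    print(n, "nodes tolerate", max_byzantine_faults(n), "Byzantine fault(s)")
```

Note that three nodes tolerate no Byzantine fault at all; four nodes are the minimum for tolerating one.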


Self-Stabilization

Addresses the situation when ALL nodes can concurrently be faulty for a limited period of time
Self-stabilizing algorithms focus on realizing the task following a “catastrophic” state, once the system is back within the assumption boundaries
Typically modeled by an arbitrary perturbation in state space followed by no perturbation whatsoever in any of the dimensions until the state returns to the attractor

[Figure: state space with the assumption boundaries marked]
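
To make "converging from an arbitrary state" concrete, here is a sketch of a classic self-stabilizing algorithm, Dijkstra's K-state token ring. It is not part of the lecture and is swapped in purely as an illustration; it handles transient faults only and has no Byzantine tolerance. Starting from any assignment of states, and with one privileged node moving at a time, the ring reaches a configuration with exactly one privilege (token) and keeps it.

```python
import random

def privileged(x):
    """Indices of the nodes that currently hold a privilege (token)."""
    n = len(x)
    result = [0] if x[0] == x[n - 1] else []
    result += [i for i in range(1, n) if x[i] != x[i - 1]]
    return result

def step(x, k, rng):
    """Let one arbitrarily chosen privileged node make its move."""
    i = rng.choice(privileged(x))      # at least one node is always privileged
    if i == 0:
        x[0] = (x[0] + 1) % k          # the distinguished node increments
    else:
        x[i] = x[i - 1]                # every other node copies its predecessor

def stabilize(x, k, rng, max_steps=1000):
    """Run until exactly one privilege remains; return the number of moves."""
    for steps in range(max_steps + 1):
        if len(privileged(x)) == 1:
            return steps
        step(x, k, rng)
    raise RuntimeError("did not stabilize within max_steps")

rng = random.Random(0)
n, k = 5, 5                                   # the algorithm needs k >= n
state = [rng.randrange(k) for _ in range(n)]  # transient fault: arbitrary state
moves = stabilize(state, k, rng)
print("stabilized after", moves, "moves:", state)
```

In the phase-space language of the slides, the single-token configurations form the attractor and the whole state space is its basin, provided no further perturbation occurs while the ring converges.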


Byzantine Faults and Self-Stabilization

Self-stabilization is “orthogonal” to Byzantine failures, i.e. these are uncorrelated fault models

They are “complementary” fault models: superimposed, the two make an algorithm overcome any type of fault from any state

Very few protocols (~3) possess both properties despite decades of research. Two of these protocols have super-exponential time complexity


Cardiac ganglion of the lobster (Sivan, Dolev & Parnas, 2000)

Four interneurons tightly synchronize their pulses in order to give the heart its optimal pulse, fault tolerantly

Able to adjust the synchronized firing pace, up to a certain bound (e.g. while escaping a predator)

[Figure: pulse trains of the cardiac ganglion, including the motor neurons]


The target is to synchronize pulses from any state and under any faults

[Figure: pulse timelines of the nodes along the time axis t. The correct nodes start in an arbitrary state and converge to a synchronized state in which, in every cycle, their pulses fall within σ of each other; the faulty nodes keep pulsing arbitrarily]


“Pulse Synchronization”, in distributed computer systems

The computers are required to:

Invoke regular pulses in tight synchrony
Synchronize from ANY state (self-stabilization)
Have a bounded pulse frequency
Tolerate up to a third of the nodes exhibiting permanent Byzantine faults

Examples of other synchronization problems: “Firing Squad”, “Clock Synchronization”, etc.
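
The first and third requirements above can be phrased as simple checks over observed pulse times. The sketch below is only an illustration of what "tight synchrony" and "bounded frequency" mean for the correct nodes; the function names, the integer millisecond timestamps and the bounds are all hypothetical.

```python
def cycle_is_synchronized(pulse_times, sigma):
    """True if all correct nodes pulsed within `sigma` time units of each other."""
    return max(pulse_times) - min(pulse_times) <= sigma

def frequency_is_bounded(cycles, min_cycle, max_cycle):
    """`cycles` holds one list of pulse times per cycle, in order.

    Checks that the time between the first pulse of consecutive cycles
    lies in [min_cycle, max_cycle].
    """
    starts = [min(c) for c in cycles]
    return all(min_cycle <= b - a <= max_cycle for a, b in zip(starts, starts[1:]))

# Hypothetical pulse times (ms) of three correct nodes over three cycles.
cycles = [[0, 2, 1], [101, 100, 103], [202, 200, 201]]
in_sync = all(cycle_is_synchronized(c, sigma=5) for c in cycles)
bounded = frequency_is_bounded(cycles, min_cycle=90, max_cycle=110)
print(in_sync, bounded)
```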


Making Any* Byzantine Algorithm Stabilize

I.e. robust to any arbitrary transient perturbation in the system state, together with any arbitrary permanent perturbation in up to a third of the dimensions
Almost as robust as a distributed computer system can get
In a sense more robust than typical biological systems, as it covers much of state space to defend against targeted attacks, but it lacks real evolvability and adaptability
Adding evolvability and adaptability could make it as robust as algorithms can be


*Restrictions on the Basic Algorithm

Can be initialized σ (pulse skew) time units apart
Has sampling points where the application state is safe to read
Sampling points can be identified by reading PC
During legal executions of the basic algorithm all the sampling points are within Δ time of each other
A snapshot of application states that are read within Δ real time of each other is “consistent”, i.e. meaningful


Outline of the Scheme

At “pulse” event
– Send local state to all nodes and Byzantine Agree it;
– All correct nodes now see the same global snapshot;
– Check if global snapshot represents a legal state;
– If yes but your state is corrupt then repair state;
– If not then reset basic algorithm;
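
The last three steps of the outline amount to a small decision rule that every correct node evaluates on the agreed snapshot. The sketch below shows that rule only; the Byzantine agreement itself is assumed to have already happened. The two predicates are supplied by the application, and the toy versions here (a more-than-two-thirds majority on a single value) are assumptions made for the example, not the lecture's predicates.

```python
from collections import Counter
from enum import Enum

class Action(Enum):
    NOTHING = "do nothing"
    REPAIR = "repair own state"
    RESET = "reset basic algorithm"

def decide(snapshot, my_state, is_legal, is_corrupt):
    """Decide what a node does once all correct nodes hold the same snapshot."""
    if not is_legal(snapshot):
        return Action.RESET
    if is_corrupt(my_state, snapshot):
        return Action.REPAIR
    return Action.NOTHING

# Toy application: all nodes are supposed to hold the same value.
def is_legal(snapshot):
    _, count = Counter(snapshot.values()).most_common(1)[0]
    return 3 * count > 2 * len(snapshot)   # more than two thirds agree

def is_corrupt(my_state, snapshot):
    value, _ = Counter(snapshot.values()).most_common(1)[0]
    return my_state != value

good = {"a": 7, "b": 7, "c": 7, "d": 3}
chaos = {"a": 1, "b": 2, "c": 3, "d": 4}
print(decide(good, 7, is_legal, is_corrupt))   # healthy node
print(decide(good, 3, is_legal, is_corrupt))   # node "d" sees it is out of line
print(decide(chaos, 1, is_legal, is_corrupt))  # no legal state: reset
```

Because the snapshot is agreed, all correct nodes reach the same RESET verdict, which is what lets them restart the basic algorithm together.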


Scheme Pitfalls

The general scheme may seem very simple but…
When the basic algorithm is not synchronized, how close do the sampling points need to be in order to get a “consistent” snapshot?
And if they are not close, how do you detect that?
And if the basic algorithm is synchronized, what happens if the sampling points are around the pulse, such that some correct nodes send their states subsequent to the pulse and some don’t?
Assume the global snapshot seems “consistent”: can the predicate detection module always detect if the application is in an illegal state, considering the uncertainties in the consistencies?


ByzStabilizer: Stabilizes any* Byzantine Algorithm

ByzStabilizer
At “pulse” event
Begin
  1. Abort any other ByzStabilizer;
  2. If (must-reset) then reset basic algorithm;
  3. When reaching an identified state, exchange the state values and the elapsed time since the pulse;
  4. “Byzantine Agree” on the (state, elapsed-time) sent by each node;
  5. Sift through agreed values for a set of values with elapsed times within some Δ of each other comprising a consistent global snapshot;
  6. If no such set then do must-reset:=true and propose pulse;
  7. Do predicate evaluation on the consistent global snapshot;
  8. If predicate is satisfied but you are not part of the set then repair your state;
  9. If predicate is satisfied then basic algorithm is in a legal state do nothing;
  10. Else do must-reset:=true and propose pulse;
End
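
Step 5, the sifting, is the one step with a directly algorithmic flavor, so here is a minimal sketch of one way it could be done. It assumes the agreed values arrive as (node, state, elapsed) triples and that the caller supplies the minimum acceptable set size `min_size` (for example n - f); both of these are assumptions, as the slides do not fix them. A sliding window over the values sorted by elapsed time finds the largest group whose elapsed times lie within Δ of each other.

```python
def sift(agreed, delta, min_size):
    """Largest group of (node, state, elapsed) triples with elapsed times
    within `delta` of each other, or None if no group reaches `min_size`."""
    ordered = sorted(agreed, key=lambda v: v[2])
    best = []
    lo = 0
    for hi in range(len(ordered)):
        # Shrink the window from the left until it spans at most delta.
        while ordered[hi][2] - ordered[lo][2] > delta:
            lo += 1
        if hi - lo + 1 > len(best):
            best = ordered[lo:hi + 1]
    return best if len(best) >= min_size else None

# Hypothetical agreed values from four nodes (elapsed times in ms).
agreed = [("a", "s1", 10), ("b", "s1", 12), ("c", "s1", 11), ("d", "s9", 40)]

snapshot = sift(agreed, delta=3, min_size=3)
members = {node for node, _, _ in snapshot}
print(sorted(members))            # node "d" is left out and would repair (step 8)

no_snapshot = sift(agreed, delta=3, min_size=4)
print(no_snapshot)                # no consistent set: must-reset (step 6)
```

Since every correct node runs the same deterministic sift on the same agreed values, they all arrive at the same set or all find none.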


Agreed set of values

[Figure: timeline showing the pulse uncertainty (different nodes invoke their pulse at different times), the first “Δ” uncertainty, the agreement completion time uncertainty, and the safe region; the agreed set lies within the safe region]

Identify the (f+1)-st value in the safe region

Define the end of the region with respect to its “elapsed time”


Time Complexity and Convergence Time of ByzStabilizer

Ω[σ + Δ + Σ + (2f+1)·RTT] ≈ Ω[Σ + (2t+3)·RTT]
– σ is the pulse skew
– Δ is the sampling point skew
– Σ is the time complexity of the basic algorithm
– RTT is the Round Trip Time
– f is the number of Byzantine faults to be tolerated (n > 3f)
– t is the actual number of permanent Byzantine faults

This is roughly the complexity of the Byzantine Agreement and the basic algorithm combined
Time complexity equals the convergence time, i.e. even when everything is OK you pay the price of the convergence time
If solving the basic problem can be reduced to consensus on one value then we can give a scheme that has a time complexity of 2 RTTs, i.e. when everything is OK you pay almost nothing
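
As a worked example of the first expression, the sketch below simply evaluates σ + Δ + Σ + (2f+1)·RTT. All the numbers are hypothetical and chosen only to show which term dominates; they are not measurements from the lecture.

```python
def convergence_bound(sigma, delta, basic_time, f, rtt):
    """Evaluate sigma + delta + Sigma + (2f + 1) * RTT (same time unit throughout)."""
    return sigma + delta + basic_time + (2 * f + 1) * rtt

# Hypothetical values in milliseconds for a 4-node system with f = 1.
bound_ms = convergence_bound(sigma=2, delta=3, basic_time=50, f=1, rtt=10)
print(bound_ms)   # 2 + 3 + 50 + 3 * 10 = 85
```

With these numbers the skews σ and Δ are negligible, and the bound is dominated by the basic algorithm (Σ) and the agreement rounds, which matches the slide's remark that the cost is roughly that of Byzantine Agreement and the basic algorithm combined.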


Summary

Biological systems are more robust than engineered systems
Both use supposedly the same principles for robustness
We defined “pulse synchronization” in distributed systems, carried over from the cardiac pacemaker
Using this mechanism we presented the first algorithm that stabilizes any Byzantine algorithm that conforms to certain very reasonable restrictions
The cost of the algorithm is relatively low and thus we show that self-stabilization in distributed computer systems facing Byzantine faults does not carry a significant additional cost beyond the cost of tolerating Byzantine faults
It also implies that robustness does not necessarily mean high cost


Biological synchronization…


“The importance of being synchronized…”


Sometimes you miss…


“But everything will be fine again…”


Questions?


Drosophila melanogaster – Protein Interaction Network
