The Complexity of Adding Failsafe Fault-tolerance. Sandeep S. Kulkarni, Ali Ebnenasir

Feb 05, 2016

Transcript
Page 1: The Complexity of Adding Failsafe Fault-tolerance

The Complexity of Adding Failsafe Fault-tolerance

Sandeep S. Kulkarni, Ali Ebnenasir

Page 2: The Complexity of Adding Failsafe Fault-tolerance

Motivations
Why automatic addition of fault-tolerance?
Why begin with a fault-intolerant program?
  Reuse of the fault-intolerant program
  Separation of concerns (functionality vs. fault-tolerance)
  Potential to preserve properties such as efficiency
One obstacle
  Adding masking fault-tolerance to distributed programs is NP-hard [FTRTFT 2000]

Page 3: The Complexity of Adding Failsafe Fault-tolerance

Motivation (Continued)
Approaches for dealing with complexity
  Heuristics [SRDS 2001]
  Weaker forms of tolerance
    Failsafe: only safety is guaranteed in the presence of faults
    Nonmasking: safety may be temporarily violated
  Restricting the input
    Programs
    Specifications

Page 4: The Complexity of Adding Failsafe Fault-tolerance

Motivation (Continued)
Why failsafe fault-tolerance?
  Simplify the design of masking fault-tolerance
  Partial automation of masking fault-tolerance (using TSE'98)
[Figure: intolerant program, failsafe fault-tolerant program, nonmasking fault-tolerant program, and masking fault-tolerant program; two arrows are labeled "Automate"]

Page 5: The Complexity of Adding Failsafe Fault-tolerance

Outline of the Talk
  Problem of adding fault-tolerance
  Difficulties caused by distribution
  Complexity of failsafe fault-tolerance
  Class of programs and specifications for which polynomial synthesis is possible

Page 6: The Complexity of Adding Failsafe Fault-tolerance

Basic Concepts: Programs and Faults
State space S_p
Program transitions δ_p, faults δ_f
Invariant S, fault-span T
Specification spec: safety is specified by transitions (s_j, s_k) that should not be executed
[Figure: invariant S and fault-span T, with transitions labeled p/f, p, and f]
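The concepts on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the two-variable state space, the concrete transitions, and the helper name `reachable` are hypothetical, and the fault-span is taken to be the set of states reachable from S under program and fault transitions, which is one possible choice of T rather than the only one.

```python
from itertools import product

# Hypothetical toy state space: each state is a tuple (a, b) of bits.
STATES = set(product([0, 1], repeat=2))

def reachable(start, transitions):
    """Return every state reachable from `start` by following `transitions`."""
    seen = set(start)
    frontier = list(start)
    while frontier:
        s = frontier.pop()
        for src, dst in transitions:
            if src == s and dst not in seen:
                seen.add(dst)
                frontier.append(dst)
    return seen

# Illustrative (made-up) program and fault transitions.
delta_p = {((0, 0), (1, 0))}   # program sets a
delta_f = {((1, 0), (1, 1))}   # fault corrupts b
invariant = {(0, 0), (1, 0)}   # S

# Assumed choice of T: states reachable from S under program and faults.
fault_span = reachable(invariant, delta_p | delta_f)

# Safety is given as a set of transitions that must not be executed.
bad_transitions = {((1, 1), (0, 1))}
program_is_safe_in_T = not any(
    t in bad_transitions for t in delta_p if t[0] in fault_span
)
```

Representing the specification as a set of forbidden transitions keeps the safety check to a simple membership test.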

Page 7: The Complexity of Adding Failsafe Fault-tolerance

Problem Statement
Inputs: program p, invariant S, faults f, specification spec
Outputs: program p', invariant S'
Requirements: only fault-tolerance is added; no new functional behavior is added
[Figure: invariant of the fault-intolerant program and invariant of the fault-tolerant program. No new transitions inside the invariant of the fault-tolerant program; new transitions may be added outside it.]
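One plausible way to read these requirements is sketched below. The formalization is an assumption, not taken from the slides: S' is non-empty and contained in S, and p' has no transition inside S' that p did not already have. The failsafe check follows the earlier slide's view of safety as a set of forbidden transitions. All function names and the integer-labeled states are hypothetical.

```python
def restrict(transitions, states):
    """Transitions that start and end inside `states`."""
    return {(s, t) for (s, t) in transitions if s in states and t in states}

def adds_only_fault_tolerance(p, S, p_new, S_new):
    """Assumed reading: S' is non-empty, S' is a subset of S, and p'
    introduces no new transitions inside S'."""
    return (
        bool(S_new)
        and S_new <= S
        and restrict(p_new, S_new) <= restrict(p, S_new)
    )

def reachable(start, transitions):
    """States reachable from `start` by following `transitions`."""
    seen, frontier = set(start), list(start)
    while frontier:
        s = frontier.pop()
        for src, dst in transitions:
            if src == s and dst not in seen:
                seen.add(dst)
                frontier.append(dst)
    return seen

def safe_under_faults(p_new, S_new, faults, bad):
    """No forbidden transition can execute from a state reachable
    from S' under program and fault transitions."""
    span = reachable(S_new, p_new | faults)
    return not any(t in bad for t in (p_new | faults) if t[0] in span)

# Made-up example: a fault moves state 1 to state 2, where the
# original program would execute the forbidden transition (2, 3).
p = {(0, 1), (1, 0), (2, 3)}
S = {0, 1}
faults = {(1, 2)}
bad = {(2, 3)}
p_new = {(0, 1), (1, 0)}   # drop the unsafe transition
```

In this toy instance, removing the single unsafe transition is enough; the behavior inside the invariant is untouched.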

Page 8: The Complexity of Adding Failsafe Fault-tolerance

Difficulties with Distribution

Read/Write restrictions
Two Boolean variables, a and b
The process cannot read b
Can we include the following transition?

  (a=0, b=0) → (a=1, b=0)

• Only if we also include the transition

  (a=0, b=1) → (a=1, b=1)

Groups of transitions (instead of individual transitions) must be chosen.
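The grouping effect can be sketched as follows. The function `group_of` is a hypothetical helper, and it assumes that a process does not write variables it cannot read, so each unreadable variable keeps its value across the transition.

```python
from itertools import product

VARS = ("a", "b")   # position of each variable inside a state tuple
DOMAIN = (0, 1)     # both variables are Boolean

def group_of(transition, unreadable):
    """Return every transition the process cannot tell apart from
    `transition`, given the variables it cannot read.

    Assumption: unreadable variables are not written, so they hold
    the same value in the source and the target state.
    """
    src, dst = transition
    idx = [VARS.index(v) for v in unreadable]
    group = set()
    for values in product(DOMAIN, repeat=len(idx)):
        s, d = list(src), list(dst)
        for i, val in zip(idx, values):
            s[i] = val
            d[i] = val
        group.add((tuple(s), tuple(d)))
    return group

# The slide's example: the process cannot read b.
g = group_of(((0, 0), (1, 0)), unreadable=["b"])
```

Here `g` contains both transitions from the slide, so a synthesis procedure must include or exclude them together.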

Page 9: The Complexity of Adding Failsafe Fault-tolerance

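Reduction from 3-SAT
[Figure: reduction gadget. One transition is included iff x_0 is false and another iff x_0 is true. For the clause c_j = x_j \/ ¬x_k \/ x_l, three transitions are included iff x_j is false, iff x_k is true, and iff x_l is false, respectively. State labels: a_0, a_n = a_0.]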

Page 10: The Complexity of Adding Failsafe Fault-tolerance

Dealing with the Complexity of Adding Failsafe Fault-tolerance
For what class of problems can failsafe fault-tolerance be added in polynomial time?
Restrictions on
  Fault-tolerant programs
  Specifications
  Faults
Our approach for restrictions: in the absence of faults, preserve all computations of the fault-intolerant program

Page 11: The Complexity of Adding Failsafe Fault-tolerance

Restrictions on Programs and Specifications
Monotonicity requirements
  Capture the notion that safe assumptions can be made about variables that cannot be read
  Focus on specifications and transitions of fault-intolerant programs

Page 12: The Complexity of Adding Failsafe Fault-tolerance

Monotonicity of Specifications

Definition: A specification spec is positive monotonic with respect to variable x iff:

For every s0, s1, s'0, s'1 such that
  x = false in s0 and s1,
  x = true in s'0 and s'1,
  the values of all other variables in s0 and s'0 are the same, and
  the values of all other variables in s1 and s'1 are the same:

If the transition (s0, s1) does not violate safety,
then the transition (s'0, s'1) does not violate safety.
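The definition can be checked by brute force on small state spaces. The sketch below assumes states are tuples of Booleans and the specification is given as a set `bad` of forbidden transitions; the function name and this encoding are illustrative choices, not the authors' implementation.

```python
from itertools import product

def is_positive_monotonic(bad, x, n_vars):
    """Check positive monotonicity of a safety specification.

    bad    -- set of forbidden transitions (s0, s1), states are bool tuples
    x      -- index of the variable under consideration
    n_vars -- number of Boolean variables in a state
    """
    states = list(product([False, True], repeat=n_vars))

    def with_x(s, value):
        # Same state, except that variable x takes `value`.
        return s[:x] + (value,) + s[x + 1:]

    for s0, s1 in product(states, repeat=2):
        if s0[x] or s1[x]:
            continue  # the premise needs x = false in both s0 and s1
        if (s0, s1) not in bad:
            primed = (with_x(s0, True), with_x(s1, True))
            if primed in bad:
                return False  # safe with x = false, unsafe with x = true
    return True

# Two variables (x, y). Forbidding a change of y only while x is false
# is positive monotonic in x; forbidding it only while x is true is not.
bad_when_x_false = {((False, False), (False, True))}
bad_when_x_true = {((True, False), (True, True))}
```

The check is exponential in the number of variables, so it is only meant to make the definition concrete on toy examples.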

Page 13: The Complexity of Adding Failsafe Fault-tolerance

Monotonicity of Programs
Definition: Program p with invariant S is negative monotonic with respect to variable x iff:

For every s0, s1, s'0, s'1 such that
  the values of all other variables in s0 and s'0 are the same, and
  the values of all other variables in s1 and s'1 are the same:

[Figure: states s0 and s1 with x = true, invariant S, and states s'0 and s'1 with x = false]

Page 14: The Complexity of Adding Failsafe Fault-tolerance

Theorem
Adding failsafe fault-tolerance can be done in polynomial time if either:
  the program is negative monotonic and the spec is positive monotonic, or
  the program is positive monotonic and the spec is negative monotonic.
If only one of these conditions is satisfied, then adding failsafe fault-tolerance is still NP-hard.
For many problems, these requirements are easily met.

Page 15: The Complexity of Adding Failsafe Fault-tolerance

Example: Byzantine Agreement

Processes: general g and three non-generals j, k, and l
Variables
  d.g : {0, 1}
  d.j, d.k, d.l : {0, 1, ⊥}
  b.g, b.j, b.k, b.l : {true, false}
  f.g, f.j, f.k, f.l : {0, 1}
Fault-intolerant program transitions
  d.j = ⊥ /\ f.j = 0  →  d.j := d.g
  d.j ≠ ⊥ /\ f.j = 0  →  f.j := 1
Fault transitions
  ¬b.g /\ ¬b.j /\ ¬b.k /\ ¬b.l  →  b.j := true
  b.j  →  d.j, f.j := 0|1, 0|1
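The guarded commands for process j can be sketched as plain Python functions over a dictionary state. The encoding is an assumption made for illustration: `None` stands for ⊥, and the function names are hypothetical.

```python
from itertools import product

BOTTOM = None  # stands for the undecided value ⊥

def program_step_j(state):
    """Apply the enabled fault-intolerant action of j, or return None."""
    s = dict(state)
    if s["d.j"] is BOTTOM and s["f.j"] == 0:
        s["d.j"] = s["d.g"]          # copy the general's decision
        return s
    if s["d.j"] is not BOTTOM and s["f.j"] == 0:
        s["f.j"] = 1                 # finalize the decision
        return s
    return None                      # no action of j is enabled

def fault_successors_j(state):
    """All states reachable from `state` by one fault affecting j."""
    successors = []
    if not any(state[b] for b in ("b.g", "b.j", "b.k", "b.l")):
        s = dict(state)
        s["b.j"] = True              # j becomes Byzantine
        successors.append(s)
    if state["b.j"]:
        for d, f in product((0, 1), repeat=2):
            s = dict(state)
            s["d.j"], s["f.j"] = d, f  # Byzantine j changes d.j, f.j arbitrarily
            successors.append(s)
    return successors

initial = {
    "d.g": 1, "d.j": BOTTOM, "d.k": BOTTOM, "d.l": BOTTOM,
    "b.g": False, "b.j": False, "b.k": False, "b.l": False,
    "f.g": 0, "f.j": 0, "f.k": 0, "f.l": 0,
}
after_copy = program_step_j(initial)
after_finalize = program_step_j(after_copy)
```

Starting from `initial`, process j first copies the general's decision and then finalizes, after which none of its actions is enabled.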

Page 16: The Complexity of Adding Failsafe Fault-tolerance

Example: Byzantine Agreement (Continued)
Safety specification:
  Agreement: no two non-Byzantine non-generals can finalize with different decisions
  Validity: if g is not Byzantine, no process can finalize with a decision different from that of g
Read/Write restrictions
  Readable variables for process j: b.j, d.j, f.j, d.g, d.k, d.l
  Process j can write d.j, f.j
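The two safety conditions can be written as predicates over a state. This is a sketch under assumptions: the state is a dictionary keyed by the variable names above, a process has finalized when its f variable is 1, and validity is read as applying to non-Byzantine non-generals only. The function names are hypothetical.

```python
NON_GENERALS = ("j", "k", "l")

def finalized_honest(state):
    """Non-generals that are non-Byzantine and have finalized."""
    return [
        p for p in NON_GENERALS
        if state[f"f.{p}"] == 1 and not state[f"b.{p}"]
    ]

def violates_agreement(state):
    """Two non-Byzantine non-generals finalized with different decisions."""
    decisions = {state[f"d.{p}"] for p in finalized_honest(state)}
    return len(decisions) > 1

def violates_validity(state):
    """g is non-Byzantine, yet some (assumed non-Byzantine) non-general
    finalized with a decision different from d.g."""
    if state["b.g"]:
        return False
    return any(
        state[f"d.{p}"] != state["d.g"] for p in finalized_honest(state)
    )

# Made-up state: j and k finalized with different decisions.
example = {
    "d.g": 1, "d.j": 1, "d.k": 0, "d.l": None,
    "b.g": False, "b.j": False, "b.k": False, "b.l": False,
    "f.g": 0, "f.j": 1, "f.k": 1, "f.l": 0,
}
```

A transition would then be forbidden when it leads to a state for which either predicate holds.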

Page 17: The Complexity of Adding Failsafe Fault-tolerance

Example: Byzantine Agreement (Continued)
Observation 1: Positive monotonicity of the specification with respect to b.j
Observation 2: Negative monotonicity of the program, consisting of the transitions of j, with respect to b.k
Observation 3: Negative monotonicity of the specification with respect to f.j
Observation 4: Positive monotonicity of the program, consisting of the transitions of j, with respect to f.k

Page 18: The Complexity of Adding Failsafe Fault-tolerance

Summary
Complexity analysis for failsafe fault-tolerance
  Reduction from 3-SAT
Restrictions on specifications and programs for which polynomial synthesis is possible
  Several problems fall in this category: Byzantine agreement, consensus, commit, …
Necessity of these restrictions

Page 19: The Complexity of Adding Failsafe Fault-tolerance

Future Work
  Simplifying the design of masking fault-tolerance using the two-step approach
  Refining the boundary between classes for which polynomial synthesis is possible and those for which exponential complexity is inevitable
  Using monotonicity requirements for simplifying masking fault-tolerance

Page 20: The Complexity of Adding Failsafe Fault-tolerance

Thank You
Questions?

Page 21: The Complexity of Adding Failsafe Fault-tolerance

Conclusion
Specifying the boundary between
  problems where fault-tolerance addition can be done in polynomial time, and
  problems where exponential complexity is inevitable
Goal: what problems can benefit from automation?
Necessity and sufficiency of monotonicity requirements

Future Work
  How can we change a non-monotonic program to a monotonic one by modifying its invariant?
  How can we strengthen a non-monotonic specification to a monotonic one?
  How can a nonmasking program be designed manually to satisfy the monotonicity requirements?

Page 22: The Complexity of Adding Failsafe Fault-tolerance

Basic Concepts: Fault-tolerant Program

Fault-tolerance in the presence of faults:

Failsafe: Satisfies its safety specification

Nonmasking: Satisfies its liveness specification (safety may be violated temporarily)

Masking: Satisfies safety and liveness specification

Page 23: The Complexity of Adding Failsafe Fault-tolerance

The Complexity of Adding Failsafe Fault-tolerance
Adding (failsafe/nonmasking/masking) fault-tolerance in the high atomicity model is in P
Adding masking fault-tolerance to distributed programs is in NP
How about failsafe?
Adding failsafe fault-tolerance to distributed programs is NP-hard (proof in the paper)
  Reduction of 3-SAT to the problem of failsafe fault-tolerance addition

Page 24: The Complexity of Adding Failsafe Fault-tolerance

Our Approach
Stepwise towards masking fault-tolerance: automating the addition of failsafe fault-tolerance
How hard is adding failsafe fault-tolerance?
Polynomial time boundaries for failsafe tolerance addition?
