The Complexity of Adding Failsafe Fault-tolerance


Sandeep S. Kulkarni, Ali Ebnenasir

Motivations

Why automatic addition of fault-tolerance?

Why begin with a fault-intolerant program?
• Reuse of the fault-intolerant program
• Separation of concerns (functionality vs. fault-tolerance)
• Potential to preserve properties such as efficiency

One obstacle: adding masking fault-tolerance to distributed programs is NP-hard [FTRTFT, 2000]

Motivation (Continued)

Approaches for dealing with the complexity:
• Heuristics [SRDS 2001]
• Weaker forms of tolerance
  Failsafe: only safety must be satisfied in the presence of faults
  Nonmasking: safety may be temporarily violated
• Restricting the input: programs, specifications

Motivation (Continued)

Why failsafe fault-tolerance?
• Simplify the design of masking fault-tolerance
• Partial automation of masking fault-tolerance (using TSE'98)

[Figure: design path from the intolerant program to a masking fault-tolerant program through nonmasking and failsafe fault-tolerant programs; two of the steps are marked "Automate".]

Outline of the Talk
• Problem of adding fault-tolerance
• Difficulties caused by distribution
• Complexity of failsafe fault-tolerance
• Class of programs and specifications for which polynomial synthesis is possible

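Basic Concepts: Programs and Faults
• State space Sp
• Program transitions δp, fault transitions δf
• Invariant S, fault-span T
• Specification spec: safety is specified by the transitions (sj, sk) that should not be executed

[Figure: invariant S nested inside fault-span T within state space Sp, annotated with program transitions (p) and fault transitions (f).]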
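The definitions above can be made concrete with a small sketch. It assumes states are tuples of variable values, programs and faults are sets of transitions (s0, s1), and predicates such as S and T are sets of states. The example program, faults, and the helper names is_closed and violates_safety are hypothetical illustrations, not taken from the paper.

```python
# Minimal model (assumption): a state is a tuple of variable values (here (a, b)),
# a program or fault is a set of transitions (s0, s1), and a state predicate
# (invariant, fault-span) is a set of states.

def is_closed(pred, transitions):
    """pred is closed in `transitions` if no transition that starts in pred leaves it."""
    return all(s1 in pred for (s0, s1) in transitions if s0 in pred)


def violates_safety(transitions, bad):
    """Safety is given as a set of bad transitions that must never execute."""
    return not transitions.isdisjoint(bad)


# Hypothetical example with two Boolean variables (a, b).
delta_p = {((0, b), (1, b)) for b in (0, 1)}                       # program: set a to 1
delta_f = {((a, b), (a, 1 - b)) for a in (0, 1) for b in (0, 1)}  # fault: flip b
S = {(1, 0)}                                                       # invariant
T = {(1, 0), (1, 1)}                                               # fault-span
bad = {((1, 1), (0, 1))}                                           # forbidden transition

print(is_closed(S, delta_p))                    # True
print(is_closed(S, delta_p | delta_f))          # False: the fault leaves S
print(is_closed(T, delta_p | delta_f))          # True
print(violates_safety(delta_p | delta_f, bad))  # False
```

In this reading, S must be closed under the program alone. The fault-span T must be closed under program and fault transitions together, which is why T contains S.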

Problem Statement
Inputs: program p, invariant S, faults f, specification spec
Outputs: program p', invariant S'
Requirements: only fault-tolerance is added; no new functional behavior is added

[Figure: the invariant of the fault-tolerant program lies inside the invariant of the fault-intolerant program; no new transitions may be added inside it, while new transitions may be added outside it.]
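One way to state the requirement "no new functional behavior" is S' ⊆ S and p'|S' ⊆ p|S'. Here p|S denotes the transitions of p that start and end in S. The sketch below checks these two conditions on explicit transition sets. The formalization, the function names, and the example program are assumptions made for illustration. The check does not cover the failsafe requirement itself.

```python
def project(transitions, pred):
    """p|S: the transitions of p that start and end in the state predicate S."""
    return {(s0, s1) for (s0, s1) in transitions if s0 in pred and s1 in pred}


def adds_only_fault_tolerance(p, S, p_new, S_new):
    """Check S' is a subset of S and p'|S' is a subset of p|S': inside the new
    invariant, the fault-tolerant program uses only transitions of the original."""
    return S_new <= S and project(p_new, S_new) <= project(p, S_new)


# Hypothetical program over states 0..3 with invariant {0, 1, 2}.
p = {(0, 1), (1, 2), (2, 0)}
S = {0, 1, 2}
p_ok = p | {(3, 0)}     # new transition outside the invariant: allowed
p_bad = p | {(1, 0)}    # new transition inside the invariant: new behavior

print(adds_only_fault_tolerance(p, S, p_ok, S))          # True
print(adds_only_fault_tolerance(p, S, p_bad, S))         # False
print(adds_only_fault_tolerance(p, S, p, {0, 1, 2, 3}))  # False: S' is not a subset of S
```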

Difficulties with Distribution

Read/write restrictions: two Boolean variables a and b; the process cannot read b. Can we include the following transition?

  (a=0, b=0) → (a=1, b=0)

• Only if we also include the transition

  (a=0, b=1) → (a=1, b=1)

because the process cannot distinguish the two source states, which differ only in b.

Groups of transitions (instead of individual transitions) must be chosen.
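The grouping can be computed mechanically. The minimal sketch below assumes two Boolean variables and a process that cannot change any variable it cannot read. It returns the group of a transition: every unreadable variable ranges over its domain but keeps the same value in the source and target states. The names VARS, DOMAIN, and group are hypothetical.

```python
from itertools import product

VARS = ("a", "b")   # variable order in a state tuple (assumption)
DOMAIN = (0, 1)     # both variables are Boolean


def group(transition, readable):
    """All transitions a process that reads only `readable` cannot tell apart
    from `transition`: readable variables keep their values, and each
    unreadable variable may hold any value but cannot be changed."""
    s0, s1 = transition
    hidden = [i for i, v in enumerate(VARS) if v not in readable]
    if any(s0[i] != s1[i] for i in hidden):
        raise ValueError("a process cannot change a variable it cannot read")
    result = set()
    for values in product(DOMAIN, repeat=len(hidden)):
        t0, t1 = list(s0), list(s1)
        for i, val in zip(hidden, values):
            t0[i] = t1[i] = val
        result.add((tuple(t0), tuple(t1)))
    return result


print(sorted(group(((0, 0), (1, 0)), readable={"a"})))
# [((0, 0), (1, 0)), ((0, 1), (1, 1))]
```

A synthesis algorithm must include or exclude each such group as a whole, and this is what makes the distributed case harder.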

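Reduction from 3-SAT

[Figure: for each propositional variable (e.g. x0), one group of transitions is included iff the variable is false and another iff it is true; for a clause cj over the variables xj, xk, and xl, the corresponding transitions are included iff xj is false, iff xk is true, and iff xl is false, respectively.]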

Dealing with the Complexity of Adding Failsafe Fault-tolerance

For what class of problems can failsafe fault-tolerance be added in polynomial time?
Restrictions on: fault-tolerant programs, specifications, faults

Our approach to restrictions: in the absence of faults, preserve all computations of the fault-intolerant program

Restrictions on Programs and Specifications

Monotonicity requirements: capture the notion that safe assumptions can be made about variables that cannot be read

Focus on specifications and on the transitions of fault-intolerant programs

Monotonicity of Specifications

Definition: A specification spec is positive monotonic with respect to variable x iff, for every s0, s1, s'0, s'1 such that
• the values of all other variables in s0 and s'0 are the same, and
• the values of all other variables in s1 and s'1 are the same,
if x = false in s0 and s1 and the transition (s0, s1) does not violate safety, then the transition (s'0, s'1), in which x = true in s'0 and s'1, does not violate safety either.
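Read operationally, the definition says that setting x to true can only relax the safety requirement. The brute-force sketch below checks this on a tiny state space. It assumes states are tuples of Booleans and the specification is given as a predicate bad(s0, s1). The function names and the two example specifications are hypothetical.

```python
from itertools import product


def set_var(state, i, value):
    """Copy of `state` with variable i set to `value` (all other variables unchanged)."""
    return state[:i] + (value,) + state[i + 1:]


def spec_positive_monotonic(bad, n_vars, x):
    """bad(s0, s1) is True iff the transition violates safety; states are tuples
    of n_vars Booleans and x is the index of the variable of interest.
    Enumerates all transitions, so it only suits tiny examples."""
    states = list(product((False, True), repeat=n_vars))
    for s0, s1 in product(states, repeat=2):
        if s0[x] or s1[x]:
            continue  # the definition starts from x = false in both states
        if not bad(s0, s1) and bad(set_var(s0, x, True), set_var(s1, x, True)):
            return False
    return True


# Hypothetical specifications over (x, y): index 0 is x, index 1 is y.
def spec_a(s0, s1):
    return not s0[0] and s0[1] != s1[1]  # y may not change while x is false


def spec_b(s0, s1):
    return s0[0] and s0[1] != s1[1]      # y may not change while x is true


print(spec_positive_monotonic(spec_a, 2, 0))  # True
print(spec_positive_monotonic(spec_b, 2, 0))  # False
```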

Monotonicity of Programs

Definition: Program p with invariant S is negative monotonic with respect to variable x iff, for every s0, s1, s'0, s'1 such that
• the values of all other variables in s0 and s'0 are the same, and
• the values of all other variables in s1 and s'1 are the same,
if (s0, s1) is a transition of p within the invariant S with x = true in s0 and s1, then (s'0, s'1), in which x = false in s'0 and s'1, is also a transition of p.
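A matching brute-force check for programs is sketched below. It assumes states are Boolean tuples, the program is an explicit set of transitions, and "within the invariant S" means both endpoints lie in S. The helper names and example programs are hypothetical.

```python
def set_var(state, i, value):
    """Copy of `state` with variable i set to `value`."""
    return state[:i] + (value,) + state[i + 1:]


def program_negative_monotonic(p, S, x):
    """p: set of transitions over Boolean-tuple states; S: invariant (set of
    states); x: index of the variable. Every transition of p inside S with x
    true in both states needs a counterpart in p with x false in both states."""
    for s0, s1 in p:
        if s0 in S and s1 in S and s0[x] and s1[x]:
            if (set_var(s0, x, False), set_var(s1, x, False)) not in p:
                return False
    return True


# Hypothetical programs over (x, y); index 0 is x.
ALL = {(x, y) for x in (False, True) for y in (False, True)}
p1 = {((x, False), (x, True)) for x in (False, True)}  # sets y regardless of x
p2 = {((True, False), (True, True))}                   # sets y only when x is true

print(program_negative_monotonic(p1, ALL, 0))  # True
print(program_negative_monotonic(p2, ALL, 0))  # False
```

Intuitively, p1 never relies on x being true, so a process that cannot read x can safely assume x is false.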

Theorem: Adding failsafe fault-tolerance can be done in polynomial time if either:
• the program is negative monotonic and the specification is positive monotonic, or
• the program is positive monotonic and the specification is negative monotonic.

If only one of them (the program or the specification) is monotonic, adding failsafe fault-tolerance is still NP-hard.

For many problems, these requirements are easily met.

Example: Byzantine Agreement

Processes: the general g and three non-generals j, k, and l

Variables:
  d.g : {0, 1}
  d.j, d.k, d.l : {0, 1, ⊥}
  b.g, b.j, b.k, b.l : {true, false}
  f.g, f.j, f.k, f.l : {0, 1}

Fault-intolerant program transitions (process j):
  d.j = ⊥ ∧ f.j = 0 → d.j := d.g
  d.j ≠ ⊥ ∧ f.j = 0 → f.j := 1

Fault transitions:
  ¬b.g ∧ ¬b.j ∧ ¬b.k ∧ ¬b.l → b.j := true
  b.j → d.j, f.j := 0|1, 0|1
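The guarded commands above can be transcribed into successor functions. This is a minimal sketch under several assumptions. A state is a dictionary of the variables d, b, and f. ⊥ is represented by None. The fault actions shown for j are assumed to apply to every process, including g. All function names are hypothetical.

```python
import copy

BOT = None                    # stands for the undecided value ⊥
NON_GENERALS = ("j", "k", "l")
PROCESSES = ("g",) + NON_GENERALS


def initial_state(dg):
    """No process is Byzantine, non-generals are undecided, nobody has finalized."""
    return {
        "d": {"g": dg, **{p: BOT for p in NON_GENERALS}},
        "b": {p: False for p in PROCESSES},
        "f": {p: 0 for p in PROCESSES},
    }


def updated(s, field, proc, value):
    """Copy of state s with s[field][proc] set to value."""
    t = copy.deepcopy(s)
    t[field][proc] = value
    return t


def program_successors(s, j):
    """Fault-intolerant actions of non-general j."""
    succ = []
    if s["d"][j] is BOT and s["f"][j] == 0:      # d.j = ⊥ ∧ f.j = 0 → d.j := d.g
        succ.append(updated(s, "d", j, s["d"]["g"]))
    if s["d"][j] is not BOT and s["f"][j] == 0:  # d.j ≠ ⊥ ∧ f.j = 0 → f.j := 1
        succ.append(updated(s, "f", j, 1))
    return succ


def fault_successors(s, j):
    """Fault actions for process j (assumed to apply to every process, g included)."""
    succ = []
    if not any(s["b"].values()):                 # no process is Byzantine yet
        succ.append(updated(s, "b", j, True))
    if s["b"][j]:                                # Byzantine j sets d.j, f.j arbitrarily
        for d in (0, 1):
            for f in (0, 1):
                t = updated(s, "d", j, d)
                t["f"][j] = f
                succ.append(t)
    return succ


# Fault-free run: each non-general copies d.g and then finalizes.
s = initial_state(dg=1)
for j in NON_GENERALS:
    s = program_successors(s, j)[0]
    s = program_successors(s, j)[0]
print(s["d"])  # {'g': 1, 'j': 1, 'k': 1, 'l': 1}
print(s["f"])  # {'g': 0, 'j': 1, 'k': 1, 'l': 1}
```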


Example: Byzantine Agreement (Continued)

Safety specification:
• Agreement: no two non-Byzantine non-generals can finalize with different decisions
• Validity: if g is not Byzantine, no non-Byzantine process can finalize with a decision different from that of g

Read/write restrictions:
• Readable variables for process j: b.j, d.j, f.j, d.g, d.k, d.l
• Process j can write d.j and f.j
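Agreement and validity can be checked as predicates on a single state. A transition would then count as a safety violation when it reaches a state where either predicate fails. This encoding is an assumption made for illustration and may be coarser than the paper's transition-based specification. The state layout follows the dictionary form d, b, f, and the function names are hypothetical.

```python
BOT = None                     # undecided value ⊥
NON_GENERALS = ("j", "k", "l")


def finalized_decisions(s):
    """Decisions of non-Byzantine non-generals that have finalized (f = 1)."""
    return [s["d"][p] for p in NON_GENERALS if not s["b"][p] and s["f"][p] == 1]


def agreement_holds(s):
    # No two non-Byzantine non-generals finalize with different decisions.
    return len(set(finalized_decisions(s))) <= 1


def validity_holds(s):
    # If g is not Byzantine, every finalized non-Byzantine decision equals d.g.
    return s["b"]["g"] or all(d == s["d"]["g"] for d in finalized_decisions(s))


def safe(s):
    return agreement_holds(s) and validity_holds(s)


# j finalized 1, k finalized 0, l is still undecided.
s = {
    "d": {"g": 1, "j": 1, "k": 0, "l": BOT},
    "b": {"g": False, "j": False, "k": False, "l": False},
    "f": {"g": 0, "j": 1, "k": 1, "l": 0},
}
print(agreement_holds(s), validity_holds(s))  # False False
s["b"]["k"] = True  # once k is Byzantine, its decision no longer counts
print(agreement_holds(s), validity_holds(s))  # True True
```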


Example: Byzantine Agreement (Continued)

Observation 1: the specification is positive monotonic with respect to b.j

Observation 2: the program consisting of the transitions of j is negative monotonic with respect to b.k

Observation 3: the specification is negative monotonic with respect to f.j

Observation 4: the program consisting of the transitions of j is positive monotonic with respect to f.k

Summary
• Complexity analysis for failsafe fault-tolerance: reduction from 3-SAT
• Restrictions on specifications and programs for which polynomial synthesis is possible
• Several problems fall in this category: Byzantine agreement, consensus, commit, …
• Necessity of these restrictions

Future Work
• Simplifying the design of masking fault-tolerance using the two-step approach
• Refining the boundary between classes for which polynomial synthesis is possible and those for which exponential complexity is inevitable
• Using monotonicity requirements to simplify the design of masking fault-tolerance

Thank You. Questions?

Future Work / Conclusion

Specifying the boundary between problems for which fault-tolerance can be added in polynomial time and problems for which exponential complexity is inevitable. Goal: which problems can benefit from automation?

Necessity and sufficiency of monotonicity requirements

Future Work
• How can we change a non-monotonic program into a monotonic one by modifying its invariant?
• How can we strengthen a non-monotonic specification into a monotonic one?
• How can a nonmasking program be designed manually to satisfy the monotonicity requirements?

Basic Concepts: Fault-tolerant Program

Fault-tolerance in the presence of faults:

Failsafe: Satisfies its safety specification

Nonmasking: Satisfies its liveness specification (safety may be violated temporarily)

Masking: Satisfies both its safety and liveness specifications

The Complexity of Adding Failsafe Fault-tolerance
• Adding (failsafe/nonmasking/masking) fault-tolerance in the high atomicity model is in P
• Adding masking fault-tolerance to distributed programs is in NP
• How about failsafe?

Adding failsafe fault-tolerance to distributed programs is NP-hard (proof in the paper): reduction from 3-SAT to the problem of adding failsafe fault-tolerance
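In the high atomicity model, a process can read and write all variables in one step. There, the hard part disappears because transitions can be removed one at a time. The sketch below shows the kind of polynomial fixpoint such an algorithm can use. It first collects the states from which fault transitions alone can violate safety. It then marks the program transitions a failsafe program must avoid. This is an illustration under stated assumptions, not the algorithm from the paper, and the function names are hypothetical. In the distributed model, removing one transition forces removing its whole group, which is where the NP-hardness comes from.

```python
def fault_unsafe_states(faults, bad):
    """ms: least fixpoint of states from which fault transitions alone can
    violate safety, either directly or by reaching a state already in ms.
    Each pass adds at least one state or stops, so the loop is polynomial."""
    ms = set()
    changed = True
    while changed:
        changed = False
        for s0, s1 in faults:
            if s0 not in ms and ((s0, s1) in bad or s1 in ms):
                ms.add(s0)
                changed = True
    return ms


def unsafe_program_transitions(program, ms, bad):
    """mt: program transitions that violate safety or enter ms; a failsafe
    program must not execute them."""
    return {(s0, s1) for s0, s1 in program if (s0, s1) in bad or s1 in ms}


# Hypothetical example over states 0..4.
faults = {(0, 1), (1, 2)}
bad = {(1, 2)}
program = {(3, 0), (3, 4), (4, 3)}
ms = fault_unsafe_states(faults, bad)
print(sorted(ms))                                    # [0, 1]
print(unsafe_program_transitions(program, ms, bad))  # {(3, 0)}
```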

Our Approach

Stepwise towards masking fault-tolerance: automating the addition of failsafe fault-tolerance
• How hard is adding failsafe fault-tolerance?
• Are there polynomial-time boundaries for adding failsafe fault-tolerance?
