Lecture 26-1 Computer Science 425 Distributed Systems CS 425 / ECE 428 Fall 2014 Indranil Gupta (Indy) Lecture 26 Self-Stabilization Reading: Relevant sections from Ghosh’s textbook © Indranil Gupta, Sayan Mitra

Jan 20, 2016

Jewel Lane
Lecture 26-1

Computer Science 425 Distributed Systems

CS 425 / ECE 428

Fall 2014

Indranil Gupta (Indy)

Lecture 26

Self-Stabilization

Reading: Relevant sections from Ghosh’s textbook

© Indranil Gupta, Sayan Mitra

Lecture 26-2

Motivation

• As the number of computing elements in distributed systems increases, failures become more common

• We want fault tolerance to be automatic, without external intervention

• Two kinds of fault tolerance

– masking: the application layer does not see faults, e.g., redundancy and replication

– non-masking: the system deviates, the deviation is detected, and then it is corrected, e.g., rollback and recovery

• Self-stabilization is a general technique for non-masking fault tolerance in distributed systems

• We deal only with transient failures, which corrupt data, and not with crash-stop failures

Lecture 26-3

Self-stabilization

• Technique for spontaneous healing

• Guarantees eventual safety following failures

Feasibility demonstrated by E. Dijkstra (CACM ’74)

Lecture 26-4

Self-stabilizing systems

• Recover from any initial configuration to a legitimate configuration in a bounded number of steps, as long as the processes are not further corrupted

• Assumption: failures affect the state (and data) but not the program code

Lecture 26-5

Self-stabilizing systems

• The ability to spontaneously recover from any initial state implies that no initialization is ever required

• Such systems can be deployed ad hoc, and are guaranteed to function properly within a bounded number of steps

• Guarantees fault tolerance when the mean time between failures (MTBF) >> mean time to recovery (MTTR)

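Lecture 26-6

Self-stabilizing systems

• Self-stabilizing systems exhibit non-masking fault tolerance

• They satisfy the following two criteria

– Convergence: from any configuration, the system reaches a legal configuration in a finite number of steps

– Closure: once the system is in a legal configuration, every move leads to another legal configuration

[Figure: transitions between illegal configurations (Not L) and legal configurations (L), labeled convergence (Not L to L), closure (within L), and fault (L to Not L)]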

Lecture 26-7

Example 1: Stabilizing mutual exclusion in a unidirectional ring

[Figure: processes 0, 1, …, N-1 arranged in a ring]

• Consider a unidirectional ring of processes (counter-clockwise)

• One special process is the process with id 0

• Legal configuration = exactly one token in the ring (safety)

• Desired “normal” behavior: a single token circulates in the ring

Lecture 26-8

Dijkstra’s stabilizing mutual exclusion

p0: if x[0] = x[N-1] then x[0] := x[0] + 1

pj (j > 0): if x[j] ≠ x[j-1] then x[j] := x[j-1]

The increment wraps around after K-1, i.e., x[0] + 1 is computed mod K

N processes: 0, 1, …, N-1; the state of process j is x[j] ∈ {0, 1, 2, …, K-1}, where K > N

A process p holds the TOKEN exactly when its “if” condition is true

Legal configuration: exactly one process has the token

The system can be started from an arbitrary initial configuration
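The two rules above can be sketched in Python as follows. This is a minimal illustration, not the lecture’s own code: the function names has_token, move, tokens, and is_legal are hypothetical, the configuration is assumed to be a list x of N integers in [0, K), and processes are assumed to move one at a time.

```python
# Hypothetical sketch of Dijkstra's K-state rules; x is a list of N ints in [0, K).

def has_token(x: list[int], j: int) -> bool:
    """True iff the "if" condition (guard) of process j is true."""
    if j == 0:
        return x[0] == x[-1]          # p0: x[0] = x[N-1]
    return x[j] != x[j - 1]           # pj: x[j] != x[j-1]


def move(x: list[int], j: int, k: int) -> list[int]:
    """Return the configuration after process j (which must hold a token) moves."""
    if not has_token(x, j):
        raise ValueError(f"process {j} does not hold a token")
    y = list(x)
    if j == 0:
        y[0] = (x[0] + 1) % k         # increment, wrapping around after K-1
    else:
        y[j] = x[j - 1]               # copy the predecessor's value
    return y


def tokens(x: list[int]) -> list[int]:
    """Ids of all processes that currently hold a token."""
    return [j for j in range(len(x)) if has_token(x, j)]


def is_legal(x: list[int]) -> bool:
    """Legal configuration: exactly one process holds a token."""
    return len(tokens(x)) == 1


# Example with N = 4, K = 5.
print(tokens([0, 0, 0, 0]))   # [0]: only p0 holds the token
print(tokens([0, 1, 0, 4]))   # [1, 2, 3]: an illegal configuration
```

Returning a fresh list from move keeps each configuration intact, which makes it easy to compare the state before and after a move.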

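Lecture 26-9

Example execution

[Figure: starting with x[j] = 0 at every process, p0 increments x[0] to 1; the value 1 is then copied around the ring one process at a time until every x[j] = 1; p0 then increments x[0] to 2, and so on, up to every x[j] = K-1]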
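The legal execution in the figure can be reproduced with a short simulation. This is a hedged sketch: it assumes the system starts with x[j] = 0 at every process, and the helper names holders and run_from_zero are hypothetical. Because the configuration stays legal, exactly one process is enabled at each step, so the run is deterministic.

```python
def holders(x):
    """Processes whose "if" condition is true (i.e., that hold a token)."""
    return [j for j in range(len(x))
            if (x[0] == x[-1] if j == 0 else x[j] != x[j - 1])]


def run_from_zero(n, k, steps):
    """Run `steps` moves from the all-zero configuration.

    Returns the sequence of token holders and the final configuration."""
    x = [0] * n
    trace = []
    for _ in range(steps):
        h = holders(x)
        assert len(h) == 1, "a legal execution has exactly one token"
        j = h[0]
        x[j] = (x[0] + 1) % k if j == 0 else x[j - 1]
        trace.append(j)
    return trace, x


trace, x = run_from_zero(n=4, k=5, steps=8)
print(trace)  # [0, 1, 2, 3, 0, 1, 2, 3]: the token circulates around the ring
print(x)      # [2, 2, 2, 2]
```

Each full circulation of the token raises every value by one (mod K), so after N·K moves the ring is back to all zeros.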


Lecture 26-10

Stabilizing execution

[Figure: an execution that starts from an illegal configuration with several tokens (values 0, 1, and 4 around the ring) and converges to x[j] = 0 at every process, i.e., a single token at p0]
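A stabilizing execution like the one in the figure can be simulated by letting a scheduler pick any token holder at each step. In this hedged sketch the random scheduler, the seed, and the names holders and stabilize are assumptions for illustration; the number of moves needed depends on the starting configuration and on the scheduler’s choices.

```python
import random


def holders(x):
    """Processes whose "if" condition is true (i.e., that hold a token)."""
    return [j for j in range(len(x))
            if (x[0] == x[-1] if j == 0 else x[j] != x[j - 1])]


def stabilize(x, k, rng, max_moves=100_000):
    """Move randomly chosen token holders until exactly one token remains.

    Returns (number of moves taken, final legal configuration)."""
    x = list(x)
    moves = 0
    while len(holders(x)) > 1:
        if moves >= max_moves:
            raise RuntimeError("no convergence within max_moves")
        j = rng.choice(holders(x))       # scheduler picks any enabled process
        x[j] = (x[0] + 1) % k if j == 0 else x[j - 1]
        moves += 1
    return moves, x


rng = random.Random(425)
n, k = 8, 9                              # K > N, as the algorithm requires
start = [rng.randrange(k) for _ in range(n)]
moves, final = stabilize(start, k, rng)
print(start, "->", final, "after", moves, "moves")
```

Whatever the seed, the loop ends with exactly one token; the argument on the next slides explains why.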


Lecture 26-11

What Happens

• Legal configuration = a configuration with a single token

• Perturbations or failures take the system to configurations with multiple tokens

– e.g., the mutual exclusion property may be violated

• Within a finite number of steps, if no further failures occur, the system returns to a legal configuration

[Figure: transitions between illegal configurations (Not L) and legal configurations (L), labeled convergence (Not L to L), closure (within L), and fault (L to Not L)]

Lecture 26-12

Why does it work?

1. At any configuration, at least one process can make a move (has a token)

2. The set of legal configurations is closed under all moves

3. The total number of possible moves (i.e., tokens) never increases over successive configurations

4. Any illegal configuration C converges to a legal configuration in a finite number of moves


Lecture 26-13

Why does it work?

1. At any configuration, at least one process can make a move (has a token), i.e., the “if” condition cannot be false at all processes at once

– Proof by contradiction: suppose no process can make a move

– Then p1, …, pN-1 cannot make a move, so x[j] = x[j-1] for every j > 0

– Then x[N-1] = x[N-2] = … = x[0]

– But then the guard of p0, x[0] = x[N-1], is true, so p0 can make a move => contradiction
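For small rings, property 1 can also be checked by brute force over all K^N configurations. This sketch is only an illustration for small N and K (the helper names are hypothetical) and does not replace the proof. Since the proof above never uses K > N, the check is run for other values of K as well.

```python
from itertools import product


def holders(x):
    """Processes whose "if" condition is true (i.e., that hold a token)."""
    return [j for j in range(len(x))
            if (x[0] == x[-1] if j == 0 else x[j] != x[j - 1])]


def some_process_can_move(n, k):
    """True iff every one of the k**n configurations has at least one token."""
    return all(holders(x) for x in product(range(k), repeat=n))


for n in range(1, 6):
    for k in range(2, 7):
        assert some_process_can_move(n, k), (n, k)
print("property 1 holds for all N <= 5, K <= 6")
```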


Lecture 26-14

Why does it work?

1. At any configuration, at least one process can make a move (has a token)

2. The set of legal configurations is closed under all moves

– If only p0 can make a move, then x[i] = x[j] for all i, j. After p0’s move, only p1 can make a move

– If only pi (i ≠ 0) can make a move, then

» for all j < i, x[j] = x[i-1]

» for all k ≥ i, x[k] = x[i]

» x[i-1] ≠ x[i], and

» x[0] ≠ x[N-1]

In this case, after pi’s move only pi+1 can move (if i = N-1, all values become equal and only p0 can move)
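The case analysis for closure can be cross-checked exhaustively for small N and K. In this hedged sketch (hypothetical helper names, small parameters only), every legal configuration is generated, its single token holder pi moves, and the result must again be legal with the token at p(i+1) mod N.

```python
from itertools import product


def holders(x):
    """Processes whose "if" condition is true (i.e., that hold a token)."""
    return [j for j in range(len(x))
            if (x[0] == x[-1] if j == 0 else x[j] != x[j - 1])]


def move(x, j, k):
    """Configuration (as a tuple) after process j executes its action."""
    y = list(x)
    y[j] = (x[0] + 1) % k if j == 0 else x[j - 1]
    return tuple(y)


def closed_under_moves(n, k):
    """Check that each legal configuration passes its token to the next process."""
    for x in product(range(k), repeat=n):
        h = holders(x)
        if len(h) != 1:
            continue                      # skip illegal configurations
        i = h[0]
        if holders(move(x, i, k)) != [(i + 1) % n]:
            return False
    return True


print(all(closed_under_moves(n, n + 1) for n in range(2, 6)))  # True
```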


Lecture 26-15

Why does it work?

1. At any configuration, at least one process can make a move (has a token)

2. The set of legal configurations is closed under all moves

3. The total number of possible moves (i.e., tokens) never increases over successive configurations

– A move by pi changes only x[i], so it can affect only the guards of pi and pi+1 (indices mod N). The move makes pi’s own guard false, and it can enable a move only at pi+1, so the number of tokens cannot increase
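Property 3 can be checked the same way for small rings. This hedged sketch (hypothetical helper names, small N and K only) tries every enabled move from every configuration and verifies the three facts in the argument above: the mover loses its token, a new token can appear only at p(i+1) mod N, and the token count does not grow.

```python
from itertools import product


def holders(x):
    """Processes whose "if" condition is true (i.e., that hold a token)."""
    return [j for j in range(len(x))
            if (x[0] == x[-1] if j == 0 else x[j] != x[j - 1])]


def move(x, j, k):
    """Configuration (as a tuple) after process j executes its action."""
    y = list(x)
    y[j] = (x[0] + 1) % k if j == 0 else x[j - 1]
    return tuple(y)


def tokens_never_increase(n, k):
    """Check every enabled move from every configuration of n processes."""
    for x in product(range(k), repeat=n):
        before = set(holders(x))
        for i in before:
            after = set(holders(move(x, i, k)))
            if i in after:
                return False              # the mover kept its token
            if not (after - before).issubset({(i + 1) % n}):
                return False              # a token appeared somewhere else
            if len(after) > len(before):
                return False              # the token count increased
    return True


print(all(tokens_never_increase(n, k)
          for n in range(2, 6) for k in range(2, n + 3)))  # True
```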


Lecture 26-16

Why does it work?

1. At any configuration, at least one process can make a move (has a token)

2. The set of legal configurations is closed under all moves

3. The total number of possible moves (i.e., tokens) never increases over successive configurations

4. Any illegal configuration C converges to a legal configuration in a finite number of moves

– There must be a value, say v, that does not appear in C (since there are N processes but K > N values)

– Except for p0, none of the processes create new values (since they only copy values)

– Without moves by p0, the other processes can make only finitely many moves, yet by (1) some process can always move; thus p0 takes infinitely many steps, and since it only self-increments, it eventually sets x[0] = v (within K of its own steps)

– p0 cannot move again until x[N-1] = v. Since v appears nowhere else, it must be copied around the ring one process at a time (N-1 copies); at that point x[j] = v at every process, which is a legal configuration
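The convergence argument can be explored numerically for tiny rings. This hedged sketch assumes a scheduler that may pick any token holder at each step, and computes, over all starting configurations and all scheduler choices, the longest run of moves before a legal configuration is reached. The names are hypothetical. If illegal configurations could repeat forever, the search would find a cycle and raise an error, so a finite answer is evidence of convergence for the tested parameters only.

```python
from itertools import product


def holders(x):
    """Processes whose "if" condition is true (i.e., that hold a token)."""
    return [j for j in range(len(x))
            if (x[0] == x[-1] if j == 0 else x[j] != x[j - 1])]


def move(x, j, k):
    """Configuration (as a tuple) after process j executes its action."""
    y = list(x)
    y[j] = (x[0] + 1) % k if j == 0 else x[j - 1]
    return tuple(y)


def worst_case_moves(n, k):
    """Longest number of moves, from any start, until a legal configuration."""
    memo = {}          # configuration -> longest path to a legal configuration
    on_path = set()    # configurations on the current search path (cycle check)

    def longest(x):
        if x in memo:
            return memo[x]
        h = holders(x)
        if len(h) == 1:
            memo[x] = 0                   # already legal
            return 0
        if x in on_path:
            raise ValueError(f"illegal configurations form a cycle through {x}")
        on_path.add(x)
        best = 1 + max(longest(move(x, j, k)) for j in h)
        on_path.remove(x)
        memo[x] = best
        return best

    return max(longest(x) for x in product(range(k), repeat=n))


for n in range(2, 5):
    print(n, worst_case_moves(n, n + 1))
```

The printed values are worst cases for these tiny parameters only, not a general formula.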


Lecture 26-17

Putting it All Together

• Legal configuration = a configuration with a single token

• Perturbations or failures take the system to configurations with multiple tokens

– e.g., the mutual exclusion property may be violated

• Within a finite number of steps, if no further failures occur, the system returns to a legal configuration

[Figure: transitions between illegal configurations (Not L) and legal configurations (L), labeled convergence (Not L to L), closure (within L), and fault (L to Not L)]

Lecture 26-18

Summary

• There are many more self-stabilizing algorithms

– Self-stabilizing distributed spanning tree

– Self-stabilizing distributed graph coloring

– Not covered in the course – look them up on the web!

Lecture 26-19

Reminders

• MP2 and HW4 are due soon after the break

– I hope you’ve already started. If not, start now! Don’t start after the break; it’s too late then.

• Only 3 lectures left!

• Have a good Thanksgiving break!

• (No lectures or office hours next week)