
Distributed Algorithms for Failure Detection in Crash Environments

Feb 25, 2016

UPV / EHU

R. Cortiñas, A. Lafuente, M. Larrea

Distributed Systems Group, University of the Basque Country UPV/EHU


Master SIA – Distributed Systems (Sistemas Distribuidos)

Guest Stars: ◇P, ◇S and Omega

• ◇P: strong completeness, eventual strong accuracy
  – Eventually every process that crashes is permanently suspected by every correct process
  – There is a time after which correct processes are not suspected by any correct process

• ◇S: strong completeness, eventual weak accuracy
  – There is a time after which some correct process is never suspected by any correct process

• Omega: eventual leader election
  – There is a time after which all the correct processes always trust the same correct process
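"Eventually" cannot be decided on a finite execution, so the sketch below only approximates these definitions: it checks each property on the suffix of a recorded trace that starts at an assumed stabilization step t0. The trace format and all function names are hypothetical, chosen for illustration.

```python
from typing import Dict, List, Set

# trace[t][p] is the set of processes that correct process p suspects at step t.
Trace = List[Dict[str, Set[str]]]
# leaders[t][p] is the process that correct process p trusts at step t (Omega).
Leaders = List[Dict[str, str]]


def strong_completeness(trace: Trace, crashed: Set[str], t0: int) -> bool:
    """From step t0 on, every crashed process is suspected by every correct process."""
    return all(crashed <= out for step in trace[t0:] for out in step.values())


def eventual_strong_accuracy(trace: Trace, correct: Set[str], t0: int) -> bool:
    """From step t0 on, no correct process is suspected by any correct process."""
    return all(not (correct & out) for step in trace[t0:] for out in step.values())


def eventual_weak_accuracy(trace: Trace, correct: Set[str], t0: int) -> bool:
    """From step t0 on, some correct process is never suspected by any correct process."""
    return any(
        all(q not in out for step in trace[t0:] for out in step.values())
        for q in correct
    )


def eventual_leader(leaders: Leaders, correct: Set[str], t0: int) -> bool:
    """From step t0 on, all correct processes trust the same correct process."""
    trusted = {leader for step in leaders[t0:] for leader in step.values()}
    return len(trusted) == 1 and trusted <= correct


if __name__ == "__main__":
    # p1 has crashed; p2 and p3 are correct.
    trace = [
        {"p2": set(), "p3": {"p2"}},          # p1 not yet suspected, p2 wrongly suspected
        {"p2": {"p1"}, "p3": {"p1", "p2"}},
        {"p2": {"p1"}, "p3": {"p1"}},
        {"p2": {"p1"}, "p3": {"p1"}},
    ]
    correct = {"p2", "p3"}
    print(strong_completeness(trace, {"p1"}, 1))        # True
    print(eventual_strong_accuracy(trace, correct, 1))  # False: p3 still suspects p2
    print(eventual_strong_accuracy(trace, correct, 2))  # True
    print(eventual_weak_accuracy(trace, correct, 1))    # True: p3 is never suspected
```

A trace satisfying strong completeness and eventual strong accuracy from some t0 behaves like ◇P on that run; one satisfying strong completeness and only eventual weak accuracy behaves like ◇S.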


The First ◇P Algorithm [CT96]
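The slide gives only the title; the idea commonly associated with [CT96] is that every process periodically sends an "I am alive" heartbeat to all others, suspects a process whose heartbeat is overdue, and, when a suspected process turns out to be alive, trusts it again and increases the timeout used for it. The sketch below covers only the receiving side of one process, assuming logical time, an initial timeout of 3 and an increment of 1; the class and method names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class HeartbeatDetector:
    """Receiving side of a heartbeat-based ◇P module at one process (hypothetical names)."""
    others: List[str]
    initial_timeout: int = 3
    timeout: Dict[str, int] = field(default_factory=dict)
    last_heard: Dict[str, int] = field(default_factory=dict)
    suspected: Set[str] = field(default_factory=set)

    def __post_init__(self) -> None:
        for q in self.others:
            self.timeout[q] = self.initial_timeout
            self.last_heard[q] = 0

    def on_heartbeat(self, sender: str, now: int) -> None:
        """Called whenever an 'I am alive' message from sender arrives."""
        self.last_heard[sender] = now
        if sender in self.suspected:
            # The suspicion was wrong: trust sender again and wait longer for it next time.
            self.suspected.discard(sender)
            self.timeout[sender] += 1

    def check(self, now: int) -> None:
        """Suspect every process whose next heartbeat is overdue."""
        for q in self.others:
            if q not in self.suspected and now - self.last_heard[q] > self.timeout[q]:
                self.suspected.add(q)


if __name__ == "__main__":
    d = HeartbeatDetector(others=["p2", "p3"])
    d.check(now=4)               # no heartbeat yet from anyone: both suspected
    d.on_heartbeat("p2", now=5)  # p2 was only slow: trusted again, timeout 3 -> 4
    print(sorted(d.suspected))   # ['p3']
    print(d.timeout["p2"])       # 4
```

Growing the timeout after each false suspicion is what lets the detector eventually stop suspecting correct processes once message delays become bounded. Because every process heartbeats every other one, this pattern keeps all n(n - 1) directed links busy forever, which is the cost the communication-efficient and communication-optimal algorithms reduce.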


Communication Optimality

[Figure: a ring arrangement of processes p1 to p6]


[Figure: processes p1 to p6]

Communication-efficient algorithms: only n links are used forever, where n is the number of processes
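As a rough illustration, assuming a "link" is a directed (sender, receiver) pair that keeps carrying messages forever, the sketch counts the links of an all-to-all heartbeat pattern and of a ring like the one in the figure. The function names are hypothetical.

```python
from typing import List, Set, Tuple

Link = Tuple[str, str]  # directed link: (sender, receiver)


def all_to_all_links(procs: List[str]) -> Set[Link]:
    """Every process keeps sending heartbeats to every other process."""
    return {(p, q) for p in procs for q in procs if p != q}


def ring_links(procs: List[str]) -> Set[Link]:
    """Each process keeps sending heartbeats only to its successor on the ring."""
    n = len(procs)
    return {(procs[i], procs[(i + 1) % n]) for i in range(n)}


if __name__ == "__main__":
    procs = [f"p{i}" for i in range(1, 7)]  # p1, ..., p6 as in the figure
    print(len(all_to_all_links(procs)))      # 30 = n(n - 1)
    print(len(ring_links(procs)))            # 6 = n
```

With six processes, the ring keeps 6 links busy instead of 30, and the gap grows quadratically with n.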


[Figure: processes p1 to p6]

Communication-optimal algorithms: C links are used forever


Communication-optimal ◇P


Communication-optimal Omega

• We also propose a communication-optimal implementation of ◇S, the weakest failure detector for solving Consensus:
  – processes are ordered: p1, ..., pn
  – heartbeat strategy
  – communication pattern: one-to-successors
  – based on a trusted process (instead of a list of suspected processes)


i) Initially, p1 starts sending messages periodically to the rest of the processes, and all processes trust p1.

[Figure: processes p1 to p5]
trusted1 = p1, trusted2 = p1, trusted3 = p1, trusted4 = p1, trusted5 = p1


ii) If a process does not receive a message from its trusted process pi within some timeout period, then it suspects pi and takes the next process, pi+1, as its new trusted process.

[Figure: processes p1 to p5; p4 times out on p1]
trusted1 = p1, trusted2 = p1, trusted3 = p1, trusted4 = p2 (timeout on p1), trusted5 = p1


iii) If a process trusts itself, then it starts sending messages periodically to its successors.

[Figure: processes p1 to p5; p2 times out on p1]
trusted1 = p1, trusted2 = p2 (timeout on p1), trusted3 = p1, trusted4 = p2, trusted5 = p1


iv) If a process receives a message from a process pi that precedes its trusted process, then it trusts pi again and increases its timeout period with respect to pi.

[Figure: processes p1 to p5; p2 and p4 receive messages from p1]
trusted1 = p1
trusted2 = p1 (message from p1; timeout_period21++)
trusted3 = p2
trusted4 = p1 (message from p1; timeout_period41++)
trusted5 = p1
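Rules i) to iv) can be combined into a per-process state machine. The sketch below is one possible reading under assumptions the slides do not fix: logical time, an initial timeout of 3, an increment of 1 (the slides only say the timeout is increased), and the hypothetical names TrustedProcessModule, check, on_message and destinations. Actual message sending is left out; destinations() only reports which processes should receive periodic messages.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class TrustedProcessModule:
    """Local state of process p_me, with processes ordered p1, ..., pn (hypothetical names)."""
    me: int                  # 1-based index of this process
    n: int                   # total number of processes
    initial_timeout: int = 3
    trusted: int = 1         # rule i: every process initially trusts p1
    timeout: Dict[int, int] = field(default_factory=dict)
    last_heard: int = 0      # last time a message from the trusted process arrived

    def __post_init__(self) -> None:
        self.timeout = {j: self.initial_timeout for j in range(1, self.n + 1)}

    def destinations(self) -> List[int]:
        """Rules i and iii: a process that trusts itself sends to its successors."""
        if self.trusted == self.me:
            return list(range(self.me + 1, self.n + 1))
        return []

    def on_message(self, sender: int, now: int) -> None:
        if sender == self.trusted:
            self.last_heard = now
        elif sender < self.trusted:
            # Rule iv: a process preceding the trusted one is alive after all,
            # so trust it again and increase the timeout used for it.
            self.trusted = sender
            self.timeout[sender] += 1
            self.last_heard = now
        # Messages from processes after the trusted one are ignored.

    def check(self, now: int) -> None:
        # Rule ii: on timeout, suspect the trusted process and move to the next one.
        # A process never times out on itself, so trusted never exceeds me.
        if self.trusted != self.me and now - self.last_heard > self.timeout[self.trusted]:
            self.trusted += 1
            self.last_heard = now  # give the new trusted process a full timeout


if __name__ == "__main__":
    p4 = TrustedProcessModule(me=4, n=5)
    p4.check(now=4)                    # rule ii: nothing from p1 for too long, trust p2
    print(p4.trusted)                  # 2
    p4.on_message(sender=1, now=5)     # rule iv: p1 is alive after all
    print(p4.trusted, p4.timeout[1])   # 1 4
```

The usage replays p4's view of the example in ii) and iv). Only a process that trusts itself sends, and only to its successors, which matches the one-to-successors pattern; once everyone trusts the first correct process, only that process keeps sending.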


• Lemma. With the previous algorithm, eventually all the correct processes will permanently trust the first correct process in p1, ..., pn

• This property trivially allows us to provide the properties of ◇S:
  – Eventual weak accuracy: by not suspecting the trusted process
  – Strong completeness: by suspecting all the processes except the trusted process
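The derivation of the ◇S output from the trusted process is a one-liner. The function name below is hypothetical, and the sketch assumes processes are numbered 1..n as in p1, ..., pn.

```python
from typing import Set


def suspects_from_trusted(trusted: int, n: int) -> Set[int]:
    """◇S output of one process: suspect every process except the trusted one.

    Eventual weak accuracy: by the lemma, every correct process eventually trusts
    the first correct process forever, so that process is never suspected again.
    Strong completeness: that permanently trusted process is correct, so from then
    on every crashed process is permanently in the suspected set.
    """
    return {j for j in range(1, n + 1) if j != trusted}


if __name__ == "__main__":
    print(sorted(suspects_from_trusted(trusted=2, n=5)))  # [1, 3, 4, 5]
```

The output may also include correct processes other than the trusted one; ◇S allows this, since it only requires that some correct process eventually stops being suspected.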
