Top Banner
Deadlocks in Distributed Systems Ryan Clemens, Thomas Levy, Daniel Salloum, Tagore Kolluru, Mike DeMauro
27

Deadlocks in Distributed Systems Ryan Clemens, Thomas Levy, Daniel Salloum, Tagore Kolluru, Mike DeMauro.

Dec 16, 2015

Download

Documents

Sean Tainter
Welcome message from author
This document is posted to help you gain knowledge. Please leave a comment to let me know what you think about it! Share it to your friends and learn new things together.
Transcript
Page 1: Deadlocks in Distributed Systems Ryan Clemens, Thomas Levy, Daniel Salloum, Tagore Kolluru, Mike DeMauro.

Deadlocks in Distributed Systems

Ryan Clemens, Thomas Levy, Daniel Salloum, Tagore Kolluru, Mike

DeMauro

Page 2: Deadlocks in Distributed Systems Ryan Clemens, Thomas Levy, Daniel Salloum, Tagore Kolluru, Mike DeMauro.

Outline

• Distributed systems• Deadlock basics• Strategies• Algorithms• Simulation

Page 3: Deadlocks in Distributed Systems Ryan Clemens, Thomas Levy, Daniel Salloum, Tagore Kolluru, Mike DeMauro.

What is a distributed system?

• A collection of sites which communicate via message passing.– Centralized– Decentralized

Page 4: Deadlocks in Distributed Systems Ryan Clemens, Thomas Levy, Daniel Salloum, Tagore Kolluru, Mike DeMauro.

What is a deadlock?

• A deadlock occurs when all elements of a set, comprised of multiple processes, request a resource which is held by another process in the set.– In distributed systems, a deadlock may arise when

waiting for messages in addition to resources.

Page 5: Deadlocks in Distributed Systems Ryan Clemens, Thomas Levy, Daniel Salloum, Tagore Kolluru, Mike DeMauro.

Conditions for a Deadlock

• Necessary– Mutual Exclusion– Hold and Wait– No Preemption

• Sufficient– Circular Wait

Page 6: Deadlocks in Distributed Systems Ryan Clemens, Thomas Levy, Daniel Salloum, Tagore Kolluru, Mike DeMauro.

Mutual Exclusion

• When one process enters its critical state while accessing a resource no other process can enter their critical state to access the same resource.– Without this, multiple processes could access the

same resource, which would mean they would not have to wait, which would prevent the deadlock.• This would cause data corruption.

– When messages are the resources, mutual exclusion is guaranteed.

Page 7: Deadlocks in Distributed Systems Ryan Clemens, Thomas Levy, Daniel Salloum, Tagore Kolluru, Mike DeMauro.

Hold and Wait

• Occurs when a process gathers and retains certain resources, but is still awaiting other resources before it can continue execution.- Without this, there could be no deadlocks because

if process A is waiting on a resource held by process B, process B is guaranteed to be able to proceed.- This limits what we can program.

Page 8: Deadlocks in Distributed Systems Ryan Clemens, Thomas Levy, Daniel Salloum, Tagore Kolluru, Mike DeMauro.

No Preemption

• No process can take resources held by another process.– If preemption is allowed and mutual exclusion is

maintained, once a resource has been preempted, the process which previously had the resource needs to be placed in a state where it no longer has it.• This may mean killing the process.

– If there is no mutual exclusion, there can be no concept of preemption.

Page 9: Deadlocks in Distributed Systems Ryan Clemens, Thomas Levy, Daniel Salloum, Tagore Kolluru, Mike DeMauro.

Circular Wait

• A set of nodes with transitive relationship are in circular wait if the dependency chain of a node leads back to itself.– Without this, the dependency graph becomes a directed

acyclic graph, which will have a sink which can continue.• The standard way of preventing circular wait is to

create a partial ordering of the resources and only allow a process to request resources which are higher in the ordering than the current highest held.– This is inconvenient if the resources needed by a process

are not known beforehand.

Page 10: Deadlocks in Distributed Systems Ryan Clemens, Thomas Levy, Daniel Salloum, Tagore Kolluru, Mike DeMauro.

Prevention

• Prevention is preventing a process from requesting a resource which would lead to a deadlock.

• A standard method of prevention is to force processes to acquire all their required resources before execution.

• Another method is to remove one of the four conditions for deadlocks.

Page 11: Deadlocks in Distributed Systems Ryan Clemens, Thomas Levy, Daniel Salloum, Tagore Kolluru, Mike DeMauro.

Avoidance

• Avoidance is when a resource is granted to a process only when the resulting state is safe.– A safe state if there is an execution sequence which does not lead

to a deadlock.• Banker’s algorithm: This algorithm needs to know the

current amount of available resources the system has free, the amount of resources each process has, and an upper bound on the amount of resources each process will hold during its execution. It will only grant a request for resources to a process if it leaves the system with enough resources to fulfill the biggest possible request for resources for at least one process.

Page 12: Deadlocks in Distributed Systems Ryan Clemens, Thomas Levy, Daniel Salloum, Tagore Kolluru, Mike DeMauro.

Detection

• No resource checking done in advance• State is stored in some fashion• Periodically checks state for deadlocks

Page 13: Deadlocks in Distributed Systems Ryan Clemens, Thomas Levy, Daniel Salloum, Tagore Kolluru, Mike DeMauro.

Graphs

• a) Resource Allocation Graph (Transaction Wait For)

• b) Wait For Graph (WFG)

Page 14: Deadlocks in Distributed Systems Ryan Clemens, Thomas Levy, Daniel Salloum, Tagore Kolluru, Mike DeMauro.

Recovery

• Kill Member of Detected Deadlock– Random Kill– Priority Kill– Kill Youngest

• Time-out approach

Page 15: Deadlocks in Distributed Systems Ryan Clemens, Thomas Levy, Daniel Salloum, Tagore Kolluru, Mike DeMauro.

Distributed Issues

• False (Phantom) Deadlock• Detection

Page 16: Deadlocks in Distributed Systems Ryan Clemens, Thomas Levy, Daniel Salloum, Tagore Kolluru, Mike DeMauro.

Algorithm Considerations

• Message Passing– Traffic– Length

• Resolution Efficiency

Page 17: Deadlocks in Distributed Systems Ryan Clemens, Thomas Levy, Daniel Salloum, Tagore Kolluru, Mike DeMauro.

Resource Models

• Single-resource• AND• OR• AND-OR

Page 18: Deadlocks in Distributed Systems Ryan Clemens, Thomas Levy, Daniel Salloum, Tagore Kolluru, Mike DeMauro.

Detection Algorithm Classes

• Path-Pushing• Edge-Chasing (probe based)• Diffusing Computations• Global State Detection

Page 19: Deadlocks in Distributed Systems Ryan Clemens, Thomas Levy, Daniel Salloum, Tagore Kolluru, Mike DeMauro.

Selected Algorithms

• Obermarck’s – Path-Pushing– AND model– Obsoleted for inaccurate WFG

• Hermann and Chandy’s – AND-OR– Diffusing computation

• Bracha and Toueg’s– AND-OR– Global state detection

Page 20: Deadlocks in Distributed Systems Ryan Clemens, Thomas Levy, Daniel Salloum, Tagore Kolluru, Mike DeMauro.

Algorithms

• Mitchell and Merritt’s – Single-Resource– Edge chasing– Benefits• Simple• Only one cycle detects deadlock• Not always phantom deadlocks

– Complexity – O(s(s-1)/2)

Page 21: Deadlocks in Distributed Systems Ryan Clemens, Thomas Levy, Daniel Salloum, Tagore Kolluru, Mike DeMauro.

Chandy and Misra’s Algorithm

• Multiple Resource• Diffusing computation• AND Model• Complexity: O(N(N-1))

Page 22: Deadlocks in Distributed Systems Ryan Clemens, Thomas Levy, Daniel Salloum, Tagore Kolluru, Mike DeMauro.

Algorithms

• Probe-based Algorithms– Chandy-Mirsa-Haas– Roesler

• Hierarchical Algorithms– Ho-Ramamoorthy

Page 23: Deadlocks in Distributed Systems Ryan Clemens, Thomas Levy, Daniel Salloum, Tagore Kolluru, Mike DeMauro.

Algorithms (cont)

• Online deadlock detection (Isloor-Marsland)– Immediate deadlock detection

Page 24: Deadlocks in Distributed Systems Ryan Clemens, Thomas Levy, Daniel Salloum, Tagore Kolluru, Mike DeMauro.

Difficulty of Proof

• TWF graphs can form in many ways, which makes it difficult to study all situations

• Deadlocks are sensitive to the timing of requests

• In distributed systems, message latency is unpredictable and there is no global memory

Page 25: Deadlocks in Distributed Systems Ryan Clemens, Thomas Levy, Daniel Salloum, Tagore Kolluru, Mike DeMauro.

Requirements foralgorithm correctness

• At least one process can proceed• No process will be restarted an indefinite

number of times

Page 26: Deadlocks in Distributed Systems Ryan Clemens, Thomas Levy, Daniel Salloum, Tagore Kolluru, Mike DeMauro.

Simulation

• Partial reversion of process instead of killing it• Uses a WFG of the entire system to detect

deadlocks• This algorithm honors mutual exclusion and

implements preemption.• Causes more data corruption than simply

killing the youngest process.– A solution to this is programmers writing their

code anticipating the possibility of a reversion.

Page 27: Deadlocks in Distributed Systems Ryan Clemens, Thomas Levy, Daniel Salloum, Tagore Kolluru, Mike DeMauro.

References• Knapp, Edgar. Deadlock detection in distributed databases, ACM Computing Surveys, Volume 19 Issue 4, Dec. 1987• Natalija Krivokapić, Alfons Kemper, Ehud Gudes. Deadlock detection in distributed database systems: a new algorithm and a

comparative performance analysis. The VLDB Journal — The International Journal on Very Large Data Bases , Volume 8 Issue 2, October 1999– doi:10.1007/s007780050075– URL: http://portal.acm.org/citation.cfm?id=765509.765510&coll=DL&dl=ACM&CFID=17423245&CFTOKEN=33985983

• Singhal, M.; , "Deadlock detection in distributed systems," Computer , vol.22, no.11, pp.37-48, Nov 1989– doi: 10.1109/2.43525

URL: http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=43525&isnumber=1667