Checkpointing Approach for Multiple Processor Failures IEEE TRANSACTIONS ON DEPENDABLE AND SECURE COMPUTING Ge-Ming Chiu, Member, IEEE Computer Society, and Jane-Ferng Chiu Presented By, Linda Maria Pulickal S7 CSE

A New Diskless Check Pointing Approach

Aug 30, 2014


Linda Pulickal
Transcript
Page 1: A New Diskless Check Pointing Approach

A New Diskless Checkpointing Approach for Multiple Processor Failures

IEEE TRANSACTIONS ON DEPENDABLE AND SECURE COMPUTING

Ge-Ming Chiu, Member, IEEE Computer Society, and Jane-Ferng Chiu

Presented by Linda Maria Pulickal

S7 CSE

Page 2: A New Diskless Check Pointing Approach

INTRODUCTION

Checkpoint: a snapshot of the current application state.

Used to restart execution in case of failure.

Very important in large-scale distributed computing.

Page 3: A New Diskless Check Pointing Approach

What is Diskless Checkpointing?

Checkpoints are stored in the main memory (primary storage) of peer processors.

No secondary storage is needed, which saves time.

Page 4: A New Diskless Check Pointing Approach

Advantages of the Diskless Approach

Avoids disk-access latency, which reduces performance degradation.

Usable when stable storage is unavailable, e.g., in mobile computing systems.

Effective at large scale (10,000-100,000 processors).

Page 5: A New Diskless Check Pointing Approach

Diskless checkpointing

Neighbor-based: each processor saves its checkpoints in their entirety in the memory of peer processors.

Parity-based: uses a dedicated checkpoint processor to store the parity of the checkpoints taken by all the application processors, computed using XOR operations.

Reed-Solomon coding-based: encodes the checkpoints of multiple processors using Reed-Solomon erasure coding techniques.

Page 6: A New Diskless Check Pointing Approach

Problems with Existing Techniques

Extra dedicated processors are needed to store checkpoint data.

Extra processors can be difficult to find, e.g., in mobile computing systems.

Adding processors increases the failure probability.

Memory overhead.

Page 7: A New Diskless Check Pointing Approach

System Model

A collection of n processors (or nodes), P0, P1, P2, ..., Pn-1, interconnected by a (wired or wireless) network.

Page 8: A New Diskless Check Pointing Approach

GOALS

1. A diskless checkpointing scheme that tolerates up to k simultaneous failures.

2. Reduced memory overhead.

Page 9: A New Diskless Check Pointing Approach

Basic Operation of the Proposed Scheme

Page 10: A New Diskless Check Pointing Approach

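Important terms:

1. Checkpoint storage nodes (CSi): the processors to which Pi sends its checkpoint.

2. Checkpoint coverage nodes (CCi): the processors whose checkpoints Pi combines into its parity.

[Figure: checkpoint storage (CSi) and checkpoint coverage (CCi) sets of a processor; node labels P1-P5, S5, S7-S10.]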

Page 11: A New Diskless Check Pointing Approach

Steps:

Each Pi sends its checkpoint to at least k other processors (CSi).

-- At least one node in CSi will remain alive for each failed processor.

Pi also stores a copy of its state in a distinct section of its memory.

-- This helps other failed processors decode their previous checkpoints.

Page 12: A New Diskless Check Pointing Approach

Each Pi calculates the parity of the checkpoints received from CCi using XOR.

Only the parity result is stored in memory.

Advantage: the parity needs memory space equal in size to the largest checkpoint only.
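
The parity step can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names, the sample checkpoint contents, and the zero-padding of shorter checkpoints are all assumptions made for the example.

```python
from functools import reduce

def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Bitwise XOR of two byte strings; the shorter one is zero-padded (assumption)."""
    size = max(len(a), len(b))
    a = a.ljust(size, b"\0")
    b = b.ljust(size, b"\0")
    return bytes(x ^ y for x, y in zip(a, b))

def parity(checkpoints):
    """Fold all checkpoints received from the coverage set into one parity block."""
    return reduce(xor_bytes, checkpoints, b"")

# Hypothetical checkpoints received from three coverage nodes.
received = [b"state-of-P1", b"P2", b"checkpoint-P5"]
stored = parity(received)

# The stored parity is only as large as the largest checkpoint.
print(len(stored) == max(len(c) for c in received))
```

Because only the running XOR is kept, the storage cost at Pi does not grow with the number of coverage nodes.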

Page 13: A New Diskless Check Pointing Approach

The conceptual framework of diskless checkpointing approach.

Page 14: A New Diskless Check Pointing Approach

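Recovery

P5 wants to recover; node P6, whose parity covers P5, is used.

P6 state:

S6 = P1 + P2 + P5

P5 = S6 – P1 – P2

(With XOR parity, both + and – denote bitwise XOR.)

[Figure: checkpoints of P1, P2, and P5 combined into the parity S6 held by P6.]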
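
A small sketch of this recovery, assuming equal-length checkpoints and XOR as the parity operation; the byte values are made up for illustration.

```python
def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Bitwise XOR of two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))

# Hypothetical checkpoints of P1, P2, and P5 (same length for simplicity).
p1, p2, p5 = b"AAAA", b"BBBB", b"CCCC"

# Parity held by P6 before the failure: S6 = P1 xor P2 xor P5.
s6 = xor_bytes(xor_bytes(p1, p2), p5)

# P5 fails. P1 and P2 supply their local copies and P6 supplies S6.
recovered = xor_bytes(xor_bytes(s6, p1), p2)
print(recovered == p5)
```

XOR is its own inverse, so "subtracting" P1 and P2 from S6 is the same operation as adding them.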

Page 15: A New Diskless Check Pointing Approach

DETERMINING THE CHECKPOINT STORAGE NODE SET

Safe Recovery Criterion: For any failed processor Pi, at least one node in CSi is alive and has all of its other checkpoint coverage nodes intact.
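
The criterion can be checked mechanically. The sketch below is an illustration under assumptions: the function name is hypothetical, and the five-node coverage map `cc` is an example layout chosen for the demonstration, not a design recommended by the paper.

```python
def safely_recoverable(cc, failed):
    """Return True if every failed processor meets the safe recovery criterion.

    cc maps each node to the set of processors covered by its parity.
    """
    failed = set(failed)
    for i in failed:
        # Storage nodes of Pi: every node whose coverage set contains i.
        storage = [s for s, covered in cc.items() if i in covered]
        usable = any(
            s not in failed and not ((cc[s] - {i}) & failed)
            for s in storage
        )
        if not usable:
            return False
    return True

# Illustrative five-node layout: node 0 covers P3 and P4, node 1 covers P0 and P4, ...
cc = {0: {3, 4}, 1: {0, 4}, 2: {0, 1}, 3: {1, 2}, 4: {2, 3}}

print(safely_recoverable(cc, {0, 3}))  # True
print(safely_recoverable(cc, {0, 1}))  # False
```

In the second case P1 has failed and P2's parity mixes P0 with the failed P1, so no storage node of P0 can decode it.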

Page 16: A New Diskless Check Pointing Approach

Fundamentals of CSi

The cardinality of CSi must be at least k.

The cardinality of CCi is k, to ensure good load balance.

Page 17: A New Diskless Check Pointing Approach

[Figure: five-node example; node labels P0, P3, P4, S0, S2, S4.]

S0 = P3 + P4

S1 = P2 + P3

S2 = P0 + P1

S3 = P2 + P4

S4 = P0 + P1

Not a good design. Why? CS0 ∩ CS1 = {P2, P4}, which has more than one element.
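
A rough sketch of how the storage sets follow from the parity equations on this slide. The dictionary `cc` is a reading of those equations (S2 = P0 + P1 is taken to mean that node 2 covers P0 and P1); the function name is illustrative.

```python
def storage_sets(cc):
    """Invert coverage sets: CS[i] holds every node whose parity covers Pi."""
    cs = {i: set() for i in cc}
    for node, covered in cc.items():
        for i in covered:
            cs[i].add(node)
    return cs

# Coverage sets read off the slide: S0 = P3 + P4 means CC0 = {3, 4}, and so on.
cc = {0: {3, 4}, 1: {2, 3}, 2: {0, 1}, 3: {2, 4}, 4: {0, 1}}
cs = storage_sets(cc)

shared = cs[0] & cs[1]
print(sorted(shared))  # [2, 4]
```

If P0 and P1 fail together, P2 and P4 both hold only P0 xor P1, so neither checkpoint can be separated out.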

Page 18: A New Diskless Check Pointing Approach

[Figure: five-node example; node labels P0, P3, P4, S0, S1, S2.]

S0 = P3 + P4

S1 = P0 + P4

S2 = P0 + P1

S3 = P1 + P2

S4 = P2 + P3

Not a good design. Why? P1 ∈ CS0, yet CS0 ∩ CS1 = {P2}, which is not empty.

Page 19: A New Diskless Check Pointing Approach

Theorems

For all Pi and Pr,

(1) |CSi ∩ CSr| ≤ 1, i ≠ r

For each Pi,

(2) CSi ∩ CSr = ∅, for any Pr ∈ CSi.
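
Both conditions are easy to test for a given set of storage sets. The sketch below uses hypothetical helper names and an illustrative five-node layout in which each CSi consists of the next two nodes in a ring; it is meant only to show what each condition rules out.

```python
from itertools import combinations

def condition1(cs):
    """(1): any two distinct processors share at most one storage node."""
    return all(len(cs[i] & cs[r]) <= 1 for i, r in combinations(cs, 2))

def condition2(cs):
    """(2): Pi shares no storage node with any of its own storage nodes."""
    return all(not (cs[i] & cs[r]) for i in cs for r in cs[i])

# Illustrative layout: CSi = {P(i+1), P(i+2)} on a ring of five nodes.
cs = {0: {1, 2}, 1: {2, 3}, 2: {3, 4}, 3: {0, 4}, 4: {0, 1}}

print(condition1(cs))  # True
print(condition2(cs))  # False
```

Here condition (2) fails because P1 is a storage node of P0 while both also rely on P2.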

Page 20: A New Diskless Check Pointing Approach

Design of CSi’s

Page 21: A New Diskless Check Pointing Approach

Only focus on the CS0 design

Cyclic design concept.

Every CSi is derived from CS0 as

CSi = { P(j+i) mod n : Pj ∈ CS0 }
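
A minimal sketch of the cyclic construction, assuming that CSi is obtained by adding i (mod n) to the index of every node in CS0. The function name and the sample CS0 are illustrative.

```python
def cyclic_storage_sets(cs0, n):
    """Derive every CSi by shifting the indices in CS0 by i, modulo n (assumption)."""
    return {i: {(j + i) % n for j in cs0} for i in range(n)}

# Illustrative input: CS0 = {P1, P2} in a five-node system.
cs = cyclic_storage_sets({1, 2}, 5)
for i in sorted(cs):
    print(i, sorted(cs[i]))
```

Under this assumption the whole layout is fixed by CS0 and n, which is why the design effort can concentrate on CS0 alone.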

Page 22: A New Diskless Check Pointing Approach

Design of CS0

PSR sequence: d0, d1, d2, ..., dr-1 is PSR if no l, m, p, and q with 0 ≤ l ≤ m < p ≤ q ≤ r – 1 satisfy

dl + ... + dm = dp + ... + dq

E.g., 2, 1, 5, 3 is not PSR (2 + 1 = 3); 1, 3, 5, 2 is PSR.

PSR ensures that no two processors share more than one checkpoint storage node.
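
A brute-force check of the PSR property, written under the assumption that the forbidden condition is two disjoint runs of consecutive elements having equal sums. That reading is consistent with both examples on this slide; the function name is hypothetical.

```python
from itertools import accumulate

def is_psr(d):
    """Return True if no two disjoint runs of consecutive elements have equal sums."""
    prefix = [0, *accumulate(d)]  # prefix[i] = d[0] + ... + d[i-1]
    r = len(d)
    for l in range(r):
        for m in range(l, r):
            for p in range(m + 1, r):
                for q in range(p, r):
                    if prefix[m + 1] - prefix[l] == prefix[q + 1] - prefix[p]:
                        return False
    return True

print(is_psr([2, 1, 5, 3]))  # False, since 2 + 1 == 3
print(is_psr([1, 3, 5, 2]))  # True
```

The four nested loops are fine here because the sequence has only k - 1 elements.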

Page 23: A New Diskless Check Pointing Approach

Steps for k = 4:

Construct a PSR sequence of 3 (i.e., k – 1) positive integers.

Select the sequence with the minimum sum, D. E.g., d0 = 1, d1 = 3, d2 = 2, and D = 6.

First element of CS0: PD+1 = P7.

Add d0, d1, d2 cumulatively as increments starting from P7.

CS0 = {P7, P8, P11, P13}
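
The arithmetic of these steps can be sketched as below. The function name is hypothetical, and the PSR sequence is taken as given rather than searched for.

```python
from itertools import accumulate

def build_cs0(d):
    """Build the node indices of CS0 from a PSR sequence d with sum D."""
    D = sum(d)
    first = D + 1  # index of the first element of CS0
    # Each increment is added on top of the previous node index.
    return [first + offset for offset in accumulate(d, initial=0)]

print(build_cs0([1, 3, 2]))  # [7, 8, 11, 13]
```

The increments are cumulative: 7, then 7 + 1 = 8, then 8 + 3 = 11, then 11 + 2 = 13.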

Page 24: A New Diskless Check Pointing Approach
Page 25: A New Diskless Check Pointing Approach

Requirements

The total number of processors in the system must be ≥ 3D + 2.

This ensures that Theorem 2 holds.
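
The bound can be probed numerically for the k = 4 example (D = 6, CS0 = {P7, P8, P11, P13}). The sketch assumes the cyclic derivation of CSi from CS0 by index shifts modulo n; helper names are illustrative, and the check covers this one example rather than proving the bound in general.

```python
from itertools import combinations

def cyclic_sets(cs0, n):
    """Derive every CSi by shifting CS0 by i, modulo n (assumption)."""
    return {i: {(j + i) % n for j in cs0} for i in range(n)}

def check_theorems(cs):
    """Return (condition 1 holds, condition 2 holds) for the storage sets cs."""
    cond1 = all(len(cs[i] & cs[r]) <= 1 for i, r in combinations(cs, 2))
    cond2 = all(not (cs[i] & cs[r]) for i in cs for r in cs[i])
    return cond1, cond2

cs0 = {7, 8, 11, 13}
D = 6

print(check_theorems(cyclic_sets(cs0, 3 * D + 2)))  # (True, True)
print(check_theorems(cyclic_sets(cs0, 3 * D + 1)))  # (True, False)
```

With n = 19, P13 is a storage node of P0 and CS13 wraps around to include P7, which CS0 also contains, so condition (2) breaks just below the bound.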

Page 26: A New Diskless Check Pointing Approach

Performance Analysis

Page 27: A New Diskless Check Pointing Approach
Page 28: A New Diskless Check Pointing Approach


?

Page 29: A New Diskless Check Pointing Approach

Thank You ….