Recovery in Distributed Systems. Chapter 12 of Singhal, Advanced Operating Systems. Sharif University of Technology

Recovery in Distributed Systems

Feb 22, 2016


tyme

Recovery in Distributed Systems. Chapter 12 of Singhal, Advanced Operating Systems, Sharif University of Technology. Goal: return the system to its normal state. Changes made by the faulty process must be undone (undo). Allocated resources must be reclaimed. - PowerPoint PPT Presentation
Transcript

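Chapter 12 of Singhal, Advanced Operating Systems, Sharif University of Technology

Slide 1: Recovery
Changes made by the faulty process must be undone (undo), and the resources allocated to it must be reclaimed.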

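Slide 2: Failure Recovery
Forward Error Recovery
Backward Error Recovery
Performance penalty

Slide 3: Backward Error Recovery (B.E.R)
The system is restored to a previously saved state (Recovery Points).
Log
[Figure: CPU, Stable Storage, Secondary Storage]

Slide 4: BER, approach 1: Operation Based
Every change is recorded so that it can be undone (undo).
UPDATE-IN-PLACE: each change to an object is written to the Log.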

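Slide 5: BER, operation-based approach (continued)
do: performs the operation and writes a Log record
Undo: undoes a do operation
Redo: redoes a do operation from the log
WAL (write-ahead log): an object is updated only after its undo log has been recorded; before the update is committed, both the undo log and the redo log must be recorded.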
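The do, undo, and redo operations on an update-in-place log can be sketched in a few lines. This is only an illustration under stated assumptions: the class name WalStore, the in-memory list standing in for stable storage, and the record layout (name, old value, new value) are hypothetical and do not come from the slides.

```python
class WalStore:
    """Toy update-in-place store guarded by a write-ahead log (hypothetical)."""

    def __init__(self):
        self.data = {}  # the objects, updated in place
        self.log = []   # append-only records: (name, old_value, new_value)

    def do(self, name, new_value):
        # WAL rule: record the undo (old) and redo (new) information
        # before the object itself is touched.
        old_value = self.data.get(name)
        self.log.append((name, old_value, new_value))
        self.data[name] = new_value

    def undo(self):
        # Reverse the most recent do using the old value from the log.
        # Simplification: None means "the object did not exist before".
        name, old_value, _ = self.log.pop()
        if old_value is None:
            self.data.pop(name, None)
        else:
            self.data[name] = old_value

    def redo_all(self):
        # Rebuild the objects by replaying every logged do in order.
        self.data = {}
        for name, _, new_value in self.log:
            self.data[name] = new_value


store = WalStore()
store.do("x", 1)
store.do("x", 2)
store.undo()  # x is back to 1
```

Because the log record is appended before the object changes, a crash between the two steps still leaves enough information to undo or redo the operation.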

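Slide 6: BER, approach 2: state-based (checkpointing)
checkpoint: saving the state of a process
rollback: restoring a process to an earlier saved state
After a failure, the process is rolled back (rollback) to its latest checkpoint.
Shadow paging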
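A state-based checkpoint and rollback for a single process can be sketched as below. The class name CheckpointedProcess and the use of an in-memory list as checkpoint storage are assumptions made for illustration; a real system would write checkpoints to stable storage.

```python
import copy


class CheckpointedProcess:
    """Toy process whose whole state is saved at each checkpoint (hypothetical)."""

    def __init__(self, state):
        self.state = state
        self.checkpoints = []  # saved copies of the state, oldest first

    def checkpoint(self):
        # Save an independent copy so later changes do not leak into it.
        self.checkpoints.append(copy.deepcopy(self.state))

    def rollback(self):
        # Restore the most recently saved state.
        self.state = copy.deepcopy(self.checkpoints[-1])


proc = CheckpointedProcess({"counter": 0})
proc.checkpoint()
proc.state["counter"] = 7  # work done after the checkpoint
proc.rollback()            # that work is discarded
```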

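Slide 8: Domino effect (Domino)
Rollbacks cascade: the rollbacks involving X3 and Y2 lead, because of message m, to rollbacks to X2 and Z2, and finally X, Y, and Z end up at X1, Y1, and Z1.
[Figure: X, Y, Z with checkpoints X1, X2, X3, Y1, Y2, Z1, Z2 and message m]

Slide 9: Lost messages (Lost msg)
When X and Y roll back to X1 and Y1, message m is lost.
[Figure: X, Y with checkpoints X1, Y1 and message m]

Slide 10: LiveLock
Y fails before receiving n1 and rolls back to Y1. That leaves m1 with no record of having been sent, so X rolls back to X1. Y recovers, receives n1, and sends m2. X recovers, sends n2, and receives m2, but X no longer has a record of sending n1, so Y must roll back again. The pattern can repeat indefinitely.
[Figure: X, Y with checkpoints X1, Y1, messages m1, n1, m2, n2, and the second Roll-back caused by n1]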

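Slide 13: Synchronous checkpointing, Koo and Toueg
Channels are FIFO.

Slide 14: Koo and Toueg, two phases
Phase 1: the initiator Pi takes a tentative checkpoint C and asks the other processes to take tentative checkpoints. Each process tells Pi whether it succeeded, and Pi decides whether the tentative checkpoints become permanent.
Phase 2: Pi sends its decision about C to all processes, which make their tentative checkpoints permanent or discard them.

Slide 15: Koo and Toueg, unnecessary checkpoints
Not every process has to take checkpoint C. After receiving message m, X initiates a checkpoint. The set {X2, Y2, Z2} is consistent, but so is {X2, Y2, Z1}, so Z does not need to take checkpoint C.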

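[Figure: X, Y, Z with checkpoints X1, X2, Y1, Y2, Z1, Z2 and message m]

Slide 16: Message labels
Each message m carries a label m.l. ⊥ is the smallest label and ⊤ is the largest. For processes X and Y, with C the last checkpoint of X:

last-label-rcvdX[Y] = the label of the last message X received from Y after C, or ⊥ if there is none

first-label-sentX[Y] = the label of the first message X sent to Y after C, or ⊥ if there is none

Slide 17:
When X asks Y to take checkpoint C, it sends last-label-rcvdX[Y] with the request. Y takes checkpoint C only if
last-label-rcvdX[Y] ≥ first-label-sentY[X] > ⊥

In that case X has received a message that Y sent after Y's last checkpoint, so Y must also take a checkpoint.

Chkpt-cohortX = {Y | last-label-rcvdX[Y] > ⊥}: the processes X must ask when it takes checkpoint C.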
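The two label tests can be written out directly, assuming the cohort test is last-label-rcvdX[Y] > ⊥ and Y's test is last-label-rcvdX[Y] ≥ first-label-sentY[X] > ⊥. In this sketch ⊥ is encoded as 0 and real labels start at 1; that encoding and the function names are assumptions for illustration.

```python
BOTTOM = 0  # assumed encoding of the smallest label; real labels start at 1


def ckpt_cohort(last_label_rcvd):
    """Processes from which X has received a message since its last checkpoint."""
    return {y for y, label in last_label_rcvd.items() if label > BOTTOM}


def must_take_checkpoint(last_label_rcvd_x_y, first_label_sent_y_x):
    """Y's test when X's request arrives with last-label-rcvdX[Y]."""
    return last_label_rcvd_x_y >= first_label_sent_y_x > BOTTOM


# X has received messages up to label 3 from Y and nothing from Z.
last_label_rcvd_x = {"Y": 3, "Z": BOTTOM}
cohort = ckpt_cohort(last_label_rcvd_x)
```

With these values only Y is asked, and Y takes a checkpoint if the first message it sent to X since its own last checkpoint has a label of at most 3.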

Slide 18: The Checkpoint Algorithm
Initial state at all processes p:
    For all processes q do first-label-sentp[q] := ⊥;
    OK-to-take-ckptp := "yes" if p is willing to take a checkpoint, "no" otherwise

At initiator process Pi:
    For all processes p ∈ ckpt-cohortPi do
        Send Take-a-tentative-ckpt(Pi, last-label-rcvdPi[p]) message;
    If all processes replied "yes" then
        For all processes p ∈ ckpt-cohortPi do
            Send Make-tentative-ckpt-permanent;
    else
        For all processes p ∈ ckpt-cohortPi do
            Send Undo-tentative-ckpt.

Slide 19: The algorithm Continued
At all processes p:
Upon receiving Take-a-tentative-ckpt(q, last-label-rcvdq[p]) message from q do
Begin
    If OK-to-take-ckptp = "yes" AND last-label-rcvdq[p] ≥ first-label-sentp[q] > ⊥ then
    begin
        take a tentative checkpoint;
        For all processes r ∈ ckpt-cohortp do
            Send Take-a-tentative-ckpt(p, last-label-rcvdp[r]) message;
        If all processes r ∈ ckpt-cohortp replied "yes" then
            OK-to-take-ckptp := "yes"
        else
            OK-to-take-ckptp := "no"
    end;
    Send (p, OK-to-take-ckptp) to q;
End;

Slide 20: The algorithm Continued
At all processes p:
Upon receiving Make-tentative-ckpt-permanent message do
Begin
    Make tentative checkpoint permanent;
    For all processes r ∈ ckpt-cohortp do
        Send Make-tentative-ckpt-permanent message;
End;
Upon receiving Undo-tentative-ckpt message do
Begin
    Undo tentative checkpoint;
    For all processes r ∈ ckpt-cohortp do
        Send Undo-tentative-ckpt message;
End;

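Slide 21: Rollback-Recovery
Phase 1: the initiator (Pi) asks all processes whether they are willing to restart from their checkpoints C. A process may reply "no".
Phase 2: Pi sends its decision to all processes, which act on it.

Slide 22: Rollback-Recovery Continued
A rollback of X need not force Z to roll back.
[Figure: X, Y, Z with checkpoints X1, X2, Y1, Y2, Z1, Z2]

Slide 23: Rollback-Recovery Continued

Last-Label-SentX[Y] = the label of the last message X sent to Y before X took its latest permanent checkpoint

When X asks Y to roll back to checkpoint C, it sends Last-Label-SentX[Y] with the request. Y rolls back to C only if
Last-Label-RcvdY[X] > Last-Label-SentX[Y]
because Y has then received a message from X whose sending is undone (undo) by the rollback of X.

roll-cohortX = {Y | X can send msgs to Y}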

⊤: Largest Value

Slide 24: The Recovery Algorithm
Initial state at process p:
    Resume-executionp := true;
    For all processes q do Last-label-rcvdp[q] := ⊤;
    Willing-to-rollp := "yes" if p is willing to roll back, "no" otherwise
At initiator process Pi:
    For all processes p ∈ roll-cohortPi do
        Send Prepare-to-rollback(Pi, last-label-sentPi[p]) message;
    If all processes replied "yes" then
        For all processes p ∈ roll-cohortPi do
            Send Roll-back message;
    else
        For all processes p ∈ roll-cohortPi do
            Send Donot-roll-back message;

Slide 25: The algorithm Continued
At all processes p:
Upon receiving Prepare-to-rollback(q, last-label-sentq[p]) message from q do
Begin
    If willing-to-rollp AND last-label-rcvdp[q] > last-label-sentq[p] AND resume-executionp then
    Begin
        Resume-executionp := false;
        For all processes r ∈ roll-cohortp do
            Send Prepare-to-rollback(p, last-label-sentp[r]) message;
        If all processes r ∈ roll-cohortp replied "yes" then
            willing-to-rollp := "yes"
        else
            willing-to-rollp := "no"
    End;
    Send (p, willing-to-rollp) message to q;
End;

Slide 26: The algorithm Continued
Upon receiving Roll-back message AND if resume-executionp = false do
Begin
    Restart from p's permanent checkpoint;
    For all processes r ∈ roll-cohortp do
        Send Roll-back message;
End;
Upon receiving Donot-roll-back message do
Begin
    Resume execution;
    For all processes r ∈ roll-cohortp do
        Send Donot-roll-back message;
End;
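Two decisions drive the recovery algorithm: whether a process that receives Prepare-to-rollback has to roll back, and which message the initiator sends in the second phase. The sketch below isolates just those two tests. The function names and the dictionary of replies are hypothetical, and message passing is left out entirely.

```python
def needs_rollback(last_label_rcvd_p_q, last_label_sent_q_p):
    """True when p has received a message that q's rollback would undo."""
    return last_label_rcvd_p_q > last_label_sent_q_p


def initiator_decision(replies):
    """Second-phase message chosen from the first-phase replies."""
    if all(reply == "yes" for reply in replies.values()):
        return "Roll-back"
    return "Donot-roll-back"


# q's checkpoint covers messages up to label 4, but p has received label 6.
p_must_roll_back = needs_rollback(6, 4)
decision = initiator_decision({"Y": "yes", "Z": "yes"})
```

A single "no" reply is enough to make the initiator send Donot-roll-back, which is what keeps the rollback all-or-nothing across the cohort.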

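Slide 27: Async Checkpointing & Recovery
Processes take their checkpoints C independently, without coordinating with each other.
The part of the computation that a Rollback undoes (undo) can be redone (redo) from the Log.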

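Slide 28: Juang & Venkatesan algorithm: checkpointing and Log
Each event is logged as a triple {s, m, msg-sent}: s is the state of the process before the event, m is the message whose arrival caused the event, and msg-sent is the set of messages sent during the event.
Assumption: the computation is event-driven. A process waits for a message; when one arrives the event fires, the process changes state, and it sends messages.

Slide 29: Juang & Venkatesan: Notations
RCVDij(ckpti): the number of messages process i received from process j up to checkpoint ckpti.
SENTij(ckpti): the number of messages process i sent to process j up to checkpoint ckpti.
A process must rollback when it has received more messages from a neighbor than that neighbor has sent.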

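Slide 30: Juang & Venkatesan: example
If Y rolls back to eY1, X has received more messages from Y than Y has sent to X, so X must roll back to eX2 to be consistent with Y. Z is checked in the same way.
[Figure: X, Y, Z with events ex0..ex3, ey0..ey3, ez0..ez3]

Slide 31: Juang & Venkatesan: the algorithm
On restart after a failure (fail), processor i sets ckpti to the latest event recorded in the Log. A processor that did not fail sets ckpti to the latest event that took place at it.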

Slide 32:
for k := 1 to N do (* N is the # of processors *)
begin
    for each neighboring process j do
        Send ROLLBACK(i, SENTij(ckpti)) msg;
    wait for ROLLBACK msg from every neighbor;
    for every ROLLBACK(j, c) received from j, i does the following:
        if RCVDij(ckpti) > c then /* i has received messages that j has not sent */
        begin
            find the latest event e such that RCVDij(e) ≤ c;
            ckpti := e;
        end;
end;
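The N rounds of ROLLBACK messages can be simulated in one function. This is a sketch under several assumptions that are not in the slides: every process is a neighbor of every other, each round is run in lockstep, each event stores cumulative SENT and RCVD counts, and the three-process history is invented for the example (it is not the X, Y, Z figure from the slides).

```python
def event(sent=None, rcvd=None):
    """Cumulative message counts after one event (hypothetical record layout)."""
    return {"sent": sent or {}, "rcvd": rcvd or {}}


def recover(history, ckpt):
    """Run N rounds of ROLLBACK exchange and return the final ckpt indices."""
    procs = list(history)
    ckpt = dict(ckpt)
    for _ in range(len(procs)):
        # Each process j sends ROLLBACK(j, SENT_ji(ckpt_j)) to every other i.
        msgs = {(j, i): history[j][ckpt[j]]["sent"].get(i, 0)
                for j in procs for i in procs if i != j}
        for (j, i), c in msgs.items():
            # i has received more from j than j remembers sending.
            if history[i][ckpt[i]]["rcvd"].get(j, 0) > c:
                # Latest event of i at which at most c messages from j had arrived.
                ckpt[i] = max(k for k in range(ckpt[i] + 1)
                              if history[i][k]["rcvd"].get(j, 0) <= c)
    return ckpt


history = {
    "X": [event(), event(rcvd={"Y": 1}), event(sent={"Z": 1}, rcvd={"Y": 2})],
    "Y": [event(), event(sent={"X": 1}), event(sent={"X": 2})],
    "Z": [event(), event(rcvd={"X": 1})],
}
# Y failed and could restore only its event 1; X and Z start at their latest events.
result = recover(history, {"X": 2, "Y": 1, "Z": 1})
```

In this history X rolls back in the first round, and only then does Z learn in the second round that the message it received from X was never sent in the surviving history. That cascade is why the loop runs once per processor rather than once overall.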

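Slide 33: Example
Y fails and restarts from checkpoint Y1; event eY2 is restored from the log. X and Z have not failed.
[Figure: X, Y, Z with events ex0..ex3, ey0..ey3, ez0..ez3, checkpoints X1, Y1, Z1, and the failure of Y]

Slide 34: First iteration
X: ckptX = eX3; sends RollBack(X,2) to Y, RollBack(X,0) to Z
Y: ckptY = eY2; sends RollBack(Y,2) to X, RollBack(Y,1) to Z
Z: ckptZ = eZ2; sends RollBack(Z,0) to X, RollBack(Z,1) to Y

At X: RCVDXY(ckptX) = 3 > 2, so ckptX := eX2, because RCVDXY(eX2) = 2 ≤ 2.
At Z: RCVDZY(ckptZ) = 2 > 1, so ckptZ := eZ1.

Y does not need to roll back.

Slide 35: Second iteration of RollBack messages
Y: RollBack(Y,2) to X, RollBack(Y,1) to Z
X: RollBack(X,0) to Z, RollBack(X,1) to Y
Z: RollBack(Z,1) to Y, RollBack(Z,0) to X

The ckpt values of X, Y, and Z remain eX2, eY2, and eZ1. The set {eX2, eY2, eZ1} is the consistent state from which the processes resume.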
