Chandy-Lamport Snapshotting

Post on 10-Dec-2021


COS 418: Distributed Systems, Precept 8

Themis Melissaris and Daniel Suo

[Content adapted from I. Gupta]

Agenda

• What are global snapshots?
• The Chandy-Lamport algorithm
• Why does Chandy-Lamport work?

Global snapshots

Example of a global snapshot

But that was easy

• In our system of world leaders, we were able to capture their ‘state’ (i.e., likeness) easily
  – Synchronized in space
  – Synchronized in time

• How would we take a global snapshot if the leaders were all at home?

• What if Obama told Trudeau that he should really put on a shirt?

• This message is part of our system state!

Global snapshot is global state

• Each distributed application has a number of processes (leaders) running on a number of physical servers

• These processes communicate with each other via channels (text messaging)

• A snapshot captures the local states of each process (e.g., program variables) along with the state of each communication channel
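The definition above can be pictured as a small data structure. This is a minimal sketch, not part of the precept: the names GlobalSnapshot, process_states, and channel_states are illustrative assumptions, and the sample values are the initial state of the two-process example used later in these slides.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class GlobalSnapshot:
    """Hypothetical container for a global snapshot (names are assumptions)."""

    # Local state of each process, keyed by process name (e.g. "P1").
    process_states: dict[str, dict[str, Any]] = field(default_factory=dict)
    # Messages in flight on each channel, keyed by (sender, receiver).
    channel_states: dict[tuple[str, str], list[Any]] = field(default_factory=dict)


# Two processes, two unidirectional channels, no messages in flight.
snapshot = GlobalSnapshot(
    process_states={
        "P1": {"X1": 0, "Y1": 0, "Z1": 0},
        "P2": {"X2": 1, "Y2": 2, "Z2": 3},
    },
    channel_states={("P1", "P2"): [], ("P2", "P1"): []},
)
```

The point of the sketch is only that a snapshot has two parts: one entry per process and one entry per channel.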

Why do we need snapshots?

• Checkpointing: restart if the application fails
• Collecting garbage: remove objects that don’t have any references

• Detecting deadlocks: can examine the current application state

• Other debugging: a little easier to work with than printf…

We could just synchronize clocks

• Each process records its state at some agreed-upon time t
  – But clocks skew
  – And we wouldn’t record messages

• Do we need synchronization?
• What did Lamport realize about ordering events?

Example of global snapshots v2

• Two processes: P1 and P2

• Channel C12 from P1 to P2
• Channel C21 from P2 to P1

• Process states for P1 and P2

P1: X:0 Y:0 Z:0
P2: X:1 Y:2 Z:3

• Channel states (i.e., messages) for C12 and C21
• This is our initial global state
• Also a global snapshot

P1: X1:0 Y1:0 Z1:0
P2: X2:1 Y2:2 Z2:3
C12: [Empty]
C21: [Empty]

• P1 tells P2 to change its state variable, X2, from 1 to 4

• This is another global snapshot

P1: X1:0 Y1:0 Z1:0
P2: X2:1 Y2:2 Z2:3
C12: [X2 → 4]
C21: [Empty]

• P2 receives the message from P1
• Another global snapshot

P1: X1:0 Y1:0 Z1:0
P2: X2:1 Y2:2 Z2:3 (received, not yet applied: X2 → 4)
C12: [Empty]
C21: [Empty]

• P2 changes its state variable, X2, from 1 to 4
• And another global snapshot

P1: X1:0 Y1:0 Z1:0
P2: X2:4 Y2:2 Z2:3
C12: [Empty]
C21: [Empty]

Summary

• The global state changes whenever an event happens
  – Process sends message
  – Process receives message
  – Process takes a step

• Moving from state to state obeys causality
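The three kinds of event can be sketched as functions that each move the system from one global state to the next. This is an illustrative model only: the helper names (send, receive, step, global_state) and the inbox queue are assumptions made for the sketch. The driver replays the two-process example, in which P1 asks P2 to set X2 to 4.

```python
from collections import deque

# Hypothetical model of the two-process example: local variables plus FIFO channels.
state = {
    "P1": {"X1": 0, "Y1": 0, "Z1": 0},
    "P2": {"X2": 1, "Y2": 2, "Z2": 3},
}
channels = {("P1", "P2"): deque(), ("P2", "P1"): deque()}
inbox = {"P1": deque(), "P2": deque()}  # received but not yet applied


def send(src, dst, msg):
    """Event: a process puts a message on its outbound channel."""
    channels[(src, dst)].append(msg)


def receive(src, dst):
    """Event: a process takes the oldest message off an inbound channel."""
    inbox[dst].append(channels[(src, dst)].popleft())


def step(proc):
    """Event: a process applies one received (variable, value) update."""
    var, value = inbox[proc].popleft()
    state[proc][var] = value


def global_state():
    """Copy of the current process states and channel contents."""
    return (
        {p: dict(v) for p, v in state.items()},
        {c: list(q) for c, q in channels.items()},
    )


history = [global_state()]          # initial global state
send("P1", "P2", ("X2", 4))         # P1 sends the update on C12
history.append(global_state())
receive("P1", "P2")                 # P2 receives it; C12 is empty again
history.append(global_state())
step("P2")                          # P2 applies it; X2 becomes 4
history.append(global_state())
```

Each entry in history is one of the global states shown in the example, in the order the events occurred.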

Chandy-Lamport algorithm


System model

• Problem: record a global snapshot (state for each process and channel)

• Model
  – N processes in the system with no failures
  – There are two FIFO unidirectional channels between every process pair (Pi → Pj and Pj → Pi)
  – All messages arrive, intact, not duplicated

• Future work relaxes these assumptions

System requirements

• Taking a snapshot shouldn’t interfere with normal application behavior
  – Don’t stop sending messages
  – Don’t stop the application!

• Each process can record its own state
• Collect state in a distributed manner
• Any process can initiate a snapshot

Initiating a snapshot

• Let’s say process Pi initiates the snapshot
• Pi records its own state and prepares a special marker message (distinct from application messages)

• Pi sends the marker message to all other processes (using its N-1 outbound channels)

• Pi starts recording all incoming messages on channels Cji for j not equal to i

Propagating a snapshot

• For all processes Pj (including the initiator), consider a marker message arriving on channel Ckj

• If Pj sees a marker message for the first time
  – Pj records its own state and marks Ckj as empty
  – Pj sends the marker message to all other processes (using its N-1 outbound channels)
  – Pj starts recording all incoming messages on channels Clj for l not equal to j or k

• Else Pj stops recording on Ckj and sets the state of Ckj to all messages that arrived on it since Pj began recording

Terminating a snapshot

• The algorithm terminates once
  – All processes have received a marker (and recorded their own state)
  – All processes have received a marker on all of their N-1 incoming channels (and recorded those channels’ states)

• Later, a central server can gather the partial state to build a global snapshot
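The initiate, propagate, and terminate rules can be sketched as a small single-threaded simulation. This is a minimal sketch under the stated model (no failures, FIFO channels, no duplicated messages), not a reference implementation. The class and method names (Network, Process, deliver, on_message, done) are assumptions made for illustration. The driver at the bottom replays the two-process scenario from this precept, in which P2 sends an application message M1 while the snapshot is in progress.

```python
from collections import deque

MARKER = "<marker>"  # special message, distinct from application messages


class Network:
    """Hypothetical network: one FIFO queue per ordered process pair."""

    def __init__(self):
        self.processes = {}
        self.channels = {}  # (src, dst) -> deque of in-flight messages

    def add(self, name, state):
        self.processes[name] = Process(name, state, self)
        for other in self.processes:
            if other != name:
                self.channels[(name, other)] = deque()
                self.channels[(other, name)] = deque()

    def send(self, src, dst, msg):
        self.channels[(src, dst)].append(msg)

    def deliver(self, src, dst):
        """Deliver the oldest in-flight message on channel src -> dst."""
        msg = self.channels[(src, dst)].popleft()
        self.processes[dst].on_message(src, msg)


class Process:
    def __init__(self, name, state, network):
        self.name = name
        self.state = dict(state)
        self.network = network
        self.delivered = []         # application messages handed to the app
        self.recorded_state = None  # local state captured by the snapshot
        self.channel_state = {}     # sender -> final recorded messages
        self.recording = {}         # sender -> messages recorded so far

    def initiate_snapshot(self):
        """Record own state, send markers, start recording every inbound channel."""
        self.recorded_state = dict(self.state)
        for other in self.network.processes:
            if other != self.name:
                self.network.send(self.name, other, MARKER)
                self.recording[other] = []

    def on_message(self, sender, msg):
        if msg == MARKER:
            if self.recorded_state is None:
                # First marker: same actions as the initiator takes.
                self.initiate_snapshot()
            # A marker closes the channel it arrived on. On a first marker
            # nothing has been recorded yet, so that channel is empty.
            self.channel_state[sender] = self.recording.pop(sender)
        else:
            if sender in self.recording:
                self.recording[sender].append(msg)
            self.delivered.append(msg)  # the application keeps running

    def done(self):
        """True once state is recorded and every inbound channel is closed."""
        return self.recorded_state is not None and not self.recording


net = Network()
net.add("P1", {"X1": 0, "Y1": 0, "Z1": 0})
net.add("P2", {"X2": 4, "Y2": 2, "Z2": 3})
p1, p2 = net.processes["P1"], net.processes["P2"]

p1.initiate_snapshot()      # P1 records its state and sends a marker on C12
net.send("P2", "P1", "M1")  # meanwhile, P2 sends M1 on C21
net.deliver("P1", "P2")     # P2 sees its first marker, records, sends a marker
net.deliver("P2", "P1")     # P1 receives M1 while recording C21
net.deliver("P2", "P1")     # P1 receives the marker, closing C21
```

After the driver runs, both processes report done(), P2 has recorded C12 as empty, and P1 has recorded M1 as the state of C21. Note that M1 is still delivered to the application on P1, so taking the snapshot does not interfere with normal behavior.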

Example

• P1 initiates a snapshot

P1: X1:0 Y1:0 Z1:0
P2: X2:4 Y2:2 Z2:3
C12: [Empty]
C21: [Empty]

• First, P1 records its state (X1:0, Y1:0, Z1:0)

• Then, P1 sends a marker message to P2 and begins recording all messages on inbound channels

• Meanwhile, P2 sent a message to P1

P1: X1:0 Y1:0 Z1:0
P2: X2:4 Y2:2 Z2:3
C12: [<marker>]
C21: [M1]

• P2 receives a marker message for the first time, so it records its state

• P2 then sends a marker message to P1

P1: X1:0 Y1:0 Z1:0 (received: M1)
P2: X2:4 Y2:2 Z2:3 (received: <marker>)
C12: [Empty]
C21: [<marker>]

• P1 has already sent a marker message, so it records all messages it received on inbound channels to the appropriate channel’s state

P1: X1:0 Y1:0 Z1:0 (recorded for C21: M1)
P2: X2:4 Y2:2 Z2:3
C12: [Empty]
C21: [Empty]

• Both processes have recorded their state and the state of all incoming channels

• Our snapshotted state (highlighted in blue on the slide):

P1: X1:0 Y1:0 Z1:0
P2: X2:4 Y2:2 Z2:3
C12: [Empty]
C21: [M1]

Reasoning about the Chandy-Lamport algorithm

Causal consistency

• Related to the Lamport clock partial ordering
• An event is presnapshot if it occurs before the local snapshot on a process

• Postsnapshot if afterwards
• If event A happens causally before event B, and B is presnapshot, then A is too
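The causal consistency property can be checked mechanically on a recorded execution. The sketch below assumes a hypothetical encoding that is not from the precept: each process's events are numbered from 0, cut[p] is the number of events on p that are presnapshot, and each message is a pair of (process, event index) for its send and its receive. Because the presnapshot events on each process form a prefix, same-process ordering holds automatically, so it is enough to check every send/receive pair.

```python
def is_consistent_cut(cut, messages):
    """Return True if no presnapshot receive has a postsnapshot send.

    cut:      dict mapping process name -> number of presnapshot events
    messages: list of ((p, i), (q, j)) pairs, where event i on p is the
              send and event j on q is the matching receive
    """
    for (p, i), (q, j) in messages:
        send_is_pre = i < cut[p]
        receive_is_pre = j < cut[q]
        if receive_is_pre and not send_is_pre:
            return False  # B is presnapshot but A is not: causality violated
    return True


# One message: event 0 on p is the send, event 0 on q is the receive.
msgs = [(("p", 0), ("q", 0))]

ok_both_pre = is_consistent_cut({"p": 1, "q": 1}, msgs)    # send and receive pre
ok_in_flight = is_consistent_cut({"p": 1, "q": 0}, msgs)   # message in the channel
bad_orphan = is_consistent_cut({"p": 0, "q": 1}, msgs)     # receive without send
```

The third cut is the one the algorithm must never produce: it would record the effect of a message whose send is not part of the snapshot.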

Proof

• If A and B happen on the same process, then this is trivially true

• Consider when A is the send and B is the corresponding receive event on processes p and q, respectively
  – Since B is presnapshot, q can’t have received a marker before B
  – Channels are FIFO, so p can’t have sent its marker to q before A
  – A must also happen presnapshot

• Similar logic for A happening postsnapshot

Poking the proof: Part I

• In order for an application message m in the channel from process p to process q to be in the snapshot
  – It must arrive after q has received its first marker
  – It must be sent before p has sent its marker to q

• A message m will only be in the snapshot if the sending process was presnapshot and the receiving process was postsnapshot
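The two conditions above give four possible cases for any message, depending on whether its send and its receive are presnapshot. A small sketch of that case analysis follows; the function name and the returned labels are assumptions made for illustration.

```python
def classify_message(send_is_pre: bool, receive_is_pre: bool) -> str:
    """Say where a message shows up in a snapshot (labels are illustrative)."""
    if send_is_pre and receive_is_pre:
        # Sent and received before both local snapshots were taken.
        return "reflected in process states"
    if send_is_pre and not receive_is_pre:
        # Sent before p's marker, received after q recorded its state.
        return "recorded in channel state"
    if not send_is_pre and not receive_is_pre:
        # Sent after p's marker: belongs to the execution after the snapshot.
        return "not in snapshot"
    # Receive is presnapshot but send is postsnapshot: ruled out by the proof.
    return "inconsistent"


cases = {
    (s, r): classify_message(s, r)
    for s in (True, False)
    for r in (True, False)
}
```

Only the second case contributes to a channel's recorded state, which matches the last bullet above.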

Poking the proof: Part II

• How do we order concurrent events?
  – Remember, all processes communicate

• What if a process receives a marker in between sending a marker and some event?
  – These should happen atomically

• What if something happens on a process independently of messages after the wall-clock time of when the snapshot starts?
  – Snapshots are causally consistent

Monday topic: Streaming Data Processing