Chandy-Lamport Snapshotting COS 418: Distributed Systems Precept 8 Themis Melissaris and Daniel Suo [Content adapted from I. Gupta] Distributed Snapshots: Determining Global States of a Distributed System K. Mani Chandy and Leslie Lamport ACM Transactions on Computer Systems February 4, 1985

Jul 23, 2020


Chandy-Lamport Snapshotting

COS 418: Distributed Systems
Precept 8

Themis Melissaris and Daniel Suo

[Content adapted from I. Gupta]

Distributed Snapshots: Determining Global States of a Distributed System
K. Mani Chandy and Leslie Lamport
ACM Transactions on Computer Systems
February 4, 1985


Global snapshots


Example of a global snapshot


But that was easy

• In our system of world leaders, we were able to capture their ‘state’ (i.e., likeness) easily
  – Synchronized in space
  – Synchronized in time
• How would we take a global snapshot if the leaders were all at home?
• What if Obama told Trudeau that he should really put on a shirt?
• This message is part of our system state!


Global snapshot is global state

• Each distributed application has a number of processes (leaders) running on a number of physical servers
• These processes communicate with each other via channels (text messaging)
• A snapshot captures the local states of each process (e.g., program variables) along with the state of each communication channel
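One way to picture a snapshot as data is a pair of maps: one from process to local state, one from channel to in-flight messages. The sketch below is only illustrative; the class name `GlobalSnapshot`, its field names, and the sample values are assumptions made for this example and do not come from the slides.

```python
from dataclasses import dataclass, field


@dataclass
class GlobalSnapshot:
    # Process name -> copy of that process's local state.
    process_states: dict = field(default_factory=dict)
    # (sender, receiver) -> messages in flight on that channel, oldest first.
    channel_states: dict = field(default_factory=dict)


# Hypothetical values for the world-leaders example.
snapshot = GlobalSnapshot(
    process_states={"Obama": {"location": "home"}, "Trudeau": {"location": "home"}},
    channel_states={
        ("Obama", "Trudeau"): ["You should really put on a shirt"],
        ("Trudeau", "Obama"): [],
    },
)
print(snapshot.channel_states[("Obama", "Trudeau")])
```

The in-flight text message lives in the channel map, not in either leader's local state, which is why recording process states alone would miss it.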


Why do we need snapshots?

• Checkpointing: restart if the application fails
• Collecting garbage: remove objects that don’t have any references
• Detecting deadlocks: can examine the current application state
• Other debugging: a little easier to work with than printf…


We could just synchronize clocks

• Each process records state at some agreed-upon time t
  – But clocks skew
  – And we wouldn’t record messages
• Do we need synchronization?
• What did Lamport realize about ordering events?


Example of global snapshots v2

• Two processes: P1 and P2


Example of global snapshots v2

• Channel C12 from P1 to P2
• Channel C21 from P2 to P1


Example of global snapshots v2

• Process states for P1 and P2

P1: X:0, Y:0, Z:0
P2: X:1, Y:2, Z:3


Example of global snapshots v2

• Channel states (i.e., messages) for C12 and C21
• This is our initial global state
• Also a global snapshot

P1: X1:0, Y1:0, Z1:0
P2: X2:1, Y2:2, Z2:3
C12: [Empty]
C21: [Empty]


Example of global snapshots v2

• P1 tells P2 to change its state variable, X2, from 1 to 4
• This is another global snapshot

P1: X1:0, Y1:0, Z1:0
P2: X2:1, Y2:2, Z2:3
C12: [X2 → 4]
C21: [Empty]


Example of global snapshots v2

• P2 receives the message from P1
• Another global snapshot

P1: X1:0, Y1:0, Z1:0
P2: X2:1, Y2:2, Z2:3 (received, not yet applied: X2 → 4)
C12: [Empty]
C21: [Empty]


Example of global snapshots v2

• P2 changes its state variable, X2, from 1 to 4
• And another global snapshot

P1: X1:0, Y1:0, Z1:0
P2: X2:4, Y2:2, Z2:3
C12: [Empty]
C21: [Empty]


Summary

• The global state changes whenever an event happens
  – Process sends message
  – Process receives message
  – Process takes a step
• Moving from state to state obeys causality
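The three event types can be replayed on the two-process example to watch the global state move from one snapshot to the next. This is a minimal sketch, not code from the precept: the helper names `send`, `receive`, and `step`, and the `pending` buffer that holds a received but not yet applied message, are assumptions made for illustration.

```python
from collections import deque

# Global state = process states + channel states.
state = {
    "P1": {"X1": 0, "Y1": 0, "Z1": 0},
    "P2": {"X2": 1, "Y2": 2, "Z2": 3},
}
channels = {"C12": deque(), "C21": deque()}
pending = {"P1": [], "P2": []}  # received, not yet applied (hypothetical buffer)


def send(channel, message):
    """Event: a process sends a message, which enters the channel."""
    channels[channel].append(message)


def receive(channel, process):
    """Event: a process receives the oldest message on the channel."""
    pending[process].append(channels[channel].popleft())


def step(process):
    """Event: a process takes a step and applies one received update."""
    variable, value = pending[process].pop(0)
    state[process][variable] = value


send("C12", ("X2", 4))   # P1 tells P2 to set X2 to 4
receive("C12", "P2")     # P2 receives the message
step("P2")               # P2 changes X2 from 1 to 4
print(state["P2"], list(channels["C12"]))
```

Each call changes exactly one part of the global state, and the receive can only happen after the matching send, which is the causality constraint the summary refers to.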


Chandy-Lamport algorithm



System model

• Problem: record a global snapshot (state for each process and channel)
• Model
  – N processes in the system with no failures
  – There are two FIFO unidirectional channels between every process pair (Pi → Pj and Pj → Pi)
  – All messages arrive, intact, not duplicated
• Future work relaxes these assumptions
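The channel model can be sketched as one FIFO queue per ordered pair of processes, which gives N × (N − 1) one-way channels. The function name `make_channels` is an assumption for this example, and an in-memory `deque` trivially satisfies the reliability assumptions (nothing is lost, corrupted, or duplicated).

```python
from collections import deque
from itertools import permutations


def make_channels(processes):
    """One FIFO queue per ordered pair, so each pair gets two one-way channels."""
    return {(src, dst): deque() for src, dst in permutations(processes, 2)}


channels = make_channels(["P1", "P2", "P3"])
channels[("P1", "P2")].append("first")
channels[("P1", "P2")].append("second")

first_out = channels[("P1", "P2")].popleft()  # FIFO: oldest message leaves first
print(len(channels), first_out)
```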


System requirements

• Taking a snapshot shouldn’t interfere with normal application behavior
  – Don’t stop sending messages
  – Don’t stop the application!
• Each process can record its own state
• Collect state in a distributed manner
• Any process can initiate a snapshot


Initiating a snapshot

• Let’s say process Pi initiates the snapshot
• Pi records its own state and prepares a special marker message (distinct from application messages)
• Send the marker message to all other processes (using N-1 outbound channels)
• Start recording all incoming messages from channels Cji for j not equal to i
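The initiator's three actions can be sketched as a single function. This is an illustration under assumptions: the name `initiate_snapshot`, the `MARKER` constant, and the dictionary-of-deques channel representation are all invented for this example.

```python
from collections import deque

MARKER = "<marker>"  # special message, distinct from application messages


def initiate_snapshot(me, variables, processes, channels):
    """Run the initiator's steps; return (recorded_state, recording)."""
    recorded_state = dict(variables)                # 1. record own state
    others = [p for p in processes if p != me]
    for other in others:                            # 2. marker on N-1 outbound channels
        channels[(me, other)].append(MARKER)
    # 3. start recording every inbound channel Cji, j != i
    recording = {(other, me): [] for other in others}
    return recorded_state, recording


processes = ["P1", "P2", "P3"]
channels = {(s, d): deque() for s in processes for d in processes if s != d}
recorded, recording = initiate_snapshot("P1", {"X1": 0}, processes, channels)
print(recorded, sorted(recording))
```

Copying the variables with `dict(variables)` matters: the application keeps running and may change its state after the snapshot starts.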


Propagating a snapshot

• For all processes Pj (including the initiator), consider a message on channel Ckj
• If we see a marker message for the first time
  – Pj records own state and marks Ckj as empty
  – Send the marker message to all other processes (using N-1 outbound channels)
  – Start recording all incoming messages from channels Clj for l not equal to j or k
• Else (Pj has already recorded its state), stop recording Ckj; its state is all the messages that arrived on it since we began recording
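The receive rule can be sketched as one handler that covers both cases. The names `new_process`, `on_message`, and `outbox`, and the dictionary layout, are assumptions for illustration; applying application messages to the process's variables is omitted to keep the snapshot logic visible.

```python
MARKER = "<marker>"


def new_process(name, variables, peers):
    return {"name": name, "variables": dict(variables), "peers": list(peers),
            "recorded_state": None, "recording": {}, "channel_states": {}}


def on_message(proc, sender, message, outbox):
    """Handle one message arriving at `proc` on the channel from `sender`."""
    if message != MARKER:
        # Application message: record it only if that channel is being recorded.
        if sender in proc["recording"]:
            proc["recording"][sender].append(message)
        return
    if proc["recorded_state"] is None:
        # First marker: record own state and mark the arrival channel as empty.
        proc["recorded_state"] = dict(proc["variables"])
        proc["channel_states"][sender] = []
        # Send a marker on each of the N-1 outbound channels.
        for peer in proc["peers"]:
            outbox.append((proc["name"], peer, MARKER))
        # Start recording every other inbound channel.
        proc["recording"] = {p: [] for p in proc["peers"] if p != sender}
    else:
        # Later marker: the channel's state is what arrived since recording began.
        proc["channel_states"][sender] = proc["recording"].pop(sender)


p3 = new_process("P3", {"X3": 7}, ["P1", "P2"])
outbox = []
on_message(p3, "P1", MARKER, outbox)  # first marker, arriving on C13
on_message(p3, "P2", "M", outbox)     # in-flight message on C23 is recorded
on_message(p3, "P2", MARKER, outbox)  # marker on C23 closes that channel
print(p3["channel_states"])
```

With three processes the "l not equal to j or k" case becomes visible: P3 marks C13 as empty but keeps recording C23 until P2's marker arrives.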


Terminating a snapshot

• All processes have received a marker (and recorded their own state)
• All processes have received a marker on all the N-1 incoming channels (and recorded the states of those channels)
• Later, a central server can gather the partial state to build a global snapshot
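The termination test and the gathering step can be sketched as two small functions. The names `is_done` and `gather`, the shape of the per-process partial results, and the sample values are assumptions made for this example.

```python
def is_done(partial, n_processes):
    """A process is finished once it has its own state and all N-1 channel states."""
    return (partial["state"] is not None
            and len(partial["channels"]) == n_processes - 1)


def gather(partials):
    """Combine per-process partial results into one global snapshot."""
    if not all(is_done(p, len(partials)) for p in partials.values()):
        raise ValueError("snapshot still in progress")
    snapshot = {"processes": {}, "channels": {}}
    for name, partial in partials.items():
        snapshot["processes"][name] = partial["state"]
        # Each process reports the channels it receives on, keyed by sender.
        for sender, messages in partial["channels"].items():
            snapshot["channels"][(sender, name)] = messages
    return snapshot


# Hypothetical partial results from a two-process run.
partials = {
    "P1": {"state": {"X1": 0}, "channels": {"P2": ["M1"]}},
    "P2": {"state": {"X2": 4}, "channels": {"P1": []}},
}
snapshot = gather(partials)
print(snapshot["channels"])
```

Each process only knows the state of its inbound channels, so the global channel map is assembled by keying every entry as (sender, receiver).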


Example

• P1 initiates a snapshot

P1: X1:0, Y1:0, Z1:0
P2: X2:4, Y2:2, Z2:3
C12: [Empty]
C21: [Empty]


Example

• First, P1 records its state

P1: X1:0, Y1:0, Z1:0
P2: X2:4, Y2:2, Z2:3
C12: [Empty]
C21: [Empty]


Example

• Then, P1 sends a marker message to P2 and begins recording all messages on inbound channels
• Meanwhile, P2 sent a message to P1

P1: X1:0, Y1:0, Z1:0
P2: X2:4, Y2:2, Z2:3
C12: [<marker>]
C21: [M1]


Example

• P2 receives a marker message for the first time, so records its state
• P2 then sends a marker message to P1

P1: X1:0, Y1:0, Z1:0
P2: X2:4, Y2:2, Z2:3
C12: [Empty]
C21: [<marker>]
Delivered to P2: <marker>
Delivered to P1: M1


Example

• P1 has already sent a marker message, so it records all messages it received on inbound channels to the appropriate channel’s state

P1: X1:0, Y1:0, Z1:0
P2: X2:4, Y2:2, Z2:3
C12: [Empty]
C21: [Empty]
Delivered to P1: M1


Example

• Both processes have recorded their state and the state of all incoming channels
• Our snapshotted state is highlighted in blue

P1: X1:0, Y1:0, Z1:0
P2: X2:4, Y2:2, Z2:3
C12: [Empty]
C21: [Empty]
Delivered to P1: M1
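The two-process example can be replayed end to end in a few lines. This is a sketch under assumptions, not the precept's reference code: the `Process` class, its method names, and the `network` dictionary are invented here, and application messages are only recorded, never applied to the variables.

```python
from collections import deque

MARKER = "<marker>"


class Process:
    def __init__(self, name, variables, peers):
        self.name = name
        self.variables = dict(variables)
        self.peers = peers               # names of the other processes
        self.recorded_state = None       # local state once recorded
        self.recording = {}              # sender -> messages recorded so far
        self.channel_states = {}         # sender -> final recorded channel state

    def _record_and_send_markers(self, network):
        self.recorded_state = dict(self.variables)
        for peer in self.peers:
            network[(self.name, peer)].append(MARKER)

    def initiate(self, network):
        self._record_and_send_markers(network)
        self.recording = {peer: [] for peer in self.peers}

    def deliver(self, sender, network):
        """Receive the oldest message on the channel from `sender`."""
        message = network[(sender, self.name)].popleft()
        if message != MARKER:
            if sender in self.recording:
                self.recording[sender].append(message)
        elif self.recorded_state is None:
            # First marker seen: record state, arrival channel is empty.
            self._record_and_send_markers(network)
            self.channel_states[sender] = []
            self.recording = {p: [] for p in self.peers if p != sender}
        else:
            # Marker on a channel being recorded: that channel is done.
            self.channel_states[sender] = self.recording.pop(sender)

    def done(self):
        return (self.recorded_state is not None
                and len(self.channel_states) == len(self.peers))


network = {("P1", "P2"): deque(), ("P2", "P1"): deque()}
p1 = Process("P1", {"X1": 0, "Y1": 0, "Z1": 0}, ["P2"])
p2 = Process("P2", {"X2": 4, "Y2": 2, "Z2": 3}, ["P1"])

p1.initiate(network)                 # P1 records its state, sends marker on C12
network[("P2", "P1")].append("M1")   # meanwhile, P2 sends M1 to P1
p2.deliver("P1", network)            # P2 sees its first marker, replies with one
p1.deliver("P2", network)            # P1 receives M1 and records it for C21
p1.deliver("P2", network)            # P1 receives the marker on C21
print(p1.recorded_state, p2.recorded_state, p1.channel_states, p2.channel_states)
```

In this run M1 ends up as the recorded state of C21: P2 sent it before recording its own state and P1 received it after recording its own, so the message was in flight at the moment the snapshot describes.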


Causal consistency

• Related to the Lamport clock partial ordering
• An event is presnapshot if it occurs before the local snapshot on a process
• Postsnapshot if afterwards
• If event A happens causally before event B, and B is presnapshot, then A is too
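The last bullet can be turned into a small check on a candidate cut. This is a sketch under assumptions: the function name `is_consistent`, the event encoding (process name plus 1-based position in that process's history), and the sample message are invented for illustration. Because a cut takes a prefix of each process's events, checking every message edge is enough to cover longer causal chains as well.

```python
def is_consistent(cut, messages):
    """Check that no message is received presnapshot but sent postsnapshot.

    cut: process -> number of that process's events before its local snapshot.
    messages: list of ((sender, send_index), (receiver, recv_index)),
              with 1-based event indices per process.
    """
    def presnapshot(event):
        process, index = event
        return index <= cut[process]

    return all(presnapshot(send) for send, recv in messages if presnapshot(recv))


# P1's first event sends a message; P2's first event receives it.
messages = [(("P1", 1), ("P2", 1))]

print(is_consistent({"P1": 1, "P2": 1}, messages))  # send and receive both presnapshot
print(is_consistent({"P1": 1, "P2": 0}, messages))  # message is in the channel
print(is_consistent({"P1": 0, "P2": 1}, messages))  # receive without send
```

The second cut is still consistent: the message was sent but not yet received, so it belongs to the channel state. Only the third cut breaks the rule, because it records an effect without its cause.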