COORDINATION, SYNCHRONIZATION AND LOCKING WITH ISIS2 Ken Birman 1 Cornell University.

COORDINATION, SYNCHRONIZATION AND LOCKING WITH ISIS2

Ken Birman

1

Cornell University

2

Isis2 has many options for coordination

Within a group of processes there are many ways in which you might want coordinated or synchronized behavior

Isis2 can support all of them, but because there are many patterns, the topic isn’t trivial!

3

Examples

Primary/Backup fault-tolerance Group g receives a request A and B are assigned to handle it If A succeeds, it sends a reply and we’re

done If A fails, B takes over

4

Examples

Coordinator/Cohort fault-tolerance Group g receives a request A and B are assigned to handle it, but in

such a way that every request has its own primary (“coordinator”) and every request has one or more backups (“cohort”)

If A succeeds, it sends a reply and we’re done

If A fails, B takes over Updates issued to the group state by A

prior to failing are visible to B when it takes over

5

Examples

Periodic action based on a timer A clock is running, and every X ms, the

group members jointly perform some action They could each take some “part” of a

shared task Or the action could be performed in

primary/backup style with a primary member initiating it but others standing ready to help if the primary fails

6

Examples

Locking The group manages some form of data.

The items have an associated key Members can obtain a lock on the key

Mutex locks: while a member holds the key, no other member can access the key

Read/Write locks: Read locks allow other readers but Write locks exclude both readers and writers

7

Barrier synchronization

Group has some form of task to do

An initiator sends a request to start the members working on that task, then wishes to wait until they are all finished

The members function as “workers”. As each finishes it signals that it has finished its part

8

Summary of Options

Goal Purpose Technique to Use

Primary/Backup “There can only be one” (primary, that is)

Group of size 2. Primary is the member with rank=0

Coordinator/Cohort “Many hands, light work”

Group of any size. Rotate the roles…

Periodic Actions Heart beat Rank 0 member sends a timing signal

Locking: Mutex “One at a time” Isis2 locking tool (basic)

Locking: Read/Write Locks

Databases Isis2 locking tool (fancy)

Barrier Synchronization

“Is everyone finished?”

An Isis2 Query

9 Primary - Backup

10

… Details

Primary backup Easiest: Form a group with 2 members Rule: The member with rank 0 is the primary, the

member with rank 1 is a backup To update data, use the Isis2 Send primitive, but call

g.Flush() before talking to an external user of the service If the group updates databases or file storage may need to

use SafeSend. This topic is covered in a different module. If a failure occurs, a new view will signal that the

backup is now the new primary It picks up in the same state that the old primary was

in

11

Issues with primary/backup

Notice that Isis2 lacks a way to send a multicast to the group plus one external member

So suppose an outside person asks the group to do something, like in our old picture:

Backup will knowwhat the primaryintended to do

But did the primaryactually send the response?

Did it reach the user?

Time

Update the monitoring and alarmscriteria for Mrs. Marsh as follows…

Confirmed

Response delay seen by end-user

would include Internet latencies Service

response delay

Service instance

12

This problem can’t be solved! In the Isis2 system, we can’t atomically

send a message to the external user in such a way that the backup will be certain it was sent.

So… the backup must re-send the last message(s) to the user!

… but how?

Time

Update the monitoring and alarmscriteria for Mrs. Marsh as follows…

Confirmed

Response delay seen by end-user

would include Internet latencies Service

response delay

Service instance

Confirmed

13

UDP, TCP-R…

With UDP the backup might be able to just send the identical reply.

With TCP, the user sees the connection break and if the confirmation wasn’t received, may need to re-issue the request. The backup (the new primary) would sense that this is a repeated request and resend the old reply

Cornell has a technology called TCP-R. With it a single TCP connection can be “taken over” by the backup. With TCP-R the backup can finish the sending the primary was in the midst of doing even if it crashed

14 Coordinator Cohort

15

… More details

Coordination-Cohort Really the identical idea, but now we relay the

request into the group, using OrderedSend. If the group updates databases or file storage may

need to use SafeSend. A topic covered in a different module.

Some simple rule should be used to map requests to the members: not just “rank 0 is the primary” but “rank K will be the primary for request R” For example, the request could contain some sort of

identification data, or we could compute a hashcode Any rule that all members can apply will do the trick

In other ways, just like primary-backup

16

Which multicast should we use? If external users connect in a load-balanced way,

each member can Handle read-only work locally Use OrderedSend to relay and work that updates the

group state.. If the group updates databases or file storage may need to use SafeSend.

With OrderedSend, always call Flush if important updates were done and we are able to respond to the external user

If external users all connect to the rank-0 member then we can just use Send, but we risk overloading that member if everyone connects to the same one

17Periodic Actions with Timer

18

Periodic Actions

Easiest solution: Have the member with rank 0 launch a thread This thread loops Wait for K ms (use Thread.Sleep() or a timed call

to WaitOne on a semaphore that is always 0) Then issue a g.Send to “ping everyone”

Latencies of g.Send in an otherwise idle group will be very low and the jitter even smaller Action should be taken by everyone within a

millisecond or two

19

Periodic Actions

Fancier solutions are also possible There is a literature on real-time actions in

fault-tolerant systems If you are facing a “mission critical” need

you might consider using such a solution For example, every member could run its own

timer, and every member could send a “ping” On receiving “ping at time T” from a majority of

members of the current view, take the action Such a solution will be far more robust, but

more costly

20 Read and Write Locks

21

Locking

Isis2 supports group-wide locks If you want local locking, don’t use this tool

Basic API: g.WriteLock(“name” [, timeout]). g.Lock() for short. g.ReadLock(“name” [, timeout]) g.Unlock(“name”)

You can also control the persistency of the lock state of the group and the way that failures are handled

22

Rules…

Lock requests are handled one by one in order

Locks on different “names” don’t conflict Locks on the same name:

Write locks exclude all other locks Read locks: allow further read locks, until a

write lock is waiting. But then read locks wait behind the write lock

Timeout: Causes a “cancel” request to be sent

23

Handling of failures

If the lock holder fails, the default action is to “break” the lock (release it). Read locks always act this way.

For write locks, you can specify that instead that a broken lock be passed to the rank-0 group member A lock-transfer upcall event notifies you when this

occurs You would need to code whatever handling you desire.

It will run in the rank-0 member as necessary.

24

Lock-State Persistence

The lock manager state is stored in a data structure that can live in memory or be retained on disk The default configuration keeps the structure in

memory If you override this and request persistent locking, we

use SafeSend instead of OrderedSend, and the locking state will be saved on disk.

In the default case, if the group terminates the lock state is discarded. In the persistent case lock state is retained even across group shutdowns

25

When is lock persistence important? If your database will be used across

periods when the whole group shuts down, we would say that the database itself is External (not in-memory) Persistent

In such cases you’ll use SafeSend to update the database. And for this case may want to make the lock service state persistent too.

DB1

DB2

26 Barrier Synchronization

27


This is easy achieved using g.Query/OrderedQuery

Query can initiate the computation, or could simply specify “which” computation you have in mind, if you have many running in parallel.

Group members use g.Reply() when they reach the barrier point. Sender waits for all to reply

28

Barrier Synchronization: Failures We recommend that in the Reply you

send The size of the group view when

computation started Which rank this particular member had in

the group … just record these values when the Query

arrives

Then do g.Reply(myRank, N, other data…)

Caller can thus verify that it received all N replies. If not, it knows that some member crashed

29

Summary of Options

Goal Purpose Technique to Use

Primary/Backup “There can only be one” (primary, that is)

Group of size 2. Primary is the member with rank=0

Coordinator/Cohort “Many hands, light work”

Group of any size. Rotate the roles…

Periodic Actions Heart beat Rank 0 member sends a timing signal

Locking: Mutex “One at a time” Isis2 locking tool (basic)

Locking: Read/Write Locks

Databases, 2PL. Isis2 has locks but doesn’t implement transactions.

Isis2 locking tool (fancy)Caution: make the lock state as persistent as the database!


“Is everyone finished?”

An Isis2 Query

COORDINATION, SYNCHRONIZATION AND LOCKING WITH ISIS2 Ken Birman 1 Cornell University.

Documents