Distributed Concurrency Control, Lecture 4 (BHG, Chap. 4 + Comp. Surveys Article)
Dec 20, 2015
(c) Oded Shmueli 2004
Motivation
- Distributed usage. Local autonomy. Maintainability.
- Allows for growth.
- Reliability: a number of copies.
Components:
- Reliable communication.
- Local DBs (may be identical).
Problems:
- Query processing.
- Maintaining multiple copies.
- Concurrency control and recovery.
- Distributed data dictionary.
Topics
- Part I: Distributed 2PL
- Part II: Distributed Deadlocks
- Part III: Timestamp-based Algorithms
- Part IV: Optimistic Concurrency Control
Part I: Distributed 2PL
Each item may have a number of copies. Intuitively, the system should behave as if there were a single copy. Mechanisms:
- Writers lock all copies.
- Central copy.
- Central locking site.
- Majority locking.
- A generalization.
- Moving central copy (not covered).
Writers lock all copies
- Each copy may be locked individually.
- Read[x]: lock some copy of x.
- Write[x]: lock all copies of x.
- Resulting executions are SR (serializable).
Problems: writers tend to deadlock; many messages.
Central copy
- A central copy per item.
- Read[x]: read-lock the central copy.
- Write[x]: write-lock the central copy.
- Advantage: fewer messages.
Central locking site
A single site maintains a global lock table.
Advantages: few messages; the WFG (waits-for graph) can be checked at one site for deadlocks.
Disadvantages: a possible bottleneck.
Majority locking
The previous solutions are vulnerable to site failure (any site in the first, the central site in the other two).
- Read[x]: lock a majority of x's copies.
- Write[x]: lock a majority of x's copies.
Thus, for each x, no two transactions that conflict on x can both hold a majority; a majority acts as an effective lock.
Disadvantage: many messages; one can trade time for number of messages using "forwarding".
A generalization
Suppose there are n copies of x. Let k, l be such that k + l > n and l > n/2.
- Read[x]: obtain locks on k out of the n copies.
- Write[x]: obtain locks on l out of the n copies.
Then no reader and writer of x, and no two writers of x, can hold their quorums concurrently, so quorums act as effective locks.
Choosing k, l:
- Many readers: small k.
- Many writers: small l.
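The quorum condition above can be checked mechanically; a minimal sketch (the function name is illustrative, not from the text):

```python
def valid_quorums(n, k, l):
    """Check the generalized quorum condition for n copies:
    k + l > n  -- every read quorum intersects every write quorum;
    2 * l > n  -- every two write quorums intersect (i.e. l > n/2)."""
    return k + l > n and 2 * l > n

# With n = 5 copies: k = 3, l = 3 is plain majority locking;
# k = 2, l = 4 favours readers (smaller read quorum).
```

With many readers one picks a small k (and a correspondingly large l), as the slide suggests.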
Part II: Distributed Deadlocks
Left as reading material.
Part III: Timestamp-based Algorithms
- A system model.
- Assumptions.
- Operations in a distributed environment.
- Timestamp Ordering (TO).
- Conservative Timestamp Ordering (CTO).
- Transaction classes.
A system model
(Diagram: several sites; at each site, transactions submit operations to a TM (transaction manager), which forwards them to a DM (data manager) holding the DATA.)
Assumptions
- No concurrency within a transaction.
- Writes go into private workspaces at the various DMs.
- Each transaction is managed by a single TM.
- Each item x may have a number of physical copies x1, … , xn.
Operations in a distributed environment
- Begin: set up a private workspace.
- Read[x]: if x is in the workspace, read it from there; otherwise read x from some copy xi by issuing dm_read.
- Write[x,v]: the single copy of x in the private workspace is assigned v.
- End: perform a 2-phase commit. For each updated x, for all copies of x:
  - issue a pre-stable-write command to store x on stable storage;
  - once all DMs confirm, issue dm-write commands to the DMs to install the new value in the database.
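The End step above can be sketched as a tiny 2-phase commit, assuming hypothetical DM objects with pre_stable_write and dm_write methods (the class and all names are illustrative, not an API from the text):

```python
class DM:
    """Toy data manager: logs pre-stable-writes, then installs values."""
    def __init__(self):
        self.stable_log = {}   # values forced to (simulated) stable storage
        self.db = {}           # the installed database state

    def pre_stable_write(self, x, v):
        self.stable_log[x] = v
        return True            # confirm the value is on stable storage

    def dm_write(self, x, v):
        self.db[x] = v         # install the committed value

def end_transaction(updated_items, dms):
    """2-phase commit over all copies: dms maps each item to its DMs."""
    # Phase 1: pre-stable-write x at every DM holding a copy of x.
    for x, v in updated_items.items():
        for dm in dms[x]:
            if not dm.pre_stable_write(x, v):
                return "abort"
    # Phase 2: all DMs confirmed -- install the new values.
    for x, v in updated_items.items():
        for dm in dms[x]:
            dm.dm_write(x, v)
    return "commit"
```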
Timestamp Ordering (TO) - skip
Idea: produce executions conflict equivalent to a serial history in timestamp order.
Item = <S_read, S_write, stable, value>
- S_read: the set of timestamps of the item's readers.
- S_write: the set of timestamps of the item's writers.
- stable: a flag indicating a committed value.
Timestamp Ordering (TO) - skip
On accessing an item with stable = no:
- wait: possible deadlock.
- abort: may be wasteful.
DM_Read with ts: if ts < max {t | t ∈ S_write} abort; otherwise, read and add ts to S_read.
DM_Write with ts: if ts < max {t | t ∈ S_read} abort; if ts < max {t | t ∈ S_write} ignore (Thomas Write Rule, TWR); otherwise, set stable = no, write, and add ts to S_write.
Commit: after all writes are performed, set stable = yes.
Abort: remove ts from S_read and S_write; make all items the transaction updated stable = yes.
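The DM_Read/DM_Write rules above can be sketched as follows (a toy, single-copy model; names mirror the slide, and the commit/abort bookkeeping is omitted):

```python
class Item:
    """Item = <S_read, S_write, stable, value> from the slide."""
    def __init__(self, value):
        self.S_read, self.S_write = set(), set()
        self.stable = True
        self.value = value

def dm_read(item, ts):
    # Reject a read that arrives too late: a younger writer already wrote.
    if item.S_write and ts < max(item.S_write):
        return "abort"
    item.S_read.add(ts)
    return item.value

def dm_write(item, ts, value):
    # A younger transaction already read the item: this write is too late.
    if item.S_read and ts < max(item.S_read):
        return "abort"
    # A younger write already happened: skip it (Thomas Write Rule).
    if item.S_write and ts < max(item.S_write):
        return "ignore"
    item.stable = False
    item.S_write.add(ts)
    item.value = value
    return "ok"
```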
Another Timestamp Ordering Algorithm - Terminology
- DM_r: read item.
- DM_w: write item at transaction end.
- p[x]: synchronization due to a private write.
- WTS(x), RTS(x): timestamps of the latest dm-write and dm-read of x.
- Buffering: delaying operations for future execution.
- min_r(x): timestamp of the earliest buffered read operation; min_w(x), min_p(x): same idea.
- DM_r[x] is ready if ts(DM_r[x]) < min_p(x), for any buffered p[x].
- DM_w[x] is ready if ts(DM_w[x]) < min_r(x), for any buffered read, and min_p(x) = ts(DM_w[x]).
Another Timestamp Ordering Algorithm
DM_r[x]:
- if ts(r[x]) < WTS(x): abort.
- if ts(r[x]) > min_p(x): put in buffer.
- otherwise: perform and update RTS(x).
p[x]:
- if ts(p[x]) < RTS(x): abort.
- if ts(p[x]) < WTS(x): abort.
- otherwise: put in buffer.
DM_w[x]: (note: a p[x] was previously executed, so no abort)
- if ts(w[x]) > min_r(x): put in buffer.
- if ts(w[x]) > min_p(x): put in buffer.
- otherwise: perform, update WTS(x), and discard p[x].
Occasionally check whether completed actions changed min_r(x) or min_p(x), so that some buffered operation is now ready.
Observations:
- No deadlocks are possible (why?).
- Upon abort, discard the private workspace and all of the transaction's operations; RTS(x) still needs to be updated.
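The two "ready" tests from the terminology slide can be sketched as predicates (here min_r/min_p are None when no such operation is buffered; this is an illustrative reading, not code from the text):

```python
def dm_r_ready(r_ts, min_p):
    """DM_r[x] may run when no buffered p[x] has a smaller timestamp."""
    return min_p is None or r_ts < min_p

def dm_w_ready(w_ts, min_r, min_p):
    """DM_w[x] may run when it precedes every buffered read and its own
    p[x] is the earliest buffered private write: min_p(x) = ts(DM_w[x])."""
    return (min_r is None or w_ts < min_r) and min_p == w_ts
```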
Conservative Timestamp Ordering (CTO)
To prevent aborts: perform an operation only when it is certain that no conflicting operation with a smaller timestamp can still arrive (such an operation would otherwise have to be restarted).
Result: no aborts and no deadlocks, but less concurrency.
But how long should one wait? Solution: use CTO. Operations: DM_r, DM_w.
Conservative Timestamp Ordering Architecture
(Diagram: transactions submit operations to TMs TM1, ..., TMn; each TM sends operations to DMs DM1, ..., DMk, and each DM buffers incoming operations in a queue kept in timestamp order.)
Conservative Timestamp Ordering Algorithm
TMs must submit dm-read operations in timestamp order: once an operation with timestamp t has been issued, no operation with timestamp s < t will be issued in the future. Similarly for dm-writes. Achieved by:
- each TM working serially;
- each transaction first reading all its data and then writing all its results at the end (still in timestamp order, but allowing execution parallelism);
- terminating transactions in timestamp order.
Data items need no associated timestamps.
CTO - Ready Operations
Maintain at each DM a queue for read and write operations; buffer DM_r and DM_w operations.
Output a DM_r operation if:
- there is a buffered DM_w operation from each TMi, and all such operations have higher timestamps.
Output a DM_w operation if:
- there is a buffered DM_r operation from each TMi, and all such operations have higher timestamps; and
- there is a buffered DM_w operation from each TMi, and all such operations have higher timestamps.
DM_w operations are never rejected!
Overall effect: a serial execution in timestamp order!
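These output tests can be sketched over per-TM queues of buffered timestamps (a dict from TM name to a list of timestamps; one plausible reading is that a DM_w is compared against the other TMs' write queues, since it heads its own queue -- an assumption, not stated in the text):

```python
def read_ready(read_ts, write_queues):
    """Output DM_r when every TM has a buffered DM_w with a larger ts."""
    return all(q and min(q) > read_ts for q in write_queues.values())

def write_ready(write_ts, read_queues, write_queues, own_tm):
    """Output DM_w when every TM's buffered reads, and every other TM's
    buffered writes, all carry larger timestamps."""
    reads_ok = all(q and min(q) > write_ts for q in read_queues.values())
    writes_ok = all(q and min(q) > write_ts
                    for tm, q in write_queues.items() if tm != own_tm)
    return reads_ok and writes_ok
```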
Conservative Timestamp Ordering - Problems
Problem: what if a TM issues no operation to this queue?
Solution: null operations (they have a timestamp but are no-ops). A TM can send an "infinite timestamp" to indicate expected long inactivity.
Transaction classes
CTO synchronizes everything: overkill!
Transaction class = <readset, writeset>.
If transactions are known in advance, each transaction can be assigned to one or more classes. If T reads X and writes Y, then T belongs to a class c = <rs, ws> if X ⊆ rs and Y ⊆ ws.
A TM manages a single class; a transaction must belong to the class of the TM managing it.
Run CTO, but only operations of the relevant TMs are considered:
- to output DM_r[x], wait until there are DM_w operations with higher timestamps from all TMs (classes) that have x in their write sets;
- to output DM_w[x], wait until there are DM_r operations with higher timestamps from all TMs (classes) that have x in their read sets, and DM_w operations with higher timestamps from all TMs (classes) that have x in their write sets.
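Class membership is a simple subset test; a minimal sketch (sets as Python sets, function name illustrative):

```python
def in_class(read_set, write_set, class_rs, class_ws):
    """T with readset X and writeset Y belongs to class c = <rs, ws>
    iff X is a subset of rs and Y is a subset of ws."""
    return read_set <= class_rs and write_set <= class_ws
```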
Part IV: Optimistic Concurrency Control
- Can be based on locking or timestamps.
- First, a centralized, timestamp-based algorithm (Kung-Robinson).
- Then, its adaptation to a distributed environment.
Rules for Validation: Centralized
A transaction has read, validate, and write phases. During the read phase it also computes and writes to a private space.
Executions will be serializable in timestamp order. To ensure this, for all transactions Tk s.t. ts(Tk) < ts(T), one of the following should hold:
- Rule a: Tk completed its write phase prior to T starting its read phase.
- Rule b: Tk completed its write phase while T was in its read phase, and write-set(Tk) ∩ read-set(T) = ∅.
- Rule c: Tk completed its read phase before T completed its read phase, write-set(Tk) ∩ read-set(T) = ∅, and write-set(Tk) ∩ write-set(T) = ∅.
A timestamp is assigned only once validation succeeds; do it after the write phase.
Different validations can be executed in parallel. So, each transaction T uses START(T) and FINISH(T) to determine the transactions against which it should be validated.
Rules for T and Tk
(Timing diagrams: each transaction is drawn as its read (R), validate (V), and write (W) phases; one diagram per rule, showing the relative position of Tk's phases against T's.)
- Rule a: Tk completed its write phase prior to T starting its read phase; ts(Tk) < start(T).
- Rule b: Tk completed its write phase while T was in its read phase, and write-set(Tk) ∩ read-set(T) = ∅; start(T) < ts(Tk) < finish(T).
- Rule c: Tk completed its read phase before T completed its read phase, write-set(Tk) ∩ read-set(T) = ∅, and write-set(Tk) ∩ write-set(T) = ∅; finish(Tk) < finish(T).
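The three rules can be combined into one validation predicate; a sketch under a toy model (the Txn record and its field names are illustrative, with numeric phase boundaries standing in for the R/V/W timing above):

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Txn:
    """Phase boundaries and access sets of a transaction."""
    start_read: int
    finish_read: int
    finish_write: int
    read_set: frozenset
    write_set: frozenset

def validate_against(T, Tk):
    """True iff the earlier transaction Tk satisfies one of rules
    a, b, c with respect to the validating transaction T."""
    # Rule a: Tk finished writing before T started reading.
    if Tk.finish_write < T.start_read:
        return True
    # Rule b: Tk finished writing during T's read phase and wrote
    # nothing that T read.
    if (Tk.finish_write < T.finish_read
            and not (Tk.write_set & T.read_set)):
        return True
    # Rule c: Tk finished reading before T did, and Tk's writes are
    # disjoint from both T's reads and T's writes.
    if (Tk.finish_read < T.finish_read
            and not (Tk.write_set & T.read_set)
            and not (Tk.write_set & T.write_set)):
        return True
    return False
```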
Distributed Setting
- A transaction can execute at many sites.
- Perform a validation phase at each site at which T operated; this is called local validation.
- A site may have purely local transactions as well as sub-transactions of global transactions.
- If validation is successful at all sites, ensure global consistency:
  - build HB(Tj) for each sub-transaction of T at site j: a set of ids of global transactions that must precede T, built during local validation;
  - global validation is done by making sure that each transaction in the HB set is either committed or aborted;
  - deadlocks are possible.
- After the global validation phase, a timestamp can be issued; it is the same for all local sub-transactions of a global transaction.
- Use 2-phase commit. Notify local sub-transactions.