Distributed Concurrency Control, Lecture 4 (BHG, Chap. 4 + Comp. Surveys Article)
Dec 20, 2015
(c) Oded Shmueli 2004
Motivation
- Distributed usage. Local autonomy. Maintainability.
- Allows for growth.
- Reliability: a number of copies.
Components:
- Reliable communication.
- Local DBs (may be identical).
Problems:
- Query processing.
- Maintaining multiple copies.
- Concurrency control and recovery.
- Distributed data dictionary.
Topics
- Part I: Distributed 2PL
- Part II: Distributed Deadlocks
- Part III: Timestamp-based Algorithms
- Part IV: Optimistic Concurrency Control
Part I: Distributed 2PL
Each item may have a number of copies. Intuitively, the system should behave as if there were a single copy. Mechanisms:
- Writers lock all copies.
- Central copy.
- Central locking site.
- Majority locking.
- A generalization.
- Moving central copy (not covered).
Writers lock all copies
- Each copy may be locked individually.
- Read[x]: lock some copy of x.
- Write[x]: lock all copies of x.
- Resulting executions are SR (serializable).
Problems: writers tend to deadlock; many messages.
Central copy
- A central copy per item.
- Read[x]: read-lock the central copy.
- Write[x]: write-lock the central copy.
- Advantage: fewer messages.
Central locking site
A single site maintains a global lock table.
Advantages: few messages; the WFG (waits-for graph) can be checked at one site for deadlocks.
Disadvantages: a possible bottleneck.
Majority locking
The previous solutions are vulnerable to site failure (any site in the first, the central site in the other two).
- Read[x]: lock a majority of x's copies.
- Write[x]: lock a majority of x's copies.
Thus, for each x, no two transactions that conflict on x can both hold a majority; a majority acts as an effective lock.
Disadvantage: many messages; one can trade time for number of messages using "forwarding".
A generalization
Suppose there are n copies of x. Let k, l be such that k + l > n and l > n/2.
- Read[x]: obtain locks on k out of the n copies.
- Write[x]: obtain locks on l out of the n copies.
Then no reader and writer of x, and no two writers of x, can hold their quorums concurrently, so quorums act as effective locks.
Choosing k, l:
- Many readers: small k.
- Many writers: small l.
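The quorum condition above can be checked mechanically; a minimal sketch (the function name is illustrative, not from the text):

```python
def valid_quorums(n, k, l):
    """Check the generalized quorum condition for n copies:
    k + l > n  -- every read quorum intersects every write quorum;
    2 * l > n  -- every two write quorums intersect (i.e. l > n/2)."""
    return k + l > n and 2 * l > n

# With n = 5 copies: k = 3, l = 3 is plain majority locking;
# k = 2, l = 4 favours readers (smaller read quorum).
```

With many readers one picks a small k (and a correspondingly large l), as the slide suggests.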
Part II: Distributed Deadlocks
Left as reading material.
Part III: Timestamp-based Algorithms
- A system model.
- Assumptions.
- Operations in a distributed environment.
- Timestamp Ordering (TO).
- Conservative Timestamp Ordering (CTO).
- Transaction classes.
A system model
(Diagram: several sites; at each site, transactions submit operations to a TM (transaction manager), which forwards them to a DM (data manager) holding the DATA.)
Assumptions
- No concurrency within a transaction.
- Writes go into private workspaces at the various DMs.
- Each transaction is managed by a single TM.
- Each item x may have a number of physical copies x1, … , xn.
Operations in a distributed environment
- Begin: set up a private workspace.
- Read[x]: if x is in the workspace, read it from there; otherwise read x from some copy xi by issuing dm_read.
- Write[x,v]: the single copy of x in the private workspace is assigned v.
- End: perform a 2-phase commit. For each updated x, for all copies of x:
  - issue a pre-stable-write command to store x on stable storage;
  - once all DMs confirm, issue dm-write commands to the DMs to install the new value in the database.
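The End step above can be sketched as a tiny 2-phase commit, assuming hypothetical DM objects with pre_stable_write and dm_write methods (the class and all names are illustrative, not an API from the text):

```python
class DM:
    """Toy data manager: logs pre-stable-writes, then installs values."""
    def __init__(self):
        self.stable_log = {}   # values forced to (simulated) stable storage
        self.db = {}           # the installed database state

    def pre_stable_write(self, x, v):
        self.stable_log[x] = v
        return True            # confirm the value is on stable storage

    def dm_write(self, x, v):
        self.db[x] = v         # install the committed value

def end_transaction(updated_items, dms):
    """2-phase commit over all copies: dms maps each item to its DMs."""
    # Phase 1: pre-stable-write x at every DM holding a copy of x.
    for x, v in updated_items.items():
        for dm in dms[x]:
            if not dm.pre_stable_write(x, v):
                return "abort"
    # Phase 2: all DMs confirmed -- install the new values.
    for x, v in updated_items.items():
        for dm in dms[x]:
            dm.dm_write(x, v)
    return "commit"
```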
Timestamp Ordering (TO) - skip
Idea: produce executions conflict equivalent to a serial history in timestamp order.
Item = <S_read, S_write, stable, value>
- S_read: the set of timestamps of the item's readers.
- S_write: the set of timestamps of the item's writers.
- stable: a flag indicating a committed value.
Timestamp Ordering (TO) - skip
On accessing an item with stable = no:
- wait: possible deadlock.
- abort: may be wasteful.
DM_Read with ts: if ts < max {t | t ∈ S_write} abort; otherwise, read and add ts to S_read.
DM_Write with ts: if ts < max {t | t ∈ S_read} abort; if ts < max {t | t ∈ S_write} ignore (Thomas Write Rule, TWR); otherwise, set stable = no, write, and add ts to S_write.
Commit: after all writes are performed, set stable = yes.
Abort: remove ts from S_read and S_write; make all items the transaction updated stable = yes.
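The DM_Read/DM_Write rules above can be sketched as follows (a toy, single-copy model; names mirror the slide, and the commit/abort bookkeeping is omitted):

```python
class Item:
    """Item = <S_read, S_write, stable, value> from the slide."""
    def __init__(self, value):
        self.S_read, self.S_write = set(), set()
        self.stable = True
        self.value = value

def dm_read(item, ts):
    # Reject a read that arrives too late: a younger writer already wrote.
    if item.S_write and ts < max(item.S_write):
        return "abort"
    item.S_read.add(ts)
    return item.value

def dm_write(item, ts, value):
    # A younger transaction already read the item: this write is too late.
    if item.S_read and ts < max(item.S_read):
        return "abort"
    # A younger write already happened: skip it (Thomas Write Rule).
    if item.S_write and ts < max(item.S_write):
        return "ignore"
    item.stable = False
    item.S_write.add(ts)
    item.value = value
    return "ok"
```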
Another Timestamp Ordering Algorithm - Terminology
- DM_r: read item.
- DM_w: write item at transaction end.
- p[x]: synchronization due to a private write.
- WTS(x), RTS(x): timestamps of the latest dm-write and dm-read of x.
- Buffering: delaying operations for future execution.
- min_r(x): timestamp of the earliest buffered read operation; min_w(x), min_p(x): same idea.
- DM_r[x] is ready if ts(DM_r[x]) < min_p(x), for any buffered p[x].
- DM_w[x] is ready if ts(DM_w[x]) < min_r(x), for any buffered read, and min_p(x) = ts(DM_w[x]).
Another Timestamp Ordering Algorithm
DM_r[x]:
- if ts(r[x]) < WTS(x): abort.
- if ts(r[x]) > min_p(x): put in buffer.
- otherwise: perform and update RTS(x).
p[x]:
- if ts(p[x]) < RTS(x): abort.
- if ts(p[x]) < WTS(x): abort.
- otherwise: put in buffer.
DM_w[x]: (note: a p[x] was previously executed, so no abort)
- if ts(w[x]) > min_r(x): put in buffer.
- if ts(w[x]) > min_p(x): put in buffer.
- otherwise: perform, update WTS(x), and discard p[x].
Occasionally check whether completed actions changed min_r(x) or min_p(x), so that some buffered operation is now ready.
Observations:
- No deadlocks are possible (why?).
- Upon abort, discard the private workspace and all of the transaction's operations; RTS(x) still needs to be updated.
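The two "ready" tests from the terminology slide can be sketched as predicates (here min_r/min_p are None when no such operation is buffered; this is an illustrative reading, not code from the text):

```python
def dm_r_ready(r_ts, min_p):
    """DM_r[x] may run when no buffered p[x] has a smaller timestamp."""
    return min_p is None or r_ts < min_p

def dm_w_ready(w_ts, min_r, min_p):
    """DM_w[x] may run when it precedes every buffered read and its own
    p[x] is the earliest buffered private write: min_p(x) = ts(DM_w[x])."""
    return (min_r is None or w_ts < min_r) and min_p == w_ts
```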
Conservative Timestamp Ordering (CTO)
To prevent aborts: perform an operation only when it is certain that no conflicting operation with a smaller timestamp can still arrive (such an operation would otherwise have to be restarted).
Result: no aborts and no deadlocks, but less concurrency.
But how long should one wait? Solution: use CTO. Operations: DM_r, DM_w.
Conservative Timestamp Ordering Architecture
(Diagram: transactions submit operations to TMs TM1, ..., TMn; each TM sends operations to DMs DM1, ..., DMk, and each DM buffers incoming operations in a queue kept in timestamp order.)
Conservative Timestamp Ordering Algorithm
TMs must submit dm-read operations in timestamp order: once an operation with timestamp t has been issued, no operation with timestamp s < t will be issued in the future. Similarly for dm-writes. Achieved by:
- each TM working serially;
- each transaction first reading all its data and then writing all its results at the end (still in timestamp order, but allowing execution parallelism);
- terminating transactions in timestamp order.
Data items need no associated timestamps.
CTO - Ready Operations
Maintain at each DM a queue for read and write operations; buffer DM_r and DM_w operations.
Output a DM_r operation if:
- there is a buffered DM_w operation from each TMi, and all such operations have higher timestamps.
Output a DM_w operation if:
- there is a buffered DM_r operation from each TMi, and all such operations have higher timestamps; and
- there is a buffered DM_w operation from each TMi, and all such operations have higher timestamps.
DM_w operations are never rejected!
Overall effect: a serial execution in timestamp order!
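These output tests can be sketched over per-TM queues of buffered timestamps (a dict from TM name to a list of timestamps; one plausible reading is that a DM_w is compared against the other TMs' write queues, since it heads its own queue -- an assumption, not stated in the text):

```python
def read_ready(read_ts, write_queues):
    """Output DM_r when every TM has a buffered DM_w with a larger ts."""
    return all(q and min(q) > read_ts for q in write_queues.values())

def write_ready(write_ts, read_queues, write_queues, own_tm):
    """Output DM_w when every TM's buffered reads, and every other TM's
    buffered writes, all carry larger timestamps."""
    reads_ok = all(q and min(q) > write_ts for q in read_queues.values())
    writes_ok = all(q and min(q) > write_ts
                    for tm, q in write_queues.items() if tm != own_tm)
    return reads_ok and writes_ok
```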
Conservative Timestamp Ordering - Problems
Problem: what if a TM issues no operation to this queue?
Solution: null operations (they have a timestamp but are no-ops). A TM can send an "infinite timestamp" to indicate expected long inactivity.
Transaction classes
CTO synchronizes everything: overkill!
Transaction class = <readset, writeset>.
If transactions are known in advance, each transaction can be assigned to one or more classes. If T reads X and writes Y, then T belongs to a class c = <rs, ws> if X ⊆ rs and Y ⊆ ws.
A TM manages a single class; a transaction must belong to the class of the TM managing it.
Run CTO, but only operations of the relevant TMs are considered:
- to output DM_r[x], wait until there are DM_w operations with higher timestamps from all TMs (classes) that have x in their write sets;
- to output DM_w[x], wait until there are DM_r operations with higher timestamps from all TMs (classes) that have x in their read sets, and DM_w operations with higher timestamps from all TMs (classes) that have x in their write sets.
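Class membership is a simple subset test; a minimal sketch (sets as Python sets, function name illustrative):

```python
def in_class(read_set, write_set, class_rs, class_ws):
    """T with readset X and writeset Y belongs to class c = <rs, ws>
    iff X is a subset of rs and Y is a subset of ws."""
    return read_set <= class_rs and write_set <= class_ws
```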
Part IV: Optimistic Concurrency Control
- Can be based on locking or timestamps.
- First, a centralized, timestamp-based algorithm (Kung-Robinson).
- Then, its adaptation to a distributed environment.
Rules for Validation: Centralized
A transaction has read, validate, and write phases. During the read phase it also computes and writes to a private space.
Executions will be serializable in timestamp order. To ensure this, for all transactions Tk s.t. ts(Tk) < ts(T), one of the following should hold:
- Rule a: Tk completed its write phase prior to T starting its read phase.
- Rule b: Tk completed its write phase while T was in its read phase, and write-set(Tk) ∩ read-set(T) = ∅.
- Rule c: Tk completed its read phase before T completed its read phase, write-set(Tk) ∩ read-set(T) = ∅, and write-set(Tk) ∩ write-set(T) = ∅.
A timestamp is assigned only once validation succeeds; do it after the write phase.
Different validations can be executed in parallel. So, each transaction T uses START(T) and FINISH(T) to determine the transactions against which it should be validated.
Rules for T and Tk
(Timing diagrams: each transaction is drawn as its read (R), validate (V), and write (W) phases; one diagram per rule, showing the relative position of Tk's phases against T's.)
- Rule a: Tk completed its write phase prior to T starting its read phase; ts(Tk) < start(T).
- Rule b: Tk completed its write phase while T was in its read phase, and write-set(Tk) ∩ read-set(T) = ∅; start(T) < ts(Tk) < finish(T).
- Rule c: Tk completed its read phase before T completed its read phase, write-set(Tk) ∩ read-set(T) = ∅, and write-set(Tk) ∩ write-set(T) = ∅; finish(Tk) < finish(T).
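The three rules can be combined into one validation predicate; a sketch under a toy model (the Txn record and its field names are illustrative, with numeric phase boundaries standing in for the R/V/W timing above):

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Txn:
    """Phase boundaries and access sets of a transaction."""
    start_read: int
    finish_read: int
    finish_write: int
    read_set: frozenset
    write_set: frozenset

def validate_against(T, Tk):
    """True iff the earlier transaction Tk satisfies one of rules
    a, b, c with respect to the validating transaction T."""
    # Rule a: Tk finished writing before T started reading.
    if Tk.finish_write < T.start_read:
        return True
    # Rule b: Tk finished writing during T's read phase and wrote
    # nothing that T read.
    if (Tk.finish_write < T.finish_read
            and not (Tk.write_set & T.read_set)):
        return True
    # Rule c: Tk finished reading before T did, and Tk's writes are
    # disjoint from both T's reads and T's writes.
    if (Tk.finish_read < T.finish_read
            and not (Tk.write_set & T.read_set)
            and not (Tk.write_set & T.write_set)):
        return True
    return False
```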
Distributed Setting
- A transaction can execute at many sites.
- Perform a validation phase at each site at which T operated; this is called local validation.
- A site may have purely local transactions as well as sub-transactions of global transactions.
- If validation is successful at all sites, ensure global consistency:
  - build HB(Tj) for each sub-transaction of T at site j: a set of ids of global transactions that must precede T, built during local validation;
  - global validation is done by making sure that each transaction in the HB set is either committed or aborted;
  - deadlocks are possible.
- After the global validation phase, a timestamp can be issued; it is the same for all local sub-transactions of a global transaction.
- Use 2-phase commit. Notify local sub-transactions.