Distributed Concurrency Control, Lecture 4 (BHG, Chap. 4 + Comp. Surveys Article)
(c) Oded Shmueli 2004
Transcript
Page 1: Distributed Concurrency Control, Lecture 4 (BHG, Chap. 4 + Comp. Surveys Article)

Page 2: Motivation

Distributed usage. Local autonomy. Maintainability. Allows for growth. Reliability – a number of copies.

Components: reliable communication; local DBs, which may be identical.

Problems: query processing, maintaining multiple copies, concurrency control and recovery, and a distributed data dictionary.

Page 3: Topics

Part I: Distributed 2PL
Part II: Distributed Deadlocks
Part III: Timestamp-based Algorithms
Part IV: Optimistic Concurrency Control

Page 4: Part I: Distributed 2PL

Each item may have a number of copies. Intuitively, the system should behave as if there were a single copy. Mechanisms:

Writers lock all copies. Central copy. Central locking site. Majority locking. A generalization. Moving central copy – not covered.

Note: we only care about synchronization here; the issue of actual reading or writing is covered by the EFFECTIVE lock a transaction obtains.
Page 5: Writers lock all copies

Each copy may be locked individually. Read[x]: lock some copy of x. Write[x]: lock all copies of x. Resulting executions are SR (serializable). Problems: writers tend to deadlock, and many messages are needed.
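A minimal sketch of this scheme in Python; the names (LockTable, copy ids like "x@A") are illustrative, not from BHG, and read/write lock modes are collapsed into one exclusive lock for brevity:

class LockTable:
    """One exclusive lock per physical copy (a simplification)."""
    def __init__(self):
        self.owner = {}                          # copy id -> transaction id

    def lock(self, tx, copy):
        # A real lock manager would block the requester; here a conflict raises.
        holder = self.owner.get(copy)
        if holder is not None and holder != tx:
            raise RuntimeError(f"{tx} must wait for {holder} on {copy}")
        self.owner[copy] = tx

def read_item(locks, tx, copies):
    locks.lock(tx, copies[0])                    # Read[x]: lock some copy of x

def write_item(locks, tx, copies):
    for c in copies:                             # Write[x]: lock all copies of x,
        locks.lock(tx, c)                        # one message per copy

locks = LockTable()
read_item(locks, "T1", ["x@A", "x@B", "x@C"])    # T1 reads one copy
write_item(locks, "T1", ["x@A", "x@B", "x@C"])   # T1 then locks every copy

Two writers acquiring copies at different sites in different orders is exactly why writers tend to deadlock under this scheme.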

Page 6: Central copy

A central copy per item. Read[x]: read-lock the central copy. Write[x]: write-lock the central copy. Advantage: fewer messages.

Page 7: Central locking site

A single site maintains a global lock table.

Advantages: few messages; deadlocks can be detected by checking the WFG (waits-for graph) at one site.

Disadvantage: a possible bottleneck.
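Since the central locking site sees the entire waits-for graph, deadlock detection reduces to cycle detection. A minimal sketch, assuming the WFG is represented as a dict from each transaction to the transactions it waits for:

def has_deadlock(wfg):
    # wfg: dict mapping each transaction to the transactions it waits for
    visiting, done = set(), set()

    def dfs(t):
        if t in visiting:
            return True                # back edge: a cycle, hence a deadlock
        if t in done:
            return False
        visiting.add(t)
        for u in wfg.get(t, ()):
            if dfs(u):
                return True
        visiting.discard(t)
        done.add(t)
        return False

    return any(dfs(t) for t in wfg)

print(has_deadlock({"T1": ["T2"], "T2": ["T1"]}))   # True: T1 and T2 wait for each other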

Page 8: Majority locking

The previous solutions are vulnerable to site failure (any site in the first scheme, the central site in the other two).

Read[x]: lock a majority of x's copies. Write[x]: lock a majority of x's copies. Thus, for any x, no two transactions that conflict on x can both hold a majority – an effective lock.

Disadvantage: many messages; one can trade time for the number of messages using "forwarding".

Page 9: A generalization

Suppose there are n copies of x. Let k, l be such that k + l > n and l > n/2. Read[x]: obtain locks on k out of the n copies. Write[x]: obtain locks on l out of the n copies. Then a reader/writer pair, or two writers, of x can never both be effectively locking x.

Choosing k, l: many readers – pick a small k; many writers – pick a small l (subject to the constraints).
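Why the constraints work: k + l > n forces every read quorum to intersect every write quorum, and l > n/2 forces any two write quorums to intersect, so conflicting transactions always meet at some copy. A small brute-force check in Python:

from itertools import combinations

def quorums_safe(n, k, l):
    copies = range(n)
    reads  = list(combinations(copies, k))       # all possible read quorums
    writes = list(combinations(copies, l))       # all possible write quorums
    rw = all(set(r) & set(w) for r in reads  for w in writes)
    ww = all(set(v) & set(w) for v in writes for w in writes)
    return rw and ww                             # every conflicting pair meets

print(quorums_safe(5, 2, 4))   # True:  2 + 4 > 5 and 4 > 5/2
print(quorums_safe(5, 2, 3))   # False: 2 + 3 = 5, so a reader can miss a writer

Majority locking is the special case k = l = floor(n/2) + 1.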

Page 10: Part II: Distributed Deadlocks

Left as reading material.

Page 11: Part III: Timestamp-based Algorithms

A system model. Assumptions. Operations in a distributed environment. Timestamp Ordering (TO). Conservative Timestamp Ordering (CTO). Transaction classes.

Page 12: A system model

[Figure: system model – at each site, transactions talk to a TM (transaction manager), which talks to a DM (data manager) that manages the DATA; the figure shows three such sites.]

Page 13: Assumptions

No concurrency within a transaction. Write into private workspaces at the various DMs. Each transaction is managed by a single TM. Each item x may have a number of physical copies x1, …, xn.

Page 14: Operations in a distributed environment

Begin: set up a private workspace.

Read[x]: if x is in the workspace, read it from there. Otherwise, read x from some copy xi by issuing dm_read.

Write[x,v]: The single copy of x in the private workspace is assigned v.

End: perform a two-phase commit. For each updated x, for all copies of x: issue a pre-stable-write command to store x on stable storage. Once all DMs confirm, issue dm-write commands to the DMs to install the new values in the database.
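A minimal sketch of this End protocol; the DM class and the pre_stable_write/dm_write methods are illustrative stand-ins for the commands named above:

class DM:
    """A data manager holding copies; stable storage is simulated."""
    def __init__(self):
        self.stable, self.db = {}, {}
    def pre_stable_write(self, x, v):
        self.stable[x] = v                   # force the value to stable storage
        return True                          # confirm to the TM
    def dm_write(self, x, v):
        self.db[x] = v                       # install the committed value

def end_transaction(workspace, dms_for_copy):
    # Phase 1: stabilize every updated item at the DM of every copy.
    for x, v in workspace.items():
        for dm in dms_for_copy[x]:
            if not dm.pre_stable_write(x, v):
                return "aborted"             # some DM failed to confirm
    # Phase 2: all DMs confirmed -- install the values in the database.
    for x, v in workspace.items():
        for dm in dms_for_copy[x]:
            dm.dm_write(x, v)
    return "committed"

d1, d2 = DM(), DM()
print(end_transaction({"x": 42}, {"x": [d1, d2]}))   # committed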

Page 15: Timestamp Ordering (TO) – skip

Idea: produce histories conflict-equivalent to a serial history in timestamp order. Item = <S_read, S_write, stable, value>:

S_read – set of readers' timestamps of the item. S_write – set of writers' timestamps of the item. stable – a flag indicating a committed value.

Page 16: Timestamp Ordering (TO) – skip

On accessing an item with stable = no, either:

wait – possible deadlock; or abort – may be wasteful.

DM_Read with ts: if ts < max{t | t ∈ S_write} abort; otherwise, read and add ts to S_read.

DM_Write with ts: if ts < max{t | t ∈ S_read} abort; if ts < max{t | t ∈ S_write} ignore (TWR); otherwise, set stable = no, write, and add ts to S_write.

Commit: after all writes are performed, set stable = yes. Abort: remove ts from S_read and S_write, and make all items the transaction updated stable = yes.

Notes: stable = no acts essentially as a lock. The DM_Read abort prevents reading from the past. The DM_Write abort on S_read means there is a later read. Making updated items stable again is done as part of commit; this means we need to store pre-images to restore. In this algorithm commit is decided once all DM_Writes are executed, which implies we need to keep pre-images.
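A minimal sketch of these DM_Read/DM_Write rules; the wait-versus-abort choice for stable = no items is left out for brevity:

class Item:
    def __init__(self, value=None):
        self.S_read, self.S_write = set(), set()   # reader / writer timestamps
        self.stable, self.value = True, value

def dm_read(item, ts):
    if item.S_write and ts < max(item.S_write):
        return "abort"                  # would read from the past
    item.S_read.add(ts)
    return item.value

def dm_write(item, ts, v):
    if item.S_read and ts < max(item.S_read):
        return "abort"                  # a later transaction already read x
    if item.S_write and ts < max(item.S_write):
        return "ignore"                 # TWR: a later write is already installed
    item.stable = False                 # uncommitted value, essentially a lock
    item.S_write.add(ts)
    item.value = v
    return "ok"

x = Item(0)
print(dm_write(x, 10, 1), dm_read(x, 5))   # ok abort -- ts 5 would read from the past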
Page 17: Another Timestamp Ordering Algorithm – Terminology

DM_r: read item. DM_w: write item at transaction end. p[x]: synchronization due to a private write. WTS(x), RTS(x): timestamps of the latest dm-write and dm-read. Buffering: delaying operations for future execution. min_r(x): timestamp of the earliest buffered read op. min_w(x), min_p(x): same idea.

DM_r[x] is ready if ts(DM_r[x]) < min_p(x) in the buffer, if any. DM_w[x] is ready if ts(DM_w[x]) < min_r(x), if any, and min_p(x) = ts(DM_w[x]) in the buffer.

Notes: a ready DM_r[x] can read the proper version for this op; a ready DM_w[x] means the first pending reader is later.
Page 18: Another Timestamp Ordering Algorithm

DM_r[x]:

if ts(r[x]) < WTS(x): abort. If ts(r[x]) > min_p(x): put in buffer. Otherwise, perform and update RTS(x).

p[x]: if ts(p[x]) < RTS(x): abort. If ts(p[x]) < WTS(x): abort. Otherwise, put in buffer.

DM_w[x] (note: a p[x] was previously executed, so no abort): if ts(w[x]) > min_r(x): put in buffer. If ts(w[x]) > min_p(x): put in buffer. Otherwise, perform, update WTS(x), and discard the corresponding p[x].

Occasionally check whether actions changed min_r(x) or min_p(x) so that some buffered operation is now ready; a buffered read still needs to update RTS(x) when performed.

Observations: no deadlocks are possible (why?). Upon abort, discard the private workspace and all of the transaction's operations.

Notes: ts(r[x]) < WTS(x) means reading from the past. Buffering a read waits until the proper version is installed. p[x] synchronizes during execution. ts(p[x]) < RTS(x) means a later version was read, and we are not sure which. ts(p[x]) < WTS(x) is writing into the past – can we use TWR? No! DM_w[x] appears at the end of the transaction; unlike the previous timestamp algorithm, here you cannot abort during your DM writes. Once DM_w[x] is performed, p[x] is no longer relevant. In this algorithm, before the DM_Writes are performed and once values are on stable storage, commit can be decided, as there will be no aborts. No deadlocks: if a transaction waits for another's operation, that other transaction has a lower timestamp.
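The two "ready" tests above, written out as predicates over a per-item buffer. The representation (a list of (kind, ts) pairs, "r" for reads and "p" for private writes, with missing minima taken as infinity) is an assumption for illustration:

INF = float("inf")

def min_ts(buffer, kind):
    ts_list = [ts for k, ts in buffer if k == kind]
    return min(ts_list) if ts_list else INF      # no such op buffered

def read_is_ready(buffer, ts):
    # DM_r[x] is ready if no buffered p[x] has a smaller timestamp:
    # the version this read should see is already installed.
    return ts < min_ts(buffer, "p")

def write_is_ready(buffer, ts):
    # DM_w[x] is ready if no earlier read is still pending and its own
    # p[x] is the earliest buffered private write.
    return ts < min_ts(buffer, "r") and ts == min_ts(buffer, "p")

buf = [("p", 5), ("r", 7), ("p", 9)]
print(read_is_ready(buf, 4))    # True: 4 precedes the earliest p[x]
print(write_is_ready(buf, 5))   # True: no read before 5, and p[x] at 5 is earliest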
Page 19: Conservative Timestamp Ordering (CTO)

To prevent aborts, do the following: perform an operation only when certain that a later-arriving operation will not have to be restarted due to a smaller timestamp.

Result: no aborts and no deadlocks, but less concurrency. But how long to wait? Solution – use CTO. Operations: DM_r, DM_w.

Page 20: Conservative Timestamp Ordering – Architecture

[Figure: transactions submit operations to TMs TM1, …, TMn; each DM DM1, …, DMk holds incoming operations in queues maintained in timestamp order.]

Page 21: Conservative Timestamp Ordering – Algorithm

TMs must submit dm-read operations in timestamp order: if an operation with timestamp t is issued, one with timestamp s < t will not be issued in the future. Similarly for dm-writes. Achieve this by:

each TM working serially; each transaction first reading all its data and then writing all results at the end (still in timestamp order, but this allows execution parallelism); terminating transactions in timestamp order.

Data items need no associated timestamps.

Note: check this again! Ceri, page 231.
Page 22: CTO – Ready Operations

Maintain at each DM a queue for read and write ops; buffer DM_r and DM_w operations.

Output a DM_r operation if: there is a DM_w operation from each TMi and all such operations have higher timestamps.

Output a DM_w operation if: there is a DM_r operation from each TMi and all such operations have higher timestamps, and there is a DM_w operation from each TMi and all such operations have higher timestamps.

DM_w operations are never rejected! Overall effect: a serial execution in timestamp order!
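A sketch of these output tests, assuming each DM keeps one timestamp-ordered queue of ("r"/"w", ts) pairs per TM:

def can_output_read(ts, queues):
    # queues: TM -> list of ("r"/"w", ts) pairs, kept in timestamp order
    return all(any(k == "w" and t > ts for k, t in q) for q in queues.values())

def can_output_write(ts, queues):
    # A write needs a higher-timestamped read AND write from every TM.
    return all(
        any(k == "r" and t > ts for k, t in q) and
        any(k == "w" and t > ts for k, t in q)
        for q in queues.values()
    )

queues = {"TM1": [("r", 3), ("w", 9)], "TM2": [("w", 6)]}
print(can_output_read(5, queues))    # True: every TM has a write with ts > 5
print(can_output_write(5, queues))   # False: no TM has sent a read with ts > 5

The null operations of the next slide would enter these queues as entries that carry a timestamp but perform nothing, which is how an idle TM unblocks the tests.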

Page 23: Conservative Timestamp Ordering – Problems

Problem: what if a TM issues no operation to this queue? Solution: null operations (they have a timestamp but are no-ops). A TM can send an "infinite timestamp" to indicate expected long inactivity.

Page 24: Transaction classes

CTO synchronizes everything – overkill! A transaction class = <readset, writeset>. If transactions are known in advance, each transaction can be assigned to one or more classes. If T reads X and writes Y, T belongs to a class c = <rs, ws> if X ⊆ rs and Y ⊆ ws. A TM manages a single class; a transaction must belong to the class of the TM managing it.

Run CTO, but only ops of the relevant TMs are considered:

To output DM_r[x], wait until there are DM_w operations with higher ts from all TMs (classes) that have x in their write sets.

To output DM_w[x], wait until there are DM_r operations with higher ts from all TMs (classes) that have x in their read sets, and DM_w operations with higher ts from all TMs (classes) that have x in their write sets.
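A sketch of the weakened DM_r test under classes, reusing the queue layout assumed above; the class table is hypothetical. Only TMs whose class could write x are consulted:

def can_output_read(x, ts, queues, classes):
    # classes: TM -> (readset, writeset); only TMs that may WRITE x matter
    writers = [tm for tm, (rs, ws) in classes.items() if x in ws]
    return all(any(k == "w" and t > ts for k, t in queues[tm]) for tm in writers)

classes = {"TM1": ({"x"}, {"y"}), "TM2": ({"y"}, {"x"})}
queues  = {"TM1": [], "TM2": [("w", 8)]}
print(can_output_read("x", 5, queues, classes))   # True: only TM2's class writes x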

Page 25: Part IV: Optimistic Concurrency Control

Can be based on locking or timestamps. First, a centralized algorithm: we show a timestamp-based algorithm (Kung-Robinson). Then, an adaptation to a distributed environment.

Page 26: Rules for Validation: Centralized

A transaction has read, validate, and write phases. During the read phase it also computes and writes to a private space. Executions will be serializable in timestamp order. To ensure this, for all transactions Tk s.t. ts(Tk) < ts(T), one of the following should hold:

(a) Tk completed its write phase prior to T starting its read phase.

(b) Tk completed its write phase while T is in its read phase, and write-set(Tk) ∩ read-set(T) = ∅.

(c) Tk completed its read phase before T completes its read phase, write-set(Tk) ∩ read-set(T) = ∅, and write-set(Tk) ∩ write-set(T) = ∅.

A timestamp is assigned only once validation succeeds; do it after the write phase. Different validations can be executed in parallel, so each transaction T uses START(T) and FINISH(T) to determine the transactions against which it should be validated.

Note: parallel validations require some adjustments not detailed here. Important – to make START and FINISH work, they are incremented after each access!
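The three rules as one predicate over read/write sets and read-phase boundaries. A sketch: the Tx record and the counter representation of start/finish are assumptions for illustration:

from dataclasses import dataclass

@dataclass
class Tx:
    read_set: set
    write_set: set
    start: int       # when the read phase started
    finish: int      # when the read phase finished
    ts: int          # assigned only after successful validation

def validate(T, earlier):
    """Check T against every Tk with ts(Tk) < ts(T); True means T may commit."""
    for Tk in earlier:
        if Tk.ts < T.start:
            continue                                 # rule (a)
        if Tk.ts < T.finish and not (Tk.write_set & T.read_set):
            continue                                 # rule (b)
        if (Tk.finish < T.finish
                and not (Tk.write_set & T.read_set)
                and not (Tk.write_set & T.write_set)):
            continue                                 # rule (c)
        return False                                 # no rule holds: restart T
    return True

Tk = Tx(read_set={"x"}, write_set={"y"}, start=1, finish=4, ts=5)
T  = Tx(read_set={"y"}, write_set={"z"}, start=3, finish=8, ts=0)  # ts not yet assigned
print(validate(T, [Tk]))   # False: Tk wrote y during T's read phase, and T read y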
Page 27: Rules for T and Tk

[Figure: timelines of T and Tk through their Read (R), Validate (V), and Write (W) phases, illustrating rules a, b, and c below.]

Rule a: Tk completed its write phase prior to T starting its read phase: ts(Tk) < start(T).

Rule b: Tk completed its write phase while T is in its read phase, and write-set(Tk) ∩ read-set(T) = ∅: start(T) < ts(Tk) < finish(T).

Rule c: Tk completed its read phase before T completes its read phase, write-set(Tk) ∩ read-set(T) = ∅, and write-set(Tk) ∩ write-set(T) = ∅: finish(Tk) < finish(T).

Note: start(T) – when T started its read phase; finish(T) – when T finished its read phase.
Page 28: Distributed Setting

A transaction can execute at many sites. Perform a validation phase at each site at which T operated; this is called local validation. A local site may have purely local transactions as well as sub-transactions of global transactions. If validation is successful at all sites, ensure global consistency:

Build HB(Tj) for each sub-transaction of T at site j. This is a set of ids of global transactions that must precede T, built during local validation.

Global validation is done by making sure that each transaction in the HB set is either committed or aborted. Deadlocks are possible.

After the global validation phase, a timestamp can be issued; it will be the same one for all local sub-transactions of a global transaction. Use 2-phase commit; notify local sub-transactions.

Note: what is the problem? Since each site does independent local validation, at one site the order may be Ti before Tj and at another Tj before Ti, so we need to ensure a global order. The HB (happened-before) set is the tool for doing that: it ensures there are no ordering conflicts. Are there other solutions? Yes, for example, use a global counter for local validations. Drawback: too much dependency, especially if few global sub-transactions are operational at a site.
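A sketch of the HB-based global validation step, with assumed representations: T may receive its timestamp only once every transaction in the union of its sites' HB sets is decided:

def global_validate(hb_sets_per_site, status):
    # hb_sets_per_site: site -> HB set collected during local validation of T
    # status: global transaction id -> "committed" | "aborted" | "active"
    must_precede = set().union(*hb_sets_per_site.values())
    return all(status.get(t) in ("committed", "aborted") for t in must_precede)

hb = {"site1": {"T3"}, "site2": {"T3", "T7"}}
print(global_validate(hb, {"T3": "committed", "T7": "active"}))  # False: wait for T7

While the test returns False, T waits for the undecided predecessors, which is where the deadlocks mentioned above can arise.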