TRANSACTION PROCESSING - RBVRR Womens College

UNIT-III

Feb 04, 2022
Page 1: TRANSACTION PROCESSING - RBVRR Womens College

TRANSACTION PROCESSING

UNIT-III

Page 2

Transaction Concept

A transaction is a unit of program execution that accesses and possibly updates various data items.

A transaction must see a consistent database.

During transaction execution the database may be temporarily inconsistent.

When the transaction completes successfully (is committed), the database must be consistent.

After a transaction commits, the changes it has made to the database persist, even if there are system failures.

Multiple transactions can execute in parallel.

Two main issues to deal with:

Failures of various kinds, such as hardware failures and system crashes

Concurrent execution of multiple transactions

Page 3

Transaction Concepts

A transaction is a computation (i.e., a program in execution, not the source code or binaries) that accesses and possibly modifies a database:

Can be interleaved with other transactions

But guarantees certain properties

The purpose of the transaction concept is to avoid the problems that may arise from interleaving

Page 4

Transaction Concepts

Operation: an action on a data item

Transaction: set of operations performed in some partial order according to the specifying program

A transaction makes a set of operations appear as one logical operation

If programmers guarantee that their transactions are individually correct, then the DBMS guarantees that all interleaved executions are also correct

Page 5

Transaction Concepts

A transaction is a logical unit of database processing that includes one or more database access operations; these include

insertion,

deletion,

modification or

retrieval operations.

The database operations that form a transaction can either be embedded within an application program or they can be specified via a high-level query language such as SQL.

Page 6

Transaction Concepts

One way of specifying transaction boundaries is with explicit begin transaction and end transaction statements. In that case, all database access operations between the two are considered to form one transaction.

If the database operations in a transaction do not update the database but only retrieve data, the transaction is called a read-only transaction.

Page 7

Transaction Concepts

Concurrency control and recovery mechanisms are mainly concerned with the database access commands in a transaction.

Transactions submitted by the various users may execute concurrently and may access and update the same database items.

If this concurrent execution is uncontrolled, it may lead to problems, such as an inconsistent database.

Page 8

Transaction Lifecycle

A transaction goes through well-defined stages in its life (always terminating)

inactive

active (may read and write: this is where the entire business logic occurs)

precommit (no errors during execution; needed for mutual commitment protocols)

failed (errors)

committed (the DBMS decides this)

forgotten (the DBMS reclaims data structures)

Page 9

TRANSACTION STATES

We need to be more precise about what is meant by "successful completion" of a transaction. To do so, we establish a simple abstract transaction model. A transaction must be in one of the following states:

Active, the initial state.

Partially committed, after the last statement has been executed.

Failed, after the discovery that normal execution can no longer proceed.

Aborted, after the transaction has been rolled back and the database restored to its state prior to the start of the transaction.

Committed, after "successful" completion.

Page 10

Transaction State (Cont.)

Page 11

ACID Properties

These formalize the notion of a transaction behaving as one operation

(Failure) Atomicity—all or none—if failed then no changes to DB or messages

Consistency—don't violate DB integrity constraints: execution of the op is correct

Isolation (Atomicity)—partial results are hidden

Durability—effects (of transactions that "happened" or committed) are forever

Page 12

ACID Properties

Atomicity. Either all operations of the transaction are properly reflected in the database or none are.

Consistency. Execution of a transaction in isolation preserves the consistency of the database.

Isolation. Although multiple transactions may execute concurrently, each transaction must be unaware of other concurrently executing transactions. Intermediate transaction results must be hidden from other concurrently executed transactions.

That is, for every pair of transactions Ti and Tj, it appears to Ti that either Tj finished execution before Ti started, or Tj started execution after Ti finished.

Durability. After a transaction completes successfully, the changes it has made to the database persist, even if there are system failures.

A transaction is a unit of program execution that accesses and possibly updates various data items. To preserve the integrity of data, the database system must ensure the ACID properties above.

Page 13

Concurrent Transaction Processing

Page 14

Schedules

Schedules are histories of computations showing all events of interest.

A schedule of T1...Tn has the operations of T1...Tn in the same order as within each Ti, but interleaved across the Ti to model concurrency.

Includes active transactions

Typically a partial order among events

Two challenges:

What are the bad schedules?

How can the DBMS prevent them?

Page 15

Conflict

Order-sensitivity of operations.

Two operations of different transactions, but on the same data, conflict if

Their mutual order is significant, i.e., determines at least one of the following:

The final value of that item read by future transactions

The value of the item as read by present transactions

Consider the flow, or non-flow, of information between transactions.

Page 16

Serial Schedules

Transactions run wholly before or after one another (i.e., one by one).

Clearly, we must allow for service requests to come in slowly, one-by-one

Thus, under independence of transactions (assuming each transaction is correct), serial schedules are obviously correct

Page 17

Serializable Schedules

Interleaved schedules are desirable. Why?

Serializable schedules are those equivalent to some serial schedule. Here "equivalent" can mean:

Conflict equivalent — all pairs of conflicting ops are ordered the same way

View equivalent — all users get the same view

Page 18

Achieving Serializability

Optimistically: let each transaction run, but check for serializability before committing.

Pessimistically: use a protocol, e.g., locking, to ensure that only serializable schedules are realized.

Generally, the pessimistic approach is more common.

Page 19

Example of Fund Transfer

Transaction to transfer $50 from account A to account B:

1. read(A)

2. A := A – 50

3. write(A)

4. read(B)

5. B := B + 50

6. write(B)

Atomicity requirement — if the transaction fails after step 3 and before step 6, the system should ensure that its updates are not reflected in the database, else an inconsistency will result.

Consistency requirement – the sum of A and B is unchanged by the execution of the transaction.
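The six numbered steps can be replayed directly in code. A minimal Python sketch, assuming a plain dict stands in for the database (the `transfer` helper and item names are illustrative, not a real DBMS API):

```python
# Hypothetical in-memory "database"; real systems read/write disk pages.
db = {"A": 100, "B": 200}

def transfer(db, amount=50):
    a = db["A"]        # 1. read(A)
    a = a - amount     # 2. A := A - 50
    db["A"] = a        # 3. write(A)
    b = db["B"]        # 4. read(B)
    b = b + amount     # 5. B := B + 50
    db["B"] = b        # 6. write(B)

before = db["A"] + db["B"]
transfer(db)
assert db["A"] + db["B"] == before   # consistency: the sum A + B is preserved
```

The final assertion checks exactly the consistency requirement stated above; a crash between steps 3 and 6 would leave the sum changed, which is what the atomicity requirement rules out.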

Page 20

Example of Fund Transfer (Cont.)

Isolation requirement — if between steps 3 and 6, another transaction is allowed to access the partially updated database, it will see an inconsistent database (the sum A + B will be less than it should be).

Isolation can be ensured trivially by running transactions serially, that is one after the other.

However, executing multiple transactions concurrently has significant benefits, as we will see later.

Durability requirement — once the user has been notified that the transaction has completed (i.e., the transfer of the $50 has taken place), the updates to the database by the transaction must persist despite failures.

Page 21

Implementation of Atomicity and Durability

The recovery-management component of a database system implements the support for atomicity and durability.

The shadow-database scheme:

Assume that only one transaction is active at a time.

a pointer called db_pointer always points to the current consistent copy of the database.

all updates are made on a shadow copy of the database, and db_pointer is made to point to the updated shadow copy only after the transaction reaches partial commit and all updated pages have been flushed to disk.

in case transaction fails, old consistent copy pointed to by db_pointer can be used, and the shadow copy can be deleted.

Page 22

Implementation of Atomicity and Durability (Cont.)

The shadow-database scheme:

Assumes disks do not fail.

Useful for text editors, but extremely inefficient for large databases (why?).

Does not handle concurrent transactions.

Page 23

Concurrent Executions

Multiple transactions are allowed to run concurrently in the system. Advantages are:

increased processor and disk utilization, leading to better transaction throughput: one transaction can be using the CPU while another is reading from or writing to the disk

reduced average response time for transactions: short transactions need not wait behind long ones.

Concurrency control schemes – mechanisms to achieve isolation; that is, to control the interaction among the concurrent transactions in order to prevent them from destroying the consistency of the database.

Page 24

Schedules

Schedule – a sequence of instructions that specifies the chronological order in which instructions of concurrent transactions are executed.

a schedule for a set of transactions must consist of all instructions of those transactions

must preserve the order in which the instructions appear in each individual transaction.

A transaction that successfully completes its execution will have a commit instruction as the last statement (will be omitted if it is obvious)

A transaction that fails to successfully complete its execution will have an abort instruction as the last statement (will be omitted if it is obvious)

Page 25

Schedule 1

Let T1 transfer $50 from A to B, and T2 transfer 10% of the balance from A to B. A serial schedule in which T1 is followed by T2:

Page 26

Schedule 2

A serial schedule where T2 is followed by T1:

Page 27

Schedule 3

Let T1 and T2 be the transactions defined previously. The following schedule is not a serial schedule, but it is equivalent to Schedule 1.

In Schedules 1, 2 and 3, the sum A + B is preserved.

Page 28

Schedule 4

The following concurrent schedule does not preserve the value of (A + B).

Page 29

Serializability

Basic Assumption – Each transaction preserves database consistency.

Thus serial execution of a set of transactions preserves database consistency.

A (possibly concurrent) schedule is serializable if it is equivalent to a serial schedule. Different forms of schedule equivalence give rise to the notions of:

1.conflict serializability

2.view serializability

We ignore operations other than read and write instructions, and we assume that transactions may perform arbitrary computations on data in local buffers in between reads and writes. Our simplified schedules consist of only read and write instructions.

Page 30

Conflicting Instructions

Instructions li and lj of transactions Ti and Tj respectively conflict if and only if there exists some item Q accessed by both li and lj, and at least one of these instructions wrote Q.

1. li = read(Q), lj = read(Q). li and lj don’t conflict.

2. li = read(Q), lj = write(Q). They conflict.

3. li = write(Q), lj = read(Q). They conflict.

4. li = write(Q), lj = write(Q). They conflict.

Intuitively, a conflict between li and lj forces a (logical) temporal order between them.

If li and lj are consecutive in a schedule and they do not conflict, their results would remain the same even if they had been interchanged in the schedule.
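The four cases can be captured in a small predicate. A sketch, assuming each operation is encoded as a hypothetical `(transaction, kind, item)` tuple:

```python
# Two operations conflict iff they belong to different transactions,
# access the same item, and at least one of them is a write.
def conflicts(op1, op2):
    t1, kind1, item1 = op1
    t2, kind2, item2 = op2
    return t1 != t2 and item1 == item2 and "write" in (kind1, kind2)

assert not conflicts(("T1", "read", "Q"), ("T2", "read", "Q"))   # case 1
assert conflicts(("T1", "read", "Q"), ("T2", "write", "Q"))      # case 2
assert conflicts(("T1", "write", "Q"), ("T2", "read", "Q"))      # case 3
assert conflicts(("T1", "write", "Q"), ("T2", "write", "Q"))     # case 4
```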

Page 31

Conflict Serializability

If a schedule S can be transformed into a schedule S´ by a series of swaps of non-conflicting instructions, we say that S and S´ are conflict equivalent.

We say that a schedule S is conflict serializable if it is conflict equivalent to a serial schedule.

Page 32

Conflict Serializability (Cont.)

Schedule 3 can be transformed into Schedule 6, a serial schedule where T2 follows T1, by a series of swaps of non-conflicting instructions.

Therefore Schedule 3 is conflict serializable.

Schedule 3 Schedule 6

Page 33

Conflict Serializability (Cont.)

Example of a schedule that is not conflict serializable:

We are unable to swap instructions in the above schedule to obtain either the serial schedule < T3, T4 >, or the serial schedule < T4, T3 >.

Page 34

View Serializability

Let S and S´ be two schedules with the same set of transactions. S and S´ are view equivalent if the following three conditions are met:

1. For each data item Q, if transaction Ti reads the initial value of Q in schedule S, then transaction Ti must, in schedule S´, also read the initial value of Q.

2. For each data item Q, if transaction Ti executes read(Q) in schedule S, and that value was produced by transaction Tj (if any), then transaction Ti must in schedule S´ also read the value of Q that was produced by transaction Tj.

3. For each data item Q, the transaction (if any) that performs the final write(Q) operation in schedule S must perform the final write(Q) operation in schedule S´.

As can be seen, view equivalence is also based purely on reads and writes alone.

Page 35

View Serializability (Cont.)

A schedule S is view serializable if it is view equivalent to a serial schedule.

Every conflict serializable schedule is also view serializable.

Below is a schedule which is view-serializable but not conflict serializable.

What serial schedule is above equivalent to?

Every view serializable schedule that is not conflict serializable has blind writes.

Page 36

Other Notions of Serializability

The schedule below produces the same outcome as the serial schedule < T1, T5 >, yet is not conflict equivalent or view equivalent to it.

Determining such equivalence requires analysis of operations other than read and write.

Page 37

Testing for Serializability

Consider some schedule of a set of transactions T1, T2, ..., Tn.

Precedence graph — a directed graph where the vertices are the transactions (names).

We draw an arc from Ti to Tj if the two transactions conflict, and Ti accessed the data item on which the conflict arose earlier.

We may label the arc by the item that was accessed.

(Example: a precedence graph with arcs labeled by the items x and y.)

Page 38

Example Schedule (Schedule A) + Precedence Graph

(Figure: Schedule A, a five-column table interleaving the operations of T1 through T5: read(X), read(Y), read(Z), read(V), read(W), read(W), read(Y), write(Y), write(Z), read(U), read(Y), write(Y), read(Z), write(Z), read(U), write(U); shown together with its precedence graph over T1 to T5.)

Page 39

Test for Conflict Serializability

A schedule is conflict serializable if and only if its precedence graph is acyclic.

Cycle-detection algorithms exist which take order n² time, where n is the number of vertices in the graph.

(Better algorithms take order n + e where e is the number of edges.)

If precedence graph is acyclic, the serializability order can be obtained by a topological sorting of the graph.

This is a linear order consistent with the partial order of the graph.

For example, a serializability order for Schedule A would be T5 T1 T3 T2 T4.

Are there others?
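The test above can be sketched end to end: build the precedence graph from a schedule, then check it for cycles with a depth-first search. The `(transaction, operation, item)` encoding and the two sample schedules below are illustrative assumptions, not from the slides:

```python
from collections import defaultdict

def precedence_graph(schedule):
    # Arc Ti -> Tj for each pair of conflicting operations where Ti
    # accessed the shared item first.
    edges = defaultdict(set)
    for i, (ti, op1, x) in enumerate(schedule):
        for tj, op2, y in schedule[i + 1:]:
            if ti != tj and x == y and "write" in (op1, op2):
                edges[ti].add(tj)
    return edges

def is_conflict_serializable(schedule):
    edges = precedence_graph(schedule)
    WHITE, GRAY, BLACK = 0, 1, 2          # unvisited / on stack / done
    color = defaultdict(int)

    def dfs(u):
        color[u] = GRAY
        for v in edges[u]:
            if color[v] == GRAY:          # back edge: cycle found
                return False
            if color[v] == WHITE and not dfs(v):
                return False
        color[u] = BLACK
        return True

    nodes = {t for t, _, _ in schedule}
    return all(dfs(t) for t in nodes if color[t] == WHITE)

# A serializable interleaving (all conflicts order T1 before T2) ...
ok = [("T1", "read", "A"), ("T1", "write", "A"), ("T2", "read", "A"),
      ("T1", "read", "B"), ("T1", "write", "B"), ("T2", "write", "A")]
# ... versus a cyclic one in the spirit of the < T3, T4 > example.
bad = [("T3", "read", "Q"), ("T4", "write", "Q"), ("T3", "write", "Q")]
assert is_conflict_serializable(ok)
assert not is_conflict_serializable(bad)
```

If the graph is acyclic, a topological sort of `edges` (not shown) yields a serializability order such as T5 T1 T3 T2 T4 for Schedule A.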

Page 40

Test for View Serializability

The precedence graph test for conflict serializability cannot be used directly to test for view serializability.

An extension to test for view serializability has cost exponential in the size of the precedence graph.

The problem of checking if a schedule is view serializable falls in the class of NP-complete problems.

Thus the existence of an efficient algorithm is extremely unlikely.

However, practical algorithms that just check some sufficient conditions for view serializability can still be used.

Page 41

Recoverable Schedules

Recoverable schedule — if a transaction Tj reads a data item previously written by a transaction Ti , then the commit operation of Ti appears before the commit operation of Tj.

The following schedule (Schedule 11) is not recoverable if T9 commits immediately after the read.

If T8 should abort, T9 would have read (and possibly shown to the user) an inconsistent database state. Hence, the database must ensure that schedules are recoverable.

We need to address the effect of transaction failures on concurrently running transactions.

Page 42

Cascading Rollbacks

Cascading rollback – a single transaction failure leads to a series of transaction rollbacks. Consider the following schedule where none of the transactions has yet committed (so the schedule is recoverable)

If T10 fails, T11 and T12 must also be rolled back.

Can lead to the undoing of a significant amount of work

Page 43

Cascadeless Schedules

Cascadeless schedules — cascading rollbacks cannot occur; for each pair of transactions Ti and Tj such that Tj reads a data item previously written by Ti, the commit operation of Ti appears before the read operation of Tj.

Every cascadeless schedule is also recoverable

It is desirable to restrict the schedules to those that are cascadeless

Page 44

CONCURRENCY CONTROL

Why Concurrency Control?

Several problems can occur when concurrent transactions execute in an uncontrolled manner. Here, some of the problems are discussed.

Consider an airline reservation database in which a record is stored for each airline flight. Each record includes the number of reserved seats on that flight as a named data item, among other information. Let T1 be a transaction that transfers N reservations from one flight, whose number of reserved seats is stored in the database item named X, to another flight, whose number of reserved seats is stored in the database item named Y. Let T2 be another transaction that just reserves M seats on the first flight (X) referenced in transaction T1.

Figure 1:

T1                        T2
read-item (X);            read-item (X);
X := X - N;               X := X + M;
write-item (X);           write-item (X);
read-item (Y);
Y := Y + N;
write-item (Y);

We now discuss the types of problems we may encounter with these two transactions if they run concurrently.

Page 45

The Lost Update Problem

This problem occurs when two transactions that access the same database items have their operations interleaved in a way that makes the values of some database items incorrect.

Suppose that transactions T1 and T2 are submitted at approximately the same time, and suppose that their operations are interleaved as shown in Figure 2.

Then the final value of item X is incorrect, because T2 reads the value of X before T1 changes it in the database, and hence the updated value resulting from T1 is lost.

For example, if X = 80 at the start (originally there were 80 reservations on the flight), N = 5 (T1 transfers 5 seat reservations from the flight corresponding to X to the flight corresponding to Y), and M = 4 (T2 reserves 4 seats on X), the final result should be X = 79.

But in the interleaving of operations shown in Figure 2 it is X = 84, because the update in T1 that removed the five seats from X was lost.
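The arithmetic can be checked by replaying the Figure 2 interleaving step by step. A sequential sketch (no threads needed, since the interleaving is fixed; the dict is a stand-in for the database):

```python
# X = 80 reservations, N = 5 (T1 removes), M = 4 (T2 adds).
db = {"X": 80}
x1 = db["X"]        # T1: read-item(X)  -> 80
x1 = x1 - 5         # T1: X := X - N    -> 75 (in T1's buffer only)
x2 = db["X"]        # T2: read-item(X)  -> still 80: T1 has not written yet
x2 = x2 + 4         # T2: X := X + M    -> 84
db["X"] = x1        # T1: write-item(X) -> 75
db["X"] = x2        # T2: write-item(X) -> 84, overwriting T1's update
assert db["X"] == 84    # the correct serial result would be 79
```

T1's subtraction is "lost" exactly because T2 read X before T1's write and wrote after it.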

Page 46

Figure 2 (time runs downward):

T1                        T2
read-item (X);
X := X - N;
                          read-item (X);
                          X := X + M;
write-item (X);
read-item (Y);
                          write-item (X);
Y := Y + N;
write-item (Y);

Item X has an incorrect value because its update by T1 is "lost" (overwritten).

Page 47

Figure 3 (time runs downward):

T1                        T2
read-item (X);
X := X - N;
write-item (X);
                          read-item (X);
                          X := X + M;
                          write-item (X);
read-item (Y);

Transaction T1 fails and must change the value of X back to its old value; meanwhile T2 has read the incorrect value of X.

Page 48

The Temporary Update (or Dirty Read) Problem

This problem occurs when one transaction updates a database item and then the transaction fails for some reason.

The updated item is accessed by another transaction before it is changed back to its original value. Figure 3 shows an example where T1 updates item X and then fails before completion, so the system must change X back to its original value.

Before it can do so, transaction T2 reads the "temporary" value of X, which will not be recorded permanently in the database because of the failure of T1.

The value of item X that is read by T2 is called dirty data, because it has been created by a transaction that has not yet completed and committed.

Hence, this problem is also known as the dirty read problem.

Page 49

The Incorrect Summary Problem

If one transaction is calculating an aggregate summary function on a number of records while other transactions are updating some of these records, the aggregate function may calculate some values before they are updated and others after they are updated.

For example, suppose that transaction T3 is calculating the total number of reservations on all the flights; meanwhile, transaction T1 is executing.

If the interleaving of operations shown in Figure 4 occurs, the result of T3 will be off by an amount N, because T3 reads the value of X after N seats have been subtracted from it but reads the value of Y before those N seats have been added to it.

Page 50

Figure 4 (time runs downward):

T1                        T3
                          Sum := 0;
                          read-item (A);
                          Sum := Sum + A;
read-item (X);
X := X - N;
write-item (X);
                          read-item (X);
                          Sum := Sum + X;
                          read-item (Y);
                          Sum := Sum + Y;
read-item (Y);
Y := Y + N;
write-item (Y);

T3 reads X after N is subtracted and reads Y before N is added; a wrong summary is the result (off by N).

Page 51

Concurrency Control

A database must provide a mechanism that will ensure that all possible schedules are

either conflict or view serializable, and

are recoverable and preferably cascadeless

A policy in which only one transaction can execute at a time generates serial schedules, but provides a poor degree of concurrency

Are serial schedules recoverable/cascadeless?

Testing a schedule for serializability after it has executed is a little too late!

Goal – to develop concurrency control protocols that will assure serializability.

Page 52

Concurrency Control vs. Serializability Tests

Concurrency-control protocols allow concurrent schedules, but ensure that the schedules are conflict/view serializable, and are recoverable and cascadeless.

Concurrency control protocols generally do not examine the precedence graph as it is being created

Instead, a protocol imposes a discipline that avoids nonserializable schedules.

We study such protocols in Chapter 16.

Different concurrency control protocols provide different tradeoffs between the amount of concurrency they allow and the amount of overhead that they incur.

Tests for serializability help us understand why a concurrency control protocol is correct.

Page 53

Weak Levels of Consistency

Some applications are willing to live with weak levels of consistency, allowing schedules that are not serializable

E.g. a read-only transaction that wants to get an approximate total balance of all accounts

E.g. database statistics computed for query optimization can be approximate (why?)

Such transactions need not be serializable with respect to other transactions

Tradeoff accuracy for performance

Page 54

Levels of Consistency in SQL-92

Serializable — default

Repeatable read — only committed records to be read, repeated reads of same record must return same value. However, a transaction may not be serializable – it may find some records inserted by a transaction but not find others.

Read committed — only committed records can be read, but successive reads of record may return different (but committed) values.

Read uncommitted — even uncommitted records may be read.

Lower degrees of consistency are useful for gathering approximate information about the database.

Page 55

LOCKING

Locking Techniques for Concurrency Control

Some of the main techniques used to control concurrent execution of transactions are based on the concept of locking data items.

A lock is a variable associated with a data item that describes the status of the item with respect to the possible operations that can be applied to it.

Generally, there is one lock for each data item in the database.

Locks are used as a means of synchronizing the access by concurrent transactions to the database items.

Page 56

Lock Terminology

Implicit locks are locks placed by the DBMS

Explicit locks are issued by the application program

Lock granularity refers to the size of a locked resource: row, page, table, or database level

Large granularity is easy to manage but frequently causes conflicts

Types of lock:

An exclusive lock prohibits other users from reading the locked resource

A shared lock allows other users to read the locked resource, but they cannot update it

Page 57

Types of Locks and System Lock Tables

Binary Locks

A binary lock can have two states or values:

locked and unlocked (or 1 and 0 for simplicity).

A distinct lock is associated with each database item X. If the value of the lock on X is 1, item X cannot be accessed by a database operation that requests the item.

If the value of the lock on X is 0, the item can be accessed when requested.

We refer to the current value of the lock associated with item X as Lock (X).

Page 58

Two operations, lock-item and unlock-item, are used with binary locking.

A transaction requests access to an item X by issuing a lock-item (X) operation.

If Lock (X) = 1, the transaction is forced to wait. If Lock (X) = 0, it is set to 1 (the transaction locks the item) and the transaction is allowed to access item X.

When the transaction is through using the item, it issues an unlock-item (X) operation, which sets Lock (X) to 0 (unlocks the item) so that X may be accessed by other transactions.

Hence, a binary lock enforces mutual exclusion on the data item.

Page 59

A description of the lock-item (X) and unlock-item (X) operations is shown below.

lock-item (X):
B: if Lock (X) = 0                      (* item is unlocked *)
       then Lock (X) ← 1                (* lock the item *)
   else begin
       wait (until Lock (X) = 0
             and the lock manager wakes up the transaction);
       go to B
   end;

unlock-item (X):
   Lock (X) ← 0;                        (* unlock the item *)
   if any transactions are waiting
       then wake up one of the waiting transactions;
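A runnable analogue of this pseudocode, assuming Python's `threading.Condition` plays the role of the lock manager's wait/wakeup:

```python
import threading

class BinaryLock:
    """Sketch of the binary lock: Lock(X) is either locked or unlocked."""
    def __init__(self):
        self._locked = False               # Lock(X): False = 0, True = 1
        self._cond = threading.Condition()

    def lock_item(self):
        with self._cond:
            while self._locked:            # Lock(X) = 1: forced to wait
                self._cond.wait()          # until the "lock manager" wakes us
            self._locked = True            # Lock(X) <- 1

    def unlock_item(self):
        with self._cond:
            self._locked = False           # Lock(X) <- 0
            self._cond.notify()            # wake up one waiting transaction

lock_x = BinaryLock()
lock_x.lock_item()
# ... read-item(X) / write-item(X) would go here ...
lock_x.unlock_item()
```

The `while` loop mirrors the `go to B` retry: a woken transaction re-tests the lock before proceeding.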

Page 60

If the simple binary locking scheme described here is used, every transaction must obey the following rules:

1. A transaction T must issue the operation lock-item (X) before any read-item (X) or write-item (X) operations are performed in it.

2. A transaction T must issue the operation unlock-item (X) after all read-item (X) and write-item (X) operations are completed in T.

3. A transaction T will not issue a lock-item (X) operation if it already holds the lock on item X.

4. A transaction T will not issue an unlock-item (X) operation unless it already holds the lock on item X.

Page 61

Shared/Exclusive (or Read/Write) Locks

The binary locking scheme is too restrictive for database items, because at most one transaction can hold a lock on a given item.

We should allow several transactions to access the same item X if they all access X for reading purposes only.

However, if a transaction is to write an item X, it must have exclusive access to X.

For this purpose, a different type of lock called a multiple-mode lock is used.

In this scheme – called shared/exclusive or read/write locks – there are three locking operations:

read-lock (X),

write-lock (X), and

unlock (X).

A lock associated with an item X, Lock (X), now has three possible states: "read-locked", "write-locked" or "unlocked".

A read-locked item is also called shared-locked, because other transactions are allowed to read the item, whereas a write-locked item is called exclusive-locked, because a single transaction exclusively holds the lock on the item.

Page 62

When we use the shared/exclusive locking scheme, the system must enforce the following rules:

1. A transaction T must issue the operation read-lock (X) or write-lock (X) before any read-item (X) operation is performed in T.

2. A transaction T must issue the operation write-lock (X) before any write-item (X) operation is performed in T.

3. A transaction T must issue the operation unlock (X) after all read-item (X) and write-item (X) operations are completed in T.

4. A transaction T will not issue a read-lock (X) operation if it already holds a read (shared) lock or a write (exclusive) lock on item X.

5. A transaction T will not issue a write-lock (X) operation if it already holds a read (shared) lock or a write (exclusive) lock on item X.

6. A transaction T will not issue an unlock (X) operation unless it already holds a read (shared) lock or a write (exclusive) lock on item X.

Page 63: TRANSACTION PROCESSING - RBVRR Womens College

Read-lock(X):
B:  if lock(X) = "unlocked"
        then begin lock(X) ← "read-locked";
                   no_of_reads(X) ← 1
             end
    else if lock(X) = "read-locked"
        then no_of_reads(X) ← no_of_reads(X) + 1
    else begin
             wait (until lock(X) = "unlocked" and
                   the lock manager wakes up the transaction);
             go to B
         end;

Write-lock(X):
B:  if lock(X) = "unlocked"
        then lock(X) ← "write-locked"
    else begin
             wait (until lock(X) = "unlocked" and
                   the lock manager wakes up the transaction);
             go to B
         end;

Unlock(X):
    if lock(X) = "write-locked"
        then begin
                 lock(X) ← "unlocked";
                 wakeup one of the waiting transactions, if any
             end
    else if lock(X) = "read-locked"
        then begin
                 no_of_reads(X) ← no_of_reads(X) - 1;
                 if no_of_reads(X) = 0
                     then begin
                              lock(X) ← "unlocked";
                              wakeup one of the waiting transactions, if any
                          end
             end;
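The three operations above can be sketched in Python. This is a minimal, single-threaded illustration: the class name `LockManager` and the grant/deny return convention are ours, standing in for the wait/wakeup machinery of a real lock manager.

```python
class LockManager:
    """Minimal sketch of the shared/exclusive locking scheme above.
    Instead of blocking, request methods return True if the lock is
    granted and False if the transaction would have to wait."""

    def __init__(self):
        self.state = {}      # item -> "read-locked" | "write-locked"
        self.readers = {}    # item -> number of read locks currently held

    def read_lock(self, x):
        s = self.state.get(x, "unlocked")
        if s == "unlocked":
            self.state[x] = "read-locked"
            self.readers[x] = 1
            return True
        if s == "read-locked":          # sharable: count one more reader
            self.readers[x] += 1
            return True
        return False                    # write-locked: must wait

    def write_lock(self, x):
        if self.state.get(x, "unlocked") == "unlocked":
            self.state[x] = "write-locked"
            return True
        return False                    # held in any mode: must wait

    def unlock(self, x):
        s = self.state.get(x, "unlocked")
        if s == "write-locked":
            self.state[x] = "unlocked"
        elif s == "read-locked":
            self.readers[x] -= 1
            if self.readers[x] == 0:    # last reader releases the lock
                self.state[x] = "unlocked"

lm = LockManager()
assert lm.read_lock("X") and lm.read_lock("X")   # two readers share X
assert not lm.write_lock("X")                    # a writer must wait
lm.unlock("X"); lm.unlock("X")
assert lm.write_lock("X")                        # now exclusive access is granted
```

Note how `no_of_reads(X)` from the pseudocode becomes the `readers` counter: the item is released only when the last reader unlocks it.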


Concurrent Processing with Explicit Locks


Two-Phase Locking

A transaction is said to follow the two-phase locking protocol if all locking operations (read-lock, write-lock) precede the first unlock operation in the transaction.

Such a transaction can be divided into two phases:

an expanding or growing (first) phase, during which new locks on items can be acquired but none can be released;

and a shrinking (second) phase, during which existing locks can be released but no new locks can be acquired.


Two-Phase Locking (Cont…)

Transactions are allowed to obtain locks as necessary (growing phase)

Once the first lock is released (shrinking phase), no other lock can be obtained

A special case of two-phase locking:

Locks are obtained throughout the transaction

No lock is released until the COMMIT or ROLLBACK command is issued

This strategy is more restrictive but easier to implement than basic two-phase locking


The two transactions T1 and T2 below do not follow the two-phase locking protocol.

T1                    T2
Read-lock(Y);         Read-lock(X);
Read-item(Y);         Read-item(X);
Unlock(Y);            Unlock(X);
Write-lock(X);        Write-lock(Y);
Read-item(X);         Read-item(Y);
X := X + Y;           Y := X + Y;
Write-item(X);        Write-item(Y);
Unlock(X);            Unlock(Y);

This is because the write-lock(X) operation follows the unlock(Y) operation in T1, and similarly the write-lock(Y) operation follows the unlock(X) operation in T2.


If we enforce two-phase locking, the transactions can be rewritten as T1′ and T2′, as shown below:

T1′                   T2′
Read-lock(Y);         Read-lock(X);
Read-item(Y);         Read-item(X);
Write-lock(X);        Write-lock(Y);
Unlock(Y);            Unlock(X);
Read-item(X);         Read-item(Y);
X := X + Y;           Y := X + Y;
Write-item(X);        Write-item(Y);
Unlock(X);            Unlock(Y);

It can be proved that if every transaction in a schedule follows the two-phase locking protocol, the schedule is guaranteed to be serializable, so there is no longer any need to test schedules for serializability. By enforcing two-phase locking, the locking mechanism also enforces serializability.


• Two-phase locking may limit the amount of concurrency that can occur in a schedule.

• This is because a transaction T may not be able to release an item X as soon as it is through using it, if T must lock an additional item Y later on; conversely, T must lock the additional item Y before it needs it, so that it can release X.

• Hence, X must remain locked by T until all items that the transaction needs to read or write have been locked; only then can X be released by T.

• Meanwhile, another transaction seeking to access X may be forced to wait, even though T is done with X; conversely, if Y is locked earlier than it is needed, another transaction seeking to access Y is forced to wait even though T is not using Y yet.

• This is the price for guaranteeing serializability of all schedules without having to check the schedules themselves.


Basic, Conservative, Strict, and Rigorous Two-Phase Locking:

The technique described above is known as basic 2PL. The variation known as conservative 2PL (or static 2PL) requires a transaction to lock all the items it accesses before the transaction begins execution, by predeclaring its read-set and write-set.

If any of the predeclared items cannot be locked, the transaction does not lock any item; instead, it waits until all the items are available for locking.

Conservative 2PL is a deadlock-free protocol. However, it is difficult to use in practice because of the need to predeclare the read-set and write-set, which is not possible in most situations.

In practice, the most popular variation of 2PL is strict 2PL, which guarantees strict schedules. In this variation, a transaction T does not release any of its exclusive (write) locks until after it commits or aborts.

Rigorous 2PL is more restrictive still: a transaction does not release any of its locks, shared or exclusive, until after it commits or aborts.


Deadlock

Deadlock, or the deadly embrace, occurs when two transactions are each waiting on a resource that the other transaction holds.

Preventing deadlock:

Allow users to issue all lock requests at one time

Require all application programs to lock resources in the same order

Breaking deadlock:

Almost every DBMS has algorithms for detecting deadlock

When deadlock occurs, the DBMS aborts one of the transactions and rolls back its partially completed work


Deadlock Detection

Deadlocks are caused by cyclic lock waits (e.g., in conjunction with lock conversions).

Example: t1 issues r1(x) and then w1(y), while t2 issues w2(y) and then w2(x). Then t1's w1(y) waits for t2's lock on y, and t2's w2(x) waits for t1's lock on x: a cyclic wait.

Deadlock detection:

(i) Maintain a dynamic waits-for graph (WFG) with active transactions as nodes and an edge from ti to tj if tj waits for a lock held by ti.

(ii) Test the WFG for cycles

• continuously (i.e., upon each lock wait), or

• periodically.
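The cycle test in step (ii) is a standard graph problem. A minimal sketch using iterative depth-first search (the dict-of-sets encoding of the WFG is our own; cycle existence does not depend on which edge direction convention is used):

```python
def has_cycle(wfg):
    """wfg: dict mapping a transaction to the set of transactions it has
    edges to. Iterative DFS with three colors; finding an edge back to a
    node still on the DFS stack (GRAY) means the WFG has a cycle."""
    WHITE, GRAY, BLACK = 0, 1, 2
    color = {t: WHITE for t in wfg}
    for start in wfg:
        if color[start] != WHITE:
            continue
        color[start] = GRAY
        stack = [(start, iter(wfg.get(start, ())))]
        while stack:
            node, edges = stack[-1]
            nxt = next(edges, None)
            if nxt is None:                       # node fully explored
                color[node] = BLACK
                stack.pop()
            elif color.get(nxt, WHITE) == GRAY:   # back edge: cycle (deadlock)
                return True
            elif color.get(nxt, WHITE) == WHITE:
                color[nxt] = GRAY
                stack.append((nxt, iter(wfg.get(nxt, ()))))
    return False

# t1 and t2 wait on each other (the r1(x)/w1(y) vs w2(y)/w2(x) example)
assert has_cycle({"t1": {"t2"}, "t2": {"t1"}})
assert not has_cycle({"t1": {"t2"}, "t2": set()})
```

A DBMS running this check upon each lock wait pays O(nodes + edges) per test, which is one reason periodic testing is also offered.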


Deadlock Resolution

Choose a transaction on a WFG cycle as a deadlock victim, abort this transaction, and repeat until no more cycles remain.

Possible victim selection strategies:
1. Last blocked
2. Random
3. Youngest
4. Minimum locks
5. Minimum work
6. Most cycles
7. Most edges


Illustration of Victim Selection Strategies

In the first example WFG (over transactions t1 through t10; the graph itself is not reproduced here), the most-cycles strategy would select t1 (or t3) to break all 5 cycles.

In the second example WFG (over transactions t1 through t6), the most-edges strategy would select t1 to remove 4 edges.


Deadlock Prevention

Restrict lock waits to ensure an acyclic WFG at all times.

Reasonable deadlock prevention strategies:

1. Wait-die: upon ti blocked by tj: if ti started before tj then wait, else abort ti

2. Wound-wait: upon ti blocked by tj: if ti started before tj then abort tj, else wait

3. Immediate restart: upon ti blocked by tj: abort ti

4. Running priority: upon ti blocked by tj: if tj is itself blocked then abort tj, else wait

5. Timeout: abort the waiting transaction when a timer expires

An abort entails a later restart.
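Strategies 1 and 2 decide between waiting and aborting purely from the two transactions' start timestamps. A sketch (function names and the string return values are ours):

```python
def wait_die(ti_start, tj_start):
    """Ti (started at ti_start) requests a lock held by Tj.
    Wait-die: the older transaction may wait; a younger requester dies."""
    return "wait" if ti_start < tj_start else "abort ti"

def wound_wait(ti_start, tj_start):
    """Wound-wait: an older requester wounds (aborts) the younger holder;
    a younger requester waits."""
    return "abort tj" if ti_start < tj_start else "wait"

# Ti started at time 1 is older than Tj started at time 2
assert wait_die(1, 2) == "wait"        # older requester waits
assert wait_die(2, 1) == "abort ti"    # younger requester dies
assert wound_wait(1, 2) == "abort tj"  # older requester wounds the holder
assert wound_wait(2, 1) == "wait"      # younger requester waits
```

In both schemes only the younger of the two transactions is ever aborted, so no transaction can be blocked by waits forming a cycle, which keeps the WFG acyclic.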

Optimistic versus Pessimistic Locking

Optimistic locking assumes that no transaction conflict will occur:

DBMS processes a transaction; checks whether conflict occurred:

If not, the transaction is finished

If so, the transaction is repeated until there is no conflict

Pessimistic locking assumes that conflict will occur:

Locks are issued before a transaction is processed, and the locks are released when it finishes

Optimistic locking is preferred for the Internet and for many intranet applications


Declaring Lock Characteristics

Most application programs do not explicitly declare locks, because doing so is complicated

Instead, they mark transaction boundaries and declare locking behavior they want the DBMS to use

Transaction boundary markers: BEGIN, COMMIT, and ROLLBACK TRANSACTION

Advantage

If the locking behavior needs to be changed, only the lock declaration need be changed, not the application program


Transaction Definition in SQL

Data manipulation language must include a construct for specifying the set of actions that comprise a transaction.

In SQL, a transaction begins implicitly.

A transaction in SQL ends by:

Commit work commits the current transaction and begins a new one.

Rollback work causes the current transaction to abort.

Levels of consistency specified by SQL-92:

Serializable — default

Repeatable read

Read committed

Read uncommitted


Schedule 7


Precedence Graph for (a) Schedule 1 and (b) Schedule 2


Illustration of Topological Sorting


Precedence Graph


Implementation of Isolation

Schedules must be conflict or view serializable, and recoverable, for the sake of database consistency, and preferably cascadeless.

A policy in which only one transaction can execute at a time generates serial schedules, but provides a poor degree of concurrency.

Concurrency-control schemes trade off the amount of concurrency they allow against the amount of overhead that they incur.

Some schemes allow only conflict-serializable schedules to be generated, while others allow view-serializable schedules that are not conflict-serializable.


Figure 15.6


LOG-BASED RECOVERY

There are two approaches to ensure atomicity of a transaction: log-based recovery and shadow paging.

Log-based recovery: The system maintains a database log. Each log record may contain the following possible values:

1. transaction name: each transaction must have a unique identifier

2. data item name: unique name of data item that is written

3. old value: the value of data item prior to the write operation

4. new value: the value that data item will have after the write operation

The various log records are:

<Ti, start>: Transaction Ti has started

<Ti, X, V1, V2>: Transaction Ti has written data item X. The original value of X was V1, its new value is V2.

<Ti, commit>: Transaction Ti has committed


LOG-BASED RECOVERY (Cont…)

Before a transaction modifies the actual data page, it must generate a log record for its proposed change and write this record to the log file.

T1                 Log records
                   <T1, start>
read(A)
A := A - 50
write(A)           <T1, A, 1000, 950>
read(B)
B := B + 50
write(B)           <T1, B, 10, 60>
                   <T1, commit>


LOG-BASED RECOVERY (Cont…)

There are two alternative approaches to logging: deferred database modification and immediate database modification.

Logging with deferred database modification

1. Record all database modifications performed by a transaction Ti in a log file; no physical updates are performed on the data items.

2. When Ti commits:

a) generate a commit log record for Ti to the log file

b) write the log file to the disk drive

c) execute the updates proposed in the log file using redo(Ti)

redo(Ti) sets the value of all data item updates by transaction Ti to their new value.

undo(Ti) sets the value of all data item updates by transaction Ti to their old value.

In the presence of failures, the system invokes redo(Ti) for each transaction that has a commit log record in the log file.

Both redo and undo operations are idempotent: repeating them several times has no side effect.
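Under deferred modification, recovery reduces to a single redo pass over the log. A minimal sketch (the tuple encoding of the <Ti, ...> log records is ours):

```python
def recover_deferred(log, db):
    """Deferred-modification recovery sketch: redo every transaction
    whose commit record appears in the log; ignore all others.
    Log records: ('start', T), ('write', T, item, old, new), ('commit', T).
    db is a dict standing in for the database on disk."""
    committed = {rec[1] for rec in log if rec[0] == "commit"}
    for rec in log:                      # forward pass over the log
        if rec[0] == "write" and rec[1] in committed:
            _, _, item, _old, new = rec
            db[item] = new               # redo: install the new value
    return db

log = [("start", "T1"),
       ("write", "T1", "A", 1000, 950),
       ("write", "T1", "B", 10, 60),
       ("commit", "T1"),
       ("start", "T2"),                  # T2 never committed: ignored
       ("write", "T2", "A", 950, 900)]
db = {"A": 1000, "B": 10}                # disk still holds the old values
assert recover_deferred(log, db) == {"A": 950, "B": 60}
```

Because no physical updates happen before commit, there is nothing to undo, and because redo only reinstalls the logged new values, running it again after a crash during recovery is harmless (idempotence).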

DEFERRED LOG-BASED RECOVERY (Cont…)

Trace of T1 under deferred modification (the original slide shows this as a diagram). The database on disk keeps its original values, A = 1000 and B = 10, throughout; only the log and T1's local workspace change. The new values are installed by redo(T1) after the commit record is written.

(1) read(A): log contains <T1, start>; disk: A = 1000, B = 10

(2) A := A - 50 (in T1's local workspace, A becomes 950)

(3) write(A): log gains <T1, A, 1000, 950>; disk unchanged

(4) read(B)

(5) B := B + 50 (locally, B becomes 60)

(6) write(B): log gains <T1, B, 10, 60>; disk unchanged

(7) commit: log gains <T1, commit>; redo(T1) then installs A = 950 and B = 60 on disk


Deferred Database Modification (Cont.)

During recovery after a crash, a transaction needs to be redone if and only if both <Ti start> and <Ti commit> are in the log.

Redoing a transaction Ti (redo(Ti)) sets the value of all data items updated by the transaction to their new values.

Crashes can occur while

the transaction is executing the original updates, or

while recovery action is being taken

Example transactions T0 and T1 (T0 executes before T1). Initial values: A = 1000, B = 2000, C = 700.

T0: read(A)          T1: read(C)
    A := A - 50          C := C - 100
    write(A)             write(C)
    read(B)
    B := B + 50
    write(B)


Deferred Database Modification (Cont.)

Below we show the log as it appears at three instances of time.

If log on stable storage at time of crash is as in case:(a) No redo actions need to be taken(b) redo(T0) must be performed since <T0 commit> is present (c) redo(T0) must be performed followed by redo(T1) since

<T0 commit> and <Ti commit> are present


LOG-BASED RECOVERY (Cont…)

Logging with immediate database modification

1. All the write operations on the data items are performed immediately

2. A transaction generates log records as it updates records

3. The buffer pool maintains the dependency between the log records and the buffer pool frames containing the updated data item

(a) Before a buffer pool frame is written to the disk, its corresponding log record must be written to the disk.

(b) A transaction does not commit until its commit log record is written to the disk.

In the presence of failures, with immediate modification, the system must analyze the log records and perform redo/undo operations.

With deferred database modification, the system eliminates Undo operations during the recovery period. However, when processing a transaction, the system must still perform Redo operations.


Immediate Database Modification Example

Log                      Write              Output
<T0 start>
<T0, A, 1000, 950>
<T0, B, 2000, 2050>
                         A = 950
                         B = 2050
<T0 commit>
<T1 start>
<T1, C, 700, 600>
                         C = 600
                                            B_B, B_C
<T1 commit>
                                            B_A

Note: B_X denotes the block containing X.


Immediate DB Modification Recovery Example

Below we show the log as it appears at three instants of time.

Recovery actions in each case are:

(a) undo(T0): B is restored to 2000 and A to 1000.

(b) undo(T1) and redo(T0): C is restored to 700, and then A and B are set to 950 and 2050 respectively.

(c) redo(T0) and redo(T1): A and B are set to 950 and 2050 respectively; then C is set to 600.
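The undo-then-redo rule for immediate modification can be sketched as follows (the tuple encoding of log records is ours; undo runs backwards through the log, redo forwards):

```python
def recover_immediate(log, db):
    """Immediate-modification recovery sketch: undo, in reverse log order,
    every write by a transaction with no commit record; then redo, in
    forward order, every write by a committed transaction.
    Log records: ('start', T), ('write', T, item, old, new), ('commit', T)."""
    committed = {r[1] for r in log if r[0] == "commit"}
    for rec in reversed(log):                    # undo pass, backwards
        if rec[0] == "write" and rec[1] not in committed:
            db[rec[2]] = rec[3]                  # restore the old value
    for rec in log:                              # redo pass, forwards
        if rec[0] == "write" and rec[1] in committed:
            db[rec[2]] = rec[4]                  # reinstall the new value
    return db

# Case (b) above: T0 committed, T1 did not
log = [("start", "T0"), ("write", "T0", "A", 1000, 950),
       ("write", "T0", "B", 2000, 2050), ("commit", "T0"),
       ("start", "T1"), ("write", "T1", "C", 700, 600)]
db = {"A": 950, "B": 2050, "C": 600}             # disk state at the crash
assert recover_immediate(log, db) == {"A": 950, "B": 2050, "C": 700}
```

Running undo backwards matters when a transaction wrote the same item more than once: the earliest old value must win.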


CHECKPOINTING

Motivation: in the presence of failures, the system consults the log file to determine which transactions should be redone and which should be undone. There are two major difficulties:

1) the search process is time consuming

2) most transactions are okay as their updates have made it to the database (the system performs wasteful work by searching through and redoing these transactions).

Approach: perform a checkpoint that requires the following operations:

1. output all log records from main memory to the disk

2. output all modified (dirty) pages in the buffer pool to the disk

3. output a log record <checkpoint> onto the log file on disk


Example of Checkpoints

Consider transactions T1 through T4, a checkpoint taken at time Tc, and a system failure at time Tf. T1 completes before the checkpoint; T2 and T3 commit between the checkpoint and the failure; T4 is still active at the failure.

At recovery time:

T1 can be ignored (its updates were already output to disk due to the checkpoint).

T2 and T3 are redone.

T4 is undone.

In general: for each transaction Tk running during or started after the checkpoint, if a <Tk, commit> log record appears in the log file, then execute redo(Tk); if no <Tk, commit> log record appears, then execute undo(Tk).
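The checkpoint recovery rules above can be sketched as follows (the record encoding is ours; we assume a single checkpoint record and, per the simple scheme above, that the checkpoint flushed all earlier updates to disk):

```python
def recover_with_checkpoint(log, db):
    """Checkpoint recovery sketch. Record formats:
    ('start', T), ('write', T, item, old, new), ('commit', T), ('checkpoint',).
    Writes flushed at the checkpoint need no redo; only committed writes
    after the last checkpoint are redone, and uncommitted writes are undone."""
    cp = max(i for i, r in enumerate(log) if r[0] == "checkpoint")
    committed = {r[1] for r in log if r[0] == "commit"}
    for rec in reversed(log):            # undo every uncommitted write
        if rec[0] == "write" and rec[1] not in committed:
            db[rec[2]] = rec[3]
    for rec in log[cp:]:                 # redo committed writes after the checkpoint
        if rec[0] == "write" and rec[1] in committed:
            db[rec[2]] = rec[4]
    return db

# Mirrors the example: T1 finished before the checkpoint (ignored),
# T2 committed after it (redone), T4 still active at the failure (undone).
log = [("start", "T1"), ("write", "T1", "A", 0, 1), ("commit", "T1"),
       ("checkpoint",),
       ("start", "T2"), ("write", "T2", "B", 0, 2), ("commit", "T2"),
       ("start", "T4"), ("write", "T4", "C", 0, 4)]
db = {"A": 1, "B": 0, "C": 4}            # disk state at the crash
assert recover_with_checkpoint(log, db) == {"A": 1, "B": 2, "C": 0}
```

The key saving over plain log-based recovery is the redo pass: it starts at the checkpoint record instead of scanning the whole log.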


SHADOW PAGING

Shadow paging takes advantage of the structure of the file system. It generates no log records and is different from the log-based recovery techniques. Assume the file system is organized as a set of pages, with a page table that keeps track of the pages in the file system.

The system maintains two page tables during the life of a transaction: the current page table and the shadow page table. When transaction Ti starts, both page tables are identical. The shadow page table never changes. The current page table may change when a transaction performs a write operation. All input/output operations use the current page table to locate the database pages on disk.

(The original figure, showing a current page table entry redirected to a new page tPA while the shadow page table still points to the old page PA, is not reproduced here.)


An example of shadow paging.


SHADOW PAGING (Cont…)

When transaction Ti issues a write(A) command, the write operation is executed as follows (assuming data item A resides on page PA):

1. If page PA is not already in main memory, then issue input(PA).

2. If this is the first write operation on page PA by transaction Ti, then:

(a) allocate a new disk page (call it tPA)

(b) copy PA into tPA

(c) modify the current page table so that the entry corresponding to PA now points to tPA

3. Perform the update on page tPA, which the current page table entry now points to.
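Steps 1 through 3 amount to copy-on-write against the current page table. A minimal in-memory sketch (the class layout is ours; `disk` is a dict standing in for disk pages, and logical page numbers stand in for data items):

```python
class ShadowPagedFile:
    """Copy-on-write sketch of the shadow-paging write rule above.
    Pages live in `disk` (page id -> contents); both page tables map a
    logical page number to a page id."""

    def __init__(self, pages):
        self.disk = dict(enumerate(pages))        # page id -> contents
        self.shadow = {i: i for i in self.disk}   # shadow table: never changes
        self.current = dict(self.shadow)          # current table: may change
        self._next = len(self.disk)               # next free page id

    def write(self, logical, value):
        if self.current[logical] == self.shadow[logical]:
            tpa = self._next                      # (a) allocate a new page tPA
            self._next += 1
            self.disk[tpa] = self.disk[self.current[logical]]  # (b) copy PA into tPA
            self.current[logical] = tpa           # (c) repoint the current table
        self.disk[self.current[logical]] = value  # 3. update via the current table

    def read(self, logical):
        return self.disk[self.current[logical]]

f = ShadowPagedFile(["a", "b"])
f.write(0, "A*")
assert f.read(0) == "A*"                          # current table sees the new value
assert f.disk[f.shadow[0]] == "a"                 # shadow page is untouched
```

Because the shadow table still points at the unmodified pages, aborting the transaction is just discarding the current table; committing is swapping which table is treated as the shadow, as described on the next slide.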


SHADOW PAGING (Cont…)

When transaction Ti commits:

1. Ensure all buffer pages in memory that have been modified are flushed to the disk.

2. output the current page table to disk. Do not overwrite the shadow page table because it might be needed for recovery from a crash.

3. Output the disk address of the current page table to the fixed location in disk containing the address of the shadow page table. Thus, the current page table has become the shadow page table, and the transaction is committed.

Other details:

1. Free the pages of the old shadow page table that are no longer needed. This requires reading the old shadow page table and freeing its pages. What happens when there is a crash?


SHADOW PAGING (Cont…)

Shadow paging suffers from following limitations:

1. data fragmentation: the pages of a file are dispersed across the surface of a disk

2. garbage collection: pages that are no longer accessible should be collected, as they are now garbage

3. What happens to auxiliary index structures?

4. Can multiple transactions update a file simultaneously? If so, what happens if two transactions try to update two different records that reside on the same disk page?


End of Unit-III