HANDLING FAILURES
Jan 02, 2016
Warning
This is a first draft; I welcome your corrections.
One common objective
Maintaining the database in a consistent state
Here this means maintaining the integrity of the data: after a money transfer between two accounts, the amount debited from the first account should equal the amount credited to the second account
Assuming a no-fee transfer
Two different problems
Handling the outcomes of system failures: server crashes, power failures, …
Preventing inconsistencies resulting from concurrent queries/updates that interfere with each other (next chapter)
Failure modes
Erroneous data entry
Will impose constraints: required ranges of values, 10-digit phone numbers, ...
Will add triggers: programs that execute when some condition occurs
Even more controls
Failure modes
Media failures: disk failures
Complete failures
Irrecoverable read errors
Recovery:
Use disk array redundancy (RAID)
Maintain an archive of the DB
Replicate the DB
Failure modes
Catastrophic failure
Everything is lost
Recovery:
Archive (if stored at another place)
Distributed replication
...
Failure modes
System failures
Power failures
Software errors
Could stop the system in the middle of a transaction
Need a recovery mechanism
Transactions (I)
Any process that queries and modifies the database
Typically consist of multiple steps
Several of these steps may modify the DB
Main problem is partial execution of a transaction:
Money was taken from account A but not credited to account B
Transactions (II)
Running the transaction again will rarely solve the problem
It would take the money from account A a second time
We need a mechanism allowing us to undo the effects of partially executed transactions
Roll back to a safe previous state
General organization
Uses a log
The transaction manager interacts with the query processor, the log manager, and the buffer manager
The recovery manager interacts with the buffer manager
Involved entities
The "elements" of the database: tables? Tuples?
The best choice is disk blocks/pages.
Correctness principle
If a transaction
executes in the absence of any other transactions or system errors, and
starts with the DB in a consistent state,
it will then leave the DB in a consistent state.
We do not question the wisdom of authorized transactions
The converse
Transactions are atomic: either executed as a whole or not at all
Partial executions are likely to leave the DB in an inconsistent state
Transactions that execute simultaneously are likely to leave the DB in an inconsistent state
Unless we take some precautions
Primitive operations (I)
INPUT(X)
Read the block containing database element X and store it in a memory buffer
READ(X,t)
Copy the value of element X to local variable t
May require an implicit INPUT(X)
Primitive operations (II)
WRITE(X,t)
Copy the value of local variable t to element X
May require an implicit INPUT(X)
OUTPUT(X)
Flush to disk the block containing X
Example
Transaction T doubles the values of elements A and B:
A = A*2; B = B*2
Integrity constraint: A = B
Start with A = B = 8
Steps
READ(A,t); t = t*2; WRITE(A,t); OUTPUT(A);
READ(B,t); t = t*2; WRITE(B,t); OUTPUT(B);
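Assuming the disk and the buffer pool are modeled as Python dictionaries, the primitive operations and the steps of T can be simulated as follows (a toy sketch; the names disk, memory, and the helper functions are illustrative, not from the text):

```python
# Toy model of the primitive operations. DB elements live in
# `disk`; buffered copies live in `memory` (names are mine).
disk = {"A": 8, "B": 8}   # start with A = B = 8
memory = {}               # buffer pool

def INPUT(x):             # bring the block holding x into memory
    memory[x] = disk[x]

def READ(x):              # copy element x into a local variable
    if x not in memory:
        INPUT(x)          # implicit INPUT if needed
    return memory[x]

def WRITE(x, t):          # copy local variable t back to the buffer
    if x not in memory:
        INPUT(x)
    memory[x] = t

def OUTPUT(x):            # flush the buffer holding x to disk
    disk[x] = memory[x]

# Transaction T: double A and B
t = READ("A"); t = t * 2; WRITE("A", t); OUTPUT("A")
t = READ("B"); t = t * 2; WRITE("B", t); OUTPUT("B")
# disk now holds A = 16 and B = 16, preserving A = B
```

A crash between the two OUTPUT calls would leave disk with A = 16 but B = 8, which is exactly the partial execution the logging schemes below must repair.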
Undo logging
The idea is to undo transactions that did not complete
Will keep on a log the previous values of all data blocks that are modified by the transaction
Will also note on the log whether the transaction completed successfully (COMMIT) or failed (ABORT)
The log
Log records include
<Start T>
<Commit T>
Notes that T completed successfully
<Abort T>
The transaction failed; we need to undo all possible changes it made to the DB
The undo log
Also includes
<T, X, v>
Transaction T changed DB element X and its former value is v
An undo log will contain several interleaved transactions
<Start T1>
<T1, A, 50>
<Start T2>
<T1, B, 30>
<T2, C, "i">
<T1, D, 30>
<Start T3>
<T3, E, "x">
<Commit T1>
<T2, F, 0>
<T3, G, "z">
<Commit T2>
…
Undo logging rules
If a transaction T modifies DB element X, the log record <T, X, v> must be written to disk before the new value of X is written to disk
If T commits, its <COMMIT> record cannot be written to disk until after all database elements changed by T have been written to disk
And not much later than that!
Example
<START T>
READ(A,t); t = t*2; WRITE(A,t) preceded by <T, A, 8> (the original value)
READ(B,t); t = t*2; WRITE(B,t) preceded by <T, B, 8> (the original value)
FLUSH LOG;
OUTPUT(A); OUTPUT(B) followed by <COMMIT>
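Continuing the toy model, the two undo-logging rules for this example can be sketched as follows (the log is a Python list and FLUSH LOG copies buffered records to a disk-resident list; all names are illustrative):

```python
# Undo-logging sketch: old values are logged and flushed
# BEFORE the data pages go to disk; COMMIT is logged AFTER.
disk = {"A": 8, "B": 8}
memory = {}
log_buffer, log_on_disk = [], []

def flush_log():
    log_on_disk.extend(log_buffer)   # FLUSH LOG
    log_buffer.clear()

log_buffer.append(("START", "T"))

# WRITE(A, t): record the OLD value before changing A
log_buffer.append(("T", "A", disk["A"]))   # <T, A, 8>
memory["A"] = disk["A"] * 2
log_buffer.append(("T", "B", disk["B"]))   # <T, B, 8>
memory["B"] = disk["B"] * 2

flush_log()                  # rule 1: log records reach disk first
disk["A"] = memory["A"]      # OUTPUT(A)
disk["B"] = memory["B"]      # OUTPUT(B)

log_buffer.append(("COMMIT", "T"))   # rule 2: only after both OUTPUTs
flush_log()
```

If the system crashes before the final flush, the disk-resident log still contains the old values 8 for A and B, so recovery can restore them.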
Another example (I)
Transferring cash from account A to account B
Start with A = $1200, B = $100
Want to transfer $500
Another example
<START T>
READ(A,t); t = t - $500; WRITE(A,t) preceded by <T, A, 1200> (the original value)
READ(B,t); t = t + $500; WRITE(B,t) preceded by <T, B, 100> (the original value)
FLUSH LOG;
OUTPUT(A); OUTPUT(B) followed by <COMMIT>
Important
You cannot commit the transaction until all physical writes to disk have successfully completed
Recovery using undo logging
Look at the transaction records on the log
Do they end with a <COMMIT>?
If the transaction is committed
Do nothing
else
Restore the initial state of the DB
Why?
Since the transaction's <COMMIT> marks the completion of all its physical writes to disk
We can safely ignore all committed transactions because they have safely completed
We must undo all other transactions because they could have left the DB in an inconsistent state
Recovering from an undo log
Transactions T1 and T2 have completed
Nothing to do
Transaction T3 never completed
Actions to undo:
Reset entity E to its previous value "x"
Reset entity G to its previous value "z"
<Start T1>
<T1, A, 50>
<Start T2>
<T1, B, 30>
<T2, C, "i">
<T1, D, 30>
<Start T3>
<T3, E, "x">
<Commit T1>
<T2, F, 0>
<T3, G, "z">
<Commit T2>
Checkpointing (I)
Quiescent checkpoints
Wait until all current transactions have committed, then write <CHECKPOINT>
Very simple, but slows down the DB while the checkpoint waits for all transactions to complete
A quiescent checkpoint
Can safely ignore the part of the log before the checkpoint
Must look for uncommitted transactions
Checkpointing (II)
Non-quiescent checkpoints
Two steps:
Start the checkpoint, noting all transactions that did not yet complete: <START CHECKPOINT(T1, T2, ...)>
Wait until all these transactions have committed, then write <END CHECKPOINT(T1, T2, ...)>
Does not slow down the DB
A non-quiescent checkpoint
[Diagram: a log timeline with <START CHECKPOINT> and <END CHECKPOINT> markers. The part of the log before <START CHECKPOINT> can safely be ignored; uncommitted transactions must be looked for between the two markers and after <END CHECKPOINT>.]
Another non-quiescent checkpoint
[Diagram: a log that ends before the <END CHECKPOINT> record is written. The part of the log before <START CHECKPOINT(T1, T2, …, Tn)> cannot be ignored, but the search there can be restricted to transactions T1, T2, …, Tn; uncommitted transactions must still be looked for after the start of the checkpoint.]
Purging the log
Can remove all log entries pertaining to transactions that started before
A quiescent checkpoint
The start of a non-quiescent checkpoint, after that checkpoint has ended
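The non-quiescent purging rule can be sketched as follows (record names such as "START_CKPT" and "END_CKPT" are illustrative stand-ins for <START CHECKPOINT(...)> and <END CHECKPOINT(...)>):

```python
def purge_log(log):
    """Keep only the entries at or after the <START CHECKPOINT>
    matching the most recent <END CHECKPOINT> (layout is mine)."""
    last_end = None
    for i, rec in enumerate(log):
        if rec[0] == "END_CKPT":
            last_end = i
    if last_end is None:
        return log            # no completed checkpoint: keep everything
    for i in range(last_end, -1, -1):
        if log[i][0] == "START_CKPT":
            return log[i:]    # earlier entries can be discarded
    return log

log = [
    ("START", "T1"), ("UPDATE", "T1", "A", 5), ("COMMIT", "T1"),
    ("START_CKPT", ("T2",)), ("COMMIT", "T2"), ("END_CKPT",),
    ("START", "T3"),
]
purged = purge_log(log)   # drops everything before <START CHECKPOINT(T2)>
```

T1's records can be dropped because T1 committed before the checkpoint started, while T3, which began after the checkpoint, keeps its records.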
Redo logging
The idea is to redo transactions that did complete and not let other transactions modify the DB in any way
Will keep on a log the new values of all data blocks that the transaction plans to modify
Will also note on the log whether the transaction completed successfully (COMMIT) or failed (ABORT)
The redo log
Log records include
<Start T>
<Commit T>
<Abort T>
<T, X, w>
Transaction T changed DB element X and its new value is w.
A redo log will contain several interleaved transactions
<Start T1>
<T1, A, 80>
<Start T2>
<T1, B, 20>
<T2, C, "i">
<T1, D, 40>
<Start T3>
<T3, E, "x">
<Commit T1>
<T2, F, 0>
<T3, G, "z">
<Commit T2>
…
Redo logging rules
If a transaction T modifies DB element X, the log record <T, X, w> must be written to disk before the transaction commits
If T commits, its <COMMIT> record must be written to disk before any database element changed by T can be written to disk
Example
<START T>
READ(A,t); t = t - $500; WRITE(A,t) preceded by <T, A, 700> (the new value)
READ(B,t); t = t + $500; WRITE(B,t) preceded by <T, B, 600> (the new value)
FLUSH LOG;
<COMMIT> must be written to the log before any OUTPUT
OUTPUT(A); OUTPUT(B);
Recovery using redo logging
Look at the transaction records on the log
Do they end with a <COMMIT>?
If the transaction is committed
Replay the transaction from the log
else
Do nothing
Just the opposite of what undo logging does!
Why?
Since the transaction's <COMMIT> now precedes any physical writes to disk
We must replay all committed transactions because we do not know if the physical writes were actually completed before the crash.
We can ignore non-committed transactions because they did not modify the data on disk
Recovering from a redo log
Transactions T1 and T2 have completed
Must replay them
Transaction T3 never completed
It did not modify the DB
We can ignore it
<Start T1>
<T1, A, 60>
<Start T2>
<T1, B, 50>
<T2, C, "i">
<T1, D, 5>
<Start T3>
<T3, E, "z">
<Commit T1>
<T2, F, 6>
<T3, G, "u">
<Commit T2>
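Redo recovery is a forward scan over the log; as before, update records are modeled as tuples, with ("UPDATE", T, X, w) standing for <T, X, w> (the representation is illustrative):

```python
def redo_recover(log, disk):
    # First pass: find the transactions that committed
    committed = {rec[1] for rec in log if rec[0] == "COMMIT"}
    # Forward pass: replay the NEW values written by
    # committed transactions, in log order
    for rec in log:
        if rec[0] == "UPDATE" and rec[1] in committed:
            _, _, x, new = rec
            disk[x] = new
    return disk

# The log from the slide: T1 and T2 committed, T3 did not
log = [
    ("START", "T1"), ("UPDATE", "T1", "A", 60),
    ("START", "T2"), ("UPDATE", "T1", "B", 50),
    ("UPDATE", "T2", "C", "i"), ("UPDATE", "T1", "D", 5),
    ("START", "T3"), ("UPDATE", "T3", "E", "z"),
    ("COMMIT", "T1"), ("UPDATE", "T2", "F", 6),
    ("UPDATE", "T3", "G", "u"), ("COMMIT", "T2"),
]
disk = {}
redo_recover(log, disk)   # T3's writes to E and G are ignored
```

The scan direction is the mirror image of undo recovery: going forward ensures that if a committed transaction wrote an element twice, the newest value wins.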
Important
You must flush all the buffer pages that were modified by the transactions that have already committed
And no others! If you flush any buffer page that was modified by a transaction that did not yet commit, you will be in big trouble if that transaction aborts
A non-quiescent checkpoint
[Diagram: a redo log timeline with <START CHECKPOINT> and <END CHECKPOINT> markers. At the start of the checkpoint, all buffer pages modified by committed transactions are flushed (the "big flush"); once the checkpoint has ended, the part of the log before it can safely be ignored, along with all transactions that completed before the start of the checkpoint.]
Recovering after a checkpoint
Roll back to the most recent complete checkpoint
Replay all committed transactions that
Are in the list of in-progress transactions at the start of the checkpoint, or
Started after the start of the checkpoint
Can ignore all other transactions
A new problem
What if the same block of the DB is modified
By a transaction that has already committed, and
By another transaction that has not yet committed?
Should we flush the block or not?
There is no good answer
A comparison
Undo logging
Keeps track of the old values of all DB entities
Transactions commit after all new values have been written to disk
Recovery means undoing all transactions that did not commit
Redo logging
Keeps track of the new values of all DB entities
Transactions commit before any new value is written to disk
Recovery means redoing all transactions that committed
Something worth remembering
Undo/redo logging
The idea is to redo transactions that did complete and undo all others
Will keep on a log both the old and new values of all data blocks that the transaction modifies
Will also note on the log whether the transaction completed successfully (COMMIT) or failed (ABORT)
The undo/redo log
Log records include
<Start T>
<Commit T>
<Abort T>
<T, X, v, w>
Transaction T changed DB element X, replacing its old value v by the new value w.
Undo/redo logging rules
If a transaction T modifies DB element X, the log record <T, X, v, w> must be written to disk before the transaction commits
If T commits, its <COMMIT> record can be written to disk before or after the database elements changed by T are written to disk
Example
<START T>
READ(A,t); t = t - $500; WRITE(A,t) preceded by <T, A, 1200, 700> (the old and new values)
READ(B,t); t = t + $500; WRITE(B,t) preceded by <T, B, 100, 600> (the old and new values)
FLUSH LOG;
<COMMIT> and OUTPUT(A); OUTPUT(B) in no particular order
Recovery using undo/redo logging
Look at the transaction records on the log
Do they end with a <COMMIT>?
If the transaction is committed
Replay the transaction from the log
else
Undo the incomplete/aborted transaction
Recovering from an undo/redo log
Transactions T1 and T2 have completed
Must replay them using the new values
Transaction T3 never completed
Must undo it using the old saved values
<Start T1>
<T1, A, 60, 50>
<Start T2>
<T1, B, 20, 30>
<T2, C, "x", "y">
<T1, D, 5, 6>
<Start T3>
<T3, E, "z", "a">
<Commit T1>
<T2, F, 6, 5>
<T3, G, "u", "v">
<Commit T2>
Checkpointing
Non-quiescent checkpoints
Start the checkpoint, noting all transactions that have not yet committed: <START CHECKPOINT(T1, T2, ...)>
Flush the log
Flush all the modified buffer pages
Write <END CHECKPOINT(T1, T2, ...)>
Why?
We do not distinguish here between
Blocks that were updated by transactions that are already committed
Blocks that were updated by transactions that have not yet committed (and could never reach that stage)
We now have enough data on the log to undo them if needed
Recovering after a checkpoint
Roll back to the most recent complete checkpoint
Look at all transactions that
Are in the list of in-progress transactions at the start of the checkpoint, or
Started after the start of the checkpoint
Replay them if they committed
Undo them otherwise
A summary
Undo/redo logging
Keeps track of both the old and the new values of all DB entities
Transactions commit either before or after the new values have been written to disk
Recovery means
undoing all transactions that did not commit
redoing those that committed
Something worth remembering
Handling media failures
Old school approach
Make frequent backups (archiving)
Backups can be complete or incremental
Example:
Do a full backup every weekend
Do incremental backups every weekday
These contain the files/DBs that were updated on that day
Criticism
As we store more and more data on larger and larger disks, the time needed to make these backups becomes prohibitive
A better solution is to use a redundant disk array architecture that reduces the risk of data loss to a minimum
RAID level 6, triple parity, …