Gray & Reuter: Resource Managers
Jim Gray, Microsoft, Gray@Microsoft.com
Andreas Reuter, International University, Andreas.Reuter@i-u.de

Mar 27, 2015


Christian Kirk
Transcript
Page 1

Gray & Reuter: Resource Manager

Resource Managers

Course schedule:

        9:00             11:00          1:30           3:30          7:00
Mon     Overview         Faults         Tolerance      T Models      Party
Tue     TP mons          Lock Theory    Lock Techniq   Queues        Workflow
Wed     Log              ResMgr         CICS & Inet    Adv TM        Cyberbrick
Thur    Files & Buffers  COM+           Corba          Replication   Party
Fri     B-tree           Access Paths   Groupware      Benchmark

Jim Gray, Microsoft, Gray@Microsoft.com
Andreas Reuter, International University, Andreas.Reuter@i-u.de

Page 2

Whirlwind Tour: The Actors

Resource managers
– provide ACID objects (transactional objects)
– use the log manager to record changes
– use the transaction manager to coordinate multi-RM changes
– use the communication manager to make transactional RPCs

[Figure: two nodes, each with a Transaction Manager, a Log Manager with its Log, a Communication Manager, and Resource Managers whose Objects live in Volatile Storage and Durable Storage.]

Page 3

Whirlwind Tour: the Application Verbs

TRID      Begin_Work(context *);        /* begin a transaction */
Boolean   Commit_Work(context *);       /* commit the transaction */
void      Abort_Work(void);             /* rollback to savepoint zero */

savepoint Save_Work(context *);         /* establish a savepoint */
savepoint Rollback_Work(savepoint);     /* return to savept (savept 0 = abort) */
Boolean   Prepare_Work(context *);      /* put transaction in prepared state */
context   Read_Context(void);           /* return current savepoint context */
TRID      Chain_Work(context *);        /* end current and start next trans */

TRID      My_Trid(void);                /* return current transaction identifier */
TRID      Leave_Transaction(void);      /* set process trid null, return current id */
Boolean   Resume_Transaction(TRID);     /* set process trid to desired trid */

enum tran_status { ACTIVE, PREPARED, ABORTING, COMMITTING, ABORTED, COMMITTED };
tran_status Status_Transaction(TRID);   /* transaction identifier status */

Page 4

Whirlwind Tour: Types Of Transaction Executions

Shaded stuff is "undone".

[Figure: four execution timelines built from Begin, Action, Save, Rollback, Restart, and Commit steps: A Simple Commit; A Simple Abort; A Partial Rollback; A Persistent Transaction Surviving A System Restart (uses a persistent Save).]

Page 5

Whirlwind Tour: the TRID Flow

Call graph: who calls whom.
TRIDs flow on all such calls.
Application is typically root.
RM can be an application (use a transactional RM to store state).

[Figure: call graph from the Application through Application Servers to Resource Managers.]

Page 6

Whirlwind Tour: Normal (No Failure) Transaction Execution

TM generates the TRID at Begin_Work() and coordinates commit.
RM joins work, generates log records, allows commit.

[Figure: the Application calls Begin_Work() and Commit_Work() on the Transaction Manager and sends work requests to the Resource Manager. The Resource Manager offers normal functions and transaction callbacks, calls Join_Work, sends lock requests (with the transid) to the Lock Manager, and sends log records to the Log Manager. At commit the Transaction Manager asks "Commit Phase 1?" (Yes/No), writes the commit log record and forces the log, then runs Commit Phase 2 (ack).]

Page 7

WW Tour: The Resource Manager View

[Figure: a Resource Manager and its interfaces.]
Resource manager's own service interface: rmCall(...) invocation and response.
Other resource managers: reached through rmCall(...).
TP monitor: administrative functions and callbacks to install, start, and schedule a resource manager.
Transaction Manager functions: Identify, SaveWork, RollbackWork, Join, StatusTransaction, Leave, Resume.
Transaction management callbacks (depends on application): Save, Prepare, Commit, UNDO, REDO, Checkpoint.

Page 8

WW Tour: The Resource Manager View

Boolean Savepoint(LSN *);          /* invoked at tran Save_Work(). Returns RM vote */
Boolean Prepare(LSN *);            /* invoked at phase 1. Return vote on commit */
void    Commit();                  /* called at commit phase 2 */
void    Abort();                   /* called at failed commit phase 2 or abort */

void    UNDO(LSN);                 /* Undo the log record with this LSN */
void    REDO(LSN);                 /* Redo the log record with this LSN */
Boolean UNDO_Savepoint(LSN);       /* Vote TRUE if can return to savepoint */
void    REDO_Savepoint(LSN);       /* Redo a savepoint. */

void    TM_Startup(LSN);           /* TM restarting. Passes RM ckpt LSN */
LSN     Checkpoint(LSN * low_water); /* TM checkpointing. Return RM ckpt LSN, set low water LSN */
Boolean Join_Work(RMID, TRID);     /* Become part of a transaction */

Page 9

WW Tour: The Transaction Manager

Transaction rollback.
    Coordinates transaction rollback to a savepoint or abort.
    Rollbacks can be initiated by any participant.
Resource manager restart.
    If an RM fails and restarts, TM presents checkpoint anchor & RM undo/redo log.
System restart.
    TM drives local RM recovery (like RM restart).
    TM resolves any in-doubt distributed transactions.
Media recovery.
    TM helps RM reconstruct damaged objects by providing archive copies of object + the log of object since archived.
Node restart.
    Transaction commit among independent TMs when a TM fails.

Page 10

WW Tour: When a Transaction Aborts

At transaction rollback, TM drives undo of each RM joined to the transaction.
Can be to savepoint 0 (abort) or partial rollback.

[Figure: the Application calls Begin_Work() and Rollback_Work(). The Transaction Manager reads the transaction's log records from the Log Manager, calls Undo(log record) on the Resource Manager for each one, writes an abort record in the log, and signals Aborted(transid). Work requests, lock requests to the Lock Manager, and Join_Work are as in normal execution.]

Page 11

WW Tour: the Transaction Manager at Restart/Recovery

At restart, TM reading the log drives RM recovery.
Single log scan.
Single resolver of transactions.
Multiple logs possible, but more complex/more work.

[Figure: Transaction Manager steps: find checkpoint; read log forward; redo each op; at end, undo soft savepoints & transactions. The TM reads log records from the Log Manager and issues Redo(log record) and Undo(log record) calls to the Resource Manager.]

Page 12

End of Whirl-Wind Tour

Page 13

Resource Manager Concepts: Undo Redo Protocol

DO-UNDO-REDO Protocol

DO:   Old State -> New State + log record
UNDO: New State + log record -> Old State
REDO: Old State + log record -> New State

Page 14

Resource Manager Concepts: Transaction UNDO Protocol

declare cursor for transaction_log        /* a cursor on the transaction's log */
    select rmid, lsn                      /* it returns the resource manager name */
    from log                              /* and record id (log sequence number) */
    where trid = :trid                    /* and returns records in LIFO order */
    descending lsn;

void transaction_undo(TRID trid)          /* Undo the specified transaction. */
{   int sqlcode;                          /* event variables set by sql */
    open cursor transaction_log;          /* open an sql cursor on the trans log */
    while (TRUE)                          /* scan trans log backwards & undo each */
    {                                     /* fetch the next most recent log rec */
        fetch transaction_log into :rmid, :lsn;
        if (sqlcode != 0) break;          /* if no more, trans is undone, end loop */
        rmid.undo(lsn);                   /* tell RM to undo that record */
    }                                     /* end of backward scan */
    close cursor transaction_log;         /* Undo scan is complete, close cursor */
};                                        /* return to caller */

• If UNDO to savepoint, the UNDO stops at the desired savepoint.
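The backward scan above can be sketched in plain C without the SQL cursor. This is a minimal sketch, not the authors' implementation: the in-memory `LogRec` array, the `savepoint` marker field, the callback table `rm_undo`, and the tracing helper are all assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>

typedef long LSN;
typedef int  TRID;

/* One entry of a toy in-memory log (hypothetical layout). */
typedef struct {
    LSN  lsn;        /* log sequence number, ascending in the array */
    TRID trid;       /* transaction that wrote the record */
    int  rmid;       /* resource manager that owns the record */
    int  savepoint;  /* nonzero: this record only marks that savepoint */
} LogRec;

typedef void (*UndoFn)(LSN lsn);   /* per-RM undo callback */

/* Undo trid's records newest-first (LIFO). Stops at the marker for
   to_savepoint; to_savepoint == 0 means a full abort.
   Returns the number of records handed to resource managers. */
int transaction_undo(const LogRec *log, size_t n, TRID trid,
                     int to_savepoint, const UndoFn *rm_undo)
{
    int undone = 0;
    for (size_t i = n; i-- > 0; ) {
        if (log[i].trid != trid)
            continue;                          /* another transaction */
        if (log[i].savepoint != 0) {           /* savepoint marker */
            if (log[i].savepoint == to_savepoint)
                break;                         /* reached the target */
            continue;                          /* skip other markers */
        }
        rm_undo[log[i].rmid](log[i].lsn);      /* tell RM to undo it */
        undone++;
    }
    return undone;
}

/* Demo callback that records the order in which LSNs are undone. */
static LSN undo_trace[16];
static int undo_count = 0;
void trace_undo(LSN lsn) { undo_trace[undo_count++] = lsn; }
```

With a log holding LSNs 1, 3, 5, 6 for one transaction and a savepoint marker at LSN 4, a rollback to that savepoint undoes 6 then 5, while a full abort undoes 6, 5, 3, 1.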

Page 15

Resource Manager Concepts: Restart REDO Protocol

Note: REDO forwards, UNDO backwards

void log_redo(void)
{   declare cursor for the_log            /* declare cursor from log start forward */
        select rmid, lsn                  /* gets RM id and log record id (lsn) */
        from log                          /* of all log records. */
        ascending lsn;                    /* in FIFO order */
    open cursor the_log;                  /* open an sql cursor on the log table */
    while (TRUE)                          /* Scan log forward & redo each record. */
    {   fetch the_log into :rmid, :lsn;   /* fetch the next log record */
        if (sqlcode != 0) break;          /* if no more, then all redone, end loop */
        rmid.redo(lsn);                   /* tell RM to redo that record */
    }
    close cursor the_log;                 /* Redo scan complete, close cursor */
};                                        /* return to caller */

Page 16

Idempotence

F(F(X)) == F(X): needed in case restart fails (and restarts).

Redo(Redo(old_state, log), log) = Redo(new_state, log) = new_state
Undo(Undo(new_state, log), log) = Undo(old_state, log) = old_state

[Figure: redo with the log record takes Old State to New State; undo with the log record takes New State to Old State.]
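To make the idempotence requirement concrete, the sketch below contrasts a redo that simply repeats an "add delta" operation with one guarded by a version stamp on the object, the LSN technique the later slides rely on. The `Account` and `DeltaRec` types and both function names are hypothetical, chosen only for this illustration.

```c
#include <assert.h>

typedef long LSN;

typedef struct { LSN lsn; long balance; } Account;   /* object plus version stamp */
typedef struct { LSN lsn; long delta;   } DeltaRec;  /* logical "add delta" record */

/* Naive redo: repeats the addition every time it is called,
   so Redo(Redo(x)) != Redo(x). */
void redo_naive(Account *a, const DeltaRec *r)
{
    a->balance += r->delta;
}

/* LSN-guarded redo: applies the record only if the object has not
   already seen it, so repeating the call changes nothing. */
void redo_guarded(Account *a, const DeltaRec *r)
{
    if (a->lsn < r->lsn) {
        a->balance += r->delta;
        a->lsn = r->lsn;
    }
}
```

Calling `redo_naive` twice with a delta of 50 adds 100, whereas calling `redo_guarded` twice adds 50 once and leaves the object stamped with the record's LSN.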

Page 17

Testable State: Can Tell If It Happened.

IF operation not idempotent AND state not testable
THEN recovery is impossible
ELSE for F in {UNDO, REDO}:
    not testable: WHILE (! ACK) F(F(X))
    testable: WHILE (not desired state) {F(x)}

[Figure: a test on an Unknown State tells whether it is the Old State or the New State.]

Page 18

Real Operations: Can Not Be Undone

Defer operations until commit is assured.
Perform as part of Phase 2 of commit.
If must undo for some reason, generate compensation log record to be processed by some higher authority.

[Figure: DO, UNDO, and REDO for a real operation. DO leaves the Old State in place and writes a log record; the state changes to the New State only at Commit; UNDO after that point produces a compensation log record.]

Page 19

Example: Communications Session RM

Ops are idempotent (sequence numbers) and testable (sequence numbers).

Session And Message Recovery Actions

         Sender                                      Receiver
DO       log message & seqno; send                   establish savepoint; log message & seqno; acknowledge
UNDO     send cancellation (generates log record)    log cancellation message; return to savepoint; acknowledge
REDO     resend message                              if not duplicate <normal DO processing> else just acknowledge
COMMIT   send any deferred (real) messages           do it

Page 20

Kinds of Logging

Physical:
    Keep old and new value of container (page, file, ...)
    Pro: Simple
         Allows recovery of physical object (e.g. broken page)
    Con: Generates LOTS of log data
Logical:
    Keep call params such that you can compute F(x), F^-1(x)
    Pro: Sounds simple
         Compact log.
    Con: Doesn't work (wrong failure model).
         Operations do not fail cleanly.

Page 21

Sample Physical LOG RECORD

Ordinary sequential insert is OK.
Update of sorted (B-tree) page:
    update LSN
    update page space map
    update pointer to record
    insert record at correct spot (move 1/2 the others)
Essentially writes whole page (old and new).
16KB log records for 100-byte updates.

struct compressed_log_record_for_page_update
{   int opcode;                 /* opcode will say compressed page update */
    filename fname;             /* name of file that was updated */
    long pageno;                /* page that was updated */
    long offset;                /* offset within page that was updated */
    long length;                /* length of field that was updated */
    char old_value[length];     /* old value of field */
    char new_value[length];     /* new value of field */
};
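The record layout above is schematic, since `char old_value[length]` is not a valid C member. A minimal compilable sketch of the same idea follows, storing the old and new images back to back in a flexible array member. The 4096-byte `PAGE_SIZE`, the omission of `opcode` and `fname`, and the function names are assumptions for illustration only.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

enum { PAGE_SIZE = 4096 };        /* assumed page size for this sketch */

typedef struct {
    long pageno;                  /* page that was updated */
    long offset;                  /* offset within page that was updated */
    long length;                  /* length of the updated field */
    char values[];                /* old value (length bytes), then new value */
} PageUpdateRec;

/* DO: overwrite page[offset .. offset+length) with new_value and return
   a log record holding both images. Caller frees it. NULL on failure. */
PageUpdateRec *page_update(char *page, long pageno, long offset,
                           const char *new_value, long length)
{
    if (offset < 0 || length < 0 || offset + length > PAGE_SIZE)
        return NULL;
    PageUpdateRec *r = malloc(sizeof *r + 2 * (size_t)length);
    if (r == NULL)
        return NULL;
    r->pageno = pageno;
    r->offset = offset;
    r->length = length;
    memcpy(r->values, page + offset, (size_t)length);        /* old image */
    memcpy(r->values + length, new_value, (size_t)length);   /* new image */
    memcpy(page + offset, new_value, (size_t)length);        /* apply */
    return r;
}

/* UNDO: put the old image back. */
void page_undo(char *page, const PageUpdateRec *r)
{
    memcpy(page + r->offset, r->values, (size_t)r->length);
}

/* REDO: put the new image back; repeating it is harmless. */
void page_redo(char *page, const PageUpdateRec *r)
{
    memcpy(page + r->offset, r->values + r->length, (size_t)r->length);
}
```

Because both images are stored verbatim, undo and redo here are plain byte copies and are idempotent without any LSN test, which is the simplicity the slide credits to physical logging.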

Page 22

Sample Logical LOG RECORD

Very compact.
Implies page update(s) for record (may be many pages long).
Implies index updates (may be many indices on base table).

struct logical_log_record_for_insert
{   int opcode;                 /* opcode will say insert */
    filename fname;             /* name of file that was updated */
    long length;                /* length of record that was updated */
    char record[length];        /* value of record */
};

Page 23

The Trouble with Logical Logging

Logical logging needs to start UNDO/REDO with an action-consistent state.
No half completed operations.
For example: insert(table, record).
ALL or NONE of the indices should be updated when logical UNDO/REDO is invoked.

Problem:
Failure model is Page & Message action consistency (Lampson/Sturgis model of Chapter 3).
Actions can fail due to:
    Logic: e.g. duplicate key.
    Limit: ran out of space
    Contention: deadlock
    Media: broken page or session
    System: computer failure/restart

Page 24

Making Logical Logging Work: Shadows

Keep old copy of each page

Reset page to old copy at abort (no undo log)

Discard old copy at commit.

Handles all online failures due to:

Logic: e.g. duplicate key.

Limit: ran out of space

Contention: deadlock

Problem: forces page locking, only one updater per page.

What about restart?

Need to atomically write out all changed pages.

Page 25

Making Logical Logging Work: Shadows

Perform same shadow trick at disc level.
Keep shadow copy of old pages.
Write out new pages.
In one careful write, write out new page root.
Makes update atomic.

[Figure: A Shadow Update. Old and New copies of the Directory and Free Space Bit Map point to data pages A, B, C.]

Page 26

Shadows

Pro: Simple

Not such a bad deal with non-volatile ram

Con: page locking

extra space

extra overhead (for page maps)

extra IO

declusters sequential data

Page 27

Compromise: Physio-Logical Logging

Physio-Logical Logging
Physical to a "page" (physical container)
Logical within a "page".

Keep old and new value of container (page, file, ...)
Pro: Simple
     Allows recovery of physical object (e.g. broken page)
Con: Generates LOTS of log data

Page 28

Logical vs Physio-logical Logging

Insert record r into table A (Table A has Index B and Index C).

Logical log record:
    insert, A, r

Physiological log records:
    insert, A, page 508, r
    insert, B, page 72, s
    insert, C, page 94, t

Note: physical log records would be bigger for sorted pages.

Page 29

Physiological Logging Rules

Complex operations are a sequence of simple operations on pages and messages.

Each operation is constructed as a mini-transaction:
    lock the object in exclusive mode
    transform the object
    generate an UNDO-REDO log record
    record log LSN in object
    unlock the object.

Action Consistent Object: when object semaphore free, no ops in progress.

Log-Consistency: contains log records of all complete page/msg actions.

Page 30

Physiological Logging Rules: Online Operation - Only Need the Fix Rule

Each operation is structured as a mini-transaction.

Each operation generates an UNDO record.

No page operation fails with the semaphore set (exception handler must clean up state and UNFIX any pages).

Then Rollback can be physical to a page/session/container and logical within page/session/container.

Page 31

Physiological Logging Rules: Restart Operation - Need WAL and F@C

Need Page-Action consistent disc state.
    Pages are action consistent.
    Committed actions can be redone from log.
    Uncommitted actions can be undone from log.

WAL: Write Ahead Log
    Write undo/redo log records before overwriting disc page.
    Only write action-consistent pages.

Force-Log-At-Commit
    Make transaction log records durable at commit.

Page 32

Physiological Logging Rules: WAL and F@C

WAL: Write Ahead Log
write page:
    get page semaphore
    copy page
    give page semaphore       /* avoids holding semaphore during IO */
    Force_log(Page(LSN))      /* WAL logic, probably already flushed */
    Write copy to disc.

WAL gives idempotence and testability.

Force-Log-At-Commit
At commit phase 1:
    Force_log(transaction.max_lsn)
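The two rules can be sketched against a toy log that only tracks how far it has been appended and how far it has been forced. Everything here is an assumption made for illustration: the `Log` and `Page` types, the `forces` counter, and the function names are not the book's interfaces, and the page semaphore is reduced to a comment.

```c
#include <assert.h>
#include <string.h>

typedef long LSN;

typedef struct { LSN lsn; char data[64]; } Page;

/* Toy log: records with LSN <= durable are on stable storage. */
typedef struct { LSN end; LSN durable; int forces; } Log;

/* Append a record to the volatile end of the log and return its LSN. */
LSN log_insert(Log *log) { return ++log->end; }

/* Make the log durable up to `upto`; a no-op if already flushed. */
void force_log(Log *log, LSN upto)
{
    if (upto > log->durable) {
        log->durable = upto;
        log->forces++;
    }
}

/* WAL page write: snapshot the page (a real system holds the page
   semaphore only for the copy), force the log up to the page LSN,
   and only then overwrite the persistent version. */
void write_page(Log *log, const Page *volatile_page, Page *disk_page)
{
    Page copy = *volatile_page;
    force_log(log, copy.lsn);
    assert(copy.lsn <= log->durable);   /* persistent page LSN <= durable log LSN */
    *disk_page = copy;
}

/* Force-log-at-commit: the transaction's last record must be durable. */
void commit(Log *log, LSN max_lsn) { force_log(log, max_lsn); }
```

The assertion inside `write_page` is the invariant the next slide writes as PVlsn <= DLlsn; the second call to `commit` with the same LSN shows why the slide says the log is "probably already flushed".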

Page 33

WAL & F@C in Pictures

[Figure: four LSNs plotted against Time: VVlsn (Volatile Page Versions), VLlsn (Volatile Log Records), PVlsn (Persistent Page Versions), DLlsn (Durable Log Records).]

online:  VVlsn = VLlsn
         DLlsn <= VVlsn
         PVlsn <= DLlsn
Commit:  commit_lsn <= DLlsn

At restart all volatile memory is reset and must be reconstructed from persistent memory.

restart: PVlsn <= DLlsn
         commit_lsn <= DLlsn

FIX, WAL and F@C assure these assertions.

Page 34

The One Bit Resource Manager

Manages an array of transactional bits (the free space bit map).

i = get_bit(); /* gets a free bit and sets it */

give_bit(i); /* returns a free bit (when transaction commits) */

Page 35

The Bitmap and Its Log Records

The Data Structure

struct                          /* layout of the one-bit RM data structure */
{   LSN lsn;                    /* page LSN for WAL protocol */
    xsemaphore sem;             /* semaphore regulates access to the page */
    Boolean bit[BITS];          /* page.bit[i] = TRUE => bit[i] is free */
} page;                         /* allocates the page structure */

The Log Records

struct                          /* log record format for the one-bit RM */
{   int index;                  /* index of bit that was updated */
    Boolean value;              /* new value of bit[index] */
} log_rec;                      /* log record used by the one-bit RM */

const int rec_size = sizeof(log_rec);   /* size of the log record body. */

Page 36

Page and Log Consistency for 1-Bit RM

Data is dirty if it reflects an uncommitted transaction update. Otherwise, data is clean.

Page Consistency:
• No clean free bit has been given to any transaction.
• Every clean busy bit was given to exactly one transaction.
• Dirty bits locked in X mode by updating transactions.
• The page.lsn reflects most recent log record for page.

Log Consistency:
• Log contains a record for every completed mini-transaction update to the page.

Page 37

give_bit()

get_bit() & give_bit(i) temporarily violate page consistency.
Mini-transaction holds semaphore while violating consistency.
Makes page & log mutually consistent before releasing sem.
=> each mini-transaction observes a consistent page state.

void give_bit(int i)                                     /* free a bit */
{   if (LOCK_GRANTED == lock(i, LOCK_X, LOCK_LONG, 0))   /* lock bit */
    {   Xsem_get(&page.sem);                             /* get page sem */
        page.bit[i] = TRUE;                              /* free the bit */
        log_rec.index = i;                               /* generate log rec */
        log_rec.value = TRUE;                            /* saying bit is free */
        page.lsn = log_insert(log_rec, rec_size);        /* write log rec & update lsn */
        Xsem_give(&page.sem);                            /* page consistent */
    }
    else                                                 /* if lock failed, caller doesn't own bit, */
        Abort_Work();                                    /* in that case abort caller's trans */
    return;
};

Page 38

get_bit()

int get_bit(void)                                        /* allocates a bit and returns bit index */
{   int i;                                               /* loop variable */
    Xsem_get(&page.sem);                                 /* get the page semaphore */
    for (i = 0; i < BITS; i++)                           /* loop looking for a free bit */
    {   if (page.bit[i])                                 /* if bit is free, may be dirty (so locked) */
        {   if (LOCK_GRANTED == lock(i, LOCK_X, LOCK_LONG, 0))   /* lock bit */
            {   page.bit[i] = FALSE;                     /* got lock on it, so it was free */
                log_rec.value = FALSE;                   /* generate log rec describing update */
                log_rec.index = i;
                page.lsn = log_insert(log_rec, rec_size);   /* write log rec & update lsn */
                Xsem_give(&page.sem);                    /* page now consistent, give up sem */
                return i;                                /* return to caller */
            }
        };                                               /* else lock bounce so bit dirty */
    };                                                   /* try next free bit */
    Xsem_give(&page.sem);                                /* if no free bits, give up semaphore */
    Abort_Work();                                        /* abort transaction */
    return -1;                                           /* returns -1 if no bits are available. */
};

Page 39

Compensation Logging

Undo may generate a log record recording the undo step.
Makes Page LSN monotonic.
Similar technique was used for Communication Manager (session sequence number was monotonic).

[Figure: UNDO takes the New State and the log record to the logical Old State and produces a compensation log record.]

Page 40

1-bit RM UNDO Callback

void undo(LSN lsn)                                       /* undo a one-bit RM operation */
{   int i;                                               /* bit index */
    Boolean value;                                       /* old bit value from log rec to be undone */
    log_rec_header header;                               /* buffer to hold log record header */
    rec_size = log_read_lsn(lsn, header, 0, log_rec, big);   /* read log rec */
    Xsem_get(&page.sem);                                 /* get the page semaphore */
    i = log_rec.index;                                   /* get bit index from log record */
    value = !log_rec.value;                              /* get complement of new bit value */
    page.bit[i] = value;                                 /* update bit to old value */
    log_rec.value = value;                               /* make a compensation log record */
    page.lsn = log_insert(log_rec, rec_size);            /* log it and bump page lsn */
    Xsem_give(&page.sem);                                /* free the page semaphore */
    return;
}

Page 41

1-bit RM Checkpoint Callback

LSN checkpoint(LSN * low_water)                          /* copy 1-page RM state to persistent store */
{   Xsem_get(&page.sem);                                 /* get the page semaphore */
    *low_water = log_flush(page.lsn);                    /* WAL force up to page lsn, and set low water mark */
    write(file, page, 0, sizeof(page));                  /* write page to persistent memory */
    Xsem_give(&page.sem);                                /* give page semaphore */
    return NULLlsn;                                      /* return checkpoint lsn (none needed) */
}

Page 42

1-bit RM REDO Callback

void redo(LSN lsn)                                       /* redo a free space operation */
{   int i;                                               /* bit index */
    Boolean value;                                       /* new bit value from log rec to be redone */
    log_rec_header header;                               /* buffer to hold log record header */
    rec_size = log_read_lsn(lsn, header, 0, log_rec, big);   /* read log record */
    i = log_rec.index;                                   /* get bit index */
    lock(i, LOCK_X, LOCK_LONG, 0);                       /* get lock on the bit (often not needed) */
    Xsem_get(&page.sem);                                 /* get the page semaphore */
    if (page.lsn < lsn)                                  /* if bit version older than log record */
    {   value = log_rec.value;                           /* then redo the op. get new bit value */
        page.bit[i] = value;                             /* apply new bit value to bit */
        page.lsn = lsn;                                  /* advance the page lsn */
    }
    Xsem_give(&page.sem);                                /* free the page semaphore */
    return;
};
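The interplay of DO, compensating UNDO, and LSN-guarded REDO can be exercised end to end with a toy in-memory log. This is a simplified sketch under stated assumptions: locks and the page semaphore are omitted, the LSN is just a 1-based index into a fixed array, and `set_bit` and `restart_redo` are hypothetical helpers, not part of the slides' RM.

```c
#include <assert.h>
#include <stdbool.h>

enum { BITS = 8, LOG_MAX = 32 };
typedef int LSN;                      /* toy LSN: 1-based index into the log */

typedef struct { int index; bool value; } LogRec;
typedef struct { LSN lsn; bool bit[BITS]; } Page;   /* bit[i] true => free */

static LogRec the_log[LOG_MAX];
static LSN log_end = 0;

/* Append a record and return its LSN. */
LSN log_insert(LogRec rec)
{
    assert(log_end < LOG_MAX);
    the_log[log_end] = rec;
    return ++log_end;
}

/* DO: change bit i, log the new value, stamp the page with the LSN. */
void set_bit(Page *p, int i, bool value)
{
    p->bit[i] = value;
    p->lsn = log_insert((LogRec){ i, value });
}

/* UNDO: restore the complement and log that as a compensation record,
   so the page LSN keeps growing. */
void undo(Page *p, LSN lsn)
{
    LogRec rec = the_log[lsn - 1];
    set_bit(p, rec.index, !rec.value);
}

/* REDO: apply the record only if the page is older than the record. */
void redo(Page *p, LSN lsn)
{
    if (p->lsn < lsn) {
        LogRec rec = the_log[lsn - 1];
        p->bit[rec.index] = rec.value;
        p->lsn = lsn;
    }
}

/* Restart: replay the whole log forward against a possibly stale page. */
void restart_redo(Page *p)
{
    for (LSN l = 1; l <= log_end; l++)
        redo(p, l);
}
```

Replaying the log twice against a stale page gives the same result as replaying it once, and redoing an old record against an up-to-date page is skipped by the LSN test rather than clobbering the compensated bit.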

Page 43

1-Bit RM Noise Callbacks

Boolean prepare(LSN * lsn)                               /* 1-bit RM has no phase 1 work */
{   *lsn = NULLlsn; return TRUE; };

void Commit(void)                                        /* Commit releases locks & */
{   unlock_class(LOCK_LONG, TRUE, MyRMID()); };          /* returns */

void Abort(void)                                         /* Abort releases all locks & */
{   unlock_class(LOCK_LONG, TRUE, MyRMID()); };          /* returns */

Boolean savepoint(LSN * lsn)                             /* no work to do at savepoint */
{   *lsn = NULLlsn; return TRUE; };

void UNDO_savepoint(LSN lsn)                             /* rollback work or abort transaction */
{   if (savepoint == 0)                                  /* if at savepoint zero (abort) */
        unlock_class(LOCK_LONG, TRUE, MyRMID());         /* release all locks */
};

Page 44

Summary

Model: Complex actions are a page/message action sequence.
LSN: Each page carries an LSN and a semaphore.
ReadFix: Read actions get semaphore in shared mode.
WriteFix: Update actions get semaphore in exclusive mode, generate one or more log records covering the page, advance the page LSN to match highest LSN, give semaphore.
WAL: log_flush(page.LSN) before overwriting persistent page.
F@C: force all log records up to the commit LSN at commit.
Compensation Logging: Invalidate undone log record with a compensating log record.
Idempotence via LSN: page LSN makes REDO idempotent.

Page 45

Two Phase Commit

Getting two or more logs to agree.
Getting two or more RMs to agree,
atomically and durably,
even in case one of them fails and restarts.

The TM phases:
Prepare. Invoke each joined RM asking for its vote.
Decide. If all vote yes, durably write commit log record.
Commit. Invoke each joined RM, telling it commit decision.
Complete. Write commit completion when all RMs ACK.
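The four TM phases can be sketched as a single coordinator loop over the joined resource managers. This is an illustrative sketch only: the `RM` struct with its function pointers, the fixed `vote` field, and the `log_forces` counter standing in for the forced commit record are assumptions, not the interfaces defined in the slides.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef enum { TX_ACTIVE, TX_COMMITTED, TX_ABORTED } Outcome;

/* Hypothetical resource manager interface for this sketch. */
typedef struct RM {
    bool (*on_prepare)(struct RM *self);   /* phase 1: return vote */
    void (*on_commit)(struct RM *self);    /* phase 2: commit decision */
    void (*on_abort)(struct RM *self);     /* abort decision */
    bool vote;                             /* what this toy RM answers */
    Outcome state;
} RM;

bool rm_prepare(RM *rm) { return rm->vote; }
void rm_commit(RM *rm)  { rm->state = TX_COMMITTED; }
void rm_abort(RM *rm)   { rm->state = TX_ABORTED; }

/* Run the TM phases over the joined RMs. *log_forces counts forced
   writes of the commit record (the "Decide" step). */
Outcome two_phase_commit(RM **rms, size_t n, int *log_forces)
{
    /* Prepare: ask every joined RM for its vote. */
    for (size_t i = 0; i < n; i++) {
        if (!rms[i]->on_prepare(rms[i])) {
            for (size_t j = 0; j < n; j++)      /* any "no" aborts all */
                rms[j]->on_abort(rms[j]);
            return TX_ABORTED;
        }
    }
    /* Decide: all voted yes, so durably write the commit record. */
    (*log_forces)++;
    /* Commit: tell every joined RM the decision. */
    for (size_t i = 0; i < n; i++)
        rms[i]->on_commit(rms[i]);
    /* Complete: a real TM would now lazily log the completion record. */
    return TX_COMMITTED;
}
```

The sketch ignores timeouts and restarts; it only shows that the commit record is forced exactly once, after the last yes vote and before any RM is told to commit.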

Page 46

Centralized Case of Two Phase Commit

Each participant (TM & RM) goes through a sequence of states.
These generate log records.

[Figure: state diagram with the states Null, Active, Prepared, Committing, Committed, Aborting, Aborted.]

Page 47

Examples

Committed                 Aborted
begin                     begin
DO rm1                    DO rm1
DO rm2                    DO rm2
DO rm2                    DO rm2
prepare rm2 {locks}       UNDO rm2
commit {rm1, rm2}         UNDO rm2
complete                  UNDO rm1
                          UNDO begin {rm1, rm2}
                          complete

Page 48

Transitions in Case of Restart

[Figure: state diagram with the states Null, Active, Prepared, Committing, Committed, Aborting, Aborted.]

Active state not persistent, others are persistent, for both TM and RM.

Log records make them persistent (redo).

TM tries to drive states to the right (to committed, aborted).
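One way to read the state diagram is as a transition table plus a persistence test. The sketch below is an interpretation, not taken from the slides: the set of legal transitions is inferred from the state names, and treating a transaction found Active at restart as Aborting is an assumption that follows from Active not being persistent.

```c
#include <assert.h>
#include <stdbool.h>

typedef enum {
    T_NULL, T_ACTIVE, T_PREPARED, T_COMMITTING, T_COMMITTED,
    T_ABORTING, T_ABORTED
} TranState;

/* Per the slide: Active is not persistent, the other states are. */
bool is_persistent(TranState s) { return s != T_ACTIVE; }

/* Assumed transition table: states only ever move "to the right". */
bool can_move(TranState from, TranState to)
{
    switch (from) {
    case T_NULL:       return to == T_ACTIVE;
    case T_ACTIVE:     return to == T_PREPARED || to == T_ABORTING;
    case T_PREPARED:   return to == T_COMMITTING || to == T_ABORTING;
    case T_COMMITTING: return to == T_COMMITTED;
    case T_ABORTING:   return to == T_ABORTED;
    default:           return false;      /* committed, aborted: final */
    }
}

/* State in which restart finds a transaction (assumption: work that was
   only Active has no durable state, so restart rolls it back). */
TranState state_after_restart(TranState last)
{
    return last == T_ACTIVE ? T_ABORTING : last;
}
```

Under this reading a transaction that had reached Committing stays there across a restart and can only be driven on to Committed, never back to Aborting.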

Page 49

Successful Two Phase Commit

Message/call flow from TM to each RM joined to transaction.

If TM and RM share the same log, the RM FORCE can piggyback on the TM FORCE.

One IO to commit a transaction (less if commit is grouped).

[Figure: message flow between Coordinator and Participant. Both move through the states Active, Prepared, Committing, Committed.]
Coordinator: Prepare message; local prepare (lazy); write commit record in log (force); Commit message; local commit work (lazy); write completion record in log (lazy).
Participant: local prepare; write prepare record in log (force); answer yes; local commit work; write completion record in log (lazy); Ack when durable.

Page 50

Abort Two Phase Commit

If RM sends "NO" or no response (timeout), TM starts abort.

Calls UNDO of each trans log record.

May stop at a savepoint.

On reaching the begin_trans record, it calls the ABORT() callback of each joined RM.

Page 51

Distributed Two Phase Commit

Tracking joined TMs: the communications manager helps, much as TRPC helps in the local case.

Root TM owes a Prepare/Commit/Abort message to each joined TM.
Joined TM does "local" commit.

[Figure: a call carrying (trid, data) travels over a session from the caller's Communications Manager to the callee's. If this is the first time, the caller's Communications Manager tells Transaction Manager A that the trid is outgoing to B, and the callee's Communications Manager tells its Transaction Manager that the trid is incoming from A.]

Page 52

Full Transaction State Diagram

Next section explains how these states are implemented.

[Figure: state diagram with the states null, active (Begun = save point 1, up to save point n), persistent save point n, prepared, committing, committed, aborting, and aborted, plus the label "= save point 0". States are grouped as Volatile States, Persistent States, and Durable States, and as live states versus complete states.]

Page 53

Summary of Resource Manager Concepts

DO/UNDO/REDO
Idempotent, Testable, Real operations
Logical vs Physical logging
Shadows to make logical logging work
Physiological logging
    Fix, WAL, Force-at-commit
    Page/Message/Log consistency
RM callbacks (the 1-bit resource manager)
    Join, Prepare, Commit, Abort, UNDO, REDO, ....
Restart REDO/UNDO
Two phase commit (RM story is simple).