Rewind, Repair, Replay: Three R’s to improve dependability
Aaron Brown and David Patterson
ROC Research Group, University of California at Berkeley
SIGOPS European Workshop, 23 September 2002
Slide 2
What if computer systems could travel in time?
• We could have retroactive repair
  – travel back and fix problems before they had a chance to corrupt data
• We could eliminate human operator error
  – make a mistake? Just travel back and try it again.
• Our systems could be more robust
  – we could eliminate the dangers of upgrades
  – we could better tolerate buggy software
  – we might even be able to tolerate viruses and hackers
• We could make more dependable systems
Slide 3
Sci-fi vs. computer time travel
• Sci-fi time travel
  – our hero loses a loved one or lives through a disaster
  – hero uses a time machine to travel back in time
  – hero alters the past to avert the future disaster
  – hero returns to the present; past changes have been merged into the original timeline
• Computer time travel
  – human error, software bug, or attack causes data loss
  – Rewind: roll system state backward in time
  – Repair: make changes to avert the foretold disaster
  – Replay: roll system state forward, merging the original timeline with the effects of repairs
• The three R’s are the fundamental primitives of computer time travel
Slide 4
Key properties of the 3R’s
• Recovery from problems at any system layer
  – rewind, repair, and replay cover everything from the OS up through the application
• Recovery from unanticipated problems
  – arbitrary repair
• No assumptions about correct application behavior
  – physical rewind
• Integrated interface
  – provide “undo for sysadmins”
Slide 5
What about existing approaches?
Approach | Rewind | Repair | Replay | Comments
Backups, checkpoints, snapshots, no-overwrite storage | physical | | | read-only view of history
RDBMS log replay | physical | | | application-level only; cannot alter committed transactions
Workflow w/ compensating transactions | phys/log | | | limited apps; mechanisms not usefully integrated for time travel
Timewarp (PARC collaborative productivity apps) | logical | (limited) | | application-level only; repair limited to well-understood history edits
Slide 6
Designing a 3R system
• Goals
  – application-neutrality
  – provide abstractions for reasoning about 3R behavior
• Target domain: network services
  – accessed by remote users via well-defined interfaces
  – email, messaging, e-commerce, auctions, forums, web hosting, enterprise applications (J2EE, .NET), ...
• Challenges, learned from a first attempt
  – integrating history and repair during replay
  – managing inconsistency in externally-visible state
Slide 7
Basic architecture
• Application-independent undo manager
  – coordinates the 3R cycle; manages external inconsistencies
  – linked via a set of APIs to the application, time-travel storage, history log, and control UI
[Figure: basic architecture diagram. Components: application service (includes user state, application, operating system), time-travel storage layer, history log, undo manager, control UI; connections labeled "3R API" and "control".]
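To make the coupling concrete, here is a minimal Python sketch of an undo manager that sees the application, the time-travel storage layer, and the history log only through narrow interfaces. All interface and method names (`execute`, `now`, `rewind`, `since`, `truncate`) are hypothetical, since the prototype's Java interfaces are not spelled out here. The in-memory stand-ins exist only so the sketch runs, and the repair step is reduced to a callback that edits the history.

```python
from typing import Any, Callable, Protocol


class Application(Protocol):            # hypothetical 3R API of the app service
    def execute(self, verb: Any) -> None: ...


class TimeTravelStorage(Protocol):      # hypothetical storage-layer API
    def now(self) -> int: ...
    def rewind(self, point: int) -> None: ...


class HistoryLog(Protocol):             # hypothetical history-log API
    def append(self, time: int, verb: Any) -> None: ...
    def since(self, time: int) -> list: ...
    def truncate(self, time: int) -> None: ...


class UndoManager:
    """Coordinates the 3R cycle using only the three interfaces above."""

    def __init__(self, app: Application, storage: TimeTravelStorage,
                 log: HistoryLog) -> None:
        self.app, self.storage, self.log = app, storage, log

    def record(self, verb: Any) -> None:
        """Normal operation: log the verb, then let the application run it."""
        self.log.append(self.storage.now(), verb)
        self.app.execute(verb)

    def time_travel(self, point: int, repair: Callable[[list], list]) -> None:
        history = self.log.since(point)   # verbs logged after the rewind point
        self.storage.rewind(point)        # Rewind: discard later state
        self.log.truncate(point)
        for verb in repair(history):      # Repair (history edits), then Replay
            self.record(verb)


# --- Hypothetical in-memory stand-ins, only to make the sketch runnable ---
class SnapshotStorage:
    def __init__(self) -> None:
        self.versions: list = [{}]        # versions[t]: state after t writes

    @property
    def state(self) -> dict:
        return self.versions[-1]

    def now(self) -> int:
        return len(self.versions) - 1

    def write(self, key: str, value: Any) -> None:
        self.versions.append({**self.versions[-1], key: value})

    def rewind(self, point: int) -> None:
        del self.versions[point + 1:]


class ListLog:
    def __init__(self) -> None:
        self.entries: list = []

    def append(self, time: int, verb: Any) -> None:
        self.entries.append((time, verb))

    def since(self, time: int) -> list:
        return [v for t, v in self.entries if t >= time]

    def truncate(self, time: int) -> None:
        self.entries = [(t, v) for t, v in self.entries if t < time]


class KeyValueApp:
    def __init__(self, storage: SnapshotStorage) -> None:
        self.storage = storage

    def execute(self, verb: Any) -> None:
        key, value = verb                 # a verb here is just a (key, value) intent
        self.storage.write(key, value)


storage = SnapshotStorage()
manager = UndoManager(KeyValueApp(storage), storage, ListLog())
for verb in [("a", 1), ("b", 2), ("c", 3)]:
    manager.record(verb)
# Undo the mistaken write of "b" while keeping the later write of "c".
manager.time_travel(1, lambda history: [v for v in history if v != ("b", 2)])
```

Because `UndoManager` never inspects a verb, the same coordinator could drive any application that implements `execute`. A real storage layer would also have to map verbs that perform several writes onto rewind points, which this one-write-per-verb toy sidesteps.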
Slide 8
Abstracting the application service
• To the undo manager, the application is:
  – a collection of state
  – a history of events affecting the state
    » an event is typically a user interaction with the service
  – a model of acceptable external consistency
• These are encoded into application-defined verbs
  – high-level encodings of user interactions (events)
    » records of intent to alter state, not actual state changes
  – reference application state by opaque UIDs
  – provide policies that define external consistency
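A minimal sketch of a verb as a record of intent, assuming a Python rendering: it stores what the user asked for plus opaque UIDs for the state it touches, never the resulting state. The names `Verb`, `new_uid`, `exists_deps`, `content_deps`, and `externalizes` are illustrative assumptions rather than the prototype's API; the dependency fields mirror the ExistsDep/ContentDep/Externalizes annotations in the email example.

```python
import json
import uuid
from dataclasses import asdict, dataclass


def new_uid() -> str:
    """Opaque, time-invariant identifier; the undo manager never interprets it."""
    return uuid.uuid4().hex


@dataclass(frozen=True)
class Verb:
    name: str                           # application-defined verb type
    args: dict                          # the user's intent (inputs), not results
    exists_deps: tuple[str, ...] = ()   # UIDs that must exist
    content_deps: tuple[str, ...] = ()  # UIDs whose contents matter
    externalizes: tuple[str, ...] = ()  # UIDs revealed to the user

    def to_log_record(self) -> str:
        """Serialize for the history log: intent plus UIDs, no state snapshot."""
        return json.dumps(asdict(self), sort_keys=True)


inbox, folder1, msg = new_uid(), new_uid(), new_uid()

# "Move msg from Inbox to Folder1": records what the user asked for,
# not the before/after contents of either folder.
move = Verb("MoveMsg", {"msg": msg, "src": inbox, "dst": folder1},
            exists_deps=(inbox, folder1))
```

Since the record holds intent rather than folder contents, replaying it after a repair acts on whatever the repaired folders contain.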
Slide 9
Verbs and the 3R cycle
• Normal operation
  – undo manager logs application-provided verbs to disk
[Figure: architecture diagram (normal operation): user interaction with the application service produces verbs, which the undo manager writes to the history log.]
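The logging step might be sketched as an append-only file of JSON records, one per verb, flushed and `fsync`ed before the verb executes so a crash cannot lose a verb whose effects already reached storage. This is an assumption for illustration: the prototype uses BerkeleyDB as its history log, and the JSON Lines format and the `HistoryLog` class here are hypothetical.

```python
import json
import os


class HistoryLog:
    """Hypothetical append-only verb log stored as JSON Lines."""

    def __init__(self, path: str) -> None:
        self.path = path

    def append(self, seq: int, verb: dict) -> None:
        """Durably record a verb before the application executes it."""
        line = json.dumps({"seq": seq, "verb": verb}, sort_keys=True)
        with open(self.path, "a", encoding="utf-8") as f:
            f.write(line + "\n")
            f.flush()
            os.fsync(f.fileno())      # force the record to disk

    def read_from(self, seq: int) -> list:
        """Return logged verbs with sequence number >= seq, in log order."""
        if not os.path.exists(self.path):
            return []
        with open(self.path, encoding="utf-8") as f:
            records = [json.loads(line) for line in f if line.strip()]
        return [r["verb"] for r in records if r["seq"] >= seq]
```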
Slide 10
Verbs and the 3R cycle
• Rewind
  – time-travel storage layer reverts system hard state to the rewind point
  – all changes since the rewind point are discarded
[Figure: architecture diagram (rewind step).]
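A toy time-travel storage layer makes the rewind semantics concrete: every write produces a new version, and rewinding to a point physically discards every later version. The prototype uses Network Appliance filer snapshots for this. The in-memory `VersionedStore` below is a hypothetical stand-in that copies the whole dictionary on each write, which is simple but far from space-efficient.

```python
class VersionedStore:
    """Hypothetical no-overwrite store: each write creates a new version."""

    def __init__(self) -> None:
        self._versions: list = [{}]      # version 0 is the empty state

    @property
    def current(self) -> dict:
        return dict(self._versions[-1])

    def now(self) -> int:
        """Identifier of the current version (usable as a rewind point)."""
        return len(self._versions) - 1

    def write(self, key: str, value: str) -> None:
        snapshot = dict(self._versions[-1])
        snapshot[key] = value
        self._versions.append(snapshot)

    def delete(self, key: str) -> None:
        snapshot = dict(self._versions[-1])
        snapshot.pop(key, None)
        self._versions.append(snapshot)

    def rewind(self, point: int) -> None:
        """Revert to version `point`, discarding every change made after it."""
        if not 0 <= point <= self.now():
            raise ValueError(f"no such version: {point}")
        del self._versions[point + 1:]
```

Note that the rewind is purely physical: the store knows nothing about which application operations produced each version.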
Slide 11
Verbs and the 3R cycle
• Repair
  – operator edits the logged history and/or makes arbitrary changes to the system
[Figure: architecture diagram (repair step): through the control UI, the operator applies edits to the history log and repairs to the application service.]
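History editing during repair could look like the following: the operator drops a verb that should never have happened (here, a mistaken folder deletion) and inserts the intended one before replay. The `Delete` and `Insert` edit types, `apply_edits`, and the use of plain strings as verbs are hypothetical. Arbitrary system repairs such as patches or configuration fixes happen outside this code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Delete:
    index: int          # position in the original history to drop


@dataclass(frozen=True)
class Insert:
    index: int          # position in the original history before which to add
    verb: str


def apply_edits(history: list, edits: list) -> list:
    """Apply operator edits to a copy of the logged history.

    Indices refer to the original history, so edits are applied from the
    highest index down; at equal indices a Delete runs before an Insert,
    which turns the pair into a replacement.
    """
    edited = list(history)
    for edit in sorted(edits, key=lambda e: (e.index, isinstance(e, Delete)),
                       reverse=True):
        if isinstance(edit, Delete):
            del edited[edit.index]
        elif isinstance(edit, Insert):
            edited.insert(edit.index, edit.verb)
        else:
            raise TypeError(f"unknown edit: {edit!r}")
    return edited
```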
Slide 12
Verbs and the 3R cycle
• Replay
  – undo manager feeds verbs back to the application for re-execution in the context of the repaired system
[Figure: architecture diagram (replay step): verbs flow from the history log through the undo manager back to the application service.]
Slide 13
The fundamental roles of verbs
• Providing application-independence
  – verbs encapsulate application semantics, but remain semi-opaque to the undo manager
• Integration of repair into history
  – high-level specification of intent makes verbs relatively independent of system changes
  – verbs are re-executed, not restored, so they inherit the effects of repairs
• Scoping restored history
  – only changes logged as verbs will be preserved by the 3R’s
    » effects of bugs, corruption, and human error are discarded
  – can reason about what is preserved/lost in the 3R cycle
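The sketch below illustrates why re-executing verbs, rather than restoring recorded state, lets replay inherit a repair, and why changes never logged as verbs disappear. It assumes a hypothetical mail service whose delivery path runs each message through a filter. The buggy filter reverses message text, mirroring the corrupted "olleH" in the email example, but the service, filters, and verb tuple format are invented for illustration.

```python
from typing import Callable

Verb = tuple[str, str, str]          # (verb name, message UID, user input)


class MailService:
    """Hypothetical mail store whose delivery path applies a filter."""

    def __init__(self, deliver_filter: Callable[[str], str]) -> None:
        self.deliver_filter = deliver_filter
        self.messages: dict = {}

    def execute(self, verb: Verb) -> None:
        name, uid, body = verb
        if name != "DeliverMsg":
            raise ValueError(f"unknown verb {name}")
        self.messages[uid] = self.deliver_filter(body)


def buggy_filter(body: str) -> str:
    return body[::-1]                # bug: corrupts every delivered message


def fixed_filter(body: str) -> str:
    return body


history: list = [("DeliverMsg", "m1", "Hello"), ("DeliverMsg", "m2", "Bye")]

# Original timeline: verbs run through the buggy code, and some state is
# also corrupted directly, outside any verb.
original = MailService(buggy_filter)
for verb in history:
    original.execute(verb)
original.messages["m3"] = "garbage"  # not a verb, so never logged

# Rewind (fresh state), Repair (fixed filter), Replay (re-execute the verbs).
replayed = MailService(fixed_filter)
for verb in history:
    replayed.execute(verb)
```

After replay, both messages carry the user's original input and the unlogged corruption is gone, which is exactly the scoping the slide describes.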
Slide 14
Managing external inconsistency
• External inconsistency == time paradox?
  – system is internally consistent after a 3R cycle
  – but external observers see inexplicable state changes
  – external inconsistency is OK unless the affected state was externalized (observed) before the 3R cycle
• Coping with external inconsistency
  – cannot eliminate
  – must manage: ignore, explain, compensate, encompass
• Verbs let us manage external inconsistency
Slide 15
Managing inconsistency with verbs
• To detect inconsistencies:
  – verbs specify the state that they depend upon
  – undo manager tracks signatures of that state
  – if a verb is altered or if signatures don’t match, there is an inconsistency
    » applications supporting relaxed consistency can replace the signature check with arbitrary consistency predicates
• To detect state viewed externally:
  – verbs indicate what state they externalize
    » example: IMAP fetch verb externalizes an email message
• To handle externalized inconsistencies:
  – verb supplies compensation functions
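The detection rules above can be sketched as a check run for each verb during replay: compare signatures of the state the verb depends on against those recorded in the original timeline, let an optional application predicate replace the exact-match test, and call the compensation only when the changed state had been externalized. SHA-256 as the signature, the `LoggedVerb` fields, and `check_on_replay` are assumptions; detecting an altered verb is not modeled.

```python
import hashlib
from dataclasses import dataclass, field
from typing import Callable, Optional


def signature(value: str) -> str:
    """Signature of one piece of state (assumed: SHA-256 of its contents)."""
    return hashlib.sha256(value.encode("utf-8")).hexdigest()


@dataclass
class LoggedVerb:
    name: str
    content_deps: list            # UIDs whose state the verb depends on
    externalizes: list            # UIDs the verb revealed to an outside observer
    recorded: dict = field(default_factory=dict)        # UID -> signature
    predicate: Optional[Callable[[dict], bool]] = None  # relaxed-consistency override
    compensate: Callable[[list], None] = lambda uids: None

    def record(self, state: dict) -> None:
        """Original timeline: remember signatures of the dependent state."""
        self.recorded = {u: signature(state[u]) for u in self.content_deps}


def check_on_replay(verb: LoggedVerb, state: dict) -> str:
    """Classify one replayed verb; compensate if an observer was affected."""
    if verb.predicate is not None:
        changed = [] if verb.predicate(state) else list(verb.content_deps)
    else:
        changed = [u for u in verb.content_deps
                   if signature(state[u]) != verb.recorded.get(u)]
    if not changed:
        return "consistent"
    observed = [u for u in changed if u in verb.externalizes]
    if not observed:
        return "inconsistent, not externalized"   # nobody saw it: ignore
    verb.compensate(observed)
    return "inconsistent, compensated"


notices: list = []
view = LoggedVerb("ViewReport", content_deps=["r"], externalizes=["r"],
                  compensate=notices.extend)
view.record({"r": "draft 1"})                        # what the user saw
outcome = check_on_replay(view, {"r": "draft 2"})    # state after repair + replay
```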
Slide 16
Email example: original timeline
[Figure: timeline of the system boundary, system state (Inbox, Folder1), verbs, and history log. Recoverable content, in time order:]
• DeliverMsg m, input “Hello”: Externalizes: none; ContentDep: none; ExistsDep: Inbox. Message m is stored as “olleH”.
• MoveMsg: Externalizes: none; ContentDep: none; ExistsDep: Inbox, Folder1. Folder1 now holds “olleH”.
• FetchMsg m: Externalizes: m; ContentDep: m; ExistsDep: m, Folder1. Logged with Signature(m) = “olleH”.
Slide 17
Email example: replay timeline
[Figure: the same timeline replayed after repair. Recoverable content, in time order:]
• DeliverMsg m, input “Hello”: Externalizes: none; ContentDep: none; ExistsDep: Inbox. Re-executed after the repair; m is now stored as “Hello”.
• MoveMsg: Externalizes: none; ContentDep: none; ExistsDep: Inbox, Folder1. Folder1 now holds “Hello”.
• FetchMsg m: Externalizes: m; ContentDep: m; ExistsDep: m, Folder1. Logged Signature(m) = “olleH”, but m now contains “Hello”: mismatch! => inconsistency
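The email example can be replayed in a few lines. This sketch assumes the time order DeliverMsg, MoveMsg, FetchMsg, models the delivery bug as text reversal (inferred from the stored "olleH"), and uses the message text itself as its signature so that Signature(m) = "olleH" appears literally. The `Verb` fields, `lookup`, and `run` are hypothetical, and the message UID "m" is hard-coded for brevity.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Verb:
    name: str
    exists_dep: tuple[str, ...]
    content_dep: tuple[str, ...] = ()
    arg: str = ""
    signatures: Optional[dict] = None     # filled in during the original run


def lookup(folders: dict, uid: str) -> Optional[str]:
    """Find a message by UID in any folder."""
    for messages in folders.values():
        if uid in messages:
            return messages[uid]
    return None


def run(history: list, buggy_delivery: bool) -> tuple:
    """Execute (or replay) the history; return final state and inconsistencies."""
    folders: dict = {"Inbox": {}, "Folder1": {}}
    problems = []
    for v in history:
        names = set(folders) | {u for msgs in folders.values() for u in msgs}
        missing = [d for d in v.exists_dep if d not in names]
        if missing:
            problems.append(f"{v.name}: missing {missing}")
            continue
        if v.content_dep:
            seen = {u: lookup(folders, u) for u in v.content_dep}
            if v.signatures is None:
                v.signatures = seen                # original timeline
            elif seen != v.signatures:
                problems.append(f"{v.name}: mismatch! => inconsistency")
        if v.name == "DeliverMsg":                 # assumed bug: reverses the text
            folders["Inbox"]["m"] = v.arg[::-1] if buggy_delivery else v.arg
        elif v.name == "MoveMsg":
            folders["Folder1"]["m"] = folders["Inbox"].pop("m")
        # FetchMsg only reads (externalizes) m, so it changes no state.
    return folders, problems


history = [
    Verb("DeliverMsg", exists_dep=("Inbox",), arg="Hello"),
    Verb("MoveMsg", exists_dep=("Inbox", "Folder1")),
    Verb("FetchMsg", exists_dep=("m", "Folder1"), content_dep=("m",)),
]
original_state, original_problems = run(history, buggy_delivery=True)
replay_state, replay_problems = run(history, buggy_delivery=False)
```

MoveMsg replays cleanly because it depends only on the folders existing; FetchMsg is flagged because the content it externalized has changed.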
Slide 18
Recap: 3R architecture
• Goal: application-neutral implementation of the 3R’s
  – verb abstraction couples a generic undo manager to the application
  – verbs provide tools to reason about 3R behavior
• Challenges
  – integrating history and repair during replay
    » re-executing verbs restores the intent of history
  – managing inconsistency in externally-visible state
    » verbs track externalization and state dependencies, and define compensations
Slide 19
Status
• Prototype implementation of 3R primitives nearly complete
  – app-independent undo manager written in Java
  – all APIs defined as Java interfaces
  – Network Appliance filer as time-travel storage layer
  – BerkeleyDB as history log
• First target app: web-based email service
  – 3R-enhanced JavaMail API provider classes
    » plus additional hooks to verb-ify operator maintenance tasks like account creation
  – JWebMail web front-end
  – RDBMS-based backend mail store (DB2 or MySQL)
  – implementation in progress
Slide 20
Open issues & future work
• Resource impact of the 3R’s
  – what are the performance/space penalties for the 3R’s?
• Verb definition
  – can we specify verbs & consistency policy declaratively?
• Providing the 3R’s at multiple granularities
  – can we track & manage cross-granularity dependencies?
• Measuring the dependability benefit of the 3R’s
  – how do we build recovery/dependability benchmarks?
• Other uses for verb-based characterizations
  – easy georeplication? online self-checking? automatic verification of upgrades?
Slide 21
Conclusions
• We can build time travel for computers
  – using the 3R’s: Rewind, Repair, Replay
• An architecture for the 3R primitives
  – generic undo manager coupled to the application by verbs
• Verbs are a useful abstraction for the 3R’s
  – can be used to reason about the effects of the 3R’s on state
  – help address the problem of external inconsistencies
• Prototype 3R-enabled email system under construction
  – hope to demonstrate increased dependability and faster recovery from problems
Rewind, Repair, Replay: Three R’s to improve dependability
For more information:
http://roc.cs.berkeley.edu/
Slide 23
Backup slides
Slide 24
Verbs vs. transactions
• Both encapsulate state-altering events
• But, unlike transactions:
  – verbs are higher-level, recording end-user intent, not specific state changes
  – verbs do not depend on internal data models (but do depend on external protocols)
    » transactions are the reverse
  – verbs do not necessarily conform to ACID consistency
    » verbs inherit the consistency model provided by the application at the external-protocol level
Slide 25
Implementing verbs
• Verbs are defined by a type hierarchy
  – base type defines interfaces for state dependencies, externalizations, predicates, and compensations
  – applications subclass the base type for their verbs
    » additions to the type are opaque to the undo manager
• Referencing state
  – all user-visible state is named by time-invariant UIDs
  – undo manager requires a signature method for all state
• Consistency predicates and compensations are application-supplied functions
  – they encode the app’s external consistency model
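A Python rendering of such a hierarchy might use an abstract base class that the undo manager programs against, with application subclasses holding fields it never inspects and optionally overriding the default predicate. The names `VerbBase`, `exists_deps`, `content_deps`, `externalizes`, `is_consistent`, and `compensate` are hypothetical; the prototype defines its interfaces in Java. The folder signature format in `ListFolder` is also an assumption.

```python
from abc import ABC, abstractmethod
from typing import Callable, List, Mapping, Optional


class VerbBase(ABC):
    """Base type the undo manager programs against."""

    @abstractmethod
    def exists_deps(self) -> List[str]: ...

    @abstractmethod
    def content_deps(self) -> List[str]: ...

    @abstractmethod
    def externalizes(self) -> List[str]: ...

    def is_consistent(self, recorded: Mapping[str, Optional[str]],
                      current: Mapping[str, Optional[str]]) -> bool:
        """Default predicate: exact signature match on content dependencies."""
        return all(recorded.get(u) == current.get(u) for u in self.content_deps())

    @abstractmethod
    def compensate(self, notify: Callable[[str], None]) -> None: ...


class FetchMsg(VerbBase):
    """Application subclass; its fields are opaque to the undo manager."""

    def __init__(self, folder: str, msg: str) -> None:
        self.folder, self.msg = folder, msg

    def exists_deps(self) -> List[str]:
        return [self.msg, self.folder]

    def content_deps(self) -> List[str]:
        return [self.msg]

    def externalizes(self) -> List[str]:
        return [self.msg]

    def compensate(self, notify: Callable[[str], None]) -> None:
        notify(f"Message {self.msg} has changed since you last read it.")


class ListFolder(VerbBase):
    """Overrides the predicate: message order within a folder is irrelevant."""

    def __init__(self, folder: str) -> None:
        self.folder = folder

    def exists_deps(self) -> List[str]:
        return [self.folder]

    def content_deps(self) -> List[str]:
        return [self.folder]

    def externalizes(self) -> List[str]:
        return [self.folder]

    def is_consistent(self, recorded, current) -> bool:
        # Assumed signature format: comma-separated message UIDs in folder order.
        def members(sig: Optional[str]) -> List[str]:
            return sorted((sig or "").split(","))
        return members(recorded.get(self.folder)) == members(current.get(self.folder))

    def compensate(self, notify: Callable[[str], None]) -> None:
        notify(f"Folder {self.folder} has changed; please refresh the listing.")
```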
Slide 26
Defining verbs
• Currently, verbs are defined procedurally
  – provide dependency information via lists of state IDs
  – provide functions for special consistency predicates
  – provide functions for compensation
• Better: declarative specification
  – compile a textual specification into verb code using libraries of predicates and compensation functions
  – reduces the complexity of adding the 3R’s to the application
  – increases confidence in the undo system via easier testing
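One way the proposed declarative approach could work: a specification (a dictionary here, standing in for the textual form) lists dependencies by role and picks a predicate and a compensation from shared libraries, and a small compiler turns it into a verb class. The spec keys, the `PREDICATES` and `COMPENSATIONS` entries, and `compile_verb` are all hypothetical; binding roles to real UIDs at verb creation is left out.

```python
from typing import Callable, Dict, List, Optional

# Hypothetical shared libraries the specification compiler draws on.
PREDICATES: Dict[str, Callable[[Optional[str], Optional[str]], bool]] = {
    "exact": lambda old, new: old == new,                        # any change matters
    "exists": lambda old, new: (old is None) == (new is None),   # only existence
}
COMPENSATIONS: Dict[str, Callable[[str, str], Optional[str]]] = {
    "explain": lambda verb, dep: f"{verb}: {dep} changed after you saw it",
    "ignore": lambda verb, dep: None,
}


def compile_verb(spec: dict) -> type:
    """Compile a declarative spec into a verb class (dependencies named by role)."""
    predicate = PREDICATES[spec["consistency"]]
    compensation = COMPENSATIONS[spec["compensation"]]

    def check(self, recorded: dict, current: dict) -> List[str]:
        """Return compensation messages for externalized, inconsistent deps."""
        notes = []
        for dep in spec["depends_on"]:
            if predicate(recorded.get(dep), current.get(dep)):
                continue
            if dep in spec["externalizes"]:
                note = compensation(spec["name"], dep)
                if note is not None:
                    notes.append(note)
        return notes

    return type(spec["name"], (), {"spec": spec, "check": check})


AppendMessage = compile_verb({
    "name": "AppendMessage", "depends_on": ["folder"], "externalizes": [],
    "consistency": "exists", "compensation": "ignore",
})
FetchMsg = compile_verb({
    "name": "FetchMsg", "depends_on": ["msg"], "externalizes": ["msg"],
    "consistency": "exact", "compensation": "explain",
})
```

Each generated class contains only library code selected by name, so testing the predicate and compensation libraries once covers every verb built from them.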
Slide 27
External consistency policies
• Verbs capture external consistency policies
• Example: email
  – message order in folder is irrelevant
    » AppendMessage verb does not express dependency on content of target folder, only its existence
  – content of messages is relevant, except for headers
    » ReadMsg verb depends on hash of target message body; if changed, compensate by inserting explanatory text
• Example: e-commerce
  – order total depends on item prices, not descriptions
    » Checkout verb depends on prices of items in cart, not their hash values; if sum of prices changed, compensate by emailing customer for approval
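The email and e-commerce policies translate into small application-supplied predicates and compensations. This sketch assumes messages are dictionaries with `headers` and `body`, and carts are lists of (description, price) pairs using `Decimal` prices; SHA-256 stands in for the unspecified hash, and all function names are hypothetical.

```python
import hashlib
from decimal import Decimal
from typing import Callable, List, Tuple


def append_message_consistent(folders: dict, target: str) -> bool:
    """AppendMessage: depends only on the target folder existing, not its contents."""
    return target in folders


def body_hash(message: dict) -> str:
    """ReadMsg signature: hash the body only, so header changes are ignored."""
    return hashlib.sha256(message["body"].encode("utf-8")).hexdigest()


def read_msg_replay(seen: dict, now: dict) -> Tuple[bool, dict]:
    """Return (consistent?, message to show); compensate with explanatory text."""
    if body_hash(seen) == body_hash(now):
        return True, now
    note = "[This message changed after you first read it.]\n"
    return False, {**now, "body": note + now["body"]}


Cart = List[Tuple[str, Decimal]]          # (description, price) pairs


def cart_total(cart: Cart) -> Decimal:
    return sum((price for _, price in cart), Decimal("0"))


def checkout_replay(seen: Cart, now: Cart, email: Callable[[str], None]) -> bool:
    """Checkout: only the sum of prices matters; descriptions may change freely."""
    if cart_total(seen) == cart_total(now):
        return True
    email(f"Your order total is now {cart_total(now)}; please approve it.")
    return False
```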
Slide 28
External consistency policies (2)
• Example: auctions
  – new bid must be larger than prior bids
    » PlaceBid verb depends on content of all bids in bid set; if one is now larger than new bid, compensate by canceling new bid and informing bidder
Slide 29
Application implications
• To support the 3R’s, an application must have:
  – a high-level, verb-structured interface/API for user, operator, and external actions
  – a state model where all user-visible state:
    » is nameable via the API
    » is tagged with GUIDs
    » supports a signature/hash method
  – a relaxed external consistency model that allows compensation for externalized inconsistent verbs
Slide 30
Example: a 3R email store
• State
  – mailstores, folders, messages, user properties, aliases
• Verbs
  – transport: create/delete/alter mapping; deliver msg
  – directory: create/alter/delete user-entry; create/alter/delete filter-rule; add/remove maildrop
  – store: create/delete store; create/rename/delete folder; expunge folder; list folder; set folder flags; copy msg; append msg; fetch msg; set msg flags
[Figure: email store components Transport, Store, Directory/Auth., and Web UI, with interfaces labeled SMTP, HTTP, LDAP, IMAP, and internal; verbs flow to the undo manager (UndoMgr).]