Improving Transaction-Time DBMS Performance and Functionality

David Lomet, Microsoft Research
Feifei Li, Florida State University

Mar 29, 2015
Immortal DB: A Transaction-Time DB

• What is a transaction-time DB?
  – Retains versions of records
    • Current and prior database states
  – Supports temporal access to these versions, using transaction time
• Immortal DB goals
  – Performance close to an unversioned DB
  – Full indexed access to history
  – Explore other functionality based on versions
    • History as backup
    • Bad user transaction removal
    • Auditing
Prior Publications

• SIGMOD’04: demo and demo paper
• ICDE’04: initial running system described
• SIGMOD’06: removing effects of bad user transactions
• ICDE’08: indexing with version compression
• ICDE’09: performance and functionality
Talk Outline

• Immortal DB: a transaction-time database
• Update performance: timestamping
  – Timestamping is the main update overhead
  – Prior approaches
  – Our new approach
  – Update performance results
• Support for auditing
  – What we provide
  – Exploiting the timestamping implementation
• Range read performance: new page-splitting strategy
  – Storage utilization determines range read performance
  – Prior split strategy guaranteeing “as of” version utilization
  – Our new approach
  – Storage utilization results
Timestamping & Update Performance

• Timestamp not known until commit
  – Fixing it too early leads to aborts
• Requires a 2nd “touch” to add the TS to each record
  – 1st touch: the update itself, when the TS is not yet known
  – 2nd touch: adding the TS once it is known
• The TID:TS mapping must be stable until all timestamping completes and is stable
• Timestamping is the biggest single extra cost for updates
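The two-touch problem above can be sketched in a few lines. This is a hypothetical illustration (all names are mine, not Immortal DB code): versions are first stamped with the transaction ID (TID), and only after commit, once the timestamp (TS) is fixed and the TID:TS mapping is stable, are they revisited to swap TID for TS.

```python
# Why versioned updates need a second "touch": the commit timestamp is
# unknown at update time, so each new version is stamped with its TID;
# after commit, every such version must be revisited to replace TID -> TS.

records = {}    # key -> (stamp_kind, stamp, value)
tid_to_ts = {}  # TID:TS mapping; must stay stable until stamping completes

def update(tid, key, value):
    # 1st touch: TS not yet known, stamp the version with the TID
    records[key] = ("TID", tid, value)

def commit(tid, ts):
    tid_to_ts[tid] = ts  # mapping persisted before lazy stamping can finish

def timestamp_lazily():
    # 2nd touch: replace TIDs of committed transactions with their TS
    for key, (kind, stamp, value) in records.items():
        if kind == "TID" and stamp in tid_to_ts:
            records[key] = ("TS", tid_to_ts[stamp], value)

update(7, "a", "v1")
commit(7, 1001)
timestamp_lazily()   # record "a" now carries TS 1001 instead of TID 7
```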
Prior Timestamping Techniques

• Eager timestamping
  – Done as a 2nd update during the transaction
  – Delays commit; roughly doubles update cost
• Lazy timestamping – several variations
  – Replace the transaction ID (TID) with the timestamp (TS) lazily after commit; but this requires …
  – Persisting the (TID:TS) mapping
• The trick is handling this efficiently
  – Most prior efforts updated a Persistent Transaction Timestamp Table (PTT) at commit with the TID:TS mapping
  – We improve on this part of the process
Lazier Timestamping
[Diagram: flow of the TID:TS mapping between the log, the volatile timestamp table, and the PTT]

• TS assigned at commit; TID:TS posted to the log in the commit record
• Main memory holds a volatile timestamp table (VTT): TID:TS entries with a reference count
• Timestamping activity is based mostly on the VTT
• At checkpoint, TID:TS entries are batch-written from the VTT to the PTT
  – Only TID:TS entries with unfinished timestamping are written
• VTT entries are removed when timestamping is complete (ref count = 0) and stable
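A minimal sketch of this lazier scheme (VTT/PTT names follow the slides; the code is illustrative, not the Immortal DB implementation): commit places TID:TS in the volatile table with a count of not-yet-stamped versions; stamping decrements the count; at checkpoint, only entries whose stamping is unfinished are batch-written to the persistent table, and finished entries are garbage-collected.

```python
# Lazier timestamping: batch the VTT -> PTT write at checkpoint instead of
# updating the PTT at every commit, and skip entries already fully stamped.

vtt = {}  # TID -> [TS, ref_count of versions still awaiting their TS]
ptt = {}  # persistent TID:TS table (a dict stands in for the disk table)

def commit(tid, ts, versions_written):
    # TID:TS also goes into the commit log record, making it recoverable
    vtt[tid] = [ts, versions_written]

def stamp_one(tid):
    vtt[tid][1] -= 1  # one more of this transaction's versions got its TS

def checkpoint():
    # Batch insert: only TIDs with unfinished timestamping reach the PTT;
    # fully stamped entries are dropped from both tables (ref count = 0).
    for tid in list(vtt):
        ts, refs = vtt[tid]
        if refs > 0:
            ptt[tid] = ts
        else:
            vtt.pop(tid)
            ptt.pop(tid, None)

commit(1, 100, 2)
commit(2, 101, 1)
stamp_one(2)   # transaction 2 is now fully timestamped
checkpoint()   # only TID 1 is written to the PTT
```

The design point this captures: PTT traffic becomes one batched write per checkpoint rather than one write per commit, and short-lived entries may never touch the PTT at all.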
Execution Time

[Chart: execution time of a simple “ONE UPDATE” transaction – unversioned DB vs. the prior (unbatched) TS method vs. 100%, 50%, and 20% PTT batch inserts]

• IMPORTANT: measured on a simple “ONE UPDATE” transaction
• Expected result: overhead below the 20% batch-insert case
Talk Outline

• Immortal DB: a transaction-time database
• Update performance: timestamping
  – Timestamping is the main update overhead
  – Prior approaches
  – Our new approach
  – Update performance results
• Support for auditing
  – What we provide
  – Exploiting the timestamping implementation
• Range read performance: new page-splitting strategy
  – Storage utilization determines range read performance
  – Prior split strategy guaranteeing “as of” version utilization
  – Our new approach
  – Storage utilization results
Adding Audit Support

• Basic infrastructure only
  – A full audit system is too large an undertaking here
  – For every update: who did it and when
• Technique
  – Extend the PTT schema to include a user ID (UID): each row is TID:TS:UID
  – Always persist this information; no garbage collection
  – The timestamping technique permits batch updates to the PTT
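A hedged sketch of the audit extension (schema follows the slide; the in-memory list and function names are mine): PTT rows carry (TID, TS, UID), and audit mode simply skips deletion, so "who changed what, and when" is always answerable.

```python
# Audit-mode PTT: the same checkpoint batch insert the timestamping scheme
# already performs, but entries are never garbage-collected.

ptt = []  # append-only (TID, TS, UID) rows

def checkpoint_batch(entries, audit_mode=True):
    ptt.extend(entries)          # one batched insert per checkpoint
    if not audit_mode:
        ...  # non-audit mode would purge fully timestamped entries here

def who_when(tid):
    # Audit query: which user committed this transaction, and at what time?
    for t, ts, uid in ptt:
        if t == tid:
            return uid, ts

checkpoint_batch([(1, 100, "alice"), (2, 101, "bob")])
```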
What Does It Cost?

[Chart: execution time of a simple “ONE UPDATE” transaction – unversioned DB vs. the prior (unbatched) TS method vs. 100%, 50%, and 20% PTT batch inserts, with audit mode added]

• Audit mode: always keep everything in the PTT; never delete
• Cost roughly equals the 50% batch-insert case, since those entries are also batch-deleted
• IMPORTANT: measured on a simple “ONE UPDATE” transaction
Talk Outline

• Immortal DB: a transaction-time database
• Update performance: timestamping
  – Timestamping is the main update overhead
  – Prior approaches
  – Our new approach
  – Update performance results
• Support for auditing
  – What we provide
  – Exploiting the timestamping implementation
• Range read performance: new page-splitting strategy
  – Storage utilization determines range read performance
  – Prior split strategy guaranteeing “as of” version utilization
  – Our new approach
  – Storage utilization results
Utilization => Range Read Performance

• The biggest factor is records per page
• Current data is the most frequently read
• We need a technique that improves storage utilization
  – Certainly for current data
  – With no compromise for historical data
• Prior page-splitting technology evolved from the WOB-tree
  – Which was constrained by write-once media
• We can do better with write-many media
Prior Approaches to Guaranteed Utilization

• Choose a target fill factor for the current database
  – Can’t be 100% as in an unversioned DB
  – Higher fill factor => more redundant versions for “partially persistent indexes”
    • Like the TSB-tree, BV-tree, and WOB-tree
    • Because splitting by time creates redundant versions when they cross a time-split boundary
• “Naked” key splits compromise version utilization
  – A key split splits history as well as current data
  – Excessive key splits without time splits drive down storage utilization for any specific version
• What to do? Always time split with a key split
  – Removes historical data from new current pages, permitting them to fill fully to the fill factor
  – Protects historical versions from further splitting
  – Originally in the WOB-tree – a necessity there with write-once storage media
Why time split with key split?

[Figure: the same page over time – historical data plus added versions fill the page; a naked key split splits history along with current data, while a time split performed with the key split separates out a historical page from the current page]

• A time split with each key split guarantees the historical page will have good utilization for its versions
Intuition for the new splitting technique

• Always time split when the page first becomes full
• Key split afterwards, when the page becomes full again

[Figure: the page fills, is time split into a historical page and a current page, fills again, and only then is key split]

• Historical page utilization is preserved
• Current page utilization is improved
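The deferred policy above can be sketched as follows. This is a toy illustration under my own representation (a page is a list of (key, timestamp, is_current) tuples; function and variable names are assumptions, not the paper's code): the first fill triggers a time split, the next fill triggers a key split.

```python
# Deferred splitting: time split on the first fill (historical versions move
# to a historical page, current ones stay); key split on the second fill.

def on_page_full(page, already_time_split):
    if not already_time_split:
        # Time split: separate historical versions from current ones.
        historical = [v for v in page if not v[2]]
        current = [v for v in page if v[2]]
        return ("time_split", historical, current)
    # Key split: divide the (again full) current page around the median key.
    page.sort(key=lambda v: v[0])
    mid = len(page) // 2
    return ("key_split", page[:mid], page[mid:])

# A full page: one superseded (historical) version of "a" plus three current.
page = [("a", 1, False), ("a", 2, True), ("b", 1, True), ("c", 1, True)]
kind, left, right = on_page_full(page, already_time_split=False)
# First fill -> time split: 1 historical version out, 3 current versions stay,
# so the current page has extra room to fill before its eventual key split.
```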
Analytical Result

• We can show the following:

  SVCU_defer^max = SVCU_no-defer^max + [in / (in + up·cr)] · (1 − SVCU_no-defer^max)

  SVCU_defer^avg = SVCU_no-defer^avg + [in / (in + up·cr)] · [ln 2 + (1 − ln 2)·ln 2 − SVCU_no-defer^avg]

  where in is the insertion ratio, up is the update ratio, and cr is the compression ratio.

* Formulas derived based on one extra fill of current pages before the key split: the added current records come from that extra page fill.
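As a numeric sanity check, the two single-version current utilization (SVCU) formulas on this slide can be coded directly. The function names are mine, and the formulas are restated in the comments so the check is self-contained; ln 2 is the classic average B-tree fill factor.

```python
import math

def svcu_max_defer(svcu_max, in_ratio, up_ratio, cr):
    # SVCU_defer^max = SVCU^max + [in/(in + up*cr)] * (1 - SVCU^max):
    # the free space reclaimed by deferral, weighted by the fraction of new
    # record bytes that are insertions (updates are compressed by cr).
    return svcu_max + (in_ratio / (in_ratio + up_ratio * cr)) * (1 - svcu_max)

def svcu_avg_defer(svcu_avg, in_ratio, up_ratio, cr):
    ln2 = math.log(2)
    # Average fill with one extra fill phase is ln2 + (1 - ln2)*ln2: the page
    # averages ln2 full, and its free space (1 - ln2) again averages ln2 full.
    target = ln2 + (1 - ln2) * ln2
    return svcu_avg + (in_ratio / (in_ratio + up_ratio * cr)) * (target - svcu_avg)

# Pure insertions (up = 0): deferral lets the page fill completely at split.
print(svcu_max_defer(0.5, 1.0, 0.0, 1.0))   # -> 1.0
```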
Analysis: Current Storage Utilization vs. Update Ratio

[Chart: current utilization (y-axis, 0 to 0.8) vs. update ratio (x-axis, 0 to 1) for a B-tree baseline and for CR = 0.10 and CR = 1.0, each with and without deferred splitting]

• Expect an update ratio of 65% – 85% in practice
Summary

• Optimizing timestamping yields update performance close to an unversioned DB
• Optimizing page splitting yields current-time range search performance close to an unversioned DB
• Audit functionality is easy to add via the timestamping infrastructure

Questions?