Positional Update Handling in Column Stores
Sándor Héman, Marcin Zukowski, Niels Nes, Lefteris Sidirourgos, Peter Boncz
Feb 22, 2016
UPDATE IN PLACE: A Poison Apple? (Jim Gray, 1981)
“..for performance reasons, most disc-based systems have been seduced into updating the data in place.”
30 years of hardware improvements have favored sequential access/throughput over random access/latency, so in-place updating becomes less feasible every year.
Alternative: a differential approach.
In column stores, in-place updating is by now clearly infeasible.
Problem: Column Store Updates
• I/O proportional to number of attributes
  – I/O blocks large and compressed
  – Sometimes even replicated
  – Read-optimized → update-unfriendly
• Table often kept ordered on sort-key (SK) attributes
  – Uniform update load → scattered write access
Solution: Differential Structure
• Maintain updates (INS/DEL/MOD) in a differential structure
  – Merge with base table during scan
• Challenges:
  – Efficiently maintainable data structure
  – Minimize Merge impact for read-only queries
Naïve Approach: Delta Tables
• For each table, maintain two update-friendly row-store tables:
  – INS(C1..Cn)
  – DEL(SK1..SKm)
  – MOD = DEL + INS

Base table: inventory, Sort-Key (SK): [store, prod]
store   prod   new  qty
London  stool  N    10
London  table  N    20
Paris   rug    N    1
Paris   stool  N    5

Inserts table: INS
store   prod   new  qty
Berlin  chair  Y    5
Berlin  cloth  Y    20

Deletes table: DEL
store   prod
Paris   rug
• Rewrite table scans to obtain an up-to-date image:
  MergeUnion[store,prod](Scan(INS), MergeDiff[store,prod](Scan(Inventory), Scan(DEL)))
• Expensive!
  – I/O to scan the SK ‘merge’ columns, even if the query does not need the SK columns
  – Each query pays CPU effort to locate the same change positions over and over again

Actual table: inventory, Sort-Key (SK): [store, prod]
store   prod   new  qty
Berlin  chair  Y    5
Berlin  cloth  Y    20
London  stool  N    10
London  table  N    20
Paris   stool  N    5
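
The rewritten scan can be sketched in Python as below. This is only an illustration under stated assumptions: the tables are in-memory lists of tuples already sorted on the sort key, and `merge_diff` and `merge_union` are hypothetical helper names standing in for MergeDiff and MergeUnion, not the authors' operators. Note that both helpers have to compare (store, prod) values for every row, which is the cost the slide calls expensive.

```python
import heapq

def sort_key(row):
    """Sort key (SK) of a row: the (store, prod) prefix."""
    return row[:2]

def merge_diff(base, deletes):
    """Yield rows of `base` whose sort key is not listed in `deletes`.

    Both inputs must be sorted on the sort key.
    """
    deletes = iter(deletes)
    pending = next(deletes, None)
    for row in base:
        key = sort_key(row)
        # Skip delete keys that sort before the current row.
        while pending is not None and pending < key:
            pending = next(deletes, None)
        if pending == key:
            continue  # row is deleted
        yield row

def merge_union(left, right):
    """Merge two inputs that are both sorted on the sort key."""
    return heapq.merge(left, right, key=sort_key)

inventory = [
    ("London", "stool", "N", 10),
    ("London", "table", "N", 20),
    ("Paris", "rug", "N", 1),
    ("Paris", "stool", "N", 5),
]
ins = [("Berlin", "chair", "Y", 5), ("Berlin", "cloth", "Y", 20)]
dele = [("Paris", "rug")]

actual = list(merge_union(ins, merge_diff(inventory, dele)))
```

With the example tables above, `actual` holds the five rows of the actual inventory table.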
The Idea: Positional Updates
• Remember the position of an update rather than its SK values
  – Merge once, at write time → read-optimized approach
  – No need to scan SK columns
  – Scan can skip → less CPU overhead
Notation:
• TABLEx: state of TABLE at time x
• SID(t): StableID
  – Position of tuple t in immutable base TABLE0 → stable
• RIDx(t): RowID
  – Position of visible tuple t at time x → VOLATILE!
  – SID(t) = RID0(t)
SID/RID Example

TABLE0
SID  STORE   PROD   NEW  QTY  RID
0    London  chair  N    30   0
1    London  stool  N    10   1
2    London  table  N    20   2
3    Paris   rug    N    1    3
4    Paris   stool  N    5    4

INSERT INTO inventory VALUES('Berlin', 'table', Y, 10)
INSERT INTO inventory VALUES('Berlin', 'cloth', Y, 20)
INSERT INTO inventory VALUES('Berlin', 'chair', Y, 5)

TABLE1
SID  STORE   PROD   NEW  QTY  RID
0    Berlin  chair  Y    5    0
0    Berlin  cloth  Y    20   1
0    Berlin  table  Y    10   2
0    London  chair  N    30   3
1    London  stool  N    10   4
2    London  table  N    20   5
3    Paris   rug    N    1    6
4    Paris   stool  N    5    7
SIDs and RIDs
• RID(t) = SID(t) + ∆(t)
• ∆(t) = #inserts before t - #deletes before t = RID(t) - SID(t)
• SID and RID are monotonically increasing
  – organize positional updates on SID in a counting B-Tree that keeps track of cumulative deltas (∆)
• Positional Delta Tree (PDT)
  – SIDs are stable
  – Only need to maintain cumulative ∆ on the path root → leaf
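
The relation RID(t) = SID(t) + ∆(t) can be checked with a few lines of Python. This is a sketch, not the PDT itself: it assumes the updates are kept as a flat list of (SID, kind) pairs and counts them with a linear scan, whereas the PDT obtains the same ∆ in logarithmic time from its cumulative counters. The function name `rid_of_stable_tuple` is hypothetical. An insert recorded at SID s is placed before base tuple s, so it counts for s itself; a delete at SID s only shifts the tuples after s.

```python
def rid_of_stable_tuple(sid, updates):
    """RID of the (still visible) base tuple with the given SID.

    `updates` is a list of (sid, kind) pairs, kind being "ins" or "del".
    """
    inserts_before = sum(1 for s, kind in updates if kind == "ins" and s <= sid)
    deletes_before = sum(1 for s, kind in updates if kind == "del" and s < sid)
    delta = inserts_before - deletes_before
    return sid + delta

# The three Berlin inserts of the SID/RID example all land at SID 0.
table1_updates = [(0, "ins"), (0, "ins"), (0, "ins")]
rids_table1 = [rid_of_stable_tuple(sid, table1_updates) for sid in range(5)]

# Illustrative variant: two inserts at SID 0 and a delete of base tuple 3.
variant_updates = [(0, "ins"), (0, "ins"), (3, "del")]
rids_variant = [rid_of_stable_tuple(sid, variant_updates) for sid in (0, 1, 2, 4)]
```

For TABLE1 this gives RIDs 3 to 7 for the five base tuples, matching the RID column of the example.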
PDT Example
INSERT INTO inventory VALUES('Berlin', 'table', Y, 10)
INSERT INTO inventory VALUES('Berlin', 'cloth', Y, 20)
INSERT INTO inventory VALUES('Berlin', 'chair', Y, 5)

[Figure: PDT over TABLE0 after the three inserts]
Root:        separator SID 0; subtree ∆ = 2, 1
Left leaf:   (SID 0, ins, i2), (SID 0, ins, i1)
Right leaf:  (SID 0, ins, i0)

Insert Value Table
     STORE   PROD   NEW  QTY
i0   Berlin  table  Y    10
i1   Berlin  cloth  Y    20
i2   Berlin  chair  Y    5
PDT Example
DELETE FROM inventory WHERE store = 'Berlin' AND prod = 'table'
DELETE FROM inventory WHERE store = 'Paris' AND prod = 'rug'

TABLE1
SID  STORE   PROD   NEW  QTY  RID
0    Berlin  chair  Y    5    0
0    Berlin  cloth  Y    20   1
0    Berlin  table  Y    10   2
0    London  chair  N    30   3
1    London  stool  N    10   4
2    London  table  N    20   5
3    Paris   rug    N    1    6
4    Paris   stool  N    5    7

[Figure: PDT after the two deletes]
Root:        separator SID 0; subtree ∆ = 2, -1
Left leaf:   (SID 0, ins, i2), (SID 0, ins, i1)
Right leaf:  (SID 3, del, d0)

Insert Value Table
     STORE   PROD   NEW  QTY
i0   Berlin  table  Y    10
i1   Berlin  cloth  Y    20
i2   Berlin  chair  Y    5
PDT Example
INSERT INTO inventory VALUES ('Paris', 'rack', Y, 4)

TABLE2
SID  STORE   PROD   NEW  QTY  RID
0    Berlin  chair  Y    5    0
0    Berlin  cloth  Y    20   1
0    London  chair  N    30   2
1    London  stool  N    10   3
2    London  table  N    20   4
4    Paris   stool  N    5    5

Insert at RID = 5

[Figure: PDT before and after the insert]
Search from the root (separator SID 0, left subtree ∆ = 2): RID 5 > 0 + 2, so the insert goes to the right leaf.
Left leaf:           (SID 0, ins, i2), (SID 0, ins, i1)
Right leaf, before:  (SID 3, del, d0)
Right leaf, after:   (SID 3, ins, i0), (SID 3, del, d0)

Insert Value Table (after the insert)
     STORE   PROD   NEW  QTY
i0   Paris   rack   Y    4
i1   Berlin  cloth  Y    20
i2   Berlin  chair  Y    5
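PDT Example
INSERT INTO inventory VALUES ('London', 'rack', Y, 4)
INSERT INTO inventory VALUES ('Berlin', 'rack', Y, 4)

[Figure: PDT after the two inserts. Inside each node: separator SIDs and subtree ∆. Annotated next to each node: separator RIDs and running ∆.]
Root:        separator SID 1; subtree ∆ = 3, 1; separator RID 4; running ∆ 3
Left node:   separator SID 0; subtree ∆ = 2, 1; separator RID 2; running ∆ 2
Right node:  separator SID 3; subtree ∆ = 1, 0; separator RID 7; running ∆ 4

Leaf  SID  type  value  running ∆  RID
1     0    ins   i2     0          0
1     0    ins   i1     1          1
2     0    ins   i4     2          2
3     1    ins   i3     3          4
4     3    ins   i0     4          7
4     3    del   d0     5          8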
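
The running ∆ and RID annotations of this example can be recomputed from the leaf entries alone. The sketch below assumes the leaves are given as plain Python lists in left-to-right order; `annotate` and `WEIGHT` are hypothetical names, and a real PDT would read the running ∆ off the subtree counters on the root-to-leaf path instead of scanning every entry.

```python
# Leaf entries of the example tree, left to right, as (SID, type) pairs.
leaves = [
    [(0, "ins"), (0, "ins")],
    [(0, "ins")],
    [(1, "ins")],
    [(3, "ins"), (3, "del")],
]

# Contribution of each update type to the cumulative delta.
WEIGHT = {"ins": 1, "del": -1}

def annotate(leaves):
    """Return (SID, type, running delta, RID) for every entry, in leaf order."""
    running = 0
    annotated = []
    for leaf in leaves:
        for sid, kind in leaf:
            # The running delta counts only the updates before this entry.
            annotated.append((sid, kind, running, sid + running))
            running += WEIGHT[kind]
    return annotated

annotated = annotate(leaves)

# Per-leaf delta, as stored in the parent nodes ("subtree delta").
leaf_deltas = [sum(WEIGHT[kind] for _, kind in leaf) for leaf in leaves]
total_delta = sum(leaf_deltas)
```

The per-leaf sums come out as 2, 1, 1, 0, which are the subtree ∆ values of the two inner nodes, and the total of 4 is the sum of the root's two subtree ∆ values (3 + 1).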
Stacking PDTs
• Arbitrary number of layers: “deltas on deltas on ..”
  – RID domain of child PDT = SID domain of parent PDT
• Generalization: a PDT contains all differences in time interval [lo,hi]
• PDT[t2,t3] vs PDT[t0,t1] are:
  – consecutive: t2 = t1
  – aligned: t2 = t0 (“same base”)
  – overlapping: [t2,t3] overlaps [t0,t1] (“uncomparable” / “incompatible”)
[Figure: a Table with several PDTs stacked on top of it, each covering a time interval [lo,hi]; separate panels show a consecutive, an aligned and an overlapping pair of PDTs]
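
A layered scan over two consecutive PDTs can be sketched as two chained positional merges. The assumptions here are loud ones: each PDT is flattened to a sorted list of (position, type, value) entries, only inserts and deletes are handled, and `merge_scan` is a hypothetical stand-in for the Merge() operation, not the authors' code. The positions used by the upper layer are RIDs of the image produced by the layer below it, and no sort-key comparison is needed anywhere.

```python
def merge_scan(source, pdt):
    """Apply positional updates to a stream of rows.

    `pdt` holds (position, kind, value) entries sorted on position, where the
    position refers to the row numbering of `source`.
    """
    pending = iter(pdt)
    entry = next(pending, None)
    for pos, row in enumerate(source):
        deleted = False
        while entry is not None and entry[0] == pos:
            if entry[1] == "ins":
                yield entry[2]      # inserted row goes before row `pos`
            else:
                deleted = True      # row `pos` itself is dropped
            entry = next(pending, None)
        if not deleted:
            yield row
    # Inserts positioned past the last source row are appended.
    while entry is not None:
        if entry[1] == "ins":
            yield entry[2]
        entry = next(pending, None)

table0 = [
    ("London", "chair", "N", 30),
    ("London", "stool", "N", 10),
    ("London", "table", "N", 20),
    ("Paris", "rug", "N", 1),
    ("Paris", "stool", "N", 5),
]
# Lower layer: positions are SIDs of TABLE0.
lower_pdt = [
    (0, "ins", ("Berlin", "chair", "Y", 5)),
    (0, "ins", ("Berlin", "cloth", "Y", 20)),
    (0, "ins", ("Berlin", "table", "Y", 10)),
]
# Upper layer: positions are RIDs of the image produced by the lower layer.
upper_pdt = [(2, "del", None), (6, "del", None)]

table1 = list(merge_scan(table0, lower_pdt))
table2 = list(merge_scan(merge_scan(table0, lower_pdt), upper_pdt))
```

With the inventory data of the PDT examples, the lower layer yields TABLE1 and the stacked scan yields TABLE2.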
Stacking for Isolation
• ‘lock’ PDT down for further updates
  – Immutable read-PDT → BIG: main memory resident
• ‘stack’ empty PDT on top
  – Updateable write-PDT → SMALL: L2 cache resident
  – Note: PDTs are consecutive
• once in a while changes are propagated
  – Propagate() operation
  – Requires consecutive PDTs
[Figure: Stable Table with the Read-PDT and Write-PDT stacked on top yields TABLEx; Propagate() moves changes from the Write-PDT into the Read-PDT]
Snapshot Isolation
• Transaction creates snapshot copy of write-PDT
• Updates go into trans-PDT
• On commit, Propagate() trans-PDT into write-PDT
[Figure: Stable Table, Read-PDT and Write-PDT yield TABLEx; the Transaction State holds a copy of the Write-PDT with a Trans-PDT stacked on top; Propagate() applies the Trans-PDT to the Write-PDT]
Optimistic Concurrency Control
• Two concurrent transactions
• A commits before B
• Cannot commit B into the modified write-PDT!
  – A changed the RID enumeration
• Serialize(A, B)
  – Makes aligned PDTs consecutive
  – MAY FAIL!! → transaction abort
  – Succeeds if there is no conflict (conflict = write set intersection)
• Extend to any number of concurrent transactions by serializing against all PDTs of transactions that committed during its lifetime (a.k.a. backward looking OCC)
[Figure: Stable Table, Read-PDT and Write-PDT yield TABLEx; TransA and TransB each hold a copy of the Write-PDT plus their own Trans-PDT. A's Trans-PDT is Propagate()d into the Write-PDT; B's Trans-PDT is Serialize()d against A's, which makes the two consecutive, and is then Propagate()d as well]
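
The commit rule can be sketched as a backward-looking validation step. This is a deliberate simplification: write sets are modelled as plain sets of tuple identifiers, and the position re-mapping that Serialize() performs on the trans-PDT is left out. `Validator`, `begin` and `try_commit` are hypothetical names, not part of the system described on the slides.

```python
from dataclasses import dataclass, field

@dataclass
class Committed:
    commit_time: int
    write_set: frozenset

@dataclass
class Validator:
    """Backward-looking OCC: validate against transactions committed meanwhile."""
    clock: int = 0
    history: list = field(default_factory=list)

    def begin(self):
        """Return the start timestamp of a new transaction."""
        return self.clock

    def try_commit(self, start_time, write_set):
        """Commit unless an overlapping write set was committed since start_time."""
        write_set = frozenset(write_set)
        for other in self.history:
            committed_during_lifetime = other.commit_time > start_time
            if committed_during_lifetime and other.write_set & write_set:
                return False  # conflict: the transaction must abort
        self.clock += 1
        self.history.append(Committed(self.clock, write_set))
        return True

validator = Validator()
start_a = validator.begin()
start_b = validator.begin()

# A commits first; B touched the same tuple and fails validation.
a_ok = validator.try_commit(start_a, {("Paris", "rug")})
b_ok = validator.try_commit(start_b, {("Paris", "rug")})

# A transaction concurrent with A but with a disjoint write set succeeds.
c_ok = validator.try_commit(start_b, {("London", "stool")})
```

In this run A commits, B aborts because its write set intersects A's, and the third transaction commits because its write set is disjoint from everything committed during its lifetime.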
Concluding..
• PDTs speed up differential update merging
  – Reduced I/O volume
  – Reduced CPU merge overhead
• Tree structure
  – logarithmic lookup & maintenance of volatile RIDs
  – main operations: Merge(), Propagate(), Serialize()
• PDTs are stackable, and capture the Write-Set
  – Great structure for Snapshot Isolation
• Formal definitions, algorithms and benchmarks in the paper
Thank you!
Microbenchmarks
TPCH-30