MonetDB: A column-oriented DBMS Ryan Johnson CSC2531
Dec 16, 2015
The memory wall has arrived
• CPU performance: +70%/year
• Memory performance
  – latency: -50%/decade
  – bandwidth: +20%/year (est.)
• Why?
  – DRAM focus on capacity (+70%/year)
  – Physical limitations (pin counts, etc.)
  – Assumption that caches "solve" latency problem
DBMS spends 95% of time waiting for memory
The problem: data layouts
• Logical layout: 2-D relation
  => Unrealizable in linear address space!
• N-ary storage model (NSM), aka "slotted pages"
  – Easy row updates, strided access to columns
  => Low cache locality for read-intensive workloads
“NSM layouts considered harmful”
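The locality gap can be sketched in a few lines. This is an illustrative comparison (not MonetDB code): an N-ary, row-wise layout forces a scan of one attribute to drag every tuple's other fields through the cache, while a column-wise layout reads only the bytes it needs. The relation and field names are hypothetical.

```python
# Hypothetical 3-attribute relation: (id, name, price).
rows = [(i, f"name{i}", i * 1.5) for i in range(1000)]

# NSM-style scan of the price attribute: every tuple is touched, so
# each cache line fetched carries mostly unwanted id/name bytes.
total_nsm = sum(r[2] for r in rows)

# Decomposed (columnar) layout: each attribute in its own contiguous array.
ids    = [r[0] for r in rows]
names  = [r[1] for r in rows]
prices = [r[2] for r in rows]

# Columnar scan touches only the price bytes -> high cache locality.
total_dsm = sum(prices)

assert total_nsm == total_dsm   # same answer, far fewer bytes moved
```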
Coping with The Wall
• Innovation: decompose all data vertically
  – Columns stored separately, rejoined at runtime
• Binary Association Table (BAT) replaces Relation
  – List of (recordID, columnValue) pairs
  – Compression and other tricks => 1 byte/entry
BAT + clever algos => cache locality => Winner!
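A minimal sketch of the decomposition above, with illustrative names and data (not the MonetDB kernel): each column of an n-ary relation becomes its own BAT, a list of (recordID, columnValue) pairs, and the original tuples can be recovered by rejoining the BATs on OID.

```python
# Hypothetical relation: (id, name, age) tuples.
relation = [(10, "Alice", 30), (11, "Bob", 25), (12, "Carol", 41)]

def decompose(rel):
    """Split an n-ary relation into one BAT per column: (oid, value) pairs."""
    ncols = len(rel[0])
    return [[(oid, row[c]) for oid, row in enumerate(rel)] for c in range(ncols)]

bats = decompose(relation)
# bats[2] is the age column: [(0, 30), (1, 25), (2, 41)]

def reconstruct(bats):
    """Rejoin BATs on OID to recover the original tuples."""
    return [tuple(bat[i][1] for bat in bats) for i in range(len(bats[0]))]

assert reconstruct(bats) == relation
```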
Exploring deeper
• Performance study (motivation)
• Physical data layouts
• Cache-optimized algorithms
• Evaluating MonetDB performance
• Implications and lingering questions
NSM: access latency over time
[Graph: read one column; record size varies with x]
Latency increases ~10x as accesses/cache line → 1 (slope changes at L1/L2 line size)
Efficient physical BAT layout
• Idea #1: "virtual OID"
  – Optimizes common case
  – Dense, monotonic OIDs
  – All BATs sorted by OID
• Idea #2: compression
  – Exploits small domains
  – Boosts cache locality, effective mem BW
How to handle gaps?
Out-of-band values?
Can’t we compress NSM also?
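The virtual-OID idea can be illustrated in a few lines (an assumption-laden sketch, not MonetDB storage code): when OIDs are dense and monotonic (0, 1, 2, ...), storing them is redundant, because the array position *is* the OID. The BAT shrinks to a bare value array and OID lookup becomes O(1) indexing.

```python
# Explicit BAT: materialized (oid, value) pairs.
explicit_bat = [(0, "red"), (1, "green"), (2, "blue")]

# Virtual-OID BAT: the position in the array serves as the OID,
# so only the value column is stored -- half the space.
virtual_bat = ["red", "green", "blue"]

def lookup(vbat, oid):
    # OID lookup degenerates to array indexing.
    return vbat[oid]

assert all(lookup(virtual_bat, oid) == val for oid, val in explicit_bat)
```

Deletes would punch holes in the dense OID sequence, which is exactly the "How to handle gaps?" question above.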
Joining two BATs on OID has O(n) cost!
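Why O(n)? Because all BATs are kept sorted by OID, rejoining two columns of the same relation is a single linear merge with purely sequential access; no hashing or sorting is needed. A minimal sketch (illustrative names and data, not the MonetDB join kernel):

```python
def oid_join(bat_a, bat_b):
    """Merge-join two OID-sorted BATs; O(n + m), sequential access only."""
    out, i, j = [], 0, 0
    while i < len(bat_a) and j < len(bat_b):
        oa, va = bat_a[i]
        ob, vb = bat_b[j]
        if oa == ob:
            out.append((oa, va, vb))
            i += 1
            j += 1
        elif oa < ob:
            i += 1
        else:
            j += 1
    return out

names = [(0, "Alice"), (1, "Bob"), (3, "Dan")]
ages  = [(0, 30), (2, 19), (3, 52)]
assert oid_join(names, ages) == [(0, "Alice", 30), (3, "Dan", 52)]
```

With virtual OIDs the aligned case is cheaper still: matching positions can simply be zipped together.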
Cache-friendly hash join
• Hash partitioning: one pass, but trashes L1/L2
  – #clusters > #cache lines
• Radix-partitioning: limit active #partitions by making more passes

Recall: CPU is cheap compared to memory access
Great, but how well does it work?
• Three metrics of interest
  – L1/L2 misses (= suffer latency of memory access)
  – TLB misses (even more expensive than a cache miss)
  – Query throughput (higher is better)
• Should be able to explain throughput using the other metrics
  – Given model makes very good predictions
    => Memory really is (and remains!) the bottleneck
A few graphs
Big win: stability as cardinalities vary
Radix clustering behavior as cardinality varies
Radix-clustered HJ vs. other algorithms
Implications and discussion points
• Cache-friendly really matters (even w/ I/O)
  – Traditional DBMS memory-bound
• Vertically decomposed data: superior density
  – Data brought to cache only if actually needed
  – Compression gives further density boost
• Questions to consider...
  – Queries accessing many columns?
  – What about inserts/updates (touch many BATs)?
  – What about deletes/inserts (bad for compression)?
  – How to make a good query optimizer?
  – Performance of transactional workloads?
    • Update-intensive, concurrency control, ...