MonetDB: A column-oriented DBMS Ryan Johnson CSC2531
Dec 16, 2015
The memory wall has arrived
• CPU performance: +70%/year
• Memory performance
  – latency: -50%/decade
  – bandwidth: +20%/year (est.)
• Why?
  – DRAM focus on capacity (+70%/year)
  – Physical limitations (pin counts, etc.)
  – Assumption that caches "solve" latency problem
DBMS spends 95% of time waiting for memory
The problem: data layouts
• Logical layout: 2-D relation
  => Unrealizable in linear address space!
• N-ary storage model (NSM), aka "slotted pages"
  – Easy row updates, strided access to columns
  => Low cache locality for read-intensive workloads
“NSM layouts considered harmful”
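The locality gap can be sketched in a few lines. This is an illustrative comparison (not MonetDB code): an N-ary, row-wise layout forces a scan of one attribute to drag every tuple's other fields through the cache, while a column-wise layout reads only the bytes it needs. The relation and field names are hypothetical.

```python
# Hypothetical 3-attribute relation: (id, name, price).
rows = [(i, f"name{i}", i * 1.5) for i in range(1000)]

# NSM-style scan of the price attribute: every tuple is touched, so
# each cache line fetched carries mostly unwanted id/name bytes.
total_nsm = sum(r[2] for r in rows)

# Decomposed (columnar) layout: each attribute in its own contiguous array.
ids    = [r[0] for r in rows]
names  = [r[1] for r in rows]
prices = [r[2] for r in rows]

# Columnar scan touches only the price bytes -> high cache locality.
total_dsm = sum(prices)

assert total_nsm == total_dsm   # same answer, far fewer bytes moved
```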
Coping with The Wall
• Innovation: decompose all data vertically
  – Columns stored separately, rejoined at runtime
• Binary Association Table (BAT) replaces Relation
  – List of (recordID, columnValue) pairs
  – Compression and other tricks => 1 byte/entry
BAT + clever algos => cache locality => Winner!
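A minimal sketch of the decomposition above, with illustrative names and data (not the MonetDB kernel): each column of an n-ary relation becomes its own BAT, a list of (recordID, columnValue) pairs, and the original tuples can be recovered by rejoining the BATs on OID.

```python
# Hypothetical relation: (id, name, age) tuples.
relation = [(10, "Alice", 30), (11, "Bob", 25), (12, "Carol", 41)]

def decompose(rel):
    """Split an n-ary relation into one BAT per column: (oid, value) pairs."""
    ncols = len(rel[0])
    return [[(oid, row[c]) for oid, row in enumerate(rel)] for c in range(ncols)]

bats = decompose(relation)
# bats[2] is the age column: [(0, 30), (1, 25), (2, 41)]

def reconstruct(bats):
    """Rejoin BATs on OID to recover the original tuples."""
    return [tuple(bat[i][1] for bat in bats) for i in range(len(bats[0]))]

assert reconstruct(bats) == relation
```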
Exploring deeper
• Performance study (motivation)
• Physical data layouts
• Cache-optimized algorithms
• Evaluating MonetDB performance
• Implications and lingering questions
NSM: access latency over time
[Graph: read one column; record size varies with x]
Latency increases ~10x as accesses/cache line → 1 (slope changes at L1/L2 line size)
Efficient physical BAT layout
• Idea #1: "virtual OID"
  – Optimizes common case
  – Dense, monotonic OIDs
  – All BATs sorted by OID
• Idea #2: compression
  – Exploits small domains
  – Boosts cache locality, effective mem BW
How to handle gaps?
Out-of-band values?
Can’t we compress NSM also?
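The virtual-OID idea can be illustrated in a few lines (an assumption-laden sketch, not MonetDB storage code): when OIDs are dense and monotonic (0, 1, 2, ...), storing them is redundant, because the array position *is* the OID. The BAT shrinks to a bare value array and OID lookup becomes O(1) indexing.

```python
# Explicit BAT: materialized (oid, value) pairs.
explicit_bat = [(0, "red"), (1, "green"), (2, "blue")]

# Virtual-OID BAT: the position in the array serves as the OID,
# so only the value column is stored -- half the space.
virtual_bat = ["red", "green", "blue"]

def lookup(vbat, oid):
    # OID lookup degenerates to array indexing.
    return vbat[oid]

assert all(lookup(virtual_bat, oid) == val for oid, val in explicit_bat)
```

Deletes would punch holes in the dense OID sequence, which is exactly the "How to handle gaps?" question above.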
Joining two BATs on OID has O(n) cost!
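Why O(n)? Because all BATs are kept sorted by OID, rejoining two columns of the same relation is a single linear merge with purely sequential access; no hashing or sorting is needed. A minimal sketch (illustrative names and data, not the MonetDB join kernel):

```python
def oid_join(bat_a, bat_b):
    """Merge-join two OID-sorted BATs; O(n + m), sequential access only."""
    out, i, j = [], 0, 0
    while i < len(bat_a) and j < len(bat_b):
        oa, va = bat_a[i]
        ob, vb = bat_b[j]
        if oa == ob:
            out.append((oa, va, vb))
            i += 1
            j += 1
        elif oa < ob:
            i += 1
        else:
            j += 1
    return out

names = [(0, "Alice"), (1, "Bob"), (3, "Dan")]
ages  = [(0, 30), (2, 19), (3, 52)]
assert oid_join(names, ages) == [(0, "Alice", 30), (3, "Dan", 52)]
```

With virtual OIDs the aligned case is cheaper still: matching positions can simply be zipped together.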
Cache-friendly hash join
• Hash partitioning: one pass, but trashes L1/L2
  – #clusters > #cache lines
• Radix-partitioning: limit active #partitions by making more passes

Recall: CPU is cheap compared to memory access
Great, but how well does it work?
• Three metrics of interest
  – L1/L2 misses (= suffer latency of memory access)
  – TLB misses (even more expensive than a cache miss)
  – Query throughput (higher is better)
• Should be able to explain throughput using the other metrics
  – Given model makes very good predictions
    => Memory really is (and remains!) the bottleneck
A few graphs
Big win: stability as cardinalities vary
Radix clustering behavior as cardinality varies
Radix-clustered HJ vs. other algorithms
Implications and discussion points
• Cache-friendly really matters (even w/ I/O)
  – Traditional DBMS memory-bound
• Vertically decomposed data: superior density
  – Data brought to cache only if actually needed
  – Compression gives further density boost
• Questions to consider...
  – Queries accessing many columns?
  – What about inserts/updates (touch many BATs)?
  – What about deletes/inserts (bad for compression)?
  – How to make a good query optimizer?
  – Performance of transactional workloads?
    • Update-intensive, concurrency control, ...