Click here to load reader
Aug 11, 2014
Level 400: Diving into VoronOren Eini
[email protected] ayende.com/blogHibernating Rhinos
Voron is… Low level key / value store Transactional / ACID MVCC Multi layers
WHY!?
background LevelDB LMDB Esent
Seeks are slow
0.01 ms – Compress 1kb with Zippy 0.25 ms – Read 1 MB from memory 0.50 ms – Ping inside data center 10.0 ms – Disk seek 10.0 ms – Read 1 MB from network 30.0 ms – Read 1 MB from disk
Binary Trees, Eh?
FB
A
DC
E
GH
I
B+ Trees
Implementation 4KB Pages B+ Tree Page translation table MVCC Journal file Scratch file Memory mapped
Modifying the tree Find appropriate #to modify. Get a scratch page, copy #to scratch page. Register scratch #with the old ## in #translation table
(PTT). Modify the #as you wish. On commit, the PTT becomes publicly visible. All changed pages are written to journal file. If rollback, revert to previous PTT, release scratch pages,
done.
#3#0
#4#1
#5#2
Dat
asc
ratc
h
#6#3#0
#7#4#1
#8#5#2
Tx #1#0 >- #3#1 >- #1
Tx #2#0 >- #3#1 >- #5
Background Find pages in scratch that have no one looking at
older versions of them. Copy to data file. Clear the scratch space.
How it works Only I/O during commits is a single write through,
compressed, of data to journal. Moving data to data file is done in async. No need to call fsync().
Full & incremental backups.
Missing the forest Voron isn’t a B+ Tree system. It doesn’t have a tree, it has trees. Plural. <blink>Important</blink>
Falling trees Single root tree Contain many additional trees. Tree is similar to a table. Operations on tree:
Add(key, value) Del(key, value) Find(key) : value Iterate() (Seek,Next, Prev)
How it works?
With indexes
Finding stuff
* Not the most efficient method
So, Voron has trees… Root tree Free Space tree Contains references to named trees
Enough?
Tree of trees MultiAdd, MultiDelete, MultiRead
Why multi trees? Optimization – if has just 1 item (and no value) can
directly use the parent tree store. Store multiple items for a single value.
Iterating multi trees
What voron does? Opens up a lot of interesting scenarios. We have far better control over persistence now. Very low level (bits & bytes). Very fast! Concurrency benefits:
Reads Writes*
* Yet Voron allows only a single writer!
What it does not? It isn’t about Linux. It can’t run on Linux*.
Need to implment: PosixPureMemoryPager PosixPageFileBackedMemoryMappedPager PosixMemoryMapPager
Waiting for big Linux push post 3.0 release.
The cloud story… Scratch / temp usage Utilize fast local drives that can go away. Slow I/O only hold us for tx commit (and we optimized
that).
Summary Voron learned from LevelDB, LMDB, Esent. Journal for Atomicity, Consistency & Durability. MVCC for Consistency & Isolation. Root tree, named tress, multi trees.
Questions?