


How Voron works: Insight into the new RavenDB storage engine

Aug 11, 2014



Oren Eini

In this Level 400 talk, we go deep into how Voron is implemented, including all the gory details of creating a high-performance transactional storage engine.
Transcript
Page 1: How Voron works: Insight into the new RavenDB storage engine

Level 400: Diving into Voron
Oren Eini

[email protected]
ayende.com/blog
Hibernating Rhinos

Page 2

Voron is…
- Low-level key/value store
- Transactional / ACID
- MVCC
- Multi-layered

Page 3

WHY!?

Page 4

Background: LevelDB, LMDB, Esent

Page 5

Seeks are slow

Time       Operation
0.01 ms    Compress 1 KB with Zippy
0.25 ms    Read 1 MB from memory
0.50 ms    Ping inside data center
10.0 ms    Disk seek
10.0 ms    Read 1 MB from network
30.0 ms    Read 1 MB from disk
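A quick worked calculation with the latencies above shows why a storage engine works so hard to avoid seeks. The Python sketch below compares reading 100 MB from disk as one sequential stream against reading the same data as random 4 KB pages. The assumption that every random page pays one full 10 ms seek (no caching, no read-ahead) is ours, not the slide's.

```python
# Latencies from the slide, in milliseconds.
DISK_SEEK_MS = 10.0
READ_1MB_DISK_MS = 30.0
READ_1MB_MEMORY_MS = 0.25

PAGE_KB = 4  # assumed page size for the random-access case


def sequential_disk_ms(megabytes):
    """One seek to the start of the data, then a single sequential read."""
    return DISK_SEEK_MS + megabytes * READ_1MB_DISK_MS


def random_pages_disk_ms(megabytes):
    """Every 4 KB page pays a full seek plus its share of the transfer time."""
    pages = megabytes * 1024 / PAGE_KB
    return pages * (DISK_SEEK_MS + READ_1MB_DISK_MS * PAGE_KB / 1024)


mb = 100
print(f"sequential: {sequential_disk_ms(mb):,.0f} ms")    # sequential: 3,010 ms
print(f"random 4KB: {random_pages_disk_ms(mb):,.0f} ms")  # random 4KB: 259,000 ms
print(f"memory:     {mb * READ_1MB_MEMORY_MS:,.0f} ms")   # memory:     25 ms
```

Under these assumptions the sequential read takes about 3 seconds and the random-page read about 259 seconds: the seeks, not the bytes, dominate.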

Page 6

Binary Trees, Eh?

[Diagram: a binary search tree holding the keys A to I]

Page 7

B+ Trees
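Moving from binary trees to B+ trees is about tree height, since each level of the tree may cost a seek. The rough Python comparison below assumes a fan-out of 100 entries per 4 KB page (an illustrative number, not Voron's actual page layout) and uses the 10 ms disk seek from the latency table.

```python
import math

DISK_SEEK_MS = 10.0  # from the "Seeks are slow" latency table


def binary_tree_depth(n_keys):
    """Levels in a perfectly balanced binary tree holding n_keys keys."""
    return math.ceil(math.log2(n_keys + 1))


def bplus_tree_depth(n_keys, fanout):
    """Levels needed when every page holds (or points to) up to `fanout` entries."""
    depth, capacity = 1, fanout
    while capacity < n_keys:
        capacity *= fanout
        depth += 1
    return depth


n = 10_000_000
for name, depth in [("binary tree", binary_tree_depth(n)),
                    ("B+ tree, fanout 100", bplus_tree_depth(n, 100))]:
    # Worst case: every level is an uncached page that costs one seek.
    print(f"{name}: {depth} levels, up to {depth * DISK_SEEK_MS:.0f} ms")
```

For ten million keys this prints 24 levels (up to 240 ms) for the binary tree and 4 levels (up to 40 ms) for the B+ tree.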

Page 8

Implementation:
- 4 KB pages
- B+ tree
- Page translation table
- MVCC
- Journal file
- Scratch file
- Memory mapped
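Because the data file is memory mapped and divided into fixed 4 KB pages, locating page N is plain arithmetic: it starts at byte offset N * 4096. The Python sketch below shows that idea with the standard `mmap` module. The `MappedPager` class and its `read_page`/`write_page` methods are hypothetical and are not Voron's pager API.

```python
import mmap
import os
import tempfile

PAGE_SIZE = 4096  # 4 KB pages, as on the slide


class MappedPager:
    """Hypothetical pager: page N lives at byte offset N * PAGE_SIZE of a mapped file."""

    def __init__(self, path, num_pages):
        self._file = open(path, "r+b")
        self._file.truncate(num_pages * PAGE_SIZE)  # grow (zero-filled) to the mapped size
        self._map = mmap.mmap(self._file.fileno(), num_pages * PAGE_SIZE)

    def _offset(self, page_number):
        start = page_number * PAGE_SIZE
        if page_number < 0 or start + PAGE_SIZE > len(self._map):
            raise IndexError(f"page {page_number} is outside the mapped file")
        return start

    def read_page(self, page_number):
        start = self._offset(page_number)
        return self._map[start:start + PAGE_SIZE]  # a copy of the page's bytes

    def write_page(self, page_number, data):
        if len(data) != PAGE_SIZE:
            raise ValueError("a page write must be exactly one page long")
        start = self._offset(page_number)
        self._map[start:start + PAGE_SIZE] = data  # writes straight into the mapping

    def close(self):
        self._map.flush()
        self._map.close()
        self._file.close()


if __name__ == "__main__":
    fd, path = tempfile.mkstemp()
    os.close(fd)
    pager = MappedPager(path, num_pages=8)
    pager.write_page(3, b"hello".ljust(PAGE_SIZE, b"\0"))
    print(pager.read_page(3)[:5])  # b'hello'
    pager.close()
    os.remove(path)
```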

Page 9

Modifying the tree:
1. Find the appropriate page to modify.
2. Get a scratch page and copy the page to it.
3. Register the scratch page with the old page number in the page translation table (PTT).
4. Modify the page as you wish.
5. On commit, the PTT becomes publicly visible, and all changed pages are written to the journal file.
6. On rollback, revert to the previous PTT and release the scratch pages. Done.
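The steps above amount to copy-on-write through a page translation table. Here is a toy Python model of them. `Storage`, `WriteTx`, the in-memory dictionaries standing in for the data, scratch and journal files, and the byte-level `modify(page, offset, data)` signature are all illustrative assumptions, not Voron's API. This sketch appends to the journal before publishing the PTT, so a transaction never becomes visible before it is recorded.

```python
class Storage:
    """Toy model of a data file, a scratch file, the committed PTT and a journal."""

    def __init__(self, data_pages):
        self.data = {n: bytes(p) for n, p in enumerate(data_pages)}  # data file
        self.scratch = {}      # scratch file: scratch page number -> bytearray
        self.next_scratch = 0  # hypothetical scratch page allocator
        self.ptt = {}          # committed PTT: page number -> scratch page number
        self.journal = []      # stand-in for the journal file

    def read(self, ptt, page):
        """Resolve a page through a PTT, falling back to the data file."""
        if page in ptt:
            return bytes(self.scratch[ptt[page]])
        return self.data[page]


class WriteTx:
    def __init__(self, storage):
        self.storage = storage
        self.ptt = dict(storage.ptt)  # private PTT, invisible until commit
        self.owned = {}               # pages this transaction copied to scratch

    def modify(self, page, offset, data):
        s = self.storage
        if page not in self.owned:
            # Steps 1-3: copy the current version into a fresh scratch page
            # and register it in this transaction's PTT.
            scratch_no = s.next_scratch
            s.next_scratch += 1
            s.scratch[scratch_no] = bytearray(s.read(self.ptt, page))
            self.owned[page] = scratch_no
            self.ptt[page] = scratch_no
        # Step 4: modify the scratch copy; the original page is never touched.
        s.scratch[self.owned[page]][offset:offset + len(data)] = data

    def commit(self):
        s = self.storage
        # Step 5: record all changed pages in the journal, then publish the PTT.
        s.journal.append({p: bytes(s.scratch[n]) for p, n in self.owned.items()})
        s.ptt = self.ptt

    def rollback(self):
        # Step 6: the committed PTT was never replaced; just release our scratch pages.
        for scratch_no in self.owned.values():
            del self.storage.scratch[scratch_no]


s = Storage([b"aaaa", b"bbbb", b"cccc"])
tx = WriteTx(s)
tx.modify(1, 0, b"X")
print(s.read(s.ptt, 1))  # b'bbbb' (not visible before commit)
tx.commit()
print(s.read(s.ptt, 1))  # b'Xbbb'
```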

Page 10

[Diagram: data file (pages #0 to #2) and scratch file (pages #3 to #8). Page translation tables: Tx #1: #0 → #3, #1 → #1. Tx #2: #0 → #3, #1 → #5.]
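Reading the slide's diagram as two committed page translation tables, Tx #1 maps page #0 to scratch page #3 and leaves page #1 in the data file, while Tx #2 additionally maps page #1 to scratch page #5. The Python sketch below replays exactly those two mappings to show the MVCC effect: a reader that started after Tx #1 keeps seeing the old page #1 even after Tx #2 commits. The page contents and the `Environment`/`Snapshot` names are made up for illustration.

```python
# Pages as drawn: the data file holds #0-#2, the scratch file holds the copies.
DATA_FILE = {0: "page 0 (original)", 1: "page 1 (original)", 2: "page 2 (original)"}
SCRATCH = {3: "page 0 (modified)", 5: "page 1 (modified)"}


class Environment:
    """Holds the currently committed page translation table (PTT)."""

    def __init__(self):
        self.committed_ptt = {}
        self.tx_id = 0

    def commit(self, new_ptt):
        # Publishing is a single reference swap; old PTTs are never mutated.
        self.tx_id += 1
        self.committed_ptt = new_ptt

    def begin_read(self):
        # A reader pins whatever PTT is committed right now.
        return Snapshot(self.tx_id, self.committed_ptt)


class Snapshot:
    def __init__(self, tx_id, ptt):
        self.tx_id = tx_id
        self.ptt = ptt

    def read(self, page_number):
        """Go through the PTT; unmapped pages come straight from the data file."""
        if page_number in self.ptt:
            return SCRATCH[self.ptt[page_number]]
        return DATA_FILE[page_number]


env = Environment()
env.commit({0: 3})          # Tx #1: #0 -> #3, #1 -> #1 (unmapped)
reader = env.begin_read()   # a reader that started after Tx #1
env.commit({0: 3, 1: 5})    # Tx #2: #0 -> #3, #1 -> #5
latest = env.begin_read()

print(reader.tx_id, reader.read(1))  # 1 page 1 (original)
print(latest.tx_id, latest.read(1))  # 2 page 1 (modified)
```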

Page 11

Background:
- Find pages in scratch whose older versions no one is looking at anymore.
- Copy them to the data file.
- Clear the scratch space.
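The background step needs a rule for when a scratch page is safe to copy over the data file. One way to state it, sketched in Python under the assumption that each reader is identified by the id of the last transaction it can see: a page committed by transaction T may be flushed once the oldest active reader's id is at least T, because no remaining reader needs the older version still in the data file. `ScratchPage` and `flush_scratch` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class ScratchPage:
    page_number: int      # data-file page this scratch page is a newer version of
    committed_in_tx: int  # id of the transaction that committed it
    contents: bytes


def flush_scratch(scratch, data_file, active_reader_txs):
    """Copy safe scratch pages to the data file, free them, return their numbers.

    A reader with id R sees every commit with id <= R. A reader with R < T still
    needs the older data-file version of a page committed in T, so that page waits.
    """
    oldest_reader = min(active_reader_txs, default=None)
    flushed = []
    # Oldest commits first, so a newer version of a page always lands last.
    for scratch_no, page in sorted(scratch.items(), key=lambda kv: kv[1].committed_in_tx):
        if oldest_reader is not None and oldest_reader < page.committed_in_tx:
            continue  # someone may still look at the older version
        data_file[page.page_number] = page.contents
        flushed.append(scratch_no)
    # Simplification: a real engine would also drop these pages from the PTT.
    for scratch_no in flushed:
        del scratch[scratch_no]
    return flushed


data_file = {0: b"v1", 1: b"v1"}
scratch = {3: ScratchPage(0, 1, b"v2"), 5: ScratchPage(1, 2, b"v2")}
print(flush_scratch(scratch, data_file, active_reader_txs=[1]))  # [3]
print(flush_scratch(scratch, data_file, active_reader_txs=[]))   # [5]
```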

Page 12

How it works:
- The only I/O during a commit is a single compressed, write-through write of the data to the journal.
- Moving data to the data file is done asynchronously.
- No need to call fsync().
- Full & incremental backups.
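The commit path on this slide is one compressed write to the journal, made durable by the write itself rather than by a later fsync(). A Python sketch of such a journal follows. The record layout (a header with transaction id, compressed length and CRC32, then page number/length pairs) is an assumption for illustration, and `os.O_DSYNC` serves as a Unix stand-in for a write-through file handle; on platforms without it the flag is simply skipped.

```python
import os
import struct
import tempfile
import zlib

RECORD_HEADER = struct.Struct("<QII")  # tx id, compressed length, CRC32 of compressed bytes
PAGE_HEADER = struct.Struct("<QI")     # page number, page length


def open_journal(path):
    # O_DSYNC (where available) makes each write durable when it returns,
    # standing in for a write-through handle; no separate fsync() follows.
    flags = os.O_WRONLY | os.O_CREAT | os.O_APPEND
    flags |= getattr(os, "O_DSYNC", 0) | getattr(os, "O_BINARY", 0)
    return os.open(path, flags, 0o644)


def write_journal_entry(fd, tx_id, pages):
    """Append every page changed by one transaction as a single compressed write."""
    payload = b"".join(PAGE_HEADER.pack(n, len(p)) + p for n, p in sorted(pages.items()))
    compressed = zlib.compress(payload)
    record = RECORD_HEADER.pack(tx_id, len(compressed), zlib.crc32(compressed)) + compressed
    os.write(fd, record)  # one write per commit (production code would loop on short writes)


def read_journal(path):
    """Replay complete records; stop at a torn or corrupted tail."""
    with open(path, "rb") as f:
        data = f.read()
    offset = 0
    while offset + RECORD_HEADER.size <= len(data):
        tx_id, length, crc = RECORD_HEADER.unpack_from(data, offset)
        start = offset + RECORD_HEADER.size
        body = data[start:start + length]
        if len(body) < length or zlib.crc32(body) != crc:
            break
        payload = zlib.decompress(body)
        pages, pos = {}, 0
        while pos < len(payload):
            page_number, size = PAGE_HEADER.unpack_from(payload, pos)
            pos += PAGE_HEADER.size
            pages[page_number] = payload[pos:pos + size]
            pos += size
        yield tx_id, pages
        offset = start + length


if __name__ == "__main__":
    fd0, path = tempfile.mkstemp()
    os.close(fd0)
    fd = open_journal(path)
    write_journal_entry(fd, 1, {0: b"a" * 4096, 3: b"b" * 4096})
    os.close(fd)
    for tx_id, pages in read_journal(path):
        print(tx_id, sorted(pages))  # 1 [0, 3]
    os.remove(path)
```

In this sketch, replay stops at the first incomplete or corrupted record, which is how a torn write at the tail of the journal would be ignored during recovery.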

Page 13

Missing the forest
Voron isn’t a B+ Tree system. It doesn’t have a tree, it has trees. Plural. <blink>Important</blink>

Page 14

Falling trees
- A single root tree contains many additional trees.
- A tree is similar to a table.
- Operations on a tree:
  - Add(key, value)
  - Del(key, value)
  - Find(key) : value
  - Iterate() (Seek, Next, Prev)
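The tree operations listed above can be sketched in Python, with a sorted list standing in for the paged B+ tree and a root tree whose values are other trees. `Tree`, `TreeIterator` and their method names are hypothetical. Note also that the slide lists `Del(key, value)`, whereas this sketch deletes by key alone.

```python
import bisect


class Tree:
    """Stand-in for one B+ tree: a sorted key/value map (a real tree uses 4 KB pages)."""

    def __init__(self):
        self._keys = []
        self._values = []

    def _index(self, key):
        i = bisect.bisect_left(self._keys, key)
        return i, i < len(self._keys) and self._keys[i] == key

    def add(self, key, value):
        i, found = self._index(key)
        if found:
            self._values[i] = value  # overwrite an existing key
        else:
            self._keys.insert(i, key)
            self._values.insert(i, value)

    def delete(self, key):
        i, found = self._index(key)
        if found:
            del self._keys[i]
            del self._values[i]

    def find(self, key):
        i, found = self._index(key)
        return self._values[i] if found else None

    def iterate(self):
        return TreeIterator(self)


class TreeIterator:
    """Cursor with Seek / Next / Prev, mirroring the operations on the slide."""

    def __init__(self, tree):
        self._tree = tree
        self._pos = -1

    def seek(self, key):
        # Position on the first key >= key.
        self._pos = bisect.bisect_left(self._tree._keys, key)
        return self._valid()

    def next(self):
        self._pos += 1
        return self._valid()

    def prev(self):
        self._pos -= 1
        return self._valid()

    def _valid(self):
        return 0 <= self._pos < len(self._tree._keys)

    @property
    def current(self):
        return self._tree._keys[self._pos], self._tree._values[self._pos]


# The root tree maps tree names to trees, so one root can hold many "tables".
root = Tree()
users = Tree()
root.add("users", users)
users.add("users/2", "Bob")
users.add("users/1", "Alice")

it = root.find("users").iterate()
if it.seek("users/"):
    print(it.current)  # ('users/1', 'Alice')
```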

Page 15

How it works?

Page 16

With indexes

Page 17

Finding stuff

* Not the most efficient method

Page 18

So, Voron has trees…
- Root tree (contains references to the named trees)
- Free space tree

Enough?

Tree of trees: MultiAdd, MultiDelete, MultiRead

Page 19

Why multi trees?
- Optimization: if a multi tree has just one item (and no value), it can be stored directly in the parent tree.
- Store multiple items for a single key.
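A toy Python version of a multi tree, including the single-item optimization: the first item for a key is stored inline in the parent, and a second item promotes the key to a nested sorted container standing in for a sub-tree. `MultiTree`, `_Nested` and the snake_case `multi_add`/`multi_delete`/`multi_read` names are assumptions modeled on MultiAdd, MultiDelete and MultiRead; items are assumed to be mutually comparable.

```python
import bisect


class _Nested:
    """Stand-in for a nested tree: a sorted list of distinct items."""

    def __init__(self, items):
        self.items = sorted(items)

    def insert(self, item):
        i = bisect.bisect_left(self.items, item)
        if i == len(self.items) or self.items[i] != item:
            self.items.insert(i, item)

    def remove(self, item):
        i = bisect.bisect_left(self.items, item)
        if i < len(self.items) and self.items[i] == item:
            del self.items[i]


class MultiTree:
    """Each key holds a sorted set of items (no values)."""

    def __init__(self):
        self._parent = {}  # stand-in for the parent tree: key -> item or _Nested

    def multi_add(self, key, item):
        if key not in self._parent:
            self._parent[key] = item  # one item: stored inline, no nested tree
            return
        current = self._parent[key]
        if isinstance(current, _Nested):
            current.insert(item)
        elif current != item:
            self._parent[key] = _Nested([current, item])  # promote to a nested tree

    def multi_delete(self, key, item):
        if key not in self._parent:
            return
        current = self._parent[key]
        if isinstance(current, _Nested):
            current.remove(item)
            if len(current.items) == 1:
                self._parent[key] = current.items[0]  # demote back to inline
        elif current == item:
            del self._parent[key]

    def multi_read(self, key):
        """Return a key's items in sorted order."""
        if key not in self._parent:
            return []
        current = self._parent[key]
        return list(current.items) if isinstance(current, _Nested) else [current]


tags = MultiTree()
tags.multi_add("tag/db", "posts/1")
tags.multi_add("tag/db", "posts/7")
tags.multi_add("tag/db", "posts/3")
print(tags.multi_read("tag/db"))  # ['posts/1', 'posts/3', 'posts/7']
```

`multi_read` returns the items in sorted order, which is what iterating a multi tree amounts to in this model.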

Page 20

Iterating multi trees

Page 21

What does Voron do?
- Opens up a lot of interesting scenarios.
- Gives us far better control over persistence.
- Very low level (bits & bytes).
- Very fast!
- Concurrency benefits:
  - Reads
  - Writes*

* Yet Voron allows only a single writer!
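The concurrency note (many readers, yet only a single writer) can be illustrated with a lock held only by writers, while readers grab the last committed snapshot without locking. This Python sketch illustrates that rule and is not Voron's implementation; `Environment`, `begin_read` and `write` are hypothetical, and the dictionary copy stands in for copy-on-write pages.

```python
import threading


class Environment:
    """Readers never block; only one write transaction runs at a time."""

    def __init__(self):
        self._write_lock = threading.Lock()
        self._current = (0, {})  # (tx id, committed snapshot); never mutated in place

    def begin_read(self):
        # No lock: one attribute read yields a consistent (tx id, snapshot) pair.
        return self._current

    def write(self, change):
        """Run one write transaction; `change` maps the old snapshot to updates."""
        with self._write_lock:  # the single writer
            tx_id, snapshot = self._current
            new_snapshot = dict(snapshot)  # copy-on-write stand-in
            new_snapshot.update(change(snapshot))
            self._current = (tx_id + 1, new_snapshot)  # publish with one assignment


env = Environment()
before = env.begin_read()


def bump():
    for _ in range(250):
        env.write(lambda snap: {"counter": snap.get("counter", 0) + 1})


threads = [threading.Thread(target=bump) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(env.begin_read())  # (1000, {'counter': 1000})
print(before)            # (0, {})
```

Four threads each run 250 write transactions; because writes are serialized, the final counter is exactly 1000, while the snapshot taken before any write still reads `(0, {})`.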

Page 22

What doesn't it do?
It isn’t about Linux. It can’t run on Linux*.

Need to implement:
- PosixPureMemoryPager
- PosixPageFileBackedMemoryMappedPager
- PosixMemoryMapPager

Waiting for the big Linux push after the 3.0 release.

Page 23

The cloud story…
- Scratch / temp usage: utilize fast local drives that can go away.
- Slow I/O only holds us up during tx commit (and we optimized that).

Page 24

Summary
- Voron learned from LevelDB, LMDB, Esent.
- Journal for Atomicity, Consistency & Durability.
- MVCC for Consistency & Isolation.
- Root tree, named trees, multi trees.

Page 25

Questions?