RocksDB Storage Engine Igor Canadi | Facebook

RocksDB storage engine for MySQL and MongoDB

Jan 06, 2017


Igor Canadi
Page 1: RocksDB storage engine for MySQL and MongoDB

RocksDB Storage Engine Igor Canadi | Facebook


Overview

•  Story of RocksDB

•  Architecture

•  Performance tuning

•  Next steps


Story of RocksDB


Pre-2011

•  FB infrastructure – many custom-built key-value stores

•  LevelDB released


Experimentation (2011 – 2013)

•  First use-cases

•  Not designed for server – many bottlenecks, stalls

•  Optimization

•  New features


Explosion (2013 – 2015)

•  Open sourced RocksDB

•  Big success within Facebook

•  External traction – LinkedIn, Yahoo, CockroachDB, …


New Challenges (2015 - )

•  Bring RocksDB to databases


MongoRocks

•  Running in production at Parse for 6 months

•  Huge storage savings (5TB → 285GB)

•  Document-level locking


MyRocks

[Figure: two bar charts comparing InnoDB and RocksDB, one for database size (relative) and one for bytes written (relative), each on a 0 to 1.2 axis.]


Architecture: Log Structured Merge Trees


Log Structured Merge Trees

Memtable (64MB)
Level 0 (256MB)
Level 1 (512MB)
Level 2 (5GB)
Level 3 (50GB)
Level 4 (500GB)


Log Structured Merge Trees – write

(key, value) → Memtable (64MB)
Level 0 (256MB)


Log Structured Merge Trees – flush

Memtable (64MB) → Level 0 (256MB)


Log Structured Merge Trees – compaction

Level 2 (5GB) → Level 3 (50GB)


Writes

•  Foreground:

•  Writes go to memtable (skiplist) + write-ahead log

•  Background:

•  When memtable is full, we flush to Level 0

•  When a level is full, we run compaction
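
The write path above can be sketched as a toy model. This is only an illustration under stated assumptions, not RocksDB code: the class `ToyLSM` and its `memtable_limit` parameter are hypothetical, a plain dict stands in for the skiplist, the flush runs inline rather than in the background, and compaction is left out.

```python
class ToyLSM:
    """Toy write path: write-ahead log + memtable, flushed to Level 0 when full."""

    def __init__(self, memtable_limit=2):
        self.memtable_limit = memtable_limit  # hypothetical: max entries, not bytes
        self.wal = []        # append-only list standing in for the log file
        self.memtable = {}   # dict standing in for the skiplist
        self.level0 = []     # immutable sorted runs, newest last

    def put(self, key, value):
        # Foreground: append to the log first, then update the memtable.
        self.wal.append((key, value))
        self.memtable[key] = value
        if len(self.memtable) >= self.memtable_limit:
            self.flush()

    def flush(self):
        # A background job in a real engine; done inline here for simplicity.
        run = tuple(sorted(self.memtable.items()))
        self.level0.append(run)
        self.memtable = {}
        self.wal = []  # flushed data no longer needs the log


db = ToyLSM(memtable_limit=2)
db.put("b", 1)
db.put("a", 2)   # memtable is now full, so it is flushed as one sorted run
db.put("c", 3)
print(db.level0)    # [(('a', 2), ('b', 1))]
print(db.memtable)  # {'c': 3}
```

The point of the sketch is the ordering: every write touches only the log and the in-memory table, and data reaches a table file later, already sorted.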


Reads

Memtable (64MB)
Level 0 (256MB)
Level 1 (512MB)
Level 2 (5GB)
Level 3 (50GB)
Level 4 (500GB)


Reads

•  Point queries

•  Bloom filters reduce reads from storage

•  Usually only 1 read IO

•  Range scans

•  Bloom filters don’t help

•  Depends on amount of memory, 1-2 IO
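
To illustrate why Bloom filters save storage reads on point queries, here is a minimal sketch. Everything in it is an assumption made for the example: `ToyBloomFilter`, `point_lookup`, the SHA-256 based hashing, and the bit and hash counts are hypothetical and are not RocksDB's filter implementation.

```python
import hashlib


class ToyBloomFilter:
    """Toy Bloom filter: may report false positives, never false negatives."""

    def __init__(self, num_bits=1024, num_hashes=4):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits)  # one byte per bit, for clarity

    def _positions(self, key):
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos] = 1

    def may_contain(self, key):
        return all(self.bits[pos] for pos in self._positions(key))


def point_lookup(key, files):
    """files: list of (bloom, table_dict), newest first. Returns (value, reads)."""
    reads = 0
    for bloom, table in files:
        if not bloom.may_contain(key):
            continue  # definitely absent: skip this file without touching storage
        reads += 1    # stands in for one read IO
        if key in table:
            return table[key], reads
    return None, reads


def make_file(items):
    bloom = ToyBloomFilter()
    for key in items:
        bloom.add(key)
    return bloom, dict(items)


files = [
    make_file({"k7": "new"}),
    make_file({"k3": "mid", "k4": "mid"}),
    make_file({"k1": "old", "k2": "old"}),
]
value, reads = point_lookup("k1", files)
print(value, reads)  # value is "old"; reads is 1 unless a filter gives a false positive
```

A range scan cannot use this shortcut in the same way, because the filter answers "is this exact key present?" and a scan does not know its keys in advance.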


RocksDB Files

rocksdb/> ls
MANIFEST-000032
000024.log
000031.log
000025.sst
000028.sst
000029.sst
000033.sst
000034.sst
LOG
LOG.old.1441234029851978
...


RocksDB Files – MANIFEST

(initial state)
Add file 1
Add file 2
Add file 3
Add file 4
…

(flush)
Add file 9
Mark log 6 persisted

(compaction)
Add file 10
Add file 11
Remove file 9
Remove file 8

Add new column family “system”

•  Atomic updates to database metadata
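
One way to read the MANIFEST is as a log of edits that is replayed to recover the current set of live files. The sketch below shows that idea only; the record names (`add_file`, `remove_file`, `log_persisted`, `add_column_family`) and the function `replay_manifest` are hypothetical and do not reflect the real on-disk format. The slide elides the records between file 4 and file 9, so the example adds file 8 purely as an assumption to make the later removal meaningful.

```python
def replay_manifest(edits):
    """Apply (op, arg) records in order; return live files, persisted logs, column families."""
    live_files = set()
    persisted_logs = set()
    column_families = {"default"}  # RocksDB always has a "default" column family
    for op, arg in edits:
        if op == "add_file":
            live_files.add(arg)
        elif op == "remove_file":
            live_files.discard(arg)
        elif op == "log_persisted":
            persisted_logs.add(arg)
        elif op == "add_column_family":
            column_families.add(arg)
        else:
            raise ValueError(f"unknown record: {op}")
    return live_files, persisted_logs, column_families


edits = [
    ("add_file", 1), ("add_file", 2), ("add_file", 3), ("add_file", 4),
    ("add_file", 8),                            # assumed: elided on the slide
    ("add_file", 9), ("log_persisted", 6),      # flush
    ("add_file", 10), ("add_file", 11),         # compaction
    ("remove_file", 9), ("remove_file", 8),
    ("add_column_family", "system"),
]
live, logs, families = replay_manifest(edits)
print(sorted(live))      # [1, 2, 3, 4, 10, 11]
print(sorted(families))  # ['default', 'system']
```

Because each group of records is applied as a unit, a reader of the log sees either the state before a compaction or the state after it, never a mix.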


RocksDB Files – Write-ahead log

Write(A, B)
Write(C, D)
Write(E, F)
Delete(A)
Write(X, Y)
Delete(C)

•  Persisted memtable state
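
Replaying the log rebuilds the memtable after a restart. A minimal sketch of that replay, assuming the six records are read in the order listed; the tuple format, `replay_wal`, and the `TOMBSTONE` sentinel are hypothetical names used only for this example.

```python
TOMBSTONE = object()  # marker for a deleted key


def replay_wal(records):
    """Rebuild memtable state by applying log records in order."""
    memtable = {}
    for op, key, value in records:
        if op == "write":
            memtable[key] = value
        elif op == "delete":
            memtable[key] = TOMBSTONE  # deletions are recorded, not erased
        else:
            raise ValueError(f"unknown record: {op}")
    return memtable


records = [
    ("write", "A", "B"), ("write", "C", "D"), ("write", "E", "F"),
    ("delete", "A", None), ("write", "X", "Y"), ("delete", "C", None),
]
memtable = replay_wal(records)
visible = {k: v for k, v in memtable.items() if v is not TOMBSTONE}
print(visible)  # {'E': 'F', 'X': 'Y'}
```

Note that A and C remain in the rebuilt memtable as tombstones; they are hidden from readers but still occupy entries.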


RocksDB Files – Table files

(Data block), repeated: <key, value> entries, compressed, prefix encoded
(Index block) <key, block>
(Filter block)
(Statistics)
(Meta index block) Pointers to blocks
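
The role of the index block can be sketched as a binary search that picks the one data block worth reading. This is an illustration under assumptions: `build_table`, `table_get`, and the choice to index each block by its last key are made up for the example, and compression, prefix encoding, and the filter block are left out.

```python
import bisect


def build_table(sorted_items, block_size=2):
    """Split sorted (key, value) pairs into data blocks plus an index of last keys."""
    blocks = [sorted_items[i:i + block_size]
              for i in range(0, len(sorted_items), block_size)]
    index_keys = [block[-1][0] for block in blocks]  # assumed: last key per block
    return blocks, index_keys


def table_get(blocks, index_keys, key):
    # Binary-search the index for the first block whose last key is >= key.
    i = bisect.bisect_left(index_keys, key)
    if i == len(blocks):
        return None  # key is larger than everything in the file
    for k, v in blocks[i]:  # only this one data block is scanned
        if k == key:
            return v
    return None


items = [("apple", 1), ("banana", 2), ("cherry", 3), ("date", 4), ("fig", 5)]
blocks, index_keys = build_table(items)
print(index_keys)                               # ['banana', 'date', 'fig']
print(table_get(blocks, index_keys, "cherry"))  # 3
print(table_get(blocks, index_keys, "zebra"))   # None
```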


RocksDB Files – LOG files

•  Debugging output

•  Tuning options

•  Information about flushes and compactions

•  Performance statistics


Backups

•  Table files are immutable

•  Other files are append-only

•  Easy and fast incremental backups

•  Open sourced Rocks-Strata
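
Because table files never change after they are written, an incremental backup only has to compare file names. The sketch below shows that comparison for table files only; `plan_incremental_backup` is a hypothetical helper, the previous backup contents are invented for the example, and this is not a description of how Rocks-Strata works.

```python
def plan_incremental_backup(live_files, backed_up):
    """Return (files to copy, stale backup files), comparing by name only."""
    live, old = set(live_files), set(backed_up)
    to_copy = sorted(live - old)   # new since the last backup
    stale = sorted(old - live)     # compacted away since the last backup
    return to_copy, stale


live_files = ["000025.sst", "000028.sst", "000029.sst", "000033.sst", "000034.sst"]
backed_up = ["000020.sst", "000025.sst", "000028.sst"]  # hypothetical earlier backup
to_copy, stale = plan_incremental_backup(live_files, backed_up)
print(to_copy)  # ['000029.sst', '000033.sst', '000034.sst']
print(stale)    # ['000020.sst']
```

Files present on both sides need no checksum comparison in this model, since an immutable file with the same name has the same contents.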


Performance tuning


Tombstones

•  Deletions are deferred

•  May cause higher P99 latencies

•  Be careful with pathological workloads, e.g. queues
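
A queue is pathological because consumers delete from the front, so every scan for the next item first walks over the accumulated tombstones. A minimal sketch of that effect, with hypothetical names (`first_live_key`, `TOMBSTONE`) and made-up sizes:

```python
TOMBSTONE = object()  # marker left behind by a deferred deletion


def first_live_key(sorted_entries):
    """Scan from the smallest key; return (first live key, tombstones skipped)."""
    skipped = 0
    for key, value in sorted_entries:
        if value is TOMBSTONE:
            skipped += 1
            continue
        return key, skipped
    return None, skipped


# A queue of 1000 jobs where the first 990 have already been consumed.
entries = [(i, TOMBSTONE if i < 990 else f"job-{i}") for i in range(1000)]
print(first_live_key(entries))  # (990, 990)
```

The work done grows with the number of deleted entries rather than with the number of live ones, which is where the latency spikes come from until compaction drops the tombstones.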


Caching

Block cache
•  Managed by RocksDB
•  Uncompressed data
•  Defaults to 1/3 of RAM

Page cache
•  Managed by kernel
•  Compressed data
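
The difference between the two caches matters for CPU: a block served from the block cache is already uncompressed, while a block that is only in the page cache still has to be decompressed. The sketch below models that with an LRU cache; `ToyBlockCache`, the dict used as the compressed store, and the access pattern are assumptions for illustration only.

```python
import zlib
from collections import OrderedDict


class ToyBlockCache:
    """LRU cache of uncompressed blocks; counts decompressions on misses."""

    def __init__(self, capacity_blocks):
        self.capacity = capacity_blocks
        self.blocks = OrderedDict()
        self.decompressions = 0

    def get(self, block_id, compressed_store):
        if block_id in self.blocks:
            self.blocks.move_to_end(block_id)  # mark as most recently used
            return self.blocks[block_id]
        # Miss: fetch the compressed block and pay the CPU cost to decompress it.
        data = zlib.decompress(compressed_store[block_id])
        self.decompressions += 1
        self.blocks[block_id] = data
        if len(self.blocks) > self.capacity:
            self.blocks.popitem(last=False)  # evict the least recently used block
        return data


store = {i: zlib.compress(f"block-{i}".encode() * 100) for i in range(4)}
pattern = [0, 1, 2, 3] * 3
small, large = ToyBlockCache(2), ToyBlockCache(4)
for block_id in pattern:
    small.get(block_id, store)
    large.get(block_id, store)
print(small.decompressions, large.decompressions)  # 12 4
```

With the smaller cache every access in this pattern is a miss, so the same reads cost three times as many decompressions.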


Memory usage

•  Block cache

•  Index and filter blocks (0.5 – 2% of the database)

•  Memtables

•  Blocks pinned by iterators


Reduce memory usage

•  Reduce block cache size – will increase CPU

•  Increase block size – decrease index size

•  Turn off bloom filters on bottom level
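
A back-of-the-envelope calculation shows why larger blocks shrink the index: the index holds roughly one entry per data block. The numbers below are assumptions chosen for illustration (a 100 GiB database and 32 bytes per index entry); real entry sizes depend on key length, and `estimated_index_bytes` is a hypothetical helper.

```python
KiB, MiB, GiB = 1024, 1024**2, 1024**3


def estimated_index_bytes(db_bytes, block_bytes, entry_bytes=32):
    """Rough index size: one entry per data block (entry size is an assumption)."""
    num_blocks = -(-db_bytes // block_bytes)  # ceiling division
    return num_blocks * entry_bytes


for block_kib in (4, 16, 64):
    size = estimated_index_bytes(100 * GiB, block_kib * KiB)
    print(f"{block_kib} KiB blocks: about {size // MiB} MiB of index")
# 4 KiB blocks: about 800 MiB of index
# 16 KiB blocks: about 200 MiB of index
# 64 KiB blocks: about 50 MiB of index
```

The trade-off is that each point read then fetches and decompresses a larger block.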


Reduce CPU

•  Profile the CPU usage

•  Increase block cache size – will increase memory usage

•  Turn off compression

•  It might be tombstones


Reduce write amplification

•  Write amplification = 5 * num_levels

•  Increase memtable and level 1 size

•  Stronger (zlib, zstd) compression for bottom levels

•  Try universal compaction
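
The rule of thumb above explains why a larger level 1 helps: it reduces the number of levels needed to hold the database. A small sketch of the arithmetic, assuming each level is 10x the previous one and that only levels from level 1 down are counted; both assumptions, and the helper names, are mine rather than the slide's.

```python
MB, GB = 1024**2, 1024**3


def levels_needed(db_bytes, level1_bytes, multiplier=10):
    """Count levels 1..n, each `multiplier` times larger, until one can hold db_bytes."""
    levels, capacity = 1, level1_bytes
    while capacity < db_bytes:
        capacity *= multiplier
        levels += 1
    return levels


def estimated_write_amp(num_levels, per_level=5):
    # Rule of thumb from the slide: write amplification = 5 * num_levels.
    return per_level * num_levels


for level1 in (512 * MB, 5120 * MB):
    n = levels_needed(500 * GB, level1)
    print(level1 // MB, "MB level 1:", n, "levels, write amp about", estimated_write_amp(n))
# 512 MB level 1: 4 levels, write amp about 20
# 5120 MB level 1: 3 levels, write amp about 15
```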


Next steps


•  Increase performance & stability

•  Deploy MyRocks at Facebook

•  External adoption of MyRocks and MongoRocks

•  Build an ecosystem


Thank you