Redis on NVMe SSD
Zvika Guz and Vijay Balakrishnan, Memory Solutions Lab, Samsung Semiconductor Inc.
Redis-on-Flash
- Closed-source (RLEC Flash), 100% compatible with open-source Redis
- Uses flash as a RAM extension to increase effective node capacity
- Tiers memory into "fast" and "slow": RAM holds keys and hot values; flash holds cold values
- Dynamic configuration of RAM/flash usage
- Uses RocksDB as the storage engine to optimize access to block storage
- A multi-threaded, asynchronous Redis is used to access flash
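A minimal sketch of the tiering idea above, assuming a plain dict stands in for RAM and a second dict stands in for the RocksDB-backed flash store; the demote-oldest policy is invented here purely for illustration (RoF's real eviction logic is not described in these slides):

```python
class TieredStore:
    """Toy two-tier key-value store: hot values in 'RAM', cold values on 'flash'."""

    def __init__(self, ram_capacity):
        self.ram_capacity = ram_capacity
        self.ram = {}    # keys + hot values (DRAM in RoF)
        self.flash = {}  # cold values (a RocksDB instance in RoF)

    def set(self, key, value):
        self.ram[key] = value
        if len(self.ram) > self.ram_capacity:
            # Demote the oldest-inserted entry to flash (illustrative policy only).
            cold_key = next(iter(self.ram))
            if cold_key != key:
                self.flash[cold_key] = self.ram.pop(cold_key)

    def get(self, key):
        if key in self.ram:      # RAM hit: fast path
            return self.ram[key]
        if key in self.flash:    # flash hit: promote back to RAM
            value = self.flash.pop(key)
            self.set(key, value)
            return value
        return None
```

Promotion on a flash hit is what makes the RAM tier track the hot working set, which is why skewed access distributions suit this design.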
Why Redis-on-Flash?
- Optimizes price-to-performance for a given workload: DRAM is more performant than flash, but its $/GB is higher
- DRAM capacity per server is limited
- Tiering dramatically reduces $/GB while preserving good performance ($/ops)
- Enables orders of magnitude more capacity per server
- RoF is particularly suitable for large datasets with a skewed access distribution
Workload
- Models real-world Redis Labs customers
- Benchmark: memtier_benchmark (open source)
- GET/SET requests, varying:
  1. Object size
  2. Write-to-read ratio
  3. Redis RAM hit ratio
- Performance target: maximize operations per second on a single server while maintaining sub-millisecond latency
- Compared three system configurations:
  1. All-RAM: in-memory RLEC
  2. Redis-on-NVMe: 4x Samsung PM1725 NVMe SSDs
  3. Redis-on-SATA: 16x Samsung 850 PRO SATA SSDs
https://github.com/RedisLabs/memtier_benchmark
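For reference, one workload point can be assembled from memtier_benchmark's standard flags (`-s`/`-p` for the server, `-d` for object size, `--ratio` for the SET:GET mix, `--test-time` for duration); the host, port, and duration below are placeholders, not the values used in this study:

```python
# Build a memtier_benchmark command line for one workload point.
# Flag names are standard memtier_benchmark options; the concrete
# host/port/duration values are placeholders, not from the slides.
def memtier_cmd(host, port, object_size, set_get_ratio, seconds):
    return [
        "memtier_benchmark",
        "-s", host,                # server address
        "-p", str(port),           # server port
        "-d", str(object_size),    # object size in bytes (e.g. 100 or 1000)
        "--ratio", set_get_ratio,  # SET:GET ratio, e.g. "1:1" or "1:4"
        "--test-time", str(seconds),
    ]

print(" ".join(memtier_cmd("127.0.0.1", 6379, 100, "1:1", 60)))
```

Sweeping `object_size` and `set_get_ratio` over the values listed above reproduces the shape of the workload matrix.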
Redis-on-NVMe
- Consistent sub-millisecond latencies favor NVMe
- NVMe SSDs are designed for consistently high performance at ultra-low latency
- Modest incremental cost over SATA, with much better performance
- Samsung PM1725 is the fastest NVMe drive on the market

Samsung PM1725 Specification*
  Form Factor:      2.5"
  Host Interface:   PCIe Gen3 x4
  Capacities:       800GB, 1.6TB, 3.2TB
  Sequential Read:  3300 MB/s (>6x over SATA)
  Sequential Write: 1900 MB/s
  Random Read:      840K IOPS (>8.5x over SATA)
  Random Write:     130K IOPS
  Read Latency:     95 usec
  Write Latency:    60 usec

*The PM1725 HHHL version (PCIe Gen3 x8) provides roughly double the performance and capacity, but we did not use it here.
System Configuration
- Single client, single server
- Industry-standard components, all available today

  Server:            Dell PowerEdge R730xd, dual-socket
  Processor:         2x Xeon E5-2690 v3 @ 2.6GHz (12 cores / 24 logical processors per CPU; 24 cores / 48 logical processors total)
  Memory:            256GB ECC DDR4
  Network:           10GbE
  Storage:           4x Samsung PM1725 NVMe; 16x Samsung 850 PRO SATA SSD
  memtier_benchmark: 1.2.6
  RLEC version:      4.3.0
  Operating System:  Ubuntu 14.04, Linux kernel 3.19.8
Use case #1: Small Objects
- 100B objects, write-to-read ratio 1:1
- 100% of requests served with <1 msec latency

  50% RAM-to-Flash hit ratio: 750 KOPS, 0.75 msec latency, 1.7 GB/s disk BW
  85% RAM-to-Flash hit ratio: 1.8 MOPS, 0.9 msec latency, 602 MB/s disk BW
Disk Bandwidth Spike
- Spikes in disk bandwidth align with RocksDB compaction phases
- Spikes can reach 2-3x the average bandwidth
- Drives must be able to sustain these spikes; otherwise, tail latency suffers

(Figure: object size = 100B, write-to-read ratio = 1:1, RAM-to-Flash hit ratio = 85%; average disk BW = 602 MB/s)
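The headroom requirement can be sanity-checked from the figures above (taking the upper end of the 2-3x spike range):

```python
# Spike headroom check using the numbers above: average disk BW of the
# 100B / 85% hit-ratio run, times the worst-case compaction spike factor.
avg_bw_mb_s = 602      # average disk BW from the run above (MB/s)
spike_factor = 3       # compaction spikes reach 2-3x the average
required_mb_s = avg_bw_mb_s * spike_factor
print(required_mb_s)   # 1806 MB/s -- within the PM1725's 1900 MB/s sequential write spec
```

A drive sized only for the 602 MB/s average would saturate during compaction, which is exactly when tail latency would suffer.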
Use case #2: Large Objects
- 1KB objects, write-to-read ratio 1:4
- 100% of requests served with <1 msec latency

  50% RAM-to-Flash hit ratio: 270 KOPS, 0.75 msec latency, 4.3 GB/s disk BW
  85% RAM-to-Flash hit ratio: 816 KOPS, 0.78 msec latency, 3.9 GB/s disk BW
Redis-on-Flash Performance
- 80/20 read-to-write ratio
- With sufficient locality, RoF performance gets close to All-RAM
- NVMe speedup over SATA is 2x-2.5x (using 1/4 the number of drives)
(Charts: operations per second vs. RAM-to-Flash hit ratio, 20% to 100%, for 100B and 1KB objects; data labels show each configuration's throughput relative to All-RAM for the NVMe and SATA setups)
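One hedged way to reason about the shape of these curves is a two-tier service-cost mix; the per-tier throughput numbers below are illustrative placeholders, not measurements from this study:

```python
# Approximate throughput as a harmonic mix of RAM-hit and flash-miss costs:
# each operation costs 1/ram_ops on a RAM hit and 1/flash_ops on a miss.
# ram_ops and flash_ops are invented placeholders for illustration.
def effective_ops(hit_ratio, ram_ops=2_400_000, flash_ops=300_000):
    return 1.0 / (hit_ratio / ram_ops + (1.0 - hit_ratio) / flash_ops)

# Throughput rises steeply as the RAM hit ratio approaches 100%,
# matching the qualitative trend in the charts above.
print(round(effective_ops(0.85)))
```

Because misses dominate the per-op cost, even modest gains in hit ratio at the high end produce large throughput gains, which is why RoF approaches All-RAM performance only with strong locality.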
The Problem with SATA
- Need 4x the drives to reach ~half the performance of NVMe
- Performance is much noisier: 99th-percentile latency exceeds 1 msec
- These latency spikes are very difficult to eliminate and appear in almost all of our SATA runs

(Figure: 132 KOPS at 0.65 msec average latency; object size = 1000B, write-to-read ratio = 1:4, RAM-to-Flash hit ratio = 50%)
DRAM or Flash?
- Optimize performance/$ for each use case
- The right choice depends on dataset size, access pattern, and access locality
(Chart: performance/$ for Redis in Memory, Redis-on-NVMe, and Redis-on-SATA)
$/GB ratio, DRAM : NVMe : SATA = 15 : 2.5 : 1
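Using only the relative $/GB figures above, a rough blended cost per GB for a DRAM+NVMe tiered node can be sketched; the "fraction of data in DRAM" knob is a hypothetical parameter, distinct from the benchmark hit ratios:

```python
# Blended relative $/GB of a node that keeps ram_fraction of its dataset in
# DRAM and the rest on NVMe flash, using the slide's 15 : 2.5 relative $/GB.
def blended_cost(ram_fraction, dram_per_gb=15.0, nvme_per_gb=2.5):
    return ram_fraction * dram_per_gb + (1 - ram_fraction) * nvme_per_gb

print(blended_cost(0.5))   # 8.75 -- roughly 1.7x cheaper per GB than all-DRAM
```

Shifting more of the dataset onto flash moves the blended cost toward the 2.5 floor, which is the $/GB side of the price-to-performance tradeoff discussed above.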
Summary
- Redis-on-Flash enables an order of magnitude more capacity per node and high performance at significantly lower cost
- Samsung PM1725 NVMe enables breakthrough performance at sub-millisecond latency; its consistent performance reduces tail latency
- Industry-standard components, available today
Thank You!
[email protected]
Backup