Redis on NVMe SSD
Zvika Guz and Vijay Balakrishnan, Memory Solutions Lab, Samsung Semiconductor Inc.
Redis-on-Flash
- Closed-source (RLEC Flash), 100% compatible with open-source Redis
- Uses flash as a RAM extension to increase effective node capacity
- Tiers memory into "fast" and "slow": RAM holds keys and hot values; flash holds cold values
- Dynamic configuration of RAM/flash usage
- Uses RocksDB as the storage engine to optimize access to block storage
- A multi-threaded, asynchronous Redis is used to access flash
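A minimal sketch of the tiering idea above, assuming a plain dict stands in for RAM and a second dict stands in for the RocksDB-backed flash store; the demote-oldest policy is invented here purely for illustration (RoF's real eviction logic is not described in these slides):

```python
class TieredStore:
    """Toy two-tier key-value store: hot values in 'RAM', cold values on 'flash'."""

    def __init__(self, ram_capacity):
        self.ram_capacity = ram_capacity
        self.ram = {}    # keys + hot values (DRAM in RoF)
        self.flash = {}  # cold values (a RocksDB instance in RoF)

    def set(self, key, value):
        self.ram[key] = value
        if len(self.ram) > self.ram_capacity:
            # Demote the oldest-inserted entry to flash (illustrative policy only).
            cold_key = next(iter(self.ram))
            if cold_key != key:
                self.flash[cold_key] = self.ram.pop(cold_key)

    def get(self, key):
        if key in self.ram:      # RAM hit: fast path
            return self.ram[key]
        if key in self.flash:    # flash hit: promote back to RAM
            value = self.flash.pop(key)
            self.set(key, value)
            return value
        return None
```

Promotion on a flash hit is what makes the RAM tier track the hot working set, which is why skewed access distributions suit this design.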
Why Redis-on-Flash?
- Optimizes price-to-performance for a given workload: DRAM is more performant than flash, but its $/GB is higher
- DRAM capacity per server is limited
- Tiering dramatically reduces $/GB while preserving good performance ($/ops)
- Enables orders of magnitude more capacity per server
- RoF is particularly suitable for large datasets with a skewed access distribution
Workload
- Models real-world Redis Labs customers
- Benchmark: memtier_benchmark (open source)
- GET/SET requests, varying:
  1. Object size
  2. Write-to-read ratio
  3. Redis RAM hit ratio
- Performance target: maximize operations per second on a single server while maintaining sub-millisecond latency
- Compared three system configurations:
  1. All-RAM: in-memory RLEC
  2. Redis-on-NVMe: 4x Samsung PM1725 NVMe SSDs
  3. Redis-on-SATA: 16x Samsung 850 PRO SATA SSDs
https://github.com/RedisLabs/memtier_benchmark
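For reference, one workload point can be assembled from memtier_benchmark's standard flags (`-s`/`-p` for the server, `-d` for object size, `--ratio` for the SET:GET mix, `--test-time` for duration); the host, port, and duration below are placeholders, not the values used in this study:

```python
# Build a memtier_benchmark command line for one workload point.
# Flag names are standard memtier_benchmark options; the concrete
# host/port/duration values are placeholders, not from the slides.
def memtier_cmd(host, port, object_size, set_get_ratio, seconds):
    return [
        "memtier_benchmark",
        "-s", host,                # server address
        "-p", str(port),           # server port
        "-d", str(object_size),    # object size in bytes (e.g. 100 or 1000)
        "--ratio", set_get_ratio,  # SET:GET ratio, e.g. "1:1" or "1:4"
        "--test-time", str(seconds),
    ]

print(" ".join(memtier_cmd("127.0.0.1", 6379, 100, "1:1", 60)))
```

Sweeping `object_size` and `set_get_ratio` over the values listed above reproduces the shape of the workload matrix.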
Redis-on-NVMe
- Consistent sub-millisecond latencies favor NVMe
- NVMe SSDs are designed for consistently high performance at ultra-low latency
- Modest incremental cost over SATA, with much better performance
- Samsung PM1725 is the fastest NVMe drive on the market

Samsung PM1725 Specification*
  Form Factor:      2.5"
  Host Interface:   PCIe Gen3 x4
  Capacities:       800GB, 1.6TB, 3.2TB
  Sequential Read:  3300 MB/s (>6x over SATA)
  Sequential Write: 1900 MB/s
  Random Read:      840K IOPS (>8.5x over SATA)
  Random Write:     130K IOPS
  Read Latency:     95 usec
  Write Latency:    60 usec

*The PM1725 HHHL version (PCIe Gen3 x8) provides roughly double the performance and capacity, but we did not use it here.
System Configuration
- Single client, single server
- Industry-standard components, all available today

  Server:            Dell PowerEdge R730xd, dual-socket
  Processor:         2x Xeon E5-2690 v3 @ 2.6GHz (12 cores / 24 logical processors per CPU; 24 cores / 48 logical processors total)
  Memory:            256GB ECC DDR4
  Network:           10GbE
  Storage:           4x Samsung PM1725 NVMe; 16x Samsung 850 PRO SATA SSD
  memtier_benchmark: 1.2.6
  RLEC version:      4.3.0
  Operating System:  Ubuntu 14.04, Linux kernel 3.19.8
Use case #1: Small Objects
- 100B objects, write-to-read ratio 1:1
- 100% of requests served with <1 msec latency

  50% RAM-to-Flash hit ratio: 750 KOPS, 0.75 msec latency, 1.7 GB/s disk BW
  85% RAM-to-Flash hit ratio: 1.8 MOPS, 0.9 msec latency, 602 MB/s disk BW
Disk Bandwidth Spike
- Spikes in disk bandwidth align with RocksDB compaction phases
- Spikes can reach 2-3x the average bandwidth
- Drives must be able to sustain these spikes; otherwise, tail latency suffers

(Figure: object size = 100B, write-to-read ratio = 1:1, RAM-to-Flash hit ratio = 85%; average disk BW = 602 MB/s)
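The headroom requirement can be sanity-checked from the figures above (taking the upper end of the 2-3x spike range):

```python
# Spike headroom check using the numbers above: average disk BW of the
# 100B / 85% hit-ratio run, times the worst-case compaction spike factor.
avg_bw_mb_s = 602      # average disk BW from the run above (MB/s)
spike_factor = 3       # compaction spikes reach 2-3x the average
required_mb_s = avg_bw_mb_s * spike_factor
print(required_mb_s)   # 1806 MB/s -- within the PM1725's 1900 MB/s sequential write spec
```

A drive sized only for the 602 MB/s average would saturate during compaction, which is exactly when tail latency would suffer.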
Use case #2: Large Objects
- 1KB objects, write-to-read ratio 1:4
- 100% of requests served with <1 msec latency

  50% RAM-to-Flash hit ratio: 270 KOPS, 0.75 msec latency, 4.3 GB/s disk BW
  85% RAM-to-Flash hit ratio: 816 KOPS, 0.78 msec latency, 3.9 GB/s disk BW
Redis-on-Flash Performance
- 80/20 read-to-write ratio
- With sufficient locality, RoF performance gets close to All-RAM
- NVMe speedup over SATA is 2x-2.5x (using 1/4 the number of drives)
(Charts: operations per second vs. RAM-to-Flash hit ratio, 20% to 100%, for 100B and 1KB objects; data labels show each configuration's throughput relative to All-RAM for the NVMe and SATA setups)
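One hedged way to reason about the shape of these curves is a two-tier service-cost mix; the per-tier throughput numbers below are illustrative placeholders, not measurements from this study:

```python
# Approximate throughput as a harmonic mix of RAM-hit and flash-miss costs:
# each operation costs 1/ram_ops on a RAM hit and 1/flash_ops on a miss.
# ram_ops and flash_ops are invented placeholders for illustration.
def effective_ops(hit_ratio, ram_ops=2_400_000, flash_ops=300_000):
    return 1.0 / (hit_ratio / ram_ops + (1.0 - hit_ratio) / flash_ops)

# Throughput rises steeply as the RAM hit ratio approaches 100%,
# matching the qualitative trend in the charts above.
print(round(effective_ops(0.85)))
```

Because misses dominate the per-op cost, even modest gains in hit ratio at the high end produce large throughput gains, which is why RoF approaches All-RAM performance only with strong locality.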
The Problem with SATA
- Need 4x the drives to reach ~half the performance of NVMe
- Performance is much noisier: 99th-percentile latency exceeds 1 msec
- These latency spikes are very difficult to eliminate and appear in almost all of our SATA runs

(Figure: 132 KOPS at 0.65 msec average latency; object size = 1000B, write-to-read ratio = 1:4, RAM-to-Flash hit ratio = 50%)
DRAM or Flash?
- Optimize performance/$ for each use case
- The right choice depends on dataset size, access pattern, and access locality
(Chart: performance/$ for Redis in Memory, Redis-on-NVMe, and Redis-on-SATA)
$/GB ratio, DRAM : NVMe : SATA = 15 : 2.5 : 1
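Using only the relative $/GB figures above, a rough blended cost per GB for a DRAM+NVMe tiered node can be sketched; the "fraction of data in DRAM" knob is a hypothetical parameter, distinct from the benchmark hit ratios:

```python
# Blended relative $/GB of a node that keeps ram_fraction of its dataset in
# DRAM and the rest on NVMe flash, using the slide's 15 : 2.5 relative $/GB.
def blended_cost(ram_fraction, dram_per_gb=15.0, nvme_per_gb=2.5):
    return ram_fraction * dram_per_gb + (1 - ram_fraction) * nvme_per_gb

print(blended_cost(0.5))   # 8.75 -- roughly 1.7x cheaper per GB than all-DRAM
```

Shifting more of the dataset onto flash moves the blended cost toward the 2.5 floor, which is the $/GB side of the price-to-performance tradeoff discussed above.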
Summary
- Redis-on-Flash enables an order of magnitude more capacity per node and high performance at significantly lower cost
- Samsung PM1725 NVMe enables breakthrough performance at sub-millisecond latency; its consistent performance reduces tail latency
- Industry-standard components, available today
Thank You!
[email protected]
Backup