Accelerating Ceph for Database Workloads with an all-PCIe SSD Cluster
Reddy Chagam – Principal Engineer & Chief SDS Architect
Tushar Gohad – Senior Staff Engineer
Intel Corporation, April 19, 2016
Acknowledgements: Orlando Moreno, Dan Ferber (Intel)
Legal Disclaimer
Intel technologies’ features and benefits depend on system configuration and may require enabled hardware, software or service activation. Performance varies depending on system configuration. No computer system can be absolutely secure. Check with your system manufacturer or retailer or learn more at http://intel.com.
Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors. Performance tests, such as SYSmark and MobileMark, are measured using specific computer systems, components, software, operations and functions. Any change to any of those factors may cause the results to vary. You should consult other information and performance tests to assist you in fully evaluating your contemplated purchases, including the performance of that product when combined with other products.
Configurations: Ceph v0.94.3 Hammer and v10.1.2 Jewel Release, CentOS 7.2, 3.10-327 kernel, CBT used for testing and data acquisition. OSD system config: 2x Intel Xeon E5-2699 v4 @ 2.20 GHz, 44 cores w/ HT, 46080KB cache, 128GB DDR4; each system with 4x P3700 800GB NVMe SSDs, partitioned into 4 OSDs each, 16 OSDs total per node. FIO client systems: 2x Intel Xeon E5-2699 v3 @ 2.30 GHz, 36 cores w/ HT, 46080KB cache, 128GB DDR4. Ceph public and cluster networks 2x 10GbE each. FIO 2.2.8 with the librbd engine. Sysbench 0.5 for MySQL testing. Tests run by the Intel DCG Storage Group in an Intel lab. Ceph configuration and CBT YAML file are provided in the backup slides.
For more information go to http://www.intel.com/performance.
Intel, Intel Inside and the Intel logo are trademarks of Intel Corporation in the United States and other countries. *Other names and brands may be claimed as the property of others.
© 2016 Intel Corporation.
Agenda
• Transition to NVMe flash
• NVMe architecture with Ceph
• Database & Ceph – leading flash use case
• The “All NVMe” high-density Ceph Cluster
• MySQL workload performance results
• Summary and next steps
Storage Evolution
• Yesterday (Storage): NAND-based Intel PCIe SSDs for NVMe
• Today (Storage): 3D NAND-based Intel PCIe SSDs, ramping in 2016
• Near Term (Storage): 3D XPoint™ Technology-based Optane™ SSD for NVMe, the world's fastest NVMe SSD
• Near Term (Memory & Storage): 3D XPoint™ Technology-based Apache Pass (AEP) for DDR4, revolutionary storage class memory
Next Gen NVM enables the world's fastest NVMe SSD and revolutionary storage class memory.
NVMe SSDs accelerate performance for latency-sensitive workloads on Ceph.
3D XPoint™ Memory Media: Latency ~100X, Size of Data ~1,000X
Data Center Form Factors for NVMe SSDs
• M.2: 80 and 110mm lengths; smallest footprint of PCIe; used for boot or for maximum storage density
• U.2 2.5in (SFF-8639), 7mm and 15mm heights: makes up the majority of SSDs sold today because of ease of deployment, hotplug, serviceability, and small form factor
• Add-in-card (AIC): maximum system compatibility with existing servers and the most reliable compliance program; higher power envelope and options for height and length
Intel Platforms: Tick-Tock Development Model
• Thurley Platform (Tylersburg PCH), Intel® Microarchitecture Codename Nehalem: Nehalem, 45nm, new micro-architecture (Tock); Westmere, 32nm, new process technology (Tick)
• Romley Platform (Patsburg PCH), Intel® Microarchitecture Codename Sandy Bridge: Sandy Bridge, 32nm, new micro-architecture (Tock); Ivy Bridge, 22nm, new process technology (Tick)
• Grantley Platform, today (Wellsburg PCH), Intel® Microarchitecture Codename Haswell: Haswell, 22nm, new micro-architecture (Tock); Broadwell, 14nm, new process technology (Tick)
Xeon E5 v4 is socket compatible with the v3 series, improving Ceph performance
Ceph Workloads
[Chart: Ceph block and object workloads plotted by storage performance (IOPS, throughput) versus storage capacity (PB), lower to higher on both axes. Workloads shown: Boot Volumes, Remote Disks, VDI, Databases, Test & Dev, App Storage, HPC, BigData, Cloud DVR, CDN, Enterprise Dropbox, Mobile Content Depot, Backup/Archive. The NVM focus area covers the high-performance block workloads such as Databases.]
Ceph - NVM Usages
[Diagram: where NVM fits in a Ceph deployment, with bare-metal, virtual machine, and container clients talking to RADOS nodes over 10-25 GbE.]
• Client side: caching with write-through on NVM
  - Virtual machine: application in the guest VM, Qemu/Virtio, librbd, RADOS protocol
  - Bare metal: application over kernel RBD, RADOS protocol
  - Container: container application over kernel RBD, RADOS protocol (today's focus)
• RADOS node (OSD) side: NVM for journaling, read cache, and OSD data
  - Production: FileStore (journal on NVM, data on a file system on NVM)
  - Tech Preview: BlueStore (data and metadata on NVM; RocksDB via BlueRocksEnv on BlueFS) (today's focus)
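For the client-side caching path above, librbd caching is controlled from the client's ceph.conf. A minimal sketch follows; the option names are standard librbd settings, but the values are illustrative and are not the settings used for the results in this deck (the backup ceph.conf actually disables rbd cache for the benchmarks):

[client]
rbd cache = true
rbd cache writethrough until flush = true
# example size only; tune per workload
rbd cache size = 268435456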
Ceph and Percona Server MySQL Integration
[Diagram: MySQL clients connected to a Ceph storage cluster over an IP fabric. Virtual-machine deployments run MySQL and the application in a guest VM on a hypervisor, reaching RADOS through Qemu/Virtio and RBD; Linux-container deployments run MySQL and the application on the host, reaching RADOS through kernel RBD. The Ceph storage cluster consists of SSD-backed OSDs across multiple nodes plus monitors (MONs).]
Deployment Considerations
• Bootable Ceph volumes (OS & MySQL data)
• MySQL RBD volumes (all in one, separate)
Configurations
• Good: NVMe SSD for journal/cache, HDDs as OSD data drives (see the journal-placement sketch below)
• Better: NVMe SSD as journal, high-capacity SATA or 3D-NAND NVMe SSD as data drive
• Best: All NVMe SSD
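For the Good and Better configurations, the usual layout keeps the FileStore journal on an NVMe partition and the data on the larger drive. A minimal sketch, with hypothetical device names (the tests in this deck used the all-NVMe BlueStore layout instead):

# data on a SATA drive, journal on an NVMe partition (hypothetical devices)
ceph-disk prepare --cluster ceph --fs-type xfs /dev/sdb /dev/nvme0n1p1
ceph-disk activate /dev/sdb1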
An “All-NVMe” high-density Ceph Cluster Configuration
• 5-node all-NVMe Ceph cluster: Supermicro 1028U-TN10RT+, dual-socket Xeon E5-2699 v4 @ 2.2GHz, 44 cores w/ HT, 128GB DDR4
  - Each node: 4 NVMe SSDs (NVMe1-NVMe4), each partitioned into 4 OSDs, for Ceph OSD1 through OSD16 per node
  - CentOS 7.2, 3.10-327 kernel, Ceph v10.1.2, BlueStore with the async messenger
  - Cluster network: 2x 10GbE
• 10x client systems + 1x Ceph MON: dual-socket Xeon E5-2699 v3 @ 2.3GHz, 36 cores w/ HT, 128GB DDR4
  - Public network: 2x 10GbE
  - Test-set 1: FIO (librbd)
  - Test-set 2: Sysbench against MySQL in Docker containers over krbd (Docker1/Docker2: MySQL DB servers; Docker3/Docker4: Sysbench clients)
• DB containers: 16 vCPUs, 32GB mem, 200GB RBD volume, 100GB MySQL dataset, InnoDB buffer cache 25GB (25%)
• Client containers: 16 vCPUs, 32GB RAM; FIO 2.8, Sysbench 0.5
Multi-partitioning flash devices
• High performance NVMe devices are capable of high parallelism at low latency
• DC P3700 800GB Raw Performance: 460K read IOPS & 90K Write IOPS at QD=128
• High Resiliency of “Data Center” Class NVMe devices
• At least 10 Drive writes per day
• Power loss protection, full data path protection, device level telemetry
• By using multiple OSD partitions, Ceph performance scales linearly
• Reduces lock contention within a single OSD process
• Lower latency at all queue depths, with the biggest impact on random reads
• Introduces the concept of multiple OSDs on the same physical device
• Conceptually similar crushmap data placement rules to managing disks in an enclosure (see the partitioning sketch below)
[Diagram: a single NVMe SSD partitioned into four partitions, each backing its own OSD (Ceph OSD1 through OSD4).]
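A minimal sketch of carving one NVMe device into four equal OSD partitions; the device name and the 25% boundaries are illustrative, not the exact commands used for these tests:

nvme list                           # identify the device (nvme-cli)
parted -s /dev/nvme0n1 mklabel gpt
for i in 1 2 3 4; do
    parted -s /dev/nvme0n1 mkpart osd-$i $(( (i-1) * 25 ))% $(( i * 25 ))%
done
parted -s /dev/nvme0n1 print        # verify the four partitions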
Partitioning multiple OSDs per NVMe
[Chart: average latency (ms) vs IOPS, 4K random read, comparing 1, 2, and 4 OSDs per NVMe device. 5 nodes, 20/40/80 OSDs, Intel DC P3700, Xeon E5-2699 v3 dual socket, 128GB RAM, 10GbE, Ceph 0.94.3 w/ JEMalloc.]
[Chart: single-node % CPU utilization comparison, 4K random reads @ QD32, for single, double, and quad OSDs per NVMe (4/8/16 OSDs). Intel DC P3700, Xeon E5-2699 v3 dual socket, 128GB RAM, 10GbE, Ceph 0.94.3 w/ JEMalloc.]
Multiple OSDs per NVMe result in higher performance, lower latency, and better CPU utilization
4K Random Read/Write Performance and Latency (Baseline FIO Test)
[Chart: IODepth scaling, average latency (ms) vs IOPS for 100% random read, 100% random write, and a 70/30 random mix at 4K. 5 nodes, 80 OSDs, Xeon E5-2699 v4 dual socket, 128GB RAM, 2x 10GbE, Ceph 10.1.2 w/ BlueStore and async messenger, 6 RBD FIO clients.]
• ~1.4M 100% 4K random read IOPS @ ~1 ms avg (~1.6M @ ~2.2 ms avg)
• ~220K 100% 4K random write IOPS @ ~5 ms avg
• ~560K 70/30% (OLTP) random mix IOPS @ ~3 ms avg
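These baseline numbers were collected with FIO's librbd engine driven by CBT (full YAML in the backup slides). A comparable standalone 4K random-read job might look like the sketch below; the pool and image names are illustrative and the image must already exist:

fio --name=4k-randread --ioengine=rbd --clientname=admin \
    --pool=rbd --rbdname=fio-test \
    --rw=randread --bs=4k --iodepth=32 --numjobs=1 \
    --time_based --runtime=300 --ramp_time=300 --norandommap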
Sysbench MySQL OLTP Performance (100% SELECT)
[Chart: Sysbench thread scaling, average latency (ms) vs aggregate queries per second (QPS), 100% read (point SELECTs). 5 nodes, 80 OSDs, Xeon E5-2699 v4 dual socket, 128GB RAM, 2x 10GbE, Ceph 10.1.2 w/ BlueStore and async messenger, 20 Docker-rbd Sysbench clients (16 vCPUs, 32GB).]
• 1 million QPS (aggregate, 20 clients) @ ~11 ms avg; ~55,000 QPS per client w/ 2 Sysbench threads
• ~1.3 million QPS (aggregate, 20 clients) with 8 Sysbench threads
• InnoDB buffer pool = 25%, SQL dataset = 100GB
Sysbench MySQL OLTP Performance (100% UPDATE, 70/30% SELECT/UPDATE)
[Chart: Sysbench thread scaling, average latency (ms) vs aggregate QPS, 100% write (index UPDATEs) and 70/30% OLTP mix. 5 nodes, 80 OSDs, Xeon E5-2699 v4 dual socket, 128GB RAM, 2x 10GbE, Ceph 10.1.2 w/ BlueStore and async messenger, 20 Docker-rbd Sysbench clients (16 vCPUs, 32GB).]
• ~400K 70/30% OLTP QPS @ ~50 ms avg; ~25,000 QPS w/ 1 Sysbench client (4-8 threads)
• ~100K write QPS @ ~200 ms avg (aggregate, 20 clients); ~5,500 QPS w/ 1 Sysbench client (2-4 threads)
• InnoDB buffer pool = 25%, SQL dataset = 100GB
Summary & Conclusions
• NVMe Flash storage for low latency workloads
• Ceph makes a compelling case for database workloads
• With Ceph, 1.4 million random IOPS is achievable in 5U with ~1ms latency today. Ceph performance is only getting better!
• Using Xeon E5 v4 standard high-volume servers and Intel NVMe SSDs, you can now deploy a high performance Ceph cluster for database workloads
• Next steps:
  - Evaluation on a large-scale cluster
  - Ceph community collaboration on improving write latency
Intel Ceph Contributions
Timeline: 2014 - 2016 (Ceph releases Giant*, Hammer, Infernalis, Jewel)
• New key/value store backend (RocksDB)
• CRUSH placement algorithm improvements (straw2 bucket type)
• RADOS I/O hinting (35% better EC write performance)
• Erasure coding support with ISA-L
• Cache-tiering with SSDs (read support, then write support)
• Client-side block cache (librbd)
• Virtual Storage Manager (VSM) open sourced
• CeTune open sourced
• PMStore (NVM-optimized backend based on libpmem)
• BlueStore backend optimizations for NVM
• BlueStore SPDK optimizations
• RGW and BlueStore compression, encryption (w/ ISA-L, QAT backend)
• Industry-first Ceph cluster to break 1 million 4K random IOPS
Configuration Detail – ceph.conf
[global]
enable experimental unrecoverable data corrupting features = bluestore rocksdb
osd objectstore = bluestore
ms_type = async
rbd readahead disable after bytes = 0
rbd readahead max bytes = 4194304
bluestore default buffered read = true
auth client required = none
auth cluster required = none
auth service required = none
filestore xattr use omap = true
cluster network = 192.168.142.0/24, 192.168.143.0/24
private network = 192.168.144.0/24, 192.168.145.0/24
log file = /var/log/ceph/$name.log
log to syslog = false
mon compact on trim = false
osd pg bits = 8
osd pgp bits = 8
mon pg warn max object skew = 100000
mon pg warn min per osd = 0
mon pg warn max per osd = 32768
debug_lockdep = 0/0
debug_context = 0/0
debug_crush = 0/0
debug_buffer = 0/0
debug_timer = 0/0
debug_filer = 0/0
debug_objecter = 0/0
debug_rados = 0/0
debug_rbd = 0/0
debug_ms = 0/0
debug_monc = 0/0
debug_tp = 0/0
debug_auth = 0/0
debug_finisher = 0/0
debug_heartbeatmap = 0/0
debug_perfcounter = 0/0
debug_asok = 0/0
debug_throttle = 0/0
debug_mon = 0/0
debug_paxos = 0/0
debug_rgw = 0/0
perf = true
mutex_perf_counter = true
throttler_perf_counter = false
rbd cache = false
Configuration Detail – ceph.conf (continued)
[osd]
osd_mount_options_xfs = rw,noatime,inode64,logbsize=256k,delaylog
osd_mkfs_options_xfs = -f -i size=2048
osd_op_threads = 32
filestore_queue_max_ops = 5000
filestore_queue_committing_max_ops = 5000
journal_max_write_entries = 1000
journal_queue_max_ops = 3000
objecter_inflight_ops = 102400
filestore_wbthrottle_enable = false
filestore_queue_max_bytes = 1048576000
filestore_queue_committing_max_bytes = 1048576000
journal_max_write_bytes = 1048576000
journal_queue_max_bytes = 1048576000
ms_dispatch_throttle_bytes = 1048576000
objecter_inflight_op_bytes = 1048576000
osd_mkfs_type = xfs
filestore_max_sync_interval = 10
osd_client_message_size_cap = 0
osd_client_message_cap = 0
osd_enable_op_tracker = false
filestore_fd_cache_size = 64
filestore_fd_cache_shards = 32
filestore_op_threads = 6

[mon]
mon data = /home/bmpa/tmp_cbt/ceph/mon.$id
mon_max_pool_pg_num = 166496
mon_osd_max_split_count = 10000
mon_pg_warn_max_per_osd = 10000

[mon.a]
host = ft02
mon addr = 192.168.142.202:6789
Configuration Detail - CBT YAML File
cluster:
  user: "bmpa"
  head: "ft01"
  clients: ["ft01", "ft02", "ft03", "ft04", "ft05", "ft06"]
  osds: ["hswNode01", "hswNode02", "hswNode03", "hswNode04", "hswNode05"]
  mons:
    ft02:
      a: "192.168.142.202:6789"
  osds_per_node: 16
  fs: xfs
  mkfs_opts: '-f -i size=2048 -n size=64k'
  mount_opts: '-o inode64,noatime,logbsize=256k'
  conf_file: '/home/bmpa/cbt/ceph.conf'
  use_existing: False
  newstore_block: True
  rebuild_every_test: False
  clusterid: "ceph"
  iterations: 1
  tmp_dir: "/home/bmpa/tmp_cbt"
  pool_profiles:
    2rep:
      pg_size: 8192
      pgp_size: 8192
      replication: 2
benchmarks:
  librbdfio:
    time: 300
    ramp: 300
    vol_size: 10
    mode: ['randrw']
    rwmixread: [0,70,100]
    op_size: [4096]
    procs_per_volume: [1]
    volumes_per_client: [10]
    use_existing_volumes: False
    iodepth: [4,8,16,32,64,128]
    osd_ra: [4096]
    norandommap: True
    cmd_path: '/usr/local/bin/fio'
    pool_profile: '2rep'
    log_avg_msec: 250
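A typical way to drive the YAML above, assuming a local checkout of the CBT repository; the archive directory and YAML file name are illustrative:

./cbt.py --archive=/home/bmpa/cbt_results --conf=/home/bmpa/cbt/ceph.conf all_nvme_librbdfio.yaml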
Storage Node Diagram
Two CPU sockets: Socket 0 and Socket 1
Socket 0:
• 2 NVMes
• Intel X540-AT2 (10Gbps)
• 64GB: 8x 8GB 2133 DIMMs
Socket 1:
• 2 NVMes
• 64GB: 8x 8GB 2133 DIMMs
Explore additional optimizations using cgroups, IRQ affinity
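A rough sketch of that kind of tuning, assuming the socket-0 OSDs and the X540 NIC should be served by socket-0 cores; the OSD IDs, core ranges, and interface name are illustrative:

# pin the OSD daemons backing socket-0 NVMe devices to socket-0 cores
for osd_id in 0 1 2 3 4 5 6 7; do
    pid=$(pgrep -f "ceph-osd .*(-i|--id) ${osd_id}( |$)")
    [ -n "$pid" ] && taskset -acp 0-21 "$pid"
done

# steer the 10GbE NIC interrupts to socket-0 cores (stop irqbalance first)
for irq in $(awk '/eth0/ {sub(":", "", $1); print $1}' /proc/interrupts); do
    echo 0-21 > /proc/irq/${irq}/smp_affinity_list
done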
High Performance Ceph Node Hardware Building Blocks
• Generally available server designs built for high density and high performance
• High-density 1U standard high-volume server
  - Dual socket 3rd Generation Xeon E5 (2699v3)
  - 10 front-removable 2.5" form-factor drive slots, 8639 connector
  - Multiple 10Gb network ports, additional slots for 40Gb networking
• Intel DC P3700 NVMe drives are available in 2.5" drive form-factor
  - Allowing easier service in a datacenter environment
MySQL configuration file (my.cnf)
[client]
port = 3306
socket = /var/run/mysqld/mysqld.sock

[mysqld_safe]
socket = /var/run/mysqld/mysqld.sock
nice = 0

[mysqld]
user = mysql
pid-file = /var/run/mysqld/mysqld.pid
socket = /var/run/mysqld/mysqld.sock
port = 3306
datadir = /data
basedir = /usr
tmpdir = /tmp
lc-messages-dir = /usr/share/mysql
skip-external-locking
bind-address = 0.0.0.0
max_allowed_packet = 16M
thread_stack = 192K
thread_cache_size = 8
query_cache_limit = 1M
query_cache_size = 16M
log_error = /var/log/mysql/error.log
expire_logs_days = 10
max_binlog_size = 100M
performance_schema = off
innodb_buffer_pool_size = 25G
innodb_flush_method = O_DIRECT
innodb_log_file_size = 4G
thread_cache_size = 16
innodb_file_per_table
innodb_checksums = 0
innodb_flush_log_at_trx_commit = 0
innodb_write_io_threads = 8
innodb_page_cleaners = 16
innodb_read_io_threads = 8
max_connections = 50000

[mysqldump]
quick
quote-names
max_allowed_packet = 16M

[mysql]

!includedir /etc/mysql/conf.d/
Sysbench commands
Prepare:
sysbench --test=/root/benchmarks/sysbench/sysbench/tests/db/parallel_prepare.lua --mysql-user=sbtest --mysql-password=sbtest --oltp-tables-count=32 --num-threads=128 --oltp-table-size=14000000 --mysql-table-engine=innodb --mysql-port=$1 --mysql-host=172.17.0.1 run

READ:
sysbench --mysql-host=${host} --mysql-port=${mysql_port} --mysql-user=sbtest --mysql-password=sbtest --mysql-db=sbtest --mysql-engine=innodb --oltp-tables-count=32 --oltp-table-size=14000000 --test=/root/benchmarks/sysbench/sysbench/tests/db/oltp.lua --oltp-read-only=on --oltp-simple-ranges=0 --oltp-sum-ranges=0 --oltp-order-ranges=0 --oltp-distinct-ranges=0 --oltp-index-updates=0 --oltp-point-selects=10 --rand-type=uniform --num-threads=${threads} --report-interval=60 --warmup-time=400 --max-time=300 --max-requests=0 --percentile=99 run

WRITE:
sysbench --mysql-host=${host} --mysql-port=${mysql_port} --mysql-user=sbtest --mysql-password=sbtest --mysql-db=sbtest --mysql-engine=innodb --oltp-tables-count=32 --oltp-table-size=14000000 --test=/root/benchmarks/sysbench/sysbench/tests/db/oltp.lua --oltp-read-only=off --oltp-simple-ranges=0 --oltp-sum-ranges=0 --oltp-order-ranges=0 --oltp-distinct-ranges=0 --oltp-index-updates=100 --oltp-point-selects=0 --rand-type=uniform --num-threads=${threads} --report-interval=60 --warmup-time=400 --max-time=300 --max-requests=0 --percentile=99 run
Docker Commands
Database containers:
docker run -ti --privileged --volume /sys:/sys --volume /dev:/dev -d -p 2201:22 -p 13306:3306 --cpuset-cpus="1-16,36-43" -m 48G --oom-kill-disable --name database1 ubuntu:14.04.3_20160414-db /bin/bash

Client containers:
docker run -ti -p 3301:22 -d --name client1 ubuntu:14.04.3_20160414-sysbench /bin/bash
RBD Commands
ceph osd pool create database 8192 8192
rbd create --size 204800 vol1 --pool database --image-feature layering
rbd snap create database/vol1@master
rbd snap ls database/vol1
rbd snap protect database/vol1@master
rbd clone database/vol1@master database/vol2
rbd feature disable database/vol2 exclusive-lock object-map fast-diff deep-flatten
rbd flatten database/vol2
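To attach one of these volumes through the kernel RBD client used for the database containers, a minimal sketch; the mount point is illustrative, and mkfs applies only to a fresh, empty volume, not to a clone that already carries a filesystem:

rbd map database/vol2                    # exposes the image as a /dev/rbdX block device
mkfs.xfs /dev/rbd/database/vol2          # only for a brand-new, empty volume
mkdir -p /mnt/database1
mount /dev/rbd/database/vol2 /mnt/database1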
An "All-NVMe" high-density Ceph Cluster Configuration
Ceph Storage Cluster (CBT / Zabbix monitoring)
• 5-Node all-NVMe Ceph Cluster based on Intel Xeon E5-2699 v4, 44 cores w/ HT, 128GB DDR4
• Storage: each system with 4x P3700 800GB NVMe, partitioned into 4 OSDs each, 16 OSDs total per node
• Networking: 2x 10GbE public, 2x 10GbE cluster, partitioned; replication factor 2
• Ceph 10.1.2 Jewel Release, CentOS 7.2, 3.10.0-327.13.1.el7 Kernel
• 10x FIO/Sysbench Clients: Intel Xeon E5-2699 v3 @ 2.30 GHz, 36 cores w/ HT, 128GB DDR4
• Docker with kernel RBD volumes: 2 database and 2 client containers per node
  - Database containers: 16 vCPUs, 32GB RAM, 250GB RBD volume
  - Client containers: 16 vCPUs, 32GB RAM
[Diagram: five SuperMicro 1028U OSD nodes (Intel Xeon E5 v4 22-core CPUs, Intel P3700 NVMe PCIe flash, easily serviceable NVMe drives), each with NVMe1-NVMe4 partitioned to host Ceph OSD1 through OSD16, connected over the Ceph cluster network (192.168.144.0/24, 2x 10Gbps) and the Ceph public network (192.168.142.0/24, 2x 10Gbps). Clients: SuperMicro FatTwin and Intel PCSD chassis (dual-socket Xeon E5 v3) running FIO RBD and FIO/Sysbench clients, plus one node running the Ceph MON.]
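A quick sanity check of a cluster laid out this way before kicking off CBT; the pool name matches the RBD commands slide, and the expected counts follow from the configuration above:

ceph -s                            # overall health, MON and OSD counts
ceph osd tree | head -30           # 5 hosts x 16 OSDs = 80 OSDs expected
ceph osd pool get database size    # replication factor (2 in this setup)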