REFERENCE ARCHITECTURES WHAT’S NEW
Brent Compton Director Storage Solution Architectures Red Hat
Red Hat Storage Day NYC Oct 2016
Reference Architecture Work
MYSQL & HADOOP SOFTWARE-DEFINED NAS DIGITAL MEDIA REPOS
(link)
Appetite for Storage-Centric Cloud Services

AWS:         EC2         EBS           S3           RDS     EMR
On-premise:  KVM (Nova)  RBD (Cinder)  RGW (Swift)  MySQL   Hadoop
Cloud Storage Features Useful for MySQL
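One such feature is EBS-style block provisioning backed by RBD. As a minimal sketch (not taken from the reference architecture itself), the snippet below creates an RBD image that could back a MySQL data directory, using the python-rados/python-rbd bindings that ship with Ceph; the pool and image names are hypothetical.

```python
# Sketch (hypothetical pool/image names): create an RBD image that could
# back a MySQL data directory, using the python-rados/python-rbd bindings
# that ship with Ceph.
import rados
import rbd

cluster = rados.Rados(conffile='/etc/ceph/ceph.conf')
cluster.connect()
try:
    ioctx = cluster.open_ioctx('mysql-pool')        # hypothetical pool
    try:
        # 100 GiB image; map it on the MySQL host (rbd map), make a
        # filesystem, and mount it at /var/lib/mysql.
        rbd.RBD().create(ioctx, 'mysql-vol01', 100 * 1024**3)
    finally:
        ioctx.close()
finally:
    cluster.shutdown()
```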
HEAD-TO-HEAD LAB TEST ENVIRONMENTS
AWS environment:
• EC2 r3.2xlarge and m4.4xlarge
• EBS Provisioned IOPS and GP-SSD
• Percona Server
Ceph environment:
• Supermicro servers
• Red Hat Ceph Storage RBD
• Percona Server
AWS IOPS/GB BASELINE: ~AS ADVERTISED

                 100% Read   100% Write   (IOPS/GB)
P-IOPS m4.4xl       30.0        25.6
P-IOPS r3.2xl       29.8        25.7
GP-SSD r3.2xl        3.6         4.1
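To see why the baseline lands "as advertised", recall that EBS provisions IOPS in proportion to volume size. A quick sketch of that arithmetic, using the era's published provisioning ratios (up to 30 IOPS/GB for Provisioned IOPS, a 3 IOPS/GB baseline for GP-SSD):

```python
# Sketch: EBS provisions IOPS proportionally to volume size, so IOPS/GB is
# effectively fixed by volume type. Ratios are the advertised provisioning
# ratios of the era (Provisioned IOPS up to 30:1, GP-SSD baseline 3:1).
ADVERTISED_IOPS_PER_GB = {'P-IOPS': 30, 'GP-SSD': 3}

def volume_iops(volume_type: str, size_gb: int) -> int:
    """Approximate IOPS budget for an EBS volume of the given size."""
    return ADVERTISED_IOPS_PER_GB[volume_type] * size_gb

# A 200 GB P-IOPS volume is provisioned for ~6,000 IOPS, i.e. 30 IOPS/GB,
# which is what the measured ~30 read / ~26 write IOPS/GB above reflects.
print(volume_iops('P-IOPS', 200))   # 6000
print(volume_iops('GP-SSD', 200))   # 600
```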
IOPS/GB PER MYSQL INSTANCE

                                           Reads   Writes   (MySQL IOPS/GB)
P-IOPS m4.4xl                                30      26
Ceph cluster 1x "m4.4xl" (14% capacity)     252      78
Ceph cluster 6x "m4.4xl" (87% capacity)     150      19
FOCUSING ON WRITE IOPS/GB: AWS IO THROTTLING LEVEL FOR DETERMINISTIC PERFORMANCE

                                           100% Write (IOPS/GB)
P-IOPS m4.4xl                                      26
Ceph cluster 1x "m4.4xl" (14% capacity)            78
Ceph cluster 6x "m4.4xl" (87% capacity)            19
EFFECT OF CEPH CLUSTER LOADING ON IOPS/GB

                               100% Write (IOPS/GB)
Ceph cluster (14% capacity)            78
Ceph cluster (36% capacity)            37
Ceph cluster (72% capacity)            25
Ceph cluster (87% capacity)            19
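For capacity planning, the measured points above can be interpolated to estimate write IOPS/GB at an intermediate fill level. The helper below is an illustrative sketch using simple linear interpolation between the four measured points; it is not a Red Hat sizing tool.

```python
# Sketch: estimate write IOPS/GB at a planned fill level by linearly
# interpolating the measured points from the chart above. Illustrative only.
MEASURED = [(14, 78), (36, 37), (72, 25), (87, 19)]  # (% capacity used, write IOPS/GB)

def est_write_iops_per_gb(fill_pct: float) -> float:
    if fill_pct <= MEASURED[0][0]:
        return MEASURED[0][1]
    for (x0, y0), (x1, y1) in zip(MEASURED, MEASURED[1:]):
        if fill_pct <= x1:
            frac = (fill_pct - x0) / (x1 - x0)
            return y0 + frac * (y1 - y0)
    return MEASURED[-1][1]

print(est_write_iops_per_gb(50))  # ~32, between the 36% and 72% measurements
```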
$/STORAGE-IOP* FOR COMPARABLE CONFIGS

                                                Storage $/IOP (Sysbench Write)
AWS EBS Provisioned-IOPS                                   $2.40
Ceph on Supermicro FatTwin (72% capacity)                  $0.80
Ceph on Supermicro MicroCloud (87% capacity)               $0.78
Ceph on Supermicro MicroCloud (14% capacity)               $1.06

* Ceph configs do not include power, cooling, or admin costs
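The metric itself is simple division: amortized storage cost over sustained Sysbench write IOPS. A sketch with hypothetical placeholder inputs (not the figures behind the chart); per the footnote above, power, cooling, and admin costs would need to be added to the numerator for a full comparison.

```python
# Sketch: $/storage-IOP is amortized storage cost divided by sustained
# Sysbench write IOPS. Inputs below are hypothetical placeholders, not
# the figures behind the chart above.
def dollars_per_iop(storage_cost_usd: float, sustained_write_iops: float) -> float:
    return storage_cost_usd / sustained_write_iops

# Hypothetical: a $50,000 cluster sustaining 60,000 write IOPS.
print(round(dollars_per_iop(50_000, 60_000), 2))  # 0.83
```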
CONSIDERING CORE-TO-FLASH RATIO

                                                 100% Write (IOPS/GB)
Ceph cluster, 80 cores, 8 NVMe (87% capacity)            18
Ceph cluster, 40 cores, 4 NVMe (87% capacity)            18
Ceph cluster, 80 cores, 4 NVMe (87% capacity)            19
Ceph cluster, 80 cores, 12 NVMe (84% capacity)            6
SUPERMICRO MICROCLOUD CEPH MYSQL PERFORMANCE SKU
8x nodes in 3U chassis (1x CPU + 1x NVMe + 1x SFP+ per node)
Model: SYS-5038MR-OSDXXXP
Per-node configuration:
• CPU: Single Intel Xeon E5-2630 v4
• Memory: 32GB
• NVMe Storage: Single 800GB Intel P3700
• Networking: 1x dual-port 10G SFP+
Enhancing On-premise MySQL Scalability
Trend: Disaggregating Hadoop Compute and Storage
• Hadoop retained data is growing at a faster pace than Hadoop compute needs.
• Operators don't want to waste money on unneeded compute in additional Hadoop data nodes.
• This is driving a trend to disaggregate storage from traditional Hadoop nodes (eBay blog on tiering - here).
• Multiple disaggregation architecture options exist, illustrated in the data flow diagrams that follow.
Data Flow Options (Traditional, Partial Disaggregation)
[Diagram: ingest copies land in HDFS, where MapReduce/Pig, Spark, and HBase/Hive run; aging data is tiered to Ceph with retrieval back into HDFS; S3A paths allow ingest copies and cold-data MapReduce to run against Ceph directly.]
Data Flow Options (Full Disaggregation)
[Diagram: all data is ingested into Ceph via S3; MapReduce/Pig and Hive/HBase access it via S3A; MapReduce/Spark hot data uses HDFS over RBD volumes; non-Hadoop tools access the same data via S3.]
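To illustrate the S3A arrows in these diagrams, here is a minimal PySpark sketch that points Hadoop's S3A connector at a Ceph RGW endpoint instead of AWS S3. The endpoint, credentials, and bucket name are hypothetical placeholders.

```python
# Sketch: PySpark reading from a Ceph RGW bucket via Hadoop's S3A connector.
# Endpoint, credentials, and bucket are hypothetical; the hadoop-aws (S3A)
# jars must be on the Spark classpath.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("ceph-rgw-s3a-example")
    # Point S3A at the RGW endpoint instead of AWS S3.
    .config("spark.hadoop.fs.s3a.endpoint", "http://rgw.example.com:8080")
    .config("spark.hadoop.fs.s3a.access.key", "RGW_ACCESS_KEY")
    .config("spark.hadoop.fs.s3a.secret.key", "RGW_SECRET_KEY")
    # RGW buckets are typically addressed path-style rather than virtual-hosted.
    .config("spark.hadoop.fs.s3a.path.style.access", "true")
    .getOrCreate()
)

# Hadoop-side tools then address the disaggregated store like any filesystem.
df = spark.read.json("s3a://datalake/events/")
df.groupBy("event_type").count().show()
```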
Reference Architecture Work
MYSQL & HADOOP SOFTWARE-DEFINED NAS DIGITAL MEDIA REPOS
• Digital object repo
• Digital file repo (WIP)
(link)
Size of Common Things

UHD movie        100,000 MB
Blu-ray movie     25,000 MB
HD movie          12,000 MB
DVD movie          3,000 MB
Audio CD             750 MB
MP3 song               4 MB
e-book                 1 MB
Different Servers Yield 10x Ceph Performance
• 1 DVD movie/sec with 3-node* cluster A
• 1 Blu-ray movie/sec with 3-node* cluster B
*Both A & B cluster nodes are 2U servers
Ceph Nodes Saturating 80GbE Pipes
• 1MB Seq. Read = 28.5 GB/sec
• 1MB Seq. Write = 6.2 GB/sec
…and note the optimal CPU/SSD ratio for IOPS:
• 4KB Random Read = 693K IOPS
• 4KB Random Write = 87.8K IOPS
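Tying these rates back to the media sizes listed earlier, a quick sanity-check sketch:

```python
# Sketch: convert aggregate cluster throughput into "movies per second"
# using the media sizes from the table above (sizes in MB).
SIZES_MB = {'DVD movie': 3_000, 'Blu-ray movie': 25_000}

def movies_per_sec(throughput_gb_per_sec: float, title: str) -> float:
    return throughput_gb_per_sec * 1_000 / SIZES_MB[title]

# 28.5 GB/sec of sequential read is roughly one Blu-ray per second,
# and 6.2 GB/sec of write is roughly two DVDs per second.
print(round(movies_per_sec(28.5, 'Blu-ray movie'), 2))  # ~1.14
print(round(movies_per_sec(6.2, 'DVD movie'), 2))       # ~2.07
```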
Sample High Throughput Config for Ceph
• 2x Intel E5-269x v3 (up to 145W per CPU)
• 4x-24x 2.5" hot-swap Samsung NVMe SSDs
• 16x DDR4 2133MHz L/RDIMM, up to 1024GB
• 2x 16-lane PCIe Gen3 slots
• 2x dual 40 GbE NICs (with 100 GbE option)
• EIA 310-D 19" in 2U
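A back-of-envelope check shows why the 80GbE pipes are the limiting resource here; the sketch below assumes the 3-node cluster size from the earlier slide, which is an assumption on our part.

```python
# Sketch: check per-node throughput against the 80 Gbit/s (2x dual-40GbE)
# network budget from the config above. The 3-node cluster size is an
# assumption carried over from the earlier 3-node cluster slides.
NODES = 3
NIC_GBIT_PER_NODE = 80

nic_gbyte_per_node = NIC_GBIT_PER_NODE / 8    # 10 GB/s per node
read_gbyte_per_node = 28.5 / NODES            # ~9.5 GB/s per node

print(f"{read_gbyte_per_node:.1f} GB/s of {nic_gbyte_per_node:.1f} GB/s NIC budget per node")
# -> 9.5 GB/s of 10.0 GB/s: sequential reads run the 80GbE pipes near saturation
```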
Scaling Ceph Object Storage (RGW) to NIC Saturation
Reference Architecture Work
MYSQL & HADOOP SOFTWARE-DEFINED NAS DIGITAL MEDIA REPOS
(link)
Server Category Nomenclature: Contemporary Storage Server Chassis Categories

Flash Blades: storage media SSD (NVMe hi-perf); 1-4 media drives/node; 2-8 server nodes/chassis; CPU sizing*** 10 cores per SSD; networking 10GbE; target IO pattern small random IO. Vendor examples: Supermicro MicroCloud, Samsung Sierra 2U/4*
Flash Array: storage media SSD (NVMe mid-perf); 4-24 media drives/node; 1 server node/chassis; CPU sizing*** 4 cores per SSD; networking 40GbE+; target IO pattern large sequential IO. Vendor examples: SanDisk InfiniFlash**, Samsung Sierra 2U/24
Standard: storage media HDD+SSD; 12-16 media drives/node; 1 server node/chassis; CPU sizing*** 1 core per 2 HDD; networking 10GbE; target IO pattern mixed. Vendor examples: Supermicro 2U/12, Quanta/QCT 1U/12, Dell R730XD, Cisco C240M, Lenovo X3650
Dense: storage media HDD+SSD; 24-46 media drives/node; 1-2 server nodes/chassis; CPU sizing*** 1 core per 2 HDD; networking 10GbE (archive) / 40GbE (active); target IO pattern mixed. Vendor examples: Supermicro 4U/36, Quanta/QCT 4U/35x2, Dell DSS7000/2, Cisco C3260
Ultra-dense: storage media HDD; 60-90 media drives/node; 1 server node/chassis; CPU sizing*** 1 core per 2 HDD; networking 10GbE (archive) / 40GbE (active); target IO pattern large sequential IO. Vendor examples: Supermicro 4U/72, Quanta/QCT 4U/76, Dell DSS7000/1, Cisco C3160

* smaller flash array   ** JBOF with servers   *** CPU sizing characterized for Ceph
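As a rough illustration of how the CPU-sizing guidance translates into practice, the sketch below encodes the rules of thumb above (characterized for Ceph) and computes the cores a proposed node would need. Purely illustrative.

```python
# Sketch: encode the CPU-sizing rules of thumb from the nomenclature above
# (characterized for Ceph) and compute cores needed for a proposed node.
CORES_PER_DRIVE = {
    'Flash Blades': 10.0,   # 10 cores per SSD
    'Flash Array':   4.0,   # 4 cores per SSD
    'Standard':      0.5,   # 1 core per 2 HDD
    'Dense':         0.5,   # 1 core per 2 HDD
    'Ultra-dense':   0.5,   # 1 core per 2 HDD
}

def cores_needed(category: str, drives_per_node: int) -> float:
    return CORES_PER_DRIVE[category] * drives_per_node

# A Standard 12-HDD node wants ~6 cores; a Flash Blades node with a single
# NVMe drive wants ~10 cores (cf. the single-CPU MicroCloud SKU earlier).
print(cores_needed('Standard', 12))      # 6.0
print(cores_needed('Flash Blades', 1))   # 10.0
```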
Jumbo File Performance Comparison (4GB files with 4MB sequential IO - think DVD video size)
[Bar chart: read and write MB/sec per drive (HDD), 0-250 scale, for: Manufacturer spec (HDD); Standard, baseline (1x RAID6 vol, no Gluster); Dense, baseline (3x RAID6 vol, no Gluster); Standard (EC4:2), JBOD bricks; Dense (2x Rep), RAID6 bricks; Standard (2x Rep), RAID6 bricks; Dense (EC4:2), RAID6 bricks; Standard (EC4:2), RAID6 bricks.]
Jumbo File Price-Performance Comparison (4GB files with 4MB sequential IO - think DVD video size)
[Bar chart: read and write MB/sec per $ of 3-year TCO (incl. support), 0-350 scale, for: Standard (EC4:2), JBOD bricks; Dense (2x Rep), RAID6 bricks; Standard (2x Rep), RAID6 bricks; Dense (EC4:2), RAID6 bricks; Standard (EC4:2), RAID6 bricks.]
TCO COMPARISON
For 1PB usable capacity, throughput-optimized solutions. Configuration highlights:
• HDD-only media
• 2x replication with RHGS
• 8:3 erasure coding with EMC Isilon
• Higher CPU-to-media ratio than archive-optimized
• Isilon X-210 12LFF vs. 12LFF (standard)
• Isilon X-410 36LFF vs. 36LFF (dense)
Pricing sources: EMC Isilon (Gartner Competitive Profiles, as of 2/16/16) & Supermicro (Thinkmate, as of 1/13/16)
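Part of any such TCO delta is protection overhead: 2x replication consumes half the raw capacity, while k:m erasure coding consumes k/(k+m) of it. A sketch of that arithmetic for the two schemes compared above:

```python
# Sketch: raw capacity needed for 1PB usable under the two protection
# schemes compared above. EC k:m stores k data + m parity fragments.
def raw_for_usable(usable_pb: float, scheme: str) -> float:
    if scheme == '2x-replication':          # RHGS config above
        efficiency = 1 / 2
    elif scheme == 'ec-8:3':                # Isilon config above
        efficiency = 8 / (8 + 3)
    else:
        raise ValueError(scheme)
    return usable_pb / efficiency

print(round(raw_for_usable(1.0, '2x-replication'), 2))  # 2.0 PB raw
print(round(raw_for_usable(1.0, 'ec-8:3'), 2))          # ~1.38 PB raw
```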
Comparing Throughput and Costs at Scale
[Two charts, each plotted against number of storage nodes, comparing Software-Defined Scale-out Storage vs. Traditional Enterprise NAS Storage: STORAGE PERFORMANCE SCALABILITY (y-axis: reads/writes throughput, MBps) and STORAGE COSTS SCALABILITY (y-axis: total storage costs, $).]
Small File Performance Comparison (50KB files - think small jpeg image size)
[Bar chart: read and create file operations/second per drive (HDD), 0-600 scale, 50KB files, for: Dense (EC4:2), Tiered (2x SSD/svr); Dense, no tiering; Dense, Tiered (2x SSD/svr); Standard, no tiering; Standard, Tiered (4x SSD/svr); Standard, Tiered (2x NVMe/svr), 70% full; Standard, Tiered (2x NVMe/svr); Standard, Tiered (1x NVMe/svr).]
Test drive: bit.ly/cephtestdrive and bit.ly/glustertestdrive
Try it: Building your first software-defined storage cluster
THANK YOU