1,000,000,000,000,000 bytes. On demand. Online. Live. Big doesn't quite describe this data. Amazon Web Services makes it possible to construct highly elastic computing systems, and you can further increase cost efficiency by leveraging the Spot Pricing model for Amazon EC2. We showcase elasticity by demonstrating the creation and teardown of a petabyte-scale, multi-region MongoDB NoSQL database cluster, using Amazon EC2 Spot Instances, for as little as $200 in total AWS costs. Oh, and it offers up four million IOPS to storage via the power of PIOPS EBS. Christopher Biow, Principal Technologist at 10gen | MongoDB, covers MongoDB best practices on AWS, so you can implement this NoSQL system (perhaps at a more pedestrian hundred-terabyte scale?) confidently in the cloud. You could build a massive enterprise warehouse, process a million human genomes, or collect a staggering number of cat GIFs. The possibilities are huMONGOus.
• AWS CloudFormation: Your Infrastructure belongs in your source control
AWS CloudFormation
AWS Storage Options
• Amazon EBS – Provisioned IOPS volumes
  • Deliver predictable, high performance for I/O intensive workloads
  • Specify IOPS required upfront, and EBS provisions for lifetime of volume
    – 4000 IOPS per volume, can stripe to get thousands of IOPS to an EC2 instance
• High IO Instances – hi1.4xlarge
  • For some applications that require tens of thousands of IOPS
  • Eliminates network latency/bandwidth as a performance constraint to storage
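As a rough sizing sketch of the striping point above: assuming each Provisioned IOPS volume delivers the 4000 IOPS quoted on the slide, and that a stripe scales linearly (an idealization that real workloads may not reach), the volume count for a target is a ceiling division. The function names and the constant are illustrative, not part of any AWS API.

```python
import math

# Per-volume figure quoted on the slide; treated here as a sizing assumption.
PIOPS_PER_VOLUME = 4000


def volumes_for_target(target_iops: int, per_volume: int = PIOPS_PER_VOLUME) -> int:
    """Volumes to stripe together to reach target_iops, assuming ideal linear scaling."""
    if target_iops <= 0 or per_volume <= 0:
        raise ValueError("IOPS values must be positive")
    return math.ceil(target_iops / per_volume)


def aggregate_iops(volume_count: int, per_volume: int = PIOPS_PER_VOLUME) -> int:
    """Ideal aggregate IOPS across volume_count volumes."""
    if volume_count < 0:
        raise ValueError("volume_count must not be negative")
    return volume_count * per_volume


if __name__ == "__main__":
    print(volumes_for_target(16_000))  # volumes for a 16,000 IOPS instance
    print(aggregate_iops(1024))        # ideal total across 1024 volumes
```

Under these assumptions, 1024 volumes give 4,096,000 IOPS in aggregate, which is in line with the "four million IOPS" in the abstract. Pairing one such volume with each shard is an inference from the slides, not something they state.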
Testing: random 4k reads
[Chart labels: EBS, PIOPS, SSD]
• One Volume: ~200 MongoOPS with some variability, <1mb/s
• Loaded instance: ~1000 MongoOPS with some variability, <10mb/s
• One Volume: 2000 MongoOPS with <1% variability, 16mb/s
• Loaded Instance: 16,000 MongoOPS with <1% variability, 64mb/s
• Loaded Cluster Instance: MongoOPS, 320mb/s
• Hi1.4xlarge ephemeral: ~64,000 MongoOPS with low variability, ~245mb/s
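A quick way to sanity-check figures like these is to convert operations per second into throughput for a fixed read size. The sketch below assumes "4k" means 4000 bytes and that each MongoOPS corresponds to exactly one 4k read. Both are simplifications, and the helper name is hypothetical.

```python
# "4k" taken as 4000 bytes (an assumption; 4096 bytes gives slightly higher numbers).
BLOCK_BYTES = 4_000


def throughput_mb_per_s(ops_per_s: float, block_bytes: int = BLOCK_BYTES) -> float:
    """Throughput in decimal MB/s for ops_per_s reads of block_bytes each."""
    if ops_per_s < 0 or block_bytes <= 0:
        raise ValueError("ops_per_s must be >= 0 and block_bytes must be > 0")
    return ops_per_s * block_bytes / 1_000_000


if __name__ == "__main__":
    for ops in (200, 1_000, 16_000, 64_000):
        print(f"{ops:>6} ops/s -> {throughput_mb_per_s(ops):.1f} MB/s")
```

Under these assumptions the 16,000 MongoOPS figure works out to 64 MB/s, matching the slide, and 64,000 MongoOPS gives 256 MB/s, close to the reported ~245mb/s. Lines that do not fit this simple model likely reflect reads that are not exactly one 4k block per operation.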
Stability Tips
• Ext4 or XFS, nodiratime, noatime
• Raise file descriptor limits
• Set disk read-ahead
• No large virtual memory pages
• SNAPSHOT SNAPSHOT SNAPSHOT
• Retain a PIOPS EBS node for snapshot backups
• Snapshots allow cross-AZ and cross-region recovery
• SSD hosts as primary
• Shard for scale
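Some of the host-level tips above can be checked with a small script. This is a minimal sketch for Linux, not an official MongoDB tool. It assumes "no large virtual memory pages" refers to transparent huge pages, and the 64000 file-descriptor floor is a commonly recommended value used here as an assumption. All function names are hypothetical. Read-ahead is not covered.

```python
def has_mount_options(mounts_line: str, required=("noatime",)) -> bool:
    """Check one /proc/mounts-style line (device mountpoint fstype options ...)
    for the required mount options. On Linux, noatime already implies
    nodiratime, so only noatime is required by default."""
    fields = mounts_line.split()
    if len(fields) < 4:
        return False
    options = set(fields[3].split(","))
    return all(opt in options for opt in required)


def thp_disabled(setting: str) -> bool:
    """Interpret the contents of /sys/kernel/mm/transparent_hugepage/enabled,
    where the active choice is bracketed, e.g. 'always madvise [never]'."""
    return "[never]" in setting.split()


def fd_limit_ok(soft_limit: int, minimum: int = 64_000) -> bool:
    """True if the soft open-file limit meets the assumed minimum."""
    return soft_limit >= minimum


if __name__ == "__main__":
    import resource  # Unix-only module from the standard library

    soft, _hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    print("fd limit ok:", fd_limit_ok(soft))
    with open("/sys/kernel/mm/transparent_hugepage/enabled") as fh:
        print("THP disabled:", thp_disabled(fh.read()))
```

Keeping the checks as pure functions over strings and integers makes them easy to test away from the target host. Only the `__main__` block touches the running system.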
244gb cr1.8xlarge Another option…
So, about that Petabyte
v.cheap
• Spot Market
• m1.small
• 1024 shards
• 1TB EBS from snapshot
• PowerBench reader
• Aggregation queries
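The shard count and volume size above are enough to check the petabyte claim. The sketch assumes one 1TB volume per shard and shows both readings of "1TB" (binary TiB and decimal TB), since the slide does not say which is meant. The names are illustrative.

```python
TIB = 2**40        # binary terabyte (tebibyte)
TB = 10**12        # decimal terabyte
PETABYTE = 10**15  # the "1,000,000,000,000,000 bytes" from the abstract


def cluster_capacity_bytes(shards: int, bytes_per_shard: int) -> int:
    """Raw capacity of a cluster with one volume of bytes_per_shard per shard."""
    if shards < 0 or bytes_per_shard < 0:
        raise ValueError("shards and bytes_per_shard must not be negative")
    return shards * bytes_per_shard


if __name__ == "__main__":
    for label, unit in (("TiB", TIB), ("TB", TB)):
        total = cluster_capacity_bytes(1024, unit)
        print(f"1024 x 1 {label} = {total:,} bytes, petabyte reached: {total >= PETABYTE}")
```

Either reading clears 10^15 bytes of raw volume capacity. This says nothing about replication overhead or how full each volume is.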
The naming of parts
Amazon Terms and their Nicks
• Provisioned IOPS: PIOPS
• Elastic Compute Cloud: EC2
• EC2 Spot Instances: Here, Spot!
• Auto Scaling groups: ASG
Players
MongoDB
• Document-model, NoSQL database
• Dev adoption is STRONG
• MongoDB Inc. trending toward zero h/w
• Scale-up with commodity h/w
• Scale-out with sharding
• Scale-around with replication
AWS
• PIOPS for an IO-hungry client
• 40% of MongoDB customer usage
• 90% of MongoDB internal usage
• More ports :2701[79] than :[15]521
PB & Chocolate
Differentiators for mutual customers
• Fast time-to-solution
• Easy global distribution
• Secondary index
• Geo, text, security
• Fast analytic aggregation
Challenge
Motivation: IWBCI…
• Test scale-out of MongoDB beyond typical
• Learn massive scale-out on AWS
• Do it as cheaply as possible
• Apply customer data
• Break the petabarrier
m1.small us-east-1 Spot Market
m1.small us-east-1d Spot Market
Proposal

Item            Units        Time    Unit Cost     Net Cost
m1.small Spot   1050         3hr     $0.007/hr     $22.05
m1.large        3            48hrs   $0.056/hr     $8.07
S3              1TB          1wk     $95/TB/mo     $23.75
EBS             1024 x 1TB   1hr     $100/TB/mo    $142.22
S3 EBS          1PB          ??      $0/TB         $0.00
Total                                              $196.09
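The arithmetic behind the proposal can be reproduced in a few lines. This sketch assumes a month is 720 hours (30 days) and a week is a quarter of a month, which are the conventions that appear to yield the slide's figures. The prices are the ones on the slide, not current AWS pricing, and the names are illustrative.

```python
HOURS_PER_MONTH = 720  # assumed: 30 days x 24 hours
WEEKS_PER_MONTH = 4    # assumed: one week billed as a quarter month


def line_items() -> dict:
    """Net cost in USD of each proposal line, computed from the slide's inputs."""
    return {
        # 1050 instances x 3 hours x $0.007/hr
        "m1.small Spot": 1050 * 3 * 0.007,
        # 3 instances x 48 hours x $0.056/hr
        "m1.large": 3 * 48 * 0.056,
        # 1 TB for one week at $95/TB/month
        "S3": 1 * 95 / WEEKS_PER_MONTH,
        # 1024 x 1 TB for one hour at $100/TB/month
        "EBS": 1024 * 100 / HOURS_PER_MONTH,
        # listed on the slide at $0/TB
        "S3 EBS": 0.0,
    }


def total_cost() -> float:
    """Sum of all proposal lines in USD."""
    return sum(line_items().values())


if __name__ == "__main__":
    for name, cost in line_items().items():
        print(f"{name:<14} ${cost:8.2f}")
    print(f"{'Total':<14} ${total_cost():8.2f}")
```

Under these assumptions the m1.large line computes to $8.064, slightly under the slide's $8.07, while the total still rounds to $196.09.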