Jun 29, 2015
in data transfer from S3
not including Amazon Web Services use
Architecture
Choosing a region
Building a naming scheme
Considering LISTs
Optimizing PUTs
Multipart upload
Demo
Optimizing GETs
Using CloudFront
Range-based GETs
Demo
Customer Case
BigData Corp
100/8 = 12.5 events/sec
100,000 users @ 10 events an hour = 1,000,000 events/hour ≈ 278 TPS
<my_bucket>/2013_11_13-164533125.jpg
<my_bucket>/2013_11_13-164533126.jpg
<my_bucket>/2013_11_13-164533127.jpg
<my_bucket>/2013_11_13-164533128.jpg
<my_bucket>/2013_11_12-164533129.jpg
<my_bucket>/2013_11_12-164533130.jpg
<my_bucket>/2013_11_12-164533131.jpg
<my_bucket>/2013_11_12-164533132.jpg
<my_bucket>/2013_11_11-164533133.jpg
<my_bucket>/2013_11_11-164533134.jpg
<my_bucket>/2013_11_11-164533135.jpg
<my_bucket>/2013_11_11-164533136.jpg
[Diagram: keys 1, 2, ... N mapped to partitions]
<my_bucket>/521335461-2013_11_13.jpg
<my_bucket>/465330151-2013_11_13.jpg
<my_bucket>/987331160-2013_11_13.jpg
<my_bucket>/465765461-2013_11_13.jpg
<my_bucket>/125631151-2013_11_13.jpg
<my_bucket>/934563160-2013_11_13.jpg
<my_bucket>/532132341-2013_11_13.jpg
<my_bucket>/565437681-2013_11_13.jpg
<my_bucket>/234567460-2013_11_13.jpg
<my_bucket>/456767561-2013_11_13.jpg
<my_bucket>/345565651-2013_11_13.jpg
<my_bucket>/431345660-2013_11_13.jpg
[Diagram: keys 1, 2, ... N mapped to partitions]
• Store objects as a hash of their name
  – add the original name as metadata
    • “deadmau5_mix.mp3” → 0aa316fb000eae52921aab1b4697424958a53ad9
  – prepend key name with short hash
    • 0aa3-deadmau5_mix.mp3
• Epoch time (reverse)
  – 5321354831-deadmau5_mix.mp3
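The two key-naming ideas above can be sketched in a few lines of Python. This is only an illustration: the slides do not say which hash function is used, so SHA-1 is an assumption here, and the helper names `hash_prefixed_key` and `reversed_epoch_key` are hypothetical.

```python
import hashlib


def hash_prefixed_key(name: str, prefix_len: int = 4) -> str:
    """Prepend a short hex hash of the name to the key name."""
    # SHA-1 is an assumption; any stable hash with well-spread output would do.
    digest = hashlib.sha1(name.encode("utf-8")).hexdigest()
    return f"{digest[:prefix_len]}-{name}"


def reversed_epoch_key(name: str, epoch_seconds: int) -> str:
    """Prepend the epoch timestamp with its digits reversed."""
    # Reversing puts the fastest-changing digit first, so consecutive
    # uploads no longer share a long common key prefix.
    return f"{str(epoch_seconds)[::-1]}-{name}"


if __name__ == "__main__":
    print(hash_prefixed_key("deadmau5_mix.mp3"))
    print(reversed_epoch_key("deadmau5_mix.mp3", 1384531235))
```

Reversing the digits of epoch time 1384531235 gives 5321354831, the prefix shown in the slide example.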
<my_bucket>/images/521335461-2013_11_13.jpg
<my_bucket>/images/465330151-2013_11_13.jpg
<my_bucket>/movies/293924440-2013_11_13.jpg
<my_bucket>/movies/987331160-2013_11_13.jpg
<my_bucket>/thumbs-small/838434842-2013_11_13.jpg
<my_bucket>/thumbs-small/342532454-2013_11_13.jpg
<my_bucket>/thumbs-small/345233453-2013_11_13.jpg
<my_bucket>/thumbs-small/345453454-2013_11_13.jpg
Multipart upload
• Faster, more flexible uploads
• Upload an object as a set of parts
• Amazon S3 presents all parts as a single object
• Parallel uploads, pausing and resuming
• Beginning uploads before you know the total object size
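As a rough sketch of what "a set of parts" means in practice, the Python below splits an object of known size into numbered byte ranges that could be uploaded in parallel. The function name `plan_parts` and the 8 MiB default are illustrative assumptions. The 5 MiB minimum part size (for every part except the last) and the 10,000-part limit are Amazon S3 limits. The upload calls themselves are left out.

```python
MIN_PART_SIZE = 5 * 1024 * 1024  # S3 minimum for every part except the last
MAX_PARTS = 10_000               # S3 limit on parts per multipart upload


def plan_parts(total_size: int, part_size: int = 8 * 1024 * 1024):
    """Return (part_number, first_byte, last_byte) tuples covering the object."""
    if total_size <= 0:
        raise ValueError("total_size must be positive")
    if part_size < MIN_PART_SIZE:
        raise ValueError("part_size is below the 5 MiB minimum")
    parts = []
    start = 0
    number = 1  # S3 part numbers start at 1
    while start < total_size:
        end = min(start + part_size, total_size) - 1  # inclusive last byte
        parts.append((number, start, end))
        start = end + 1
        number += 1
    if len(parts) > MAX_PARTS:
        raise ValueError("too many parts; increase part_size")
    return parts


if __name__ == "__main__":
    for part in plan_parts(20 * 1024 * 1024):
        print(part)
```

Each tuple is independent of the others, which is what allows parts to be sent in parallel or retried individually after a pause.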
DEMO: Multipart Uploads
DEMO: Amazon CloudFront vs. Amazon S3 download performance
• Align your ranges with your parts!
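One way to read "align your ranges with your parts" is to request byte ranges whose boundaries fall on the same offsets as the parts used at upload time. The sketch below builds HTTP `Range` header values for that case. The function name `aligned_range_headers` is hypothetical, and it assumes every part except the last was uploaded with the same size.

```python
def aligned_range_headers(object_size: int, part_size: int):
    """Return HTTP Range header values, one per part-sized slice of the object."""
    if object_size <= 0 or part_size <= 0:
        raise ValueError("object_size and part_size must be positive")
    headers = []
    for start in range(0, object_size, part_size):
        # HTTP byte ranges are inclusive at both ends.
        end = min(start + part_size, object_size) - 1
        headers.append(f"bytes={start}-{end}")
    return headers


if __name__ == "__main__":
    # A 25-byte object uploaded in 10-byte parts (tiny numbers for readability).
    print(aligned_range_headers(25, 10))
```

Each value can be sent as the `Range` header of a separate GET request, so the slices can be fetched in parallel.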
DEMO: Range-based GETs
[Architecture diagram: BigData Corp]
• AWS services: DynamoDB, Amazon RDS, Amazon CloudSearch, Amazon EC2
• Maestro (Reserved Instance)
• List of crawl URLs
• Main workers (Spot Instances): execute crawling and process data
• Secondary workers (queue listeners, Spot Instances): reprocess data, query additional services, store data on MongoDB
• Secondary work queues: processed data
• MongoDB cluster
• Command and Control Queue