(PFC403) Maximizing Amazon S3 Performance | AWS re:Invent 2014

Posted on 29-Jun-2015


DESCRIPTION

This session drills deep into the Amazon S3 technical best practices that help you maximize storage performance for your use case. We provide real-world examples and discuss the impact of object naming conventions and parallelism on Amazon S3 performance, and describe the best practices for multipart uploads and byte-range downloads.

Transcript

in data transfer from S3

not including Amazon Web Services use

Architecture

Choosing a region

Building a naming scheme

Considering LISTs

Optimizing PUTs

Multipart upload

Demo

Optimizing GETs

Using CloudFront

Range-based GETs

Demo

Customer Case

BigData Corp

Request Rate and Performance Considerations

TIP: http://amzn.to/18oF5LC


100/8 = 12.5 events/sec

100,000 users @ 10 events an hour = 1,000,000 events/hour / 3,600 s ≈ 278 TPS
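The request-rate estimate above is simple arithmetic, and a small helper makes it easy to rerun with your own numbers. This is a minimal sketch; the function name is hypothetical, and it assumes events are spread evenly across the hour.

```python
SECONDS_PER_HOUR = 3600


def requests_per_second(users: int, events_per_user_per_hour: float) -> float:
    """Average request rate, assuming events are spread evenly over the hour."""
    return users * events_per_user_per_hour / SECONDS_PER_HOUR


if __name__ == "__main__":
    # The slide's scenario: 100,000 users, each generating 10 events an hour.
    print(f"{requests_per_second(100_000, 10):.1f} TPS")
```

Real traffic is usually burstier than an hourly average, so treat the result as a lower bound on the peak rate rather than the peak itself.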

<my_bucket>/2013_11_13-164533125.jpg
<my_bucket>/2013_11_13-164533126.jpg
<my_bucket>/2013_11_13-164533127.jpg
<my_bucket>/2013_11_13-164533128.jpg
<my_bucket>/2013_11_12-164533129.jpg
<my_bucket>/2013_11_12-164533130.jpg
<my_bucket>/2013_11_12-164533131.jpg
<my_bucket>/2013_11_12-164533132.jpg
<my_bucket>/2013_11_11-164533133.jpg
<my_bucket>/2013_11_11-164533134.jpg
<my_bucket>/2013_11_11-164533135.jpg
<my_bucket>/2013_11_11-164533136.jpg

[Diagram: the keys above mapped to partitions 1 to N]

<my_bucket>/521335461-2013_11_13.jpg
<my_bucket>/465330151-2013_11_13.jpg
<my_bucket>/987331160-2013_11_13.jpg
<my_bucket>/465765461-2013_11_13.jpg
<my_bucket>/125631151-2013_11_13.jpg
<my_bucket>/934563160-2013_11_13.jpg
<my_bucket>/532132341-2013_11_13.jpg
<my_bucket>/565437681-2013_11_13.jpg
<my_bucket>/234567460-2013_11_13.jpg
<my_bucket>/456767561-2013_11_13.jpg
<my_bucket>/345565651-2013_11_13.jpg
<my_bucket>/431345660-2013_11_13.jpg

[Diagram: the keys above mapped to partitions 1 to N]
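To see why the second naming scheme spreads load better, compare how many keys share the same leading characters. This minimal sketch assumes a simplified model in which S3 groups keys by their leading characters, as the partition diagrams suggest. `prefix_spread` and the three-character window are illustrative choices, not S3 internals.

```python
from collections import Counter

# Simplified model (an assumption for illustration): keys that share their
# leading characters are grouped together, so a long shared prefix such as a
# date concentrates requests on one part of the key space.


def prefix_spread(keys: list[str], prefix_len: int = 3) -> Counter:
    """Count how many keys share each leading `prefix_len`-character prefix."""
    return Counter(key[:prefix_len] for key in keys)


# A few key names from the slides (bucket name omitted).
date_first = [
    "2013_11_13-164533125.jpg",
    "2013_11_13-164533126.jpg",
    "2013_11_12-164533129.jpg",
    "2013_11_11-164533133.jpg",
]
random_first = [
    "521335461-2013_11_13.jpg",
    "465330151-2013_11_13.jpg",
    "987331160-2013_11_13.jpg",
    "125631151-2013_11_13.jpg",
]

if __name__ == "__main__":
    print("date first:  ", dict(prefix_spread(date_first)))
    print("random first:", dict(prefix_spread(random_first)))
```

All four date-first keys share the prefix `201`, while each of the four randomized keys starts differently.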

• Store objects as a hash of their name, and add the original name as metadata
  – "deadmau5_mix.mp3" → 0aa316fb000eae52921aab1b4697424958a53ad9
• Prepend the key name with a short hash
  – 0aa3-deadmau5_mix.mp3
• Use the epoch time, reversed
  – 5321354831-deadmau5_mix.mp3
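The three naming strategies above can be sketched as small key-building functions. The slide's digest is 40 hex characters, the length of a SHA-1 digest, so this sketch assumes SHA-1; it is not confirmed that the slide used it, and the printed hashes may not match the slide. The metadata field name `original-name` is hypothetical, and the epoch value 1384531235 is chosen only because reversing its digits reproduces the slide's example key.

```python
import hashlib


def _sha1_hex(name: str) -> str:
    # SHA-1 is an assumption; any well-distributed hash spreads keys similarly.
    return hashlib.sha1(name.encode("utf-8")).hexdigest()


def hashed_key(name: str) -> tuple[str, dict[str, str]]:
    """Key is the full hash; the original name travels as object metadata."""
    # "original-name" is a hypothetical metadata field name.
    return _sha1_hex(name), {"original-name": name}


def short_hash_key(name: str, length: int = 4) -> str:
    """Prepend a short hash so keys no longer share a common leading prefix."""
    return f"{_sha1_hex(name)[:length]}-{name}"


def reversed_epoch_key(epoch_seconds: int, name: str) -> str:
    """Reverse the epoch digits so the fastest-changing digit comes first."""
    return f"{str(epoch_seconds)[::-1]}-{name}"


if __name__ == "__main__":
    print(hashed_key("deadmau5_mix.mp3"))
    print(short_hash_key("deadmau5_mix.mp3"))
    print(reversed_epoch_key(1384531235, "deadmau5_mix.mp3"))  # 5321354831-deadmau5_mix.mp3
```

The trade-off: a full hash gives up listing by a meaningful name, whereas a short hash prefix or a reversed timestamp keeps the readable name in the key.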

<my_bucket>/images/521335461-2013_11_13.jpg
<my_bucket>/images/465330151-2013_11_13.jpg
<my_bucket>/movies/293924440-2013_11_13.jpg
<my_bucket>/movies/987331160-2013_11_13.jpg
<my_bucket>/thumbs-small/838434842-2013_11_13.jpg
<my_bucket>/thumbs-small/342532454-2013_11_13.jpg
<my_bucket>/thumbs-small/345233453-2013_11_13.jpg
<my_bucket>/thumbs-small/345453454-2013_11_13.jpg
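Prefixes such as `images/` and `movies/` make it easy to scope a LIST to one kind of object, while the random component after the prefix keeps keys spread out. The sketch below models, locally and without calling S3, how a LIST with a prefix and the `/` delimiter rolls keys up into common prefixes. With a real SDK you would pass these as request parameters instead (for example, `Prefix` and `Delimiter` in boto3's `list_objects_v2`). The helper name is hypothetical.

```python
def list_with_delimiter(
    keys: list[str], prefix: str = "", delimiter: str = "/"
) -> tuple[list[str], list[str]]:
    """Local model of LIST with Prefix/Delimiter: returns (contents, common_prefixes)."""
    contents: list[str] = []
    common_prefixes: set[str] = set()
    for key in sorted(keys):
        if not key.startswith(prefix):
            continue
        rest = key[len(prefix):]
        cut = rest.find(delimiter)
        if cut == -1:
            contents.append(key)  # no further delimiter: returned as an object
        else:
            # Roll everything up to and including the delimiter into one prefix.
            common_prefixes.add(prefix + rest[: cut + len(delimiter)])
    return contents, sorted(common_prefixes)


# Keys from the slide (bucket name omitted).
keys = [
    "images/521335461-2013_11_13.jpg",
    "images/465330151-2013_11_13.jpg",
    "movies/293924440-2013_11_13.jpg",
    "movies/987331160-2013_11_13.jpg",
    "thumbs-small/838434842-2013_11_13.jpg",
    "thumbs-small/342532454-2013_11_13.jpg",
    "thumbs-small/345233453-2013_11_13.jpg",
    "thumbs-small/345453454-2013_11_13.jpg",
]

if __name__ == "__main__":
    print(list_with_delimiter(keys))                    # top-level "folders"
    print(list_with_delimiter(keys, prefix="movies/"))  # objects under movies/
```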


Multipart upload: faster and more flexible
• Uploads a single object as a set of parts
• Amazon S3 presents all parts as a single object
• Enables parallel uploads, pausing and resuming, and beginning uploads before you know the total object size
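Multipart upload splits an object into parts that can be sent in parallel. This minimal sketch plans the byte ranges while respecting the S3 limits of at least 5 MiB per part (except the last) and at most 10,000 parts per upload, then uploads the parts on a thread pool. `upload_part` is a hypothetical callable standing in for a real UploadPart request that returns the part's ETag, and the 8 MiB default part size is an illustrative choice. The collected (part number, ETag) pairs are what the request that completes the upload needs.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable

MIB = 1024 * 1024
MIN_PART_SIZE = 5 * MIB  # S3 minimum for every part except the last
MAX_PARTS = 10_000       # S3 limit on parts per upload


def plan_parts(total_size: int, part_size: int = 8 * MIB) -> list[tuple[int, int, int]]:
    """Split an object into (part_number, first_byte, last_byte) ranges."""
    if part_size < MIN_PART_SIZE:
        raise ValueError("part_size must be at least 5 MiB")
    # Grow the part size if needed so the upload stays within MAX_PARTS (ceiling division).
    part_size = max(part_size, -(-total_size // MAX_PARTS))
    return [
        (number, start, min(start + part_size, total_size) - 1)
        for number, start in enumerate(range(0, total_size, part_size), start=1)
    ]


def upload_in_parallel(
    data: bytes,
    upload_part: Callable[[int, bytes], str],
    part_size: int = 8 * MIB,
    workers: int = 4,
) -> list[tuple[int, str]]:
    """Upload parts concurrently; return (part_number, etag) pairs in order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [
            (number, pool.submit(upload_part, number, data[start : end + 1]))
            for number, start, end in plan_parts(len(data), part_size)
        ]
        return [(number, future.result()) for number, future in futures]


if __name__ == "__main__":
    # Hypothetical stand-in for a real UploadPart call; returns a fake ETag.
    def fake_upload_part(number: int, chunk: bytes) -> str:
        return f"etag-{number}-{len(chunk)}"

    print(upload_in_parallel(bytes(20 * MIB), fake_upload_part))
```

Because parts are independent, a failed part can be retried on its own, which is what makes pausing and resuming possible.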

DEMO: Multipart Uploads

DEMO: Amazon CloudFront vs. Amazon S3 download performance

• Align your ranges with your parts!
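Range-based GETs fetch byte ranges of one object in parallel. If the ranges use the same boundaries as the parts the object was uploaded with, each request maps to exactly one part. In this minimal sketch, `fetch_range` is a hypothetical callable standing in for a GET request that carries an HTTP Range header (for example, the `Range` parameter of boto3's `get_object`), and the object is simulated in memory.

```python
import re
from concurrent.futures import ThreadPoolExecutor
from typing import Callable


def range_headers(total_size: int, part_size: int) -> list[str]:
    """HTTP Range header values that line up with fixed-size upload parts."""
    return [
        f"bytes={start}-{min(start + part_size, total_size) - 1}"
        for start in range(0, total_size, part_size)
    ]


def parallel_get(
    fetch_range: Callable[[str], bytes], total_size: int, part_size: int, workers: int = 4
) -> bytes:
    """Fetch all ranges concurrently and reassemble them in order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() yields results in submission order, so chunks stay in sequence.
        return b"".join(pool.map(fetch_range, range_headers(total_size, part_size)))


if __name__ == "__main__":
    blob = bytes(range(256)) * 4  # stand-in for an object stored in S3

    def fake_fetch(header: str) -> bytes:
        # Hypothetical stand-in for a GET carrying this Range header.
        start, end = map(int, re.fullmatch(r"bytes=(\d+)-(\d+)", header).groups())
        return blob[start : end + 1]

    print(range_headers(len(blob), 300))
    print(parallel_get(fake_fetch, len(blob), 300) == blob)
```

HTTP byte ranges are inclusive at both ends, which is why each header ends at `start + part_size - 1`.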

DEMO: Range-based GETs

[Architecture diagram, BigData Corp. Components: Amazon DynamoDB, Amazon RDS, Amazon CloudSearch, Amazon EC2; Maestro (Reserved Instance); list of crawl URLs; main workers (Spot Instances) that execute crawling and process data; secondary workers (Spot Instances, queue listeners) that reprocess data, query additional services, and store data on MongoDB; secondary work queues of processed data; MongoDB cluster; command and control queue.]
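The diagram describes a queue-driven pipeline: main workers crawl and process URLs, and secondary workers listen on work queues and store the processed data. The sketch below is a hypothetical, single-process illustration of that pattern using Python threads and in-memory queues. It is not BigData Corp's implementation; the queues, the list standing in for the MongoDB cluster, and all function names are assumptions.

```python
import queue
import threading


def main_worker(crawl_urls: queue.Queue, work_queue: queue.Queue) -> None:
    """Take URLs until none remain; 'crawl' each and queue the processed result."""
    while True:
        try:
            url = crawl_urls.get_nowait()
        except queue.Empty:
            return
        # Placeholder for real crawling and processing.
        work_queue.put({"url": url, "size": len(url)})


def secondary_worker(work_queue: queue.Queue, store: list, lock: threading.Lock) -> None:
    """Listen on the work queue and persist items until a None sentinel arrives."""
    while True:
        item = work_queue.get()
        if item is None:
            return
        with lock:  # the list stands in for the MongoDB cluster
            store.append(item)


def run_pipeline(urls: list[str], n_main: int = 2, n_secondary: int = 2) -> list:
    crawl_urls: queue.Queue = queue.Queue()
    for url in urls:
        crawl_urls.put(url)
    work_queue: queue.Queue = queue.Queue()
    store: list = []
    lock = threading.Lock()

    secondaries = [
        threading.Thread(target=secondary_worker, args=(work_queue, store, lock))
        for _ in range(n_secondary)
    ]
    mains = [
        threading.Thread(target=main_worker, args=(crawl_urls, work_queue))
        for _ in range(n_main)
    ]
    for thread in secondaries + mains:
        thread.start()
    for thread in mains:
        thread.join()
    # All real items are queued before the sentinels, so FIFO order drains them first.
    for _ in secondaries:
        work_queue.put(None)
    for thread in secondaries:
        thread.join()
    return store


if __name__ == "__main__":
    results = run_pipeline([f"http://example.com/{i}" for i in range(10)])
    print(len(results))
```

Decoupling the two tiers with a queue lets each tier scale independently, which fits running both on interruptible Spot Instances.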


gfelipe@amazon.com

thoran@bigdatacorp.com.br

Please give us your feedback on this presentation
