Page 1
© 2016, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
STG215
December 1, 2016
How Amazon S3 Storage Management Helps Optimize Storage at Scale, with Special Guest, Pinterest
Omair Gillani, Sr. Product Manager, AWS
John Elliott, Mgr. Data and Storage, Pinterest
Page 2
What to Expect from the Session
• How we think about Storage Management for Amazon S3
• Storage Management portfolio for S3
• Understand your data
• Monitor your data
• Manage your data
• Pulling it all together
• Storage management @ Pinterest
Page 3
How we think about Storage Management for Amazon S3
Page 4
[Chart: Amazon storage usage, 2012 to 2014]
Trillions of objects
Millions of transactions per second
Page 5
“Why Storage Management?”
• What data do I have?
• How is my data being used?
• How can I better manage my data?
• Do I have data that is not being accessed?
• Can I perform data-driven storage management?
• What data should I archive?
[Cartoon: The New Yorker, 2013]
Page 6
A comprehensive Storage Management portfolio for Amazon S3
Page 7
Storage Management for S3
• Cross-Region Replication
• Lifecycle Policy
• S3 Object Tags
• Event Notifications
• Amazon S3 CloudWatch Metrics
• S3 Inventory
• Audit with AWS CloudTrail S3 Data Events
• S3 Analytics
Storage classes: Standard, Standard - Infrequent Access, Amazon Glacier
Page 8
Understand your storage usage
• S3 Inventory
• Analyze Logs with Amazon EMR
• S3 Analytics
Page 9
S3 Inventory
• Save time
• Daily or Weekly delivery
• Delivery to S3 bucket
• CSV File Output
Trigger business workflows and applications such as secondary index, garbage collection,
data auditing, and offline analytics
Half the price of LIST API at $0.0025 per million objects listed
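As a rough worked example of the pricing above, the sketch below estimates what one inventory report costs for a bucket of a given size. The LIST rate is not stated on the slide; it is inferred here only from the "half the price" claim, and the function names and the 10-billion-object bucket are illustrative.

```python
# Rate taken from the slide, in USD per million objects listed.
INVENTORY_USD_PER_MILLION = 0.0025
# Assumption: derived from "half the price of LIST API", not a quoted price.
LIST_USD_PER_MILLION = 2 * INVENTORY_USD_PER_MILLION


def inventory_cost_usd(object_count: int) -> float:
    """Estimated cost of one inventory report covering object_count objects."""
    return object_count / 1_000_000 * INVENTORY_USD_PER_MILLION


def list_cost_usd(object_count: int) -> float:
    """Estimated cost of enumerating the same objects through LIST."""
    return object_count / 1_000_000 * LIST_USD_PER_MILLION


if __name__ == "__main__":
    n = 10_000_000_000  # hypothetical bucket with 10 billion objects
    print(f"inventory: ${inventory_cost_usd(n):.2f}  LIST: ${list_cost_usd(n):.2f}")
```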
Page 10
S3 Inventory
More information about your objects than provided by LIST API such as replication
status, multipart upload flag, and delete marker
Name | Value Type | Description
Bucket | String | Bucket name. UTF-8 encoded.
Key | String | Object key name. UTF-8 encoded.
Version Id | String | Version Id of the object
Is Latest | Boolean | true if object is the latest version (current version) of a versioned object, otherwise false
Delete Marker | Boolean | true if object is a delete marker of a versioned object, otherwise false
Size | Long | Object size in bytes
Last Modified | String | Last modified timestamp. Format in ISO: YYYY-MM-DDTHH:mm:ss.SSSZ
ETag | String | eTag in HEX encoded format
StorageClass | String | Valid values: STANDARD, REDUCED_REDUNDANCY, GLACIER, STANDARD_IA. UTF-8 encoded.
Multipart Uploaded | Boolean | true if object is uploaded by using multipart, otherwise false
Replication Status | String | Valid values: REPLICA, COMPLETED, PENDING, FAILED. UTF-8 encoded.
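The fields above can be consumed with a few lines of code. The following is a minimal sketch of reading one gzip-compressed inventory CSV file, assuming the columns arrive in the order the table lists them; the column identifiers and both function names are illustrative, and a real job would take the column order from the report's manifest rather than hard-coding it.

```python
import csv
import gzip
import io
from typing import Dict, Iterable, Iterator, List

# Assumed column order, following the field table. Treat as a placeholder:
# the authoritative order for a given report should come from its manifest.
COLUMNS = ["Bucket", "Key", "VersionId", "IsLatest", "IsDeleteMarker", "Size",
           "LastModified", "ETag", "StorageClass", "IsMultipartUploaded",
           "ReplicationStatus"]


def read_inventory(gz_bytes: bytes) -> Iterator[Dict[str, object]]:
    """Yield one dict per row of a gzip-compressed inventory CSV file."""
    with gzip.open(io.BytesIO(gz_bytes), mode="rt", encoding="utf-8", newline="") as fh:
        for row in csv.reader(fh):
            rec: Dict[str, object] = dict(zip(COLUMNS, row))
            # Size can be empty (for example on delete markers); treat as 0.
            rec["Size"] = int(rec.get("Size") or 0)
            for flag in ("IsLatest", "IsDeleteMarker", "IsMultipartUploaded"):
                rec[flag] = str(rec.get(flag, "")).lower() == "true"
            yield rec


def failed_replications(records: Iterable[Dict[str, object]]) -> List[object]:
    """Keys whose replication status is FAILED, a field LIST does not return."""
    return [r["Key"] for r in records if r.get("ReplicationStatus") == "FAILED"]
```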
Page 11
S3 Inventory
Set up notifications for when S3 Inventory is complete
Destination bucket layout:
/Data/<InventoryFile>.gz
/<InventoryFile>.gz
…
/<DayofReport>/manifest.json
/manifest.checksum
…
Notification targets: AWS Lambda, Amazon SQS, Amazon SNS
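One way to wire this up is an S3 event notification on the destination bucket that invokes an AWS Lambda function. The sketch below assumes the arrival of manifest.checksum signals a finished report, and it uses the standard S3 event notification shape (Records, s3.bucket.name, s3.object.key). The handler name and what it does with the result are hypothetical.

```python
from typing import Dict, List, Tuple
from urllib.parse import unquote_plus


def completed_manifests(event: Dict) -> List[Tuple[str, str]]:
    """Return (bucket, manifest.json key) pairs for finished inventory reports."""
    found = []
    for record in event.get("Records", []):
        s3 = record.get("s3", {})
        # Object keys arrive URL-encoded in S3 event notifications.
        key = unquote_plus(s3.get("object", {}).get("key", ""))
        if key.endswith("/manifest.checksum"):
            manifest_key = key[: -len("manifest.checksum")] + "manifest.json"
            found.append((s3["bucket"]["name"], manifest_key))
    return found


def lambda_handler(event, context):
    # Hypothetical entry point: a real handler would fetch each manifest.json
    # here and start the downstream workflow (indexing, auditing, and so on).
    return {"manifests": completed_manifests(event)}
```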
Page 12
S3 Inventory
Eventually consistent rolling snapshot
• New objects may not be listed
• Recently deleted objects may still be included
[Diagram: rolling inventory snapshots of objects O1, O2, and O3 (NEW)]
Validate before you act! Use HEAD Object or GET Object.
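A validation step along these lines might look like the sketch below. To keep it runnable without AWS access, the HEAD Object call is passed in as a function; `head_fn` and `still_current` are illustrative names. With boto3, `head_fn` could wrap `s3.head_object(Bucket=..., Key=...)` and return None when the call fails with a 404.

```python
from typing import Callable, Dict, Optional

# head_fn stands in for HEAD Object: it returns the response fields as a dict,
# or None when the object no longer exists.
HeadFn = Callable[[str, str], Optional[Dict]]


def still_current(row: Dict, head_fn: HeadFn) -> bool:
    """True if the object named in an inventory row still exists unchanged."""
    head = head_fn(row["Bucket"], row["Key"])
    if head is None:
        return False  # deleted after the snapshot was taken
    # HEAD returns the ETag wrapped in double quotes; the inventory does not.
    live_etag = str(head.get("ETag", "")).strip('"')
    return live_etag == row["ETag"] and head.get("ContentLength") == row["Size"]
```

Comparing both ETag and size is a conservative choice for this sketch: an object overwritten since the snapshot fails the check and is skipped rather than acted on.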
Page 13
S3 Analytics – Storage Class Analysis
Data-driven storage management for S3
• Analyze buckets, prefixes, or tags
• Daily Storage Class Analysis and Lifecycle candidates
• Export Analysis data to your S3 bucket
• $0.10 per million objects analyzed per month
Page 14
S3 Analytics – Storage Class Analysis
Export to use the BI tool of your choice
Page 15
Demo
Heavily used storage
Archival storage
Infrequently used storage
Page 16
S3 Analytics – Storage Class Analysis
Page 17
S3 Analytics – Storage Class Analysis
Page 18
Simple to configure S3 Analytics
• S3 Management Console
• PUT Bucket Analytics
• Multiple Policy Documents
<AnalyticsConfiguration>
<Id>...</Id>
<Filter>
...
</Filter>
<StorageClassAnalysis>
<DataExport>
...
</DataExport>
</StorageClassAnalysis>
</AnalyticsConfiguration>
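The same configuration can be expressed programmatically. The sketch below builds the dictionary form that boto3's `put_bucket_analytics_configuration` accepts, filling the elided parts of the XML with example values only: the bucket ARN, prefixes, and configuration Id are assumptions, not values from the talk.

```python
from typing import Dict, Optional


def analytics_configuration(config_id: str, export_bucket_arn: str,
                            prefix: Optional[str] = None,
                            export_prefix: str = "analytics/") -> Dict:
    """Build a Storage Class Analysis configuration with a CSV data export."""
    config: Dict = {
        "Id": config_id,
        "StorageClassAnalysis": {
            "DataExport": {
                "OutputSchemaVersion": "V_1",
                "Destination": {
                    "S3BucketDestination": {
                        "Format": "CSV",
                        "Bucket": export_bucket_arn,
                        "Prefix": export_prefix,
                    }
                },
            }
        },
    }
    if prefix is not None:
        # Omitting the filter analyzes the entire bucket.
        config["Filter"] = {"Prefix": prefix}
    return config


# Applying it needs boto3 and credentials, so it is left as a comment:
#   boto3.client("s3").put_bucket_analytics_configuration(
#       Bucket="example-bucket", Id="logs-analysis",
#       AnalyticsConfiguration=analytics_configuration(
#           "logs-analysis", "arn:aws:s3:::example-export", prefix="logs/"))
```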
Page 19
Monitor your storage
• Monitor and Alert with CloudWatch
• Audit your storage with CloudTrail Data Events
• Server Access Logs
Page 20
CloudWatch metrics for S3
Operational & performance monitoring
• Generate metrics for data of your choice
• Entire bucket, Prefixes, and Tags
• Up to 1,000 object groups
• 1-minute CloudWatch metrics
• Alert and alarm on metrics
Page 21
CloudWatch metrics for S3
Metric Name | Metric value
AllRequests | Count
PutRequests | Count
PostRequests | Count
GetRequests | Count
ListRequests | Count
DeleteRequests | Count
HeadRequests | Count
BytesDownloaded | MB
BytesUploaded | MB
4xxErrors | Count
5xxErrors | Count
FirstByteLatency | ms
TotalRequestLatency | ms
$0.30 per metric per month
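To make the metrics above concrete, here is a hedged sketch of the two request payloads involved: a metrics configuration for one object group (as passed to boto3's `put_bucket_metrics_configuration`) and an alarm on `5xxErrors` (as passed to CloudWatch's `put_metric_alarm`). The alarm name, threshold, evaluation window, and SNS topic are illustrative choices.

```python
from typing import Dict, Optional


def metrics_configuration(filter_id: str, prefix: Optional[str] = None) -> Dict:
    """Request metrics for one object group; no prefix means the whole bucket."""
    config: Dict = {"Id": filter_id}
    if prefix:
        config["Filter"] = {"Prefix": prefix}
    return config


def server_error_alarm(bucket: str, filter_id: str, threshold: int,
                       topic_arn: str) -> Dict:
    """Keyword arguments for an alarm on 5xxErrors at 1-minute granularity."""
    return {
        "AlarmName": f"{bucket}-{filter_id}-5xx",  # hypothetical naming scheme
        "Namespace": "AWS/S3",
        "MetricName": "5xxErrors",
        "Dimensions": [
            {"Name": "BucketName", "Value": bucket},
            {"Name": "FilterId", "Value": filter_id},
        ],
        "Statistic": "Sum",
        "Period": 60,             # seconds, matching the 1-minute metrics
        "EvaluationPeriods": 5,   # illustrative: five consecutive minutes
        "Threshold": float(threshold),
        "ComparisonOperator": "GreaterThanThreshold",
        "AlarmActions": [topic_arn],
    }


# With boto3 and credentials these would be applied roughly as:
#   s3.put_bucket_metrics_configuration(Bucket=b, Id=i,
#                                       MetricsConfiguration=metrics_configuration(i, p))
#   cloudwatch.put_metric_alarm(**server_error_alarm(b, i, 5, topic_arn))
```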
Page 22
Demo
S3 CloudWatch Metrics
Page 23
S3 Data Events in CloudTrail
Perform security analysis, meet your IT auditing and compliance needs,
and act on object-level activity right away to improve your
security posture
Pricing: $1 per million data events recorded and storage charges apply
• Log object-level operations
• Changes to bucket configurations
• SNS notification for log delivery
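As an illustration of turning on object-level logging, the sketch below builds the event selector list that CloudTrail's `put_event_selectors` call accepts for S3 data events, plus a cost estimate at the quoted $1 per million events. The bucket name, prefix, and function names are placeholders, and storage charges for the delivered logs are not included.

```python
from typing import Dict, List

DATA_EVENT_USD_PER_MILLION = 1.0  # from the slide; log storage is billed separately


def s3_data_event_selectors(bucket: str, prefix: str = "",
                            read_write: str = "All") -> List[Dict]:
    """Event selectors that record object-level operations under bucket/prefix."""
    if read_write not in ("ReadOnly", "WriteOnly", "All"):
        raise ValueError(f"unsupported ReadWriteType: {read_write}")
    return [{
        "ReadWriteType": read_write,
        "IncludeManagementEvents": True,
        "DataResources": [{
            "Type": "AWS::S3::Object",
            "Values": [f"arn:aws:s3:::{bucket}/{prefix}"],
        }],
    }]


def data_event_cost_usd(event_count: int) -> float:
    """Recording cost only, for event_count data events."""
    return event_count / 1_000_000 * DATA_EVENT_USD_PER_MILLION


# With boto3 and an existing trail (the trail name is a placeholder):
#   cloudtrail.put_event_selectors(
#       TrailName="example-trail",
#       EventSelectors=s3_data_event_selectors("example-bucket", "secure/"))
```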
Page 24
Manage your data
• Cross-Region Replication
• Lifecycle Policies
• Event Notifications
• S3 Object Tags
Page 25
Manage your data
S3 Object Tags
Easily manage and control access for Amazon S3 objects
• Classify your data
• Tag your objects with key-value pairs
• Write policies once based on the type of data
• Analyze
• Lifecycle Policy
• Access Control
Page 26
Deep dive on tags
• Tags are key-value pairs
• Maximum 10 tags per object
• Maximum key length—127 Unicode characters
• Maximum value length—255 Unicode characters
• Tag keys and values are case-sensitive.
Two ways to add tags via the API
• Put the object with the tag parameter, or
• Call the add tag API after the object is created
Simple pricing
• $0.01 per 10,000 tags per month
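The limits and pricing above lend themselves to a small pre-flight check. The sketch below validates a tag set against the limits quoted on this slide and URL-encodes it in the query-string form that the tagging parameter on a PUT expects (for example the `Tagging` argument of boto3's `put_object`). The limits are taken from the slide as-is and the helper names are illustrative.

```python
from typing import Dict
from urllib.parse import urlencode

# Limits as quoted on the slide.
MAX_TAGS = 10
MAX_KEY_CHARS = 127
MAX_VALUE_CHARS = 255


def tagging_parameter(tags: Dict[str, str]) -> str:
    """Validate a tag set and encode it as URL query parameters."""
    if len(tags) > MAX_TAGS:
        raise ValueError(f"at most {MAX_TAGS} tags per object, got {len(tags)}")
    for key, value in tags.items():
        if not key or len(key) > MAX_KEY_CHARS:
            raise ValueError(f"tag key must be 1..{MAX_KEY_CHARS} characters: {key!r}")
        if len(value) > MAX_VALUE_CHARS:
            raise ValueError(f"tag value longer than {MAX_VALUE_CHARS} characters: {key!r}")
    return urlencode(tags)


def monthly_tag_cost_usd(tag_count: int) -> float:
    """Cost at $0.01 per 10,000 tags per month."""
    return tag_count / 10_000 * 0.01
```

Because tag keys and values are case-sensitive, this sketch does not normalize case: `Project` and `project` are treated as different keys.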
Page 27
What can I do with tags?
{
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Action": [
"s3:GetObject"
],
"Resource": "arn:aws:s3:::EXAMPLE-BUCKET-NAME/*",
"Condition": {"StringEquals": {"s3:ExistingObjectTag/HIPAA":"True"}}
}
]
}
Manage permissions with tags
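A policy like this can be generated per data classification rather than written by hand. The following sketch emits an equivalent policy document as JSON; it uses the `s3:ExistingObjectTag/<key>` condition key, which is the documented S3 key for matching tags on an existing object, and the function name and arguments are illustrative.

```python
import json


def tag_read_policy(bucket: str, tag_key: str, tag_value: str) -> str:
    """Policy allowing s3:GetObject only on objects carrying tag_key=tag_value."""
    policy = {
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Allow",
            "Action": ["s3:GetObject"],
            "Resource": f"arn:aws:s3:::{bucket}/*",
            "Condition": {
                "StringEquals": {f"s3:ExistingObjectTag/{tag_key}": tag_value}
            },
        }],
    }
    return json.dumps(policy, indent=2)


if __name__ == "__main__":
    print(tag_read_policy("EXAMPLE-BUCKET-NAME", "HIPAA", "True"))
```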
Page 28
Lifecycle policies based on tags
<LifecycleConfiguration>
<Rule>
<ID>sample-rule</ID>
<Filter>
<And>
<Prefix>documents/</Prefix>
<Tag>
<Key>Project</Key>
<Value>Delta</Value>
</Tag>
<Tag>
<Key>Data type</Key>
<Value>HPI</Value>
</Tag>
</And>
</Filter>
<Status>Enabled</Status>
<Transition>
<Days>365</Days>
<StorageClass>GLACIER</StorageClass>
</Transition>
<Expiration>
<Days>3650</Days>
</Expiration>
</Rule>
</LifecycleConfiguration>
• Transition or expire storage using tags
• Simplify S3 lifecycle policies
• Filter with prefix, tag, or both
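For comparison with the XML, the sketch below builds the same kind of rule in the dictionary form accepted by boto3's `put_bucket_lifecycle_configuration`. The helper names are illustrative, and the check that expiration comes after the transition is a sanity check added for this sketch.

```python
from typing import Dict, List


def tagged_lifecycle_rule(rule_id: str, prefix: str, tags: Dict[str, str],
                          glacier_after_days: int, expire_after_days: int) -> Dict:
    """One rule that filters on a prefix AND a set of tags."""
    if expire_after_days <= glacier_after_days:
        raise ValueError("expiration must come after the Glacier transition")
    return {
        "ID": rule_id,
        "Filter": {
            "And": {
                "Prefix": prefix,
                "Tags": [{"Key": k, "Value": v} for k, v in tags.items()],
            }
        },
        "Status": "Enabled",
        "Transitions": [{"Days": glacier_after_days, "StorageClass": "GLACIER"}],
        "Expiration": {"Days": expire_after_days},
    }


def lifecycle_configuration(rules: List[Dict]) -> Dict:
    """Wrap rules in the top-level structure the API call expects."""
    return {"Rules": rules}


# With boto3 and credentials (bucket name is a placeholder):
#   s3.put_bucket_lifecycle_configuration(
#       Bucket="example-bucket",
#       LifecycleConfiguration=lifecycle_configuration([rule]))
```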
Page 29
Putting it all together
Page 30
Storage Management for S3
• Cross-Region Replication
• Lifecycle Policy
• S3 Object Tags
• Event Notifications
• S3 CloudWatch Metrics
• S3 Inventory
• CloudTrail S3 Data Events
• S3 Analytics
Page 31
Confidential
Pinterest Infrastructure
John Elliott
Page 32
80+ Billion Pins
categorized by people into more than 2.6 Billion Boards
Page 33
Over 140 petabytes of data
80+ terabytes of new data...every day
Almost entirely log data...
Page 35
Proprietary and Confidential
Pinterest Growth for S3
Storage Growth
YTD | 60%
12 Months | 86%
Since Jan '14 | 1,467%
Page 36
Old data flow: 6 hr runtime
• Count object sizes and read API log
• Join datasets to determine object access activity in order to make tiering decisions
[Diagram components: S3 bucket listing, S3 API logs, Inventory Job, Operations Job, Efficiency Job, Rollup Job, Efficiency Report]
Page 37
New data flow: 20 min runtime
• S3 Inventory report allows full bucket inventory and operations data
• S3 Analytics provides much-needed data on object age and access patterns
[Diagram components: S3 Inventory, Efficiency Job, Rollup Job, Efficiency Report]
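To give a feel for what a rollup over inventory data can involve, here is a minimal sketch that totals object counts and bytes per top-level prefix and storage class. It is not Pinterest's job: the record shape (dicts with Key, Size, StorageClass, IsLatest, IsDeleteMarker) and the function name are assumptions made for illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def rollup(records: Iterable[Dict], depth: int = 1) -> Dict[Tuple[str, str], Dict[str, int]]:
    """Total objects and bytes per (prefix, storage class) for current versions."""
    totals = defaultdict(lambda: {"objects": 0, "bytes": 0})
    for rec in records:
        # Skip delete markers and noncurrent versions so each object counts once.
        if rec.get("IsDeleteMarker") or not rec.get("IsLatest", True):
            continue
        parts = rec["Key"].split("/")
        # Keys with no "/" beyond the requested depth are grouped under "".
        prefix = "/".join(parts[:depth]) if len(parts) > depth else ""
        group = totals[(prefix, rec["StorageClass"])]
        group["objects"] += 1
        group["bytes"] += int(rec.get("Size") or 0)
    return dict(totals)
```

Grouping by prefix and storage class keeps the output small enough to review by hand, which suits a report meant to guide tiering decisions.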
Page 38
A single click with S3 Analytics
• S3 Analytics provides Storage Class Analysis
Page 39
Storage “Office Hours”: Meet the People who Build AWS Storage
Who: Lead Software Development Engineers, Architects, and Technical PMs
Where: Storage Booth Walk-up Bar
When: Exhibit hours (Tues 5-7pm, Wed & Thurs 10:30a-6:00p)
What: Architecture best practices, code reviews, feature requests
Page 40
Remember to complete your evaluations!