Page 1
© 2016, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Mas Kubo, Senior Product Manager, Amazon Glacier
Andy Shenkler, EVP and Chief Solutions & Technology Officer, Sony DADC New Media
Solutions (NMS)
November 30, 2016
Deep Dive on Amazon Glacier
STG302
Page 2
Audio archives – SoundCloud
• World’s leading social sound
platform
• Audio files transcoded and
stored in multiple formats
• Stores petabytes of data
• Transcoded files served from
Amazon S3
• Originals moved to Amazon
Glacier for long-term retention
Page 3
Patient data – Philips Healthcare
• HealthSuite digital platform
powered by AWS
• 15 petabytes of patient data
• Securely stored for decades
(beyond the lifetime of patients)
• Uses HIPAA-eligible AWS
services
Page 4
Tape replacement – King County
• Most populous county in
Washington State
• Replaced tape solution for
backups from 17 agencies
• Meets compliance
requirements
• Saved $1MM in first year, no
more tape refresh or
management churn
Page 6
[Diagram: data transfer options and storage types]
Batches and Streams
• Direct Connect
• Snowball, Snowball Edge, Snowmobile
• 3rd Party Connectors
• Transfer Acceleration
• Storage Gateway
• Kinesis Firehose
File
• Amazon EFS
Block
• Amazon EBS (persistent)
• Amazon EC2 Instance Store (ephemeral)
Object
• Amazon S3
• Amazon Glacier
Page 7
Data Storage Demand
• Media assets, 4k, 8k
• Healthcare/life sciences
• Financial services
• Regulated industries
• Oil and gas/geospatial
• Digital preservation
• Long-term backups
• Logs
Archive:
• Secure and durable
• Low cost
• Flexible data access
• Compliant
Page 8
Amazon Glacier
• Extremely low-cost archive storage service, starting at $0.004
per GB per month
• New! Three retrieval options ranging from minutes to hours
(more later)
• 99.999999999% durability (5-6 orders of magnitude higher
than 2 copies of tape)
• All data is encrypted at rest
• Features: compliance, data management, cost management,
audit logging
Page 9
Amazon Glacier
Metered
usage:
pay as you go
No capital investment
No commitment
No risky capacity
planning
Avoid risks of
physical media
handling
Control your
geographic
locality for
performance
and compliance
Page 10
Key Terms and Concepts
• Vaults – container for archives, up to 1,000 vaults per account
• Archives – basic unit, write-once, 40 TB max, unlimited archives
• Inventory – cold index of archives refreshed every 24 hours
• Access – three ways to access Amazon Glacier
• Uploads – multipart, lifecycle, cost optimizations, AWS Snowball
• Data management – Vault Lock, tagging, audit logs
• Retrievals – retrieval policies, range retrievals, new retrieval
features
Page 11
Accessing Amazon Glacier
1. Direct Amazon Glacier API/SDK
2. Amazon S3 lifecycle integration
3. Third-party tools and gateways (for example, FastGlacier)
Page 12
Uploading data: Internet or sneaker-net
• AWS Direct Connect: dedicated bandwidth between your site and AWS
• Internet: transfer data in a secure SSL tunnel over the public Internet
• AWS Import/Export, AWS Snowball: physical transfer of media into and out of AWS
Page 13
Uploading data: archive descriptions
• Use archive description field for
metadata
• If local index is corrupted or
destroyed, use archive description
to reconstruct critical mappings
• For example, create index entry,
add primary key to archive
description on upload
Local index entry:
  Primary key: 12345
  Description: 2014Audit
  Dept: FinanceDept
  ArchiveID: 9FG23…..
  …..
UploadArchive(data, ArchiveDescription="12345,2014Audit,FinanceDept") -> ArchiveID = 9FG23…..
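The mapping between a local index entry and an archive description can be sketched in a few lines of Python. This is a minimal illustration, not the presenters' code: the helper names, the field order, and the comma separator are assumptions modeled on the slide's "12345,2014Audit,FinanceDept" example. The length and character checks reflect Glacier's documented limit of 1,024 printable ASCII characters for a description.

```python
# Hypothetical helpers: field order and the comma separator are assumptions
# modeled on the slide's "12345,2014Audit,FinanceDept" example.
MAX_DESCRIPTION_LEN = 1024  # Glacier limit on archive descriptions


def build_description(primary_key, description, dept):
    """Pack local-index fields into one archive description string."""
    fields = [str(primary_key), description, dept]
    for field in fields:
        if "," in field:
            raise ValueError(f"field must not contain a comma: {field!r}")
    text = ",".join(fields)
    # Glacier accepts only printable 7-bit ASCII (codes 32-126) here.
    if len(text) > MAX_DESCRIPTION_LEN or any(not 32 <= ord(c) <= 126 for c in text):
        raise ValueError("description must be <= 1,024 printable ASCII characters")
    return text


def rebuild_index_entry(archive_id, archive_description):
    """Recover a local index entry from an archive ID and its description."""
    primary_key, description, dept = archive_description.split(",")
    return {
        "PrimaryKey": primary_key,
        "Description": description,
        "Dept": dept,
        "ArchiveID": archive_id,
    }


desc = build_description(12345, "2014Audit", "FinanceDept")
# "EXAMPLE-ARCHIVE-ID" is a placeholder, not a real archive ID.
entry = rebuild_index_entry("EXAMPLE-ARCHIVE-ID", desc)
```

If the local index is lost, a vault inventory lists each archive's ID together with its description, so every record could be passed through `rebuild_index_entry` to rebuild the mapping.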
Page 14
Uploading data: optimizing costs
• Every archive has 32 KB of associated
overhead and some operations are charged per
request
• For a 3.2 MB archive, overhead is ~1% of cost
• For a 1 KB archive, 97% of cost goes to
overhead
• Solution is aggregation: recommended minimum
archive size is on the order of MBs
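The percentages above follow from dividing the 32 KB overhead by the total billed size (archive plus overhead). A small Python sketch reproduces the arithmetic; it covers storage overhead only and ignores the per-request charges the slide also mentions.

```python
OVERHEAD_BYTES = 32 * 1024  # per-archive overhead stated on the slide


def overhead_fraction(archive_bytes):
    """Fraction of billed storage that is per-archive overhead."""
    return OVERHEAD_BYTES / (archive_bytes + OVERHEAD_BYTES)


small = overhead_fraction(1024)                     # 1 KB archive
medium = overhead_fraction(int(3.2 * 1024 * 1024))  # 3.2 MB archive
print(f"1 KB archive: {small:.0%} overhead")
print(f"3.2 MB archive: {medium:.1%} overhead")
```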
Page 15
Uploading data: aggregating archives
[Diagram: several files aggregated into a single archive]
• Archive: File 1, File 2, …, each with its checksum and metadata, plus an index/directory
• Local index: File 1 offset, File 2 offset, File 3 offset, …, with Checksum 1, Checksum 2, Checksum 3
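One way to picture the aggregation layout is the following Python sketch. It is an illustration under assumptions, not the presenters' implementation: the function names are hypothetical, SHA-256 is an arbitrary choice of checksum, and the index is kept only in memory here, whereas the diagram also stores an index/directory inside the archive.

```python
import hashlib


def aggregate(files):
    """Concatenate files into one archive body and build a local index.

    `files` maps a name to its bytes. Returns (archive_bytes, index), where
    index[name] holds the offset, length, and SHA-256 checksum of each file.
    """
    body = bytearray()
    index = {}
    for name, data in files.items():
        index[name] = {
            "offset": len(body),
            "length": len(data),
            "sha256": hashlib.sha256(data).hexdigest(),
        }
        body.extend(data)
    return bytes(body), index


def extract(archive_bytes, index, name):
    """Slice one file back out of the aggregate and verify its checksum."""
    entry = index[name]
    data = archive_bytes[entry["offset"]:entry["offset"] + entry["length"]]
    if hashlib.sha256(data).hexdigest() != entry["sha256"]:
        raise ValueError(f"checksum mismatch for {name}")
    return data


archive, index = aggregate({"a.log": b"alpha", "b.log": b"bravo!", "c.log": b"c"})
```

Many small files then cost one archive's worth of overhead, and the offsets make it possible to fetch a single file later without downloading the whole aggregate.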
Page 16
Best practices: multipart uploads
Improve throughput, reliability, and get idempotency
1. InitiateMultipartUpload(partSize) → uploadId
2. UploadPart(uploadId, data)
3. CompleteMultipartUpload(uploadId) → archiveId
[Diagram: an archive split into parts, with the parts uploaded in parallel]
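Planning the parts for a multipart upload can be sketched as below. The helper names are hypothetical. The validation reflects Glacier's documented rule that the part size is 1 MiB multiplied by a power of two, up to 4 GiB, and the `bytes start-end/*` string is the range format an upload-part request carries. Because each part is addressed by its byte range, a failed part can be sent again without affecting the others, which is where the retry safety comes from.

```python
MIB = 1024 * 1024
MAX_PART_SIZE = 4 * 1024 * MIB  # 4 GiB


def part_ranges(total_size, part_size):
    """Return inclusive (start, end) byte ranges, one per part."""
    mib_count, remainder = divmod(part_size, MIB)
    is_power_of_two = mib_count > 0 and mib_count & (mib_count - 1) == 0
    if remainder or not is_power_of_two or part_size > MAX_PART_SIZE:
        raise ValueError("part size must be 1 MiB times a power of two, up to 4 GiB")
    return [
        (start, min(start + part_size, total_size) - 1)
        for start in range(0, total_size, part_size)
    ]


def content_range(start, end):
    """Format one part's byte range for the upload-part request."""
    return f"bytes {start}-{end}/*"


ranges = part_ranges(10 * MIB, 4 * MIB)
```

The ranges are independent of one another, so they can be handed to a thread pool and uploaded in parallel; only the last part may be smaller than the chosen part size.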
Page 17
Amazon Glacier: Amazon S3 lifecycle policies
• Seamlessly move data from Amazon S3 to Amazon Glacier
• Automated lifecycle rules
• Transition based on object age
Page 18
Amazon Glacier: Amazon S3 lifecycle policies
• Object-level tagging for S3
objects
• Apply lifecycle rules based on
object tags
• Example: transition objects to
Amazon Glacier when they are
1 year old and have the object
tags ‘Project=Delta’ and ‘Data
type=HPI’
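A lifecycle rule like the example above can be expressed as a plain dictionary. The sketch below builds one in Python; the helper name and the rule ID are hypothetical, and the dictionary shape is intended to match the `LifecycleConfiguration` argument that boto3's `put_bucket_lifecycle_configuration` accepts. No AWS call is made here.

```python
def glacier_transition_rule(rule_id, days, tags=None):
    """Build one S3 lifecycle rule that transitions objects to Glacier."""
    tag_list = [{"Key": k, "Value": v} for k, v in (tags or {}).items()]
    if not tag_list:
        rule_filter = {"Prefix": ""}               # applies to every object
    elif len(tag_list) == 1:
        rule_filter = {"Tag": tag_list[0]}
    else:
        rule_filter = {"And": {"Tags": tag_list}}  # all tags must match
    return {
        "ID": rule_id,
        "Status": "Enabled",
        "Filter": rule_filter,
        "Transitions": [{"Days": days, "StorageClass": "GLACIER"}],
    }


lifecycle = {
    "Rules": [
        glacier_transition_rule(
            "archive-delta-after-1-year",  # hypothetical rule name
            365,
            {"Project": "Delta", "Data type": "HPI"},
        )
    ]
}
```

Calling the helper without tags yields the simpler age-only rule from the previous slide.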
Page 19
Management features: vault tagging
Page 20
Management features: audit logging via
AWS CloudTrail
• Enable AWS
CloudTrail in console
• Control plane events:
vault activities
• Data plane events:
archive activities
Page 21
Management features: vault access policies
• Manage access to a vault in a single location – single AWS Identity and
Access Management (IAM) policy
– Grant/revoke access to internal business units/teams
– “Marketing_Vault” has an access policy that is distinct from
“DevOps_Vault”
• Easily manage cross-account access for your business partner
– Simply add a section for your business partner in the same policy
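As a rough illustration of "one policy per vault, one section per grantee", the following Python sketch assembles a vault access policy document. The helper name, account IDs, role name, and the choice of actions are all placeholders; only the overall IAM policy structure and the `glacier:` action names are taken from AWS conventions.

```python
import json


def vault_access_policy(vault_arn, grants):
    """Build a vault access policy with one Allow statement per grantee.

    `grants` maps a statement ID to (principal_arn, list_of_actions).
    """
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": sid,
                "Effect": "Allow",
                "Principal": {"AWS": principal},
                "Action": actions,
                "Resource": vault_arn,
            }
            for sid, (principal, actions) in grants.items()
        ],
    }


policy = vault_access_policy(
    "arn:aws:glacier:us-west-2:111122223333:vaults/Marketing_Vault",
    {
        "marketing-team": (
            "arn:aws:iam::111122223333:role/Marketing",  # hypothetical role
            ["glacier:UploadArchive", "glacier:InitiateJob", "glacier:GetJobOutput"],
        ),
        "business-partner-read": (
            "arn:aws:iam::444455556666:root",            # placeholder partner account
            ["glacier:InitiateJob", "glacier:GetJobOutput"],
        ),
    },
)
policy_json = json.dumps(policy, indent=2)
```

Granting or revoking a partner's access then amounts to adding or removing one entry in `grants` and reapplying the policy to the vault.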
Page 22
Management features: Vault Lock
• Non-rewritable, non-erasable records
• Time-based retention with “ArchiveAgeInDays” control
• Policy lockdown (strong governance)
• Legal hold with vault-level tags
• Configure optional designated third-party access and grant
temporary access
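A time-based retention control could look like the policy built below. This is a sketch under assumptions: the helper name, statement ID, account ID, and vault name are placeholders, while the `glacier:ArchiveAgeInDays` condition key is the control named on the slide. The policy denies `glacier:DeleteArchive` to everyone until an archive is at least the given number of days old.

```python
import json


def retention_lock_policy(vault_arn, retention_days):
    """Deny archive deletion until each archive reaches the retention age."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "deny-delete-before-retention",
                "Principal": "*",
                "Effect": "Deny",
                "Action": "glacier:DeleteArchive",
                "Resource": [vault_arn],
                "Condition": {
                    "NumericLessThan": {
                        "glacier:ArchiveAgeInDays": str(retention_days)
                    }
                },
            }
        ],
    }


one_year = retention_lock_policy(
    "arn:aws:glacier:us-west-2:111122223333:vaults/examplevault", 365
)
one_year_json = json.dumps(one_year)
```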
Page 23
Management features: Vault Lock
Vault Lock: two-step locking
• InitiateVaultLock
– Puts a retention policy into effect for testing (in-progress state)
– Returns a unique lock ID (expires after 24 hours)
• AbortVaultLock
– Deletes an in-progress policy
– Ability to modify a policy before locking it down
• CompleteVaultLock
– Locks down the vault with the appropriate lock ID
– A Vault Lock policy cannot be aborted once locked
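The initiate, test, then complete-or-abort sequence can be sketched as a small Python function. It assumes a client object that behaves like `boto3.client("glacier")`, which is passed in rather than created here, so the sketch itself makes no AWS calls. The function name and the `looks_correct` callback are hypothetical; the callback stands in for whatever testing is done while the lock is in progress.

```python
import json


def lock_vault(glacier, vault_name, policy, looks_correct):
    """Run the two-step Vault Lock flow against a Glacier client.

    `glacier` is expected to behave like boto3.client("glacier").
    `looks_correct` is a caller-supplied check run while the lock is in
    progress. Returns the lock ID on success, or None if the lock was aborted.
    """
    response = glacier.initiate_vault_lock(
        accountId="-",  # "-" means the account that owns the credentials
        vaultName=vault_name,
        policy={"Policy": json.dumps(policy)},
    )
    lock_id = response["lockId"]  # expires after 24 hours
    if not looks_correct():
        # Still in progress, so the policy can be discarded and revised.
        glacier.abort_vault_lock(accountId="-", vaultName=vault_name)
        return None
    # After this call the policy can no longer be changed or removed.
    glacier.complete_vault_lock(
        accountId="-", vaultName=vault_name, lockId=lock_id
    )
    return lock_id
```

Keeping the decision in a callback makes the irreversible step explicit: nothing is locked down unless the check passes within the lock ID's lifetime.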
Page 24
Management features: Vault Lock
Legal hold with vault-level tags
• Set up a legal hold tag
– Configure a vault-level tag “LegalHold”
– Set initial value to “False”
• Add compliance control for legal hold in a vault lock policy
– Deny delete archive operation
– From anybody (root, administrators, users, business partners)
– When LegalHold tag = “True”
• Place or lift legal hold by updating the tag value
Page 25
Management features: Vault Lock
Example control: legal hold
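The policy shown on this slide did not survive extraction. A control in the spirit of the legal hold described on the previous slide might be built as follows. This is a sketch, not the slide's policy: the helper name, statement ID, and vault ARN are placeholders, and the use of `StringEquals` on the `glacier:ResourceTag/LegalHold` condition key with the value "True" is an assumption. String comparison is case-sensitive, so the tag value must be set exactly as written in the policy.

```python
import json


def legal_hold_lock_policy(vault_arn, hold_value="True"):
    """Deny archive deletion for everyone while the LegalHold tag is set."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "deny-delete-during-legal-hold",
                "Principal": "*",  # root, administrators, users, partners
                "Effect": "Deny",
                "Action": "glacier:DeleteArchive",
                "Resource": [vault_arn],
                "Condition": {
                    "StringEquals": {"glacier:ResourceTag/LegalHold": hold_value}
                },
            }
        ],
    }


hold_policy = legal_hold_lock_policy(
    "arn:aws:glacier:us-west-2:111122223333:vaults/examplevault"
)
hold_policy_json = json.dumps(hold_policy, indent=2)
```

With a policy of this shape locked in place, placing or lifting a hold requires only a change to the vault's tag value, not to the policy.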
Page 26
Management features: Vault Lock
Vault Lock best practices
• Map one vault to a single retention range
– Group regulatory data by retention: 1-year vault, 6-year vault, etc.
• Create a new vault and lock it before storing production data
– Enforce the full ArchiveAgeInDays on all new archives
– Leave no “gap” on existing archives
• Thoroughly test a vault lock policy before locking it down (Abort/Initiate)
• Implement only the most restrictive controls with Vault Lock
– Leave the flexible controls to vault access policy
Page 27
Management features: Vault Lock
Third-party assessment
Amazon Glacier received a third-party assessment
from Cohasset Associates on how Amazon Glacier
with Vault Lock can be used to meet the
requirements of SEC 17a-4(f) and CFTC 1.31(b)-(c)
Page 28
Data retrievals: basic concepts
1. Initiate job (ArchiveId: AE99F…, Vault: Films) -> Job ID
2. 3-5 hours for job completion
3. Job completion notification
4. Download output
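The four steps above can be sketched as one Python function. It assumes a client that behaves like `boto3.client("glacier")`, passed in by the caller. Polling `describe_job` is used here as a simple stand-in for step 3; in practice a notification on job completion avoids the polling loop. The function name and the polling interval are hypothetical.

```python
import time


def retrieve_archive(glacier, vault_name, archive_id, tier="Standard",
                     poll_seconds=900):
    """Initiate a retrieval job, wait for it, and return the archive bytes.

    `glacier` is expected to behave like boto3.client("glacier").
    """
    # Step 1: initiate the job and keep the job ID.
    job = glacier.initiate_job(
        accountId="-",
        vaultName=vault_name,
        jobParameters={
            "Type": "archive-retrieval",
            "ArchiveId": archive_id,
            "Tier": tier,
        },
    )
    job_id = job["jobId"]
    # Steps 2 and 3: wait until the job reports completion.
    while True:
        status = glacier.describe_job(
            accountId="-", vaultName=vault_name, jobId=job_id
        )
        if status["Completed"]:
            break
        time.sleep(poll_seconds)
    if status["StatusCode"] != "Succeeded":
        raise RuntimeError(f"retrieval job {job_id} ended as {status['StatusCode']}")
    # Step 4: download the job output.
    output = glacier.get_job_output(
        accountId="-", vaultName=vault_name, jobId=job_id
    )
    return output["body"].read()
```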
Page 29
Data retrievals: restoring via lifecycle
[Screenshots: steps 1 and 2]
Page 30
Data retrievals: restoring via lifecycle
[Screenshots: steps 3 and 4]
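The screenshots for these steps are not available in this text. For objects that were transitioned to Amazon Glacier by a lifecycle rule, a restore can also be requested programmatically. The sketch below only builds the request body; the helper name is hypothetical, and the dictionary shape is intended to match the `RestoreRequest` argument of boto3's `restore_object`. `Days` is how long the temporary restored copy stays available in Amazon S3.

```python
VALID_TIERS = {"Expedited", "Standard", "Bulk"}


def restore_request(days, tier="Standard"):
    """Build the RestoreRequest for a temporary copy of an archived object."""
    if days < 1:
        raise ValueError("days must be at least 1")
    if tier not in VALID_TIERS:
        raise ValueError(f"tier must be one of {sorted(VALID_TIERS)}")
    return {"Days": days, "GlacierJobParameters": {"Tier": tier}}


request = restore_request(days=7, tier="Bulk")
```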
Page 31
Data retrievals: data retrieval policies
• Provides transparency and cost control for data retrievals
• Governs all retrieval activities for an account in a region
• Synchronously accepts or rejects each retrieval request
• Accounts for in-flight retrieval operations
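A data retrieval policy is a small document with a single rule. The Python sketch below builds the three variants; the helper name and the 10 GiB figure are illustrative, and the dictionary shape is intended to match the `Policy` argument of boto3's `set_data_retrieval_policy`.

```python
def retrieval_policy(max_bytes_per_hour=None, free_tier_only=False):
    """Build a data retrieval policy with a single rule."""
    if free_tier_only:
        rule = {"Strategy": "FreeTier"}
    elif max_bytes_per_hour is not None:
        rule = {"Strategy": "BytesPerHour", "BytesPerHour": max_bytes_per_hour}
    else:
        rule = {"Strategy": "None"}  # no retrieval limit
    return {"Rules": [rule]}


capped = retrieval_policy(max_bytes_per_hour=10 * 1024**3)  # 10 GiB per hour
```

With a `BytesPerHour` rule in place, a retrieval request that would push the account over the limit is rejected when it is submitted rather than billed afterwards.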
Page 32
Data retrievals: range retrievals
[Diagram: retrieving a single file from an aggregated archive by byte range]
• Archive: File 1, File 2, …, each with its checksum and metadata, plus an index/directory
• Local index: File 1 offset, File 2 offset, File 3 offset, …, with Checksum 1, Checksum 2, Checksum 3
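Given a file's offset and length from the local index, the byte range to request can be computed as sketched below. Glacier requires retrieval ranges to be megabyte-aligned, so the sketch widens the range to the nearest 1 MiB boundaries and then trims the returned bytes. The function names are hypothetical.

```python
MIB = 1024 * 1024


def aligned_range(offset, length, archive_size):
    """Return the smallest megabyte-aligned range covering one file."""
    start = (offset // MIB) * MIB
    end_exclusive = -(-(offset + length) // MIB) * MIB  # round up to a MiB
    end = min(end_exclusive, archive_size) - 1          # inclusive end byte
    return start, end


def trim(range_bytes, range_start, offset, length):
    """Cut the file out of the bytes returned for the aligned range."""
    begin = offset - range_start
    return range_bytes[begin:begin + length]


start, end = aligned_range(offset=1_500_000, length=100, archive_size=10 * MIB)
retrieval_byte_range = f"{start}-{end}"
```

The resulting `start-end` string is the form a retrieval job's byte-range parameter takes; only the aligned range is retrieved, not the whole archive.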
Page 33
Data retrievals: expedited and bulk retrievals
                     Expedited           Standard                   Bulk
Data access time     1-5 minutes         3-5 hours                  5-12 hours
Data retrievals      $0.03 per GB        $0.01 per GB               $0.0025 per GB
Retrieval requests   $0.01 per request   $0.05 per 1,000 requests   $0.025 per 1,000 requests
• Expedited: designed for occasional urgent access to a small
number of archives
• Standard: low-cost option for retrieving data in just a few hours
• Bulk: lowest cost option optimized for large retrievals, up to
petabytes of data in 12 hours
• Three flexible and powerful retrieval options to access any of your
Amazon Glacier data
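To compare the three options for a given workload, the prices in the table can be plugged into a short Python estimate. The prices are the 2016 figures from this slide; the workload (100 GB in 10 requests) is an arbitrary example.

```python
# Prices copied from the table on this slide (2016 figures, USD).
TIERS = {
    "Expedited": {"per_gb": 0.03, "per_request": 0.01},
    "Standard": {"per_gb": 0.01, "per_request": 0.05 / 1000},
    "Bulk": {"per_gb": 0.0025, "per_request": 0.025 / 1000},
}


def retrieval_cost(tier, gigabytes, requests):
    """Estimate the retrieval charge in USD for one tier."""
    price = TIERS[tier]
    return gigabytes * price["per_gb"] + requests * price["per_request"]


for tier in TIERS:
    cost = retrieval_cost(tier, gigabytes=100, requests=10)
    print(f"{tier:<10} ${cost:.4f}")
```

For large retrievals the per-GB price dominates, which is why Bulk is positioned as the lowest-cost option when a 5-12 hour wait is acceptable.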
Page 35
Accelerated Media Lifecycle
@SonyDADCNMS
Page 36
“If physical deliveries can happen within one hour based on
unpredictable requests, surely we are able to exceed such expectations digitally”
Page 37
Our migration
The Challenge
• Seamlessly migrate a platform that enables content
delivery across all devices and more than 1,200
distribution points worldwide
• Store 20 petabytes of motion picture and television
content
• Equivalent to 1,000,000+ hours of content
• At a growth curve of ~1 petabyte every quarter
Desired Goals:
• One-hour delivery turnaround time
• Agile, scalable, predictable cost model and
infrastructure
• Investing in innovation vs. hardware
Page 38
On-premises asset storage workflow
Page 39
AWS Cloud-based asset storage workflow
[Diagram label: Amazon Glacier]
Page 40
Amazon Glacier vs. on-premises cost comparison
Page 42
Storage “Office Hours”: Meet the People Who Build AWS Storage
Who: Lead Software Development Engineers, Architects, and Technical PMs
Where: Storage Booth Walk-up Bar
When: Exhibit hours (Tues 5-7pm, Wed & Thurs 10:30a-6:00p)
What: Architecture best practices, code reviews, feature requests
Page 43
Remember to complete
your evaluations!