Deep Dive on Amazon Glacier
Mas Kubo, Senior Product Manager, Amazon Glacier
December 12, 2016
© 2016, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Apr 16, 2017
Media Content Distribution – Sony DADC
Storing 20 PB and 1M+ hours of motion picture and television content, growing 1 PB per quarter
Single copy on Glacier
Over $10MM in savings
Replaced legacy tape solution
Higher performance, higher durability, lower cost
Patient data – Philips Healthcare
HealthSuite digital platform powered by AWS
15 PB of patient data
Archives patient records and medical images produced across more than 1,500 hospitals
Securely stored for decades (lifetime of patients)
Uses HIPAA-eligible AWS services
Data transfer (batches and streams): AWS Direct Connect; Snowball, Snowball Edge, Snowmobile; third-party connectors; Transfer Acceleration; Storage Gateway; Kinesis Firehose
File: Amazon EFS
Block: Amazon EBS (persistent), Amazon EC2 Instance Store (ephemeral)
Object: Amazon S3, Amazon Glacier
Data Storage Demand
Media assets (4K, 8K), healthcare/life sciences, financial services, regulated industries, oil and gas/geospatial, digital preservation, long-term backups, logs
Solution requirements: secure and durable, scalable, cost-effective, flexible data access, compliant
Amazon Glacier
Flexible data access: three retrieval options, from minutes to hours
Durable: 11 nines (99.999999999%) of durability, 5 orders of magnitude better than 2 copies on tape
Management features: Vault Lock, retrieval policies, CloudTrail
Cost-effective: starting at $0.004 per GB per month
Secure: all data encrypted at rest
Scalable: from gigabytes to exabytes
Amazon Glacier
Metered usage: pay as you go
No capital investment
No commitment
No risky capacity planning
Avoid risks of physical media handling
Control your geographic locality for performance and compliance
Key Terms and Concepts
Vaults – containers for archives; up to 1,000 vaults per account per AWS Region
Archives – the basic unit of storage; write-once, up to 40 TB each, unlimited number of archives
Inventory – a cold index of the archives in a vault, refreshed roughly every 24 hours
1. Access – three ways to access Amazon Glacier
2. Uploads – multipart, lifecycle, cost optimizations, AWS Snowball
3. Data management – Vault Lock, tagging, audit logs
4. Retrievals – retrieval policies, range retrievals, new retrieval features
Accessing Amazon Glacier
1. Direct Amazon Glacier API/SDK
2. Amazon S3 lifecycle integration
3. Third-party tools and gateways (for example, FastGlacier)
Uploading data: Internet or sneaker-net
Internet: transfer data in a secure SSL tunnel over the public Internet
AWS Direct Connect: dedicated bandwidth between your site and AWS
Snowball, Snowball Edge, Snowmobile: physical transfer of media into and out of AWS
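Whether the network or sneaker-net is the better choice comes down to simple arithmetic on transfer time. The Python sketch below is a rough estimate only: it assumes decimal terabytes, a sustained link, and a hypothetical 80% effective utilization; `transfer_days` is an illustrative helper, not an AWS tool.

```python
SECONDS_PER_DAY = 86_400


def transfer_days(terabytes: float, link_mbps: float, efficiency: float = 0.8) -> float:
    """Days to move `terabytes` (decimal TB) over a `link_mbps` link.

    `efficiency` is an assumed fraction of the nominal bandwidth actually
    achieved end to end (protocol overhead, contention, retries).
    """
    bits = terabytes * 1e12 * 8
    seconds = bits / (link_mbps * 1e6 * efficiency)
    return seconds / SECONDS_PER_DAY


for tb in (10, 100):
    for mbps in (100, 1_000):
        print(f"{tb:>4} TB over {mbps:>5,} Mbps: {transfer_days(tb, mbps):7.1f} days")
```

Under these assumptions, 100 TB over a 100 Mbps link takes roughly four months, which is the kind of gap that makes physical transfer attractive.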
Uploading data: archive descriptions
Use the archive description field to store metadata
If the local index is corrupted or destroyed, use archive descriptions to reconstruct critical mappings
For example, create the index entry first, then add its primary key to the archive description on upload
Local index entry: Primary key: 12345 | Description: 2014Audit | Dept: FinanceDept | ArchiveID: 9FG23…
UploadArchive(data, ArchiveDescription="12345,2014Audit,FinanceDept") -> ArchiveID = 9FG23…
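A minimal sketch of the description-as-backup idea in Python, assuming a comma-separated description format of our own choosing (primary key, label, department) and a vault inventory that has already been retrieved in JSON format. `make_description` and `rebuild_index` are hypothetical helpers; `ArchiveList`, `ArchiveId`, and `ArchiveDescription` are fields of the JSON inventory output. The archive ID in the usage lines is a placeholder.

```python
import json

# Glacier limits archive descriptions to 1,024 printable ASCII characters.
MAX_DESCRIPTION_LEN = 1024


def make_description(primary_key: str, label: str, dept: str) -> str:
    """Pack the local-index fields into a comma-separated archive description."""
    if "," in primary_key or "," in label:
        raise ValueError("primary key and label must not contain commas")
    desc = ",".join([primary_key, label, dept])
    printable = all(32 <= ord(ch) <= 126 for ch in desc)
    if len(desc) > MAX_DESCRIPTION_LEN or not printable:
        raise ValueError("description must be at most 1,024 printable ASCII characters")
    return desc


def rebuild_index(inventory_json: str) -> dict:
    """Rebuild {primary_key: entry} from a vault inventory in JSON format."""
    index = {}
    for archive in json.loads(inventory_json)["ArchiveList"]:
        fields = archive.get("ArchiveDescription", "").split(",", 2)
        if len(fields) != 3:
            continue  # archive was not written with this description scheme
        primary_key, label, dept = fields
        index[primary_key] = {
            "Description": label,
            "Dept": dept,
            "ArchiveID": archive["ArchiveId"],
        }
    return index


# Usage with a hand-written inventory; the archive ID is a placeholder.
desc = make_description("12345", "2014Audit", "FinanceDept")
inventory = json.dumps({"ArchiveList": [
    {"ArchiveId": "EXAMPLE-ARCHIVE-ID", "ArchiveDescription": desc},
]})
print(rebuild_index(inventory))
```

Inventory retrieval is itself an asynchronous job, and the inventory lags uploads by up to a day, so very recent archives may be missing from a rebuilt index.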
Uploading data: optimizing costs
Every archive has 32 KB of associated overhead, and some operations are charged per request
For a 3.2 MB archive, the overhead adds about 1% to cost
For a 1 KB archive, about 97% of the cost goes to overhead
The solution is aggregation: we recommend a minimum archive size on the order of megabytes or more
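The overhead percentages above follow from a simple ratio, overhead / (payload + overhead). A small Python check, assuming the 32 KB is billed at the same per-GB rate as the data, treating 3.2 MB as 3,200 KB, and ignoring per-request charges:

```python
# Per-archive overhead from the slide; assumed to be billed at the same
# per-GB rate as the archive data itself (per-request charges not modeled).
OVERHEAD_KB = 32


def overhead_fraction(archive_kb: float) -> float:
    """Fraction of billed storage that is per-archive overhead, not payload."""
    return OVERHEAD_KB / (archive_kb + OVERHEAD_KB)


for size_kb in (1, 3_200, 100_000):
    print(f"{size_kb:>7,} KB archive: {overhead_fraction(size_kb):.2%} overhead")
```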
Uploading data: aggregating archives
[Figure: File 1, File 2, File 3, … each stored with its checksum and metadata inside one archive, together with an index/directory; a local index records the offset of each file within the archive]
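One possible aggregation layout, sketched in Python: files are concatenated, each gets a SHA-256 checksum, and the offset index is kept locally and also embedded at the end of the archive behind an 8-byte length trailer, so it can be recovered if the local copy is lost. The trailer format and the helper names (`pack`, `read_directory`, `extract`) are illustrative assumptions, not a Glacier format; unlike the figure, per-file checksums live in the directory rather than next to each file.

```python
import hashlib
import json
import struct


def pack(files: dict) -> tuple:
    """Concatenate small files into one archive body.

    Returns (archive_bytes, local_index). The same index is also embedded at
    the end of the archive, followed by an 8-byte big-endian length trailer.
    """
    chunks, index, offset = [], {}, 0
    for name, data in files.items():
        index[name] = {
            "offset": offset,
            "length": len(data),
            "sha256": hashlib.sha256(data).hexdigest(),
        }
        chunks.append(data)
        offset += len(data)
    directory = json.dumps(index, sort_keys=True).encode("utf-8")
    archive = b"".join(chunks) + directory + struct.pack(">Q", len(directory))
    return archive, index


def read_directory(archive: bytes) -> dict:
    """Recover the embedded index when the local copy is lost."""
    (dir_len,) = struct.unpack(">Q", archive[-8:])
    return json.loads(archive[-8 - dir_len:-8])


def extract(archive: bytes, index: dict, name: str) -> bytes:
    """Slice one file out of the archive and verify its checksum."""
    entry = index[name]
    data = archive[entry["offset"]:entry["offset"] + entry["length"]]
    if hashlib.sha256(data).hexdigest() != entry["sha256"]:
        raise ValueError(f"checksum mismatch for {name}")
    return data


archive, index = pack({"file1.log": b"first file", "file2.log": b"second file"})
print(index["file2.log"]["offset"], extract(archive, index, "file2.log"))
```

Keeping exact byte offsets is what later allows a single file to be fetched with a range retrieval instead of downloading the whole archive.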
Best practices: multipart uploads
Improve throughput and reliability, and get idempotency
1. InitiateMultipartUpload(partSize) → uploadId
2. UploadPart(uploadId, range, data)
3. CompleteMultipartUpload(uploadId, archiveSize, checksum) → archiveId
[Figure: parts uploaded in parallel and assembled into a single archive]
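The three-step flow can be sketched in Python. The helpers below are hypothetical: they encode the part-size rule (1 MiB times a power of two, up to 4 GiB) and the 10,000-part limit, and they pass each part's `Content-Range` string to an injected `upload_part` callable instead of calling AWS. With boto3, the corresponding Glacier client methods are `initiate_multipart_upload`, `upload_multipart_part`, and `complete_multipart_upload`; the last one also needs the total archive size and its SHA-256 tree hash, which this sketch does not compute.

```python
from concurrent.futures import ThreadPoolExecutor

MIB = 1024 * 1024
MAX_PARTS = 10_000  # limit on parts per multipart upload


def validate_part_size(part_size: int) -> None:
    """Part size must be 1 MiB times a power of two, from 1 MiB to 4 GiB."""
    mib_count = part_size // MIB
    if (part_size % MIB or not 1 <= mib_count <= 4096
            or mib_count & (mib_count - 1)):
        raise ValueError(f"invalid part size: {part_size}")


def plan_parts(archive_size: int, part_size: int) -> list:
    """Return inclusive (first_byte, last_byte) ranges, one per part."""
    validate_part_size(part_size)
    ranges = [(start, min(start + part_size, archive_size) - 1)
              for start in range(0, archive_size, part_size)]
    if len(ranges) > MAX_PARTS:
        raise ValueError("too many parts; choose a larger part size")
    return ranges


def upload_all(data: bytes, part_size: int, upload_part, max_workers: int = 4) -> int:
    """Upload every part in parallel and return the number of parts sent.

    `upload_part(content_range, chunk)` is a hypothetical stand-in for the
    SDK call.
    """
    ranges = plan_parts(len(data), part_size)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [pool.submit(upload_part, f"bytes {first}-{last}/*", data[first:last + 1])
                   for first, last in ranges]
        for future in futures:
            future.result()  # surface any part failure to the caller
    return len(ranges)


received = {}


def fake_upload(content_range: str, chunk: bytes) -> None:
    received[content_range] = len(chunk)


print(upload_all(b"x" * (2 * MIB + 10), MIB, fake_upload), received)
```

Because a part is identified by its byte range, re-sending the same range after a timeout overwrites the earlier copy instead of duplicating data, which is where the idempotency comes from.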
Amazon Glacier: Amazon S3 lifecycle policies
Seamlessly move data from Amazon S3 to Amazon Glacier
Automated lifecycle rules
Transition based on object age
Object-level tagging for S3 objects
Apply lifecycle rules based on object tags
Example: transition objects to Amazon Glacier when they are 1 year old and carry the object tags ‘Project=Delta’ and ‘Data type=HPI’.
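The tag-based example can be expressed as an S3 lifecycle configuration. A Python sketch that builds the configuration dictionary, assuming "1 year old" means 365 days; the rule ID is made up. The resulting dict has the shape that boto3's `put_bucket_lifecycle_configuration` accepts as `LifecycleConfiguration`; with no tags, the rule falls back to an empty prefix, that is, a purely age-based transition for the whole bucket.

```python
import json


def glacier_transition_rule(rule_id: str, days: int, tags: dict) -> dict:
    """One S3 lifecycle rule that transitions matching objects to Glacier."""
    tag_filters = [{"Key": key, "Value": value} for key, value in tags.items()]
    if not tag_filters:
        rule_filter = {"Prefix": ""}  # age-based rule for every object
    elif len(tag_filters) == 1:
        rule_filter = {"Tag": tag_filters[0]}
    else:
        rule_filter = {"And": {"Tags": tag_filters}}  # all tags must match
    return {
        "ID": rule_id,
        "Filter": rule_filter,
        "Status": "Enabled",
        "Transitions": [{"Days": days, "StorageClass": "GLACIER"}],
    }


config = {"Rules": [glacier_transition_rule(
    "delta-hpi-to-glacier", 365, {"Project": "Delta", "Data type": "HPI"})]}
print(json.dumps(config, indent=2))
```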
Management features: vault tagging
Management features: AWS CloudTrail
Enable AWS CloudTrail in the console
Control-plane events: vault activities
Data-plane events: archive activities
Management features: vault access policies
Manage access to a vault in a single location: one access policy per vault, written in the AWS Identity and Access Management (IAM) policy language
Grant or revoke access for internal business units and teams; for example, “Marketing_Vault” has an access policy distinct from that of “DevOps_Vault”
Easily manage cross-account access for a business partner: simply add a statement for the partner to the same policy
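A sketch of such a vault access policy in Python, granting a partner account the retrieval-related actions on one vault. The Region, both account IDs, and the choice of actions are illustrative assumptions; the JSON-serialized dict is what goes in the `Policy` field of boto3's `set_vault_access_policy` call.

```python
import json


def vault_arn(region: str, account_id: str, vault: str) -> str:
    return f"arn:aws:glacier:{region}:{account_id}:vaults/{vault}"


def partner_read_policy(region: str, owner_account: str, vault: str,
                        partner_account: str) -> dict:
    """Vault access policy letting a partner account retrieve archives from one vault."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "partner-retrieval-access",
                "Effect": "Allow",
                "Principal": {"AWS": f"arn:aws:iam::{partner_account}:root"},
                "Action": [
                    "glacier:InitiateJob",
                    "glacier:DescribeJob",
                    "glacier:GetJobOutput",
                    "glacier:ListJobs",
                ],
                "Resource": vault_arn(region, owner_account, vault),
            }
        ],
    }


policy = partner_read_policy("us-west-2", "111122223333", "Marketing_Vault", "444455556666")
print(json.dumps(policy, indent=2))
```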
Management features: Vault Lock
Non-rewritable, non-erasable records
Time-based retention with “ArchiveAgeInDays” control
Policy lockdown (strong governance)
Legal hold with vault-level tags
Configure optional designated third-party access and grant temporary access
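Time-based retention is typically written as a Deny on `glacier:DeleteArchive` conditioned on the `glacier:ArchiveAgeInDays` key. A Python sketch, assuming a 365-day retention and a placeholder vault ARN; `delete_denied` is a toy check of this single statement shape, not an IAM policy evaluator.

```python
import json


def retention_policy(vault_arn: str, retention_days: int) -> dict:
    """Vault Lock policy: deny DeleteArchive until an archive is old enough."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "deny-delete-before-retention-ends",
                "Principal": "*",
                "Effect": "Deny",
                "Action": "glacier:DeleteArchive",
                "Resource": vault_arn,
                "Condition": {
                    "NumericLessThan": {"glacier:ArchiveAgeInDays": str(retention_days)}
                },
            }
        ],
    }


def delete_denied(policy: dict, archive_age_days: int) -> bool:
    """Toy check of the statement shape above; NOT a general IAM evaluator."""
    for stmt in policy["Statement"]:
        limit = (stmt.get("Condition", {})
                 .get("NumericLessThan", {})
                 .get("glacier:ArchiveAgeInDays"))
        if stmt["Effect"] == "Deny" and limit is not None and archive_age_days < int(limit):
            return True
    return False


policy = retention_policy("arn:aws:glacier:us-west-2:111122223333:vaults/records-1yr", 365)
print(json.dumps(policy, indent=2))
print(delete_denied(policy, 30), delete_denied(policy, 400))
```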
Vault Lock: two-step locking
InitiateVaultLock: puts the policy into effect in an in-progress state so you can test it, and returns a unique lock ID that expires after 24 hours
AbortVaultLock: deletes the in-progress policy, letting you modify the policy before locking it down
CompleteVaultLock: locks down the vault using the lock ID; once locked, a Vault Lock policy cannot be aborted
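The two-step workflow behaves like a small state machine. The Python model below is hypothetical and in-memory only; it mirrors the states and the 24-hour lock ID lifetime described above (the real boto3 Glacier calls are `initiate_vault_lock`, `abort_vault_lock`, and `complete_vault_lock`). The rule that a new lock can only be initiated from the unlocked state, and discarding the policy when the lock ID expires, are modeling assumptions.

```python
import secrets
from datetime import datetime, timedelta, timezone

LOCK_ID_LIFETIME = timedelta(hours=24)


class VaultLockModel:
    """Hypothetical in-memory model of the two-step Vault Lock workflow."""

    def __init__(self):
        self.state = "Unlocked"
        self.policy = None
        self._lock_id = None
        self._initiated_at = None

    def initiate(self, policy: dict, now: datetime) -> str:
        # In this model a new policy can be initiated only from "Unlocked".
        if self.state != "Unlocked":
            raise RuntimeError(f"cannot initiate while {self.state}")
        self.state, self.policy = "InProgress", policy
        self._lock_id, self._initiated_at = secrets.token_hex(16), now
        return self._lock_id

    def abort(self) -> None:
        if self.state == "Locked":
            raise RuntimeError("a locked policy cannot be aborted")
        self.state, self.policy, self._lock_id = "Unlocked", None, None

    def complete(self, lock_id: str, now: datetime) -> None:
        if self.state != "InProgress":
            raise RuntimeError("no in-progress lock to complete")
        if now - self._initiated_at > LOCK_ID_LIFETIME:
            self.abort()  # the expired in-progress policy is discarded
            raise RuntimeError("lock ID expired; initiate the lock again")
        if not secrets.compare_digest(lock_id, self._lock_id):
            raise RuntimeError("lock ID does not match")
        self.state = "Locked"


t0 = datetime(2016, 12, 12, tzinfo=timezone.utc)
vault = VaultLockModel()
vault.initiate({"draft": 1}, t0)
vault.abort()  # a problem was found while testing the draft policy
lock_id = vault.initiate({"draft": 2}, t0)
vault.complete(lock_id, t0 + timedelta(hours=2))
print(vault.state)
```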
Vault Lock: legal hold with vault-level tags
Set up a legal hold tag: configure a vault-level tag “LegalHold” and set its initial value to “False”
Add a compliance control for legal hold in the Vault Lock policy: deny the delete archive operation to everybody (root, administrators, users, business partners) when the LegalHold tag is “True”
Place or lift a legal hold by updating the tag value
Vault Lock: example control for legal hold
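A sketch of the legal-hold control in Python: a Deny on `glacier:DeleteArchive` for every principal when the vault tag `LegalHold` equals "True", using the `glacier:ResourceTag/LegalHold` condition key. The vault ARN is a placeholder, and `delete_blocked_by_hold` is a toy check of this one condition. In practice the tag value would be changed with boto3's `add_tags_to_vault`.

```python
import json


def legal_hold_statement(vault_arn: str) -> dict:
    """Deny DeleteArchive to every principal while the LegalHold tag is "True"."""
    return {
        "Sid": "deny-delete-under-legal-hold",
        "Principal": "*",
        "Effect": "Deny",
        "Action": "glacier:DeleteArchive",
        "Resource": vault_arn,
        "Condition": {"StringEquals": {"glacier:ResourceTag/LegalHold": "True"}},
    }


def delete_blocked_by_hold(statement: dict, vault_tags: dict) -> bool:
    """Toy check of the single condition above; NOT a general IAM evaluator."""
    required = statement["Condition"]["StringEquals"]["glacier:ResourceTag/LegalHold"]
    return vault_tags.get("LegalHold") == required


policy = {
    "Version": "2012-10-17",
    "Statement": [legal_hold_statement(
        "arn:aws:glacier:us-west-2:111122223333:vaults/records-6yr")],
}
print(json.dumps(policy, indent=2))

tags = {"LegalHold": "False"}  # initial value
print(delete_blocked_by_hold(policy["Statement"][0], tags))
tags["LegalHold"] = "True"  # place the hold by updating the tag
print(delete_blocked_by_hold(policy["Statement"][0], tags))
```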
Vault Lock: best practices
Map each vault to a single retention range: group regulatory data by retention period (a 1-year vault, a 6-year vault, and so on)
Create a new vault and lock it before storing production data, so that the full ArchiveAgeInDays retention is enforced on all new archives and no existing archive is left in a “gap”
Thoroughly test a Vault Lock policy before locking it down (iterate with AbortVaultLock and InitiateVaultLock)
Implement only the most restrictive controls with Vault Lock; leave the flexible controls to the vault access policy
Vault Lock: third-party assessment
Amazon Glacier received a third-party assessment from Cohasset Associates on how Amazon Glacier with Vault Lock can be used to meet the requirements of SEC 17a-4(f) and CFTC 1.31(b)-(c).
Data retrievals: basic concepts
1. Initiate job (ArchiveId: AE99F…, Vault: Films) → Job ID
2. Retrieval processing (minutes or hours, depending on the retrieval option)
3. Job completion notification
4. Download output
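The four steps map onto a few API calls. A Python sketch, assuming the parameters go to boto3's Glacier `initiate_job` (as `jobParameters`), job status comes from `describe_job` (whose response includes `Completed` and `StatusCode`), and the output is fetched with `get_job_output`. The helpers are hypothetical; `aligned_range` widens a byte span to the megabyte alignment that range retrievals require, which pairs naturally with per-file offsets kept in a local index. The archive ID is a placeholder.

```python
import time

MIB = 1024 * 1024


def aligned_range(offset: int, length: int, archive_size: int) -> tuple:
    """Widen a byte span to the megabyte alignment required for range retrievals."""
    start = (offset // MIB) * MIB
    end_exclusive = -(-(offset + length) // MIB) * MIB  # round up to a MiB boundary
    return start, min(end_exclusive, archive_size) - 1


def archive_retrieval_params(archive_id: str, tier: str = "Standard",
                             sns_topic=None, byte_range=None) -> dict:
    """Build the jobParameters dict for an archive-retrieval job (step 1)."""
    if tier not in ("Expedited", "Standard", "Bulk"):
        raise ValueError(f"unknown tier: {tier}")
    params = {"Type": "archive-retrieval", "ArchiveId": archive_id, "Tier": tier}
    if sns_topic:
        params["SNSTopic"] = sns_topic  # step 3: completion notification
    if byte_range:
        params["RetrievalByteRange"] = f"{byte_range[0]}-{byte_range[1]}"
    return params


def wait_for_job(describe_job, poll_seconds: float = 900, sleep=time.sleep) -> dict:
    """Poll until the job completes; an SNS notification avoids polling entirely."""
    while True:
        job = describe_job()
        if job["Completed"]:
            if job["StatusCode"] != "Succeeded":
                raise RuntimeError(f"job failed: {job.get('StatusMessage')}")
            return job
        sleep(poll_seconds)


# Usage with placeholders: fetch one aggregated file stored at a known offset.
rng = aligned_range(offset=5 * MIB + 10, length=100, archive_size=20 * MIB)
print(archive_retrieval_params("EXAMPLE-ARCHIVE-ID", tier="Bulk", byte_range=rng))
```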
Data retrievals: restoring via lifecycle
[Figure: steps 1 to 4 of restoring Amazon S3 objects that a lifecycle rule moved to Amazon Glacier]
Data retrievals: data retrieval policies
Provide transparency and cost control for data retrievals
Govern all retrieval activities for an account in a Region
Synchronously accept or reject each retrieval request
Account for in-flight retrieval operations
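The accept-or-reject behavior can be illustrated with a deliberately simplified Python model: a fixed byte budget shared by all in-flight jobs. This is a hypothetical illustration; the service's actual policy options (such as a free-tier-only policy or a maximum retrieval rate in bytes per hour) compute limits differently, and a rejected request surfaces as a `PolicyEnforcedException` rather than a `False` return value.

```python
class RetrievalPolicyModel:
    """Hypothetical model of a retrieval policy with a fixed byte budget.

    Requests are accepted or rejected synchronously, and bytes from jobs that
    are still in flight count against the budget until those jobs finish.
    """

    def __init__(self, max_inflight_bytes: int):
        self.max_inflight_bytes = max_inflight_bytes
        self.inflight = {}  # job_id -> bytes still being retrieved

    def request(self, job_id: str, size_bytes: int) -> bool:
        committed = sum(self.inflight.values())
        if committed + size_bytes > self.max_inflight_bytes:
            return False  # rejected immediately; nothing is queued
        self.inflight[job_id] = size_bytes
        return True

    def finish(self, job_id: str) -> None:
        self.inflight.pop(job_id, None)


GB = 10**9
policy = RetrievalPolicyModel(max_inflight_bytes=10 * GB)
print(policy.request("job-1", 6 * GB))  # accepted
print(policy.request("job-2", 5 * GB))  # rejected: 6 + 5 > 10
policy.finish("job-1")
print(policy.request("job-2", 5 * GB))  # accepted once job-1 is done
```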
Data retrievals: expedited and bulk retrievals
Three flexible and powerful retrieval options to access any of your Amazon Glacier data:
                      Expedited           Standard                   Bulk
Data access time      1-5 minutes         3-5 hours                  5-12 hours
Data retrievals       $0.03 per GB        $0.01 per GB               $0.0025 per GB
Retrieval requests    $0.01 per request   $0.05 per 1,000 requests   $0.025 per 1,000 requests
Expedited: designed for occasional urgent access to a small number of archives
Standard: low-cost option for retrieving data in just a few hours
Bulk: lowest-cost option, optimized for large retrievals of up to petabytes of data within 12 hours
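Using the prices in the table (as of this December 2016 presentation), a small Python calculator shows how the tiers compare for one batch of retrievals. It ignores storage and data transfer charges; `Decimal` keeps the sub-cent request prices exact, and `retrieval_cost` is an illustrative helper.

```python
from decimal import Decimal

# (price per GB retrieved, price per retrieval request), from the table above.
PRICES = {
    "Expedited": (Decimal("0.03"), Decimal("0.01")),
    "Standard": (Decimal("0.01"), Decimal("0.05") / 1000),
    "Bulk": (Decimal("0.0025"), Decimal("0.025") / 1000),
}


def retrieval_cost(tier: str, gigabytes: int, requests: int) -> Decimal:
    """Retrieval plus request charges, in US dollars, for one batch of jobs."""
    per_gb, per_request = PRICES[tier]
    return per_gb * gigabytes + per_request * requests


for tier in PRICES:
    print(f"{tier:>9}: ${retrieval_cost(tier, 100, 10):.4f} for 100 GB in 10 requests")
```

For 100 GB in 10 requests this works out to $3.10 (Expedited), about $1.00 (Standard), and about $0.25 (Bulk).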
Data retrievals: expedited retrievals
Expedited retrievals come in two types of requests:
On-demand requests: like EC2 On-Demand Instances, available the vast majority of the time
Provisioned requests: guaranteed capacity
Provisioned capacity: guarantees that expedited retrieval capacity is available when you need it. Each unit ensures at least 3 expedited retrievals every 5 minutes and provides up to 150 MB/s of retrieval throughput, for $100 per unit per month.
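Sizing provisioned capacity means taking the ceiling of whichever limit binds first. A Python sketch using the per-unit figures above, under the assumption (not stated on the slide) that capacity scales linearly with the number of units; `units_needed` is an illustrative helper.

```python
import math

# Per-unit figures from the slide.
REQUESTS_PER_5_MIN_PER_UNIT = 3
THROUGHPUT_MB_S_PER_UNIT = 150
PRICE_PER_UNIT_MONTH = 100


def units_needed(requests_per_5_min: int, peak_mb_s: float) -> int:
    """Smallest unit count covering both the request rate and the throughput.

    Assumes capacity scales linearly with the number of units purchased.
    """
    by_requests = math.ceil(requests_per_5_min / REQUESTS_PER_5_MIN_PER_UNIT)
    by_throughput = math.ceil(peak_mb_s / THROUGHPUT_MB_S_PER_UNIT)
    return max(by_requests, by_throughput)


units = units_needed(requests_per_5_min=10, peak_mb_s=200)
print(units, units * PRICE_PER_UNIT_MONTH)
```

For 10 expedited requests per 5 minutes at a 200 MB/s peak, the request rate dominates: 4 units, or $400 per month.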
Thank you!
Q&A