(STG312) Amazon Glacier Deep Dive: Cold Data Storage in AWS

Post on 08-Jan-2017

© 2015, Amazon Web Services, Inc. or its Affiliates. All rights reserved.

Henry Zhang, Senior Product Manager, Amazon Glacier

October 2015

Amazon Glacier Deep Dive

STG312

Audio archives – SoundCloud

• World’s leading social sound platform

• Audio files transcoded and stored in multiple formats

• Stores PBs of data

• Transcoded files served from Amazon S3

• Originals moved to Amazon Glacier for long-term retention

Video archives – Sony Media Cloud (Ci)

Tape replacement – King County

• Most populous county in Washington State

• Replace tape solution for backup from 17 agencies

• Meet compliance requirement

• Saved $1MM in first year, no more tape refresh or management churn

Archive: Data retained for the long term, for compliance or potential future reference

Data archiving needs are growing everywhere

• Media assets, 4K, 8K

• Health care / Life sciences

• Financial services

• Regulated industries

• Oil and gas / Geospatial

• Digital preservation

• Long-term backups

• Logs

Traditional archiving approaches

• Tape silos / Tape libraries

• Tape drives (LTO-X / DLT / etc.)

• Virtual tape libraries (VTLs)

• Tape out / Vaulting

• Specialized software & personnel

How can Amazon Glacier help with your archival?

Metered usage: Pay as you go

No capital investment

No commitment

No risky capacity planning

Avoid risks of physical media handling

Control your geographic locality for performance and compliance

Amazon Glacier is a low-cost storage service for archival data with long-term retention requirements.

$0.007/GB per month

3-5 hour data retrieval

Financial records

Medical PACS images

High-res media assets

How can Amazon Glacier help with your archival?

Extremely low-cost archive storage service, starting at $0.007/GB per month

Allows you to retrieve data within 3-5 hours

99.999999999% durability (7 orders of magnitude higher than 2 copies of tape)

No data migration, no hardware/infrastructure investments

Infinite scale and pay for what you use

Access to on-demand compute resource on AWS

Getting started – key concepts

• Account – Access AWS services, view billing/usage, manage security

• Vaults – Container for archives, up to 1000 vaults per account

• Archives – Files and records, write-once, 40TB max, unlimited archives

• Inventory – Cold index of archive properties refreshed every 24 hours

Amazon Glacier – 3 ways to Access

• Direct Glacier API/SDK

• S3 lifecycle integration

• Third-party tools and gateways

Amazon Glacier concepts: Uploading data

1 Create vault (Films)

2 Configure access policies

ArchiveApp user policy

Effect: Allow

Resource: arn:aws:glacier:<region>:<accountId>:vaults/Films

Action: glacier:UploadArchive

3 Upload archives

UploadArchive(data) -> Archive ID
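The user policy in step 2 can be written out as an IAM policy document. The following is a minimal sketch in Python, not the presenter's own code. The helper name `upload_only_policy`, the region, and the account ID are hypothetical placeholders.

```python
import json


def upload_only_policy(region, account_id, vault_name):
    """Build an IAM policy document that allows only archive uploads to one vault."""
    # Glacier vault ARNs take the form arn:aws:glacier:<region>:<account>:vaults/<name>.
    vault_arn = f"arn:aws:glacier:{region}:{account_id}:vaults/{vault_name}"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": ["glacier:UploadArchive"],
                "Resource": vault_arn,
            }
        ],
    }


# Placeholder region and account ID, for illustration only.
policy = upload_only_policy("us-west-2", "123456789012", "Films")
print(json.dumps(policy, indent=2))
```

Because the statement grants a single action on a single vault, the ArchiveApp user can write archives but cannot list, retrieve, or delete them.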

Amazon Glacier concepts: Retrieving data

1 Initiate job

ArchiveId: AE99F…, Vault: Films -> Job ID

2 3-5 hours for job completion

3 Job completion notification

4 Download output
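Step 1 of the retrieval flow amounts to submitting a small set of job parameters. The sketch below builds that parameter set in Python. The field names (`Type`, `ArchiveId`, `Description`, `SNSTopic`) follow the Glacier InitiateJob request, while the helper name, the archive ID, and the topic ARN are placeholders introduced for illustration.

```python
def archive_retrieval_job(archive_id, sns_topic_arn=None, description=""):
    """Build the parameters for an archive-retrieval job."""
    params = {"Type": "archive-retrieval", "ArchiveId": archive_id}
    if description:
        params["Description"] = description
    if sns_topic_arn:
        # Glacier publishes to this topic when the job completes (step 3).
        params["SNSTopic"] = sns_topic_arn
    return params


# Placeholder identifiers; a real archive ID is returned by the upload call.
job = archive_retrieval_job(
    "EXAMPLE-ARCHIVE-ID",
    sns_topic_arn="arn:aws:sns:us-west-2:123456789012:glacier-jobs",
    description="Restore film master",
)
print(job)
```

With boto3, a dictionary like this would typically be passed as `jobParameters` to the Glacier client's `initiate_job` call, and the returned job ID is later used to download the output once the notification arrives.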

Amazon Glacier – Amazon S3 lifecycle archival

• Seamlessly move data from Amazon S3 to Amazon Glacier

• Automated lifecycle rules

• Transition based on object age or predefined date
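A lifecycle rule of the kind described above can be expressed as a small configuration document. This is a sketch under stated assumptions: the rule ID, prefix, and 30-day threshold are invented for illustration, and the helper name is hypothetical. The key names follow the S3 lifecycle configuration format.

```python
def glacier_transition_rule(rule_id, prefix, days):
    """Build one lifecycle rule that moves objects under a prefix to Glacier by age."""
    return {
        "ID": rule_id,
        "Filter": {"Prefix": prefix},
        "Status": "Enabled",
        # Use {"Date": ...} instead of {"Days": ...} for a predefined date.
        "Transitions": [{"Days": days, "StorageClass": "GLACIER"}],
    }


# Hypothetical rule: archive everything under originals/ after 30 days.
lifecycle = {"Rules": [glacier_transition_rule("archive-originals", "originals/", 30)]}
print(lifecycle)
```

With boto3, this dictionary would typically be passed as `LifecycleConfiguration` to the S3 client's `put_bucket_lifecycle_configuration` call.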

Amazon Glacier – Backup software integration

• CommVault – Native integration with Amazon S3 & Amazon Glacier

• Deduplication & encryption

• Single console management

Amazon S3 Amazon Glacier

Amazon Glacier – Third-party tools and gateways

• Consumer grade: less than $50

– Example: Cloudberry, FastGlacier, Arq (Haystack Software)

• Small / medium business: $500 - $1,000

– Example: Synology, Veeam, QNap

• Enterprise grade gateway (price varies)

– Example: NetApp AltaVault

Best practices – Prepare your data

Use Archive descriptions

• Use Archive description field for metadata.

• If local index is corrupted or destroyed, use archive description to reconstruct critical mappings.

• For example, create index entry, add primary key to archive description on upload.
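One way to follow this practice is to serialize the index key and a few critical fields into the description at upload time. The sketch below assumes a JSON encoding and hypothetical field names (`pk`, `title`, `sha256`). The 1,024-character limit reflects the Glacier restriction on archive descriptions (printable ASCII only).

```python
import json

MAX_DESCRIPTION_LENGTH = 1024  # Glacier archive descriptions are limited to 1,024 characters


def build_archive_description(primary_key, **metadata):
    """Serialize the local index key plus metadata into an archive description."""
    record = {"pk": primary_key, **metadata}
    # json.dumps escapes non-ASCII and control characters by default,
    # so the result contains printable ASCII only.
    text = json.dumps(record, sort_keys=True, separators=(",", ":"))
    if len(text) > MAX_DESCRIPTION_LENGTH:
        raise ValueError("description exceeds 1,024 characters")
    return text


def parse_archive_description(text):
    """Recover the index entry from a description, e.g. from a vault inventory."""
    return json.loads(text)


description = build_archive_description("film-000123", title="Example", sha256="ab12")
print(description)
print(parse_archive_description(description))
```

If the local index is lost, the descriptions listed in the vault inventory can be parsed back into index entries keyed by `pk`.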

Small objects and object size overhead

• Every archive has 32KB of associated overhead, and some operations are charged per request

• For a 3.2MB archive, overhead adds ~1% to cost

• For a 1KB archive, 97% of cost would go to overhead

• Solution is aggregation: recommended minimum archive size is on the order of MBs
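The percentages above follow directly from the 32KB figure. A short worked calculation, assuming decimal units (1MB = 1,000KB) and counting storage overhead only, not per-request charges:

```python
OVERHEAD_KB = 32  # per-archive overhead quoted on the slide


def overhead_fraction(archive_kb):
    """Fraction of billed storage that is overhead for an archive of the given size."""
    return OVERHEAD_KB / (archive_kb + OVERHEAD_KB)


for label, size_kb in [("1KB", 1), ("3.2MB", 3200), ("100MB", 100_000)]:
    print(f"{label}: {overhead_fraction(size_kb):.2%} of stored bytes are overhead")
```

For a 1KB archive the ratio is 32/33, or about 97%. For 3.2MB it is 32/3232, or about 1%, which is why aggregating small files into archives of at least a few MB is recommended.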

Archive aggregation

[Diagram: several files aggregated into one archive. Labels: Archive; File 1, File 2, . . .; Checksum 1, Checksum 2, Checksum 3; Index/directory (File 1 offset, File 2 offset, File 3 offset); Checksum & metadata; Local index]
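The aggregation idea can be sketched as packing files back to back and recording each file's offset, length, and checksum in an index. This is an illustrative layout only, not a format defined by Amazon Glacier. The function names and the index fields are assumptions.

```python
import hashlib


def pack_archive(files):
    """Concatenate files into one blob and build an index of offsets and checksums.

    `files` maps a file name to its bytes. Returns (blob, index).
    """
    body = bytearray()
    index = []
    for name, data in files.items():
        index.append({
            "name": name,
            "offset": len(body),          # where this file starts inside the blob
            "length": len(data),
            "sha256": hashlib.sha256(data).hexdigest(),
        })
        body.extend(data)
    return bytes(body), index


def extract_file(blob, entry):
    """Slice one file out of the blob and verify it against the indexed checksum."""
    data = blob[entry["offset"]:entry["offset"] + entry["length"]]
    if hashlib.sha256(data).hexdigest() != entry["sha256"]:
        raise ValueError("checksum mismatch for " + entry["name"])
    return data


blob, index = pack_archive({"a.txt": b"hello", "b.bin": b"\x00\x01\x02"})
print(index)
print(extract_file(blob, index[1]))
```

Keeping a copy of the index inside the archive as well as locally means the offsets can be recovered even if the local index is lost.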

Best practices – Optimize upload

Best practices: Multipart uploads

Improve throughput, reliability, and get idempotency with multipart uploads

1. InitiateMultipartUpload(partSize) → uploadId

2. UploadPart(uploadId, data)

3. CompleteMultipartUpload(uploadId) → archiveId

[Diagram: an archive split into parts, with the parts uploaded in parallel]
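Before calling InitiateMultipartUpload, a client has to pick a part size and work out the byte range of each part. The sketch below assumes the Glacier rules that a part size is 1 MiB times a power of two (up to 4 GiB) and that an upload has at most 10,000 parts. The helper names are hypothetical.

```python
MIB = 1024 * 1024
MAX_PARTS = 10_000
MAX_PART_SIZE = 4096 * MIB  # 4 GiB


def choose_part_size(archive_size):
    """Pick the smallest power-of-two part size (in bytes) that fits in 10,000 parts."""
    part_size = MIB
    while part_size * MAX_PARTS < archive_size:
        part_size *= 2
    if part_size > MAX_PART_SIZE:
        raise ValueError("archive too large for a single multipart upload")
    return part_size


def part_ranges(archive_size, part_size):
    """Yield inclusive (start, end) byte ranges, one per UploadPart call."""
    for start in range(0, archive_size, part_size):
        yield start, min(start + part_size, archive_size) - 1


size = 5 * MIB + 1
part_size = choose_part_size(size)
ranges = list(part_ranges(size, part_size))
print(part_size, len(ranges), ranges[-1])
```

Each range identifies its part independently, so parts can be sent in parallel and a failed part can be retried with the same range without affecting the others.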

Best practices: Data ingestion options

AWS Direct Connect: Dedicated bandwidth between your site and AWS

Internet: Transfer data in a secure SSL tunnel over the public Internet

AWS Import/Export Snowball: Physical transfer of media into and out of AWS

Best practices – Cost management

Amazon Glacier – Data retrieval policies

• Provides transparency and cost control for data retrievals

• Governs all retrieval activities for an account in a region

• Synchronously accept/reject each retrieval request

• Accounts for inflight retrieval operations
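A retrieval policy with an hourly byte limit can be expressed as a small document, and the accept/reject behavior can be illustrated with a toy model. The policy shape (`Rules`, `Strategy`, `BytesPerHour`) follows the Glacier data retrieval policy format. The `RetrievalBudget` class is a hypothetical simplification for illustration and is not the service's actual accounting algorithm.

```python
def bytes_per_hour_policy(limit_bytes):
    """Build a data retrieval policy that caps retrievals at a byte rate per hour."""
    return {"Rules": [{"Strategy": "BytesPerHour", "BytesPerHour": limit_bytes}]}


class RetrievalBudget:
    """Toy model: accept a request only if it fits alongside in-flight retrievals."""

    def __init__(self, limit_bytes):
        self.limit_bytes = limit_bytes
        self.inflight_bytes = 0

    def request(self, size_bytes):
        # The decision is made synchronously, at the time the request arrives.
        if self.inflight_bytes + size_bytes > self.limit_bytes:
            return False
        self.inflight_bytes += size_bytes
        return True

    def complete(self, size_bytes):
        self.inflight_bytes -= size_bytes


retrieval_policy = bytes_per_hour_policy(10 * 1024**3)  # assumed limit: 10 GiB per hour
budget = RetrievalBudget(10)
print(budget.request(6), budget.request(6))  # second request exceeds the remaining budget
```

With boto3, a policy dictionary like this would typically be passed as `Policy` to the Glacier client's `set_data_retrieval_policy` call.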

Cost allocation with vault tags

Best practices – Security and compliance

Amazon Glacier – Audit logging with AWS CloudTrail

• Enable AWS CloudTrail in console

• Control plane events – Vault activities

• Data plane events – Archive activities

Vault access policies

• Manage access to a Vault in a single location – single IAM policy

– Grant/revoke access to internal business units/teams

– “Marketing_Vault” has a different access policy from “DevOps_Vault”

• Easily manage cross-account access for your business partner

– Simply add a section for your business partner in the same policy
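A cross-account section of a vault access policy might look like the following sketch. The vault ARN, the partner account ID, and the chosen actions are placeholders, and the helper name is hypothetical. The document structure follows the standard resource-based policy format.

```python
import json


def partner_access_statement(vault_arn, partner_account_id, actions):
    """Build one policy statement granting another AWS account access to a vault."""
    return {
        "Sid": "partner-access",
        "Effect": "Allow",
        "Principal": {"AWS": f"arn:aws:iam::{partner_account_id}:root"},
        "Action": list(actions),
        "Resource": vault_arn,
    }


# Placeholder ARN and account ID, for illustration only.
vault_access_policy = {
    "Version": "2012-10-17",
    "Statement": [
        partner_access_statement(
            "arn:aws:glacier:us-west-2:123456789012:vaults/Marketing_Vault",
            "999999999999",
            ["glacier:InitiateJob", "glacier:GetJobOutput"],
        )
    ],
}
print(json.dumps(vault_access_policy, indent=2))
```

Revoking the partner's access later means removing this one statement from the vault's policy, with no change to IAM users in either account.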

Amazon Glacier Vault Lock allows you to easily set compliance controls on individual vaults and enforce them via a lockable policy.

Time-based retention

MFA Authentication

Controls govern all records in a Vault

Immutable policy

Two-step locking

Compliance Storage with Vault Lock

Vault Lock for compliance storage

• Non-overwrite, non-erasable records

• Time-based retention with “ArchiveAgeInDays” control

• Policy lockdown (strong governance)

• Legal hold with vault-level tags

• Configure optional designated third-party access and grant temporary access

Example control: 1 year record retention
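The slide image for this control is not preserved in the transcript. A retention control of this kind is commonly written as a deny statement on archive deletion, conditioned on `glacier:ArchiveAgeInDays`. The sketch below is one such policy, with a placeholder vault ARN and a hypothetical helper name.

```python
import json


def retention_lock_policy(vault_arn, retention_days):
    """Build a Vault Lock policy that denies deleting archives younger than the retention period."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "deny-based-on-archive-age",
                "Principal": "*",
                "Effect": "Deny",
                "Action": "glacier:DeleteArchive",
                "Resource": [vault_arn],
                "Condition": {
                    "NumericLessThan": {"glacier:ArchiveAgeInDays": str(retention_days)}
                },
            }
        ],
    }


# Placeholder vault ARN, for illustration only.
one_year_policy = retention_lock_policy(
    "arn:aws:glacier:us-west-2:123456789012:vaults/examplevault", 365
)
print(json.dumps(one_year_policy, indent=2))
```

Because the statement is a deny, it overrides any allow granted elsewhere, so no principal can delete an archive until it is at least 365 days old.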

Vault Lock: Two-step locking
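The two steps are initiating the lock, which attaches the policy in an in-progress state and returns a lock ID valid for 24 hours, and completing the lock with that ID, after which the policy is immutable. The class below is a toy state model of that flow for illustration. It does not call AWS, and the "Unlocked" state name and method names are assumptions.

```python
import uuid
from datetime import datetime, timedelta

LOCK_WINDOW = timedelta(hours=24)  # time allowed between initiating and completing the lock


class VaultLockModel:
    """Toy model of the initiate / complete / abort flow for a vault lock policy."""

    def __init__(self):
        self.state = "Unlocked"
        self.lock_id = None
        self.expires_at = None

    def initiate(self, now):
        if self.state != "Unlocked":
            raise RuntimeError("a lock is already in progress or completed")
        self.state = "InProgress"
        self.lock_id = uuid.uuid4().hex
        self.expires_at = now + LOCK_WINDOW
        return self.lock_id

    def abort(self):
        if self.state == "Locked":
            raise RuntimeError("a locked policy cannot be changed or removed")
        self.state, self.lock_id, self.expires_at = "Unlocked", None, None

    def complete(self, lock_id, now):
        if self.state != "InProgress" or lock_id != self.lock_id:
            raise RuntimeError("no matching lock in progress")
        if now >= self.expires_at:
            self.abort()  # an expired lock ID discards the in-progress policy
            raise RuntimeError("lock ID expired")
        self.state = "Locked"


start = datetime(2015, 10, 1, 12, 0)
lock = VaultLockModel()
lock_id = lock.initiate(start)
lock.complete(lock_id, start + timedelta(hours=1))
print(lock.state)
```

The in-progress window exists so the policy can be tested against real requests, and aborted if it is wrong, before it becomes permanent.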

Legal hold with vault-level tags

Example control: Legal hold
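The slide image for this control is not preserved in the transcript. A legal hold is commonly expressed as a deny statement conditioned on a vault tag. The sketch below assumes the tag key is named `LegalHold` and uses a placeholder vault ARN and a hypothetical helper name.

```python
import json


def legal_hold_policy(vault_arn, tag_key="LegalHold"):
    """Build a policy that denies archive deletion while the vault carries a legal-hold tag."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "deny-delete-under-legal-hold",
                "Principal": "*",
                "Effect": "Deny",
                "Action": ["glacier:DeleteArchive"],
                "Resource": [vault_arn],
                "Condition": {
                    # Matches when the tag is "true" or present with an empty value.
                    "StringLike": {f"glacier:ResourceTag/{tag_key}": ["true", ""]}
                },
            }
        ],
    }


# Placeholder vault ARN, for illustration only.
hold_policy = legal_hold_policy("arn:aws:glacier:us-west-2:123456789012:vaults/examplevault")
print(json.dumps(hold_policy, indent=2))
```

Placing or lifting the hold is then a matter of adding or removing the tag on the vault, without editing the locked policy itself.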

Vault lock best practices

Vault access policy

• Can be updated/deleted

Vault lock policy

• Lockable/Immutable policy

• Cannot be updated/deleted after lockdown

Use vault access policy to:

• Designate third-party access

• Grant temporary read permissions when necessary

Use vault lock policy to:

• Deploy regulatory controls such as records retention

• Enforce data access through multi-factor authentication only

Compliance/Governance Flexibility

Using vault lock policy with vault access policy

Vault Lock in the Glacier Console

Amazon Glacier received a third-party assessment from Cohasset Associates on how Amazon Glacier with Vault Lock can be used to meet the requirements of SEC 17a-4(f) and CFTC 1.31(b)-(c).

Thank you!

Remember to complete your evaluations!
