AWS re:Invent 2016: Strategic Planning for Long-Term Data Archiving with Amazon Glacier (STG209)
Post on 16-Apr-2017
© 2016, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Henry Zhang, Senior Product Manager, AWS
Rich Sutton, VP of Engineering, Digital Risk, Proofpoint
November 30, 2016
STG209
Strategic Planning for Long-Term
Data Archiving with Amazon Glacier
AWS storage maturity
File
• Amazon EFS
Block
• Amazon Elastic Block Store
• Amazon EC2 Instance Store
Object
• Amazon S3
• Amazon Glacier
Data Transfer
• AWS Direct Connect
• AWS Snowball
• ISV Connectors
• Amazon Kinesis Firehose
• Amazon S3 Transfer Acceleration
• AWS Storage Gateway
Video archives
• Media distribution backbone (Ve.nue platform)
• Over-The-Top (OTT) broadcast service
• 20 PB of media assets, 800,000 hours of high-res content
• Assets to be archived and retained for decades
Patient data–Philips Healthcare
• HealthSuite digital platform powered by AWS
• 15 petabytes of patient data
• Archived for decades (beyond the lifetime of patients)
• Uses AWS HIPAA-eligible services in the BAA
Public sector–King County
• Most populous county in Washington state
• Replaced tape solution for backup from 17 agencies
• Meets compliance requirement
• Saved $1MM in first year; no more tape refresh or management churn
Archive:
Data retained for the long term,
for compliance or potential
future reference
Data archiving needs are growing everywhere
• Media assets, 4K, 8K
• Health care/life sciences
• Financial services
• Regulated industries
• Oil and gas/geospatial
• Digital preservation
• Long-term backups
• Logs
Consideration 1 – Total Archive Cost
Traditional archiving approaches
• Tape libraries, robots, drives, media
• Onsite (online and offline)
• Offsite tape out/vaulting
• Specialized software and personnel
• Tape refresh every 3-5 years
How can AWS help with your archival?
• Metered usage: pay as you go
• No capital investment
• No commitment
• No risky capacity planning
• Avoid risks of physical media handling
• Control your geographic locality for performance and compliance
1 PB raw storage
800 TB usable storage
600 TB allocated storage
400 TB application data
Storage pricing - pay only for what you use
AWS Cloud
Storage
Amazon Glacier starts at $0.004/GB/month
Price dropped by 43% on 11/21
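The capacity figures above (1 PB raw down to 400 TB of application data) can be put next to the metered price with a little arithmetic. This is a minimal sketch, assuming decimal terabytes (1 TB = 1,000 GB) and the $0.004/GB/month rate quoted on the slide; the function and variable names are illustrative only.

```python
# Illustrative arithmetic only: metered cost for the data actually stored,
# using the per-GB rate quoted on the slide.
GLACIER_USD_PER_GB_MONTH = 0.004  # rate from the slide
GB_PER_TB = 1000                  # assumption: decimal terabytes


def monthly_cost_usd(terabytes: float,
                     usd_per_gb_month: float = GLACIER_USD_PER_GB_MONTH) -> float:
    """Return the monthly storage cost in USD for the given number of terabytes."""
    return terabytes * GB_PER_TB * usd_per_gb_month


# Capacity figures from the slide, in TB.
raw_provisioned_tb = 1000   # 1 PB raw storage
application_data_tb = 400   # data the application actually stores

# Share of the raw on-premises capacity that holds application data.
utilization = application_data_tb / raw_provisioned_tb

print(f"Utilization of raw capacity: {utilization:.0%}")
print(f"Metered cost for stored data: ${monthly_cost_usd(application_data_tb):,.2f}/month")
```

With metered pricing only the 400 TB of application data is billed; the remaining raw capacity in the on-premises picture has no counterpart on the bill.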
Consideration 2 – Durability
99.999999999% durability
Durability for long-term preservation
Built-in Fixity Checking
Automatic recovery
Consideration 3 – Accessibility
Amazon Glacier – Data Retrieval Tiers
Standard Retrieval
• Current model
• 3-5 hours
• Disaster recovery
• $0.01/GB
Bulk Retrieval
• Batch/bulk access
• 5-12 hours
• PB-scale re-transcoding or video/image analysis
• $0.0025/GB
Expedited Retrieval
• Emergency access
• 1-5 minutes
• Last-minute play-out schedule swap
• $0.03/GB
On-site tape replacement
Off-site tape replacement
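To compare the tiers for a concrete job, the per-GB prices can be turned into a small estimator. This is a rough sketch, assuming the three prices on the slide map to Expedited ($0.03/GB), Standard ($0.01/GB), and Bulk ($0.0025/GB); it ignores per-request fees and any data transfer charges, and the names are illustrative.

```python
# Per-GB retrieval prices and typical wait times from the slide.
# Assumption: prices map to tiers as Expedited > Standard > Bulk.
RETRIEVAL_USD_PER_GB = {
    "expedited": 0.03,
    "standard": 0.01,
    "bulk": 0.0025,
}
TYPICAL_WAIT = {
    "expedited": "1-5 minutes",
    "standard": "3-5 hours",
    "bulk": "5-12 hours",
}


def retrieval_cost_usd(gigabytes: float, tier: str) -> float:
    """Estimate the per-GB retrieval charge for one tier (request fees excluded)."""
    try:
        rate = RETRIEVAL_USD_PER_GB[tier.lower()]
    except KeyError:
        raise ValueError(f"unknown retrieval tier: {tier!r}") from None
    return gigabytes * rate


# Example: compare the tiers for a 10 TB (10,000 GB) restore.
for tier in ("expedited", "standard", "bulk"):
    cost = retrieval_cost_usd(10_000, tier)
    print(f"{tier:>9}: ${cost:,.2f}, typical wait {TYPICAL_WAIT[tier]}")
```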
Consideration 4 - Application & Data Management
Amazon Glacier – 3 ways to access
• Direct Glacier API/SDK
• S3 lifecycle integration
• Third-party tools and gateways
Amazon Glacier – Direct access/APIs
Data Upload
• Create Vault
• Configure Access
• Upload Archives
• Register Archive ID
Data Retrieval
• Initiate Retrieval
• Async Retrieval Completion
• Completion Notification
• Download Data
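The upload and retrieval steps above can be sketched against the Glacier client from boto3. This is an illustrative outline, not the presenters' code: the `glacier` argument is assumed to be something like `boto3.client("glacier")`, the helper names are hypothetical, the "Configure Access" step is omitted, and the sketch polls for job completion rather than using a completion notification.

```python
import time


def upload_archive(glacier, vault_name: str, data: bytes, description: str) -> str:
    """Create the vault if needed, upload one archive, and return its archive ID."""
    # Creating a vault that already exists is harmless (the call is idempotent).
    glacier.create_vault(accountId="-", vaultName=vault_name)
    response = glacier.upload_archive(
        accountId="-",
        vaultName=vault_name,
        archiveDescription=description,
        body=data,
    )
    # The caller must record this ID (for example in a database);
    # it is the handle needed to retrieve the archive later.
    return response["archiveId"]


def retrieve_archive(glacier, vault_name: str, archive_id: str,
                     tier: str = "Standard", poll_seconds: float = 900) -> bytes:
    """Start an asynchronous retrieval job, wait for it, and download the data."""
    job = glacier.initiate_job(
        accountId="-",
        vaultName=vault_name,
        jobParameters={
            "Type": "archive-retrieval",
            "ArchiveId": archive_id,
            "Tier": tier,  # "Expedited", "Standard", or "Bulk"
        },
    )
    job_id = job["jobId"]

    # Retrieval is asynchronous: poll until the job reports completion.
    while not glacier.describe_job(
            accountId="-", vaultName=vault_name, jobId=job_id)["Completed"]:
        time.sleep(poll_seconds)

    output = glacier.get_job_output(accountId="-", vaultName=vault_name, jobId=job_id)
    return output["body"].read()
```

Passing the client in as a parameter keeps the helpers easy to exercise with a stub before pointing them at a real vault.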
Use Glacier via S3 Object Lifecycle
S3 Standard
• Active data
• Synchronous access
• $0.023/GB/mo.
S3 - Infrequent Access
• Infrequently accessed data
• Synchronous access
• $0.0125/GB/mo.
Amazon Glacier
• Archive data
• Async access
• $0.004/GB/mo.
- Transition Standard to Standard-IA
- Transition Standard-IA to Amazon Glacier
- Transition based on object tags
- Expiration and versioning
Data lifecycle management
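The transition and expiration rules listed above could be expressed as an S3 lifecycle configuration. The following is a hedged sketch that only builds the configuration dictionary in the shape accepted by boto3's `put_bucket_lifecycle_configuration`; the prefix, day counts, tag, and function name are hypothetical, and S3 applies its own minimum-age rules that this sketch does not fully check.

```python
def build_lifecycle_configuration(prefix, ia_after_days, glacier_after_days,
                                  expire_after_days=None, tag=None):
    """Build a lifecycle configuration: Standard -> Standard-IA -> Glacier.

    `tag` is an optional (key, value) pair that restricts the rule to
    objects carrying that tag, in addition to the key prefix.
    """
    # Sanity checks for this sketch only; S3 enforces its own constraints.
    if glacier_after_days <= ia_after_days:
        raise ValueError("Glacier transition must come after the Standard-IA transition")
    if expire_after_days is not None and expire_after_days <= glacier_after_days:
        raise ValueError("expiration must come after the Glacier transition")

    if tag is None:
        rule_filter = {"Prefix": prefix}
    else:
        key, value = tag
        # Combining a prefix with tags requires the "And" form of the filter.
        rule_filter = {"And": {"Prefix": prefix,
                               "Tags": [{"Key": key, "Value": value}]}}

    rule = {
        "ID": f"archive-{prefix or 'all'}",
        "Filter": rule_filter,
        "Status": "Enabled",
        "Transitions": [
            {"Days": ia_after_days, "StorageClass": "STANDARD_IA"},
            {"Days": glacier_after_days, "StorageClass": "GLACIER"},
        ],
    }
    if expire_after_days is not None:
        rule["Expiration"] = {"Days": expire_after_days}
    return {"Rules": [rule]}


# Example: move objects under videos/ to Standard-IA after 30 days,
# to Glacier after 90 days, and expire them after 10 years.
config = build_lifecycle_configuration("videos/", 30, 90, expire_after_days=3650)
print(config)
```

The resulting dictionary would be passed as `LifecycleConfiguration` when applying it to a bucket.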
[Chart: data access frequency over time, from T to T + 365 days]
Transition older videos to Standard-IA
Save money on storage
45% saving over S3 Standard
44% saving over S3 Standard-IA
* Assumes the highest public pricing tier
Amazon Glacier – Third-party tools and gateways
• Consumer grade: less than $50
• Example: CloudBerry, FastGlacier, Arq (Haystack Software)
• Small/medium business: $500 - $1,000
• Example: Synology, Veeam, QNAP
• Enterprise gateway and data management software
• Example: NetApp AltaVault, Commvault, StorNext, Vidispine
Which option should I choose?
• Use S3 lifecycle-managed Amazon Glacier if the S3 object keys are sufficient for index/search capability
• Use Amazon Glacier directly if you already plan to store more metadata/indices in a database
• Use third-party tools to minimize coding
• Does the tool write data in proprietary or native format in AWS?
Media Archive and Metadata (cloud transition)
• Corporate data center: Onsite Archive, On-Premise Tape, Offsite Tape Archive, Hierarchical Storage Manager, Metadata (Asset Manager), Processing Tasks
Media Archive (transition to the cloud)
• Corporate data center: Onsite Archive, On-Premise Tape, Offsite Tape Archive, Hierarchical Storage Manager, Metadata (Asset Manager), Processing Tasks
• AWS Region (via AWS Direct Connect): Amazon Glacier, Cloud DAM (syncing metadata from on-prem)
Media Archive (transition to the cloud)
• Corporate data center: Onsite Archive, On-Premise Tape, Offsite Tape Archive, Hierarchical Storage Manager, Metadata (Asset Manager), Processing Tasks
• AWS Region (via AWS Direct Connect): Amazon Glacier, Amazon S3, Cloud DAM (syncing metadata from on-prem), Cloud-Based Processing Tasks
Media Archive (transition to the cloud)
• Corporate data center: Onsite Archive, Onsite Cache, On-Premise Tape, Offsite Tape Archive, Hierarchical Storage Manager, Metadata (Asset Manager), Processing Tasks
• AWS Region (via AWS Direct Connect): Amazon Glacier, Amazon S3, Cloud DAM (syncing metadata from on-prem), Cloud-Based Processing Tasks
Consideration 5 - Compliance and Retention
Amazon Glacier Vault Lock allows you to easily set compliance controls on individual vaults and enforce them via a lockable policy
• Time-based retention
• MFA authentication
• Controls govern all records in a vault
• Immutable policy
• Two-step locking
Compliance storage with Vault Lock
Vault Lock for compliance storage
• Non-overwrite, non-erasable records
• Time-based retention with “ArchiveAgeInDays” control
• Policy lockdown (strong governance)
• Legal hold with vault-level tags
• Configure optional designated third-party access and grant
temporary access
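A Vault Lock policy combining time-based retention with a tag-driven legal hold might look like the sketch below. It is an assumption-laden illustration: the vault ARN, the `LegalHold` tag key, and the helper names are hypothetical, and the `glacier` argument is assumed to be a boto3 Glacier client. The two functions at the end mirror the two-step locking flow: the lock is first initiated (leaving a window to test or abort it) and only becomes immutable once completed.

```python
import json


def build_vault_lock_policy(vault_arn: str, retention_days: int) -> str:
    """Return a Vault Lock policy document (JSON string) for one vault."""
    policy = {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Time-based retention: deny deletes of archives younger
                # than the retention period.
                "Sid": "deny-delete-before-retention",
                "Principal": "*",
                "Effect": "Deny",
                "Action": "glacier:DeleteArchive",
                "Resource": [vault_arn],
                "Condition": {
                    "NumericLessThan": {
                        "glacier:ArchiveAgeInDays": str(retention_days)
                    }
                },
            },
            {
                # Legal hold: deny deletes while the vault carries the
                # (hypothetical) LegalHold=true tag.
                "Sid": "deny-delete-under-legal-hold",
                "Principal": "*",
                "Effect": "Deny",
                "Action": "glacier:DeleteArchive",
                "Resource": [vault_arn],
                "Condition": {
                    "StringLike": {"glacier:ResourceTag/LegalHold": ["true", ""]}
                },
            },
        ],
    }
    return json.dumps(policy)


def start_vault_lock(glacier, vault_name: str, policy_json: str) -> str:
    """Step 1: attach the policy in the in-progress state and return the lock ID."""
    response = glacier.initiate_vault_lock(
        accountId="-", vaultName=vault_name, policy={"Policy": policy_json}
    )
    return response["lockId"]


def finish_vault_lock(glacier, vault_name: str, lock_id: str) -> None:
    """Step 2: complete the lock; after this the policy can no longer be changed."""
    glacier.complete_vault_lock(accountId="-", vaultName=vault_name, lockId=lock_id)
```

Keeping the two steps as separate functions leaves room to validate the policy between them rather than locking in a mistake.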
Amazon Glacier received a third-party assessment
from Cohasset Associates on how Amazon Glacier
with Vault Lock can be used to meet the requirements
of SEC Rule 17a-4(f) and CFTC 1.31(b)-(c).
Rich Sutton, VP of Engineering
Digital Risk, Social Media Security, and Compliance
Proofpoint SocialPatrol Archive
Amazon Glacier and Vault Lock
Use Case
Proofpoint
• Cloud-based security and compliance for the enterprise:
threat research, email, mobile, social, digital risk
• Founded 2002, public in 2012
• $350M annual revenue, $3B market cap
• Huge AWS user
Proofpoint SocialPatrol
Policy controls and enforcement for social
• Combats fraudulent brand impersonation
• Moderates content at scale
• Ensures compliance in publishing
• Integrates with social APIs
• 150+ classifiers using NLP and ML
• Text, links, images, metadata
• Ingesting >1M social posts per day
• Built in AWS
Proofpoint SocialPatrol
How it works:
• Social
• PFPT in AWS: policy engine, MySQL/C*/Solr
• Enterprise Archive
“Awesome. Help me with retention by integrating with my existing email archive.”
Proofpoint SocialPatrol archiving integration
Imperfect …
• Social != Email
• Every archive is different
• Requires internal collaboration
Proofpoint SocialPatrol Archive
SEC Rule 17a-4(f)-compliant archive, purpose-built for
social, enabled by Amazon Glacier and Vault Lock
• Social
• PFPT in AWS: policy engine, MySQL/C*/Solr
• Amazon Glacier & Vault Lock
Proofpoint SocialPatrol Archive
• The customer specifies the retention period in Proofpoint Social.
• Via the AWS API, we create a vault for that customer.
• Via the AWS API, we lock the vault and specify a policy to observe a legal hold via a tag.
• As social content flows in, we record its purge date and surface that to the user. Each piece of social content is an archive in the vault.
• The search UI uses the copy of the data we already had.
• As archives expire, we purge them.
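The purge-date bookkeeping described above can be sketched in a few lines. This is a hypothetical illustration, not Proofpoint's implementation: the `ArchivedPost` record, its fields, and the `legal_hold` flag are invented names that stand in for whatever the product actually stores.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass(frozen=True)
class ArchivedPost:
    """One piece of social content stored as an archive in a customer's vault."""
    archive_id: str
    archived_on: date
    retention_days: int  # retention period chosen by the customer

    @property
    def purge_date(self) -> date:
        """First day on which the archive may be purged."""
        return self.archived_on + timedelta(days=self.retention_days)


def purgeable(posts, today: date, legal_hold: bool = False):
    """Return the posts whose retention has expired; nothing while a legal hold applies."""
    if legal_hold:
        return []
    return [post for post in posts if post.purge_date <= today]


# Example: a post archived on 1 January 2016 with a 365-day retention period.
post = ArchivedPost("archive-123", date(2016, 1, 1), 365)
print(post.purge_date)
print(purgeable([post], today=date(2017, 1, 15)))
```

Recording the purge date at ingest time lets the UI show it to the user without querying Glacier.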
Proofpoint SocialPatrol Archive
• Legal hold can be put in place by Proofpoint Support
• Data can be exported from Amazon Glacier by
Proofpoint Support when necessary
• Amazon Glacier with Vault Lock allowed us to build a
product that complies with SEC Rule 17a-4(f) and CFTC
Rule 1.31(b)-(c)
What would it have cost for us to build a WORM data store,
get it certified, and scale it … ?
Data ingestion into AWS storage services
Snowball Edge
• Accelerate PBs with AWS-provided appliances
• NEW 100 TB model with compute
Storage Gateway
• Instant hybrid cloud
• Up to 120 MB/s cloud upload rate (4x improvement)
Firehose
• Ingest data streams directly into AWS data stores
Direct Connect
• COLO to AWS
ISV Connectors
• Commvault
• Veritas
• And others
NEW S3 Transfer Acceleration
• Accelerate object transfer up to 300% using AWS’s private network
Related Sessions
STG302 - Deep Dive on Amazon Glacier
STG210 - Simplified Data Center Migration—Lessons
Learned by Live Nation
STG312 - Workshop: Working with AWS Snowball -
Accelerating Data Ingest into the Cloud
Remember to complete
your evaluations!
Thank you!