Jan 06, 2017
© 2016, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
David Stein, Business Development EBS
November 30, 2016
Case Study: How Zendesk and Videology Modernized Their Big Data Platforms on Amazon EBS
STG311
What to Expect from the Session
• How to architect big data processing platforms that scale to meet growing demand while improving performance, availability, and cost with Amazon EBS
• Learn about the new st1 (Throughput Optimized HDD) and sc1 (Cold HDD) EBS volumes designed for big data workloads
• Overview of how Zendesk runs a large ELK (Elasticsearch, Logstash, Kibana) stack on Amazon EC2 and EBS for their cloud-based customer support platform
• Overview of how Videology runs a Hadoop architecture on EC2 and EBS to ingest, process, and analyze logs for their converged advertising solution
AWS storage is a platform
File: Amazon EFS
Block: Amazon EBS, Amazon EC2 Instance Store
Object: Amazon S3, Amazon Glacier
Data Transfer: AWS Direct Connect, ISV Connectors, Amazon Kinesis Firehose, AWS Storage Gateway, Amazon S3 Transfer Acceleration, AWS Snowball, Amazon CloudFront, Internet/VPN
EBS volume types
Solid state drive (SSD)
• General Purpose SSD (gp2)
• Provisioned IOPS SSD (io1)
Hard disk drive (HDD)
• Throughput Optimized HDD (st1)
• Cold HDD (sc1)
EBS volume types: throughput
Throughput Optimized HDD (st1)
• Baseline: 40 MB/s per TB up to 500 MB/s
• Burst: 250 MB/s per TB up to 500 MB/s
• Capacity: 500 GB to 16 TB
• Ideal for large-block, high-throughput sequential workloads
Cold HDD (sc1)
• Baseline: 12 MB/s per TB up to 192 MB/s
• Burst: 80 MB/s per TB up to 250 MB/s
• Capacity: 500 GB to 16 TB
• Ideal for sequential throughput workloads such as logging and backup
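The per-TB figures above can be turned into a quick sizing check. The following is a minimal sketch, not an AWS API: the class name `HddVolumeType` and its `throughput` method are hypothetical names introduced here for illustration, and the only numbers used are the baseline, burst, and capacity limits quoted on the slides.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HddVolumeType:
    """Throughput limits for an HDD-backed EBS volume type (figures from the slides)."""
    name: str
    baseline_per_tb: float  # MB/s of baseline throughput per TB provisioned
    baseline_cap: float     # MB/s ceiling on baseline throughput
    burst_per_tb: float     # MB/s of burst throughput per TB provisioned
    burst_cap: float        # MB/s ceiling on burst throughput
    min_tb: float = 0.5     # 500 GB minimum volume size
    max_tb: float = 16.0    # 16 TB maximum volume size

    def throughput(self, size_tb):
        """Return (baseline, burst) in MB/s for a volume of size_tb terabytes."""
        if not self.min_tb <= size_tb <= self.max_tb:
            raise ValueError(
                f"{self.name} volumes must be between {self.min_tb} and {self.max_tb} TB"
            )
        baseline = min(self.baseline_per_tb * size_tb, self.baseline_cap)
        burst = min(self.burst_per_tb * size_tb, self.burst_cap)
        return baseline, burst


# Limits as stated on the slides above.
ST1 = HddVolumeType("st1", 40, 500, 250, 500)
SC1 = HddVolumeType("sc1", 12, 192, 80, 250)

if __name__ == "__main__":
    for volume_type, size_tb in [(ST1, 2), (ST1, 12.5), (SC1, 16)]:
        baseline, burst = volume_type.throughput(size_tb)
        print(f"{volume_type.name} {size_tb} TB: baseline {baseline} MB/s, burst {burst} MB/s")
```

With these limits, a 2 TB st1 volume already reaches the 500 MB/s burst ceiling, while its baseline only reaches 500 MB/s at 12.5 TB. This is why sizing for sustained throughput and sizing for capacity can give different answers.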
Kyle House, David Bernstein, Zendesk
November 30, 2016
Case Study: How Zendesk Modernized Their Big Data Platforms on Amazon EBS
Inside Our New ELK Deployment
Zendesk builds software for better customer relationships. It empowers organizations to improve customer engagement and better understand their customers. More than 87,000 paid customer accounts in over 150 countries and territories use Zendesk products. Zendesk is based in San Francisco.
What to Expect from the Session
• Discuss storage redesign, utilizing new Amazon EBS volumes
• Talk through design choices
• Explain benefits of new storage model
• Cost benefits of “rightsizing” storage
ELK at Zendesk
Elasticsearch: distributed database
Logstash: log ingestion/parsing
Kibana: beautiful visualizations
The Problem
- Operational headaches
- Encryption
- Data retention
- Cost too high
The Investigation
- User access patterns
- Performance requirements
- New EBS volume types
The Proposal
- Full usage of EBS with new volume types
- Create a tiered storage model
- Optimize instance types; decouple instances from storage
Tiered storage
Hot (0-7 days): General Purpose SSD (gp2)
Warm (8-30 days): Throughput Optimized HDD (st1)
Cold (31-60 days): Cold HDD (sc1)
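The age-based tiering above can be expressed as a small lookup. This is a minimal sketch of the policy only, assuming that data older than 60 days falls out of retention. The names `TIERS` and `tier_for_age` are hypothetical, and this is not how Zendesk's cluster implements the placement.

```python
from typing import Optional, Tuple

# (maximum age in days, tier name, EBS volume type), taken from the tiering slide.
TIERS = [
    (7, "hot", "gp2"),
    (30, "warm", "st1"),
    (60, "cold", "sc1"),
]


def tier_for_age(age_days: int) -> Optional[Tuple[str, str]]:
    """Return (tier, volume type) for data of the given age, or None past retention."""
    if age_days < 0:
        raise ValueError("age_days must be non-negative")
    for max_age, tier, volume_type in TIERS:
        if age_days <= max_age:
            return tier, volume_type
    # Assumption: anything older than the cold tier's upper bound is no longer retained.
    return None


if __name__ == "__main__":
    for age in (0, 7, 8, 30, 31, 60, 61):
        print(age, tier_for_age(age))
```

Keeping the boundaries in one table makes it easy to adjust a tier's window without touching the lookup logic.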
Topology
Also shown: VPN gateway, Proxy, Bastion
In each of two Availability Zones:
- 3 x m4.large esclient/esmaster (gp2 roots)
- 8 x c4.large logindexers (gp2 roots)
- 10 x r3.2xlarge esdata
  • gp2: roots + 11G (hot)
  • st1: 35G (warm)
  • sc1: 80G (cold)
Sparkleformation
The Result
- Reduced operating costs by 50%
- Increased data retention 3x
- Predictable scaling model
  • Storage allocation detached from instance count
- Increased data transport reliability
- Reduced operational overhead
- Increased cluster stability
49% Reduction 79% Reduction
Recommendations
- Identify data usage model before you build
- Find places where performance matters, and where cost can be optimized
- Reduce over-provisioned storage/IOPS
- Utilize AWS managed services whenever possible
Thank you!
Up next in this session:
Videology
Videology
Paul Frederiksen – Principal DevOps Engineer
David Ortiz – Senior Software Engineer
Videology Big Data Team
November 30, 2016
On the Rocky Road to EBS
Videology’s Journey to EBS-backed Big Data
What to Expect from the Session
• Intro to Videology
• Challenges
• Road to EBS-backed cluster
• Happy engineers
Videology overview
Founded: 2007 by Scott Ferber, co-founder of Advertising.com, which sold to AOL Time Warner in 2004 for $497 Million
Corporate Headquarters: New York, NY
Operations:
• Operating in 28 Global Markets
• Key Offices – New York, Baltimore, Toronto, London, Singapore & Sydney
Employees: Approximately 380
Investors: NEA, Comcast Ventures, Harbourvest, Catalyst Investors, Pinnacle Ventures, Valhalla Venture
Customers: 4,500 Active Users including Brand marketers, agencies, trading desks, media companies, MVPDs
Ecosystem Integrations: Open platform with 2200+ ecosystem integrations, including 1000+ media companies, 40 data providers, all major 3rd party verification providers, and dozens of technology partners across the media ecosystem
Recent Client Wins:
Videology provides a converged advertising solution that is screen-agnostic, ensuring unduplicated reach with the right frequency cadence to achieve guaranteed results.
Industry accolades…
“Videology was named Best Digital Video Ad Platform by Cynopsis Media at their 2015 Model D Awards.”
“Videology was able to show that their platform drove brand lift that was on average 6X higher than Nielsen's norms.”
“Videology has the most sophisticated media optimizer to analyze the right allocation of TV and online video to optimize reach and campaign cost.”
Hadoop overview
NameNode
ResourceManager
Gateway
DataNode
NodeManager
Where does big data processing fit in?
Original production
Instance Type  Qty  Role              vCPU  RAM (GB)  Storage (GB)
m3.xlarge      1    Jumpbox           4     15        80
m3.xlarge      1    Cloudera Manager  4     15        80
m3.2xlarge     2    NN/RM             8     30        160
cc2.8xlarge    1    Service Master    32    60        3,200
cc2.8xlarge    30   Worker            32    60        3,200
I’ve got 99 problems and Hadoop is a few of them
Reliability
Scalability
Distcp
CPU to Memory Ratio
2015Q2
2016Q3
2016Q4 and beyond
Engaged Cloudera for EBS support
Gave up on EBS and tested D2s
New EBS to the rescue!
Take advantage of new hardware
CC2.8XL M4.10XL D2.8XL
Old
Not enough disk
Expensive
Nirvana
Lots of disk!
Not enough memory
Expensive
D2.8xl prototype
Instance Type  Qty  Role              vCPU  RAM (GB)  Storage (GB)
r3.large       1    Jumpbox           2     15.25     32
r3.large       1    Cloudera Manager  2     15.25     32
r3.xlarge      2    NN/RM             8     30        160
r3.2xlarge     2    Service Master    8     61        160
d2.8xl         10   Worker            36    244       48,000
M4.10xlarge w/ sc1 prototype
Instance Type  Qty  Role              vCPU  RAM (GB)  Storage (GB)
r3.large       1    Jumpbox           2     15.25     32
r3.large       1    Cloudera Manager  2     15.25     32
r3.xlarge      2    NN/RM             8     30        160
r3.2xlarge     2    Service Master    8     61        160
m4.10xlarge    18   Worker            40    160       4,000
M4.10xlarge w/ st1 prototype
Instance Type  Qty  Role              vCPU  RAM (GB)  Storage (GB)
r3.large       1    Jumpbox           2     15.25     32
r3.large       1    Cloudera Manager  2     15.25     32
r3.xlarge      2    NN/RM             8     30        160
r3.2xlarge     2    Service Master    8     61        160
m4.10xlarge    18   Worker            40    160       8,000
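To compare the worker fleets in the four configurations above, it helps to total them up. The sketch below multiplies out the worker rows exactly as listed on the slides (quantity, vCPU, RAM, and storage per worker) and derives RAM per vCPU, which relates to the "CPU to Memory Ratio" problem named earlier. The names `WorkerFleet`, `FLEETS`, and `summarize` are hypothetical and exist only for this illustration.

```python
from typing import NamedTuple


class WorkerFleet(NamedTuple):
    label: str
    qty: int         # number of worker instances
    vcpu: int        # vCPUs per worker
    ram_gb: int      # RAM per worker, in GB
    storage_gb: int  # storage per worker, in GB


# Worker rows copied from the four tables on the slides.
FLEETS = [
    WorkerFleet("cc2.8xlarge (original production)", 30, 32, 60, 3200),
    WorkerFleet("d2.8xl prototype", 10, 36, 244, 48000),
    WorkerFleet("m4.10xlarge w/ sc1 prototype", 18, 40, 160, 4000),
    WorkerFleet("m4.10xlarge w/ st1 prototype", 18, 40, 160, 8000),
]


def summarize(fleet: WorkerFleet) -> dict:
    """Return fleet-wide totals and the RAM available per vCPU on each worker."""
    return {
        "total_vcpu": fleet.qty * fleet.vcpu,
        "total_ram_gb": fleet.qty * fleet.ram_gb,
        "total_storage_gb": fleet.qty * fleet.storage_gb,
        "ram_gb_per_vcpu": fleet.ram_gb / fleet.vcpu,
    }


if __name__ == "__main__":
    for fleet in FLEETS:
        print(fleet.label, summarize(fleet))
```

On these figures, the original cc2.8xlarge workers offer under 2 GB of RAM per vCPU, while the m4.10xlarge workers offer 4 GB per vCPU with storage that is sized independently of the instance.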
Problems no more!
• No more rebuilding Nodes
• 1 critical incident since switch vs. 5 in the year prior to release
• Get to play with kids instead of babysitting cluster
Engineering benefits - capacity
No longer restricted by memory, we now have resources to pursue other tools to improve our reliability and speed:
• Spark
• HBase
• Flafka
• Offloading processing from Amazon Redshift to CDH
More resilient to log volume increases
Can expand storage as requirements change
Financial benefits
[Chart: Total Cost and Cost by Utilization, Cc2 vs. M4; y-axis from $0.00 to $30,000.00]
[Chart: Cost to Process 1000 Requests, Cc2 vs. M4; y-axis from $0.00 to $0.10]
Thank you!
Questions?
Remember to complete your evaluations!