The Elephant in the Cloud: Putting Hadoop on Any Cloud
@natishalom
Jan 15, 2015
Columbus & The Cloud
THE DISCOVERY OF AMERICA
THE THING THAT MADE IT POSSIBLE
Why Cloud Portability Matters
Cloud Portability Myth #1
No one really needs cloud portability
Cloud Portability Facts
Zynga moved ~80% of their workload from Amazon to their private zCloud
“own the base, rent the spike”
http://code.zynga.com/2012/02/the-evolution-of-zcloud/
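A minimal sketch of the "own the base, rent the spike" idea, assuming demand is measured in servers per hour and that owned capacity is a fixed number of servers (`base_capacity`). The function names and the demand numbers are hypothetical illustrations, not Zynga's actual zCloud tooling.

```python
from typing import List, Tuple


def split_base_and_spike(demand: List[int], base_capacity: int) -> List[Tuple[int, int]]:
    """Split each hour's demand into (owned, rented) servers.

    Hypothetical policy: owned servers absorb demand up to base_capacity;
    anything above that (the spike) is rented from a public cloud.
    """
    plan = []
    for servers_needed in demand:
        owned = min(servers_needed, base_capacity)
        rented = servers_needed - owned
        plan.append((owned, rented))
    return plan


def owned_fraction(demand: List[int], base_capacity: int) -> float:
    """Fraction of total server-hours served by owned capacity."""
    total = sum(demand)
    if total == 0:
        return 0.0
    owned = sum(o for o, _ in split_base_and_spike(demand, base_capacity))
    return owned / total


if __name__ == "__main__":
    hourly_demand = [60, 70, 80, 150, 90, 70]  # made-up numbers
    print(split_base_and_spike(hourly_demand, base_capacity=80))
    print(f"owned share: {owned_fraction(hourly_demand, 80):.0%}")
```

Raising `base_capacity` shifts more server-hours onto owned hardware at the cost of idle capacity during quiet hours; where to draw the line depends on how spiky the demand curve is.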
Cloud Portability Facts
Mixpanel started with Linode, then moved to Rackspace, then to AWS
http://code.mixpanel.com/2010/11/08/amazon-vs-rackspace/
Cloud Portability Facts
• You want the flexibility to choose what’s right for you, when it’s right for you
• Based on pricing, features, availability, performance, etc.
Cloud Portability Myth #2
Cloud Portability == Cloud API Standardization
Cloud APIs, Today
• Standard APIs (?): OCCI, vCloud
• OSS frameworks: OpenStack, CloudStack, Eucalyptus
• Abstraction frameworks: jclouds, Deltacloud, Fog, libvirt
Cloud APIs, Today
• Standard APIs: not practical in the foreseeable future
• OSS projects: need a couple more years to converge and mature
• Abstraction frameworks: probably the only practical (near-term) option
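To illustrate why an abstraction layer is the practical route, here is a minimal sketch of a provider-neutral compute interface. `ComputeProvider`, `InMemoryProvider`, and `deploy_cluster` are hypothetical names; this is not the API of jclouds, Deltacloud, Fog, or libvirt, and the in-memory driver stands in for a real cloud SDK.

```python
import itertools
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Node:
    node_id: str
    provider: str
    image: str


class ComputeProvider(ABC):
    """Provider-neutral interface that application code programs against."""

    @abstractmethod
    def create_node(self, image: str) -> Node: ...

    @abstractmethod
    def list_nodes(self) -> List[Node]: ...

    @abstractmethod
    def destroy_node(self, node_id: str) -> None: ...


class InMemoryProvider(ComputeProvider):
    """Stand-in driver; a real one would call a specific cloud's API."""

    def __init__(self, name: str) -> None:
        self.name = name
        self._nodes: Dict[str, Node] = {}
        self._ids = itertools.count(1)

    def create_node(self, image: str) -> Node:
        node = Node(f"{self.name}-{next(self._ids)}", self.name, image)
        self._nodes[node.node_id] = node
        return node

    def list_nodes(self) -> List[Node]:
        return list(self._nodes.values())

    def destroy_node(self, node_id: str) -> None:
        self._nodes.pop(node_id, None)


def deploy_cluster(provider: ComputeProvider, image: str, size: int) -> List[Node]:
    """Cloud-agnostic deployment: only the abstract interface is used."""
    return [provider.create_node(image) for _ in range(size)]


if __name__ == "__main__":
    for cloud in (InMemoryProvider("ec2"), InMemoryProvider("openstack")):
        nodes = deploy_cluster(cloud, image="hadoop-base", size=3)
        print(cloud.name, [n.node_id for n in nodes])
```

Switching clouds then means passing a different provider object to `deploy_cluster`; the deployment logic itself does not change.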
Realization: What You Really Care About Is App Portability
• The OS is the same on any cloud
• Most clouds offer compute and storage
• Elasticity and scaling have the same effect on the app, regardless of the cloud
Cloud Portability Myth #3
All infrastructure clouds were born equal
Food for Thought
Offerings can vary quite a bit:
• Amazon guarantees only 99.5% uptime
• Rackspace will give you $$$ every time they crash
• Joyent claims to be significantly faster than both
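An uptime percentage is easier to compare once it is converted into a downtime budget. A minimal sketch, assuming a 365-day year and ignoring how each vendor actually measures outages or defines its SLA period:

```python
HOURS_PER_YEAR = 365 * 24  # 8760 hours; leap years ignored


def allowed_downtime_hours(uptime_percent: float, period_hours: float = HOURS_PER_YEAR) -> float:
    """Hours of downtime permitted by an uptime guarantee over the given period."""
    if not 0.0 <= uptime_percent <= 100.0:
        raise ValueError("uptime_percent must be between 0 and 100")
    return period_hours * (1.0 - uptime_percent / 100.0)


if __name__ == "__main__":
    for pct in (99.5, 99.9, 99.95):
        print(f"{pct}% uptime -> {allowed_downtime_hours(pct):.2f} hours/year")
```

At 99.5%, the budget works out to 8760 × 0.005 = 43.8 hours a year, or about 3.6 hours in a 30-day month.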
And Some Features Are Unique…
Amazon is the only major vendor to offer SSD storage. Netflix reports:
• ½ the price for the same throughput
• ⅕ the average latency
• Even the slowest requests are 6x faster
http://techblog.netflix.com/2012/07/benchmarking-high-performance-io-with.html
Let’s Talk Big Data on the Cloud
A Typical Big Data App…
Managing Big Data on the Cloud
• Auto-start VMs
• Install and configure app components
• Monitor
• Repair
• (Auto) scale
• Burst…
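The monitor, repair, and scale steps above are often implemented as a reconciliation loop that compares the desired number of VMs with what is actually running. A minimal sketch, assuming a hypothetical in-memory `Pool` in place of real cloud API calls and a caller-supplied health check:

```python
import itertools
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Vm:
    vm_id: int
    healthy: bool = True


class Pool:
    """Hypothetical VM pool; a real one would call a cloud provider's API."""

    def __init__(self) -> None:
        self.vms: List[Vm] = []
        self._ids = itertools.count(1)

    def start_vm(self) -> Vm:
        vm = Vm(next(self._ids))
        self.vms.append(vm)
        return vm

    def terminate(self, vm: Vm) -> None:
        self.vms.remove(vm)


def reconcile(pool: Pool, desired: int, is_healthy: Callable[[Vm], bool]) -> Dict[str, int]:
    """One pass of monitor -> repair -> scale toward `desired` running VMs."""
    # Monitor: find VMs that fail the health check.
    failed = [vm for vm in pool.vms if not is_healthy(vm)]
    # Repair: remove failed VMs; the scale-out step below replaces them.
    for vm in failed:
        pool.terminate(vm)
    # Scale out: start VMs until the desired count is reached.
    started = 0
    while len(pool.vms) < desired:
        pool.start_vm()
        started += 1
    # Scale in: stop the newest VMs if there are too many.
    stopped = 0
    while len(pool.vms) > desired:
        pool.terminate(pool.vms[-1])
        stopped += 1
    return {"failed": len(failed), "started": started, "stopped": stopped}


if __name__ == "__main__":
    pool = Pool()
    print(reconcile(pool, 3, lambda vm: vm.healthy))
    pool.vms[0].healthy = False  # simulate a crashed VM
    print(reconcile(pool, 3, lambda vm: vm.healthy))
```

Running the same loop periodically covers repair and scaling with one mechanism: a crashed VM and a raised target both show up as "fewer VMs than desired".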
The Challenges…
Consistent Management
Making deployment, installation, scaling, and fail-over look the same across the entire stack
The Challenges (Cont.)…
Cloud Portability
Choosing the Right Cloud for the Job
Running on bare metal for high-I/O workloads and on a public cloud for sporadic workloads…
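The rule of thumb above can be written down as a small decision function. This is a hypothetical sketch: the `Workload` fields, the 30% "sporadic" threshold, and the private-cloud fallback for everything else are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Workload:
    name: str
    io_intensive: bool      # dominated by disk/network I/O
    avg_utilization: float  # fraction of the time the workload is busy, 0.0-1.0


def choose_target(w: Workload, sporadic_threshold: float = 0.3) -> str:
    """Pick an infrastructure target for a workload (illustrative rule only)."""
    if w.io_intensive:
        return "bare-metal"     # per the slide: bare metal for high-I/O workloads
    if w.avg_utilization < sporadic_threshold:
        return "public-cloud"   # per the slide: public cloud for sporadic workloads
    return "private-cloud"      # assumed default for steady, non-I/O-bound work


if __name__ == "__main__":
    for w in (Workload("hdfs-etl", True, 0.9),
              Workload("monthly-report", False, 0.05),
              Workload("web-frontend", False, 0.8)):
        print(w.name, "->", choose_target(w))
```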
Hadoop
• Available under different distributions: Cloudera, IBM BigInsights, MapR, Hortonworks
Big Data Apps, on Any Cloud, Your Way
Open source (Apache License 2.0)
Putting Cloudify and Hadoop Together
• Run on any cloud
• Consistent management
• Dynamic scaling
• Auto recovery
• Auto scaling
• Role assignments
• Monitoring
• Simple maintenance
How It Works…
1. Upload your recipe.
2. Cloudify creates VMs and installs agents.
3. Agents install and manage your app.
4. Cloudify automates the scaling.
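Step 4, automated scaling, typically boils down to a threshold rule evaluated on a monitored metric. The sketch below is plain Python, not Cloudify's recipe syntax; the metric (average CPU over a sliding window), the 70%/30% thresholds, and the instance limits are assumed values.

```python
from collections import deque
from typing import Deque


class ThresholdScaler:
    """Decide the instance count from a moving average of CPU samples."""

    def __init__(self, window: int = 5, high: float = 70.0, low: float = 30.0,
                 min_instances: int = 1, max_instances: int = 10) -> None:
        self.samples: Deque[float] = deque(maxlen=window)  # oldest sample drops out
        self.high, self.low = high, low
        self.min_instances, self.max_instances = min_instances, max_instances

    def observe(self, cpu_percent: float) -> None:
        self.samples.append(cpu_percent)

    def desired_instances(self, current: int) -> int:
        target = current
        if self.samples:
            avg = sum(self.samples) / len(self.samples)
            if avg > self.high:
                target = current + 1   # scale out by one instance
            elif avg < self.low:
                target = current - 1   # scale in by one instance
        # Keep the result within the configured bounds.
        return max(self.min_instances, min(self.max_instances, target))


if __name__ == "__main__":
    scaler = ThresholdScaler(window=3)
    for cpu in (80.0, 90.0, 85.0):
        scaler.observe(cpu)
    print(scaler.desired_instances(current=3))
```

Averaging over a window rather than reacting to a single reading keeps one short CPU spike from triggering a scale-out.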
A Few Snippets…
Thank You!
References:
• http://www.cloudifysource.org
• http://github.com/CloudifySource
• https://github.com/CloudifySource/cloudify-recipes/tree/master/services/biginsights