CCA - NoDerivs 3.0 Unported License - Usage OK, no modifications, full attribution* * All unlicensed or borrowed works retain their original licenses Pets vs. Cattle: The Elastic Cloud Story DevOps Chicago Meetup February 26, 2014 @randybias
Pets vs. Cattle: The Elastic Cloud Story

Jan 16, 2015

My recent presentation to the Chicago DevOps Meetup explains how we're moving from a servers-as-Pets world to a servers-as-Cattle world. Understanding this change is critical to success in cloud, DevOps, and delivering new value to the enterprise.
Transcript
Page 1: Pets vs. Cattle: The Elastic Cloud Story

CCA - NoDerivs 3.0 Unported License - Usage OK, no modifications, full attribution*
* All unlicensed or borrowed works retain their original licenses

Pets vs. Cattle: The Elastic Cloud Story
DevOps Chicago Meetup
February 26, 2014

@randybias

Page 2: Pets vs. Cattle: The Elastic Cloud Story

A Tale of Two Clouds

Page 3: Pets vs. Cattle: The Elastic Cloud Story

Enterprise Computing Approach

GUI Driven
Ticket-Based
Hand-Crafted
Reserved
Scale-up
Smart Hardware
Proprietary
Traditional Dev
…

Page 4: Pets vs. Cattle: The Elastic Cloud Story

Cloud Computing Approach

API Driven
Self-Service
Automated
On-demand
Scale-out
Smart Apps
Open Source
Agile DevOps

Page 5: Pets vs. Cattle: The Elastic Cloud Story

Elastic Cloud Shifts Uptime Responsibility

Enterprise Model
99.9% Applications (8h46m down)
99.999% Infrastructure ($$$$)

Cloud Model
99.999% Applications (5m down)
99% Infrastructure ($$)
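The downtime figures on this slide follow directly from the availability percentages. A minimal sketch of the arithmetic, assuming downtime is measured per year and a year averages 365.25 days (the function names are illustrative, not from the talk):

```python
# Assumption: downtime is counted per year, with an average year of 365.25 days.
MINUTES_PER_YEAR = 365.25 * 24 * 60


def yearly_downtime_minutes(availability_percent: float) -> float:
    """Minutes per year a system may be down at the given availability."""
    unavailable_fraction = 1.0 - availability_percent / 100.0
    return unavailable_fraction * MINUTES_PER_YEAR


def format_downtime(minutes: float) -> str:
    """Render minutes as e.g. '8h46m', rounded to the nearest minute."""
    hours, remainder = divmod(round(minutes), 60)
    return f"{hours}h{remainder:02d}m"


for availability in (99.0, 99.9, 99.999):
    downtime = yearly_downtime_minutes(availability)
    print(f"{availability}% -> {format_downtime(downtime)} down per year")
```

At 99.9% this gives roughly 8h46m per year, and at 99.999% roughly 5 minutes, matching the slide.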

Page 6: Pets vs. Cattle: The Elastic Cloud Story

Elastic Cloud Origins

Elastic & Virtualization 2.0 Clouds are very different.
Different workloads.
Different architectures.
Different skills.
Different economics.

Enterprise Virtualization Private Cloud = Virtual Infrastructure + Standardization, Automation, Chargeback, Self-Service
Designed for Server Consolidation
IT Admins manage Infrastructure
Ticket-based manual provisioning
Improves virtualization value

Elastic Private Cloud = Elastic Public Cloud + On-premise Deployment
Designed for Agility
Cloud Admins manage Services
Self-service automated provisioning
Delivers cloud value on-premise

Page 7: Pets vs. Cattle: The Elastic Cloud Story

What Companies Care About?

Cloud Computing
Agile Development
Business Agility
Operational Discipline

ACCELERATING TIME TO VALUE

Continuous Integration
Continuous Testing & Delivery
Agile Methodologies
IaaS / PaaS
Public / Private / Hybrid
Big Data / Analytics
Public APIs
Continuous Deployment
DevOps Data Center & App Automation
Line of Business Enablement
New App Initiatives (Mobile, SaaS, etc.)
Data Center Modernization

Page 8: Pets vs. Cattle: The Elastic Cloud Story

Elastic Cloud is a Mindset Change

Attribution: Bill Baker, Distinguished Engineer, Microsoft

bowzer.company.com (scale-up)

web001.company.com (scale-out)

(Virtual) Servers *are* cattle

Page 9: Pets vs. Cattle: The Elastic Cloud Story

Pets vs. Cattle Takes Off

Microsoft
Cloudscaling
CERN
IBM
Scalr
Rackspace
Red Hat

Scale-out, not UP in Cloud

Page 10: Pets vs. Cattle: The Elastic Cloud Story

(Some) Elastic Cloud Patterns

What follows are *some* Elastic Cloud Patterns
There are many more, but these are mine
Input, ideas, & other thoughts welcome via twitter / email

Page 11: Pets vs. Cattle: The Elastic Cloud Story

Big Failure Domains Make Big Craters

Page 12: Pets vs. Cattle: The Elastic Cloud Story

Big Failure Domains Make Big Craters

Anti-Pattern

Page 13: Pets vs. Cattle: The Elastic Cloud Story

Smaller Failure Domains

Would you rather have the whole cloud down or just a small bit of it for a short time?

Page 14: Pets vs. Cattle: The Elastic Cloud Story

Loose Coupling

Synchronous, blocking calls mean cascading failures.

Async, non-blocking calls mean failures stay isolated.
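One way to picture the loose-coupling point is a caller that fans out to several backends concurrently and bounds each wait with a timeout. This is only a sketch under stated assumptions: the service names, delays, and the `call_service` stand-in are hypothetical and simulate remote calls with `asyncio.sleep`.

```python
import asyncio


async def call_service(name: str, delay: float, fail: bool = False) -> str:
    """Hypothetical stand-in for a remote call; `delay` simulates latency."""
    await asyncio.sleep(delay)
    if fail:
        raise RuntimeError(f"{name} is down")
    return f"{name}: ok"


async def call_isolated(name: str, delay: float, fail: bool = False,
                        timeout: float = 0.2) -> str:
    """Bound the wait and turn any failure into a local fallback value."""
    try:
        return await asyncio.wait_for(call_service(name, delay, fail), timeout)
    except (asyncio.TimeoutError, RuntimeError):
        return f"{name}: unavailable"


async def render_page() -> list:
    # The three calls run concurrently, so one slow or broken dependency
    # neither blocks nor breaks the others.
    return await asyncio.gather(
        call_isolated("catalog", 0.01),
        call_isolated("reviews", 2.0),          # too slow: times out
        call_isolated("ads", 0.01, fail=True),  # raises: falls back
    )


print(asyncio.run(render_page()))
```

The healthy call still returns its result, while the slow and failing calls degrade to a placeholder instead of taking the whole page down with them.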

Page 15: Pets vs. Cattle: The Elastic Cloud Story

Open Source Software

Excessive software taxation is the past.

Black boxes create lock-in.

You can always fork.

Page 16: Pets vs. Cattle: The Elastic Cloud Story

Uptime in Software Self-management

Hardware fails.
Software fails.
People fail.

Only software can measure itself & respond to failure in near real-time.

Applications designed for 99.999% uptime can run anywhere.

Page 17: Pets vs. Cattle: The Elastic Cloud Story

Scale Out vs Scale Up

Vertical Scaling: Make boxes bigger (usually an HA pair)

Horizontal Scaling: Make more boxes

[Diagram: boxes A, B, C ... N]

Page 18: Pets vs. Cattle: The Elastic Cloud Story

Circuit Breaker Pattern

Fallback mechanisms (e.g. cached data) ensure uninterrupted service while giving the failing service time to recover.

When a failing service is detected, stop calling that API and serve fallback responses.
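The behavior described on this slide can be sketched as a small wrapper. This is an illustrative minimal version, not a production library: the class name, the `failure_threshold` and `reset_after` parameters, and the single trial call after the cool-off period are all assumptions.

```python
import time


class CircuitBreaker:
    """Minimal circuit breaker: after repeated failures, stop calling the
    service for a while and serve the fallback instead."""

    def __init__(self, failure_threshold=3, reset_after=30.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold  # failures before opening
        self.reset_after = reset_after              # seconds to stay open
        self.clock = clock                          # injectable for testing
        self.failures = 0
        self.opened_at = None                       # None means "closed"

    def call(self, func, fallback):
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_after:
                return fallback()  # open: do not touch the failing service
            # Cool-off elapsed: fall through and allow one trial call.
        try:
            result = func()
        except Exception:
            self.failures += 1
            if self.opened_at is not None or self.failures >= self.failure_threshold:
                self.opened_at = self.clock()  # open (or re-open) the circuit
            return fallback()
        # Success: close the circuit and reset the failure count.
        self.failures = 0
        self.opened_at = None
        return result
```

A caller would wrap each remote call, for example `breaker.call(fetch_live_data, lambda: cached_data)`, where both names are placeholders for the real call and its cached fallback.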

Page 19: Pets vs. Cattle: The Elastic Cloud Story

Buy from ODMs

ODMs operate their businesses on 3-10% margins.

AMZN, GOOG, and Facebook buy direct without a middleman.

Only a few enterprise vendors are pivoting to compete.

Page 20: Pets vs. Cattle: The Elastic Cloud Story

Less Enterprise “Value” in x86 Servers

Generic servers rule. Full stop. Nothing is better because nothing else is *generic*.

“... a data center full of vanity free servers ... more efficient ... less expensive to build and run ...” - OCP

Page 21: Pets vs. Cattle: The Elastic Cloud Story

Fully Routed (L3) Networking

The largest cloud operators all run layer-3 routed networks with no VLANs.

Cloud-ready apps don’t need or want VLANs.

Enterprise apps can be supported on elastic clouds using Software-defined Networking (SDN).

Page 22: Pets vs. Cattle: The Elastic Cloud Story

Software-defined Networking (SDN)

• x86 server is the new Linecard
• network switch is the new ASIC
• VXLAN (or NVGRE) is the new Chassis
• SDN Controller is the new SUP Engine

“Network Virtualization”

Page 23: Pets vs. Cattle: The Elastic Cloud Story

Flat Networking + SDNs

Flat + SDN co-exist & thrive together

[Diagram labels: Availability Zone, Physical Node, VMs, Standard Security Group (1, 2), VPC Security Group, Virtual L2 Network, Virtual Private Cloud Networking, VPC Gateway, Internet]

Page 24: Pets vs. Cattle: The Elastic Cloud Story

RAIS instead of HA Pairs/Clusters

Redundant arrays of inexpensive services (RAIS)

Load balanced with no state sharing
Active … active … active … active …
On failure, connections are lost, but failures are rare
Rolling upgrades are easier, because each server is an island
Think: scale-out + fault isolation (sharding)

Ridiculously simple & scalable

Hardware failures are infrequent & impact subset of traffic
(N-F)/N, where N = total, F = failed
10 RAIS servers - 1 failure == 90% capacity
Most things retry anyway

Cascade failures are unlikely and failure domains are small
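The (N-F)/N capacity formula on this slide is easy to check numerically. A minimal sketch; the function name is illustrative, and the two-node comparison is an added assumption rather than something stated in the talk:

```python
def remaining_capacity(total: int, failed: int) -> float:
    """Fraction of capacity left, (N - F) / N, when F of N servers have failed."""
    if total <= 0:
        raise ValueError("total must be positive")
    if not 0 <= failed <= total:
        raise ValueError("failed must be between 0 and total")
    return (total - failed) / total


# The slide's example: 10 RAIS servers with 1 failure.
print(f"RAIS x10, 1 failed: {remaining_capacity(10, 1):.0%}")

# For comparison (an assumption, not from the slide): two load-sharing nodes,
# one of which fails.
print(f"Pair,     1 failed: {remaining_capacity(2, 1):.0%}")
```

The wider the array, the smaller the share of traffic a single failure can touch, which is the fault-isolation argument the slide makes.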

Page 25: Pets vs. Cattle: The Elastic Cloud Story

Service Array (RAIS) Example

Backbone Routers

Cloud Access Switches

AZ (Spine) Switches

RAIS (NAT, LB, VPN)

OSPF Route Announcements

Return Traffic (default or source NAT)

API

Public IP Blocks

Cloud Control Plane

Page 26: Pets vs. Cattle: The Elastic Cloud Story

Lots of Inexpensive 1RU Switches

1RU: 6K-30K VMs / AZ

Modular: 40K-200K VMs / AZ

Simple spine-and-leaf flat routed network

[Diagram labels: Rack 1, Rack 2, Rack 3, Multiple Racks]

Page 27: Pets vs. Cattle: The Elastic Cloud Story

Direct-attached Storage (DAS)

Cloud-ready apps manage their own data replication.

DAS is the smallest failure domain possible with reasonable storage I/O.

SAN == massive failure domain.

SSDs will be the great equalizer.

Page 28: Pets vs. Cattle: The Elastic Cloud Story

Elastic Block Device Services

EBS/EBD is a crutch

Bigger failure domains (AWS outage anyone?), complex, sets high expectations

Sometimes you need a crutch. When you do, overbuild the network, and make sure you have a smart scheduler.

AWS EBS Outage: http://aws.amazon.com/message/65648/

Page 29: Pets vs. Cattle: The Elastic Cloud Story

More Servers == More Storage I/O

>1M writes/second, triple-redundancy w/ Cassandra on AWS

Linear scale-out == linear costs for performance

Page 30: Pets vs. Cattle: The Elastic Cloud Story

Hypervisors are a Commodity

Cloud end-users want OS of choice, not HVs.

Level up! Managing iron is for mainframe operators.
… hypervisors are bare metal APIs

Hypervisor of the future is open source, easily modifiable, & extensible.

Page 31: Pets vs. Cattle: The Elastic Cloud Story

The Hypervisor of the Future May Be NO Hypervisor

LXC

ironic

Bare Metal Cloud

Page 32: Pets vs. Cattle: The Elastic Cloud Story

Quiz Time

Page 33: Pets vs. Cattle: The Elastic Cloud Story

Quiz Time

Pets Cattle

LACP?

Page 34: Pets vs. Cattle: The Elastic Cloud Story

Quiz Time

Pets Cattle

LACP ➔

Page 35: Pets vs. Cattle: The Elastic Cloud Story

Quiz Time

Pets Cattle

LACP

Managing a Server at a Time?

Page 36: Pets vs. Cattle: The Elastic Cloud Story

Quiz Time

Pets Cattle

LACP

Managing a Server at a Time ➔

Page 37: Pets vs. Cattle: The Elastic Cloud Story

Quiz Time

Pets Cattle

LACP

Managing Server at a Time

Auto-scaling?

Page 38: Pets vs. Cattle: The Elastic Cloud Story

Quiz Time

Pets Cattle

LACP

Managing Server at a Time

Auto-scaling ➔

Page 39: Pets vs. Cattle: The Elastic Cloud Story

Quiz Time

Pets Cattle

LACP

Managing Server at a Time

Auto-scaling

Design-for-Failure?

Page 40: Pets vs. Cattle: The Elastic Cloud Story

Quiz Time

Pets Cattle

LACP

Managing Server at a Time

Auto-scaling

Design-for-Failure ➔

Page 41: Pets vs. Cattle: The Elastic Cloud Story

Quiz Time

Pets Cattle

LACP

Managing Server at a Time

Auto-scaling

Design-for-Failure

100% Uptime Goals?

Page 42: Pets vs. Cattle: The Elastic Cloud Story

Quiz Time

Pets Cattle

LACP

Managing Server at a Time

Auto-scaling

Design-for-Failure

100% Uptime Goals ➔

Page 43: Pets vs. Cattle: The Elastic Cloud Story

Quiz Time

Pets Cattle

LACP

Managing Server at a Time

Auto-scaling

Design-for-Failure

100% Uptime Goals

HA pairs for redundancy?

Page 44: Pets vs. Cattle: The Elastic Cloud Story

Quiz Time

Pets Cattle

LACP

Managing Server at a Time

Auto-scaling

Design-for-Failure

100% Uptime Goals

HA pairs for redundancy ➔

Page 45: Pets vs. Cattle: The Elastic Cloud Story

Quiz Time

Pets Cattle

LACP

Managing Server at a Time

Auto-scaling

Design-for-Failure

100% Uptime Goals

HA pairs for redundancy

Shared Nothing Architecture?

Page 46: Pets vs. Cattle: The Elastic Cloud Story

Quiz Time

Pets Cattle

LACP

Managing Server at a Time

Auto-scaling

Design-for-Failure

100% Uptime Goals

HA pairs for redundancy

Shared Nothing Architecture ➔

Page 47: Pets vs. Cattle: The Elastic Cloud Story

Quiz Time

Pets Cattle

LACP

Managing Server at a Time

Auto-scaling

Design-for-Failure

100% Uptime Goals

HA pairs for redundancy

Shared Nothing Architecture

Persistent Block Storage?

Page 48: Pets vs. Cattle: The Elastic Cloud Story

Quiz Time

Pets Cattle

LACP

Managing Server at a Time

Auto-scaling

Design-for-Failure

100% Uptime Goals

HA pairs for redundancy

Shared Nothing Architecture

Persistent Block Storage ➔

Page 49: Pets vs. Cattle: The Elastic Cloud Story

Q & A

Randy Bias
Founder & CEO, Cloudscaling
Director, OpenStack Foundation
@randybias