Pets vs. Cattle: The Elastic Cloud Story

Posted on 16-Jan-2015


DESCRIPTION

My recent presentation to the Chicago DevOps Meetup, explaining how we're moving from a world of servers as Pets to a world of servers as Cattle. Understanding this change is critical to success in cloud, in DevOps, and in delivering new value to the enterprise.

Transcript

CCA - NoDerivs 3.0 Unported License - Usage OK, no modifications, full attribution*
* All unlicensed or borrowed works retain their original licenses

Pets vs. Cattle: The Elastic Cloud Story
DevOps Chicago Meetup
February 26, 2014

@randybias

A Tale of Two Clouds

Enterprise Computing Approach

• GUI Driven
• Ticket-Based
• Hand-Crafted
• Reserved
• Scale-up
• Smart Hardware
• Proprietary
• Traditional Dev …

Cloud Computing Approach

• API Driven
• Self-Service
• Automated
• On-demand
• Scale-out
• Smart Apps
• Open Source
• Agile DevOps

Elastic Cloud Shifts Uptime Responsibility

Enterprise Model
• Applications: 99.9% (8h46m down per year)
• Infrastructure: 99.999% ($$$$)

Cloud Model
• Applications: 99.999% (5m down per year)
• Infrastructure: 99% ($$)
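The downtime figures follow from (1 - availability) × minutes per year. The sketch below reproduces them and shows how, assuming replica failures are independent, three active application replicas on 99% infrastructure can exceed five nines at the application layer (two replicas only reach four nines). The helper names are hypothetical, and real infrastructure failures are often correlated, so treat the replica math as optimistic.

```python
# Minutes in a (non-leap) year: 365 days * 24 h * 60 min.
MINUTES_PER_YEAR = 365 * 24 * 60


def annual_downtime_minutes(availability: float) -> float:
    """Expected downtime per year, in minutes, for availability in [0, 1]."""
    return (1.0 - availability) * MINUTES_PER_YEAR


def replicated_availability(component: float, replicas: int) -> float:
    """Availability of a service that stays up while at least one of
    `replicas` active copies is up. Assumes independent failures."""
    return 1.0 - (1.0 - component) ** replicas


def fmt(minutes: float) -> str:
    """Render minutes as e.g. '8h46m' (rounded to the nearest minute)."""
    hours, mins = divmod(round(minutes), 60)
    return f"{hours}h{mins:02d}m"


if __name__ == "__main__":
    print("99.9%   ->", fmt(annual_downtime_minutes(0.999)))
    print("99.999% ->", fmt(annual_downtime_minutes(0.99999)))
    for n in (1, 2, 3):
        a = replicated_availability(0.99, n)
        print(f"{n} replica(s) on 99% infrastructure -> {a:.6%} available")
```

The first two lines come out as 8h46m and 0h05m, matching the slide's figures.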

Elastic Cloud Origins

Enterprise Virtualization Private Cloud = Virtual Infrastructure + Standardization, Automation, Chargeback, Self-Service
• Designed for Server Consolidation
• IT Admins manage Infrastructure
• Ticket-based manual provisioning
• Improves virtualization value

Elastic Private Cloud = Elastic Public Cloud + On-premise Deployment
• Designed for Agility
• Cloud Admins manage Services
• Self-service automated provisioning
• Delivers cloud value on-premise

Elastic & Virtualization 2.0 Clouds are very different: different workloads, different architectures, different skills, different economics.

What Do Companies Care About?

ACCELERATING TIME TO VALUE

• Cloud Computing
• Agile Development
• Business Agility
• Operational Discipline

• Continuous Integration
• Continuous Testing & Delivery
• Continuous Deployment
• Agile Methodologies
• IaaS / PaaS
• Public / Private / Hybrid
• Big Data / Analytics
• Public APIs
• DevOps Data Center & App Automation
• Line of Business Enablement
• New App Initiatives (Mobile, SaaS, etc.)
• Data Center Modernization

Elastic Cloud is a Mindset Change

Pets: bowzer.company.com (scale-up)
Cattle: web001.company.com (scale-out)

(Virtual) Servers *are* cattle

Attribution: Bill Baker, Distinguished Engineer, Microsoft

Pets vs. Cattle Takes Off

Microsoft, Cloudscaling, CERN, IBM, Scalr, Rackspace, Red Hat

Scale-out, not UP, in Cloud

(Some) Elastic Cloud Patterns

What follows are *some* Elastic Cloud Patterns. There are many more, but these are mine. Input, ideas, & other thoughts are welcome via Twitter / email.

Big Failure Domains Make Big Craters

Anti-Pattern

Smaller Failure Domains

Would you rather have the whole cloud down, or just a small bit of it for a short time?

Loose Coupling

Synchronous, blocking calls mean cascading failures.

Asynchronous, non-blocking calls mean failures stay isolated.
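To make the contrast concrete, here is a minimal Python asyncio sketch, assuming two hypothetical dependencies: one healthy and one hung. Each call runs concurrently under its own deadline, so the hung dependency degrades only its own part of the response instead of stalling the whole request, as a synchronous, blocking call chain would.

```python
import asyncio


async def inventory_service() -> str:
    # Hypothetical healthy dependency: answers quickly.
    await asyncio.sleep(0.01)
    return "inventory ok"


async def recommendations_service() -> str:
    # Hypothetical failed dependency: hangs instead of answering.
    await asyncio.sleep(10)
    return "recommendations ok"


async def call_isolated(coro, timeout: float, fallback: str) -> str:
    """Await one dependency under its own deadline. On a timeout or any
    other error, return a local fallback rather than propagating the failure."""
    try:
        return await asyncio.wait_for(coro, timeout)
    except Exception:  # asyncio.TimeoutError is an Exception subclass
        return fallback


async def handle_request() -> list[str]:
    # Both calls run concurrently; the hung one cannot hold up the other,
    # and the whole request finishes in about `timeout` seconds.
    return await asyncio.gather(
        call_isolated(inventory_service(), timeout=0.5, fallback="inventory unavailable"),
        call_isolated(recommendations_service(), timeout=0.5, fallback="no recommendations"),
    )


if __name__ == "__main__":
    print(asyncio.run(handle_request()))
```

asyncio is only one way to get non-blocking calls; message queues or event-driven services achieve the same isolation between processes.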

Open Source Software

Excessive software taxation is a thing of the past.

Black boxes create lock-in.

You can always fork.

Uptime in Software Self-management

Hardware fails. Software fails. People fail.

Only software can measure itself & respond to failure in near real-time.

Applications designed for 99.999% uptime can run anywhere.
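A minimal sketch of software measuring itself and responding, assuming a hypothetical `Instance` type whose `health_check()` stands in for a real probe (an HTTP health endpoint, for example). Each pass of the control loop treats instances as cattle: failed ones are discarded and replaced rather than nursed back to health. The generated names (web001, web002, …) follow the slide's web001.company.com example and are illustrative only.

```python
from dataclasses import dataclass, field
from itertools import count

_serial = count(1)  # hypothetical source of cattle-style names


def _next_name() -> str:
    return f"web{next(_serial):03d}"


@dataclass
class Instance:
    name: str = field(default_factory=_next_name)
    healthy: bool = True

    def health_check(self) -> bool:
        # Stand-in for a real probe such as an HTTP health endpoint.
        return self.healthy


def reconcile(pool: list[Instance], desired: int) -> list[Instance]:
    """One pass of a self-healing control loop: drop instances that fail
    their health check, then launch fresh ones until `desired` are running.
    (Scaling down an oversized pool is out of scope for this sketch.)"""
    survivors = [inst for inst in pool if inst.health_check()]
    launched = [Instance() for _ in range(max(0, desired - len(survivors)))]
    return survivors + launched


if __name__ == "__main__":
    pool = [Instance() for _ in range(5)]
    pool[1].healthy = False  # simulate a hardware failure
    pool[3].healthy = False  # simulate a software failure
    pool = reconcile(pool, desired=5)
    print([inst.name for inst in pool])
```

In a real system `reconcile` would run on a timer (every few seconds), which is what makes the response "near real-time" without a human in the loop.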

Scale Out vs. Scale Up

Vertical Scaling: make boxes bigger (usually an HA pair)

Horizontal Scaling: make more boxes

[Diagram: an A/B pair replaced by a bigger A/B pair (vertical) vs. A and B extended to A, B, C ... N (horizontal)]

Circuit Breaker Pattern

When a failing service is detected, stop calling that API and serve fallback responses.

Fallback mechanisms (e.g., cached data) ensure uninterrupted service while giving the failing service time to recover.
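A minimal single-threaded sketch of the pattern, assuming a hypothetical `fetch_prices` dependency and a cache-style fallback. The thresholds (`max_failures`, `reset_timeout`) are illustrative, and the half-open trial call is one common variant of the pattern, not necessarily the one the talk had in mind. Production code would also need thread safety.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class CircuitBreaker:
    """Minimal circuit breaker.

    closed    -> calls go through; consecutive failures are counted
    open      -> after `max_failures` failures, calls skip the service and
                 go straight to the fallback
    half-open -> once `reset_timeout` seconds pass, one trial call is allowed;
                 success closes the breaker, failure re-opens it
    """

    def __init__(self, max_failures: int = 3, reset_timeout: float = 30.0,
                 clock: Callable[[], float] = time.monotonic):
        self.max_failures = max_failures
        self.reset_timeout = reset_timeout
        self.clock = clock  # injectable so tests can fake time
        self.failures = 0
        self.opened_at: Optional[float] = None

    @property
    def state(self) -> str:
        if self.opened_at is None:
            return "closed"
        if self.clock() - self.opened_at >= self.reset_timeout:
            return "half-open"
        return "open"

    def call(self, fn: Callable[[], T], fallback: Callable[[], T]) -> T:
        if self.state == "open":
            return fallback()  # give the failing service time to recover
        try:
            result = fn()
        except Exception:
            self.failures += 1
            if self.state == "half-open" or self.failures >= self.max_failures:
                self.opened_at = self.clock()  # trip (or re-trip) the breaker
            return fallback()
        self.failures = 0  # success: back to normal operation
        self.opened_at = None
        return result


if __name__ == "__main__":
    breaker = CircuitBreaker(max_failures=2, reset_timeout=5.0)

    def fetch_prices() -> dict:
        # Hypothetical remote call that is currently failing.
        raise ConnectionError("pricing service unavailable")

    def cached_prices() -> dict:
        return {"source": "cache"}

    for _ in range(3):
        print(breaker.call(fetch_prices, cached_prices), breaker.state)
```

Injecting the clock keeps the state machine testable without real sleeps.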

Buy from ODMs

ODMs (original design manufacturers) operate their businesses on 3-10% margins.

AMZN, GOOG, and Facebook buy direct, without a middleman.

Only a few enterprise vendors are pivoting to compete.

Less Enterprise “Value” in x86 Servers

Generic servers rule. Full stop. Nothing is better, because nothing else is *generic*.

“... a data center full of vanity free servers ... more efficient ... less expensive to build and run ...” - OCP (Open Compute Project)

Fully Routed (L3) Networking

The largest cloud operators all run layer-3 routed networks with no VLANs.

Cloud-ready apps don’t need or want VLANs.

Enterprise apps can be supported on elastic clouds using Software-defined Networking (SDN).

Software-defined Networking (SDN)

• x86 server is the new Linecard
• network switch is the new ASIC
• VXLAN (or NVGRE) is the new Chassis
• SDN Controller is the new SUP Engine

“Network Virtualization”
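To see what "VXLAN is the new Chassis" means at the packet level: each virtual L2 network is identified by a 24-bit VXLAN Network Identifier (VNI), and the hypervisor (acting as a VXLAN tunnel endpoint, or VTEP) wraps tenant Ethernet frames in a UDP datagram that starts with an 8-byte VXLAN header, so a plain routed L3 fabric can carry them. The sketch below builds and parses only that header, following the layout in RFC 7348; outer IP/UDP headers, VTEP address learning, and NVGRE are out of scope, and the function names are hypothetical.

```python
import struct

VXLAN_UDP_PORT = 4789  # IANA-assigned UDP destination port for VXLAN (RFC 7348)
FLAG_VNI_VALID = 0x08  # the "I" bit in the flags byte: VNI field is valid


def vxlan_header(vni: int) -> bytes:
    """Build the 8-byte VXLAN header:
    byte 0 flags | bytes 1-3 reserved | bytes 4-6 VNI | byte 7 reserved."""
    if not 0 <= vni < 1 << 24:
        raise ValueError("VNI must fit in 24 bits")
    return struct.pack("!II", FLAG_VNI_VALID << 24, vni << 8)


def parse_vni(header: bytes) -> int:
    """Extract the VNI from a VXLAN header, checking the I flag."""
    flags_word, vni_word = struct.unpack("!II", header[:8])
    if not (flags_word >> 24) & FLAG_VNI_VALID:
        raise ValueError("I flag not set")
    return vni_word >> 8


def encapsulate(inner_frame: bytes, vni: int) -> bytes:
    """UDP payload a hypervisor (VTEP) sends across the routed L3 fabric:
    VXLAN header followed by the tenant's original Ethernet frame."""
    return vxlan_header(vni) + inner_frame


if __name__ == "__main__":
    frame = bytes(64)  # placeholder for a tenant Ethernet frame
    payload = encapsulate(frame, vni=5001)
    print(len(payload), payload[:8].hex(), parse_vni(payload))
```

With 24 bits of VNI, an overlay can address about 16 million isolated L2 segments, versus 4,094 usable 12-bit VLAN IDs, which is one reason a flat routed underlay can do without VLANs.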

Flat Networking + SDNs

Flat + SDN co-exist & thrive together

[Diagram: an Availability Zone of Physical Nodes running (1) VMs in a Standard Security Group on flat networking, side by side with (2) Virtual Private Cloud Networking: VMs on a Virtual L2 Network in a VPC Security Group, reaching the Internet through a VPC Gateway]

RAIS instead of HA Pairs/Clusters

Redundant arrays of inexpensive services (RAIS)
• Load balanced with no state sharing
• Active … active … active … active …
• On failure, connections are lost, but failures are rare
• Rolling upgrades are easier, because each server is an island
• Think: scale-out + fault isolation (sharding)

Ridiculously simple & scalable
• Hardware failures are infrequent & impact only a subset of traffic
• Remaining capacity = (N-F)/N, where N = total, F = failed
• 10 RAIS servers with 1 failure == 90% capacity
• Most things retry anyway

Cascade failures are unlikely and failure domains are small
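A minimal sketch of the RAIS idea, assuming hypothetical server names and an in-process round-robin balancer standing in for the real mechanism (the Service Array example uses OSPF route announcements from the RAIS nodes instead). Because members share no state, a failed member is simply skipped, and capacity degrades to (N-F)/N, reproducing the 10-server example above.

```python
from itertools import cycle


class ServiceArray:
    """Redundant array of inexpensive services (RAIS): N identical, stateless,
    active members behind a simple round-robin balancer. No state is shared,
    so a failed member is just skipped; its in-flight connections are lost."""

    def __init__(self, servers: list[str]):
        if not servers:
            raise ValueError("need at least one server")
        self.servers = list(servers)
        self.failed: set[str] = set()
        self._next = cycle(self.servers)

    def mark_failed(self, server: str) -> None:
        if server not in self.servers:
            raise ValueError(f"unknown server {server!r}")
        self.failed.add(server)

    def capacity(self) -> float:
        """Remaining capacity as (N - F) / N."""
        n = len(self.servers)
        return (n - len(self.failed)) / n

    def pick(self) -> str:
        """Return the next healthy member in round-robin order."""
        for _ in range(len(self.servers)):
            server = next(self._next)
            if server not in self.failed:
                return server
        raise RuntimeError("every member of the array has failed")


if __name__ == "__main__":
    array = ServiceArray([f"lb{i:02d}" for i in range(10)])
    array.mark_failed("lb03")
    print(f"capacity: {array.capacity():.0%}")
    print([array.pick() for _ in range(5)])
```

Contrast this with an HA pair, where losing one box halves capacity and failover logic must move shared state.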

Service Array (RAIS) Example

[Diagram: Backbone Routers, Cloud Access Switches, and AZ (Spine) Switches in front of RAIS nodes (NAT, LB, VPN); labels include OSPF Route Announcements, Return Traffic (default or source NAT), Public IP Blocks, API, and Cloud Control Plane]

Lots of Inexpensive 1RU Switches

Simple spine-and-leaf flat routed network
• 1RU: 6K-30K VMs / AZ
• Modular: 40K-200K VMs / AZ

[Diagram: racks (Rack 1, Rack 2, Rack 3, and multiple further racks) connected by the spine-and-leaf network]

Direct-attached Storage (DAS)

Cloud-ready apps manage their own data replication.

DAS is the smallest failure domain possible with reasonable storage I/O.

SAN == massive failure domain.

SSDs will be the great equalizer.
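When the application, not a SAN, owns replication, each node can keep its data on local disks. One common way to do this, used by Dynamo-style stores such as Cassandra, is quorum replication: write to N replicas, treat a write as successful after W acknowledgements, read from R replicas, and choose R + W > N so every read overlaps the latest successful write. The sketch below is a hypothetical in-memory model of that idea (N = 3, W = 2, R = 2) with a single coordinator issuing version numbers; it is not Cassandra's API, and Cassandra itself resolves conflicts with timestamps.

```python
class Replica:
    """A node keeping its data on its own direct-attached storage
    (modelled here as an in-memory dict of key -> (version, value))."""

    def __init__(self, name: str):
        self.name = name
        self.up = True
        self.data: dict[str, tuple[int, str]] = {}


class QuorumStore:
    """Application-managed replication with quorum writes and reads."""

    def __init__(self, replicas: list[Replica], w: int, r: int):
        if r + w <= len(replicas):
            raise ValueError("need R + W > N so read and write quorums overlap")
        self.replicas, self.w, self.r = replicas, w, r
        self.version = 0  # single coordinator hands out increasing versions

    def put(self, key: str, value: str) -> bool:
        """Write to every reachable replica; succeed if at least W acknowledged.
        (As in many real systems, a failed write is not rolled back.)"""
        self.version += 1
        acks = 0
        for rep in self.replicas:
            if rep.up:
                rep.data[key] = (self.version, value)
                acks += 1
        return acks >= self.w

    def get(self, key: str) -> str:
        """Read from the first R reachable replicas and return the newest value."""
        responders = [rep for rep in self.replicas if rep.up][: self.r]
        if len(responders) < self.r:
            raise RuntimeError("read quorum not available")
        found = [rep.data[key] for rep in responders if key in rep.data]
        if not found:
            raise KeyError(key)
        return max(found)[1]  # highest version wins


if __name__ == "__main__":
    nodes = [Replica(f"node{i}") for i in range(3)]
    store = QuorumStore(nodes, w=2, r=2)
    store.put("user:42", "v1")
    nodes[0].up = False  # node0 misses the next write
    store.put("user:42", "v2")
    nodes[0].up, nodes[2].up = True, False
    print(store.get("user:42"))  # prints v2: the read quorum overlaps the write
```

Losing any single node's local disk costs neither reads nor writes here, which is why a small DAS failure domain is acceptable once the application replicates.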

Elastic Block Device Services

EBS/EBD is a crutch: bigger failure domains (AWS outage, anyone?), complex, and sets high expectations.

Sometimes you need a crutch. When you do, overbuild the network, and make sure you have a smart scheduler.

AWS EBS Outage: http://aws.amazon.com/message/65648/

More Servers == More Storage I/O

>1M writes/second with triple redundancy, using Cassandra on AWS

Linear scale-out == linear costs for performance

Hypervisors are a Commodity

Cloud end-users want their OS of choice, not a particular hypervisor (HV).

Level up! Managing iron is for mainframe operators.
… hypervisors are bare metal APIs

The hypervisor of the future is open source, easily modifiable, & extensible.

The Hypervisor of the Future May Be NO Hypervisor

• LXC
• ironic
• Bare Metal Cloud

Quiz Time

Pets or Cattle?
• LACP
• Managing a Server at a Time
• Auto-scaling
• Design-for-Failure
• 100% Uptime Goals
• HA pairs for redundancy
• Shared Nothing Architecture
• Persistent Block Storage

Q & A

Randy Bias
Founder & CEO, Cloudscaling
Director, OpenStack Foundation
@randybias
