Planning Application Resilience Jennifer Davis 2/26/15
Transcript
Page 1: Planning Application Resilience

Planning Application Resilience

Jennifer Davis 2/26/15

Page 2: Planning Application Resilience

Goal: Communication

Jennifer Davis
Solutions Engineer
Twitter: @sigje
Hashtag: #getchef
Email: [email protected]

Page 3: Planning Application Resilience

What is Resilience?

Page 4: Planning Application Resilience

Resilience

•  Elasticity – Spring back into shape
•  Recoverability – Quick to recover/rebuild

Page 5: Planning Application Resilience

Enduring Resilience

•  State of Elasticity •  State of Recoverability

Page 6: Planning Application Resilience

Resilience Minimum Goal!

Page 7: Planning Application Resilience

[Figure labels: Event, Fear, Depression; Event, Degradation, Failure]

Page 8: Planning Application Resilience

Resilience

Recognition

Event Recovery

Page 9: Planning Application Resilience

Post Traumatic Growth

Recognition

Event

Growth

Page 10: Planning Application Resilience

Applications reflect Quality of Organizations

Conway’s law

Organizations which design systems … are constrained to produce designs which are copies of the communication structures of these organizations.

Page 11: Planning Application Resilience

Applications reflect Quality of Organizations

The resilience of an organization will be reflected in the resilience of the applications and services built by the organization.

Page 12: Planning Application Resilience

5 Critical Metrics of Resilient Organizations

•  Willingness to tackle challenges.
•  Sense of Agency.
•  Adaptability.
•  Diversity.
•  Rope Factor.

Page 13: Planning Application Resilience

Red Shirt Syndrome

• Tackling challenges. •  Learned Helplessness.

Page 14: Planning Application Resilience

Stormtrooper Syndrome

• Agency. • Adaptability: Role Adherence.

Page 15: Planning Application Resilience

Borg Syndrome

•  Embrace Diversity.
•  Recognize differences in perspectives.
•  Eliminate system blindness.

Page 16: Planning Application Resilience

Rope Factor

Enough rope to get things done, not enough for cowboys.

•  Rope too short: excessive time in meetings, a death march for each sprint.
•  Rope too long: cowboy behavior.

Page 17: Planning Application Resilience

Resilience is ordinary.

•  Intentional behaviors, thoughts, and actions. •  Reflection of the organization.

Page 18: Planning Application Resilience

Don’t build organizational systems that encourage the wrong behaviors.

Page 19: Planning Application Resilience

Resilience isn’t managed through limiting change.

•  Security Patches?
•  Over Engineering – Delays in schedule
•  Under Engineering – Rewrite required to scale

Stability is a myth.

Page 20: Planning Application Resilience

Qualities of Resilient Software

•  Elasticity •  Recoverability

Page 21: Planning Application Resilience

Qualities of Resilient Software

•  Elasticity •  Recoverability

Automation Resilience

Page 22: Planning Application Resilience

Resilient Automation Platform

•  Complex dependency handling between nodes.
•  Fault tolerance.
•  Security.
•  Multi-Platform.
•  Flexibility.

Page 23: Planning Application Resilience

Chef is a Resilient Automation Platform.

Page 24: Planning Application Resilience

Chef is a language.

•  Describe infrastructure as code.
•  Programmatically provision and configure servers.
•  Versioning, artifacts

Page 25: Planning Application Resilience

Chef is a toolset

•  Collection of tools that allow you to model, measure, and improve workflows.

Page 26: Planning Application Resilience

chef is a command line utility

•  Generates skeletons for applications, cookbooks, recipes, attributes, files, templates, and custom resources.
•  Preps the environment with the correct Ruby gems.
•  Verifies that the environment is installed and configured correctly.

Page 27: Planning Application Resilience

Chef is a community.

•  Mailing lists
•  https://supermarket.chef.io/
•  Chef Conf 3/31 – 4/2 Santa Clara
•  Chef Summit
•  IRC #chef
•  Twitter @chef

CODE: BUILDITBETTER

Page 28: Planning Application Resilience

Configuration as Code

Elastic Configurable

Responsive

Elasticity Recoverability

Page 29: Planning Application Resilience

Infrastructure Automation is creating control systems that reduce the burden on people to manage services and increase the quality, accuracy and precision of a service to the consumers of the service.

Page 30: Planning Application Resilience

Infrastructure Elements to Resources

•  File: file
•  Package: package
•  Cron Job: cron
•  User: user

Page 31: Planning Application Resilience

Resources

•  Fundamental building blocks.
•  Each describes a piece of the system and its desired state.
•  The Chef DSL is Ruby.

Page 32: Planning Application Resilience

Example of describing a resource

$ sudo chef-apply -e "package 'nano'"
Recipe: (chef-apply cookbook)::(chef-apply recipe)
  * package[nano] action install
    - install version 2.0.9-7.el6 of package nano

Page 33: Planning Application Resilience

Test and Repair

Resources follow a test and repair model.

•  package "nano"

Is nano installed?
•  Yes: done.
•  No: install it.
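
The test and repair flow above can be sketched in plain Ruby. This is only a model under stated assumptions, not Chef's implementation: `PackageResource` and the `installed` set are hypothetical stand-ins for a real resource and the node's package database.

```ruby
require 'set'

# Plain-Ruby model of test and repair; this is not Chef's implementation.
# `installed` is a hypothetical stand-in for the node's package database.
class PackageResource
  attr_reader :name

  def initialize(name)
    @name = name
  end

  # Test: is the system already in the desired state?
  def satisfied?(installed)
    installed.include?(name)
  end

  # Repair only when the test fails. Returns true if a change was made.
  def converge(installed)
    return false if satisfied?(installed)

    installed.add(name)
    true
  end
end

installed = Set.new(['vim'])
nano = PackageResource.new('nano')

first_run  = nano.converge(installed) # nano is missing, so it gets "installed"
second_run = nano.converge(installed) # nano is present, so nothing happens

puts "first run changed the system:  #{first_run}"
puts "second run changed the system: #{second_run}"
```

Because the repair step only runs when the test fails, applying the same resource repeatedly is safe, which is what makes recovery by re-running practical.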

Page 34: Planning Application Resilience

Recipe

•  A recipe is an ordered list of resources.

Page 35: Planning Application Resilience

Recipe

package "httpd"

template "/var/www/html/index.html" do
  source "index.html.erb"
end

service "httpd" do
  action [:enable, :start]
end
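
Why the order of resources in a recipe matters can be shown with a small plain-Ruby model. It is a sketch, not how Chef works internally: `Step`, `converge`, and the `state` hash are hypothetical names standing in for resources, a client run, and the node.

```ruby
# Plain-Ruby model of a recipe as an ordered list; not Chef's implementation.
# The `state` hash is a hypothetical stand-in for the node.
Step = Struct.new(:description, :test, :repair)

# Applies each step in order, repairing only when its test fails.
def converge(recipe, state)
  recipe.map do |step|
    if step.test.call(state)
      "#{step.description}: up to date"
    else
      step.repair.call(state)
      "#{step.description}: repaired"
    end
  end
end

recipe = [
  Step.new('package[httpd]',
           ->(s) { s[:packages].include?('httpd') },
           ->(s) { s[:packages] << 'httpd' }),
  Step.new('template[/var/www/html/index.html]',
           ->(s) { s[:files].key?('/var/www/html/index.html') },
           ->(s) { s[:files]['/var/www/html/index.html'] = 'rendered index.html.erb' }),
  Step.new('service[httpd]',
           ->(s) { s[:running].include?('httpd') },
           lambda do |s|
             # The service cannot start before its package exists.
             raise 'httpd is not installed' unless s[:packages].include?('httpd')

             s[:running] << 'httpd'
           end)
]

state = { packages: [], files: {}, running: [] }
first_log  = converge(recipe, state)
second_log = converge(recipe, state)

puts first_log
puts second_log
```

In this model, reversing the list makes the service step fail because the package is not there yet, which is the reason a recipe is an ordered list rather than an unordered set.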

Page 36: Planning Application Resilience

Cookbook

•  A collection of recipes (and other elements like files and templates).
•  Maps one-to-one to a piece of software or functionality.
•  Distribution unit.
•  Versioned.
•  Modular and re-usable.

Page 37: Planning Application Resilience

Chef Provisioning – Part of Chef DK

https://flic.kr/p/knDPjc

•  Describe multiple tier applications.
•  Deploy many copies of your application cluster.
•  Spread cluster across different clouds/machines.
•  Orchestrate deployment.
•  Parallelize machine deployment.

Page 38: Planning Application Resilience

Configuration as Code

Elastic Configurable

Responsive

Elasticity Recoverability

Page 39: Planning Application Resilience

Chef Provisioning

machine 'web1' do
  recipe 'webserver'
end

Page 40: Planning Application Resilience

Multi-platform

•  AWS
•  Azure
•  Fog
•  Vagrant
•  Docker
•  LXC
•  ...more

We'll use AWS in this example: https://github.com/chef/chef-provisioning-aws

http://aws.amazon.com/start-ups/loft/

Page 41: Planning Application Resilience

AWS

•  SQS Queues
•  SNS Topics
•  Elastic Load Balancers
•  VPCs
•  Security Groups
•  Instances
•  Images
•  Autoscaling Groups
•  SSH Key pairs
•  Launch configs

Page 42: Planning Application Resilience

AWS Config: ~/.aws/config

[default]
region = us-west-2
aws_access_key_id =
aws_secret_access_key =
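
Before provisioning, it can help to confirm that the profile actually has values filled in. The following is a minimal sketch in plain Ruby, assuming the simple INI layout shown on the slide; `parse_ini`, `missing_keys`, and `REQUIRED_KEYS` are hypothetical helpers, not part of Chef or the AWS tooling.

```ruby
# Hypothetical helper: parse INI-style text into { section => { key => value } }.
def parse_ini(text)
  sections = {}
  current = nil
  text.each_line do |raw|
    line = raw.strip
    next if line.empty? || line.start_with?('#', ';')

    if (match = line.match(/\A\[(.+)\]\z/))
      current = sections[match[1]] ||= {}
    elsif current && line.include?('=')
      key, value = line.split('=', 2).map(&:strip)
      current[key] = value
    end
  end
  sections
end

# Keys taken from the slide; treating all three as required is an assumption.
REQUIRED_KEYS = %w[region aws_access_key_id aws_secret_access_key].freeze

# Returns the required keys that are absent or blank in the given profile.
def missing_keys(sections, profile = 'default')
  settings = sections.fetch(profile, {})
  REQUIRED_KEYS.select { |key| settings[key].to_s.empty? }
end

sample = <<~INI
  [default]
  region = us-west-2
  aws_access_key_id =
  aws_secret_access_key =
INI

puts missing_keys(parse_ini(sample)).inspect
```

To check a real file, the sample text could be replaced with `File.read(File.expand_path('~/.aws/config'))`.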

Page 43: Planning Application Resilience

Cookbook Setup

$ chef generate cookbook webserver

Page 44: Planning Application Resilience

Provision Recipe

$ cd webserver
$ chef generate recipe provision

Page 45: Planning Application Resilience

Edit Provision Recipe

$ vi recipes/provision.rb

Page 46: Planning Application Resilience

Edit Provision Recipe

require "chef/provisioning/aws_driver"
with_driver "aws"

machine 'web1' do
  recipe 'webserver'
  converge true
end

Page 47: Planning Application Resilience
Page 48: Planning Application Resilience

..but I need multiple webservers

require "chef/provisioning/aws_driver"
with_driver "aws"

num_webservers = 3

(0...num_webservers).each do |i|
  machine "web_0#{i}" do
    recipe 'apache'
  end
end
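
The loop relies on Ruby's exclusive range. A quick plain-Ruby check of which machine names it produces (`machine_names` is a hypothetical variable used only for illustration):

```ruby
num_webservers = 3

# Three dots make the range exclusive, so it yields 0, 1 and 2.
machine_names = (0...num_webservers).map { |i| "web_0#{i}" }

# Two dots would include the end value and create one machine too many.
inclusive_count = (0..num_webservers).to_a.size

puts machine_names.inspect
puts "an inclusive range would create #{inclusive_count} machines"
```

A list of names like this is one plausible way to build the `elb_instances` value that the load balancing slide passes to `machines`, although the slides do not show how that value is defined.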

Page 49: Planning Application Resilience

…add security

# `name` must be set earlier in the recipe; the slides do not show it.
aws_security_group "#{name}-http" do
  inbound_rules [{:ports => 80, :protocol => :tcp, :sources => ['0.0.0.0/0']}]
end

Page 50: Planning Application Resilience

…add security

with_machine_options({
  :bootstrap_options => {
    :security_groups => ["#{name}-http"]
  }
})

Page 51: Planning Application Resilience

..add load balancing

# `elb_instances` must be defined earlier in the recipe; the slides do not show it.
load_balancer "#{name}-webserver-lb" do
  load_balancer_options({
    :availability_zones => ["us-west-2a", "us-west-2b", "us-west-2c"],
    :listeners => [{:port => 80, :protocol => :http, :instance_port => 80, :instance_protocol => :http}],
    :security_group_name => "#{name}-http"
  })
  machines elb_instances
end
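
The security group and the load balancer listeners have to agree on ports, or traffic never reaches the web servers. Below is a plain-Ruby sketch of that consistency check using the same data as the slides. The helper `unreachable_ports` and the value given to `name` are hypothetical, and the check assumes the one security group covers both the load balancer and the instances, as it does in this example.

```ruby
# Plain Ruby data mirroring the slides; `name` is a hypothetical value here.
name = 'demo'
security_group_name = "#{name}-http"
load_balancer_name  = "#{name}-webserver-lb"

inbound_rules = [{ ports: 80, protocol: :tcp, sources: ['0.0.0.0/0'] }]
listeners = [{ port: 80, protocol: :http, instance_port: 80, instance_protocol: :http }]

# Hypothetical helper: listener and instance ports that no inbound rule opens.
def unreachable_ports(listeners, inbound_rules)
  open_ports = inbound_rules.flat_map { |rule| Array(rule[:ports]) }
  needed = listeners.flat_map { |l| [l[:port], l[:instance_port]] }.uniq
  needed.reject { |port| open_ports.include?(port) }
end

blocked = unreachable_ports(listeners, inbound_rules)
puts "#{load_balancer_name} uses #{security_group_name}; blocked ports: #{blocked.inspect}"
```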

Page 52: Planning Application Resilience

Bulkhead Pattern

•  Compartmentalization to limit failure.
•  Repeatable clusters
•  … across platforms.
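
The compartmentalization idea can be sketched in plain Ruby: each cluster is provisioned on its own, and a failure in one is recorded without stopping the others. The compartment names and the `provision_all` helper are hypothetical, and the lambdas stand in for real provisioning work.

```ruby
# Plain-Ruby sketch of the bulkhead idea; compartment names are hypothetical.
# Each compartment is provisioned independently so one failure stays contained.
def provision_all(compartments)
  compartments.each_with_object({}) do |(name, provisioner), results|
    results[name] = begin
      provisioner.call
      :ok
    rescue StandardError => e
      "failed: #{e.message}"
    end
  end
end

compartments = {
  'aws-us-west-2' => -> { 'cluster up' },
  'vagrant-local' => -> { raise 'hypervisor unavailable' },
  'docker-ci'     => -> { 'cluster up' }
}

results = provision_all(compartments)
results.each { |name, status| puts "#{name}: #{status}" }
```

The failing compartment is reported, and the compartments listed after it are still provisioned.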

Page 53: Planning Application Resilience

Configuration as Code

Elastic Configurable Responsive

Elasticity Recoverability

Page 54: Planning Application Resilience

Responsive

•  Chef-Client
•  Chef Handlers
•  Jenkins with Test Kitchen
•  Collaborate with Source Control
•  Share your stories

Page 55: Planning Application Resilience

Responsive – chef-client

•  Agent that runs on the node and applies policy.

Page 56: Planning Application Resilience

Responsive – Jenkins with Test Kitchen

•  Write tests to minimize risk.
•  Push changes regularly.

Page 57: Planning Application Resilience

Responsive – Collaborate with Source Control

•  Don’t let role adherence get in the way of collaboration.
•  Pull requests

Page 58: Planning Application Resilience

Responsive – Chef Handlers

•  Start •  Exception •  Report
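
When each of the three handler types fires can be modeled in plain Ruby: start handlers run as the run begins, report handlers after a successful run, and exception handlers after a failed one. This sketch does not use Chef's handler classes; `run_with_handlers` and the lambdas are hypothetical.

```ruby
# Plain-Ruby model of handler timing; this does not use Chef's handler classes.
def run_with_handlers(start:, report:, exception:)
  events = []
  start.each { |handler| events << handler.call }
  begin
    yield
  rescue StandardError => e
    # The run failed: only exception handlers fire.
    exception.each { |handler| events << handler.call(e) }
  else
    # The run succeeded: only report handlers fire.
    report.each { |handler| events << handler.call }
  end
  events
end

handlers = {
  start:     [-> { 'start: run began' }],
  report:    [-> { 'report: run succeeded' }],
  exception: [->(e) { "exception: #{e.message}" }]
}

good_run = run_with_handlers(**handlers) { :converged }
bad_run  = run_with_handlers(**handlers) { raise 'package install failed' }

puts good_run
puts bad_run
```

Wiring a notification into the exception path is what turns a failed run into a prompt signal to people instead of a silent problem.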

Page 59: Planning Application Resilience

Share your stories

•  Blameless Postmortems are really useful.
•  Knowledge sharing across teams.
•  Share across companies – DevOpsDays

Page 60: Planning Application Resilience

Don’t build tools that create systems that encourage the wrong behaviors.

Page 61: Planning Application Resilience

Jumpstart Learning

•  The LearnChef Site: http://learnchef.com
   •  Guided Tutorials
   •  Chef Fundamentals intro
•  How-To’s, Conference Talks, Webinars, more: http://youtube.com/user/getchef
•  Attend a Chef Fundamentals Class (HELLO-CHEF code)

Page 62: Planning Application Resilience

Further Resources

•  http://chef.io
•  http://docs.chef.io
•  http://supermarket.chef.io
•  http://lists.opscode.com
•  irc.freenode.net #chef
•  Twitter @chef #getchef, @learnchef #learnchef

Page 63: Planning Application Resilience

Thank you!

Jennifer Davis
Twitter: @sigje
Hashtag: #getchef
Email: [email protected]

Page 64: Planning Application Resilience