Planning Application Resilience Jennifer Davis 2/26/15
Jul 17, 2015
Goal: Communication
Jennifer Davis, Solutions Engineer. Twitter: @sigje Hashtag: #getchef
Applications reflect Quality of Organizations
Conway’s law
Organizations which design systems … are constrained to produce designs which are copies of the communication
structures of these organizations.
Applications reflect Quality of Organizations
The resilience of an organization will be reflected in the resilience of the applications and services built by the
organization.
5 Critical Metrics of Resilient Organizations
• Willingness to tackle challenges. • Sense of Agency. • Adaptability. • Diversity. • Rope Factor
Borg Syndrome
• Embrace diversity.
• Recognize differences in perspectives.
• Eliminate system blindness.
Rope Factor
Enough rope to get things done, not enough for cowboys.
Rope too short: excessive time in meetings, death march for each sprint.
Rope too long: cowboy behavior.
Resilience is ordinary.
• Intentional behaviors, thoughts, and actions. • Reflection of the organization.
Resilience isn’t managed through limiting change.
• Security patches? • Over-engineering: delays in schedule. • Under-engineering: rewrite required to scale.
Stability is a myth.
Resilient Automation Platform
• Complex dependency handling between nodes. • Fault tolerance. • Security. • Multi-Platform. • Flexibility.
Chef is a language.
• Describe infrastructure as code.
• Programmatically provision and configure servers.
• Versioning, artifacts
chef is a command line utility
• Generates skeletons for applications, cookbooks, recipes, attributes, files, templates, and custom resources.
• Preps the environment with the correct Ruby gems. • Verifies the environment is installed and configured correctly.
Chef is a community.
• Mailing lists • https://supermarket.chef.io/ • Chef Conf 3/31 – 4/2 Santa Clara • Chef Summit • IRC #chef • Twitter @chef
CODE: BUILDITBETTER
Infrastructure Automation is creating control systems that reduce the burden on people to manage services and increase the quality, accuracy and precision of a service to the consumers of the service.
Resources
• Fundamental building blocks. • Each describes a piece of the system and its desired state. • The Chef DSL is Ruby.
Example of describing a resource
sudo chef-apply -e "package 'nano'"
Recipe: (chef-apply cookbook)::(chef-apply recipe)
  * package[nano] action install
    - install version 2.0.9-7.el6 of package nano
Test and Repair
Resources follow a test and repair model.
• package "nano"
Is nano installed? Yes: done. No: install it.
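The test and repair model above can be sketched in plain Ruby. This is only an illustration, not Chef's implementation: the PackageResource class, its converge method, and the simulated system_state set are hypothetical names made up for this example.

```ruby
require "set"

# Hypothetical stand-in for a package resource (not Chef's own class).
class PackageResource
  attr_reader :name

  def initialize(name, installed_packages)
    @name = name
    @installed = installed_packages # simulated system state, a Set of package names
  end

  # Test: is the system already in the desired state?
  def installed?
    @installed.include?(name)
  end

  # Repair only when the test fails, and report which path was taken.
  def converge
    return :up_to_date if installed?

    @installed << name
    :installed
  end
end

system_state = Set.new(["vim"])
nano = PackageResource.new("nano", system_state)

first_run = nano.converge   # nano is missing, so it gets "installed"
second_run = nano.converge  # nano is present, so nothing is done

puts "first run: #{first_run}, second run: #{second_run}"
```

Running the same resource twice is safe because the second run finds nothing to repair.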
Recipe
package "httpd"
template "/var/www/html/index.html" do
  source "index.html.erb"
end
service "httpd" do
  action [:enable, :start]
end
Cookbook
• A collection of recipes (and other elements like files and templates). • Maps 1:1 to a piece of software or functionality. • Distribution unit. • Versioned. • Modular and reusable.
Chef Provisioning – Part of Chef DK
https://flic.kr/p/knDPjc
• Describe multi-tier applications.
• Deploy many copies of your application cluster.
• Spread clusters across different clouds/machines.
• Orchestrate deployment.
• Parallelize machine deployment.
Multi-platform
• AWS • Azure • Fog • Vagrant • Docker • LXC • .. more
.. We’ll use AWS in this example https://github.com/chef/chef-provisioning-aws
http://aws.amazon.com/start-ups/loft/
AWS
• SQS Queues • SNS Topics • Elastic Load Balancers • VPCs • Security Groups • Instances • Images • Autoscaling Groups • SSH Key pairs • Launch configs
AWS Config: ~/.aws/config
[default]
region = us-west-2
aws_access_key_id =
aws_secret_access_key =
Edit Provision Recipe
require "chef/provisioning/aws_driver"
with_driver "aws"
machine 'web1' do
  recipe 'webserver'
  converge true
end
..but I need multiple webservers
require "chef/provisioning/aws_driver"
with_driver "aws"
num_webservers = 3
(0...num_webservers).each do |i|
  machine "web_0#{i}" do
    recipe 'apache'
  end
end
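As a quick check of which machine names the loop above declares, here is a plain Ruby sketch with no Chef involved. The machine_names variable is introduced only for this illustration. The point is that the three-dot range excludes its upper bound, so three machines are named, not four.

```ruby
num_webservers = 3

# The three-dot range 0...3 yields 0, 1, 2 (the upper bound is excluded).
machine_names = (0...num_webservers).map { |i| "web_0#{i}" }

puts machine_names.inspect
# → ["web_00", "web_01", "web_02"]
```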
…add security
aws_security_group "#{name}-http" do
  inbound_rules [{:ports => 80, :protocol => :tcp, :sources => ['0.0.0.0/0']}]
end
…add security
with_machine_options({
  :bootstrap_options => {
    :security_groups => ["#{name}-http"]
  }
})
..add load balancing
load_balancer "#{name}-webserver-lb" do
  load_balancer_options({
    :availability_zones => ["us-west-2a", "us-west-2b", "us-west-2c"],
    :listeners => [{:port => 80, :protocol => :http, :instance_port => 80, :instance_protocol => :http}],
    :security_group_name => "#{name}-http"
  })
  machines elb_instances
end
Bulkhead Pattern
• Compartmentalization to limit failure. • Repeatable Clusters • … across platforms.
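One way to picture the bulkhead idea is a toy model in plain Ruby, where each platform is a compartment holding its own copy of the cluster. The cluster layout, the machine names, and the surviving_machines helper are all hypothetical and exist only for this sketch. It is not part of Chef Provisioning.

```ruby
# Hypothetical layout: each platform is one compartment with its own machines.
clusters = {
  "aws"     => ["web_00", "web_01"],
  "azure"   => ["web_02", "web_03"],
  "vagrant" => ["web_04"]
}

# Returns the machines that remain when the given platforms fail.
def surviving_machines(clusters, failed_platforms)
  clusters.reject { |platform, _machines| failed_platforms.include?(platform) }
          .values
          .flatten
end

survivors = surviving_machines(clusters, ["aws"])
puts survivors.inspect
```

Losing one compartment removes only the machines inside it. The clusters on the other platforms keep serving.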
Responsive
• Chef-Client • Chef Handlers • Jenkins with Test Kitchen • Collaborate with Source Control • Share your stories
Responsive – Collaborate with Source Control
• Don’t let role adherence get in the way of collaboration. • Pull requests
Share your stories
• Blameless Postmortems are really useful. • Knowledge sharing across teams. • Share across companies – DevOpsDays
Jumpstart Learning
• The LearnChef site: guided tutorials, Chef Fundamentals intro. http://learnchef.com
• How-tos, conference talks, webinars, more: http://youtube.com/user/getchef
• Attend a Chef Fundamentals class (HELLO-CHEF code)
Further Resources
• http://chef.io • http://docs.chef.io • http://supermarket.chef.io • http://lists.opscode.com • irc.freenode.net #chef • Twitter @chef #getchef, @learnchef #learnchef