Automating Cloud Applications Using Open Source


DESCRIPTION

With the proliferation of tools, frameworks, and libraries, it's now easier than ever to build cloud-based systems. However, while each tool is designed to solve a specific pain point, gaps remain when it comes to a holistic approach to managing the cloud-based software lifecycle. Using real-world examples, BrightTag engineers explain how they helped design a highly scalable platform and automated zero-downtime deploys using primarily off-the-shelf open source software. The talk focuses on the software lifecycle, broken into three high-level areas: design, deployment, and monitoring. This session reviews considerations for designing applications to take advantage of cloud-based deployment and demonstrates how to leverage existing open source tools like Fabric, HAProxy, libcloud, and Graphite to create a scalable and flexible infrastructure.

Transcript

Automating Life in the Cloud!

Joshua Buss, Matthew Kemp & Cody Ray

2

"Add more features!"
"This widget is too slow!"
"No more downtime!"
"We're losing potential customers in Asia!"

Use Case 0: Scalability and Reliability / Designing for the Cloud

3

Focus on scaling applications horizontally.

Use Case 0: Scalability and Reliability / Scalability

4

Wikipedia definition: SOA as an architecture relies on service-orientation as its fundamental design principle. If a service presents a simple interface that abstracts away its underlying complexity, users can access independent services without knowledge of the service's platform implementation.

Layman's terms: a complex system is broken into simple components that are able to interact with each other (and possibly with outside sources).

Use Case 0: Scalability and Reliability / Service Oriented Architecture
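To make "a simple interface that abstracts away its underlying complexity" concrete, here is a minimal sketch (not code from the talk; the endpoint and logic are made up) of a component exposing one HTTP interface with Flask. Callers depend only on the interface, never on how the answer is produced.

from flask import Flask, jsonify

app = Flask(__name__)

def compute_widget_count():
    # Placeholder for arbitrarily complex internal logic (caches, databases, etc.).
    return 42

@app.route("/widgets/count")
def widget_count():
    # The simple interface: callers see a count, not the machinery behind it.
    return jsonify(count=compute_widget_count())

if __name__ == "__main__":
    app.run(port=8080)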

5

What is a Service in SOA?

6

An independent unit that's composable with other components.

Use Case 0: Scalability and Reliability

[Diagram: a presentation layer (web, API, etc.) calls business logic services, which use data access layers backed by data stores.]

Use Case 0: Scalability and Reliability / Services at BrightTag

7

[Diagram: the BrightTag services (tagserve, datahub, stathub, ui) and their databases.]

When should you split services up?

Use Case 0: Scalability and Reliability / Service Division of Labor

8

Keep failures self-contained.

Use Case 0: Scalability and Reliability / Design for Failure

9

Release It! by Michael Nygard is a great resource for stability patterns.
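One of the stability patterns from Release It! is the circuit breaker, which stops calling a failing dependency after repeated errors so the failure stays contained. A minimal sketch (illustrative only, not BrightTag's implementation; thresholds are arbitrary):

import time

class CircuitBreaker:
    # Opens after `threshold` consecutive failures, then fails fast for `reset_after` seconds.
    def __init__(self, threshold=5, reset_after=30):
        self.threshold = threshold
        self.reset_after = reset_after
        self.failures = 0
        self.opened_at = None

    def call(self, func, *args, **kwargs):
        if self.opened_at is not None:
            if time.time() - self.opened_at < self.reset_after:
                raise RuntimeError("circuit open; failing fast")
            self.opened_at = None  # half-open: allow one trial call
        try:
            result = func(*args, **kwargs)
        except Exception:
            self.failures += 1
            if self.failures >= self.threshold:
                self.opened_at = time.time()
            raise
        self.failures = 0
        return result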

[Diagram: the BrightTag services (stathub, datahub, tagserve, ui) and their databases, with each service's failure contained to that service.]

Run a full stack in each region.

Use Case 0: Scalability and Reliability / Redundancy at BrightTag

10

[Diagram: a full stack (tagserve, datahub, stathub, ui, and databases) running in each region.]

Services communicate over HTTP, so standard tools and components can be used without extra effort.

Use Case 0: Scalability and Reliability / Load Balancers

11

Changes need to be allowed, but compatibility needs to be maintained.

Use Case 0: Scalability and Reliability / Backwards Compatibility
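One simple way to allow changes while maintaining compatibility is to accept both the old and the new shape of a payload during a deprecation window. A hypothetical sketch (field names are made up, not from the talk):

def parse_widget(payload):
    # New clients send "widget_id"; older clients still send "id".
    # Accepting both lets producers and consumers be upgraded independently.
    widget_id = payload.get("widget_id", payload.get("id"))
    if widget_id is None:
        raise ValueError("payload missing widget id")
    return {"widget_id": widget_id, "name": payload.get("name", "")}

# Old and new payloads both parse:
assert parse_widget({"id": 7})["widget_id"] == 7
assert parse_widget({"widget_id": 7})["widget_id"] == 7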

12

Need some data available in all regions, but keep inter-region communication to a minimum.

Case 1: Inter-Region Communication / Cross-Region Data Replication

13

Google's BigTable data model on Amazon's Dynamo infrastructure.

Case 1: Inter-Region Communication / What is Cassandra?

14

Case 1: Inter-Region Communication / Cassandra Token Ring

15

East ring: cassandra01 [0-63], cassandra02 [64-127], cassandra03 [128-191], cassandra04 [192-255]

West ring: cassandra01 [1-64], cassandra02 [65-128], cassandra03 [129-192], cassandra04 [193-0]

Key hashes to 157?

Case 1: Inter-Region Communication / How Cassandra Writes

16

East ring: cassandra01 [0-63], cassandra02 [64-127], cassandra03 [128-191], cassandra04 [192-255]

West ring: cassandra01 [1-64], cassandra02 [65-128], cassandra03 [129-192], cassandra04 [193-0]

The write goes to cassandra03 in each ring, since 157 falls in its token range.
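A rough sketch of the token-to-node lookup shown above, using the East ring's ranges from the diagram. This ignores replication, virtual nodes, and the West ring's wrap-around range; it only illustrates why a key that hashes to 157 lands on cassandra03.

EAST_RING = {
    "cassandra01": (0, 63),
    "cassandra02": (64, 127),
    "cassandra03": (128, 191),
    "cassandra04": (192, 255),
}

def owner(token, ring):
    # Return the node whose token range contains the given token.
    for node, (low, high) in ring.items():
        if low <= token <= high:
            return node
    raise ValueError("token outside ring")

print(owner(157, EAST_RING))  # cassandra03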

Cross-region messaging over HTTPS with compression.

Case 1: Inter-Region Communication / Cross-Region Messaging (Hiveway)

17

[Diagram: messages flow in both directions between a local hiveway and a remote hiveway.]
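The Hiveway code isn't shown in the talk, but the idea of shipping a compressed batch of messages to the remote region over HTTPS can be sketched with the requests library. The URL, payload shape, and timeout are assumptions for illustration.

import gzip
import json

import requests

def ship_messages(messages, remote_url="https://hiveway.remote.example/messages"):
    # Compress the batch before sending it across regions to cut transfer size.
    body = gzip.compress(json.dumps(messages).encode("utf-8"))
    response = requests.post(
        remote_url,
        data=body,
        headers={"Content-Type": "application/json", "Content-Encoding": "gzip"},
        timeout=10,
    )
    response.raise_for_status()

ship_messages([{"event": "pageview", "region": "us-east-1"}])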

Use Case 2: Zero Downtime Builds / Smooth Code Pushes

18

Easy migrations and upgrade path.

Can be more expensive.

Use Case 2: Zero Downtime Builds / Mirror Environment Cutover

19

More complicated migrations and upgrades.

Longer deploy window.

Usually cheaper.

Use Case 2: Zero Downtime Builds / Rolling Deploy

20

for region in regions:
    for app in apps:
        for server in region:
            if app on server:
                maintenance app
                scp new code to <deployment_tag> dir
                symlink app/current to app/<deployment_tag>
                restart app
                wait for healthy

Use Case 2: Zero Downtime Builds / Fabric Pseudocode
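A rough Fabric 1.x rendering of the pseudocode above. The paths, commands, service port, and health-wait loop are assumptions for illustration, not BrightTag's actual fabfile.

import time

import requests
from fabric.api import put, run, settings

def deploy_app(host, app, deployment_tag, artifact):
    # One server at a time: maintenance -> copy -> symlink -> restart -> wait for healthy.
    with settings(host_string=host):
        run("curl -s 'http://localhost:8080/bthc?action=maint'")  # hypothetical service port
        release_dir = "/opt/%s/releases/%s" % (app, deployment_tag)
        run("mkdir -p %s" % release_dir)
        put(artifact, release_dir)                              # ship the new build
        run("ln -sfn %s /opt/%s/current" % (release_dir, app))  # switch the current symlink
        run("service %s restart" % app)                         # assumes an init script per app
    wait_for_healthy(host)

def wait_for_healthy(host, timeout=120):
    # Poll the standardized health check until the service reports healthy again.
    deadline = time.time() + timeout
    while time.time() < deadline:
        try:
            if requests.get("http://%s:8080/bthc" % host, timeout=2).status_code == 204:
                return
        except requests.RequestException:
            pass
        time.sleep(2)
    raise RuntimeError("%s did not become healthy in time" % host)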

21

Use Case 2: Zero Downtime Builds / Health Checks at BrightTag

22

Standardized health checks across services.

$ curl -si 'http://service/bthc'
HTTP/1.1 204 No Content

$ curl -si 'http://service/bthc?action=maint'
HTTP/1.1 500 Internal Server Error
Connection: close
Content-Length: 5

MAINT
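A minimal Flask sketch with the same contract as the /bthc check above: 204 when healthy, 500 with a MAINT body once the service is put into maintenance. The real BrightTag implementation isn't shown in the talk, and the "resume" action is an assumption.

from flask import Flask, request

app = Flask(__name__)
state = {"maint": False}

@app.route("/bthc")
def health():
    action = request.args.get("action")
    if action == "maint":
        state["maint"] = True          # load balancer will start failing this node out
    elif action == "resume":           # hypothetical; the talk only shows 'maint'
        state["maint"] = False
    if state["maint"]:
        return "MAINT", 500, {"Connection": "close"}
    return "", 204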

At-a-glance environment health.

Use Case 2: Zero Downtime Builds / Keeping an Eye on the Pulse

23

Provide multiple modes of operation.

Use Case 2: Zero Downtime Builds / Runtime Controls
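"Multiple modes of operation" can be as simple as in-process flags operators can flip at runtime without a deploy. A hypothetical sketch (the endpoint and control names are invented, not from the talk):

from flask import Flask, request, jsonify

app = Flask(__name__)
# Runtime-tunable knobs; defaults match normal operation.
controls = {"serve_third_party_tags": True, "log_level": "INFO"}

@app.route("/controls", methods=["GET", "POST"])
def runtime_controls():
    if request.method == "POST":
        # Operators flip modes at runtime, e.g. shedding optional work under load.
        controls.update(request.get_json(force=True))
    return jsonify(controls)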

24

Use Case 3: Generating /etc/hosts / Connectivity

Use Case 3: Generating /etc/hosts / What is Zerg?

26

[Slide graphic: Flask + libcloud = Zerg]

DRIVER_MAPPING = {
    "dev": {
        "office": get_driver(Provider.EUCALYPTUS)(
            DEV_ID, secret=DEV_KEY, host="openmaster", port=8773,
            secure=False, path="/services/Cloud")
    },
    "prod": {
        "us-east-1": get_driver(Provider.EC2_US_EAST)(PROD_ID, PROD_KEY),
        "eu-west-1": get_driver(Provider.EC2_EU_WEST)(PROD_ID, PROD_KEY)
    }
}

@app.route("/hosts/<env>/<region>")
def hosts(env, region):
    hosts = DRIVER_MAPPING[env][region].list_nodes()
    return str([host.extra['private_dns'] for host in hosts])

Use Case 3: Generating /etc/hosts / Flask and libcloud Working Together

27

@app.route("/etchosts/<env>/<region>")  def  etchosts(env,  region):      driver  =  DRIVER_MAPPING[env][region]      sorted_nodes  =  sorted((node.name,  node.private_ips,  node.public_ips)  for  node  

in  driver.list_nodes())      hosts  =  [{'private_ip':private_ips[0],  'name':name,  'public_ip':public_ips[0]}  

for  (name,  private_ips,  public_ips)  in  sorted_nodes]      response  =  render_template('etc_hosts.txt',  hosts=hosts)      return  Response(response,  content_type='text/plain')    

Template:!#  The  following  lines  are  desirable  for  IPv6  capable  hosts  ::1  ip6-­‐localhost  ip6-­‐loopback  

{%  for  host  in  hosts  %}  {{  "%-­‐21s%-­‐21s#  External:  %s"|format(host.private_ip,  host.name,  

host.public_ip)  }}  {%-­‐  endfor  %}    

Use Case 3: Generating /etc/hosts The Zerg Code!

28

$ curl -s 'http://zerg/etchosts/prod/eu-west-1'

# The following lines are desirable for IPv6 capable hosts

::1 ip6-localhost ip6-loopback

10.0.0.10 server01 # External: 123.123.123.123

10.0.0.11 server02 # External: 123.123.123.124

10.0.0.12 server03 # External: 123.123.123.125

10.0.0.13 server04 # External: 123.123.123.126

10.0.0.14 server05 # External: 123.123.123.127

10.0.0.15 server06 # External: 123.123.123.128

Use Case 3: Generating /etc/hosts / The Zerg HTTP Response

29

# Set variables
read -r -d '' STATIC_HOSTS << static_hosts
# The following lines are included by default
127.0.0.1       localhost

# DO NOT EDIT THIS COMMENT - everything after this line is managed by zerg!
static_hosts

cp /etc/hosts ${TMPDIR}/old_hosts
grep -B 5000000 '# DO NOT' ${TMPDIR}/old_hosts >> ${TMPDIR}/static_hosts
cp ${TMPDIR}/static_hosts ${TMPDIR}/new_hosts
wget -qO- "http://${ZERG_IP}/etchosts/${E}/${R}" >> ${TMPDIR}/new_hosts && \
    if [[ $(diff ${TMPDIR}/new_hosts /etc/hosts | wc -l | awk '{print $1}') -lt 7 \
          || ${FORCE} == '--force' ]]; then
        cp ${TMPDIR}/new_hosts /etc/hosts
    fi

Use Case 3: Generating /etc/hosts / The bash update_hosts.sh script

30

Update timing is tricky to get right.

Too important to leave completely autonomous.

Use Case 4: Generating Load Balancer Configuration / Configuring Load Balanced Services

31

Need a rock-solid foundation to deploy onto.

Use Case 4: Generating Load Balancer Configuration / Consistency > *

Set environment per-instance: /etc/puppet/puppet.conf

Symlink /etc/puppet/environments/ on master to various git checkouts of the source:

$ cd /etc/puppet/environments
$ ln -s ~/src/puppet/prod_stable prod_stable
$ ln -s ~/src/puppet/dev_stable dev_stable
$ ln -s ~/src/puppet/dev_test dev_test

Use cron to keep all branches up-to-date

Use Case 4: Generating Load Balancer Configuration / Single Puppet Master

Each environment has its own branch.

Make a new branch for every new feature.

Merge into a test branch to test.

Merge into stable.

Use Case 4: Generating Load Balancer Configuration / Source Controlled Puppet Configs

APP_DEFS = {
    "zerg": {"type": "http", "healthcheck": {"port": 19999, "resource": "/zerghealth"}},
    "awesome": {"type": "http", "healthcheck": {"port": 20000, "resource": "/ahc"},
        "frontend": "10080"},
    "haproxy_awesome": {"type": "http", "healthcheck": {"port": 20001, "resource": "/"}},
    "foo": {"type": "http", "healthcheck": {"port": 20002, "resource": "/"},
        "frontend": "10081"},
    "mashed_potatoes": {"type": "http", "healthcheck": {"port": 20003, "resource": "/"},
        "frontend": "10082"},
    "haproxy_foo": {"type": "http", "healthcheck": {"port": 20004, "resource": "/hc"}},
    "thehardproblem": {"type": "http", "healthcheck": {"port": 20006, "resource": "/"}},
    "redis": {"type": "tcp", "healthcheck": {"port": 20007, "resource": "/rhc"}},
    "dataserver": {"type": "http", "healthcheck": {"port": 20008, "resource": "/"},
        "frontend": "10083"},
    "itshards": {"type": "http", "healthcheck": {"port": 20009, "resource": "/"}},
    "devnull": {"type": "http", "healthcheck": {"port": 20010, "resource": "/hc"}}
}

Use Case 4 – Load Balancer Configs / The App Definitions in Zerg

35

@app.route("/haproxy/<env>/<region>/<type>")  def  haproxy(env,  region,  type):      instances  =  get_region_manifest(region)      apps  =  {}      for  app  in  APP_DEFS[env]:          if  'frontend'  in  APP_PORTS[env][app].keys():              app_object  =  {                  'servers':[],                  'backend_port':  APP_PORTS[env][app]['healthcheck']['port'],                  'frontend_port':  APP_PORTS[env][app]['frontend']              }              for  server  in  instances:                  if  app  in  instances[server]['roles']:                      app_object['servers'].append({'name':server,  'details':instances[server]})              apps[app]  =  app_object      return  render_template('haproxy_%s_%s_%s.txt'  %  (env,  region,  type),  vips=apps)  

Use Case 4 – Load Balancer Configs!The Zerg Code!

36

global
    blah blah

defaults
    blah blah

frontend dataserver_vip
    bind *:{{ vips.dataserver.frontend_port }}
    default_backend dataserver

frontend mashed_potatoes_vip
    bind *:{{ vips.mashed_potatoes.frontend_port }}
    default_backend mashed_potatoes

backend dataserver
    balance roundrobin
    {%- for server in vips.dataserver.servers %}
    server {{ server['name'] }} {{ server.details['private ip'] }}:{{ vips.dataserver.backend_port }} check
    {%- endfor %}

backend mashed_potatoes
    balance roundrobin
    {%- for server in vips.mashed_potatoes.servers %}
    server {{ server['name'] }} {{ server.details['private ip'] }}:{{ vips.mashed_potatoes.backend_port }} check
    {%- endfor %}

Use Case 4 – Load Balancer Configs / The Zerg Flask Template

37

$ curl -s http://zerg/haproxy/<env>/<region>/<type>

globals and defaults blah blah

frontend dataserver_vip
    bind *:10083
    default_backend dataserver

frontend mashed_potatoes_vip
    bind *:10082
    default_backend mashed_potatoes

backend dataserver
    blah blah options
    server dataserv01 10.0.0.28:20008 check
    server dataserv02 10.0.0.29:20008 check

backend mashed_potatoes
    blah blah options
    server taters01 10.0.0.30:20003 check
    server taters02 10.0.0.31:20003 check

Use Case 4 – Load Balancer Configs / The Zerg HTTP Response

38

Use Case 4 – Load Balancer Configs / The Config Workflow

39

[Workflow: large changes to templates (human) -> Git (ops) -> Zerg (generation) -> script (human) -> Git (puppet) -> servers]

./update_haproxy.sh <env> <region> <service>
** Git is clean and in sync with origin.. now waiting for zerg http response..
[prod_stable 012345] [puppet] Haproxy Auto-Commit for <env> <region> <service>
 1 files changed, 2 insertions(+), 2 deletions(-)
** Template pulled and committed
** Here is the diff from origin to the new version:
diff --git a/modules/haproxy/templates/haproxy_<env>_<region>_<service>_cfg.erb b/modules/haproxy/templates/haproxy_<env>_<region>_<service>_cfg.erb
--- a/modules/haproxy/templates/haproxy_prod_us-east-1_tagserve_cfg.erb
+++ b/modules/haproxy/templates/haproxy_prod_us-east-1_tagserve_cfg.erb
-    server oldandslow01 10.0.0.23:20003 check
-    server oldandslow02 10.0.0.24:20003 check
+    server taters01 10.0.0.30:20003 check
+    server taters02 10.0.0.31:20003 check
** Do you want to push this change? (y/n) y
blah blah successful git push message
** Commit successfully pushed to origin
** All done!

Use Case 4 – Load Balancer Configs / The bash update_haproxy.sh script

40

Alerting, Monitoring & Visualization

Use Case 5: Dashboards & Alerting / What's really going on?

41

Identify metrics that act as signals.

Add alerts after every incident.

Use Case 5: Dashboards & Alerting / What to monitor?

42

Use Case 5: Dashboards & Alerting / Metric Polling at BrightTag

43

[Diagram: in each region, mpoller instances poll tagserve, haproxy, datahub, redis, and cassandra and push the metrics to graphite/carbon.]

Storage of historical metrics allows for trending and comparisons.

Aggregation is performed on data retrieval via the webapp.

Use Case 5: Dashboards & Alerting / Graphite
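Metrics reach Graphite through carbon. A minimal sketch of pushing one data point using carbon's plaintext protocol; the host and metric name are assumptions, and port 2003 is carbon's default plaintext listener.

import socket
import time

def send_metric(name, value, carbon_host="graphite.example", carbon_port=2003):
    # Carbon's plaintext protocol: "<metric path> <value> <unix timestamp>\n"
    line = "%s %f %d\n" % (name, value, int(time.time()))
    sock = socket.create_connection((carbon_host, carbon_port), timeout=5)
    try:
        sock.sendall(line.encode("ascii"))
    finally:
        sock.close()

send_metric("us-east-1.tagserve.requests_per_sec", 1234.0)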

44

Expose a "metrics" service per region.

Enables a flexible topology.

Use Case 5: Dashboards & Alerting / Branches and Leaves

45

Use Case 5: Dashboards & Alerting / Metric Aggregation at BrightTag

46

[Diagram: a dashboard queries the per-region metrics services, which sit in front of tagserve, haproxy, datahub, redis, and cassandra.]

Use Case 5: Dashboards & Alerting / Realtime Numbers Across Regions

47

Requests are farmed out to each metrics service.
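A sketch of farming a dashboard request out to each region's metrics service and merging the answers. The endpoints and response shape are hypothetical, and the fan-out is sequential here for brevity.

import requests

METRICS_ENDPOINTS = {
    "us-east-1": "http://metrics.us-east-1.example/current",
    "eu-west-1": "http://metrics.eu-west-1.example/current",
}

def realtime_totals(metric):
    # Ask every region for its current value, then sum them for a global number.
    totals = {}
    for region, url in METRICS_ENDPOINTS.items():
        resp = requests.get(url, params={"metric": metric}, timeout=2)
        resp.raise_for_status()
        totals[region] = resp.json()["value"]
    totals["global"] = sum(totals.values())
    return totals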

Different visualizations tell you different things.

Use Case 5: Dashboards & Alerting / Visualization

48

Tattle allows us to alert on any metric in Graphite.

Alerting is done per region.

Use Case 5: Dashboards & Alerting / Alerting
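Tattle's core idea, alerting when a Graphite metric crosses a threshold, can be sketched against Graphite's render API. The Graphite URL, target, threshold, and how the alert is delivered are assumptions, not Tattle's actual code.

import requests

def check_threshold(graphite_url, target, threshold):
    # Graphite's render API returns [{"target": ..., "datapoints": [[value, ts], ...]}, ...]
    resp = requests.get(
        "%s/render" % graphite_url,
        params={"target": target, "format": "json", "from": "-5min"},
        timeout=10,
    )
    resp.raise_for_status()
    for series in resp.json():
        values = [v for v, _ in series["datapoints"] if v is not None]
        if values and max(values) > threshold:
            return "ALERT: %s peaked at %s (threshold %s)" % (series["target"], max(values), threshold)
    return "OK"

print(check_threshold("http://graphite.example", "us-east-1.haproxy.5xx_rate", 10))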

49

Fabric is push, puppet is pull.

Businesses don't move as fast as infrastructure changes, but configs have to stay up to date all the time.

[Spectrum from puppet to fabric: /etc/hosts, systempoller.py, mashed_potatoes.env, and dataserver.war, ranging from real-time up-to-date (puppet) through moderately up-to-date to weekly (fabric).]

Deployment / Fabric vs Puppet

50

Have to go with what the cloud provider offers.

Not always ideal for every workload.

Designing for the Cloud / Virtual Machines

51

There are no Silver Bullets (but if you find one, let us know).

52

Questions?

53
