Page 1: Chef patterns

Chef Patterns From Building Clusters

Biju Nair, Boston DevOps Meetup

08-July-2015

Page 2: Chef patterns

Background

• Automate build & management of clusters
  – Hadoop
  – Kafka, etc.

• Patterns which can be used elsewhere

Page 3: Chef patterns

Movies On Demand

Page 4: Chef patterns

Service On Demand

• Common services which can be requested
  – Copy logs from applications to a centralized location
  – Service available on all the nodes
  – Applications can request the service dynamically

Page 5: Chef patterns

Service On Demand

• Node attribute to store service requests

default['bcpc']['hadoop']['copylog'] = {}

• Data structure to make service requests

{
  'app_id' => {
    'logfile' => "/path/file_name_of_log_file",
    'docopy' => true  # or false
  },...
}

Page 6: Chef patterns

Service On Demand

• Application recipes make service requests

#
# Updating node attributes to copy HBase master log file to HDFS
#
node.default['bcpc']['hadoop']['copylog']['hbase_master'] = {
  'logfile' => "/var/log/hbase/hbase-master-#{node.hostname}.log",
  'docopy' => true
}

node.default['bcpc']['hadoop']['copylog']['hbase_master_out'] = {
  'logfile' => "/var/log/hbase/hbase-master-#{node.hostname}.out",
  'docopy' => true
}

Page 7: Chef patterns

Service On Demand

• Service recipe

node['bcpc']['hadoop']['copylog'].each do |id, f|
  if f['docopy']
    template "/etc/flume/conf/flume-#{id}.conf" do
      source "flume_flume-conf.erb"
      action :create
      ...
      variables(:agent_name => "#{id}",
                :log_location => "#{f['logfile']}")
      notifies :restart, "service[flume-agent-multi-#{id}]", :delayed
    end
    service "flume-agent-multi-#{id}" do
      supports :status => true, :restart => true, :reload => false
      service_name "flume-agent-multi"
      action :start
      start_command "service flume-agent-multi start #{id}"
      restart_command "service flume-agent-multi restart #{id}"
      status_command "service flume-agent-multi status #{id}"
    end
  end
end

• Separate role at the end of run list
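
The request/serve split above can be tried outside Chef. The following is a minimal plain-Ruby sketch, not the cookbook's code: the `copylog` hash stands in for the node attribute, `hostname` and the `flume_agents` helper are hypothetical names, and the second request is set to `false` only to show the filter at work.

```ruby
# Plain-Ruby stand-in for node['bcpc']['hadoop']['copylog'].
copylog = {}

# Application recipes register their requests (keys mirror the slides).
hostname = 'node1' # hypothetical host name
copylog['hbase_master'] = {
  'logfile' => "/var/log/hbase/hbase-master-#{hostname}.log",
  'docopy'  => true
}
copylog['hbase_master_out'] = {
  'logfile' => "/var/log/hbase/hbase-master-#{hostname}.out",
  'docopy'  => false # set to false here only to illustrate the filter
}

# Hypothetical helper: what the service recipe would create, one Flume
# agent per request that has 'docopy' set.
def flume_agents(requests)
  requests.select { |_id, req| req['docopy'] }.map do |id, req|
    { conf: "/etc/flume/conf/flume-#{id}.conf",
      service: "flume-agent-multi-#{id}",
      log_location: req['logfile'] }
  end
end

agents = flume_agents(copylog)
agents.each { |a| puts "#{a[:service]} reads #{a[:log_location]}" }
```

Because the service recipe only reads what earlier recipes wrote, it has to run after them, which is why the slides place it in a separate role at the end of the run list.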

Page 8: Chef patterns

Choices  

Page 9: Chef patterns

Pluggable Alerts

• Single source for monitored stats
  – Allows users to visualize stats across different parameters
  – Didn't want the alerting system to duplicate the stats collection
  – Need to feed data to the alerting system to generate alerts

Page 10: Chef patterns

Pluggable Alerts

• Attribute where users can define alerts

default["bcpc"]["hadoop"]["graphite"]["queries"] = {
  'hbase_master' => [
    {
      'type' => "jmx",
      'query' => "memory.NonHeapMemoryUsage_committed",
      'key' => "hbasenonheapmem",
      'trigger_val' => "max(61,0)",
      'trigger_cond' => "=0",
      'trigger_name' => "HBaseMasterAvailability",
      'trigger_dep' => ["NameNodeAvailability"],
      'trigger_desc' => "HBase master seems to be down",
      'severity' => 1
    }, {
      'type' => "jmx",
      'query' => "memory.HeapMemoryUsage_committed",
      'key' => "hbaseheapmem",
      ...
    }, ...
  ],
  'namenode' => [...]
  ...
}

Page 11: Chef patterns

Pluggable Alerts

• Recipes and templates use the data structure
  – To generate queries to pull data from the statistics store and send it to the alerting system
    • https://github.com/bloomberg/chef-bach/blob/master/cookbooks/bcpc-hadoop/templates/default/graphite.query_graphite.config.erb
  – To create the requested trigger-related objects in the alarming system
    • https://github.com/bloomberg/chef-bach/blob/master/cookbooks/bcpc-hadoop/recipes/graphite_to_zabbix.rb
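
As a rough illustration of how a recipe might walk this data structure, here is a plain-Ruby sketch. It is not the code in the linked files: `triggers_for`, the `item` naming scheme and the stats-only second entry are assumptions made for the example.

```ruby
# Same shape as the queries attribute on the slides.
queries = {
  'hbase_master' => [
    { 'type' => 'jmx',
      'query' => 'memory.NonHeapMemoryUsage_committed',
      'key' => 'hbasenonheapmem',
      'trigger_val' => 'max(61,0)',
      'trigger_cond' => '=0',
      'trigger_name' => 'HBaseMasterAvailability',
      'trigger_dep' => ['NameNodeAvailability'],
      'trigger_desc' => 'HBase master seems to be down',
      'severity' => 1 },
    # Hypothetical entry with no trigger_* keys: collected, never alerted on.
    { 'type' => 'jmx',
      'query' => 'memory.HeapMemoryUsage_used',
      'key' => 'examplestat' }
  ]
}

# Hypothetical helper: one trigger definition per entry that names a trigger.
def triggers_for(queries)
  queries.flat_map do |service, entries|
    entries.select { |e| e.key?('trigger_name') }.map do |e|
      { name: e['trigger_name'],
        item: "#{service}.#{e['key']}", # assumed item naming scheme
        expression: "#{e['trigger_val']}#{e['trigger_cond']}",
        depends_on: e.fetch('trigger_dep', []),
        description: e['trigger_desc'],
        severity: e['severity'] }
    end
  end
end

triggers = triggers_for(queries)
triggers.each { |t| puts "#{t[:name]}: #{t[:item]} #{t[:expression]}" }
```

The point of the pattern is that adding an alert becomes a data change in an attribute rather than a change to the recipe.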

Page 12: Chef patterns

Pluggable Alerts

• Servers defined in the role are used by recipes

"default_attributes" : {
  "jmxtrans": {
    "servers": [
      {
        "type": "hbase_master",
        "service": "hbase-master",
        "service_cmd": "org.apache.hadoop.hbase.master.HMaster"
      }, {
        "type": "hbase_rs",
        "service": "hbase-regionserver",
        "service_cmd": "org.apache.hadoop.hbase.regionserver.HRegionServer"
      }
    ]
  }
  ...

Page 13: Chef patterns

Dependency  

Page 14: Chef patterns

Service Restart

• We use jmxtrans to monitor JMX stats
  – Services to be monitored vary by node
  – There can be more than one service to be monitored
  – Restarting a monitored service requires jmxtrans to be restarted**

Page 15: Chef patterns

Service Restart

• Data structure in roles to define the services

"default_attributes" : {
  "jmxtrans": {
    "servers": [
      {
        "type": "datanode",
        "service": "hadoop-hdfs-datanode",
        "service_cmd": "org.apache.hadoop.hdfs.server.datanode.DataNode"
      }, {
        "type": "hbase_rs",
        "service": "hbase-regionserver",
        "service_cmd": "org.apache.hadoop.hbase.regionserver.HRegionServer"
      }
    ]
  }
  ...

Page 16: Chef patterns

Service Restart

• Jmxtrans service restart logic built dynamically

jmx_services = Array.new
jmx_srvc_cmds = Hash.new
node['jmxtrans']['servers'].each do |server|
  jmx_services.push(server['service'])
  jmx_srvc_cmds[server['service']] = server['service_cmd']
end

service "restart jmxtrans on dependent service" do
  service_name "jmxtrans"
  supports :restart => true, :status => true, :reload => true
  action :restart
  jmx_services.each do |jmx_dep_service|
    subscribes :restart, "service[#{jmx_dep_service}]", :delayed
  end
  only_if { process_require_restart?("jmxtrans", "jmxtrans-all.jar", jmx_srvc_cmds) }
end

Page 17: Chef patterns

Service Restart

def process_require_restart?(process_name, process_cmd, dep_cmds)
  tgt_process_pid = `pgrep -f #{process_cmd}`
  ...
  tgt_process_stime = `ps --no-header -o start_time #{tgt_process_pid}`
  ...
  ret = false
  restarted_processes = Array.new
  dep_cmds.each do |dep_process, dep_cmd|
    dep_pids = `pgrep -f #{dep_cmd}`
    if dep_pids != ""
      dep_pids_arr = dep_pids.split("\n")
      dep_pids_arr.each do |dep_pid|
        dep_process_stime = `ps --no-header -o start_time #{dep_pid}`
        # A dependency that started after the target means the target is stale
        if DateTime.parse(tgt_process_stime) < DateTime.parse(dep_process_stime)
          restarted_processes.push(dep_process)
          ret = true
        end
        ...
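
The comparison at the heart of this helper can be isolated from the `pgrep`/`ps` calls. The sketch below is a simplified stand-in rather than the original function: `restarted_dependencies` is a hypothetical name, and start times are passed in as `Time` objects so the logic can be exercised without real processes.

```ruby
require 'time'

# Returns the names of dependencies with at least one process that started
# after the target process. An empty result means no restart is needed.
def restarted_dependencies(target_start, dep_starts)
  dep_starts.select { |_name, times| times.any? { |t| t > target_start } }.keys
end

# Made-up start times for illustration.
jmxtrans_start = Time.parse('2015-07-08 10:00:00 UTC')
dep_starts = {
  'hadoop-hdfs-datanode' => [Time.parse('2015-07-08 09:30:00 UTC')],
  'hbase-regionserver'   => [Time.parse('2015-07-08 10:15:00 UTC')]
}

restarted = restarted_dependencies(jmxtrans_start, dep_starts)
puts "restart jmxtrans? #{!restarted.empty?} (#{restarted.join(', ')})"
```

One caveat when feeding real data into such a comparison: the `start_time` column of `ps` is coarse (hours and minutes for processes started the same day), so two restarts within the same minute compare as equal. A finer-grained column such as `lstart` may be worth considering.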

Page 18: Chef patterns

External Dependency

Page 19: Chef patterns

Rolling Restart

• Changes to configuration
• Availability
  – Toxic configuration
• Contention
  – Poll & wait
  – Fail the run
  – Simply skip service restart and go on
    • Store the state and need for restart
    • Breaks assumptions of procedural Chef runs

Page 20: Chef patterns

Rolling Restart

• ZooKeeper
  – Service-specific znode as lock

• Node attribute to flag restart failures

https://github.com/bloomberg/chef-bach/blob/rolling_restart/cookbooks/bcpc-hadoop/definitions/hadoop_service.rb
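
To make the locking idea concrete, here is a plain-Ruby sketch of the "skip and go on" option. It is not the linked definition: `FakeZk` is an in-memory stand-in (not a ZooKeeper client API), and the znode path and attribute names are hypothetical. The only property it relies on is that creating a znode fails when the path already exists.

```ruby
# In-memory stand-in for ZooKeeper: create fails if the path exists.
class FakeZk
  def initialize
    @znodes = {}
  end

  def create(path, data)
    return false if @znodes.key?(path)
    @znodes[path] = data
    true
  end

  def delete(path)
    !@znodes.delete(path).nil?
  end
end

# Restart a service only while holding its lock. If the lock is taken,
# skip the restart and flag it in the (stand-in) node attributes so a
# later run can pick it up.
def rolling_restart(zk, service, host, node_attrs)
  lock = "/restart-locks/#{service}" # hypothetical znode path
  unless zk.create(lock, host)
    node_attrs["#{service}_restart_pending"] = true
    return :skipped
  end
  begin
    yield
    node_attrs["#{service}_restart_pending"] = false
    :restarted
  ensure
    zk.delete(lock) # always release, even if the restart raised
  end
end

zk = FakeZk.new
attrs_a = {}
attrs_b = {}
result_b = nil
result_a = rolling_restart(zk, 'datanode', 'host-a', attrs_a) do
  # host-b attempts a restart while host-a still holds the lock
  result_b = rolling_restart(zk, 'datanode', 'host-b', attrs_b) { }
end
puts "host-a: #{result_a}, host-b: #{result_b}, pending on b: #{attrs_b['datanode_restart_pending']}"
```

With a real ZooKeeper ensemble an ephemeral znode would be the natural choice, so that a crashed Chef run does not leave the lock behind.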

Page 21: Chef patterns

Change Course

Page 22: Chef patterns

Logic Injection

• We use community cookbooks
  – Take care of standard install, enable and starting of services

• Need to add logic to cookbook recipes
  – Take action on a service only when conditions are satisfied
  – Take action on a service based on dependent service state

Page 23: Chef patterns

Logic Injection

kafka_install node.kafka.version_install_dir do
  from kafka_target_path
  not_if { kafka_installed? }
end

template ::File.join(node.kafka.config_dir, 'server.properties') do
  source 'server.properties.erb'
  ...
  helpers(Kafka::Configuration)
  if restart_on_configuration_change?
    notifies :restart, 'service[kafka]', :delayed
  end
end

service 'kafka' do
  provider kafka_init_opts[:provider]
  supports start: true, stop: true, restart: true, status: true
  action kafka_service_actions
end

Page 24: Chef patterns

Logic Injection

• Changes to standard cookbook
  – Create a new recipe to perform service action
    • Resource to intercept notifications to service resource
    • Original service resource
• Add node attribute which stores name of new recipe
• Update original recipe
  – Remove the service resource from the original recipe
  – Replace it with include_recipe new_attribute

Page 25: Chef patterns

Logic Injection

• New recipe to perform service actions
  – First step is the ruby_block to intercept notifications

ruby_block 'coordinate-kafka-start' do
  block do
    Chef::Log.debug 'Default recipe to coordinate Kafka start is used'
  end
  action :nothing
  notifies :restart, 'service[kafka]', :delayed
end

service 'kafka' do
  provider kafka_init_opts[:provider]
  supports start: true, stop: true, restart: true, status: true
  action kafka_service_actions
end

Page 26: Chef patterns

Logic Injection

• Attribute to set the recipe for service actions

#
# Attribute to set the recipe to be used to coordinate Kafka service start
# if nothing is set the default recipe "_coordinate" will be used
#
default.kafka.start_coordination.recipe = 'kafka::_coordinate'

Page 27: Chef patterns

Logic Injection

• Changes to the original recipe

kafka_install node.kafka.version_install_dir do
  from kafka_target_path
  not_if { kafka_installed? }
end

template ::File.join(node.kafka.config_dir, 'server.properties') do
  source 'server.properties.erb'
  ...
  helpers(Kafka::Configuration)
  if restart_on_configuration_change?
    notifies :create, 'ruby_block[coordinate-kafka-start]', :immediately
  end
end

include_recipe node.kafka.start_coordination.recipe

Page 28: Chef patterns

Logic Injection

• Changes in wrapper cookbook
  – Create custom recipe in wrapper cookbook
    • Notification interceptor ruby_block should be first
    • Logic to determine service restart action
    • service resource
    • Any clean-up logic
  – Overwrite attribute with custom recipe name

Page 29: Chef patterns

Logic Injection

ruby_block 'coordinate-kafka-start' do
  block do
    Chef::Log.info 'Custom recipe to coordinate Kafka start/restart'
  end
  ...

ruby_block 'restart-coordination' do
  block do
    Chef::Log.info 'Implement the process to coordinate the restart'
  end
  ...

service 'kafka' do
  provider kafka_init_opts[:provider]
  supports start: true, stop: true, restart: true, status: true
  ...

ruby_block 'restart-coordination-cleanup' do
  block do
    Chef::Log.info 'Implement any cleanup logic required'
  end

Page 30: Chef patterns

Logic Injection

• Overwrite attribute to set the custom recipe

#
# Overwrite the community cookbook attribute with custom recipe name
#
default[:kafka][:start_coordination][:recipe] = 'kafka-bcpc::coordinate'
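
The indirection used across the Logic Injection slides (the original recipe includes whichever recipe an attribute names) can be modelled in a few lines of plain Ruby. This is only a toy model under stated assumptions: recipes are represented as lambdas, and `RECIPES` and `converge` are hypothetical names, not Chef APIs. The two recipe names are the ones from the slides.

```ruby
# Recipes modelled as lambdas keyed by name; a stand-in for include_recipe.
RECIPES = {
  'kafka::_coordinate' =>
    ->(log) { log << 'default: restart kafka' },
  'kafka-bcpc::coordinate' =>
    ->(log) { log << 'custom: coordinate restart' << 'custom: restart kafka' }
}.freeze

# The "original recipe": renders config, then delegates the service action
# to whichever recipe the attribute names.
def converge(attrs)
  log = ['render server.properties']
  RECIPES.fetch(attrs.fetch('recipe')).call(log)
  log
end

cookbook_default = { 'recipe' => 'kafka::_coordinate' }
wrapper_override = cookbook_default.merge('recipe' => 'kafka-bcpc::coordinate')

default_run = converge(cookbook_default)
custom_run = converge(wrapper_override)
puts default_run.inspect
puts custom_run.inspect
```

The community cookbook keeps working unchanged for users who never set the attribute, while a wrapper cookbook can swap in its own restart logic without forking the original recipe.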

Page 31: Chef patterns

Questions?

Page 32: Chef patterns

References

• https://github.com/bloomberg/chef-bach
• http://blog.asquareb.com/blog/categories/chef-patterns/

Page 33: Chef patterns

Thank You!!

