Dec 14, 2014
Puppet at Opera
Puppet Camp Oslo 2013
devs sysadmin
DevSys?
FDD
Frustration Driven Development
# LVS main config file
#
# Last modified:
# 2012-12-10 Commented out all wlb servers, as they haven't been in use …
# 2012-XX-XX Tons of shifting around servers, upgrading and problems (Everyone)
# 2011-04-01 Removed all old b#-servers (N.....)
# 2010-03-24 Bye bye bigma. (M..../Cosimo)
# 2010-03-03 Restore pre Feb 26th config that seems to ensure stability (Cosimo)
#   When adding bigboy/bigcat, bad site lockups happen
# 2010-03-03 Reducing weight on b12 as it is less powerfull (M....)
# 2010-02-26 re-adding bigdog, and lowering bigunc, also vamping up b12 to 100%
# 2010-02-26 Bigdog is crashing, removing from lvs (M......)
# 2010-02-03 Enabled f8 and b7, first b7, then some hours later f8 … (N......)
# 2010-01-19 Bigant ready to rock and roll! (Cosimo)
# 2010-01-13 Removed bigpa, fatgirl from database pool (Cosimo)
# 2010-01-07 Added b8 to backend pool (Cosimo)
# 2010-01-05 Added bigant to the My Opera databases (Cosimo)
# 2009-11-22 Added bigdog to the My Opera databases (Cosimo)
# 2009-11-18 Added b7 and f8 as back-end servers (M.....)
# 2009-11-18 Removed p23-02 backend, moved to auth (Cosimo)
# 2009-11-12 Removing b7 and f8 from Mysql Load balancers (Cosimo)
# 2009-11-11 Added Lenny backend p23-02 (Cosimo)
# 2009-10-11 phased-in InnoDB-powered bigma in production (Cosimo)
# 2009-09-23 phased-in InnoDB-powered bigma in production (Cosimo)
# 2009-06-27 switched master from bigma to bigsis (w-mlb) \o/ (N.....)
# 2009-06-23 shifting load away from bigbro. it's dying? (Cosimo)
# 2009-03-18 pushing bigbro as much as we can, to test it out (Cosimo)
global_defs {
    lvs_id MY_LVS
    …
}
innodb_buffer_pool_size = 128M # was 64M # was 16M # was 32M
The Pilot – Goals
● New deployment procedure
● Sane configuration files
● Configuration management
CM Tools Evaluation (2009)
CFEngine 2
BCfg2
Puppet 0.25.4
LCFG
CM Tools Evaluation
CFEngine 2
BCfg2
Puppet 0.25.4 → 2.6.2 → 2.7.14
LCFG
The very beginning...
commit 9c54321f51bf969940b63b48d055743ac504035e
Author: Cosimo Streppone <[email protected]>
Date: Thu Jan 14 13:21:40 2010 +0000
Generic puppet recipes. To be continued.
Our approach
A “conservative” approach, surely
• Keep it simple. No concat/append/modify
• As few dependencies as possible
• Stability and reliability are critical
• No pulls from github or external URLs
• We don't use puppet for deployment
• Even realize() gets me into panic mode
Three Years In
• Modules repository, with 60+ mods
• Some custom facter plugins
• Shared projects conventions & structure
• Shared deployment procedures and libs
• Good server baseline configuration
• Our team, ~200 nodes
• Opera Mini Ops team, thousands of nodes
Datacenters
It's Modules all the way down...
Apache
base_packages
Cassandra
Django
Bash
RRDCached
Munin
Solr 4.0
RabbitMQ
Postfix
Varnish
Statsd
PowerDNS
Tomcat
Ssh
security_upgrades
Projects structure
Master config file /config/production.json
Role-specific files /config/role/<role>/
Puppet manifests /config/puppet/
Deployment scripts /deploy/
Master configuration file
{
  "master_rev" : "20130129",
  "application" : "geodns",
  "environment" : "production",
  "domain" : "localdomain",
  "contact" : "[email protected]",

  "puppet_vars" : { # Available in manifests
    "some-password" : "hola/amigos"
  },

  "systems" : { # List of all hostnames and their roles
    "node01" : {
      "puppet_class" : [ "geodns::backend" ]
    },
    "node02" : {
      "puppet_class" : [ "geodns::frontend" ],
      "puppet_vars" : { … },
    },
    …
  }
/etc/puppet →
puppet.conf (master configuration file)
fileserver.conf
files → {auth, geodns, opcdn} (local project files)
modules → (shared generic modules)
{ntp, apache, varnish, nginx, ...}
manifests → (generic and project specific manifests)
classes/
{basenode, backend, frontend}.pp
classes/ <project> /
<anything goes, project-specific>
Puppet master layout
/etc/puppet/manifests/site.pp
$server = "puppetmaster.opera.com"

import "os/*.pp"
import "classes/*.pp"    # generic classes
import "classes/*/*.pp"  # project classes

node default {
  include basenode
}

filebucket { "main": server => $server }

File {
  ignore => ['.svn', '.git', 'CVS' ],
  backup => "main",
}
Puppet master - site.pp
/etc/puppet/puppet.conf
external_nodes = /etc/puppet/bin/puppet-node-classifier
node_terminus = exec
/etc/puppet/manifests/nodes/geodns-production.json
{ "application" : "geodns",
"environment" : "production",
"domain" : "localdomain",
"systems" : {
"node01" : {
"puppet_class" : [ "geodns::backend" ],
}, …
}
}
Puppet master – no nodes.pp
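The real /etc/puppet/bin/puppet-node-classifier is not shown in the slides, so the following is only a rough sketch of how such an external node classifier could turn a per-project JSON file into the YAML (`classes` and `parameters`) that Puppet expects from an `exec` node terminus. The `classify` helper, the inline sample data, matching on the short hostname, and letting per-node `puppet_vars` override project-wide ones are all assumptions made for illustration.

```ruby
require 'json'
require 'yaml'

# Hypothetical, strictly valid JSON version of a nodes file. The comments
# and trailing commas shown on the slides would not parse as JSON.
NODES_JSON = <<~JSON
  {
    "application": "geodns",
    "environment": "production",
    "domain": "localdomain",
    "puppet_vars": { "some-password": "hola/amigos" },
    "systems": {
      "node01": { "puppet_class": ["geodns::backend"] },
      "node02": {
        "puppet_class": ["geodns::frontend"],
        "puppet_vars": { "some-password": "node02-only" }
      }
    }
  }
JSON

# Build the ENC answer for one node, or return nil if the node is unknown.
# Assumption: nodes are keyed by short hostname, not by FQDN.
def classify(config, fqdn)
  hostname = fqdn.split('.').first
  node = config.fetch('systems', {})[hostname]
  return nil if node.nil?

  # Assumption: per-node puppet_vars override the project-wide ones.
  vars = config.fetch('puppet_vars', {}).merge(node.fetch('puppet_vars', {}))
  project = {
    'application' => config['application'],
    'environment' => config['environment'],
    'domain'      => config['domain']
  }

  {
    'classes'    => node.fetch('puppet_class', []),
    'parameters' => project.merge(vars)
  }
end

config = JSON.parse(NODES_JSON)
puts classify(config, 'node01.localdomain').to_yaml
```

In a real classifier the node name would arrive as the first command-line argument, the YAML would go to standard output, and an unknown node would be signalled with a non-zero exit status.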
$ facter --puppet
architecture => amd64
datacenter => nerv
domain => opera.com
facterversion => 1.5.7
fqdn => node01.int.opera.com
hardwareisa => unknown
hardwaremodel => x86_64
hostname => node01
id => root
interfaces => eth0,eth1
ipaddress => 1.2.3.4
ipaddress_eth0 => 1.2.3.4
…
Facter
facter/datacenter.rb
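Facter.add("datacenter") do
  setcode do
    datacenter = "unknown"
    # Get current ip address from Facter's own db
    ipaddr = Facter.value(:ipaddress)
    # Regexp literal, so that the dots are matched literally
    if ipaddr.match(/^1\.2\.3\./)
      datacenter = "dc1"
    elsif ipaddr.match(...)
      …
    end
    # Return the detected value ("unknown" when nothing matched)
    datacenter
  end
end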
Facter – custom plugins
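The IP-to-datacenter lookup inside such a plugin can be tried out without Facter installed. The sketch below is one possible way to write it, using subnets from Ruby's standard `ipaddr` library instead of regular expressions. The `DATACENTER_SUBNETS` table and the `datacenter_for` helper are hypothetical; only the "dc1" name and the 1.2.3.x prefix come from the slide.

```ruby
require 'ipaddr'

# Hypothetical subnet table. Only "dc1" and the 1.2.3.x prefix appear in
# the talk; the "dc2" range is made up for the example.
DATACENTER_SUBNETS = {
  'dc1' => IPAddr.new('1.2.3.0/24'),
  'dc2' => IPAddr.new('10.20.0.0/16')
}.freeze

# Map an IP address string to a datacenter name, falling back to "unknown"
# for missing, malformed or unlisted addresses.
def datacenter_for(ip)
  return 'unknown' if ip.nil? || ip.empty?

  address = IPAddr.new(ip)
  match = DATACENTER_SUBNETS.find { |_name, subnet| subnet.include?(address) }
  match ? match.first : 'unknown'
rescue IPAddr::InvalidAddressError
  'unknown'
end

puts datacenter_for('1.2.3.4')
puts datacenter_for('192.168.1.1')
```

Inside a plugin, the `setcode` block would then reduce to `datacenter_for(Facter.value(:ipaddress))`, which keeps the lookup table in one place and the fact definition trivial.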
case $datacenter {
  "dc1" : { include opera::datacenters::dc1 }
  "dc2" : { include opera::datacenters::dc2 }
  "dc3" : { include opera::datacenters::dc3 }
  …
  default: { include opera::datacenters::base }
}
Facter – custom plugins
class basenode {

  include opera

  # Opera-specific data-center based settings
  case $datacenter {
    "dc1" : { include opera::datacenters::dc1 }
    …
    default: { include opera::datacenters::base }
  }

  include apt-opera
  include base_packages
  include locales
  include logcheck
  include munin
  include nagios
  include cron
  include perl
  include python
  include puppet
  include ntp
  include timezone
  …
}
Basenode class
autosign
+ some preinstalled packages
+ internal apt repository
+ a bit of shell scripting
Bootstrap script
Real world examples – 1 Project
class geodns::backend {

  include opera::admins::devops
  include security-upgrades
  include powerdns
  include geoip::city
  include memcache

  package { [ 'libjson-xs-perl', … ]:
    ensure => 'present'
  }

  bash::prompt { '/root/.bashrc':
    description => 'geodns',
    color       => 'red',
  }

  munin::plugin::custom { 'geodns_': }
  munin::plugin { [ 'geodns_country', 'geodns_errors', … ]:
    plugin_name => 'geodns_',
  }
}
Real world examples – 2 Varnish
varnish::config { "project-varnish-config":

  vcl_conf       => "tvstore.vcl",
  storage_type   => "malloc",
  storage_size   => "512M",
  listen_port    => 8100,
  sess_workspace => 131072,
  ttl            => 60,
  thread_pools   => 2,
  thread_min     => 400,
  thread_max     => 3000,

  # Needed for GeoIP support in varnish:
  # http://stackoverflow.com/questions/5906603/
  cc_command => "exec cc -fpic -shared -Wl,-x \
    -L/usr/include/GeoIP.h -lGeoIP -o %o %s"

}
Real world examples – 3 Munin
include munin::server
file { '/etc/munin/munin-conf.d/project-settings.conf': … }
Real world examples – 4 Solr
include solr4
solr4::core { 'core1':
  config     => '.../core1/solrconfig.xml',
  properties => '.../core1/solrcore.properties',
  schema     => '.../core1/schema.xml',
}

solr4::config { 'solr-search-config':
  cores => ['core1', … ],
}
Pain points AKA wish-list
Speed!
~60 s runtime for ~600 resources
TOO SLOW!
notice: /Stage[main]/Django/Package[Django]/ensure: ensure changed '1.4.3' to '1.4.2'
notice: /Stage[main]/Package[cython]/ensure: created
notice: /Stage[main]/Java::Sun_java6/Exec[debconf-set-selections-sun-java6-bin]/returns: executed successfully
notice: /Stage[main]/Java::Sun_java6/Exec[debconf-set-selections-sun-java6-jre]/returns: executed successfully
Resources that don't go away
Shared resources
cron::logcleanup { … }
• Used by both Apache and Nginx modules
• Getting conflicts if you pull both
Shared environment
Many projects run under the same master.
A syntax error anywhere blocks everyone.
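One common way to soften this is to syntax-check every manifest before it reaches the shared master. The sketch below assumes a Puppet version that ships the `puppet parser validate` subcommand (2.7 and later). It walks a manifests tree and runs the validator on each `.pp` file. The helper names are hypothetical, and the demo uses a dry-run runner that only prints the commands, so it works on a machine without Puppet.

```ruby
require 'find'
require 'fileutils'
require 'tmpdir'

# Collect every manifest (*.pp) below root, skipping version-control
# directories, sorted so that output is stable.
def manifest_files(root)
  files = []
  Find.find(root) do |path|
    if File.directory?(path) && ['.git', '.svn', 'CVS'].include?(File.basename(path))
      Find.prune
    end
    files << path if File.file?(path) && File.extname(path) == '.pp'
  end
  files.sort
end

# Command line for checking a single manifest.
def validate_command(path)
  ['puppet', 'parser', 'validate', path]
end

# Run the validator on every manifest and return the paths that failed.
# The runner is injectable so the walk can be exercised without Puppet.
def validate_all(root, runner = ->(cmd) { system(*cmd) })
  manifest_files(root).reject { |path| runner.call(validate_command(path)) }
end

# Dry run on a throwaway tree: print each command instead of executing it.
Dir.mktmpdir do |root|
  FileUtils.mkdir_p(File.join(root, 'classes', 'geodns'))
  File.write(File.join(root, 'site.pp'), "node default { include basenode }\n")
  File.write(File.join(root, 'classes', 'geodns', 'backend.pp'), "class geodns::backend { }\n")
  dry_run = lambda do |cmd|
    puts cmd.join(' ')
    true
  end
  validate_all(root, dry_run)
end
```

Hooked into a pre-commit or pre-receive check, a non-empty result from `validate_all` would reject the change before it can block the other projects on the master. It only catches syntax errors, not logic errors in the manifests.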
Testing
Would be awesome to be able to test our modules and manifests.
Locally.
Without a puppetmaster.
Future directions
Things we'd like to look into...
• PuppetDB
• Better systems inventory
• Better Nagios integration
• Testing manifests and modules
Q & A
https://github.com/cosimo/
http://www.streppone.it/cosimo/blog/