How TubeMogul reached 10,000 Puppet deployments in one year
May 26th, 2015
Nicolas Brousse | Sr. Director Of Operations Engineering | [email protected]
Julien Fabre | Site Reliability Engineer | [email protected]
Jul 27, 2015
Who are we?
TubeMogul
● Enterprise software company for digital branding
● Over 27 billion ads served in 2014
● Over 30 billion ad auctions per day
● Bids processed in less than 50 ms
● Bids served in less than 80 ms (including network round trip)
● 5 PB of monthly video traffic served
● 1.3 EB of data stored
Who are we?
Operations Engineering
● Ensure the smooth day-to-day operation of the platform infrastructure
● Provide a cost-effective and cutting-edge infrastructure
● Team composed of SREs, SEs and DBAs
● Managing over 2,500 servers (virtual and physical)
Our Infrastructure
[Diagram: Public Cloud | On Premises]
Multiple locations with a mix of public cloud and on-premises infrastructure
Technology Hoarders
● Java (a lot!)
● MySQL
● Couchbase
● Vertica
● Kafka
● Storm
● Zookeeper, Exhibitor
● Hadoop, HBase, Hive
● Terracotta
● ElasticSearch, Kibana
● LogStash
● PHP, Python, Ruby, Go...
● Apache httpd
● Nagios
● Ganglia
● Graphite
● Memcached
● Puppet
● HAproxy
● OpenStack
● Git and Gerrit
● Gor
● ActiveMQ
● OpenLDAP
● Redis
● Blackbox
● Jenkins, Sonar
● Tomcat
● Jetty (embedded)
● AWS DynamoDB, EC2, S3...
Five Years Of Puppet!
● 2008-2010: Used SVN, Bash scripts and custom templates.
● 2010: Managed about 250 instances. Started looking at Puppet.
● 2011: Puppet 0.25, then 2.7 by the end of the year, on 400 servers with 2 contributors.
● 2012: 800 servers managed by Puppet. 4 contributors.
● 2013: 1,000 servers managed by Puppet. 6 contributors.
● 2014: 1,500 servers managed by Puppet. Introduced a Continuous Delivery workflow. 9 contributors. Started the Puppet 3.7 migration.
● 2015: 2,000 servers managed by Puppet. 13 contributors.
Puppet Stats
● 2,000 nodes
● 225 unique node definitions
● 1 puppetmaster
● 112 Puppet modules
Where and how do we use Puppet?
● Virtual and physical server configuration: master mode
● Building AWS AMIs with Packer: master mode
● Local development environment with Vagrant: master mode
● OpenStack deployment: masterless mode
Code Review?
A Powerful Gerrit Integration
● Gerrit, an industry standard: Eclipse, Google, Chromium, OpenStack, WikiMedia, LibreOffice, Spotify, GlusterFS, etc.
● Fine-grained permission rules
● Plugged into LDAP
● Code review per commit
● Stream events
● Uses GitBlit
● Integrated with Jenkins and Jira
● Manages about 600 Git repositories
Gerrit in Action
Verify -1 when the commit has no ticket number or does not pass Jenkins code validation
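The verify rule on this screen boils down to two checks. The Ruby sketch below models it, assuming a Jira-style ticket key such as `OPS-1234` in the commit message and a boolean for the Jenkins result; the regex, the `verified_vote` helper and the key format are hypothetical, not TubeMogul's actual hook.

```ruby
# Hypothetical Jira-style key: uppercase project prefix, dash, number (e.g. OPS-1234).
TICKET_PATTERN = /\b[A-Z][A-Z0-9]+-\d+\b/

# Returns the Gerrit "Verified" vote for a change: +1 only when the commit
# message references a ticket AND the Jenkins validation job passed.
def verified_vote(commit_message, jenkins_passed)
  return -1 unless commit_message.match?(TICKET_PATTERN)
  return -1 unless jenkins_passed
  1
end

puts verified_vote("OPS-1234 Bump logstash to 1.4.2", true) # prints 1
puts verified_vote("Bump logstash to 1.4.2", true)          # prints -1
```

Keeping both conditions in one place makes the -1 reason easy to report back on the change.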
Continuous Delivery with Jenkins
● 1 job per module
● 1 job for the manifests and Hiera data
● 1 job for the Puppet fileserver
● 1 job to deploy
Global Jenkins stats for the past year
● ~10,000 Puppet deployments
● Over 8,500 production app deployments
Jenkins Job DSL: code your Jenkins jobs
Plugin: github.com/jenkinsci/job-dsl-plugin
● Automate job creation
● Enforce a standard across all jobs
● Version the configuration
● Apply changes to all your jobs without pain
● Test your configuration changes
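Job DSL scripts themselves are written in Groovy; the Ruby sketch below only illustrates the idea behind these bullets: every job is generated from one template, so a change to the template reaches all jobs at once. The job set mirrors the Jenkins layout above (one job per module, plus manifests/Hiera, fileserver and deploy jobs); the naming scheme, repository paths, step names and the `job_name`/`job_definitions` helpers are all hypothetical.

```ruby
# Hypothetical naming scheme: every job name comes from this one template,
# so changing it here renames every generated job consistently.
def job_name(kind, target = nil)
  ["puppet", kind, target].compact.join("-")
end

# Builds the full job list: one validation job per module, plus the shared
# manifests/Hiera, fileserver and deploy jobs. Repos and steps are illustrative.
def job_definitions(modules)
  module_jobs = modules.sort.map do |mod|
    { name: job_name("module", mod), repo: "puppet/modules/#{mod}", steps: %w[syntax lint] }
  end
  shared = [
    { name: job_name("manifests-hiera"), repo: "puppet/manifests", steps: %w[syntax lint] },
    { name: job_name("fileserver"), repo: "puppet/fileserver", steps: %w[syntax] },
    { name: job_name("deploy"), repo: "puppet/manifests", steps: %w[deploy] }
  ]
  module_jobs + shared
end

job_definitions(%w[zookeeper activemq]).each { |job| puts job[:name] }
```

Adding a module to the list is then the only change needed to get a new, identically configured job.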
Team Awareness: HipChat Integration with Hubot
Puppet Continuous Delivery Workflow: The Vision
Infrastructure As Code
● Follow the standard development lifecycle
● Repeatable and consistent server provisioning
Continuous Delivery
● Iterate quickly
● Automated code review to improve code quality
Reliability
● Improve production stability
● Enforce better security practices
The Workflow
The Workflow: Puppet code logic
Puppet environments
● Dedicated node manifests (*.pp)
● Modules deployed per branch with Git submodules
All the data in Hiera
● Try to avoid params.pp classes
● Store everything: module parameters, classes, keys, passwords, ...
Puppet Code Hierarchy
/etc/puppet
├── puppet.conf, hiera.yaml, *.conf
├── hiera
└── environments
    ├── dev
    │   ├── manifests
    │   │   ├── nodes/*.pp
    │   │   └── site.pp
    │   └── modules
    │       ├── activemq
    │       ...
    │       └── zookeeper
    └── production
        ├── manifests
        │   ├── nodes/*.pp
        │   └── site.pp
        └── modules
            ├── activemq
            ...
            └── zookeeper
Git submodules, branch dev
Git submodules, branch production
Hiera Configuration
$ cat /etc/puppet/hiera.yaml
---
:backends:
  - eyaml
  - yaml
:yaml:
  :datadir: /etc/puppet/hiera
:eyaml:
  :pkcs7_private_key: /var/lib/puppet/hiera_keys/private_key.pkcs7.pem
  :pkcs7_public_key: /var/lib/puppet/hiera_keys/public_key.pkcs7.pem
:hierarchy:
  - fqdn/%{::fqdn}
  - "%{::zone}/%{::vpc}/%{::hostgroup}"
  - "%{::zone}/%{::vpc}/all"
  - "%{::zone}/%{::hostgroup}"
  - "%{::zone}/all"
  - hostname/%{::hostname}
  - hostgroup/%{::hostgroup}
  - environment/%{::environment}
  - common
:merge_behavior: deeper
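To make the lookup order concrete, here is a minimal Ruby model of what this hierarchy expresses: each level is interpolated with the node's facts, a plain lookup returns the value from the most specific level that defines the key, and a hash lookup folds all levels together with higher levels winning. This is not Hiera's code; the `interpolate`, `lookup`, `deep_merge` and `lookup_hash` helpers, the fact values and the in-memory data files are hypothetical, and the merge only approximates `deeper` (arrays are replaced here rather than merged).

```ruby
# Levels copied from the hiera.yaml above, most specific first.
HIERARCHY = [
  "fqdn/%{::fqdn}",
  "%{::zone}/%{::vpc}/%{::hostgroup}",
  "%{::zone}/%{::vpc}/all",
  "%{::zone}/%{::hostgroup}",
  "%{::zone}/all",
  "hostname/%{::hostname}",
  "hostgroup/%{::hostgroup}",
  "environment/%{::environment}",
  "common"
].freeze

# Replaces each %{::fact} with the node's fact value; unknown facts become "",
# so those levels simply match no data file.
def interpolate(level, facts)
  level.gsub(/%\{::(\w+)\}/) { facts.fetch(Regexp.last_match(1), "").to_s }
end

# Priority lookup: the first (most specific) data source defining the key wins.
def lookup(key, facts, data)
  HIERARCHY.each do |level|
    source = data[interpolate(level, facts)]
    return source[key] if source&.key?(key)
  end
  nil
end

# Recursive hash merge; on conflicts the higher-priority value wins.
# Arrays are replaced rather than merged (a simplification).
def deep_merge(high, low)
  low.merge(high) do |_key, low_val, high_val|
    low_val.is_a?(Hash) && high_val.is_a?(Hash) ? deep_merge(high_val, low_val) : high_val
  end
end

# Hash lookup: fold from "common" upwards so more specific levels override.
def lookup_hash(key, facts, data)
  HIERARCHY.reverse.reduce({}) do |merged, level|
    value = data.dig(interpolate(level, facts), key)
    value.is_a?(Hash) ? deep_merge(value, merged) : merged
  end
end

# Hypothetical node facts and data files (path => parsed YAML).
facts = { "fqdn" => "bidder01.example.com", "zone" => "us-east-1", "vpc" => "vpc1",
          "hostgroup" => "bidder", "hostname" => "bidder01", "environment" => "production" }
data = {
  "us-east-1/vpc1/bidder" => { "java::heap" => "8g", "ntp" => { "servers" => ["10.0.0.1"] } },
  "common" => { "java::heap" => "1g", "ntp" => { "servers" => ["pool.ntp.org"], "iburst" => true } }
}

puts lookup("java::heap", facts, data)          # prints 8g
puts lookup_hash("ntp", facts, data)["iburst"]  # prints true
```

The zone/VPC/hostgroup levels sit above `common`, so per-cluster values override global defaults without repeating them.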
Encrypt Your Secrets
Hiera eyaml: github.com/TomPoulton/hiera-eyaml
● Hiera backend
● Easy to use
● Powerful CLI: eyaml edit /etc/puppet/hiera/secrets.yaml
$ cat secret.yaml
---
ec2::access_key_id: ENC[PKCS7,MIIBiQYJKoZIhvcNAQcDoIIBejCCAXYCAQAxggEhMIIIBHQIBADAFMAACAQEwDQYJKoZIhvcNAQEBBQAEggEAVIa28OwyaqI5N1TDCvVkBZz3YG+s+Hfzr0lqgcvRCIuJGpq28sQmmuBaQjWY38i86ZSFu0gM6saOHfG64OzVlurO7k/l0CKeL0JfXNaVM4TUqMaN9dSkL5e2vsmpLKrMASawmarqbLYwllTrTe32H4NWxU1e+qWLeUMr9ciBnA3W1Azm4RIo+3bsvgvMfdks....=]
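The ENC[PKCS7,...] value above is a PKCS#7 envelope, Base64-encoded and wrapped in a marker. The Ruby sketch below reproduces that round trip with the standard `openssl` library, using a throwaway self-signed key pair generated in memory instead of the .pem files referenced in hiera.yaml. The AES-256-CBC cipher, the `encrypt_value`/`decrypt_value` helpers and the sample secret are assumptions; this is not hiera-eyaml's own code.

```ruby
require "openssl"

# Throwaway key pair and self-signed certificate: hypothetical stand-ins for
# the private_key.pkcs7.pem / public_key.pkcs7.pem files in hiera.yaml.
key = OpenSSL::PKey::RSA.new(2048)
cert = OpenSSL::X509::Certificate.new
cert.version = 2
cert.serial = 1
cert.subject = cert.issuer = OpenSSL::X509::Name.parse("/CN=eyaml-sketch")
cert.public_key = key.public_key
cert.not_before = Time.now
cert.not_after = Time.now + 3600
cert.sign(key, OpenSSL::Digest.new("SHA256"))

# Encrypts for the certificate holder and wraps the Base64 DER like an eyaml value.
def encrypt_value(plaintext, cert)
  cipher = OpenSSL::Cipher.new("aes-256-cbc")
  der = OpenSSL::PKCS7.encrypt([cert], plaintext, cipher, OpenSSL::PKCS7::BINARY).to_der
  "ENC[PKCS7,#{[der].pack('m0')}]"
end

# Strips the ENC[PKCS7,...] wrapper, decodes the Base64 and decrypts with the private key.
def decrypt_value(value, key, cert)
  b64 = value[/\AENC\[PKCS7,(.+)\]\z/m, 1] or raise ArgumentError, "not an ENC[PKCS7,...] value"
  OpenSSL::PKCS7.new(b64.unpack1("m")).decrypt(key, cert)
end

secret = encrypt_value("AKIAEXAMPLE", cert)
puts secret[0, 30] + "..."
puts decrypt_value(secret, key, cert) # prints AKIAEXAMPLE
```

Only the public key is needed to add or change a secret; decryption requires the private key, which is why it can stay on the Puppet master.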
Encrypt Files
Blackbox: github.com/StackExchange/blackbox
● Uses GPG to encrypt secret files
● Easy to add or remove team members
● No need to change your Puppet code!
# modules/${module_name}/files/credentials.yaml.gpg
file { '/etc/app/credentials.yaml':
  ensure => 'file',
  owner  => 'root',
  group  => 'root',
  mode   => '0644',
  # Double quotes so that ${module_name} is interpolated
  source => "puppet:///modules/${module_name}/credentials.yaml",
}
The Workflow
The Workflow: bottlenecks
● Only Ops team members (SREs, SEs) can commit
● Review and validation are done only by an SRE
● Jenkins verifies the code but does not validate the commit
● Static Puppet environments
● Heavy reliance on server hostnames
Puppet Workflow Reloaded!
Flexibility: R10K (github.com/adrienthebo/r10k)
● Dynamic environments
● No more Git submodules! :-)
● Easy to reproduce any environment
● Can use private and Forge Puppet modules
● Can use branches and tags
● Based on a Puppetfile
Puppet Workflow Reloaded!
R10K
$ cat Puppetfile
forge "https://forgeapi.puppetlabs.com"

# Forge modules
mod 'pdxcat/collectd'
mod 'puppetlabs/rabbitmq'
mod 'arioch/redis'
mod 'maestrodev/wget'
mod 'puppetlabs/apt'
mod 'puppetlabs/stdlib'

# TubeMogul modules
mod "hosts",
  :git    => 'ssh://<gerrit_host>/puppet/modules/hosts',
  :branch => 'dev'
mod "timezone",
  :git    => 'ssh://<gerrit_host>/puppet/modules/timezone',
  :branch => 'dev'
...
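A Puppetfile is Ruby code evaluated against a small DSL. As a rough model of how the file above resolves into a module list, the sketch below implements hypothetical `forge` and `mod` methods on a `PuppetfileModel` class and evaluates a trimmed copy of the file with `instance_eval`. r10k's real loader does much more (fetching, version pins, deploying environments), so treat this as an illustration, not r10k's implementation.

```ruby
# Minimal model of the Puppetfile DSL: records the forge URL and each module.
class PuppetfileModel
  attr_reader :forge_url, :modules

  def initialize
    @forge_url = nil
    @modules = []
  end

  def forge(url)
    @forge_url = url
  end

  # Forge modules are "author/name"; Git modules pass :git and optionally :branch or :tag.
  def mod(name, options = {})
    source = options[:git] ? :git : :forge
    ref = options[:branch] || options[:tag]
    @modules << { name: name.split("/").last, source: source, ref: ref }
  end

  def self.parse(text)
    model = new
    model.instance_eval(text)
    model
  end
end

puppetfile = <<~'PUPPETFILE'
  forge "https://forgeapi.puppetlabs.com"

  # Forge modules
  mod 'puppetlabs/stdlib'
  mod 'puppetlabs/apt'

  # TubeMogul modules
  mod "hosts", :git => 'ssh://<gerrit_host>/puppet/modules/hosts', :branch => 'dev'
PUPPETFILE

model = PuppetfileModel.parse(puppetfile)
model.modules.each { |m| puts "#{m[:name]} (#{m[:source]}, #{m[:ref] || 'unpinned'})" }
```

Because a module's Git branch lives in the Puppetfile, switching a module to another branch for one environment is a one-line change rather than a submodule update.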
Roles/Profiles Pattern
Better code organization: Roles and Profiles
● Represent the business logic: Roles
  o Highest abstraction layer
  o Use Profiles for implementation
● Implement the applications: Profiles
  o Remove potential code duplication
  o Use modules and other Puppet resources
class role::logs {
  include profile::base
  include profile::logstash::server
  include profile::elasticsearch
}
class profile::logstash::server {
  # Defaults apply when Hiera has no value for the key
  $version    = hiera('profile::logstash::server::version', '1.4.2')
  $es_host    = hiera('profile::logstash::server::es_host', 'es01')
  $redis_host = hiera('profile::logstash::server::redis_host', 'redis01')

  class { 'logstash':
    package_url  => "https://download.elasticsearch.org/logstash/.../logstash_${version}.deb",
    java_install => true,
  }

  logstash::configfile { 'input_redis':
    content => template('logstash/configfile/logstash.input_redis.conf.erb'),
    order   => 10,
  }

  logstash::configfile { 'output_es':
    content => template('logstash/configfile/logstash.output_es.conf.erb'),
    order   => 30,
  }
}
Puppet Workflow Reloaded!
Do not rely on hostnames: nodeless approach
● Facts guide Puppet
● No more node myawesomeserver { }
● Enforce a cluster vision
● site.pp holds the configuration logic
# /etc/puppet/manifests/site.pp
node default {
  # The EC2 tag tm_role, exposed as a fact, selects the role class
  if $::ec2_tag_tm_role {
    notify { "Using role: ${::ec2_tag_tm_role}": }
    include "role::${::ec2_tag_tm_role}"
  } else {
    fail('No role found. Nothing to configure.')
  }
}
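The decision `node default` makes can be modeled in a few lines: read the `ec2_tag_tm_role` fact, turn it into a `role::` class name, or fail when the tag is missing. In this Ruby sketch the `role_class_for` helper and `NoRoleError` are hypothetical, and treating a blank tag as missing is an assumption; in Puppet, the `include` itself is what loads the class.

```ruby
class NoRoleError < StandardError; end

# Mirrors site.pp: the ec2_tag_tm_role fact selects a role class.
# A missing or blank tag is treated as "no role" (assumption).
def role_class_for(facts)
  role = facts["ec2_tag_tm_role"].to_s.strip
  raise NoRoleError, "No role found. Nothing to configure." if role.empty?
  "role::#{role}"
end

puts role_class_for({ "ec2_tag_tm_role" => "rtb::bidder" }) # prints role::rtb::bidder

begin
  role_class_for({})
rescue NoRoleError => e
  puts "fail: #{e.message}"
end
```

Failing loudly when the tag is absent keeps an untagged instance from silently receiving a partial configuration.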
AWS EC2 tags
● Specify tags during provisioning
● Retrieve tags with the AWS Ruby SDK and create facts
● New hierarchy
$ facter -p | grep ec2_tag
ec2_tag_cluster => rtb-bidder
ec2_tag_nagios_host => mgmt01
ec2_tag_name => bidder
ec2_tag_pupenv => production
ec2_tag_tm_role => rtb::bidder
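Here is a sketch of the tag-to-fact step, limited to the naming logic so it runs without AWS credentials: each tag key is normalized and prefixed with `ec2_tag_`, matching the fact names above. The tag keys below are hypothetical and the normalization rule (lowercase, runs of non-alphanumerics become underscores) is an assumption. In a real custom fact, the tags would come from the AWS Ruby SDK (for example `Aws::EC2::Client#describe_tags` filtered on the instance ID), and each pair would be registered with `Facter.add`; those calls are omitted here.

```ruby
# Turns EC2 tag key/value pairs into fact name/value pairs.
# Assumed normalization: lowercase, runs of non-alphanumerics become "_".
def ec2_tag_facts(tags)
  tags.each_with_object({}) do |(key, value), facts|
    name = key.downcase.gsub(/[^a-z0-9]+/, "_").gsub(/\A_+|_+\z/, "")
    facts["ec2_tag_#{name}"] = value unless name.empty?
  end
end

# Hypothetical tags set at provisioning time.
tags = { "Name" => "bidder", "cluster" => "rtb-bidder", "nagios_host" => "mgmt01",
         "pupenv" => "production", "tm_role" => "rtb::bidder" }

ec2_tag_facts(tags).sort.each { |name, value| puts "#{name} => #{value}" }
```

With these tags, the sorted output matches the five facts listed by `facter -p` above.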
:hierarchy:
  - "%{::zone}/%{::ec2_tag_vpc}/%{::ec2_tag_cluster}"
  - "%{::zone}/%{::ec2_tag_vpc}/all"
  - "%{::zone}/all"
  - vpc/%{::ec2_tag_vpc}/%{::ec2_tag_cluster}
  - vpc/%{::ec2_tag_vpc}/all
  - environment/%{::environment}
  - common
New merging and reviewing rules
● Everyone can commit Puppet code
● Everyone can review a Puppet change (+1)
● SEs and SREs can validate a Puppet change (+2)
● Automatic validation and merge in dev if at least 80% of tests pass (+2)
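These review rules can be modeled as a small policy. In the Ruby sketch below, the group names, the `allowed_vote?`/`auto_merge?` helpers and the reading of the 80% threshold as the share of passing test jobs are all assumptions. Gerrit itself expresses vote ranges through access permissions on review labels, not through code like this.

```ruby
# Highest Code-Review vote (absolute value) each hypothetical group may cast.
MAX_VOTE = { "everyone" => 1, "se" => 2, "sre" => 2 }.freeze
AUTO_MERGE_THRESHOLD = 0.8 # assumed: share of test jobs that must pass

def allowed_vote?(group, vote)
  vote.abs <= MAX_VOTE.fetch(group, 0)
end

# Dev branch only: auto-validate when enough test jobs pass.
def auto_merge?(branch, test_results)
  return false unless branch == "dev" && !test_results.empty?
  passed = test_results.count(true)
  passed.to_f / test_results.size >= AUTO_MERGE_THRESHOLD
end

puts allowed_vote?("everyone", 2)                        # prints false
puts auto_merge?("dev", [true, true, true, true, false]) # prints true (4/5 = 80%)
puts auto_merge?("production", [true] * 5)               # prints false
```

Restricting auto-merge to the dev branch keeps a human +2 in front of every production change.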
Next improvements
● Acceptance testing with Beaker and Docker
● Full test provisioning with ServerSpec
● PuppetDB to improve the reporting
● Dedicated Puppet Masters
Nicolas Brousse (@orieg)
Julien Fabre (@julien_fabre)