PETER LESCHEV • TEAM LEAD • ATLASSIAN • @PETERLESCHEV Build Engineering @ Atlassian: Scaling to 150k builds per month & beyond PuppetConf 2015
Jan 22, 2018
Introduction • Team • Infrastructure • Bamboo Servers • Conclusion
Introduction
Build platform & services used internally within
Atlassian to build, test & deliver
software
Developers expect a reliable infrastructure
& fast CI feedback
Build Engineering today @ Atlassian
• 12 Bamboo Servers
• maven.atlassian.com / 9 Nexus instances / 9 TB
• 7 Nexus proxies for internal traffic
• Monitoring
  • opsview, graphite, statsd, newrelic, datadog
• 1200 build agents on EC2
  • include SCM clients, JDKs, JVM build tools, databases, headless browser testing, Python builds, NodeJS, installers & more
• Maintain 20 AMIs of various build configurations
4 years ago: 21k builds per month
Last month: 186k builds per month
Build Engineering @ Atlassian
JIRA alone has 49k automated tests
3 stories of gaining maturity to handle Atlassian growth
Team
History of team roles
Individual Engineers
Information silos
Fault investigation, requests for advice, unplanned work
Little project work
Very interrupt driven
Duplication of effort
Limited to customer driven changes
Disturbed role
Knowledge Transfer
when switching between project / disturbed roles is difficult
More project work
Non-disturbed can focus on larger tasks
Context switching
Reduction in duplication of effort, promotes collaboration within the team
2 week rotation
Team expands
Infra Engineers
Developers
Build Engineers
Disturbed for Dev & Infra
Too interrupt driven
To encourage knowledge transfer between infra & dev
Staggered changeovers
Minimising disruption due to context switching
Disturbed pairing
Couldn’t handle smaller customer raised requests & interrupt driven work
Supporting Developers
team channel
1. Measure the pain
2. Continuous Improvement
Technical Debt
Contact Rate
Contact Rate = (Customer JIRA issues + Confluence Questions + Hipchat queries) ÷ Number of Developers
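The contact rate (customer JIRA issues, Confluence Questions and Hipchat queries, divided by the number of developers) is simple arithmetic. A minimal sketch follows. The function name and the monthly figures are made up for illustration and are not numbers from the talk.

```python
def contact_rate(jira_issues: int, confluence_questions: int,
                 hipchat_queries: int, developers: int) -> float:
    """Support contacts per developer over some period (e.g. a month)."""
    if developers <= 0:
        raise ValueError("developers must be positive")
    return (jira_issues + confluence_questions + hipchat_queries) / developers


# Hypothetical month: 120 + 45 + 335 = 500 contacts across 1000 developers.
rate = contact_rate(jira_issues=120, confluence_questions=45,
                    hipchat_queries=335, developers=1000)
print(f"{rate:.2f} contacts per developer")
```

Tracking the value per period, rather than the raw number of contacts, keeps the measure comparable as the number of developers grows.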
The Shield
http://www.clker.com/cliparts/e/d/c/4/11970889822084687040sinoptik_Medieval_shield.svg.hi.png
Rebranding: Disturbed becomes Maintenance
Removing the negative attitude towards the old role within the team
Project work
Maintenance
The Shield
“How do we avoid this in the future?” PETER LESCHEV
Fix it now, fix it for the future
Self service
Chat bots
Maven Self Help Tool
Infrastructure
Infrastructure as Code = Puppet + SCM?
4 years ago…
Started using Puppet
Manually maintained snowflakes
Production rollout
puppetmaster
build agents
Production rollout failure
puppetmaster
build agents
Low confidence of change
atlassian.com/git
Style in Pull Requests
Puppet Lint
https://github.com/rodjek/puppet-lint
Tim Sharpe (@rodjek)
Runs checks & posts results, fails if there are any warnings or errors
Automated Build
Automated Style Checking
Using Staging for Development
• Coding on Puppet Master
• Culture of manually modifying production - Configuration Drift
• Impact on Builds
puppetmaster
build agents
staging puppet environment
Vagrant
www.vagrantup.com
Mitchell Hashimoto (@mitchellh)
Packer
packer.io
Rolling out to staging
Rolling out to production
Broken build agents
Developing locally
“But it works on my machine” EVERY DEVELOPER
Continuous Integration
‘From scratch’ provisioning
Confidence that you can rebuild in disaster
“The Pets: you give nice names, you stroke them, and when they get ill, you nurse them back to health, taking a long time over it.
The Cattle: you give them numbers. When they get ill, you shoot them” TIM BELL, CERN
Provisioning from scratch is slow
Profiling Puppet Runs
Add “--evaltrace” to puppet apply
Collect and show the longest occurrences of: “Evaluated in ([\d\.]+) seconds”
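The profiling step can be sketched as a small script that scans puppet apply output for the “Evaluated in ... seconds” pattern from the slide and lists the slowest resources first. This is a sketch under assumptions: the function name and the sample log lines (class and package names) are made up for illustration, and it treats everything before “: Evaluated in” as the resource label.

```python
import re
from typing import Iterable, List, Tuple

# Same timing pattern as on the slide, with the text before it kept as the label.
EVALUATED = re.compile(r"^(?P<resource>.*): Evaluated in (?P<secs>[\d\.]+) seconds")


def slowest_resources(log_lines: Iterable[str], top: int = 10) -> List[Tuple[float, str]]:
    """Return up to `top` (seconds, resource) pairs, slowest first."""
    timings = []
    for line in log_lines:
        match = EVALUATED.search(line)
        if match:
            timings.append((float(match.group("secs")), match.group("resource").strip()))
    timings.sort(key=lambda pair: pair[0], reverse=True)
    return timings[:top]


# Illustrative log lines only; real output depends on your manifests.
sample = [
    "Info: /Stage[main]/Jdk/Package[openjdk]: Evaluated in 42.17 seconds",
    "Info: /Stage[main]/Git/Package[git]: Evaluated in 3.05 seconds",
    "Notice: Finished catalog run in 50.00 seconds",
    "Info: /Stage[main]/Users/User[bamboo]: Evaluated in 0.02 seconds",
]
for seconds, resource in slowest_resources(sample, top=2):
    print(f"{seconds:8.2f}s  {resource}")
```

In practice the lines would come from a captured puppet apply log rather than an inline list.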
Profiling Cucumber runs
http://itshouldbeuseful.wordpress.com/2010/11/10/find-your-slowest-running-cucumber-features/
Delta Provisioning
• Faster local provisioning
• Different class of problems found
• Closer to production
‘from scratch’ provision: provision VM, on success export VM to fileshare
‘delta’ provision: import VM box, provision VM
Broken builds
master
Branch builds
BUILDENG-5670
BUILDENG-5669
master
Painful Puppet Rollouts
• Puppet runs impacted running builds
• Disabling all the build agents
• Manually performing the roll out
• git clone / librarian-puppet / symlink update on puppetmaster
• Kick off puppet on all the build agents
• Enabling all the build agents
• Set of Puppet environments for every Bamboo server
Infrequent Releases
Graceful Service restarts
Bamboo Agent JVM process watches for a touch file & shuts down when idle (written as a Bamboo Plugin)
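The touch-file idea can be illustrated with a short polling loop. This is not the Bamboo plugin itself (which runs inside the agent JVM). It is a minimal Python sketch of the same logic, and the function names, the `is_idle` callback and the polling interval are all assumptions.

```python
import time
from pathlib import Path
from typing import Callable


def should_shut_down(touch_file: Path, is_idle: bool) -> bool:
    """Shut down only when a restart was requested AND no build is running."""
    return touch_file.exists() and is_idle


def wait_for_graceful_shutdown(touch_file: Path,
                               is_idle: Callable[[], bool],
                               poll_seconds: float = 5.0,
                               sleep: Callable[[float], None] = time.sleep) -> None:
    """Block until the touch file exists and the agent reports itself idle."""
    while not should_shut_down(touch_file, is_idle()):
        sleep(poll_seconds)
    # The caller would now stop the agent process; a supervisor restarts it.
```

With this pattern a Puppet run only has to create the touch file. Running builds are not interrupted, because the agent restarts itself once its current build has finished.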
Puppet environments reduced
From: server1_staging, server1_production, server2_staging, server2_production, server3_staging, server3_production, etc
To: staging, production
Bamboo Deployments
How environments work (diagram: Task list, Available agents, Destination server; Release 1.3 deployed to Production)
How artifacts work (diagram: JIRA issue, Commit, Trigger, Code, Repository, Build plan, Test & Build, Build results (Artifacts) n, n+1, n+2, Versions 1.0 and 1.3, Release, Release notes, Deployment project, Environments: Development, Production)
staging
production
build
• git clone
• librarian-puppet
test
• ‘delta’ & ‘from scratch’ vagrant provisions
deploy
• to specific environments
• scp to puppet master & symlink update
build & test AMIs
• Generated using Packer
deploy AMIs
• AMIs on Bamboo Servers updated
Puppet Build, Test & Deploy Pipeline
Terraform Pipeline
Plan & Apply changes of staging & production environments
terraform.io
‘open prs’ Bot
Less human effort through automation = Increased frequency & reliability of releases
Snowflakes
Pets
Cattle
Stateless Machines
Infrastructure consistency is key
Challenges
Lots of packages: Large number of constantly updating package dependencies
External dependencies: introduces instability
Bamboo Servers
At scale is hard
12 Bamboo Servers
3500 Build Plans
14k Plan Branches
Bamboo is great, but hard to manage at scale
Build Configuration as code
Bamboo Plugin: Plan Templates
Checked into SCM
Reusable snippets
changes can be code reviewed
Export plans for backup, or move to another Bamboo instance easily
Bulk changes
Export existing plans
Update 100s of job requirements with a single commit
Pushing Bamboo to its limits
Bamboo Plugin: Agent Smith Wallboard
Trend data sent to Graphite
https://marketplace.atlassian.com/plugins/com.atlassian.bamboo.plugin.agent-smith-wallboard
Add metrics, then alert on them
Bamboo Plugin: Bamboo Monitoring Plugin
Metrics to graphite
Bamboo Health: ActiveMQ, Database connections, Tomcat, JVM Memory usage. Background thread workers. Number of plans / plan branches, plans / plan branches for deletion.
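To make “metrics to graphite” concrete: Graphite's plaintext listener accepts one metric per line in the form `<path> <value> <timestamp>`, by default on TCP port 2003. The sketch below formats lines of that shape. The metric paths, values and helper names are made up for illustration and do not come from the monitoring plugin.

```python
import socket
from typing import Iterable


def graphite_line(path: str, value: float, timestamp: int) -> str:
    """Format one metric in Graphite's plaintext protocol."""
    return f"{path} {value} {timestamp}\n"


def send_metrics(lines: Iterable[str], host: str, port: int = 2003) -> None:
    """Send pre-formatted metric lines to a Graphite plaintext listener."""
    payload = "".join(lines).encode("ascii")
    with socket.create_connection((host, port), timeout=5) as sock:
        sock.sendall(payload)


# Hypothetical metric names and a fixed example timestamp.
now = 1443657600
lines = [
    graphite_line("bamboo.server1.plans.count", 3500, now),
    graphite_line("bamboo.server1.jvm.heap_used_mb", 6144.5, now),
]
print("".join(lines), end="")
```

Once values like these are recorded as time series, alert thresholds can be defined on them, which is the “add metrics, then alert on them” step.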
When a Bamboo Server starts misbehaving…
Infrastructure differences? Is it Bamboo Configuration?
Is it a Bamboo Plugin? Is it Bamboo the product?
How is it being used?
Infrastructure consistency of Bamboo Servers is key
Bamboo Puppet provider + REST API for Administration
Bamboo Puppet Provider
REST calls
https://forge.puppetlabs.com/atlassian/bamboo_rest
Hipchat Notification
Managed via Puppet
1-click upgrades of Bamboo Plugins
‘Continuous Plugin Deployment’ Task
All Bamboo Servers
build
Deploy
build & test AMIs
Build
https://marketplace.atlassian.com/plugins/com.atlassian.bamboo.plugins.deploy.continuous-plugin-deployment
1-click upgrades of Bamboo Servers
Using scp / ssh & puppet
Upgrade Bamboo
Build Bamboo
jira-bamboo
servicedesk-bamboo
Infrastructure differences? Is it Bamboo Configuration?
Is it a Bamboo Plugin? Is it Bamboo the product?
How is it being used?
Conclusion
Constant improvement
We’ve matured to handle the growth of Atlassian
Come join us!
Thank you!
PETER LESCHEV • TEAM LEAD • ATLASSIAN • @PETERLESCHEV