MAXOS - the new Scalable Continuous Agile

Post on 15-Jan-2015

DESCRIPTION

This talk got a good reception at Agile2014. I introduced MAXOS (Matrix of Services) as a very different alternative to the old Scrum and SAFe way of organizing software development. MAXOS is a pattern used by top technology companies like Google, Amazon, Hubspot, and Edmunds. It uses continuous delivery from many different "service teams" in parallel, and it replaces a lot of management coordination with continuous integration. I highlighted in yellow a series of "debatable points" that are surprising to more traditional agile developers.

Transcript

MAXOS Scalable Continuous Agile

with debatable points

From Andy Singleton, http://continuousagile.com

www.assembla.com

Microsoft Journey

• Scale: Releases were buggy at scale
• Stress: Releases were stressful
• Online: SaaS, devices and services, cloud first
• Competition: The competition was releasing more frequently and moving faster
• Now, about 80% of the way to “second generation agile”

Survey on Continuous Delivery

46% think their competitors have adopted Continuous Delivery

Ways to Scale

Scrum + SAFe
• Add more hierarchy
• Hold big meetings and teleconferences
• Block everyone into one cadence
• Coordinate big releases

Top Tech Companies
• Automate management, as well as testing and deployment.
• Communicate peer to peer
• Unblock! teams to move as fast as possible
• Release early and often.

Separate release from launch

2 Ways to Be More Productive

1) Do the right thing

Users ignore at least 50% of the new stuff you try. If you can measure usage or value and figure out what to ignore in development, you can increase development productivity by 100% for zero extra cost.

Measurement is very important.

More frequent releases = more measurements
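The arithmetic behind the "100%" figure can be sketched as follows. This is only an illustration under simplifying assumptions that are not in the slides: every feature costs the same to build, and the ignored features can be identified before they are built. The function name `productivity_gain` is hypothetical.

```python
def productivity_gain(ignored_fraction: float) -> float:
    """Relative gain in useful output per unit of effort if ignored work is skipped.

    Assumption (hypothetical): equal cost per feature and perfect foresight
    about which features users will ignore.
    """
    if not 0 <= ignored_fraction < 1:
        raise ValueError("ignored_fraction must be in [0, 1)")
    useful_fraction = 1 - ignored_fraction
    # The same useful output now takes only `useful_fraction` of the effort.
    return 1 / useful_fraction - 1


# Skipping the 50% of work that users ignore doubles useful output per effort.
print(f"{productivity_gain(0.5):.0%}")
```

In practice the gain depends on how well usage measurements predict what will be ignored, which is why the slide ties productivity to measurement.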

2) Use more machines

Better organization is not high on the list of ways to become more productive. Organization has not changed much over centuries.

Are the methodology booths in the exhibit hall being pushed out by vendors that make build, test, and deploy tools?

“Technical practices”

Productivity comes from more/bigger machines

Continuous Delivery Basics

• Code contribution patterns
• Test layering
• Automation
• Developer responsibility
• Feature switches

Code Contribution Patterns

Manage code if possible. People are hard to manage and can’t be automated. They want to contribute.
• Centralized continuous delivery
– No branches, finds and fixes problems as early as possible
• Distributed continuous delivery
– Release every change with its own branch and test
• Temporary branches
– Combines benefits of centralized and distributed
• MAXOS
– Use centralized continuous integration to manage a massively scalable IT system

Centralized CI/CD

[Diagram labels: Contributor commits – “as early as possible” to find problems; Continuous integration tests; Fail - alarm; Pass; Release candidate; Test system; QA testing; Release]

Continuous delivery at Edmunds.com

Distributed Continuous Delivery

[Diagram labels: Contributor commits; Branch or fork; Deployed version; Peer review merge requests; Other contributions merged and released “as late as possible”; QA consults; Pass final auto test?; Merge back; Current; Deploy]

Distributed: Multiple Test Systems

[Diagram labels: Contributor 1; Contributor 2; Production revision; Release anytime; CI system; QA team; Test system 1; Test system 2]

Assembla spins up test servers

MAXOS Service Team

Test Layering

Monitor your released software: Errors, Usage volume, usage patterns, user feedback

QA System with Human test consultants

Code review: Both a manual test, and a place to ask for test scripts.

Continuous integration: Run automated tests before using human review time

Unit tests in the development environment

Switch new features and architecture

Start here to add layers

Start here to release a change

More frequent releases can increase quality

9 (sparse) Layers at Edmunds

Feature Switch and Unveil

Hidden: Programmer sees a change locally. Change is tested in the main version but not seen.

Test: Story Owner and testers see the change on test systems.

Beta: Insiders see it and use it. Story Owner can show it to selected users for feedback or A/B testing.

UNVEIL! The big event. Communicate with all users. Measure reaction.

One code version

No special test builds

No long-running branches
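The Hidden, Test, Beta, and Unveil stages above can be sketched as a simple feature switch. This is a minimal illustration, not the implementation used by any company named in the talk. The names `Stage`, `FeatureSwitch`, `is_visible`, and `beta_users` are hypothetical, and the sketch assumes that testers can also see Beta features and that visibility is decided per user.

```python
from dataclasses import dataclass, field
from enum import IntEnum


class Stage(IntEnum):
    HIDDEN = 0    # code is in the main version, but nobody sees the change
    TEST = 1      # story owner and testers see it
    BETA = 2      # insiders and selected users see it
    UNVEILED = 3  # all users see it


@dataclass
class FeatureSwitch:
    name: str
    stage: Stage = Stage.HIDDEN
    beta_users: set = field(default_factory=set)  # hypothetical allow-list

    def is_visible(self, user: str, is_tester: bool = False) -> bool:
        """Decide at run time whether this user sees the feature."""
        if self.stage == Stage.UNVEILED:
            return True
        if self.stage == Stage.BETA:
            return is_tester or user in self.beta_users
        if self.stage == Stage.TEST:
            return is_tester
        return False  # HIDDEN: released but not seen


switch = FeatureSwitch("new_checkout")
switch.stage = Stage.BETA
switch.beta_users.add("alice")
print(switch.is_visible("alice"), switch.is_visible("bob"))
```

Because the switch is evaluated at run time, the same code version serves every audience, which is what makes special test builds and long-running branches unnecessary.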

Go Both Ways

[Chart axes: Velocity, Quality]

Increase Quality (more layers, longer beta)

Increase Velocity (fewer layers, faster unveil)

Role: Developer

• Developers have more power and responsibility.
• Developers have more responsibility for testing.
• Developers (not QA or PM) decide when to release. This is a strong finding.
• Incentives are correct. A developer might have to come back from Friday night beers to fix a problem. This provides a motivation to make good decisions and automate testing.
• Features can be released but hidden. Product Managers and Marketers will unveil when they are ready. Unblock!

Programmers approve releases, not QA

Person, Dog, or Bot writes code

Computers will write code

Continuous Agile

Lean process

1. Release more frequently
2. Improve

Role: Product Manager/Owner

• Batch -> Continuous

• Requirements -> User Experience

• Strategy -> Measurement
– Usage measurements are so important, so underutilized
– Double your productivity

Product Owner -> Story Owner

Program Launch

Releases

Plan Program Test Doc Deploy

Skip   Automate & Blend   Lag   Automate

Pull

CI & CD

End up with

Separate Release from Launch

Measure

Matrix of Services

Breaking the scale barrier

The Services Megatrend

[Diagram labels: Desktop; Web App; Cloud; Services; App; DB; Service (×3)]

Scale it like Google

• 15,000 developers, 5,000 projects, one current version of the code (2013). They can go from an idea to a release in 48 hours

• Vast Internet system divided into thousands of "services"

• Most programming done by teams of 3-4

• Centralized process with single version of the test system – run 100 million test cases daily

• Before developers release a change, they test it with the most recent version of all the other services. If a test script finds conflicts, it tells developers who to contact to resolve them

Matrix of Services - MAXOS

[Diagram labels: Prioritized backlog; Current work; Service team and Production service (×3); Integration test env (×3); Feedback on speed, errors, usage, and requests]

Each team releases when ready

Hundreds of releases per day

Test as one system

Coordinate without big meetings

Continuous Integration between latest dev version of each service

• Continuous integration helps teams coordinate.

• See dependencies between “producers” and “consumers”

• Errors and conflicts show related team contact info

• Meetings and changes negotiated between two teams, not many
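The idea that test failures point developers at the right team can be sketched as a small lookup over producer/consumer dependencies. This is an illustration only, not a description of Google's or any other company's tooling. The names `Service`, `consumes`, `team_contact`, and `contacts_for_failures`, and the example addresses, are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Service:
    name: str
    team_contact: str
    consumes: tuple = ()  # names of the "producer" services this one depends on


def contacts_for_failures(changed: str, failed: list, registry: dict) -> dict:
    """Map each failing consumer of `changed` to the team that should be contacted.

    `failed` lists services whose integration tests failed against the latest
    dev version of `changed`. Failures in services that do not consume
    `changed` are treated as unrelated and left out.
    """
    contacts = {}
    for name in failed:
        service = registry[name]
        if changed in service.consumes:
            contacts[name] = service.team_contact
    return contacts


registry = {
    "search": Service("search", "search-team@example.com"),
    "checkout": Service("checkout", "checkout-team@example.com", consumes=("search",)),
    "ads": Service("ads", "ads-team@example.com"),
}
print(contacts_for_failures("search", ["checkout", "ads"], registry))
```

The result names exactly one other team per conflict, which matches the point that negotiations happen between two teams rather than in a large meeting.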

[Diagram labels: Prioritized backlog; Current work; Service team (×3); Integration test env (×3)]

Machines can replace layers of management

Teams are largely self-managing

[Diagram labels: Prioritized backlog; Current work; Service team; Integration test env; Production service (×4); Production server; Feedback: quality, reliability, speed, user support]

• Up to 50% of work from backlog
• At least 50% of work is self-planned
• Problems get fixed quickly

Sense, respond, self-manage, minimize planning

Hubspot – Great at Mid-scale

• Transformed a monolithic app to 200 services over one year

• 3-person programming teams. Each of 20 teams is responsible for about 10 services

• Dev teams responsible for design, programming, testing, release, monitoring, and responding to production problems. No full-time QA. Shared PM and UX. 4 Ops guys for 2000 servers.

• Lots of tooling and dashboards to help teams deploy, manage, and monitor their services

• Feedback from customer support also grouped by team

Scaling

[Diagram labels: Prioritized backlog; Current work; Service team and Production service (×2); Integration test env (×2)]

Each team releases when ready

Add capacity fast by building around single-function programmer/tech leads

Teams are not permanently multifunctional

Core IT: annual budget
Reliability & security mission

Web API Layer
Mobile; SaaS / Cloud; Marketing

Fast IT: monthly budget
Mission to respond to opportunities

[Diagram labels: Service team and Production service (×2); Production service; Integration test env (×3)]

Fast IT (continuous)

Core IT (stable service)

[Diagram labels: Service team and Production service (×2); Production service; Integration test env (×3)]

Different culture, different company, or no company

United by Testbeds

Culture does not need to be consistent

Scaling Methods

Brooks and Mythical Man Month
– Hypothesis: Scaling problems come from n^2 communication explosion
– Solution: Cells and hierarchy to contain communication

Internet reality
– Scaling problems come from dependencies
– Solution: more sharing, more communication

40 years of awesome insights, but Mythical Man Month is wrong
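The two scaling models above can be compared with a short calculation. The pairwise count n(n-1)/2 is the standard formula behind the "n^2 communication explosion". The dependency-based count is a hypothetical illustration of the "Internet reality" view, in which teams only need a channel where one service consumes another. The function names and the example dependency map are assumptions, not from the slides.

```python
def pairwise_channels(n: int) -> int:
    """Distinct pairs among n people: n(n-1)/2, which grows on the order of n^2."""
    return n * (n - 1) // 2


def dependency_channels(dependencies: dict) -> int:
    """Distinct team pairs linked by at least one producer/consumer dependency."""
    pairs = {
        frozenset((consumer, producer))
        for consumer, producers in dependencies.items()
        for producer in producers
        if consumer != producer
    }
    return len(pairs)


for n in (3, 10, 100):
    print(n, pairwise_channels(n))

# Hypothetical dependency map: four teams, three dependency links.
deps = {"web": ["search", "checkout"], "checkout": ["payments"], "search": [], "payments": []}
print(dependency_channels(deps), "of", pairwise_channels(len(deps)), "possible pairs")
```

Under this sketch, the number of conversations that matter follows the dependency graph rather than the head count.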

SAFe (Copyright Dean Leffingwell)

Ways to Scale

Scrum + SAFe
• Add more hierarchy
• Complex multifunction teams
• Hold big meetings and teleconferences
• Block everyone into one cadence
• Coordinate big releases

Top Tech Companies
• Automate management, as well as testing and deployment.
• Dev-lead teams
• Communicate peer to peer
• Unblock! teams to move as fast as possible
• Release more frequently

Competing with MAXOS

The secret weapon that Silicon Valley is using to disrupt and destroy competitors

• Retailer X deploys changes to their monolithic online ordering app once every six weeks. Ops holds for three weeks to make sure the complete system is stable.

• Amazon has thousands of services and more than 1000 service teams. They release something about once every 11.6 seconds. In the time that Retailer X takes to try one new release, Amazon has made 100,000 changes.

• Amazon hosting competitor: “It’s an emergency”.
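The release-rate comparison can be checked with a quick calculation. This sketch takes the slide's figures at face value (one release every 11.6 seconds, one Retailer X release every six weeks) and assumes a constant release rate and one change per release. The function name `releases_in` is hypothetical.

```python
SECONDS_PER_WEEK = 7 * 24 * 60 * 60


def releases_in(weeks: float, seconds_per_release: float = 11.6) -> int:
    """Whole releases completed in `weeks` at a constant release interval."""
    return int(weeks * SECONDS_PER_WEEK / seconds_per_release)


# One six-week Retailer X cycle at the quoted Amazon rate.
print(releases_in(6))
```

At that rate a six-week cycle covers roughly 312,000 releases, so the "100,000 changes" figure reads as a conservative round number.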

SAFe and MAXOS

Debatable Points
• More frequent releases can increase quality
• Culture does not need to be consistent
• Teams are not permanently multifunctional
• Programmers approve releases, not QA
• Productivity comes from bigger/more machines
• Machines will write code
• Sense, respond, self-manage, minimize planning
• Mythical Man Month is wrong
• Machines and continuous integration can replace layers of management
• Scrum and SAFe cannot compete with MAXOS
