NICTA Copyright 2012 From imagination to impact Deployability Len Bass
About NICTA
National ICT Australia
• Federal and state funded research
company established in 2002
• Largest ICT research resource in
Australia
• National impact is an important
success metric
• ~700 staff/students working in 5 labs
across major capital cities
• 22 university partners
• Providing R&D services, knowledge
transfer to Australian (and global) ICT
industry
NICTA technology is
in over 1 billion mobile
phones
“This project is vital to our company. How
long will it take?”
Day 1
“It’s taking too long!!!”
Day 30
“You Are Fired!”
Day 60
Where Does the Time Go?
• As software architects, we view software development
as consisting of the following activities:
– Concept
– Requirements
– Design
– Implementation
– Test
• The end point of these activities is Code Complete
• Different methodologies will organize these
activities in different ways.
• Agile focuses on getting to Code Complete
faster than other methods do.
What is wrong?
• Code Complete ≠ Code in Production
• Between the completion of the code and
the placing of the code into production is a
step called: Deployment
• Deploying completed code can be very
time consuming
Why is Deployment so Time Consuming?
• Errors in deployed code are a major source of
outages.
• So much so that organizations have formal
release plans.
• There is a position called a “Release Engineer”
that has responsibility for managing releases.
Release plan
1. Define and agree release and deployment plans with
customers/stakeholders.
2. Ensure that each release package consists of a set of related assets and
service components that are compatible with each other.
3. Ensure that the integrity of a release package and its constituent components is
maintained throughout the transition activities and recorded accurately in
the configuration management system.
4. Ensure that all release and deployment packages can be tracked, installed,
tested, verified, and/or uninstalled or backed out, if appropriate.
5. Ensure that change is managed during the release and deployment
activities.
6. Record and manage deviations, risks, issues related to the new or changed
service, and take necessary corrective action.
7. Ensure that there is knowledge transfer to enable the customers and users
to optimise their use of the service to support their business activities.
8. Ensure that skills and knowledge are transferred to operations and support
staff to enable them to effectively and efficiently deliver, support and
maintain the service, according to required warranties and service levels.
Source: http://en.wikipedia.org/wiki/Deployment_Plan
Look at one requirement
2. Ensure that each release package consists of a set of
related assets and service components that are
compatible with each other.
– Every development team contributing to the release
must have completed their code
– Every development team must have used the same
version of every supporting library
– The development teams must have agreed on a
common set of supporting technologies
• Every item requires coordination among developers
– Meetings
– Documents
• In other words: time
How to Speed Up Deployment
• Set up a process and an architecture so that
development teams do not need to coordinate
with each other
• Support “partial release” deployments.
• This is called continuous deployment
• Code Complete → Code in Production
Continuous Deployment Pipeline
A developer pushes a button and, as long as all of the
automated tests are passed, the code is placed into
production automatically through a tool chain.
• No coordination with other teams during the execution of
the tool chain
• No dependence on other teams’ activities
The ability to have a continuous deployment pipeline depends
on the architecture of the system being deployed.
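The pipeline's rule ("deploy automatically as long as all automated tests pass") can be sketched in a few lines. This is a minimal illustration, not a real tool chain: the `Test` shape, `run_pipeline`, and the `deploy` callback are hypothetical names standing in for whatever test runner and deployment step an organization actually uses.

```python
from typing import Callable, List

# Hypothetical shapes: each automated test is a zero-argument function that
# returns True on success; `deploy` stands for the tool chain's final step.
Test = Callable[[], bool]

def run_pipeline(tests: List[Test], deploy: Callable[[], None]) -> bool:
    """Run every automated test; place the code into production only if all pass.

    Nothing here waits on another team: one developer triggers the run and
    the tool chain decides, from test results alone, whether to deploy.
    """
    for test in tests:
        if not test():
            # One failing test stops the pipeline; nothing reaches production.
            return False
    deploy()
    return True
```

The point of the sketch is what is absent: there is no step that consults another team, which is only possible if the architecture lets each service be deployed on its own.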
~2002 Amazon instituted the following
design rules - 1
• All teams will henceforth expose their data and
functionality through service interfaces.
• Teams must communicate with each other
through these interfaces.
• There will be no other form of inter-process
communication allowed: no direct linking, no
direct reads of another team’s data store, no
shared-memory model, no back-doors
whatsoever. The only communication allowed is
via service interface calls over the network.
Amazon design rules - 2
• It doesn’t matter what technology they [the services]
use.
• All service interfaces, without exception, must be
designed from the ground up to be
externalizable.
• These rules amount to a specification of what
has come to be called “microservice
architecture”.
• (It’s really an architectural style.)
In Addition
• Amazon has a “two pizza” rule.
• No team should be larger than can be fed with two
pizzas (~7 members).
• Each (micro) service is the responsibility
of one team
• This means that microservices are
small and intra-team bandwidth
is high
• Large systems are made up of many microservices.
• A typical Amazon page may draw on as many as 140 microservices.
Microservice architecture supports
continuous deployment
• Two topics:
– What is microservice architecture?
– What are the deployment issues and how do I deal
with them?
Microservice architecture
• Each user request is
satisfied by some sequence
of services.
• Most services are not
externally available.
• Each service communicates
with other services through
service interfaces.
• The depth of a chain of service calls may reach 70,
e.g., at LinkedIn
Relation of teams and services
• Each service is the responsibility of a single
development team
• Individual developers can deploy a new version
without coordination with other developers.
• It is possible that a single development team
is responsible for multiple services
Coordination model of microservice
architecture
• Elements of service interaction
– Services communicate asynchronously through
message passing
– Each service could (in principle) be deployed
anywhere on the net.
• Latency requirements will probably force particular
deployment location choices.
• Services must discover location of dependent services.
– State must be managed
Service discovery
• When an instance of a
service is launched, it
registers with a
registry/load balancer
• When a client wishes
to utilize a service, it
gets the location of an
instance from the
registry/load balancer.
• Eureka is an open
source registry/load
balancer
[Figure: an instance of a service registers with the registry/load balancer; the client queries the registry/load balancer for an instance's location, then invokes that instance.]
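The register / query / invoke flow can be illustrated with a toy in-memory registry. This is a sketch under stated assumptions: `Registry`, `invoke`, and the `network` dictionary (address to request handler) are hypothetical stand-ins, and the code does not reproduce Eureka's actual API.

```python
from typing import Callable, Dict, List

class Registry:
    """Toy in-memory registry/load balancer (hypothetical; not the Eureka API)."""

    def __init__(self) -> None:
        # service name -> addresses of registered instances
        self._instances: Dict[str, List[str]] = {}

    def register(self, service: str, address: str) -> None:
        # "Register": called by a service instance when it is launched.
        addresses = self._instances.setdefault(service, [])
        if address not in addresses:
            addresses.append(address)

    def lookup(self, service: str) -> str:
        # "Query registry": called by a client to find an instance's location.
        addresses = self._instances.get(service)
        if not addresses:
            raise LookupError(f"no registered instance of {service!r}")
        return addresses[0]  # load balancing across instances is omitted here


def invoke(registry: Registry, service: str, request: str,
           network: Dict[str, Callable[[str], str]]) -> str:
    # "Invoke": the client sends the request to the address the registry returned.
    # `network` is a stand-in mapping address -> request handler.
    address = registry.lookup(service)
    return network[address](request)
```

The client never hard-codes an instance address; it only knows the service name, which is what lets instances come and go independently.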
Subtleties of registry/load balancer
• When multiple instances of the same service
have registered, the load balancer can rotate
through them to equalize number of requests to
each instance.
• Each instance must renew its registration
periodically (Eureka, for example, drops a registration that is not
renewed within ~90 seconds) so that the load balancer
does not route messages to a failed instance.
• The registry can keep other information in addition to the
address of an instance, for example the version
number of the service instance.
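The three subtleties above (rotation across instances, expiry of registrations that are not renewed, and extra metadata such as a version number) can be combined in one small sketch. The class and method names are hypothetical, time is passed in explicitly as `now` (seconds) to keep the example deterministic, and the 90-second lease is only a default chosen to match the figure on the slide.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

@dataclass
class Registration:
    address: str
    version: str          # extra metadata kept alongside the address
    last_renewed: float   # time (seconds) of the most recent renewal

class LeasingLoadBalancer:
    """Round-robin load balancer whose registrations expire unless renewed (a sketch)."""

    def __init__(self, lease_seconds: float = 90.0) -> None:
        self.lease_seconds = lease_seconds
        self._regs: Dict[str, Registration] = {}
        self._cursor = 0  # round-robin position

    def register(self, address: str, version: str, now: float) -> None:
        self._regs[address] = Registration(address, version, now)

    def renew(self, address: str, now: float) -> None:
        # Each live instance calls this periodically ("heartbeat").
        self._regs[address].last_renewed = now

    def live(self, now: float) -> List[Registration]:
        # Registrations not renewed within the lease are treated as failed instances.
        return [r for r in self._regs.values()
                if now - r.last_renewed <= self.lease_seconds]

    def pick(self, now: float) -> Optional[Registration]:
        candidates = self.live(now)
        if not candidates:
            return None
        choice = candidates[self._cursor % len(candidates)]
        self._cursor += 1  # rotate so requests are spread evenly
        return choice
```

For example, two instances registered at t=0 are picked alternately; if only one of them renews at t=80, at t=100 the other is no longer eligible and every request goes to the survivor.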
State management
• Services can be stateless or stateful
– Stateless services
• Allow arbitrary creation of new instances for performance and
availability
• Allow messages to be routed to any instance
• State must be provided to stateless services
– Stateful services
• Require clients to communicate with the same instance
• Reduce the overhead necessary to acquire state
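The routing difference between the two kinds of service can be sketched as follows. This is an illustration only: the function names are hypothetical, and "sticky" routing is shown with a simple hash of a client identifier, which is one common way (not the only way) to keep a client on the same instance.

```python
import hashlib
import random
from typing import List

def route_stateless(instances: List[str]) -> str:
    # Any instance will do: the state the request needs is supplied with it
    # or fetched from a shared store, so instances are interchangeable.
    return random.choice(instances)

def route_stateful(instances: List[str], client_id: str) -> str:
    # Sticky routing: the same client always reaches the same instance (as long
    # as the instance list is unchanged), so that instance can keep the client's
    # state in memory instead of reacquiring it on every request.
    digest = hashlib.sha256(client_id.encode("utf-8")).digest()
    return instances[int.from_bytes(digest[:8], "big") % len(instances)]
```

Note the cost on the stateful side: adding or removing an instance changes the modulus and can move clients to instances that do not hold their state, which is one reason stateless services are easier to scale out.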
Where to keep the state?
• Persistent state is kept in a database
– Modern relational database management systems
provide replication functionality
– Some NoSQL systems may be replicated. Others will
require manual replication.
• Small amounts of transient state can be kept
consistent across instances by using tools such
as Memcached or ZooKeeper.
• Instances may cache state for performance
reasons. It may be necessary to purge the cache
before bringing down an instance.
Provisioning new instances
• When the desired workload of a service is greater than
can be provided by the existing number of instances of
that service, new instances can be instantiated (at
runtime).
• Four possibilities for initiating a new instance of a service:
1. Client. The client determines whether the service is adequately
provisioned for its needs based on the service’s SLA and the service’s
current workload.
2. Service. The service determines whether it is adequately
provisioned based on the number of requests it expects from
clients.
3. Registry/load balancer. The registry/load balancer determines the appropriate number of
instances of a service based on the SLA and client instance
requests.
4. External entity. An external entity can initiate the creation of new instances.
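Whichever party initiates it, the provisioning decision reduces to comparing expected workload with what the current instances can handle while still meeting the SLA. The sketch below assumes a single hypothetical number, `capacity_per_instance`, for the request rate one instance can sustain within its SLA; real systems derive this from measurement, and scaling in is deliberately left out.

```python
import math

def instances_needed(expected_requests_per_sec: float,
                     capacity_per_instance: float,
                     minimum: int = 1) -> int:
    """Instances required so that no instance exceeds the request rate at
    which it can still meet its SLA (capacity_per_instance, an assumption)."""
    if capacity_per_instance <= 0:
        raise ValueError("capacity_per_instance must be positive")
    return max(minimum, math.ceil(expected_requests_per_sec / capacity_per_instance))

def instances_to_launch(current: int, expected: float, capacity: float) -> int:
    # Only scale out in this sketch; removing instances is a separate decision.
    return max(0, instances_needed(expected, capacity) - current)
```

For instance, 450 requests/s at 100 requests/s per instance needs 5 instances, so a service currently running 3 would launch 2 more.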
Questions about Micro SOA
• /Q/ Isn’t it possible that different teams will implement the
same functionality, likely differently?
• /A/ Yes, but so what? Major duplications are avoided
through assignment of responsibilities to services. Minor
duplications are the price to be paid to avoid necessity
for synchronous coordination.
• /Q/ What about transactions?
• /A/ Micro SOA privileges flexibility above reliability and
performance. Transactions are recoverable through
logging of service interactions. This may introduce some
delays if failures occur.
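One way to read "transactions are recoverable through logging of service interactions" is forward recovery: log each call before making it, mark it complete afterwards, and after a failure re-issue whatever is still pending. The sketch below is an in-memory illustration of that idea, not a description of any particular product; all names are hypothetical, and it assumes service handlers are idempotent so that repeating a call which already took effect is harmless.

```python
from typing import Callable, Dict, List, Set, Tuple

Handler = Callable[[str], object]

class InteractionLog:
    """Append-only log of service calls, kept in memory for this sketch."""

    def __init__(self) -> None:
        self.entries: List[Tuple[str, str, str]] = []  # (call_id, service, payload)
        self.done: Set[str] = set()

    def record(self, call_id: str, service: str, payload: str) -> None:
        self.entries.append((call_id, service, payload))

    def complete(self, call_id: str) -> None:
        self.done.add(call_id)

    def pending(self) -> List[Tuple[str, str, str]]:
        return [e for e in self.entries if e[0] not in self.done]


def call_logged(log: InteractionLog, services: Dict[str, Handler],
                call_id: str, service: str, payload: str) -> object:
    log.record(call_id, service, payload)  # log the interaction before making it
    result = services[service](payload)
    log.complete(call_id)                  # mark it done only after it succeeds
    return result


def recover(log: InteractionLog, services: Dict[str, Handler]) -> None:
    # Forward recovery: re-issue every logged call that never completed.
    # Assumes handlers are idempotent, so a call that did take effect can be repeated safely.
    for call_id, service, payload in log.pending():
        services[service](payload)
        log.complete(call_id)
```

The replay step is where the "some delays if failures occur" comes from: nothing is rolled back atomically; incomplete work is simply finished later.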
Microservice architecture supports
continuous deployment
• Two topics:
– What is microservice architecture?
– What are the deployment issues and how do I deal
with them?
Deploying a new version of an application
• Multiple instances of an application are executing
• Red is the service being replaced with a new version
• Blue are clients
• Green are dependent services
[Figure: instances labelled VA, VB, VB, VB (versions A and B of the service); new versions come through UAT / staging / performance tests]
Deployment goal and constraints
• The goal of a deployment is to move from the current
state (N instances of version A of an app) to a
new state (N instances of version B of the app)
• Constraints:
– Any development team can deploy its app at any
time. That is, a new version of an app can be deployed either
before or after a new version of a client (no
synchronization among development teams).
– It takes time to replace one instance of version A with
an instance of version B (order of minutes)
– Service to clients must be maintained while the new
version is being deployed.
Deployment strategies
• Two basic all-or-nothing strategies
– Red/Black – leave N instances with version A as they
are, allocate and provision N instances with version B
and then switch to version B and release instances
with version A.
– Rolling Upgrade – allocate one instance, provision it
with version B, release one version A instance.
Repeat N times.
• Partial strategies are canary testing and A/B
testing.
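The two all-or-nothing strategies can be compared by simulating instance counts step by step. In this sketch (hypothetical names throughout) each step records how many version-A and version-B instances exist and which versions are receiving client traffic; the helper functions then recover the trade-offs stated on the next slide, namely 2N versus N+1 peak instances, and one versus two versions visible to clients.

```python
from typing import FrozenSet, List, Tuple

# (# version-A instances, # version-B instances, versions serving clients)
Step = Tuple[int, int, FrozenSet[str]]

def _serving(a: int, b: int) -> FrozenSet[str]:
    return frozenset(v for v, count in (("A", a), ("B", b)) if count > 0)

def red_black(n: int) -> List[Step]:
    return [
        (n, 0, frozenset({"A"})),
        (n, n, frozenset({"A"})),  # N version-B instances provisioned, not yet receiving traffic
        (n, n, frozenset({"B"})),  # switch traffic to version B in one step
        (0, n, frozenset({"B"})),  # release the version-A instances
    ]

def rolling_upgrade(n: int) -> List[Step]:
    a, b = n, 0
    steps: List[Step] = [(a, b, _serving(a, b))]
    for _ in range(n):
        b += 1  # allocate one instance, provision it with version B; it starts serving
        steps.append((a, b, _serving(a, b)))
        a -= 1  # release one version-A instance
        steps.append((a, b, _serving(a, b)))
    return steps

def peak_instances(steps: List[Step]) -> int:
    return max(a + b for a, b, _ in steps)

def max_versions_serving(steps: List[Step]) -> int:
    return max(len(serving) for _, _, serving in steps)
```

With N = 4, Red/Black peaks at 8 instances but never exposes two versions, while a rolling upgrade peaks at 5 instances and serves both versions during the transition.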
Trade-offs – Red/Black and Rolling Upgrade
• Red/Black
– Only one version available
to the client at any
particular time.
– Requires 2N instances
(additional costs)
• Rolling Upgrade
– Multiple versions are
available for service at the
same time
– Requires N+1 instances.
• Rolling upgrade is widely
used.
[Figure: Rolling Upgrade in EC2. Steps shown: Update Auto Scaling Group; Sort Instances; Remove & Deregister Old Instance from ELB; Confirm Upgrade Spec; Terminate Old Instance; Wait for ASG to Start New Instance; Register New Instance with ELB.]
Types of failures during rolling upgrade
Rolling upgrade failures fall into three types:
• Provisioning failure: see references at end
• Logical failure: inconsistencies, to be discussed
• Instance failure: handled by the Auto Scaling Group in EC2
What are the problems with Rolling
Upgrade?
• Any development team can deploy their app at
any time.
• Three concerns
– Maintaining consistency between different versions of
the same app when performing a rolling upgrade
– Maintaining consistency among different apps
– Maintaining consistency between an app and
persistent data
Maintaining consistency between different
versions of the same app
• Key idea – differentiate between installing a new
version and activating a new version
• Involves “feature toggles” (described
momentarily)
• Sequence
– Develop version B with new code under control of
feature toggle
– Install each instance of version B with the new code
toggled off.
– When all of the instances of version A have been
replaced with instances of version B, activate new
code through toggling the feature.
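The install-then-activate sequence can be simulated to show why it keeps a mixed fleet consistent. In this sketch the `Instance` class, the "new_pricing" toggle name, and the returned strings are hypothetical; the toggle is a shared dictionary standing in for whatever mechanism distributes its value.

```python
from dataclasses import dataclass
from typing import Dict, List

@dataclass
class Instance:
    version: str  # "A" (old) or "B" (new)

    def handle(self, toggles: Dict[str, bool]) -> str:
        # Only version B contains the new code, and it runs only when the
        # (hypothetical) "new_pricing" toggle is on.
        if self.version == "B" and toggles.get("new_pricing", False):
            return "new code path"
        return "old code path"

def upgrade_then_activate(fleet: List[Instance], toggles: Dict[str, bool]) -> List[List[str]]:
    """Rolling-install version B, then activate; return what clients see after each step."""
    observed: List[List[str]] = []
    for i in range(len(fleet)):
        fleet[i] = Instance("B")  # install version B with the new code toggled off
        observed.append([inst.handle(toggles) for inst in fleet])
    if all(inst.version == "B" for inst in fleet):
        toggles["new_pricing"] = True  # activate only once no version-A instance remains
    observed.append([inst.handle(toggles) for inst in fleet])
    return observed
```

Throughout the rolling install every instance, old or new, executes the old code path, so clients see consistent behavior; the switch to the new path happens in a single step after the last version-A instance is gone.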
Issues
• What is a feature toggle?
• How do I manage features that extend across
multiple apps?
• How do I activate all relevant instances at once?
Feature toggle
• Place feature-dependent new code inside an
“if” statement so that it executes only if an
external variable is true. The code being replaced
goes in the “else” branch.
• Used to allow developers to check in
incomplete code. Incomplete code is
toggled off.
• During deployment, until new code is activated,
it will not be executed.
• Removing a feature toggle once its feature
has been committed is important.
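A feature toggle in its simplest form looks like the following. This is a minimal sketch: the toggle is assumed, for illustration only, to be an environment variable named `FEATURE_NEW_CHECKOUT`, and the discount rule is an invented stand-in for "new code"; a real system would read the value from a toggle manager.

```python
import os

def feature_enabled(name: str) -> bool:
    # The toggle is an external variable; here, hypothetically, an environment
    # variable such as FEATURE_NEW_CHECKOUT=on.
    return os.environ.get(f"FEATURE_{name.upper()}", "off") == "on"

def checkout_total(prices):
    if feature_enabled("new_checkout"):
        # New code: can be checked in before it is finished, because it
        # executes only once the toggle is turned on.
        return round(sum(prices) * 0.95, 2)  # hypothetical new discount rule
    else:
        # Code being replaced. Once the feature is committed, delete this
        # branch together with the toggle itself.
        return round(sum(prices), 2)
```

The final comment is the reason for the last bullet: every toggle left in place doubles the number of code paths that must be understood and tested.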
Multi-app features
• Most features will involve multiple apps.
• Each app has some code under control of a
feature toggle.
• Activate feature when all instances of all apps
involved in a feature have been installed.
– Maintain a catalog that maps each feature to the service version
numbers that contain it.
– A feature toggle manager determines when all old
instances of each service have been replaced. This
could be done using the registry/load balancer.
– The feature toggle manager activates the feature.
– Archaius is an open source feature toggle manager.
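The manager's check can be sketched as a comparison between the catalog and the versions the registry reports. Everything here is hypothetical: the catalog maps each feature to the minimum version of each involved service that contains its code, versions are plain integers, and the registry view is a dictionary of service name to the versions of its registered instances. This is not Archaius's API.

```python
from typing import Dict, List, Set

# Hypothetical catalog: feature -> {service: minimum version containing the feature's code}
Catalog = Dict[str, Dict[str, int]]
# Registry view: service -> versions of the currently registered instances
RegistryView = Dict[str, List[int]]

class FeatureToggleManager:
    def __init__(self, catalog: Catalog) -> None:
        self.catalog = catalog
        self.active: Set[str] = set()

    def ready(self, feature: str, registry: RegistryView) -> bool:
        # Ready when every instance of every involved service runs a version
        # that contains the feature's code, i.e. no old instance remains.
        for service, min_version in self.catalog[feature].items():
            versions = registry.get(service, [])
            if not versions or any(v < min_version for v in versions):
                return False
        return True

    def try_activate(self, feature: str, registry: RegistryView) -> bool:
        if self.ready(feature, registry):
            self.active.add(feature)
        return feature in self.active
```

In use, the manager would call `try_activate` each time the registry changes; the feature stays off while even one old instance of any involved service is still registered.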
Activating a feature
• The feature toggle manager changes the value of the
feature toggle. There are two possible techniques for getting the new value
to instances.
– Push. Broadcasting the new value instructs each
instance to use the new code. If a lag of several seconds
between the first service to be toggled and the last
can be tolerated, there is no problem. Otherwise, the
value must be synchronized across the network.
– Pull. Having each instance query the manager for the
latest value may cause performance problems.
• A coordination mechanism such as ZooKeeper can
overcome both problems.
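The idea behind a coordination mechanism is that instances register interest once and are notified of changes, instead of polling. The sketch below emulates that with an in-memory store and callbacks; `ToggleStore` and `ServiceInstance` are hypothetical and do not reproduce the ZooKeeper client API (real coordination services differ in detail; classic ZooKeeper watches, for example, fire once and must be set again).

```python
from typing import Callable, List

class ToggleStore:
    """In-memory stand-in for a coordination service (hypothetical; not the ZooKeeper API)."""

    def __init__(self, value: bool = False) -> None:
        self._value = value
        self._watchers: List[Callable[[bool], None]] = []
        self.reads = 0  # number of pull-style queries served

    def get(self) -> bool:
        self.reads += 1
        return self._value

    def watch(self, callback: Callable[[bool], None]) -> None:
        self._watchers.append(callback)

    def set(self, value: bool) -> None:
        self._value = value
        for notify in self._watchers:  # push the change to every watching instance
            notify(value)

class ServiceInstance:
    def __init__(self, store: ToggleStore) -> None:
        self.enabled = store.get()    # read once at start-up
        store.watch(self._on_change)  # afterwards, rely on notifications instead of polling

    def _on_change(self, value: bool) -> None:
        self.enabled = value
```

Each instance reads the value once at start-up, and flipping the toggle reaches all of them without any further queries, which addresses the pull problem; the remaining push problem (how quickly and consistently the notifications arrive) is what a real coordination service has to solve.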
Maintaining consistency across versions
(summary)
• Install all instances before activating any new
code
• Use feature toggles to activate new code
• Use feature toggle manager to determine when
to activate new code
• Use Zookeeper to coordinate activation with low
overhead
Maintaining consistency among different
services
• Use case:
– You wish to deploy a new version of app A without
coordinating with the development teams for the clients of app
A.
• That is, the new version of app A should be backward compatible in
terms of its interfaces.
• Forward compatibility may also be required in certain
circumstances, e.g., rollback.
Maintaining consistency between an app
and persistent data
• Assume the new version is correct. Rollback is discussed
shortly.
• Inconsistency in persistent data can arise
because the data schema or semantics change.
• The effect can be minimized by the following practices (where
possible).
– Only extend schema – do not change semantics of
existing fields. This preserves backwards
compatibility.
– Treat schema modifications as features to be toggled.
This maintains consistency among various apps that
access data.
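The "only extend the schema" rule can be illustrated with two versions of a reader for the same records. The field names and reader functions are hypothetical; rows are plain dictionaries standing in for database records. Version B adds an optional `phone` field and treats it as possibly missing, while version A ignores fields it does not know, so both versions work against both old and new rows.

```python
from typing import Any, Dict

Row = Dict[str, Any]

def read_customer_v1(row: Row) -> str:
    # Old version: knows only the original fields and ignores any it does not recognize.
    return f"{row['name']} <{row['email']}>"

def read_customer_v2(row: Row) -> str:
    # New version: the schema was only extended with 'phone'. Rows written
    # before the change lack it, so a default is used instead of failing.
    phone = row.get("phone") or "no phone"
    return f"{row['name']} <{row['email']}> ({phone})"
```

Had version B instead changed the meaning of an existing field such as `email`, version A would silently misread new rows; that is the case the practice rules out.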
Summary of consistency discussion so far
• Feature toggles are used to maintain
consistency among the instances of an app
• Restricting schema changes to extensions (no modification of
existing fields) maintains consistency between apps and persistent data.
Canary testing
• Canaries are a small number of instances of a new
version placed in production in order to perform live
testing in a production environment.
• Canaries are observed closely to determine whether the
new version introduces any logical or performance
problems. If not, roll out new version globally. If so, roll
back canaries.
• Named after canaries
in coal mines.
Implementation of canaries
• Designate a collection of instances as canaries. They do
not need to be aware of their designation.
• Designate a collection of customers to test the
canaries. The collection can be, for example,
– Organizationally based
– Geographically based
• Then
– Activate feature or version to be tested for canaries.
Can be done through feature activation
synchronization mechanism
– Route messages from canary customers to canaries.
Can be done through making registry/load balancer
canary aware.
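A canary-aware load balancer can be sketched by keeping two pools of instances and choosing the pool from the customer's designation. In this illustration the class name is hypothetical and canary customers are assumed to be designated geographically (by region code); an organizational designation would work the same way with a different key.

```python
from typing import List, Set

class CanaryAwareBalancer:
    """Routes designated canary customers to canary instances (hypothetical sketch)."""

    def __init__(self, regular: List[str], canaries: List[str], canary_regions: Set[str]) -> None:
        self.regular = regular
        self.canaries = canaries              # the instances themselves are unaware of this
        self.canary_regions = canary_regions  # geographically based canary customers
        self._cursor = {"regular": 0, "canary": 0}

    def route(self, customer_region: str) -> str:
        use_canary = customer_region in self.canary_regions and bool(self.canaries)
        name = "canary" if use_canary else "regular"
        pool = self.canaries if use_canary else self.regular
        address = pool[self._cursor[name] % len(pool)]
        self._cursor[name] += 1  # round-robin within the chosen pool
        return address

    def roll_back_canaries(self) -> None:
        # If the canaries show problems, stop routing to them; their customers
        # fall back to the regular pool.
        self.canaries = []
```

Keeping the designation in the load balancer rather than in the instances matches the first bullet: the canary instances run ordinary code and need not know they are canaries.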
A/B testing
• Suppose you wish to test user response to a
system variant, e.g., a UI difference or a marketing
effort. A is one variant and B is the other.
• You simultaneously make available both
variants to different audiences and compare the
responses.
• Implementation is the same as canary testing.
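Assigning users to the two audiences is the one piece A/B testing adds on top of canary routing. A common approach, sketched here under the assumption that users have a stable identifier, is to hash the user and experiment name into a bucket, so each user sees the same variant on every visit. The function name and the experiment label are hypothetical.

```python
import hashlib

def variant_for(user_id: str, experiment: str, fraction_b: float = 0.5) -> str:
    """Deterministically assign a user to variant "A" or "B" of an experiment.

    Hashing (experiment, user) keeps each user in the same variant on every
    visit, so the two audiences stay separate while responses are compared.
    """
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big")  # roughly uniform in [0, 2**64)
    threshold = int(fraction_b * 2**64)         # integer comparison avoids float rounding at the edges
    return "B" if bucket < threshold else "A"
```

Routing then proceeds as for canaries: the load balancer sends variant-B users to instances with B activated, and the responses of the two groups are compared.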
Rollback
• New versions of an app may be unacceptable
either for logical or performance reasons.
• Two options in this case:
– Roll back (undo the deployment)
– Roll forward (discontinue the current deployment and
create a new release without the problem).
• The decision to roll back or roll forward is almost
never automated because there are multiple
factors to consider:
– Forward or backward recovery
– Consequences and severity of the problem
– Importance of the upgrade
Summary
• Speeding up deployment time will reduce time to
market
• Continuous deployment is a technique to speed
up deployment time
• Microservice architecture is designed for
minimizing coordination needs and allowing
independent deployment
• Multiple simultaneous versions are managed with
feature toggles.
• Feature toggles support rollback, canary testing,
and A/B testing.
NICTA team
• Liming Zhu
• Ingo Weber
• Min Fu
• Sherry Xu
• Daniel Sun
• Ah Binh Tran
• Chao Li
More Information
Contact
Book is due out May 2015
Research papers:
ssrg.nicta.com.au/projects/cloud