Deployability

NICTA Copyright 2012 From imagination to impact


Len Bass



About NICTA

National ICT Australia

• Federal and state funded research company established in 2002
• Largest ICT research resource in Australia
• National impact is an important success metric
• ~700 staff/students working in 5 labs across major capital cities
• 22 university partners
• Providing R&D services and knowledge transfer to the Australian (and global) ICT industry

NICTA technology is in over 1 billion mobile phones


“This project is vital to our company. How long will it take?”

Day 1


“It’s taking too long!!!”

Day 30


“You Are Fired!”

Day 60


Where Does the Time Go?

• As software architects, our view is that software development consists of the following activities:
  – Concept
  – Requirements
  – Design
  – Implementation
  – Test
• The end point of these activities is Code Complete.
• Different methodologies organize these activities in different ways.
• Agile focuses on getting to Code Complete faster than other methods do.


What is wrong?

• Code Complete is not the same as Code in Production.
• Between the completion of the code and the placing of the code into production is a step called Deployment.
• Deploying completed code can be very time-consuming.



Why is Deployment so Time-Consuming?

• Errors in deployed code are a major source of outages.
• So much so that organizations have formal release plans.
• There is a position called “Release Engineer” that is responsible for managing releases.



Release plan*

1. Define and agree release and deployment plans with customers/stakeholders.
2. Ensure that each release package consists of a set of related assets and service components that are compatible with each other.
3. Ensure that the integrity of a release package and its constituent components is maintained throughout the transition activities and recorded accurately in the configuration management system.
4. Ensure that all release and deployment packages can be tracked, installed, tested, verified, and/or uninstalled or backed out, if appropriate.
5. Ensure that change is managed during the release and deployment activities.
6. Record and manage deviations, risks, and issues related to the new or changed service, and take necessary corrective action.
7. Ensure that there is knowledge transfer to enable the customers and users to optimise their use of the service to support their business activities.
8. Ensure that skills and knowledge are transferred to operations and support staff to enable them to effectively and efficiently deliver, support and maintain the service, according to required warranties and service levels.

*http://en.wikipedia.org/wiki/Deployment_Plan



Look at one requirement

2. Ensure that each release package consists of a set of related assets and service components that are compatible with each other.
  – Every development team contributing to the release must have completed their code.
  – Every development team must have used the same version of every supporting library.
  – The development teams must have agreed on a common set of supporting technologies.
• Every item requires coordination among developers:
  – Meetings
  – Documents
• I.e., time.
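To make the cost of this one requirement concrete, the check on supporting-library versions can be sketched in a few lines of Python. The team names, library names, and the {team: {library: version}} input format are hypothetical; in practice this information would come from each team's build files. Every conflict the function reports stands for a meeting or a document.

```python
from collections import defaultdict


def find_version_conflicts(
    team_deps: dict[str, dict[str, str]],
) -> dict[str, dict[str, set[str]]]:
    """Return {library: {version: teams}} for each library pinned at more than one version."""
    usage = defaultdict(lambda: defaultdict(set))
    for team, deps in team_deps.items():
        for library, version in deps.items():
            usage[library][version].add(team)
    # Only libraries used at two or more different versions need coordination.
    return {lib: dict(by_version) for lib, by_version in usage.items() if len(by_version) > 1}


# Hypothetical teams contributing to one release.
teams = {
    "search": {"json-lib": "2.1", "http-lib": "1.0"},
    "cart": {"json-lib": "2.3", "http-lib": "1.0"},
}
conflicts = find_version_conflicts(teams)
print(conflicts)  # {'json-lib': {'2.1': {'search'}, '2.3': {'cart'}}}
```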


How to Speed Up Deployment


• Set up a process and an architecture so that development teams do not need to coordinate with each other.
• Support “partial release” deployments.
• This is called continuous deployment.

• Code Complete → Code in Production


Continuous Deployment Pipeline


A developer pushes a button and, as long as all of the automated tests pass, the code is placed into production automatically through a tool chain.
• No coordination with other teams during the execution of the tool chain
• No dependence on other teams’ activities

The ability to have a continuous deployment pipeline depends on the architecture of the system being deployed.
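The “push a button” pipeline can be sketched as a loop over automated stages. This is a minimal Python sketch; the stage names and the deploy callback are placeholders for whatever tool chain an organization actually uses, and no specific CI/CD product is implied.

```python
from typing import Callable

# A stage is a name plus an automated check that returns True on success.
Stage = tuple[str, Callable[[], bool]]


def run_pipeline(stages: list[Stage], deploy: Callable[[], None]) -> tuple[bool, list[str]]:
    """Run the automated stages in order; deploy only if every stage passes."""
    log: list[str] = []
    for name, check in stages:
        passed = check()
        log.append(f"{name}: {'pass' if passed else 'FAIL'}")
        if not passed:
            return False, log  # stop here: nothing reaches production
    deploy()  # no human hand-off and no other team involved
    log.append("deployed to production")
    return True, log


production: list[str] = []
stages: list[Stage] = [
    ("unit tests", lambda: True),
    ("integration tests", lambda: True),
    ("staging smoke test", lambda: True),
]
ok, log = run_pipeline(stages, lambda: production.append("v2"))
```

The first failing stage stops the run, so a broken change never reaches production and nobody outside the team has to be consulted either way.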


Around 2002, Amazon instituted the following design rules - 1

• All teams will henceforth expose their data and functionality through service interfaces.
• Teams must communicate with each other through these interfaces.
• There will be no other form of inter-process communication allowed: no direct linking, no direct reads of another team’s data store, no shared-memory model, no back-doors whatsoever. The only communication allowed is via service interface calls over the network.



Amazon design rules - 2

• It doesn’t matter what technology they [services] use.
• All service interfaces, without exception, must be designed from the ground up to be externalizable.
• Amazon was providing the specifications for what has come to be called “Microservice Architecture”.
• (It’s really an architectural style.)



In Addition

• Amazon has a “two pizza” rule.
• No team should be larger than can be fed with two pizzas (~7 members).
• Each (micro) service is the responsibility of one team.
• This means that microservices are small and intra-team bandwidth is high.
• Large systems are made up of many microservices.
• There may be as many as 140 microservices involved in a typical Amazon page.



Microservice architecture supports continuous deployment

• Two topics:
  – What is microservice architecture?
  – What are the deployment issues and how do I deal with them?



Microservice architecture

• Each user request is satisfied by some sequence of services.
• Most services are not externally available.
• Each service communicates with other services through service interfaces.
• Service depth (the length of a chain of service calls) may be as large as 70, e.g., at LinkedIn.


Relation of teams and services

• Each service is the responsibility of a single development team.
• Individual developers can deploy a new version without coordinating with other developers.
• A single development team may be responsible for multiple services.



Coordination model of microservice architecture

• Elements of service interaction
  – Services communicate asynchronously through message passing.
  – Each service could (in principle) be deployed anywhere on the net.
    • Latency requirements will probably force particular deployment location choices.
    • Services must discover the location of dependent services.
  – State must be managed.



Service discovery


• When an instance of a service is launched, it registers with a registry/load balancer.
• When a client wishes to use a service, it gets the location of an instance from the registry/load balancer.
• Eureka is an open source registry/load balancer.

[Figure: an instance of a service registers with the registry/load balancer; the client queries the registry, then invokes the instance.]
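The register / query / invoke interaction in the figure can be sketched in-process in Python. This is not Eureka's API; the Registry class, the NETWORK dictionary standing in for real network calls, and the service name and address are all hypothetical.

```python
from typing import Callable

Handler = Callable[[str], str]

# Stand-in for the network: maps an address to the code listening there (hypothetical).
NETWORK: dict[str, Handler] = {}


class Registry:
    """Toy in-process registry/load balancer."""

    def __init__(self) -> None:
        self._addresses: dict[str, list[str]] = {}

    def register(self, service: str, address: str) -> None:
        self._addresses.setdefault(service, []).append(address)

    def query(self, service: str) -> str:
        addresses = self._addresses.get(service)
        if not addresses:
            raise LookupError(f"no registered instance of {service!r}")
        return addresses[0]  # simplest possible policy: first registered instance


def launch_instance(registry: Registry, service: str, address: str, handler: Handler) -> None:
    NETWORK[address] = handler           # the instance starts listening...
    registry.register(service, address)  # ...and registers ("Register")


def call(registry: Registry, service: str, request: str) -> str:
    address = registry.query(service)    # "Query registry"
    return NETWORK[address](request)     # "Invoke"


registry = Registry()
launch_instance(registry, "pricing", "10.0.0.5:8080", lambda item: f"price for {item}")
print(call(registry, "pricing", "book"))  # price for book
```

The client never hard-codes an address; it only knows the service name, which is what lets instances come and go independently.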


Subtleties of registry/load balancer

• When multiple instances of the same service have registered, the load balancer can rotate through them to equalize the number of requests sent to each instance.
• Each instance must renew its registration periodically (~90 seconds) so that the load balancer does not route messages to a failed instance.
• The registry can keep other information besides the address of an instance, for example, the version number of the service instance.
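The three subtleties above (rotation, periodic renewal, extra metadata such as a version number) can be sketched together in Python. The 90-second lease simply mirrors the figure on the slide, the injectable clock exists only to make expiry testable, and none of the names are taken from Eureka or any other registry product.

```python
import itertools
import time
from typing import Callable


class LeaseRegistry:
    """Registry sketch: round-robin selection, lease expiry, and per-instance metadata."""

    def __init__(self, lease_seconds: float = 90.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.lease_seconds = lease_seconds
        self.clock = clock
        # service -> {address: (time of last registration/renewal, metadata)}
        self._entries: dict[str, dict[str, tuple[float, dict[str, str]]]] = {}
        self._turns: dict[str, itertools.count] = {}

    def register(self, service: str, address: str, **metadata: str) -> None:
        self._entries.setdefault(service, {})[address] = (self.clock(), metadata)

    def renew(self, service: str, address: str) -> None:
        _, metadata = self._entries[service][address]
        self._entries[service][address] = (self.clock(), metadata)

    def live_instances(self, service: str) -> list[str]:
        # An instance whose lease was not renewed in time is treated as failed.
        now = self.clock()
        entries = self._entries.get(service, {})
        return sorted(a for a, (t, _) in entries.items() if now - t < self.lease_seconds)

    def next_instance(self, service: str) -> str:
        live = self.live_instances(service)
        if not live:
            raise LookupError(f"no live instance of {service!r}")
        turn = next(self._turns.setdefault(service, itertools.count()))
        return live[turn % len(live)]  # rotate to equalize requests

    def metadata(self, service: str, address: str) -> dict[str, str]:
        return self._entries[service][address][1]
```

Passing clock=lambda: now[0] with a mutable list now lets a test advance time past the lease and watch an unrenewed instance drop out of live_instances, while the version stored as metadata remains queryable.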



State management

• Services can be stateless or stateful.
  – Stateless services
    • Allow arbitrary creation of new instances for performance and availability
    • Allow messages to be routed to any instance
    • State must be provided to stateless services
  – Stateful services
    • Require clients to communicate with the same instance
    • Reduce the overhead necessary to acquire state
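The routing consequences of the two choices can be sketched in Python: a stateless service can be served by simple rotation, while a stateful one needs each client pinned to one instance, here by hashing a client identifier. The instance names and the hash-based pinning are illustrative assumptions, not a prescribed mechanism.

```python
import hashlib
import itertools
from typing import Callable


def stateless_router(instances: list[str]) -> Callable[[str], str]:
    """Stateless service: any instance can take any message, so just rotate."""
    ring = itertools.cycle(instances)
    return lambda client_id: next(ring)


def sticky_router(instances: list[str]) -> Callable[[str], str]:
    """Stateful service: a given client always reaches the same instance."""
    def route(client_id: str) -> str:
        digest = hashlib.sha256(client_id.encode()).digest()
        return instances[int.from_bytes(digest[:8], "big") % len(instances)]
    return route


def add_to_cart(cart: list[str], item: str) -> list[str]:
    """Stateless handler: the caller provides the current state and receives the new state."""
    return cart + [item]
```

Hashing modulo the instance count remaps most clients whenever an instance is added or removed; consistent hashing is the usual refinement when that matters.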



Where to keep the state?

• Persistent state is kept in a database.
  – Modern (relational) database management systems provide replication functionality.
  – Some NoSQL systems may be replicated. Others will require manual replication.
• Small amounts of transient state can be kept consistent across instances by using tools such as Memcached or ZooKeeper.
• Instances may cache state for performance reasons. It may be necessary to purge the cache before bringing down an instance.



Provisioning new instances

• When the desired workload of a service is greater than the existing instances of that service can handle, new instances can be instantiated (at runtime).
• Four possibilities for initiating a new instance of a service:
  1. Client. The client determines whether the service is adequately provisioned for its needs, based on the service SLA and the service’s current workload.
  2. Service. The service determines whether it is adequately provisioned, based on the number of requests it expects from clients.
  3. Registry/load balancer. The registry/load balancer determines the appropriate number of instances of a service, based on the SLA and client instance requests.
  4. External entity. An external entity can initiate the creation of new instances.
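Whichever of the four parties initiates provisioning, it needs some rule for how many instances are enough. A common first approximation, assumed here rather than taken from the slides, is N = ceil(expected load × (1 + headroom) / per-instance capacity), where per-instance capacity is the load one instance can carry while still meeting the SLA. A Python sketch with hypothetical parameter names:

```python
import math


def instances_needed(expected_rps: float, rps_per_instance: float, headroom: float = 0.2) -> int:
    """Instances required to serve the expected request rate with spare capacity.

    rps_per_instance: load one instance can carry while still meeting the SLA (assumed measured).
    headroom: extra fraction of capacity kept in reserve (hypothetical policy).
    """
    if rps_per_instance <= 0:
        raise ValueError("rps_per_instance must be positive")
    return max(1, math.ceil(expected_rps * (1 + headroom) / rps_per_instance))


def instances_to_launch(current: int, expected_rps: float, rps_per_instance: float,
                        headroom: float = 0.2) -> int:
    """How many new instances the initiating party should start (never negative)."""
    return max(0, instances_needed(expected_rps, rps_per_instance, headroom) - current)


print(instances_needed(900, 100))        # 11  (900 * 1.2 / 100 = 10.8, rounded up)
print(instances_to_launch(8, 900, 100))  # 3
```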



Questions about Micro SOA

• /Q/ Isn’t it possible that different teams will implement the same functionality, likely differently?
• /A/ Yes, but so what? Major duplications are avoided through the assignment of responsibilities to services. Minor duplications are the price paid to avoid the need for synchronous coordination.
• /Q/ What about transactions?
• /A/ Micro SOA privileges flexibility above reliability and performance. Transactions are recoverable through logging of service interactions. This may introduce some delays if failures occur.



Microservice architecture supports continuous deployment

• Two topics:
  – What is microservice architecture?
  – What are the deployment issues and how do I deal with them?



Deploying a new version of an application

[Figure: multiple instances of an application executing, labeled VA VB VB VB; UAT / staging / performance tests.]
• Red is the service being replaced with a new version
• Blue are clients
• Green are dependent services


Deployment goal and constraints

• The goal of a deployment is to move from the current state (N instances of version A of an app) to a new state (N instances of version B of the app).
• Constraints:
  – Any development team can deploy their app at any time, i.e., a new version of an app can be deployed either before or after a new version of a client (no synchronization among development teams).
  – It takes time to replace one instance of version A with an instance of version B (on the order of minutes).
  – Service to clients must be maintained while the new version is being deployed.



Deployment strategies

• Two basic all-or-nothing strategies
  – Red/Black: leave the N instances with version A as they are, allocate and provision N instances with version B, then switch to version B and release the instances with version A.
  – Rolling Upgrade: allocate one instance, provision it with version B, and release one version A instance. Repeat N times.
• Partial strategies are canary testing and A/B testing.



Trade-offs – Red/Black and Rolling Upgrade

• Red/Black
  – Only one version is available to clients at any particular time.
  – Requires 2N instances (additional costs).
• Rolling Upgrade
  – Multiple versions are available for service at the same time.
  – Requires N+1 instances.
• Rolling upgrade is widely used.
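The instance counts in this trade-off can be checked with a small Python model that tracks only how many instances run each version; it deliberately ignores timing, traffic routing, and failures.

```python
def red_black(n: int) -> list[tuple[int, int]]:
    """States as (instances running A, instances running B) during a Red/Black deployment."""
    return [
        (n, 0),  # current state
        (n, n),  # N instances of B provisioned alongside A; traffic still goes to A
        (0, n),  # switch to B, release A
    ]


def rolling_upgrade(n: int) -> list[tuple[int, int]]:
    """States during a Rolling Upgrade: add one B, release one A, repeat N times."""
    a, b = n, 0
    states = [(a, b)]
    for _ in range(n):
        b += 1  # allocate and provision one instance with version B
        states.append((a, b))
        a -= 1  # release one version A instance
        states.append((a, b))
    return states


def peak_instances(states: list[tuple[int, int]]) -> int:
    return max(a + b for a, b in states)


print(peak_instances(red_black(4)), peak_instances(rolling_upgrade(4)))  # 8 5
```

In the rolling-upgrade states, instances of both versions are serving from the first replacement until the last, which is the source of the consistency concerns that follow.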

Rolling Upgrade in EC2

[Figure: flowchart of a rolling upgrade in EC2. Steps: Update Auto Scaling Group; Sort Instances; Remove & Deregister Old Instance from ELB; Confirm Upgrade Spec; Terminate Old Instance; Wait for ASG to Start New Instance; Register New Instance with ELB.]


Types of failures during rolling upgrade

• Rolling Upgrade Failure
  – Provisioning: see references at end
  – Logical failure: inconsistencies, to be discussed
  – Instance failure: handled by the Auto Scaling Group in EC2



What are the problems with Rolling Upgrade?

• Any development team can deploy their app at any time.
• Three concerns
  – Maintaining consistency between different versions of the same app when performing a rolling upgrade
  – Maintaining consistency among different apps
  – Maintaining consistency between an app and persistent data



Maintaining consistency between different versions of the same app

• Key idea: differentiate between installing a new version and activating a new version.
• Involves “feature toggles” (described momentarily).
• Sequence
  – Develop version B with the new code under control of a feature toggle.
  – Install each instance of version B with the new code toggled off.
  – When all of the instances of version A have been replaced with instances of version B, activate the new code by toggling the feature.
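The install-then-activate sequence can be simulated in a few lines of Python to show why it removes the mixed-version problem: while versions A and B coexist, every instance still exhibits the old behaviour. The feature name and the in-memory toggle dictionary are hypothetical.

```python
FEATURE = "new_checkout"  # hypothetical feature name


def serve(version: str, toggles: dict[str, bool]) -> str:
    """One instance handling a request. Version B contains the new code behind a toggle."""
    if version == "B" and toggles.get(FEATURE, False):
        return "new"
    return "old"  # version A, or version B with the feature toggled off


def upgrade_then_activate(n: int) -> list[set[str]]:
    """Replace N version-A instances one by one, then flip the toggle.

    Returns the set of behaviours clients could observe after each step.
    """
    toggles = {FEATURE: False}
    fleet = ["A"] * n
    observed = []
    for i in range(n):
        fleet[i] = "B"  # install version B with the new code toggled off
        observed.append({serve(v, toggles) for v in fleet})
    toggles[FEATURE] = True  # activate only after every instance runs B
    observed.append({serve(v, toggles) for v in fleet})
    return observed


print(upgrade_then_activate(3))  # [{'old'}, {'old'}, {'old'}, {'new'}]
```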


Issues

• What is a feature toggle?
• How do I manage features that extend across multiple apps?
• How do I activate all relevant instances at once?



Feature toggle

• Place feature-dependent new code inside an “if” statement so that the code is executed only if an external variable is true. The code being replaced goes in the “else” portion.
• Used to allow developers to check in uncompleted code. Uncompleted code is toggled off.
• During deployment, new code is not executed until it is activated.
• Removing feature toggles once a new feature has been committed is important.
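A minimal Python sketch of the “if” statement described above. It assumes the external variable is an environment variable named FEATURE_<NAME>; real systems usually read toggles from a configuration service instead, and the free-shipping feature is a made-up example.

```python
import os


def feature_enabled(name: str) -> bool:
    """Read a toggle from an external variable: a hypothetical FEATURE_<NAME> environment variable."""
    value = os.environ.get(f"FEATURE_{name.upper()}", "off")
    return value.strip().lower() in {"1", "true", "on"}


def shipping_cost(order_total: float) -> float:
    if feature_enabled("free_shipping_over_50"):
        # New, feature-dependent code: runs only when the toggle is on.
        return 0.0 if order_total >= 50 else 5.0
    else:
        # Old code that the feature replaces; delete it (and the toggle) once the feature is committed.
        return 5.0
```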



Multi-app features

• Most features will involve multiple apps.
• Each app has some code under control of a feature toggle.
• Activate the feature when all instances of all apps involved in the feature have been installed.
  – Maintain a catalog of features vs. service version numbers.
  – A feature toggle manager determines when all old instances of each version have been replaced. This could be done using the registry/load balancer.
  – The feature toggle manager activates the feature.
  – Archaius is an open source feature toggle manager.
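The catalog and readiness check can be sketched in Python. Integer version numbers, the running snapshot (which, as the slide suggests, could be read from the registry/load balancer), and the feature and service names are assumptions; this is not the Archaius API.

```python
class FeatureToggleManager:
    """Sketch of a feature toggle manager driven by a feature-vs-version catalog.

    catalog: feature -> {service: first version containing that feature's code}
    running: service -> versions of its live instances (e.g., read from the registry)
    """

    def __init__(self, catalog: dict[str, dict[str, int]]) -> None:
        self.catalog = catalog
        self.active: set[str] = set()

    def ready(self, feature: str, running: dict[str, list[int]]) -> bool:
        for service, first_version in self.catalog[feature].items():
            versions = running.get(service, [])
            # Every live instance of every involved service must already have the code.
            if not versions or min(versions) < first_version:
                return False
        return True

    def try_activate(self, feature: str, running: dict[str, list[int]]) -> bool:
        if feature not in self.active and self.ready(feature, running):
            self.active.add(feature)  # from here on, instances should see the toggle as on
        return feature in self.active
```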



Activating a feature

• The feature toggle manager changes the value of the feature toggle. There are two possible techniques to get the new value to the instances.
  – Push. Broadcasting the new value instructs each instance to use the new code. If a lag of several seconds between the first service to be toggled and the last can be tolerated, there is no problem. Otherwise, the value must be synchronized across the network.
  – Pull. Having each instance query the manager for the latest value may cause performance problems.
• A coordination mechanism such as ZooKeeper will overcome both problems.
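The watch style of notification that a coordination service offers can be sketched in-process in Python: each instance subscribes once, caches the value locally, and is notified on change, so nothing is queried on every request. The ToggleStore class is a hypothetical stand-in that mimics only the watch idea; it is not the ZooKeeper API.

```python
from typing import Callable


class ToggleStore:
    """In-process stand-in for a coordination service such as ZooKeeper.

    Instances subscribe once and are notified on every change instead of
    querying the manager on each request.
    """

    def __init__(self) -> None:
        self._values: dict[str, bool] = {}
        self._watchers: dict[str, list[Callable[[bool], None]]] = {}

    def watch(self, key: str, callback: Callable[[bool], None]) -> None:
        self._watchers.setdefault(key, []).append(callback)
        callback(self._values.get(key, False))  # deliver the current value immediately

    def set(self, key: str, value: bool) -> None:
        self._values[key] = value
        for callback in self._watchers.get(key, []):
            callback(value)


class ServiceInstance:
    def __init__(self, store: ToggleStore, feature: str) -> None:
        self.new_code_on = False
        store.watch(feature, self._on_change)  # cached locally; no lookup per request

    def _on_change(self, value: bool) -> None:
        self.new_code_on = value
```

A real deployment adds what the sketch omits: notifications travel over the network and can be delayed, and classic ZooKeeper watches are one-time triggers that a client must re-register after each notification.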



Maintaining consistency across versions (summary)

• Install all instances before activating any new code.
• Use feature toggles to activate new code.
• Use a feature toggle manager to determine when to activate new code.
• Use ZooKeeper to coordinate activation with low overhead.



Maintaining consistency among different services

• Use case:
  – Wish to deploy a new version of app A without coordinating with the development teams for clients of app A.
    • I.e., the new version of app A should be backward compatible in terms of its interfaces.
    • It may also require forward compatibility in certain circumstances, e.g., rollback.
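One common way to keep a new interface backward compatible is the “tolerant reader” style: new fields are optional with defaults, and unknown fields are ignored. This Python sketch uses a hypothetical quote request; the field names are illustrative.

```python
from dataclasses import dataclass


@dataclass
class QuoteRequest:
    """Request accepted by the new version of app A (hypothetical interface)."""
    item_id: str
    quantity: int = 1      # added in the new version; defaulted so old clients still work
    currency: str = "USD"  # added in the new version; defaulted for the same reason


KNOWN_FIELDS = ("item_id", "quantity", "currency")


def parse_quote_request(payload: dict) -> QuoteRequest:
    """Accept payloads from old and new clients alike.

    Missing new fields take defaults (backward compatibility). Unknown fields are
    ignored, which is what lets an older version keep working if it receives a
    payload from a newer client, e.g., after a rollback (forward compatibility).
    """
    return QuoteRequest(**{k: payload[k] for k in KNOWN_FIELDS if k in payload})
```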



Maintaining consistency between an app and persistent data

• Assume the new version is correct. Rollback is discussed in a minute.
• Inconsistency in persistent data can come about because the data schema or semantics change.
• The effect can be minimized by the following practices (if possible):
  – Only extend the schema; do not change the semantics of existing fields. This preserves backward compatibility.
  – Treat schema modifications as features to be toggled. This maintains consistency among the various apps that access the data.
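The “only extend the schema” rule can be illustrated with Python's built-in sqlite3 module: after a column is added with a default, inserts written for the old schema still succeed. The table and column names are hypothetical, and other databases differ in how cheaply they add columns.

```python
import sqlite3


def run_demo() -> list[tuple[str, str]]:
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")

    def insert_v_a(name: str) -> None:
        # Version A code knows only the original columns.
        conn.execute("INSERT INTO customer (name) VALUES (?)", (name,))

    def insert_v_b(name: str, tier: str) -> None:
        # Version B code uses the new column.
        conn.execute("INSERT INTO customer (name, loyalty_tier) VALUES (?, ?)", (name, tier))

    insert_v_a("Ada")
    # Extend only: add a column with a default; existing columns keep their meaning.
    conn.execute("ALTER TABLE customer ADD COLUMN loyalty_tier TEXT NOT NULL DEFAULT 'none'")
    insert_v_a("Alan")  # old instances keep working during the rolling upgrade
    insert_v_b("Grace", "gold")
    rows = conn.execute("SELECT name, loyalty_tier FROM customer ORDER BY id").fetchall()
    conn.close()
    return rows


print(run_demo())  # [('Ada', 'none'), ('Alan', 'none'), ('Grace', 'gold')]
```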



Summary of consistency discussion so far

• Feature toggles are used to maintain consistency within instances of an app.
• Disallowing modification of the existing schema (only extending it) will maintain consistency between apps and persistent data.



Canary testing

• Canaries are a small number of instances of a new version placed in production in order to perform live testing in a production environment.
• Canaries are observed closely to determine whether the new version introduces any logical or performance problems. If not, roll out the new version globally. If so, roll back the canaries.
• Named after canaries in coal mines.



Implementation of canaries

• Designate a collection of instances as canaries. They do not need to be aware of their designation.
• Designate a collection of customers as testing the canaries. This can be, for example:
  – Organizationally based
  – Geographically based
• Then
  – Activate the feature or version to be tested for the canaries. This can be done through the feature activation synchronization mechanism.
  – Route messages from canary customers to canaries. This can be done by making the registry/load balancer canary aware.
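A canary-aware registry/load balancer can be sketched in Python as two pools plus a customer-selection rule. The Customer fields, the region code, and the addresses are hypothetical; the instances are just addresses and never learn that they are canaries.

```python
import itertools
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Customer:
    customer_id: str
    region: str
    organization: str


class CanaryAwareBalancer:
    """Routes canary customers to canary instances and everyone else to the stable pool."""

    def __init__(self, stable: list[str], canaries: list[str],
                 is_canary_customer: Callable[[Customer], bool]) -> None:
        self._stable = itertools.cycle(stable)
        self._canaries = itertools.cycle(canaries)
        self._is_canary_customer = is_canary_customer

    def route(self, customer: Customer) -> str:
        pool = self._canaries if self._is_canary_customer(customer) else self._stable
        return next(pool)


# Geographically based selection (hypothetical region code).
balancer = CanaryAwareBalancer(
    stable=["10.0.1.1", "10.0.1.2"],
    canaries=["10.0.9.1"],
    is_canary_customer=lambda c: c.region == "nz",
)
```

An organizationally based selection only changes the rule, e.g. lambda c: c.organization == "internal"; the routing code stays the same.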



A/B testing

• Suppose you wish to test user response to a system variant, e.g., a UI difference or a marketing effort. A is one variant and B is the other.
• You simultaneously make both variants available to different audiences and compare the responses.
• Implementation is the same as canary testing.
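Routing for A/B testing can reuse the canary machinery; what differs is how the audience is chosen. A common approach, assumed here rather than taken from the slides, hashes a user identifier so that each user consistently sees the same variant. The experiment and user names are hypothetical.

```python
import hashlib


def ab_variant(user_id: str, experiment: str, b_fraction: float = 0.5) -> str:
    """Deterministically assign a user to variant "A" or "B" for one experiment.

    Hashing (experiment, user) keeps each user in the same variant on every
    request, and different experiments split users differently.
    """
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode()).digest()
    # 48 bits convert to a float exactly, so bucket lies in [0, 1).
    bucket = int.from_bytes(digest[:6], "big") / 2**48
    return "B" if bucket < b_fraction else "A"
```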



Rollback

• New versions of an app may be unacceptable for either logical or performance reasons.
• Two options in this case:
  – Roll back (undo the deployment)
  – Roll forward (discontinue the current deployment and create a new release without the problem)
• The decision to roll back or roll forward is almost never automated because there are multiple factors to consider:
  – Forward or backward recovery
  – Consequences and severity of the problem
  – Importance of the upgrade


Summary

• Speeding up deployment will reduce time to market.
• Continuous deployment is a technique to speed up deployment.
• Microservice architecture is designed to minimize coordination needs and allow independent deployment.
• Multiple simultaneous versions are managed with feature toggles.
• Feature toggles support rollback, canary testing, and A/B testing.



NICTA team

• Liming Zhu
• Ingo Weber
• Min Fu
• Sherry Xu
• Daniel Sun
• Ah Binh Tran
• Chao Li



More Information

Contact
[email protected]

Book is due out in May 2015

Research papers: ssrg.nicta.com.au/projects/cloud

