Upgrading Microservices (Continuously...)
Presented by Rean Griffith
Summary
● Upgrades should be:
  ○ “Boring”
  ○ Uneventful
  ○ Predictable
  ○ Frequent!
  ○ Reversible (think compensating actions and redo vs. db transactions)
● Good news: many layers of patterns to use (bad news: many anti-patterns)
  ○ Some patterns are practice, others are structural (make it easy to do “the right thing”)
● Working definition of “upgrade” includes:
  ○ Changing how/where your microservices are deployed (new base image, new kernel, new physical/virtual machine, new configurations/ports, etc.)
  ○ Changing what is deployed (new features, new feature variants)
Agenda
● Bio
● Microservice upgrade considerations
● What customers want
● Upgrade challenges
● Options customers have
  ○ Upgrade options for microservices running in containers
    ■ Pros and cons of each approach
  ○ Upgrade options for microservices running in unikernels
    ■ Pros and cons of each approach
    ■ Can we learn from the Docker upgrade experience and apply it to unikernel microservices?
● Summary
Bio
● Operating Systems + Distributed Systems + ML person
  ○ Operating systems, resource management, cluster scheduling, machine learning
  ○ 5 years in the VMware CTO Office (network-aware DRS, network resource management, autoscaling systems, data-mining VM telemetry + anomaly detection)
  ○ 2 years as a postdoc in the RAD Lab at UC Berkeley (machine learning + systems, OpenFlow, datacenter transport)
  ○ Ph.D. in Computer Science - Columbia University
  ○ B.Sc. in Computer Science and Management - University of the West Indies (Barbados)
Microservices Upgrade Considerations
● Important to consider upgrade logistics early
  ○ Successful upgrades are a mix of design/architecture and process
  ○ Using technology ‘X’ won’t compensate for poor design, fuzzy boundaries, or poor state management; you won’t automatically get rolling upgrades, etc. without forethought!
  ○ Upgrading a single application feature might require changes to multiple microservices
● How upgrades differ from a monolith
  ○ Online (perhaps temporarily degraded) vs. offline expectations
  ○ Higher frequency of upgrades anticipated
  ○ More outbound dependencies (e.g., DNS, storage, other/external services, etc.)
  ○ If using containers, factor in Docker registry dependencies, security, and latency
Typical Upgrade Experience?
● Kubernetes Operations (kops) anecdote
What Customers Want
● A “docker restart” with an updated image
● Zero-downtime upgrades in production
● Faster post-upgrade bring-up of new services/instances (batch DNS updates)
● A smooth cluster-creation process
● Better interaction with networking
Continuous Upgrades
[Diagram: the upgrade cycle - Do Upgrade/Update → Validate → Undo! (oops)]
Continuous Upgrades: (Some) Challenges
[Diagram: the Do Upgrade/Update → Validate → Undo! (oops) cycle, annotated with failure modes:]
● Config, component, or dependency mismatch
● Timeouts; failed upgrade - unable to roll back
● Post-upgrade version is buggy, or a new CVE (vulnerability) is found
● Post-upgrade performance is worse
Agenda
● Microservice upgrade considerations
● What customers want
● Upgrade challenges
● Options customers have
  ○ Upgrade options for microservices running in containers
    ■ Pros and cons of each approach
  ○ Upgrade options for microservices running in unikernels
    ■ Pros and cons of each approach
    ■ Can we learn from the Docker upgrade experience and apply it to unikernel microservices?
● Summary
Container Upgrade - Option 1: Manual
● Demo
● Pros
  ○ Simple
● Cons (pain points)
  ○ Prone to manual mistakes
    ■ Could result in the server failing to come up after the upgrade
    ■ Could result in dropped client connections and/or data loss in the case of a stateful application
  ○ Not scalable
Container Upgrade - Option 2: Watchtower
● Monitors running Docker containers
  ○ Pulls new images when changes are detected and restarts the container using the new image
  ○ Image restarted using “...the same options that were used when it was deployed initially”
● Demo
● Pros
  ○ A new image push triggers the workflow
  ○ Detects links between containers and starts/stops them “...in a way that won't break any of the links”
● Cons (pain points)
  ○ Assumes new start options = old/initial start options
  ○ No validation of the image and its config options post-upgrade
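The behavior above can be sketched in a few lines. This is a hypothetical simulation (the class and function names are illustrative, not Watchtower's API; Watchtower itself talks to the Docker daemon): a poller compares the running container's image digest with the registry's and, on change, re-creates the container with its *original* start options, which is exactly the limitation listed under the cons.

```python
# Hypothetical sketch of Watchtower-style monitoring (illustrative names).
class Container:
    def __init__(self, name, digest, start_options):
        self.name = name
        self.digest = digest
        self.start_options = start_options  # captured at initial deploy

def poll_once(container, registry_digest):
    """Restart (simulated) if the registry holds a different image digest."""
    if registry_digest != container.digest:
        # Re-created with the same options used at initial deployment --
        # any new required options are silently missed.
        return Container(container.name, registry_digest,
                         container.start_options)
    return container  # unchanged: no new image was pushed

c = Container("web", "sha256:aaa", {"port": 8080})
c2 = poll_once(c, "sha256:bbb")
print(c2.digest, c2.start_options)  # new image, but the *old* start options
```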
Container Upgrade - Option 3: Git hook (resin.io)
Linux containers for IoT (Yocto Linux + Resin Container Engine)
Pros
● Automates the dev push → image build → deploy pipeline
Cons (pain points)
● Root-causing upgrade interruptions is hard
● IoT devices only
● Device control via Yocto Linux + RCE
● Incompatible with git submodules
Continuous Upgrades: Missing Upgrade Workflows
[Diagram: the Do Upgrade/Update → Validate → Undo! (oops) cycle, with tool labels on its steps: “Manual upgrade, Watchtower, resin.io” and “resin.io”]
Ex: Patterns to Integrate to Capture Continuous Upgrades
● Immutable Server
  ○ A deployed instance is carved in “stone”; config changes => a new deployed instance
● Blue/Green Deployments
  ○ Separate infrastructure for different “versions”/deployments
● Canary Release
  ○ Introduce new functionality incrementally (different from an A/B test)
● Monitoring (upgrade validation)
● Response Diffing (upgrade validation)
  ○ Validate old and new service versions via (automated) response comparisons (e.g., using Diffy)
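The blue/green pattern above can be sketched as a toy router (all names are illustrative; real implementations flip a load balancer or DNS entry, not a Python object): two deployments exist at once, upgrades land on the inactive side, and rollback is just flipping the pointer back.

```python
# Hypothetical sketch of a blue/green switchover (illustrative names).
class BlueGreenRouter:
    def __init__(self, blue_version, green_version):
        self.deployments = {"blue": blue_version, "green": green_version}
        self.active = "blue"  # all traffic goes to the active colour

    def inactive(self):
        return "green" if self.active == "blue" else "blue"

    def deploy(self, new_version):
        # Upgrade the *inactive* side, then switch traffic to it.
        side = self.inactive()
        self.deployments[side] = new_version
        self.active = side

    def rollback(self):
        # The previous version is still running on the now-inactive side.
        self.active = self.inactive()

router = BlueGreenRouter("v1", "v1")
router.deploy("v2")
print(router.deployments[router.active])  # "v2" now serves traffic
router.rollback()
print(router.deployments[router.active])  # back to "v1", still running
```

Because the old version keeps running on the inactive side, the “Undo! (oops)” step of the cycle costs one pointer flip instead of a re-deploy.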
Response Diffing with Diffy (Twitter)
● Primary, Secondary run the “last known good” code
● Candidate runs the new code
● Compare #Primary-Secondary differences with #Primary-Candidate differences
● Noise example: candidate, primary, and secondary all disagree
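The comparison above can be sketched as follows. This is a simplified illustration of the idea, not Diffy's actual (structural, statistical) comparison: any field on which primary and secondary already disagree is nondeterministic noise, and is excluded when diffing primary against the candidate.

```python
# Simplified sketch of Diffy-style noise filtering (illustrative names).
def differing_fields(a, b):
    """Keys on which two response dicts disagree."""
    return {k for k in a.keys() | b.keys() if a.get(k) != b.get(k)}

def candidate_regressions(primary, secondary, candidate):
    noise = differing_fields(primary, secondary)  # nondeterministic fields
    raw = differing_fields(primary, candidate)    # all candidate deviations
    return raw - noise                            # likely-real regressions

primary   = {"status": 200, "total": 42, "served_by": "host-a"}
secondary = {"status": 200, "total": 42, "served_by": "host-b"}  # noise
candidate = {"status": 200, "total": 41, "served_by": "host-c"}

print(candidate_regressions(primary, secondary, candidate))  # {'total'}
```

`served_by` differs everywhere, so it is filtered as noise; only the `total` mismatch surfaces as a suspected regression in the new code.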
Revisit Watchtower Container Upgrade with Patterns
● Immutable Server
  ○ Each git push builds a new image
● Blue/Green Deployments
  ○ New images deployed on a new set of instances
● Canary Release
  ○ Introduce new functionality incrementally into the newly active deployment
● Response Diffing (upgrade validation)
  ○ Validate old and new service versions via (automated) response comparisons in the newly active deployment (e.g., using Diffy)
[Diagram: active deployment + canaries + previous version; inactive deployment w/ previous version]
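The canary step above can be sketched as a staged traffic ramp. This is a hypothetical illustration (the stage percentages, error budget, and function names are all invented for the example): traffic shifts to the new version in stages, and any unhealthy stage sends everything back to the old version.

```python
# Hypothetical sketch of a canary ramp (illustrative numbers and names).
def canary_rollout(stages, error_rate_at, budget=0.01):
    """Return the final traffic share on the new version, given a callable
    reporting the observed error rate at each traffic share."""
    shipped = 0.0
    for share in stages:                    # e.g. 1% -> 10% -> 50% -> 100%
        if error_rate_at(share) > budget:   # canary unhealthy: roll back
            return 0.0
        shipped = share                     # healthy: advance the ramp
    return shipped

# Healthy canary: rollout completes.
print(canary_rollout([0.01, 0.10, 0.50, 1.0], lambda s: 0.001))  # 1.0
# Canary starts failing at 50% traffic: everything returns to the old version.
print(canary_rollout([0.01, 0.10, 0.50, 1.0],
                     lambda s: 0.05 if s >= 0.50 else 0.001))    # 0.0
```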
Revisit resin.io Container Upgrade with Patterns
● Immutable Server
  ○ Each git push builds a new image
● Blue/Green Deployments
  ○ May not be applicable unless a new deployment = a new set of drones (edge devices)
● Canary Release
  ○ Likely more applicable: introduce new functionality incrementally into the newly active deployment. Should preserve the previous container image so there’s a rollback story!
● Monitoring (upgrade validation)
[Diagram: active deployment + canaries + previous version]
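The “rollback story” the canary bullet asks for can be sketched like this (a hypothetical simulation; the `Device` class and image tags are invented for the example, and a real edge device would swap container images rather than strings): keep the previous image around so a failed post-upgrade validation can be undone on the device itself.

```python
# Hypothetical sketch: preserve the previous image for rollback.
class Device:
    def __init__(self, image):
        self.image = image
        self.previous = None

    def upgrade(self, new_image, validate):
        self.previous = self.image          # preserve the rollback target
        self.image = new_image
        if not validate(self.image):        # post-upgrade validation failed
            self.image, self.previous = self.previous, None
            return False                    # rolled back to the old image
        return True

d = Device("app:v1")
d.upgrade("app:v2", validate=lambda img: False)  # bad upgrade
print(d.image)                                   # still "app:v1"
d.upgrade("app:v3", validate=lambda img: True)   # good upgrade
print(d.image)                                   # now "app:v3"
```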
Agenda
● Microservice upgrade considerations
● What customers want
● Upgrade challenges
● Options customers have
  ○ Upgrade options for microservices running in containers
    ■ Pros and cons of each approach
  ○ Upgrade options for microservices running in unikernels
    ■ Pros and cons of each approach
    ■ Can we learn from the Docker upgrade experience and apply it to unikernel microservices?
● Summary
Upgrading Microservices in Unikernels
● Unikernel (working definition)
  ○ Single-purpose (single-process) virtual appliance (multi-threading available)
  ○ Statically linked image of your application that runs directly on a hypervisor (no general-purpose OS or extra library code)
  ○ No extraneous services, no (full-fledged) shell, no fork() facility to start a second process
● Small form-factor deployments
  ○ Well-suited for storage-constrained edge deployments
● Some best practices baked in
  ○ A unikernel is a purpose-built, targeted virtual appliance (immutable server)
  ○ Config baked into the image (role also set in stone)
  ○ Blue/Green-friendly (each instance is a new VM)
Unikernel Upgrades with Patterns
● Immutable Server
  ○ Must build a new statically linked virtual appliance on each dev change
● Blue/Green Deployments
  ○ New virtual appliance images launched as new VMs
● Canary Release
  ○ Slightly modified VM images launched in the newly active deployment
● Response Diffing (upgrade validation)
  ○ Validate old and new service versions via (automated) response comparisons in the newly active deployment (e.g., using Diffy)
[Diagram: active deployment + canaries + previous version; inactive deployment w/ previous version]
Unikernel Upgrade Story (Node.js + OSv)
● Node.js 4.1.1 + App + OSv
  ○ Upgrade to Node.js 4.6.1 (rebuild + run => worked)
  ○ Upgrade to Node.js 6.9.1 (rebuild + run => worked most of the time)
  ○ Upgrade to Node.js 7.0.0 (rebuild + run => worked most of the time - same issue)
  ○ Short-term fix: roll back to the 4.6.1 image. Long-term fix: fix the OSv pthread_mutex_trylock wrapper
Summary
● We want upgrades to be routine (boring) and frequent
● Working towards continuous upgrades requires a combination of design/architecture and process
● Many tools capture upgrade steps but not the higher-level desirable workflows
  ○ Combining patterns/lessons from deploying containers can help capture these workflows
  ○ These patterns can be applied to both container and unikernel deployments
Acknowledgements
● Special thanks to:
  ○ You (the audience) for your time and attention
  ○ Cisco (our meetup hosts)
  ○ Jean-Paul Calderone
  ○ Erika Ghose, DJ
  ○ Madhuri Yechuri
● Contact info:
  ○ [email protected]