Page 1
Opportunities & Challenges in Adopting Microservice Architecture for Enterprise Workloads
Shriram Rajagopalan, Priya Nagpurkar, Tamar Eilam, and Hani Jamjoom (IBM Watson Research)
Etai Lev-Ran and Vita Bortnikov (IBM Research, Haifa)
Frank Budinsky (IBM)
Contact: [email protected]
Page 2
• Emergence of microservices & DevOps
• Challenges & Opportunities
• Adopting an SDN perspective of microservices
• Version/Content-aware routing
• Systematic resilience testing
Page 4
From Monoliths to Microservices
Monolithic service instances
• A single service serves multiple purposes
• Tight coupling across services
Microservice instances
• Each service serves a single purpose (functionality)
• Many loosely coupled microservices communicate over the network
[Figure label: Well-defined API]
Page 5
From Waterfall to DevOps
• Waterfall: Plan → Develop → Test → Deploy, over months to years
  – Features, performance improvements, bug fixes, etc., are periodically delivered as one big update
• DevOps: repeated short Plan → Develop → Test → Deploy cycles, each taking hours to days
  – Continuous delivery of incremental updates
• Culture + Automation + Instrumentation
  – Emphasizes constant experimentation & feedback-driven development
Page 6
Microservices + DevOps
• Polyglot applications with loosely coupled microservices
• Small “two pizza” teams per microservice
  – Autonomy & accountability
  – Own the roadmap for the feature/service
  – Independent launch schedules
• Develop, deploy, scale
  – “You build it, you run it”
• 10s to 100s of deployments a day across the application
  – E.g., Orbitz, GrubHub, HubSpot
• Multiple versions co-exist simultaneously
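[Figure: application architecture. Users reach an Application composed of microservices A, B, C, D, D’, F, …, written in Ruby, Node.js, Go, and Java. Also shown: Cloud Platform Services (RDBMS, Message Bus, NoSQL, …) and 3rd-party Internet Services (Social Media, Mobile Push Notification).]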
Page 7
“Traditional” Enterprises are moving or have moved to Microservices + DevOps
Page 9
Opportunities
• Enterprises are
  – Re-architecting legacy applications to microservice architecture
  – Developing in-house platforms to host sensitive apps on premise
    • E.g., Fidelity’s Mako
  – Still experimenting with different design alternatives
  – Heavily leveraging open-source technologies
• Opportunity for the research community to engage
  – Influence infrastructure & application design
  – Integrate ideas into open-source platforms and solutions
Page 10
Challenges
• 10s to 100s of deployments a day across the application
• Multiple versions co-exist simultaneously
• Complexity shifted to the network and orchestration across services
• Cascading failures despite the microservices being designed for failure
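[Figure: application architecture. Users reach an Application composed of microservices A, B, C, D, D’, F, …, written in Ruby, Node.js, and Go. Also shown: Cloud Platform Services (RDBMS, Message Bus, NoSQL, …) and 3rd-party Internet Services (Facebook, Mobile Push Notification).]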
Page 11
Ad-hoc Designs & Implementations
• Two options:
  • Adopt open-source frameworks from large-scale Internet applications (e.g., Netflix OSS)
    • These frameworks are point solutions that fit the needs & environment of the companies that operate these applications (e.g., Java-only support)
  • Shoehorn the service-oriented web application into clustering frameworks like Kubernetes, Marathon, etc., and write ad-hoc tools on top to control the microservices
Page 13
Microservice Application Requirements
• Integration
  – Service registration & discovery
  – Load balancing of requests across microservice instances
• Version & content-aware routing
  – Hypothesis-driven development (i.e., A/B testing)
  – Canary deployments (feature release to % of users)
  – Red/Black deployments (gradual rollout to all users)
  – Etc.
• Operational testing in production
  – E.g., does failure recovery work as expected?
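The integration requirements above (registration, discovery, load balancing) can be illustrated with a minimal in-memory sketch. This is not Amalgam8's registry or its API: the class name `ServiceRegistry`, its methods, and the round-robin policy are all assumptions chosen for illustration, and a real registry would also need heartbeats, expiry, and network access.

```python
import itertools
from collections import defaultdict


class ServiceRegistry:
    """Hypothetical toy registry: service name -> list of instance addresses."""

    def __init__(self):
        self._instances = defaultdict(list)
        self._cursors = {}  # per-service round-robin iterators

    def register(self, service, address):
        """Add an instance; reset the cursor so the new instance is included."""
        if address not in self._instances[service]:
            self._instances[service].append(address)
            self._cursors.pop(service, None)

    def deregister(self, service, address):
        """Remove an instance if present; reset the cursor."""
        if address in self._instances.get(service, []):
            self._instances[service].remove(address)
            self._cursors.pop(service, None)

    def discover(self, service):
        """Return a copy of the currently registered instances."""
        return list(self._instances.get(service, []))

    def pick(self, service):
        """Load-balance across instances with simple round-robin (an assumption)."""
        instances = self.discover(service)
        if not instances:
            raise LookupError(f"no instances registered for {service!r}")
        if service not in self._cursors:
            self._cursors[service] = itertools.cycle(instances)
        return next(self._cursors[service])


registry = ServiceRegistry()
registry.register("reviews", "10.0.0.1:9080")
registry.register("reviews", "10.0.0.2:9080")
first, second = registry.pick("reviews"), registry.pick("reviews")
```

In this sketch a caller asks the registry for an instance on every request, so instances that register or deregister are picked up on the next call.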
Page 14
Introducing Amalgam8
• Observation:
  – Microservices interact only over the network, predominantly using HTTP(S)
  – Existing solutions lack the ability to dynamically control the routing of requests between two microservices
• Insight:
  – Think of requests as packets and microservices as switches
  – A Layer-7 SDN will simplify integration and routing
• Design:
  – Sidecar: A programmable Layer-7 proxy process attached to each microservice
  – Controller: The equivalent of an SDN controller, except at Layer 7
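The sidecar/controller split above can be sketched roughly as follows, assuming a controller that pushes a routing table to every attached sidecar, much as an SDN controller programs switches. The names `Sidecar`, `Controller`, `set_route`, and the "service:version" return format are hypothetical and do not describe Amalgam8's actual interfaces.

```python
from dataclasses import dataclass, field


@dataclass
class Sidecar:
    """Hypothetical Layer-7 proxy attached to one microservice (the 'switch')."""
    owner: str
    rules: dict = field(default_factory=dict)  # destination service -> version

    def apply_rules(self, rules):
        """Replace the local routing table with the controller's copy."""
        self.rules = dict(rules)

    def route(self, destination, default_version="v1"):
        """Resolve an outbound request to a versioned backend."""
        version = self.rules.get(destination, default_version)
        return f"{destination}:{version}"


class Controller:
    """Hypothetical control plane: holds rules and pushes them to sidecars."""

    def __init__(self):
        self._rules = {}
        self._sidecars = []

    def attach(self, sidecar):
        """Register a sidecar and bring it up to date immediately."""
        self._sidecars.append(sidecar)
        sidecar.apply_rules(self._rules)

    def set_route(self, destination, version):
        """Change routing for a destination and push it to all sidecars."""
        self._rules[destination] = version
        for sidecar in self._sidecars:
            sidecar.apply_rules(self._rules)


controller = Controller()
sidecar_a = Sidecar(owner="A")
controller.attach(sidecar_a)
controller.set_route("B", "v2")   # all calls from A to B now go to B v2
target = sidecar_a.route("B")
```

The point of the sketch is that the application code in A never changes: routing decisions live in the sidecar's table, which the controller can rewrite at run time.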
Page 15
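Simplifying Integration
[Figure: requests flow through a data plane with tenant apps (microservices A, A', B, B’, C and their sidecars) running on Kubernetes, Marathon, Swarm, VMs, or bare metal. A multi-tenant control plane (Controller, Service Registry, API) serves Tenant 1, Tenant 2, Tenant 3, …]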
Page 17
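Version Routing
[Figure: requests flow through a data plane with tenant apps (microservices A, A', B, B’, C and their sidecars) running on Kubernetes, Marathon, Swarm, VMs, or bare metal. A multi-tenant control plane (Controller, Registry, API) serves Tenant 1, Tenant 2, Tenant 3, …; built on its API are Analytics, Canary deployments, Red/Black deploy…, and Active Deploy.]
• Example rules: “upgrade from B to B’”; “Send 35% of iPhone traffic to A’ and 65% to A”
• Auto-rollback if B’ fares poorly compared to B, with a given confidence measure (Ref. to Canary Advisor, ISSTA 2015)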
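A rule such as "send 35% of iPhone traffic to A' and 65% to A" combines a content match with a weighted version choice. The sketch below shows one way a sidecar could evaluate such a rule; matching on the `User-Agent` header, the `RouteRule` type, and the function `choose_version` are assumptions for illustration and not Amalgam8's rule format.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class RouteRule:
    """Hypothetical rule: if `pattern` occurs in `header`, split by weight."""
    header: str
    pattern: str
    backends: tuple  # ((version, weight), ...)

    def matches(self, headers):
        """Case-insensitive substring match on one request header."""
        return self.pattern.lower() in headers.get(self.header, "").lower()


def choose_version(headers, rules, default, rng=random):
    """Return the version for one request; the first matching rule wins."""
    for rule in rules:
        if rule.matches(headers):
            versions = [version for version, _ in rule.backends]
            weights = [weight for _, weight in rule.backends]
            return rng.choices(versions, weights=weights, k=1)[0]
    return default  # requests that match no rule stay on the default version


# 35% of iPhone traffic to A', 65% to A; everything else goes to A.
IPHONE_RULE = RouteRule(
    header="User-Agent",
    pattern="iPhone",
    backends=(("A'", 35), ("A", 65)),
)

version = choose_version({"User-Agent": "Mozilla/5.0 (iPhone)"}, [IPHONE_RULE], "A")
```

The choice here is made independently per request. A deployment that needs a given user to keep seeing the same version would instead hash a stable identifier (for example a session cookie) into the weighted buckets.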
Page 19
Resilience Testing
• Microservices designed but “seldom” tested for failures
• Randomized fault injection (e.g., Netflix Chaos Monkey) is insufficient
  – Manual effort to validate whether application recovered properly or not
• Gremlin: systematic resilience testing
  – Script failure scenarios and expectations
  – Faults injected from the network
  – Run assertions on the logs to validate expectations
  – Exposes faulty recovery behavior, conflicting failure handling policies across services, etc.
Page 20
Fault Injection
[Figure: requests flow through a data plane with tenant apps (microservices A, A', B, B’, C and their sidecars) running on Kubernetes, Marathon, Swarm, VMs, or bare metal. A multi-tenant control plane (Controller, Registry, API) serves Tenant 1, Tenant 2, Tenant 3, …; built on its API are Version Routing and Gremlin Resilience Testing. Example recipe: Overload(C), Assert(A' responds in 10 ms).]
• Failures are emulated by manipulating network interactions between services (e.g., delays, HTTP 500s, etc.)
• Assertions are validated against request logs to identify faulty recovery behavior
Ref. to Gremlin, ICDCS 2016
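The recipe "Overload(C), Assert(A' responds in 10 ms)" can be sketched as a simulation: a fault rule perturbs calls to C, every call is logged, and an assertion is then checked against the log. This is not Gremlin's API. The `Fault` type, the functions below, the modeling of overload as added delay plus an HTTP 503, and the fixed 1 ms latencies are all assumptions, and time is simulated rather than measured.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Fault:
    """Hypothetical fault rule applied to calls from `source` to `destination`."""
    source: str
    destination: str
    delay_ms: float = 0.0
    abort_status: Optional[int] = None


def call(source, destination, faults, log, base_latency_ms=1.0):
    """Emulate one request through the sidecar, applying matching faults."""
    latency, status = base_latency_ms, 200
    for fault in faults:
        if (fault.source, fault.destination) == (source, destination):
            latency += fault.delay_ms
            if fault.abort_status is not None:
                status = fault.abort_status
    log.append({"src": source, "dst": destination,
                "status": status, "latency_ms": latency})
    return status, latency


def handle_a_prime(faults, log, timeout_ms=5.0):
    """Hypothetical handler for A': call C, give up waiting at the timeout."""
    _, latency = call("A'", "C", faults, log)
    waited = min(latency, timeout_ms)  # A' stops waiting once the timeout fires
    log.append({"src": "user", "dst": "A'",
                "status": 200, "latency_ms": waited + 1.0})


def responds_within(log, service, limit_ms):
    """Assertion: every logged request to `service` finished within the limit."""
    return all(r["latency_ms"] <= limit_ms for r in log if r["dst"] == service)


# Overload(C), modeled here as a 200 ms delay plus HTTP 503 on calls from A' to C.
overload_c = [Fault(source="A'", destination="C", delay_ms=200.0, abort_status=503)]
request_log = []
handle_a_prime(overload_c, request_log)
recovered = responds_within(request_log, "A'", limit_ms=10.0)
```

With the 5 ms timeout the assertion holds; raising `timeout_ms` well above the injected delay makes it fail, which is the kind of faulty recovery behavior the log assertions are meant to surface.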
Page 21
Thank You
• https://amalgam8.io
• https://github.com/amalgam8/examples
Page 23
Research Challenges in the Face of Continuous Change
• Managing stateful services and data stores
• Problem determination gains many dimensions
  – The problem may not just be in your code
  – Many dimensions change simultaneously, such as infrastructure, runtime, etc.
  – Can we pinpoint the issue down to the Git commit by correlating runtime logs and development history?
• Too much data, too little insight
  – Logs emitted by all layers of the software stack, by automated build tools, etc.
  – Yet, we are nowhere close to pinpointing the problem and fixing it when things go wrong!
Page 24
Opportunities to Fix Issues Before They Occur
• Software build, test and deployment phases are completely automated
• Provides a unique opportunity to catch security vulnerabilities, buggy implementations, etc., even before software is deployed
• However, existing tools and techniques do not scale to the extreme code churn (100s of deployments)