
GeeCon 2016: Scaling Microservices at Gilt

Apr 16, 2017


Adrian Trenaman
Transcript
Page 1: GeeCon 2016: Scaling Microservices at Gilt

hudsonbaytech

Page 2: GeeCon 2016: Scaling Microservices at Gilt
Page 3: GeeCon 2016: Scaling Microservices at Gilt

~300… the number of micro-services running gilt.com.

But what about the Persian horde?

Page 4: GeeCon 2016: Scaling Microservices at Gilt

Gilt: luxury designer brands at discounted prices

Page 5: GeeCon 2016: Scaling Microservices at Gilt

we shoot the product in our studios

Page 6: GeeCon 2016: Scaling Microservices at Gilt

we receive, store, pick, pack and ship...

Page 7: GeeCon 2016: Scaling Microservices at Gilt

we sell every day at noon...

Page 8: GeeCon 2016: Scaling Microservices at Gilt

stampede...

Page 9: GeeCon 2016: Scaling Microservices at Gilt

this is what the stampede really looks like...

Page 10: GeeCon 2016: Scaling Microservices at Gilt

The Hype Cycle

Page 11: GeeCon 2016: Scaling Microservices at Gilt

Gilt’s Microservice Hype Cycle

2011: Boo: we have a monolith! Maybe these micro-services can help us move faster!

2012: This is AMAZING!

2013: Look at all these services!

Page 12: GeeCon 2016: Scaling Microservices at Gilt

service growth over time: point of inflexion === scala.

Page 13: GeeCon 2016: Scaling Microservices at Gilt

Gilt’s Microservice Hype Cycle

2011: Boo: we have a monolith! Maybe these micro-services can help us move faster!

2012: This is AMAZING!

2013: Look at all these services!

2014: Holy cr&p, what have we done? Look at ALL these services

2015: Let’s get a handle on this

2016: Ah, the sweet taste of awesome sauce.

Page 14: GeeCon 2016: Scaling Microservices at Gilt

from rails to riches

Page 15: GeeCon 2016: Scaling Microservices at Gilt

rails to riches: 2007 - ruby-on-rails monolith

Page 16: GeeCon 2016: Scaling Microservices at Gilt

2011: java, loosely-typed, monolithic services

(1) Large loosely-typed JSON/HTTP services

(2) Lots of duplicated code :(

(3) Teams focused on business lines

(4) Monolithic Java App; huge bottleneck for innovation.

(5) Hidden linkages; buried business logic

Page 17: GeeCon 2016: Scaling Microservices at Gilt

enter: µ-services

“How can we arrange our teams around strategic initiatives? How can we make it fast and easy to get change to production?”

Page 18: GeeCon 2016: Scaling Microservices at Gilt

2015: micro-services

Page 19: GeeCon 2016: Scaling Microservices at Gilt

driving forces behind gilt’s emergent architecture

● team autonomy
● voluntary adoption (tools, techniques, processes)
● kpi or goal-driven initiatives
● failing fast and openly
● open and honest, even when it’s difficult

Page 20: GeeCon 2016: Scaling Microservices at Gilt

anatomy of a gilt service

Page 21: GeeCon 2016: Scaling Microservices at Gilt

anatomy of a gilt service - typical choices

gilt-service-framework, log4j, CloudWatch, Cave, JavaScript

[the remaining choices on this slide were logos and were not captured in the transcript]

Page 22: GeeCon 2016: Scaling Microservices at Gilt

service discovery: straightforward

zookeeper

Brocade Traffic Manager (aka Zeus, Stingray, SteelApp,...)

Page 23: GeeCon 2016: Scaling Microservices at Gilt

cloudiness

Page 24: GeeCon 2016: Scaling Microservices at Gilt

from bare-metal...

PHX, IAD

Page 25: GeeCon 2016: Scaling Microservices at Gilt

… to vapour.

Page 26: GeeCon 2016: Scaling Microservices at Gilt

Lift-and-shift + elastic teams

Existing Data Centre

Dual 10Gb direct connect line, 2ms latency.

‘Legacy VPC’

Mobile, Common, Personalisation, Admin, Data

(1) Deploy to VPC

(2) ‘Department’ accounts for elasticity & devops

Page 27: GeeCon 2016: Scaling Microservices at Gilt

single tenant: one EC2 instance per service instance

Page 28: GeeCon 2016: Scaling Microservices at Gilt

reproducible, immutable deployments: docker

Page 29: GeeCon 2016: Scaling Microservices at Gilt

service discovery: same pattern, different LB

zookeeper

Amazon ELB

Page 30: GeeCon 2016: Scaling Microservices at Gilt

# running instances per service: ‘rule of three’

Page 31: GeeCon 2016: Scaling Microservices at Gilt

AWS instance sizing

Page 32: GeeCon 2016: Scaling Microservices at Gilt

We (heart) μ-services

Lessen dependencies between teams: faster code-to-prod

Lots of initiatives in parallel

Your favourite <tech/language/framework> here

Graceful degradation of service

Disposable Code: easy to innovate, easy to fail and move on.

Page 33: GeeCon 2016: Scaling Microservices at Gilt

We (heart) cloud

Do devops in a meaningful way.

Low barrier of entry for new tech (DynamoDB, Kinesis, ...)

Isolation

Cost visibility

Security tools (IAM)

Well documented

Resilience is easy

Hybrid is easy

Performance is great

Page 34: GeeCon 2016: Scaling Microservices at Gilt

Lessons from the Slope:

1. µservice architecture is emergent

2. manage ownership & risk

3. make your clients thin

4. avoid snowflakes

5. test in production where possible

Page 35: GeeCon 2016: Scaling Microservices at Gilt

emergent architecture

Page 36: GeeCon 2016: Scaling Microservices at Gilt
Page 37: GeeCon 2016: Scaling Microservices at Gilt

It’s hard to think of architecture in one dimension.

n = 265, where n is the number of services.

Page 38: GeeCon 2016: Scaling Microservices at Gilt

… we used a “spread sheet”.

‘The Gilt Genome Project’

Page 39: GeeCon 2016: Scaling Microservices at Gilt

It’s hard to think of architecture in one dimension.

We added ‘Functional Area’, ‘System’ and ‘Subsystem’ columns to Gilt Genome; provides a strong (although subjective) taxonomy.

It turns out we have an elegant, emergent architecture.

Some services / components are deceptively simple.

Others are simply deceptive, and require knowledge of their surrounding ‘constellation’

n = 265, where n is the number of services.
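
As a rough illustration of the three-level taxonomy, the sketch below models a few spreadsheet rows and counts services per level. It is a minimal sketch, not the actual Gilt Genome: the service names, the `GenomeRow` type and the `count_by` helper are hypothetical, and only the column names (‘Functional Area’, ‘System’, ‘Subsystem’) come from the slide.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class GenomeRow:
    """One spreadsheet row: a service plus its three taxonomy columns."""
    service: str            # hypothetical service name
    functional_area: str
    system: str
    subsystem: str


# Hypothetical sample rows; the real sheet had n = 265 services.
ROWS = [
    GenomeRow("svc-inventory-reader", "Back Office", "Inventory", "Read API"),
    GenomeRow("svc-inventory-writer", "Back Office", "Inventory", "Write API"),
    GenomeRow("svc-order-router", "Back Office", "Order Processing", "Routing"),
    GenomeRow("svc-recommendations", "Personalisation", "Targeting", "Ranking"),
]


def count_by(rows, level):
    """Count services per value of one taxonomy column, e.g. 'system'."""
    return Counter(getattr(row, level) for row in rows)


by_area = count_by(ROWS, "functional_area")
by_system = count_by(ROWS, "system")
```

Counting at each level is one simple way to see where the services, and therefore the complexity, cluster.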

Page 40: GeeCon 2016: Scaling Microservices at Gilt

Deceptively Simple - many services are small; < 2048 loc

Page 41: GeeCon 2016: Scaling Microservices at Gilt

Deceptively Simple - many services are small, < 32 files.

Page 42: GeeCon 2016: Scaling Microservices at Gilt

Gilt Logical Architecture - Back Office Systems

Gilt Admin (Legacy Ruby on Rails Application): City, Discounts, Financial Reporting, Fraud Mgmt, Gift Cards, Inventory Mgmt, Order Mgmt, Sales Mgmt, Product Catalog, Purchase Orders, Targeting, Billing

Other Admin Applications (Scala + Play Framework)*: City, Creative (2), CS, Discounts, Distribution, i18n, Inventory (2), Order Processing (2), Util

Service Constellations (Scala, Java)*: Auth (1), Billing (1), City (6), Creative (4), CS (2), Discounts (1), Distribution (9), i18n (3), Inventory (6), Order Processing (8), Payments (3), Product Catalog (5), Referrals (1), Util (2)

Core Database - ‘db3’

Job System (Java, Ruby)

* counts denote number of service / app components.

Simply deceptive: service context only makes sense in constellation.

Page 43: GeeCon 2016: Scaling Microservices at Gilt

Emergent Architecture: Using the three-level taxonomy approach, we’ve been able to get a better understanding of an emergent architecture, at a department level, and where the complexity lies.

We’ve also concluded that the department is the right level of granularity for consensus on technical decisions (language, framework, …)

Gilt’s Architecture Board sets the overall standards that teams must follow when interacting across departmental boundaries. HTTP. REST. DNS. AWS.

Page 44: GeeCon 2016: Scaling Microservices at Gilt

ownership

Page 45: GeeCon 2016: Scaling Microservices at Gilt

1. Software is owned by departments, tracked in ‘genome project’. Directors assign services to teams.

2. Teams are responsible for building & running their services; directors are accountable for their overall estate.

bottom-up ownership, RACI-style

Page 46: GeeCon 2016: Scaling Microservices at Gilt

Notes:

Zero Power, High Influence: The Architecture Board https://github.com/gilt/arch-board

Gilt Standards and Recommendations: https://github.com/gilt/standards

Page 47: GeeCon 2016: Scaling Microservices at Gilt

The perfect size for a team

5 ± 2

Page 48: GeeCon 2016: Scaling Microservices at Gilt

The perfect size for a ‘department’ (team of teams)

20 ± 4

Page 49: GeeCon 2016: Scaling Microservices at Gilt

30%: amount of time a department should spend on operations / maintenance / red-hot.

We build the notion of SRE (Site Reliability Engineering) into the team.

Page 50: GeeCon 2016: Scaling Microservices at Gilt

‘ownership donut’ informs tech strategy

We classify ownership as: active, passive, at-risk.

‘done’ === 0% ‘at risk’
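
The donut can be thought of as the share of services in each ownership state. The following is a minimal sketch under that assumption; the service names and the helper functions `ownership_shares` and `is_done` are hypothetical, while the three states and the ‘done’ criterion come from the slide.

```python
from collections import Counter

STATES = ("active", "passive", "at-risk")


def ownership_shares(ownership):
    """Return the percentage of services in each ownership state.

    `ownership` maps a service name to one of STATES.
    """
    if not ownership:
        raise ValueError("no services to classify")
    counts = Counter(ownership.values())
    unknown = set(counts) - set(STATES)
    if unknown:
        raise ValueError(f"unknown ownership states: {sorted(unknown)}")
    total = len(ownership)
    return {state: 100.0 * counts[state] / total for state in STATES}


def is_done(ownership):
    """'done' === 0% 'at risk'."""
    return ownership_shares(ownership)["at-risk"] == 0.0


# Hypothetical department estate.
estate = {
    "svc-a": "active",
    "svc-b": "active",
    "svc-c": "passive",
    "svc-d": "at-risk",
}
shares = ownership_shares(estate)
```

With this estate a quarter of the services are at risk, so the department is not yet ‘done’.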

Page 51: GeeCon 2016: Scaling Microservices at Gilt

Getting a handle on ownership...

Jul 2015 Sep 2015

Oct 2015 Feb 2015

Page 52: GeeCon 2016: Scaling Microservices at Gilt

Emergent Architecture + Ownership Oriented Org: “You just pulled an inverse Conway manoeuvre”

Architectural Area: Back-Office, Personalisation, Mobile, Web & Core Services

Department: Back-Office, Personalisation, Mobile, Web & Core Services

Page 53: GeeCon 2016: Scaling Microservices at Gilt

thin clients

Page 54: GeeCon 2016: Scaling Microservices at Gilt

Take as few code dependencies as possible. This stuff HURTS when n ~= 300.

Service Repo: Service Code, Common Code, Client Code, Service Dependencies

Client JAR

Consumer Repo: Consumer Dependencies

Dependency hell as client JAR dependencies conflict with service dependencies.

Page 55: GeeCon 2016: Scaling Microservices at Gilt

This is way easier. http://apidoc.me

apidoc: define RESTful service API agnostically and generate dependency-free, thin clients.

<<apidoc>> Service API

<<generate>> Service Stub (Service Repo: Service Code, Service Dependencies)

<<generate>> Client Code (Consumer Repo: Consumer Dependencies)
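
To make "dependency free, thin" concrete, here is a hand-written sketch of what such a client can look like when it uses only the standard library. It is not apidoc output and does not use apidoc's API; the `Product` model, the `/products/{id}` route and the `ProductClient` class are assumptions made up for illustration.

```python
import json
import urllib.request
from dataclasses import dataclass


@dataclass(frozen=True)
class Product:
    """Hypothetical model; a generated client would derive it from the API spec."""
    id: str
    name: str


class ProductClient:
    """Thin client: standard library only, no shared 'common' code."""

    def __init__(self, base_url, opener=urllib.request.urlopen):
        self.base_url = base_url.rstrip("/")
        # The opener is injectable so the client can be exercised without a network.
        self._open = opener

    def get_product(self, product_id):
        # Hypothetical route: GET {base_url}/products/{product_id} returning JSON.
        with self._open(f"{self.base_url}/products/{product_id}") as resp:
            body = json.loads(resp.read().decode("utf-8"))
        return Product(id=body["id"], name=body["name"])
```

Because the client pulls in nothing beyond the standard library, it cannot clash with whatever libraries the consuming service already depends on.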

Page 56: GeeCon 2016: Scaling Microservices at Gilt

stop building snowflakes

Page 57: GeeCon 2016: Scaling Microservices at Gilt

7 different code deployment pipelines... Really?

Page 58: GeeCon 2016: Scaling Microservices at Gilt

Andrey’s Rule of Six:

“We could solve this now, or, just wait six months, and Amazon will provide a solution”

Andrey Kartashov, Distinguished Engineer, Gilt.

Page 59: GeeCon 2016: Scaling Microservices at Gilt

Current thinking on deployment:

(1) Re-use as much AWS tooling as possible: CodePipeline, CodeDeploy, CloudFormation.

(2) Very lightweight tool chain to support dark canaries, canary releases, phased roll-out and roll-back: NOVA

https://github.com/gilt/nova
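
The release stages named above can be sketched as a simple ordered plan. This is an illustration only and does not reflect how NOVA works: the `Stage` type, the traffic percentages and the `run_rollout` function are hypothetical, and a dark canary is assumed here to mean a new version running in production with no live traffic.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Stage:
    name: str
    traffic_percent: int  # share of live traffic sent to the new version


# Hypothetical plan; the percentages are illustrative, not Gilt's.
PLAN = [
    Stage("dark canary", 0),   # assumed: deployed in production, no live traffic
    Stage("canary", 5),
    Stage("phase 1", 25),
    Stage("phase 2", 50),
    Stage("full roll-out", 100),
]


def run_rollout(plan, is_healthy):
    """Walk the stages in order, rolling back at the first unhealthy one.

    Returns (final traffic percent for the new version, list of steps taken).
    """
    history = []
    for stage in plan:
        history.append(stage.name)
        if not is_healthy(stage):
            history.append("roll-back")
            return 0, history
    return plan[-1].traffic_percent, history
```

The `is_healthy` callback stands in for whatever monitoring signal a team trusts at each stage.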

Page 60: GeeCon 2016: Scaling Microservices at Gilt

testing in production

Page 61: GeeCon 2016: Scaling Microservices at Gilt

Testing and TiP

Maintaining stage environments in a micro-service architecture is HARD.

Prefer to test in production where possible: use dark canaries, canaries, staged roll-out and roll-back.

Invest in monitoring and alerting over hard-to-maintain test pipelines.

Where teams need a stage environment, let them build a minimal environment, and manage it themselves.

Estimate: about 85% of Gilt’s teams use TiP techniques; 15% need a stage environment.
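
One way monitoring can stand in for a test pipeline is to compare a canary's error rate with the rest of the fleet before widening a release. The sketch below assumes that approach; the function names, the thresholds and the `(errors, requests)` tuple shape are all hypothetical and not taken from Gilt's tooling.

```python
def error_rate(errors, requests):
    """Fraction of requests that failed."""
    if requests <= 0:
        raise ValueError("requests must be positive")
    return errors / requests


def canary_is_healthy(baseline, canary, max_ratio=2.0, min_requests=100):
    """Compare canary and baseline error rates.

    `baseline` and `canary` are (errors, requests) tuples.
    Returns None when the canary has served too few requests to judge,
    otherwise True or False.
    """
    canary_errors, canary_requests = canary
    if canary_requests < min_requests:
        return None
    baseline_rate = error_rate(*baseline)
    canary_rate = error_rate(canary_errors, canary_requests)
    # The absolute floor keeps a zero-error baseline from failing every canary.
    return canary_rate <= max(baseline_rate * max_ratio, 0.001)
```

A `False` result would trigger a roll-back; `None` means keep the canary at its current traffic level and wait for more data.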

Page 62: GeeCon 2016: Scaling Microservices at Gilt

Lessons from the Slope:

1. µservice architecture is emergent

2. manage ownership & risk

3. make your clients thin

4. avoid snowflakes

5. test in production where possible

Page 63: GeeCon 2016: Scaling Microservices at Gilt

#thanks @adrian_trenaman @gilttech