hudson bay tech
Gilt’s Microservice Hype Cycle
2011: Boo: we have a monolith! Maybe these micro-services can help us move faster!
2012: This is AMAZING!
2013: Look at all these services!
2014: Holy cr&p, what have we done? Look at ALL these services!
2015: Let’s get a handle on this.
2016: Ah, the sweet taste of awesome sauce.
2011: Java, loosely-typed, monolithic services
(1) Large loosely-typed JSON/HTTP services
(2) Lots of duplicated code :(
(3) Teams focused on business lines
(4) Monolithic Java App; huge bottleneck for innovation.
(5) Hidden linkages; buried business logic
enter: µ-services
“How can we arrange our teams around strategic initiatives? How can we make it fast
and easy to get a change to production?”
driving forces behind gilt’s emergent architecture
● team autonomy
● voluntary adoption (tools, techniques, processes)
● kpi or goal-driven initiatives
● failing fast and openly
● open and honest, even when it’s difficult
anatomy of a gilt service - typical choices
gilt-service-framework, log4j, CloudWatch, Cave, JavaScript
service discovery: straightforward
ZooKeeper
Brocade Traffic Manager (aka Zeus, Stingray, SteelApp, ...)
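The approach above pairs ZooKeeper (instance registration) with a traffic manager (routing). A minimal sketch of the client-side lookup half of that idea, using an in-memory dict where a real deployment would read ephemeral znodes from a ZooKeeper ensemble — service names and the path convention here are illustrative, not Gilt’s actual layout:

```python
import random

# Illustrative registry: in production this would live in ZooKeeper,
# e.g. ephemeral znodes under /services/<service-name>/<instance-id>.
registry = {
    "product-catalog": ["10.0.1.5:9000", "10.0.1.6:9000"],
    "inventory": ["10.0.2.7:9000"],
}

def discover(service_name):
    """Return one registered instance for a service, chosen at random."""
    instances = registry.get(service_name, [])
    if not instances:
        raise LookupError(f"no instances registered for {service_name!r}")
    return random.choice(instances)

print(discover("inventory"))  # the only registered instance: 10.0.2.7:9000
```

Random choice stands in for whatever load-balancing policy the traffic manager applies in front of the real services.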
Lift-and-shift + elastic teams
Existing Data Centre
Dual 10Gb direct connect line, 2ms latency.
‘Legacy VPC’
Mobile, Common, Personalisation, Admin, Data
(1) Deploy to VPC
(2) ‘Department’ accounts for elasticity & devops
Lessen dependencies between teams: faster code-to-prod
Lots of initiatives in parallel
Your favourite <tech/language/framework> here
We (heart) µ-services
● Graceful degradation of service
● Disposable Code: easy to innovate, easy to fail and move on.
We (heart) cloud
● Do devops in a meaningful way.
● Low barrier of entry for new tech (DynamoDB, Kinesis, ...)
● Isolation
● Cost visibility
● Security tools (IAM)
● Well documented
● Resilience is easy
● Hybrid is easy
● Performance is great
Lessons from the Slope:
1. µservice architecture is emergent
2. manage ownership & risk
3. make your clients thin
4. avoid snowflakes
5. test in production where possible
It’s hard to think of architecture in one dimension.
We added ‘Functional Area’, ‘System’ and ‘Subsystem’ columns to Gilt Genome; this provides a strong (although subjective) taxonomy.
It turns out we have an elegant, emergent architecture.
Some services / components are deceptively simple.
Others are simply deceptive, and require knowledge of their surrounding ‘constellation’
n = 265, where n is the number of services.
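The three-level taxonomy above can be sketched as plain data: tag each service with a functional area, system, and subsystem, then group to see where the estate concentrates. The service names and areas below are invented for illustration; only the three-column shape mirrors the Gilt Genome approach:

```python
from collections import Counter

# Hypothetical rows in the style of Gilt Genome's taxonomy columns:
# (service, functional_area, system, subsystem)
services = [
    ("order-service",    "Back-Office", "Order Processing", "Orders"),
    ("shipment-service", "Back-Office", "Order Processing", "Shipping"),
    ("catalog-service",  "Web & Core",  "Product Catalog",  "Products"),
    ("search-service",   "Web & Core",  "Product Catalog",  "Search"),
    ("push-service",     "Mobile",      "Notifications",    "Push"),
]

# Count services per functional area: a crude map of where complexity lies.
by_area = Counter(area for _, area, _, _ in services)
print(by_area)  # Counter({'Back-Office': 2, 'Web & Core': 2, 'Mobile': 1})
```

With n = 265 real services, even this crude roll-up makes an emergent architecture visible at department level.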
Gilt Logical Architecture - Back Office Systems

Gilt Admin (Legacy Ruby on Rails Application):
City, Discounts, Financial Reporting, Fraud Mgmt, Gift Cards, Inventory Mgmt, Order Mgmt, Sales Mgmt, Product Catalog, Purchase Orders, Targeting, Billing

Other Admin Applications (Scala + Play Framework)*:
City, Creative (2), CS, Discounts, Distribution, i18n, Inventory (2), Order Processing (2), Util

Service Constellations (Scala, Java)*:
Auth (1), Billing (1), City (6), Creative (4), CS (2), Discounts (1), Distribution (9), i18n (3), Inventory (6), Order Processing (8), Payments (3), Product Catalog (5), Referrals (1), Util (2)

Core Database - ‘db3’
Job System (Java, Ruby)

* counts denote number of service / app components.
Simply deceptive: a service’s context only makes sense within its constellation.
Emergent Architecture: Using the three-level taxonomy approach, we’ve been able to get a better understanding of an emergent architecture, at a department level, and where the complexity lies.
We’ve also concluded that the department is the right level of granularity for consensus on technical decisions (language, framework, …)
Gilt’s Architecture Board sets the overall standards that teams must follow when interacting across departmental boundaries: HTTP. REST. DNS. AWS.
1. Software is owned by departments, tracked in ‘genome project’. Directors assign services to teams.
2. Teams are responsible for building & running their services; directors are accountable for their overall estate.
bottom-up ownership, RACI-style
Notes:
Zero Power, High Influence: The Architecture Board https://github.com/gilt/arch-board Gilt Standards and Recommendations: https://github.com/gilt/standards
30%: amount of time a department should spend on operations / maintenance / red-hot.
We build the notion of SRE (Site Reliability Engineering) into the team.
‘ownership donut’ informs tech strategy
We classify ownership as: active, passive, at-risk.
‘done’ === 0% ‘at risk’
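The active / passive / at-risk classification above reduces to a simple roll-up, and “‘done’ === 0% ‘at risk’” becomes a one-line check. The service ledger below is invented for illustration:

```python
# Hypothetical ownership ledger in the spirit of the 'ownership donut':
# every service is classified as active, passive, or at-risk.
ownership = {
    "checkout": "active",
    "gift-cards": "passive",
    "legacy-admin": "at-risk",
    "search": "active",
}

def at_risk_fraction(ledger):
    """Fraction of the estate whose ownership is classified 'at-risk'."""
    return sum(1 for s in ledger.values() if s == "at-risk") / len(ledger)

# 'done' === 0% 'at risk'
print(at_risk_fraction(ownership))  # 0.25 -- this estate is not 'done'
```

Tracking that fraction per department over time is one way the donut can inform tech strategy.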
Emergent Architecture + Ownership Oriented Org: “You just pulled an inverse Conway manoeuvre”
Architectural Areas: Back-Office, Personalisation, Mobile, Web & Core Services
Departments: Back-Office, Personalisation, Mobile, Web & Core Services
Take as few code dependencies as possible. This stuff HURTS when n ~= 300.

The old way:
ServiceRepo: Service Code, Common Code, Client Code, Service Dependencies → publishes a Client JAR
ConsumerRepo: Consumer Dependencies + the Client JAR
Dependency hell as client JAR dependencies conflict with service dependencies.
This is way easier: http://apidoc.me
apidoc: define your RESTful service API agnostically and generate dependency-free, thin clients.
ServiceRepo: Service Code, Service Dependencies, plus a Service Stub <<generated>> from the <<apidoc>> Service API.
ConsumerRepo: Consumer Dependencies, plus Client Code <<generated>> from the same API definition.
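As a sketch of what such an API definition looks like: the JSON below is modelled on apidoc’s service specification format, with an invented model and resource — the field names and path are illustrative, not from a real Gilt service:

```json
{
  "name": "inventory",
  "description": "Illustrative apidoc-style service definition.",
  "models": {
    "sku": {
      "fields": [
        { "name": "id", "type": "long" },
        { "name": "quantity", "type": "integer" }
      ]
    }
  },
  "resources": {
    "sku": {
      "operations": [
        { "method": "GET", "path": "/skus/:id" }
      ]
    }
  }
}
```

One definition, two artifacts: a thin, dependency-free client for the consumer and a server stub for the service — no shared JAR, no dependency hell.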
Andrey’s Rule of Six:
“We could solve this now, or, just wait six months, and Amazon will provide a solution”
Andrey Kartashov, Distinguished Engineer, Gilt.
Current thinking on deployment:
(1) Re-use as much AWS tooling as possible: Code Pipeline, Code Deploy, Cloud Formation.
(2) Very lightweight tool chain to support dark canaries, canary releases, phased roll-out and roll-back: NOVA
https://github.com/gilt/nova
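A minimal sketch of the canary decision logic such a tool chain automates: shift a growing slice of traffic to the new version, and roll back the moment its error rate exceeds a threshold. The phases, threshold, and metrics source below are invented for illustration; NOVA’s actual behaviour may differ:

```python
PHASES = [0.01, 0.05, 0.25, 1.0]   # fraction of traffic on the new version
ERROR_THRESHOLD = 0.02             # max tolerated error rate per phase

def phased_rollout(error_rate_at):
    """Walk the rollout phases; roll back on the first bad reading.

    error_rate_at(fraction) -> observed error rate with that fraction of
    traffic on the new version. Returns 'rolled-out' or 'rolled-back'.
    """
    for fraction in PHASES:
        if error_rate_at(fraction) > ERROR_THRESHOLD:
            return "rolled-back"   # revert traffic to the old version
    return "rolled-out"            # new version now takes 100% of traffic

# A healthy release sails through; a broken one is caught at 1% of traffic.
print(phased_rollout(lambda f: 0.001))  # rolled-out
print(phased_rollout(lambda f: 0.30))   # rolled-back
```

A dark canary is the same loop with the new version receiving mirrored traffic whose responses are discarded, so a bad reading costs nothing.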
Testing and TiP
Maintaining stage environments in a micro-service architecture is HARD.
Prefer to test in production where possible: use dark canaries, canaries, staged roll-out and roll-back.
Invest in monitoring and alerting over hard-to-maintain test pipelines.
Where teams need a stage environment, let them build a minimal environment, and manage it themselves.
Estimate: about 85% of Gilt’s teams use TiP techniques; 15% need a stage environment.
Lessons from the Slope:
1. µservice architecture is emergent
2. manage ownership & risk
3. make your clients thin
4. avoid snowflakes
5. test in production where possible