
Kubernetes and lastminute.com: our course towards better scalability and processes (Codemotion Rome 2017)

Apr 11, 2017

Michele Orsi
Transcript
Page 1: Kubernetes and lastminute.com: our course towards better scalability and processes (Codemotion Rome 2017)

Kubernetes and lastminute.com group: our course towards better scalability and processes

[email protected] @micheleorsi

Rome, 24-25 March 2017

Page 2

An inspiring travel company

Page 3

A tech company to the core

Tech department: 300+ people

Applications: ~100

Database: 4 TB data

Servers: 1400 VMs, 300 physical machines

Locations: Chiasso, Milan, Madrid, London, Bengaluru

Page 4

https://www.pexels.com/photo/turtle-walking-on-sand-132936/

Business: "technology is slow"

Page 5

Technology: "the monolith is the problem"

https://www.flickr.com/photos/southtopia/5702790189

Page 6

https://www.pexels.com/photo/gray-pebbles-with-green-grass-51168/

"... let’s break into microservices"

Page 7

A lot of issues

● LONG provisioning time

● LACK OF alignment across environments

● LACK OF alignment across applications

● LACK OF awareness about ops (monitoring, alerting)

Page 8

A year-long endeavour

● build a new, modern infrastructure

● migrate the search (flight/hotel) product there

... without:

● impacting the business

● throwing away our whole datacenter

Page 9

Our plan

● same architecture across environments

● a common framework to align software

● centralized monitoring/logging, with alerts

● zero downtime deployment

● automation everywhere

Page 10

How? Teams and people

New teams

https://www.pexels.com/photo/blue-lego-toy-beside-orange-and-white-lego-toy-standing-during-daytime-105822/

Page 11

Our infrastructure and technology

https://www.pexels.com/photo/colorful-toothed-wheels-171198/

Page 12

Docker containers

registry.intra/application:v2-090025032017

BASE OS

JAVA JRE

START/STOP SCRIPTS

JAR APPLICATION

● build once, run everywhere

● externalised configuration
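
A minimal sketch of what "externalised configuration" can mean for an image that is built once and run everywhere: the same code reads its settings from the environment at startup. The variable names (`APP_ENV`, `APP_DB_URL`, `APP_POOL_SIZE`) and their defaults are hypothetical and not taken from the talk.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class AppConfig:
    environment: str
    db_url: str
    pool_size: int


def load_config(env=None):
    """Build the configuration from environment variables.

    All variable names and defaults are illustrative assumptions.
    """
    env = os.environ if env is None else env
    return AppConfig(
        environment=env.get("APP_ENV", "development"),
        db_url=env.get("APP_DB_URL", "jdbc:postgresql://localhost/app"),
        pool_size=int(env.get("APP_POOL_SIZE", "10")),
    )


if __name__ == "__main__":
    # Same image, different environments: only the injected variables change.
    print(load_config({"APP_ENV": "qa", "APP_DB_URL": "jdbc:postgresql://db.qa/app"}))
    print(load_config({"APP_ENV": "production", "APP_POOL_SIZE": "50"}))
```

Passing a plain dictionary instead of `os.environ` keeps the function easy to exercise without touching the real process environment.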

Page 13

Kubernetes

● independent from OS/hosts

● isolated env, managed at scale

● self-healing

● externalised configuration

Omega paper: http://research.google.com/pubs/pub41684.html

Page 14

https://www.pexels.com/photo/red-toy-truck-24619/

"Your infrastructure on wheels"

Page 15

Kubernetes: physical representation

[Diagram: a cluster of nodes NODE1, NODE2, ... NODE70. Each node runs K8S, DOCKER, FLANNELD and ETCD on top of Ubuntu.]

Page 16

Kubernetes: logical representation

cluster

● NAMESPACE1: CPU 10, MEM 40GB

● NAMESPACE2: CPU 20, MEM 80GB

● NAMESPACE3: CPU 80, MEM 90GB

● NAMESPACE4: CPU 100, MEM 10GB
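
One way to express per-namespace CPU and memory budgets like those above is a Kubernetes ResourceQuota object. The sketch below builds such manifests as plain Python dictionaries and prints them as JSON. Whether the team used ResourceQuota, and whether the slide's figures are requests or limits, is an assumption; the quota name is hypothetical, and namespace names are lowercased because Kubernetes requires it.

```python
import json


def resource_quota(namespace, cpu, memory_gb):
    """Return a ResourceQuota manifest capping a namespace's total requests.

    Treating the slide's figures as *requests* is an assumption.
    """
    return {
        "apiVersion": "v1",
        "kind": "ResourceQuota",
        "metadata": {"name": "compute-quota", "namespace": namespace},  # hypothetical name
        "spec": {
            "hard": {
                "requests.cpu": str(cpu),
                "requests.memory": f"{memory_gb}G",
            }
        },
    }


# (CPU, memory in GB) per namespace, copied from the slide.
BUDGETS = {
    "namespace1": (10, 40),
    "namespace2": (20, 80),
    "namespace3": (80, 90),
    "namespace4": (100, 10),
}

if __name__ == "__main__":
    for name, (cpu, mem) in BUDGETS.items():
        print(json.dumps(resource_quota(name, cpu, mem), indent=2))
```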

Page 17

Kubernetes: our architecture

[Diagram: two clusters, production and nonproduction, with one namespace per application and environment. production: APP1-PRODUCTION, APP2-PRODUCTION, APP3-PRODUCTION. nonproduction: APP1-PREVIEW, APP1-DEVELOPMENT, APP1-QA, stacked alongside the APP2 and APP3 namespaces.]

Page 18

Kubernetes: our architecture and choices

[Diagram: in the production cluster, each POD in the APP1-PRODUCTION namespace runs four containers: application, fluentd, collectd and carbon.]

Page 19

Monitoring and alerting: grafana + graphite

[Diagram: the POD in the APP1-PRODUCTION namespace holds the application, collectd and carbon; at cluster level, graphite stores the metrics and Grafana 4 displays them.]

icons from http://www.flaticon.com
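
Carbon, Graphite's ingestion daemon, accepts metrics over a plaintext protocol: one `path value timestamp` line per data point, conventionally on TCP port 2003. A minimal sketch of how a process could emit a metric that way follows. The metric path, the host name and the choice of the plaintext protocol for this setup are assumptions.

```python
import socket
import time


def format_metric(path, value, timestamp=None):
    """Format one data point in Graphite's plaintext protocol."""
    ts = int(time.time()) if timestamp is None else int(timestamp)
    return f"{path} {value} {ts}\n"


def send_metric(line, host="localhost", port=2003):
    """Send one plaintext line to a carbon listener (host and port are assumptions)."""
    with socket.create_connection((host, port), timeout=2) as sock:
        sock.sendall(line.encode("ascii"))


if __name__ == "__main__":
    # Hypothetical metric path; printing instead of sending keeps the demo offline.
    print(format_metric("app1.production.requests.count", 42, 1490400000), end="")
```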

Page 20

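Kubernetes: our architecture and choices

[Diagram: in the production cluster, the APP1-PRODUCTION namespace contains a deployment and its replica-set managing POD1, POD2 and POD3, plus a secret and a configmap; the application is reachable as app1.lastminute.intra.]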
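
A rough sketch of how the objects on this slide relate: a Deployment (which manages a ReplicaSet) asks for three pod replicas, and each pod pulls its settings from a ConfigMap and a Secret. It is written as a Python dictionary that can be dumped to JSON. The ConfigMap and Secret names and the label are hypothetical, and `apps/v1` is today's API version rather than necessarily the one in use on the 2017 cluster.

```python
import json


def deployment(app, namespace, image, replicas=3):
    """Return a Deployment manifest whose pods read a ConfigMap and a Secret."""
    labels = {"app": app}  # hypothetical label scheme
    return {
        "apiVersion": "apps/v1",  # assumption: current API version
        "kind": "Deployment",
        "metadata": {"name": app, "namespace": namespace},
        "spec": {
            "replicas": replicas,
            "selector": {"matchLabels": labels},
            "template": {
                "metadata": {"labels": labels},
                "spec": {
                    "containers": [
                        {
                            "name": "application",
                            "image": image,
                            # Externalised configuration injected as environment variables.
                            "envFrom": [
                                {"configMapRef": {"name": f"{app}-config"}},
                                {"secretRef": {"name": f"{app}-secret"}},
                            ],
                        }
                    ]
                },
            },
        },
    }


if __name__ == "__main__":
    manifest = deployment(
        "app1", "app1-production", "registry.intra/application:v2-090025032017"
    )
    print(json.dumps(manifest, indent=2))
```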

Page 21

Kubernetes: what’s left outside?

● datastores

○ DBs

○ logs

○ metrics

● distributed caches

● distributed locking

● pub-sub

Page 22

1st try (with test app), it seemed to work

https://www.flickr.com/photos/26516072@N00/2194001232

Page 23

Self-healing

ref: https://technologyconversations.com/2016/01/26/self-healing-systems

[Diagram: a health check on the application, in two exchanges]

● "Hey, how are you?" "I am fine, thanks"

● "Hey, how are you?" "I have problems"

Page 24

Kubernetes contract

"When a container is dead I will restart it"

"When a container is ready I will forward traffic to it"

Page 25

Kubernetes probes: liveness & readiness

Two questions:

● when can I consider my container alive?

● when can I consider my container ready to receive traffic?

deployment.yaml:

    spec:
      containers:
        - livenessProbe:
            httpGet:
              path: /liveness
          readinessProbe:
            httpGet:
              path: /readiness
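
The contract and the two probes can be illustrated with a toy reconciliation step. This is a simplified model for explanation only, not how Kubernetes is implemented: real probes also have periods, timeouts and failure thresholds. All class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Container:
    name: str
    alive: bool = True    # answer to the liveness probe
    ready: bool = False   # answer to the readiness probe
    restarts: int = 0


def reconcile(containers):
    """Apply the two rules once and return the names that may receive traffic."""
    endpoints = []
    for c in containers:
        if not c.alive:
            # "When a container is dead I will restart it"
            c.restarts += 1
            c.alive = True
            c.ready = False  # a fresh container has to become ready again
        elif c.ready:
            # "When a container is ready I will forward traffic to it"
            endpoints.append(c.name)
    return endpoints


if __name__ == "__main__":
    pods = [
        Container("pod1", ready=True),
        Container("pod2", alive=False, ready=True),
        Container("pod3"),  # still starting up
    ]
    print(reconcile(pods))
```

Keeping the two questions separate is what lets a slow-starting container stay out of rotation without being restarted.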

Page 26

Our choices: framework - k8s

/liveness:

● when the Tomcat container is up

● when the ratio of active to max threads < threshold

/readiness:

● when all the startup jobs have run

.. ongoing never-ending research ..
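
A minimal sketch of the two checks described on this slide, as pure functions that an HTTP handler could call. The threshold value, the parameter names and the mapping to status codes (200 when healthy, 503 otherwise) are assumptions; the talk only states the criteria.

```python
def is_live(server_up, active_threads, max_threads, threshold=0.9):
    """Alive when the server is up and thread usage is below the threshold."""
    if not server_up or max_threads <= 0:
        return False
    return active_threads / max_threads < threshold


def is_ready(startup_jobs):
    """Ready once every startup job has run; `startup_jobs` maps job name to done flag."""
    return all(startup_jobs.values())


def status_code(healthy):
    """An httpGet probe counts 2xx and 3xx responses as success; 503 signals failure."""
    return 200 if healthy else 503


if __name__ == "__main__":
    print(status_code(is_live(True, active_threads=50, max_threads=200)))
    print(status_code(is_ready({"warm_cache": True, "load_routes": False})))
```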

Page 27

2nd try (with production traffic)

● zero downtime during rollout

● improved resilience

● legacy infrastructure to the rescue in case of problems

Page 28

... failure ... the big one!

https://www.flickr.com/photos/ghost_of_kuji/2763674926

Page 29

Problems

● configuration

● infrastructure

● tools

● manual mistakes

● (external) scalability

Page 30

Another improvement step

● temporary team focused on the objective

● automation

● Go deeper into docker/kubernetes

Page 31

Pipeline: a huge step forward

pipeline:

    microservice = factory.newDeployRequest().withArtifact("com.lastminute.application1", 2)

    lmn_deployCanaryStrategy(microservice, "qa")
    lmn_deployCanaryStrategy(microservice, "preview")
    lmn_deployCanaryStrategy(microservice, "production")

Page 32

Pipeline: a huge step forward

● git push

○ continuous integration

○ continuous delivery

pull jar → build docker → (gate) → QA canary → (gate) → QA stable → (gate) → PREV canary → (gate) → PREV stable → (gate) → PROD canary → (gate) → PROD stable
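
The stage sequence on this slide is regular enough to generate: two build steps, then a gated canary and a gated stable rollout per environment. A small sketch under that reading follows. The function names and the gate callback are hypothetical, and the real pipeline is the DSL shown on the previous slide, not this code.

```python
ENVIRONMENTS = ["QA", "PREV", "PROD"]


def pipeline_stages(environments=ENVIRONMENTS):
    """Return the ordered stage names, with a gate before every rollout."""
    stages = ["pull jar", "build docker"]
    for env in environments:
        for phase in ("canary", "stable"):
            stages.append("(gate)")
            stages.append(f"{env} {phase}")
    return stages


def run(stages, gate_passes):
    """Run stages in order and stop at the first gate that does not pass.

    `gate_passes` receives the name of the stage the gate protects.
    """
    done = []
    for i, stage in enumerate(stages):
        if stage == "(gate)" and not gate_passes(stages[i + 1]):
            break
        done.append(stage)
    return done


if __name__ == "__main__":
    stages = pipeline_stages()
    print(" -> ".join(stages))
    # Example: every gate passes except the ones guarding production.
    print(run(stages, lambda nxt: not nxt.startswith("PROD")))
```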

Page 33

"Go" deep .. whatever language it takes

https://www.pexels.com/photo/sea-man-person-ocean-2859/

Page 34

nginx ingress controller problem

[Diagram: a load balancer (LB) in front of several NGINX ingress controller instances, with addresses 10.0.0.1 to 10.0.0.6.]

Page 35

There’s light .. There’s a light .. at the end

https://www.pexels.com/photo/grayscale-photography-of-person-at-the-end-of-tunnel-211816/

Page 36

... benefits

● lead and migration time

● resilience

● root cause analysis

● speed of deployment

● instant and easy scaling

Page 37

Give me the numbers!

● 70 physical nodes, 1300 pods, 5200 containers

● 20k req/sec in the new cluster

● 35 micro-services migrated in 6 months

● 10 minutes to create a new environment

● whole pipeline runs in 16 minutes

○ 4 minutes to release 100 instances of a new version

● 2M metrics/minute flows
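
A few ratios follow directly from these figures. The short calculation below only divides the numbers on the slide; the derived values are averages, not measurements, and the variable names are illustrative.

```python
# Figures copied from the slide.
nodes, pods, containers = 70, 1300, 5200
metrics_per_minute = 2_000_000
instances, release_minutes = 100, 4

containers_per_pod = containers / pods                    # 4.0
pods_per_node = pods / nodes                              # about 18.6
metrics_per_second = metrics_per_minute / 60              # about 33,333
seconds_per_instance = release_minutes * 60 / instances   # 2.4

if __name__ == "__main__":
    print(f"containers per pod:   {containers_per_pod:.1f}")
    print(f"pods per node:        {pods_per_node:.1f}")
    print(f"metrics per second:   {metrics_per_second:,.0f}")
    print(f"seconds per instance: {seconds_per_instance:.1f}")
```

Four containers per pod matches the application, fluentd, collectd and carbon labels on the earlier pod slide.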

Page 38

Yes, we’re hiring!

THANKS

careers.lastminutegroup.com
