Optimizing Mesos Utilization at Opentable JAY CHIN INFRASTRUCTURE ENGINEERING MesosCon Europe 2017

Jan 04, 2020

Transcript
Page 1

Optimizing Mesos Utilization at Opentable

JAY CHIN

INFRASTRUCTURE ENGINEERING

MesosCon Europe 2017

Page 2

1.4 Billion Online Reservations

2.3 Million Diners per Month

58 Million verified reviews

Page 3

https://flic.kr/p/9F6Khk

Before 2013

Page 4

Every 2 Months

Photo Credit: NASA

Page 5
Page 6

Around 2013

[Diagram: a single Opentable Codebase containing Search, Reviews, Emails, Reservations, Photo Service, Restaurant profiles, Availability Service, Menu API, White Label, External API, Person API, Feedback API, etc.]

Page 7
Page 8

Virtualization

[Diagram: the Codebase components (Search, Reviews, Emails, Reservations, Photo Service, Restaurant profiles, Availability Service, Menu API, White Label, External API, Person API, Feedback API, etc.) shown next to a DATACENTRE in which the same components run on VMs.]

Page 9

Let's Scale!

[Diagram: the same Codebase and DATACENTRE layout as the previous slide, with extra VMs added for Search, Emails, Restaurant profiles and Menu API.]

Page 10

Infrastructure Team / SRE

Page 11

Local Build: Code; Write Puppet Code; Local Vagrant Build; Test and Version Control

Provision: Infrastructure Team pushes Puppet Code; Provision VMs; Wait for Provisioned host puppet run; Provision More VMs in different Regions/Envs

Metrics: Code integration with Statsd/Graphite; Write Puppet Code; Infrastructure Team pushes Puppet code; Build Grafana Dashboards

Monitoring: Identify Metrics or emit metrics; Write Puppet Code; Infrastructure Team pushes Puppet code; Runbooks and escalation policies

Page 12
Page 13

Around 2014: Explore Mesos

[Diagram: a Mesos Cluster in the DATACENTRE, with Hubspot Singularity, running Search, Reviews, Emails, Reservations, Photo Service, Availability Service, Menu API, White Label, Restaurant profiles, External API, Person API and Feedback API.]

Page 14

Local Build: Code; Local Docker Testing; Push to Docker Repo

Provision: Deploy Service to Mesos Cluster; Deploy service to other Mesos Cluster

Metrics: Code integration with Statsd/Graphite; Write Puppet Code; Infrastructure Team pushes Puppet code; Build Grafana Dashboards

Monitoring: Identify Metrics or emit metrics; Write Puppet Code; Infrastructure Team pushes Puppet code; Runbooks and escalation policies

Page 15

Metrics Pipeline

[Diagram: Mesos Task, Singularity API and Mesos API feed a Publisher; metrics travel in Carbon Format through Kafka to a Consumer, then to Carbon-c relay, the Graphite Cluster and Grafana.]

https://github.com/opentable/mesos_stats

https://github.com/weaveworks/grafanalib
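
The "Carbon Format" label in the pipeline refers to Graphite's plaintext line protocol, where each metric is one line of the form `<path> <value> <timestamp>`. The following is a minimal sketch of rendering per-task stats into that format. The function names, the `mesos.tasks` prefix, the task ID and the stat names are illustrative assumptions and are not taken from mesos_stats.

```python
import time
from typing import Dict, List, Optional


def to_carbon_line(path: str, value: float, timestamp: Optional[int] = None) -> str:
    """Render one metric as a Graphite plaintext line: '<path> <value> <timestamp>'."""
    if timestamp is None:
        timestamp = int(time.time())
    return f"{path} {value} {timestamp}\n"


def task_metrics_to_carbon(
    prefix: str, task_id: str, stats: Dict[str, float], timestamp: int
) -> List[str]:
    """Turn a dict of per-task stats into Carbon lines (hypothetical naming scheme)."""
    # Dots separate path segments in Graphite, so they are replaced inside the task ID.
    safe_task = task_id.replace(".", "_")
    return [
        to_carbon_line(f"{prefix}.{safe_task}.{name}", value, timestamp)
        for name, value in sorted(stats.items())
    ]


# Example values are made up for illustration.
lines = task_metrics_to_carbon(
    "mesos.tasks",
    "search-service.1",
    {"cpus_limit": 0.5, "mem_rss_bytes": 268435456},
    1508400000,
)
```

In a pipeline like the one on this slide, lines of this shape would be what the publisher writes to Kafka and what the consumer forwards to the relay.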

Page 16

Auto-generated Grafana Dashboard

Help text explaining the graphs and what they mean

Every Service running in Mesos will have an auto-generated dashboard

Shows cluster-wide Usage and Instance Usage
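
As a rough illustration of what auto-generating a dashboard per service can look like, the sketch below builds a simplified Grafana-style dashboard as a plain Python dict, with help text on each panel, one cluster-wide graph and one per-instance graph. It does not use grafanalib and covers only a small subset of the dashboard JSON; the metric paths and the `service_dashboard` helper are assumptions made for this example.

```python
import json
from typing import Any, Dict


def service_dashboard(service: str, metric_prefix: str = "mesos.tasks") -> Dict[str, Any]:
    """Build a simplified dashboard definition for one service (hypothetical layout)."""

    def panel(title: str, description: str, target: str) -> Dict[str, Any]:
        # 'description' carries the help text shown next to the graph.
        return {
            "type": "graph",
            "title": title,
            "description": description,
            "targets": [{"target": target}],
        }

    base = f"{metric_prefix}.{service}"
    return {
        "title": f"{service} (auto-generated)",
        "panels": [
            panel(
                "Cluster-wide memory usage",
                "Sum of resident memory across all instances of this service.",
                f"sumSeries({base}.*.mem_rss_bytes)",
            ),
            panel(
                "Per-instance memory usage",
                "Resident memory of each instance, one line per instance.",
                f"aliasByNode({base}.*.mem_rss_bytes, 3)",
            ),
        ],
    }


dashboard = service_dashboard("search-service")
dashboard_json = json.dumps(dashboard, indent=2)
```

Running this for every service name known to the scheduler would give each service the same baseline dashboard without any per-team work.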

Page 17

Right-sizing Resource Usage = $$$ Saved

[Diagram: Singularity Task, Mesos Cluster, Mesos stats and Metrics]

Shows that memory is over-provisioned for this service
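
One way to turn the graph's message into a number is to compare the memory a task requested with the peak it actually used. The sketch below does that; the 20% headroom, the 50% threshold, the sample values and all names are assumptions chosen for illustration, not figures from the talk.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class MemoryReport:
    allocated_mb: float
    peak_used_mb: float
    suggested_mb: int
    over_provisioned: bool


def right_size_memory(
    allocated_mb: float,
    usage_samples_mb: Sequence[float],
    headroom: float = 0.2,   # assumed safety margin above the observed peak
    threshold: float = 0.5,  # assumed: flag when peak is below half the allocation
) -> MemoryReport:
    """Compare requested memory with observed usage and suggest a smaller request."""
    if not usage_samples_mb:
        raise ValueError("need at least one usage sample")
    peak = max(usage_samples_mb)
    suggested = round(peak * (1 + headroom))
    over = peak < allocated_mb * threshold
    return MemoryReport(allocated_mb, peak, suggested, over)


# A service that asked for 2 GB but never used more than 400 MB.
report = right_size_memory(2048, [310.0, 355.5, 400.0, 372.25])
```

Here the report flags the service as over-provisioned and suggests a request of about 480 MB, which is the kind of informed choice the dashboard is meant to support.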

Page 18
Page 19

Local Build: Code; Local Docker Testing; Push to Docker Repo

Provision: Deploy Service to Mesos Cluster; Deploy service to other Mesos Cluster

Metrics (Optional): Only application specific metrics; Create application specific dashboards

Monitoring: Identify Metrics or emit metrics; Write Puppet Code; Infrastructure Team pushes Puppet code; Runbooks and escalation policies

Page 20

Sous

https://github.com/opentable/sous

[Diagram: Code goes through Sous Build into the Container Repository. A Manifest Change goes through Sous Deploy to the Sous Service, which holds the Global Deployment Manifest and deploys to Mesos Cluster QA, Mesos Cluster Prod (London) and Mesos Cluster Prod (US-West2).]
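
The idea of a global deployment manifest can be sketched as a desired-state table that is compared against what each cluster is running. The example below is only an illustration of that idea: the manifest shape, the cluster keys, the service name and the `plan_deployments` helper are hypothetical and do not reflect Sous's actual manifest format or API.

```python
from typing import Dict, List, Tuple

# service name -> cluster name -> version that should be running there
Manifest = Dict[str, Dict[str, str]]


def plan_deployments(desired: Manifest, running: Manifest) -> List[Tuple[str, str, str]]:
    """Return (service, cluster, version) for every cluster that differs from the manifest."""
    plan = []
    for service in sorted(desired):
        for cluster in sorted(desired[service]):
            version = desired[service][cluster]
            # A missing entry in 'running' means the service is not deployed there yet.
            if running.get(service, {}).get(cluster) != version:
                plan.append((service, cluster, version))
    return plan


# Made-up versions; cluster keys mirror the clusters named on the slide.
desired = {
    "menu-api": {"qa": "1.4.0", "prod-london": "1.3.2", "prod-us-west2": "1.3.2"}
}
running = {"menu-api": {"qa": "1.3.2", "prod-london": "1.3.2"}}

plan = plan_deployments(desired, running)
```

With this shape, a manifest change is the only input a deploy needs: the plan lists the clusters that have drifted from the declared state.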

Page 21
Page 22

Local Build: Code; Local Docker Testing

Provision: Sous Deploy; Updated Global Deployment Manifest

Metrics (Optional): Only application specific metrics; Create application specific dashboards

Monitoring: Identify Metrics and Thresholds; Updated Global Deployment Manifest; Runbooks and escalation policies

Page 23

Logging

Restaurant_id == RID == ResID == Res_ID

Global RequestID
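
A global data model for logging means the four spellings above end up as one field. A minimal sketch of that normalisation step follows; the canonical names `restaurant_id` and `request_id` and the alias table are assumptions for this example rather than OpenTable's actual schema.

```python
from typing import Any, Dict

# Lower-cased aliases mapped to one canonical field name (hypothetical schema).
CANONICAL_FIELDS = {
    "restaurant_id": "restaurant_id",
    "rid": "restaurant_id",
    "resid": "restaurant_id",
    "res_id": "restaurant_id",
    "requestid": "request_id",
    "request_id": "request_id",
}


def normalize_log_record(record: Dict[str, Any]) -> Dict[str, Any]:
    """Rename known aliases to their canonical field; leave other keys untouched."""
    normalized = {}
    for key, value in record.items():
        normalized[CANONICAL_FIELDS.get(key.lower(), key)] = value
    return normalized


examples = [{"Restaurant_id": 42}, {"RID": 42}, {"ResID": 42}, {"Res_ID": 42}]
normalized = [normalize_log_record(r) for r in examples]
```

After this step every record can be queried by the same field name, whichever service emitted it.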

Page 24

https://github.com/opentable/request-timeline

Page 25

Timeline Demo
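
With a global request ID on every log line, a per-request timeline can be assembled by grouping events on that ID and ordering them by time. The sketch below shows the idea only; it is not how request-timeline is implemented, and the field names (`request_id`, `service`, `timestamp_ms`) and sample events are assumptions.

```python
from collections import defaultdict
from typing import Any, Dict, List, Tuple


def build_timelines(events: List[Dict[str, Any]]) -> Dict[str, List[Tuple[int, str]]]:
    """Group log events by request ID and list them as (ms since first event, service)."""
    grouped: Dict[str, List[Dict[str, Any]]] = defaultdict(list)
    for event in events:
        grouped[event["request_id"]].append(event)

    timelines = {}
    for request_id, items in grouped.items():
        items.sort(key=lambda e: e["timestamp_ms"])
        start = items[0]["timestamp_ms"]
        timelines[request_id] = [(e["timestamp_ms"] - start, e["service"]) for e in items]
    return timelines


# Made-up events from three services handling two requests.
events = [
    {"request_id": "req-1", "service": "availability", "timestamp_ms": 1030},
    {"request_id": "req-1", "service": "frontend", "timestamp_ms": 1000},
    {"request_id": "req-2", "service": "frontend", "timestamp_ms": 2000},
    {"request_id": "req-1", "service": "reservations", "timestamp_ms": 1075},
]
timelines = build_timelines(events)
```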

Page 26

Key Takeaways

Map out developer workflow and constantly look for opportunities to standardise, automate and enhance.

Make metrics and monitoring part and parcel of the Mesos service.

Engineers don't always make the best choice when deciding resource usage; help them make an informed choice.

Have a common deployment pipeline across the organisation that facilitates production readiness.

Having a global data model for logging allows us to make more sense of logging data across the various Mesos tasks.

Page 27

Thank You

[email protected]

@jaychin

https://www.linkedin.com/in/jayschin