ruxit theme 2014.05.15 Behind the scenes @ ruxit Running a global monitoring infrastructure on AWS Alois Reitbauer, ruxit @aloisreitbauer
Transcript
Page 1: Ruxit - How we launched a global monitoring platform on AWS in 80 days.

ruxit theme 2014.05.15

Behind the scenes @ ruxit

Running a global monitoring infrastructure on AWS

Alois Reitbauer, ruxit

@aloisreitbauer

Page 2: Ruxit - How we launched a global monitoring platform on AWS in 80 days.

Ruxit – what we do

SaaS-based Monitoring and Management Solution

Page 3: Ruxit - How we launched a global monitoring platform on AWS in 80 days.

Page 4: Ruxit - How we launched a global monitoring platform on AWS in 80 days.

Page 5: Ruxit - How we launched a global monitoring platform on AWS in 80 days.

Page 6: Ruxit - How we launched a global monitoring platform on AWS in 80 days.

Page 7: Ruxit - How we launched a global monitoring platform on AWS in 80 days.

Page 8: Ruxit - How we launched a global monitoring platform on AWS in 80 days.

A bit of history

How we moved to a global AWS deployment in 80 days

Page 9: Ruxit - How we launched a global monitoring platform on AWS in 80 days.

Page 10: Ruxit - How we launched a global monitoring platform on AWS in 80 days.

How we moved to the Cloud in 80 days

June 2014 – Beta Cloud Deployment

July 2014 – Open Beta Offering to Public

August 2014 – Full automation

September 2014 – Official Product Launch

October 2014 – >1000 active companies

Page 11: Ruxit - How we launched a global monitoring platform on AWS in 80 days.

Page 12: Ruxit - How we launched a global monitoring platform on AWS in 80 days.

Our architecture

Lessons learned building a global cloud platform

Page 13: Ruxit - How we launched a global monitoring platform on AWS in 80 days.

Cluster

Page 14: Ruxit - How we launched a global monitoring platform on AWS in 80 days.

Cluster

Figure: a cluster on Amazon EC2 spread across three Availability Zones. Labels: Elastic Load Balancer, HA Proxy, Public Security Gateways, Server Cluster, Cassandra DB Cluster.

Page 15: Ruxit - How we launched a global monitoring platform on AWS in 80 days.

Cluster

Figure labels: 3rdP (five times), cloudcontrol.ruxit.com, account.ruxit.com, *.live.ruxit.com (three times).

Page 16: Ruxit - How we launched a global monitoring platform on AWS in 80 days.

Ruxit is built on AWS

How we solve challenges using the AWS technology stack

Page 17: Ruxit - How we launched a global monitoring platform on AWS in 80 days.

Challenge: Growth

Being one of the fastest growing B2B SaaS companies

Page 18: Ruxit - How we launched a global monitoring platform on AWS in 80 days.

Challenge: Usability

Real Time provisioning of DNS names

Page 19: Ruxit - How we launched a global monitoring platform on AWS in 80 days.

Challenge: Reliability

Zero downtime without manual intervention

Page 20: Ruxit - How we launched a global monitoring platform on AWS in 80 days.

Challenge: Delivery

Manage deployment artifacts globally

Page 21: Ruxit - How we launched a global monitoring platform on AWS in 80 days.

How we achieve zero downtime

Your application will break; your users should not notice

Page 22: Ruxit - How we launched a global monitoring platform on AWS in 80 days.

Key Guiding Principles

Over Provisioning

Quarantine Mode

Rolling Updates

Soft Stickiness

Page 23: Ruxit - How we launched a global monitoring platform on AWS in 80 days.

We never run above two thirds of capacity

Over provisioning is built into our architecture.
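One way to read the two-thirds rule, assuming load is spread evenly over the three Availability Zones shown on the architecture slides: if one zone fails, the two survivors absorb its share and end up at exactly full capacity. The sketch below works through that arithmetic. The function name and the even-spread assumption are illustrative and not taken from the talk.

```python
from fractions import Fraction


def utilization_after_failure(utilization, zones, failed=1):
    """Per-zone utilization once `failed` zones drop out.

    Assumes (hypothetically) that every zone has the same capacity and
    that the surviving zones share the total load evenly.
    """
    if failed >= zones:
        raise ValueError("at least one zone must survive")
    total_load = utilization * zones          # load expressed in zone-capacities
    return total_load / (zones - failed)      # spread over the survivors


# Three zones, each running at two thirds of its capacity.
print(utilization_after_failure(Fraction(2, 3), zones=3))
```

Under these assumptions any steady-state utilization above two thirds would push the surviving zones past 100% after a single zone failure.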

Page 24: Ruxit - How we launched a global monitoring platform on AWS in 80 days.

Quarantine and Diagnose in Production

Figure: the cluster across three Availability Zones. Labels: Elastic Load Balancer, HA Proxy, Public Security Gateways, Server Cluster, Cassandra DB Cluster.
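The slide gives no detail on how quarantine works. A minimal sketch of one plausible interpretation follows: a suspect node is taken out of request routing but left running so it can be diagnosed in place. The `Node` and `Pool` classes and the state names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    state: str = "active"  # hypothetical states: "active" or "quarantined"


@dataclass
class Pool:
    nodes: list = field(default_factory=list)

    def routable(self):
        """Nodes that may receive user traffic."""
        return [n for n in self.nodes if n.state == "active"]

    def quarantine(self, name):
        """Stop routing to a node but keep it in the pool for diagnosis."""
        for node in self.nodes:
            if node.name == name:
                node.state = "quarantined"
                return node
        raise KeyError(name)


pool = Pool([Node("a"), Node("b"), Node("c")])
pool.quarantine("b")
print([n.name for n in pool.routable()])  # traffic now skips "b"
print(len(pool.nodes))                    # "b" is still there to inspect
```

The point of the design is that quarantining differs from terminating: the node stays available for inspection while users are served by the rest of the pool.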

Page 25: Ruxit - How we launched a global monitoring platform on AWS in 80 days.

How we handle upgrades

We have to be able to upgrade without any downtime

Page 26: Ruxit - How we launched a global monitoring platform on AWS in 80 days.

Rolling update

Figure labels: Elastic Load Balancer, HA Proxy, Public Security Gateways, Server Cluster, Cloud Control, AWS S3.
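A rolling update in general can be sketched as follows: take one node out of rotation, upgrade it, check its health, and return it to rotation before touching the next node. This is a generic illustration, not Ruxit's implementation. The function, its parameters, and the rollback-on-failure behavior are assumptions, and where the new build comes from (the slide mentions Cloud Control and AWS S3) is left out.

```python
def rolling_update(versions, new_version, is_healthy):
    """Upgrade nodes one at a time.

    versions:   dict mapping node name -> installed version (mutated in place)
    is_healthy: callable(node_name) -> bool, run after each upgrade
    Returns (success, min_serving), where min_serving is the smallest number
    of nodes that were serving traffic at any point during the update.
    """
    serving = set(versions)
    min_serving = len(serving)
    for name in sorted(versions):
        serving.discard(name)                  # drain: stop routing to this node
        min_serving = min(min_serving, len(serving))
        previous = versions[name]
        versions[name] = new_version           # install the new build
        if not is_healthy(name):
            versions[name] = previous          # roll back and abort the rollout
            serving.add(name)
            return False, min_serving
        serving.add(name)                      # healthy: back into rotation
    return True, min_serving


cluster = {"a": "1.0", "b": "1.0", "c": "1.0"}
print(rolling_update(cluster, "1.1", lambda name: True))
print(cluster)
```

Because only one node is out of rotation at a time, the remaining nodes must be able to carry the full load during the update, which is where over-provisioning helps.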

Page 27: Ruxit - How we launched a global monitoring platform on AWS in 80 days.

Soft Stickiness

Combining Data Locality with Transparent Failover

Page 28: Ruxit - How we launched a global monitoring platform on AWS in 80 days.

Dynamic Traffic Routing

Figure labels: Elastic Load Balancer, HA Proxy, Public Security Gateways, Server Cluster; nodes A, B, C.

Page 29: Ruxit - How we launched a global monitoring platform on AWS in 80 days.

Constant Failover Mode

Figure labels: Elastic Load Balancer, HA Proxy, Public Security Gateways, Server Cluster; nodes A, B, C.

Page 30: Ruxit - How we launched a global monitoring platform on AWS in 80 days.

Routing with Wishlist

Figure labels: Elastic Load Balancer, HA Proxy, Public Security Gateways, Server Cluster; nodes A, B, C; request marked B.

Page 31: Ruxit - How we launched a global monitoring platform on AWS in 80 days.

Routing with Failover

Figure labels: Elastic Load Balancer, HA Proxy, Public Security Gateways, Server Cluster; nodes A, C, B; request marked B.
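The routing slides can be read as follows, although the talk transcript does not spell it out: each request carries an ordered wishlist of preferred nodes, the router sends it to the first node on the list that is currently healthy, and traffic returns to the preferred node once it recovers. A minimal sketch under that assumption, with a hypothetical `route` function:

```python
def route(wishlist, healthy):
    """Return the first node on the wishlist that is healthy, else None.

    wishlist: node names in order of preference (data locality first)
    healthy:  set of node names currently able to serve traffic
    """
    for node in wishlist:
        if node in healthy:
            return node
    return None


wishlist = ["B", "A", "C"]                 # this tenant's data lives on B
print(route(wishlist, {"A", "B", "C"}))    # preferred node is up
print(route(wishlist, {"A", "C"}))         # B is down: fail over to the next choice
print(route(wishlist, {"A", "B", "C"}))    # B is back: traffic returns to it
```

The stickiness is "soft" in this reading because the preference is a hint rather than a hard binding: a request is never rejected just because its preferred node is unavailable.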

Page 32: Ruxit - How we launched a global monitoring platform on AWS in 80 days.

Our road from DevOps to NoOps

We don’t have a dedicated Operations team and we don’t want one

Page 33: Ruxit - How we launched a global monitoring platform on AWS in 80 days.

Key Guiding Principles

Autonomous Operations

Feedback and Transparency

Everything is production

Data-Driven Operations

Page 34: Ruxit - How we launched a global monitoring platform on AWS in 80 days.

Run books become backlogs

If you describe what to do, you can also code it into the platform

Page 35: Ruxit - How we launched a global monitoring platform on AWS in 80 days.

Ruxit needs to be able to manage itself

Page 36: Ruxit - How we launched a global monitoring platform on AWS in 80 days.

Feedback and Transparency

Everybody has access to our production monitoring data.

Page 37: Ruxit - How we launched a global monitoring platform on AWS in 80 days.

Full Transparency on Quality

Page 38: Ruxit - How we launched a global monitoring platform on AWS in 80 days.

We treat all environments like production

Everybody has access to our production monitoring data.

Page 39: Ruxit - How we launched a global monitoring platform on AWS in 80 days.

Page 40: Ruxit - How we launched a global monitoring platform on AWS in 80 days.

Data-Driven Operations

There is no decision without data.

Page 41: Ruxit - How we launched a global monitoring platform on AWS in 80 days.

Understand the impact of deployments

Figure labels: Java, OS, Apache, IIS, .NET.

Page 42: Ruxit - How we launched a global monitoring platform on AWS in 80 days.

Information on Agent Deployment

Figure labels: 1.57, 1.59, 1.61, 1.63, 1.65, 1.67, 1.69, 1.58, 1.54, 1.54.

Page 43: Ruxit - How we launched a global monitoring platform on AWS in 80 days.

Questions?

Page 44: Ruxit - How we launched a global monitoring platform on AWS in 80 days.

Member of

Page 45: Ruxit - How we launched a global monitoring platform on AWS in 80 days.

Alois [email protected]@ruxit.com

blog.ruxit.com