How to do monitoring that won't make your engineers quit

Jan 26, 2017

Gil Zellner
Transcript
Relaxing picture of Yoga


hunt through logs for 2 hours

Monitoring that will make your engineers give up

Gil Zellner (Cloudify Dev at GigaSpaces)

Twitter: @Heathenaspargus

Cost of hiring a new employee is 1.5-3x their monthly salary

Easy (days)
- no changes to infrastructure
- just policy

Intermediate (months)
- small changes to apps
- logging
- light automation

Hard (years)
- design for better operability
- long term

Frustration: I am unable to complete my task

Time spent inefficiently

https://www.ergoflex.co.uk/blog/category/sleep-research/sleeponomics-could-sleep-deprivation-be-the-real-reason-politicians-make-bad-decisions

Mandatory half day off after a night production issue

Allocate weekly time to resolve or automate issues that kept us up at night

Wider rotation (more people do on-call)

https://www.youtube.com/watch?v=IUoEiDT1nXY

Creating a DevOps Culture: Identifying a “Single Person of Failure”

Knowledge Matrix

Deploy System Mobile Link Backend

Gil V V

Karen V V

Ari V V
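
A knowledge matrix like this can be checked mechanically for a "single person of failure". The sketch below is only an illustration: the transcript lost which columns the check marks belong to, so the system names and the assignments in KNOWLEDGE are hypothetical placeholders, not the values from the slide.

```python
from typing import Dict, List, Set

# Hypothetical data: the real column assignments are not recoverable
# from the transcript, so these sets are made up for illustration.
SYSTEMS: List[str] = ["deploy", "mobile", "backend"]
KNOWLEDGE: Dict[str, Set[str]] = {
    "Gil": {"deploy", "backend"},
    "Karen": {"mobile", "backend"},
    "Ari": {"deploy", "backend"},
}


def single_points_of_knowledge(
    matrix: Dict[str, Set[str]], systems: List[str]
) -> Dict[str, List[str]]:
    """Return the systems known by fewer than two people, with those people."""
    coverage: Dict[str, List[str]] = {system: [] for system in systems}
    for person, known in matrix.items():
        for system in known:
            if system in coverage:
                coverage[system].append(person)
    return {
        system: sorted(people)
        for system, people in coverage.items()
        if len(people) < 2
    }


if __name__ == "__main__":
    for system, people in single_points_of_knowledge(KNOWLEDGE, SYSTEMS).items():
        owner = people[0] if people else "nobody"
        print(f"{system}: only {owner} can handle this on call")
```

Any system this reports is a candidate for pairing or documentation before the rotation is widened.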

Solution: alert only on things that meet the following criteria:

1) Alert on symptoms, not suspected "causes"
2) Actionable
3) Business breaking
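
The three criteria can be written down as a paging filter. This is a minimal sketch, assuming each alert is already labelled by whoever defines it; the Alert class, its field names, and the example alerts are hypothetical and not taken from any monitoring tool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Alert:
    name: str
    is_symptom: bool         # user-visible symptom rather than a suspected cause
    actionable: bool         # the person paged can do something about it
    business_breaking: bool  # the business is hurt while it stays unresolved


def should_page(alert: Alert) -> bool:
    """Page a human only when all three criteria hold."""
    return alert.is_symptom and alert.actionable and alert.business_breaking


if __name__ == "__main__":
    candidates = [
        Alert("checkout error rate above threshold", True, True, True),
        Alert("CPU at 90% on one node", False, True, False),
    ]
    for alert in candidates:
        decision = "page" if should_page(alert) else "log only"
        print(f"{alert.name}: {decision}")
```

Alerts that fail the filter do not have to disappear; they can go to a dashboard or a ticket queue instead of waking someone up.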

Solution: direct alerts to relevant parties
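
One way to direct alerts is a routing table keyed on the service that raised them. The sketch below is an assumption about how that could look; the service names, rotation names, and the "service" key are all hypothetical.

```python
from typing import Dict

# Hypothetical routing table: service name -> on-call rotation to notify.
ROUTES: Dict[str, str] = {
    "payments": "payments-oncall",
    "mobile": "mobile-oncall",
}
DEFAULT_ROUTE = "ops-oncall"  # fallback when no team owns the service


def route(alert: Dict[str, str]) -> str:
    """Pick the rotation that should receive this alert."""
    return ROUTES.get(alert.get("service", ""), DEFAULT_ROUTE)


if __name__ == "__main__":
    print(route({"service": "payments", "summary": "card charges failing"}))
    print(route({"service": "search", "summary": "no owner registered"}))
```

The fallback route matters: an alert nobody owns should still reach someone, and each hit on the fallback is a hint that the table needs a new entry.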

Companies that are doing this as a service:

Picking the right things to measure

Netflix stream starts per second

What are your KPIs?

- Stream starts per second
- Taxi orders per minute
- API calls per second
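
A rate KPI like these can be watched with a sliding window and compared against what is normal for that time of day. This is a rough sketch under stated assumptions: RateMonitor, kpi_is_unhealthy, and the 50% threshold are hypothetical, events are assumed to be recorded in timestamp order, and the expected rate has to come from your own history.

```python
from collections import deque
from typing import Deque


class RateMonitor:
    """Counts events in a sliding time window (timestamps in seconds)."""

    def __init__(self, window_seconds: float) -> None:
        self.window_seconds = window_seconds
        self._events: Deque[float] = deque()

    def record(self, timestamp: float) -> None:
        """Register one business event, e.g. one stream start."""
        self._events.append(timestamp)

    def rate(self, now: float) -> float:
        """Events per second over the window ending at `now`."""
        cutoff = now - self.window_seconds
        while self._events and self._events[0] <= cutoff:
            self._events.popleft()
        return len(self._events) / self.window_seconds


def kpi_is_unhealthy(
    current_rate: float, expected_rate: float, min_fraction: float = 0.5
) -> bool:
    """Flag the KPI when it falls below a fraction of the expected rate."""
    return current_rate < expected_rate * min_fraction


if __name__ == "__main__":
    monitor = RateMonitor(window_seconds=10.0)
    for i in range(1, 21):  # one event every 0.5 s, from t=0.5 to t=10.0
        monitor.record(0.5 * i)
    current = monitor.rate(now=10.0)
    print(f"current rate: {current:.1f}/s")
    print("unhealthy:", kpi_is_unhealthy(current, expected_rate=5.0))
```

Alerting on the business rate covers many different failures with one check, which fits the idea of alerting on symptoms rather than suspected causes.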

Companies that are doing this as a service:

Auto-remediation basics

1) Make remediation script
2) Make diagnosis script
3) Connect them
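
The three steps can be sketched in a few lines. Everything here is a hypothetical placeholder: the metric names, thresholds, and remediation actions are invented for illustration, and the remediation functions only return a description instead of touching a real system.

```python
from typing import Callable, Dict, Optional


# 1) Remediation scripts: one small action per known problem.
def rotate_logs() -> str:
    return "rotated logs"  # placeholder for the real cleanup


def restart_worker() -> str:
    return "restarted worker"  # placeholder for the real restart


REMEDIATIONS: Dict[str, Callable[[], str]] = {
    "disk_full": rotate_logs,
    "worker_down": restart_worker,
}


# 2) Diagnosis script: turn raw metrics into a named problem.
def diagnose(metrics: Dict[str, float]) -> Optional[str]:
    if metrics.get("disk_used_fraction", 0.0) > 0.9:
        return "disk_full"
    if metrics.get("worker_processes", 1) == 0:
        return "worker_down"
    if metrics.get("memory_used_fraction", 0.0) > 0.95:
        return "memory_pressure"  # diagnosed, but no automated fix yet
    return None


# 3) Connect them: run the matching remediation, otherwise escalate.
def auto_remediate(metrics: Dict[str, float]) -> str:
    problem = diagnose(metrics)
    if problem is None:
        return "healthy"
    action = REMEDIATIONS.get(problem)
    if action is None:
        return f"escalate: no remediation for {problem}"
    return action()


if __name__ == "__main__":
    print(auto_remediate({"disk_used_fraction": 0.95}))
    print(auto_remediate({"memory_used_fraction": 0.99}))
```

Keeping diagnosis and remediation separate means a problem without a safe automatic fix still gets a name, and a human is paged only for the cases the table does not cover.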

Heal Workflows - Cloudify

Incentive for resilient architecture

0.99 uptime: 87.6 hours of downtime per year

0.999 uptime: 8.76 hours of downtime per year

0.9999 uptime: 52.6 minutes of downtime per year

0.99999 uptime: 5.3 minutes of downtime per year
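
These figures follow from multiplying the allowed unavailability by the hours in a year. A small sketch, assuming a 365-day year (8,760 hours); the function name is my own.

```python
HOURS_PER_YEAR = 365 * 24  # 8,760 hours, ignoring leap years


def downtime_hours_per_year(availability: float) -> float:
    """Downtime budget implied by an availability target such as 0.999."""
    return (1.0 - availability) * HOURS_PER_YEAR


if __name__ == "__main__":
    for target in (0.99, 0.999, 0.9999, 0.99999):
        hours = downtime_hours_per_year(target)
        print(f"{target} uptime: {hours:.2f} hours ({hours * 60:.1f} minutes) per year")
```

Each extra nine cuts the budget by a factor of ten, which is why the higher targets need architecture changes rather than faster humans.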