
StatsCraft 2015: The problem (Keynote) - Nir Cohen

Apr 14, 2017

Page 1: StatsCraft 2015: The problem (Keynote) - Nir Cohen

StatsCraft
Monitoring Conference

website and agenda: http://statscraft.org.il
twitter: @statscraft (#statscraft)
facebook: https://www.facebook.com/statscraft.il
email: [email protected]

Page 2: StatsCraft 2015: The problem (Keynote) - Nir Cohen

Agenda

1. Understand the problem.
2. Understand what monitoring is.
3. Example use-case(s)
4. A different approach
5. Learn methodologies and tools

Page 3: StatsCraft 2015: The problem (Keynote) - Nir Cohen

The Problem

Nir Cohen @ Gigaspaces

@thinkops
http://github.com/nir0s

Page 4: StatsCraft 2015: The problem (Keynote) - Nir Cohen

We monitor because...

We want to satisfy the customer.

(make money?)

Page 5: StatsCraft 2015: The problem (Keynote) - Nir Cohen

Automated Resource Provisioning
Configuration Management
Automated Code Deployment
Continuous Whatever
Monitoring

Still underrated...

PROBLEM!

Page 6: StatsCraft 2015: The problem (Keynote) - Nir Cohen

Blame the tools?

Page 7: StatsCraft 2015: The problem (Keynote) - Nir Cohen

Problem origin

DISCLAIMER

Page 8: StatsCraft 2015: The problem (Keynote) - Nir Cohen

We're monitoring the wrong things.

_rootCauseAnalysis:

the alternative is harder.

Page 9: StatsCraft 2015: The problem (Keynote) - Nir Cohen

We're considering logs a second-class citizen.

_rootCauseAnalysis:

the alternative is harder.

Page 10: StatsCraft 2015: The problem (Keynote) - Nir Cohen

Our data is lacking.

_rootCauseAnalysis:

inertia. that's how it was, that's how it is.

Page 11: StatsCraft 2015: The problem (Keynote) - Nir Cohen

We separate monitoring from application

_rootCauseAnalysis:

we're not used to this. (Ops problem)

Page 12: StatsCraft 2015: The problem (Keynote) - Nir Cohen

We monitor reactively, not proactively

_rootCauseAnalysis:

reaction requires less initial energy than anticipation.
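
One way to picture the difference: a reactive check fires when the disk is already full, while a proactive check estimates when it will fill up. The sketch below is only an illustration, not something from the talk. The function name `hours_until_full`, the sample format and the numbers are all assumptions made up for this example.

```python
def hours_until_full(samples, capacity):
    """Estimate hours until usage reaches capacity.

    samples: list of (hour, used) pairs, oldest first (hypothetical format).
    Fits a least-squares line through the samples and extrapolates from
    the most recent sample. Returns None if usage is flat or shrinking.
    """
    n = len(samples)
    mean_t = sum(t for t, _ in samples) / n
    mean_u = sum(u for _, u in samples) / n
    var_t = sum((t - mean_t) ** 2 for t, _ in samples)
    if var_t == 0:
        return None  # all samples taken at the same time: no trend
    slope = sum((t - mean_t) * (u - mean_u) for t, u in samples) / var_t
    if slope <= 0:
        return None  # usage is not growing
    _, last_used = samples[-1]
    return (capacity - last_used) / slope


# Hypothetical disk usage in GB, sampled once per hour.
usage = [(0, 50), (1, 52), (2, 54), (3, 56)]
print(hours_until_full(usage, capacity=100))
```

An alert on "less than N hours left" leaves time to act, where an alert on "disk full" only reports the outage.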

Page 13: StatsCraft 2015: The problem (Keynote) - Nir Cohen

We put uptime above system and product quality

_rootCauseAnalysis:

it's much easier.

Page 14: StatsCraft 2015: The problem (Keynote) - Nir Cohen

We deal with hard limits.

_rootCauseAnalysis:

arbitrary numbers are easier to set.
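
To make the contrast concrete, here is a minimal sketch of a hard limit next to a limit derived from the metric's own recent history. It is an illustration only, not the speaker's method. `HARD_LIMIT_MS`, `baseline_threshold`, `is_anomalous`, the factor `k` and the sample values are all hypothetical.

```python
from statistics import mean, stdev

# Hypothetical hard limit: an arbitrary number someone picked once.
HARD_LIMIT_MS = 500


def baseline_threshold(history, k=3.0):
    """Return mean + k sample standard deviations of recent values."""
    return mean(history) + k * stdev(history)


def is_anomalous(value, history, k=3.0):
    """True if value is above the threshold derived from history."""
    return value > baseline_threshold(history, k)


# Hypothetical response times in milliseconds.
history = [100, 102, 98, 101, 99]
latest = 110

print(latest > HARD_LIMIT_MS)         # the hard limit sees nothing unusual
print(is_anomalous(latest, history))  # the baseline flags the deviation
```

The mean-plus-deviation rule is just one simple choice of baseline and assumes the history is reasonably stable. The point is that the threshold follows the data instead of being set by hand.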

Page 15: StatsCraft 2015: The problem (Keynote) - Nir Cohen

Monitoring is non-functional but resource hungry

_rootCauseAnalysis:

we just don't accept it.

Page 16: StatsCraft 2015: The problem (Keynote) - Nir Cohen

Good monitoring requires the right people, not just Ops!

_rootCauseAnalysis:

delegation is natural. others have more important things to do.

Page 17: StatsCraft 2015: The problem (Keynote) - Nir Cohen

Alert fatigue is common.

_rootCauseAnalysis:

solving issues is much easier than solving problems, and apparently, we are addicted to non-actionable alerts.

Page 18: StatsCraft 2015: The problem (Keynote) - Nir Cohen

We're auto-scaling prematurely

_rootCauseAnalysis:

brute force is natural

Page 19: StatsCraft 2015: The problem (Keynote) - Nir Cohen

We're choosing the wrong tools.

_rootCauseAnalysis:

it's easier to choose the tool than to choose what to monitor.

Page 20: StatsCraft 2015: The problem (Keynote) - Nir Cohen

Good monitoring is hard

_rootCauseAnalysis:

systems become complex, so they're harder to monitor.

Page 21: StatsCraft 2015: The problem (Keynote) - Nir Cohen

So, after all, why do we not monitor properly?

_rootCauseAnalysis:

1. Simplification
2. Delegation
3. Rationalization

Page 22: StatsCraft 2015: The problem (Keynote) - Nir Cohen

No fear,

Let's see how we can make this all better

is here!

Page 23: StatsCraft 2015: The problem (Keynote) - Nir Cohen

“If a service crashes and no one is around to monitor it, does it raise an alert?”