FUTURESTACK13: Wake Up! A New Day Dawns for Alerts in New Relic from Bill Kayser, Distinguished Engineer at New Relic

Post on 27-Jan-2015

DESCRIPTION

Alerts in New Relic have always been an essential feature for our customers, from the early days, when we watched only web applications, through the evolution of availability monitoring and server alerting. Coming next, though, is the biggest improvement to alerting yet. In this talk we'll cover the latest features with special insights and pro tips, as well as an inside look at the new policy-based features. Get ready to take your alerts to a whole new level.

Transcript

Wake Up! A New Day Dawns for Alerts in New Relic

Bill Kayser

Wednesday, November 6, 13

You are always monitoring all the things.

Welcome

Bill Kayser
@bravoking
bkayser@newrelic.com

Covered in this talk:
Overview of alerting in New Relic
Alert Policies, other new features
Tips for getting the most out of alerts

What do we expect from alerting systems?

Moehre1992 CC BY-SA 3.0

Randy Chiu CC BY 2.0

What to Watch

Define Conditions

Applications

Key Transactions

Servers

What Action to Take

37 Signals’ Campfire

Notification Channels

Mobile Alerts

Thresholds and Notifications - Today

New in New Relic: Alert Policies

New Feature: Alert Policies

An Alert Policy specifies:
What to watch
What action to take

Alert Policies

Policies apply to:
Applications
Key Transactions
Servers

Server policies are mapped to servers

Threshold settings and other configuration

Notification channels

ops-puppet-01
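As described on these slides, a policy bundles what to watch (threshold settings) with what action to take (notification channels), and server policies are mapped to servers. The following is a minimal Python sketch of that shape, not New Relic's actual data model or API. The class, field names, threshold keys, policy name, and channel string are all assumptions made for illustration; only the server name ops-puppet-01 appears on the slide.

```python
from dataclasses import dataclass, field


# Hypothetical model of an alert policy; none of these names come from New Relic.
@dataclass
class AlertPolicy:
    name: str
    applies_to: str                                  # "application", "key_transaction", or "server"
    thresholds: dict = field(default_factory=dict)   # what to watch
    channels: list = field(default_factory=list)     # what action to take


def policy_for_server(server, mappings, policies):
    """Return the policy mapped to a server name, or None if it is unmapped."""
    policy_name = mappings.get(server)
    return policies.get(policy_name)


ops_policy = AlertPolicy(
    name="ops-servers",
    applies_to="server",
    thresholds={"cpu_percent": 90, "disk_percent": 85},  # illustrative values
    channels=["email:ops@example.com"],                  # illustrative channel
)
policies = {ops_policy.name: ops_policy}
mappings = {"ops-puppet-01": "ops-servers"}              # server -> policy name

found = policy_for_server("ops-puppet-01", mappings, policies)
print(found.name, found.channels)
```

Keeping the server-to-policy mapping separate from the policy itself is what lets one set of thresholds and channels be shared by many servers.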

Application Alert Policy

Server Alert Policy

Key Transaction Alert Policy

Editing Alert Policies

Alert Policies

Named Policies

Default Policy

Default Policies

Applications with defaults

Default settings
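One way to read the split between named and default policies: an application that has not been mapped to a named policy falls back to the default policy. A hedged sketch of that lookup follows. The function, the policy names, the application names, and the `apdex_critical` setting are hypothetical and are not taken from New Relic.

```python
# Hypothetical settings; the slides do not give actual default values.
DEFAULT_POLICY = {"name": "Default application policy", "apdex_critical": 0.7}

named_policies = {
    "Checkout policy": {"name": "Checkout policy", "apdex_critical": 0.85},
}

# Applications explicitly mapped to a named policy; everything else uses the default.
app_to_policy = {"checkout-service": "Checkout policy"}


def resolve_policy(app_name):
    """Return the named policy mapped to the app, or the default policy."""
    policy_name = app_to_policy.get(app_name)
    return named_policies.get(policy_name, DEFAULT_POLICY)


print(resolve_policy("checkout-service")["name"])  # mapped application
print(resolve_policy("blog")["name"])              # falls back to the default
```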

Mapping Policies

Defining Channels

Channel Settings

Define distinct endpoints for each channel type

Designate channels to be used strictly for downtime alerts

Notification Groups

Define a set of notification channels for use across multiple policies

Include e-mail addresses, PagerDuty services, HipChat rooms, etc.
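The channel settings and notification groups described above can be sketched as a small data structure: each channel has its own endpoint, may be flagged for downtime alerts only, and is collected into a reusable group. This is an illustration under assumptions, not New Relic's implementation. The `Channel` class, the `downtime_only` flag name, the group name, and the endpoints are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Channel:
    kind: str                    # e.g. "email", "pagerduty", "hipchat"
    endpoint: str                # distinct endpoint for this channel
    downtime_only: bool = False  # used strictly for downtime alerts


# A notification group is a reusable set of channels shared by several policies.
groups = {
    "on-call": [
        Channel("pagerduty", "hypothetical-service-key", downtime_only=True),
        Channel("email", "ops@example.com"),
        Channel("hipchat", "Ops Room"),
    ],
}


def channels_to_notify(group_name, is_downtime):
    """Pick the channels in a group that should fire for this kind of alert."""
    return [
        ch for ch in groups[group_name]
        if is_downtime or not ch.downtime_only
    ]


print([ch.kind for ch in channels_to_notify("on-call", is_downtime=False)])
print([ch.kind for ch in channels_to_notify("on-call", is_downtime=True)])
```

With this arrangement, the paging channel stays quiet for ordinary threshold alerts and fires only when the alert is a downtime event.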

Policies: Putting it All Together

Examining Alerts

What’s the Problem?

Problems are events

Critical: tell me right away
Caution: note it in supporting detail

Alert

Alerts in the Event List

Alert History

Tips for Leveraging New Alerts

‘Alarm fatigue’ worries hospitals

San Francisco Chronicle, October 23, 2013

“...a patient’s life could be hanging in the balance.”

Avoid Threshold Sensitivity

Increase the lag time for triggering problems

Increase the lag time for triggering downtime
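The effect of a longer lag time can be illustrated with a small simulation: a problem opens only after the metric has stayed past its threshold for the whole lag window, so brief spikes are ignored. This is a sketch of the general idea, not New Relic's evaluation logic. The function name, the one-sample-per-minute input, and the sample values are assumptions.

```python
def minutes_until_trigger(samples, threshold, lag_minutes):
    """Return the 1-based minute at which a problem opens, or None.

    samples: one metric value per minute (hypothetical input format).
    A problem opens only after the value has stayed above the threshold
    for lag_minutes consecutive samples.
    """
    consecutive = 0
    for minute, value in enumerate(samples, start=1):
        consecutive = consecutive + 1 if value > threshold else 0
        if consecutive >= lag_minutes:
            return minute
    return None


# Error rate (%) per minute: a short spike, then a sustained breach.
error_rate = [1, 6, 1, 1, 7, 8, 9, 7, 6]

print(minutes_until_trigger(error_rate, threshold=5, lag_minutes=1))  # fires on the spike
print(minutes_until_trigger(error_rate, threshold=5, lag_minutes=3))  # waits for the sustained breach
```

With a one-minute lag the single spike at minute 2 opens a problem; with a three-minute lag nothing fires until the breach has lasted three minutes in a row.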

Improve Threshold Sensitivity

Add substring requirement to downtime settings

T = 800 ms

Apdex Score: 0.84 - 0.91

Mean Response Time
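To see why the choice of T changes how sensitive an Apdex threshold is, it helps to compute the score by hand. The sketch below uses the standard Apdex definition: requests at or under T are satisfied, requests between T and 4T are tolerating and count half, and the rest are frustrated. The sample response times are made up for illustration and are not data from the talk.

```python
def apdex(response_times_ms, t_ms):
    """Standard Apdex: (satisfied + tolerating / 2) / total samples."""
    satisfied = sum(1 for r in response_times_ms if r <= t_ms)
    tolerating = sum(1 for r in response_times_ms if t_ms < r <= 4 * t_ms)
    return (satisfied + tolerating / 2) / len(response_times_ms)


# Hypothetical response times in milliseconds, not data from the talk.
samples = [200, 350, 500, 650, 700, 780, 900, 1200, 2500, 4000]

print(round(apdex(samples, t_ms=800), 2))  # T = 800 ms, as on the slide
print(round(apdex(samples, t_ms=400), 2))  # a stricter T lowers the score
```

The same traffic scores 0.75 with T = 800 ms and 0.5 with T = 400 ms, so an Apdex alert threshold only means something relative to the T it was measured against.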

Narrow the Scope

Define Key Transactions
Watch error rate and Apdex
Customize Apdex T value

Key Transaction Policies

Ignore Transaction Noise

Java: NewRelic.ignoreTransaction(), @NewRelicIgnoreTransaction

Ruby: newrelic_ignore [ :only => action..] [ :except => action...]

DotNet: NewRelic.Api.Agent.NewRelic.IgnoreTransaction();

PHP: newrelic_ignore_transaction()

Python: newrelic.agent.ignore_transaction(flag=True)
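As an illustration of the Python call listed above, the sketch below ignores transactions for health-check style requests so they do not skew the metrics that alerts are based on. The handler, the `is_noise` helper, and the path prefixes are hypothetical. The import is wrapped so the sketch still runs where the New Relic Python agent is not installed, in which case the ignore call is simply skipped.

```python
# Paths treated as noise here are hypothetical examples.
NOISY_PATH_PREFIXES = ("/health", "/ping")

try:
    import newrelic.agent as nr_agent   # New Relic Python agent, if installed
except ImportError:
    nr_agent = None                     # lets the sketch run without the agent


def is_noise(path):
    """Return True for requests that should not count toward alert metrics."""
    return path.startswith(NOISY_PATH_PREFIXES)


def handle_request(path):
    """Toy request handler that marks noisy transactions as ignored."""
    if is_noise(path) and nr_agent is not None:
        nr_agent.ignore_transaction(flag=True)   # call named on the slide
    return "ok"


print(is_noise("/health"), is_noise("/checkout"))
```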

Stop waking me up

(Wake up Jonathan instead)

Alerts API

Alerts V2: Summary of Improvements

Alert Policies
Notification Channels
Notification Groups
Server Downtime
Lag time setting
New API

What do we expect from alerting?

Get the right people involved at the appropriate time
  Use well-defined notification channels
  Group channels together for easy reuse
  Take advantage of PagerDuty

High signal, low noise
  Alert on key transactions
  Edit lag times
  Distinguish critical and caution problems
  Adjust Apdex T

What do we expect from alerting?

Flexibility
  Define policies according to specific operational requirements
  Use defaults, define overrides
  API for integration, management and auditing

Simplicity
  Watch only the things you care about most

Migrating to use Alert Policies

Accounts will be upgraded in November

One policy will be created for each application and key transaction
One policy will be created for each unique set of threshold settings across all servers

Existing alert behavior is preserved
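The server side of the migration rule, one policy per unique set of threshold settings, amounts to grouping servers by their settings. The sketch below only illustrates that grouping idea. The migration itself is performed by New Relic, and the server names, threshold keys, and values here are invented for the example.

```python
from collections import defaultdict

# Hypothetical per-server threshold settings before migration.
server_thresholds = {
    "web-01": {"cpu_percent": 90, "disk_percent": 85},
    "web-02": {"cpu_percent": 90, "disk_percent": 85},
    "db-01": {"cpu_percent": 75, "disk_percent": 70},
}


def group_servers_by_settings(settings_by_server):
    """Group server names by identical threshold settings."""
    groups = defaultdict(list)
    for server, settings in settings_by_server.items():
        key = tuple(sorted(settings.items()))   # hashable, order-independent
        groups[key].append(server)
    return dict(groups)


policies = group_servers_by_settings(server_thresholds)
for settings, servers in policies.items():
    print(dict(settings), "->", servers)
```

Three servers with two distinct sets of thresholds yield two policies, and each server keeps the thresholds it had before, which is how existing behavior can be preserved.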

Bill Kayser
bkayser@newrelic.com
@bravoking

Thanks!