Service Monitoring CPTE 433 John Beckett. The Control Cycle Process Actual Outcome Desired Outcome OK? Evaluation Process New Process Design Arrive at.

Service Monitoring

CPTE 433, John Beckett

The Control Cycle

[Diagram labels: Process; Actual Outcome; Desired Outcome; OK?; Evaluation Process; New Process Design]

[Example, driving: Arrive at Destination; Location; Driving; Other Information; Low on Gas; Get Gas; Revise ETA; Getting Gas]
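The driving example can be sketched as a small feedback loop: observe the actual outcome, compare it with the desired outcome, and pick a new process design when the outcome is not OK. This is only an illustration. The names `Observation`, `evaluate`, and `control_cycle`, and the fuel-range rule, are assumptions made for the sketch and do not come from the slides.

```python
from dataclasses import dataclass


@dataclass
class Observation:
    """Actual outcome plus the 'other information' used by the evaluation."""
    miles_remaining: float   # actual outcome: distance still to the destination
    fuel_range_miles: float  # other information: how far the current tank goes


def evaluate(obs):
    """Compare the actual outcome with the desired one and choose a process."""
    if obs.miles_remaining <= 0:
        return "arrived"                    # desired outcome reached
    if obs.fuel_range_miles < obs.miles_remaining:
        return "get gas, then revise ETA"   # outcome not OK: redesign the process
    return "keep driving"                   # outcome OK: keep the current process


def control_cycle(observations):
    """Run one evaluation per observation and collect the decisions."""
    return [evaluate(obs) for obs in observations]


trip = [Observation(100, 300), Observation(60, 40), Observation(0, 200)]
print(control_cycle(trip))
```

The point of the sketch is that the evaluation step needs more than the outcome itself: without the fuel reading (the "other information"), the loop could not decide to change the process.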

The Control Cycle

[Diagram labels: Process; Actual Outcome; Desired Outcome; OK?; Evaluation Process; New Process Design]

[Example, remote application delivery: Deliver Apps Remotely; It Works Slowly; Use Network; Other Information; Price of additional Bandwidth; Move to Clients; Revise Expectations; Change workflow accordingly; Use Client Also]

Two Types of Monitoring

• Real-Time
– “Where is our bandwidth going?”
– “Who is doing ____?”

• Historical
– Analysis of trends over time (perhaps a year, perhaps a day, perhaps an hour)
– Tracking problems down

Need a reason before you monitor

The reason comes later
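The difference between the real-time and historical views can be illustrated with one store of samples queried two ways. This is a minimal sketch, not a description of any particular monitoring product. The class `BandwidthHistory`, its methods, and the Mbps readings are hypothetical.

```python
from collections import deque
from statistics import mean


class BandwidthHistory:
    """Keep recent (timestamp_seconds, mbps) samples for both kinds of query."""

    def __init__(self, max_samples=1000):
        # Oldest samples are dropped automatically once the deque is full.
        self.samples = deque(maxlen=max_samples)

    def record(self, timestamp, mbps):
        self.samples.append((timestamp, mbps))

    def latest(self):
        """Real-time view: the most recent reading, or None if there is none."""
        return self.samples[-1][1] if self.samples else None

    def average_over(self, now, window_seconds):
        """Historical view: mean of the readings in the trailing window."""
        recent = [v for t, v in self.samples if now - window_seconds <= t <= now]
        return mean(recent) if recent else None


history = BandwidthHistory()
for t, mbps in [(0, 10), (1800, 20), (3600, 60)]:
    history.record(t, mbps)

print(history.latest())                  # current reading
print(history.average_over(3600, 3600))  # trend over the last hour
```

Changing `window_seconds` is what moves the historical question from "an hour" to "a day" or "a year", at the cost of keeping more samples.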

Reasons to Monitor

• Business requirement
– Lost revenue if errors are made
– “E-commerce will probably need to implement everything presented in this chapter.”

• Legal requirement
– If you are required to prove you didn’t do something, abundant records of the alternative are all you can get.
• Test: “Reasonable and prudent”

Real-Time

• Alerts/Notifications
• Drill-down
• Root-Cause Analysis

• Note: uses bandwidth!

Monitoring versus Alerting

• The Best System…
• Moves activity from monitoring to alerting
• Focuses work on productive measures
• Note emphasis on “automatic root-cause analysis”
• Does it work?
• It may be too early to tell
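One way to picture moving activity from monitoring to alerting is a check that stays silent until the state changes, so nobody has to watch the readings. The sketch below is an assumption-laden illustration. `ThresholdAlerter`, the threshold of 90, and the sample values are invented for the example.

```python
class ThresholdAlerter:
    """Turn a stream of readings into alerts only when the state changes."""

    def __init__(self, threshold):
        self.threshold = threshold
        self.alarming = False  # True while the last reading was over threshold

    def observe(self, value):
        """Return 'ALERT', 'CLEAR', or None when nothing changed."""
        over = value > self.threshold
        if over and not self.alarming:
            self.alarming = True
            return "ALERT"   # first reading over the threshold
        if not over and self.alarming:
            self.alarming = False
            return "CLEAR"   # first reading back under the threshold
        return None          # no change, so no notification


alerter = ThresholdAlerter(threshold=90)
print([alerter.observe(v) for v in [50, 95, 97, 80, 85]])
```

Reporting only transitions keeps a sustained problem from producing one notification per sample.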

Who Else Might Hear?

• “I’m Hot. I’m Wet.”

Built-in Escalation System

• Can your system “know” whom to notify?

• Do you give lower levels authority to select escalation path?
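For a system to "know" whom to notify, the escalation path has to be written down as data. The sketch below assumes a simple time-based policy. The roles, the delays, and the function `who_to_notify` are hypothetical, and the sketch does not address whether lower levels may choose a different path.

```python
# Hypothetical policy: (minutes without acknowledgement, who gets notified).
ESCALATION_POLICY = [
    (0, "on-call technician"),
    (15, "team lead"),
    (60, "department manager"),
]


def who_to_notify(minutes_unacknowledged, policy=ESCALATION_POLICY):
    """Return everyone whose escalation delay has elapsed, lowest level first."""
    return [contact for delay, contact in policy
            if minutes_unacknowledged >= delay]


print(who_to_notify(0))
print(who_to_notify(20))
print(who_to_notify(90))
```

Keeping the policy in a table rather than in code makes it easier to review and to change when responsibilities move.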

System, Heal Thyself?

• Network routing systems have done this for decades.
– Symptom is often that the automatic routing looks strange.
– Good application for alert system

• The key is reporting and corrective action.
• Problems tend to have “children.”
– Automatic re-routing causes strain elsewhere
– Unknown dependencies
– Power systems chronically snowball problems
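The idea that problems have "children" can be sketched with a dependency map: a failure is reported as a likely root cause only if none of its direct dependencies has also failed. This is a simplified illustration, not a full root-cause analysis. The function `root_causes` and the service names are hypothetical.

```python
def root_causes(failed, depends_on):
    """Return failed items not explained by a failed direct dependency.

    `depends_on` maps an item to the items it directly relies on.
    """
    failed = set(failed)
    return sorted(
        item for item in failed
        if not any(dep in failed for dep in depends_on.get(item, ()))
    )


# Hypothetical dependency map for illustration only.
depends_on = {
    "web": ["app"],
    "app": ["db", "switch"],
    "db": ["switch"],
}

print(root_causes(["web", "app", "db", "switch"], depends_on))
print(root_causes(["web", "app"], depends_on))
```

The result is only as good as the map. An unknown dependency, as noted above, makes a child problem look like a root cause.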

Device Discovery

• Major key to successful monitoring
• Speeds installation of the monitoring system
• Finds unknown devices
• Catches unreported changes
• Finds rogue devices!
• And by the way…
– Improves service
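Once a discovery pass has produced a list of devices, finding unknown devices and unreported changes amounts to comparing that list with the recorded inventory. The sketch below assumes both are available as dictionaries keyed by address. The function `compare_inventory`, the addresses, and the descriptions are made up, and the discovery scan itself is not shown.

```python
def compare_inventory(inventory, discovered):
    """Compare recorded devices with discovered ones.

    Both arguments map a device address to a description string.
    """
    known, seen = set(inventory), set(discovered)
    return {
        # On the network but not in the records: unknown or rogue devices.
        "unknown": sorted(seen - known),
        # In the records but not found on the network.
        "missing": sorted(known - seen),
        # Found, but the description differs: an unreported change.
        "changed": sorted(a for a in known & seen
                          if inventory[a] != discovered[a]),
    }


inventory = {"192.0.2.10": "file server", "192.0.2.20": "printer"}
discovered = {"192.0.2.10": "web server", "192.0.2.30": "wireless AP"}
print(compare_inventory(inventory, discovered))
```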

End-To-End Tests

• The Mailping case
• Develop metrics to watch for service degradation
• Excellent preparation for scaling-up
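An end-to-end test exercises the service the way a user would and records how long it takes, which yields a metric for spotting degradation. The sketch below is a generic probe, not a reconstruction of the Mailping case. The function `end_to_end_probe` and its `send` and `receive` callables are hypothetical stand-ins for whatever actually submits and retrieves a test message.

```python
import time
import uuid


def end_to_end_probe(send, receive, timeout_seconds=30.0, poll_seconds=0.01):
    """Send a uniquely tagged test message and time its delivery.

    `send(token)` submits the message; `receive(token)` returns True once it
    has arrived. Returns elapsed seconds, or None if the timeout is reached.
    """
    token = uuid.uuid4().hex          # unique tag so old probes are ignored
    start = time.monotonic()
    send(token)
    while time.monotonic() - start < timeout_seconds:
        if receive(token):
            return time.monotonic() - start
        time.sleep(poll_seconds)
    return None                        # never arrived: treat as a failure


# Stand-in "service" for demonstration: a set acting as the mailbox.
mailbox = set()
elapsed = end_to_end_probe(mailbox.add, lambda token: token in mailbox)
print(f"delivered in {elapsed:.3f} s")
```

Recording `elapsed` on every run gives the kind of trend data that shows a service slowing down before it fails outright.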
