WEB307 Health and Business Activity Monitoring
Richard Turner, Program Manager, XML Enterprise Services, Microsoft Corporation
Dec 28, 2015
Why Management?
[Diagram: three applications, each with App Code that feeds both a User View and an Admin View]
User: Apps are UNIQUE and tailored to my needs!
Admin: All apps are SIMILAR in the way they express management data.
App logic contributes to both the user view and the admin view.
Goal: present a common, admin-centric view of a heterogeneous environment.
Why Monitor Application Health?
Determines whether appropriate levels of availability and performance are being achieved
Downtime is expensive
Up to 80 percent of the total cost of solving a problem is spent identifying the cause of the problem
Critical to identifying the root cause of a problem
Increases availability and reliability of your applications
Many problems manifest themselves only in the production environment
You’ll know of a service outage before your users do
Source: http://msdn.microsoft.com/practices
Defining Application Health
Healthy applications:
Programmatically efficient and make good use of the application environment
Make the fewest demands on the infrastructure while still delivering the required performance
Release resources as soon as they have finished using them
Use memory management techniques efficiently
Do not adversely affect other applications
Source: http://msdn.microsoft.com/practices
Defining Application Health
Unhealthy applications:
Do not return a response
Return a correct response, but too slowly or with inconsistent performance
Return an incorrect response
Refuse to let go of resources
Make inefficient use of operating system features
Adversely affect the operation of other applications
Source: http://msdn.microsoft.com/practices
Definitions: Instrumentation
What is Instrumentation?
INSTRUMENTATION: The application or use of instruments
INSTRUMENT: A device for recording, measuring, or controlling, especially such a device functioning as part of a control system
Definitions from The American Heritage® Dictionary of the English Language, Fourth Edition, Houghton Mifflin Company
Definitions: Control
What is Control?
CONTROL: To adjust to a requirement; regulate
CONTROL: To exercise authoritative or dominating influence over; direct
Instrumentation: Two Models
PULL YOU: Respond when queried for state. We’ll call this INSPECTION.
PUSH ME: Send data for administrative attention. We’ll call this EVENTS.
Dashboard Examples
CONTROL: Turn on the lights!
INSPECTION: How fast are we going?
EVENT: Oil Pressure Warning!
Inspection: Scenarios
Monitoring performance – “We are constantly on the watch for performance degradation; this usually indicates to us that we have a problem elsewhere.”
System health checks – “We perform daily checks on disk space, event logs, drive errors.”
Application health checks – “We have to verify that servers are up and responding to PINGs and application services are running.”
Health drill-down – “Event issue detection and response (monitoring, evaluating and isolating app & system health).”
Diagnostics – “Selectively scan app values to diagnose performance issues.”
Events: Scenarios
System events – “I spend a lot of time reading event log entries and investigating what part of the network is broken (DNS, etc.).”
Troubleshooting – “whether problems are application, OS, or hardware related.”
Change Notification – “Audit security changes in systems as they occur.”
Event consolidation – “Read the event logs. With all the servers we have and the inability to consolidate event logs into one location, it takes hours to do this every day to look for potential problems.”
Hardware state notifications – on imminent hardware failures.
Service Control: Scenarios
Scripted control – “I write batch scripts that restart services, reboot the machine, etc.” (availability/performance)
Command line control – “I write scripts and commands directly from the console (if service stops then restart)”
Automated control – “I write batch commands or use tools to start nightly backup/restore jobs.”
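The “if service stops then restart” rule from these scenarios can be sketched as a small watchdog. This is a minimal sketch under stated assumptions, not a production tool: `IManagedService`, `Watchdog`, `CheckAndRestart` and `FakeService` are hypothetical names, and a real script would talk to the Windows service control manager (for example via `net start` or .NET's `ServiceController` class) instead of an in-memory object.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical abstraction over a controllable service; a real implementation
// would wrap the Windows service control manager.
public interface IManagedService
{
    string Name { get; }
    bool IsRunning { get; }
    void Start();
}

public static class Watchdog
{
    // Starts every stopped service and returns the names of those it restarted.
    public static List<string> CheckAndRestart(IEnumerable<IManagedService> services)
    {
        var restarted = new List<string>();
        foreach (var service in services)
        {
            if (!service.IsRunning)
            {
                service.Start();
                restarted.Add(service.Name);
            }
        }
        return restarted;
    }
}

// Hypothetical in-memory stand-in used only to exercise the watchdog.
public class FakeService : IManagedService
{
    public FakeService(string name, bool running) { Name = name; IsRunning = running; }
    public string Name { get; }
    public bool IsRunning { get; private set; }
    public void Start() => IsRunning = true;
}
```

Run on a schedule, the returned list is also a natural payload for an event, so that each automatic restart is still reported to an administrator.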
Optimistic Monitoring
Looks for HEALTH indicators:
Behavior normal
Resource consumption normal
Side effects normal
Diagnostic methods:
“Green ball” events
No-effect service methods
Pessimistic Monitoring
Looks for SICKNESS indicators:
Behavior abnormal
Resource consumption excessive
Related component degradation
Diagnostic methods:
“Red ball” events
Statistical and trend analysis.
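As one illustration of the statistical side of pessimistic monitoring, the sketch below raises a “red ball” when a new sample lies more than a chosen number of standard deviations from the mean of a recent baseline (a simple z-score test). The class name `SicknessDetector`, the baseline window and the default threshold of 3 are assumptions for illustration, not values from the talk.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class SicknessDetector
{
    // Returns true when `sample` is more than `threshold` standard deviations
    // away from the mean of `baseline`.
    public static bool IsAbnormal(IReadOnlyList<double> baseline, double sample, double threshold = 3.0)
    {
        if (baseline.Count < 2)
            throw new ArgumentException("Need at least two baseline samples.", nameof(baseline));

        double mean = baseline.Average();
        // Sample standard deviation of the baseline.
        double variance = baseline.Sum(x => (x - mean) * (x - mean)) / (baseline.Count - 1);
        double stdDev = Math.Sqrt(variance);

        if (stdDev == 0)
            return sample != mean; // flat baseline: any change counts as abnormal

        return Math.Abs(sample - mean) / stdDev > threshold;
    }
}
```

A fixed threshold is the simplest choice; trend analysis would instead compare against a baseline that moves with time of day or load.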
External Management
“Agent” based infrastructure
Management by observed behavior
Typical of MOM, OpenView
Pros: Minimal impact on running code, no extra work for developers
Cons: “Shallow” information, poor root cause analysis
Internal Management
“App” based infrastructure
From the horse’s mouth
WMI, Perfmon, NT Events enable this approach
Pros: Potentially great information “from the source”; rich root cause analysis
Cons: Depends on enlisting your (internal or contracted) development orgs in the effort
Defining Monitoring Levels
Coarse-grained monitoring
Ensures the functioning of your application architecture
Fundamental information, low overhead
Fine-grained monitoring
Detailed data on applications, components, and transactions within applications
Allows operations to dial in the needed information
Addresses such issues as:
Whether one application is demanding more resources than the others
Whether the memory usage of one application is growing faster than the rest
Source: http://msdn.microsoft.com/practices
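The last fine-grained question, whether one application's memory usage is growing faster than the rest, can be answered from periodic samples by comparing per-application growth rates. In this sketch the equal-interval sampling, the least-squares slope as the growth measure, and the names `GrowthAnalysis`, `Slope` and `FastestGrowing` are all assumptions; any fine-grained data source (perf counter samples, for instance) could feed it.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class GrowthAnalysis
{
    // Least-squares slope of memory samples taken at equal intervals
    // (memory units per sampling interval).
    public static double Slope(IReadOnlyList<double> samples)
    {
        int n = samples.Count;
        if (n < 2) throw new ArgumentException("Need at least two samples.", nameof(samples));
        double meanX = (n - 1) / 2.0;   // mean of the indices 0..n-1
        double meanY = samples.Average();
        double num = 0, den = 0;
        for (int i = 0; i < n; i++)
        {
            num += (i - meanX) * (samples[i] - meanY);
            den += (i - meanX) * (i - meanX);
        }
        return num / den;
    }

    // Name of the application whose memory usage grows fastest.
    public static string FastestGrowing(IDictionary<string, double[]> memoryByApp) =>
        memoryByApp.OrderByDescending(kv => Slope(kv.Value)).First().Key;
}
```

A slope is less sensitive to a single noisy sample than comparing the first and last readings.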
PAG: Five Monitoring Areas in IT
[Diagram: the five monitoring areas and the components each one covers]
Service Monitoring: Synthetic Transactions
Network Monitoring: Routers, Leased Lines, Switches, Other network devices
Server Monitoring: Server hardware (power supplies, hard disks, etc.)
Platform Monitoring: Windows 2000, Windows 2000 DNS and Active Directory, IIS and Enterprise Services (COM+), .NET Framework, Application Center 2000, SQL Server 2000, Other Servers
Application Monitoring: .NET Application (FMStocks7 for example)
Source: http://msdn.microsoft.com/practices
Power of Correlated Abstractions
Even as an administrator, you don’t always want all the data.
You want to manage services, not systems!
[Diagram: a scale running from Systems, through Business Systems, to Business. Failures shown, from the systems end to the business end: Disk Failure, NIC Failure, RAM Failure; Process Failure, Router Failure, Store Failure; Auth Failure, Transaction Failure, Dialog Failure; Invalid Account, Order Timed Out, Request Timed Out; Failed Confirmation, Late Delivery, Order Aborted; Inventory Shortfall, Poor Customer Sat.]
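One way to move from system failures toward the business view is to record which higher-level items depend on which lower-level ones and walk that graph from the failed component. This is a sketch only: the talk does not prescribe a data structure, and the `ImpactAnalysis` class and the dependency map used to exercise it (disk to store to transaction to order) are invented for illustration.

```csharp
using System.Collections.Generic;

public static class ImpactAnalysis
{
    // `dependents` is a hypothetical map: component -> items that depend on it directly.
    // Returns every item reachable from `failed` (breadth-first), i.e. everything it may affect.
    public static ISet<string> Affected(IDictionary<string, string[]> dependents, string failed)
    {
        var affected = new HashSet<string>();
        var pending = new Queue<string>();
        pending.Enqueue(failed);
        while (pending.Count > 0)
        {
            string current = pending.Dequeue();
            if (!dependents.TryGetValue(current, out var next)) continue;
            foreach (var item in next)
            {
                if (affected.Add(item))   // Add returns false for items already seen
                    pending.Enqueue(item);
            }
        }
        return affected;
    }
}
```

With a map such as Disk Failure → Store Failure → Transaction Failure → Order Aborted, a single disk alert can be reported as its business consequence, which is the “manage services, not systems” view.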
Inspection Guidelines
Be generous: think in terms of “debugging by ops”
Consider levels of inspection per admin role
Expose resource consumption, dependencies, current activity
Wherever possible, correlate to business activity
Be consistent, be query friendly
Events Guidelines
Use configuration or subscription to determine whether to fire the events
Similarly, make the depth of information adaptable
Remember events can initiate control actions
Watch the signal to noise ratio!
Events are a great admin automation mechanism
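The first and fourth guidelines can be combined in a small event gate: configuration sets a minimum severity, and repeats of the same event within a quiet period are suppressed to protect the signal-to-noise ratio. `EventGate`, the `Severity` levels and the time-window policy are assumptions for this sketch, not part of any Microsoft API.

```csharp
using System;
using System.Collections.Generic;

public enum Severity { Information, Warning, Error, Critical }

// Hypothetical gate deciding whether an event should actually be fired.
public class EventGate
{
    private readonly Severity _minimum;
    private readonly TimeSpan _quietPeriod;
    private readonly Dictionary<string, DateTime> _lastFired = new Dictionary<string, DateTime>();

    // `minimum` and `quietPeriod` would normally come from configuration.
    public EventGate(Severity minimum, TimeSpan quietPeriod)
    {
        _minimum = minimum;
        _quietPeriod = quietPeriod;
    }

    // Returns true if an event with this key and severity should be fired at `now`.
    public bool ShouldFire(string key, Severity severity, DateTime now)
    {
        if (severity < _minimum) return false;                 // below configured level
        if (_lastFired.TryGetValue(key, out var last) && now - last < _quietPeriod)
            return false;                                       // repeat within quiet period
        _lastFired[key] = now;
        return true;
    }
}
```

Passing `now` in explicitly keeps the gate deterministic and easy to test; a live service would pass `DateTime.UtcNow`.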
Control Guidelines
Imperative config
Alternatively, expose your config as a control construct
State management (net start, stop…)
Automation, scripting, UI
Implementing Notifications
Define alerts:
Alert severity levels
Service failure categories
Readiness states
Define whom to notify:
Create a notification hierarchy
Use notification methods
Combine notification methods
Understand notification reliability
Source: http://msdn.microsoft.com/practices
Creating a Notification Hierarchy
Who needs to be notified for each level of alert severity?
Notification groups
Alternative notification procedures
Ensure that alerts of a particular severity go to the right people
Source: http://msdn.microsoft.com/practices
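A notification hierarchy can be expressed as a table from alert severity to notification groups, with an alternative group used when the primary one cannot be reached. The `AlertSeverity` levels, the group names and the single-fallback rule below are hypothetical; the reachability check is passed in as a function so that any notification method can supply it.

```csharp
using System;
using System.Collections.Generic;

public enum AlertSeverity { Low, Medium, High, Critical }

public class NotificationHierarchy
{
    // Hypothetical routing table: severity -> (primary group, alternative group).
    private readonly Dictionary<AlertSeverity, (string Primary, string Alternative)> _routes =
        new Dictionary<AlertSeverity, (string Primary, string Alternative)>
        {
            [AlertSeverity.Low]      = ("Ops mailbox", "On-call operator"),
            [AlertSeverity.Medium]   = ("On-call operator", "Ops mailbox"),
            [AlertSeverity.High]     = ("On-call operator", "Operations manager"),
            [AlertSeverity.Critical] = ("Operations manager", "IT director"),
        };

    // Returns the group to notify; falls back to the alternative group
    // when `isReachable` reports that the primary group cannot be reached.
    public string Route(AlertSeverity severity, Func<string, bool> isReachable)
    {
        var (primary, alternative) = _routes[severity];
        return isReachable(primary) ? primary : alternative;
    }
}
```

Keeping the table as data rather than code makes it easy to review that each severity reaches the right people.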
Defining and Generating Reports
Reporting can prompt proactive fixes and provide evidence of meeting the SLA
Reporting information must be:
Accurate
Timely
Relevant
In a suitable format
Automatically generated
Source: http://msdn.microsoft.com/practices
Creating the Reporting Environment
Understand why and for whom you are producing your reports
Reporting information falls into at least three categories:
Reporting for decision makers
Reporting on service levels
Reporting for technical analysis
Source: http://msdn.microsoft.com/practices
Reporting on Synthetic Transactions
To create reports on service levels, use synthetic transactions executed against the Web service
Employ monitoring applications, such as Web Monitor or Cluster Sentinel
Create reports that cover areas such as average response time and total uptime for the Web service
Tracking and reporting these statistics over time records overall availability of the Web service
Source: http://msdn.microsoft.com/practices
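The two report figures named on this slide, average response time and total uptime, can be computed from a log of synthetic transaction results. The `ProbeResult` record and `ServiceLevelReport` class are hypothetical, and so is the rule that a probe counts as “up” when it succeeded; uptime here is the share of successful probes, which approximates time-based uptime only when probes run at a fixed interval.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// One execution of a synthetic transaction against the Web service.
public record ProbeResult(DateTime Time, bool Succeeded, TimeSpan ResponseTime);

public static class ServiceLevelReport
{
    // Mean response time over successful probes only.
    public static TimeSpan AverageResponseTime(IEnumerable<ProbeResult> results)
    {
        var ok = results.Where(r => r.Succeeded).ToList();
        if (ok.Count == 0) return TimeSpan.Zero;
        return TimeSpan.FromTicks((long)ok.Average(r => r.ResponseTime.Ticks));
    }

    // Percentage of probes that succeeded.
    public static double UptimePercent(IEnumerable<ProbeResult> results)
    {
        var all = results.ToList();
        if (all.Count == 0) return 0;
        return 100.0 * all.Count(r => r.Succeeded) / all.Count;
    }
}
```

Failed probes are excluded from the average so that timeouts show up in the uptime figure instead of distorting the response time.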
Using Synthetic Transactions to Report on Web Service Availability
[Diagram: Web Monitor or Cluster Sentinel runs a synthetic transaction for retrieving stock prices against the Stock Price Web Service, and a synthetic transaction for account validation against the Corporate CRM System]
Source: http://msdn.microsoft.com/practices
Using Synthetic Transactions to Report on Web Service Availability
[Diagram: synthetic transaction for the Account Balance scenario. Steps shown: Customer Log on; Account Validation (Corporate CRM System); Retrieve Portfolio List (FMStocks Database); Stock Prices Web Service Request (Stock Price Web Service); Account Balance Report]
Source: http://msdn.microsoft.com/practices
Web Services
If you are deploying Web Services, you are deploying a technology which enables rich interoperability, distribution, security… And the ecosystem just keeps getting better.
GXA standards: Reliable Messaging, Transactions, Security, Policy
Tools: Visual Studio .NET, BTS, BAM
Platforms: Windows Server 2003, SQL Server
Management FOR and BY Web Services…
Special needs of Web Services:
Highly distributed
Dynamic deployment environments
Heterogeneous platforms
Security
Reliability
Correlation
Use Web Services principles to manage your web services!
Expose management information using web service constructs
Use web services capabilities to enable a distributed management environment
Organizing Principles
Web Services are an organizing abstraction for manageability
All manageable constructs FOR Web Services should present AS Web Services
Distributable by default
Benefits from GXA Security, TX, RM…
All communication is via messaging
Manageable Web Services
In 3 easy steps:
1. Expose management information via a Web Method
2. Allow “subscription” to a set of messages which expose critical state changes
3. Expose control methods as Web Methods
Step 1: Inspection
Create a class that exposes management data
Expose an instance of this class through a web method that returns it on request
Fancier: Accept an XPath query to filter sub-contents
Fancier: Expose a method to dynamically add to this monitoring space
public class AppHealth
{
    public int PendingRequests;   // public so that XmlSerializer includes it
    . . .
}

[WebMethod]
public AppHealth GetHealth()
{
    return this.appHealth;
}
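The “fancier” variant of Step 1, accepting an XPath query to filter the health data, can be sketched by serializing the health object to XML and evaluating the query against it. This sketch runs outside ASP.NET: `HealthInspector`, `Query` and the extra `FailedRequests` field are hypothetical, and in a real service `Query` would carry the `[WebMethod]` attribute.

```csharp
using System.IO;
using System.Xml;
using System.Xml.Serialization;

// Public fields so that XmlSerializer includes them.
public class AppHealth
{
    public int PendingRequests;
    public int FailedRequests;   // hypothetical extra metric
}

public class HealthInspector
{
    private readonly AppHealth appHealth;
    public HealthInspector(AppHealth health) { appHealth = health; }

    // Serializes the current health data and returns the text of the first node
    // selected by `xpath` (e.g. "/AppHealth/PendingRequests"), or null if none matches.
    public string Query(string xpath)
    {
        var serializer = new XmlSerializer(typeof(AppHealth));
        var doc = new XmlDocument();
        using (var writer = new StringWriter())
        {
            serializer.Serialize(writer, appHealth);
            doc.LoadXml(writer.ToString());
        }
        return doc.SelectSingleNode(xpath)?.InnerText;
    }
}
```

Serializing on every call keeps the sketch short; a busy service might cache the document and refresh it on a timer.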
Step 2: Events
Use CLR events to signal state changes
Notify: Define a delegate which creates a message from event parameters
Subscribe: Define a web method that accepts URIs to which the messages will be sent
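public delegate void ReportEvent(string s);
public event ReportEvent e;

[WebMethod]
public void Subscribe(string uri)   // System.Uri has no parameterless constructor, so XmlSerializer cannot handle it
{
    Recipients.Add(new Uri(uri));
}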
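Step 2 can be sketched end to end without ASP.NET: a CLR event signals the state change, a handler turns the event parameter into a message, and the message goes to every subscribed URI. `HealthEvents`, `StateChanged`, `RaiseStateChange`, the `<HealthEvent>` element and the `send` callback (standing in for an outbound SOAP call) are hypothetical; a real service would post a SOAP message to each recipient.

```csharp
using System;
using System.Collections.Generic;
using System.Xml.Linq;

public delegate void ReportEvent(string s);

public class HealthEvents
{
    public event ReportEvent StateChanged;
    private readonly List<Uri> recipients = new List<Uri>();
    private readonly Action<Uri, string> send;   // stand-in for an outbound SOAP call

    public HealthEvents(Action<Uri, string> send)
    {
        this.send = send;
        // Notify: turn each event into an XML message for every subscriber.
        StateChanged += s =>
        {
            string message = new XElement("HealthEvent", s).ToString();
            foreach (var uri in recipients) this.send(uri, message);
        };
    }

    // Subscribe: in the service this would be a [WebMethod].
    public void Subscribe(string uri) => recipients.Add(new Uri(uri));

    public void RaiseStateChange(string description) => StateChanged?.Invoke(description);
}
```

Building the message with `XElement` escapes any markup characters in the event text.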
Step 3: Control
Decide what your admin tasks are
Expose them as Web Methods
Build a console around them using proxies, or…
Tie into MOM or AppCenter
[WebMethod]
public void ResetHealth()
{
    appHealth.Clear();
}
Monitoring Step by Step
1. Determine what matters: correctness, performance, availability
2. Determine what information you need: request results, latency metrics, % failed responses
3. Predict causal relationships: the what-if drill
4. Take what you can from the environment: perf counters, NT events…
5. Put it all in a common tongue: SOAP/XML is usually the right answer here
6. Use advanced tools to automate: Orchestration, BAM, SQL, …
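Step 5 suggests SOAP/XML as the common tongue. The sketch wraps a perf-counter-style reading in a SOAP 1.1 envelope; the envelope namespace is the standard SOAP 1.1 one, but `ManagementMessage`, the `urn:example:health` namespace and the `Reading` element layout are invented for illustration.

```csharp
using System;
using System.Xml.Linq;

public static class ManagementMessage
{
    static readonly XNamespace Soap = "http://schemas.xmlsoap.org/soap/envelope/";
    static readonly XNamespace Health = "urn:example:health";   // hypothetical namespace

    // Wraps one measurement in a SOAP 1.1 envelope.
    public static XDocument Create(string source, string counter, double value, DateTime timestamp) =>
        new XDocument(
            new XElement(Soap + "Envelope",
                new XAttribute(XNamespace.Xmlns + "soap", Soap.NamespaceName),
                new XElement(Soap + "Body",
                    new XElement(Health + "Reading",
                        new XElement(Health + "Source", source),
                        new XElement(Health + "Counter", counter),
                        new XElement(Health + "Value", value),
                        new XElement(Health + "Timestamp", timestamp)))));
}
```

Once readings from perf counters, NT events and application code share one envelope format, the same orchestration or BAM tooling can consume all of them.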
Now What?
Now that your management messages are SOAP messages, the world of Web Services opens up to you
Orchestrate your management scenarios with BTS
Or monitor them with BAM!
Cross domain management is a reality
Don’t preclude the use of current tools: AppCenter can invoke your management methods with Healthmon
Next Steps
Existing systems – consider WS shims to expose data to the messaging infrastructure
New systems – Build in manageability
Enable root cause analysis
Ask The Experts
Get Your Questions Answered
Talk one-on-one with a community of your peers
Community Experts: Microsoft product teams, consultants and Tech*Ed speakers
Resources: whiteboards, internet, etc.
Location: in the middle of the Exhibit Hall
Hours: at least 12-3:30p every day
I will be available in the ATE area after this session
Community Resources
Community Resources: http://www.microsoft.com/communities/default.mspx
Most Valuable Professional (MVP): http://www.mvp.support.microsoft.com/
Newsgroups: Converse online with Microsoft Newsgroups, including Worldwide
http://www.microsoft.com/communities/newsgroups/default.mspx
User Groups: Meet and learn with your peers
http://www.microsoft.com/communities/usergroups/default.mspx
© 2003 Microsoft Corporation. All rights reserved. This presentation is for informational purposes only. MICROSOFT MAKES NO WARRANTIES, EXPRESS OR IMPLIED, IN THIS SUMMARY.