Autonomic Computing
© 2005 IBM Corporation
The Role of Predictive Methods in Autonomic Computing
April 27, 2005
Ric Telford
Director of Architecture and Development, Autonomic Computing
Agenda
Autonomic Computing overview
AC Problem Determination Technologies
Customer Results
The Self-Healing Vision
Summary
Today’s Complex Infrastructure
Management of complex, heterogeneous environments is too difficult
IT asset utilisation is too low
Operational speed too slow; IT flexibility too limited
Privacy, security and business continuity
Inability to manage the infrastructure seamlessly
Swamped by the proliferation of technology and platforms to support
WWW
“IBM’s autonomic computing initiative will become its most important cross-product initiative (as the foundation of On Demand Business).”
— Thomas Bittman, Gartner
Providing customer value
Increased return on IT investment
Improved flexibility, resiliency and quality of service
Accelerated time to value
Focus on business value, not infrastructure
Autonomic Computing delivers intelligent open systems that:
Sense and respond to ever-changing environments
Adapt to unpredictable conditions
Continuously tune themselves
Prevent and recover from failures
Provide a safe environment
IBM Autonomic Computing Structure
Open Standards: Common log format; Solution installation schema
Autonomic Computing Architecture: Autonomic Computing Control Loop; Autonomic Computing Architecture Blueprint
Autonomic Computing Common Components: Log/Trace Analyzer; Generic Log Adapter; Solution installation & dependency checking; Common Console; Autonomic Management Engine; Installation Management Engine
Products delivering autonomic features: 50 products with 415+ features; Partner solutions
Problem Determination; Provisioning; Admin Console; Workload Mgt
Autonomic Computing: Problem Determination Technologies
The Pain Point….
[Diagram: "You" and the infrastructure components involved: Backup Servers, Firewalls, HTTP Servers, Data Servers, Application Servers, Network Routers/Switches, Policy Servers, Managing Servers, Load Balancers, LDAP Registries, Edge Servers, Security Servers]
Today’s Approach… Internal Swat Team – The Manual Process
Requires:
Key resources across the IT staff to get the breadth of skills to understand the end-to-end problem
Deep understanding of log file formats
Deep understanding of system components
Result:
Multiple man-hours/days/weeks of effort
Political issues – passing the blame
Insufficient / inadequate data can cause this approach to fail
Customers are repeating this step today for every major IT outage
Problem determination: Log format
Log format today
Disparate pieces and parts
Tools focused on individual products
No common interfaces among tools
No synergies in building tools OR in creating log entries
Log format tomorrow
Generic log adapter
Common format for log files
Common set of tools
Common interfaces among tools
Common Base Event, an OASIS standard
[Diagram: logs from Database, Networks, Application Server, Servers, Storage devices and Applications pass through Adapters into the Common Base Event format]
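The adapter idea on this slide can be illustrated with a minimal sketch. It assumes an Apache access log in Common Log Format as input and emits a plain dictionary that borrows a few Common Base Event field names. The function name `adapt_access_log_line`, the default host name, and the status-to-severity mapping are hypothetical choices made for illustration; this is not the Generic Log Adapter's actual implementation or configuration.

```python
import re
from datetime import datetime

# Apache Common Log Format: host ident user [time] "request" status bytes
CLF_PATTERN = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\S+)'
)


def adapt_access_log_line(line, location="web01.example.com"):
    """Hypothetical adapter: map one access-log line to a CBE-like dict."""
    match = CLF_PATTERN.match(line)
    if match is None:
        return None  # unparseable lines are simply skipped in this sketch
    timestamp = datetime.strptime(match["time"], "%d/%b/%Y:%H:%M:%S %z")
    status = int(match["status"])
    # Assumed severity scale, for illustration only: higher means more severe.
    severity = 50 if status >= 500 else 30 if status >= 400 else 10
    return {
        "creationTime": timestamp.isoformat(),
        "severity": severity,
        "msg": f'{match["request"]} -> {status}',
        "sourceComponentId": {
            "location": location,
            "locationType": "Hostname",
            "component": "Apache HTTP Server",
        },
        "extendedDataElements": [
            {"name": "clientHost", "type": "string", "values": [match["host"]]},
        ],
    }


sample = '192.0.2.7 - - [27/Apr/2005:10:15:32 -0400] "GET /cart HTTP/1.1" 500 512'
event = adapt_access_log_line(sample)
print(event["creationTime"], event["severity"], event["msg"])
```

One adapter per log format, all producing the same event shape, is what lets a single set of tools work across products.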
Common Base Event Format
[Class diagram]

CommonBaseEvent
extensionName : String
localInstanceId : String
globalInstanceId : String
creationTime : String
severity : short
priority : short
situationType : String
msg : String
repeatCount : short
elapsedTime : long
sequenceNumber : long
version : String = commonbaseevent1_0

ComponentIdentification
location : String
locationType : String
application : String
executionEnvironment : String
component : String
subComponent : String
componentIdType : String
instanceId : String
processId : String
threadId : String

MsgDataElement
msgId : String
msgIdType : String
msgCatalogId : String
msgCatalogTokens : String[]
msgCatalog : String
msgLocale : String
msgCatalogType : String

ContextDataElement
contextId : String
type : String
name : String
contextValue : String

ExtendedDataElement
name : String
type : String
values : String[]
hexValue : byte[]

AssociationEngine
id : String
name : String
type : String

AssociatedEvent

Associations
CommonBaseEvent -> MsgDataElement: +msgDataElement, 0..1
CommonBaseEvent -> ComponentIdentification: +sourceComponentId, 1
CommonBaseEvent -> ComponentIdentification: +reporterComponentId, 0..1
CommonBaseEvent -> ContextDataElement: +contextDataElements, 0..n
CommonBaseEvent -> ExtendedDataElement: +extendedDataElements, 0..n
ExtendedDataElement -> ExtendedDataElement: +children, 0..n
CommonBaseEvent -> AssociatedEvent: +associatedEvents, 0..n
AssociatedEvent -> CommonBaseEvent: +resolvedEvents, 0..n
AssociatedEvent -> AssociationEngine: +associationEngine (ends labeled 1 and 0..1)
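As a reading aid, the structure on this slide can be sketched as Python dataclasses. Only a subset of the listed fields is modeled. Which fields are treated as required, and the use of `int` for `short`, are assumptions of this sketch rather than statements about the Common Base Event specification. The sample host and values are made up.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ComponentIdentification:
    location: str
    locationType: str
    component: str
    subComponent: Optional[str] = None
    application: Optional[str] = None


@dataclass
class ExtendedDataElement:
    name: str
    type: str
    values: List[str] = field(default_factory=list)
    # +children 0..n: an element may nest further elements
    children: List["ExtendedDataElement"] = field(default_factory=list)


@dataclass
class CommonBaseEvent:
    creationTime: str
    sourceComponentId: ComponentIdentification          # multiplicity 1
    reporterComponentId: Optional[ComponentIdentification] = None  # 0..1
    severity: int = 0
    msg: str = ""
    extendedDataElements: List[ExtendedDataElement] = field(default_factory=list)
    version: str = "commonbaseevent1_0"


source = ComponentIdentification("db01.example.com", "Hostname", "Database")
detail = ExtendedDataElement(
    "pool", "string", ["jdbc/orders"],
    children=[ExtendedDataElement("maxConnections", "int", ["20"])],
)
event = CommonBaseEvent(
    "2005-04-27T10:15:32-04:00", source,
    severity=50, msg="connection pool exhausted",
    extendedDataElements=[detail],
)
print(event.sourceComponentId.component,
      event.extendedDataElements[0].children[0].name)
```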
Supported Log Formats (Feb 2005)
AIX errpt log
AIX syslog
Apache HTTP Server access log
Apache HTTP Server error log
CICS Transaction Server for z/OS System message log
Common Base Event XML log
ESS (Shark) Problem log
IBM Communications Server log
IBM DB2 Express diagnostic log
IBM DB2 Universal Database Cli Trace log
IBM DB2 Universal Database JDBC trace log
IBM DB2 Universal Database SVC Dump on z/OS
IBM DB2 Universal Database Trace log
IBM DB2 Universal Database diagnostic log
IBM HTTP Server access log
IBM HTTP Server error log
IBM WebSphere Application Server activity log
IBM WebSphere Application Server for z/OS error log
IBM WebSphere Application Server plugin log
IBM WebSphere Application Server trace log
IBM WebSphere Commerce Server ecmsg log
IBM WebSphere Commerce Server ecmsg, stdout, stderr log
IBM WebSphere InterChange Server log
IBM WebSphere MQ FDC log
IBM WebSphere MQ error log
IBM WebSphere MQ for z/OS Joblog
IBM WebSphere Portal Server appserver_err log
IBM WebSphere Portal Server appserverout log
IBM WebSphere Portal Server run-time information log
IBM WebSphere Portal Server systemerr log
IBM WebSphere Portal Server systemout log
IBM WebSphere Edge Server log
Javacore log
Logging Utilities XML log
Microsoft Windows Application log
Microsoft Windows Security log
Microsoft Windows System log
Oracle JDBC trace log
Oracle alert log
Oracle listener log
Oracle server log
Rational TestManager log
RedHat syslog
SAN File System log
SAN Volume Controller error log
SAP system log
Squadrons-S Problem log
SunOS syslog
SunOS vold log
TXSeries CICS Console/CSMT log
z/OS Component trace
z/OS GTF trace
z/OS Joblog
z/OS Logrec
z/OS System log (SYSLOG)
z/OS System trace
z/OS master trace
Log Correlation – Generating the End-to-End View
Transition from trying to understand log formats to identifying ways to analyze the overall data and the end-to-end view
Move the Mindset from Monitoring to Analysis
With Correlation IDs in place, or Correlation methods identified:
Implement a Correlation Engine in the Log Analyzer
Generate a sequence diagram showing the log interactions and sequence of events
Help the IT staff home in on where the problem occurred:
Identify quickly where to concentrate efforts
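The correlation step described on this slide can be sketched in a few lines, assuming every event already carries a correlation ID. The field name `correlationId`, the flat event dictionaries, the severity threshold, and the sample data are all hypothetical. Timestamps are assumed to be ISO 8601 strings in the same time zone, so that sorting them as text gives chronological order. This is an illustration, not the Log Analyzer's correlation engine.

```python
from collections import defaultdict


def correlate(events):
    """Group events by correlation ID and order each group by creation time."""
    groups = defaultdict(list)
    for event in events:
        groups[event["correlationId"]].append(event)
    return {
        cid: sorted(group, key=lambda e: e["creationTime"])
        for cid, group in groups.items()
    }


def sequence_lines(ordered_events):
    """Render hops between consecutive components as text sequence-diagram lines."""
    return [
        f'{prev["component"]} -> {cur["component"]}: {cur["msg"]}'
        for prev, cur in zip(ordered_events, ordered_events[1:])
    ]


def first_failure(ordered_events, threshold=50):
    """Return the earliest event at or above the (assumed) severity threshold."""
    for event in ordered_events:
        if event["severity"] >= threshold:
            return event
    return None


events = [
    {"correlationId": "req-42", "creationTime": "2005-04-27T10:15:32.120",
     "component": "Application Server", "severity": 10, "msg": "servlet invoked"},
    {"correlationId": "req-42", "creationTime": "2005-04-27T10:15:32.050",
     "component": "HTTP Server", "severity": 10, "msg": "GET /cart"},
    {"correlationId": "req-42", "creationTime": "2005-04-27T10:15:32.480",
     "component": "Database", "severity": 50, "msg": "connection pool exhausted"},
    {"correlationId": "req-43", "creationTime": "2005-04-27T10:15:33.000",
     "component": "HTTP Server", "severity": 10, "msg": "GET /home"},
]

correlated = correlate(events)
for line in sequence_lines(correlated["req-42"]):
    print(line)
culprit = first_failure(correlated["req-42"])
print("Concentrate efforts on:", culprit["component"])
```

Ordering the correlated events end to end is what turns separate per-product logs into a single view of one request, so the first severe event points to where the analysis should start.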
End Results…
From
Multiple IT-Skilled Resources
Multiple Man-Hours / Days / Weeks of analysis
Unstructured Swat Team Approach with success unknown
To
Single PD-Skilled Resource
Root Cause identification in hours / minutes
Repeatable Process with a reusable set of tools
Self-Healing - Customer Results
From several hours/days to less than one hour
85% Improvement
70% Improvement
50% Improvement
10 to 30% Savings in IT Support Costs
50% Improvement – IBM’s SAP Deployment
60% Improvement
60% Improvement
20 to 30% Improvement
10 to 20% improvement in operational staff productivity – IBM Software Delivery and Fulfillment
From 3 people 2 hours to 1 person 15 min
40% Improvement
75% Improvement
New in 2005
Self-Healing Roadmap
Event Representation
Adapters
IBM Deployers
Knowledge Representation
Event Correlation and Analysis
Partner Deployers
Action Representation
Knowledge Accumulation
Customer Pull
Capture
Remediation
Business Policy
Continuous Availability
Knowledge Sharing
Self Healing
Analysis
Standard data model for common situation and event reporting
Tooling for easy adoption of standard
Commitments from IBM brands and IBM Partners to support the data model
Standardize data model for symptom analysis
Transport & correlate events from all components in IT infrastructure
Predictive Analysis Constructs
ARM Correlation
Standardize data model for change requests, change plans
Standardize grammar to describe change requests and constraints
Allow analysis and planning when uncertainty is present
Allow human to determine recovery action
High-profile customer deployments and references
Business policies guide self-healing system
Preemptive diagnostics automatically recognize and resolve problems
Call home facilities are integrated as part of self-healing solutions
Symptom data made available to customers, ISVs, partners
2004
2004-2005
2007
2006
Self-Healing Vision
[Diagram labels]
WinSS
AIX, DB2, MQ
zOS, DB2, MQ
Call Home
MA, EP (label repeated four times)
Increased Embedded Self-Management Function
IT Professionals
Tooling
Symptom
Policy
Config
CBEs
Human-based MAs and associated tooling for correlation, analysis, viewing
Adapter
Monitor, Analyze, Plan, Execute
Knowledge
Symptom
Change Type
CBE
Action
Change Plan
Sensor, Effector
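The Monitor, Analyze, Plan, Execute, Knowledge, Sensor and Effector labels in this diagram describe a control loop, which the toy sketch below walks through once. The managed resource, its queue and worker numbers, the `queue_backlog` symptom, and the `add_workers` action are all invented for illustration. Nothing here reflects an IBM product interface.

```python
class ManagedResource:
    """Toy managed resource exposing a sensor and an effector."""

    def __init__(self):
        self.queue_depth = 120
        self.workers = 2

    def sensor(self):
        return {"queue_depth": self.queue_depth, "workers": self.workers}

    def effector(self, action):
        if action["type"] == "add_workers":
            self.workers += action["count"]
            # Assumed effect: each extra worker drains 50 queued items.
            self.queue_depth = max(0, self.queue_depth - 50 * action["count"])


# Knowledge: a symptom definition and the action associated with it.
KNOWLEDGE = {
    "queue_backlog": {
        "threshold": 100,
        "action": {"type": "add_workers", "count": 1},
    }
}


def monitor(resource):
    return resource.sensor()


def analyze(reading, knowledge):
    rule = knowledge["queue_backlog"]
    return "queue_backlog" if reading["queue_depth"] > rule["threshold"] else None


def plan(symptom, knowledge):
    return [knowledge[symptom]["action"]]


def execute(resource, change_plan):
    for action in change_plan:
        resource.effector(action)


def control_loop(resource, knowledge, max_iterations=5):
    """Run monitor -> analyze -> plan -> execute until no symptom remains."""
    history = []
    for _ in range(max_iterations):
        symptom = analyze(monitor(resource), knowledge)
        if symptom is None:
            break
        change_plan = plan(symptom, knowledge)
        execute(resource, change_plan)
        history.append((symptom, change_plan))
    return history


resource = ManagedResource()
history = control_loop(resource, KNOWLEDGE)
print(len(history), resource.sensor())
```

Keeping the symptom definitions and actions in a separate knowledge structure, rather than in the loop itself, mirrors the diagram's split between the loop stages and the shared Knowledge they consult.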
Summary
IBM’s Autonomic Computing initiative has helped deliver the right “hygiene” to enable the industry for better Problem Determination
Predictive technologies can capitalize on this hygiene to help automate the “Problem Determination” process
We need continued research and cooperation across IBM and the industry at large to make the vision of Self-Healing systems a reality!