AAA NCNU Information Technology: IT Operations. Problem Management. Jim Heronime, Manager, ITSM Program; Tanya Friehauf-Dungca, Manager, Problem Management. 2/17/11

Dec 16, 2015


Problem Management

Jim Heronime, Manager, ITSM Program

Tanya Friehauf-Dungca, Manager, Problem Management

2/17/11



Agenda

PM Overview

History

Vision & Mission

Operational Level Agreement (OLA)

Action Items

Trending (Proactive Problem Management)

Facilitated Meetings (MIR & ToE)

KPIs and Metrics

Future Initiatives

Questions?

Problem Management Team Members


Problem Management Overview

Main goal of Problem Management:

– Detect the underlying causes of incidents, then resolve those causes and prevent the incidents from recurring

Problem Management ensures:

– The identification and classification of problems, root cause analysis, and resolution of problems

Problem Management process also includes:

– The formulation of recommendations for improvement, maintenance of problem records, and review of the status of corrective actions


History of PM at AAA

Began our formal Problem Management practice in 2008.

– Track major incidents

– Identify root cause for major incidents

– Rudimentary MS Access database to store information

Began formal implementation of ITSM in June 2009

– Root cause found, on average, for 55.4% of problems

– Mean time to close problems = 6 days

Implemented current iteration of Problem Management in October 2009. By January 2010:

– Root cause found, on average, for 83% of problems

– Mean time to close problems = 3 days

We continue to mature our process


Vision and Mission

VISION:

– To permanently eliminate problems in our production environment and prevent new problems from occurring

MISSION:

– To aggressively identify root cause of problems and drive permanent solutions to stabilize our IT infrastructure

We do this by:

– PROCESSES: Ensuring PM processes and procedures are followed by IT support teams

– ACTION ITEMS: Managing assigned action items and their timeframes with support teams to drive permanent solutions

– ROOT CAUSE: Driving root cause identification within OLA timeframes


OLAs for PM

Be aggressive: 3 business days to identify root cause

– Report enables us to track daily progress
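As a rough illustration of how the 3-business-day OLA could be checked, the following Python sketch computes a deadline and a compliance percentage. It assumes the clock starts on the business day after the problem is opened and that holidays are ignored; the slides state neither. The function names are hypothetical and are not taken from Maximo or any AAA report.

```python
from datetime import date, timedelta
from typing import Iterable, Optional, Tuple

OLA_BUSINESS_DAYS = 3  # from the slide: 3 business days to identify root cause


def add_business_days(start: date, days: int) -> date:
    """Return the date `days` business days after `start` (Mon-Fri, holidays ignored)."""
    current = start
    remaining = days
    while remaining > 0:
        current += timedelta(days=1)
        if current.weekday() < 5:  # 0-4 are Monday through Friday
            remaining -= 1
    return current


def root_cause_within_ola(opened: date, root_cause_found: Optional[date]) -> bool:
    """True if a root cause was recorded on or before the assumed OLA deadline."""
    if root_cause_found is None:
        return False
    return root_cause_found <= add_business_days(opened, OLA_BUSINESS_DAYS)


def ola_compliance_pct(problems: Iterable[Tuple[date, Optional[date]]]) -> float:
    """Percentage of (opened, root_cause_found) pairs that met the OLA."""
    problems = list(problems)
    if not problems:
        return 0.0
    met = sum(root_cause_within_ola(o, f) for o, f in problems)
    return 100.0 * met / len(problems)


# Illustrative data only, not real AAA problem records.
sample = [
    (date(2011, 2, 17), date(2011, 2, 22)),  # Thursday open, found on the deadline
    (date(2011, 2, 17), date(2011, 2, 23)),  # one day late
    (date(2011, 2, 14), date(2011, 2, 16)),  # found early
    (date(2011, 2, 14), None),               # root cause still unknown
]
print(ola_compliance_pct(sample))
```

A daily progress report could run the same check against every open problem and list those whose deadline falls today or has passed.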


Action Items

Objective:

– Action items are identified and assigned to drive permanent solutions

Types of Action Items:

– Root cause identification for every problem created from an incident

– Areas of improvement

• Documentation

• Process improvement & training

• Vendor management

• Hardware replacement

How are Action Items identified?

– Incident management activities

– Problem management activities – Root Cause Analysis

– Meetings: Daily IT Operations Meeting, Major Incident Review (MIR), or Team of Experts (ToE)

How are they tracked?

– Maximo – integrated system with Change, Incident, and Asset


Trend Analysis (Proactive Problem Management)

Objective:

– Analyze related incidents for common root causes

Collaboration with Operations Bridge:

– Weekly work sessions to identify potential areas of concern

– The Problem Management team reviews related incidents to look for common symptoms, causes, or conditions

Commonalities identified by trend analysis?

– A Global Problem record is created and assigned to the Service Owner with appropriately assigned action items

Service Owner analysis:

– The Service Owner prioritizes their efforts

– Decide how to identify root cause

– Prioritize and obtain business approval for funding and scheduling
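The review of related incidents for common symptoms, causes, or conditions can be sketched as a simple grouping step. This is a minimal Python illustration under stated assumptions: the field names (`id`, `service`, `symptom`) and the threshold of three incidents are hypothetical, and the slides do not describe how the team actually detects commonalities.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def find_trend_candidates(
    incidents: Iterable[dict], min_count: int = 3
) -> Dict[Tuple[str, str], List[str]]:
    """Group incidents by (service, symptom) and keep groups with at least `min_count` members.

    Each surviving group is a candidate for a Global Problem record.
    """
    groups: Dict[Tuple[str, str], List[str]] = defaultdict(list)
    for incident in incidents:
        key = (incident["service"], incident["symptom"])
        groups[key].append(incident["id"])
    return {key: ids for key, ids in groups.items() if len(ids) >= min_count}


# Illustrative data only.
incidents = [
    {"id": "INC-1", "service": "Email", "symptom": "login timeout"},
    {"id": "INC-2", "service": "Email", "symptom": "login timeout"},
    {"id": "INC-3", "service": "Claims", "symptom": "slow response"},
    {"id": "INC-4", "service": "Email", "symptom": "login timeout"},
]
candidates = find_trend_candidates(incidents)
print(candidates)
```

Grouping on an exact (service, symptom) pair is the simplest possible choice; a real review would likely rely on analyst judgment as well, since related incidents are rarely labeled identically.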


Major Incident Review (MIR)

What is it?

– Evaluation of the incident process after a major incident

What’s its purpose?

– Validate details of the incident record

– Review incident handling – identify opportunities

– Identify lessons learned – share across the enterprise

– Identify action items

When is one required?

– Mandated for all Severity 1 incidents

– Lower severities by request or as needed

Why does Problem Management facilitate a Major Incident Review?

– Unbiased view of events – no call involvement


MIR Agenda


MIR Template


Team of Experts (ToE)

What is it?

– A special team of technical subject matter experts (SMEs) assembled to analyze and resolve critical problems at an accelerated pace to minimize or eliminate exposure.

How long has this process been in place?

– This is one of our newest additions – since December 2010

Why are ToEs initiated?

– Teams not collaboratively engaging each other

– Need to identify root cause immediately – back-to-back incidents

– Leadership’s request for information and status of critical or chronic problems


ToE (cont.)

ToE Activities

– Root cause analysis

– Brainstorm solutions and permanent fixes

– Assign action items and due dates

Where’s the template?

– Currently under construction


KPIs and Metrics

KPIs

– Root cause identified within OLA

– MIRs conducted for Sev1 Incidents

Operational Metrics

– Total Problems by Severity

– Problems by Causing Party

– Outages by Domain (Applications, Network, Security, Servers, Telecom or Other)
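The operational metrics listed above are counts over problem records, which can be sketched as follows. This is an illustration only: the record fields (`severity`, `causing_party`, `domain`) are hypothetical names, and mapping any unrecognized domain to "Other" is an assumption based on the domain list on the slide.

```python
from collections import Counter
from typing import Dict, Iterable

# Domain names taken from the slide; anything else is counted as "Other".
KNOWN_DOMAINS = {"Applications", "Network", "Security", "Servers", "Telecom"}


def summarize_problems(problems: Iterable[dict]) -> Dict[str, Counter]:
    """Count problems by severity, by causing party, and by domain."""
    problems = list(problems)
    return {
        "by_severity": Counter(p["severity"] for p in problems),
        "by_causing_party": Counter(p["causing_party"] for p in problems),
        "by_domain": Counter(
            p["domain"] if p["domain"] in KNOWN_DOMAINS else "Other"
            for p in problems
        ),
    }


# Illustrative data only.
problems = [
    {"severity": 1, "causing_party": "Vendor", "domain": "Network"},
    {"severity": 2, "causing_party": "Internal", "domain": "Applications"},
    {"severity": 1, "causing_party": "Internal", "domain": "Mainframe"},
]
summary = summarize_problems(problems)
print(summary["by_severity"])
print(summary["by_domain"])
```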


KPIs

*Baseline determined by internal historical data = 82%

*Industry standards non-existent


KPI Details

*2010 Average for RC Identified within OLA = 85.7%


Examples of Metrics

[Charts not reproduced in the transcript. Surviving labels: *Change Freeze; AT&T; AAA NCNU]


Future Initiatives

Workarounds and defects – Known Error Database

Action item validation – quality check on completed actions

ToE template development


Questions?

PROBLEM MANAGEMENT TEAM MEMBERS

– Mark Hernandez – IT Service Transition Analyst V

– Gessica Briggs-Sullivan – IT Service Transition Analyst III

– Andrew Egan – Intern