Jul 10, 2020

Problem Management

Procedure document – V4.0

Purpose

The purpose of this document is to set out the procedures required to execute, manage and govern the DoE ITD Problem Management Process.

Audience & applicability

The intended audience is those stakeholders throughout DoE who have a role to play in this process, either directly or indirectly as dependent stakeholders, as well as those stakeholders who are interested in the process for execution and governance reasons.

This procedure document inherits the definitions contained in the SMO Glossary for Managing Services.

Context

This document provides a framework that underpins the execution and governance of this process. It is part of the definitive reference material for this process.

Document Approval List

Version | Approver name | Role | How approved | Date
V2.0 | IT LEADERSHIP GROUP | Approver | Leadership Forum | 16/06/2016
V3.0 | IT Executive Team | Governance authority | Endorsed at Ops Meeting | 22/06/2016
V4.0 | IT Executive Team | Governance authority | Pending | 22/06/2016

Document Change Control

Version | Date | Authors | Description of changes
V1.1 | 10/06/2016 | Jehad Batta | Inclusion of Problem Closure procedure for all problems monitored by PRB and update of references from ‘SMO’ to ‘ITD SMO’.
V2.1 | 28/04/2016 | Jehad Batta | Update to governance report section.
V2.2 | 20/06/2017 | Jehad Batta | Update to Problem Owner responsibilities ensuring attendance at the weekly problem operations meeting, inclusion of a guiding principle where a service owner does not exist, and a detailed problem management escalation.
V3 | 22/05/2018 | Jehad Batta | Annual review, including an update to the process so that a problem closure is only required for problems being governed by PRB.
V4.0 | 25/06/2019 | John Crowe | Added revised Impact, Urgency and Priority matrix from work with SDMs on Major Incident Management. Updated some flow directions in flowcharts. Minor changes for grammar and readability.

Document Information

Details | Name | Email
Process Owner | John Crowe | [email protected]
Location | Technology Intranet – Service Management Guides |

NSW DoE | IT CESD | Procedure – Problem Management 1 | P a g e

Table of contents

1. Overview and purpose
    1.1. Procedure overview
    1.2. Procedures overview
    1.3. Key process interfaces
2. Key principles
    2.1. Problem prioritisation
    2.2. Problem investigation lifecycle
    2.3. Identification and classification
    2.4. Review
    2.5. Investigation and Diagnosis
    2.6. Resolution and Recovery
    2.7. Closed
    2.8. Cancelled
3. Problem management process roles
    3.1. Problem management process owner
    3.2. Problem coordinator
    3.3. Problem owner
    3.4. Specialist
    3.5. Service desk
    3.6. Vendor/service partners
    3.7. Service Delivery Manager
    3.8. Problem Review Board
    3.9. RACI
4. Problem management workflow
    4.1. Problem closure review (meets problem review board criteria)
    4.2. Problem management activity descriptions
    4.3. Problem closure review activity description
5. Process critical success factors and KPIs
    5.1. Reporting
6. Escalation Management
7. Problem review board
    7.1. Problem Review Board Roles
Appendix A: Root cause analysis techniques
    RCA techniques
Appendix B: Problem closure report
Appendix C: Kepner and Fourie Root Cause Analysis Templates

1. Overview and purpose

1.1. Procedure overview

This procedure document sets out how the DoE Problem Management Process will operate. It describes where, by whom and when activities will be managed through the process lifecycle. Detailed work instructions describing how to execute this process are available within the ITD SMO portal and the Remedy knowledge base.

1.2. Procedures overview

The Problem Management process is a significant component of the ITIL Service Management framework, within the Service Operation phase.

The Problem Management process supports the ITD adoption of, and standardisation on, the industry best practice ITIL process framework. This framework provides the standard foundations needed to manage the delivery of services from a Service Provider to its customers.

The process consists of the following five main procedures, each of which depends on other external processes (through defined interfaces) to progress requests through to closure.

[Figure: Problem Management Process Overview. Problem lifecycle: Start, Identification and Classification, Problem Review, Investigation and Diagnosis, Resolution and Recovery, Problem Closure, End. Interfaces: Incident Management, Configuration Management, Change Management, Release Management, Continual Service Improvement, Risk Register. Also shown: Reporting, Metrics, Feedback.]

1.2.1. Identification and Classification

The Problem Management Process may be triggered from various sources. When a significant or recurring incident or an anticipated problem exists, the Problem Coordinator responsible for the particular service assesses the situation and either determines that a Problem exists or performs further investigation. This involves classifying the affected service/product name and assessing the priority.

1.2.2. Review

This procedure is used to control and reduce the duplication of problem investigations and to determine the resources required to investigate.

1.2.3. Investigation and Diagnosis

The investigation and diagnosis procedure uses the Kepner and Fourie root cause analysis methodology (CauseWise). The aim is to standardise the way each team approaches problem investigation, which should increase the speed and accuracy of root cause identification.

1.2.4. Resolution and Recovery

This procedure is used to identify a preferred solution, which is then applied using the ITD Change Management process.

1.2.5. Closure

When any change has been completed (and successfully reviewed) and the resolution has been applied, the Problem Record should be formally closed, as should any related Incident Records that are still open.

1.3. Key process interfaces

The main interfaces between Problem Management and other ITD Service Management processes are:

[Figure: key process interfaces. Process steps: Problem Identification, Investigation & Diagnosis, Known Error Management, Resolution, Problem Closure. Interfaces shown: Post Incident Review identifies actions for Problem Mgmt; Event Mgmt identifies a potential Problem; Availability, Capacity, ITSC Mgmt monitoring; Service Level and Supplier issues identified requiring Problem Mgmt; Release Mgmt identifies Known Errors that will be introduced via a release; trends identified from Incident and Service Request Mgmt; Configuration Mgmt provides details of CI impacted; Project/Proposal Mgmt assists in solution assessment; work-around input to Knowledge Management; Change required to resolve problem; CI updated to reflect Change; Knowledge article updated or removed; Advise Service Desk of problem eradication; Continual Service Improvement; Risk Register capture. Legend: Process Step, Process Interface Input, Process Interface Output, Other Process relationship.]

1.3.1. Incident Management

Inputs

Information needed from the Incident Management process by Problem Management includes:

• Incident Data
• Critical Incidents
• Trend and Statistical data
• Problems Identified

Outputs

Information needed by the Incident Management process from Problem Management includes:

• Resolution information
• Workaround information
• Known Error Database articles
• Reports on the status of Problems and Known Errors in progress

1.3.2. Change Management

Inputs

Information needed from the Change Management process by Problem Management includes:

• Results from Requests for Change (RFCs) submitted for approval
• RFC Rejections

Outputs

Information needed by the Change Management process from Problem Management includes:

• RFCs
• Known Error Database data to assist with change decisions

1.3.3. Configuration Management

Inputs

Information needed from the Configuration Management process by Problem Management includes:

• CI data
• Relationships between CIs

Outputs

Information needed by the Configuration Management process from Problem Management includes:

• Trend and statistical data on CIs (CIs that are Known Errors)
• Known Error Database articles that relate to Configuration Management

2. Key principles

2.1. Problem prioritisation

A problem’s priority is defined using the same prioritisation method as ITD Incident Management, i.e. it is based on two key factors: Impact and Urgency. The priority determines the order in which the support organisation responds to reported Problems (the highest priority Problems are responded to first).

Note: A problem that affects the SALM School service and impacts any of the business functions Student Wellbeing, Financial Reporting or Student Attendance requires a priority of High at a minimum.

Impact is defined as the effect of the Problem on the productivity of the service client’s function. The clients of a service estimate business impact in relation to non-delivery of the affected business process.

Impact and urgency are combined to assign a priority to a Problem. The following table defines which combination equals which priority level.
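
The combination of Impact and Urgency can be pictured as a table lookup. The priority matrix itself is not reproduced in this text, so the sketch below takes the matrix as an input. The level names, the sample matrix values and the function name `assign_priority` are illustrative assumptions and not the ITD matrix. Only the rule that the named SALM School business functions require a priority of High at a minimum comes from the note above.

```python
from enum import IntEnum


class Priority(IntEnum):
    # Ordered so that a larger value means a more important Problem.
    LOW = 1
    MEDIUM = 2
    HIGH = 3
    CRITICAL = 4


# Business functions named in the note on the SALM School service.
SALM_PROTECTED_FUNCTIONS = {
    "Student Wellbeing",
    "Financial Reporting",
    "Student Attendance",
}


def assign_priority(matrix, impact, urgency, service=None, business_function=None):
    """Look up the priority for an impact/urgency pair in the supplied matrix."""
    priority = matrix[(impact, urgency)]
    # SALM School rule: the protected functions get High at a minimum.
    if service == "SALM School" and business_function in SALM_PROTECTED_FUNCTIONS:
        priority = max(priority, Priority.HIGH)
    return priority


# Placeholder values only (assumption); substitute the approved ITD matrix.
EXAMPLE_MATRIX = {
    ("High", "High"): Priority.CRITICAL,
    ("High", "Low"): Priority.MEDIUM,
    ("Low", "High"): Priority.MEDIUM,
    ("Low", "Low"): Priority.LOW,
}
```

Passing the matrix in as data keeps the lookup independent of any particular version of the table, so a revised matrix can be swapped in without changing the logic.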

2.2. Problem investigation lifecycle

The following table and lifecycle flowchart describe the life cycle of a Problem record in the Remedy Problem Management system.

Workflow stage | Status | Description
Identification and Classification | Draft | A potential Problem has been triggered from one of the many triggers and a Problem investigation is raised.
Review | Under Review | The Problem Investigation has been raised and is awaiting approval for further investigation.
Review | Assigned | The ticket has been approved and assigned to a Specialist for further investigation.
Investigation and Diagnosis | Under Investigation | The Specialist is actively performing further analysis, such as root cause analysis and solution identification.
Resolution and Recovery | Completed | The primary and/or technical cause has been identified and the preferred resolution (or none) has been defined and accepted by the Problem Coordinator.
Closed | Closed | The preferred problem resolution has been defined, implemented and communicated.
Closed | Cancelled | The Problem is no longer being investigated and will not be reopened.
Various | Pending | At any workflow stage other than Cancelled or Closed, a Problem ticket may be moved to a status of Pending. The Pending status indicates that the Problem is not actively being investigated at the time and is effectively on hold. A ticket with a status of “Pending” must be moved back to its previous status before the ticket can be progressed.
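
The status rules in the table can be sketched as a small state machine. This is a minimal illustration under stated assumptions, not the Remedy workflow: the class `ProblemTicket` and its method names are hypothetical, the forward order is taken from the order of the rows above, and moves back to an earlier status are left out for brevity.

```python
from enum import Enum


class Status(Enum):
    DRAFT = "Draft"
    UNDER_REVIEW = "Under Review"
    ASSIGNED = "Assigned"
    UNDER_INVESTIGATION = "Under Investigation"
    COMPLETED = "Completed"
    CLOSED = "Closed"
    CANCELLED = "Cancelled"
    PENDING = "Pending"


# Forward order of the active statuses, following the rows of the table.
FORWARD = [
    Status.DRAFT,
    Status.UNDER_REVIEW,
    Status.ASSIGNED,
    Status.UNDER_INVESTIGATION,
    Status.COMPLETED,
    Status.CLOSED,
]
TERMINAL = {Status.CLOSED, Status.CANCELLED}


class ProblemTicket:
    def __init__(self):
        self.status = Status.DRAFT
        self._held_from = None  # status to restore when leaving Pending

    def advance(self):
        """Move to the next status; a Pending ticket must be resumed first."""
        if self.status is Status.PENDING:
            raise ValueError("resume the ticket before progressing it")
        if self.status in TERMINAL:
            raise ValueError("ticket is already closed or cancelled")
        self.status = FORWARD[FORWARD.index(self.status) + 1]

    def hold(self):
        """Place an active ticket on hold, remembering where it came from."""
        if self.status in TERMINAL or self.status is Status.PENDING:
            raise ValueError("only an active ticket can be placed on hold")
        self._held_from, self.status = self.status, Status.PENDING

    def resume(self):
        """Return a Pending ticket to the status it held before the hold."""
        if self.status is not Status.PENDING:
            raise ValueError("ticket is not pending")
        self.status, self._held_from = self._held_from, None

    def cancel(self):
        """Cancel the investigation; a cancelled ticket is not reopened."""
        if self.status in TERMINAL:
            raise ValueError("ticket is already closed or cancelled")
        self.status = Status.CANCELLED
```

Storing the previous status at the moment of the hold is what enforces the rule that a Pending ticket returns to its previous status before it can progress.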

2.2.1. Known error record lifecycle

The following table describes the life cycle of a Known Error record:

Status | Description
Assigned | The responsibility for the Known Error has been assigned to a specific group or change coordinator.
Scheduled for Correction | The implementation of the structural solution has been scheduled.
Assigned to Vendor | The Known Error has been assigned to a vendor that has been asked to implement the structural solution.
No Action Planned | There are currently no concrete plans for the implementation of the structural solution.
Corrected | The structural solution has been implemented successfully.
Closed | It has been confirmed that the implemented structural solution has fixed the Problem.
Cancelled | The implementation of the structural solution has been cancelled because of the reason specified in the Status Reason field.

2.3. Identification and classification

The Problem Management triggers are outlined below. The Problem Management Process may be triggered from various sources. One such source is a Post Incident Report raised after the restoration of a Critical/Major Incident (where the criteria have been met). When a significant or recurring incident or an anticipated problem exists, the Problem Coordinator responsible for the particular service assesses the situation and either determines that a Problem exists or performs further investigation.

A potential problem is identified by analysing:

• High Impact Incidents: isolated incidents which adversely affect one or more critical business services, typically managed as a Major Incident
• Recurring Incidents: incidents that recur or have affected multiple people over time, identified through analysis of previous Incidents
• Non-Routine Incidents: potential problems identified pre-emptively before being reported as incidents, such as through service testing prior to releasing a new or changed service
• Other: any identified or potential problems that do not fit the other categories or that require further assessment before categorisation

The Problem Coordinator creates the Problem Record with as much detail as is currently known, including selecting the affected product and entering a summary of the problem statement. Any actions the Problem Coordinator takes to improve the understanding of the problem should be entered as a work detail entry to ensure all relevant information is retained with the ticket. The Problem Coordinator must also create links to other ticket types, such as Incident or Change records.

The Problem Coordinator must select a target date for resolution of the Problem ticket. This is the anticipated date by which the ticket will be considered complete, with the problem being sufficiently addressed, with or without the problem or root cause being removed.

The Problem Coordinator then assigns the Problem ticket to the Specialist (Assignee) Group for further investigation. Unless previously agreed, the Problem Coordinator should assign the ticket to a group rather than an individual.
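
The information the Problem Coordinator records at this stage can be sketched as a simple data structure. This is an illustration only and does not reflect the Remedy data model: the class `ProblemRecord`, its field names and the sample values are all hypothetical.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List, Optional, Tuple


@dataclass
class ProblemRecord:
    product: str                           # affected product
    summary: str                           # summary of the problem statement
    target_date: Optional[date] = None     # anticipated completion date
    assignee_group: Optional[str] = None   # Specialist (Assignee) Group
    work_details: List[str] = field(default_factory=list)
    related_records: List[Tuple[str, str]] = field(default_factory=list)

    def add_work_detail(self, note: str) -> None:
        """Keep each action taken with the ticket as a work detail entry."""
        self.work_details.append(note)

    def link(self, record_type: str, record_id: str) -> None:
        """Link another ticket type, such as an Incident or Change record."""
        self.related_records.append((record_type, record_id))

    def assign_to_group(self, group: str) -> None:
        """Assign to a group; a target date must have been selected first."""
        if self.target_date is None:
            raise ValueError("select a target date before assigning the ticket")
        self.assignee_group = group


# Sample values below are made up for illustration.
record = ProblemRecord(product="Example Service", summary="Recurring login failures")
record.add_work_detail("Reviewed the last month of related incidents")
record.link("Incident", "INC-EXAMPLE-1")
record.target_date = date(2019, 8, 1)
record.assign_to_group("Example Specialist Group")
```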

Triggers, with description and examples

Trigger: Reactive

Description: Responding to single or multiple events that have occurred, where the root cause is unknown. The provision of a permanent solution and the mitigation of a repeat failure.

Examples:

• After the restoration of a critical incident or major incident which has been classified as either ‘High’ or ‘Critical’, if the root cause is unknown a Problem Management record must be created to document the Incident investigation steps and resolution outcomes or, in cases where a workaround has been applied, to identify and document the ‘true’ root cause and Permanent Corrective Actions;
• A suspicion or detection of service disruptions which have resulted in one or more incidents and where the cause of the incidents is unknown;
• A notification from a supplier, contractor or event monitoring tool that a disruption to services exists which should be a candidate for Problem Management.

Trigger: Proactive

Description:

• Responding to multiple low priority events that have occurred, or responding to trend indications from any source of data that highlight a possible problem. The provision of permanent solutions and mitigation against repeat failures.
• Preventing incidents from occurring, or responding to known deficiencies/errors in the environment.

Examples:

• Analysis of an incident by a technical support group or subject matter expert which reveals that an underlying problem exists, or is likely to exist;
• Extending the fix by applying Permanent Corrective Actions to other processes, technologies or people to prevent service disruptions of a similar or the same nature;
• Trending of historical incident records to identify one or more underlying causes that, if mitigated, can prevent future disruptions to a given service or Configuration Item;
• Customer experience (CX) activities taken to improve the quality of services and/or functions of applications which are deemed significant enough to be candidates for Problem Management.

2.4. Review

When assigned to a Specialist Group, the Problem ticket moves to the Review workflow stage with a status of “Assigned”. The nominated person/people within that team must review the Problem ticket and assign a Specialist as the Assignee for that Problem. The ticket then moves to a status of “Under Investigation”.

Any actions the Specialist takes to improve the understanding of the problem should be entered as a work detail entry to ensure all relevant information is retained with the ticket. The Problem Coordinator should also create links to other ticket types, such as Incident or Change records.

During the investigation, a number of common activities may be performed. These include:

• Relate Incidents to the Problem as the problem is better understood and Incidents are assessed and analysed for commonalities and relationships
• Assign the Problem ticket to an initial Specialist or reassign it to a more appropriate Specialist
• Generate Tasks to assign work to individuals to assist in the investigation
• Relate CIs to the Problem as affected services are identified and confirmed

Alternatively, under certain circumstances the person/people nominated to review the queue may move the ticket directly to the Resolution stage of the workflow. In doing so, that person must still specify the Specialist and the reason that the Problem ticket has been closed. Additional information should be recorded as a work detail entry to ensure all relevant information is retained with the ticket.

2.5. Investigation and Diagnosis

The Specialist then works independently (for isolated issues) or with other Specialists to investigate the Root Cause, define a Known Error and Workaround, and develop a Solution. If the Solution requires a system change, the Specialist then initiates the Change process.

For complex Problem investigations, the Specialist may establish a project or group to formally investigate the problem with other Specialists from multiple support groups. The assigned Specialist is responsible for managing the investigation project as the “lead” Specialist (still nominated as the Assignee for the purpose of the Problem ticket). During the investigation, a more appropriate “lead” Specialist from the same or a different Specialist support group may be identified and agreed, and the ticket updated to reflect this.

If a vendor’s support is required for investigation or resolution, the Specialist contacts the Vendor to log a call and records the Vendor’s reference as the Ticket Number.

The Problem Coordinator is responsible for:

• Investigation of the problem (root cause analysis)
• Engagement of ICT support teams
• End to end management and resolution of the Problem Investigation
• Creating a knowledge article with the workaround (when using KCS)

ICT support groups are responsible for:

• Undertaking detailed investigation of the problem
• Registering the cause as a known error
• Raising changes to resolve problems
• Updating Problem Investigations with details of findings

2.5.1. Known Error Control

When a Known Error is identified, the Specialist is required to raise a Known Error record and clearly state the defect, symptoms and workarounds (more than one Known Error can be related to a Problem).

During the development of the Known Error Solution, the Specialist is required to document and prepare the solution and have it reviewed and endorsed by the Problem Coordinator.

If a proposed solution requires a modification to a configuration item, the ITD Change Management process must be adhered to.

2.6. Resolution and Recovery

When the preferred solution has been identified and agreed, the Specialist moves the Problem ticket through the workflow to Resolution and Recovery.

The Problem is resolved as one of the following:

• A defined and approved Solution that prevents the Incident from recurring.
• An Enhancement Request, when a modification to a system is required to resolve the root cause. The Specialist creates a Service Request record to capture the required enhancement details.
• Unresolvable, in a situation where a solution cannot be implemented due to technical or financial implications.

Note: these are the default ‘reasons’ in the tool and may be reassessed.

Any actions the Specialist(s) and/or Vendor take to improve the understanding of the problem should be entered as a work detail entry to ensure all relevant information is retained with the ticket. The Problem Coordinator may also create links to other ticket types, such as Incident or Change records.

The Specialist ensures that a workaround and solution are recorded when they exist. These details are used by Support Analysts to provide ongoing support to customers prior to final resolutions being implemented.

If the root cause or a practical resolution cannot be identified, the Specialist may move the Problem Investigation to a status of “Pending”. The Specialist then monitors the Problem Investigation and may periodically review it for a root cause or resolution that may become available over time.

The Specialist then notifies the Problem Coordinator of the updated status of the Problem ticket.

For all Critical and High priority problems, the Problem Review Board will have oversight of proposed solutions before they proceed.

2.7. Closed

When the Problem record has been completed and verified, the workflow moves to Closed.

The Problem Coordinator reviews the Problem Record and verifies the status and any associated records (Known Error, Change, etc.).

The Problem Coordinator may reject the proposed solution, or the Change Request may be rejected through the CAB. When the Problem Investigation is not accepted, the Problem Coordinator may move the workflow back to Assigned and assign it to the same or an alternative Specialist group to continue investigation.

For all problems being governed by the Problem Review Board, it is mandatory that the Problem Coordinator completes a Problem Closure Report.
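
The closure conditions in this section can be sketched as a pre-closure check. This is a hedged illustration rather than tool behaviour: the names `ClosureCheck` and `closure_blockers` are hypothetical, and whether a problem is governed by the Problem Review Board is treated as an input because the qualifying criteria are not given in this section.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ClosureCheck:
    record_verified: bool            # Problem record completed and verified
    prb_governed: bool               # problem is governed by the Problem Review Board
    closure_report_completed: bool = False


def closure_blockers(check: ClosureCheck) -> List[str]:
    """Return the reasons a Problem record cannot yet move to Closed."""
    blockers = []
    if not check.record_verified:
        blockers.append("Problem record has not been completed and verified")
    if check.prb_governed and not check.closure_report_completed:
        blockers.append("Problem Closure Report is mandatory for PRB-governed problems")
    return blockers
```

An empty list means nothing in this sketch blocks the move to Closed. A problem that is not governed by the Problem Review Board needs only a completed and verified record.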

2.8. Cancelled

A Problem Investigation may also be cancelled at any stage, completing the workflow without implementing a solution.

The reasons for cancelling a Problem Investigation are:

• Duplicate Investigation, when another Problem Investigation is underway (or Pending) for the same Incident(s).

3. Problem management process roles

The following section provides an overview of the roles involved in the Problem Management process. They are categorised into:

• day-to-day operational management of the process
• end-to-end governance oversight across the problem lifecycle

3.1. Problem management process owner

The Process Owner is accountable for:

• End to end process ownership, ensuring the process is initially designed and implemented to match the Department’s needs and evolves as the business requirements change
• Ensuring the health of the problem management process is maintained and matured by reporting, monitoring, analysing and governing the process against metrics
• Ensuring the process matures after it is implemented and remains fit for purpose via the Continual Service Improvement Process
• Liaising with the Problem Owner and Coordinators to provide staff with Problem Management Process training and/or familiarisation
• Reviewing and tracking outstanding and unassigned Problem Investigation tickets to ensure raised problems are being managed within agreed targets
• Collecting and assessing process improvement ideas from staff and users
• Ensuring all associated Problem Management documentation (Standards, Procedure, work instructions, Remedy toolset) is maintained and circulated to all appropriate staff
• Liaising with other Process Owners to maintain and improve process relationships
• Chairing the Community of Practice, including the Problem operation review meetings

3.2. Problem coordinator

The Problem Coordinator is accountable for:

• Review of incidents to identify problems and raise problem investigations

• Attendance at Problem Governance group meetings when required, i.e. attendance at the Problem Review Board and/or Problem Operation review

• Determine the appropriate Specialists or vendors required in the problem investigation and diagnosis

• Search existing Problem Investigations and Known Errors to ensure the new problem has not been previously recorded

• Conduct regular problem analysis meetings and involve all relevant stakeholders

• Validate and approve proposed permanent Solutions and/or Workarounds

• Ensure updates are made throughout the problem management lifecycle and adhere to the update frequency if a problem is being tracked on the IT Executive dashboard

• Monitor and manage their Problem Queue within the Service Management tool

• Manage the workarounds and/or solutions by utilising the ITD Change Management process

• Notify stakeholders of planned solutions and set expectations for changes

• Identify any potential improvements to the problem management process and notify the process owner

• Champion the utilisation of the problem management process

• Review and approval of the Problem Resolution

• Follow the closure procedure prior to closing a problem record

3.3. Problem owner

The Problem Owner is accountable for:

• Ensuring a Problem is managed from end to end of the Problem lifecycle according to the defined process

• Attendance at Problem Governance group meetings when required, i.e. attendance at the Problem Review Board and/or Problem Operation review

• Ensure updates are made throughout the problem management lifecycle and adhere to the update frequency if a problem is being tracked on the IT Executive dashboard

• Approval of the Problem Closure Report for all Critical and High priority problems

• Notify stakeholders of planned solutions and set expectations for changes

• Ensure all Critical priority problems are periodically updated

3.4. Specialist

The Specialist is accountable for:

• Investigate and diagnose the root cause of all problems escalated to the team, utilising their subject matter expertise

• Ensure the problem has not previously been recorded as a known error

• Participation in problem diagnosis meetings

• Liaise with vendors if required to perform investigation or analysis work

• Work with other specialists and/or vendors to assess the problem and identify the known error and/or solution

• Create a known error record documenting the root cause

• Develop and document a workaround, where a workaround exists

• Notify the Problem Coordinator of any existing or potential issues impacting investigation or the eventual resolution of a known error

• Develop a solution implementation plan

• Adhere to the Change Management process when implementing workarounds or solutions

• Identify any potential changes to the problem management process and notify the process owner

3.5. Service desk

The Service Desk is accountable for:

• Identify potential problems and notify the Problem Coordinator

• Assess the potential problem to determine whether it is a problem

• Participate in the problem analysis meeting when required

• Identify any potential change to the problem management process and notify the process owner

• Ensure the suitable use of the Incident Management Parent and Child incidents

3.6. Vendor/service partners

The Vendor/Service Partner is accountable for:

• Notify relevant DET contact when a problem or potential problem is identified

• Assess problems that have been assigned to them, including details of the problem record

• Assist the assigned Specialist to analyse a problem or identify the root cause

• Develop a workaround, informing the Problem Coordinator of work performed by the vendor organisation

• Inform the Problem Coordinator of any work performed by their organisation in terms of the known error

• Identify a solution through corrective action or accepting the known error

• Develop a solution implementation plan if warranted

• Implement the authorised solution, verify the corrective action has worked, provide a back-out procedure in case the corrective action fails, and investigate and resolve any failure of the corrective action

• Identify any potential change to the problem management process and notify the process owner

3.7. Service Delivery Manager

The Service Delivery Manager is accountable for:

• Acting as Problem Coordinator for Problem Investigations that are the result of a Critical (Major Incident) Incident

• Work with the Problem Coordinator, Specialists, Vendor and Service Desk to ensure customer expectations are managed

• Participate in the problem analysis meeting to identify a proposed solution

• Notify the Problem Coordinator of any issues impacting or potentially impacting resolution of a Known Error

• Determine reporting requirements and identify any changes to the reports

• Identify any potential change to the problem management process and notify the process owner

3.8. Problem Review Board

The Problem Review Board is accountable for:

• Keeping informed of Critical or High severity problems and the risk to the Department

• Re-evaluating, amending and agreeing to the prioritisation (Critical and High), as well as resource allocation, of Problem and/or Known Error records and their related tasks

• Approval of solutions which could potentially impact commercial relationships and/or DoE reputation, as well as resource utilisation and service efficiency

• Approval to close and/or Resolve (when required) Problem and Known Error records

• Assessing and mitigating residual risk of closed problems

3.9. RACI

RACI Title Details

R – Responsible Role responsible for completing the task

A – Accountable Only one role can be accountable for each activity

C – Consult Roles that are consulted and whose opinions are sought

I – Inform Roles that are kept up to date on progress
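The rule that only one role can be accountable for each activity can be checked mechanically. The following is a minimal sketch, assuming each activity's assignments are held as a mapping of role name to RACI code; the role names and codes in the example are hypothetical and are not taken from the matrix in this document.

```python
from typing import Dict, List


def accountable_roles(assignments: Dict[str, str]) -> List[str]:
    """Return the roles whose RACI entry includes 'A' (combined entries such as 'A/R' count)."""
    return [role for role, codes in assignments.items() if "A" in codes.split("/")]


def has_single_accountable(assignments: Dict[str, str]) -> bool:
    """True when exactly one role is accountable for the activity."""
    return len(accountable_roles(assignments)) == 1


# Hypothetical assignments for one activity; names and codes are illustrative only.
example_activity = {
    "Role 1": "A/R",
    "Role 2": "R",
    "Role 3": "C",
}
print(has_single_accountable(example_activity))  # True
```

A check like this could be run over every row of a RACI matrix before the document is published, flagging activities with no accountable role or more than one.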

Columns: Activity, Problem management process manager, Problem owner / coordinator, Specialist, Service desk, Service manager, Vendor/service partners, Problem review board

Problem Management Process (planning, design, compliance, process improvement, training, communications) A/R

Identification and Classification A R C

Problem Review A/R R

Investigation and Diagnosis A/R R I R

Known Error Control A R R

Resolution and Recovery A/R R I R

Problem Closure A/R C

Reporting, Metrics, Feedback A/R C C C C C

Continual Service Improvement A/R C C C C C

4. Problem management workflow

[Workflow diagram: SMO: Problem Management Level 3. Swimlanes: User, Problem Specialist, Problem Coordinator, PRB. The diagram shows activities PM.101 to PM.128, which are described in section 4.2, together with the interfaces to Incident, Release, Change, SR, SACM and Supplier, and to Change Management.]

4.1. Problem closure review (meets problem review board criteria)

[Workflow diagram: SMO: Problem Closure Review (Meets PRB Criteria). Swimlanes: Problem Owner, Service Owner, Process Owner, Problem Review Board, IT Executive Team. The diagram shows activities PCR.101 to PCR.114b and a link to the Problem Closure Procedure.]

4.2. Problem management activity descriptions

This section of the document lists the specific activities referenced in the Problem Management workflow diagram and provides some detail on how these activities are performed.

PM. 101 Critical incident/post incident report

Purpose To ensure all declared Critical incidents are managed to resolution via the Problem Management process.

Trigger Completion of a Post Incident Report

Inputs A completed Post Incident Report

Description Raising a problem investigation to identify root cause and solution.

Outputs Proceed to activity PM.102

PM. 102 Problem identification and classification

Purpose Identify and classify the problem being raised, and help determine whether the problem investigation should proceed.

Trigger A raised problem record

Inputs Refer to the varied problem triggers

Description The identification and classification will be used to provide an accurate problem statement, the product affected and the prioritisation of the problem using impact and urgency.

Outputs Creation of a Problem Record

PM. 103 Approved for analysis?

Purpose To determine if the problem investigation is to proceed.

Trigger Raised problem record in an Under Review status.

Inputs A raised problem record

Description During the analysis, it is important to review the purpose of the problem investigation by establishing whether a problem has already been raised for the same issue and whether the raised problem record merits an investigation (for example, it may be an enhancement request).

Outputs If approved, go to PM.105

If rejected, go to PM.104

PM. 104 Provide feedback and cancel problem

Purpose Reduce the number of duplicated or unnecessary problem investigations.

Trigger A rejected problem investigation.

Inputs A raised problem record for review

Description Provide feedback by recording in the problem record why the decision was made, then cancel the problem investigation by selecting the appropriate status reason.

Outputs Cancel Problem record.

PM. 105 Validate categorisation and priority

Purpose Maintain accuracy and quality of Problem investigations raised.

Trigger An approved problem investigation.

Inputs Problem record

Description Once the Problem Co-ordinator has approved the problem record for analysis, the Problem Co-ordinator will review and validate the categorisation and prioritisation of the problem record.

Outputs If priority of Problem is either High or Critical, the problem is referred to the PRB.

If priority of Problem is Low or Medium, proceed to PM.108
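The routing rule in the Outputs of PM.105 can be expressed as a small lookup. This is a minimal sketch under stated assumptions: the function name is hypothetical, and mapping "referred to the PRB" to activity PM.106 (Meets PRB criteria) is an interpretation of the workflow rather than something the Outputs row states explicitly.

```python
def next_activity_after_validation(priority: str) -> str:
    """Return the activity that follows PM.105 for a given problem priority."""
    normalised = priority.strip().lower()
    if normalised in {"critical", "high"}:
        # Assumption: referral to the PRB corresponds to PM.106 (Meets PRB criteria).
        return "PM.106"
    if normalised in {"medium", "low"}:
        # Stated in the Outputs row: Low or Medium proceeds to PM.108.
        return "PM.108"
    raise ValueError(f"Unknown priority: {priority!r}")


print(next_activity_after_validation("High"))    # PM.106
print(next_activity_after_validation("Medium"))  # PM.108
```

Raising an error for an unrecognised priority, rather than defaulting to one path, keeps a mis-keyed priority from silently bypassing PRB review.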

PM. 106 Meets problem review board (PRB) criteria

Purpose Problem Review Board will review all problems with a priority of either Critical or High.

Trigger Problem record where it has been reviewed by the Problem Coordinator.

Inputs Problem record

Description Problem review board will have visibility of all Critical and High priority problems within the Department of Education environment.

Outputs PM.107

PM. 107 Conduct risk assessment and final prioritisation

Purpose Risk Assessment regarding the problem investigation, final prioritisation and any disputes of the Problem owner.

Trigger Problem investigation has met the PRB criteria

Inputs Problem record

Description Risk Assessment regarding the problem investigation, final prioritisation and any disputes of the Problem owner.

Outputs PM.108

PM. 108 Investigation and diagnosis (root cause analysis)

Purpose Utilise the Kepner and Fourie Root Cause Analysis methodology to identify technical cause.

Trigger Approved problem investigation

Inputs Problem record, Kepner and Fourie (CauseWise) templates

Description Kepner and Fourie root cause analysis methodology is the standard for all ITD staff and should

be used to identify root cause of problems.

Outputs PM.109

PM. 109 Is a workaround available?

Purpose Identify a suitable workaround that can be utilised by affected users/customers.

Trigger Once root cause analysis has been conducted.

Inputs Findings from root cause analysis.

Description Identify a suitable workaround that can be used by affected users and customers.

Outputs PM.110

PM. 110 Develop and test workaround

Purpose Develop and test the proposed workaround.

Trigger Once workaround has been identified

Inputs Findings from root cause analysis.

Description The Problem Specialist is to develop and test the workaround before making it available to users.

Outputs The workaround is prepared for peer review and acceptance by the Problem Coordinator. Proceed to PM.111

PM. 111 Validate and approve workaround

Purpose Peer review the workaround before promoting its use via problem record and/or Knowledge

Article.

Trigger Once root cause analysis has been conducted.

Inputs Findings from root cause analysis.

Description Problem Coordinator is to peer review the proposed workaround by testing and validating the

solution.

Outputs PM.113

PM. 112 Is a known error identified?

Purpose Ensure that, if a workaround does not exist, the root cause of the problem is identified by investigating and proposing a solution.

Trigger No workaround has been identified

Inputs Findings from root cause analysis.

Description Before proceeding to investigate a solution, it is best to identify a known error/cause.

Outputs PM.113

PM. 113 Create known error record

Purpose Creating a known error record illustrating the symptoms a user will experience, the cause and

workaround.

Trigger Cause of the problem is known

Inputs Cause of the problem is known

Description Create a known error record by entering all relevant information. The record stores knowledge of previous incidents and problems and how they were overcome, allowing quicker diagnosis and resolution if they recur.

Outputs PM.114

PM. 114 Investigate and propose solution

Purpose Investigation will be performed to identify if a permanent fix is available.

Trigger Root cause is known

Inputs Known Error record created

Description Investigate whether a solution is available to resolve the known error.

Outputs PM.115

PM. 115 Permanent fix available?

Purpose Identify and assess if a permanent fix is available.

Trigger Assessment if a permanent fix is available

Inputs Known Error record

Description Based on the technology and shelf life of the product, a problem/known error may remain open because the cost of resolving it may exceed the return on investment, or because the product is being decommissioned.

Outputs PM.116a or PM.117

PM. 116a Escalation required?

Purpose Confirming if the problem solver team have the required knowledge to identify a permanent solution.

Trigger No permanent fix available

Inputs Known Error record

Description If no permanent fix is available, it’s the responsibility of the problem coordinator to escalate the matter with the Problem Owner to arrange suitable resources to identify a permanent solution.

Outputs If no escalation is required, proceed to PM.114; if escalation is required, proceed to PM.116b

PM. 116b Escalation conducted

Purpose To mobilise additional subject matter experts from teams within ITD or external suppliers, to

provide expertise.

Trigger There are no resources available to provide a permanent solution.

Inputs Confirmation there are no available resources to provide a permanent solution

Description Escalation of problem to the relevant specialist group (includes vendor) in order to further

investigate the Problem

Outputs PM.114

PM. 117 Develop proposed solution

Purpose Problem specialist is to develop and propose the solution for review and approval by the problem

coordinator

Trigger Permanent fix is available

Inputs Proposed solution

Description Ensuring the solution is proposed and reviewed by the problem coordinator

Outputs PM.118

PM. 118 Review and approve proposed solution

Purpose Problem Coordinator is to review and validate the proposed solution and verify the solution will remediate and resolve the problem.

Trigger Proposed solution

Inputs Proposed solution is to be documented using the proposed solution template

Description Ensuring the solution is peer reviewed and approved for completion or if it meets the PRB criteria, it will need to be prepared for PRB review.

Outputs PM.106

PM. 119 Review proposed solution

Purpose Problem Review Board is to review and approve the proposed solution, for all critical priority

problems.

Trigger Problem coordinator approved solution for PRB review

Inputs Problem coordinator approved solution for PRB review

Description Ensuring the solution is reviewed and approved by the PRB

Outputs PM.120

PM. 120 Solution approved?

Purpose Approval of solutions which could potentially impact commercial relationships and/or DoE reputation, as well as resource utilisation and service efficiency.

Trigger Problem coordinator approved solution for PRB review

Inputs Problem coordinator approved solution for PRB review

Description Ensuring the solution is reviewed and approved by PRB.

Outputs If rejected, proceed to PM.121. If accepted, the Problem Coordinator or Specialist is to initiate the solution using the ITD Change Management process.

PM. 121 Communicate feedback and next steps

Purpose PRB may have rejected the solution based on commercial considerations, risk or reputation to DoE. Feedback is to be provided to the Problem Coordinator to amend the solution.

Trigger Rejected solution

Inputs Solution that has been rejected

Description Any feedback received by the PRB will be provided to the Problem Coordinator to adjust the

solution based on PRB recommendation.

Outputs Feedback provided to Problem Coordinator; proceed to PM.118

PM. 122 Problem resolved?

Purpose Review the accuracy and effectiveness of the resolution by confirming the problem is now resolved.

Trigger Completion of the change record/s

Inputs Completion of the ITD Change Management process.

Description Once the solution has been implemented, the Problem coordinator will review the success of

the resolution.

Outputs If the problem still exists, proceed to PM.117; if the problem is resolved, proceed to PM.123

PM. 123 Review problem resolution

Purpose Review and validate the resolution of a problem, and ensure that a problem with either Critical or High priority is assessed by the PRB.

Trigger Confirmation by the Problem Coordinator problem is resolved

Inputs Resolved Problem

Description Preparing the review of the problem resolution in readiness of PRB review, only if it meets the PRB criteria of either Critical or High priority.

Outputs PM.124

PM. 124 Problem/known error closure

Purpose Closure of all related Problems, Known Errors and related incidents, if applicable. Closure of a Critical problem requires PRB approval. A problem closure report is to be completed for all problems governed by the Problem Review Board.

Trigger Approval that the problem is now resolved, validated with affected users/customers.

Inputs Confirmation by affected users/customers that the problem is now resolved.

Description Closure of problem and known error records.

Outputs PM.106

PM. 125 Review problem closure

Purpose PRB are to review the resolution of either Critical or High priority problems.

Trigger Problem closure approved by problem Coordinator and Service Owner

Inputs Confirmation by affected users/customers problem is now resolved.

Description Closure of problem and known error records.

Outputs PM.126

PM. 126 Problem closure approved?

Purpose PRB are to confirm whether any re-work is required within the Problem Closure Report before they can accept problem closure.

Trigger Review Problem Closure Report by all PRB members

Inputs Problem Closure Report reviewed by all PRB members.

Description Validate whether Problem Closure can be approved or whether additional work is required within the report.

Outputs If accepted, feedback is provided to the Problem Coordinator for closure; if rejected, proceed to PM.127

PM. 127 Communicate feedback and next steps

Purpose Feedback is to be communicated to the problem coordinator or owner, where updates are to be made.

Trigger Rejected Problem Closure Report

Inputs Rejected problem closure report

Description Feedback has been provided to correct the problem closure report.

Outputs PM.128

PM. 128 Update problem closure based on PRB feedback

Purpose The Problem Coordinator is to review the feedback and make changes. The Problem Closure Report will require approval by all parties before being addressed to the PRB.

Trigger Updated PRB comments received

Inputs PRB comments

Description The Problem Coordinator is to update the Problem Closure Report with PRB comments and ensure the report is approved by the Problem Owner and Service Owner before it is reviewed again by the PRB.

Outputs PM.125
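The decision activities above each name two possible next steps in their Outputs row. The following is a minimal sketch of how such branches could be held in a lookup table, using only the Outputs of PM.103, PM.116a and PM.122 as stated in this section. The outcome labels used as dictionary keys and the function name are hypothetical names chosen for this illustration.

```python
from typing import Dict

# Targets copied from the Outputs rows of PM.103, PM.116a and PM.122.
# The outcome labels (inner keys) are hypothetical names for this sketch.
DECISION_OUTPUTS: Dict[str, Dict[str, str]] = {
    "PM.103": {"approved": "PM.105", "rejected": "PM.104"},
    "PM.116a": {"escalation_required": "PM.116b", "no_escalation_required": "PM.114"},
    "PM.122": {"resolved": "PM.123", "still_exists": "PM.117"},
}


def next_activity(decision: str, outcome: str) -> str:
    """Look up the activity that follows a decision point for a given outcome."""
    try:
        return DECISION_OUTPUTS[decision][outcome]
    except KeyError as exc:
        raise ValueError(
            f"No transition recorded for {decision!r} with outcome {outcome!r}"
        ) from exc


print(next_activity("PM.103", "rejected"))  # PM.104
```

Keeping the branches in data rather than in nested conditionals makes it easier to compare the table against the workflow diagram when the procedure is revised.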

4.3. Problem closure review activity description

This section of the document lists the specific activities referenced in the Problem Closure Review (meets PRB) workflow diagram and provides some detail on how these activities are performed.

PCR. 101 Meets problem review board criteria

Purpose All problems that the Problem Review Board has asked to monitor meet the criteria.

Trigger Problem is ready for closure

Inputs Problem Resolution; or, if the problem cannot be remediated, a project to be initiated; or, in rare cases where a solution to the problem is not possible, a risk to be raised.

Description Problem Owner will be required to assess the readiness of a problem and its closure.

Outputs PCR.102

PCR. 102 Create/Update/Modify Problem Closure Report

Purpose The creation, update or modification of a problem closure report.

Trigger The problem is being monitored by the Problem Review Board, or the Service Owner, the Problem Process Owner or the Problem Review Board has requested an update to the problem report.

Inputs PCR.104, PCR.114a, PCR.114b

Description Problem Owner will be required to assess the readiness of a problem and its closure.

Outputs PCR.103

PCR. 103 Service Owner Review

Purpose Service owner to review problem closure report ensuring all aspects within the report have been endorsed for circulation to PRB and IT Executive team.

Trigger The problem is being monitored by the Problem Review Board, or the Service Owner, the Problem Process Owner or the Problem Review Board has requested an update to the problem report.

Inputs PCR.102

Description Service Owner to review problem.

Outputs PCR.102, PCR.104

PCR. 104 Service Owner Endorsed

Purpose Service Owner to endorse or not approve problem closure.

Trigger Once service owner reviewed problem closure report.

Inputs PCR.103

Description The Service Owner will either endorse circulation of the problem closure report or not approve it, in which case the problem closure report must be reviewed and updated by the Problem Owner.

Outputs PCR.102, PCR.105

PCR. 105 Process Owner Review

Purpose Process Owner to review the problem closure report and ensure it meets the standard required for Problem Review Board circulation.

Trigger Once Service Owner has endorsed Problem closure.

Inputs PCR.104

Description Process owner will review the problem closure report ensuring all information supplied is factual

and accurately supplied.

Outputs PCR.106

PCR. 106 Ready for PRB circulation

Purpose To ensure the problem closure report is ready for PRB circulation or it will require further

clarification from the Problem Owner.

Trigger Once Service Owner has endorsed Problem closure.

Inputs PCR.104

Description After the problem closure report has been reviewed, it is either returned to the Problem Owner for review if any corrections are required, or circulated to PRB members if it is ready for circulation.

Outputs PCR.107, PCR.113

PCR. 106b Prepare/Send Feedback to Service Owner

Purpose Process owner to consolidate all feedback received from PRB as a result of the non-approval, then have it reviewed by the Service owner.

Trigger PRB have not approved problem closure.

Inputs PCR.108a

Description As a result of PRB not approving problem closure, all feedback is to be captured and provided

to the service owner for review and consultation with problem owner.

Outputs PCR.114b

PCR. 108a PRB Approval

Purpose Problem Review Board members will review and endorse the closure of all problems.

Trigger Process Owner will circulate the problem closure report.

Inputs PCR.107

Description Problem review board members will be required to review the contents of the problem closure report and provide their endorsement of closure. If it has not been approved this will be captured

as feedback.

Outputs PCR.106b, PCR.108b

PCR. 108b Any feedback for Service Owner

Purpose Even though the problem closure report may have been endorsed, if there is any feedback, it will be sent back to the Service Owner/Problem Owner for review.

Trigger PRB have endorsed problem closure.

Inputs PCR.108a

Description After the PRB have endorsed the problem closure report, Problem Review Board members will provide any feedback for the Service Owner.

Outputs Prepare and Send Feedback to Service Owner, PCR.109

PCR. 109 IT Executive Criteria

Purpose

Ensure that Problem Closure Reports circulated to the IT Executive team meet the criteria for circulation:

• Budget allocation to permanently resolve known error/problem

• ITD resources outside of BAU capacity

• Approval of residual risks

• Or if a PRB member has recommended the escalation to the IT Executive team.

Trigger Problem Closure Report has been endorsed by IT Executive team and has met the criteria.

Inputs PCR.108b

Description Problems that require review and possibly an approval on either a budget allocation, additional resources and/or approval of residual risk.

Outputs PCR.110
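The PCR.109 criteria amount to an "any of" test over four conditions. The following is a minimal sketch, assuming a closure report is represented by four boolean flags; the class and field names are hypothetical and do not correspond to fields in the Remedy toolset.

```python
from dataclasses import dataclass


@dataclass
class ClosureReport:
    """Hypothetical flags mirroring the four PCR.109 criteria."""
    needs_budget_allocation: bool = False
    needs_resources_outside_bau: bool = False
    needs_residual_risk_approval: bool = False
    prb_recommended_escalation: bool = False


def meets_it_executive_criteria(report: ClosureReport) -> bool:
    """True when at least one criterion for IT Executive circulation applies."""
    return any(
        (
            report.needs_budget_allocation,
            report.needs_resources_outside_bau,
            report.needs_residual_risk_approval,
            report.prb_recommended_escalation,
        )
    )


print(meets_it_executive_criteria(ClosureReport(needs_residual_risk_approval=True)))  # True
print(meets_it_executive_criteria(ClosureReport()))  # False
```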

PCR. 110 IT Executive Review

Purpose

The IT Executive team will review problem closures and will be required to review and approve items that cover the following aspects:

• Budget allocation to permanently resolve known error/problem

• ITD resources outside of BAU capacity

• Approval of residual risks

Trigger Problem closure report has met the criteria where the IT Executive team are required to review and endorse aspects ranging from budget, additional resources and/or residual risks.

Inputs PCR.109

Description IT Executive team are to review and address problem closures that have met the criteria.

Outputs PCR.111

PCR. 111 IT Executive team Endorsed

Purpose Approve or not approve aspects of the problem closure.

Trigger Review of Problem closure.

Inputs PCR.110

Description The IT Executive team will either endorse or not approve the closure of the problem based on the criteria: additional budget allocation, additional ITD resources outside of BAU capacity and/or approval of residual risk.

Outputs PCR.106b, PCR.112

PCR. 112 Problem closure

Purpose Ensure all aspects of the problem have been closed. Process owner has performed the activities required to close off the problem within the Remedy Problem record.

Trigger Problem Review Board and IT Executive team have endorsed problem closure report.

Inputs PCR.109, PCR.111

Description The Process Owner will close the procedure by ensuring the Problem Closure Reports have been closed and captured appropriately within Remedy Problem Management.

Outputs Procedure End.
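
The hand-offs between steps PCR.109 and PCR.112 can be sketched as a small transition map built only from the Outputs entries above. This is an illustrative sketch, not part of the procedure: the names `NEXT_STEPS` and `paths_to_end` are hypothetical, and PCR.106b is treated as leaving this part of the flow because it is defined elsewhere in the procedure.

```python
# Step transitions as listed under "Outputs" for PCR.109 to PCR.112.
NEXT_STEPS = {
    "PCR.109": ["PCR.110"],
    "PCR.110": ["PCR.111"],
    "PCR.111": ["PCR.106b", "PCR.112"],  # not approved, or endorsed
    "PCR.112": [],                       # Procedure End
}


def paths_to_end(start, end="PCR.112", graph=NEXT_STEPS):
    """Return every path from `start` to `end`, following the Outputs entries."""
    if start == end:
        return [[start]]
    paths = []
    # Steps not present in the map (such as PCR.106b) have no onward steps here.
    for nxt in graph.get(start, []):
        for tail in paths_to_end(nxt, end, graph):
            paths.append([start] + tail)
    return paths
```

A map of this kind can be used to confirm that each step has a route to Procedure End.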

5. Process critical success factors and KPIs

CSFs KPIs

Minimise the impact of problems

• % of incidents correlated to Knowledge Articles

• The # of Problems Closed this month with a permanent resolution (by category)

Avoiding repeated incidents

• % of incidents correlated to Knowledge Articles

• The # of Problems Closed this month with a permanent resolution (by category)

Improved service quality

• % of incidents correlated to Known Errors

• Ratio of problems to incidents
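
The KPIs above are simple counts and ratios. The sketch below shows one way they could be calculated, assuming incidents and problems are available as plain dictionaries. The field names (`known_error`, `category`, `closed_on`, `permanent_resolution`) and the sample records are hypothetical and do not reflect the Remedy data model.

```python
from collections import Counter
from datetime import date


def percent_correlated(incidents, link_field):
    """% of incidents whose `link_field` (e.g. a Known Error reference) is populated."""
    linked = sum(1 for incident in incidents if incident.get(link_field))
    return 100.0 * linked / len(incidents) if incidents else 0.0


def closed_with_permanent_resolution(problems, year, month):
    """# of problems closed in the given month with a permanent resolution, by category."""
    counts = Counter()
    for problem in problems:
        closed_on = problem.get("closed_on")
        if (closed_on and closed_on.year == year and closed_on.month == month
                and problem.get("permanent_resolution")):
            counts[problem["category"]] += 1
    return dict(counts)


def problems_to_incidents_ratio(problem_count, incident_count):
    """Ratio of problems to incidents; None when there are no incidents."""
    return problem_count / incident_count if incident_count else None


# Illustrative records only; not real data.
sample_incidents = [{"known_error": "KE-1"}, {"known_error": None}, {}, {"known_error": "KE-2"}]
sample_problems = [
    {"category": "Network", "closed_on": date(2020, 7, 3), "permanent_resolution": True},
    {"category": "Network", "closed_on": date(2020, 7, 9), "permanent_resolution": False},
    {"category": "Storage", "closed_on": date(2020, 6, 30), "permanent_resolution": True},
]
```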

5.1. Reporting

A number of reports will be used to identify problems and known errors before incidents occur, thus minimising the impact on the service. Incident and problem analysis reports provide information for proactive measures to improve service quality.

A critical report is the Problem Review Board dashboard. This report is crucial for the effective delivery of the weekly Problem Review Board meeting, where members refer to it to complete their responsibilities as a board.

5.1.1. Governance reports

Item Requirement

Open Problems No Work Around

Number of open problem tickets with a status of Assigned, Under Investigation or Pending which do not have a Work Around identified.

Open Problems No Root Cause

Number of open problem tickets with a status of Assigned, Under Investigation or Pending which do not have a Root Cause identified.

Open No Work Around Or Root Cause

Number of open problem tickets with a status of Assigned, Under Investigation or Pending which have neither a Work Around nor a Root Cause identified.

Mean time to work around

Average time it takes for a Work Around to be added to a Problem. Time taken from Submit Date until Workaround Determined On is populated, in Working Days, averaged over all Problems submitted.

Mean time to Root Cause

Average time it takes for a Root Cause to be added to a Problem. Time taken from Submit Date until Root Cause Added is populated, in Working Days, averaged over all Problems submitted.

Mean time to Investigation

Average time it takes for a Problem to reach the Status of Under Investigation. Sum of the time taken from Submit Date to the Status being changed to Under Investigation (in Working Days) for all Problems, divided by the number of problems submitted.

Mean time to Respond

Average time it takes for a Problem to reach the Status of Under Review, starting from the first Incident Submitted. Time taken from Submit Date of the earliest Incident attached until the Status is changed to Under investigation, in Working Days averaged over all Problems submitted.

Mean time to Close Problem

Average time it takes for a Problem to reach the Status of Closed, starting from the point it was Submitted. Sum of the time taken from Submit Date of the Problem to the Status being changed to Closed (In Working Days) for all Problems divided by the number of problems submitted.

Mean time Risk of Disruption

Average time it takes for a Problem to progress from the Identification of the Root cause until a permanent fix is implemented and the Status of Completed is reached. Sum of the time taken from the population of the Root Cause Identified field to the Status being changed to Completed (In Working Days) for all Problems divided by the number of problems submitted

# Problems Not Updated By Problem Coordinator Group (5 Days)

Number of problems governed by the PRB, and not included within the Watch list items, that have not been updated within 5 business days. All such problems must be updated every 5 business days.
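
The governance measures above reduce to counts of open problems and averages of working-day durations. The sketch below illustrates the calculations under stated assumptions: working days are Monday to Friday with no public holidays, the field names are hypothetical, "No Work Around Or Root Cause" is read as neither being identified, and averages are taken only over problems where both dates are populated. The procedure averages over all Problems submitted, so the divisor may need adjusting.

```python
from datetime import date, timedelta
from statistics import mean

# Statuses the governance reports treat as "open".
OPEN_STATUSES = {"Assigned", "Under Investigation", "Pending"}


def working_days_between(start: date, end: date) -> int:
    """Count Monday-to-Friday days after `start`, up to and including `end`."""
    days, current = 0, start
    while current < end:
        current += timedelta(days=1)
        if current.weekday() < 5:  # 0-4 are Monday to Friday
            days += 1
    return days


def count_open_missing(problems, *fields):
    """Count open problems where every one of `fields` is still empty."""
    return sum(
        1 for p in problems
        if p["status"] in OPEN_STATUSES and not any(p.get(f) for f in fields)
    )


def mean_working_days(problems, start_field, end_field):
    """Average working days between two date fields, skipping unpopulated records."""
    spans = [
        working_days_between(p[start_field], p[end_field])
        for p in problems
        if p.get(start_field) and p.get(end_field)
    ]
    return mean(spans) if spans else None


def not_updated_within(problems, today: date, limit: int = 5):
    """Open problems whose last update is more than `limit` working days old."""
    return [
        p for p in problems
        if p["status"] in OPEN_STATUSES
        and working_days_between(p["last_updated"], today) > limit
    ]
```

For example, `count_open_missing(problems, "workaround")` corresponds to Open Problems No Work Around, and `mean_working_days(problems, "submit_date", "workaround_on")` corresponds to Mean time to work around.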

6. Escalation Management

An escalation management process will be invoked where a problem is being governed/monitored by the Problem Review Board and one or more of the following criteria have been met:

• No update within the Remedy Problem Record for two reporting cycles,

• Root Cause is known but solution implementation has missed 2 estimated implementation dates,

• Problem Review Board recommends that it be escalated,

• Problem closure report has not been completed within 2 weeks without a satisfactory reason, and/or

• Problem has been updated but does not demonstrate progress.
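
The escalation criteria above can be sketched as a rule check that returns the reasons a problem qualifies for escalation. This is a minimal sketch under stated assumptions: the `ProblemSnapshot` class and all of its field names are hypothetical, any single criterion is taken to be sufficient, and "within 2 weeks" is interpreted as more than two weeks outstanding.

```python
from dataclasses import dataclass


@dataclass
class ProblemSnapshot:
    """Hypothetical view of a Remedy Problem record; every field name is an assumption."""
    governed_by_prb: bool = True
    cycles_without_update: int = 0
    root_cause_known: bool = False
    missed_implementation_dates: int = 0
    prb_recommends_escalation: bool = False
    closure_report_weeks_outstanding: int = 0
    delay_reason_accepted: bool = False
    updated_without_progress: bool = False


def escalation_reasons(p):
    """Return the escalation criteria met by a problem governed by the PRB."""
    if not p.governed_by_prb:
        return []  # escalation only applies to PRB-governed problems
    reasons = []
    if p.cycles_without_update >= 2:
        reasons.append("no update for two reporting cycles")
    if p.root_cause_known and p.missed_implementation_dates >= 2:
        reasons.append("two estimated implementation dates missed")
    if p.prb_recommends_escalation:
        reasons.append("PRB recommendation")
    if p.closure_report_weeks_outstanding > 2 and not p.delay_reason_accepted:
        reasons.append("closure report overdue without satisfactory reason")
    if p.updated_without_progress:
        reasons.append("updates do not demonstrate progress")
    return reasons
```

An empty list means no escalation is required; a non-empty list gives the reasons to record when the escalation is raised.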

7. Problem review board

This section describes the purpose, values and operation of the Problem Review Board (PRB) within the Department ICT environment. The PRB quorum will consist of DoE ITD Executive Directors, who will meet on a regular basis or during an emergency situation that requires their approval.

Problem Review Board responsibilities:

• Keeping informed of Critical or High priority problems and the risk to the Department;

• Re-evaluating, amending and agreeing to the prioritisation (Critical and High), as well as resource allocation of Problem and/or Known Error records and their related tasks;

• Approval of solutions which could potentially impact commercial relationships and/or DEC reputation, as well as resource utilisation and service efficiency;

• Approval to close and/or Resolve (when required) Problem and Known Error records;

• Assessing and mitigating residual risk of closed problems.

7.1. Problem Review Board Roles

The PRB will consist of ITD Directors who will have the authority and sponsorship to execute the PRB charter and may also require the presence of a PRB Guest. The following roles will be required to participate within the PRB meetings.

PRB Chairperson

• Chairing the PRB meetings;

• Presenting problems that require risk assessment and final prioritisation, resource allocation, approval or rejection of proposed solutions and acceptance of problem closures.

PRB Member

• Provide input into the risk assessment and final prioritisation on critical and high problems;

• Approve the suitable resource allocation to perform further investigation of problems;

• Approve or reject proposed solutions based on commercial or resourcing considerations, and review critical and high problems and problem closures;

• Review all new Critical and/or High priority problems raised and approve additions/changes to CIO Dashboard;

• Acceptance of risks where mitigation activities have taken place

Problem Process Owner

• Produce a report of critical and high priority problems that require risk assessment and final prioritisation, resource allocation, approval or rejection of proposed solutions and acceptance of problem closures;

• Record minutes and coordinate action items arising from the PRB meetings.

PRB Guests (Problem Owner, Service Delivery Manager, Operations Manager, Service Desk Manager)

• A PRB guest may be invited where further explanation or clarification of a problem and its risks, workarounds and/or solutions, or a detailed status update, is required in person;

• Invitations to guests will be extended by core PRB members on a case-by-case basis.

Appendix A: Root cause analysis techniques

RCA techniques

Technique Benefits

Kepner and Fourie CauseWise

CauseWise can be applied to any technical deviation or incident experienced with hardware and other computer equipment. The framework involves defining the problem statement, detailing the problem by outlining what is, what is not and why not, then validating the possible causes, followed by an action plan.

Kepner and Fourie PriorityWise

A highly flexible and practical prioritisation approach that enables management priorities for core issues to be determined effectively. It is a systematic and integrative application of rational and intuitive priority-setting strategies.

Kepner and Fourie ThinkingWise

A problem-solving approach used to determine the causes and reasons for complex or vague problems.

Kepner-Tregoe Problem Analysis

K-T Problem Analysis is a systematic method to analyse a problem and understand the root cause of the issue instead of making assumptions and jumping to conclusions. The methodology is a well-known choice within the IT and technical field and has been included in the IT Infrastructure Library (ITIL), but can be applied to a wide range of problems.

The 5 steps to Kepner-Tregoe Problem Analysis

• Define the Problem

• Describe the Problem

• Establish possible causes

• Test the most probable cause

• Verify the true root cause

Appendix B: Problem closure report

The Problem Closure Report template is located within the Problem Management SharePoint page (filename: Problem Closure Report template).