
2 Problem management process

2.1 Overview and process diagram

The process of problem management is shown in Figure 1 and summarised below.

Problems may be identified from many sources, usually either as a result of ongoing or recent incidents (reactive problem management) or from a retrospective analysis of historical incidents (proactive problem management).

Once detected, problems need to be logged, categorised and prioritised in a similar way to incidents. A problem will then be investigated, potentially using a variety of available approaches and tools, including chronological analysis, pain value analysis, brainstorming and 5-Whys. The purpose of the investigation is to identify the root cause of the problem which, if fixed, will eliminate incidents arising from it, so improving service to users.

If a workaround is already known or becomes available as a result of the diagnosis of the problem, then this may be communicated to the incident management process for use until the problem is fixed.

Once the problem has been successfully diagnosed and the root cause found, a known error record is raised in the known error database. If a change is needed to resolve the root cause of the problem, then a change request will be raised via the change management process.

The problem will then be resolved, and the resolution tested to ensure that it was successful. If fixed, the problem will then be closed and, if the severity of the problem justifies it, a major problem review will be carried out to identify lessons learned and any additional improvement actions.

Figure 1: Problem management process

2.2 Process triggers

The problem management process is initiated as a result of one or more of the following triggers:

• As a reaction to one or more incidents with similar symptoms occurring for which the cause is not currently known. This may be recognised by:
  o The service desk
  o Second line
  o Third line
  o Suppliers
  o Customers
  o Users
  o Other source or stakeholder
• From information provided by the service transition stage regarding problems that have not been resolved prior to live running e.g. bugs in software or issues with configuration items
• As a result of a proactive analysis of previous incidents or message logs carried out with the intention of identifying common factors and trends worth investigating

2.3 Process inputs

The process of problem management requires a number of inputs in order to be able to function effectively. These may not always be available but will ideally be:

• Details of incidents related to the problem, including
  o Number of incidents
  o Dates and times of incidents
  o Categorisations
  o Impacts
  o Symptoms
  o Actions carried out so far with results
• Configuration Management System (CMS) records for relevant CIs
• Technical and business input to investigation and diagnosis sessions such as brainstorming
• Details of completion of requested changes from the change management process
• Feedback from incident management, users and other parties regarding whether the problem resolution has been successful
• Information from internal development teams and external suppliers regarding software and hardware problems that are known about but not yet fixed in the version in use within the organization

2.4 Process activities

The individual process activities at each step are detailed as follows.

2.4.1 Problem detection

For problem management to be as effective as it can be, problems must be identified as early as possible.

Problems may be identified from any source, including:

• IT team members
• Suppliers
• Monitoring tools
• Customers
• Users
• Analysis of incident records

All of the above will be encouraged to provide feedback to the problem management team about potential problems, including user perception of specific areas of service which may indicate an underlying problem. The problem management team will provide advice and guidance about whether an issue represents a valid problem.

Problem management will make regular contact with the business in order to get team members to put forward potential problems for investigation. Often the user is aware of things that are not right but perseveres with them because that is the way it has always been. The problem manager will make efforts to get the business perspective on what they see as the main IT-related problems.

Initially the problem manager will contact the Business Relationship Managers (BRMs) to create a first pass list and ask them to pass on the problem identification concept to those areas of the business they are in regular contact with. The problem manager will then liaise with the BRMs on an ongoing basis to log, assess, prioritise and investigate those problems that are brought forward.

In addition, on a quarterly basis, an analysis of logged incidents will be performed by the problem manager in order to identify areas in which possible problems exist.
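The quarterly analysis of logged incidents could be sketched roughly as follows. This is a minimal illustration, not a description of any particular service desk system: the `Incident` fields, the sample references and the `min_count` threshold are all assumptions made for the example.

```python
from collections import Counter
from typing import NamedTuple


class Incident(NamedTuple):
    """Hypothetical minimal incident record; field names are illustrative."""
    ref: str
    category: str
    ci: str  # configuration item the incident was logged against


def candidate_problems(incidents, min_count=3):
    """Return (category, ci) groups with at least `min_count` incidents,
    most frequent first. The threshold is an assumed tuning value."""
    counts = Counter((i.category, i.ci) for i in incidents)
    return [(key, n) for key, n in counts.most_common() if n >= min_count]


# Made-up sample data for illustration only.
incidents = [
    Incident("INC001", "Email", "MAIL-SRV-01"),
    Incident("INC002", "Email", "MAIL-SRV-01"),
    Incident("INC003", "Printing", "PRN-02"),
    Incident("INC004", "Email", "MAIL-SRV-01"),
    Incident("INC005", "Network", "SW-CORE-01"),
]

print(candidate_problems(incidents))
```

Groups that recur against the same category and configuration item are only candidates; the problem management team would still decide whether each one represents a valid problem.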

2.4.2 Problem logging

It is important that, once identified, problems are recorded so that effort can be allocated to resolving them.

Upon a problem being recognised, a problem record will be created by the problem management team within the service desk system and populated with the references of the related incidents and the details of the symptoms of the problem, including:

• The business impact of the problem (quantified where possible)
• Users and user groups affected
• Any relevant information about the timing of the problem
• Possible causes identified so far
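As a rough sketch, the contents of a problem record described above might be modelled like this. The class and field names are hypothetical and do not correspond to the schema of any real service desk product.

```python
from dataclasses import dataclass, field


@dataclass
class ProblemRecord:
    """Hypothetical shape of a problem record; not a real service desk schema."""
    ref: str
    symptoms: str
    business_impact: str  # quantified where possible
    related_incidents: list[str] = field(default_factory=list)
    affected_users: list[str] = field(default_factory=list)
    timing_notes: str = ""
    possible_causes: list[str] = field(default_factory=list)

    def link_incident(self, incident_ref: str) -> None:
        """Attach an incident reference once, ignoring duplicates."""
        if incident_ref not in self.related_incidents:
            self.related_incidents.append(incident_ref)


# Made-up example values for illustration only.
record = ProblemRecord(
    ref="PRB001",
    symptoms="Outbound email delayed by up to an hour",
    business_impact="Approx. 20 staff-hours lost per week",
    affected_users=["Finance", "Sales"],
)
record.link_incident("INC001")
record.link_incident("INC004")
record.link_incident("INC001")  # duplicate, ignored
print(record.related_incidents)
```

Keeping incident references on the record means that later incidents, including those closed with a workaround, can be linked to the same problem and counted.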

2.4.3 Problem categorisation

Three levels of categorisation will be used for problems. These will be the same as those used for incidents so that a degree of cross-referencing can be performed. Category hierarchies will be available within the service desk system and will be reviewed on a regular basis as part of process improvement activities. Changes to the categories will be managed carefully so that the implications for SLA reporting are understood and catered for accordingly.

The process manager will review on a regular basis the use of categories in logging problems to ensure that they are used consistently by all parties.

2.4.4 Problem prioritisation

The priority of a problem will determine the order in which it is addressed by problem managers and subsequent teams involved in its investigation. This will be based on a combination of two factors:

• Impact: A measure of the effect of a problem on business processes
• Urgency: A measure of how quickly the business needs the problem to be fixed

The priority should take into account the benefits that will be achieved if the problem is resolved (not all problems will be resolvable). These benefits may take a number of forms but the main questions to be asked will be:

• How much will business disruption be reduced? (e.g. no. of man-hours p.a.)
• What effect will this have on our customers?
• How many incidents will we prevent p.a.?
• How much time will be saved in the IT team?
• What direct costs will we avoid?
• What effect will solving this problem have on staff morale?

These questions will allow a benefit profile to be created for the problem which will indicate how much effort it makes sense to put in to get it solved.

Both impact and urgency will be assessed on a scale of high, medium and low. The priority of a problem will then be calculated based on the rating of its urgency and impact as follows:

IMPACT/URGENCY HIGH MEDIUM LOW

High 1 2 3

Medium 2 3 4

Low 3 4 5

Table 1: Determination of priority

The priority of a problem will be calculated automatically by the service desk system based on the above rules.
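The automatic calculation can be illustrated with a simple lookup. This is a minimal sketch that transcribes the values of Table 1 and the titles of Table 2; the function and variable names are assumptions for the example and do not describe how any particular service desk system implements the rule.

```python
# Priority lookup transcribed from Table 1; keys are (impact, urgency).
PRIORITY_MATRIX = {
    ("high", "high"): 1, ("high", "medium"): 2, ("high", "low"): 3,
    ("medium", "high"): 2, ("medium", "medium"): 3, ("medium", "low"): 4,
    ("low", "high"): 3, ("low", "medium"): 4, ("low", "low"): 5,
}

# Titles transcribed from Table 2.
PRIORITY_TITLES = {1: "Critical", 2: "High", 3: "Medium", 4: "Low", 5: "Planning"}


def calculate_priority(impact: str, urgency: str) -> int:
    """Return the priority (1 to 5) for the given impact and urgency ratings."""
    key = (impact.strip().lower(), urgency.strip().lower())
    if key not in PRIORITY_MATRIX:
        raise ValueError(f"impact and urgency must be high, medium or low: {key}")
    return PRIORITY_MATRIX[key]


priority = calculate_priority("High", "Medium")
print(priority, PRIORITY_TITLES[priority])  # prints: 2 High
```

Because the matrix is symmetric, swapping impact and urgency gives the same priority. An explicit table is used here rather than a formula so that the rule can be changed one cell at a time.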

The definitions of each priority level are as follows:

PRIORITY TITLE DESCRIPTION

1 Critical Significant delay or disruption to the business until the problem is fixed

2 High Significant delay or disruption to parts of the business until the problem is fixed

3 Medium Localised delay or disruption affecting one or more users

4 Low Localised inconvenience affecting a single user

5 Planning Very minor inconvenience or non-urgent problem

Table 2: Priority definitions

There may be circumstances where a problem affecting a single user has a significant business impact, particularly if the user is a member of the senior management team or a high-value financial transaction is involved. The priority should therefore be set in consultation with the user.

2.4.5 Problem investigation and diagnosis

Once a problem has been logged, all activities performed with respect to that problem should be recorded as actions in the problem record e.g. adding notes, referring to supplier.

Where appropriate, one or more of the following techniques will be used by the problem management team to define the problem and its possible causes in more detail:

• Chronological Analysis
• Pain Value Analysis
• Kepner and Tregoe
• Brainstorming
• Ishikawa Diagrams
• Pareto Analysis

If the problem management team cannot resolve the problem, they may opt to escalate it further e.g. to a third line team or an external supplier. In this case the problem remains with the problem management team and it is the problem team member’s responsibility to ensure that the problem is updated on a regular basis based on feedback from the support team or external supplier.

2.4.6 Workaround

Any workarounds found which reduce or eliminate the symptoms of the problem temporarily should be recorded in the problem record and made available to the service desk.

Any instances that make use of a workaround should still be recorded as incidents and linked to the outstanding problem record. This gives a continuing indication of the frequency of the problem.

2.4.7 Raise known error record if required

Once investigations have been completed and the cause of the problem has been diagnosed (or before this point if useful), the status of the problem will be moved to that of “known error”. This indicates that the cause is known but the problem is not yet “fixed”.

A knowledgebase (Known Error Database) will be maintained within the service desk system into which known errors will be placed.

2.4.8 Change Request

Where a change to the live environment is required in order to fix the problem, a change request must be raised in accordance with change management procedures. The reference numbers of such changes must be recorded in the problem record and the problem reference listed in each change record.

2.4.9 Problem resolution and closure

Once the problem has been diagnosed and resolved, it may be closed. In some circumstances it may be decided to close the problem without it being resolved e.g. if the cost of resolving it is prohibitive or the service involved is about to be replaced or retired. In this case the reasons should be documented in the problem record.

Related incidents that remain open and are resolved as part of the resolution of the problem should also be closed.
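The status changes described in sections 2.4.7 and 2.4.9 can be sketched as a small state machine. The status names follow the text, but the set of allowed transitions and the `move_to` helper are assumptions made for illustration; a real service desk workflow is likely to have more states.

```python
from enum import Enum


class Status(Enum):
    OPEN = "open"
    KNOWN_ERROR = "known error"
    CLOSED = "closed"


# Assumed transitions: a problem may be closed directly or via known error.
ALLOWED = {
    Status.OPEN: {Status.KNOWN_ERROR, Status.CLOSED},
    Status.KNOWN_ERROR: {Status.CLOSED},
    Status.CLOSED: set(),
}


def move_to(current, target, resolved=True, reason=""):
    """Return the new status, or raise ValueError if the move is not allowed.

    Closing an unresolved problem requires a documented reason, for example
    that the cost of resolution is prohibitive or the service is being retired.
    """
    if target not in ALLOWED[current]:
        raise ValueError(f"cannot move from {current.value} to {target.value}")
    if target is Status.CLOSED and not resolved and not reason.strip():
        raise ValueError("closing an unresolved problem requires a documented reason")
    return target


status = move_to(Status.OPEN, Status.KNOWN_ERROR)
status = move_to(status, Status.CLOSED, resolved=False,
                 reason="Service due to be retired")
print(status.value)  # prints: closed
```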

2.4.10 Major problem review

In the case of major problems which have had a significant impact upon service to users, a problem review will be carried out by the problem manager to identify lessons learned. The report produced will be made available to interested parties and any recommendations input to the service improvement plan.

2.5 Process outputs

The outputs of the problem management process will be the following:

• Closed problems
• Complete and accurate problem records
• Feedback from customers and users regarding levels of satisfaction
• Communication and feedback to other service management processes such as availability management, capacity management and change management
• Reports to management regarding problem volumes, impacts, resolution success rates and process effectiveness

2.6 Problem management tools

There are a number of key software tools that underpin an effective problem management process. These are subject to change as requirements and technology are updated and so specific systems are not described here. However, the main types of tools that play a significant part in the process within [Organization Name] are as follows.

2.6.1 Service desk system

The service desk system provides the workflow engine and database to implement the core activities within problem management. These include:

• Problem logging
• Routing and assignment of problems to teams and individuals
• Recording of actions against problems
• Updating of problem status from open through to closed
• Assessment of impact and urgency and auto-calculation of priority
• Email communication with users from within problem records
• Problem categorisation to multiple levels
• Reporting
• Knowledgebase of past incidents with search capability
• Known error database

The service desk system is integrated with the systems that support various other processes, including incident, change and configuration management.

2.6.2 Problem analysis and investigation tools

There are various techniques that may be used during the different stages of the investigation of a problem. Some of these, such as Pareto Analysis and Ishikawa Diagrams, may be supported by tools implemented using spreadsheets and mapping software.
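As an example of the kind of calculation a spreadsheet would perform for Pareto Analysis, the sketch below ranks causes by incident count and returns those that together account for a chosen share of the total. The 80% cutoff is the conventional rule of thumb rather than a requirement of this process, and the cause names and counts are invented for illustration.

```python
def pareto(counts, cutoff=0.8):
    """Rank causes by count and return (cause, count, cumulative_share) rows
    up to and including the first row whose cumulative share reaches `cutoff`."""
    total = sum(counts.values())
    if total == 0:
        return []
    rows = []
    running = 0
    for cause, n in sorted(counts.items(), key=lambda kv: kv[1], reverse=True):
        running += n
        rows.append((cause, n, running / total))
        if running / total >= cutoff:
            break
    return rows


# Made-up incident counts per suspected cause.
counts = {
    "Expired certificate": 2,
    "Disk full": 45,
    "Config drift": 30,
    "Memory leak": 15,
    "Other": 8,
}

for cause, n, share in pareto(counts):
    print(f"{cause}: {n} incidents, cumulative {share:.0%}")
```

The causes returned are the ones where investigation effort is likely to prevent the most incidents.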

2.6.3 Email

The email system is key to communication between the problem management team and other involved groups such as users and suppliers.

2.6.4 Configuration management system

The CMS provides real-time information about the hardware and software within the IT environment and allows problem management to view any changes that have been implemented on key components that are under consideration with regard to a problem. It allows the installed software and its versions to be viewed without the need to access the user’s computer remotely as well as helping problem management understand the relationships between service components.

2.7 Communication and training

There are various forms of communication that must take place for the problem management process to be effective. These are described below.

2.7.1 Communication with users

It is likely that many of the incidents that give rise to the identification of a problem are reported by users. If such incidents cannot be closed by means of a workaround, then it will be appropriate to keep these users informed about the progress of the investigation of the problem. If such incidents can be closed but recur on a regular basis, users will still want to be kept informed about when the underlying problem will be fixed and the recurring incidents can be expected to cease.

Emails that are exchanged with the user should be incorporated into the problem record so that a full audit trail of all communication is kept and is available to whoever is working on the problem.

It may be appropriate to invite selected users to sessions organized to investigate problems via the various techniques available such as brainstorming. Users who have first-hand knowledge of the symptoms and circumstances of a problem can provide valuable insight into its causes and may speed up its resolution.

2.7.2 Communication with customers

Even where there is no formal SLA associated with the resolution of problems, customers should be kept informed about the progress of high priority problems affecting their business area, including what is being done to resolve them and the resources dedicated to their investigation.

2.7.3 Communication with IT teams

Problem management needs the support of technical specialists to identify and resolve sometimes complex problems for the benefit of the business and often the IT team itself. The problem manager will foster close relationships with key teams within the IT organization so that the benefits of effective problem management are understood and demonstrated.

IT specialists will be involved in investigative sessions and are likely to be key contributors to the use of techniques such as chronological analysis and fault isolation.

2.7.4 Communication with suppliers

Often the input of suppliers will be critical to diagnose, test and resolve difficult problems. Their knowledge of the products and services they supply will usually exceed that available in-house and sometimes access to the developers of products may be needed to determine a resolution.

The internal supplier manager for the third party involved should be kept informed of the ongoing communication between problem management and supplier staff and may be useful in securing additional resource to speed up investigations.

2.7.5 Process performance

It is important that the performance of the problem management process is monitored and reported upon on a regular basis in order to assess whether the process is operating as expected. The content of performance reports is set out in section 6 of this document, but it is vital that the reports are not only produced but are also communicated to the appropriate audience.

This will include the customers of the IT service and the management of IT concerning resource utilisation and allocation. Depending on the health of the process it may be appropriate to hold regular meetings with customers and IT management to discuss the performance and agree any actions to improve it.

2.7.6 Communication related to changes

The problem management process manager must have visibility of the change management schedule and ideally will be briefed on any changes with the potential to affect ongoing problems. This may be a regular meeting or carried out on an ad-hoc basis according to the frequency of occurrence of such changes.

Problem management will also communicate with change management as part of the logging of changes to resolve problems and the review of these after the event.

2.7.7 Training for problem management

In addition to a well-defined process and appropriate software tools it is essential that the people aspects of problem management are adequately addressed. The process requires that training be provided to all participants in order that it runs as smoothly as possible.

The main areas in which training will be required for problem management are as follows.

• The problem management process itself, including the activities, roles and responsibilities involved
• Problem management software tools such as the service desk system and configuration management system
• Specific problem investigation techniques such as Kepner-Tregoe, 5-Whys and Affinity Mapping
• Soft skills such as customer service, dealing with difficult conversations and avoiding technical jargon
• The basics of the technology and how it is implemented within [Organization Name]
• The business, its structure, locations, priorities and people

In addition, training should be provided to the user population regarding how to identify and report a problem, including:

• The difference between an incident, a service request, a problem and a change proposal and how they are handled
• How to report a problem via the various means available
• What may be expected of them as part of problem investigation

This training may be provided via short workshops and supplemented by on-demand resources such as videos and user guides.
