
 

ITIL Incident Management

Speed Up Ticket Resolution with Smart Automation

IT incident management is one of the help desk's fundamental processes. In this guide, you will learn the basics of incident management, its components, the roles and responsibilities involved, and how incident management works with other components of the service desk. A well-run incident management process helps you reduce outages, improve agent productivity, meet SLAs, and manage the complete life cycle of IT tickets. Automating ticket workflows lets your IT technicians focus on other important tasks.

 

 

- Understanding ITIL incident management concepts, workflows, and best practices

 

What is an IT incident?

An IT incident is any disruption to an organization's IT services that affects anything from a single user to the entire business. In short, an incident is anything that interrupts business continuity.

What is IT incident management?

Incident management is the process of managing IT service disruptions and restoring services within the time frames agreed in service level agreements (SLAs).

The scope of incident management starts with an end user reporting an issue and ends with a service desk team member resolving that issue. 

      [Figure: ITIL incident management life cycle diagram]

 

The Stages in Incident Management 

With proper incident management in place, collecting information about incidents is streamlined and less chaotic, with no need for emails to fly back and forth. Service desk teams can publish forms in the user self-service portal to ensure that all relevant information is collected right at the time of ticket creation.

The next stage in incident management is incident categorization and prioritization. This not only helps sort incoming tickets but also ensures that tickets are routed to the technicians most qualified to work on the issue. Incident categorization also helps the service desk system apply the most appropriate SLAs to incidents and communicate those priorities to end users. Once an incident is categorized and prioritized, technicians can diagnose the incident and provide the end user with a resolution.

When supported by the right automation, the incident management process lets service desk teams monitor SLA compliance and notifies technicians when a ticket is approaching an SLA violation. Teams can also configure automated escalations for SLA violations, as applicable to the incident. After diagnosing the issue, the technician offers the end user a resolution, which the end user can validate. This multi-step process ensures that any IT issue affecting business continuity is resolved as soon as possible.
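As a rough illustration of the SLA monitoring described above, the sketch below classifies a ticket as on track, approaching a violation, or breached. The `Ticket` fields, the `sla_status` function, and the 80% warning threshold are all assumptions made for this example; they do not come from any particular service desk product.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Ticket:
    """Hypothetical ticket record used only for this illustration."""
    ticket_id: str
    opened_at: datetime
    resolution_sla: timedelta  # time allowed to resolve the incident


def sla_status(ticket: Ticket, now: datetime, warn_fraction: float = 0.8) -> str:
    """Return 'ok', 'approaching', or 'breached' for a ticket at time `now`.

    `warn_fraction` is an assumed threshold: once this share of the SLA
    window has elapsed, the ticket counts as approaching a violation.
    """
    elapsed = now - ticket.opened_at
    if elapsed >= ticket.resolution_sla:
        return "breached"      # candidate for an automated escalation
    if elapsed >= ticket.resolution_sla * warn_fraction:
        return "approaching"   # candidate for a technician notification
    return "ok"
```

A scheduler could run a check like this periodically over open tickets, notifying the assigned technician for "approaching" tickets and triggering the configured escalation for "breached" ones.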

 

How to classify IT incidents

Incidents in an IT environment can be categorized in several different ways. Factors that influence incident categorization include the urgency of the incident and the severity of its impact on users or the business in general. Classifying and categorizing IT incidents helps identify and route incidents to the right technician, saving time and effort. For example, incidents can be classified as major or minor based on their impact on the business and their urgency. Typically, major incidents are those that affect business-critical services, and therefore the entire organization, and need immediate resolution. Minor incidents usually impact a single user or a department, and might already have a documented resolution in place.

 

What happens when you don't have IT incident management in place?

Incident management covers every aspect of an incident across its life cycle. It speeds up the resolution process and makes ticket management transparent. Without incident management, handling tickets can be a hassle. Some of the key problems that can arise include:

  • Lack of transparency on ticket status and expected timelines for end users.

  • No proper record of past incidents.

  • Inability to document solutions for repeat or familiar issues.

  • Higher risk of business outages, particularly with major incidents.

  • Stretched resolution times.

  • Lack of reporting abilities.

  • Decreased customer satisfaction.

Who uses IT incident management?

Incident management practices are widely used by IT service desk teams. Service desks are usually the single point of contact for end users to report issues to IT management teams.

 

The IT incident management lifecycle

The incident management process can be summarized as follows:

     [Figure: Incident management life cycle]

 

    • Step 1: Incident logging.
    • Step 2: Incident categorization.
    • Step 3: Incident prioritization.
    • Step 4: Incident assignment.
    • Step 5: Task creation and management.
    • Step 6: SLA management and escalation.
    • Step 7: Incident resolution.
    • Step 8: Incident closure.

    These processes may be simple or complex based on the type of incident; they also may include several workflows and tasks in addition to the basic process described above.

    • Incident logging

      An incident can be logged through phone calls, emails, SMS, web forms published on the self-service portal, or live chat messages.

    • Incident categorization

      Incidents can be categorized and sub-categorized based on the area of IT or the business that the incident disrupts, such as network or hardware.

    • Incident prioritization

      The priority of an incident can be determined as a function of its impact and urgency using a priority matrix. The impact of an incident denotes the degree of damage the issue will cause to the user or business. The urgency of an incident indicates the time within which the incident should be resolved. Based on their priority, incidents can be categorized as:

      • Critical 
      • High
      • Medium
      • Low
    • Incident routing and assignment

      Once the incident is categorized and prioritized, it gets automatically routed to a technician with the relevant expertise. 

    • Creating and managing tasks

      Depending on its complexity, an incident can be broken down into sub-activities or tasks. Tasks are typically created when an incident resolution requires the contribution of multiple technicians from various departments.

    • SLA management and escalation

      While the incident is being processed, the technician needs to ensure the SLA isn't breached. An SLA defines the acceptable time within which an incident needs a response (response SLA) or a resolution (resolution SLA). SLAs can be assigned to incidents based on parameters like category, requester, impact, and urgency. When an SLA is about to be breached or has already been breached, the incident can be escalated functionally or hierarchically to ensure that it is resolved as early as possible.

    • Incident resolution

      An incident is considered resolved when the technician has come up with a temporary workaround or a permanent solution for the issue.

    • Incident closure

      An incident can be closed once the issue is resolved and the user acknowledges the resolution and is satisfied with it.
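The priority matrix mentioned under incident prioritization can be pictured as a simple lookup from impact and urgency to one of the four priority levels. The sketch below assumes a 3x3 matrix with "high", "medium", and "low" ratings on each axis; the specific cell values and the `priority_for` helper are illustrative assumptions, since each organization defines its own matrix.

```python
# Hypothetical matrix: PRIORITY_MATRIX[impact][urgency] -> priority.
# The cell values are an assumption for illustration only.
PRIORITY_MATRIX = {
    "high":   {"high": "Critical", "medium": "High",   "low": "Medium"},
    "medium": {"high": "High",     "medium": "Medium", "low": "Low"},
    "low":    {"high": "Medium",   "medium": "Low",    "low": "Low"},
}


def priority_for(impact: str, urgency: str) -> str:
    """Look up the priority for a given impact and urgency rating."""
    impact, urgency = impact.lower(), urgency.lower()
    if impact not in PRIORITY_MATRIX or urgency not in PRIORITY_MATRIX[impact]:
        raise ValueError(f"Unknown impact/urgency: {impact!r}, {urgency!r}")
    return PRIORITY_MATRIX[impact][urgency]
```

For example, under this assumed matrix an outage with high impact and high urgency maps to Critical, while a low-impact, low-urgency request maps to Low.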

Post-incident review

After an incident has been closed, it's good practice to document all the takeaways from that incident. This helps better prepare teams for future incidents and creates a more efficient incident management process. The post-incident review process can be broken down into various aspects, as shown below, and is particularly useful for major incidents.

 

Internal evaluation

  • Incident identification

    • Who detected the incident and how?

    • How soon was the incident detected after it occurred?

    • Could the incident have been identified earlier?

    • Could any tools or technologies have aided in the prompt or preemptive detection of the incident?

  • Information flow and communication

    • How quickly were the stakeholders informed about the incident?

    • What channel was used for relaying notifications?

    • Were all the relevant stakeholders promptly updated with the latest information?

    • How easy was it to communicate with the end user(s) to gather information and keep them informed on the status of the ticket?

  • Structure

    • How was the incident response team initially structured?

    • Was this structure adhered to throughout the incident management life cycle? If not, why? What changes had to be made to the structure?

    • Can the incident handling team be organized in a better way? If so, how?

  • Resource utilization

    • What resources were employed to handle the incident?

    • Were those resources used to their optimal capacity?

    • How quickly were resources mobilized to handle the incident?

    • Could resource utilization be improved in the future?

  • Process

    • How closely was the defined incident management process followed?

    • Were there any deviations in the incident management workflow and process?

    • Were the incident SLAs honored? If not, which SLAs were breached? Why?

    • Was there adequate monitoring of the process being followed for handling the incident?

    • Could the process be improved to make it more efficient? If yes, how?

  • Reporting

    • Were reports generated to analyze how the incident was handled?

    • What parameters were included in the reports?

    • Which parts of the incident life cycle were analyzed?

    • Is there any room for improvement? If so, how can it be achieved?

External evaluation - end-user surveys

Apart from the above factors, some end-user-facing factors should also be evaluated. For this purpose, a post-closure survey is conducted to collect feedback from the end users affected by the incident. This survey should be used to gain insight into key areas such as:

  • How easy or difficult was it for the end user to report an incident?

  • Was the first response from the IT team swift and prompt?

  • Was the incident resolved in a timely manner?

  • How satisfied is the end user with the resolution?




     

The roles and responsibilities involved in IT incident management

Although each organization can have its own custom roles and responsibilities, below are some of the most common IT incident management roles.

  • End user / user / requester

    This is the stakeholder who usually experiences a disruption in service and raises an incident ticket to initiate the process of incident management.

  • Tier 1 service desk

    This is the first point of contact for the requester when they want to raise a request or incident ticket. The Tier 1 service desk usually consists of technicians who have a working knowledge of the most common issues that might occur in an IT environment, including password resets and Wi-Fi problems.

  • Tier 2 service desk

    This service desk is made up of technicians with advanced knowledge of incident management. They usually receive more complex requests from end users; they also receive requests in the form of escalations from Tier 1.

  • Tier 3 (and above) service desk

    This level is usually composed of specialist technicians who have advanced knowledge of particular domains in the IT infrastructure. For example, technicians for hardware maintenance and server support specialize in very specific fields.

  • Incident manager

    This stakeholder plays a key role in the process of incident management by monitoring how effective the process is, recommending improvements, and ensuring the process is followed, among other responsibilities.

  • Process owner

    This stakeholder owns the process followed for managing incidents. They also analyze, modify, and improve the process to ensure it best serves the interest of the organization.

Each role has unique responsibilities, as shown below:

  • End user / user / requester:

    • Contact the service desk to raise a new incident request.

    • Follow up on an existing request.

    • Clearly communicate all the required information to technicians.

    • Acknowledge the restoration of service and completion of the ticket.

    • Respond to follow-up surveys after ticket resolution to complete the feedback loop.

  • Tier 1 service desk:

    • Log all incoming incident requests with appropriate parameters like category, urgency, and priority.

    • Assign tickets to technicians.

    • Analyze and resolve an incident to restore service.

    • Escalate unresolved incidents to the Tier 2 service desk.

    • Gather all required information from the requester and send them regular updates on the status of their request.

    • Act as a point of contact for the requester and, if needed, coordinate between the Tier 2 service desk and the requester.

    • Verify the resolution with the end user and collect feedback.

  • Tier 2 and Tier 3 service desk:

    • Carry out incident diagnosis.

    • Document the steps followed to resolve the incident and submit knowledge base articles.

    • Identify when an incident is a problem and convert the incident ticket to a problem ticket.

    • If the incident is resolved, confirm the resolution with the end user.

    • If the incident is unresolved, escalate it to the Tier 3 service desk.

    • If still unresolved, escalate the incident to the IT problem management team to identify the underlying issue, or to external vendors, as applicable.

    • Provide subject matter expertise.

  • Incident manager:

    • Serve as the point of contact for all major incidents.

    • Plan and facilitate all the activities involved in the incident management process.

    • Ensure that the correct process is followed for all tickets and correct any deviations.

    • Coordinate and communicate with the process owner.

    • Ensure that SLAs are complied with.

    • Identify the incidents that need to be reviewed and carry out the review.

  • Process owner:

    • Take accountability for the overall process of incident management.

    • Define key performance indicators (KPIs) and align them with critical success factors (CSFs).

    • Review KPIs and ensure that they meet business goals and CSFs.

    • Design, document, review, and improve processes.

    • Establish continuous service improvement (CSI) wherein the procedures, policies, roles, technology, and other aspects of the incident management process are reviewed and improved upon.

    • Stay informed about industry best practices and incorporate them into the incident management process.


The key performance indicators for IT incident management

Metrics that drive important decisions are termed key performance indicators (KPIs). Below are a few KPIs for effective IT incident management. 

Average initial response time

The average time taken to respond to each incident. 

Average resolution time

The average time taken to resolve an incident.

First call resolution rate

The percentage of incidents resolved on the first call.

SLA compliance rate

The percentage of incidents resolved within an SLA.

Reopen rate

The percentage of resolved incidents that were reopened. 

Number of repeat incidents

The number of identical incidents logged within a specific time frame.

Percentage of major incidents

The number of major incidents as a percentage of the total number of incidents.

Incident backlog              

The number of incidents that are pending in the queue without a resolution.

End user satisfaction rates

The percentage of end users or customers who were satisfied with the IT services delivered to them.

Cost per ticket

The average expense pertaining to each ticket. 
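Several of the KPIs above are simple averages or percentages over closed incidents. The sketch below shows one way they might be computed; the `IncidentRecord` fields and the `compute_kpis` function are hypothetical names chosen for this example, and a real service desk tool would derive these values from its own ticket data.

```python
from dataclasses import dataclass


@dataclass
class IncidentRecord:
    """Hypothetical summary of one closed incident."""
    resolution_hours: float
    resolved_within_sla: bool
    resolved_on_first_call: bool
    reopened: bool


def percentage(part: int, whole: int) -> float:
    """Return part/whole as a percentage, or 0.0 when there is no data."""
    return 100.0 * part / whole if whole else 0.0


def compute_kpis(records: list[IncidentRecord]) -> dict[str, float]:
    """Compute a few of the incident management KPIs listed above."""
    total = len(records)
    total_hours = sum(r.resolution_hours for r in records)
    return {
        "average_resolution_hours": total_hours / total if total else 0.0,
        "first_call_resolution_rate": percentage(
            sum(r.resolved_on_first_call for r in records), total),
        "sla_compliance_rate": percentage(
            sum(r.resolved_within_sla for r in records), total),
        "reopen_rate": percentage(sum(r.reopened for r in records), total),
    }
```

Keeping the calculation in one place makes it easier to report the same figures consistently across dashboards and post-incident reviews.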

 

Benefits of ITIL incident management

With a proper ITIL incident management process in place, you can: 

  • Record all reported IT incidents in a central repository.

  • Automatically categorize and classify IT incidents based on parameters like priority, urgency, impact, and department.

  • Associate the appropriate SLAs with IT incident tickets.

  • Assign tickets to technicians or support groups for investigation.

  • Identify resolutions and workarounds to incidents.

  • Document resolutions in a knowledge base for future reference.

  • Create live dashboards and reports from help desk data to analyze and improve how incidents are handled.

Best practices for successful ITIL incident management

  1. Offer multiple modes for ticket creation, including email, phone calls, and a self-service portal.

  2. Publish business-facing, custom IT incident forms for effective information gathering.

  3. Automatically categorize and prioritize IT incidents based on ticket criteria.

  4. Associate SLAs with IT incidents based on ticket parameters like priority.

  5. If all technicians have the same skill level, auto-assign tickets to technicians using algorithms like load balancing and round robin.

  6. Associate IT asset data, IT problems, and IT changes with IT incident tickets.

  7. Ensure that incidents are closed only after a proper resolution has been provided, by confirming with the end user and applying the appropriate closure codes.

  8. Configure a custom end-user communication process for every step in an IT incident life cycle.

  9. Create and maintain a knowledge base with appropriate solutions.

  10. Provide role-based access to end users and technicians based on the complexity of the solutions.

  11. Handle major incidents by creating unique workflows.
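The two auto-assignment algorithms named in the list above, round robin and load balancing, can be sketched in a few lines. The function names and the technician names used here are assumptions for illustration; commercial service desk tools implement their own variants of these rules.

```python
from itertools import cycle


def round_robin_assigner(technicians: list[str]):
    """Return a function that hands out technicians in a fixed rotation."""
    if not technicians:
        raise ValueError("At least one technician is required")
    rotation = cycle(technicians)
    return lambda: next(rotation)


def least_loaded(open_tickets: dict[str, int]) -> str:
    """Load balancing: pick the technician with the fewest open tickets.

    Ties go to the technician listed first in `open_tickets`.
    """
    if not open_tickets:
        raise ValueError("At least one technician is required")
    return min(open_tickets, key=open_tickets.get)
```

Round robin spreads new tickets evenly regardless of current workload, while load balancing looks at each technician's open ticket count, so the better fit depends on how uneven ticket effort tends to be.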
     

Feature checklist for IT incident management software

When choosing a ticketing system or IT help desk software, there are a few features that can make or break your IT incident management. Here are some features to consider when choosing incident management software:

  • A central repository to log and track issues.

  • Automatic generation of incidents from email, chat, SMS, and more.

  • Automatic ticket routing, categorization, incident closure, and more.

  • Automatic incident prioritization based on impact and urgency. 

  • Email and SMS communication from within the application.

  • Both customizable and predefined forms and templates.

  • A priority matrix that helps define the priority of tickets based on their impact and urgency.

  • Custom scripts for integrating with external applications.

  • The option to create multiple tasks for each incident.

  • Configurable rules to automatically drive tasks and route incidents.

  • Well-established response and resolution SLA management.

  • The option to pause the SLA timer for a specific period of time.

  • Ability to link incidents to other modules including problems and changes.

  • The option to associate incidents with related problems or convert an incident to a problem or a change.

  • A self-service portal where users can log their tickets.

  • Live chat included within the help desk.

  • A calendar showing technician availability.

  • A complete history of incidents and workstations.

  • Customizable roles and incident templates.

  • Task management for IT incidents.

  • Ability to create multiple sites.

  • A customizable knowledge base that allows end users to search for possible resolutions.

  • Notifications for users and technicians.

  • Automated user satisfaction surveys that collect feedback from end users.

  • Support for integration with other IT management tools and applications.


     

- Incident management and the other service desk components

 

IT incident management and IT problem management

Incident management is a collection of policies, processes, workflows, and documentation that helps IT teams manage an incident from start to finish. The process of incident management involves identifying an incident, logging it with all the relevant information, diagnosing the issue, and restoring the service in a timely manner. The process of incident management is akin to firefighting, where the main goal is to minimize damage to the business.

On the other hand, IT problem management is the process of identifying the root cause leading to one or more incidents and then initiating actions to rectify the issue. Problem management aims to minimize the impact of the problem on the business by taking a more organized approach in the form of root cause analysis, which is used to pinpoint the underlying issue. This issue is then fixed to prevent similar incidents in the future. Ultimately, identifying underlying problems helps with incident management and proactively ensures that normal operations continue.

 

Incident management and change management

ITIL change management is the process of modifying the IT infrastructure of an organization in a standardized and systematic manner. It is a well-planned process composed of various stages and statuses that IT changes can go through.

Typically, IT changes are initiated after the IT problem management process to fix an identified IT problem, to replace a faulty asset that leads to repeat incidents, or as part of the resolution to a major incident. The objective of IT incident management is to minimize IT disruptions and restore services immediately. In some cases, change implementations can lead to incidents, most of which are minor incidents caused by temporary service disruptions or service unavailability. The impact of such incidents can be minimized by proactively informing end users about the change implementation as well as anticipated incidents or service unavailability. If a change causes a major incident, change management teams can immediately roll back the change to restore normalcy.

 

Incident management and asset management

Integrating IT asset management and IT incident management processes makes incident diagnosis and resolution much easier for Tier 2 and Tier 3 technicians. For example, when a user reports an issue about limited Internet connectivity, the issue could be either with the laptop or with the router the user is connected to. Having all the information about the user's laptop—including the router they're connected to along with its details and relationships—helps the technician pinpoint the cause of the incident and provide the right resolution. From an asset-management perspective, linking IT incidents with assets helps IT service desks identify and retire faulty assets that cause repeat incidents in the organization.
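One way to picture the asset-side benefit is to count how many incidents are linked to each asset and flag the ones that keep recurring. The sketch below assumes each incident carries a linked asset ID; the `repeat_offender_assets` name, the sample IDs, and the default threshold of three incidents are illustrative assumptions.

```python
from collections import Counter


def repeat_offender_assets(incident_asset_ids: list[str], threshold: int = 3) -> list[str]:
    """Return asset IDs linked to at least `threshold` incidents, sorted by ID."""
    counts = Counter(incident_asset_ids)
    return sorted(asset for asset, n in counts.items() if n >= threshold)
```

Assets flagged this way are candidates for closer inspection, repair, or retirement.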

 

ITIL glossary for incident management

  • Incident

    An unplanned interruption to an IT service or reduction in the quality of an IT service. Failure of a configuration item, even if it has not yet affected a service, is also an incident (e.g. failure of one disk from a mirror set).

  • Incident identification

    The process of discovering an incident.

  • Incident logging

    Creating and maintaining a record of an incident in the form of a ticket.

  • Incident categorization

    Recording an incident with due diligence so that it's placed under the appropriate category.

  • Incident closure

    Closing an open incident ticket once the incident has been resolved.

  • Incident escalation rules

    A set of rules defining the hierarchy for escalating incidents, including triggers that lead to escalations. Triggers are usually based on incident severity and resolution time. 

  • Incident management

    Managing the life cycle of all incidents to restore normal service operation as quickly as possible and minimize business impact.

  • Incident management report

    A series of reports produced by the incident manager for various target groups (e.g. teams responsible for IT management, service level management, other service management processes, or incident management itself).

  • Incident manager

    The person responsible for the effective implementation of the incident management process and for carrying out reporting. Also represents the first stage of escalation if an incident cannot be resolved within the agreed service level.

  • Incident model

    Contains the predefined steps that should be taken to deal with a particular type of incident.

  • Incident monitoring

    Tracking the processing status of outstanding incidents so that countermeasures may be introduced as soon as possible if service levels are likely to be breached.

  • Incident prioritization

    Assigning priorities to incidents and defining what constitutes a major incident.

  • Incident record

    A collection of data with all details of an incident, documenting the history of the incident from registration to closure.

  • Incident report

    A report that includes information about incidents, how they were handled, and other data that can help measure the performance of the incident management process.

  • Incident resolution

    The workaround or correction that fixes the incident and restores service to its best quality.

  • Incident status

    How far along an incident is in the incident management process. Common statuses include:

    • New: An incident that has been logged but not yet worked on.

    • Assigned: An incident that has been received in the IT help desk and assigned to a technician.

    • In progress: An incident that has been assigned to a technician and is in the process of receiving a resolution.

    • On hold or pending: An incident that has been temporarily put on hold.

    • Resolved: An incident that was worked on by a technician and has received a resolution.

    • Closed: An incident that was closed once the resolution was acknowledged by the end user.
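The statuses above form a simple workflow, which can be modeled as a table of allowed transitions. The status names come from the glossary, but the specific transitions below (for example, moving a resolved incident back to "In progress" when it is reopened) and the `change_status` helper are assumptions for illustration; actual workflows vary by organization and tool.

```python
# Assumed transition table: status -> statuses it may move to next.
ALLOWED_TRANSITIONS = {
    "New": {"Assigned"},
    "Assigned": {"In progress", "On hold"},
    "In progress": {"On hold", "Resolved"},
    "On hold": {"In progress"},
    "Resolved": {"Closed", "In progress"},  # "In progress" models a reopen
    "Closed": set(),
}


def change_status(current: str, new: str) -> str:
    """Return the new status if the transition is allowed, else raise."""
    if new not in ALLOWED_TRANSITIONS.get(current, set()):
        raise ValueError(f"Cannot move an incident from {current!r} to {new!r}")
    return new
```

Encoding the workflow as data rather than scattered conditions makes it easier to review and adjust when the process owner changes the process.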

ITIL KPIs for Incident Management

Each key performance indicator (KPI) is listed with its definition.

  • Number of repeated incidents

    Number of repeated incidents with known resolution methods.

  • Incidents resolved remotely

    Number of incidents resolved remotely by the service desk (i.e. without carrying out work at the user's location).

  • Number of escalations

    Number of escalations for incidents not resolved in the agreed resolution time.

  • Number of incidents

    Number of incidents registered by the service desk, grouped into categories.

  • Average initial response time

    Average time taken between the time a user reports an incident and the time that the service desk responds to that incident.

  • Incident resolution time

    Average time for resolving an incident, grouped into categories.

  • First time resolution rate

    Percentage of incidents resolved at the service desk during the first call, grouped into categories.

  • Resolution within SLA

    Rate of incidents resolved within the resolution times agreed in the SLA, grouped into categories.

  • Incident resolution effort

    Average work effort for resolving incidents, grouped into categories.

 

