Closed Loop Remediation

By integrating external systems with Automic Automation, you as a system administrator or Workflow designer can automate remediation processes. Automated remediation (or problem resolution) is valuable for optimizing routine support tasks and incident resolution. In these automated processes, the Automic Automation Workflow plays the central role: it executes the actual remediation tasks.

In a closed loop remediation process, the Workflow includes an additional step that sends the final state of the remediation task back to the originating system, thus closing the remediation loop.

Benefits of Automated Remediation

As part of their daily standard operating procedures (SOP), IT support and IT administration teams have to handle many small, simple routine remediation tasks. Typical examples are disk space cleanup, password resets, and operations on virtual machines (VMs) such as rebooting, resetting, or moving them, but there are many more.

Individually, the tasks take only a few minutes, but together they add up to considerable overhead for the support staff.

The benefits that automated remediation of any kind brings to an organization include the following:

Overview of Automated Remediation

Automated closed loop remediation with Automic Automation involves integrating your monitoring solution with an Automic Automation Workflow that handles the actual remediation tasks. The basic process looks like this:

Process flow showing the main parts of closed loop remediation

The process always has five main parts:

  1. The originating system, where the problem or need occurs
  2. A monitoring system
  3. A notification mechanism
  4. The remediation actions, which are executed with an Automic Automation Workflow
  5. A mechanism that reflects the final state, thus closing the remediation loop

When remediation is automated, getting feedback about the final state of the remediation actions makes the automation process truly meaningful to your organization.
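The five parts listed above can be illustrated with a minimal Python sketch. It does not use any Automic Automation API: the classes, function names, and state values (`RESOLVED`, `FAILED`, `NO_ACTION_DEFINED`) are hypothetical stand-ins chosen only to show how the final state travels back to the originating system.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple


@dataclass
class Alert:
    """Hypothetical notification payload (parts 2 and 3: monitoring and notification)."""
    source: str   # name of the originating system
    problem: str  # for example "disk_full"


@dataclass
class OriginatingSystem:
    """Hypothetical stand-in for the system where the problem occurs (part 1)."""
    name: str
    final_states: List[Tuple[str, str]] = field(default_factory=list)

    def report_final_state(self, problem: str, state: str) -> None:
        # Part 5: the loop is closed once the final state is recorded here.
        self.final_states.append((problem, state))


def remediate(alert: Alert, actions: Dict[str, Callable[[], bool]]) -> str:
    """Part 4: stand-in for the remediation Workflow; returns the final state."""
    action = actions.get(alert.problem)
    if action is None:
        return "NO_ACTION_DEFINED"
    return "RESOLVED" if action() else "FAILED"


def closed_loop(system: OriginatingSystem, problem: str,
                actions: Dict[str, Callable[[], bool]]) -> str:
    """Run one pass through the loop for a detected problem."""
    alert = Alert(source=system.name, problem=problem)  # monitoring raises a notification
    state = remediate(alert, actions)                   # remediation actions run
    system.report_final_state(alert.problem, state)     # final state is sent back
    return state
```

In this sketch, the final state is reported for every outcome, including failures and unknown problems, so the originating system never waits on a remediation that silently ended.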

Variation: Integrating Incident Management

Automated remediation can be key to a successful ITSM (IT service management) strategy. The benefits that it brings to an organization include the following:

When you integrate an incident management tool, you get the greatest benefit from automated remediation. Incident management tools include ServiceNow, JIRA, and Request Manager (RM). As the Workflow works through the remediation process, it can enrich the incident data with the latest status and remediation details, all the way to closing the ticket.

For full automation, you can design the Workflow to create an incident ticket, which it assigns to itself. As the Workflow goes through its tasks, it continues to enrich the incident. Otherwise, if the incident is created outside of Automic Automation, your process must include steps that pass the details of the incident ticket on to the Workflow for enrichment.

If the Workflow is unable to solve the problem, the Workflow can automatically reassign the ticket to a person or group who can take over the incident for further analysis and resolution.
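The enrichment and reassignment behavior described above can be sketched as follows. This is an illustration under stated assumptions, not an integration with ServiceNow, JIRA, or Request Manager: the `Incident` class is an in-memory stand-in, and the assignee names `remediation-workflow` and `it-support` as well as the status values are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Incident:
    """Hypothetical in-memory ticket; a real tool would be reached through its own API."""
    summary: str
    assignee: str = "remediation-workflow"  # the Workflow assigns the ticket to itself
    status: str = "open"
    notes: List[str] = field(default_factory=list)


def run_remediation(incident: Incident,
                    steps: List[Callable[[], bool]],
                    fallback_group: str = "it-support") -> Incident:
    """Run the steps in order and enrich the incident after each one."""
    for number, step in enumerate(steps, start=1):
        ok = step()
        incident.notes.append(f"step {number}: {'ok' if ok else 'failed'}")
        if not ok:
            # The Workflow cannot solve the problem: hand the ticket to people.
            incident.assignee = fallback_group
            incident.status = "reassigned"
            return incident
    # All steps succeeded: the Workflow closes the ticket itself.
    incident.status = "closed"
    return incident
```

Because each step appends a note before the outcome is evaluated, whoever takes over a reassigned ticket can see which steps already ran and where the remediation stopped.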

Integrating your incident management tool gives you more transparency in your automated remediation processes.

Other Variations

The way that you implement closed loop remediation for your specific needs depends on the systems that you work with and how their capabilities can integrate with Automic Automation. Critical junctures in the process include the following:

Notification Options

A notification has two parts:

System Authorization Options

To resolve problems on external systems, Automic Automation needs access to the necessary servers, applications, and files. Typical authorization approaches are:

Returning the Final State

In closed loop remediation processes, the remediating Workflow returns the final state of the remediation task. How this is done depends on how the process is structured.

Tracking Statistics for Analysis

Tracking statistics about your executed remediation processes is essential for continuously improving them. You need this data to see where you can lower the mean time to repair (MTTR) for routine tasks and to predict failures more accurately so that you can develop proactive maintenance strategies.

You can track these statistics with an incident management tool, CA Automic Analytics, another analytics engine, or a combination of tools. Whichever tool you use, it is important to collect precise, meaningful, and comparable data.

For example, you can collect data to calculate the mean time to repair, the mean time to recovery, or both. The execution runtimes of the remediation Workflows are the ideal basis for calculating the mean time to repair (the time from the start of repair until the return to normal operations, including testing). For the mean time to recovery, which starts when the failure is discovered, you need to know the time when the incident occurred. If monitoring happens outside of Automic Automation, your process has to pass the time of the incident occurrence on to Automic Automation for Analytics or to your statistics gathering system.

Note: The distinction between mean time to repair and mean time to recovery is significant for collecting comparable data for analysis. For service level agreements (SLAs) and maintenance contracts, it can be crucial.
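The difference between the two metrics can be shown with a short Python sketch. The record layout and function names are hypothetical and are not taken from Automic Automation or CA Automic Analytics; the sketch assumes only that three timestamps are available for each remediation.

```python
from datetime import datetime, timedelta
from typing import List, NamedTuple


class RemediationRecord(NamedTuple):
    """Hypothetical record of one remediation; all field names are assumptions."""
    incident_time: datetime     # when the failure was discovered (from monitoring)
    repair_started: datetime    # when the remediation Workflow started
    service_restored: datetime  # return to normal operations, including testing


def mean_duration(durations: List[timedelta]) -> timedelta:
    """Arithmetic mean of a non-empty list of durations."""
    if not durations:
        raise ValueError("at least one record is required")
    return sum(durations, timedelta()) / len(durations)


def mean_time_to_repair(records: List[RemediationRecord]) -> timedelta:
    # Measured from the start of repair: the Workflow runtime is enough.
    return mean_duration([r.service_restored - r.repair_started for r in records])


def mean_time_to_recovery(records: List[RemediationRecord]) -> timedelta:
    # Measured from the incident: needs the timestamp passed on by monitoring.
    return mean_duration([r.service_restored - r.incident_time for r in records])
```

For two records with repair times of 10 and 20 minutes, where the failures were discovered 5 and 15 minutes before repair started, the mean time to repair is 15 minutes and the mean time to recovery is 25 minutes. Mixing the two in one data set would make the figures incomparable.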
