Closed Loop Remediation
By integrating external systems with Automic Automation, you as a system administrator or Workflow designer can automate remediation processes. Automated remediation (or problem resolution) is valuable for optimizing routine support tasks and incident resolutions. In these automated processes, the Automic Automation Workflow plays the central role of executing the actual remediation tasks.
In a closed loop remediation process, the Workflow has an extra step to send the final state of the remediation task to the originating system, thus closing the remediation loop.
Benefits of Automated Remediation
As part of their daily standard operating procedures (SOP), IT support and IT administration teams handle many small, simple routine remediation tasks. Typical examples are disk space cleanup, password resets, and operations for virtual machines (VMs) such as rebooting, resetting, or moving them, but there are many more.
Individually, the tasks take only a few minutes, but as a whole they add up to considerable overhead for the support staff.
The benefits that automated remediation of any kind brings to an organization include the following:
- Freeing up your support staff from routine tasks
- Lower mean time to repair (MTTR)
- More controlled, consistent, and sustainable problem resolution
- Transparency
- Scalability
Overview of Automated Remediation
Automated closed loop remediation with Automic Automation involves integrating your monitoring solution with an Automic Automation Workflow that handles the actual remediation tasks.
The basic process always has five main parts:
- The originating system, where the problem or need occurs
- A monitoring system
- A notification mechanism
- The remediation actions, which are executed with an Automic Automation Workflow
- A mechanism that reflects the final state, thus closing the remediation loop
When remediation is automated, getting feedback about the final state of the remediation actions makes the automation process truly meaningful to your organization.
Variation: Integrating Incident Management
Automated remediation can be key to a successful ITSM (IT service management) strategy. The benefits that it brings to an organization include the following:
- Freeing up your support staff from routine tasks
- Lower mean time to repair (MTTR)
- More controlled, consistent, and sustainable problem resolution
- Scalability
- Transparency about progress
- A framework for continuous improvement in your incident resolution, which is especially possible when data from closed loop remediation processes is collected and analyzed
When you integrate an incident management tool, you get the greatest benefit from automated remediation. Incident management tools include ServiceNow, JIRA, or Request Manager (RM). As the Workflow works through the remediation process, the Workflow can enrich the incident data with the latest status and remediation details all the way to closing the ticket.
For full automation, you can design the Workflow to create an incident ticket, which it assigns to itself. As the Workflow goes through its tasks, it continues to enrich the incident. Otherwise, if the incident is created outside of Automic Automation, your process includes steps to pass the details of the incident ticket on to the Workflow for enrichment.
If the Workflow is unable to solve the problem, the Workflow can automatically reassign the ticket to a person or group who can take over the incident for further analysis and resolution.
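The close-or-reassign decision described above can be sketched as follows. This is a minimal illustration only: the `Ticket` fields, the `finish_ticket` helper, and the `it-support-l2` group name are hypothetical and do not correspond to the data model of ServiceNow, JIRA, or any other specific tool.

```python
from dataclasses import dataclass, field


@dataclass
class Ticket:
    """Hypothetical, tool-neutral view of an incident ticket."""
    ticket_id: str
    assignee: str
    state: str = "in_progress"
    notes: list = field(default_factory=list)


def finish_ticket(ticket, succeeded, summary, escalation_group="it-support-l2"):
    """Enrich the ticket with the outcome, then close it or hand it over."""
    # Always record the latest remediation details on the incident.
    ticket.notes.append(summary)
    if succeeded:
        # Remediation worked: the Workflow closes its own ticket.
        ticket.state = "closed"
    else:
        # Remediation failed: reassign to a person or group for analysis.
        ticket.state = "open"
        ticket.assignee = escalation_group
    return ticket
```

In a real integration, each of these state changes would be sent to the incident management tool through its own API.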
Integrating your incident management tool gives you more transparency in your automated remediation processes.
- In the short run, you can track the progress of individual tickets.
- In the long run, collecting comparable incident data supports analysis that identifies process improvements for lowering incident MTTR.
Other Variations
The way that you implement closed loop remediation for your specific needs depends on the systems that you work with and how their capabilities can integrate with Automic Automation. Critical junctures in the process include the following:
- How Automic Automation is notified to start the remediation process
- How Automic Automation accesses the system that needs the remediation
- How Automic Automation informs the originating system about the final state of the remediation activities
Notification Options
A notification has two parts:
- The alarm that triggers the notification
The alarm, which contains the incident details, can be either an extended function of your monitoring system or a separate but integrated application.
- The trigger that starts the remediation Workflow
You can handle this second part with any number of mechanisms, as long as the incident details can also be passed on to Automic Automation. The most common approaches involve one or more of the following:
- REST APIs
The AE REST APIs provide endpoints to post and request data between Automic Automation and an external component.
- Webhooks, if the sending system is set up for webhooks
As a web application, Automic Automation can receive custom HTTP callbacks posted by webhooks from external sources.
- The Event Engine to act on events
Event processing with the Event Engine enables real-time filtering of huge volumes of events from various external sources to trigger the right actions in Automic Automation.
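As an illustration of the REST approach, the following sketch builds (but does not send) a request that starts a remediation Workflow and passes the incident details along with it. The endpoint layout (`<base_url>/<client>/executions`), the `object_name` and `inputs` body fields, basic authentication, and all names in the usage example are assumptions made for this sketch; check the AE REST API reference for your version before relying on them.

```python
import base64
import json
import urllib.request


def build_trigger_request(base_url, client, workflow_name, incident, user, password):
    """Build a POST request that asks Automic Automation to start a Workflow."""
    # Assumed endpoint layout: <base_url>/<client>/executions
    url = f"{base_url.rstrip('/')}/{client}/executions"
    body = {
        # Name of the remediation Workflow to execute (assumed field name).
        "object_name": workflow_name,
        # Incident details travel with the trigger (assumed field name).
        "inputs": incident,
    }
    # Basic authentication is assumed here; your system may require another scheme.
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Basic {token}",
        },
        method="POST",
    )
```

A monitoring system could call, for example, `build_trigger_request("https://ae.example.com/ae/api/v1", 100, "JOBP.DISK_CLEANUP", {"HOST#": "srv01"}, user, password)` and send the result with `urllib.request.urlopen`. The host, client number, Workflow name, and input name here are placeholders.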
System Authorization Options
To resolve problems on external systems, Automic Automation needs access to the necessary servers, applications, and files. Typical authorization approaches are:
- Adding the key certificate to the server of the remediation Agent in the Automation Engine
- Passing the authentication information in the REST call or webhook as part of the incident details
Returning the Final State
In closed loop remediation processes, the remediating Workflow returns the final state of the remediation task. How this is done depends on how the process is structured.
- With an incident management tool, the Workflow enriches the incident with the final status details, even closing the incident ticket if the remediation was completed.
- Otherwise, the Workflow can send a REST call back to the originating system with the final status and other details.
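The second option, a REST call back to the originating system, could look roughly like this. The `FinalState` fields and the callback URL are hypothetical; the originating system defines which endpoint and payload it actually accepts.

```python
import json
import urllib.request
from dataclasses import asdict, dataclass


@dataclass
class FinalState:
    """Hypothetical payload describing the outcome of a remediation run."""
    incident_id: str
    status: str   # for example "resolved" or "failed"
    details: str
    run_id: int   # identifier of the Workflow execution


def build_callback(callback_url, state):
    """Build (but do not send) the POST request that closes the loop."""
    return urllib.request.Request(
        callback_url,
        data=json.dumps(asdict(state)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

The last task of the remediation Workflow would send this request, for example with `urllib.request.urlopen`, once the outcome of the remediation tasks is known.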
Tracking Statistics for Analysis
Tracking statistics about your executed remediation processes is essential for continuous improvement of your remediation process. You need this data to see where you can lower MTTR for routine tasks and to more accurately predict failures for developing proactive maintenance strategies.
You can track these statistics with an incident management tool, CA Automic Analytics, another analytics engine, or a combination of tools. Whichever tool you use, it is important to collect precise, meaningful, and comparable data.
For example, you can collect data to calculate the mean time to repair, the mean time to recovery, or both. Tracking the execution runtimes of the remediation Workflows is the ideal basis for calculating the mean time to repair (the time from the start of repair until return to normal operations, including testing). For the mean time to recovery, which starts when the failure is discovered, you need to know the time when the incident occurred. If monitoring happens outside of Automic Automation, your process has to pass the time of the incident occurrence on to Automic Automation for Analytics or to your statistics gathering system.
Note: The distinction between mean time to repair and mean time to recovery is significant for collecting comparable data for analysis. For service level agreements (SLAs) and maintenance contracts, it can be crucial.
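The difference between the two metrics can be shown with a small calculation. The timestamps below are invented sample data, and the sketch assumes that each incident record carries three points in time: when the failure was discovered, when the repair started, and when normal operations resumed.

```python
from datetime import datetime
from statistics import mean


def mean_minutes(intervals):
    """Average length, in minutes, of (start, end) datetime pairs."""
    return mean((end - start).total_seconds() / 60 for start, end in intervals)


# Invented sample data: (failure discovered, repair started, back to normal)
incidents = [
    (datetime(2024, 1, 1, 9, 0), datetime(2024, 1, 1, 9, 10), datetime(2024, 1, 1, 9, 30)),
    (datetime(2024, 1, 2, 14, 0), datetime(2024, 1, 2, 14, 20), datetime(2024, 1, 2, 14, 30)),
]

# Mean time to repair: from the start of repair until normal operations.
mean_time_to_repair = mean_minutes(
    (started, restored) for _, started, restored in incidents
)

# Mean time to recovery: from discovery of the failure until normal operations.
mean_time_to_recovery = mean_minutes(
    (discovered, restored) for discovered, _, restored in incidents
)

print(mean_time_to_repair)    # 15.0
print(mean_time_to_recovery)  # 30.0
```

With the same two incidents, the mean time to recovery is twice the mean time to repair, which is why mixing the two definitions makes the collected data incomparable.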