Troubleshooting, Root-Cause Analysis and Remediation
The Process Monitoring perspective provides the essential toolset for identifying, investigating, and resolving abnormal system behaviors. By combining detailed execution data with the Automation Assistant, you can quickly perform root-cause analysis and initiate targeted remediation or escalation. This topic outlines the core activities required to maintain operational stability, from recognizing a failure in the Tasks list to using AI-assisted diagnostics that shorten the path to resolution.
Monitoring Processes and Recognizing Abnormal Behaviors
As soon as an object is executed, it is displayed as a task in the Tasks list, which provides key execution and status data. There you can check the task's properties (activation, start and end times, runID) and its status.
-
Manual Monitoring
Use the Filter pane to isolate Aborted or ENDED_NOT_OK tasks.
-
AI-Enhanced Monitoring
Use the Automation Assistant (powered by the MCP server) to find failures using natural language. For example, type "Show me all failed Jobs from the last hour." The assistant then updates the filter automatically to display only the relevant tasks.
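The filter criteria behind both methods (a failed status plus a time window) can be illustrated with a minimal Python sketch. This is not the product's API: the Task record, its field names, and the sample data are assumptions made for illustration only.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


# Hypothetical task record; field names are illustrative, not the product's schema.
@dataclass
class Task:
    run_id: int
    name: str
    status: str
    end_time: datetime


# Failure statuses named in the text above.
FAILED_STATUSES = {"ENDED_NOT_OK", "Aborted"}


def failed_since(tasks, now, window=timedelta(hours=1)):
    """Return tasks with a failed status that ended within `window` before `now`."""
    cutoff = now - window
    return [
        t for t in tasks
        if t.status in FAILED_STATUSES and cutoff <= t.end_time <= now
    ]


# Invented sample data.
now = datetime(2024, 1, 1, 12, 0)
tasks = [
    Task(1001, "JOBS.LOAD", "ENDED_OK", now - timedelta(minutes=10)),
    Task(1002, "JOBS.EXPORT", "ENDED_NOT_OK", now - timedelta(minutes=20)),
    Task(1003, "JOBS.CLEANUP", "ENDED_NOT_OK", now - timedelta(hours=3)),
]

recent_failures = failed_since(tasks, now)
print([t.run_id for t in recent_failures])
```

Only the second task qualifies: the first ended successfully and the third failed outside the one-hour window.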
Investigating the Abnormal Behaviors
Once you have identified the tasks that require attention, the investigation phase allows you to transform raw execution data into actions. A number of tools are available for this purpose; see Tools for Investigation. The Automation Assistant helps you not only identify that a failure occurred, but also understand why it happened.
This hybrid approach combines the precision of structured logs with the speed of AI-driven summarization. The Automation Assistant reduces the time required to navigate complex Workflows and large Job reports by pinpointing anomalies and explaining technical errors in plain language.
-
Automated Report Analysis
The Automation Assistant reads reports and log files across different Agents and platforms. When you ask it to analyze a failure, it scans these technical resources for specific error patterns, return codes, and system messages.
-
Cross-Run Comparison
The Automation Assistant can instantly compare the attributes and logs of a failed execution against a previous successful run to identify environmental or configuration changes.
-
Plain Language Summaries
The Automation Assistant translates complex, nested error strings into actionable summaries.
-
Example: Instead of seeing only Return Code 9009, you get an explanation from the Automation Assistant along these lines: "The job failed because the executable 'xyz.sh' was not found in the target directory."
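To make the first two capabilities in the list above more concrete, the following Python sketch shows the general idea of scanning a report for error patterns and of comparing the attributes of a failed run against a successful one. It does not reflect how the Automation Assistant is implemented. The regular expressions, attribute names, and sample data are assumptions chosen for illustration.

```python
import re

# Illustrative patterns only; real report contents vary by Agent and platform.
PATTERNS = {
    "return_code": re.compile(r"return code[:=\s]+(\d+)", re.IGNORECASE),
    "not_found": re.compile(r"(\S+): (?:command )?not found", re.IGNORECASE),
}


def scan_report(text):
    """Return (line number, pattern name, captured value) for every match."""
    findings = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        for kind, pattern in PATTERNS.items():
            match = pattern.search(line)
            if match:
                findings.append((line_no, kind, match.group(1)))
    return findings


def diff_runs(failed, successful):
    """Return {attribute: (successful value, failed value)} where the runs differ."""
    keys = sorted(set(failed) | set(successful))
    return {
        key: (successful.get(key), failed.get(key))
        for key in keys
        if successful.get(key) != failed.get(key)
    }


# Invented sample report and run attributes.
report = """Starting job
sh: xyz.sh: not found
Job ended with return code: 127
"""
findings = scan_report(report)

successful_run = {"agent": "UNIX01", "login": "LOGIN.PROD"}
failed_run = {"agent": "UNIX02", "login": "LOGIN.PROD", "queue": "CLIENT_QUEUE"}
changes = diff_runs(failed_run, successful_run)

print(findings)
print(changes)
```

The scan yields line-level findings that a summary could be built from, and the diff isolates the attributes that changed between the two runs (here, the agent and an added queue).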
Tools for Investigation
The following tools are available for investigating failed tasks and abnormal behaviors:
-
Execution lists
Use these lists to compare different runs of a task (for example, compare a failed execution against a successful one) to discover discrepancies that may explain the failure.
Use these lists in combination with the information in the reports and the task details to get a clear picture of what has happened. Execution lists let you drill down into all aspects of a run. For compound tasks, such as Workflows or Schedules, you can access their child task executions. Likewise, from a child task execution you can access its parent execution.
For general information about execution data, see Execution Data. For more information about object and task details, see Showing Object and Task Details.
Tip: Right-click an execution and select Analyze Execution to get assistance from Automic Automation's Gen AI capabilities. For more information, see:
-
Reports
Reports provide the technical trail for every execution. There are different types of reports, depending on the type of task. For more information about reports in general, see:
-
Monitors
Many task types have their own monitor view. For some task types, all the information you need is already contained in the Tasks list, in the reports, and in the Executions lists. Other task types require more extensive information for monitoring and therefore provide a dedicated monitor. For more information, see:
Escalating and Remediating
Once the root cause is identified, the final step is to restore service and prevent recurrences. From the Tasks list, the Executions lists, and the various monitors, you have access to the functions for resolving issues, subject to your rights and permissions.
The Automation Assistant streamlines the transition from diagnosis to action by providing guided remediation. Instead of manually navigating through menus to find the correct recovery command, you can use the assistant to initiate fixes directly through natural language.
Manual Remediation
Access the context menu of a failed task to perform actions such as Restart, Cancel, or Modify. These actions are context-sensitive and depend on the current status of the task. For more information, see Available Functions Depending on the Task Status.
AI-Assisted Remediation
The Automation Assistant can suggest the most appropriate next step based on its analysis of the failure. For example, if a Job failed due to a temporary network timeout, the assistant may suggest a simple restart. If a script error is detected, it can provide a direct link to open the object for editing.
Direct Execution
You can instruct the Automation Assistant to perform bulk operations or complex restarts, such as "Restart the parent workflow for this failed job" or "Cancel all blocked tasks in Client 100."
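The idea of mapping a diagnosed cause to a suggested next step can be sketched in a few lines of Python. The categories, keyword checks, and suggestion texts below are hypothetical and only mirror the two examples given above. The assistant's actual decision logic is not described in this topic.

```python
from enum import Enum


class Cause(Enum):
    NETWORK_TIMEOUT = "network_timeout"
    SCRIPT_ERROR = "script_error"
    UNKNOWN = "unknown"


# Hypothetical mapping for illustration; not the assistant's actual logic.
SUGGESTIONS = {
    Cause.NETWORK_TIMEOUT: "Restart the task",
    Cause.SCRIPT_ERROR: "Open the object for editing",
}


def classify(message):
    """Assign a coarse cause to an error message using simple keyword checks."""
    text = message.lower()
    if "timeout" in text or "timed out" in text:
        return Cause.NETWORK_TIMEOUT
    if "syntax error" in text or "not found" in text:
        return Cause.SCRIPT_ERROR
    return Cause.UNKNOWN


def suggest(message):
    """Return a suggested next step, falling back to escalation."""
    return SUGGESTIONS.get(classify(message), "Escalate for manual investigation")


print(suggest("Connection timed out after 30s"))
print(suggest("line 12: syntax error near ')'"))
print(suggest("disk full"))
```

Unrecognized causes fall through to escalation, so an action such as a restart is only proposed when the cause is understood.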
Whether you are performing a manual deep dive or using AI-led analysis, these tools provide the transparency needed to ensure system reliability and informed decision-making.
See also: