Troubleshooting, Root-Cause Analysis and Remediation

The Process Monitoring perspective provides the essential toolset for identifying, investigating, and resolving abnormal system behavior. By combining detailed execution data with the Automation Assistant, you can quickly perform root-cause analysis and initiate targeted remediation or escalation. This topic outlines the core activities required to maintain operational stability, from initial failure recognition in the Tasks list to AI-assisted diagnostics that accelerate the path to resolution.

Monitoring Processes and Recognizing Abnormal Behaviors

As soon as an object is executed, it is displayed as a task in the Tasks list. The list provides key execution and status data: for each task, you can check its properties (activation, start and end times, RunID) and its status.

  • Manual Monitoring

    Use the Filter pane to isolate Aborted or ENDED_NOT_OK tasks.

  • AI-Enhanced Monitoring

    Use the Automation Assistant (powered by the MCP server) to find failures using natural language, for example, by typing "Show me all failed Jobs from the last hour." This automatically updates the filter to display only the relevant tasks.
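Both methods above come down to the same selection logic: keep only the tasks with a failure status that ended within a given time window. The following is a minimal sketch of that logic, not a product API. The Task record, its field names, and the sample task names are assumptions made for illustration; only the status values Aborted and ENDED_NOT_OK come from this topic.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

# Hypothetical task record: the field names are illustrative and do not
# mirror any product API.
@dataclass
class Task:
    run_id: int
    name: str
    status: str
    end_time: datetime

# Statuses treated as failures, taken from the Filter pane example above.
FAILED_STATUSES = {"ENDED_NOT_OK", "Aborted"}

def failed_since(tasks, now, window):
    """Return tasks with a failure status that ended within `window` before `now`."""
    cutoff = now - window
    return [t for t in tasks if t.status in FAILED_STATUSES and t.end_time >= cutoff]

# Sample data (invented for this sketch).
now = datetime(2024, 1, 1, 12, 0)
tasks = [
    Task(1001, "JOBS.LOAD", "ENDED_OK", now - timedelta(minutes=10)),
    Task(1002, "JOBS.EXPORT", "ENDED_NOT_OK", now - timedelta(minutes=20)),
    Task(1003, "JOBS.CLEANUP", "Aborted", now - timedelta(hours=3)),
]

recent_failures = failed_since(tasks, now, timedelta(hours=1))
print([t.run_id for t in recent_failures])  # prints [1002]
```

Task 1003 also has a failure status, but it ended three hours ago and therefore falls outside the one-hour window.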

Investigating the Abnormal Behaviors

Once you have identified the tasks that require attention, the investigation phase allows you to transform raw execution data into actions. A number of tools are available for this purpose; see Tools for Investigation. The Automation Assistant helps you not only identify that a failure occurred, but also understand exactly why it happened.

This hybrid approach combines the precision of structured logs with the speed of AI-driven summarization. The Automation Assistant drastically reduces the time required to navigate complex Workflows and large Job reports by pinpointing anomalies and explaining technical errors in plain language.

  • Automated Report Analysis

    The Automation Assistant reads reports and log files across different Agents and platforms. When you ask it to analyze a failure, it scans these technical resources for specific error patterns, return codes, and system messages.

  • Cross-Run Comparison

    The Automation Assistant can instantly compare the attributes and logs of a failed execution against a previous successful run to identify environmental or configuration changes.

  • Plain Language Summaries

    The Automation Assistant translates complex, nested error strings into actionable summaries.

    Example: Instead of showing only Return Code 9009, the Automation Assistant explains something along these lines: "The job failed because the executable 'xyz.sh' was not found in the target directory."
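The idea behind a cross-run comparison can be sketched as a plain attribute diff between a successful run and a failed one. This is only an illustration of the concept, not how the Automation Assistant is implemented. The function name, the attribute keys, and the sample values are all assumptions.

```python
def diff_attributes(successful, failed):
    """Return {key: (value_in_successful, value_in_failed)} for every differing key.

    A key that is present in only one run is reported with None on the other side.
    """
    changes = {}
    for key in sorted(set(successful) | set(failed)):
        before = successful.get(key)
        after = failed.get(key)
        if before != after:
            changes[key] = (before, after)
    return changes

# Hypothetical attribute snapshots of two executions of the same Job.
good_run = {"agent": "UNIX01", "login": "LOGIN.PROD", "script": "/opt/app/run.sh"}
bad_run = {"agent": "UNIX02", "login": "LOGIN.PROD", "script": "/opt/app/run.sh"}

print(diff_attributes(good_run, bad_run))  # prints {'agent': ('UNIX01', 'UNIX02')}
```

In this invented example, the only difference is the Agent on which the Job ran, which points the investigation toward the environment rather than the script.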

Tools for Investigation

The following tools are available for investigating failed tasks and abnormal behaviors:

Escalating and Remediating

Once the root cause is identified, the final step is to restore service and prevent recurrence. From the Tasks list, the Executions lists, and the various monitors, you have access to the functions designed to resolve issues, subject to your rights and permissions.

The Automation Assistant streamlines the transition from diagnosis to action by providing guided remediation. Instead of manually navigating through menus to find the correct recovery command, you can use the assistant to initiate fixes directly through natural language.

Manual Remediation

Access the context menu of a failed task to perform actions such as Restart, Cancel, or Modify. These actions are context-sensitive and depend on the current status of the task. For more information, see Available Functions Depending on the Task Status.

AI-Assisted Remediation

The Automation Assistant can suggest the most appropriate next step based on its analysis of the failure. For example, if a Job failed due to a temporary network timeout, the assistant may suggest a simple restart. If a script error is detected, it can provide a direct link to open the object for editing.
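The two examples above amount to a mapping from a diagnosed failure category to a suggested next step. The following minimal sketch shows that mapping. The category names, the function name, and the fallback suggestion are assumptions made for illustration and do not describe the assistant's actual decision logic.

```python
# Hypothetical failure categories mapped to the suggestions described above.
SUGGESTIONS = {
    "network_timeout": "Restart the task",
    "script_error": "Open the object for editing",
}

def suggest_next_step(failure_category):
    """Return a suggested next step, falling back to escalation (an assumed default)."""
    return SUGGESTIONS.get(failure_category, "Escalate for manual investigation")

print(suggest_next_step("network_timeout"))  # prints Restart the task
print(suggest_next_step("disk_full"))        # prints Escalate for manual investigation
```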

Direct Execution

You can instruct the Automation Assistant to perform bulk operations or complex restarts, such as "Restart the parent workflow for this failed job" or "Cancel all blocked tasks in Client 100."

Whether you are performing a manual deep dive or using AI-led analysis, these tools provide the transparency needed to ensure system reliability and informed decision-making.
