Monitoring Databricks Jobs

When you execute an object in Automic Automation, the corresponding task is visible in the list of Tasks in the Process Monitoring perspective. Process Monitoring gives you the full range of possibilities to monitor, analyze, identify, and remediate problems. It displays comprehensive data on active and inactive tasks and provides tools to filter and group them. In this perspective, you can modify active tasks, open their reports and execution lists, and troubleshoot.

To learn more about how to monitor your jobs, refer to the Automic Automation product documentation at Monitoring Tasks.

For more information on how to work with tasks in the Process Monitoring perspective, refer to the Automic Automation documentation at Working with Tasks.

This page includes the following:

  • Statuses

  • Monitoring Databricks Job Details

  • Canceling and Restarting Databricks Jobs

  • Databricks Agent Connection

  • Reports

Statuses

The Process Monitoring perspective contains two status-related columns:

  • Status

    This column shows the status of the Automic Automation Run Job or the Start or Stop Cluster Job.

  • Remote Status

    This column shows the status of the Run Job or the Start or Stop Cluster Job on Databricks.

Possible Statuses

The tasks can have the following statuses:

Status Column

  • While executing, the status in Automic Automation is Active.

  • On completion, the status is ENDED_OK or ENDED_NOT_OK, depending on whether the execution succeeded.

Remote Status

  • While executing, the status in Databricks is Running.

  • On completion, the status is Success or Failed, depending on whether the execution succeeded.

This information is written to the Report (REP) file; see Reports.

Monitoring Databricks Job Details

  1. Go to the list of tasks in the Process Monitoring perspective.

  2. Find the Databricks task.

  3. Select the task and click the Details button.

    The Details pane opens on the right-hand side and shows a summary of the execution of the selected task. The General and Job sections of the Details pane display information about the object configuration and its execution in Automic Automation.

    The Object Variables section displays the information that the Databricks system reports back to Automic Automation for Run Jobs, such as the &PIPELINERUNID# variable.

Canceling and Restarting Databricks Jobs

You can execute Run Jobs and cancel the corresponding tasks in the Process Monitoring perspective.

Canceling means that the Automic Automation task is canceled as well as the job on the Databricks workspace. If canceling the execution fails (for example, because you do not have the rights to cancel it), the execution remains active and that information is visible in the job report.

You can also restart a task that was canceled or that failed, thus allowing you to restart the job in your Databricks workspace. You can restart it either from scratch or from a failed activity. That means that when you restart a job from Automic Automation, the activities that were executed successfully can be skipped so that the pipeline execution restarts from the failed activities.

Example

You trigger a Job in Automic Automation which, in turn, triggers the pipeline in the Databricks environment. The job execution returns the &PIPELINERUNID# variable. If, for some reason, the pipeline fails on Databricks, the Job also fails in Automic Automation.

Restarting the Job in Automic Automation triggers a new pipeline run in Databricks. However, you can also restart the pipeline in Databricks from a failed activity.

To do so, you need to pass the &RESTART_PIPELINERUNID# variable, which must contain the value of the &PIPELINERUNID# variable that Databricks returns in the Object Variables section.

You can pass that value using the Post Process page of the object definition, for example:

:PUBLISH &PIPELINERUNID#, RESTART_PIPELINERUNID#, "TASK"
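
The following variation of the same statement is only an illustration with comments added; the :PRINT statement and the comments are additions for traceability, not something the integration requires:

! Post Process sketch: make the Databricks pipeline run ID available for a restart.
! &PIPELINERUNID# is returned in the Object Variables section; :PUBLISH republishes
! its value under the name RESTART_PIPELINERUNID#, scoped to this task ("TASK").
:PRINT "Databricks pipeline run ID: &PIPELINERUNID#"
:PUBLISH &PIPELINERUNID#, RESTART_PIPELINERUNID#, "TASK"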

For more information, refer to the Automic Automation documentation at Conditions, Preconditions, Postconditions.

In this case, if the object definition includes a definition for the &RESTART_PIPELINERUNID# variable, the task is restarted from the failed activity. This can be defined on the same or a subsequent job.

If there is no &RESTART_PIPELINERUNID# variable definition, a complete new run starts and the Object Variables section returns the &PIPELINERUNID# variable.
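
If you define the &RESTART_PIPELINERUNID# variable on a subsequent job rather than on the same one, that job must first obtain the value that the failed task published. The lines below are a minimal sketch for such a job's Pre Process page, not the definitive method: &FAILED_RUNID# is a hypothetical variable name assumed to hold the RunID of the failed task, and GET_PUBLISHED_VALUE is the Automic script function for reading values published by other tasks:

! Pre Process sketch for a subsequent job.
! Assumption: &FAILED_RUNID# (hypothetical name) holds the RunID of the failed
! Databricks task, for example passed to this job as an object variable.
! GET_PUBLISHED_VALUE reads the value that the failed task published with :PUBLISH.
:SET &RESTART_PIPELINERUNID# = GET_PUBLISHED_VALUE(&FAILED_RUNID#, "RESTART_PIPELINERUNID#")
:PRINT "Restarting Databricks pipeline run &RESTART_PIPELINERUNID#"

Adjust this to your setup; the variable can also be defined directly in the Variables section of the object definition, as described above.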

Also, if you restart a job after its job definition has been changed, it runs the pipeline with the values currently available in the job definition.

Notes:
  • When executing a pipeline job from Automic Automation, regardless of whether it is part of a new execution or a restart, the &RESTART_PIPELINERUNID# variable is always checked. If this variable exists and contains a non-empty value, the pipeline attempts to restart from the failed activity. Therefore, this variable must be set correctly and scoped to the specific job execution so that it does not interfere with other pipeline executions.

  • Restarting a task is not the same as starting one. For example, when restarting, the task does not re-run the preconditions as they have been fulfilled already. For more information, see Restarting Tasks in the Automic Automation documentation.

For more information on how to work with tasks in the Process Monitoring perspective, refer to the Automic Automation documentation at Working with Tasks.

Databricks Agent Connection

If the Agent stops working or loses connection, the Job execution stops but resumes as soon as the Agent is connected again.

Reports

When jobs are executed, Automic Automation generates output files and reports that you can open from the user interface. Combined with the Execution Data, these reports provide comprehensive information about the status and the history of your executions. This information tracks all processes and allows you to control and monitor tasks, ensuring full auditing capability.

Reports are kept in the system until a reorganization run explicitly removes them. A reorganization run can remove reports while keeping the Execution Data. Keeping the execution data is an advantage because reports can consume a large amount of hard drive space, and removing them from your database does not mean losing important historical data.

The following reports provide a comprehensive insight into the executions of your jobs:

  • Report (REP)

    This report provides information about the execution of the commands in the Job JCL. It contains all relevant information about the job execution and its results, whether it succeeds or fails.

  • Agent log (PLOG)

    This report logs all the functions that the Agent has executed for the specific Job. It also contains a summary of the job configuration.

  • Directory

    This report provides the list of job output files that the job has created for this particular execution and that are available for downloading.

To Access the Reports

Do one of the following:

  • From the Process Assembly perspective

    • After you execute a Job, a notification with a link to the report is displayed. Click the link to open the Last Report page.

    • Right-click a job and select Monitoring > Last Report.

    In either case, the most recent report created for the object opens.

  • From the Process Monitoring perspective

    • Right-click a task and select Open Report.

      The report of that particular execution opens. Its RunID is displayed in the title bar to help you identify it.

    • In the list of Executions for a particular task, right-click and select Open Report.
