Google Cloud Dataproc Jobs: Submit Job to Cluster
Submit Job to Cluster jobs allow you to submit Spark jobs to a cluster in your Google Cloud Dataproc environment directly from Automic Automation.
This page includes the following:

- Defining Google Cloud Dataproc Submit Job Properties
- Defining the JSON

Defining Google Cloud Dataproc Submit Job Properties
On the Submit Job page, you define the parameters relevant to submitting the job to the cluster on Dataproc:
-
Connection
Select the Dataproc Connection object containing the relevant information to connect to the application.
To search for a Connection object, start typing its name to limit the list of the objects that match your input.
-
Project ID
Enter the ID of the GCP project that contains the relevant cluster.
-
Location
Enter the location (region) in which your data is stored and processed. For example: us-west1
-
Cluster Name
Allows you to type in or select the name of the cluster to which you want to submit the job.
Alternatively, you can click the browse button to the right of the field to open a picker dialog and select the relevant cluster.
Tip: Submit the job to a running cluster; otherwise, your job will fail.
-
Operation Type
Enter the type of the operation you want to run:
-
Submit
This submits the job to the cluster directly, which is especially recommended for short-running jobs.
-
SubmitAsOperation
This approach assigns an operation ID to the job, which is recommended for managing long-running jobs on the cluster because the ID makes the job easier to track if required.
-
Parameters
(Optional) Allows you to pass arguments created at runtime to the cluster in JSON format.
Select one of the options available:
-
JSON
Use the JSON field to enter the JSON payload definition.
Important! There are many options available to define the JSON payload. For more information and examples of the JSON definition, see Defining the JSON.
-
JSON File Path
Use the JSON File Path field to define the path to the JSON file containing the attributes that you want to pass to the application. Make sure that the file is available on the Agent machine (host).
Note: The Pre-Process page allows you to define the settings of the Job using script statements. These statements are processed before the Dataproc Job is executed. For more information, see Google Cloud Dataproc Jobs: Setting Job Properties Through Scripts.
Defining the JSON
This section provides examples of how you can define the JSON field of a Submit Job. You have several options available.
Simple JSON Definition
The first option is a simple payload definition. In this case, make sure that you include the parameters required to define the cluster, such as the name, operation type, location, and so on.
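For example, a simple payload could look similar to the following sketch. The attribute names used here (cluster_name, operation_type, location, spark_job, and so on) are illustrative assumptions only, not a documented schema; adapt them to the structure that your Job actually requires:

{
  "cluster_name": "my-dataproc-cluster",
  "operation_type": "Submit",
  "location": "us-west1",
  "spark_job": {
    "main_class": "org.apache.spark.examples.SparkPi",
    "jar_file_uris": ["file:///usr/lib/spark/examples/jars/spark-examples.jar"],
    "args": ["1000"]
  }
}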
Using Variables
You can also use variables in the payload definition.
Example
In the JSON field, enter the following:
&SUBMITPARA#
If the variable is not defined yet, you must define it now on the Variables page of the Submit Job definition.
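For example, you could assign the complete payload to &SUBMITPARA# as its value. The following value is only a sketch that reuses the illustrative attribute names from the previous example:

{ "cluster_name": "my-dataproc-cluster", "operation_type": "Submit", "location": "us-west1" }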
When you execute the Job, the variable is replaced with the value that you have just defined. This is visible in the Agent log (PLOG); see Monitoring Dataproc Jobs.
Google Cloud Dataproc Submit Job in a Workflow
You can also use the JSON field if you want to include a Submit Job in a Workflow and you want to use Automation Engine variables in it.
Example
In the Workflow, a Script object (SCRI) with the variable definitions relevant for the cluster parameters and the Location ID precedes your Submit Job.
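For example, the Script object could publish the values using :PSET statements similar to the following sketch. The variable names and values are examples only and must match the variables that you reference in the Submit Job definition:

! Example only: publish the cluster parameters to the Workflow
:PSET &CLUSTERNAME# = "my-dataproc-cluster"
:PSET &LOCATION# = "us-west1"
:PSET &OPERATIONTYPE# = "Submit"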
In the Submit Job definition, you include those variables.
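For example, the JSON field could reference the variables as follows (again, the attribute names are illustrative only):

{
  "cluster_name": "&CLUSTERNAME#",
  "operation_type": "&OPERATIONTYPE#",
  "location": "&LOCATION#"
}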
When the Job is executed, the variables are replaced with the values that you have just defined. This is visible in the Agent log (PLOG); see Monitoring Dataproc Jobs.