Google Cloud Dataproc Jobs: Submit Job to Cluster

{"URL":["/*.*/awa/pa_view_pa_view_SUBMIT_JOB_dataproc"],"heroDescriptionIdentifier":"ice_hero_Dataproc_Submit","customCards":[{"id":"ice_specific_Dataproc_Submit","title":"Submit Job to Cluster Job Parameters","type":"customize","url":"https://docs.automic.com/documentation/webhelp/english/ALL/components/IG_GCP_DATAPROC/*.*/Agent%20Guide/Content/GCP_Dataproc/GCP_Dataproc_SubmitJob.htm","languages":["en-us"]},{"id":"ice_RA_Integration_Report","title":"Defining RA / Integration Reports","type":"customize","url":"https://docs.automic.com/documentation/webhelp/english/ALL/components/IG_GCP_DATAPROC/*.*/Agent%20Guide/Content/GCP_Dataproc/GCP_Dataproc_Jobs_RA_Properties.htm","languages":["en-us"]},{"id":"ice_script_Dataproc_Job","title":"Setting Job Parameters through Scripts","type":"customize","url":"https://docs.automic.com/documentation/webhelp/english/ALL/components/IG_GCP_DATAPROC/*.*/Agent%20Guide/Content/GCP_Dataproc/GCP_Dataproc_Script.htm","languages":["en-us"]},{"id":"ice_related_information_Dataproc_Submit","title":"Related Information","type":"customize","url":"https://docs.automic.com/documentation/webhelp/english/ALL/components/IG_GCP_DATAPROC/*.*/Agent%20Guide/Content/GCP_Dataproc/GCP_Dataproc_SubmitJob.htm","languages":["en-us"]}]}

Submit Job to Cluster jobs allow you to submit Spark jobs from Automic Automation to a cluster in your Google Cloud Dataproc environment.

This page includes the following:

  • Defining Google Cloud Dataproc Submit Job Properties
  • Defining the JSON
  • Google Cloud Dataproc Submit Job in a Workflow

Defining Google Cloud Dataproc Submit Job Properties

On the Submit Job page, you define the parameters required to submit the job to a cluster on Dataproc:

  • Connection

    Select the Dataproc Connection object containing the relevant information to connect to the application.

    To search for a Connection object, start typing its name to limit the list of the objects that match your input.

  • Project ID

    Enter the ID of the GCP project that contains the relevant cluster.

  • Location

    Enter the location (region) in which your data is stored and processed. For example: us-west1

  • Cluster Name

    Allows you to type in the name of the cluster to which you want to submit the job.

    Alternatively, you can click the browse button to the right of the field to open a picker dialog and select the relevant cluster.

    Tip:

    Submit the job to a running cluster; otherwise, the job fails.

  • Operation Type

    Enter the type of the operation you want to run:

    • Submit

      Submits the job to the cluster directly. This option is recommended especially for short-running jobs.

    • SubmitAsOperation

      Assigns an operation ID to the job. This option is recommended for managing long-running jobs on the cluster, as the operation ID makes it easier to track the job if required.

  • Parameters

    (Optional) Allows you to pass arguments created at runtime to the cluster in JSON format.

    Select one of the options available:

    • JSON

      Use the JSON field to enter the JSON payload definition.

      Important!

      There are many options available to define the JSON payload. For more information and examples of the JSON definition, see Defining the JSON.

    • JSON File Path

      Use the JSON File Path field to define the path to the JSON file containing the attributes that you want to pass to the application. Make sure that the file is available on the Agent machine (host).

    Note:

    The Pre-Process page allows you to define the settings of the Job using script statements. These statements are processed before the Dataproc Job is executed. For more information, see Google Cloud Dataproc Jobs: Setting Job Properties Through Scripts.

Defining the JSON

This section gives you examples of the different options available to define the JSON field of a Submit Job.

Simple JSON Definition

The first option to define the JSON field is a simple payload definition. Make sure that you include the parameters required for the job, such as the cluster name, the operation type, the location, and so on.
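For example, a simple payload could look like the following sketch. It assumes that the JSON field accepts a job resource in the format of the Dataproc jobs.submit REST API; the cluster name, main class, and JAR file URI are placeholder values that you must replace with your own:

{
  "job": {
    "placement": {
      "clusterName": "my-cluster"
    },
    "sparkJob": {
      "mainClass": "org.apache.spark.examples.SparkPi",
      "jarFileUris": ["file:///usr/lib/spark/examples/jars/spark-examples.jar"],
      "args": ["1000"]
    }
  }
}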

Using Variables

You can also use variables in the payload definition.

Example

In the JSON field, enter the following:

&SUBMITPARA#

If the variable is not defined yet, you must define it on the Variables page of the Submit Job definition.
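For example, you could create a variable named SUBMITPARA# on the Variables page and assign it the complete payload as its value. The payload below is a sketch only and assumes the same Dataproc job resource format as in the example above:

SUBMITPARA# = {"job": {"placement": {"clusterName": "my-cluster"}, "sparkJob": {"mainClass": "org.apache.spark.examples.SparkPi", "jarFileUris": ["file:///usr/lib/spark/examples/jars/spark-examples.jar"]}}}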


When you execute the Job, the variable is replaced with the value you have defined. You can check this in the Agent log (PLOG). For more information, see Monitoring Dataproc Jobs.

Google Cloud Dataproc Submit Job in a Workflow

You can also use the JSON field if you want to include a Submit Job in a Workflow and use Automation Engine variables in it.

Example

In the Workflow, a Script object (SCRI) with the variable definitions relevant for the cluster parameters and the Location ID precedes your Submit Job:

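For example, the Script object could publish the relevant values with :PSET so that they are passed on to the Workflow and can be resolved by the subsequent Submit Job. This is a sketch only; the variable names and values are placeholders:

! Publish placeholder values to the Workflow for the subsequent Submit Job
:PSET &CLUSTER# = "my-cluster"
:PSET &LOCATION# = "us-west1"
:PSET &MAINCLASS# = "org.apache.spark.examples.SparkPi"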

In the Submit Job definition, you include those variables:

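For example, the JSON field of the Submit Job could reference the variables published by the Script object. This sketch uses the placeholder variable names from the example above; the &LOCATION# variable could likewise be used in the Location field of the job definition:

{
  "job": {
    "placement": {
      "clusterName": "&CLUSTER#"
    },
    "sparkJob": {
      "mainClass": "&MAINCLASS#",
      "jarFileUris": ["file:///usr/lib/spark/examples/jars/spark-examples.jar"]
    }
  }
}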

When the Job is executed, the variables are replaced with the values defined in the Script object. You can check this in the Agent log (PLOG). For more information, see Monitoring Dataproc Jobs.
