Adding/Editing/Deleting Airflow Schedulers
This topic explains how to add, configure, view, edit and delete a scheduler for the following cloud providers:
-
Apache Airflow
-
Google Cloud Composer
-
Amazon MWAA
The Airflow scheduler contains the parameters (endpoint, login data and so on) that AAI requires to authenticate with and connect to the cloud environment and to start collecting definitional and run data.
You can add Airflow schedulers before installing the Airflow Connector. However, you will not be able to use them until the Connector is up and running. For more information, see Connectors for Universal Schedulers and Setting Up the Airflow Connector.
To configure an Airflow scheduler object, you need the cloud provider authentication data (credentials, endpoints, tokens and so forth) that allows AAI to log in to the target cloud solution. You can get this data from the team in your organization that is responsible for maintaining the target cloud solution.
Adding an Airflow Scheduler
Configuring an Airflow scheduler involves defining the following:
-
Scheduler-specific parameters: Name, type, connector and time zone.
-
The endpoint that identifies the cloud provider where Airflow is installed and accessible.
-
Authentication data to connect to the selected cloud provider.
-
Intervals, periods and times that AAI will apply to retrieve data.
To Add an Airflow Scheduler
-
Go to System and click the Add Scheduler button.
-
Enter a descriptive Name that the people who will work with the data from this scheduler can easily recognize. The name must be unique in your system.
-
In Scheduler Type, select Airflow. The dialog to configure the scheduler is displayed.
-
Specify the following parameters:
-
Connector
If there is an Airflow connector already registered, it is listed in the Connector dropdown list.
-
Default Time Zone
Time zone in which the Airflow instance is located. For example, if an AAI installation in New York monitors an Airflow system in Vienna, the default time zone is Vienna's.
-
Provider
Cloud provider that runs the DAGs that you want to monitor. The Airflow scheduler can collect data from the following providers:
-
Apache Airflow
-
Google Cloud Composer
-
Amazon MWAA
-
-
Endpoint
URL that identifies the cloud environment (Apache Airflow, Google Cloud Composer) from which you want AAI to collect the definitional and run data. For example:
http://<hostname>:<port>/api/v1
-
Authentication Type
Depending on the type of cloud provider that you want to extract the data from, you have different authentication types:
For Apache Airflow
Specify the method with which you authenticate to Apache Airflow.
-
Basic
Enter the Username and Password of the user with privileges to access the Apache Airflow environment.
-
OAuth2
Specify the following:
-
OAuth2 Endpoint
URL that clients call to request the token.
-
Client ID
Identifier as defined in the OAuth2 system.
-
Client Secret
Encrypted secret value as defined in the OAuth2 system.
-
Scope
Range of applications that the token will grant access to.
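Before entering the Apache Airflow values in the dialog, it can help to check them outside AAI. The following is a minimal sketch, not AAI's implementation: it only builds the two HTTP requests that the Basic and OAuth2 options imply and sends nothing. The function names are hypothetical, the `/dags` path is the DAG listing of the Airflow stable REST API, and the use of the OAuth2 client credentials grant is an assumption based on the fields listed above.

```python
import base64
import urllib.parse
import urllib.request


def basic_auth_request(endpoint: str, username: str, password: str) -> urllib.request.Request:
    """Build (but do not send) a GET request for <endpoint>/dags with Basic auth."""
    token = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
    url = endpoint.rstrip("/") + "/dags"
    return urllib.request.Request(url, headers={"Authorization": f"Basic {token}"})


def oauth2_token_request(token_endpoint: str, client_id: str,
                         client_secret: str, scope: str) -> urllib.request.Request:
    """Build (but do not send) a token request.

    Assumption: the OAuth2 system accepts the client credentials grant
    with the client ID and secret passed as form fields.
    """
    form = urllib.parse.urlencode({
        "grant_type": "client_credentials",
        "client_id": client_id,
        "client_secret": client_secret,
        "scope": scope,
    }).encode("ascii")
    # Supplying a body makes urllib issue a POST.
    return urllib.request.Request(
        token_endpoint,
        data=form,
        headers={"Content-Type": "application/x-www-form-urlencoded"},
    )
```

To actually run the check, pass either request to `urllib.request.urlopen`. An HTTP 200 response to the first request suggests that the Endpoint, Username and Password are usable.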
-
For Google Cloud Composer
Specify the method with which you authenticate to the Google Cloud Composer environment.
-
Service Account Key
Path to the JSON file that contains the authentication information. Click the Browse button to search for it and upload it.
When a service account is created in Google Cloud, a key can be generated for it, and this key is provided as the JSON file. Contact the team in your organization that is responsible for maintaining the Google Cloud Composer environment; they can provide you with the data that you need.
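If you want to confirm that the file you received is a service account key before uploading it, a quick structural check is possible. The sketch below is illustrative only. The function name and the choice of which fields to require are assumptions. It does not contact Google or prove that the key is still active.

```python
import json

# Fields that a Google service account key file normally contains.
# Treating exactly these as mandatory is an assumption of this sketch.
REQUIRED_FIELDS = ("type", "project_id", "private_key", "client_email", "token_uri")


def check_service_account_key(text: str) -> list[str]:
    """Return a list of problems found in the key file content (empty if none)."""
    try:
        data = json.loads(text)
    except json.JSONDecodeError as exc:
        return [f"not valid JSON: {exc.msg}"]
    if not isinstance(data, dict):
        return ["top-level JSON value is not an object"]

    problems = [f"missing field: {name}" for name in REQUIRED_FIELDS if not data.get(name)]
    key_type = data.get("type")
    if key_type and key_type != "service_account":
        problems.append(f"unexpected type: {key_type}")
    return problems
```

Read the file with `open(path, encoding="utf-8").read()` and pass the text to the function. An empty list means that the file has the expected shape.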
For Amazon MWAA
Specify the method with which you authenticate to the AWS environment.
-
Secret Access Key
Private unique identifier of user accounts that is used to sign requests to AWS services. It is displayed to the user only during its creation. It consists of the following:
-
Region
Region in the AWS account where Airflow resides. For example:
us-east-1
-
Authentication Endpoint
URL that identifies the environment where the Airflow instance resides. It contains the environment name and the AWS region. The Authentication Endpoint is necessary to authenticate to the environment and to create the CLI token.
For example:
https://env.airflow.us-east-1.amazonaws.com/clitoken/MyAirflowEnvironment
-
Access Key
-
Secret Access Key
-
-
AWS Credentials File Path
-
Region
Region in the AWS account where Airflow resides. For example:
us-east-1
-
Authentication Endpoint
URL that identifies the environment where the Airflow instance resides. It contains the environment name and the AWS region. The Authentication Endpoint is necessary to authenticate to the environment and to create the CLI token.
For example:
https://env.airflow.us-east-1.amazonaws.com/clitoken/MyAirflowEnvironment
-
Profile Name
AWS profiles store multiple AWS credentials. Enter the name of the profile that contains the credentials for your AWS system.
-
Credentials File Path
Location of the AWS credentials file. For example:
Windows: C:\Users\user\Documents\AWS\credentials
UNIX: /home/user/aws/credentials
-
-
EC2 Profile Instance
Allows you to connect to an EC2 VM within an AWS cloud application. To use this option, you must have an EC2 instance profile. For instructions on how to set it up, please refer to the official AWS documentation.
-
Region
Region in the AWS account where Airflow resides. For example:
us-east-1
-
Authentication Endpoint
URL that identifies the environment where the Airflow instance resides. It contains the environment name and the AWS region. The Authentication Endpoint is necessary to authenticate to the environment and to create the CLI token.
For example:
https://env.airflow.us-east-1.amazonaws.com/clitoken/MyAirflowEnvironment
-
Profile Instance Name
Name of the profile available on the VM.
-
-
External Provider
-
Region
Region in the AWS account where Airflow resides. For example:
us-east-1
-
Authentication Endpoint
URL that identifies the environment where the Airflow instance resides. It contains the environment name and the AWS region. The Authentication Endpoint is necessary to authenticate to the environment and to create the CLI token.
For example:
https://env.airflow.us-east-1.amazonaws.com/clitoken/MyAirflowEnvironment
-
Tenant ID
Identifier of the AWS tenant.
-
Authentication URL
URL that identifies the network address of the external authentication provider used to secure the application.
-
SAML Username
Username used for SAML authentication.
-
SAML Password
Password for the user used for SAML authentication.
-
Principal ARN
Amazon Resource Name (ARN) of the account's principal.
-
Role ARN
Amazon Resource Name (ARN) of the role to be assumed by the user.
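Two of the Amazon MWAA values above follow a fixed pattern and can be checked before you enter them. The sketch below is a hedged illustration with hypothetical function names. The first function composes the Authentication Endpoint from the Region and the environment name, following the example URL shown above. The second checks that a profile in an AWS credentials file (INI format) contains both key entries. It does not validate the keys against AWS.

```python
import configparser


def mwaa_cli_token_endpoint(region: str, environment_name: str) -> str:
    """Compose the Authentication Endpoint from the pattern in the example above."""
    return f"https://env.airflow.{region}.amazonaws.com/clitoken/{environment_name}"


def profile_has_credentials(credentials_text: str, profile_name: str) -> bool:
    """Return True if the named profile defines both AWS key entries."""
    # interpolation=None keeps '%' characters in values from being interpreted.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(credentials_text)
    if not parser.has_section(profile_name):
        return False
    section = parser[profile_name]
    return all(section.get(key) for key in ("aws_access_key_id", "aws_secret_access_key"))
```

For a file on disk, read its content first (for example from the path you enter in Credentials File Path) and pass the text together with the Profile Name.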
-
Regardless of the Authentication Type that you have selected, if you are using a proxy for the connection, specify the following:
-
Proxy Host Name
Host name or IP address of the proxy server to which you want to connect.
-
Port
Port used by the proxy server.
-
Proxy Username
User name used to authenticate to the proxy server.
-
Proxy Password
AES-encrypted password that is associated with the proxy user name.
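To confirm that the proxy values work from the machine in question, you can route a test request through the proxy outside AAI. This is a minimal sketch under stated assumptions: the function names are hypothetical, the proxy is assumed to be a plain HTTP proxy, and the password is passed in clear text for the test only (it is unrelated to how AAI stores the encrypted value).

```python
import urllib.parse
import urllib.request


def proxy_url(host, port, username=None, password=None):
    """Compose a proxy URL, percent-encoding the credentials if present."""
    auth = ""
    if username:
        auth = urllib.parse.quote(username, safe="")
        if password:
            auth += ":" + urllib.parse.quote(password, safe="")
        auth += "@"
    return f"http://{auth}{host}:{port}"


def opener_via_proxy(url):
    """Return an opener that sends HTTP and HTTPS requests through the proxy."""
    handler = urllib.request.ProxyHandler({"http": url, "https": url})
    return urllib.request.build_opener(handler)
```

Calling `opener_via_proxy(proxy_url(host, port, user, password)).open(endpoint)` then sends a request to the Endpoint through the proxy.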
-
-
Job Definition Refresh Times
Times at which AAI will refresh the job definitions. You can select multiple times.
Note: When setting this parameter, take into account that refreshing the job definitions can be a time-consuming task. The job definitions are read from the underlying workload automation engine and compared to the existing ones in AAI. Choose the refresh times accordingly.
Typically, in a production environment, job definitions are relatively static so once or twice a day is sufficient.
-
Debug Log Airflow REST API Payloads
Activate this option to write the REST requests and responses from the Airflow environment to the log file in JSON format.
-
Projected Start Time Periods
How far forward the connector should look for planned start times.
-
Event Polling Interval
Interval at which AAI polls this specific scheduler for events.
-
-
Click Save Changes. The new scheduler is added to the list of already connected systems. Initially, its status is In Progress until the scheduler is completely connected to AAI.
Viewing the Scheduler
Click an existing scheduler in the list. The dialog that is displayed shows the scheduler configuration data as you entered it when you added the scheduler. For details about each field, see Adding an Airflow Scheduler.
Updating Definitional Data
When you define a new scheduler, you specify the times at which the job definitions should be refreshed in the Job Definition Refresh Times field. You can also reload the job definitions manually at any time. Do the following:
-
In the Schedulers list, click the name of the scheduler.
-
On the resulting dialog, click the Edit button to make the fields editable.
-
In the Monitoring Details tab, click the Update Now button that is under Job Definitions Updated.
The time at which this data was last updated is displayed between the label and the button, so you can tell how old the data that AAI is surfacing is.
Editing the Scheduler
-
In the Schedulers list, click the name of the scheduler.
-
On the resulting dialog, click the Edit button to make the fields editable.
-
This dialog provides the scheduler configuration data in two tabs, Monitoring Details and Connection Configuration, as you entered it when you added the scheduler. Make your changes. For details about each field, see Adding an Airflow Scheduler.
Deleting a Scheduler
-
In the Schedulers list, select the scheduler that you want to delete.
-
Click Delete on the dialog that is displayed.