Universal Language Support (Unicode)

Automic Automation supports Unicode with UTF-8, a variable-width multi-byte encoding (1-4 bytes per character).
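
The variable width can be illustrated with a short Python sketch. It is not part of Automic Automation; the sample characters and the helper name utf8_length are chosen only for this illustration.

```python
# One sample character per UTF-8 sequence length (illustrative choice).
samples = [
    "A",           # U+0041, basic ASCII
    "\u00f6",      # U+00F6, o with diaeresis
    "\u20ac",      # U+20AC, euro sign
    "\U0001F600",  # U+1F600, an emoji outside the Basic Multilingual Plane
]


def utf8_length(char: str) -> int:
    """Return the number of bytes needed to encode one character in UTF-8."""
    return len(char.encode("utf-8"))


for char in samples:
    print(f"U+{ord(char):04X} needs {utf8_length(char)} byte(s) in UTF-8")
```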

The Unicode implementation is mandatory and impacts the whole Automic Automation system. It allows the system to perform character conversion where necessary, whether to transfer files, execute Jobs, load data and so forth. There are only a few components that do not support Unicode:

  • Service Manager

  • Call APIs

  • Archive Browser Utility

  • AE DB Client Copy utility, which can only be used when both Clients are Unicode-compliant

Important!

ZDU is not supported for a first upgrade to an Automic Automation version that supports Unicode / UTF-8.

There are significant DB schema and AE message changes that cannot be supported in compatibility or rollback mode. Furthermore, the AE database UTF-8 migration process requires downtime so that all data is migrated correctly. This step is necessary to avoid an encoding misalignment that would result in corrupted data.

This page outlines the functions implemented for Unicode support in detail:

AE and Analytics Database, AE Database Schema

Automic Automation requires you to migrate the data in your existing (source) DB to a new (destination) database configured for UTF-8 before you can start the regular upgrade process to a version that supports Unicode. There are different options you can use to migrate your database to UTF-8. For more information and detailed instructions on how to migrate your database, see Migrating the AE DB to UTF-8.

Analytics supports only PostgreSQL databases. If your Analytics database already supports UTF-8 as recommended in previous versions, there is no need to change its configuration. For more information, see Installing Analytics Manually and Upgrading the Analytics Backend/IA Agent.

The supported AE databases must be set up and configured for UTF-8. This is an overview of the relevant configuration parameters per database:

  • DB2

    Connection string parameter: UTF-8

    Database configuration setting: CODESET=UTF-8.

    Minimum database version: IBM UTF-8 as of version 7.2; we recommend version 10.5 to 11.5

    For details, see Preparing the AE Database - DB2

  • MSSQL

    Connection string parameter: AutoTranslate=NO

    Database configuration setting: Collation Latin1_General_100_CI_AS_SC_UTF8.

    Minimum database version: MSSQL Server 2019

    ODBC Driver as of version 17; we recommend minimum version 18

    For details, see Preparing the AE Database - MS SQL

    Note: Make sure that the source database version is MSSQL Server 2017 or higher before migrating it. The Migration Action Pack does not support versions lower than 2017. The destination database has to be on version 2019 or higher to support UTF-8.

  • ORACLE

    Connection string parameter: CODESET=AL32UTF8

    Database configuration setting: NLS_CHARACTERSET=AL32UTF8

    For details, see Preparing the AE Database - Oracle

  • PostgreSQL

    Connection string parameter: client_encoding=UTF-8

    Database configuration setting: ENCODING=UTF8

    For details, see Preparing the AE Database - PostgreSQL

Specific fields of the AE DB schema were extended to store multi-byte UTF-8 data and to fulfill customer requests for longer values. For example, file names and paths can now be up to 1024 bytes long, and Archive Keys 1 and 2 up to 512. When you load initial data into an existing AE database for the first time, its schema is also updated. Because many text fields are now larger, this loading action might take longer than usual.
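
Because these limits are expressed in bytes, the number of characters that fit depends on the characters used. The following Python sketch illustrates this under the assumption that the limit applies to the UTF-8 encoded value; the names MAX_PATH_BYTES and fits_in_field are hypothetical and not part of the product.

```python
MAX_PATH_BYTES = 1024  # byte limit for file names and paths, as stated above


def fits_in_field(value: str, max_bytes: int) -> bool:
    """Check whether a value fits into a field whose size is limited in bytes.

    Assumption: the limit is applied to the UTF-8 encoded form of the value.
    """
    return len(value.encode("utf-8")) <= max_bytes


ascii_path = "a" * 1024        # 1024 characters, 1024 bytes
umlaut_path = "\u00f6" * 1024  # 1024 characters, 2048 bytes

print(fits_in_field(ascii_path, MAX_PATH_BYTES))   # True
print(fits_in_field(umlaut_path, MAX_PATH_BYTES))  # False
```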

For more information, see the DB Schema documentation under Core Components > Automation Engine > DB Schema at https://docs.automic.com/documentation.

AE Server Processes

At startup, the server processes that read data from or write data to the AE DB check whether the AE database and the connection strings are configured correctly for UTF-8 encoding. If they are not, the server processes abort and write a corresponding error message to the log file. This check prevents data from being corrupted or interpreted incorrectly. For more information, see Server Processes.

AE Configuration

The XML_ENCODING key of the XML Parameters of the UC_SYSTEM_SETTINGS has been redefined to represent the encoding used in previous versions. Whenever an encoding conversion is required, the encoding defined in the XML_ENCODING key is used as the source (legacy) encoding. For more information, see XML Parameters.

The system reads INI files as UTF-8 encoded; therefore, there is no need to adapt the INI files. However, if the content of an INI file cannot be read as UTF-8, it is interpreted as in previous versions. Supporting Unicode allows you, for example, to use paths using Unicode characters for the temp folder definition. For more information, see Configuration (INI) Files.
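
The fallback described above can be pictured with the following Python sketch. It only illustrates the general "try UTF-8 first, then the legacy encoding" pattern and does not show how the Automation Engine implements it. The function name decode_ini_bytes and the default legacy encoding ISO-8859-1 are assumptions made for this example.

```python
def decode_ini_bytes(raw: bytes, legacy_encoding: str = "iso-8859-1") -> str:
    """Decode INI content as UTF-8 and fall back to a legacy encoding.

    Assumption: legacy_encoding stands in for whatever encoding a previous
    version used; ISO-8859-1 is only a placeholder.
    """
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        # The content is not valid UTF-8, so interpret it the legacy way.
        return raw.decode(legacy_encoding)


utf8_ini = "temp=C:\\tmp\\\u00fcbung".encode("utf-8")
legacy_ini = "temp=C:\\tmp\\\u00fcbung".encode("iso-8859-1")

# Both byte sequences decode to the same text.
print(decode_ini_bytes(utf8_ini) == decode_ini_bytes(legacy_ini))
```

Note that pure ASCII content is valid UTF-8 as well, which is why such files need no adaptation in this model.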

Note: A code page set in JOBS, JOBF, or script functions overrules the INI file settings.

Technical Interfaces

The AE REST API and the Java API both support UTF-8. For more information, see AE REST API - General Info and AE.ApplicationInterface.

The CallAPI does not support UTF-8. The AE handles it as a non-UTF-8 component. For example, the clear text value of a password is always handled in legacy encoding.

AE Utilities

Some utilities support UTF-8 while others do not. Here is an overview of the utilities and how they handle UTF-8 support, restrictions, and limitations in this context.

Important! The installation path and command line parameters of the utilities only support ASCII characters: basic (0-127) and extended (128-255).

  • AE.DB Archive (ucybdbar, ucybdbarg)

    This utility exports and writes files in UTF-8 instead of ASCII. Additionally, encoding information is written to the export files to define the encoding of the data written into the files. With the help of this encoding information, you can determine if a file's content is in legacy or UTF-8 encoding for correct handling.

  • AE.DB Archive Browser (ucybarbr)

    The Archive Browser shows information about archived data records. It does not support UTF-8 and can neither display multi-byte characters nor search for them.

  • AE.DB Client Copy (ucybdbcc)

    This utility supports UTF-8 encoded data only if both Clients use the same encoding. If the Clients use different encodings, the utility aborts with an error message to prevent data corruption.

  • AE.DB Change (ucybchng)

    This utility modifies data that has been exported from the database by using the Transport Case. AE.DB Change supports UTF-8, but you have to adapt existing replacement text files if required (replacing ASCII content with UTF-8 content).

  • AE.DB Load - Import Transport Case (ucybdbld)

    This utility supports the conversion from legacy encoding to UTF-8. When you use the AE.DB Load utility to load a Transport Case unloaded from a previous version that uses an encoding other than UTF-8, you can pass the encoding that is used to convert the data to UTF-8 through the command line.

  • AE.DB Unload - Export Transport Case (ucybdbun)

    This utility allows you to unload data from the AE database. AE.DB Unload writes data with UTF-8 encoding into the transport case file.

  • AE.DB Reorg (ucybdbre)

    No UTF-8-specific changes were made.

  • AE.DB Revision Report (ucybdbrr)

    No UTF-8-specific changes were made.

For more information, see Utilities and Utilities Reference.

Agents

Important! Automic Automation version 24 kicks off an OS agent refresh initiative to have all OS agents use a common Java-based framework. The first v24 release includes new agents for x64 Linux and Windows. Other OS agents will be released with future v24 updates and announced in the What's New section of the documentation.

When communicating with these new Agents, the Automation Engine sends messages using UTF-8. However, the Automation Engine is downwards compatible with Agents of previous versions. In this case, the AE sends messages using the original encoding (not UTF-8) to guarantee compatibility.

When a UTF-8-encoded text that contains characters that cannot be represented in any single-byte encoding is sent to an Agent of a previous version, these characters are replaced by question marks.
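
The effect can be reproduced with any single-byte encoding. The Python sketch below uses ISO-8859-1 as a stand-in for the encoding of an older Agent, which is an assumption for this example; the function name to_legacy_bytes is hypothetical.

```python
def to_legacy_bytes(text: str, legacy_encoding: str = "iso-8859-1") -> bytes:
    """Encode text in a single-byte encoding, replacing unmappable characters.

    errors="replace" substitutes a question mark for every character that
    the target encoding cannot represent.
    """
    return text.encode(legacy_encoding, errors="replace")


# German umlauts exist in ISO-8859-1, the two CJK characters do not.
message = "Gr\u00fc\u00dfe \u65e5\u672c"
legacy_bytes = to_legacy_bytes(message)

print(legacy_bytes.decode("iso-8859-1"))
```

In this sketch the umlauts survive the conversion, while each of the two CJK characters arrives as a question mark.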

Consider the following issues when working with the new Windows and x64 Linux Agents:

  • You can use UTF-8 when defining file paths and names.

  • UTF-8 is the default code page definition for job executions for these agents.

  • If you set a code page in the Agent's INI file, make sure that you use the same pattern for standard code pages and CODE objects. However, while standard code pages must be defined using angle brackets, for example, <UTF-8>, CODE objects must not include them.

For more information, see Configuration (INI) Files.

Objects

You can observe the following behavior related to Unicode/UTF-8:

Code Table Objects and Standard Character Conversion Sets

Automic Automation provides Code Table objects and standard character conversion sets to perform a character conversion when the server where the Agent runs uses a different character encoding than the Automation Engine server. They are also used with File Transfer objects if the source and destination Agents use different character sets, and with Jobs.

Job Objects and Output Scan

You can perform an Output-Scan for Jobs (JOBS) by applying a FILTER object, which defines the text keywords to be used for the scan. If the code page of the AE differs from the one that the agent uses, the target Code Table defined on the Attributes page of the Job object definition is used to convert the filter text keywords. The keywords converted to the target code page are then used for the Output Scan on the Agent side.
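
The reason for converting the keywords first can be shown with a simplified Python sketch. It is not the actual Output Scan implementation; the function names, the sample output, and the use of Windows-1252 as the Agent's code page are assumptions for this illustration.

```python
def convert_keywords(keywords: list, target_code_page: str) -> dict:
    """Map each keyword's bytes in the target code page to its original text."""
    return {keyword.encode(target_code_page): keyword for keyword in keywords}


def scan_output(raw_output: bytes, converted: dict) -> list:
    """Return the keywords whose converted bytes occur in the raw job output."""
    return [keyword for encoded, keyword in converted.items() if encoded in raw_output]


# Hypothetical job output as written by an Agent that uses Windows-1252.
agent_output = "Status: \u00dcberpr\u00fcfung fehlgeschlagen".encode("cp1252")
keywords = ["\u00dcberpr\u00fcfung", "ERROR"]

hits = scan_output(agent_output, convert_keywords(keywords, "cp1252"))
print(hits)

# Without conversion, the UTF-8 bytes of the keyword do not match the output.
print("\u00dcberpr\u00fcfung".encode("utf-8") in agent_output)
```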

For more information, see Output Pages and Filter (FILTER).

Script Editors

Script objects, Pre Process/Process/Post Process pages and the Script Editor are some of the AWI components that consist of or contain a text editor. You can enter UTF-8 characters in text editors now. If you do so, you can observe the following behavior, which is usual with state-of-the-art text editors in the context of UTF-8:

If you place the cursor within a UTF-8 character (neither at its beginning nor at its end) and then enter another character, the original character is split into two parts that are no longer valid UTF-8 characters.
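
The following Python sketch shows why such a split produces invalid data. It models the cursor position as a byte offset, which is an assumption made for this illustration and not a description of how the AWI editors work internally.

```python
def insert_at_byte(text: str, byte_offset: int, inserted: str) -> bytes:
    """Insert a character at a byte offset of the UTF-8 encoded text."""
    raw = text.encode("utf-8")
    return raw[:byte_offset] + inserted.encode("utf-8") + raw[byte_offset:]


def is_valid_utf8(raw: bytes) -> bool:
    """Return True if the bytes form a valid UTF-8 sequence."""
    try:
        raw.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


euro = "\u20ac"  # the euro sign occupies three bytes in UTF-8

print(is_valid_utf8(insert_at_byte(euro, 0, "x")))  # before the character
print(is_valid_utf8(insert_at_byte(euro, 1, "x")))  # inside the character
print(is_valid_utf8(insert_at_byte(euro, 3, "x")))  # after the character
```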

Input Fields

AWI contains many text input fields where you can now enter UTF-8 characters. The layout and size of these fields have not been modified. Some characters that are now supported require more space. This means that you can enter more UTF-8 characters in a text input field than it can display. In that case, a string might not be displayed at its full length. However, AWI provides resizing mechanisms to help you in these situations.

XML Import/Export

You can import XML objects from previous versions using an encoding other than UTF-8. The data is read in the original encoding, translated and stored in the AE database using UTF-8 encoding. For more information, see Importing/Exporting Objects.
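
As a rough analogy, the following Python sketch reads an XML document that declares a legacy encoding and writes it out again as UTF-8. The element name job and the attribute title are hypothetical and do not reflect the real export format; the sketch only uses the Python standard library and says nothing about the internal import logic.

```python
import xml.etree.ElementTree as ET

# Hypothetical legacy export: declared and encoded as ISO-8859-1.
legacy_xml = (
    '<?xml version="1.0" encoding="ISO-8859-1"?>'
    '<job title="Pr\u00fcfung"/>'
).encode("iso-8859-1")


def reencode_to_utf8(xml_bytes: bytes) -> bytes:
    """Parse XML in its declared encoding and serialize it as UTF-8."""
    root = ET.fromstring(xml_bytes)  # the parser honors the declared encoding
    return ET.tostring(root, encoding="utf-8", xml_declaration=True)


utf8_xml = reencode_to_utf8(legacy_xml)

print(b"\xfc" in legacy_xml)    # single-byte u with diaeresis in the source
print(b"\xc3\xbc" in utf8_xml)  # two-byte UTF-8 sequence in the result
```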

Automation Engine scripting language

All functions in the Automation Engine scripting language support UTF-8 characters.

  • HEX_2_STRING

    This script function allows you to convert strings in hexadecimal notation to UTF-8.

    Example:

    4175746F6D6963 converted to UTF-8 would result in the following string: Automic.

    This script function is especially useful for non-printable characters such as CR (carriage return), LF (line feed), and TAB (tabulator), or for any combination of printable and non-printable characters.

    For more information, see HEX_2_STRING.

  • UC_CRLF and STR_LENGTH, STR_LNG

    The UC_CRLF() script function inserts a line break to denote the start of a new line. If you use it in the context of STR_LENGTH(), it is counted as one character (previous versions counted it as two characters).

    For detailed examples, see UC_CRLF and STR_LENGTH, STR_LNG.

  • GET_ATT_SUBSTR and STR_CUT, MID, SUBSTR

    These script functions are used to process strings. In the context of Unicode/UTF-8, this is how they behave:

    • Start/Begin parameters refer to the character position and not the byte position. For example, a value of 7 specifies the 7th character, and not necessarily the 7th byte.
    • Length parameters refer to the number of characters and not the number of bytes. For example, a value of 10 specifies ten characters, which can occupy ten or more bytes.

    For more information, see GET_ATT_SUBSTR and STR_CUT, MID, SUBSTR.

  • WRITE_PROCESS and PREP_PROCESS_FILE

    These script functions support the standard character conversion sets (see Standard Character Encoding Sets).

    For more information, see WRITE_PROCESS and PREP_PROCESS_FILE.

  • Script Functions for XML Elements

    All XML script functions support UTF-8. They can process XML with UTF-8 encoded tag names, attribute names, and all their values.

    For more information, see Script Functions for XML Elements.

  • Variable Names

    Variable names can contain the paragraph character (§). The § is part of the extended ASCII character set (like "ö" or "â") and takes up two bytes in UTF-8. Using this character in a variable name therefore reduces the available bytes by two instead of one.

    For more information, see Variable Names.
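
Some of the behavior listed above can be mirrored in Python for illustration. The sketch is not Automation Engine script and does not call any AE function. The helper names hex_to_string and substr are hypothetical stand-ins that only imitate the hexadecimal conversion, the character-based start and length parameters, and the two-byte size of the paragraph character.

```python
def hex_to_string(hex_text: str) -> str:
    """Convert hexadecimal notation to text by decoding the bytes as UTF-8."""
    return bytes.fromhex(hex_text).decode("utf-8")


def substr(text: str, start: int, length: int) -> str:
    """Cut a substring using a 1-based character position and a character count."""
    return text[start - 1:start - 1 + length]


# Hexadecimal conversion, including non-printable characters (CR and LF).
print(hex_to_string("4175746F6D6963"))
print(repr(hex_to_string("0D0A")))

# Start and length count characters, not bytes.
text = "\u00e4\u00f6\u00fc-Automic"  # three umlauts, a hyphen, then "Automic"
print(len(text), len(text.encode("utf-8")))  # character count and byte count
print(substr(text, 5, 7))                    # the 5th character is "A"

# The paragraph character occupies two bytes in UTF-8.
print(len("\u00a7".encode("utf-8")))
```

In this example the string has 11 characters but 14 bytes, so a byte-based start position of 5 would fall inside the third umlaut instead of on the letter "A".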