Analytics - Sizing Requirements

Analytics Backend and Analytics Datastore

Important! Database systems and database storage must be fail-safe and redundant. This topic is beyond the scope of this section.

| Modules | Large Configuration (No. x CPU / Memory / Disk) | High End Configuration (No. x CPU / Memory / Disk) |
| --- | --- | --- |
| Analytics & Datastore | 1 x 32 Cores / 256 GB / 2 TB | 1 x 32 Cores / 256+ GB / 4 TB |

| Number of | Large Configuration | High End Configuration |
| --- | --- | --- |
| Concurrent users | < 200 | > 200 |
| Agents | < 1 000 | > 1 000 |
| Object definitions | < 100 000 | > 100 000 |
| Total Executions per day | < 1 500 000 | > 1 500 000 |

| Process | Large Configuration | High End Configuration |
| --- | --- | --- |
| WP | 2 x 16 | 4 x 16 |
| DWP* | 2 x 45 | 4 x 45 |
| JWP* | 2 x 10 | 4 x 10 |
| CP | 2 x 2 | 4 x 4 |
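The sizing thresholds above can be expressed as a small helper that decides which configuration tier applies. This is an illustrative sketch; the function name and return labels are assumptions, not part of the product:

```python
def sizing_tier(concurrent_users: int, agents: int,
                object_definitions: int, executions_per_day: int) -> str:
    """Return the configuration tier based on the sizing table above.

    If any metric exceeds the Large Configuration limits, the
    High End Configuration applies.
    """
    exceeds_large = (concurrent_users > 200
                     or agents > 1_000
                     or object_definitions > 100_000
                     or executions_per_day > 1_500_000)
    return "High End Configuration" if exceeds_large else "Large Configuration"

# A system with 150 users, 800 agents, 90,000 objects, 1M executions/day
print(sizing_tier(150, 800, 90_000, 1_000_000))  # Large Configuration
```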

Sizing and Storage Recommendations

Note: For medium-sized and larger installations, we recommend setting up a regular backup-and-truncate process for the Analytics Datastore. To keep chart performance stable, back up and truncate regularly, keeping only the last 3-12 months of data in the Datastore.

For further information, see: Analytics Datastore Delete Action

Setup Recommendations

  • The UI plug-in is always added to one or more hosts where the AWI is installed.
  • The Datastore and the Backend should both be installed on a dedicated host.
  • The Backend must be accessible via HTTP or HTTPS from the AWI host. The Backend must be able to connect to the Datastore and to all required databases (AE, ARA).

What disk space is required?

One GB for every hundred thousand executions in the Automation Engine.

Must I back up the Datastore?

The Analytics Datastore was designed to store large amounts of data. To save space, remove data older than 1 year from the Analytics Datastore. You can use the backup actions in the ANALYTICS ACTION PACK.
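The rule of thumb above (roughly 1 GB per 100,000 executions) can be sketched as a quick estimate. The function name and the retention handling are illustrative assumptions; actual usage also depends on object types and indexes:

```python
def datastore_disk_gb(executions_per_day: int, retention_days: int) -> float:
    """Estimate Analytics Datastore disk usage: ~1 GB per 100,000 executions.

    Illustrative sketch only, based on the sizing rule in this section.
    """
    total_executions = executions_per_day * retention_days
    return total_executions / 100_000

# Example: 1,500,000 executions/day retained for 90 days
print(datastore_disk_gb(1_500_000, 90))  # 1350.0 GB
```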

General Database Rules

The following information is valid for all database vendors. Log files must be placed on the fastest available disks (for example, SSDs):

  • Oracle: REDO LOG file destination
  • SQL Server: TRANSACTION LOG and TEMPDB files
  • LOG and DATA files must always be on separate disks/LUNs

Maximizing Efficiency with the Analytics Datastore

We recommend installing PostgreSQL 11 for large and high end configurations. This version lets you benefit from the parallel query feature.

To enable parallel queries, set two parameters before PostgreSQL is started:

  • max_worker_processes = 8
    The default value is 8. Set this value according to the number of cores the database administrator allocates to the PostgreSQL database.
  • max_parallel_workers_per_gather = 7
    Set this value to max_worker_processes minus one.

You can configure these parameters in the Customized Options section of the postgresql.conf file. Its default location includes the installed major version, for example:

Windows: C:\Program Files\PostgreSQL\10\data\postgresql.conf

Linux: /etc/postgresql/10/main/postgresql.conf

Example:

On a 32-core host running PostgreSQL where 4 cores are reserved for the Backend:

  • max_worker_processes = 28
  • max_parallel_workers_per_gather = 27
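The arithmetic behind this example can be sketched as follows; the function name is an illustrative assumption:

```python
def parallel_query_settings(total_cores: int, reserved_cores: int) -> dict:
    """Derive the postgresql.conf parallel-query settings from core counts.

    Rule described above: max_worker_processes = cores allocated to
    PostgreSQL; max_parallel_workers_per_gather = that value minus one.
    """
    workers = total_cores - reserved_cores
    return {
        "max_worker_processes": workers,
        "max_parallel_workers_per_gather": workers - 1,
    }

# 32-core host, 4 cores reserved for the Backend
print(parallel_query_settings(32, 4))
# {'max_worker_processes': 28, 'max_parallel_workers_per_gather': 27}
```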

Analytics Rule Engine

Important! Message queue systems and database storage must always be fail-safe and redundant. This topic is beyond the scope of this section.

Sizing and Storage Recommendations

  • IA Agent Nodes

    • See the existing recommendations for the Analytics Backend, previously mentioned.
    • On a single box: 16 cores for a small configuration and 32 cores for a medium-sized configuration.
    • Add 8-16 GB RAM to the existing memory recommendations.
  • Streaming Platform Nodes

    • 1 x 4 cores
    • 16 GB RAM
    • Disk: expected event size * expected events per second * retention period in seconds * replication factor / number of brokers.
      For example: 80 bytes * 30,000 events per second * 86,400 seconds (= 1 day) of retention * 1 (no replication) / 1 (one broker) ≈ 210 GB. A single 80-byte raw event results in around 3 KB of disk usage in the Streaming Platform.
    • The disk buffer cache lives in memory, so each broker needs sufficient RAM. The RAM requirement depends on how often the Streaming Platform flushes: the more flushes, the lower the throughput.
    • A single broker can host only a single replica per partition; hence, the number of brokers must be at least the number of replicas.
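The disk formula above can be sketched as a quick calculation; the function name is an illustrative assumption:

```python
def broker_disk_gb(event_size_bytes: int, events_per_sec: int,
                   retention_sec: int, replication_factor: int,
                   num_brokers: int) -> float:
    """Per-broker disk needed to hold the Streaming Platform retention window."""
    total_bytes = (event_size_bytes * events_per_sec * retention_sec
                   * replication_factor) / num_brokers
    return total_bytes / 10**9

# 80-byte events, 30,000 events/s, 1 day retention, no replication, one broker
print(round(broker_disk_gb(80, 30_000, 86_400, 1, 1)))  # 207 (i.e. ~210 GB)
```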

  • Rule Engine Nodes

    • 1 x 8 cores
    • 32 GB RAM
    • Disk: 32 GB
    • Memory is critical: with insufficient memory, the Rule Engine starts spilling to disk, which decreases throughput.

    Other Factors

    • To increase throughput by a factor of 5-10 (depending on the batch size), run the Rule Engine, the Automation Engine processes, and the Streaming Platform on separate machines.
    • Maximum throughput on a single box is reached at 1000 concurrent users; beyond that, backpressure occurs.
    • Throughput scales with batch size.
    • A Streaming Platform logs.dir size of 22.9 GB for ~67 m events corresponds to ~3 KB per event.

    Note: Event ingestion on a single-box installation is limited to approximately 2,500 events per second. The ingestion rate can be improved by distributing services, selecting a higher batch size, or using more than one IA Agent.

Analytics Streaming Platform

Important! Streaming Platform systems and database storage must always be fail-safe and redundant. This topic is beyond the scope of this section.

| Modules | Large Configuration (No. x CPU / Memory / Disk) | High End Configuration (No. x CPU / Memory / Disk) |
| --- | --- | --- |
| Streaming Platform | 1 x 32 Cores / 256 GB / 2 TB | 1 x 32 Cores / 256+ GB / 4 TB |

| Number of | Large Configuration | High End Configuration |
| --- | --- | --- |
| Concurrent users | < 200 | > 200 |
| Agents | < 1 000 | > 1 000 |
| Object definitions | < 100 000 | > 100 000 |
| Total Executions per day | < 1 500 000 | > 1 500 000 |