About Apache Kafka

To configure, work with, and capitalize on the Apache Kafka Agent integration, you need an understanding of how the Apache Kafka platform works and of the terminology it uses. This page provides a brief description of the most important Apache Kafka concepts. It is NOT comprehensive Apache Kafka documentation; it simply introduces the concepts that you need to use Apache Kafka in the Automic Automation context. For detailed information about Apache Kafka, please refer to its product documentation at https://kafka.apache.org/documentation/.

This page includes the following:

  • Apache Kafka In a Nutshell

  • Apache Kafka Basic Terms and Concepts

Apache Kafka In a Nutshell

Apache Kafka is middleware that collects event streams in real time from multiple external sources (databases, sensors, cloud services, software applications and more). It stores, categorizes, groups, prioritizes and processes these event streams, and then routes them to external applications, either in real time or retrospectively. Thus, Apache Kafka acts as a sort of event-triggered scheduler. Clients sit at both ends of the Apache Kafka platform: Producers (also called publishers) on the sending end and Consumers (also called subscribers) on the receiving end.

Apache Kafka Basic Terms and Concepts

This is a list of the most basic Apache Kafka terms and concepts. You need to understand many of them to configure the Automic Automation/Apache Kafka Agent integration jobs properly.

  • Message / Event

    A message (also referred to as an event or event stream) is a unit of data in Apache Kafka. A message consists of the following parts (the Producer sketch after this list shows how they are set in code):

    • The header, which provides metadata on the message.

    • A key, which is optional. With the default partitioner, messages with the same key are written to the same Partition.

    • A value, which is the body of the message.

    • A timestamp, which records when the message was created or appended to the log.

  • Producers/Consumers

    Applications outside Apache Kafka that use it to send or read event streams.

    • Producers generate and queue the event streams in Topics, that is, they publish events. A Producer chooses which Partition within the Topic each event is assigned to.

      A Producer could be an ETL pipeline such as the Data Factory, a Download Job from S3, and so forth.

    • Consumers are subscribed to a certain Topic in Apache Kafka, that is, they read the event streams from Topics. As soon as there is an update on the Topic, they trigger an event on an external application. Consumers only have to specify the Topic and the Broker that they read data from; Apache Kafka automatically takes care of pulling the data from the right Partition. Within each Partition, Consumers read the data in order. Consumers are organized in Consumer Groups. (The Consumer sketch after this list shows a minimal subscribe-and-read loop.)

    • Consumer Groups: any number of Consumers grouped together. Consumer Groups increase the consumption rate. Consumers in the same group can be assigned zero, one or more Partitions (see the definition of Partition below) within a Topic.

      If multiple Consumers in a group are subscribed to the same Topic, Apache Kafka automatically assigns those Consumers to different Partitions within the Topic. This means that no two Consumers in the group receive the same messages, which prevents a message from being consumed multiple times within the group.

  • Kafka Brokers

    Components that handle the data exchange between Producers and Consumers. Topics reside within Brokers.

  • Kafka Cluster

    A group of Kafka Brokers that work together to provide scalability and fault tolerance.

  • Topic

    User-defined categories where event streams are published and stored. Producers write events to Topics and Consumers read events from Topics. A Topic can have zero, one or more Producers that write events to it, and zero, one or more Consumers that read events from it. Events in a Topic can be read as often as needed; they are not deleted after consumption.

  • Partition

    A Topic is split into a number of Partitions. Events are written sequentially to a specific Partition, and Partitions allow the data in a Topic to be distributed and consumed in parallel.

  • Partition Offset

    When written to a Partition, each message gets a sequential ID called the Partition Offset, which is unique within that Partition. Offsets help keep track of the messages that have already been consumed. If a Consumer goes down, the last committed offset indicates from where the Consumer should start consuming again.
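
The following is a minimal Producer sketch in Java using the standard Apache Kafka client library (kafka-clients). It shows how the message parts described above (key, value, header, timestamp) come together when an event is published to a Topic. The broker address (localhost:9092), the Topic name (automic-events), and the key and value shown are placeholder assumptions for illustration, not values taken from the Agent integration.

    import java.util.Properties;

    import org.apache.kafka.clients.producer.KafkaProducer;
    import org.apache.kafka.clients.producer.ProducerRecord;
    import org.apache.kafka.clients.producer.RecordMetadata;

    public class SampleProducer {
        public static void main(String[] args) throws Exception {
            Properties props = new Properties();
            // Broker to connect to (placeholder address).
            props.put("bootstrap.servers", "localhost:9092");
            props.put("key.serializer",
                    "org.apache.kafka.common.serialization.StringSerializer");
            props.put("value.serializer",
                    "org.apache.kafka.common.serialization.StringSerializer");

            try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
                // Key and value: the key is optional; with the default
                // partitioner, messages with the same key always land in
                // the same Partition of the Topic.
                ProducerRecord<String, String> record = new ProducerRecord<>(
                        "automic-events", "order-4711", "{\"status\":\"DONE\"}");

                // Header: optional metadata about the message.
                record.headers().add("source", "etl-pipeline".getBytes());

                // send() is asynchronous; get() waits for the Broker's
                // acknowledgment.
                RecordMetadata meta = producer.send(record).get();

                // The Broker reports the Partition and Partition Offset it
                // assigned, plus the timestamp stored with the message.
                System.out.printf("partition=%d offset=%d timestamp=%d%n",
                        meta.partition(), meta.offset(), meta.timestamp());
            }
        }
    }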

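The counterpart is a minimal Consumer sketch, again using the standard Java client. Note that the Consumer only names the Broker, the Topic and its Consumer Group; Kafka assigns the Partitions, and each record carries the Partition and Partition Offset it was read from. The group ID (automic-consumers) and the other values are placeholder assumptions, as above.

    import java.time.Duration;
    import java.util.Collections;
    import java.util.Properties;

    import org.apache.kafka.clients.consumer.ConsumerRecord;
    import org.apache.kafka.clients.consumer.ConsumerRecords;
    import org.apache.kafka.clients.consumer.KafkaConsumer;

    public class SampleConsumer {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put("bootstrap.servers", "localhost:9092");
            // All Consumers sharing this group.id form one Consumer Group;
            // Kafka splits the Topic's Partitions among them.
            props.put("group.id", "automic-consumers");
            props.put("key.deserializer",
                    "org.apache.kafka.common.serialization.StringDeserializer");
            props.put("value.deserializer",
                    "org.apache.kafka.common.serialization.StringDeserializer");
            // Start from the earliest offset if the group has not committed
            // an offset for a Partition yet.
            props.put("auto.offset.reset", "earliest");

            try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
                // Only the Topic is specified; Kafka assigns the Partitions.
                consumer.subscribe(Collections.singletonList("automic-events"));

                while (true) {
                    ConsumerRecords<String, String> records =
                            consumer.poll(Duration.ofSeconds(1));
                    for (ConsumerRecord<String, String> record : records) {
                        // Each record carries the Partition it came from and
                        // its Partition Offset, which marks the read position.
                        System.out.printf("partition=%d offset=%d key=%s value=%s%n",
                                record.partition(), record.offset(),
                                record.key(), record.value());
                    }
                }
            }
        }
    }

Starting a second instance of this program with the same group.id causes a rebalance: each Consumer in the group is assigned a disjoint subset of the Topic's Partitions, so no message is consumed twice within the group.
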
See also: