Kafka

Apache Kafka is an open-source stream-processing software platform developed by the Apache Software Foundation, written in Scala and Java. The project aims to provide a unified, high-throughput, low-latency platform for handling real-time data feeds.

Available solutions




This template is for Zabbix version: 5.4
Also available for: 5.0

Source: https://git.zabbix.com/projects/ZBX/repos/zabbix/browse/templates/app/kafka_jmx?at=release/5.4

Apache Kafka by JMX

Overview

For Zabbix version: 5.4 and higher
Official JMX Template for Apache Kafka.

This template was tested on:

  • Apache Kafka, version 2.6.0
  • Zabbix, version 5.0, 5.2

Setup

See Zabbix template operation for basic instructions.

Metrics are collected by JMX.

  1. Enable and configure JMX access to Apache Kafka. See documentation for instructions.
  2. Set the user name and password in host macros {$KAFKA.USER} and {$KAFKA.PASSWORD}.
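
As a rough sketch of what step 1 usually amounts to, the Python helpers below assemble a `KAFKA_JMX_OPTS` value with password authentication and the JMX service URL in the default form that Zabbix uses for JMX items. The `-Dcom.sun.management.jmxremote.*` properties are standard JVM options. The host name, port 9999, and file paths are hypothetical placeholders, and the helper names are illustrative only.

```python
def kafka_jmx_opts(password_file: str, access_file: str) -> str:
    """Build a KAFKA_JMX_OPTS value that enables authenticated remote JMX.

    The file paths are supplied by the caller; SSL is left disabled here
    only to keep the sketch short.
    """
    flags = [
        "-Dcom.sun.management.jmxremote",
        "-Dcom.sun.management.jmxremote.authenticate=true",
        f"-Dcom.sun.management.jmxremote.password.file={password_file}",
        f"-Dcom.sun.management.jmxremote.access.file={access_file}",
        "-Dcom.sun.management.jmxremote.ssl=false",
    ]
    return " ".join(flags)


def jmx_service_url(host: str, port: int) -> str:
    """Return the JMX endpoint in the default RMI form used by Zabbix."""
    return f"service:jmx:rmi:///jndi/rmi://{host}:{port}/jmxrmi"


if __name__ == "__main__":
    # Hypothetical values for illustration.
    print("export JMX_PORT=9999")
    print('export KAFKA_JMX_OPTS="%s"' % kafka_jmx_opts(
        "/etc/kafka/jmxremote.password", "/etc/kafka/jmxremote.access"))
    print(jmx_service_url("kafka1.example.com", 9999))
```

The user name and password stored in the JMX password file would then be the values to put into {$KAFKA.USER} and {$KAFKA.PASSWORD}.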

Zabbix configuration

No specific Zabbix configuration is required.

Macros used

Name | Description | Default
{$KAFKA.NET_PROC_AVG_IDLE.MIN.WARN} | The minimum Network processor average idle percent for trigger expression. | 30
{$KAFKA.PASSWORD} | - | zabbix
{$KAFKA.REQUEST_HANDLER_AVG_IDLE.MIN.WARN} | The minimum Request handler average idle percent for trigger expression. | 30
{$KAFKA.TOPIC.MATCHES} | Filter of discoverable topics | .*
{$KAFKA.TOPIC.NOT_MATCHES} | Filter to exclude discovered topics | __consumer_offsets
{$KAFKA.USER} | - | zabbix

Template links

There are no template links in this template.

Discovery rules

Name Description Type Key and additional info
Topic Metrics (write)

-

JMX jmx.discovery[beans,"kafka.server:type=BrokerTopicMetrics,name=MessagesInPerSec,topic=*"]

Filter:

AND

- A: {#JMXTOPIC} MATCHES_REGEX {$KAFKA.TOPIC.MATCHES}

- B: {#JMXTOPIC} NOT_MATCHES_REGEX {$KAFKA.TOPIC.NOT_MATCHES}

Topic Metrics (read)

-

JMX jmx.discovery[beans,"kafka.server:type=BrokerTopicMetrics,name=BytesOutPerSec,topic=*"]

Filter:

AND

- A: {#JMXTOPIC} MATCHES_REGEX {$KAFKA.TOPIC.MATCHES}

- B: {#JMXTOPIC} NOT_MATCHES_REGEX {$KAFKA.TOPIC.NOT_MATCHES}

Topic Metrics (errors)

-

JMX jmx.discovery[beans,"kafka.server:type=BrokerTopicMetrics,name=BytesRejectedPerSec,topic=*"]

Filter:

AND

- A: {#JMXTOPIC} MATCHES_REGEX {$KAFKA.TOPIC.MATCHES}

- B: {#JMXTOPIC} NOT_MATCHES_REGEX {$KAFKA.TOPIC.NOT_MATCHES}
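
The filter logic shared by the three discovery rules can be approximated in Python as follows. This is a sketch only: Zabbix evaluates the conditions with its own regular expression engine, so Python's `re` module is an approximation, and the function name `topic_is_discovered` is hypothetical. The default patterns are taken from the macro table.

```python
import re

# Defaults of {$KAFKA.TOPIC.MATCHES} and {$KAFKA.TOPIC.NOT_MATCHES}.
TOPIC_MATCHES = r".*"
TOPIC_NOT_MATCHES = r"__consumer_offsets"


def topic_is_discovered(topic: str,
                        matches: str = TOPIC_MATCHES,
                        not_matches: str = TOPIC_NOT_MATCHES) -> bool:
    """Return True when the topic passes both filter conditions (A AND B)."""
    condition_a = re.search(matches, topic) is not None      # MATCHES_REGEX
    condition_b = re.search(not_matches, topic) is None      # NOT_MATCHES_REGEX
    return condition_a and condition_b


if __name__ == "__main__":
    for name in ("orders", "payments", "__consumer_offsets"):
        print(name, topic_is_discovered(name))
```

With the defaults, every topic except `__consumer_offsets` is discovered. Overriding {$KAFKA.TOPIC.MATCHES} with a narrower pattern on the host is the usual way to limit the number of per-topic items.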

Items collected

Group Name Description Type Key and additional info
Kafka Kafka: Leader election per second

Number of leader elections per second.

JMX jmx["kafka.controller:type=ControllerStats,name=LeaderElectionRateAndTimeMs","Count"]
Kafka Kafka: Unclean leader election per second

Number of “unclean” elections per second.

JMX jmx["kafka.controller:type=ControllerStats,name=UncleanLeaderElectionsPerSec","Count"]

Preprocessing:

- CHANGE_PER_SECOND

Kafka Kafka: Controller state on broker

A value of 1 indicates that the broker is the controller for the cluster.

JMX jmx["kafka.controller:type=KafkaController,name=ActiveControllerCount","Value"]

Preprocessing:

- DISCARD_UNCHANGED_HEARTBEAT: 1h

Kafka Kafka: Ineligible pending replica deletes

The number of ineligible pending replica deletes.

JMX jmx["kafka.controller:type=KafkaController,name=ReplicasIneligibleToDeleteCount","Value"]
Kafka Kafka: Pending replica deletes

The number of pending replica deletes.

JMX jmx["kafka.controller:type=KafkaController,name=ReplicasToDeleteCount","Value"]
Kafka Kafka: Ineligible pending topic deletes

The number of ineligible pending topic deletes.

JMX jmx["kafka.controller:type=KafkaController,name=TopicsIneligibleToDeleteCount","Value"]
Kafka Kafka: Pending topic deletes

The number of pending topic deletes.

JMX jmx["kafka.controller:type=KafkaController,name=TopicsToDeleteCount","Value"]
Kafka Kafka: Offline log directory count

The number of offline log directories (for example, after a hardware failure).

JMX jmx["kafka.log:type=LogManager,name=OfflineLogDirectoryCount","Value"]
Kafka Kafka: Offline partitions count

Number of partitions that don't have an active leader.

JMX jmx["kafka.controller:type=KafkaController,name=OfflinePartitionsCount","Value"]
Kafka Kafka: Bytes out per second

The rate at which data is fetched and read from the broker by consumers.

JMX jmx["kafka.server:type=BrokerTopicMetrics,name=BytesOutPerSec","Count"]

Preprocessing:

- CHANGE_PER_SECOND

Kafka Kafka: Bytes in per second

The rate at which data sent from producers is consumed by the broker.

JMX jmx["kafka.server:type=BrokerTopicMetrics,name=BytesInPerSec","Count"]

Preprocessing:

- CHANGE_PER_SECOND

Kafka Kafka: Messages in per second

The rate at which individual messages are consumed by the broker.

JMX jmx["kafka.server:type=BrokerTopicMetrics,name=MessagesInPerSec","Count"]

Preprocessing:

- CHANGE_PER_SECOND

Kafka Kafka: Bytes rejected per second

The rate at which bytes are rejected by the broker.

JMX jmx["kafka.server:type=BrokerTopicMetrics,name=BytesRejectedPerSec","Count"]

Preprocessing:

- CHANGE_PER_SECOND

Kafka Kafka: Client fetch request failed per second

Number of client fetch request failures per second.

JMX jmx["kafka.server:type=BrokerTopicMetrics,name=FailedFetchRequestsPerSec","Count"]

Preprocessing:

- CHANGE_PER_SECOND

Kafka Kafka: Produce requests failed per second

Number of failed produce requests per second.

JMX jmx["kafka.server:type=BrokerTopicMetrics,name=FailedProduceRequestsPerSec","Count"]

Preprocessing:

- CHANGE_PER_SECOND

Kafka Kafka: Request handler average idle percent

Indicates the percentage of time that the request handler (IO) threads are not in use.

JMX jmx["kafka.server:type=KafkaRequestHandlerPool,name=RequestHandlerAvgIdlePercent","OneMinuteRate"]

Preprocessing:

- MULTIPLIER: 100

Kafka Kafka: Fetch-Consumer response send time, mean

Average time taken, in milliseconds, to send the response.

JMX jmx["kafka.network:type=RequestMetrics,name=ResponseSendTimeMs,request=FetchConsumer","Mean"]
Kafka Kafka: Fetch-Consumer response send time, p95

The time taken, in milliseconds, to send the response for 95th percentile.

JMX jmx["kafka.network:type=RequestMetrics,name=ResponseSendTimeMs,request=FetchConsumer","95thPercentile"]
Kafka Kafka: Fetch-Consumer response send time, p99

The time taken, in milliseconds, to send the response for 99th percentile.

JMX jmx["kafka.network:type=RequestMetrics,name=ResponseSendTimeMs,request=FetchConsumer","99thPercentile"]
Kafka Kafka: Fetch-Follower response send time, mean

Average time taken, in milliseconds, to send the response.

JMX jmx["kafka.network:type=RequestMetrics,name=ResponseSendTimeMs,request=FetchFollower","Mean"]
Kafka Kafka: Fetch-Follower response send time, p95

The time taken, in milliseconds, to send the response for 95th percentile.

JMX jmx["kafka.network:type=RequestMetrics,name=ResponseSendTimeMs,request=FetchFollower","95thPercentile"]
Kafka Kafka: Fetch-Follower response send time, p99

The time taken, in milliseconds, to send the response for 99th percentile.

JMX jmx["kafka.network:type=RequestMetrics,name=ResponseSendTimeMs,request=FetchFollower","99thPercentile"]
Kafka Kafka: Produce response send time, mean

Average time taken, in milliseconds, to send the response.

JMX jmx["kafka.network:type=RequestMetrics,name=ResponseSendTimeMs,request=Produce","Mean"]
Kafka Kafka: Produce response send time, p95

The time taken, in milliseconds, to send the response for 95th percentile.

JMX jmx["kafka.network:type=RequestMetrics,name=ResponseSendTimeMs,request=Produce","95thPercentile"]
Kafka Kafka: Produce response send time, p99

The time taken, in milliseconds, to send the response for 99th percentile.

JMX jmx["kafka.network:type=RequestMetrics,name=ResponseSendTimeMs,request=Produce","99thPercentile"]
Kafka Kafka: Fetch-Consumer request total time, mean

Average time in ms to serve the Fetch-Consumer request.

JMX jmx["kafka.network:type=RequestMetrics,name=TotalTimeMs,request=FetchConsumer","Mean"]
Kafka Kafka: Fetch-Consumer request total time, p95

Time in ms to serve the Fetch-Consumer request for 95th percentile.

JMX jmx["kafka.network:type=RequestMetrics,name=TotalTimeMs,request=FetchConsumer","95thPercentile"]
Kafka Kafka: Fetch-Consumer request total time, p99

Time in ms to serve the Fetch-Consumer request for 99th percentile.

JMX jmx["kafka.network:type=RequestMetrics,name=TotalTimeMs,request=FetchConsumer","99thPercentile"]
Kafka Kafka: Fetch-Follower request total time, mean

Average time in ms to serve the Fetch-Follower request.

JMX jmx["kafka.network:type=RequestMetrics,name=TotalTimeMs,request=FetchFollower","Mean"]
Kafka Kafka: Fetch-Follower request total time, p95

Time in ms to serve the Fetch-Follower request for 95th percentile.

JMX jmx["kafka.network:type=RequestMetrics,name=TotalTimeMs,request=FetchFollower","95thPercentile"]
Kafka Kafka: Fetch-Follower request total time, p99

Time in ms to serve the Fetch-Follower request for 99th percentile.

JMX jmx["kafka.network:type=RequestMetrics,name=TotalTimeMs,request=FetchFollower","99thPercentile"]
Kafka Kafka: Produce request total time, mean

Average time in ms to serve the Produce request.

JMX jmx["kafka.network:type=RequestMetrics,name=TotalTimeMs,request=Produce","Mean"]
Kafka Kafka: Produce request total time, p95

Time in ms to serve the Produce requests for 95th percentile.

JMX jmx["kafka.network:type=RequestMetrics,name=TotalTimeMs,request=Produce","95thPercentile"]
Kafka Kafka: Produce request total time, p99

Time in ms to serve the Produce requests for 99th percentile.

JMX jmx["kafka.network:type=RequestMetrics,name=TotalTimeMs,request=Produce","99thPercentile"]
Kafka Kafka: UpdateMetadata request total time, mean

Average time for a request to update metadata.

JMX jmx["kafka.network:type=RequestMetrics,name=TotalTimeMs,request=UpdateMetadata","Mean"]
Kafka Kafka: UpdateMetadata request total time, p95

Time for update metadata requests for 95th percentile.

JMX jmx["kafka.network:type=RequestMetrics,name=TotalTimeMs,request=UpdateMetadata","95thPercentile"]
Kafka Kafka: UpdateMetadata request total time, p99

Time for update metadata requests for 99th percentile.

JMX jmx["kafka.network:type=RequestMetrics,name=TotalTimeMs,request=UpdateMetadata","99thPercentile"]
Kafka Kafka: Temporary memory size in bytes (Fetch), max

The maximum of temporary memory used for converting message formats and decompressing messages.

JMX jmx["kafka.network:type=RequestMetrics,name=TemporaryMemoryBytes,request=Fetch","Max"]
Kafka Kafka: Temporary memory size in bytes (Fetch), avg

The average amount of temporary memory used for converting message formats and decompressing messages.

JMX jmx["kafka.network:type=RequestMetrics,name=TemporaryMemoryBytes,request=Fetch","Mean"]
Kafka Kafka: Temporary memory size in bytes (Fetch), min

The minimum of temporary memory used for converting message formats and decompressing messages.

JMX jmx["kafka.network:type=RequestMetrics,name=TemporaryMemoryBytes,request=Fetch","Min"]
Kafka Kafka: Temporary memory size in bytes (Produce), max

The maximum of temporary memory used for converting message formats and decompressing messages.

JMX jmx["kafka.network:type=RequestMetrics,name=TemporaryMemoryBytes,request=Produce","Max"]
Kafka Kafka: Temporary memory size in bytes (Produce), avg

The average amount of temporary memory used for converting message formats and decompressing messages.

JMX jmx["kafka.network:type=RequestMetrics,name=TemporaryMemoryBytes,request=Produce","Mean"]
Kafka Kafka: Temporary memory size in bytes (Produce), min

The minimum of temporary memory used for converting message formats and decompressing messages.

JMX jmx["kafka.network:type=RequestMetrics,name=TemporaryMemoryBytes,request=Produce","Min"]
Kafka Kafka: Network processor average idle percent

The average percentage of time that the network processors are idle.

JMX jmx["kafka.network:type=SocketServer,name=NetworkProcessorAvgIdlePercent","Value"]

Preprocessing:

- MULTIPLIER: 100

Kafka Kafka: Requests in producer purgatory

Number of requests waiting in producer purgatory.

JMX jmx["kafka.server:type=DelayedOperationPurgatory,name=PurgatorySize,delayedOperation=Produce","Value"]
Kafka Kafka: Requests in fetch purgatory

Number of requests waiting in fetch purgatory.

JMX jmx["kafka.server:type=DelayedOperationPurgatory,name=PurgatorySize,delayedOperation=Fetch","Value"]
Kafka Kafka: Replication maximum lag

The maximum lag between the time that messages are received by the leader replica and by the follower replicas.

JMX jmx["kafka.server:type=ReplicaFetcherManager,name=MaxLag,clientId=Replica","Value"]
Kafka Kafka: Under minimum ISR partition count

The number of partitions under the minimum In-Sync Replica (ISR) count.

JMX jmx["kafka.server:type=ReplicaManager,name=UnderMinIsrPartitionCount","Value"]
Kafka Kafka: Under replicated partitions

The number of partitions that have not been fully replicated in the follower replicas (the number of non-reassigning replicas - the number of ISR > 0).

JMX jmx["kafka.server:type=ReplicaManager,name=UnderReplicatedPartitions","Value"]
Kafka Kafka: ISR expands per second

The rate at which the number of ISRs in the broker increases.

JMX jmx["kafka.server:type=ReplicaManager,name=IsrExpandsPerSec","Count"]

Preprocessing:

- CHANGE_PER_SECOND

Kafka Kafka: ISR shrink per second

Rate of replicas leaving the ISR pool.

JMX jmx["kafka.server:type=ReplicaManager,name=IsrShrinksPerSec","Count"]

Preprocessing:

- CHANGE_PER_SECOND

Kafka Kafka: Leader count

The number of replicas for which this broker is the leader.

JMX jmx["kafka.server:type=ReplicaManager,name=LeaderCount","Value"]
Kafka Kafka: Partition count

The number of partitions in the broker.

JMX jmx["kafka.server:type=ReplicaManager,name=PartitionCount","Value"]
Kafka Kafka: Number of reassigning partitions

The number of reassigning leader partitions on a broker.

JMX jmx["kafka.server:type=ReplicaManager,name=ReassigningPartitions","Value"]
Kafka Kafka: Request queue size

The size of the delay queue.

JMX jmx["kafka.server:type=Request","queue-size"]
Kafka Kafka: Version

Current version of the broker.

JMX jmx["kafka.server:type=app-info","version"]

Preprocessing:

- DISCARD_UNCHANGED_HEARTBEAT: 1h

Kafka Kafka: Uptime

Service uptime in seconds.

JMX jmx["kafka.server:type=app-info","start-time-ms"]

Preprocessing:

- JAVASCRIPT: return (Math.floor((Date.now()-Number(value))/1000))

Kafka Kafka: ZooKeeper client request latency

Latency in milliseconds for ZooKeeper requests from broker.

JMX jmx["kafka.server:type=ZooKeeperClientMetrics,name=ZooKeeperRequestLatencyMs","Count"]
Kafka Kafka: ZooKeeper connection status

Connection status of broker's ZooKeeper session.

JMX jmx["kafka.server:type=SessionExpireListener,name=SessionState","Value"]

Preprocessing:

- DISCARD_UNCHANGED_HEARTBEAT: 1h

Kafka Kafka: ZooKeeper disconnect rate

ZooKeeper client disconnects per second.

JMX jmx["kafka.server:type=SessionExpireListener,name=ZooKeeperDisconnectsPerSec","Count"]

Preprocessing:

- CHANGE_PER_SECOND

Kafka Kafka: ZooKeeper session expiration rate

ZooKeeper client session expirations per second.

JMX jmx["kafka.server:type=SessionExpireListener,name=ZooKeeperExpiresPerSec","Count"]

Preprocessing:

- CHANGE_PER_SECOND

Kafka Kafka: ZooKeeper readonly rate

ZooKeeper client read-only connects per second.

JMX jmx["kafka.server:type=SessionExpireListener,name=ZooKeeperReadOnlyConnectsPerSec","Count"]

Preprocessing:

- CHANGE_PER_SECOND

Kafka Kafka: ZooKeeper sync rate

ZooKeeper client sync connects per second.

JMX jmx["kafka.server:type=SessionExpireListener,name=ZooKeeperSyncConnectsPerSec","Count"]

Preprocessing:

- CHANGE_PER_SECOND

Kafka Kafka {#JMXTOPIC}: Messages in per second

The rate at which individual messages are consumed by topic.

JMX jmx["kafka.server:type=BrokerTopicMetrics,name=MessagesInPerSec,topic={#JMXTOPIC}","Count"]

Preprocessing:

- CHANGE_PER_SECOND

Kafka Kafka {#JMXTOPIC}: Bytes in per second

The rate at which data sent from producers is consumed by topic.

JMX jmx["kafka.server:type=BrokerTopicMetrics,name=BytesInPerSec,topic={#JMXTOPIC}","Count"]

Preprocessing:

- CHANGE_PER_SECOND

Kafka Kafka {#JMXTOPIC}: Bytes out per second

The rate at which data is fetched and read from the broker by consumers (by topic).

JMX jmx["kafka.server:type=BrokerTopicMetrics,name=BytesOutPerSec,topic={#JMXTOPIC}","Count"]

Preprocessing:

- CHANGE_PER_SECOND

Kafka Kafka {#JMXTOPIC}: Bytes rejected per second

Rejected bytes rate by topic.

JMX jmx["kafka.server:type=BrokerTopicMetrics,name=BytesRejectedPerSec,topic={#JMXTOPIC}","Count"]

Preprocessing:

- CHANGE_PER_SECOND
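
Many of the items above read a monotonically increasing JMX `Count` attribute and rely on the CHANGE_PER_SECOND preprocessing step to turn it into a rate, while the Uptime item converts `start-time-ms` with a small JavaScript step. The Python sketch below mirrors both calculations for illustration. The function names are hypothetical, and skipping a sample when the counter decreases reflects the documented Zabbix behavior as understood here rather than code taken from the template.

```python
from typing import Optional


def change_per_second(prev_value: float, prev_clock: float,
                      value: float, clock: float) -> Optional[float]:
    """Rate of change between two counter samples.

    Evaluated as (value - prev_value) / (clock - prev_clock). Returns None
    when no rate can be derived, for example after a counter reset.
    """
    if clock <= prev_clock or value < prev_value:
        return None
    return (value - prev_value) / (clock - prev_clock)


def uptime_seconds(start_time_ms: int, now_ms: int) -> int:
    """Python equivalent of Math.floor((Date.now() - Number(value)) / 1000)."""
    return (now_ms - start_time_ms) // 1000


if __name__ == "__main__":
    # A counter that grew by 600 over 60 seconds.
    print(change_per_second(1000, 0, 1600, 60))
    # A broker started 90.5 seconds before "now".
    print(uptime_seconds(1_000_000, 1_090_500))
```

Because the rate is derived from two consecutive samples, the first value collected after a broker restart produces no data point for these items.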

Triggers

Name Description Expression Severity Dependencies and additional info
Kafka: Unclean leader election detected

Unclean leader elections occur when there is no qualified partition leader among Kafka brokers. If Kafka is configured to allow an unclean leader election, a leader is chosen from the out-of-sync replicas, and any messages that were not synced prior to the loss of the former leader are lost forever. Essentially, unclean leader elections sacrifice consistency for availability.

{TEMPLATE_NAME:jmx["kafka.controller:type=ControllerStats,name=UncleanLeaderElectionsPerSec","Count"].last()}>0 AVERAGE
Kafka: There are offline log directories

The offline log directory count metric indicates the number of log directories that are offline (for example, due to a hardware failure), so the broker can no longer store incoming messages there.

{TEMPLATE_NAME:jmx["kafka.log:type=LogManager,name=OfflineLogDirectoryCount","Value"].last()} > 0 WARNING
Kafka: One or more partitions have no leader

Any partition without an active leader will be completely inaccessible, and both consumers and producers of that partition will be blocked until a leader becomes available.

{TEMPLATE_NAME:jmx["kafka.controller:type=KafkaController,name=OfflinePartitionsCount","Value"].last()} > 0 WARNING
Kafka: Request handler average idle percent is too low (under {$KAFKA.REQUEST_HANDLER_AVG_IDLE.MIN.WARN} for 15m)

The request handler idle ratio metric indicates the percentage of time the request handlers are not in use. The lower this number, the more loaded the broker is.

{TEMPLATE_NAME:jmx["kafka.server:type=KafkaRequestHandlerPool,name=RequestHandlerAvgIdlePercent","OneMinuteRate"].max(15m)}<{$KAFKA.REQUEST_HANDLER_AVG_IDLE.MIN.WARN} AVERAGE
Kafka: Network processor average idle percent is too low (under {$KAFKA.NET_PROC_AVG_IDLE.MIN.WARN} for 15m)

The network processor idle ratio metric indicates the percentage of time the network processors are not in use. The lower this number, the more loaded the broker is.

{TEMPLATE_NAME:jmx["kafka.network:type=SocketServer,name=NetworkProcessorAvgIdlePercent","Value"].max(15m)}<{$KAFKA.NET_PROC_AVG_IDLE.MIN.WARN} AVERAGE
Kafka: Failed to fetch info data (or no data for 15m)

Zabbix has not received data for items for the last 15 minutes.

{TEMPLATE_NAME:jmx["kafka.network:type=SocketServer,name=NetworkProcessorAvgIdlePercent","Value"].nodata(15m)}=1 WARNING
Kafka: There are partitions under the min ISR

The Under min ISR partitions metric displays the number of partitions where the number of In-Sync Replicas (ISR) is less than the configured minimum. The two most common causes of under-min ISR partitions are that one or more brokers are unresponsive, or that the cluster is experiencing performance issues and one or more brokers are falling behind.

{TEMPLATE_NAME:jmx["kafka.server:type=ReplicaManager,name=UnderMinIsrPartitionCount","Value"].last()}>0 AVERAGE
Kafka: There are under replicated partitions

The Under replicated partitions metric displays the number of partitions that do not have enough replicas to meet the desired replication factor. A partition is also considered under-replicated if the correct number of replicas exists, but one or more of the replicas have fallen significantly behind the partition leader. The two most common causes of under-replicated partitions are that one or more brokers are unresponsive, or that the cluster is experiencing performance issues and one or more brokers have fallen behind.

{TEMPLATE_NAME:jmx["kafka.server:type=ReplicaManager,name=UnderReplicatedPartitions","Value"].last()}>0 AVERAGE
Kafka: Version has changed (new version: {ITEM.VALUE})

Kafka version has changed. Ack to close.

{TEMPLATE_NAME:jmx["kafka.server:type=app-info","version"].diff()}=1 and {TEMPLATE_NAME:jmx["kafka.server:type=app-info","version"].strlen()}>0 INFO

Manual close: YES

Kafka: has been restarted (uptime < 10m)

Uptime is less than 10 minutes.

{TEMPLATE_NAME:jmx["kafka.server:type=app-info","start-time-ms"].last()}<10m INFO

Manual close: YES

Kafka: Broker is not connected to ZooKeeper

-

{TEMPLATE_NAME:jmx["kafka.server:type=SessionExpireListener,name=SessionState","Value"].regexp("CONNECTED")}=0 AVERAGE
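
The two idle-percent triggers compare `max(15m)` with the threshold, which means they fire only when every sample in the window is below it. A minimal Python sketch of that condition follows. The function name is hypothetical, the default of 30 comes from the macro table, and returning False for an empty window is an assumption of this sketch rather than a description of how Zabbix treats missing data.

```python
from typing import Sequence


def idle_too_low(samples: Sequence[float], threshold: float = 30.0) -> bool:
    """Mimic `max(15m) < threshold` over the samples of the last 15 minutes.

    `samples` holds idle-percent values (0-100) from the evaluation window.
    """
    if not samples:
        # Assumption: with no data there is nothing to compare.
        return False
    return max(samples) < threshold


if __name__ == "__main__":
    print(idle_too_low([12.0, 20.5, 25.0]))   # all samples under 30
    print(idle_too_low([12.0, 45.0, 20.0]))   # one sample recovered above 30
```

Using the maximum rather than the last value keeps a single short dip in idle time from raising a problem; a brief recovery above the threshold within the window is enough to keep the trigger quiet.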

Feedback

Please report any issues with the template at https://support.zabbix.com

You can also provide feedback, discuss the template, or ask for help with it on the Zabbix forums.
