Source: https://git.zabbix.com/projects/ZBX/repos/zabbix/browse/templates/app/kafka_jmx?at=release/6.2
Apache Kafka by JMX
Overview
For Zabbix version: 6.2 and higher
Official JMX Template for Apache Kafka.
This template was tested on:
- Apache Kafka, version 2.6.0
Setup
See Zabbix template operation for basic instructions.
Metrics are collected by JMX.
- Enable and configure JMX access to Apache Kafka. See documentation for instructions.
- Set the user name and password in host macros {$KAFKA.USER} and {$KAFKA.PASSWORD}.
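As a rough illustration of the first step, the sketch below assembles the standard JVM system properties for remote JMX with password authentication into a string that could be exported as `KAFKA_JMX_OPTS`, which Kafka's start scripts read. The port, the file paths, and the helper name `build_kafka_jmx_opts` are assumptions made for this example and are not prescribed by the template; consult the Kafka and JVM documentation before using anything like this in production.

```python
def build_kafka_jmx_opts(port, password_file, access_file, use_ssl=False):
    """Return JVM flags that expose JMX remotely with password authentication.

    All argument values are illustrative; the template does not prescribe them.
    """
    flags = [
        "-Dcom.sun.management.jmxremote",
        f"-Dcom.sun.management.jmxremote.port={port}",
        f"-Dcom.sun.management.jmxremote.rmi.port={port}",
        "-Dcom.sun.management.jmxremote.authenticate=true",
        f"-Dcom.sun.management.jmxremote.password.file={password_file}",
        f"-Dcom.sun.management.jmxremote.access.file={access_file}",
        f"-Dcom.sun.management.jmxremote.ssl={'true' if use_ssl else 'false'}",
    ]
    return " ".join(flags)


if __name__ == "__main__":
    # Hypothetical port and paths, chosen only for the example.
    opts = build_kafka_jmx_opts(
        9999, "/etc/kafka/jmxremote.password", "/etc/kafka/jmxremote.access"
    )
    print(f'export KAFKA_JMX_OPTS="{opts}"')
```

The user name and password stored in the JMX password file would then be the values to put into {$KAFKA.USER} and {$KAFKA.PASSWORD}.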
Zabbix configuration
No specific Zabbix configuration is required.
Macros used
Name | Description | Default |
---|---|---|
{$KAFKA.NET_PROC_AVG_IDLE.MIN.WARN} | The minimum Network processor average idle percent for trigger expression. | 30 |
{$KAFKA.PASSWORD} | - | zabbix |
{$KAFKA.REQUEST_HANDLER_AVG_IDLE.MIN.WARN} | The minimum Request handler average idle percent for trigger expression. | 30 |
{$KAFKA.TOPIC.MATCHES} | Filter of discoverable topics | .* |
{$KAFKA.TOPIC.NOT_MATCHES} | Filter to exclude discovered topics | __consumer_offsets |
{$KAFKA.USER} | - | zabbix |
Template links
There are no template links in this template.
Discovery rules
Name | Description | Type | Key and additional info |
---|---|---|---|
Topic Metrics (errors) | - | JMX | jmx.discovery[beans,"kafka.server:type=BrokerTopicMetrics,name=BytesRejectedPerSec,topic=*"] Filter: AND - {#JMXTOPIC} MATCHES_REGEX - {#JMXTOPIC} NOT_MATCHES_REGEX |
Topic Metrics (read) | - | JMX | jmx.discovery[beans,"kafka.server:type=BrokerTopicMetrics,name=BytesOutPerSec,topic=*"] Filter: AND - {#JMXTOPIC} MATCHES_REGEX - {#JMXTOPIC} NOT_MATCHES_REGEX |
Topic Metrics (write) | - | JMX | jmx.discovery[beans,"kafka.server:type=BrokerTopicMetrics,name=MessagesInPerSec,topic=*"] Filter: AND - {#JMXTOPIC} MATCHES_REGEX - {#JMXTOPIC} NOT_MATCHES_REGEX |
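The AND filter on {#JMXTOPIC} can be pictured with a small sketch. It assumes that the two conditions are compared against {$KAFKA.TOPIC.MATCHES} and {$KAFKA.TOPIC.NOT_MATCHES} (the regex operands are not shown in the table above) and uses the default macro values. Python's `re` module stands in for the regex engine Zabbix uses, and `topic_is_discovered` is a hypothetical helper name.

```python
import re

# Default macro values from the "Macros used" table.
TOPIC_MATCHES = r".*"
TOPIC_NOT_MATCHES = r"__consumer_offsets"


def topic_is_discovered(topic, matches=TOPIC_MATCHES, not_matches=TOPIC_NOT_MATCHES):
    """Both conditions must hold (AND): the topic matches one regex and not the other."""
    return (
        re.search(matches, topic) is not None
        and re.search(not_matches, topic) is None
    )


# Hypothetical topic names, used only to show the effect of the defaults.
topics = ["orders", "payments", "__consumer_offsets"]
print([t for t in topics if topic_is_discovered(t)])  # ['orders', 'payments']
```

With the defaults, every topic is discovered except the internal `__consumer_offsets` topic; narrowing {$KAFKA.TOPIC.MATCHES} on a host would limit discovery further.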
Items collected
Group | Name | Description | Type | Key and additional info |
---|---|---|---|---|
Kafka | Kafka: Leader election per second | Number of leader elections per second. | JMX | jmx["kafka.controller:type=ControllerStats,name=LeaderElectionRateAndTimeMs","Count"] |
Kafka | Kafka: Unclean leader election per second | Number of “unclean” elections per second. | JMX | jmx["kafka.controller:type=ControllerStats,name=UncleanLeaderElectionsPerSec","Count"] Preprocessing: - CHANGE_PER_SECOND |
Kafka | Kafka: Controller state on broker | One indicates that the broker is the controller for the cluster. | JMX | jmx["kafka.controller:type=KafkaController,name=ActiveControllerCount","Value"] Preprocessing: - DISCARD_UNCHANGED_HEARTBEAT: |
Kafka | Kafka: Ineligible pending replica deletes | The number of ineligible pending replica deletes. | JMX | jmx["kafka.controller:type=KafkaController,name=ReplicasIneligibleToDeleteCount","Value"] |
Kafka | Kafka: Pending replica deletes | The number of pending replica deletes. | JMX | jmx["kafka.controller:type=KafkaController,name=ReplicasToDeleteCount","Value"] |
Kafka | Kafka: Ineligible pending topic deletes | The number of ineligible pending topic deletes. | JMX | jmx["kafka.controller:type=KafkaController,name=TopicsIneligibleToDeleteCount","Value"] |
Kafka | Kafka: Pending topic deletes | The number of pending topic deletes. | JMX | jmx["kafka.controller:type=KafkaController,name=TopicsToDeleteCount","Value"] |
Kafka | Kafka: Offline log directory count | The number of offline log directories (for example, after a hardware failure). | JMX | jmx["kafka.log:type=LogManager,name=OfflineLogDirectoryCount","Value"] |
Kafka | Kafka: Offline partitions count | Number of partitions that don't have an active leader. | JMX | jmx["kafka.controller:type=KafkaController,name=OfflinePartitionsCount","Value"] |
Kafka | Kafka: Bytes out per second | The rate at which data is fetched and read from the broker by consumers. | JMX | jmx["kafka.server:type=BrokerTopicMetrics,name=BytesOutPerSec","Count"] Preprocessing: - CHANGE_PER_SECOND |
Kafka | Kafka: Bytes in per second | The rate at which data sent from producers is consumed by the broker. | JMX | jmx["kafka.server:type=BrokerTopicMetrics,name=BytesInPerSec","Count"] Preprocessing: - CHANGE_PER_SECOND |
Kafka | Kafka: Messages in per second | The rate at which individual messages are consumed by the broker. | JMX | jmx["kafka.server:type=BrokerTopicMetrics,name=MessagesInPerSec","Count"] Preprocessing: - CHANGE_PER_SECOND |
Kafka | Kafka: Bytes rejected per second | The rate at which bytes are rejected by the broker. | JMX | jmx["kafka.server:type=BrokerTopicMetrics,name=BytesRejectedPerSec","Count"] Preprocessing: - CHANGE_PER_SECOND |
Kafka | Kafka: Client fetch request failed per second | Number of client fetch request failures per second. | JMX | jmx["kafka.server:type=BrokerTopicMetrics,name=FailedFetchRequestsPerSec","Count"] Preprocessing: - CHANGE_PER_SECOND |
Kafka | Kafka: Produce requests failed per second | Number of failed produce requests per second. | JMX | jmx["kafka.server:type=BrokerTopicMetrics,name=FailedProduceRequestsPerSec","Count"] Preprocessing: - CHANGE_PER_SECOND |
Kafka | Kafka: Request handler average idle percent | Indicates the percentage of time that the request handler (IO) threads are not in use. | JMX | jmx["kafka.server:type=KafkaRequestHandlerPool,name=RequestHandlerAvgIdlePercent","OneMinuteRate"] Preprocessing: - MULTIPLIER: |
Kafka | Kafka: Fetch-Consumer response send time, mean | Average time taken, in milliseconds, to send the response. | JMX | jmx["kafka.network:type=RequestMetrics,name=ResponseSendTimeMs,request=FetchConsumer","Mean"] |
Kafka | Kafka: Fetch-Consumer response send time, p95 | The time taken, in milliseconds, to send the response for 95th percentile. | JMX | jmx["kafka.network:type=RequestMetrics,name=ResponseSendTimeMs,request=FetchConsumer","95thPercentile"] |
Kafka | Kafka: Fetch-Consumer response send time, p99 | The time taken, in milliseconds, to send the response for 99th percentile. | JMX | jmx["kafka.network:type=RequestMetrics,name=ResponseSendTimeMs,request=FetchConsumer","99thPercentile"] |
Kafka | Kafka: Fetch-Follower response send time, mean | Average time taken, in milliseconds, to send the response. | JMX | jmx["kafka.network:type=RequestMetrics,name=ResponseSendTimeMs,request=FetchFollower","Mean"] |
Kafka | Kafka: Fetch-Follower response send time, p95 | The time taken, in milliseconds, to send the response for 95th percentile. | JMX | jmx["kafka.network:type=RequestMetrics,name=ResponseSendTimeMs,request=FetchFollower","95thPercentile"] |
Kafka | Kafka: Fetch-Follower response send time, p99 | The time taken, in milliseconds, to send the response for 99th percentile. | JMX | jmx["kafka.network:type=RequestMetrics,name=ResponseSendTimeMs,request=FetchFollower","99thPercentile"] |
Kafka | Kafka: Produce response send time, mean | Average time taken, in milliseconds, to send the response. | JMX | jmx["kafka.network:type=RequestMetrics,name=ResponseSendTimeMs,request=Produce","Mean"] |
Kafka | Kafka: Produce response send time, p95 | The time taken, in milliseconds, to send the response for 95th percentile. | JMX | jmx["kafka.network:type=RequestMetrics,name=ResponseSendTimeMs,request=Produce","95thPercentile"] |
Kafka | Kafka: Produce response send time, p99 | The time taken, in milliseconds, to send the response for 99th percentile. | JMX | jmx["kafka.network:type=RequestMetrics,name=ResponseSendTimeMs,request=Produce","99thPercentile"] |
Kafka | Kafka: Fetch-Consumer request total time, mean | Average time in ms to serve the Fetch-Consumer request. | JMX | jmx["kafka.network:type=RequestMetrics,name=TotalTimeMs,request=FetchConsumer","Mean"] |
Kafka | Kafka: Fetch-Consumer request total time, p95 | Time in ms to serve the Fetch-Consumer request for 95th percentile. | JMX | jmx["kafka.network:type=RequestMetrics,name=TotalTimeMs,request=FetchConsumer","95thPercentile"] |
Kafka | Kafka: Fetch-Consumer request total time, p99 | Time in ms to serve the Fetch-Consumer request for 99th percentile. | JMX | jmx["kafka.network:type=RequestMetrics,name=TotalTimeMs,request=FetchConsumer","99thPercentile"] |
Kafka | Kafka: Fetch-Follower request total time, mean | Average time in ms to serve the Fetch-Follower request. | JMX | jmx["kafka.network:type=RequestMetrics,name=TotalTimeMs,request=FetchFollower","Mean"] |
Kafka | Kafka: Fetch-Follower request total time, p95 | Time in ms to serve the Fetch-Follower request for 95th percentile. | JMX | jmx["kafka.network:type=RequestMetrics,name=TotalTimeMs,request=FetchFollower","95thPercentile"] |
Kafka | Kafka: Fetch-Follower request total time, p99 | Time in ms to serve the Fetch-Follower request for 99th percentile. | JMX | jmx["kafka.network:type=RequestMetrics,name=TotalTimeMs,request=FetchFollower","99thPercentile"] |
Kafka | Kafka: Produce request total time, mean | Average time in ms to serve the Produce request. | JMX | jmx["kafka.network:type=RequestMetrics,name=TotalTimeMs,request=Produce","Mean"] |
Kafka | Kafka: Produce request total time, p95 | Time in ms to serve the Produce requests for 95th percentile. | JMX | jmx["kafka.network:type=RequestMetrics,name=TotalTimeMs,request=Produce","95thPercentile"] |
Kafka | Kafka: Produce request total time, p99 | Time in ms to serve the Produce requests for 99th percentile. | JMX | jmx["kafka.network:type=RequestMetrics,name=TotalTimeMs,request=Produce","99thPercentile"] |
Kafka | Kafka: UpdateMetadata request total time, mean | Average time for a request to update metadata. | JMX | jmx["kafka.network:type=RequestMetrics,name=TotalTimeMs,request=UpdateMetadata","Mean"] |
Kafka | Kafka: UpdateMetadata request total time, p95 | Time for update metadata requests for 95th percentile. | JMX | jmx["kafka.network:type=RequestMetrics,name=TotalTimeMs,request=UpdateMetadata","95thPercentile"] |
Kafka | Kafka: UpdateMetadata request total time, p99 | Time for update metadata requests for 99th percentile. | JMX | jmx["kafka.network:type=RequestMetrics,name=TotalTimeMs,request=UpdateMetadata","99thPercentile"] |
Kafka | Kafka: Temporary memory size in bytes (Fetch), max | The maximum of temporary memory used for converting message formats and decompressing messages. | JMX | jmx["kafka.network:type=RequestMetrics,name=TemporaryMemoryBytes,request=Fetch","Max"] |
Kafka | Kafka: Temporary memory size in bytes (Fetch), avg | The average amount of temporary memory used for converting message formats and decompressing messages. | JMX | jmx["kafka.network:type=RequestMetrics,name=TemporaryMemoryBytes,request=Fetch","Mean"] |
Kafka | Kafka: Temporary memory size in bytes (Produce), max | The maximum of temporary memory used for converting message formats and decompressing messages. | JMX | jmx["kafka.network:type=RequestMetrics,name=TemporaryMemoryBytes,request=Produce","Max"] |
Kafka | Kafka: Temporary memory size in bytes (Produce), avg | The amount of temporary memory used for converting message formats and decompressing messages. | JMX | jmx["kafka.network:type=RequestMetrics,name=TemporaryMemoryBytes,request=Produce","Mean"] |
Kafka | Kafka: Temporary memory size in bytes (Produce), min | The minimum of temporary memory used for converting message formats and decompressing messages. | JMX | jmx["kafka.network:type=RequestMetrics,name=TemporaryMemoryBytes,request=Produce","Min"] |
Kafka | Kafka: Network processor average idle percent | The average percentage of time that the network processors are idle. | JMX | jmx["kafka.network:type=SocketServer,name=NetworkProcessorAvgIdlePercent","Value"] Preprocessing: - MULTIPLIER: |
Kafka | Kafka: Requests in fetch purgatory | Number of requests waiting in fetch purgatory. | JMX | jmx["kafka.server:type=DelayedOperationPurgatory,name=PurgatorySize,delayedOperation=Fetch","Value"] |
Kafka | Kafka: Requests in producer purgatory | Number of requests waiting in producer purgatory. | JMX | jmx["kafka.server:type=DelayedOperationPurgatory,name=PurgatorySize,delayedOperation=Produce","Value"] |
Kafka | Kafka: Replication maximum lag | The maximum lag between the time that messages are received by the leader replica and by the follower replicas. | JMX | jmx["kafka.server:type=ReplicaFetcherManager,name=MaxLag,clientId=Replica","Value"] |
Kafka | Kafka: Under minimum ISR partition count | The number of partitions under the minimum In-Sync Replica (ISR) count. | JMX | jmx["kafka.server:type=ReplicaManager,name=UnderMinIsrPartitionCount","Value"] |
Kafka | Kafka: Under replicated partitions | The number of partitions that have not been fully replicated in the follower replicas (the number of non-reassigning replicas - the number of ISR > 0). | JMX | jmx["kafka.server:type=ReplicaManager,name=UnderReplicatedPartitions","Value"] |
Kafka | Kafka: ISR expands per second | The rate at which the number of ISRs in the broker increases. | JMX | jmx["kafka.server:type=ReplicaManager,name=IsrExpandsPerSec","Count"] Preprocessing: - CHANGE_PER_SECOND |
Kafka | Kafka: ISR shrink per second | Rate of replicas leaving the ISR pool. | JMX | jmx["kafka.server:type=ReplicaManager,name=IsrShrinksPerSec","Count"] Preprocessing: - CHANGE_PER_SECOND |
Kafka | Kafka: Leader count | The number of replicas for which this broker is the leader. | JMX | jmx["kafka.server:type=ReplicaManager,name=LeaderCount","Value"] |
Kafka | Kafka: Partition count | The number of partitions in the broker. | JMX | jmx["kafka.server:type=ReplicaManager,name=PartitionCount","Value"] |
Kafka | Kafka: Number of reassigning partitions | The number of reassigning leader partitions on a broker. | JMX | jmx["kafka.server:type=ReplicaManager,name=ReassigningPartitions","Value"] |
Kafka | Kafka: Request queue size | The size of the delay queue. | JMX | jmx["kafka.server:type=Request","queue-size"] |
Kafka | Kafka: Version | Current version of broker. | JMX | jmx["kafka.server:type=app-info","version"] Preprocessing: - DISCARD_UNCHANGED_HEARTBEAT: |
Kafka | Kafka: Uptime | Service uptime in seconds. | JMX | jmx["kafka.server:type=app-info","start-time-ms"] Preprocessing: - JAVASCRIPT: |
Kafka | Kafka: ZooKeeper client request latency | Latency in milliseconds for ZooKeeper requests from broker. | JMX | jmx["kafka.server:type=ZooKeeperClientMetrics,name=ZooKeeperRequestLatencyMs","Count"] |
Kafka | Kafka: ZooKeeper connection status | Connection status of broker's ZooKeeper session. | JMX | jmx["kafka.server:type=SessionExpireListener,name=SessionState","Value"] Preprocessing: - DISCARD_UNCHANGED_HEARTBEAT: |
Kafka | Kafka: ZooKeeper disconnect rate | ZooKeeper client disconnect per second. | JMX | jmx["kafka.server:type=SessionExpireListener,name=ZooKeeperDisconnectsPerSec","Count"] Preprocessing: - CHANGE_PER_SECOND |
Kafka | Kafka: ZooKeeper session expiration rate | ZooKeeper client session expiration per second. | JMX | jmx["kafka.server:type=SessionExpireListener,name=ZooKeeperExpiresPerSec","Count"] Preprocessing: - CHANGE_PER_SECOND |
Kafka | Kafka: ZooKeeper readonly rate | ZooKeeper client readonly per second. | JMX | jmx["kafka.server:type=SessionExpireListener,name=ZooKeeperReadOnlyConnectsPerSec","Count"] Preprocessing: - CHANGE_PER_SECOND |
Kafka | Kafka: ZooKeeper sync rate | ZooKeeper client sync per second. | JMX | jmx["kafka.server:type=SessionExpireListener,name=ZooKeeperSyncConnectsPerSec","Count"] Preprocessing: - CHANGE_PER_SECOND |
Kafka | Kafka {#JMXTOPIC}: Messages in per second | The rate at which individual messages are consumed by topic. | JMX | jmx["kafka.server:type=BrokerTopicMetrics,name=MessagesInPerSec,topic={#JMXTOPIC}","Count"] Preprocessing: - CHANGE_PER_SECOND |
Kafka | Kafka {#JMXTOPIC}: Bytes in per second | The rate at which data sent from producers is consumed by topic. | JMX | jmx["kafka.server:type=BrokerTopicMetrics,name=BytesInPerSec,topic={#JMXTOPIC}","Count"] Preprocessing: - CHANGE_PER_SECOND |
Kafka | Kafka {#JMXTOPIC}: Bytes out per second | The rate at which data is fetched and read from the broker by consumers (by topic). | JMX | jmx["kafka.server:type=BrokerTopicMetrics,name=BytesOutPerSec,topic={#JMXTOPIC}","Count"] Preprocessing: - CHANGE_PER_SECOND |
Kafka | Kafka {#JMXTOPIC}: Bytes rejected per second | Rejected bytes rate by topic. | JMX | jmx["kafka.server:type=BrokerTopicMetrics,name=BytesRejectedPerSec,topic={#JMXTOPIC}","Count"] Preprocessing: - CHANGE_PER_SECOND |
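Many of the items above read a cumulative `Count` attribute and rely on the CHANGE_PER_SECOND preprocessing step to turn it into a rate. The following is a simplified model of that step, evaluated as (value - prev_value) / (time - prev_time). The function name and the sample numbers are hypothetical, and skipping a sample when the counter decreases is how this sketch models a counter reset such as a broker restart.

```python
def change_per_second(prev_value, prev_time, value, time):
    """Return (value - prev_value) / (time - prev_time), or None if no rate can be derived."""
    if time <= prev_time or value < prev_value:
        # Counter reset (for example a broker restart) or a non-advancing clock:
        # skip this sample and wait for the next one.
        return None
    return (value - prev_value) / (time - prev_time)


# Two hypothetical samples of a BytesInPerSec "Count" attribute taken 60 s apart.
rate = change_per_second(prev_value=1_200_000, prev_time=1_000, value=1_500_000, time=1_060)
print(rate)  # 5000.0
```

In this model the stored value is a per-second rate computed between two consecutive polls, so the first poll after a restart produces no data point.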
Triggers
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
Kafka: Unclean leader election detected | Unclean leader elections occur when there is no qualified partition leader among Kafka brokers. If Kafka is configured to allow an unclean leader election, a leader is chosen from the out-of-sync replicas, and any messages that were not synced prior to the loss of the former leader are lost forever. Essentially, unclean leader elections sacrifice consistency for availability. | last(/Apache Kafka by JMX/jmx["kafka.controller:type=ControllerStats,name=UncleanLeaderElectionsPerSec","Count"])>0 | AVERAGE | |
Kafka: There are offline log directories | The offline log directory count metric indicates the number of log directories that are offline (for example, due to a hardware failure), so the broker can no longer store incoming messages in them. | last(/Apache Kafka by JMX/jmx["kafka.log:type=LogManager,name=OfflineLogDirectoryCount","Value"]) > 0 | WARNING | |
Kafka: One or more partitions have no leader | Any partition without an active leader will be completely inaccessible, and both consumers and producers of that partition will be blocked until a leader becomes available. | last(/Apache Kafka by JMX/jmx["kafka.controller:type=KafkaController,name=OfflinePartitionsCount","Value"]) > 0 | WARNING | |
Kafka: Request handler average idle percent is too low | The request handler idle ratio metric indicates the percentage of time the request handlers are not in use. The lower this number, the more loaded the broker is. | max(/Apache Kafka by JMX/jmx["kafka.server:type=KafkaRequestHandlerPool,name=RequestHandlerAvgIdlePercent","OneMinuteRate"],15m)<{$KAFKA.REQUEST_HANDLER_AVG_IDLE.MIN.WARN} | AVERAGE | |
Kafka: Network processor average idle percent is too low | The network processor idle ratio metric indicates the percentage of time the network processors are not in use. The lower this number, the more loaded the broker is. | max(/Apache Kafka by JMX/jmx["kafka.network:type=SocketServer,name=NetworkProcessorAvgIdlePercent","Value"],15m)<{$KAFKA.NET_PROC_AVG_IDLE.MIN.WARN} | AVERAGE | |
Kafka: Failed to fetch info data | Zabbix has not received data for items for the last 15 minutes. | nodata(/Apache Kafka by JMX/jmx["kafka.network:type=SocketServer,name=NetworkProcessorAvgIdlePercent","Value"],15m)=1 | WARNING | |
Kafka: There are partitions under the min ISR | The Under min ISR partitions metric displays the number of partitions where the number of In-Sync Replicas (ISR) is less than the configured minimum number of in-sync replicas. The two most common causes of under-min ISR partitions are that one or more brokers are unresponsive, or that the cluster is experiencing performance issues and one or more brokers are falling behind. | last(/Apache Kafka by JMX/jmx["kafka.server:type=ReplicaManager,name=UnderMinIsrPartitionCount","Value"])>0 | AVERAGE | |
Kafka: There are under replicated partitions | The Under replicated partitions metric displays the number of partitions that do not have enough replicas to meet the desired replication factor. A partition is also considered under-replicated if the correct number of replicas exists but one or more of the replicas have fallen significantly behind the partition leader. The two most common causes of under-replicated partitions are that one or more brokers are unresponsive, or that the cluster is experiencing performance issues and one or more brokers have fallen behind. | last(/Apache Kafka by JMX/jmx["kafka.server:type=ReplicaManager,name=UnderReplicatedPartitions","Value"])>0 | AVERAGE | |
Kafka: Version has changed | Kafka version has changed. Ack to close. | last(/Apache Kafka by JMX/jmx["kafka.server:type=app-info","version"],#1)<>last(/Apache Kafka by JMX/jmx["kafka.server:type=app-info","version"],#2) and length(last(/Apache Kafka by JMX/jmx["kafka.server:type=app-info","version"]))>0 | INFO | Manual close: YES |
Kafka: has been restarted | Uptime is less than 10 minutes. | last(/Apache Kafka by JMX/jmx["kafka.server:type=app-info","start-time-ms"])<10m | INFO | Manual close: YES |
Kafka: Broker is not connected to ZooKeeper | - | find(/Apache Kafka by JMX/jmx["kafka.server:type=SessionExpireListener,name=SessionState","Value"],,"regexp","CONNECTED")=0 | AVERAGE | |
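To make two of the expressions above more concrete, here is a minimal sketch of how they could be read. It assumes a threshold of 30, the default of both idle-percent macros, and it assumes that the Uptime item's JavaScript preprocessing (not shown in this document) converts `start-time-ms` into elapsed seconds. The function names and sample values are hypothetical, and this is not how Zabbix evaluates triggers internally.

```python
def idle_percent_too_low(samples, threshold=30.0):
    """Mimic max(item,15m) < threshold: true only if every sample in the window is below it."""
    return bool(samples) and max(samples) < threshold


def recently_restarted(start_time_ms, now_ms, limit_seconds=600):
    """Mimic the restart trigger: uptime derived from start-time-ms is under 10 minutes."""
    uptime_seconds = (now_ms - start_time_ms) / 1000
    return uptime_seconds < limit_seconds


# Hypothetical idle-percent samples collected during the last 15 minutes.
print(idle_percent_too_low([22.5, 28.0, 19.4]))  # True
print(idle_percent_too_low([22.5, 41.0, 19.4]))  # False
# Hypothetical timestamps giving 300 s of uptime.
print(recently_restarted(start_time_ms=1_000_000, now_ms=1_300_000))  # True
```

Using `max` over the window means a single sample at or above the threshold keeps the idle-percent trigger from firing, which filters out short load spikes.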
Feedback
Please report any issues with the template at https://support.zabbix.com
You can also provide feedback, discuss the template or ask for help with it at ZABBIX forums.