Apache Ignite

Apache Ignite is a distributed database for high-performance computing with in-memory speed. Ignite was open-sourced by GridGain Systems in late 2014 and accepted into the Apache Incubator program that same year.

Available solutions




Source: https://git.zabbix.com/projects/ZBX/repos/zabbix/browse/templates/db/ignite_jmx


Ignite by JMX

Overview

For Zabbix version: 5.4 and higher
Official JMX Template for Apache Ignite computing platform. This template is based on the original template developed by Igor Akkuratov, Senior Engineer at GridGain Systems and Apache Ignite Contributor.

This template was tested on:

  • Zabbix, version 5.4
  • Ignite, version 2.9.0

Setup

See Zabbix template operation for basic instructions.

This template works with standalone and cluster instances. Metrics are collected by JMX. All metrics are discoverable.

  1. Enable and configure JMX access to Apache Ignite. See documentation for instructions. By default, the JMX tree hierarchy includes a class loader level. Adding the JVM option -DIGNITE_MBEAN_APPEND_CLASS_LOADER_ID=false removes that level from the hierarchy. You can choose which Cache and Data Region metrics to enable by following the official guide.
  2. Set the user name and password in host macros {$IGNITE.USER} and {$IGNITE.PASSWORD}.
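
As a rough illustration of step 1, the sketch below assembles a JVM option string that enables remote JMX with password authentication and removes the class loader level from the MBean tree. The `com.sun.management.jmxremote.*` properties are standard JVM settings; the port 9010, the password file path, and the `build_jvm_opts` helper are assumptions made for this example. How the options reach the Ignite JVM (for example through a `JVM_OPTS` environment variable) depends on how the node is started, and disabling SSL is only reasonable on a trusted network.

```python
def build_jvm_opts(port: int, password_file: str) -> str:
    """Return a space-separated JVM option string for remote JMX access."""
    opts = [
        "-Dcom.sun.management.jmxremote",
        f"-Dcom.sun.management.jmxremote.port={port}",
        "-Dcom.sun.management.jmxremote.authenticate=true",
        f"-Dcom.sun.management.jmxremote.password.file={password_file}",
        # Assumption: plain (non-SSL) JMX, suitable for a trusted network only.
        "-Dcom.sun.management.jmxremote.ssl=false",
        # Option named in the setup step: drops the class loader level.
        "-DIGNITE_MBEAN_APPEND_CLASS_LOADER_ID=false",
    ]
    return " ".join(opts)


# Hypothetical port and path, chosen only for the example.
jvm_opts = build_jvm_opts(9010, "/path/to/jmxremote.password")
print(jvm_opts)
```

The user defined in the password file is the one to enter in {$IGNITE.USER} and {$IGNITE.PASSWORD}.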

Zabbix configuration

No specific Zabbix configuration is required.

Macros used

Name Description Default
{$IGNITE.CHECKPOINT.PUSED.MAX.HIGH}

The maximum percent of checkpoint buffer utilization for high trigger expression.

80
{$IGNITE.CHECKPOINT.PUSED.MAX.WARN}

The maximum percent of checkpoint buffer utilization for warning trigger expression.

66
{$IGNITE.DATA.REGION.PUSED.MAX.HIGH}

The maximum percent of data region utilization for high trigger expression.

90
{$IGNITE.DATA.REGION.PUSED.MAX.WARN}

The maximum percent of data region utilization for warning trigger expression.

80
{$IGNITE.JOBS.QUEUE.MAX.WARN}

The maximum number of queued jobs for trigger expression.

10
{$IGNITE.LLD.FILTER.CACHE.MATCHES}

Filter of discoverable cache groups.

.*
{$IGNITE.LLD.FILTER.CACHE.NOT_MATCHES}

Filter to exclude discovered cache groups.

CHANGE_IF_NEEDED
{$IGNITE.LLD.FILTER.DATA.REGION.MATCHES}

Filter of discoverable data regions.

.*
{$IGNITE.LLD.FILTER.DATA.REGION.NOT_MATCHES}

Filter to exclude discovered data regions.

^(sysMemPlc|TxLog)$
{$IGNITE.LLD.FILTER.THREAD.POOL.MATCHES}

Filter of discoverable thread pools.

.*
{$IGNITE.LLD.FILTER.THREAD.POOL.NOT_MATCHES}

Filter to exclude discovered thread pools.

^(GridCallbackExecutor|GridRebalanceStripedExecutor|GridDataStreamExecutor|StripedExecutor)$
{$IGNITE.PASSWORD}

-

<secret>
{$IGNITE.PME.DURATION.MAX.HIGH}

The maximum PME duration in ms for high trigger expression.

60000
{$IGNITE.PME.DURATION.MAX.WARN}

The maximum PME duration in ms for warning trigger expression.

10000
{$IGNITE.THREAD.QUEUE.MAX.WARN}

Threshold for thread pool queue size. Can be used with thread pool name as context.

1000
{$IGNITE.THREADS.COUNT.MAX.WARN}

The maximum number of running threads for trigger expression.

1000
{$IGNITE.USER}

-

zabbix
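
The description of {$IGNITE.THREAD.QUEUE.MAX.WARN} refers to Zabbix user macros with context: a value defined for a specific context (here, a thread pool name) takes precedence, and the plain macro acts as the fallback. A minimal Python sketch of that lookup follows. The `resolve_macro` helper, the pool name `GridSystemExecutor`, and the override value 5000 are made up for illustration, and the sketch does not reproduce the full Zabbix resolution rules (such as regex contexts or host versus template precedence).

```python
from typing import Dict, Optional


def resolve_macro(macros: Dict[str, str], name: str,
                  context: Optional[str] = None) -> str:
    """Look up '{$NAME:"context"}' first, then fall back to '{$NAME}'."""
    if context is not None:
        key = '{$%s:"%s"}' % (name, context)
        if key in macros:
            return macros[key]
    return macros["{$%s}" % name]


host_macros = {
    # Template default from the table above.
    "{$IGNITE.THREAD.QUEUE.MAX.WARN}": "1000",
    # Hypothetical per-pool override defined on the host.
    '{$IGNITE.THREAD.QUEUE.MAX.WARN:"GridSystemExecutor"}': "5000",
}

for pool in ("GridSystemExecutor", "GridManagementExecutor"):
    print(pool, resolve_macro(host_macros, "IGNITE.THREAD.QUEUE.MAX.WARN", pool))
```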

Template links

There are no template links in this template.

Discovery rules

Name Description Type Key and additional info
Ignite kernal metrics

-

JMX jmx.discovery[beans,"org.apache:group=Kernal,name=IgniteKernal,*"]

Preprocessing:

- JAVASCRIPT: Text is too long. Please see the template.

Cluster metrics

-

JMX jmx.discovery[beans,"org.apache:group=Kernal,name=ClusterMetricsMXBeanImpl,*"]

Preprocessing:

- JAVASCRIPT: Text is too long. Please see the template.

Local node metrics

-

JMX jmx.discovery[beans,"org.apache:group=Kernal,name=ClusterLocalNodeMetricsMXBeanImpl,*"]

Preprocessing:

- JAVASCRIPT: Text is too long. Please see the template.

TCP discovery SPI

-

JMX jmx.discovery[beans,"org.apache:group=SPIs,name=TcpDiscoverySpi,*"]

Preprocessing:

- JAVASCRIPT: Text is too long. Please see the template.

TCP Communication SPI metrics

-

JMX jmx.discovery[beans,"org.apache:group=SPIs,name=TcpCommunicationSpi,*"]

Preprocessing:

- JAVASCRIPT: Text is too long. Please see the template.

Transaction metrics

-

JMX jmx.discovery[beans,"org.apache:group=TransactionMetrics,name=TransactionMetricsMxBeanImpl,*"]

Preprocessing:

- JAVASCRIPT: Text is too long. Please see the template.

Cache metrics

-

JMX jmx.discovery[beans,"org.apache:name=\"org.apache.ignite.internal.processors.cache.CacheLocalMetricsMXBeanImpl\",*"]

Preprocessing:

- JAVASCRIPT: Text is too long. Please see the template.

- DISCARD_UNCHANGED_HEARTBEAT: 3h

Filter:

AND

- A: {#JMXGROUP} MATCHES_REGEX {$IGNITE.LLD.FILTER.CACHE.MATCHES}

- B: {#JMXGROUP} NOT_MATCHES_REGEX {$IGNITE.LLD.FILTER.CACHE.NOT_MATCHES}

Data region metrics

-

JMX jmx.discovery[beans,"org.apache:group=DataRegionMetrics,*"]

Preprocessing:

- JAVASCRIPT: Text is too long. Please see the template.

- DISCARD_UNCHANGED_HEARTBEAT: 3h

Filter:

AND

- A: {#JMXNAME} MATCHES_REGEX {$IGNITE.LLD.FILTER.DATA.REGION.MATCHES}

- B: {#JMXNAME} NOT_MATCHES_REGEX {$IGNITE.LLD.FILTER.DATA.REGION.NOT_MATCHES}

Cache groups

-

JMX jmx.discovery[beans,"org.apache:group=\"Cache groups\",*"]

Preprocessing:

- JAVASCRIPT: Text is too long. Please see the template.

- DISCARD_UNCHANGED_HEARTBEAT: 3h

Filter:

AND

- A: {#JMXNAME} MATCHES_REGEX {$IGNITE.LLD.FILTER.CACHE.MATCHES}

- B: {#JMXNAME} NOT_MATCHES_REGEX {$IGNITE.LLD.FILTER.CACHE.NOT_MATCHES}

Thread pool metrics

-

JMX jmx.discovery[beans,"org.apache:group=\"Thread Pools\",*"]

Preprocessing:

- JAVASCRIPT: Text is too long. Please see the template.

- DISCARD_UNCHANGED_HEARTBEAT: 3h

Filter:

AND

- A: {#JMXNAME} MATCHES_REGEX {$IGNITE.LLD.FILTER.THREAD.POOL.MATCHES}

- B: {#JMXNAME} NOT_MATCHES_REGEX {$IGNITE.LLD.FILTER.THREAD.POOL.NOT_MATCHES}
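
Each filtered discovery rule above keeps an entity only when condition A (MATCHES_REGEX) and condition B (NOT_MATCHES_REGEX) both hold. The sketch below approximates that logic in Python using the default thread pool macros. Python's `re.search` stands in for the Zabbix regular expression engine, the `passes_filter` helper is hypothetical, and the pool names in the sample list are illustrative inputs rather than a claim about which pools a given node exposes.

```python
import re

# Default macro values from the "Macros used" table.
MATCHES = r".*"
NOT_MATCHES = (r"^(GridCallbackExecutor|GridRebalanceStripedExecutor|"
               r"GridDataStreamExecutor|StripedExecutor)$")


def passes_filter(name: str, matches: str = MATCHES,
                  not_matches: str = NOT_MATCHES) -> bool:
    """Return True when condition A AND condition B are both satisfied."""
    a = re.search(matches, name) is not None      # A: MATCHES_REGEX
    b = re.search(not_matches, name) is None      # B: NOT_MATCHES_REGEX
    return a and b


# Sample names, chosen only for the example.
pools = ["GridSystemExecutor", "StripedExecutor",
         "GridCallbackExecutor", "GridManagementExecutor"]
kept = [p for p in pools if passes_filter(p)]
print(kept)
```

With the default cache filter value CHANGE_IF_NEEDED, condition B excludes only names that contain that literal text, so in practice nothing is excluded until the macro is changed.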

Items collected

Group Name Description Type Key and additional info
Ignite Ignite [{#JMXIGNITEINSTANCENAME}]: Uptime

Uptime of Ignite instance.

JMX jmx["{#JMXOBJ}",UpTime]

Preprocessing:

- MULTIPLIER: 0.001

Ignite Ignite [{#JMXIGNITEINSTANCENAME}]: Version

Version of Ignite instance.

JMX jmx["{#JMXOBJ}",FullVersion]

Preprocessing:

- REGEX: (.*)-\d+ \1

- DISCARD_UNCHANGED_HEARTBEAT: 3h

Ignite Ignite [{#JMXIGNITEINSTANCENAME}]: Local node ID

Unique identifier for this node within grid.

JMX jmx["{#JMXOBJ}",LocalNodeId]

Preprocessing:

- DISCARD_UNCHANGED_HEARTBEAT: 3h

Ignite Ignite [{#JMXIGNITEINSTANCENAME}]: Nodes, Baseline

Total baseline nodes that are registered in the baseline topology.

JMX jmx["{#JMXOBJ}",TotalBaselineNodes]

Preprocessing:

- DISCARD_UNCHANGED_HEARTBEAT: 3h

Ignite Ignite [{#JMXIGNITEINSTANCENAME}]: Nodes, Active baseline

The number of nodes that are currently active in the baseline topology.

JMX jmx["{#JMXOBJ}",ActiveBaselineNodes]

Preprocessing:

- DISCARD_UNCHANGED_HEARTBEAT: 3h

Ignite Ignite [{#JMXIGNITEINSTANCENAME}]: Nodes, Client

The number of client nodes in the cluster.

JMX jmx["{#JMXOBJ}",TotalClientNodes]

Preprocessing:

- DISCARD_UNCHANGED_HEARTBEAT: 3h

Ignite Ignite [{#JMXIGNITEINSTANCENAME}]: Nodes, total

Total number of nodes.

JMX jmx["{#JMXOBJ}",TotalNodes]

Preprocessing:

- DISCARD_UNCHANGED_HEARTBEAT: 3h

Ignite Ignite [{#JMXIGNITEINSTANCENAME}]: Nodes, Server

The number of server nodes in the cluster.

JMX jmx["{#JMXOBJ}",TotalServerNodes]

Preprocessing:

- DISCARD_UNCHANGED_HEARTBEAT: 3h

Ignite Ignite [{#JMXIGNITEINSTANCENAME}]: Jobs cancelled, current

Number of cancelled jobs that are still running.

JMX jmx["{#JMXOBJ}",CurrentCancelledJobs]
Ignite Ignite [{#JMXIGNITEINSTANCENAME}]: Jobs rejected, current

Number of jobs rejected after the most recent collision resolution operation.

JMX jmx["{#JMXOBJ}",CurrentRejectedJobs]
Ignite Ignite [{#JMXIGNITEINSTANCENAME}]: Jobs waiting, current

Number of queued jobs currently waiting to be executed.

JMX jmx["{#JMXOBJ}",CurrentWaitingJobs]
Ignite Ignite [{#JMXIGNITEINSTANCENAME}]: Jobs active, current

Number of currently active jobs concurrently executing on the node.

JMX jmx["{#JMXOBJ}",CurrentActiveJobs]
Ignite Ignite [{#JMXIGNITEINSTANCENAME}]: Jobs executed, rate

Total number of jobs handled by the node per second.

JMX jmx["{#JMXOBJ}",TotalExecutedJobs]

Preprocessing:

- CHANGE_PER_SECOND

Ignite Ignite [{#JMXIGNITEINSTANCENAME}]: Jobs cancelled, rate

Total number of jobs cancelled by the node per second.

JMX jmx["{#JMXOBJ}",TotalCancelledJobs]

Preprocessing:

- CHANGE_PER_SECOND

Ignite Ignite [{#JMXIGNITEINSTANCENAME}]: Jobs rejects, rate

Number of jobs per second that this node rejects during collision resolution operations.

JMX jmx["{#JMXOBJ}",TotalRejectedJobs]

Preprocessing:

- CHANGE_PER_SECOND

Ignite Ignite [{#JMXIGNITEINSTANCENAME}]: PME duration, current

Current PME duration in milliseconds.

JMX jmx["{#JMXOBJ}",CurrentPmeDuration]
Ignite Ignite [{#JMXIGNITEINSTANCENAME}]: Threads count, current

Current number of live threads.

JMX jmx["{#JMXOBJ}",CurrentThreadCount]
Ignite Ignite [{#JMXIGNITEINSTANCENAME}]: Heap memory used

Current heap size that is used for object allocation.

JMX jmx["{#JMXOBJ}",HeapMemoryUsed]
Ignite Ignite [{#JMXIGNITEINSTANCENAME}]: Coordinator

Current coordinator UUID.

JMX jmx["{#JMXOBJ}",Coordinator]

Preprocessing:

- DISCARD_UNCHANGED_HEARTBEAT: 3h

Ignite Ignite [{#JMXIGNITEINSTANCENAME}]: Nodes left

Nodes left count.

JMX jmx["{#JMXOBJ}",NodesLeft]
Ignite Ignite [{#JMXIGNITEINSTANCENAME}]: Nodes joined

Nodes join count.

JMX jmx["{#JMXOBJ}",NodesJoined]
Ignite Ignite [{#JMXIGNITEINSTANCENAME}]: Nodes failed

Nodes failed count.

JMX jmx["{#JMXOBJ}",NodesFailed]
Ignite Ignite [{#JMXIGNITEINSTANCENAME}]: Discovery message worker queue

Message worker queue current size.

JMX jmx["{#JMXOBJ}",MessageWorkerQueueSize]
Ignite Ignite [{#JMXIGNITEINSTANCENAME}]: Discovery reconnect, rate

Number of times node tries to (re)establish connection to another node per second.

JMX jmx["{#JMXOBJ}",ReconnectCount]

Preprocessing:

- CHANGE_PER_SECOND

Ignite Ignite [{#JMXIGNITEINSTANCENAME}]: TotalProcessedMessages

The number of messages processed per second.

JMX jmx["{#JMXOBJ}",TotalProcessedMessages]

Preprocessing:

- CHANGE_PER_SECOND

Ignite Ignite [{#JMXIGNITEINSTANCENAME}]: Discovery messages received, rate

The number of messages received per second.

JMX jmx["{#JMXOBJ}",TotalReceivedMessages]

Preprocessing:

- CHANGE_PER_SECOND

Ignite Ignite [{#JMXIGNITEINSTANCENAME}]: Communication outbound messages queue

Outbound messages queue size.

JMX jmx["{#JMXOBJ}",OutboundMessagesQueueSize]
Ignite Ignite [{#JMXIGNITEINSTANCENAME}]: Communication messages received, rate

The number of messages received per second.

JMX jmx["{#JMXOBJ}",ReceivedMessagesCount]

Preprocessing:

- CHANGE_PER_SECOND

Ignite Ignite [{#JMXIGNITEINSTANCENAME}]: Communication messages sent, rate

The number of messages sent per second.

JMX jmx["{#JMXOBJ}",SentMessagesCount]

Preprocessing:

- CHANGE_PER_SECOND

Ignite Ignite [{#JMXIGNITEINSTANCENAME}]: Communication reconnect rate

Gets maximum number of reconnect attempts used when establishing connection with remote nodes per second.

JMX jmx["{#JMXOBJ}",ReconnectCount]

Preprocessing:

- CHANGE_PER_SECOND

Ignite Ignite [{#JMXIGNITEINSTANCENAME}]: Locked keys

The number of keys locked on the node.

JMX jmx["{#JMXOBJ}",LockedKeysNumber]
Ignite Ignite [{#JMXIGNITEINSTANCENAME}]: Transactions owner, current

The number of active transactions for which this node is the initiator.

JMX jmx["{#JMXOBJ}",OwnerTransactionsNumber]
Ignite Ignite [{#JMXIGNITEINSTANCENAME}]: Transactions holding lock, current

The number of active transactions holding at least one key lock.

JMX jmx["{#JMXOBJ}",TransactionsHoldingLockNumber]
Ignite Ignite [{#JMXIGNITEINSTANCENAME}]: Transactions rolledback, rate

The number of transactions rolled back per second.

JMX jmx["{#JMXOBJ}",TransactionsRolledBackNumber]
Ignite Ignite [{#JMXIGNITEINSTANCENAME}]: Transactions committed, rate

The number of transactions which were committed per second.

JMX jmx["{#JMXOBJ}",TransactionsCommittedNumber]
Ignite Cache group [{#JMXGROUP}]: Cache gets, rate

The number of gets to the cache per second.

JMX jmx["{#JMXOBJ}",CacheGets]

Preprocessing:

- CHANGE_PER_SECOND

Ignite Cache group [{#JMXGROUP}]: Cache puts, rate

The number of puts to the cache per second.

JMX jmx["{#JMXOBJ}",CachePuts]

Preprocessing:

- CHANGE_PER_SECOND

Ignite Cache group [{#JMXGROUP}]: Cache removals, rate

The number of removals from the cache per second.

JMX jmx["{#JMXOBJ}",CacheRemovals]

Preprocessing:

- CHANGE_PER_SECOND

Ignite Cache group [{#JMXGROUP}]: Cache hits, pct

Percentage of successful hits.

JMX jmx["{#JMXOBJ}",CacheHitPercentage]
Ignite Cache group [{#JMXGROUP}]: Cache misses, pct

Percentage of accesses that failed to find anything.

JMX jmx["{#JMXOBJ}",CacheMissPercentage]
Ignite Cache group [{#JMXGROUP}]: Cache transaction commits, rate

The number of transaction commits per second.

JMX jmx["{#JMXOBJ}",CacheTxCommits]

Preprocessing:

- CHANGE_PER_SECOND

Ignite Cache group [{#JMXGROUP}]: Cache transaction rollbacks, rate

The number of transaction rollbacks per second.

JMX jmx["{#JMXOBJ}",CacheTxRollbacks]

Preprocessing:

- CHANGE_PER_SECOND

Ignite Cache group [{#JMXGROUP}]: Cache size

The number of non-null values in the cache as a long value.

JMX jmx["{#JMXOBJ}",CacheSize]
Ignite Cache group [{#JMXGROUP}]: Cache heap entries

The number of entries in heap memory.

JMX jmx["{#JMXOBJ}",HeapEntriesCount]

Preprocessing:

- CHANGE_PER_SECOND

Ignite Data region {#JMXNAME}: Allocation, rate

Allocation rate (pages per second) averaged across rateTimeInterval.

JMX jmx["{#JMXOBJ}",AllocationRate]
Ignite Data region {#JMXNAME}: Allocated, bytes

Total size of memory allocated in bytes.

JMX jmx["{#JMXOBJ}",TotalAllocatedSize]
Ignite Data region {#JMXNAME}: Dirty pages

Number of pages in memory not yet synchronized with persistent storage.

JMX jmx["{#JMXOBJ}",DirtyPages]
Ignite Data region {#JMXNAME}: Eviction, rate

Eviction rate (pages per second).

JMX jmx["{#JMXOBJ}",EvictionRate]
Ignite Data region {#JMXNAME}: Size, max

Maximum memory region size defined by its data region.

JMX jmx["{#JMXOBJ}",MaxSize]
Ignite Data region {#JMXNAME}: Offheap size

Offheap size in bytes.

JMX jmx["{#JMXOBJ}",OffHeapSize]
Ignite Data region {#JMXNAME}: Offheap used size

Total used offheap size in bytes.

JMX jmx["{#JMXOBJ}",OffheapUsedSize]
Ignite Data region {#JMXNAME}: Pages fill factor

The percentage of the used space.

JMX jmx["{#JMXOBJ}",PagesFillFactor]
Ignite Data region {#JMXNAME}: Pages replace, rate

Rate at which pages in memory are replaced with pages from persistent storage (pages per second).

JMX jmx["{#JMXOBJ}",PagesReplaceRate]
Ignite Data region {#JMXNAME}: Used checkpoint buffer size

Used checkpoint buffer size in bytes.

JMX jmx["{#JMXOBJ}",UsedCheckpointBufferSize]
Ignite Data region {#JMXNAME}: Checkpoint buffer size

Total size in bytes for checkpoint buffer.

JMX jmx["{#JMXOBJ}",CheckpointBufferSize]
Ignite Cache group [{#JMXNAME}]: Backups

Count of backups configured for cache group.

JMX jmx["{#JMXOBJ}",Backups]
Ignite Cache group [{#JMXNAME}]: Partitions

Count of partitions for cache group.

JMX jmx["{#JMXOBJ}",Partitions]
Ignite Cache group [{#JMXNAME}]: Caches

List of caches.

JMX jmx["{#JMXOBJ}",Caches]

Preprocessing:

- DISCARD_UNCHANGED_HEARTBEAT: 3h

Ignite Cache group [{#JMXNAME}]: Local node partitions, moving

Count of partitions with state MOVING for this cache group located on this node.

JMX jmx["{#JMXOBJ}",LocalNodeMovingPartitionsCount]
Ignite Cache group [{#JMXNAME}]: Local node partitions, renting

Count of partitions with state RENTING for this cache group located on this node.

JMX jmx["{#JMXOBJ}",LocalNodeRentingPartitionsCount]
Ignite Cache group [{#JMXNAME}]: Local node entries, renting

Count of entries remaining to evict in RENTING partitions located on this node for this cache group.

JMX jmx["{#JMXOBJ}",LocalNodeRentingEntriesCount]
Ignite Cache group [{#JMXNAME}]: Local node partitions, owning

Count of partitions with state OWNING for this cache group located on this node.

JMX jmx["{#JMXOBJ}",LocalNodeOwningPartitionsCount]
Ignite Cache group [{#JMXNAME}]: Partition copies, min

Minimum number of partition copies for all partitions of this cache group.

JMX jmx["{#JMXOBJ}",MinimumNumberOfPartitionCopies]
Ignite Cache group [{#JMXNAME}]: Partition copies, max

Maximum number of partition copies for all partitions of this cache group.

JMX jmx["{#JMXOBJ}",MaximumNumberOfPartitionCopies]
Ignite Thread pool [{#JMXNAME}]: Queue size

Current size of the execution queue.

JMX jmx["{#JMXOBJ}",QueueSize]
Ignite Thread pool [{#JMXNAME}]: Pool size

Current number of threads in the pool.

JMX jmx["{#JMXOBJ}",PoolSize]
Ignite Thread pool [{#JMXNAME}]: Pool size, max

The maximum allowed number of threads.

JMX jmx["{#JMXOBJ}",MaximumPoolSize]
Ignite Thread pool [{#JMXNAME}]: Pool size, core

The core number of threads.

JMX jmx["{#JMXOBJ}",CorePoolSize]
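
Several items above rely on three preprocessing steps: MULTIPLIER (Uptime is scaled by 0.001, which converts milliseconds to seconds), REGEX (the Version item keeps only the part of the string before a trailing "-digits" suffix), and CHANGE_PER_SECOND (a growing counter becomes a rate). The sketch below shows the arithmetic behind each step in plain Python. It is an approximation of the behaviour, not Zabbix code; the helper names and all sample values, including the version string, are invented for the example.

```python
import re
from typing import Optional


def multiplier(value: float, factor: float) -> float:
    """MULTIPLIER: scale the raw value by a constant."""
    return value * factor


def regex_step(value: str, pattern: str, output: str) -> Optional[str]:
    """REGEX: match the pattern and render the output template (e.g. \\1)."""
    match = re.search(pattern, value)
    return match.expand(output) if match else None


def change_per_second(prev_value: float, prev_ts: float,
                      value: float, ts: float) -> float:
    """CHANGE_PER_SECOND: (value - prev_value) / (time - prev_time)."""
    return (value - prev_value) / (ts - prev_ts)


# Uptime reported in milliseconds, stored in seconds.
uptime_seconds = multiplier(90_000, 0.001)

# Hypothetical version string with a numeric suffix to strip.
version = regex_step("2.9.0-12345", r"(.*)-\d+", r"\1")

# A counter that grew from 1000 to 1600 over 60 seconds.
jobs_rate = change_per_second(1000, 0, 1600, 60)

print(uptime_seconds, version, jobs_rate)
```

Items without a CHANGE_PER_SECOND step store the raw counter or gauge value as returned by JMX.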

Triggers

Name Description Expression Severity Dependencies and additional info
Ignite [{#JMXIGNITEINSTANCENAME}]: has been restarted (uptime < 10m)

Uptime is less than 10 minutes

{TEMPLATE_NAME:jmx["{#JMXOBJ}",UpTime].last()}<10m INFO

Manual close: YES

Ignite [{#JMXIGNITEINSTANCENAME}]: Failed to fetch info data (or no data for 10m)

Zabbix has not received data for items for the last 10 minutes.

{TEMPLATE_NAME:jmx["{#JMXOBJ}",UpTime].nodata(10m)}=1 WARNING

Manual close: YES

Ignite [{#JMXIGNITEINSTANCENAME}]: Version has changed (new version: {ITEM.VALUE})

Ignite [{#JMXIGNITEINSTANCENAME}] version has changed. Ack to close.

{TEMPLATE_NAME:jmx["{#JMXOBJ}",FullVersion].diff()}=1 and {TEMPLATE_NAME:jmx["{#JMXOBJ}",FullVersion].strlen()}>0 INFO

Manual close: YES

Ignite [{#JMXIGNITEINSTANCENAME}]: Server node left the topology

One or more server nodes left the topology. Ack to close.

{TEMPLATE_NAME:jmx["{#JMXOBJ}",TotalServerNodes].change()}<0 WARNING

Manual close: YES

Ignite [{#JMXIGNITEINSTANCENAME}]: Server node added to the topology

One or more server nodes were added to the topology. Ack to close.

{TEMPLATE_NAME:jmx["{#JMXOBJ}",TotalServerNodes].change()}>0 INFO

Manual close: YES

Ignite [{#JMXIGNITEINSTANCENAME}]: There are nodes is not in topology

The number of server nodes is greater than the number of baseline nodes, so one or more server nodes are not in the baseline topology. Ack to close.

{TEMPLATE_NAME:jmx["{#JMXOBJ}",TotalServerNodes].last()}>{Ignite by JMX:jmx["{#JMXOBJ}",TotalBaselineNodes].last()} INFO

Manual close: YES

Ignite [{#JMXIGNITEINSTANCENAME}]: Number of queued jobs is too high (over {$IGNITE.JOBS.QUEUE.MAX.WARN} for 15 min)

Number of queued jobs is over {$IGNITE.JOBS.QUEUE.MAX.WARN}.

{TEMPLATE_NAME:jmx["{#JMXOBJ}",CurrentWaitingJobs].min(15m)} > {$IGNITE.JOBS.QUEUE.MAX.WARN} WARNING
Ignite [{#JMXIGNITEINSTANCENAME}]: PME duration is too long (over {$IGNITE.PME.DURATION.MAX.WARN} for 5 min)

PME duration is over {$IGNITE.PME.DURATION.MAX.WARN}ms.

{TEMPLATE_NAME:jmx["{#JMXOBJ}",CurrentPmeDuration].min(5m)} > {$IGNITE.PME.DURATION.MAX.WARN} WARNING

Depends on:

- Ignite [{#JMXIGNITEINSTANCENAME}]: PME duration is too long (over {$IGNITE.PME.DURATION.MAX.HIGH} for 5 min)

Ignite [{#JMXIGNITEINSTANCENAME}]: PME duration is too long (over {$IGNITE.PME.DURATION.MAX.HIGH} for 5 min)

PME duration is over {$IGNITE.PME.DURATION.MAX.HIGH}ms. Looks like PME is hung.

{TEMPLATE_NAME:jmx["{#JMXOBJ}",CurrentPmeDuration].min(5m)} > {$IGNITE.PME.DURATION.MAX.HIGH} HIGH
Ignite [{#JMXIGNITEINSTANCENAME}]: Number of running threads is too high (over {$IGNITE.THREADS.COUNT.MAX.WARN} for 15 min)

Number of running threads is over {$IGNITE.THREADS.COUNT.MAX.WARN}.

{TEMPLATE_NAME:jmx["{#JMXOBJ}",CurrentThreadCount].min(15m)} > {$IGNITE.THREADS.COUNT.MAX.WARN} WARNING

Depends on:

- Ignite [{#JMXIGNITEINSTANCENAME}]: PME duration is too long (over {$IGNITE.PME.DURATION.MAX.HIGH} for 5 min)

Ignite [{#JMXIGNITEINSTANCENAME}]: Coordinator has changed

Ignite [{#JMXIGNITEINSTANCENAME}] coordinator has changed. Ack to close.

{TEMPLATE_NAME:jmx["{#JMXOBJ}",Coordinator].diff()}=1 and {TEMPLATE_NAME:jmx["{#JMXOBJ}",Coordinator].strlen()}>0 WARNING

Manual close: YES

Cache group [{#JMXGROUP}]: There are no success transactions for cache for 5m

-

{TEMPLATE_NAME:jmx["{#JMXOBJ}",CacheTxRollbacks].min(5m)}>0 and {Ignite by JMX:jmx["{#JMXOBJ}",CacheTxCommits].max(5m)}=0 AVERAGE
Cache group [{#JMXGROUP}]: Success transactions less than rollbacks for 5m

-

{TEMPLATE_NAME:jmx["{#JMXOBJ}",CacheTxRollbacks].min(5m)} > {Ignite by JMX:jmx["{#JMXOBJ}",CacheTxCommits].max(5m)} WARNING

Depends on:

- Cache group [{#JMXGROUP}]: There are no success transactions for cache for 5m

Cache group [{#JMXGROUP}]: All entries are in heap

All entries are in heap. You may be using eager queries, which can cause out-of-memory exceptions for big caches. Ack to close.

{TEMPLATE_NAME:jmx["{#JMXOBJ}",CacheSize].last()}={Ignite by JMX:jmx["{#JMXOBJ}",HeapEntriesCount].last()} INFO

Manual close: YES

Data region {#JMXNAME}: Node started to evict pages

You store more data than the region can accommodate. Data has started to move to disk, which can slow down requests. Ack to close.

{TEMPLATE_NAME:jmx["{#JMXOBJ}",EvictionRate].min(5m)}>0 INFO

Manual close: YES

Data region {#JMXNAME}: Data region utilisation is too high (over {$IGNITE.DATA.REGION.PUSED.MAX.WARN} in 5m)

Data region utilization is high. Increase data region size or delete any data.

{TEMPLATE_NAME:jmx["{#JMXOBJ}",OffheapUsedSize].min(5m)}/{Ignite by JMX:jmx["{#JMXOBJ}",OffHeapSize].last()}*100>{$IGNITE.DATA.REGION.PUSED.MAX.WARN} WARNING

Depends on:

- Data region {#JMXNAME}: Data region utilisation is too high (over {$IGNITE.DATA.REGION.PUSED.MAX.HIGH} in 5m)

Data region {#JMXNAME}: Data region utilisation is too high (over {$IGNITE.DATA.REGION.PUSED.MAX.HIGH} in 5m)

Data region utilization is high. Increase data region size or delete any data.

{TEMPLATE_NAME:jmx["{#JMXOBJ}",OffheapUsedSize].min(5m)}/{Ignite by JMX:jmx["{#JMXOBJ}",OffHeapSize].last()}*100>{$IGNITE.DATA.REGION.PUSED.MAX.HIGH} HIGH
Data region {#JMXNAME}: Pages replace rate more than 0

There is more data than DataRegionMaxSize. Cluster started to replace pages in memory. Page replacement can slow down operations.

{TEMPLATE_NAME:jmx["{#JMXOBJ}",PagesReplaceRate].min(5m)}>0 WARNING
Data region {#JMXNAME}: Checkpoint buffer utilization is too high (over {$IGNITE.CHECKPOINT.PUSED.MAX.WARN} in 5m)

Checkpoint buffer utilization is high. Threads will be throttled to avoid buffer overflow. It can be caused by high disk utilization.

{TEMPLATE_NAME:jmx["{#JMXOBJ}",UsedCheckpointBufferSize].min(5m)}/{Ignite by JMX:jmx["{#JMXOBJ}",CheckpointBufferSize].last()}*100>{$IGNITE.CHECKPOINT.PUSED.MAX.WARN} WARNING

Depends on:

- Data region {#JMXNAME}: Checkpoint buffer utilization is too high (over {$IGNITE.CHECKPOINT.PUSED.MAX.HIGH} in 5m)

Data region {#JMXNAME}: Checkpoint buffer utilization is too high (over {$IGNITE.CHECKPOINT.PUSED.MAX.HIGH} in 5m)

Checkpoint buffer utilization is high. Threads will be throttled to avoid buffer overflow. It can be caused by high disk utilization.

{TEMPLATE_NAME:jmx["{#JMXOBJ}",UsedCheckpointBufferSize].min(5m)}/{Ignite by JMX:jmx["{#JMXOBJ}",CheckpointBufferSize].last()}*100>{$IGNITE.CHECKPOINT.PUSED.MAX.HIGH} HIGH
Cache group [{#JMXNAME}]: One or more backups are unavailable

-

{TEMPLATE_NAME:jmx["{#JMXOBJ}",Backups].min(5m)}>={Ignite by JMX:jmx["{#JMXOBJ}",MinimumNumberOfPartitionCopies].max(5m)} WARNING
Cache group [{#JMXNAME}]: List of caches has changed

List of caches has changed. Significant changes have occurred in the cluster. Ack to close.

{TEMPLATE_NAME:jmx["{#JMXOBJ}",Caches].diff()}=1 and {TEMPLATE_NAME:jmx["{#JMXOBJ}",Caches].strlen()}>0 INFO

Manual close: YES

Cache group [{#JMXNAME}]: Rebalance in progress

Ack to close.

{TEMPLATE_NAME:jmx["{#JMXOBJ}",LocalNodeMovingPartitionsCount].max(30m)}>0 INFO

Manual close: YES

Cache group [{#JMXNAME}]: There is no copy for partitions

-

{TEMPLATE_NAME:jmx["{#JMXOBJ}",MinimumNumberOfPartitionCopies].max(30m)}=0 WARNING
Thread pool [{#JMXNAME}]: Too many messages in queue (over {$IGNITE.THREAD.QUEUE.MAX.WARN:"{#JMXNAME}"} for 5 min)

The number of messages in the queue is more than {$IGNITE.THREAD.QUEUE.MAX.WARN:"{#JMXNAME}"}.

{TEMPLATE_NAME:jmx["{#JMXOBJ}",QueueSize].min(5m)} > {$IGNITE.THREAD.QUEUE.MAX.WARN:"{#JMXNAME}"} AVERAGE
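
The utilization triggers above share one pattern: the minimum of the used size over the last 5 minutes is divided by the latest total size, and the WARNING trigger depends on the HIGH trigger. Because min() is used, every sample in the window has to be above the threshold before the trigger fires. The sketch below works through that arithmetic for the data region triggers with the default 80 and 90 percent thresholds. The function names and byte values are invented for the example, and the dependency handling is a simplification of how Zabbix suppresses a dependent problem.

```python
from typing import Dict, List


def utilization_pct(used_samples: List[float], total_last: float) -> float:
    """min(5m) of used bytes divided by the last total, as a percentage."""
    return min(used_samples) / total_last * 100


def data_region_alerts(used_samples: List[float], total_last: float,
                       warn: float = 80, high: float = 90) -> Dict[str, bool]:
    pct = utilization_pct(used_samples, total_last)
    high_fired = pct > high
    # WARNING depends on HIGH, so it is hidden while HIGH is in problem state.
    warn_shown = pct > warn and not high_fired
    return {"warning": warn_shown, "high": high_fired}


# Hypothetical OffheapUsedSize samples from the last 5 minutes and OffHeapSize.
steady = data_region_alerts([850, 870, 860], 1000)
critical = data_region_alerts([950, 940, 960], 1000)
spike = data_region_alerts([700, 990, 710], 1000)  # one spike does not fire

print(steady, critical, spike)
```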

Feedback

Please report any issues with the template at https://support.zabbix.com

You can also provide feedback, discuss the template, or ask for help with it at the ZABBIX forums.
