Source: https://git.zabbix.com/projects/ZBX/repos/zabbix/browse/templates/db/gridgain_jmx?at=release/6.2
GridGain by JMX
Overview
For Zabbix version: 6.2 and higher
Official JMX Template for GridGain In-Memory Computing Platform.
This template is based on the original template developed by Igor Akkuratov, Senior Engineer at GridGain Systems and GridGain In-Memory Computing Platform Contributor.
This template was tested on:
- GridGain, version 8.8.5
Setup
See Zabbix template operation for basic instructions.
This template works with standalone and cluster instances. Metrics are collected by JMX. All metrics are discoverable.
- Enable and configure JMX access to GridGain In-Memory Computing Platform. See documentation for instructions. By default, the JMX tree hierarchy contains a class loader level. Add the JVM option `-DIGNITE_MBEAN_APPEND_CLASS_LOADER_ID=false` to exclude the level with the class loader name. You can choose which cache and data region metrics to enable by following the official guide.
- Set the user name and password in the host macros {$GRIDGAIN.USER} and {$GRIDGAIN.PASSWORD}.
Zabbix configuration
No specific Zabbix configuration is required.
Macros used
Name | Description | Default |
---|---|---|
{$GRIDGAIN.CHECKPOINT.PUSED.MAX.HIGH} | The maximum percent of checkpoint buffer utilization for high trigger expression. | 80 |
{$GRIDGAIN.CHECKPOINT.PUSED.MAX.WARN} | The maximum percent of checkpoint buffer utilization for warning trigger expression. | 66 |
{$GRIDGAIN.DATA.REGION.PUSED.MAX.HIGH} | The maximum percent of data region utilization for high trigger expression. | 90 |
{$GRIDGAIN.DATA.REGION.PUSED.MAX.WARN} | The maximum percent of data region utilization for warning trigger expression. | 80 |
{$GRIDGAIN.JOBS.QUEUE.MAX.WARN} | The maximum number of queued jobs for trigger expression. | 10 |
{$GRIDGAIN.LLD.FILTER.CACHE.MATCHES} | Filter of discoverable cache groups. | `.*` |
{$GRIDGAIN.LLD.FILTER.CACHE.NOT_MATCHES} | Filter to exclude discovered cache groups. | `CHANGE_IF_NEEDED` |
{$GRIDGAIN.LLD.FILTER.DATA.REGION.MATCHES} | Filter of discoverable data regions. | `.*` |
{$GRIDGAIN.LLD.FILTER.DATA.REGION.NOT_MATCHES} | Filter to exclude discovered data regions. | `^(sysMemPlc\|TxLog)$` |
{$GRIDGAIN.LLD.FILTER.THREAD.POOL.MATCHES} | Filter of discoverable thread pools. | `.*` |
{$GRIDGAIN.LLD.FILTER.THREAD.POOL.NOT_MATCHES} | Filter to exclude discovered thread pools. | `^(GridCallbackExecutor\|GridRebalanceStripedExecutor\|GridDataStreamExecutor\|StripedExecutor)$` |
{$GRIDGAIN.PASSWORD} | - | <secret> |
{$GRIDGAIN.PME.DURATION.MAX.HIGH} | The maximum PME duration in ms for high trigger expression. | 60000 |
{$GRIDGAIN.PME.DURATION.MAX.WARN} | The maximum PME duration in ms for warning trigger expression. | 10000 |
{$GRIDGAIN.THREAD.QUEUE.MAX.WARN} | Threshold for thread pool queue size. Can be used with thread pool name as context. | 1000 |
{$GRIDGAIN.THREADS.COUNT.MAX.WARN} | The maximum number of running threads for trigger expression. | 1000 |
{$GRIDGAIN.USER} | - | zabbix |
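The description of {$GRIDGAIN.THREAD.QUEUE.MAX.WARN} says it can be used with the thread pool name as context. As a rough illustration of how a context-specific value takes precedence over the default, here is a minimal Python sketch. The `resolve_macro` helper, the dictionary layout, and the `GridSystemExecutor` override of 5000 are hypothetical and exist only for this example; Zabbix resolves macros internally, and this models only an exact context match with fallback to the default.

```python
# Hypothetical model of user macro resolution with context.
# Keys are (macro name, context); context None holds the default value.
macros = {
    ("{$GRIDGAIN.THREAD.QUEUE.MAX.WARN}", None): "1000",
    # Illustrative per-pool override, not a template default.
    ("{$GRIDGAIN.THREAD.QUEUE.MAX.WARN}", "GridSystemExecutor"): "5000",
}

def resolve_macro(name, context=None):
    """Return the context-specific value if one is defined, else the default."""
    if (name, context) in macros:
        return macros[(name, context)]
    return macros[(name, None)]

name = "{$GRIDGAIN.THREAD.QUEUE.MAX.WARN}"
overridden = resolve_macro(name, "GridSystemExecutor")  # uses the override
fallback = resolve_macro(name, "GridQueryExecutor")     # falls back to the default
```

With such an override in place, only the named thread pool gets the higher threshold, while every other discovered pool keeps the default of 1000.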
Template links
There are no template links in this template.
Discovery rules
Name | Description | Type | Key and additional info |
---|---|---|---|
Cache groups | - | JMX | jmx.discovery[beans,"org.apache:group=\"Cache groups\",*"] Preprocessing: - JAVASCRIPT: - DISCARD_UNCHANGED_HEARTBEAT: Filter: AND - {#JMXNAME} MATCHES_REGEX - {#JMXNAME} NOT_MATCHES_REGEX |
Cache metrics | - | JMX | jmx.discovery[beans,"org.apache:name=\"org.apache.gridgain.internal.processors.cache.CacheLocalMetricsMXBeanImpl\",*"] Preprocessing: - JAVASCRIPT: - DISCARD_UNCHANGED_HEARTBEAT: Filter: AND - {#JMXGROUP} MATCHES_REGEX - {#JMXGROUP} NOT_MATCHES_REGEX |
Cluster metrics | - | JMX | jmx.discovery[beans,"org.apache:group=Kernal,name=ClusterMetricsMXBeanImpl,*"] Preprocessing: - JAVASCRIPT: |
Data region metrics | - | JMX | jmx.discovery[beans,"org.apache:group=DataRegionMetrics,*"] Preprocessing: - JAVASCRIPT: - DISCARD_UNCHANGED_HEARTBEAT: Filter: AND - {#JMXNAME} MATCHES_REGEX - {#JMXNAME} NOT_MATCHES_REGEX |
GridGain kernal metrics | - | JMX | jmx.discovery[beans,"org.apache:group=Kernal,name=IgniteKernal,*"] Preprocessing: - JAVASCRIPT: |
Local node metrics | - | JMX | jmx.discovery[beans,"org.apache:group=Kernal,name=ClusterLocalNodeMetricsMXBeanImpl,*"] Preprocessing: - JAVASCRIPT: |
TCP Communication SPI metrics | - | JMX | jmx.discovery[beans,"org.apache:group=SPIs,name=TcpCommunicationSpi,*"] Preprocessing: - JAVASCRIPT: |
TCP discovery SPI | - | JMX | jmx.discovery[beans,"org.apache:group=SPIs,name=TcpDiscoverySpi,*"] Preprocessing: - JAVASCRIPT: |
Thread pool metrics | - | JMX | jmx.discovery[beans,"org.apache:group=\"Thread Pools\",*"] Preprocessing: - JAVASCRIPT: - DISCARD_UNCHANGED_HEARTBEAT: Filter: AND - {#JMXNAME} MATCHES_REGEX - {#JMXNAME} NOT_MATCHES_REGEX |
Transaction metrics | - | JMX | jmx.discovery[beans,"org.apache:group=TransactionMetrics,name=TransactionMetricsMxBeanImpl,*"] Preprocessing: - JAVASCRIPT: |
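Several discovery rules combine a MATCHES_REGEX and a NOT_MATCHES_REGEX condition with AND. The sketch below models that filter in Python for the data region rule, assuming the two conditions use the {$GRIDGAIN.LLD.FILTER.DATA.REGION.MATCHES} and {$GRIDGAIN.LLD.FILTER.DATA.REGION.NOT_MATCHES} defaults from the macros table. The `passes_filter` helper and the region names `default` and `persistent_region` are hypothetical, and Python's `re` module stands in for the regular expression engine Zabbix uses.

```python
import re

def passes_filter(value, matches, not_matches):
    """Keep a discovered entity only if it matches `matches`
    AND does not match `not_matches`."""
    return bool(re.search(matches, value)) and not re.search(not_matches, value)

# Default data region filter values from the "Macros used" table.
MATCHES = r".*"
NOT_MATCHES = r"^(sysMemPlc|TxLog)$"

# Hypothetical {#JMXNAME} values returned by discovery.
regions = ["default", "sysMemPlc", "TxLog", "persistent_region"]
kept = [r for r in regions if passes_filter(r, MATCHES, NOT_MATCHES)]
print(kept)  # ['default', 'persistent_region']
```

To exclude more entities, extend the NOT_MATCHES macro on the host rather than editing the template.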
Items collected
Group | Name | Description | Type | Key and additional info |
---|---|---|---|---|
GridGain | GridGain [{#JMXIGNITEINSTANCENAME}]: Uptime | Uptime of GridGain instance. | JMX | jmx["{#JMXOBJ}",UpTime] Preprocessing: - MULTIPLIER: |
GridGain | GridGain [{#JMXIGNITEINSTANCENAME}]: Version | Version of GridGain instance. | JMX | jmx["{#JMXOBJ}",FullVersion] Preprocessing: - REGEX: - DISCARD_UNCHANGED_HEARTBEAT: |
GridGain | GridGain [{#JMXIGNITEINSTANCENAME}]: Local node ID | Unique identifier for this node within grid. | JMX | jmx["{#JMXOBJ}",LocalNodeId] Preprocessing: - DISCARD_UNCHANGED_HEARTBEAT: |
GridGain | GridGain [{#JMXIGNITEINSTANCENAME}]: Nodes, Baseline | Total baseline nodes that are registered in the baseline topology. | JMX | jmx["{#JMXOBJ}",TotalBaselineNodes] Preprocessing: - DISCARD_UNCHANGED_HEARTBEAT: |
GridGain | GridGain [{#JMXIGNITEINSTANCENAME}]: Nodes, Active baseline | The number of nodes that are currently active in the baseline topology. | JMX | jmx["{#JMXOBJ}",ActiveBaselineNodes] Preprocessing: - DISCARD_UNCHANGED_HEARTBEAT: |
GridGain | GridGain [{#JMXIGNITEINSTANCENAME}]: Nodes, Client | The number of client nodes in the cluster. | JMX | jmx["{#JMXOBJ}",TotalClientNodes] Preprocessing: - DISCARD_UNCHANGED_HEARTBEAT: |
GridGain | GridGain [{#JMXIGNITEINSTANCENAME}]: Nodes, total | Total number of nodes. | JMX | jmx["{#JMXOBJ}",TotalNodes] Preprocessing: - DISCARD_UNCHANGED_HEARTBEAT: |
GridGain | GridGain [{#JMXIGNITEINSTANCENAME}]: Nodes, Server | The number of server nodes in the cluster. | JMX | jmx["{#JMXOBJ}",TotalServerNodes] Preprocessing: - DISCARD_UNCHANGED_HEARTBEAT: |
GridGain | GridGain [{#JMXIGNITEINSTANCENAME}]: Jobs cancelled, current | Number of cancelled jobs that are still running. | JMX | jmx["{#JMXOBJ}",CurrentCancelledJobs] |
GridGain | GridGain [{#JMXIGNITEINSTANCENAME}]: Jobs rejected, current | Number of jobs rejected after the most recent collision resolution operation. | JMX | jmx["{#JMXOBJ}",CurrentRejectedJobs] |
GridGain | GridGain [{#JMXIGNITEINSTANCENAME}]: Jobs waiting, current | Number of queued jobs currently waiting to be executed. | JMX | jmx["{#JMXOBJ}",CurrentWaitingJobs] |
GridGain | GridGain [{#JMXIGNITEINSTANCENAME}]: Jobs active, current | Number of currently active jobs concurrently executing on the node. | JMX | jmx["{#JMXOBJ}",CurrentActiveJobs] |
GridGain | GridGain [{#JMXIGNITEINSTANCENAME}]: Jobs executed, rate | Total number of jobs handled by the node per second. | JMX | jmx["{#JMXOBJ}",TotalExecutedJobs] Preprocessing: - CHANGE_PER_SECOND |
GridGain | GridGain [{#JMXIGNITEINSTANCENAME}]: Jobs cancelled, rate | Total number of jobs cancelled by the node per second. | JMX | jmx["{#JMXOBJ}",TotalCancelledJobs] Preprocessing: - CHANGE_PER_SECOND |
GridGain | GridGain [{#JMXIGNITEINSTANCENAME}]: Jobs rejects, rate | Total number of jobs this node rejects during collision resolution operations since node startup per second. | JMX | jmx["{#JMXOBJ}",TotalRejectedJobs] Preprocessing: - CHANGE_PER_SECOND |
GridGain | GridGain [{#JMXIGNITEINSTANCENAME}]: PME duration, current | Current PME duration in milliseconds. | JMX | jmx["{#JMXOBJ}",CurrentPmeDuration] |
GridGain | GridGain [{#JMXIGNITEINSTANCENAME}]: Threads count, current | Current number of live threads. | JMX | jmx["{#JMXOBJ}",CurrentThreadCount] |
GridGain | GridGain [{#JMXIGNITEINSTANCENAME}]: Heap memory used | Current heap size that is used for object allocation. | JMX | jmx["{#JMXOBJ}",HeapMemoryUsed] |
GridGain | GridGain [{#JMXIGNITEINSTANCENAME}]: Coordinator | Current coordinator UUID. | JMX | jmx["{#JMXOBJ}",Coordinator] Preprocessing: - DISCARD_UNCHANGED_HEARTBEAT: |
GridGain | GridGain [{#JMXIGNITEINSTANCENAME}]: Nodes left | Nodes left count. | JMX | jmx["{#JMXOBJ}",NodesLeft] |
GridGain | GridGain [{#JMXIGNITEINSTANCENAME}]: Nodes joined | Nodes join count. | JMX | jmx["{#JMXOBJ}",NodesJoined] |
GridGain | GridGain [{#JMXIGNITEINSTANCENAME}]: Nodes failed | Nodes failed count. | JMX | jmx["{#JMXOBJ}",NodesFailed] |
GridGain | GridGain [{#JMXIGNITEINSTANCENAME}]: Discovery message worker queue | Message worker queue current size. | JMX | jmx["{#JMXOBJ}",MessageWorkerQueueSize] |
GridGain | GridGain [{#JMXIGNITEINSTANCENAME}]: Discovery reconnect, rate | Number of times node tries to (re)establish connection to another node per second. | JMX | jmx["{#JMXOBJ}",ReconnectCount] Preprocessing: - CHANGE_PER_SECOND |
GridGain | GridGain [{#JMXIGNITEINSTANCENAME}]: TotalProcessedMessages | The number of messages processed per second. | JMX | jmx["{#JMXOBJ}",TotalProcessedMessages] Preprocessing: - CHANGE_PER_SECOND |
GridGain | GridGain [{#JMXIGNITEINSTANCENAME}]: Discovery messages received, rate | The number of messages received per second. | JMX | jmx["{#JMXOBJ}",TotalReceivedMessages] Preprocessing: - CHANGE_PER_SECOND |
GridGain | GridGain [{#JMXIGNITEINSTANCENAME}]: Communication outbound messages queue | Outbound messages queue size. | JMX | jmx["{#JMXOBJ}",OutboundMessagesQueueSize] |
GridGain | GridGain [{#JMXIGNITEINSTANCENAME}]: Communication messages received, rate | The number of messages received per second. | JMX | jmx["{#JMXOBJ}",ReceivedMessagesCount] Preprocessing: - CHANGE_PER_SECOND |
GridGain | GridGain [{#JMXIGNITEINSTANCENAME}]: Communication messages sent, rate | The number of messages sent per second. | JMX | jmx["{#JMXOBJ}",SentMessagesCount] Preprocessing: - CHANGE_PER_SECOND |
GridGain | GridGain [{#JMXIGNITEINSTANCENAME}]: Communication reconnect rate | Gets maximum number of reconnect attempts used when establishing connection with remote nodes per second. | JMX | jmx["{#JMXOBJ}",ReconnectCount] Preprocessing: - CHANGE_PER_SECOND |
GridGain | GridGain [{#JMXIGNITEINSTANCENAME}]: Locked keys | The number of keys locked on the node. | JMX | jmx["{#JMXOBJ}",LockedKeysNumber] |
GridGain | GridGain [{#JMXIGNITEINSTANCENAME}]: Transactions owner, current | The number of active transactions for which this node is the initiator. | JMX | jmx["{#JMXOBJ}",OwnerTransactionsNumber] |
GridGain | GridGain [{#JMXIGNITEINSTANCENAME}]: Transactions holding lock, current | The number of active transactions holding at least one key lock. | JMX | jmx["{#JMXOBJ}",TransactionsHoldingLockNumber] |
GridGain | GridGain [{#JMXIGNITEINSTANCENAME}]: Transactions rolledback, rate | The number of transactions which were rolled back per second. | JMX | jmx["{#JMXOBJ}",TransactionsRolledBackNumber] |
GridGain | GridGain [{#JMXIGNITEINSTANCENAME}]: Transactions committed, rate | The number of transactions which were committed per second. | JMX | jmx["{#JMXOBJ}",TransactionsCommittedNumber] |
GridGain | Cache group [{#JMXGROUP}]: Cache gets, rate | The number of gets to the cache per second. | JMX | jmx["{#JMXOBJ}",CacheGets] Preprocessing: - CHANGE_PER_SECOND |
GridGain | Cache group [{#JMXGROUP}]: Cache puts, rate | The number of puts to the cache per second. | JMX | jmx["{#JMXOBJ}",CachePuts] Preprocessing: - CHANGE_PER_SECOND |
GridGain | Cache group [{#JMXGROUP}]: Cache removals, rate | The number of removals from the cache per second. | JMX | jmx["{#JMXOBJ}",CacheRemovals] Preprocessing: - CHANGE_PER_SECOND |
GridGain | Cache group [{#JMXGROUP}]: Cache hits, pct | Percentage of successful hits. | JMX | jmx["{#JMXOBJ}",CacheHitPercentage] |
GridGain | Cache group [{#JMXGROUP}]: Cache misses, pct | Percentage of accesses that failed to find anything. | JMX | jmx["{#JMXOBJ}",CacheMissPercentage] |
GridGain | Cache group [{#JMXGROUP}]: Cache transaction commits, rate | The number of transaction commits per second. | JMX | jmx["{#JMXOBJ}",CacheTxCommits] Preprocessing: - CHANGE_PER_SECOND |
GridGain | Cache group [{#JMXGROUP}]: Cache transaction rollbacks, rate | The number of transaction rollbacks per second. | JMX | jmx["{#JMXOBJ}",CacheTxRollbacks] Preprocessing: - CHANGE_PER_SECOND |
GridGain | Cache group [{#JMXGROUP}]: Cache size | The number of non-null values in the cache as a long value. | JMX | jmx["{#JMXOBJ}",CacheSize] |
GridGain | Cache group [{#JMXGROUP}]: Cache heap entries | The number of entries in heap memory. | JMX | jmx["{#JMXOBJ}",HeapEntriesCount] Preprocessing: - CHANGE_PER_SECOND |
GridGain | Data region {#JMXNAME}: Allocation, rate | Allocation rate (pages per second) averaged across rateTimeInterval. | JMX | jmx["{#JMXOBJ}",AllocationRate] |
GridGain | Data region {#JMXNAME}: Allocated, bytes | Total size of memory allocated in bytes. | JMX | jmx["{#JMXOBJ}",TotalAllocatedSize] |
GridGain | Data region {#JMXNAME}: Dirty pages | Number of pages in memory not yet synchronized with persistent storage. | JMX | jmx["{#JMXOBJ}",DirtyPages] |
GridGain | Data region {#JMXNAME}: Eviction, rate | Eviction rate (pages per second). | JMX | jmx["{#JMXOBJ}",EvictionRate] |
GridGain | Data region {#JMXNAME}: Size, max | Maximum memory region size defined by its data region. | JMX | jmx["{#JMXOBJ}",MaxSize] |
GridGain | Data region {#JMXNAME}: Offheap size | Offheap size in bytes. | JMX | jmx["{#JMXOBJ}",OffHeapSize] |
GridGain | Data region {#JMXNAME}: Offheap used size | Total used offheap size in bytes. | JMX | jmx["{#JMXOBJ}",OffheapUsedSize] |
GridGain | Data region {#JMXNAME}: Pages fill factor | The percentage of the used space. | JMX | jmx["{#JMXOBJ}",PagesFillFactor] |
GridGain | Data region {#JMXNAME}: Pages replace, rate | Rate at which pages in memory are replaced with pages from persistent storage (pages per second). | JMX | jmx["{#JMXOBJ}",PagesReplaceRate] |
GridGain | Data region {#JMXNAME}: Used checkpoint buffer size | Used checkpoint buffer size in bytes. | JMX | jmx["{#JMXOBJ}",UsedCheckpointBufferSize] |
GridGain | Data region {#JMXNAME}: Checkpoint buffer size | Total size in bytes for checkpoint buffer. | JMX | jmx["{#JMXOBJ}",CheckpointBufferSize] |
GridGain | Cache group [{#JMXNAME}]: Backups | Count of backups configured for cache group. | JMX | jmx["{#JMXOBJ}",Backups] |
GridGain | Cache group [{#JMXNAME}]: Partitions | Count of partitions for cache group. | JMX | jmx["{#JMXOBJ}",Partitions] |
GridGain | Cache group [{#JMXNAME}]: Caches | List of caches. | JMX | jmx["{#JMXOBJ}",Caches] Preprocessing: - DISCARD_UNCHANGED_HEARTBEAT: |
GridGain | Cache group [{#JMXNAME}]: Local node partitions, moving | Count of partitions with state MOVING for this cache group located on this node. | JMX | jmx["{#JMXOBJ}",LocalNodeMovingPartitionsCount] |
GridGain | Cache group [{#JMXNAME}]: Local node partitions, renting | Count of partitions with state RENTING for this cache group located on this node. | JMX | jmx["{#JMXOBJ}",LocalNodeRentingPartitionsCount] |
GridGain | Cache group [{#JMXNAME}]: Local node entries, renting | Count of entries remaining to evict in RENTING partitions located on this node for this cache group. | JMX | jmx["{#JMXOBJ}",LocalNodeRentingEntriesCount] |
GridGain | Cache group [{#JMXNAME}]: Local node partitions, owning | Count of partitions with state OWNING for this cache group located on this node. | JMX | jmx["{#JMXOBJ}",LocalNodeOwningPartitionsCount] |
GridGain | Cache group [{#JMXNAME}]: Partition copies, min | Minimum number of partition copies for all partitions of this cache group. | JMX | jmx["{#JMXOBJ}",MinimumNumberOfPartitionCopies] |
GridGain | Cache group [{#JMXNAME}]: Partition copies, max | Maximum number of partition copies for all partitions of this cache group. | JMX | jmx["{#JMXOBJ}",MaximumNumberOfPartitionCopies] |
GridGain | Thread pool [{#JMXNAME}]: Queue size | Current size of the execution queue. | JMX | jmx["{#JMXOBJ}",QueueSize] |
GridGain | Thread pool [{#JMXNAME}]: Pool size | Current number of threads in the pool. | JMX | jmx["{#JMXOBJ}",PoolSize] |
GridGain | Thread pool [{#JMXNAME}]: Pool size, max | The maximum allowed number of threads. | JMX | jmx["{#JMXOBJ}",MaximumPoolSize] |
GridGain | Thread pool [{#JMXNAME}]: Pool size, core | The core number of threads. | JMX | jmx["{#JMXOBJ}",CorePoolSize] |
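Many of the items above turn a cumulative JMX counter into a rate with the CHANGE_PER_SECOND preprocessing step. As a minimal Python sketch of that calculation, assuming the documented Zabbix formula (value - prev_value) / (time - prev_time) and that a decreasing counter yields no stored value: the `change_per_second` helper and the sample numbers are hypothetical.

```python
def change_per_second(prev_value, prev_time, value, time):
    """Model of the "Change per second" step:
    (value - prev_value) / (time - prev_time).
    Returns None when no rate can be stored, i.e. the counter
    went backwards or the timestamps do not advance."""
    if time <= prev_time or value < prev_value:
        return None
    return (value - prev_value) / (time - prev_time)

# Hypothetical TotalExecutedJobs samples taken 60 seconds apart.
rate = change_per_second(prev_value=1200, prev_time=0, value=1500, time=60)
print(rate)  # 5.0 jobs per second

# After a node restart the counter resets, so no rate is produced.
after_restart = change_per_second(prev_value=1500, prev_time=60, value=10, time=120)
```

This is why the rate items report jobs, messages, or cache operations per second rather than totals since node startup.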
Triggers
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
GridGain [{#JMXIGNITEINSTANCENAME}]: has been restarted | Uptime is less than 10 minutes. | last(/GridGain by JMX/jmx["{#JMXOBJ}",UpTime])<10m | INFO | Manual close: YES |
GridGain [{#JMXIGNITEINSTANCENAME}]: Failed to fetch info data | Zabbix has not received data for items for the last 10 minutes. | nodata(/GridGain by JMX/jmx["{#JMXOBJ}",UpTime],10m)=1 | WARNING | Manual close: YES |
GridGain [{#JMXIGNITEINSTANCENAME}]: Version has changed | GridGain [{#JMXIGNITEINSTANCENAME}] version has changed. Ack to close. | last(/GridGain by JMX/jmx["{#JMXOBJ}",FullVersion],#1)<>last(/GridGain by JMX/jmx["{#JMXOBJ}",FullVersion],#2) and length(last(/GridGain by JMX/jmx["{#JMXOBJ}",FullVersion]))>0 | INFO | Manual close: YES |
GridGain [{#JMXIGNITEINSTANCENAME}]: Server node left the topology | One or more server nodes left the topology. Ack to close. | change(/GridGain by JMX/jmx["{#JMXOBJ}",TotalServerNodes])<0 | WARNING | Manual close: YES |
GridGain [{#JMXIGNITEINSTANCENAME}]: Server node added to the topology | One or more server nodes were added to the topology. Ack to close. | change(/GridGain by JMX/jmx["{#JMXOBJ}",TotalServerNodes])>0 | INFO | Manual close: YES |
GridGain [{#JMXIGNITEINSTANCENAME}]: There are nodes is not in topology | The number of server nodes exceeds the number of baseline nodes, so one or more server nodes are not in the baseline topology. Ack to close. | last(/GridGain by JMX/jmx["{#JMXOBJ}",TotalServerNodes])>last(/GridGain by JMX/jmx["{#JMXOBJ}",TotalBaselineNodes]) | INFO | Manual close: YES |
GridGain [{#JMXIGNITEINSTANCENAME}]: Number of queued jobs is too high | Number of queued jobs is over {$GRIDGAIN.JOBS.QUEUE.MAX.WARN}. | min(/GridGain by JMX/jmx["{#JMXOBJ}",CurrentWaitingJobs],15m) > {$GRIDGAIN.JOBS.QUEUE.MAX.WARN} | WARNING | |
GridGain [{#JMXIGNITEINSTANCENAME}]: PME duration is too long | PME duration is over {$GRIDGAIN.PME.DURATION.MAX.WARN}ms. | min(/GridGain by JMX/jmx["{#JMXOBJ}",CurrentPmeDuration],5m) > {$GRIDGAIN.PME.DURATION.MAX.WARN} | WARNING | Depends on: - GridGain [{#JMXIGNITEINSTANCENAME}]: PME duration is too long |
GridGain [{#JMXIGNITEINSTANCENAME}]: PME duration is too long | PME duration is over {$GRIDGAIN.PME.DURATION.MAX.HIGH}ms. Looks like PME is hung. | min(/GridGain by JMX/jmx["{#JMXOBJ}",CurrentPmeDuration],5m) > {$GRIDGAIN.PME.DURATION.MAX.HIGH} | HIGH | |
GridGain [{#JMXIGNITEINSTANCENAME}]: Number of running threads is too high | Number of running threads is over {$GRIDGAIN.THREADS.COUNT.MAX.WARN}. | min(/GridGain by JMX/jmx["{#JMXOBJ}",CurrentThreadCount],15m) > {$GRIDGAIN.THREADS.COUNT.MAX.WARN} | WARNING | Depends on: - GridGain [{#JMXIGNITEINSTANCENAME}]: PME duration is too long |
GridGain [{#JMXIGNITEINSTANCENAME}]: Coordinator has changed | GridGain [{#JMXIGNITEINSTANCENAME}] coordinator has changed. Ack to close. | last(/GridGain by JMX/jmx["{#JMXOBJ}",Coordinator],#1)<>last(/GridGain by JMX/jmx["{#JMXOBJ}",Coordinator],#2) and length(last(/GridGain by JMX/jmx["{#JMXOBJ}",Coordinator]))>0 | WARNING | Manual close: YES |
Cache group [{#JMXGROUP}]: There are no success transactions for cache for 5m | - | min(/GridGain by JMX/jmx["{#JMXOBJ}",CacheTxRollbacks],5m)>0 and max(/GridGain by JMX/jmx["{#JMXOBJ}",CacheTxCommits],5m)=0 | AVERAGE | |
Cache group [{#JMXGROUP}]: Success transactions less than rollbacks for 5m | - | min(/GridGain by JMX/jmx["{#JMXOBJ}",CacheTxRollbacks],5m) > max(/GridGain by JMX/jmx["{#JMXOBJ}",CacheTxCommits],5m) | WARNING | Depends on: - Cache group [{#JMXGROUP}]: There are no success transactions for cache for 5m |
Cache group [{#JMXGROUP}]: All entries are in heap | All entries are in heap. You may be using eager queries, which can cause out-of-memory exceptions for big caches. Ack to close. | last(/GridGain by JMX/jmx["{#JMXOBJ}",CacheSize])=last(/GridGain by JMX/jmx["{#JMXOBJ}",HeapEntriesCount]) | INFO | Manual close: YES |
Data region {#JMXNAME}: Node started to evict pages | You store more data than the region can accommodate. Data has started to move to disk, which can slow down requests. Ack to close. | min(/GridGain by JMX/jmx["{#JMXOBJ}",EvictionRate],5m)>0 | INFO | Manual close: YES |
Data region {#JMXNAME}: Data region utilization is too high | Data region utilization is high. Increase data region size or delete any data. | min(/GridGain by JMX/jmx["{#JMXOBJ}",OffheapUsedSize],5m)/last(/GridGain by JMX/jmx["{#JMXOBJ}",OffHeapSize])*100>{$GRIDGAIN.DATA.REGION.PUSED.MAX.WARN} | WARNING | Depends on: - Data region {#JMXNAME}: Data region utilization is too high |
Data region {#JMXNAME}: Data region utilization is too high | Data region utilization is high. Increase data region size or delete any data. | min(/GridGain by JMX/jmx["{#JMXOBJ}",OffheapUsedSize],5m)/last(/GridGain by JMX/jmx["{#JMXOBJ}",OffHeapSize])*100>{$GRIDGAIN.DATA.REGION.PUSED.MAX.HIGH} | HIGH | |
Data region {#JMXNAME}: Pages replace rate more than 0 | There is more data than DataRegionMaxSize. Cluster started to replace pages in memory. Page replacement can slow down operations. | min(/GridGain by JMX/jmx["{#JMXOBJ}",PagesReplaceRate],5m)>0 | WARNING | |
Data region {#JMXNAME}: Checkpoint buffer utilization is too high | Checkpoint buffer utilization is high. Threads will be throttled to avoid buffer overflow. It can be caused by high disk utilization. | min(/GridGain by JMX/jmx["{#JMXOBJ}",UsedCheckpointBufferSize],5m)/last(/GridGain by JMX/jmx["{#JMXOBJ}",CheckpointBufferSize])*100>{$GRIDGAIN.CHECKPOINT.PUSED.MAX.WARN} | WARNING | Depends on: - Data region {#JMXNAME}: Checkpoint buffer utilization is too high |
Data region {#JMXNAME}: Checkpoint buffer utilization is too high | Checkpoint buffer utilization is high. Threads will be throttled to avoid buffer overflow. It can be caused by high disk utilization. | min(/GridGain by JMX/jmx["{#JMXOBJ}",UsedCheckpointBufferSize],5m)/last(/GridGain by JMX/jmx["{#JMXOBJ}",CheckpointBufferSize])*100>{$GRIDGAIN.CHECKPOINT.PUSED.MAX.HIGH} | HIGH | |
Cache group [{#JMXNAME}]: One or more backups are unavailable | - | min(/GridGain by JMX/jmx["{#JMXOBJ}",Backups],5m)>=max(/GridGain by JMX/jmx["{#JMXOBJ}",MinimumNumberOfPartitionCopies],5m) | WARNING | |
Cache group [{#JMXNAME}]: List of caches has changed | List of caches has changed. Significant changes have occurred in the cluster. Ack to close. | last(/GridGain by JMX/jmx["{#JMXOBJ}",Caches],#1)<>last(/GridGain by JMX/jmx["{#JMXOBJ}",Caches],#2) and length(last(/GridGain by JMX/jmx["{#JMXOBJ}",Caches]))>0 | INFO | Manual close: YES |
Cache group [{#JMXNAME}]: Rebalance in progress | Ack to close. | max(/GridGain by JMX/jmx["{#JMXOBJ}",LocalNodeMovingPartitionsCount],30m)>0 | INFO | Manual close: YES |
Cache group [{#JMXNAME}]: There is no copy for partitions | - | max(/GridGain by JMX/jmx["{#JMXOBJ}",MinimumNumberOfPartitionCopies],30m)=0 | WARNING | |
Thread pool [{#JMXNAME}]: Too many messages in queue | Number of messages in queue is more than {$GRIDGAIN.THREAD.QUEUE.MAX.WARN:"{#JMXNAME}"}. | min(/GridGain by JMX/jmx["{#JMXOBJ}",QueueSize],5m) > {$GRIDGAIN.THREAD.QUEUE.MAX.WARN:"{#JMXNAME}"} | AVERAGE | |
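The utilization triggers divide the minimum of the used size over 5 minutes by the last known total size and compare the percentage with a macro threshold. The following Python sketch works through that arithmetic for the data region triggers. The `utilization_trigger` helper and the byte values are hypothetical; the thresholds are the {$GRIDGAIN.DATA.REGION.PUSED.MAX.WARN} and {$GRIDGAIN.DATA.REGION.PUSED.MAX.HIGH} defaults.

```python
def utilization_trigger(used_samples, total_last, threshold_pct):
    """Model of: min(used,5m) / last(total) * 100 > threshold."""
    return min(used_samples) / total_last * 100 > threshold_pct

# Hypothetical OffheapUsedSize samples collected over 5 minutes, in bytes.
used = [870_000_000, 850_000_000, 910_000_000]
# Hypothetical last OffHeapSize value, in bytes.
total = 1_000_000_000

warn = utilization_trigger(used, total, 80)  # WARN default: 80
high = utilization_trigger(used, total, 90)  # HIGH default: 90
print(warn, high)  # True False
```

Because the expression uses min() over the window, the lowest sample (85% here) must exceed the threshold, so a short spike above 90% does not fire the HIGH trigger on its own.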
Feedback
Please report any issues with the template at https://support.zabbix.com
You can also provide feedback, discuss the template or ask for help with it at ZABBIX forums.