Elasticsearch

Elasticsearch is a search engine based on the Lucene library. It provides a distributed, multitenant-capable full-text search engine with an HTTP web interface and schema-free JSON documents. Elasticsearch is developed in Java.

Available solutions




Source: https://git.zabbix.com/projects/ZBX/repos/zabbix/browse/templates/app/elasticsearch_http


Template App Elasticsearch Cluster by HTTP

Overview

For Zabbix version: 5.0
This template monitors Elasticsearch with Zabbix and works without any external scripts. It works with both standalone and cluster instances. The metrics are collected remotely in one pass by an HTTP agent, which reads values from the REST API endpoints _cluster/health, _cluster/stats, and _nodes/stats.

This template was tested on:

  • Zabbix, version 5.0
  • Elasticsearch, versions 6.5 through 7.6

Setup

You can set the {$ELASTICSEARCH.USERNAME} and {$ELASTICSEARCH.PASSWORD} macros in the template for use at the host level. If the ES API is not at the default location, remember to change the {$ELASTICSEARCH.SCHEME} and {$ELASTICSEARCH.PORT} macros.
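
As a rough illustration of how the scheme and port macros combine with the host address, the sketch below builds the three endpoint URLs named in the overview. The function name `buildEsUrls` and the sample host are hypothetical, and the exact URL strings configured in the template's HTTP agent items may differ (for example, they may carry extra query parameters).

```javascript
// Hypothetical helper: joins scheme, host and port into the three REST
// endpoints named in the overview. Not taken from the template itself.
function buildEsUrls(scheme, host, port) {
    var base = scheme + '://' + host + ':' + port;
    return {
        health: base + '/_cluster/health',
        stats: base + '/_cluster/stats',
        nodes: base + '/_nodes/stats'
    };
}

// Defaults from the macro table: scheme "http", port 9200.
// The host name is a placeholder.
var urls = buildEsUrls('http', 'es1.example.com', '9200');
console.log(urls.health);
```

Changing only the scheme and port arguments mirrors what overriding {$ELASTICSEARCH.SCHEME} and {$ELASTICSEARCH.PORT} on a host is meant to achieve.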

Zabbix configuration

No specific Zabbix configuration is required.

Macros used

Name Description Default
{$ELASTICSEARCH.FETCH_LATENCY.MAX.WARN}

Maximum fetch latency in milliseconds for the trigger expression.

100
{$ELASTICSEARCH.FLUSH_LATENCY.MAX.WARN}

Maximum flush latency in milliseconds for the trigger expression.

100
{$ELASTICSEARCH.HEAP_USED.MAX.CRIT}

The maximum percentage of JVM heap in use for the critical trigger expression.

95
{$ELASTICSEARCH.HEAP_USED.MAX.WARN}

The maximum percentage of JVM heap in use for the warning trigger expression.

85
{$ELASTICSEARCH.INDEXING_LATENCY.MAX.WARN}

Maximum indexing latency in milliseconds for the trigger expression.

100
{$ELASTICSEARCH.PASSWORD}

The password for Elasticsearch.

``
{$ELASTICSEARCH.PORT}

The port of the Elasticsearch host.

9200
{$ELASTICSEARCH.QUERY_LATENCY.MAX.WARN}

Maximum query latency in milliseconds for the trigger expression.

100
{$ELASTICSEARCH.RESPONSE_TIME.MAX.WARN}

The maximum ES cluster response time in seconds for the trigger expression.

10s
{$ELASTICSEARCH.SCHEME}

The scheme used by Elasticsearch (http/https).

http
{$ELASTICSEARCH.USERNAME}

The username for Elasticsearch.

``

Template links

There are no template links in this template.

Discovery rules

Name Description Type Key and additional info
Cluster nodes discovery

Discovers the ES cluster nodes.

HTTP_AGENT es.nodes.discovery

Preprocessing:

- JSONPATH: $.nodes.[*]

- DISCARD_UNCHANGED_HEARTBEAT: 1d
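
To make the discovery step more concrete, here is a minimal sketch of what selecting `$.nodes.[*]` and turning the result into discovery rows could look like. The sample response, the node IDs, and the function names are made up, and the mapping of each node's `name` field to {#ES.NODE} is an assumption inferred from the per-node JSONPath filters used by the items; it is not spelled out in this document.

```javascript
// Sample shaped like a trimmed node-listing response; IDs and names are made up.
var response = {
    nodes: {
        'aXb1': { name: 'es-node-1' },
        'cYd2': { name: 'es-node-2' }
    }
};

// Rough equivalent of JSONPath $.nodes.[*]: the values of the "nodes" object.
function listNodes(resp) {
    return Object.keys(resp.nodes).map(function (id) {
        return resp.nodes[id];
    });
}

// Assumption: each discovered node's "name" becomes the {#ES.NODE} macro.
function toDiscoveryRows(nodes) {
    return nodes.map(function (node) {
        return { '{#ES.NODE}': node.name };
    });
}

var rows = toDiscoveryRows(listNodes(response));
console.log(JSON.stringify(rows));
```

Each row would then yield one set of per-node items and triggers, with {#ES.NODE} replaced by the node name.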

Items collected

Group Name Description Type Key and additional info
ES_cluster ES: Service status

Checks if the service is running and accepting TCP connections.

SIMPLE net.tcp.service["{$ELASTICSEARCH.SCHEME}","{HOST.CONN}","{$ELASTICSEARCH.PORT}"]

Preprocessing:

- DISCARD_UNCHANGED_HEARTBEAT: 10m

ES_cluster ES: Service response time

Checks performance of the TCP service.

SIMPLE net.tcp.service.perf["{$ELASTICSEARCH.SCHEME}","{HOST.CONN}","{$ELASTICSEARCH.PORT}"]
ES_cluster ES: Cluster health status

Health status of the cluster, based on the state of its primary and replica shards. Statuses are:

green

All shards are assigned.

yellow

All primary shards are assigned, but one or more replica shards are unassigned. If a node in the cluster fails, some data could be unavailable until that node is repaired.

red

One or more primary shards are unassigned, so some data is unavailable. This can occur briefly during cluster startup as primary shards are assigned.

DEPENDENT es.cluster.status

Preprocessing:

- JSONPATH: $.status

- JAVASCRIPT: var state = ['green', 'yellow', 'red']; return state.indexOf(value.trim()) === -1 ? 255 : state.indexOf(value.trim());

- DISCARD_UNCHANGED_HEARTBEAT: 1h
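
The JAVASCRIPT preprocessing step above maps the textual status to a number. The sketch below wraps the same expression in a function (the name `statusToCode` is made up for illustration) so the mapping can be checked outside Zabbix:

```javascript
// Same logic as the JAVASCRIPT preprocessing step, wrapped in a function
// so it can be run outside Zabbix.
function statusToCode(value) {
    var state = ['green', 'yellow', 'red'];
    var index = state.indexOf(value.trim());
    return index === -1 ? 255 : index;
}

console.log(statusToCode('green'));   // 0
console.log(statusToCode('yellow'));  // 1
console.log(statusToCode('red '));    // 2, surrounding whitespace is trimmed
console.log(statusToCode('purple'));  // 255, unrecognized status
```

The values 1, 2, and 255 are the ones the "Health is YELLOW", "Health is RED", and "Health is UNKNOWN" triggers compare against.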

ES_cluster ES: Number of nodes

The number of nodes within the cluster.

DEPENDENT es.cluster.number_of_nodes

Preprocessing:

- JSONPATH: $.number_of_nodes

- DISCARD_UNCHANGED_HEARTBEAT: 1h

ES_cluster ES: Number of data nodes

The number of nodes that are dedicated data nodes.

DEPENDENT es.cluster.number_of_data_nodes

Preprocessing:

- JSONPATH: $.number_of_data_nodes

- DISCARD_UNCHANGED_HEARTBEAT: 1h

ES_cluster ES: Number of relocating shards

The number of shards that are under relocation.

DEPENDENT es.cluster.relocating_shards

Preprocessing:

- JSONPATH: $.relocating_shards

ES_cluster ES: Number of initializing shards

The number of shards that are under initialization.

DEPENDENT es.cluster.initializing_shards

Preprocessing:

- JSONPATH: $.initializing_shards

ES_cluster ES: Number of unassigned shards

The number of shards that are not allocated.

DEPENDENT es.cluster.unassigned_shards

Preprocessing:

- JSONPATH: $.unassigned_shards

ES_cluster ES: Delayed unassigned shards

The number of shards whose allocation has been delayed by the timeout settings.

DEPENDENT es.cluster.delayed_unassigned_shards

Preprocessing:

- JSONPATH: $.delayed_unassigned_shards

ES_cluster ES: Number of pending tasks

The number of cluster-level changes that have not yet been executed.

DEPENDENT es.cluster.number_of_pending_tasks

Preprocessing:

- JSONPATH: $.number_of_pending_tasks

ES_cluster ES: Task max waiting in queue

The time, in seconds, that the earliest initiated task has been waiting to be performed.

DEPENDENT es.cluster.task_max_waiting_in_queue

Preprocessing:

- JSONPATH: $.task_max_waiting_in_queue_millis

- MULTIPLIER: 0.001

ES_cluster ES: Inactive shards percentage

The ratio of inactive shards in the cluster expressed as a percentage.

DEPENDENT es.cluster.inactive_shards_percent_as_number

Preprocessing:

- JSONPATH: $.active_shards_percent_as_number

- JAVASCRIPT: return (100 - value)

ES_cluster ES: Cluster uptime

Uptime in seconds since the JVM last started.

DEPENDENT es.nodes.jvm.max_uptime[{#ES.NODE}]

Preprocessing:

- JSONPATH: $.nodes.jvm.max_uptime_in_millis

- MULTIPLIER: 0.001

ES_cluster ES: Number of non-deleted documents

The total number of non-deleted documents across all primary shards assigned to the selected nodes.

This number is based on the documents in Lucene segments and may include the documents from nested fields.

DEPENDENT es.indices.docs.count

Preprocessing:

- JSONPATH: $.indices.docs.count

- DISCARD_UNCHANGED_HEARTBEAT: 1h

ES_cluster ES: Indices with shards assigned to nodes

The total number of indices with shards assigned to the selected nodes.

DEPENDENT es.indices.count

Preprocessing:

- JSONPATH: $.indices.count

- DISCARD_UNCHANGED_HEARTBEAT: 1h

ES_cluster ES: Total size of all file stores

The total size in bytes of all file stores across all selected nodes.

DEPENDENT es.nodes.fs.total_in_bytes

Preprocessing:

- JSONPATH: $.nodes.fs.total_in_bytes

- DISCARD_UNCHANGED_HEARTBEAT: 1h

ES_cluster ES: Total available size to JVM in all file stores

The total number of bytes available to JVM in the file stores across all selected nodes.

Depending on OS or process-level restrictions, this number may be less than nodes.fs.free_in_bytes.

This is the actual amount of free disk space the selected Elasticsearch nodes can use.

DEPENDENT es.nodes.fs.available_in_bytes

Preprocessing:

- JSONPATH: $.nodes.fs.available_in_bytes

- DISCARD_UNCHANGED_HEARTBEAT: 1h

ES_cluster ES: Nodes with the data role

The number of selected nodes with the data role.

DEPENDENT es.nodes.count.data

Preprocessing:

- JSONPATH: $.nodes.count.data

- DISCARD_UNCHANGED_HEARTBEAT: 1h

ES_cluster ES: Nodes with the ingest role

The number of selected nodes with the ingest role.

DEPENDENT es.nodes.count.ingest

Preprocessing:

- JSONPATH: $.nodes.count.ingest

- DISCARD_UNCHANGED_HEARTBEAT: 1h

ES_cluster ES: Nodes with the master role

The number of selected nodes with the master role.

DEPENDENT es.nodes.count.master

Preprocessing:

- JSONPATH: $.nodes.count.master

- DISCARD_UNCHANGED_HEARTBEAT: 1h

ES_cluster ES {#ES.NODE}: Total size

Total size (in bytes) of all file stores.

DEPENDENT es.node.fs.total.total_in_bytes[{#ES.NODE}]

Preprocessing:

- JSONPATH: $..[?(@.name=='{#ES.NODE}')].fs.total.total_in_bytes.first()

- DISCARD_UNCHANGED_HEARTBEAT: 1d
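
The per-node items all use the same JSONPath pattern: find the object whose `name` equals {#ES.NODE}, follow a field path, and take the first match. A simplified sketch of that lookup is shown below. It only inspects the direct children of `nodes` rather than searching at every depth as `$..` does, and the sample data and the function name `firstNodeValue` are hypothetical.

```javascript
// Made-up excerpt shaped like a _nodes/stats response.
var stats = {
    nodes: {
        'aXb1': { name: 'es-node-1', fs: { total: { total_in_bytes: 1000 } } },
        'cYd2': { name: 'es-node-2', fs: { total: { total_in_bytes: 2000 } } }
    }
};

// Simplified stand-in for $..[?(@.name=='NAME')].a.b.c.first():
// looks only at the direct children of "nodes", not at every depth.
function firstNodeValue(resp, nodeName, path) {
    var ids = Object.keys(resp.nodes);
    for (var i = 0; i < ids.length; i++) {
        var node = resp.nodes[ids[i]];
        if (node.name !== nodeName) {
            continue;
        }
        var value = node;
        for (var j = 0; j < path.length && value !== undefined && value !== null; j++) {
            value = value[path[j]];
        }
        if (value !== undefined) {
            return value;
        }
    }
    return undefined; // no matching node or field
}

console.log(firstNodeValue(stats, 'es-node-2', ['fs', 'total', 'total_in_bytes']));
```

Only the field path changes from one per-node item to the next; the name filter stays the same.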

ES_cluster ES {#ES.NODE}: Total available size

The total number of bytes available to this Java virtual machine on all file stores.

Depending on OS or process level restrictions, this might appear less than fs.total.free_in_bytes.

This is the actual amount of free disk space the Elasticsearch node can utilize.

DEPENDENT es.node.fs.total.available_in_bytes[{#ES.NODE}]

Preprocessing:

- JSONPATH: $..[?(@.name=='{#ES.NODE}')].fs.total.available_in_bytes.first()

- DISCARD_UNCHANGED_HEARTBEAT: 1h

ES_cluster ES {#ES.NODE}: Node uptime

JVM uptime in seconds.

DEPENDENT es.node.jvm.uptime[{#ES.NODE}]

Preprocessing:

- JSONPATH: $..[?(@.name=='{#ES.NODE}')].jvm.uptime_in_millis.first()

- MULTIPLIER: 0.001

ES_cluster ES {#ES.NODE}: Maximum JVM memory available for use

The maximum amount of memory, in bytes, available for use by the heap.

DEPENDENT es.node.jvm.mem.heap_max_in_bytes[{#ES.NODE}]

Preprocessing:

- JSONPATH: $..[?(@.name=='{#ES.NODE}')].jvm.mem.heap_max_in_bytes.first()

- DISCARD_UNCHANGED_HEARTBEAT: 1d

ES_cluster ES {#ES.NODE}: Amount of JVM heap currently in use

The memory, in bytes, currently in use by the heap.

DEPENDENT es.node.jvm.mem.heap_used_in_bytes[{#ES.NODE}]

Preprocessing:

- JSONPATH: $..[?(@.name=='{#ES.NODE}')].jvm.mem.heap_used_in_bytes.first()

- DISCARD_UNCHANGED_HEARTBEAT: 1h

ES_cluster ES {#ES.NODE}: Percent of JVM heap currently in use

The percentage of memory currently in use by the heap.

DEPENDENT es.node.jvm.mem.heap_used_percent[{#ES.NODE}]

Preprocessing:

- JSONPATH: $..[?(@.name=='{#ES.NODE}')].jvm.mem.heap_used_percent.first()

- DISCARD_UNCHANGED_HEARTBEAT: 1h

ES_cluster ES {#ES.NODE}: Amount of JVM heap committed

The amount of memory, in bytes, available for use by the heap.

DEPENDENT es.node.jvm.mem.heap_committed_in_bytes[{#ES.NODE}]

Preprocessing:

- JSONPATH: $..[?(@.name=='{#ES.NODE}')].jvm.mem.heap_committed_in_bytes.first()

- DISCARD_UNCHANGED_HEARTBEAT: 1h

ES_cluster ES {#ES.NODE}: Number of open HTTP connections

The number of currently open HTTP connections for the node.

DEPENDENT es.node.http.current_open[{#ES.NODE}]

Preprocessing:

- JSONPATH: $..[?(@.name=='{#ES.NODE}')].http.current_open.first()

- DISCARD_UNCHANGED_HEARTBEAT: 1h

ES_cluster ES {#ES.NODE}: Rate of HTTP connections opened

The number of HTTP connections opened for the node per second.

DEPENDENT es.node.http.opened.rate[{#ES.NODE}]

Preprocessing:

- JSONPATH: $..[?(@.name=='{#ES.NODE}')].http.total_opened.first()

- CHANGE_PER_SECOND
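
CHANGE_PER_SECOND turns a growing counter such as `http.total_opened` into a rate. A minimal sketch of that calculation follows, assuming the usual formula (value - previous value) / (time - previous time) described in the Zabbix preprocessing documentation. The sample numbers and the function name are hypothetical.

```javascript
// Change per second between two samples of a growing counter.
// Timestamps are in seconds.
function changePerSecond(prev, curr) {
    return (curr.value - prev.value) / (curr.time - prev.time);
}

// Hypothetical samples of http.total_opened taken 60 seconds apart.
var previous = { time: 1000, value: 500 };
var current = { time: 1060, value: 620 };

console.log(changePerSecond(previous, current)); // 2 connections per second
```

The same step is applied to the query, fetch, refresh, and thread pool counters in the items that follow.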

ES_cluster ES {#ES.NODE}: Time spent throttling operations

Time in seconds spent throttling operations during the last measurement interval.

DEPENDENT es.node.indices.indexing.throttle_time[{#ES.NODE}]

Preprocessing:

- JSONPATH: $..[?(@.name=='{#ES.NODE}')].indices.indexing.throttle_time_in_millis.first()

- MULTIPLIER: 0.001

- SIMPLE_CHANGE

ES_cluster ES {#ES.NODE}: Time spent throttling recovery operations

Time in seconds spent throttling recovery operations during the last measurement interval.

DEPENDENT es.node.indices.recovery.throttle_time[{#ES.NODE}]

Preprocessing:

- JSONPATH: $..[?(@.name=='{#ES.NODE}')].indices.recovery.throttle_time_in_millis.first()

- MULTIPLIER: 0.001

- SIMPLE_CHANGE

ES_cluster ES {#ES.NODE}: Time spent throttling merge operations

Time in seconds spent throttling merge operations during the last measurement interval.

DEPENDENT es.node.indices.merges.total_throttled_time[{#ES.NODE}]

Preprocessing:

- JSONPATH: $..[?(@.name=='{#ES.NODE}')].indices.merges.total_throttled_time_in_millis.first()

- MULTIPLIER: 0.001

- SIMPLE_CHANGE

ES_cluster ES {#ES.NODE}: Rate of queries

The number of query operations per second.

DEPENDENT es.node.indices.search.query.rate[{#ES.NODE}]

Preprocessing:

- JSONPATH: $..[?(@.name=='{#ES.NODE}')].indices.search.query_total.first()

- CHANGE_PER_SECOND

ES_cluster ES {#ES.NODE}: Time spent performing query

Time in seconds spent performing query operations during the last measurement interval.

DEPENDENT es.node.indices.search.query_time[{#ES.NODE}]

Preprocessing:

- JSONPATH: $..[?(@.name=='{#ES.NODE}')].indices.search.query_time_in_millis.first()

- MULTIPLIER: 0.001

- SIMPLE_CHANGE

ES_cluster ES {#ES.NODE}: Query latency

The average query latency calculated by sampling the total number of queries and the total elapsed time at regular intervals.

CALCULATED es.node.indices.search.query_latency[{#ES.NODE}]

Expression:

last(es.node.indices.search.query_time_in_millis[{#ES.NODE}]) / ( last(es.node.indices.search.query_total[{#ES.NODE}]) + (last(es.node.indices.search.query_total[{#ES.NODE}]) = 0) )
ES_cluster ES {#ES.NODE}: Current query operations

The number of query operations currently running.

DEPENDENT es.node.indices.search.query_current[{#ES.NODE}]

Preprocessing:

- JSONPATH: $..[?(@.name=='{#ES.NODE}')].indices.search.query_current.first()

ES_cluster ES {#ES.NODE}: Rate of fetch

The number of fetch operations per second.

DEPENDENT es.node.indices.search.fetch.rate[{#ES.NODE}]

Preprocessing:

- JSONPATH: $..[?(@.name=='{#ES.NODE}')].indices.search.fetch_total.first()

- CHANGE_PER_SECOND

ES_cluster ES {#ES.NODE}: Time spent performing fetch

Time in seconds spent performing fetch operations during the last measurement interval.

DEPENDENT es.node.indices.search.fetch_time[{#ES.NODE}]

Preprocessing:

- JSONPATH: $..[?(@.name=='{#ES.NODE}')].indices.search.fetch_time_in_millis.first()

- MULTIPLIER: 0.001

- SIMPLE_CHANGE

ES_cluster ES {#ES.NODE}: Fetch latency

The average fetch latency calculated by sampling the total number of fetches and the total elapsed time at regular intervals.

CALCULATED es.node.indices.search.fetch_latency[{#ES.NODE}]

Expression:

last(es.node.indices.search.fetch_time_in_millis[{#ES.NODE}]) / ( last(es.node.indices.search.fetch_total[{#ES.NODE}]) + (last(es.node.indices.search.fetch_total[{#ES.NODE}]) = 0) )
ES_cluster ES {#ES.NODE}: Current fetch operations

The number of fetch operations currently running.

DEPENDENT es.node.indices.search.fetch_current[{#ES.NODE}]

Preprocessing:

- JSONPATH: $..[?(@.name=='{#ES.NODE}')].indices.search.fetch_current.first()

ES_cluster ES {#ES.NODE}: Write thread pool executor tasks completed

The number of tasks completed by the write thread pool executor.

DEPENDENT es.node.thread_pool.write.completed.rate[{#ES.NODE}]

Preprocessing:

- JSONPATH: $..[?(@.name=='{#ES.NODE}')].thread_pool.write.completed.first()

- CHANGE_PER_SECOND

ES_cluster ES {#ES.NODE}: Write thread pool active threads

The number of active threads in the write thread pool.

DEPENDENT es.node.thread_pool.write.active[{#ES.NODE}]

Preprocessing:

- JSONPATH: $..[?(@.name=='{#ES.NODE}')].thread_pool.write.active.first()

ES_cluster ES {#ES.NODE}: Write thread pool tasks in queue

The number of tasks in queue for the write thread pool.

DEPENDENT es.node.thread_pool.write.queue[{#ES.NODE}]

Preprocessing:

- JSONPATH: $..[?(@.name=='{#ES.NODE}')].thread_pool.write.queue.first()

ES_cluster ES {#ES.NODE}: Write thread pool executor tasks rejected

The number of tasks rejected by the write thread pool executor.

DEPENDENT es.node.thread_pool.write.rejected.rate[{#ES.NODE}]

Preprocessing:

- JSONPATH: $..[?(@.name=='{#ES.NODE}')].thread_pool.write.rejected.first()

- CHANGE_PER_SECOND

ES_cluster ES {#ES.NODE}: Search thread pool executor tasks completed

The number of tasks completed by the search thread pool executor.

DEPENDENT es.node.thread_pool.search.completed.rate[{#ES.NODE}]

Preprocessing:

- JSONPATH: $..[?(@.name=='{#ES.NODE}')].thread_pool.search.completed.first()

- CHANGE_PER_SECOND

ES_cluster ES {#ES.NODE}: Search thread pool active threads

The number of active threads in the search thread pool.

DEPENDENT es.node.thread_pool.search.active[{#ES.NODE}]

Preprocessing:

- JSONPATH: $..[?(@.name=='{#ES.NODE}')].thread_pool.search.active.first()

ES_cluster ES {#ES.NODE}: Search thread pool tasks in queue

The number of tasks in queue for the search thread pool.

DEPENDENT es.node.thread_pool.search.queue[{#ES.NODE}]

Preprocessing:

- JSONPATH: $..[?(@.name=='{#ES.NODE}')].thread_pool.search.queue.first()

ES_cluster ES {#ES.NODE}: Search thread pool executor tasks rejected

The number of tasks rejected by the search thread pool executor.

DEPENDENT es.node.thread_pool.search.rejected.rate[{#ES.NODE}]

Preprocessing:

- JSONPATH: $..[?(@.name=='{#ES.NODE}')].thread_pool.search.rejected.first()

- CHANGE_PER_SECOND

ES_cluster ES {#ES.NODE}: Refresh thread pool executor tasks completed

The number of tasks completed by the refresh thread pool executor.

DEPENDENT es.node.thread_pool.refresh.completed.rate[{#ES.NODE}]

Preprocessing:

- JSONPATH: $..[?(@.name=='{#ES.NODE}')].thread_pool.refresh.completed.first()

- CHANGE_PER_SECOND

ES_cluster ES {#ES.NODE}: Refresh thread pool active threads

The number of active threads in the refresh thread pool.

DEPENDENT es.node.thread_pool.refresh.active[{#ES.NODE}]

Preprocessing:

- JSONPATH: $..[?(@.name=='{#ES.NODE}')].thread_pool.refresh.active.first()

ES_cluster ES {#ES.NODE}: Refresh thread pool tasks in queue

The number of tasks in queue for the refresh thread pool.

DEPENDENT es.node.thread_pool.refresh.queue[{#ES.NODE}]

Preprocessing:

- JSONPATH: $..[?(@.name=='{#ES.NODE}')].thread_pool.refresh.queue.first()

ES_cluster ES {#ES.NODE}: Refresh thread pool executor tasks rejected

The number of tasks rejected by the refresh thread pool executor.

DEPENDENT es.node.thread_pool.refresh.rejected.rate[{#ES.NODE}]

Preprocessing:

- JSONPATH: $..[?(@.name=='{#ES.NODE}')].thread_pool.refresh.rejected.first()

- CHANGE_PER_SECOND

ES_cluster ES {#ES.NODE}: Indexing latency

The average indexing latency calculated from the available index_total and index_time_in_millis metrics.

CALCULATED es.node.indices.indexing.index_latency[{#ES.NODE}]

Expression:

last(es.node.indices.indexing.index_time_in_millis[{#ES.NODE}]) / ( last(es.node.indices.indexing.index_total[{#ES.NODE}]) + (last(es.node.indices.indexing.index_total[{#ES.NODE}]) = 0) )
ES_cluster ES {#ES.NODE}: Current indexing operations

The number of indexing operations currently running.

DEPENDENT es.node.indices.indexing.index_current[{#ES.NODE}]

Preprocessing:

- JSONPATH: $..[?(@.name=='{#ES.NODE}')].indices.indexing.index_current.first()

- DISCARD_UNCHANGED_HEARTBEAT: 1h

ES_cluster ES {#ES.NODE}: Flush latency

The average flush latency calculated from the available flush.total and flush.total_time_in_millis metrics.

CALCULATED es.node.indices.flush.latency[{#ES.NODE}]

Expression:

last(es.node.indices.flush.total_time_in_millis[{#ES.NODE}]) / ( last(es.node.indices.flush.total[{#ES.NODE}]) + (last(es.node.indices.flush.total[{#ES.NODE}]) = 0) )
ES_cluster ES {#ES.NODE}: Rate of index refreshes

The number of refresh operations per second.

DEPENDENT es.node.indices.refresh.rate[{#ES.NODE}]

Preprocessing:

- JSONPATH: $..[?(@.name=='{#ES.NODE}')].indices.refresh.total.first()

- CHANGE_PER_SECOND

ES_cluster ES {#ES.NODE}: Time spent performing refresh

Time in seconds spent performing refresh operations during the last measurement interval.

DEPENDENT es.node.indices.refresh.time[{#ES.NODE}]

Preprocessing:

- JSONPATH: $..[?(@.name=='{#ES.NODE}')].indices.refresh.total_time_in_millis.first()

- MULTIPLIER: 0.001

- SIMPLE_CHANGE

Zabbix_raw_items ES: Get cluster health

Returns the health status of a cluster.

HTTP_AGENT es.cluster.get_health
Zabbix_raw_items ES: Get cluster stats

Returns cluster statistics.

HTTP_AGENT es.cluster.get_stats
Zabbix_raw_items ES: Get nodes stats

Returns cluster nodes statistics.

HTTP_AGENT es.nodes.get_stats
Zabbix_raw_items ES {#ES.NODE}: Total number of query

The total number of query operations.

DEPENDENT es.node.indices.search.query_total[{#ES.NODE}]

Preprocessing:

- JSONPATH: $..[?(@.name=='{#ES.NODE}')].indices.search.query_total.first()

- DISCARD_UNCHANGED_HEARTBEAT: 1h

Zabbix_raw_items ES {#ES.NODE}: Total time spent performing query

Time in milliseconds spent performing query operations.

DEPENDENT es.node.indices.search.query_time_in_millis[{#ES.NODE}]

Preprocessing:

- JSONPATH: $..[?(@.name=='{#ES.NODE}')].indices.search.query_time_in_millis.first()

- DISCARD_UNCHANGED_HEARTBEAT: 1h

Zabbix_raw_items ES {#ES.NODE}: Total number of fetch

The total number of fetch operations.

DEPENDENT es.node.indices.search.fetch_total[{#ES.NODE}]

Preprocessing:

- JSONPATH: $..[?(@.name=='{#ES.NODE}')].indices.search.fetch_total.first()

- DISCARD_UNCHANGED_HEARTBEAT: 1h

Zabbix_raw_items ES {#ES.NODE}: Total time spent performing fetch

Time in milliseconds spent performing fetch operations.

DEPENDENT es.node.indices.search.fetch_time_in_millis[{#ES.NODE}]

Preprocessing:

- JSONPATH: $..[?(@.name=='{#ES.NODE}')].indices.search.fetch_time_in_millis.first()

- DISCARD_UNCHANGED_HEARTBEAT: 1h

Zabbix_raw_items ES {#ES.NODE}: Total number of indexing

The total number of indexing operations.

DEPENDENT es.node.indices.indexing.index_total[{#ES.NODE}]

Preprocessing:

- JSONPATH: $..[?(@.name=='{#ES.NODE}')].indices.indexing.index_total.first()

- DISCARD_UNCHANGED_HEARTBEAT: 1h

Zabbix_raw_items ES {#ES.NODE}: Total time spent performing indexing

Total time in milliseconds spent performing indexing operations.

DEPENDENT es.node.indices.indexing.index_time_in_millis[{#ES.NODE}]

Preprocessing:

- JSONPATH: $..[?(@.name=='{#ES.NODE}')].indices.indexing.index_time_in_millis.first()

- DISCARD_UNCHANGED_HEARTBEAT: 1h

Zabbix_raw_items ES {#ES.NODE}: Total number of index flushes to disk

The total number of flush operations.

DEPENDENT es.node.indices.flush.total[{#ES.NODE}]

Preprocessing:

- JSONPATH: $..[?(@.name=='{#ES.NODE}')].indices.flush.total.first()

- DISCARD_UNCHANGED_HEARTBEAT: 1h

Zabbix_raw_items ES {#ES.NODE}: Total time spent on flushing indices to disk

Total time in milliseconds spent performing flush operations.

DEPENDENT es.node.indices.flush.total_time_in_millis[{#ES.NODE}]

Preprocessing:

- JSONPATH: $..[?(@.name=='{#ES.NODE}')].indices.flush.total_time_in_millis.first()

- DISCARD_UNCHANGED_HEARTBEAT: 1h
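
The raw totals listed above feed the CALCULATED latency items (query, fetch, indexing, flush), which all share one expression shape: total time divided by total count, with a guard against a zero count. A minimal sketch of that arithmetic is shown below; the function name and sample numbers are hypothetical.

```javascript
// Mirrors the calculated-item expression:
//   last(time_in_millis) / ( last(total) + (last(total) = 0) )
// The comparison evaluates to 1 when the total is 0, so the divisor is never 0.
function averageLatencyMs(timeInMillis, total) {
    var zeroGuard = total === 0 ? 1 : 0;
    return timeInMillis / (total + zeroGuard);
}

console.log(averageLatencyMs(4500, 90)); // 50
console.log(averageLatencyMs(0, 0));     // 0
```

As written, the expression divides one cumulative total by another, so the result appears to be an average over the lifetime of the node's counters rather than over the last polling interval.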

Triggers

Name Description Expression Severity Dependencies and additional info
ES: Service is down

The service is unavailable or does not accept TCP connections.

{TEMPLATE_NAME:net.tcp.service["{$ELASTICSEARCH.SCHEME}","{HOST.CONN}","{$ELASTICSEARCH.PORT}"].last()}=0 AVERAGE

Manual close: YES

ES: Service response time is too high (over {$ELASTICSEARCH.RESPONSE_TIME.MAX.WARN} for 5m)

The performance of the TCP service is very low.

{TEMPLATE_NAME:net.tcp.service.perf["{$ELASTICSEARCH.SCHEME}","{HOST.CONN}","{$ELASTICSEARCH.PORT}"].min(5m)}>{$ELASTICSEARCH.RESPONSE_TIME.MAX.WARN} WARNING

Manual close: YES

Depends on:

- ES: Service is down

ES: Health is YELLOW

All primary shards are assigned, but one or more replica shards are unassigned.

If a node in the cluster fails, some data could be unavailable until that node is repaired.

{TEMPLATE_NAME:es.cluster.status.last()}=1 AVERAGE
ES: Health is RED

One or more primary shards are unassigned, so some data is unavailable.

This can occur briefly during cluster startup as primary shards are assigned.

{TEMPLATE_NAME:es.cluster.status.last()}=2 HIGH
ES: Health is UNKNOWN

The health status of the cluster is unknown or cannot be obtained.

{TEMPLATE_NAME:es.cluster.status.last()}=255 HIGH
ES: The number of nodes within the cluster has decreased {TEMPLATE_NAME:es.cluster.number_of_nodes.change()}<0 INFO

Manual close: YES

ES: The number of nodes within the cluster has increased {TEMPLATE_NAME:es.cluster.number_of_nodes.change()}>0 INFO

Manual close: YES

ES: Cluster has the initializing shards

The cluster has had initializing shards for longer than 10 minutes.

{TEMPLATE_NAME:es.cluster.initializing_shards.min(10m)}>0 AVERAGE
ES: Cluster has the unassigned shards

The cluster has had unassigned shards for longer than 10 minutes.

{TEMPLATE_NAME:es.cluster.unassigned_shards.min(10m)}>0 AVERAGE
ES: Cluster has been restarted (uptime < 10m)

Uptime is less than 10 minutes.

{TEMPLATE_NAME:es.nodes.jvm.max_uptime[{#ES.NODE}].last()}<10m INFO

Manual close: YES

ES: Cluster does not have enough space for resharding

There is not enough disk space for index resharding.

({Template App Elasticsearch Cluster by HTTP:es.nodes.fs.total_in_bytes.last()}-{TEMPLATE_NAME:es.nodes.fs.available_in_bytes.last()})/({Template App Elasticsearch Cluster by HTTP:es.cluster.number_of_data_nodes.last()}-1)>{TEMPLATE_NAME:es.nodes.fs.available_in_bytes.last()} HIGH
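
The resharding trigger expression compares the used space, spread over one fewer data node, with the space still available. A worked sketch of that arithmetic is shown below; the function name and the sample sizes are hypothetical, and the sketch does not model how Zabbix treats the single-data-node case, where the divisor is zero.

```javascript
// Mirrors the trigger expression:
//   (total - available) / (dataNodes - 1) > available
function lacksSpaceForResharding(totalBytes, availableBytes, dataNodes) {
    var usedBytes = totalBytes - availableBytes;
    return usedBytes / (dataNodes - 1) > availableBytes;
}

var GiB = 1024 * 1024 * 1024;

// 3 data nodes, 900 GiB total, 200 GiB available: 700 / 2 = 350 > 200
console.log(lacksSpaceForResharding(900 * GiB, 200 * GiB, 3)); // true

// Same cluster with 400 GiB available: 500 / 2 = 250 < 400
console.log(lacksSpaceForResharding(900 * GiB, 400 * GiB, 3)); // false
```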
ES: Cluster has only two master nodes

The cluster has only two nodes with a master role and will be unavailable if one of them breaks.

{TEMPLATE_NAME:es.nodes.count.master.last()}=2 DISASTER
ES {#ES.NODE}: Node {#ES.NODE} has been restarted (uptime < 10m)

Uptime is less than 10 minutes.

{TEMPLATE_NAME:es.node.jvm.uptime[{#ES.NODE}].last()}<10m INFO

Manual close: YES

ES {#ES.NODE}: Percent of JVM heap in use is high (over {$ELASTICSEARCH.HEAP_USED.MAX.WARN}% for 1h)

This indicates that the rate of garbage collection isn’t keeping up with the rate of garbage creation.

To address this problem, you can either increase your heap size (as long as it remains within the recommended guidelines), or scale out the cluster by adding more nodes.

{TEMPLATE_NAME:es.node.jvm.mem.heap_used_percent[{#ES.NODE}].min(1h)}>{$ELASTICSEARCH.HEAP_USED.MAX.WARN} WARNING

Depends on:

- ES {#ES.NODE}: Percent of JVM heap in use is critical (over {$ELASTICSEARCH.HEAP_USED.MAX.CRIT}% for 1h)

ES {#ES.NODE}: Percent of JVM heap in use is critical (over {$ELASTICSEARCH.HEAP_USED.MAX.CRIT}% for 1h)

This indicates that the rate of garbage collection isn’t keeping up with the rate of garbage creation.

To address this problem, you can either increase your heap size (as long as it remains within the recommended guidelines), or scale out the cluster by adding more nodes.

{TEMPLATE_NAME:es.node.jvm.mem.heap_used_percent[{#ES.NODE}].min(1h)}>{$ELASTICSEARCH.HEAP_USED.MAX.CRIT} HIGH
ES {#ES.NODE}: Query latency is too high (over {$ELASTICSEARCH.QUERY_LATENCY.MAX.WARN}ms for 5m)

If latency exceeds a threshold, look for potential resource bottlenecks, or investigate whether you need to optimize your queries.

{TEMPLATE_NAME:es.node.indices.search.query_latency[{#ES.NODE}].min(5m)}>{$ELASTICSEARCH.QUERY_LATENCY.MAX.WARN} WARNING
ES {#ES.NODE}: Fetch latency is too high (over {$ELASTICSEARCH.FETCH_LATENCY.MAX.WARN}ms for 5m)

The fetch phase should typically take much less time than the query phase. If you notice this metric consistently increasing, this could indicate a problem with slow disks, enriching of documents (highlighting the relevant text in search results, etc.), or requesting too many results.

{TEMPLATE_NAME:es.node.indices.search.fetch_latency[{#ES.NODE}].min(5m)}>{$ELASTICSEARCH.FETCH_LATENCY.MAX.WARN} WARNING
ES {#ES.NODE}: Write thread pool executor has the rejected tasks (for 5m)

The number of tasks rejected by the write thread pool executor is over 0 for 5m.

{TEMPLATE_NAME:es.node.thread_pool.write.rejected.rate[{#ES.NODE}].min(5m)}>0 WARNING
ES {#ES.NODE}: Search thread pool executor has the rejected tasks (for 5m)

The number of tasks rejected by the search thread pool executor is over 0 for 5m.

{TEMPLATE_NAME:es.node.thread_pool.search.rejected.rate[{#ES.NODE}].min(5m)}>0 WARNING
ES {#ES.NODE}: Refresh thread pool executor has the rejected tasks (for 5m)

The number of tasks rejected by the refresh thread pool executor is over 0 for 5m.

{TEMPLATE_NAME:es.node.thread_pool.refresh.rejected.rate[{#ES.NODE}].min(5m)}>0 WARNING
ES {#ES.NODE}: Indexing latency is too high (over {$ELASTICSEARCH.INDEXING_LATENCY.MAX.WARN}ms for 5m)

If the latency is increasing, it may indicate that you are indexing too many documents at the same time (Elasticsearch’s documentation recommends starting with a bulk indexing size of 5 to 15 megabytes and increasing slowly from there).

{TEMPLATE_NAME:es.node.indices.indexing.index_latency[{#ES.NODE}].min(5m)}>{$ELASTICSEARCH.INDEXING_LATENCY.MAX.WARN} WARNING
ES {#ES.NODE}: Flush latency is too high (over {$ELASTICSEARCH.FLUSH_LATENCY.MAX.WARN}ms for 5m)

If you see this metric increasing steadily, it may indicate a problem with slow disks; this problem may escalate and eventually prevent you from being able to add new information to your index.

{TEMPLATE_NAME:es.node.indices.flush.latency[{#ES.NODE}].min(5m)}>{$ELASTICSEARCH.FLUSH_LATENCY.MAX.WARN} WARNING
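
Many of the triggers above compare min(5m) or min(1h) against a threshold. Because the minimum over the window must exceed the threshold, every sample in that window has to be above it, which filters out short spikes. A minimal sketch of that evaluation is shown below; the function name and the samples are hypothetical, and it ignores how Zabbix handles missing data.

```javascript
// A condition of the form min(window) > threshold holds only if the smallest
// sample in the window exceeds the threshold, i.e. every sample does.
function minOverWindowExceeds(samples, threshold) {
    if (samples.length === 0) {
        return false; // nothing to evaluate in this sketch
    }
    var smallest = Math.min.apply(null, samples);
    return smallest > threshold;
}

// Hypothetical query-latency samples (ms) against the default threshold of 100.
console.log(minOverWindowExceeds([120, 135, 110, 150, 105], 100)); // true
console.log(minOverWindowExceeds([120, 135, 90, 150, 105], 100));  // false
```

A single sample at or below the threshold within the window is enough to keep the trigger from firing.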

Feedback

Please report any issues with the template at https://support.zabbix.com

You can also provide feedback, discuss the template, or ask for help with it on the ZABBIX forums.

References

https://www.elastic.co/guide/en/elasticsearch/reference/index.html
