**nsmirnov** · 16-09-2022, 13:11

I believe I've found an error in this template.
There are four almost identical calculated item prototypes (called 'ES {#ES.NODE}: Flush latency', 'ES {#ES.NODE}: Indexing latency', 'ES {#ES.NODE}: Fetch latency', 'ES {#ES.NODE}: Query latency')
Their values are something like:

Code:

change(//es.node.indices.flush.total_time_in_millis[{#ES.NODE}]) / ( change(//es.node.indices.flush.total[{#ES.NODE}]) + (change(//es.node.indices.flush.total[{#ES.NODE}]) = 0) )

So it takes the change in total flush time and divides it by the change in the total flush count (adding 1 if that is zero). Which is probably not what was expected :-)

And here is what they should be:

Code:

last(//es.node.indices.flush.total_time_in_millis[{#ES.NODE}]) / ( last(//es.node.indices.flush.total[{#ES.NODE}]) + count(//es.node.indices.flush.total[{#ES.NODE}],#1,0) )

(not sure if I got the new syntax right, I have Zabbix 5.0 here)
So, with the new expression it should be as expected: total time divided by total amount (adding 1 if the amount is 0).

**steevhise** · 28-02-2023, 02:03

Hi, we recently discovered that ES alerts were not resulting in messages sent to Slack. All our other Zabbix alerts are sent to Slack fine. The ES alerts/triggers/problems show up on the Zabbix dashboard as normal, but they don't trigger Slack messages like the other alerts do. Is there a known issue that might cause this? Thanks.

**prasanthi.narra** · 18-12-2023, 22:57

Hello, I'm using Zabbix 6.4 and trying to use the Elasticsearch Cluster by HTTP template. I have ES deployed on a Kubernetes cluster and have set up a DNS name, which does not require a port to be specified, to access the cluster, since the container host might change each time we deploy. The items defined in this template are all failing or returning incorrect values. Can you confirm whether the Browse Zabbix / Zabbix - ZABBIX GIT template works on Zabbix 6.4 to monitor an ES 8.6.0 cluster?

**bmaster** · 08-11-2024, 10:10

Is anyone using this template to monitor an Elastic Cloud deployment?

**richlv** · 19-03-2025, 21:16

Originally posted by Marco Gastreghini

Hello, I don't understand the algorithm in the trigger "ES: Cluster does not have enough space for resharding".

Formula is : "(last(/<nodename>/es.nodes.fs.total_in_bytes)-last(/<nodename>/es.nodes.fs.available_in_bytes))/(last(/<nodename>/es.cluster.number_of_data_nodes)-1)>last(/<nodename>/es.nodes.fs.available_in_bytes)".

Why does it use this formula?
What kind of monitoring is it meant to implement?

Thanks in advance.
MG

That trigger seems to calculate the total used space, divide that by node_number-1 and check that it's bigger than the total available space.
I'd guess that it is trying to verify that all the existing indexes could be reindexed simultaneously, but that does not seem that likely to happen for most people.
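
To make the arithmetic concrete, here is a minimal Python sketch of the same comparison. The function name and the sample numbers are made up for illustration, and the guard for a single data node is my own assumption (the trigger expression itself would divide by zero there).

```python
def resharding_space_low(total_bytes: float, available_bytes: float, data_nodes: int) -> bool:
    """Mirror the trigger expression:
    (total - available) / (data_nodes - 1) > available
    """
    if data_nodes <= 1:
        # Assumption: treat the single-node case as "does not apply",
        # since the expression would divide by zero.
        return False
    used_bytes = total_bytes - available_bytes
    return used_bytes / (data_nodes - 1) > available_bytes


# Hypothetical 3-node cluster, 1000 units of disk in total.
roomy = resharding_space_low(total_bytes=1000, available_bytes=400, data_nodes=3)
tight = resharding_space_low(total_bytes=1000, available_bytes=200, data_nodes=3)
print(roomy, tight)
```

In the first case 600 used / 2 = 300, which is not bigger than 400 available, so the trigger stays quiet. In the second case 800 used / 2 = 400, which is bigger than 200 available, so it fires.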

**richlv** · 19-03-2025, 21:18

Originally posted by prasanthi.narra

Hello, I'm using Zabbix 6.4 and trying to use the Elasticsearch Cluster by HTTP template. I have ES deployed on a Kubernetes cluster and have set up a DNS name, which does not require a port to be specified, to access the cluster, since the container host might change each time we deploy. The items defined in this template are all failing or returning incorrect values. Can you confirm whether the Browse Zabbix / Zabbix - ZABBIX GIT template works on Zabbix 6.4 to monitor an ES 8.6.0 cluster?

It mostly works with Zabbix 7.0 and Elasticsearch 8.17, so it seems likely that it worked with your versions back then.

**richlv** · 19-03-2025, 21:19

Originally posted by bmaster

Is anyone using this template to monitor an Elastic Cloud deployment?

Can confirm that it can collect data from an Elastic Cloud deployment.

**richlv** · 19-03-2025, 22:17

Originally posted by nsmirnov

I believe I've found an error in this template.
There are four almost identical calculated item prototypes (called 'ES {#ES.NODE}: Flush latency', 'ES {#ES.NODE}: Indexing latency', 'ES {#ES.NODE}: Fetch latency', 'ES {#ES.NODE}: Query latency')
Their values are something like:

Code:

change(//es.node.indices.flush.total_time_in_millis[{#ES.NODE}]) / ( change(//es.node.indices.flush.total[{#ES.NODE}]) + (change(//es.node.indices.flush.total[{#ES.NODE}]) = 0) )

So it takes the change in total flush time and divides it by the change in the total flush count (adding 1 if that is zero). Which is probably not what was expected :-)

And here is what they should be:

Code:

last(//es.node.indices.flush.total_time_in_millis[{#ES.NODE}]) / ( last(//es.node.indices.flush.total[{#ES.NODE}]) + count(//es.node.indices.flush.total[{#ES.NODE}],#1,0) )

(not sure if I got the new syntax right, I have Zabbix 5.0 here)
So, with the new expression it should be as expected: total time divided by total amount (adding 1 if the amount is 0).

After staring at it for a while, I believe the original formula makes sense.
The source items are not per-something (check or a time period like a second), they are totals. They make hill-graphs, as I call them.
That is, they record the total time spent and the total number of operations like flushes (total since the node startup). On their own, this makes for nearly useless data and graphs.
The calculated item takes the difference between the last two checks and thus obtains a value that precisely represents the time period between those two checks: it gets the time and amount in that period, instead of the values at the points in time of the checks.
Those being total items also explains why the calculated items never return negative values.

Your suggestion would calculate the average latency since the node startup, not the latency in the period between the last two checks.
This would be exactly the same value as the template item the very first time, but then it would average out more and more, eventually becoming a nearly static value. Think iotop.
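
A small Python sketch of the difference between the two approaches, in case it helps. The function names and the sample counter values are invented for illustration; only the arithmetic follows the two expressions quoted above.

```python
def interval_latency(prev_time_ms, last_time_ms, prev_count, last_count):
    """Template formula: change(time) / (change(count) + (change(count) = 0))."""
    d_time = last_time_ms - prev_time_ms
    d_count = last_count - prev_count
    # Adding 1 when the count did not change avoids division by zero.
    return d_time / (d_count + (1 if d_count == 0 else 0))


def lifetime_latency(last_time_ms, last_count):
    """Suggested formula: last(time) / (last(count) + 1 if the count is 0)."""
    return last_time_ms / (last_count + (1 if last_count == 0 else 0))


# Hypothetical cumulative samples: (total_time_in_millis, total operations).
# The last interval contains 10 slow operations taking 1000 ms in total.
samples = [(0, 0), (100, 10), (200, 20), (1200, 30)]

interval_values = [
    interval_latency(prev[0], cur[0], prev[1], cur[1])
    for prev, cur in zip(samples, samples[1:])
]
lifetime_values = [lifetime_latency(t, c) for t, c in samples[1:]]

print(interval_values)  # [10.0, 10.0, 100.0]
print(lifetime_values)  # [10.0, 10.0, 40.0]
```

The slow interval shows up as 100 ms per operation with the template formula, but only as 40 ms with the since-startup average, and the longer the node runs, the less any single interval can move that average.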

So the formulas in the template seem to make sense - but the item description is in the best traditions of code comments: "repeat what the code does technically without explaining the functionality and the reasoning behind it".

**Ihor Ru** · 05-01-2026, 17:27

Hello,

How do you monitor disk usage on ES nodes, and is there a trigger for when node disk usage is approaching the cluster's low disk watermark?

**guntis_liepins** · 06-01-2026, 18:23

Originally posted by Ihor Ru

Hello,

How do you monitor disk usage on ES nodes, and is there a trigger for when node disk usage is approaching the cluster's low disk watermark?

There is a trigger in the Zabbix ES template:
Elasticsearch Cluster by HTTP: Elasticsearch: Cluster does not have enough space (single node)
(last(/elastic/es.nodes.fs.available_in_bytes) < {$ELASTICSEARCH.SINGLE.NODE.JVM.SPACE.MIN}) and last(/elastic/es.cluster.number_of_data_nodes) = 1

I just monitor nodes with zabbix agent and set disk space limits a bit below low watermark.
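
Roughly, the check I mean looks like the Python sketch below. It assumes the Elasticsearch default low watermark of 85% disk used; the 5-point margin and the function name are just illustrative choices, not something from the template.

```python
def disk_usage_near_low_watermark(total_bytes, available_bytes,
                                  low_watermark_pct=85.0, margin_pct=5.0):
    """Return True once used space reaches (low watermark - margin).

    low_watermark_pct: assumed to match
        cluster.routing.allocation.disk.watermark.low (default 85%).
    margin_pct: hypothetical safety margin, pick your own.
    """
    used_pct = 100.0 * (total_bytes - available_bytes) / total_bytes
    return used_pct >= low_watermark_pct - margin_pct


# Hypothetical node with 1000 units of disk.
print(disk_usage_near_low_watermark(1000, 250))  # 75% used, below the 80% limit
print(disk_usage_near_low_watermark(1000, 180))  # 82% used, above the 80% limit
```

If your cluster overrides the watermark settings, the limit on the agent side has to be adjusted to match.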

Discussion thread for official Zabbix Template for ElasticSearch