Discussion thread for official Zabbix Template for ElasticSearch

  • nsmirnov
    Junior Member
    • Sep 2022
    • 1

    #16
    I believe I've found an error in this template.
    There are four almost identical calculated item prototypes (called 'ES {#ES.NODE}: Flush latency', 'ES {#ES.NODE}: Indexing latency', 'ES {#ES.NODE}: Fetch latency', 'ES {#ES.NODE}: Query latency')
    Their values are something like:
    Code:
    change(//es.node.indices.flush.total_time_in_millis[{#ES.NODE}]) / ( change(//es.node.indices.flush.total[{#ES.NODE}]) + (change(//es.node.indices.flush.total[{#ES.NODE}]) = 0) )
    So it takes the change in total flush time and divides it by the change in the total flush count (adding 1 if that change is zero), which is probably not what was intended :-)

    And here is what they should be:
    Code:
    last(//es.node.indices.flush.total_time_in_millis[{#ES.NODE}]) / ( last(//es.node.indices.flush.total[{#ES.NODE}]) + count(//es.node.indices.flush.total[{#ES.NODE}],#1,0) )
    (I'm not sure I got the new syntax right; I have Zabbix 5.0 here.)
    So, with the new expression it should work as expected: total time divided by total count (adding 1 if the count is 0).


    • steevhise
      Junior Member
      • Sep 2019
      • 2

      #17
      Hi, we recently discovered that ES alerts are not resulting in messages being sent to Slack. All our other Zabbix alerts reach Slack fine. The ES alerts/triggers/problems show up on the Zabbix dashboard as normal, but they don't trigger Slack messages like the other alerts do. Is there a known issue that might cause this? Thanks.


      • prasanthi.narra
        Junior Member
        • Dec 2023
        • 2

        #18
        Hello, I'm using Zabbix 6.4 and trying to use the Elasticsearch Cluster by HTTP template. ES is deployed on a Kubernetes cluster, and I have set up a DNS name, which does not require a port to be specified, to access the cluster, since the container host might change each time we deploy. The items defined in this template are all failing or returning incorrect values. Can you confirm whether the template from the Zabbix Git repository works on Zabbix 6.4 to monitor an ES 8.6.0 cluster?


        • bmaster
          Junior Member
          • Nov 2024
          • 1

          #19
          Is anyone using this template to monitor an Elastic Cloud deployment?


          • richlv
            Senior Member
            Zabbix Certified Trainer
            Zabbix Certified Specialist
            Zabbix Certified Professional
            • Oct 2005
            • 3112

            #20
            Originally posted by Marco Gastreghini
            Hello, I don't understand the algorithm in the trigger "ES: Cluster does not have enough space for resharding".

            Formula is : "(last(/<nodename>/es.nodes.fs.total_in_bytes)-last(/<nodename>/es.nodes.fs.available_in_bytes))/(last(/<nodename>/es.cluster.number_of_data_nodes)-1)>last(/<nodename>/es.nodes.fs.available_in_bytes)".

            Why does it use this formula?
            What kind of monitoring is it meant to implement?

            Thanks in advance.
            MG
            That trigger seems to calculate the total used space, divide that by node_number-1 and check that it's bigger than the total available space.
            I'd guess that it is trying to verify that all the existing indexes could be reindexed simultaneously, but that does not seem that likely to happen for most people.
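The reading of the trigger given above can be sketched in Python. This is only an illustration of the arithmetic in the quoted expression, not code from the template; the function name and the sample numbers are made up, and the guard for a single data node is an assumption added here because the expression divides by the node count minus one.

```python
def lacks_space_for_resharding(total_bytes: float,
                               available_bytes: float,
                               data_nodes: int) -> bool:
    """Mirror the quoted trigger expression:
    (total - available) / (data_nodes - 1) > available
    """
    if data_nodes <= 1:
        # Assumption: the expression is undefined for a single data node
        # (division by zero), so refuse to evaluate it here.
        raise ValueError("expression needs at least two data nodes")
    used_bytes = total_bytes - available_bytes
    return used_bytes / (data_nodes - 1) > available_bytes


# Hypothetical 3-node cluster with 3000 units of disk in total.
roomy = lacks_space_for_resharding(3000, 1200, 3)   # 1800 / 2 = 900 vs 1200
tight = lacks_space_for_resharding(3000, 500, 3)    # 2500 / 2 = 1250 vs 500
print(roomy, tight)
```

With these made-up figures the trigger would stay quiet in the first case and fire in the second.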
            Zabbix 3.0 Network Monitoring book


            • richlv
              Senior Member
              Zabbix Certified Trainer
              Zabbix Certified Specialist
              Zabbix Certified Professional
              • Oct 2005
              • 3112

              #21
              Originally posted by prasanthi.narra
              Hello, I'm using Zabbix 6.4 and trying to use the Elasticsearch Cluster by HTTP template. ES is deployed on a Kubernetes cluster, and I have set up a DNS name, which does not require a port to be specified, to access the cluster, since the container host might change each time we deploy. The items defined in this template are all failing or returning incorrect values. Can you confirm whether the template from the Zabbix Git repository works on Zabbix 6.4 to monitor an ES 8.6.0 cluster?
              It mostly works with Zabbix 7.0 and Elasticsearch 8.17, so it seems likely that it worked with your versions back then.

              • richlv
                Senior Member
                Zabbix Certified Trainer
                Zabbix Certified Specialist
                Zabbix Certified Professional
                • Oct 2005
                • 3112

                #22
                Originally posted by bmaster
                Is anyone using this template to monitor an Elastic Cloud deployment?
                I can confirm that it can collect data from an Elastic Cloud deployment.

                • richlv
                  Senior Member
                  Zabbix Certified Trainer
                  Zabbix Certified Specialist
                  Zabbix Certified Professional
                  • Oct 2005
                  • 3112

                  #23
                  Originally posted by nsmirnov
                  I believe I've found an error in this template.
                  There are four almost identical calculated item prototypes (called 'ES {#ES.NODE}: Flush latency', 'ES {#ES.NODE}: Indexing latency', 'ES {#ES.NODE}: Fetch latency', 'ES {#ES.NODE}: Query latency')
                  Their values are something like:
                  Code:
                  change(//es.node.indices.flush.total_time_in_millis[{#ES.NODE}]) / ( change(//es.node.indices.flush.total[{#ES.NODE}]) + (change(//es.node.indices.flush.total[{#ES.NODE}]) = 0) )
                  So it takes the change in total flush time and divides it by the change in the total flush count (adding 1 if that change is zero), which is probably not what was intended :-)

                  And here is what they should be:
                  Code:
                  last(//es.node.indices.flush.total_time_in_millis[{#ES.NODE}]) / ( last(//es.node.indices.flush.total[{#ES.NODE}]) + count(//es.node.indices.flush.total[{#ES.NODE}],#1,0) )
                  (I'm not sure I got the new syntax right; I have Zabbix 5.0 here.)
                  So, with the new expression it should work as expected: total time divided by total count (adding 1 if the count is 0).
                  After staring at it for a while, I believe the original formula makes sense.
                  The source items are not per-something (check or a time period like a second), they are totals. They make hill-graphs, as I call them.
                  That is, they record the total time spent and the total number of operations like flushes (total since the node startup). On their own, this makes for nearly useless data and graphs.
                  The calculated item takes the difference between the last two checks, and thus obtains a value that precisely represents the time period between those two checks - it gets the time and amount in that period instead of values at the points in time, matching the checks.
                  The source items being totals also explains why the calculated items never return negative values.

                  Your suggestion would calculate the average latency since the node startup, not the latency in the period between the last two checks.
                  This would be exactly the same value as the template item the very first time, but then it would average out more and more, eventually becoming a nearly static value. Think iotop.
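To make the difference concrete, here is a small Python sketch of the two approaches. It only imitates the arithmetic: `interval_latency` follows the template's `change(...) / (change(...) + (change(...) = 0))` pattern, and `lifetime_latency` follows the suggested `last(...) / last(...)` pattern. The function names and the sample counter values are invented for illustration.

```python
def interval_latency(prev, last):
    """Latency between two consecutive checks, as in the template.

    prev and last are (total_time_ms, total_count) counter samples.
    """
    d_time = last[0] - prev[0]
    d_count = last[1] - prev[1]
    # Adding the boolean (d_count == 0) turns a zero divisor into 1,
    # the same guard the template expression uses.
    return d_time / (d_count + (d_count == 0))


def lifetime_latency(last):
    """Average latency since node startup, as in the suggested change."""
    total_time_ms, total_count = last
    return total_time_ms / (total_count + (total_count == 0))


# Hypothetical cumulative counters; the last interval has 10 slow operations.
samples = [(0, 0), (1000, 100), (1100, 110), (6100, 120)]

per_interval = [interval_latency(a, b) for a, b in zip(samples, samples[1:])]
since_start = [lifetime_latency(s) for s in samples[1:]]

print(per_interval)  # [10.0, 10.0, 500.0]
print(since_start)
```

In this made-up series both calculations agree at the first check, but the slow final interval shows up as 500 ms per operation in the per-interval value while the since-startup average only moves to roughly 51 ms.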

                  So the formulas in the template seem to make sense - but the trigger description is in the best traditions of code comments - "repeat what the code does technically without explaining the functionality and the reasoning behind it"

                  • Ihor Ru
                    Junior Member
                    • Jan 2026
                    • 1

                    #24
                    Hello,

                    How do you monitor disk usage on ES nodes, and is there a trigger for when a node's disk usage approaches the cluster's low disk watermark?


                    • guntis_liepins
                      Junior Member
                      • Oct 2025
                      • 14

                      #25
                      Originally posted by Ihor Ru
                      Hello,

                      How do you monitor disk usage on ES nodes, and is there a trigger for when a node's disk usage approaches the cluster's low disk watermark?
                      There is a trigger in the Zabbix ES template:
                      Elasticsearch Cluster by HTTP: Elasticsearch: Cluster does not have enough space (single node)
                      (last(/elastic/es.nodes.fs.available_in_bytes) < {$ELASTICSEARCH.SINGLE.NODE.JVM.SPACE.MIN}) and last(/elastic/es.cluster.number_of_data_nodes) = 1

                      However, I just monitor the nodes with Zabbix agent and set the disk space limits a bit below the low watermark.
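The "a bit below the low watermark" idea can be sketched in Python as follows. This is a rough illustration, not part of the template or of Zabbix agent: the function names and the 5-point margin are hypothetical, and the 85% figure is the Elasticsearch default for `cluster.routing.allocation.disk.watermark.low`, so adjust it if your cluster overrides that setting.

```python
def disk_used_percent(total_bytes: float, available_bytes: float) -> float:
    """Share of the filesystem that is in use, in percent."""
    return 100.0 * (total_bytes - available_bytes) / total_bytes


def approaching_low_watermark(total_bytes: float,
                              available_bytes: float,
                              low_watermark_pct: float = 85.0,
                              margin_pct: float = 5.0) -> bool:
    """True when usage is within margin_pct points of the low watermark.

    low_watermark_pct: assumed to be the Elasticsearch default of 85%.
    margin_pct: hypothetical safety margin chosen for this example.
    """
    used = disk_used_percent(total_bytes, available_bytes)
    return used >= low_watermark_pct - margin_pct


# Made-up node with 1000 units of disk.
print(approaching_low_watermark(1000, 250))  # 75% used
print(approaching_low_watermark(1000, 150))  # 85% used
```

The same threshold (watermark minus margin) is what would go into the agent-side disk space trigger.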
