Discussion thread for official Zabbix Template for ElasticSearch

  • AlexL
    Zabbix Certified Specialist
    • Aug 2019
    • 55

    #1

    This thread is designed to provide grounds for discussion of the official Zabbix Template for ElasticSearch.
    The template and its details are available in the Git repository: https://git.zabbix.com/projects/ZBX/...ticsearch_http

    Zabbix is always looking for ways to improve our services and to make our users happier.
    We pride ourselves on doing our best each and every day, but we know that there is always something more to learn.
    We would like to hear back from you about what you liked and what you would improve in the template.
    Last edited by AlexL; 19-10-2020, 15:34.
  • yurtesen
    Senior Member
    • Aug 2008
    • 130

    #2
    I am wondering why this template does not have a Zabbix agent version for collecting the data. In most cases the Zabbix agent would be installed on the server, and Elasticsearch can be accessed securely over a local connection. Doesn't that make sense?
    I made a Zabbix agent version at https://share.zabbix.com/databases/o...y-zabbix-agent
    Last edited by yurtesen; 17-05-2020, 16:30.


    • yurtesen
      Senior Member
      • Aug 2008
      • 130

      #3
      Can you tell me why some `es.node.indices.*` items are listed in the `Zabbix raw items` application? For example, `es.node.indices.search.fetch_total[{#ES.NODE}]`. Shouldn't they be under the `ES {#ES.NODE}` application prototype?


      • CB-Zabbix
        Junior Member
        • May 2020
        • 1

        #4
        I've got an error from the "Cluster nodes discovery":
        Incorrect value "[]" for "headers" field.
        Incorrect value "[]" for "query_fields" field.

        If I open {$ELASTICSEARCH.SCHEME}://{HOST.CONN}:{$ELASTICSEARCH.PORT}/_nodes/_all/nodes in Firefox (with the macros replaced by the correct values), I can see all the JSON data.

        We have Zabbix 5.0 on Ubuntu 18.04.
        The macros are correct. (The port is the default and the scheme is http.)


        • grommir
          Senior Member
          • Mar 2013
          • 134

          #5
          Is it possible to convert this template for Zabbix version 4.x?


          • pucky_wins
            Junior Member
            • Mar 2020
            • 7

            #6
            Hi

            Is it possible to get the cluster name into the notifications? I have a few clusters and I'd like to know which one is failing.


            • che666
              Junior Member
              • May 2020
              • 5

              #7
              First of all, thank you very much.
              1. I would love it if you could include the agent-based checks (as we have the Zabbix server in a different physical location than the actual systems we monitor). The template linked above works quite well: https://share.zabbix.com/databases/o...y-zabbix-agent
              2. It would be great to include the cluster name in the actual error reports:
                1. ES: Cluster does not have enough space for resharding
              3. There is an issue with some alerts. I have clusters with up to 8 nodes, and each node has monitoring enabled. Some of the error messages appear more than once, and notifications are sent out from every single node (e.g. on Telegram, using the upstream Telegram integration).
                1. ES: Cluster does not have enough space for resharding <- this is reported by every single node
                2. ES nodename: Flush latency is too high (over 100ms for 5m) <- even for node-specific problems, every node reports that one other node has a problem, so if node1 has the problem, node[1-8] all report it individually.
              4. A more general suggestion: how about enabling people to submit pull requests for the templates? I would love to contribute some minor improvements, and I am sure I am not the only one. The result would be that people can actually improve the out-of-the-box monitoring capability.

              Thank you for all your work,
              Rudi


              • yurtesen
                Senior Member
                • Aug 2008
                • 130

                #8
                grommir you can try to remove the parts which may be incompatible with 4.x. I think discard with threshold may be one issue. Unfortunately I do not have access to old Zabbix installations.

                pucky_wins that is a tricky thing to do. There are limitations on what you can use in the name and description fields of a trigger. But it may be possible in a more logical way in Zabbix 5.2 with the "Event name" field, which accepts expression macros:
                https://www.zabbix.com/documentation...iggers/trigger
                I added `cluster_name` to be collected as an item....

                che666
                2- Same problem as pucky_wins; not easily accomplished.
                3-
                1- Interesting dilemma. The problem is clear, but how do you think it should work? I am not sure it can be done, at least according to the manual:
                https://www.zabbix.com/documentation...s/dependencies
                2- If "1" can be fixed, the same fix can be applied to this problem as well.
                3- https://www.zabbix.com/forum/zabbix-...ting-to-zabbix

                The best would be if Zabbix accepted contributions more directly, but it seems that is not so straightforward. If you want, I can put my changes on GitHub, and you can feel free to suggest changes and make pull requests.

                I have put my template on GitHub now. Initially I did not expect so much interest. I think it is easier to track issues using the issue system on GitHub, so please report them there if you want:
                Template App Elasticsearch by Zabbix agent. Contribute to yurtesen/zabbix_elasticsearch development by creating an account on GitHub.
                Last edited by yurtesen; 22-11-2020, 23:58.


                • mirekchk
                  Junior Member
                  • Oct 2020
                  • 5

                  #9
                  Hi,

                  Is it possible to use authentication for queries with Template App Elasticsearch Cluster by Zabbix agent active?




                  • mirekchk
                    Junior Member
                    • Oct 2020
                    • 5

                    #10
                    Originally posted by mirekchk
                    Hi,

                    Is it possible to use authentication for queries with Template App Elasticsearch Cluster by Zabbix agent active?


                    I have read old documentation and forum posts.

                    The solution is:
                    Code:
                    web.page.get[{$ELASTICSEARCH.SCHEME}://{$ELASTICSEARCH.USERNAME}:{$ELASTICSEARCH.PASSWORD}@{$ELASTICSEARCH.HOST}:{$ELASTICSEARCH.PORT}/_cluster/health?timeout=5s]
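
One caveat when credentials are embedded in the URL like this: characters such as `@`, `:` or `/` inside the username or password would change how the URL is split into user, host and path, so they generally need percent-encoding. Below is a minimal Python sketch of that encoding. The helper name `build_health_url` and all sample values are hypothetical, and whether the agent key accepts encoded credentials is not confirmed anywhere in this thread.

```python
from urllib.parse import quote, urlsplit


def build_health_url(scheme, username, password, host, port):
    """Build the _cluster/health URL with percent-encoded credentials."""
    # safe="" makes quote() also encode "/", so nothing inside the
    # credentials can be mistaken for URL structure.
    user = quote(username, safe="")
    secret = quote(password, safe="")
    return f"{scheme}://{user}:{secret}@{host}:{port}/_cluster/health?timeout=5s"


# Hypothetical values standing in for the {$ELASTICSEARCH.*} macros.
url = build_health_url("http", "zabbix", "p@ss:w/rd", "localhost", 9200)
print(url)

# Parsing the result back shows that host and port are still the intended ones.
parts = urlsplit(url)
print(parts.hostname, parts.port)
```

The encoded value could then be stored in the password macro instead of the raw password, if the agent turns out to handle it.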


                    • kerem.yarali
                      Junior Member
                      • Sep 2019
                      • 23

                      #11
                      Hi,

                      I filled in the {$ELASTICSEARCH.USERNAME} and {$ELASTICSEARCH.PASSWORD} macros at the template level (Template App Elasticsearch Cluster by HTTP) and assigned the template to the host. However, I'm getting a timeout error; details are below.

                      Cannot perform request: Connection timed out after 15001 milliseconds

                      Could you please tell me what I should concentrate on?

                      Note: Zabbix version is 5.0

                      Thanks
                      Last edited by kerem.yarali; 29-03-2021, 10:09.


                      • michaelm14
                        Junior Member
                        • Apr 2021
                        • 1

                        #12
                        I'm also getting the timeout in the zabbix server logs, however, if I manually run the curl from that server with
                        Code:
                        curl -k 'https://username:password@targetaddr...:9200/_cluster/health?pretty'
                        I get results just fine. Since I am using self signed certificates, I must use the -k in the curl command. Is this possibly the cause of the timeout and is there a way to add the -k option in a macro? I verified I do not have the ssl verify peer or host checked in the http_agent settings. Thanks for your help!



                        • michaelm14
                          michaelm14 commented
                          My issue was connectivity after all. Port 443 was not open directly since I was using a load balancer. All good!
                      • partizanes
                        Junior Member
                        • Oct 2021
                        • 2

                        #13
                        Hello, I think I have problems with this template.

                        Zabbix server configuration:
                        8 CPU
                        8GB memory
                        SSD

                        Zabbix 5.2.7
                        zabbix_server.conf
                        LogFile=/var/log/zabbix/zabbix_server.log
                        LogFileSize=0
                        DebugLevel=3
                        PidFile=/var/run/zabbix/zabbix_server.pid
                        SocketDir=/var/run/zabbix
                        DBName=******
                        DBUser=****
                        DBPassword=****
                        StartPreprocessors=8
                        SNMPTrapperFile=/var/log/snmptrap/snmptrap.log
                        ListenIP=0.0.0.0
                        CacheSize=256M
                        Timeout=15
                        AlertScriptsPath=/usr/lib/zabbix/alertscripts
                        ExternalScripts=/usr/lib/zabbix/externalscripts
                        LogSlowQueries=3000
                        StatsAllowedIP=127.0.0.1


                        Before connecting this template, I had 18 connected agents. The parameters "Utilization of preprocessing worker internal processes" and "Utilization of preprocessing manager internal processes" stayed at low values. After I connect a host with this template, the values can rise to 100% in waves. A couple of days ago, I deleted this template and re-imported it from the repository. I also reinstalled the Zabbix agent on the host (zabbix-agent.x86_64 5.4.6-1.el7). After that, the maximum load decreased, but it still comes in waves. Perhaps I do not understand something about how Zabbix works, but I cannot explain this behavior. When these parameters rise to 100%, data from other hosts is no longer processed.


                        Here are two hosts connected with this template:

                        [Image: preprocessing.png]

                        After reinstalling zabbix-agent and the template, I connected only one host to test the problem:


                        [Image: prepro_2.png]


                        Any help?

                        • partizanes
                          Junior Member
                          • Oct 2021
                          • 2

                          #14
                          So now I have 100% utilization of the internal worker processes and of the CPU:

                          [Image: graph.png]



                          • trideter
                            trideter commented
                            Hello, the template item "ES: Get nodes stats" collects too much data (about 20 MB).
                            Try replacing {$ELASTICSEARCH.SCHEME}://{HOST.IP}:{$ELASTICSEARCH.PORT}/_nodes/stats
                            with {$ELASTICSEARCH.SCHEME}://{HOST.IP}:{$ELASTICSEARCH.PORT}/_nodes/stats/jvm,indices,fs (these are the metrics you need). The system will then be under much less load.
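
The URL change suggested in this comment can be sketched in Python as follows. This is only an illustration of how the metric filter is appended to the `_nodes/stats` path; the function name `nodes_stats_url` and the address `192.0.2.10` are hypothetical and not part of the template.

```python
def nodes_stats_url(scheme, host, port, metrics=()):
    """Return a _nodes/stats URL, optionally limited to some metric groups."""
    base = f"{scheme}://{host}:{port}/_nodes/stats"
    if not metrics:
        # No filter: Elasticsearch returns every metric group.
        return base
    # Metric groups are passed as a comma-separated path segment.
    return base + "/" + ",".join(metrics)


# Hypothetical host; 192.0.2.10 is a documentation-only address.
full = nodes_stats_url("http", "192.0.2.10", 9200)
trimmed = nodes_stats_url("http", "192.0.2.10", 9200, ["jvm", "indices", "fs"])
print(full)
print(trimmed)
```

Fetching both URLs against a real cluster and comparing the response sizes is a quick way to check how much the filter saves in a given environment.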

                          • psychomoise
                            psychomoise commented
                            There are too many dependent items on the es.nodes.get_stats master item, which can be quite big, as indicated by @trideter.
                            For more information on the issue, see https://support.zabbix.com/browse/ZBX-21032 and https://support.zabbix.com/browse/ZBX-20590, which match the issue in my case: the preprocessing manager queues items and one preprocessing worker is saturated.

                            To solve the issue:
                            - Create two new item prototypes in the nodes discovery rule, identical to es.nodes.get_stats except for the following:
                              the name of the first is "ES {#ES.NODE}: Get node stats (fs,indices,jvm,thread_pool)" and of the second is "ES {#ES.NODE}: Get node stats (http)"
                              the URL of the first is "{$ELASTICSEARCH.SCHEME}://{HOST.CONN}:{$ELASTICSEARCH.PORT}/_nodes/{#ES.NODE}/stats/fs,indices,jvm,thread_pool" and of the second is "{$ELASTICSEARCH.SCHEME}://{HOST.CONN}:{$ELASTICSEARCH.PORT}/_nodes/{#ES.NODE}/stats/http"
                            - Change all the dependent item prototypes in the nodes discovery so that their master item is now "ES {#ES.NODE}: Get node stats (fs,indices,jvm,thread_pool)"
                            - Change the item prototypes "ES {#ES.NODE}: Number of open HTTP connections" and "ES {#ES.NODE}: Rate of HTTP connections opened" so that their master item is now "ES {#ES.NODE}: Get node stats (http)"
                            - Disable or delete the item "ES: Get nodes stats", which no longer has a purpose

                            Those changes should lower the load on the preprocessing processes (manager and workers). On my end, with 8 clusters monitored with the template, I went down to almost no CPU consumption on the preprocessing processes, instead of a saturated preprocessing manager (creating queued items) and 1 saturated preprocessing worker.

                            It would be good if the template were updated this way to avoid a lot of performance issues in the future, especially with Zabbix 6.0, which brings a big change in the preprocessing processes.
                            The preprocessing manager now sends all the dependent items of the same master item to the same preprocessing worker. That means that if you have hundreds of dependent items linked to the same master item, that preprocessing worker can be completely saturated.
                            Ideas to solve this kind of issue:
                            - reduce the number of dependent items for the same master item
                            - reduce the volume of data in the master item, so that preprocessing does not have to process a large payload containing a lot of information that is not needed.
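
The per-node split described in this comment can be sketched in Python like this. It only shows how the two master item names and URLs would look once `{#ES.NODE}` is expanded for one node; the helper `node_master_urls`, the node name `es-node-1` and the address are hypothetical.

```python
from urllib.parse import quote

# Metric groups per master item, following the split proposed in the comment.
MASTER_METRICS = {
    "Get node stats (fs,indices,jvm,thread_pool)": ["fs", "indices", "jvm", "thread_pool"],
    "Get node stats (http)": ["http"],
}


def node_master_urls(scheme, host, port, node):
    """Map each per-node master item name to its stats URL."""
    # Node names go into the URL path, so encode anything unusual in them.
    node_part = quote(node, safe="")
    base = f"{scheme}://{host}:{port}/_nodes/{node_part}/stats"
    return {
        f"ES {node}: {name}": f"{base}/{','.join(metrics)}"
        for name, metrics in MASTER_METRICS.items()
    }


# Hypothetical node name and address.
urls = node_master_urls("http", "192.0.2.10", 9200, "es-node-1")
for name, url in urls.items():
    print(name, "->", url)
```

Each node then gets its own, smaller master items, so the dependent items are spread over more master items instead of all hanging off a single large response.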
                        • Marco Gastreghini
                          Junior Member
                          • Mar 2021
                          • 4

                          #15
                          Hello, I don't understand the algorithm in the trigger "ES: Cluster does not have enough space for resharding".

                          Formula is : "(last(/<nodename>/es.nodes.fs.total_in_bytes)-last(/<nodename>/es.nodes.fs.available_in_bytes))/(last(/<nodename>/es.cluster.number_of_data_nodes)-1)>last(/<nodename>/es.nodes.fs.available_in_bytes)".

                          Why does it use this formula?
                          What kind of monitoring is it meant to implement?

                          Thanks in advance.
                          MG
                          Last edited by Marco Gastreghini; 16-11-2021, 18:05.
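
To make the arithmetic of the expression quoted in post #15 easier to follow, here is a small Python sketch that evaluates it for made-up numbers. It is not an official explanation of the trigger. The function name and all values are hypothetical, and the reading "used space divided over one node fewer, compared with the available space" is only a plain restatement of the formula.

```python
def resharding_trigger_fires(total_bytes, available_bytes, data_nodes):
    """Evaluate the quoted trigger expression for one set of values."""
    if data_nodes <= 1:
        # The expression divides by (data_nodes - 1); with a single data node
        # that would be a division by zero, so this sketch refuses to evaluate.
        raise ValueError("expression is undefined for fewer than two data nodes")
    used_bytes = total_bytes - available_bytes
    return used_bytes / (data_nodes - 1) > available_bytes


TB = 10**12

# Hypothetical cluster with 3 data nodes and 3 TB in total.
# 1 TB available: used 2 TB / 2 = 1 TB, which is not greater than 1 TB.
print(resharding_trigger_fires(3 * TB, 1 * TB, 3))   # False
# 0.5 TB available: used 2.5 TB / 2 = 1.25 TB, which is greater than 0.5 TB.
print(resharding_trigger_fires(3 * TB, TB // 2, 3))  # True
```

With these numbers the trigger starts firing somewhere between 1 TB and 0.5 TB of available space.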
