Elasticsearch backend overloaded after server restart

  • grommir
    Senior Member
    • Mar 2013
    • 134

    #1

    Good day.

    I use Elasticsearch as history storage (3 nodes in the cluster). After restarting the Zabbix server, I see heavy load on the Elasticsearch cluster from many read queries (350-400 Mb/sec on each node).

    The queries look like this:
    Code:
    {
      "completed": false,
      "task": {
        "node": "Bf795ulkQzC-upCT5d4wRg",
        "id": 71693,
        "type": "transport",
        "action": "indices:data/read/search",
        "description": "indices[uint*], types[values], search_type[QUERY_THEN_FETCH], source[{\"size\":2,\"query\":{\"bool\":{\"must\":[{\"match\":{\"itemid\":{\"query\":71907,\"operator\":\"OR\",\"prefix_length\":0,\"max_expansions\":50,\"fuzzy_transpositions\":true,\"lenient\":false,\"zero_terms_query\":\"NONE\",\"boost\":1.0}}}],\"filter\":[{\"range\":{\"clock\":{\"from\":null,\"to\":1582269002,\"include_lower\":true,\"include_upper\":true,\"boost\":1.0}}}],\"disable_coord\":false,\"adjust_pure_negative\":true,\"boost\":1.0}},\"sort\":[{\"clock\":{\"order\":\"desc\"}}]}]",
        "start_time_in_millis": 1582272101942,
        "running_time_in_nanos": 56658436343,
        "cancellable": true
      }
    }
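The entry above has the shape of an Elasticsearch Tasks API record. As a rough sketch of how such long-running searches could be spotted, the snippet below builds a task-listing URL and filters a decoded response by running time. The function names `tasks_url` and `slow_tasks` are made up for illustration, and it assumes the list form of the response, which groups tasks under `nodes`.

```python
from urllib.parse import urlencode


def tasks_url(base_url):
    """Build a Tasks API URL listing running search tasks with their descriptions."""
    params = urlencode(
        {"actions": "indices:data/read/search*", "detailed": "true"},
        safe=":/*",  # keep the action pattern readable in the URL
    )
    return f"{base_url.rstrip('/')}/_tasks?{params}"


def slow_tasks(listing, min_seconds):
    """Return (task_key, seconds) pairs for tasks running at least min_seconds."""
    found = []
    for node in listing.get("nodes", {}).values():
        for key, task in node.get("tasks", {}).items():
            seconds = task["running_time_in_nanos"] / 1e9
            if seconds >= min_seconds:
                found.append((key, round(seconds, 1)))
    # Longest-running tasks first.
    return sorted(found, key=lambda pair: pair[1], reverse=True)
```

For the task shown in this post, 56658436343 ns works out to roughly 56.7 seconds for a query that asks for only two documents.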
  • back2base
    Junior Member
    • Aug 2020
    • 3

    #2
    Hi!
    I am experiencing the same problem with Zabbix server 5.0.*. Did you figure out the source of the problem?
    Right now it looks to me like some sort of infinite-loop bug in Zabbix.
    Sometimes it happens a couple of times a week, sometimes once in several weeks.
    Zabbix starts bombarding the Elasticsearch backend with queries, causing high (>60%) I/O wait on the CPU, and it never stops until the Zabbix host is rebooted.
    In my case Zabbix and Elasticsearch are on the same host, so checks stop working, notifications are not sent, and so on.
    Example queries:
    "indices[uint*], types[], search_type[QUERY_THEN_FETCH], scroll[10s], source[{"size":3,"query":{"bool":{"must":[{"match":{"itemid":{"query":110585,"operator":"OR","prefix_length":0,"max_expansions":50,"fuzzy_transpositions":true,"lenient":false,"zero_terms_query":"NONE","auto_generate_synonyms_phrase_query":true,"boost":1.0}}}],"filter":[{"range":{"clock":{"from":null,"to":1601144585,"include_lower":true,"include_upper":true,"boost":1.0}}}],"adjust_pure_negative":true,"boost":1.0}},"sort":[{"clock":{"order":"desc"}}]}]"
    "indices[uint*], types[], search_type[QUERY_THEN_FETCH], scroll[10s], source[{"size":2,"query":{"bool":{"must":[{"match":{"itemid":{"query":50773,"operator":"OR","prefix_length":0,"max_expansions":50,"fuzzy_transpositions":true,"lenient":false,"zero_terms_query":"NONE","auto_generate_synonyms_phrase_query":true,"boost":1.0}}}],"filter":[{"range":{"clock":{"from":null,"to":1601144496,"include_lower":true,"include_upper":true,"boost":1.0}}}],"adjust_pure_negative":true,"boost":1.0}},"sort":[{"clock":{"order":"desc"}}]}]"
    "indices[uint*], types[], search_type[QUERY_THEN_FETCH], scroll[10s], source[{"size":2,"query":{"bool":{"must":[{"match":{"itemid":{"query":111524,"operator":"OR","prefix_length":0,"max_expansions":50,"fuzzy_transpositions":true,"lenient":false,"zero_terms_query":"NONE","auto_generate_synonyms_phrase_query":true,"boost":1.0}}}],"filter":[{"range":{"clock":{"from":null,"to":1601144561,"include_lower":true,"include_upper":true,"boost":1.0}}}],"adjust_pure_negative":true,"boost":1.0}},"sort":[{"clock":{"order":"desc"}}]}]"
    "indices[uint*], types[], search_type[QUERY_THEN_FETCH], scroll[10s], source[{"size":3,"query":{"bool":{"must":[{"match":{"itemid":{"query":97098,"operator":"OR","prefix_length":0,"max_expansions":50,"fuzzy_transpositions":true,"lenient":false,"zero_terms_query":"NONE","auto_generate_synonyms_phrase_query":true,"boost":1.0}}}],"filter":[{"range":{"clock":{"from":null,"to":1601144598,"include_lower":true,"include_upper":true,"boost":1.0}}}],"adjust_pure_negative":true,"boost":1.0}},"sort":[{"clock":{"order":"desc"}}]}]"
    "indices[uint*], types[], search_type[QUERY_THEN_FETCH], scroll[10s], source[{"size":2,"query":{"bool":{"must":[{"match":{"itemid":{"query":32705,"operator":"OR","prefix_length":0,"max_expansions":50,"fuzzy_transpositions":true,"lenient":false,"zero_terms_query":"NONE","auto_generate_synonyms_phrase_query":true,"boost":1.0}}}],"filter":[{"range":{"clock":{"from":null,"to":1601144590,"include_lower":true,"include_upper":true,"boost":1.0}}}],"adjust_pure_negative":true,"boost":1.0}},"sort":[{"clock":{"order":"desc"}}]}]"
    "indices[uint*], types[], search_type[QUERY_THEN_FETCH], scroll[10s], source[{"size":3,"query":{"bool":{"must":[{"match":{"itemid":{"query":32360,"operator":"OR","prefix_length":0,"max_expansions":50,"fuzzy_transpositions":true,"lenient":false,"zero_terms_query":"NONE","auto_generate_synonyms_phrase_query":true,"boost":1.0}}}],"filter":[{"range":{"clock":{"from":null,"to":1601144530,"include_lower":true,"include_upper":true,"boost":1.0}}}],"adjust_pure_negative":true,"boost":1.0}},"sort":[{"clock":{"order":"desc"}}]}]"
    Last edited by back2base; 26-09-2020, 23:16.
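To check whether the same items are being queried over and over, the task descriptions can be reduced to their item ID, upper time bound and size. The following is only a sketch: `parse_description` and `repeat_counts` are hypothetical helper names, and it assumes every description ends with a `source[...]` section holding valid JSON, as in the examples in this post.

```python
import json
from collections import Counter


def parse_description(description):
    """Extract itemid, upper clock bound and size from a search task description."""
    start = description.index("source[") + len("source[")
    end = description.rindex("]")  # the closing bracket of source[...]
    source = json.loads(description[start:end])
    bool_query = source["query"]["bool"]
    return {
        "itemid": bool_query["must"][0]["match"]["itemid"]["query"],
        "to": bool_query["filter"][0]["range"]["clock"]["to"],
        "size": source["size"],
    }


def repeat_counts(descriptions):
    """Count how many of the given task descriptions target each itemid."""
    return dict(Counter(parse_description(d)["itemid"] for d in descriptions))
```

Item IDs whose counts keep growing across successive task snapshots would point to repeated queries for the same items rather than a one-time cache warm-up.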


    • grommir
      Senior Member
      • Mar 2013
      • 134

      #3
      No. I tried asking Zabbix support, but they prefer to play dead, just because "Elasticsearch backend is experimental".
      I created a workaround script that closes all indexes except today's and restarts the Zabbix server. After the server starts, the script waits a few minutes and then gradually reopens the indexes.
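The workaround script itself was not posted. Purely as a guess at its shape, the planning part could look like the sketch below. It assumes daily indices whose names end in `YYYY-MM-DD` (as in `uint-2020-07-04` later in the thread); all function names and the batch size are hypothetical. The actual HTTP calls, the Zabbix restart and the waiting between batches are left out.

```python
from datetime import date


def split_indices(index_names, today):
    """Split daily indices (names ending in YYYY-MM-DD) into today's and older ones."""
    suffix = today.isoformat()
    todays = [name for name in index_names if name.endswith(suffix)]
    # Newest first, so recent history becomes available again soonest.
    older = sorted(
        (name for name in index_names if not name.endswith(suffix)),
        key=lambda name: name[-10:],
        reverse=True,
    )
    return todays, older


def reopen_batches(older, batch_size):
    """Group the closed indices into batches to reopen one step at a time."""
    return [older[i:i + batch_size] for i in range(0, len(older), batch_size)]


def close_path(index_name):
    """Path to POST to in order to close an index."""
    return f"/{index_name}/_close"


def open_path(index_name):
    """Path to POST to in order to reopen an index."""
    return f"/{index_name}/_open"
```

A wrapper would POST `close_path(...)` for every older index, restart the Zabbix server, wait a few minutes, and then POST `open_path(...)` batch by batch with a pause in between.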


      • vso
        Zabbix developer
        • Aug 2016
        • 190

        #4
        This is a long shot, but maybe there is a performance problem due to this range:
        Code:
        [{\"range\":{\"clock\":{\"from\":null,\"to\":1582269002,
        Is it possible to set "from" to just a few hours before 1582269002 instead of null and see if it helps?
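For a manual comparison (the Zabbix server builds its own requests, so this only helps when replaying a query by hand), a bounded version of the search body could be generated as sketched below. The helper name `bounded_history_query` and the `hours_back` parameter are illustrative assumptions; `gte` and `lte` are the standard bounds of an Elasticsearch range query.

```python
import json


def bounded_history_query(itemid, to_clock, hours_back, size):
    """Build a search body like the one in post #1, but with a lower clock bound."""
    return {
        "size": size,
        "query": {
            "bool": {
                "must": [{"match": {"itemid": itemid}}],
                "filter": [
                    {
                        "range": {
                            "clock": {
                                "gte": to_clock - hours_back * 3600,
                                "lte": to_clock,
                            }
                        }
                    }
                ],
            }
        },
        "sort": [{"clock": {"order": "desc"}}],
    }


if __name__ == "__main__":
    # Print a body that can be passed to curl with -d for a timing comparison.
    print(json.dumps(bounded_history_query(71907, 1582269002, 6, 2), indent=2))
```

Comparing the `took` value of this request with that of the unbounded one would show whether the open lower bound is what makes the searches expensive.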



        • grommir
          grommir commented
          No, because these requests are generated by the Zabbix server itself.
      • vso
        Zabbix developer
        • Aug 2016
        • 190

        #5
        Could you please provide more information about the item with itemid 71907: what kind of triggers are created for it, and how big is the range in those triggers? Also, what kind of data is received on that item, and is there any data with a 0 timestamp?


        • grommir
          Senior Member
          • Mar 2013
          • 134

          #6
          In this case, it's the status of the Linux service "rhel-loadmodules". But in fact, it can be any item.

          [Image: 2020-09-28_14h22_24.png]


          • grommir
            Senior Member
            • Mar 2013
            • 134

            #7
            In general, it looks like the zabbix server is trying to get ALL items at ALL times.



            • back2base
              back2base commented
              I'd even say that it queries all items in an infinite loop, moving the date range to the latest time on every query. I checked some of the item IDs and they are unrelated to each other: some come from Cisco hosts, some from Linux/Windows hosts, etc.
          • vso
            Zabbix developer
            • Aug 2016
            • 190

            #8
            I am sorry, but the trigger information for the item, and whether there is history with a 0 timestamp for it, is required in order to rule out misconfiguration.


            • grommir
              Senior Member
              • Mar 2013
              • 134

              #9
              There are three triggers configured for this item type:
              [Image: 2020-09-28_16h42_34.png]


              • vso
                Zabbix developer
                • Aug 2016
                • 190

                #10
                The trigger seems fine. Do you have information about the history for this item, and how much of it is there? Something like:
                Code:
                select count(*) from history_text where itemid=71907 and clock=0;
                select count(*) from history_text where itemid=71907;
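If the history values are stored only in Elasticsearch, as described in post #1, a rough equivalent of these two counts would go through the `_count` API instead of SQL. The sketch below only builds the path and request body. The helper name `count_request` is hypothetical, and it assumes `itemid` is a numeric field and that `clock` accepts epoch-second values, as the range filters earlier in the thread suggest.

```python
def count_request(index_pattern, itemid, clock=None):
    """Return (path, body) for a _count request on one item, optionally at one clock."""
    filters = [{"term": {"itemid": itemid}}]
    if clock is not None:
        # A range with equal bounds matches exactly this timestamp.
        filters.append({"range": {"clock": {"gte": clock, "lte": clock}}})
    return f"/{index_pattern}/_count", {"query": {"bool": {"filter": filters}}}


# Counterparts of the two SQL statements, using the uint* indices from post #1:
zero_clock_path, zero_clock_body = count_request("uint*", 71907, clock=0)
all_values_path, all_values_body = count_request("uint*", 71907)
```

Each body can be sent with curl in the same way as a `_search` request; the response carries the number of matches in its `count` field.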


                • grommir
                  Senior Member
                  • Mar 2013
                  • 134

                  #11
                  Actually, the query is in the first post.
                  Code:
                  curl -XGET 'http://zabbix-cluster-elasticsearch:9201/uint*/_search?pretty' -H 'Content-Type: application/json' -d'
                  {
                    "query": {
                      "bool": {
                        "must": [
                          {
                            "match": {
                              "itemid": {
                                "query": 71907,
                                "operator": "OR",
                                "prefix_length": 0,
                                "max_expansions": 50,
                                "fuzzy_transpositions": true,
                                "lenient": false,
                                "zero_terms_query": "NONE",
                                "boost": 1
                              }
                            }
                          }
                        ],
                        "filter": [
                          {
                            "range": {
                              "clock": {
                                "from": null,
                                "to": 1601385919,
                                "include_lower": true,
                                "include_upper": true,
                                "boost": 1
                              }
                            }
                          }
                        ],
                        "disable_coord": false,
                        "adjust_pure_negative": true,
                        "boost": 1
                      }
                    }
                  }
                  '
                  Output:
                  Code:
                  {
                  "took" : 317,
                  "timed_out" : false,
                  "_shards" : {
                  "total" : 440,
                  "successful" : 440,
                  "skipped" : 0,
                  "failed" : 0
                  },
                  "hits" : {
                  "total" : 375677,
                  "max_score" : 1.0,
                  "hits" : [
                  {
                  "_index" : "uint-2020-07-04",
                  "_type" : "values",
                  "_id" : "EJoJGHMBALcie_Lg3Yi7",
                  "_score" : 1.0,
                  "_source" : {
                  "itemid" : 71907,
                  "ns" : 796630574,
                  "clock" : 1593836192,
                  "value" : "0",
                  "ttl" : 7776000
                  }
                  },
                  {
                  "_index" : "uint-2020-07-04",
                  "_type" : "values",
                  "_id" : "VZsLGHMBALcie_LgXDA8",
                  "_score" : 1.0,
                  "_source" : {
                  "itemid" : 71907,
                  "ns" : 389424544,
                  "clock" : 1593836292,
                  "value" : "0",
                  "ttl" : 7776000
                  }
                  },
                  {
                  "_index" : "uint-2020-07-04",
                  "_type" : "values",
                  "_id" : "gZwOGHMBALcie_Lg6aIV",
                  "_score" : 1.0,
                  "_source" : {
                  "itemid" : 71907,
                  "ns" : 235077395,
                  "clock" : 1593836513,
                  "value" : "0",
                  "ttl" : 7776000
                  }
                  },
                  {
                  "_index" : "uint-2020-07-04",
                  "_type" : "values",
                  "_id" : "pbT3F3MBNJ1rygLwmFwG",
                  "_score" : 1.0,
                  "_source" : {
                  "itemid" : 71907,
                  "ns" : 899246622,
                  "clock" : 1593834992,
                  "value" : "0",
                  "ttl" : 7776000
                  }
                  },
                  {
                  "_index" : "uint-2020-07-04",
                  "_type" : "values",
                  "_id" : "QZgEGHMBALcie_Lg02KR",
                  "_score" : 1.0,
                  "_source" : {
                  "itemid" : 71907,
                  "ns" : 766155894,
                  "clock" : 1593835852,
                  "value" : "0",
                  "ttl" : 7776000
                  }
                  },
                  {
                  "_index" : "uint-2020-07-04",
                  "_type" : "values",
                  "_id" : "lIINGHMBK1ZgDOmYr06V",
                  "_score" : 1.0,
                  "_source" : {
                  "itemid" : 71907,
                  "ns" : 608026776,
                  "clock" : 1593836413,
                  "value" : "0",
                  "ttl" : 7776000
                  }
                  },
                  {
                  "_index" : "uint-2020-07-04",
                  "_type" : "values",
                  "_id" : "up4TGHMBALcie_Lg-rnX",
                  "_score" : 1.0,
                  "_source" : {
                  "itemid" : 71907,
                  "ns" : 277633922,
                  "clock" : 1593836854,
                  "value" : "0",
                  "ttl" : 7776000
                  }
                  },
                  {
                  "_index" : "uint-2020-07-04",
                  "_type" : "values",
                  "_id" : "iIAJGHMBK1ZgDOmYW4qz",
                  "_score" : 1.0,
                  "_source" : {
                  "itemid" : 71907,
                  "ns" : 200114531,
                  "clock" : 1593836132,
                  "value" : "0",
                  "ttl" : 7776000
                  }
                  },
                  {
                  "_index" : "uint-2020-07-04",
                  "_type" : "values",
                  "_id" : "uL4UGHMBNJ1rygLwaJga",
                  "_score" : 1.0,
                  "_source" : {
                  "itemid" : 71907,
                  "ns" : 767367730,
                  "clock" : 1593836874,
                  "value" : "0",
                  "ttl" : 7776000
                  }
                  },
                  {
                  "_index" : "uint-2020-07-04",
                  "_type" : "values",
                  "_id" : "c50SGHMBALcie_LgJv8a",
                  "_score" : 1.0,
                  "_source" : {
                  "itemid" : 71907,
                  "ns" : 118240178,
                  "clock" : 1593836734,
                  "value" : "0",
                  "ttl" : 7776000
                  }
                  }
                  ]
                  }
                  }
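A note on reading this output: the request in this post carries no `sort` or `size`, unlike the server's query in post #1, so the ten hits are not the newest values. The `clock` and `ttl` fields are plain seconds and can be decoded as in the sketch below; the helper names are illustrative only.

```python
from datetime import datetime, timezone


def clock_to_utc(clock):
    """Convert an epoch-second clock value to an aware UTC datetime."""
    return datetime.fromtimestamp(clock, tz=timezone.utc)


def ttl_days(ttl_seconds):
    """Convert a ttl given in seconds to days."""
    return ttl_seconds / 86400


if __name__ == "__main__":
    print(clock_to_utc(1593836192).isoformat())  # first hit in the output
    print(clock_to_utc(1601385919).isoformat())  # upper bound of the range filter
    print(ttl_days(7776000))                     # ttl of every hit
```

The first hit decodes to 2020-07-04, matching its index name `uint-2020-07-04`, the upper bound falls on 2020-09-29, and the ttl corresponds to 90 days. The unbounded range therefore spans close to three months of daily indices (440 shards in total) for this single item.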
