Zabbix server | LLD | loss of data because of OOM

  • db100
    Member
    • Feb 2023
    • 61

    #1

    This is a bug report. I will be providing logs and other pieces of information next.

    I am running Zabbix 6.4.0 on Kubernetes and I have set a memory limit on it.

    The server handles about 2000 hosts and >20000 items, all discovered by LLD rules. All LLD rules are set to keep undiscovered items for 30 days.

    For some reason, at some point the server simply deletes all discovered hosts and items in a housekeeping job that (according to the log) takes about 400 seconds to execute (regular housekeeping usually takes just a few seconds).

    I am still investigating this issue, but I fear it might be linked to the CPU or RAM limits imposed on the pod. Hence the title of this post.

    Has anyone had similar symptoms? This is quite a bad bug, I must say.

    ---

    UPDATE

    Here is the log I see in the server container: "invalid discovery rule ID [51085]". The same message repeats for most of the rules.


    And before that:

    ```
    244:20230517:065155.162 [Z3005] query failed: [0] PGRES_FATAL_ERROR:ERROR: deadlock detected
    DETAIL: Process 1680 waits for ShareLock on transaction 2741830; blocked by process 1683.
    Process 1683 waits for ShareLock on transaction 2741132; blocked by process 1680.
    HINT: See server log for query details.
    CONTEXT: while deleting tuple (36,3) in relation "item_rtdata"
    SQL statement "DELETE FROM ONLY "public"."item_rtdata" WHERE $1 OPERATOR(pg_catalog.=) "itemid""
    [delete from functions where (itemid in
    ```
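
For anyone reading a log like this, here is a minimal sketch (not part of Zabbix or PostgreSQL) that pulls the wait-for pairs out of the `DETAIL` lines and confirms which processes block each other. The function names `parse_wait_edges` and `find_cycle` are made up for illustration, and the regular expression only assumes the line format shown in the excerpt above.

```python
import re

# Matches one wait-for line of a PostgreSQL "deadlock detected" DETAIL block,
# in the format shown in the log excerpt above.
WAIT_RE = re.compile(
    r"Process (?P<waiter>\d+) waits for (?P<lock>\w+) on transaction (?P<xid>\d+); "
    r"blocked by process (?P<blocker>\d+)\."
)


def parse_wait_edges(log_text):
    """Return {waiter_pid: blocker_pid} for every 'waits for' line found."""
    return {int(m["waiter"]): int(m["blocker"]) for m in WAIT_RE.finditer(log_text)}


def find_cycle(edges):
    """Follow blocker links from each process; return the first cycle as a list of PIDs."""
    for start in edges:
        path = [start]
        current = edges.get(start)
        while current is not None and current not in path:
            path.append(current)
            current = edges.get(current)
        if current == start:
            return path
    return []


sample = """\
DETAIL: Process 1680 waits for ShareLock on transaction 2741830; blocked by process 1683.
Process 1683 waits for ShareLock on transaction 2741132; blocked by process 1680.
"""

edges = parse_wait_edges(sample)
print(edges)              # which process waits on which
print(find_cycle(edges))  # PIDs forming the deadlock cycle, if any
```

In the excerpt above, the two PIDs wait on each other, so both end up in the reported cycle. The PostgreSQL server log mentioned in the `HINT` line is still needed to see which queries those processes were running.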



    If I try to create more LLD rules, it seems they are not executed. I see no useful logs in the server process, only these two things:
    • postgres DB log: `LOG: could not receive data from client: Connection reset by peer`
    • zabbix server:

    Code:
    ...
    271:20230517:073822.065 server #35 started
    7:20230517:073822.068 "zabbix-server-..." node started in "active" mode
    272:20230517:073822.068 server #36 started
    273:20230517:073822.072 server #37 started
    Bad operator (INTEGER): At line 73 in /var/lib/mibs/ietf/SNMPv2-PDU
    243:20230517:073823.368 thread started
    243:20230517:073823.368  thread started
    243:20230517:073823.368  thread started
    This is the only log I see ...

    Last edited by db100; 17-05-2023, 09:47.
  • cyber
    Senior Member
    Zabbix Certified Specialist, Zabbix Certified Professional
    • Dec 2006
    • 4807

    #2
    This is not the place for bug reports... Those can be submitted at https://support.zabbix.com/browse/ZBX

  • db100
    Member
    • Feb 2023
    • 61

    #3
    OK, I created a new Zabbix 6.4.2 instance and applied all the hosts and templates the LLD needs, and it does NOT create any hosts. I am not sure whether the host prototype specification has changed; maybe the upgrade introduced a breaking change? I see no error in the console log, but judging from the messages posted above, something no longer works in the LLD's host generation process (which uses a JavaScript preprocessing step, so maybe the update broke something there?).

    • db100
      Member
      • Feb 2023
      • 61

      #4
      OK, I have figured out what is wrong now:

      "Zabbix does not support nested host prototypes, i.e. host prototypes are not supported on hosts that are discovered by low-level discovery rule."



      I have no idea how I managed to create all the hosts using these nested host prototype rules, but it seems to have worked?

      But now all the hosts are gone, which is strange, because I expected them to simply become "disabled" or unsupported.
