Zabbix agent unreachable alert for the whole environment - reoccurring issue

  • ConstantCurrent
    Junior Member
    • May 2018
    • 4

    #1

    I recently took over administration of a Zabbix environment. It currently consists of 597 hosts, with a required NVPS of 465.85. The Zabbix server, frontend and DB (Postgres) are all hosted on one VM.

    There is a strange issue going on. Occasionally (once every 1-2 weeks) the whole Zabbix environment misbehaves: Zabbix can't read items for any host (including the Zabbix server itself). When I look at the history values, they are blank for every item (screenshots attached).
    What can be wrong here? Where should I start looking for a solution? Queue length, memory usage and CPU load all look good to me. So far I've disabled internal housekeeping for history and trends. Timeout is already set to 30. I've also increased the number of pollers slightly. However, if I add more pollers, the Zabbix frontend throws an error. The current settings are:
    StartPollers=30
    StartPollersUnreachable=35
    StartPingers=5

    I will be very grateful for any suggestions.
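
One thing that may be worth ruling out here, as an assumption rather than anything confirmed in this thread, is whether the error that appears after adding pollers is related to the number of database connections. As far as I know, most Zabbix server processes hold their own database connection, so the total number of started processes can be compared against the PostgreSQL `max_connections` setting. The minimal sketch below only totals the `Start*` values from a config fragment; the function name `count_started_processes` is hypothetical, and the fragment contains only the three parameters quoted in the post, so a real total would be higher.

```python
import re


def count_started_processes(conf_text):
    """Collect Start* parameters from zabbix_server.conf-style text.

    Returns a dict mapping each parameter name to its integer value.
    Commented-out and blank lines are ignored.
    """
    counts = {}
    for line in conf_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        match = re.match(r"^(Start\w+)\s*=\s*(\d+)$", line)
        if match:
            counts[match.group(1)] = int(match.group(2))
    return counts


# Only the values quoted in the post; the real file has more Start* entries.
conf = """
StartPollers=30
StartPollersUnreachable=35
StartPingers=5
"""

counts = count_started_processes(conf)
total = sum(counts.values())
print(counts, total)
```

If the total across the full config file approaches the database connection limit, that would be a reason to look at the limit before raising the poller counts further.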
  • LenR
    Senior Member
    • Sep 2009
    • 1005

    #2
    What version? Check whether it happens while housekeeping is running. Do you also lose the values of the history write cache? If Zabbix falls behind writing history to the database, it will look as if you're not collecting data, but the problem isn't on the collection side, it's on the backend. Our installation is a lot larger and runs on MySQL, but we had similar problems after Zabbix restarts in 3.4.5 through 3.4.7.
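
The "falling behind on history writes" scenario described above can be checked by watching the free space in the history write cache, which the server exposes through the internal item `zabbix[wcache,history,pfree]` (percentage free). The sketch below is a minimal illustration, assuming you have already exported a series of those percentages in chronological order; the function name `history_cache_filling` and the 20% threshold are hypothetical choices, not Zabbix defaults.

```python
def history_cache_filling(pfree_samples, low_water=20.0):
    """Return True if the history write cache looks like it is filling up.

    pfree_samples: chronological list of 'percent free' readings.
    The cache is treated as filling when the readings never increase
    across the window and the latest one is below low_water.
    """
    if len(pfree_samples) < 2:
        return False
    never_recovers = all(
        later <= earlier
        for earlier, later in zip(pfree_samples, pfree_samples[1:])
    )
    return never_recovers and pfree_samples[-1] < low_water


# Free space steadily shrinking: the backend is probably not keeping up.
print(history_cache_filling([95.0, 80.0, 55.0, 30.0, 10.0]))

# Free space hovering near the top: writes are keeping pace.
print(history_cache_filling([95.0, 96.0, 94.0, 95.0]))
```

A steadily shrinking free percentage in the hours before an outage would point at the database side rather than at the pollers.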

    • ConstantCurrent
      Junior Member
      • May 2018
      • 4

      #3
      We're running Zabbix version 3.0. I've attached the housekeeper process graph. The housekeeper did run beforehand, but it went from 100% to 0 around 1 hour before the issue occurred. Can you tell me how to check whether I lose the values of the history write cache?
      As for pinpointing the issue, I also think the problem is that the database can't keep up and is falling behind. Do you know how I could check whether that is a valid assumption?

      • josephmuli
        Junior Member
        • Apr 2018
        • 1

        #4
        I've also had this problem for a couple of weeks now. The fix I applied was increasing the number of pollers (StartPollers and StartPollersUnreachable); my queue was really bad (about 200 delayed). I'm running Zabbix version 3.2.
