Large Amount of Agent unreachable

  • unficyp
    Junior Member
    • Dec 2014
    • 27

    #1

    Large Amount of Agent unreachable

    Hi,
    since our upgrade from Zabbix 3.4.4 to 3.4.6 we have been seeing large waves of "..Agent unreachable.." messages. Last night alone we received more than 400 mails with unreachable messages between 02:03 and 03:30.
    I was able to check some of the affected hosts during the "mailstorm" with ping/fping and agent.ping - all of the hosts were up and reachable. I also checked the switch ports - no errors on the interfaces and no bandwidth problem.

    The only thing we changed was the upgrade to 3.4.6.

    Our setup:
    Host: Bare Metal Thomas Krenn Server, 4x Xeon E5-2403, 12GB RAM
    OS: FreeBSD 11.1 with ZFS root, 2 SATA disks in a mirror :/
    DB: PostgreSQL 9.5.10, data files on ZFS
    Zabbix: 3.4.6, compiled from the Ports Collection

    some changed config options:

    Code:
    StartPollers=25
    StartIPMIPollers=4
    StartPreprocessors=4
    StartPollersUnreachable=10
    StartTrappers=4
    StartPingers=20
    StartDiscoverers=2
    StartHTTPPollers=6
    StartTimers=5
    StartEscalators=4
    StartAlerters=6
    StartJavaPollers=4
    StartVMwareCollectors=10
    StartDBSyncers=6
    CacheSize=64M
    HistoryCacheSize=32M
    HistoryIndexCacheSize=8M
    TrendCacheSize=16M
    ValueCacheSize=128M
    I've included the Zabbix internal process busy graph. I also checked queue sizes, memory/swap, etc.

    The only possible problem I see is that the preprocessing processes show too high a busy % - and the server specs may be too low.

    Aside from that, I don't have a clue where to start looking.

    Any thoughts?
    Thanks in advance,
    Gerald
  • kaspars.mednis
    Senior Member
    Zabbix Certified Trainer
    Zabbix Certified Specialist
    Zabbix Certified Professional
    • Oct 2017
    • 349

    #2
    Please also check the Zabbix data gathering processes busy graph; the data gathering processes are responsible for data collection from hosts.

    If you run out of pingers or pollers, you start to miss data...

    Regards,
    Kaspars

    • unficyp
      Junior Member
      • Dec 2014
      • 27

      #3
      The data gathering processes are looking good, none of them above 60%; most of the time they are at 25-40%.

      But I think I found the problem - after disabling the housekeeping process, I get no more agent unreachable messages. The screenshot shows the processes during the "problem". I guess the server/database is way too slow...

      EDIT: I can't upload the picture, link here: https://ibb.co/kK7vob
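
      For reference, a rough sketch of the zabbix_server.conf settings that control the housekeeper. The thread does not say how housekeeping was disabled, so this is only one possible way, and the values are illustrative assumptions rather than the ones used on this server:

      Code:
      # Assumption: switch off the automatic housekeeper runs (0 = disabled).
      # A run can still be triggered by hand with:
      #   zabbix_server -R housekeeper_execute
      HousekeepingFrequency=0
      # Alternative (assumption): keep housekeeping enabled, but cap how many
      # rows one task may delete per cycle (the default is 5000) to ease the
      # load on a slow database.
      # MaxHousekeeperDelete=1000

      With automatic runs switched off, old history and trends keep growing until a manual run is triggered, so this is more of a diagnostic step than a permanent setting.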
      Last edited by unficyp; 27-01-2018, 18:39.
