A lot of alerts "agent unreachable 5 minutes"

    My Zabbix environment has a Zabbix server, a Zabbix database, and a Zabbix proxy; all agents connect to the proxy.
    The Zabbix version is 3.4.8 and the operating system is Red Hat 7.4.

    Lately there have been lots of alerts of the form "xxx is unreachable for 5 minutes", and those alerts recover in 15-20 minutes. This happens about once a day, but not at the same time each day.
    The proxy serves about 400 agents and about 150 thousand items, at 1.3k NVPS.
    I also found that when the alerts fire, a proxy alert "Zabbix busy data sender processes is 100%" fires as well.
    Can anyone help me resolve this problem?
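
For a sense of scale, the load figures quoted in the post can be turned into per-item and per-agent averages. This is only a back-of-the-envelope sketch in Python using the numbers given (150,000 items, 1.3k NVPS, about 400 agents); the function name is made up for illustration, and it assumes load is spread evenly, which a real setup will not match exactly.

```python
def average_interval_seconds(items: int, nvps: float) -> float:
    """Average seconds between new values for one item, assuming an even spread."""
    return items / nvps


# Figures taken from the post; everything else here is illustrative.
items = 150_000
nvps = 1_300.0
agents = 400

print(round(average_interval_seconds(items, nvps), 1))  # 115.4 seconds per item
print(items / agents)                                   # 375.0 items per agent
print(nvps / agents)                                    # 3.25 values per second per agent
```

Roughly one value per item every two minutes is a modest rate, so the raw collection volume by itself does not obviously explain a saturated data sender.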

    [Attached image: 20190613232955.png]
    [Attached image: 20190614105223.png]

    #2
    1) Are you using automatic housekeeping?
    2) Can you try to stress your environment, e.g. by reducing the update interval, to see if you can reproduce the error?
    3) What is the topology of your Zabbix solution (same machine for DB, server and web)?
    4) Can you post the relevant configuration items of the servers, proxies and agents?
    5) What kind of items are you using (SNMP, passive/active agent)?
    6) Is it a new Zabbix solution, or was it working before? Has something changed?

      #3
      Thank you.
      1) Are you using automatic housekeeping?
      The housekeeping settings are the defaults: HousekeepingFrequency=1, MaxHousekeeperDelete=5000.
      2) Can you try to stress your environment, e.g. by reducing the update interval, to see if you can reproduce the error?
      I created a new proxy and moved all the SNMP agents to it last Friday. It now seems to be back to normal, but I don't know when the problem will happen again.
      3) What is the topology of your Zabbix solution (same machine for DB, server and web)?
      One machine for the DB, one machine for the server and web, and one machine for the Zabbix proxy and proxy DB.
      4) Can you post the relevant configuration items of the servers, proxies and agents?
      The forum won't let me upload the config file; it shows "upload failed". How can I upload the config file?
      5) What kind of items are you using (SNMP, passive/active agent)?
      SNMP, and mainly passive agent.
      6) Is it a new Zabbix solution, or was it working before? Has something changed?
      We built this solution last year, and this problem appeared last month.

      I'd like to ask: is the bottleneck in the DB, the Zabbix server, or the Zabbix proxy?

        #4
        Usually the bottleneck is the database...
        There is a post on the Zabbix blog about hitting 9400 NVPS; it helped me a lot...
        I strongly recommend it...

        I disabled automatic housekeeping and will run it manually (in the future).

        The relevant values on my setup:
        server.conf:
        StartPollers=32
        StartPollersUnreachable=128
        StartTrappers=16
        StartPingers=16
        CacheSize=1G
        CacheUpdateFrequency=300
        StartDBSyncers=5
        HistoryCacheSize=128M
        HistoryIndexCacheSize=128M
        TrendCacheSize=512M
        ValueCacheSize=512M
        Timeout=30
        UnreachablePeriod=60
        UnavailableDelay=120
        UnreachableDelay=30
        LogSlowQueries=3000

        Proxy.conf:
        ProxyMode=0
        HeartbeatFrequency=60
        ConfigFrequency=300
        DataSenderFrequency=60
        StartTrappers=255
        StartPingers=255
        StartDiscoverers=1
        HousekeepingFrequency=0
        CacheSize=2G
        HistoryIndexCacheSize=512M
        Timeout=30
        TrapperTimeout=300
        UnreachablePeriod=3600
        UnavailableDelay=60
        UnreachableDelay=15
        LogSlowQueries=3000
        AllowRoot=0
        DataSenderFrequency=1
        StartPollers=256
        StartPollersUnreachable=128
        StartTrappers=512
        StartPingers=512
        CacheSize=128M
        HistoryCacheSize=32M
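
The proxy settings listed above set several parameters more than once (DataSenderFrequency, StartTrappers, StartPingers, CacheSize). The post does not say whether this is one file or two listings pasted together, and the sketch below makes no assumption about which value Zabbix would use. It is a small, hypothetical Python helper (`find_duplicate_keys` is not part of Zabbix) that simply reports parameters that appear more than once in `Key=Value` text, so such listings can be checked before comparing them.

```python
from collections import defaultdict


def find_duplicate_keys(conf_text: str) -> dict[str, list[str]]:
    """Return {parameter: [values in order]} for parameters set more than once."""
    seen = defaultdict(list)
    for raw in conf_text.splitlines():
        line = raw.strip()
        # Skip blanks, comments, and anything that is not Key=Value.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        seen[key.strip()].append(value.strip())
    return {key: values for key, values in seen.items() if len(values) > 1}


# Excerpt of the repeated parameters from the listing above.
sample = """\
DataSenderFrequency=60
StartTrappers=255
StartPingers=255
CacheSize=2G
DataSenderFrequency=1
StartTrappers=512
StartPingers=512
CacheSize=128M
"""

for key, values in find_duplicate_keys(sample).items():
    print(key, values)
```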

          #5
          Hi, this is my config; please check whether the configuration is reasonable.

          server.conf:
          StartPollers=500
          StartPreprocessors=20
          StartPingers=10
          StartDiscoverers=5
          CacheSize=1024M
          StartDBSyncers=20
          HistoryCacheSize=256M
          HistoryIndexCacheSize=32M
          TrendCacheSize=1G
          Timeout=30
          AlertScriptsPath=/app/zabbix-3.4.8/share/zabbix/alertscripts
          LogSlowQueries=3000
          StartProxyPollers=5

          proxy.conf:
          ConfigFrequency=60
          DataSenderFrequency=5
          StartPollers=500
          StartPollersUnreachable=10
          StartPingers=5
          CacheSize=1024M
          HistoryCacheSize=128M
          Timeout=30
          LogSlowQueries=3000

            #6
            Disabling the housekeeper without immediately replacing it with proper partitioning will only make matters worse, because you'll run out of disk space sooner than you'll notice.

            Config files and the tuning values in them are meaningless without the context of the Zabbix internal performance graphs.

            You have 500 pollers. Why? For 150,000 items at 1.3k NVPS, really?

            StartDBSyncers=20 - one of the most over-tuned values in the Zabbix config. The rule of thumb is one syncer per 1000 NVPS. In your case, going above the default of 4 is asking for degraded performance straight away.

            Timeout=30 - do you really have passive checks that run long enough to need such a timeout?

            And the next thing - what about the database and its tuning?
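
The syncer rule of thumb above is simple arithmetic, sketched here in Python. The "one syncer per 1000 NVPS" ratio and the default of 4 are taken from this post, not from official sizing guidance, and the function name is made up for illustration.

```python
import math

DEFAULT_DB_SYNCERS = 4  # default quoted in the post above


def syncers_by_rule_of_thumb(nvps: float, nvps_per_syncer: float = 1000.0) -> int:
    """One syncer per 1000 NVPS, rounded up, with a floor of one."""
    return max(1, math.ceil(nvps / nvps_per_syncer))


needed = syncers_by_rule_of_thumb(1300)
print(needed)                        # 2
print(needed <= DEFAULT_DB_SYNCERS)  # True: the default already covers 1.3k NVPS
```

By this rule the 1.3k NVPS setup needs about 2 syncers, so 20 is ten times the estimate.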

              #7
              It looks like I need to readjust my server.conf:
              StartPollers=500 // should be reduced to 50 or fewer
              StartDBSyncers=20 // should use the default

              As for the database and its tuning: how should I go about it? Should I post my my.cnf?

                #8
                You have to adjust your settings according to the Zabbix internal performance graphs on your Zabbix server, not according to blind forum suggestions.

                As for the database, it is a huge topic, but there are many discussions about it in other threads here too. my.cnf is a good starting point, but it can't help much without knowing the hardware specs and the MySQL version.
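
The internal performance graphs mentioned here are built from Zabbix internal items of the form `zabbix[process,<type>,<mode>,<state>]`; the "busy data sender processes" alert from the first post is based on one of them. As a rough illustration, the Python sketch below just builds the "average busy" keys for a few process types relevant to this thread. The helper name and the selection of process types are this sketch's own choices, so check the internal items documentation for your Zabbix version before relying on them.

```python
def busy_key(process_type: str) -> str:
    """Build the internal item key for the average busy percentage of a process type."""
    return f"zabbix[process,{process_type},avg,busy]"


# Process types picked for illustration; "data sender" applies to the proxy.
for process_type in ("data sender", "poller", "unreachable poller",
                     "history syncer", "housekeeper"):
    print(busy_key(process_type))
```

Graphing these values over the window when the "unreachable" alerts fire shows which process type saturates first, which is the context the tuning values need.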
