first network error, wait for 15 seconds



    Hi everybody,

    I know there are other related topics, but I think my case is a bit different, so I am creating a new post...

    I have Zabbix 2.2.4 installed. My NVPS is about 24, so it's not too high. CPU and pollers look OK too, I assume...

    The issue affects several hosts; I'll explain it using one particular host:

    The host has more than 300 Items (about 330). I'm pinging the host every 4 seconds and I'm reading its "nameStation" every 10 seconds. Other Items are read every 300 seconds.

    The problem begins when Zabbix starts to ask the host for those 300+ items... According to the tcpdump output, there is a huge number of requests at almost the same time, but SNMP responses arrive only from time to time. After a while, this error occurs and Zabbix does not read any new values, even though the host is still sending (late) SNMP responses...

    After a while, it resumes and the situation repeats. It works until Zabbix wants to update those 300+ Items. It happens every 5 minutes...
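To put numbers on the load pattern described above, here is a minimal arithmetic sketch. The item counts and intervals come from the post; splitting "about 330" items into 1 ping item, 1 nameStation item and 328 others is an assumption made for illustration. It shows that the average rate for this host is small, while the number of requests that can come due at the same moment is large.

```python
# Item groups for the example host: (name, item count, update interval in seconds).
# The 1 + 1 + 328 split of "about 330" items is an assumption for illustration.
items = [
    ("ping", 1, 4),
    ("nameStation", 1, 10),
    ("other items", 328, 300),
]


def average_rate(groups):
    """Average values per second if checks were spread evenly over time."""
    return sum(count / interval for _, count, interval in groups)


def largest_burst(groups):
    """Largest number of items sharing one interval, which can come due together."""
    return max(count for _, count, _ in groups)


avg = average_rate(items)
burst = largest_burst(items)
print(f"average load for this host: {avg:.2f} values/s")
print(f"largest group that can be requested at once: {burst} items")
```

The gap between the two figures is the point: a device that copes with one or two requests per second may still drop packets when hundreds of requests arrive within the same second.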

    1823:20140904:123638.996 SNMP agent item "rTxFrequency" on host "X" failed: first network error, wait for 15 seconds
    1893:20140904:123653.913 resuming SNMP agent checks on host "X": connection restored
    1824:20140904:124138.988 SNMP agent item "ifType[1]" on host "X" failed: first network error, wait for 15 seconds
    1888:20140904:124154.005 resuming SNMP agent checks on host "X": connection restored
    1857:20140904:124638.970 SNMP agent item "stTermServNumber" on host "X" failed: first network error, wait for 15 seconds
    1884:20140904:124654.137 resuming SNMP agent checks on host "X": connection restored
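The log excerpt above can be summarized with a small script. This is a sketch that assumes the line layout seen in the excerpt (PID:YYYYMMDD:HHMMSS.mmm followed by the message); the function and variable names are made up for illustration, and other Zabbix versions may log differently. It counts "first network error" events per host and measures how long each host stayed paused before checks resumed.

```python
import re
from collections import Counter
from datetime import datetime

# Line layout assumed from the excerpt: PID:YYYYMMDD:HHMMSS.mmm message
LINE_RE = re.compile(r'^\s*(?P<pid>\d+):(?P<date>\d{8}):(?P<time>\d{6}\.\d{3})\s+(?P<msg>.*)$')
FAILED_RE = re.compile(r'on host "(?P<host>[^"]+)" failed: first network error')
RESUMED_RE = re.compile(r'resuming SNMP agent checks on host "(?P<host>[^"]+)"')


def parse_ts(date, time):
    """Turn the date and time fields of a log line into a datetime."""
    return datetime.strptime(date + time, "%Y%m%d%H%M%S.%f")


def summarize(lines):
    """Return (failures per host, list of (host, seconds until checks resumed))."""
    failures = Counter()
    outages = []
    pending = {}  # host -> timestamp of the failure not yet matched by a resume
    for line in lines:
        m = LINE_RE.match(line)
        if not m:
            continue
        ts = parse_ts(m["date"], m["time"])
        failed = FAILED_RE.search(m["msg"])
        if failed:
            failures[failed["host"]] += 1
            pending[failed["host"]] = ts
            continue
        resumed = RESUMED_RE.search(m["msg"])
        if resumed and resumed["host"] in pending:
            host = resumed["host"]
            outages.append((host, (ts - pending.pop(host)).total_seconds()))
    return failures, outages


SAMPLE = '''
1823:20140904:123638.996 SNMP agent item "rTxFrequency" on host "X" failed: first network error, wait for 15 seconds
1893:20140904:123653.913 resuming SNMP agent checks on host "X": connection restored
1824:20140904:124138.988 SNMP agent item "ifType[1]" on host "X" failed: first network error, wait for 15 seconds
1888:20140904:124154.005 resuming SNMP agent checks on host "X": connection restored
'''.splitlines()

failures, outages = summarize(SAMPLE)
for host, seconds in outages:
    print(f"{host}: checks resumed after {seconds:.3f} s")
```

Run against the full server log (for example by passing the lines of an open file to summarize), the same function gives a per-host failure count, which makes it easier to see whether the errors cluster on a few devices or at fixed times.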

    Can I somehow configure Zabbix not to want all the values "at once"?

    If you need more detailed configuration, let me know...

    (I think maybe changing the interval to various values could solve this? Instead of 300 seconds, configure them with 290sec., 291sec, 292sec, ...)

    Or in the new Zabbix version (2.4) there is a note in the release notes (for the beta version) that the snmp-bulk-request will be configurable... So I could say "ask just for 5 values in one bulk request"?

    #2
    Update:

    I tried changing the intervals (280, 285, 290, ... sec.), but it was even worse. That is, the 300 items were divided into groups of 20 and every group had a different update interval. The errors occurred more frequently.

    Of course, when I disabled those 300 items and I was reading only 3-4 values, it was OK... But that's not my goal.

    This behaviour can be seen on more Hosts...



      #3
      1. try increasing timeout:
      # grep Timeout /usr/local/etc/zabbix_server.conf
      Timeout=30
      2. try monitoring your queue:
      Administration -> Queue



        #4
        Hi... Thanks...

        But I already have the timeout set to 30 seconds...

        Also see the attached Queue graphs... They seem to me OK... But still, the last day I had 1072 "resuming SNMP agent checks on host" messages from various hosts (about 25 different hosts).

        Plus 68 times the "query failed: [2006] MySQL server has gone away" message...

        I'm confused a lot...
        Last edited by Crypty; 19-02-2018, 16:29.



          #5
          I've just upgraded to 2.4.0 and I'll see what happens... But it seems nothing changes so far...



            #6
            Some other information about my installation:

            # mysql -V
            mysql Ver 14.14 Distrib 5.5.38, for debian-linux-gnu (x86_64) using readline 6.2

            ---

            zabbix_server.conf:

            LogFile=/var/log/zabbix/zabbix_server.log
            DebugLevel=3
            DBName=*****
            DBUser=*****
            DBPassword=*****
            StartPollers=80
            StartIPMIPollers=10
            StartPollersUnreachable=20
            StartTrappers=10
            StartPingers=12
            SNMPTrapperFile=/var/log/zabbix/zabbix_trapper.log
            StartSNMPTrapper=1
            HousekeepingFrequency=1
            MaxHousekeeperDelete=20000
            StartDBSyncers=12
            ValueCacheSize=32M
            Timeout=30
            UnreachablePeriod=120
            ExternalScripts=/usr/local/share/zabbix/externalscripts
            LogSlowQueries=3000



              #7
              Hi Crypty,

              I will give you some very brief comments and recommendations. They will help you to tune your system to more optimal settings but it is not guaranteed that all the errors will disappear.

              FYI, before every tuning request here on the forum it is highly recommended to state the size of your Zabbix server (host count) and attach all graphs from Template App Zabbix server, or at least these three:
              • Zabbix internal process busy %
              • Zabbix data gathering process busy %
              • Zabbix cache usage

              Without them it is impossible to correctly tune your system.

              Anyways, let's tune the server config a bit.
              StartPollers=80 #for 24 NVPS, really? Check the graphs, I think it is too much but you decide. Keep all the pollers at ~30% busy
              StartSNMPTrapper=1 #could be too few if you are having SNMP issues
              StartDBSyncers=12 #return to default 4 ASAP. Too high value here causes lots of problems. Each DB syncer is capable of doing 1000 NVPS on a well tuned DB server. You have 24 NVPS so keep to defaults here
              ValueCacheSize=32M #Increase to 64M
              # All other caches = set to 32M because I suspect you have the default 8M everywhere
              Regarding the 68 "query failed: [2006] MySQL server has gone away" messages: check wait_timeout in the MySQL config file (my.cnf). It should be 28800 seconds by default, which should be OK for your instance.

              And last but not least - first network error, wait for 15 seconds - you will tell me everything is fine but still check for network errors.

              Restart your zabbix_server after making these changes and monitor it for a day or so.

              Best Regards,
              Ingus



                #8
                Here are the graphs (7 days):
                Last edited by Crypty; 19-02-2018, 16:29.



                  #9
                  I have nothing to add to my previous comments after reading those blurry jpeg images. (please, before posting, check that this stuff is readable).

                  Edit the configs to what I told you and check what happens.

                  Best Regards,
                  Ingus



                    #10
                    Okay, I tried this:

                    1)
                    StartPollers=20

                    2) StartSNMPTrapper is not the number of processes... It only enables/disables the SNMP trapper:

                    ### Option: StartSNMPTrapper
                    # If 1, SNMP trapper process is started.
                    #
                    # Mandatory: no
                    # Range: 0-1
                    # Default:
                    StartSNMPTrapper=1


                    3)
                    StartDBSyncers=4

                    4)
                    I could not configure the cache to 64M

                    4012:20140915:151251.820 using configuration file: /usr/local/etc/zabbix_server.conf
                    4012:20140915:151251.820 cannot allocate shared memory of size 67108864: [22] Invalid argument
                    4012:20140915:151251.820 cannot allocate shared memory for value cache size


                    32M was okay... I configured all other caches to 32M too...
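One possible explanation for the "cannot allocate shared memory of size 67108864: [22] Invalid argument" error, offered as an assumption since the thread does not confirm it: on Linux, a shared memory request larger than the kernel limit kernel.shmmax is rejected with "Invalid argument", and many older kernels defaulted that limit to 33554432 bytes (32 MiB). That would fit 32M working while 64M fails. The sketch below only reads the limit and compares it with the requested size; the helper names are made up for illustration.

```python
from pathlib import Path

# Linux-specific location of the kernel.shmmax limit (bytes per shared memory segment).
SHMMAX_PATH = Path("/proc/sys/kernel/shmmax")


def fits_in_shmmax(requested_bytes, shmmax_bytes):
    """True if a single shared memory segment of this size is within the limit."""
    return requested_bytes <= shmmax_bytes


def read_shmmax(path=SHMMAX_PATH):
    """Read the current kernel.shmmax value in bytes."""
    return int(path.read_text().strip())


requested = 64 * 1024 * 1024  # 67108864, the size from the error message
if SHMMAX_PATH.exists():
    limit = read_shmmax()
    verdict = "fits" if fits_in_shmmax(requested, limit) else "exceeds the limit"
    print(f"kernel.shmmax = {limit} bytes; a {requested}-byte segment {verdict}")
else:
    print("kernel.shmmax is not exposed at this path on this system")
```

If the limit does turn out to be the cause, it can be raised with sysctl (kernel.shmmax) before setting ValueCacheSize=64M again; the exact value to choose depends on the sum of all configured caches and is not something this thread settles.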



                      #11
                      Ok, great!

                      Check the RAM usage of your zabbix server.



                        #12
                        Memory seems OK to me...

                        AVG memory available - 800 MB



                          #13
                          800MB.. ok.

                          Typo or other misconfiguration then.

                          Anyways see how it runs with 32M now.

                          Best Regards,
                          Ingus



                            #14
                            Btw I haven't found any parameter called "wait_timeout" in the my.cnf configuration file. May I consider it to be the needed 28800?
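On the question above: when an option is absent from my.cnf, MySQL uses its built-in default, which for wait_timeout is 28800 seconds, as mentioned earlier in the thread. The sketch below illustrates that fallback by scanning config text for the option; the function names and the sample text are hypothetical, and the parser is deliberately simple (it does not follow !include or !includedir directives). The authoritative value is what the running server reports, for example via SHOW VARIABLES LIKE 'wait_timeout'.

```python
DEFAULT_WAIT_TIMEOUT = 28800  # MySQL's built-in default, in seconds


def find_option(cnf_text, name, section="mysqld"):
    """Return the last value set for `name` in [section], or None if absent."""
    current = None
    value = None
    for raw in cnf_text.splitlines():
        line = raw.strip()
        # Skip blanks, comments, and !include / !includedir directives.
        if not line or line.startswith(("#", ";", "!")):
            continue
        if line.startswith("[") and line.endswith("]"):
            current = line[1:-1].strip()
            continue
        if current != section or "=" not in line:
            continue
        key, _, val = line.partition("=")
        # MySQL treats dashes and underscores in option names as equivalent.
        if key.strip().replace("-", "_") == name:
            value = val.strip()
    return value


def effective_wait_timeout(cnf_text):
    """Configured wait_timeout, or the built-in default when it is not set."""
    value = find_option(cnf_text, "wait_timeout")
    return int(value) if value is not None else DEFAULT_WAIT_TIMEOUT


# Hypothetical config text without a wait_timeout line.
sample_cnf = """
[mysqld]
max_connections = 200
"""
print(effective_wait_timeout(sample_cnf))
```

With the default in effect, a connection would have to sit idle for eight hours before the server closes it, so an idle timeout alone would not obviously account for errors that appear minutes apart.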



                              #15
                              And I also have some messages like the following ones:

                              7368:20140916:073921.032 [Z3005] query failed: [2006] MySQL server has gone away [select hostid,status from hosts where host='dmz-zabbix' and status in (0,1) and flags<>2 and proxy_hostid is null]

                              7360:20140916:074321.062 [Z3005] query failed: [2006] MySQL server has gone away [select hostid,status from hosts where host='dmz-zabbix' and status in (0,1) and flags<>2 and proxy_hostid is null]


                              What can be the reason? I have no idea... :/
