Zabbix_agent unreachable

  • porkcharsui
    Junior Member
    • May 2012
    • 15

    #1

    Hello,

    I keep having trouble with the Zabbix agent (or the server, I suspect). On my Zabbix server (actually on every host) I regularly get the message that the agent is unreachable, but when I run zabbix_get from the server command line, the agents are perfectly reachable. It can't be the firewall, because they are always reachable from the command line and sometimes from the frontend. I don't think it is the agent either, because it is still running when the Zabbix frontend says it's unreachable (and zabbix_get works). The agent log shows nothing except the last agent restart; no errors at all.

    The only thing I can see in the server log:
    On different items I get "[<item>] failed: first network error, wait for 15 seconds", and usually no second error follows. The other message I see is "resuming Zabbix agent checks on host [<host>]: connection restored".
    How can it lose the connection? It's connecting to 127.0.0.1.
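A single zabbix_get only shows that the agent answered once. To catch intermittent failures like these, one option is to probe the agent repeatedly and log every failure or slow reply. The Python sketch below is a rough stand-in for zabbix_get, assuming the Zabbix 2.0 passive-check framing (a "ZBXD\x01" header, an 8-byte little-endian length, then the item key followed by a newline); the function names are hypothetical. The 3-second timeout mirrors the server's default Timeout setting, since a reply slower than that is likely what the server ends up logging as a network error.

```python
import socket
import struct
import time
from typing import List, Optional, Tuple

HEADER = b"ZBXD\x01"  # Zabbix protocol signature plus version byte (assumed framing)


def frame(payload: bytes) -> bytes:
    """Prefix payload with the Zabbix header and an 8-byte little-endian length."""
    return HEADER + struct.pack("<Q", len(payload)) + payload


def unframe(raw: bytes) -> bytes:
    """Strip the Zabbix header if present; plain replies are returned unchanged."""
    if raw.startswith(HEADER) and len(raw) >= 13:
        (length,) = struct.unpack("<Q", raw[5:13])
        return raw[13:13 + length]
    return raw


def query(host: str, key: str, port: int = 10050, timeout: float = 3.0) -> bytes:
    """Send one passive-check request, roughly as zabbix_get does (assumption)."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(frame((key + "\n").encode("utf-8")))
        chunks = []
        while True:
            chunk = sock.recv(4096)
            if not chunk:  # the agent closes the connection after replying
                break
            chunks.append(chunk)
    return unframe(b"".join(chunks))


def probe(host: str, key: str = "agent.ping", count: int = 100,
          interval: float = 1.0,
          timeout: float = 3.0) -> List[Tuple[float, Optional[str]]]:
    """Repeat the query, recording (elapsed seconds, error text or None) per attempt."""
    results: List[Tuple[float, Optional[str]]] = []
    for _ in range(count):
        start = time.monotonic()
        try:
            query(host, key, timeout=timeout)
            error = None
        except OSError as exc:  # covers refused connections and socket timeouts
            error = repr(exc)
        results.append((time.monotonic() - start, error))
        time.sleep(interval)
    return results
```

For example, `probe("127.0.0.1", count=300)` covers roughly five minutes; entries with an error, or with an elapsed time close to the timeout, would suggest the agent (or something in between) is answering too slowly rather than being truly unreachable.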

    This is leaving large holes in the data Zabbix collects and makes it fairly worthless, so can someone please tell me what could be wrong here?

    Almost forgot to mention: I'm running Zabbix 2.0.0 server and agent on Debian 6. The other hosts are mainly Ubuntu 10.04 with the 2.0.0 agent.

    Thanks in advance.
    Last edited by porkcharsui; 14-06-2012, 17:54.
  • rocksteady
    Junior Member
    • Apr 2010
    • 26

    #2
    I had exactly the same problem.

    The Zabbix host is Ubuntu 12.04 with 5 GB of RAM, 4 processors, RAID 10 disks and 3 dedicated NICs in LACP.

    I also tried it with a single NIC, but got the same errors.

    The DB is hosted on PostgreSQL 9.0 with daily partitioning, running Debian 6 on a 4-core RAID 60 server.

    I got the same problem on the remote host too (CentOS 5.8, 8 processors, 16 GB of RAM and another RAID 60 array).

    Everything apparently runs smoothly, graphs and trends are much faster, and I have no visible holes in the collected data.

    It all happens within a couple of ms: "Zabbix agent item [proc.num[]] on host [xxx] failed: first network error, wait for 15 seconds" is immediately followed by "resuming Zabbix agent checks on host [midcbld02]: connection restored".

    The same hardware previously ran Zabbix 1.8.7 with no errors.
    A clean database was set up and the servers were added through discovery.


    Ideas???


    Cheers Marco


    • Davidus
      Senior Member
      • Dec 2010
      • 281

      #3
      What is the update interval for your item?


      • porkcharsui
        Junior Member
        • May 2012
        • 15

        #4
        Depends on which item!? The agent.ping item is set to 60 seconds.


        • porkcharsui
          Junior Member
          • May 2012
          • 15

          #5
          Anyone... anything? It's still happening! How can this happen? The firewalls are open, the agents are running, I can reach them with the zabbix_get command, but the frontend says they're unreachable... WHY!?!?


          • f14.maverick
            Junior Member
            • Jun 2012
            • 3

            #6
            I solved the problem by making the trigger's nodata period larger than the item's update interval:
            * trigger {<host>:agent.ping.nodata(2m)}=1
            * item Update interval (in sec) = 60
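Whether a nodata() trigger fires depends on how many consecutive polls can fail inside its window. The Python sketch below models this under simplifying assumptions: polls happen exactly every `interval` seconds, the trigger is evaluated continuously, and it fires once `period` seconds pass without a new value. `alarm_time` is a hypothetical helper, not part of Zabbix.

```python
from typing import List, Optional


def alarm_time(poll_ok: List[bool], interval: float,
               period: float) -> Optional[float]:
    """Earliest time a nodata(period) trigger would fire, or None.

    Poll k happens at k * interval; poll_ok[k] says whether it delivered a value.
    Assumes the first poll succeeded and that the trigger fires as soon as
    `period` seconds pass with no new value (a simplified model).
    """
    if not poll_ok or not poll_ok[0]:
        raise ValueError("the first poll is assumed to succeed")
    received = [k * interval for k, ok in enumerate(poll_ok) if ok]
    # A gap between two received values longer than the window fires the trigger.
    for prev, nxt in zip(received, received[1:]):
        if nxt - prev > period:
            return prev + period
    # Also check the tail after the last received value, up to the last poll.
    last_poll = (len(poll_ok) - 1) * interval
    if last_poll - received[-1] > period:
        return received[-1] + period
    return None
```

Under this model, with a 60-second interval nodata(2m) tolerates at most one missed poll (and only at the boundary, where real timing jitter can tip it either way), while nodata(5m) tolerates at most four. That would fit with seeing errors sooner after shortening the window.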


            • porkcharsui
              Junior Member
              • May 2012
              • 15

              #7
              The only thing that happens when I do that is that I get the errors sooner.
              * trigger {<host>:agent.ping.nodata(2m)}=1 (this was 5m)
              * item Update interval (in sec) = 60 (this is the default setting)


              • f14.maverick
                Junior Member
                • Jun 2012
                • 3

                #8
                How busy are the ICMP pinger processes on your Zabbix server?


                • herta
                  Senior Member
                  • Sep 2011
                  • 101

                  #9
                  What do you get if you go to Administration -> Queue? Do you have many items in the "minutes" columns?
                  If so, select "Details" from the drop down menu on the right. As you cannot sort these columns, I usually select all the data and copy it to a spreadsheet. Sort it by Host and check which host has the most delays.
                  If the host itself has enough resources (memory and CPU), try increasing the number of Zabbix agent processes (StartAgents in zabbix_agentd.conf).
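The spreadsheet step can also be scripted. Assuming the Queue details are exported or pasted as (host, delay in seconds) pairs, this Python sketch counts, per host, the items delayed by at least 10 minutes and lists the worst hosts first. The row format, the CSV column names and the function names are assumptions for illustration.

```python
import csv
from collections import Counter
from typing import Iterable, List, Tuple


def worst_hosts(rows: Iterable[Tuple[str, float]],
                min_delay: float = 600) -> List[Tuple[str, int]]:
    """Count, per host, the items delayed by at least min_delay seconds; worst first."""
    counts = Counter(host for host, delay in rows if delay >= min_delay)
    return counts.most_common()


def read_rows(path: str) -> List[Tuple[str, float]]:
    """Read a CSV with hypothetical 'host' and 'delay_seconds' columns."""
    with open(path, newline="") as fh:
        return [(r["host"], float(r["delay_seconds"])) for r in csv.DictReader(fh)]
```

Lowering `min_delay` (for example to 60) widens the net to the other "minutes" columns as well.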


                  • porkcharsui
                    Junior Member
                    • May 2012
                    • 15

                    #10
                    The "minutes" columns are pretty full... especially "more than 10 minutes". I've increased the number of agents from the default 3 to 10 on most of the hosts now, and it seems to be improving. Since I'm using it to monitor workstations as well as servers, it will take a few days for the new agent config to be distributed to all the hosts before I know for certain, but I'm hopeful.


                    • f14.maverick
                      Junior Member
                      • Jun 2012
                      • 3

                      #11
                      Thanks herta, that was very useful. But I configured 15 agents on my slower host and there are still items in the queue.
                      Is there some server-side setting to improve this?


                      • herta
                        Senior Member
                        • Sep 2011
                        • 101

                        #12
                        Hints and pointers

                        Well, there's a lot that can go wrong performance-wise. Without detailed information about your set-up, and with limited Zabbix experience myself, I can only point out the more obvious sore spots, and you have probably already checked most of these:

                        - Can your hardware cope with the load?
                        Run "top" and check the load on your server and on your Zabbix clients. The very rough rule of thumb I use is that the load average should stay below twice the number of CPU cores. The CPU wait percentage should be low (if not, your I/O system cannot cope). Swap space should preferably be unused, but at the very least its usage should not change constantly. Is there a process that almost constantly shows up at the top of the list? Then investigate whether it is running optimally.
                        You could also install sysstat, enable "sar" and monitor the load on both your zabbix clients and server. (My initial server was choking because of lack of memory.)

                        - Check syslog for hw errors

                        - Check your network connections - as you seem to have issues with several clients, start with a "netstat -e" and "ifconfig" on your server and check for errors. From your zabbix server, run "ping" against one of your clients until you see a missing value in zabbix. Did you lose packets? Were there moments when the response was slow? If so, ask your network admin for help in troubleshooting the network.

                        - If you use dns to resolve hostnames, verify that it is responsive enough. (From your zabbix server, you can e.g. check with "time nslookup xxx", where xxx is the hostname of one of your zabbix clients.)

                        - Take a look at your database. Is it tuned properly? (Use your favourite web browser to find articles on how to tune your database.)

                        - Have a critical look at the number of items you are monitoring and the frequency at which you monitor them. E.g., if the filesystems on your client don't change frequently, there's no point in checking for new ones every 10 minutes.
                        (I defined a bunch of items without enabling them. They are there so that I can quickly enable them when I need to troubleshoot a problem.)

                        - And definitely take a look at http://www.slideshare.net/xsbr/alexe...formancetuning
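The load rule of thumb from the first point is easy to check in a script. This Python sketch compares the 1-, 5- and 15-minute load averages from `os.getloadavg()` (Unix only) against twice the number of CPU cores reported by `os.cpu_count()`; the factor of 2 is just the rough rule quoted above, not a Zabbix requirement, and the function names are hypothetical.

```python
import os
from typing import Dict


def load_ok(load_avg: float, cores: int, factor: float = 2.0) -> bool:
    """Rule of thumb: the load average should stay below factor * CPU cores."""
    return load_avg < factor * cores


def check_local_load(factor: float = 2.0) -> Dict[str, bool]:
    """Apply the rule to this machine's 1-, 5- and 15-minute load averages."""
    cores = os.cpu_count() or 1  # cpu_count() may return None
    one, five, fifteen = os.getloadavg()  # raises OSError if unavailable
    return {
        "1min": load_ok(one, cores, factor),
        "5min": load_ok(five, cores, factor),
        "15min": load_ok(fifteen, cores, factor),
    }
```

A sustained False in the 15-minute entry is more telling than an occasional spike in the 1-minute one.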
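For the ping test, rather than watching the terminal, the output of a long-running Linux `ping` can be saved to a file and checked afterwards for gaps and slow replies. This Python sketch assumes iputils-style reply lines such as `64 bytes from 10.0.0.5: icmp_seq=1 ttl=64 time=0.41 ms` (some older versions print `icmp_req=` instead); the 100 ms "slow" threshold is an arbitrary example, and only gaps between the first and last reply are detected.

```python
import re
from typing import Dict, List, Union

# Matches the sequence number and round-trip time of one reply line.
REPLY = re.compile(r"icmp_(?:seq|req)=(\d+) .*?time=([\d.]+) ms")


def summarize(ping_output: str,
              slow_ms: float = 100.0) -> Dict[str, Union[int, List[int]]]:
    """Return the number of replies, missing sequence numbers and slow replies."""
    rtts: Dict[int, float] = {}
    for match in REPLY.finditer(ping_output):
        rtts[int(match.group(1))] = float(match.group(2))
    if not rtts:
        return {"received": 0, "lost": [], "slow": []}
    # Any sequence number missing between the first and last reply was lost.
    expected = range(min(rtts), max(rtts) + 1)
    lost = [seq for seq in expected if seq not in rtts]
    slow = [seq for seq, rtt in sorted(rtts.items()) if rtt > slow_ms]
    return {"received": len(rtts), "lost": lost, "slow": slow}
```

Comparing the times of lost or slow sequence numbers with the gaps in the Zabbix graphs shows whether the two line up.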
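To put numbers on the point about item counts and frequencies: each enabled item polled every N seconds adds 1/N new values per second, so the polling load grows with the sum of 1/interval over all items (the frontend's Status of Zabbix report shows a comparable "new values per second" figure). The host and item counts below are hypothetical examples, not figures from this thread.

```python
from typing import Iterable, Tuple


def new_values_per_second(groups: Iterable[Tuple[int, int]]) -> float:
    """Sum count / interval over (item count, update interval in seconds) groups.

    Groups with interval 0 (items that are not polled, such as trapper items)
    are skipped.
    """
    return sum(count / interval for count, interval in groups if interval > 0)


# Hypothetical example: 100 hosts x 40 items every 60 s,
# plus 100 hosts x 5 filesystem items every 600 s.
before = new_values_per_second([(100 * 40, 60), (100 * 5, 600)])
# Doubling the interval of the 60-second items roughly halves the total.
after = new_values_per_second([(100 * 40, 120), (100 * 5, 600)])
```

Here `before` comes to about 67.5 values per second and `after` to about 34.2, which shows why relaxing the busiest intervals pays off more than trimming rarely polled items.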

                        Hope this helps.

                        If you ever discover what causes your issues, please report back so that we can all learn from your experience.

                        Kind regards,

                        Herta
