Host unreachable, but other Items are collecting data

  • ptader
    Member
    • Sep 2007
    • 52

    #1

    Host unreachable, but other Items are collecting data

    Server: 1.8.1
    Client: 1.8

    After a DNS upgrade went bad this morning, the old DNS server was put back online. Because we monitored nodes via DNS we saw a lot of "unreachable" clients. This was to be expected. But now, about 6 hours later, I still have some clients unreachable.

    What I know and did:
    • All other configured Items are being updated by Zabbix with these unreachable nodes. So networking works.
    • The only Item not being updated is "Host status" (key: status)
    • SSH and telnet to the Zabbix ports work in both directions, as do pings.
    • IP addresses are correct.
    • From the server log: ZABBIX Host [client1]: first network error, wait for 30 seconds
    • Server and client have been restarted.
    • Disabling the client for a while and re-enabling it on the server doesn't fix it.


    I'd really like to know what this Item Host status does. I've read elsewhere that it's based on having more than one check and the node is online. I would think these nodes pass that test.

    The only fix I found was to delete the client in the Zabbix server configuration and configure a new one, or to export the client XML configuration, delete the client from the server configuration, and then import the XML. I have about 30 clients in this unreachable state, some with custom Triggers/Items and Template membership. Sorry to report, but importing doesn't create an exact copy of the client configuration.

    Thanks for any suggestions,
    ptader
  • qix
    Senior Member
    Zabbix Certified Specialist, Zabbix Certified Professional
    • Oct 2006
    • 423

    #2
    Just an idea, did you switch agents from dns to ip to see if the status would change after that?
    With kind regards,

    Raymond

    • ptader
      Member
      • Sep 2007
      • 52

      #3
      Unfortunately no change when the client is configured with an IP address.

      Not a fix, but a work-around was to create a "Full Clone" of each troubled node under another name, delete the original, and then rename the clone back to the original client name.

      Additional information: This happened on two separate Zabbix servers on the same network.

      • qix
        Senior Member
        Zabbix Certified Specialist, Zabbix Certified Professional
        • Oct 2006
        • 423

        #4
        That's not very encouraging...
        With kind regards,

        Raymond

        • jclariana
          Junior Member
          • Nov 2009
          • 2

          #5
          Same problem here:

          Zabbix_server 1.8.3
          Zabbix_proxy 1.8.3
          Zabbix_agent 1.8.3

          Regards.

          • qix
            Senior Member
            Zabbix Certified Specialist, Zabbix Certified Professional
            • Oct 2006
            • 423

            #6
            Perhaps you guys should file a bug for the dev team.
            With kind regards,

            Raymond

            • jclariana
              Junior Member
              • Nov 2009
              • 2

              #7
              qix, it already exists:


              I've just posted a new comment, but it doesn't look like it will be fixed any time soon. The issue was created on 03/Jul/09 against version 1.6.5, and the problem is still there...

              Regards.

              • Surge
                Junior Member
                • Sep 2010
                • 16

                #8
                I had the same problem in 1.8.3 and ended up disabling the host status key for all hosts.
                Host status is particularly problematic over unstable network connections.

                Instead I use icmpping and define a trigger of icmpping.max(240)=0 so that if no ICMP reply packets are received for 4 minutes the host is deemed as being down.
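For anyone trying to picture what that trigger checks, here is a minimal Python sketch of the idea behind icmpping.max(240)=0. It is only a model of the logic described above, not Zabbix code: the function name host_down and the (timestamp, value) sample format are made up for illustration, and it assumes icmpping stores 1 for a reply and 0 for no reply.

```python
from typing import Sequence, Tuple

# (unix timestamp, icmpping value); assumed encoding: 1 = reply, 0 = no reply
Sample = Tuple[float, int]


def host_down(samples: Sequence[Sample], now: float, window: int = 240) -> bool:
    """Model of icmpping.max(240)=0: true when the largest value in the window is 0."""
    recent = [value for ts, value in samples if now - window <= ts <= now]
    if not recent:
        # No samples at all in the window: this sketch refuses to decide.
        raise ValueError("no values in window")
    return max(recent) == 0


if __name__ == "__main__":
    # One reply inside the last 4 minutes is enough to keep the host "up".
    print(host_down([(200, 1), (300, 0)], now=400))
    # Only failed pings inside the window: host is considered down.
    print(host_down([(0, 1), (300, 0), (360, 0)], now=400))
```

Because max() is used, a single successful ping anywhere in the 240 seconds keeps the trigger quiet, which is what makes it tolerant of short network blips.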

                • qix
                  Senior Member
                  Zabbix Certified Specialist, Zabbix Certified Professional
                  • Oct 2006
                  • 423

                  #9
                  Hello Surge,

                  You might run into a problem when the agent dies.
                  You won't notice the lack of incoming values until you need them.

                  I suggest also setting a trigger with nodata() on an item that periodically returns values (e.g. uptime) to make sure you don't miss this event when it occurs.
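As a rough illustration of the nodata() idea, the Python sketch below models the check as "1 if nothing arrived in the last N seconds, otherwise 0". The function name, its arguments, and the exact boundary handling are assumptions made for this sketch, not taken from Zabbix itself.

```python
from typing import Optional


def nodata(last_value_ts: Optional[float], now: float, period: int) -> int:
    """Return 1 if no value arrived within the last `period` seconds, else 0."""
    if last_value_ts is None:
        # The item has never returned a value.
        return 1
    return 1 if now - last_value_ts > period else 0


if __name__ == "__main__":
    # Agent still reporting: last uptime value is 100 seconds old.
    print(nodata(last_value_ts=300, now=400, period=240))
    # Agent died: last uptime value is 300 seconds old.
    print(nodata(last_value_ts=100, now=400, period=240))
```

The period should be comfortably longer than the item's update interval, otherwise the check fires between two normal updates.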
                  With kind regards,

                  Raymond

                  • qix
                    Senior Member
                    Zabbix Certified Specialist, Zabbix Certified Professional
                    • Oct 2006
                    • 423

                    #10
                    Come to think of it, I think you can work around network lag using .min(xx)=2 instead of .last(0)=2 in the 'Host unreachable trigger'.
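The difference between the two comparisons can be sketched in Python as follows. This is a simplified model with made-up helper names, and it assumes 2 is the largest value the status item reports, so that a minimum of 2 means every value in the window was 2.

```python
from typing import Sequence


def last_is(values: Sequence[int], target: int = 2) -> bool:
    """Model of .last(0)=2: only the most recent value matters."""
    return bool(values) and values[-1] == target


def min_is(values: Sequence[int], target: int = 2) -> bool:
    """Model of .min(xx)=2: the smallest value in the window must equal the target."""
    return bool(values) and min(values) == target


if __name__ == "__main__":
    blip = [0, 0, 2]       # one bad reading after two good ones
    sustained = [2, 2, 2]  # unreachable for the whole window

    print(last_is(blip), min_is(blip))            # last fires, min stays quiet
    print(last_is(sustained), min_is(sustained))  # both fire
```

With min(), the trigger only fires once the host has looked unreachable for the whole window, at the cost of reacting later.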
                    With kind regards,

                    Raymond

                    • hoper
                      Junior Member
                      • Mar 2010
                      • 3

                      #11
                      Same here. Is there any howto available?

                      Hi,

                      I have the same issue here, and our network is quite unstable. So, at least once per week, we "lose" the network.

                      I need a clean way to know if my servers are up or down (or at least if the network is up or down!). Using host status seemed to be a good idea, but with this bug

                      I'm a new Zabbix user and am just beginning to learn how to use it. Please, can someone help me make a working trigger to know whether my servers are up or not?

                      "I suggest to also set a trigger with nodata() on a item that periodically returns values (e.g. uptime) to make sure you don't miss this event when it occurs."
                      This seems to be a good idea, using a value like CPU user time or something else that is updated often. (I need to know quickly when the network is down and when it comes up again.) But can you explain it in more detail? How exactly should I write the trigger?

                      Please note that my servers are in a DMZ, and that I can't send pings or use anything other than TCP to the Zabbix agent port.

                      Thanks a lot.
                      Last edited by hoper; 02-03-2011, 11:11.

                      • hoper
                        Junior Member
                        • Mar 2010
                        • 3

                        #12
                        I made this trigger (thanks, Google):

                        {my_template:agent.ping.nodata(20)}

                        And it seems to work.
                        Any bugs? Anything I need to know? Is it good like this, or do I need to add something? Thanks again.

                        • Surge
                          Junior Member
                          • Sep 2010
                          • 16

                          #13
                          How does one set up a single trigger to handle host unreachable and the possibility of nodata?

                          I tried creating a host unreachable trigger with the following syntax:
                          {Template_test:icmpping.max(240)}=0 | {Template_test:icmpping.nodata(240)}=1

                          However, when values cannot be retrieved, Zabbix flags the trigger as having an error (Evaluation failed for function: max).
                          How does Zabbix evaluate the expression? Left to right?
                          I don't want to have to create two "host unreachable" triggers.
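One way to picture the error, offered as a guess and not as a description of how Zabbix is implemented: if every function in the expression has to produce a value before the | operator is applied, then a max() over an empty window fails the whole expression, and the order of the two operands makes no difference. The Python sketch below models only that assumption. EvaluationFailed, fn_max, fn_nodata and evaluate are made-up names, and `values` stands for whatever icmpping values fall inside the 240-second window.

```python
from typing import Sequence


class EvaluationFailed(Exception):
    """Raised when a function has no values to work with (hypothetical)."""


def fn_max(values: Sequence[int]) -> int:
    if not values:
        raise EvaluationFailed("Evaluation failed for function: max")
    return max(values)


def fn_nodata(values: Sequence[int]) -> int:
    # 1 when the window is empty, 0 otherwise.
    return 1 if not values else 0


def evaluate(values: Sequence[int]) -> bool:
    """Model of {..max(240)}=0 | {..nodata(240)}=1 with both functions computed first."""
    # Assumption: both function results are needed before the OR is applied,
    # so there is no short-circuit that could skip the failing max().
    max_result = fn_max(values)
    nodata_result = fn_nodata(values)
    return max_result == 0 or nodata_result == 1


if __name__ == "__main__":
    print(evaluate([0, 0, 0]))  # all pings failed
    print(evaluate([1, 0, 1]))  # some pings answered
    try:
        evaluate([])            # nothing collected in the window
    except EvaluationFailed as exc:
        print("error:", exc)
```

If this reading is right, the nodata() half of the expression never gets a chance to fire in exactly the case it was added for.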
