zabbix v1.1.4 - stops monitoring client

  • dcrandall
    Member
    • Apr 2006
    • 59

    #1

    zabbix v1.1.4 - stops monitoring client

    Hi,

    Zabbix_server v1.1.4

    I'm having this problem where if I reboot a client machine, when it comes back, zabbix_server does not start monitoring it again.

    I can kick start the process if I do this:
    mysql -u root -D zabbix -e 'update hosts set errors_from=0'
    Then it starts monitoring again.

    Can anybody shed some light on what is happening? It's a pain because machines sometimes need to be restarted, and people have to call me every time to get Zabbix monitoring working again.

    Thanks,
    Daniel
  • Alexei
    Founder, CEO
    Zabbix Certified Trainer
    Zabbix Certified Specialist
    Zabbix Certified Professional
    • Sep 2004
    • 5654

    #2
    ZABBIX does not resume monitoring of an unreachable host immediately; by default it waits 60 (or 120?) seconds.
    Alexei Vladishev
    Creator of Zabbix, Product manager
    New York | Tokyo | Riga
    My Twitter


    • DiedX
      Senior Member
      • Oct 2004
      • 106

      #3
      60. Adjustable BTW.
      https://www.diederik.nl


      • dcrandall
        Member
        • Apr 2006
        • 59

        #4
        I am aware of the unreachable time limit and have verified that it is 60 seconds on my server.
        The thing is that my clients go unreachable and then come back, and zabbix_server never updates. They just sit there with no data until I run the command:
        mysql -u root -D zabbix -e 'update hosts set errors_from=0'

        It's pretty annoying because sometimes, under heavy load, a host will lag in responding, so Zabbix marks it as unreachable and never changes it back when the host becomes more responsive.

        I never had this problem until upgrading to v1.1.4.
        The server and hosts are all FreeBSD 5.x or 6.x.
        As I have described in other posts regarding zabbix_agentd, it seems that support for FreeBSD has taken a hit in zabbix 1.1.x stable.

        Thanks for your help,
        Daniel.


        • pdwalker
          Senior Member
          • Dec 2005
          • 166

          #5
          What do your log files say?

          Can you turn on debug logging and learn anything new about why the server is not connecting to the client any longer?

          With my installation, the server has never failed to resume monitoring a client after it has been "away" and marked as unavailable.

          - Paul


          • dcrandall
            Member
            • Apr 2006
            • 59

            #6
            Hi,

            Thanks for your reply.

            For instance, I have a host that is responsive, returning values from zabbix_agentd -p, and also from the server side with zabbix_get.

            In the log I have:
            075050:20061231:032951 Host [contb004]: first network error, wait for 15 seconds

            That message is obviously from 2 days ago and the host is still showing [no data] in the graphs.
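
To put a number on how stale such an entry is, the timestamp can be read straight from the log line. The following is only a sketch: it assumes the prefix is laid out as `<pid>:<YYYYMMDD>:<HHMMSS>`, which is inferred from the excerpt above rather than confirmed, and the "current" time is a hypothetical value chosen for illustration.

```python
from datetime import datetime


def parse_log_timestamp(line):
    """Return the datetime encoded in a '<pid>:<YYYYMMDD>:<HHMMSS> ...' line.

    The field layout is an assumption inferred from the log excerpt.
    """
    prefix = line.strip().split(" ", 1)[0]
    _pid, date_part, time_part = prefix.split(":")
    return datetime.strptime(date_part + time_part, "%Y%m%d%H%M%S")


LINE = "075050:20061231:032951 Host [contb004]: first network error, wait for 15 seconds"
logged_at = parse_log_timestamp(LINE)

# Hypothetical "current" time, used only to illustrate the calculation.
now = datetime(2007, 1, 2, 3, 29, 51)
age = now - logged_at

print(f"last entry logged at {logged_at}, {age.days} day(s) ago")
```

An entry that is far older than the server's unreachable delay, with nothing logged for the host afterwards, is consistent with the server having stopped polling that host.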

            If I run the mysql command in my previous post, it will come back.

            I really believe that the problems I'm having are specific to FreeBSD 5.x and 6.x with Zabbix 1.1 through 1.1.4.

            Daniel


            • dcrandall
              Member
              • Apr 2006
              • 59

              #7
              This is getting serious

              Hi,

              Last night a client host dropped out the same way I described above. This is what was left in the zabbix_server.log...

              075054:20070104:031059 Host [contb002]: first network error, wait for 15 seconds
              075049:20070104:031102 Host [contb002]: first network error, wait for 15 seconds
              075050:20070104:031102 Timeout while receiving data from [contb002]
              075050:20070104:031102 Getting value of [betacontent.photoshow.com] from host [contb002] failed
              075051:20070104:031102 Host [contb002]: first network error, wait for 15 seconds
              075052:20070104:031111 Timeout while receiving data from [contb002]
              075052:20070104:031111 Getting value of [vm.memory.size[free]] from host [contb002] failed
              075053:20070104:031111 Timeout while receiving data from [contb002]
              075053:20070104:031111 Getting value of [vm.memory.size[shared]] from host [contb002] failed
              075050:20070104:031132 Timeout while receiving data from [contb002]
              075050:20070104:031132 Getting value of [system.cpu.load[,avg1]] from host [contb002] failed

              From that moment on the server reported [no data] from that host.
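
For anyone comparing logs across several affected hosts, a rough per-host tally of these messages may help. This is a minimal sketch, not a Zabbix tool: the regular expressions are written only against the three message shapes visible in the excerpt above, and the function name `summarize` is made up for this example.

```python
import re
from collections import defaultdict

# Patterns match only the message shapes seen in the excerpt (an assumption).
LINE_RE = re.compile(r"^\d+:\d{8}:\d{6} (?P<msg>.*)$")
NET_ERR_RE = re.compile(r"^Host \[(?P<host>[^\]]+)\]: first network error")
TIMEOUT_RE = re.compile(r"^Timeout while receiving data from \[(?P<host>[^\]]+)\]$")
FAILED_RE = re.compile(
    r"^Getting value of \[(?P<key>.+)\] from host \[(?P<host>[^\]]+)\] failed$"
)

SAMPLE = """
075054:20070104:031059 Host [contb002]: first network error, wait for 15 seconds
075049:20070104:031102 Host [contb002]: first network error, wait for 15 seconds
075050:20070104:031102 Timeout while receiving data from [contb002]
075050:20070104:031102 Getting value of [betacontent.photoshow.com] from host [contb002] failed
075051:20070104:031102 Host [contb002]: first network error, wait for 15 seconds
075052:20070104:031111 Timeout while receiving data from [contb002]
075052:20070104:031111 Getting value of [vm.memory.size[free]] from host [contb002] failed
075053:20070104:031111 Timeout while receiving data from [contb002]
075053:20070104:031111 Getting value of [vm.memory.size[shared]] from host [contb002] failed
075050:20070104:031132 Timeout while receiving data from [contb002]
075050:20070104:031132 Getting value of [system.cpu.load[,avg1]] from host [contb002] failed
"""


def summarize(lines):
    """Count network errors and timeouts and collect failed item keys per host."""
    summary = defaultdict(
        lambda: {"network_errors": 0, "timeouts": 0, "failed_items": []}
    )
    for raw in lines:
        line_match = LINE_RE.match(raw.strip())
        if line_match is None:
            continue  # blank line or a shape this sketch does not know
        msg = line_match.group("msg")

        hit = NET_ERR_RE.match(msg)
        if hit:
            summary[hit.group("host")]["network_errors"] += 1
            continue
        hit = TIMEOUT_RE.match(msg)
        if hit:
            summary[hit.group("host")]["timeouts"] += 1
            continue
        hit = FAILED_RE.match(msg)
        if hit:
            summary[hit.group("host")]["failed_items"].append(hit.group("key"))
    return dict(summary)


for host, stats in summarize(SAMPLE.splitlines()).items():
    print(host, stats)
```

In this excerpt "first network error" is logged three times by three different server processes (PIDs 075054, 075049 and 075051), which may be worth mentioning when reporting the problem.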

              Unfortunately, later on, that host crashed.

              Because it was in the [no data] state, we didn't receive any alerts. Nobody was paged, and the CEO of the company called first thing this morning to tell us that our service was not working.
              I don't have to tell you that this is not good.

              Now I need to find some resolution to this problem, or it's quite likely that my management will lose confidence in Zabbix.
              This is not the only host with this problem. It happens to all of my FreeBSD machines from time to time, and I have to massage the database myself to get the server to start monitoring them again.

              The server is Zabbix v1.1.4 running on FreeBSD 5.4, and the clients are all either FreeBSD 5.x or 6.x. As I've mentioned before, this is one of a few major problems with FreeBSD support in Zabbix 1.1.x.

              Thanks

              Daniel


              • Mintairov Mihail
                Junior Member
                • Nov 2006
                • 2

                #8
                I have the same problem with Zabbix. I am monitoring wifi routers using SNMP items, and sometimes Zabbix reports that some items are in the [no data] state and the ERROR_FROM value is not 0. However, the routers are working correctly, and if I fetch the values with the SNMPGET command everything works fine.

                I ran a test with a network sniffer to see whether Zabbix tries to re-check the state of the "not working" hosts, and I saw that it sends nothing. So if ERROR_FROM is not 0, Zabbix does nothing with those hosts.
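
Given that observation, a narrower variant of the workaround from earlier in the thread would be to list the hosts whose `errors_from` is non-zero and reset only those, instead of updating every row. The sketch below uses an in-memory SQLite table as a stand-in so that it runs anywhere; a real Zabbix 1.1.x server uses MySQL. The `hosts` table and `errors_from` column come from the command quoted in the thread, while the `host` column name, the sample rows and the function names are assumptions made for illustration. It does not address the underlying cause.

```python
import sqlite3


def make_demo_db():
    """Build an in-memory stand-in for the Zabbix `hosts` table (assumed shape)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE hosts ("
        " host TEXT PRIMARY KEY,"
        " errors_from INTEGER NOT NULL DEFAULT 0)"
    )
    # Hypothetical rows: one stuck host, one healthy host.
    conn.executemany(
        "INSERT INTO hosts (host, errors_from) VALUES (?, ?)",
        [("contb002", 1167880259), ("contb004", 0)],
    )
    conn.commit()
    return conn


def stuck_hosts(conn):
    """Return (host, errors_from) for every host with a non-zero errors_from."""
    return conn.execute(
        "SELECT host, errors_from FROM hosts WHERE errors_from <> 0 ORDER BY host"
    ).fetchall()


def reset_host(conn, host):
    """Clear errors_from for a single host; return the number of rows changed."""
    cur = conn.execute("UPDATE hosts SET errors_from = 0 WHERE host = ?", (host,))
    conn.commit()
    return cur.rowcount


conn = make_demo_db()
for host, errors_from in stuck_hosts(conn):
    print(f"{host}: errors_from={errors_from}, resetting")
    reset_host(conn, host)
```

Listing the affected hosts first also leaves a record of which machines were stuck, which may be useful when reporting the problem.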


                • Alexei
                  Founder, CEO
                  Zabbix Certified Trainer
                  Zabbix Certified Specialist
                  Zabbix Certified Professional
                  • Sep 2004
                  • 5654

                  #9
                  Originally posted by Mintairov Mihail
                  I have the same problem with Zabbix. I am monitoring wifi routers using SNMP items, and sometimes Zabbix reports that some items are in the [no data] state and the ERROR_FROM value is not 0. However, the routers are working correctly, and if I fetch the values with the SNMPGET command everything works fine.
                  Are you on FreeBSD as well?

                  • Mintairov Mihail
                    Junior Member
                    • Nov 2006
                    • 2

                    #10
                    Originally posted by Alexei
                    Are you on FreeBSD as well?
                    Yes, I use FreeBSD 6.0-STABLE.
