Strange issue when 2 servers defined in ServerActive

  • zillions
    Junior Member
    • Jan 2013
    • 22

    #1

    Hi guys,
    Running Zabbix 2.0.5 in a CentOS clustered environment with a remote clustered MySQL database (corosync and pacemaker for both clusters). We have about the following:

    Number of hosts (monitored/not monitored/templates): 2300 (2077 / 66 / 157)
    Number of items (monitored/disabled/not supported): 356961 (240523 / 23373 / 93065)
    Number of triggers (enabled/disabled) [problem/unknown/ok]: 33121 (26309 / 6812) [201 / 0 / 26108]
    Number of users (online): 158 (13)
    Required server performance, new values per second: 1359.51

    We have run into a fairly annoying issue.
    I've done a bunch of checking as to possible causes, as well as searched online, and I can't seem to find anyone with a similar issue.

    Basically, when I have one server or proxy in the ServerActive line of /etc/zabbix/zabbix_agentd.conf, everything works great, no errors in the logs.
    However, when I put the server AND a proxy, it starts throwing the common "No active checks on server: host [boweb41.csnzoo.com] not found" error in the agent logs on the host.

    The only thing I've found in my extensive troubleshooting is that, for the error to go away, the server or proxy the host is assigned to in the UI needs to be the sole entry for ServerActive.

    I've tried:
    Changing the order of the 2 entries
    Validating the permissions, which are ok: -rw-r--r--. 1 zabbix zabbix 5957 Jan 22 15:53 zabbix_agentd.conf
    Declaring the server and/or proxy via DNS names instead of IPs
    Validating that the "hostname" of each server is indeed what the zabbix server has in the UI
    Validating that the host has the correct name listed in the UI, as confirmed by using a zabbix_get against it with system.hostname
    Changing all passive checks to active in my templates

    The strangest thing is that hosts have no issue reporting data, and that all the checks (both active and passive) work.
    This is happening for ALL CentOS hosts in our infrastructure (over 1.5k).

    Here are the agent logs from both states (hostnames sanitized):
    Btw, 10.22.65.20 = proxy, and 10.22.165.100 = primary zabbix server
    The host in question is currently associated with the proxy in the UI.
    Switching it to the server via the UI results in the same issue if 2 IPs are in ServerActive.


    With: ServerActive=10.22.65.20
    10973:20140122:155710.131 Zabbix Agent stopped. Zabbix 2.0.5 (revision 33558).
    11524:20140122:155712.337 Starting Zabbix Agent [<hostname>]. Zabbix 2.0.5 (revision 33558).
    11530:20140122:155712.381 agent #0 started [collector]
    11531:20140122:155712.383 agent #1 started[listener]
    11532:20140122:155712.385 agent #2 started[listener]
    11533:20140122:155712.387 agent #3 started[listener]
    11534:20140122:155712.389 agent #4 started [active checks]
    11535:20140122:155712.392 agent #5 started [active checks]

    Added back in: ServerActive=10.22.65.20,10.22.165.100
    11524:20140122:160755.870 Zabbix Agent stopped. Zabbix 2.0.5 (revision 33558).
    13892:20140122:160756.082 Starting Zabbix Agent [<hostname>]. Zabbix 2.0.5 (revision 33558).
    13898:20140122:160756.143 agent #0 started [collector]
    13899:20140122:160756.148 agent #1 started[listener]
    13900:20140122:160756.149 agent #2 started[listener]
    13903:20140122:160756.152 agent #5 started [active checks]
    13901:20140122:160756.154 agent #3 started[listener]
    13902:20140122:160756.151 agent #4 started [active checks]
    13903:20140122:160756.160 No active checks on server: host [<hostname>] not found
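With over 1.5k affected hosts, it may help to pull these errors out of the agent logs mechanically. The following is a minimal sketch, assuming the log lines follow the `PID:YYYYMMDD:HHMMSS.mmm message` shape seen in the excerpts above; the function name `find_not_found_errors` is made up for illustration.

```python
import re

# Matches agent log lines shaped like the ones quoted above:
#   PID:YYYYMMDD:HHMMSS.mmm message
LOG_LINE = re.compile(r"^\s*(\d+):(\d{8}):(\d{6}\.\d{3})\s+(.*)$")
NOT_FOUND = re.compile(r"No active checks on server: host \[(.*?)\] not found")


def find_not_found_errors(log_text):
    """Return (pid, date, time, host) tuples for every 'host not found' line."""
    hits = []
    for line in log_text.splitlines():
        m = LOG_LINE.match(line)
        if not m:
            continue  # skip lines that do not look like agent log entries
        pid, date, time_, message = m.groups()
        err = NOT_FOUND.search(message)
        if err:
            hits.append((int(pid), date, time_, err.group(1)))
    return hits


# Sample taken from the second log excerpt in this post.
sample = """\
13903:20140122:160756.152 agent #5 started [active checks]
13902:20140122:160756.151 agent #4 started [active checks]
13903:20140122:160756.160 No active checks on server: host [<hostname>] not found
"""

for hit in find_not_found_errors(sample):
    print(hit)
```

In the excerpt above, the error is logged by PID 13903, the same process that logged `agent #5 started [active checks]`, so the PID column can be used to tell which active-check process is complaining.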

    I've done a lot of searching, and have tried everything I can think of. It's obviously related to having the 2 entries in the file; however, the documentation says you can do that. I think this is especially strange, because if you use a proxy, you want to have the ability to flip hosts back to the primary server if you need to do maintenance on the proxy. Unless 2 entries are there, all your active checks will fail when you switch the host back to the server.

    Any ideas?
  • tchjts1
    Senior Member
    • May 2008
    • 1605

    #2
    We had this same issue. I was thinking we could use ServerActive as a sort of backup to direct hosts to if one of our proxies went down. Doesn't work that way. One other thing you will notice on that type of setup is that your hosts will continuously flip-flop as to what server they are reporting to.

    This post may shed some light on what you are seeing:


    • zillions
      Junior Member
      • Jan 2013
      • 22

      #3
      Hi tchjts1,
      Thanks for the response!
      I read through everything you linked to, and it was helpful.
      So I'm a bit confused by two things.

      First,
      How would you switch a host that's monitored by a proxy (via ServerActive), back to the primary server? For example, if ServerActive=ProxyIP, and the proxy is going to be down for maintenance, how would I get the host to talk to the server? Manually changing that line to "ServerActive=ZabbixServerIP" seems unwieldy.

      Second,
      The Zabbix documentation shows that you can have multiple addresses defined in that line; my impression was that it was ok to do (i.e., it wouldn't cause problems). Am I wrong?

      I found some info in the Zabbix documentation here:

      Specifically:
      "[1] To make sure that an agent asks the proxy (and not the server) for active checks, the proxy must be listed in the ServerActive parameter in the agent configuration file."

      Then in more documentation here:

      Specifically:
      "ServerActive - List of comma delimited IP:port (or hostname:port) pairs of Zabbix servers for active checks. No spaces allowed"

      This indicates that you can have multiple servers listed here. Additionally, from the first documentation, you can have the zabbix proxy listed in the same line.
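To make the quoted format concrete, here is a minimal sketch of how a ServerActive value breaks down into separate destinations. It is an illustration of the documented syntax, not the agent's actual parser; the function name `parse_server_active` is hypothetical, the default port 10051 is the standard Zabbix server/proxy port, and IPv6 addresses are deliberately left out.

```python
DEFAULT_PORT = 10051  # standard Zabbix server/proxy port, used when none is given


def parse_server_active(value):
    """Split a ServerActive value into (host, port) pairs.

    Handles hostnames and IPv4 addresses only; IPv6 is out of scope here.
    """
    if " " in value:
        raise ValueError("spaces are not allowed in ServerActive")
    pairs = []
    for entry in value.split(","):
        if not entry:
            raise ValueError("empty entry in ServerActive")
        host, sep, port = entry.partition(":")
        pairs.append((host, int(port) if sep else DEFAULT_PORT))
    return pairs


# The value from the first post in this thread: proxy first, then server.
print(parse_server_active("10.22.65.20,10.22.165.100"))
```

Each pair in the result is a separate destination that gets asked for active checks, which would be consistent with the "host not found" error only appearing once a second entry is listed.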


      The newer versions of Zabbix don't use anything in the Server line for active checks (only ServerActive), so I feel that this is a risky proposition, to not have a quick way to flip back. How do you guys handle it?
      I mean, we use Puppet to manage the config files, and can switch within a couple hours, but what about some kind of unplanned issue with a proxy where you need to flip back?

      Bottom line, does this mean that we really have to go "all in" with hosts tied to a proxy, and don't have an easy/quick way to flip back?
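For an unplanned flip, the manual ServerActive change described above could be scripted rather than waiting for a Puppet run. The following is a rough sketch only: `set_server_active` is a hypothetical helper, the sample config (including the `web01` hostname) is invented, and it works on the file contents as a string so that reading and writing /etc/zabbix/zabbix_agentd.conf is left to the caller.

```python
import re

# Matches an uncommented ServerActive line, tolerating leading whitespace.
SERVER_ACTIVE_LINE = re.compile(r"^[ \t]*ServerActive[ \t]*=.*$", re.MULTILINE)


def set_server_active(conf_text, new_value):
    """Return conf_text with ServerActive set to new_value.

    Replaces the first uncommented ServerActive line, or appends one
    if the file has none. Commented-out lines are left untouched.
    """
    replacement = "ServerActive=" + new_value
    # A function is used as the replacement so new_value is inserted literally.
    new_text, count = SERVER_ACTIVE_LINE.subn(lambda _m: replacement, conf_text, count=1)
    if count == 0:
        if new_text and not new_text.endswith("\n"):
            new_text += "\n"
        new_text += replacement + "\n"
    return new_text


# Invented sample config for illustration.
conf = "Server=10.22.165.100\n# ServerActive=example\nServerActive=10.22.65.20\nHostname=web01\n"
print(set_server_active(conf, "10.22.165.100"))
```

The agent would still need a restart to pick up the new value, as in the logs from the first post, and the host would also have to be reassigned in the UI so that the new destination knows about it.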


      Thanks again, I am very appreciative of your responses!
      -Zillions


      • tchjts1
        Senior Member
        • May 2008
        • 1605

        #4
        All very good (and familiar) questions... that I don't have an answer to.

        We don't really have a workaround in place if one of our proxies goes down and we need to point hosts somewhere else. I guess my strategy would be to quickly change all of my active agent items to be passive, then point the monitored hosts to another proxy or to the Zabbix App server itself.

        Of course, you will lose anything that requires an item to be active agent.

        We still populate the Server= field of our zabbix_agentd.conf file with our Zabbix App server and all proxies, which would allow monitoring of passive items at the click of a mouse from any of those Zabbix servers.

        For this very reason, I have hostgroups of "Proxy A, Proxy B, Proxy C"
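The "flip active items to passive, then repoint the host" strategy could also be driven through the Zabbix API instead of the UI. The sketch below only builds the JSON-RPC request bodies and does not send anything. It assumes the `item.update` and `host.update` methods, item type codes 0 (Zabbix agent) and 7 (Zabbix agent (active)), the `proxy_hostid` host property, and an `auth` token in the request body; check these against the API documentation for your version. The IDs and the token are placeholders.

```python
import json

ITEM_TYPE_PASSIVE = 0  # "Zabbix agent"
ITEM_TYPE_ACTIVE = 7   # "Zabbix agent (active)"


def rpc(method, params, auth_token, request_id=1):
    """Build a JSON-RPC 2.0 request body for the Zabbix API."""
    return {
        "jsonrpc": "2.0",
        "method": method,
        "params": params,
        "auth": auth_token,
        "id": request_id,
    }


def make_item_passive(itemid, auth_token):
    """Request that switches one item from active to passive agent checks."""
    return rpc("item.update", {"itemid": itemid, "type": ITEM_TYPE_PASSIVE}, auth_token)


def move_host(hostid, proxy_hostid, auth_token):
    """Request that reassigns a host to another proxy.

    Assumption: a proxy_hostid of "0" means "monitored directly by the server".
    """
    return rpc("host.update", {"hostid": hostid, "proxy_hostid": proxy_hostid}, auth_token)


# Placeholder IDs and token, for illustration only.
print(json.dumps(make_item_passive("23298", "TOKEN"), indent=2))
print(json.dumps(move_host("10105", "0", "TOKEN"), indent=2))
```

As noted above, items that only exist as active checks cannot be converted this way, so they would have to be filtered out before building the requests.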

        Maybe zalex_UA will chime in with his thoughts on this.


        • zillions
          Junior Member
          • Jan 2013
          • 22

          #5
          Thanks tchjts1!
          Your answer perfectly confirms my thoughts, and I don't think I have any other questions.
          I agree, and thought about the idea of flipping everything to a passive check. In fact, I did the same thing as you, and have all of my hosts in a number of hostgroups that correspond to each proxy (zbxprx.1, zbxprx.2, etc...). Great minds think alike!

          This solves at least why I am getting the errors, and I can fix it once I get everything over to a proxy. There are some logistical hurdles for us, as we have logic in Puppet that generates the config files, but I'm sure we can sort it out.

          Thanks again for your clear and knowledgeable answers, they are much appreciated. I'll pay it forward by helping someone else on the forums. I generally only come here if I have questions, but I'd say I'm pretty solid at Zabbix and am sure I could pick off at least a handful of the threads.

          Have a nice day!
          -Zillions


          • tchjts1
            Senior Member
            • May 2008
            • 1605

            #6
            Just be glad you don't have auto-registration enabled with e-mail notification... If you have 2 servers listed on the ServerActive= line, you get continuous registration e-mails as mentioned by the poster in that thread I linked to, because of the hosts flip-flopping the server they report to. I have experienced that as well, but not on the scale that he did.

            On a side note, we use a generic zabbix_agentd.conf file on our hosts, and let auto-registration take care of the rest. No need to have unique conf files.


            • zillions
              Junior Member
              • Jan 2013
              • 22

              #7
              That's a good point, I could see that happening if those email notifications are enabled.

              We don't use autoregistration currently. In a previous, older implementation of Zabbix we did use it, but it wasn't well planned out (on our end) and we ended up with things being auto-added all over the place, and it was not organized at all.

              We may revisit it at some point, but we are almost done setting up our in-house asset tracking tool to manage the addition/deletion of Zabbix hosts via the API. Add a host to the asset tool, it kicks off the build process, and adds all the relevant stuff into Zabbix. After a reboot, the host is online and monitored.

              We are also just about done adding an API button in our asset tool's web interface, where if you go to a host in the asset tool, you can click a "maintenance" button, and it will put the machine into maintenance mode. Same thing for turning maintenance mode off.
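A "maintenance" button like the one described could boil down to two API requests. This is a hedged sketch, not the asset tool's actual code: it assumes the `maintenance.create` and `maintenance.delete` methods with the parameters shown (a one-time period, `timeperiod_type` 0), and the function names, host ID, maintenance ID, and token are all placeholders. Nothing is sent over the network.

```python
import json
import time


def maintenance_on_request(hostid, hours, auth_token, now=None):
    """Build a maintenance.create request for a one-time maintenance window."""
    start = int(time.time()) if now is None else int(now)
    duration = int(hours * 3600)  # window length in seconds
    return {
        "jsonrpc": "2.0",
        "method": "maintenance.create",
        "params": {
            "name": "asset-tool maintenance for host %s" % hostid,
            "active_since": start,
            "active_till": start + duration,
            "hostids": [hostid],
            # timeperiod_type 0 is assumed to mean "one time only"
            "timeperiods": [
                {"timeperiod_type": 0, "start_date": start, "period": duration}
            ],
        },
        "auth": auth_token,
        "id": 1,
    }


def maintenance_off_request(maintenanceid, auth_token):
    """Build a maintenance.delete request; params is a list of maintenance IDs."""
    return {
        "jsonrpc": "2.0",
        "method": "maintenance.delete",
        "params": [maintenanceid],
        "auth": auth_token,
        "id": 1,
    }


# Placeholder values, for illustration only.
print(json.dumps(maintenance_on_request("10105", 2, "TOKEN", now=1390400000), indent=2))
print(json.dumps(maintenance_off_request("3", "TOKEN"), indent=2))
```

The "off" request needs the maintenance ID returned by the "on" request, so the tool would have to store it alongside the host record.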

              -Zillions
