Proxy not handling triggers?

  • tchjts1
    Senior Member
    • May 2008
    • 1605

    #1

    Proxy not handling triggers?

    Two things with the proxy that don't seem to be working properly -

    First:
    I have about 20 servers being monitored by proxy and I can create graphs for them just fine. The data is populating the graphs, so they are actively being reported in. However, their status on the Zabbix server is reported under Configuration --> hosts as:
    Monitored; Not available; ZBX_TCP_READ() failed [Connection reset by peer]

    Second:
    Two of the hosts went down last night, but they are not being reported as down, and no trigger was fired for the host being unreachable. Instead, it appears that Zabbix is reporting their error as [Interrupted system call]. On a side note, the dashboard shows no errors of any type for these hosts.

    Do triggers need to be configured differently if they are monitored by a proxy?
    Last edited by tchjts1; 02-10-2008, 22:10.
  • mauibay
    Junior Member
    • Jan 2008
    • 23

    #2
    I'm having similar symptoms with my triggers for proxied hosts. However, mine show fine in Configuration->Hosts as Monitored;Available, just as they did before I moved them to the proxy.

    I have 3 remote hosts behind a firewall that were being monitored through the firewall's NAT. One of them is Linux, so I built and installed zabbix_proxy on that one and upgraded its zabbix_agentd to 1.6. The other two hosts are Windows, and I left their 1.4 agents alone except to add the LAN IP of the proxy box to their zabbix_agentd.conf. (I've read numerous reports that the 1.6 Windows agent is unstable/broken.)

    On the zabbix server I changed the IP addresses of each remote host from the external NAT IP to the internal LAN IP, created the proxy and added all 3 remote hosts.

    I restarted the agents, started the proxy, and tailed the proxy log with debug=4 to see that it was looking fine. I could see the hosts and data being exchanged, and sure enough the server shows data just fine, as before.

    However, I've done some brief testing, and if I stop an agent or the proxy, all that happens is I stop getting data. No triggers fire. If I take the host out of proxy membership and change the IP back to using the external NAT directly, triggers work fine. When I use the LAN IP and make it a proxy member, it collects data but triggers do not fire.

    More unnerving is that if I stop the proxy, that doesn't even fire any triggers. Data stops for all 3 hosts (naturally) but the overview stays green and no triggers fire.

    I'm still looking for the solution and need to understand the cause before I can use the proxy for production hosts. I figure there's some significant step I've missed, but the manual seems to be _very_ light on info about the proxy.


    • tekknokrat
      Senior Member
      • Sep 2008
      • 140

      #3
      Originally posted by mauibay
      I'm having similar symptoms with my triggers for proxies hosts. However, mine show fine in Configuration->Hosts as Monitored;Available just as they did before I moved them to the proxy.
      When you moved the hosts, the availability flag was kept.

      See this thread for the future of availability:
      Last edited by tekknokrat; 09-10-2008, 16:27.


      • tekknokrat
        Senior Member
        • Sep 2008
        • 140

        #4
        The status of host is not retrieved through proxy:

        non-proxied host:
        Host status 13 Oct 19:43:59 Up (0) -2
        proxied host:
        Host status - - -
        That's why the "Server {HOSTNAME} is unreachable" trigger does not work: it checks against 2.


        • mauibay
          Junior Member
          • Jan 2008
          • 23

          #5
          No offense intended, but you are using circular logic, saying it doesn't work because it doesn't work.

          "Host status" not having data _is_ my point, and why the "Server unreachable" trigger doesn't fire for proxied hosts. Your statement that "Host status" isn't retrieved through the proxy is irrelevant. It's not retrieved without the proxy either, "Host status" is not an item retrieved from the host. It's a special "magic" item that only has data when Zabbix is unable to contact the host. When the host is reachable, the item has no data, presumably because it's not pulling data from an actual item. (I suggested once in the past that this item should have a zero value rather than no data.)

          My whole point is that "Host status" always has "no data" for hosts checked by proxy, which renders it useless, since the trigger can't fire. My question is why, and can it be fixed? Or is there another item or trigger we are supposed to use to check the availability status of hosts monitored via zabbix_proxy?

          I'm also confused in general by the situation. Isn't the "Server unreachable" trigger one of the main features that people use? Isn't this the most common trigger to set notifications for and let us know when something is down? How can this functionality be missing for everything monitored via the proxy? Wouldn't this be a showstopper for most everybody who is considering deploying zabbix_proxy? I have a strong feeling I'm missing something obvious again if I'm the only one this is an issue for.


          • Alexei
            Founder, CEO
            Zabbix Certified Trainer
            Zabbix Certified Specialist
            Zabbix Certified Professional
            • Sep 2004
            • 5654

            #6
            The "Host status" is just a special item which DOES NOT represent host status; it provides us with the status of a passive agent (ZABBIX and SNMP). This is really important to understand!

            Because of this, using this status for a host availability trigger is not a good idea. Consider using the function nodata() in combination with a reliable item such as a TCP or ICMP ping, if possible. This is a much more efficient and bulletproof method, with "built-in" flap detection adjusted by nodata()'s period.
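
            Sketching this suggestion as an item/trigger pair, it might look something like the following; the host name and the intervals here are placeholders, not values from this thread:

            Item: simple check with key "icmpping" on the host, checked e.g. every 60 seconds
            Trigger: {SomeHost:icmpping.nodata(300)}=1

            The nodata(300) period doubles as the "built-in" flap detection mentioned above: the trigger only fires after 5 minutes without any ping result reaching the server.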
            Alexei Vladishev
            Creator of Zabbix, Product manager
            New York | Tokyo | Riga
            My Twitter


            • mauibay
              Junior Member
              • Jan 2008
              • 23

              #7
              Originally posted by Alexei
              The "Host status" is just a special item which DOES NOT represent host status; it provides us with the status of a passive agent (ZABBIX and SNMP). This is really important to understand!
              That does make it clearer to me, thanks. I know the "Host status" item only contains data if the server cannot contact the passive agent. (Ignoring SNMP for now.) I'm not understanding why zabbix_proxy can't provide this data to the server though. If a host running zabbix_agentd in passive mode is down, why can't "Host status" return data if it's being monitored via the proxy? I think I'm still misunderstanding what "Host status" is intended to be used for. I've been using it to mean zabbix_agentd is unreachable. What I'm not seeing is why an unreachable instance of zabbix_agentd is always considered reachable if monitored via proxy.
              Originally posted by Alexei
              Because of this, using this status for a host availability trigger is not a good idea. Consider using the function nodata() in combination with a reliable item such as a TCP or ICMP ping, if possible. This is a much more efficient and bulletproof method, with "built-in" flap detection adjusted by nodata()'s period.
              I don't use it solely for a general host availability trigger, but I do include it in the processes that I want notifications sent for. I'll take your advice and replace it with something using the nodata() function. I never had reason to look at this before I started using the proxy; it just worked fine for me for so long.

              And yes, I'll say it yet again, zabbix rocks!


              • tekknokrat
                Senior Member
                • Sep 2008
                • 140

                #8
                Sorry for using the term "host status" misleadingly to mean the status of the host.
                What I want to show is that there is a difference in how data is returned from this check depending on whether the host is behind a proxy or not.
                I don't want to use this for a host check, but it would be nice if there were some other return than "-" when behind a proxy. If you say it's not good to use this check at the moment, that's OK with me. But in general, checks shouldn't differ that way.


                • mauibay
                  Junior Member
                  • Jan 2008
                  • 23

                  #9
                  I agree it might be better named. "Agent status" might be a better description. I just created a trigger using agent.ping.nodata(180)=1 and it works as I expected.

                  The overview still uses the normal method for determining that the agent is down though, and for proxied hosts the overview shows them as always available regardless of actual state. It's annoying, not just because I'm used to a host's row going gray in the overview when it's down, but because now the behavior is different between proxied hosts and direct hosts. To me, remembering differences like this is bad and leads to mistaken interpretation.

                  When a directly monitored agent is unreachable, it's clearly shown in the overview by all the trigger boxes going gray. When a proxy-monitored agent is unreachable, the overview doesn't change its display at all; it looks like everything is still up and green. I believe this is because the "Host status" for proxied agents is always reported as available, which brings me back to my main point: this functionality is broken for monitoring via proxy.

                  I can add (and now have added) my own "Agent unreachable" trigger, so now I at least get a single red box on the overview, but all the other triggers show green, and it's otherwise not obvious that those items are down and not collecting data. I just don't see why it should be considered intended behavior for proxied hosts to behave so differently from directly monitored hosts. I think the proxy should be transparent at the item/trigger/overview level, and as far as I can tell this is solely caused by the different behavior of "Host status" for agents monitored via proxy.
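
                  For reference, the trigger mentioned above, written out in full Zabbix expression syntax, would look something like this (the host name is a placeholder):

                  {SomeHost:agent.ping.nodata(180)}=1

                  This fires when no agent.ping value has reached the server for 180 seconds, whether the host is monitored directly or via a proxy.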


                  • bjoneson
                    Junior Member
                    • Nov 2008
                    • 6

                    #10
                    More info

                    I've been coping with the same issue, and I think I understand from a development perspective what the intent of the host status is. That said, the host status IS recorded "correctly" in the database of the proxy server: the "available" column in the hosts table matches what you would expect to see if the host were monitored directly by the zabbix_server. However, this data does not appear to be within the scope of what is currently transmitted from the proxy back to the server (which appears to be limited to item history?).

                    I did a bit of poking around in the code, and it seems that implementing this functionality would take a bit of work, but would be a fairly trivial task. I have to agree that from a UI perspective it makes a lot of sense to mimic the behavior of directly monitored servers as it pertains to the availability of the agent. There are probably a number of ways to skin the cat to achieve that, but I think that bringing the appropriate data into scope for transmission to the master server makes sense. Am I off base?


                    • bhboots
                      Junior Member
                      • Apr 2008
                      • 5

                      #11
                      Hello,
                      I have been searching for the best way to monitor the status of a host with Proxy and it seems that most people are using a ping. Some people, however, are using nodata() mixed with a ping... could someone post their item and trigger for general status monitoring? Thanks!!


                      • mauibay
                        Junior Member
                        • Jan 2008
                        • 23

                        #12
                        Originally posted by bhboots
                        Hello,
                        I have been searching for the best way to monitor the status of a host with Proxy and it seems that most people are using a ping. Some people, however, are using nodata() mixed with a ping... could someone post their item and trigger for general status monitoring? Thanks!!
                        For some time now, to determine host agent status, I've been using a trigger that fires if the CPU load has had no data for 5 minutes, something like this:

                        {SomeHost:system.cpu.load[,avg1].nodata(300)}=1

                        Any reliable parameter key that gets checked most frequently can be useful for this. I chose system.cpu.load because it gets checked at least every minute for most of my hosts and so makes for a convenient template trigger.

                        This makes the assumption that if no data is being returned for this one key, then the agent or host is unavailable in general, but it's been working out well so far, since it's not too wild an assumption.

                        The main drawback I've had so far is the lag in trigger recovery for proxied hosts after a lengthy outage. My zabbix proxies are configured to store data for up to 8 hours if they can't contact the zabbix server. When they do re-establish contact after a long outage it can take up to 20 minutes in some cases for the stored data to catch up to real time on the server, and the trigger naturally can't reset until it sees the data. When this happens I can open a graph of the cpu load and watch the data slowly appear from left to right as it refreshes every minute while the proxy uploads the stored data. When the data finally reaches within 5 minutes of current time at the end of the graph, the trigger resets and the host agent shows green on the overview again.

                        I would prefer that agent.ping work properly for proxied hosts, but lacking that I've had good enough results with the nodata() method.
                        Last edited by mauibay; 12-06-2009, 00:37.


                        • bhboots
                          Junior Member
                          • Apr 2008
                          • 5

                          #13
                          That is very interesting, and thank you for your reply. If I were to put this into production, is there a way to force a sync? Say I resolve the issue and then want to catch the proxy up quickly.


                          • mauibay
                            Junior Member
                            • Jan 2008
                            • 23

                            #14
                            Originally posted by bhboots
                             That is very interesting, and thank you for your reply. If I were to put this into production, is there a way to force a sync? Say I resolve the issue and then want to catch the proxy up quickly.
                             If the issue is resolved quickly (meaning the host was only unavailable for a short time), then there isn't much of a problem, since there isn't a significant amount of data stored on the proxy that needs to "catch up" on the server. This isn't a matter of needing to force a sync or otherwise kick off the process; the proxy will begin sending its cached data to the server as soon as it re-establishes contact. The issue is simply the amount of time it takes the proxy to transmit the cached data to the server, which can be a while when several hours of data build up in the cache.

                             Note that here I'm talking about a proxy outage, not an outage of a monitored host. If it's only the host that goes down, there isn't any data being collected by the proxy anyway, and data collection and propagation via the proxy to the server will resume very quickly as soon as the host comes back up.

                            The largest way this impacts me is when I have a WAN network outage for my zabbix server that causes all remote proxies to hold their data until the server becomes available again. When the network comes back up all the proxies slam the server trying to send their cached data simultaneously and it can take quite a while. This doesn't happen for me often, but when it does and the network outage is more than a couple hours I usually end up manually shutting down the remote proxies and bringing them up one or two at a time until they are all caught up. I'm not aware of any other practical workarounds.
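
                             The staggered restart described above could be scripted roughly like this; the proxy host names and the init script path are assumptions, and the real ssh/sleep lines are commented out so the sketch only prints what it would do:

```shell
#!/bin/sh
# Hypothetical sketch: bring remote proxies back one at a time after a long
# WAN outage so they don't all slam the server with cached data at once.
PROXIES="proxy-a proxy-b proxy-c"   # placeholder proxy host names
DELAY=900                           # seconds to let each proxy drain its cache

for p in $PROXIES; do
    echo "starting zabbix_proxy on $p, then waiting $DELAY seconds"
    # ssh "$p" /etc/init.d/zabbix_proxy start   # assumed init script path
    # sleep "$DELAY"
done
```

                             Tune the delay to the size of the backlog; each proxy needs long enough to flush its cache before the next one comes up.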

                            I'm only monitoring several dozen hosts and have only a handful of proxies so this works ok for me.
                            Last edited by mauibay; 13-06-2009, 01:23.


                            • burn1024
                              Member
                              • Jun 2012
                              • 52

                              #15
                              Originally posted by Alexei
                              The "Host status" is just a special item, which DOES NOT represent host status
                              Oh my, that's just so intuitive. Maybe the "unreachable" trigger should not use it, then?

