**Farzad FARID** · 16-04-2012, 16:46

I finally found out that there is already a Feature Request describing the very same problems I wrote about: https://support.zabbix.com/browse/ZBXNEXT-341.

But the ticket was opened two years ago and still has no fix although it's the third most popular feature request on Zabbix (https://support.zabbix.com/browse/ZB...arissues-panel).

Are there any Zabbix users running large systems who have been hit by this issue and wish to comment on it, or share some hints on how they worked around it?

Regards

**bob.todd** · 26-04-2012, 15:03

Testimony

Hi there,
I'm a system engineer working for a big finance corporation. I'm trying to deploy a full-fledged Zabbix supervision platform as part of a global effort to raise awareness of open-source solutions.

I've been working on this project since the end of last year, and I must admit I'm a bit surprised by this "lack of feature"/"bug"/"design".
So much so that I didn't even verify that this part works as expected.
My bad. I should have.

One of the means we found to compensate for this situation was to develop a kind of wrapper which ensures that, whatever value is collected, the value transmitted to the server complies with the expected format, so that the item and its subsequent triggers do not freeze.

It is ugly, and doesn't handle all situations well...
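
For readers wondering what such a wrapper could look like, here is a minimal sketch in Python. It is not the actual wrapper described above: the names `sanitize`, `collect` and `SENTINEL`, the sentinel value `-1.0` and the assumption that the item expects a numeric (float) value are all hypothetical choices made for illustration.

```python
import math
import subprocess

# Hypothetical sentinel: a value the item type accepts but that the real
# check never produces, so a trigger can still tell "collection failed".
SENTINEL = -1.0


def sanitize(raw, sentinel=SENTINEL):
    """Return raw as a finite float, or the sentinel if it cannot be parsed."""
    try:
        value = float(raw.strip())
    except (ValueError, AttributeError):
        # Not a number (error text, empty output) or not a string at all.
        return sentinel
    if not math.isfinite(value):
        # "nan" and "inf" parse as floats but are not usable readings.
        return sentinel
    return value


def collect(command, timeout=3.0, sentinel=SENTINEL):
    """Run the real check and always hand back a well-formed numeric value."""
    try:
        result = subprocess.run(
            command, capture_output=True, text=True, timeout=timeout
        )
    except (subprocess.TimeoutExpired, OSError):
        # Script hung or could not be started at all.
        return sentinel
    if result.returncode != 0:
        return sentinel
    return sanitize(result.stdout, sentinel)


if __name__ == "__main__":
    # Example: wrap a (hypothetical) check script and print what gets sent.
    print(collect(["/usr/local/bin/my_check.sh"]))
```

The drawback mentioned above shows up directly: a trigger now has to know that `-1.0` means "no valid data", and the sentinel only works for items where that value can never be a legitimate reading.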

To the Zabbix development team: please implement a correct way to handle this "unknown data" situation.

Best regards,
Bob Todd

**Pax** · 27-05-2012, 18:47

Yes, it looks like Zabbix is not designed to be reliable in large environments. But you can always create nodata() triggers for each item. The trouble is that you get a lot of notifications if a host goes down.

**Farzad FARID** · 29-05-2012, 17:17

Originally posted by Pax

Yes, it looks like Zabbix is not designed to be reliable in large environments. But you can always create nodata() triggers for each item. The trouble is that you get a lot of notifications if a host goes down.

Sure, and I think that creating all those triggers just to monitor Zabbix itself (or counterbalance its weaknesses) is counterproductive...

Zabbix 2.0 is now ready (congrats to the Zabbix team!) but there is still no progress on ticket ZBXNEXT-341. What's more, some recent patches like ZBXNEXT-522 go even further in hiding "unknown" triggers, when the right choice would be to treat them as potentially real problems.

Are we still only a minority in believing that Zabbix's mishandling of unsupported items (caused by timeouts or script errors, for example) and unknown triggers can make the whole supervision platform unreliable?

Regards

**MarkusL** · 31-05-2012, 14:04

Hi all!

We dug into this a while ago. From my experience, I can tell you: it's complicated and hard to manage! Zabbix does not have a built-in "I am consistent with all my monitored hosts and items" function. You have to manage this (for now) on your own.

Our situation is quite complex, as we work with many proxies (one per customer). Every proxy automatically starts an autossh session to our Zabbix server, which is where all the Zabbix traffic goes. WAN connections on our side and on the customers' side can be A-DSL / S-DSL, sometimes X21. Mostly it is A-DSL...
Now, to see EXACTLY where a root cause comes from, we have to see the whole picture, starting from our server, through the customer proxy, to the network behind the proxy (which the proxy monitors). In our example, in abstract form:
server - ups & firewall - wan(up) - customer-wan(up) - proxy-autossh ok - proxy-services ok - ups ok - backoffice-switches ok - host with basic services ok.

We do a great deal of "baseline monitoring" with a LOT of nodata checks, just to be sure of this point: everything I see in my Zabbix frontend is 100% what is going on in the real systems out there; I miss NOTHING.

As soon as we have checked all the single points between us (the Zabbix server that generates the triggers) and the customer hosts to be monitored, we start the "real" monitoring. All parts of the "real monitoring" templates depend on our baseline monitoring. If a baseline check goes down (e.g. the SNMP service on a Windows machine), the host gets a corresponding trigger and our Zabbix server sends ONLY ONE message, not 200x SNMP item nodata...

Our baseline-monitoring triggers all use the lowest severity and are not visible to our agents on the dashboard (that severity is not shown). Our real templates (nodata for every item) depend on these triggers. If a baseline-monitoring trigger stays active for longer than, e.g., 60 seconds, a second trigger that IS VISIBLE to the agent is generated, because we say: if for 60 seconds a host is not pingable, or a service is not started, or something similar, this can't be a "something" problem inside Zabbix (a communication timeout or anything else); this must be a real problem we have to look into.
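
The logic described above can be sketched in a few lines of Python. This is only a model of the idea, not Zabbix configuration and not the actual templates: the names `BaselineCheck`, `root_cause` and `visible_alerts` are hypothetical, and the 60-second grace period is just the example figure given above.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class BaselineCheck:
    """One link in the chain, e.g. WAN up, proxy autossh, SNMP service."""
    name: str
    ok: bool
    failing_for: float = 0.0  # seconds the check has been failing


def root_cause(chain: List[BaselineCheck]) -> Optional[BaselineCheck]:
    """Return the first failing check, walking from the server outwards."""
    for check in chain:
        if not check.ok:
            return check
    return None


def visible_alerts(chain, items_without_data, grace=60.0):
    """Decide what an operator sees on the dashboard.

    While a baseline check is failing, the per-item nodata alerts that
    depend on it are suppressed. The baseline failure itself only becomes
    visible once it has lasted longer than the grace period.
    """
    cause = root_cause(chain)
    if cause is None:
        # Baseline is healthy, so missing data is a genuine item problem.
        return [f"nodata: {item}" for item in items_without_data]
    if cause.failing_for > grace:
        return [f"baseline down: {cause.name}"]
    return []  # too early to tell; probably a transient timeout


if __name__ == "__main__":
    chain = [
        BaselineCheck("customer-wan", ok=True),
        BaselineCheck("proxy-autossh", ok=True),
        BaselineCheck("snmp-service", ok=False, failing_for=90.0),
    ]
    items = [f"snmp-item-{i}" for i in range(200)]
    print(visible_alerts(chain, items))  # one message instead of 200
```

In Zabbix itself the same effect comes from trigger dependencies plus a hidden low-severity trigger; the sketch only shows why one visible message replaces the flood of per-item nodata alerts.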

The whole concept took around 3 weeks of heavy brainwork...
Because of this, all I wrote is just a "simple description"; it would take too long to discuss every detail here.

I would really love to discuss this with Alexei or someone from Zabbix who hopefully :-) has an interest in solving this very heavy problem. In my eyes, this point is the biggest con of Zabbix.

Best regards,

Markus.

Do "UNKNOWN" triggers & "stale items" make Zabbix base monitoring unreliable?
