**mushero** · 22-08-2013, 17:47

Our system is about your size, and we are moving to proxies for everything, in part to support HA: we can fail over to our standby Zabbix server in a different country by changing just a few proxy options (rather than every host, as the hosts are all locked down to our Zabbix public IP). We run in 100 data centers globally, so our system is all public-Internet-based.

We are on 1.8.3, and the problem with proxies is that they don't tolerate connection issues very well. They handle them quite poorly, actually: they just get stuck and won't time out or retry, so we have to restart them. Soon we'll have a tool to detect this and restart the proxy when the local queue gets too large (using SQL).

Sometimes we run two routes, via route or iptables NAT, so our proxies can route around the world in different ways. Some day that will be automatic too, so that after 5 bad restarts that still get stuck, we'll change routes.
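The restart-then-reroute policy described above can be sketched as a small decision function. This is only an illustration in Python: the names `WatchdogState` and `next_action`, the threshold of 500 queued items, and the count of two routes are assumptions made for the example, not part of any existing tool. Actually restarting the proxy or changing the route is left to the caller.

```python
from dataclasses import dataclass


@dataclass
class WatchdogState:
    bad_restarts: int = 0  # consecutive restarts that did not clear the backlog
    route_index: int = 0   # which of the available routes is currently in use


def next_action(state, queue_size, max_queue=500, max_bad_restarts=5, route_count=2):
    """Return 'ok', 'restart' or 'switch_route', updating state in place.

    All thresholds are illustrative assumptions.
    """
    if queue_size <= max_queue:
        # The proxy is keeping up, so forget earlier failures.
        state.bad_restarts = 0
        return "ok"
    if state.bad_restarts >= max_bad_restarts:
        # Restarting has not helped: move to the next route and start over.
        state.route_index = (state.route_index + 1) % route_count
        state.bad_restarts = 0
        return "switch_route"
    state.bad_restarts += 1
    return "restart"
```

Called once per check interval with the current local queue size, this returns `restart` for the first five oversized readings in a row and `switch_route` on the sixth.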

I think/hope 2.x proxies will be better at this, as we REALLY need them to time out if no data is sent or no reply arrives within 30 seconds, and then reconnect; that would solve a lot for us.
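For illustration, the behaviour asked for here (give up after 30 seconds without a reply, then reconnect) looks roughly like this in generic Python socket code. This is not Zabbix proxy code; the function name `exchange_with_retry` and the default of three attempts are assumptions for the sketch.

```python
import socket


def exchange_with_retry(host, port, payload, timeout=30.0, attempts=3):
    """Send payload and read one reply, reconnecting if the link stalls."""
    last_error = None
    for _ in range(attempts):
        try:
            # The timeout covers the connect and is kept for send and recv.
            with socket.create_connection((host, port), timeout=timeout) as conn:
                conn.sendall(payload)
                reply = conn.recv(4096)
                if reply:
                    return reply
                last_error = ConnectionError("peer closed without replying")
        except OSError as exc:  # includes timeouts and refused connections
            last_error = exc
    raise ConnectionError(f"no reply after {attempts} attempts") from last_error
```

The point of the design is that a stalled connection is always abandoned after a bounded time, so the sender never sits stuck waiting for a peer that has gone away.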

Also, be sure to monitor the queues for the proxies, both at the proxy and in Zabbix, with graphs and triggers, so that a trigger fires if a proxy is more than x00 items behind. I can send you SQL for this if you want.
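As a rough idea of what such a check might look like, here is a sketch against a toy SQLite database. The tables `collected` and `sync_state` are invented for the example and are not the real Zabbix proxy schema (the actual SQL mentioned above is not shown in this thread), and the threshold of 500 is likewise an assumption.

```python
import sqlite3


def backlog(conn):
    """Count rows collected locally but not yet marked as sent."""
    (count,) = conn.execute(
        "SELECT COUNT(*) FROM collected "
        "WHERE id > (SELECT last_sent_id FROM sync_state)"
    ).fetchone()
    return count


def is_behind(conn, threshold=500):
    """True when the backlog exceeds the (assumed) alert threshold."""
    return backlog(conn) > threshold


# Toy data: 800 collected values, of which the first 200 have been sent.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE collected (id INTEGER PRIMARY KEY, value REAL)")
conn.execute("CREATE TABLE sync_state (last_sent_id INTEGER)")
conn.executemany(
    "INSERT INTO collected (value) VALUES (?)",
    [(float(i),) for i in range(800)],
)
conn.execute("INSERT INTO sync_state VALUES (200)")
```

The backlog number can then be fed into Zabbix as an ordinary item, graphed, and given a trigger on the chosen threshold.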

Overall, we want a central system as our triggers/templates are very complex (about 200 items, 50 triggers/host, lots of custom parts, GUI, etc.) so we love the proxy idea and are working to improve it.

We're also working on a PHP GUI for the proxy to help show the queue, local data, servers managed, and simple things. Also to refresh the config, etc. We'll share this when it's usable.

**BHG_2008** · 23-10-2013, 20:37

Master-child hierarchy is essential

We are planning to use localized alerting as well, which is only possible via a node. We have about 2,000 sites in which we are putting a node, so that an aggregated data feed comes in from each site. Also, when the master is in maintenance mode (optimizing the database or upgrading, etc), the local child nodes cache the data points and resume in "catch-up" mode until near real time again. I do not believe the mechanism for caching on proxies is sufficient for this purpose. I feel that removing the child node option is the wrong direction. In fact, I would like to see it expanded in 3 ways:
1) Allow configuration of hosts on the master that belong to a child of a child
2) Expand the node limit to 10,000
3) Allow multiple masters for children, so higher fault tolerance is achieved where necessary

**neominder** · 04-12-2013, 00:02

Originally posted by BHG_2008

We are planning to use localized alerting as well, which is only possible via a node. We have about 2,000 sites in which we are putting a node, so that an aggregated data feed comes in from each site. Also, when the master is in maintenance mode (optimizing the database or upgrading, etc), the local child nodes cache the data points and resume in "catch-up" mode until near real time again. I do not believe the mechanism for caching on proxies is sufficient for this purpose. I feel that removing the child node option is the wrong direction. In fact, I would like to see it expanded in 3 ways:
1) Allow configuration of hosts on the master that belong to a child of a child
2) Expand the node limit to 10,000
3) Allow multiple masters for children, so higher fault tolerance is achieved where necessary

I agree with this for the most part. One major issue we've run into with proxies is that even though the data gets stored on the proxy when the server is unavailable, calculated items don't get recorded during that time, since the calculations happen on the server, not the proxy. With a child node setup, the child nodes would be doing the calculated items themselves.

**richlv** · 06-12-2013, 10:12

Originally posted by neominder

One major issue we've run into with proxies is that even though the data gets stored on the proxy when the server is unavailable, calculated items don't get recorded during that time, since the calculations happen on the server, not the proxy.

Doing calculated items on proxies might be an interesting feature request, but there could be a problem when a calculated item references items that are monitored by different proxies.
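The cross-proxy problem can be made concrete with a small sketch. The item keys, proxy names, and helper functions below are all made up for the example: a calculated item could only be evaluated on a proxy if every item it references is monitored by that same proxy.

```python
def proxies_needed(calculated_inputs, item_proxy):
    """Return the set of proxies that monitor the inputs of a calculated item."""
    return {item_proxy[item] for item in calculated_inputs}


def can_compute_locally(calculated_inputs, item_proxy):
    """True only if a single proxy holds every input value."""
    return len(proxies_needed(calculated_inputs, item_proxy)) == 1


# Hypothetical mapping of item keys to the proxy that collects them.
item_proxy = {
    "web1:net.in": "proxy-eu",
    "web1:net.out": "proxy-eu",
    "web2:net.in": "proxy-us",
}

same_proxy = ["web1:net.in", "web1:net.out"]
cross_proxy = ["web1:net.in", "web2:net.in"]
```

Here `can_compute_locally(same_proxy, item_proxy)` is true, while `cross_proxy` needs values from both `proxy-eu` and `proxy-us`, so no single proxy has all the data and the calculation would still have to happen somewhere central.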

Zabbix HA success stories
