Master/Child Node in Large Distributed Monitoring (DM) Environments

  • Palmertree
    Senior Member
    • Sep 2005
    • 746

    #1

    Master/Child Node in Large Distributed Monitoring (DM) Environments

    Tested on: Zabbix version 1.5.3 beta and a MySQL 5.0 database ONLY

    Note: Before I get started, I would like to mention that this is my own hack, and that does not mean it will be supported by future versions of Zabbix. I am posting this in case it benefits someone else. I spent several weeks testing and tweaking this with around 893 hosts and 131,974 active items on a single child node pushing history to the master node, with very good performance. I was able to push history to the master node every 10 seconds, and configuration changes propagated within 5 minutes without the history or graphs skipping a beat. I was also able to sync latest data so that lastcheck, lastvalue, and prevvalue (Last check, Last value, and Change) show up under Latest data for a child node, since this no longer impacts the history syncs.

    If you do decide to try this hack, please remember to backup your database before you get started.

    Problems found with the current child/master node setup in large environments with over 100 hosts and 20,000 active items
    • The TCP connection on port 10051 between the master and child nodes times out while waiting for data to be returned from the child or master node.
    • History data pauses for 5 to 10 minutes on graphs while nodesync syncs configuration changes between the nodes. Eventually the history data can never catch back up because it falls farther and farther behind.
    • The function “calculate_checksums” increases database load and causes deadlocks, because it runs a “UNION ALL” that hashes all of the tables marked ZBX_SYNC in dbschema.c at once.
    • If a template was removed from a host and re-added on the master node, the item INSERTs would sometimes occur before the DELETEs, corrupting items on the child node. You can tell this is happening by the duplicate-key warnings in zabbix_server.log on the child node.
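    The checksum problem in the third bullet can be illustrated schematically. The real query is generated from the ZBX_SYNC table list in dbschema.c; the table and column names below are only illustrative, not the actual SQL Zabbix emits:

```sql
-- Schematic of the single checksum pass calculate_checksums performs:
-- one big UNION ALL that hashes every ZBX_SYNC table in one statement,
-- touching locks on all of them at once (column names illustrative).
SELECT 'hosts' AS tablename, hostid AS recordid,
       MD5(CONCAT_WS('|', host, status)) AS cksum
  FROM hosts
UNION ALL
SELECT 'items', itemid,
       MD5(CONCAT_WS('|', key_, delay))
  FROM items;
```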

    Solution
    • Increased the timeout value on the trapper port (10051/tcp) to 10800 seconds (3 hours).
    • Created separate daemons for nodesync_data (history, trends, etc.) and nodesync_config (configuration changes), so configuration changes can be made while history is being pushed.
    • Sorted the “calculate_checksums” data so that DELETEs occur before INSERTs.
    • Created the node_cksum_temp table so that checksums can be determined one table (items, hosts, etc.) at a time, decreasing the number of deadlocks on the tables and reducing load on the database.
    • Inserted records into the database on the master node with LOW_PRIORITY to reduce deadlocks.
    • Increased ZBX_ITEMS_SIZE from 10000 to support 200,000 items in the DB cache.
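    The per-table checksum pass and the LOW_PRIORITY inserts can be sketched as follows. This is a hedged illustration, not code from the patch; the column lists and values are invented for the example:

```sql
-- Hypothetical per-table checksum pass: hash one ZBX_SYNC table at a
-- time into node_cksum_temp instead of one UNION ALL over all of them.
INSERT INTO node_cksum_temp (nodeid, tablename, recordid, cksumtype, cksum)
SELECT 1, 'items', itemid, 0, MD5(CONCAT_WS('|', itemid, key_, delay))
  FROM items;

-- Hypothetical LOW_PRIORITY insert on the master node. LOW_PRIORITY
-- defers the write until no other clients are reading the table; note
-- it only takes effect on table-locking engines such as MyISAM.
INSERT LOW_PRIORITY INTO history (itemid, clock, value)
VALUES (23296, 1213480800, 0.5);
```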

    Note: Most changes have to be done on both the master and child nodes.

    Setup
    1. Stop all Zabbix services on master and child nodes.

    2. Backup your databases on both the master and child nodes.

    3. Modify /etc/zabbix/zabbix_server.conf on both the master and child nodes by changing the following entry:
    TrapperTimeout=10800

    4. Create the following table, named node_cksum_temp, in the Zabbix database on both the master and child nodes to hold temporary checksum data:

    MySQL schema for table “node_cksum_temp”

    Code:
    CREATE TABLE `node_cksum_temp` (
      `nodeid` int(11) NOT NULL default '0',
      `tablename` varchar(64) NOT NULL default '',
      `recordid` bigint(20) unsigned NOT NULL default '0',
      `cksumtype` int(11) NOT NULL default '0',
      `cksum` text NOT NULL
    ) ENGINE=InnoDB DEFAULT CHARSET=latin1;
    5. On both the master and child nodes, install the attached patch NodeSync_Patch-Version_1.5.3_20080614_2308.patch (the diff was taken between these two directories):
    zabbix-original (original source directory)
    zabbix (patched source directory)
    patch -p0 < NodeSync_Patch-Version_1.5.3_20080614_2308.patch
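    The -p0 strip level matters here: it tells patch to keep the full zabbix/... path from the diff headers, so the command must be run from the directory that contains the zabbix source tree. A toy demonstration of the same workflow (not the real Zabbix patch; the file name and values are made up):

```shell
# Toy demo of the step-5 workflow: diff an original tree against a
# patched tree, then apply the result elsewhere with patch -p0.
set -e
tmp=$(mktemp -d)
cd "$tmp"
mkdir -p zabbix-original zabbix
printf 'TrapperTimeout=300\n'   > zabbix-original/zabbix_server.conf
printf 'TrapperTimeout=10800\n' > zabbix/zabbix_server.conf
# diff exits 1 when the trees differ, hence the || true under set -e
diff -ru zabbix-original zabbix > demo.patch || true
# Simulate a fresh machine that has only the unmodified source tree
mkdir deploy
cp -r zabbix-original deploy/zabbix
cd deploy
# -p0 keeps the full "zabbix/..." path from the diff headers
patch -p0 < ../demo.patch
grep TrapperTimeout zabbix/zabbix_server.conf
```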

    6. Restart zabbix_agentd and zabbix_server on the master node.

    7. Restart zabbix_agentd and zabbix_server on the child node.

    8. Watch zabbix_server.log for NODE messages.
    Attached Files
    Last edited by Palmertree; 15-06-2008, 16:20.
  • xs-
    Senior Member
    Zabbix Certified Specialist
    • Dec 2007
    • 393

    #2
    Some questions:

    1)
    I'm running Zabbix 1.4.5 with 750+ hosts, 20k+ items and 15k+ triggers across 3 nodes (1 big, 2 small).
    I don't have any of the problems you are describing (not judging your problems or patch). Is the performance of 1.5.x that bad compared to 1.4.x? (I was under the impression 1.5.x would have huge performance improvements.) Could you elaborate on the point at which (how many nodes, hosts per node, average item update interval, etc.) you hit these problems? Just curious.

    2)
    Dividing the Zabbix server processes into multiple 'standalone' daemons (if I understand your post right) could be nice, but I kind of like the part where the entire zabbix_server process tree dies on error. Will your patch have the same behavior with these separate daemons?

    3)
    A 3-hour timeout?? Isn't that a bit too much? Have you tested this in worst-case scenarios? It could make matters worse on the master node side when child nodes start reconnecting and timing out again really quickly (open connections).

    I like the other fixes, though.


    • vinny
      Senior Member
      • Jan 2008
      • 145

      #3
      Hi palmertree,
      I'll test it with great eagerness, because I have faced all the problems you described.

      xs-'s question 2 is pertinent too, because to me this behaviour is the major drawback of using Zabbix.

      vinny
      -------
      Zabbix 1.8.3, 1200+ Hosts, 40 000+ Items...zabbix's everywhere


      • Palmertree
        Senior Member
        • Sep 2005
        • 746

        #4
        1. Not sure where the cutoff point was, but with a few hosts and items I did not see a problem. There would be times when the history would stop and then catch back up. After adding more hosts and items, the problem got worse.

        2. I just separated the existing daemon into 2. It will behave the same when the process dies.

        3. I used 3 hour timeout to account for my backups. This can be set to whatever is appropriate to your environment.


        • NOB
          Senior Member
          Zabbix Certified Specialist
          • Mar 2007
          • 469

          #5
          Originally posted by Palmertree
          1. Not sure where the cutoff point was, but with a few hosts and items I did not see a problem. There would be times when the history would stop and then catch back up. After adding more hosts and items, the problem got worse.

          2. I just separated the existing daemon into 2. It will behave the same when the process dies.

          3. I used 3 hour timeout to account for my backups. This can be set to whatever is appropriate to your environment.
          I like the distribution in two daemons:

          It follows the idea of ZABBIX (and other projects/people) very well, i.e. the separation of independent tasks wherever possible (Poller, Trapper, etc.), for obvious reasons.

          The implementation (patch) is clean and IMHO very easy to understand.

          I hope that the ZABBIX team will include it into 1.6 !

          According to the current Progress Report there is still 10% of the work to do for better distributed monitoring.
          So let's assume integrating this patch is part of it !

          The situation in Palmertrees case, AFAIR, is different from xs-.
          Palmertree uses one ZABBIX master with a large slave, while the latter
          uses one large ZABBIX master and two small ZABBIX slaves.
          Their statements in previous posts fit into the ZABBIX world very well:
          With a small number of hosts on the slaves you won't notice these problems.

          Keep up the good work, Palmertree, xs-, and, last but not least, the ZABBIX team, for creating one of the best monitoring solutions!

          Regards

          Norbert.


          • mcortinas
            Junior Member
            • Oct 2011
            • 8

            #6
            Hi,

            First of all, thank you for this post; this is very interesting data for big monitoring solutions based on Zabbix.

            I've implemented an infrastructure with 1 master and 3 child nodes, and I've just changed the TrapperTimeout parameter.

            Regards,
            Marc

