**nelsonab** · 29-12-2009, 23:16

What is the speed of the links between the nodes? Also, what does the system load look like on the nodes, and how loaded is the network between them?

One thing I noted earlier is that when a node is transmitting change data to another node, those nodes seem to be blocked. In this case, while the master is receiving config changes for a child node, it is blocked from receiving from the other node.

This was especially acute and painful when we were testing with 5 VM nodes on one machine. Ya, it was "painful".

Normally this isn't an issue if the machines are not overly loaded and the inter-node links are fast.

**stage** · 30-12-2009, 16:43

think it's not the hardware

This is a 1 Gbit network. Node 2 monitors over 20 machines and sends only events; node 3 monitors only itself and sends everything.

The CPU load of the master averages 30%, node 2 averages 60%, and node 3 averages 5%.

I’m positive that it’s not the hardware.

I have upgraded all the machines to Zabbix 1.6.7.

The only difference is that node 3 runs Ubuntu 9.10 (64-bit) and the others run Ubuntu 9.04 (32-bit).

What I find very strange is that it only blocks on “configuration changes” communication, and all other communication is no problem.

If I exclude node 3, communication between node 2 and the master is normal. If I put node 3 back in and it sends a “configuration changes” communication, all inter-node communication blocks, although the nodes keep monitoring the machines.

All help is appreciated,
Dominic

**stage** · 04-01-2010, 10:43

Happy new year

Does anyone have an idea how I can resolve this problem?

All machines are now on Ubuntu 9.04 (32-bit) and have Zabbix 1.6.7 installed.
But the problem still exists.

Every idea is welcome,
Dominic

**richlv** · 04-01-2010, 23:15

i'd give 99% that node sync is blocking all other node communication.
unfortunately, there is no user friendly way to see what the current status is.
how long did you have node3 up and syncing? give it some more time; it should return to a semi-normal state once the syncing finishes

**stage** · 05-01-2010, 09:30

It works

Yes, it does indeed return to a normal state.
After an enormously long time: 28 hours.

But the important thing is, it’s alive.

Thanks for your help,
Dominic

**richlv** · 05-01-2010, 09:49

now that's extreme.
are any of the databases on a virtual host?

**stage** · 05-01-2010, 10:51

Standard

No, every node has a standard installation.

Everything is on localhost, without any bells or whistles:

Ubuntu 9.04 and 9.10
Zabbix 1.6.7
MySQL database

over a flat 1 Gb network.

The first node has been running for over a year now, and so has the master node,
but I expect that makes no difference.

**data7** · 07-01-2010, 20:01

Solution and a short story...

stage, I have already had this problem, which was solved in this thread thanks to Norbert.

Simply log into your zabbix database and create an index with:

Code:

create index node_cksum_index_2 on node_cksum (nodeid, cksumtype);

Synchronizations that took over an hour (mostly the first ones) finished in 5 minutes after that.
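To illustrate why a composite index on (nodeid, cksumtype) can make such a difference, here is a minimal sketch in Python that uses an in-memory SQLite database as a stand-in for MySQL. Only the index name and its two columns come from the statement above; the simplified table layout, the sample rows, and the SELECT are assumptions for illustration and are not Zabbix's actual schema or sync query. It prints the query plan before and after creating the index.

```python
import sqlite3


def plan(conn, sql):
    """Return SQLite's query plan for `sql` as a single string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    # The last column of each plan row is the human-readable detail text.
    return " | ".join(row[-1] for row in rows)


conn = sqlite3.connect(":memory:")

# Simplified, hypothetical stand-in for the node_cksum table.
conn.execute(
    "CREATE TABLE node_cksum ("
    "nodeid INTEGER, tablename TEXT, recordid INTEGER, "
    "cksumtype INTEGER, cksum TEXT)"
)
conn.executemany(
    "INSERT INTO node_cksum VALUES (?, ?, ?, ?, ?)",
    [(n % 3 + 1, "items", n, n % 2, "x") for n in range(1000)],
)

# Assumed lookup pattern: filter on exactly the two indexed columns.
query = (
    "SELECT recordid, cksum FROM node_cksum "
    "WHERE nodeid = 3 AND cksumtype = 0"
)

before = plan(conn, query)  # no index yet: every row has to be scanned
conn.execute(
    "CREATE INDEX node_cksum_index_2 ON node_cksum (nodeid, cksumtype)"
)
after = plan(conn, query)   # the planner can now search via the new index

print("before:", before)
print("after: ", after)
```

The first plan should report a scan of the whole table and the second a search using node_cksum_index_2. On a real MySQL server, `SHOW INDEX FROM node_cksum;` lists the indexes that already exist on the table, which is worth checking before creating a new one.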

Sadly, it looks like these patches never made it into a production release.

In addition, Palmertree's great suggestions also seem to have been forgotten. I'm still stuck with all these deadlocks, and occasionally corruption occurs on nodes, forcing me to delete all content related to some node from the node_cksum table. The Housekeeper process also tends to decrease the master node's performance a lot. The downside is that I can't set its interval to more than the default 1 hour, or my database size grows abnormally in no time.

Nelsonab, in my case I have a nice master node (tweaked MySQL database, 18 GB RAM, etc.) with 8 slaves, of which 3 (the large ones) normally spend a minute during the configuration changes check. In the end it takes ~7 minutes to update any node's data.

When the Zabbix Firefox plugin starts supporting multiple GUIs, I'll definitely take the distributed monitoring system out of production, since my only intention is to have all the children's alerts on one screen in the smallest possible interval. I would rather increase the configuration check interval to an hour on every child node, so that my master becomes only a history and event centralizer.

Sorry for the long post, but I had to say it.

Edit: stage, I strongly recommend an update to 1.6.8, as 1.6.7 has a known bug (search the forums) where the housekeeper doesn't clean all history data properly.
