Hi,
As we are considering switching to DM (distributed monitoring) here, I was trying to get a feel for it with Zabbix 1.8. Sadly, I ran into a lot of problems:
I set up 3 boxes (VMs), z10, z20 and z30, with NodeIDs 10, 20 and 30, fixed IPs etc., ran zabbix_server -n NN on each machine with the corresponding ID, and edited the node names and (local) IPs in Administration->DM. At that point I thought I was ready to add masters/children.
I started all zabbix_server processes on the three boxes, then edited the z10 config, adding z20 and z30 as children, then edited the z20/z30 configs, adding z10 as their master.
Then nothing happened. I watched the logs: no data sent or received by any master/child. So I restarted the master and then the child servers, and eventually they started transferring data. It looked like it was stuck, but investigation showed that it just takes AGES to hook up the default configuration to a master: it sends about 4.5 MB and takes more than 1 hour to sync to the master. So in later tests I deleted all templates but one etc., which sped things up a lot.
So, with the smaller config, they seem to hook up pretty quickly after a zabbix_server restart, and I had a look at the dashboard. I can now view any child from the master, or I can select "all" via "select nodes". "System status" looks OK, but nothing from the children shows up in "Last 20 issues". Each server should have 8 failed services in the default configuration on my systems.
Then, after about 20 minutes and a bunch of reloads, child issues appear in the "Last 20 issues" list for a short time, but I have never seen issues for all three servers at the same time. The maximum is two. When one child is displayed, there are at the same time 8 error messages "Warning: Invalid argument supplied for foreach() in /var/www/include/blocks.inc.php on line 471", which seem to correspond to the 8 warning/error items of the missing child. For the child that is displayed, the "host" field is also empty; only the node is shown.
In Monitoring->Triggers I see all active triggers, but here too the hostnames are missing for the non-z10 nodes (z10 being the master). Also, all of the non-z10 triggers are shown as acknowledged, while the z10 ones are not.
Also, the node selection "randomly" drops one or both of the children.
I repeated the test with a simpler setup, one master and one child, with basically the same results: errors on the dashboard, missing hostnames for child triggers etc.
richlv gave me an index that at least speeds up the initial sync a lot:
create index node_cksum_index_2 on node_cksum (nodeid, cksumtype);
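For anyone wondering why an index like this can help, here is a minimal sketch of the general effect, not a reproduction of what Zabbix does internally. It uses SQLite from Python as a stand-in database; the table layout below is a simplified, hypothetical version of node_cksum, and the query is only an assumed example of a lookup by nodeid and cksumtype. Only the index statement itself is taken from above.

```python
import sqlite3

# Hypothetical, simplified stand-in for node_cksum. The real Zabbix 1.8
# table has a different/larger schema and lives in another database engine.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE node_cksum ("
    "nodeid INTEGER, tablename TEXT, recordid INTEGER, "
    "cksumtype INTEGER, cksum TEXT)"
)
# Fill in dummy rows spread over the three node IDs used in this thread.
conn.executemany(
    "INSERT INTO node_cksum VALUES (?, ?, ?, ?, ?)",
    [(10 + 10 * (i % 3), "items", i, i % 2, "x") for i in range(3000)],
)

# Assumed example query: filter by nodeid and cksumtype.
QUERY = "SELECT COUNT(*) FROM node_cksum WHERE nodeid = ? AND cksumtype = ?"


def plan(connection):
    """Return SQLite's query plan for QUERY as one string."""
    rows = connection.execute("EXPLAIN QUERY PLAN " + QUERY, (20, 0)).fetchall()
    # The human-readable detail is the last column of each plan row.
    return " ".join(row[-1] for row in rows)


plan_before = plan(conn)  # no index yet: the whole table is scanned
conn.execute("CREATE INDEX node_cksum_index_2 ON node_cksum (nodeid, cksumtype)")
plan_after = plan(conn)   # the planner can now use the new index

print("before:", plan_before)
print("after: ", plan_after)
```

On a real installation the equivalent check would be the database's own EXPLAIN on whatever queries the node sync actually runs; the exact plan text differs between engines and versions.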