Hi,
Environment:
- 3 zabbix hosts, running latest 1.6 version (svn:/branches/1.6) on mysql5
- 1 master node (ID 10), 2 slaves (ID 1 and 30)
- Not using dbsync (as per advice from devs)
- Master node has no servers or items defined locally. All hosts, hostgroups, items, triggers, etc. are located on the child nodes. The master node is only used as a consolidated view.
This list is meant as an FYI for the devs (and of course others who are using DM) on my current findings with distributed monitoring.
Of course I hope this list helps improve Zabbix (specifically DM).
- Weird log entries
These log entries are from the master node only. All host/item/trigger related errors are coming from node-sync?
- I believe a similar issue was fixed in a previous version, seems to still be there.
7414:20081117:192014 Query failed: [insert into hosts_profiles_ext (hostid,hostid,device_alias,device_type,device_chassis,device_os,device_os_short,device_hw_arch,device_serial,device_model,device_tag,device_vendor,device_contract,device_who,device_status,device_app_01,device_app_02,device_app_03,device_app_04,device_app_05,device_url_1,device_url_2,device_url_3,device_networks,device_notes,device_hardware,device_software,ip_subnet_mask,ip_router,ip_macaddress,oob_ip,oob_subnet_mask,oob_router,date_hw_buy,date_hw_install,date_hw_expiry,date_hw_decomm,site_street_1,site_street_2,site_street_3,site_city,site_state,site_country,site_zip,site_rack,site_notes,poc_1_name,poc_1_email,poc_1_phone_1,poc_1_phone_2,poc_1_cell,poc_1_screen,poc_1_notes,poc_2_name,poc_2_email,poc_2_phone_1,poc_2_phone_2,poc_2_cell,poc_2_screen,poc_2_notes) values(100100000010032,100100000010032,'','','','','','','','','','','','','','','','','','','','','','','','','','','','','','','','','','','','','','','','','','','','','','','','','','','','','','','','','','')] Column 'hostid' specified twice [1110]
- I didn't expect these errors to happen on the master server. I have some on one of the slave nodes, but why on the master?
7414:20081117:192029 Query failed: [insert into httpstepitem (httpstepitemid,httpstepid,itemid,type) values(100100000000437,100100000000013,100100000036689,1)] Duplicate entry '100100000000013-100100000036689' for key 2 [1062]
- Deadlocks
These happen now and then, not too often, once every couple of hours or so.
7419:20081118:055411 Query failed: [delete from history_uint where itemid=100100000022045 and clock<1225732338] Deadlock found when trying to get lock; try restarting transaction [1213]
- Don't know if this is harmless or not. Lock failed?
I got several of these sets today (I added node 2 to the master this morning).
18458:20081119:093439 Deleted 633612 records from history and trends
18454:20081119:093551 Timeout while answering request
18453:20081119:093748 Timeout while answering request
/data/zabbix/sbin/zabbix_server [18453]: Lock failed [Interrupted system call]
- Weird, these two nodes are next to each other (same switch, no firewall, etc.). What could cause this to happen?
(I get lots of 'Timeout while answering request' log entries on this child node too; I suspect this is related to sending history updates to the main node.)
18504:20081119:115521 NODE 10: Error while sending data to Node [1] error: ZBX_TCP_WRITE() failed [Broken pipe]
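As a reading aid for the IDs in the log entries above, here is a minimal sketch for splitting them into their parts. It assumes the 1.x distributed-monitoring ID layout is node ID * 10^14 + source node ID * 10^11 + local ID for configuration rows; that layout is an assumption here and is not confirmed anywhere in this post. The function name `split_dm_id` is made up for illustration.

```python
# Assumed ID layout (not confirmed in this post):
#   id = node_id * 10**14 + source_node_id * 10**11 + local_id
NODE_FACTOR = 10**14
SOURCE_FACTOR = 10**11


def split_dm_id(value):
    """Split an ID into (node_id, source_node_id, local_id) under the assumed layout."""
    node_id, rest = divmod(value, NODE_FACTOR)
    source_node_id, local_id = divmod(rest, SOURCE_FACTOR)
    return node_id, source_node_id, local_id


if __name__ == "__main__":
    # hostid, itemid and itemid taken from the three failed queries above
    for raw in (100100000010032, 100100000036689, 100100000022045):
        print(raw, split_dm_id(raw))
```

If the assumed layout holds, all three IDs decode to node 1, which would fit the guess that these rows reach the master through node-sync rather than being created locally.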
- Functional issues
- The last_* fields of the items table are not updated on the master node.
Would be nice to have this (it did work in 1.4).
(zabbix_server)
- Can't use non-local hostgroups for trigger actions
You can't use hostgroups from another node in trigger action conditions.
If you force this, the trigger action becomes unselectable/uneditable.
(web frontend)
- Can't edit screens/maps with non-local items
Screen configuration selects on the wrong criteria. If a screen or map has items from another node (which should be possible), you cannot select or edit this screen in the configuration menu.
(web frontend)
- I think nodesync does not send enough history after a network outage.
If the communication between 2 nodes breaks for, say, 4 hours, not all data from the missing period is synced. So far I suspect this is the case for events, but there might be more data objects.
(zabbix_server)