DM open issues and log errors

  • xs-
    Senior Member
    Zabbix Certified Specialist
    • Dec 2007
    • 393

    #1

    DM open issues and log errors

    Hi,

    Environment:
    - 3 Zabbix hosts, running the latest 1.6 version (svn:/branches/1.6) on MySQL 5
    - 1 master node (ID 10), 2 slaves (ID 1 and 30)
    - Not using dbsync (as per advice from devs)
    - The master node has no server or items defined locally. All hosts, hostgroups, items, triggers, etc. are located on the child nodes. The master node is only used as a consolidated view.

    This list is meant as an FYI for the devs (and of course for others who are using DM) on my current findings with distributed monitoring.
    • Weird log entries
      These log entries are from the master node only. All host/item/trigger related errors are coming from node-sync?
      • I believe a similar issue was fixed in a previous version, but it still seems to be there.
        7414:20081117:192014 Query failed: [insert into hosts_profiles_ext (hostid,hostid,device_alias,device_type,device_chassis,device_os,device_os_short,device_hw_arch,device_serial,device_model,device_tag,device_vendor,device_contract,device_who,device_status,device_app_01,device_app_02,device_app_03,device_app_04,device_app_05,device_url_1,device_url_2,device_url_3,device_networks,device_notes,device_hardware,device_software,ip_subnet_mask,ip_router,ip_macaddress,oob_ip,oob_subnet_mask,oob_router,date_hw_buy,date_hw_install,date_hw_expiry,date_hw_decomm,site_street_1,site_street_2,site_street_3,site_city,site_state,site_country,site_zip,site_rack,site_notes,poc_1_name,poc_1_email,poc_1_phone_1,poc_1_phone_2,poc_1_cell,poc_1_screen,poc_1_notes,poc_2_name,poc_2_email,poc_2_phone_1,poc_2_phone_2,poc_2_cell,poc_2_screen,poc_2_notes) values(100100000010032,100100000010032,'','','','','','','','','','','','','','','','','','','','','','','','','','','','','','','','','','','','','','','','','','','','','','','','','','','','','','','','','','')] Column 'hostid' specified twice [1110]
      • I didn't expect these errors to happen on the master server. I see some on one of the slave nodes, but why on the master?
        7414:20081117:192029 Query failed: [insert into httpstepitem (httpstepitemid,httpstepid,itemid,type) values(100100000000437,100100000000013,100100000036689,1)] Duplicate entry '100100000000013-100100000036689' for key 2 [1062]
      • Deadlocks
        These happen now and then, not too often: once every couple of hours or so.
        7419:20081118:055411 Query failed: [delete from history_uint where itemid=100100000022045 and clock<1225732338] Deadlock found when trying to get lock; try restarting transaction [1213]
      • I don't know whether this is harmless or not. Lock failed?
        I got several of these sets today (I added node 2 to the master this morning).
        18458:20081119:093439 Deleted 633612 records from history and trends
        18454:20081119:093551 Timeout while answering request
        18453:20081119:093748 Timeout while answering request
        /data/zabbix/sbin/zabbix_server [18453]: Lock failed [Interrupted system call]
      • Weird: these two nodes are right next to each other (same switch, no firewall, etc.). What could cause this to happen?
        (I get lots of 'Timeout while answering request' log entries on this child node too; I suspect this is related to sending history updates to the main node.)
        18504:20081119:115521 NODE 10: Error while sending data to Node [1] error: ZBX_TCP_WRITE() failed [Broken pipe]
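
    To see how often each kind of query failure turns up, the entries above can be tallied per MySQL error number. This is only a rough sketch in Python: the line layout (pid:date:time message) and the trailing "[number]" are inferred from the entries quoted in this post, not from any documented log format, and the function name is made up for illustration.

```python
import re
from collections import Counter

# Assumed layout, inferred from the quoted entries: <pid>:<YYYYMMDD>:<HHMMSS> <message>
LINE_RE = re.compile(r"^\s*(\d+):(\d{8}):(\d{6})\s+(.*)$")
# Assumption: a failed query ends with the MySQL error number in brackets, e.g. "[1213]".
ERRNO_RE = re.compile(r"\[(\d+)\]\s*$")

# Three entries copied from the post above.
SAMPLE = [
    "7419:20081118:055411 Query failed: [delete from history_uint where "
    "itemid=100100000022045 and clock<1225732338] Deadlock found when trying "
    "to get lock; try restarting transaction [1213]",
    "7414:20081117:192029 Query failed: [insert into httpstepitem "
    "(httpstepitemid,httpstepid,itemid,type) values(100100000000437,"
    "100100000000013,100100000036689,1)] Duplicate entry "
    "'100100000000013-100100000036689' for key 2 [1062]",
    "18454:20081119:093551 Timeout while answering request",
]


def tally_query_errors(lines):
    """Count 'Query failed' entries per MySQL error number (hypothetical helper)."""
    counts = Counter()
    for line in lines:
        match = LINE_RE.match(line)
        if not match or not match.group(4).startswith("Query failed:"):
            continue  # not a query failure, or not in the assumed layout
        errno = ERRNO_RE.search(match.group(4))
        # Entries without a trailing error number are counted under None.
        counts[int(errno.group(1)) if errno else None] += 1
    return counts


if __name__ == "__main__":
    for errno, count in tally_query_errors(SAMPLE).most_common():
        print(errno, count)
```

    Running it over a whole zabbix_server log (one entry per line, not wrapped as in a terminal paste) should make it easier to tell whether the deadlocks (1213) or the duplicate entries (1062) dominate.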

    • Functional issues
      • None of the last_* fields of the items table are updated on the master node.
        It would be nice to have this (it did work in 1.4).
        (zabbix_server)
      • Can't use non-local hostgroups for trigger actions
        You can't use hostgroups from another node in trigger action conditions.
        If you force this, the trigger action becomes unselectable/uneditable.

        (web frontend)
      • Can't edit screens/maps with non-local items
        Screen configuration selects on the wrong criteria. If a screen or map has items from another node (which should be possible), you cannot select or edit this screen in the configuration menu.

        (web frontend)
      • I think nodesync does not send enough history after a network outage.
        If the communication between two nodes breaks for, say, 4 hours, not all data from the missing period is synced. So far I suspect this is the case for events, but other data objects might be affected too.
        (zabbix_server)


    Of course, I hope this list helps improve Zabbix (specifically DM).
    Last edited by xs-; 19-11-2008, 12:59.
  • welkin
    Senior Member
    • Mar 2007
    • 132

    #2
    I can confirm all of the issues above. I switched to a distributed setup in 1.6 and so far it is not usable ;(. I hope these bugs get fixed soon.

    regards
    welkin

    • welkin
      Senior Member
      • Mar 2007
      • 132

      #3
      I think I found another bug in the distributed monitoring.

      I switched off a slave node for about 2 hours. It has now been back online for about 3 hours and still not all events are properly synced with the master! In the web interface of the master I still see about 10 triggers in TRUE state which are definitely in FALSE state in the slave's web interface.
      Come on, Zabbix guys, this can't be true. I'm fed up with all the bugs in the distributed monitoring and with all the posts here not being answered. I contacted your sales team to get some detailed information on the support contracts, and now, after one week of waiting, I still have no draft support contract. My boss complains every day that the web interface is totally useless in our setup (no latest data from slaves?! screens with combined graphs of two nodes?!) and I'm running out of excuses. It's not that I expect bug-free software for free, but at least the features you present on your website should be usable.

      • Aly
        ZABBIX developer
        • May 2007
        • 1126

        #4
        I believe that most of the DM problems will be fixed in version 1.6.3.
        Zabbix | ex GUI developer

        • welkin
          Senior Member
          • Mar 2007
          • 132

          #5
          Any timeline?

          • Aly
            ZABBIX developer
            • May 2007
            • 1126

            #6
            Oops, I meant 1.6.2 (next release).
            Zabbix | ex GUI developer

            • welkin
              Senior Member
              • Mar 2007
              • 132

              #7
              When will it be ready?

              • xs-
                Senior Member
                Zabbix Certified Specialist
                • Dec 2007
                • 393

                #8
                Glad to hear this.
                Do you have a list of DM issues to be fixed (milestones?) for 1.6.2? Perhaps we can compare this to our own lists and help in the bug hunting/testing.

                • Aly
                  ZABBIX developer
                  • May 2007
                  • 1126

                  #9
                  I have committed some fixes for the GUI (actions, screens, maps). You can test them by downloading our nightly build (rev. 6376).
                  Zabbix | ex GUI developer

                  • xs-
                    Senior Member
                    Zabbix Certified Specialist
                    • Dec 2007
                    • 393

                    #10
                    So far things look good, but I did find some things to comment on:

                    - Most views (especially the actions) would benefit from prefixing hostnames/hostgroups selected from other nodes with the node name.
                    - When editing a screen and selecting a 'graph' (regular, not simple) from a node, the selection stays empty. I can view these graphs via Monitoring->Graphs, so they do work.
                    - I see the rights system fixed half of the 'current node' selection issue. It's not quite there yet, though. And please, a user should always be able to select the current node here, even if the user has no permissions for hosts on the current node. The reason for this is the 'view with subnodes' option (graphs/screens/maps with items from multiple nodes, but not the main node).

                    Good work so far.
