DM Distributed Monitoring at least strange in 1.8

  • svenw
    Junior Member
    • May 2008
    • 26

    #1

    Hi,

    As we are considering switching to DM here, I was trying to get a feel for it with Zabbix 1.8. Sadly, I ran into a lot of problems:

    I set up 3 boxes (VMs), z10, z20 and z30, with NodeIDs 10, 20 and 30, fixed IPs etc., and ran zabbix_server -n NN on each machine with the corresponding ID, then edited the node names and (local) IPs in Administration->DM. At that point I thought I was ready to add masters/children.

    I started all zabbix_server processes on the three boxes, then edited the z10 config, adding z20 and z30 as children, then edited the z20/z30 configs, adding z10 as the master.

    Then nothing happened. I watched the logs: no data sent or received by any master/child. So I restarted the master and then the child servers, and eventually data started transferring. It looked like it had got stuck, but investigation showed that it just takes AGES to hook up the default configuration to a master: it sends about 4.5MB and takes more than 1 hour to sync to the master. So in later tests I deleted all templates but one etc., which sped things up a lot.

    So, with the smaller config, they seem to hook up pretty quickly after a zabbix_server restart, so I had a look at the dashboard. I can view any child from the master now, or I can select "all" via "select nodes". "System status" looks OK, but nothing from the children shows up in "last 20 issues". Each server should have 8 failed services in the default configuration on my systems.

    Then, after about 20 minutes and a bunch of reloads, child issues appear in the "last 20 issues" list for a short time, but I have never seen issues for all three servers at the same time. The maximum is two. When only one child is displayed, there are at the same time 8 error messages "Warning: Invalid argument supplied for foreach() in /var/www/include/blocks.inc.php on line 471", which seem to correspond to the 8 warning/error items of the missing child. For the child being displayed, the "host" field is also empty; only the node is shown.

    In Monitoring->Triggers I see all active triggers, but for the non-z10 nodes the hostnames are missing there as well. Also, all triggers from nodes other than z10 (the master) are acknowledged, while the z10 ones are not.

    Also, the node selection "randomly" drops one or both of the children.

    I repeated the test with a simpler setup, one master and one child, with basically the same results: errors on the dashboard, missing hostnames for child triggers etc.

    richlv gave me an index that at least speeds up the initial sync a lot:
    create index node_cksum_index_2 on node_cksum (nodeid, cksumtype);
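
For anyone wanting to see why an index on those two columns can help, here is a minimal sketch in Python. It rests on several assumptions: it uses an in-memory SQLite database as a stand-in (not the backend Zabbix actually runs on), a simplified node_cksum table whose column names are taken from the insert statement quoted in post #3 with guessed column types, and a hypothetical lookup query. The real queries that zabbix_server issues during the sync are not shown in this thread.

```python
import sqlite3

# Stand-in database: SQLite in memory, NOT the real Zabbix backend.
conn = sqlite3.connect(":memory:")

# Simplified node_cksum table; the column types are assumptions.
conn.execute(
    "CREATE TABLE node_cksum ("
    " nodeid INTEGER, tablename TEXT, recordid INTEGER,"
    " cksumtype INTEGER, cksum TEXT)"
)

# The index suggested by richlv.
conn.execute(
    "CREATE INDEX node_cksum_index_2 ON node_cksum (nodeid, cksumtype)"
)

# Hypothetical lookup that filters on both indexed columns.
query = (
    "SELECT recordid, cksum FROM node_cksum "
    "WHERE nodeid = ? AND cksumtype = ?"
)

# Ask SQLite how it would run the query; the last column of each
# plan row is a human-readable description of that step.
plan = [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + query, (20, 1))]
uses_index = any("node_cksum_index_2" in step for step in plan)
print(plan, uses_index)
```

If the planner reports the index by name, lookups by (nodeid, cksumtype) no longer need a full table scan. Whether the same holds on a production MySQL or PostgreSQL instance should be checked there with that database's own EXPLAIN.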
  • exfish
    Junior Member
    • Jun 2009
    • 18

    #2
    I have the same issue. Will it be fixed in the next version?

    • machtech
      Junior Member
      • Jan 2010
      • 11

      #3
      Very similar issue for us with 1.8 and PostgreSQL.

      Master is receiving data, per server log output:
      Code:
        1536:20100126:164019.529 NODE 1: Received history from node 2 for node 2 datalen 617
        1533:20100126:164019.739 NODE 1: Received history_uint from node 2 for node 2 datalen 529
        1536:20100126:164029.431 NODE 1: Received history from node 2 for node 2 datalen 688
        1533:20100126:164029.651 NODE 1: Received history_uint from node 2 for node 2 datalen 357
      But the child node hits a DB error when sending:
      Code:
       14249:20100126:164019.363 NODE 2: Sending history_sync of node 2 to node 1 datalen 617
       14249:20100126:164019.571 NODE 2: Sending history_uint_sync of node 2 to node 1 datalen 529
       14249:20100126:164029.068 [Z3005] Query failed: [0] PGRES_FATAL_ERROR:ERROR:  function md5(numeric) does not exist
      LINE 1: ...||status||','||md5(macros)||','||md5(agent)||','||md5(time)|...
                                                                   ^
      HINT:  No function matches the given name and argument types. You might need to add explicit type casts.
       [insert into node_cksum (nodeid,tablename,recordid,cksumtype,cksum) select 2,'httptest',httptestid,1,md5(name)||','||applicationid||','||lastcheck||','||nextcheck||','||curstate||','||curstep||','||lastfailedstep||','||delay||','||status||','||md5(macros)||','||md5(agent)||','||md5(time)||','||md5(error)||','||authentication||','||md5(http_user)||','||md5(http_password) from httptest where 1=1 and httptestid between 200000000000000 and 299999999999999]
       14249:20100126:164029.268 NODE 2: Sending history_sync of node 2 to node 1 datalen 688
       14249:20100126:164029.479 NODE 2: Sending history_uint_sync of node 2 to node 1 datalen 357

      • machtech
        Junior Member
        • Jan 2010
        • 11

        #4
        Hi all

        We have worked around this PostgreSQL issue via the following:

        Code:
        root@z2:~# su - zabbix
        postgres@z2:~$ psql
        zabbix=# CREATE FUNCTION md5(integer) RETURNS text AS 'SELECT md5(($1)::text);' LANGUAGE SQL IMMUTABLE RETURNS NULL ON NULL INPUT;
        CREATE FUNCTION
        
        -- A quick test to show it’s working:
        
        zabbix=# SELECT md5(10);
                       md5
        ----------------------------------
         d3d9446802a44259755d38e6d163e820
        (1 row)
        zabbix=# \q
        postgres@z2:~$
        However, our master (z1) is still not displaying the data from our child node (z2). I will create a separate new post about that, as I no longer believe it is related to this issue.

        Hopefully the above workaround will assist others (until the source code is tweaked to use the explicit type casts that PostgreSQL 8.3 requires).
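
The test result from the psql session can be cross-checked outside the database. A minimal sketch in Python, assuming the wrapper does exactly what its body says (cast the number to text, then hash that text): hashing the string "10" should give the digest that psql printed for md5(10). The helper name md5_of_number is hypothetical.

```python
import hashlib


def md5_of_number(value: int) -> str:
    """Mimic md5(($1)::text): hash the decimal text form of the number."""
    return hashlib.md5(str(value).encode("ascii")).hexdigest()


digest = md5_of_number(10)
print(digest)
```

Note that the log in post #3 complains about md5(numeric), while the function above is declared for integer. The thread does not say whether a second overload taking numeric was also needed.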

        Cheers

        Paul

        • machtech
          Junior Member
          • Jan 2010
          • 11

          #5
          FYI - Spin off new post per above is here, cheers. Paul
