Need help... proxy is not returning results

  • r3dn3ck
    Member
    • Jul 2008
    • 43

    #1

    Here's the deal: I have ~2500 hosts in 3 data centers. DC2 and DC3 have zabbix-proxy servers in them; each host got a fresh install of zabbix-1.6.1 from source, and each uses a local MySQL instance to store local configuration and host data. DC1 houses the master and has a zabbix server and a separate DB server.

    I had it all working as a proof of concept last week, but with this many hosts I needed host add/change/delete to be automated, so we have some scripts that import XML configuration data for the monitored hosts into the master. I wanted to test from an empty DB to find out whether that would work (the load works great, BTW).

    The master does not directly monitor any hosts except the zabbix server/proxy hosts (no other agent(d).conf file points to the master); all monitored hosts report their results to a zabbix-proxy server.

    The crux of the problem is that data from only a few of my hosts makes it as far as the master, and even that arrives very irregularly.

    I have confirmed that telnet to port 10051 works from any of the 3 server/proxy hosts to any of the other 2, and I've confirmed that the DB, host, passwd, etc. are correct. There has to be something I've messed up, but I can't find it.

    I know this is something stupid that I forgot, but I can't seem to figure it out. The instructions for the proxy are a little thin as well, so those weren't much help.
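
The manual telnet check to port 10051 described above could be scripted so it is easy to repeat from each of the three server/proxy hosts. This is only a rough sketch: the function names `port_open` and `check_all` are made up for illustration, and a successful TCP connect only shows that the port is reachable, not that the proxy and the master are actually exchanging data.

```python
import socket


def port_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        # create_connection resolves the name and performs the TCP handshake
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # refused, timed out, unreachable, or name resolution failed
        return False


def check_all(targets, port=10051):
    """Map each target host to whether its port accepted a connection."""
    return {host: port_open(host, port) for host in targets}
```

Run from each box in turn, something like `check_all(["10.92.4.53", "je10-8", "ki5-1"])` (the master address and proxy hostnames from the configs in this thread) would give the same answer as the telnet test, just in one go.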
  • r3dn3ck
    Member
    • Jul 2008
    • 43

    #2
    Here is the zabbix_server.conf for the master

    NodeID=1
    StartPollers=15
    StartPollersUnreachable=1
    StartTrappers=20
    StartPingers=5
    StartHTTPPollers=5
    HousekeepingFrequency=6
    SenderFrequency=30
    DebugLevel=4
    Timeout=20
    TrapperTimeout=20
    UnreachablePeriod=120
    UnavailableDelay=60
    UnavailableDelay=120
    PidFile=/var/run/zabbix/zabbix.pid
    LogFile=/var/log/zabbix/zabbix_server.log
    LogFileSize=100
    AlertScriptsPath=/var/lib/zabbix
    DBHost=qi212
    DBName=zabbix
    DBUser=****<redacted>
    DBPassword=****<redacted>


    Here are the zabbix_proxy.confs for the proxy hosts:

    NodeID=2
    Server=10.92.4.53
    ServerPort=10051
    Hostname=je10-8
    StartPollers=20
    StartPollersUnreachable=2
    StartTrappers=20
    StartPingers=5
    StartHTTPPollers=2
    SourceIP=10.102.10.8
    ListenIP=10.102.10.8
    HeartbeatFrequency=300
    ConfigFrequency=120
    HousekeepingFrequency=6
    ProxyLocalBuffer=24
    ProxyOfflineBuffer=24
    DebugLevel=4
    Timeout=20
    TrapperTimeout=20
    UnreachablePeriod=120
    UnavailableDelay=60
    UnavailableDelay=120
    PidFile=/var/run/zabbix/zabbix-proxy.pid
    LogFile=/var/log/zabbix/zabbix_proxy.log
    LogFileSize=100
    AlertScriptsPath=/home/zabbix/bin/
    ExternalScripts=/admin/ops/zabbix.ExternalScripts
    DBHost=localhost
    DBName=zabbix
    DBUser=**** <redacted>
    DBPassword=**** <redacted>
    DBSocket=/var/lib/mysql/mysql.sock

    ########################
    NodeID=3
    Server=10.92.4.53
    ServerPort=10051
    Hostname=ki5-1
    StartPollers=15
    StartPollersUnreachable=2
    StartTrappers=20
    StartPingers=5
    StartHTTPPollers=2
    HeartbeatFrequency=300
    ConfigFrequency=120
    HousekeepingFrequency=1
    ProxyLocalBuffer=24
    ProxyOfflineBuffer=24
    DebugLevel=4
    Timeout=20
    TrapperTimeout=20
    UnreachablePeriod=120
    UnavailableDelay=60
    UnavailableDelay=120
    PidFile=/var/run/zabbix/zabbix-proxy.pid
    LogFile=/var/log/zabbix/zabbix_proxy.log
    LogFileSize=100
    AlertScriptsPath=/home/zabbix/bin/
    ExternalScripts=/admin/ops/zabbix.ExternalScripts
    FpingLocation=/usr/local/mysql/bin/fping
    DBHost=localhost
    DBName=zabbix
    DBUser=**** <redacted>
    DBPassword=**** <redacted>
    DBSocket=/var/lib/mysql/mysql.sock
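
One thing that stands out in all three configs above is that `UnavailableDelay` is set twice (60, then 120). Whether that has anything to do with the missing data is not established here, and which of the two values the daemon ends up using is not something this thread answers; one of the lines may have been meant as a different parameter, but that is a guess. A small sketch for spotting repeated keys in a config file, assuming the simple `Key=Value` layout shown above (the function name `find_duplicate_keys` is hypothetical):

```python
from collections import defaultdict


def find_duplicate_keys(conf_text):
    """Return {key: [values...]} for every key assigned more than once."""
    seen = defaultdict(list)
    for raw in conf_text.splitlines():
        line = raw.strip()
        # skip blanks, comment/separator lines, and anything without '='
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        seen[key.strip()].append(value.strip())
    return {key: values for key, values in seen.items() if len(values) > 1}


# Excerpt copied from the proxy config posted above
excerpt = """
UnreachablePeriod=120
UnavailableDelay=60
UnavailableDelay=120
PidFile=/var/run/zabbix/zabbix-proxy.pid
"""

if __name__ == "__main__":
    for key, values in find_duplicate_keys(excerpt).items():
        print(key, "is set", len(values), "times:", ", ".join(values))
```

Feeding it the full text of each .conf file would flag any other repeated settings the same way.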


    • r3dn3ck
      Member
      • Jul 2008
      • 43

      #3
      Here's a sample of the crud my proxy logs are spitting out:

      31180:20081113:084850 In substitute_simple_macros (data:"net.if.out[eth0,bytes]")
      31180:20081113:084850 End substitute_simple_macros (result:net.if.out[eth0,bytes])
      31180:20081113:084850 In int_in_list(list:,value:100100000010484)
      31180:20081113:084850 End int_in_list(ret:FAIL)
      31180:20081113:084850 In substitute_simple_macros (data:"vfs.fs.size[/,pfree]")
      31180:20081113:084850 End substitute_simple_macros (result:vfs.fs.size[/,pfree])
      31180:20081113:084850 In int_in_list(list:,value:100100000010488)
      31180:20081113:084850 End int_in_list(ret:FAIL)
      31180:20081113:084850 In substitute_simple_macros (data:"agent.ping")
      31180:20081113:084850 End substitute_simple_macros (result:agent.ping)
      31180:20081113:084850 In int_in_list(list:,value:100100000010491)
      31180:20081113:084850 End int_in_list(ret:FAIL)
      31180:20081113:084850 In substitute_simple_macros (data:"net.if.out[eth1,bytes]")
      31180:20081113:084850 End substitute_simple_macros (result:net.if.out[eth1,bytes])
      31180:20081113:084850 In int_in_list(list:,value:100100000010499)
      31180:20081113:084850 End int_in_list(ret:FAIL)
      31180:20081113:084850 In substitute_simple_macros (data:"vfs.fs.size[/,pfree]")
      31180:20081113:084850 End substitute_simple_macros (result:vfs.fs.size[/,pfree])
      31180:20081113:084850 In int_in_list(list:,value:100100000010503)
      31180:20081113:084850 End int_in_list(ret:FAIL)
      31180:20081113:084850 In substitute_simple_macros (data:"agent.ping")
      31180:20081113:084850 End substitute_simple_macros (result:agent.ping)
      31180:20081113:084850 In int_in_list(list:,value:100100000010506)
      31180:20081113:084850 End int_in_list(ret:FAIL)
      31180:20081113:084850 In substitute_simple_macros (data:"net.if.out[eth1,bytes]")
      31180:20081113:084850 End substitute_simple_macros (result:net.if.out[eth1,bytes])
      31180:20081113:084850 In int_in_list(list:,value:100100000010514)
      31180:20081113:084850 End int_in_list(ret:FAIL)
      31180:20081113:084850 In substitute_simple_macros (data:"vfs.fs.size[/,pfree]")
      31180:20081113:084850 End substitute_simple_macros (result:vfs.fs.size[/,pfree])
      31180:20081113:084850 In int_in_list(list:,value:100100000010518)
      31180:20081113:084850 End int_in_list(ret:FAIL)
      31180:20081113:084850 In substitute_simple_macros (data:"agent.ping")
      31180:20081113:084850 End substitute_simple_macros (result:agent.ping)
      31180:20081113:084850 In int_in_list(list:,value:100100000010521)
      31180:20081113:084850 End int_in_list(ret:FAIL)
      31180:20081113:084850 In substitute_simple_macros (data:"net.if.out[eth1,bytes]")
      31180:20081113:084850 End substitute_simple_macros (result:net.if.out[eth1,bytes])
      31180:20081113:084850 In int_in_list(list:,value:100100000010529)
      31180:20081113:084850 End int_in_list(ret:FAIL)
      31180:20081113:084850 In substitute_simple_macros (data:"vfs.fs.size[/,pfree]")
      31180:20081113:084850 End substitute_simple_macros (result:vfs.fs.size[/,pfree])
      31180:20081113:084850 In int_in_list(list:,value:100100000010533)
      31180:20081113:084850 End int_in_list(ret:FAIL)
      31180:20081113:084850 In substitute_simple_macros (data:"agent.ping")
      31180:20081113:084850 End substitute_simple_macros (result:agent.ping)
      31180:20081113:084850 In int_in_list(list:,value:100100000010536)
      31180:20081113:084850 End int_in_list(ret:FAIL)
      31180:20081113:084850 In substitute_simple_macros (data:"net.if.out[lo,bytes]")
      31180:20081113:084850 End substitute_simple_macros (result:net.if.out[lo,bytes])
      31180:20081113:084850 In int_in_list(list:,value:100100000010544)
      31180:20081113:084850 End int_in_list(ret:FAIL)
      31180:20081113:084850 In substitute_simple_macros (data:"system.cpu.load[,avg1]")
      31180:20081113:084850 End substitute_simple_macros (result:system.cpu.load[,avg1])
      31180:20081113:084850 In int_in_list(list:,value:100100000010553)
      31180:20081113:084850 End int_in_list(ret:FAIL)


      Notice all the (ret:FAIL) lines; those appear to be returning failure to the master. The thing is, all of these checks are known to have worked before: none of the probes that shouldn't fail ever failed until now.
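
To get a feel for how widespread the (ret:FAIL) lines are, the DebugLevel=4 trace could be tallied by function and return value instead of being read by eye. This is a sketch only: the regular expression is inferred from the sample lines above (pid:date:time, then In/End, then the function name and its arguments), and the name `tally_results` is hypothetical. It counts lines; it does not tell you what a FAIL from `int_in_list` actually means, and note that every call in the sample is made with an empty `list:` argument.

```python
import re
from collections import Counter

# pid:YYYYMMDD:HHMMSS  In|End  function_name (arguments)
LINE_RE = re.compile(
    r"^\s*(\d+):(\d{8}):(\d{6})\s+(In|End)\s+(\w+)\s*\((.*)\)\s*$"
)


def tally_results(lines):
    """Count 'End <function>(ret:<value>)' trace lines per (function, value)."""
    counts = Counter()
    for line in lines:
        match = LINE_RE.match(line)
        if not match:
            continue  # not a function-trace line
        phase, func, args = match.group(4), match.group(5), match.group(6)
        if phase == "End" and args.startswith("ret:"):
            counts[(func, args[len("ret:"):])] += 1
    return counts


# A few lines copied from the log sample above
sample = [
    '31180:20081113:084850 In substitute_simple_macros (data:"agent.ping")',
    "31180:20081113:084850 End substitute_simple_macros (result:agent.ping)",
    "31180:20081113:084850 In int_in_list(list:,value:100100000010491)",
    "31180:20081113:084850 End int_in_list(ret:FAIL)",
    "31180:20081113:084850 In int_in_list(list:,value:100100000010499)",
    "31180:20081113:084850 End int_in_list(ret:FAIL)",
]

if __name__ == "__main__":
    for (func, ret), n in tally_results(sample).items():
        print(func, ret, n)
```

Pointing it at the whole log (for example `tally_results(open("/var/log/zabbix/zabbix_proxy.log"))`) would show whether any return value other than FAIL ever shows up for `int_in_list`.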


      • aivarss
        Junior Member
        • Jan 2009
        • 17

        #4
        I have a similar problem (End int_in_list(ret:FAIL)), but with nodes. The slave node does not send/sync integer and float data to the master node, while items of type Text do synchronize with the master node. Zabbix version 1.6.5.
        Can someone help?
