Ad Widget

Collapse

Weird discovery problem, "fping failed" with 0% loss

Collapse
X
 
  • Time
  • Show
Clear All
new posts
  • Infidel
    Junior Member
    • Jun 2014
    • 2

    #1

    Weird discovery problem, "fping failed" with 0% loss

    I'm trying to use host discovery in Zabbix but have some weird problem with fping.

    The discovery rule is configured like this:
    IP range: 192.168.7.1-254
    Delay: 7200 seconds
    Checks: ICMP ping
    Enabled: Check

    Zabbix shows no hosts in Monitoring-Discovery. Although, every 2 hours I see a series of lines in zabbix_server.log like this:

    Code:
    ....
     24380:20140626:114402.679 fping failed: "192.168.7.56 : [1], 84 bytes, 2.15 ms (1.99 avg, 0% loss)"
     24380:20140626:114406.900 fping failed: "192.168.7.57 : [1], 84 bytes, 2.04 ms (1.97 avg, 0% loss)"
     24380:20140626:114430.065 fping failed: "192.168.7.62 : [1], 84 bytes, 2.57 ms (2.61 avg, 0% loss)"
     24380:20140626:114443.992 fping failed: "192.168.7.65 : [1], 84 bytes, 2.14 ms (2.00 avg, 0% loss)"
     24380:20140626:114446.736 fping failed: "192.168.7.66 : [1], 84 bytes, 1.97 ms (1.77 avg, 0% loss)"
     24380:20140626:114450.691 fping failed: "192.168.7.67 : [1], 84 bytes, 2.24 ms (1.98 avg, 0% loss)"
    ....
    The lines mention all live hosts. I suppose, the discovery process is working as intended but something prevents it from interpreting fping output correctly. Though I have a ton of active items pinging many hosts via icmpping[] and icmppingsec[] keys working perfectly on the same server.

    Upgraded Zabbix from 2.0.5 to 2.2.4 - still the same problem.
    Upgraded fping from 3.4 to 3.10 - still no change.
    Zabbix server runs on FreeBSD 8.4/i386 in a Hyper-V virtual machine. I remember the discovery working some months ago when the system was just installed initially but have no idea what change broke it and when.

    Any ideas?
  • Infidel
    Junior Member
    • Jun 2014
    • 2

    #2
    Trying to isolate the issue further I have enabled debug logs and even dove a bit into the source code.

    Here are exerpts from debug logs related to the issue with all unrelated activity filtered out:
    Code:
     30544:20140626:121541.623 In discovery_update_host()
     30544:20140626:121541.623 End of discovery_update_host()
     30544:20140626:121541.623 query [txnlev:1] [commit;]
     30544:20140626:121541.623 process_rule() IP:'192.168.7.154'
    ....
     30544:20140626:121542.258 query [txnlev:0] [select dcheckid,type,key_,snmp_community,snmpv3_securityname,snmpv3_securitylevel,snmpv3_authpassphrase,snmpv3_privpassphrase,snmpv3_authprotocol,snmpv3_privprotocol,ports,snmpv3_contextname from dchecks where druleid=7 order by dcheckid]
     30544:20140626:121542.258 In process_check()
     30544:20140626:121542.258 process_check() port:0
     30544:20140626:121542.258 In discover_service()
     30544:20140626:121542.258 In do_ping() hosts_count:1
     30544:20140626:121542.258 In process_ping() hosts_count:1
     30544:20140626:121542.258 /tmp/zabbix_server_30544.pinger
     30544:20140626:121542.258     192.168.7.154
     30544:20140626:121542.258 /usr/local/sbin/fping -C3 2>&1 </tmp/zabbix_server_30544.pinger;/usr/local/sbin/fping6 -C3 2>&1 </tmp/zabbix_server_30544.pinger;
     30544:20140626:121542.262 read line [192.168.7.154 : [0], 84 bytes, 1.94 ms (1.94 avg, 0% loss)]
    ....
     30544:20140626:121543.712 read line [192.168.7.154 : [1], 84 bytes, 1.60 ms (1.77 avg, 0% loss)]
    ....
     30544:20140626:121544.346 End of process_ping()
     30544:20140626:121544.346 fping failed: "192.168.7.154 : [1], 84 bytes, 1.60 ms (1.77 avg, 0% loss)"
     30544:20140626:121544.346 End of do_ping():NOTSUPPORTED
     30544:20140626:121544.346 End of discover_service():FAIL
     30544:20140626:121544.346 query [txnlev:1] [begin;]
     30544:20140626:121544.346 In discovery_update_service() ip:'192.168.7.154' port:0 status:1
     30544:20140626:121544.346 In discovery_register_host() ip:'192.168.7.154' status:1 value:''
     30544:20140626:121544.346 query [txnlev:1] [select dh.dhostid,dh.status,dh.lastup,dh.lastdown from dhosts dh,dservices ds where ds.dhostid=dh.dhostid and dh.druleid=7 and ds.ip='192.168.7.154' order by dh.dhostid]
     30544:20140626:121544.347 End of discovery_register_host()
     30544:20140626:121544.347 End of discovery_update_service()
     30544:20140626:121544.347 query [txnlev:1] [commit;]
     30544:20140626:121544.347 End of process_check()
     30544:20140626:121544.347 query [txnlev:1] [begin;]
     30544:20140626:121544.347 In discovery_update_host()
     30544:20140626:121544.347 End of discovery_update_host()
     30544:20140626:121544.347 query [txnlev:1] [commit;]
     30544:20140626:121544.347 process_rule() IP:'192.168.7.155'
    Here are exerpts from the same log for icmppingsec[] item update from the same network and at the same time (no issues):
    Code:
     30539:20140626:121511.215 In get_pinger_hosts()
     30539:20140626:121511.215 In DCconfig_get_poller_items() poller_type:3
     30539:20140626:121511.215 End of DCconfig_get_poller_items():3
     30539:20140626:121511.215 In substitute_key_macros() data:'icmpping[,3,1000,2000,5000]'
     30539:20140626:121511.215 End of substitute_key_macros():SUCCEED data:'icmpping[,3,1000,2000,5000]'
     30539:20140626:121511.215 In add_icmpping_item() addr:'10.18.0.142' count:3 interval:1000 size:2000 timeout:5000
     30539:20140626:121511.215 End of add_icmpping_item()
     30539:20140626:121511.215 In substitute_key_macros() data:'icmppingsec[,3,1000,,3000,]'
     30539:20140626:121511.215 End of substitute_key_macros():SUCCEED data:'icmppingsec[,3,1000,,3000,]'
     30539:20140626:121511.215 In add_icmpping_item() addr:'10.18.0.142' count:3 interval:1000 size:0 timeout:3000
     30539:20140626:121511.215 End of add_icmpping_item()
     30539:20140626:121511.215 In substitute_key_macros() data:'icmppingsec[,3,1000,,3000,]'
     30539:20140626:121511.215 End of substitute_key_macros():SUCCEED data:'icmppingsec[,3,1000,,3000,]'
     30539:20140626:121511.215 In add_icmpping_item() addr:'192.168.7.73' count:3 interval:1000 size:0 timeout:3000
     30539:20140626:121511.215 End of add_icmpping_item()
     30539:20140626:121511.215 End of get_pinger_hosts():3
     30539:20140626:121511.215 In process_pinger_hosts()
     30539:20140626:121511.215 In add_pinger_host() addr:'192.168.7.73'
     30539:20140626:121511.215 End of add_pinger_host()
     30539:20140626:121511.215 In add_pinger_host() addr:'10.18.0.142'
     30539:20140626:121511.215 End of add_pinger_host()
     30539:20140626:121511.215 icmp pinger #3 [pinging hosts]
     30539:20140626:121511.215 In do_ping() hosts_count:2
     30539:20140626:121511.215 In process_ping() hosts_count:2
     30539:20140626:121511.215 /tmp/zabbix_server_30539.pinger
     30539:20140626:121511.215     192.168.7.73
     30539:20140626:121511.215     10.18.0.142
     30539:20140626:121511.215 /usr/local/sbin/fping -C3 -p1000 -t3000 2>&1 </tmp/zabbix_server_30539.pinger;/usr/local/sbin/fping6 -C3 -p1000 -t3000 2>&1 </tmp/zabbix_server_30539.pinger;
     30539:20140626:121511.218 read line [192.168.7.73 : [0], 84 bytes, 1.87 ms (1.87 avg, 0% loss)]
     30539:20140626:121511.250 read line [10.18.0.142  : [0], 84 bytes, 2.00 ms (2.00 avg, 0% loss)]
    ....
     30539:20140626:121512.219 read line [192.168.7.73 : [1], 84 bytes, 2.14 ms (2.00 avg, 0% loss)]
     30539:20140626:121512.250 read line [10.18.0.142  : [1], 84 bytes, 2.15 ms (2.07 avg, 0% loss)]
    ....
     30539:20140626:121513.251 read line [192.168.7.73 : [2], 84 bytes, 1.63 ms (1.88 avg, 0% loss)]
    ....
     30539:20140626:121513.666 read line [10.18.0.142  : [2], 84 bytes, 1.71 ms (1.95 avg, 0% loss)]
     30539:20140626:121513.666 read line []
     30539:20140626:121513.666 read line [192.168.7.73 : 1.87 2.14 1.63]
     30539:20140626:121513.666 read line [10.18.0.142  : 2.00 2.15 1.71]
    ....
     30539:20140626:121514.006 read line [192.168.7.73: hostname nor servname provided, or not known]
     30538:20140626:121514.070 read line [10.18.0.138: hostname nor servname provided, or not known]
     30539:20140626:121514.148 read line [10.18.0.142: hostname nor servname provided, or not known]
     30539:20140626:121514.148 End of process_ping()
     30539:20140626:121514.148 End of do_ping():SUCCEED
     30539:20140626:121514.148 In process_values()
     30539:20140626:121514.148 Host [192.168.7.73] cnt=3 rcv=3 min/max/avg=0.001630/0.002140/0.001880
     30539:20140626:121514.148 In process_value()
     30539:20140626:121514.148 End of process_value()
     30539:20140626:121514.148 Host [10.18.0.142] cnt=3 rcv=3 min/max/avg=0.001710/0.002150/0.001953
     30539:20140626:121514.148 In process_value()
     30539:20140626:121514.148 End of process_value()
     30539:20140626:121514.148 End of process_values()
    It looks like something interferes with spawned fping process and either truncates or mangles its output. That makes Zabbix unable to collect final (3rd) result of the ping, so it bails out after getting first two results only.

    I found the process_ping function in the source and looked through it but since I have only minimal C experience, I've got no idea where to dig further.

    Comment

    Working...