Zabbix Network Discovery Stops

  • syntax53
    Member
    • Mar 2018
    • 40

    #1

    I recently started dabbling in network discovery. I have a good understanding of it now, but I've noticed an issue with new devices getting picked up by existing discovery rules. More precisely, when an existing discovery rule is updated to capture additional devices, those new devices are not discovered. I had three rules that I made these changes to. All three were for /23 subnets, so not too many addresses to check. The first one worked fine. The other two would not find the hosts. I had the interval set at 15 minutes at this point. It was late, so I decided to go to bed. Before doing that, I set the interval to 1 hour ("1h").

    When I woke up, the devices still had not been detected. I started looking into debugging options, not seeing anything obvious in the server log. I found and ran "zabbix_server --runtime-control log_level_increase=discoverer" to try and see what was going on. After increasing it twice, I set the interval on one of my two remaining discovery rules to 15 minutes again ("15m"). As I watched the server log, after a few minutes I saw the discovery rule go into action and the missing devices were discovered.

    So I did nothing other than modify the interval (for a second time; 15m -> 1h -> 15m) and enable debugging, and the devices were detected. The only thing I could be doing that is possibly abnormal is that I have two checks in one rule for the same SNMP OID, but with different community strings, as my APs and switches have different communities. So there are two SNMPv2 checks for "SNMPv2-MIB::sysDescr.0" but with a different community specified for each. IP address is set as the uniqueness criterion. This works as configured, though, as both APs and switches are detected fine. Further, if I create a brand new rule for another subnet, everything is detected fine on the first sweep. The problem only appears after modifying an existing rule. I also don't know what will happen in the future when I just add new devices without touching the rule.

    Any insight?

    Code:
    24085:20180513:100144.251 __zbx_zbx_setproctitle() title:'discoverer #24 [processed 0 rules in 0.000520 sec, idle 60 sec]'
     24075:20180513:100144.251 __zbx_zbx_setproctitle() title:'discoverer #14 [processed 0 rules in 0.000470 sec, idle 60 sec]'
     24072:20180513:100144.251 __zbx_zbx_setproctitle() title:'discoverer #11 [processed 0 rules in 0.000724 sec, idle 60 sec]'
     24063:20180513:100144.251 __zbx_zbx_setproctitle() title:'discoverer #2 [processed 0 rules in 0.000787 sec, idle 60 sec]'
     24079:20180513:100144.251 __zbx_zbx_setproctitle() title:'discoverer #18 [processed 0 rules in 0.000653 sec, idle 60 sec]'
     24076:20180513:100144.251 __zbx_zbx_setproctitle() title:'discoverer #15 [processed 0 rules in 0.000584 sec, idle 60 sec]'
     24074:20180513:100144.251 get_minnextcheck(): no items to update
     24074:20180513:100144.251 __zbx_zbx_setproctitle() title:'discoverer #13 [processed 0 rules in 0.000862 sec, idle 60 sec]'
     24069:20180513:100145.243 __zbx_zbx_setproctitle() title:'discoverer #8 [processed 0 rules in 0.000392 sec, performing discovery]'
     24069:20180513:100145.243 query [txnlev:0] [select distinct r.druleid,r.iprange,r.name,c.dcheckid,r.proxy_hostid,r.delay from drules r left join dchecks c on c.druleid=r.druleid and c.uniq=1 where r.status=0 and r.nextcheck<=1526220105 and mod(r.druleid,25)=7]
     24069:20180513:100145.243 query [txnlev:0] [select count(*),min(nextcheck) from drules where status=0 and mod(druleid,25)=7]
     24069:20180513:100145.243 __zbx_zbx_setproctitle() title:'discoverer #8 [processed 0 rules in 0.000384 sec, idle 60 sec]'
     24070:20180513:100213.244 __zbx_zbx_setproctitle() title:'discoverer #9 [processed 0 rules in 0.000330 sec, performing discovery]'
     24070:20180513:100213.245 query [txnlev:0] [select distinct r.druleid,r.iprange,r.name,c.dcheckid,r.proxy_hostid,r.delay from drules r left join dchecks c on c.druleid=r.druleid and c.uniq=1 where r.status=0 and r.nextcheck<=1526220133 and mod(r.druleid,25)=8]
     24070:20180513:100213.245 query [txnlev:0] [select count(*),min(nextcheck) from drules where status=0 and mod(druleid,25)=8]
     24070:20180513:100213.245 __zbx_zbx_setproctitle() title:'discoverer #9 [processed 0 rules in 0.000288 sec, idle 60 sec]'
     24065:20180513:100226.246 __zbx_zbx_setproctitle() title:'discoverer #4 [processed 0 rules in 0.000365 sec, performing discovery]'
     24065:20180513:100226.246 query [txnlev:0] [select distinct r.druleid,r.iprange,r.name,c.dcheckid,r.proxy_hostid,r.delay from drules r left join dchecks c on c.druleid=r.druleid and c.uniq=1 where r.status=0 and r.nextcheck<=1526220146 and mod(r.druleid,25)=3]
     24065:20180513:100226.246 query [txnlev:0] [select count(*),min(nextcheck) from drules where status=0 and mod(druleid,25)=3]
     24065:20180513:100226.246 __zbx_zbx_setproctitle() title:'discoverer #4 [processed 0 rules in 0.000312 sec, idle 60 sec]'
     24067:20180513:100228.246 __zbx_zbx_setproctitle() title:'discoverer #6 [processed 0 rules in 0.000287 sec, performing discovery]'
     24067:20180513:100228.246 query [txnlev:0] [select distinct r.druleid,r.iprange,r.name,c.dcheckid,r.proxy_hostid,r.delay from drules r left join dchecks c on c.druleid=r.druleid and c.uniq=1 where r.status=0 and r.nextcheck<=1526220148 and mod(r.druleid,25)=5]
     24067:20180513:100228.247 query [txnlev:0] [select count(*),min(nextcheck) from drules where status=0 and mod(druleid,25)=5]
     24067:20180513:100228.247 __zbx_zbx_setproctitle() title:'discoverer #6 [processed 0 rules in 0.000312 sec, idle 60 sec]'
     24071:20180513:100229.246 __zbx_zbx_setproctitle() title:'discoverer #10 [processed 0 rules in 0.000546 sec, performing discovery]'
     24071:20180513:100229.247 query [txnlev:0] [select distinct r.druleid,r.iprange,r.name,c.dcheckid,r.proxy_hostid,r.delay from drules r left join dchecks c on c.druleid=r.druleid and c.uniq=1 where r.status=0 and r.nextcheck<=1526220149 and mod(r.druleid,25)=9]
     24071:20180513:100229.247 query [txnlev:0] [select count(*),min(nextcheck) from drules where status=0 and mod(druleid,25)=9]
     24071:20180513:100229.247 __zbx_zbx_setproctitle() title:'discoverer #10 [processed 0 rules in 0.000285 sec, idle 60 sec]'
     24068:20180513:100232.246 __zbx_zbx_setproctitle() title:'discoverer #7 [processed 0 rules in 0.000407 sec, performing discovery]'
     24068:20180513:100232.247 query [txnlev:0] [select distinct r.druleid,r.iprange,r.name,c.dcheckid,r.proxy_hostid,r.delay from drules r left join dchecks c on c.druleid=r.druleid and c.uniq=1 where r.status=0 and r.nextcheck<=1526220152 and mod(r.druleid,25)=6]
     24068:20180513:100232.252 In substitute_simple_macros() data:'5m'
     24068:20180513:100232.253 In process_rule() rule:'Wireless Network' range:'10.120.20.0/23'
     24068:20180513:100232.253 process_rule() range:'10.120.20.0/23'
     24068:20180513:100232.253 process_rule() ip:'10.120.20.1'
     24068:20180513:100232.253 query [txnlev:0] [select dcheckid,type,key_,snmp_community,snmpv3_securityname,snmpv3_securitylevel,snmpv3_authpassphrase,snmpv3_privpassphrase,snmpv3_authprotocol,snmpv3_privprotocol,ports,snmpv3_contextname from dchecks where druleid=6 order by dcheckid]
     24068:20180513:100232.253 In process_check()
     24068:20180513:100232.253 process_check() port:161
     24068:20180513:100232.253 In discover_service()
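
    A note for reading the log above: the `r.nextcheck<=...` values in the discoverer queries are Unix timestamps, and the query only returns rules whose `nextcheck` is at or before that moment. The sketch below is a minimal illustration of that comparison, not Zabbix code; the helper names `epoch_to_utc` and `is_due` are made up for this example.

```python
from datetime import datetime, timezone


def epoch_to_utc(ts: int) -> datetime:
    """Convert a Unix timestamp (as stored in drules.nextcheck) to an aware UTC datetime."""
    return datetime.fromtimestamp(ts, tz=timezone.utc)


def is_due(nextcheck: int, now: int) -> bool:
    """Mirror the 'r.nextcheck <= now' condition seen in the logged query."""
    return nextcheck <= now


if __name__ == "__main__":
    now = 1526220105  # threshold taken from the first logged discoverer query
    print(epoch_to_utc(now).isoformat())
    print(is_due(0, now))           # a rule with nextcheck = 0 satisfies the condition
    print(is_due(now + 3600, now))  # a rule scheduled an hour later does not
```

    The first value works out to 14:01:45 UTC on 2018-05-13, which lines up with the 10:01:45 local timestamp on the same log line.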
  • syntax53
    Member
    • Mar 2018
    • 40

    #2
    Still can't figure out a rhyme or reason for this. I was just trying to kick off a rule for an SNMP community string I fixed on one switch. I changed the interval on the rule from 12h to 60m... waited a couple of minutes... nothing. Thought maybe it hadn't been an hour since it last ran or something, so I changed it to 5m... waited... nothing. I then did some database digging...

    Code:
    mysql> select * from drules;
    +---------+--------------+----------------------+----------------------------------+-------+------------+--------+
    | druleid | proxy_hostid | name                 | iprange                          | delay | nextcheck  | status |
    +---------+--------------+----------------------+----------------------------------+-------+------------+--------+
    |       3 |         NULL | AAAAAAAAAreless      | 10.190.64.2-255, 10.190.65.0-255 | 12h   | 1526317537 |      0 |
    |      16 |         NULL | AAAAAAAAAility       | 10.190.0.2-255, 10.190.1.0-255   | 12h   |          0 |      0 |
    |      18 |         NULL | AAAAAAAAASwitches    | 10.110.0.2-99, 10.110.20.2-54    | 12h   | 1526346637 |      0 |
    |      19 |         NULL | AAAAAAAAASwitches    | 10.120.0.2-99, 10.120.20.2-54    | 12h   | 1526348257 |      0 |
    |      20 |         NULL | AAAAAAAAASwitches    | 10.130.0.2-99, 10.130.20.2-54    | 12h   | 1526348257 |      0 |
    |      21 |         NULL | AAAAAAAAAitches      | 10.140.0.2-99, 10.140.20.2-54    | 12h   | 1526348317 |      0 |
    |      22 |         NULL | AAAAAAAAAtches       | 10.160.0.2-99, 10.160.20.2-54    | 12h   | 1526348317 |      0 |
    |      23 |         NULL | AAAAAAAAAll Switches | 10.150.0.2-99, 10.150.20.2-54    | 12h   | 1526348317 |      0 |
    +---------+--------------+----------------------+----------------------------------+-------+------------+--------+
    which translates by the way to:
    Code:
    +--------------------------+
    | from_unixtime(nextcheck) |
    +--------------------------+
    | 2018-05-14 13:05:37      |
    | 1969-12-31 19:00:00      |
    | 2018-05-14 21:10:37      |
    | 2018-05-14 21:37:37      |
    | 2018-05-14 21:37:37      |
    | 2018-05-14 21:38:37      |
    | 2018-05-14 21:38:37      |
    | 2018-05-14 21:38:37      |
    +--------------------------+
    ... The one with a "nextcheck" value of 0 / 1969-12-31 is the one I wanted to run. So the interval changes are being saved, but the discoverer process(es) are not firing. I had the number of discoverers set to 25 in the config. I changed it to 10, restarted the Zabbix server, and then the rule fired off. It seems like there is some other unknown delay in having these discoveries run.
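
    For context on the 25 vs. 10 change: in the debug log from post #1, each discoverer only queries rules matching `mod(druleid,25)=k`, and the pairs visible there (discoverer #8 with remainder 7, #4 with remainder 3, #7 with remainder 6) suggest that discoverer #n takes remainder n-1. The sketch below assumes that pattern holds in general, which is an inference from the log rather than something confirmed from the Zabbix source; `discoverer_for_rule` is a hypothetical helper for illustration only.

```python
def discoverer_for_rule(druleid: int, start_discoverers: int) -> int:
    """Return the 1-based discoverer number expected to poll a rule,
    ASSUMING discoverer #n selects rules where druleid % N == n - 1
    (the pattern inferred from the logged queries)."""
    return druleid % start_discoverers + 1


if __name__ == "__main__":
    # druleid values from the drules table above
    rule_ids = [3, 16, 18, 19, 20, 21, 22, 23]
    for n in (25, 10):
        mapping = {rid: discoverer_for_rule(rid, n) for rid in rule_ids}
        print(f"StartDiscoverers={n}: {mapping}")
```

    Under this assumption, rule 16 (the one with nextcheck = 0) would belong to discoverer #17 with 25 processes and to discoverer #7 with 10. That does not explain the delay by itself, but it shows which process's log lines to watch for a given rule.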

    • Atsushi
      Senior Member
      • Aug 2013
      • 2028

      #3
      Have you changed the value of StartDiscoverers in zabbix_server.conf?

      If the range of the discovery network is large, it will take a long time.
      It may be improved if multiple processes are started in order to make processing in parallel.

      • syntax53
        Member
        • Mar 2018
        • 40

        #4
        Originally posted by Atsushi
        Have you changed the value of StartDiscoverers in zabbix_server.conf?

        If the range of the discovery network is large, it will take a long time.
        It may be improved if multiple processes are started in order to make processing in parallel.
        I indicated in my last post that I had it set to 25 and lowered it to 10. I think I may have since raised it back up to 15.
