Debug Autodiscovery

  • perun.84
    Member
    • May 2016
    • 73

    #1

    Debug Autodiscovery

    How can I debug the discovery process? I've configured an SNMP autodiscovery rule for a huge subnet, and a lot of hosts were discovered. But discovery suddenly stopped, even though there are still lots of undiscovered hosts. How can I confirm that the discovery process is still working? Is there a separate log for discovery? Thanks in advance.
  • Linwood
    Senior Member
    • Dec 2013
    • 398

    #2
    Depending on the flavor of unix, you can tell if it's running by looking here:

    ps -F -A | grep -i discovery    # <<< get PIDs

    The process strings should be self-explanatory, and if you want to see which host it is on (say it is scanning a subnet) you can do:

    strace -p 12345

    where 12345 is the PID. I've found these two together to be the simplest way to tell when it is done, provided there is adequate delay between reruns, as it won't show you whether it is on run 1, 2, 3, etc.
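
    If you just want to jump straight to watching one of them, a one-liner along these lines (assuming pgrep is available and you have permission to trace the zabbix processes; 'discover' is matched against the process title) combines the two steps:

    sudo strace -p "$(pgrep -f discover | head -n 1)"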

    Debugging is harder. You can use the runtime control of zabbix_server (if you are on ... I think it was 2.4.6 or later) to increase the debug level of just the discoverer processes, and you get way more detail than you want in the server log if you set it to 4 (with 3 you get way less than you need). It's helpful to limit the number of discoveries running in that case.
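
    As a rough sketch (syntax per the manual; each invocation bumps the level by one from your configured DebugLevel, and remember the matching decrease when you are done):

    $ zabbix_server --runtime-control log_level_increase=discoverer
    $ zabbix_server --runtime-control log_level_decrease=discoverer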

    If you are doing wide subnets (say more than a class C), I find it helpful to divide them up into chunks no bigger than a class C; I even wrote a (rather safe) routine to clone a class B into 256 class C discoveries so they run more in parallel. If you have SNMP and other protocol polling (not just a ping) in there, large numbers of candidate addresses can take a long time to run. I've also found it helpful at times to discover with ping only, then clone that discovery into one rule per host found and re-discover all the other protocols just for those IPs. If you had (say) 65534 possible addresses and 150 hosts, it is MUCH faster to poll the 150 specifically for services than to poll all 65534 and let each protocol time out.
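
    As a minimal sketch of that kind of split (assuming a 10.1.0.0/16 network; the prefix is only an example), something like this prints the 256 class C ranges, each of which would go into its own discovery rule:

    for third in $(seq 0 255); do
        echo "10.1.${third}.0-255"
    done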

    • perun.84
      Member
      • May 2016
      • 73

      #3
      First of all, thanks a lot for the answer.

      I'm doing SNMP v2 discovery on a very large network. I'm planning to install Zabbix proxy servers for parts of the network. I added six /16 subnets for discovery; it's tough to divide them into /24 subnets (I don't know if it is even possible). What about the number of discoverers? How many of them should I start if the Zabbix server has 16 GB of RAM and 4 vCPUs?

      • Linwood
        Senior Member
        • Dec 2013
        • 398

        #4
        Originally posted by perun.84
        First of all, thanks a lot for the answer.

        I'm doing SNMP v2 discovery on a very large network. I'm planning to install Zabbix proxy servers for parts of the network. I added six /16 subnets for discovery; it's tough to divide them into /24 subnets (I don't know if it is even possible). What about the number of discoverers? How many of them should I start if the Zabbix server has 16 GB of RAM and 4 vCPUs?
        I have no numerical guidance, but generally I found they put very little load on the system; I ran around 150 at a time. With large numbers they can add up to a significant network load, though, if you are doing the discovery over a low-bandwidth WAN.
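
        For reference, the number of discoverer processes is set with StartDiscoverers in zabbix_server.conf and needs a server restart to take effect; the value below is only an illustration (the default is 1):

        StartDiscoverers=50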

        But I'm unclear on what you mean by a /16 not dividing into /24s.

        In the best scenario with a /16, you know what the third octet might be. Say you have 10.1.x.y as a subnet, but you actually know that x is only 1, 2, 3 or 4 and you haven't used the rest. You could then do 4 separate /24 searches, and those run in parallel. If you search using a single 1-4 range, it will use only one poller.

        So far as I know (from observation, not from looking at the code), each discovery range is executed sequentially. This means if you discover:

        10.1.0-255.0-255

        then it will test all 65536 entries one at a time, and for the vast majority it has to wait through the entire timeout period, as nothing will be there. But if you actually had random usage throughout the whole range and instead set up 256 entries:

        10.1.0.0-255
        10.1.1.0-255
        10.1.2.0-255
        etc.

        Then these run 256 in parallel (limited by the number of pollers) and will be done MUCH faster, but with the same result. The only time this doesn't work is if your discovery actions match on very specific rule names, as opposed to service names or responses; then you would need to clone all the actions as well.

        I mention this in case you have what I ran into at the last site -- they had set up very sparse /16 networks, no one could tell me much of anything about what was in use, and they wanted me to just hunt it all down. Over WANs. That's why I started decomposing /16s into /24s for the discovery scan, so I could finish before my attention span wandered.
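
        If you would rather script that decomposition than create the rules by hand, something along these lines against the Zabbix JSON-RPC API should do it; the URL, auth token and rule names are assumptions, and for brevity each rule gets only an ICMP ping check (dcheck type 12):

        URL="https://zabbix.example.com/api_jsonrpc.php"    # assumption: your frontend URL
        AUTH="<token returned by user.login>"               # assumption: a valid API session
        for third in $(seq 0 255); do
            curl -s -H 'Content-Type: application/json-rpc' "$URL" -d '{
                "jsonrpc": "2.0",
                "method": "drule.create",
                "params": {
                    "name": "ping sweep 10.1.'"$third"'.0/24",
                    "iprange": "10.1.'"$third"'.0-255",
                    "dchecks": [ { "type": "12" } ]
                },
                "auth": "'"$AUTH"'",
                "id": 1
            }'
            echo
        done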

        • perun.84
          Member
          • May 2016
          • 73

          #5
          I have set up a lot of /24 subnets now. With strace I've noticed the following: after checking 25-30 addresses, the discovery process restarts (goes back to the first address). Strace says:

          sendmsg(8, {msg_name(16)={sa_family=AF_INET, sin_port=htons(161), sin_addr=inet_addr("10.1.0.27")}, msg_iov(1)=[{"0)\2\1\1\4\6xxxxx\240\34\2\4[\204r\212\2\1\0\2\1\0000\0160\f\6"..., 43}], msg_controllen=0, msg_flags=0}, MSG_DONTWAIT|MSG_NOSIGNAL) = 43
          select(9, [8], NULL, NULL, {3, 999992}) = ? ERESTARTNOHAND (To be restarted if no handler)
          --- SIGTERM {si_signo=SIGTERM, si_code=SI_USER, si_pid=21163, si_uid=996} ---
          close(3) = 0
          exit_group(1) = ?
          +++ exited with 1 +++

          After that it goes back to the 10.1.0.1 address. :-/

          • Linwood
            Senior Member
            • Dec 2013
            • 398

            #6
            That seems odd. I have never explored failure conditions to see what it does, but I would assume that means it terminated unexpectedly. I'd suggest running one at a time, in debug mode for discovery only, and seeing if there are errors.

            Is Zabbix itself happy? Do the internal check items all look good for caches and other processes?
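
            The internal item keys I would check for that (assuming the stock internal checks) are along these lines:

            zabbix[process,discoverer,avg,busy]
            zabbix[rcache,buffer,pfree]
            zabbix[wcache,trend,pfree]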

            • perun.84
              Member
              • May 2016
              • 73

              #7
              I don't know how to put discovery into debug mode...

              • perun.84
                Member
                • May 2016
                • 73

                #8
                I found a way to debug it. I get the following message:

                5403:20160527:100008.236 Got signal [signal:15(SIGTERM),sender_pid:5128,sender_uid:996, reason:0]. Exiting ...

                And after that, the discovery process is restarted.

                • perun.84
                  Member
                  • May 2016
                  • 73

                  #9
                  I found the PID of the discovery process and captured its logs. The problem was a low value for TrendCacheSize. After I increased it, discovery seems to be OK. Now there are no more process restarts. Thanks.
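
                  For anyone hitting the same thing: that parameter lives in zabbix_server.conf and needs a server restart to take effect; the value below is just an example (the default is 4M):

                  TrendCacheSize=32M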

                  • Linwood
                    Senior Member
                    • Dec 2013
                    • 398

                    #10
                    Glad you got it.

                    For those curious, the nice feature in later Zabbix versions that allows debug control at runtime is described here:

                    Runtime loglevel changing

                    An example command:

                    $ zabbix_server --runtime-control log_level_increase=trapper

                    Here "trapper" is an example and the full list is under the internal items checks in the manual but includes for example some common ones:

                    discoverer
                    poller
                    http poller
                    icmp pinger
                    snmp trapper

                    Quote the ones with spaces in them. Obviously, remember to do a "decrease" afterwards.

                    The tough one is that LLD is included in poller, so it's really hard to debug those without a lot of noise from regular polling activity. I've even at times disabled every host but one to limit the noise, then turned them back on after debugging. It would sure be nice to have a debug filter for "host=xxxx".
