An advanced configuration question

  • iyossi
    Junior Member
    • May 2016
    • 15

    #1

    An advanced configuration question

    Hi
    We would like to start using Zabbix in our system, and I would like your recommendation on how to set it up, since ours is not a common case:

    We have one NMS application that manages a few EMS stations, and each EMS manages hundreds of devices. We would like Zabbix to monitor the devices, but all monitoring will be done by calling our own scripts, which check various aspects of the devices. (The scripts access the devices remotely over telnet/ssh, because we are not permitted to install any software on them.)
    When a user wants to monitor one aspect of the devices, they will want to run the same script against every device in a specific EMS, which means hundreds (or even thousands) of tests at the same time.
    What is the best way to do it?
    I guess that each EMS station should be an agent, but even with active agents that means many tests running at the same time.
    Thanks for any help
    Regards
    Yossi
  • Linwood
    Senior Member
    • Dec 2013
    • 398

    #2
    Just one general comment. If you have a high-cost (in terms of run time or resources) check you are doing, try to make it gather a lot of information and upload via zabbix_sender.

    So if you have to ssh into a device to get items, and there are 10 things you want to monitor, do one external check routine for one item, and inside of that check have it return the other 9 items via zabbix_sender, so you only need to log in once.
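
A minimal sketch of that pattern in Python, not a definitive implementation. `collect_metrics`, the item keys, and the command-line arguments are hypothetical; the sketch assumes the extra keys exist as trapper items on the host and that `zabbix_sender` is on the PATH. The `-z` and `-i -` options (read `<hostname> <key> <value>` lines from standard input) are standard `zabbix_sender` options.

```python
#!/usr/bin/env python3
"""External check sketch: one login, one printed value, the rest via zabbix_sender."""
import subprocess
import sys


def collect_metrics(device):
    # Hypothetical placeholder: a real script would log in to the device once
    # (ssh/telnet) and parse everything it needs from that single session.
    return {"device.uptime": 86400, "device.cpu": 12.5, "device.temp": 41}


def build_sender_input(host, metrics):
    # zabbix_sender input lines have the form: <hostname> <key> <value>
    return "".join(f"{host} {key} {value}\n" for key, value in metrics.items())


def main(device, server):
    metrics = collect_metrics(device)
    primary = metrics.pop("device.uptime")
    # Push the remaining values in a single zabbix_sender call.
    # Its own output is discarded so it does not pollute the check result.
    subprocess.run(
        ["zabbix_sender", "-z", server, "-i", "-"],
        input=build_sender_input(device, metrics),
        text=True,
        stdout=subprocess.DEVNULL,
        check=False,
    )
    # The external check's own value is whatever the script prints.
    print(primary)


# Expected invocation (assumed): script.py <device-host> <zabbix-server>
if __name__ == "__main__" and len(sys.argv) == 3:
    main(sys.argv[1], sys.argv[2])
```

Only the item behind the printed value needs to be an external check; the other keys are filled in as a side effect of the same login.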

    One issue you might hit is that there's a hard coded timeout limit of 30 seconds for checks, so you need to be able to get in and out in less than that (there is a configuration file setting for timeout, it just can't be over 30 without patching the system).
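
One way to stay inside that limit is to give the slow step its own, shorter deadline, so the script gives up cleanly instead of being cut off. The sketch below is only an illustration: `run_with_deadline` and the 5-second safety margin are assumptions, and the 30-second figure is the limit mentioned above.

```python
import subprocess
import sys

ZABBIX_TIMEOUT = 30  # upper limit for checks, as described above
SAFETY_MARGIN = 5    # assumed headroom for startup and for sending results


def run_with_deadline(cmd, deadline=ZABBIX_TIMEOUT - SAFETY_MARGIN):
    """Run cmd and return its stripped stdout, or None if it ran past the deadline."""
    try:
        done = subprocess.run(cmd, capture_output=True, text=True, timeout=deadline)
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child before raising, so nothing is left behind.
        return None
    return done.stdout.strip()


if __name__ == "__main__":
    # Stand-in for the real ssh/telnet command, which is not shown in this thread.
    print(run_with_deadline([sys.executable, "-c", "print('ok')"]))
```

A `None` result can then be turned into whatever the item should report when the device is too slow to answer.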

    I am guessing that you cannot install software on the EMS stations, but if you could you might put a zabbix proxy there, so it could do the polling locally and consolidate and feed back to a central zabbix server to provide some efficiency.

    I guess for people to be much more help, it might be worthwhile to indicate where you expect the issue -- is it bandwidth from a central site to the EMS locations, to the actual devices, speed of response from the devices (independent of bandwidth), load on the zabbix server, interference with the existing NMS, or with the EMS stations, or...


    • iyossi
      Junior Member
      • May 2016
      • 15

      #3
      Originally posted by Linwood
      Just one general comment. If you have a high-cost (in terms of run time or resources) check you are doing, try to make it gather a lot of information and upload via zabbix_sender.

      So if you have to ssh into a device to get items, and there are 10 things you want to monitor, do one external check routine for one item, and inside of that check have it return the other 9 items via zabbix_sender, so you only need to log in once.

      This sounds like a good idea, I'll try it.

      Originally posted by Linwood
      One issue you might hit is that there's a hard coded timeout limit of 30 seconds for checks, so you need to be able to get in and out in less than that (there is a configuration file setting for timeout, it just can't be over 30 without patching the system).

      I have no problem patching it, but I guess there is a good reason for such a timeout.
      But if I continue with your first idea, can't I fork from the first/main external check item, exit the check within a few seconds, and send the many results over the next few minutes?

      Regards,
      Yossi
      Last edited by iyossi; 03-07-2016, 07:37.


      • Linwood
        Senior Member
        • Dec 2013
        • 398

        #4
        Originally posted by iyossi
        I have no problem patching it, but I guess there is a good reason for such a timeout.
        My GUESS is that it keeps the limited number of poller processes from filling up with checks stuck in a wait state, but I really am not sure. I felt the same way and didn't patch it, though I still might; I find some PowerShell scripts tend to just barely complete in time, and sometimes not quite in time.
        Originally posted by iyossi
        But if I continue with your first idea, can't I fork from the first/main external check item, exit the check within a few seconds, and send the many results over the next few minutes?
        Sure. I had one like that which I forked (you need to fork twice of course to keep the parent's demise from taking out the child), and then had it loop to send values without polling. I ended up not using it as I found a better way; mine was a pain as it was going to run for hours (looping, not each test), and in particular while debugging I hated having to track down all those processes and kill them. But there's no reason in principle you can't let them run as long as you want and send with zabbix_sender. Of course, by being disconnected you can't control them from zabbix (enable/disable/restarts/config changes, etc.).
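
The double fork described above could look roughly like this in Python on a POSIX system. This is a sketch under assumptions, not the script discussed in the thread: `run_detached` and the `work` callback are hypothetical names, and redirecting the standard streams assumes the poller waits for the script's output pipe to close before it takes the result.

```python
import os


def run_detached(work):
    """Run work() in a grandchild process and return immediately in the caller."""
    pid = os.fork()
    if pid > 0:
        # Caller: reap the short-lived first child, then carry on
        # (print the check value and exit within the timeout).
        os.waitpid(pid, 0)
        return
    try:
        # First child: start a new session, detached from the caller's process group.
        os.setsid()
        if os.fork() > 0:
            os._exit(0)  # first child exits; the grandchild is re-parented
        # Grandchild: let go of the inherited stdin/stdout/stderr.
        devnull = os.open(os.devnull, os.O_RDWR)
        for fd in (0, 1, 2):
            os.dup2(devnull, fd)
        work()  # e.g. poll the devices and push values with zabbix_sender
    finally:
        # Never fall back into the caller's code from a forked process.
        os._exit(0)
```

Writing a PID file from `work()` is one way to ease the clean-up problem mentioned above, since the detached processes are otherwise hard to track down.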
