Best way of running tests that take a long time

  • Jason
    Senior Member
    • Nov 2007
    • 430

    #1

    Best way of running tests that take a long time

    I'm looking at ways of extracting statistical information from Dell Powervault Storage units. However, to get meaningful information I really need stats over a minute or more.

    Trying to run the command directly from the agent/proxy will time out due to the 30 second limit. I could schedule the commands as a cronjob on the host, but that requires extra manual config for each host rather than just calling a discovery routine.

    Is there a better way of doing this?

    I was toying with the idea of a script that launches the command and then reports back the output, but getting error information if it fails would be trickier.

    How have others dealt with long running tasks/commands?
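
    Roughly, the "launch and report back" wrapper I'm toying with would look something like the sketch below. It's only an illustration: the item keys (powervault.stats, powervault.stats.error), the server name and the stats command are made-up placeholders, and it assumes zabbix_sender is available on the host.

    ```python
    #!/usr/bin/env python3
    """Run a slow stats command and push the output (or the error) back to Zabbix.

    Meant to be started in the background, so nothing in Zabbix waits on it.
    """
    import socket
    import subprocess

    ZABBIX_SERVER = "zabbix.example.com"          # placeholder
    HOSTNAME = socket.gethostname()               # must match the Zabbix host name
    STATS_CMD = ["collect_powervault_stats.sh"]   # hypothetical long-running command

    def send(key, value):
        """Push one value to a trapper item via zabbix_sender."""
        subprocess.run(["zabbix_sender", "-z", ZABBIX_SERVER, "-s", HOSTNAME,
                        "-k", key, "-o", str(value)], check=False)

    def main():
        try:
            out = subprocess.run(STATS_CMD, capture_output=True, text=True,
                                 timeout=300, check=True)
            send("powervault.stats", out.stdout.strip())
            send("powervault.stats.error", "")    # clear any previous error
        except (subprocess.SubprocessError, OSError) as exc:
            # Failures go to a second trapper item, so error information is visible too.
            send("powervault.stats.error", str(exc))

    if __name__ == "__main__":
        main()
    ```
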
  • jan.garaj
    Senior Member
    Zabbix Certified Specialist
    • Jan 2010
    • 506

    #2
    I'm not sure if it's the best way: http://zabbix.org/wiki/Escaping_timeouts_with_atd
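
    The gist of that page is to queue the real work with atd so the check itself returns straight away. A rough sketch only (the worker path is a placeholder for whatever script actually collects the stats and reports back with zabbix_sender):

    ```python
    #!/usr/bin/env python3
    """Queue the slow collection job with atd so the agent/external check
    returns immediately instead of hitting the 30 second limit."""
    import subprocess
    import sys

    # Hypothetical worker that does the real collection and reports via zabbix_sender.
    WORKER = "/usr/local/bin/powervault_stats_wrapper.py"

    def main():
        # "at now" hands the job to atd, completely outside the check's timeout.
        proc = subprocess.run(["at", "now"], input=WORKER + "\n",
                              text=True, capture_output=True)
        print(1 if proc.returncode == 0 else 0)   # did the job get queued?
        return 0

    if __name__ == "__main__":
        sys.exit(main())
    ```
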
    Devops Monitoring Expert advice: Dockerize/automate/monitor all the things.
    My DevOps stack: Docker / Kubernetes / Mesos / ECS / Terraform / Elasticsearch / Zabbix / Grafana / Puppet / Ansible / Vagrant

    • Linwood
      Senior Member
      • Dec 2013
      • 398

      #3
      I've tried a number of experiments and all suffer from some problems, especially if you disconnect the polling process from zabbix, such as in cron.

      One way you can approach this, though I have only experimented with it slightly, is to:

      - Create an external check for an item that essentially does nothing (or returns success/failure based on the following step). This drives the polling.

      - Inside that external check, fork a new process twice (you have to fork twice: the second fork detaches the worker from the parent, which would otherwise take the worker down with it when the external check exits).

      - In the forked process, do your work, and return the values via zabbix traps which are asynchronous.

      If you do this, only one poll at a time runs disconnected from Zabbix (e.g. if you stop the server or remove a host/item), while the actual work is still freed from Zabbix's time limits. You do need to make sure that the poll frequency on the driving item is slower than the time the work takes to run, otherwise the forked processes will build up because they do not finish as fast as they start. You could add a check in the first process, pre-fork, that looks for a specific process name or some other coordination mechanism to ensure that (and indeed let it pass back failure to the otherwise pointless item, so you can trigger on it to show you have a timing issue).
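
      As a very rough sketch of what I mean (not something I run in production; the item keys, server name and stats command below are placeholders), the external check could look like this:

      ```python
      #!/usr/bin/env python3
      """External check that double-forks a worker and returns at once.

      The second fork orphans the worker (init adopts it), so neither the external
      check nor Zabbix's timeout waits on it; results come back via trapper items.
      """
      import os
      import socket
      import subprocess
      import sys

      ZABBIX_SERVER = "zabbix.example.com"          # placeholder
      HOSTNAME = socket.gethostname()
      STATS_CMD = ["collect_powervault_stats.sh"]   # hypothetical slow command

      def send(key, value):
          subprocess.run(["zabbix_sender", "-z", ZABBIX_SERVER, "-s", HOSTNAME,
                          "-k", key, "-o", str(value)], check=False)

      def worker():
          """Runs detached from the external check; reports asynchronously."""
          try:
              out = subprocess.run(STATS_CMD, capture_output=True, text=True, check=True)
              send("powervault.stats", out.stdout.strip())
          except (subprocess.SubprocessError, OSError) as exc:
              send("powervault.stats.error", str(exc))

      def main():
          pid = os.fork()
          if pid > 0:                # original process: this is the external check
              os.waitpid(pid, 0)     # reap the intermediate child straight away
              print(1)               # value for the otherwise "pointless" driving item
              return 0
          os.setsid()                # intermediate child: new session, then fork again
          if os.fork() > 0:
              os._exit(0)            # exit so the grandchild is adopted by init
          worker()                   # grandchild: do the slow work
          os._exit(0)

      if __name__ == "__main__":
          sys.exit(main())
      ```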

      • Jason
        Senior Member
        • Nov 2007
        • 430

        #4
        Currently I've got some cron jobs running and using user parameters on an agent. Whilst this is fine for monitoring a single host, if I want to monitor more than one then I figure I need to use the proxy and external checks. It's a shame that the proxies don't support UserParameter.

        I think that if I want it to work this way I'll have to use external checks that fork to do the actual run and then return results via the trapper, making sure I pass enough information to the external script to identify the return trapper items. I'll use a lock file whilst the script is running, and the external check can look at the existence and age of this file: if the file is there and its age is less than Y, another task is running; if the file exists and its age is greater than Y, the task has crashed or hung.
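
        To sketch the lock-file part (the path and the threshold Y below are just made-up values): the external check refuses to start another run while a fresh lock exists, and flags a crashed/hung run once the lock is older than Y. The forked worker would create the lock when it starts and remove it when it finishes.

        ```python
        #!/usr/bin/env python3
        """Lock-file guard for the forked collection job.

        Prints 1 when a new run was started, 0 when one is still in progress,
        and -1 when the previous run looks crashed or hung.
        """
        import os
        import time

        LOCK_FILE = "/var/tmp/powervault_stats.lock"   # placeholder path
        MAX_AGE = 300                                   # "Y" in seconds; made-up value

        def main():
            if os.path.exists(LOCK_FILE):
                age = time.time() - os.path.getmtime(LOCK_FILE)
                if age < MAX_AGE:
                    print(0)               # another run is still in progress
                    return
                print(-1)                  # lock older than Y: previous run crashed or hung
                os.remove(LOCK_FILE)       # clear the stale lock for the next cycle
                return
            with open(LOCK_FILE, "w") as fh:
                fh.write(str(os.getpid()))
            # ... double-fork and start the actual collection here, as in post #3;
            # the worker must remove LOCK_FILE when it finishes.
            print(1)

        if __name__ == "__main__":
            main()
        ```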
