Increase maximum Agent script timeout...

  • djmuk
    Junior Member
    • Dec 2017
    • 20

    #1

    Increase maximum Agent script timeout...

    Yes, I know this has been brought up before, but the 30-second limit on agent scripts is a pain... I am TRYING to monitor DFS Replication on Windows, so I have scripts/templates that autodiscover the replication groups and create the relevant items to monitor. It fails at the last hurdle because the script (or, more accurately, the dfsdiag Windows utility) takes 45 seconds to return the backlog item counts, so the agent times it out. Using a trapper is NOT an option, as that means being unable to use autodiscovery and having to manually create an object for every group on every server, plus corresponding Task Scheduler entries on every server. This only needs to run a few times an hour, so I am not worried about the impact on 'normal' monitoring.
    Can we have the OPTION to use bigger timeouts, even if another parameter saying 'I know what I am doing' (e.g. "EnableBigTimeouts") has to be set before it can be enabled...

    David
  • kloczek
    Senior Member
    • Jun 2006
    • 1771

    #2
    Consider one small detail: sampling data with such a long timeout makes your measurements imprecise on the time scale.
    Solution: switch from Zabbix agent / Zabbix agent (active) items to Zabbix trapper items, where you can supply the correct timestamp (the time the probe finished).
    Another solution is simply to change the maximum timeout in the source code and recompile.
    http://uk.linkedin.com/pub/tomasz-k%...zko/6/940/430/
    https://kloczek.wordpress.com/
    zapish - Zabbix API SHell binding https://github.com/kloczek/zapish
    My zabbix templates https://github.com/kloczek/zabbix-templates


    • djmuk
      Junior Member
      • Dec 2017
      • 20

      #3
      I am not worried about time precision. Given the delay in obtaining the data, it doesn't make sense to sample more often than every 5 minutes (and probably more like 15...). The primary reason for collecting the data is to alert if it exceeds a threshold, so precise change over time is not important. Given that it is, in SNMP terms, a 'Gauge' variable and not a counter, it only ever takes a sample at a point in time.

      As I said - trapper is NOT an option because this also needs to use autodiscovery as there will be multiple replication groups and folders on any server.

      Yes I could recompile from source... If I had a day to set up a compilation environment...


      • kloczek
        Senior Member
        • Jun 2006
        • 1771

        #4
        What do you mean by autodiscovery? Do you mean LLD?
        You can define the LLD iterator item using any item type, including trapper.
        If you populate a set of items through trapper items, all items generated by LLD can be trapper items as well.
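
For readers unfamiliar with what an LLD rule consumes, here is a minimal sketch of building a discovery document for replication groups and folders. The macro names `{#RGNAME}` and `{#RFNAME}` are hypothetical choices for illustration, not names used in the thread; the `"data"` wrapper is the long-standing LLD JSON layout.

```python
import json


def build_lld_payload(pairs):
    """Build a low-level discovery JSON document from (group, folder) pairs.

    The macro names are illustrative assumptions; item prototypes in the
    template would have to reference the same names.
    """
    rows = [{"{#RGNAME}": group, "{#RFNAME}": folder} for group, folder in pairs]
    # Each row becomes one set of discovered items on the server side.
    return json.dumps({"data": rows})
```

The same string can be returned by an agent item or pushed to a trapper-type discovery rule, which is the point being made above: the prototypes created from it can be of any item type.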

        • Hamardaban
          Senior Member
          Zabbix Certified Specialist, Zabbix Certified Professional
          • May 2019
          • 2713

          #5
          I agree with kloczek. Write a script that collects the data you need and sends it through zabbix_sender. Configure this script to run on the Windows server on a schedule.
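
A minimal sketch of the sending half of such a script, assuming `zabbix_sender` is on the PATH. The server address, host name and item key in the usage note are hypothetical; only the `-z`, `-s`, `-k` and `-o` options of `zabbix_sender` are relied on.

```python
import subprocess


def build_sender_command(server, host, key, value, sender="zabbix_sender"):
    """Return the argument list that sends one value to a trapper item."""
    return [sender, "-z", server, "-s", host, "-k", key, "-o", str(value)]


def send_value(server, host, key, value, sender="zabbix_sender"):
    """Run zabbix_sender once; True if it exited with status 0."""
    try:
        result = subprocess.run(
            build_sender_command(server, host, key, value, sender),
            capture_output=True,
            text=True,
        )
    except OSError:
        # The sender binary is missing or not executable.
        return False
    return result.returncode == 0
```

A scheduled task could then call something like `send_value("zabbix.example.com", "fileserver01", "dfsr.backlog[Finance,Reports]", backlog)` after the slow backlog query finishes, so the 45-second run never passes through the agent's timeout.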


          • djmuk
            Junior Member
            • Dec 2017
            • 20

            #6
            kloczek - yes but LLD WON'T create the scheduled items on the servers to send the data - so this is a manual step and makes the solution non-scalable.
            Hamardaban - the same applies, completely non-scalable, the whole point of using templates etc is to be able to monitor additional devices without manual processes.

            Anyway - it is working well enough with the 30-second timeout; it misses occasional data points but is good enough for what I need for now (until I get bothered enough to recompile)...


            • splitek
              Senior Member
              • Dec 2018
              • 101

              #7
              Originally posted by djmuk
              Using a trapper is NOT an option as that means being unable to use autodiscovery and having to manually create an object for every group on every server and corresponding task scheduler entries on every server.
              What do you mean? Do you think you can't use trapper items in discovery rules? If so, you are wrong.


              • djmuk
                Junior Member
                • Dec 2017
                • 20

                #8
                No that is NOT what I think - I KNOW that LLD cannot create the required scheduled tasks on the monitored hosts to be able to use LLD to create and monitor Trapper objects. Yes perhaps I was wrong in my description - LLD can create trapper (or any other type) objects.

                Let's get this straight - having ANY manual operation on the monitored hosts is unacceptable for my use case. So having LLD create the monitored objects as trappers only does half the job...


                • splitek
                  Senior Member
                  • Dec 2018
                  • 101

                  #9
                  (PowerShell) Write a script that runs your long-running script as a job. This new script should either start the job or connect to it to retrieve its status/result, and return the value to the agent/trapper. That way you not only get the job done but can also return "time spent on job" or "is it running now or not".
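
The suggestion is for PowerShell jobs; the same idea can be sketched in Python with a small state file shared between the slow job and the fast wrapper that the agent calls. This is only an illustration under assumptions: the state file layout and the function names are hypothetical.

```python
import json
import os
import time


def load_state(path):
    """Read the job state file; an absent or unreadable file gives {}."""
    try:
        with open(path, encoding="utf-8") as fh:
            state = json.load(fh)
    except (OSError, ValueError):
        return {}
    return state if isinstance(state, dict) else {}


def save_state(path, state):
    """Write the state atomically so a reader never sees a partial file."""
    tmp = path + ".tmp"
    with open(tmp, "w", encoding="utf-8") as fh:
        json.dump(state, fh)
    os.replace(tmp, path)


def mark_started(path, now=None):
    """Called by the slow job when it begins; keeps the last known value."""
    state = load_state(path)
    state.update(running=True, started=time.time() if now is None else now)
    save_state(path, state)


def mark_finished(path, value, now=None):
    """Called by the slow job when it ends; records value and duration."""
    now = time.time() if now is None else now
    started = load_state(path).get("started", now)
    save_state(path, {"running": False, "value": value,
                      "finished": now, "duration": now - started})


def report(path, field="value"):
    """What the agent-facing wrapper returns: value, duration or running."""
    return load_state(path).get(field)
```

The wrapper only ever reads a small file, so it returns well inside the agent timeout, while `report(path, "duration")` and `report(path, "running")` cover the "time spent on job" and "is it running now" items mentioned above.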


                  • kloczek
                    Senior Member
                    • Jun 2006
                    • 1771

                    #10
                    Originally posted by djmuk
                    kloczek - yes but LLD WON'T create the scheduled items on the servers to send the data - so this is a manual step and makes the solution non-scalable.
                    Hamardaban - the same applies, completely non-scalable, the whole point of using templates etc is to be able to monitor additional devices without manual processes.

                    Anyway - it is working well enough with the 30-second timeout; it misses occasional data points but is good enough for what I need for now (until I get bothered enough to recompile)...
                    You can schedule the data sampling externally, even with cron jobs.
                    You can use agent items to create LLD trapper items, or trapper items to create agent items. There are no limits here.

                    • djmuk
                      Junior Member
                      • Dec 2017
                      • 20

                      #11
                      Thanks for the various answers - they did point me towards a reasonable solution.
                      LLD creates two items: a trapper-type item and a 'dummy' item with a Zabbix agent user-defined key, which executes the trapper script when called. I still got occasional timeouts, but it seemed more robust. I then changed the 'dummy' item to use a system.run agent key, and that seems to be working OK.

                      So I just need to add the agent config file entries and scripts to my agent distribution.

                      Does anyone know if system.run is subject to the agent execution timeout?

                      thanks
                      David
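
One way to keep the 'dummy' item from timing out is to have it only start the collector and return at once. The sketch below assumes a separate collector script that does the slow query and the trapper send itself; the function names are hypothetical. As far as I know, `system.run` also takes a `nowait` mode as its second parameter, which has a similar effect, though that is worth confirming in the agent item documentation.

```python
import subprocess


def launch_detached(argv):
    """Start argv in the background and return its PID without waiting."""
    proc = subprocess.Popen(
        argv,
        stdin=subprocess.DEVNULL,
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return proc.pid


def trigger(collector_argv):
    """What the 'dummy' item would run: 1 if the collector was started, else 0."""
    try:
        launch_detached(collector_argv)
    except OSError:
        # The collector could not be started (missing file, no permission).
        return 0
    return 1
```

The dummy item then records 1 or 0 as a simple "collector launched" flag, and the real backlog value arrives later on the trapper item. Whether the child process survives the agent's own cleanup on a given platform is an assumption that needs testing.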


                      • max.ch.88
                        Senior Member
                        • Oct 2018
                        • 206

                        #12
                        You can read about "system.run" here: https://www.zabbix.com/documentation...s/zabbix_agent.
                        But I would suggest using an LLD rule of type "Zabbix agent (active)".
