LLD Discovery with external script cannot test

  • Linwood
    Senior Member
    • Dec 2013
    • 398

    #1

    Version 6.0.5 on ubuntu 20.04.

    I have an external script that returns a discovery of disk partitions on Windows. It is an external script because it does a few other things as well. I decided I wanted to filter the name it returns, so I tried adding a regular expression preprocessing step.

    The preprocessing fails, so I wanted to do a test run. The test run (get value from host) times out. It does not time out when run normally. I raised the debug level, and it looks like the server tries to run the script and times out.

    Here is some debug-level output from running the test from the LLD rule in the GUI.

    Code:
    1437726:20220604:174005.160 End of substitute_key_macros_impl():SUCCEED data:'snmpDiskDiscovery.pl["192.168.130.14","LEF","xxx"]'
    1437726:20220604:174005.160 In get_value() key:'snmpDiskDiscovery.pl["{HOST.CONN}","{HOST.HOST}","{$SNMP_COMMUNITY} "]'
    1437726:20220604:174005.160 In get_value_external() key:'snmpDiskDiscovery.pl["192.168.130.14","LEF","xxx"]'
    1437726:20220604:174005.160 In zbx_popen() command:'/usr/local/share/zabbix/externalscripts/snmpDiskDiscovery.pl '192.168.130.14' 'LEF' 'xxx''
    1437856:20220604:174032.862 Failed to execute command "/usr/local/share/zabbix/externalscripts/snmpDiskDiscovery.pl '192.168.130.14' 'LEF' 'xxx'": Timeout while executing a shell script.
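To put a number on the gap between the `zbx_popen()` entry and the failure entry, the timestamps can be subtracted. The sketch below assumes the `PID:YYYYMMDD:HHMMSS.mmm` prefix seen in the lines above; the helper name `log_timestamp` is made up for illustration, and the two strings are just the leading parts of the log lines.

```python
from datetime import datetime


def log_timestamp(line: str) -> datetime:
    """Parse the 'PID:YYYYMMDD:HHMMSS.mmm' prefix of a server log line."""
    _pid, date_part, rest = line.split(":", 2)
    time_part = rest.split(" ", 1)[0]
    return datetime.strptime(f"{date_part} {time_part}", "%Y%m%d %H%M%S.%f")


# Leading parts of the two log lines quoted above.
started = "1437726:20220604:174005.160 In zbx_popen()"
failed = "1437856:20220604:174032.862 Failed to execute command"

elapsed = (log_timestamp(failed) - log_timestamp(started)).total_seconds()
print(f"{elapsed:.3f} s")  # prints "27.702 s"
```

So roughly 27.7 seconds pass between the two entries, and the two lines carry different process IDs (1437726 and 1437856).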
    Here is a test run using the zabbix server account cut and pasted from the above:

    Code:
    root@nms:/tmp# su - zabbix
    su: warning: cannot change directory to /home/zabbix: No such file or directory
    $ /usr/local/share/zabbix/externalscripts/snmpDiskDiscovery.pl '192.168.130.14' 'LEF' 'xxx'
    [
    {"{#SNMPINDEX}":"1", "{#SNMPVALUE}":"C:\\ Label: Serial Number a8db00b"},
    {"{#SNMPINDEX}":"3", "{#SNMPVALUE}":"T:\\ Label:VDisk Serial Number 22f86f4"}
    ]
    $
    As you can see it runs fine, and fast (under a second, and timeout is set to 30 seconds).

    I'm drawing a blank as to what the context could be in the GUI call that is different and causing it to time out. Again, while my preprocessing step is not working, the script itself runs fine called from the server in the normal course of polling.

    What's different about the test mode?

    I'm also curious whether there's something special you have to do to use preprocessing on external checks. Is it running the regular expression only on the {#SNMPVALUE} replacement, on the whole line, or on something else? I haven't found an example from anyone using an external check.

    Thanks,
    Linwood

    PS. I can't test it even if there are no preprocessing steps.
  • josoko
    Junior Member
    • Feb 2021
    • 25

    #2
    Regarding the entry:

    Code:
    1437856:20220604:174032.862 Failed to execute command "/usr/local/share/zabbix/externalscripts/snmpDiskDiscovery.pl '192.168.130.14' 'LEF' 'xxx'": Timeout while executing a shell script.
    Perhaps a permission issue while Zabbix is trying to execute the script?

    • Linwood
      Senior Member
      • Dec 2013
      • 398

      #3
      Originally posted by josoko
      Regarding the entry:
      Perhaps a permission issue while Zabbix is trying to execute the script?

      Only if the GUI is doing the test in a context other than the zabbix server. Does it? I tested (in the OP) with the zabbix server account which works.

      The debug is from the zabbix server log, so I assume it has to be running in the zabbix server account context.

      • Markku
        Senior Member
        Zabbix Certified Specialist, Zabbix Certified Professional, Zabbix Certified Expert
        • Sep 2018
        • 1781

        #4
        The explanation that comes to mind is that your script behaves differently when it is running non-interactively.

        Add some debugging output to the script and run it via cron (in a way that the output gets emailed to you, or save the debugging output to a logfile, using the logger command or otherwise) to find the problematic feature.

        Markku
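A minimal sketch of that kind of debugging output, written in Python for illustration even though the script in question is Perl. It appends a snapshot of the execution context (user ID, working directory, whether stdin is a terminal, a few environment variables) to a logfile, so runs from a shell, from cron, and from the server can be compared side by side. The function names and the logfile name are hypothetical.

```python
import json
import os
import sys
import tempfile
import time


def describe_context() -> dict:
    """Collect details that commonly differ between interactive and daemon runs."""
    try:
        cwd = os.getcwd()
    except OSError:
        cwd = None  # the working directory may not exist for a service account
    return {
        "time": time.strftime("%Y-%m-%d %H:%M:%S"),
        "pid": os.getpid(),
        "uid": os.getuid(),
        "cwd": cwd,
        "stdin_is_tty": os.isatty(0),
        "argv": sys.argv[1:],
        "env": {k: os.environ.get(k) for k in ("PATH", "HOME", "SHELL", "LANG")},
    }


def log_context(logfile: str) -> dict:
    """Append one JSON line describing the current run to logfile."""
    context = describe_context()
    with open(logfile, "a", encoding="utf-8") as handle:
        handle.write(json.dumps(context) + "\n")
    return context


if __name__ == "__main__":
    # Hypothetical location; any file the server account can write to will do.
    log_context(os.path.join(tempfile.gettempdir(), "externalscript-debug.log"))
```

Calling `log_context()` as the first thing the script does means a line is written even if a later step hangs, which helps narrow down where a timeout occurs.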

        • Linwood
          Senior Member
          • Dec 2013
          • 398

          #5
          Originally posted by Markku
          The explanation that comes to mind is that your script behaves differently when it is running non-interactively.

          Add some debugging output to the script and run it via cron (in a way that the output gets emailed to you, or save the debugging output to a logfile, using the logger command or otherwise) to find the problematic feature.

          Markku
          I must not be making myself clear...

          The script runs fine as a discovery external check. If I hit the "Execute now" button it runs without error and quickly. The script has been in use for years at multiple sites, with no issues.

          It is only if I hit the "test" button it times out.

          The reason for testing had nothing to do with the script, it was to test a newly added preprocessing step. Removing that preprocessing step has no impact - the "execute now" still works, the "test" still times out.

          I am not asking about the difference in running interactively from a shell prompt and running in the server -- I am asking about what is different in the "test" button's invocation inside the server and the regular (or execute now) invocation in the server.

          • markfree
            Senior Member
            • Apr 2019
            • 868

            #6
            Linwood, while hiding sensitive data, could you share your host parameters and test parameters?

            • Linwood
              Senior Member
              • Dec 2013
              • 398

              #7
              Sure. This screenshot may help. On the underlying screen, if I "execute now" it works. Indeed, the preprocessing rule is gone, and the discovery external check is working fine at present. But if I hit the test button and then get-value, it times out.

              No host address is shown, nor can any be entered, I think because the external check relies on parameter macros.

              The actual external check key is: snmpDiskDiscovery.pl["{HOST.CONN}","{HOST.HOST}","{$SNMP_COMMUNITY} "]

              So the macros shown are correct, and indeed in the server debug it appears to have resolved them properly and even quoted them properly.

              I'm starting to wonder if the test button is even designed to work for external check discovery rules?

              [Screenshot: lld_check.jpg]

              • Linwood
                Senior Member
                • Dec 2013
                • 398

                #8
                ** Update: not this **

                Hmmm... possible clue... I went around testing other external LLD checks: those returning a single value work, and those returning multiple values fail with a timeout. I do not have many like this, but so far it's been consistent. Returns of zero or one do not time out and return appropriate values; returns of 2+ time out (and again, all these are working scripts used in production).

                ** Update later ** I found one with one return that also timed out, so it's not that simple. But something about this "test" option causes working scripts to time out.
                Last edited by Linwood; 05-06-2022, 16:44.

                • markfree
                  Senior Member
                  • Apr 2019
                  • 868

                  #9
                  Your issue is very curious. If I'm not mistaken, the frontend sends a corresponding request to the server and waits for the result. So the server would execute the test as well.

                  • Linwood
                    Senior Member
                    • Dec 2013
                    • 398

                    #10
                    That's what I would expect as well, and indeed I see it appearing to do that in the debug log. That's why this was puzzling. I fixed my preprocessing need by just changing my script to do the preprocessing, but I am curious what is causing this. Just not sure (short of a deep dive into the code which I do not really have time for right now) how to figure it out.
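For anyone wanting to do the same, here is a rough sketch of filtering inside the script rather than in a preprocessing step. It is written in Python for illustration (the actual script is Perl), and the filter itself is an assumption: the thread does not say what the name was meant to be reduced to, so this example simply keeps the drive letter from each {#SNMPVALUE}. The names `DRIVE_RE` and `shorten_names` are hypothetical.

```python
import json
import re

# Assumed filter: keep only a leading drive letter such as "C:".
DRIVE_RE = re.compile(r"^([A-Za-z]:)")


def shorten_names(entries: list) -> list:
    """Return copies of the discovery entries with {#SNMPVALUE} shortened."""
    result = []
    for entry in entries:
        updated = dict(entry)
        match = DRIVE_RE.match(entry.get("{#SNMPVALUE}", ""))
        if match:
            updated["{#SNMPVALUE}"] = match.group(1)
        result.append(updated)
    return result


# Sample output copied from the first post.
raw = r'''[
{"{#SNMPINDEX}":"1", "{#SNMPVALUE}":"C:\\ Label: Serial Number a8db00b"},
{"{#SNMPINDEX}":"3", "{#SNMPVALUE}":"T:\\ Label:VDisk Serial Number 22f86f4"}
]'''

filtered = shorten_names(json.loads(raw))
print(json.dumps(filtered))
```

Working on the decoded entries one at a time keeps the result valid JSON, which is harder to guarantee when a single regular expression is run over the whole text of the discovery output.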
