Zabbix Server process not picking up complete server config file

  • cdslaughter
    Member
    • Jun 2018
    • 69

    #1

    I have an interesting yet frustrating issue with my Zabbix server (v4.0). While troubleshooting a notification problem, I noticed the following.

    In my /etc/zabbix/zabbix_server.conf config file, I have the alert scripts directory set to /usr/local/bin.
    Code:
    ### Option: AlertScriptsPath
    #       Full path to location of custom alert scripts.
    #       Default depends on compilation options.
    #
    # Mandatory: no
    # Default:
    # AlertScriptsPath=${datadir}/zabbix/alertscripts
    
    AlertScriptsPath=/usr/local/bin
    but even after a restart, the Zabbix server keeps reporting that it is using the default directory for my alerts.
    (From my Action Log)
    Cannot execute command "/usr/lib/zabbix/alertscripts/zbx-notify": [2] No such file or directory
    I am asking this question because sometimes my alert script works (I have copies of the script in both the default location AND /usr/local/bin) and sometimes it fails with the above error. I am perplexed as to why it sometimes fails and sometimes works.

    Might anyone know what I am missing by chance?

    Carl
  • LenR
    Senior Member
    • Sep 2009
    • 1005

    #2
    Could some of the media entries have a full path? Does the script have a reference to the /usr/lib/zabbix/ path or any environment vars? Does zabbix have the proper permissions into the files? Does the server log show that this is the config file used?

    • cdslaughter
      Member
      • Jun 2018
      • 69

      #3
      Originally posted by LenR
      Could some of the media entries have a full path?
      A: No, I tried specifying the full path, but that returned an error showing that zabbix prepends the name of the script with the "AlertScripts" path automatically.

      Does the script have a reference to the /usr/lib/zabbix/ path or any environment vars?
      A: No, it is not needed. It simply passes its arguments ($1, $2, and $3) to a curl command.

      Does zabbix have the proper permissions into the files?
      A: Yes. Keep in mind that sometimes it runs the script without issue. See attached screenshot.

      Does the server log show that this is the config file used?
      A: Yes.

      There seems to be no rhyme or reason as to when Zabbix fails to run the script and when it works without issue. It seems as though whether it fails depends on which of the alerter instances picks up the event.

      [Attached image: zabbix.error.png]

      • LenR
        Senior Member
        • Sep 2009
        • 1005

        #4
        I think those would be handled by the alerter process, maybe increase log level and see if it adds any info

        zabbix_server -R log_level_increase=alerter

        • cdslaughter
          Member
          • Jun 2018
          • 69

          #5
          Originally posted by LenR
          I think those would be handled by the alerter process, maybe increase log level and see if it adds any info

          zabbix_server -R log_level_increase=alerter
          I tried that, but the only thing I am getting in the log file are the successful sends.
          I tried increasing all the logging and comparing the results of a successful send and a failed one. From what I can tell, the poller process evaluates the trigger successfully, then inserts the row into the alerts table in the database (so that the alerter process can pick it up). The successful alerts show another query by the alerter process that returns 1 row, but on the failed ones, the same query returns 0 rows.
          It is very confusing. I am not sure what else I can do at this point; I can't really change the query the alerter process is using so that it always returns the correct value.

          Carl

          • LenR
            Senior Member
            • Sep 2009
            • 1005

            #6
            Do you have more than one action that would process the same event? Maybe increase the logging level of the alert manager, I think you have to do that by pid #

            • cdslaughter
              Member
              • Jun 2018
              • 69

              #7
              Originally posted by LenR
              Do you have more than one action that would process the same event? Maybe increase the logging level of the alert manager, I think you have to do that by pid #

              I did have more than one type of alert going out for each event, but I can try setting it to just 1.
              As for the alert manager, I was able to increase its logging to max, but did not see anything useful in it either.

              • cdslaughter
                Member
                • Jun 2018
                • 69

                #8
                My prod server is CentOS 7 running Zabbix 4.0.

                I have tested it working with the exact same scripts and config on an older test system I have running 3.4 on Ubuntu 16.04.

                So I know the scripts should work. I just can't seem to identify the root cause of what's going wrong here.

                • cdslaughter
                  Member
                  • Jun 2018
                  • 69

                  #9
                  I have tried putting the full path in the media type script section, but it just errors.

                  [Attached image: Script error1.png]

                  I also tried setting the following in zabbix_server.conf:
                  Code:
                  AlertScriptsPath=/
                  so that the full path of the script had to be provided in the media type configuration, but that did not work either. It just used the default.

                  • cdslaughter
                    Member
                    • Jun 2018
                    • 69

                    #10
                    I did some additional testing and cloned the action that is having the issue. I modified the clone to run a command instead of sending an alert via the media-type configured.
                    In the remote command, I used the exact same script in the exact same location as is configured in the media type. This way, each time an event occurs that triggers an alert via the media type, it also runs a remote command that does the same thing. This confirms that the Zabbix process can access the files/directories needed.

                    Thus far, I can see that it executes the remote command every time without issue.
                    [Attached image: actions.png]

                    • cdslaughter
                      Member
                      • Jun 2018
                      • 69

                      #11
                      OK, I found the issue: it is a bug in the alerter or alert manager process.

                      Basically, when an event occurs and an action condition evaluates to true, an entry is added to the alerts table in the database.
                      Below is an example of one of these entries as it would be added. Note that the status is set to zero. In this specific example, the operation is to send a message to a user via media type #5, which in my case is a custom media type.
                      Code:
                      [localhost zabbix40]>select * from alerts where status=0\G
                      *************************** 1. row ***************************
                            alertid: 23082
                           actionid: 30
                            eventid: 1339772
                             userid: 56
                              clock: 1548363103
                        mediatypeid: 5
                             sendto: @carlslaughter
                            subject: Problem: Random event test. Value is 1 1339772 {ALERT.ID}
                            message: Problem started at 15:51:42 on 2019.01.24
                             status: 0
                            retries: 0
                              error:
                           esc_step: 1
                          alerttype: 0
                          p_eventid: NULL
                      acknowledgeid: NULL
                      1 row in set (0.00 sec)
                      During my testing I had created a second action that ran the same script locally as a remote command to test whether it was a file permissions issue. Running the script as a remote command ALWAYS worked without issue, thus indicating that the Zabbix server process DOES have the necessary access (both read and execute) to the file. Keep in mind that it was set up using the same file in the same location with the same parameters as the custom media type that is failing intermittently. Below is an example of an entry in the alerts table using this method.

                      Code:
                      *************************** 4. row ***************************
                            alertid: 23031
                           actionid: 32
                            eventid: 1339630
                             userid: NULL
                              clock: 1548360981
                        mediatypeid: NULL
                             sendto:
                            subject:
                            message: Zabbix server:/usr/local/bin/zbx-notify @carlslaughter 'This is from the remote command. Event ID is 1339630' '202719lpadaptec.mhsl.test.local Random event test. Value is 3' --api_token=(obfuscate) --slack --no-fork
                             status: 1
                            retries: 0
                              error:
                           esc_step: 1
                          alerttype: 1
                          p_eventid: NULL
                      acknowledgeid: NULL

                      I noticed that when an action that used the custom notification failed, simply updating the row in the database (setting status=0, retries=1, error='') was not enough: Zabbix did not automatically pick it up and try again. For Zabbix to pick that record back up, I had to reset the alerter processes (kill -9 {alerter process IDs here}) or just run "service zabbix-server restart". Once the kill/restart was complete, the alerter process would pick up the records that I had reset to status=0, retries=1, error='' and process them like normal AND do so successfully!

                      I will be doing some additional testing around this and possibly writing a script to automate this as a cron job. Hopefully anyone else that runs into this can use this to help them solve their issue until this gets fixed.
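
                      For anyone who wants to script the reset step described above, here is a minimal sketch in Python. It is not an official procedure. It uses an in-memory sqlite3 table as a stand-in so that it runs anywhere; against a real server you would run the same UPDATE through your MySQL client or driver. The assumptions are that status 2 marks a failed alert (status 0 is the "not sent" state shown in the row above), that only rows with the "No such file or directory" error should be re-queued, and that the demo rows and the names reset_failed_alerts and make_demo_db are made up for illustration.

```python
import sqlite3

# Assumption: 2 marks a failed alert; 0 is the "not sent" state seen in the thread.
STATUS_NOT_SENT = 0
STATUS_FAILED = 2

RESET_SQL = """
UPDATE alerts
   SET status = ?, retries = 1, error = ''
 WHERE status = ?
   AND mediatypeid = ?
   AND error LIKE ?
"""


def reset_failed_alerts(conn, mediatypeid,
                        error_pattern="%No such file or directory%"):
    """Re-queue failed alerts of one media type; return how many rows changed."""
    cur = conn.execute(
        RESET_SQL,
        (STATUS_NOT_SENT, STATUS_FAILED, mediatypeid, error_pattern),
    )
    conn.commit()
    return cur.rowcount


def make_demo_db():
    """Build a tiny stand-in for the alerts table with made-up rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE alerts (alertid INTEGER PRIMARY KEY, mediatypeid INTEGER,"
        " status INTEGER, retries INTEGER, error TEXT)"
    )
    conn.executemany(
        "INSERT INTO alerts VALUES (?, ?, ?, ?, ?)",
        [
            # Failed with the path error: should be re-queued.
            (1, 5, STATUS_FAILED, 3,
             'Cannot execute command "/usr/lib/zabbix/alertscripts/zbx-notify":'
             ' [2] No such file or directory'),
            # Already sent: must be left alone.
            (2, 5, 1, 0, ""),
            # Failed for an unrelated reason: must be left alone.
            (3, 5, STATUS_FAILED, 3, "timeout"),
        ],
    )
    conn.commit()
    return conn


if __name__ == "__main__":
    db = make_demo_db()
    print("rows re-queued:", reset_failed_alerts(db, mediatypeid=5))
```

                      As noted above, resetting the rows alone was not enough on my server: the alerter processes still had to be restarted before the re-queued rows were picked up, so a cron job would need to do both steps.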

                      I would enjoy having a chance to talk to one of the devs if they care to. I used to work on a very similar system, where I had to troubleshoot these kinds of bugs on a very regular basis.

                      Carl

                      • dimir
                        Zabbix developer
                        • Apr 2011
                        • 1080

                        #12
                        Alert manager caches some data, perhaps this is why restart helps. But will forward this to one of the devs that were working on this functionality. If you are sure there's a bug, I suggest creating an issue and continue the discussion there.

                        • dimir
                          Zabbix developer
                          • Apr 2011
                          • 1080

                          #13
                          One suggestion is to check whether AlertScriptsPath is specified more than once in the config file.
                          Last edited by dimir; 25-01-2019, 10:41.

                          • cdslaughter
                            Member
                            • Jun 2018
                            • 69

                            #14
                            Originally posted by dimir
                            One suggestion is to check whether AlertScriptsPath is specified more than once in the config file.
                            I thought that too. I confirmed it was not duplicated in the main one and then checked to see if there were any other config files set as includes (none found).
                            I suspect that the alerter process is picking up the default alert scripts path from somewhere else every other cycle. Perhaps it is hard-coded somewhere?
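
                            In case it helps anyone repeat the duplicate check, here is a rough Python sketch of what I mean. It only looks at the text of one config file and lists every uncommented line that sets a given key; it does not follow Include entries itself, so each included file would have to be checked the same way. The function name active_settings and the SAMPLE text are just for illustration.

```python
def active_settings(config_text, key):
    """Return (line_number, value) for every uncommented `key=value` line."""
    hits = []
    for lineno, raw in enumerate(config_text.splitlines(), start=1):
        line = raw.strip()
        # Skip blank lines, comments, and anything that is not key=value.
        if not line or line.startswith("#") or "=" not in line:
            continue
        name, value = line.split("=", 1)
        if name.strip() == key:
            hits.append((lineno, value.strip()))
    return hits


# Illustrative excerpt modelled on the config snippet in the first post.
SAMPLE = """\
# AlertScriptsPath=${datadir}/zabbix/alertscripts

AlertScriptsPath=/usr/local/bin
"""

if __name__ == "__main__":
    for key in ("AlertScriptsPath", "Include"):
        print(key, active_settings(SAMPLE, key))
```

                            To run it against the real file, read /etc/zabbix/zabbix_server.conf and pass its contents in. More than one hit for AlertScriptsPath, or any hit for Include, would be worth a closer look.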


                            Carl Slaughter

                            • cdslaughter
                              Member
                              • Jun 2018
                              • 69

                              #15
                              Originally posted by dimir
                              Alert manager caches some data, perhaps this is why restart helps. But will forward this to one of the devs that were working on this functionality. If you are sure there's a bug, I suggest creating an issue and continue the discussion there.

                              I will be conducting some additional tests to validate that the specific scripts I am using are not contributing to the issue so that I can provide an easily reproducible test case for your dev/QA team.

                              Carl Slaughter
