
**LenR** · 24-01-2019, 00:17

Could some of the media entries have a full path? Does the script have a reference to the /usr/lib/zabbix/ path or any environment vars? Does zabbix have the proper permissions on the files? Does the server log show that this is the config file being used?

**cdslaughter** · 24-01-2019, 01:13

Originally posted by LenR

Could some of the media entries have a full path?

A: No. I tried specifying the full path, but that returned an error showing that Zabbix automatically prepends the AlertScriptsPath directory to the script name.

Does the script have a reference to the /usr/lib/zabbix/ path or any environment vars?

A: No, it is not needed. It simply passes the arguments it receives ($1, $2, and $3) to a curl command.

Does zabbix have the proper permissions on the files?

A: Yes. Keep in mind that sometimes it runs the script without issue. See attached screenshot.

Does the server log show that this is the config file used?

A: Yes.

There seems to be no rhyme or reason to when Zabbix fails to run the script and when it runs it without issue. It seems as though whether it fails depends on which of the alerter instances picks up the event.
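The answer above describes a script that only forwards the three positional arguments it receives ($1, $2, $3) to curl. The original script isn't shown, so the Python sketch below only illustrates that contract, assuming Zabbix passes sendto, subject and message in that order (typically the media type's {ALERT.SENDTO}, {ALERT.SUBJECT} and {ALERT.MESSAGE} parameters). The function name `build_payload` and the JSON field names `channel` and `text` are hypothetical.

```python
import json


def build_payload(args):
    """Turn the three positional arguments an alert script receives
    (assumed order: sendto, subject, message) into a JSON request body.
    The "channel"/"text" field names are hypothetical placeholders."""
    if len(args) != 3:
        raise ValueError(
            f"expected 3 arguments (sendto, subject, message), got {len(args)}"
        )
    sendto, subject, message = args
    return json.dumps({"channel": sendto, "text": f"{subject}\n{message}"})


# A real script would call build_payload(sys.argv[1:]) and hand the result
# to an HTTP client, the way the shell version hands $1, $2, $3 to curl.
print(build_payload(["@carlslaughter", "Problem: Random event test", "Problem started"]))
```

Failing loudly on a wrong argument count makes a mis-configured media type show up as an explicit error rather than a silently empty message.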

**LenR** · 24-01-2019, 01:53

I think those would be handled by the alerter process. Maybe increase its log level and see if that adds any info:

zabbix_server -R log_level_increase=alerter

**cdslaughter** · 24-01-2019, 18:26

Originally posted by LenR

I think those would be handled by the alerter process. Maybe increase its log level and see if that adds any info:

zabbix_server -R log_level_increase=alerter

I tried that, but the only entries I am getting in the log file are the successful sends.
I tried increasing all the logging and comparing the results of a successful send and a failed one. From what I can tell, the poller process evaluates the trigger successfully, then inserts the row into the alerts table in the database (so that the alerter process can pick it up). For the successful alerts, the log shows a further query by the alerter process that returns 1 row; for the failed ones, the same query returns 0 rows.
It is very confusing. I'm not sure what else I can do at this point; I can't really change the query the alerter process is using so that it always returns the correct value.

Carl

**LenR** · 24-01-2019, 18:44

Do you have more than one action that would process the same event? Maybe increase the logging level of the alert manager; I think you have to do that by PID.

**cdslaughter** · 24-01-2019, 19:25

Originally posted by LenR

Do you have more than one action that would process the same event? Maybe increase the logging level of the alert manager; I think you have to do that by PID.

I did have more than one type of alert going out for each event, but I can try setting it to just one.
As for the alert manager, I was able to increase its logging to the maximum, but did not see anything useful there either.

**cdslaughter** · 24-01-2019, 19:29

My prod server is CentOS 7 running Zabbix 4.0.

I have tested the exact same scripts and config on an older test system running Zabbix 3.4 on Ubuntu 16.04, and they work there.

So I know the scripts should work. I just can't seem to identify the root cause of what's going wrong here.

**cdslaughter** · 24-01-2019, 21:44

I have tried putting the full path in the media type script section, but it just errors.

I also tried setting this in zabbix_server.conf:

Code:

AlertScriptsPath=/

so that the full path of the script had to be provided in the media type configuration, but that did not work either. It just used the default path.
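The behaviour reported in this thread (a full path in the media type errors out, and AlertScriptsPath is prepended automatically) is consistent with the server building the script path by plain string concatenation. The Python sketch below models that under this assumption; it is not Zabbix source, and `resolve_by_concat` / `resolve_by_join` are hypothetical names. It contrasts concatenation with `posixpath.join`, which would have honoured an absolute script name.

```python
import posixpath


def resolve_by_concat(alert_scripts_path: str, script_name: str) -> str:
    """Assumed model of the server's behaviour: AlertScriptsPath is
    prepended to the media type's script name with a "/" separator."""
    return f"{alert_scripts_path}/{script_name}"


def resolve_by_join(alert_scripts_path: str, script_name: str) -> str:
    """For contrast: posixpath.join discards the base directory when the
    second argument is already an absolute path."""
    return posixpath.join(alert_scripts_path, script_name)


if __name__ == "__main__":
    base = "/usr/lib/zabbix/alertscripts"
    full = "/usr/local/bin/zbx-notify"
    print(resolve_by_concat(base, "zbx-notify"))  # /usr/lib/zabbix/alertscripts/zbx-notify
    print(resolve_by_concat(base, full))  # /usr/lib/zabbix/alertscripts//usr/local/bin/zbx-notify
    print(resolve_by_join(base, full))  # /usr/local/bin/zbx-notify
```

Under this model, AlertScriptsPath=/ with a script name of usr/local/bin/zbx-notify would yield //usr/local/bin/zbx-notify, which Linux resolves to the same file; the post above reports that the server kept using the default path instead, so the setting apparently never took effect.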

**cdslaughter** · 24-01-2019, 22:08

I did some additional testing and cloned the action that is having the issue. I modified the clone to run a remote command instead of sending an alert via the configured media type.
In the remote command, I used the exact same script in the exact same location as is configured in the media type. This way, every time an event triggers an alert via the media type, it also runs a remote command that does the same thing. This confirms that the zabbix process can access the files/directories needed.

Thus far, I can see that it executes the remote command every time without issue.

**cdslaughter** · 24-01-2019, 23:46

OK, I found the issue: it is a bug in the alerter or alert manager process.

Basically, when an event occurs and an action's conditions evaluate to true, an entry is added to the alerts table in the database.
Below is an example of one of these entries as it is first added. Note that status is set to 0. Also note that in this specific example, the operation is to send a message to a user via media type #5, which in my case is a custom media type.

Code:

[localhost zabbix40]>select * from alerts where status=0\G
*************************** 1. row ***************************
      alertid: 23082
     actionid: 30
      eventid: 1339772
       userid: 56
        clock: 1548363103
  mediatypeid: 5
       sendto: @carlslaughter
      subject: Problem: Random event test. Value is 1 1339772 {ALERT.ID}
      message: Problem started at 15:51:42 on 2019.01.24
       status: 0
      retries: 0
        error:
     esc_step: 1
    alerttype: 0
    p_eventid: NULL
acknowledgeid: NULL
1 row in set (0.00 sec)

During my testing I had created a second action that ran the same script locally as a remote command, to test whether it was a file permissions issue. Running the script as a remote command ALWAYS worked without issue, which indicates that the zabbix server process DOES have the necessary access (both read and execute) to the file. This action never failed or had any issues, even though it was set up using the same file in the same location with the same parameters as the custom media type that is failing intermittently. Below is an example of an entry in the alerts table created by this method.

Code:

*************************** 4. row ***************************
      alertid: 23031
     actionid: 32
      eventid: 1339630
       userid: NULL
        clock: 1548360981
  mediatypeid: NULL
       sendto:
      subject:
      message: Zabbix server:/usr/local/bin/zbx-notify @carlslaughter 'This is from the remote command. Event ID is 1339630' '202719lpadaptec.mhsl.test.local Random event test. Value is 3' --api_token=(obfuscate) --slack --no-fork
       status: 1
      retries: 0
        error:
     esc_step: 1
    alerttype: 1
    p_eventid: NULL
acknowledgeid: NULL

I noticed that when an alert using the custom media type failed, I could update its row in the alerts table (setting status=0, retries=1, error=''), but Zabbix did not automatically pick it up and try again. For Zabbix to pick the record back up, I had to kill the alerter processes (kill -9 {alerter process IDs here}) or run "service zabbix-server restart". Once the kill/restart was complete, the alerter process picked up the records I had reset to "status=0 retries=1 error=''" and processed them like normal AND did so successfully!

I will be doing some additional testing around this and possibly writing a script to automate this as a cron job. Hopefully anyone else who runs into this can use it as a workaround until this gets fixed.
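The manual workaround above (reset the failed rows to status=0, retries=1, error='' and then restart the alerters) could be scripted roughly as follows. This is a sketch under several assumptions: that status 2 marks a failed alert in this Zabbix version, that only the custom media type (#5 in the example row) should be re-queued, and that an in-memory SQLite table with a handful of the real columns is an acceptable stand-in for the MySQL `alerts` table. `reset_failed_alerts` and `make_demo_db` are hypothetical names, and the demo rows are made up.

```python
import sqlite3

# Assumed Zabbix alert status codes (0 = not sent, 2 = failed); verify
# against the docs/source for your version before relying on them.
ALERT_STATUS_NOT_SENT = 0
ALERT_STATUS_FAILED = 2

# sqlite3 uses "?" placeholders; a MySQL driver would typically use "%s".
RESET_SQL = (
    "UPDATE alerts SET status = ?, retries = 1, error = '' "
    "WHERE status = ? AND mediatypeid = ?"
)


def reset_failed_alerts(conn: sqlite3.Connection, mediatypeid: int) -> int:
    """Re-queue failed alerts for one media type, mirroring the manual
    workaround (status=0, retries=1, error=''). Returns rows changed.
    Per the post, the alerter still has to be restarted afterwards."""
    cur = conn.execute(
        RESET_SQL, (ALERT_STATUS_NOT_SENT, ALERT_STATUS_FAILED, mediatypeid)
    )
    conn.commit()
    return cur.rowcount


def make_demo_db() -> sqlite3.Connection:
    """In-memory stand-in for a few columns of the real alerts table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE alerts (alertid INTEGER PRIMARY KEY, mediatypeid INTEGER,"
        " status INTEGER, retries INTEGER, error TEXT)"
    )
    conn.executemany(
        "INSERT INTO alerts VALUES (?, ?, ?, ?, ?)",
        [
            (1, 5, ALERT_STATUS_FAILED, 3, "simulated failure"),  # failed custom media alert
            (2, 5, 1, 0, ""),  # already sent
            (3, None, 1, 0, ""),  # remote command row (no media type)
        ],
    )
    conn.commit()
    return conn


if __name__ == "__main__":
    db = make_demo_db()
    print(reset_failed_alerts(db, 5))  # 1
    print(db.execute("SELECT status, retries, error FROM alerts WHERE alertid = 1").fetchone())  # (0, 1, '')
```

Against a real server you would swap sqlite3 for a MySQL driver and run it from cron, followed by the alerter restart described above; the sketch itself does not restart anything.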

I would enjoy having a chance to talk to one of the devs if they care to. I used to work on something very similar, where I had to troubleshoot these kinds of bugs on a very regular basis.

Carl

**dimir** · 25-01-2019, 09:13

Alert manager caches some data; perhaps this is why a restart helps. I will forward this to one of the devs who worked on this functionality. If you are sure there's a bug, I suggest creating an issue and continuing the discussion there.

**dimir** · 25-01-2019, 10:15

One suggestion is to check whether AlertScriptsPath is specified more than once in the config file.
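This check can be done mechanically. Below is a minimal Python sketch, assuming the usual zabbix_server.conf layout of one `Key=Value` per line with `#` comments: it lists every uncommented occurrence of a directive, so a duplicated AlertScriptsPath (and any Include lines that would pull in further files) shows up with its line number. `find_directive` and the `SAMPLE` text are hypothetical, and the sketch does not expand Include globs.

```python
import re

# Hypothetical config text with a deliberate duplicate, for illustration only.
SAMPLE = """\
# AlertScriptsPath=/commented/out
AlertScriptsPath=/usr/lib/zabbix/alertscripts
Include=/etc/zabbix/zabbix_server.conf.d/*.conf
AlertScriptsPath=/usr/local/share/zabbix/alertscripts
"""


def find_directive(config_text, name):
    """Return (line_number, value) for every uncommented `name=value` line."""
    pattern = re.compile(rf"^\s*{re.escape(name)}\s*=\s*(.*?)\s*$")
    hits = []
    for lineno, line in enumerate(config_text.splitlines(), start=1):
        if line.lstrip().startswith("#"):
            continue  # skip commented-out defaults
        match = pattern.match(line)
        if match:
            hits.append((lineno, match.group(1)))
    return hits


if __name__ == "__main__":
    # Against a real server you would read the actual zabbix_server.conf text.
    for lineno, value in find_directive(SAMPLE, "AlertScriptsPath"):
        print(f"line {lineno}: AlertScriptsPath={value}")
    for lineno, value in find_directive(SAMPLE, "Include"):
        print(f"line {lineno}: Include={value} (check these files too)")
```

More than one AlertScriptsPath hit, or any Include hit, points at where a second value could be coming from.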

**cdslaughter** · 25-01-2019, 15:51

Originally posted by dimir

One suggestion is to check whether AlertScriptsPath is specified more than once in the config file.

I thought that too. I confirmed it was not duplicated in the main config file, and then checked whether any other config files were set as includes (none found).
I suspect that the alerter process is picking up the default AlertScriptsPath from somewhere else every other cycle. Perhaps it is hard-coded somewhere?

Carl Slaughter

**cdslaughter** · 25-01-2019, 15:58

Originally posted by dimir

Alert manager caches some data; perhaps this is why a restart helps. I will forward this to one of the devs who worked on this functionality. If you are sure there's a bug, I suggest creating an issue and continuing the discussion there.

I will be conducting some additional tests to validate that the specific scripts I am using are not contributing to the issue so that I can provide an easily reproducible test case for your dev/QA team.

Carl Slaughter


Zabbix Server process not picking up complete server config file
