Hello,
I work with a large Zabbix server (5.0) monitoring hundreds of hosts. On two of those hosts, the userparameter graphs intermittently show "no data".
The interruptions occur at different times across different parameters and hosts, and the other Zabbix data isn't interrupted at those times. The gaps last a few minutes and occur every few hours; since the parameters update every 2 minutes, each gap spans more than one missed request.
The hosts are running Zabbix Agent 5.2.2 and are configured the same as two other hosts (part of the same cluster) which have no such issues.
I suspect that the issue might be load-dependent, since two of the four servers in the cluster are active at any one time. I've tried changing the userparameter scripts so that they always return data deterministically (the script reads the result from a temporary file when run and refreshes that file in the background), but this had no effect. The check itself is a systemctl lookup, and I've measured it to take about 0.05 seconds, well below the 3-second limit.
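A minimal sketch of the caching approach described above could look like the following. It is not the exact script: the cache file name, the `--refresh` flag, and the use of `systemctl is-active` as the lookup are assumptions made for illustration.

```python
#!/usr/bin/env python3
"""Sketch of a cached UserParameter check (all names here are hypothetical)."""
import os
import subprocess
import sys
import tempfile

# Assumption: the zabbix agent user can read and write the system temp dir.
CACHE_DIR = tempfile.gettempdir()


def cache_path(unit):
    """Location of the cached result for one systemd unit."""
    return os.path.join(CACHE_DIR, f"zbx_unit_{unit}.cache")


def read_cached(path, default="unknown"):
    """Return the last stored value, or a default if no cache exists yet."""
    try:
        with open(path, encoding="utf-8") as fh:
            return fh.read().strip() or default
    except OSError:
        return default


def write_atomic(path, value):
    """Write to a temp file and rename it, so readers never see a partial file."""
    fd, tmp = tempfile.mkstemp(dir=os.path.dirname(path) or ".")
    with os.fdopen(fd, "w", encoding="utf-8") as fh:
        fh.write(value + "\n")
    os.replace(tmp, path)


def refresh(unit):
    """Run the actual lookup and store its output (assumed: systemctl is-active)."""
    result = subprocess.run(
        ["systemctl", "is-active", unit],
        capture_output=True, text=True, check=False,
    )
    write_atomic(cache_path(unit), result.stdout.strip() or "unknown")


def main(argv):
    if len(argv) == 3 and argv[1] == "--refresh":
        refresh(argv[2])
        return
    if len(argv) != 2:
        print("usage: check_unit.py [--refresh] UNIT", file=sys.stderr)
        return
    unit = argv[1]
    # Answer the agent immediately from the cache...
    print(read_cached(cache_path(unit)))
    # ...then update the cache in a detached background process.
    subprocess.Popen(
        [sys.executable, os.path.abspath(__file__), "--refresh", unit],
        stdin=subprocess.DEVNULL, stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL, start_new_session=True,
    )


if __name__ == "__main__":
    main(sys.argv)
```

With a wrapper like this, the value returned to the agent is always one update interval old, and the very first call returns the default because no cache file exists yet.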
I am puzzled regarding what could be causing this and would appreciate any tips you might have regarding what to try and/or how to debug this issue further.
Best regards,
Jure