Howdy,
I'm running into a problem with a simple check through a proxy. I have a simple check item for SSH that is applied with a template to a large number of machines. These checks, along with 7 active agent items, are split up among proxies to avoid overloading the central server.
On some systems, this SSH check reports the service as down when it is merely slow. I read in another thread that a simple check returns 2 to indicate a timeout; however, in these cases the value returned is 0 (down). I can SSH from the proxy to the hosts reported as down, so the service is running. It is simply slow to respond because of load on the target host. Other hosts with similar load, whose SSH is just as slow to respond, are reported as up.
Although these items and triggers have some false-positive protection, this still causes many down alerts to be sent when they shouldn't be. Is it possible to increase the timeout Zabbix uses for these SSH checks, so that a service that is only responding very slowly is not reported as down? Any other ideas on how to reduce the false-positive rate?
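To make the slow-versus-down distinction concrete, here is a minimal Python sketch of an SSH check with a configurable timeout that separates the three outcomes mentioned above (0 = down, 1 = up, 2 = timeout). It is not Zabbix's own implementation; the function name `check_ssh`, the 3-second default, and the rule that an `SSH-` banner means "up" are assumptions made for illustration.

```python
import socket

# Return codes following the convention described in the post (assumed).
DOWN, UP, TIMEOUT = 0, 1, 2


def check_ssh(host: str, port: int = 22, timeout: float = 3.0) -> int:
    """Hypothetical SSH availability check (illustration only, not Zabbix code).

    Opens a TCP connection to host:port and waits up to `timeout` seconds,
    for the connect and again for the read, for the identification banner
    that an SSH server sends first.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            banner = sock.recv(255)
    except socket.timeout:
        # Nothing arrived within the timeout: the service may just be slow.
        return TIMEOUT
    except OSError:
        # Connection refused, host unreachable, and similar hard failures.
        return DOWN
    # An empty read (peer closed) or a non-SSH banner counts as down.
    return UP if banner.startswith(b"SSH-") else DOWN
```

For example, `check_ssh("host.example.com", timeout=10.0)` (hypothetical host) would tolerate a banner that takes several seconds to arrive. With a short timeout, a loaded host falls into the 2 (timeout) case rather than being reported as 0.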
Zabbix 1.8
Thank you,