Timing Problems with Apache2 and PHP-FPM remote monitoring since last debian upgrade

  • wuppi
    Junior Member
    • Aug 2020
    • 6

    #1

    Dear Forum members,
    I am running Zabbix Server 5.0.2 on Debian 10.9 and monitoring a remote zabbix_agent2 (version 5.0.14) over the network; the remote host also runs Debian 10.9.
    Among other services, the remote server runs an Apache2 server with PHP-FPM.

    The Zabbix monitoring had been reliable and stable for more than 9 months, but since the last
    apt-get update
    apt-get upgrade
    Zabbix reports a potential service interruption of the Apache web server and of PHP-FPM several times a day.
    The corresponding resolve message arrives at most 1 minute after the alert.

    Does anybody know how to overcome that?

    Advice is appreciated; please take into consideration that I am fairly new to Zabbix.

    Best regards
    Wuppi
  • tim.mooney
    Senior Member
    • Dec 2012
    • 1427

    #2
    It's not clear to me which system was updated. Was the upgrade applied on the remote client (the one running zabbix_agent2), or on the Zabbix server?

    You may also wish to consider that Zabbix server 5.0.14 is available and should be a good upgrade from Zabbix server 5.0.2. It should consist mainly of bug fixes and is probably worth installing. Still, I don't think that's the source of your problem.

    Have you looked in /var/log/zabbix/zabbix_server.log on your server? Do you see messages about network connection problems or other loss of connectivity with the remote client?

    How are the Apache web server and the php-fpm pool(s) being monitored on the remote client? Is the Zabbix server connecting to them directly over the network, or is it asking the zabbix_agent2 running on that remote client to monitor those items?

    If the zabbix_agent2 on the remote client is actually the one gathering the items for httpd and php-fpm, is zabbix_agent2 logging anywhere on the client? Since the problem happens frequently, you may want to enable logging on the client for a while (if zabbix_agent2 plays any part in monitoring httpd/php-fpm) to see if anything obvious is being logged.

    My guess is that the connection and gathering of item data is taking slightly too long now, so that the built-in Zabbix timeout is preventing the item data from being reliably collected. That's just a guess, though. Investigating the things I've mentioned may help point you in a useful direction.
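
One rough way to test that guess is to time plain TCP connects from the Zabbix server to the agent. The sketch below is only an illustration: the function names are made up for this post, and it measures connect time only, not the time the agent needs to gather an item. Port 10050 is the default passive agent port and 3 seconds is the default Zabbix Timeout.

```python
import socket
import time


def time_tcp_connect(host, port, timeout=3.0):
    """Return the seconds needed to open a TCP connection, or None on failure."""
    start = time.monotonic()
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return time.monotonic() - start
    except OSError:
        # Covers timeouts, refused connections and "no route to host".
        return None


def probe(host, port, attempts=5, timeout=3.0, pause=1.0):
    """Connect several times and collect the timings (None means failed)."""
    results = []
    for i in range(attempts):
        results.append(time_tcp_connect(host, port, timeout))
        if i < attempts - 1:
            time.sleep(pause)
    return results
```

Calling something like `probe("remote.example.org", 10050)` (the host name is a placeholder) at a time when the alerts occur should show whether connects are slow or failing outright.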

    • wuppi
      Junior Member
      • Aug 2020
      • 6

      #3
      First, thank you very much for your input.

      >>> It's not clear to me which system (the system running the Zabbix server or the remote system running zabbix_agent2) was updated? Was the upgrade applied on the remote client, or on the Zabbix server?
      The remote system was the one that was updated, and that is when the problem started.

      Based on your input I checked the zabbix_agent2 logfile on the remote system and did indeed find timeout problems such as

      2021/08/16 08:08:51.901698 [101] active check configuration update from [87.139.43.187:10051] started to fail (dial tcp :0->xxx.xxx.xxx.xxx:10051: i/o timeout)
      2021/08/16 08:10:52.916439 [101] active check configuration update from [87.139.43.187:10051] started to fail (dial tcp :0->xxx.xxx.xxx.xxx:10051: connect: no route to host)
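
To see how often and why these failures occur, the log can be summarised with a small script. This is a minimal sketch that assumes every relevant line has exactly the format of the two lines above; the function name and the regular expression are my own and not part of Zabbix.

```python
import re
from collections import Counter

# Matches lines of the form shown above (format assumed from those two samples).
LINE_RE = re.compile(
    r"^(?P<ts>\d{4}/\d{2}/\d{2} \d{2}:\d{2}:\d{2}\.\d+) "
    r"\[\d+\] active check configuration update from "
    r"\[(?P<server>[^\]]+)\] started to fail \((?P<detail>.*)\)$"
)


def parse_failures(lines):
    """Return (timestamp, server, reason) for each matching log line."""
    failures = []
    for line in lines:
        match = LINE_RE.match(line.strip())
        if match:
            # The reason is the text after the last ": " in the error detail.
            reason = match.group("detail").rsplit(": ", 1)[-1]
            failures.append((match.group("ts"), match.group("server"), reason))
    return failures


def count_reasons(lines):
    """Count how often each failure reason appears."""
    return Counter(reason for _, _, reason in parse_failures(lines))
```

Feeding it the lines of the agent logfile, for example via `count_reasons(open(path))` with `path` pointing at wherever zabbix_agent2 logs on your system, gives a quick overview of whether timeouts or routing errors dominate.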

      I now set the parameter

      Timeout=8

      in the zabbix_agent2.conf file and restarted zabbix_agent2.
      I hope this is the right parameter for it.

      We will see if that solves the problem. I will let you know in a few days.


      Best regards

      Wuppi

      • wuppi
        Junior Member
        • Aug 2020
        • 6

        #4
        I would like to report how I was able to overcome the problem.

        My solution is based on the assumption that the connection to the remote host is simply unstable and that this causes the problem.
        All alerts were cleared within 1 minute, when the next connection attempt was made.

        I therefore updated the trigger and added a tolerance of 1 minute to the trigger definition, as follows:

        {PHP-fpm Template:proc.num[,,,php-fpm].last()}=0

        was simply changed to

        {PHP-fpm Template:proc.num[,,,php-fpm].last()}=0 and {PHP-fpm Template:proc.num[,,,php-fpm].last(#2)}=0

        which means that the trigger fires only if the current value as well as the previous value is 0. (last() is the same as last(#1), the most recent value; last(#2) is the value before it.)

        That seems to be working as expected.
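
For anyone who wants to see the effect of requiring two consecutive zero values, here is a small simulation. It is only a model of the idea, not Zabbix code: the function names and the sample history are invented, and a rule with too little data is simply treated as "not firing".

```python
def fires_on_last(values):
    """Original rule: PROBLEM when the most recent value is 0."""
    return len(values) >= 1 and values[-1] == 0


def fires_on_last_two(values):
    """Tolerant rule: PROBLEM only when the last two values are both 0."""
    return len(values) >= 2 and values[-1] == 0 and values[-2] == 0


def count_alerts(history, rule):
    """Count OK -> PROBLEM transitions as the values arrive one by one."""
    alerts = 0
    in_problem = False
    for i in range(1, len(history) + 1):
        now = rule(history[:i])
        if now and not in_problem:
            alerts += 1
        in_problem = now
    return alerts


# Invented process counts: one single dropout, then a real three-sample outage.
history = [5, 5, 0, 5, 5, 0, 0, 0, 5]
alerts_original = count_alerts(history, fires_on_last)
alerts_tolerant = count_alerts(history, fires_on_last_two)
```

With this history the original rule alerts on both the single dropout and the outage, while the tolerant rule alerts only on the outage, at the cost of noticing it one sample later.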
