proc.num includes zombie processes...

  • vanessa
    Member
    • Oct 2024
    • 38

    #1

    Well... I'm checking whether a process, "myprocessd", is running with proc.num[myprocessd].

    The problem is that the count will be 1 or more even if myprocessd is a zombie process. Obviously, I'm only interested in a certain set of states (running, interruptible sleep, etc.), not zombies.

    The workarounds are:
    * a custom script that counts how many functional myprocessd processes there are;
    * an additional item that counts zombie myprocessd processes, which I then use in my trigger logic. I'd have to do this for a ton of items and triggers.

    Is there a sane way of doing this in Zabbix without a lot of silliness?
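    For the first workaround listed above (a custom script counting functional processes), here is a minimal C sketch, assuming Linux and its /proc filesystem. The function name count_live_procs is hypothetical. It treats R (running), S (interruptible sleep) and D (disk sleep) as "functional" and skips zombies and everything else. It only compares the kernel's comm field, which is truncated to 15 characters, so its name matching may not behave exactly like Zabbix's own proc.num matching.

```c
#include <ctype.h>
#include <dirent.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: count processes whose kernel "comm" name equals
 * `name` (NULL matches every process) and whose state is R (running),
 * S (interruptible sleep) or D (uninterruptible/disk sleep).
 * Zombies (Z), stopped/traced (T) and other states are skipped.
 * Returns -1 if /proc cannot be opened (e.g. not Linux). */
int count_live_procs(const char *name)
{
    DIR *dir = opendir("/proc");
    struct dirent *ent;
    int count = 0;

    if (dir == NULL)
        return -1;

    while ((ent = readdir(dir)) != NULL) {
        char path[300], buf[512];
        char *lparen, *rparen;
        FILE *fp;
        size_t n;

        /* Only the numeric entries in /proc are process directories. */
        if (!isdigit((unsigned char)ent->d_name[0]))
            continue;

        snprintf(path, sizeof(path), "/proc/%s/stat", ent->d_name);
        if ((fp = fopen(path, "r")) == NULL)
            continue; /* the process may have exited in the meantime */
        /* The state letter sits near the start, so a short read is enough. */
        n = fread(buf, 1, sizeof(buf) - 1, fp);
        fclose(fp);
        buf[n] = '\0';

        /* Layout is "pid (comm) state ...". comm may itself contain
         * spaces or ')', so use the first '(' and the LAST ')'. */
        lparen = strchr(buf, '(');
        rparen = strrchr(buf, ')');
        if (lparen == NULL || rparen == NULL || rparen < lparen ||
            rparen[1] == '\0' || rparen[2] == '\0')
            continue;

        if (strchr("RSD", rparen[2]) == NULL)
            continue; /* zombie, stopped, or some other non-functional state */

        *rparen = '\0'; /* terminate comm so it can be compared */
        if (name == NULL || strcmp(lparen + 1, name) == 0)
            count++;
    }
    closedir(dir);
    return count;
}
```

    To use it as the script workaround, one could wrap it in a small main() that prints count_live_procs(argv[1]) and expose that through a user parameter, e.g. UserParameter=myprocessd.live,/usr/local/bin/count_live myprocessd (the key and path here are made up for illustration).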
  • ISiroshtan
    Senior Member
    • Nov 2019
    • 324

    #2
    Is it normal for your systems to have zombie processes? I would hope not. And if it is indeed not, try making one per-host check that counts all zombie processes. If it is >0, alert so it gets investigated. It's easy to implement and can be rolled out en masse by adding this rule to an existing, widely used template (like Linux by Zabbix agent, etc.).
    And if it is normal for your systems to have zombies... I think the main issue is not the monitoring.

    Process count is quite a basic check of daemon status. If you need a more in-depth look to ensure that the process still operates as intended, it makes more sense to implement an actual health check of the daemon's operation: doing some reads from the DB, fetching a URL from the web server (with status verification), or invoking some ping-type or health-check-type communication with the process.
    Would it be more work? Yes. Would it yield more reliable results? Also yes. Is it worth it? That depends on how critical the normal operation of the process is.
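    As a rough illustration of the "status verification" part of such a health check, here is a minimal C sketch. It assumes the check has already fetched the first line of the HTTP response by some other means (the fetch itself is not shown), and the function names http_status_code and http_status_ok are hypothetical. It simply treats any 2xx status as healthy.

```c
#include <stdio.h>

/* Extract the numeric status code from an HTTP status line such as
 * "HTTP/1.1 200 OK" or "HTTP/2 204". Returns -1 if it cannot be parsed. */
int http_status_code(const char *status_line)
{
    int major, minor, code;

    if (status_line == NULL)
        return -1;
    /* HTTP/1.x style: "HTTP/<major>.<minor> <code> <reason>" */
    if (sscanf(status_line, "HTTP/%d.%d %3d", &major, &minor, &code) == 3)
        return (code >= 100 && code <= 599) ? code : -1;
    /* Style printed by some clients for HTTP/2: "HTTP/<major> <code>" */
    if (sscanf(status_line, "HTTP/%d %3d", &major, &code) == 2)
        return (code >= 100 && code <= 599) ? code : -1;
    return -1;
}

/* Treat any 2xx response as a healthy daemon. */
int http_status_ok(const char *status_line)
{
    int code = http_status_code(status_line);

    return code >= 200 && code <= 299;
}
```

    Whether redirects (3xx) should count as healthy is a per-service decision; this sketch deliberately does not.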


    • vanessa
      Member
      • Oct 2024
      • 38

      #3
      Originally posted by ISiroshtan
      And if it's normal for your systems to have zombies... I think main issue is not monitoring

      Not quite true. Being able to count normally running processes (excluding zombies) is the norm in every other established monitoring system I know of; the problem IS the monitoring system here. It's a design flaw that doesn't make sense, an oversight rather than a feature of any kind. Counting non-zombie processes has multiple applications: sometimes you literally do want to count the number of processes, and what you suggested ("actual health checks") is sometimes not relevant, applicable, or feasible.


      • cyber
        Senior Member
        Zabbix Certified Specialist, Zabbix Certified Professional
        • Dec 2006
        • 4811

        #4
        Blaming the tool for things you could avoid just by reading the docs...
        proc.num[<name>,<user>,<state>,<cmdline>,<zone>]
        The number of processes. Integer.
        name - process name (default is all processes)
        user - user name (default is all users)
        state (disk and trace options since version 3.4.0) - possible values:
        all (default),
        disk - uninterruptible sleep,
        run - running,
        sleep - interruptible sleep,
        trace - stopped,
        zomb - zombie
        cmdline - filter by command line (it is a regular expression)
        zone - target zone: current (default), all. This parameter is supported on Solaris only.
        What do we see here? The default is to count ALL processes, regardless of state... If you want to count only running ones, use proc.num[httpd,,run]. The fact that you don't like how the default behaviour is defined doesn't make it a design flaw...


        • vanessa
          Member
          • Oct 2024
          • 38

          #5
          Originally posted by cyber
          Accusing tool in things that you can avoid with just reading docs...
          What do we see here? Default is counting ALL, no matter of state... if you want to count only running use proc.num[httpd,,run]. It is not a design flaw, that you don't like how default behaviour is defined...
          I'm not accusing anyone of anything. I'm simply stating things as they are.

          You're just repeating what I said: ALL includes the zombie processes.

          A non-defunct process is normally in one of three states: run, disk, or sleep.
          A defunct process (one that has exited but has not yet been reaped by its parent) stays in the Z state, potentially indefinitely (and that's another problem).

          If you want to check the number of active httpd processes and you have 100 of them running, proc.num[httpd,,run] would return 0 whenever all of them are sleeping while waiting for a connection, or are momentarily waiting for a disk read or similar.
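          To make the arithmetic concrete, here is a toy C function applied to a made-up snapshot (not real data) of ten httpd processes given as state letters: seven in S, one in D, one in R and one zombie. The function name count_in_states is hypothetical; it just counts how many letters fall into a chosen set of states.

```c
#include <stddef.h>
#include <string.h>

/* Count entries in `states` (one state letter per process, as in
 * /proc/<pid>/stat) whose letter appears in the set `wanted`. */
size_t count_in_states(const char *states, const char *wanted)
{
    size_t n = 0;

    for (const char *s = states; *s != '\0'; s++)
        if (strchr(wanted, *s) != NULL)
            n++;
    return n;
}
```

          With the snapshot "SSSSSSSDRZ", the default (all states) counts 10 including the zombie, "R" alone counts 1, and the set "RSD" counts 9, which is the number this thread is actually after.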

          There are legit reasons to check the number of running processes of something.

          I'm trying to find out if there's a better solution than workarounds such as creating two extra items for each process (one counting zombies and one calculated item that combines the counts).
          Last edited by vanessa; 18-12-2024, 02:49.


          • vanessa
            Member
            • Oct 2024
            • 38

            #6
            If anyone stumbles upon this thread while researching: at the time of writing, you can't use agent2 to count only processes that are neither defunct nor halted, i.e. the healthy, active processes, so to speak.

            (I should clarify that this thread is about the Linux agent2.)

            Source code of proc.num[]: https://github.com/zabbix/zabbix/blo...o/linux/proc.c

            First, the state parameter of proc.num[] is read as a single string; it is never parsed as a list or anything you can combine with pipes or the like. That string lets you pick only one state (or all of them at once) to count:

            Code:
            if (NULL == param || '\0' == *param || 0 == strcmp(param, "all"))
                zbx_proc_stat = ZBX_PROC_STAT_ALL;
            else if (0 == strcmp(param, "run"))
                zbx_proc_stat = ZBX_PROC_STAT_RUN;
            else if (0 == strcmp(param, "sleep"))
                zbx_proc_stat = ZBX_PROC_STAT_SLEEP;
            else if (0 == strcmp(param, "zomb"))
                zbx_proc_stat = ZBX_PROC_STAT_ZOMB;
            else if (0 == strcmp(param, "disk"))
                zbx_proc_stat = ZBX_PROC_STAT_DISK;
            else if (0 == strcmp(param, "trace"))
                zbx_proc_stat = ZBX_PROC_STAT_TRACE;
            The processes on the machine are iterated once, and each process is counted (++) unconditionally if the state parameter was "all", or otherwise only if its state is the one you chose:

            Code:
                if (ZBX_PROC_STAT_ALL == zbx_proc_stat)
                    return SUCCEED;
            ....
                    switch (zbx_proc_stat)
                    {
                        case ZBX_PROC_STAT_RUN:
                            return ('R' == *p) ? SUCCEED : FAIL;
                        case ZBX_PROC_STAT_SLEEP:
                            return ('S' == *p) ? SUCCEED : FAIL;
                        case ZBX_PROC_STAT_ZOMB:
                            return ('Z' == *p) ? SUCCEED : FAIL;
                        case ZBX_PROC_STAT_DISK:
                            return ('D' == *p) ? SUCCEED : FAIL;
                        case ZBX_PROC_STAT_TRACE:
                            return ('T' == *p) ? SUCCEED : FAIL;
                        default:
                            return FAIL;
                    }
            Workaround suggestions if you need what I need:
            * Define a user parameter (UserParameter) in zabbix_agent2.conf that runs something in the shell to do the counting.
            * Make a calculated item: create an item for each of the "really running" states (run, sleep, disk) and add them together.
            * Accept the limitation and cross your fingers (depending on how critical proc.num is to you).
            * Patch agent2 and send a pull request. That would take some time: it would need some refactoring so as not to bloat the code and documentation, plus testing, and it might never get approved. I'd do this if I had the time.
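            For the "patch" option, the core idea could look roughly like the C sketch below: parse a comma-separated state list into a bitmask instead of a single enum value, then test each process's state letter against the mask. Everything here is hypothetical (PROC_STATE_*, parse_state_mask, state_matches are not Zabbix identifiers) and it is independent of how the real agent source is organised; it only mirrors the single-state logic quoted above, including "all" counting unconditionally.

```c
#include <string.h>

/* Hypothetical state bits for a multi-state proc.num patch.
 * These are NOT Zabbix identifiers. */
enum {
    PROC_STATE_RUN   = 1 << 0,
    PROC_STATE_SLEEP = 1 << 1,
    PROC_STATE_ZOMB  = 1 << 2,
    PROC_STATE_DISK  = 1 << 3,
    PROC_STATE_TRACE = 1 << 4,
    PROC_STATE_ALL   = (1 << 5) - 1
};

static const struct {
    const char *name;
    unsigned bit;
} state_names[] = {
    {"all", PROC_STATE_ALL},     {"run", PROC_STATE_RUN},
    {"sleep", PROC_STATE_SLEEP}, {"zomb", PROC_STATE_ZOMB},
    {"disk", PROC_STATE_DISK},   {"trace", PROC_STATE_TRACE},
};

/* Parse a comma-separated list such as "run,sleep,disk" into a mask.
 * NULL or "" means all states; an unknown or empty token yields 0. */
unsigned parse_state_mask(const char *param)
{
    unsigned mask = 0;
    const char *p = param;

    if (param == NULL || *param == '\0')
        return PROC_STATE_ALL;

    for (;;) {
        const char *comma = strchr(p, ',');
        size_t len = comma != NULL ? (size_t)(comma - p) : strlen(p);
        unsigned bit = 0;
        size_t i;

        for (i = 0; i < sizeof(state_names) / sizeof(state_names[0]); i++) {
            if (strlen(state_names[i].name) == len &&
                strncmp(p, state_names[i].name, len) == 0) {
                bit = state_names[i].bit;
                break;
            }
        }
        if (bit == 0)
            return 0; /* unknown or empty token: reject the whole parameter */
        mask |= bit;
        if (comma == NULL)
            return mask;
        p = comma + 1;
    }
}

/* Map the state letter from /proc/<pid>/stat to a bit. */
static unsigned state_bit(char c)
{
    switch (c) {
    case 'R': return PROC_STATE_RUN;
    case 'S': return PROC_STATE_SLEEP;
    case 'Z': return PROC_STATE_ZOMB;
    case 'D': return PROC_STATE_DISK;
    case 'T': return PROC_STATE_TRACE;
    default:  return 0;
    }
}

/* Should a process in state `c` be counted for `mask`? */
int state_matches(unsigned mask, char c)
{
    if (mask == PROC_STATE_ALL)
        return 1; /* like "all" in the original: count unconditionally */
    return (state_bit(c) & mask) != 0;
}
```

            One design caveat: a comma-separated value inside an item key parameter would need double quotes (something like proc.num[myprocessd,,"run,sleep,disk"]), which is one argument for picking a different separator in a real patch.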

            I'm not sure what I'm going to do yet, but I might post the solution here.
            Last edited by vanessa; 18-12-2024, 04:11.


            • vanessa
              Member
              • Oct 2024
              • 38

              #7
              Final note: there are special cases where zombie processes may linger and become a critical problem (when they trick the monitoring system into thinking there are active processes running). You most likely do not need what I need here and can go with the default value. My use case applies to complex systems whose parent processes are not doing what they should (reaping their children). Normally, processes running in the background shouldn't leave zombies lingering around.


              • cyber
                Senior Member
                Zabbix Certified Specialist, Zabbix Certified Professional
                • Dec 2006
                • 4811

                #8
                Fine, I misinterpreted it. My mistake, it happens...

                You can always suggest a change (selecting multiple states at once) at https://support.zabbix.com/projects/ZBXNEXT, or consider it a bug and report it as one at https://support.zabbix.com/projects/ZBX.
