Number of running ps = 0 whereas the proc is running

  • Kai-Kai
    Senior Member
    • Apr 2009
    • 142

    #1

    Number of running ps = 0 whereas the proc is running

    Hello all.

    I have a small, strange problem. I use Zabbix to monitor the number of running instances of several processes.

    It works fine for 7 processes out of 8, but not for the last one. The process is running and I can see it in "ps -Af" (Solaris box), but Zabbix reports "0" as the number of running instances.

    I think the only difference is that the other processes show up in ps with a line like this:
    /path/to/procname

    whereas the last one shows up with a line like this:
    /path/to/PR_ocname @/path/to/conf_file @anotherpath

    The process name matches this pattern: XX_xxxx, where XX is uppercase, xxxx is lowercase and _ is a literal underscore.

    Does anybody have an idea why it doesn't report the right number of running processes?
    Last edited by Kai-Kai; 23-07-2009, 17:03. Reason: Problem solved.
  • richlv
    Senior Member
    Zabbix Certified Trainer
    Zabbix Certified Specialist
    Zabbix Certified Professional
    • Oct 2005
    • 3112

    #2
    Any chance that's a daemon written in Perl or a similar language, like amavisd?
    I don't remember for sure whether (or in which version) this was improved, but you could try the <process>/<parameters> syntax, something like "perl,,amavisd". See the key description for the exact syntax.
    Zabbix 3.0 Network Monitoring book


    • Kai-Kai
      Senior Member
      • Apr 2009
      • 142

      #3
      Thanks for your answer.

      The idea is a really interesting one, but I don't think it's a Perl script. I think the problem may be linked to the program's parameters.

      More info:

      bash-3.00# ps -Af | grep NB_dbsrv
      root 762 1 0 Jun 03 ? 11:26 /usr/openv/db/bin/NB_dbsrv @/usr/openv/var/global/server.conf @/usr/openv/var/g...
      root 27936 14717 0 09:41:26 pts/1 0:00 grep NB_dbsrv


      If I run "file" on the program:
      bash-3.00# file /usr/openv/db/bin/NB_dbsrv
      /usr/openv/db/bin/NB_dbsrv: ELF 32-bit MSB executable SPARC Version 1, dynamically linked, stripped


      But in Zabbix :
      Number of running processes NB_dbsrv 08 Jun 09:15:56 0 -

      And the item is :
      Number of running processes NB_dbsrv proc.num[NB_dbsrv]


      • Kai-Kai
        Senior Member
        • Apr 2009
        • 142

        #4
        Does nobody have a new idea?
        Alexei, perhaps?

        Thanks in advance.


        • Calimero
          Senior Member
          • Nov 2006
          • 481

          #5
          If you run "ps" as the zabbix user, can you see the process you're looking for?

          I'm not familiar with Solaris, but maybe there's some sort of isolation (à la GRSEC) where non-privileged users can only see their own processes?


          • Kai-Kai
            Senior Member
            • Apr 2009
            • 142

            #6
            Yes, I can see this process in the list, even when I'm logged in as zabbix.
            This process is one of several belonging to the same application, and Zabbix sees all the other processes of that application.

            So far, the only difference I've noticed is that the other processes look like this:
            /usr/openv/db/bin/process_name

            whereas this one looks like this:
            /usr/openv/db/bin/process_name @/arg1 @/arg2


            :s


            • Kai-Kai
              Senior Member
              • Apr 2009
              • 142

              #7
              Does nobody have an idea?
              I hope Alexei will see this topic and enlighten us! ^^


              • Calimero
                Senior Member
                • Nov 2006
                • 481

                #8
                Hacker's way of problem solving: refer to the most up-to-date documentation, the source!

                Look at the source to see how processes are searched on Solaris: src/libs/zbxsysinfo/solaris/proc.c

                I suspect that the process name (from the system's point of view) contains the arguments, which is why it won't match when zabbix_agentd searches for the process by its basename only. Or something like that.

                On Linux, for example, /proc/<pid>/cmdline contains the command line with its elements separated by the null character. So depending on the exact content, the process name (as seen by Zabbix: the first field) is not always what you'd expect.

                It looks like on Solaris the information is provided through /proc/<pid>/psinfo.
                You should check its content (hexdump -C?) for the various processes you want to monitor and see if there are any significant differences.
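
To make the point about the Linux /proc/<pid>/cmdline format concrete, here is a minimal Python sketch (not Zabbix's actual code). It splits a NUL-separated command line and takes the basename of the first field. The helper names parse_cmdline and first_field_basename are made up for illustration, and the sample bytes are assembled by hand from the ps output earlier in the thread (first argument only), not read from a real /proc entry.

```python
import os
from typing import List


def parse_cmdline(raw: bytes) -> List[str]:
    """Split raw /proc/<pid>/cmdline-style bytes into a list of arguments."""
    # Arguments are separated by NUL bytes and usually end with one,
    # so drop the empty piece that a trailing NUL leaves behind.
    parts = raw.split(b"\0")
    if parts and parts[-1] == b"":
        parts.pop()
    return [p.decode("utf-8", errors="replace") for p in parts]


def first_field_basename(raw: bytes) -> str:
    """Return the basename of the first argument, or "" if there is none."""
    args = parse_cmdline(raw)
    return os.path.basename(args[0]) if args else ""


# Hand-built sample (an assumption for illustration only).
sample = b"/usr/openv/db/bin/NB_dbsrv\0@/usr/openv/var/global/server.conf\0"

if __name__ == "__main__":
    print(parse_cmdline(sample))
    print(first_field_basename(sample))
```

On a Linux box the same function could be fed the bytes of open("/proc/<pid>/cmdline", "rb").read(). Whether the agent looks at this first field or at a separate kernel-recorded name depends on the platform code, so treat this only as a way to inspect what is actually stored.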


                • repudi8or
                  Junior Member
                  • Feb 2009
                  • 1

                  #9
                  OK, so I have been having a similar problem with my Zabbix 1.6.5 agents running on Solaris 8 and Solaris 9, specifically with proc.num[zabbix_agentd] failing.

                  Thanks to Calimero's hint I found the following:

                  $ ps -ef | grep [z]abbix_agentd
                  zabbix 29152 29146 0 10:37:00 ? 0:00 /admin/bin/zabbix_agentd
                  zabbix 29153 29146 0 10:37:00 ? 0:00 /admin/bin/zabbix_agentd
                  zabbix 29151 29146 0 10:37:00 ? 0:00 /admin/bin/zabbix_agentd
                  zabbix 29149 29146 0 10:37:00 ? 0:00 /admin/bin/zabbix_agentd
                  zabbix 29146 1 0 10:37:00 ? 0:00 /admin/bin/zabbix_agentd
                  zabbix 29150 29146 0 10:37:00 ? 0:00 /admin/bin/zabbix_agentd
                  zabbix 29148 29146 0 10:37:00 ? 0:01 /admin/bin/zabbix_agentd
                  zabbix 29147 29146 0 10:37:00 ? 0:00 /admin/bin/zabbix_agentd
                  $ ps -ef | grep [z]abbix_agentd|wc -l
                  8
                  $ /admin/bin/zabbix_agentd -t proc.num[zabbix_agentd]
                  proc.num[zabbix_agentd] [u|0]

                  $ strings /proc/29146/psinfo
                  Jf_,
                  zabbix_agentd.s
                  /admin/bin/zabbix_agentd
                  Jf_,
                  $

                  Hmm... so I wonder...
                  $ /admin/bin/zabbix_agentd -t proc.num[zabbix_agentd.s]
                  proc.num[zabbix_agentd.s] [u|9]

                  OK, so how about the same thing on a Solaris 10 box (which seems to work fine)?
                  sol10host $ /admin/bin/zabbix_agentd -t proc.num[zabbix_agentd]
                  proc.num[zabbix_agentd] [u|9]

                  sol10host $ strings /proc/20346/psinfo
                  zabbix_agentd
                  /admin/bin/zabbix_agentd
                  sol10host $
                  sol10host $ /admin/bin/zabbix_agentd -t proc.num[zabbix_agentd.s]
                  proc.num[zabbix_agentd.s] [u|0]

                  OK, so there is something different between Solaris 10 hosts and older Solaris hosts. I'm guessing it's in the field mapping of the psinfo file done by solaris/proc.c, but as I'm no C coder I will leave this to the devs to figure out.

                  In the meantime, I have discovered that if I change my trigger to use proc.num[zabbix_agentd.s] on all my Solaris 8 and 9 hosts, the counter works correctly.

                  For Solaris versions older than 10, I suggest you check the name of the process you want to count in its corresponding psinfo file using strings, and change your process name to match what you see there until this bug is fixed.

                  Regards Rep
                  Last edited by repudi8or; 22-07-2009, 03:29.
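
For readers without the strings utility at hand, the check suggested above can be roughly approximated in Python. This is a sketch under assumptions: it treats any run of at least 4 printable ASCII bytes as a string (4 being the usual default minimum of strings), the function name extract_strings is made up, and the byte buffer is synthetic. It only mimics the output shown in the transcript and does not reproduce the real psinfo layout.

```python
import re
from typing import List


def extract_strings(data: bytes, min_len: int = 4) -> List[str]:
    """Return runs of printable ASCII at least min_len bytes long."""
    # 0x20-0x7e covers space through tilde; real `strings` may also
    # accept a few other characters, so this is only an approximation.
    pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_len)
    return [m.decode("ascii") for m in pattern.findall(data)]


# Synthetic buffer (an assumption), shaped like the Solaris 8/9 output above.
fake_psinfo = (
    b"\x00\x00\x01Jf_,\x00\x00"
    b"zabbix_agentd.s\x00\x00"
    b"/admin/bin/zabbix_agentd\x00\x02ab\x00"
)

if __name__ == "__main__":
    for s in extract_strings(fake_psinfo):
        print(s)
```

On a Solaris host the input would be the bytes read from /proc/<pid>/psinfo. The short name that appears before the full command line is the candidate to try in proc.num[...].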


                  • Kai-Kai
                    Senior Member
                    • Apr 2009
                    • 142

                    #10
                    YOU GOT IT !!!


                    Thanks for this command: strings /proc/<pid>/psinfo. It was the solution.

                    As I'm not a great C programmer, I tried to read and understand the problem directly in the code, but it was a bit difficult, so I left the problem aside for a while because I was busy with another project.

                    I've just tried your command, and the name of the process in psinfo is:
                    dbsrv9 instead of NB_dbsrv.

                    Thanks a lot, Calimero, for your precious help, and thanks, repudi8or, for this solution.
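
The outcome of the thread can be summed up with a toy model. It assumes, as the results above suggest but the thread never confirms from the source, that the agent counts processes by exact comparison against the short name recorded in psinfo rather than the path shown by ps. The function count_by_name is hypothetical and is not part of Zabbix.

```python
from typing import Iterable


def count_by_name(recorded_names: Iterable[str], wanted: str) -> int:
    """Count entries whose recorded name equals wanted exactly."""
    # Exact comparison is an assumption about the agent's behaviour.
    return sum(1 for name in recorded_names if name == wanted)


# Names as reported via strings /proc/<pid>/psinfo in the thread.
netbackup_names = ["dbsrv9"]
# Nine entries, matching the count the agent reported on Solaris 8/9.
old_solaris_agent_names = ["zabbix_agentd.s"] * 9

if __name__ == "__main__":
    print(count_by_name(netbackup_names, "NB_dbsrv"))
    print(count_by_name(netbackup_names, "dbsrv9"))
    print(count_by_name(old_solaris_agent_names, "zabbix_agentd.s"))
```

Under that assumption, proc.num[NB_dbsrv] finds nothing while proc.num[dbsrv9] would count the running database server, which is consistent with what was observed here.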
