Monitoring multi-process systems

  • KarmaPolice
    Member
    • Oct 2005
    • 95

    #1

    Are there any ideas, methods, or best practices for monitoring processes that can exist in variable numbers on a particular server?

    I know Zabbix excels at whole-system monitoring and single-process monitoring, but I'm really stumped on how to deal with multiple concurrent processes of the same name.

    For example, I have an application which spawns a process (every process has the same binary name but different parameters after that) for every user connection.

    So on my server i might have

    proc1 id=proc@134
    proc1 id=proc@135
    proc1 id=proc@136

    At any time there can be a variable number of users on the server, and therefore a variable number of these processes. The information I'd like to chart/trend is as follows:

    (easy/medium/hard is how difficult i believe this info would be to gather)

    Easy:
    cumulative cpu resources
    cumulative mem resources
    cumulative network resources (thinking of using nethogs to gather this)
    count of processes

    Medium:
    average cpu resources
    average mem resources
    average network resources

    Hard:
    Top 5 process IDs by CPU for a given 1 hour interval
    Recurring per-process monitoring keyed by process and start time, so that each newly started process (i.e. every user's session) gets individually monitored for resource consumption during its session (you can see where this might be handy for a cloud-based application)

    I'm looking at using Zabbix as the basis for my "system" performance monitoring demonstration to our customers (I'm the vendor for the above-mentioned software), and if I can capture these metrics I'm sold... so any help you can offer would be awesome...
  • nelsonab
    Senior Member
    Zabbix Certified Specialist, Zabbix Certified Professional
    • Sep 2006
    • 1233

    #2
    You can use Zabbix to handle most of that. You will need to approach it by abstracting the work into small scripts, each of which returns a discrete value based on an input parameter. For instance, if you're looking for the number of processes, you might use something like this in your script:
    ps ax | grep -E "proc1 id=proc@[0-9]+" | wc -l
    to show the total number. (You may not need the [0-9]+ part of the regex.)

    From there you can then build your Zabbix monitoring.
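To make the "small scripts" idea a bit more concrete, here is a rough sketch of one that returns the count plus cumulative and average CPU/memory for all matching processes. The function name summarise_procs and the output layout are made up for illustration. It assumes a Linux-style ps that supports the pcpu, rss and args columns, and note that pcpu from ps is averaged over the process lifetime rather than being an instantaneous reading.

```shell
#!/bin/sh
# Hypothetical helper (not part of Zabbix): summarise processes whose
# command line matches an extended regex.
# stdin:  lines of "<pcpu> <rss_kb> <command line>", e.g. from
#           ps -eo pcpu= -o rss= -o args=
# stdout: "count total_cpu total_rss_kb avg_cpu avg_rss_kb"
summarise_procs() {
    pattern=$1
    awk -v pat="$pattern" '
        {
            # Rebuild the command line from field 3 onwards.
            cmd = ""
            for (i = 3; i <= NF; i++) cmd = cmd (i > 3 ? " " : "") $i
            if (cmd ~ pat) { n++; cpu_sum += $1; rss_sum += $2 }
        }
        END {
            if (n == 0) { print "0 0 0 0 0"; exit }
            printf "%d %.1f %d %.1f %.1f\n", n, cpu_sum, rss_sum, cpu_sum / n, rss_sum / n
        }'
}

# Example use (anchoring the pattern with ^ keeps awk from matching itself):
#   ps -eo pcpu= -o rss= -o args= | summarise_procs '^proc1 id=proc@[0-9]+'
```

Each field of that output line could then be split out into its own numeric item, one value per script call, which is the shape Zabbix wants.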
    RHCE, author of zbxapi
    Ansible, the missing piece (Zabconf 2017): https://www.youtube.com/watch?v=R5T9NidjjDE
    Zabbix and SNMP on Linux (Zabconf 2015): https://www.youtube.com/watch?v=98PEHpLFVHM

    • qix
      Senior Member
      Zabbix Certified Specialist, Zabbix Certified Professional
      • Oct 2006
      • 423

      #3
      Also, the proc.num item accepts a command-line filter, so you only get results for processes matching the specified command line. This is very useful for Java apps, for instance. See the manual for details.
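For the proc1 example from the first post, the item key might look roughly like the one below. This assumes the key layout proc.num[<name>,<user>,<state>,<cmdline>], where the last parameter is matched against the command line; the "id=proc@" filter is just taken from the sample process list, so adjust it to whatever your real command lines look like.

```
proc.num[proc1,,,"id=proc@"]
```

The empty second and third parameters mean "any user" and "any state".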
      With kind regards,

      Raymond

      • kewan
        Member
        Zabbix Certified Specialist
        • Apr 2011
        • 33

        #4
        Along the lines of proc.num, there is also proc.mem, with similar parameters.
        I would also look at system.stat.
        For everything else, you could use either a UserParameter or system.run. You could feed both the output of something like "top -bn1 | head..."
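As a sketch of the kind of script a UserParameter or system.run item could call for the "top 5 by CPU" case, something along these lines might do. The function name top_cpu_pids is invented here, and it uses ps rather than top because the ps output is easier to parse. It only gives a snapshot at the moment it runs; the one-hour view would have to come from the history Zabbix stores.

```shell
#!/bin/sh
# Hypothetical helper (not part of Zabbix): list the N busiest matching
# processes by CPU.
# stdin:  lines of "<pid> <pcpu> <command line>", e.g. from
#           ps -eo pid= -o pcpu= -o args=
# stdout: up to N lines of "<pid> <pcpu>", highest CPU first (default N=5)
top_cpu_pids() {
    pattern=$1
    limit=${2:-5}
    awk -v pat="$pattern" '
        {
            # Rebuild the command line from field 3 onwards.
            cmd = ""
            for (i = 3; i <= NF; i++) cmd = cmd (i > 3 ? " " : "") $i
            if (cmd ~ pat) print $1, $2
        }' |
    sort -k2,2nr |
    head -n "$limit"
}

# Example use:
#   ps -eo pid= -o pcpu= -o args= | top_cpu_pids '^proc1 id=proc@[0-9]+' 5
```

Since this returns several lines, it would have to be stored as a text item rather than a numeric one.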

        cheers

        Stefano

        • KarmaPolice
          Member
          • Oct 2005
          • 95

          #5
          thanks for the replies thus far... i'll start looking into that stuff... let's go one step further into crazy territory...

          what if i wanted to monitor/store/track EVERY process of a given name on a server for individual cpu/mem utilization?

          has anyone done something like this?...

          there are obviously a lot of complexities to this such as:

          A) variable number of processes would mean variable number of counters
          B) processes start and stop and so you'd need a counter to be created, tracked, then put in some kind of "dead" mode so that the metrics were still viewable but were no longer captured
          C) you'd have to dynamically create the counters as the processes were spawned...
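One possible angle on A to C, assuming a Zabbix version that supports low-level discovery with custom rules: a script reports the currently running processes as JSON, and Zabbix creates items from prototypes for each one it finds. The sketch below only covers the discovery side. The function name discovery_json and the {#PID} macro name are made up for illustration, and the per-PID checks behind the item prototypes would still have to be written separately.

```shell
#!/bin/sh
# Hypothetical discovery helper: emit matching PIDs as discovery JSON.
# stdin:  lines of "<pid> <command line>", e.g. from
#           ps -eo pid= -o args=
# stdout: {"data":[{"{#PID}":"134"},{"{#PID}":"135"}]}
discovery_json() {
    pattern=$1
    awk -v pat="$pattern" '
        BEGIN { printf "{\"data\":[" }
        {
            # Rebuild the command line from field 2 onwards.
            cmd = ""
            for (i = 2; i <= NF; i++) cmd = cmd (i > 2 ? " " : "") $i
            if (cmd ~ pat) {
                # Comma-separate every entry after the first.
                printf "%s{\"{#PID}\":\"%s\"}", (found++ ? "," : ""), $1
            }
        }
        END { print "]}" }'
}

# Example use:
#   ps -eo pid= -o args= | discovery_json '^proc1 id=proc@[0-9]+'
```

PIDs get reused by the OS, so a value such as the id= parameter or the start time may make a safer key than the PID alone. For point B, discovery rules have a setting for how long to keep items whose process is no longer found, which is roughly the "dead" mode described above.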

          obviously very difficult if at all possible... but figured i'd ask if anyone has done something like this.

          Thanks
