Zabbix architecture for multiple process monitoring in a large environment

  • Sa2015
    Junior Member
    • Jan 2015
    • 2

    #1

    Hello, everyone.

    I have a fairly large environment with 30k+ processes running. Instead of monitoring each physical machine's status (which is already done separately), I would like to set up a separate Zabbix instance to monitor each process so that failures can be recovered (manually) as soon as possible.

    The problem I am stuck on is that several processes run on the same machine, and I would like to monitor each of them using the Zabbix agent. As far as I understand, I would need to set up one Zabbix agent per monitored process, which would be impractical because I want to keep the load on the physical machines low.
    (**Utilizing Zabbix agent is not a requirement, though)
    Also, I would like to register each process on the Zabbix server as a separate host, so I cannot just put one Zabbix agent per machine and let them deal with checking all the processes within that machine.

    Is there any way that I can solve this problem?
    (Plugin development is OK, but one requirement that I have for my environment is that I should use the default packages for agent and server)

    My requirements essentially are
    - Utilize Zabbix default packages
    - Register each process in the Zabbix server as a different host;
    - The monitoring data should flow from Zabbix agent (or something with an analogous role) to the Zabbix server; and
    - The interval between checks does not have to be very short; a few minutes (<5min) is acceptable.

    (By the way, the Zabbix version I'm using is 2.2.)

    Any ideas and comments are welcome.
    Thank you very much!
    Last edited by Sa2015; 08-01-2015, 09:39. Reason: grammar
  • bagni
    Senior Member
    Zabbix Certified Specialist
    • Mar 2012
    • 164

    #2
    Hi,
    If you have 30K items with a 5-minute polling interval, that means a theoretical 100 NVPS (new values per second; 30,000 items / 300 s), so the Zabbix server will not be under much pressure.
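
The arithmetic behind that estimate, as a rough sketch. The function name is made up for illustration, and it assumes every item is polled exactly once per interval:

```python
def new_values_per_second(item_count: int, interval_seconds: float) -> float:
    """Average incoming value rate if every item reports once per interval."""
    if interval_seconds <= 0:
        raise ValueError("interval_seconds must be positive")
    return item_count / interval_seconds


# 30,000 items polled every 5 minutes (300 seconds)
print(new_values_per_second(30_000, 300))  # 100.0
```

Collecting several values per process (for example both memory and process count) multiplies the rate accordingly.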

    You don't need to set up a Zabbix agent for each process, only one per host.
    Each host would have an item for each process. You could use:
    • proc.mem[<name>,<user>,<mode>,<cmdline>]
    • proc.num[<name>,<user>,<state>,<cmdline>]

    to monitor the memory usage or the presence of a process.

    Configuring 30K items is time consuming, so you could consider using LLD (low-level discovery) with item/trigger prototypes.
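
A custom LLD rule expects the discovery item to return JSON with a top-level "data" array of macro/value objects. Below is a minimal sketch of a script that could produce that output. The macro name {#PROCNAME} and the list of process names are assumptions for illustration; how the list is obtained depends on your environment.

```python
import json


def build_lld_payload(process_names):
    """Return the LLD JSON document for the given process names.

    Duplicates are dropped and names are sorted so the output is stable.
    {#PROCNAME} is a hypothetical macro name chosen for this example.
    """
    data = [{"{#PROCNAME}": name} for name in sorted(set(process_names))]
    return json.dumps({"data": data})


if __name__ == "__main__":
    # Hypothetical process list; replace with whatever enumerates your processes.
    print(build_lld_payload(["sql.exe", "cmd.exe"]))
```

Such a script would typically be exposed to the agent through a UserParameter line in the agent configuration, and an item prototype such as proc.num[{#PROCNAME}] could then create one item per discovered process. If the agent does not know the discovery key, or the script prints something that is not valid JSON, the rule is likely to end up as not supported.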

    • timbo
      Member
      Zabbix Certified Specialist, Zabbix Certified Professional
      • Sep 2013
      • 50

      #3
      In a typical implementation you would have each process set up as an Item under a Host (typically a physical/virtual machine), as bagni suggests. In this instance I imagine LLD would significantly automate the creation of Items (though I haven't used LLD myself yet). You will need an Agent on the server to use LLD (I believe). In this particular case I would avoid installing multiple agents on the one server.

      But if your requirements stipulate that each process needs to be its own Host, you could use the Zabbix API to automate the creation of the hosts (though the API can be rather tricky).
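
For the API route, a host is created with a JSON-RPC call to the host.create method. This is only a sketch of what the request body might look like: the group ID, the token, and the placeholder agent interface are assumptions, and actually sending it (an HTTP POST to the frontend's api_jsonrpc.php) is left out.

```python
import json


def host_create_request(host_name, group_id, auth_token, request_id=1):
    """Build a JSON-RPC request body for the Zabbix API method host.create.

    group_id and auth_token are placeholders: the token would come from a
    prior user.login call and the group must already exist on the server.
    """
    return {
        "jsonrpc": "2.0",
        "method": "host.create",
        "params": {
            "host": host_name,
            # Placeholder agent interface; the address is never polled
            # if the host only has trapper items.
            "interfaces": [
                {"type": 1, "main": 1, "useip": 1,
                 "ip": "127.0.0.1", "dns": "", "port": "10050"}
            ],
            "groups": [{"groupid": group_id}],
        },
        "auth": auth_token,
        "id": request_id,
    }


if __name__ == "__main__":
    # Hypothetical values for illustration only.
    print(json.dumps(host_create_request("cmdexe", "2", "<token>"), indent=2))
```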

      You do not need to use the Agent to send Process data to the Zabbix server. You could script something with the Zabbix Sender:


      The Zabbix Sender allows you to specify which Host and Item should receive which data regarding any particular process.
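
As a rough illustration, a monitoring script could wrap the zabbix_sender command line like this. The server name, host name and item key are made up; the target item must exist on that host as a Zabbix trapper item, and zabbix_sender must be installed where the script runs.

```python
import subprocess


def sender_command(server, host, key, value, binary="zabbix_sender"):
    """Build the argument list for a single-value zabbix_sender call.

    -z: Zabbix server, -s: host name as registered in Zabbix,
    -k: item key, -o: value to send.
    """
    return [binary, "-z", server, "-s", host, "-k", key, "-o", str(value)]


def send_value(server, host, key, value):
    """Run zabbix_sender and report whether it exited successfully."""
    result = subprocess.run(
        sender_command(server, host, key, value),
        capture_output=True,
        text=True,
    )
    return result.returncode == 0


if __name__ == "__main__":
    # Hypothetical names: report that the process modelled as host "cmdexe" is up.
    print(sender_command("zabbix.example.com", "cmdexe", "processUp", 1))
```

Passing the arguments as a list avoids shell quoting problems with keys that contain brackets or quotes.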

      Potential Steps:
      Create Hosts:
      - Manually via Web Frontend
      - Automated via API
      - Network discovery (will find machines on the network, would not create Hosts as Process names)

      Create Items:
      - Manually via Web Frontend
      - Automated via API
      - LLD (If you use Zabbix Agents)

      Collect data and send to Zabbix:
      - Agent installed on servers (see bagni's post)
      - Script in conjunction with Zabbix Sender

      Recovering the process:
      - Manually, as you have said is fine
      - For automated recovery you would need the Zabbix Agent installed, and you would need to enable "EnableRemoteCommands" in the Agent config file. When Zabbix Server detects a specific Process has failed it can send the Agent a command to kill/restart the Process (say by executing a script). But if you're happy manually restarting the Process, and you're happy with using Zabbix Sender, then there is no need to install the Zabbix Agent.


      This is all theoretical, so please do your own research. (I also don't have access to my Zabbix install while I'm typing this)


      So, maybe you can manually set up a few Hosts/Items first to test the waters:
      Typical setup:
      1. Create a Host: Server1
      2. Create Items in that Host for each Process and Data Type you want to monitor. E.g.

      Active/Passive Items (using Zabbix Agent):
      Item Name | Item Key | Value
      cmdexeMem | proc.mem[process] | 23452
      cmdexeNum | proc.num[process] | 342

      sqlexeMem | proc.mem[process] | 23452
      sqlexeNum | proc.num[process] | 342

      Zabbix Trapper Items (using Zabbix Sender):
      Item Name | Item Key | Value
      cmdexeUp | process[cmd.exe,'up'] | 1
      cmdexeUptime | process[cmd.exe,'uptime'] | 3600
      cmdexePID | process[cmd.exe,'pid'] | 954

      sqlexeUp | process[sql.exe,'up'] | 1
      sqlexeUptime | process[sql.exe,'uptime'] | 3600
      sqlexePID | process[sql.exe,'pid'] | 954

      OR

      Non-Typical Setup:
      1. Create a Host: cmdexe
      2. Create Items within that Host for each Data Type you'd like to collect.

      Zabbix Trapper Items (using Zabbix Sender):
      Item Name | Item Key | Value
      Up | process['up'] | 1
      Uptime | process['uptime'] | 3600
      PID | process['pid'] | 954

      OR (simpler Item Keys)

      Item Name | Item Key | Value
      Up | processUp | 1
      Uptime | processUptime | 3600
      PID | processPID | 954


      I honestly think the "typical" method is cleaner and easier to administer (i.e. create a Host for each Server, then use LLD to create an Item per process you want monitored in each Host).


      First stage done!

      With the above done, hopefully you're now collecting data about the status of your processes. You should be able to eyeball these and react manually. But you'd obviously rather people be notified when there's a problem rather than stare at the Zabbix Dashboard all day.

      That's when you need a Trigger to fire when an Item matches specific criteria (e.g. the Process is down). This Trigger can fire off an "Action". The Action can be an Email, SMS, or a Zabbix Agent "RemoteCommand" (allowing you to automate the recovery of the Process).
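
To make the trigger idea concrete, here is a small sketch that formats trigger expressions in the {host:key.function(param)} style used by Zabbix 2.2. The host and key names are examples only, and the thresholds are assumptions you would tune.

```python
def process_down_expression(host, key):
    """Trigger expression that fires when the last value is below 1.

    Intended for items such as proc.num[...] or a trapper "up" flag.
    """
    return "{%s:%s.last(0)}<1" % (host, key)


def no_data_expression(host, key, seconds):
    """Trigger expression that fires when no value arrived for `seconds`.

    Useful with trapper items, where a dead sender script simply stops
    sending instead of reporting 0.
    """
    return "{%s:%s.nodata(%d)}=1" % (host, key, seconds)


if __name__ == "__main__":
    print(process_down_expression("Server1", "proc.num[sql.exe]"))
    print(no_data_expression("cmdexe", "processUp", 600))
```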

      Though as you mentioned you'd be happy manually recovering these processes, you'd probably be happy with something as simple as an Email notification.

      Hope this helps!

      -Timbo
      Last edited by timbo; 09-01-2015, 05:18. Reason: Clarification

      • Sa2015
        Junior Member
        • Jan 2015
        • 2

        #4
        Thank you very much bagni and timbo for your ideas!
        Your comments sure gave me a lot of food for thought! Thank you!

        Actually I already have an automated way of creating the necessary host and item information for each process in the server using the API, but the architecture is as I said in my first post -- 1 process is modeled in the Zabbix server as 1 host... And that doesn't work with Zabbix agent unless I start one agent per process.
        Also, if possible I would like to get rid of the host and item generation code that I wrote, because even though it works fine, the import takes a lot of time. But if that's not possible, then I guess I can live with it.


        So, right now what I'm trying to do is to re-design my architecture to model 1 host as 1 physical server and use LLD, but my custom LLD code doesn't seem to work well.
        When an agent self-registers, it is added as a host and the template with my custom LLD rules is applied. After several seconds, the discovery rules it inherits from the template show as "Not supported (ZBX_NOTSUPPORTED)"... And that's where I'm stuck now.

        But I think that since I already got the code for host creation, as timbo suggested I will try my way out with Zabbix sender instead of agent... Especially because I need to be able to run scripts on each process separately (sometimes even if it is not down), and that would only be possible on my current architecture (1 process = 1 host). If I change it to 1 host = 1 physical machine, I guess that my processes will show only on the trigger window when they're down...

        Anyways, I'm still working on it! Thanks a lot for the ideas, they were really helpful!
        And if you have new thoughts on what I wrote I would be very happy to hear them. Thank you so much!
