I've used Nagios in the past, but have never been fabulously taken with its configuration - templates seem to be much better in v2 than v1, but it still seems just a tad hacky :-)
So, Google led me to Zabbix and I was amazed I hadn't seen it mentioned before. I've not actually tried Zabbix yet; I'm just evaluating its suitability for managing a few hundred servers across the net (one machine per site).
First off, my understanding of Zabbix (the docs don't seem to give a summary of how it works) is that you have a central server which polls agentd (or agent) on each client. agentd can also send out updates from "active checks". How do you specify which checks are active (there seems to be a global parameter in agentd.conf)? This is one thing that I feel could do with a little more expansion in the docs: "here is the architecture, how the pieces fit together and what possibilities this gives you".
Secondly, there seems to be no authentication or encryption of traffic between servers and agents. I could use an ssh tunnel or a VPN, but I wondered if there were any plans to tighten this?
I presume triggers are implemented on the server? I have a slight concern about the load imposed on the server given how many machines we might end up monitoring. One thing I'd love to do is to have a client monitor e.g. Apache response time and, if it stayed too large for several minutes, alert the central server. Perhaps I'll have to run a Zabbix server on each client machine to do the basic monitoring and then get the trigger to send an alert back to a central server? So the central server would basically just monitor alerts; for historical information etc. you would jump onto a Zabbix server running on an "edge" machine.
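To make the "too large for several minutes" idea concrete, here is a rough Python sketch of the client-side logic I have in mind. It is not Zabbix code and uses no Zabbix API; the class name, the 2-second limit and the 3-sample window are all made up for illustration.

```python
from collections import deque


class SustainedThreshold:
    """Hypothetical helper: fires only when every sample in a full window exceeds the limit."""

    def __init__(self, limit_seconds, window_size):
        self.limit = limit_seconds
        # Keep only the most recent `window_size` samples.
        self.samples = deque(maxlen=window_size)

    def add(self, response_time):
        """Record one response time; return True if an alert should be raised."""
        self.samples.append(response_time)
        window_full = len(self.samples) == self.samples.maxlen
        return window_full and all(s > self.limit for s in self.samples)


# Assumed values: alert when 3 consecutive samples are slower than 2.0 seconds.
detector = SustainedThreshold(limit_seconds=2.0, window_size=3)
results = [detector.add(t) for t in [2.5, 3.0, 1.0, 2.5, 2.6, 2.7]]
```

With one sample per minute, a single slow response would not raise anything; only the last sample above, which completes three slow readings in a row, would cause the client to contact the central server.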
Most checks will probably be external scripts, and I'm happy to distribute these scripts out to the clients. However, I wondered how easy it is to add new checks to all clients (I've not played with the GUI yet) - presumably the worst case is a manual tweak to the DB.
I suppose my biggest concern ATM is that we are going to find that we need some of the flexibility/complexity of Nagios - e.g. it seems that there is an assumption that all keys have the same properties, whereas I need e.g. CPU load and "is the RAID array okay" to be monitored on different timescales.
Looks like it's time to go, play and test :-) Thanks for listening!
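By "different timescales" I mean something like the following rough Python sketch, where each check carries its own interval. Again this is not how Zabbix does it (I don't know yet); the check names and the 30/60 second intervals are just assumptions for the example.

```python
import heapq


def schedule(intervals, horizon):
    """Return (time, check) pairs for every run due within [0, horizon] seconds.

    `intervals` maps a hypothetical check name to its own positive period in seconds.
    """
    # Every check is first due at time 0; the heap keeps the earliest run on top.
    queue = [(0, name) for name in sorted(intervals)]
    heapq.heapify(queue)
    due = []
    while queue and queue[0][0] <= horizon:
        when, name = heapq.heappop(queue)
        due.append((when, name))
        # Re-queue the check using its own interval, not a global one.
        heapq.heappush(queue, (when + intervals[name], name))
    return due


# Assumed intervals: CPU load every 30 seconds, RAID status every 60 seconds.
runs = schedule({"cpu_load": 30, "raid_ok": 60}, horizon=60)
```

Within the first minute the CPU check runs three times (at 0, 30 and 60 seconds) and the RAID check twice, which is the kind of per-check control I'd want.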
Adrian