I am a senior system/network engineer working in the USA, and I have just finished adapting Zabbix to my needs (monitoring a few small data centers).
For background: I maintain my own network monitoring tool, snmpstat.sf.net, and use it as a management portal that integrates different systems, so Zabbix became one (very important) part of it. I have used CA Unicenter, ProactiveNet, AppManager, nagios, mrtg and so on, so I have something to compare it with.
Zabbix is deployed (I am lazy) from your Virtual Appliance (99% of our systems are virtual), with an X11 GUI added and a few bugs fixed (a forgotten dom module, a wrong database setting, a separate data disk for all data and databases, and so on). It has been in production for a few weeks, and a few other Zabbix deployments exist in our company (one in another data center and a few in the corporate network).
So, my overall impression:
- Overall quality - EXCELLENT product.
- Configuration - very good, with a few drawbacks.
- Performance charting - very good.
- Presentation - so-so: no overall picture (I'll show what I mean) and it is difficult to see system status; the rest is good.
- System performance - more or less suitable, with a few minor exceptions (and remember, this is a VM which now monitors 100+ servers and can definitely monitor many more).
- Network monitoring - unusable for me after snmpstat (so it will be just a good addition). So I did not test the SNMP component.
I am really impressed by this system; it is much better than most open-source behemoths I have seen.
Strong sides:
- uses an agent!
- web based (no config files);
- templates;
- multi-severity alerts and escalations;
- integrated performance and alerting (some dumb systems treat these as 2 different things, making life very interesting);
- excellent charts, screens, macros.
Now, what is weak.
1) Virtual Appliance - good idea, bad implementation.
1.1) It must be distributed as an OVF appliance, not as VM disk images.
1.2) It must be well configured. I improved its speed 3x by recreating the database, and fixed import/export by adding packages (in addition, I added the full X11 stack because that is my requirement - any Linux must have X11 installed).
1.3) I love openSUSE, but maybe CentOS 5.4 is more practical for the VA purpose. The format must be, AGAIN, OVF (it is zipped, uses an open standard, and you can install the VM by importing it directly over http).
1.4) More and more systems are virtual today; zabbix can be used as a virtual system in production installations. Maybe it should be physical starting from 500 - 1000 monitored hosts, but it can definitely monitor 100 - 400 hosts running as a VM (with a lot of items, not empty lists!).
2) Configuration.
2.1) The Application concept is not well developed. It could be much better. How can I create a group of items and triggers for a file system, and then replicate them for '/', '/usr', '/var' (for example)? The same goes for processes and many other things. It all requires manual replication.
2.2) Item copy should copy triggers as well.
2.3) Item cloning should clone the related triggers as well (or at least allow it).
2.4) Full template clone should not link the new template to the same hosts as the old one (this sometimes creates a huge disaster).
2.5) The item data type should be more obvious - right now it is quite an adventure to create an item for disk IO monitoring and then guess whether the data are integer or float...
2.6) I can use $1 in an item name, but why can't I use $1 in a trigger name?? It creates a lot of fun.
2.7) There is no way to find all triggers with the same or a similar name, except by some tricks in the search filters.
2.8) Trigger dependencies - I can't set up a dependency for a group.
2.9) Template usage - I can change the trigger severity when a trigger is templated; can I change the trigger threshold number? Ideally I should specify: Application: FileSystem(/usr,Warning=15,Alert=1) and it should generate items for /usr with triggers from the template, with the Warning trigger at 15% and the Alert trigger at 1% (for example).
2.10) There is no way to get a list of objects (processes, file systems) from the agent and then configure items based on this list.
2.11) Can I add the last key value to the Subject of a notification? (It is not obvious how.)
2.12) Can I add a link to the chart for the trigger key value? (Extremely useful - when you get 'Cpu high', you also get a link to the cpu chart.)
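To make the request in 2.1 and 2.9 more concrete, here is a rough Python sketch of the expansion I have in mind. It is not Zabbix code: parse_spec, expand_filesystem and the Trigger class are hypothetical names, and the item key and trigger expression strings are only illustrative of the agent's vfs.fs.size key and the trigger syntax as I understand them.

```python
import re
from dataclasses import dataclass


@dataclass
class Trigger:
    name: str
    severity: str
    expression: str


def parse_spec(spec):
    """Parse 'FileSystem(/usr,Warning=15,Alert=1)' into (app, target, thresholds)."""
    m = re.fullmatch(r"(\w+)\(([^,()]+)((?:,\w+=\d+(?:\.\d+)?)*)\)", spec.strip())
    if m is None:
        raise ValueError(f"bad spec: {spec!r}")
    app, target, rest = m.groups()
    thresholds = {}
    # rest looks like ',Warning=15,Alert=1'; drop the empty first piece
    for part in filter(None, rest.split(",")):
        name, value = part.split("=")
        thresholds[name] = float(value)
    return app, target, thresholds


def expand_filesystem(host, spec):
    """Generate one item key and one trigger per severity for a file system.

    The key and expression formats are assumptions made for illustration.
    """
    app, target, thresholds = parse_spec(spec)
    if app != "FileSystem":
        raise ValueError(f"unsupported application: {app}")
    key = f"vfs.fs.size[{target},pfree]"  # percentage of free space
    triggers = [
        Trigger(
            name=f"Free space on {target} below {limit:g}%",
            severity=severity,
            expression=f"{{{host}:{key}.last(0)}}<{limit:g}",
        )
        for severity, limit in thresholds.items()
    ]
    return key, triggers


if __name__ == "__main__":
    # Replicate the same application for several file systems on one host.
    for fs in ("/", "/usr", "/var"):
        key, triggers = expand_filesystem("web01", f"FileSystem({fs},Warning=15,Alert=1)")
        print(key)
        for t in triggers:
            print("  ", t.severity, t.expression)
```

The point is that one line per file system would replace all the manual copying of items and triggers; the list of file systems could come from the agent (see 2.10) instead of being typed by hand.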
3) Presentation layer.
It does not do what it should. You can see events, but you can't see objects and their status until they really fail.
Here is what I mean. I would like to see something like the snmpstatd ACTIVE, TOTAL or ERROR screens - they show all objects (just all, or all active, or all with problems) and show the current status, such as traffic, errors, cpu and so on, in toolbars. See the example - in snmpstat I can see the list of objects and their status, and if something is wrong I can see the reason, but in zabbix I can see only a color and a number (I can't click on a group and expand it into a list of hosts with compact cpu, memory and disk op charts).
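As a rough illustration of the view I am asking for, here is a small Python sketch of a group overview that can be expanded into its hosts. Everything in it is an assumption for illustration: the status names, the host dictionaries and the group_overview function are hypothetical and have nothing to do with how zabbix or snmpstat store their data.

```python
from collections import defaultdict

# Hypothetical status levels, ordered from best to worst.
SEVERITY_ORDER = ["ok", "warning", "error"]


def group_overview(hosts):
    """Roll host statuses up to their groups, keeping the host list for drill-down.

    hosts: iterable of dicts with at least 'group', 'name' and 'status' keys.
    """
    groups = defaultdict(list)
    for host in hosts:
        groups[host["group"]].append(host)

    overview = {}
    for group, members in groups.items():
        # The group shows the worst status found among its hosts.
        worst = max(members, key=lambda h: SEVERITY_ORDER.index(h["status"]))["status"]
        overview[group] = {
            "status": worst,
            "hosts": sorted(members, key=lambda h: h["name"]),
        }
    return overview


if __name__ == "__main__":
    sample = [
        {"group": "web", "name": "web02", "status": "ok", "cpu": 12},
        {"group": "web", "name": "web01", "status": "error", "cpu": 97},
        {"group": "db", "name": "db01", "status": "warning", "cpu": 60},
    ]
    for group, info in group_overview(sample).items():
        print(group, info["status"])
        for host in info["hosts"]:
            print("   ", host["name"], host["status"], f"cpu={host['cpu']}%")
```

The top level gives the color per group, and expanding a group gives every host with its current values, not only the hosts that have already failed.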
This is only a first impression (after a few weeks of work with zabbix), and not well organized. I'll try to write a more detailed report after some time, maybe using the very latest versions to make it more useful (the problem is - it WORKS, and works well after all, so I don't have a strong reason to upgrade immediately). I am not sure I'll have time to work with its insides (my dream is to integrate my management portal with it, with a common user list, permissions and alerts, and to see the same Active view as I have for the network).
Alexei Roudnev
San Francisco Bay Area
California, USA