New user coming from Nagios/Cacti

  • clahti
    Senior Member
    • Jan 2007
    • 126

    #1

    New user coming from Nagios/Cacti

    Hello all:

    Sorry this is a bit long-winded... I have been a Nagios user for a long time and am (shiny) brand new to Zabbix. I will try not to bore you with Nagios implementation details, but some things about Zabbix are not patently obvious to someone who is a pretty hard-core Nagios type. Zabbix looks pretty awesome from a frontend and visualization standpoint and seems to bring all the functionality of Nagios, PerfParse, Cacti and probably a few more things together under one system. I am trying to map certain concepts from Nagios to Zabbix; here are my questions:
    • Service Checks - Nagios is great for monitoring service availability, like dns, mail, ftp, you name it. The check is performed from a client perspective via a plugin so that even if the service is running on the host, if the nagios plugin can't interact with the service then it returns warning or critical. Zabbix seems geared toward getting metrics *from* individual hosts like disk, cpu, and if mysql or apache is running, but how do you approach Zabbix to say "check to see if this website is available on that remote host"?
    • Host Checks - Nagios checks the availability of a host when a "service" goes unavailable, and if the host is down it suppresses notifications for all the services associated with that host until it comes back. If I set up Zabbix triggers for a bunch of things on a host, will I get them all if the host itself is down? Nagios can check host level items like disk space, cpu, etc. but the *easy* performance metrics and graphing are nonexistent. Zabbix seems to do this fantastically.
    • Parent/Child Hierarchies - when defining a well planned Nagios installation you define parent child relationships between devices, i.e. a server is behind a switch so if the switch goes down don't check or alert on the server until the switch comes back. Service checks can also have dependencies, i.e. a web server uses a mysql backend on another server, so if that mysql is down don't check or alert on the web service until the mysql comes back...
    • Nagios Plugins - there is a great mindshare of plugins for Nagios that do everything under the sun. Rather than roll new solutions is there any way to reuse the nagios plugins to perform checks? For example I have a dns server, and I want to query it for a particular record and return warning if a response time is "x" threshold, critical if response time is "y" threshold, or critical if the record is not found or does not match a predefined value. The check_dns plugin can do all that, and that is actually a minor example.
    • Notification and Escalation - Nagios has escalations based on conditions. For example, check_dns 5 times over a period of 5 minutes before alerting. If an alert condition is still present follow this chain, alert sysadmins group. If alert goes unacknowledged for 1 hour then alert sysadmins AND 2nd level support. Does Zabbix have this kind of escalation?
    • Reports - How do you create reports in Zabbix? For example I have a weekly rollup report, which I kick back to management, that shows a list of hosts and their associated services. It just has columns for the various states that the host or service was in over a dd/dd/dddd - dd/dd/dddd period. The columns are percentages of up, down, warning, or critical states. I cannot see how to make something analogous in Zabbix.
    • Zabbix Templates - So I create a host and assign it the Unix_t template, which associates a bunch of items with that host. Some are applicable, some are not. How do I delete the ones that are not? For instance, not every host runs an imap server. Once I create a host with a set of items I like, can I clone that entry? This is pretty easy to do in Nagios using Monarch.


    Nagios in and of itself is reasonably hard to configure, so I would expect any capable tool (including Zabbix) to have some learning curve. I really like Zabbix so far since it goes beyond point-in-time monitoring and handles metrics over time very well. Cacti can do this as a companion to Nagios, but it seems to me that your monitoring and metric gathering should happen at the same time to reduce network load, and your web frontend looks great too! I would like to replace my Nagios/PerfParse/Cacti with Zabbix if the above functionality can be reasonably remapped. I am sure I will have a whole bunch more questions once I give this more thought.

    Thanks for your time.

    /Chris
  • Alexei
    Founder, CEO
    Zabbix Certified Trainer
    Zabbix Certified Specialist
    Zabbix Certified Professional
    • Sep 2004
    • 5654

    #2
    I never used Nagios (why, if there is ZABBIX?!), so please correct me if I say something incorrect.
    Originally posted by clahti
    Service Checks - Nagios is great for monitoring service availability, like dns, mail, ftp, you name it. The check is performed from a client perspective via a plugin so that even if the service is running on the host, if the nagios plugin can't interact with the service then it returns warning or critical. Zabbix seems geared toward getting metrics *from* individual hosts like disk, cpu, and if mysql or apache is running, but how do you approach Zabbix to say "check to see if this website is available on that remote host"?
    ZABBIX does both agent and agent-less monitoring. There are several types of checks ZABBIX is able to perform, and Simple Check is one of them. Simple Checks (telnet, http, ldap, ssh, etc) do not require agents running on a monitored host and they are performed by ZABBIX Server itself.
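
To illustrate the idea of an agentless check made from the monitoring server's point of view, here is a minimal Python sketch. It is not how ZABBIX implements Simple Checks; the function name `tcp_service_up` and the default timeout are assumptions made for illustration. It only tests whether a TCP connection to the service port can be opened from outside the monitored host.

```python
import socket


def tcp_service_up(host: str, port: int, timeout: float = 3.0) -> int:
    """Return 1 if a TCP connection to host:port succeeds, else 0.

    Hypothetical helper, not part of ZABBIX: it mimics an agentless
    check run from the monitoring server rather than on the host.
    """
    try:
        # The connection is closed again as soon as it has been opened.
        with socket.create_connection((host, port), timeout=timeout):
            return 1
    except OSError:
        # Refused, unreachable, timed out, name not resolvable, ...
        return 0
```

Calling `tcp_service_up("127.0.0.1", 80)` would report whether something accepts connections on port 80 of the local machine; a numeric 1/0 result is easy to store and to compare against in a condition.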

    Originally posted by clahti
    Host Checks - Nagios checks the availability of a host when a "service" goes unavailable, and if the host is down it suppresses notifications for all the services associated with that host until it comes back. If I set up Zabbix triggers for a bunch of things on a host, will I get them all if the host itself is down?

    Parent/Child Hierarchies - when defining a well planned Nagios installation you define parent child relationships between devices, i.e. a server is behind a switch so if the switch goes down don't check or alert on the server until the switch comes back. Service checks can also have dependencies, i.e. a web server uses a mysql backend on another server, so if that mysql is down don't check or alert on the web service until the mysql comes back...
    No, you won't. You may define dependencies on a trigger level. For example, if service S1 depends on service S2 (host, cpu load, disk space, whatever), then you may define that trigger S1 depends on S2. In this case if both services are down, you'll get only one notification about S2.

    Network dependencies, service dependencies, resource dependencies - everything can be configured this way.
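
The effect of trigger-level dependencies described above can be sketched in a few lines of Python. This is only an illustration of the suppression logic, not ZABBIX code: the function name, the data structures, and the choice to follow dependencies transitively are all assumptions.

```python
from typing import Dict, List, Set


def triggers_to_notify(problem: Set[str],
                       depends_on: Dict[str, List[str]]) -> Set[str]:
    """Return the triggers in a problem state that should notify.

    A trigger is suppressed when a trigger it depends on, directly or
    (by assumption) transitively, is also in a problem state.
    All names are hypothetical.
    """
    def blocked(trigger: str, seen: Set[str]) -> bool:
        for parent in depends_on.get(trigger, []):
            if parent in seen:
                continue  # guard against dependency cycles
            if parent in problem or blocked(parent, seen | {parent}):
                return True
        return False

    return {t for t in problem if not blocked(t, {t})}
```

With `depends_on = {"S1": ["S2"]}`, a problem set of `{"S1", "S2"}` yields only `{"S2"}`, which matches the single notification described above, while `{"S1"}` alone still notifies about S1.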

    Originally posted by clahti
    Nagios Plugins - there is a great mindshare of plugins for Nagios that do everything under the sun. Rather than roll new solutions is there any way to reuse the nagios plugins to perform checks? For example I have a dns server, and I want to query it for a particular record and return warning if a response time is "x" threshold, critical if response time is "y" threshold, or critical if the record is not found or does not match a predefined value. The check_dns plugin can do all that, and that is actually a minor example.
    I'm pretty sure Nagios plugins can be easily integrated with ZABBIX. However, going this way you will lose lots of functionality ZABBIX provides:

    - won't be able to see graph of dns response time
    - no trending
    - no complex conditions like "dns1 server response time is below warning level within last 15 minutes and dns2 server is down", etc
    - no centralized configuration
    - lower performance: ZABBIX native checks and agents do not use Perl, Python, bash, or whatever scripts to get performance/availability data
    - perhaps other things as well
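
For readers who still want to try reusing a plugin, here is a rough Python sketch of a wrapper that turns a plugin run into a state plus numeric values. It relies on the standard Nagios plugin conventions (exit codes 0 to 3, optional performance data after a `|`); the function names are hypothetical, how the result would be fed into ZABBIX is left open, and the parsing is simplified (labels containing spaces are not handled, thresholds are ignored).

```python
import re
import subprocess
from typing import Dict, List, Tuple

# Standard Nagios plugin exit codes.
STATES = {0: "OK", 1: "WARNING", 2: "CRITICAL", 3: "UNKNOWN"}

# Matches label=value at the start of a perfdata token, e.g. time=0.012s;;;0
_PERF = re.compile(r"'?([^'=]+)'?=([-+]?\d*\.?\d+)")


def parse_plugin_output(exit_code: int, output: str) -> Tuple[str, Dict[str, float]]:
    """Map an exit code and the first output line to (state, metrics)."""
    state = STATES.get(exit_code, "UNKNOWN")
    first_line = output.splitlines()[0] if output else ""
    _, _, perf = first_line.partition("|")
    metrics: Dict[str, float] = {}
    for token in perf.split():
        match = _PERF.match(token)
        if match:
            metrics[match.group(1)] = float(match.group(2))
    return state, metrics


def run_plugin(argv: List[str], timeout: float = 10.0) -> Tuple[str, Dict[str, float]]:
    """Run a plugin command line (hypothetical path and arguments) and parse it."""
    proc = subprocess.run(argv, capture_output=True, text=True, timeout=timeout)
    return parse_plugin_output(proc.returncode, proc.stdout)
```

Extracting the numeric part, rather than only the OK/WARNING/CRITICAL state, is what would leave a value that can be graphed; the thresholds themselves would then live on the monitoring side instead of in the plugin's command line.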

    Originally posted by clahti
    Notification and Escalation - Nagios has escalations based on conditions. For example, check_dns 5 times over a period of 5 minutes before alerting. If an alert condition is still present follow this chain, alert sysadmins group. If alert goes unacknowledged for 1 hour then alert sysadmins AND 2nd level support. Does Zabbix have this kind of escalation?
    ZABBIX does not support this kind of escalation yet. However, it is on our TODO list and there is a good chance it will be implemented in the coming 1.4.

    Currently one may use different conditions for different groups of people. Say, one condition to notify sysadmins, another to notify sysadmins and 2nd level support. It is not true escalation but it works.

    Originally posted by clahti
    Reports - How do you create reports in Zabbix? For example I have a weekly rollup report, which I kick back to management, that shows a list of hosts and their associated services. It just has columns for the various states that the host or service was in over a dd/dd/dddd - dd/dd/dddd period. The columns are percentages of up, down, warning, or critical states. I cannot see how to make something analogous in Zabbix.
    Here I see a big advantage of ZABBIX over other products. All information (configuration, data, history, trends, everything) is stored in a SQL database. ZABBIX users are free to use Crystal Reports, Excel, you name it, and create any custom reports.

    There are several standard reports ZABBIX provides and we are adding more and more.
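
As a sketch of what a custom report over a SQL store could look like, the Python example below builds a small in-memory SQLite database and computes the percentage of time each service spent in each state. The table `state_log` and its columns are entirely hypothetical and are not the ZABBIX schema; a real report would have to be written against the actual tables of the installed version.

```python
import sqlite3


def build_demo_db() -> sqlite3.Connection:
    """Create an in-memory database with a hypothetical state log."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE state_log (host TEXT, service TEXT, state TEXT, seconds INTEGER)"
    )
    conn.executemany(
        "INSERT INTO state_log VALUES (?, ?, ?, ?)",
        [
            ("web1", "http", "up", 544320),
            ("web1", "http", "down", 60480),
            ("dns1", "dns", "up", 604800),
        ],
    )
    return conn


def availability_report(conn: sqlite3.Connection):
    """Return (host, service, state, percent_of_time) rows."""
    rows = conn.execute(
        "SELECT host, service, state, SUM(seconds) FROM state_log "
        "GROUP BY host, service, state ORDER BY host, service, state"
    ).fetchall()
    # Total observed seconds per (host, service) pair.
    totals = {}
    for host, service, _state, secs in rows:
        totals[(host, service)] = totals.get((host, service), 0) + secs
    return [
        (host, service, state, round(100.0 * secs / totals[(host, service)], 2))
        for host, service, state, secs in rows
    ]
```

The same query could just as well be run from a spreadsheet or a reporting tool; the point is only that data held in a SQL database can be rolled up into the weekly percentages asked about.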

    Originally posted by clahti
    Zabbix Templates - So I create a host and assign it the Unix_t template, which associates a bunch of items with that host. Some are applicable, some are not. How do I delete the ones that are not? For instance, not every host runs an imap server. Once I create a host with a set of items I like, can I clone that entry? This is pretty easy to do in Nagios using Monarch.
    Just disable it! On a related note, the next beta release already supports many-to-many relationships between hosts and templates; this will make maintenance of large ZABBIX setups much, much easier.

    Originally posted by clahti
    Nagios in and of itself is reasonably hard to configure, so I would expect any capable tool (including Zabbix) to have some learning curve. I really like Zabbix so far since it goes beyond point-in-time monitoring and handles metrics over time very well. Cacti can do this as a companion to Nagios, but it seems to me that your monitoring and metric gathering should happen at the same time to reduce network load, and your web frontend looks great too! I would like to replace my Nagios/PerfParse/Cacti with Zabbix if the above functionality can be reasonably remapped. I am sure I will have a whole bunch more questions once I give this more thought.
    Feel free to ask! ZABBIX community is very friendly. I'm sure you'll find excellent support here. Take care and keep us informed about any progress!
    Alexei Vladishev
    Creator of Zabbix, Product manager
    New York | Tokyo | Riga

    • johanpre44
      Member
      • Apr 2006
      • 40

      #3
      Dependencies?

      The parent/child dependencies of nagios do make more sense (to me). But I also understand the value of the dependencies in Zabbix, which is that you actually have a dependency on an item level and not just host level.

      I have these issues with the zabbix dependencies though:
      1. I have to go through a huge/unsorted list to try and get the trigger dependency I want to add (I currently have 1039 triggers active)
      2. I cannot add dependencies to triggers on a host level if those triggers are associated to the host from the template (and I don't actually want to create all those triggers manually)

      Will this be different in the new Zabbix 1.4?


      • Alexei
        Founder, CEO
        Zabbix Certified Trainer
        Zabbix Certified Specialist
        Zabbix Certified Professional
        • Sep 2004
        • 5654

        #4
        Yes, both issues will be addressed in 1.4. Issue #1 is already fixed in 1.3.x.

        • clahti
          Senior Member
          • Jan 2007
          • 126

          #5
          Making Progress

          Ok, so the more I dig into Zabbix the more I like it. So far I downloaded and installed 1.3.2, set up the zabbix server, and added the server as a client with the ip 127.0.0.1. I spent a lot of time exporting and re-importing all of the templates to break them up into much more logical groups such as:

          app_ftp
          app_http
          app_imap
          app_mysql
          app_nntp
          app_pop
          app_smtp
          app_ssh
          stat_disk_lgcl_BASE
          stat_disk_lgcl_export
          stat_disk_lgcl_home
          stat_disk_lgcl_opt
          stat_disk_lgcl_tmp
          stat_disk_lgcl_usr
          stat_disk_lgcl_var
          stat_disk_phys_hda
          stat_disk_phys_hdb
          stat_disk_phys_hdc
          stat_disk_phys_hdd
          stat_disk_phys_sda
          stat_disk_phys_sdb
          stat_disk_phys_sdc
          stat_disk_phys_sdd
          stat_host_linux
          stat_memory
          stat_network_eth0
          stat_network_eth1


          That way when I add a host I can pick and choose the applications and hardware groups that match. Is this a good approach? This prevents me from having to disable a lot of stuff for each host. Things seem to be working, triggers and actions, EXCEPT the following items show as "Unknown":

          version[mysql]
          mysql[uptime]
          mysql[qps]
          mysql[threads]
          mysql[slowqueries]
          mysql[ping]
          io[disk_io]
          io[disk_wblk]
          io[disk_rblk]
          disk_write_ops5[sda]
          disk_write_ops1[sda]
          disk_write_ops15[sda]
          disk_read_ops5[sda]
          kern[maxfiles]
          kern[maxproc]

          I enabled the default Mysql lines in the zabbix_agentd.conf and looked at the logs; there are no errors. Is there something more I need to do here? Mysql is running on the host, and on the box I can successfully execute the commands that the agent would be executing. How do I test the agent? Also it looks like none of the physical level I/O stuff is working... I have 2 SATA drives in this box, does the agent support this?

          FINALLY, could you give an example of how you would set up a simple check? Let's use the following requirements:

          dns ip: 192.168.1.2
          dns record to check: somehost.example.com
          dns answer expected: 192.168.1.100
          data item: collect dns response time
          trigger: is dns alive?
          trigger: is the record wrong or non-existent?

          Based on a detailed example of the above I can then adapt this for other services. Thanks!
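
The requirements listed above boil down to one collected value (response time) and two conditions (is the server answering, and does the answer match). The Python sketch below shows that mapping only; it is not a ZABBIX item or trigger definition. The resolver is passed in as a function because querying one specific DNS server such as 192.168.1.2 needs a DNS library or hand-built packets, which is out of scope here; `check_dns` and the `resolve` contract (return a list of addresses, raise `OSError` if the server does not answer) are assumptions.

```python
import time
from typing import Callable, Dict, List


def check_dns(resolve: Callable[[str], List[str]],
              name: str, expected_ip: str) -> Dict[str, float]:
    """Return response time plus two 1/0 flags for one DNS lookup.

    `resolve` is a hypothetical callable that queries the DNS server
    under test and returns the addresses found (possibly none).
    """
    start = time.perf_counter()
    try:
        answers = resolve(name)
        alive = 1  # the server answered, even if the record is missing
    except OSError:  # includes TimeoutError
        answers, alive = [], 0
    elapsed = time.perf_counter() - start
    record_ok = 1 if expected_ip in answers else 0
    return {"response_time": elapsed, "alive": alive, "record_ok": record_ok}
```

With a resolver that really queries 192.168.1.2, `check_dns(resolve, "somehost.example.com", "192.168.1.100")` would give a response time to graph, while `alive == 0` and `record_ok == 0` correspond to the two trigger conditions in the list.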


          • clahti
            Senior Member
            • Jan 2007
            • 126

            #6
            Not getting eth0 stats either

            I am not getting eth0 stats either, the following also say Unknown:

            netloadin15[eth0]
            netloadin1[eth0]
            netloadin5[eth0]
            netloadout15[eth0]
            netloadout1[eth0]
            netloadout5[eth0]

            Should I be using version 1.1.4 instead of 1.3.2 (beta)???


            • bbrendon
              Senior Member
              • Sep 2005
              • 870

              #7
              Alexei, why are you torturing these nice people with old item keys?

              The item names you're listing were deprecated somewhere in the 1.1alpha releases.

              Check the docs for the new item key names. You can also search the groups for migration information. Ahh, an old wiki page might be of use!
              http://www.zabbix.com/doku/doku.php?....1beta5_syntax

              Sorry, I haven't used 1.3 series yet. Good luck with your zabbix.
              Unofficial Zabbix Expert
