How to setup regular "sanity checks" in Zabbix

  • IsaacA
    Junior Member
    • Mar 2022
    • 1

    #1

    How to setup regular "sanity checks" in Zabbix

    Hi All,
    I did some searching but it's kind of hard to even figure out what keywords I should use for a search for this question/how-to, so bear with me if it's been asked before:

    Running Zabbix 6.2.9 on Ubuntu 20.04.

    What I'd like to do is set up a regular "sanity check" that verifies that Zabbix is up and running and fully functional every [interval]. We get alerts from various systems often enough that when we don't get one for 4-5 days I start getting very nervous, and we have had some issues. What I'd like is to have Zabbix fire off an email every, say, 3-4 days, or maybe every Monday and Thursday, or daily, to an address, simply saying, "I'm up and running!"

    I cannot for the life of me figure out how to do this. I can set up a cronjob in the OS to monitor the Zabbix service and send an "all good" email, but I've learned that that's not good enough. A few months ago we lost Zabbix for a few days, and I only found out when a customer noted that one of our sites was down. The webserver was down, but Zabbix hadn't notified us. On checking I found that while the Zabbix server and service were up, it just ... wasn't working. I couldn't access the dashboard, and it never peeped when I rebooted the aforementioned webserver. So what I'm looking for is something within Zabbix that sends an "all good" email, so that if we stop getting it, we know Zabbix should be checked.

    Any ideas on how to set this up in Zabbix?

    Cheers,
    Isaac
  • ISiroshtan
    Senior Member
    • Nov 2019
    • 324

    #2
    Hey.
    That is quite a complicated question to answer.
    First of all, you should understand that a Zabbix server consists of (at least) 3 independent components. Each component is responsible for its own tasks, and failure of one of them might or might not impact the other functions.

    Your example already identifies one of the components: the web server. It usually runs on Apache or nginx. Failure of this component will make the web UI unreachable, BUT it will NOT impact the actual monitoring operation. Data will still be collected, triggers will still be processed and notifications (if set up) will still be sent out. The Zabbix Server component is responsible for all of that, and that is what your cron job was monitoring.

    Additionally, there is the DB: MySQL, PostgreSQL or whatever you set up. In my experience, failure of this component would not crash the Server or Frontend service, but they will effectively stop operating.


    So for Zabbix status monitoring you'd need to ensure normal operation of each of these components. And ideally not just the service status, but that each service actually operates as expected.
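
    As a rough illustration of checking that the components actually operate (and not only that the services are running), here is a minimal Python sketch meant to run from a machine other than the Zabbix host. It is only a sketch under assumptions: the URL is a placeholder, the function names are made up for this example, and the item list passed to newest_lastclock() is assumed to come from an authenticated item.get API call that returns a "lastclock" field per item. The frontend check uses the apiinfo.version API method, which needs no login.

```python
import json
import time
import urllib.request

# Placeholder URL (assumption): replace with your own frontend address.
API_URL = "http://zabbix.example.com/zabbix/api_jsonrpc.php"


def build_rpc_request(method, params, request_id=1):
    """Build a JSON-RPC 2.0 request body for the Zabbix API."""
    return {"jsonrpc": "2.0", "method": method, "params": params, "id": request_id}


def call_api(url, payload, timeout=10):
    """POST the payload to the frontend and return the 'result' field."""
    data = json.dumps(payload).encode("utf-8")
    req = urllib.request.Request(
        url, data=data, headers={"Content-Type": "application/json-rpc"}
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        body = json.load(resp)
    if "error" in body:
        raise RuntimeError(f"Zabbix API error: {body['error']}")
    return body["result"]


def newest_lastclock(items):
    """Return the most recent 'lastclock' (Unix time) among item dicts, or 0."""
    return max((int(item["lastclock"]) for item in items), default=0)


def is_fresh(lastclock, now, max_age_seconds=300):
    """True if the last collected value is no older than max_age_seconds."""
    age = now - int(lastclock)
    return 0 <= age <= max_age_seconds


if __name__ == "__main__":
    # Frontend check: does the web server answer API calls at all?
    version = call_api(API_URL, build_rpc_request("apiinfo.version", []))
    print("Frontend answered, API version:", version)
    # Server/DB check (not wired up here): fetch items via an authenticated
    # item.get call, then test the data age, for example:
    #   is_fresh(newest_lastclock(items), now=time.time())
```

    If the frontend call fails, or the newest value is older than the threshold, the script's own alerting (cron mail, exit code, etc.) should fire, since Zabbix itself may not be able to.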
