Simple check ping and pingloss

  • deanw
    Junior Member
    • Aug 2021
    • 6

    #1

    Simple check ping and pingloss

    Hi everybody. I have been looking for help for some time and have read through the topics in the forum, but I can't get this working. Any help would be appreciated.
    I am monitoring 250+ devices without a Zabbix agent, using only simple checks (icmpping and others). My devices should be online 24/7 but are not critical. I want an alarm when one of them goes offline, and I would like to cover 2 scenarios:
    1. the device has been offline for at least an hour (or even longer) -> I think icmpping.max(60m) could be used here?
    2. the device is losing connectivity intermittently (a cabling problem, a broken cable or similar) -> maybe icmppingloss?

    So I would like one notification when a device is constantly losing ping replies and another in case of full connectivity loss. I tried ping, pingloss, pingsec... but I can't get it working right: I get notifications about inaccessible hosts every few minutes, and with this many emails it is unclear what is actually wrong. If it would help, I can post screenshots of my items and triggers. But maybe it would be easier to delete all of them and start from scratch with your suggestions and help.
    Thank you very much in advance!

    BR, Dean
  • johndoe2374
    Member
    • Aug 2021
    • 80

    #2
    I'm using the default "ICMP ping" template with default item keys; I think they're pretty reasonable. There are also the {$ICMP_LOSS_WARN} and {$ICMP_RESPONSE_TIME_WARN} macros, which you can adjust globally or per host. I think it's more about understanding how ICMP checks work.

    In this template a device is treated as unavailable after 3 unsuccessful pings: once a minute each ping sends 3 packets at 1000 ms intervals, and if even one of the 3 packets is returned, the ping counts as successful. So the default trigger fires only if the device has been completely unreachable for all of the last 3 item values (1-minute interval). Why do your devices become unavailable so often? That's more a question about your network. First you need to understand which parameters, delays and response times are normal for your network and which are not. That's network knowledge; people won't be able to work it out for you. You can disable the triggers temporarily, collect statistics for some time and then work out your thresholds.
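A minimal Python sketch of the logic described above, just to make it concrete. It only simulates the rule (one icmpping value per minute, trigger when the last 3 values are all 0); it is not Zabbix configuration, and the function names are made up for illustration.

```python
def ping_result(replies_received):
    """One simulated icmpping value: 1 if at least one packet came back, else 0."""
    return 1 if replies_received > 0 else 0


def unavailable(values, count=3):
    """True when the last `count` values exist and are all 0 (their max is 0)."""
    recent = list(values)[-count:]
    return len(recent) == count and max(recent) == 0


# Replies received out of 3 packets, one check per minute (made-up numbers).
history = [ping_result(n) for n in (3, 1, 0, 0, 0)]
print(history)               # one reply out of 3 still counts as a successful ping
print(unavailable(history))  # the last three checks all failed
```

Note that a single returned packet in any of the last 3 checks is enough to keep the trigger from firing.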

    If you want to delay your notifications but still fire the trigger as soon as possible, you can change the step from 1 to 2, and the notification will be sent after the step duration; but if the device becomes available again in the meantime (after, say, 30 minutes), this step won't do anything. Also this can help:


    • deanw
      Junior Member
      • Aug 2021
      • 6

      #3
      Hi,
      thanks for your answer. Maybe I dug too deep into details that might not be so important. These devices aren't really critical: they don't have to be up 24/7, and we don't actually mind if they are offline for a few days before being serviced. Because of occasional network issues some packets may get lost, which is acceptable. What I would like is to get notified when one of these devices goes offline because of a malfunction; in that case the device would be offline for a long period (let's say an hour or more). I would call that an offline device: it should be treated as a malfunction, and we have to call support to get it working again. On the other hand, these devices sit in an unfriendly environment that damages the cabling or sockets, and then they become unstable rather than offline. The same goes for a network problem on the ISP side. With only the first rule I wouldn't get any notification in those cases. That's why I would like a second rule that notifies me about packet loss over a longer period, let's say a few hours of constant packet loss. I hope my two requirements are clear. Thank you in advance.

      BR, Dean
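The two requirements can be written down as two separate rules. Below is a rough Python sketch of that idea, not Zabbix trigger syntax. The window sizes and the 20 % loss threshold are assumptions chosen for illustration, and the function names are hypothetical.

```python
def offline_for_window(ping_values):
    """Rule 1: every icmpping value (1/0) in the window is 0, e.g. 60 one-minute samples."""
    return len(ping_values) > 0 and max(ping_values) == 0


def unstable_for_window(loss_percent_values, warn_percent=20.0):
    """Rule 2: even the best sample in the window lost more than warn_percent,
    but the host is not fully down (lowest loss is below 100 %)."""
    if not loss_percent_values:
        return False
    lowest = min(loss_percent_values)
    return warn_percent < lowest < 100.0


# Made-up sample data.
print(offline_for_window([0] * 60))           # no reply for the whole hour
print(offline_for_window([0] * 59 + [1]))     # one successful ping resets rule 1
print(unstable_for_window([30.0, 66.0, 33.0]))  # constant partial loss
print(unstable_for_window([30.0, 0.0, 33.0]))   # one clean sample, no alarm
```

Using the minimum over the window is what makes the second rule ignore occasional lost packets: a single clean sample is enough to keep it quiet.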


      • deanw
        Junior Member
        • Aug 2021
        • 6

        #4
        Originally posted by johndoe2374
        I'm using the default "ICMP ping" template with default item keys; I think they're pretty reasonable. There are also the {$ICMP_LOSS_WARN} and {$ICMP_RESPONSE_TIME_WARN} macros, which you can adjust globally or per host. I think it's more about understanding how ICMP checks work.

        In this template a device is treated as unavailable after 3 unsuccessful pings: once a minute each ping sends 3 packets at 1000 ms intervals, and if even one of the 3 packets is returned, the ping counts as successful. So the default trigger fires only if the device has been completely unreachable for all of the last 3 item values (1-minute interval). Why do your devices become unavailable so often? That's more a question about your network. First you need to understand which parameters, delays and response times are normal for your network and which are not. That's network knowledge; people won't be able to work it out for you. You can disable the triggers temporarily, collect statistics for some time and then work out your thresholds.

        If you want to delay your notifications but still fire the trigger as soon as possible, you can change the step from 1 to 2, and the notification will be sent after the step duration; but if the device becomes available again in the meantime (after, say, 30 minutes), this step won't do anything. Also this can help:
        https://www.zabbix.com/documentation...on/escalations
        I have now tried the default ICMP ping template with default values. I get a lot of triggers (email notifications) saying the response time is too high and not a single notification about offline devices, although I know a few are inaccessible on the network. I think I can't figure out the right expressions. My items and triggers are attached.

        [Attached images: items.png, triggers.png]


        • johndoe2374
          Member
          • Aug 2021
          • 80

          #5
          If the response time is considered too high, it means the average response time over the last 5 minutes is higher than 0.15 seconds (150 ms). Again, check the actual response time values for different hosts and find out whether that is normal or not. Then adjust the macros or the time range (instead of 5 minutes) accordingly. You can check the graphs under "some host's latest data" > "filter by name - icmp" > "graph link":
          [Attached image: 2.jpg]
          As you can see, for this 12-hour period the normal response time is below 10 milliseconds. But starting at 13:50 there's an anomaly, so I can adjust the macro to 0.1.


          • deanw
            Junior Member
            • Aug 2021
            • 6

            #6
            Thank you again, I'm gonna leave it now for a few hours to get at least some data onto my graphs, and then start experimenting. So, just correct me if I'm wrong: if you now adjusted your macro to 0.1 you still wouldn't get any notifications or triggers, because 0.1 means 100 ms and the peak response times on your graph above were about 55 ms. And if you changed your macro to 0.01, you would get notified at about 13:50, when the times rose above 10 ms. Is that correct?



            • johndoe2374
              johndoe2374 commented
              Yep, my bad, I meant 0.01. You've got the idea.

              Also, it's not a great idea to get e-mail notifications for each and every trigger, especially when you have lots of hosts. Is it really a good idea to get thousands of such emails every day? Probably not.
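The unit question above (macro value in seconds, graph values in milliseconds) can be checked with a few lines of Python. This is only an illustration of the arithmetic: the rule compares an average over the window with the threshold, and the 55 ms samples are taken from the peak mentioned in the discussion. The function names are hypothetical.

```python
def ms_to_sec(milliseconds):
    """Convert milliseconds (as read off a graph) to seconds (as used in the macro)."""
    return milliseconds / 1000.0


def response_time_too_high(samples_sec, threshold_sec):
    """True when the average of the samples in the window exceeds the threshold."""
    if not samples_sec:
        return False
    return sum(samples_sec) / len(samples_sec) > threshold_sec


# Five one-minute samples at the ~55 ms peak.
window = [ms_to_sec(55)] * 5

for threshold in (0.15, 0.1, 0.01):
    print(threshold, response_time_too_high(window, threshold))
```

With 55 ms samples, only the 0.01 (10 ms) threshold is exceeded; 0.15 and 0.1 are not.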
          • deanw
            Junior Member
            • Aug 2021
            • 6

            #7
            Yes, you are right. Too many mails means users automatically send them to /dev/null. The thing is, I'm setting up this monitoring for a group of users who aren't meant to use Zabbix at all, so I have to get these settings (macros) as good as possible, so that they get notified only when it is really needed. Thank you once again. I'll do some more tests and try out different options. I just have to get familiar with all these expressions and the units: what is in seconds, what in ms, what in %, and all the other stuff in those expressions.

            BR, Dean


            • deanw
              Junior Member
              • Aug 2021
              • 6

              #8
              One more question came up. I would like to get notified once a day, at 8:00, about offline hosts, but still be able to look up the history (graphs) in the Zabbix frontend. So it should ping the hosts every hour but only notify me about offline hosts once a day in the morning. How do I have to configure Zabbix? Where do I define that and/or put those 86400 seconds: in the update interval of the item, in the default step duration of the Action operation, or maybe in the step itself? Or do I have to set the item interval to flexible and enter h8 in the scheduling interval field? I'm confused by all those time options everywhere...

              My current setup is as follows:
              [Attached images: item.png, trigger.png, action.png]

              Last edited by deanw; 12-08-2021, 15:44.


              • johndoe2374
                Member
                • Aug 2021
                • 80

                #9
                Read the docs, there are plenty of good examples (see the table about scheduling intervals):


                Don't forget to adjust any triggers that use the item accordingly.
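To picture the difference between "how often values are collected" and "when a report goes out", here is a small Python sketch. It is not Zabbix configuration and says nothing about which Zabbix field to use; it only illustrates hourly data with a single daily report at 08:00. The host names and function names are made up.

```python
from datetime import datetime, timedelta


def next_report_time(now, hour=8):
    """Next occurrence of hour:00, today if still ahead, otherwise tomorrow."""
    candidate = now.replace(hour=hour, minute=0, second=0, microsecond=0)
    if candidate <= now:
        candidate += timedelta(days=1)
    return candidate


def offline_hosts(latest_values):
    """Hosts whose most recent ping value is 0, sorted by name."""
    return sorted(host for host, value in latest_values.items() if value == 0)


# Latest hourly ping value per host (hypothetical data).
latest = {"cam-03": 0, "cam-01": 1, "cam-02": 0}

print(next_report_time(datetime(2021, 8, 12, 15, 44)))
print(offline_hosts(latest))
```

The hourly values stay available for graphs; only the report is tied to the fixed time of day.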
