Continuous CPU utilisation monitoring

  • dusanj
    Junior Member
    • Jan 2023
    • 3

    #1

    Hi everyone, I am a complete newbie to Zabbix with no experience, and I am only just learning to use it. My current task was to come up with a trigger that goes off once the CPU has spent a total of 30 minutes at over 80 % usage during the day. I read through the documentation and came up with
    count(/*** host name ***/system.cpu.util,1d,"ge",80)>=30
    This trigger works exactly as it should, but it is not exactly what I was asked for. What we need is this: if possible, Zabbix should constantly monitor the CPU usage on the host and store the data locally, and an item with a set update interval should then read this stored data and update its value accordingly. In particular, we want to track the time during which CPU usage is 80 % or more.

    I would have an item with an update interval of 15 minutes. On each update it would read the locally stored data about how long the CPU was utilised at 80 % or more. So, for example, starting at midnight, the initial value for the day would be 0. If the CPU spent 50 seconds at 80+ % during the first 15 minutes of the day, the item's value after its first update would be 50 seconds. If the 80+ % utilisation lasted 30 seconds in the next 15 minutes, the item's value after the next check would be 1 minute 20 seconds, and so on. This value should always reset to zero at midnight.

    Based on that I would create a trigger that goes off when this value reaches 30 minutes. That should be fairly easy to make; the first part is the one I struggle with.

    If anyone could tell me whether this is even possible and if so, provide some help, I would be very grateful. Thank you all in advance.
  • Markku
    Senior Member
    Zabbix Certified Specialist, Zabbix Certified Professional, Zabbix Certified Expert
    • Sep 2018
    • 1781

    #2
    For me this sounds like a case for an external script and a trapper item. That script would be constantly running on the monitored host and collecting the current CPU level, and counting the time the CPU level is high enough. When the time threshold is exceeded, it would send the trapper item value (like "1") to Zabbix to activate the trigger. It could also send a "reset" value (like "0") at midnight if you need to reset the trigger automatically instead of manual close.
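A rough sketch of the script side of this idea, in Python. The names `HighCpuTimer` and `sender_command`, and any server, host or key values passed to them, are made up for illustration; only the `zabbix_sender` options `-z`, `-s`, `-k` and `-o` are taken from the real tool. How the CPU level is actually sampled is left out, since that depends on the host.

```python
import datetime as dt

THRESHOLD_PCT = 80.0      # CPU level counted as "high" (value from this thread)
LIMIT_SECONDS = 30 * 60   # raise the flag after 30 minutes above the threshold


class HighCpuTimer:
    """Accumulates seconds spent at or above a CPU threshold, per calendar day."""

    def __init__(self, threshold=THRESHOLD_PCT, limit=LIMIT_SECONDS):
        self.threshold = threshold
        self.limit = limit
        self.day = None
        self.seconds_high = 0.0

    def add_sample(self, when, cpu_pct, sample_seconds):
        """Record one sample that covers sample_seconds.

        Returns 1 once the daily limit is reached, otherwise 0. This is the
        value the script would push to the trapper item.
        """
        if when.date() != self.day:
            # First sample of a new day: start counting from zero again.
            self.day = when.date()
            self.seconds_high = 0.0
        if cpu_pct >= self.threshold:
            self.seconds_high += sample_seconds
        return 1 if self.seconds_high >= self.limit else 0


def sender_command(server, host, key, value):
    """Build a zabbix_sender command line (all argument values are placeholders)."""
    return ["zabbix_sender", "-z", server, "-s", host, "-k", key, "-o", str(value)]
```

The script would call `add_sample()` in a loop with whatever CPU reading it collects, and run the command from `sender_command()` (for example with `subprocess.run`) whenever the returned flag changes. Keeping the counting separate from the sending makes the midnight reset easy to check without a Zabbix server.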

    If you are using the Zabbix-collected CPU value, you need to decide the interval value that provides you a "continuous" view of the CPU utilization and then figure out the trigger, maybe with count() as you mentioned, using the time shift (https://www.zabbix.com/documentation...ion#time-shift) to get it to count values during today only (not just during the last 24 hours), based on the interval you selected. For example, with 1m interval the count should obviously be >=30, but with 10s interval you are looking for count >=180.
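The count threshold in that last sentence is just the window length divided by the item interval. A small Python helper to show the arithmetic; `required_count` is a name made up here, not anything from Zabbix.

```python
def required_count(window_seconds, interval_seconds):
    """Number of samples at the given interval needed to cover the window."""
    if interval_seconds <= 0:
        raise ValueError("interval must be positive")
    # Ceiling division, so an interval that does not divide the window
    # evenly still covers the whole window.
    return -(-window_seconds // interval_seconds)


print(required_count(30 * 60, 60))  # 1m interval  -> 30
print(required_count(30 * 60, 10))  # 10s interval -> 180
```

This assumes values arrive at the configured interval with no gaps; missed checks would make the count lag behind the real time spent above the threshold.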

    Markku

    • dusanj
      Junior Member
      • Jan 2023
      • 3

      #3
      Thank you for your advice. I eventually discovered trapper items while working on another thing and, like you say, that seems to be the way to go. However, after some clarification, the task turned out to be slightly different from what I originally understood. It did seem strange to me to monitor all the times the CPU is used over 80 % and sum that time. The CPU is there to be used, so its load will sometimes go all the way up to 100 %, and I wouldn't call that a problem. It turns out I was correct and had "slightly" misunderstood what was asked of me. In the end the task was as follows: we have the CPU utilisation checks set with a 1 minute interval, and we want a trigger that goes off when the CPU load is over 80 % for at least 30 consecutive minutes. So, in the end, it was just a case of a slight alteration to my original trigger:
      count(/*** host name ***/system.cpu.util,#30,"ge",80)=30
      This is working, and I'd say it is a good indicator of a problem: if you have a one-minute interval on that check and you still manage to get the CPU usage over 80 % thirty times in a row, there might be an issue.
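To illustrate what that expression checks, here is a small Python model of it. It only mimics the logic as described in this post (the last 30 values are all 80 or more). It is not how Zabbix evaluates triggers internally, and `all_last_n_at_least` is a made-up name.

```python
def all_last_n_at_least(values, n, threshold):
    """Model of count(...,#n,"ge",threshold)=n over a list of collected values.

    values is ordered oldest first. Returns True only when each of the
    n most recent values is greater than or equal to threshold.
    """
    if len(values) < n:
        return False  # not enough history yet, so the count cannot reach n
    recent = values[-n:]
    return sum(1 for v in recent if v >= threshold) == n


history = [50.0] * 5 + [85.0] * 30
print(all_last_n_at_least(history, 30, 80))  # True: 30 high values in a row

history[20] = 79.9                           # one dip inside the last 30 values
print(all_last_n_at_least(history, 30, 80))  # False: the streak is broken
```

With a 1 minute interval, 30 values in a row correspond to roughly 30 consecutive minutes, which is why the `#30` form fits the clarified task.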
      Anyway, like I said, my original question was, in the end, not what we were after and I have now managed to resolve the real one.
      Thank you nevertheless for your help.

      • Markku
        Senior Member
        Zabbix Certified Specialist, Zabbix Certified Professional, Zabbix Certified Expert
        • Sep 2018
        • 1781

        #4
        Thanks for getting back, sounds fine.

        For the CPU alert I would use a simpler one:

        min(xxx,30m) >= 80

        (= triggers when all the CPU values within last 30 minutes are at least 80, regardless of the item interval)

        avg() is also usually useful because even a single below-80 value in the min() trigger will prevent it from triggering.
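The difference between the two is easy to see on a made-up 30 minute window with a single dip. This Python snippet only models the comparison; `min_trigger` and `avg_trigger` are hypothetical helper names and the sample values are invented.

```python
from statistics import mean


def min_trigger(window, threshold=80.0):
    """Model of min(item,30m) >= threshold: every value must be high."""
    return min(window) >= threshold


def avg_trigger(window, threshold=80.0):
    """Model of avg(item,30m) >= threshold: the values must be high on average."""
    return mean(window) >= threshold


# 29 one-minute samples at 95 % and a single dip to 70 % (window must be non-empty)
window = [95.0] * 29 + [70.0]

print(min_trigger(window))  # False: the single 70 % value blocks the trigger
print(avg_trigger(window))  # True: the average is still well above 80 %
```

Which one fits depends on whether short dips should count as recovery; min() is the stricter reading of "over 80 % for 30 consecutive minutes".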

        Markku
