Help on monitoring memory spike
  • Jeep
    Junior Member
    • Aug 2025
    • 7

    #1

    Hi all,
    Glad to join the Zabbix world, and thanks to the whole community for this nice product.

    I'm building a monitoring solution for memory spikes, and my question is about the media type message template. The message I receive contains the usual information: problem name, start time, duration, resolution time, etc. I would like to add the spike duration and, possibly, how many spikes occurred while the problem was open (i.e. before the recovery expression was met).

    Currently my trigger is:

    Code:
     Problem expr:
       (
       last(/CUSTOM - Windows - Host Importanti/vm.memory.util) > {$LIMIT}
       )
     Resolution expr:
       (
       max(/CUSTOM - Windows - Host Importanti/vm.memory.util, 10m) < {$LIMIT}
       )
  • cyber
    Senior Member
    Zabbix Certified Specialist, Zabbix Certified Professional
    • Dec 2006
    • 4806

    #2
    It will never tell you whether it was one long spike or multiple short ones. That's just the nature of the expression: it fires at the first breach and rearms after 10 minutes with no breach. So if you get an event lasting 15m, was there one 5-minute spike, or two 1-minute spikes with 3 minutes between them? You never know without specifically looking up the values and how they are spread over time (i.e. looking at the graph).
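To make the point concrete, here is a minimal Python simulation of the trigger from post #1. It is a sketch, not Zabbix's actual evaluation logic: it assumes one sample per minute, treats "10m" as the last 10 samples, and uses 90 as a stand-in for {$LIMIT}; the function and variable names are hypothetical. A single 5-minute spike and two 1-minute spikes three minutes apart produce exactly the same event:

```python
LIMIT = 90   # stand-in for {$LIMIT}; value chosen for illustration only
WINDOW = 10  # "10m" modelled as the last 10 samples, assuming 1 sample/minute


def simulate(samples, limit=LIMIT, window=WINDOW):
    """Return (start, end) minute indices of each problem event.

    Problem:  last(item) > limit
    Recovery: max(item, window) < limit
    """
    events = []
    start = None
    for t, value in enumerate(samples):
        if start is None:
            if value > limit:  # fires at the first breach
                start = t
        else:
            recent = samples[max(0, t - window + 1):t + 1]
            if max(recent) < limit:  # rearms once the whole window is clean
                events.append((start, t))
                start = None
    return events


def series(breach_minutes, length=30, base=60, peak=95):
    """Build a per-minute series that breaches only at the given minutes."""
    return [peak if t in breach_minutes else base for t in range(length)]


one_long_spike = series({5, 6, 7, 8, 9})  # a single 5-minute spike
two_short_spikes = series({5, 9})         # 1-minute spikes, 3 clean minutes between

print(simulate(one_long_spike))    # [(5, 19)]
print(simulate(two_short_spikes))  # [(5, 19)]
```

The exact minute counts depend on the polling interval and on how Zabbix bounds the 10m window; what matters is that the event only captures when the first breach happened and when the clean window completed, not what happened in between.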

    • Jeep
      Junior Member
      • Aug 2025
      • 7

      #3
      I worked a bit on the trigger while studying the supported functions. I found the count() function
      interesting, but in the end I think we will take a different approach.
      We will have different kinds of triggers for the various components; for memory, we will basically have:
      • Very high usage:
        ◦ severity: Critical
        ◦ problem:
          ▪ avg(vm.memory.size[pused], 10m) >= 95
        ◦ resolution:
          ▪ max(vm.memory.size[pused], 5m) < 85 (to be fine-tuned)
      • High alert spike:
        ◦ severity: High
        ◦ problem:
          ▪ last(vm.memory.size[pused]) > 95
        ◦ resolution:
          ▪ max(vm.memory.size[pused], 5m) < 85 (to be fine-tuned)
        ◦ dependent on: Very high usage

      (Instead of hardcoded values for the time periods and thresholds, I will use user macros so each host can be tuned easily.)
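A rough Python model of how the two memory triggers above would interact, assuming one sample per minute and using the thresholds from the list as stand-ins for the planned macros. The dependency is simplified to "the spike alert only counts while Very high usage is not in PROBLEM state"; that is an assumption for illustration, not a full description of Zabbix trigger dependencies, and all names are hypothetical:

```python
from statistics import mean

# Stand-ins for the planned per-host user macros; values taken from the list.
CRIT_AVG = 95       # Very high usage: avg(pused, 10m) >= 95
SPIKE_LAST = 95     # High alert spike: last(pused) > 95
RECOVER_MAX = 85    # both triggers: max(pused, 5m) < 85
AVG_WINDOW = 10     # 10m modelled as 10 samples (1 sample/minute assumed)
RECOVER_WINDOW = 5  # 5m modelled as 5 samples


def window(samples, t, size):
    """Last `size` samples up to and including minute t."""
    return samples[max(0, t - size + 1):t + 1]


def evaluate(samples):
    """Per minute: (critical_in_problem, spike_in_problem, spike_alerted).

    spike_alerted is a simplified model of the dependency: the spike
    problem only counts while "Very high usage" is not in PROBLEM state.
    """
    critical = spike = False
    timeline = []
    for t, value in enumerate(samples):
        recovered = max(window(samples, t, RECOVER_WINDOW)) < RECOVER_MAX
        if not critical and mean(window(samples, t, AVG_WINDOW)) >= CRIT_AVG:
            critical = True
        elif critical and recovered:
            critical = False
        if not spike and value > SPIKE_LAST:
            spike = True
        elif spike and recovered:
            spike = False
        timeline.append((critical, spike, spike and not critical))
    return timeline


plateau = [60] * 5 + [98] * 15 + [60] * 10
timeline = evaluate(plateau)
print(timeline[5])   # (False, True, True): spike fires before the 10m average catches up
print(timeline[14])  # (True, True, False): spike suppressed by the dependency
print(timeline[24])  # (False, False, False): both recovered
```

In this model a sustained plateau trips the spike trigger first, because the 10m average needs time to reach 95, so the dependency only suppresses the spike alert once Very high usage has actually fired.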

      I think something interesting could also be achieved with count() and calculated items, to shape the
      dataset and extract more information.

      For example, if you count the number of values above 90% since the trigger fired, and you know the item
      is polled every minute, you can estimate [N] minutes of heavy work in the resolution message.
      Last edited by Jeep; 13-08-2025, 22:44.
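The estimate in post #3 is simple arithmetic once the event boundaries are known: breaching samples × polling interval. A minimal Python sketch, assuming a 1-minute interval and that the minute the trigger fired and the minute it recovered are both available (getting those into a Zabbix calculation is the hard part); the helper names are hypothetical:

```python
HEAVY_THRESHOLD = 90     # "above 90%" from the post
UPDATE_INTERVAL_MIN = 1  # assumed polling interval, in minutes


def heavy_minutes(samples, event_start, event_end,
                  threshold=HEAVY_THRESHOLD, interval=UPDATE_INTERVAL_MIN):
    """Estimate minutes of heavy work during an event.

    Counts samples strictly above `threshold` from the sample at which the
    trigger fired up to and including the recovery sample, then converts
    the count to minutes using the polling interval.
    """
    in_event = samples[event_start:event_end + 1]
    above = sum(1 for value in in_event if value > threshold)
    return above * interval


def series(breach_minutes, length=30, base=60, peak=95):
    """Per-minute series that exceeds the threshold only at the given minutes."""
    return [peak if t in breach_minutes else base for t in range(length)]


# Same event window (minutes 5..19) for both series, different amounts of heavy work.
print(heavy_minutes(series({5, 6, 7, 8, 9}), 5, 19))  # 5
print(heavy_minutes(series({5, 9}), 5, 19))           # 2
```

With the same event window, the count separates the two cases that the event duration alone cannot tell apart.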

      • Jeep
        Junior Member
        • Aug 2025
        • 7

        #4
        Originally posted by Jeep
        I think something interesting could also be achieved with count() and calculated items, to shape the
        dataset and extract more information.

        For example, if you count the number of values above 90% since the trigger fired, and you know the item
        is polled every minute, you can estimate [N] minutes of heavy work in the resolution message.
        I hope someone with experience can help judge whether this is feasible.

        • cyber
          Senior Member
          Zabbix Certified Specialist, Zabbix Certified Professional
          • Dec 2006
          • 4806

          #5
          There's the issue of telling count() that it has to start counting from the beginning of the event...
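The difficulty can be shown in a few lines of Python. Because the recovery condition requires every value in the last window to be below the threshold, a count with the same threshold over a fixed trailing window of the same length, evaluated at recovery, is always zero; only a count anchored at the event start sees the spike. This sketch assumes one sample per minute and a 10-sample window, and the helper names are hypothetical:

```python
LIMIT = 90   # same threshold for the trigger and the count (illustrative)
WINDOW = 10  # "10m" modelled as the last 10 samples, assuming 1 sample/minute


def count_above(values, threshold):
    """Number of values strictly above threshold."""
    return sum(1 for value in values if value > threshold)


def counts_at_recovery(samples, event_start, recovery_t,
                       threshold=LIMIT, window=WINDOW):
    """Return (fixed_window_count, since_start_count) at the recovery sample."""
    trailing = samples[max(0, recovery_t - window + 1):recovery_t + 1]
    since_start = samples[event_start:recovery_t + 1]
    return count_above(trailing, threshold), count_above(since_start, threshold)


samples = [95 if 5 <= t <= 9 else 60 for t in range(30)]
# Event fired at minute 5 and recovered at minute 19 (max of the last 10 samples < 90).
print(counts_at_recovery(samples, 5, 19))  # (0, 5)
```

The event start differs for every event, so the counting period cannot be a fixed constant in the expression.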

          • Jeep
            Junior Member
            • Aug 2025
            • 7

            #6
            Yeah, difficult topic. If I find something interesting, I will share it with the community.

            Thanks!!