New item type: triggered item

  • erikgreen
    Junior Member
    • Sep 2010
    • 9

    #1

    So, this idea may not be new (I haven't read ALL the posts here yet). But, it was inspired by a thread over in the cookbook.

    One common type of script we find ourselves adding to zabbix_agentd.conf, to be run by the central server when a trigger (e.g. CPU load) goes off, is a "snapshot" of current program status: open files, more detailed CPU stats, maybe even a debug dump from a JVM. We can't run super detailed checks like this all the time, because it would not only kill the server performance (relatively speaking) but would also generate more data than we want to keep.

    Hence, we run it only when needed.

    Unfortunately this all happens locally and outside the agent. So, it would make sense to me to create a new type of Item, one with the same attributes as any other Item, but with an additional on/off switch and a duration. So it won't collect data or trigger anything except when the server flips it on in response to another trigger, and will do so for the set duration.

    Sort of like a template that is applied when triggered, then removed when the duration expires.
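
A minimal sketch of that on/off switch and duration, in Python. This is not how Zabbix models items; the class name `TriggeredItem`, its fields, and the item key `custom.snapshot` are all hypothetical and only illustrate the proposed behaviour.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TriggeredItem:
    """Hypothetical item that collects data only while switched on."""
    key: str                 # item key (made up for this example)
    interval: float          # seconds between checks while active
    duration: float          # seconds the item stays on once activated
    active_until: Optional[float] = None  # None means switched off

    def activate(self, now: float) -> None:
        # Called when the parent trigger fires: flip the switch on.
        self.active_until = now + self.duration

    def is_active(self, now: float) -> bool:
        # The item switches itself off once the duration has expired.
        return self.active_until is not None and now < self.active_until


# A snapshot item checked every 10 seconds for 5 minutes once activated.
item = TriggeredItem(key="custom.snapshot", interval=10, duration=300)
```

Until `activate()` is called, `is_active()` returns False, so a collector loop built on this would skip the item entirely.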

    If this would fit into Zabbix's existing item handling, then we could see a charted "snapshot" of what was going on in fine detail when the initial trigger happened, but we could skip the load of the server collecting fine grained, detail heavy stats the rest of the time.

    As a related issue, Zabbix needs the ability to show tabular data in the GUI.

    So, for example, I could set a server to check CPU and memory use once every 30 seconds and then, on a too-high load condition, trigger a detailed template that records both of those plus the output of top every 10 seconds for 5 minutes.

    That should provide enough data for a post-mortem of the problem. I already have to be careful not to kill the target with these kinds of checks, especially when it's already having problems, but it's better to look hard and figure out what's going on than to hope it doesn't happen again.
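
To put rough numbers on that example, here is a small Python sketch that counts the checks in a 5 minute window at each interval. The helper `check_times` is hypothetical and it only does the arithmetic; it does not talk to Zabbix.

```python
from typing import List


def check_times(start: int, interval: int, duration: int) -> List[int]:
    """Return the times (in seconds) at which checks run in the window."""
    return list(range(start, start + duration, interval))


# Regular schedule: every 30 seconds, looked at over a 5 minute window.
baseline = check_times(0, 30, 300)

# Triggered snapshot: every 10 seconds for the same 5 minutes.
detailed = check_times(0, 10, 300)

# How many more checks the target has to serve while the snapshot runs.
extra_checks = len(detailed) - len(baseline)
```

The detailed schedule runs three times as many checks in the window as the regular one, which is the extra load to weigh against a target that is already struggling.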

    For output, allow users to browse through generated snapshots in the GUI, and mark them with an icon on the graphs for that host so it's obvious when they are generated.

    To be even more useful:

    *It should be possible for a trigger to run scripts on both the same target and on other (including multiple) targets. So on triggering, agents on servers hosting several parts of an application could run to provide enough cross-platform data to locate a problem in a multi-tier application. This also lets a problem detected on one platform that doesn't host the problem process trigger actions elsewhere.

    *Follow-up checks (aftershocks?) should be able to be scheduled from the initial snapshot. E.g. permit it to schedule another "intense" check an hour later, to record after-effects.

    *Even cooler, permit one-time scheduling of snapshots (defined as a group of checks) at arbitrary times. So if a user solving a problem wants extra intense checks done for 10 minutes at 9am, 5pm, and midnight, they can be scheduled. This would also permit relaxing of the regular schedule during known intense processing periods, like daily back-ups or database vacuum.

    *Unrelated, but if you have a scheduler built in, allow rotating lists of media entries or action settings, so it's possible to have one individual targeted by actions but to have that individual changed on a scheduled basis. This permits a rotating "on call" list inside the app. We can already manually rotate by creating an "on call" user and changing the media settings manually, but automated rotation would be nice.
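
The rotating "on call" list in the last point could be sketched like this in Python. The function `on_call`, the roster names, and the weekly shift length are assumptions for illustration only; nothing here is an existing Zabbix feature or API.

```python
from datetime import date
from typing import List


def on_call(roster: List[str], start: date, today: date,
            days_per_shift: int = 7) -> str:
    """Pick the person on call today from a rotating roster."""
    # Number of whole shifts that have passed since the rotation began.
    shifts_elapsed = (today - start).days // days_per_shift
    # Wrap around to the start of the roster once everyone has had a turn.
    return roster[shifts_elapsed % len(roster)]


# Hypothetical roster, rotating weekly from an arbitrary start date.
roster = ["alice", "bob", "carol"]
rotation_start = date(2010, 9, 6)
second_week = on_call(roster, rotation_start, date(2010, 9, 13))
```

A scheduler would call something like this whenever an action fires and send the notification to the media of whoever is returned, instead of someone editing the "on call" user by hand.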

    Thanks,
    Erik
  • richlv
    Senior Member
    Zabbix Certified Trainer
    Zabbix Certified Specialist
    Zabbix Certified Professional
    • Oct 2005
    • 3112

    #2
    while i'm not sure about the details of the possible implementation, feature requests in general should be registered in the issue tracker
    Zabbix 3.0 Network Monitoring book

    • erikgreen
      Junior Member
      • Sep 2010
      • 9

      #3
      Ok

      I'll put something in the issue tracker, and I'll try to phrase it more cleanly. We're finally getting ramped up here at the University of Minnesota on using Zabbix, and we're already excited about what we can do with it. Hopefully our maintenance contract goes through soon.

      Even better, maybe our administrators will let us submit code or utilities for Zabbix long term.

      Erik
