Problems with triggers going *unknown*

  • mdiorio
    Junior Member
    • Mar 2016
    • 27

    #1

    Problems with triggers going *unknown*

    Hi Everyone...

    We've been using Zabbix for about 6 months now and it's been working really well. Sometime today, the triggers that had been firing for Windows host disk usage changed their {ITEM.LAST1} and {ITEM.LAST2} macros from the proper values to *UNKNOWN* for no apparent reason, and they stayed that way even after a Zabbix server reboot. The disk space metrics are flowing in properly and there is no break in monitoring coverage, so the values are right, but the trigger is "stuck". That is especially clear on one server where the trigger should already have cleared.

    Stranger still, some things seem out of sync. I have a host group for Citrix servers with about 18 servers in it. If I go to Configuration > Host Groups, I can see them all there just fine.

    If I go to Monitoring > Triggers and change the Host Group to Citrix, the page sits there for about 10 seconds, then the main body just shows a black background. Only the header is on the page.

    Same exact thing happens for my "Production Internal Servers" host group.

    Both these groups are the ones that are giving me *UNKNOWN* for the trigger Macros.

    Any thoughts on this random issue? Running 3.2.3 on MySQL hosted on the same server.

    Thanks!
  • batchenr
    Senior Member
    • Sep 2016
    • 440

    #2
    First, I suggest increasing the debug level in zabbix_server.conf
    to get more information about this.

    Second,
    the item configuration has a "Clear history and trends" option;
    try it and see if it helps.
    Also, if you can, remove the trigger completely and assign it again.


    • mdiorio
      Junior Member
      • Mar 2016
      • 27

      #3
      Thanks for the reply. I just increased the logging level.

      The strange thing is that when I looked over the weekend, the correct values had started showing up again. However, the disk space triggers were firing for <10% space left when some of the disks actually had 13+% left.

      Today, they're back to an *UNKNOWN* status.

      As for "Clear history and trends", this is a problem for production environment monitoring. We need to retain historic metrics and can't be clearing them out when a trigger starts having an issue.

      I'll see if I can find anything in the logs throughout the day. Is there specific verbiage I should be looking for?

      Thanks.


      • mdiorio
        Junior Member
        • Mar 2016
        • 27

        #4
        I bumped up the logging and grepped for the trigger ID and got

        9792:20170123:115414.537 In zbx_process_trigger() triggerid:23156 value:1(0) new_value:3
        9791:20170123:115415.677 In zbx_process_trigger() triggerid:23156 value:1(0) new_value:3

        Not really helpful. Do I need to turn logging up higher?


        • Pada
          Senior Member
          • Apr 2012
          • 236

          #5
          Hi,

          I presume you're talking about the
          Code:
          {ITEM.LASTVALUE<1-9>}
          as mentioned in the docs: https://www.zabbix.com/documentation...ed_by_location

          From the docs it says
          It will resolve to *UNKNOWN* in the frontend if the latest history value has been collected more than the ZBX_HISTORY_PERIOD time ago (defined in defines.inc.php).
          where ZBX_HISTORY_PERIOD has a default of 24h.
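
          The documented rule can be sketched in a few lines of Python. This is only an illustration of the behaviour described in the docs, not the frontend's actual PHP code; the function name `resolve_last_value` and the sample timestamps are made up for the example, and the 24h default is the one quoted above.

```python
from datetime import datetime, timedelta

# Assumption: mirrors the documented 24h default of ZBX_HISTORY_PERIOD.
HISTORY_PERIOD = timedelta(hours=24)


def resolve_last_value(value, collected_at, now, period=HISTORY_PERIOD):
    """Return the value as text, or '*UNKNOWN*' if it is missing or too old."""
    if value is None or now - collected_at > period:
        return "*UNKNOWN*"
    return str(value)


# Hypothetical sample data: one value collected a minute ago, one 25 hours ago.
now = datetime(2017, 1, 23, 12, 0, 0)
fresh = resolve_last_value("13.2 %", now - timedelta(minutes=1), now)
stale = resolve_last_value("13.2 %", now - timedelta(hours=25), now)
print(fresh)
print(stale)
```

          If this rule were the whole story, an item collected every minute should never resolve to *UNKNOWN*, which is why the "Last check" time is worth confirming.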

          Could you perhaps show us your exact trigger expression and the item update intervals of the items included in that trigger expression?

          Also check when those items last had data by looking at the "Last check" column under Monitoring > Latest data.

          I hope this may help a bit...


          • mdiorio
            Junior Member
            • Mar 2016
            • 27

            #6
            Yes, that is what I mean.

            The data collection for the pfree and free items has never been changed from the template, so both collect every 1 minute. Neither item has ever had a hiccup in collection; there is a value for every minute.

            That's why the evaluation of *UNKNOWN* makes no sense here, as does the fact that it occasionally evaluates properly and other times resolves to *UNKNOWN*.


            Trigger Name:
            Code:
            Low disk space {ITEM.VALUE1} {ITEM.VALUE2} is less than {$FREEDISKALERTPERCENT}% on volume {#FSNAME}
            Trigger Definition:
            Code:
            ({Template OS Windows:vfs.fs.size[{#FSNAME},pfree].max(5m)}<{$FREEDISKALERTPERCENT} and {Template OS Windows:vfs.fs.size[{#FSNAME},free].last()}<10G) and ({Template OS Windows:vfs.fs.size[{#FSNAME},pfree].max(5m)}<1 and {Template OS Windows:vfs.fs.size[{#FSNAME},free].last()}>1G)


            • Pada
              Senior Member
              • Apr 2012
              • 236

              #7
              I've never used those macros in the name of a trigger.

              If I understand correctly, the name of the trigger would then change according to the latest value of
              Code:
              {Template OS Windows:vfs.fs.size[{#FSNAME},pfree].max(5m)}
              and
              Code:
              {Template OS Windows:vfs.fs.size[{#FSNAME},free].last()}
              For example if
              1) #FSNAME = c:\
              2) and the maximum free disk space over the past 5m = 50%, but the last 5 minutes' data looked as follows: 20% 30% 40% 50% 25%. So if it's a 1TB disk, then the last free size would be 250G
              3) and FREEDISKALERTPERCENT = 10, then
              your trigger's name of
              Low disk space {ITEM.VALUE1} {ITEM.VALUE2} is less than {$FREEDISKALERTPERCENT}% on volume {#FSNAME}
              would become something like "Low disk space 25% 250G is less than 10% on volume c:\"

              Perhaps having those dynamic macros in the trigger name is too much for Zabbix's database to handle when trying to view the trigger names.

              I typically include the host macros or item parameters in my trigger names and then in the Actions that generate Emails/SMSs I include {ITEM.VALUE1} {ITEM.VALUE2} and {ITEM.VALUE3}
              For example my trigger name would only be like
              Code:
              Low free disk space on {HOST.NAME} for volume {#FSNAME} is less than {$FREEDISKALERTPERCENT}%
              And our Default message on the Action was the following in Zabbix 1.8:
              Affected Host name: {HOSTNAME}
              Current Time: {TIME}

              Trigger Details
              Name: {TRIGGER.NAME}
              Status: {TRIGGER.STATUS}
              Severity: {TRIGGER.SEVERITY}
              URL: {TRIGGER.URL}
              Comments: {TRIGGER.COMMENT}

              Current Item Values (not necessarily the value used in the trigger):
              1. {ITEM.NAME1} ({HOSTNAME1}:{TRIGGER.KEY1}): {ITEM.VALUE1}
              2. {ITEM.NAME2} ({HOSTNAME2}:{TRIGGER.KEY2}): {ITEM.VALUE2}
              3. {ITEM.NAME3} ({HOSTNAME3}:{TRIGGER.KEY3}): {ITEM.VALUE3}
              By the way, I believe you'll need to review your trigger expression as well, because it will only go into a problem state if
              the free disk space % is between (1,{$FREEDISKALERTPERCENT}) AND the free disk space (in GB) is between (1,10).
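
              That reading of the expression can be sketched as a small Python check. It models only the two windows described above, not Zabbix's own expression evaluator; the function name `in_problem_window` is hypothetical, `alert_percent=10` stands in for {$FREEDISKALERTPERCENT} as in the example earlier in this post, and `G` is taken as 1024^3 bytes.

```python
GIB = 1024 ** 3  # Zabbix's "G" suffix is 1024^3 bytes


def in_problem_window(pfree, free_bytes, alert_percent=10.0):
    """True only when both values fall inside the windows described above."""
    percent_in_window = 1.0 < pfree < alert_percent
    bytes_in_window = 1 * GIB < free_bytes < 10 * GIB
    return percent_in_window and bytes_in_window


# Hypothetical readings: (percent free, bytes free)
typical_low = in_problem_window(5.0, 5 * GIB)       # inside both windows
nearly_full = in_problem_window(0.5, GIB // 2)      # below both lower bounds
big_disk = in_problem_window(5.0, 200 * GIB)        # plenty of absolute space
print(typical_low, nearly_full, big_disk)
```

              Note that the nearly full disk falls outside the window, so under this reading a second trigger is needed for that case.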


              • mdiorio
                Junior Member
                • Mar 2016
                • 27

                #8
                I would think dynamic macros in the trigger would be fine - that's what it's designed to do.

                I want to look at the trigger from the dashboard and see what the status is right away (25GB free: OK, I have some time; 1GB free: need to act immediately), not have to dig into the details to find out. I know I can create multiple triggers and escalate, but there's a big difference between 25GB free and 10GB free, especially on a database log drive that can fill that in 10 minutes or less.


                Originally posted by Pada
                If I understand correctly, the name of the trigger would then change according to the latest value of
                Code:
                {Template OS Windows:vfs.fs.size[{#FSNAME},pfree].max(5m)}
                and
                Code:
                {Template OS Windows:vfs.fs.size[{#FSNAME},free].last()}
                For example if
                1) #FSNAME = c:\
                2) and the maximum free disk space over the past 5m = 50%, but the last 5 minutes' data looked as follows: 20% 30% 40% 50% 25%. So if it's a 1TB disk, then the last free size would be 250G
                3) and FREEDISKALERTPERCENT = 10, then
                your trigger's name would become something like "Low disk space 25% 250G is less than 10% on volume c:\"
                Yes, that is exactly what we want: the trigger name should change as the calculated values change. That's why we use {ITEM.VALUE1} and {ITEM.VALUE2}. {ITEM.VALUE1} would give you the value of {Template OS Windows:vfs.fs.size[{#FSNAME},pfree].max(5m)} as of the time the trigger was fired. There is also {ITEM.LASTVALUE1} (I tried this and it actually yielded the same *UNKNOWN* randomly), which is updated every time the trigger is evaluated.

                To borrow from another post:
                For example, let's say I have a trigger set up like this:

                Name: File size > 64 ({ITEM.VALUE}, {ITEM.LASTVALUE})
                Expression: {myhost:vfs.file.size[/tmp/foo].last()}>64

                If I create a 75 byte file called /tmp/foo, an event gets generated with the description "File size > 64 (75 B, 75 B)". However, if the file continues to grow to 128 B, then after the next check, the event name changes to "File size > 64 (75 B, 128 B)".

                So, ITEM.VALUE does not get updated, but ITEM.LASTVALUE does.
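
                The difference can be mimicked with a toy Python model. This is only a simulation of the behaviour described in the borrowed post, not Zabbix code; the `Event` class and its methods are invented for the illustration.

```python
class Event:
    """Toy model of an event name that mixes a frozen and a live value."""

    def __init__(self, template, fired_value):
        self.template = template
        self.fired_value = fired_value  # frozen when the trigger fires
        self.last_value = fired_value   # refreshed on every later check

    def record_check(self, value):
        """Store the newest collected value; the fired value stays untouched."""
        self.last_value = value

    def name(self):
        """Expand both macros in the name template."""
        return (self.template
                .replace("{ITEM.VALUE}", f"{self.fired_value} B")
                .replace("{ITEM.LASTVALUE}", f"{self.last_value} B"))


event = Event("File size > 64 ({ITEM.VALUE}, {ITEM.LASTVALUE})", 75)
before = event.name()
event.record_check(128)  # the file grew to 128 bytes
after = event.name()
print(before)
print(after)
```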


                Perhaps having those dynamic macros in the trigger name is too much for Zabbix's database to handle when trying to view the trigger names.
                The trigger needs to evaluate the items in its definition every time it checks anyway, so using the returned values shouldn't generate much, if any, additional load on the SQL server.


                I typically include the host macros or item parameters in my trigger names and then in the Actions that generate Emails/SMSs I include {ITEM.VALUE1} {ITEM.VALUE2} and {ITEM.VALUE3}
                For example my trigger name would only be like
                Code:
                Low free disk space on {HOST.NAME} for volume {#FSNAME} is less than {$FREEDISKALERTPERCENT}%
                This is the way it's set up by default, but as I said, it provides no details. There's no quick way to see the values from the dashboard this way, especially if you're looking at it on a TV screen. I often don't want to send alerts until they become critical; otherwise you're flooded with alerts that aren't actionable or aren't high priority, you get fatigued checking emails, and you start ignoring them. Again, this can be handled with escalations, but do it right from the start and your escalation paths become easier.

                By the way, I believe that you'll need to review your trigger expression as well, because it will only go into problem state IF
                the free disk space % is between (1,{$FREEDISKALERTPERCENT}) AND the free disk space (in GB) is between (1,10)
                We have servers with wide ranging disks, 10GB - 4TB even in the same server. If I used the one size fits all default percentage of 10%:

                If a 50GB disk < 10%, that's 5GB free, which is an issue as it can fill up quickly.
                If a 4TB disk < 10%, that's 400GB free, which is usually not as pressing an issue.

                If I use only percentage, then the 50GB disk may be firing all the time. I don't want to have to set macros per disk; that's a nightmare.

                This was an attempt to have a single trigger be able to cover both options by requiring both percentage and free space = TRUE.

                I have a second trigger for less than 1% and less than 1GB that sets hypothetical bells off.
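
                The arithmetic behind that choice, as a rough Python sketch. It assumes decimal units (1GB = 10^9 bytes) so the round numbers above come out exactly, and the names `free_at_percent` and `low_disk` plus the 10GB cap are stand-ins for the idea, not the actual trigger.

```python
GB = 10 ** 9    # decimal units, chosen only to keep the numbers round
TB = 1000 * GB


def free_at_percent(total_bytes, percent):
    """Bytes that remain free when a disk is at the given percent free."""
    return total_bytes * percent / 100


def low_disk(total_bytes, free_bytes, alert_percent=10, max_free_bytes=10 * GB):
    """Alert only when BOTH the percentage and the absolute size are low."""
    pfree = 100 * free_bytes / total_bytes
    return pfree < alert_percent and free_bytes < max_free_bytes


small_at_10 = free_at_percent(50 * GB, 10)   # 5GB left on a 50GB disk
large_at_10 = free_at_percent(4 * TB, 10)    # 400GB left on a 4TB disk

small_alert = low_disk(50 * GB, 4 * GB)      # 8% free and only 4GB
large_alert = low_disk(4 * TB, 300 * GB)     # 7.5% free but 300GB
print(small_at_10 / GB, large_at_10 / GB, small_alert, large_alert)
```

                With percentage alone, the 4TB disk would alert at 300GB free; the combined condition keeps it quiet while still catching the small disk.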


                • Pada
                  Senior Member
                  • Apr 2012
                  • 236

                  #9
                  I'm glad you fully understand all those macro things and use them appropriately.
                  Perhaps this is a bug or a performance-related issue that you're having. I'm really not sure what kind of performance impact those dynamic ITEM macros would have on a large system, which is partly why I'm avoiding them.

                  I presume all your Zabbix internal performance items are OK, as summarized by the "Zabbix internal process busy %" and "Zabbix cache usage, % free" graphs that come with the "Template App Zabbix Server".

                  Would you mind sharing some performance stats (values per second), the rough specs of your server and the DB engine that you're using (e.g. InnoDB)?

                  I know your frustration with multiple disk sizes!
                  For that reason I started using "predictive" items (see attached image) to tell me when my disk will completely run out and then add a trigger to give me a 1 day notice.

                  Unfortunately the documentation is terrible and when you just use the prediction stuff in the trigger expression, you have no visibility of whether the trigger will work or not, which is why I'd prefer to add a calculated item!
                  For example see this thread with regards to prediction https://www.zabbix.com/forum/showthread.php?t=54455
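
                  To make the idea concrete, here is a minimal Python sketch of a "time until the disk is empty" estimate using a least-squares straight line. It is not Zabbix's own prediction code; the function name `seconds_until_empty`, the sample readings and the one-day threshold check are assumptions for illustration.

```python
def seconds_until_empty(samples):
    """Estimate seconds until free space reaches zero.

    samples is a list of (timestamp_seconds, free_space) pairs, oldest first.
    Returns None when there is too little data or free space is not shrinking.
    """
    n = len(samples)
    if n < 2:
        return None
    mean_t = sum(t for t, _ in samples) / n
    mean_f = sum(f for _, f in samples) / n
    var_t = sum((t - mean_t) ** 2 for t, _ in samples)
    if var_t == 0:
        return None
    # Least-squares slope and intercept of free space over time.
    slope = sum((t - mean_t) * (f - mean_f) for t, f in samples) / var_t
    if slope >= 0:
        return None
    intercept = mean_f - slope * mean_t
    zero_at = -intercept / slope
    return max(0.0, zero_at - samples[-1][0])


# Hypothetical readings: free space in GB, one sample per hour.
samples = [(0, 100.0), (3600, 90.0), (7200, 80.0)]
remaining = seconds_until_empty(samples)
one_day_notice = remaining is not None and remaining < 86400
print(remaining, one_day_notice)
```

                  Storing an estimate like this as its own item is what gives the visibility mentioned above: the predicted value can be graphed before any trigger depends on it.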

                  As for the trigger expression you had, my concerns were that it would go into an OK state if either of the following happened:
                  a) Free space is less than or equal to 1G
                  b) Free space % is less than or equal to 1%
                  So hopefully you have another trigger (and/or recovery rule) to cater for that scenario.

                  As for that log output:
                  Code:
                  9792:20170123:115414.537 In zbx_process_trigger() triggerid:23156 value:1(0) new_value:3
                  9791:20170123:115415.677 In zbx_process_trigger() triggerid:23156 value:1(0) new_value:3
                  ... it translates to:
                  Code:
                  9792:20170123:115414.537 In zbx_process_trigger() triggerid:23156 value:TRIGGER_VALUE_PROBLEM(TRIGGER_STATE_NORMAL) new_value:TRIGGER_VALUE_NONE
                  9791:20170123:115415.677 In zbx_process_trigger() triggerid:23156 value:TRIGGER_VALUE_PROBLEM(TRIGGER_STATE_NORMAL) new_value:TRIGGER_VALUE_NONE
                  See src/libs/zbxdbhigh/trigger.c for the logger and include/common.h for the constants:
                  Code:
                  /* trigger values */
                  #define TRIGGER_VALUE_OK                0
                  #define TRIGGER_VALUE_PROBLEM           1
                  #define TRIGGER_VALUE_UNKNOWN           2       /* only in server code, never in DB */
                  #define TRIGGER_VALUE_NONE              3       /* only in server code, never in DB */
                  const char      *zbx_trigger_value_string(unsigned char value);
                  
                  /* trigger states */
                  #define TRIGGER_STATE_NORMAL            0
                  #define TRIGGER_STATE_UNKNOWN           1
                  Hopefully other people who are also using (or avoiding) the likes of ITEM.VALUE<1-9> in their trigger names could assist...
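
                  For anyone repeating that translation on a larger log, a small Python helper along these lines could do it. The constant names are copied from the include/common.h excerpt above; the regular expression is only based on the two sample lines in this thread, and the function name `translate` is made up.

```python
import re

# Names copied from the include/common.h excerpt quoted above.
TRIGGER_VALUES = {
    0: "TRIGGER_VALUE_OK",
    1: "TRIGGER_VALUE_PROBLEM",
    2: "TRIGGER_VALUE_UNKNOWN",
    3: "TRIGGER_VALUE_NONE",
}
TRIGGER_STATES = {0: "TRIGGER_STATE_NORMAL", 1: "TRIGGER_STATE_UNKNOWN"}

# Assumption: the line layout matches the two samples posted in this thread.
LINE_RE = re.compile(r"triggerid:(\d+) value:(\d+)\((\d+)\) new_value:(\d+)")


def translate(line):
    """Replace the numeric codes in a zbx_process_trigger() line with names."""
    match = LINE_RE.search(line)
    if match is None:
        return None
    trigger_id, value, state, new_value = (int(g) for g in match.groups())
    return (f"triggerid:{trigger_id} "
            f"value:{TRIGGER_VALUES.get(value, value)}"
            f"({TRIGGER_STATES.get(state, state)}) "
            f"new_value:{TRIGGER_VALUES.get(new_value, new_value)}")


sample = ("9792:20170123:115414.537 In zbx_process_trigger() "
          "triggerid:23156 value:1(0) new_value:3")
translated = translate(sample)
print(translated)
```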

                  • mdiorio
                    Junior Member
                    • Mar 2016
                    • 27

                    #10
                    Originally posted by Pada
                    Would you mind sharing some performance stats (values per second), the rough specs of your server and the DB engine that you're using (e.g. InnoDB)?
                    See attachments

                    Zabbix server is virtualized on brand new Cisco UCS blades with a NetApp FAS all flash storage array.

                    4 Intel Xeon E5-2680 v4 vCPU's
                    16GB of Memory

                    Running on MySQL using InnoDB

                    I know your frustration with multiple disk sizes!
                    For that reason I started using "predictive" items (see attached image) to tell me when my disk will completely run out and then add a trigger to give me a 1 day notice.
                    Thank you for this! This sounds like a much better option and yields actionable items.


                    So hopefully you have another trigger (and/or recovery rule) to cater for that scenario.
                    I sure do.

                    Thanks!


                    • db100
                      Member
                      • Feb 2023
                      • 61

                      #11
                      Hi there, a somewhat related question: do you know if the Zabbix server still caches values for unknown triggers, or are those triggers simply ignored?

                      cheers
