Duplicate triggers and alerts

This topic has been answered.
  • spaghetti
    Junior Member
    • Apr 2023
    • 5

    #1

    Duplicate triggers and alerts

    I've been working on a monitoring template for CommVault jobs. Each job item is created from an item prototype, but the jobs are read from an API request which only returns jobs from the last 24 hours. So after some time, some items are no longer discovered, and when this happens, every item changes (?): Job A becomes Job B and so on. This creates an issue where, if a job failed and triggered an alert, the alert is then moved to a different item (different item ID) with the same name and data, the trigger gets a different ID, and to Zabbix this is a new alert that has never happened before.
    Now this wouldn't be such an issue if JIRA weren't involved. The scenario I described causes JIRA to create 2 different tickets about the same alert. So imagine this:

    THIS PART IS THE MOST IMPORTANT

    1. You get an alert in Zabbix that Job A has failed.
    2. A ticket is created in Jira with the title "Job A has failed"
    3. After 20 minutes, one item is deleted, which causes every other item to change IDs.
    4. Now Job A has a different trigger ID and Zabbix thinks that the trigger has never been triggered before.
    5. You get an alert in Zabbix that Job A has failed.
    6. A ticket is created in Jira with the title "Job A has failed".

    Now Jira normally handles scenarios where an alert repeats (it adds an "alert repeated" comment to the original ticket), but only when it's the same trigger ID. So because of this I keep getting duplicate tickets in Jira. After 1 hour of testing I had 22 tickets but only 7 real alerts. Do you have any idea how to tackle this issue?
  • Answer selected by spaghetti at 03-08-2023, 14:21 (see post #6 below).
  • ISiroshtan
    Senior Member
    • Nov 2019
    • 324

    #2
    I would be interested in how you implemented discovery and data collection. I would expect jobs to have some unique non-repeating ID, and for you to use this value in one way or another to make sure the described scenario doesn't happen.
    As of right now it feels like you have a discovery that returns an array and you just discover items based on array placement (i.e. [{JOB:A, Status: Done}, {JOB:B, Status: Done}, ...], so Job A is always number one - [1], Job B is [2], etc.), so when one of the jobs leaves the array all the data gets screwed.
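    To illustrate with made-up data what I mean - if items are addressed by position, removing one job shifts everything after it:
    Code:
    Poll 1: [{JOB:A, Status: Done}, {JOB:B, Status: Failed}, {JOB:C, Status: Done}]  -> [2] is Job B
    Poll 2: [{JOB:B, Status: Failed}, {JOB:C, Status: Done}]                         -> [2] is now Job C (Job A aged out)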

    Can you show the configuration you set up for discovery and item prototypes, and also the JSON of one of the jobs (feel free to blur or remove values that you think expose your company or are sensitive info)?

    Comment

    • spaghetti
      Junior Member
      • Apr 2023
      • 5

      #3
      ISiroshtan Sure, I'll share my config. The discovery rule creates application prototypes for each job, so an application prototype would be named as follows:

      Job 234123 Backup - subclient324 - client 3143
      And its items:
      [Screenshot: the items under that application prototype]

      For each job, the rule creates 2 triggers from trigger prototypes; they are named the same way as the application prototype:

      Job 234123 Backup - subclient324 - client 3143 Failed

      Job 234123 Backup - subclient324 - client 3143 Failed to Start
      The triggers check the prototype's item called "Job status" to determine whether they should alert.
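      (Sketching the shape of those triggers with placeholder template and key names - not my exact config, and assuming the newer last(/host/key) expression syntax:)
      Code:
      last(/CommVault Template/job.status[{#ID}])="Failed"
      last(/CommVault Template/job.status[{#ID}])="Failed to Start"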

      I should also mention that the Keep Lost Resources period is set to 1 hour, since otherwise I'd have 15 or 20 items no longer discovered over the 24-hour period.
      I've also noticed something: the application prototypes are being updated with new names and data - not deleted and replaced. See the history of "Job status" for one application prototype that has changed names and data since the beginning:
      [Screenshot: history of a "Job status" item whose name and values changed over time]

      As you can imagine, it causes a lot of issues, since Job status is critical for the trigger to work. I hope this clears things up a bit.

      Comment

      • ISiroshtan
        Senior Member
        • Nov 2019
        • 324

        #4
        Could you actually show the configuration you have in place, not the values?

        How do you get the data? Do you have a single item that collects data about all jobs (a master item), and then item prototypes of dependent item type with preprocessing? If so, please show the data from the master item and the preprocessing setup on the item prototype.
        Or is it some other setup? If so, please explain how data is collected for discovery and for each of the item prototype items.

        Also, please show an example of the data you pass into the discovery.

        Comment

        • spaghetti
          Junior Member
          • Apr 2023
          • 5

          #5
          ISiroshtan Sure, there's one master item that the discovery rule gets the data from, and the items are created as dependent. Here are the item prototypes:
          [Screenshot: item prototypes]
          Each is passed to an application prototype that's created as follows:
          [Screenshot: application prototype name]
          Using LLD macros, of course.

          It's preprocessed from a JSON:
          [Screenshot: item prototype preprocessing step]
          The ID you see goes from 0 to n; each job gets assigned one - IT IS NOT THE SAME ID AS IN THE NAME, it's just for preprocessing.
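          In plain text, the preprocessing is a positional JSONPath along these lines (paraphrased, not copied verbatim):
          Code:
          $.[{#ID}].status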
          Here's the said JSON:
          [{
            "sizeOfApplication": 127561090793,
            "vsaParentJobID": 0,
            "commcellId": 2,
            "backupSetName": "defaultBackupSet",
            "opType": 4,
            "totalFailedFolders": 0,
            "totalFailedFiles": 0,
            "alertColorLevel": 0,
            "jobAttributes": 288230376151711740,
            "jobAttributesEx": 0,
            "isVisible": true,
            "localizedStatus": "Completed",
            "isAged": false,
            "totalNumOfFiles": 495,
            "jobId": 357164,
            "sizeOfMediaOnDisk": 69068310402,
            "currentPhase": 0,
            "status": "Completed",
            "lastUpdateTime": 1690962253,
            "percentSavings": 45.8547,
            "localizedOperationName": "Backup",
            "statusColor": "black",
            "errorType": 0,
            "backupLevel": 1,
            "jobElapsedTime": 78233,
            "jobStartTime": 1690884010,
            "jobType": "Backup",
            "isPreemptable": 0,
            "backupLevelName": "Full",
            "attemptStartTime": 0,
            "appTypeName": "Windows File System",
            "percentComplete": 100,
            "localizedBackupLevelName": "Full",
            "jobEndTime": 1690962253,
            "dataSource": {
              "dataSourceId": 0
            },
            "subclient": {
              "clientName": "osw001",
              "instanceName": "DefaultInstanceName",
              "backupsetId": 77,
              "commCellName": "osw000",
              "instanceId": 1,
              "subclientId": 121,
              "clientId": 122,
              "appName": "Windows File System",
              "backupsetName": "defaultBackupSet",
              "applicationId": 33,
              "subclientName": "DDBBackup"
            },
            "userName": {
              "userName": "admin",
              "userId": 1
            },
            "clientGroups": [
              {
                "_type_": 28,
                "clientGroupId": 1,
                "clientGroupName": "Infrastructure"
              },
              {
                "_type_": 28,
                "clientGroupId": 3,
                "clientGroupName": "Media Agents"
              }
            ],
            "id": "0"
          }, {
            "sizeOfApplication": 2328019075072,
            "vsaParentJobID": 0,
            "commcellId": 2,
            "thresholdTime": 56450,
            "backupSetName": "defaultBackupSet",
            "opType": 4,
            "totalFailedFolders": 0,
            "totalFailedFiles": 0,
            "alertColorLevel": 0,
            "jobAttributes": 288230376151711740,
            "jobAttributesEx": 4194304,
            "isVisible": true,
            "localizedStatus": "Completed",
            "isAged": false,
            "totalNumOfFiles": 24256,
            "jobId": 357271,
            "sizeOfMediaOnDisk": 43490133491,
            "currentPhase": 0,
            "status": "Completed",
            "lastUpdateTime": 1690938455,
            "percentSavings": 98.1319,
            "localizedOperationName": "Backup",
            "statusColor": "black",
            "errorType": 0,
            "backupLevel": 1,
            "jobElapsedTime": 25630,
            "jobStartTime": 1690912806,
            "jobType": "Backup",
            "jobEndTime": 1690938455,
            "dataSource": {
              "dataSourceId": 0
            },
            "subclient": {
              "clientName": "DAG",
              "instanceName": "defaultInstanceName",
              "backupsetId": 122,
              ...
            },
            "storagePolicy": {
              "storagePolicyName": "Databases",
              "storagePolicyId": 7
            },
            "id": "1"
          }, ...]

          Each job is one JSON element, so something like "storagePolicy" is an object within the job object.
          And so on - I've erased some sensitive data, but you should get the idea. I assume it's the structure you were after?
          It is preprocessed the same way in both the Discovery Rule and the master item - which is the JSON itself.
          [Screenshot: preprocessing configuration shared by the Discovery Rule and the master item]
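          The LLD macros are mapped from the job fields; {#ID} comes straight from that sequential "id" field, roughly like this (a sketch, not my exact config):
          Code:
          {#ID}  =>  $.id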
          Last edited by spaghetti; 02-08-2023, 21:15.

          Comment

          • ISiroshtan
            Senior Member
            • Nov 2019
            • 324

            #6
            Yeah, that is what I was after. I had a feeling about why you have the issue, but needed solid proof before I started giving specific advice. Sooooo... here is the catch with your setup: the preprocessing you have, specifically the $.[{#ID}] part, just points to a specific element based on its position inside the array of all jobs. So when jobs are removed from the array, the same dependent item now points to a different element in the JSON array, which is exactly what causes the issue for you.

            Instead, you'd be better off using the built-in JSONPath filtering functionality in Zabbix (requires Zabbix 4.2.5 or higher). You can extract specific parts of the JSON based on the value of specific fields. I see you have "jobId": 357271 (as an example), and I assume you already extract it in a discovery macro (I'll assume it's {#JOBID}). Using a JSONPath like the one below, you should be able to avoid the issue you're having:
            Code:
            $[?(@.jobId == {#JOBID})].status.first()
            (Written from the top of my head - the exact syntax might be slightly different, so definitely use the test functionality in the preprocessing UI to make sure it works as expected. Since jobId is a number in your JSON, compare the macro unquoted rather than as "{#JOBID}".)
            With such an approach you'd no longer care about job positioning in the array; it will actually extract the data of the specific job based on jobId. And when the job is gone, the item will just become unsupported with failed preprocessing until it's removed by the "keep lost resources" thingy.
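            For example, against the JSON you posted above, it would resolve like this (illustrative - the exact output formatting may differ, so test it in the UI):
            Code:
            $[?(@.jobId == 357271)].status          -> ["Completed"]
            $[?(@.jobId == 357271)].status.first()  -> "Completed"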

            Do advise if it's still failing for you one way or the other.

            Comment


            • spaghetti
              spaghetti commented
              Editing a comment
              That's really good advice - I thought I already had something set up this exact way. I have reconfigured it as you suggested and I am getting correct data. However, I'll need it to run for a bit more time before being sure that it actually works - I will let you know. Thank you for helping me out, and I'll keep you posted.

            • spaghetti
              spaghetti commented
              Editing a comment
              Update: After an hour of testing, only one alert in Zabbix was generated and only one Jira ticket was created; during all that, a lot of jobs were deleted and added, so it is working as expected. I can't believe I didn't catch that earlier, it's basic logic. I would kindly like to thank you - a thing that I've spent 44 hours on finally looks to be working.