Multiple cross-action emails after upgrade from 3.0 to 4.0.7: broken

  • zabbixfk
    Senior Member
    • Jun 2013
    • 256

    #16
    Thanks for the reply, vso. I am sure it's a bug, but opening a report requires a login, which I don't have.
    As I stated above, I have provided all the logs, and I can provide more if you need them.
    It looks like my earlier description was not clear, so let me restate it here.

    My old system was 3.0.13 (all in one, DB, engine and WebUI). Here are the details. - This is before upgrade -
    Code:
    zabbix_server (Zabbix) 3.0.13
    Revision 74336 7 November 2017, compilation time: Dec 22 2017 06:02:28
    More Details
    Code:
    Parameter                                              Value    Details
    Zabbix server is running                               Yes      localhost:10051
    Number of hosts (enabled/disabled/templates)           6577     3135 / 2952 / 490
    Number of items (enabled/disabled/not supported)       22218    19403 / 2276 / 539
    Number of triggers (enabled/disabled [problem/ok])     7861     7092 / 769 [210 / 6882]
    Number of users (online)                               93       6
    Required server performance, new values per second     65.46
    Number of actions: 337.
    This setup worked fine and as expected for close to 18 months. I then decided to upgrade; here is the upgraded system (DB on one machine, WebUI and engine on another).
    Code:
    zabbix_server (Zabbix) 4.0.7
    Revision 92831 18 April 2019, compilation time: Jun  3 2019 15:59:42
    This product includes software developed by the OpenSSL Project
    for use in the OpenSSL Toolkit (http://www.openssl.org/).
    Compiled with OpenSSL 1.0.2k-fips  26 Jan 2017
    Running with OpenSSL 1.0.2k-fips  26 Jan 2017
    And system details
    Code:
    Parameter                                              Value    Details
    Zabbix server is running                               Yes      localhost:10051
    Number of hosts (enabled/disabled/templates)           6576     32 / 6054 / 490
    Number of items (enabled/disabled/not supported)       43947    973 / 42943 / 31
    Number of triggers (enabled/disabled [problem/ok])     13229    302 / 12927 [12 / 290]
    Number of users (online)                               92       1
    Required server performance, new values per second     2.62
    On the upgraded system I have enabled only about 20 hosts for testing, not all of them. Some of these 20 are reporting problems, and those are genuine alerts.
    The only problems are:
    - Alerts are triggering more emails than expected.
    - When a trigger fires, it sets off not only its own action but other, unrelated actions as well.
    - As a result, I am seeing more emails than required going out for one action.
    Since I can't attach all the images here, I will post them as replies to this thread.

    Bottom line after upgrading from 3.0.13 to 4.0.7 LTS by compiling from source:
    Question: Are triggers firing when there is an actual problem? Yes.
    Question: Is the corresponding action firing when a trigger fires? Yes.
    Question: Is an action with email sending enabled working for the configured user groups? Yes.
    So what's the problem? In addition to the configured email action, 30+ more emails are fired carrying the bodies of other actions (see the pictures in my previous posts). Say Action A should fire for Trigger A and send email to user groups B and C. In addition to that, the bodies of actions E, F, G, H ... are also sent to groups B and C. As a result, for one trigger (PROBLEM or OK state) I see 30+ emails where there should be only two, because only two user groups are configured to receive email.
    Question: Is this happening only for specific triggers, or for all of the 337? As of now I have enabled 20 hosts for monitoring, most of them have PROBLEMs coming in, and this happens for all of them.
    Since the log files are huge, I can't upload them here.

    I hope this provides enough context; please let me know if more logs are needed.

    Log file - it clearly shows multiple action emails initiated after a single trigger function was evaluated.

    Thanks,


    • zabbixfk
      Senior Member
      • Jun 2013
      • 256

      #17
      *bump* - any pointers are greatly helpful...


      • vso
        Zabbix developer
        • Aug 2016
        • 190

        #18
        You can register and create a bug report on support.zabbix.com.


        • zabbixfk
          Senior Member
          • Jun 2013
          • 256

          #19
          Thanks vso, but did you find any anomaly in the log file I linked?

          Thanks.


          • vso
            Zabbix developer
            • Aug 2016
            • 190

            #20
            Unfortunately not. I would suggest increasing the log level of the alerter only and then checking the log when the issue occurs:
            zabbix_server -R increase_log_level=alerter


            • zabbixfk
              Senior Member
              • Jun 2013
              • 256

              #21
              loglevel is configured as 5 as of now.
              Code:
              DebugLevel=5


              • vso
                Zabbix developer
                • Aug 2016
                • 190

                #22
                Attached patch should log mails for you, hopefully this will provide more information on what Zabbix server is sending and to whom.

                • zabbixfk
                  Senior Member
                  • Jun 2013
                  • 256

                  #23
                  Thanks vso, can you please tell me how to apply this patch?


                  • vso
                    Zabbix developer
                    • Aug 2016
                    • 190

                    #24
                    In the directory with source code please do:
                    patch -p1 -i log_mails.txt

                    Then make -s install


                    • zabbixfk
                      Senior Member
                      • Jun 2013
                      • 256

                      #25
                      BTW, I can already see to whom all the emails are going. The problem is why they are going, i.e. which action triggered those emails and which trigger fired those actions, so that 30+ emails with other actions' bodies go out for each action.
                      I believe the log I attached to the previous post has some details: the item is fetched (low memory) and then the trigger is evaluated. Once the trigger becomes true, I see several action IDs being fired for that trigger, and then a lot of email bodies being passed to the send_email function, all in that log. I just wanted to check whether that logic is correct.


                      • zabbixfk
                        Senior Member
                        • Jun 2013
                        • 256

                        #26
                        Code:
                        [root@zbx-upgrade-app zabbix-4.0.7]# patch -p1 -i log_mails.txt
                        patching file src/libs/zbxmedia/email.c
                        Hunk #1 succeeded at 547 (offset -1 lines).
                        [root@zbx-upgrade-app zabbix-4.0.7]#
                        [root@zbx-upgrade-app zabbix-4.0.7]# make -s instal 
                        make: *** No rule to make target `instal'.  Stop.
                        [root@zbx-upgrade-app zabbix-4.0.7]#
                        Should I run configure again, or run make again?


                        • vso
                          Zabbix developer
                          • Aug 2016
                          • 190

                          #27
                          Please do:
                          make -s install

                          I can only see one action condition matching in the log:
                          28162:20190607:132017.748 In check_trigger_condition()
                          28162:20190607:132017.748 query [txnlev:1] [select templateid from triggers where triggerid=48116]
                          28162:20190607:132017.748 End of check_trigger_condition():SUCCEED
                          28162:20190607:132017.748 End of check_action_condition():SUCCEED

                          You could try increasing the log level of the history syncer only and looking at which conditions match:
                          zabbix_server -R increase_log_level="history syncer"



                          • zabbixfk
                            zabbixfk commented
                            But for that one action, can you also see the number of other action emails generated for the low memory event, at lines 6616, 6626, 6675, 6718, 6794, 6810, 6827, 6871, 6914 ... up to 10192? I suspect this is where things go wrong. It shouldn't have created so many action bodies: there is only one trigger, so there should be only one action, right?

                          • vso
                            vso commented
                            It's only 3 alerts:
                            28164:20190607:132019.309 query [txnlev:0] [insert into alerts (alertid,actionid,eventid,userid,clock,mediatypeid,sendto,subject,message,status,error,esc_step,alerttype,acknowledgeid) values (17737636,10,179309896,110,1559893819,1,'user@abcd.com','Low memory on system-DC2','
                            28164:20190607:132019.311 query [txnlev:0] [insert into alerts (alertid,actionid,eventid,userid,clock,mediatypeid,sendto,subject,message,status,error,esc_step,alerttype,acknowledgeid) values (17737637,10,179309896,163,1559893819,1,'user1@abcd.com','Low memory on system-DC2','',3,'',1,0,null);
                            28164:20190607:132019.313 query [txnlev:0] [insert into alerts (alertid,actionid,eventid,userid,clock,mediatypeid,sendto,subject,message,status,error,esc_step,alerttype,acknowledgeid) values (17737638,10,179309896,109,1559893819,1,'user2@abcd.com','Low memory on system-DC2','',3,'',1,0,null);
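
The manual count above can be repeated over a whole debug log with a short script. This is a minimal sketch, assuming the `insert into alerts` lines have the column order shown in the excerpt (alertid, actionid, eventid, userid first); the function name `alerts_per_action` and the `COLUMNS` helper are hypothetical, not part of Zabbix.

```python
import re
from collections import defaultdict

# Assumption: the column list starts with alertid, actionid, eventid, userid,
# as in the log excerpt quoted in this thread.
ALERT_RE = re.compile(
    r"insert into alerts \([^)]*\) values \((\d+),(\d+),(\d+),(\d+),"
)


def alerts_per_action(log_lines):
    """Group recipient user IDs by (actionid, eventid)."""
    grouped = defaultdict(list)
    for line in log_lines:
        match = ALERT_RE.search(line)
        if match:
            _alertid, actionid, eventid, userid = map(int, match.groups())
            grouped[(actionid, eventid)].append(userid)
    return dict(grouped)


# The three log lines from the excerpt, shortened with a helper constant.
COLUMNS = (
    "alertid,actionid,eventid,userid,clock,mediatypeid,sendto,subject,"
    "message,status,error,esc_step,alerttype,acknowledgeid"
)
sample_log = [
    f"28164:20190607:132019.309 query [txnlev:0] [insert into alerts ({COLUMNS}) "
    "values (17737636,10,179309896,110,1559893819,1,'user@abcd.com','Low memory on system-DC2','",
    f"28164:20190607:132019.311 query [txnlev:0] [insert into alerts ({COLUMNS}) "
    "values (17737637,10,179309896,163,1559893819,1,'user1@abcd.com','Low memory on system-DC2','',3,'',1,0,null);",
    f"28164:20190607:132019.313 query [txnlev:0] [insert into alerts ({COLUMNS}) "
    "values (17737638,10,179309896,109,1559893819,1,'user2@abcd.com','Low memory on system-DC2','',3,'',1,0,null);",
]

for (actionid, eventid), users in alerts_per_action(sample_log).items():
    print(f"action {actionid}, event {eventid}: {len(users)} alert(s) to users {users}")
```

More than one action ID listed for the same event ID would point to several actions matching a single event.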

                          • zabbixfk
                            zabbixfk commented
                            But for three alerts, there are so many occurrences of
                            In substitute_simple_macros() data:'Trigger
                            Please also check the line numbers I mentioned (10191, 10174, 10157, 10086, 10063, 9993, 9939, 9914, 9776, 9730, 9714, 9697, 9654, 9632, 9589, 9542, 9524, 9505, 9429, 9405, 9329, 9282, 9264, 9245, 9170, 9145, 9070, 9007, 8983, 8956, 8873, 8841, 8757, 8710, 8692, 8673, 8602, 8577, 8507, 8414, 8398, 8373, 8330, 8307, 8265, 8220, 8204, 8187, 8143, 8122, 8079, 8034, 8010, 7950, 7909, 7866, 7802, 7786, 7769, 7726, 7704, 7653, 7608, 7592, 7532, 7591) etc.
                        • zabbixfk
                          Senior Member
                          • Jun 2013
                          • 256

                          #28
                          Code:
                          [root@zbx-upgrade-app zabbix-4.0.7]#
                          [root@zbx-upgrade-app zabbix-4.0.7]# zabbix_server -R increase_log_level="history syncer"
                          zabbix_server [5658]: invalid runtime control option: increase_log_level=history syncer
                          [root@zbx-upgrade-app zabbix-4.0.7]#
                          [root@zbx-upgrade-app zabbix-4.0.7]# zabbix_server -R increase_log_level=alerter
                          zabbix_server [5666]: invalid runtime control option: increase_log_level=alerter
                          [root@zbx-upgrade-app zabbix-4.0.7]
                          Last edited by zabbixfk; 11-06-2019, 11:40.



                          • vso
                            vso commented
                            Sorry it's
                            zabbix_server -R log_level_increase=alerter

                          • zabbixfk
                            zabbixfk commented
                            Should I restart zabbix_server after applying the patch?

                          • vso
                            vso commented
                            yes, please do
                        • zabbixfk
                          Senior Member
                          • Jun 2013
                          • 256

                          #29
                          Here are the logs again.
                          o3.txt

                          For example, I don't have any action for the zabbix pinger process (no action is defined for any Zabbix internal process, only triggers; you can check the attached screenshot), yet I am getting 30+ emails.

                          The other triggers are also sending out that many emails, i.e. somewhere actions are being matched for any trigger and fired.
                          I just don't know where to look now.

                          Thanks
                          Last edited by zabbixfk; 11-06-2019, 12:38.


                          • vso
                            Zabbix developer
                            • Aug 2016
                            • 190

                            #30
                            Now I see multiple escalations created for the same trigger, meaning different actions matched. Please provide the action conditions of the actions with those IDs:
                            insert into escalations (escalationid,actionid,status,triggerid,itemid,eventid,r_eventid,acknowledgeid) values (631,295,0,13475,null,179310664,null,null),(632,285,0,13475,null,179310664,null,null),(633,180,0,13475,null,179310664,null,null),(634,108,0,13475,null,179310664,null,null),(635,425,0,13475,null,179310664,null,null),(636,72,0,13475,null,179310664,null,null),(637,366,0,13475,null,179310664,null,null),(638,294,0,13475,null,179310664,null,null),(639,143,0,13475,null,179310664,null,null),(640,239,0,13475,null,179310664,null,null),(641,181,0,13475,null,179310664,null,null),(642,109,0,13475,null,179310664,null,null),(643,365,0,13475,null,179310664,null,null),(644,77,0,13475,null,179310664,null,null),(645,110,0,13475,null,179310664,null,null),(646,36,0,13475,null,179310664,null,null);
                            Last edited by vso; 11-06-2019, 12:58.
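
Pulling the action IDs out of such an `insert into escalations` statement by hand is error-prone, so a small script may help. This is a sketch only, assuming each VALUES tuple starts with escalationid, actionid, status, triggerid as in the query above; the function name `escalation_action_ids` is hypothetical, and the sample uses just the first three tuples of that query.

```python
import re


def escalation_action_ids(query):
    """Return (list of action IDs, set of trigger IDs) from one INSERT query."""
    # Everything after " values " is the list of tuples.
    values = query.split(" values ", 1)[1]
    # Assumed tuple layout: escalationid, actionid, status, triggerid, ...
    rows = re.findall(r"\((\d+),(\d+),(\d+),(\d+),", values)
    actions = [int(row[1]) for row in rows]
    triggers = {int(row[3]) for row in rows}
    return actions, triggers


sample_query = (
    "insert into escalations (escalationid,actionid,status,triggerid,itemid,"
    "eventid,r_eventid,acknowledgeid) values "
    "(631,295,0,13475,null,179310664,null,null),"
    "(632,285,0,13475,null,179310664,null,null),"
    "(633,180,0,13475,null,179310664,null,null);"
)

actions, triggers = escalation_action_ids(sample_query)
# A ready-made ID list for an "actionid in (...)" lookup.
print(",".join(str(a) for a in actions))
print(f"{len(actions)} actions matched for trigger(s) {sorted(triggers)}")
```

Generating the ID list this way avoids copy errors when pasting it into a follow-up `select ... where actionid in (...)` query.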



                            • vso
                              vso commented
                              Marked some, for example:
                              select name from actions where actionid in (108,425);

                              Then please find them in the UI by name and check their conditions.

                            • zabbixfk
                              zabbixfk commented
                              Here you go. You are right, these are all different actions being matched ... uh-oh, something is seriously wrong: so many actions are being matched for one trigger.
                              Code:
                               SELECT actionid, name FROM actions WHERE actionid in (295,28,180,108,425,72.366,294,14,239,181,109,365,77,110,36);
                              +----------+-----------------------------------------------------------------------------------------------------------+
                              | actionid | name                                                                                                      |
                              +----------+-----------------------------------------------------------------------------------------------------------+
                              |       28 | Operational Status on ( x-x-x ) {HOST.HOST} -> {HOST.IP} : PortName:{#SNMPVALUE}, Value:{ITEM.VALUE1}
                              |       36 | CPU LOAD ON x x-x-x is more than 7 for 15mins
                              |       77 | Unable to reach to abcd.com from {HOST.IP}                                          
                              |      108 | Zabbix Agent is not reachable on x-x -> 172.x.x.74 for 5 minutes                                
                              |      109 | Zabbix Agent is not reachable on x-x -> 172.x.x.75 for 5 minutes                                
                              |      110 | Zabbix Agent is not reachable on x-x -> 172.x.x.73 for 5 minutes                                
                              |      180 | x-x-x-x-x-port1-Operational Status changed -172.x.x.88                              
                              |      181 | x-x-x-x-x-port1-Operational Status changed -172.x.x.88                              
                              |      239 | x-x-x2-x-x-ge-1/1/5-6MB-Link crossing 80%                                                  
                              |      294 | x-x-x-ge-1/1/6-x-Secondary-10Mb Traffic Crossing 90%                                   
                              |      295 | x-x-ge-1/0/2-x-x-x-10Mb Traffic crossing 90%                                                  
                              |      365 | x_Servers : x-x-x-GW1 operational status & ping response                                    
                              |      425 | x-x-x-x2 : 172.x.x.73 - Traffic Status                                                             
                              +----------+-----------------------------------------------------------------------------------------------------------+
                              These action names are what was sent in the emails, so the issue is becoming more visible.
                              That explains the different email bodies for the same trigger.
                              But why is this happening? It should match only the exact action for the trigger, right? Why is it matching other actions?
                              (P.S. I don't know about escalations and haven't configured them since day 1 of using Zabbix; I'm asking because the query you shared mentions escalations.)

                              Thanks,
                              Last edited by zabbixfk; 11-06-2019, 13:11.

                            • vso
                              vso commented
                              Please show action conditions for those actions
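
Besides the UI, the conditions can also be listed straight from the database. The sketch below runs against an in-memory SQLite stand-in, not a real Zabbix database: the `actions` and `conditions` tables here are simplified assumptions modelled on the columns seen in this thread, the action IDs 1001 and 1002 and all rows are invented placeholders, and the real schema should be checked before adapting the query.

```python
import sqlite3

# Toy stand-in schema (assumption): simplified actions and conditions tables.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE actions (actionid INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE conditions (
        conditionid   INTEGER PRIMARY KEY,
        actionid      INTEGER,
        conditiontype INTEGER,
        operator      INTEGER,
        value         TEXT
    );
    -- Invented placeholder rows, not data from this thread.
    INSERT INTO actions VALUES (1001, 'example action A'), (1002, 'example action B');
    INSERT INTO conditions VALUES (1, 1001, 2, 0, '13475');
    """
)


def conditions_for(connection, action_ids):
    """List every condition row for the given actions, keeping actions with none."""
    placeholders = ",".join("?" for _ in action_ids)
    sql = f"""
        SELECT a.actionid, a.name, c.conditiontype, c.operator, c.value
        FROM actions a
        LEFT JOIN conditions c ON c.actionid = a.actionid
        WHERE a.actionid IN ({placeholders})
        ORDER BY a.actionid, c.conditionid
    """
    return connection.execute(sql, list(action_ids)).fetchall()


rows = conditions_for(conn, [1001, 1002])
for row in rows:
    print(row)
```

The LEFT JOIN keeps actions that have no condition rows at all (their condition columns come back as NULL), which is one of the things worth ruling out when an action matches events it should not.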