Zabbix Server Stops After A While

  • lutphi
    Junior Member
    • Jun 2013
    • 9

    #1

    Zabbix Server Stops After A While

    Hi,

    I have a Zabbix server that worked without problems for almost a year, until Monday. Since then, the Zabbix server stops working after a while. I get the error "Zabbix server is not running: the information displayed may not be current." I tried setting the debug log level to 4, but it did not give any useful information. When I restart MySQL and the Zabbix server, it works fine for a while (about an hour), but then the same situation occurs. What may be the problem?

    OS is Debian squeeze,
    zabbix_server version: Zabbix server v2.0.2 (revision 29214)

    Thanks for help
  • Colttt
    Senior Member
    Zabbix Certified Specialist
    • Mar 2009
    • 878

    #2
    Please post your server log file.
    Debian-User

    Sorry for my bad english

    • lutphi
      Junior Member
      • Jun 2013
      • 9

      #3
      The server logs are too big to attach. If you want specific lines, I can grab and attach those.

      • tchjts1
        Senior Member
        • May 2008
        • 1605

        #4
        2.0.2 is not the most stable of the 2.x releases. I would consider upgrading to 2.0.6.

        What kind of setup do you have? Zabbix App and DB server on the same server, or are they split? If they are on the same server, what is your memory usage like?

        • lutphi
          Junior Member
          • Jun 2013
          • 9

          #5
          They are all on the same server. Memory usage is 8.5 GB of 32 GB, and I use the MySQL InnoDB storage engine. CPU load averages 7.5 on 12 cores. I also replaced the disk drives with new ones, in case there was a disk problem.

          • tchjts1
            Senior Member
            • May 2008
            • 1605

            #6
            lutphi - Do you have these types of graphs available? See the post in the link below. I forget exactly which version they were introduced in...

            • lutphi
              Junior Member
              • Jun 2013
              • 9

              #7
              Unfortunately I don't have those types of graphs. In addition, I disabled some of my hosts, but the problem still occurs. How can I produce those graphs?

              • tchjts1
                Senior Member
                • May 2008
                • 1605

                #8
                Originally posted by lutphi
                Unfortunately I don't have those types of graphs. In addition, I disabled some of my hosts, but the problem still occurs. How can I produce those graphs?
                They are part of "Template App Zabbix Server". If you have that template, attach it to your Zabbix server.

                If you don't have that template, you can get it here and then import it:
                Join the friendly and open Zabbix community on our forums and social media platforms.

                • lutphi
                  Junior Member
                  • Jun 2013
                  • 9

                  #9
                  Sorry, I have some vital custom-made graphs on the Zabbix server, such as response time graphs.

                  • BDiE8VNy
                    Senior Member
                    • Apr 2010
                    • 680

                    #10
                    I got similar symptoms. Maybe it's the same issue, maybe it's a totally different one.

                    Last night, for the third time within the last two months, the Zabbix-Server stopped processing data collected by Zabbix-Proxies.

                    Symptoms after the issue occurred:
                    - Frontend complains about unavailability of Zabbix-Server
                    - No data gets delivered (in the current case since 02:33 AM)
                    - Configuration from server is still received by proxies.
                    - All internal processes are more or less idle except the timer and configuration syncer processes, which utilize <17%
                    - All gathering processes are more or less idle except the trappers, which utilize 100%
                    - CPUs idle ~95%
                    - Configuration caches are fine
                    - Disk I/O (write) is 4 MB/s and 550 O/s in average (normal usage is about 13 MB/s and 1800 O/s)
                    - After restart of Zabbix-Server all missing data gets processed.

                    Selection of information from Zabbix-Server log file (ordered by PID and time):
                    Code:
                     22302:20130609:105844.112 server #4 started [trapper #1]
                     22302:20130615:020605.631 Error while sending configuration. ZBX_TCP_WRITE() failed: [32] Broken pipe
                     22302:20130615:020620.615 Error while sending configuration. ZBX_TCP_WRITE() failed: [32] Broken pipe
                     22302:20130615:025150.700 Error while sending configuration. ZBX_TCP_WRITE() failed: [32] Broken pipe
                     22302:20130615:030659.548 Error while sending configuration. ZBX_TCP_WRITE() failed: [32] Broken pipe
                     22302:20130615:030702.599 Error while sending configuration. ZBX_TCP_WRITE() failed: [32] Broken pipe
                    
                     22303:20130609:105844.114 server #5 started [trapper #2]
                     22303:20130615:020603.055 Error while sending configuration. ZBX_TCP_WRITE() failed: [32] Broken pipe
                     22303:20130615:020603.545 Error while sending configuration. ZBX_TCP_WRITE() failed: [32] Broken pipe
                     22303:20130615:020604.466 Error while sending configuration. ZBX_TCP_WRITE() failed: [32] Broken pipe
                     22303:20130615:020620.407 Error while sending configuration. ZBX_TCP_WRITE() failed: [32] Broken pipe
                     22303:20130615:020631.274 Error while sending configuration. ZBX_TCP_WRITE() failed: [32] Broken pipe
                     22303:20130615:024134.789 Error while sending configuration. ZBX_TCP_WRITE() failed: [32] Broken pipe
                     22303:20130615:025140.779 Error while sending configuration. ZBX_TCP_WRITE() failed: [32] Broken pipe
                     22303:20130615:025143.653 Error while sending configuration. ZBX_TCP_WRITE() failed: [32] Broken pipe
                     22303:20130615:025648.026 Error while sending configuration. ZBX_TCP_WRITE() failed: [32] Broken pipe
                     22303:20130615:030150.661 Error while sending configuration. ZBX_TCP_WRITE() failed: [32] Broken pipe
                     22303:20130615:030651.043 Error while sending configuration. ZBX_TCP_WRITE() failed: [32] Broken pipe
                     22303:20130615:031153.895 Error while sending configuration. ZBX_TCP_WRITE() failed: [32] Broken pipe
                     22303:20130615:033158.844 Error while sending configuration. ZBX_TCP_WRITE() failed: [32] Broken pipe
                     22303:20130615:033700.667 Error while sending configuration. ZBX_TCP_WRITE() failed: [32] Broken pipe
                     22303:20130615:033702.833 Error while sending configuration. ZBX_TCP_WRITE() failed: [32] Broken pipe
                     22303:20130615:040207.830 Error while sending configuration. ZBX_TCP_WRITE() failed: [32] Broken pipe
                     22303:20130615:050711.907 Error while sending configuration. ZBX_TCP_WRITE() failed: [32] Broken pipe
                     22303:20130615:061720.965 Error while sending configuration. ZBX_TCP_WRITE() failed: [32] Broken pipe
                     22303:20130615:064736.029 Error while sending configuration. ZBX_TCP_WRITE() failed: [32] Broken pipe
                     22303:20130615:070739.487 Error while sending configuration. ZBX_TCP_WRITE() failed: [32] Broken pipe
                     22303:20130615:081242.731 Error while sending configuration. ZBX_TCP_WRITE() failed: [32] Broken pipe
                     22303:20130615:083243.016 Error while sending configuration. ZBX_TCP_WRITE() failed: [32] Broken pipe
                    
                     22305:20130609:105844.113 server #6 started [trapper #3]
                     22305:20130615:020605.345 Error while sending configuration. ZBX_TCP_WRITE() failed: [32] Broken pipe
                     22305:20130615:020620.862 Error while sending configuration. ZBX_TCP_WRITE() failed: [32] Broken pipe
                     22305:20130615:023653.928 Error while sending configuration. ZBX_TCP_WRITE() failed: [32] Broken pipe
                     22305:20130615:031200.291 Error while sending configuration. ZBX_TCP_WRITE() failed: [32] Broken pipe
                     22305:20130615:032952.186 Error while sending configuration. ZBX_TCP_WRITE() failed: [110] Connection timed out
                     22305:20130615:040953.278 Error while sending configuration. ZBX_TCP_WRITE() failed: [32] Broken pipe
                     22305:20130615:041453.498 Error while sending configuration. ZBX_TCP_WRITE() failed: [32] Broken pipe
                    
                     22307:20130609:105844.114 server #7 started [trapper #4]
                     22307:20130615:020604.119 Error while sending configuration. ZBX_TCP_WRITE() failed: [32] Broken pipe
                     22307:20130615:020604.228 Error while sending configuration. ZBX_TCP_WRITE() failed: [32] Broken pipe
                     22307:20130615:020620.134 Error while sending configuration. ZBX_TCP_WRITE() failed: [32] Broken pipe
                     22307:20130615:020630.946 Error while sending configuration. ZBX_TCP_WRITE() failed: [32] Broken pipe
                     22307:20130615:030709.637 Error while sending configuration. ZBX_TCP_WRITE() failed: [32] Broken pipe
                     22307:20130615:040733.945 Error while sending configuration. ZBX_TCP_WRITE() failed: [32] Broken pipe
                     22307:20130615:041234.596 Error while sending configuration. ZBX_TCP_WRITE() failed: [32] Broken pipe
                     22307:20130615:050743.283 Error while sending configuration. ZBX_TCP_WRITE() failed: [32] Broken pipe
                     22307:20130615:062754.593 Error while sending configuration. ZBX_TCP_WRITE() failed: [32] Broken pipe
                     22307:20130615:062754.669 Error while sending configuration. ZBX_TCP_WRITE() failed: [32] Broken pipe
                     22307:20130615:070755.034 Error while sending configuration. ZBX_TCP_WRITE() failed: [32] Broken pipe
                    
                     22309:20130609:105844.114 server #8 started [trapper #5]
                     22309:20130615:020605.288 Error while sending configuration. ZBX_TCP_WRITE() failed: [32] Broken pipe
                     22309:20130615:020605.339 Error while sending configuration. ZBX_TCP_WRITE() failed: [32] Broken pipe
                     22309:20130615:020620.752 Error while sending configuration. ZBX_TCP_WRITE() failed: [32] Broken pipe
                     22309:20130615:023123.503 Error while sending configuration. ZBX_TCP_WRITE() failed: [32] Broken pipe
                     22309:20130615:023126.694 Error while sending configuration. ZBX_TCP_WRITE() failed: [32] Broken pipe
                     22309:20130615:025414.823 Error while sending configuration. ZBX_TCP_WRITE() failed: [110] Connection timed out
                     22309:20130615:025417.542 Error while sending configuration. ZBX_TCP_WRITE() failed: [32] Broken pipe
                     22309:20130615:025420.543 Error while sending configuration. ZBX_TCP_WRITE() failed: [32] Broken pipe
                     22309:20130615:025423.979 Error while sending configuration. ZBX_TCP_WRITE() failed: [32] Broken pipe
                     22309:20130615:033438.040 Error while sending configuration. ZBX_TCP_WRITE() failed: [32] Broken pipe
                     22309:20130615:033441.609 Error while sending configuration. ZBX_TCP_WRITE() failed: [32] Broken pipe
                     22309:20130615:045457.596 Error while sending configuration. ZBX_TCP_WRITE() failed: [32] Broken pipe
                    
                     22312:20130609:105844.115 server #9 started [trapper #6]
                     22312:20130615:020605.608 Error while sending configuration. ZBX_TCP_WRITE() failed: [32] Broken pipe
                     22312:20130615:020620.893 Error while sending configuration. ZBX_TCP_WRITE() failed: [32] Broken pipe
                     22312:20130615:022646.146 Error while sending configuration. ZBX_TCP_WRITE() failed: [32] Broken pipe
                     22312:20130615:023700.773 Error while sending configuration. ZBX_TCP_WRITE() failed: [32] Broken pipe
                     22312:20130615:030736.278 Error while sending configuration. ZBX_TCP_WRITE() failed: [110] Connection timed out
                     22312:20130615:053303.092 Error while sending configuration. ZBX_TCP_WRITE() failed: [32] Broken pipe
                     22312:20130615:070832.919 Error while sending configuration. ZBX_TCP_WRITE() failed: [32] Broken pipe
                    
                     22313:20130609:105844.115 server #10 started [trapper #7]
                     22313:20130615:020605.990 Error while sending configuration. ZBX_TCP_WRITE() failed: [32] Broken pipe
                     22313:20130615:022152.069 Error while sending configuration. ZBX_TCP_WRITE() failed: [110] Connection timed out
                     22313:20130615:025217.019 Error while sending configuration. ZBX_TCP_WRITE() failed: [32] Broken pipe
                     22313:20130615:033718.244 Error while sending configuration. ZBX_TCP_WRITE() failed: [32] Broken pipe
                     22313:20130615:034719.965 Error while sending configuration. ZBX_TCP_WRITE() failed: [32] Broken pipe
                    
                     22315:20130609:105844.115 server #11 started [trapper #8]
                     22315:20130615:020605.275 Error while sending configuration. ZBX_TCP_WRITE() failed: [32] Broken pipe
                     22315:20130615:020605.340 Error while sending configuration. ZBX_TCP_WRITE() failed: [32] Broken pipe
                     22315:20130615:020620.610 Error while sending configuration. ZBX_TCP_WRITE() failed: [32] Broken pipe
                     22315:20130615:025152.499 Error while sending configuration. ZBX_TCP_WRITE() failed: [32] Broken pipe
                     22315:20130615:033738.979 Error while sending configuration. ZBX_TCP_WRITE() failed: [110] Connection timed out
                    
                     22317:20130609:105844.116 server #12 started [trapper #9]
                     22317:20130615:020605.560 Error while sending configuration. ZBX_TCP_WRITE() failed: [32] Broken pipe
                     22317:20130615:022146.502 Error while sending configuration. ZBX_TCP_WRITE() failed: [110] Connection timed out
                     22317:20130615:041715.358 Error while sending configuration. ZBX_TCP_WRITE() failed: [32] Broken pipe
                     22317:20130615:064234.803 Error while sending configuration. ZBX_TCP_WRITE() failed: [32] Broken pipe
                     22317:20130615:064234.837 Error while sending configuration. ZBX_TCP_WRITE() failed: [32] Broken pipe
                     22317:20130615:082251.989 Error while sending configuration. ZBX_TCP_WRITE() failed: [32] Broken pipe
                    
                     22318:20130609:105844.116 server #13 started [trapper #10]
                     22318:20130615:020605.818 Error while sending configuration. ZBX_TCP_WRITE() failed: [32] Broken pipe
                     22318:20130615:020620.552 Error while sending configuration. ZBX_TCP_WRITE() failed: [32] Broken pipe
                     22318:20130615:020631.152 Error while sending configuration. ZBX_TCP_WRITE() failed: [32] Broken pipe
                     22318:20130615:025646.816 Error while sending configuration. ZBX_TCP_WRITE() failed: [32] Broken pipe
                     22318:20130615:030647.437 Error while sending configuration. ZBX_TCP_WRITE() failed: [32] Broken pipe
                     22318:20130615:034709.855 Error while sending configuration. ZBX_TCP_WRITE() failed: [32] Broken pipe
                     22318:20130615:035210.558 Error while sending configuration. ZBX_TCP_WRITE() failed: [32] Broken pipe
                     22318:20130615:035210.661 Error while sending configuration. ZBX_TCP_WRITE() failed: [32] Broken pipe
                     22318:20130615:041335.540 Error while sending configuration. ZBX_TCP_WRITE() failed: [110] Connection timed out
                     22318:20130615:041335.557 Error while sending configuration. ZBX_TCP_WRITE() failed: [32] Broken pipe
                     22318:20130615:041335.871 Error while sending configuration. ZBX_TCP_WRITE() failed: [32] Broken pipe
                     22318:20130615:041335.923 Error while sending configuration. ZBX_TCP_WRITE() failed: [32] Broken pipe
                    Selection of information from the log file of one of the Zabbix-Proxies:

                    Code:
                      1487:20130615:015527.557 Error while receiving answer from server [ZBX_TCP_READ() failed: [4] Interrupted system call]
                      1487:20130615:020053.301 Error while receiving answer from server [ZBX_TCP_READ() failed: [104] Connection reset by peer]
                      1490:20130615:020343.359 Error while sending data to the server [ZBX_TCP_WRITE() failed: [104] Connection reset by peer]
                      1490:20130615:020718.998 Error while sending data to the server [ZBX_TCP_WRITE() failed: [104] Connection reset by peer]
                      1487:20130615:021701.761 Error while receiving answer from server [ZBX_TCP_READ() failed: [104] Connection reset by peer]
                      1487:20130615:023221.088 Error while receiving answer from server [ZBX_TCP_READ() failed: [104] Connection reset by peer]
                      1487:20130615:023755.834 Error while receiving answer from server [ZBX_TCP_READ() failed: [104] Connection reset by peer]
                      1487:20130615:024321.582 Error while receiving answer from server [ZBX_TCP_READ() failed: [104] Connection reset by peer]
                      1487:20130615:024847.327 Error while receiving answer from server [ZBX_TCP_READ() failed: [104] Connection reset by peer]
                      1487:20130615:030252.746 Error while receiving answer from server [ZBX_TCP_READ() failed: [104] Connection reset by peer]
                      1487:20130615:033004.023 Error while receiving answer from server [ZBX_TCP_READ() failed: [104] Connection reset by peer]
                      1487:20130615:034321.539 Error while receiving answer from server [ZBX_TCP_READ() failed: [104] Connection reset by peer]
                      1487:20130615:041301.748 Error while receiving answer from server [ZBX_TCP_READ() failed: [104] Connection reset by peer]
                      1487:20130615:041848.494 Error while receiving answer from server [ZBX_TCP_READ() failed: [104] Connection reset by peer]
                      1487:20130615:042435.242 Error while receiving answer from server [ZBX_TCP_READ() failed: [104] Connection reset by peer]
                      1487:20130615:050305.985 Error while receiving answer from server [ZBX_TCP_READ() failed: [104] Connection reset by peer]
                      1487:20130615:050831.727 Error while receiving answer from server [ZBX_TCP_READ() failed: [104] Connection reset by peer]
                      1487:20130615:051357.470 Error while receiving answer from server [ZBX_TCP_READ() failed: [104] Connection reset by peer]
                      1487:20130615:051923.215 Error while receiving answer from server [ZBX_TCP_READ() failed: [104] Connection reset by peer]
                      1487:20130615:053440.835 Error while receiving answer from server [ZBX_TCP_READ() failed: [104] Connection reset by peer]
                      1487:20130615:055742.887 Error while receiving answer from server [ZBX_TCP_READ() failed: [104] Connection reset by peer]
                      1487:20130615:064828.946 Error while receiving answer from server [ZBX_TCP_READ() failed: [104] Connection reset by peer]
                      1487:20130615:065357.694 Error while receiving answer from server [ZBX_TCP_READ() failed: [104] Connection reset by peer]
                      1487:20130615:070618.819 Error while receiving answer from server [ZBX_TCP_READ() failed: [4] Interrupted system call]
                      1487:20130615:072457.247 Error while receiving answer from server [ZBX_TCP_READ() failed: [104] Connection reset by peer]
                      1487:20130615:073304.999 Error while receiving answer from server [ZBX_TCP_READ() failed: [104] Connection reset by peer]
                      1487:20130615:074432.906 Error while receiving answer from server [ZBX_TCP_READ() failed: [104] Connection reset by peer]
                      1487:20130615:075007.652 Error while receiving answer from server [ZBX_TCP_READ() failed: [104] Connection reset by peer]
                      1487:20130615:080010.264 Error while receiving answer from server [ZBX_TCP_READ() failed: [104] Connection reset by peer]
                      1487:20130615:081210.388 Error while receiving answer from server [ZBX_TCP_READ() failed: [4] Interrupted system call]
                      1487:20130615:081739.135 Error while receiving answer from server [ZBX_TCP_READ() failed: [104] Connection reset by peer]
                      1487:20130615:082307.878 Error while receiving answer from server [ZBX_TCP_READ() failed: [104] Connection reset by peer]
                      1487:20130615:084854.141 Error while receiving answer from server [ZBX_TCP_READ() failed: [104] Connection reset by peer]
                      1487:20130615:085419.885 Error while receiving answer from server [ZBX_TCP_READ() failed: [104] Connection reset by peer]
                      1487:20130615:085948.629 Error while receiving answer from server [ZBX_TCP_READ() failed: [104] Connection reset by peer]
                      1490:20130615:090136.364 Error while sending data to the server [ZBX_TCP_WRITE() failed: [104] Connection reset by peer]
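A minimal sketch for summarizing excerpts like the two above, assuming the log lines keep the `PID:YYYYMMDD:HHMMSS.mmm message` layout shown in this thread. The function name `tally_errors` is hypothetical and not part of Zabbix; it only counts how often each process hit each socket error, which may make it easier to see whether one trapper or one errno dominates.

```python
import re
from collections import Counter

# Assumed layout, taken from the excerpts in this thread:
#   "PID:YYYYMMDD:HHMMSS.mmm message"
LINE_RE = re.compile(r"^\s*(\d+):(\d{8}):(\d{6})\.\d{3}\s+(.*)$")
# Socket errors appear as "failed: [32] Broken pipe", optionally inside [...]
ERRNO_RE = re.compile(r"failed: \[(\d+)\] ([^\]]+)")


def tally_errors(lines):
    """Count occurrences of (pid, errno, error text) in an iterable of log lines."""
    counts = Counter()
    for line in lines:
        m = LINE_RE.match(line)
        if not m:
            continue  # not a log line in the assumed layout
        err = ERRNO_RE.search(m.group(4))
        if err:
            key = (int(m.group(1)), int(err.group(1)), err.group(2).strip())
            counts[key] += 1
    return counts
```

To use it on a real file, pass an open file object, for example `tally_errors(open(path))`, where `path` is wherever your server or proxy log lives, and print `counts.most_common()`.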
                    Here is some information about the corresponding Zabbix evaluation environment.
                    All Linux hosts are virtualized via OpenVZ

                    Zabbix-Server
                    - CentOS 6.4 (i386)
                    - PostgreSQL 9.1.9
                    - Zabbix-Server 2.0.6

                    Zabbix-Proxies (SQLite)
                    - CentOS 5.9 and 6.4 (i386 and x86_64)
                    - 32 proxies in use
                    - All proxies equipped with Zabbix-Java-Gateway, snmptrapd, snmptt, ODBC and DBforBix

                    Some numbers
                    Number of hosts (monitored/not monitored/templates): 945 / 336 / 176
                    Number of items (monitored/disabled/not supported): 83886 / 69040 / 3184
                    Number of triggers (enabled/disabled)[problem/unknown/ok]: 69490 / 3957 [528 / 0 / 68962]
                    Number of users: 86
                    Number of users groups: 24
                    Number of host groups: 154
                    Required server performance, new values per second: 841.67

                    Unfortunately I've currently no time for a deeper analysis :-(

                    Edit:
                    For every attached graph a time range of 12 hours beginning from 00:15 am is chosen. Every graph type is shown twice. The first graph shows the last night where the issue occurred. The second one shows the day before for comparison purposes.
                    Around 09:00 the Zabbix-Server was restarted.
                    Around 12:15 the Zabbix-Server resumed normal operation.

                    Edit:
                    The peak that is seen around 02:00 o'clock is caused by lots of log[*] data.
                    Attached Files
                    Last edited by BDiE8VNy; 18-06-2013, 07:50.

                    • tchjts1
                      Senior Member
                      • May 2008
                      • 1605

                      #11
                      BDiE8VNy -

                      Did you happen to have a look at this post? Seems to have some similarities. Alexei offered up some possible causes.

                      • BDiE8VNy
                        Senior Member
                        • Apr 2010
                        • 680

                        #12
                        Yes, I did.

                        I already raised the number of trappers up to 35 (three more than proxies) but I'm not convinced that the issue was due to lack of trappers.

                        Something happened at 01:45 am (noticed peak loads quite often at this time, but haven't figured out the cause yet).
                        Then all available trappers (I think there were 10 of them configured) were 100% busy within 5 minutes. But why? The whole system was quite busy for a few hours but finally idled as if there was nothing to process.

                        Yes, the system runs out of trappers but why were they not doing anything? What caused Zabbix not to free the trappers if it obviously doesn't process any data?
                        After restarting the zabbix_server process, it ran out of available trappers very quickly as well. But this time the server was under heavy load due to catching up, without any issues.

                        • tchjts1
                          Senior Member
                          • May 2008
                          • 1605

                          #13
                          Well, what caught my eye was that you were showing the broken pipe error on the proxies... and then Alexei mentioned this: https://www.zabbix.com/forum/showpos...85&postcount=8

                          But then that wouldn't explain why restarting the Zabbix server would fix the issue. Were you also restarting proxy services?

                          • BDiE8VNy
                            Senior Member
                            • Apr 2010
                            • 680

                            #14
                            Nope, just the zabbix_server process

                            • Alexei
                              Founder, CEO
                              Zabbix Certified Trainer
                              Zabbix Certified Specialist
                              Zabbix Certified Professional
                              • Sep 2004
                              • 5654

                              #15
                              I would track the number of established connections to Zabbix server TCP/10051 (trapper). It does indeed look as if proxies are trying to push information to the server and the write cache gets full. When that happens, Zabbix does not accept any new data; it goes into a loop of (sleep 1; try to write to the cache) until it times out.
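One way to track established connections to TCP/10051 on a Linux server is to read `/proc/net/tcp`. The sketch below is only an illustration under that assumption: the function name `count_established` is hypothetical, and it relies on the usual `/proc/net/tcp` layout (hex `address:port` pairs, state code `01` for ESTABLISHED). IPv6 sockets are listed separately in `/proc/net/tcp6`, which uses the same column layout.

```python
ESTABLISHED = "01"  # TCP state code for ESTABLISHED in /proc/net/tcp


def count_established(proc_net_tcp_text, port=10051):
    """Count ESTABLISHED sockets whose local port equals `port`.

    `proc_net_tcp_text` is the full text of /proc/net/tcp (or tcp6).
    """
    port_hex = format(port, "04X")  # ports are printed as 4 uppercase hex digits
    count = 0
    for line in proc_net_tcp_text.splitlines()[1:]:  # first line is the header
        fields = line.split()
        if len(fields) < 4:
            continue
        local_port = fields[1].rsplit(":", 1)[-1]
        if local_port == port_hex and fields[3] == ESTABLISHED:
            count += 1
    return count
```

Calling `count_established(open("/proc/net/tcp").read())` once a minute and storing the result (for instance as a Zabbix item) would show whether the count climbs toward the number of configured trappers before the server stalls.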

                              I believe it was caused by network issues between Zabbix Server and one of your proxies. It is just a guess.

                              Connect to your proxies to see how much unsent data they have in their databases. You may try to identify faulty proxy this way. However this situation caused all proxies to delay data sending, so you may find that all proxies are full of data.
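A rough sketch of checking the unsent backlog on an SQLite-based proxy. It assumes the Zabbix 2.0 proxy schema, where collected values sit in a `proxy_history` table and the last id already sent to the server is kept in the `ids` table under `field_name = 'history_lastid'`. These table and column names are assumptions here, so verify them against your own proxy database before relying on the numbers. The function name `unsent_rows` is hypothetical.

```python
import sqlite3

# Assumed Zabbix 2.0 proxy schema: rows in proxy_history with an id above the
# stored "history_lastid" marker have not been sent to the server yet.
UNSENT_SQL = """
SELECT COUNT(*) FROM proxy_history
WHERE id > COALESCE(
    (SELECT nextid FROM ids
     WHERE table_name = 'proxy_history' AND field_name = 'history_lastid'),
    0)
"""


def unsent_rows(conn):
    """Return the number of proxy_history rows not yet sent to the server."""
    return conn.execute(UNSENT_SQL).fetchone()[0]
```

Run it against each proxy's database file with `unsent_rows(sqlite3.connect(db_path))`, where `db_path` is the proxy's SQLite file, and compare the counts across proxies. A proxy with a much larger backlog than the others would be the first candidate for the faulty one.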

                              Quick solution (data loss, not recommended): connect to proxies and drop unsent data.
                              Better solution: wait until Zabbix recovers. Note that it is currently not in good shape; data collection is likely significantly delayed.
                              Alexei Vladishev
                              Creator of Zabbix, Product manager
                              New York | Tokyo | Riga
                              My Twitter
