Zabbix b5 crashed

  • bbrendon
    Senior Member
    • Sep 2005
    • 870

    #1

    Zabbix b5 crashed

    Zabbix b5 went ape shit on me this morning. Between 8:37 AM and 8:44 Zabbix triggered all nodata() triggers and everyone's pager went off. In addition, the CPU utilization dropped at the same time. It appears zabbix stopped allowing connections into it.

    It appears a bunch of zabbix_server processes are missing. After restarting the server, I have 11 zabbix_server processes. Below I see 4.

    I also saved the log files if anyone is interested. I don't understand the date format of them though.

    -bb

    # telnet localhost 10051
    Trying 127.0.0.1...
    telnet: Unable to connect to remote host: Connection refused


    root 10969 0.0 0.2 2388 1064 ? Ss Jan02 0:01 /bin/sh /usr/bin/zabbix_check_server 3 120 root
    zabbix 20690 0.1 0.1 2516 1012 ? SN Jan02 4:23 /usr/bin/zabbix_agentd
    zabbix 20691 0.0 0.1 2516 716 ? SN Jan02 0:00 /usr/bin/zabbix_agentd
    zabbix 20692 0.0 0.1 2516 716 ? SN Jan02 0:00 /usr/bin/zabbix_agentd
    zabbix 20693 0.0 0.1 2516 716 ? SN Jan02 0:00 /usr/bin/zabbix_agentd
    zabbix 20694 0.0 0.1 2516 716 ? SN Jan02 0:00 /usr/bin/zabbix_agentd
    zabbix 20695 0.0 0.1 2516 940 ? SN Jan02 0:23 /usr/bin/zabbix_agentd
    zabbix 26455 0.0 0.3 5324 1676 ? S 08:30 0:00 /usr/bin/zabbix_server
    zabbix 26457 0.1 0.3 5324 1780 ? S 08:30 0:10 /usr/bin/zabbix_server
    zabbix 26459 0.0 0.3 5324 1752 ? S 08:30 0:05 /usr/bin/zabbix_server
    zabbix 26461 0.0 0.3 5324 1608 ? S 08:30 0:00 /usr/bin/zabbix_server
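The listing above shows 4 zabbix_server processes, against the 11 reported after a restart. A minimal Python sketch for tallying processes per executable from saved `ps aux` text follows; the function name `count_commands` is made up for illustration, and the sketch assumes the usual 11-column `ps aux` layout with the command in the last column.

```python
from collections import Counter


def count_commands(ps_output):
    """Count processes per executable in `ps aux`-style text.

    Assumes 11 whitespace-separated columns, the last one being the
    full command line (which may itself contain spaces).
    """
    counts = Counter()
    for line in ps_output.splitlines():
        fields = line.split(None, 10)  # keep the command line in one piece
        if len(fields) < 11 or fields[0] == "USER":
            continue  # skip blank lines, short lines and the header row
        executable = fields[10].split()[0]
        counts[executable] += 1
    return counts


if __name__ == "__main__":
    import sys

    # Example: ps aux | python count_ps.py
    for name, n in count_commands(sys.stdin.read()).most_common():
        print(n, name)
```

Logging this count periodically would show when the server processes disappear, rather than only how many are left afterwards.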


    ##zabbix_server.conf:
    Server=1
    StartSuckers=6
    StartTrappers=5

    ListenPort=10051
    HousekeepingFrequency=1
    SenderFrequency=30

    DebugLevel=4
    Timeout=5
    PidFile=/var/run/zabbix/server.pid
    LogFile=/var/log/zabbix/server.log
    AlertScriptsPath=/home/zabbix/bin/
    FpingLocation=/usr/sbin/fping
    Unofficial Zabbix Expert
    Blog, Corporate Site
  • bbrendon
    Senior Member
    • Sep 2005
    • 870

    #2
    This happened again this morning. Has this happened to anyone else?
    Code:
    # ps aux|grep zab
    root      1634  0.0  0.2  2392 1068 ?        Ss   Jan05   0:02 /bin/sh /usr/bin/zabbix_check_server 3 120 root
    zabbix   13251  0.1  0.1  2516 1004 ?        SN   Jan09   4:01 /usr/bin/zabbix_agentd
    zabbix   13252  0.0  0.1  2516  712 ?        SN   Jan09   0:00 /usr/bin/zabbix_agentd
    zabbix   13253  0.0  0.1  2516  712 ?        SN   Jan09   0:00 /usr/bin/zabbix_agentd
    zabbix   13254  0.0  0.1  2516  712 ?        SN   Jan09   0:00 /usr/bin/zabbix_agentd
    zabbix   13255  0.0  0.1  2516  712 ?        SN   Jan09   0:00 /usr/bin/zabbix_agentd
    zabbix   13256  0.0  0.1  2516  940 ?        SN   Jan09   0:25 /usr/bin/zabbix_agentd
    zabbix    3635  0.0  0.3  5328 1680 ?        S    06:26   0:00 /usr/bin/zabbix_server
    zabbix    3637  0.1  0.3  5328 1780 ?        S    06:26   0:19 /usr/bin/zabbix_server
    zabbix    3639  0.1  0.3  5328 1756 ?        S    06:26   0:11 /usr/bin/zabbix_server
    zabbix    3641  0.0  0.3  5328 1620 ?        S    06:26   0:01 /usr/bin/zabbix_server
    root     16083  0.0  0.1  1640  532 pts/10   S+   09:38   0:00 grep zab

    • Alexei
      Founder, CEO
      Zabbix Certified Trainer
      Zabbix Certified Specialist, Zabbix Certified Professional
      • Sep 2004
      • 5654

      #3
      Anything in the log file?
      Alexei Vladishev
      Creator of Zabbix, Product manager
      New York | Tokyo | Riga
      My Twitter


      • bbrendon
        Senior Member
        • Sep 2005
        • 870

        #4
        I saved the log file the first time it happened. Here is tail -100 from before I restarted the server.

        What is the date format? Is it:
        PID:date:time ?

        Code:
        026457:20060105:101916 In evaluate_expression({11553})
        026457:20060105:101916 Before deleting spaces:{11553}
        026457:20060105:101916 After deleting spaces:{11553}
        026457:20060105:101916 BEGIN substitute_functions ({11553})
        026457:20060105:101916 Before find_char:{11553}[{]
        026457:20060105:101916 Before find_char:{11553}[{]
        026457:20060105:101916 Before find_char:{11553}[}]
        026457:20060105:101916 Executing query:select 0,lastvalue from functions where functionid=11553
        026457:20060105:101916 In DBnum_rows
        026457:20060105:101916 Result of DBnum_rows [1]
        026457:20060105:101916 Expression1:[{11553}]
        026457:20060105:101916 Expression2:[%lf553}]
        026457:20060105:101916 Expression3:[%lf    ]
        026457:20060105:101916 Before deleting spaces:1.000000    
        026457:20060105:101916 After deleting spaces:1.000000
        026457:20060105:101916 Expression4:[1.000000]
        026457:20060105:101916 Before find_char:1.000000[{]
        026457:20060105:101916 Expression:[1.000000]
        026457:20060105:101916 END substitute_functions
        026457:20060105:101916 In evaluate([1.000000])
        026457:20060105:101916 Before find_char:1.000000[)]
        026457:20060105:101916 Evaluating simple expression [1.000000]
        026457:20060105:101916 Evaluate end:[1.000000]
        026457:20060105:101916 exp_value trigger.value trigger.prevvalue [1] [1] [134673824]
        026457:20060105:101916 In update_trigger_value[12521,1,1136485156]
        026457:20060105:101916 In DBnum_rows
        026457:20060105:101916 Result of DBnum_rows [1]
        026457:20060105:101916 In DBnum_rows
        026457:20060105:101916 Result of DBnum_rows [12]
        026457:20060105:101916 In update_functions(19119)
        026457:20060105:101916 Executing query:select function,parameter,itemid,lastvalue from functions where itemid=19119 group by 1,2,3 order by 1,2,3
        026457:20060105:101916 In DBnum_rows
        026457:20060105:101916 Result of DBnum_rows [2]
        026457:20060105:101916 ItemId:19119 Evaluating last(134732137)
        
        026457:20060105:101916 In evaluate_FUNCTION() Function [last] flag [0]
        026457:20060105:101916 In evaluate_FUNCTION() 1
        026457:20060105:101916 In evaluate_FUNCTION() 2 value [0.016667]
        026457:20060105:101916 In evaluate_FUNCTION() pre-7
        026457:20060105:101916 In evaluate_FUNCTION() 7 Formula [0]
        026457:20060105:101916 In evaluate_FUNCTION() 7 Value [0.016667]
        026457:20060105:101916 In evaluate_FUNCTION() 7 Units []
        026457:20060105:101916 In evaluate_FUNCTION() 7 Value [0.016667] Units [] Formula [0]
        026457:20060105:101916 End of evaluate_FUNCTION. Result [0.016667]
        026457:20060105:101916 Result of evaluate_FUNCTION [0.016667]
        
        026457:20060105:101916 Do not update functions, same value
        026457:20060105:101916 In DBnum_rows
        026457:20060105:101916 Result of DBnum_rows [2]
        026457:20060105:101916 ItemId:19119 Evaluating nodata(134732195)
        
        026457:20060105:101916 In evaluate_FUNCTION() Function [nodata] flag [0]
        026457:20060105:101916 In evaluate_FUNCTION() pre-7
        026457:20060105:101916 In evaluate_FUNCTION() 7 Formula [0]
        026457:20060105:101916 In evaluate_FUNCTION() 7 Value [1]
        026457:20060105:101916 In evaluate_FUNCTION() 7 Units []
        026457:20060105:101916 In evaluate_FUNCTION() 7 Value [1] Units [] Formula [0]
        026457:20060105:101916 End of evaluate_FUNCTION. Result [1]
        026457:20060105:101916 Result of evaluate_FUNCTION [1]
        
        026457:20060105:101916 Do not update functions, same value
        026457:20060105:101916 In DBnum_rows
        026457:20060105:101916 Result of DBnum_rows [2]
        026457:20060105:101916 In update_triggers [19119]
        026457:20060105:101916 Executing query:select distinct t.triggerid,t.expression,t.status,t.dep_level,t.priority,t.value,t.description from triggers t,functions f,items i where i.status<>3 and i.itemid=f.itemid and t.status=0 and f.triggerid=t.triggerid and f.itemid=19119
        026457:20060105:101916 In DBnum_rows
        026457:20060105:101916 Result of DBnum_rows [1]
        026457:20060105:101916 In evaluate_expression({11568})
        026457:20060105:101916 Before deleting spaces:{11568}
        026457:20060105:101916 After deleting spaces:{11568}
        026457:20060105:101916 BEGIN substitute_functions ({11568})
        026457:20060105:101916 Before find_char:{11568}[{]
        026457:20060105:101916 Before find_char:{11568}[{]
        026457:20060105:101916 Before find_char:{11568}[}]
        026457:20060105:101916 Executing query:select 0,lastvalue from functions where functionid=11568
        026457:20060105:101916 In DBnum_rows
        026457:20060105:101916 Result of DBnum_rows [1]
        026457:20060105:101916 Expression1:[{11568}]
        026457:20060105:101916 Expression2:[%lf568}]
        026457:20060105:101916 Expression3:[%lf    ]
        026457:20060105:101916 Before deleting spaces:1.000000    
        026457:20060105:101916 After deleting spaces:1.000000
        026457:20060105:101916 Expression4:[1.000000]
        026457:20060105:101916 Before find_char:1.000000[{]
        026457:20060105:101916 Expression:[1.000000]
        026457:20060105:101916 END substitute_functions
        026457:20060105:101916 In evaluate([1.000000])
        026457:20060105:101916 Before find_char:1.000000[)]
        026457:20060105:101916 Evaluating simple expression [1.000000]
        026457:20060105:101916 Evaluate end:[1.000000]
        026457:20060105:101916 exp_value trigger.value trigger.prevvalue [1] [1] [134673824]
        026457:20060105:101916 In update_trigger_value[12535,1,1136485156]
        026457:20060105:101916 In DBnum_rows
        026457:20060105:101916 Result of DBnum_rows [1]
        026457:20060105:101916 In DBnum_rows
        026457:20060105:101916 Result of DBnum_rows [12]
        026455:20060105:101917 Got QUIT or INT or TERM or PIPE signal. Exiting...
        026457:20060105:101917 Got QUIT or INT or TERM or PIPE signal. Exiting...
        026459:20060105:101917 Got QUIT or INT or TERM or PIPE signal. Exiting...
        026461:20060105:101917 Got QUIT or INT or TERM or PIPE signal. Exiting...
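On the date format question: the prefixes above are consistent with PID:YYYYMMDD:HHMMSS. The leading numbers (026455, 026457, 026459, 026461) match the zabbix_server PIDs in the first ps listing, and 20060105:101916 reads as 2006-01-05 10:19:16. A minimal Python sketch that splits such a line under that assumption (the name `parse_line` is made up for illustration):

```python
import re
from datetime import datetime

# Assumed layout: zero-padded PID, date as YYYYMMDD, time as HHMMSS, then the message.
LINE_RE = re.compile(r"^(\d+):(\d{8}):(\d{6}) (.*)$")


def parse_line(line):
    """Return (pid, timestamp, message), or None if the line has no such prefix."""
    match = LINE_RE.match(line.strip())
    if match is None:
        return None
    pid, date, time, message = match.groups()
    timestamp = datetime.strptime(date + time, "%Y%m%d%H%M%S")
    return int(pid), timestamp, message


if __name__ == "__main__":
    print(parse_line("026455:20060105:101917 Got QUIT or INT or TERM or PIPE signal. Exiting..."))
```

As a cross-check, the Unix timestamp 1136485156 in the update_trigger_value lines is 2006-01-05 18:19:16 UTC, which agrees with the 10:19:16 prefix if the server clock runs eight hours behind UTC.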

        • Alexei
          Founder, CEO
          Zabbix Certified Trainer
          Zabbix Certified Specialist, Zabbix Certified Professional
          • Sep 2004
          • 5654

          #5
          May I ask you to send the full log file to [email protected]? Many thanks.

          • bbrendon
            Senior Member
            • Sep 2005
            • 870

            #6
            Did this ever come to anything? Was a bug found? Am I the only one experiencing this?

            • Shiva
              Junior Member
              • Apr 2005
              • 14

              #7
              Supposed to be fixed in beta6

              Hi
              I had a similar problem. This bug is probably fixed in CVS already, so I guess we will have to wait for beta6...


              • azilber
                Member
                • Apr 2005
                • 33

                #8
                Crashing as well..

                I've been experiencing random crashes as well. The service just dies with nothing in the logs. If anyone else is experiencing this, I'm using monit in the interim to handle the problem:

                My monit.conf additions for zabbix_server (for RedHat AS 4):

                check process zabbix_server with pidfile /var/run/zabbix/zabbix_server.pid
                start program = "/etc/init.d/zabbix_server start"
                stop program = "/etc/init.d/zabbix_server stop"
                group server
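
The monit entry above watches the pidfile. The first post in this thread also showed the server refusing connections on port 10051 while some zabbix_server processes were still running, so a connection test may be a useful second check. A minimal Python sketch of the same test the telnet command performed (the function name `trapper_port_open` is made up for illustration; host, port and timeout defaults are assumptions taken from the posted ListenPort and Timeout values):

```python
import socket


def trapper_port_open(host="localhost", port=10051, timeout=5.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts and name resolution failures.
        return False


if __name__ == "__main__":
    import sys

    # Exit status 0 when the port accepts connections, 1 otherwise,
    # so the script can be called from cron or a watchdog.
    sys.exit(0 if trapper_port_open() else 1)
```

This only confirms that something is accepting connections on the port; it does not prove the server is processing data.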


                • bbrendon
                  Senior Member
                  • Sep 2005
                  • 870

                  #9
                  I'm still experiencing crashes here with beta 6. Now when doing "ps aux" after a crash, there are NO zabbix processes left.

                  • elkor
                    Senior Member
                    • Jul 2005
                    • 299

                    #10
                    I have not experienced this, bb.

                    I'm due to upgrade to the latest code (current agent v1.1b1).
                    Is there ANYTHING else you can give us to track this down? It looks like the parent process is killing all the others.

                    OS/version/checks/frequency

                    You really can't be too verbose here.


                    • bbrendon
                      Senior Member
                      • Sep 2005
                      • 870

                      #11
                      As for our environment, we're using Debian with MySQL 4.1.11-3.

                      All agents use active checks because the servers are behind firewalls. There are 3 Unix systems and about 15 Windows servers. A few hosts have simple checks for availability and latency. We're using the Debian package of Zabbix. That's pretty much it.

                      Let me know if I can provide more detail or what you're looking for specifically.

                      • Alexei
                        Founder, CEO
                        Zabbix Certified Trainer
                        Zabbix Certified Specialist, Zabbix Certified Professional
                        • Sep 2004
                        • 5654

                        #12
                        Yes, please provide me with a detailed log file from the ZABBIX server.

                        • bbrendon
                          Senior Member
                          • Sep 2005
                          • 870

                          #13
                          I emailed it to Alexei. It won't fit in this posting; it's a few KB over the limit.